Traditional protein identification methods, such as immunoblotting, chemical sequencing of internal peptides, comigration analysis of known and unknown proteins, or overexpression of genes of interest in an organism, are often time-consuming and labor-intensive, and are unsuitable for high-throughput screening. Currently preferred techniques include image analysis, microsequencing, amino acid composition analysis for further identification of peptide fragments, and techniques related to mass spectrometry.
Image analysis technology
A 2-DE map, with its "starry sky" of spots, cannot be analyzed by intuition alone. Spots on each image may be up-regulated, down-regulated, appear, or disappear under physiological and pathological conditions, so computer-based data processing and quantitative analysis must be relied upon. Image analysis includes spot detection, background subtraction, spot matching, and database construction, and it presupposes high-quality 2-DE gels (low background staining, high reproducibility). First, images are typically acquired with a charge-coupled device (CCD) camera, a laser densitometer, or a phosphor or fluorescence imager, which digitizes the image into a grid of pixels. Next, filtering and warping are applied to the gray levels of the image, and the image is processed for spot detection. Operators such as the Laplacian, the Gaussian, and the difference of Gaussians (DoG) separate the meaningful areas from the background and precisely define the intensity, area, perimeter, and orientation of each spot.
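As an illustration of how a DoG operator separates a spot from its background, the following is a minimal sketch in plain Python. The function names, the two sigma values, and the synthetic 15 x 15 "gel" are assumptions made for this example and do not come from any particular 2-DE package.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel, truncated at 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(values, kernel):
    """Convolve one row or column with the kernel, clamping at the edges."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def difference_of_gaussians(image, sigma_small=1.0, sigma_large=2.0):
    """Narrow blur minus wide blur; flat background cancels out."""
    narrow = gaussian_blur(image, sigma_small)
    wide = gaussian_blur(image, sigma_large)
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(narrow, wide)]

# Synthetic gel (assumed data): uniform background with one 3x3 spot.
image = [[10.0] * 15 for _ in range(15)]
for y in range(6, 9):
    for x in range(6, 9):
        image[y][x] = 100.0

dog = difference_of_gaussians(image)
# Strongest response as (value, row, column).
peak = max((dog[y][x], y, x) for y in range(15) for x in range(15))
print(peak[1:], round(dog[0][0], 3))
```

The uniform background gives a response close to zero, while the spot center gives the strongest positive response, which is what makes the filtered image easier to threshold.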
The spots detected by image analysis must agree with those seen by the naked eye. Under this principle, most systems locate a spot by its center of gravity or by its highest peak. Edge detection software describes the outline of each spot and performs edge detection and neighborhood analysis to increase accuracy. The basic tools of thresholding, edge detection, erosion, and dilation can also recover the boundaries of co-migrating spots. The PC-based software Phoretix-2D is challenging the older Unix-based 2-D analysis packages. Third, once the spots on a 2-DE image have been detected, many images need to be compared, added, subtracted, or averaged. Since 100% reproducibility in 2-DE is difficult to achieve, matching proteins between gels is a challenge for image analysis systems. The advent of IPG technology has made spot matching easier.
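The thresholding step and the two ways of locating a spot (center of gravity and highest peak) can be sketched as follows. This is a simplified illustration, not the algorithm of Phoretix-2D or any other package; the function name, the 4-connected region rule, and the small test image are assumptions.

```python
def detect_spots(image, threshold):
    """Group 4-connected pixels above the threshold into spots."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    spots = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or image[y0][x0] <= threshold:
                continue
            # Flood fill from this seed pixel.
            stack = [(y0, x0)]
            seen[y0][x0] = True
            pixels = []
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Intensity above threshold acts as the weight of each pixel.
            volume = sum(image[y][x] - threshold for y, x in pixels)
            cy = sum(y * (image[y][x] - threshold) for y, x in pixels) / volume
            cx = sum(x * (image[y][x] - threshold) for y, x in pixels) / volume
            peak = max(pixels, key=lambda p: image[p[0]][p[1]])
            spots.append({"area": len(pixels), "volume": volume,
                          "centroid": (cy, cx), "peak": peak})
    return spots

# Assumed toy image: two spots on a zero background.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 5, 9, 5, 0, 0],
    [0, 0, 5, 0, 0, 0],
    [0, 0, 0, 0, 0, 7],
    [0, 0, 0, 0, 0, 7],
]
spots = detect_spots(image, threshold=2)
for spot in spots:
    print(spot)
```

For the first spot the center of gravity (1.1875, 2.0) and the highest peak (1, 2) differ slightly, which shows why the choice between the two conventions matters for asymmetric spots.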
Therefore, spot-matching vector algorithms can observe a greater degree of similarity between gels in the length and parallelism of the matching vectors. Well-known software systems for matching include Quest, Lips, Hermes, and Gemini. Computational methods such as similarity analysis, cluster analysis, hierarchical classification, and principal factor analysis have been adopted, while neural networks, wavelet transforms, and practical analysis may be used in the future. Matching is usually carried out by an operator, who manually sets about 50 prominent spots as "landmarks" for cross-matching and then extends the matching to the entire gel.
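The landmark procedure described above can be sketched under a strong simplifying assumption: that the two gels differ only by a constant shift. Real matching software also has to handle local distortion. All names, coordinates, and the tolerance value below are hypothetical.

```python
import math

def mean_offset(landmarks):
    """Average displacement vector over manually paired landmark spots.

    landmarks: list of ((x_ref, y_ref), (x_other, y_other)) pairs.
    """
    n = len(landmarks)
    dx = sum(other[0] - ref[0] for ref, other in landmarks) / n
    dy = sum(other[1] - ref[1] for ref, other in landmarks) / n
    return dx, dy

def match_spots(reference, other, landmarks, tolerance=3.0):
    """Extend landmark matching to all spots by nearest-neighbor search."""
    dx, dy = mean_offset(landmarks)
    matches, used = {}, set()
    for name, (x, y) in reference.items():
        px, py = x + dx, y + dy          # predicted position on the other gel
        best, best_dist = None, tolerance
        for other_name, (ox, oy) in other.items():
            if other_name in used:
                continue
            dist = math.hypot(ox - px, oy - py)
            if dist <= best_dist:
                best, best_dist = other_name, dist
        if best is not None:
            matches[name] = best
            used.add(best)
    return matches

# Assumed spot coordinates on two gels.
reference = {"r1": (10, 10), "r2": (40, 25), "r3": (70, 60), "r4": (20, 80)}
other = {"s1": (12, 9), "s2": (42.5, 24), "s3": (71.5, 59.5), "s4": (90, 90)}
landmarks = [((10, 10), (12, 9)), ((70, 60), (71.5, 59.5))]

matches = match_spots(reference, other, landmarks)
print(matches)
```

Spot r4 stays unmatched because no spot on the second gel lies within the tolerance of its predicted position, which is the kind of case that would be flagged as an appearing or disappearing spot.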
Microsequencing
Microsequencing of proteins has become a cornerstone of protein analysis and identification, providing sufficient information for identification. Although amino acid composition analysis and peptide mass fingerprinting (PMF) can identify proteins isolated by 2-DE, N-terminal Edman degradation, the most common method, remains the primary technique for identification. Protein microsequencing has now been automated. The gel-separated protein is first blotted directly onto a PVDF membrane or a glass fiber membrane, stained, excised, and then placed directly in a sequencer, allowing proteins to be identified at the subpicomole level.
However, several points should be noted. Edman degradation is slow, producing sequence at a rate of one amino acid per 40 min. It is also expensive compared with mass spectrometry: the reagents are costly, and each amino acid costs $3 to $4. This suggests that conventional Edman degradation sequencing is not suitable for analyzing hundreds of proteins. However, if a gel contains only a few proteins of interest, or if a gene must be cloned because the protein cannot be identified by other techniques, conventional Edman degradation sequencing is still required.
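A short calculation shows why these figures rule out large-scale use. The rate (40 min per residue) and the cost ($3 to $4 per residue) are taken from the text; the tag length of 15 residues and the batch of 200 proteins are assumptions chosen only for illustration.

```python
MINUTES_PER_RESIDUE = 40              # from the text
COST_PER_RESIDUE_USD = (3.0, 4.0)     # from the text: low and high estimate

def edman_budget(residues_per_protein, n_proteins):
    """Return (instrument hours, low cost, high cost) for a sequencing batch."""
    total_residues = residues_per_protein * n_proteins
    hours = total_residues * MINUTES_PER_RESIDUE / 60
    low = total_residues * COST_PER_RESIDUE_USD[0]
    high = total_residues * COST_PER_RESIDUE_USD[1]
    return hours, low, high

# Assumed workload: 200 proteins, 15 residues sequenced per protein.
hours, low, high = edman_budget(residues_per_protein=15, n_proteins=200)
print(f"{hours:.0f} h of sequencer time, ${low:.0f} to ${high:.0f} in reagents")
```

Under these assumptions a single sequencer would run for 2000 hours, close to three months of continuous operation, whereas the same calculation for five proteins gives about two days.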
Recently, automated Edman degradation has been used to produce short N-terminal sequence tags, applying the sequence tag concept from mass spectrometry to Edman degradation, and this has become a powerful means of protein identification. Once simple modifications to the Edman hardware allow N-terminal sequence tags to be generated quickly, at 10-20 per day, sequence tagging will be suitable for identification in smaller proteomes. Proteins can be identified more reliably if the tag is combined with other protein properties such as amino acid composition, peptide mass, protein molecular weight, and isoelectric point. The BLAST program can be used to search the database. At present, the TagIdent search program can also be used for cross-species comparison, which enhances the role of sequence tags in proteomics research.
Mass spectrometry related technology
Mass spectrometry has become an important technology linking proteins and genes, opening the door to large-scale automated protein identification. A mass spectrometer used to analyze a protein or polypeptide has two major components: 1) the ion source, into which the sample is introduced, and 2) the analyzer, which measures the molecular weight of the resulting ions. The first main technique is matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF), a pulsed ionization technique. It produces ions from a solid-phase sample and measures their molecular weight in a flight tube. The second is electrospray ionization mass spectrometry (ESI-MS), a continuous ionization method that produces ions from the liquid phase and is combined with a quadrupole mass analyzer or a time-of-flight detector to measure molecular weight. In recent years, the instruments and techniques for protein identification by mass spectrometry have made great progress.
In MALDI-TOF, the most important advances are the ion reflector and delayed extraction, which allow fairly accurate molecular weights to be obtained. In ESI-MS, the advent of nanoelectrospray sources has made it possible to analyze microliter-scale samples in 30-40 min.
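The principle behind measuring molecular weight in a flight tube can be sketched with the idealized linear time-of-flight relation: an ion accelerated through a voltage U gains kinetic energy zeU and then drifts a length L, so its flight time grows with the square root of m/z. The sketch below ignores the reflector and delayed extraction mentioned above; the tube length of 1 m and the accelerating voltage of 20 kV are assumed values, not instrument specifications.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # coulombs
DALTON = 1.66053906660e-27            # kilograms

def flight_time(mz, tube_length_m=1.0, accel_voltage_v=20000.0):
    """Flight time in seconds for an ion with the given m/z (Da per charge)."""
    # Energy balance z*e*U = (1/2)*m*v^2 gives v = sqrt(2*e*U / (m/z)).
    mass_per_charge_kg = mz * DALTON
    velocity = math.sqrt(2.0 * ELEMENTARY_CHARGE * accel_voltage_v
                         / mass_per_charge_kg)
    return tube_length_m / velocity

def mz_from_flight_time(t, tube_length_m=1.0, accel_voltage_v=20000.0):
    """Invert the relation: recover m/z (Da per charge) from a flight time."""
    velocity = tube_length_m / t
    mass_per_charge_kg = 2.0 * ELEMENTARY_CHARGE * accel_voltage_v / velocity ** 2
    return mass_per_charge_kg / DALTON

t = flight_time(1000.0)
print(f"m/z 1000 arrives after {t * 1e6:.2f} microseconds")
print(f"recovered m/z: {mz_from_flight_time(t):.3f}")
```

Because flight time scales with the square root of m/z, an ion four times heavier takes only twice as long to arrive, which is why timing precision translates directly into mass accuracy.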
Reversed-phase liquid chromatography coupled with tandem mass spectrometry allows detection at the level of tens of picomoles and, depending on the configuration, from the low picomole down to the high femtomole level. When capillary electrophoresis is combined with tandem mass spectrometry, detection is possible below the femtomole level, and even at the attomole level. At present, proteins are identified by a combination of enzymatic digestion, liquid chromatography separation, tandem mass spectrometry, and computer algorithms. The following describes how proteins are identified by mass spectrometry using peptide mass fingerprinting and sequencing of peptide fragments.
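The computational side of peptide mass fingerprinting can be sketched as an in silico digest followed by a mass comparison. The sketch assumes trypsin as the enzyme (cleavage after K or R, but not before P), which the text does not specify, and it uses neutral monoisotopic masses. The two database sequences, the observed masses, and the tolerance are made up for the example.

```python
# Monoisotopic residue masses in daltons.
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini

def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (assumed enzyme)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(MONOISOTOPIC_RESIDUE_MASS[r] for r in peptide) + WATER

def fingerprint_score(observed_masses, sequence, tolerance=0.2):
    """Count observed masses that match a theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(1 for m in observed_masses
               if any(abs(m - t) <= tolerance for t in theoretical))

# Hypothetical database and observed peak list.
database = {"protein_A": "MKRPGAKLLR", "protein_B": "GASPVTCK"}
observed = [277.15, 400.28, 527.32]

ranking = sorted(database,
                 key=lambda name: fingerprint_score(observed, database[name]),
                 reverse=True)
print(tryptic_digest("MKRPGAKLLR"), ranking)
```

Counting matched peaks is the simplest possible score; practical search engines weight matches by mass accuracy and peptide frequency, but the digest-then-compare structure is the same.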
Amino acid composition analysis
Amino acid composition analysis was first used as a tool for identifying proteins in 1977; it is a unique "footprint" technique. It exploits the characteristic amino acid composition of each protein, a sequence-independent property that differs from peptide mass or sequence tags. Latter showed for the first time that amino acid composition data can be used to identify proteins from 2-DE gels. The composition of a protein is determined using radiolabeled amino acids, or the protein is blotted onto a PVDF membrane and subjected to acid hydrolysis at 155 °C for 1 h. With this simple procedure, the amino acids of each sample are automatically derivatized within 40 min and separated by chromatography, and 100 proteins per week can be analyzed routinely.
Proteins in the database are ranked according to a score that represents the difference between the two compositions. The "champion" (top-ranked) protein has the composition closest to that of the unknown protein. Taking into account the difference between the scores of the champion and the runner-up, only a champion that is clearly separated from the runner-up can be regarded as a highly credible identification. Several programs for amino acid composition analysis are available on the Internet, such as AACompIdent, ASA, FINDER, AAC-PI, and PROP-SEARCH; PROP-SEARCH uses composition, sequence, and amino acid positions to retrieve homologous proteins. However, there are still some disadvantages, such as variation in the measured amino acid values caused by incomplete acid hydrolysis or partial degradation. Therefore, the method should be combined with other protein properties for identification.
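The ranking idea can be sketched with a simple score: the sum of squared differences between the measured and the theoretical composition, both in mole percent. This scoring rule, the four-residue alphabet, and the two candidate sequences are assumptions chosen for illustration; the programs listed above each use their own scoring scheme.

```python
from collections import Counter

def composition(sequence, alphabet):
    """Mole percent of each residue in `alphabet` for a protein sequence."""
    counts = Counter(sequence)
    total = sum(counts[a] for a in alphabet)
    return {a: 100.0 * counts[a] / total for a in alphabet}

def composition_score(measured, candidate):
    """Sum of squared differences in mole percent; lower means closer."""
    return sum((measured[a] - candidate[a]) ** 2 for a in measured)

def rank_candidates(measured, database):
    """Return (score, name) pairs sorted from best to worst match."""
    alphabet = sorted(measured)
    scored = [(composition_score(measured, composition(seq, alphabet)), name)
              for name, seq in database.items()]
    return sorted(scored)

# Hypothetical measured composition (mole percent) and database entries.
measured = {"A": 26.0, "G": 24.0, "L": 25.0, "K": 25.0}
database = {"candidate_1": "AAGGLLKK", "candidate_2": "AAAAGGLK"}

ranking = rank_candidates(measured, database)
champion, runner_up = ranking[0], ranking[1]
print(ranking)
print("score gap:", runner_up[0] - champion[0])
```

The gap between the champion and the runner-up is reported alongside the ranking because, as noted above, a best score that is barely ahead of the next candidate is weak evidence for an identification.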