What is protein de novo sequencing?
De novo protein sequencing is a technique that determines the amino acid sequence of a protein directly from mass spectrometry data, without relying on any known sequence or protein database.
Principles of protein de novo sequencing
Protein de novo sequencing relies on the regular way in which peptides fragment in the mass spectrometer after the protein has been cleaved by a protease. Once the characteristic fragmentation pattern has been identified, each amino acid, together with any post-translational modification it carries, is inferred from the mass difference between neighboring peaks in the mass spectrum.
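The mass-difference idea can be illustrated with a minimal Python sketch. It is an illustration only, not the algorithm of any particular software: the function name residues_from_ladder, the 0.02 Da tolerance, and the "S[+80]" label for phosphoserine are assumptions made for this example. The residue masses are standard monoisotopic values, and the input is assumed to be a clean ladder of singly charged fragment ions of one type (for example b-ions).

```python
# Monoisotopic residue masses in Da. Leucine and isoleucine are isobaric,
# so "L" stands for either of them here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
    # Hypothetical label for a modified residue: serine + phosphate (79.96633 Da).
    "S[+80]": 87.03203 + 79.96633,
}


def residues_from_ladder(peaks, tol=0.02):
    """Read residues from mass differences between consecutive ladder peaks.

    peaks: m/z values of singly charged fragment ions of one ion type.
    tol:   allowed deviation in Da (an assumed value for this sketch).
    """
    ordered = sorted(peaks)
    residues = []
    for low, high in zip(ordered, ordered[1:]):
        delta = high - low
        # Pick the residue whose mass is closest to the observed difference.
        best = min(RESIDUE_MASS, key=lambda aa: abs(delta - RESIDUE_MASS[aa]))
        residues.append(best if abs(delta - RESIDUE_MASS[best]) <= tol else "?")
    return "".join(residues)


# b1..b4 ions of the toy peptide G-A-S-V (residue masses plus one proton).
print(residues_from_ladder([58.02874, 129.06585, 216.09788, 315.16629]))
```

The first residue is not recovered, because only differences between peaks are read. Real spectra also contain noise, missing peaks, and mixed ion types, which dedicated software has to handle.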
Why use de novo sequencing to analyze proteins?
Traditional protein analysis, whether performed on a MALDI-TOF mass spectrometer or on a nanoLC-MS/MS platform, depends on a sequence database that already contains the protein to be identified. The measured mass spectrometry data are compared with the theoretical molecular weights calculated by fragmenting the database sequences, and a match completes the sequencing and identification of the sample. In practice, however, the sequence of the sample may not be in any existing database, for example when the protein is entirely new or has been extracted from a new species. In addition, the theoretical data are often incomplete, and the missing part is frequently the critical one.
Consider a sequence-modified enzyme on the market. Competitors find it very difficult to obtain complete information about the enzyme through reverse engineering, because the possible sequence changes are so diverse: point mutations, deletions, insertions, substitutions, modifications, fusions, and so on. There are thousands of possibilities and no obvious place to start. Enzymes are not the only case: other protein drugs are also optimized through mutation to increase stability or yield.
Some situations are more difficult still. Certain sequences in protein drugs carry mutations, and even a small number of mutations may change the biological activity and increase the probability of an immune response.
Protein de novo sequencing was developed to address these shortcomings of traditional protein identification methods in the analysis of unknown protein sequences and mutations.
The analysis process
1. Digest the protein sample with multiple proteases
2. Analyze the purified peptide fragments by high-performance liquid chromatography coupled with mass spectrometry to obtain an accurate, unbiased sequence
3. Perform mass spectrometric identification on the resulting sequence
4. Use MassAnalyzer for reverse verification of the sequence
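The digestion step can be simulated in silico to see how different proteases cut the same protein at different sites. The Python sketch below is a simplified illustration: the article does not name the proteases used, so trypsin, chymotrypsin, and Glu-C are assumptions, the cleavage rules are reduced to their commonly cited textbook form, and the function digest and the toy sequence are hypothetical.

```python
import re

# Simplified cleavage rules written as zero-width regular expressions.
# Assumed enzymes; real specificity depends on conditions and sequence context.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",       # after K or R, not before P
    "chymotrypsin": r"(?<=[FYW])(?!P)",  # after F, Y or W, not before P
    "glu_c": r"(?<=E)",                  # after E
}


def digest(sequence, enzyme):
    """Return the peptides from a complete digestion with the given enzyme."""
    # Splitting on zero-width matches requires Python 3.7 or later.
    pieces = re.split(CLEAVAGE_RULES[enzyme], sequence)
    return [p for p in pieces if p]  # drop empty strings at the ends


toy_protein = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical sequence for illustration
for name in CLEAVAGE_RULES:
    print(name, digest(toy_protein, name))
```

Because each enzyme cuts at different residues, the peptide boundaries of one digest fall inside the peptides of another, which is what makes the digests complementary.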
The technical advantages
It achieves full protein sequence coverage. Digesting the sample with multiple enzymes produces complementary, overlapping peptides, which can be spliced together to cover 100% of the protein sequence.
It uses advanced data processing algorithms. The algorithms build on literature published in Nature Biotechnology and the proteomics journal JPR and are combined with existing professional protein sequence analysis software. They follow a "wide input, narrow output" principle, so that the protein sequence is analyzed accurately without losing any valid data.
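How overlapping peptides from different digests can be spliced into one sequence is sketched below in Python. This is a simple greedy overlap merge chosen for illustration, not the method of any specific product. The function names, the minimum overlap of 3 residues, and the toy peptides are all assumptions, and the peptides are assumed to be error-free.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(seqs):
    """Remove duplicates and peptides fully contained in another peptide."""
    unique = list(dict.fromkeys(seqs))
    return [p for p in unique if not any(p != q and p in q for q in unique)]


def assemble(peptides, min_len=3):
    """Greedily merge the pair with the largest overlap until none remains."""
    contigs = drop_contained(peptides)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the sequence has gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])
    return contigs


# Toy peptides from two hypothetical digests of the same protein.
digest_a = ["ACDEFGHIK", "LMNPQR", "STVWY"]
digest_b = ["ACDEF", "GHIKLMNPQRSTVW", "Y"]
print(assemble(digest_a + digest_b))
```

Neither digest alone links its peptides to one another, but together they overlap enough to yield a single contig. If more than one contig is returned, part of the sequence is not bridged by any overlap and another enzyme may be needed.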