Cancer is a disease caused by genetic and epigenetic changes. In its simplest form, it is a genetic disease caused by changes in a cell's genome, including point mutations, insertions, deletions and chromosomal translocations. Basic cancer research is currently undergoing a revolution driven by genetics.
In the post-genomic era, cancer bioinformatics enables molecular biologists to study DNA (the genome), mRNA (the transcriptome) and proteins (the proteome) more precisely. Describing the mechanisms of cancer in a comprehensive context gives researchers an opportunity to obtain more useful data for analysis and to combine them in new ways.
Database resources of cancer bioinformatics
The Cancer Genome Anatomy Project (CGAP) was initiated in 1996 by the National Cancer Institute (NCI), which also maintains it. It has since become a leading resource in the field of cancer genetics. The program has also used new technologies to build hundreds of libraries, including serial analysis of gene expression (SAGE) and massively parallel signature sequencing (MPSS). SAGE is a method for rapid, comprehensive analysis of gene expression and is widely recognized as one of the best methods for transcriptome profiling. MPSS is a large-scale, high-throughput technology for gene analysis based on DNA sequencing. It obtains gene expression profiles by building a tag library, attaching the tags to microbeads, carrying out enzymatic digestion and ligation reactions, and analyzing the resulting data bioinformatically. MPSS can detect genes expressed at low levels and genes whose expression differs only slightly, does not require the gene sequences to be known in advance, and is automated and high-throughput.
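The tag-counting idea behind SAGE-style profiling can be illustrated with a minimal sketch. It is not the CGAP pipeline: it assumes the classic SAGE convention of taking a 10-base tag immediately downstream of the 3'-most CATG site (the NlaIII anchoring site), which the text above does not specify, and the function names and input sequences are hypothetical.

```python
from collections import Counter

# Assumptions (not stated in the article): NlaIII anchoring site and 10-base tags,
# as used in classic short-tag SAGE.
ANCHOR_SITE = "CATG"
TAG_LENGTH = 10


def extract_tag(transcript, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag following the 3'-most anchor site, or None if no full tag exists."""
    seq = transcript.upper()
    pos = seq.rfind(anchor)  # position of the last (3'-most) anchor site
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = seq[start:start + tag_length]
    return tag if len(tag) == tag_length else None


def count_tags(transcripts):
    """Count how often each tag occurs across a collection of transcript sequences."""
    tags = (extract_tag(t) for t in transcripts)
    return Counter(tag for tag in tags if tag is not None)


def tags_per_million(counts):
    """Normalize raw tag counts to tags per million so libraries can be compared."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    library = [
        "GGCATGACGTACGTACTT",
        "CATGTTTTTTTTTTCATGACGTACGTACAA",
        "catgGGGGGGGGGG",
        "AAACATGGG",  # tag too short, skipped
    ]
    counts = count_tags(library)
    for tag, tpm in tags_per_million(counts).items():
        print(tag, counts[tag], round(tpm))
```

Because each tag is counted directly from sequence, no prior knowledge of the gene is needed; mapping tags back to genes is a separate annotation step not shown here.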
In Brazil, the FAPESP/LICR Human Cancer Genome Project (HCGP) has generated more than 1 million expressed sequence tags (ESTs) from prevalent tumors using a new technology called open reading frame EST (ORESTES). The expressed sequences studied by CGAP and HCGP were integrated into the International Database of Cancer Gene Expression. The two programs have joined forces because they share a common goal: to create a catalogue of gene expression in cancer. They annotate millions of sequences from tumors and normal tissues and submit them to GenBank. Both aim to determine the unique expression patterns of genes in normal, precancerous and cancer cells, with a view to improving the detection, diagnosis and treatment of cancer.
The Cancer Biomedical Information Grid (caBIG) is an ambitious new program funded and maintained by the National Cancer Institute (NCI). It aims to establish a cancer network that integrates four types of information: information interfaces, vocabulary/terminology and ontology, data elements, and information models. The caBIG initiative is a grid project organized voluntarily by researchers and organisations with the goal of creating a global network for cancer research. To this end, efforts have been made to develop standards for applications and analysis processes so that cooperation and data sharing can be carried out more easily. In addition, caBIG has undertaken development projects in different fields, such as clinical trial management systems, ontology acquisition tools and in vivo imaging systems.
Methods applied in cancer bioinformatics
With the generation of large-scale data and the advent of new analytical techniques, the pattern of cancer research is changing. The application of genomics, transcriptomics, proteomics and bioinformatics has enabled researchers to test a large number of new hypotheses, thus contributing to the development of cancer research. These large-scale technologies have expanded the number of detectable genetic variants associated with the development of specific types of cancer and have made it possible to integrate molecular characteristics to predict cancer and treatment response.
Genomics refers to the development and application of DNA mapping, new sequencing technologies and computer programs to analyze the structure and function of an organism's whole genome. An important recent development is the incorporation of genomics into biomedical research. The most relevant branch here is "cancer genomics", which integrates large amounts of data and computing resources to study structural changes in the genomes of cancer cells or cancer tissues.
The complete set of transcripts produced from a genome at a given time is called the transcriptome. By analogy with genomics, transcriptomics is defined as the comprehensive study of the transcriptome. Unlike the genome, the transcriptome is highly dynamic: it differs not only between tissues of the same organism but also between pathological states, such as cancerous and healthy tissue. For this reason, many researchers have studied the expression profiles of large numbers of genes and tried to identify the genes whose expression is altered in cancer tissues.
Proteins are important components of organisms and participate in almost all physiological and cellular metabolic processes. The large-scale study of all proteins expressed in a cell or tissue, together with their modifications and interactions, is called proteomics, which is generally considered the next step in the study of biological systems after genomics and transcriptomics. However, proteome research is far more complex than genomics because of the intrinsic complexity of proteins, such as the wide variety of post-translational modifications.
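Comparing expression between cancerous and healthy tissue is often summarized as a log2 fold change per gene. The sketch below is an illustration under stated assumptions, not a method described in this article: the gene names and values are hypothetical, the pseudocount of 1 and the threshold of 1 (a twofold change) are arbitrary choices, and a real analysis would use replicates and a statistical test.

```python
import math


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of tumor to normal expression; the pseudocount avoids division by zero."""
    return math.log2((tumor + pseudocount) / (normal + pseudocount))


def differentially_expressed(tumor_expr, normal_expr, threshold=1.0):
    """Return {gene: log2FC} for genes measured in both samples with |log2FC| >= threshold."""
    result = {}
    for gene in tumor_expr.keys() & normal_expr.keys():
        lfc = log2_fold_change(tumor_expr[gene], normal_expr[gene])
        if abs(lfc) >= threshold:
            result[gene] = lfc
    return result


if __name__ == "__main__":
    # Hypothetical expression values (e.g. normalized counts), for illustration only.
    tumor = {"GENE_A": 7, "GENE_B": 3, "GENE_C": 0}
    normal = {"GENE_A": 1, "GENE_B": 3, "GENE_C": 3}
    for gene, lfc in sorted(differentially_expressed(tumor, normal).items()):
        direction = "up" if lfc > 0 else "down"
        print(f"{gene}: log2FC={lfc:+.2f} ({direction} in tumor)")
```

A positive value means the gene is more highly expressed in the tumor sample, a negative value means it is lower; genes missing from either sample are skipped rather than guessed.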
Cancer bioinformatics deals with the organization and analysis of data so that important trends and patterns can be identified, with the ultimate goal of discovering new therapeutic and/or diagnostic options for cancer. The first step towards this goal is to find a gene expression blueprint that represents a specific cancer condition. It is generally accepted that a biological state or physiological condition cannot be represented by the expression of a single gene. Therefore, in order to reveal molecular markers that represent the initiation and progression of cancer, researchers have conducted extensive genomic analyses using gene expression microarrays, array comparative genomic hybridization (array CGH) and tissue chips. However, considerable changes occur during specific stages of cancer at the post-replication, transcriptional, translational and post-translational levels, such as gene amplification, altered RNA splicing, phosphorylation, methylation, and differences in protein secretion and stability, and many of these cannot be captured by genomic analysis. Proteome analysis, by contrast, aims to identify and quantify all proteins in a biological sample.