Chapter 2.3. Use of High-Throughput Technologies for Discovery of Biological Targets for Diagnostics


Chin Kai Ling, Aziana Ismail, Phua Kia Kien

Art work
Waiting room
Age-related differences in the immune response should be
considered when developing suitable diagnostic methods for each age group.
I was selfish before. 
I wanted to save my family. 
Everyone must be saved, the whole world. 
Andrei Tarkovsky


Certain technological breakthroughs have allowed the simultaneous and rapid analysis of large numbers of genes (genomics), messenger RNAs (transcriptomics), peptides and proteins (proteomics), lipids (lipidomics), carbohydrates (glycomics), and the intermediate products of metabolism (metabolomics). This is made possible by the availability of high-throughput technologies (HTTs) and automation techniques in the field of biology. The key principle is experimental parallelization where numerous tests can be run simultaneously instead of carrying out single experiments consecutively. HTTs facilitate increased experimental load without increasing the test or development time, thereby reducing the time needed for biological target discovery, and shortening the time-to-market for the development of diagnostics.

HTTs include analytical tools such as bioinformatics, statistics, and data mining algorithms, which enable high-throughput data analysis. When HTTs are applied together, important information regarding the identity and characteristics of targets may be uncovered more effectively compared with conventional techniques, which do not provide the depth of knowledge needed. The application of HTTs has led to the discovery of new biomarkers that fulfill numerous functions, including the diagnosis of infectious diseases (diagnostic biomarkers), predicting the outcome of diseases (prognostic biomarkers), facilitating molecular epidemiological studies (epidemiological biomarkers) and epigenetic research (epigenetic biomarkers), and monitoring drug therapeutics (pharmacodynamic or efficacy-response biomarkers). Over time, HTTs will become more affordable and the biological targets they help discover will be used as biomarkers for routine diagnosis and personalized medicine, even for people in the “bottom billion.”

Next-generation sequencing (NGS)

DNA or RNA sequencing is the process of determining the precise order of nucleotide bases: adenine, guanine, and cytosine, which are present in both DNA and RNA, together with thymine in DNA or uracil in RNA. The older method of nucleic acid sequencing, i.e., Sanger sequencing, makes a copy of the single-stranded DNA template using DNA polymerase during in vitro DNA replication, and selectively incorporates chain-terminating dideoxynucleotides (ddNTPs) into the growing strand. Analysis of the terminated fragments allows the nucleotide sequence to be determined. This chain-termination principle was adopted to create the next-generation sequencing (NGS) platform where four different fluorescent-labeled ddNTPs are used instead. The development of HTTs has enabled the large-scale sequencing of thousands or even millions of nucleic acid sequences concurrently and accurately, at low cost and high speed. NGS is applicable to whole-genome sequencing and re-sequencing (DNA-Seq), transcriptional profiling (RNA-Seq), DNA–protein interactions (ChIP-Seq), and epigenomic studies (DNA methylation) [1].

Different variations of the NGS platform have been developed based on the chemistry and user requirements to sequence short or long nucleic acid chains. The technologies employed by Roche (454 Life Sciences pyrosequencing) and Illumina (Solexa sequencing-by-synthesis) are most commonly used despite the development of emerging platforms, such as that developed by Pacific Biosciences, which allows much larger nucleic acid lengths to be sequenced (Table 1). Generally, the principle behind these platforms is to randomly fragment DNA or RNA into smaller pieces, then ligate adapter sequences to the DNA or complementary DNA (cDNA) fragments to construct a DNA or cDNA library for clonal amplification in preparation for sequencing. Sequenced reads are then mapped to a reference genome. These massively parallel sequencing platforms facilitate high-throughput sequencing for both hosts and pathogens, and enable studies on the correlation between the genome and disease pathogenesis, thereby expediting biomarker discovery [2, 3].
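The fragment-then-map principle can be illustrated with a minimal Python sketch. It is a toy model only and does not represent how any of the platforms above, or any production aligner, works: the reference string, the read length, and the function names fragment and map_reads are hypothetical, the reads are error-free, and mapping uses exact string matching, whereas real pipelines use indexed alignment and tolerate mismatches.

```python
import random

def fragment(sequence, read_length, n_reads, seed=0):
    """Draw n_reads random, error-free reads of fixed length from a sequence."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    last_start = len(sequence) - read_length
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [sequence[s:s + read_length] for s in starts]

def map_reads(reads, reference):
    """Map each read to its first exact match and return per-base coverage."""
    coverage = [0] * len(reference)
    for read in reads:
        position = reference.find(read)  # -1 when the read does not occur
        if position == -1:
            continue
        for i in range(position, position + len(read)):
            coverage[i] += 1
    return coverage

# Hypothetical reference sequence, for illustration only.
reference = "ATGGCGTACGTTAGCCTAGGCTAACGTTAGCATCGATCGGATCCA"
reads = fragment(reference, read_length=8, n_reads=20)
coverage = map_reads(reads, reference)
print(coverage)
```

Per-base coverage of this kind is what later steps, such as SNP calling, rely on: positions supported by many reads can be called with more confidence than positions covered once or not at all.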

Table 1. Comparison of next-generation sequencing (NGS) technologies [2, 3]

Microorganisms within the same species are relatively genetically homogeneous. However, they show extensive phenotypic diversity and differences in virulence. The most common DNA alteration is the single-nucleotide polymorphism (SNP), whereby a single-base variation occurs among strains within the same species. Although SNPs within a coding sequence may not necessarily change the amino acid sequence of the protein (called synonymous SNPs), they may affect the virulence of the pathogen, resistance to antibiotic treatment, and hence the severity of the disease. The application of NGS in whole-genome sequencing is a technological advancement in the study of genetic variation and its association with bacterial virulence.
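Whether a coding SNP is synonymous can be checked by translating the affected codon before and after the substitution. The sketch below assumes the standard genetic code and codons written on the coding strand; the function name classify_snp and the example codons are illustrative and are not taken from the studies cited in this chapter.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_snp(reference_codon, variant_codon):
    """Label a single-codon change as synonymous or non-synonymous."""
    before = CODON_TABLE[reference_codon.upper()]
    after = CODON_TABLE[variant_codon.upper()]
    return "synonymous" if before == after else "non-synonymous"

# Hypothetical examples: GGT and GGC both encode glycine,
# whereas GAA (glutamate) to GTA (valine) changes the amino acid.
print(classify_snp("GGT", "GGC"))
print(classify_snp("GAA", "GTA"))
```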

In a study to define the genotype of Salmonella enterica serovar Typhi (S. Typhi), the cause of typhoid fever, from Kathmandu, Nepal, using the Illumina platform, 1,500 SNPs were found representing eight distinct haplotypes from 62 S. Typhi isolates [4]. Sixty-eight percent of the isolates belonged to the H58 haplotype that is associated with multi-drug resistance (MDR), and appears to be the predominant cause of severe pediatric typhoid in Kathmandu [4]. By applying the same HTT method to haplotype S. Typhi isolates from all over the world, the researchers found that the ancestral MDR strain originated from the Indian subcontinent. It has spread to other parts of the world, including Southeast Asia, Western Asia, and East Africa, where it has rapidly replaced local S. Typhi strains over the last 30 years, and causes high morbidity and mortality in these countries today [5]. In Malaysia, even though the incidence of the H58 haplotype is less than 1% of the isolates studied [6] (Figure 1), it is possible to infer from the genealogical diagram that Malaysia might experience the same “epidemic of transmission” as its neighboring countries in the future. Thus, HTTs can be applied to genotype (fingerprint) pathogens, as well as to deduce the probability of the pathogen mutating to a more virulent form in the future.

Figure 1. Genealogical diagram of 282 Salmonella Typhi isolates from Kelantan, Malaysia, showing their haplotype distribution and divergence from their common ancestor (h9). Bold numerals denote the number of mutations between the nodes, whereas un-numbered branches denote single mutations between the nodes. The diameter of each node represents the number of isolates found in the haplotype. The H58 haplotype is annotated as h15 [7].

The field of transcriptomics represents a recent advance in HTT research. Unlike the genome, the set of expressed mRNA sequences (also called the transcriptome) changes with the external environment. RNA sequencing (also called RNA-Seq) benefits from the application of HTTs, which enable very large numbers of extracted RNAs to be sequenced simultaneously for transcriptome profiling. It facilitates precise measurement of the levels of specific RNA transcripts at a given time in an organism, providing further understanding of the development of the organism and its relationship to a disease. RNA-Seq has been applied to characterize the transcriptome of coding and non-coding genes of S. Typhi [8]. The combination of transcriptomic and proteomic data analysis has revealed the presence of an OmpR regulon that contributes to the pathogenicity of S. Typhi [8].

Recently, it has been shown that non-coding RNA molecules, known as small RNAs (sRNAs) in bacteria and microRNAs (miRNAs) in eukaryotes, are involved in disease development. These non-coding RNAs enable precise gene regulation at the post-transcriptional level. Many researchers have reported changes in miRNA expression in several diseases, and have demonstrated that such changes have potential as clinical biomarkers of cancer [9]. In Salmonella infections, unique miRNAs with important immune functions have been implicated in disease development including modulation of the host’s immune system and fine-tuning of immune responses [10]. Moreover, Salmonella itself can produce miRNA-like RNAs that can potentiate bacterial virulence by suppressing the host’s immune system [11]. In Salmonella Typhimurium, 19 unique sRNAs have been found to affect the ability of the pathogen to adapt to environmental changes, replicate within mammalian host cells, and thus control the virulence and pathogenesis of the disease [12, 13].

ChIP-seq has been applied to gain a better understanding of the regulon and to identify novel genes within the Salmonella OmpR network. ChIP-seq combines chromatin-immunoprecipitation (ChIP) with next-generation DNA-sequencing to reveal how DNA interacts with proteins of interest, often the transcriptional targets in regulating gene expression that influence phenotype adaptation mechanisms [14]. The researchers showed that this OmpR protein interacts with DNA in regulating gene expression, which may contribute to the nutrient-scavenging ability of the pathogen in the inflamed intestine during microbial colonization [14].

Epigenetics involves heritable changes in gene function that arise in ways other than alterations in the DNA sequence itself. The chemical compounds of the epigenome are not part of the DNA sequence but are attached to the DNA bases. DNA methylation is an important epigenetic mechanism and is reported to influence the expression level of Salmonella virulence genes [15]. Quantitative analysis of DNA methylation patterns and the location of histone post-translational modifications using NGS technology has the potential to uncover new diagnostic or prognostic biomarkers, which is not possible using conventional bioinformatics or quantitative polymerase chain reaction (PCR) methods [16]. The gene regulation associated with non-coding RNA is another important element that influences the virulence and pathogenesis of Salmonella [10, 13].


Microarray or chip-based technology enables the screening of hundreds or thousands of targets concurrently to quantify gene expression when subjected to various conditions. It has been applied to identify specific biomarkers related to diseases. Generally, probes (DNA oligonucleotides) or antigens (proteins, carbohydrates, or lipids) are precisely spotted onto glass microscope slides and hybridized with targets (labeled DNA or antibodies, respectively) under high-stringency conditions. The hybridization is usually detected and quantified by fluorescence-based detection of fluorescein-isothiocyanate (FITC)-labeled targets to determine the relative abundance of nucleic acids, proteins, carbohydrates, lipids, or metabolites in the test sample.
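Relative abundance in a two-sample microarray comparison is commonly summarized as the log2 ratio of the two fluorescence signals for each spot. The following Python sketch assumes background-corrected intensities that are strictly positive; the function name log2_ratio and the intensity values are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def log2_ratio(test_intensity, control_intensity):
    """Return log2(test / control) for one spot.

    Positive values indicate higher signal in the test sample,
    negative values indicate higher signal in the control sample.
    """
    if test_intensity <= 0 or control_intensity <= 0:
        raise ValueError("intensities must be background-corrected and positive")
    return math.log2(test_intensity / control_intensity)

# Hypothetical spot intensities as (test, control) pairs.
spots = {"gene_a": (800.0, 200.0), "gene_b": (150.0, 600.0), "gene_c": (300.0, 300.0)}
for name, (test, control) in spots.items():
    print(name, log2_ratio(test, control))
```

Working on the log2 scale makes a fourfold increase (+2) and a fourfold decrease (-2) symmetric around zero, which is why it is generally preferred over raw ratios.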

Hinchliffe et al. (2003) conducted a microarray study to find new biomarkers for Yersinia pestis, a gram-negative bacterium that causes plague, including pneumonic plague (a highly lethal and rapidly progressing necrotizing pneumonia). DNA sequences representing 100% of the 4,221 predicted coding sequences of Y. pestis were amplified and spotted onto microarray slides [17]. The genomic differences among 11 species of the genus Yersinia were analyzed using this Y. pestis-specific DNA microarray. The results revealed 292 chromosomal genes commonly shared by all the tested Yersinia species, and 16 genes were found to be specific to Y. pestis. This genomic difference from other serotypes is important for the pathogen’s adaptation in different environmental niches, and can be used as a biomarker for the pathogen [18]. In a different study, Han et al. (2007) investigated the transcriptional response of the bacterium under multiple environmental perturbations using cDNA microarray technology. The virulence genes of Y. pestis were found to be differentially regulated under low pH, nutrient limitation, oxygen stress, and starvation conditions, which explains its ability to survive in the host’s hostile environment [19].

The most commonly used microarrays are protein microarrays. They are used to study the interactions and activities of proteins, and also to determine their functionality in a disease. However, the major problem with protein arrays is the difficulty in obtaining pure proteins for the test. Among 202 virulence genes from Y. pestis selected for cloning and expression, only 172 were successfully expressed and only 149 were purified and spotted onto microarray slides. Thirteen of the proteins showed very strong antigenicity in immunized rabbits, and provided new protein biomarkers for the development of vaccines and diagnostics [20].

Carbohydrates are complex sugars that are present on the surface of pathogens, i.e., the cell membranes and secreted proteins. Since carbohydrates are less mutable than proteins and can be specific to each pathogen, highly sensitive oligosaccharide microarrays containing specific bacterial carbohydrates were probed with antibodies elicited during infections in the host for biomarker identification. However, a major problem with carbohydrate analysis is the difficulty of lipopolysaccharide (LPS) isolation. Thus, the LPS antigens were produced synthetically [21]. Polysaccharides representing the LPS inner core structures of various gram-negative bacteria were prepared using synthetic oligosaccharides ranging from mono- to tetra-saccharides. The microarray screening showed antibodies that specifically recognize a Y. pestis trisaccharide. This analysis helped to uncover the structural and chemical elements that determine the specificity and selectivity of LPS recognition by these antibodies [22].

Lipids are important as structural components of the cell and as signal transduction molecules. Lipid A is reported to be a component of the endotoxin that is responsible for the activation of the immune response in Y. pestis infection [23]; lipid microarrays have been developed for its detection and have been reported to be effective [24]. Lipids may therefore serve as biomarkers in the future.

Mass spectrometry (MS)

In addition to microarrays, MS is another technologically important advancement in biomarker identification. MS is commonly applied in analytical laboratories for quantitative and qualitative studies on a variety of biomolecules, including nucleic acids, proteins, lipids, carbohydrates, and metabolites. It involves measuring the mass-to-charge ratio (m/z) of the molecule. MS has been successfully applied for the molecular diagnosis of infectious diseases including the identification of pathogens, strain-typing, and the detection of antibiotic-resistant strains [25]. To obtain the m/z value, the molecules must be vaporized and ionized. Several types of MS are now available; they depend on ionization techniques such as matrix-assisted laser desorption/ionization (MALDI) and electrospray ionization (ESI). The choice of technique depends on the sample preparation required. For example, MALDI is commonly applied for two-dimensional gel electrophoresis (2-DE), where the unknown protein spot is excised from the gel before proceeding to MS analysis. MALDI is typically applied with a time-of-flight (TOF) analyzer and has certain advantages over ESI-MS. Data interpretation in MALDI-TOF is greatly simplified owing to the presence of singly charged ions (+1), and it is therefore more suitable for the analysis of complex mixtures where separation is not critical. In ESI, multiply charged ions are generated (+2, +3, +4, etc.). ESI is compatible with chromatographic separation, i.e., liquid chromatography (LC), gas chromatography (GC), or capillary electrophoresis (CE) using tandem MS analyzers, such as triple quadrupoles (QqQ), quadrupole ion trap, and Fourier transform ion cyclotron resonance (FT-ICR) [26]. However, a study comparing the analytical performance of LC-ESI-MS/MS and MALDI-MS/MS showed that both MS methods increase the proteomic coverage for complex samples, and that more unique proteins were observed using LC-ESI than using MALDI alone [27].
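The difference between singly charged MALDI ions and multiply charged ESI ions can be made concrete with the usual positive-mode relation m/z = (M + z × m_p) / z, where M is the neutral mass, z the number of added protons, and m_p the proton mass (about 1.007276 Da). The Python sketch below assumes ionization by protonation only; the function name mass_to_charge and the 10,000 Da example mass are hypothetical.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons (rounded)

def mass_to_charge(neutral_mass_da, charge):
    """Return m/z for a molecule carrying `charge` extra protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge

# Hypothetical 10,000 Da protein: one peak at +1 (MALDI-like),
# and a series of peaks at lower m/z for +2, +3, +4 (ESI-like).
for z in (1, 2, 3, 4):
    print(z, round(mass_to_charge(10000.0, z), 3))
```

The same molecule therefore appears as a single peak when singly charged but as a series of peaks across charge states in ESI, which is why ESI spectra need charge-state deconvolution before the mass can be read off.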


The integration of chromatographic methods into MS analysis has led to the discovery of more biomarkers and a better understanding of their application to the diagnosis and prognosis of disease. Chromatography allows discrimination between proteins that have the same mass but different retention times, and is therefore suitable for separating stereoisomers and distinguishing between closely related strains that vary by one or a few proteins. However, on its own it provides little information about the identity of a compound. Thus, the combination of chromatography and MS could help to characterize the identified biomarkers. It is also notable that a matching MS database is important for targeted sequence identification. A study conducted by Lau et al. (2015) using a combination of ultra-high performance liquid chromatography (ultra-HPLC) and MS (ESI-Q-TOF) to identify unique Mycobacterium tuberculosis metabolites detected 24 metabolites in the ion chromatogram, but only seven could be identified successfully using MS owing to a lack of information in currently available databases [28].

M. tuberculosis causes tuberculosis. Its cell wall is rich in lipids and is important as a barrier to interaction with the environment. Analysis of M. tuberculosis glycolipids using fast-atom-bombardment (FAB)-MS and GC-MS revealed the presence of toxic glycolipids, which play an important role as virulence factors in the early stage of infection and the expression of pathogenicity [29]. GC-MS has also been applied to analyze mycobacterial lipids directly from sputum samples for microorganism speciation to differentiate tuberculosis (TB) from non-tuberculosis (NTB) due to the rise of infections and antimicrobial resistance in this genus [30, 31]. Generally, HPLC is preferable because it has superior resolving power when separating a mixture of compounds. It has been used with computerized software to generate mycolic acid patterns for the identification of mycobacterial species, a solution known as the Sherlock Mycobacteria Identification System (SMIS) [32]. Moreover, HPLC has been applied for the purification of lipoarabinomannan (LAM), a surface-exposed lipoglycan that is important for intracellular survival and the latency of M. tuberculosis [33], and an immunodominant 38-kDa lipoprotein antigen, a phosphate-binding protein that serves as an initial receptor for the active transport of nutrients in M. tuberculosis [34, 35]. The former antigen is used in a commercially available test, MycoDot [36], and the latter is used in Pathozyme-Myco, Pathozyme-TB, and ICT diagnostic kits [37] for the detection of IgG antibodies in tuberculosis patients. Furthermore, the introduction of LC-MS together with analysis databases, such as LipidDB [38], MycoMass, and MycoMap [39], has provided comprehensive lipid profiling for M. tuberculosis. This has enabled strain differentiation and the identification of specific molecules, which may be used as biomarkers.

The cell wall of M. tuberculosis contains sugars and glycoproteins that play a critical role in host–pathogen interactions, antigenicity, and virulence [40]. The fundamental importance of N-glycans in biological processes and the alteration of N-glycosylation patterns during disease states are key to the diagnosis of tuberculosis. Hydrophilic interaction liquid chromatography (HILIC) can help to enrich and separate the highly hydrophilic glycan chains of glycopeptides from non-glycopeptides [41]. HILIC and MS have also been applied to identify lipids [42] and metabolites [43], which are involved in dynamic interactions within the cell. Capillary electrophoresis (CE) is also suitable for the separation of polar and charged compounds such as glycans. CE/ESI-MS [44] and CE/MALDI-TOF [45] have helped to characterize manno-oligosaccharide caps from lipoarabinomannans (LAMs), which are amphipathic, complex glycoconjugates found in mycobacterial cell walls that alter macrophage functions and enhance M. tuberculosis intra-macrophagic survival.

A variety of isotopic labeling techniques have been applied for relative quantitation, including isobaric tagging (iTRAQ or TMT-tagging), non-isobaric tagging (mTRAQ or acetylation), and label-free quantitation techniques (spectral counting). Two-dimensional difference gel electrophoresis (2D-DIGE) is a technique in which proteins are labeled with light and heavy isotopes at different reactive sites of the peptides and proteins; it can be directly incorporated with MS for relative quantification. It has been reported that iTRAQ labeling is superior to mTRAQ, and has helped to quantify three times more phosphopeptides and twice as many proteins as mTRAQ [46]. In a study to identify potential serum biomarkers for M. tuberculosis using iTRAQ-coupled LC-MS/MS, 100 proteins were differentially expressed; 45 proteins were upregulated and 55 proteins were downregulated in pulmonary TB patients’ sera. These proteins were found to be immune response stimulators, and four were antigenic for TB diagnosis [47]. iTRAQ also facilitated the discovery of mycobacterium-specific biomarkers, where it was used to elucidate the pathogenesis of two mycobacterial diseases at the cellular and molecular levels [48].

MS can also be coupled with conventional methods such as PCR for the surveillance of pathogens. Simner et al. (2013) combined a broad-range PCR method with ESI-MS to simultaneously detect M. tuberculosis and non-tuberculous mycobacteria (NTM) directly from cultures (solid or broth). This method allows the identification of microorganisms, and provides phenotypic drug resistance patterns to physicians for patient management within 6 hours, compared with standard laboratory sequencing methods, which take several days [49].

Gel-based electrophoresis

Two-dimensional gel electrophoresis (2-DE) helps to separate mixtures of molecules (proteins, lipoproteins, and glycoproteins) according to their isoelectric points and molecular weights prior to subjecting them to MS analysis for molecular identification. A study was carried out to identify the proteins expressed in S. Typhi biofilms in an environment mimicking the human gallbladder, which is the niche site for chronic typhoid carriage. S. Typhi resides in the gallbladder and forms a biofilm that enables it to evade the host immune system and resist bile and antibiotics. It is postulated that biofilm proteins could serve as biomarkers for the diagnosis of asymptomatic typhoid carriers, which remain elusive because stool culture methods are insensitive. 2-DE gel analysis of the biofilm proteins showed 15 unique protein spots when compared with non-biofilm proteins (planktonic) of the bacteria using PDQuest software (Figure 2), and downstream analyses were conducted using MALDI-TOF/TOF MS to identify the proteins. One of the spots, 4704, was found to be due to TolC (Figure 3), an outer membrane efflux protein that determines the virulence and pathogenesis of S. Typhi. Furthermore, the quantitative capability of MS showed that the amount of TolC protein produced was dependent on the amount of bile challenge (Figure 4). Structural and functional studies of a 50-kDa outer-membrane protein reported to be specific to S. Typhi, which forms the antigen used in the commercial Typhidot™ test for the diagnosis of typhoid fever, showed that this antigen is indeed a variant of the TolC protein [50, 51].

Figure 2. 2-D polyacrylamide gel electrophoresis (PAGE) protein profiles of Salmonella Typhi biofilm cells (left) and planktonic cells (right). The red arrows indicate the location of unique protein spots found by comparing biofilm with planktonic cells. Spot 4704 (circled in purple) is identified as S. Typhi virulence protein TolC.

Figure 3. MASCOT data for Spot 4704 indicating that the unknown biomarker associated with biofilm adaptation in Salmonella Typhi was indeed the outer membrane protein TolC.

Figure 4. Effect of bile concentration on TolC protein expression in Salmonella Typhi biofilm. Red arrows indicate the position of the TolC protein and the size of the peaks represents the protein concentration in 3-D view.

2-DE can easily and efficiently interface with other biochemical techniques such as immunoblotting to deduce the identity of antigens [52]. A study was carried out to identify the immunogenic proteins of Salmonella Typhi in typhoid fever sera. The major problem with using serum to detect the antigens produced by the microbe inside the host is that they are small and exist only in small amounts, and can therefore be masked by more abundant proteins in the serum of the host. The in vivo proteins of the pathogen might be relevant to the disease state of the host [53]. Affinity chromatography fractionation was applied to remove the abundant proteins from serum (albumin and IgG) (Figure 5). Analyses on the fractionated sera were carried out using 2-DE western blotting, and the results showed that the sera of typhoid fever subjects reacted with proteins from the typhoid fever fractionated sera. Spot M13 was found to contain the immunogenic proteins of S. Typhi (Figure 6). Further downstream analysis by LC-MS/MS on spot M13 revealed the presence of two S. Typhi proteins, namely hemolysin E (HlyE) and tryptophan tRNA ligase (Figure 7). Other studies using microarray [54] and immunoaffinity proteomics-based technology [55] have also established the importance of HlyE in typhoid fever, and it was therefore selected and cloned for further immunogenicity studies. The diagnostic sensitivity and specificity of the recombinant HlyE protein were tested using indirect ELISA; the assay was found to be 70% sensitive and 100% specific [56], which supports its use for the diagnosis of acute typhoid fever.
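Diagnostic sensitivity and specificity are calculated from a two-by-two comparison of the test result against the reference diagnosis: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The Python sketch below uses hypothetical counts chosen only so that the arithmetic reproduces 70% and 100%; they are not the sample sizes of the cited ELISA study, and the function names are illustrative.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of truly diseased subjects that the test detects."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Fraction of truly non-diseased subjects that the test clears."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical counts (assumption, not data from reference [56]):
# 50 confirmed typhoid sera, 35 of them ELISA-positive;
# 50 control sera, none of them ELISA-positive.
tp, fn = 35, 15
tn, fp = 50, 0

print(f"sensitivity = {sensitivity(tp, fn):.0%}")
print(f"specificity = {specificity(tn, fp):.0%}")
```

A test with this profile misses some true cases (false negatives) but produces no false positives, so a positive result is highly informative while a negative result does not rule out infection.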

Figure 5. Serum protein profiles of typhoid fever subjects before (a) and after (b) fractionation using an affinity chromatography column that contained Cibacron Blue dye and Protein A to remove serum albumin and IgG, respectively. Fractionation helps to remove these highly abundant proteins, and concentrates low abundance and low-molecular weight (below 20 kDa) proteins (red box).

Figure 6. 2-DE western blotting analyses of fractionated pooled typhoid fever sera (a) and pooled normal control sera (b) blotted with pooled typhoid fever sera and secondary horseradish peroxidase (HRP)-conjugated rabbit anti-human IgM. Gel image analysis was conducted using GelScape software. Spots that were reactive with typhoid fever sera are marked in red, whereas spots that were not reactive with typhoid fever sera are marked in yellow. Spot M13 (circled in black), which showed reactivity with the antibody of typhoid fever sera, was excised from the 2-DE gel, and the protein was subjected to further analysis using LC-MS/MS.

Figure 7. MASCOT data analysis of Spot M13 revealed the presence of two Salmonella Typhi proteins (hemolysin E and tryptophan tRNA ligase).

A variant of the 2-DE method is 2D-DIGE, where multiple samples can be labeled with size-matched and charge-matched, spectrally resolvable fluorescent cyanine dyes (for example Cy2, Cy3, and Cy5) before being separated electrophoretically in the same gel. Each sample can then be observed separately by using the excitation wavelength of its dye. This method helps to increase statistical confidence in differential biomolecule expression studies because differences in spot fluorescence intensity are mainly due to biological and not technical variation. It also helps to minimize inter-gel variability, eliminate the time needed to stain the gels, and reduce the cost, because proteins from different samples can be compared in the same gel at the same time.

Many comparison studies applying 2D-DIGE with MS have successfully identified the unique proteins that are involved in Leishmania virulence and metabolism during infection [57], or under stressful environmental conditions [58], and have discovered antigens that have inter-species differential diagnostic value [59] and are expressed at different stages of the parasitic life cycle [60]. 2D-DIGE offers the advantage of sub-nanogram sensitivity, which allows it to detect small differences in protein concentrations between control and test samples. Rukmangadachar et al. (2011) reported that the sensitive fluorescent dye can differentially identify 26 spots associated with Leishmania infection; these spots cannot be visualized by Coomassie staining owing to the low sensitivity (10 ng) of the staining method and the scarcity of the proteins [61].

Nuclear magnetic-resonance spectrometry (NMR)

Researchers can perform MS to elucidate the size of the targets but not their structure. Nuclear magnetic resonance spectrometry (NMR) has become the preferred method for determining the structure of organic compounds and characterizing their elemental composition, the order of their atoms, and their stereo-chemical orientation. There are two types of NMR spectrometer: continuous wave (CW) and Fourier transform (FT). FT-NMR is widely applied because it is the more efficient of the two types. It is faster because all frequencies can be excited at once without the need to sweep, and multiple scanning is therefore possible, which improves the signal-to-noise ratio. However, FT is more expensive because it uses a liquid helium cooling system, whereas CW uses water-cooled electromagnets.
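The benefit of multiple scanning follows from signal averaging: the coherent signal adds in proportion to the number of scans n, whereas uncorrelated random noise adds in proportion to the square root of n, so the signal-to-noise ratio improves by a factor of √n. The Python sketch below illustrates this scaling under that assumption of uncorrelated noise; the function names snr_gain and scans_needed are hypothetical.

```python
import math

def snr_gain(n_scans):
    """Signal-to-noise improvement from averaging n_scans, relative to one scan."""
    if n_scans < 1:
        raise ValueError("n_scans must be at least 1")
    return math.sqrt(n_scans)

def scans_needed(target_gain):
    """Smallest number of scans that reaches the requested improvement."""
    if target_gain < 1:
        raise ValueError("target_gain must be at least 1")
    return math.ceil(target_gain ** 2)

for n in (1, 4, 16, 64):
    print(n, "scans ->", snr_gain(n), "x signal-to-noise")
print("scans needed for a 10x gain:", scans_needed(10))
```

Because the gain grows only with the square root of the number of scans, each further doubling of the signal-to-noise ratio costs four times as much acquisition time, which is where the speed of FT acquisition matters most.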

NMR is widely applied for metabolomics, which is the study of low-molecular weight biochemical compounds that reflect the physiological state of an organism, i.e., they are used to compare diseased and healthy states. Metabolic profiling has been performed to elucidate host–parasite interactions and to discover infection-related metabolite patterns. NMR spectroscopy is the optimum method for the detection of small amounts of the O-chain structures that characterize malarial parasites. Four species of human malaria parasites have been reported, but Plasmodium falciparum is responsible for most malaria-attributed morbidity and mortality. Parasite-specific waste molecules are secreted in high concentrations in the urine, saliva, or sweat of malaria-infected patients. In an infected mouse model, three urinary metabolites from P. berghei were identified as candidate biomarkers for malaria diagnosis using NMR for chemical structure identification coupled with LC for separation and MS for determination of the parent ion or fragmental ion molecular weights [62]. Another malarial species, P. vivax, which is generally reported to cause a benign disease in the Indian subcontinent, has recently caused severe pathological complications such as impaired liver and kidney function. Differential expression levels of some urinary metabolites in P. vivax-infected individuals were identified using NMR spectroscopy when compared with non-malarial fever patients and healthy controls [63]. Since the urinary metabolites identified from different species showed variations, these differential metabolites could be potential biomarkers for species-specific diagnosis, removing the challenges presented by an invasive biopsy approach to clinical diagnosis.

Bioinformatics: available databases for target analyses 

HTTs with bioinformatics and high-throughput data analysis have significantly improved biomarker discovery. First, powerful multi-tasking computers running bioinformatics software allow the storage, retrieval, organization, and analysis of large amounts of raw data from image or signal processing. Second, on-line databases make libraries of life-science information, which are collected from scientific experiments and published articles, readily available for in silico experimentation and computational analyses. These two major advances in bioinformatics have helped process raw data into useful biological information to assist biomarker discovery. Based on predicted nucleic acid sequences, database searches and suitable algorithms could help identify target genes or proteins, and elucidate their functions, structures, and locations within the cell. Specialized software such as Molecular Evolutionary Genetics Analysis (MEGA) turns voluminous data into visual diagrams that help us understand host–pathogen interactions, metabolic pathways, and evolutionary biology. HTTs have been successfully applied for biomarker discovery (Tables 2-7; Figure 8).

Table 2. List of genomic databases [64-66]

Table 3. List of RNA databases [67-73]

Table 4. List of proteomics databases [74-78]

Table 5. List of lipidomics databases [79-81]

Table 6. List of glycomics databases [82, 83]

Table 7. List of metabolomics databases [84-86]

Figure 8. Summary of high-throughput technology (HTT) platforms and their applications in biomarker discovery.


As summarized in Figure 8, HTTs have been successfully applied for biomarker discovery. With further validation studies on biomarkers and advances in miniaturization technology, HTTs will one day be routinely applied for the diagnosis of infectious diseases, and will be available and accessible to people in low-resource settings. This is important for the future development of the “point-of-care” diagnostics agenda, and for meeting the World Health Organization’s “ASSURED” criteria: Affordable, Sensitive, Specific, User-friendly, Rapid and Robust, Equipment-free, and Deliverable to end-users, for the entire world [87].


  1. de Magalhaes, J.P., C.E. Finch, and G. Janssens, Next-generation sequencing in aging research: emerging applications, problems, pitfalls and possible solutions. Ageing Res Rev, 2010. 9(3): p. 315-23.
  2. Dunne, W.M., Jr., L.F. Westblade, and B. Ford, Next-generation and whole-genome sequencing in the diagnostic clinical microbiology laboratory. Eur J Clin Microbiol Infect Dis, 2012. 31(8): p. 1719-26.
  3. Morozova, O. and M.A. Marra, Applications of next-generation sequencing technologies in functional genomics. Genomics, 2008. 92(5): p. 255-64.
  4. Holt, K.E., et al., High-throughput bacterial SNP typing identifies distinct clusters of Salmonella Typhi causing typhoid in Nepalese children. BMC Infect Dis, 2010. 10: p. 144.
  5. Wong, V.K., et al., Phylogeographical analysis of the dominant multidrug-resistant H58 clade of Salmonella Typhi identifies inter- and intracontinental transmission events. Nat Genet, 2015. 47(6): p. 632-9.
  6. Ja’afar, J.a.N., et al., Single Nucleotide Polymorphism Genotyping of Salmonella Enterica Serovar Typhi Isolates in Kelantan Malaysia Using Pyrosequencing Assigned Haplotypes. Malaysian Journal of Public Health Medicine, 2013. 13(2): p. 3.
  7. Ja'afar, J.a.N., et al., Epidemiological analysis of typhoid fever in Kelantan from a retrieved registry. Malaysian Journal of Microbiology, 2013. 9(2): p. 147-151.
  8. Perkins, T.T., et al., A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi. PLoS Genet, 2009. 5(7): p. e1000569.
  9. Etheridge, A., et al., Extracellular microRNA: a new source of biomarkers. Mutat Res, 2011. 717(1-2): p. 85-90.
  10. Ordas, A., et al., MicroRNA-146 function in the innate immune transcriptome response of zebrafish embryos to Salmonella typhimurium infection. BMC Genomics, 2013. 14: p. 696.
  11. Gu, H., et al., A Salmonella-encoded microRNA-like RNA facilitates bacterial invasion and intracellular replication via suppressing host cell inducible nitric oxide synthase. The Journal of Immunology, 2014. 192(132.9).
  12. Hebrard, M., et al., sRNAs and the virulence of Salmonella enterica serovar Typhimurium. RNA Biol, 2012. 9(4): p. 437-45.
  13. Padalon-Brauch, G., et al., Small RNAs encoded within genetic islands of Salmonella typhimurium show host-induced expression and role in virulence. Nucleic Acids Res, 2008. 36(6): p. 1913-27.
  14. Perkins, T.T., et al., ChIP-seq and transcriptome analysis of the OmpR regulon of Salmonella enterica serovars Typhi and Typhimurium reveals accessory genes implicated in host colonization. Mol Microbiol, 2013. 87(3): p. 526-38.
  15. Lopez-Garrido, J. and J. Casadesus, Regulation of Salmonella enterica pathogenicity island 1 by DNA adenine methylation. Genetics, 2010. 184(3): p. 637-49.
  16. Redshaw, N., et al., Quantification of epigenetic biomarkers: an evaluation of established and emerging methods for DNA methylation analysis. BMC Genomics, 2014. 15(1): p. 1174.
  17. Hinchliffe, S.J., et al., Application of DNA microarrays to study the evolutionary genomics of Yersinia pestis and Yersinia pseudotuberculosis. Genome Res, 2003. 13(9): p. 2018-29.
  18. Wang, X., et al., Yersinia genome diversity disclosed by Yersinia pestis genome-wide DNA microarray. Can J Microbiol, 2007. 53(11): p. 1211-21.
  19. Han, Y., et al., Comparative transcriptomics in Yersinia pestis: a global view of environmental modulation of gene expression. BMC Microbiol, 2007. 7: p. 96.
  20. Li, B., et al., Protein microarray for profiling antibody responses to Yersinia pestis live vaccine. Infect Immun, 2005. 73(6): p. 3734-9.
  21. Anish, C., et al., Plague detection by anti-carbohydrate antibodies. Angew Chem Int Ed Engl, 2013. 52(36): p. 9524-8.
  22. Broecker, F., et al., Epitope recognition of antibodies against a Yersinia pestis lipopolysaccharide trisaccharide component. ACS Chem Biol, 2014. 9(4): p. 867-73.
  23. Amedei, A., et al., Role of immune response in Yersinia pestis infection. J Infect Dev Ctries, 2011. 5(9): p. 628-39.
  24. Saliba, A.E., et al., A quantitative liposome microarray to systematically characterize protein-lipid interactions. Nat Methods, 2014. 11(1): p. 47-50.
  25. Nomura, F., Proteome-based bacterial identification using matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS): A revolutionary shift in clinical diagnostic microbiology. Biochim Biophys Acta, 2014.
  26. Everley, R.A., et al., Liquid chromatography/mass spectrometry characterization of Escherichia coli and Shigella species. J Am Soc Mass Spectrom, 2008. 19(11): p. 1621-8.
  27. Yang, Y., et al., A comparison of nLC-ESI-MS/MS and nLC-MALDI-MS/MS for GeLC-based protein identification and iTRAQ-based shotgun quantitative proteomics. J Biomol Tech, 2007. 18(4): p. 226-37.
  28. Lau, S.K., et al., Identification of specific metabolites in culture supernatant of Mycobacterium tuberculosis using metabolomics: exploration of potential biomarkers. Emerging Microbes & Infections, 2015. 4: p. 1-10.
  29. Fujiwara, N., [Distribution of antigenic glycolipids among Mycobacterium tuberculosis strains and their contribution to virulence]. Kekkaku, 1997. 72(4): p. 193-205.
  30. Dang, N.A., H.G. Janssen, and A.H. Kolk, Rapid diagnosis of TB using GC-MS and chemometrics. Bioanalysis, 2013. 5(24): p. 3079-97.
  31. Dang, N.A., et al., Validation of biomarkers for distinguishing Mycobacterium tuberculosis from non-tuberculous mycobacteria using gas chromatography-mass spectrometry and chemometrics. PLoS One, 2013. 8(10): p. e76263.
  32. Kellogg, J.A., et al., Application of the Sherlock Mycobacteria Identification System using high-performance liquid chromatography in a clinical laboratory. J Clin Microbiol, 2001. 39(3): p. 964-70.
  33. Venisse, A., et al., Structural features of lipoarabinomannan from Mycobacterium bovis BCG. Determination of molecular mass by laser desorption mass spectrometry. J Biol Chem, 1993. 268(17): p. 12401-11.
  34. Devi, K.R., et al., Purification and characterization of three immunodominant proteins (38, 30, and 16 kDa) of Mycobacterium tuberculosis. Protein Expr Purif, 2002. 24(2): p. 188-95.
  35. Chang, Z., et al., The immunodominant 38-kDa lipoprotein antigen of Mycobacterium tuberculosis is a phosphate-binding protein. J Biol Chem, 1994. 269(3): p. 1956-8.
  36. Somi, G.R., et al., Evaluation of the MycoDot test in patients with suspected tuberculosis in a field setting in Tanzania. Int J Tuberc Lung Dis, 1999. 3(3): p. 231-8.
  37. Pottumarthy, S., V.C. Wells, and A.J. Morris, A comparison of seven tests for serological diagnosis of tuberculosis. J Clin Microbiol, 2000. 38(6): p. 2227-31.
  38. Sartain, M.J., et al., Lipidomic analyses of Mycobacterium tuberculosis based on accurate mass measurements and the novel "Mtb LipidDB". J Lipid Res, 2011. 52(5): p. 861-72.
  39. Layre, E., et al., A comparative lipidomics platform for chemotaxonomic analysis of Mycobacterium tuberculosis. Chem Biol, 2011. 18(12): p. 1537-49.
  40. Sonawane, A., et al., Role of glycans and glycoproteins in disease development by Mycobacterium tuberculosis. Crit Rev Microbiol, 2012. 38(3): p. 250-66.
  41. Mechref, Y., et al., Quantitative glycomics strategies. Mol Cell Proteomics, 2013. 12(4): p. 874-84.
  42. Cajka, T. and O. Fiehn, Comprehensive analysis of lipids in biological systems by liquid chromatography-mass spectrometry. Trends Analyt Chem, 2014. 61: p. 192-206.
  43. Cubbon, S., et al., Metabolomic applications of HILIC-LC-MS. Mass Spectrom Rev, 2010. 29(5): p. 671-84.
  44. Monsarrat, B., et al., Characterization of mannooligosaccharide caps in mycobacterial lipoarabinomannan by capillary electrophoresis/electrospray mass spectrometry. Glycobiology, 1999. 9(4): p. 335-42.
  45. Ludwiczak, P., et al., Structural characterization of Mycobacterium tuberculosis lipoarabinomannans by the combination of capillary electrophoresis and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Anal Chem, 2001. 73(10): p. 2323-30.
  46. Mertins, P., et al., iTRAQ labeling is superior to mTRAQ for quantitative global proteomics and phosphoproteomics. Mol Cell Proteomics, 2012. 11(6): p. M111 014423.
  47. Xu, D.D., et al., Discovery and identification of serum potential biomarkers for pulmonary tuberculosis using iTRAQ-coupled two-dimensional LC-MS/MS. Proteomics, 2014. 14(2-3): p. 322-31.
  48. Seth, M., et al., Biomarker discovery in subclinical mycobacterial infections of cattle. PLoS One, 2009. 4(5): p. e5478.
  49. Simner, P.J., et al., Identification of Mycobacterium species and Mycobacterium tuberculosis complex resistance determinants by use of PCR-electrospray ionization mass spectrometry. J Clin Microbiol, 2013. 51(11): p. 3492-8.
  50. Ismail, A., Z.S. Kader, and O. Kok-Hai, Dot enzyme immunosorbent assay for the serodiagnosis of typhoid fever. Southeast Asian J Trop Med Public Health, 1991. 22(4): p. 563-6.
  51. Choong, Y.S., et al., Structural and functional studies of a 50 kDa antigenic protein from Salmonella enterica serovar Typhi. J Mol Graph Model, 2011. 29(6): p. 834-42.
  52. Rabilloud, T., et al., Two-dimensional gel electrophoresis in proteomics: Past, present and future. J Proteomics, 2010. 73(11): p. 2064-77.
  53. Tirumalai, R.S., et al., Characterization of the low molecular weight human serum proteome. Mol Cell Proteomics, 2003. 2(10): p. 1096-103.
  54. Liang, L., et al., Immune profiling with a Salmonella Typhi antigen microarray identifies new diagnostic biomarkers of human typhoid. Sci Rep, 2013. 3: p. 1043.
  55. Charles, R.C., et al., Characterization of anti-Salmonella enterica serotype Typhi antibody responses in bacteremic Bangladeshi patients by an immunoaffinity proteomics-based technology. Clin Vaccine Immunol, 2010. 17(8): p. 1188-95.
  56. Ong, E.B., et al., Multi-isotype antibody responses against the multimeric Salmonella Typhi recombinant hemolysin E antigen. Microbiol Immunol, 2015. 59(1): p. 43-7.
  57. Rukmangadachar, L.A., et al., Two-dimensional difference gel electrophoresis (DIGE) analysis of sera from visceral leishmaniasis patients. Clin Proteomics, 2011. 8(1): p. 4.
  58. Chromy, B.A., et al., Proteomic characterization of Yersinia pestis virulence. J Bacteriol, 2005. 187(23): p. 8172-80.
  59. Bien, J., et al., Comparative analysis of excretory-secretory antigens of Trichinella spiralis and Trichinella britovi muscle larvae by two-dimensional difference gel electrophoresis and immunoblotting. Proteome Sci, 2012. 10(1): p. 10.
  60. Costa, M.M., et al., Analysis of Leishmania chagasi by 2-D difference gel electrophoresis (2-D DIGE) and immunoproteomic: identification of novel candidate antigens for diagnostic tests and vaccine. J Proteome Res, 2011. 10(5): p. 2172-84.
  61. Weiss, W., F. Weiland, and A. Gorg, Protein detection and quantitation technologies for gel-based proteome analysis. Methods Mol Biol, 2009. 564: p. 59-82.
  62. Tritten, L., et al., Metabolic profiling framework for discovery of candidate diagnostic markers of malaria. Sci Rep, 2013. 3: p. 2769.
  63. Sengupta, A., et al., Global host metabolic response to Plasmodium vivax infection: a 1H NMR based urinary metabonomic study. Malar J, 2011. 10: p. 384.
  64. Field, D., E.J. Feil, and G.A. Wilson, Databases and software for the comparison of prokaryotic genomes. Microbiology, 2005. 151(Pt 7): p. 2125-32.
  65. Catanho, M. and A.B. Miranda, Comparing genomes: databases and computational tools for comparative analysis of prokaryotic genomes. Electronic Journal of Communication Information & Innovation in Health, 2007. 1(2): p. 334-355.
  66. Catanho, M. and A.B. Miranda, Bioinformatics and TB vaccine development: A comparative genomic approach. The Art & Science of Tuberculosis Vaccine Development, 2014. 2nd edition (Chapter 3.2): p. 435-450.
  67. Andronescu, M., et al., RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinformatics, 2008. 9: p. 340.
  68. Gan, H.H., et al., RAG: RNA-As-Graphs database--concepts, analysis, and features. Bioinformatics, 2004. 20(8): p. 1285-91.
  69. Pundhir, S. and J. Gorodkin, MicroRNA discovery by similarity search to a database of RNA-seq profiles. Front Genet, 2013. 4: p. 133.
  70. Bao, H., et al., MicroRNA buffering and altered variance of gene expression in response to Salmonella infection. PLoS One, 2014. 9(4): p. e94352.
  71. Krek, A., et al., Combinatorial microRNA target predictions. Nat Genet, 2005. 37(5): p. 495-500.
  72. Lewis, B.P., et al., Prediction of mammalian microRNA targets. Cell, 2003. 115(7): p. 787-98.
  73. Maragkakis, M., et al., Accurate microRNA target prediction correlates with protein repression levels. BMC Bioinformatics, 2009. 10: p. 295.
  74. Apweiler, R., A. Bairoch, and C.H. Wu, Protein sequence databases. Curr Opin Chem Biol, 2004. 8(1): p. 76-80.
  75. Berggard, T., S. Linse, and P. James, Methods for the detection and analysis of protein-protein interactions. Proteomics, 2007. 7(16): p. 2833-42.
  76. Schmidt, A., I. Forne, and A. Imhof, Bioinformatic analysis of proteomics data. BMC Syst Biol, 2014. 8 Suppl 2: p. S3.
  77. Yang, J.M. and C.H. Tung, Protein structure database search and evolutionary classification. Nucleic Acids Res, 2006. 34(13): p. 3646-59.
  78. Castrignano, T., et al., The PMDB Protein Model Database. Nucleic Acids Res, 2006. 34(Database issue): p. D306-9.
  79. Yetukuri, L., et al., Bioinformatics strategies for lipidomics analysis: characterization of obesity related hepatic steatosis. BMC Syst Biol, 2007. 1: p. 12.
  80. Oresic, M., Bioinformatics and computational approaches applicable to lipidomics. European Journal of Lipid Science and Technology, 2009. 111: p. 99-106.
  81. Wheelock, C.E., et al., Bioinformatics strategies for the analysis of lipids. Methods Mol Biol, 2009. 580: p. 339-68.
  82. Baycin Hizal, D., et al., Glycoproteomic and glycomic databases. Clin Proteomics, 2014. 11(1): p. 15.
  83. Aoki-Kinoshita, K.F., Using databases and web resources for glycomics research. Mol Cell Proteomics, 2013. 12(4): p. 1036-45.
  84. Shulaev, V., Metabolomics technology and bioinformatics. Brief Bioinform, 2006. 7(2): p. 128-39.
  85. Fitzpatrick, M.A., C.M. McGrath, and S.P. Young, Pathomx: an interactive workflow-based tool for the analysis of metabolomic data. BMC Bioinformatics, 2014. 15(1): p. 396.
  86. Johnson, C.H., et al., Bioinformatics: the next frontier of metabolomics. Anal Chem, 2015. 87(1): p. 147-56.
  87. Wu, G. and M.H. Zaman, Low-cost tools for diagnosing and monitoring HIV infection in low-resource settings. Bull World Health Organ, 2012. 90(12): p. 914-20.