Advanced course in Bioinformatics

Course No. BTP03

Duration: 6 months, with a minimum of 90 working days.
Eligibility: Graduate/postgraduate students who have completed a basic course in Bioinformatics are eligible to apply for the advanced course. Students who have completed or are pursuing a diploma or degree course in Bioinformatics are also eligible to apply.
Mode of training: The training will consist of lectures delivered by competent resource persons, practical demonstrations, and project work.
Examinations: There will be written examinations at the end of the course. The project dissertation will be evaluated by internal and external experts.

Maximum: 24 credits

Theory course

Credits: 8

Maximum Marks: 200

Minimum contact hours/ week: 8


Sequence analysis: Scoring matrices:

  • Detailed method of derivation of the PAM and BLOSUM matrices

Pairwise sequence alignments:

  • Needleman and Wunsch, Smith and Waterman algorithms and their implementation
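
As an illustration of what an implementation of global alignment might look like, the following is a minimal sketch of the Needleman and Wunsch dynamic programming recurrence with traceback. The scoring values (match, mismatch, and a linear gap penalty) and the function name are assumptions chosen for the example; a real implementation would typically use a PAM or BLOSUM matrix and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not values prescribed by the syllabus.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix: each cell is the best of diagonal, up, and left moves
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Traceback from the bottom-right corner to recover one optimal alignment
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "ACT"))
```

The Smith and Waterman local alignment differs mainly in clamping each cell at zero and starting the traceback from the highest-scoring cell rather than the corner.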

Multiple sequence alignments (MSA):

  • Use of HMM-based algorithms for MSA (e.g. SAM method)

Sequence patterns and profiles:

  • Repeats: Tandem and Interspersed repeats, repeat finding, Motifs, consensus, position weight matrices
  • Algorithms for derivation of and searching sequence patterns: MEME, PHI-BLAST, ScanProsite and PRATT
  • Algorithms for generation of sequence profiles: Profile Analysis method of Gribskov, HMMER, PSI-BLAST
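
The position weight matrix topic listed above can be made concrete with a short sketch. It builds a log-odds matrix from a set of aligned motif instances and scans a sequence for the best-scoring window. The function names, the pseudocount of 1, the uniform background, and the example sites are all assumptions made for illustration; tools such as MEME or HMMER use more elaborate models.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0):
    """Build a log2-odds position weight matrix from equal-length aligned sites.

    Assumes a uniform background distribution and a simple additive
    pseudocount (both are illustrative choices).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    background = 1.0 / len(alphabet)
    pwm = []
    for pos in range(length):
        counts = {c: pseudocount for c in alphabet}
        for s in sites:
            counts[s[pos]] += 1
        total = sum(counts.values())
        # log-odds of observed frequency versus background frequency
        pwm.append({c: math.log2((counts[c] / total) / background) for c in alphabet})
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(col[ch] for col, ch in zip(pwm, window))


def best_hit(pwm, seq):
    """Return (position, score) of the highest-scoring window in seq."""
    w = len(pwm)
    positions = range(len(seq) - w + 1)
    best = max(positions, key=lambda i: score_window(pwm, seq[i:i + w]))
    return best, score_window(pwm, seq[best:best + w])


if __name__ == "__main__":
    pwm = build_pwm(["TATA", "TATA", "TATT", "TACA"])  # hypothetical sites
    print(best_hit(pwm, "GGCTATAGG"))
```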

Protein and nucleic acid properties:

  • e.g. Proteomics tools at the ExPASy server and EMBOSS

Taxonomy and phylogeny:

  • Phylogenetic analysis algorithms such as maximum parsimony, UPGMA, Transformed Distance, Neighbors-Relation, Neighbor-Joining
  • Probabilistic models of evolution and associated algorithms such as maximum likelihood
  • Bootstrapping methods
  • Use of tools such as PHYLIP, MEGA, PAUP
  • Analysis of regulatory RNAs: Databases and tools
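
Of the distance-based phylogenetic methods listed above, UPGMA is the simplest to sketch in code. The example below repeatedly merges the two closest clusters and recomputes distances as size-weighted averages. The function name, the nested-tuple tree representation, and the toy distance matrix are assumptions for illustration; packages such as PHYLIP or MEGA should be used for real analyses.

```python
def upgma(names, dist):
    """UPGMA clustering on a symmetric distance matrix (list of lists).

    Returns (tree, heights): the tree as nested tuples of names, and the
    node height (half the merge distance) for each merge in order.
    """
    # cluster id -> (subtree, number of leaves)
    clusters = {i: (names[i], 1) for i in range(len(names))}
    # pairwise distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    heights = []
    while len(clusters) > 1:
        # pick the closest pair of clusters
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        # distance to the merged cluster is the size-weighted average
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        del d[(i, j)]
        clusters[next_id] = ((ti, tj), ni + nj)
        heights.append(dij / 2)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree, heights


if __name__ == "__main__":
    # Toy ultrametric distances among four hypothetical taxa
    names = ["A", "B", "C", "D"]
    dist = [
        [0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0],
    ]
    print(upgma(names, dist))
```

UPGMA assumes a constant rate of evolution (a molecular clock); Neighbor-Joining relaxes that assumption by correcting each pairwise distance for the average divergence of the two clusters from all others.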

Structural Biology:

  • Experimental methods for Biomolecular structure determination: X-ray and NMR
  • Identification/assignment of secondary structural elements from the knowledge of the 3-D structure of a macromolecule using DSSP and STRIDE methods
  • Prediction of secondary structure: PHD and PSIPRED methods

Tertiary Structure prediction:

  • Fundamentals of the methods for 3D structure prediction (sequence similarity/identity of target proteins to proteins of known structure, fundamental principles of protein folding, etc.)
  • Homology modeling, fold recognition, threading approaches, and ab initio structure prediction methods

Structure analysis and validation:

  • PDBsum, WHAT_CHECK, PROCHECK, Verify3D and ProSA-II
  • Critical Assessment of Structure Prediction (CASP)
  • Structures of oligomeric proteins and study of interaction interfaces

Molecular modeling and simulations:

  • Macromolecular force fields, solvation, long-range forces
  • Geometry optimization algorithms: Steepest descent, conjugate gradient
  • Various simulation techniques: Molecular mechanics, conformational searches, Molecular Dynamics, Monte Carlo, genetic algorithm approaches, Rigid and Semi-Flexible Molecular Docking
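
To illustrate the geometry optimization topic above, here is a minimal sketch of steepest descent applied to a single harmonic bond-stretch term of the kind found in molecular mechanics force fields. The force constant, equilibrium length, fixed step size, and function names are assumptions chosen for the example; production codes use full force fields and line searches or conjugate gradient updates.

```python
import math


def bond_energy_and_gradient(x1, x2, k=100.0, r0=1.5):
    """Harmonic bond energy E = 0.5*k*(r - r0)**2 and its gradients.

    x1 and x2 are 3D coordinates; k and r0 are illustrative values.
    Assumes the two atoms do not coincide (r > 0).
    """
    diff = [a - b for a, b in zip(x1, x2)]
    r = math.sqrt(sum(c * c for c in diff))
    energy = 0.5 * k * (r - r0) ** 2
    # dE/dx1 = k*(r - r0) * (x1 - x2)/r, and dE/dx2 is its negative
    g1 = [k * (r - r0) * c / r for c in diff]
    g2 = [-c for c in g1]
    return energy, g1, g2


def steepest_descent(x1, x2, step=0.001, tol=1e-6, max_iter=10000):
    """Move both atoms against the gradient until the gradient norm < tol."""
    x1, x2 = list(x1), list(x2)
    for _ in range(max_iter):
        _, g1, g2 = bond_energy_and_gradient(x1, x2)
        gnorm = math.sqrt(sum(c * c for c in g1 + g2))
        if gnorm < tol:
            break
        # fixed-step update along the negative gradient
        x1 = [x - step * g for x, g in zip(x1, g1)]
        x2 = [x - step * g for x, g in zip(x2, g2)]
    energy, _, _ = bond_energy_and_gradient(x1, x2)
    return x1, x2, energy


if __name__ == "__main__":
    a, b, e = steepest_descent([0.0, 0.0, 0.0], [2.0, 0.0, 0.0])
    print(a, b, e)
```

With a fixed step the method converges only if the step is small relative to the stiffness of the energy surface, which is one reason conjugate gradient is usually preferred near the minimum.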


Genomics:

  • Large-scale genome sequencing strategies
  • Genome assembly and annotation
  • Genome databases of Plants, animals and pathogens
  • Metagenomics
  • Gene networks: basic concepts, computational models such as lambda repressor and lac operon
  • Prediction of genes, promoters, splice sites, regulatory regions: basic principles, application of methods to prokaryotic and eukaryotic genomes and interpretation of results
  • Basic concepts of disease gene identification; role of bioinformatics: OMIM database, reference genome sequence, integrated genomic maps, gene expression profiling; identification of SNPs, SNP database (dbSNP); role of SNPs in pharmacogenomics; SNP arrays
  • DNA microarray: database and basic tools, Gene Expression Omnibus (GEO), ArrayExpress, SAGE databases
  • DNA microarray: understanding of microarray data, normalizing microarray data, detecting differential gene expression, correlation of gene expression data to biological process and computational analysis tools (especially clustering approaches)

Comparative genomics:

  • Basic concepts and applications, BLAST2, MegaBLAST algorithms, PipMaker, AVID, VISTA, MUMmer, applications of suffix trees in comparative genomics, synteny and gene order comparisons
  • Comparative genomics databases: Clusters of Orthologous Groups (COGs)

Functional genomics:

  • Application of sequence-based and structure-based approaches to assignment of gene functions, e.g. sequence comparison, structure analysis (especially active sites, binding sites) and comparison, pattern identification, etc. Use of various derived databases in function assignment, use of SNPs for identification of genetic traits
  • Gene/Protein function prediction using Machine learning tools: supervised/unsupervised learning, Neural network, SVM etc.


Proteomics and systems biology:

  • Protein arrays: basic principles
  • Computational methods for identification of polypeptides from mass spectrometry
  • Protein arrays: bioinformatics-based tools for analysis of proteomics data (Tools available at ExPASy Proteomics server); databases (such as InterPro) and analysis tools
  • Protein-protein interactions: databases such as STRING, DIP, PPI server and tools for analysis of protein-protein interactions
  • Modeling biological systems
  • Systems biology: use of computers in simulation of cellular subsystems
  • Metabolic networks (networks of metabolites and enzymes), signal transduction networks, gene regulatory networks, metabolic pathways: databases such as KEGG, EMP, MetaCyc, AraCyc

Drug design:

  • Drug discovery process
  • Role of Bioinformatics in drug design
  • Target identification and validation and lead optimization
  • Different systems for representing the chemical structure of small molecules, such as SMILES
  • Generation of 3D coordinates of small molecules
  • Structure-based drug design: Identification and Analysis of Binding sites and virtual screening
  • Ligand-based drug design: Structure Activity Relationships: QSARs and QSPRs, QSAR methodology, pharmacophore mapping
  • In silico prediction of ADMET properties for drug molecules

Vaccine design:

  • Reverse vaccinology and immunoinformatics
  • Databases in Immunology
  • Principles of B-cell and T-cell epitope prediction

Project work

Credits: 16

Maximum Marks: 400

Minimum contact hours/ week: 16

Project work: The project work shall be based on the course content of the syllabus and the topic of the project work shall be decided based on the need of the student and the available expertise.

Suggested Readings:

  • David W. Mount (2004), Bioinformatics: Sequence and Genome Analysis
  • Durbin et al. (2007), Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
  • Thomas E. Creighton (1993), Proteins: Structures and Molecular Properties
  • Johann Gasteiger and Thomas Engel (2003), Chemoinformatics
  • Philip E. Bourne and Helge Weissig (2003), Structural Bioinformatics