Basics in Bioinformatics

Course No. BTP02

Duration: 6 months with a minimum of 90 working days
Eligibility: Graduate/ Post graduate students in Engineering/ Life Sciences (Biological/ Chemical/ Medical/ Agriculture) are eligible to apply.
Mode of training: The training will consist of lectures delivered by competent resource persons, practical demonstrations and project work.
Examinations: There will be written examinations at the end of the course. The project dissertation will be evaluated by internal and external experts.

Maximum: 24 credits

Theory course

Credits: 8

Maximum Marks: 200

Minimum contact hours/ week: 8


Part A. Basics of Cell and Molecular Biology

DNA and RNA:

  • Types of base pairing: Watson-Crick and Hoogsteen; types of double helices (A, B, Z); triple- and quadruple-stranded DNA structures; geometrical and structural features; structural and geometrical parameters of each form and their comparison; various types of interactions of DNA with proteins and small molecules
  • RNA secondary and tertiary structures, tRNA tertiary structure


Proteins:

  • Principles of protein structure: peptide bond; phi, psi and chi torsion angles; Ramachandran map; anatomy of proteins; hierarchical organization of protein structure (primary, secondary, super-secondary, tertiary and quaternary structure); hydrophobicity of amino acids; packing of protein structure; structures of oligomeric proteins and study of interaction interfaces
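
The phi, psi and chi angles listed above are all dihedral (torsion) angles defined by four consecutive atoms. A minimal sketch of the calculation follows, assuming Cartesian coordinates given as (x, y, z) tuples; the function name `dihedral` and its helpers are illustrative and not taken from any particular package. For phi the four atoms would be C(i-1), N(i), CA(i), C(i); for psi they would be N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(p, q):
    """Vector from q to p."""
    return tuple(a - b for a, b in zip(p, q))


def _cross(u, v):
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def _dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def dihedral(p1, p2, p3, p4):
    """Torsion angle in degrees, in the range (-180, 180], about the p2-p3 bond."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal to the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal to the plane through p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates chosen so the two planes are perpendicular.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

Plotting the (phi, psi) pair for every residue of a structure gives the Ramachandran map.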


Carbohydrates:

  • The various building blocks (monosaccharides), configurations and conformations of the building blocks; formation of polysaccharides and structural diversity due to the different types of linkages
  • Glyco-conjugates: various types of glycolipids and glycoproteins

Part B. Bioinformatics

Major Bioinformatics Resources: NCBI, EBI, ExPASy, RCSB:

  • The knowledge of various databases and bioinformatics tools available at these resources, organization of databases: data contents and formats, purpose and utility in Life Sciences

Open access bibliographic resources and literature databases:

  • Open access bibliographic resources related to Life Sciences, viz. PubMed, BioMed Central, Public Library of Science (PLoS)

Sequence databases:

  • Formats, querying and retrieval
  • Nucleic acid sequence databases: GenBank, EMBL, DDBJ
  • Protein sequence databases: UniProtKB (Swiss-Prot, TrEMBL), PIR-PSD
  • Repositories for high throughput genomic sequences: EST, STS, GSS, etc.
  • Genome Databases at NCBI, EBI, TIGR, SANGER
  • Viral Genomes
  • Archaeal and Bacterial Genomes
  • Eukaryotic genomes with special reference to model organisms (Yeast, Drosophila, C. elegans, Rat, Mouse, Human, plants such as Arabidopsis thaliana, Rice, etc.)

3D Structure Database: PDB, NDB

  • Chemical Structure database: PubChem
  • Gene Expression database: GEO, SAGE

Derived Databases:

  • Knowledge of the following databases with respect to: basic concept of derived databases, sources of primary data and basic principles of the method for deriving the secondary data, organization of data, contents and formats of database entries, identification of patterns in given sequences and interpretation of the same:
    - Sequence: InterPro, Prosite, Pfam, ProDom, Gene Ontology
    - Structure classification database: CATH, SCOP, FSSP
    - Protein-Protein interaction database: STRING

Compilation of resources:

  • NAR Database and Web server Issues and other resources published in Bioinformatics related journals

Sequence Analysis: 

File formats:

  • Various file formats for bio-molecular sequences: GenBank, FASTA, GCG, MSF, etc.
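
As an illustration of the simplest of these formats, FASTA stores each record as a header line beginning with `>` followed by one or more lines of sequence. The sketch below is a minimal reader written for teaching purposes; the function name `parse_fasta` and the example records are hypothetical, and established libraries would normally be used for real data.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {identifier: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The identifier is taken as the first token after '>'.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else f"unnamed_{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines are concatenated until the next header.
            records[current_id] += line
    return records


# Hypothetical two-record example (one nucleotide, one protein).
example = """>seq1 illustrative nucleotide record
ATGCGTAC
GTTAGC
>seq2
MKTAYIAK
"""
records = parse_fasta(example.splitlines())
print(records)
```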

Basic concepts:

  • Sequence similarity, identity and homology, definitions of homologues, orthologues, paralogues

Scoring matrices:

  • Basic concept of a scoring matrix, matrices for nucleic acid and protein sequences, PAM and BLOSUM series, principles on which these matrices are derived

Pairwise sequence alignments:

  • Basic concepts of sequence alignment: local and global alignments, Needleman-Wunsch and Smith-Waterman algorithms for pairwise alignments, gap penalties, use of pairwise alignments for analysis of nucleic acid and protein sequences and interpretation of results.
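
The global alignment idea can be sketched with a small dynamic-programming implementation in the style of Needleman and Wunsch. This is a teaching sketch under simplifying assumptions: a fixed match/mismatch score instead of a PAM or BLOSUM matrix, a linear gap penalty, and arbitrarily chosen default values. The function name `needleman_wunsch` is illustrative.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```

A local (Smith-Waterman style) alignment differs mainly in that cell scores are never allowed to fall below zero and the traceback starts from the highest-scoring cell rather than the bottom-right corner.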

Multiple sequence alignments (MSA):

  • The need for MSA, basic concepts of various approaches for MSA (e.g. progressive, hierarchical etc.). Algorithms of CLUSTALW and PileUp and their application for sequence analysis (including interpretation of results), concept of dendrogram and its interpretation
  • MAST

Database Searches:

  • Keyword-based searches using tools like ENTREZ and SRS
  • Sequence-based searches: BLAST and FASTA

Sequence patterns and profiles:

  • Basic concept and definition of sequence patterns, motifs and profiles, various types of pattern representations viz. consensus, regular expression (Prosite-type) and sequence profiles; profile-based database searches using PSI-BLAST, analysis and interpretation of profile-based searches
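
To make the link between Prosite-type patterns and regular expressions concrete, the sketch below translates the common elements of the Prosite pattern syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repetition, `<` and `>` for sequence termini) into a Python regular expression. The function name `prosite_to_regex` is illustrative, rarer syntax such as `>` inside square brackets is not handled, and the test sequence is hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a Prosite-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # Prosite patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):    # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # {ABC} excludes residues; this must be converted before the
        # repetition brackets are turned into braces.
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) gives the repetition count.
        element = element.replace("(", "{").replace(")", "}")
        # Lowercase x stands for any residue.
        element = element.replace("x", ".")
        parts.append(prefix + element + suffix)
    return "".join(parts)


# The N-glycosylation site pattern, applied to a hypothetical sequence.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
hit = re.search(regex, "MKNGSAA")
print(regex)
print(hit.start(), hit.group())
```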

Taxonomy and phylogeny: 

  • Basic concepts in systematics, taxonomy and phylogeny; molecular evolution; nature of data used in Taxonomy and Phylogeny, Definition and description of phylogenetic trees and various types of trees

Protein and nucleic acid properties:

  • Computation of various parameters using proteomics tools at the ExPASy server and EMBOSS

Comparative genomics:

  • Basic concepts and applications, whole genome alignments: understanding significance. Artemis as an example

Structural Biology:

3-D structure visualization and simulation:

  • Visualization of structures using RasMol, SPDBV, Chime or VMD
  • Basic concepts in molecular modeling: different types of computer representations of molecules. External coordinates and Internal Coordinates
  • Non-Covalent Interactions and their role in Biomolecular structure and function
  • Fundamentals of Receptor-ligand interactions.

Classification and comparison of protein 3D structures:

  • Purpose of 3-D structure comparison and concepts, Algorithms: CE, VAST and DALI, concept of coordinate transformation, RMSD, Z-score for structural comparison
  • Databases of structure-based classification: CATH, SCOP and FSSP
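
The RMSD measure mentioned above can be illustrated with a short calculation. This sketch assumes the two structures are already superposed and that atoms are paired one-to-one in the same order; it does not perform the coordinate transformation (optimal superposition) that tools such as CE, VAST or DALI carry out. The function name `rmsd` and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates: the second set is the first shifted by 1 along z.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(structure_a, structure_b))
```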

Secondary structure prediction:

  • Algorithms, viz. Chou-Fasman and GOR methods; nearest neighbor and machine learning based methods; analysis of results and measuring the accuracy of predictions.

Tertiary Structure prediction:

  • Fundamentals of the methods for 3D structure prediction (sequence similarity/identity of target proteins to proteins of known structure, fundamental principles of protein folding, etc.). Homology/comparative modeling, fold recognition, threading approaches, and ab initio structure prediction methods.

Project work

Credits: 16

Maximum Marks: 400

Minimum contact hours/ week: 16

Project work: The project work shall be based on the course content of the syllabus and the topic of the project work shall be decided based on the need of the student and the available expertise.

Suggested Readings:

  • Lehninger (2013), Principles of Biochemistry
  • Alberts et al. (2002), Molecular Biology of the Cell, 4th edition
  • B. Alberts, D. Bray, K. Hopkin and A. Johnson (2013), Essential Cell Biology
  • Watson, J.D., Hopkins, N.H., Roberts, J.W. and Steitz, J.A. (1970), Molecular Biology of the Gene
  • David W. Mount (2004), Bioinformatics: Sequence and Genome Analysis, 2nd edition, Cold Spring Harbor Press
  • Durbin et al. (2007), Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press
  • Thomas E. Creighton (1993), Proteins: Structures and Molecular Properties