International Journal of Computational Bioinformatics and In Silico Modeling
ABSTRACT: Recent advances in microarray technology have made it possible to generate large numbers of gene expression data sets very rapidly. A major challenge is to analyze and explore these data sets to find genes with similar profiles and hence predict their functions and pathways. The technique most commonly used for this purpose is clustering, which seeks both an appropriate number of clusters and the subsets belonging to those clusters. Many clustering techniques have been applied to time-series and sample gene expression data sets, but no single one is reported to be best under general conditions. In this research we cluster gene expression data sets using rule-based classifiers (CART, C5, CHAID, QUEST), training them on data sets prepared with an efficient heuristic clustering method (we used k-means). We compare these models on testing and validation data sets; the models can then be generalized for clustering gene expression data sets by selecting the model that matches the user's preferences. We assume that all the data are generated from the same or a similar source. The main benefits of these models are simplicity and efficiency in terms of speed and storage. We have thus used supervised and unsupervised techniques to generate and compare models for efficient and accurate clustering of gene expression data sets.
KeyWords: Rule Based Classifier; Semisupervised Clustering; Gene Expression Data Set.
How to cite: S. Tripathi and R.B. Mishra. Int J Comput Bioinfo In Silico Model. 2(6) 2013: 257-261
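The two-stage idea in the abstract above (label the data by k-means, then train a rule-based classifier on those labels) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy expression profiles are invented, the function names are hypothetical, k-means uses a simplified deterministic initialization, and a single-threshold rule stands in for the CART, C5, CHAID and QUEST models, which are not implemented.

```python
def kmeans(points, k, iters=20):
    """Plain k-means; returns one cluster label per point."""
    # Simplification: deterministic start from evenly spaced points.
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


def fit_rule(points, labels):
    """Learn one rule 'feature f <= t -> left, else right' with the
    fewest errors on the cluster labels (stand-in for a tree learner)."""
    classes = sorted(set(labels))
    best = None
    for f in range(len(points[0])):
        values = sorted(set(p[f] for p in points))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for left in classes:
                for right in classes:
                    errors = sum((left if p[f] <= t else right) != l
                                 for p, l in zip(points, labels))
                    if best is None or errors < best[0]:
                        best = (errors, f, t, left, right)
    return best[1:]


def predict(rule, point):
    """Apply a learned rule to a new profile."""
    f, t, left, right = rule
    return left if point[f] <= t else right


# Toy expression profiles (invented): three low and three high genes.
profiles = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3],
            [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
cluster_labels = kmeans(profiles, k=2)
rule = fit_rule(profiles, cluster_labels)
```

Once the rule is learned, new profiles are assigned to a cluster by `predict(rule, profile)` without re-running the clustering, which is where the speed and storage benefit mentioned in the abstract would come from.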
ABSTRACT: Mitogen-activated protein kinase (MAPK) cascades are conserved signaling modules found in all eukaryotic cells, including those of plants, fungi and animals. A MAPK cascade minimally consists of three kinases: MAPKKK, MAPKK and MAPK. The WRKY family is among the ten largest families of transcription factors in higher plants and is found throughout the green lineage. Literature studies have shown that WRKY proteins are involved in many biological processes, such as the response to wounding, senescence, development, dormancy, drought tolerance, solar ultraviolet-B radiation, metabolism, hormone signaling pathways and cold. Numerous WRKY proteins are also involved in the response to biotic stress through protein-protein interactions with MAPKs. We modeled all the MAPK and WRKY protein sequences derived from The Arabidopsis Information Resource (TAIR) by homology modeling, and performed protein-protein docking to predict the potential interacting partners of MAPKs among the WRKY proteins. We found that WRKY2, 6, 7, 8, 9, 13, 15, 16, 17, 18, 19, 21, 23, 25, 32, 43, 45, 47, 50, 51, 52, 53, 55, 58, 63, 64, 66, 71 and 75 are common interacting partners for MAPK groups A, B and C, whereas WRKY3 and WRKY40 are common interacting partners for MAPK group D in Arabidopsis thaliana. These WRKY proteins play a vital role in disease resistance, senescence and various abiotic and biotic stresses. The present study provides a protein-protein interaction network with global binding energies that can be used for designing signal transduction pathways and can help to find targets against various abiotic and biotic stresses. These results require wet-lab validation using phosphoproteomics, the yeast two-hybrid system and affinity immuno-chip-based systems.
KeyWords: MAPK, Transcription factor, WRKY, Homology Modeling, Protein-protein docking
How to cite: Rajesh Kumar Pathak et al. Int J Comput Bioinfo In Silico Model. 2(6) 2013: 262-268
ABSTRACT: Thermostable direct hemolysin (TDH), a protein secreted by the diarrhea-causing food pathogen Vibrio parahaemolyticus, is considered a major virulence factor of this pathogen. Structural analysis of TDH in this study shows that the protein closely resembles fungal fruit-body lectins, which have a high specificity towards cell membrane glycoproteins. A few fruit-body lectins are known to bind the Thomsen-Friedenreich (TF) antigen, a disaccharide overexpressed on cell-surface glycoproteins in a variety of carcinomas. In this study, we show that TDH can recognize specific carbohydrates such as N-acetylgalactosamine (NGA, GalNAc), N-acetylglucosamine (NAG, GlcNAc) and galactose 1-3 N-acetylgalactosamine (Galβ1→3GalNAc, the TF antigen), as seen in fungal lectins. The features of molecular recognition are explained in detail.
KeyWords: Vibrio parahaemolyticus, Thermostable direct hemolysin, Lectin, NAG, NGA, Thomsen-Friedenreich antigen
How to cite: Malathi Shekar and Indrani Karunasagar. Int J Comput Bioinfo In Silico Model. 2(6) 2013: 269-274
ABSTRACT: Cyclin-dependent kinases (CDKs) are a small family of serine/threonine protein kinases that control cell cycle progression. CDKs require cyclin subunits for activity; hence inhibition of cyclinT-CDK9, which leads to downregulation of the transcription of anti-apoptotic proteins, may lead to superior anti-cancer efficacy. Several CDK inhibitors have been reported in the literature. Computer-aided drug design has gained much prominence as a fast and efficient means of studying protein-ligand interactions. In this paper, we report a cross-docking strategy as a method for selecting the protein target for virtual screening studies.
KeyWords: PDB, CDK-9, cross-docking, virtual screening
How to cite: Ravi Kumar Kurapati et al. Int J Comput Bioinfo In Silico Model. 2(6) 2013: 275-277
ABSTRACT: Pairwise sequence alignment methods are used to find the best-matching local or global alignments of two query sequences. Protein sequence alignment is one of the critical tasks of computational biology and forms the basis of many other tasks, such as protein structure prediction, protein function prediction and phylogenetic analysis. In this paper we study pairwise local alignment and consider: (1) which algorithms should be considered for performing local alignment; (2) algorithms for checking whether an alignment is of high quality; (3) the statistical methods used to evaluate the significance of an alignment score, i.e. z-score statistics; and (4) the scoring system used to rank alignments, i.e. the algorithm used to find optimal (or good) scoring alignments and scoring measures such as the BLOSUM and PAM matrices.
KeyWords: Pair wise local alignment, Protein Sequence, Score significant, Sequence Alignment, Sequence Similarity, Statistical measures, optimal alignment
How to cite: G. Pratyusha et al. Int J Comput Bioinfo In Silico Model. 2(6) 2013: 278-284
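As an illustration of the pairwise local alignment task described in the abstract above, the following is a minimal sketch of the Smith-Waterman dynamic-programming algorithm. It is not taken from the paper: the function name and the flat match/mismatch/gap scores are assumptions chosen for brevity, and a real protein alignment would use a substitution matrix such as BLOSUM or PAM in place of the fixed match and mismatch values.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one best local alignment.

    Scores are illustrative: a fixed match/mismatch value and a linear
    gap penalty stand in for a substitution matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at i, j
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores are floored at 0 so alignments can restart.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("XXHEAGAWXX", "HEAGAW")` recovers the shared segment `HEAGAW` with a score of 12 under these illustrative scores. The significance of such a score could then be assessed by comparing it with scores obtained from shuffled sequences, which is the idea behind the z-score statistics mentioned in the abstract.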
ABSTRACT: Biologists often use RNA-Sequencing (RNA-Seq) to identify a limited number of genes for subsequent validation, and one important factor in candidate gene selection is the fold-change in expression between two groups. However, RNA-Seq produces a wide range of read counts per gene, and genes with low read coverage can produce artificially high fold-change values. In this paper, we present a solution to this problem: adding a factor between 0.01 and 1 to normalized expression values. This conclusion is based upon analysis of a large cohort of paired tumor and normal samples from patients with lung adenocarcinomas as well as a small, two-group cell line dataset. The optimal factor to add to normalized expression values is chosen by testing a range of factors on 1) the number of genes or transcripts whose expression is effectively censored (using three different alignment algorithms) and 2) the potential level of bias introduced by the factor (defined by comparing unadjusted gene lists). The robustness of these trends is also tested by comparing multiple mRNA quantification and differential expression algorithms. The relationship between RPKM cutoff and concordance between gene lists produced using different statistical methods can be complicated, but this study emphasizes that simple statistical analysis (amenable to the use of rounded RPKM values) provides results of at least equal quality to those of popular algorithms for RNA-Seq differential expression.
KeyWords: DEG = Differentially Expressed Gene; RNA-Seq = RNA-Sequencing; RPKM = Reads Per Kilobase per Million.
How to cite: Charles D. Warden et al. Int J Comput Bioinfo In Silico Model. 2(6) 2013: 285-292
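The effect of the added factor described in the abstract above can be seen in a small worked sketch. The function name, the RPKM values and the choice of 0.1 (one value inside the 0.01 to 1 range named in the abstract) are illustrative assumptions, not figures from the paper.

```python
import math


def adjusted_fold_change(group_a_rpkm, group_b_rpkm, factor=0.1):
    """Fold-change of group A over group B after adding `factor`
    to both normalized expression values."""
    if factor <= 0:
        raise ValueError("factor must be positive to avoid division by zero")
    return (group_a_rpkm + factor) / (group_b_rpkm + factor)


# Hypothetical low-coverage gene: raw ratio 0.02 / 0.001 is 20-fold.
low_raw = 0.02 / 0.001
low_adj = adjusted_fold_change(0.02, 0.001)

# Hypothetical well-covered gene: raw ratio 200 / 10 is also 20-fold.
high_raw = 200 / 10
high_adj = adjusted_fold_change(200, 10)

# log2 scale, as fold-changes are commonly reported.
low_adj_log2 = math.log2(low_adj)
high_adj_log2 = math.log2(high_adj)
```

Both genes show a raw 20-fold change, but after adding the factor the low-coverage gene drops to roughly 1.2-fold while the well-covered gene stays close to 20-fold, so only the second would pass a typical fold-change cutoff.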
ABSTRACT: The architecture of the HIV-1 genome has been studied in great detail over the years because of its involvement in AIDS, a deadly disease found around the world. The various genes, their products and their roles in the host cell have also been elucidated. Research aimed at therapeutic interventions against the disease has mainly focused on non-nucleoside reverse transcriptase inhibitors and nucleoside reverse transcriptase inhibitors. Modern therapeutic strategies use combination therapies for the treatment of HIV-affected patients. However, the worldwide prevalence of the disease points to the low efficacy of such treatments. RNA structural regulatory elements are important genomic landmarks that can influence host cell proteins by direct or indirect binding, as well as the overall folding of the genome and its expression. Glioblastoma multiforme has been found to be prevalent in many HIV-1-affected patients. This analysis identifies musashi binding elements, which play pivotal roles in neural signal transduction by serving as binding sites for musashi binding proteins that regulate notch signalling pathways.
KeyWords: HIV-1, Genome analysis, Musashi Binding element (MBE), notch signalling.
How to cite: Rahul Banik et al. Int J Comput Bioinfo In Silico Model. 2(6) 2013: 293-296
ABSTRACT: Neuro-psychiatric disease prediction is a very important and challenging task in bioinformatics. In this paper we use proteins represented as sets of amino acid sequences, extracted from NCBI and classified, to build the models. We use a support vector machine (SVM), which is efficient for both linear and non-linear classification problems, for disease prediction. The protein data set available at NCBI is used with features such as primary structure as training inputs and ADHD (Attention Deficit Hyperactivity Disorder), dementia, mood disorder, OCD (Obsessive Compulsive Disorder) and schizophrenia as output classes. We use the expert model of the support vector machine with an RBF kernel function whose width is 0.10 and with parameter C set to 10. The results obtained with these parameters show that the overall average accuracy of the SVM is better than that of ANN and C5 in neuro-psychiatric disease prediction from protein sequences.
KeyWords: Neuro-psychiatric Disease; Amino Acid; SVM; ANN; C5.
How to cite: Brijendra Gupta and RB Mishra. Int J Comput Bioinfo In Silico Model. 2(6) 2013: 297-301
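To make the RBF kernel mentioned in the abstract above concrete, here is a minimal sketch of how a kernel value between two protein sequences could be computed. Two assumptions are made that the abstract does not confirm: sequences are encoded by amino acid composition (the abstract says only that primary-structure features were used), and the reported "width" of 0.10 is treated as the gamma coefficient in exp(-gamma * ||x - y||^2) rather than as a sigma value. The function names and example sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Encode a protein sequence as 20 amino acid frequencies
    (an assumed feature encoding, for illustration only)."""
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # guard against empty or non-standard input
    return [c / total for c in counts]


def rbf_kernel(x, y, gamma=0.10):
    """RBF kernel exp(-gamma * squared Euclidean distance).

    Treating the abstract's 'width' of 0.10 as gamma is an assumption.
    """
    squared_distance = sum((p - q) ** 2 for p, q in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical sequences, not taken from the paper's data set.
features_a = composition("MKTAYIAKQR")
features_b = composition("MKTAYIAKQK")
similarity = rbf_kernel(features_a, features_b)
```

The kernel equals 1 for identical feature vectors and decreases towards 0 as they diverge. The parameter C reported in the abstract does not enter the kernel itself; it controls the penalty on misclassified training points when the SVM is fitted.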