Care MA; Barrans S; Worrillow L; Jack A; Westhead DR; Tooze RM A microarray platform-independent classification tool for cell of origin class allows comparative analysis of gene expression in diffuse large B-cell lymphoma. PLoS One 8 e55895-, 2013
DOI:10.1371/journal.pone.0055895
Cell of origin classification of diffuse large B-cell lymphoma (DLBCL) identifies subsets with biological and clinical significance. Despite the established nature of the classification, existing studies display variability in classifier implementation, and a comparative analysis across multiple data sets is lacking. Here we describe the validation of a cell of origin classifier for DLBCL, based on balanced voting between 4 machine-learning tools: the DLBCL automatic classifier (DAC). This shows superior survival separation for assigned Activated B-cell (ABC) and Germinal Center B-cell (GCB) DLBCL classes relative to a range of other classifiers. DAC is effective on data derived from multiple microarray platforms and formalin-fixed paraffin embedded samples and is parsimonious, using 20 classifier genes. We use DAC to perform a comparative analysis of gene expression in 10 data sets (2030 cases). We generate ranked meta-profiles of genes showing consistent class-association using ≥6 data sets as a cut-off: ABC (414 genes) and GCB (415 genes). The transcription factor ZBTB32 emerges as the most consistent and differentially expressed gene in ABC-DLBCL while other transcription factors such as ARID3A, BATF, and TCF4 are also amongst the 24 genes associated with this class in all datasets. Analysis of enrichment of 12323 gene signatures against meta-profiles and all data sets individually confirms consistent associations with signatures of molecular pathways, chromosomal cytobands, and transcription factor binding sites. We provide DAC as an open access Windows application, and the accompanying meta-analyses as a resource.
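The voting idea in this abstract can be illustrated with a minimal sketch. It is not the DAC implementation: the abstract does not say how votes are weighted or how ties are resolved, so the `min_agree` threshold and the "Unclassified" fallback label below are assumptions made purely for illustration.

```python
from collections import Counter

def balanced_vote(calls, min_agree=3):
    """Combine the calls of several classifiers into one class label.

    `calls` holds one label per classifier, e.g. "ABC" or "GCB".
    `min_agree` and the "Unclassified" fallback are illustrative
    assumptions, not details taken from the paper.
    """
    # Most frequent label and the number of classifiers that chose it.
    label, count = Counter(calls).most_common(1)[0]
    return label if count >= min_agree else "Unclassified"

# Three of four hypothetical classifiers agree, so the sample is assigned.
print(balanced_vote(["ABC", "ABC", "GCB", "ABC"]))
# A two-against-two split falls below the threshold.
print(balanced_vote(["ABC", "ABC", "GCB", "GCB"]))
```

Requiring a clear majority rather than a simple plurality is one way to keep ambiguous cases out of both classes.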
Ptasinska A; Assi SA; Mannari D; James SR; Williamson D; Dunne J; Hoogenkamp M; Wu M; Care M; McNeill H; Cauchy P; Cullen M; Tooze RM; Tenen DG; Young BD; Cockerill PN; Westhead DR; Heidenreich O; Bonifer C Depletion of RUNX1/ETO in t(8;21) AML cells leads to genome-wide changes in chromatin structure and transcription factor binding. Leukemia 26 1829-1841, 2012
DOI:10.1038/leu.2012.49
The t(8;21) translocation fuses the DNA-binding domain of the hematopoietic master regulator RUNX1 to the ETO protein. The resultant RUNX1/ETO fusion protein is a leukemia-initiating transcription factor that interferes with RUNX1 function. The result of this interference is a block in differentiation and, finally, the development of acute myeloid leukemia (AML). To obtain insights into RUNX1/ETO-dependent alterations of the epigenetic landscape, we measured genome-wide RUNX1- and RUNX1/ETO-bound regions in t(8;21) cells and assessed to what extent the effects of RUNX1/ETO on the epigenome depend on its continued expression in established leukemic cells. To this end, we determined dynamic alterations of histone acetylation, RNA Polymerase II binding and RUNX1 occupancy in the presence or absence of RUNX1/ETO using a knockdown approach. Combined global assessments of chromatin accessibility and kinetic gene expression data show that RUNX1/ETO controls the expression of important regulators of hematopoietic differentiation and self-renewal. We show that selective removal of RUNX1/ETO leads to a widespread reversal of epigenetic reprogramming and a genome-wide redistribution of RUNX1 binding, resulting in the inhibition of leukemic proliferation and self-renewal, and the induction of differentiation. This demonstrates that RUNX1/ETO represents a pivotal therapeutic target in AML.
Barrans SL; Crouch S; Care MA; Worrillow L; Smith A; Patmore R; Westhead DR; Tooze R; Roman E; Jack AS Whole genome expression profiling based on paraffin embedded tissue can be used to classify diffuse large B-cell lymphoma and predict clinical outcome. Br J Haematol 159 441-453, 2012
DOI:10.1111/bjh.12045
This study tested the validity of whole-genome expression profiling (GEP) using RNA from formalin-fixed, paraffin-embedded (FFPE) tissue to sub-classify Diffuse Large B-cell Lymphoma (DLBCL), in a population based cohort of 172 patients. GEP was performed using Illumina Whole Genome cDNA-mediated Annealing, Selection, extension & Ligation, and tumours were classified into germinal centre (GCB), activated B-cell (ABC) and Type-III subtypes. The method was highly reproducible and reliably classified cell lines of known phenotype. GCB and ABC subtypes were each characterized by unique gene expression signatures consistent with previously published data. A significant relationship between subtype and survival was observed, with ABC having the worst clinical outcome and in a multivariate survival model only age and GEP class remained significant. This effect was not seen when tumours were classified by immunohistochemistry. There was a significant association between age and subtype (mean ages ABC - 72·8 years, GC - 68·4 years, Type-III - 64·5 years). Older patients with ABC subtype were also over-represented in patients who died soon after diagnosis. The relationship between prognosis and subtype improved when only patients assigned to the three categories with the highest level of confidence were analysed. This study demonstrates that GEP-based classification of DLBCL can be applied to RNA extracted from routine FFPE samples and has potential for use in stratified medicine trials and clinical practice.
Lichtinger M; Ingram R; Hannah R; Müller D; Clarke D; Assi SA; Lie-A-Ling M; Noailles L; Vijayabaskar MS; Wu M; Tenen DG; Westhead DR; Kouskoff V; Lacaud G; Göttgens B; Bonifer C RUNX1 reshapes the epigenetic landscape at the onset of haematopoiesis. EMBO J 31 4318-4333, 2012
DOI:10.1038/emboj.2012.275
Cell fate decisions during haematopoiesis are governed by lineage-specific transcription factors, such as RUNX1, SCL/TAL1, FLI1 and C/EBP family members. To gain insight into how these transcription factors regulate the activation of haematopoietic genes during embryonic development, we measured the genome-wide dynamics of transcription factor assembly on their target genes during the RUNX1-dependent transition from haemogenic endothelium (HE) to haematopoietic progenitors. Using a Runx1-/- embryonic stem cell differentiation model expressing an inducible Runx1 gene, we show that in the absence of RUNX1, haematopoietic genes bind SCL/TAL1, FLI1 and C/EBPβ and that this early priming is required for correct temporal expression of the myeloid master regulator PU.1 and its downstream targets. After induction, RUNX1 binds to numerous de novo sites, initiating a local increase in histone acetylation and rapid global alterations in the binding patterns of SCL/TAL1 and FLI1. The acquisition of haematopoietic fate controlled by Runx1 therefore does not represent the establishment of a new regulatory layer on top of a pre-existing HE program but instead entails global reorganization of lineage-specific transcription factor assemblies.
Cocco M; Stephenson S; Care MA; Newton D; Barnes NA; Davison A; Rawstron A; Westhead DR; Doody GM; Tooze RM In Vitro Generation of Long-lived Human Plasma Cells JOURNAL OF IMMUNOLOGY 189 5773-5785, 2012
DOI:10.4049/jimmunol.1103720
Leddin M; Perrod C; Hoogenkamp M; Ghani S; Assi S; Heinz S; Wilson NK; Follows G; Schonheit J; Vockentanz L; Mosammam AM; Chen W; Tenen DG; Westhead DR; Gottgens B; Bonifer C; Rosenbauer F Two distinct auto-regulatory loops operate at the PU.1 locus in B cells and myeloid cells BLOOD 117 2827-2838, 2011
DOI:10.1182/blood-2010-08-302976
Stead LF; Wood IC; Westhead DR KvSNP: accurately predicting the effect of genetic variants in voltage-gated potassium channels BIOINFORMATICS 27 2181-2186, 2011
DOI:10.1093/bioinformatics/btr365
Bradford JR; Needham CJ; Tedder P; Care MA; Bulpitt AJ; Westhead DR GO-At: in silico prediction of gene function in Arabidopsis thaliana by combining heterogeneous data PLANT J 61 713-721, 2010
DOI:10.1111/j.1365-313X.2009.04097.x
Doody GM; Care MA; Burgoyne NJ; Bradford JR; Bota M; Bonifer C; Westhead DR; Tooze RM An extended set of PRDM1/BLIMP1 target genes links binding motif type to dynamic repression NUCLEIC ACIDS RES 38 5336-5350, 2010
DOI:10.1093/nar/gkq268
Stead LF; Wood IC; Westhead DR KvDB; Mining and Mapping Sequence Variants in Voltage-Gated Potassium Channels HUM MUTAT 31 908-917, 2010
DOI:10.1002/humu.21295
Tedder PMR; Bradford JR; McConkey GA; Bulpitt AJ; Westhead DR PlasmoPredict: a gene function prediction website for Plasmodium falciparum TRENDS PARASITOL 26 107-110, 2010
DOI:10.1016/j.pt.2009.12.004
Forth T; McConkey GA; Westhead DR MetNetMaker: a free and open-source tool for the creation of novel metabolic networks in SBML format BIOINFORMATICS 26 2352-2353, 2010
DOI:10.1093/bioinformatics/btq425
Tedder PMR; Bradford JR; Needham CJ; McConkey GA; Bulpitt AJ; Westhead DR Gene function prediction using semantic similarity clustering and enrichment analysis in the malaria parasite Plasmodium falciparum BIOINFORMATICS 26 2431-2437, 2010
DOI:10.1093/bioinformatics/btq450
Yu JT; Guo MZ; Needham CJ; Huang YC; Cai L; Westhead DR Simple sequence-based kernels do not predict protein-protein interactions BIOINFORMATICS 26 2610-2614, 2010
DOI:10.1093/bioinformatics/btq483
Whitaker JW; Westhead DR Transferomics: Seeing the Evolutionary Forest Using Phylogenetic Trees, 2010
DOI:10.1007/978-3-642-12340-5_6
Whitaker JW; McConkey GA; Westhead DR The transferome of metabolic genes explored: analysis of the horizontal transfer of enzyme encoding genes in unicellular eukaryotes GENOME BIOL 10 -, 2009
DOI:10.1186/gb-2009-10-4-r36
Whitaker JW; Westhead DR; McConkey GA Alio intuitu: the automated reconstruction of the metabolic networks of parasites TRENDS PARASITOL 25 396-397, 2009
DOI:10.1016/j.pt.2009.06.001
Tedder PMR; Bradford JR; Needham CJ; McConkey GA; Bulpitt AJ; Westhead DR Bayesian data integration and enrichment analysis for predicting gene function in malaria., 2009
Whitaker JW; McConkey GA; Westhead DR Prediction of horizontal gene transfers in eukaryotes: approaches and challenges (vol 37, pg 792, 2009) BIOCHEM SOC T 37 1145-1145, 2009
Care MA; Bradford JR; Needham CJ; Bulpitt AJ; Westhead DR Combining the Interactome and Deleterious SNP Predictions to Improve Disease Gene Identification HUM MUTAT 30 485-492, 2009
DOI:10.1002/humu.20917
Webb EC; Westhead DR The transcriptional regulation of protein complexes; a cross-species perspective GENOMICS 94 369-376, 2009
DOI:10.1016/j.ygeno.2009.08.003
Needham CJ; Manfield IW; Bulpitt AJ; Gilmartin PM; Westhead DR From gene expression to gene regulatory networks in Arabidopsis thaliana BMC SYST BIOL 3 -, 2009
DOI:10.1186/1752-0509-3-85
Tedder P; Zubko E; Westhead DR; Meyer P Small RNA analysis in Petunia hybrida identifies unusual tissue-specific expression patterns of conserved miRNAs and of a 24mer RNA RNA 15 1012-1020, 2009
DOI:10.1261/rna.1517209
Whitaker JW; Letunic I; McConkey GA; Westhead DR metaTIGER: a metabolic evolution resource NUCLEIC ACIDS RES 37 D531-D538, 2009
DOI:10.1093/nar/gkn826
Gaskell EA; Smith JE; Pinney JW; Westhead DR; McConkey GA A Unique Dual Activity Amino Acid Hydroxylase in Toxoplasma gondii PLOS ONE 4 -, 2009
DOI:10.1371/journal.pone.0004801
Whitaker JW; McConkey GA; Westhead DR Prediction of horizontal gene transfers in eukaryotes: approaches and challenges BIOCHEMICAL SOCIETY TRANSACTIONS 37 792-795, 2009
DOI:10.1042/BST0370792
Tooze RM; Care MA; Westhead DR; Doody GM An Expanded Set of Direct BLIMP-1 Targets Identifies Novel Links in Differentiation, Immune Response and Lymphoma., 2009
Wambua L; Mcconkey G; Westhead DR Discovery of Novel Drug Targets Against Pathogenic Protozoa: The Promise of Metabolic Reconstruction INFECT GENET EVOL 9 375-375, 2009
Adamo A; Pinney JW; Kunova A; Westhead DR; Meyer P Heat Stress Enhances the Accumulation of Polyadenylated Mitochondrial Transcripts in Arabidopsis thaliana PLOS ONE 3 -, 2008
DOI:10.1371/journal.pone.0002889
Westhead DR; Manfield IW; Needham CJ Bioinformatic approaches to biological systems., 2008
Liu B; Westhead DR; Boyett MR; Warwicker J Modelling the pH-dependent properties of Kv1 potassium channels. J Mol Biol 368 328-335, 2007
DOI:10.1016/j.jmb.2007.02.041
It is known that the pH dependence of conductance for the rat potassium channel Kv1.4 is substantially reduced upon mutation of either H508 or K532. These residues lie in the extracellular mouth of the channel pore. We have used continuum electrostatics to investigate their interactions with K(+) sites in the pore. The predicted scale of interactions between H508/K532 and potassium sites is sufficient to significantly alter potassium occupancy and thus channel function. We interpret the effect of K532 mutation as indicating that the pH-dependent effect requires not only an ionisable group with a suitable pK(a) value (i.e. histidine), but also that other charged groups set the potential profile at a threshold level. This hypothesis is examined in the context of pH dependence for other members of the Kv1 family, and may represent a general tool with which to study potassium channels.
Care MA; Needham CJ; Bulpitt AJ; Westhead DR Deleterious SNP prediction: be mindful of your training data! BIOINFORMATICS 23 664-672, 2007
DOI:10.1093/bioinformatics/btl649
Needham CJ; Bradford JR; Bulpitt AJ; Westhead DR A primer on learning in Bayesian networks for Computational Biology PLoS Computational Biology 3:e129 1409-1416, 2007
Pinney JW; Papp B; Hyland C; Wambua L; Westhead DR; McConkey GA Metabolic reconstruction and analysis for parasite genomes TRENDS PARASITOL 23 548-554, 2007
DOI:10.1016/j.pt.2007.08.013
Manfield IW; Devlin PF; Jen CH; Westhead DR; Gilmartin PM Conservation, convergence, and divergence of light-responsive, circadian-regulated, and tissue-specific expression patterns during evolution of the Arabidopsis GATA gene family PLANT PHYSIOL 143 941-958, 2007
DOI:10.1104/pp.106.090761
Zhang Y; Westhead DR Functional gene networks: A preliminary study on a modified genetic algorithm for candidate discovery in large microarray datasets, 2007
Garrow AG; Westhead DR A consensus algorithm to screen genomes for novel families of transmembrane beta barrel proteins PROTEINS 69 8-18, 2007
DOI:10.1002/prot.21439
Mardia KV; Nyirongo VB; Green PJ; Gold ND; Westhead DR Bayesian refinement of protein functional site matching BMC BIOINFORMATICS 8 -, 2007
DOI:10.1186/1471-2105-8-257
Westhead DR; Bulpitt AJ; Bradford JR; Needham CJ Protein function prediction in Arabidopsis thaliana, 2006
Johnson R; Gamblin RJ; Ooi L; Bruce AW; Donaldson IJ; Westhead DR; Wood IC; Jackson RM; Buckley NJ Identification of the REST regulon reveals extensive transposable element-mediated binding site duplication NUCLEIC ACIDS RES 34 3862-3877, 2006
DOI:10.1093/nar/gkl525
Manfield IW; Jen CH; Pinney JW; Michalopoulos I; Bradford JR; Gilmartin PM; Westhead DR Arabidopsis Co-expression Tool (ACT): web server tools for microarray-based gene expression analysis NUCLEIC ACIDS RES 34 W504-W509, 2006
DOI:10.1093/nar/gkl204
Hyland C; Pinney JW; McConkey GA; Westhead DR metaSHARK: a WWW platform for interactive exploration of metabolic networks NUCLEIC ACIDS RES 34 W725-W728, 2006
DOI:10.1093/nar/gkl196
Needham CJ; Bradford JR; Bulpitt AJ; Westhead DR Inference in Bayesian networks NAT BIOTECHNOL 24 51-53, 2006
Bradford JR; Needham CJ; Bulpitt AJ; Westhead DR Insights into protein-protein interfaces using a Bayesian network prediction method J MOL BIOL 362 365-386, 2006
DOI:10.1016/j.jmb.2006.07.028
Jen CH; Manfield IW; Michalopoulos I; Pinney JW; Willats WGT; Gilmartin PM; Westhead DR The Arabidopsis co-expression tool (ACT): a WWW-based tool and database for microarray-based gene expression analysis PLANT J 46 336-348, 2006
DOI:10.1111/j.1365-313X.2006.02681.x
Needham CJ; Bradford JR; Bulpitt AJ; Care MA; Westhead DR Predicting the effect of missense mutations on protein function: analysis with Bayesian networks BMC BIOINFORMATICS 7 -, 2006
DOI:10.1186/1471-2105-7-405
McConkey GA; Pinney JW; Gaskell EA; Wambua L; Hyland C; Shirley MW; Westhead DR Predicting parasite metabolic networks: A fisherman's nightmare?, 2006
Mardikian S; Gillet VJ; Jackson RM; Westhead DR CINF 68-Studying the effects of individual interaction energies in a variety of protein-ligand complexes ABSTR PAP AM CHEM S 232 -, 2006
Mardikian S; Gillet VJ; Jackson RM; Westhead DR Using multiobjective optimization to study the strengths of different interaction energies in protein-ligand complexes ABSTR PAP AM CHEM S 232 411-411, 2006
Taib M; Pinney JW; Westhead DR; McDowall KJ; Adams DJ Differential expression and extent of fungal/plant and fungal/bacterial chitinases of Aspergillus fumigatus ARCH MICROBIOL 184 78-81, 2005
DOI:10.1007/s00203-005-0028-x
Bradford JR; Westhead DR Improved prediction of protein-protein binding sites using a support vector machines approach BIOINFORMATICS 21 1487-1494, 2005
DOI:10.1093/bioinformatics/bti242
Needham CJ; Bradford JR; Bulpitt AJ; Westhead DR Application of Bayesian networks to two classification problems in bioinformatics, 2005
Garrow AG; Agnew A; Westhead DR TMB-Hunt: a web server to screen sequence sets for transmembrane beta-barrel proteins NUCLEIC ACIDS RES 33 W188-W192, 2005
DOI:10.1093/nar/gki384
Jen CH; Michalopoulos I; Westhead DR; Meyer P Natural antisense transcripts with coding capacity in Arabidopsis may have a regulatory role that is not linked to double-stranded RNA degradation. Genome Biol 6 R51-, 2005
DOI:10.1186/gb-2005-6-6-r51
Overlapping transcripts in antisense orientation have the potential to form double-stranded RNA (dsRNA), a substrate for a number of different RNA-modification pathways. One prominent route for dsRNA is its breakdown by Dicer enzyme complexes into small RNAs, a pathway that is widely exploited by RNA interference technology to inactivate defined genes in transgenic lines. The significance of this pathway for endogenous gene regulation remains unclear.
Torrance GM; Gilbert DR; Michalopoulos I; Westhead DR Protein structure topological comparison, discovery, and matching service Bioinformatics 21 2537-2538, 2005
DOI:10.1093/bioinformatics/bti331
We describe a fold level fast protein comparison and motif matching facility based on the TOPS representation of structure. This provides an update to a previous service at the EBI, with a better graph matching with faster results and visualization of both the structures being compared against and the common pattern of each with the target domain.
Pinney JW; Shirley MW; McConkey GA; Westhead DR metaSHARK: software for automated metabolic network prediction from DNA sequence and its application to the genomes of Plasmodium falciparum and Eimeria tenella NUCLEIC ACIDS RES 33 1399-1409, 2005
DOI:10.1093/nar/gki285
Nyirongo V; Mardia KV; Westhead DR EM algorithm, Bayesian and distance approaches to matching functional sites In Quantitative Biology, Shape Analysis, and Wavelets, 2005
Sharma-Oates A; Quirke P; Westhead DR TmaDB: a repository for tissue microarray data BMC BIOINFORMATICS 6 -, 2005
DOI:10.1186/1471-2105-6-218
Sadowski MI; Parish JH; Westhead DR Automated derivation and refinement of sequence length patterns for protein sequences using evolutionary computation BIOSYSTEMS 81 247-254, 2005
DOI:10.1016/j.biosystems.2005.05.001
Garrow AG; Agnew AM; Westhead DR TMB-Hunt: An amino acid composition based method to screen proteomes for beta-barrel transmembrane proteins BMC Bioinformatics 6 pp.56-, 2005
DOI:10.1186/1471-2105-6-56
Background: Beta-barrel transmembrane (bbtm) proteins are a functionally important and diverse group of proteins expressed in the outer membranes of bacteria (both gram negative and acid fast gram positive), mitochondria and chloroplasts. Despite recent publications describing reasonable levels of accuracy for discriminating between bbtm proteins and other proteins, screening of entire genomes remains troublesome as these molecules only constitute a small fraction of the sequences screened. Therefore, novel methods are still required capable of detecting new families of bbtm protein in diverse genomes.
Results: We present TMB-Hunt, a program that uses a k-Nearest Neighbour (k-NN) algorithm to discriminate between bbtm and non-bbtm proteins on the basis of their amino acid composition. By including differentially weighted amino acids, evolutionary information and by calibrating the scoring, an accuracy of 92.5% was achieved, with 91% sensitivity and 93.8% positive predictive value (PPV), using a rigorous cross-validation procedure.
A major advantage of this approach is that because it does not rely on beta-strand detection, it does not require resolved structures and thus larger, more representative, training sets could be used. It is therefore believed that this approach will be invaluable in complementing other, physicochemical and homology based methods. This was demonstrated by the correct reassignment of a number of proteins which other predictors failed to classify. We have used the algorithm to screen several genomes and have discussed our findings.
Conclusion: TMB-Hunt achieves a prediction accuracy level better than other approaches published to date. Results were significantly enhanced by use of evolutionary information and a system for calibrating k-NN scoring. Because the program uses a distinct approach to that of other discriminators and thus suffers different liabilities, we believe it will make a significant contribution to the development of a consensus approach for bbtm protein detection.
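The core idea described in this abstract, a k-NN vote over amino acid composition vectors, can be sketched as follows. This is a bare illustration and not TMB-Hunt itself: it omits the differential amino acid weighting, the evolutionary information and the score calibration that the abstract mentions, and the training sequences and labels in the usage example are invented toy data.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the fraction of each of the 20 standard amino acids in `seq`."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # guard against empty input
    return [counts[a] / total for a in AMINO_ACIDS]

def knn_predict(query, training, k=3):
    """Label `query` by majority vote among its k nearest training sequences.

    `training` is a list of (sequence, label) pairs. Distance is plain
    Euclidean distance between composition vectors, with no weighting.
    """
    q = composition(query)
    nearest = sorted(training, key=lambda item: math.dist(q, composition(item[0])))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy, made-up sequences used only to exercise the function.
toy_training = [
    ("GYGYGYSS", "bbtm"),
    ("YGSGYTYG", "bbtm"),
    ("LLLAAALL", "other"),
    ("AALLKKLL", "other"),
]
print(knn_predict("GYSGYGTY", toy_training, k=3))
```

Because the features are composition fractions rather than structural assignments, a classifier of this kind needs no resolved structures, which is the advantage the abstract highlights.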
Sharma-Oates A; Quirke P; Westhead DR TmaDB: A tissue microarray database Journal of Pathology 204 pp.51-, 2004
Michalopoulos I; Torrance GM; Gilbert DR; Westhead DR TOPS: an enhanced database of protein structural topology NUCLEIC ACIDS RES 32 D251-D254, 2004
DOI:10.1093/nar/gkh060
McConkey GA; Pinney JW; Westhead DR; Plueckhahn K; Fitzpatrick TB; Macheroux P; Kappes B Annotating the Plasmodium genome and the enigma of the shikimate pathway TRENDS PARASITOL 20 60-65, 2004
DOI:10.1016/j.pt.2003.11.001
Gilbert DR; Westhead DR; Viksna J Techniques for comparison, pattern matching and pattern In Artificial Intelligence and Heuristic Methods in Bioinformatics, 2003
Mardia KV; Taylor CC; Westhead DR Structural bioinformatics revisited In Stochastic geometry, biological structure and images, 2003
Krishnan VG; Westhead DR A comparative study of machine-learning methods to predict the effects of single nucleotide polymorphisms on protein function. Bioinformatics 19 2199-2209, 2003
The large volume of single nucleotide polymorphism data now available motivates the development of methods for distinguishing neutral changes from those which have real biological effects. Here, two different machine-learning methods, decision trees and support vector machines (SVMs), are applied for the first time to this problem. In common with most other methods, only non-synonymous changes in protein coding regions of the genome are considered.
Siepen JA; Radford SE; Westhead DR Beta edge strands in protein structure prediction and aggregation. Protein Sci 12 2348-2359, 2003
DOI:10.1110/ps.0306003
It is well established that recognition between exposed edges of beta-sheets is an important mode of protein-protein interaction and can have pathological consequences; for instance, it has been linked to the aggregation of proteins into a fibrillar structure, which is associated with a number of predominantly neurodegenerative disorders. A number of protective mechanisms have evolved in the edge strands of beta-sheets, preventing the aggregation and insolubility of most natural beta-sheet proteins. Such mechanisms are unfavorable in the interior of a beta-sheet. The problem of distinguishing edge strands from central strands based on sequence information alone is important in predicting residues and mutations likely to be involved in aggregation, and is also a first step in predicting folding topology. Here we report support vector machine (SVM) and decision tree methods developed to classify edge strands from central strands in a representative set of protein domains. Interestingly, rules generated by the decision tree method are in close agreement with our knowledge of protein structure and are potentially useful in a number of different biological applications. When trained on strands from proteins of known structure, using structure-based (Dictionary of Secondary Structure in Proteins) strand assignments, both methods achieved mean cross-validated, prediction accuracies of approximately 78%. These accuracies were reduced when strand assignments from secondary structure prediction were used. Further investigation of this effect revealed that it could be explained by a significant reduction in the accuracy of standard secondary structure prediction methods for edge strands, in comparison with central strands.
Williams A; Gilbert DR; Westhead DR Multiple structural alignment for distantly related all beta structures using TOPS pattern discovery and simulated annealing. Protein Eng 16 913-923, 2003
DOI:10.1093/protein/gzg116
Topsalign is a method that will structurally align diverse protein structures, for example, structural alignment of protein superfolds. All proteins within a superfold share the same fold but often have very low sequence identity and different biological and biochemical functions. There is often significant structural diversity around the common scaffold of secondary structure elements of the fold. Topsalign uses topological descriptions of proteins. A pattern discovery algorithm identifies equivalent secondary structure elements between a set of proteins and these are used to produce an initial multiple structure alignment. Simulated annealing is used to optimize the alignment. The output of Topsalign is a multiple structure-based sequence alignment and a 3D superposition of the structures. This method has been tested on three superfolds: the beta jelly roll, TIM (alpha/beta) barrel and the OB fold. Topsalign outperforms established methods on very diverse structures. Despite the pattern discovery working only on beta strand secondary structure elements, Topsalign is shown to align TIM (alpha/beta) barrel superfamilies, which contain both alpha helices and beta strands.
Campbell SJ; Gold ND; Jackson RM; Westhead DR Ligand binding: functional site location, similarity and docking. Curr Opin Struct Biol 13 389-395, 2003
Computational methods for the detection and characterisation of protein ligand-binding sites have increasingly become an area of interest now that large amounts of protein structural information are becoming available prior to any knowledge of protein function. There have been particularly interesting recent developments in the following areas: first, functional site detection, whereby protein evolutionary information has been used to locate binding sites on the protein surface; second, functional site similarity, whereby structural similarity and three-dimensional templates can be used to compare and classify and potentially locate new binding sites; and third, ligand docking, which is being used to find and validate functional sites, in addition to having more conventional uses in small-molecule lead discovery.
Mardia KV; Nyirongo V; Westhead DR Protein matching using amino acids information In Stochastic geometry, biological structure and images, 2003
Bradford JR; Westhead DR Asymmetric mutation rates at enzyme-inhibitor interfaces: Implications for the protein-protein docking problem Protein Sci 12 2099-2103, 2003
DOI:10.1110/ps.0306303
Dalton J; Michalopoulos I; Westhead DR Calculation of helix packing angles in protein structures Bioinformatics 19 1298-1299, 2003
DOI:10.1093/bioinformatics/btg141
Software is presented for the calculation of packing angles and geometry of helical secondary structure elements in protein structures. AVAILABILITY: C language source code and documentation is available from http://www.bioinformatics.leeds.ac.uk.
Pinney JW; Westhead DR; McConkey GA Petri Net representations in systems biology Biochemical Society Transactions 31 1513-1515, 2003
Siepen JA; Radford SE; Westhead DR β edge strands in protein structure prediction and aggregation Protein Science 12 2348-2359, 2003
DOI:10.1110/ps.03234503
It is well established that recognition between exposed edges of β-sheets is an important mode of protein-protein interaction and can have pathological consequences; for instance, it has been linked to the aggregation of proteins into a fibrillar structure, which is associated with a number of predominantly neurodegenerative disorders. A number of protective mechanisms have evolved in the edge strands of β-sheets, preventing the aggregation and insolubility of most natural β-sheet proteins. Such mechanisms are unfavorable in the interior of a β-sheet. The problem of distinguishing edge strands from central strands based on sequence information alone is important in predicting residues and mutations likely to be involved in aggregation, and is also a first step in predicting folding topology. Here we report support vector machine (SVM) and decision tree methods developed to classify edge strands from central strands in a representative set of protein domains. Interestingly, rules generated by the decision tree method are in close agreement with our knowledge of protein structure and are potentially useful in a number of different biological applications. When trained on strands from proteins of known structure, using structure-based (Dictionary of Secondary Structure in Proteins) strand assignments, both methods achieved mean cross-validated, prediction accuracies of approximately 78%. These accuracies were reduced when strand assignments from secondary structure prediction were used. Further investigation of this effect revealed that it could be explained by a significant reduction in the accuracy of standard secondary structure prediction methods for edge strands, in comparison with central strands.
Mardia KV; Westhead DR New major challenges in bioinformatics In Statistics of large datasets - functional and image data, bioinformatics and data mining, 2002
Siepen JA; Westhead DR The fibril one on-line database: Mutations, experimental conditions and trends associated with amyloid fibril formation Protein Science 11 1862-1866, 2002
DOI:10.1110/ps.0204302
Westhead DR; Parish JH; Twyman RM Instant Notes in Bioinformatics, 2002
Williams A; Westhead DR Sequence relationships in the legume lectin fold and other jelly rolls Protein Eng 15 771-774, 2002
Gilbert D; Westhead DR; Viksna J; Thornton J A computer system to perform structure comparison using TOPS representation of protein structure Computers & Chemistry 26 23-30, 2001
DOI:10.1016/S0097-8485(01)00096-1
Pickering SJ; Bulpitt AJ; Efford N; Gold ND; Westhead DR AI-based algorithms for protein surface comparisons Comput Chem 26 79-84, 2001
Many current methods for protein analysis depend on the detection of similarity in either the primary sequence, or the overall tertiary structure (the C-alpha atoms of the protein backbone). These common sequences or structures may imply similar functional characteristics or active properties. Active sites and ligand binding sites usually occur on or near the surface of the protein, so similarly shaped surface regions could imply similar functions. We investigate various methods for describing the shape properties of protein surfaces and for comparing them. Our current work uses algorithms from computer vision to describe the protein surfaces, and methods from graph theory to compare the surface regions. Early results indicate that we can successfully match a family of related ligand binding sites, and find their similarly shaped surface regions. This method of surface analysis could be extended to help identify unknown surface regions for possible ligand binding or active sites.
Haider S; Westhead DR; Davies LA; Hopkins PM; Boyett MR; Harrison SM Inhibitory action of halothane on the transient outward K+ channel: Potential sites of interaction with the protein. Biophys J 78 221A-221A, 2000
Gilbert D; Westhead DR; Nagano N; Thornton J Motif-based searching in TOPS protein topology databases Bioinformatics 15 317-326, 1999
Motivation: TOPS cartoons are a schematic abstraction of protein three-dimensional structures in two dimensions, and are used for understanding and manual comparison of protein folds. Recently, an algorithm that produces the cartoons automatically from protein structures has been devised and cartoons have been generated to represent all the structures in the structural databank. There is now a need to be able to define target topological patterns and to search the database for matching domains.
Results: We have devised a formal language for describing TOPS diagrams and patterns, and have designed an efficient algorithm to match a pattern to a set of diagrams. A pattern-matching system has been implemented, and tested on a database derived from all the current entries in the Protein Data Bank (15000 domains). Users can search on patterns selected from a library of motifs or, alternatively, they can define their own search patterns.
Westhead DR; Slidel TWF; Flores TPJ; Thornton JM Protein structural topology: Automated analysis and diagrammatic representation Protein Science 8 897-904, 1999
Baxter CA; Murray CW; Clark DE; Westhead DR; Eldridge MD Flexible docking using Tabu search and an empirical estimate of binding affinity Proteins: Structure, Function, and Genetics 33 367-382, 1998
This article describes the implementation of a new docking approach. The method uses a Tabu search methodology to flexibly dock ligand molecules into rigid receptor structures. It uses an empirical objective function with a small number of physically based terms derived from fitting experimental binding affinities for crystallographic complexes. This means that docking energies produced by the searching algorithm provide direct estimates of the binding affinities of the ligands. The method has been tested on 50 ligand-receptor complexes for which the experimental binding affinity and binding geometry are known. All water molecules are removed from the structures and ligand molecules are minimized in vacuo before docking. The lowest energy geometry produced by the docking protocol is within 1.5 Angstrom root-mean-square of the experimental binding mode for 86% of the complexes. The lowest energies produced by the docking are in fair agreement with the known free energies of binding for the ligands.
Westhead DR; Hatton DC; Thornton JM An atlas of protein topology cartoons available on the World-Wide Web Trends in Biochemical Sciences 23 35-36, 1998
Westhead DR; Thornton JM Protein structure prediction. Curr Opin Biotechnol 9 383-389, 1998
Genome sequencing projects continue to provide a flood of new protein sequences, and prediction methods remain an important means of adding structural information. Recently, there have been advances in secondary structure prediction, which feed, in turn, into improved fold recognition algorithms. Finally, there have been technical improvements in comparative modelling, and studies of the expected accuracy of three-dimensional structural models built by this method.
Luscombe NM; Milburn D; Jones S; Karmirantzou M; Thornton JM; Laskowski RA; Westhead DR New tools and resources for analysing protein structures and their interactions Acta Crystallographica Section D: Biological Crystallography 54 1132-1138, 1998
DOI:10.1107/S0907444998007318
The determination of protein structures has furthered our understanding of how various proteins perform their functions. With the large number of structures currently available in the PDB, it is necessary to be able to easily study these proteins in detail. Here new software tools are presented which aim to facilitate this analysis; these include the PDBsum WWW site, which provides a summary description of all PDB entries; the programs TOPS and NUCPLOT, which plot schematic diagrams representing protein topology and DNA-binding interactions; SAS, a WWW-based sequence-analysis tool incorporating structural data; and WWW servers for the analysis of protein-protein interfaces and analyses of over 300 haem-binding proteins.
Westhead DR; Clark DE; Murray CW Comparison of heuristic search algorithms for molecular docking Journal of Computer-Aided Molecular Design 11 209-228, 1997
Murray CW; Clark DE; Auton TR; Firth MA; Li J; Sykes RA; Waszkowycz B; Westhead DR; Young SC PRO_SELECT: Combining structure-based drug design and combinatorial chemistry for rapid lead discovery. 1. Technology Journal of Computer-Aided Molecular Design 11 193-207, 1997
This paper describes a novel methodology, PRO_SELECT, which combines elements of structure-based drug design and combinatorial chemistry to create a new paradigm for accelerated lead discovery. Starting with a synthetically accessible template positioned in the active site of the target of interest, PRO_SELECT employs database searching to generate lists of potential substituents for each substituent position on the template. These substituents are selected on the basis of their being able to couple to the template using known synthetic routes and their possession of the correct functionality to interact with specified residues in the active site. The lists of potential substituents are then screened computationally against the active site using rapid algorithms. An empirical scoring function, correlated to binding free energy, is used to rank the substituents at each position. The highest scoring substituents at each position can then be examined using a variety of techniques and a final selection is made. Combinatorial enumeration of the final lists generates a library of synthetically accessible molecules, which may then be prioritised for synthesis and assay. The results obtained using PRO_SELECT to design thrombin inhibitors are briefly discussed.