Welcome to the Protein Fold Design Genomics Laboratory

 

Professor William E. Balch

 
 

Linking the evolution of variation in the human population to computational modeling using covariance as a universal language base to describe the function of the protein fold in health and disease.

 

Overview

 

The Balch laboratory works to provide an integrated view of variation in the genome and the corresponding changes in the proteome in response to natural selection to define protein function and structure on a residue-by-residue basis in the individual. Defining the role of genetic variation in shaping biological diversity is critical to understanding the global forces at play in health, disease, and aging.

 
 
 
  • The Balch laboratory has pioneered numerous insights, including: exocytic trafficking of proteins to the cell surface, through the discovery and structure determination of vesicle coat and tethering systems; the proteostasis pathways that help proteins fold in the cell; and new machine learning approaches based on the principle of Spatial Covariance to understand the role of variation in the physiology of genetic disease and host-pathogen evolution.

  • Proteostasis, a concept the Balch Lab helped develop, is the collective of protein folding machineries in the cell that manage the genome-to-proteome transformation. It generates and manages the protein fold design in response to an individual's genetic variation and environmental factors.

    Proteostasis comprises multiple cellular pathways: membrane vesicle coat complexes directing trafficking to and from the cell surface; chaperone/co-chaperone systems that continuously buffer the protein fold for function in response to genetic variation and the environment; and small molecules that alter the behavior of the proteostasis folding machinery to correct the protein fold when defective.

  • The Balch laboratory has pioneered the use of Gaussian processes (GP) to tackle human disease. GP is a probabilistic platform that allows us to quantitatively interpret billions of years of experiments performed by nature in creating diversity through natural selection and fitness. The GP-emergent concept of Spatial Covariance (SCV) captures cellular information flow in response to inherited and somatic disease, giving a mechanistic understanding of the natural complexity found in the population from a precision medicine perspective to treat the individual.

  • Over 10,000 rare inherited diseases impact human healthspan, ranging from weak to severe pathologies with variable onset during development and aging. They arise through natural variation in the population, each variant providing insights into the rules governing evolution of the genome and its transformation into the proteome. The Balch laboratory studies a wide range of genetic diseases using both experimental and GP-based strategies to generate therapeutics to resolve disease.

  • Along with aging comes the inevitable failure of multiple biological systems that protect us from the daily insults of the environment. Of particular interest to us is the collapse of proteostasis in the context of viral challenge including influenza and SARS-CoV-2. We are actively pursuing mass spectrometry and novel Gaussian process based computational approaches that capture changes in the stress-related proteostasis signaling pathways and components that fail to protect the aging population from pulmonary disease leading to a shortened lifespan.

  • A major challenge in understanding how variation drives the human healthspan is the impact of host-pathogen relationships, as exemplified by the recent worldwide SARS-CoV-2 pandemic. Using GP, we have developed ways to explore the viral genome design on an allele-by-allele basis that defines its entire lifecycle from infection to release of new virus. These efforts are providing insight into the ‘Red Queen’ effect in which the fast-track variation found in the virus is counter-balanced by reciprocal, yet slower-track, responses by the immune system of the host.

Research Projects

 

Both experimental and computational approaches are used in the Balch Lab to define biological relationships in the context of human variation. Current efforts utilize a Gaussian Process Spatial CoVariance platform to link the sequence information found in the genome to the proteome - providing a common platform to describe the molecular, biochemical, biophysical and structural features responsible for the evolution of protein fold design dictating health in the individual.
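For readers unfamiliar with Gaussian processes, the general idea of using covariance between known variants to predict an unmeasured one can be sketched with textbook GP regression. This is a minimal illustration only, not the laboratory's Spatial CoVariance platform: the squared-exponential kernel, the residue positions, the measured values, and the names rbf_kernel and gp_posterior are all assumptions chosen for the example.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return variance * np.exp(-0.5 * ((a - b) / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-2, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    y_train = np.asarray(y_train, dtype=float)
    k_tt = rbf_kernel(x_train, x_train, **kernel_args)
    k_ts = rbf_kernel(x_train, x_test, **kernel_args)
    k_ss = rbf_kernel(x_test, x_test, **kernel_args)

    # Cholesky factor of the noisy training covariance (numerically stable solve).
    chol = np.linalg.cholesky(k_tt + noise * np.eye(len(y_train)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))

    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    var = np.diag(k_ss) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off


# Hypothetical data: residue positions of known variants and a measured
# functional score for each (illustrative numbers, not experimental results).
positions = [10, 50, 120, 200, 310]
scores = [0.9, 0.4, 0.7, 0.2, 0.8]

# Predict at a measured position (10) and at an unmeasured one (160).
mean, var = gp_posterior(positions, scores, [10, 160], length_scale=40.0)
print("predicted scores:", mean)
print("predictive variance:", var)
```

The predictive variance is the useful part of such a model: it is small near measured variants and grows for positions far from any measurement, which gives a quantitative sense of how much a prediction for an uncharacterized variant can be trusted.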

 
 
 
 
  • Cystic Fibrosis (CF) is a genetic disease caused by molecular variation in the cystic fibrosis transmembrane conductance regulator (CFTR) that manages chloride conductance in the human lung. When CFTR is deficient, the patient develops chronic obstructive pulmonary disease (COPD). GP-based computational approaches are used to understand the impact of variation on the thermodynamics and biology of the CFTR protein fold design. These results provide mechanistic insights to inform screens that use GP triangulation to define and discover the role of small molecule therapeutics in disease management. (More Info)

  • Alpha-1-Antitrypsin Deficiency (AATD) is an age-related disease of the liver-lung axis in response to genetically inherited variants of Alpha-1-Antitrypsin (AAT). Disease-causing variants lead to gain-of-toxic-function aggregation in the liver and prevent the normal secretion of AAT into the serum, resulting in a loss of function in the lung that leads to chronic obstructive pulmonary disease (COPD). We apply GP in combination with high throughput screening (HTS) experimental approaches to link populational genotypic diversity to phenotypic diversity in the individual. This allows us to identify covariant clusters in the AAT protein fold that play distinct roles in aggregation disease in the liver and in function in the lung. We hope to manage the different genotypes responsible for disease by developing small molecule therapeutics that alter the behavior of the AAT fold and the proteostasis folding machinery to correct the protein fold when defective. (More Info)

  • Niemann-Pick C1 (NPC1) disease is a rare genetic disorder triggered by mutations in NPC1, a multi-spanning transmembrane protein that is trafficked through the exocytic pathway to late endosomes and lysosomes (LE/Ly) to globally manage cholesterol homeostasis. More than 300 NPC1 variants found in the human population trigger defects that inhibit export of NPC1 protein from the endoplasmic reticulum and/or its function in downstream LE/Ly, leading to cholesterol accumulation and onset of neurodegeneration in childhood. Using GP to model NPC1 genetic diversity, we have shown how the Hsp70 chaperone/co-chaperone system can adjust SCV ‘tolerance’ and ‘set-points’ for protein fold design on a residue-by-residue basis to differentially regulate variant trafficking, stability, and cholesterol homeostasis. (More Info)

  • A major challenge in understanding variation is its role in host-pathogen relationships, as exemplified by the recent worldwide SARS-CoV-2 pandemic. Using GP, we have developed ways to explore the entire genome design (~30,000 nucleotide bases) to understand the impact of variation in each of the components responsible for the viral lifecycle. We address mechanistically how the entire genome sequence is re-coded through variation, using GP-based SCV relationships, to generate the successive break-through strains including the Alpha, Delta, and Omicron variants driving pandemic surges. Such host-pathogen competitive relationships are well-described by the 'Red Queen' effect. Understanding the underlying GP-based probabilistic rules driving pathogen versus host fitness may enable more effective therapeutic management in future pandemics. (More Info)

  • Along with aging comes the inevitable failure of multiple biological systems that protect us from the daily insults of the environment. Of particular interest to us is the collapse of proteostasis in the context of viral challenge including influenza and SARS-CoV-2. We are actively pursuing mass spectrometry and novel Gaussian process based computational approaches that capture changes in the stress-related proteostasis signaling pathways and components that fail to protect the aging population from pulmonary disease leading to a shortened lifespan. (More Info)

  • RuBisCO (ribulose-1,5-bisphosphate carboxylase/oxygenase) is the most abundant protein on Earth. It is the central enzyme involved in the process by which atmospheric carbon dioxide (CO2) is converted by plants and other photosynthetic organisms to energy-rich molecules such as glucose. Using variation in RuBisCO encoded by the genomes of multiple plant species, we will address how these changes are used to optimize CO2 fixation. The goal of the project is to use GP to provide a universal platform for understanding the genotypic features of the RuBisCO fold, in order to improve plants as a food source and potentially impact climate change through improved carbon fixation. (More Info)