The research portfolio of Soufan Lab encompasses cheminformatics and bioinformatics with applications in life sciences.
Soufan Lab aims to advance life sciences by developing innovative methods, systems and resources for targeted knowledge discovery from biological data.
Biological and chemical data are more available and accessible than ever, thanks to advances in next-generation sequencing, mass spectrometry, array-based methods and other technologies. Together, bioinformatics and cheminformatics (bio-cheminformatics) cover a range of computational methods that can be used to predict interactions between biomolecules (e.g., proteins) and chemicals (e.g., ligands) at large scale. For example, developing gene expression studies to increase screening of the biological activities of chemicals is a frontier in environmental and health studies. In drug discovery, bio-cheminformatics makes it possible to tackle the problem of predicting off-target proteins that lead to side effects, which in turn limit the efficacy of many existing medicines. Associating compounds with proteins is necessary to process structural alerts in environmental toxicology and to detect patterns in chemicals that can cause particular adverse effects in organs. Other complex interactions between chemicals and omics layers (transcriptomics, proteomics, metabolomics, etc.) shape domains such as nutrigenomics, which studies the relationship between the human genome, nutrition and health. All of this is key to understanding molecular mechanisms and revealing detailed interactions in living systems, which will eventually help answer interrelated questions about treatments, long-term effects and the impact of the environment. The three main components of our proposed research program are described next.
As omics data grow in volume and dimensionality, there is a need for faster, more reliable and more cost-effective AI models to find the most relevant variables. Biomarker discovery aims to find the top indicators that explain connections to treatment conditions (time, dose, exposure) and target metadata (age, tissue, histology). In complex biological systems, biomarker analysis facilitates understanding of the underlying mechanisms and helps capture the states and changing signatures of genes, proteins, metabolites and chemicals.
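To make the idea of "finding the most relevant variables" concrete, the sketch below ranks features by how strongly they separate two treatment groups. It is only an illustration of a simple univariate filter, not the lab's method: the function name `rank_features`, the feature names `gene_A` to `gene_C`, and all numbers are made-up assumptions.

```python
from statistics import mean, stdev


def rank_features(samples, labels, top_k=2):
    """Rank features by absolute standardized mean difference between two groups.

    Hypothetical helper for illustration only.
    samples: list of dicts mapping feature name -> measured value
    labels:  list of 0/1 group labels (0 = control, 1 = treated)
    """
    scores = {}
    for name in samples[0]:
        treated = [s[name] for s, y in zip(samples, labels) if y == 1]
        control = [s[name] for s, y in zip(samples, labels) if y == 0]
        # Average the two group standard deviations; the small constant
        # guards against division by zero for constant features.
        spread = (stdev(treated) + stdev(control)) / 2 + 1e-9
        scores[name] = abs(mean(treated) - mean(control)) / spread
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


# Toy, made-up expression values: three treated and three control samples.
samples = [
    {"gene_A": 9.0, "gene_B": 5.0, "gene_C": 4.0},
    {"gene_A": 9.5, "gene_B": 5.2, "gene_C": 5.0},
    {"gene_A": 10.0, "gene_B": 4.8, "gene_C": 6.0},
    {"gene_A": 2.0, "gene_B": 5.1, "gene_C": 3.0},
    {"gene_A": 2.5, "gene_B": 4.9, "gene_C": 4.0},
    {"gene_A": 3.0, "gene_B": 5.0, "gene_C": 5.0},
]
labels = [1, 1, 1, 0, 0, 0]

top, scores = rank_features(samples, labels, top_k=2)
print(top)
```

A filter like this scores each variable independently, so it scales to high-dimensional data but ignores interactions between variables; real biomarker pipelines would typically add multiple-testing control and multivariate models.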
The influence of chemicals on biological targets varies not only with genetic factors but also, to a great extent, with environmental ones. As the environmental conditions that drive chemical intake change rapidly, there is a growing need to discover novel chemical structures. Domain applications will range from finding new cures (i.e., health) to characterizing unknown mixtures with reported toxic effects (i.e., toxicology). Given the sheer size of the chemical search space (as many as 10^200 molecules could exist) and the limited functional reference libraries, the goal is to develop a solution that learns, predicts and characterizes the functions of novel chemical structures.
Key challenges in bio/cheminformatics data analysis include, but are not limited to, access to standardized analysis workflows, interactive analytics for decision making, sharing and reproducibility. Reproducibility means sharing not only the source code but also the training data, parameters, steps and every other detail needed to reduce the effects of randomness (e.g., the random numbers generated to kick off model training).
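As a minimal sketch of what recording those details could look like, the example below stores the random seed, the parameters and a hash of the training data in a single manifest, and draws the initial model weights from an explicitly seeded generator. The helper names `run_manifest` and `initial_weights`, the parameter values and the toy data are all assumptions made for illustration.

```python
import hashlib
import json
import random


def run_manifest(seed, params, training_rows):
    """Bundle the details needed to repeat a run into one JSON-serializable dict."""
    # Hash the training data so a later run can confirm it used the same rows.
    data_bytes = json.dumps(training_rows, sort_keys=True).encode("utf-8")
    return {
        "seed": seed,
        "params": params,
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
    }


def initial_weights(seed, n_weights):
    """Draw random starting weights from a generator seeded explicitly."""
    rng = random.Random(seed)  # local generator, independent of global state
    return [rng.uniform(-1.0, 1.0) for _ in range(n_weights)]


# Hypothetical parameters and toy data, for illustration only.
params = {"learning_rate": 0.01, "epochs": 20}
training_rows = [[0.1, 0.2, 1], [0.4, 0.3, 0]]

manifest = run_manifest(seed=42, params=params, training_rows=training_rows)
first = initial_weights(manifest["seed"], 5)
second = initial_weights(manifest["seed"], 5)
print(first == second)  # same seed, same starting weights
print(json.dumps(manifest, indent=2))
```

Sharing the manifest alongside the source code lets someone else check that they start from the same data, parameters and random state before comparing results.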