Accelerated Machine Learning for Computational Proteomics

Dr. John Halloran, Postdoc, UC Davis
ABSTRACT

Over the past few decades, mass spectrometry-based proteomics has dramatically improved our fundamental knowledge of biology, advancing both the understanding of diseases and methods for clinical diagnosis. However, the complexity and sheer volume of typical proteomics datasets make it difficult to achieve fast and accurate analysis simultaneously; while machine learning methods have proven capable of highly accurate proteomic analysis, their extremely long runtimes deter practical use. In this talk, we will discuss two core problems in computational proteomics and how to accelerate the training of their highly accurate, but slow, machine learning solutions. For the first problem, wherein we seek to infer the protein subsequences (called peptides) present in a biological sample, we will improve the training of graphical models by deriving emission functions which render conditional maximum likelihood learning concave. For the second problem, wherein we seek to further improve peptide identification accuracy by classifying correct versus incorrect identifications, we will speed up support vector machine learning using a combination of improved convex optimization and extensive parallelization. Together, these speedups ensure globally convergent parameters while reducing analysis time on massive datasets from several days to just several hours.
