Using bio-monitoring data to infer ecological dynamics in streams and rivers
Government agencies have long collected biological samples to assess and monitor the environmental health of streams and rivers. In California, one such effort is the Surface Water Ambient Monitoring Program (SWAMP); monitoring under this program has generated a vast amount of data that has recently been made publicly available. These data include...
Accelerated Machine Learning for Computational Proteomics
In the past few decades, mass spectrometry-based proteomics has dramatically improved our fundamental knowledge of biology, leading to advancements in the understanding of diseases and methods for clinical diagnoses. However, the complexity and sheer volume of typical proteomics datasets make both fast and accurate analysis difficult to accomplish simultaneously; while machine learning methods have proven...
Data Science and Environmental Systems: Applications of Deterministic Models, Optimization, and Machine Learning to Address Multi-scale Air Quality Challenges
Globally, human exposure to air pollution is a known risk factor for increased morbidity and mortality, and its chemical composition can vary significantly by region and season. Variabilities are largely driven by topography, meteorology, land cover, and human activities. State-of-the-science air quality modeling systems, such as the U.S. EPA’s Community Multiscale Air Quality (CMAQ) model...
The adapting brain: The role of posterior parietal cortex in learning and adaptation
The ability to select between competing options and adapt to new situations underlies our impressive capabilities, from playing soccer to flying aircraft and skiing in the Olympics. To select between actions, the brain needs an accurate representation of the state of the body and of the environment it is in. Despite the sophistication of our sensory system...
Mobile AR/VR with Edge-based Deep Learning
Augmented and virtual reality (AR/VR) are at the frontier of mobile computing. While AR/VR applications are gaining popularity today, the technologies to support these applications are far from mature. This talk will first outline the current state of mobile AR/VR platforms, including what functionality is currently available, what is needed/desired, and how edge computing can...
Putting the ‘Science’ Into Data Science
When people talk about skills that are important for data science, they tend to focus only on the technical skills, like statistics and computer programming. Often overlooked is the scientific mindset. Being a critical thinker helps you interpret data and avoid doing analysis on auto-pilot. A skeptical mindset will keep you vigilant for the “silent...
Constructing quantitative models of pathogen evolution
Highly mutable pathogens such as influenza and HIV pose a serious threat to public health. Better understanding of how these pathogens evolve could inform efforts to treat and prevent infection. In this talk, I’ll discuss the statistical problem of inferring an evolutionary model from data, and how we’ve developed a new method to solve this...
Information Loss in Neural Classifiers from Sampling
An estimator is limited by the information it has about the variable it is estimating, and that information is in turn limited to what the estimator has seen in the samples used to train it. Finite samples cannot transfer the full information of a random variable to an estimator; some information is inevitably lost. This presentation analyzes...
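The point that finite samples cannot convey a variable's full information can be illustrated with the well-known downward bias of the plug-in entropy estimator; the following is a minimal sketch (the setup, sample sizes, and function name are illustrative, not taken from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 16                      # uniform distribution over 16 symbols
true_entropy = np.log2(K)   # 4 bits

def plugin_entropy(samples, K):
    """Plug-in (maximum-likelihood) entropy estimate in bits."""
    counts = np.bincount(samples, minlength=K)
    p = counts / counts.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# Small samples systematically underestimate the true entropy...
est_small = np.mean(
    [plugin_entropy(rng.integers(0, K, 30), K) for _ in range(200)]
)
# ...while large samples recover most of the lost information.
est_large = plugin_entropy(rng.integers(0, K, 30_000), K)
```

With 30 samples the average estimate falls noticeably below 4 bits (the bias is roughly (K-1)/(2N ln 2) bits), while with 30,000 samples the estimate is close to the true value.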
Training machines to understand the Universe
Upcoming large-scale datasets in astrophysics will challenge our ability to effectively analyze and interpret the data. Surveys of the 2020s (e.g., Euclid, LSST, WFIRST and SPHEREx) will provide multiple deep views of the universe, each survey with its own observational characteristics such as noise levels, resolution, and wavelength coverage. How do we best interpret the...
Constructing Confidence Intervals for Selected Parameters
In large-scale problems, it is common practice to select important parameters by a procedure such as the BH procedure (Benjamini and Hochberg, 1995) and to construct confidence intervals (CIs) for the selected parameters for further investigation, while controlling the false coverage-statement rate (FCR) for the CIs at a desired level. Although the well-known BY CIs (Benjamini and Yekutieli, 2005)...
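For reference, the BH step-up selection rule mentioned above can be sketched as follows; this is a minimal illustration of the standard procedure, and the function name and target level are my own choices, not from the talk:

```python
import numpy as np

def bh_select(pvals, q=0.05):
    """Benjamini-Hochberg step-up: indices of hypotheses rejected at FDR level q.

    Finds the largest k with p_(k) <= (k/m) * q and rejects the k
    smallest p-values.
    """
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = (np.arange(1, m + 1) / m) * q
    below = p[order] <= thresholds
    if not below.any():
        return np.array([], dtype=int)
    k = np.where(below)[0].max()
    return order[: k + 1]

selected = bh_select([0.01, 0.02, 0.03, 0.5])  # rejects the first three
```

In the FCR framework, CIs are then built only for the parameters in `selected`, with the interval level adjusted for the number of selections.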