Abstract: A key objective of decomposition analysis is to identify risks or resources (‘mediators’) that contribute to disparities between groups of individuals defined by social characteristics such as race, ethnicity, gender, class, and sexual orientation. In decomposition analysis, scholarly interest often centers on estimating how much the disparity (e.g., health disparities between Black women...
Abstract: Over 782,000 individuals in the U.S. have end-stage kidney disease with about 72% of patients on dialysis, a life-sustaining treatment. Dialysis patients experience high mortality and frequent hospitalizations, about twice per year. These poor outcomes are exacerbated at key time periods, such as the fragile period after transition to dialysis. In order to...
Abstract: What happened when Twitter deplatformed 70,000 right-wing extremists following the January 6 insurrection? Using a panel of over a half million active Twitter users and a sharp regression discontinuity design, we test the causal effects of this intervention on the circulation of misinformation by those deplatformed, and by users from adjacent groups such as...
Abstract: An overarching goal in machine learning is to enable accurate statistical inference in the setting where the sample size is less than the number of parameters. This overparameterized setting is particularly common in deep learning where it is typical to train large neural nets with relatively small sample sizes and little concern about overfitting...
Abstract: Extracting hidden patterns of multiview data containing heterogeneous feature representations is attracting more and more attention in various scientific fields such as image processing and natural language processing. In this talk we will present a comprehensive unsupervised framework that leverages existing and novel multiview learning models, towards obtaining a single node embedding from a...
Abstract: Agricultural systems are pressured by growing global population, increasing water scarcity, and changing climate. In the pursuit of increasing food security, agriculture (especially intensive systems) should also minimize negative and undesired impacts on the environment and on rural societies. Part of the solution to this challenge lies in understanding how environmental factors such as...
Abstract: My lab investigates the immune responses to infection and inflammation using mouse models of parasitic worm infection and clinical samples from sepsis patients. Our ultimate goal is to identify protective or pathogenic immune pathways that we can target for diagnostic or therapeutic purposes. In our mouse infection models we investigate macrophages as first responders...
Abstract: Learning a numeric representation (also known as embedded vector, or simply embedding) for a piece of binary code (an instruction, a basic block, a function, or even an entire program) has many important security applications, ranging from vulnerability search and plagiarism detection to malware classification. By reducing a binary code with complex control-flow and data-flow...
The air quality and fire management communities are faced with increasingly difficult decisions regarding critical fire management activities, given the potential contribution of wildland fires to fine particulate matter (PM2.5). Unfortunately, in model frameworks used for air quality management, the ability to represent PM2.5 from fires is severely limited. This is due in part to...
Social insects include the termites, ants and the social bees and wasps, which are a very large and ecologically very successful group of animals. They are also of tremendous importance for humans. Whereas some social insects are serious pest species that are becoming increasingly difficult to control, others are of central importance for agricultural food production...
The Berkeley Institute for Data Science (or BIDS) was founded as part of a high-profile, multi-university initiative funded by the Moore and Sloan Foundations, collectively known as the Moore-Sloan Data Science Environments (or MSDSE), with the mission of creating “institutional change” around data science in academia. I will discuss some of the lessons learned in...
Much of the current effort to apply data science in both ecology and genomics has focused on a data-driven, static rather than fully dynamic, understanding of those systems. In this talk, I will introduce our recent work on fusing data- and model-driven approaches to understand the fundamental nitrogen biochemical processes in fluctuating soil redox...
The ever-increasing size of deep neural network (DNN) models once implied that they were limited to cloud data centers for runtime inference. Nonetheless, the recent plethora of DNN model compression techniques has successfully overcome this limit, making it a reality that DNN-based inference can be run on numerous resource-constrained edge devices including mobile...
Government agencies have long collected biological samples to assess and monitor the environmental health of streams and rivers. In California, one such example of this monitoring is the Surface Water Ambient Monitoring Protocol (SWAMP); monitoring under this protocol has generated a vast amount of data that has recently been made publicly available. These data include...
In the past few decades, mass spectrometry-based proteomics has dramatically improved our fundamental knowledge of biology, leading to advancements in the understanding of diseases and methods for clinical diagnoses. However, the complexity and sheer volume of typical proteomics datasets make both fast and accurate analysis difficult to accomplish simultaneously; while machine learning methods have proven...
Globally, human exposure to air pollution is a known risk factor for increased morbidity and mortality, and its chemical composition can vary significantly by region and season. This variability is largely driven by topography, meteorology, land cover, and human activities. State-of-the-science air quality modeling systems, such as the U.S. EPA’s Community Multiscale Air Quality (CMAQ) model...
The ability to select between competing options and adapt to new situations underlies our impressive capabilities of playing soccer, flying aircraft, and skiing at the Olympics. To select between actions, the brain needs an accurate representation of the state of the body and the environment it is in. Despite the sophistication of our sensory system...
Augmented and virtual reality (AR/VR) are at the frontier of mobile computing. While AR/VR applications are gaining popularity today, the technologies to support these applications are far from mature. This talk will first outline the current state of mobile AR/VR platforms, including what functionality is currently available, what is needed/desired, and how edge computing can...
When people talk about skills that are important for data science, they tend to focus only on the technical skills, like statistics and computer programming. Often overlooked is the scientific mindset. Being a critical thinker helps you interpret data and avoid doing analysis on auto-pilot. A skeptical mindset will keep you vigilant for the “silent...
Highly mutable pathogens such as influenza and HIV pose a serious threat to public health. Better understanding of how these pathogens evolve could inform efforts to treat and prevent infection. In this talk, I’ll discuss the statistical problem of inferring an evolutionary model from data, and how we’ve developed a new method to solve this...