Research

My research develops general-purpose methods for scientists that (1) learn from both domain expertise and experimental data, (2) automatically discover meaningful representations, and (3) enable new analysis capabilities to help transform the scientific process. I aim to build effective machine learning and computer vision systems that work with scientists to accelerate discovery, especially in domains with complex spatiotemporal data, such as behavior analysis, neuroscience, and medical data analysis. My work spans several areas of CV & ML, including representation learning, interpretable modeling, program synthesis, self-supervised learning, and generative modeling.

Some of my areas of interest are listed below, with representative papers:

Integrating Expert Knowledge with Neurosymbolic Learning. Scientific fields contain a wealth of domain expertise. With a focus on behavior analysis, we developed methods that combine symbolic knowledge with learning, improving data efficiency and enabling symbolically interpretable models. We collaborate closely with scientists to design interpretable domain-specific languages (DSLs) for their fields.
[Task Programming – CVPR 2021][AutoSWAP – CVPR 2022]
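As a toy illustration of the idea (a hypothetical Python sketch, not the actual DSL from the papers above), a behavior-analysis DSL might compose per-frame trajectory features with temporal filters into interpretable classifiers:

```python
# Hypothetical sketch of a behavior-analysis DSL: programs compose
# per-frame trajectory features with temporal filters, yielding
# behavior labels a scientist can read and edit.

def speed(traj):
    """Per-frame speed from a list of (x, y) positions."""
    return [0.0] + [((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
                    for (x1, y1), (x2, y2) in zip(traj, traj[1:])]

def greater_than(signal, threshold):
    """Pointwise predicate: signal > threshold."""
    return [v > threshold for v in signal]

def mostly(pred, window=3):
    """Temporal filter: True where a majority of the frames in the
    surrounding window satisfies the predicate."""
    n = len(pred)
    out = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        votes = pred[lo:hi]
        out.append(sum(votes) > len(votes) / 2)
    return out

# Program: "the animal moves fast for a sustained stretch of frames"
trajectory = [(0, 0), (0, 0), (3, 4), (6, 8), (9, 12), (9, 12)]
is_darting = mostly(greater_than(speed(trajectory), 2.0), window=1)
```

Because each primitive is a named, human-readable operation, a synthesized program doubles as an explanation of the behavior it detects.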

Structure Discovery from High-Dimensional Data. Lower-dimensional underlying structure is often easier to analyze than raw data. We proposed new self-supervised methods that discover such structure automatically across diverse data types; for example, learning the body structures of mice, flies, humans, and jellyfish directly from video.
[Keypoint Discovery – CVPR 2022][Neurosymbolic Encoders – TMLR 2022]
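One common building block in unsupervised keypoint discovery can be sketched as follows (a simplified, hypothetical example, not the exact method of the papers above): a spatial softmax converts a network's heatmap into differentiable (x, y) coordinates, which a reconstruction objective can then shape into body parts:

```python
import math

def spatial_softmax(heatmap):
    """Convert a 2D heatmap of scores into an expected (x, y) coordinate.

    Hypothetical simplification of a standard keypoint-discovery step:
    the softmax-weighted average of pixel locations is differentiable,
    so a self-supervised loss can move the keypoint without labels.
    """
    flat = [v for row in heatmap for v in row]
    m = max(flat)  # subtract the max for numerical stability
    exp = [[math.exp(v - m) for v in row] for row in heatmap]
    z = sum(sum(row) for row in exp)
    ey = sum(y * sum(row) for y, row in enumerate(exp)) / z
    ex = sum(x * v for row in exp for x, v in enumerate(row)) / z
    return ex, ey
```

A sharp peak in the heatmap yields a coordinate at that peak; a diffuse heatmap yields a blend, which is what makes the operation trainable end to end.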

Efficient and General-Purpose Representation Learning. Effective intermediate representations have the potential to benefit a range of downstream analysis tasks. For example, we demonstrated that learning view-invariant pose embeddings improves performance on tasks including pose retrieval, action recognition, and video alignment.
[Pose Embeddings – ECCV 2020][View Disentanglement – CVPR 2021]
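The view-invariance idea can be sketched with a triplet-style objective (a hypothetical simplification; the papers above define their own losses): embeddings of the same pose seen from two camera views are pulled together, while embeddings of different poses are pushed apart:

```python
# Hypothetical sketch of a view-invariance objective: anchor and
# positive are embeddings of the same pose from different views;
# negative is an embedding of a different pose.

def l2(a, b):
    """Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def view_invariance_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: zero once the same-pose pair is closer than the
    different-pose pair by at least the margin."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)
```

When the same pose from two views already embeds nearby and a different pose embeds far away, the loss is zero; otherwise its gradient reshapes the embedding space toward view invariance.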

Benchmarks that Bridge Science & ML. Standardized datasets and benchmarks are crucial for measuring model performance. We introduced one of the first large-scale datasets and competitions bringing challenges from behavioral neuroscience to the ML community, and have since expanded our datasets to include evaluation protocols for both supervised learning and representation learning.
[CalMS21 – NeurIPS Datasets 2021][MABe 2022 – ICML 2023]