Research

My research develops computational methods and systems that accelerate scientific workflows requiring expert attention. For example, in the study of animal behavior, researchers need to define and categorize actions from large amounts of video; each researcher poses different questions, studies different behaviors, and uses a different experimental design.

By developing approaches that enable new ways for human experts and machine learning systems to collaborate, I aim to accelerate discovery across fields and surface hidden insights such as novel categories in data, previously unknown relationships, and connections across formerly disparate experiments. My work spans several areas of computer vision and machine learning, including representation learning, interpretable modeling, program synthesis, self-supervised learning, and generative modeling. For example:

Integrating Expert Knowledge with Neurosymbolic Learning. Scientific fields contain a wealth of domain knowledge. With a focus on behavior analysis, we developed methods that combine symbolic knowledge with learning to improve data efficiency and to produce symbolically interpretable models. We collaborate closely with scientists to design interpretable domain-specific languages (DSLs) for their domains.
[Task Programming – CVPR 2021][AutoSWAP – CVPR 2022]
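
To make the idea concrete, here is a minimal sketch of what a symbolic behavior "program" over animal trajectories might look like. The primitives, thresholds, and the close_interaction_program function are hypothetical illustrations, not the DSLs from the papers above.

```python
import numpy as np

# Hypothetical DSL primitives over animal trajectory data.
# Each primitive maps per-frame keypoints to an interpretable feature;
# the names and thresholds below are illustrative, not from the papers.

def distance(traj_a, traj_b):
    """Per-frame Euclidean distance between two animals' centroids."""
    return np.linalg.norm(traj_a - traj_b, axis=-1)

def speed(traj):
    """Per-frame speed of one animal's centroid."""
    return np.linalg.norm(np.diff(traj, axis=0, prepend=traj[:1]), axis=-1)

def close_interaction_program(traj_a, traj_b, dist_thresh=50.0, speed_thresh=2.0):
    """A symbolic 'program': frames where the animals are near each other and one is moving."""
    return (distance(traj_a, traj_b) < dist_thresh) & (speed(traj_a) > speed_thresh)

# Example: random centroid trajectories (T frames x 2 coordinates).
T = 100
traj_a = np.cumsum(np.random.randn(T, 2), axis=0)
traj_b = np.cumsum(np.random.randn(T, 2), axis=0)
weak_labels = close_interaction_program(traj_a, traj_b)
print("Frames flagged as interaction:", weak_labels.sum())
```

Programs of this kind can serve as weak or auxiliary supervision for a learned model while remaining readable to the domain expert who wrote them.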

Structure Discovery from High-Dimensional Data. Lower-dimensional underlying structure is often easier to analyze than raw data. We proposed new self-supervised learning methods that discover such structure automatically across diverse types of data, for example, learning the body structure of organisms such as mice, flies, humans, and jellyfish directly from video.
[Keypoint Discovery – CVPR 2022][Neurosymbolic Encoders – TMLR 2022]
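
As a rough illustration of the kind of building block used for keypoint discovery, below is a minimal sketch of a convolutional encoder with a spatial soft-argmax bottleneck. The KeypointBottleneck module, layer sizes, and usage are illustrative assumptions, not the exact architecture from the papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal keypoint-bottleneck sketch: an encoder predicts one heatmap per
# keypoint, and a spatial soft-argmax turns each heatmap into (x, y) coordinates.
# Sizes and layers are illustrative, not the models from the papers.

class KeypointBottleneck(nn.Module):
    def __init__(self, num_keypoints=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_keypoints, 1),
        )

    def forward(self, x):
        heatmaps = self.encoder(x)                      # (B, K, H, W)
        B, K, H, W = heatmaps.shape
        probs = F.softmax(heatmaps.view(B, K, -1), dim=-1).view(B, K, H, W)
        # Spatial soft-argmax: expected (x, y) coordinate under each heatmap.
        ys = torch.linspace(-1, 1, H, device=x.device)
        xs = torch.linspace(-1, 1, W, device=x.device)
        kp_y = (probs.sum(dim=3) * ys).sum(dim=2)       # (B, K)
        kp_x = (probs.sum(dim=2) * xs).sum(dim=2)       # (B, K)
        return torch.stack([kp_x, kp_y], dim=-1)        # (B, K, 2)

# Usage: in a full self-supervised pipeline, a decoder reconstructing the frame
# from these coordinates would provide the training signal without manual labels.
model = KeypointBottleneck(num_keypoints=10)
frames = torch.randn(4, 3, 128, 128)
print(model(frames).shape)  # torch.Size([4, 10, 2])
```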

Efficient and General-Purpose Representation Learning. Effective intermediate representations can benefit a range of downstream analysis tasks. For example, we demonstrated that learning view-invariant pose embeddings improves performance on tasks including pose retrieval, action recognition, and video alignment.
[Pose Embeddings – ECCV 2020][View Disentanglement – CVPR 2021]
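
To show how view invariance can be encouraged in practice, here is a minimal contrastive-learning sketch in which 2D poses of the same underlying pose from two camera views form positive pairs. The embedder architecture and the NT-Xent-style loss are illustrative stand-ins, not the exact training objective of the papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative view-invariant pose embedding setup: poses from two views of the
# same frame are pulled together, other poses in the batch are pushed apart.

embedder = nn.Sequential(nn.Linear(2 * 17, 128), nn.ReLU(), nn.Linear(128, 32))

def nt_xent(z1, z2, temperature=0.1):
    """Symmetric InfoNCE over a batch of positive pairs (z1[i], z2[i])."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy batch: 17-joint 2D poses "seen" from two views (random stand-ins here).
pose_view1 = torch.randn(8, 2 * 17)
pose_view2 = pose_view1 + 0.05 * torch.randn(8, 2 * 17)
loss = nt_xent(embedder(pose_view1), embedder(pose_view2))
loss.backward()
print(loss.item())
```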

Benchmarks that Bridge Science & ML. Standardized datasets and benchmarks are crucial for measuring model performance. We introduced one of the first large-scale datasets and competitions bringing challenges from behavioral neuroscience to the ML community, and we have since expanded the dataset to include evaluation protocols for both supervised learning and representation learning.
[CalMS21 – NeurIPS Datasets 2021][MABe 2022 – ICML 2023]
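
As one example of a simple representation-learning evaluation protocol, the sketch below scores a frozen embedding with a linear probe on behavior labels. The data, embedding dimension, and metric are placeholders rather than the actual CalMS21 or MABe protocols.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Linear-probe evaluation sketch: a frozen embedding is judged by how well a
# simple classifier trained on top of it predicts frame-level behavior labels.
# All data below are random placeholders.

rng = np.random.default_rng(0)
train_emb = rng.normal(size=(1000, 32))   # frozen embeddings of training frames
train_y = rng.integers(0, 4, size=1000)   # behavior labels (e.g., 4 classes)
test_emb = rng.normal(size=(200, 32))
test_y = rng.integers(0, 4, size=200)

probe = LogisticRegression(max_iter=1000).fit(train_emb, train_y)
pred = probe.predict(test_emb)
print("macro F1:", f1_score(test_y, pred, average="macro"))
```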