Our research program employs interdisciplinary and integrated scientific approaches, combining experimental studies with computational modeling, and connecting human learning with machine learning. We examine the fine-grained structure of moment-to-moment micro-behaviors using advanced data collection and data analytics techniques. These bottom-up insights complement and guide our top-down theoretical hypotheses about the real-time nature of perceptual, cognitive, and learning processes through which developmental intelligence emerges over time.
Infant eye movements in cross-situational learning
Statistical Language Learning
Human children are prodigious language learners. Confronted with a "blooming, buzzing confusion," young children learn word-referent associations by computing distributional statistics over the co-occurrences of words and their candidate referents across highly ambiguous learning situations. Our research on statistical word learning focuses on three questions: 1) what statistical regularities in everyday environments are relevant to language learning; 2) how such learning inputs are jointly created by parents and children; and 3) what computational mechanisms allow the brain to aggregate statistical information across multiple learning situations. A minimal code sketch of this co-occurrence computation follows the publication links below.
Representative Publications:
More papers can be found here
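The sketch below illustrates, in Python, the core idea of cross-situational statistics described above: counting word-referent co-occurrences across individually ambiguous situations until the correct pairings stand out. The trials, words, and referent labels are invented for illustration; this is a minimal sketch, not the lab's actual model or data.

```python
import numpy as np

# Hypothetical learning situations: each trial pairs the words heard
# with the set of candidate referents in view (both are ambiguous).
trials = [
    ({"ball", "dog"}, {"BALL", "DOG"}),
    ({"ball", "cup"}, {"BALL", "CUP"}),
    ({"dog", "cup"},  {"DOG", "CUP"}),
]

words = sorted({w for ws, _ in trials for w in ws})
referents = sorted({r for _, rs in trials for r in rs})
counts = np.zeros((len(words), len(referents)))

# Accumulate word-referent co-occurrence counts across situations.
for heard, in_view in trials:
    for w in heard:
        for r in in_view:
            counts[words.index(w), referents.index(r)] += 1

# After aggregation, the correct referent has the highest count for each word.
for i, w in enumerate(words):
    best = referents[int(np.argmax(counts[i]))]
    print(f"{w} -> {best}")
```

Although no single trial disambiguates any word, aggregating counts across trials lets each word's correct referent emerge as the strongest association.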
Vision, Attention, and Action
We view behavior as a self-organizing outcome that emerges from a complex dynamic system of perceptual, cognitive, and motor processes. We study how sensory-motor behaviors in the real world, which unfold as sequences of interwoven real-time actions, enable both children and adults to create and select different pathways to reach a dynamically stable solution in each moment. Toward this goal, we use head-mounted cameras and eye trackers to record egocentric video while participants perform everyday tasks. The egocentric video approximates the contents of the wearer's field of view (FOV), and the eye-tracking data indicate where the wearer looks within that FOV. Analyzing egocentric vision and eye-tracking data together provides a unique opportunity to understand how visual, cognitive, and motor processes are tightly intertwined in real time to support learning and development. A minimal sketch of how gaze data are mapped onto egocentric video frames follows the publication links below.
Representative Publications:
More papers can be found here
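As a concrete illustration of the analysis described above, here is a minimal Python sketch that assigns each gaze point to the annotated object (if any) it falls on within an egocentric frame. The frame data, object names, and bounding-box format are assumptions made for this example, not the lab's actual data format.

```python
# For each egocentric video frame, decide which annotated object (if any)
# the gaze point falls on. Boxes are (x_min, y_min, x_max, y_max) in pixels.

def gaze_target(gaze_xy, object_boxes):
    """Return the name of the object whose box contains the gaze point."""
    gx, gy = gaze_xy
    for name, (x1, y1, x2, y2) in object_boxes.items():
        if x1 <= gx <= x2 and y1 <= gy <= y2:
            return name
    return None  # gaze landed on background / no annotated object

# One hypothetical frame: gaze coordinates plus object bounding boxes.
frame = {
    "gaze": (312, 240),
    "objects": {"toy_truck": (280, 200, 360, 300), "cup": (40, 180, 120, 260)},
}

print(gaze_target(frame["gaze"], frame["objects"]))  # -> "toy_truck"
```

Applied frame by frame, this kind of mapping turns raw gaze coordinates into a stream of moment-to-moment attention targets that can be aligned with other behavioral streams.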
Action anticipation during food preparation
Free-flowing toy play
Motion tracking during parent-child interaction
Dual eye tracking in the classic "whiteroom"
Multimodal Social Interaction
Humans are social animals, and early development and learning rely on interactions with other social beings. For example, a parent "jiggles" an object, the infant looks, and at the same moment the parent provides the name. Such time-locked social signals, encoded in multiple modalities, play a vital role in learning: they enhance and select certain physical correlations, making them more salient and thus learnable. In this way, the effects of high-level social cognition can be grounded in the embodied multimodal behaviors that make up a natural social interaction, and inferences about the mental states of others arise from reading their external bodily actions in the real world. We apply this idea of embodied multimodal interaction to studies of both typically and atypically developing populations. A minimal sketch of how such time-locked signals can be aligned follows the publication links below.
Representative Publications:
More papers can be found here
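To illustrate the idea of time-locked multimodal signals, the following Python sketch finds naming events that overlap in time with the infant's gaze to the named object. All event labels and timestamps are invented; real analyses involve many more signals and much finer-grained coding.

```python
# Parent naming events and infant gaze events, each as (label, start, end)
# in seconds. The goal: find naming moments during which the infant was
# looking at the named object.

naming_events = [("ball", 3.2, 4.0), ("dog", 10.5, 11.1)]
gaze_events   = [("ball", 2.8, 5.0), ("cup", 9.0, 10.0), ("dog", 10.8, 12.0)]

def overlaps(a_start, a_end, b_start, b_end):
    """True if the two time intervals share any time."""
    return a_start < b_end and b_start < a_end

coordinated = []
for word, n_start, n_end in naming_events:
    for target, g_start, g_end in gaze_events:
        if word == target and overlaps(n_start, n_end, g_start, g_end):
            coordinated.append((word, n_start))

# Naming moments coupled with infant attention to the referent.
print(coordinated)  # -> [('ball', 3.2), ('dog', 10.5)]
```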
Computer Vision and Machine Learning
How can human learning teach us about how machines can learn? Recent progress in machine learning has been driven largely by leveraging large-scale datasets and computing power. Compared with machine learners, human learners are more efficient: we do not need huge amounts of training data, and we generalize better. Can we build machines that learn in human-like ways, emulating the efficiency and generalization prominent in human learning? We have started to explore this idea by using egocentric video collected from young children to train state-of-the-art learning algorithms. A minimal training sketch in this spirit follows the publication links below.
Representative Publications:
More papers can be found here
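The following Python sketch shows, under simplifying assumptions, what training a standard vision model on frames sampled from child egocentric video could look like: a pretrained ResNet-18 is fine-tuned on a hypothetical folder of labeled frames (egocentric_frames/<object_name>/*.jpg). The directory layout, class labels, and hyperparameters are placeholders, not the lab's actual pipeline.

```python
import torch
from torch import nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Hypothetical frames organized as egocentric_frames/<object_name>/*.jpg
data = datasets.ImageFolder("egocentric_frames", transform=transform)
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

# Start from an ImageNet-pretrained backbone, replace the classifier head
# with one sized for the object categories found in the frame folders.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, len(data.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one pass over the sampled frames
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```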
Automatic object detection from egocentric vision
Building a word-referent associative matrix based on object-name co-occurrences in toy play
Cognitive Modeling
How can computational models be used to advance developmental science and cognitive science? Recent advances in sensing technology make it increasingly feasible to collect dense sensory data that capture the perceptual input children receive (e.g., via wearable eye trackers). The availability of such datasets creates the opportunity (and challenge) of applying computational models to the raw sensory data perceived by human learners, simulating how they accomplish various learning tasks such as visual object recognition and word learning. The "in-principle" solutions offered by computational models provide not only powerful ways to quantify the informativeness of the learning input in everyday environments, but also useful tools for discovering the cognitive mechanisms through which human learners use the same data to solve the same problems. A minimal sketch of one such informativeness measure follows the publication links below.
Representative Publications:
More papers can be found here
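As one illustration of quantifying the informativeness of learning input, the sketch below computes the referential ambiguity of hypothetical naming moments as the Shannon entropy over candidate objects in view, weighted by how much of the visual field each occupies. The moments and proportions are invented for illustration; lower entropy indicates a more informative (less ambiguous) naming moment.

```python
import math

# Invented naming moments: for each moment, candidate object -> proportion
# of the visual field it occupies when the name is heard.
naming_moments = [
    {"ball": 0.60, "cup": 0.25, "book": 0.15},   # one object is visually dominant
    {"ball": 0.34, "cup": 0.33, "book": 0.33},   # highly ambiguous moment
]

def referential_entropy(proportions):
    """Shannon entropy (bits) over candidate referents; lower = more informative."""
    total = sum(proportions.values())
    probs = [v / total for v in proportions.values() if v > 0]
    return -sum(p * math.log2(p) for p in probs)

for i, moment in enumerate(naming_moments):
    print(f"moment {i}: entropy = {referential_entropy(moment):.2f} bits")
```

Measures like this can be computed directly from egocentric vision data, giving a moment-by-moment estimate of how easy or hard a given naming event is for a learner who simply tracks what is in view.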