Text-and-audio methods

Name: Text-and-audio methods
Start: 2024-01-30T13:00:00Z
End: 2024-01-30T14:00:00Z
Location: Lecture Theatre 2, Computer Laboratory, William Gates Building

Abstract

This talk supports the R255 Advanced Topics in Machine Learning course module on Multimodal Learning and provides a bird’s-eye view of the rapidly evolving text-audio landscape, with a focus on music as a primary example of audio data. I will first present the types of tasks that exist in this space, then discuss data curation challenges, and follow with an overview of some existing retrieval and generation methods, including a quick primer on diffusion models. Finally, I will describe current evaluation metrics and their limitations.

Date

Jan 30, 2024 1:00 PM — 2:00 PM

Event

Artificial Intelligence Research Group Talks (Computer Laboratory)

Location

Lecture Theatre 2, Computer Laboratory, William Gates Building

Cambridge, United Kingdom

Dr Cătălina Cangea

Senior Research Scientist

Senior Research Scientist at Google DeepMind, with a PhD in ML from the University of Cambridge, and inhaler of music :) Focus on generative music models, finding signals in data and human evaluation. Motivated by contributing ML-based knowledge and improvements to real-world systems!