I have added a new entry to the “publications” section of my About page for a paper that I co-authored.
This contribution dates back to my earlier work at Twenty Billion Neurons in Berlin, the AI company that has since been acquired by Qualcomm.
In that role, I worked on the Python source code of the real-time inference stack that is still in use at Qualcomm. I was also involved in early work on the dataset that is now being made public as part of the paper.
I am particularly happy about this publication because it was presented at NeurIPS, a prestigious conference in machine learning and AI.
The paper is titled “What to Say and When to Say it: Live Fitness Coaching as a Testbed for Situated Interaction”.
It introduces a new dataset of exercise videos and presents a reference model that analyzes the video stream and provides corrective feedback to the user. The work presents one possible method for combining language models with real-time video processing, something that we will very likely see more of in the next couple of years.
Full citation: Panchal, S., Bhattacharyya, A., Berger, G., Mercier, A., Bohm, C., Dietrichkeit, F., … & Memisevic, R. (2024). What to Say and When to Say it: Live Fitness Coaching as a Testbed for Situated Interaction.
You can read the PDF in full online: https://arxiv.org/pdf/2407.08101