Thomas
Dethmann

ML Research Engineer | Musician
Audio enthusiast at the intersection of music, acoustics, and deep learning.
01 / About
Background
Speech & ML
Multi-modal LLMs, ASR, data augmentation, and deep learning for audio at Fraunhofer IAIS.
Audio Engineering
DSP, mixing, recording, and acoustics consulting.
Research
Award-winning thesis, ITG 2025 publication, and EU research projects.
Musician
Multi-instrumentalist, drummer, and three years of drum set teaching.
My background bridges the humanities and engineering. I hold degrees in English & Musicology (B.A.), Audio & Video Engineering (B.Eng.), and Audio Communication & Technology (M.Sc.), including a semester abroad at Lawrence Technological University in Southfield, MI, USA (GPA 4.0, Dean's Honor Roll).
Today I work as an ML Research Engineer at Fraunhofer IAIS, developing multi-modal large language models that integrate speech and text for ASR, spoken question answering, and translation (published at ITG Berlin 2025). I also build LLM- and ASR-based pipelines, including video summarization for the Lamarr Institute, data augmentation for production ASR systems in domains such as air-traffic control, prompt engineering for LLM fine-tuning as part of the MWL project, and multilingual translation models for a major German broadcaster.
Previously I interned as a Machine Learning Engineer at Apple, building deep neural networks for automatic drum transcription as part of my award-winning Master's thesis. I also worked at Brainworx Audio on hardware modeling and ML-based instrument classification, and at Fraunhofer HHI on real-time spectrum analysis for drone communications.
I've been making music since age ten and was the drummer of Fil der Protagonist. As a multi-instrumentalist with professional training in recording, mixing, and sound engineering, I also worked as an acoustics consultant — planning room acoustics for offices, schools, and concert halls — and spent three years teaching drum set to students of all ages and skill levels.
02 / Work
Projects
Automatic Drum Transcription
Deep learning-based automatic drum transcription (ADT) system. First, you hear the originally recorded drums, followed by the ADT version, in which all subtypes of drum hits are mapped to Kick, Snare, Tom, Hi-Hat, Crash, Ride, and Bell. Finally, the original transitions into the ADT version for direct comparison.
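The class reduction described above can be sketched as a simple lookup table. The seven target classes come from the project description; the fine-grained subtype names below are hypothetical examples, not the actual labels used in the system.

```python
# Hypothetical subtype labels mapped to the 7 target classes
# (Kick, Snare, Tom, Hi-Hat, Crash, Ride, Bell) from the project.
DRUM_CLASS_MAP = {
    "kick": "Kick",
    "snare_head": "Snare",
    "snare_rim": "Snare",
    "tom_low": "Tom",
    "tom_high": "Tom",
    "hh_closed": "Hi-Hat",
    "hh_open": "Hi-Hat",
    "hh_pedal": "Hi-Hat",
    "crash": "Crash",
    "ride": "Ride",
    "ride_bell": "Bell",
}

def reduce_vocabulary(events):
    """Collapse fine-grained (onset, subtype) events to the 7-class vocabulary."""
    return [(onset, DRUM_CLASS_MAP[label]) for onset, label in events]
```

For example, `reduce_vocabulary([(0.0, "hh_open"), (0.5, "snare_rim")])` yields `[(0.0, "Hi-Hat"), (0.5, "Snare")]`.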
COVID-19 Sonification
To better understand how COVID-19 case numbers evolved over time, we implemented a sonification of the RKI COVID-19 dataset using Python and Faust.
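One common sonification approach is to map case counts to pitch. The sketch below is a minimal illustration of that idea in Python only; the pitch range and log scaling are assumed for the example and are not taken from the actual project, which also used Faust for synthesis.

```python
import math

def cases_to_midi(cases, lo=36, hi=96):
    """Map daily case counts to MIDI pitches on a log scale.

    Log scaling (an assumption for this sketch) keeps small early
    outbreaks audible next to later, much larger waves.
    """
    max_cases = max(cases)
    pitches = []
    for c in cases:
        frac = math.log1p(c) / math.log1p(max_cases)  # 0.0 .. 1.0
        pitches.append(round(lo + frac * (hi - lo)))
    return pitches
```

Feeding a rising case series into `cases_to_midi` produces a rising pitch contour, which a synthesis backend can then render note by note.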
RTAP Synthesizer
A working Pure Data synthesizer patch with a custom external written in C. Developed as a collaborative student project.
03 / Music
Solo / Band
Final Exam
During my Bachelor's in Audio & Video Engineering, I studied drums for three years. This improvised solo was performed as part of my final exam.
Fil der Protagonist
Band performances and releases with Fil der Protagonist.
04 / Contact
Get in Touch
Interested in collaborating on ML, audio, or music tech projects? Feel free to reach out.