Thomas
Dethmann

ML Research Engineer | Musician
Audio enthusiast at the intersection of music, acoustics, and deep learning.
01 / About
Background
Speech & ML
Multi-modal LLMs, ASR, data augmentation, and deep learning for audio at Fraunhofer IAIS.
Audio Engineering
DSP, mixing, recording, and acoustics consulting.
Research
Award-winning thesis, ITG 2025 publication, and EU research projects.
Musician
Multi-instrumentalist, drummer, and three years of drum set teaching.
My background bridges the humanities and engineering. I hold degrees in English & Musicology (B.A.), Audio & Video Engineering (B.Eng.), and Audio Communication & Technology (M.Sc.), including a semester abroad at Lawrence Technological University in Southfield, MI, USA (GPA 4.0, Dean's Honor Roll).
Today I work as an ML Research Engineer at Fraunhofer IAIS, developing multi-modal large language models that integrate speech and text for ASR, spoken question answering, and translation (published at ITG Berlin 2025). I also build LLM- and ASR-based pipelines, including video summarization for the Lamarr Institute, data augmentation for production ASR systems in domains such as air-traffic control, prompt engineering for LLM fine-tuning as part of the MWL project, and multilingual translation models for a major German broadcaster.
Previously I interned as a Machine Learning Engineer at Apple, building deep neural networks for automatic drum transcription as part of my award-winning Master's thesis. I also worked at Brainworx Audio on hardware modeling and ML-based instrument classification, and at Fraunhofer HHI on real-time spectrum analysis for drone communications.
I've been making music since age ten and was the drummer of Fil der Protagonist. As a multi-instrumentalist with professional training in recording, mixing, and sound engineering, I also worked as an acoustics consultant — planning room acoustics for offices, schools, and concert halls — and spent three years teaching drum set to students of all ages and skill levels.
02 / Work
Projects
Automatic Drum Transcription
Deep learning-based automatic drum transcription (ADT) system. First, you hear the originally recorded drums, followed by the ADT version, in which all subtypes of drum hits are mapped to Kick, Snare, Tom, Hi-Hat, Crash, Ride, and Bell. Finally, the original transitions into the ADT version for direct comparison.
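The class reduction described above can be sketched as a simple lookup table. The seven target classes come from the project description; the fine-grained subtype names below are hypothetical examples, not the actual labels used in the system.

```python
# Hypothetical subtype labels mapped to the 7 target classes
# (Kick, Snare, Tom, Hi-Hat, Crash, Ride, Bell) from the project.
DRUM_CLASS_MAP = {
    "kick": "Kick",
    "snare_head": "Snare",
    "snare_rim": "Snare",
    "tom_low": "Tom",
    "tom_high": "Tom",
    "hh_closed": "Hi-Hat",
    "hh_open": "Hi-Hat",
    "hh_pedal": "Hi-Hat",
    "crash": "Crash",
    "ride": "Ride",
    "ride_bell": "Bell",
}

def reduce_vocabulary(events):
    """Collapse fine-grained (onset, subtype) events to the 7-class vocabulary."""
    return [(onset, DRUM_CLASS_MAP[label]) for onset, label in events]
```

For example, `reduce_vocabulary([(0.0, "hh_open"), (0.5, "snare_rim")])` yields `[(0.0, "Hi-Hat"), (0.5, "Snare")]`.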
COVID-19 Sonification
To better understand how COVID-19 case numbers evolved over time, we implemented a sonification of the RKI COVID-19 dataset using Python and Faust.
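One common sonification approach is to map case counts to pitch. The sketch below is a minimal illustration of that idea in Python only; the pitch range and log scaling are assumed for the example and are not taken from the actual project, which also used Faust for synthesis.

```python
import math

def cases_to_midi(cases, lo=36, hi=96):
    """Map daily case counts to MIDI pitches on a log scale.

    Log scaling (an assumption for this sketch) keeps small early
    outbreaks audible next to later, much larger waves.
    """
    max_cases = max(cases)
    pitches = []
    for c in cases:
        frac = math.log1p(c) / math.log1p(max_cases)  # 0.0 .. 1.0
        pitches.append(round(lo + frac * (hi - lo)))
    return pitches
```

Feeding a rising case series into `cases_to_midi` produces a rising pitch contour, which a synthesis backend can then render note by note.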
RTAP Synthesizer
A working Pure Data synthesizer patch with a custom external written in C. Developed as a collaborative student project.
03 / Music
Solo / Band
Final Exam
During my Bachelor's in Audio & Video Engineering, I studied drums for three years. This improvised solo was performed as part of my final exam.
Fil der Protagonist
Band performances and releases with Fil der Protagonist.
04 / Contact
Get in Touch
Interested in collaborating on ML, audio, or music tech projects? Feel free to reach out.