<Photo 1. (From left) Ph.D. candidate Young-hoo Kwon, M.S. candidate Do-hwan Kim, Professor Jung-woo Choi, Dr. Dong-heon Lee>
'Acoustic separation and classification technology' is a next-generation core artificial intelligence (AI) technology that enables the early detection of abnormal sounds from drones, faults in factory pipelines, and intrusions in border surveillance systems, and that allows spatial audio to be separated and edited by sound source when producing AR/VR content.
On the 11th of July, a research team led by Professor Jung-woo Choi of KAIST's Department of Electrical and Electronic Engineering won first place in the 'Spatial Semantic Segmentation of Sound Scenes' task of the 'DCASE2025 Challenge,' the world's most prestigious acoustic detection and analysis competition.
This year’s challenge featured 86 teams competing across six tasks. The KAIST team achieved the best performance in its first-ever participation in Task 4. Professor Jung-woo Choi’s research team consisted of Dr. Dong-heon Lee, Ph.D. candidate Young-hoo Kwon, and M.S. candidate Do-hwan Kim.
Task 4, 'Spatial Semantic Segmentation of Sound Scenes,' is a highly demanding task that requires analyzing the spatial information in multi-channel audio signals containing overlapping sound sources, separating the individual sounds, and classifying them into 18 predefined categories. The research team plans to present its technology at the DCASE workshop in Barcelona this October.
<Figure 1. Example of an acoustic scene with multiple mixed sounds>
Early this year, Dr. Dong-heon Lee developed a state-of-the-art sound source separation AI that combines Transformer and Mamba architectures. During the competition, the team, led by researcher Young-hoo Kwon, completed a 'chain-of-inference architecture' AI model that performs sound source separation and classification a second time, using the waveforms and types of the initially separated sound sources as clues. This model is inspired by the human auditory scene analysis mechanism, which isolates individual sounds by focusing on partial clues such as sound type, rhythm, or direction when listening to complex sound scenes.
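The sketch below illustrates, in highly simplified form, the chain-of-inference idea described above: a first pass separates the mixture and classifies the rough estimates, and a second pass separates again while using the first-pass results as extra clues. The module names, shapes, and conditioning scheme are hypothetical placeholders for illustration, not the team's actual Transformer-Mamba architecture.

```python
# Minimal, illustrative sketch of a two-pass "chain-of-inference" pipeline.
# All components are toy stand-ins; only the overall control flow matters here.
import torch
import torch.nn as nn

NUM_CLASSES = 18   # sound categories in DCASE2025 Task 4
MAX_SOURCES = 3    # assumed upper bound on overlapping sources (hypothetical)

class ToySeparator(nn.Module):
    """Stand-in separator: maps a mixture to MAX_SOURCES waveform estimates."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Conv1d(1, MAX_SOURCES, kernel_size=1)

    def forward(self, mixture, clues=None):
        x = mixture.unsqueeze(1)                    # (batch, 1, time)
        if clues is not None:
            # Crude conditioning on first-pass estimates; a real system would
            # inject the estimated waveforms and class labels more carefully.
            x = x + clues.sum(dim=1, keepdim=True)
        return self.proj(x)                         # (batch, MAX_SOURCES, time)

class ToyClassifier(nn.Module):
    """Stand-in classifier: predicts a class distribution per separated source."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(1, NUM_CLASSES)

    def forward(self, sources):
        pooled = sources.mean(dim=-1, keepdim=True)  # (batch, MAX_SOURCES, 1)
        return self.head(pooled)                     # (batch, MAX_SOURCES, NUM_CLASSES)

def chain_of_inference(mixture, separator, classifier):
    # Pass 1: separate the mixture, then classify the rough estimates.
    est_1 = separator(mixture)
    logits_1 = classifier(est_1)
    # Pass 2: separate and classify again, conditioned on the first-pass clues.
    est_2 = separator(mixture, clues=est_1)
    logits_2 = classifier(est_2)
    return est_2, logits_2

if __name__ == "__main__":
    mix = torch.randn(2, 16000)  # two 1-second mixtures at 16 kHz
    sources, logits = chain_of_inference(mix, ToySeparator(), ToyClassifier())
    print(sources.shape, logits.shape)  # (2, 3, 16000), (2, 3, 18)
```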
Through this, the team was the only participant to achieve double-digit performance (11 dB) on 'Class-Aware Signal-to-Distortion Ratio Improvement (CA-SDRi)', the metric used to rank how well the AI separated and classified sounds, proving its technical excellence.
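For context, the computation behind a signal-to-distortion ratio improvement score is sketched below: the SDR of each separated estimate is compared against the SDR of the unprocessed mixture with respect to the reference source, and the improvements are averaged over the classes. The exact class-aware weighting and the handling of missed or spurious classes follow the official DCASE2025 Task 4 rules, which this toy function does not reproduce.

```python
# Hedged sketch of an SDR-improvement calculation averaged over classes.
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-9) -> float:
    """Plain signal-to-distortion ratio in dB."""
    noise = estimate - reference
    return 10.0 * np.log10((np.sum(reference**2) + eps) / (np.sum(noise**2) + eps))

def ca_sdri_sketch(references: dict, estimates: dict, mixture: np.ndarray) -> float:
    """Average SDR improvement over classes present in the references.

    references / estimates: {class_label: waveform}; only matching labels are scored.
    """
    improvements = []
    for label, ref in references.items():
        est = estimates.get(label)
        if est is None:
            continue  # the real CA-SDRi penalizes missed classes; omitted here
        improvements.append(sdr(ref, est) - sdr(ref, mixture))
    return float(np.mean(improvements)) if improvements else 0.0
```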
Prof. Jung-woo Choi remarked, "The research team has showcased world-leading acoustic separation AI models for the past three years, and I am delighted that these results have been officially recognized." He added, "I am proud of every member of the research team for winning first place through focused research, despite the significant increase in difficulty and having only a few weeks for development."
<Figure 2. Time-frequency patterns of sound sources separated from a mixed source>
The IEEE DCASE Challenge 2025 was held online, with submissions accepted from April 1 to June 15 and results announced on June 30. Since its launch in 2013, the DCASE Challenge has served as a premier global platform of the IEEE Signal Processing Society for showcasing cutting-edge AI models in acoustic signal processing.
This research was supported by the Mid-Career Researcher Support Project and the STEAM Research Project of the National Research Foundation of Korea, funded by the Ministry of Education, Science and Technology, as well as by the Future Defense Research Center, funded by the Defense Acquisition Program Administration and the Agency for Defense Development.