Mapping Soundscapes Everywhere


Imagine yourself on a beautiful beach. You probably picture sand and sea, but you also hear a symphony of howling winds, crashing waves and squawking seagulls. In this landscape – as in urban settings filled with talking neighbors, barking dogs and traffic – sounds are essential components of the overall sense of place.


In fact, sound is one of the primary senses that helps humans understand their environments, and healthy environmental conditions have been shown to have a strong relationship with human mental and physical health. Reliable methods for understanding the soundscape of a given geographic area are thus valuable for applications ranging from collective policy making about urban planning and noise management to individual decisions about where to buy a home or establish a business.

Nathan Jacobs, a professor of computer science and engineering at the McKelvey School of Engineering at Washington University in St. Louis, along with graduate students Subash Khanal, Sreekumar Sastry, and Ayush Dhakal, developed Geography-Aware Contrastive Language Audio Pre-training (GeoCLAP), a new soundscape mapping framework that can be applied anywhere in the world. They presented their work on November 22 at the British Machine Vision Conference in Aberdeen, UK.

The team’s main innovation comes from their use of three different modalities, or types of data, in their framework: geotagged audio, text descriptions and overhead images. Unlike previous approaches to soundscape mapping that used only two modalities, GeoCLAP’s richer understanding allows users to generate probable soundscapes from text or audio queries for any geographic location.
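To make the tri-modal idea concrete, here is a minimal NumPy sketch of how three modalities can be aligned in a shared embedding space with pairwise contrastive (InfoNCE-style) losses. This is an illustrative toy, not the authors' implementation; the embedding dimensions, batch size, and temperature are arbitrary assumptions, and the random vectors stand in for real encoder outputs.

```python
import numpy as np

def normalize(x):
    # project embeddings onto the unit sphere for cosine similarity
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def contrastive_loss(a, b, temperature=0.07):
    # symmetric InfoNCE-style loss: matched pairs (row i of a and b)
    # should score higher than all mismatched pairs
    logits = normalize(a) @ normalize(b).T / temperature
    labels = np.arange(len(a))
    log_sm_ab = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_sm_ba = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return -(log_sm_ab[labels, labels].mean() + log_sm_ba[labels, labels].mean()) / 2

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 16))   # stand-in overhead-image embeddings
aud = rng.normal(size=(4, 16))   # stand-in geotagged-audio embeddings
txt = rng.normal(size=(4, 16))   # stand-in text-description embeddings

# tri-modal objective: sum of the three pairwise contrastive losses
total = (contrastive_loss(img, aud)
         + contrastive_loss(img, txt)
         + contrastive_loss(aud, txt))
```

Training with such an objective pulls the three embeddings of the same location together, which is what later enables querying one modality with another.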

“We have developed a simple and scalable method to create an audio map of any geographic area,” Jacobs said. “Our approach overcomes the limitations of previous soundscape mapping methods, which were either rule-based, often missing important sound sources, or relied on direct human observations, which are difficult to obtain in sufficient quantities away from popular tourist destinations. By leveraging the intrinsic relationship between audio and local visual cues, our tool, built on freely available multimedia and overhead imagery, allows us to create audio maps of any region in the world.”
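Once the modalities share an embedding space, building an audio map reduces to scoring overhead-image embeddings against a text or audio query. The sketch below illustrates that zero-shot step with random vectors standing in for real encoder outputs; the grid size and dimensions are assumptions for illustration, not values from the paper.

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(1)

# hypothetical precomputed overhead-image embeddings, one per map cell
grid_embeddings = normalize(rng.normal(size=(100, 16)))

# a text or audio query embedded into the same shared space
query = normalize(rng.normal(size=(16,)))

# cosine similarity of each cell to the query forms the soundscape map:
# higher score = the queried sound is more probable at that location
scores = grid_embeddings @ query
best_cell = int(np.argmax(scores))
```

In practice the scores over the grid would be rendered as a heat map over the region of interest.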

Khanal S, Sastry S, Dhakal A, Jacobs N. Learning Tri-Modal Embeddings for Zero-Shot Soundscape Mapping. British Machine Vision Conference (BMVC), 20-24 November 2023.

Originally published on the McKelvey School of Engineering website.
