You’re standing in the middle of a forest. Close your eyes: what do you hear? Leaves rustling in the wind, birdsong… the chirruping of tiny wings vibrating at impossible speeds. It might be nature’s own heartbeat.
These diminutive musicians – insects – could hold the key to protecting our wild spaces, says Lucas Unterberger, a data engineer in Capgemini’s Insights & Data team in Austria.
“Scientists have found a link between the range of insects present in a given location and the overall biodiversity level of that space,” he explains. “However, there are simply not enough trained entomologists out there to classify insects in situ on a large scale.”
Could artificial intelligence and machine learning be harnessed to reliably identify insect species through their chirps alone?
The “Biodiversity Buzz”
This was the question at the heart of Capgemini’s Global Data Science Challenge (GDSC) for 2023, which was launched in association with researchers from the Naturalis Biodiversity Center in Leiden, the Netherlands.
Dominik Lemm, a data scientist also based in Austria, and part of the winning team, describes the specific task: “The competing teams were asked to design a model that could accurately classify audio recordings of cicada and grasshopper ‘chirps’ into one of 66 sub-species. The team with the highest identification rate would be the winner.”
According to Dominik, this is just one small step towards a larger goal: helping the Naturalis researchers develop an economically viable system for identifying different types of insects through the analysis of audio files recorded by microphones placed in outdoor environments.
“If scientists can identify insect species quickly using AI, they will be able to monitor an area’s biodiversity rating, giving a boost to global conservation efforts,” he explains.
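To make the task concrete, here is a minimal sketch of what "classifying chirps" and scoring by "identification rate" can look like. It is not the winning team's model, which the article does not describe. Everything in it is an assumption for illustration: the recordings are synthetic pure tones, the three species labels are hypothetical (the challenge used 66 classes), the single feature is a zero-crossing rate, and the classifier is a simple nearest-centroid rule.

```python
import math

def synth_chirp(freq_hz, sample_rate=8000, duration_s=0.25):
    """Generate a pure tone as a toy stand-in for an insect recording."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose sign differs; rises with pitch."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(signal) - 1)

def fit_centroids(features, labels):
    """Average the feature value per label (a nearest-centroid 'model')."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, feature):
    """Return the label whose centroid is closest to the feature value."""
    return min(centroids, key=lambda y: abs(centroids[y] - feature))

def identification_rate(y_true, y_pred):
    """Share of recordings assigned the correct label (plain accuracy)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical training data: (tone frequency in Hz, made-up species label).
train = [(480, "species_a"), (520, "species_a"),
         (1150, "species_b"), (1250, "species_b"),
         (2400, "species_c"), (2600, "species_c")]
test = [(510, "species_a"), (1220, "species_b"), (2450, "species_c")]

train_features = [zero_crossing_rate(synth_chirp(f)) for f, _ in train]
centroids = fit_centroids(train_features, [label for _, label in train])

predictions = [predict(centroids, zero_crossing_rate(synth_chirp(f))) for f, _ in test]
rate = identification_rate([label for _, label in test], predictions)
print(f"identification rate: {rate:.2f}")
```

Real field recordings are far noisier than pure tones, so a practical system would likely need richer features (spectrograms, for example) and a learned model. The scoring idea stays the same: count how often the predicted label matches the true one.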
Data scientists, assemble!
Once again, the GDSC received huge interest from Capgemini colleagues around the world. Lucas, who had previously entered, says the number of participants rocketed from a few hundred to more than a thousand. “With so much international competition, it’s incredibly tough to win. But it’s the perfect opportunity to learn new skills and technologies,” he says.
When this year’s event was announced, Lucas quickly assembled a “dream team,” which, as well as Dominik, included Raffaela Heily, a mathematician and data scientist, and Lukas Kemetinger, who participated as a student consultant with a focus on AI.
Dominik, with a background in machine learning research, found himself recruited almost as soon as he joined the company. “It might have been my first day when Lucas asked me if I knew anything about audio machine learning. I was fascinated by the challenge and keen to dive right into the topic.”
With Lucas acting as project manager – a new experience for him – the team quickly divided responsibilities according to their specialisms and scheduled weekly meetings to discuss ideas and progress.
A winning formula
“The Naturalis team had prepared the dataset for us,” says Dominik. “But it was up to us to find the best way to process it. Under the rules, we were able to draw on resources from Amazon Web Services (AWS), worth up to $100 per week, to hone our models.”
The team’s strategy of establishing a prototype as quickly as possible paid off. “After Lucas mentioned the idea of insect sounds, I started reading up on the literature and considering possible approaches,” says Dominik. “After the second or third week, we had a model that was working quite well, which left us time for fine-tuning.”
The fact that they all worked in the Vienna office also gave them an edge. “We communicated and harmonized really effectively,” confirms Lukas. “For instance, we would discuss the project on Friday evenings over a beer. We had a great rapport and team spirit.”
While their solution is currently restricted to cicada and grasshopper identification, it might one day be expanded to identify almost any insect or animal sound. “As part of the prize, we’re planning a trip to the Netherlands to meet the Naturalis scientists and discuss how to take our model to the next level,” says Lucas.
And this is really the point of the GDSC – bridging the gap between scientific research and industry, focusing the brightest data science minds in the world intensely on a single social or environmental problem. “It’s a wonderful way to fast-track innovation in the area of sustainability, and a clear demonstration of the advantages machine learning can bring to biodiversity challenges,” says Lucas. “With so much uncertainty around AI, it’s great to show how this technology could shape a better future – for people and the planet.”