International conservation charity ZSL (Zoological Society of London) has made another leap forward in its battle to protect animals, using AI and machine learning (ML) from Google Cloud.

We’ve been privileged to partner with ZSL for three years, co-developing custom ML models to identify and better track endangered species around the world. The next dataset in ZSL’s arsenal to tackle animal conservation is sound—specifically gunshots captured by recording devices.

WWF estimates the illegal wildlife trade is worth about $20bn a year and has contributed to a catastrophic decline in some species. Technology, particularly machine learning, is at the forefront of conservation efforts, but standing up these systems in wildlife reserves is no walk in the park.

The analysis of acoustic (sound) data to support wildlife conservation is one of the major lines of work in ZSL’s monitoring and technology programme. Compared to camera traps, which are limited to detection at close range, acoustic sensors can detect events up to 1 kilometre (about half a mile) away. This has the potential to enable conservationists to track wildlife behaviour and threats over much greater areas.

In early 2018, ZSL deployed 69 acoustic recording devices in the northern sector of the Dja Faunal Reserve in Cameroon, central Africa. The objectives of the project were twofold: to collect acoustic data that could be analyzed for monitoring key endangered species, and to see whether the acoustic data could be used to investigate illegal hunting activity. Over the course of a month, ZSL’s acoustic devices captured 267 days’ worth of continuous audio totalling 350GB. Even one month’s worth of data would be too labor-intensive for a human to listen to and analyze manually, so ZSL’s research team worked in collaboration with Google Cloud to find a quicker solution.

Figure 1 – Map of the Dja Faunal Reserve


Using BigQuery & ML models to rapidly identify and label different sounds

ZSL was particularly interested in identifying and analysing instances of gunshots. For each audio file in the dataset, we needed to answer the following:

  1. Does it contain a gunshot? If so:
    1. At what time index did it occur?
    2. What is the confidence level?

The team leveraged a pre-trained machine learning model called YAMNet, originally developed and open-sourced by Google. YAMNet is a deep neural network that predicts 521 audio event classes and was trained on the soundtracks of millions of YouTube videos. YAMNet was used to recognize sound events in ZSL’s dataset, stored in Google Cloud Storage. The initial classification of 350GB worth of data took less than 15 minutes to complete and identified 1,746 instances with a high confidence of being gunshots.
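YAMNet emits one score vector per analysis frame (roughly every 0.48 seconds of audio), so answering “was there a gunshot, when, and with what confidence?” is a filtering step over those scores. The sketch below shows that post-processing in isolation, with a toy score matrix standing in for real model output; the 0.5 threshold is illustrative, while “Gunshot, gunfire” is the relevant AudioSet class name in YAMNet’s class map. (In a real pipeline the scores would come from the model itself, e.g. `hub.load("https://tfhub.dev/google/yamnet/1")`, which returns scores, embeddings, and a spectrogram for 16 kHz mono audio.)

```python
import numpy as np

# YAMNet's analysis hop: one score vector roughly every 0.48 s of audio.
HOP_SECONDS = 0.48

def gunshot_detections(scores, class_names, threshold=0.5):
    """Turn a (frames x classes) YAMNet-style score matrix into a list of
    {time_s, confidence} detections for the gunshot class."""
    gunshot_idx = class_names.index("Gunshot, gunfire")
    detections = []
    for frame, frame_scores in enumerate(scores):
        confidence = float(frame_scores[gunshot_idx])
        if confidence >= threshold:
            detections.append({"time_s": round(frame * HOP_SECONDS, 2),
                               "confidence": confidence})
    return detections

# Toy example: 4 frames, 3 classes, with a confident "gunshot" in frame 2.
classes = ["Silence", "Gunshot, gunfire", "Speech"]
toy_scores = np.array([[0.9, 0.02, 0.1],
                       [0.7, 0.10, 0.2],
                       [0.1, 0.85, 0.1],
                       [0.8, 0.05, 0.1]])
print(gunshot_detections(toy_scores, classes))
```

Each surviving detection carries exactly the fields the team needed downstream: a time offset into the file and a confidence level.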

The output from the classifier was pushed into a BigQuery table. Each classification was a row in the table, including details of the acoustic recording device, its location, the time at which the sound occurred, the confidence level that it contained a gunshot sound, and a reference to the originating audio file. This allowed ZSL to quickly query thousands of hours of recordings and focus only on the audio files with the highest probability of containing a gunshot sound, for further analysis.
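With the results in a table, surfacing the most promising files is a single SQL query. The sketch below composes such a query; the table and column names are illustrative, not ZSL’s actual schema, and executing it requires the google-cloud-bigquery client and credentials, so the client call is shown only as a comment:

```python
def top_gunshot_query(table, min_confidence=0.8, limit=100):
    """Compose a BigQuery SQL query returning the audio files most
    likely to contain a gunshot, highest confidence first."""
    return f"""
        SELECT device_id, device_lat, device_lon, event_time,
               confidence, audio_uri
        FROM `{table}`
        WHERE confidence >= {min_confidence}
        ORDER BY confidence DESC
        LIMIT {limit}
    """

# To run against BigQuery (needs google-cloud-bigquery and credentials):
#   from google.cloud import bigquery
#   rows = bigquery.Client().query(
#       top_gunshot_query("my-project.acoustics.detections")).result()

print(top_gunshot_query("my-project.acoustics.detections", 0.9, 10))
```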

These instances would need to be manually listened to and visually inspected as spectrograms to be confirmed as gunshots, so ZSL needed an easy way to listen to the audio clips identified as containing gunshots. The next step was therefore to build a Jupyter notebook using AI Platform to load, visualise, and listen to a sample of audio files and validate the model’s findings, as shown in Figure 2.

The team used the BigQuery API to return the Cloud Storage URL of each file corresponding to a gunshot instance identified with high confidence. Each audio file was then visualized as a spectrogram (to speed up validation), with a button that let researchers play back the sound without leaving the notebook environment.

Figure 2: Audio file visualisation using AI Platform Notebooks. Spectrograms are visual representations of sound where the x-axis is time and the y-axis is frequency. (a) Spectrogram of a true gunshot: note the sudden onset at around 1.25 seconds and a steadily decaying tail; (b) Spectrogram of a false positive: a single hammer strike during device installation with no decaying tail.


Another benefit of storing the results in BigQuery was that location data for each acoustic recording device could easily be cross-referenced with the instances of gunshot classifications attributed to that device. The team then visualised this data using the native geospatial capabilities in BigQuery (BigQuery GIS), as shown in Figure 3. The circles represent the positions of the acoustic monitors, and the size and opacity of the circles represent the density of instances identified as gunshots at each monitor. As machine learning models improve, this type of analysis could help pinpoint locations where law enforcement is required, or where extra monitoring is necessary.
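The per-device density behind Figure 3 maps naturally onto a GROUP BY plus a BigQuery GIS geography column. A hedged sketch of such a query follows (again with illustrative table and column names); its output can be plotted directly in BigQuery’s Geo Viz tool. Since GEOGRAPHY values can’t be grouped on directly, the point is built from `ANY_VALUE` of the device coordinates after grouping by device:

```python
def gunshot_density_query(table, min_confidence=0.8):
    """Compose a BigQuery GIS query counting high-confidence gunshot
    detections per device, with an ST_GeogPoint for map plotting."""
    return f"""
        SELECT device_id,
               ST_GeogPoint(ANY_VALUE(device_lon),
                            ANY_VALUE(device_lat)) AS location,
               COUNT(*) AS gunshot_count
        FROM `{table}`
        WHERE confidence >= {min_confidence}
        GROUP BY device_id
        ORDER BY gunshot_count DESC
    """

print(gunshot_density_query("my-project.acoustics.detections"))
```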

Figure 3 – Output from the BigQuery Geo Viz tool: gunshot instance classification density by acoustic monitoring station


Key findings & next steps

ZSL confirmed three unique gunshots that took place during the study, each at a different location, date, and time. Manual validation is tedious work: validating the instances returned by the classifier took about 2.5 hours, whereas ordinarily this task could have taken a team of researchers many months of effort.

In this short one-month study, which covered only a portion of the reserve, the research team were able to contribute new insights into the human threats to species in the Dja reserve. Past data suggests gunshots are more likely to take place at night to evade ranger detection, but using ecoacoustics alone, ZSL provided evidence of illegal hunting occurring during the day.

Google Cloud provides ZSL with storage and quick analysis of large amounts of data for conservation purposes. In combination with low-cost acoustic devices, this rapid data processing pipeline paves the way for expanding the current study to monitor entire reserves for longer periods of time. This will ensure that ZSL can identify hotspots and seasonality of threats to wildlife, and begin to inform and direct local law enforcement.

Down the road, ZSL’s findings will also inform the development of on-device threat classification to enable longer, cheaper monitoring and, ultimately, real-time alerts. With animal populations under enormous pressure, technology, and in particular machine learning, has huge potential to enable conservation groups like ZSL to deploy their resources more efficiently in the battle against the illegal wildlife trade.


Omer Mahmood, Head of Customer Engineering, CPG & Travel, UK & IE

Source: Google Cloud Blog
