Project At A Glance

Objective: Locate and count the Capuchinbird calls in 100 three-minute clips recorded in various regions of a rainforest. Mapping the number of calls to the species' population density throughout the area can then yield meaningful insights.

Data: Kaggle Dataset for Z by HP's Unlocked: Challenge 3

Implementation: Audio EDA, Pre-Processing and Slicing, Spectrogram Visualizations, Normalization, Deep Neural Network (Conv2D, Flatten, Dense Layers)

Results:

After training for only two epochs, the model reached a training precision of 0.9795 and a training recall of 0.9728 (0.9855 for both on the validation set).
The output file, Results.csv, lists the number of Capuchinbird calls detected in each recording.
The project was part of my submission for Z by HP's Unlocked: Audio Analysis Challenge.

Deployment: View this project on GitHub.

Dependencies

!pip install matplotlib tensorflow tensorflow-gpu tensorflow-io

Versions used: matplotlib 3.5.2, tensorflow 2.9.0, tensorflow-gpu 2.9.0, tensorflow-io 0.26.0.

import os
from matplotlib import pyplot as plt
import tensorflow as tf
import tensorflow_io as tfio

Data

Loading

CAPUCHIN_FILE= os.path.join('data', 'Parsed_Capuchinbird_Clips', 'XC3776-3.wav')
NOT_CAPUCHIN_FILE= os.path.join('data', 'Parsed_Not_Capuchinbird_Clips', 'afternoon-birds-song-in-forest-0.wav')

file_contents= tf.io.read_file(CAPUCHIN_FILE)

Decoding

wav, sample_rate= tf.audio.decode_wav(file_contents, desired_channels= 1)

wav

<tf.Tensor: shape=(132300, 1), dtype=float32, numpy=
array([[-0.11117554],
       [-0.0378418 ],
       [ 0.05856323],
       ...,
       [-0.01077271],
       [-0.03436279],
       [-0.04879761]], dtype=float32)>

sample_rate

<tf.Tensor: shape=(), dtype=int32, numpy=44100>

Primitive Pre-Processing

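def load_wav_16k_mono(filename):
    # Read the raw bytes and decode them into a single-channel float tensor in [-1, 1]
    file_contents= tf.io.read_file(filename)
    wav, sample_rate= tf.audio.decode_wav(file_contents, desired_channels= 1)
    # Drop the trailing channel axis: (samples, 1) -> (samples,)
    wav= tf.squeeze(wav, axis= -1)
    sample_rate= tf.cast(sample_rate, dtype=tf.int64)
    # Resample from the file's native rate (44.1 kHz for the clip above) to 16 kHz
    wav= tfio.audio.resample(wav, rate_in= sample_rate, rate_out= 16000)
    return wav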
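Resampling changes the number of samples but not the duration of a clip. A minimal sketch of the arithmetic, assuming the output length is simply the duration times the new rate (the tfio resampler may differ by a sample or so at the edges); `resampled_length` is a hypothetical helper:

```python
def resampled_length(num_samples: int, rate_in: int, rate_out: int) -> int:
    """Approximate sample count after resampling from rate_in to rate_out (Hz)."""
    duration_s = num_samples / rate_in  # clip length in seconds
    return round(duration_s * rate_out)

# The Capuchinbird clip decoded above: 132300 samples at 44100 Hz
print(132300 / 44100)                          # 3.0 seconds
print(resampled_length(132300, 44100, 16000))  # 48000 samples at 16 kHz
```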

wave= load_wav_16k_mono(CAPUCHIN_FILE)
nwave= load_wav_16k_mono(NOT_CAPUCHIN_FILE)

Visualizations

plt.plot(wave)
plt.show()

plt.plot(nwave)
plt.show()

plt.plot(wave)
plt.plot(nwave)
plt.show()

POS= os.path.join('data', 'Parsed_Capuchinbird_Clips')
NEG= os.path.join('data', 'Parsed_Not_Capuchinbird_Clips')

pos= tf.data.Dataset.list_files(os.path.join(POS, '*.wav'))
neg= tf.data.Dataset.list_files(os.path.join(NEG, '*.wav'))

len(pos), len(neg)

(217, 593)

tf.ones(len(pos))

<tf.Tensor: shape=(217,), dtype=float32, numpy=
array([1., 1., 1., ..., 1., 1., 1.], dtype=float32)>

tf.zeros(len(neg))

<tf.Tensor: shape=(593,), dtype=float32, numpy=
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)>

pos.as_numpy_iterator().next()

b'data\\Parsed_Capuchinbird_Clips\\XC178168-3.wav'

positives= tf.data.Dataset.zip((pos, tf.data.Dataset.from_tensor_slices(tf.ones(len(pos)))))
negatives= tf.data.Dataset.zip((neg, tf.data.Dataset.from_tensor_slices(tf.zeros(len(neg)))))

data= positives.concatenate(negatives)

data.shuffle(10000).as_numpy_iterator().next()

(b'data\\Parsed_Not_Capuchinbird_Clips\\crickets-chirping-crickets-sound-27.wav',
 0.0)

Clip Duration, Normalization and Zero-Padding

lengths= []
for file in os.listdir(os.path.join('data', 'Parsed_Capuchinbird_Clips')):
    tensor_wave= load_wav_16k_mono(os.path.join('data', 'Parsed_Capuchinbird_Clips', file))
    lengths.append(len(tensor_wave))


lengths[:10]

[40000, 48000, 56000, 48000, 56000, 64000, 64000, 64000, 56000, 56000]

mean= tf.math.reduce_mean(lengths)
min_len= tf.math.reduce_min(lengths)  # avoid shadowing Python's built-in min/max
max_len= tf.math.reduce_max(lengths)

mean, min_len, max_len

(<tf.Tensor: shape=(), dtype=int32, numpy=54156>,
 <tf.Tensor: shape=(), dtype=int32, numpy=32000>,
 <tf.Tensor: shape=(), dtype=int32, numpy=80000>)
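Dividing these sample counts by the 16 kHz rate gives the clip durations in seconds, which puts the 48000-sample (3 s) cutoff used below in context. A small sketch; `SAMPLE_RATE` and `to_seconds` are illustrative names, not part of the original code:

```python
SAMPLE_RATE = 16000  # Hz, the rate load_wav_16k_mono resamples to

def to_seconds(num_samples: int, rate: int = SAMPLE_RATE) -> float:
    """Convert a sample count to a duration in seconds."""
    return num_samples / rate

# mean, min and max lengths reported above, plus the 48000-sample window
for n in (54156, 32000, 80000, 48000):
    print(n, round(to_seconds(n), 2))
# prints: 54156 3.38 / 32000 2.0 / 80000 5.0 / 48000 3.0
```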

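def preprocess(file_path, label):
    wav= load_wav_16k_mono(file_path)
    # Truncate to 3 s (48000 samples at 16 kHz), then left-pad shorter clips with zeros
    wav= wav[:48000]
    zero_padding= tf.zeros([48000]- tf.shape(wav), dtype= tf.float32)
    wav= tf.concat([zero_padding, wav], 0)
    
    # Magnitude of the short-time Fourier transform: (frames, frequency bins)
    spectrogram= tf.signal.stft(wav, frame_length= 320, frame_step= 32)
    spectrogram= tf.abs(spectrogram)
    # Add a channel axis so Conv2D can treat the spectrogram like a grayscale image
    spectrogram= tf.expand_dims(spectrogram, axis= 2)
    
    return spectrogram, label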
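The truncation and padding steps in `preprocess` force every clip to exactly 48000 samples: longer clips are cut, shorter ones get zeros prepended. The same logic in plain Python, as an illustration only (`pad_or_trim` is a hypothetical name, and lists stand in for tensors):

```python
def pad_or_trim(samples, target_len=48000):
    """Keep the first target_len samples; left-pad shorter clips with zeros."""
    clipped = list(samples[:target_len])
    padding = [0.0] * (target_len - len(clipped))
    return padding + clipped  # zeros go in front, as in tf.concat([zero_padding, wav], 0)

short_clip = pad_or_trim([0.5, -0.5], target_len=5)
long_clip = pad_or_trim([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], target_len=5)
print(short_clip)  # [0.0, 0.0, 0.0, 0.5, -0.5]
print(long_clip)   # [0.1, 0.2, 0.3, 0.4, 0.5]
```

Because the padding is prepended, the audio of short clips always ends at the right edge of the spectrogram.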

file_path, label= positives.shuffle(buffer_size= 10000).as_numpy_iterator().next()

Spectrogram

spectrogram, label= preprocess(file_path, label)

spectrogram

<tf.Tensor: shape=(1491, 257, 1), dtype=float32, numpy=
array([[[0.0000000e+00],
        [0.0000000e+00],
        [0.0000000e+00],
        ...,
        [0.0000000e+00],
        [0.0000000e+00],
        [0.0000000e+00]],

       [[0.0000000e+00],
        [0.0000000e+00],
        [0.0000000e+00],
        ...,
        [0.0000000e+00],
        [0.0000000e+00],
        [0.0000000e+00]],

       [[0.0000000e+00],
        [0.0000000e+00],
        [0.0000000e+00],
        ...,
        [0.0000000e+00],
        [0.0000000e+00],
        [0.0000000e+00]],

       ...,

       [[3.4047730e-02],
        [3.1229634e-02],
        [3.8121101e-02],
        ...,
        [2.8812696e-07],
        [3.1764142e-07],
        [2.9243529e-07]],

       [[2.2530884e-02],
        [2.1241199e-02],
        [2.0391349e-02],
        ...,
        [4.3568735e-07],
        [3.9645255e-07],
        [1.6391277e-07]],

       [[1.0637306e-02],
        [8.4740259e-03],
        [5.8991811e-03],
        ...,
        [7.1185252e-07],
        [3.2554652e-07],
        [8.9406967e-08]]], dtype=float32)>
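The (1491, 257, 1) shape follows from the STFT parameters. With `tf.signal.stft`'s defaults (no end padding, and an FFT length equal to the smallest power of two that holds `frame_length`), the frame and frequency-bin counts can be checked by hand; `stft_shape` is an illustrative helper, not a TensorFlow function:

```python
def stft_shape(num_samples: int, frame_length: int, frame_step: int):
    """(frames, frequency bins) for an STFT without end padding."""
    fft_length = 1
    while fft_length < frame_length:  # smallest power of two >= frame_length
        fft_length *= 2
    frames = 1 + (num_samples - frame_length) // frame_step
    bins = fft_length // 2 + 1  # one-sided spectrum of a real signal
    return frames, bins

print(stft_shape(48000, 320, 32))  # (1491, 257)
```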

label

1.0

plt.figure(figsize= (30, 20))
plt.imshow(tf.transpose(spectrogram)[0])
plt.show()

Data Preparation Pipeline

data= data.map(preprocess)
data= data.cache()
data= data.shuffle(buffer_size= 1000)
data= data.batch(16)
data= data.prefetch(8)


len(data)

51
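The 810 labelled clips (217 + 593) in batches of 16 give 51 batches, the last one partial, and the 36/15 split below holds out roughly 30% of the batches for testing. The arithmetic, as a quick sanity check:

```python
import math

num_clips = 217 + 593          # positives + negatives listed earlier
batch_size = 16
num_batches = math.ceil(num_clips / batch_size)
print(num_batches)             # 51 (50 full batches + one batch of 10)

train_batches = 36
test_batches = num_batches - train_batches
print(test_batches, round(test_batches / num_batches, 2))  # 15 0.29
```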

Train-Test Split

train= data.take(36)
test= data.skip(36).take(15)

samples, labels= train.as_numpy_iterator().next()
samples.shape

(16, 1491, 257, 1)

labels

array([0., 0., 0., 1., 0., 0., 1., 1., 1., 0., 0., 0., 0., 1., 0., 1.],
      dtype=float32)

Model Setup and Layers

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten

model= Sequential()
model.add(Conv2D(16, (3, 3), activation= 'relu', input_shape= (1491, 257, 1)))
model.add(Conv2D(16, (3, 3), activation= 'relu'))
model.add(Flatten())
model.add(Dense(128, activation= 'relu'))
model.add(Dense(1, activation= 'sigmoid'))

model.compile('Adam', loss= 'BinaryCrossentropy', metrics= [tf.keras.metrics.Recall(), tf.keras.metrics.Precision()])
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 1489, 255, 16)     160       
                                                                 
 conv2d_1 (Conv2D)           (None, 1487, 253, 16)     2320      
                                                                 
 flatten (Flatten)           (None, 6019376)           0         
                                                                 
 dense (Dense)               (None, 128)               770480256 
                                                                 
 dense_1 (Dense)             (None, 1)                 129       
                                                                 
=================================================================
Total params: 770,482,865
Trainable params: 770,482,865
Non-trainable params: 0
_________________________________________________________________
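Almost all of the 770 million parameters sit in the first Dense layer, because Flatten hands it every value of the 1487 × 253 × 16 feature map. The counts in the summary can be reproduced by hand (weights plus one bias per filter or unit); the helper names are illustrative:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    return (kernel_h * kernel_w * in_channels + 1) * filters  # +1 for the bias

def dense_params(inputs, units):
    return (inputs + 1) * units  # +1 for the bias

conv1 = conv2d_params(3, 3, 1, 16)   # 160
conv2 = conv2d_params(3, 3, 16, 16)  # 2320
flat = 1487 * 253 * 16               # 6019376 features after Flatten
dense1 = dense_params(flat, 128)     # 770480256
dense2 = dense_params(128, 1)        # 129
total = conv1 + conv2 + dense1 + dense2
print(total)                         # 770482865
```

Since the Dense term scales with the flattened size, any downsampling of the feature map before Flatten would shrink it proportionally.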

hist= model.fit(train, epochs= 2, validation_data= test)

Epoch 1/2
36/36 [==============================] - 816s 23s/step - loss: 13.7638 - recall: 0.9085 - precision: 0.8165 - val_loss: 0.6470 - val_recall: 0.8308 - val_precision: 0.9818
Epoch 2/2
36/36 [==============================] - 788s 21s/step - loss: 0.0566 - recall: 0.9728 - precision: 0.9795 - val_loss: 0.0423 - val_recall: 0.9855 - val_precision: 0.9855

Metrics and Visualization

Loss

plt.title('Loss')
plt.plot(hist.history['loss'], 'r')
plt.plot(hist.history['val_loss'], 'b')
plt.show()

Precision

plt.title('Precision')
plt.plot(hist.history['precision'], 'r')
plt.plot(hist.history['val_precision'], 'b')
plt.show()

Recall

plt.title('Recall')
plt.plot(hist.history['recall'], 'r')
plt.plot(hist.history['val_recall'], 'b')
plt.show()

Predictions

X_test, y_test= test.as_numpy_iterator().next()

X_test.shape, y_test.shape

((16, 1491, 257, 1), (16,))

yhat= model.predict(X_test)

1/1 [==============================] - 1s 971ms/step

yhat= [1 if prediction> 0.5 else 0 for prediction in yhat]

yhat

[0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]

y_test.astype(int)

array([0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0])
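Precision and recall, the two metrics tracked during training, can be computed by hand from these 16 predictions. A plain-Python sketch (the helper name is illustrative); note that a single batch of 16 is a very small sample:

```python
def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

yhat = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
y_test = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
print(precision_recall(y_test, yhat))  # (1.0, 1.0)
```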

Using our Solution on the Target Clips

Loading

def load_mp3_16k_mono(filename):
    """Load an MP3 file, downmix it to a single channel and resample it to 16 kHz."""
    res= tfio.audio.AudioIOTensor(filename)

    # (samples, channels) -> (samples,): average the channels into mono
    tensor= res.to_tensor()
    tensor= tf.math.reduce_mean(tensor, axis= 1)

    sample_rate= res.rate
    sample_rate= tf.cast(sample_rate, dtype= tf.int64)

    wav= tfio.audio.resample(tensor, rate_in= sample_rate, rate_out= 16000)
    return wav
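Averaging the channel values of each frame (sum divided by the number of channels) collapses stereo audio to mono before resampling. A stdlib sketch where each frame is a tuple of per-channel samples; `downmix_to_mono` is a hypothetical name:

```python
def downmix_to_mono(frames):
    """Average the channel values of each frame: [(L, R), ...] -> [M, ...]."""
    return [sum(frame) / len(frame) for frame in frames]

stereo = [(0.25, 0.75), (-0.5, 0.5), (1.0, 1.0)]
print(downmix_to_mono(stereo))  # [0.5, 0.0, 1.0]
```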

mp3= os.path.join('data', 'Forest Recordings', 'recording_00.mp3')

wav= load_mp3_16k_mono(mp3)

wav

<tf.Tensor: shape=(2880666,), dtype=float32, numpy=
array([ 8.1433272e-12, -5.7019250e-12, -5.3486417e-12, ...,
       -1.1291276e-02, -1.4230422e-02, -3.0555837e-03], dtype=float32)>

Slicing

audio_slices= tf.keras.utils.timeseries_dataset_from_array(wav, wav, sequence_length= 48000, sequence_stride= 48000, batch_size= 1)

samples, index= audio_slices.as_numpy_iterator().next()

samples.shape

(1, 48000)

len(audio_slices)

60
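With `sequence_length` and `sequence_stride` both set to 48000, the recording is cut into non-overlapping 3-second windows, and a trailing remainder shorter than one window is dropped. For the 2,880,666-sample recording that gives 60 windows and 666 leftover samples. A sketch of the count (`num_windows` is illustrative):

```python
def num_windows(num_samples, window, stride):
    """Number of full windows of `window` samples taken every `stride` samples."""
    if num_samples < window:
        return 0
    return (num_samples - window) // stride + 1

n = num_windows(2880666, 48000, 48000)
print(n)                    # 60
print(2880666 - n * 48000)  # 666 samples left over
```

Because every window is exactly 48000 samples long, the zero-padding in `preprocess_mp3` adds nothing for these slices; it only matters for shorter input.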

Pre-Processing

def preprocess_mp3(sample, index):
    sample= sample[0]
    zero_padding= tf.zeros([48000] - tf.shape(sample), dtype= tf.float32)
    wav= tf.concat([zero_padding, sample], 0)
    
    spectrogram= tf.signal.stft(wav, frame_length= 320, frame_step= 32)
    spectrogram= tf.abs(spectrogram)
    spectrogram= tf.expand_dims(spectrogram, axis= 2)
    return spectrogram

audio_slices= tf.keras.utils.timeseries_dataset_from_array(wav, wav, sequence_length= 48000, sequence_stride= 48000, batch_size= 1)
audio_slices= audio_slices.map(preprocess_mp3)
audio_slices= audio_slices.batch(64)

Predictions

Initial Output

yhat= model.predict(audio_slices)
yhat= [1 if prediction> 0.96 else 0 for prediction in yhat]

len(yhat)

yhat

Grouping Adjacent Values for Longer Calls

from itertools import groupby

yhat= [key for key, group in groupby(yhat)]

tf.math.reduce_sum(yhat)

calls= tf.math.reduce_sum(yhat).numpy()
calls
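A single Capuchinbird call can span two adjacent 3-second windows, so consecutive positive windows are merged before counting. The thresholding and `groupby` steps above can be sketched on a made-up list of scores (the probabilities below are illustrative, not model output; `count_calls` is a hypothetical helper):

```python
from itertools import groupby

def count_calls(probabilities, threshold=0.96):
    """Binarize window scores, merge runs of positives, and count the runs."""
    labels = [1 if p > threshold else 0 for p in probabilities]
    collapsed = [key for key, _ in groupby(labels)]  # [0, 1, 1, 0, 1] -> [0, 1, 0, 1]
    return sum(collapsed)

scores = [0.10, 0.99, 0.98, 0.20, 0.97, 0.50, 0.99]
print(count_calls(scores))  # 3
```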

Results Workflow

Iterate

results= {}
for file in os.listdir(os.path.join('data', 'Forest Recordings')):
    FILEPATH= os.path.join('data', 'Forest Recordings', file)
    
    wav= load_mp3_16k_mono(FILEPATH)
    audio_slices= tf.keras.utils.timeseries_dataset_from_array(wav, wav, sequence_length= 48000, sequence_stride= 48000, batch_size= 1)
    audio_slices= audio_slices.map(preprocess_mp3)
    audio_slices= audio_slices.batch(64)
    
    yhat= model.predict(audio_slices)
    
    results[file]= yhat

results

Label and Group

class_preds= {}
for file, probs in results.items():
    # Sigmoid outputs are probabilities; keep only very confident detections
    class_preds[file]= [1 if prediction> 0.96 else 0 for prediction in probs]

class_preds

postprocessed= {}
for file, scores in class_preds.items():
    postprocessed[file]= tf.math.reduce_sum([key for key, group in groupby(scores)]).numpy()

postprocessed

Export

import csv

with open('Results.csv', 'w', newline= '') as f:
    writer= csv.writer(f, delimiter= ',')
    writer.writerow(['recording', 'capuchin_calls'])
    for key, value in postprocessed.items():
        writer.writerow([key, value])
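To check the export, the file can be read back with `csv.DictReader`. A self-contained sketch that writes to an in-memory buffer instead of Results.csv; the counts below are made up for illustration, not actual results:

```python
import csv
import io

postprocessed = {'recording_00.mp3': 5, 'recording_01.mp3': 0}  # made-up counts

buffer = io.StringIO(newline='')
writer = csv.writer(buffer, delimiter=',')
writer.writerow(['recording', 'capuchin_calls'])
for key, value in postprocessed.items():
    writer.writerow([key, value])

buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0])  # {'recording': 'recording_00.mp3', 'capuchin_calls': '5'}
```

Values come back as strings, so cast `capuchin_calls` with `int()` before doing arithmetic on it.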