Silero VAD

Voice Activity Detection using the Silero VAD ONNX model. This is a port of silero-vad-go.

Install

nimble install silerovad

Compatibility

Install ONNX

Install the ONNX Runtime (onnxruntime) dynamic library.

Compile with -d:silerovadNoDynLib if you want to avoid dynamic linking.

Usage

import pkg/silerovad

let samples = readWav("./samples/jfk.wav")
let cfg = newDetectorConfig(
  modelPath = "./models/silero_vad.onnx",
  sampleRate = 16000,
  threshold = 0.5,
  minSilenceDurationMs = 100,
  speechPadMs = 30,
  logLevel = OrtLoggingLevel.ORT_LOGGING_LEVEL_WARNING
)
var dtr = newDetector(cfg)
doAssert dtr.detect(samples) ==
  @[
    Segment(startAt: 0.29, endAt: 2.238),  # And so my fellow Americans
    Segment(startAt: 3.586, endAt: 3.774),  # ask
    Segment(startAt: 4.002, endAt: 4.382),  # not
    Segment(startAt: 5.378, endAt: 7.678),  # what your country can do for you
    Segment(startAt: 8.162, endAt: 10.654)  # ask what you can do for your country
  ]

Note: the last segment's endAt is 0 if the audio does not end in silence.
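If a closed interval is needed for every segment, one option is to replace a trailing endAt of 0 with the total audio duration. The sketch below is a minimal illustration, not part of the library: closeOpenSegment is a hypothetical helper, and the local Segment type is only a stand-in that mirrors the two fields shown in the usage example (drop it when importing silerovad).

```nim
type
  Segment = object  # hypothetical stand-in mirroring the fields shown above
    startAt, endAt: float

proc closeOpenSegment(segments: seq[Segment],
                      numSamples, sampleRate: int): seq[Segment] =
  ## Returns a copy of `segments` in which a trailing segment with
  ## endAt == 0 is closed at the audio duration in seconds.
  result = segments
  if result.len > 0 and result[^1].endAt == 0.0:
    # `/` on two ints yields a float in Nim.
    result[^1].endAt = numSamples / sampleRate
```

With 16 kHz audio, numSamples would be the length of the sample sequence passed to detect, so a call could look like closeOpenSegment(segments, samples.len, 16000).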

Examples

Notes

This library expects mono audio at a 16 kHz sample rate.

Use this command to convert audio files into the expected format:

ffmpeg -i audio_src.wav -ar 16000 -ac 1 audio_dest.wav
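To check whether a file already matches this format before converting it, the WAV header can be inspected directly. The following is a rough sketch under stated assumptions: WavFormat, readWavFormat, and isDetectorReady are hypothetical names, not part of silerovad; the "fmt " chunk is assumed to follow the RIFF header immediately (the canonical layout); and the host is assumed to be little-endian, since std/streams reads integers in native byte order.

```nim
import std/streams

type
  WavFormat = object  # hypothetical helper type, not part of silerovad
    channels: int
    sampleRate: int

proc expectTag(s: Stream, tag: string) =
  ## Raises ValueError unless the next four bytes equal `tag`.
  if s.readStr(4) != tag:
    raise newException(ValueError, "expected chunk tag: " & tag)

proc readWavFormat(s: Stream): WavFormat =
  ## Reads the channel count and sample rate from a canonical WAV header.
  s.expectTag("RIFF")
  discard s.readUint32()  # RIFF chunk size
  s.expectTag("WAVE")
  s.expectTag("fmt ")
  discard s.readUint32()  # fmt chunk size
  discard s.readUint16()  # audio format code (1 = PCM)
  result.channels = int(s.readUint16())
  result.sampleRate = int(s.readUint32())

proc isDetectorReady(f: WavFormat): bool =
  ## True when the audio is mono at 16 kHz.
  f.channels == 1 and f.sampleRate == 16000
```

Files with extra chunks before "fmt " will be rejected by this sketch; in that case running the ffmpeg command above is the simpler route.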

LICENSE

MIT