feat(): can now pass custom models for speech-transcription
jgw96 committed Jul 10, 2024
1 parent 83e9aca commit 632dd8f
Showing 4 changed files with 9 additions and 11 deletions.
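
For callers upgrading across this commit: the diff below replaces the `"tiny" | "base"` union with a full model id string, and the worker no longer prepends `Xenova/whisper-` to the value. A minimal sketch of a migration shim follows. `toModelId` is a hypothetical helper that is not part of web-ai-toolkit, and it assumes the old short names should map to the ids the worker used to build from them.

```typescript
// Hypothetical migration helper; not part of web-ai-toolkit.
// Maps the pre-commit short names to the full ids the worker used to build
// via `Xenova/whisper-${model}`; any other string is passed through as-is.
function toModelId(model: string): string {
  return model === "tiny" || model === "base"
    ? `Xenova/whisper-${model}`
    : model;
}

console.log(toModelId("base"));
console.log(toModelId("Xenova/whisper-tiny"));
```

Passing `toModelId("base")` as the second argument of `transcribeAudioFile` should then request the same model that `"base"` selected before this commit, as far as the diff shows.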
6 changes: 2 additions & 4 deletions README.md
@@ -1,8 +1,6 @@

# Web AI Toolkit

-**Currently in Alpha**
-
The Web AI Toolkit simplifies the integration of AI features, such as OCR and audio file transcription, into your application. It ensures optimal performance by running all AI workloads locally, leveraging WebGPU and WASM technologies.

## Installation
@@ -18,7 +16,7 @@ npm install web-ai-toolkit
| Function Name | Parameter | Type | Default Value |
|-----------------------|----------------|------------------------|---------------|
| transcribeAudioFile | audioFile | Blob | - |
-| | model | "tiny" \| "base" | "tiny" |
+| | model | string | "Xenova/whisper-tiny"|
| | timestamps | boolean | false |
| | language | string | "en-US" |
| textToSpeech | text | string | - |
@@ -42,7 +40,7 @@ Here are examples of how to use each function:
import { transcribeAudioFile } from 'web-ai-toolkit';

const audioFile = ...; // Your audio file Blob
-const transcription = await transcribeAudioFile(audioFile, "base", true, "en-US");
+const transcription = await transcribeAudioFile(audioFile, "Xenova/whisper-tiny", true, "en-US");
console.log(transcription);
```

2 changes: 1 addition & 1 deletion src/index.ts
@@ -1,4 +1,4 @@
-export async function transcribeAudioFile(audioFile: Blob, model: "tiny" | "base", timestamps: boolean = false, language: string = "en-US") {
+export async function transcribeAudioFile(audioFile: Blob, model: string = "Xenova/whisper-tiny", timestamps: boolean = false, language: string = "en-US") {
try {
const { loadTranscriber, doLocalWhisper } = await import("./services/speech-recognition/whisper-ai");
await loadTranscriber(model, timestamps, language);
6 changes: 3 additions & 3 deletions src/services/speech-recognition/whisper-ai.ts
@@ -3,7 +3,7 @@ let whisperWorker: Worker;
// @ts-ignore
import WhisperWorker from './worker?worker&inline'

-export async function loadTranscriber(model: "tiny" | "base", timestamps: boolean, language: string): Promise<void> {
+export async function loadTranscriber(model: string = "Xenova/whisper-tiny", timestamps: boolean, language: string): Promise<void> {
return new Promise(async (resolve) => {
whisperWorker = new WhisperWorker();

@@ -22,7 +22,7 @@ export async function loadTranscriber(model: "tiny" | "base", timestamps: boolea
});
}

-export function doLocalWhisper(audioFile: Blob, model: "tiny" | "base") {
+export function doLocalWhisper(audioFile: Blob, model: string = "Xenova/whisper-tiny") {
return new Promise((resolve, reject) => {
try {
const fileReader = new FileReader();
@@ -70,7 +70,7 @@ export function doLocalWhisper(audioFile: Blob, model: "tiny" | "base") {
whisperWorker.postMessage({
type: "transcribe",
blob: audio,
-model: model || "tiny",
+model: model || "Xenova/whisper-tiny",
})

};
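
A note on the new `loadTranscriber` signature in this file: `model` now has a default value but is followed by two required parameters. In TypeScript, a default in that position only takes effect when the caller passes `undefined` explicitly. The sketch below illustrates this with `describeLoad`, a hypothetical stand-in that mirrors the parameter order and loads nothing.

```typescript
const DEFAULT_MODEL = "Xenova/whisper-tiny";

// Hypothetical stand-in with the same parameter order as loadTranscriber.
// It only reports which model id would be used.
function describeLoad(
  model: string = DEFAULT_MODEL,
  timestamps: boolean,
  language: string
): string {
  return `${model} (timestamps=${timestamps}, language=${language})`;
}

// The default applies only when undefined is passed in the first position.
console.log(describeLoad(undefined, false, "en-US"));
// An explicit id overrides it.
console.log(describeLoad("Xenova/whisper-base", true, "en-US"));
```

Because the two trailing parameters are required, the first argument cannot simply be omitted. The `model || "Xenova/whisper-tiny"` fallback in `doLocalWhisper` covers the remaining case of an empty string.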
6 changes: 3 additions & 3 deletions src/services/speech-recognition/worker.ts
@@ -18,7 +18,7 @@ self.onmessage = async (e) => {
})
}
else if (e.data.type === "load") {
-await loadTranscriber(e.data.model || "tiny", e.data.timestamps, e.data.language);
+await loadTranscriber(e.data.model || 'Xenova/whisper-tiny', e.data.timestamps, e.data.language);
self.postMessage({
type: 'loaded'
});
@@ -29,12 +29,12 @@ self.onmessage = async (e) => {
}
}

-export async function loadTranscriber(model: "tiny" | "base", timestamps: boolean, language: string): Promise<void> {
+export async function loadTranscriber(model: string = "Xenova/whisper-tiny", timestamps: boolean, language: string): Promise<void> {
return new Promise(async (resolve) => {
if (!transcriber) {
env.allowLocalModels = false;
env.useBrowserCache = false;
-transcriber = await pipeline('automatic-speech-recognition', `Xenova/whisper-${model}`, {
+transcriber = await pipeline('automatic-speech-recognition', model || 'Xenova/whisper-tiny', {
// @ts-ignore
return_timestamps: timestamps,
language
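
In the worker hunk above, the pipeline is created only when `transcriber` is unset. As far as the visible lines show, a later call with a different model id would therefore reuse the first pipeline. One possible way to handle that, sketched under assumptions, is to key the cache by model id. `createModelCache` and the fake loader below are hypothetical and stand in for the real `pipeline(...)` call; this is not the author's implementation.

```typescript
// Hypothetical sketch: cache one in-flight or finished load per model id.
function createModelCache<T>(
  load: (modelId: string) => Promise<T>
): (modelId: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return (modelId) => {
    let entry = cache.get(modelId);
    if (entry === undefined) {
      // First request for this id: start loading and remember the promise.
      entry = load(modelId);
      cache.set(modelId, entry);
    }
    return entry;
  };
}

// Fake loader that counts how many times a load is started.
let loadCount = 0;
const getTranscriber = createModelCache(async (modelId: string) => {
  loadCount += 1;
  return { modelId };
});

getTranscriber("Xenova/whisper-tiny");
getTranscriber("Xenova/whisper-tiny"); // reuses the cached promise
getTranscriber("Xenova/whisper-base"); // different id, new load
console.log(loadCount); // logs 2
```

Caching the promise rather than the resolved value means concurrent requests for the same id share one load. A rejected promise stays cached in this sketch, so a real version would likely evict failed entries.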
