Whisper WebGPU: Real-Time in-Browser Speech Recognition with OpenAI Whisper

Ashley Smith
Jun 09, 2024

What is Whisper WebGPU and who developed it?

socratic-gpt
Socrates

Whisper WebGPU uses OpenAI's Whisper model to perform real-time speech recognition in the browser. It was developed by a Hugging Face engineer known by the nickname 'Xenova'. Because recognition runs directly in the browser, it enhances privacy and keeps working even when the device is offline [6].

Christopher🎉
Jun 09, 2024

What is the Whisper-base model and how many parameters does it have?

socratic-gpt
Socrates

Whisper-base is a speech recognition model optimized for web inference [3][5]. It is designed to be lightweight enough for real-time applications [4], with approximately 73 million parameters and a model size of around 200 MB.
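
As a rough cross-check of the two figures quoted in this answer, the sketch below divides the model size by the parameter count and compares the result with what uniform 32-bit or 16-bit storage would need. It assumes "200 MB" means 200 × 10^6 bytes, and the constant and function names are made up for illustration. How the weights are actually stored is not stated in this thread.

```typescript
// Figures quoted in the answer; both are approximate.
const PARAMETER_COUNT = 73e6;
const MODEL_SIZE_BYTES = 200e6; // assumption: "around 200 MB" = 200 * 10^6 bytes

// Average number of bytes used to store one parameter.
function bytesPerParameter(sizeBytes: number, parameterCount: number): number {
  return sizeBytes / parameterCount;
}

// Size in MB if every parameter were stored with the same width.
function sizeInMB(parameterCount: number, bytesPerParam: number): number {
  return (parameterCount * bytesPerParam) / 1e6;
}

const observed = bytesPerParameter(MODEL_SIZE_BYTES, PARAMETER_COUNT); // about 2.74
const asFloat32 = sizeInMB(PARAMETER_COUNT, 4); // 292 MB
const asFloat16 = sizeInMB(PARAMETER_COUNT, 2); // 146 MB

console.log(observed.toFixed(2), asFloat32, asFloat16);
```

The quoted size falls between the uniform 16-bit and 32-bit estimates, so the two figures are at least mutually plausible.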

Karen🍀
Jun 09, 2024

How does Whisper WebGPU achieve real-time speech recognition within a web browser?

socratic-gpt
Socrates

Whisper WebGPU achieves real-time speech recognition within a web browser by running OpenAI's Whisper model through Hugging Face Transformers.js and ONNX Runtime Web. It uses Whisper-base, a 73-million-parameter speech recognition model optimized for web inference. With a model size of approximately 200 MB, Whisper-base is designed to be lightweight yet capable.

Once the model is downloaded, it is cached for future use, so later sessions start without downloading it again. All computations are performed locally within the user's browser, so no audio data has to be sent to a server. This enhances privacy and lets the application work even when the device is offline: users can disconnect from the internet after the initial model load and still use Whisper's speech recognition.
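
The download-once-then-cache behaviour can be sketched as a cache-first loader. This is a minimal illustration, not the code Whisper WebGPU or Transformers.js actually uses: `ByteCache`, `Fetcher`, `MemoryCache`, and `loadWeights` are hypothetical names, and the in-memory map stands in for persistent browser storage.

```typescript
// Hypothetical interfaces for this sketch; not part of any real library.
interface ByteCache {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, value: Uint8Array): Promise<void>;
}

type Fetcher = (url: string) => Promise<Uint8Array>;

// In-memory stand-in for persistent browser storage.
class MemoryCache implements ByteCache {
  private readonly store = new Map<string, Uint8Array>();

  async get(key: string): Promise<Uint8Array | undefined> {
    return this.store.get(key);
  }

  async put(key: string, value: Uint8Array): Promise<void> {
    this.store.set(key, value);
  }
}

// Return cached weights when present; otherwise download once and store them.
async function loadWeights(
  url: string,
  cache: ByteCache,
  fetcher: Fetcher,
): Promise<Uint8Array> {
  const cached = await cache.get(url);
  if (cached !== undefined) {
    return cached; // served locally, no network access needed
  }
  const bytes = await fetcher(url);
  await cache.put(url, bytes);
  return bytes;
}
```

Calling `loadWeights` twice with the same URL invokes the fetcher only on the first call. The second call is served entirely from the cache, which is what allows the application to keep working after the device goes offline. In a browser, a `ByteCache` could be backed by persistent storage such as the Cache API or IndexedDB.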

Whisper WebGPU loads the model as ONNX (Open Neural Network Exchange) weights. ONNX is an open-source format for AI models that allows models trained in different frameworks to be shared and used across tools.

In summary, Whisper WebGPU achieves real-time speech recognition within a web browser by combining the Whisper-base model, local computation through Transformers.js and ONNX Runtime Web, and ONNX weights as the model format.
