From d91122389ea6d8248fdfd5b9e2c14d2feba7da46 Mon Sep 17 00:00:00 2001
From: jhj0517 <97279763+jhj0517@users.noreply.github.com>
Date: Tue, 26 Nov 2024 21:38:35 +0900
Subject: [PATCH] Update README.md

---
 README.md | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index c8e3e7c..1dba518 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,10 @@ A Gradio-based browser interface for [Whisper](https://github.com/openai/whisper
 If you wish to try this on Colab, you can do it in [here](https://colab.research.google.com/github/jhj0517/Whisper-WebUI/blob/master/notebook/whisper-webui.ipynb)!
 
 # Feature
+### Pipeline Diagram
+![Transcription Pipeline](https://github.com/user-attachments/assets/1d8c63ac-72a4-4a0b-9db0-e03695dcf088)
+
+
 - Select the Whisper implementation you want to use between :
    - [openai/whisper](https://github.com/openai/whisper)
    - [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper) (used by default)
@@ -81,7 +85,7 @@ Please follow the links below to install the necessary software:
 
 After installing FFmpeg, **make sure to add the `FFmpeg/bin` folder to your system PATH!**
 
-### Automatic Installation
+### Installation Using the Script Files
 
 1. git clone this repository
 ```shell
@@ -104,19 +108,14 @@ According to faster-whisper, the efficiency of the optimized whisper model is as
 If you want to use an implementation other than faster-whisper, use `--whisper_type` arg and the repository name.
 Read [wiki](https://github.com/jhj0517/Whisper-WebUI/wiki/Command-Line-Arguments) for more info about CLI args.
 
-## Available models
-This is Whisper's original VRAM usage table for models.
+If you want to use a fine-tuned model, manually place the models in `models/Whisper/` corresponding to the implementation.
 
-| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
-|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
-| tiny | 39 M | `tiny.en` | `tiny` | ~1 GB | ~32x |
-| base | 74 M | `base.en` | `base` | ~1 GB | ~16x |
-| small | 244 M | `small.en` | `small` | ~2 GB | ~6x |
-| medium | 769 M | `medium.en` | `medium` | ~5 GB | ~2x |
-| large | 1550 M | N/A | `large` | ~10 GB | 1x |
+Alternatively, if you enter the huggingface repo id (e.g., [deepdml/faster-whisper-large-v3-turbo-ct2](https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2)) in the "Model" dropdown, it will be automatically downloaded to the directory.
+![image](https://github.com/user-attachments/assets/76487a46-b0a5-4154-b735-ded73b2d83d4)
 
 
-`.en` models are for English only, and the cool thing is that you can use the `Translate to English` option from the "large" models!
+# REST API
+If you're interested in deploying this app as a REST API, please check out [/backend](https://github.com/jhj0517/Whisper-WebUI/tree/master/backend).
 
 ## TODO🗓
 
@@ -127,6 +126,7 @@ This is Whisper's original VRAM usage table for models.
 - [x] Integrate with whisperX ( Only speaker diarization part )
 - [x] Add background music separation pre-processing with [UVR](https://github.com/Anjok07/ultimatevocalremovergui)
 - [ ] Add fast api script
+- [ ] Add CLI usages
 - [ ] Support real-time transcription for microphone
 
 ### Translation 🌐