You can only run the app locally after having successfully run the `azd up` command. If you haven't yet, follow the steps in Azure deployment above.
- Run `azd auth login`
- Change dir to `app`
- Run `./start.ps1` or `./start.sh`, or run the "VS Code Task: Start App" to start the project locally.
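Put together, a typical startup from the repository root looks like this (shown for a POSIX shell; use `./start.ps1` from PowerShell instead):

```shell
# Sign in so the local app can use your azd environment's settings.
azd auth login

# Start the app from the app directory (backend files are watched and reloaded automatically).
cd app
./start.sh
```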
When you run `./start.ps1` or `./start.sh`, the backend files will be watched and reloaded automatically, but the frontend files will not.
To enable hot reloading of frontend files, open a new terminal and navigate to the frontend directory:

```shell
cd app/frontend
```

Then run:

```shell
npm run dev
```
You should see:
```
> [email protected] dev
> vite

  VITE v4.5.1  ready in 957 ms

  ➜  Local:   http://localhost:5173/
  ➜  Network: use --host to expose
  ➜  press h to show help
```
Navigate to the URL shown in the terminal (in this case, `http://localhost:5173/`). This local server will watch and reload frontend files. All backend requests will be routed to the Python server according to `vite.config.ts`.

Then, whenever you make changes to frontend files, the changes will be automatically reloaded, without any browser refresh needed.
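The startup output above notes that the dev server only listens on localhost by default; if you want to reach it from another device on your network, you can forward Vite's `--host` flag through npm:

```shell
# The extra "--" passes --host through npm to the underlying vite command.
npm run dev -- --host
```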
You may want to save costs by developing against a local LLM server, such as llamafile. Note that a local LLM will generally be slower and not as sophisticated.
Once the local LLM server is running and serving an OpenAI-compatible endpoint, set these environment variables:
```shell
azd env set USE_VECTORS false
azd env set OPENAI_HOST local
azd env set OPENAI_BASE_URL <your local endpoint>
azd env set AZURE_OPENAI_CHATGPT_MODEL local-model-name
```
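Before restarting, you can optionally sanity-check that the server really speaks the OpenAI protocol; this is a quick probe assuming it exposes the standard `/models` route under your base URL:

```shell
# Replace <your local endpoint> with the same base URL you set above
# (e.g. an OpenAI-compatible server's /v1 path). A JSON list of models
# indicates the endpoint is reachable and OpenAI-compatible.
curl <your local endpoint>/models
```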
Then restart the local development server. You should now be able to use the "Ask" tab.
- The "Chat" tab will only work if the local language model supports function calling.
- Your search mode must be text only (no vectors), since the search index is only populated with OpenAI-generated embeddings, and the local OpenAI host can't generate those.
- The conversation history will be truncated using the GPT tokenizers, which may not be the same as the local model's tokenizer, so if you have a long conversation, you may end up with token limit errors.
Note: You must set `OPENAI_HOST` back to a non-local value ("azure", "azure_custom", or "openai") before running `azd up` or `azd provision`, since the deployed backend can't access your local server.
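For instance, to switch back to Azure OpenAI before re-provisioning:

```shell
azd env set OPENAI_HOST azure
```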
For example, to point at a local Ollama server running the `llama3.1:8b` model:

```shell
azd env set OPENAI_HOST local
azd env set OPENAI_BASE_URL http://localhost:11434/v1
azd env set AZURE_OPENAI_CHATGPT_MODEL llama3.1:8b
azd env set USE_VECTORS false
```
If you're running the app inside a VS Code Dev Container, use this local URL instead:
```shell
azd env set OPENAI_BASE_URL http://host.docker.internal:11434/v1
```
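If Ollama isn't already serving that model, a typical setup looks like this (assuming Ollama is installed and listening on its default port 11434):

```shell
# Download the model weights (one-time).
ollama pull llama3.1:8b

# Start the Ollama API server if it isn't already running as a background service.
ollama serve
```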
To point at a local llamafile server running on its default port:
```shell
azd env set OPENAI_HOST local
azd env set OPENAI_BASE_URL http://localhost:8080/v1
azd env set USE_VECTORS false
```
Llamafile does not require a model name to be specified.
If you're running the app inside a VS Code Dev Container, use this local URL instead:
```shell
azd env set OPENAI_BASE_URL http://host.docker.internal:8080/v1
```
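If you haven't started the llamafile yet, the usual approach is to make the downloaded file executable and run it; by default it serves an OpenAI-compatible API on port 8080 (the file name below is purely illustrative):

```shell
# Make the downloaded llamafile executable and start it; it listens on
# http://localhost:8080 by default. Substitute your actual llamafile name.
chmod +x ./your-model.llamafile
./your-model.llamafile
```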