Panoptikon

State-of-the-art, local, multimodal multimedia search engine

Panoptikon indexes your local files using state-of-the-art AI and machine learning models, making difficult-to-search media such as images and videos easily findable.

Combining OCR, Whisper speech-to-text, CLIP image embeddings, text embeddings, full-text search, automated tagging, and automated image captioning, Panoptikon is the Swiss Army knife of local media indexing.

Panoptikon aims to be the text-generation-webui or stable-diffusion-webui of search. It is fully customizable and lets you easily configure custom models for any of the supported tasks. It comes with a wealth of capable models available out of the box, but adding another model or a newer finetune is never more than a few TOML configuration lines away. As long as a model is supported by one of the built-in implementation classes (which cover, among other things, OpenCLIP, Sentence Transformers, and Faster Whisper), you can simply add the Hugging Face repo for your custom model to the inference server configuration, and it will immediately be available for use.

Panoptikon is designed to keep the index data from multiple models (or different configurations of the same model) side by side, letting you choose which one(s) to use at search time. This makes Panoptikon an excellent tool for comparing the real-world performance of different data extraction methods or embedding models, and it also lets you leverage their combined power instead of relying on only one. For example, when searching for a tag, you can configure a list of tagging models and choose whether an item matches if at least one model has set the tags you're searching for, or only if all of them have.

Panoptikon is intended for power users and more technically minded enthusiasts who want to leverage more powerful or custom open source models to index and search their files. Unlike tools such as Hydrus, Panoptikon will never copy, move, or otherwise touch your files. Simply add your directories to the list of allowed paths and run the indexing jobs. Panoptikon builds an index inside its own SQLite database, referencing the original source file paths. Files are tracked by their hash, so renaming or moving them is not a problem, as long as they remain within one of the directory trees Panoptikon has access to and you either run the File Scan job regularly or enable the scheduled cronjob.

Warning

Panoptikon is designed as a local service and is not intended to be exposed to the internet. It currently has no security features and exposes, among other things, an API to access all your files, even outside of explicitly indexed directories. Panoptikon binds to localhost by default; if you intend to expose it, you should put a reverse proxy with authentication (such as HTTP Basic Auth or OAuth2) in front of it.
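
For example, a minimal nginx configuration with HTTP Basic Auth in front of Panoptikon might look like the following sketch. The hostname, certificate setup, and htpasswd path are assumptions; adapt them to your environment:

server {
    listen 443 ssl;
    server_name panoptikon.example.com;              # hypothetical hostname
    # ssl_certificate / ssl_certificate_key omitted for brevity

    location / {
        auth_basic "Panoptikon";
        auth_basic_user_file /etc/nginx/.htpasswd;   # create with: htpasswd -c /etc/nginx/.htpasswd <user>
        proxy_pass http://127.0.0.1:6342;
        proxy_set_header Host $host;
    }
}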

REST API

Panoptikon exposes a REST API that can be used to interact with the search and bookmarking functionality programmatically, as well as to retrieve the indexed data, the actual files, and their associated metadata. Additionally, inferio, the inference server, exposes an API under /api/inference that can be used to run batch inference using the available models.

The API is documented in the OpenAPI format. The interactive documentation generated by FastAPI can be accessed at /docs when running Panoptikon, for example at http://127.0.0.1:6342/docs by default. Alternatively, ReDoc can be accessed at /redoc, for example at http://127.0.0.1:6342/redoc by default.

API endpoints support specifying the names of the index and user_data databases to use, regardless of what databases are set in the environment variables (see below). This is done through the index_db and user_data_db query parameters; if they are not specified, the databases set in the environment variables are used by default.
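
For example, assuming a hypothetical search endpoint (check /docs for the actual routes and parameters), selecting a specific pair of databases might look like this:

# Hypothetical endpoint; see /docs for the real API surface
curl "http://127.0.0.1:6342/api/search?index_db=default&user_data_db=default"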

Installation

poetry install --with inference

This installs the full system, including the inference server dependencies. If you're running the inference server on a different machine, you can omit the --with inference flag and set the INFERENCE_API_URL environment variable to point to the URL of the inference server (see below).
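
For example, a client-only install that uses a remote inference server might look like this (the host is illustrative; see INFERENCE_API_URL below):

poetry install

# in .env (host is illustrative)
INFERENCE_API_URL="http://192.168.1.16:6342"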

CUDA on Windows

If you're on Windows and want CUDA GPU acceleration, you have to uninstall the default PyTorch and install the correct version after running poetry install:

poetry run pip3 uninstall torch torchvision torchaudio -y
poetry run pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

You may have to repeat this after updates.
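
To check that the CUDA build is active, you can verify that PyTorch sees your GPU:

poetry run python -c "import torch; print(torch.cuda.is_available())"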

Other dependency issues

cuDNN

When running the Whisper implementation, which is based on CTranslate2, you may see errors related to cuDNN libraries. Download a version 8.x cuDNN package appropriate for your system from Nvidia, unpack the archive, and place its contents inside the cudnn directory at the root of this repo. Make sure the cudnn folder contains bin, lib, include, etc. as direct subfolders.
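
The expected layout is roughly the following (the example file name is an assumption and varies by platform and cuDNN release):

cudnn/
├── bin/       # e.g. cudnn64_8.dll on Windows
├── include/
└── lib/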

WeasyPrint

This is only relevant if you intend to use Panoptikon with HTML files. Panoptikon uses WeasyPrint to handle HTML files. Follow the WeasyPrint Installation Guide to ensure all the external dependencies are present on your system. If they are present but not found, it's recommended to set the WEASYPRINT_DLL_DIRECTORIES environment variable to point to the correct folder.
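
For example, on Windows with the GTK runtime installed in its default location (the path below is an assumption; adjust it to your install):

WEASYPRINT_DLL_DIRECTORIES="C:\Program Files\GTK3-Runtime Win64\bin"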

Running Panoptikon

poetry run panoptikon

This starts Panoptikon along with its inference server, listening by default at http://127.0.0.1:6342/

Everything except adding new AI models and customizing existing ones can be done through the Next.js UI, available at http://127.0.0.1:6339 by default.

First steps

Open the home page of the web UI and follow the instructions to get started. You'll have to add directories to the list of allowed paths, then run the file scan job to index the files in those directories. Before you can search, you'll also have to run data extraction jobs to extract text, tags, and other metadata from the files.

Bookmarks

You can bookmark any search result by clicking the bookmark button on its thumbnail. Bookmarks are stored in a separate database and can be accessed through the API as well as through search. To search within your bookmarks, open Advanced Search and enable the bookmarks filter, which restricts results to items you've bookmarked.

Bookmarks can belong to one or more "Groups", which are essentially tags you can use to organize your bookmarks. To create a new group, type an arbitrary name in the Group field in Advanced Search, select it as the current group, and then bookmark an item.

Adding more models

See config/inference/example.toml for examples of how to add custom models from Hugging Face to Panoptikon.
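
As a rough illustration of the idea (the section and key names below are hypothetical; config/inference/example.toml documents the actual schema), registering a custom CLIP model could look something like this:

# Hypothetical sketch; consult config/inference/example.toml for the real schema
[my-custom-clip]                  # illustrative model identifier
implementation = "open_clip"      # one of the built-in implementation classes
model = "hf-hub:laion/CLIP-ViT-H-14-laion2B-s32B-b79K"   # Hugging Face repo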

Environment variables and config

Panoptikon accepts environment variables as configuration options. It uses dotenv, so you can create a file called .env in this folder with all the environment variables and their values, and it will be applied automatically at runtime.
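
For example, a minimal .env using the variables documented below might look like:

HOST="127.0.0.1"
PORT="6342"
DATA_FOLDER="data"
LOGLEVEL="INFO"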

HOST and PORT

Default:

HOST="127.0.0.1"
PORT="6342"

These determine where to bind the Panoptikon server, which delivers both the inference API and the search and configuration UI. Warning: do not expose Panoptikon to the internet without a reverse proxy and authentication. It is designed as a local service and has no security features.

INFERIO_HOST, INFERIO_PORT

Default:

INFERIO_HOST="127.0.0.1"
INFERIO_PORT="7777"

These ONLY apply when the inference server (inferio) is run standalone, without Panoptikon. They determine where to bind the inference server which runs the models. To run the inference server separately, run poetry run inferio.
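
For example, to run the standalone inference server on your LAN (binding to 0.0.0.0 exposes it to your whole network; see the warning above):

# in .env
INFERIO_HOST="0.0.0.0"
INFERIO_PORT="7777"

poetry run inferio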

INFERENCE_API_URL

Default: Not set.

If you're running the inference server separately, you can point this to the URL of the inference server to allow Panoptikon to use it.

By default, a Panoptikon instance runs its own inference server, which also means you can point INFERENCE_API_URL to another Panoptikon instance to leverage its inference server. For example, you might have a full Panoptikon instance running on your desktop or workstation, and another instance running on a laptop without a GPU; you can point the laptop instance to the desktop instance's inference server to leverage its GPU. Simply configure the desktop instance to run the inference server on an IP reachable from the laptop, and set INFERENCE_API_URL to the URL of the desktop instance's inference server, for example http://192.168.1.16:6342. Don't add a trailing slash.
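
Using the example above, the two instances' .env files might look like this (the LAN IP is illustrative):

# Desktop (GPU) instance .env
HOST="192.168.1.16"
PORT="6342"

# Laptop instance .env
INFERENCE_API_URL="http://192.168.1.16:6342"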

DATA_FOLDER

Default:

DATA_FOLDER="data"

Where to store the databases and logs. Defaults to "data" inside the current directory.

LOGLEVEL

Default:

LOGLEVEL="INFO"

The log level for the logger. You can find the log file under [DATA_FOLDER]/panoptikon.log.

INDEX_DB, USER_DATA_DB

Default:

INDEX_DB="default"
USER_DATA_DB="default"

The names of the default databases to use for indexing (files and extracted data) and user data (bookmarks). These are used whenever no database is specified in an API request. Regardless of what is set in the environment variables, API endpoints support selecting the database through the index_db and user_data_db query parameters, and the UI allows creating new index databases and choosing which one to use for search and other operations.

TEMP_DIR

Default:

TEMP_DIR="./data/tmp"

Where to store temporary files. These files are generally short-lived and cleaned up automatically, but if the default location is short on space you can point this somewhere else.

SHOW_IN_FM_COMMAND, OPEN_FILE_COMMAND

Default: Not set

Panoptikon includes APIs to open a file in the file manager or in the default application for its file type; the UI uses these to let you open files directly from search results.

Panoptikon has sane defaults for each platform (Windows, Linux, macOS), but you can override them by setting the SHOW_IN_FM_COMMAND and OPEN_FILE_COMMAND environment variables to your custom commands.

The strings {path}, {folder}, and {filename} within your command will be replaced with the full path to the file, the folder containing it, and the filename with extension, respectively.
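
For example, on Linux you might override them like this (these particular commands are assumptions, not Panoptikon's built-in defaults):

SHOW_IN_FM_COMMAND="nautilus --select {path}"   # GNOME Files: reveal the file in its folder
OPEN_FILE_COMMAND="xdg-open {path}"             # open with the default application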

You can also set these commands to no-ops, for example echo {path} or echo {filename}, to disable the functionality. This is absolutely necessary if you intend to expose Panoptikon to the internet: the defaults are shell commands that can execute arbitrary code on your machine, which makes the default setup essentially remote code execution as a service.

ENABLE_CLIENT

Default:

ENABLE_CLIENT="true"

Whether to run the Next.js UI. If you're running Panoptikon in a headless environment, you can set this to false to disable the UI and only run the API server, then host the UI separately.

You can still access the API documentation at /docs and /redoc even if the UI is disabled.
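
For example, a headless API-only deployment might use:

ENABLE_CLIENT="false"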

CLIENT_HOST, CLIENT_PORT

Default:

CLIENT_HOST=HOST
CLIENT_PORT=6339

Where to bind the Next.js UI. Defaults to the same host as the API server and port 6339.

DISABLE_CLIENT_UPDATE

Default:

DISABLE_CLIENT_UPDATE="false"

Whether to disable the automatic update of the Next.js UI when the server is restarted. With the default of false, Panoptikon will git pull the latest version of the UI from the repository's "master" branch on startup. This might break Panoptikon if the UI is not compatible with the current version of the server. If you're not planning to keep Panoptikon constantly up to date, you should set this to true after the first run to prevent the UI from being updated to a version that is incompatible with the server.

After updating Panoptikon itself, you can set it back to false once to allow the UI to update on the next restart.
