You need to install Docker, and on Linux also docker-compose (on Windows and Mac, Compose is included with Docker Desktop).
Make sure to allocate enough hardware resources to the Docker engine. Tested and confirmed configuration: 2 CPUs, 8GB RAM, 2GB swap.
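As a rough way to compare your Docker engine against the tested configuration, the sketch below reads the CPU count and total memory from `docker info`. The helper name `meets_minimum` and the 7 GiB threshold are assumptions made for illustration (Docker usually reports slightly less memory than was allocated); swap is not checked here.

```shell
# Sketch: compare Docker's reported resources with the tested config (2 CPUs, 8GB RAM).
# meets_minimum is a hypothetical helper, not part of DataHub.
meets_minimum() {
  cpus="$1"
  mem_bytes="$2"
  # Assumed slack: accept 7 GiB or more, since the reported total is
  # typically a little below the allocated 8GB.
  min_mem=$((7 * 1024 * 1024 * 1024))
  [ "$cpus" -ge 2 ] && [ "$mem_bytes" -ge "$min_mem" ]
}

# Only query Docker if the CLI is installed and the daemon answers.
if command -v docker >/dev/null 2>&1 \
  && cpus=$(docker info --format '{{.NCPU}}' 2>/dev/null) \
  && mem=$(docker info --format '{{.MemTotal}}' 2>/dev/null); then
  if meets_minimum "$cpus" "$mem"; then
    echo "Docker resources look sufficient: $cpus CPUs, $mem bytes RAM"
  else
    echo "Docker resources may be too low: $cpus CPUs, $mem bytes RAM" >&2
  fi
fi
```

If the check reports low resources, raise the limits in Docker Desktop's resource settings (or on the host itself for Linux) before starting DataHub.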
The easiest way to bring up and test DataHub is to use the DataHub Docker images, which are continuously deployed to Docker Hub with every commit to the repository.
You can easily download and run all these images and their dependencies with our quick start guide.
DataHub Docker Images:
- linkedin/datahub-gms
- linkedin/datahub-frontend
- linkedin/datahub-mae-consumer
- linkedin/datahub-mce-consumer
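If you only want to fetch the images listed above without the quick start guide, something like the following sketch could work. It prints one `docker pull` command per image so you can review them first. No tag is given, so Docker falls back to its default tag (`latest`), which is assumed to exist for these images; `pull_commands` is a hypothetical helper name.

```shell
# Image names are taken from the list above.
DATAHUB_IMAGES="linkedin/datahub-gms linkedin/datahub-frontend linkedin/datahub-mae-consumer linkedin/datahub-mce-consumer"

# pull_commands is a hypothetical helper: it prints one pull command per image.
pull_commands() {
  for image in $DATAHUB_IMAGES; do
    echo "docker pull $image"
  done
}

# Print the commands for review. To actually pull, run: pull_commands | sh
pull_commands
```

Note that this fetches only the DataHub images themselves, not their dependencies.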
Dependencies:
If you want to test ingesting some data once DataHub is up, see Ingestion for more details.
See Using Docker Images During Development.
We use GitHub Actions to build and continuously deploy our images. There should be no need to do this manually; a successful release on GitHub will automatically publish the images.
This is not our recommended development flow; most developers should follow the Using Docker Images During Development guide instead.
To build the full images (that we are going to publish), you need to run the following:
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build
This is because we rely on BuildKit for multistage builds. It does not hurt to also set DATAHUB_VERSION to something unique.
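One possible way to pick a unique DATAHUB_VERSION is sketched below. It uses the short git commit hash, falling back to a timestamp outside a git checkout. The helpers `unique_version` and `build_images` are hypothetical, and the sketch assumes the compose files read DATAHUB_VERSION from the environment.

```shell
# unique_version is a hypothetical helper: it prints the short git commit hash
# if available, otherwise a timestamp.
unique_version() {
  git rev-parse --short HEAD 2>/dev/null || date +%Y%m%d%H%M%S
}

# build_images is a hypothetical wrapper around the build command shown above,
# with DATAHUB_VERSION added to the environment of that one command.
build_images() {
  version="$(unique_version)"
  echo "Building with DATAHUB_VERSION=$version"
  COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 DATAHUB_VERSION="$version" \
    docker-compose -p datahub build
}
```

Call `build_images` from the directory that contains the compose files. Any other unique string would do as the version; the commit hash is just a convenient choice that ties an image back to the source it was built from.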