Performance: add terraform file and instructions for AWS development #651

Merged: 5 commits, Mar 17, 2024
40 changes: 10 additions & 30 deletions developer-guide.md → development/README.md
@@ -1,19 +1,20 @@
# Elastiknn Developer Guide
# Elastiknn Development

## Introduction

If you're reading this, there's a chance you'd like to contribute to Elastiknn. Very nice!
This document includes some notes about development of Elastiknn.

## Local Development Setup

### Prerequisites

You need at least the following software installed: git, Java 17, Python3.7, SBT, docker, docker compose, and [task](https://taskfile.dev).
I'm assuming you're running on a Linux or MacOS operating system.
I have no idea if any of this will work on Windows.
You need at least the following software installed: git, Java 21, Python 3.10, SBT, docker, docker compose, and [task](https://taskfile.dev).
We're assuming the operating system is Linux or macOS.
This list might be incomplete.
If you find other missing software, please submit an issue or PR.
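To see which of these tools are already on your `PATH`, a small shell check like the following may help. It is a sketch, not something shipped in the repo: `check_prereqs` is a hypothetical name, it only checks that each command exists (not that it is Java 21 or Python 3.10), and it assumes Docker Compose is installed as the `docker compose` plugin rather than a standalone binary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report prerequisite commands that are missing from PATH.
# Only presence is checked; versions (e.g. Java 21, Python 3.10) are not verified.
check_prereqs() {
  local missing=0 check_compose=0 cmd
  # With no arguments, check the tools listed above; otherwise check the given commands.
  if [ "$#" -eq 0 ]; then
    set -- git java python3 sbt docker task
    check_compose=1
  fi
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  # Docker Compose v2 is a docker subcommand, so probe it separately.
  if [ "$check_compose" -eq 1 ] && ! docker compose version >/dev/null 2>&1; then
    echo "missing: docker compose" >&2
    missing=1
  fi
  return "$missing"
}
```

Running `check_prereqs` with no arguments checks the default list and returns non-zero if anything is missing, e.g. `check_prereqs && echo "all set"`.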

## AWS Development Setup

The [aws](./aws) directory contains a Terraform file and instructions for creating a development instance in AWS.

## Development

### Run a local Elasticsearch instance with the plugin installed

Once you have the prerequisites installed, clone the project and run:
@@ -110,24 +111,3 @@ Nearest neighbors search is a large topic. Some good places to start are:
- Lectures 13-20 of [this lecture series from IIT Kharagpur](https://www.youtube.com/watch?v=06HGoXE6GAs&list=PLbRMhDVUMngekIHyLt8b_3jQR7C0KUCul&index=14)
- Assignment 1 of Stanford's [CS231n course](https://cs231n.github.io/)
- This work-in-progress literature review of [nearest neighbor search methods related to Elasticsearch](https://docs.google.com/document/d/14Z7ZKk9dq29bGeDDmBH6Bsy92h7NvlHoiGhbKTB0YJs/edit)

## Maintaining releases for Elasticsearch 7.x

We maintain releases for Elasticsearch 7.x on the elasticsearch-7x branch.

This branch should be updated whenever one of the following happens:

1. A new Elasticsearch 7.x release.
2. A significant change to the build or testing setup.
3. A significant bugfix that can be easily backported.

Notably, we won't backport optimizations.
There is just too much difference between the 8.x and 7.x internals.

Keeping the two branches reasonably in sync is tricky, especially when compatible and incompatible commits are interspersed.
One approach that seems to work:

1. Branch off of elasticsearch-7x.
2. Merge main into the new branch, resolving any conflicts along the way.
3. Open a PR to merge the new branch into elasticsearch-7x.
4. Merge the branch with a merge commit, not a squash.
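Steps 1 and 2 could be scripted roughly as follows. This is a sketch under a few assumptions: local `main` and `elasticsearch-7x` branches are already up to date with the remote, and both `sync_main_into_7x` and the branch name you pass it are hypothetical. If the merge hits conflicts, git exits non-zero and leaves the conflicts for you to resolve and commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper for steps 1-2: branch off elasticsearch-7x, then merge main into it.
# Assumes local main and elasticsearch-7x are up to date (e.g. after git fetch/pull).
sync_main_into_7x() {
  local new_branch="$1"
  if [ -z "$new_branch" ]; then
    echo "usage: sync_main_into_7x <new-branch-name>" >&2
    return 2
  fi
  # Step 1: create the new branch from elasticsearch-7x and switch to it.
  git checkout -b "$new_branch" elasticsearch-7x || return
  # Step 2: merge main, always recording a merge commit.
  # On conflicts git returns non-zero; resolve them, then run `git commit`.
  git merge --no-ff --no-edit main
}
```

For steps 3 and 4, after `git push -u origin <new-branch>`, the GitHub CLI can open and merge the PR, e.g. `gh pr create --base elasticsearch-7x` and later `gh pr merge --merge`; the `--merge` flag asks for a merge commit instead of `--squash`.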
4 changes: 4 additions & 0 deletions development/aws/.gitignore
@@ -0,0 +1,4 @@
.terraform
.terraform.lock*
terraform.tfstate
terraform.tfstate.backup
63 changes: 63 additions & 0 deletions development/aws/README.md
@@ -0,0 +1,63 @@
# AWS Development Setup

This directory contains a [Terraform](https://www.terraform.io/) file for creating an Elastiknn development instance in AWS.

## Assumptions

* You already have an AWS account and a way to authenticate from the command line (e.g., an IAM user with a secret access key).
* You have Terraform installed. If not, see [these docs](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli).
* You have an SSH key at `~/.ssh/id_ed25519.pub`. If not, update the path in elastiknn.tf.
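If you don't have a key yet, `ssh-keygen` can create one at that path. The sketch below wraps it in a hypothetical `ensure_ssh_key` function that does nothing when the `.pub` file already exists, so an existing key is never touched; the key comment `elastiknn-dev` is also just an illustrative choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an ed25519 key pair only if the public key doesn't exist yet.
# The default path matches the one elastiknn.tf reads via file("~/.ssh/id_ed25519.pub").
ensure_ssh_key() {
  local key="${1:-$HOME/.ssh/id_ed25519}"
  if [ -f "${key}.pub" ]; then
    echo "found ${key}.pub"
    return 0
  fi
  mkdir -p "$(dirname "$key")"
  chmod 700 "$(dirname "$key")"
  # ssh-keygen prompts for a passphrase, and asks before overwriting
  # if a private key already exists at this path without a .pub file.
  ssh-keygen -t ed25519 -f "$key" -C "elastiknn-dev"
}
```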

## Cost

We're using the r6i.4xlarge instance type, which is fairly large and expensive (about $25/day).
Make sure to run `terraform destroy` when you're done so you don't get a surprise bill.
If you want a smaller instance, change `instance_type` in elastiknn.tf.
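Switching to a smaller instance means changing the `instance_type` argument of the `aws_instance` resource. Editing the file by hand is simplest; if you'd rather script it, here is a sketch using `sed`. `set_instance_type` is a hypothetical helper, and the type you pick is up to you, so check that it has enough memory for what you plan to run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the quoted instance_type value in a Terraform file.
# Writes through a temp file, which sidesteps GNU/BSD differences in `sed -i`.
set_instance_type() {
  local new_type="$1" tf_file="${2:-elastiknn.tf}" tmp status
  [ -n "$new_type" ] || { echo "usage: set_instance_type <type> [file]" >&2; return 2; }
  tmp="$(mktemp)" || return
  # Keep the indentation and the "instance_type =" prefix; swap only the quoted value.
  sed "s/^\([[:space:]]*instance_type[[:space:]]*=[[:space:]]*\)\"[^\"]*\"/\1\"${new_type}\"/" \
    "$tf_file" > "$tmp" && cat "$tmp" > "$tf_file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, `set_instance_type r6i.2xlarge` from the `development/aws` directory, before running `terraform apply`.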

Storing the image

## Usage

1. Authenticate with AWS, e.g., export your access key ID and secret access key:
```shell
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
```
2. Initialize Terraform:
```shell
terraform init
```
3. Create the EC2 instance using Terraform:
```shell
terraform apply
...
aws_key_pair.elastiknn: Creating...
aws_security_group.elastiknn: Creating...
aws_key_pair.elastiknn: Creation complete after 0s [id=elastiknn]
aws_security_group.elastiknn: Creation complete after 2s [id=sg-017de4b9f5575ccfa]
aws_instance.elastiknn: Creating...
aws_instance.elastiknn: Still creating... [10s elapsed]
aws_instance.elastiknn: Creation complete after 13s [id=i-01a6c42c33782028f]

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

Outputs:

ssh_command = "ssh ubuntu@<instance-public-dns>"
```
4. Copy and run the `ssh_command` output from the previous step to ssh into the instance.
```shell
ssh ubuntu@<instance-public-dns>
```
5. Run the `setup.sh` script. This should take about 5 minutes:
```shell
ubuntu@ip-172-31-6-138:~$ ./setup.sh
...
Done!
```
6. Exit and restart the SSH session. This makes the docker group membership take effect, so we don't have to run docker with sudo. If anyone knows how to avoid this, please submit a PR!
7. At this point the development software has been installed, the elastiknn repo is cloned at ~/elastiknn, and elastiknn has been compiled. You should be able to start developing, running benchmarks, etc.
8. When you're done, destroy the EC2 instance.
```shell
terraform destroy
```
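About step 6: `usermod -aG docker` in setup.sh updates the group database, but an already-running session keeps its old group list, which is why reconnecting is needed. One possible workaround that may avoid reconnecting is `newgrp docker`, which should start a subshell with the docker group active. The sketch below checks whether the current session already has a given group; `in_group` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the current session already has the given group.
# `id -nG` without a user name lists the groups of the current process,
# including its primary group, one of which must match exactly.
in_group() {
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}
```

For example, `in_group docker || newgrp docker`. Note that `id -nG ubuntu` (with a user name) reads the group database and should list docker right after setup.sh, while `id -nG` without a name reports the current session.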
61 changes: 61 additions & 0 deletions development/aws/elastiknn.tf
@@ -0,0 +1,61 @@
provider "aws" {
  region = "us-west-2" # Replace with your desired region
}

resource "aws_key_pair" "elastiknn" {
  key_name   = "elastiknn"
  public_key = file("~/.ssh/id_ed25519.pub")
}

resource "aws_security_group" "elastiknn" {
  name        = "elastiknn"
  description = "Allow SSH access to Elastiknn instance"
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]
  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
  filter {
    name   = "name"
    values = ["ubuntu/images/*ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "elastiknn" {
  ami           = data.aws_ami.ubuntu.image_id
  instance_type = "r6i.4xlarge"
  key_name      = aws_key_pair.elastiknn.key_name
  tags = {
    Name = "elastiknn"
  }
  security_groups = [aws_security_group.elastiknn.name]
  root_block_device {
    volume_size = 42
  }
  user_data = <<-EOF
    #!/bin/bash
    echo "${filebase64("setup.sh")}" | base64 --decode > /home/ubuntu/setup.sh
    chmod +x /home/ubuntu/setup.sh
    chown ubuntu /home/ubuntu/setup.sh
  EOF
}

output "ssh_command" {
  value = "ssh ubuntu@${aws_instance.elastiknn.public_dns}"
}
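The `user_data` block is how setup.sh reaches the instance: Terraform's `filebase64("setup.sh")` embeds the file as a base64 string in the boot script, and on first boot cloud-init runs that script as root, which decodes the file into /home/ubuntu/setup.sh and then hands ownership to the ubuntu user. The local sketch below mimics that encode/decode round trip, which can be handy after editing setup.sh; `roundtrip_check` is a hypothetical name, and `base64 | tr -d '\n'` stands in for `filebase64`, which produces a single unwrapped line.

```shell
#!/usr/bin/env bash
# Hypothetical local check that mimics the user_data mechanism in elastiknn.tf:
#   1. encode the file as one base64 line (stand-in for Terraform's filebase64),
#   2. decode it into a new file (what the boot script does with `base64 --decode`),
#   3. confirm the decoded copy is byte-for-byte identical to the original.
roundtrip_check() {
  local src="${1:-setup.sh}" out encoded status
  out="$(mktemp)" || return
  encoded="$(base64 < "$src" | tr -d '\n')"
  printf '%s\n' "$encoded" | base64 --decode > "$out"
  cmp -s "$src" "$out"
  status=$?
  rm -f "$out"
  return "$status"
}
```

Keep in mind that EC2 limits user data to 16 KB, and base64 makes the embedded script about a third larger than setup.sh itself.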
84 changes: 84 additions & 0 deletions development/aws/setup.sh
@@ -0,0 +1,84 @@
#!/bin/bash
set -e

echo "**************************************"
echo "** Setting Max Virtual Memory Areas **"
echo "**************************************"
sudo sysctl -w vm.max_map_count=262144

echo "*******************"
echo "** Installing gh **"
echo "*******************"
sudo apt-get -qq update
sudo apt-get -qq install gh
which gh

echo "***********************"
echo "** Installing Docker **"
echo "***********************"
sudo apt-get -qq install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get -qq update
sudo apt-get -qq install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER

echo "*********************"
echo "** Installing Task **"
echo "*********************"
sudo apt-get -qq install -y snapd
sudo snap install --classic task
which task

echo "********************"
echo "** Installing SBT **"
echo "********************"
echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list
echo "deb https://repo.scala-sbt.org/scalasbt/debian /" | sudo tee /etc/apt/sources.list.d/sbt_old.list
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo apt-key add
sudo apt-get -qq update
sudo apt-get -qq install sbt
which sbt

echo "*********************"
echo "** Installing ASDF **"
echo "*********************"
git -c advice.detachedHead=false clone --depth 1 https://github.com/asdf-vm/asdf.git $HOME/.asdf --branch v0.14.0
echo "source $HOME/.asdf/asdf.sh" >> $HOME/.bashrc
source $HOME/.asdf/asdf.sh
asdf --version
asdf plugin add java
asdf plugin add python

echo "***********************************"
echo "** Installing Python Boilerplate **"
echo "***********************************"
# Install a bunch of system-level libraries which are required for asdf to be able to install python.
# It's quite annoying that this is required; ideally asdf install python 3.x.y would just work.
# But I don't know of a way to avoid it, and it seems to come up with all the python version managers.
sudo apt-get -qq update
sudo apt-get -qq install -y gcc make zlib1g-dev libssl-dev lzma liblzma-dev libbz2-dev libsqlite3-dev libreadline-dev libffi-dev libncurses5-dev libncursesw5-dev

echo "***********************"
echo "** Cloning Elastiknn **"
echo "***********************"
git clone https://github.com/alexklibisz/elastiknn.git
cd elastiknn

# Install the asdf packages specified by elastiknn/.tool-versions
echo "*************************************"
echo "** Installing ASDF Plugin Versions **"
echo "*************************************"
asdf install

echo "*************************"
echo "** Compiling Elastiknn **"
echo "*************************"
task jvmCompile

echo "Done!"
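One caveat on the first step of setup.sh: `sysctl -w` only changes the running kernel, so vm.max_map_count falls back to its default if the instance is stopped and started again. A sketch of persisting it with a sysctl drop-in file follows; the function name and the `99-elastiknn.conf` file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a sysctl drop-in so vm.max_map_count survives reboots.
# The default path /etc/sysctl.d/99-elastiknn.conf needs root; pass another path to try it out.
persist_max_map_count() {
  local conf="${1:-/etc/sysctl.d/99-elastiknn.conf}" value="${2:-262144}"
  printf 'vm.max_map_count=%s\n' "$value" > "$conf"
}
```

On the instance, the equivalent one-off command is `echo 'vm.max_map_count=262144' | sudo tee /etc/sysctl.d/99-elastiknn.conf`, followed by `sudo sysctl --system` to reload all sysctl configuration files and `sysctl -n vm.max_map_count` to confirm the value.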