Ntj/refactor and allinone (#1)
* Refactor scripts into separate script files.
Mount script file or dir into each container.
Incorporate Aaron's db-in-a-container.
Add profiles to control which services (containers) are started.

* Changes after first test runs

* Rename various services and profiles.
Remove profiles for the default "distributed" deployment.
Move waitfor-nuoadmin code back into nuosm.
Add log capture and better wait semantics to start-monolith.
Add stop-nuodb script for internally triggered graceful shutdown of processes.

* Improve arg parsing in stop-nuodb.

* Add explicit docker-network config to all services.

* Only enable nuocd-te2 in the insights profile.

* Enable monolith to be scaled to multiple instances.
Map ports to dynamic "ephemeral" ports on host.
Remove hostname setting to allow dynamic hostname generation by docker compose.

* Revert monolith to statically mapped ports.
Set ENTRYPOINT and API_SERVER to localhost in monolith.
Add instadb service that is db-in-a-container with dynamically-mapped ports, so multiple instances can run simultaneously.

* Set LOGDIR default value in nuote.

* Add --timeout option to delete server-processes command to force immediate cleanup of stranded processes after a service restart.
Bump default NuoDB version to 5.0.1-2
Improve diagnostics and logging.

* Reimplement all timeouts and gates with the nuodocker timeouts.
Add new IMPORT_TIMEOUT variable.
Add and improve logging and diagnostics.
Refactor raftstate cleanup into new 'remove-zombie' script.

* Inject remove-zombie into every container.

* Add separate compose files for the different deployment styles. This allows commands such as:
docker compose -f instadb.yaml up -d
and:
docker compose -f instadb.yaml down

* Improve how errors are logged to console and file.
Update the README with the latest options.

* Update README file.

* Fixed typos, formatting. Some rewording/clarifications.

* Force TE internal port to be 48006, and force SM internal port to be 48007.
Add clarifications to README.
Fix typos in README.

* More tidying

* Fix a typo in the README.

---------

Co-authored-by: acabrele <[email protected]>
Co-authored-by: Paul Chapman <[email protected]>
Co-authored-by: Paul Chapman <[email protected]>
4 people authored Mar 14, 2023
1 parent b232215 commit 9d109fa
Showing 12 changed files with 814 additions and 260 deletions.
309 changes: 214 additions & 95 deletions README.md

Large diffs are not rendered by default.

328 changes: 171 additions & 157 deletions nuodb/docker-compose.yaml

Large diffs are not rendered by default.

24 changes: 16 additions & 8 deletions nuodb/env-default
@@ -2,25 +2,36 @@
# default ENV VAR values
#

NUODB_IMAGE=nuodb/nuodb-ce:4.1.2.vee-4
NUODB_IMAGE=nuodb/nuodb-ce:5.0.1-2

DB_NAME=demo
DB_USER=dba
DB_PASSWORD=dba
ENGINE_MEM=1Gi
SQL_ENGINE=vee
LOGDIR=/var/log/nuodb

# Set to a larger value if SM startup takes unusually long
# - for example if IMPORT_LOCAL or IMPORT_REMOTE (see below) is a large file that takes multiple minutes to extract.
# Value is in seconds.
STARTUP_TIMEOUT=60
# docker compose restart policy.
# set to one of:
# - "no"
# - always
# - on-failure
# - unless-stopped
RESTART_POLICY=unless-stopped

# Set to a larger value if database startup takes unusually long
STARTUP_TIMEOUT=90

# Uncomment and set, or set on the docker-compose command-line to add further engine options
# ENGINE_OPTIONS=

# normally this is left unset, causing the default to be used.
ARCHIVE_PATH=

# Set to a larger value if IMPORT_x is set to a large file or dir that takes multiple minutes to restore.
# Value is in seconds.
IMPORT_TIMEOUT=

# set IMPORT_LOCAL to the path of a LOCAL tar file on the host where docker-compose is being run.
# The SM container will mount the file, extract (untar) it and use the contents as the initial state of the database.
IMPORT_LOCAL=
@@ -39,9 +50,6 @@ IMPORT_AUTH=
# can advise on any non-standard value required for IMPORT_LEVEL.
IMPORT_LEVEL=1

# Set this to 'true' if the content to be imported is a backupset output from a hotcopy --full WITHOUT the --simple option.
IMPORT_IS_BACKUPSET=false

# This value is not normally changed.
IMPORT_MOUNT=/var/opt/nuodb/import

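As a worked example of the import options above, a `.env` that restores from a remote backup and caches it locally could look like this (URL, credentials, and paths are purely illustrative):

```shell
# restore from a remote backup, caching the download in a local file
IMPORT_REMOTE=https://backups.example.com/demo-archive.tgz
IMPORT_AUTH=backupuser:secret
IMPORT_LOCAL=/home/me/cache/demo-archive.tgz
# allow up to 10 minutes for the download and restore
IMPORT_TIMEOUT=600
```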
47 changes: 47 additions & 0 deletions nuodb/instadb.yaml
@@ -0,0 +1,47 @@
version: '3'

networks:
  instadb:

services:
  instadb:
    image: $NUODB_IMAGE
    # profiles: [ "instadb" ]
    restart: ${RESTART_POLICY:-unless-stopped}
    networks:
      instadb:

    # Do NOT remove this env_file value!!
    env_file: .env

    environment:
      PEER_ADDRESS: localhost
      NUODB_DOMAIN_ENTRYPOINT: localhost
      NUOCMD_API_SERVER: localhost:8888
      STARTUP_TIMEOUT: ${STARTUP_TIMEOUT:-90}
      EXTERNAL_ADDRESS: ${EXTERNAL_ADDRESS:-localhost}
      ARCHIVE_DIR: ${ARCHIVE_PATH:-/var/opt/nuodb/archive}
      DB_OPTIONS: "mem ${ENGINE_MEM:-1Gi} execution-engine ${SQL_ENGINE:-vee} ${ENGINE_OPTIONS:-}"
    ports:
      - :48004-48006
      - :8888
    volumes:
      - ./scripts:/usr/local/scripts
      - ./scripts/stop-nuodb:/usr/local/bin/stop-nuodb
      - ${IMPORT_LOCAL:-./empty-file}:${IMPORT_MOUNT:-/var/tmp/env}

    command: [ "/usr/local/scripts/start-monolith" ]


#  ycsb-demo:
#    image: nuodb/ycsb:latest
#    networks:
#      net:
#    depends_on:
#      - te1
#    environment:
#      PEER_ADDRESS: ${PEER_ADDRESS:-nuoadmin1}
#      DB_NAME:
#      DB_USER:
#      DB_PASSWORD:
#    command: ["/driver/startup.sh"]
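A usage sketch for this file (the `--scale` and `port` invocations assume standard docker compose behavior; file and service names are from this repo):

```shell
# start three independent db-in-a-container instances
docker compose -f instadb.yaml up -d --scale instadb=3

# look up the host port dynamically mapped to the admin API of instance 2
docker compose -f instadb.yaml port --index 2 instadb 8888

# tear everything down again
docker compose -f instadb.yaml down
```

Because the `ports` entries omit host ports, each instance gets its own ephemeral mappings, which is what allows several instadb containers to coexist on one host.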
48 changes: 48 additions & 0 deletions nuodb/monolith.yaml
@@ -0,0 +1,48 @@
version: '3'

networks:
  net:

services:
  monolith:
    image: $NUODB_IMAGE
    # profiles: [ "monolith" ]
    restart: ${RESTART_POLICY:-unless-stopped}
    networks:
      net:

    # Do NOT remove this env_file value!!
    env_file: .env

    environment:
      PEER_ADDRESS: ${PEER_ADDRESS:-db}
      NUODB_DOMAIN_ENTRYPOINT: ${PEER_ADDRESS:-db}
      NUOCMD_API_SERVER: localhost:8888
      STARTUP_TIMEOUT: ${STARTUP_TIMEOUT:-90}
      EXTERNAL_ADDRESS: ${EXTERNAL_ADDRESS:-localhost}
      ARCHIVE_DIR: ${ARCHIVE_PATH:-/var/opt/nuodb/archive}
      DB_OPTIONS: "mem ${ENGINE_MEM:-1Gi} execution-engine ${SQL_ENGINE:-vee} ${ENGINE_OPTIONS:-}"
    hostname: ${PEER_ADDRESS:-db}
    ports:
      - 48004-48006:48004-48006
      - 8888:8888
    volumes:
      - ./scripts:/usr/local/scripts
      - ./scripts/stop-nuodb:/usr/local/bin/stop-nuodb
      - ${IMPORT_LOCAL:-./empty-file}:${IMPORT_MOUNT:-/var/tmp/env}

    command: [ "/usr/local/scripts/start-monolith" ]


#  ycsb-demo:
#    image: nuodb/ycsb:latest
#    networks:
#      net:
#    depends_on:
#      - te1
#    environment:
#      PEER_ADDRESS: ${PEER_ADDRESS:-nuoadmin1}
#      DB_NAME:
#      DB_USER:
#      DB_PASSWORD:
#    command: ["/driver/startup.sh"]
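A minimal usage sketch, assuming the defaults from env-default (DB_NAME=demo, DB_USER and DB_PASSWORD=dba); the nuosql invocation is illustrative:

```shell
# bring up the single-container database with statically mapped ports
docker compose -f monolith.yaml up -d

# connect from inside the container using the bundled nuosql client
docker compose -f monolith.yaml exec monolith nuosql demo@localhost --user dba --password dba
```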
81 changes: 81 additions & 0 deletions nuodb/scripts/import-archive
@@ -0,0 +1,81 @@
#!/bin/sh

# import the contents of the database archive

: ${IMPORT_LEVEL:=1}

# If archive IMPORT has been defined, and there is no existing archive, then perform the import
if [ -n "$IMPORT_LOCAL$IMPORT_REMOTE" -a ! -f "$ARCHIVE_DIR/1.atm" -a "$runningArchives" -eq 0 ]; then
echo "Importing into empty archive..."
# validate IMPORT_REMOTE with a portable case pattern ([[ ]] is not POSIX sh)
case "$IMPORT_REMOTE" in
''|?*://?*) ;;
*) echo "ERROR: IMPORT_REMOTE is not a valid URL: $IMPORT_REMOTE - import aborted"; exit 98 ;;
esac

# clean up any tombstone of the archive for this SM
if [ -n "$myArchive" ]; then
echo "Cleaning up archive tombstone for $HOSTNAME: $myArchive..."
if [ $(nuocmd get archives --db-name $DB_NAME | wc -l) -eq 1 ]; then
echo "Cleaning up database first..."
nuocmd delete database --db-name $DB_NAME 2>&1 || exit 98
fi
nuocmd delete archive --archive-id $myArchive --purge 2>&1 || exit 98
fi

# if IMPORT_REMOTE is set - work out whether to import from existing (IMPORT_LOCAL) cache
importFromCache='false'
if [ -n "$IMPORT_REMOTE" ]; then
[ -n "$IMPORT_AUTH" -a "$IMPORT_AUTH" != ':' ] && curlAuth="--user $IMPORT_AUTH"
if [ -n "$IMPORT_LOCAL" ]; then

# IMPORT_LOCAL is an empty dir
if [ -d "$IMPORT_MOUNT" -a $(ls -1 "$IMPORT_MOUNT" | wc -l) -eq 0 ]; then
echo "Extracting and caching $IMPORT_REMOTE into directory host:$IMPORT_LOCAL..."
time curl -k ${curlAuth:-} "$IMPORT_REMOTE" | tar xzf - --strip-components ${IMPORT_LEVEL} -C $IMPORT_MOUNT || exit 98
importFromCache='true'

# IMPORT_LOCAL is an empty file
elif [ ! -s "$IMPORT_MOUNT" ]; then
echo "Caching $IMPORT_REMOTE into file host:$IMPORT_LOCAL..."
time curl -k ${curlAuth:-} "$IMPORT_REMOTE" > "$IMPORT_MOUNT" || exit 98
importFromCache='true'

# IMPORT_LOCAL is not empty - assume it is a valid cache
else
echo "host:$IMPORT_LOCAL is not empty - assuming it contains a cached copy of $IMPORT_REMOTE."
importFromCache='true'
fi

# IMPORT_LOCAL is not set - so there is no local cache
else
echo "IMPORT_LOCAL is not set - caching disabled."
echo "Importing from $IMPORT_REMOTE into $ARCHIVE_DIR..."
time curl -k ${curlAuth:-} "$IMPORT_REMOTE" | tar xzf - --strip-components ${IMPORT_LEVEL} -C $ARCHIVE_DIR || exit 98
fi

# IMPORT_REMOTE is not set, so check that IMPORT_LOCAL is not empty
else
[ -f "$IMPORT_MOUNT" -a ! -s "$IMPORT_MOUNT" ] && echo "ERROR: IMPORT_LOCAL file host:$IMPORT_LOCAL is empty." && exit 98
[ -d "$IMPORT_MOUNT" -a $(ls -1 "$IMPORT_MOUNT" | wc -l) -eq 0 ] && echo "ERROR: IMPORT_LOCAL directory host:$IMPORT_LOCAL is empty." && exit 98
importFromCache='true'
fi

# IMPORT_LOCAL should now have the correct content - import it into the archive
if [ -n "$IMPORT_LOCAL" ]; then
[ -n "$IMPORT_REMOTE" -a "$importFromCache" = 'true' -a -s "$IMPORT_MOUNT" ] && echo "Using host:$IMPORT_LOCAL as a cached copy of $IMPORT_REMOTE..."
if [ -d "$IMPORT_MOUNT" ]; then
echo "Importing directory host:$IMPORT_LOCAL into $ARCHIVE_DIR..."
time nuodocker restore archive --origin-dir $IMPORT_MOUNT --restore-dir $ARCHIVE_DIR --db-name "$DB_NAME" --clean-metadata || exit 98
elif [ "$importFromCache" = 'true' -a -s "$IMPORT_MOUNT" ]; then
echo "Importing file host:$IMPORT_LOCAL into $ARCHIVE_DIR..."
time tar xf "$IMPORT_MOUNT" --strip-components ${IMPORT_LEVEL} -C "$ARCHIVE_DIR" || exit 98
else
echo "ERROR: IMPORT_LOCAL has been specified, but host:$IMPORT_LOCAL is not a valid import source - IMPORT_LOCAL must be a directory, an initially empty file, or a cached copy of IMPORT_REMOTE - import aborted..."
exit 98
fi
fi

# sanity check the imported content in the archive
[ -d "$ARCHIVE_DIR/full" ] && echo "ERROR: Imported data looks like a BACKUPSET (in which case IMPORT_LOCAL must be a DIRECTORY): $(ls -l $ARCHIVE_DIR | head -n 10)" && exit 98
[ ! -f "$ARCHIVE_DIR/1.atm" ] && echo "ERROR: Imported archive does not seem to contain valid data: $(ls -l $ARCHIVE_DIR | head -n 10)" && exit 98
echo "Imported data looks good: $(ls -l $ARCHIVE_DIR | head -n 5)"

# if the archive was not imported from a dir, then clean the meta-data in the archive
if [ ! -d "$IMPORT_MOUNT" ]; then
nuodocker restore archive --origin-dir "$ARCHIVE_DIR" --restore-dir "$ARCHIVE_DIR" --db-name "$DB_NAME" --clean-metadata || exit 99
fi
fi
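The tar layout the script expects for IMPORT_LOCAL can be sketched offline. This builds a throwaway "backup" with one top-level directory that `IMPORT_LEVEL=1` strips on extraction (the `1.atm` placeholder only mimics the marker file the sanity check looks for; a real archive comes from an actual NuoDB backup):

```shell
# build a dummy archive tree and package it under a single top-level dir
workdir=$(mktemp -d)
mkdir -p "$workdir/demo-archive"
echo 'placeholder' > "$workdir/demo-archive/1.atm"
tar czf "$workdir/demo-backup.tgz" -C "$workdir" demo-archive

# extract it the way import-archive does with IMPORT_LEVEL=1
mkdir -p "$workdir/restore"
tar xzf "$workdir/demo-backup.tgz" --strip-components 1 -C "$workdir/restore"
ls "$workdir/restore"    # the 1.atm marker is now at the top level
```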
25 changes: 25 additions & 0 deletions nuodb/scripts/remove-zombie
@@ -0,0 +1,25 @@
#!/bin/sh
#
# remove a zombie of the engine that is trying to start

# the caller must specify the engine type (SM or TE) and the hostname
hostType=$1
hostName=$2

me="$(basename $0)"

# wait until the admin layer has become ready
msg=$(nuocmd check servers --timeout ${STARTUP_TIMEOUT} --check-converged --check-active)
if [ $? -ne 0 ]; then
echo "$me: ERROR: Timed out waiting for admin layer to be ready: $msg"
exit 98
fi

myStartIds="$(nuocmd get processes --db-name $DB_NAME | grep "type=$hostType" | grep "address=$hostName/" | grep -o 'start-id: [0-9]*' | sed 's/start-id: //' )"

count=$(echo "$myStartIds" | grep -c '[0-9]')
echo "$(basename $0): Found $count matching start-ids: $myStartIds"

for id in $myStartIds ; do
# delete any matching engine processes still in the Raft state
msg="$(nuocmd shutdown process --start-id $id --evict --timeout 0)"
[ $? -ne 0 ] && echo "ERROR: Unable to remove engine with start-id $id: $msg"
done
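The start-id extraction pipeline above can be exercised offline; the process-listing format here is an assumption about `nuocmd get processes` output, used only to illustrate the filtering:

```shell
# sample (assumed) process listing: one SM and one TE on different hosts
sample='[SM] address=sm-host/172.17.0.2:48006 type=SM start-id: 3
[TE] address=te-host/172.17.0.3:48007 type=TE start-id: 7'

# the same filter pipeline remove-zombie uses, selecting SMs on sm-host
ids=$(printf '%s\n' "$sample" | grep "type=SM" | grep "address=sm-host/" \
    | grep -o 'start-id: [0-9]*' | sed 's/start-id: //')
echo "$ids"    # -> 3
```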
63 changes: 63 additions & 0 deletions nuodb/scripts/start-monolith
@@ -0,0 +1,63 @@
#!/bin/sh

# Start all 3 processes needed for a database in this (single) container.

PATH=$PATH:/usr/local/scripts

PEER_ADDRESS=$HOSTNAME

me="$(basename $0)"

echo "=================================="

# start a background nuoadmin process
start-nuoadmin &

# wait until the admin layer has become ready
msg=$(nuocmd check servers --timeout ${STARTUP_TIMEOUT} --check-converged --check-active)
if [ $? -ne 0 ]; then
echo "$me: ERROR: Timed out waiting for admin layer to be ready: $msg"
exit 98
fi

# delete any engine processes still in the Raft state
nuocmd shutdown server-processes --server-id "${PEER_ADDRESS}" --db-name "$DB_NAME" --evict --timeout 0

echo "$me: AP is ready - starting SM and TE"

# start a background nuosm process
start-nuosm &

# start a background nuote process
start-nuote &

echo "$me: Waiting for DB $DB_NAME to become RUNNING..."
nuocmd check database --db-name $DB_NAME --check-running --wait-for-acks --timeout "${STARTUP_TIMEOUT}" # wait for RUNNING SM
nuocmd check database --db-name $DB_NAME --check-running --wait-for-acks --timeout 10 # wait until the TE and any other engines are also alive
if [ $? -eq 0 ] && nuocmd get processes --db-name $DB_NAME | grep 'type=TE' | grep -q 'state=RUNNING'; then
echo "$me: Database is RUNNING..."
else
echo "$me: Database check timed out after $STARTUP_TIMEOUT sec"

echo "$me: $(nuocmd show database --db-name "$DB_NAME" --all-incarnations)"

if [ -n "$NUODB_DEBUG" ]; then
echo "$me: SM logs"
cat /var/log/nuodb/SM.log

echo
echo "$me: TE logs"
cat /var/log/nuodb/TE.log

echo
echo "$me: AP logs"
cat /var/log/nuodb/AP.log
fi
fi

echo "$me: $(nuocmd show domain)"

# wait for all child processes to stop
wait

echo "$me: Database $DB_NAME has been stopped. Exiting."
13 changes: 13 additions & 0 deletions nuodb/scripts/start-nuoadmin
@@ -0,0 +1,13 @@
#!/bin/bash
#
# Start a nuoadmin AP process

: ${LOGDIR:=/var/log/nuodb}
echo "Starting AP..."

nuoadmin -- \
pendingProcessTimeout=${STARTUP_TIMEOUT}000 \
pendingReconnectTimeout=90000 \
thrift.message.max=1073741824 \
processLivenessCheckSec=30 \
2>&1 | tee "$LOGDIR/AP.log" >/dev/null
