how to go about backup and restore #40

Open
duncdrum opened this issue Oct 14, 2018 · 11 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@duncdrum
Contributor

duncdrum commented Oct 14, 2018

I see a number of ways we might go about baking backup and restore into the images. As this is pretty much the last big feature I'd like to add, I'm curious to hear opinions on which way to go.

Current situation:
A user is on :release, which is exist-db 4.x. Once 5.x (a binary-incompatible major upgrade) is out, just running:
docker pull existdb/existdb:release
(and restarting on the existing data) will create a broken instance.

Ideally, I would like this to trigger a backup and restore.

We could:

  • tweak conf.xml's backup job
    <job type="system" name="check1"
         class="org.exist.storage.ConsistencyCheckTask"
         cron-trigger="0 0 * * * ?">
        <parameter name="output" value="export"/>
        <parameter name="backup" value="yes"/>
        <parameter name="incremental" value="no"/>
        <parameter name="incremental-check" value="no"/>
        <parameter name="max" value="2"/>
    </job>

to kick the db into recovery mode, assuming we also made sure that automatic backups were created in the first place, e.g. by using ONBUILD

  • include a call to an external script in docker-compose.yml, so that backup and restore become the default when performing
docker-compose pull

This would go along with setting restart_policy, rollback_config, and update_config, and bumping the docker-compose file format version to 3.7.

  • (least favourite option) passing responsibility on to the user, with some instructions in the README, something along the lines of:
docker exec exist java -jar start.jar backup -u admin -p admin-pass -b /db -d /exist-backup -ouri=xmldb:exist://192.168.1.2:80/xmlrpc

(that last parameter will need some tweaking to work within the container network)
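A minimal sketch of the "external script" from the second option, wrapping an upgrade in backup, recreate, restore. Everything named here is an assumption, not something agreed in this thread: a Compose service and container both called `exist`, a data volume called `exist-data` (Compose usually prefixes volume names with the project name, so adjust `DATA_VOLUME`), a separate backup volume mounted at `/exist-backup`, the admin password in `EXIST_ADMIN_PASS`, and the XML-RPC URI shown. Commands are only printed when `DRY_RUN=1`.

```shell
#!/bin/sh
# Hypothetical upgrade helper for option 2. Assumptions, not from the thread:
# service and container both named "exist", data volume "exist-data", backup
# volume mounted at /exist-backup, admin password in $EXIST_ADMIN_PASS.
set -eu

# Print each command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

EXIST_URI="${EXIST_URI:-xmldb:exist://localhost:8080/exist/xmlrpc}"
DATA_VOLUME="${DATA_VOLUME:-exist-data}"
BACKUP_DIR=/exist-backup/pre-upgrade

# 1. Back up /db while the old major version is still running.
backup_before_upgrade() {
  run docker exec exist java -jar start.jar backup -u admin -p "$EXIST_ADMIN_PASS" \
    -b /db -d "$BACKUP_DIR" "-ouri=$EXIST_URI"
}

# 2. Drop the binary-incompatible data volume and start the new image.
#    The backup volume is not touched, so the backup survives.
recreate_with_new_image() {
  run docker-compose down
  run docker volume rm "$DATA_VOLUME"
  run docker-compose pull exist
  run docker-compose up -d exist
}

# 3. Once the new instance has finished starting, restore the backup.
restore_after_upgrade() {
  run docker exec exist java -jar start.jar backup -u admin -p "$EXIST_ADMIN_PASS" \
    -r "$BACKUP_DIR/db/__contents__.xml" "-ouri=$EXIST_URI"
}
```

Nothing here waits for the new instance to come up between steps 2 and 3; a real script would need some readiness check before restoring.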

In any of these scenarios, backups should go to their own volume; that one is a given in my mind.
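A sketch of what "their own volume" could look like with plain `docker run`. A named volume outlives the container, so removing the container or pulling a new image leaves it alone. The volume name `exist-backup`, the mount point `/exist-backup`, the container name and the port are hypothetical choices for illustration; `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: keep backups on their own named volume, separate from the data.
# Assumptions (hypothetical): volume "exist-backup" mounted at /exist-backup,
# container "exist", port 8080 published.
set -eu

# Print each command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Create the backup volume (re-uses it if it already exists) and start eXist with it.
start_with_backup_volume() {
  image="${1:-existdb/existdb:release}"
  run docker volume create exist-backup
  run docker run -d --name exist -p 8080:8080 \
    -v exist-backup:/exist-backup "$image"
}
```

Backups written with `-d /exist-backup/...` then survive `docker rm exist` and a later pull of a new image tag.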

@duncdrum duncdrum added enhancement New feature or request question Further information is requested labels Oct 14, 2018
@adamretter
Member

@duncdrum I think this should be the user's responsibility.

@grantmacken
Contributor

@adamretter

I think this should be the user's responsibility.

I agree; however, we should attempt to document how to carry out tasks that are specific to eXist running in a container environment, and doing a backup is most likely one of those tasks.
@duncdrum

backups should happen to their own volume

Why not just back up to the '/tmp' dir, then 'docker cp' the backup files to your host,
and do the reverse when doing a restore?
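A sketch of that round trip, assuming the backup tool bundled in start.jar (the same entry point as the `docker exec ... start.jar` command in the first comment), an admin password in `EXIST_ADMIN_PASS`, and an XML-RPC URI reachable from inside the container; all of these are assumptions rather than something settled here. `docker cp` nests the copy inside the destination if that directory already exists, so both directions target paths that are assumed not to exist yet.

```shell
#!/bin/sh
# Sketch of the /tmp + docker cp approach. Assumptions (hypothetical): the
# backup tool in start.jar, admin password in $EXIST_ADMIN_PASS, and the
# XML-RPC URI below being reachable from inside the container.
set -eu

# Print each command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

EXIST_URI="${EXIST_URI:-xmldb:exist://localhost:8080/exist/xmlrpc}"

# Back up /db to /tmp/backup inside the container, then copy it to the host.
# host_dir should not exist yet, otherwise docker cp creates host_dir/backup.
backup_to_host() {
  container="$1"; host_dir="$2"
  run docker exec "$container" java -jar start.jar backup -u admin \
    -p "$EXIST_ADMIN_PASS" -b /db -d /tmp/backup "-ouri=$EXIST_URI"
  run docker cp "$container:/tmp/backup" "$host_dir"
}

# The reverse: copy the host directory into the container and restore it.
# Assumes /tmp/restore does not exist in the container yet.
restore_from_host() {
  container="$1"; host_dir="$2"
  run docker cp "$host_dir" "$container:/tmp/restore"
  run docker exec "$container" java -jar start.jar backup -u admin \
    -p "$EXIST_ADMIN_PASS" -r /tmp/restore/db/__contents__.xml "-ouri=$EXIST_URI"
}
```

For example, `backup_to_host exist ./exist-backup` leaves the backup on the host, from where it can be moved to whatever backup media is in use.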

Another alternative method comes to mind:

docker exec ex java -jar tools/ant/lib/ant-launcher-1.10.2.jar -version

I think the distroless image uses a JRE,
so to run Ant tasks we will need to add tools.jar from the JDK.

@duncdrum
Contributor Author

why not just back up to the '/tmp' dir then 'docker cp' the backup files to your host
and do the reverse when doing a restore.

It's a good idea, but copying to the local drive goes against all my Docker instincts (shrug).
exist should already ship with an ant.jar, so I think we should be able to call that without adding tools to the gcr image.

@grantmacken
Contributor

exist should already ship with an ant.jar

However, as I mentioned, Ant depends on tools.jar and complains if it can't be found:

  1. the builder target uses the JDK, so tools.jar is available
  2. the final target uses a JRE, so Ant won't find tools.jar

So either we add tools.jar from the JDK in this repo,
or (better) the eXist repo includes it as part of its build dependencies.

@grantmacken
Contributor

closed by mistake

@grantmacken grantmacken reopened this Oct 26, 2018
@duncdrum
Contributor Author

@adamretter since we only require a JRE for exist-db, if ant.jar needs something from the JDK, I'd say we should indeed ship tools.jar.

@dizzzz
Member

dizzzz commented Oct 27, 2018

Shipping only tools.jar is probably not allowed, license-wise. Additionally, there might be further technical consequences. So I'd recommend installing the whole JDK.

@adamretter
Member

We should not ship tools.jar like that.

I think we need to take a step back and think about the fundamentals here! We are being blinkered by Docker. Docker needs to work for us, not us working for Docker ;-)

The purpose of a backup is so that a user can get a full copy of their data and then move it to some backup media. In the past this was probably tape or CD-ROM, but these days it is likely a network share on a different machine.

Two sensible options that I see:

  1. The user initiates the backup from the host machine by using backup.sh on the host, and gives the URL of eXist-db running in the Docker container.

  2. The option that @grantmacken suggested. The user initiates the backup from the Docker container to /tmp or somewhere ephemeral, and then they use docker cp to get it to the host.
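A sketch of option 1, pointing an eXist install on the host at the container's published port. The `-u`, `-p`, `-b`, `-d`, `-r` and `-ouri` options mirror the backup.sh usage quoted earlier in the thread; `EXIST_HOME`, the port, the password variable and the URI are hypothetical, and the host install is assumed to be the same major version as the container. `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of option 1: run backup.sh from an eXist install on the host against
# the container's published port. Assumptions (hypothetical): $EXIST_HOME points
# at a host install of the same major version, port 8080 is published, and the
# admin password is in $EXIST_ADMIN_PASS.
set -eu

# Print each command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

EXIST_URI="${EXIST_URI:-xmldb:exist://localhost:8080/exist/xmlrpc}"

# Back up /db into a directory on the host.
host_backup() {
  run "$EXIST_HOME/bin/backup.sh" -u admin -p "$EXIST_ADMIN_PASS" \
    -b /db -d "$1" "-ouri=$EXIST_URI"
}

# Restore from a directory produced by host_backup.
host_restore() {
  run "$EXIST_HOME/bin/backup.sh" -u admin -p "$EXIST_ADMIN_PASS" \
    -r "$1/db/__contents__.xml" "-ouri=$EXIST_URI"
}
```

The backup lands directly on the host, so nothing has to be copied out of the container; the trade-off is that the host needs its own eXist install.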

@duncdrum
Contributor Author

duncdrum commented Oct 28, 2018

Re tools.jar: the upshot of our discussion that isn't backup-specific seems to be that Ant tasks can't be run with our Docker images. We need to a) document this, and b) remove the Ant jar from the image if it is just dead weight.

Re backup to /tmp: nobody is working for Docker, but given the good practice of limiting interaction between the running container and the host VM, it's not that simple. I have no idea what happens if you try to access /tmp on the host on Beanstalk (or its Azure and Google counterparts); I very much doubt you simply can. So we should choose an example that works in those use cases as well as in local dev testing.

@adamretter
Member

adamretter commented Oct 28, 2018

if on beanstalk (or its azure and google counterparts) you try to access /tmp on host

I was talking about /tmp on the guest (i.e. in the container) not the host.

@duncdrum
Contributor Author

OK, that makes more sense, so here is my latest take on our discussion.
I'm in favour of triggering server-side backups inside a running container, since the chances that other processes might interfere with a client-side backup in multi-container environments are pretty high. This has the added advantage that folks can just use the UI.

Instead of depending on a specific path like /tmp existing in the base gcr image, we should use the regular default path, which we are generating anyway: webapp/WEB-INF/data/export/

The README gets a line on how to trigger a server-side backup for a given container, along the lines of:

docker exec exist java -jar start.jar client --no-gui --xpath "system:export('/export/backups', false(), false())"
docker cp exist:exist-data/export/backups .

followed by:

docker cp . exist:exist-data/export/backups
docker exec exist java -jar start.jar client --no-gui --xpath "system:restore('/export/backups', '', '')"

I'll see about the repair functions and take some screenshots.
Is there anything speaking against adding the backup location and system calls to the compose file?

4 participants