This repository has been archived by the owner on Jul 15, 2021. It is now read-only.

Differences in processed items from trust anchors from one validator to the other #128

Open
racompton opened this issue Jan 9, 2020 · 6 comments

@racompton

Hello, I have two RPKI validators set up on the same subnet with the same Internet access. They run the same OS, software, and config; the only difference between them is their IP addresses (both servers are dual-stacked). One validator's trust anchor page looks very similar to https://rpki-validator.ripe.net/trust-anchors, but the other shows a status of "Failed" when trying to connect to https://rrdp.apnic.net/notification.xml. On the server showing "Failed", I can manually run "wget https://rrdp.apnic.net/notification.xml" successfully, so it doesn't seem to be a connectivity issue. Is there a way to force an update manually, or anything else I can try?
I'm also getting the warning "Manifest next update time is in the past, local clock may be off" on both boxes, but both are set to UTC. I see the same warnings on https://rpki-validator.ripe.net, so I'm assuming it's an issue with the dates on the RIRs' manifests and not with the validators.
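That warning comes from comparing a manifest's validity window (the thisUpdate and nextUpdate times that RFC 6486 manifests carry) against the local clock. The sketch below shows such a check in Python. It assumes the validator does a simple comparison like this; rpki-validator-3's actual logic may differ, and `check_manifest_window` is a hypothetical name. It shows why a manifest the publisher failed to re-issue on time triggers the same warning on every validator whose clock is correct.

```python
from datetime import datetime, timedelta, timezone


def check_manifest_window(this_update, next_update, now=None):
    """Classify a manifest's validity window against the local clock.

    Hypothetical helper: this_update and next_update are the manifest's
    thisUpdate/nextUpdate times as timezone-aware datetimes.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if now < this_update:
        # The manifest claims to have been issued in the future.
        return "thisUpdate in the future: local clock may be behind"
    if now > next_update:
        # Either the publisher did not re-issue the manifest in time,
        # or the local clock is ahead.
        return "nextUpdate in the past: manifest stale or local clock ahead"
    return "ok"


if __name__ == "__main__":
    now = datetime(2020, 1, 9, 12, 0, tzinfo=timezone.utc)
    # A manifest that expired an hour ago, checked with a correct clock.
    print(check_manifest_window(now - timedelta(days=1),
                                now - timedelta(hours=1), now))
```

The check uses only the manifest's own timestamps and the local UTC clock. Seeing the same warning on rpki-validator.ripe.net is therefore consistent with the manifests themselves being stale.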

@racompton
Author

racompton commented Jan 10, 2020

So my procedure for fixing this is:
1. Stop the rpki-validator service: "sudo systemctl stop rpki-validator-3.service"
2. Delete all the .xd files in /var/lib/rpki-validator-3/db: "sudo rm /var/lib/rpki-validator-3/db/*.xd"
3. Start the rpki-validator service: "sudo systemctl start rpki-validator-3.service"
4. Re-install the ARIN TAL: "upload-tal.sh arin-rfc7730.tal http://localhost:8080/"
5. Check the web UI after 30 minutes or so to see if things are fixed.
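The manual steps can be sketched as a single Python function. This is a minimal sketch under several assumptions. It must run as root, for example under sudo. `upload-tal.sh` must be on PATH, as in the manual procedure. The service name, database path, and API URL are taken from the steps above. `reset_validator` is a hypothetical name, and the `run` parameter exists only so the sequence can be tested without systemd.

```python
import glob
import os
import subprocess

SERVICE = "rpki-validator-3.service"
DB_DIR = "/var/lib/rpki-validator-3/db"


def reset_validator(db_dir=DB_DIR, tal="arin-rfc7730.tal",
                    api_url="http://localhost:8080/", run=subprocess.run):
    """Stop the validator, delete its *.xd database files, start it again
    and re-upload the TAL. Returns the list of deleted files."""
    run(["systemctl", "stop", SERVICE], check=True)
    # Same files the manual "rm .../db/*.xd" step removes.
    removed = sorted(glob.glob(os.path.join(db_dir, "*.xd")))
    for path in removed:
        os.remove(path)
    run(["systemctl", "start", SERVICE], check=True)
    # Assumes upload-tal.sh is on PATH, as in the manual procedure.
    run(["upload-tal.sh", tal, api_url], check=True)
    return removed
```

Calling `reset_validator()` as root performs the first four steps. As in the manual procedure, the TAL upload follows the restart immediately. If the web API is not yet listening at that point, you may need to wait or retry before uploading.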

I'd like to request a feature that resolves issues like this automatically, rather than having to notice the problem manually and then run this procedure by hand.

@lolepezy
Contributor

Could you please clarify: did the validator without connectivity start connecting after you removed the database and restarted it? Do you use a proxy?

@lolepezy
Contributor

The reason for this behaviour is, I believe, that downloading the repository snapshot from APNIC takes 15 minutes:

$ wget https://rrdp.apnic.net/4ea5d894-c6fc-4892-8494-cfd580a414e3/128129/snapshot.xml
snapshot.xml 22%[===============> ] 5.40M 24.2KB/s eta 14m 18s

We will look into what we can do about it in the validator.
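For context on where that snapshot URL comes from: under RRDP (RFC 8182), a client first fetches the small notification.xml. That file names the current session_id and serial and points to one full snapshot plus a set of incremental deltas. A small notification file downloading fine with `wget` is therefore consistent with the large snapshot it references being the slow part. The sketch below parses a notification document with Python's standard library and picks what a client would need to download. It is a simplified view of the RFC 8182 client logic, not rpki-validator-3's code. The sample XML, its URIs and hashes are made up, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by RRDP documents (RFC 8182), in ElementTree's {uri} form.
RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"

# Made-up notification file; hashes are placeholders, not real SHA-256 values.
SAMPLE = """<notification xmlns="http://www.ripe.net/rpki/rrdp"
    version="1" session_id="example-session" serial="3">
  <snapshot uri="https://rrdp.example.net/example-session/3/snapshot.xml" hash="aa"/>
  <delta serial="3" uri="https://rrdp.example.net/example-session/3/delta.xml" hash="bb"/>
  <delta serial="2" uri="https://rrdp.example.net/example-session/2/delta.xml" hash="cc"/>
</notification>"""


def parse_notification(xml_text):
    """Return (session_id, serial, snapshot_uri, {delta_serial: delta_uri})."""
    root = ET.fromstring(xml_text)
    if root.tag != RRDP_NS + "notification":
        raise ValueError("not an RRDP notification file")
    snapshot = root.find(RRDP_NS + "snapshot")
    deltas = {int(d.get("serial")): d.get("uri")
              for d in root.findall(RRDP_NS + "delta")}
    return root.get("session_id"), int(root.get("serial")), snapshot.get("uri"), deltas


def files_to_fetch(notification, local_session=None, local_serial=None):
    """Decide what a client must download to catch up.

    With no local state or a different session, only the full snapshot
    will do; otherwise apply the deltas after local_serial, falling back
    to the snapshot if any delta in that range is missing."""
    session, serial, snapshot_uri, deltas = notification
    if local_session != session or local_serial is None:
        return [snapshot_uri]
    missing = range(local_serial + 1, serial + 1)
    if all(s in deltas for s in missing):
        return [deltas[s] for s in missing]
    return [snapshot_uri]


if __name__ == "__main__":
    note = parse_notification(SAMPLE)
    print(files_to_fetch(note))                        # fresh database: snapshot
    print(files_to_fetch(note, "example-session", 2))  # one delta behind
```

Deleting the validator's database, as in the reset procedure above, discards its local session and serial. Under this logic, the next sync has to fetch the full snapshot again rather than a few small deltas.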

@racompton
Author

Could you please clarify: did the validator without connectivity start connecting after you removed the database and restarted it? Do you use a proxy?

Yes, it started to connect after I removed the database and restarted it. No, I don't use a proxy.

@racompton
Author

FYI, I created a script to fix the validator when it gets out of whack (https://github.com/racompton/restart-validator/blob/master/restart-validator.sh). It stops the validator, deletes all the database files, starts the validator, and then loads the ARIN TAL.

@zzaflemi

I know this all depends on how you have the software deployed, but I think you can just put a copy of the ARIN TAL (or any other TAL) in the preconfigured-tals directory. Then, when the service starts up and creates a new database, it will load it along with the other included TALs. At least that has worked for me the many times I have deleted the DB in the past.
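A minimal Python sketch of this suggestion copies the TAL file into the preconfigured-tals directory, so a freshly created database picks it up on the next start. Where that directory lives depends on how the validator was installed, so the sketch takes it as a required argument rather than guessing a default. `install_preconfigured_tal` is a hypothetical helper name.

```python
import os
import shutil


def install_preconfigured_tal(tal_file, tals_dir):
    """Copy a TAL into the validator's preconfigured-tals directory.

    tals_dir is deployment-specific; the directory must already exist,
    since a newly created one would not be read by the validator."""
    if not os.path.isdir(tals_dir):
        raise FileNotFoundError(f"{tals_dir} does not exist")
    dest = os.path.join(tals_dir, os.path.basename(tal_file))
    if not os.path.exists(dest):
        # copy2 also preserves the file's timestamps.
        shutil.copy2(tal_file, dest)
    return dest
```

For example, call `install_preconfigured_tal("arin-rfc7730.tal", tals_dir)` once, with `tals_dir` set to your installation's preconfigured-tals path and as a user allowed to write there. After that, deleting the database no longer requires re-uploading the TAL by hand, assuming the validator behaves as described in the comment.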
