Portals are a way to ingest content into a CVMFS server.
We base the portal implementation on the S3 interface because S3 is readily available to anybody.
In particular, the portal design will take into account the following S3 implementations:
- AWS
- minio
- CEPH (the one used at CERN)
The portal will expect one bucket named after each CVMFS repository, and an extra bucket with the name $REPONAME.portal.
If either of those buckets is missing, the portal daemon will not consider either of them.
As an example, if the portal is managing the repositories foo.cern.ch and bar.fermilab.com, it will expect 4 buckets:
- foo.cern.ch
- foo.cern.ch.portal
- bar.fermilab.com
- bar.fermilab.com.portal
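The naming convention above can be sketched in a few lines of Python. This is only an illustration: the function names and the STATUS_SUFFIX constant are hypothetical, not part of any existing portal code.

```python
STATUS_SUFFIX = ".portal"  # suffix of the extra bucket, as described above


def expected_buckets(repositories):
    """Return the bucket names the portal expects for the given repositories."""
    names = []
    for repo in repositories:
        names.append(repo)
        names.append(repo + STATUS_SUFFIX)
    return names


def usable_repositories(repositories, buckets):
    """Return the repositories for which both buckets of the pair exist."""
    available = set(buckets)
    return [
        repo
        for repo in repositories
        if repo in available and repo + STATUS_SUFFIX in available
    ]
```

With the two repositories of the example, expected_buckets returns the four names listed above, and usable_repositories drops any repository whose pair is incomplete.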
The *.portal bucket will be used to communicate the status of the system to the external world; we will refer to it as the status bucket from now on.
The portal daemon will receive as input a TOML document listing the S3 credentials along with the domain. We expect an array of tables under the key backend.
Example:
# foo.cern.ch <- this line is a comment
[[backend]]
S3_ACCESS_KEY="..."
S3_SECRET_KEY="..."
S3_DOMAIN="..."
[[backend]]
S3_ACCESS_KEY="..."
S3_SECRET_KEY="..."
S3_DOMAIN="..."
For each backend the daemon will try to connect and list all the available buckets.
If it finds a pair of buckets that follows the convention above, and the daemon is able to open a transaction in the respective CVMFS repository, it will start to work on those buckets.
By being careful with the keys, it is possible to have a single S3 instance serve as the storage for any number of portals.
The daemon will be as parallel as possible.
There will be an independent process entity for each repository the daemon works on.
On top of that there will be an extra one for each pair of buckets that the daemon has encountered.
And finally another independent unit for each backend defined.
These "independent units" will be green threads, goroutines or even Erlang processes, depending on the final implementation of the portal.
What is important is that they are cheap and independent from each other.
Now we will explore what each of these independent units ("processes" from now on) does.
For each backend defined in the input file, we will spawn a process.
The process will continuously list the buckets, merge this information with the repositories the daemon is managing, and for each repository managed by the portal daemon it will spawn:
- A ping process
- A repo process
It will not try to kill the processes when they are no longer needed, since each of them will terminate itself as soon as it detects that it should no longer exist.
We may want to stop a particular portal for a while without changing the credentials.
Another idea could be to put a file with a specific meaning into the status bucket (the *.portal bucket); in this way the operation of the portals could be stopped manually without touching the configuration file.
The files could be like:
- RUN
- STOP
Or maybe, even better, a single file with the action inside. No, a single file will suffer from eventual consistency issues when we overwrite its content.
Maybe it is better to use multiple files and exploit the "last modified" metadata key; it is still possible to run into eventual consistency problems, but very unlikely.
The ping process will simply keep pinging the S3 backend and upload a file with the timestamp of the last successful connection into the status bucket, the *.portal one.
The repository process lists all the objects in the bucket it is associated with.
It picks one and checks whether the file has already been uploaded into the repository itself; if it has, it picks another one, and so forth in a loop.
If the file is not in the repository, it starts the upload procedure.
After the upload it starts again.
S3 provides only eventual consistency for delete operations; this means that after deleting a file we have no guarantee that a subsequent list will not still return the file just deleted.
The simplest solution would be to just ignore the problem: in the unlikely case that the delete is too slow we just upload the same file twice, which is not a huge issue.
Another solution is to keep listing the files in the bucket until we are sure that, at least in our region, the file has been removed from the index.
Another solution is to use lockfiles, uploading a $name.tar.hash().STATUS file, where STATUS would be one of the following:
- Downloading
- Ingesting
- Success
- Failure
- Deleted
- Retry
Those files could be used by the operator to understand what is going on, or at what point of the process we are.
Moreover, we could include the timestamp inside those files so as to know at what point of the process each file is, detect inconsistencies, manage retries and so on.
The status files can be deleted by another process after some time (maybe 24 hours?).
I am quite keen to go for the last option, which is a little more complex but allows a lot more monitoring of the system, which will be quite essential.
This process is responsible for repeatedly loading the configuration file and checking whether it is necessary to spawn other processes.
It will reload the configuration file every 30 seconds or when it receives the SIGHUP signal.
It will start by parsing the output of cvmfs_server list to understand which repositories are available.
[For each line we split at the first space ' ' and select the first chunk, which should be the name of the repository.]
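The parsing rule just described (split each line at the first space and keep the first chunk) could look roughly like this. The helper names are hypothetical, and the sketch assumes nothing about the rest of each line.

```python
import subprocess


def parse_repository_names(listing_output):
    """Extract the first space-separated chunk of every non-empty line."""
    names = []
    for line in listing_output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        names.append(line.split(" ", 1)[0])
    return names


def list_repositories():
    """Run `cvmfs_server list` and return the repository names it reports."""
    result = subprocess.run(
        ["cvmfs_server", "list"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with an error
    )
    return parse_repository_names(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the rule easy to test without a CVMFS installation.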
For every backend the process will try to connect to it using the S3 API. If it fails, it prints an error message.
Once we are connected to a backend we proceed to list all the buckets in that backend.
If we find any bucket that matches one of the repositories above, we check whether there is also the status bucket (the $REPONAME.portal bucket).
If all those checks succeed, we proceed to spawn a new PING process and a Repository process, passing to both the connection configuration and the name of the repository.
The role of this process is simply to give feedback to the operator that the portal is working correctly.
It simply keeps uploading the same file, PING, over and over, with the content set to the current timestamp.
It does so every 5 minutes (configurable).
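A minimal sketch of this loop is shown here. The upload is abstracted behind a put_object(key, content) callable, which is a hypothetical stand-in for the real S3 client. The timestamp format (Unix seconds as text) and the max_pings parameter, which exists only so the loop can be stopped in tests, are assumptions as well.

```python
import time


def ping_forever(put_object, interval_seconds=300, clock=time.time,
                 sleep=time.sleep, max_pings=None):
    """Keep overwriting the PING object with the current timestamp."""
    sent = 0
    while max_pings is None or sent < max_pings:
        # Assumption: the content is the Unix time in whole seconds.
        put_object("PING", str(int(clock())))
        sent += 1
        sleep(interval_seconds)
```

Injecting clock and sleep keeps the function independent from wall-clock time, which makes the 5 minute default easy to override.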
This process is the one that does the real work.
To understand if the process should run we start by listing the status bucket.
The process starts working if any of the following is true:
- The bucket is empty
- There is only the PING file
- A RUN file is present and a STOP file is not present
- Both a RUN file and a STOP file are present, but the last-modified time of the RUN file is later than that of the STOP file.
- There is not a STOP file.
Conditions 1) and 2) collapse into 5).
In all other cases the process should not do any work; it will start a timeout and repeat the check after 5 minutes.
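The conditions above reduce to a small predicate: run when there is no STOP file, or when a RUN file is newer than the STOP file. A sketch, assuming the listing of the status bucket has already been turned into a mapping from object name to last-modified time (the function name and this input shape are hypothetical):

```python
from datetime import datetime


def should_run(status_objects):
    """Decide whether the repository process may work.

    status_objects maps object names in the status bucket to their
    last-modified datetime.
    """
    stop = status_objects.get("STOP")
    if stop is None:
        # Covers the empty bucket, the PING-only bucket and RUN without STOP.
        return True
    run = status_objects.get("RUN")
    return run is not None and run > stop
```

Note that a RUN file with exactly the same timestamp as the STOP file does not restart the process in this sketch; that tie-break is an arbitrary choice.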
If we decide that the process should run, it starts by listing the content of the main bucket.
It will sort the files by the last-modified field and start analyzing each file.
It will start by computing the hash of the stat of the file.
With the name of the file and the hash of its metadata, it will start to look for the status files.
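One possible way to derive the status file names is sketched here. Which metadata fields go into the hash and which hash function is used are not decided in this document; the choice of size, last-modified and ETag hashed with SHA-1 is purely an assumption for illustration, as are the function names.

```python
import hashlib


def metadata_hash(size, last_modified, etag):
    """Hash the object's metadata (assumed fields) into a hex digest."""
    payload = f"{size}|{last_modified}|{etag}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def status_key(object_name, meta_hash, status):
    """Build the name of a status file: <name>.<hash>.<STATUS>."""
    return f"{object_name}.{meta_hash}.{status}"
```

Because the hash depends on the metadata, a file re-uploaded under the same name with different content gets a fresh set of status files.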
It will continue the procedure if and only if:
- No status files are present
- Both Failure and Retry files are present, with the Retry file being newer than the Failure one
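The two conditions can be written as a small predicate. As a sketch, it assumes the status files found for one object are given as a mapping from status name to last-modified time; the function name is hypothetical.

```python
from datetime import datetime


def should_process(status_files):
    """Return True if the object should go through the procedure.

    status_files maps status names (e.g. "Failure", "Retry") to the
    last-modified datetime of the corresponding status file.
    """
    if not status_files:
        return True  # never seen before
    failure = status_files.get("Failure")
    retry = status_files.get("Retry")
    # Retry only when the operator asked for it after the failure.
    return failure is not None and retry is not None and retry > failure
```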
If the process decides to continue it will start downloading the file.
In this phase we are moving the file from the S3 backend into our local storage.
We start by creating a temporary file.
We then upload the .DOWNLOADING file into the status bucket and we try to download the file, writing it into the temporary file just created.
We retry the download 3 times (configurable) in case of error.
If we are still unable to download the file, we write the .FAILURE file logging the error, we delete the .DOWNLOADING file and we move on to the next file.
We log each failure to STDERR.
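The retry behaviour can be sketched with a small generic helper. Here "3 times" is interpreted as three attempts in total, which is an assumption; the helper name and its parameters are hypothetical too.

```python
import sys


def with_retries(operation, attempts=3, log=sys.stderr):
    """Call operation() up to `attempts` times, logging every failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:  # any error counts as a failed attempt
            last_error = error
            print(f"attempt {attempt}/{attempts} failed: {error}", file=log)
    # Every attempt failed: let the caller write the .FAILURE file.
    raise last_error
```

The same helper could wrap the ingestion step, since that one is retried a configurable number of times as well.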
Once the file is in local storage we can proceed to ingest it into CVMFS.
We start by uploading the .INGESTING file into the status bucket.
Again we try to ingest the file a configurable number of times. (Can I understand what error cvmfs_server returns without parsing the output?)
If we were unsuccessful in ingesting the file, we upload the .FAILURE file into the status bucket, then we delete the .INGESTING file, and finally we clean up the temporary file and move to the next file.
If we are successful in ingesting the file, we upload the .SUCCESS file.
Once we have successfully uploaded a file into CVMFS, we proceed to delete the file from S3.
We start by uploading the .DELETED status file.
Then we issue the delete command.
If everything went correctly we should have the file ingested into CVMFS, the .DOWNLOADING, .INGESTING, .DELETED and .SUCCESS files in the status bucket, and the original file no longer in the bucket.
If there was any error we should have the .FAILURE status file with the error logged in it, along with the status file of the last attempted operation.
If it was a retry attempt, the previous status files will have been overwritten by the new ones.
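The order of the status files in the happy path and in the failure paths can be summarised in one function. This is only a sketch: every callable passed in (put_status, delete_status, download, ingest, delete_object) is a hypothetical stand-in for the real S3 and cvmfs_server operations, and retries and temporary file cleanup are left out for brevity.

```python
def process_object(name, prefix, put_status, delete_status,
                   download, ingest, delete_object):
    """Run one object through download, ingest and delete.

    prefix is "<name>.<hash>"; returns True on success, False on failure.
    """
    put_status(prefix + ".DOWNLOADING", "")
    try:
        local_path = download(name)
    except Exception as error:
        put_status(prefix + ".FAILURE", str(error))
        delete_status(prefix + ".DOWNLOADING")
        return False

    put_status(prefix + ".INGESTING", "")
    try:
        ingest(local_path)
    except Exception as error:
        put_status(prefix + ".FAILURE", str(error))
        delete_status(prefix + ".INGESTING")
        return False

    put_status(prefix + ".SUCCESS", "")
    put_status(prefix + ".DELETED", "")  # written before the actual delete
    delete_object(name)
    return True
```

With dictionaries standing in for the two buckets, a successful run leaves the four status files in the status bucket and removes the object from the main one.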
After all the files from the listing have been analyzed, we issue another list operation; if there are new files to work on we start to ingest them, otherwise we sleep for 10 minutes.
For every successful upload we generate 4 status files; after a while all these status files could really get in the way of the operator.
We can spawn a garbage collector process that will clean up all those files.
A reasonable default would be to leave untouched all the files related to a failed attempt (no matter how old) and to delete all the files produced by a successful attempt after 24 hours.
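A sketch of how the garbage collector could pick the files to delete under this default. It assumes status files are named <name>.<hash>.<STATUS> and that the age of an attempt is measured from its newest status file; both the function name and that second choice are assumptions.

```python
from datetime import datetime, timedelta


def gc_candidates(status_files, now, max_age=timedelta(hours=24)):
    """Return the status files that can be deleted, sorted by name.

    status_files maps status file names to their last-modified datetime.
    """
    groups = {}
    for key, modified in status_files.items():
        prefix, _, status = key.rpartition(".")
        groups.setdefault(prefix, {})[status.upper()] = (key, modified)

    to_delete = []
    for statuses in groups.values():
        if "FAILURE" in statuses or "SUCCESS" not in statuses:
            continue  # keep failed or unfinished attempts untouched
        newest = max(modified for _, modified in statuses.values())
        if now - newest >= max_age:
            to_delete.extend(key for key, _ in statuses.values())
    return sorted(to_delete)
```

Files without a SUCCESS marker, such as PING, RUN or STOP, are never selected.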
We are not optimizing the number of calls to the S3 backend. Those calls are not free; however, they are extremely cheap and we pay only on the AWS backend. On minio and CEPH we don't really have these concerns.
We keep polling the S3 backend. This is an issue since we need to somehow decide how often to poll it, and every time interval we pick will be wrong for some use case. This is mitigated by letting the operator decide the poll timeout. The other option is to exploit the callback capabilities of S3. However, those callbacks are not really implemented in CEPH, and there are several inconsistencies in how they are implemented in AWS vs. minio.
Using the callbacks would eliminate both the concerns about the number of calls against the backend and the polling timeout; however, it would make everything more complex, both operationally (the S3 backend needs to be set up in such a way that it calls the portal daemon) and from the coding point of view (more cases to manage).
It can be added to the configuration if we decide it is something nice to have and worth working on. But at the moment I wouldn't bother too much with it.