This hands on session will make you familiar with the high performant GridFTP transfer mechanism. You will learn how to install the globus-url-copy command line utility, used to tranfer data to a GridFTP server and how to develop a script for automated ingestion of data into B2SAFE using GridFTP.
Total time for this hands on is estimated at 60 minutes.
Goal: install and configure the globus-url-copy command
Time: 10 minutes
Install the globus-data-management-client
from the globus package repo on your system. This will provide the Client Tools for data management, including the GridFTP client programs and globus-url-copy
Full installation instructions are available here: http://toolkit.globus.org/toolkit/docs/6.0/admin/install/
.
Download and install the package repo for your system:
wget http://toolkit.globus.org/ftppub/gt6/installers/repo/globus-toolkit-repo-latest.noarch.rpm && \
rpm -hUv globus-toolkit-repo-latest.noarch.rpm && \
yum install globus-data-management-client
Download and install the package repo for your system:
wget http://toolkit.globus.org/ftppub/gt6/installers/repo/globus-toolkit-repo_latest_all.deb && \
dpkg -i globus-toolkit-repo_latest_all.deb && \
apt-get update && apt-get install -y globus-data-management-client
Continue with this section if you want to install the globus-client-tools on your local machine. This allows you to ingest data into the B2SAFE test environment directly from your machine.
- Obtain your user certificate and private key from the remote server and place it in
~/.globus/
:
mkdir ~/.globus && \
scp <user>@145.100.58.133:~/user* ~/.globus/
Where user
is your groups user (irods1, irods2, ...) of the client vm.
- Install the certificate for the CA that signed the user certificate (only needed for untrusted CA's):
mkdir /etc/grid-security/certificates && \
scp <user>@145.100.58.133:~/ca/* /etc/grid-security/certificates/
- Initialize the proxy
The first time you can run the following command to get detailed information about your certificate:
grid-proxy-init -verify -debug
Normally you don't need to initialize the proxy with the -verify
and -debug
options. Thus the following suffices to initialize a new proxy certificate:
grid-proxy-init
Finally you have to supply the password for the user certificate and then the proxy certificate is generated. The final output will tell you when the proxy certificate will expire. By default the proxy certificate has a validity of 12 hours.
In order to access the irods machine via its hostnames irods4.alice
we have to add the following entry to the /etc/hosts
file:
echo "145.100.59.149 irods4.alice" >> /etc/hosts
Note: you might need to use sudo
to obtain write permission.
On the client VM a simple file structure has been prepared for this tutorial. You can use your own files, but in order to follow the tutorial it might be easier to copy these files to your local machine:
scp -r <user>@145.100.58.133:~/gridftp* .
Continue with section 1.3.
Continue with this section if you do not want to or cannot install the globus-client-tools on your local machine. This allows you to ingest data into the B2SAFE test environment from the client VM. If you want to ingest local data you will have to copy (scp) it to the client VM first. Credentials of the client VM are distributed during the workshop.
- Install your user certificate and private key in
~/.globus/
:
mkdir ~/.globus && \
cp *.pem ~/.globus/
- Initialize the proxy
The first time you can run the following command to get detailed information about your certificate:
grid-proxy-init -verify -debug
Normally you don't need to initialize the proxy with the -verify
and -debug
options. Thus the following suffices to initialize a new proxy certificate:
grid-proxy-init
Finally you have to supply the password for the user certificate and then the proxy certificate is generated. The final output will tell you when the proxy certificate will expire. By default the proxy certificate has a validity of 12 hours.
Continue with section 1.3.
You can use the grid-proxy-info
command to get information about the current proxy:
$ grid-proxy-info
subject : /O=Grid/OU=GlobusTest/OU=simpleCA-irods4.alice/OU=local/CN=GridFTP-001/CN=1963993271
issuer : /O=Grid/OU=GlobusTest/OU=simpleCA-irods4.alice/OU=local/CN=GridFTP-001
identity : /O=Grid/OU=GlobusTest/OU=simpleCA-irods4.alice/OU=local/CN=GridFTP-001
type : RFC 3820 compliant impersonation proxy
strength : 1024 bits
path : /tmp/x509up_u1001
timeleft : 8:22:56
You can use the grid-proxy-destroy
command to delete the current proxy certificate.
You can use the grid-cert-info
command to display information about the current active user certificate (installed in ~/.globus/usercert.pem
).
You can use the grid-cert-diagnostics
command to run some diagnostics on the certificate setup on your system and the trust chain.
The grid-proxy-init -verify -debug
command will show some information about your user certificate and CA that signed the user certificate.
This is the user certificate and key file you just copied from the remote machine:
User Cert File: /root/.globus/usercert.pem
User Key File: /root/.globus/userkey.pem
The CA certificate and signing policy used to sign the user certificate is loaded from this directory:
Trusted CA Cert Dir: /etc/grid-security/certificates
Your proxy certificate is placed in this file:
Output File: /tmp/x509up_u0
The identity you will be using, based on the user certificate:
Your identity: /O=Grid/OU=GlobusTest/OU=simpleCA-irods4.alice/OU=local/CN=GridFTP-001
Error: Couldn't verify the authenticity of the user's credential to generate a proxy from.
grid_proxy_init.c:956: globus_credential: Error verifying credential: Failed to verify credential
globus_gsi_callback_module: Could not verify credential
globus_gsi_callback_module: Could not verify credential
globus_gsi_callback_module: Error with signing policy
globus_gsi_callback_module: Error with signing policy
globus_sysconfig: Error getting signing policy file
globus_sysconfig: File does not exist: /etc/grid-security/certificates/dd5d9bb8.signing_policy is not a valid file
This error means the CA signing policy file for the user certificate is not in the expected location. Fetch the CA signing policy file from the remote alice
server an place it in /etc/grid-security/certificates
.
Error: Couldn't verify the authenticity of the user's credential to generate a proxy from.
grid_proxy_init.c:956: globus_credential: Error verifying credential: Failed to verify credential
globus_gsi_callback_module: Could not verify credential
globus_gsi_callback_module: Can't get the local trusted CA certificate: Cannot find trusted CA certificate with hash dd5d9bb8 in /etc/grid-security/certificates
This error means the CA certificate is not in the expected location. Fetch the CA certificate from the remote server and place it in /etc/grid-security/certificates
.
Goal: Getting familiar with the globus-url-copy command and performing some basic GridFTP actions.
Time: 10 minutes
There are many options available for the globus-url-copy
command. This section will show the parts explaining how to use the command and some of the most relevant options for this session.
Usage of the globus-url-copy
command:
# globus-url-copy -help
globus-url-copy [options] <sourceURL> <destURL>
globus-url-copy [options] -f <filename>
<sourceURL> may contain wildcard characters * ? and [ ] character ranges
in the filename only.
Any url specifying a directory must end with a forward slash '/'
If <sourceURL> is a directory, all files within that directory will
be copied.
<destURL> must be a directory if multiple files are being copied.
...
Some useful arguments:
-help | -usage
Print help
-version
Print the version of this program
-versions
Print the versions of all modules that this program uses
-q | -quiet
Suppress all output for successful operation
-v | -verbose
Display urls being transferred
-vb | -verbose-perf
During the transfer, display the number of bytes transferred
and the transfer rate per second. Show urls being transferred
-dbg | -debugftp
Debug ftp connections. Prints control channel communication
to stderr
-list <url to list>
-concurrency | -cc
Number of concurrent ftp connections to use for multiple transfers.
-sync
Only transfer files where the destination does not exist or differs
from the source. -sync-level controls how to determine if files
differ.
-sync-level <number>
Choose criteria for determining if files differ when performing a
sync transfer. Level 0 will only transfer if the destination does
not exist. Level 1 will transfer if the size of the destination
does not match the size of the source. Level 2 will transfer if
the timestamp of the destination is older than the timestamp of the
source, or the sizes do not match. Level 3 will perform a checksum of
the source and destination and transfer if the checksums do not match,
or the sizes do not match. The default sync level is 2.
In the following command examples irods<x>
is the irods username of your group. <x>
should be replaced with your group number.
Listing an iRODS collection in the remote server by using the -list
argument.
Example:
globus-url-copy -ipv6 -list gsiftp://irods4.alice/aliceZone/home/irods<x>/
Note: don't forget the trailing slash (/).
Since this GridFTP server is integrated with iRODS, the url to list consists of /<zone_name>/<collection>/<collection>/...
. Where the collection
part is the logical path inside the iRODS zone.
If you want more output on what is happening use the -dbg -v
arguments.
globus-url-copy -ipv6 -dbg -v -list gsiftp://irods4.alice/aliceZone/home/irods<x>/
This will output alot of information on the data sent to and received from the server.
Transfer a single file to the remote iRODS server:
globus-url-copy -ipv6 -vb single_file.txt gsiftp://irods4.alice/aliceZone/home/irods<x>/
Tranfer a directory to the remote iRODS server:
globus-url-copy -ipv6 -vb dataset1/ gsiftp://irods4.alice/aliceZone/home/irods<x>/dataset1/
This command will fail with the following message because the destination directory doesn't exist:
500 500-Command failed. : iRODS DSI. Error: rcDataObjCreate failed. SYS_INTERNAL_NULL_INPUT_ERR: , status: -24000.
Questions:
- Improve the command in such a way that the destination directory is properly created.
- Improve the command in such a way that all subdirectories are included as well.
Goal: use the globus-url-copy
command to implement a data transfer workflow using the B2SAFE service as remote storage.
Time: 30 minutes
The goal is to develop a script that will synchronize a directory tree from the client to the server and when run multiple times it should take into account changed and deleted files.
- Create the data synchronization script
- Synchronize your
gridftp<xyz>
directory to/aliceZone/home/alice/irods<x>/data/
- Verify the data is properly updated
- Synchronize again and verify no files are transfered
- Change a file
- Synchronize again and verify the file is properly updated
The iRODS and B2SAGE hands on session should be completed before this step.
The goal is to improve the data transfer workflow as follows:
- Configure the data transfer workflow in such a way that checksums are generated for the ingested data.
- Configure the data transfer workflow in such a way that PIDs are generated and assigned to ingested data.
You can use the ichksum
command to generate checksums for data objects in iRODS. Alternativly you can utilize the irule
command and invoke a rule which uses the <todo>
microservice.
In order the mint pids you can use the irule
command to call the B2SAFE microservices used to manage PIDs.
Usefull rules can be found in /etc/irods/eudat3.re
.
A selection (not a complete list) of PID management rules provided via B2SAFE:
- EUDATCreatePID(*parent_pid, *path, *ror, *iCATCache, *newPID)
- EUDATPidsForColl(*collPath)
- EUDATeCHECKSUMupdate(*PID, *path)
A better, and more advanced, approach is to use the iRODS rule engine to compute checksums and mint PIDs automatically. This has probably not been covered in the tutorials so far, so only proceed if you are ok with some trial and error.
You can find the iRODS server rule engine configuration here: /etc/irods/core.re
.
Some usefull hooks in this context are:
acPostProcForPut
- Rule for post processing the put operation.acPostProcForCopy
- Rule for post processing the copy operation.acPostProcForFilePathReg
- Rule for post processing the registrationacPostProcForCreate
- Rule for post processing of data object create.acPostProcForOpen
- Rule for post processing of data object open.acPostProcForPhymv
- Rule for post processing of data object phymv.acPostProcForRepl
- Rule for post processing of data object repl.
Make sure to limit the changes to the irods home directory for your user:
ON($objPath like "/aliceZone/home/irods<x>/*") {
# do something useful
}
More information on the iRODS microservices: https://docs.irods.org/master/doxygen/
- Upon ingestion of data in B2SAFE checksums are now computed, how could you obtain the checksums for data you ingest with the data transfer script?
- Upon ingestion of data in B2SAFE PIDs are now assigned, how could you obtain the PIDs for data you ingest with the data transfer script?
If you completed the data transfer workflow improvements by using the iRODS rule engine you can continue with this section.
The iRODS and B2STAGE hands on sessions should be completed before this step.
The goal if to configure the alice iRODS server in such a way that ingested data is replicated to the bob iRODS server as well using the B2SAFE service.
Client VM:
for each iRODS user the following layout is present:
/home/irods<x># tree
.
├── ca
│ ├── dd5d9bb8.0
│ └── dd5d9bb8.signing_policy
├── gridftp001
│ ├── dataset1
│ │ └── single_file.txt
│ └── dataset2
│ ├── 1.txt
│ ├── 2.txt
│ └── objects
│ └── 3.txt
├── usercert.pem
└── userkey.pem
5 directories, 8 files
Each certificate is named usercert.pem
with a passphrase of the form gridftp<xyz>
, where <xyz>
is the respective group number.
"remote" AliceZone:
/aliceZone/home/alice:
C- /aliceZone/home/alice/gridftp001
C- /aliceZone/home/alice/gridftp002
C- /aliceZone/home/alice/gridftp003
C- /aliceZone/home/alice/gridftp004
C- /aliceZone/home/alice/gridftp005
C- /aliceZone/home/alice/gridftp006
C- /aliceZone/home/alice/gridftp007
C- /aliceZone/home/alice/gridftp008
C- /aliceZone/home/alice/gridftp009
C- /aliceZone/home/alice/gridftp010