Getting Data from iPlant/CyVerse
This is the quick tutorial to download data from CyVerse (previously known as iPlant) on HPC clusters using iRODS. iRODS may or may not be available as module on your cluster, and if available as a module load the module to use irods
commands.
1
2
3
4
5
# search
module spider irods # or
module spider icommands
# load
module load <module name>
If the module is unavailable, you can either install it yourself following the guidelines here or skip installation by using the singularity container (and run these commands within the container)
1
2
3
4
5
#pull container
singularity pull --name irods.sif shub://mjstealey/singularity-irods-icommands
# use container
singularity shell --bind $PWD irods.sif
singularity> iinit
1. Login to your CyVerse account
Sometimes also referred as iRODS account, this is required to access private files from the CyVerse data store. This is done by logging in via iinit
command. You only need this once per cluster. As it involves data transfer, the best practice recommends using the data transfer nodes on HPC.
1
2
3
4
5
6
7
8
9
10
11
$ iinit
One or more fields in your iRODS environment file (.irodsEnv) are
missing; please enter them.
Enter the host name (DNS) of the server to connect to: data.iplantcollaborative.org
Enter the port number: 1247
Enter your irods user name: <username>
Enter your irods zone: iplant
Those values will be added to your environment file (for use by
other i-commands) if the login succeeds.
Enter your current iRODS password:
#password won't be displayed
Now you are logged in. Although your working directory doesn’t change, all iCommands
can now access the CyVerse datastore.
You can logout
anytime by typing iexit
command.
Alternatively, you can save this information in ~/.irods/irodsEnv
file as follows:
1
2
3
4
5
irodsHost data.iplantcollaborative.org
irodsPort 1247
irodsUserName <username>
irodsZone iplant
irodsPassword <your password>
This will let you connect automatically each time you want to login by just typing in iinit
command. You can also exclude the last line (password), so that it will prompt only password for logging in.
2. Download files
Navigate to the desired location/directory and use iget
command to download the desired file. You can also use any of the standard UNIX command by using i
as prefix (eg., icp
for copy, imv
to move, ipwd
to print working directory etc.,)
1
iget path/to/filename
If there are large number of files to be copied or you just want to sync 2 directories (local and iplant), you can also use rsync
command. rsync
synchronize the data between a local copy (local file system) and the copy stored in iRODS or between two iRODS copies. You can use rsync
to either:
- synchronization of data from the client’s local file system to iRODS
- from iRODS to the local file system
- from one iRODS path to another iRODS path
To synchronize (recursively) the data from the local directory foo1 to the iRODS collection foo2
1
irsync -r foo1 i:foo2
To synchronize (recursively) the data from the iRODS collection foo1 to the local directory foo2
1
irsync -r i:foo1 foo2
To synchronize (recursively) the data from the iRODS collection foo1 to another iRODS collection foo2
1
irsync -r i:foo1 i:foo2
Finally, if you want to transfer a single file from local host to remote (iplant), you can do it using iput
command.
1
iput filename
A full list of irods commands can be found here
References
For more information, follow these links: