Setting up a Globus Online Service
The School's storage is continuously growing and datasets of many terabytes are no longer exceptional. One of the big problems with large datasets is how to transfer them from one system to another. One of our research groups collects micro-tomography datasets of the order of tens of terabytes. These need to be transferred from the facility where the data were collected to our storage for further processing. Traditionally, services like FTP or rsync were used. They work well for small datasets but not at this scale. This is where Globus Online steps in. Globus Online allows you to use your local identity to schedule the movement of large datasets from one system to another. As a user you log into the Globus Online service and authorise access to your data collections. Once authorised you can schedule data movement. Globus also allows guest access, in which case it works a little like Dropbox. This post is about setting up a managed Globus Endpoint in the School of GeoSciences using a base LCFG system.
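As an aside, this is roughly what scheduling such a transfer looks like from the globus-cli client once the endpoints exist. This is only a sketch: the collection UUIDs, paths and label below are placeholders, and most users would do the same thing through the Globus web interface.

```sh
# Log in with your Globus identity (opens a browser-based flow)
globus login

# Placeholder UUIDs for the source (facility) and destination (school) collections
SRC="<source-collection-uuid>"
DST="<destination-collection-uuid>"

# Submit an asynchronous, recursive transfer; Globus handles retries and
# checksum verification, which matters for multi-terabyte datasets
globus transfer "$SRC:/data/tomography/run42" "$DST:/scratch/$USER/run42" \
    --recursive --label "tomography run 42"
```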
A Globus Endpoint had been on our TODO list for years. The main stumbling block was that the identity provider (Shibboleth) as implemented by the University of Edinburgh did not export all the metadata required by the Globus service. Getting this changed required consideration of the data protection implications and a change at university level. Once the Shibboleth service was reconfigured to provide the required information to Globus Online, implementing the service became straightforward.
We use LCFG to manage our Linux systems. All school storage is centralised on Ceph, and individual directories are mounted using automounter maps defined in LDAP. For our Globus Endpoint we wanted to use the same technology stack. Globus is managed using command-line tools that set up a web server and Let's Encrypt certificates. There are LCFG components that could manage these, but in the end I decided to let the Globus tools do their own configuration and to use LCFG just to set up the base system and manage the firewall. We also restrict our Globus Endpoint to expose only users' scratch directories and, on request, group directories. The Globus packages are installed from our package service, which mirrors the official Globus Online package repository.
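For reference, the firewall exceptions that LCFG manages for us amount to something like the following. This is a sketch using plain firewall-cmd commands rather than the LCFG component, and assumes Globus Connect Server v5, which needs inbound HTTPS plus a range of GridFTP data-channel ports; check the current Globus documentation for the exact ports your version requires.

```sh
# Inbound HTTPS for the Globus Connect Server web front end and API
firewall-cmd --permanent --add-port=443/tcp
# Inbound GridFTP data channels used by the actual transfers
firewall-cmd --permanent --add-port=50000-51000/tcp
firewall-cmd --reload
```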
Once the base system was installed and suitable firewall exceptions for the Globus service were in place, we configured the system following the instructions in the official Globus Online documentation. Given the long wait to get the IdP sorted out, the installation itself was very straightforward.
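The configuration boils down to a handful of commands from the Globus Connect Server tools. The sketch below is only illustrative: the endpoint and gateway names, e-mail addresses, gateway ID and paths are placeholders, and the exact subcommands and flags vary between releases, so follow the official documentation rather than this.

```sh
# Register the endpoint with the Globus service (run as root, once)
globus-connect-server endpoint setup "GeoSciences Endpoint" \
    --organization "University of Edinburgh" \
    --owner owner@ed.ac.uk \
    --contact-email support@example.ac.uk

# Configure this node: sets up the web server and obtains the Let's Encrypt certificate
globus-connect-server node setup

# Authenticate as an endpoint administrator for the remaining commands
globus-connect-server login localhost

# A POSIX storage gateway restricted to University identities and to /scratch
globus-connect-server storage-gateway create posix "GeoSciences scratch" \
    --domain ed.ac.uk \
    --restrict-paths '{"DATA_TYPE": "path_restrictions#1.0.0", "read_write": ["/scratch"]}'

# A mapped collection that exposes the gateway to users
globus-connect-server collection create <storage-gateway-id> /scratch "GeoSciences scratch"
```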
In the School of GeoSciences we use the university's IdP for user authentication. However, we only allow a subset of university users onto our systems (mostly just members of the school), so it was interesting to see what would happen when a university user who is not a school member tried to access our service. As hoped, Globus Online complained that, although the user was an authenticated member of the university, that username had no valid mapping on our system.
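As a rough illustration of why this works: with the default mapping on a POSIX storage gateway, a Globus identity of the form username@ed.ac.uk is (as far as I understand it) mapped to the local account username, so whether a transfer is allowed ultimately comes down to whether that account exists on the endpoint. The usernames below are made up.

```sh
# A school member: the account exists in our LDAP, so the mapping succeeds
getent passwd s1234567

# A university member who is not in the school: no local account,
# so Globus reports that the identity has no valid mapping
getent passwd s7654321 || echo "no local account - access refused"
```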