NorduGrid FAQ

1 Getting started with NorduGrid's ARC software.

1.1 What is ARC? Is it different from NorduGrid?
1.2 Where do I get the software?
1.3 What should I install?
1.4 How do I access the code repository?
1.5 Is the standalone client really standalone?

2 Certificates and proxies

2.1 What is a certificate?
2.2 How do I get a user certificate?
2.3 How do I generate a NorduGrid user certificate request?
2.4 When, why and how can I update the NorduGrid CA key?
2.5 How do I generate a non-NorduGrid user certificate request?
2.6 Why does grid-cert-request crash?
2.7 Why does the subject of my NorduGrid user certificate request not contain an Email=<my-email> field?
2.8 How do I generate a host certificate request?
2.9 What is a CRL and what should I do when it expires?

3 Authorization and authentication

3.1 Why does my server/client report authentication failure?

4 Virtual Organizations (VO)

4.1 What is a Virtual Organization (VO)?
4.2 How do I become a member of a Virtual Organization?
4.3 Why can arcproxy or voms-proxy-init not find my VO?

5 Submitting jobs

5.1 How do I pre-install my favorite software on the clusters?
5.2 I see some sites have my favorite software installed; how do I tell my jobs to use it?
5.3 I try to submit a job but every time it says "no cluster found".
5.4 Why could my xrsl not be parsed?
5.5 Why am I getting the error 'All targets rejected job requests'?

6 Server Setup

6.1 Is NFS required to set up a NorduGrid cluster?
6.2 Can the cache directory be located on some remote computer and imported over NFS?

7 Gridftp server (gridftpd)

7.1 Why has my gridftpd closed the connection?

8 Information System

8.1 What is a GIIS?
8.2 Should I run a GIIS?
8.3 Why does my cluster not appear on the Grid monitor?
8.4 The monitor in debug mode says that my resource is PURGED, what does it mean?

1 Getting started with NorduGrid's ARC software.

1.1 What is ARC? Is it different from NorduGrid?

ARC stands for "Advanced Resource Connector" and is Grid software developed by the NorduGrid collaboration. Besides developing ARC, the collaboration also handles other matters, such as the cross-Nordic Certificate Authority, coordination of the usage of some Nordic computing resources, user support etc. NorduGrid is the name of the collaboration, and ARC is the software, so yes, there is a difference.

1.2 Where do I get the software?

There are several ways to download ARC: directly from the ftp/http server, via apt/yum repositories, or as source code from the code repository.

Please follow the instructions in the download area linked from the NorduGrid web page:

http://download.nordugrid.org

1.3 What should I install?

If you are an ordinary user, you can install either the binary client package (if you have administrator privileges) or the nordugrid-arc-standalone package, which contains all you need as a user.

For detailed information, consult the ARC User Guide or the client installation instructions at

http://www.nordugrid.org/documents/arc-client-install.html

If you want to install a new site, consult the ARC server-install document:

http://www.nordugrid.org/documents/arc-server-install.html

1.4 How do I access the code repository?

The code repository details are available at

http://svn.nordugrid.org

There is an option to download the tarball of the repository, if you wish to get the whole code.

Write access to the repository is only available via https; for more information, consult the ARC code repository instructions.

1.5 Is the standalone client really standalone?

No, the standalone client expects certain non-Grid-specific system libraries and tools to be installed on your computer. Most notably, it needs a Linux distribution with the following libraries and utilities:

glibc, bash, perl, libxml2, libltdl, libtool, autoconf, openssl

2 Certificates and proxies

2.1 What is a certificate?

A certificate is nothing more than an electronic passport. It contains information about your name and other details, e.g., your e-mail address or location. The contents of the certificate are determined by the rules set by your national Certification Authority (CA), which issues certificates. Having a certificate does not authorize you to use any resources on the Grid or elsewhere; it only confirms that you are who you claim to be. It is used to establish contact between you and another service, so that instead of typing your name, you simply present your certificate. A service may reject it if it is not in its list of acceptable certificates.

The system is analogous to that of passports and national borders: for example, if you want to travel to the USA using your passport, your country must be in the list of accepted ones; otherwise you have to request a visa (which can be rejected). With computing resources, if you want to submit a job to a cluster, your CA must be in the list of those accepted by the cluster, and your certificate subject line must be in the list of accepted users on that cluster. To achieve this, you have to either contact each cluster owner upon receiving the certificate, or join a Virtual Organization, whose manager can do it for you.

2.2 How do I get a user certificate?

First, check whether your country has already joined the TERENA eScience certificates network:

https://tcs-escience-portal.terena.org/

If so, proceed with instructions in the TERENA TCS portal.

If TCS does not help, try to locate your national Certification Authority (CA). Typically, Google is very helpful there. If your CA offers a Web-based certificate request interface, follow their instructions.

If your CA accepts certificate requests only by e-mail (as, for example, the NorduGrid CA does), follow the instructions on the CA Web site to generate such a request, and send it to the address suggested on your CA Web site. Typical instructions for the NorduGrid CA that make use of the Globus certificate tools are shown below. If the request is correct, the Certification Authority will sign your request and send your public certificate back to you.

2.3 How do I generate a NorduGrid user certificate request?

NorduGrid certificates should not be necessary for individual users any more, as users are advised to obtain certificates from the TERENA TCS portal:

https://tcs-escience-portal.terena.org/

If for some reason you still need a NorduGrid certificate, you need to use some Grid software. Assuming you or your site administrator have already installed either the Globus Toolkit or the nordugrid-arc-client package, or you are working with the nordugrid-arc-standalone client, you need to obtain a special NorduGrid CA configuration package:

ca_NorduGrid-certrequest-configuration

This package is available here

Once the NorduGrid CA is configured (typically, by unpacking the contents of that file into the /etc/grid-security folder), execute the following:

grid-cert-request -int

and answer the questions. Please note that some of the fields are not supposed to be changed. The grid-cert-request program will generate a certificate request that should be sent to ca@nordugrid.org. Before sending, please verify that the subject of your certificate request has the form

/O=Grid/O=NorduGrid/OU=<your organization>/CN=<your name>/Email=<your email>

and does not contain non-ASCII characters (e.g., national accented letters in Unicode, non-Latin letters etc). If these criteria are not satisfied, please rerun the grid-cert-request command.
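
As a quick check before sending, the two criteria above (expected form, ASCII only) can be tested with a few lines of Python. This is only a sketch under stated assumptions: the function name check_subject and the regular expression are illustrative and not part of any NorduGrid or Globus tool, and the CA remains the authority on what it accepts.

```python
import re

# Hypothetical helper, not part of any Grid toolkit. The pattern encodes the
# form /O=Grid/O=NorduGrid/OU=<org>/CN=<name>/Email=<email> from the text.
SUBJECT_PATTERN = re.compile(
    r"^/O=Grid/O=NorduGrid/OU=[^/]+/CN=[^/]+/Email=[^/@\s]+@[^/@\s]+$"
)


def check_subject(subject):
    """Return a list of problems; an empty list means the subject looks fine."""
    problems = []
    # The request subject must not contain accented or non-Latin letters.
    if not subject.isascii():
        problems.append("subject contains non-ASCII characters")
    # All five components must be present, in this order.
    if not SUBJECT_PATTERN.match(subject):
        problems.append("subject does not have the expected form")
    return problems
```

If the returned list is not empty, rerun grid-cert-request as described above rather than editing the request by hand.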

Note that you cannot use the grid-cert-request command to request the NorduGrid certificate without the -int flag, as it causes failure.

2.4 When, why and how can I update the NorduGrid CA key?

When you suddenly cannot create a proxy, submit a job, or do anything at all on the Grid, it may mean that your CA key and all credentials have expired. This happens when you do not update your software frequently enough.

The NorduGrid CA (like any other CA) has credentials with a limited lifetime. Once in a while they have to be renewed and updated everywhere: on client machines, in Web browsers, on Grid servers, on Web servers etc. User and host certificates cannot be valid longer than the CA credentials, and hence all have to be updated as well. The steps are the following:

  1. Get the latest public certificates at http://ca.nordugrid.org (section "The NorduGrid CA public certificate").
    • Note: normally, the entire package is also available via standard repositories, such as the NorduGrid download area (look for IGTF, ca_NorduGrid) and yum repositories, and via the IGTF site.
  2. Install these public CA certificates in /etc/grid-security/certificates, and/or in /your-standalone-path/etc/certificates, and in your browser and mailer (eventually you will have to remove the old, expired NorduGrid CA certificates).
  3. If you are a Grid site owner, you may need to restart grid services (esp. the Web-services) in order to load the new certificates.
  4. Request new user and/or host certificates.

2.5 How do I generate a non-NorduGrid user certificate request?

If you are not a resident of a Nordic country (Denmark, Finland, Norway, Iceland or Sweden), you must ask your local Certification Authority (CA) about the procedure. The NorduGrid client installation contains only the necessary utilities and is not distributed with all the national CA configuration files. Obtain from your CA the files

  globus-host-ssl.conf.xxxxxxxx
  globus-user-ssl.conf.xxxxxxxx
  grid-security.conf.xxxxxxxx

(here xxxxxxxx depends on your national CA identity)

and store them in the proper directory:

/etc/grid-security/certificates

or

$NORDUGRID_LOCATION/etc/certificates

if you have installed a standalone client.

After doing this, type

grid-cert-request -int -ca

and answer the questions. Please note that some of the fields are not supposed to be changed. The grid-cert-request program will generate a certificate request that should be sent to your CA by e-mail.

2.6 Why does grid-cert-request crash?

Most likely, you ran it without the -int option. To request a NorduGrid certificate, always execute

grid-cert-request -int

The reason is that NorduGrid certificates require an e-mail field, while the Globus grid-cert-request utility by default asks interactively only for your name, and can neither ask for your e-mail address nor guess it.

2.7 Why does the subject of my NorduGrid user certificate request not contain an Email=<my-email> field?

If you are using the nordugrid-arc-standalone package, please upgrade to the latest nordugrid-arc-standalone package.

If not, it is probably because your site administrator has forgotten to install the NorduGrid certrequest package. Please ask them to install it.

The package can be found here.

2.8 How do I generate a host certificate request?

Run:

grid-cert-request -host <hostname> -dir `pwd`

This will generate a host-certificate-request in the current directory.

2.9 What is a CRL and what should I do when it expires?

Sometimes you cannot create a proxy, or do anything on the Grid, and get an error message like

"The available CRL has expired"

CRL stands for Certificate Revocation List, a file created and regularly updated by every CA. The file usually has the extension .r0 or .r1. The list is either empty (if no certificate has been revoked) or contains bad certificates that must not be trusted. In total, there are almost 100 such lists. If any of them is outdated, Grid tools will refuse to work. To be on the safe side, one should update these lists once a day. This is normally done by a cron task.

However, if you have no administrator privileges, use the ARC standalone client, use a notebook which is frequently turned off, or use a system that has no crontab (like MS Windows), you are likely to run into the problem of an expired CRL.

There are two possible solutions:

  1. Use the fetch-crl utility to update the CRLs. It can be used even by a non-privileged user, in which case the option "-o" must point to the location where you have your *.r0 or *.r1 files.
  2. Simply remove the *.r0 and *.r1 files. This is an insecure approach, but in certain cases there is no other option.
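
To see whether the daily update is actually happening, one rough approach is to look for CRL files that have not been rewritten recently. The sketch below is an illustration only: the function name and the 24-hour default are assumptions, and the file modification time is merely a proxy, since the real expiry is recorded inside each CRL and is not read here.

```python
import time
from pathlib import Path


def stale_crl_files(cert_dir, max_age_hours=24.0, now=None):
    """List *.r0 and *.r1 files in cert_dir not modified within max_age_hours."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = []
    for pattern in ("*.r0", "*.r1"):
        for path in Path(cert_dir).glob(pattern):
            # An old modification time suggests the CRL was not refreshed.
            if path.stat().st_mtime < cutoff:
                stale.append(path.name)
    return sorted(stale)
```

A non-empty result suggests that the cron task is not running, or that fetch-crl should be run by hand on the directory holding the files.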

3 Authorization and authentication

3.1 Why does my server/client report authentication failure?

There can be many reasons for this; as a rule of thumb check that:

  1. The client has access to the public certificates of the user's Certification Authority (CA) and of the server's CA
  2. The server has access to its own CA certificates and to the user's CA certificates
  3. Permissions and ownership of certificates and keys are correct (private keys readable only by the owner, public keys readable by all the relevant services)
  4. Take special care when you are running an ARC service as non-root. Make sure that the certificate files have the right permissions and are owned not by root, but by the user specified by the "user" configuration parameter in the corresponding [gridftpd], [httpsd] blocks of arc.conf.
  5. The DNS reverse lookup of the host matches its host name
  6. CRLs on the server are up to date
  7. Avoid running client commands from the root account on the same box that has the server installed: this may mix the order in which the certificates (user and host) are read
  8. If all of the above is OK, suspect corruption of either user or host public certificate (or even both). Corruption is known to occasionally take place during public certificate transfers over the Internet
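
Items 3 and 4 of the checklist above can be verified mechanically. The following is a minimal sketch for POSIX systems; the function name and its messages are illustrative assumptions, and it only inspects a single private key file.

```python
import os
import stat


def key_permission_problems(key_path, expected_uid=None):
    """Return a list of permission and ownership problems for a private key."""
    info = os.stat(key_path)
    problems = []
    # Any permission bit set for group or others means the key is not
    # restricted to its owner.
    if info.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("key is accessible by group or others")
    # For a service running as non-root, the key must belong to the service
    # user rather than to root.
    if expected_uid is not None and info.st_uid != expected_uid:
        problems.append("key is not owned by the expected user")
    return problems
```

For a service running as non-root, pass the numeric id of the account named by the "user" parameter as expected_uid; an empty list means neither problem was detected.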

4 Virtual Organizations (VO)

4.1 What is a Virtual Organization (VO)?

A Virtual Organization (VO) is basically a group of people who are authorized to run Grid jobs on a set of Grid resources. For example, members of a research project can join a VO, so that they can negotiate access to Grid resources, policies etc. Typically, a VO has a manager who maintains the list of members and contacts resource owners whenever a negotiation is needed, for example, if a new user has a certificate issued by a new CA, or CA public keys have changed. VO managers are normally in charge of negotiating resources available for the VO members. Each site on the Grid can subscribe to different VOs, allowing all their members to run Grid jobs on that site.

NorduGrid maintains a VO for users affiliated with Nordic academic institutions (the VO name is nordugrid.org). A few VOs are set up for the purposes of testing and demonstrations. Many other VOs are authorized on the ARC-enabled resources, but they are managed outside the scope of NorduGrid.

You can always create your own VO and negotiate access to Grid resources with resource owners personally. NorduGrid does not assist in such negotiations.

4.2 How do I become a member of a Virtual Organization?

You must read and accept Accepted Usage Policies, and submit a request via the respective VO management interface.

For more details, please consult

http://www.nordugrid.org/NorduGridVO

(follow the "Grid access" tab on the NorduGrid home page).

4.3 Why can arcproxy or voms-proxy-init not find my VO?

If you get errors like:

"VOMS Server for atlas not known!"

or

"Cannot get VOMS server atlas information from the vomses files"

it means that the VOMS server details for your VO (atlas in this example) are not defined. The simplest way to fix this is to add the necessary information to the file

$HOME/.voms/vomses

In case you use a gLite UI client, this file may be found in $HOME/.edg/vomses

Consult your respective VO managers (follow the "Configuration" link at most VOMS Web interfaces) to find out your VO's VOMS server details.

Examples of valid entries for the vomses file are:

"gin.ggf.org" "kuiken.nikhef.nl" "15050" "/O=dutchgrid/O=hosts/OU=nikhef.nl/CN=kuiken.nikhef.nl" "gin.ggf.org"
"pamela" "voms.cnaf.infn.it" "15013" "/C=IT/O=INFN/OU=Host/L=CNAF/CN=voms.cnaf.infn.it" "pamela"
"desy" "grid-voms.desy.de" "15104" "/O=GermanGrid/OU=DESY/CN=host/grid-voms.desy.de" "desy"
"atlas" "lcg-voms.cern.ch" "15001" "/DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch" "atlas"
"nordugrid.org" "voms.uninett.no" "15015" "/O=Grid/O=NorduGrid/CN=host/voms.ndgf.org" "nordugrid.org"
"testers.eu-emi.eu" "emitestbed07.cnaf.infn.it" "15002" "/C=IT/O=INFN/OU=Host/L=CNAF/CN=emitestbed07.cnaf.infn.it" "testers.eu-emi.eu"
"dteam" "lcg-voms.cern.ch" "15004" "/DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch" "dteam"
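
Each entry above consists of five quoted fields. A short parser can catch malformed lines before arcproxy or voms-proxy-init does. This is a sketch only: the field names (alias, host, port, server_dn, vo) are our reading of the examples, not names defined by the VOMS tools.

```python
import shlex
from typing import NamedTuple


class VomsesEntry(NamedTuple):
    # Field names are illustrative labels for the five quoted values.
    alias: str
    host: str
    port: int
    server_dn: str
    vo: str


def parse_vomses_line(line):
    """Split one vomses line into its five quoted fields."""
    fields = shlex.split(line)  # honours the double quotes around each field
    if len(fields) != 5:
        raise ValueError(f"expected 5 quoted fields, got {len(fields)}")
    alias, host, port, server_dn, vo = fields
    return VomsesEntry(alias, host, int(port), server_dn, vo)
```

A line with a missing field, or with a port that is not a number, raises ValueError instead of failing later with a less obvious message.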

5 Submitting jobs

5.1 How do I pre-install my favorite software on the clusters?

Some software is already pre-installed. The procedure of installing and advertising the software on the Grid is referred to as "creating a runtime environment". Most of the environments enabled across NorduGrid sites and their partners are described at

http://gridrer.csc.fi/

A simple list of installed runtime environments can be retrieved with the help of the Monitor at

http://www.nordugrid.org/monitor

by either using the "Search" interface (select cluster, runtime environment), or by clicking any cluster name and then on the "Runtime environment" link.

In case you don't find your favorite software in these lists, you'll have to negotiate with the resource owners. If your work is a part of a national or a regional Grid project, please contact your respective project coordinators. NorduGrid can not force resource owners to install your favorite software, but if you are desperate, e-mail the NorduGrid support and we can try to help you.

5.2 I see some sites have my favorite software installed; how do I tell my jobs to use it?

Use the runtimeenvironment attribute in your job description (.xrsl) file. For example, the line

(runtimeenvironment=APPS/HEP/ROOT-4.0.1)

will make your client submit jobs only to those sites that have this particular software version (ROOT-4.0.1) installed, and will instruct the remote site to set up all the necessary paths and environment variables needed by this software.
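
When job descriptions are generated by a script, the attribute can be added like any other. The sketch below is an illustration under assumptions: build_xrsl is a hypothetical helper, and any attribute name other than runtimeenvironment (such as executable in the usage note) should be checked against the xRSL manual.

```python
def build_xrsl(attributes):
    """Join (name, value) pairs into a single xRSL conjunction string."""
    parts = []
    for name, value in attributes:
        text = str(value)
        # Escaping rules are not handled in this sketch, so reject quotes.
        if '"' in text:
            raise ValueError("values containing double quotes are not supported")
        # Quote values with whitespace so that they stay a single token.
        if any(ch.isspace() for ch in text):
            text = '"' + text + '"'
        parts.append(f"({name}={text})")
    return "&" + "".join(parts)
```

For example, build_xrsl([("executable", "/bin/echo"), ("runtimeenvironment", "APPS/HEP/ROOT-4.0.1")]) returns a string in which the runtimeenvironment relation appears exactly as in the line shown above.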

Please read the xRSL manual and the User Guide for more info on this attribute.

5.3 I try to submit a job but every time it says "no cluster found".

There could be several reasons for this. Try to submit the job again with debug information switched on

arcsub -d VERBOSE <your xrslfile>

For each cluster (target), you can now see the reasons why your job was rejected. Please modify your xrsl-file according to this information. For example, if you see the message for all clusters:

Queue rejected because it does not match the XRSL specification (disk)

it is probably because you have requested too much disk space for the job.

If all clusters report that you are not authorized to run there, it is probably because you are not yet a member of a VO (Virtual Organization), or something is wrong with your certificate.

If you see plenty of messages

Server unexpectedly closed connection

it most likely means your clock is out of sync. Make sure your workstation has its clock properly synchronised, re-create the proxy, and try again. If that does not help, check the section on authentication problems.

5.4 Why could my xrsl not be parsed?

You have made a mistake in your xrsl-specification and the xrsl-parser does not know what you want to do. Please consult the xRSL manual or the User Guide for the correct xrsl-notation.

5.5 Why am I getting the error 'All targets rejected job requests'?

In all likelihood, something is wrong with your certificate. Check the following:

  1. Your CA key is up-to-date: check whether the latest version of IGTF packages is installed, see

    https://dist.eugridpma.org/distribution/igtf/

  2. Your computer clock is correct (skew of one minute can be bad enough)
  3. Your certificate is not broken (consult the CA which issued it)

6 Server Setup

6.1 Is NFS required to set up a NorduGrid cluster?

Short answer: Yes, at the moment a shared disk area among the front end and the nodes is required.

Long answer: The preferred installation (see Server side installation instructions) assumes that some disks (the grid area, the cache directory and the runtime environment scripts) are NFS mounted on both the front end and the nodes. Not having NFS means losing functionality such as the cache or the runtime environments; furthermore, the job submission backend of the Grid Manager needs to be modified.

6.2 Can the cache directory be located on some remote computer and imported over NFS?

In general, the cache directory can be split into two subdirectories, a 'control' and a 'data' subdirectory. Due to some problems with the file locking feature of NFS, it is strongly recommended that the 'control' subdirectory be placed on a local file system. The 'data' subdirectory can be imported over NFS. To do that, point the "cachedata" variable in arc.conf to a directory that is NFS mounted, and the "cachedir" variable to a directory on a local file system.

7 Gridftp server (gridftpd)

7.1 Why has my gridftpd closed the connection?

There could be several reasons for this. Try to connect to the server and look for a hint in the log-file, /var/log/gridftpd.log.

If the gridftp process was somehow started as non-root, it cannot read the host certificates that are owned by root. Also check that the host certificates have the right file permissions: the private key should be readable by root only, and neither the certificate nor the key should have executable permissions. The problem could also arise from an outdated CRL.

8 Information System

8.1 What is a GIIS?

In our setting, the Grid Index Information Service (GIIS) is an LDAP database backend used as a collection of links: it maintains a list of contact strings of local information databases (GRIS). The list (or index) of GRISes can be queried through an LDAP interface.

8.2 Should I run a GIIS?

Probably not. You should only run a GIIS if you coordinate resources. At the moment, the only coordination of resources is done at the country level, and several GIISes already exist. See the following section for a list of currently running GIISes.

8.3 Why does my cluster not appear on the Grid monitor?

There could be several reasons for this. Assuming you have a valid host certificate, your problem could be one of the following:

grid-infosys was not started properly.
You need to start the grid-infosys service. If your configuration is correct, this results in a grid-info-soft-register process which will periodically start an ldapadd process.

You do not register to an EGIIS.
Here is a cluster registration unit appropriate for registering Danish clusters to the Grid:
[mds/gris/registration/ArisToDenmark]
regname="Denmark"
reghn=grid.nbi.dk
regperiod=30       # Try to register every 30 seconds
servicename=nordugrid-cluster-name
The list of country GIIS's can be found at:

http://www.nordugrid.org/NorduGridMDS/index_service.html

Note that you can do multiple registrations (e.g. to several country GIISes) by having more than one section of the above type. Note, however, that each section should have a different label. Examples:
[mds/gris/registration/ArisToSwedenLund]
regname="Sweden"
reghn=quark.hep.lu.se
servicename=nordugrid-cluster-name

[mds/gris/registration/ArisToSwedenUppsala]
regname="Sweden"
reghn=grid.tsl.uu.se
servicename=nordugrid-cluster-name
  
You are not authorized to connect to a higher level EGIIS. This could be a country EGIIS or organization EGIIS.
The current EGIIS hierarchy is:

Cluster -> Country -> Top level

To get your cluster or storage element authorized please contact the appropriate GIIS administrator, see the list

Your cluster is badly misconfigured.
Try to run the Monitor in the debug mode to discover possible problems:

http://www.nordugrid.org/monitor/?debug=2

The machine's clock is not synchronized and is badly off.
Consider synchronizing your clock and using ntp.

Your machine is behind a firewall that blocks access from the monitor client.
Ask NorduGrid support for the current IP address of the monitor client, and allow access from it over the LDAP protocol to the port where you run ARIS or EGIIS.
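
The registration sections shown earlier in this answer follow a regular pattern, so sites that register to several index services may prefer to generate them. The sketch below is an illustration only: both function names are hypothetical, the option names (regname, reghn, regperiod, servicename) are taken from the examples above, and the output still has to be pasted into the configuration file by hand.

```python
def registration_block(label, regname, reghn, servicename, regperiod=None):
    """Format one [mds/gris/registration/<label>] section as text."""
    lines = [
        f"[mds/gris/registration/{label}]",
        f'regname="{regname}"',
        f"reghn={reghn}",
    ]
    # regperiod is optional, as in the Swedish examples.
    if regperiod is not None:
        lines.append(f"regperiod={regperiod}")
    lines.append(f"servicename={servicename}")
    return "\n".join(lines)


def registration_config(entries):
    """Format several sections, refusing duplicate section labels."""
    labels = [entry["label"] for entry in entries]
    if len(set(labels)) != len(labels):
        raise ValueError("each registration section needs a distinct label")
    return "\n\n".join(registration_block(**entry) for entry in entries)
```

Each entry is a dictionary whose keys match the parameters of registration_block; the duplicate check mirrors the note that section labels should differ.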

8.4 The monitor in debug mode says that my resource is PURGED, what does it mean?

The PURGED registration status of the resource to an index means that your resource is not registering any longer or the registration information is lost due to improperly set timeouts or local clock. You should check the GIIS registration block of the globus.conf file and synchronize your clock.