REANA VOMS error

zhangr · 21 November 2021 13:25

Dear developers,

I have encountered an error with the VOMS proxy on REANA. I followed the instructions from the REANA documentation.

I got this task (step 1) working earlier this week. However, I can no longer reproduce the results (see a resubmission of the same task). The error (see below) says that ‘/vomsproxy_cache/x509up_proxy’ cannot be accessed, which makes me think there may be a server issue.

Is there anything I can check? Thanks for your help.

====
Container job failed, error: Nonevoms-proxy: :
Contacting voms2.cern.ch:15001 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "atlas"...
Remote VOMS server contacted succesfully.

/usr/bin/voms-proxy-init: line 3: 37 Killed java $VOMS_CLIENTS_JAVA_OPTIONS -cp $(build-classpath voms-clients-java voms-api-java canl-java bcpkix bcprov commons-cli commons-io) org.italiangrid.voms.clients.VomsProxyInit "$@"
chown: cannot access ‘/vomsproxy_cache/x509up_proxy’: No such file or directory

OOMKilled

===

tiborsimko · 1 December 2021 18:07

Hi @zhangr, sorry for the late reply!

I had a look at the workflow, and it seems that the problem you are seeing is not related to the VOMS proxy setup but rather to insufficient memory limits (note the OOMKilled status at the end of your log). I see that your jobs are using:

resources:
  - kubernetes_memory_limit: '64Mi'

While this works on some installations, e.g. locally, we have seen trouble with other workflows on the CERN cluster. It may be related to the Kubernetes upgrade and the new, bigger REANA job controller images that the upgrade brought.

Can you please change your 64Mi limits to a safer, bigger value such as 256Mi and retry? This worked better for me in the past.

(We are going to modify the REANA demo examples to use the higher value of 256Mi as well, just to be on the safe side!)
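
For illustration, here is a minimal sketch of how the raised limit might look inside the environment block of a Yadage step definition. The image name and tag are placeholders, not taken from the workflow discussed here; only the `resources` entry reflects the change suggested above.

```yaml
# Environment block of a Yadage step (sketch).
environment:
  environment_type: 'docker-encapsulated'
  image: 'example/analysis-image'   # hypothetical image name
  imagetag: 'latest'                # hypothetical tag
  resources:
    - kubernetes_memory_limit: '256Mi'   # previously '64Mi'
```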

zhangr · 3 December 2021 16:35

Hi @tiborsimko,

Thanks for your investigation. Indeed, your suggestion fixed another problem of mine (so I can proceed with that thread), but the proxy error still persists. For example, this task (I hope you can see it): REANA
As you can see in step.yaml, I have now switched to 256Mi.

Do you have any idea what went wrong? Thanks in advance.

zhangr · 3 December 2021 16:41

To be more specific, the error is generated when I want to use a Grid dataset as input.

resources:
      - voms_proxy: true
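
For reference, a sketch of how the two resource entries mentioned in this thread might sit together in one step. It assumes both can be listed side by side in the same `resources` list; the indentation depends on where the block sits in the step file.

```yaml
# Step resources combining the memory limit and the VOMS proxy request (sketch).
resources:
  - kubernetes_memory_limit: '256Mi'
  - voms_proxy: true
```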

zhangr · 4 December 2021 19:36

Hi,

It turns out that my grid certificate had expired earlier than indicated. After regenerating the certificate, I was able to fix the error.

tiborsimko · 5 December 2021 23:37

Hi @zhangr, good then!