I see in the logs:
$ reana-client logs -w ana-susy-2019-12_skimming_all_all_all_fitting_theory_weight_tree_scanning.1
...
  File "/usr/local/lib/python3.8/site-packages/yadage/stages.py", line 52, in apply
    self.rule.apply(WorkflowView(adageobj, self.offset))
  File "/usr/local/lib/python3.8/site-packages/yadage/stages.py", line 101, in apply
    self.schedule()
  File "/usr/local/lib/python3.8/site-packages/yadage/stages.py", line 144, in schedule
    scheduler(self, self.stagespec)
  File "/usr/local/lib/python3.8/site-packages/yadage/handlers/scheduler_handlers.py", line 198, in singlestep_stage
    parameters = {
  File "/usr/local/lib/python3.8/site-packages/yadage/handlers/scheduler_handlers.py", line 199, in <dictcomp>
    k: select_parameter(stage.view, v)
  File "/usr/local/lib/python3.8/site-packages/yadage/handlers/scheduler_handlers.py", line 49, in select_parameter
    value = handler(wflowview, parameter)
  File "/usr/local/lib/python3.8/site-packages/yadage/handlers/expression_handlers.py", line 158, in stage_output_selector
    assert len(steps) == 1
AssertionError
This could indicate a problem with the workflow definition.
Here are some tips:
(1) To ease debugging, it would be good if you could include your reana.yaml amongst the workflow input files. That way it’ll be uploaded to the workspace together with the inputs. This is not necessary for a successful workflow run; it just makes debugging easier, since reana.yaml would then be available in the workflow’s workspace. For example, see the files: ... part here:
inputs:
  parameters:
    did: 404958
    xsec_in_pb: 0.00122
    dxaod_file: https://recastwww.web.cern.ch/recastwww/data/reana-recast-demo/mc15_13TeV.123456.cap_recast_demo_signal_one.root
  directories:
    - workflow
  files:
    - reana.yaml
workflow:
  type: yadage
  file: workflow/workflow.yml
outputs:
  files:
    - statanalysis/fitresults/limit.png
(2) Have you tried running reana-client validate to check for any warnings regarding workflow parameters and output selectors?
(3) FWIW I’m seeing in specs/steps.yml commands like:
cd /ttDM_DESY/run
xrdfs eoshome.cern.ch ls {input_dir} | grep '.root' > eos_paths.txt
mkdir -p inputs/$(basename {input_dir})
while read line; do xrdcp root://eoshome.cern.ch/$line inputs/$(basename {input_dir})/.; echo -e $(basename {input_dir}) >> signal.txt; done < eos_paths.txt
This may be troublesome in case of big input files, since the input files seem to be copied into a directory under /ttDM_DESY. This directory lives inside the container, i.e. it does not use the workflow’s workspace but the container’s ephemeral storage (which is volatile and of the order of tens of GB). If you copy big files (?), the container could exhaust the node’s ephemeral storage and get killed.
How big are your input files? Could this be the cause?
Generally speaking, it is more advantageous to treat the Docker container image as a read-only provider of the necessary environment and software, to which no data gets written, and to use the automatically provided read-write workflow workspace for all data operations. In other words, if you don’t write into hard-coded paths under / such as /ttDM_DESY, but rather into the default workspace directory that REANA creates and automatically mounts into your container at execution time, you’ll have more space for your data and reduce the risk of ephemeral storage exhaustion.
This is similar to running Docker containers locally: you probably wouldn’t write any big files inside the container directly, but would rather use an external data volume mounted into the container via -v at execution time.
Would it be possible to rewrite your code along the following lines:
source /home/atlas/release_setup.sh
# relative paths resolve inside the workflow workspace
mkdir -p inputs
xrdcp root://example.org/bigfile.root inputs/
/ttDM_DESY/run.sh ./inputs
so that you always stay in the workspace?
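Applied to the commands quoted in (3), a rough sketch could look like the following. It only shows the copy step writing into a relative inputs/ directory instead of /ttDM_DESY/run. The function name stage_inputs is made up for illustration, the EOS directory is passed as a shell argument rather than via the {input_dir} placeholder, and the signal.txt bookkeeping from your original loop is left out, so please adapt as needed.

```shell
# Hypothetical helper: copy all .root files from an EOS directory into
# ./inputs/<dirname>, relative to the current directory (the workspace).
stage_inputs() {
    input_dir=$1
    dest="inputs/$(basename "$input_dir")"
    mkdir -p "$dest"
    # List the remote directory and keep only the .root entries.
    xrdfs eoshome.cern.ch ls "$input_dir" | grep '\.root' |
    while read -r path; do
        # Copy each file into the workspace, not into the container image.
        xrdcp "root://eoshome.cern.ch/$path" "$dest/"
    done
}
```

In a step definition this would presumably be called as stage_inputs {input_dir}, without any preceding cd into a directory inside the container.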
If the input files being written into /ttDM_DESY are guaranteed to be really small, of the order of 5 GB or thereabouts, then this is probably not the cause. Only if they could reach 30 GB or thereabouts would this be problematic.
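If you are unsure about the sizes, a quick way to check is to sum up what actually ends up in the copied directory. Here is a minimal sketch; the helper names total_bytes and classify_size are invented for this example, and the 5 GB and 30 GB thresholds are just the rough figures mentioned above, not hard limits of any cluster.

```shell
# Hypothetical helper: total size in bytes of all regular files under a directory.
total_bytes() {
    find "$1" -type f -exec wc -c {} + |
        awk '$2 != "total" { sum += $1 } END { printf "%.0f\n", sum }'
}

# Hypothetical helper: compare a byte count with the rough thresholds above.
classify_size() {
    bytes=$1
    gb=$((1024 * 1024 * 1024))
    if [ "$bytes" -ge $((30 * gb)) ]; then
        echo "risky"        # around 30 GB or more
    elif [ "$bytes" -le $((5 * gb)) ]; then
        echo "small"        # around 5 GB or less
    else
        echo "borderline"   # somewhere in between
    fi
}
```

For example, classify_size "$(total_bytes inputs)" run after the copy step would tell you roughly which regime you are in.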