Workflow with yadage

Hi all

I am really new to REANA, so my question may be very basic.
I am trying to create a very simple workflow in yadage following this tutorial [1], but I am getting an error that I don't understand. In case it is relevant, another test job using the simple workflow syntax ran perfectly [2].

This is the content of my reana.yml:

version: 0.7.2
inputs:
    files:
        - scripts/steps.yaml
        - scripts/workflow.yaml
workflow:
    type: yadage
    file: scripts/workflow.yaml

My scripts/workflow.yaml is:

stages:
- name: runNano
  dependencies: [init]
  scheduler:
      scheduler_type: singlestep-stage
      parameters:
          maxEvents: '1000'
          #maxEvents: {step: init, output: '1000'}
          outputFile: '{workdir}/output_simulation.root'
      step: {$ref: 'scripts/steps.yaml#/runNano'}

and my scripts/steps.yaml is:

runNano:
    process:
        process_type: 'interpolated-script-cmd'
        interpreter: bash
        script: |
            source /opt/cms/cmsset_default.sh
            cd /home/cmsusr/CMSSW_5_3_32/src/
            eval `scramv1 runtime -sh`
            cmsRun Analyzer/AOD2NanoAOD/configs/simulation_cfg.py maxEvents={maxEvents} outputFile={outputFile}
    environment: 
        environment_type: 'docker-encapsulated'
        image: gitlab-registry.cern.ch/algomez/cms-open-data
        imagetag: master
    publisher:
        publisher_type: iterpolated-pub
        publish:
            outputFile: '{outputFile}'

and this is the error that I am getting when I try to run reana-client create -w opendatatest:

Cannot create workflow opendatatest: 
{'scheduler_type': 'singlestep-stage', 'parameters': [{'key': 'maxEvents', 'value': '1000'}, {'key': 'outputFile', 'value': '{workdir}/output_simulation.root'}], 'step': {'process': {'process_type': 'interpolated-script-cmd', 'interpreter': 'bash', 'script': 'source /opt/cms/cmsset_default.sh\ncd /home/cmsusr/CMSSW_5_3_32/src/\neval `scramv1 runtime -sh`\ncmsRun Analyzer/AOD2NanoAOD/configs/simulation_cfg.py maxEvents={maxEvents} outputFile={outputFile}\n'}, 'environment': {'environment_type': 'docker-encapsulated', 'image': 'gitlab-registry.cern.ch/algomez/cms-open-data', 'imagetag': 'master', 'resources': [], 'envscript': '', 'env': {}, 'workdir': None, 'par_mounts': []}, 'publisher': {'publisher_type': 'iterpolated-pub', 'publish': {'outputFile': '{outputFile}'}}}} is not valid under any of the given schemas

Failed validating 'oneOf' in schema['properties']['stages']['items']['properties']['scheduler']:
    {'oneOf': [{'$ref': 'scheduler/singlestep-stage-schema.json#'},
               {'$ref': 'scheduler/multistep-stage-schema.json#'},
               {'$ref': 'scheduler/jq-stage-schema.json#'}],
     'type': 'object'}

On instance['stages'][0]['scheduler']:
    {'parameters': [{'key': 'maxEvents', 'value': '1000'},
                    {'key': 'outputFile',
                     'value': '{workdir}/output_simulation.root'}],
     'scheduler_type': 'singlestep-stage',
     'step': {'environment': {'env': {},
                              'environment_type': 'docker-encapsulated',
                              'envscript': '',
                              'image': 'gitlab-registry.cern.ch/algomez/cms-open-data',
                              'imagetag': 'master',
                              'par_mounts': [],
                              'resources': [],
                              'workdir': None},
              'process': {'interpreter': 'bash',
                          'process_type': 'interpolated-script-cmd',
                          'script': 'source /opt/cms/cmsset_default.sh\n'
                                    'cd /home/cmsusr/CMSSW_5_3_32/src/\n'
                                    'eval `scramv1 runtime -sh`\n'
                                    'cmsRun '
                                    'Analyzer/AOD2NanoAOD/configs/simulation_cfg.py '
                                    'maxEvents={maxEvents} '
                                    'outputFile={outputFile}\n'},
              'publisher': {'publish': {'outputFile': '{outputFile}'},
                            'publisher_type': 'iterpolated-pub'}}}

Do you have any suggestions on what I am doing wrong?

Thanks a lot for the help.
cheers,

[1] Reproducible analyses
[2] REANA

Hi @algomez, you seem to have a typo in steps.yaml:

publisher_type: iterpolated-pub

It should read interpolated-pub.
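
A misspelled type string like this is easy to miss inside a long schema validation error. As a minimal sketch (this is not part of yadage or REANA), Python's standard-library difflib can suggest the closest known name. The KNOWN_PUBLISHER_TYPES list and the suggest_publisher_type helper are assumptions made for illustration; the set of types your yadage version accepts may differ.

```python
import difflib

# Assumed list for illustration only; check the yadage schemas for the real set.
KNOWN_PUBLISHER_TYPES = ["interpolated-pub", "frompar-pub", "fromglob-pub"]


def suggest_publisher_type(value, known=KNOWN_PUBLISHER_TYPES):
    """Return None if value is known, else the closest known name ('' if none is close)."""
    if value in known:
        return None
    matches = difflib.get_close_matches(value, known, n=1, cutoff=0.6)
    return matches[0] if matches else ""


# The misspelled value from steps.yaml above.
print(suggest_publisher_type("iterpolated-pub"))  # prints: interpolated-pub
```

Running a check like this over every *_type value before calling reana-client create could catch such typos earlier, but it is only a convenience and does not replace the schema validation itself.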

Oops, sorry for the stupid mistake, and thanks a lot for the help.

cheers,

Hi again,

I have a follow-up question. I have a workflow that reana-client status reports as finished, but it does not actually run anything. Can you help me with this?

My reana.yml:

version: 0.7.2
inputs:
    files:
        - scripts/steps.yaml
        - scripts/workflow.yaml
workflow:
    type: yadage
    file: scripts/workflow.yaml

My workflow.yaml:

stages:
    - name: createNanoSimulation
      dependencies: [init]
      scheduler:
          scheduler_type: 'singlestep-stage'
          parameters:
              maxEvents: '1000'
              isData: 'False'
              txtFile: 'Analyzer/AOD2NanoAOD/data/CMS_MonteCarlo2012_Summer12_DR53X_DYJetsToLL_M-50_TuneZ2Star_8TeV-madgraph-tarball_AODSIM_PU_RD1_START53_V7N-v1_20000_file_index.txt'
              outputRoot: 'output_simulation_numEvents1000.root'
          step: {$ref: 'scripts/steps.yaml#/createNano'}

    - name: createNanoData
      dependencies: [init]
      scheduler:
          scheduler_type: 'singlestep-stage'
          parameters:
              maxEvents: '1000'
              isData: 'True'
              txtFile: 'Analyzer/AOD2NanoAOD/data/CMS_Run2012B_DoubleMuParked_AOD_22Jan2013-v1_10000_file_index.txt'
              outputRoot: 'output_data_numEvents1000.root'
          step: {$ref: 'scripts/steps.yaml#/createNano'}

    - name: skimData
      dependencies: [createNanoData]
      scheduler:
          scheduler_type: singlestep-stage
          parameters:
              inputFile: {stage: 'createNanoData.createNano', output: outputFile}
              pngOutput: 'nMuonscheck_output_data_numEvent1000.png'
              pklOutput: 'output_data_numEvent1000.pkl'
              csvOutput: 'output_data_numEvent1000.csv'
          step: {$ref: 'scripts/steps.yaml#/createNano'}

    - name: skimSim
      dependencies: [createNanoSimulation]
      scheduler:
          scheduler_type: singlestep-stage
          parameters:
              inputFile: {stage: 'createNanoSimulation.createNano', output: outputFile}
              pngOutput: 'nMuonscheck_output_simulation_numEvent1000.png'
              pklOutput: 'output_simulation_numEvent1000.pkl'
              csvOutput: 'output_simulation_numEvent1000.csv'
          step: {$ref: 'scripts/steps.yaml#/createNano'}

and my steps.yaml is:

createNano:
    process:
        process_type: 'interpolated-script-cmd'
        interpreter: bash
        script: |
            pwd -LP
            ls -lh
            source /opt/cms/cmsset_default.sh
            cd /home/cmsusr/CMSSW_5_3_32/src/
            eval `scramv1 runtime -sh`
            cmsRun Analyzer/AOD2NanoAOD/configs/general_cfg.py maxEvents={maxEvents} txtFile={inputFile} isData={isData}
    environment: 
        environment_type: 'docker-encapsulated'
        image: gitlab-registry.cern.ch/algomez/cms-open-data
        imagetag: latest
    publisher:
        publisher_type: interpolated-pub
        publish:
            outputFile: 'output_simulation.root'

skimPandas:
    process:
        process_type: 'interpolated-script-cmd'
        interpreter: bash
        script: |
            ls -lh
            python Analyzer/rootToPandas.py --inputFile={inputFile}
    environment: 
        environment_type: 'docker-encapsulated'
        image: gitlab-registry.cern.ch/algomez/cms-open-data/rootpython
        imagetag: latest
    publisher:
        publisher_type: interpolated-pub
        publish:
            checkPng: '{pngOutput}'
            pklFile: '{pklOutput}'
            csvFile: '{csvOutput}'
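
One thing that can be checked locally in files like these is whether the {name} placeholders in a step script match the parameters a stage passes in. The sketch below is only an illustration, assuming the placeholders follow Python's str.format syntax; script_placeholders is a hypothetical helper, and the script line and parameter names are copied from the createNano step and the createNanoSimulation stage above. A mismatch found this way is not necessarily the reason the workflow does nothing.

```python
import string


def script_placeholders(script):
    """Collect the {name} fields used in an interpolated script."""
    return {field for _, field, _, _ in string.Formatter().parse(script) if field}


# Last line of the createNano script, copied from steps.yaml.
script = (
    "cmsRun Analyzer/AOD2NanoAOD/configs/general_cfg.py "
    "maxEvents={maxEvents} txtFile={inputFile} isData={isData}"
)
# Parameter names of the createNanoSimulation stage, copied from workflow.yaml.
stage_parameters = {"maxEvents", "isData", "txtFile", "outputRoot"}

used = script_placeholders(script)
missing = sorted(used - stage_parameters)  # used in the script but never passed
unused = sorted(stage_parameters - used)   # passed but never used in the script

print("missing:", missing)  # prints: missing: ['inputFile']
print("unused:", unused)    # prints: unused: ['outputRoot', 'txtFile']
```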

In case it is helpful, here is the website of the workflow: REANA

Thanks a lot for the help.
cheers,

Hi @algomez, could you share the output of reana-client logs?

Hi @mvidalgarcia

here it is:

(reana) algomez@lxplus718:~/workingArea/OpenData/cms-open-data$ reana-client logs -w opendatatest 

==> Workflow engine logs
2021-03-24 14:24:06,576 | reana-workflow-engine-yadage | MainThread | INFO | getting socket..
2021-03-24 14:24:06,644 | yadage.creators | MainThread | INFO | no initialization data
2021-03-24 14:24:06,644 | reana-workflow-engine-yadage | MainThread | INFO | running workflow on context: {'publisher': <reana_commons.publisher.WorkflowStatusPublisher object at 0x7efe4ce8b970>, 'rjc_api_client': <reana_commons.api_client.JobControllerAPIClient object at 0x7efe4ce8bb80>, 'workflow_uuid': '0419ee3e-c450-44e7-8df4-863804742f9c', 'workflow_workspace': '/var/reana/users/3938cb60-6819-4c9a-89e5-657f0beadae2/workflows/0419ee3e-c450-44e7-8df4-863804742f9c', 'workflow_file': 'scripts/workflow.yaml', 'workflow_parameters': {}, 'operational_options': {'accept_metadir': True, 'toplevel': '/var/reana/users/3938cb60-6819-4c9a-89e5-657f0beadae2/workflows/0419ee3e-c450-44e7-8df4-863804742f9c/', 'initdir': '/var/reana/users/3938cb60-6819-4c9a-89e5-657f0beadae2/workflows/0419ee3e-c450-44e7-8df4-863804742f9c/', 'initfiles': []}, 'kwargs': {'workflow_json': ''}, 'cap_backend': <yadage.backends.packtivitybackend.PacktivityBackend object at 0x7efe4ce0a4f0>, 'workflow_file_abs_path': '/var/reana/users/3938cb60-6819-4c9a-89e5-657f0beadae2/workflows/0419ee3e-c450-44e7-8df4-863804742f9c/scripts/workflow.yaml', 'schema_name': 'yadage/workflow-schema', 'schemadir': None, 'specopts': {'toplevel': '/var/reana/users/3938cb60-6819-4c9a-89e5-657f0beadae2/workflows/0419ee3e-c450-44e7-8df4-863804742f9c/', 'schema_name': 'yadage/workflow-schema', 'schemadir': None, 'load_as_ref': False}, 'validopts': {'schema_name': 'yadage/workflow-schema', 'schemadir': None}, 'workflow_json': {'stages': [{'name': 'createNanoSimulation', 'dependencies': {'dependency_type': 'jsonpath_ready', 'expressions': ['init']}, 'scheduler': {'scheduler_type': 'singlestep-stage', 'parameters': [{'key': 'maxEvents', 'value': '1000'}, {'key': 'isData', 'value': 'False'}, {'key': 'txtFile', 'value': 'Analyzer/AOD2NanoAOD/data/CMS_MonteCarlo2012_Summer12_DR53X_DYJetsToLL_M-50_TuneZ2Star_8TeV-madgraph-tarball_AODSIM_PU_RD1_START53_V7N-v1_20000_file_index.txt'}, {'key': 'outputRoot', 'value': 
'output_simulation_numEvents1000.root'}], 'step': {'process': {'process_type': 'interpolated-script-cmd', 'interpreter': 'bash', 'script': 'pwd -LP\nls -lh\nsource /opt/cms/cmsset_default.sh\ncd /home/cmsusr/CMSSW_5_3_32/src/\neval `scramv1 runtime -sh`\ncmsRun Analyzer/AOD2NanoAOD/configs/general_cfg.py maxEvents={maxEvents} txtFile={inputFile} isData={isData}\n'}, 'environment': {'environment_type': 'docker-encapsulated', 'image': 'gitlab-registry.cern.ch/algomez/cms-open-data', 'imagetag': 'latest', 'resources': [], 'envscript': '', 'env': {}, 'workdir': None, 'par_mounts': []}, 'publisher': {'publisher_type': 'interpolated-pub', 'publish': {'outputFile': 'output_simulation.root'}, 'glob': False, 'relative_paths': False}}}}, {'name': 'skimData', 'dependencies': {'dependency_type': 'jsonpath_ready', 'expressions': ['createNanoData']}, 'scheduler': {'scheduler_type': 'singlestep-stage', 'parameters': [{'key': 'inputFile', 'value': {'stage': 'createNanoData.createNano', 'output': 'outputFile'}}, {'key': 'pngOutput', 'value': 'nMuonscheck_output_data_numEvent1000.png'}, {'key': 'pklOutput', 'value': 'output_data_numEvent1000.pkl'}, {'key': 'csvOutput', 'value': 'output_data_numEvent1000.csv'}], 'step': {'process': {'process_type': 'interpolated-script-cmd', 'interpreter': 'bash', 'script': 'pwd -LP\nls -lh\nsource /opt/cms/cmsset_default.sh\ncd /home/cmsusr/CMSSW_5_3_32/src/\neval `scramv1 runtime -sh`\ncmsRun Analyzer/AOD2NanoAOD/configs/general_cfg.py maxEvents={maxEvents} txtFile={inputFile} isData={isData}\n'}, 'environment': {'environment_type': 'docker-encapsulated', 'image': 'gitlab-registry.cern.ch/algomez/cms-open-data', 'imagetag': 'latest', 'resources': [], 'envscript': '', 'env': {}, 'workdir': None, 'par_mounts': []}, 'publisher': {'publisher_type': 'interpolated-pub', 'publish': {'outputFile': 'output_simulation.root'}, 'glob': False, 'relative_paths': False}}}}]}, 'workflow_kwargs': {'workflow_json': {'stages': [{'name': 'createNanoSimulation', 
'dependencies': {'dependency_type': 'jsonpath_ready', 'expressions': ['init']}, 'scheduler': {'scheduler_type': 'singlestep-stage', 'parameters': [{'key': 'maxEvents', 'value': '1000'}, {'key': 'isData', 'value': 'False'}, {'key': 'txtFile', 'value': 'Analyzer/AOD2NanoAOD/data/CMS_MonteCarlo2012_Summer12_DR53X_DYJetsToLL_M-50_TuneZ2Star_8TeV-madgraph-tarball_AODSIM_PU_RD1_START53_V7N-v1_20000_file_index.txt'}, {'key': 'outputRoot', 'value': 'output_simulation_numEvents1000.root'}], 'step': {'process': {'process_type': 'interpolated-script-cmd', 'interpreter': 'bash', 'script': 'pwd -LP\nls -lh\nsource /opt/cms/cmsset_default.sh\ncd /home/cmsusr/CMSSW_5_3_32/src/\neval `scramv1 runtime -sh`\ncmsRun Analyzer/AOD2NanoAOD/configs/general_cfg.py maxEvents={maxEvents} txtFile={inputFile} isData={isData}\n'}, 'environment': {'environment_type': 'docker-encapsulated', 'image': 'gitlab-registry.cern.ch/algomez/cms-open-data', 'imagetag': 'latest', 'resources': [], 'envscript': '', 'env': {}, 'workdir': None, 'par_mounts': []}, 'publisher': {'publisher_type': 'interpolated-pub', 'publish': {'outputFile': 'output_simulation.root'}, 'glob': False, 'relative_paths': False}}}}, {'name': 'skimData', 'dependencies': {'dependency_type': 'jsonpath_ready', 'expressions': ['createNanoData']}, 'scheduler': {'scheduler_type': 'singlestep-stage', 'parameters': [{'key': 'inputFile', 'value': {'stage': 'createNanoData.createNano', 'output': 'outputFile'}}, {'key': 'pngOutput', 'value': 'nMuonscheck_output_data_numEvent1000.png'}, {'key': 'pklOutput', 'value': 'output_data_numEvent1000.pkl'}, {'key': 'csvOutput', 'value': 'output_data_numEvent1000.csv'}], 'step': {'process': {'process_type': 'interpolated-script-cmd', 'interpreter': 'bash', 'script': 'pwd -LP\nls -lh\nsource /opt/cms/cmsset_default.sh\ncd /home/cmsusr/CMSSW_5_3_32/src/\neval `scramv1 runtime -sh`\ncmsRun Analyzer/AOD2NanoAOD/configs/general_cfg.py maxEvents={maxEvents} txtFile={inputFile} isData={isData}\n'}, 'environment': 
{'environment_type': 'docker-encapsulated', 'image': 'gitlab-registry.cern.ch/algomez/cms-open-data', 'imagetag': 'latest', 'resources': [], 'envscript': '', 'env': {}, 'workdir': None, 'par_mounts': []}, 'publisher': {'publisher_type': 'interpolated-pub', 'publish': {'outputFile': 'output_simulation.root'}, 'glob': False, 'relative_paths': False}}}}]}}, 'dataopts': {'initdir': '/var/reana/users/3938cb60-6819-4c9a-89e5-657f0beadae2/workflows/0419ee3e-c450-44e7-8df4-863804742f9c/'}, 'initdata': {}, 'ys': <yadage.steering_object.YadageSteering object at 0x7efe4ce129d0>}
2021-03-24 14:24:06,659 | reana-workflow-engine-yadage | MainThread | INFO | initializing REANA workflow tracker for id 0419ee3e-c450-44e7-8df4-863804742f9c
2021-03-24 14:24:06,659 | adage.pollingexec | MainThread | INFO | preparing adage coroutine.
2021-03-24 14:24:06,659 | adage | MainThread | INFO | starting state loop.
2021-03-24 14:24:06,664 | reana-workflow-engine-yadage | MainThread | INFO | sending progress information
2021-03-24 14:24:06,670 | reana-workflow-engine-yadage | MainThread | INFO | sending to REANA
                    uuid: 0419ee3e-c450-44e7-8df4-863804742f9c
                    json:
                    {
    "engine_specific": {
        "dag": {
            "edges": [],
            "nodes": []
        }
    },
    "planned": {
        "total": 0,
        "job_ids": []
    },
    "failed": {
        "total": 0,
        "job_ids": []
    },
    "total": {
        "total": 0,
        "job_ids": []
    },
    "running": {
        "total": 0,
        "job_ids": []
    },
    "finished": {
        "total": 0,
        "job_ids": []
    }
}
                    message:
                    this is a tracking log at 2021-03-24T14:24:06.669849
                    
2021-03-24 14:24:06,699 | adage.controllerutils | MainThread | INFO | no nodes can be run anymore and no rules are applicable
2021-03-24 14:24:06,726 | adage.controllerutils | MainThread | INFO | no nodes can be run anymore and no rules are applicable
2021-03-24 14:24:06,726 | adage | MainThread | INFO | unsubmittable: 0 | submitted: 0 | successful: 0 | failed: 0 | total: 0 | open rules: 2 | applied rules: 0
2021-03-24 14:24:11,950 | reana-workflow-engine-yadage | MainThread | INFO | sending progress information
2021-03-24 14:24:11,956 | reana-workflow-engine-yadage | MainThread | INFO | sending to REANA
                    uuid: 0419ee3e-c450-44e7-8df4-863804742f9c
                    json:
                    {
    "engine_specific": {
        "dag": {
            "edges": [],
            "nodes": []
        }
    },
    "planned": {
        "total": 0,
        "job_ids": []
    },
    "failed": {
        "total": 0,
        "job_ids": []
    },
    "total": {
        "total": 0,
        "job_ids": []
    },
    "running": {
        "total": 0,
        "job_ids": []
    },
    "finished": {
        "total": 0,
        "job_ids": []
    }
}
                    message:
                    this is a tracking log at 2021-03-24T14:24:11.956497
                    
2021-03-24 14:24:11,958 | reana-workflow-engine-yadage | MainThread | INFO | Finalizing the progress tracking for: <yadage.wflow.YadageWorkflow object at 0x7efe4cd9e2e0>
2021-03-24 14:24:11,961 | adage | MainThread | INFO | adage state loop done.
2021-03-24 14:24:11,961 | adage | MainThread | INFO | execution valid. (in terms of execution order)
2021-03-24 14:24:11,961 | adage | MainThread | INFO | workflow completed successfully.
2021-03-24 14:24:11,961 | yadage.steering_api | MainThread | INFO | done. dumping workflow to disk.
2021-03-24 14:24:11,963 | yadage.steering_api | MainThread | INFO | visualizing workflow.
2021-03-24 14:24:11,996 | root | MainThread | ERROR | Error while publishing channel disconnected
2021-03-24 14:24:11,996 | root | MainThread | INFO | Retry in 0 seconds.
2021-03-24 14:24:12,013 | root | MainThread | INFO | Workflow 0419ee3e-c450-44e7-8df4-863804742f9c finished. Files available at users/3938cb60-6819-4c9a-89e5-657f0beadae2/workflows/0419ee3e-c450-44e7-8df4-863804742f9c.
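
In the log above, the adage summary line reports total: 0 nodes together with open rules: 2 and applied rules: 0, which suggests that no stage was ever scheduled as a job before the state loop ended. As a rough sketch for spotting this pattern in a long log, the counters can be pulled out of that line with plain string handling; parse_adage_summary is a hypothetical helper and not part of adage, yadage, or reana-client.

```python
def parse_adage_summary(line):
    """Extract the 'key: N' counters from an adage summary log line into a dict."""
    counters = {}
    for part in line.split(" | "):
        key, sep, value = part.partition(": ")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


# Summary line copied from the log above.
line = (
    "2021-03-24 14:24:06,726 | adage | MainThread | INFO | "
    "unsubmittable: 0 | submitted: 0 | successful: 0 | failed: 0 | "
    "total: 0 | open rules: 2 | applied rules: 0"
)

counters = parse_adage_summary(line)
# Rules exist but none was applied and no node was created.
stalled = (
    counters["total"] == 0
    and counters["open rules"] > 0
    and counters["applied rules"] == 0
)
print(stalled)  # prints: True
```

This only flags the symptom; the log itself does not say why the rules were never applicable.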