
Execute a script at the end of a run



Running a script at the end of a named-paths group run is a common need. Setting it up is a straightforward configuration file change plus one PathsManager method call.

Configuration

In config/config.ini you need a couple of things:

  • The listener enabled

  • The [scripts] section with two keys

First, make sure you have the [scripts] section. If your config.ini file is newly generated from the most recent point release, it will be there. Otherwise, add it like this:
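A minimal section, based on the two keys described below, looks something like this. The shell value is just an example; point it at whatever shell your scripts expect, or leave it out.

[scripts]
run_scripts = yes
shell = /bin/bash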

The run_scripts key enables or blocks all script running. By default, script running is blocked; to run scripts the value must be yes. The shell key is optional. CsvPath Framework uses it to add a shebang as the first line of your script if it doesn't see one. You can set shell to blank or remove the key if you don't think it would be helpful.

You also need the listener import line, scripts.results, under [listeners]. Again, if your project is new you may already have it. Otherwise, copy and paste from the code below.

Next, let's make sure the listener is active.

The listener group is scripts. There is only one event type that executes scripts: results. You need the scripts.results key under [listeners] to import the class, and you need the groups key to include scripts. If you have multiple listener groups enabled, just remember to comma-separate them. Here's everything:

[listeners]
groups = scripts
scripts.results = from csvpath.managers.integrations.scripts.scripts_results_listener import ScriptsResultsListener

Adding a script

To add a script, do one of the following:

  • Add the script to the named-paths group's definition.json by hand in a text editor

  • Call the PathsManager's store_script_for_paths() method

The second option is the better and easier choice. Using the PathsManager method there is less chance of error, and if the definitions file doesn't yet exist it is generated for you. However, should you want to add the script by hand, you can.

First, let's see the hard way

  • Open or create definitions.json wherever you keep your csvpaths prior to loading them. Your file doesn't need to be called definitions.json, but when you load it that is the name CsvPath Framework will use

  • Create a _config key with a dict

  • In the _config dict, add a key named for your named-paths group that holds another dict

  • In that dict, add a script-type key whose value is your script's file name

The script type is one of:

  • on_complete_all_script — executed on every run

  • on_complete_valid_script — executed when all csvpaths in the run are fully valid

  • on_complete_invalid_script — executed when any csvpath in the run is invalid

  • on_complete_errors_script — executed if there were any errors

What you get should look something like:

{
  "many": [
    "tests/test_resources/named_paths/many.csvpaths"
  ],
  "numbers": [
    "tests/test_resources/named_paths/zips.csvpaths",
    "tests/test_resources/named_paths/select.csvpaths"
  ],
  "needs split": [
    "tests/test_resources/named_paths/zips.csvpaths"
  ],
  "_config": {
    "many": {
      "template": ":1/:run_dir/:2",
      "on_complete_all_script": "complete_script.sh"
    }
  }
}

That's not super hard, but it's harder than doing it the easy way.

You would then put your script file, in this example complete_script.sh, into the named-paths home, the same directory the definition.json ends up in.
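For context, the named-paths home is created when you load the group. Here is a sketch of the load, assuming your version of the PathsManager exposes add_named_paths_from_json and that your definitions file lives at the example path:

from csvpath import CsvPaths

CsvPaths().paths_manager.add_named_paths_from_json("csvpaths/definitions.json")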

The easy way is better!

The easy way to add a script is to call a PathsManager method:

from csvpath import CsvPaths

CsvPaths().paths_manager.store_script_for_paths(
    name="many", 
    script_name="complete_script.sh", 
    text="echo 'hello world'"
)

Here the named-paths group name is many. The default script type, on_complete_all_script, runs the script on every run. If we wanted a different trigger we would add a parameter like script_type="on_complete_errors_script", or one of the other script types, as in the sketch below.
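For example, this call, with a placeholder script name and contents, would only fire the script when a run has errors:

from csvpath import CsvPaths

CsvPaths().paths_manager.store_script_for_paths(
    name="many",
    script_name="errors_script.sh",
    text="echo 'this run had errors'",
    script_type="on_complete_errors_script"
)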

The outcome of the method call is the same as the example of doing it by hand — just much easier to set up.

What happens?

When your named-paths group runs and the script type's condition is met, CsvPath Framework copies the script file into the run home directory and runs it. It captures standard out and standard error to a text file that has the same name as the script plus a timestamp.
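In the run home you might then see something like the listing below. The timestamp format here is an assumption; the point is the pattern of the script plus a timestamped capture file:

complete_script.sh
complete_script.sh.2025-03-01_14-22-07.txt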

And that's it.

The only caveat is that you cannot run scripts after named-paths group runs unless you are using the local filesystem backend, the default. If you are storing your archive in the cloud or on an SFTP server you will need another way to trigger actions. A cloud function would be one option. You can also use Zapier, IFTTT, or another webhook-savvy tool with a named-paths webhook call to trigger actions and workflows in the cloud.

Screenshot: a script and its output captured to a timestamped file after a run is complete.