
The Modes


Last updated 4 months ago

In the context of a CsvPaths instance's run, an individual CsvPath instance can operate in several possible modes that allow you to configure its behavior without resorting to the global config.ini or applying settings programmatically. In particular, the modes help you configure groups of csvpaths more flexibly. You can use them to easily disable individual csvpaths or configure them differently than other csvpaths in the same named-paths group.

Modes are set in your csvpath's comments. The modes are:

  • error-mode: [bare / full]

  • explain-mode: [explain / no-explain]

  • files-mode: (all or any combination of)

    • all

    • data / no-data

    • unmatched / no-unmatched

    • printouts / no-printouts

  • logic-mode: [AND / OR]

  • print-mode: [default / no-default]

  • return-mode: [matches / no-matches]

  • run-mode: [run / no-run]

  • source-mode: preceding

  • transfer-mode: data / unmatched > var-name

  • unmatched-mode: [keep / no-keep]

  • validation-mode: (any combination of)

    • print / no-print

    • raise / no-raise

    • stop / no-stop

    • fail / no-fail

    • collect / no-collect

    • match / no-match

Modes are only set in external comments. External comments are comments that are outside the csvpath, above or below it. External comments can also have other user-defined metadata and plain text mixed in with mode settings. If a mode setting is followed by plain text there must be a stand-alone colon between the mode and the text.

Defaults

  • error-mode: defaults to bare, meaning error() and built-in errors are presented minimally

  • explain-mode: no explanations are logged when logging is set to INFO

  • files-mode: there is no check for optional files having been generated

  • logic-mode: match components are ANDed

  • print-mode: print statements go to the console

  • return-mode: matches are returned

  • run-mode: the csvpath is run

  • source-mode: the named-file that was passed to the named-paths group is used as input

  • transfer-mode: no result data transfer is made

  • unmatched-mode: the lines not returned are discarded

  • validation-mode: validation errors are only printed and logged
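
As a rough illustration of how explicit settings combine with these defaults, the sketch below overlays a csvpath's explicit modes on a table of defaults. The DEFAULT_MODES dictionary and the effective_modes helper are hypothetical names made up for this example; they are not part of the CsvPath API. Modes whose default is "nothing happens" or "defer to config.ini" are represented as None.

```python
# Hypothetical defaults table, transcribed from the list above.
# None means "no setting": the mode is inactive or deferred to config.ini.
DEFAULT_MODES = {
    "error-mode": "bare",
    "explain-mode": "no-explain",
    "files-mode": None,
    "logic-mode": "AND",
    "print-mode": "default",
    "return-mode": "matches",
    "run-mode": "run",
    "source-mode": None,
    "transfer-mode": None,
    "unmatched-mode": "no-keep",
    "validation-mode": None,
}


def effective_modes(explicit):
    """Overlay explicit settings on the defaults, rejecting unknown mode names."""
    unknown = set(explicit) - set(DEFAULT_MODES)
    if unknown:
        raise ValueError(f"unknown modes: {sorted(unknown)}")
    return {**DEFAULT_MODES, **explicit}


modes = effective_modes({"logic-mode": "OR", "run-mode": "no-run"})
print(modes["logic-mode"], modes["return-mode"])
```

Only the two modes named explicitly change; every other mode keeps the default from the list.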

An Example

These settings are configured as in this example of two trivial csvpaths in a named-paths group called example:

~
   id: hello_world
   run-mode: no-run
~
$[*][ yes() ]

---- CSVPATH ----

~  
   id: next please!
   explain-mode: explain
   validation-mode: no-raise, print
   logic-mode: OR
   return-mode: matches
   unmatched-mode: keep
   print-mode: default :
   All of these mode settings are optional, of course! And they don't have to be written as neatly as this, either.   
~
$[*][
   import($example.csvpaths.hello_world)
   yes()
]

hello_world will not be run when the named-paths group runs, but it will be imported into the second csvpath identified as next please!. This example doesn't do much, but it gives an idea of how you can easily configure individual csvpaths within a group that will be run as a single unit. As you can see, some modes can take multiple values separated by commas.
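
To make the comment format concrete, here is a minimal Python sketch that pulls mode settings out of an external comment like the second one above. It is a simplified illustration, not the parser CsvPath itself uses: parse_modes and MODE_NAMES are hypothetical names, and the tokenizing rules are assumptions based only on the description on this page.

```python
# Mode names as listed on this page.
MODE_NAMES = {
    "error-mode", "explain-mode", "files-mode", "logic-mode", "print-mode",
    "return-mode", "run-mode", "source-mode", "transfer-mode",
    "unmatched-mode", "validation-mode",
}


def parse_modes(comment):
    """Collect mode settings from an external comment (illustrative only).

    A token ending in ":" starts a new key. A stand-alone ":" ends the
    current value, so any plain text after it is ignored.
    """
    modes = {}
    current = None
    for token in comment.replace("~", " ").split():
        if token == ":":
            current = None  # stand-alone colon: stop collecting values
        elif token.endswith(":"):
            name = token[:-1]
            current = name if name in MODE_NAMES else None  # skip id: etc.
            if current is not None:
                modes[current] = []
        elif current is not None:
            value = token.strip(",")  # values may be comma-separated
            if value:
                modes[current].append(value)
    return modes


comment = """
~
   id: next please!
   explain-mode: explain
   validation-mode: no-raise, print
   logic-mode: OR
   return-mode: matches
   unmatched-mode: keep
   print-mode: default :
   All of these mode settings are optional, of course!
~
"""

for mode, values in parse_modes(comment).items():
    print(mode, values)
```

In this sketch the stand-alone colon after print-mode: default is what keeps the trailing sentence from being read as more print-mode values.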

Detailed Descriptions

Run Mode

  • no-run: The csvpath will not be run on its own. It only runs as an import into another csvpath that is runnable.

  • run: The csvpath is run. This is the default.

Validation Mode

Validation mode controls how the CsvPath instance reacts to built-in validation errors. Built-in validation errors are of two types:

  • Problems with the csvpath's syntax or structure

  • Problems with the data being validated

  • raise: When a validation problem occurs, an exception is raised that will likely halt the program. The opposite is no-raise. Setting neither value defers the decision to the global config.ini setting.

  • print: The CsvPath instance prints validation messages to all configured Printer instances. The opposite is no-print.

  • stop: The CsvPath instance stops as soon as a validation problem occurs. no-stop prevents this premature completion, enabling the CsvPath instance to alert and continue.

  • fail: The csvpath being run is marked invalid. Effectively this means setting the CsvPath instance's is_valid property to False. The opposite is no-fail. Failing has no effect on whether the program or the validation run continues.

  • match: A built-in validation error will match, rather than fail to match. This setting applies only to errors in the data (e.g. adding "five", not 5). Errors in the CsvPath Language are still not allowed. As a practical example, add("five", 5) never works, but add(@five, 5) always does, because even if @five turns out not to be a number on a particular line we still match on it in accordance with this setting. Whether or not you set match, if you have not set no-raise your csvpath will blow up on validation errors.

  • collect: Errors are captured. When no-collect is set they are dropped.
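
The paired settings (raise / no-raise and so on) can be thought of as three-state flags: on, off, or not set. The Python sketch below shows one way to resolve such a flag, falling back to a configured value when neither form is present. The page states this fallback only for raise; applying the same rule to the other flags is an assumption of this example, and resolve_flag is a hypothetical helper, not a CsvPath function.

```python
def resolve_flag(settings, name, config_default):
    """Resolve one validation-mode flag from a set of settings.

    Returns True for "name", False for "no-name", and config_default
    when neither is present (the stand-in for the config.ini value).
    """
    positive, negative = name, f"no-{name}"
    if positive in settings and negative in settings:
        raise ValueError(f"both {positive} and {negative} are set")
    if positive in settings:
        return True
    if negative in settings:
        return False
    return config_default


# validation-mode: no-raise, print
settings = {"no-raise", "print"}
print(resolve_flag(settings, "raise", config_default=True))
print(resolve_flag(settings, "print", config_default=False))
print(resolve_flag(settings, "stop", config_default=False))
```

Here raise is switched off explicitly, print is switched on explicitly, and stop is not mentioned, so it takes whatever the configuration supplies.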

Logic Mode

  • AND: AND is the default logic mode. It requires that all match components evaluate to True for a line to match.

  • OR: OR mode is similar to how the or() function works. Any match component that evaluates to True makes the line match.
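
The difference between the two logic modes is the difference between Python's all() and any(). The sketch below assumes each match component has already been evaluated to a boolean for the current line; line_matches is a hypothetical helper for illustration, not a CsvPath function.

```python
def line_matches(component_results, logic_mode="AND"):
    """Combine per-component results for one line according to logic-mode."""
    if logic_mode == "AND":
        return all(component_results)  # every component must be True
    if logic_mode == "OR":
        return any(component_results)  # one True component is enough
    raise ValueError(f"unknown logic-mode: {logic_mode}")


results = [True, False, True]  # three match components on one line
print(line_matches(results, "AND"))
print(line_matches(results, "OR"))
```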

Return Mode

  • matches: All the matching lines will be returned by next() or collect(). (fast_forward() never returns lines, regardless of mode.) This is the default behavior.

  • no-matches: All the lines that fail to match will be returned.

Print Mode

CsvPath supports printing errors and user-defined messages to any number of Printer objects using the print() and error() functions. Printers send text to separate queues. By default a "standard out" printer is enabled that prints to the console, as well as to a file. If you don't want anything printed to the console you would set no-default.

  • default: The CsvPath instance prints to the console, as well as to any other Printer instances you configure.

  • no-default: The standard console printer is disabled.

Explain Mode

  • explain: A step-by-step explanation of the values, assignments, matches, etc. is logged at INFO for each line in the file being processed. This can be a good aid to debugging but is expensive: the performance hit can be around 20-25%.

  • no-explain: No explanations are logged. This is the default.

Unmatched Mode

  • keep: Return mode determines if matches or non-matches are returned. Unmatched mode determines if the non-returned lines are kept available in the Result instance or on the CsvPath instance. If the lines are kept and you are using a CsvPaths instance, the Result instance will be serialized to the archive directory and you will see an unmatched.csv file containing the lines.

  • no-keep: Lines that were not returned are discarded. This is the default.
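
Return mode and unmatched mode work as a pair: the first picks which lines are returned, the second decides what happens to the rest. The sketch below models that with plain lists. split_lines and the matches predicate are hypothetical stand-ins for a csvpath run, written only to show the interaction.

```python
def split_lines(lines, matches, return_mode="matches", unmatched_mode="no-keep"):
    """Return (returned, kept) for the given return-mode and unmatched-mode."""
    if return_mode not in ("matches", "no-matches"):
        raise ValueError(f"unknown return-mode: {return_mode}")
    returned, rest = [], []
    for line in lines:
        # A line is returned when its match result agrees with return-mode.
        if matches(line) == (return_mode == "matches"):
            returned.append(line)
        else:
            rest.append(line)
    kept = rest if unmatched_mode == "keep" else []  # no-keep discards the rest
    return returned, kept


lines = [["a", "1"], ["b", "x"], ["c", "3"]]


def is_number(line):
    return line[1].isdigit()


print(split_lines(lines, is_number))
print(split_lines(lines, is_number, "no-matches", "keep"))
```

With the defaults the two numeric lines are returned and the third is dropped; with no-matches and keep the roles swap and the numeric lines are the ones retained as unmatched.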

Files Mode

The impact of files-mode is that the run instance manifest and the csvpath's manifest will show that files were created as expected, or not.

There are various reasons why printouts.txt, data.csv and unmatched.csv might not be generated. For example, if we expect no validation output from user-created print() statements or built-in validation error messages, we might set files-mode to no-printouts. If a validation error was then printed, we would be alerted in the metadata. In another example, if we set unmatched-mode to no-keep (the default) and files-mode to unmatched, we have a conflict that we'll be alerted to in the metadata. Similarly, if we set files-mode to data and then run fast_forward_paths(), we will not get data.csv files and the metadata will alert us to the mismatch.

errors.json, vars.json, meta.json, and manifest.json are always generated, regardless of files-mode. When you set files-mode to all the CsvPath Library will double-check that meta, vars, errors were correctly created, but that part of its checking is superfluous.

  • all: All file types are expected to be generated

  • data / no-data: Determines if the data.csv file is expected

  • unmatched / no-unmatched: Determines if the unmatched.csv file is expected

  • printouts / no-printouts: Determines if we expect anything to be sent to the Printer instances using print()
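
The kind of check files-mode describes can be sketched as comparing stated expectations against the files a run actually produced. The Python below is an illustration only: check_files is a hypothetical helper, it covers just the three optional files, and how CsvPath records the outcome in its manifests is not modeled.

```python
# Optional result files named on this page, keyed by files-mode setting.
OPTIONAL_FILES = {
    "data": "data.csv",
    "unmatched": "unmatched.csv",
    "printouts": "printouts.txt",
}


def check_files(files_mode, generated):
    """Map each file with a stated expectation to True if the expectation held."""
    expected = {}
    for setting in files_mode:
        if setting == "all":
            expected.update({name: True for name in OPTIONAL_FILES.values()})
        elif setting in OPTIONAL_FILES:
            expected[OPTIONAL_FILES[setting]] = True  # file should exist
        elif setting.startswith("no-") and setting[3:] in OPTIONAL_FILES:
            expected[OPTIONAL_FILES[setting[3:]]] = False  # file should not exist
        else:
            raise ValueError(f"unknown files-mode setting: {setting}")
    return {name: (name in generated) == want for name, want in expected.items()}


# files-mode: data, no-printouts ... but the run printed something.
print(check_files(["data", "no-printouts"], {"data.csv", "printouts.txt"}))
```

In this run the data.csv expectation holds, while the unexpected printouts.txt is the mismatch that would be flagged.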

Source Mode

Usually the data for a csvpath in a named-paths group comes from the data input for the whole group. I.e., all the csvpaths in the group run against the same source file. However, in some cases you might want the input to a csvpath to be the csvpath preceding it. Meaning that the results captured from the first csvpath are piped into the second. To do this, you set source-mode: preceding on the second csvpath.

  • preceding: Instructs the csvpath to use the output of the preceding csvpath in the named-paths group as its input data
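
The piping that source-mode: preceding describes can be pictured as follows. In this Python sketch each step is an ordinary function standing in for a csvpath, and integers stand in for lines. run_group is a hypothetical helper, and falling back to the group's source when the first step has nothing preceding it is an assumption of the example, not documented behavior.

```python
def run_group(source_lines, steps):
    """Run steps in order; a step marked "preceding" reads the previous output."""
    outputs = []
    for transform, source_mode in steps:
        if source_mode == "preceding" and outputs:
            input_lines = outputs[-1]  # piped from the step before
        else:
            input_lines = source_lines  # the group's shared input
        outputs.append(transform(input_lines))
    return outputs


def evens(lines):
    return [n for n in lines if n % 2 == 0]


def over_three(lines):
    return [n for n in lines if n > 3]


source = [1, 2, 3, 4, 5, 6]
print(run_group(source, [(evens, None), (over_three, None)]))
print(run_group(source, [(evens, None), (over_three, "preceding")]))
```

Without source-mode both steps see all six values; with preceding the second step only sees what the first one let through.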

Transfer Mode

  • data > var-name: Indicates you are transferring data.csv to the value of var-name as a relative path within the transfer directory

  • unmatched > var-name: Indicates you are transferring unmatched.csv to the value of var-name as a relative path within the transfer directory
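
To show how a transfer-mode value maps to destinations, the sketch below parses a setting of the form data > var-name (comma-separated for multiple transfers) and resolves each variable to a path under a transfers directory. It only plans the copies and does not touch any files. plan_transfers, the variable names, and the directory name "transfers" are all hypothetical choices for this example.

```python
from pathlib import PurePosixPath

# Result files that transfer-mode can copy, per this page.
RESULT_FILES = {"data": "data.csv", "unmatched": "unmatched.csv"}


def plan_transfers(setting, variables, transfers_dir):
    """Return (result file, destination path) pairs for a transfer-mode value."""
    plans = []
    for part in setting.split(","):
        source, arrow, var_name = (piece.strip() for piece in part.partition(">"))
        if arrow != ">" or source not in RESULT_FILES:
            raise ValueError(f"bad transfer: {part.strip()!r}")
        if var_name not in variables:
            raise ValueError(f"no variable named {var_name!r}")
        # The variable's value is a path relative to the transfers directory.
        destination = PurePosixPath(transfers_dir) / variables[var_name]
        plans.append((RESULT_FILES[source], str(destination)))
    return plans


variables = {"good-file": "orders/good.csv", "bad-file": "orders/rejects.csv"}
for plan in plan_transfers("data > good-file, unmatched > bad-file", variables, "transfers"):
    print(plan)
```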

Error Mode

error-mode allows you to output errors with log-like information or as plain messages.

  • bare: Errors are output as simple strings

  • full: Errors are output according to the [errors] pattern config value using the following fields:

    • time: Time

    • file: Named-file name

    • line: Line number

    • paths: Named-paths name

    • instance: Csvpath instance ID/name

    • chain: Match component chain

    • message: Message

The default pattern is:

{time}:{file}:{line}:{paths}:{instance}:{chain}: {message}

The chain field gives the parent-child relationships from the top match component to the match component child that was the source of the error.
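
To see what the two error modes produce, the sketch below fills the default pattern using Python's str.format. Treating the pattern's braces as str.format placeholders is an assumption of this example, format_error is a hypothetical helper, and all field values (including the shape of the chain value) are made up.

```python
DEFAULT_PATTERN = "{time}:{file}:{line}:{paths}:{instance}:{chain}: {message}"


def format_error(message, error_mode="bare", pattern=DEFAULT_PATTERN, **fields):
    """Render an error message as a bare string or with the full pattern."""
    if error_mode == "bare":
        return message
    if error_mode == "full":
        return pattern.format(message=message, **fields)
    raise ValueError(f"unknown error-mode: {error_mode}")


fields = {
    "time": "2025-01-01 12:00:00",
    "file": "orders",          # named-file name
    "line": 7,                 # line number
    "paths": "example",        # named-paths name
    "instance": "next please!",
    "chain": "add",            # made-up chain value
}
print(format_error("not a number"))
print(format_error("not a number", "full", **fields))
```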

When a mode is not explicitly set, CsvPath uses sensible defaults. Some modes default to options set in config/config.ini. For example, validation-mode overrides [errors] csvpath in config.ini. (Read here for more about the config file.) Other defaults are built in; for instance, logic-mode overrides the library's built-in default of ANDing match components. The defaults are listed in the Defaults section above.

In validation-mode, when collect is set errors are captured. When no-collect is set they are dropped. You can drop errors and still fail a file to make it invalid, just as you can capture errors but choose not to use fail(). Keep in mind that when you don't collect errors CsvPath.has_errors() is False. Also bear in mind that if you are using the OpenTelemetry integration (e.g. to push events to Grafana, New Relic, etc.) you can choose to drop errors but still fire error events.

For source-mode, keep in mind that CsvPaths instances' _collects methods and _by_line methods are quite different in how they handle data sources. Source mode does not apply to by-lines runs (i.e. it is for linear, not breadth-first runs) because in a by-lines run each line is passed through each of the csvpaths in the named-paths group before the next line is considered. Csvpaths in a by-lines run can change data for downstream csvpaths in their named-paths group, and they can skip or advance the run in order to filter data so that downstream csvpaths don't have a chance at it. This just means that there are multiple ways of allowing earlier csvpaths to have an effect on later csvpaths.

Source mode has a lot to do with rewind/replay, also references between data sets, as well as strategies for validation and canonicalization.

transfer-mode lets you copy data.csv or unmatched.csv to an arbitrary location in the transfers directory. The transfers directory is configured in config/config.ini under [results] transfers. To use transfer-mode you use the form data | unmatched > var-name, where var-name is the name of a variable that will be the relative path under the transfer directory to the data you are transferring. Note that transfer-mode has no effect on the original data, in keeping with CsvPath Library's copy-on-write semantics. You may have as many transfers as you like by separating them with commas. Read more about using transfer-mode here.