The Modes
In the context of a CsvPaths instance's run, an individual CsvPath instance can operate in several possible modes that allow you to configure its behavior without resorting to the global config.ini or applying settings programmatically. In particular, the modes help you configure groups of csvpaths more flexibly. You can use them to easily disable individual csvpaths or configure them differently than other csvpaths in the same named-paths group.
Modes are set in your csvpath's comments. The modes are:
error-mode: [bare/full]
explain-mode: [explain/no-explain]
files-mode: (all or any combination of) all, data/no-data, unmatched/no-unmatched, printouts/no-printouts
logic-mode: [AND/OR]
print-mode: [default/no-default]
return-mode: [matches/no-matches]
run-mode: [run/no-run]
source-mode: preceding
transfer-mode: data/unmatched > var-name
unmatched-mode: [keep/no-keep]
validation-mode: (any combination of) print/no-print, raise/no-raise, stop/no-stop, fail/no-fail, collect/no-collect, match/no-match
Modes are only set in external comments. External comments are comments that are outside the csvpath, above or below it. External comments can also have other user-defined metadata and plain text mixed in with mode settings. If a mode setting is followed by plain text there must be a stand-alone colon between the mode and the text.
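To make the comment rules above concrete, here is a minimal sketch of how mode settings could be picked out of an external comment's text. It is not the CsvPath Library's own parser: the function name parse_modes, the KNOWN_MODES set, and the sample comment text are assumptions made for illustration. The sketch treats a stand-alone colon as the end of a setting, which is why plain text after it is ignored.

```python
# Mode names taken from the list of modes above.
KNOWN_MODES = {
    "error-mode", "explain-mode", "files-mode", "logic-mode", "print-mode",
    "return-mode", "run-mode", "source-mode", "transfer-mode",
    "unmatched-mode", "validation-mode",
}


def parse_modes(comment):
    """Return {mode name: [values]} for each known mode in the comment text.

    Hypothetical helper: assumes the comment body has already been
    extracted from the csvpath as a plain string.
    """
    raw = {}
    current = None
    for token in comment.split():
        if token == ":":
            # A stand-alone colon ends the current setting.
            current = None
        elif token.endswith(":"):
            # A new key starts; only known mode names are collected.
            name = token[:-1]
            current = name if name in KNOWN_MODES else None
            if current is not None:
                raw[current] = []
        elif current is not None:
            raw[current].append(token)
    # Multiple values for one mode are separated by commas.
    return {
        mode: [v.strip() for v in " ".join(tokens).split(",") if v.strip()]
        for mode, tokens in raw.items()
    }


comment = (
    "id: next please! validation-mode: print, no-raise : "
    "plain text follows run-mode: no-run"
)
print(parse_modes(comment))
# {'validation-mode': ['print', 'no-raise'], 'run-mode': ['no-run']}
```

Without the stand-alone colon, the words "plain text follows" would be absorbed into the validation-mode value in this sketch, which illustrates why the separator is required.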
Defaults
When a mode is not explicitly set, CsvPath uses sensible defaults. Some modes default to options set in config/config.ini. For example, validation-mode overrides [errors] csvpath in config.ini. (Read here for more about the config file.) Other defaults are built in. For instance, logic-mode overrides the library's built-in default matching using ANDed operations. The defaults are:
error-mode: defaults to bare, meaning error() and built-in errors are presented minimally
explain-mode: no explanations are logged when logging is set to INFO
files-mode: there is no check for optional files having been generated
logic-mode: match components are ANDed
print-mode: print statements go to the console
return-mode: matches are returned
run-mode: the csvpath is run
source-mode: the named-file that was passed to the named-paths group is used as input
transfer-mode: no result data transfer is made
unmatched-mode: the lines not returned are discarded
validation-mode: validation errors are only printed and logged
An Example
These settings are configured as in this example of two trivial csvpaths in a named-paths group called example:
hello_world will not be run when the named-paths group runs, but it will be imported into the second csvpath, which is identified as "next please!". This example doesn't do much, but it gives an idea of how you can easily configure individual csvpaths within a group that will be run as a single unit. As you can see, some modes can take multiple values separated by commas.
Detailed Descriptions
Run Mode
no-run
The csvpath will not be run on its own. It only runs as an import into another csvpath that is runnable.
run
Run is the default.
Validation Mode
Validation mode controls how the CsvPath instance reacts to built-in validation errors. Built-in validation errors have two types:
Problems with the csvpath's syntax or structure
Problems with the data being validated
raise
The setting raise indicates that when a validation problem occurs, an exception should be raised that will likely halt the program. The opposite is no-raise. Setting neither value defaults the decision back to the global config.ini setting.
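The fallback to config.ini can be pictured as a three-state setting: on, off, or unset. The following is an illustrative sketch only, not the library's implementation; resolve_flag and its config_default argument are hypothetical names.

```python
def resolve_flag(settings, name, config_default):
    """Resolve one validation-mode option such as "raise".

    settings: the values listed in validation-mode, e.g. ["print", "no-raise"].
    config_default: the decision the global config would make when the
    mode names neither the option nor its opposite.
    """
    # If both forms are present this sketch lets the positive form win;
    # how the library treats that case is not covered here.
    if name in settings:
        return True
    if f"no-{name}" in settings:
        return False
    return config_default


print(resolve_flag(["print", "no-raise"], "raise", config_default=True))  # False
print(resolve_flag(["print"], "raise", config_default=True))  # True
```

The second call shows the fallback: the mode is silent about raise, so the configured default decides.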
print
The print setting makes the CsvPath instance print validation messages to all configured Printer instances. The opposite is no-print.
stop
The stop mode setting makes the CsvPath instance stop as soon as a validation problem occurs. no-stop prevents this premature completion, enabling the CsvPath instance to alert and continue.
fail
The fail setting sets the csvpath being run to invalid. Effectively this means setting the CsvPath instance's is_valid property to False. The opposite setting is no-fail.
Failing does not stop the program or the validation run from continuing.
match
When match is set, a built-in validation error will match rather than fail to match. The thing to remember is that this setting applies only to errors in the data (e.g. adding "five", not 5). Errors in the CsvPath Language are still not allowed. As a practical example, add("five", 5) never works, but add(@five, 5) always does, because even if @five turns out not to be a number on a particular line we still match on it in accordance with this setting. Whether or not you set match, if you don't have no-raise, your csvpath will blow up on validation errors.
collect
Logic Mode
AND
AND is the default logic mode. It requires that all match components evaluate to True for a line to match.
OR
OR mode is similar to how the or() function works. Any match component that evaluates to True makes the line match.
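The difference between the two logic modes can be sketched in a few lines. This is a simplified model, not library code: line_matches is a hypothetical function, and the list of booleans stands in for the results of a csvpath's match components on one line.

```python
def line_matches(component_results, logic_mode="AND"):
    """Decide whether a line matches, given one boolean per match component."""
    if logic_mode == "AND":
        # Every match component must evaluate to True.
        return all(component_results)
    if logic_mode == "OR":
        # A single True match component is enough.
        return any(component_results)
    raise ValueError(f"unknown logic-mode: {logic_mode}")


print(line_matches([True, False, True]))        # False
print(line_matches([True, False, True], "OR"))  # True
```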
Return Mode
matches
All the matching lines will be returned by next() or collect(). (fast_forward() never returns lines, regardless of mode.) This is the default behavior.
no-matches
All the lines that fail to match will be returned.
Print Mode
CsvPath supports printing errors and user-defined messages to any number of Printer objects using the print() and error() functions. Printers send text to separate queues. By default a "standard out" printer is enabled that prints to the console, as well as to a file. If you don't want anything printed to the console you would set no-default.
default
When default is set the CsvPath instance prints to the console, as well as to any other Printer instances you configure.
no-default
When no-default is set the standard console printer is disabled.
Explain Mode
explain
When set, a step-by-step explanation of the values, assignments, matches, etc. is dumped to INFO for each line in the file being processed. This can be a good debugging aid, but it is expensive: the performance hit can be around 20-25%.
no-explain
no-explain is the default.
Unmatched Mode
keep
Return mode determines if matches or non-matches are returned. Unmatched mode determines if the non-returned lines are kept available in the Result instance or on the CsvPath instance. If the lines are kept and you are using a CsvPaths instance, the Result instance will be serialized to the archive directory and you will see an unmatched.csv file containing the lines.
no-keep
Lines that were not returned are discarded.
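The interplay of return-mode and unmatched-mode can be modeled as follows. This is a sketch under stated assumptions, not the library's code: split_lines is a hypothetical function and matcher is a stand-in for a csvpath's match logic.

```python
def split_lines(lines, matcher, return_mode="matches", unmatched_mode="no-keep"):
    """Return (returned, kept) for a list of lines.

    return_mode decides which lines are returned; unmatched_mode decides
    whether the remaining lines are kept or dropped.
    """
    want_matches = return_mode == "matches"
    returned, kept = [], []
    for line in lines:
        if bool(matcher(line)) == want_matches:
            returned.append(line)
        elif unmatched_mode == "keep":
            kept.append(line)
    return returned, kept


lines = [["a", "1"], ["b", "2"], ["c", "3"]]


def is_big(line):
    # Hypothetical match rule: the second header's value is greater than 1.
    return int(line[1]) > 1


print(split_lines(lines, is_big))
# ([['b', '2'], ['c', '3']], [])
print(split_lines(lines, is_big, "no-matches", "keep"))
# ([['a', '1']], [['b', '2'], ['c', '3']])
```

Note that with return-mode set to no-matches, the "unmatched" lines that get kept are the ones that did match, because unmatched-mode is about the lines that were not returned.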
Files Mode
The impact of files-mode is that the run instance manifest and the csvpath's manifest will show that files were created as expected, or not.
There are various reasons why printouts.txt, data.csv and unmatched.csv might not be generated. For example, if we expect no validation output from user-created print() statements or built-in validation error messages we might set the files-mode to no-printouts. If a validation error was then printed we would be alerted in the metadata. In another example, if we set unmatched-mode to no-keep (the default) and files-mode to unmatched we have a conflict that we'll be alerted to in the metadata. Similarly, if we set files-mode to data and then run fast_forward_paths() we will not get data.csv files and the metadata will alert us to the mismatch.
errors.json, vars.json, meta.json, and manifest.json are always generated, regardless of files-mode. When you set files-mode to all the CsvPath Library will double-check that meta, vars, and errors were correctly created, but that part of its checking is superfluous.
all
All file types are expected to be generated
data / no-data
Determines if the data.csv file is expected
unmatched / no-unmatched
Determines if the unmatched.csv file is expected
printouts / no-printouts
Determines if we expect anything to be sent to the Printer instances using print()
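The kind of expectation check that files-mode describes can be sketched as a comparison between the files we expect and the files that were generated. The function check_files and the FILE_FOR mapping are hypothetical; how the library actually records mismatches in its manifests is not shown here, and the always-generated JSON files are left out.

```python
# Optional files named in the text above, keyed by their files-mode setting.
FILE_FOR = {
    "data": "data.csv",
    "unmatched": "unmatched.csv",
    "printouts": "printouts.txt",
}


def check_files(files_mode, generated):
    """Return {file name: True if the expectation was met}.

    files_mode: the comma-separated setting, e.g. "data, no-printouts".
    generated: the set of file names that were actually produced.
    Files the setting does not mention are not checked.
    """
    settings = [s.strip() for s in files_mode.split(",") if s.strip()]
    report = {}
    for kind, filename in FILE_FOR.items():
        if kind in settings or "all" in settings:
            # The file is expected to exist.
            report[filename] = filename in generated
        elif f"no-{kind}" in settings:
            # The file is expected to be absent.
            report[filename] = filename not in generated
    return report


print(check_files("data, no-printouts", {"data.csv", "printouts.txt"}))
# {'data.csv': True, 'printouts.txt': False}
```

In the usage line, printouts.txt was produced although no-printouts was set, so that expectation is reported as not met.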
Source Mode
Usually the data for a csvpath in a named-paths group comes from the data input for the whole group. I.e., all the csvpaths in the group run against the same source file. However, in some cases you might want the input to a csvpath to be the csvpath preceding it, meaning that the results captured from the first csvpath are piped into the second. To do this, you set source-mode: preceding on the second csvpath.
Keep in mind that CsvPaths instances' _collects methods and _by_line methods are quite different in how they handle data sources. Source mode does not apply to by-lines runs (i.e. it is for linear, not breadth-first, runs) because in a by-lines run each line is passed through each of the csvpaths in the named-paths group before the next line is considered. Csvpaths in a by-lines run can change data for downstream csvpaths in their named-paths group, and they can skip or advance the run in order to filter data so that downstream csvpaths don't have a chance at it. This just means that there are multiple ways of allowing earlier csvpaths to have an effect on later csvpaths.
Source mode is closely related to rewind/replay, to references between data sets, and to strategies for validation and canonicalization.
preceding
Instructs the csvpath to use the output of the preceding csvpath in the named-paths group as its input data
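The piping that source-mode: preceding describes can be modeled as a linear run in which each csvpath reads either the group's original input or the previous csvpath's output. This is a simplified sketch, not the library's implementation; run_group and the filter functions are hypothetical stand-ins for a named-paths group and its csvpaths.

```python
def run_group(source_lines, csvpaths):
    """Run stand-in csvpaths one after another and collect their outputs.

    csvpaths: a list of (filter_fn, source_mode) pairs. filter_fn takes a
    list of lines and returns the lines it captures. source_mode is
    "preceding" or None (use the group's original input).
    """
    results = []
    for filter_fn, source_mode in csvpaths:
        if source_mode == "preceding" and results:
            # Use the output of the csvpath that ran just before this one.
            input_lines = results[-1]
        else:
            # Sketch choice: with no predecessor, fall back to the source.
            input_lines = source_lines
        results.append(filter_fn(input_lines))
    return results


def evens(lines):
    return [n for n in lines if n % 2 == 0]


def over_three(lines):
    return [n for n in lines if n > 3]


source = [1, 2, 3, 4, 5, 6]
print(run_group(source, [(evens, None), (over_three, "preceding")]))
# [[2, 4, 6], [4, 6]]
print(run_group(source, [(evens, None), (over_three, None)]))
# [[2, 4, 6], [4, 5, 6]]
```

Integers stand in for lines to keep the example short. The second call shows the default behavior, where both csvpaths read the same source.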
Transfer Mode
transfer-mode lets you copy data.csv or unmatched.csv to an arbitrary location in the transfers directory. The transfers directory is configured in config/config.ini under [results] transfers. To use transfer-mode you use the form data | unmatched > var-name, where var-name is the name of a variable whose value will be the relative path under the transfers directory to the data you are transferring. Note that transfer-mode has no effect on the original data, in keeping with CsvPath Library's copy-on-write semantics. You may have as many transfers as you like by separating them with commas. Read more about using transfer-mode here.
data > var-name
Indicates you are transferring data.csv to the value of var-name as a relative path within the transfers directory
unmatched > var-name
Indicates you are transferring unmatched.csv to the value of var-name as a relative path within the transfers directory
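As a rough illustration of the form described above, the sketch below turns a transfer-mode value into a list of planned copies. It does not copy anything and is not the library's code; plan_transfers, the variable names, and the sample paths are all hypothetical.

```python
from pathlib import PurePosixPath

# The two result files that transfer-mode can copy.
RESULT_FILES = {"data": "data.csv", "unmatched": "unmatched.csv"}


def plan_transfers(transfer_mode, variables, transfers_dir):
    """Return a list of (source file name, destination path) pairs.

    transfer_mode: e.g. "data > good, unmatched > bad".
    variables: csvpath variables; each named variable holds a relative path.
    transfers_dir: the configured transfers directory.
    """
    plan = []
    for item in transfer_mode.split(","):
        kind, var_name = (part.strip() for part in item.split(">"))
        destination = PurePosixPath(transfers_dir) / variables[var_name]
        plan.append((RESULT_FILES[kind], str(destination)))
    return plan


variables = {"good": "orders/clean.csv", "bad": "orders/rejects.csv"}
print(plan_transfers("data > good, unmatched > bad", variables, "transfers"))
# [('data.csv', 'transfers/orders/clean.csv'), ('unmatched.csv', 'transfers/orders/rejects.csv')]
```

The plan only describes copies, which is consistent with transfer-mode leaving the original data untouched.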
Error Mode
error-mode allows you to output errors with log-like information or as plain messages.
bare
Errors are output as simple strings
full
Errors are output according to the [errors] pattern config value using the following fields:
time: Time
file: Named-file name
line: Line number
paths: Named-paths name
instance: Csvpath instance ID/name
chain: Match component chain
message: Message
The default pattern is:
{time}:{file}:{line}:{paths}:{instance}:{chain}: {message}
The chain field gives the parent-child relationships from the top match component to the match component child that was the source of the error.
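The difference between bare and full output can be sketched with ordinary string formatting against the default pattern. This is an illustration only, not the library's formatter; format_error is a hypothetical function and every field value below, including the chain text, is made up.

```python
# The default pattern quoted above.
DEFAULT_PATTERN = "{time}:{file}:{line}:{paths}:{instance}:{chain}: {message}"


def format_error(fields, mode="bare", pattern=DEFAULT_PATTERN):
    """Render an error as a simple string (bare) or with the pattern (full)."""
    if mode == "bare":
        return fields["message"]
    return pattern.format(**fields)


# Made-up sample values for each field in the pattern.
fields = {
    "time": "2025-01-01 00:00:00",
    "file": "orders",
    "line": 3,
    "paths": "example",
    "instance": "next please!",
    "chain": "add",
    "message": "not a number",
}
print(format_error(fields))
# not a number
print(format_error(fields, mode="full"))
# 2025-01-01 00:00:00:orders:3:example:next please!:add: not a number
```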