Named Files and Paths

CsvPaths instances work with named-files, named-paths, and named-results. What are those?

  • A named-file is simply a name that points to a physical file location

  • A named-paths name points to a set of csvpaths that run together as a unit

  • Named-results are the results of running a set of named-paths; they are stored under the same name as the named-paths that produced them

Named Files

Named-files are a convenience. It's a lot easier to ask CsvPaths to process orders with their validations like this:

lines = csvpaths.collect_paths(filename="august_orders", pathsname="orders_validation")

Rather than, potentially, something like:

lines = csvpaths.collect_paths( filename=/users/me/validation/orders/august/aug-31-2024.csv, pathsname=....what do I even enter here?

The latter is an illustration, not a real method call.

CsvPaths's file manager also takes care of caching and other background details that CsvPath instances on their own don't support.

Named Paths

Named-paths are more interesting. The goal with named-paths is to make it easy to run multiple csvpaths against a single file in one go. The main attraction is that you can segment your validations into separate, composable csvpaths. As discussed in Validation Strategies and Another Example, Part 2, separate csvpaths can be important to:

  • Quality control of your validation

  • Maintainability

  • Reuse and efficient development

  • Performance

You can set up named-paths that are simple 1-to-1 names, like with named-files. But you can also have multiple csvpaths in one file or multiple files keyed by one named-paths name. The options are:

  • Put your csvpath files in a directory and import them under whatever name you like

  • Put your csvpath files in a directory and import them, each under its own named-paths name, optionally with multiple csvpaths per file

  • Read a JSON structure from a file that contains a Dict[str, List[str]] where the list of strings is a list of csvpaths

  • Do the same, but constructing the Dict[str, List[str]] yourself in Python

There is a table of the advantages of each approach here.
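The last two options share the same Dict[str, List[str]] shape. As a rough illustration of that shape only, the sketch below parses and checks such a structure with the standard library. The names orders_validation and invoices_validation, the helper functions, and the csvpath strings are all placeholders invented for this example; this is not CsvPaths's own loading code.

```python
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical named-paths: each name keys an ordered list of csvpath strings.
# The csvpath strings are placeholders standing in for real validations.
named_paths: Dict[str, List[str]] = {
    "orders_validation": [
        "$[*][ yes() ]",
        "$[1*][ yes() ]",
    ],
    "invoices_validation": ["$[*][ yes() ]"],
}


def parse_named_paths(text: str) -> Dict[str, List[str]]:
    """Parse JSON text and check it has the Dict[str, List[str]] shape."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("named-paths JSON must be an object at the top level")
    for name, paths in data.items():
        if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
            raise ValueError(f"named-paths {name!r} must map to a list of strings")
    return data


def load_named_paths(json_file: Path) -> Dict[str, List[str]]:
    """Read the same structure from a JSON file on disk."""
    return parse_named_paths(json_file.read_text(encoding="utf-8"))
```

Because JSON arrays and Python lists both keep their element order, the csvpaths under each name come back in the order they were written.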

Keep in mind that order matters in CsvPath. The order of match components within a csvpath is most important. But the order in which csvpaths are run may also have an impact. Depending on whether you run your named-paths breadth-first (a.k.a. line-by-line) or serially, you can enable different interactions. The differences are discussed here. Having your separate csvpaths impact one another is optional, of course!
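To make the difference between the two run modes concrete, here is a toy model that only records which csvpath visits which line, and in what order. It assumes nothing about how CsvPaths schedules work internally; the function names and the path labels are made up for the illustration.

```python
from typing import List, Sequence, Tuple

Visit = Tuple[str, int]  # (csvpath label, line index)


def serial_order(paths: Sequence[str], line_count: int) -> List[Visit]:
    """Serial: each csvpath sees the whole file before the next one starts."""
    return [(path, line) for path in paths for line in range(line_count)]


def breadth_first_order(paths: Sequence[str], line_count: int) -> List[Visit]:
    """Breadth-first: every csvpath sees line N before any sees line N + 1."""
    return [(path, line) for line in range(line_count) for path in paths]


paths = ["headers_check", "totals_check"]

print(serial_order(paths, 2))
# [('headers_check', 0), ('headers_check', 1), ('totals_check', 0), ('totals_check', 1)]

print(breadth_first_order(paths, 2))
# [('headers_check', 0), ('totals_check', 0), ('headers_check', 1), ('totals_check', 1)]
```

In the serial ordering, a later csvpath only starts after an earlier one has finished the file. In the breadth-first ordering, the csvpaths take turns on each line, so a later csvpath could react to what an earlier one did on that same line.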

Remember that order matters across csvpaths as well as within a single csvpath, because when you import csvpaths from a directory the order is not guaranteed. By contrast, the order of csvpaths within a single file is clear. Likewise, the order in the Dict[str, List[str]] structure is deterministic.
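If you do keep one csvpath per file in a directory and the run order matters, one workaround is to build the Dict[str, List[str]] yourself from a sorted listing. This is a minimal sketch under assumptions: the .csvpath extension, the numeric filename prefixes, and both helper names are invented here, and the code reads files directly rather than using any CsvPaths import feature.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def in_stable_order(files: Iterable[Path]) -> List[Path]:
    """Sort by filename so the order no longer depends on the file system."""
    return sorted(files, key=lambda p: p.name)


def named_paths_from_dir(
    directory: Path, name: str, pattern: str = "*.csvpath"
) -> Dict[str, List[str]]:
    """Collect one csvpath string per file, in filename order, under one name."""
    files = in_stable_order(directory.glob(pattern))
    return {name: [f.read_text(encoding="utf-8").strip() for f in files]}
```

With this approach, zero-padded prefixes such as 01_headers.csvpath and 02_totals.csvpath make the intended order visible in the directory itself.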

Named Results

CsvPaths instances keep named-results that store the outputs from named-paths runs. The name of the results is the same as the name of the paths that generated them. Named-results are a collection of Result objects, one per CsvPath instance per csvpath string. The Result objects hold:

  • The CsvPath instance that ran each csvpath in the named-paths set

  • All the print output lines

  • The CSV file lines that matched the csvpath (optionally)

  • Any errors that happened (configurable in config.ini)

The CsvPath instance also holds the metadata and variables collections. All in all, named-results have a ton of data to support your validations.
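As a way to picture what named-results contain, the following sketch models them with plain dataclasses. Every class, field, and function name here is hypothetical and chosen only to mirror the list above; the real Result class and its attributes may look quite different.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ResultSketch:
    """Hypothetical stand-in for one Result: one csvpath string, one run."""
    csvpath: str                                              # the csvpath string that was run
    printouts: List[str] = field(default_factory=list)        # print output lines
    matched_lines: List[List[Any]] = field(default_factory=list)  # optional matched CSV lines
    errors: List[str] = field(default_factory=list)           # errors, if collected
    variables: Dict[str, Any] = field(default_factory=dict)   # held by the CsvPath instance
    metadata: Dict[str, Any] = field(default_factory=dict)    # held by the CsvPath instance


# Named-results: the named-paths name keys one result per csvpath in the set.
NamedResults = Dict[str, List[ResultSketch]]


def results_with_errors(named: NamedResults, name: str) -> List[ResultSketch]:
    """Return the results under `name` that recorded at least one error."""
    return [r for r in named.get(name, []) if r.errors]


named: NamedResults = {
    "orders_validation": [
        ResultSketch(csvpath="$[*][ yes() ]", printouts=["checked headers"]),
        ResultSketch(csvpath="$[1*][ yes() ]", errors=["line 3: bad total"]),
    ]
}
```

Keeping one entry per csvpath means you can tell which validation in the set produced a given printout, match, or error.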
