Named Files and Paths


Last updated 5 months ago

CsvPaths instances work with named-files, named-paths, and named-results. What are those?

  • A named-file is simply a name that points to a physical file location

  • A named-paths name points to a set of csvpaths that run together as a unit

  • Named-results are the results of running a set of named-paths; they share the name of the named-paths that produced them

Named Files

Named-files are a convenience. It's a lot easier to ask CsvPaths to process orders with their validations like this:

lines = csvpaths.collect_paths(filename="august_orders", pathsname="orders_validation")

Rather than, potentially, something like:

lines = csvpaths.collect_paths(filename="/users/me/validation/orders/august/aug-31-2024.csv", pathsname=....what do I even enter here?

The latter is an illustration, not a real method call.

CsvPaths's file manager also takes care of caching and other background details that CsvPath instances on their own don't support.
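
The idea behind a named-file can be sketched with a plain dictionary. This is a toy illustration only, not CsvPaths's file manager: the NamedFileRegistry class and its add and resolve methods are hypothetical names, and the sketch does none of the caching the real file manager handles.

```python
from pathlib import Path
from typing import Dict


class NamedFileRegistry:
    """Toy stand-in that maps a short name to a physical file location.

    Hypothetical class for illustration; it is not part of CsvPath.
    """

    def __init__(self) -> None:
        self._files: Dict[str, Path] = {}

    def add(self, name: str, path: str) -> None:
        # Re-registering a name simply points it at the new location.
        self._files[name] = Path(path)

    def resolve(self, name: str) -> Path:
        # Callers only need the name, never the full path.
        if name not in self._files:
            raise KeyError(f"no named-file registered as {name!r}")
        return self._files[name]


registry = NamedFileRegistry()
registry.add("august_orders", "/users/me/validation/orders/august/aug-31-2024.csv")
resolved = registry.resolve("august_orders")
```

Code that asks for august_orders keeps working if the file later moves, because only the registration changes.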

Named Paths

Named-paths are more interesting. The goal with named-paths is to make it easy to run multiple csvpaths against a single file in one go. The main attraction is that you can segment your validations into separate, composable csvpaths. As discussed in Validation Strategies and Another Example, Part 2, separate csvpaths can be important for:

  • Quality control of your validation

  • Maintainability

  • Reuse and efficient development

  • Performance

You can set up named-paths that are simple 1-to-1 names, like with named-files. But you can also have multiple csvpaths in one file or multiple files keyed by one named-paths name. The options are:

  • Put your csvpath files in a directory and import them under whatever name you like

  • Put your csvpath files in a directory and import them, each as a separate named-paths name, optionally with multiple csvpaths per file

  • Read a JSON structure from a file that contains a Dict[str, List[str]] where the list of strings is a list of csvpaths

  • Do the same, but constructing the Dict[str, List[str]] yourself in Python
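
The last two options share one data shape. As a minimal sketch using only Python's standard library, the following builds a Dict[str, List[str]], writes it to a JSON file, and reads it back. The named-paths name orders_validation, the file name named_paths.json, and the csvpath strings are placeholders chosen for illustration. How the structure is then handed to CsvPaths is not shown here.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

# Keys are named-paths names. Each value is the ordered list of csvpath
# strings that run together under that name. The csvpaths below are
# simplified placeholders, not real validation rules.
named_paths: Dict[str, List[str]] = {
    "orders_validation": [
        "$[*][ #0 ]",
        "$[*][ #1 ]",
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    json_file = Path(tmp) / "named_paths.json"
    json_file.write_text(json.dumps(named_paths, indent=2), encoding="utf-8")
    loaded: Dict[str, List[str]] = json.loads(json_file.read_text(encoding="utf-8"))

# JSON arrays preserve element order, so the list order survives the round trip.
round_trip_ok = loaded == named_paths
```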

Remember that order matters across csvpaths as well as within a single csvpath. When you import csvpaths from a directory, the order is not guaranteed. By contrast, the order of csvpaths within a single file is clear. Likewise, the order in the Dict[str, List[str]] structure is deterministic.
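
If run order matters and your csvpaths live in separate files, one way to make the order explicit is to build the list yourself. The sketch below uses only the standard library: it reads files in sorted filename order and assembles the Dict[str, List[str]] structure. The ordered_paths_from_dir helper, the .csvpath extension, the numeric filename prefixes, and the placeholder csvpath strings are all assumptions made for illustration, not CsvPath requirements.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def ordered_paths_from_dir(directory: Path, name: str) -> Dict[str, List[str]]:
    """Read one csvpath per file, in sorted filename order (hypothetical helper)."""
    files = sorted(directory.glob("*.csvpath"), key=lambda p: p.name)
    return {name: [f.read_text(encoding="utf-8").strip() for f in files]}


with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    # Numeric prefixes make the intended run order visible in the file names.
    # The files are deliberately written out of order.
    (directory / "02_totals.csvpath").write_text("$[*][ #1 ]\n", encoding="utf-8")
    (directory / "01_headers.csvpath").write_text("$[*][ #0 ]\n", encoding="utf-8")
    named_paths = ordered_paths_from_dir(directory, "orders_validation")
```

Sorting by filename trades flexibility for predictability: the order no longer depends on how the filesystem happens to list the directory.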

There is a table of the advantages of each approach to setting up named-paths here.

Keep in mind that order matters in CsvPath. The order of match components within a csvpath is most important. But the order csvpaths are run in may also have an impact. Depending on whether you run your named-paths breadth-first (a.k.a. line-by-line) or serially, you can enable different interactions. The differences are discussed here. Having your separate csvpaths impact one another is optional, of course!

Named Results

CsvPaths instances keep named-results in an archive that stores the outputs from named-paths runs. The name of the results is the same as the name of the paths that generated them. Named-results are a collection of one Result object per CsvPath instance per csvpath string. The Result objects hold:

  • The CsvPath instance that ran each csvpath in the named-paths set

  • Any errors that happened (configurable in config.ini)

  • All the print output lines

  • The CSV file lines that matched the csvpath (optionally)

The CsvPath instance also holds the metadata and variables collections. All in all, named-results carry a lot of data to support your validations.
