
Where Do I Find Results?

CsvPath serializes results for your future use



When you run a CsvPaths instance, your results are stored in several files in an archive directory. The results are the serialized form of all the run data that is accessible through the CsvPaths instance's ResultsManager. If your config does not include an archive pointer, the folder ./archive is created, if needed, and used.

The CsvPaths results manager makes the results of named-paths runs available under the named-paths name. A named-paths name identifies a set of csvpaths that run as a unit. You request the results of a run like this:

from csvpath import CsvPaths

paths = CsvPaths()
# register the csvpaths and the data file under names, then run them together
paths.paths_manager.add_named_paths_from_file(name="autogen", file_path="assets/response.csvpath")
paths.file_manager.add_named_file(name="data", path="assets/Medicare_Claims_data-550.csv")
paths.collect_by_line(pathsname="autogen", filename="data")
# fetch the results of the named-paths run
results = paths.results_manager.get_named_results("autogen")

What you get back is a list of Result objects, one for each CsvPath instance that ran a csvpath in the named-paths group. Each Result holds the following information:

Directory structure and files

CSV Data (data.csv)

This file contains all the lines that matched, or the lines that did not match, depending on whether return-mode is set to match (the default) or no-match. If you use the fast_forward, fast_forward_paths, or fast_forward_by_line methods, this file will be empty.

Metadata (meta.json)

The metadata is the user-defined metadata plus the modes and the csvpath identity. By default the csvpath identity is the index of the csvpath in the named-paths group. The metadata also holds the full original external comments in one field, including the metadata fields that were broken out individually.

Runtime data (meta.json)

The runtime data lives in the same file as the metadata, under its own key. It holds the run-time fields that are available under the csvpath data type in print statements, such as the current line number.

Printouts (printouts.txt)

Print statements and validation messages go in this file. printouts.txt contains all the Printers' output. If you have multiple Printer instances you can print to one of them specifically; otherwise, by default, the print string goes to all Printers. In printouts.txt each Printer's content is prefixed by a separator line like:

---- PRINTOUT: my_printer_name

Variables (vars.json)

vars.json is a dictionary of all the variables captured during the csvpath run. They come only from the specific csvpath this Result represents. If needed, the ResultsManager can provide a dictionary that is the union of the variables of every Result in the named-paths group.

Errors (errors.json)

These are the errors collected, if any. Errors are collected by default; however, you can turn off error collection in the error policy in config.ini. Unlike raise, print, stop, and fail, there is no mode setting to override config.ini's error collection setting. The assumption is that in the usual case you don't want to drop errors, and that if you do, it is an operational consideration, not a csvpath writer's choice.

Unmatched lines (unmatched.csv)

If unmatched-mode is set to keep, a file named unmatched.csv is created to provide access to all the lines that were not returned during the run. If unmatched-mode is no-keep, the default, this file is not created.
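
The return-mode and unmatched-mode settings referred to above are set, like the other modes, as metadata fields in the external comment above a csvpath. Here is a minimal sketch; the identity and the match part are placeholders, not taken from the example on this page:

~ id: my-csvpath
  return-mode: no-match
  unmatched-mode: keep ~
$[*][ yes() ]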

Your archive directory will look something like the following.
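
The tree below is an illustrative sketch using the names from the example above and the files described in the table; it is not literal output. One directory is created per run, and each run directory holds one directory per csvpath in the group, named by that csvpath's identity. The run-directory names on your system will differ.

archive/
    autogen/                    the named-paths name
        run-1/                  one directory per run
            first-identity/     one directory per csvpath, named by its identity
                data.csv
                meta.json
                vars.json
                errors.json
                printouts.txt
                unmatched.csv
            second-identity/
                ...
            third-identity/
                ...
        run-2/
            ...
        run-3/
            ...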

meta.json

The metadata file looks roughly like this:
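
The sketch below is schematic rather than a real file: only the metadata and runtime_data keys come from the description above, and the fields inside them are placeholders describing what you will find there in an actual run.

{
    "metadata": {
        "identity": "the csvpath's id, or its index in the group by default",
        "your-field": "any user-defined metadata from the external comments",
        "original-comments": "the full original comments, plus the modes"
    },
    "runtime_data": {
        "line_number": "run-time fields like this one are available as $.csvpath.line_number in print"
    }
}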

Notice the metadata key and the runtime_data key. This file holds both of those data sets.

errors.json

errors.json has the structure shown below.
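
The original example file is not reproduced here. Schematically, errors.json is a JSON array with one object per collected error; the field names below are illustrative assumptions, not an exact contract.

[
    {
        "error": "the error message, for example a built-in validation message",
        "line": "where in the data file the error occurred",
        "identity": "which csvpath in the named-paths group produced it"
    }
]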

When the error message is a built-in validation message, that is a clear tell that the error was raised by CsvPath itself. When your own rules fail, most often you want those messages in the main printout stream.

The runtime fields are the data available under the csvpath data type when accessed in print statements, for example $.csvpath.line_number.
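
For instance, a minimal csvpath that writes its current line number to the printouts might look like this sketch:

$[*][ print("working on line $.csvpath.line_number") ]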

In the example archive above, we are using an autogenerated csvpath. We ran it three times from a file called main.py, using the default archive directory. The .csvpath file had three csvpaths in it. You can see their identities in the names of the per-csvpath directories, and each csvpath has its own results files inside its directory.
