CsvPath Framework

Validation Assets


Last updated 2 months ago

Validating and upgrading delimited data is a core feature of CsvPath Framework. CsvPath Validation Language statements are comparable to SQL queries, XQuery statements, or XPath expressions. A Validation Language statement is called a csvpath.

The Framework enables you to apply multiple csvpaths to one or more source files in a run as a unit. Using multiple csvpaths in the same run allows you to decompose your validation and upgrading steps for easier development and testing, as well as to separate each csvpath's metadata and documentation for clarity. If your CSV or Excel file validation and upgrading requirements have 10, 20, 50 or 100 rules, being able to separate the rules for development and testing, while running them in production as a unit, is very helpful.

The validation assets area is relatively flat. It is a directory containing a folder for each named-paths group. Each named-paths group directory contains a group.csvpaths file and a manifest.json. It may also include a definition.json file. These files are for:

  • group.csvpaths holds all the csvpath statements in the named-paths group. (You can keep the statements in separate files for development. When you load multiple csvpaths into a named-paths group, CsvPath Framework automatically compiles them into a single group.csvpaths file.)

  • manifest.json holds metadata collected by CsvPath Framework about your named-paths group and its csvpath statements

  • definition.json is optional and is created by the csvpath writer. It contains a JSON structure that defines the order of the csvpaths and their original locations. (They can be anywhere reachable by the CsvPath Framework process.) It also holds configuration options for the named-paths group, most importantly a template defining the archive location of the results files
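
The layout described above can be checked with a few lines of standard-library Python. This is only a sketch for illustration, not part of CsvPath Framework: the function name `summarize_validation_assets` is hypothetical, and it assumes the optional file is named definition.json and that every subdirectory of the validation assets directory is a named-paths group.

```python
from pathlib import Path
import tempfile

# File names taken from the description above. The optional file's exact
# name (definition.json) is an assumption.
REQUIRED_FILES = ("group.csvpaths", "manifest.json")
OPTIONAL_FILE = "definition.json"


def summarize_validation_assets(root):
    """Map each named-paths group directory under root to the files it holds.

    Hypothetical helper: it only inspects the file system and does not
    call any CsvPath Framework API.
    """
    summary = {}
    for group_dir in sorted(Path(root).iterdir()):
        if not group_dir.is_dir():
            continue  # the assets area is flat: one folder per group
        present = {p.name for p in group_dir.iterdir() if p.is_file()}
        summary[group_dir.name] = {
            "missing_required": [f for f in REQUIRED_FILES if f not in present],
            "has_definition": OPTIONAL_FILE in present,
        }
    return summary


if __name__ == "__main__":
    # Build a throwaway assets directory with one made-up group name.
    with tempfile.TemporaryDirectory() as tmp:
        group = Path(tmp) / "orders"
        group.mkdir()
        (group / "group.csvpaths").write_text("")
        (group / "manifest.json").write_text("{}")
        print(summarize_validation_assets(tmp))
```

A check like this can be handy in a deployment script to confirm that each named-paths group was loaded completely before a run is started.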

CsvPath Framework does not keep versions of named-paths groups as it does for named-files. We anticipate that most developers will keep their code, including csvpaths, in a source code revision control system like Git. CsvPath Framework does not try to do Git's job.

CsvPath Framework keeps validation assets in a shallow directory tree