CsvPath Framework
  • CsvPath
  • DATA PREBOARDING
  • Getting Started
    • Quickstart
    • Organizing Inbound Data
      • Dataflow Diagram
      • The Three Data Spaces
        • Source Staging
        • Validation Assets
        • Trusted Publishing
      • How Data Progresses Through CsvPath Framework
        • Staging
          • Data Identity
          • Handling Variability
            • Templates
            • Named-file Reference Queries
          • Registration API and CLI
            • Loading
            • Going CLI-only
        • Validation and Upgrading
          • Templates
          • Run Using the API
          • Running In the CLI
          • Named-paths Reference Queries
        • Publishing
          • Inspect Run Results
            • Result API
            • More Templates and References
          • Export Data and Metadata
    • Csv and Excel Validation
      • Your First Validation, The Lazy Way
      • Your First Validation, The Easy Way
      • Your First Validation, The Hard Way
    • DataOps Integrations
      • Getting Started with CsvPath + OpenTelemetry
      • Getting Started With CsvPath + OpenLineage
      • Getting Started with CsvPath + SFTPPlus
        • SFTPPlus Implementation Checklist
      • Getting Started with CsvPath + CKAN
    • How-tos
      • How-to videos
      • Storage backend how-tos
        • Store source data and/or named-paths and/or the archive in AWS S3
        • Loading files from S3, SFTP, or Azure
        • Add a file by https
        • Store source data and/or named-paths and/or the archive in Azure
        • Store source data and/or named-paths and/or the archive in Google Cloud Storage
      • CsvPath in AWS Lambda
      • Call a webhook at the end of a run
      • Setup notifications to Slack
      • Send run events to Sqlite
      • Execute a script at the end of a run
      • Send events to MySQL or Postgres
      • Sending results by SFTP
      • Another (longer) Example
        • Another Example, Part 1
        • Another Example, Part 2
      • Working with error messages
      • Sending results to CKAN
      • Transfer a file out of CsvPath
      • File references and rewind/replay how-tos
        • Replay Using References
        • Doing rewind / replay, part 1
        • Doing rewind / replay, part 2
        • Referring to named-file versions
      • Config Setup
      • Debugging Your CsvPaths
      • Creating a derived file
      • Run CsvPath on Jenkins
    • A Helping Hand
  • Topics
    • The CLI
    • High-level Topics
      • Why CsvPath?
      • CsvPath Use Cases
      • Paths To Production
      • Solution Storming
    • Validation
      • Schemas Or Rules?
      • Well-formed, Valid, Canonical, and Correct
      • Validation Strategies
    • Python
      • Python vs. CsvPath
      • Python Starters
    • Product Comparisons
      • The Data Preboarding Comparison Worksheet
    • Data, Validation Files, and Storage
      • Named Files and Paths
      • Where Do I Find Results?
      • Storage Backends
      • File Management
    • Language Basics
    • A CsvPath Cheatsheet
    • The Collect, Store, Validate Pattern
    • The Modes
    • The Reference Data Types
    • Manifests and Metadata
    • Serial Or Breadth-first Runs?
    • Namespacing With the Archive
    • Glossary
  • Privacy Policy
Powered by GitBook
On this page
  1. Getting Started
  2. Csv and Excel Validation

Your First Validation, The Easy Way

CsvPath is a very flexible language. There is often a simpler way than you first thought.

PreviousYour First Validation, The Lazy WayNextYour First Validation, The Hard Way

Last updated 3 months ago

CsvPath Language offers both rules-based validation and schemas. Validation rules are powerful. A rule can do things that a schema can't do. But sometimes just clarifying the shape of the data is enough. Both approaches are important tools.

CsvPath's claim to fame is the power of rules-based validation — no other validation language offers the same capabilities for delimited, tabular data. But CsvPath is also really good at structural validation. You can see more examples and .

Here's our First Validation example again, this time using a simple schema. We'll use the line() function to specify what each line of data looks like in a valid CSV or Excel file. This snippet is the whole validation:

    line(
        string.notnone("firstname"),
        string.notnone("lastname", 30),
        string("say")
    )

We expect exactly three headers and require two of them to always have values.

We could run this using the CLI, but let's see how to do it with Python.

from csvpath import CsvPath

csvpath = """
    ~ 
     id: First Validation, Simplified!
     description: Check if a file is valid
     validation-mode: print, no-raise, fail 
    ~
    $trivial.csv[*][
    line(
        string.notnone("firstname"),
        string.notnone("lastname", 30),
        string("say")
    )
]"""

path = CsvPath().parse(csvpath)
path.fast_forward()
if not path.is_valid:
    print(f"The file is invalid")

Two lines to run the validation. Two lines to print a warning if the CSV file is invalid. Simple!

We also added some optional metadata and a mode configuration in a comment. The comment is the part between the ~ characters.

In the metadata we are saying that the id field will become the csvpath's identity. The identity is available on the CsvPath instance's identity property. It will be used in printing out any built-in validation errors.

The description is ours to use as we wish—it is a user-defined field. And the validation-mode is a setting that tells the CsvPath instance what to do when there is a validation error. In this case we want to print errors, but not raise exceptions, and we want the file to be marked as invalid if there are errors. We sometimes call this failing the file.

There's a lot more you could do, of course. This is barely the tip of the iceberg. Keep reading and experimenting!

read about the difference between these approaches here