
Your First Validation, The Lazy Way

Start here if you want a code-free introduction!


Last updated 5 months ago

This page shows one way to do the Your First Validation exercise. The other approaches to the exercise focus on the CsvPath Language, but they also use a little Python to drive it. Here we skip the Python and use the CLI that comes with the CsvPath Library. Here's how.

We're going to use Poetry for our example project. You can learn how to set Poetry up here.

Open the command line and type this:

poetry new first_example

Change first_example to any project name you like. You should see this:

cd into your new project. Next, add csvpaths to your project with:

poetry add csvpaths

You should see this:

We can now run the CsvPath Library's CLI with:

poetry run cli

If you're not a Poetry user, what we're doing is running a script defined in the pyproject.toml.

You can do the same from Python with:

from csvpath.cli import Cli

Cli().loop()  # start the interactive CLI menu

The CsvPath CLI is bare-bones, but it is a quick way to do simple things, and it is well suited to learning and basic CsvPath Language development.

You should see this:

For now, select quit.

In your project dir, create a subdirectory called assets, or whatever name you like. We'll drop an example CSV file and your csvpath file there. Create a file called first.csvpath (again, any name works) and paste into it the simplified version of the csvpath statement:

    ~
      id: First Validation, Simplified!
      description: Check if a file is valid
      validation-mode: print, no-raise, fail
    ~
    $[*][
        line(
            string.notnone("firstname"),
            string.notnone("lastname", 30),
            string("say")
        )
    ]

Add the example data to the assets dir as example.csv. A trivial data set is enough:

firstname,lastname,say
Sam,Cat,Meow...
Fred,Dog,Woof woof
Blue,Bird,Tweet!

Now we're ready to run the validation. Fire up the CLI again with poetry run cli and select named-files. You should see this:

Hit return on add named-file. This imports your file into the FileManager's files area; the file manager is used whenever one of your CsvPaths instances needs to run a validation. When you hit return, the CLI asks you for a name for the file you are importing:

Any name works; example is a good choice. You should then see a choice of dir, file, or json.

Pick file. Next, select your file by drilling down into your assets directory. Select your file and hit return.

Once your file is added, you are back at the top menu. This time select named-paths, and in the next submenu pick add named-paths.

Again, enter a name for the csvpaths you are adding; first is a fine name. Then drill down to your first.csvpath file in the assets dir, select it, and hit return. You'll be taken back to the top menu.

Now you're ready to run your example. Select run and hit return. You will be asked for the name of a file. Select your file's name, example.

Next you'll be asked for the name of your csvpaths:

You have just one named-paths name, first, so select it and hit return.

Now the CLI asks which method you want to use to run your paths against your file. The options are fast-forward and collect. Fast-forward runs your validation but doesn't collect the matching lines; it only collects variables, printouts, and errors. Collect gathers the matching lines in addition to variables, printouts, and errors.

As an aside, the library also lets you step through a file line by line as it is being validated. The CLI does not offer that option, but you can do it programmatically with a CsvPaths instance's next_paths() method, in a loop like for line in paths.next_paths(...), where paths is your CsvPaths instance.

For our purposes, either method works. Pick collect.

The CLI briefly tells you it is running, then returns you to the top menu. You have successfully completed your first validation run. Congrats!

Now let's look at what resulted from the run. Select named-results.

The CLI is so simple it can only open your results in your operating system's file browser, but that will do for learning and developing. Select open named-result and then select first. A new window opens showing the runs of the first named-paths group. So far you have just one run, timestamped about a minute ago.

Inside your first run you should see these files: data.csv, errors.json, meta.json, printouts.txt, and vars.json.

data.csv has all the lines from your example.csv file, unchanged. Our validation matched all the lines and we used the collect method (technically, CsvPaths.collect_paths()), so everything in the original file came through as-is. errors.json is empty because there were no errors. We didn't set any variables, so vars.json is empty. And we didn't print anything during the run, so printouts.txt is also empty. Not much to see here, but that is what we expected, so it is a good thing.

There is a good amount of metadata in meta.json. If you open that file you should see something like this:

On line 16 of that metadata you can see which file we used: the one you imported earlier. You can learn more about how the CsvPath Library manages files here. And read this page for more information about named-paths group validation results.

And that's it. Your first validation, simplified, with no Python code involved. Not bad!

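If you later outgrow the CLI, the line-by-line stepping mentioned in the aside on next_paths() can be scripted. The block below is a minimal sketch, not the library's documented example. It assumes that next_paths() accepts filename= and pathsname= keyword arguments and yields one line at a time, and that the example and first names you registered through the CLI are visible to a CsvPaths instance created in the same project. SupportsNextPaths and step_through() are hypothetical names used only for illustration.

```python
from typing import Any, Iterable, List, Protocol


class SupportsNextPaths(Protocol):
    """Anything with a CsvPaths-style next_paths() generator.

    Assumption: next_paths(filename=..., pathsname=...) yields one line
    (a list of field values) at a time as the validation runs.
    """

    def next_paths(self, *, filename: str, pathsname: str) -> Iterable[List[Any]]:
        ...


def step_through(paths: SupportsNextPaths, filename: str = "example",
                 pathsname: str = "first") -> int:
    """Print each line as the run yields it; return how many lines we saw."""
    count = 0
    for line in paths.next_paths(filename=filename, pathsname=pathsname):
        count += 1
        print(f"line {count}: {line}")
    return count


# Usage against the real library (assumption: run from the same project dir
# the CLI used, so the "example" and "first" names resolve):
#
#     from csvpath import CsvPaths
#     step_through(CsvPaths())
```

Typing the parameter as a Protocol rather than importing CsvPaths keeps the loop easy to try against a stand-in object before pointing it at a real run.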
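If you'd rather check a run's output from a script than in the file browser, here is a minimal sketch using only the Python standard library. It assumes the run directory holds the five files described above (data.csv, errors.json, meta.json, printouts.txt, vars.json) and treats a blank or missing JSON file as empty, since "empty" could mean either a blank file or an empty JSON value. summarize_run() is a hypothetical helper, and the path you pass is whatever run directory your file browser opened.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, Union


def _load_json(path: Path) -> Any:
    """Parse a JSON file; return None if it is missing or blank."""
    if not path.exists():
        return None
    text = path.read_text(encoding="utf-8").strip()
    return json.loads(text) if text else None


def summarize_run(run_dir: Union[str, Path]) -> Dict[str, Any]:
    """Summarize the files a collect run leaves in its run directory."""
    run = Path(run_dir)

    # Count every row in data.csv, header included.
    data_rows = 0
    data_file = run / "data.csv"
    if data_file.exists():
        with data_file.open(newline="", encoding="utf-8") as f:
            data_rows = sum(1 for _ in csv.reader(f))

    # Blank or missing JSON files count as empty.
    errors = _load_json(run / "errors.json") or []
    variables = _load_json(run / "vars.json") or {}
    meta = _load_json(run / "meta.json") or {}

    printouts_file = run / "printouts.txt"
    printouts = (
        printouts_file.read_text(encoding="utf-8").strip()
        if printouts_file.exists()
        else ""
    )

    return {
        "data_rows": data_rows,
        "error_count": len(errors),
        "variable_count": len(variables),
        "has_printouts": bool(printouts),
        "meta_keys": sorted(meta) if isinstance(meta, dict) else [],
    }


# Example (hypothetical path; use the run directory your file browser opened):
#     print(summarize_run("path/to/your/run"))
```

For the run in this walkthrough you would expect four data rows (the header plus three animals), no errors, no variables, and no printouts.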