Quickstart


Last updated 2 months ago

This page gives you all the information you need to get started validating your CSVs with CsvPath. It is super high-level and quick. You will want to go deeper on other pages later. We're going to do a trivial validation of a CSV file. Validating an Excel file would be essentially the same.

If you need help getting started with Python, try Python.org's intros. Starting with a project tool like Poetry or Jupyter Notebooks can also help.

Feel like skipping the Python? 🎥 Watch these videos or try this Python-free CLI example.

PyPI and Github

The open source CsvPath Framework is available through PyPI as csvpath. The project is quite active. You should pin the version you use but update it regularly.

We use Poetry for our own development. If you choose Poetry, all you need to do is:

poetry new <<your project name>>
cd <<your project name>>
poetry add csvpath

If you prefer Pip, install CsvPath Framework with:

pip install csvpath
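If you want to pin the version as suggested above, one way to find the exact version to pin is to ask Python which one is installed. This is a minimal sketch using the standard library's importlib.metadata. The helper name pin_line is our own for illustration and is not part of CsvPath.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_line(package: str):
    """Return a 'name==version' requirement line, or None if the package is not installed."""
    try:
        return f"{package}=={version(package)}"
    except PackageNotFoundError:
        return None


# Prints a pinned requirement line for csvpath, or None if it is not installed yet.
print(pin_line("csvpath"))
```

You could paste the printed line into requirements.txt, then bump it when you update.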

Have a look at the Github project for more details. You can read that site in parallel to this one.

Let's run something!

The worker class in CsvPath is unsurprisingly called CsvPath. For simple validation, it is all you need.

For more complex situations and DataOps automation we use the manager class CsvPaths. But we'll come back to that in later pages. For now just know that it exists, has essentially the same API, and is equally lightweight to use.

To continue with the simplest possible Python, let's do a hello world.

Create a script file and import CsvPath:

from csvpath import CsvPath

Create a test CSV file. Save it as trivial.csv or whatever name you like.
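If you don't have a test file handy, a few lines of Python can create one. The rows below are made-up sample data and are an assumption on our part, not the contents of the original trivial.csv. Any small CSV will do for this example.

```python
import csv

# Hypothetical sample rows: a header line plus three data lines.
rows = [
    ["firstname", "lastname", "say"],
    ["Ada", "Lovelace", "hello"],
    ["Grace", "Hopper", "hi there"],
    ["Alan", "Turing", "good day"],
]

# newline="" lets the csv module control line endings itself.
with open("trivial.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)
```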

Make a csvpath. Also a trivial one, just to keep it simple.

path = """$trivial.csv[*][yes()]"""

This path says:

  • Open trivial.csv

  • Scan all the lines

  • Match every one of them
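To make the three parts listed above concrete, here is a small sketch that splits a csvpath string into its file, scanning, and matching parts with a regular expression. It only illustrates the shape of the statement. It is not how CsvPath itself parses csvpaths, and split_csvpath is a hypothetical helper name.

```python
import re

# $<file>[<scan>][<match>]: root file name, then scanning part, then matching part.
CSVPATH_SHAPE = re.compile(
    r"\$(?P<file>[^\[]*)\[(?P<scan>[^\]]*)\]\[(?P<match>.*)\]",
    re.DOTALL,
)


def split_csvpath(path: str) -> dict:
    """Split a simple csvpath into its three parts. Illustration only."""
    m = CSVPATH_SHAPE.fullmatch(path.strip())
    if m is None:
        raise ValueError(f"not a simple csvpath: {path!r}")
    return m.groupdict()


parts = split_csvpath("""$trivial.csv[*][yes()]""")
print(parts["file"])   # the file to open
print(parts["scan"])   # which lines to scan
print(parts["match"])  # what to match on each scanned line
```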

Here's everything:

from csvpath import CsvPath

path = """$trivial.csv[*][yes()]"""

cp = CsvPath()
cp.fast_forward(path)

if cp.is_valid:
    print("Totally valid!")
else:
    print("Not valid.")

What does this script do?

  • Line 1: imports CsvPath so we can use it

  • Line 3: is our csvpath that we'll use to validate our test file, trivial.csv

  • Line 6: fast-forwards through the CSV file's lines. We could also step through them one by one, if we wanted to.

  • Line 8: checks if we consider the file valid. If the file didn't meet expectations our csvpath would have declared the file invalid using the fail() function.
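Stepping through the lines one by one, as mentioned for line 6, might look like the sketch below. It assumes the parse() and next() methods shown in the project's README, so check the current API before relying on it. The step_through helper and main function are our own names for illustration.

```python
def step_through(cp, path):
    """Parse a csvpath, then yield (index, line) pairs for each matching line."""
    cp.parse(path)
    # Assumption: next() is a generator over the lines that match.
    for i, line in enumerate(cp.next()):
        yield i, line


def main():
    # Imported here so the helper above can be used without csvpath installed.
    from csvpath import CsvPath

    for i, line in step_through(CsvPath(), """$trivial.csv[*][yes()]"""):
        print(f"{i}: {line}")
```

Call main() from your script to print each matching line of trivial.csv as it is read.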

When you run your script you should see something like:

Totally valid!

Hello-world examples are never super impressive on their own. But you are now ready to dig in and see what CsvPath can really do.

The quickest way to bootstrap a CsvPath Framework project is the command line interface (CLI). The CLI is a super simple tool that is great for fast no-code development. To try the CLI, skip over to Your First Validation, The Lazy Way.

Next, try Your First Validation, The Lazy Way. Also check out the How-tos section for more use cases and examples. If you'd like a helping hand, contact us!
