The Framework For Automated Data Preboarding

Your inbound flat-files handled efficiently with automated quality.

The open source CsvPath Framework makes robust DataOps at the perimeter easy. CsvPath Framework verifies that CSV, Excel, and other delimited and tabular data files meet expectations and enter the organization in a controlled way, acting as the Trusted Publisher to your data lake and applications. The Framework's approach to edge governance is opinionated, prescriptive, and super productive.

Your data lake deserves a data publisher it can trust!

Delimited data validation is core to the Framework. CsvPath Validation Language is simple, easy to integrate, and flexible enough to handle the unexpected. Inspired by Schematron, XPath, and the Collect, Store, Validate, Publish design pattern, CsvPath Validation Language brings powerful data validation to less structured data. Coming from the world of DDL, XSD, or JSON Schema? Start here.

The CsvPath Framework implements CsvPath Validation Language within a complete Collect, Store, Validate, Publish pattern that makes data preboarding and trusted publishing faster, more cost-efficient, and more effective. Out of the box, CsvPath Framework fills the blind spot between MFT (managed file transfer) and the data lake with a simple path to provably correct data.
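To make the pattern concrete, here is a minimal sketch of the Collect, Store, Validate, Publish flow in plain Python. It uses only the standard library and does not use the CsvPath Framework API or CsvPath Validation Language; the names `EXPECTED_HEADERS`, `validate`, and `preboard`, and the sample order columns, are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical expectations for an inbound file (illustration only,
# not CsvPath Validation Language).
EXPECTED_HEADERS = ["order_id", "sku", "quantity"]


def validate(raw_text):
    """Return a list of (line_number, message) problems found in the text."""
    problems = []
    reader = csv.reader(io.StringIO(raw_text))
    headers = next(reader, None)
    if headers != EXPECTED_HEADERS:
        problems.append((1, f"unexpected headers: {headers}"))
        return problems
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_HEADERS):
            problems.append((line_number, "wrong number of fields"))
        elif not row[2].isdigit():
            problems.append((line_number, "quantity is not a whole number"))
    return problems


def preboard(raw_text, store, published):
    """Run one collected file through Store, Validate, and Publish."""
    store.append(raw_text)          # Store: keep the file exactly as received
    problems = validate(raw_text)   # Validate: check it against expectations
    if not problems:
        published.append(raw_text)  # Publish: only valid data moves downstream
    return problems


if __name__ == "__main__":
    store, published = [], []
    good = "order_id,sku,quantity\n1,A-1,3\n"
    bad = "order_id,sku,quantity\n1,A-1,three\n2,B-2\n"
    print(preboard(good, store, published))
    print(preboard(bad, store, published))
    print(len(store), "stored;", len(published), "published")
```

In this sketch every inbound file is stored, but only files with no validation problems are published, which is the sense in which the preboarding step acts as a trusted publisher to downstream consumers.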

This data onboarding blind spot is a big deal. Think about it. If even 1 in 30 companies depends heavily on CSV or Excel data, the lack of delimited file preboarding is a trillion-dollar problem. In our experience, 1 in 30 would be a low estimate.

A data flow diagram showing how CSV, Excel and other tabular data come into the organization through a preboarding process that acts as a Trusted Publisher to the data lake and applications.

CsvPath Framework can help you build confidence that your organization's data governance doesn't turn a blind eye to your most unruly data. Take a look through these pages and cruise over to the detailed docs on the CsvPath GitHub to see if open source CSV and Excel data preboarding should be part of your DataOps toolkit.

Logos of the many popular DataOps tools that are integrated with CsvPath Framework: AWS S3, Azure, Slack, Excel, OpenTelemetry, SFTP, CKAN, pandas, OpenLineage, and more
CsvPath has a bunch of built-in integrations. Suggest more!
