CsvPath Framework

Last updated 1 month ago

The Framework For Automated Data Preboarding

Your inbound flat files, handled efficiently with automated quality control.

Flat-file preboarding is a data onboarding blind spot, and it is a big deal. If even 1 in 30 companies depends heavily on CSV or Excel data, the lack of delimited-file preboarding is a trillion-dollar problem. In our experience, 1 in 30 is a low estimate.

The open source CsvPath Framework makes robust DataOps at the perimeter easy. CsvPath Framework verifies that CSV, Excel, and other delimited and tabular data files meet expectations and enter the organization in a controlled way, acting as the Trusted Publisher to your data lake and applications. The Framework's approach to edge governance is opinionated, prescriptive, and highly productive.

Delimited data validation is core to the Framework. CsvPath Validation Language is simple, easy to integrate, and flexible enough to handle the unexpected. Inspired by Schematron, XPath, and the Collect, Store, Validate, Publish design pattern, CsvPath Validation Language brings powerful data validation to less structured data. Coming from the world of DDL, XSD, or JSON Schema? Start here.

The CsvPath Framework implements CsvPath Validation Language within a complete Collect, Store, Validate, Publish Pattern that makes data preboarding and trusted publishing faster, more cost-efficient, and more effective. Out of the box, CsvPath Framework fills the blind spot between MFT (managed file transfer) and the data lake with a simple path to provably correct data.

CsvPath Framework can help you build confidence that your organization's data governance doesn't turn a blind eye to your most unruly data. Take a look through these pages and cruise over to the detailed docs on the CsvPath GitHub to see if open source CSV and Excel data preboarding should be part of your DataOps toolkit.

GitHub - csvpath/csvpath
CsvPath - Data Onboarding Simplified.pdf (PDF, 1MB)
Your data lake deserves a data publisher it can trust!
A data flow diagram showing how CSV, Excel and other tabular data come into the organization through a preboarding process that acts as a Trusted Publisher to the data lake and applications.
Logos of the many popular DataOps tools that are integrated with CsvPath Framework: AWS S3, Azure, Slack, Excel, OpenTelemetry, SFTP, CKAN, pandas, OpenLineage, and more
CsvPath has a bunch of built-in integrations. Suggest more!

5 minutes to get the idea


Get started with Edge Governance

Getting Started
Easy dataset publishing to the leading data portal
CsvPath + OpenLineage