
Paths To Production

There are many paths. What's the simplest thing that could possibly work?


Last updated 5 months ago

As you saw in Your First Validation and Another Example Part 2, there are two ways to use the CsvPath library:

  • Single csvpath pointing to a single file, simple results

  • Multi-csvpath

Both can work in a production setting. The latter is more flexible and offers more capabilities with little additional complexity. We'll focus on the multi-csvpath option.

The path to production starts with just getting the simplest possible thing working—a proof of concept. We can quickly sketch that out.

As our starting point, here is a high-level view of everything you need to know about the library. (We're setting aside how to write csvpaths using the built-in functions, for now.)

Here's what we think the simplest-possible thing might be, at least for some companies. (Apologies to non-AWS users.) Could you do something simpler? Probably. Is reality more complicated? For sure!

Your Lambda, where CsvPath runs, is pretty simple. (Remember, this is just a sketch to give you ideas.) In mostly accurate pseudo-code, it might look like:
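Here is one hedged way to sketch that Lambda in Python. It assumes the trigger is a standard S3 put notification. The names `inbound` and `rules`, the `rules.csvpaths` file, and the `validate` parameter are hypothetical, and the CsvPaths manager calls inside `run_csvpaths` are assumptions to verify against the version of the library you install. Reading directly from an `s3://` URI also assumes your CsvPath config and Lambda role allow it.

```python
import urllib.parse
from typing import Callable, Dict, List


def s3_objects(event: dict) -> List[str]:
    """Return an s3:// URI for every object named in an S3 event notification."""
    uris = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (a space arrives as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        uris.append(f"s3://{bucket}/{key}")
    return uris


def run_csvpaths(file_uri: str) -> bool:
    """Validate one file. The CsvPath calls below are assumed, not confirmed here."""
    # Imported lazily so the event handling can be tested without the library.
    from csvpath import CsvPaths

    paths = CsvPaths()
    # Register the arriving file and the validation rules under hypothetical names.
    paths.file_manager.add_named_file(name="inbound", path=file_uri)
    paths.paths_manager.add_named_paths_from_file(
        name="rules", file_path="rules.csvpaths"
    )
    # Run every csvpath in the "rules" group against the "inbound" file.
    paths.collect_paths(filename="inbound", pathsname="rules")
    return paths.results_manager.is_valid("rules")


def handler(event, context, validate: Callable[[str], bool] = run_csvpaths) -> Dict[str, bool]:
    """Lambda entry point: maps each arriving file's URI to a pass/fail flag."""
    return {uri: validate(uri) for uri in s3_objects(event)}
```

Lambda calls `handler(event, context)`, so the default `validate` is what runs in production. The extra parameter is only there so the event parsing can be exercised locally with a stand-in validator.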

Adding a simple email giving validation results using SES is an easy add-on. The Web is littered with SES email examples.
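As a rough illustration of that add-on, the sketch below builds a plain-text summary and hands it to the boto3 SES `send_email` call. The function names, the `{file: is_valid}` result shape, and the default region are assumptions made for this example. SES also requires a verified sender identity, which is not shown.

```python
from typing import Dict, List


def build_ses_message(sender: str, recipients: List[str], results: Dict[str, bool]) -> dict:
    """Build the keyword arguments for SES send_email from {file: is_valid}."""
    failed = [name for name, ok in results.items() if not ok]
    if failed:
        subject = f"CSV validation failed for {len(failed)} file(s)"
    else:
        subject = "CSV validation passed"
    # One line per file, sorted by name so the email is stable between runs.
    lines = [f"{'OK' if ok else 'FAIL'} {name}" for name, ok in sorted(results.items())]
    return {
        "Source": sender,
        "Destination": {"ToAddresses": recipients},
        "Message": {
            "Subject": {"Data": subject},
            "Body": {"Text": {"Data": "\n".join(lines)}},
        },
    }


def send_results(sender: str, recipients: List[str], results: Dict[str, bool],
                 region: str = "us-east-1") -> dict:
    """Send the summary through SES; the region default is only a placeholder."""
    import boto3  # imported lazily; present in the AWS Lambda Python runtime

    ses = boto3.client("ses", region_name=region)
    return ses.send_email(**build_ses_message(sender, recipients, results))
```

Keeping the message construction separate from the send makes the summary format easy to check without touching AWS.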

And that's pretty much all of it. Obviously, some assembly required. As they say, an exercise for the reader.

Our goal is to create an automated CSV validation capability. We're not actually picking the technologies that are right for your specific situation, obviously! Writing the minimal implementation code for this is an exercise for you and Claude. A POC like what we are sketching should be pretty quick.

Imagine your CSV files are arriving by SFTP. You need a landing zone for your CSV files and csvpaths. Let's say you use AWS Transfer Family fronting S3. When a file arrives, it triggers a Lambda that runs your csvpath validation. This scenario is similar to this tutorial.

And we can repurpose the multi-csvpath file from Another Example, Part 2 to stand in for your inbound file validation rules.

Likewise, pushing metadata or lines from the Results object into a database would be a straightforward database insert, at least if we're sticking with the simplest-thing-that-could-possibly-work ethic.
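To make the database idea concrete, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for whatever database you actually run. The `runs` and `run_lines` tables and the `store_run` function are hypothetical. How you pull the validity flag and lines out of your run results is left out, so the sketch simply takes them as arguments.

```python
import json
import sqlite3
from typing import Iterable, Sequence


def store_run(conn: sqlite3.Connection, run_id: str, file_name: str,
              valid: bool, lines: Iterable[Sequence[str]]) -> int:
    """Insert one run's metadata and its collected lines; return the line count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs "
        "(run_id TEXT PRIMARY KEY, file_name TEXT, valid INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS run_lines "
        "(run_id TEXT, line_no INTEGER, line TEXT)"
    )
    # Store each line as JSON so commas inside values survive the round trip.
    rows = [(run_id, i, json.dumps(list(line))) for i, line in enumerate(lines)]
    with conn:  # commits both inserts together, or rolls back on error
        conn.execute(
            "INSERT INTO runs VALUES (?, ?, ?)", (run_id, file_name, int(valid))
        )
        conn.executemany("INSERT INTO run_lines VALUES (?, ?, ?)", rows)
    return len(rows)
```

For a POC this could be called at the end of the Lambda handler with the values from a finished run. Swapping in MySQL or Postgres would mostly mean changing the connection and the parameter placeholder style.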
