
Doing rewind / replay, part 2

Can we achieve the same goals using only the CLI? Yes!


Last updated 4 months ago

The Python we used to drive rewind/replay in part 1 was not complicated, but the CLI is even quicker and, in a triage situation, might be the better option. So let's try that.

We'll use the same data and csvpaths for this second example; only the use of the CLI is different. Because the activity is the same, we'll move quickly and let you refer back to part 1 for the background information.

Here are our files: two sets of three csvpaths and one data file.

You stage the data and load the csvpaths like this:

Next, run the original csvpaths from sourcemode.csvpaths. This is the version where the source2 csvpath has the word working. Here are the steps:

The result is a bit messy, but if you look closely, those printouts tell us the csvpaths worked. The print() statements in the csvpaths can be removed when you want to show off this trick to your friends.

Running a rewind/replay in the CLI is easy. But remember to reload your csvpaths when you make changes. We have a second csvpaths file that has our changes, so load it now. Use the same named-paths name as you did the first time.

Now we're ready to rewind and replay: we'll run the new csvpaths against the same data that the source1 csvpath generated in the first run. The new csvpaths file differs only in the source2 and source3 csvpaths, so we're going to start with source2:

And there again we see thinking substituted for working.

You can see that rewind/replay based on immutable data is straightforward, whether you use the CLI or Python. If you need to redo any part of your process and you don't want to start at the very beginning, you have an easy way to do that. And because the data is immutable and the processing is idempotent, there is little risk in iterating on a solution. Using the CLI, if you prefer that approach, is just icing on the cake.
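To make the idea concrete, here is a toy model of rewind/replay in plain Python. It is only a sketch and does not use the CsvPath API or CLI: run_steps, the archive dict, and the three lambda "csvpaths" are hypothetical stand-ins, and the sample row is made up. It assumes each step's output is saved and never modified, so a later run can start at source2 by reading what source1 already produced.

```python
from typing import Callable, Dict, List, Optional

Rows = List[List[str]]
Step = Callable[[Rows], Rows]


def run_steps(steps: Dict[str, Step], data: Rows, archive: Dict[str, Rows],
              start_at: Optional[str] = None) -> Rows:
    """Run named steps in order, saving each step's output in the archive.

    If start_at is given, earlier steps are skipped and the starting step
    reads the archived output of the step just before it (the "rewind").
    """
    names = list(steps)
    first = names.index(start_at) if start_at else 0
    if first > 0:
        data = archive[names[first - 1]]
    for name in names[first:]:
        # Pass copies of the rows so previously archived data is never mutated.
        data = steps[name]([row[:] for row in data])
        archive[name] = data
    return data


# Hypothetical stand-ins for the three csvpaths in sourcemode.csvpaths.
original: Dict[str, Step] = {
    "source1": lambda rows: rows[1:],                        # drop the header row
    "source2": lambda rows: [r + ["working"] for r in rows],
    "source3": lambda rows: [[c.upper() for c in r] for r in rows],
}
# The changed set differs only from source2 onward.
changed = dict(original, source2=lambda rows: [r + ["thinking"] for r in rows])

archive: Dict[str, Rows] = {}
people = [["firstname", "lastname"], ["Ada", "Lovelace"]]

first_result = run_steps(original, people, archive)
source1_output = [row[:] for row in archive["source1"]]

# Replay from source2: source1 is not rerun, its saved output is reused.
replayed = run_steps(changed, [], archive, start_at="source2")
```

The replay never touches the original input or the source1 step. That is the property that makes iterating low-risk: the upstream data is fixed, so rerunning the later steps gives the same answer every time for the same csvpaths.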

Here's a quick video to show the steps we took. Try it yourself.

Rewind / replay, part 1
Files for this example:
  • people.csv (183 B)
  • sourcemode.csvpaths (918 B)
  • sourcemode2.csvpaths (924 B)