CsvPath Framework

Publishing


Last updated 2 months ago

CsvPath Framework is the Trusted Publisher

CsvPath Framework is the internal trusted publisher for your data lake, streaming infrastructure, data warehouse, applications, and/or other analytical processing.

What does that mean?

It means that data published by CsvPath is either known-good or known-bad according to an organizationally defined standard. The data lake receives data at a guaranteed minimum level of trustworthiness. The applications receive data that has passed quality, business, and governance checks, so they don't have to enforce those checks themselves. The analytics team can go back to an immutable source with clear, detailed provenance any time their results come into question. Data trust is efficient and transitive: if the data lake can trust its data source, the downstream data consumers can trust the data lake.

What does publishing mean?

The Trusted Publisher can provide data in a push or a pull mode. In push mode, CsvPath Framework sends data to the data lake or other data consumers by SFTP transfer, by filesystem transfer, or by pointing the Archive backend to a bucket or other location. In pull mode, the downstream consumers reach into the Archive, wherever you configure it to be, to access the data they need. Both work.
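
To make the pull mode concrete, here is a minimal sketch of a downstream consumer that reads from an Archive on a local filesystem and copies only data from runs marked valid. It does not use the CsvPath API. The directory layout, the file names `manifest.json` and `data.csv`, the `valid` field, and the function `pull_trusted_files` are all assumptions made for illustration; check them against your own Archive before relying on anything like this.

```python
import json
import shutil
from pathlib import Path


def pull_trusted_files(archive_root, destination,
                       manifest_name="manifest.json", data_name="data.csv"):
    """Copy data files out of runs whose manifest is marked valid.

    Hypothetical layout: each run directory under archive_root holds a
    manifest file with a boolean "valid" field next to its data file.
    """
    archive_root = Path(archive_root)
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)

    copied = []
    for manifest_path in sorted(archive_root.rglob(manifest_name)):
        manifest = json.loads(manifest_path.read_text())
        # Skip anything that is not a dict or is not explicitly valid.
        if not isinstance(manifest, dict) or not manifest.get("valid", False):
            continue
        data_path = manifest_path.parent / data_name
        if not data_path.exists():
            continue
        # Flatten the run's relative path into the target file name so
        # files from different runs do not overwrite each other.
        rel = manifest_path.parent.relative_to(archive_root)
        target = destination / ("_".join(rel.parts) + "_" + data_name)
        shutil.copy2(data_path, target)
        copied.append(target)
    return copied
```

A consumer would call something like `pull_trusted_files("archive", "lake/inbound")` on a schedule. The Archive itself is only read, never modified, which fits the idea of an immutable source that the analytics team can return to later.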

From a data operations perspective, the Trusted Publisher can sit on the same infrastructure as the data lake, so that the CsvPath Framework backend is essentially inside the data lake, or it can stand separately as an upstream data source. Since we would expect CsvPath Framework to run in an automated, highly controlled way, either approach works fine. The choice depends on how your team and its governance activities are organized.