DATA PREBOARDING

Why we all need to care about data preboarding and the trusted publisher model

Last updated 2 months ago

CsvPath is the leading tool for automated data preboarding. It is a purpose-built, open source Python framework that integrates with a wide variety of popular DataOps tools and acts as a trusted publisher between managed file transfer (MFT) and the data lake and applications.

What is Data Preboarding?

Data preboarding is the receiving process for external batch data. It is the first part of a robust data onboarding process. Preboarding assigns a durable identity, validates that the data meets expectations, upgrades it for productivity, and stages it in an immutable known-good archive for downstream consumers. Your data lake deserves a data publisher it can trust! Once data is preboarded it is no longer considered external.
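The steps named above (identify, validate, stage) can be sketched in a few lines of plain Python. This is a minimal illustration using only the standard library; it is not CsvPath's API. The `preboard` function, the `EXPECTED_HEADERS` schema, and the archive layout are all hypothetical names chosen for the example, and the "upgrade" step is left out for brevity.

```python
import csv
import hashlib
import io
import shutil
import stat
from pathlib import Path

# Hypothetical schema for an inbound orders file (an assumption for this sketch).
EXPECTED_HEADERS = ["order_id", "sku", "quantity"]


def preboard(source: Path, archive_root: Path) -> Path:
    """Identify, validate, and stage one delimited file; return the staged path."""
    raw = source.read_bytes()

    # 1. Durable identity: a content hash that survives renames and re-sends.
    identity = hashlib.sha256(raw).hexdigest()

    # 2. Validation: check the header row, then one simple rule per data row.
    rows = list(csv.reader(io.StringIO(raw.decode("utf-8"))))
    if not rows or rows[0] != EXPECTED_HEADERS:
        raise ValueError(f"unexpected headers in {source.name}")
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADERS) or not row[2].isdigit():
            raise ValueError(f"line {line_no} failed validation")

    # 3. Staging: copy into a directory keyed by identity, then mark it read-only
    #    so downstream consumers see a known-good copy that is not edited in place.
    staged_dir = archive_root / identity
    staged_dir.mkdir(parents=True, exist_ok=True)
    staged = staged_dir / source.name
    if not staged.exists():
        shutil.copyfile(source, staged)
        staged.chmod(stat.S_IRUSR | stat.S_IRGRP)
    return staged
```

A file that fails validation raises before anything is staged, so only data that met expectations reaches the archive. A real preboarding process would also record metadata about what was checked and why a file was rejected.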

Data preboarding may be a new term to you, or not; either way it is not a new concept. All data is preboarded on its way into the organization. The question is, how well does your onboarding process work? The experience of most companies is that the process is less reliable, holds more risk, and is much more expensive than is comfortable. Manual and error-prone preboarding commonly diverts more than 2% of revenue to overhead. That's north of $20,000 per million, or more than $20 million per billion, in revenue. That adds up!

How does the CsvPath Framework help?

CsvPath is a drop-in replacement for rickety data landing zones. It is laser-focused on automated data preboarding. The Framework makes the overall onboarding process efficient, fast, and safe by generating trustworthy data, and it does so in a way that scales operationally to any number of data partners. A company with one data partner needs effective preboarding. A company with a thousand data partners needs efficient preboarding that never fails. CsvPath Framework can help!

CsvPath brings many capabilities to the table:

An opinionated framework for collecting, identifying, validating and publishing data that enables you to spin up a new data partner project literally in seconds
Powerful schema and rules-based validation that has never before been available for delimited data
Explainability-focused metadata production that gives you the power to know exactly what happened as your data evolved
Out-of-the-box integrations for lineage tracking, observability, MFT (managed file transfer), and more

With CsvPath Framework you are signing up for a well-known pattern that settles the architecture and design questions up front, leaving your team focused on data quality and accountability. And with CsvPath's automation-forward approach, you can scale down manual data quality efforts and scale up data throughput.

How to get started

Data preboarding is everywhere. And yet it is dramatically undertooled. We're on a mission to upgrade preboarding and make CsvPath Framework the world's trusted publisher. Welcome aboard!
