CsvPath Framework
  • CsvPath
  • DATA PREBOARDING
  • Getting Started
    • Quickstart
    • Organizing Inbound Data
      • Dataflow Diagram
      • The Three Data Spaces
        • Source Staging
        • Validation Assets
        • Trusted Publishing
      • How Data Progresses Through CsvPath Framework
        • Staging
          • Data Identity
          • Handling Variability
            • Templates
            • Named-file Reference Queries
          • Registration API and CLI
            • Loading
            • Going CLI-only
        • Validation and Upgrading
          • Templates
          • Run Using the API
          • Running In the CLI
          • Named-paths Reference Queries
        • Publishing
          • Inspect Run Results
            • Result API
            • More Templates and References
          • Export Data and Metadata
    • Csv and Excel Validation
      • Your First Validation, The Lazy Way
      • Your First Validation, The Easy Way
      • Your First Validation, The Hard Way
    • DataOps Integrations
      • Getting Started with CsvPath + OpenTelemetry
      • Getting Started With CsvPath + OpenLineage
      • Getting Started with CsvPath + SFTPPlus
        • SFTPPlus Implementation Checklist
      • Getting Started with CsvPath + CKAN
    • How-tos
      • How-to videos
      • Storage backend how-tos
        • Store source data and/or named-paths and/or the archive in AWS S3
        • Loading files from S3, SFTP, or Azure
        • Add a file by https
        • Store source data and/or named-paths and/or the archive in Azure
        • Store source data and/or named-paths and/or the archive in Google Cloud Storage
      • CsvPath in AWS Lambda
      • Call a webhook at the end of a run
      • Setup notifications to Slack
      • Send run events to Sqlite
      • Execute a script at the end of a run
      • Send events to MySQL or Postgres
      • Sending results by SFTP
      • Another (longer) Example
        • Another Example, Part 1
        • Another Example, Part 2
      • Working with error messages
      • Sending results to CKAN
      • Transfer a file out of CsvPath
      • File references and rewind/replay how-tos
        • Replay Using References
        • Doing rewind / replay, part 1
        • Doing rewind / replay, part 2
        • Referring to named-file versions
      • Config Setup
      • Debugging Your CsvPaths
      • Creating a derived file
      • Run CsvPath on Jenkins
    • A Helping Hand
  • Topics
    • The CLI
    • High-level Topics
      • Why CsvPath?
      • CsvPath Use Cases
      • Paths To Production
      • Solution Storming
    • Validation
      • Schemas Or Rules?
      • Well-formed, Valid, Canonical, and Correct
      • Validation Strategies
    • Python
      • Python vs. CsvPath
      • Python Starters
    • Product Comparisons
      • The Data Preboarding Comparison Worksheet
    • Data, Validation Files, and Storage
      • Named Files and Paths
      • Where Do I Find Results?
      • Storage Backends
      • File Management
    • Language Basics
    • A CsvPath Cheatsheet
    • The Collect, Store, Validate Pattern
    • The Modes
    • The Reference Data Types
    • Manifests and Metadata
    • Serial Or Breadth-first Runs?
    • Namespacing With the Archive
    • Glossary
  • Privacy Policy
Powered by GitBook
On this page
  1. Getting Started
  2. How-tos
  3. Storage backend how-tos

Loading files from S3, SFTP, or Azure

PreviousStore source data and/or named-paths and/or the archive in AWS S3NextAdd a file by https

Last updated 3 months ago

Say you want to add a file currently sitting in S3 to CsvPath Framework as a named-file. That is to say, you want to give an easy name to a physical file and move it into CsvPath's inputs location from the bucket where it currently is in S3.

CsvPath Framework stores incoming files in the named-files directory tree. This is your data file registry. The location of the named-files directory is set in config/config.ini's [inputs] section under the files key. The named-files directories can live in any of the CsvPath storage backends. At this time those are:

  • The local filesystem

  • S3

  • SFTP

  • Azure Blob Storage

The file you want to register is also in one of these storage systems. How can you get it loaded into named-files? Easy, just use:

  • A relative or absolute path to the file in the local system, or

  • A URL like s3://bucket/name, or

  • An sftp://server:port/path/to/my/file URL, or

  • A URL like azure://container/key

In the case of S3 and SFTP you also need credentials. That will require one of these options:

  • For AWS, add an AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY pair to the environment

  • For SFTP, credentials are set in the [sftp] section's username and password keys. Use all caps to reference environment variables.

That takes care of the left-hand side of this picture. As you may already know, the right-hand side is setup simply using the [inputs] section and the flies and csvpaths keys. You can .

read more about that here
This is just an example. You can mix and match storage backends as you like.
Moving csv or excel files into CsvPath Framework from S3 to SFTP