Sending results to CKAN

CsvPath is integrated with CKAN, the leading open source data portal.


Last updated 5 months ago

CKAN is a portal purpose-built for data discovery and distribution. It is backed by the Open Knowledge Foundation and used by large-scale data publishers, from the US federal government's data.gov to LEGO.

CKAN is integrated with CsvPath through the event listener mechanism. When named-paths groups run, CKAN is notified and receives content. The integration is standard and requires only two minor changes to config.ini to activate.

See this page with more step-by-step guidance on getting started with CKAN.

Set up the link to CKAN

Open config/config.ini (or wherever your config file is). We have two changes to make:

  • Enable the listener

  • Add the server details

Look for the [listeners] section in config.ini. First, make sure the ckan.results key has the name of the CKAN listener class. Second, add ckan to the list in the groups key. If other groups are already enabled, put ckan at the end, after a comma. Your file should look like this:
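As a minimal sketch, the two [listeners] changes can be checked with Python's standard configparser. The sample text, the add_group and check_listeners helpers, and the other enabled group (slack) are illustrative assumptions, and the ckan.results value is a placeholder rather than the real listener class name. Whether CsvPath accepts spaces around the comma is not stated here, so the sketch writes none.

```python
import configparser

# Hypothetical sample of the [listeners] section. The ckan.results value is a
# placeholder: use the listener class name given in the CsvPath documentation.
SAMPLE = """
[listeners]
groups = slack
ckan.results = <name of the CKAN listener class>
"""


def add_group(groups_value: str, name: str = "ckan") -> str:
    """Append name to a comma-separated groups value unless it is already listed."""
    groups = [g.strip() for g in groups_value.split(",") if g.strip()]
    if name not in groups:
        groups.append(name)
    return ",".join(groups)


def check_listeners(ini_text: str) -> list[str]:
    """Return a list of problems found in the [listeners] section."""
    # interpolation=None keeps configparser from treating % in values specially.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(ini_text)
    if not config.has_section("listeners"):
        return ["missing [listeners] section"]
    listeners = config["listeners"]
    problems = []
    if not listeners.get("ckan.results", "").strip():
        problems.append("ckan.results is not set")
    groups = [g.strip() for g in listeners.get("groups", "").split(",")]
    if "ckan" not in groups:
        problems.append("ckan is not in groups")
    return problems


if __name__ == "__main__":
    print(check_listeners(SAMPLE))  # the sample has not enabled ckan yet
    print(add_group("slack"))       # the value to write back to the groups key
```

In the sample, groups still needs the value that add_group returns before the listener is enabled.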

Next, make sure you have a [ckan] section. It should have two keys:

  • server

  • api_token

The server key takes the URL of your CKAN instance. If you're just trying out CKAN, you may be running CsvPath on the CKAN server or, more likely, on the same machine that runs the CKAN Docker containers. In that case, use http://localhost:80.

The api_token key takes your CKAN API token. Log into CKAN and open your profile (see the link at the top right of every page). You should see three tabs. Click the API tokens tab and create a token. Paste it into your config.ini as the value of the api_token key. Your config.ini should look like this:
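A rough sketch of reading the [ckan] section back and preparing a connectivity check follows. The sample text and the load_ckan_settings and build_status_request helpers are hypothetical, and the token shown is a placeholder. The sketch assumes CKAN's action API exposes status_show at /api/3/action/status_show and that API tokens travel in the Authorization header. It only builds the request and sends nothing.

```python
import configparser
import urllib.request

# Hypothetical sample of the [ckan] section; the token is a placeholder.
SAMPLE = """
[ckan]
server = http://localhost:80
api_token = <paste your CKAN API token here>
"""


def load_ckan_settings(ini_text: str) -> tuple[str, str]:
    """Return (server, api_token) or raise ValueError if either is missing."""
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(ini_text)
    if not config.has_section("ckan"):
        raise ValueError("config has no [ckan] section")
    section = config["ckan"]
    missing = [k for k in ("server", "api_token") if not section.get(k, "").strip()]
    if missing:
        raise ValueError("missing keys in [ckan]: " + ", ".join(missing))
    # Drop a trailing slash so paths can be appended cleanly.
    return section["server"].strip().rstrip("/"), section["api_token"].strip()


def build_status_request(server: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a request to CKAN's status_show action."""
    return urllib.request.Request(
        f"{server}/api/3/action/status_show",
        headers={"Authorization": api_token},
    )


if __name__ == "__main__":
    server, token = load_ckan_settings(SAMPLE)
    request = build_status_request(server, token)
    print(request.full_url)
```

Passing the request to urllib.request.urlopen would confirm that the server is reachable from the machine running CsvPath.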

The directives

You're all integrated! Time to see what you can do with CKAN.

Publishing datasets to CKAN requires that you add directives to your csvpaths. The directives go in an external comment, meaning a comment that sits outside the csvpath, above or below it. You can add as many CKAN directives, mode settings, user-defined metadata fields, etc. as you like in your comments.

A set of directives might look like:
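As a purely hypothetical illustration, the sketch below embeds a few directives in a csvpath string and pulls them back out with a regular expression. It assumes that an external comment is delimited by ~ characters and that each directive is a name: value pair on its own line, so check the CsvPath documentation for the exact syntax. The csvpath text, the id value, the regular expressions, and the find_directives helper are illustrative and are not part of the CsvPath API.

```python
import re

# Hypothetical csvpath with an external comment above it. The directive names
# and values come from the table on this page; the layout is an assumption.
CSVPATH = """~
    id: orders-check
    ckan-publish: on-valid
    ckan-group: use-archive
    ckan-dataset-name: use-instance
    ckan-visibility: public
    ckan-tags: orders,daily
    ckan-send: data
~
$[*][ print("hello") ]
"""


def find_directives(csvpath_text: str) -> dict[str, str]:
    """Collect ckan-* directives from the first ~...~ comment (illustration only)."""
    comment = re.search(r"~(.*?)~", csvpath_text, re.DOTALL)
    if comment is None:
        return {}
    pairs = re.findall(
        r"^\s*(ckan-[a-z-]+)\s*:\s*(.+?)\s*$",
        comment.group(1),
        re.MULTILINE,
    )
    return dict(pairs)


if __name__ == "__main__":
    for name, value in find_directives(CSVPATH).items():
        print(f"{name} = {value}")
```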

Let's go through what these directives mean.

Directive
Values
Explanation

ckan-publish

always | never | on-valid | on-all-valid

These have the meaning you would expect. The default is that csvpath's results are not published to CKAN.

on-all-valid means that all csvpaths in the named-paths group must be determined to be valid.

Remember that is_valid is True by default. A csvpath that has an internally defined validation failure (e.g. treating "five" as an integer) may be marked invalid, depending on your configuration, without you having to use fail() explicitly.

ckan-group

use-archive | use-named-results | any name

Results will be associated with a group if this directive is used. If the indicated group doesn't exist it will be created.

use-archive means that the name in config.ini under the archive key is used to identify the group.

use-named-results makes the group name the same as the named-paths group's name. Remember that named-paths group names and their named-results names are the same.

Any name means any user-friendly word or words you like.

ckan-dataset-name

use-instance | use-named-results | var-value:name | any name

use-instance means use the identity property of the csvpath. The identity of a csvpath is the name or id field in its metadata (so set in an external comment) or its zero-based index in the run.

use-named-results means the dataset should have the same name as the named-paths group.

var-value points to a variable, the word after the colon. The value of the variable will be used for the name.

Alternatively, any word or words.

ckan-dataset-title

var-value:name | any title

var-value points to a variable, the word after the colon. The value of the variable will be used for the title. E.g. var-value:city would indicate that the dataset's title would be something like "New York City".

Alternatively, any user-friendly title.

ckan-visibility

public

The default is private. Any word other than public, or no setting, results in the dataset being marked private.

ckan-tags

Any alphanumeric words or numbers, optionally joined by -, _, or . characters. Separate multiple tags with commas.

CKAN's tags are super useful for grouping assets for management and discovery. These are assumed to be simple free-form tags, not CKAN's vocabulary-controlled tags.

ckan-show-fields

Names of metadata fields separated by commas. See any meta.json for the available fields.

This setting allows you to include any field from meta.json as a named-value in the dataset's main view.

Recall that meta.json contains all the metadata (settings, user-defined fields, and ad hoc comments or documentation) and runtime data (line_number, number of matches, current headers, etc.).

ckan-send

data | printouts | unmatched | errors | meta | vars | manifest

The names of the standard files you want to send to CKAN, minus their extensions.

Note that transfers and Jinja files are not options. Also note that you do not need to call out individual printers.

ckan-split-printouts

split

A csvpath may have any number of Printer instances for various purposes.

Each printer prints to its own print stream (which may or may not include standard out on the console). All the printers' captured printout lines are added to the same printouts.txt with a separator marking each contribution.

The CKAN integration can split the different printers' print output into separate files before sending them to CKAN. If you do this, each file will be named by the name of its printer.

This feature can be useful for keeping different kinds of reports. E.g. you could let the built-in validations go to the default printer, but send your own business rule validations to a separate printer.

ckan-printouts-title

Any title

Any title for the default printer file

ckan-data-title

Any title

Any title

ckan-unmatched-title

Any title

Any title

ckan-vars-title

Any title

Any title

ckan-meta-title

Any title

Any title

ckan-errors-title

Any title

Any title
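The ckan-publish and ckan-visibility rows above describe simple decision rules. A minimal sketch of those rules, as read from the table, might look like the following. The function names are hypothetical, and treating an unrecognized ckan-publish value as "do not publish" is an assumption of this sketch rather than documented behavior.

```python
from typing import Optional


def should_publish(mode: Optional[str], this_valid: bool, all_valid: bool) -> bool:
    """Decide whether a csvpath's results go to CKAN.

    this_valid: whether this csvpath was determined to be valid.
    all_valid:  whether every csvpath in the named-paths group was valid.
    """
    if mode == "always":
        return True
    if mode == "on-valid":
        return this_valid
    if mode == "on-all-valid":
        return all_valid
    # "never", no directive, or (by assumption) anything else: not published.
    return False


def dataset_visibility(value: Optional[str] = None) -> str:
    """Only the word public makes a dataset public; everything else is private."""
    return "public" if (value or "").strip() == "public" else "private"


if __name__ == "__main__":
    print(should_publish("on-all-valid", this_valid=True, all_valid=False))
    print(dataset_visibility("internal"))
```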

In general, user-friendly titles may include upper and lower case letters, spaces, and common punctuation. Most names, URL slugs, IDs, tags, etc. may contain only alphanumeric characters, ., _, and -, and will be lowercased.
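To preview what a title might look like once it is reduced to a name, a small sketch can apply the character rule above. The to_ckan_name helper is hypothetical. How the integration actually treats disallowed characters is not stated here, so replacing each run of them with a single - is an assumption.

```python
import re


def to_ckan_name(text: str) -> str:
    """Lowercase text and keep only alphanumerics, '.', '_' and '-'.

    Assumption: each run of disallowed characters becomes a single '-',
    and leading or trailing '-' characters are dropped.
    """
    name = text.strip().lower()
    name = re.sub(r"[^a-z0-9._-]+", "-", name)
    return name.strip("-")


if __name__ == "__main__":
    print(to_ckan_name("New York City"))
    print(to_ckan_name("Orders 2024_v1.csv"))
```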
