Sending results to CKAN

CsvPath is integrated with CKAN, the leading open source data portal.

CKAN is a portal purpose-built for data discovery and distribution. It is backed by the Open Knowledge Foundation and used by large-scale data publishers, from the US federal government's data.gov to LEGO.

CKAN is integrated with CsvPath through the event listener mechanism. When named-paths groups run, CKAN is notified and receives content. The integration is standard and requires only two minor changes to config.ini to activate.

See this page for more step-by-step guidance on getting started with CKAN.

Open config/config.ini (or wherever your config file is). We have two changes to make:

  • Enable the listener

  • Add the server details

Look for the [listeners] section in config.ini. First, make sure the ckan.results key is set to the name of the CKAN listener class. Second, add ckan to the list in the groups key. If other groups are already enabled, put ckan at the end, after a comma.
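As a sketch of the two [listeners] edits, the Python below applies them with the standard configparser module. The listener class string is a hypothetical placeholder, not the real class name; take the actual value from the CsvPath documentation.

```python
import configparser

# Hypothetical placeholder: substitute the CKAN listener class named in the CsvPath docs.
CKAN_LISTENER = "<CKAN listener class>"


def enable_ckan_listener(config: configparser.ConfigParser, listener: str) -> None:
    """Point ckan.results at the listener and append ckan to the groups list."""
    if not config.has_section("listeners"):
        config.add_section("listeners")
    section = config["listeners"]
    section["ckan.results"] = listener
    # groups is a comma-separated list; keep existing groups and add ckan last
    groups = [g.strip() for g in section.get("groups", "").split(",") if g.strip()]
    if "ckan" not in groups:
        groups.append("ckan")
    section["groups"] = ", ".join(groups)


config = configparser.ConfigParser(interpolation=None)
config.read_string("[listeners]\ngroups = mygroup\n")  # "mygroup" is a made-up example
enable_ckan_listener(config, CKAN_LISTENER)
print(config["listeners"]["groups"])  # mygroup, ckan
```

configparser drops comments when it writes a file back out, so editing config.ini by hand is usually simpler; the sketch just makes the two rules explicit.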

Next, make sure you have a [ckan] section. It should have two keys:

  • server

  • api_token

The server key takes the URL of your CKAN instance. If you're just trying out CKAN, you are most likely running CsvPath on the same machine as the CKAN Docker containers (or possibly on the CKAN server itself). In that case, use http://localhost:80.

The api_token key takes your CKAN API token. Log into CKAN and open your profile (see the link at the top right of every page). You should see three tabs. Click the tab for API tokens and create a token. Paste its value into config.ini as the api_token value.
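A quick way to sanity-check the [ckan] section before running anything is sketched below. The function is hypothetical, not part of CsvPath; it only confirms that server looks like an http(s) URL and that api_token is not empty.

```python
import configparser
from urllib.parse import urlparse


def check_ckan_section(config: configparser.ConfigParser) -> list[str]:
    """Return a list of problems with the [ckan] section (empty if it looks fine)."""
    if not config.has_section("ckan"):
        return ["missing [ckan] section"]
    problems = []
    server = config["ckan"].get("server", "").strip()
    parsed = urlparse(server)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append(f"server is not an http(s) URL: {server!r}")
    if not config["ckan"].get("api_token", "").strip():
        problems.append("api_token is empty")
    return problems


# interpolation=None so a '%' in a value is read literally
config = configparser.ConfigParser(interpolation=None)
config.read_string("""
[ckan]
server = http://localhost:80
api_token = <paste your CKAN API token here>
""")
print(check_ckan_section(config))  # []
```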

The directives

You're all integrated! Time to see what you can do with CKAN.

Publishing datasets to CKAN requires that you add directives to your csvpaths. Directives go in an external comment: a comment outside the csvpath, either above or below it. Your comments can hold as many CKAN directives, mode settings, user-defined metadata fields, etc. as you like.

A csvpath can carry a set of these directives in its external comment, alongside any other metadata.
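For illustration, here is a hypothetical csvpath whose external comment carries several CKAN directives, held in a Python string, plus a small parser that pulls out the ckan- keys. The tilde-delimited comment with key: value metadata is assumed here, and the match part is only a placeholder; check the CsvPath comment documentation for the exact syntax. The parser is a simplified sketch, not how CsvPath itself reads metadata.

```python
import re

# Hypothetical csvpath: an external comment (between ~ characters) above the path.
# The directive values are illustrative only.
CSVPATH = """
~ id: orders-check
  ckan-publish: on-valid
  ckan-group: use-named-results
  ckan-dataset-name: use-instance
  ckan-visibility: public
  ckan-tags: orders, q3-2024
  ckan-send: data, printouts, errors
~
$[*][ yes() ]
"""

# Matches "ckan-<name>: <value>" lines inside the comment.
DIRECTIVE = re.compile(r"^\s*(ckan-[a-z-]+):\s*(.+?)\s*$", re.MULTILINE)


def extract_ckan_directives(text: str) -> dict[str, str]:
    """Collect ckan-* directives from the first ~...~ comment (simplified sketch)."""
    comment = text.split("~")[1]  # text between the first pair of tildes
    return {key: value for key, value in DIRECTIVE.findall(comment)}


print(extract_ckan_directives(CSVPATH)["ckan-send"])  # data, printouts, errors
```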

Let's go through what these directives mean.

Each directive below is followed by its allowed values and an explanation.

ckan-publish

Values: always | never | on-valid | on-all-valid

always and never mean what they say. on-valid publishes a csvpath's results only if that csvpath is valid. If there is no ckan-publish directive, results are not published to CKAN.

on-all-valid means that all csvpaths in the named-paths group must be determined to be valid.

Remember that is_valid is True by default. A csvpath that hits a built-in validation failure (e.g. treating "five" as an integer) may be marked invalid, depending on your configuration, without you having to call fail() explicitly.
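The publishing rules can be summarized as a small decision function. This is a hypothetical sketch of the logic described above, not CsvPath's implementation; treating an unrecognized value like never is an assumption.

```python
from typing import Optional, Sequence


def should_publish(directive: Optional[str], this_valid: bool,
                   group_valid: Sequence[bool]) -> bool:
    """Hypothetical sketch of the ckan-publish rules for one csvpath."""
    if directive == "always":
        return True
    if directive == "on-valid":
        return this_valid  # only this csvpath has to be valid
    if directive == "on-all-valid":
        return all(group_valid)  # every csvpath in the named-paths group must be valid
    # "never", no directive (the default), or an unrecognized value: don't publish
    return False


print(should_publish("on-all-valid", True, [True, False]))  # False
```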

ckan-group

Values: use-archive | use-named-results | any name

Results will be associated with a group if this directive is used. If the indicated group doesn't exist it will be created.

use-archive means that the name in config.ini under the archive key is used to identify the group.

use-named-results makes the group name the same as the named-paths group's name. Remember that named-paths group names and their named-results names are the same.

Any name means any user-friendly word or words you like.
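A sketch of how the three ckan-group options could map to a group name. It assumes the archive key sits in the [results] section of config.ini; the function and parameter names are hypothetical.

```python
import configparser


def resolve_group_name(value: str, config: configparser.ConfigParser,
                       named_paths_name: str) -> str:
    """Map a ckan-group directive value to a CKAN group name (hypothetical sketch)."""
    if value == "use-archive":
        # Assumption: the archive name is the archive key in the [results] section.
        return config.get("results", "archive")
    if value == "use-named-results":
        # Named-results names match their named-paths group names.
        return named_paths_name
    return value  # any other value is used as the group name as written


config = configparser.ConfigParser(interpolation=None)
config.read_string("[results]\narchive = demo_archive\n")
print(resolve_group_name("use-archive", config, "orders"))  # demo_archive
```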

ckan-dataset-name

Values: use-instance | use-named-results | var-value:name | any name

use-instance means use the identity property of the csvpath. A csvpath's identity is the name or id field in its metadata (set in an external comment) or, failing that, its zero-based index in the run.

use-named-results means the dataset should have the same name as the named-paths group.

var-value points to a variable: the word after the colon is the variable's name. That variable's value is used as the dataset name.

Alternatively, any word or words.
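The dataset-name options can be sketched the same way. Everything here is hypothetical: identity stands for the csvpath's name or id metadata (None if unset), index for its zero-based position in the run, and variables for the csvpath's variables. The var-value: branch would work the same way for ckan-dataset-title.

```python
from typing import Mapping, Optional


def resolve_dataset_name(value: str, identity: Optional[str], index: int,
                         named_paths_name: str, variables: Mapping[str, object]) -> str:
    """Map a ckan-dataset-name value to a dataset name (hypothetical sketch)."""
    if value == "use-instance":
        # identity = the csvpath's name or id metadata; otherwise its index in the run
        return identity if identity else str(index)
    if value == "use-named-results":
        return named_paths_name
    if value.startswith("var-value:"):
        var_name = value.split(":", 1)[1].strip()
        return str(variables[var_name])  # raises KeyError if the variable is unset
    return value


print(resolve_dataset_name("var-value:region", None, 0, "orders", {"region": "northeast"}))
# northeast
```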

ckan-dataset-title

Values: var-value:name | any title

var-value points to a variable: the word after the colon is the variable's name. That variable's value is used as the title. E.g. var-value:city would use the value of the city variable, giving a title such as "New York City".

Alternatively, any user-friendly title.

ckan-visibility

Values: public

The default is private. Any word other than public, or no setting, results in the dataset being marked private.

ckan-tags

Values: any alphanumeric words or numbers, which may contain -, _, or . (for example, q3-2024). Separate multiple tags with commas.

CKAN's tags are super useful for grouping assets for management and discovery. These are assumed to be simple free-form tags, not CKAN's vocabulary-controlled tags.
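A sketch of tag parsing under a fairly strict reading of the rule: each tag is one or more alphanumeric runs joined by single -, _, or . characters, and tags are lowercased, as noted at the end of this page. The function name and the choice to reject (rather than repair) bad tags are assumptions.

```python
import re

# One or more alphanumeric runs joined by single -, _ or . characters.
TAG_PATTERN = re.compile(r"[A-Za-z0-9]+(?:[-_.][A-Za-z0-9]+)*")


def parse_tags(value: str) -> list[str]:
    """Split a ckan-tags value on commas, validate each tag, and lowercase it."""
    tags = [t.strip() for t in value.split(",") if t.strip()]
    invalid = [t for t in tags if not TAG_PATTERN.fullmatch(t)]
    if invalid:
        raise ValueError(f"invalid tags: {invalid}")
    return [t.lower() for t in tags]


print(parse_tags("Orders, q3-2024, v1.2"))  # ['orders', 'q3-2024', 'v1.2']
```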

ckan-show-fields

Values: names of metadata fields, separated by commas. See any meta.json for the available fields.

This setting lets you show any field from meta.json as a name-value pair in the dataset's main view.

Recall that meta.json contains all the metadata (settings, user-defined fields, and ad hoc comments or documentation) and runtime data (line_number, number of matches, current headers, etc.).
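As an illustration of ckan-show-fields, the sketch below picks the named fields out of a metadata dictionary. It treats meta.json as a flat dictionary for simplicity and silently skips names it cannot find; both are assumptions, and a real meta.json may nest its fields. The sample JSON is made up.

```python
import json

# Hypothetical meta.json content, flattened for the sketch.
META_JSON = '{"id": "orders-check", "line_number": 1201, "valid": true}'


def select_show_fields(meta: dict, value: str) -> dict:
    """Pick the comma-separated field names in value out of meta; skip unknown names."""
    names = [n.strip() for n in value.split(",") if n.strip()]
    return {n: meta[n] for n in names if n in meta}


meta = json.loads(META_JSON)
print(select_show_fields(meta, "id, valid, not_a_field"))  # {'id': 'orders-check', 'valid': True}
```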

ckan-send

Values: data | printouts | unmatched | errors | meta | vars | manifest

The names of the standard files you want to send to CKAN, minus their extensions.

Transfers and Jinja files cannot be sent. You also don't need to name individual printers: all printers' output goes into printouts.
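The mapping from ckan-send names to files might look like the sketch below. The file names reflect a typical CsvPath results directory but are assumptions here, as is the comma-separated value format; check a real run's results folder.

```python
# Assumed file name for each ckan-send value.
STANDARD_FILES = {
    "data": "data.csv",
    "unmatched": "unmatched.csv",
    "printouts": "printouts.txt",
    "errors": "errors.json",
    "meta": "meta.json",
    "vars": "vars.json",
    "manifest": "manifest.json",
}


def files_to_send(value: str) -> list[str]:
    """Translate a comma-separated ckan-send value into standard file names."""
    names = [n.strip() for n in value.split(",") if n.strip()]
    unknown = [n for n in names if n not in STANDARD_FILES]
    if unknown:
        # e.g. transfers or Jinja output, which the integration does not send
        raise ValueError(f"not a standard results file: {unknown}")
    return [STANDARD_FILES[n] for n in names]


print(files_to_send("data, printouts, errors"))  # ['data.csv', 'printouts.txt', 'errors.json']
```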

ckan-split-printouts

Values: split

A csvpath may have any number of Printer instances for various purposes.

Each printer prints to its own print stream (which may or may not include standard out on the console). All the printers' captured printout lines are added to the same printouts.txt with a separator marking each contribution.

The CKAN integration can split the different printers' output into separate files before sending them to CKAN. If you do this, each file is named after its printer.

This feature can be useful for keeping different kinds of reports. E.g. you could let the built-in validations go to the default printer, but send your own business rule validations to a separate printer.
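Splitting printouts could be sketched like this. The separator line format (---- PRINTOUT: name) and the .txt extension on the split files are hypothetical stand-ins; look at one of your own printouts.txt files to see the separator CsvPath actually writes.

```python
import re
from typing import Optional

# Hypothetical separator format; check a real printouts.txt for the one CsvPath writes.
SEPARATOR = re.compile(r"---- PRINTOUT: (\S+)")


def split_printouts(text: str) -> dict[str, str]:
    """Group printout lines by printer name, using the separator lines as boundaries."""
    parts: dict[str, list[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        match = SEPARATOR.fullmatch(line.strip())
        if match:
            current = match.group(1)
            parts.setdefault(current, [])
        elif current is not None:  # lines before the first separator are ignored
            parts[current].append(line)
    return {name: "\n".join(lines) for name, lines in parts.items()}


sample = "---- PRINTOUT: default\nrow 1 ok\n---- PRINTOUT: rules\nrow 2 broke a rule\n"
for name, body in split_printouts(sample).items():
    print(f"{name}.txt: {body!r}")  # one file per printer, named after it
```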

ckan-printouts-title

Values: any title

The title for the default printer's printouts file.

ckan-data-title

Values: any title

The title for the data file.

ckan-unmatched-title

Values: any title

The title for the unmatched file.

ckan-vars-title

Values: any title

The title for the vars file.

ckan-meta-title

Values: any title

The title for the meta file.

ckan-errors-title

Values: any title

The title for the errors file.

In general, user-friendly titles may include upper- and lowercase letters, spaces, and common punctuation. Most names, URL slugs, IDs, tags, etc. can contain only alphanumeric characters, ., _, and -, and are lowercased.
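A sketch of that naming rule: lowercase the text and keep only alphanumerics, ., _, and -. Replacing each run of other characters with a single hyphen is an assumption of this sketch; the integration may handle disallowed characters differently.

```python
import re


def to_slug(text: str) -> str:
    """Lowercase text and keep only a-z, 0-9, '.', '_' and '-' (hypothetical helper)."""
    slug = text.lower()
    slug = re.sub(r"[^a-z0-9._-]+", "-", slug)  # collapse each run of other chars to '-'
    return slug.strip("-")


print(to_slug("Q3 Orders (final)"))  # q3-orders-final
```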
