Sending results to CKAN
CsvPath is integrated with CKAN, the leading open source data portal.
CKAN is a portal purpose-built for data discovery and distribution. It is backed by the Open Knowledge Foundation and used by large-scale data publishers, from the US federal government's data.gov to LEGO.
CKAN is integrated with CsvPath through the event listener mechanism. When named-paths groups run, CKAN is notified and receives content. The integration is standard and requires only two minor changes to config.ini to activate.
See this page for more step-by-step guidance on getting started with CKAN.
Open config/config.ini (or wherever your config file is). We have two changes to make:
Enable the listener
Add the server details
Look for the [listeners] section in config.ini. First, make sure the ckan.results key has the name of the CKAN listener class. Second, add ckan to the list in the groups key. If other groups are already enabled, put ckan at the end after a comma. Your file should look like this:
Next, make sure you have a [ckan] section. It should have two keys:
server
api_token
The server key takes the URL of your CKAN instance. If you're just trying out CKAN you may be running CsvPath on the CKAN server or, more likely, on the same machine where you are running the CKAN Docker containers. In this case, put in http://localhost:80.
The api_token key takes your CKAN token. Log into CKAN and open your profile (see the link at the top right of every page). You should see three tabs. Click on the tab for API tokens and create one. Paste the value into your config.ini on the api_token key. Your config.ini should look like this:
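As a rough self-check of the two changes described above, the following sketch parses a config with Python's standard configparser module and reports anything missing. The sample text is illustrative only: the ckan.results and api_token values are placeholders, and the check_ckan_config helper is hypothetical, not part of CsvPath.

```python
import configparser

# Hypothetical sample config. The ckan.results value is a placeholder:
# use the listener class named in the CsvPath documentation.
SAMPLE_CONFIG = """
[listeners]
groups = ckan
ckan.results = <CKAN listener class>

[ckan]
server = http://localhost:80
api_token = <your CKAN API token>
"""


def check_ckan_config(text: str) -> list[str]:
    """Return a list of problems; an empty list means both changes are present."""
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(text)
    problems = []

    # Change 1: the listener is enabled.
    groups = config.get("listeners", "groups", fallback="")
    if "ckan" not in [g.strip() for g in groups.split(",")]:
        problems.append("ckan is not in [listeners] groups")
    if not config.get("listeners", "ckan.results", fallback="").strip():
        problems.append("[listeners] ckan.results is not set")

    # Change 2: the server details are present.
    for key in ("server", "api_token"):
        if not config.get("ckan", key, fallback="").strip():
            problems.append(f"[ckan] {key} is not set")
    return problems


print(check_ckan_config(SAMPLE_CONFIG))
```

To check a real file, read its text and pass it to check_ckan_config in place of SAMPLE_CONFIG.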
You're all integrated! Time to see what you can do with CKAN.
Publishing datasets to CKAN requires that you add directives to your csvpaths. The directives go in an external comment. An external comment is one that is outside the csvpath, above or below it. You can add as many CKAN directives, mode settings, user-defined metadata fields, etc. as you like in your comments.
A set of directives might look like:
Let's go through what these directives mean.
ckan-publish
always | never | on-valid | on-all-valid
These have the meaning you would expect. The default is that csvpath's results are not published to CKAN.
on-all-valid means that all csvpaths in the named-paths group must be determined to be valid.
Remember that is_valid is True by default. A csvpath that has an internally defined validation failure (e.g. treating "five" as an integer) may be marked invalid, depending on your configuration, without you having to use fail() explicitly.
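The four ckan-publish values can be read as a small decision rule. The sketch below is one interpretation, not CsvPath's implementation: the should_publish function is hypothetical, it assumes on-valid refers to the csvpath that carries the directive, and it treats a missing or unrecognized value the same as never, in line with the stated default.

```python
def should_publish(mode: str, this_valid: bool, all_valid: bool) -> bool:
    """Decide whether a csvpath's results would be published (illustrative only).

    this_valid: validity of the csvpath carrying the directive (assumed meaning
                of on-valid).
    all_valid:  True only if every csvpath in the named-paths group is valid.
    """
    mode = (mode or "never").strip().lower()
    if mode == "always":
        return True
    if mode == "on-valid":
        return this_valid
    if mode == "on-all-valid":
        return all_valid
    # "never", no setting, or anything unrecognized: do not publish.
    return False


# One csvpath is valid, but another csvpath in the group failed.
print(should_publish("on-valid", this_valid=True, all_valid=False))
print(should_publish("on-all-valid", this_valid=True, all_valid=False))
```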
ckan-group
use-archive | use-named-results | any name
Results will be associated with a group if this directive is used. If the indicated group doesn't exist it will be created.
use-archive means that the name in config.ini under the archive key is used to identify the group.
use-named-results makes the group name the same as the named-paths group's name. Remember that named-paths group names and their named-results names are the same.
Any name means any user-friendly word or words you like.
ckan-dataset-name
use-instance | use-named-results | var-value:name | any name
use-instance means use the identity property of the csvpath. The identity of a csvpath is the name or id field in its metadata (so set in an external comment) or its zero-based index in the run.
use-named-results means the dataset should have the same name as the named-paths group.
var-value points to a variable, the word after the colon. The value of the variable will be used for the name.
Alternatively, any word or words.
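The ckan-dataset-name options above amount to a small lookup. As a sketch of that logic only, the hypothetical resolve_dataset_name function below takes the directive value plus the pieces it may refer to; the parameter names are assumptions made for illustration and do not mirror CsvPath's internals.

```python
def resolve_dataset_name(directive: str, identity: str, named_results: str,
                         variables: dict) -> str:
    """Pick a dataset name from a ckan-dataset-name value (illustrative only)."""
    value = directive.strip()
    if value == "use-instance":
        # The csvpath's identity: its name or id, or its index in the run.
        return identity
    if value == "use-named-results":
        # Same name as the named-paths group.
        return named_results
    if value.startswith("var-value:"):
        # The word after the colon names a variable; its value becomes the name.
        var_name = value.split(":", 1)[1].strip()
        return str(variables[var_name])  # raises KeyError if the variable is unset
    # Anything else is taken literally.
    return value


print(resolve_dataset_name("var-value:city", "0", "orders", {"city": "Boston"}))
```

A real implementation would also need to decide what to do when the named variable is missing; this sketch simply lets the KeyError surface.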
ckan-dataset-title
var-value:name | any title
var-value points to a variable, the word after the colon. The value of the variable will be used for the title. E.g. var-value:city would indicate that the dataset's title would be something like "New York City".
Alternatively, any user-friendly title.
ckan-visibility
public
The default is private. Any word other than public, or no setting, results in the dataset being marked private.
ckan-tags
Any alphanumeric words or numbers, which may be separated internally by -, _, or a period. Separate multiple tags with commas.
CKAN's tags are useful for grouping assets for management and discovery. These are assumed to be simple free-form tags, not CKAN's vocabulary-controlled tags.
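To make the tag rules concrete, here is a minimal sketch that splits a ckan-tags value on commas and checks each tag against the allowed characters. The parse_tags helper is hypothetical. The list-of-dictionaries output follows the shape CKAN's package API uses for free-form tags, but whether CsvPath builds tags this way is an assumption.

```python
import re

# Letters, digits, and the three separators named above.
TAG_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def parse_tags(directive: str) -> list[dict]:
    """Split a comma-separated ckan-tags value into CKAN-style tag dicts."""
    tags = []
    for raw in directive.split(","):
        tag = raw.strip()
        if not tag:
            continue  # ignore empty entries such as a trailing comma
        if not TAG_PATTERN.fullmatch(tag):
            raise ValueError(f"invalid tag: {tag!r}")
        tags.append({"name": tag})
    return tags


print(parse_tags("orders, 2024-q1, eu_west"))
```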
ckan-show-fields
Names of metadata fields separated by commas. See any meta.json for the available fields.
This setting allows you to include any field from meta.json as a named-value in the dataset's main view.
Recall that meta.json contains all the metadata (settings, user-defined fields, and ad hoc comments or documentation) and runtime data (line_number, number of matches, current headers, etc.).
ckan-send
data | printouts | unmatched | errors | meta | vars | manifest
The names of the standard files you want to send to CKAN, minus their extensions.
Note that transfers and any Jinja files are not an option. Also note that you do not need to call out individual printers.
ckan-split-printouts
split
A csvpath may have any number of Printer instances for various purposes.
Each printer prints to its own print stream (which may or may not include standard out on the console). All the printers' captured printout lines are added to the same printouts.txt with a separator marking each contribution.
The CKAN integration can split the different printers' print output into separate files before sending them to CKAN. If you do this, each file will be named by the name of its printer.
This feature can be useful for keeping different kinds of reports. E.g. you could let the built-in validations go to the default printer, but send your own business rule validations to a separate printer.
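The splitting described above can be pictured as follows. This is only a sketch: the separator text used here is a made-up marker, not the one CsvPath actually writes to printouts.txt, and the split_printouts helper is hypothetical.

```python
# Hypothetical separator marker; CsvPath's real separator may look different.
SEPARATOR_PREFIX = "---- PRINTER: "


def split_printouts(text: str) -> dict[str, list[str]]:
    """Group printout lines by printer name, one entry per would-be file."""
    files: dict[str, list[str]] = {}
    current = "default"  # lines before any separator go to the default printer
    for line in text.splitlines():
        if line.startswith(SEPARATOR_PREFIX):
            current = line[len(SEPARATOR_PREFIX):].strip()
            files.setdefault(current, [])
        else:
            files.setdefault(current, []).append(line)
    return files


sample = (
    "---- PRINTER: default\n"
    "line 3: bad zip\n"
    "---- PRINTER: business-rules\n"
    "order total mismatch\n"
)
for printer, lines in split_printouts(sample).items():
    print(printer, lines)
```

Each key in the returned dictionary corresponds to one file named after its printer, which matches the naming behavior described above.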
ckan-printouts-title
Any title for the default printer file
ckan-data-title
Any title
ckan-unmatched-title
Any title
ckan-vars-title
Any title
ckan-meta-title
Any title
ckan-errors-title
Any title
In general, user-friendly titles may include upper and lower case, spaces, and common punctuation. Most names, URL slugs, IDs, tags, etc. can only contain alphanumeric characters, periods, underscores, and hyphens, and will be lowercased.
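The difference between a title and a name can be illustrated with a small normalizer. This is a sketch under assumptions: the to_slug function is hypothetical, and replacing disallowed characters with a hyphen is a choice made here for illustration; the integration may handle them differently.

```python
import re


def to_slug(title: str) -> str:
    """Lowercase a title and keep only alphanumerics, '.', '_', and '-'."""
    slug = title.strip().lower()
    # Collapse each run of disallowed characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9._-]+", "-", slug)
    return slug.strip("-")


print(to_slug("New York City"))
print(to_slug("Q1 Orders (EU)"))
```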