Run CsvPath on Jenkins
An example of using a simple automation tool to feed data to CsvPath
We use Jenkins to run eight nightly builds that test CsvPath Framework against different combinations of storage backend and OS. It occurred to me that, since Jenkins is a good tool for automated CsvPath preboarding, our simple use case might make a good how-to.
First, what we did. Our setup is mainly for unit and integration tests. That's not exactly the same as the preboarding step in a data onboarding process, but it's close enough that it might be 90% of what you need.
We pulled the official Jenkins Docker image and exec'd into the running container using a command like:
We installed Python and Poetry. CsvPath Framework supports Python 3.10 or greater. We use the latest version of Poetry, which we installed using pipx. The instructions for doing that are here. It should all take just a couple of minutes.
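If you want the job to fail fast when the container's interpreter is too old, a small check can run before anything else. This is a minimal sketch, assuming the minimum is Python 3.10; the `python_is_supported` helper is a hypothetical name, not part of CsvPath.

```python
import sys

# Minimum interpreter version, assumed here to be Python 3.10.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True when the interpreter is at least the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    found = sys.version.split()[0]
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {found}")
```

In a real build step you would probably exit with a non-zero status instead of printing, so that Jenkins marks the run as failed.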
Next, log into the Jenkins admin console (by default, http://localhost:8080/). On the Dashboard, look for + New Item at the top left. In the new item form, give your new workflow a name and choose Freestyle project as the item type.
In the configuration form that opens you have a few options. Our goal was an automated build of the CsvPath project, so Git was the first important fill-in for us.
For you, Git may not be important. If you're just running a script to start onboarding data, and it doesn't live in Git, you wouldn't need this part. However, automation scripts should be in a source control system so the Git setting is important for most people.
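Next, we wanted nightly builds. Your data may arrive on some other schedule, but let's say you want to run nightly. You would add a cron-like expression to say when. We used H H(21-23) * * * to run at the end of every day. In Jenkins' cron syntax, H tells Jenkins to pick a stable hashed value, so this runs once a day at a minute and hour of Jenkins' choosing between 21:00 and 23:59.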
Finally, we added a build step to run simple shell commands:
It looks like:
Our CSVPATH_CONFIG_PATH env variable points to a config file in the assets directory of the project. You might just use the default config/config.ini.
Remember that the [config] path value in your config file must match the path of the file it lives in. If it names a different path, CsvPath will reload its configuration from that named file. This makes it easy to bootstrap from a default config file generated by CsvPath. But if you aren't bootstrapping and don't want to use config/config.ini, you must remember to sync the path key with the file name.
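A quick pre-flight check can catch a config file whose path key has drifted from its actual location. The sketch below uses only the standard library and assumes the key is named path and lives in a [config] section. The `config_path_matches` helper is hypothetical, and comparing resolved paths is my choice, not necessarily how CsvPath itself compares them.

```python
import configparser
from pathlib import Path


def config_path_matches(config_file):
    """Return True when the [config] path key names the file it was read from."""
    # Interpolation is disabled so a literal '%' in any value cannot raise.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(config_file):
        raise FileNotFoundError(config_file)
    declared = parser.get("config", "path", fallback=None)
    if declared is None:
        return False
    # Relative paths resolve against the current working directory,
    # which in a Jenkins shell step is normally the job workspace.
    return Path(declared).resolve() == Path(config_file).resolve()
```

You could call this with the value of CSVPATH_CONFIG_PATH at the start of a build and stop early if it returns False.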
That config file is specific to this Azure Blob Storage build, so we don't have anything more to do there. Your config file will of course be different from ours.
Your script will also be different. We're running pytest. You would probably want to run something more like:
Admittedly, there are more elegant ways to create a simple preboarding script. But that's the basic Python code you'd need. 5 lines (+/- whitespace). Not bad. You'll need your own paths and names, of course.
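As a rough sketch of what such a preboarding script might look like, the function below registers a data file, loads a group of csvpaths, and runs them. The method and keyword names (`add_named_file`, `add_named_paths_from_dir`, `collect_paths`) reflect my understanding of the CsvPaths API and should be treated as assumptions to check against the current CsvPath docs. The `preboard` function and all of its argument names are hypothetical.

```python
def preboard(file_name, file_path, paths_name, paths_dir, paths=None):
    """Register a data file and a named group of csvpaths, then run them."""
    if paths is None:
        # Imported lazily so the function can be exercised without csvpath installed.
        from csvpath import CsvPaths

        paths = CsvPaths()
    # The calls below are assumptions about the CsvPaths API; verify before use.
    paths.file_manager.add_named_file(name=file_name, path=file_path)
    paths.paths_manager.add_named_paths_from_dir(name=paths_name, directory=paths_dir)
    paths.collect_paths(filename=file_name, pathsname=paths_name)
    return paths
```

From the Jenkins shell build step you would call something like preboard("orders", "inbox/orders.csv", "order-checks", "csvpaths/orders"), with names and paths swapped for your own.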
Save the job and try running it.