Working with error messages

Last updated 4 months ago

The CsvPath Framework has rich and flexible built-in error handling. Let's look at how you can use it in the CLI.

For this how-to, create a CsvPath project called title_fix. (We're not actually going to fix titles in this example; it's just a handy project name.) Using Poetry, that would be something like:

poetry new title_fix
cd title_fix
poetry add csvpath

Drop these files in an assets directory (or wherever you like).

If you haven't seen this data file before, it is a small cut of a public dataset of books checked out in the Seattle library system.

When your project is ready, fire up the CLI with poetry run cli. When you do, CsvPath creates directories for config, logs, and so on. We don't need to work with them yet, though.

First, add the named-file. Our data is a single file, so just select the file option.

Next, add the named-paths group. Select file and navigate to your csvpath file.

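Now we're ready to run. At the top menu, select run. When the CLI lists your named-paths, you'll see just the one you added a moment ago. Either run method works, but let's go with collect.

And we're off and running.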

This is where it gets interesting. Your results should look like this:

You are seeing five errors. The left part of each line is a set of information intended to tell you exactly where the problem happened, and the last part, on the right, is the error message. In this case, numbers in brackets were rightly deemed to be invalid integers.
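To see why a bracketed year fails, here is a minimal Python sketch of the same kind of check. It is not CsvPath's implementation. check_integer is a hypothetical helper whose message simply mimics the wording seen in this how-to.

```python
from typing import Optional


def check_integer(value: str) -> Optional[str]:
    """Return an error message if value is not a valid int, else None.

    Hypothetical helper: it imitates the CsvPath message text shown in
    this how-to, but it is not the library's own validation code.
    """
    try:
        int(value)
    except ValueError:
        # "[2019]" is not a plain integer literal, so int() rejects it.
        return f"Cannot convert {value} to int"
    return None


print(check_integer("[2019]"))  # Cannot convert [2019] to int
print(check_integer("2019"))    # None
```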

The amount of detail you see here is intended to help you, as a csvpath writer, diagnose a problem brought to you by an ops team member after an error was observed in production. As you may know, there is far more information available in the metadata captured by the CsvPath Framework — all so you can answer the question: what just happened? And then do something about it.

Let's break down a line of what you're seeing:

2025-02-06 23h45m41s-835847:title_fix:4:title_fix_schema:0:checkout.integer[11]:  Cannot convert [2019] to int

These are the fields:

  • time: 2025-02-06 23h45m41s-835847

  • named-file: title_fix

  • line: 4

  • named-paths: title_fix_schema

  • chain: 0:checkout.integer[11]

  • message: Cannot convert [2019] to int
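If you want to pull these fields out of a full-format error line (for example, to grep or tally errors), the short Python sketch below splits a line using a regular expression. The layout is inferred from the sample line above and is an assumption, not a documented format. In particular, it assumes the chain contains exactly one colon and that the time field contains none. parse_error_line is a hypothetical name.

```python
import re

# Assumed layout, inferred from the sample line above:
#   time:named-file:line:named-paths:chain:  message
# The chain itself holds one colon (component index, then the path).
ERROR_LINE = re.compile(
    r"^(?P<time>[^:]+):"
    r"(?P<named_file>[^:]+):"
    r"(?P<line>\d+):"
    r"(?P<named_paths>[^:]+):"
    r"(?P<chain>\d+:[^:]+):"
    r"\s*(?P<message>.*)$"
)


def parse_error_line(text: str) -> dict:
    """Split a full-format error line into its named fields."""
    match = ERROR_LINE.match(text)
    if match is None:
        raise ValueError(f"unrecognized error line: {text!r}")
    fields = match.groupdict()
    fields["line"] = int(fields["line"])  # line number as an int
    return fields


sample = (
    "2025-02-06 23h45m41s-835847:title_fix:4:title_fix_schema:"
    "0:checkout.integer[11]:  Cannot convert [2019] to int"
)
print(parse_error_line(sample))
```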

The chain field may be new to you, so let me explain. This field points uniquely to the match component that was the source of the error. In this case, 0 is the index of the component: the 0th match component in our csvpath is line.checkout(). The line() function is how we define schema entities, and this entity is called checkout. Within checkout we have 12 fields, starting with these three strings:

string.notnone(#UsageClass),
string(#CheckoutType),
string.notnone(#MaterialType),
...

The integer[11] part of the chain says that we should look at the function in position 11 (the 12th function within line(), since positions are 0-based). That integer, on line 4 of the data file, is the source of the validation error. Admittedly, this would be easier to read if we named our functions better, so let's quickly do that in the csvpath.
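To make the 0-based indexing concrete, here is a small Python sketch that decodes a chain string into its parts. The chain syntax is inferred from the two examples on this page, 0:checkout.integer[11] and 0:checkout.year. ChainRef and decode_chain are hypothetical names, not part of CsvPath.

```python
import re
from typing import NamedTuple, Optional


class ChainRef(NamedTuple):
    component: int           # index of the match component (0-based)
    entity: str              # the line() entity name, e.g. "checkout"
    name: str                # function name, or the name you assigned
    position: Optional[int]  # 0-based position inside line(), if shown


# Hypothetical decoder; syntax inferred from the samples in this how-to.
CHAIN = re.compile(r"^(\d+):([^.]+)\.([^\[]+)(?:\[(\d+)\])?$")


def decode_chain(chain: str) -> ChainRef:
    m = CHAIN.match(chain)
    if m is None:
        raise ValueError(f"unrecognized chain: {chain!r}")
    index, entity, name, pos = m.groups()
    return ChainRef(int(index), entity, name, int(pos) if pos else None)


ref = decode_chain("0:checkout.integer[11]")
# Position 11 is the 12th function inside line(), because counting starts at 0.
print(ref.position + 1)  # 12
print(decode_chain("0:checkout.year"))  # a named function has no position
```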

In the modified csvpath, our integer is now named year. Running it gives a nicer error message:

2025-02-07 00h00m57s-958206:title_fix:4:title_fix_schema:0:checkout.year:  Cannot convert [2019] to int

It's easy to see that the 0th match component is an entity called checkout and on line 4 its year was invalid. Not bad. But can we do better? Maybe we don't need all this information right at the moment.

There are two directions to go at this point. On the one hand, we can cut out the extra info by switching it off. Or, on the other hand, we can streamline things by defining a simpler pattern. Let's do both, in order.

At the CLI's top menu, select config. This opens a dialog that lets you tweak a few config options that are most useful for debugging.

The CsvPath Framework separates some config options into CsvPath instance settings and CsvPaths instance settings. When you use the CLI, you always have a CsvPaths instance that manages one CsvPath instance per csvpath expression in a named-paths group.

When you use this dialog, you set both the parent CsvPaths and its CsvPath children to behave the same way. When you step away from the CLI and work programmatically, you can be more specific if needed. Splitting the config matters to ops because the system that runs validations has different error-reporting needs from the validations themselves. That's a whole other interesting conversation.
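One way to picture the parent/child split is as a parent setting that every child inherits unless the child overrides it. The sketch below is a toy model in plain Python, not CsvPath's API. Settings and effective are hypothetical names, and the three flags only echo the options in the config dialog.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Settings:
    # Toy stand-ins for the dialog's options; not CsvPath config keys.
    raise_exceptions: bool = False
    stop_on_errors: bool = False
    detailed_errors: bool = True


def effective(parent: Settings, **overrides: Optional[bool]) -> Settings:
    """Child settings: start from the parent, then apply explicit overrides.

    An override of None means "inherit the parent's value".
    """
    changes = {k: v for k, v in overrides.items() if v is not None}
    return replace(parent, **changes)


# Like the CLI dialog: one value shared by the parent and all children.
parent = Settings(detailed_errors=False)
child_a = effective(parent)                        # inherits everything
child_b = effective(parent, detailed_errors=True)  # a per-child override
print(child_a)
print(child_b)
```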

In this page we're all about the last option in the dialog: Print detailed errors. As you can imagine, this is where we can choose to not see all those fields we talked about above. First, however, a more general word about debugging.

Be careful what you ask for

One of the challenges with CsvPath Framework and CsvPath Language is their flexibility. There are generally a few ways to attack a problem, which means you have to think carefully about what you're seeing when you debug. This dialog is a case in point.

If you set Raise exceptions, your runs will stop at the first problem encountered. Likewise, if you select Stop on errors, the Framework will stop when it runs into an issue; it won't, however, throw an exception. Again, the difference is an operational concern. If you want to see all your errors at once, you need to suppress exceptions and not stop on errors. But keep in mind that it is possible to halt on an error, or on the use of stop(), without there being an error message. It is also possible to suppress exceptions and then not realize you encountered them.
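The interplay of the two options can be sketched as a toy run loop in Python. This only illustrates the behavior described above: raising ends the run with an exception, stopping ends it quietly, and doing neither collects every error. run, year_check, and ValidationError are hypothetical names, not CsvPath code.

```python
class ValidationError(Exception):
    pass


def run(rows, validate, raise_exceptions=False, stop_on_errors=False):
    """Toy model of a run; returns (errors collected, rows processed)."""
    errors = []
    processed = 0
    for row in rows:
        processed += 1
        message = validate(row)
        if message is None:
            continue
        if raise_exceptions:
            raise ValidationError(message)  # run ends at the first problem
        errors.append(message)
        if stop_on_errors:
            break  # run ends quietly, with no exception
    return errors, processed


def year_check(value):
    try:
        int(value)
        return None
    except ValueError:
        return f"Cannot convert {value} to int"


years = ["2019", "[2019]", "2020", "[2018]", "[2021]"]
print(run(years, year_check))                       # every error, all rows
print(run(years, year_check, stop_on_errors=True))  # first error only
```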

All this flexibility is there for important operational reasons. You just have to be mindful of it.

Back to error messages!

Uncheck Print detailed errors, if it is selected. When you hit Ok (use Tab or the mouse), you are setting a key in the [errors] section of config/config.ini: use_format. Its value is either full or bare, and what we saw above was full. Now you've set CsvPath Framework to report errors as leanly as possible.

Nice and clean, right? But you don't see a line number, which may be useful for files larger than our example data. We can add it back by turning detailed errors on again and customizing the message pattern. Go ahead and open the config window and check Print detailed errors again.

To set the new pattern, we have to exit the CLI. The CLI is constantly improving, but at this time it does not offer a way to change the pattern key in the [errors] section. No matter: editing config/config.ini is painless. Open it and look near the top for the [errors] section. If you don't see a pattern key, you can add one. (It should be there in the auto-generated config.ini, but if you are using an older CsvPath Framework install, you may need to add the key yourself.)

Create a pattern such as {file}:{line}:{chain}: {message}. Then save config.ini and restart your CLI.
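If you would rather script the edit than open an editor, the standard library's configparser can update the [errors] section. This is a sketch that assumes only the two keys named in this how-to, use_format and pattern. Be aware that configparser rewrites the whole file and drops any comments in it. set_error_settings is a hypothetical helper.

```python
import configparser


def set_error_settings(ini_path, use_format=None, pattern=None):
    """Update the [errors] section of a CsvPath config.ini.

    Only the keys passed as non-None are changed. configparser rewrites
    the whole file, so any comments in it are lost.
    """
    # interpolation=None keeps "{...}" and "%" in values literal.
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # preserve the case of existing keys
    config.read(ini_path)
    if not config.has_section("errors"):
        config.add_section("errors")
    if use_format is not None:
        config.set("errors", "use_format", use_format)  # full or bare
    if pattern is not None:
        config.set("errors", "pattern", pattern)
    with open(ini_path, "w") as f:
        config.write(f)
```

For example, set_error_settings("config/config.ini", use_format="full", pattern="{file}:{line}:{chain}: {message}"), then restart the CLI.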

Now when you run the title_fix named-file against the title_fix_schema named-paths you will get a less cluttered set of messages with just the information you need to start debugging your data and/or csvpaths.
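To see roughly what that pattern produces, the sketch below fills it with the fields of the year error shown earlier. It assumes the placeholders behave like Python str.format fields. How CsvPath actually renders them is not shown in this how-to, and the extra keys (time, paths) are hypothetical names that the pattern simply leaves out.

```python
# Sketch: fill the config.ini pattern with the fields of one error.
# Assumes str.format-style placeholders; not CsvPath's own renderer.
PATTERN = "{file}:{line}:{chain}: {message}"

error = {
    "time": "2025-02-07 00h00m57s-958206",  # not in the pattern, so omitted
    "file": "title_fix",
    "line": 4,
    "paths": "title_fix_schema",            # not in the pattern, so omitted
    "chain": "0:checkout.year",
    "message": "Cannot convert [2019] to int",
}

print(PATTERN.format(**error))
# title_fix:4:0:checkout.year: Cannot convert [2019] to int
```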

One last callout: if you haven't tried creating rules-based error messages using the error() function, you should. The error() function lets you generate error messages that are co-equal to the built-in validation errors CsvPath Language provides. That means that when you do something like:

not( #PublicationYear ) -> error("You must provide a publication year")

your error message will be available with the same fields as the built-in error would have (or none, if you turn the details off). Your error will also be written to the errors.json file that captures all of a run's errors in a machine-friendly format. Pretty cool, right?
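For readers coming from Python, here is a rough analogue of that rule: when the left side of -> is true, the right side fires and records an error. It assumes an empty PublicationYear value counts as missing. The record's field names are illustrative only and are not the actual errors.json schema. publication_year_rule is a hypothetical name.

```python
import json


def publication_year_rule(row: dict, line_number: int):
    """Python analogue of: not( #PublicationYear ) -> error("...")

    Returns an error record when the value is missing or empty, else None.
    Field names are illustrative, not the errors.json schema.
    """
    if not row.get("PublicationYear"):
        return {"line": line_number, "message": "You must provide a publication year"}
    return None


rows = [
    {"Title": "A", "PublicationYear": "2019"},
    {"Title": "B", "PublicationYear": ""},
]

errors = []
for n, row in enumerate(rows):
    record = publication_year_rule(row, n)
    if record is not None:
        errors.append(record)

# Serialize in a machine-friendly form, in the spirit of errors.json.
print(json.dumps(errors, indent=2))
```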

All of this gets even more fun when you remember that a csvpath writer can override the Framework's config settings on a csvpath-by-csvpath basis using the modes. The modes exist so that ops teams can set a standard config that csvpath writers can override, either during development or because they have more specific requirements and/or greater knowledge of the data.

Attachments: checkouts.csv (4KB), title_fix_schema.csvpaths (592B)