Named-file Reference Queries
The simplest way to use CsvPath is to refer to files, groups of csvpaths, and results by name. But sometimes a simple name is not enough. When you reach that point, you need CsvPath's Reference Language.
As we dig into references, particularly named-paths group references, you may want to skip to the CsvPath Validation Language area for a quick intro to csvpaths and named-paths groups.
CsvPath's default is to return the most recent file or results by name. Templates add a layer of complexity, but even then, asking for a file or results by name works well if you just need the most recent item. However, with or without templates, you sometimes need to select data files dynamically. There are several typical reasons:
You may need to rerun a file for testing or to account for a bug
Maybe you missed a run
You are running multiple files at once
There are multiple systems using different files from the same named-file folder tree
The way we select files dynamically is to use references. References are essentially queries in the simple XPath-like CsvPath Reference Language. Like XPath, CsvPath Reference Language is a way to pick out resources according to their location within a dataset.
Note that CsvPath Reference Language is not the same as CsvPath Validation Language. Their goals, syntax, use case, and level of complexity are quite different.
The two languages only overlap in two places: 1.) some Validation Language functions use limited Reference Language paths to pull in data from other runs, and 2.) simplified local references are used in the print function to pull in variables, metadata, headers, etc. from the running csvpath.
Both of these limited uses of Reference Language within Validation Language are straightforward ways of pointing to data. They do not mean that CsvPath Reference Language is a part of CsvPath Validation Language.
Think of them like XPath and XSD in the XML world, or DML and DDL in the SQL world.
Where are references used?
References are mainly used to pick out files within named-files, individual csvpaths in named-paths groups, or named-results. Their form is:
The four sections are:
Root: a named-file name, named-path name, or named-result name
datatype: an indicator of the type of data we're looking for
name_one: the most important id/name/date, etc. we're pointing to
name_three: a secondary id/name/date, etc. that helps determine what the reference is to
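To make the four sections concrete, here is a minimal sketch of splitting a reference string into its parts. It is not CsvPath's own parser. It assumes a reference starts with `$`, that the sections are separated by dots, and that a `#` on the root splits off a minor part; the `Reference` class, the `parse_reference` function, and the sample strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reference:
    root: str                         # "" for a local reference such as "$.variables.city"
    datatype: str                     # e.g. files, csvpaths, results, variables
    root_minor: Optional[str] = None  # the part after a '#' on the root, if any
    name_one: Optional[str] = None
    name_three: Optional[str] = None


def parse_reference(ref: str) -> Reference:
    """Split a reference string into sections (illustrative sketch only)."""
    if not ref.startswith("$"):
        raise ValueError("a reference is assumed to start with '$'")
    # Assumption: sections are dot-separated and there are at most four.
    parts = ref[1:].split(".", 3)
    if len(parts) < 2:
        raise ValueError("expected at least a root and a datatype")
    root, datatype, *names = parts
    # Assumption: '#' splits the root into a major and a minor part.
    root, _, minor = root.partition("#")
    return Reference(
        root=root,
        datatype=datatype,
        root_minor=minor or None,
        name_one=names[0] if len(names) > 0 else None,
        name_three=names[1] if len(names) > 1 else None,
    )


ref = parse_reference("$orders.files.EMEA/annual/20:first")
print(ref.root, ref.datatype, ref.name_one)  # orders files EMEA/annual/20:first
```

Because this sketch splits on every dot, a name that itself contains a dot (a filename with an extension, for instance) would be divided incorrectly; a real parser has to handle that case.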
The datatypes are like XPath's axes or a scope: they tell you where the root name is. Effectively they determine what kind of reference you are dealing with. The main datatypes are:
files: for references to a named-file
csvpaths: for references to a named-paths group
results: for references to named-results
Secondarily, there are datatypes to indicate the internal data structures of a running csvpath:
variables: to give access to a csvpath's variables
metadata: to pull user-defined metadata fields
csvpath: for access to runtime stats like the match count or line number
headers: to access the current header values, line by line
These latter four datatypes are primarily used "locally" from within a csvpath referring to data of the same csvpath. However, references to variables from different runs are a useful non-local use of these types. Local references forgo the root name and instead start with just $. (i.e. dollar sign dot).
References from CsvPath Validation Language that pick out variables from other runs are a case where you can use a # to point to variables from a specific csvpath instance in the other results, or even an earlier csvpath instance in the currently running results.
A reference of this type might look like:
This reference says to pull the value of the city variable from the myinstance csvpath variables from the most recent mypaths named-results run. If you didn't use #myinstance you would be pulling the city variable from the union of all the variable sets created in the most recent run of mypaths. Since two csvpath instances might both leave behind their own city variable with a different value, you might want to be more specific.
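The difference between the union of all variable sets and a single instance's variables can be sketched with plain dictionaries. This is only an illustration of why naming the instance matters, not CsvPath's implementation. The data, the function names, and the rule that later instances overwrite earlier ones in the union are all assumptions.

```python
# Hypothetical variables left behind by two csvpath instances in one run.
run_variables = {
    "myinstance": {"city": "Boston", "count": 10},
    "otherinstance": {"city": "Austin"},
}


def union_variables(run: dict) -> dict:
    """Merge every instance's variables into one dict.

    Assumption: when two instances set the same variable, the later
    instance's value wins. The real merge rule may differ.
    """
    merged = {}
    for variables in run.values():
        merged.update(variables)
    return merged


def instance_variables(run: dict, instance: str) -> dict:
    """Return only the variables of the named csvpath instance."""
    return run[instance]


# Without naming an instance, the value depends on the merge rule.
print(union_variables(run_variables)["city"])                    # Austin
# Naming the instance removes the ambiguity.
print(instance_variables(run_variables, "myinstance")["city"])   # Boston
```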
While this is a great use of references, it is not the most common syntax.
References use "pointers" to fill in parts of name_one and name_three. A pointer looks like a colon followed by a word or number. Using a pointer you can complete a date, provide an index, indicate a day, etc. The pointers are:
:all: returns all matches. :all is the default behavior. Using :all is just more explicit.
:from: used in the csvpaths datatype to indicate a run should start from a certain csvpath
:to: like :from, but indicating the run should stop with a certain csvpath
:before: selects all files registered before a date
:after: selects all files registered after a date
:first: selects the first file in a set of matching files
:last: selects the most recent or last match
:yesterday: converts to the timestamp of 0:00:00 on the previous day
:today: converts to the timestamp of 0:00:00 on the present day
:n (n = any integer from 0 to 99): indicates which match to return out of a set of matches
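The selection and day pointers above can be sketched as follows. This is a simplified model, not CsvPath's code: it assumes matches are held in a list ordered by registration time, oldest first, and the function names are hypothetical. The :from, :to, :before, and :after pointers are left out because they need a csvpath or a date to compare against.

```python
from datetime import datetime, timedelta
from typing import List


def apply_pointer(matches: List[str], pointer: str) -> List[str]:
    """Narrow a list of matches (oldest first) using a selection pointer."""
    if pointer == "all":
        return list(matches)
    if pointer == "first":
        return matches[:1]
    if pointer == "last":
        return matches[-1:]
    if pointer.isdigit():
        i = int(pointer)          # 0-based index into the matches
        return matches[i:i + 1]   # empty list if the index is out of range
    raise ValueError(f"unsupported pointer in this sketch: {pointer}")


def resolve_day_pointer(pointer: str, now: datetime) -> datetime:
    """Turn 'today' or 'yesterday' into a timestamp at 0:00:00."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if pointer == "today":
        return midnight
    if pointer == "yesterday":
        return midnight - timedelta(days=1)
    raise ValueError(f"not a day pointer: {pointer}")


files = ["jan.csv", "feb.csv", "mar.csv"]
print(apply_pointer(files, "first"))  # ['jan.csv']
print(apply_pointer(files, "1"))      # ['feb.csv']
print(resolve_day_pointer("yesterday", datetime(2025, 3, 21, 15, 4, 5)))
# 2025-03-20 00:00:00
```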
As noted above, in a few cases you can split the root, name_one, and name_three path segments using a #. In fact, grammatically, you can always do this; however, support for #-separated words is inconsistent and the intended usage has not yet settled.
You might see the values created by a # referred to as root_minor, name_two, and name_four.
But again, this is not common, and unless directions say to use the capability you should not.
Examples
Named-file references
This points to the orders named-file. It looks for a dataset in or below the EMEA/annual folder. 20 may indicate a folder that has a name starting with 20. The reference would also match a filename starting with 20. Of all the registered files found, the first version registered is selected.
By default a reference will return all items unless there is a pointer indicating only a single match should be returned; e.g. :last returns a single match, if any. The default return, therefore, is :all. But in some cases you might want to be explicit and include :all, even if you don't have to.
This path returns the sixth file registered under the named-file name. (Remember that indexes are 0-based.)
:yesterday and :today are stand-ins for dates to make dynamic references a bit easier. Using the actual date is also pretty straightforward. This reference pulls all the files registered under the orders named-file and returns the second one.
This reference matches a file by the SHA256 hash value of its content. It is an exact match. For practical purposes, no two files with different content share a hash value.
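For readers who want to see what such a hash value looks like, here is a short sketch that computes a SHA256 hex digest with Python's standard hashlib module. How CsvPath itself reads and hashes a file is not shown in this section, so the function name and the choice of hashing raw bytes are assumptions.

```python
import hashlib


def content_fingerprint(content: bytes) -> str:
    """Return the SHA256 hex digest of some file content."""
    return hashlib.sha256(content).hexdigest()


a = content_fingerprint(b"id,total\n1,100\n")
b = content_fingerprint(b"id,total\n1,101\n")

print(len(a))   # 64 hex characters
print(a == b)   # False: a one-character change gives a different hash
```

Because the digest depends only on the bytes, the same content registered twice yields the same value, which is why a hash can serve as an exact-match key.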
This reference returns all the files registered before Jan 1, 2025 at 2:30 PM UTC. The match on the datetime is progressive, meaning that any part of the datetime you don't specify is filled in with 0s.
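The progressive completion of a partial datetime can be sketched like this. The exact datetime syntax that references accept is not shown in this section, so the sketch simply reads the numbers out of the string in year, month, day, hour, minute, second order. That parsing rule and the function name are assumptions. One further assumption: a missing month or day is filled with 1 rather than 0, since 0 is not a valid month or day.

```python
import re
from datetime import datetime, timezone


def complete_datetime(partial: str) -> datetime:
    """Fill in the unspecified parts of a partial datetime (UTC)."""
    numbers = [int(n) for n in re.findall(r"\d+", partial)]
    if not numbers or len(numbers) > 6:
        raise ValueError(f"cannot read a datetime from: {partial!r}")
    # Defaults for month, day, hour, minute, second; the year is required.
    defaults = [None, 1, 1, 0, 0, 0]
    filled = numbers + defaults[len(numbers):]
    return datetime(*filled, tzinfo=timezone.utc)


print(complete_datetime("2025-01-01_14-30"))  # 2025-01-01 14:30:00+00:00
print(complete_datetime("2025-03-21"))        # 2025-03-21 00:00:00+00:00
```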
Another reference returning any number of files before a date. This time we're scoping the datetime limitation to those files that have a named-files path beginning with acme/2025. To be clear, the reference would match any file located at staging/orders/acme/2025-EMEA-invoices.csv and registered before the start of March 21, 2025, assuming staging is the name we configured for our named-files area. It might match other filenames within the orders named-file as well, e.g. staging/orders/acme/2025-US-invoices.csv.
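The combination of a path prefix and a date limit can be sketched as a simple filter. This is an illustration of the matching rule described above, not CsvPath's implementation. The `Registered` record, the `files_before` function, and the registration times are hypothetical, and paths are assumed to be stored relative to the named-file.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple


class Registered(NamedTuple):
    path: str                # path within the named-file (assumed layout)
    registered_at: datetime  # when the file was registered


def files_before(files: List[Registered], prefix: str, cutoff: datetime) -> List[Registered]:
    """Keep files whose path starts with prefix and that were registered before cutoff."""
    return [
        f for f in files
        if f.path.startswith(prefix) and f.registered_at < cutoff
    ]


utc = timezone.utc
registered = [
    Registered("acme/2025-EMEA-invoices.csv", datetime(2025, 3, 20, 9, 0, tzinfo=utc)),
    Registered("acme/2025-US-invoices.csv", datetime(2025, 3, 20, 10, 0, tzinfo=utc)),
    Registered("acme/2025-APAC-invoices.csv", datetime(2025, 3, 21, 8, 0, tzinfo=utc)),
    Registered("acme/2024-EMEA-invoices.csv", datetime(2025, 3, 1, 8, 0, tzinfo=utc)),
]

cutoff = datetime(2025, 3, 21, tzinfo=utc)  # the start of March 21, 2025
for f in files_before(registered, "acme/2025", cutoff):
    print(f.path)
# acme/2025-EMEA-invoices.csv
# acme/2025-US-invoices.csv
```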