Add a file by HTTPS
You can register named-file content using HTTP or HTTPS in the same way that you would using S3, SFTP, or the local filesystem.
While adding files by HTTP is a snap, the remote file name in the URL may not be helpful. CSV and Excel files on the web are sometimes generated by applications rather than served as static files. When they are, they may lack a regular file name. Luckily there's an easy way to update the registered content with a name.
Happy path
First, let's create a simple harness. Our goal is to register a file from the web in CsvPath Framework. We are importing it, or staging it, as a named-file. For context we'll run a csvpath against our new content and access the results.
The main event is the method call starting on line 6 that adds a named-file called orders. The new version of orders is coming from a Google Drive account with a long opaque HTTPS URL. So far so good.
Let's fix that name
However, when we look at the registered file's manifest there is a gotcha. Our manifest is at ./inputs/named_files/orders/manifest.json. (If you aren't using the default location for named-files, your path will be different.) When we open it we see:
Lots of things went right. Our time, uuid, from URL, and fingerprint are fine. But the file type should be csv, and the file name and file home are garbled because the HTTP URL didn't point to a physical file so much as identify an item of content held by the Google Drive application.
Since we know the data we're downloading is CSV data and we know what it is about, we can easily update the named-file to add clarity. We'll use the patch_named_file method, which is on FileRegistrar. The FileManager that you use to add a named-file has a registrar to keep track of file metadata. It can help us easily compensate for HTTP's deficiencies.
The fix is line 10. We're passing a "patch" that will change the type of the file to csv and the name of the file to download.csv. The FileRegistrar updates the manifest.json so everything ticks and ties. This is what you should see:
And you're good. The orders named-file is ready to work.
Of course, using HTTP to load content into a named-file doesn't always require the extra step to patch it. If you have a URL like https://csvpath.org/my-data-file.csv you won't need to help CsvPath know what the file name and file type are because it's obviously CSV data in a file called my-data-file.csv. But if you do need to make an adjustment, that's how you do it.
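To see why some URLs need a patch and others don't, here is a minimal sketch using only the Python standard library. It is not CsvPath's actual logic. The function guess_name_and_type is a hypothetical helper, and the Google Drive-style URL is a made-up placeholder. The sketch simply reads the last segment of the URL's path, which is roughly all any tool has to go on when there is no other hint about the content.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def guess_name_and_type(url: str) -> tuple[str, str]:
    """Guess a file name and type from the last segment of a URL's path.

    Hypothetical helper for illustration only; not part of CsvPath.
    """
    # urlparse separates the path from the query string, so "?id=..." is ignored.
    path = PurePosixPath(urlparse(url).path)
    name = path.name
    # The suffix includes the leading dot; strip it to get a bare type like "csv".
    file_type = path.suffix.lstrip(".").lower()
    return name, file_type


# A URL that names a real file gives a usable name and type.
clean = guess_name_and_type("https://csvpath.org/my-data-file.csv")
print(clean)  # ('my-data-file.csv', 'csv')

# An application-style URL (placeholder, assumed shape) gives neither.
opaque = guess_name_and_type(
    "https://drive.google.com/uc?export=download&id=HYPOTHETICAL_ID"
)
print(opaque)  # ('uc', '')
```

When the second case comes up, the name and type have to come from you, which is what the patch supplies.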