I’ve been putting a little more time into CurlyQ this week, as I’m able.
The first thing to note is a breaking change: CurlyQ now always returns an array, even if there’s only one result. I had waffled on this a little, but for predictability in scripting the output really needs to be in a consistent format. So even a single-string result now comes back as an array containing a single string. For example, a command that targets a single element with --search and then uses .source in the query would previously have returned just the source string for the matched tag; now it returns a one-element array.
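For scripts that consume the output, the always-an-array rule means one code path is enough. Here’s a minimal sketch in Python, assuming the output has been captured as JSON text. The function name and the sample string are made up for illustration and are not real CurlyQ output.

```python
import json


def results_from(raw: str) -> list:
    """Parse captured output and return the list of results."""
    data = json.loads(raw)
    # With the breaking change the top level should always be a list, so
    # anything else suggests an older version or unexpected output.
    if not isinstance(data, list):
        raise ValueError("expected an array at the top level")
    return data


# Hypothetical sample: a single-string result arrives wrapped in an array.
sample = '["<h1>Hello</h1>"]'
for source in results_from(sample):
    print(source)  # <h1>Hello</h1>
```

The loop works the same whether there is one result or fifty, which is the point of the change.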
Secondly, I’ve put a considerable amount of effort into the --query feature. You can now use jq-like syntax to query multiple items in an array, use dot-syntax for attribute comparisons, and use comparisons (like ^=) on hashes, returning true if any value in the hash matches the query. Still, if you want the full power of something like jq or yq, you can just pipe the output to either and work with more familiar tools.
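To make the “any value in the hash matches” behavior concrete, here’s a rough Python sketch of the idea. It assumes ^= means “starts with” (as in CSS attribute selectors), and the function name and sample data are hypothetical. It illustrates the semantics only, not how CurlyQ implements them.

```python
def any_value_starts_with(attrs: dict, prefix: str) -> bool:
    """Return True if any value in the hash starts with prefix."""
    for value in attrs.values():
        # Assumption: a value may be a single string or a list of strings.
        items = value if isinstance(value, list) else [value]
        if any(str(item).startswith(prefix) for item in items):
            return True
    return False


# Hypothetical attribute hash for a matched tag.
attrs = {"class": ["post", "featured"], "href": "https://example.com/"}
print(any_value_starts_with(attrs, "https"))   # True
print(any_value_starts_with(attrs, "mailto"))  # False
```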
But on to a cool thing. I mentioned CurlyQ’s screenshot capability in the intro post, but it’s received some improvements, and I thought it deserved a little more detail.
I incorporated Selenium to allow scraping of dynamic web pages. One of the features Selenium provides is saving screenshots from the browser of your choice, and CurlyQ exposes that as a screenshot feature.
The --browser flag (-b) determines whether it uses Chrome or Firefox, and the selected browser must be installed on your system. Firefox can output all types. Chrome can only output visible (the part of the page visible on first load) and print (a print version of the page with @media print styling applied); full-page capture (-t full) is only available with Firefox.
The --type flag (-t) accepts full, visible, and print. With -t full and -b firefox, you get a full-length version of the rendered page, including offscreen elements. Browser and type values can both be abbreviated to their first letter, e.g. -t f or -b c.
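If you’re wrapping these options in a script, it can help to check the browser/type combination before launching anything. The following Python sketch models the rules described above. The names are my own, and it only handles full names and first-letter abbreviations, so treat it as a model of the rules rather than CurlyQ’s actual option parsing.

```python
BROWSERS = ("chrome", "firefox")
TYPES = ("full", "visible", "print")
# Per the description above: full-page capture is Firefox-only.
SUPPORTED = {
    "chrome": {"visible", "print"},
    "firefox": {"full", "visible", "print"},
}


def expand(value: str, choices: tuple) -> str:
    """Expand a first-letter abbreviation (or a full name) to the full name."""
    for choice in choices:
        if value == choice or value == choice[0]:
            return choice
    raise ValueError(f"unknown value: {value!r}")


def check_combination(browser: str, shot_type: str) -> tuple:
    """Return (browser, type) as full names, or raise if unsupported."""
    b = expand(browser, BROWSERS)
    t = expand(shot_type, TYPES)
    if t not in SUPPORTED[b]:
        raise ValueError(f"{b} cannot produce a {t} screenshot")
    return b, t


print(check_combination("f", "f"))  # ('firefox', 'full')
print(check_combination("c", "p"))  # ('chrome', 'print')
```

Calling check_combination("c", "f") raises a ValueError, mirroring the Chrome limitation.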
The --output flag (-o) is required and determines the path/name of the output file. Providing just a name will save the file to the current directory. You can include an extension, but it will be changed to match the output type: .png for full and visible, .pdf for print. So you can just provide a name without an extension and CurlyQ will apply the appropriate one.
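If a script needs to know the final filename ahead of time, the extension rule is easy to mirror. This is a small Python sketch with a hypothetical helper name. It assumes any existing extension is simply replaced, and how CurlyQ treats names containing other dots (like v1.2) is not something I’ve modeled here.

```python
from pathlib import Path

# Extension per output type, as described above.
EXTENSIONS = {"full": ".png", "visible": ".png", "print": ".pdf"}


def output_path(name: str, shot_type: str) -> Path:
    """Predict the saved filename for a given -o value and -t type."""
    # with_suffix() adds the extension if missing and replaces it otherwise.
    return Path(name).with_suffix(EXTENSIONS[shot_type])


print(output_path("homepage", "full"))       # homepage.png
print(output_path("homepage.png", "print"))  # homepage.pdf
```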
As a side note, saving a screenshot with -t print will output a PDF with actual text that can be searched by Spotlight (and other tools). So you could conceivably use CurlyQ to crawl an entire site (by parsing the links subcommand output and spidering) and save every page to a searchable PDF. I don’t know offhand why you’d do that, but it’s possible.
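For the curious, the save-every-page idea might look something like this in Python. It’s only a sketch: it builds the command lines without running them, the URL list stands in for whatever you extract from the links output, and the exact invocation (a screenshot subcommand with the URL as the last argument) is my assumption here. Only the -b, -t, and -o flags come from the description above.

```python
import re
from urllib.parse import urlparse


def pdf_name(url: str) -> str:
    """Turn a URL into a filesystem-friendly base name (no extension)."""
    parsed = urlparse(url)
    slug = re.sub(r"[^A-Za-z0-9]+", "-", parsed.netloc + parsed.path).strip("-")
    return slug or "index"


def screenshot_command(url: str) -> list:
    """Build an argument list for one page; nothing is executed here."""
    # Assumed invocation: the subcommand name and URL position are guesses.
    return ["curlyq", "screenshot", "-b", "firefox", "-t", "print",
            "-o", pdf_name(url), url]


# Hypothetical URLs standing in for links gathered while spidering.
urls = ["https://example.com/", "https://example.com/blog/post-1"]
for url in urls:
    print(" ".join(screenshot_command(url)))
```

Each list could be handed to subprocess.run once you’ve confirmed the invocation against your installed version.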
CurlyQ is still being refined and your input is welcome. Join me on the Forum, or just message me on Mastodon with suggestions and bug reports.