06 - jq tricks
Using jq to make CLI output readable
In my previous few posts, it was evident that the JSON output from the allocate and deploy commands is lengthy even for one machine, so you can imagine how large a listing of 10 or 12 machines might be. Raw JSON output is both consistent and comprehensive, but it is sometimes hard for humans to process.
Enter jq, a command-line tool dedicated to filtering and formatting JSON output, so that you can more easily summarize data. For instance, consider a small MAAS install with 12 virtual machines. Six of these machines are lxd VMs, and six are libvirt VMs. Suppose I enter the MAAS CLI command to list all those machines:
The listing would be many pages long, and likely very time-consuming to pick through. On the other hand, I can apply the jq command, a couple of other Ubuntu CLI commands, and just a little bit of finesse to get something conventional-looking:
In fact, with this command, we can produce a useful and compact machine listing that serves 99% of my routine MAAS information needs:
Here we have a clean text table listing the machine hostnames, along with the system IDs, power states, machine statuses, tags, pools, and networking information. These parameters represent only a small fraction of the available JSON output, of course. Let's break this command down, piece by piece, and see how it works.
Basic jq usage
First, we'll just pull the hostnames from these machines, with no qualifiers or formatting rules, like this:
This command returns output that looks something like this:
Note a couple of things about this command:
First, the jq instructions are enclosed in single quotes. As such, they can span lines if necessary, without any line continuations (\), like this:
Second, notice the structure of the jq instructions. The .[] tells jq that it's decoding an array of data sets (in this case, an array of machine data sets) and that it should iterate through each of the outer data sets (each machine) individually. The pipe symbol (|) completes the “for each” construct, so this command basically says, “for each set of machine data you get, pull out (and return) the value associated with the JSON key hostname.” The return value reflects this structure:
The outer square brackets represent the boundaries of each machine's data set, and the value in quotes corresponds to the value of the key hostname in successive machine data sets. It can get a little complicated sometimes, but that's basically the way to parse JSON with jq.
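To see the iteration on something small, here is a minimal sketch that runs the same kind of filter over a hand-made, two-machine array. The sample data and the variable names (machines_json, hostnames) are invented for illustration; on a live system the JSON would come from the MAAS CLI instead.

```shell
# Hypothetical sample data standing in for real MAAS CLI output.
machines_json='[
  {"hostname": "vm-1", "status_name": "Deployed"},
  {"hostname": "vm-2", "status_name": "Ready"}
]'

# For each element of the outer array (each machine), emit a one-element
# array holding that machine's hostname. The filter spans two lines inside
# the single quotes, with no backslash continuation.
hostnames=$(printf '%s\n' "$machines_json" | jq '.[]
  | [.hostname]')

printf '%s\n' "$hostnames"
# [
#   "vm-1"
# ]
# [
#   "vm-2"
# ]
```

Each machine produces its own bracketed result, which is why the brackets repeat once per machine rather than wrapping the whole list.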
For practice, let's try pulling the value of the key that holds machine status, again with no qualifiers or special formatting:
This command essentially tells jq to do the same thing as last time, but also collect the value of the key status_name for each machine. The results look something like this:
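As an illustration with invented sample data (not the author's actual machines), the two-key version and its output might look like the sketch below. The key name status_name is assumed to be the status key described above, and machines_json is a hypothetical stand-in for the MAAS CLI output.

```shell
# Hypothetical sample data standing in for real MAAS CLI output.
machines_json='[
  {"hostname": "vm-1", "status_name": "Deployed"},
  {"hostname": "vm-2", "status_name": "Ready"}
]'

# Same "for each machine" construct, now collecting two values per machine.
host_status=$(printf '%s\n' "$machines_json" \
  | jq '.[] | [.hostname, .status_name]')

printf '%s\n' "$host_status"
# [
#   "vm-1",
#   "Deployed"
# ]
# [
#   "vm-2",
#   "Ready"
# ]
```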
So much for printing the values of JSON keys. There are still some nuances (arrays, nested keys, …), but this is the lion's share of the syntax. Let's divert for a minute and look at how to format the output in a more human-readable way.
Improved formatting
Most of the Ubuntu text-processing commands use tabs as field delimiters, which is a trait inherited from grandfather UNIX. Currently, the output is clean, but relatively hard to format into lines. Luckily jq has a filter for this: the “tab-separated values” filter, known as @tsv. This filter transforms the output records into individual lines with values separated by tabs.
Adding @tsv to the mix:
we get something like this:
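A small sketch of this step, again using made-up sample data in a hypothetical machines_json variable, shows what @tsv does when jq is still printing JSON-encoded strings:

```shell
# Hypothetical sample data standing in for real MAAS CLI output.
machines_json='[
  {"hostname": "vm-1", "status_name": "Deployed"},
  {"hostname": "vm-2", "status_name": "Ready"}
]'

# @tsv joins each array into one string, but without -r that string is
# still printed as JSON: quoted, with the tab shown as the escape \t.
quoted=$(printf '%s\n' "$machines_json" \
  | jq '.[] | [.hostname, .status_name] | @tsv')

printf '%s\n' "$quoted"
# "vm-1\tDeployed"
# "vm-2\tReady"
```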
That's a step in the right direction, but it's still pretty far from human-readable output. If only there were some way to get rid of the quotes and emit a real tab, instead of printing it as the escape sequence \t. In fact, the jq “raw” output option (-r) takes care of this:
Feeding the raw output into our three-filter set gives us a more readable result:
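The effect of -r can be sketched with the same kind of invented sample data (machines_json and raw_rows are hypothetical names). The only change from the quoted version is the extra option:

```shell
# Hypothetical sample data standing in for real MAAS CLI output.
machines_json='[
  {"hostname": "vm-1", "status_name": "Deployed"},
  {"hostname": "vm-2", "status_name": "Ready"}
]'

# -r prints the @tsv string raw: no surrounding quotes, and the field
# separator is a real tab character rather than the two characters \t.
raw_rows=$(printf '%s\n' "$machines_json" \
  | jq -r '.[] | [.hostname, .status_name] | @tsv')

printf '%s\n' "$raw_rows"
```

Each printed line now holds the hostname, a tab, and the status, which is the shape most text-processing tools expect.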
This is tabulated, but the gaps between the columns are wider than they need to be, and an unusually long value in one of the fields may throw the tabulation off for that line. jq could have grown a feature for that, but there is no need, since Ubuntu already has the column utility. Piping the output of the command so far to column -t (-t for “table”) lets column work out how many columns the input has and pad each one so that it is exactly wide enough for the longest value in that column:
This command result is very similar to the previous output, though you'll notice that the field spacing is neatly optimized to the data itself:
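Here is a sketch of the jq-plus-column pipeline, using invented hostnames of different lengths so that the alignment is visible. The sample data and variable names are hypothetical.

```shell
# Hypothetical sample data; the hostnames differ in length on purpose.
machines_json='[
  {"hostname": "lxd-vm-1", "status_name": "Deployed"},
  {"hostname": "vm-2", "status_name": "Ready"}
]'

# Raw tab-separated rows from jq ...
rows=$(printf '%s\n' "$machines_json" \
  | jq -r '.[] | [.hostname, .status_name] | @tsv')

# ... aligned by column, which pads every field to the width of the
# longest value in its column.
printf '%s\n' "$rows" | column -t
```

With this input, the status values should start in the same screen column on both lines, even though the hostnames are different lengths.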
Making real tables
So far, so good, but this still isn't a presentable data table. First of all, there are no headings. These can be added by passing a literal row to jq, like this:
You'll note that there are two expressions in parentheses (representing individual lines or rows). The first just contains the two column headings, while the second contains the “for each” construct that pulls the hostname and status out of the JSON. In essence, the first expression evaluates to just one row, since there's nothing to tell it to iterate. The second expression evaluates to one row per machine, since that's the level of data we're reading. Here's what we get from this command:
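A sketch of the two-expression form, run over invented sample data, could look like this. The heading text and the variable names are assumptions made for the example.

```shell
# Hypothetical sample data standing in for real MAAS CLI output.
machines_json='[
  {"hostname": "vm-1", "status_name": "Deployed"},
  {"hostname": "vm-2", "status_name": "Ready"}
]'

# The first parenthesized expression is a literal heading row; the second
# yields one row per machine. The comma concatenates their results, and
# every resulting row is then passed through @tsv.
with_headings=$(printf '%s\n' "$machines_json" | jq -r '
  (["HOSTNAME", "STATUS"]),
  (.[] | [.hostname, .status_name])
  | @tsv')

printf '%s\n' "$with_headings" | column -t
```

This works because the comma binds more tightly than the pipe in jq, so @tsv applies to the heading row and the machine rows alike.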
Nice, but it needs a horizontal rule, like a line of dashes, to separate the headings from the data. We can do this by essentially turning the one header row into two, using some jq built-in functions to generate dashed lines of appropriate length:
The expression | (., map(length*"-")) tells jq to convert the foregoing header row into two rows: the first (.) contains the two headers, as in the previous command, and the second contains the result of a couple of built-in functions (map and length), which replace each heading with a run of dashes of the same length. We won't detail those further here, but the use of this construct produces the following output:
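As a hedged sketch of the header-plus-rule idea, the following uses map and length to build the dashes from the headings themselves. The sample data is invented, and the exact spelling of the expression is one workable form rather than necessarily the author's.

```shell
# Hypothetical sample data standing in for real MAAS CLI output.
machines_json='[
  {"hostname": "vm-1", "status_name": "Deployed"},
  {"hostname": "vm-2", "status_name": "Ready"}
]'

# The heading array is piped into (., map(...)): "." re-emits the headings,
# and map(length * "-") emits a second array in which every heading is
# replaced by as many dashes as it has characters.
ruled=$(printf '%s\n' "$machines_json" | jq -r '
  (["HOSTNAME", "STATUS"] | (., map(length * "-"))),
  (.[] | [.hostname, .status_name])
  | @tsv')

printf '%s\n' "$ruled" | column -t
```

Because the rule is computed from the headings, renaming or adding a heading automatically keeps the dashes the right length.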
Extending the list
Let's add a couple more fields, owner (which is sometimes blank), and system_id (which is never blank), to the output:
This gives us the following result:
You'll notice right away there's a problem with the columns. Remember that only machines in the “Allocated” or “Deployed” state are owned by anyone, since that's what allocate/acquire means. The lines for the deployed and allocated machines lay out correctly, but the lines for the unowned machines are incorrectly formatted. We can fix this by using the jq “alternate value” construct (a // "b"), which can be loosely read, “if not a, then b.” We add it to the owner key like this:
Then the results line up nicely, based on the longest value in each key column:
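The fix can be sketched as follows, with invented sample data in which one machine has no owner. The column order, the "-" placeholder, and the key name system_id are assumptions made for the example.

```shell
# Hypothetical sample data: vm-2 has no owner (null).
machines_json='[
  {"hostname": "vm-1", "system_id": "abc123", "owner": "admin"},
  {"hostname": "vm-2", "system_id": "def456", "owner": null}
]'

# @tsv renders null as an empty field, and column -t merges adjacent
# whitespace, so an empty owner would pull later fields to the left.
# (.owner // "-") substitutes a visible placeholder when owner is null.
owned=$(printf '%s\n' "$machines_json" | jq -r '
  (["HOSTNAME", "SYSID", "OWNER"] | (., map(length * "-"))),
  (.[] | [.hostname, .system_id, .owner // "-"])
  | @tsv')

printf '%s\n' "$owned" | column -t
```

The placeholder matters most when the blank field sits in the middle of a row, since every later field would otherwise shift one column to the left.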
Nested arrays
Machines have a nested array (of indeterminate length) for machine tags. In JSON terms, instead of having a single key-value pair at the top level, like this:
tags are represented by nested arrays, like this:
Incorporating a random number of tags per machine into a neat table is beyond the scope of this particular post, but we can show the first tag in the table rows:
Where we would use .json-key-name for a non-nested value, we need only use .json-key-name[0] to refer to the first element of the nested array. Doing this produces the following result:
That's almost right, but notice that the heading splits at the space between its words, because column -t treats any whitespace as a column boundary. Let's try a better way, with an underscore:
This version of the command produces the expected output:
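A sketch of the first-tag column is shown below. The key name tag_names, the heading FIRST_TAG, and the "-" fallback for machines without tags are assumptions for illustration, and the sample data is invented.

```shell
# Hypothetical sample data: vm-2 has an empty tag list.
machines_json='[
  {"hostname": "vm-1", "tag_names": ["virtual", "lab"]},
  {"hostname": "vm-2", "tag_names": []}
]'

# .tag_names[0] selects the first element of the nested array. Indexing an
# empty array yields null, so // "-" keeps the column from going blank.
tagged=$(printf '%s\n' "$machines_json" | jq -r '
  (["HOSTNAME", "FIRST_TAG"] | (., map(length * "-"))),
  (.[] | [.hostname, .tag_names[0] // "-"])
  | @tsv')

printf '%s\n' "$tagged" | column -t
```

The single-word heading keeps column -t from treating it as two separate columns.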
Nested keys
These aren't all the routine key-value pairs we want in the table, though. It would also be nice to print the pool to which each machine is assigned. Just asking for .pool as a single key-value pair:
produces an error:
Looking at the JSON output, we see that .pool is a nested key, not a key-value pair:
What we really want is the pool name, so we need to add one level of indirection to that particular key to reach the actual key-value pair, like this:
which gives us what we want:
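Both the failure and the fix can be sketched with invented sample data. The shape of the pool object here (a name plus an id) is an assumption for the example; the point is only that @tsv accepts scalar values, not whole objects.

```shell
# Hypothetical sample data: pool is an object, not a plain string.
machines_json='[
  {"hostname": "vm-1", "pool": {"name": "default", "id": 0}},
  {"hostname": "vm-2", "pool": {"name": "lab", "id": 1}}
]'

# Asking for the whole object makes jq exit with an error, because an
# object cannot be written into a tab-separated row.
if ! printf '%s\n' "$machines_json" \
    | jq -r '.[] | [.hostname, .pool] | @tsv' 2>/dev/null; then
  echo "jq refused to put the pool object in a row"
fi

# One more level of indirection reaches the actual key-value pair.
pools=$(printf '%s\n' "$machines_json" \
  | jq -r '.[] | [.hostname, .pool.name] | @tsv')

printf '%s\n' "$pools" | column -t
```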
It's also useful to list the VLAN and fabric names in the output table. Looking at the JSON again, these values present like this:
This means they are doubly-nested. No problem; just use double indirection (two levels of . separators) to retrieve them:
The modified command yields the desired results:
There's just one more (deeply nested) value we want to retrieve, and that's the fully-qualified subnet address in CIDR form. That's a little trickier, because it's buried in JSON like this:
So the value we want is in the nested key boot_interface, in a nested array links[], which contains the doubly-nested key subnet.name. We can finish our basic CLI machine list (the one we started with) by adding this complex formulation to the command:
Sure enough, this command gives us the same table we had at the beginning of this post:
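Putting all of the pieces together, a sketch of the full listing might look like the following. The JSON layout, key names, headings, and sample values are assumptions based on the fields described in this post, reduced to a single invented machine; real MAAS output carries many more keys.

```shell
# Hypothetical single-machine sample with the nesting described above.
machines_json='[
  {
    "hostname": "vm-1", "system_id": "abc123", "power_state": "on",
    "status_name": "Deployed", "owner": "admin", "tag_names": ["virtual"],
    "pool": {"name": "default"},
    "boot_interface": {
      "vlan": {"name": "untagged", "fabric": "fabric-1"},
      "links": [{"subnet": {"name": "192.168.123.0/24"}}]
    }
  }
]'

# Headings plus dashed rule, then one row per machine. The VLAN and fabric
# need two levels of indirection; the subnet also passes through the first
# element of the links array.
table=$(printf '%s\n' "$machines_json" | jq -r '
  (["HOSTNAME", "SYSID", "POWER", "STATUS", "OWNER", "TAGS", "POOL",
    "VLAN", "FABRIC", "SUBNET"] | (., map(length * "-"))),
  (.[] | [.hostname, .system_id, .power_state, .status_name,
          .owner // "-", .tag_names[0] // "-", .pool.name,
          .boot_interface.vlan.name, .boot_interface.vlan.fabric,
          .boot_interface.links[0].subnet.name])
  | @tsv')

printf '%s\n' "$table" | column -t
```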
Chaining Ubuntu CLI commands
Although the machine list above looks fairly neat, it's actually not sorted by hostname, exactly. To accomplish this, we'd need to add a couple of Ubuntu CLI commands to the mix. Sorting on hostname means we want to sort on field 1 of the current command's output. We can try just feeding that to sort like this:
This command does indeed sort by hostname:
but it has the unintended side-effect of sorting the header lines into the output. There are probably at least a dozen Ubuntu CLI solutions for this, so we'll just pick one of the most elegant here, using awk:
This command gives us the desired output:
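One way to keep the two header lines in place while sorting the rest is sketched below. This is an illustrative awk variant rather than necessarily the author's exact command: the fflush() call is an addition that pushes the header lines out before sort writes anything, and the sample data and variable names are invented.

```shell
# Hypothetical sample data, deliberately out of hostname order.
machines_json='[
  {"hostname": "vm-3", "status_name": "Ready"},
  {"hostname": "vm-1", "status_name": "Ready"},
  {"hostname": "vm-2", "status_name": "Deployed"}
]'

# awk prints the first two lines (headings and rule) immediately and hands
# every later line to sort, which emits its output once awk finishes.
sorted=$(printf '%s\n' "$machines_json" | jq -r '
  (["HOSTNAME", "STATUS"] | (., map(length * "-"))),
  (.[] | [.hostname, .status_name])
  | @tsv' \
  | awk 'NR < 3 { print; fflush(); next } { print | "sort -k1,1" }')

printf '%s\n' "$sorted" | column -t
```

Changing -k1,1 to another field number sorts the data rows on that field instead, while the headings stay on top.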
Note that by changing the numerical “-k” argument to “sort,” you can change which field controls the sort:
This command sorts by machine state, which is the fourth field:
Summary
At this point, it should be clear that jq is a relatively simple, powerful tool for formatting output from the MAAS CLI. You should also remember that, like any Ubuntu CLI command, jq simply outputs text — so anything you can do with text output, you can do with the output from jq. In the next post, we'll look at some ways to use jq to automatically write CLI scripts to automate various routine MAAS operations.