Hacking HXL: using tag patterns

Quick overview

Tag patterns let you choose columns in a HXL dataset based on the base hashtag and a list of attributes that must or must not be present. They are not an official part of the HXL standard, but several toolsets support them, including the HXL Proxy, the Python HXL library, and the Javascript HXL library. Here is an example (whitespace is optional):

#country +code -origin

A tag pattern looks like a regular HXL hashtag specification, with two exceptions:

  1. The leading hash sign is optional (it can be awkward on command lines and in URL parameters).
  2. Attribute names can begin with a “-” as well as a “+”.

In the above example, #country is the base hashtag, and the other two elements are attribute patterns: +code means that the code attribute must be present, and -origin means that the origin attribute must not be present.
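To make the syntax concrete, here is a minimal Python sketch of how a tag pattern string could be split into its three parts. The function name parse_tag_pattern is hypothetical and not part of any HXL library. It assumes attribute names are plain words (letters, digits, underscores) and lower-cases everything, since HXL hashtags and attributes are case-insensitive.

```python
import re

# Hypothetical helper for illustration; not the API of any HXL library.
# One token = an optional sign ("+" or "-") followed by a name.
_TOKEN = re.compile(r"([+-]?)([a-z0-9_]+)")

def parse_tag_pattern(text):
    """Split a tag pattern into (base_tag, required_attrs, forbidden_attrs)."""
    # Whitespace is optional, so remove it; compare case-insensitively.
    s = re.sub(r"\s+", "", text).lower()
    # The leading hash sign is optional.
    if s.startswith("#"):
        s = s[1:]
    tokens = _TOKEN.findall(s)
    # Reject input containing characters the tokenizer skipped (e.g. a stray "#").
    if "".join(sign + name for sign, name in tokens) != s:
        raise ValueError("malformed tag pattern: %r" % text)
    # The first token must be the unsigned base hashtag.
    if not tokens or tokens[0][0] != "":
        raise ValueError("tag pattern must start with a base hashtag: %r" % text)
    base = "#" + tokens[0][1]
    required = {name for sign, name in tokens[1:] if sign == "+"}
    forbidden = {name for sign, name in tokens[1:] if sign == "-"}
    return base, required, forbidden
```

For example, parse_tag_pattern("#country +code -origin") and parse_tag_pattern("country+code-origin") both return ("#country", {"code"}, {"origin"}).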

Deep dive

HXL is designed to be flexible, and to retrofit onto datasets that people already have. As a result, data consumers should always be ready to accept (and ignore) hashtags or attributes that they don’t recognise.

When you use a HXL tag pattern, you don’t have to be able to anticipate every possible attribute that a data provider might use, or the order in which the data provider will use them. You simply specify what has to be present and what has to be absent, and assume that the rest (if any) don’t matter.
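The rule above boils down to two set comparisons: every required attribute must appear among the column's attributes, and no forbidden attribute may. Because attributes are treated as a set, their order does not matter and any extra attributes are ignored. This is a minimal sketch that assumes the pattern has already been split into a base tag plus required and forbidden sets; the helper names hashtag_parts and pattern_matches are hypothetical.

```python
def hashtag_parts(spec):
    """Hypothetical helper: split '#country+dest+code' into ('#country', {'dest', 'code'})."""
    # Drop any whitespace and compare case-insensitively.
    parts = "".join(spec.split()).lower().split("+")
    return parts[0], set(parts[1:])

def pattern_matches(base, required, forbidden, spec):
    """True if the column's hashtag spec satisfies the (already parsed) pattern."""
    tag, attrs = hashtag_parts(spec)
    # Same base hashtag, all required attributes present, no forbidden ones.
    # Attributes the pattern does not mention are simply ignored.
    return tag == base and required <= attrs and not (forbidden & attrs)
```

With the pattern #country +code -origin, both #country+code+dest and #country+dest+code match, while #country+code+iso3+origin does not.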

Examples

Here are some examples of patterns, with hashtag specifications that they do and do not match:

Tag pattern: sector
  Matches: #sector, #sector+code, #sector+name+fr
  Non-matches: #adm1+code, #inneed, #geo+lat

Tag pattern: #country +code -origin
  Matches: #country+code, #country+code+dest, #country+dest+code, #country+iso3+code
  Non-matches: #country, #country+origin+code, #country+origin, #country+code+iso3+origin

Tag pattern: #population+f
  Matches: #population+f, #population+elderly+f+adults
  Non-matches: #population, #population+m

Tag pattern: affected-m
  Matches: #affected, #affected+children, #affected+f+infants+age_0_2
  Non-matches: #affected+m, #affected+children+m+age_5_12, #affected+m+f
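The examples can also be checked mechanically. This self-contained sketch combines a compact parser and matcher (hypothetical helpers, simplified to assume well-formed input with plain-word attribute names, and not the code used by the HXL Proxy or the HXL libraries) and asserts every row of the examples above.

```python
import re

def parse_pattern(text):
    """Return (base_tag, required, forbidden) for a pattern like '#country +code -origin'."""
    # Whitespace and the leading hash are optional; compare case-insensitively.
    s = re.sub(r"\s+", "", text).lower().lstrip("#")
    tokens = re.findall(r"([+-]?)(\w+)", s)
    base = "#" + tokens[0][1]
    required = {name for sign, name in tokens[1:] if sign == "+"}
    forbidden = {name for sign, name in tokens[1:] if sign == "-"}
    return base, required, forbidden

def matches(pattern, spec):
    """True if the hashtag spec (e.g. '#country+dest+code') matches the pattern."""
    base, required, forbidden = parse_pattern(pattern)
    tag, *attrs = "".join(spec.split()).lower().split("+")
    attrs = set(attrs)
    return tag == base and required <= attrs and not (forbidden & attrs)

# (pattern, matches, non-matches), copied from the examples above.
EXAMPLES = [
    ("sector",
     ["#sector", "#sector+code", "#sector+name+fr"],
     ["#adm1+code", "#inneed", "#geo+lat"]),
    ("#country +code -origin",
     ["#country+code", "#country+code+dest", "#country+dest+code", "#country+iso3+code"],
     ["#country", "#country+origin+code", "#country+origin", "#country+code+iso3+origin"]),
    ("#population+f",
     ["#population+f", "#population+elderly+f+adults"],
     ["#population", "#population+m"]),
    ("affected-m",
     ["#affected", "#affected+children", "#affected+f+infants+age_0_2"],
     ["#affected+m", "#affected+children+m+age_5_12", "#affected+m+f"]),
]

for pattern, good, bad in EXAMPLES:
    for spec in good:
        assert matches(pattern, spec), (pattern, spec)
    for spec in bad:
        assert not matches(pattern, spec), (pattern, spec)
print("all examples behave as expected")
```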

For coders

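Using a tag pattern in a filter in Python (without_columns removes every column that matches the pattern):

import hxl

# Load the OurAirports HXL data for Guinea (country code GN)
data = hxl.data("http://ourairports.com/countries/GN/airports.hxl")
# Remove every #loc+airport+code column except those tagged +local
filtered = data.without_columns('loc+airport+code-local')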
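To see which columns survive a filter like this, here is a library-free sketch of the same idea on an in-memory table whose first row holds the hashtags. The helpers matches and drop_matching_columns are hypothetical stand-ins, not the HXL library's implementation, and the sample rows are invented for illustration rather than taken from the OurAirports data.

```python
import re

def matches(pattern, spec):
    """Simplified tag-pattern test: same base tag, required attrs present, forbidden absent."""
    s = re.sub(r"\s+", "", pattern).lower().lstrip("#")
    tokens = re.findall(r"([+-]?)(\w+)", s)
    required = {name for sign, name in tokens[1:] if sign == "+"}
    forbidden = {name for sign, name in tokens[1:] if sign == "-"}
    tag, *attrs = "".join(spec.split()).lower().split("+")
    attrs = set(attrs)
    return tag == "#" + tokens[0][1] and required <= attrs and not (forbidden & attrs)

def drop_matching_columns(rows, pattern):
    """Hypothetical column filter: rows[0] is the hashtag row; drop matching columns."""
    keep = [i for i, spec in enumerate(rows[0]) if not matches(pattern, spec)]
    return [[row[i] for i in keep] for row in rows]

# Hypothetical sample data, not the real OurAirports feed.
rows = [
    ["#loc+airport+name", "#loc+airport+code+icao", "#loc+airport+code+local"],
    ["Example Airport", "XXXX", "EX1"],
]
for row in drop_matching_columns(rows, "loc+airport+code-local"):
    print(row)
```

Only the +icao code column is removed: the name column lacks +code, and the local code column carries the forbidden +local attribute, so the sketch prints ['#loc+airport+name', '#loc+airport+code+local'] followed by ['Example Airport', 'EX1'].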

Listing matching columns in Python:

import hxl

data = hxl.data("http://ourairports.com/countries/GN/airports.hxl")
pattern = hxl.model.TagPattern.parse("loc+airport+code-local")
for column in data.columns:
    if pattern.match(column):
        print(column)

Using a tag pattern in a filter in Javascript:

hxl.load("http://ourairports.com/countries/GN/airports.hxl", function(dataset) {
    // Remove every #loc+airport+code column except those tagged +local
    var filtered =
      dataset.without_columns("loc+airport+code-local");
    console.log(filtered);
});

Listing matching columns in Javascript:

hxl.load("http://ourairports.com/countries/GN/airports.hxl", function(dataset) {
    var pattern = hxl.classes.Pattern.parse("loc+airport+code-local");
    // Print only the columns that match the pattern
    dataset.columns.forEach(function (col) {
        if (pattern.match(col)) {
            console.log(col);
        }
    });
});