Learning HXL, part 1: hashtags and attributes

This is the first in a series of short, technical articles for people planning to create HXL-tagged data or software to process HXL data. The next post will cover Extending HXL.

HXL hashtags are related to the hashtags used in social media, but there are a few important differences.

HXL tags are categories

In social media, a hashtag usually tells you what something is:

@someone Good flight into #LAX. I will enjoy my time in #CA

#LAX and #CA are values, but there’s no information about what they’re values of (stock symbols? places? moods? TV show abbreviations?). The reader has to guess, based on the context: in this case, the context suggests that #LAX is an airport code, and that #CA is a US state abbreviation (though it might be difficult for a machine to figure that out).

In a HXL dataset, a hashtag tells you the category of the information, while the actual values appear in the columns underneath:

Organisation | Cluster   | District | Subdistrict
#org         | #sector   | #adm1    | #adm2
UNICEF       | Education | Coast    | Capital
Red Cross    | Shelter   | Northern | Plains

While a social-media post might contain only two or three values, a humanitarian dataset can contain thousands, so creating a unique hashtag for each one doesn’t make sense (Guinea alone has 34 prefectures, and 340 subprefectures). Instead, the HXL hashtag #org (for example) tells a machine that everything underneath is the name of an organisation, and HXL-aware systems can use the hashtag to validate, clean, transform, analyse, and visualise the data.
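As a rough illustration of how software might use the hashtag row, the following Python sketch finds that row in a small CSV sample and pulls out every value under a given hashtag. The function names (`split_hxl`, `column_values`) are hypothetical, and the rule for spotting the hashtag row (the first row in which every non-empty cell starts with "#") is an assumption made for this example, not a statement of how any particular HXL library works.

```python
import csv
import io

# Sample data: a header row, an HXL hashtag row, then the values.
SAMPLE = """Organisation,Cluster,District,Subdistrict
#org,#sector,#adm1,#adm2
UNICEF,Education,Coast,Capital
Red Cross,Shelter,Northern,Plains
"""


def split_hxl(rows):
    """Return (hashtags, data_rows) for a list of rows.

    Assumption for this sketch: the hashtag row is the first row in
    which every non-empty cell starts with '#'.
    """
    for index, row in enumerate(rows):
        cells = [cell.strip() for cell in row if cell.strip()]
        if cells and all(cell.startswith("#") for cell in cells):
            return [cell.strip() for cell in row], rows[index + 1:]
    raise ValueError("no HXL hashtag row found")


def column_values(text, hashtag):
    """Return every value that appears underneath the given hashtag."""
    rows = list(csv.reader(io.StringIO(text)))
    hashtags, data = split_hxl(rows)
    position = hashtags.index(hashtag)
    return [row[position] for row in data]


print(column_values(SAMPLE, "#org"))  # ['UNICEF', 'Red Cross']
```

Because the lookup uses the hashtag rather than the text header, the same code would keep working if "Organisation" were renamed or translated.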

Attributes refine hashtags

In HXL, hashtags define very broad categories, like #affected for the number of people affected by a crisis, or #org for an organisation. To specify subcategories, you use attributes (the space before the “+” sign is optional):

#org
An organisation somehow involved.
#org +funder +name
The name of the funding organisation.
#org +impl +code
A code for the implementing NGO.
#country +name +fr
The name of the country in French.
#country +code +iso3
The ISO 3166-1 alpha-3 code for a country.

Attributes give you the chance to add extra information for systems that can understand it. For example, System A might not know (or care) about implementing organisations, but it can still tell that a column contains something to do with organisations; System B, on the other hand, might need to make a careful distinction between funding and implementing partners, and the +impl attribute gives it the extra information it needs to do that.
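The System A and System B distinction can be sketched in a few lines of Python. The regular expression below is a simplified pattern written for this example, not the official HXL grammar; treating hashtags as case-insensitive and ignoring attribute order are likewise assumptions of the sketch, and `parse_hashtag` and `matches` are hypothetical names.

```python
import re

# Simplified pattern: a hashtag, then zero or more attributes, each
# optionally preceded by whitespace before its '+' sign.
HASHTAG = re.compile(r"\s*(#\w+)((?:\s*\+\w+)*)\s*")


def parse_hashtag(spec):
    """Split a hashtag specification into (tag, set of attributes)."""
    match = HASHTAG.fullmatch(spec.lower())
    if match is None:
        raise ValueError(f"not a hashtag specification: {spec!r}")
    attributes = set(re.findall(r"\+(\w+)", match.group(2)))
    return match.group(1), attributes


def matches(column_spec, wanted_tag, wanted_attributes=()):
    """True if the column has the tag and all of the wanted attributes."""
    tag, attributes = parse_hashtag(column_spec)
    return tag == wanted_tag and set(wanted_attributes) <= attributes


# System A only cares that a column is about organisations.
print(matches("#org +impl +code", "#org"))              # True
# System B needs implementing organisations specifically.
print(matches("#org +impl +code", "#org", ["impl"]))    # True
print(matches("#org +funder +name", "#org", ["impl"]))  # False
```

Extra attributes never stop a match in this sketch, which is what lets a simpler system read data tagged for a more demanding one.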

Advanced examples

Sex and age markers

The humanitarian community is becoming more aware of the need to collect sex- and age-disaggregated data (SADD) to understand the needs of affected populations. HXL proposes the attributes +m (male), +f (female), and +i (intersex) to identify sex, and the attributes +infants, +children (may include infants and adolescents), +adolescents, +adults (may include adolescents and the elderly), and +elderly to provide SADD categorisation at a coarse level:

#affected
The number of people (in general) affected by a crisis.
#affected +f
Number of females affected by a crisis.
#affected +f +children
Number of girls affected by a crisis.

Organisations will often have more-exact age categories, like “Age 0-4”, “Age 5-12”, etc. We recommend creating your own specialised attributes and using them together with the recommended HXL ones, so that software that is not designed to work with your categories can still extract some information from the data:

#affected +f +children +age_5_12
Number of girls between the ages of 5 and 12 affected by a crisis.
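To show why combining custom and recommended attributes helps, here is a minimal Python sketch of software that knows only the sex and age attributes listed above. The `describe` function and its output format are hypothetical; the point is only that an unrecognised attribute such as +age_5_12 can be set aside without losing the coarser information.

```python
# Attributes this (hypothetical) system understands.
SEX = {"m": "male", "f": "female", "i": "intersex"}
AGE = {"infants", "children", "adolescents", "adults", "elderly"}


def describe(attributes):
    """Sort a column's attributes into sex, age, and ignored groups."""
    return {
        "sex": [SEX[a] for a in attributes if a in SEX],
        "age": [a for a in attributes if a in AGE],
        # Anything unrecognised is kept aside rather than rejected.
        "ignored": [a for a in attributes if a not in SEX and a not in AGE],
    }


print(describe(["f", "children", "age_5_12"]))
# {'sex': ['female'], 'age': ['children'], 'ignored': ['age_5_12']}
```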

Multilingual data

Many humanitarian data fields, such as organisation names, placenames, and activity descriptions, are human-readable and written in natural languages (rather than as numbers and codes). HXL has a convention that two-letter attributes are ISO 639-1 language codes, like +ar for Arabic or +fr for French:

#org +name +en
Organisation name in English.
#org +name +fr
Organisation name in French.
#activity +desc +es
Activity description in Spanish.

In a dataset, you might use them like this:

Org (English)           | Org (français) | Province (English) | Province (français)
#org +name +en          | #org +name +fr | #adm1 +name +en    | #adm1 +name +fr
WHO                     | OMS            | Coast              | Côte
UNICEF                  | UNICEF         | North              | Nord
Doctors without Borders | MSF            | Coast              | Côte
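One way software could use the language attributes is to pick the best available column for a reader's preferred languages. The Python sketch below assumes a simple fallback order supplied by the caller; `split_spec` and `find_column` are hypothetical helpers, and the hashtag parsing is deliberately naive.

```python
HASHTAGS = ["#org +name +en", "#org +name +fr",
            "#adm1 +name +en", "#adm1 +name +fr"]
ROW = ["WHO", "OMS", "Coast", "Côte"]


def split_spec(spec):
    """Naively split '#tag +a +b' into ('#tag', ['a', 'b'])."""
    tag, *attributes = spec.replace(" ", "").split("+")
    return tag, attributes


def find_column(hashtags, tag, languages):
    """Index of the column with this tag in the most preferred language.

    Languages are tried in the order given; returns None if no column
    matches any of them.
    """
    for language in languages:
        for index, spec in enumerate(hashtags):
            column_tag, attributes = split_spec(spec)
            if column_tag == tag and language in attributes:
                return index
    return None


# Prefer Spanish, then French, then English.
index = find_column(HASHTAGS, "#org", ["es", "fr", "en"])
print(ROW[index])  # OMS
```

There is no Spanish column in this sample, so the lookup falls back to the French one.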

For more information about HXL hashtags and attributes, download a copy of the HXL postcard in English, French, Arabic, or Spanish.