About XML

XML is one of those boring ideas that can make business run more smoothly, like ISBN numbers or barcodes. Really it’s just some general rules for how to write down information so that computers as well as people can read it – mainly computers, though. It’s not even a full set of rules; it’s just enough to help people make a start on designing their own formats for sharing information.

So, in publishing, people have taken the XML rules and added some extra, book-specific ones until they came up with a standardised way of describing new titles – or old ones, for that matter. The standardisation makes it easy for the whole industry to share information. Publishers can tell all the book stores, book clubs, library services and industry databases about a new title by sending them all a copy of the same XML file. In other sectors, XML is being used as the starting point for storing a multitude of different types of information. There are XML standards for creating an invoice or describing a gene – or (to return to publishing) for storing the contents of an e-book in a standardised way.

The guts of XML (made palatable for non-techies)

The first thing to learn about XML is good news: it’s written in English. Or rather it’s written using ordinary words, with a few squiggles added, and not in some sort of computer hieroglyphics.

So, imagine you’re sending out details of a new book you’re publishing. Naturally you want everyone to add its details to their stock systems so they can easily order it. Let’s start with the title, The Life and Times of Ned Lud. A human being can take a guess that it’s a book title, but in XML you always label information to make it clear. So we might write this:

<TitleText>The Life and Times of Ned Lud</TitleText> 

It’s like that maxim for lecturing: tell people what you’re going to say, then say it, then tell them what you just said. So that line above says: Here comes something called TitleText, ‘The Life and Times of Ned Lud’, that’s the end of the TitleText. When you put ‘/’ in front of a label you’re marking the end of something.

Of course you can make up your own names for information. You could choose a different label altogether, but it just so happens that <TitleText> is the name that’s been agreed on by a largish group of book publishers as part of the ONIX standard. The bit in the angle brackets is called a ‘tag’. It’s fairly easy to see why; you ‘tag’ information to say what it means.

For instance, if all you have is the piece of text ‘Winston Churchill’, it’s difficult to tell whether that’s a book about him or a book written by him – both are plausible. So everything gets tagged for clarity’s sake. Take a look at this:

<Author>Katie Daynes</Author>
<TitleText>Winston Churchill</TitleText>
<ISBN>074606814X</ISBN>
<Publisher>Usborne Publishing Ltd</Publisher>

Hopefully it’s pretty obvious what that all means. Each ‘tag’ has a start and an end and the bit in the middle is the information you want to share – also known as the ‘contents’ of the tag. A computer can easily read it and so can a person (with a little effort). Unfortunately for us, information about books gets complicated and so the XML used to store it has to get complicated too.

For instance, what do we do if there’s more than one author? Or if there’s no author, just an editor and some contributors? And what about all the other pieces of information we might want to share, like publication date, price and distributor details? The people who wrote ONIX came up with something that allows you to store a vast amount of information about a title in a structured way.

One feature of XML the ONIX designers made use of was the idea of putting one tag inside another. I want to show you what that looks like, but real ONIX documents are a bit difficult to read, so this next example is just a made-up one; it doesn’t follow the ONIX standard. But on the plus side, it’s actually possible for a human to understand it.

<Book>
  <Title>
    <MainTitle>The Life and Times of Ned Lud</MainTitle>
    <SubTitle>Backward Looking Visionary</SubTitle>
  </Title>
  <Author>
    <FirstName>Emma</FirstName>
    <Surname>Barnes</Surname>
  </Author>
  <Illustrator>
    <FirstName>Rob</FirstName>
    <Surname>Jones</Surname>
  </Illustrator>
</Book>

So in this made-up example, if a <Surname> tag is inside an <Author> tag, then it’s the name of an author; if it’s inside an <Illustrator> tag, then it’s the name of an illustrator.
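If you’re curious how a computer actually tells those two surnames apart, here is a minimal sketch in Python, using the XML reader that comes built in to the language. It reads the made-up example above and asks for the surname by its ‘path’ through the tags. The variable names are my own invention for illustration, and this is just one way of doing it.

```python
import xml.etree.ElementTree as ET

# The made-up (non-ONIX) example from above, held as a piece of text.
BOOK_XML = """
<Book>
  <Title>
    <MainTitle>The Life and Times of Ned Lud</MainTitle>
    <SubTitle>Backward Looking Visionary</SubTitle>
  </Title>
  <Author>
    <FirstName>Emma</FirstName>
    <Surname>Barnes</Surname>
  </Author>
  <Illustrator>
    <FirstName>Rob</FirstName>
    <Surname>Jones</Surname>
  </Illustrator>
</Book>
"""

# Turn the text into a tree of tags, starting at <Book>.
book = ET.fromstring(BOOK_XML)

# Same tag name, different parent tag, different meaning.
author_surname = book.findtext("Author/Surname")
illustrator_surname = book.findtext("Illustrator/Surname")

print(author_surname)       # Barnes
print(illustrator_surname)  # Jones
```

The program never has to guess which surname is which; the tag it sits inside settles the question.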

Ok. So there’s lots more you could learn about XML, but that’s enough so you can join in fun dinner party conversations on the subject and look techies in the eye without flinching. Let’s get back to the real world of publishing.

The ONIX Standard

If you want an easy way to tell Nielsen or Amazon or Waterstones about a new book, you can put all the relevant info in an ONIX message and e-mail or FTP it to them. In case you’re interested, the British contributors to the ONIX standard were the BIC, made up of the Library Association, The British Library, The Booksellers Association and The Publishers Association, so it’s got some weight behind it.

It’s a gigantic and complicated standard because it needs to be able to hold gigantic and complicated amounts of information for each title. For instance, it gives you tags for listing the back cover quotes on your book, and giving the names of each quote contributor and the organisations they work for. It lets you include details of discounts and promotions by date and region. It holds information on formats and rights and physical dimensions – and even what units the measurements are being given in.

Unfortunately for anyone who wants to open an ONIX message and actually read the contents, the standard also makes use of numbers where a name would have been easier to read. For instance, if you want to know whether a book has been published yet you could look at the <PublishingStatus> tag. Here’s one:

<PublishingStatus>04</PublishingStatus>

But what does ‘04’ mean? Well, if you hunt down a copy of the ONIX documentation you’ll find a list called List 64: Publishing status, with entries such as:

04: Active. The product was published, and is still active in the sense that the publisher will accept orders for it, though it may or may not be immediately available, for which see <SupplyDetail>.

and

05: No longer our product. Ownership of the product has been transferred to another publisher (with details of acquiring publisher if possible in PR.19).

And many more. There are over 150 of these lists explaining what the various different numbers and codes mean, so while humans can get the gist of what’s in an ONIX message, the details are often hard to follow. The ONIX people could have chosen to use the words ‘Active’ and ‘No longer our product’ instead of the numbers ‘04’ and ‘05’, but they probably took the view that machines, rather than people, would be reading these messages and numbers were more concise.
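For a machine, looking up a code is trivial. As a rough sketch in Python (the function name and the little table are my own, and the table holds only the two List 64 entries quoted above, where a real system would load every entry of every list):

```python
import xml.etree.ElementTree as ET

# Only the two List 64 entries quoted in this article; the real list is longer.
PUBLISHING_STATUS = {
    "04": "Active",
    "05": "No longer our product",
}

def describe_status(xml_text):
    """Read a <PublishingStatus> tag and return its meaning in words."""
    element = ET.fromstring(xml_text)
    code = (element.text or "").strip()
    # Fall back to a readable message for codes missing from our tiny table.
    return PUBLISHING_STATUS.get(code, "Unknown code: " + code)

print(describe_status("<PublishingStatus>04</PublishingStatus>"))  # Active
```

So the numbers cost the computer nothing; it’s only the human reader who has to go hunting through the documentation.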

How can I use ONIX?

Having established that ONIX messages are swines to read – unless you’re a machine – the obvious thing to do is enlist the help of a machine whenever you want to work with ONIX.

Bibliocloud, and any publishing management system worth its salt, makes up ONIX messages. Instead of us looking up what all the tags and numbers mean, the software does that. The system gives you helpful forms to fill out; they list the available options in words. Then, behind the scenes, the program inserts the relevant code on your behalf. So if you choose ‘Paperback’ from the drop-down list, the program puts the code ‘BC’ into your ONIX message, saving you the bother of looking it up. (If we actually had to write ONIX messages by hand, we probably wouldn’t bother.)
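To give a flavour of what happens behind the scenes, here is a hedged sketch in Python. It is not how Bibliocloud or any particular system really does it. The function name is invented, the table holds just the one pairing mentioned above, and I’m assuming the code goes in a tag called <ProductForm>, which is what ONIX 2.1 uses for it as far as I know.

```python
import xml.etree.ElementTree as ET

# Words shown in the drop-down list, mapped to the codes ONIX expects.
# Only the pairing mentioned in the article; a real table is much longer.
PRODUCT_FORM_CODES = {
    "Paperback": "BC",
}

def product_form_xml(choice):
    """Turn a drop-down choice into a line of XML (tag name is an assumption)."""
    element = ET.Element("ProductForm")
    element.text = PRODUCT_FORM_CODES[choice]
    return ET.tostring(element, encoding="unicode")

print(product_form_xml("Paperback"))  # <ProductForm>BC</ProductForm>
```

You pick a word, the software writes the code, and nobody has to open the documentation.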

You send your ONIX files to anyone you like: Nielsen, Bowker, Amazon and so on. You can also download them if you want to import them into other programs such as InDesign to make your catalogues or AIs. This is nothing new: check out a seven-year-old video of mine here: https://www.youtube.com/watch?v=DMdc2psf01Y.

But most publishers still aren’t harnessing the power of structured data, whether that’s ONIX, another sort of XML or JSON (which is like XML, but more concise). If you did your last catalogue by hand, and survived the ordeal, you should really find out more about what computers can do for you and save yourself (and your company) from death-by-copy-and-pasting.