Here at Bibliocloud we send ONIX to around thirty recipients, and I think it's fair to say that no two feeds are exactly the same -- different content goes to each client even for the same product at the same time.

This may be a surprise, since ONIX is a capital-S Standard in the book trade, or, more accurately, it is a set of standards that cover various different functionalities and versions. However, here we're concerned with one form – ONIX for Books – for the exchange of product metadata.

But given that ONIX for Books is a Standard, why would a single product not have a single version at any one time, to be sent to all recipients? Isn't that the point of a Standard?

If only it were so ... and for the following reasons, it is not. What follows is a list of some of the features that ONIX recipients ask us for.

1. Proprietary Codes, Please

Identification is a perpetual bugbear of the publishing industry.

The ISBN is thankfully well-established as a product identifier, but as far as the rest goes it's still the Wild West out there. ISTCs appear to be dead or dying as a work identifier; DOI is well-established in the academic market; and ISNI has a strong and growing claim as an authoritative identifier of public personae (authors, primarily).

Outside of the English-speaking world there are pockets of strongly-established identifiers for publishers (e.g. the Fondscode Boekenbank) and imprints (e.g. Identifiant Marque Electre in France), and the ISNI can be used for such purposes. But in the global publishing market there are few, maybe no, widely-used organisational identifiers.

And therefore we have ONIX recipients, primarily distributors, who must have their code for an imprint in the ONIX feed, with specific requirements on how it should be included in the feed. Well, if they're big enough then they can get away with it, but in practice I suspect that they have lumbered themselves with a requirement that is so infrequently fulfilled by ONIX vendors that their ONIX take-up by smaller publishers will be very poor. I have heard, anecdotally, of a medium-sized publisher being quoted in excess of £10,000 for the establishment of an ONIX feed to a major UK distributor, an offer which the publisher declined in favour of continuing to type metadata into a web form.

It probably goes without saying that if you start sending one distributor's imprint codes to all of your ONIX recipients, you're soon going to find problems.

Thus we have

Rule One:

For some ONIX recipients you must include their particular identification codes, and you'd best not be sending them to anyone else.
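
As a minimal sketch of Rule One, assume a lookup table keyed by recipient and imprint. The recipient names, imprint name and codes below are all invented for illustration, as is the shape of the returned dictionary.

```python
# Hypothetical table: (recipient, imprint) -> that recipient's own imprint code.
PROPRIETARY_IMPRINT_CODES = {
    ("distributor_a", "Example Press"): "EXP01",
    ("distributor_b", "Example Press"): "7731",
}


def imprint_identifiers(recipient, imprint):
    """Return the proprietary imprint identifiers to include for one recipient.

    A recipient only ever sees its own code; everyone else gets an empty list.
    """
    code = PROPRIETARY_IMPRINT_CODES.get((recipient, imprint))
    if code is None:
        return []
    return [{"id_type_name": recipient, "id_value": code}]
```

Keying the table on the recipient as well as the imprint is what stops one distributor's code from leaking into another recipient's feed.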

2. We Still Use The Old Way

ONIX is a standard, but it is an evolving standard, and particularly in the still-alive-and-kicking ONIX 2.1 there are almost always different ways of representing the same data. ISBN-13 could be represented in its own element, or as a type 15 Product Identifier, or even as a type 03 (GTIN-13) Product Identifier.
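
To illustrate, here is a rough sketch that renders the same ISBN-13 as either a type 15 or a type 03 Product Identifier. The `style` setting is a hypothetical per-recipient option, not part of ONIX, and the standalone-element form is left out to keep the example short.

```python
from xml.etree import ElementTree as ET

# Hypothetical per-recipient style names mapped to the ProductIDType to emit.
ID_TYPES = {"isbn13": "15", "gtin13": "03"}


def isbn13_composite(isbn13, style):
    """Render an ISBN-13 as a ProductIdentifier composite in the requested style."""
    composite = ET.Element("ProductIdentifier")
    ET.SubElement(composite, "ProductIDType").text = ID_TYPES[style]
    ET.SubElement(composite, "IDValue").text = isbn13
    return ET.tostring(composite, encoding="unicode")


print(isbn13_composite("9780000000002", "isbn13"))
print(isbn13_composite("9780000000002", "gtin13"))
```

The value is identical in both cases; only the type code around it changes, which is exactly why one extra representation tends to mean one extra feed format.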

Rule Two:

For every new way of representing a data item in ONIX, there will be one new format of ONIX required by someone.

3. We're Eight Code Lists Behind

Another way in which ONIX evolves is by the addition of new code values, and re-interpretations of old values.

The Price Qualifier "06" used to be "Corporate price", described as "Price for sale to libraries or other corporate or institutional customers". The use of "or" is a giveaway, because now, of course, "06" is "Corporate / Library / Education price", and there are more specific codes (six of them) to allow differentiation between the subtypes.

None of which will stop some recipients of library prices from requiring that they be coded as "06".

Rule Three:

The same data is sometimes sent under different codes.
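
One way to handle this, sketched under the assumption of a per-recipient "downgrade" table, is to translate newer codes back to the older ones a recipient still expects. The recipient names are invented, and "07" stands in here for one of the newer, more specific price qualifiers purely for illustration.

```python
# Hypothetical per-recipient tables: newer code -> older code the recipient accepts.
PRICE_QUALIFIER_DOWNGRADES = {
    "legacy_library_supplier": {"07": "06"},
}


def price_qualifier_for(recipient, qualifier):
    """Return the price qualifier to send, falling back to the original code."""
    return PRICE_QUALIFIER_DOWNGRADES.get(recipient, {}).get(qualifier, qualifier)
```

Recipients with no entry in the table receive the code unchanged, so the default behaviour stays on the current code list.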

4. We Don't Need That

ONIX is a broad format, and it covers many business cases for transferring metadata: publisher to distributor, publisher to sales agent, publisher to retailer, and so on. For some of these you need to include particular data that is not required for others:

  • Retailers need to know the consumer prices, not the library prices. They might need only particular currencies, and if they only price in a single currency then telling them which territories that price applies in is unnecessary.
  • Sales agents already know who the sales agents are (it's them), but might need to know the full range of prices – consumer/library, GBP/USD/AUD – and in which territories each currency should be used.
  • Metadata aggregators quite often don't need to know who all your distributors are, but if they're broadcasting metadata round the world then the list of sales agents per territory is going to be useful.

It is not necessarily the case that sending the extra data is going to break anything (see Rule Seven for that), but every extra data item you send is another potential data change that you have to check for, which might cause data to be sent too frequently to recipients. If you change your sales agent for Algeria, do you then want to send a complete refresh for every product with world sales rights to every recipient?

Rule Four:

Not every recipient needs every item of data. Be prepared to trim off the excess.
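
A minimal sketch of the idea, assuming product records are plain dictionaries and each recipient has a list of the fields it wants. The field names, recipient names and the use of a hash to spot changes are all assumptions made for the example.

```python
import hashlib
import json

# Hypothetical per-recipient lists of the fields each one actually needs.
FIELDS_WANTED = {
    "retailer": {"isbn", "title", "consumer_prices"},
    "aggregator": {"isbn", "title", "consumer_prices", "sales_agents"},
}


def trim(record, recipient):
    """Keep only the fields this recipient has asked for."""
    wanted = FIELDS_WANTED[recipient]
    return {key: value for key, value in record.items() if key in wanted}


def fingerprint(record, recipient):
    """Hash the trimmed record, so changes to unsent fields are not seen as changes."""
    payload = json.dumps(trim(record, recipient), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

With this arrangement, changing the sales agent for Algeria alters the aggregator's fingerprint but leaves the retailer's untouched, so the retailer is not sent a needless refresh.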

5. ONIX Doesn't Do What We Need

ONIX is also a narrow format. There is a wide range of metadata that it is not designed to carry, but the standard can be bent slightly to accommodate extra requirements.

One example of this is the identification of POD printers to a distributor. A POD arrangement might be in place with different printers in different territories, or a product might be a stock item in some territories and POD in others. It might be true POD, or it could be an automatic stock replenishment (ASR) program designed to keep a minimum number of units in stock.

This can require logic such as "when sending this product to this recipient, against this supplier add one or more codes to represent this set of POD/ASR arrangements".

Rule Five:

Sometimes you will need to go beyond the natural capabilities of ONIX.

6. Not In That Order

"We can read those codes, but only the first six, so please make sure that they are ordered according to the following arrangement."

Typically observed with Related Product composites, and reflecting a weakness in the capabilities of the recipient's system, we sometimes cannot rely on sending all the metadata and letting the recipient pick out the interesting parts.

Rule Six:

It's not just the data, it's the order in which you include it.
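
Something along these lines could cover the "only the first six" case. It is a sketch that assumes related products are dictionaries with a `relation_code` key, and that each recipient supplies its preferred order of codes; the codes used in the usage example are placeholders.

```python
def order_related_products(related, priority, limit):
    """Sort related products by the recipient's preferred relation codes, then truncate.

    Codes missing from `priority` sort last and keep their original relative order,
    because Python's sort is stable.
    """
    rank = {code: position for position, code in enumerate(priority)}
    ordered = sorted(related, key=lambda item: rank.get(item["relation_code"], len(rank)))
    return ordered[:limit]


related = [
    {"relation_code": "23", "isbn": "A"},
    {"relation_code": "06", "isbn": "B"},
    {"relation_code": "13", "isbn": "C"},
]
print(order_related_products(related, priority=["06", "13"], limit=2))
```

Sorting before truncating means the items the recipient cares about most are the ones that survive the cut.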

7. That Breaks Our System

Just because ONIX is valid doesn't mean it's valid for everyone.

Particularly seen in composites that can be repeated an arbitrary number of times, some systems will only accept particular codes for those composites. Add the "wrong" one, and that's your data not ingested.

Often expressed as "We can only accept one Work Identifier".

Rule Seven:

One person's "valid ONIX" is another person's "we can't read this".
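
For the "only one Work Identifier" case, a sketch might pick the single identifier a recipient is most likely to accept. The identifier type names and the per-recipient preference list are assumptions for the example.

```python
def pick_work_identifier(identifiers, preferred_types):
    """Return at most one identifier, chosen by the recipient's order of preference."""
    for wanted in preferred_types:
        for identifier in identifiers:
            if identifier["type"] == wanted:
                return [identifier]
    # Nothing acceptable: send no work identifier rather than one that breaks ingestion.
    return []


work_ids = [
    {"type": "proprietary", "value": "W-123"},
    {"type": "DOI", "value": "10.0000/example"},
]
print(pick_work_identifier(work_ids, preferred_types=["ISTC", "DOI"]))
```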

8. Different Market, Different Data

I think it is fair to say that publishers in the UK have a particular perspective on prices.

  • No tax on print books.
  • One rate of tax on eBooks.
  • No need to think about tax for the rest of the world.

And the slight variation in there is that academic publishers often want to specify eBook prices exc-tax, and trade publishers want to specify inc-tax. UK recipients of GBP pricing generally want inc-tax prices, and for the most part this is entirely compatible with the publisher's viewpoint.

In Europe there appears to be a slightly different approach, in that specifying inc-tax prices in their home currency means picking their own VAT rate and applying it to their Euro prices. When the price is sent to a trade partner in a different country who also uses Euros, there are a couple of possibilities:

  • Use the same inc-VAT price, but with local taxes applied instead.
  • Use the same exc-VAT price, converted to new inc-VAT prices at the local tax rates.
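
The second option above comes down to simple arithmetic: inc-VAT = exc-VAT × (100 + rate) / 100. The sketch below uses illustrative rates that are not tied to any real country, and rounding half-up to the cent is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP


def inc_vat(exc_vat_price, vat_rate_percent):
    """Convert an exc-VAT price to an inc-VAT price at the given percentage rate."""
    price = Decimal(exc_vat_price) * (Decimal(100) + Decimal(vat_rate_percent)) / Decimal(100)
    # Assumption: round half-up to two decimal places.
    return price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# The same exc-VAT price gives different inc-VAT prices at different local rates.
print(inc_vat("10.00", "7"))    # 10.70
print(inc_vat("10.00", "5.5"))  # 10.55
```

Prices and rates are passed as strings so that `Decimal` does not inherit binary floating-point error.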

Rule Eight:

The same data item may need to be presented in an entirely different way.

9. Send Different Data

Consider the Short Description -- 350 characters of finely tuned, carefully crafted marketing wizardry, designed to entice the reader. Do you want to send the same text to Amazon.com as you do to Waterstones? Possibly not.

Rule Nine:

Not everything can be blamed on ONIX recipients. Sometimes you just have to do what you have to do.

10. Send Different Products

eBook distributors get PDFs, only send Kindle to Amazon, don't send this to them unless it has a USD consumer price, OpenAccess doesn't go there...

You get the picture.

Rule Ten:

You sometimes need to filter out products by varied criteria.
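
One possible shape for this is a list of small rules per recipient, all of which must pass before a product is sent. The recipient names, product fields and rules below are hypothetical, loosely based on the examples above.

```python
# Hypothetical per-recipient rules; each takes a product dict and returns True to send it.
RECIPIENT_RULES = {
    "kindle_store": [lambda p: p["format"] == "kindle"],
    "us_retailer": [lambda p: "USD" in p["consumer_prices"]],
    "ebook_distributor": [
        lambda p: p["format"] == "pdf",
        lambda p: not p["open_access"],
    ],
}


def products_for(recipient, products):
    """Return the products that pass every rule for this recipient."""
    rules = RECIPIENT_RULES.get(recipient, [])
    return [p for p in products if all(rule(p) for rule in rules)]
```

A recipient with no rules receives every product, since `all()` of an empty list is true.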

Summary

There is no such thing as a Standard, and sometimes you have to:

  • Not send some data.
  • Change the order of the data.
  • Change the meaning of some data.
  • Add extra not-strictly-ONIX data.
  • Add particular identifiers unique to a recipient.
  • etc.

Next up, I'll write about how we track changes to product-level ONIX on a per-recipient basis, to guarantee that they only get truly incremental feeds.