Another Year Again: 2017 this time

· by joe · Read in about 22 min · (4516 Words)

This time last year I came to the conclusion that it’s better to celebrate new year in the summer than the winter. See what you’re dealing with. Get it over with in the daylight where you can see the thing clearly rather than stumbling around for it in the dark and probably knocking it over. And whilst there is something to be said for huddling around a cauldron of mulled wine bubbling on the stove, sacrificing ritual resolutions to ourselves to Do Better This Year so that the sun will rise again, I find that I can see so much further in the daylight.

Not that you’d necessarily want to. It’s been a bit of a glum year at home and abroad. As we tuned into BBC Radio 4 to hear Big Ben strike two thousand and seventeen, there was a hint of impatience in the announcer’s voice. “Let’s get this year over with”, it seemed to say. The schedulers at Radio 4 almost deliberately looked the other way, deciding to sandwich the brief nod to 2016 between a programme about William Blake and one about the Musicians’ Union. Give me an imprecise summer day any time.

I am very lucky to be able to say that, geopolitical circumstances notwithstanding, I’ve had a very interesting past year. I celebrate four years doing R&D at Crossref, three years living in Chateau Monstronauticus, our no-longer-new boat, two years of bee-keeping and one year of marriage. I’ve spent most of the past year with my head in Event Data, thinking, building, talking and occasionally listening.

So, it’s time for my annual display of synchronized navel-gazing.

3:AM Altmetrics Conference

In August 2016 I headed to Bucharest for the 3:AM Altmetrics Conference and the associated Altmetrics16 Workshop and hackathon. The Workshop took place on the first day and had a greater focus on the finer details of altmetrics research. The Conference on the following two days had a broader scope. With the conference in Bucharest, there was a pleasing shift away from the American and Western European bubble that so many of us, myself included, inhabit, and it was interesting to hear about scholarly publishing in the shadow of the legacy of the Eastern Bloc.

The theme of this year’s Workshop was ‘Moving beyond counts: integrating context’. Altmetrics is a field that seeks to capture alternative information about how people interact with scholarly literature. It looks beyond traditional citations to capture activity on social media, blogs, etc.

Over the course of early 2016, I worked with my colleagues Martin Fenner at DataCite, Maddy Watson at Crossref and others to find a common data model that could be used to express the kind of data we were collecting. What started as a perfectly innocent project in Crossref Labs, to see how many links we could find to scholarly literature out on the web and social media, has slowly turned into a bit of a beast. On two occasions the amount of data moving through the system turned out to be an order of magnitude more than I expected. Which is a nice problem to have.

The more I look the more I find, and the more diverse the data becomes. At its core we’re still trying to find instances of links between A and B, of type C, but we’re looking in new places and finding new nuance to the links. For example, how do you represent the constantly churning, never settled, references from Wikipedia to various literature? How do you do the same for Twitter, whilst sticking to the Terms of Use? The project has come a long way.
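The core idea, a link from A to B of type C, can be sketched as a small record type. This is a minimal illustration in Python, not Event Data’s published schema: the field names (`subj_id`, `obj_id`, `relation_type_id`, `source_id`, `occurred_at`), the helper `diff_revisions` and the relation name `no_longer_references` are assumptions chosen for the example. It shows one way of coping with churning Wikipedia references: compare successive revisions of a page and record each addition or removal as its own event, rather than overwriting a single snapshot.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One observed link: subject A relates to object B with relation type C."""
    subj_id: str           # A, e.g. a Wikipedia page URL
    obj_id: str            # B, e.g. a DOI
    relation_type_id: str  # C, e.g. "references"
    source_id: str         # where the link was observed, e.g. "wikipedia"
    occurred_at: datetime  # when the change was observed


def diff_revisions(page: str, old_refs: set[str], new_refs: set[str],
                   when: datetime) -> list[Event]:
    """Emit one event per reference added or removed between two revisions.

    Field names and the "no_longer_references" relation are illustrative
    assumptions, not a published schema.
    """
    added = [Event(page, doi, "references", "wikipedia", when)
             for doi in sorted(new_refs - old_refs)]
    removed = [Event(page, doi, "no_longer_references", "wikipedia", when)
               for doi in sorted(old_refs - new_refs)]
    return added + removed


page = "https://en.wikipedia.org/wiki/Example"
for event in diff_revisions(page, {"10.5555/a", "10.5555/b"},
                            {"10.5555/b", "10.5555/c"}, datetime(2017, 1, 1)):
    print(event.relation_type_id, event.obj_id)
# references 10.5555/c
# no_longer_references 10.5555/a
```

Keeping every change as a separate, timestamped event preserves the history, which is exactly the kind of context that a single count would throw away.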

In my opinion, although the genesis of the field of altmetrics was an effort to move beyond existing imperfect metrics (which didn’t take account of modern ways that people interact with literature), merely generating a new metric with new information is reductive. In fairness I don’t know of any altmetrics providers who encourage that, but it was a bit disconcerting to see consumers of altmetrics using the scores as an end in themselves, in a variety of applications.

Crossref is all about links, so Event Data, being as it is a stream of links, is squarely in our focus and remit. Not only would producing a metric be counter to our mission, it would also be producing data of fundamentally the wrong level of detail.

It was in this state of mind that I presented at the Workshop. I identified a few different kinds of ‘context’ including:

  • There’s the social context: a subtly different one surrounds each platform. The denizens of Reddit, for example, move in different ways to those on Twitter. Blog authors probably take more time and care over each blog post, and write a different kind of literature to the authors of Tweets.
  • Then there’s the technical context of the platform. There’s no such thing as a retweet on Reddit (although there are re-posts which are similar). Tweets can’t be edited (but can be deleted), but Wikipedia articles can be edited and frequently are. Knowing that could be important when interpreting any data.
  • Then there’s the temporal context under which the data is retrieved. Perhaps Reddit was offline for a few hours, or the Twitter API was imposing rate limits, which might explain a gap or slowdown in the data.

My proposition was that any altmetrics-style measurement isn’t much use unless it carries these kinds of context with it. The service we’re building at Crossref isn’t the end-product, but it is a number of steps along the way, and it’s important that all that contextual baggage makes the journey along with the data. Here’s the proposal and a horribly nervous video.

I was due to talk in the first session, but someone was late and the schedule was reshuffled. In the end, instead of going out on a limb with a potentially controversial suggestion, I was gratified to find that I wasn’t the only person in the altmetrics research community with these ideas, and it seemed that the Event Data model was on a firm footing.

The following two days of the Altmetrics Conference were a great chance to hear about what’s going on in the field and to chat to the people doing it. In the last session my colleague Jennifer Kemp and I presented. In a departure from the typical PowerPoint, we performed a light-hearted two-hander. Jennifer played an end-consumer of altmetrics (head of faculty) and I played a know-it-all techie. The message was that you should be careful about how and when you use metrics. The session was written up in a blog post.

Our skit (and I must take responsibility for the script) was intended to be light-hearted and humorous. Maybe because the message, genre and delivery style were far from traditional. Maybe because it was the last session of an intense few days. Maybe our message, that you should be responsible about the provenance chain of metrics, was laid on a bit thick. But it didn’t set the room on fire.

Although someone (I forget who) had suggested earlier in the conference that a neutral non-profit agency should collect the underlying data, and the idea seemed to have a bit of traction, Crossref Event Data is such a unique new thing that I think people may have been cautious of the idea.

That was a year ago, and in a few short weeks comes 4:AM in Toronto. I’m presenting at the conference again, this time on more infrastructural themes. Stacy Konkiel and I are organising and hosting the hack-a-thon, which we’ve re-cast as a do-a-thon. I am also presenting at the Workshop. I am looking forward to a very busy few days!

I had a bit of a Damascene moment and wrote a blog post about how altmetrics are like oil. It’s posted on the 4:AM blog (and also on the Crossref one for good measure). If you want a sneak-preview of what I’m going to talk about, give that a read.


Around that time of year I stole my first honey. I found the whole experience supremely unsettling.

I’ve always enjoyed honey, but it has nothing to do with the reason I keep bees. Still, I’ve given them a home, fed them, medicated them, kept them warm and dry through the winter, and I felt that harvesting honey was a rite of passage. And rites of passage are never passed with un-mixed feelings. I wrote a blog piece about it.

Almost a year on, summer has come round again and again I have removed a solitary piece of honeycomb. It was quickly replaced; alarmingly quickly. So I don’t feel too bad. I’ve also added two supers (the boxes where bees are kindly encouraged to store honey) because they were running out of space, so they seem to be going strong in their massive home.


In November I flew to Reykjavík for PIDapalooza, a festival (read: actually laser-focussed but light-hearted conference) on the subject of Persistent Identifiers (that means labels that you give to things so you can talk about them, and which will hopefully keep working “forever”).

I felt at home in Reykjavík. It wasn’t too cold. It was too wet. I did venture out early one morning for a walk up to the cathedral in all of my thermal clothing and honestly I was nearly blown inside out. I ate soup out of a loaf of bread. On my weekend off I enjoyed skyr and coffee on a large volcanic plain between mountains. I went to a performance of contemporary music in a modern art venue. Two percussionists engaging in a musical conversation with a selection of sonorous objects and the aid of some electronics. There’s a certain snug feeling about being in an art venue, white-painted walls and dim lighting, after dark.

But the night before PIDapalooza, PIDapalooza Eve if you will, was that fateful night that changed America forever again. Some people stayed up and drank as the election results came in. I decided to turn in, wishing to be fresh for the conference the next day. No-one slept.

The next day, bleary-eyed conference attendees discussed persistence whilst wishing the new president elect quite the opposite.

Over the preceding weeks I’d been tackling some practical problems related to Event Data. I was trying to work out the relationship between DOIs and the Article Landing Page URLs that they point to. The easy answer, which is wrong, is that “every DOI points to a different Article Landing Page”. I wrote my results up in a blog post entitled URLs and DOIs: a complicated relationship.
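One simple way to see why “one DOI, one landing page” fails is to invert the mapping and look for URLs that more than one DOI ends up at. This sketch assumes you already have a DOI-to-URL table; gathering it in practice means following doi.org redirects, which the code does not do. The function name `shared_landing_pages` and the example DOIs (under the `10.5555` test prefix) are hypothetical.

```python
from collections import defaultdict


def shared_landing_pages(doi_to_url: dict[str, str]) -> dict[str, list[str]]:
    """Invert a DOI -> landing-page mapping, keeping only URLs that more than one DOI resolves to."""
    url_to_dois: dict[str, list[str]] = defaultdict(list)
    for doi, url in doi_to_url.items():
        url_to_dois[url].append(doi)
    return {url: sorted(dois) for url, dois in url_to_dois.items() if len(dois) > 1}


# Hypothetical observations: two DOIs that end up on the same journal home page.
observed = {
    "10.5555/one": "https://example.com/article/1",
    "10.5555/two": "https://example.com/journal/home",
    "10.5555/three": "https://example.com/journal/home",
}
print(shared_landing_pages(observed))
# {'https://example.com/journal/home': ['10.5555/three', '10.5555/two']}
```

A URL shared by several DOIs can’t, on its own, tell you which article someone linked to, which is one reason matching web links back to DOIs is harder than it looks.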

I presented my research: When PIDs aren’t there: tales from Crossref Event Data. Building on the blog post, it described, to an audience familiar with DOIs (which are the best kind of PIDs), exactly how people aren’t using them.

It was a remarkably interesting meeting. There were some of the same old faces, and many new ones. We heard about identity systems in Dutch universities and people assigning DOIs to oceanographic cruises.

I can’t wait for next year’s PIDapalooza, which is just as well, as planning is well under way. This year I’m on the programme committee, and I’m looking forward to the submissions!

Event Data - to beta and beyond!

The majority of my time has been spent working on the infrastructure of Event Data and getting us to the point where we can declare a public beta. Over Christmas one of the last pieces of an infinite fractal puzzle fell into place, and the design of the full Event Data pipeline emerged, with data streaming in from external systems via Agents, being processed into Events with a full evidence trail by the Percolator, being exchanged with DataCite and emerging to the public in the API. I’ve pumped tens of thousands of words into the Event Data User Guide and it hasn’t burst yet.

At FuturePub 10 in London I gave a ten-minute pitch for Event Data and saw and heard some very interesting stuff, including a project to document musicians and scientists. I closed my call for beta testers with “go and make something cool!”. I really hope people do.

By late May we had a beta ready to go, just in time for WikiCite in Vienna. WikiCite is a conference, summit and hack day focussed on citation within Wikipedia. There are a few community efforts under way. At last year’s WikiCite I co-authored a piece on HTTPS and Wikipedia with Dario Taraborelli.

I always find it fascinating to dip into the Wikimedia community. It’s an extraordinary combination of volunteers and employees, some running casual projects, some doing cutting-edge work. All do it for the love of open information and data. Throughout the three days we heard people presenting their ideas, finished projects, works in progress and research. Lively, engaged debate followed and I think everyone walked away with their ideas enriched somehow.

At these events EtherPads, collaboratively editable documents, are used extensively. It’s very exciting to experience a room full of people documenting a talk as it’s being given, chipping in new pieces of information and developing ideas. They’re all linked from the programme page.

I presented on the transparency-first principles of Crossref Event Data, both as it pertains to how we track references to scholarly literature from Wikipedia, and contextualising Wikipedia in the broader picture of websites referencing scholarly literature.

There was also some interest in Event Data at the hackathon, and a couple of people started looking at the dataset and exchanging ideas. There were two Event Data oriented projects. The first looked at the most commonly referenced DOIs in Wikipedia according to Event Data, and then found those that didn’t have items in Wikidata (a huge knowledge base of structured data linking together concepts).


I’m on the Distributed Usage Logging (DUL) Working Group, which is hosted by Crossref. DUL provides a mechanism to allow sites such as reading platforms to share usage data about articles with the publishers of those articles. Where Event Data is a public pipeline of public data, DUL is a private channel for potentially confidential information. It’s been interesting talking to people on both sides of the table and helping to find a way to exchange data. COUNTER worked with the members of the group to formulate a standard for reporting this kind of data. My job was to research and propose a recommendation and proof-of-concept for the secure exchange of data. The task was made interesting by the range of stakeholders and threat models. The final draft includes two levels of verification and is based on JOSE (JSON Object Signing and Encryption).
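JOSE is a family of IETF standards, and its JWS part covers signed messages. As a rough illustration of the kind of building block involved, here is a minimal HS256 JWS in compact serialization using only the Python standard library. It is not the DUL recommendation: the payload fields are hypothetical, and the draft’s two levels of verification aren’t reproduced here. HS256 relies on a shared secret; where many parties exchange data, asymmetric algorithms such as RS256 or ES256 are the more usual choice.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    """Base64url without padding, as used in JWS compact serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, key: bytes) -> str:
    """Produce header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (b64url_encode(json.dumps(header).encode("utf-8")) + "."
                     + b64url_encode(json.dumps(payload).encode("utf-8")))
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)


def verify_hs256(token: str, key: bytes) -> dict:
    """Check the signature in constant time, then return the decoded payload."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    signing_input = header_b64 + "." + payload_b64
    expected = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("signature does not match")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    return json.loads(b64url_decode(payload_b64))


# Hypothetical usage record; the field names are illustrative, not COUNTER's.
token = sign_hs256({"doi": "10.5555/12345678", "event": "view"}, b"shared-secret")
print(verify_hs256(token, b"shared-secret"))
# {'doi': '10.5555/12345678', 'event': 'view'}
```

A tampered payload or the wrong key makes `verify_hs256` raise `ValueError`, so a receiving publisher only ever sees data whose origin it can check.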

I have also participated in the Scholix (Scholarly Link Exchange) working group. Scholix is an emerging standard for exchanging links between datasets and literature. This is something very important to Crossref and DataCite, but there are other players who also have data and we need to make sure it’s all interchangeable. We are already publishing links between Crossref and DataCite via Event Data (Crossref Event Data and DataCite Event Data are two sides of the same coin), and it’s been interesting contributing to the discussion around creating a schema for exchanging links that’s simultaneously general enough for all participants and that fits the data we already have.

I have been collaborating with a small research group, led by Dr Lauren Maggio, writing a paper entitled ‘Wikipedia as a gateway to biomedical research: The relative distribution and use of citations in the English Wikipedia’. I played a small part, contributing DOI resolution data, words, and too many sentences reluctantly starting “I’m afraid that won’t work because”. The paper has been submitted to a journal and we have our fingers crossed.

But more importantly

I’m on the committee of a small folk festival called Skint. What it lacks in numbers (there’s only space for one hundred people) it makes up for in over-subscription. It’s a small, intimate event, and whilst newcomers are very welcome and encouraged, it also feels like a big get-together of friends. I met my wife there (both for the first time, and subsequently). I run the booking system, and it’s heartbreaking when people, including our close friends, don’t get places. However, rules are rules, and we only have finite space.

Three years ago, when I took on the job of running the booking, we had a first-come-first-served system. Lots of people felt that it wasn’t fair, and that the system was a little ableist, favouring as it did quick-fingered young people with fast Internet connections. After all, all the places went within 30 seconds.

Last year we decided to try something new. After some agonising discussion in the committee, and some surprisingly convoluted discussions about probability (you show me someone who can escape their intuition in these matters and I’ll show you someone who I think is probably wrong), we decided to randomise the order of applications received within 30 minutes of bookings opening, so that everyone had an equal chance of getting a place, provided they were able to fill in a form within those 30 minutes.

Remembering the clamour of the year before, I pressed the button to open bookings and … nothing happened. I hadn’t expected a sell-out in 30 seconds, but it was a little alarming to see them trickle in so slowly. Amateur game theory hypotheses bounced back and forth on the committee mailing list. Had we scared people off because they didn’t think they’d get a place? Still, within the 30-minute window we had filled all the places.

This year we did the same thing, and I’m simultaneously pleased and disappointed to say that we met the quota within 8 minutes. Which is a more encouraging pace.
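The randomised window described above is simple to express in code. Here is a minimal sketch, assuming applications arrive as (name, submission time) pairs: everything received within 30 minutes of opening is shuffled, places are filled in the shuffled order, and the remainder form a waiting list. How late applications are handled isn’t described here, so queueing them behind the window in arrival order is an assumption, as are the names `allocate_places`, `window` and `seed`. Passing a fixed `seed` lets a committee re-run and audit the draw.

```python
import random
from datetime import datetime, timedelta


def allocate_places(applications, opened_at, capacity,
                    window=timedelta(minutes=30), seed=None):
    """Return (places, waiting_list) from (name, submitted_at) pairs.

    Applications inside the window are shuffled, so arrival order within it
    doesn't matter; later ones queue behind them in arrival order (an assumption).
    """
    rng = random.Random(seed)
    in_window = [app for app in applications if app[1] - opened_at <= window]
    late = sorted((app for app in applications if app[1] - opened_at > window),
                  key=lambda app: app[1])
    rng.shuffle(in_window)
    order = [name for name, _ in in_window + late]
    return order[:capacity], order[capacity:]


opened = datetime(2017, 1, 1, 12, 0)
apps = [(f"applicant{i}", opened + timedelta(minutes=i)) for i in range(40)]
places, waiting = allocate_places(apps, opened, capacity=20, seed=2017)
print(len(places), len(waiting))  # 20 20
```

Because the shuffle ignores submission times inside the window, applying in the first second gives no advantage over applying in the 29th minute, which is the whole point of the scheme.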

The Call

I’m also on the committee of the Bagpipe Society, a small but perfectly formed organisation which has the task of cultivating and communicating knowledge about the bagpipes and related instruments. It’s a broader remit than you’d think. And no, Great Highland Bagpipes (which, I’m not afraid to say, hold no pleasures for me) don’t feature prominently.

This year I undertook quite a hefty project to make the archives accessible. We already had scans of three decades of publications, but they were somewhat locked away. I built a new website and made all the back-issues of our publications available for the public to read (except for the past two years, which are members only). We had metadata such as titles and authors, and we’re slowly refining the site by tagging various instruments. Whilst the job’s never done, the site is now starting to represent something we can begin to think about being proud of. And anyone with a passing interest in bagpipes now has quite a lot of stuff to read on the subject.

This year we Got The Call that I have been waiting for for years. The day I never thought would come. In the event, the Call came by email, and it asked whether we would be interested in being the featured guest publication on Have I Got News For You. As far as I’m concerned, that’s as good as it gets. There was initial doubt within the committee about how seriously they would take us. As it happened, it was the limpest, least interesting episode in living memory (the one with Ed Balls) and they spent a disappointingly short amount of time making fun of us. Still. That’s something off the bucket list.

Playing for Dancing

Folk Weekend Oxford 2017 was, again, fantastic, as hundreds of folkies from across the country descended on little old Oxford. As before, Coldharbour, a trio consisting of my wife, Matt Coatsworth (off of Boldwood) and me, hosted two evenings of dance: one of Scandinavian Dance (with guests Andy Parr and Ella Sprung) and another of French Dance (with guests Anna Pack and Dave Shepherd). We also played at Chippenham Folk Festival.

One’s better two’s reciprocal

This year my wife graduated (with distinction) from her Masters degree in linguistics. I had hoped that a bit of it would rub off on me, but her preliminary reading left me in the dust. Her thesis was about a top-down approach to the building of syntax trees (which, apparently, are usually built from the bottom upward).

She also decided to become a yoga teacher, so she did. As an adherent of the Forrest Yoga school, she went all the way to Texas for a month of very rigorous training. She came back and has been offering well-attended yoga classes in Oxford. I go whenever I can. It’s an excellent form, focussing on breathing, core strength, not rushing things and acknowledging the intimate connection we each have with our bodies, and the effect that stretching out dusty corners can have. She’s a very talented teacher (I’m allowed to say that).

Happy birthday Monstronauticus

Our boat has reached her third birthday, and we celebrated by hauling her back out of the water to see what was up. Thankfully, nothing was.

She was given a good scrub, got a new coat of paint below the waterline and was back in.

In late November I took the decision to remove logins, thereby making read-only. After a consultation, I backed up every user account’s favourites, sent them by email along with an explanatory note, and closed all the accounts. I then published a blog post explaining what had happened. Given the number of accounts and how much people used it, I expected fury. Instead I received a number of very complimentary emails thanking me for running the service. Maybe I should remove things more often.

Restoring balance

I read three corporate biographies recently. Having worked somewhere largeish, smallish, and somewhere growingish, I’m interested in how organisations maintain their balance as they increase their ambitions and headcount.

  • The Decline and Fall of IBM: End of an American Icon by Robert X Cringely
  • Show-Stopper!: The Breakneck Race to Create Windows NT by G. Pascal Zachary
  • Losing the Signal: The Untold Story Behind the Extraordinary Rise and Spectacular Fall of BlackBerry by Jacquie McNish

I remember hearing stories from my father about the “no-one got fired for buying IBM” era and the nasty tricks the company used to pull. They seem like an altogether loathsome company who made some good computers once. Despite this, Cringely seems to bitterly regret the decline of Big Blue, and catalogues every mis-step in a litany of management failures in this moan-a-thon.

Show Stopper was a romp through the clash of teams and personalities within Microsoft as they developed something exciting. The balance wasn’t struck quite as I would have wished: a bit heavy on biographical details about all the characters (and there were many) and very light on technical detail. I’m not sure the market for this book would extend beyond people who actually cared a little about the technology.

The BlackBerry book, though, is a gem. I’ve been a fan of BlackBerry phones for a while (and have owned two), mostly because of their build quality and keyboards. I have always been fascinated by BlackBerry, and the book didn’t disappoint, being engagingly written, following the personal struggles of the leaders of the company as they pioneered a new kind of phone but lost the plot. It ends just before the BlackBerry Passport phone (my current phone) was developed. A lot has happened since the book.

I have Piloting Palm: The inside story of Palm, Handspring and the birth of a billion dollar handheld industry by Andrea Butter and David Pogue on my desk. Going by the first few pages, it looks like a treat. I’m not sure what else I’ll be able to do to feed my nostalgia except wait for someone to write a history of the BBC Micro.

I got my annual dose of Haruki Murakami: After Dark and Dance Dance Dance. Both other-worldly, crepuscular, recommended. Even having read a few of his books, I still find them subversive of the genre. If you’re into personal challenges, I also recommend Narrow Dog to Carcassonne and Year of the Hare, set in France and Finland respectively.

Ian Martin’s Epic Space, a monumental piss-take of architectural criticism and planning policy, is a rare gem of a book. Production of the book was crowd-funded, and you can read all of the backers’ names in the appendix. And if you don’t know about Ian Martin, you really should take a hard look at yourself in the mirror and tell yourself to find out.

This year’s politics have provided a lot of food for thought; not much of it easily digestible. Two books have helped me make sense of it all.

Difficult Conversations: How to Discuss What Matters Most by Douglas Stone, Bruce Patton and Sheila Heen, is a book about difficult conversations.

The result of a comprehensive, long-term study into conflict resolution (from domestic disputes to international incidents) at Harvard, it catalogues the various ways in which conversations can go wrong. It struck a chord with me in a way that few books ever have. There’s no way of saying this without sounding pretentious, but it expressed a philosophy of communication that I have always aspired to, and often failed to attain. All backed up with illustrations.

A lot of public conversations, and what seems like an increasing number of them, are built around antagonism. It’s especially difficult to watch people on the left, who have ideas that I would dearly love to see put into practice, sabotage a chance to effect them. The advice in the book is a lot to ask for.

I honestly feel that everyone should read this book.

Thinking, Fast and Slow by Daniel Kahneman is a book about two modes of thought, called System 1 and System 2. System 1 drives your car and makes you run from lions. System 2 helps you ponder, write and solve things. Unfortunately System 1 too often gets involved and makes a hash of things, over-extrapolating and being fooled by marketing tricks. Social media does its best to coax everyone into System 1 thinking because it’s easy. There is so much I could say about this book, but it’s getting late.

Everyone should read this book too. There’s a good taster on the Wikipedia article.


Happy new year. See you on the other side.

Work blog pieces:

Personal blog pieces:
