Another Year


I think that summer is a far better host to ‘new year’ than winter. If you’re lucky enough to be able to take time out to enjoy it, it offers a chance to stop whatever you were stuck doing for a little while and think about it. A reset, and a chance to look backward on the last year and forward on the next. For me, winter is all about hard work, when everything’s an effort. Hardly the time to stop and think. I’d much rather do that basking under the sun than huddled round a fire.

And for me, this new year, the first in my third decade, feels definitive: I got married, celebrated my first complete year of keeping bees, two years living on our no-longer new boat, and three years working at Crossref. I have also not touched a single biscuit (cookie for the Americans in the audience) or can of IRN BRU (I couldn’t possibly explain, you’ll just have to try it) or anything overtly sugary for a whole month, which is, in its own way, an achievement to rival, if not best, any of the above.

Three years at Crossref have given me three years to work out the answer to the question “what do you do?”. The short answer, of course, is to make reference to my contract of employment and say “I’m a programmer”. My business card bears only the rather enigmatic words “Strategic Initiatives” (which my boss, Director of Strategic Initiatives, insists on pronouncing “new shit”).

But going beyond that, I’ve not yet found an adequate and concise answer that doesn’t come out in a jumble of words. On a normal day I can’t even pronounce “strategic initiatives”.

When I started my blog I said that I wasn’t going to engage in navel gazing (I can’t stand it in myself or others) but here I am, eyes wandering in the direction of my umbilicus. This has been a significant year and I want to describe how I’ve spent it. Now when people ask “what do you do?”, I can point them to my blog and say “I write things like this on my blog”.

Another year at Crossref

My job title is “R&D Programmer”. This means that I research things in scholarly, often scientific, publishing and sometimes I develop them. I’m lucky to work in an environment where I get to work with the big picture right down to the details. This means that whilst I write programs for my job (I feel the same way about the word ‘coding’ as I do about navel-gazing), I spend at least as much time talking or writing about it. I really like building software, but I also like writing words, and discussing things, and sometimes getting up on my hind legs and telling an audience about them. This year I’ve had the chance to do all of the above.

The three big things I have devoted myself to this year have been Linked Clinical Trials, Crossmark and the Crossref Event Data service. I have been working on all three for about two years, but in the last year things have really come to a head.

Of course I’m writing this on my own behalf, blog and behest.

Linked Clinical Trials and Crossmark

I am responsible for the technical side of Crossmark. Occasionally I find myself sitting at home reading an article and I notice, with a jolt, the Crossmark button. It happened last week, when I was reading the article on Bagpipe Lung on the British Medical Journal’s website. When you click on the Crossmark button it tells you if the article’s been retracted since it was published, and any other information that publishers want to share (unfortunately this one’s a bad example as at the time of writing it looks like it’s still being processed).

This year Crossmark had a complete overhaul, with all new insides and a new coat of paint. Pixel-tweaking is not in my top-100-list of things to do, so I got in touch with an old colleague and we contracted out the browser work. I love products with great visual design, but I’m happy to let someone else make them. I stick to the insides, if you see what I mean.

At the same time, the Linked Clinical Trials project drew towards its release. The idea is to allow publishers to include links between their articles and the clinical trials they concern. When you get a number of articles about a clinical trial, possibly across different publishers, you can then see the thread of articles that were published before, during and after the trial. The links are shown in the Crossmark dialog box.

I spent a lot of time researching clinical trials, how they’re identified in various countries and what role the World Health Organisation plays. I have stared at thousands of clinical trial numbers. I worked with people at PLoS, Elsevier and other publishers to help them identify when they had references to clinical trials, and how to clean up the data they had. We met with Ben Goldacre and the people behind It was a really interesting collaborative project, and the result is that over 4,000 articles with Clinical Trial links have been deposited by various publishers and are being shown in the Crossmark dialog box.

Crossref Event Data

Crossref Event Data is a service to collect and distribute things that happen around scholarly articles. I’ve written a lot about it in the user guide (which is hopefully easy to understand). When someone talks about an article on a blog or social media, we collect it. It’s a very exciting subject, as it affects the way that scholarship, and particularly science, is appraised and measured. There’s background information in the altmetrics manifesto.

I am the technical lead on Crossref Event Data, which means I’m responsible for working out what infrastructure and software need to be made, making them, and making sure everyone understands what they do.

I started 2016 with a visit to Hannover to meet Martin Fenner, Technical Director at DataCite. We had a four-day summit on how we develop Lagotto, the software that underpins Event Data, and how DataCite’s Event Data works with Crossref Event Data. The whole project is a joint venture between our two organisations, which both play a large part in the space. We met again in Berlin a few months later as the wider team from Crossref and DataCite came together for further discussions.

Earlier in 2015 I was invited to join a working group on the NISO Altmetrics Initiative. In the working group we discussed issues around data quality, particularly with regard to enabling the consumers of data to audit the quality of the data themselves, and a code of conduct under which data providers can make certain declarations of transparency. I developed the architecture of the Event Data product as discussions developed within the working group. The result of the initiative was a recommendation for a Code of Conduct for altmetrics data providers, including examples from the nascent Crossref Event Data. It will be published soon.

In May I attended CSV Conference in Berlin, and gave a talk about the approach I’m taking to crunching our DOI logs and what data I’m able to get out of them. I wrote a piece about it on the Crossref blog. One of the results the data showed was that the collaboration with Wikipedia, which involved working with them to change all the article links that use Crossref DOIs to be HTTPS, had borne real, observable fruit. This is the culmination of a collaboration that started with a conversation with Dario Taraborelli, head of Research at the Wikimedia Foundation, at the 2014 Altmetrics Workshop in San Francisco and continued at the Wikimedia Hackathon in Lyon last year.

After a brief stop-over at our Boston, MA office, it was back to Berlin for WikiCite 16, where we talked about how citations are made within Wikipedia and how to improve them. Of particular interest was how to keep track of references that occur across different Wikipedia language sites and Wikidata. As we were both in the same town, Dario and I wrote a joint blog post about how the data showed that the whole project worked.

I also met with Daniel Mietchen in Berlin (and later Lauren Maggio in Oxford) and joined their research group looking into how citations of medical literature in Wikipedia affect educational outcomes. Hopefully we’ll publish an interesting paper or two.

Meanwhile, I’ve been developing the system architecture for Event Data. I’ve looked at concerns like making a reliable system with demonstrable transparency that will be cheap and easy to run. We’re making good progress. I have written thousands of words and thousands of lines of code, but there will be more.

The project has been very exciting on the R&D side. I got to play with a Twitter subscription (and talk to their very friendly sales engineers), and to prototype agents for following blog RSS feeds and conversations on Reddit.

I’m also on the Distributed Usage Logging committee, providing my tuppence on the technical side of how usage data should be reported to publishers. Similar but different to Event Data.


There is still time to tinker on R&D projects. In July I relaunched the Chronograph project, which charts the use of DOIs. In September Reddit made a data dump available and I jumped at the chance to do some quick analysis. When the Raspberry Pi Zero came out I got one before they sold out and made a live Wikipedia cite-o-meter.


In a few short weeks’ time I’ll be heading to Bucharest for the Altmetrics conference, workshop and hackathon. With a sneak preview of Event Data under my arm. I’m the first speaker at the Altmetrics Workshop and on the panel for the final session at the 3:AM Altmetrics Conference.

If you want more detail on the above, you can read my Crossref blog posts, which I write sporadically.

And what do you do to relax?

And besides work, our wedding was everything we could possibly have hoped for, Skint Festival (I’m on the committee and run the ticket system) is happening again, Coldharbour hosted two evenings of dance at Oxford Folk Weekend, I released a completely new version of and launched the new website for the Bagpipe Society (I’m on committee), and I’m learning C system accordion.

I have read:

  • Kafka on the Shore by Haruki Murakami, which I recommend very highly
  • Hard-Boiled Wonderland and the End of the World by Haruki Murakami, which I recommend a little less highly but is nonetheless an intriguing journey
  • The Count of Monte Cristo by Alexandre Dumas, which should be set reading for everyone
  • Rant by Chuck Palahniuk
  • On the Road by Jack Kerouac, which is quite good once you get the rhythm of it
  • Slaughterhouse 5 by Kurt Vonnegut, which is extraordinary and important
  • The Philosopher’s Pupil by Iris Murdoch, which was interesting

I’d recommend them all, except perhaps The Philosopher’s Pupil.

And, as I say, I’ve somehow managed to stop eating biscuits.


Ginny Hendricks, September 1 2016

Yay Joe for doing an annual review in the relaxing summer and not the frantic new year. Now that you are married (congrats!!) you will have to have an annual “State of the Union” discussion like in my house (I am not kidding). No biscuits either.

I am loving your work at Crossref even though I’ve only witnessed it for one of your years there. I’m so happy you got in some tinkering time. It’s important.

Also, I cannot believe you think that Slaughterhouse 5 is important. I thought it was hogwash when I read it, total self-indulgence on the part of its ridiculous author. I’m sure that means I didn’t “get” some profound thing though and look forward to you setting me straight next time I’m in Oxford. I would recommend “The Portable Veblen” in return. But I am on holiday so I take no responsibility for the books that I read.
