Saturday 22nd January 2011

Those of you with a technological bent may be interested to know that I’ve packaged up a lot of the data behind this site into a computer-readable form that should make it easy for people to make new things.

There’s a lot more detail in the README file, which is also included in the 6MB zip archive of the data (no need to download that if you’re not a programmer-type person!).

All the files are in JSON format and include:

  • The text of more than 3,000 diary entries, including footnotes and links to relevant Encyclopedia pages.

  • A list of the categories used in the Encyclopedia, with their structure.

  • All the data about the more than 4,000 topics in the Encyclopedia: names, descriptions, Wikipedia links, latitude/longitude, shapes, categories, etc.

  • More than 300 thumbnail images of people included in the Encyclopedia.

That should be plenty of stuff for someone who likes exploring data to get their teeth into. We already have maps of places in the Encyclopedia and graphs of how often each Topic appears. But there must be a lot more interesting (and probably beautiful) things that could be done with all this. Pepys’ Shows, which I mentioned the other day, used an early release of some of this data.
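For the programmer-type people, here is a rough idea of how you might start picking out the Encyclopedia topics that have a location, for instance to plot them on a map. It is only a sketch: the field names ("name", "latitude", "longitude") and the sample entries are assumptions made up for illustration, so check the README for the real file names and structure.

```python
import json

# Made-up sample standing in for one of the JSON files in the archive.
# The keys used here are assumptions, not the archive's documented schema.
SAMPLE_JSON = """
[
  {"name": "Example Yard", "latitude": "51.5", "longitude": "-0.125"},
  {"name": "Example Person"}
]
"""


def topics_with_coordinates(topics):
    """Return (name, latitude, longitude) tuples for topics that have both coordinates."""
    located = []
    for topic in topics:
        lat = topic.get("latitude")
        lon = topic.get("longitude")
        # Skip topics such as people or books that have no location.
        if lat is None or lon is None:
            continue
        located.append((topic.get("name", ""), float(lat), float(lon)))
    return located


topics = json.loads(SAMPLE_JSON)
for name, lat, lon in topics_with_coordinates(topics):
    print(f"{name}: {lat}, {lon}")
```

To try it on the real data you would replace the sample string with the contents of the relevant file from the zip archive, read with json.load, and adjust the keys to match.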

Do you have any ideas, even if you wouldn’t know how to make it yourself?

3 Comments

Paul Chapin

Phil, this looks like a great addition to make the contents of the site more useful for future scholars. I was just reading an article in _Science_ detailing some early exercises in what people are calling "culturomics" working with the Google corpus of books (about 4% of all the books ever published), and it's replete with fascinating findings. This is clearly an important new direction in, and tool for, humanities scholarship. Congratulations for being a pioneer.

Just a couple of comments about the data package as you've described it. First, I see that you've included the number of comments for each diary entry, but not the comments themselves. Those of us who have been writing comments have been hoping (I believe I speak for more than just myself) that those comments would remain part of the record and add to the value of the diary for future users. I don't know if it's technically infeasible to include them in the database, but I have some concern that if the chief way that the diary is available to future readers is through the database, the comments will disappear from view.

A second minor point: in your "location" example of an encyclopedia entry, I believe you have the latitude and longitude of New Palace Yard reversed. I don't know if that's a one-off error, or reflects a structural problem in the database that needs correction.

Many thanks, as always, for your leadership in this wonderful effort.

Phil Gyford

Paul, thanks for the kind words.

I would like to add the comments to these files, but this was kind of a first draft, which I was trying to get ready for History Hackday (http://historyhackday.org/) in case anyone there would find it useful.

The best way of preserving stuff online seems to be to make many copies of it. So distributing everything like this should hopefully ensure that if this site disappears (eg, after I die!) then the data, including everyone's contributions in the form of annotations over the years, will still exist.

And thanks for the New Palace Yard correction -- that's purely my typo in the file, rather than a data problem. I've fixed it now.

Phil Gyford

Over a year later... I've now added annotations to the exported data.
