I am currently around two thirds of the way through my doctoral research project at BCU. Working out of BCMCR, I am looking at the experiences of popular music listeners, with a particular focus on digital technologies. The main means through which I do this is The Harkive Project.
Harkive, originally developed during my MA studies at BCU, is an online, crowd-sourced method of gathering information from people about the detail of their music listening. The project occurs on a single day each year and invites people to tell the ‘story’ of their experience with music across the day. Since launching in 2013 the project has gathered over 8,000 stories. The project became the focus of my AHRC Midlands3Cities-funded PhD in 2015, and it returns for its fourth annual run on 19th July 2016.
One of the main challenges of the PhD project is the need to devise a means by which the insights held within the stories people have told the project since 2013 may be revealed. The largely text-based data collected represents a huge challenge in that regard, leading to a methodological focus on collaborative and experimental analytical methods. Such an approach is by no means unique to this project. Academic researchers in a number of disciplines have been embracing new methods and experimental approaches for several years, leading to the genesis of entirely new fields: Social Computing; Digital Humanities; Cultural Analytics. At the same time, barriers to entry and access in terms of data collection, storage and analysis are falling, enabling people to engage critically and artistically with data in interesting ways. Think of terms such as Citizen Data Science, or movements such as The Quantified Self.
Harkive, and the doctoral research project that underpins it, reside somewhere within the broad and emerging area described above. What makes this exciting for the project is that, just as the landscape of modern popular music is a fascinating and dynamic space, so, increasingly, is the field of human-data interaction.
To put all of this another way, just as the stories Harkive collects are ‘crowd-sourced’, one avenue my project is keen to explore is whether some of the analysis might be derived by a similar method. What questions would other people like to ask of this data? What could be built with it? What would it sound like as a piece of music? These are the questions that come from having an interesting dataset! There are, however, more possibilities and questions than there is time. It is with this in mind that I, together with Nick Moreton, my colleague from the School of Media at BCU, have created the Harkive API, full details of which are provided below.
For those of you reading who may be unaware of the function of an API (Application Programming Interface), in simple terms it allows access to data in a structured, reliable way, so that applications, visualisations and other online tools (and even pieces of music) can potentially be created by making use of the data. The crucial point is that although the data held within an API may change over time, the structure in which the data is held remains constant. This means that anything built upon an API is able to change dynamically in line with changes to the data, without necessarily having to change its own code. APIs are thus powerful tools for developers and, increasingly, academic researchers.
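To make the point about a constant structure more concrete, here is a minimal Python sketch. The record layout (a list of stories, each with `source` and `text` fields) and the `summarise` function are assumptions made purely for illustration, not the actual Harkive API schema, and the two snapshots are invented sample data.

```python
from typing import Dict, List

# Hypothetical record layout: each story is a dict with "source" and "text".
# These field names are assumptions, not the real Harkive API schema.
Story = Dict[str, str]


def summarise(stories: List[Story]) -> dict:
    """Summarise a snapshot using only the agreed structure of each record."""
    return {
        "total": len(stories),
        "sources": sorted({story["source"] for story in stories}),
    }


# Two invented snapshots: the data grows, the structure does not.
snapshot_morning = [
    {"source": "twitter", "text": "Radio on while making breakfast"},
]
snapshot_evening = snapshot_morning + [
    {"source": "email", "text": "Streaming a playlist on the bus home"},
    {"source": "twitter", "text": "Vinyl after dinner"},
]

for label, snapshot in [("morning", snapshot_morning), ("evening", snapshot_evening)]:
    print(label, summarise(snapshot))
```

The same `summarise` function handles both snapshots without modification, which is the sense in which something built on an API can follow the data as it changes.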
Data Visualisations created with Harkive API
A better way to understand the above is to look at the small number of visualisations that have been built by Nick using the Harkive API. These are being hosted on a dashboard at www.harkive.com and relate to the Harkive 2016 data. This data is and will be dynamic – as people tell their stories, they will generate more data – but the structure of the API remains the same. Because the visualisations on the dashboard are built with the API, they will change as more stories are gathered.
Here are some examples:
Story Sources: will display the ratio of total stories according to the various submission methods. For a full list of the available story-telling methods, please visit the How To Contribute page on the Harkive site. From the screenshot below, however, it is easy to see the dominance of Twitter in terms of conversations about Harkive, but these ratios may change on 19th July as stories begin to be posted elsewhere.
Harkive Around The World: will display details of Tweets sent with the #harkive hashtag. Although limited to Twitter users who have enabled location settings, this nevertheless provides an idea of the reach of the project and engagement with it.
WordCloud: Following automatic removal of Stopwords and other phrases (incl. the word Harkive, which features prominently in collected posts), this visualisation will produce a Wordcloud based on the content of Harkive stories. As the screenshot below shows, ‘tell’ is a prominent word at this point in time, and this is because of the promotional posts (and shares of those posts) encouraging people to ‘tell their story’ to Harkive ahead of 19th July 2016.
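For readers curious about the kind of calculation that sits behind the Story Sources and WordCloud examples, the following Python sketch shows one possible approach. It is not the code used for the dashboard: the field names, the sample stories and the very short stopword list are all assumptions made for illustration.

```python
import re
from collections import Counter

# Invented sample records; the "source" and "text" field names are hypothetical.
stories = [
    {"source": "twitter", "text": "Tell Harkive the story of your listening"},
    {"source": "twitter", "text": "Listening to the radio in the car"},
    {"source": "email", "text": "The radio was on in the kitchen"},
    {"source": "instagram", "text": "Vinyl and headphones tonight"},
]

# A tiny illustrative stopword list (including the word "harkive");
# a real list would be much longer.
STOPWORDS = {"the", "of", "your", "to", "in", "was", "on", "and", "harkive"}


def source_ratios(records):
    """Share of all stories contributed by each submission method."""
    counts = Counter(record["source"] for record in records)
    total = sum(counts.values())
    return {source: n / total for source, n in counts.items()}


def word_frequencies(records):
    """Count words across all story texts after removing stopwords."""
    words = []
    for record in records:
        words.extend(re.findall(r"[a-z']+", record["text"].lower()))
    return Counter(word for word in words if word not in STOPWORDS)


print(source_ratios(stories))
print(word_frequencies(stories).most_common(3))
```

A word cloud is then a matter of drawing each remaining word at a size proportional to its count.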
The basic examples above demonstrate some of the many ways that different levels of insight can be derived from data. They represent, however, only the tip of the iceberg of what is possible.
Shortly after the 2016 story-gathering element of the project ends next week, I will begin the process of sorting, cleaning and analysing the data. For the immediate purposes of the PhD project, this analysis will proceed according to three broad themes: Formats and Technology; Data, Privacy, Identity and Ownership; Recommendation and Discovery. Beyond that, I believe there are a great number of ways in which this dataset could be put to use by researchers interested in popular music and digital culture. The intention is that the API is the first step towards Harkive becoming a useful resource.
If you have any questions about the Harkive API, or would like to discuss potential collaborative work, please do get in touch: [email protected]
Further information on the Harkive API
Documentation is available at http://developer.harkive.com.
The Harkive API allows developers and researchers access to limited elements of the data collected by The Harkive Project. In particular, and based on the Research Ethics underpinning the project, the API does not provide access to personal information gathered by the project.
The API currently contains only data collected by the 2016 instance of Harkive. Data from 2013-2015 has been prepared according to the structure of the API and will be added shortly after Harkive 2016.
The automated collection methods that place new data within the API structure at present capture everything related to Harkive, so will necessarily include tweets (and other types of posts) that mention the project but are not necessarily music listening stories. Although tweets sent from the official @Harkive Twitter account, the main source of non-story content and traffic, have been excluded from certain counts in the visualisations, anything posted by others online ahead of Tuesday 19th July will be displayed. This data is included at this stage primarily to demonstrate the API and visualisations. Shortly after 19th July, the 2016 data contained within the API will be sorted and cleaned, leaving only stories.
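As a rough illustration of how posts from the official account might be left out of a count, here is a short Python sketch. It is an assumption about how such a filter could work, not the project's actual code: the `author` and `text` field names and the sample posts are invented.

```python
# Hypothetical post records: the "author" and "text" field names are
# assumptions for illustration, not the real Harkive API schema.
posts = [
    {"author": "@Harkive", "text": "Tell us your story on 19th July"},
    {"author": "@listener_one", "text": "Looking forward to Harkive next week"},
    {"author": "@listener_two", "text": "Walking to work with headphones on"},
]

# The official account named in the text, compared case-insensitively.
OFFICIAL_ACCOUNT = "harkive"


def is_official(post):
    """True if the post was sent from the official project account."""
    return post["author"].lstrip("@").lower() == OFFICIAL_ACCOUNT


def count_excluding_official(records):
    """Count posts, leaving out those sent from the official account."""
    return sum(1 for post in records if not is_official(post))


print(count_excluding_official(posts))
```

Note that a filter like this removes only the official account's posts. Posts by others that mention the project without telling a story, such as the second record here, would still be counted until the data is sorted and cleaned.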