Crowdsourcing update – June 2015

It’s Daddy Daycare time here at Duxfield Towers so I can work on a journal article I wrote before Mini Medievalist made her entrance into our lives and I was launched into a world of nappies, night-feeds and more mummy coffee mornings than I care to admit to. I’m back at the dining room table trying to get in the crowdsourcing zone to address the corrections and edits the reviewers asked for, but sitting far closer to the laptop than I could when I wrote the article. Happily, most of the changes needed are on the minor side and the feedback from the two reviewers is serving as a good way of helping me to cast fresh eyes not only on the article, but on how the Estoria project uses crowdsourcing.

The mysteriously named ‘Reviewer 2’ asks why we have not yet made use of strategies involving competition between volunteers. I have mulled this over all week since the reviews arrived in my inbox, mostly whilst washing up, if I’m quite honest. It has led me to consider to what extent there is a feeling of community between volunteers. We have worked hard to ensure that volunteer transcribers feel part of the transcribing team and that it is made clear that their input is valued, because it is valued. They are as much part of the team as the paid Estoria transcribers; they just form a different branch. We have aimed to create an atmosphere of camaraderie between volunteers and paid staff, but does this extend to camaraderie between volunteers yet? I think the answer is probably no, or if it does, not by much.

There are several reasons for this. There are not that many volunteers yet, and their transcribing questions often come directly to their assigned research associate rather than going straight on the bulletin board. Also, the nature of the task, inputting XML tags which must be consistent and must parse, means that for a while any new transcriber, whether volunteer or paid, asks questions which require a ‘correct’ answer. It is not until far later in their tagging journeys that they are able to suggest tags which will work within the system. This means that at the moment, crowdsourcers are reliant on the knowledge of paid transcribers, and often the senior academics in the team rather than the graduate students. The upshot is that this creates, or maintains, a degree of hierarchy between transcribers which no amount of camaraderie development can break down past a certain stage. And neither should it.

Estoria team meetings often involve talking in great (great, great, great) depth about seemingly minuscule changes in tags, about what certain abbreviation marks mean, and about how they should be represented in XML. Sometimes no-one is right because no-one is wrong; people are just coming at it from different angles. (The main thing I have learnt in my two years as a PhD student is that Academics Love a Row. Then they stop rowing and make one another a coffee.) When no-one is right we reach an impasse, defer to the highest in the team to make the decision and then, crucially, all stick to that decision. This is vital in collaborative transcribing tasks. For this reason we cannot lose sight of the hierarchy. Aengus might make the tea or turn up with cakes, but at the end of the day he is the gaffer. It is his name in the biggest letters on the project, and he has spent years earning his stripes.

We need a leader to steer the ship, so we need to maintain a degree of hierarchy, whilst ensuring that all team-members feel able to approach every other team-member with questions, comments or suggestions. This is an atmosphere we are working hard to foster at the Estoria project, but the currently small number of active volunteer transcribers means that whilst there is a necessary hierarchy between the chiefs (i.e. the senior academics), the graduate students and the crowdsourcers, at the moment we don’t have quite enough Indians for there to be the level of camaraderie between the volunteers that we would eventually like to see. If there were more crowdsourcers it would be more likely that some of their questions would go straight to the bulletin board rather than to their assigned graduate student, and these questions would then be more likely to be answered by any member of the team, whether that be a research associate, fellow, senior member of the team, or another more-experienced volunteer. This would not remove the hierarchy, but it would change the chiefs:Indians ratio, making volunteers, particularly more experienced volunteers, feel more empowered to answer other volunteers’ questions.

In my previous incarnation as a teacher I used to encourage my students to use the following strategy when they had questions, rather than immediately sticking their hand in the air: Brain, Book, Buddy, Boss (I cannot take credit for making that up – I stole it from a poster in somebody else’s classroom). If you don’t know the answer (brain), look through your notes (book, or in our case, the Transcription Guidelines). If you can’t find the answer there, ask a buddy. If your classmate doesn’t know, ask the boss. In this story, as the teacher, I was the boss. Ah, how times have changed. In the Estoria project, we often skip the buddy phase and go straight for the boss, which removes the group-work atmosphere we are trying to create.

If we had more volunteers there would be more buddies to ask, there would be more people on the bulletin board posting and answering questions, and we could make much further use of this important stage for building the sense of a community of volunteer transcribers.

Another reason why we don’t do competitions between crowdsourcers (yet?) is that our transcription platform doesn’t allow us to do it at the moment. That is not to say that if we wanted to, we couldn’t develop such tools. We could. Or if we couldn’t, we could certainly find a friendly computer scientist who could. We haven’t done so yet, however, mostly due to issues of manpower. We don’t want sloppily transcribed, rushed folios. To date our volunteers are pretty good at working accurately, but still, every transcribed folio is moderated, regardless of who transcribed it. For a competition pitting volunteers against each other to work, users would need quick feedback, and we don’t yet have the human resources to moderate transcribed folios at that rate. Folios cannot be moderated mechanically because the computer just can’t do the palaeography well enough yet, so all transcribed folios need to be moderated by humans, and these humans are also trying to write doctoral theses, so the process can be rather slow at times. This removes the instant(-ish) gratification that volunteers who are motivated by competition would get. If the project were to receive further funding more specifically aimed at crowdsourcing, we might be able to employ someone whose job is primarily to moderate volunteer-transcribed folios, in which case we could develop an Estoria transcription competition.

What we could focus more on is how each transcribed folio works towards a common goal of transcribing various copies of the Estoria de Espanna, perhaps giving a percentage of the task completed by the whole team, and counting how many different transcribers have worked on the task as a whole. This would help to develop a competition with ourselves, rather than between ourselves, as we work together for the common goal. It may also help volunteers to see what a large team they form part of, and that the number of chiefs is far smaller than the number of Indians, even though communication from paid team-members is far more visible than from other crowdsourcers, which can give the impression that the opposite is true. This is something that we certainly could work on, even with current levels of funding and manpower, using the current transcription platform, and that would be beneficial to the project. It is something that many moons ago we had briefly thought about doing, so perhaps Reviewer 2’s comments will give us the push we need to actually bring it to fruition. If they do, it will just go to show how important it is within academia to showcase your research, because the feedback and questions received from reviewers and conference attendees can often be a springboard to improvement, and that can only be a good thing.
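
For anyone curious what such a progress summary might involve, here is a rough sketch in Python. It is only an illustration: the `Folio` record, the `progress_summary` function, the manuscript labels and the volunteer names are all invented for the example, and none of it reflects how the Estoria transcription platform actually stores its data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Folio:
    """One folio of a manuscript (hypothetical record, not the platform's schema)."""
    manuscript: str             # made-up manuscript label
    number: int                 # folio number within that manuscript
    transcriber: Optional[str]  # None until someone has transcribed the folio


def progress_summary(folios: List[Folio]) -> Tuple[float, int]:
    """Return (percentage of folios transcribed, number of distinct transcribers)."""
    if not folios:
        return 0.0, 0
    done = [f for f in folios if f.transcriber is not None]
    percent = 100 * len(done) / len(folios)
    # A set keeps each transcriber once, however many folios they have done.
    transcribers = {f.transcriber for f in done}
    return percent, len(transcribers)


# Invented sample data, purely for illustration.
sample = [
    Folio("MS-A", 1, "volunteer_1"),
    Folio("MS-A", 2, "volunteer_2"),
    Folio("MS-A", 3, None),
    Folio("MS-B", 1, "volunteer_1"),
]
percent, people = progress_summary(sample)
print(f"{percent:.0f}% of folios transcribed by {people} transcribers")
# prints: 75% of folios transcribed by 2 transcribers
```

The point of the sketch is that both figures describe the team as a whole: nobody is ranked against anybody else, which fits the idea of competing with ourselves rather than between ourselves.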

One thought on “Crowdsourcing update – June 2015”

  1. Nick

    Thanks for the post Polly. As one of the volunteers, maybe I should offer some perspective from our end (or at least from my end).

    Regarding communication/camaraderie between volunteers, I think you make a good point about the volunteers needing ‘correct answers’ from more experienced transcribers, especially in the beginning.

    Some other points are:

    – I like to transcribe a folio in one go, so I prefer to collate my issues and still be able to hit ‘send’ on the email and feel as though I have finished something and can tick it off my list, rather than having to wait for answers on the bulletin board. This goes back to the instant gratification idea that you mentioned.
    – I often have technical issues with the bulletin board, which doesn’t motivate me to use it much. Plus it doesn’t seem as though the staff use it that often either.
    – I think volunteers might be a bit embarrassed to post their questions in a more public forum in fear of asking ‘silly’ or newbie questions.
    – Personally, I’m not sure I see ‘competition’ between volunteers as something that would motivate me much, if at all, especially if team spirit and camaraderie are what we’re aiming for. I try to transcribe one folio a week if I can, and whether or not that’s possible depends more on time than on motivation. I find the percentage idea more interesting, or perhaps just more information in general about where the project is headed, what the stages are and how we all fit into it.

    Looking forward to getting back into the Estoria as early as next week when I get back from my assignment in Baku.


