
Crowdsourcing transcription and translation

Sara Gaudon edited this page Jul 18, 2016 · 19 revisions

Guidelines for consultants

Overview

Crowdsourcing is the process of dividing a project into many discrete tasks and distributing those tasks among a large pool of non-expert workers, who complete the project quickly and at a relatively low cost. Crowdsourcing transcription or translation work involves four primary steps: dividing the audio into short clips, making the clips available to workers, connecting with workers to get the transcription/translation done, and re-aggregating the mini-transcripts produced in order to create a complete transcript.

While the workflow is fairly simple, it’s spread across three different platforms – an audio editor, a file sharing site, and a crowdsourcing platform – each of which requires some learning. This overview will provide details on each of the steps involved.

Note that these instructions refer to detailed transcriptions or translations of audio files. If you are taking rough notes from an audio recording or assigning themes in a content analysis, the instructions should be modified accordingly.

Workflow

  1. Edit and segment the audio for sharing

  2. Upload the audio clips to a file sharing platform

  3. Design and launch the project, by using a crowdsourcing platform to:

    a) design the project (select a pool of workers; set project parameters; title your ‘campaign’)

    b) provide instruction to workers

    c) direct workers to audio clips

    d) launch the project

    e) approve the finished transcripts for payment

    f) download the transcripts in spreadsheet form

  4. Paste the transcripts into a document and edit where needed

Instructions

1. Edit and segment the audio

Edit the audio

Audio that’s appropriate for crowdsourcing is adequately loud, clear, and free of extraneous data. Because the workers transcribing or translating the audio may be listening under non-ideal conditions – in a public place, through cheap earbuds, etc. – it’s important that the audio they receive is of optimal quality. In addition to improving audio quality, remove any long pauses, interruptions, or digressions before segmenting the audio, so that workers are paid only to transcribe data that is useful.

Adjust and edit audio using Audacity. The process involves amplifying the track and slowing the tempo slightly, which reduces the rate of speech without affecting its pitch. This second step renders rapid or accented speech more intelligible to non-expert listeners, thereby making for more accurate work. Once the track is loud and clear, cut any unwanted audio and reduce noise where necessary. Audacity provides clear instructions for each of these actions.

Once the audio has been prepared for listening, save the edited track in its entirety as a WAV file. The track can then be revisited later, when checking and editing the crowdsourced transcript.

Segment the audio

Audacity allows users to segment tracks by creating labels at points where the track is to be cut, then exporting these labeled portions as discrete, numbered audio files. In deciding how long each segment should be, one must balance the needs of the non-expert worker against those of the person tasked with creating, exporting, and uploading potentially hundreds of files. Very short clips are easier to transcribe or translate, but they require more management than do fewer, longer clips. We generally recommend that audio segments be cut at around 15 seconds. If your audio is not confidential, however, you may want to use longer segments, from 30 seconds to several minutes.

Audacity’s manual and tutorials provide technical instructions on creating and exporting labels. Remember that your audio needs to be cut at pauses between words. Audacity does provide a way of automating this task, but we haven’t been able to make it work on audio that is noisy and includes multiple speakers. Instead, we suggest you place labels at regular intervals of 15 seconds, then align each label manually with a dip in the waveform. Then create a new folder for each interview segment, and export the labeled clips to that folder as MP3 files.
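Placing the initial labels by hand can be tedious for long recordings. The following is a minimal sketch, not part of our original workflow, that generates a label file with one point label every 15 seconds. It assumes Audacity's plain-text label format (one label per line: start time, end time, and label text, separated by tabs, with times in seconds), which can be brought in through Audacity's label import. The function names and the `clip_001` naming pattern are our own illustrative choices. Each label still needs to be aligned by hand with a pause in the waveform.

```python
def make_point_labels(duration_s, interval_s=15.0):
    """Return (start, end, name) point labels spaced interval_s apart.

    The first label sits at 0.0 so that it marks the start of the first clip.
    """
    labels = []
    index = 0
    while index * interval_s < duration_s:
        t = index * interval_s
        # A point label has identical start and end times.
        labels.append((t, t, f"clip_{index + 1:03d}"))
        index += 1
    return labels


def format_label_track(labels):
    """Render labels as tab-separated lines: start, end, label text."""
    return "".join(
        f"{start:.6f}\t{end:.6f}\t{name}\n" for start, end, name in labels
    )


def write_label_file(path, duration_s, interval_s=15.0):
    """Write a label file for a track of the given length (in seconds)."""
    text = format_label_track(make_point_labels(duration_s, interval_s))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
```

For example, `write_label_file("labels.txt", duration_s=1800)` would write 120 labels for a 30-minute track.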

2. Upload the audio to a file sharing platform

Once you’ve segmented your audio, you need to make the clips available to workers. We currently use Dropbox, which allows the uploading of multiple files at one time, and supports file sharing by assigning each file a unique URL. Once you have uploaded your audio to Dropbox, copy and paste the URL for each clip in sequential order into a spreadsheet. Afterwards, you can upload this spreadsheet to a crowdsourcing platform, which will then channel each of the URLs to a separate worker. When workers click on the URLs, they are taken to a Dropbox page where the clip is streamed.
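If you have many clips, the spreadsheet of URLs can be assembled with a short script rather than by hand. The sketch below assumes you have already collected each clip's file name and share URL (for instance by copying them from Dropbox); it does not call Dropbox itself. The column names `clip` and `url` and the example.com addresses are placeholders, so adjust them to whatever your crowdsourcing platform's template expects. The sort treats numbers in file names numerically, so that clip 2 comes before clip 10.

```python
import csv
import io
import re


def natural_key(name):
    """Sort key that orders embedded numbers numerically (2 before 10)."""
    return [
        int(part) if part.isdigit() else part.lower()
        for part in re.split(r"(\d+)", name)
    ]


def build_url_rows(links):
    """Return [file name, URL] rows in sequential clip order.

    links maps each clip's file name to its share URL.
    """
    return [[name, links[name]] for name in sorted(links, key=natural_key)]


def rows_to_csv(rows, header=("clip", "url")):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder data: replace with your own file names and share URLs.
example_links = {
    "interview-10.mp3": "https://example.com/c",
    "interview-2.mp3": "https://example.com/b",
    "interview-1.mp3": "https://example.com/a",
}
example_csv = rows_to_csv(build_url_rows(example_links))
```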

3. Design and launch the project

Crowdsource transcription

You can crowdsource transcription through microWorkers.com. Microworkers (MW) is a bare-bones site, but it offers access to an international pool of workers, free email support, intuitive video tutorials, and templates for various kinds of jobs including transcription/translation. Once you’ve created an account with MW, allow a few hours for exploring their guidelines and FAQ, watching their video tutorial on creating a translation job, experimenting with the site and corresponding with their help team as you learn.

Microworkers refers to a crowdsourcing project (e.g. an interview for transcription) as a ‘campaign’. The discrete units of work (e.g. mini-transcriptions) which workers are assigned are called ‘tasks’.

Select a pool of workers

The first step in creating a new campaign is to select a pool of workers to whom the project will be made available. For skilled work like transcription/translation, MW recommends creating a ‘Hire Group’ campaign. After selecting this option, you can choose workers based on their nationality, the city where they live, the number of tasks they’ve performed, and the average rating their work has received from employers to date. If you’d rather not consider these details, MW provides a number of pre-defined groups from which to choose (‘Top India workers’, ‘All German workers’, etc.).

Our advice on selecting workers is to begin with a broad pool, then refine the group over subsequent campaigns. We opened our first transcription campaign to all Indian workers, kept notes on workers’ performance, and winnowed out poor workers over several rounds of transcriptions to create a reliable group.

Set project parameters

MW next asks you to categorize your campaign, select or create an instructional template (we discuss this in the next section), and estimate the time that each task will take to complete. You then enter the number of tasks available, the number of tasks that each worker may do, and the amount they’ll be paid for each task.

  • In estimating time allowances, we use a formula of 10 x real-time audio. In our experience, this is a reasonable speed for non-expert transcription, particularly if your audio clips are short. The time allowance will be lower if the worker is just tagging content with themes or making rough notes from the audio recording.

  • Limit the number of tasks each worker can complete to just a few initially. Once you’ve come to trust a number of workers and have refined your group accordingly, you can allow each worker to take on a larger proportion of your project.

  • It may take some thought and experimentation to determine a level of payment that is ethical, attractive to workers, and affordable for you. We offer workers $0.15 to transcribe 15-second clips. This works out to an hourly wage of about $4, which is reasonable in an Indian context.

MW provides a quote for the campaign based on the number of tasks to be completed and your rate of pay. You must deposit adequate funds in your account in order for your campaign to run.

  • Choose a title for your campaign that is descriptive and concise. “Transcribe Hindi audio clip (10-20 sec)” has worked for us.
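The time-allowance and payment figures above can be checked with a little arithmetic. The sketch below is only an illustration: the function names are ours, and the totals cover worker pay alone, not any platform fee that MW may include in its quote.

```python
import math


def time_allowance_s(clip_s, factor=10.0):
    """Time allowed per task, using the 10 x real-time rule of thumb."""
    return clip_s * factor


def hourly_rate(pay_per_task, clip_s, factor=10.0):
    """Implied hourly wage if every task takes the full time allowance."""
    tasks_per_hour = 3600.0 / time_allowance_s(clip_s, factor)
    return pay_per_task * tasks_per_hour


def worker_pay_total(audio_s, clip_s, pay_per_task):
    """Total worker pay for a recording cut into clips of clip_s seconds."""
    task_count = math.ceil(audio_s / clip_s)
    return task_count * pay_per_task


# A 15-second clip paid at $0.15, for a 30-minute (1800 s) interview.
allowance = time_allowance_s(15)
wage = round(hourly_rate(0.15, 15), 2)
total = round(worker_pay_total(1800, 15, 0.15), 2)
```

With these figures, a 15-second clip gets a 150-second allowance, so a worker can complete 24 tasks per hour. At $0.15 per task that is $3.60 per hour, in line with the estimate of about $4. A 30-minute interview yields 120 tasks, or $18 in worker pay.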

Provide instruction

MW asks that you provide instruction in the form of a template, from which you can draw repeatedly as you run subsequent campaigns. Guidance in creating a template for transcription/translation is available here.

The template for transcription is appended. We include a two-sentence description of the job, identifying the task involved and the larger project to which it will contribute. We then provide very concise, basic instructions for transcription, followed by a number of more subtle notes.

Direct workers to audio clips

Workers encounter your audio clips through a link that is embedded in your instructional template. MW asks that you upload audio links in the form of a spreadsheet, which you must save as a CSV file. A template is provided for download. In uploading your completed file, you have the option to direct workers to audio links in a random order. Should you select this option, the finished CSV will still display the finished micro-transcriptions in sequential order.

Launch the project

Having designed your project, you are ready to submit it to MW for approval. Approval may take several hours or longer.

Approve the finished transcripts for payment

Once you’ve launched your campaign, MW allows you up to 7 days to approve your workers’ submissions. Approving tasks is an essential step in evaluating workers and weeding out low-quality work. Any work that does not meet your standards can be marked ‘unsatisfactory’, in which case the worker responsible will not be paid, and the task will be opened to other workers instead. Unsatisfactory ratings do impact workers negatively, so use this option only for work that is clearly fraudulent or very poorly done.

After 7 days, all work is automatically approved and the workers paid.

Download the transcripts in spreadsheet form

Following approval of all submitted tasks, you may download your finished mini-transcripts as a CSV file.

4. Paste the transcripts into a document and edit where needed

Complete your crowdsourced transcript by copying the mini-transcripts from your CSV, pasting them into a word processing document, and editing for punctuation/style and accuracy. Check the transcript against your edited audio recording (saved as a WAV file in step 1), and correct any errors as you go. This may not be necessary if your requirements around accuracy are relatively low.
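If copying and pasting hundreds of mini-transcripts is impractical, the re-aggregation can be scripted. This is a minimal sketch under stated assumptions: the column names `transcript` and `clip` are hypothetical, so check the header row of your downloaded CSV and substitute the real names. Sorting is optional, since the downloaded file should already be in sequential order. The result still needs the manual editing and checking described above.

```python
import csv
import io


def assemble_transcript(csv_text, text_column="transcript", order_column=None):
    """Join mini-transcripts from CSV text into one draft transcript.

    If order_column is given, rows are first sorted by that column's
    integer value; otherwise the file's own row order is kept.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if order_column is not None:
        rows.sort(key=lambda row: int(row[order_column]))
    pieces = [row[text_column].strip() for row in rows]
    # Skip empty submissions so they do not leave double spaces.
    return " ".join(piece for piece in pieces if piece)


def assemble_from_file(path, text_column="transcript", order_column=None):
    """Read a downloaded CSV file and return the joined transcript."""
    with open(path, newline="", encoding="utf-8") as handle:
        return assemble_transcript(handle.read(), text_column, order_column)
```

The returned text can then be pasted into a word processing document for editing.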

Congratulations! You have crowdsourced your transcription/translation job.

Further reading

There are many technical papers available on crowdsourcing translation/transcription, but these are the two pieces that we found most practical and helpful:

Parent, G. (2013). Crowdsourcing for speech transcription. In M. Eskénazi, G.-A. Levow, H. Meng, G. Parent and D. Suendermann (Eds.), Crowdsourcing for Speech Processing: Applications to Data Collection, Transcription and Assessment (72-105). Oxford: John Wiley & Sons.

Wray, S., Mubarak, H., and Ali, A. (2015). Best practices for crowdsourcing dialectal Arabic speech transcription. Proceedings of the Second Workshop on Arabic Natural Language Processing (99-107). Retrieved from: http://www.aclweb.org/anthology/W15-3211
