Transcription and GDPR: transcribe interviews without exposing your data
Apple unveiled redesigned dictation and voice assistant features this week. One detail went almost unnoticed: these features will not be available in Europe, for regulatory reasons. The signal is clear. The moment you touch voice and personal data, the European framework sets the rules.
And transcribing an interview is exactly that: processing personal data. A recording contains a voice, a name, sometimes sensitive information. Turning it into text does not change your responsibility. This guide explains what the GDPR expects from you when you transcribe, and how to reduce the risk in practice.
Why a transcription falls under the GDPR
An interview recording contains personal data
A voice is personal data. So is a spoken name, a mentioned employer, a referenced place. As soon as a recording can identify a person, directly or indirectly, it falls within the scope of the GDPR.
The transcription does not change that status. The text file still contains personal data, and creating, storing or sharing it is processing under the GDPR, just like recording the original audio. You are responsible for both.
The case of sensitive data
Some interviews go further. A health research interview, an HR conversation about sick leave, a legal testimony: these can reveal a person's health, political opinions or beliefs, sexual orientation, or legal situation.
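The GDPR treats most of this information as special categories of personal data, commonly called sensitive data, and data relating to criminal convictions and offences has its own specific rules. Processing either is more tightly regulated. The basic rule: collect only what is necessary, and protect what you keep.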
The concrete obligations when you transcribe
Legal basis and informing the people involved
You need a valid legal basis to process this data. For research, it is often consent or a legitimate interest that you have assessed and documented. For an HR interview, it is often the performance of the employment contract or a legal obligation.
The person interviewed must know that the conversation is being recorded and transcribed, and what the transcription will be used for. A clear statement at the start of the interview is enough in most cases.
Retention period for recordings and transcriptions
You cannot keep the files indefinitely. Set a period, justified by your purpose. A study ends. A case is closed. Beyond that, the audio and the transcription must be deleted or irreversibly anonymised.
Document that period, for example in your record of processing activities. In the event of an audit, it is one of the first points checked.
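To make the deadline concrete, the sketch below records a retention period alongside each file and lists the files whose period has expired. It is a minimal illustration under stated assumptions: the `InterviewFile` record, its field names and the `overdue` helper are hypothetical, and the retention period itself still has to be chosen and justified by your purpose.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical record for one stored file; field names are illustrative only.
@dataclass
class InterviewFile:
    name: str
    collected_on: date
    retention_days: int  # period justified by the purpose, documented elsewhere

    def deadline(self) -> date:
        """Last day the file may be kept before deletion or anonymisation."""
        return self.collected_on + timedelta(days=self.retention_days)


def overdue(files: list[InterviewFile], today: date) -> list[str]:
    """Hypothetical helper: names of files kept beyond their deadline."""
    return [f.name for f in files if today > f.deadline()]


if __name__ == "__main__":
    files = [
        # The audio and its transcription are tracked as separate entries.
        InterviewFile("interview_01.wav", date(2023, 1, 10), 365),
        InterviewFile("interview_01.txt", date(2023, 1, 10), 365),
        InterviewFile("hr_case_7.txt", date(2024, 3, 1), 730),
    ]
    print(overdue(files, today=date(2024, 6, 1)))
    # prints ['interview_01.wav', 'interview_01.txt']
```

In practice the returned list would feed a deletion or anonymisation step. Tracking the audio and the transcription separately is deliberate, since both have to go at the deadline.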
Hosting location and transfers outside the European Union
This is the most often overlooked point. When you send an audio file to a transcription service, where does it go? Many tools process data on servers outside Europe, which triggers additional transfer obligations.
A service hosted in the European Union simplifies compliance: as long as the provider and its subprocessors keep the data there, it stays within the European framework and there is no transfer to manage. Before choosing a tool, ask where its servers are located. The answer should be easy to find.
The right habits by profession
Social science researcher
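Anonymise transcripts before sharing them with your team or archiving them. If you need to re-identify participants later, keep the correspondence key separately, with restricted access. Note that as long as this key exists, the data is pseudonymised rather than anonymised, and it remains personal data under the GDPR. Check that your transcription tool can export in a format your analysis software reads, so you avoid manual copy-pasting that multiplies copies of the file.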
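A minimal sketch of that pseudonymisation step, assuming the researcher has already listed the forms each participant's name takes in the transcript. The `pseudonymise` function and the `P01`-style codes are hypothetical, not a feature of any particular tool. It returns the coded text and the key as two separate objects so they can be stored in different places. It only handles names; employers, places and other indirect identifiers still need a manual review.

```python
import re


def pseudonymise(text: str, people: list[list[str]]) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: replace each person's mentions with a stable code.

    `people` lists, for each participant, the forms used in the transcript,
    e.g. ["Marie Dupont", "Marie"]. Returns the coded text and the key
    (code -> full name), which must be stored apart from the text.
    """
    form_to_code: dict[str, str] = {}
    key: dict[str, str] = {}
    for i, forms in enumerate(people, start=1):
        code = f"P{i:02d}"
        key[code] = forms[0]
        for form in forms:
            form_to_code[form] = code
    if not form_to_code:
        return text, key
    # Longest forms first, so "Marie Dupont" is matched before "Marie".
    alternatives = sorted(form_to_code, key=len, reverse=True)
    # Word boundaries stop "Marie" from matching inside "Mariette".
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")
    coded = pattern.sub(lambda m: form_to_code[m.group(0)], text)
    return coded, key
```

For example, `pseudonymise("Marie Dupont works at the clinic. Marie said Paul was late.", [["Marie Dupont", "Marie"], ["Paul"]])` returns the text `"P01 works at the clinic. P01 said P02 was late."` and the key `{"P01": "Marie Dupont", "P02": "Paul"}`.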
Lawyer and HR department
Limit access to transcriptions to the people directly involved in the case. An HR interview about an individual situation does not need to circulate. Set a retention period aligned with the nature of the case, and delete at the deadline.
Journalist
Source protection comes first. Anonymise any passage that could identify a sensitive contact before any sharing or archiving. Avoid storing raw recordings longer than necessary on services whose location you do not control.
Anonymising a transcription: step by step
Identify the segments to mask
Reread the transcription and spot anything that can identify a person: surname, first name, employer, precise role, place, specific date, unique biographical detail. Taken together, these elements are often enough to recognise someone even without their name.
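A rough first pass can be automated before the manual reread. The sketch below, a hypothetical helper rather than any tool's method, flags occurrences of terms you supply (names, employers, places) plus two illustrative patterns: dates written like `12/03/2021` or `12 March 2021`, and email addresses. It returns character spans for a human to check. It will miss unique biographical details and combinations of clues, which is why the reread remains necessary.

```python
import re

# Illustrative patterns only: they catch a few obvious formats, not all identifiers.
MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b|\b\d{1,2} (?:" + MONTHS + r") \d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def find_candidates(text: str, terms: list[str]) -> list[tuple[int, int, str]]:
    """Hypothetical helper: (start, end, label) spans a reviewer should check."""
    spans = []
    for term in filter(None, terms):  # skip empty strings
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            spans.append((m.start(), m.end(), "term"))
    for label, pattern in (("date", DATE), ("email", EMAIL)):
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    # Sorted by position so the reviewer reads them in transcript order.
    return sorted(spans)
```

Run on `"I met Marie at Clinique Sud on 12 March 2021. Write to m.d@example.org."` with the terms `["Marie", "Clinique Sud"]`, it flags the two terms, the date and the email address, in that order.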
Anonymise before sharing or archiving
Replace or mask these segments before the transcription leaves your workstation. An editor that lets you select a passage and anonymise it directly avoids mistakes and saves time. This is one of YobiYoba's built-in features: select a sensitive segment and mask it, without moving the file between several tools.
Anonymisation done upstream protects the person and protects you. Once a passage is masked, it no longer circulates.
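Once the segments are chosen, masking them is mechanical. This sketch assumes the spans are given as `(start, end)` character offsets and uses a hypothetical `[MASKED]` placeholder. It merges overlapping selections and rewrites the text from right to left so that earlier offsets stay valid. Unlike a pseudonymisation code, the placeholder keeps no link to the original, so nothing can be re-identified from the masked token itself. It is not a description of how YobiYoba or any other editor implements the feature.

```python
def mask(text: str, spans: list[tuple[int, int]], placeholder: str = "[MASKED]") -> str:
    """Replace each (start, end) span with a placeholder; overlapping spans are merged."""
    merged: list[list[int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching the previous span: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Rebuild from right to left so earlier offsets remain valid.
    for start, end in reversed(merged):
        text = text[:start] + placeholder + text[end:]
    return text
```

For example, `mask("Call Marie at 0612", [(5, 10), (14, 18)])` returns `"Call [MASKED] at [MASKED]"`. Masked text can still identify someone through context, so the reread described above still applies.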
Choosing a GDPR-compliant transcription tool
Three concrete criteria to check before you commit.
First, hosting. Are the servers in the European Union? If so, you avoid the question of international transfers.
Next, anonymisation. Does the tool let you mask sensitive segments directly, or do you have to fix the transcription afterwards in a word processor?
Finally, exports. Can you retrieve the transcription in the format you need (text, subtitles, ELAN, Praat) without duplicating the file at every step? The fewer copies there are, the lower the risk.
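To show what an export involves, here is a sketch that renders timed segments as SRT, one common subtitle format, entirely in memory so that a single file can be written at the end. The `(start, end, text)` segment layout and the function names are assumptions for illustration. ELAN and Praat use their own formats, which this sketch does not cover.

```python
def to_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render hypothetical (start, end, text) segments as one SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        # Each cue: index, time range, text, then a blank line between cues.
        blocks.append(f"{i}\n{to_timestamp(start)} --> {to_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```

For example, `to_srt([(0.0, 2.5, "P01: Yes.")])` yields a single cue numbered 1 and timed `00:00:00,000 --> 00:00:02,500`. Generating the export from the already anonymised working transcript, rather than from a pasted copy, is what keeps the number of copies down.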
In practice
The GDPR compliance of a transcription does not rest on one more legal document. It rests on a few simple decisions made at the right time: inform the person, choose a tool hosted in Europe, anonymise before sharing, delete at the deadline.
This week's news is a reminder. The European framework is demanding, and there is little sign it will loosen. Better to build these habits into your workflow now than to scramble for them under the pressure of an audit.
YobiYoba was built for this context: hosting in Europe, manual segment anonymisation, multiple exports without needless duplication. Everything you need to transcribe your interviews while keeping control of your data.