How to automatically transcribe audio into text

Compared to traditional solutions, Happy Scribe helps save up to 70% of time spent transcribing interviews, podcasts, et cetera.
Happy Scribe provides a free trial of its automated transcription service, after which the price is only €0.09 per minute, with accuracy on par with other tools available.
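As a rough illustration of the pricing above, the sketch below estimates what a recording would cost at €0.09 per minute. The function name, the proportional billing of partial minutes and the rounding to the nearest cent are assumptions made for this example; they are not taken from Happy Scribe's actual billing rules.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-minute rate quoted in the article (EUR).
RATE_PER_MINUTE = Decimal("0.09")


def estimate_cost(duration_seconds: int, rate: Decimal = RATE_PER_MINUTE) -> Decimal:
    """Estimate the transcription cost in euros for one recording.

    Assumption: partial minutes are billed proportionally and the total
    is rounded to the nearest cent. Real billing rules may differ.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    minutes = Decimal(duration_seconds) / Decimal(60)
    return (minutes * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # A 45-minute interview: 45 x 0.09 = 4.05
    print(estimate_cost(45 * 60))
```

Decimal arithmetic is used instead of floats so that the cents come out exactly.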

So how does one transcribe a file on this inexpensive software with a simple interface?
It’s an easy three-step process:

1. Record your audio/video

You do not need sophisticated recording equipment; even your phone can record great audio files. Having said that, equipment does matter, especially when recording in conditions with background noise. The following tips help produce the best audio/video file, which in turn will produce text of superior quality.

  •  Indoor recording please!
    As far as possible, try indoor or in-studio recording, as editing out noise is quite an arduous task. If recording outdoors, use a “dead cat,” “wind muff,” or windshield. These fluffy microphone covers minimize, if not completely eliminate, wind noise.
  • Lean in principle
    The closer the microphone is to the speaker, the better. Enunciating the words well will ensure better sound fidelity.
  •  Uniformity in speech
    Being the sensitive devices they are, microphones do not pick up distorted sound correctly. Therefore, try to keep an even volume and pace while speaking.


2. Upload

We are quite flexible: Happy Scribe can transcribe almost all types of audio and video formats from speech to text (even ignoring the intro music!). We don’t limit the size of the file that you upload to our service. You can simply upload a file from your computer, or you can try the advanced uploader, which gives you the option to upload your files from a URL, Dropbox, Evernote, Facebook, Box, or OneDrive. Using the advanced uploader, you can also record an audio or video file directly on the Happy Scribe website.

Once you’re done uploading, simply select the language in which the interview was recorded (PS: We have 119+ languages available!), then create an account and you’re good to go!


3. Proofread it

Proofreading is one of the last steps in the journey of audio or video transcription. Even if all the above steps are followed with care, certain inaccuracies can slip into the text. It is therefore strongly advisable to proofread the text completely before it is sent back to the client or customer. Proofreading is usually most effective after a short break from transcribing the audio. It is an essential step in the whole process, as it can resolve any incomprehensible or inaudible words that may have been transcribed incorrectly.

One great tip for proofreading with the Interactive Editor is to do a quick spell-check in reverse order, or to read the text aloud to see if everything is in sync!



Saikruti Kesipeddi

Akanksha Tiwari

Sumer Jagda

Transcription tools to transcribe audio into text

Did you know that transcribing audio and video content can directly improve SEO? Transcribing all those interviews, podcasts, webinars, and videos you put online and posting the transcripts alongside your content will give you a quick SEO benefit [1]. Whether you’re a PhD candidate or a research analyst conducting multiple focus groups or a series of in-depth interviews, having the conversations transcribed is invaluable for reporting.

Transcription involves listening to a recording of something and typing the contents up into a document, which is then returned to the client, giving them a written record of what’s on the recording.

The time it takes to transcribe a recording depends on several factors, such as the speed at which people are talking, the number of people involved in the conversation, the clarity of the recording (background noise should be minimal) and how clearly each person speaks.

Currently, there are three main transcription options available:

  • Using Automatic Transcription Software
  • Manual Transcription
  • Outsourcing transcripts


Automatic Transcription Software

Happy Scribe is a transcription tool that costs just $0.10 a minute (1/10 of the cost of human-powered services). It works with 80 languages and accents, and over 4,000 journalists and researchers have already used the service[1]. For 9p per minute, you upload your recording, receive the transcription back in under 30 minutes, and get access to an editing tool. This tool lets you play the recording as you edit, highlights the text as it plays, and shows timestamps beside the paragraphs so that you can easily find your place in the recording and edit the transcription as quickly as possible[2].


Manual Transcription

In manual transcription, people, not technology, create the transcript. When accuracy is mandatory, as in the medical and legal fields or at media companies that require closed captioning, manual transcription has too many perks to ignore. It ensures that intricate contextual variations are considered and errors are minimised[1]. Although it has a high success rate, transcribing even a small piece of information takes a lot of time. Voice recognition relies entirely on algorithms designed to match clearly spoken speech, by its sound patterns, against a dictionary-like database in a controlled environment. However, it cannot pick up cultural intonations and variations in dialect as accurately as a human transcriber would[2].


Outsourcing and Delegating Transcripts

For transcribing phone calls or interactions involving two or more speakers, outsourcing to a company that deals with conversation or call analytics can prove helpful. Their biggest advantage is the ability to split audio into two channels and deliver lots of surrounding data. It is a quick way of collecting data for further research and use, and it helps reduce capital investments, costs and overheads. Further, hiring specialists reduces the burden on the company, enabling the entire process to be carried out in a more structured manner.


Conclusion: Representing audible and visible data in written form is an interpretive process that involves making judgments. Decisions about transcribing are guided by the methodological assumptions underpinning a particular project, and there are therefore many different ways to transcribe the same data[1].




Saikruti Kesipeddi

Akanksha Tiwari

Sumer Jagda


Transcription process in qualitative research

Transcription is an integral process in the qualitative analysis of language data and is widely employed in basic and applied research across several disciplines and in professional practice fields (1).

Due to financial constraints on both educational institutions and individuals, transcribing audio and video materials that would otherwise benefit the research process is often given a back seat or offloaded to inexperienced interns or inferior outsourcing firms. Automatic transcription, however, has advantages that help obtain a fine result. In this article, we will discuss some benefits of the transcription process in qualitative research.

Transcription process benefits in qualitative research

Transcription in qualitative research has made the overall process of interpreting data very simple for researchers. It has helped interviewers by enabling them to read, analyse and interpret information with ease, with text that is precise and concise as well as easily understandable.

Easy Interpretation of Data

Transcripts help in easy interpretation of data, as reading text related to the research/interview makes it easy to collate data and structure it better.

Shared copies for future analysis

Transcription of interviews and qualitative research ensures that copies of the document can be distributed to everyone in the team, which further helps in future analysis of the data. It also gives fellow researchers direct access to the data, which is extremely valuable. Audio transcription has also helped enhance teamwork, as tasks are shared between co-workers.

Inclusion of verbatim comments in the report

Instead of having to listen, pause and type, having the focus group or interview pre-transcribed helps generate good-quality, in-depth reports.

Enables a follow up and a detailed examination of the events

Taking notes during the event might result in the interviewer missing out on key pieces of information being mentioned. The word-for-word transcription allows one to listen and interpret what is being said more effectively.

Source of reference and Data interpretation

A transcript provides a source of reference for the interviewer while conducting a follow-up interview, and allows them to go back through the data whenever they reach a standstill while interpreting it.

This data can also be re-used in the course of other investigations and can be used to draw new conclusions in later studies. This extends how far the data can be used and the number of observations that can be made from the same piece of data.

Quick reporting/Browsing

Transcribing qualitative data has made browsing through it much faster. A researcher looking for data can simply use commands like “CTRL-F” to find specific information in the transcript, without wasting time going through the whole text or listening through long audio files.
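The same kind of lookup can be scripted when there are many transcripts to go through. The following is a minimal sketch, assuming the transcript is available as plain text; the function name and the sample dialogue are hypothetical and only serve as an illustration.

```python
def find_keyword(transcript: str, keyword: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs whose text contains the keyword.

    Matching is case-insensitive, like a typical CTRL-F search.
    """
    needle = keyword.casefold()
    return [
        (number, line.strip())
        for number, line in enumerate(transcript.splitlines(), start=1)
        if needle in line.casefold()
    ]


if __name__ == "__main__":
    # Made-up transcript excerpt used only for this example.
    sample = (
        "Interviewer: How did you fund the project?\n"
        "Participant: Mostly through a research grant.\n"
        "Interviewer: Was the grant renewed?\n"
    )
    for number, line in find_keyword(sample, "grant"):
        print(number, line)
```

Returning line numbers alongside the text makes it easy to jump back to the matching passage in the full transcript.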

Transcribing is an arduous process, even for the most experienced transcribers, but it must be done to convert the spoken word to the written word and so facilitate analysis. Over the last decade, advances in technology have brought about new possibilities in the field of transcription. Clients now have the option of automated transcription software like Happy Scribe. Happy Scribe will transcribe your audio or video file using voice recognition technology within a few minutes, and you will be able to access your transcript through your dashboard.

Authors:

Akanksha Tiwari
Saikruti Kesipeddi
Sumer Jagda

Podcasters: Get your SEO right!

Podcasts are the modern radio; in fact, they are better. They offer on-demand shows and free information, and they present huge marketing opportunities that benefit both advertisers and listeners. Making optimal use of SEO for podcasts can result in significant gains for any business (1).


What is search-engine optimization (SEO)?

To understand the concept of SEO, it is imperative to look at the various parts it consists of:

Quality of traffic: You can attract all the visitors in the world, but if they’re coming to your site because Google tells them you’re a resource for Apple computers when really you’re a farmer selling apples, that is not quality traffic. Instead, you want to attract visitors who are genuinely interested in the products that you offer.

Quantity of traffic: Once you have the right people clicking through from those search engine results pages (SERPs), more traffic is better.

Organic results: Ads make up a significant portion of many SERPs. Organic traffic is any traffic that you don’t have to pay for (2).


SEO Tips for your Podcast


It’s all in the Name (Title):
To attract attention, a podcast title must be catchy and precise, yet informative at the same time. As it is the first thing that appears, it must make a good first impression in order to push the user to subscribe or listen to the podcast.


Finding the Right Keyword(s)
Keywords help with search queries and tell search engines what the article is about. They should be as short as possible so they can be used in various search criteria.


Use Rich images
Using rich images can help enhance the conversion rate of a user towards your podcast. In order to optimise this, keywords can also be used on the image in order to create a recall factor for the user.


Optimize your site and media for speed
Having highly active social accounts helps raise the profile of your content on search engines and social channels. Google+ is an important channel to ensure the content gets ranked highly.


Reasons to have Podcasts Transcribed:


The reading effect:
Reading gives users the option to skim through content and get an overview of what the podcast is about. Listening to a whole podcast can be time-consuming, while scanning a document might actually push them to subscribe to the podcast much faster.


The internet effect:
Transcribing podcasts makes sharing on social media extremely easy for users. This not only helps increase the reach of the podcast but also ensures a high level of engagement, and it makes it possible to highlight a particular section of the podcast.


The accessibility effect:
Transcripts open up a new way for the audience to approach the content of podcasts: by reading them. This also makes podcasts accessible to people who are deaf or hard of hearing.


“Text is the secret. Converting podcast recordings into a transcript makes it possible for podcast producers to give listeners a better experience.”

Authors:

Akanksha Tiwari
Saikruti Kesipeddi
Sumer Jagda

Happy Scribe’s Bits #001

Hello   👋

Over the past two weeks, we have spent time calling some of you. We have been amazed by the diversity of content that you create and we love the idea that Happy Scribe can help you to produce it. Starting this month, we would like to share with you a short newsletter. The idea is to share the content that YOU produce.


If you want to share your work, just reply to this email with a short description and a link to your work. We’ll take care of spreading the word 💖



Opposable Thumbs
Opposable Thumbs is a bi-weekly creative challenge podcast. They invite a guest on each show to talk about what they created. They discuss their successes, failures and some cool tools they love.


🧠  THE NEWS    

History of Voice Recognition
From the Audrey system built in 1952 by Bell Labs to the recent breakthrough of Google, we look at how voice recognition has evolved in the last decades.



We are Happy Scribe #001
We listen to a lot of music! So instead of keeping it to ourselves, we decided to start a playlist. If you have a playlist that you like, send us the link 🙂




Marc Assens and André Bastié


PS: We are hiring a frontend developer. If you are interested, just reply to this email 🙂

History of Voice Recognition

Voice recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format.

Designing a machine that mimics human behavior, especially the capability of speaking and responding to speech, has intrigued engineers and scientists for centuries. Speech technologies have witnessed a dramatic transformation, from what started as a speech machine using resonance tubes, to Graham Bell’s first recording device, to the Dictaphone and the first voice synthesizer, the Voice Operating Demonstrator (VODER), to today’s smart virtual assistants like Apple’s Siri or Amazon’s Alexa. Thanks to advancements in AI, voice recognition technology is gaining popularity. According to a recent U.S. Cellular survey, 36% of smartphone owners use a virtual assistant daily and 30% use smart home technology daily. This connectivity is expected to increase, with the number of devices and sensors predicted to rise 200% to 46 billion by 2021.

The idea is to transform recorded audio into a sequence of words, as an alternative to typing on a keyboard. From helping people with physical disabilities to transcribing interviews, learning a new language or accessing a file via voice commands, speech recognition is used in a number of applications. Voice recognition systems make interacting with technology easier by enabling hands-free requests.


From 1952 to today.

The earliest voice recognition technologies could only comprehend digits. The Audrey system, built by Bell Labs in 1952 and considered to be the first speech recognition device, recognised only ten digits spoken by a single voice. This was followed by the Shoebox machine, developed by IBM in 1962, which could recognise 16 English words: the 10 digits and 6 arithmetic commands.

The U.S. Department of Defence made great contributions towards the development of speech recognition systems. From 1971 to 1976, it funded the DARPA SUR (Speech Understanding Research) program, which led to the development of Harpy by Carnegie Mellon, a system that could comprehend 1,011 words. At around the same time, the first commercial speech recognition company, Threshold Technology, was founded, and Bell Labs introduced a system that could interpret multiple people’s voices. In 1978, Texas Instruments introduced Speak & Spell, which was a milestone in speech development because of its use of a speech chip, leading to more human-like digital synthesis of sound. The development of the hidden Markov model, which used statistics to estimate the probability of unknown sounds, proved to be a major breakthrough. Speech recognition even entered the home, in the form of Worlds of Wonder’s Julie doll.

Faster microprocessors

Thanks to the introduction of faster microprocessors, the world’s first speech recognition software for consumers was developed in 1990. It was the first continuous dictation software, meaning one did not have to pause between words. In 1992, Apple also produced its real-time continuous speech recognition system that could recognise as many as 20,000 words.

Smart Assistant

By 2001, speech recognition development had hit a plateau, which lasted until 2008, when Google emerged with its Google Voice Search application for iPhones. In 2010, Google introduced personalized recognition on Android devices, which would record different users’ voice queries to develop an enhanced speech model. That model consists of 230 billion English words. Eventually, Apple’s Siri arrived on the iPhone 4S in 2011, and it relied on cloud computing as well.

The Breakthrough

A Stanford study revealed that speech recognition is now about three times as fast as typing on a cell phone. Once 8.5%, the error rate has now dropped to 4.9%. These technological advances have given rise to multiple applications like transcription assistant tools including Happy Scribe.

Little Known Facts About Speech Recognition Technology
  1.  Technically speaking, speech recognition goes way back to 1877 when Thomas Edison invented the phonograph, the first device to record and reproduce sound.
  2. When it comes to speech recognition, accuracy is measured by a Word Error Rate calculation, which tracks how often a word is transcribed incorrectly.
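To make the Word Error Rate idea concrete, here is a minimal sketch of one common way to compute it: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the machine output, divided by the number of words in the reference. Note that this standard formulation also counts missing and extra words, not only incorrectly transcribed ones. The function name and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER = (substitutions + deletions + insertions) / reference length.

    Words are compared case-insensitively after splitting on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of four gives a WER of 0.25.
    print(word_error_rate("the quick brown fox", "the quick brown box"))
```

Because insertions are counted too, the rate can exceed 1.0 when the machine output is much longer than the reference.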


Authors:

Akanksha Tiwari
Saikruti Kesipeddi
Sumer Jagda

5 great tools for journalists and writers

Marc and I have curated a list of tools that could be useful in your everyday work. It includes five tools that we love and that, we believe, could be useful for you too. Don’t hesitate to contact me to tell us about your favourite tools.

1. OmmWriter, to write your articles

Thomas liked your status, Franck retweeted your last tweet, Lucie sent you an urgent email. When writing, it’s pretty difficult to remain focused for a long period of time. We’re all continuously distracted by social media notifications, emails or random phone calls. Not surprisingly, this decreases our productivity, which annoys us. But what are we doing about it? Not much, in most cases.

In the early days of Happy Scribe, we faced the same issue when we had to write our theses. We had to make the most of our time. OmmWriter has been a game changer for us. This simple, easy-to-use editor is extremely minimalistic and really increases your writing productivity. In addition, you get nice ambient music while writing and a really addictive little noise when typing.

Price: $7.38


2. Happy Scribe, to transcribe your interviews 

You are probably doing tons of interviews, and you know how tedious transcribing can be. Research actually shows that it can take up to FIVE hours to transcribe a single hour of interview. This. is. insane.

Happy Scribe is a transcription tool that uses voice recognition technology to transcribe interviews from speech to text within just a few minutes. You can transcribe interviews in almost any language and accent (119+ available to date).

By using automated transcription, you can save up to three hours of work when transcribing an interview. We recommend uploading high-quality audio files to ensure that the machine transcribes your interview accurately, which will result in less time spent proofreading it.

Price: $0.10 per minute

 Happy Scribe

3. TweetDeck, to stay connected

Do you want a better overview of what is going on in your industry? Or maybe you just want to see how your audience reacts to your last article? Then TweetDeck is the perfect tool for you. You can track specific keywords or links and see all the tweets related to the query.

The tool can also be really useful when covering an event. It gives you the ability to see all the tweets published in real time.

Price: Free



4. Office Lens, to scan your documents 

Quite often I get documents or business cards (or even receipts) from other people and, to be honest, I don’t really know what to do with them. I have tried most of the scanning apps out there. None of them really convinced me, but last week I tried Office Lens and it was exactly what I needed. It’s super easy to use and entirely free.

Price: Free
Link: Apple Store – Android Store

Office Lens


5. Pocket, to save your reading list 

How many times a day do you come across a great article but don’t have time to read it? Often, I would guess. Well, I have the same problem. For a few years now, I have been using Pocket, which comes with a Twitter integration and a Chrome extension.
Each time you see an interesting article, you just have to hit the Pocket icon, and the article will then be available to you within the application in a format optimised for reading and with offline access (think about how useful this is on a boring public transport commute). Moreover, Pocket will recommend articles that people with similar interests have read. It’s a godsend.


Price: Free