The 9 Best Speech-to-Text Apps in 2023 (Tried & Tested)

Most speech recognition apps have no trouble transcribing a native speaker recorded with a professional microphone in a quiet room. That isn’t a challenge.

So to test them more thoroughly, I created a nightmare recording of two non-native speakers with loud city background noise.

How did they fare?

Let’s find out.

Otter

Otter was one of the most frequently mentioned solutions when we asked for suggestions on Twitter and in the Ahrefs community. And for good reason. It’s easy to set up, has an intuitive interface, and offers clear pricing.

Unique features

What stands out from the rest is the app’s ability to record online meetings and transcribe them simply by pasting the meeting URL. But you can also import a video/audio file or record audio right in the app.

Additionally, you can connect your calendar to never miss a meeting.

Transcript quality

I got decent results, but there was a lot to edit too.

It didn’t get some names right. But I can’t blame any tool for not picking up “Ahrefs” or “Tim Soulo” 100% of the time.

One thing I found is that after it notified me that the transcriptions were ready, it’d still do something in the background (adjust time stamps, tag speakers, etc.). Like a student still scribbling on a test paper while passing it to the teacher.


Pricing

You can start for free and upgrade to a paid plan later. You can import up to three files and record 290 minutes of meetings before you need to upgrade (as of April 2023).

Rev

Setting up an account was a no-brainer. I found the interface easy to navigate as well. One personal remark is that it felt a little too “cold” to use since I saw things like “Place Order,” “Billing,” and “Invoice” way too often.

You might get the impression that it was designed by an accounting team (as opposed to Descript, which comes next in this roundup).

Unique features

Besides auto-generated transcripts, Rev offers live captions for Zoom meetings. You also have the option to place an order for human transcriptions.

Transcript quality

Poor audio with city noise was a bit too much for Rev. Some words were missing, while others were misrecognized. As a result, some paragraphs didn’t make much sense, while others were fine.
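If you want to put a number on “missing” and “misrecognized” words instead of eyeballing them, the usual yardstick is word error rate (WER): the word-level edits needed to turn the tool’s transcript into a correct one, divided by the length of the correct one. Below is a minimal Python sketch. It is not how any of these tools score themselves, the function name `word_errors` is made up for this example, and the sample sentences are hypothetical. The split into substitutions, deletions, and insertions reflects one optimal alignment; other alignments with the same total can exist.

```python
def word_errors(reference: str, hypothesis: str) -> dict:
    """Compare two transcripts word by word (case-insensitive, split on whitespace)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # cell[i][j] = (total edits, substitutions, deletions, insertions) needed
    # to turn the first i reference words into the first j hypothesis words.
    cell = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        cell[i][0] = (i, 0, i, 0)  # every reference word was dropped
    for j in range(1, len(hyp) + 1):
        cell[0][j] = (j, 0, 0, j)  # every hypothesis word is extra

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                cell[i][j] = cell[i - 1][j - 1]  # words match, no edit needed
                continue
            t, s, d, n = cell[i - 1][j - 1]
            substitute = (t + 1, s + 1, d, n)  # misrecognized word
            t, s, d, n = cell[i - 1][j]
            delete = (t + 1, s, d + 1, n)      # word missing from the transcript
            t, s, d, n = cell[i][j - 1]
            insert = (t + 1, s, d, n + 1)      # word the tool made up
            cell[i][j] = min(substitute, delete, insert)

    total, subs, dels, ins = cell[-1][-1]
    return {
        "wer": total / len(ref),
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
    }


# Hypothetical example: one misrecognized word ("rank" -> "bank"), one missing word ("in").
result = word_errors(
    "we help people rank higher in search",
    "we help people bank higher search",
)
print(result)
```

A lower WER is better. Punctuation is not stripped here, so “trades” and “trades.” count as different words; clean both texts the same way before comparing.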


Pricing

You can transcribe the first audio file (up to 45 minutes) for free. I got a bill for $1.25 with a discount that resulted in a total of $0.00. Thanks, accounting team. 😉

Rev also has a 14-day trial of its paid plan. But that was difficult to find. To locate it, you need to go to the footer of the homepage and look for it under “Services.”

Descript

Descript welcomed me by name (which was a nice coincidence). The main thing you have to know is that it’s standalone software rather than a web service. It’s much more than a speech-to-text converter. It’s basically a video editing tool. And there’s definitely a learning curve. But luckily, onboarding is extremely funny and engaging.

Descript's onboarding is interactive and engaging

Unique features

As I mentioned, Descript is more of a video editing tool that’s good at transcribing. I’d call it “Canva for video/captions.” You can add B-roll, effects, animations, and more.

You can just drag and drop and basically produce a whole video with its help. But if you just need a transcript or captions for a video or audio file, you can do that too.

Transcript quality

My sample audio had pretty muddy results. At times, it had difficulty recognizing abbreviations (e.g., SEO). I also had a problem with removing filler words like uh and um.

I found that if I didn’t choose the option to remove them, they, um, just stayed there even though I didn’t need them most of the time. But if I did choose to remove them, it often ate up parts of other words, causing even more trouble.
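If you end up cleaning fillers out of an exported transcript yourself, the safe way is to match them as whole words only, so that “um” inside “drum” or “umbrella” is left alone. Here is a minimal Python sketch of that idea. To be clear, this is not how Descript does it; the `FILLERS` list and the `remove_fillers` name are assumptions made up for this example.

```python
import re

# Hypothetical filler list; extend it to match your own recordings.
FILLERS = ("uh", "um", "erm")

# \b on both sides means "whole word only". The optional comma and the
# trailing whitespace are removed together with the filler.
_FILLER_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b,?\s*",
    re.IGNORECASE,
)


def remove_fillers(text: str) -> str:
    """Strip standalone filler words without touching the words around them."""
    cleaned = _FILLER_PATTERN.sub("", text)
    # Collapse any double spaces left behind and trim the ends.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


print(remove_fillers("they, um, just stayed there"))
print(remove_fillers("Um, the drum, uh, hummed along"))
```

The first call prints “they, just stayed there” and the second prints “the drum, hummed along”. It works on text only, so it can’t fix audio-level cuts, and a sentence that started with a filler will need its first letter capitalized by hand.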

Also, it couldn’t recognize parts that a human being would have no problem understanding just from context, e.g., “Jack of all trades” became “jackal, trades.”

On the bright side, I believe you can still understand what the text is about.

Descript transcription results


Pricing

You can start with basic functions for free and upgrade if needed.

MacWhisper

MacWhisper is a transcription tool powered by Whisper. Whisper is an automatic speech recognition (ASR) system developed by OpenAI, the same company that brought us ChatGPT.

As OpenAI states on its website:

Whisper is trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

Whisper is not something you can simply “run” as is. What’s more, it’s pretty complicated to set up if you do want to run it yourself. GitHub, Python, you get the gist.

Luckily, there are tools like MacWhisper that take this off your shoulders and let you use the power of AI in a simple user interface.

Unique features

Just plain speech-to-text recognition with time stamps. Unfortunately, it doesn’t auto-tag the speakers.

Transcript quality

When you run the tool, you have to choose a “model” to work with. Basically, the lighter the model, the quicker it will run. But larger models will produce better results. Also, in MacWhisper, these larger (better but slower) models are only available in the paid version.

I decided to start with the free “small” model, which was stated to have “normal speed with good accuracy.”

It was OK, but no better than the competitors. I assumed it would work fine with high-quality audio, but not with the terrible examples I fed to it.

“AI is overrated,” I thought. But before closing the Mac and switching back to my dear Windows PC, I decided to give the “large” model a try.

And you know what? AI is not overrated. I found the results to be much better than anything else.

The transcript was really, really good. It even got things like “Ahrefs” and “SaaS” right! Though still not 100% of the time.
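For those who would rather run Whisper directly than through an app, the same small-versus-large choice comes down to one argument. The sketch below assumes you have Python, ffmpeg, and OpenAI’s open-source `openai-whisper` package installed; `transcribe_with_timestamps` and `format_timestamp` are helper names made up for this example, and the file name in the usage comment is a placeholder. It is a rough illustration, not what MacWhisper does under the hood.

```python
def format_timestamp(seconds: float) -> str:
    """Turn a position in seconds into HH:MM:SS (fractions are dropped)."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def transcribe_with_timestamps(audio_path: str, model_name: str = "small") -> list[str]:
    """Return one '[HH:MM:SS] text' line per segment Whisper recognizes."""
    # Imported here so the helper above works even without the package installed.
    import whisper  # third-party: pip install openai-whisper (ffmpeg must be on PATH)

    # Model names include "tiny", "base", "small", "medium", and "large".
    # Bigger models are slower and need more memory.
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return [
        f"[{format_timestamp(segment['start'])}] {segment['text'].strip()}"
        for segment in result["segments"]
    ]


# Usage (placeholder file name, assumed to exist on your machine):
# for line in transcribe_with_timestamps("interview.mp3", model_name="large"):
#     print(line)
```

The first run downloads the chosen model, which takes a while for the large one. Like MacWhisper, this gives you text with time stamps but no speaker labels.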

MacWhisper transcription results


Pricing

You can run smaller models for free. For a large model, you’ll need to purchase a license.

AI Transcriptions by Riverside

This tool is the easiest to use. Simply drag and drop your file, and it’s ready. It takes some time to process, though.

Unique features

Nothing besides downloading a transcription.

Transcript quality

My first impression was that the results were perfect because, visually, it delivered a confident-looking text:

AI Transcriptions by Riverside transcription results

But after proofreading, I noticed that it simply didn’t include the parts it failed to recognize, sometimes several words in a row.


Pricing

It’s free to use.

Adobe Premiere Pro

Premiere Pro is not exactly a “transcription tool” but rather video editing software. I’m including it because I assume that some companies may already have it in their arsenal (like we do).

To get to the transcription feature in Premiere Pro, just go to the “Captions and graphics” workspace and click “Create transcription.”

Premiere Pro interface—you can generate transcriptions in the "Captions and graphics" workspace

Unique features

If we take only speech recognition into account here, what it does well is creating precise time stamps, auto-tagging the speakers and, if needed, automatically adding an editable captions track to a video project.

Transcript quality

Let’s be straightforward: I found the noisy audio transcript to be a failure. I couldn’t comprehend what people were talking about in the first place.

Adobe Premiere Pro transcription results

Still, I think this feature can be really helpful if you are creating captions from high-quality audio. I used it myself several times and had nothing to complain about when the recording quality was good.

Pricing

You need an Adobe Creative Cloud subscription to use Premiere Pro.

Happy Scribe

While signing up and uploading files is rather easy, you have to spend some time answering questions about you and your company before you can finally get to the tool itself. And no, you can’t skip typing in your company name, your role, and your company size.

But once you get through this, the interface is clean and intuitive.

Unique features

You can generate a transcript or captions for video or audio. There is also an option to request a manual review of the transcript. Alternatively, you can generate subtitles in a different language, so you have transcription and translation in one click.

Happy Scribe features include transcription, subtitles, and foreign language subtitles

Transcript quality

Happy Scribe did a really good job transcribing the audio. It had no problem with words like “SEO” and “SaaS” (clearly the weakest point for many tools). It can also auto-tag the speakers, which might be helpful in certain situations.

Happy Scribe transcription results


Pricing

I could test one file for free. After that, I would need to buy credits to be used for each minute of video or audio transcribed.

Sonix

Sonix is a tool for automatic transcriptions, translations, and integration with meeting apps.

Unique features

Besides meetings integration, which is almost a given for most tools, AI summary generation is an interesting feature (in beta as of April 2023). But I already got impressive results from it.

AI summary from Sonix

You also get some extra tools to work with video captions: a timeline view and an option to split captions into multiple lines. You can also import an existing transcript, and Sonix will sync it with the audio.

Transcript quality

Sonix has a custom vocabulary feature. I found that it helped a bit with names like “Tim Soulo” and “Ahrefs,” but it didn’t work 100% of the time. It mostly did well. But at times, it mistook SEO for CEO and returned the word “Excel” seemingly out of nowhere.
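If your tool of choice has no custom vocabulary, a rough do-it-yourself substitute is to fix near-miss spellings in the exported text afterward. The Python sketch below uses fuzzy string matching from the standard library. It is only an illustration of the idea, not how Sonix implements its feature; the vocabulary list, the `apply_vocabulary` name, the 0.85 cutoff, and the misspelled sample sentence are all assumptions made up for this example.

```python
import difflib

# Hypothetical list of names and terms the transcript should contain.
CUSTOM_VOCABULARY = ["Ahrefs", "Soulo", "SaaS"]


def apply_vocabulary(text: str, vocabulary=CUSTOM_VOCABULARY, cutoff: float = 0.85) -> str:
    """Replace words that are spelled almost like a vocabulary term."""
    lookup = {term.lower(): term for term in vocabulary}
    corrected = []
    for word in text.split():
        # Compare without surrounding punctuation, then put it back.
        core = word.strip(".,!?;:\"'")
        match = difflib.get_close_matches(core.lower(), list(lookup), n=1, cutoff=cutoff)
        if core and match:
            word = word.replace(core, lookup[match[0]])
        corrected.append(word)
    return " ".join(corrected)


# Hypothetical transcript line with two misspelled names.
print(apply_vocabulary("Tim Sulo works at Arefs."))
```

This prints “Tim Soulo works at Ahrefs.” The approach only catches words that look similar on paper. A mix-up like SEO versus CEO, where the wrong word is itself a real word, needs context to resolve, so string similarity alone won’t help there.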

The transcript made sense in general but required quite a few edits if it needed to be perfect.


Pricing

Sonix has a free trial for 25 minutes of transcriptions. After that, you need to purchase pay-as-you-go credits or get a subscription.

Notta

Notta is yet another transcription service that works for both real-time meetings and existing recordings.

Unique features

Besides transcription, Notta focuses on streamlining certain workflows and offers features such as calendar sync and scheduler (in beta as of April 2023).

Transcript quality

Background noise and poor audio quality weren’t deal breakers for Notta. The transcription results turned out mostly OK but still had some problems.

Sentence structure was sometimes a bit weird, certain words went missing, and my favorite “Jack of all trades” part wasn’t that neat this time.

Inconsistency in Notta's transcription

Another thing worth noting is that, for some reason, it failed to recognize two speakers, and the whole interview was tagged as “Speaker 1.”


Pricing

You can start with a free basic subscription and try a three-day trial of the paid plan, Notta Pro.

Final thoughts

As you can see, there are plenty of tools to choose from. Still, it seems that OpenAI stirred things up a bit by releasing a free ASR (automatic speech recognition) system, which I found to be considerably more capable than others.

But pure speech recognition quality is just one factor. Maybe you do need to record your Zoom meetings (Otter), work with captions in a large video project (Premiere Pro), or quickly create a Canva-style video (Descript).

Also, I have to stress that I was trying to push these tools to the edge by giving them a worst-case scenario recording. For more natural uses, the differences in the outcome might be much less noticeable.

It’s great to see that there are so many options out there, and I hope this review will help a bit in finding the one that is perfect for you.

Got questions? Ping me on Twitter.

