Descript Audio — Guide

Edit podcasts by editing text: remove filler words instantly.

Freemium. Sign-up required. VPN may be required.

Strengths

Text editing is video editing: deleting text from the transcript automatically removes the corresponding video clip.
AI automatically removes verbal slips and filler words ("um", "ah", etc.)
Voice cloning (Overdub) corrects recording errors without re-recording
Subtitles are generated automatically with high accuracy

Best for

Podcast recording and post-production
YouTube video editing and subtitle generation
Online course video production
Organizing and editing conference videos

Text-Driven Video Editing

Descript transcribes a video into text and lets you edit the video by editing that text.
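
To make the idea concrete, here is a minimal Python sketch of how text-driven cutting can work in principle. It is not Descript's implementation: the `Word` type, the `keep_ranges` function, and the sample timestamps are all hypothetical. It only assumes that each transcribed word carries a start and end time, so that deleting words from the text leaves behind the time ranges of media to keep.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording


def keep_ranges(words, deleted_indices):
    """Return merged (start, end) ranges covering the words that were not deleted."""
    ranges = []
    for i, word in enumerate(words):
        if i in deleted_indices:
            continue  # this word was deleted in the text, so its media is cut
        if ranges and (i - 1) not in deleted_indices:
            # The previous word was kept too: extend the current range.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            # First kept word, or first kept word after a cut: start a new range.
            ranges.append((word.start, word.end))
    return ranges


# Hypothetical transcript with one filler word at index 1.
words = [
    Word("We", 0.0, 0.3),
    Word("uh", 0.3, 0.8),
    Word("ship", 0.8, 1.2),
    Word("today", 1.2, 1.7),
]
print(keep_ranges(words, {1}))
```

Deleting the word at index 1 splits the media into two kept ranges, one before the cut and one after it. A real editor would also need to smooth the audio at each cut point, which this sketch leaves out.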

Scenario

Quickly remove verbal slips from a video

Prompt example

After importing the video, find the verbal slip in the transcript, then select it and delete it

Output / what to expect

The corresponding video clips are deleted automatically, and the cut points transition naturally, with no need to drag anything on the timeline.

Tips

Turn on "Remove filler words". Descript automatically marks every "um", "ah", and other filler word, and you can delete them all with one click.
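
The mark-then-delete flow in this tip can be sketched in a few lines of Python. This is an illustration only, not how Descript detects fillers: the `FILLERS` set and the `mark_fillers` and `remove_marked` functions are hypothetical, and real detection works on audio and context rather than a fixed word list.

```python
import re

# Illustrative filler list; an assumption, not Descript's actual list.
FILLERS = {"um", "uh", "ah", "er"}


def mark_fillers(transcript_words):
    """Return the indices of words that match the filler list, ignoring case and punctuation."""
    marked = []
    for i, word in enumerate(transcript_words):
        normalized = re.sub(r"[^\w]", "", word).lower()
        if normalized in FILLERS:
            marked.append(i)
    return marked


def remove_marked(transcript_words, marked):
    """Delete every marked word in one pass (the 'one click' step)."""
    skip = set(marked)
    return [w for i, w in enumerate(transcript_words) if i not in skip]


sample = "So, um, we tested it and, ah, it worked".split()
marked = mark_fillers(sample)
cleaned = remove_marked(sample, marked)
print(marked)
print(" ".join(cleaned))
```

Keeping marking and deletion as separate steps mirrors the tip: the marked words can be reviewed first and removed together afterwards.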

Scenario

Generate and edit subtitles

Prompt example

Import the video, click "Transcribe", set the language to Chinese, and wait for the automatic transcription to finish

Output / what to expect

Subtitles are generated automatically with more than 90% accuracy. Errors can be corrected directly in the text editor, and the subtitles stay synchronized with the video in real time.

Tips

When exporting subtitles, select the SRT format. SRT files can be uploaded directly to platforms such as YouTube and Bilibili.
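
For readers unfamiliar with SRT, the following Python sketch shows what such a file contains: a running cue number, a time range written as `HH:MM:SS,mmm --> HH:MM:SS,mmm`, the subtitle text, and a blank line between cues. The `srt_timestamp` and `to_srt` helpers and the sample cues are hypothetical and only illustrate the format. Descript produces this file for you on export.

```python
def srt_timestamp(total_ms: int) -> str:
    """Format a time in milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def to_srt(cues) -> str:
    """Build SRT text from (start_ms, end_ms, text) tuples, numbering cues from 1."""
    blocks = []
    for number, (start_ms, end_ms, text) in enumerate(cues, start=1):
        time_line = f"{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}"
        blocks.append(f"{number}\n{time_line}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical cues: (start in ms, end in ms, subtitle text).
cues = [(0, 1500, "Hello"), (1500, 3000, "World")]
srt_text = to_srt(cues)
print(srt_text)
```

Times are kept as integer milliseconds so that formatting never depends on floating-point rounding.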

Voice Cloning and Correction

Use the Overdub feature to clone a voice and fix recording errors.

Scenario

Correct a mistake in the recording

Prompt example

Select the text that needs to be changed, type the correct content, and click "Regenerate with Overdub"

Output / what to expect

AI regenerates the audio segment using the cloned voice, seamlessly blending with the original recording without the need for re-recording.

Tips

Voice cloning requires recording 10 minutes of training material first. Once training is complete, the cloned voice sounds very natural.

Sources & references:

Descript official website (2025-01)