Actual play podcasts have become one of the most demanding audio production formats in independent media. A single GM narrates every NPC, controls pacing, manages rules, and keeps 100-episode story arcs coherent — all while recording in real time. A voice changer for actual play podcast production solves the hardest part of that job: making a cast of characters sound genuinely distinct when they are all coming from the same person.
This guide covers the full workflow: AI cloning for persistent NPC voices, soundboard for ambient props and music, noise suppression for home-studio recording, and multi-track routing through Discord and Riverside. Whether you are running a D&D 5e homebrew campaign or a Pathfinder 2e Adventure Path, the same principles apply.
TL;DR — Actual Play Voice Workflow at a Glance
| Need | Tool feature | Why it matters |
|---|---|---|
| Distinct NPC voices | AI voice cloning | One GM, dozens of recognizable characters |
| Persona consistency across seasons | Saved voice profiles | Same timbre in episode 1 and episode 112 |
| Ambient props and stingers | Soundboard | Tavern noise, thunder, combat cues in one keypress |
| Clean dialogue capture | Noise suppression | Strips HVAC, dice, keyboard from live signal |
| Platform compatibility | WASAPI routing | Works transparently with Discord and Riverside |
| No driver install | WASAPI interception | Runs on Win 10/11 with zero virtual cable setup |
If you want to skip straight to setup: download VoxBooster and read the Discord setup guide.
Why Actual Play Is the Hardest Use Case for Voice
Most voice changer guides are written for gamers pulling pranks on friends. Actual play is categorically different. The demands that separate it from casual use are:
Sustained character consistency. A game session runs three to four hours. A season runs a hundred sessions. The gnome merchant you voiced in episode three needs to sound the same in episode eighty-nine. That requires voice profiles, not just a pitch slider you eyeball differently each week.
Multiple simultaneous characters. A GM in a D&D or Pathfinder campaign regularly runs four to ten NPCs in a single encounter. Switching between them must be fast enough not to break the scene — ideally under a second, inaudible to the audience.
Live performance pressure. Actual play is theater. Lag, artifacts, and hardware glitches happen on camera or in a live stream. The voice changer has to be rock-solid. A 500ms clone that occasionally stutters is fine for a solo TikTok; it kills a live D&D session.
Post-production integration. Multi-track recording tools like Riverside and Zencastr capture each participant on a separate track. The voice changer signal needs to arrive on the correct track, cleanly, without routing artifacts that complicate editing.
AI Voice Cloning for NPC Characters
The central feature for actual play work is AI voice cloning — the ability to train a voice model on a short sample of your voice in character and then reproduce that character’s voice from whatever you say in real time.
How it works in practice
You record 30 to 60 seconds of yourself speaking as the character. The AI model learns the distinctive formants, resonance, and tonal envelope of that performance. From that point forward, when you speak into the microphone, the system maps your live voice onto the trained profile in real time — under 300ms in low-latency mode on typical hardware.
The result is that you can:
- Speak in your normal voice and have a gruff orc warlord come out the other end
- Switch to a different profile mid-scene to voice a completely different NPC
- Return to the first profile later in the session with identical timbre
Profile management for long-running campaigns
A serious actual play campaign might have thirty or forty recurring NPCs. The workflow that holds up over a hundred episodes is:
- Create a named profile for each character when they are introduced
- Back up profile files to cloud storage after training
- Assign keyboard shortcuts to the five or six NPCs most likely to appear in any given session
- Keep the rest accessible in a sidebar list for occasional characters
This discipline pays off in year two of a campaign, when a character the players haven’t seen since episode twelve reappears and sounds exactly right without any fresh training.
Soundboard for Ambient Props and Musical Stings
A soundboard is the second core tool in an actual play setup. Critical Role and similar productions use ambient audio to signal scene transitions, underscore dramatic moments, and reward player actions with immediate audio feedback.
The production use cases break into three categories:
Ambient loops. Tavern murmur, dungeon drip, forest wind — these run under the voice track and set scene without requiring a dedicated musician on the call. Triggered at scene start, faded when the party moves on.
Stingers and one-shots. Thunder crack, door slam, combat chord — these fire on a keypress and play once. Timing is everything; a well-placed thunder clap half a second after the villain’s monologue reads as production value, not a gimmick.
Musical cues. Full music tracks for boss fights, mystery reveals, and emotional scenes. In a full production like Critical Role these are live, but for independent shows a curated soundboard library covers the same emotional territory.
Soundboard hardware and hotkey layout
The ergonomics of triggering a soundboard during live play matter. You are simultaneously describing a scene, voicing an NPC, and tracking initiative. A soundboard that requires you to click through menus will not get used.
The standard setup for actual play:
- Assign ambient loops to one row of function keys
- One-shot stingers to a second row or numpad
- Keep the soundboard open on a second monitor or a Stream Deck with labeled keys
For recording sessions on Riverside or Zencastr, route soundboard output to a separate virtual channel so it can be balanced independently in post — or cut entirely if it interferes with the edit.
Noise Suppression in Home-Studio Actual Play Setups
The majority of independent actual play podcasts record in home studios — spare bedrooms, basements, home offices. These spaces have HVAC noise, computer fan hum, street traffic, and the incidental sounds of the game itself: dice on a table, book pages turning, players shifting in their chairs.
Real-time noise suppression processes the microphone signal before it reaches the recording or streaming platform. The practical outcome:
- HVAC hum is gone from the podcast feed
- Dice rolls don’t pop into the foreground when the room goes quiet
- Keyboard sounds during note-taking don’t appear in the audio
- The live stream sounds like it was recorded in a treated room even when it wasn’t
For multi-player sessions where participants are in different locations and joining via Discord, noise suppression on each end is particularly valuable — one player’s mechanical keyboard doesn’t bleed into everyone else’s track.
Routing for Discord and Riverside Multi-Track Recording
Discord
Discord is the most common platform for geographically distributed actual play groups. The voice changer hooks into the Windows audio subsystem via WASAPI so Discord captures the transformed voice from your real microphone input — no virtual device selection needed in Discord audio settings.
This matters because Discord occasionally resets audio device selections on major updates, and virtual microphone devices can be flagged as lower-priority in some server audio quality configurations. A WASAPI-level intercept is invisible to Discord and update-proof.
For full-party recording sessions, use Craig bot or Riverside’s multi-track mode to capture each participant on a separate track. The GM’s voice-changed track lands on its own stem, which makes editing — cutting takes, adjusting NPC levels, removing mistakes — straightforward in post.
Riverside
Riverside.fm records lossless audio locally on each participant’s machine and uploads after the session. This means the voice-changed signal captured locally is what Riverside sends, not a re-encoded stream. Quality is preserved end-to-end.
The recommended setup for an actual play session on Riverside:
- Run voice changer with WASAPI routing active
- Select your real microphone in Riverside — the already-processed signal arrives
- Route soundboard to a separate output channel if available, or manage it post-session
- Enable local recording backup on all participant machines in case of upload failure
Comparison: Voice Changer Approaches for Actual Play
| Approach | Persona consistency | Switch speed | Latency | Setup complexity |
|---|---|---|---|---|
| AI voice cloning (profile-based) | Excellent — saved profiles | Under 1 second | 100–300ms | Medium (training required) |
| Pitch shifter only | Poor — manual per session | Instant | <20ms | Low |
| Pitch + formant shifter | Moderate — approximated | Instant | <30ms | Low |
| Real-time AI cloning + WASAPI | Excellent | Under 1 second | Sub-300ms | Medium |
For actual play specifically, pitch shifting alone does not solve the persona consistency problem. Two characters with different pitches still sound like the same person on different days unless formants and resonance are shaped by a trained model.
Internal Links — Going Deeper
If you are building out a full actual play production stack, these guides cover adjacent topics:
- Best voice changer for Discord — platform-specific routing, PTT behaviour, Krisp interaction
- AI voice changer overview — how the underlying cloning technology works
- Best soundboard software 2026 — dedicated soundboard comparison if you want a standalone tool
- Epic narrator voice tutorial — voice performance tips that apply directly to GM narration
- Discord voice modifier — deeper Discord-specific configuration reference
External Resources
- Actual play — Wikipedia — history and format overview
- Critical Role Productions — the benchmark actual play production
- Riverside.fm — multi-track remote recording platform widely used in actual play production
What VoxBooster Adds to This Workflow
VoxBooster handles the technical layer of this workflow on Windows 10 and 11:
- WASAPI audio routing so Discord and Riverside capture transformed audio without virtual device setup
- AI voice cloning with sub-300ms latency for live NPC switching mid-scene
- Integrated soundboard with hotkey triggers for ambient props and stingers
- Real-time noise suppression that cleans home-studio recordings before they reach the recording platform
- No kernel driver installation — runs without elevated permissions, no BSOD risk from driver conflicts
At $6.99/month it fits independent creator budgets. The voice cloning and soundboard are included in the base plan — no separate add-on fees.
FAQ
Can one person voice multiple distinct NPCs live without stopping the session? Yes. With AI voice cloning you build a voice profile for each recurring NPC and switch between them in under a second. The GM speaks naturally and the cloned voice outputs in real time — players hear Gornak the orc and Lady Veth as distinct characters without any break in pacing.
What latency is acceptable for a live actual play recording session? Under 150ms is ideal for live roleplay. Sub-300ms is the practical ceiling for AI cloning without audible lag between your mouth and what Discord or Riverside captures.
Do I need a virtual audio cable for Discord or Riverside recording? Not if you use a voice changer that hooks into the Windows audio subsystem directly. VoxBooster routes transformed audio through WASAPI so Discord and Riverside see your real microphone and capture the already-processed signal.
How do I keep the same NPC voice consistent across a 100-episode season? Save each NPC as a named voice profile and back up the profile files. A profile trained on 30–60 seconds of your voice in character locks the timbre, resonance, and cadence permanently. Load it at session start for identical output every time.
Will a soundboard interrupt the recording on Riverside? Route ambient props and music to a separate mix-minus output so the host track stays clean. The soundboard layer can then be mixed or cut in post without affecting dialogue.
Does noise suppression help in home-studio actual play setups? Significantly. Real-time noise suppression strips HVAC hum, keyboard clicks, dice rolls, and paper shuffling from the mic signal before it reaches Discord or Riverside, saving hours of cleanup in post.
Is a voice changer legal to use on Critical Role-style productions? Yes. Voice processing is a standard production technique. There are no platform rules on Twitch, YouTube, or podcast hosts that prohibit voice effects on your own voice.
An actual play podcast is a long-form creative commitment. The production infrastructure you build in season one has to hold up through season three. Getting the voice changer workflow right from the start — AI cloning for character consistency, soundboard for atmosphere, noise suppression for clean audio, WASAPI routing for platform compatibility — means you are solving engineering problems once instead of patching them every few episodes.
Download VoxBooster and set up your first NPC voice profile before your next session.