Yor Forger Voice Impression Guide
Yor Forger from Spy x Family has one of the most acoustically interesting voices in recent anime — because she has two of them. The warm, slightly awkward homemaker register and the cold, flat Thorn Princess assassin tone come from the same performer, and the contrast is the entire character. This guide covers what makes that duality work acoustically, how to target it with both performance training and AI voice cloning, DSP settings for both modes, and how to set up the workflow for Discord, OBS, and gaming on Windows.
TL;DR
- Yor’s defining quality is controlled vocal duality: warm and slightly breathy as a homemaker, flat and formant-stripped as an assassin — with no pitch change between them.
- The Japanese dub by Saori Hayami is subtle and technically demanding; the English dub by Natalie Van Sistine is warmer and more approachable for imitation.
- DSP settings can approximate both modes; AI voice cloning captures the specific timbre of each performance.
- Two saved presets — one per mode — let you switch live during Discord calls or streaming.
- VoxBooster’s sub-300 ms AI cloning latency and WASAPI routing make the dual-preset workflow practical in real-time use.
- Training drills focus on breathiness control and formant narrowing rather than pitch work.
Who Is Yor Forger?
Yor Briar — known professionally as the Thorn Princess — is the contract wife and assassin in the Spy x Family series by Tatsuya Endo, which has been adapted into anime by WIT Studio and CloverWorks. She poses as the mother in the Forger family while secretly working as an elite assassin for an organization called the Garden.
The character’s core dramatic tension is that the same person who genuinely struggles with basic cooking and blushes at family dinners can dispatch multiple armed attackers with mechanical precision and zero visible emotion. The voice acting plays this duality honestly — the two registers sound like they share a body but not the same emotional state, which is exactly what makes the impression challenge interesting.
The Two Registers: Acoustic Profile
Homemaker Yor — Warm and Slightly Breathy
In domestic scenes, Yor’s voice has a few consistent qualities:
- Fundamental frequency: Around E3–G3 for speech, roughly 165–196 Hz. This sits lower than most anime female leads and closer to a natural adult female speaking range.
- Breathiness: Saori Hayami builds in a very controlled, subtle breathiness — slightly airy phonation that suggests vulnerability and effort without sliding into obvious vocal fry. Natalie Van Sistine’s English version is slightly more forward and less breathy.
- Formant positioning: F1 and F2 are relatively open — the vowels are rounded and warm, consistent with a voice projecting domestic softness.
- Pacing and dynamics: Slightly uncertain tempo, with small hesitations at emotional transitions. Not flat, but without the full expressive range of a genki-archetype character.
- Emotional tells: Awkward laughs, breathy interjections, and slightly exaggerated pronunciation of words she is reaching for socially — these are performance cues, not signal processing targets.
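The note-to-frequency figures in the first bullet can be checked with a few lines of Python. This is a minimal sketch assuming twelve-tone equal temperament with A4 tuned to 440 Hz; the helper name `note_to_hz` is illustrative and does not come from any particular tool.

```python
import re

# Semitone offsets of the natural notes within an octave, relative to C.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(note: str, a4_hz: float = 440.0) -> float:
    """Convert scientific pitch notation (e.g. 'E3', 'F#3') to Hz."""
    match = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", note)
    if match is None:
        raise ValueError(f"unrecognised note name: {note!r}")
    letter, accidental, octave = match.groups()
    semitone = NOTE_OFFSETS[letter] + {"#": 1, "b": -1, "": 0}[accidental]
    midi = 12 * (int(octave) + 1) + semitone  # MIDI convention: C4 = 60, A4 = 69
    return a4_hz * 2 ** ((midi - 69) / 12)


for name in ("E3", "G3"):
    print(f"{name}: {note_to_hz(name):.1f} Hz")
# E3: 164.8 Hz
# G3: 196.0 Hz
```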
Thorn Princess Yor — Cold and Flat
When Yor enters operational mode, the transformation is subtle but immediate:
- Fundamental frequency: Unchanged — this is the key insight. The assassin voice does not go lower. The impression that it sounds completely different comes from the other parameters.
- Breathiness: Eliminated. The voice switches from slightly airy to fully modal phonation — efficient, no airflow waste.
- Formant positioning: Narrower and slightly shifted. The openness of the homemaker vowels compresses into a more controlled, less resonant placement.
- Dynamics: Flat. No emotional variation in pitch range; each word at approximately the same intensity level. The evenness is what reads as dangerous.
- Pace: Deliberate and unhurried. No hesitations, no interjections.
The assassin register is not deeper or louder — it is emptier. That is what makes it harder to imitate without understanding it acoustically first.
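One way to make "same pitch, less variation" measurable is to compare the median and the spread of a pitch track for each register. The sketch below assumes you already have per-frame f0 values in Hz from a pitch tracker of your choice; the two short lists are invented for illustration and are not measurements of either dub.

```python
import math
import statistics


def semitone_spread(f0_hz):
    """Return (median f0 in Hz, standard deviation of the track in semitones)."""
    # Many pitch trackers report 0 for unvoiced frames, so drop those first.
    voiced = [f for f in f0_hz if f > 0]
    median = statistics.median(voiced)
    # Express each frame as a distance in semitones from the median pitch.
    deviations = [12 * math.log2(f / median) for f in voiced]
    return median, statistics.pstdev(deviations)


# Invented example tracks (Hz); 0 marks an unvoiced frame.
homemaker_track = [170, 185, 0, 196, 178, 165, 190, 182]
thorn_track = [180, 182, 0, 181, 183, 180, 182, 181]

for label, track in (("homemaker", homemaker_track), ("thorn princess", thorn_track)):
    median, spread = semitone_spread(track)
    print(f"{label}: median {median:.0f} Hz, spread {spread:.2f} st")
```

If the description above holds for a given pair of clips, the medians should land close together while the spread for the assassin clip is clearly smaller.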
DSP Settings for Both Modes
The following table gives starting point parameters for both registers. Adjust in 0.5-unit increments and check results on a recording rather than through live monitoring.
| Parameter | Homemaker Mode | Thorn Princess Mode |
|---|---|---|
| Pitch shift | +3 to +4 st (male input) / 0 st (female input) | Same as homemaker |
| Formant shift | +1 to +1.5 st | +0.5 st (tighter placement) |
| Breathiness / air layer | +20 to +30% if available | 0% — fully modal |
| EQ — low shelf | –2 dB below 150 Hz | –3 dB below 150 Hz |
| EQ — presence | +1 dB @ 2–3 kHz | Flat or –1 dB @ 3 kHz |
| Dynamic range | Preserve / slight expansion | Compress slightly — flatten peaks |
| Reverb / space | Small room (2–4%) | Off — completely dry |
The breathiness toggle is the most important control in this table. If your voice software exposes it as a separate parameter (sometimes labeled “air,” “breathiness,” or modeled through phonation mode), it gives you most of the difference between the two modes without touching formants or pitch. If your tool lacks this control, formant tightening alone approximates the effect — tighter formants at the same pitch produce a more clipped, efficient-sounding vowel space.
The reverb hint on homemaker mode is small but meaningful on headphones and in recorded clips — it suggests an indoor domestic space and softens the voice slightly without being audible as reverb.
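The pitch-shift row is a starting point, and the shift you actually need depends on your own speaking pitch. A rough sketch of the arithmetic follows, assuming you have measured your average speaking f0 in Hz with any pitch meter; the function names and the choice of the range's geometric centre as the target are illustrative assumptions.

```python
import math

# Target speech range from the acoustic profile above: E3 to G3.
TARGET_LOW_HZ, TARGET_HIGH_HZ = 164.8, 196.0


def semitones_between(source_hz: float, target_hz: float) -> float:
    """Pitch distance from source to target in semitones (positive = shift up)."""
    return 12 * math.log2(target_hz / source_hz)


def shift_to_target_centre(speaker_f0_hz: float) -> float:
    """Shift that moves the speaker's average pitch to the centre of the target range."""
    centre_hz = math.sqrt(TARGET_LOW_HZ * TARGET_HIGH_HZ)  # geometric centre
    return semitones_between(speaker_f0_hz, centre_hz)


for f0 in (110.0, 145.0, 180.0):
    print(f"speaker at {f0:.0f} Hz: shift {shift_to_target_centre(f0):+.1f} st")
```

A voice that already sits near 180 Hz comes out close to 0 st, a voice around 145 Hz lands in the +3 to +4 st band from the table, and a deeper voice needs a larger shift.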
Voice Impression Training Drills
These drills are for performers working on the impression without software, or building the performance baseline that makes AI cloning output better.
Drill 1 — Breathiness Switch (5 minutes)
Sustain a vowel — any open vowel like “ah” — at comfortable speaking pitch. Practice switching between fully breathy phonation (allow air to escape around vocal folds, producing an airy quality) and fully modal phonation (folds closing efficiently, clean tone). Go back and forth on one sustained note until the switch feels controlled rather than accidental. This is the core mechanical skill the impression requires.
Drill 2 — Flatline Delivery (10 minutes)
Read a paragraph of dialogue — any text — with zero pitch variation. Every syllable at the same fundamental frequency and the same intensity. The goal is not robotic; it is controlled. This trains the assassin register’s defining quality. Most people find this uncomfortable at first because natural speech rises and falls constantly. The discomfort means the drill is working.
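If you record the drill, the intensity half of it can be checked numerically. The following is a minimal sketch that measures how much the RMS level varies from frame to frame; it assumes mono samples already loaded as floats in the range -1 to 1, and the frame size and silence threshold are arbitrary choices rather than recommended values. The two test signals are synthetic tones, not speech.

```python
import math
import statistics


def frame_levels_db(samples, frame_size=1024, silence_rms=1e-4):
    """RMS level in dBFS for each full frame, skipping near-silent frames."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        if rms > silence_rms:  # ignore pauses between words
            levels.append(20 * math.log10(rms))
    return levels


def level_spread_db(samples, frame_size=1024):
    """Standard deviation of per-frame levels in dB; lower means flatter delivery."""
    levels = frame_levels_db(samples, frame_size)
    if not levels:
        raise ValueError("no frames above the silence threshold")
    return statistics.pstdev(levels)


# Synthetic stand-ins for real recordings: one second of a 180 Hz tone at 48 kHz.
RATE = 48_000
tone = [math.sin(2 * math.pi * 180 * n / RATE) for n in range(RATE)]
steady = [0.5 * x for x in tone]                                      # constant loudness
swelling = [(0.2 + 0.6 * n / RATE) * x for n, x in enumerate(tone)]   # loudness ramps up

print(f"steady:   {level_spread_db(steady):.2f} dB")
print(f"swelling: {level_spread_db(swelling):.2f} dB")
```

On a real take, a falling spread across practice sessions suggests the delivery is getting more even, although the number says nothing about pitch.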
Drill 3 — Mode Switch on Single Sentences (10 minutes)
Take a neutral sentence — “I need to pick up something at the store” — and deliver it twice: once in homemaker mode (warm, slightly hesitant, breathy opening vowels) and once in assassin mode (flat, efficient, fully modal). Record both. Listen back and identify which parameters change. This conscious listening is faster than intuition alone for closing the gap between impression and original.
Drill 4 — Hayami Study (20 minutes)
Listen to 10–15 isolated lines of Saori Hayami’s performance in the original Japanese and note the acoustic events: where breathiness appears, where it disappears, and where the dynamics flatten. The Japanese dub is the harder target, but studying it produces a more grounded impression even if you ultimately target the English version. Hayami’s control of phonation mode is one of the technical achievements of the performance.
Saori Hayami and Natalie Van Sistine: The Source Performances
Saori Hayami voices Yor in the original Japanese production. Hayami is known for an unusually controlled use of phonation mode across her roles — the technical term for the difference between breathy, modal, and pressed voicing. In Yor’s case, she uses this to deliver the duality without any explicit signaling to the audience that something has changed; you simply feel it before you can articulate why. That subtlety is what makes the Japanese performance technically demanding to imitate.
Natalie Van Sistine voices Yor in the English dub produced by Crunchyroll. Her performance leans warmer and slightly more forward in resonance placement — useful for the emotional clarity of Western dubbing norms but producing a slightly different acoustic target. The breathiness in the homemaker mode is less pronounced; the assassin flatness is more explicitly clipped. For most people approaching this impression without a strong background in Japanese phonetics, the English dub provides more accessible reference points.
Neither performance is the “correct” target — choose based on which you are more familiar with and which register feels closer to your natural voice production.
AI Voice Cloning Workflow for Yor Forger
AI voice cloning takes the impression from “sounds like a character like her” to “sounds like specifically her.” The process involves sourcing clean training audio, training or finding a pre-trained model, and importing it into your voice software.
Sourcing Training Audio
The best training data for Yor’s voice is isolated dialogue — no music, no sound effects, no overlapping voices. The anime’s episode audio has significant music presence in many scenes; look for clean dialogue-only releases or manually isolate lines using source separation tools. Target at least 20–30 minutes of audio covering both the homemaker register and the assassin register, so the model captures both phonation modes in training.
Separate the modes in your training data labels if possible. Some voice cloning training pipelines support multiple-register training; others produce a single blended model. A blended model is still highly usable — you handle the mode switch with the breathiness and formant parameters in your real-time software.
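To check whether a dataset reaches the 20–30 minute target and covers both registers, you can total the clip durations per folder. This sketch assumes a hypothetical layout with one sub-folder per register (for example `homemaker/` and `thorn_princess/`) containing PCM WAV clips; it uses only Python's standard `wave` module, which does not read compressed formats.

```python
import wave
from pathlib import Path


def wav_minutes(path) -> float:
    """Duration of one PCM WAV file in minutes."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate() / 60


def minutes_per_register(dataset_root) -> dict:
    """Sum clip durations for each sub-folder of dataset_root (one folder per register)."""
    totals = {}
    for folder in sorted(Path(dataset_root).iterdir()):
        if folder.is_dir():
            totals[folder.name] = sum(wav_minutes(p) for p in folder.glob("*.wav"))
    return totals
```

Calling `minutes_per_register("yor_dataset")` on such a layout returns a dictionary of minutes keyed by folder name, which makes it easy to see if one register is badly under-represented before training.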
Finding a Pre-Trained Model
Community voice model repositories have pre-trained models for most major anime characters. Search for “Yor Forger AI voice” or “Thorn Princess voice model.” Evaluate downloads, training notes, and audio samples before choosing. A well-trained model from quality isolated dialogue will outperform your own hastily trained model on limited data.
Importing and Configuring in VoxBooster
VoxBooster supports native AI voice model import on Windows 10/11 without a Python environment. The sub-300 ms latency pipeline runs against your microphone in real time via WASAPI — no virtual cable routing is needed.
- Open VoxBooster and navigate to Voice Models → Import Custom Model.
- Load the .pth model file and the paired .index file.
- Set pitch offset to match the gap between your voice and Yor’s register (+3 to +4 semitones from a male voice, 0 from a female voice).
- Set index influence to 0.70–0.80. Higher values track the trained voice more tightly — useful when you want the homemaker register’s specific warmth. Lower values blend your own vocal energy, which can be useful in the assassin mode where personality is minimal.
- Save two presets: one with breathiness layer on (homemaker) and one with it off and slightly compressed dynamics (Thorn Princess). Label them clearly.
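It can help to write the two presets down as data before building them in the app, so the differences are explicit. The structure below is a hypothetical sketch for planning purposes only: the class, field names, and exact values are assumptions drawn from the ranges in this guide, not VoxBooster's actual preset format.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class VoicePreset:
    """Hypothetical planning record; not a real VoxBooster file format."""
    name: str
    pitch_shift_st: float     # semitones, same for both modes
    formant_shift_st: float   # semitones
    breathiness_pct: float    # 0 = fully modal
    index_influence: float    # 0.70-0.80 per the import steps
    compress_dynamics: bool
    reverb_pct: float


HOMEMAKER = VoicePreset(
    name="Yor - Homemaker",
    pitch_shift_st=3.5,       # male-input example; use 0 for a female input
    formant_shift_st=1.0,
    breathiness_pct=25.0,
    index_influence=0.80,
    compress_dynamics=False,
    reverb_pct=3.0,
)

# Derive the assassin preset from the homemaker one so only the deltas are stated.
THORN_PRINCESS = replace(
    HOMEMAKER,
    name="Yor - Thorn Princess",
    formant_shift_st=0.5,
    breathiness_pct=0.0,
    index_influence=0.70,
    compress_dynamics=True,
    reverb_pct=0.0,
)


def changed_fields(a: VoicePreset, b: VoicePreset) -> dict:
    """Map each differing field name to its (a, b) value pair."""
    da, db = asdict(a), asdict(b)
    return {key: (da[key], db[key]) for key in da if da[key] != db[key]}


for field_name, (before, after) in changed_fields(HOMEMAKER, THORN_PRINCESS).items():
    print(f"{field_name}: {before} -> {after}")
```

Deriving one preset from the other keeps the pitch offset identical by construction, which matches the point that the mode switch is not pitch-driven.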
Switching Modes Live
With two presets saved, switching from homemaker to assassin during a conversation on Discord or OBS is a single click. The audio processing handoff takes one buffer window — imperceptible to listeners. This is the workflow advantage of software-based dual-register setup over pure impression performance, where switching mid-sentence requires complete vocal control.
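For a sense of scale, the length of one buffer window is simply the buffer size divided by the sample rate. The buffer sizes below are assumed examples, not documented VoxBooster settings.

```python
def buffer_ms(frames: int, sample_rate_hz: int) -> float:
    """Duration of one audio buffer in milliseconds."""
    return 1000 * frames / sample_rate_hz


# Assumed example buffer sizes at a 48 kHz sample rate.
for frames in (480, 1024):
    print(f"{frames} frames @ 48 kHz = {buffer_ms(frames, 48_000):.1f} ms")
# 480 frames @ 48 kHz = 10.0 ms
# 1024 frames @ 48 kHz = 21.3 ms
```

Either figure is small next to the overall AI conversion latency discussed in this guide.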
Yor Forger in the Anime: Narrative Context for Impressions
Understanding why Yor sounds the way she does narratively deepens the impression beyond pure acoustic mimicry. Yor’s homemaker register is not her natural state — she grew up as an assassin and performs domesticity from scratch, which is why Hayami plays it with a slight tension underneath the warmth. She is always slightly effortful in normal life, not because she is uncomfortable with kindness but because she has no stored muscle memory for it.
The assassin register, conversely, is her genuine default — efficient, trained, and devoid of affectation because she has never needed to perform in it. The flatness is not coldness; it is the absence of performance. That distinction, if you internalize it, changes the quality of the impression. The homemaker voice has warmth and strain underneath; the assassin voice has precision but not menace.
For Discord roleplay, streaming roleplay, or cosplay content, playing this dynamic honestly — the slightly effortful domestic Yor and the effortlessly functional Thorn Princess — produces a more interesting performance than just switching between “nice voice” and “scary voice.”
Comparison: DSP vs. AI Cloning for This Impression
| Approach | Homemaker Accuracy | Assassin Accuracy | Setup Time | Latency | Notes |
|---|---|---|---|---|---|
| DSP pitch + formant only | Moderate | Good (flatness is achievable) | Under 5 min | <30 ms | No GPU needed; breathiness control varies by tool |
| AI voice clone, generic female model | Poor–Moderate | Poor | 10–20 min | ~300 ms | Wrong timbre; usable as a starting point only |
| AI voice clone, Yor-specific model | Very good | Good | 20–40 min (or instant with pre-trained) | ~300 ms | Best result; requires quality training data |
| DSP + Yor AI model hybrid | Excellent | Excellent | 30–60 min | ~300 ms | Post-chain breathiness and formant tweaks on top of AI base |
The hybrid approach in the bottom row is the practical recommendation: load a Yor-specific AI voice model as the base conversion, then use VoxBooster’s post-chain DSP controls to toggle breathiness and formant placement for each mode. The AI model handles timbre; the DSP layer handles the mode switch. Neither alone achieves the full result as efficiently.
Setting Up for Discord, OBS, and Gaming
VoxBooster appears as a standard audio input device in Windows after installation. No virtual cable configuration is required — the WASAPI injection layer handles routing directly at the Windows audio API level, with no kernel driver.
Discord: Settings → Voice & Video → Input Device → select VoxBooster. Set Voice Activity threshold or use Push-to-Talk. For AI cloning mode with sub-300 ms latency, push-to-talk provides the cleanest result because the processing window is absorbed in the press-to-speak gap.
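OBS: Add a Microphone/Auxiliary Audio source and select VoxBooster as the device. For video synchronization, measure the AI cloning latency with a clap test: clap in view of the webcam and near the mic, then measure the offset between the visible clap and its sound in the recorded clip. Because the processed voice arrives after the picture, delay the video by that value, for example with a Video Delay (Async) filter on the webcam source; the Sync Offset field in Advanced Audio Properties is intended for delaying audio, which would widen the gap. This keeps your lips and your voice synchronized for your stream audience.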
Gaming: In game audio settings, select VoxBooster as the microphone input device. The no-kernel-driver design means no conflicts with anti-cheat software including EAC, BattlEye, and Riot Vanguard.
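The clap test from the OBS step reduces to one subtraction once you have read two values off the recorded clip in a video editor. This is a small sketch assuming you note the video frame where the hands meet and the timestamp of the audio peak; the function name and the example numbers are illustrative.

```python
def av_offset_ms(clap_video_frame: int, video_fps: float, clap_audio_s: float) -> float:
    """Audio lag in milliseconds; positive means the sound arrives after the picture."""
    video_time_s = clap_video_frame / video_fps
    return (clap_audio_s - video_time_s) * 1000


# Example reading: hands meet on frame 150 of a 60 fps clip, audio peak at 2.78 s.
offset = av_offset_ms(clap_video_frame=150, video_fps=60, clap_audio_s=2.78)
print(f"delay the picture by about {round(offset)} ms")
```

Repeating the measurement a few times and averaging gives a steadier value, since a single frame at 60 fps already represents about 17 ms of uncertainty.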
Ethics and Consent
Using AI voice cloning of real voice actors raises legitimate questions worth addressing directly. Saori Hayami and Natalie Van Sistine are working professionals whose performances are intellectual property.
For personal, non-commercial use — Discord calls with friends, streaming your own gameplay, cosplay events — fan voice cloning of fictional characters occupies a wide-tolerance gray zone. Studios focus enforcement on commercial misuse rather than fan activity.
For any commercial application — monetized video content, sold products, commissioned work using the voice — the ethical and legal position changes significantly. Do not use a cloned voice actor performance for commercial purposes without explicit licensing. The fictional character and the human performance are separate considerations: Yor Forger is a fictional character, but Saori Hayami’s specific vocal performance is her professional work.
The anime voice changer guide covers ethics considerations for AI character voice cloning in more detail.
Frequently Asked Questions
What makes Yor Forger’s voice acoustically unique compared to other anime characters? Yor’s defining quality is her controlled duality — the same vocal tract produces a warm, slightly breathy domestic register and a flat, tonally stripped assassin tone. The switch is not pitch-driven; it is a formant and breathiness toggle. That precision makes her harder to imitate convincingly than high-pitched or deep-voiced characters.
Is the Japanese dub or the English dub easier to imitate for a Yor Forger voice impression? The Japanese dub by Saori Hayami requires careful control of breathiness and restraint — her performance is subtle and technically demanding. The English dub by Natalie Van Sistine sits in a more forward, slightly warmer register that is more approachable for imitation. Most beginners find the English version easier to target with DSP settings.
What pitch shift do I need for a Yor Forger voice impression? Yor’s voice sits lower than most anime female leads: around E3 to G3 for calm speech, approximately 165–196 Hz. For a male voice, +3 to +4 semitones is a reasonable starting point if you speak around 145–150 Hz; deeper voices need more (from 110 Hz, the target range is roughly 7 to 10 semitones up). For a female voice, little or no pitch shift is needed; the formant target matters more. The assassin mode requires no additional pitch change, only breathiness reduction and formant narrowing.
Can I switch between homemaker and assassin Yor mid-conversation using software? Yes. The most practical approach is two saved presets in your voice software: one for the warm domestic register with slight breathiness and slightly raised formants, and one for the flat assassin mode with breathiness removed and formants tightened. Switching takes one click and is seamless enough for Discord calls or live streams.
Do I need a GPU to run an AI voice clone for Yor Forger? For DSP-only pitch and formant shifting, any modern CPU handles it under 30 ms. For AI-based voice cloning, a GPU (GTX 1060 class or better) brings latency down to sub-300 ms, which works for push-to-talk and streaming. CPU-only AI inference is possible but adds 500–800 ms, making continuous voice activity impractical.
Is cloning Yor Forger’s voice legal? For personal, non-commercial use (streaming, gaming, Discord roleplay), fan voice cloning of fictional characters sits in a wide-tolerance gray area that studios rarely pursue. For any commercial project, such as monetized content, products, or services using the voice, consult the rights holders, including WIT Studio and Shueisha, and obtain explicit licensing before publishing; the voice actor’s performance is a separate consideration from the character.
What is the difference between a Spy x Family voice impression and a Yor voice clone? A voice impression is a performance skill — you train your own voice and delivery to approximate the character. A voice clone uses AI to transform your microphone signal into the target voice in real time. Impressions require no software but take weeks of practice; clones require a trained model and suitable hardware but work immediately.
Conclusion
Yor Forger’s voice impression is fundamentally about controlled duality — two distinct acoustic states produced by the same voice, switching on the same pitch. Getting it right means understanding that the assassin register is not deeper or louder than the homemaker register; it is emptier, stripped of breathiness and dynamic variation. That insight changes the training approach entirely.
For software implementation, the hybrid workflow — AI voice cloning handling timbre, DSP post-chain handling the mode switch via breathiness and formant toggles — produces the most convincing result for both halves of the character. VoxBooster’s dual-preset setup and WASAPI routing make this practical for real-time use in Discord, streaming, and gaming without kernel drivers or Python environment management.
If you want to test the workflow before committing, download VoxBooster and load a community model for the character. The whole setup from install to live Discord use takes under 15 minutes. Check the pricing page to find the plan that fits — plans start at $6.99/month — or start with a free trial to hear the AI cloning quality on your own voice first.