Gollum Voice Impression: Master Sméagol’s Raspy Hiss
The Gollum voice impression is one of the most recognized and technically challenging character voices in modern pop culture. Thin, wet, conspiratorial — it lives at the back of the throat in a register that sits somewhere between a hiss and a cough. Andy Serkis spent years refining it for Peter Jackson’s Lord of the Rings trilogy, and what resulted became a masterclass in split-personality vocal performance. This guide unpacks exactly how that voice works anatomically, what DSP chain recreates it in software, and how to use AI voice conversion to take your impression far beyond what pitch knobs alone can achieve.
TL;DR
- The Gollum voice is built on back-throat constriction, heavy sibilance, and wet vocal fry — Serkis sourced the inspiration from his cat coughing up a fur ball.
- Gollum and Sméagol are two distinct voices layered over the same character: raspy conspiratorial hiss vs. higher childlike pleading.
- DSP preset: −2 semitones pitch, −1 semitone formant, heavy distortion with ring-mod shimmer, extended sibilant reverb.
- AI voice cloning captures the wet timbre qualities that knob-based DSP cannot fully replicate.
- VoxBooster routes both approaches through a virtual microphone to Discord, OBS, or any Windows app.
- Attempting the physical technique risks vocal strain — warm up, hydrate, and keep attempts short.
The Origin of the Gollum Voice: A Cat, a Cough, and a Character
When Andy Serkis was cast as Gollum, director Peter Jackson wanted something genuinely unsettling — not a stock evil voice, not a theatrical villain baritone. Serkis found the key when he watched his cat cough up a fur ball. The sound was visceral: a strangled, involuntary constriction deep in the throat, producing a wet, rattling expulsion of air. Serkis took that physical sensation and turned it into a controlled performance technique.
The mechanism involves partial constriction of the pharynx, with the back of the tongue pressing upward toward the soft palate. This narrows the vocal tract above the larynx, creating a turbulent airflow that generates the raspy, hissing quality. Combined with heavy vocal fry at the glottal level, the result is a voice that sounds simultaneously ancient, tortured, and unnervingly alive.
Crucially, Serkis did not just perform one voice — he performed two. Gollum and Sméagol represent the same creature’s split psyche, and each half has a distinct acoustic signature. This split-personality dual voice is what makes the character so compelling, and it is what makes the impression genuinely difficult to pull off convincingly.
The full motion capture performance extended across all three Lord of the Rings films, with Serkis performing on set alongside the other actors so they had a real voice to react to. The voice you hear in the final film is Serkis’s own performance, processed only lightly in post; it was not generated artificially.
Anatomy of the Gollum Voice: Acoustic Breakdown
Understanding the acoustic components lets you target them precisely with both technique and technology.
Pitch and Register
Gollum speaks in a mid-low range, roughly 100–140 Hz for the fundamental. This is notably not dramatic bass: the intimidating quality comes from texture, not depth. Men with average speaking voices need only minor downward pitch adjustment (−1 to −3 semitones). Women need more: −4 to −6 semitones is a reasonable starting point, though a speaker whose fundamental sits near 200 Hz needs about −6 semitones just to reach the top of that range. Sméagol shifts upward by roughly four to six semitones relative to Gollum, landing in a thin, higher register that reads as childlike vulnerability.
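The semitone figures above follow from the standard equal-temperament relation, where each semitone multiplies frequency by 2^(1/12). The sketch below is a minimal illustration of that arithmetic; the function names and the starting fundamentals are hypothetical examples, not measurements of any real speaker.

```python
import math

def semitone_shift(source_hz: float, target_hz: float) -> float:
    """Semitones needed to move a fundamental from source_hz to target_hz."""
    return 12.0 * math.log2(target_hz / source_hz)

def shifted_frequency(source_hz: float, semitones: float) -> float:
    """Frequency that results from shifting source_hz by `semitones`."""
    return source_hz * 2.0 ** (semitones / 12.0)

# Illustrative starting fundamentals (assumed values, not measured data).
TARGET_HZ = 120.0  # middle of the 100-140 Hz range quoted for Gollum
for f0 in (110.0, 135.0, 165.0, 200.0):
    shift = semitone_shift(f0, TARGET_HZ)
    print(f"{f0:.0f} Hz -> {TARGET_HZ:.0f} Hz needs {shift:+.1f} semitones")
```

Measuring your own speaking fundamental first (most DAWs and tuner apps can show it) gives a better starting value than a generic guideline.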
Vocal Fry and Glottal Constriction
Heavy vocal fry, the slow and irregular vibration of loosely closed vocal folds, underlies the Gollum voice throughout. In DSP terms, this appears as strong subharmonic content (frequencies below the fundamental) and irregular amplitude modulation. A ring modulator set to a low carrier frequency (30–50 Hz) can approximate this shimmer in a voice changer chain.
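A ring modulator simply multiplies the signal by a sine carrier, which creates sidebands at the input frequency plus and minus the carrier frequency. The following is a minimal pure-Python sketch of that idea, not VoxBooster's implementation; the function name, the 18% mix default, and the 120 Hz test tone are assumptions chosen for illustration.

```python
import math

def ring_modulate(samples, sample_rate, carrier_hz=40.0, mix=0.18):
    """Blend a signal with its ring-modulated copy (signal times a sine carrier)."""
    out = []
    for n, x in enumerate(samples):
        carrier = math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
        wet = x * carrier                      # ring modulation is plain multiplication
        out.append((1.0 - mix) * x + mix * wet)
    return out

# Illustrative input: 50 ms of a 120 Hz sine standing in for a voiced sound.
sample_rate = 48_000
voice = [math.sin(2.0 * math.pi * 120.0 * n / sample_rate) for n in range(2400)]
processed = ring_modulate(voice, sample_rate)
```

With a 120 Hz tone and a 40 Hz carrier, the wet signal contains components at 80 Hz and 160 Hz. The 80 Hz component sits below the fundamental, which is why a low carrier can suggest the subharmonic roughness of fry.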
Sibilance: The “My Preciousss” Effect
The extended sibilance on words ending in ‘s’ is Gollum’s most imitated feature. Serkis deliberately elongates the tongue-to-palate friction on sibilant consonants, allowing the turbulent air to decay slowly rather than cutting off sharply. In a processing chain, this can be emphasized with a long-tailed reverb on the high-frequency band (above 4 kHz) or a multi-tap delay with a very short offset (8–12 ms) that smears the ‘s’ without introducing echo on vowels.
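The multi-tap delay idea can be sketched in a few lines. This example assumes the input has already been split so that it contains only the band above 4 kHz (the crossover filter is not shown), and the tap times and gain are hypothetical starting values inside the 8–12 ms range mentioned above.

```python
def multi_tap_delay(samples, sample_rate, taps_ms=(8.0, 10.0, 12.0), tap_gain=0.3):
    """Add short delayed copies of the signal, one per entry in taps_ms."""
    out = list(samples)
    for tap in taps_ms:
        offset = int(round(tap * sample_rate / 1000.0))  # delay in samples
        for n in range(offset, len(samples)):
            out[n] += tap_gain * samples[n - offset]
    return out

# A single impulse makes the taps easy to see in the output.
impulse = [1.0] + [0.0] * 999
smeared = multi_tap_delay(impulse, 48_000)
tap_positions = [n for n, s in enumerate(smeared) if s != 0.0]
print(tap_positions)
```

Because the taps are only a few milliseconds apart, the ear hears them as a thickened, lingering "s" rather than as separate echoes.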
Breathiness and Wetness
Both Gollum and Sméagol carry a wet, slightly “slobbery” quality — the sound of a creature that lives in caves and does not modulate speech for social presentation. In a microphone recording, this partly comes from closer mic placement (2–5 cm) that captures oral moisture sounds. In software, a parallel signal with subtle chorus at low depth and very slow rate adds organic textural complexity without artificial tuning artifacts.
Formant Positioning
Gollum’s formants sit in an unusual position because the constricted pharynx shifts the second formant (F2) downward while keeping the first formant (F1) relatively stable. This creates a “hollow” mid-throat resonance. A formant shift of −1 to −2 semitones captures this reasonably well in software.
Gollum vs. Sméagol: The Dual Voice in Practice
The split-personality performance is the heart of the Gollum impression. Here is how the two voices differ across every technical dimension:
| Parameter | Gollum | Sméagol |
|---|---|---|
| Pitch shift | −2 semitones | +3 semitones |
| Formant shift | −1 semitone | +1 semitone |
| Vocal fry / distortion | Heavy (60–70% drive) | Light (15–25% drive) |
| Sibilant tail | Long (120–150 ms reverb on HF) | Short (30 ms) |
| Breathiness | Low-moderate | Moderate-high |
| Emotional tone | Conspiratorial, suspicious, predatory | Pleading, fearful, innocent-sounding |
| Ring-mod shimmer | Yes (40 Hz carrier) | No |
| Compression ratio | 6:1 (flat, punchy) | 3:1 (dynamic, expressive) |
| Typical phrase examples | “My preciousss…”, “We hates it” | “We wants to go home”, “Sméagol will find the way” |
The transition between them should feel abrupt and startling — a physical gear-shift mid-sentence. On a voice changer, map each preset to a separate hotkey so you can toggle in real time during roleplay or streaming.
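One way to think about the two presets and the hotkey toggle is as plain data plus a two-state switch. The sketch below is hypothetical: `VoicePreset` and `PresetToggle` are illustrative names, not part of any voice changer's API, and the drive and sibilant-tail numbers are single values picked from inside the table's ranges. Registering an actual global hotkey is left out because it depends on the application.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class VoicePreset:
    name: str
    pitch_semitones: float
    formant_semitones: float
    drive_percent: float
    sibilant_tail_ms: float
    ring_mod_hz: Optional[float]  # None means the ring modulator is bypassed
    compression_ratio: float

GOLLUM = VoicePreset("Gollum", -2, -1, 65, 130, 40.0, 6.0)
SMEAGOL = VoicePreset("Sméagol", +3, +1, 20, 30, None, 3.0)

class PresetToggle:
    """Flip between two presets, as a hotkey handler might."""

    def __init__(self, first: VoicePreset, second: VoicePreset):
        self._presets = (first, second)
        self._index = 0

    @property
    def active(self) -> VoicePreset:
        return self._presets[self._index]

    def toggle(self) -> VoicePreset:
        self._index = 1 - self._index
        return self.active

voice = PresetToggle(GOLLUM, SMEAGOL)
print(voice.active.name)    # starts on the first preset
print(voice.toggle().name)  # one key press switches to the other
```

Keeping the presets immutable makes the switch instantaneous and avoids half-applied settings in the middle of a sentence.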
Physical Technique: How to Attempt the Voice Yourself
Before reaching for software, understanding the physical mechanics helps you blend performance with processing for a more natural result.
Positioning the Constriction
Pull the back of your tongue slightly toward the soft palate, narrowing the pharyngeal space. Do not push from the front of the throat, because this strains the larynx. The sensation should be in the upper-back mouth area, similar to the position you hold when fogging a mirror from a distance. Breathe through this constricted space while voicing.
Adding the Fry Layer
Once you have the pharyngeal constriction, drop your larynx gently and speak at the low end of your comfortable register. You should feel a crackling, irregular onset to each vowel. This is modal-to-fry register mixing — the quality Gollum uses throughout.
Elongating the Sibilants
On any word ending in ‘s’, allow the tongue to linger against the alveolar ridge slightly longer than normal. Let the air hiss slowly to silence rather than cutting it off. For “my preciousss,” emphasize the final decay by gradually reducing airflow pressure rather than stopping the ‘s’ abruptly.
Sméagol Switch
To switch to Sméagol, release the pharyngeal constriction, raise your larynx, and add a slight upward inflection to sentence ends. The voice becomes lighter and more forward-resonant — place it in the front of the mouth rather than the back.
Health note: Sustained back-throat constriction and forced vocal fry can cause hoarseness, soreness, and in prolonged sessions, vocal fatigue or minor mucosal swelling. Warm up with gentle humming beforehand, drink water frequently, and limit continuous impression attempts to one to two minutes per session. Stop immediately if you experience pain, a sharp feeling in the throat, or loss of voice. This technique is not suitable for people with existing laryngeal conditions.
DSP Chain: Recreating the Gollum Voice in a Voice Changer
A voice changer with a flexible DSP chain can approximate the Gollum voice convincingly for casual streaming and gaming. Here is a complete starting configuration:
Gollum Preset
- Noise Gate — threshold −40 dBFS, attack 5 ms, release 100 ms. Removes background hiss that gets amplified by subsequent distortion.
- Pitch Shift — −2 semitones. Subtle, not dramatic.
- Formant Shift — −1 semitone. Adds the hollow mid-throat resonance.
- Ring Modulator — carrier frequency 40 Hz, mix 18%. Introduces the irregular shimmer of heavy vocal fry.
- Harmonic Distortion — drive 65%, soft-clip curve. Adds the rasp. Avoid hard clipping, which sounds digital rather than organic.
- High-Frequency Reverb — pre-delay 0 ms, decay 130 ms, applied only to 4–12 kHz band. Smears sibilants without adding room sound to vowels.
- Compressor — ratio 6:1, attack 8 ms, release 60 ms, mild makeup gain. Flattens dynamics to the flat, controlled delivery Gollum uses.
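Two items in the Gollum chain above are easy to illustrate numerically: the gate threshold in dBFS and the difference between soft and hard clipping. This is a rough sketch under stated assumptions. The mapping from a 0–1 drive setting to input gain is invented for the example, and `tanh` is just one common soft-clip curve, not necessarily the one any particular voice changer uses.

```python
import math

def dbfs_to_linear(dbfs: float) -> float:
    """Convert a dBFS level to linear amplitude (0 dBFS = 1.0)."""
    return 10.0 ** (dbfs / 20.0)

def _drive_gain(drive: float) -> float:
    # Hypothetical mapping: drive 0..1 becomes an input gain of 1..10.
    return 1.0 + 9.0 * drive

def soft_clip(x: float, drive: float) -> float:
    """tanh saturation, normalised so a full-scale input stays at full scale."""
    gain = _drive_gain(drive)
    return math.tanh(gain * x) / math.tanh(gain)

def hard_clip(x: float, drive: float) -> float:
    """Same gain, but the signal is simply truncated at +/-1.0."""
    return max(-1.0, min(1.0, _drive_gain(drive) * x))

print(f"gate threshold: {dbfs_to_linear(-40.0):.4f} linear")
for x in (0.05, 0.2, 0.5):
    print(f"in={x:.2f} soft={soft_clip(x, 0.65):.3f} hard={hard_clip(x, 0.65):.3f}")
```

The soft curve approaches full scale gradually, so its added harmonics fall off smoothly, while the hard clipper flattens the waveform abruptly and produces the harsher, more digital edge the preset notes warn about.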
Sméagol Preset
- Same Noise Gate.
- Pitch Shift — +3 semitones.
- Formant Shift — +1 semitone. Brightens resonance.
- Harmonic Distortion — drive 20%, light overdrive curve.
- High-Frequency Reverb — 30 ms decay. Much shorter sibilant tail.
- Compressor — ratio 3:1, longer attack (25 ms). More dynamic, expressive.
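The practical difference between the 6:1 and 3:1 ratios shows up in the static compressor curve: above the threshold, the output rises by 1 dB for every `ratio` dB of input. The presets above do not specify a threshold, so the −24 dBFS used here is an assumed value for illustration only, and attack and release behaviour is ignored.

```python
def compressed_level_db(input_db: float, threshold_db: float, ratio: float) -> float:
    """Static compressor curve: levels above the threshold are scaled down by `ratio`."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

THRESHOLD_DB = -24.0  # assumed; not specified by the presets
for name, ratio in (("Gollum 6:1", 6.0), ("Sméagol 3:1", 3.0)):
    out = compressed_level_db(-6.0, THRESHOLD_DB, ratio)
    print(f"{name}: -6.0 dBFS in -> {out:.1f} dBFS out")
```

With these assumptions a −6 dBFS peak comes out at −21 dBFS for Gollum and −18 dBFS for Sméagol, so the Sméagol preset keeps 3 dB more of the performer's dynamics on loud syllables.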
AI Voice Conversion: Going Beyond DSP
DSP effects approximate the Gollum voice by shaping the signal you produce. AI voice conversion goes further by transforming your voice into a model of the target timbre — capturing the specific wet, constricted resonance that ring modulators and distortion can only hint at.
VoxBooster’s custom AI voice cloning uses a trained conversion model that runs entirely on your local machine (Windows 10/11, no cloud required). You record a short reference sample, the model encodes its timbre, and real-time inference converts your speech with sub-300 ms latency, which is comfortable for casual conversation though noticeable in fast-paced games. There is no kernel driver involved; the virtual audio device appears in Windows through WASAPI like any standard microphone input.
The Whisper-based voice activity detection built into VoxBooster ensures clean boundaries between speech and silence, so the wet throat artifacts in the model do not bleed into quiet segments and produce unnatural noise.
For a Gollum impression specifically, AI conversion combined with a light DSP layer (−1 formant, gentle sibilant reverb) tends to produce the most convincing result because the AI model carries the timbre load while DSP handles the acoustic-space cues that models are less consistent at rendering.
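To put a sub-300 ms latency figure in context, it helps to convert between milliseconds and audio samples. The sketch below assumes a 48 kHz sample rate and some common buffer sizes; neither is stated for VoxBooster, and real end-to-end latency also includes driver and model inference time that this arithmetic does not capture.

```python
def buffer_latency_ms(frames: int, sample_rate: int) -> float:
    """Time represented by one audio buffer of `frames` samples."""
    return 1000.0 * frames / sample_rate

def samples_in(ms: float, sample_rate: int) -> int:
    """Number of samples that span `ms` milliseconds."""
    return round(ms * sample_rate / 1000.0)

SAMPLE_RATE = 48_000  # assumed sample rate
for frames in (256, 480, 1024):
    print(f"{frames} frames = {buffer_latency_ms(frames, SAMPLE_RATE):.2f} ms")
print(f"300 ms = {samples_in(300.0, SAMPLE_RATE)} samples")
```

At 48 kHz, 300 ms corresponds to 14,400 samples of audio, while a single 480-frame buffer is only 10 ms.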
Streaming and Roleplay Setup
Discord
- Open VoxBooster and activate the Gollum preset.
- In Discord Settings → Voice & Video, set Input Device to VoxBooster Virtual Mic.
- Disable Discord’s noise suppression (it can strip the intentional textural quality of the Gollum voice — the “noise” is part of the character).
- Map Gollum / Sméagol hotkeys in VoxBooster so you can toggle mid-conversation.
OBS and Streaming
- In OBS, add an Audio Input Capture source.
- Set Device to VoxBooster Virtual Mic.
- Add a Filters chain in OBS: Gate → high-shelf boost at 3 kHz (+2 dB) for consonant clarity → moderate limiter to prevent clipping.
- If you stream with facecam and want the dual-personality effect visually, consider a push-to-talk toggle so your “true voice” can narrate between character segments.
Virtual Tabletop and Roleplay Games
Games like Foundry VTT, Roll20, or Tabletop Simulator read from your system default microphone or a configurable input. Point them to the VoxBooster virtual device. For D&D roleplay where Gollum is an NPC, switching between presets live adds genuine theatrical impact that a static text description cannot match.
Common Problems and Fixes
Voice sounds too electronic or robotic
Reduce ring modulator mix to under 15%. A ring modulator that is too prominent overwhelms the organic vocal qualities. Also ensure the harmonic distortion is using a soft-clip or saturation algorithm rather than hard-clip.
Sibilants are too harsh or piercing
The high-frequency reverb tail may be too long or too bright. Lower the reverb decay to 80–90 ms and apply a gentle high-shelf cut (−2 dB at 8 kHz) after the reverb insert.
Sméagol sounds the same as Gollum
Ensure the pitch differential between presets is at least 4 to 5 semitones, and that the Sméagol preset has significantly reduced distortion drive. The emotional quality also matters: consciously adopt the pleading, upward-inflecting delivery even with software doing the heavy lifting.
Latency is noticeable in fast-paced gaming
Switch to the DSP-only preset (turn off AI conversion). Pure DSP runs under 20 ms end-to-end in VoxBooster. Reserve AI conversion for contexts that tolerate more latency, such as roleplay streams.
My physical voice is getting hoarse after attempts
This is a warning sign. Stop performing the voice, rest your vocal cords for at least 24 hours, stay hydrated with warm (not hot) liquids, and rely on the software to do the heavy lifting rather than trying to match the character through physical effort alone. The software exists precisely to spare your voice the strain.
Why the Gollum Voice Still Resonates
More than two decades after The Fellowship of the Ring, the Gollum voice remains one of the most frequently imitated sounds in pop culture — at conventions, in gaming, in online communities, in meme content. Part of what makes it endure is that it is not merely a “funny voice.” The dual Gollum/Sméagol dynamic is a shorthand for internal conflict, obsession, and fractured identity. Using it in roleplay carries narrative weight instantly recognizable to anyone who has seen the films.
Technically, it also sits in a sweet spot for voice impression: unusual enough to be interesting, achievable enough with practice (or software) to be within reach. The raspy hiss reads as character even when imperfectly executed, which makes it forgiving for streamers and roleplayers who cannot spend years refining their pharyngeal constriction the way Andy Serkis did.
Whether you are going for a one-time “my preciousss” drop during a stream, running Gollum as an NPC in a campaign, or building a full AI voice model for extended roleplay use, the combination of understood technique and the right tool makes the difference between a gimmick and a genuinely immersive performance.
Get the Gollum Preset in VoxBooster
VoxBooster ships with a Fantasy Characters voice bank that includes Gollum and Sméagol as separate presets. Available for Windows 10/11, starting at $6.99/month (€5.99/month in Europe, R$29,90/month in Brazil). No kernel driver. No cloud required for voice conversion. Whisper-powered voice activity detection. Works in Discord, OBS, games, and any WASAPI-compatible application.
Download VoxBooster and try the presets free during the three-day trial.
FAQ
How did Andy Serkis develop the Gollum voice for Lord of the Rings?
Serkis based the Gollum voice on the sound of his cat coughing up a fur ball: a strangled, wet, back-of-the-throat constriction. He then layered a split-personality performance on top, with raspy, hissing Gollum set against the higher, more childlike and pleading Sméagol. Years of rehearsal refined the cadence.
What is the difference between the Gollum voice and the Sméagol voice?
Gollum speaks in a low, raspy, conspiratorial hiss: pitch is mid-low, vocal fry is heavy, and consonants like ‘s’ are elongated into a wet sibilance. Sméagol is higher-pitched, breathier, almost childlike and pleading. Switching between them mid-sentence is the signature performance challenge that defines the character.
Can I do the Gollum voice without straining my vocal cords?
A short impression attempt is generally low risk for healthy adults, but prolonged constriction of the back throat can cause vocal fatigue or soreness. Warm up your voice beforehand, limit sustained attempts to under two minutes, stay hydrated, and stop immediately if you feel any pain or hoarseness.
How do I set up a Gollum voice changer for Discord or streaming?
Install VoxBooster, apply the Gollum preset from the Fantasy Characters bank, and select the VoxBooster Virtual Mic as your input device in Discord or OBS. The sub-300 ms AI voice conversion path gives the most accurate result; the DSP-only preset keeps latency under 20 ms.
Does a Gollum voice changer work in games like DnD virtual tabletop or GTA roleplay?
Yes. Any Windows application that reads a microphone input will see the VoxBooster virtual device. You can switch between Gollum and Sméagol presets live using hotkeys, which makes roleplay sessions significantly more immersive.
What pitch settings recreate the Gollum voice with a standard voice changer?
Start with pitch shift at −2 semitones (Gollum is not dramatically deep, just rough), formant shift at −1 semitone, heavy harmonic distortion with a ring-mod shimmer, and a long sibilant tail on the reverb. For Sméagol, set pitch shift to +3 semitones and cut distortion drive to roughly 20%.
Is AI voice cloning better than DSP effects for a Gollum impression?
AI voice conversion captures timbre qualities, specifically the wet, constricted resonance, that DSP effects approximate but cannot fully replicate. The trade-off is latency: DSP runs under 20 ms, while AI conversion in VoxBooster runs sub-300 ms, which is fine for casual conversation but noticeable if you are playing a fast-paced FPS.