In this Build Guide:
- The Chip: Why you need an ESP32-S3 (and why the C3 won't work).
- The Backend: Installing Whisper and Piper in Home Assistant.
- The Parts: INMP441 Mic + MAX98357A Amp.
- The Wiring: Understanding the I2S Audio Protocol.
- The Result: A private smart speaker for under $15.
In our previous guides, we loved the ESP32-C3 for temperature sensors and WLED. It is cheap and efficient. But today, we are building ears and a mouth for your home.
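For audio processing, the C3 is too weak. To detect a wake word like "Hey Jarvis" locally, without sending audio to the cloud, the chip has to run a small neural network continuously on the incoming sound. That calls for the ESP32-S3. It has two cores at up to 240 MHz, vector instructions that speed up neural-network math, and support for external PSRAM. The C3, by contrast, is a single-core chip at 160 MHz without those vector instructions.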
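In ESPHome, choosing the S3 shows up in the board settings. Below is a minimal sketch, assuming an ESP32-S3-DevKitC-1 style board with the N16R8 module (16 MB flash, 8 MB octal PSRAM). If your module differs, adjust "board" and the PSRAM mode. For example, "R2" modules use quad PSRAM instead of octal.

```yaml
# Sketch: assumes an ESP32-S3-DevKitC-1 style board with an N16R8 module
esp32:
  board: esp32-s3-devkitc-1
  flash_size: 16MB          # "N16" = 16 MB of flash
  framework:
    type: esp-idf           # micro_wake_word needs the ESP-IDF framework

psram:
  mode: octal               # "R8" = 8 MB of octal PSRAM
  speed: 80MHz
```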
Hardcore Hardware
This is an intermediate build involving I2S audio protocols. Love getting into the weeds of datasheets? Search for the "Electronics" or "PCB Design" tags on Great Meets to find other hardware hackers in your city.
Step 1: Preparing Home Assistant (The Brains)
Before we wire up the hardware, we need to make sure Home Assistant has the "brains" to understand English and talk back. Install two add-ons, plus an optional third:
- Go to Settings -> Add-ons -> Add-on Store.
- Install Whisper (Speech-to-Text). This converts your voice recording into text.
- Install Piper (Text-to-Speech). This creates the computer voice that talks back.
- (Optional) Install openWakeWord. This add-on runs wake word detection on the Home Assistant server. You only need it if you want the server, rather than the ESP32-S3, to listen for the wake word. In this build, the S3 does the listening itself.
Once installed, go to Settings -> Voice Assistants and make sure you have an active pipeline that uses Whisper for speech-to-text and Piper for text-to-speech. This is the "server" your ESP32 will talk to.
Step 2: The Shopping List
Unlike a smart plug, we are building this from components. You will need:
The Brain
ESP32-S3 DevKit. Make sure it is the S3 version. The N16R8 variant (16 MB flash, 8 MB PSRAM) is best, because the extra memory gives the wake word engine and audio buffers plenty of headroom.
The Ears
INMP441 Microphone. An omnidirectional MEMS microphone with a 24-bit I2S digital output, so the audio reaches the ESP32 already digitized.
The Mouth
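MAX98357A Amplifier. An I2S Class D amplifier with a built-in DAC. It takes the digital audio stream from the ESP32 and drives a small speaker directly, delivering about 3W into a 4Ω speaker when powered from 5V. You will also need the speaker itself.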
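Where does the "3W" figure come from? Here is a rough, idealized estimate that ignores losses and clipping. It assumes the bridge-tied output of a Class D amplifier like the MAX98357A can swing a sine wave whose peak approaches the supply voltage $V_{\text{DD}}$ across a speaker of resistance $R_L$:

$$
P_{\max} \approx \frac{V_{\text{DD}}^2}{2R_L} = \frac{(5\,\text{V})^2}{2 \times 4\,\Omega} \approx 3.1\,\text{W}
$$

The same formula gives about 1.6 W into an 8Ω speaker at 5V, and about 1.4 W into 4Ω at 3.3V. That is why feeding the amplifier 5V gets you noticeably more volume.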
Step 3: The Wiring (I2S Protocol)
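We are using I2S (Inter-IC Sound), a standard serial bus for connecting digital audio chips. Besides power and ground, it needs three signal wires:
- Bit Clock (BCLK, labeled SCK on the microphone)
- Word Select (LRC, labeled WS on the microphone), which marks whether the current sample belongs to the left or right channel
- Data (SD out of the microphone, DIN into the amplifier)

The microphone and amplifier share the clock and word select lines, and each gets its own data line.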
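| ESP32-S3 Pin | Microphone (INMP441) | Amplifier (MAX98357A) |
|---|---|---|
| 3.3V | VDD | - |
| 5V | - | Vin |
| GND | GND, L/R | GND |
| GPIO 41 | SCK | BCLK |
| GPIO 42 | WS | LRC |
| GPIO 40 | SD (Serial Data) | - |
| GPIO 39 | - | DIN |

The INMP441 runs on 3.3V only, so never connect its VDD to 5V. Tying its L/R pin to GND makes it send audio in the left channel.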
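Note: You can move these signals to other free GPIO pins as long as you update the pin numbers in the ESPHome configuration to match. GPIO 39 to 42 are a safe choice on common S3 DevKit boards. Avoid GPIO 35 to 37 on N16R8 boards, because the octal PSRAM uses them.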
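To see how the shared clock lines relate, here is a worked example. It assumes a 16 kHz sample rate, which is typical for voice assistants. It also assumes the INMP441's frame format of two 32-bit slots per sample, since its datasheet specifies 64 SCK cycles per WS cycle. Your firmware's actual settings may differ.

$$
f_{\text{WS}} = f_s = 16\,\text{kHz}, \qquad
f_{\text{BCLK}} = f_s \times 32\,\tfrac{\text{bits}}{\text{slot}} \times 2\,\text{slots} = 16\,000 \times 64 = 1.024\,\text{MHz}
$$

In words: Word Select completes one cycle per sample (left slot, then right slot), and the bit clock ticks once per bit, so BCLK runs 64 times faster than WS.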
Getting Static or Screeching?
Audio hardware is sensitive to power supply noise. If your speaker buzzes or hisses, try adding a bulk capacitor across the amplifier's Vin and GND pins, shortening the wires, or switching to a cleaner USB power supply. Stuck? Search for an "Audio Engineer" on Great Meets and message them for troubleshooting tips.
Step 4: The Software (ESPHome)
We will use ESPHome to program the chip. On-device wake word detection comes from ESPHome's "micro_wake_word" component. Once it hears the wake word, it hands off to the "voice_assistant" component.
Create a new device in ESPHome and add this configuration for the I2S audio hardware and the voice assistant:
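```yaml
# Shared I2S bus: clock and word select go to both the mic and the amp
i2s_audio:
  - id: i2s_bus
    i2s_lrclk_pin: GPIO42   # WS on the INMP441, LRC on the MAX98357A
    i2s_bclk_pin: GPIO41    # SCK on the INMP441, BCLK on the MAX98357A

microphone:
  - platform: i2s_audio
    id: board_microphone
    i2s_din_pin: GPIO40     # INMP441 SD (data into the ESP32)
    adc_type: external
    pdm: false
    channel: left           # INMP441 L/R pin tied to GND = left channel

speaker:
  - platform: i2s_audio
    id: board_speaker
    i2s_dout_pin: GPIO39    # MAX98357A DIN (data out of the ESP32)
    dac_type: external
    mode: mono

voice_assistant:
  microphone: board_microphone
  speaker: board_speaker
  noise_suppression_level: 2  # 0 disables noise suppression
  auto_gain: 31dBFS           # automatic gain control target
  volume_multiplier: 2.0      # boosts playback volume
```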
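The block above covers audio in and out, but not the wake word itself. A minimal sketch of that missing piece is below. It assumes a recent ESPHome release (2024.7 or later) that uses the "models:" list syntax. Option names have shifted between versions, and newer releases may also expect you to name the microphone explicitly, so check the micro_wake_word documentation for your version. "okay_nabu" and "hey_jarvis" refer to ESPHome's built-in wake word models.

```yaml
# Sketch only: assumes ESPHome 2024.7+ ("models:" list syntax)
micro_wake_word:
  models:
    - model: okay_nabu        # or hey_jarvis
  on_wake_word_detected:
    # Start a voice session and report which wake word was heard
    - voice_assistant.start:
        wake_word: !lambda return wake_word;

# Merge these triggers into the existing voice_assistant: block
voice_assistant:
  on_client_connected:
    - micro_wake_word.start:  # begin listening once Home Assistant connects
  on_end:
    - micro_wake_word.start:  # resume listening after each reply
```

Restarting micro_wake_word in on_end is a precaution: it makes sure the device goes back to listening for the wake word after every reply.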
Step 5: The Test
Once you flash the ESP32-S3, Home Assistant will auto-discover it. Add it, and then go to Settings -> Voice Assistants.
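- Select your device and make sure it uses the pipeline you set up in Step 1.
- Check the wake word. With on-device detection, the wake word is whichever model is built into the ESP32-S3 firmware (e.g., "Okay Nabu" or "Hey Jarvis"). The server-side wake word options only apply if openWakeWord does the listening.
- Speak!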
When you say the wake word, the ESP32-S3 detects it locally and starts streaming your voice to Home Assistant, which passes it to the Whisper add-on (Step 1) for speech-to-text. Home Assistant then acts on the command and hands its text reply to Piper, which synthesizes speech. That audio goes back to the ESP32-S3, which plays it through your DIY speaker.
Conclusion
You have just built a device that covers the core of what an Amazon Echo does (wake word, voice commands, spoken replies) while keeping all audio processing on your own network instead of the cloud. It costs about $15 in parts and gives you total control over the hardware.
Build It Together
Soldering tiny wires to an ESP32 can be daunting. Why not host a "Build Night"? Great Meets lets you find others who want to learn. Create a local meetup or just find a buddy to share shipping costs on parts.