pipecat-friday-agent
Build a low-latency, Iron Man-inspired tactical voice assistant (F.R.I.D.A.Y.) using Pipecat, Gemini, and OpenAI.
Pipecat Friday Agent
Overview
This skill provides a blueprint for building F.R.I.D.A.Y. (Female Replacement Intelligent Digital Assistant Youth), a voice assistant inspired by Tony Stark's tactical AI from the Marvel films that talks through your local microphone and speakers. It uses the Pipecat framework to orchestrate a low-latency pipeline:
- STT: OpenAI Whisper (whisper-1) or gpt-4o-transcribe
- LLM: Google Gemini 2.5 Flash (via a compatibility shim)
- TTS: OpenAI TTS (nova voice)
- Transport: Local Audio (hardware mic/speakers)
When to Use This Skill
- Use when you want to build a real-time, conversational voice agent.
- Use when working with the Pipecat framework for pipeline-based AI.
- Use when you need to integrate multiple providers (Google and OpenAI) into a single voice loop.
- Use when building Iron Man-themed or tactical-themed voice applications.
How It Works
Step 1: Install Dependencies
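Install the Pipecat framework with its OpenAI, Google, and Silero extras, plus python-dotenv. Quote the package spec so shells such as zsh do not treat the brackets as a glob pattern:
pip install "pipecat-ai[openai,google,silero]" python-dotenv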
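To confirm the install worked before running the agent, a small check like the one below can help. It is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of the skill's script. It checks import names (`pipecat`, `dotenv`), which differ from the pip distribution names.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None for an unknown top-level module; dotted names whose
    # parent package is missing would raise instead, so only pass top-level names.
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names, not pip distribution names: pipecat-ai provides "pipecat"
    # and python-dotenv provides "dotenv".
    missing = missing_modules(["pipecat", "dotenv"])
    if missing:
        raise SystemExit("Missing modules: " + ", ".join(missing))
    print("All required modules are importable.")
```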
Step 2: Configure Environment
Create a .env file with your API keys:
OPENAI_API_KEY=your_openai_key
GOOGLE_API_KEY=your_google_key
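A missing key usually surfaces only when the first request fails mid-conversation, so it can be worth failing fast at startup. The sketch below assumes the two variable names above. `check_api_keys` and `load_environment` are hypothetical helpers, not functions from the skill's script. `load_dotenv` is python-dotenv's loader, and it is imported lazily so the pure check works without it.

```python
import os


def check_api_keys(env, required=("OPENAI_API_KEY", "GOOGLE_API_KEY")):
    """Return the required key names that are missing or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


def load_environment():
    """Load .env into os.environ (needs python-dotenv) and return os.environ."""
    from dotenv import load_dotenv  # lazy import: check_api_keys works without it

    load_dotenv()  # by default, does not override variables already set in the shell
    return os.environ


if __name__ == "__main__":
    missing = check_api_keys(load_environment())
    if missing:
        raise SystemExit("Missing API keys in .env: " + ", ".join(missing))
    print("API keys loaded.")
```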
Step 3: Run the Agent
Execute the provided Python script to start the interface:
python scripts/friday_agent.py
Core Concepts
Pipeline Architecture
The agent follows a linear pipeline: Mic -> VAD -> STT -> LLM -> TTS -> Speaker. Unlike end-to-end speech-to-speech models, this gives granular control over each stage. For example, you can swap the STT model or tune VAD sensitivity without touching the rest of the loop.
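The toy model below shows the linear flow and why VAD sits first: a stage can drop an item, so silence never reaches STT or the LLM. This is a synchronous illustration only, not Pipecat's API. In Pipecat, stages are asynchronous frame processors chained in a `Pipeline`, and frames stream through them continuously. `run_pipeline` and the stub stages (`vad`, `stt`, `llm`, `tts`) are hypothetical stand-ins.

```python
from typing import Callable, Iterable, List, Optional

# A stage takes one item and returns a transformed item, or None to drop it.
Stage = Callable[[object], Optional[object]]


def run_pipeline(stages: List[Stage], inputs: Iterable[object]) -> List[object]:
    """Push each input through the stages in order; a None result drops that item."""
    outputs = []
    for item in inputs:
        for stage in stages:
            item = stage(item)
            if item is None:
                break  # dropped (e.g. silence filtered by VAD)
        else:
            outputs.append(item)  # survived every stage
    return outputs


# Stub stages standing in for real services (all hypothetical).
def vad(chunk):
    # Drop chunks without speech so they never reach STT or the LLM.
    return chunk if chunk.get("speech") else None


def stt(chunk):
    return chunk["audio"]  # pretend the audio payload is already its transcript


def llm(text):
    return f"Acknowledged: {text}."


def tts(reply):
    return reply.encode("utf-8")  # stand-in for synthesized PCM bytes


if __name__ == "__main__":
    mic_chunks = [
        {"speech": False, "audio": ""},
        {"speech": True, "audio": "run diagnostics"},
    ]
    for audio in run_pipeline([vad, stt, llm, tts], mic_chunks):
        print(audio)  # the "speaker" stage
```

Because each stage only sees the previous stage's output, replacing one stub with a different provider leaves the others unchanged. That independence is the main benefit of the pipeline approach.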
Google Compatibility Shim
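Pipecat's context aggregators expect OpenAI-style chat messages (a role plus a content string), but Google's Gemini API uses a different schema. Gemini names the assistant role model instead of assistant, wraps text in a list of parts, and takes the system prompt separately from the conversation turns. To bridge the gap, the script includes GoogleSafeContext and GoogleSafeMessage classes.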
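The skill's GoogleSafeContext and GoogleSafeMessage classes are not reproduced here. The sketch below shows only the kind of translation such a shim performs, working on plain dicts. `to_gemini` and `ROLE_MAP` are hypothetical names. Two behaviors are design choices of this sketch rather than anything confirmed from the script: it merges consecutive same-role turns defensively, and it skips tool messages.

```python
from typing import Dict, List, Optional, Tuple

# Gemini conversation turns use "user" and "model"; the system prompt is
# passed separately as a system instruction.
ROLE_MAP = {"user": "user", "assistant": "model"}


def to_gemini(messages: List[Dict]) -> Tuple[Optional[str], List[Dict]]:
    """Convert OpenAI-style chat messages into (system_instruction, contents)."""
    system_parts = []
    contents = []
    for msg in messages:
        role = msg.get("role")
        text = msg.get("content") or ""
        if not isinstance(text, str):
            # OpenAI also allows a list of content parts; keep only the text parts.
            text = " ".join(p.get("text", "") for p in text if p.get("type") == "text")
        if role == "system":
            system_parts.append(text)
        elif role in ROLE_MAP:
            gemini_role = ROLE_MAP[role]
            if contents and contents[-1]["role"] == gemini_role:
                # Defensive choice: fold consecutive same-role turns into one.
                contents[-1]["parts"].append({"text": text})
            else:
                contents.append({"role": gemini_role, "parts": [{"text": text}]})
        # Other roles (e.g. "tool") are skipped in this sketch.
    system_instruction = "\n".join(system_parts) or None
    return system_instruction, contents
```

The returned `contents` list mirrors the shape of the `contents` field in a Gemini REST request. How the real shim hands the converted messages to Pipecat's Google service is not shown in the source.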
Best Practices
- ✅ Use Silero VAD: It is robust for local hardware and prevents background noise from triggering the LLM.
- ✅ Concise Prompts: Tactical agents should give short, data-dense responses to minimize latency.
- ✅ Sample Rate Match: OpenAI TTS outputs 24 kHz audio; ensure your audio_out_sample_rate matches to avoid high-pitched or slowed audio.
- ❌ No Polite Fillers: Avoid "Hello, how can I help you today?" Instead, use "Systems nominal. Ready for commands."
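Raw PCM carries no sample-rate header, so the output device simply consumes samples at whatever rate it is configured for. The arithmetic below shows why a mismatch distorts the voice. It assumes 16-bit mono PCM, the raw format OpenAI's TTS returns at 24 kHz. `playback_effect` and `duration_seconds` are hypothetical helpers for illustration.

```python
def playback_effect(source_rate_hz: int, playback_rate_hz: int) -> dict:
    """Describe PCM generated at one sample rate but played back at another.

    Speed and pitch both scale by playback_rate / source_rate.
    """
    if source_rate_hz <= 0 or playback_rate_hz <= 0:
        raise ValueError("sample rates must be positive")
    factor = playback_rate_hz / source_rate_hz
    if factor > 1:
        effect = "faster and higher-pitched"
    elif factor < 1:
        effect = "slower and lower-pitched"
    else:
        effect = "correct"
    return {"speed_factor": factor, "effect": effect}


def duration_seconds(num_bytes: int, sample_rate_hz: int,
                     bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Playback duration of raw PCM (16-bit mono by default)."""
    return num_bytes / (sample_rate_hz * bytes_per_sample * channels)


if __name__ == "__main__":
    one_second_of_tts = 24_000 * 2  # 1 s of 16-bit mono PCM at 24 kHz
    for rate in (16_000, 24_000, 48_000):
        info = playback_effect(24_000, rate)
        print(rate, info["effect"], f"{duration_seconds(one_second_of_tts, rate):.2f}s")
```

Played at 16 kHz, one second of 24 kHz speech stretches to 1.5 seconds and drops in pitch. At 48 kHz it shrinks to 0.5 seconds and rises in pitch. Setting audio_out_sample_rate to 24000 avoids both effects.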
Troubleshooting
- Problem: Audio is choppy or delayed.
  - Solution: Check your OUTPUT_DEVICE index. Run a script like test_audio_output.py to find the correct hardware index for your OS.
- Problem: "Validation error" for message format.
  - Solution: Ensure the GoogleSafeContext shim is correctly translating OpenAI-style dicts to the Gemini-style schema.
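The test_audio_output.py script mentioned above is not included in the source. The sketch below is a hypothetical stand-in for finding an output index. It assumes OUTPUT_DEVICE is a PyAudio device index, since Pipecat's local audio transport is built on PyAudio. It uses PyAudio's `get_device_count` and `get_device_info_by_index`, and it also reports each device's default sample rate, which helps with the 24 kHz check. `output_devices` and `query_devices` are illustrative names.

```python
def output_devices(device_infos):
    """Return (index, name, default_rate_hz) for every device that can play audio."""
    return [
        (info["index"], info["name"], int(info.get("defaultSampleRate", 0)))
        for info in device_infos
        if info.get("maxOutputChannels", 0) > 0
    ]


def query_devices():
    """Collect device info dicts from PyAudio (requires the pyaudio package)."""
    import pyaudio  # lazy import so output_devices can be used without PyAudio

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()  # release PortAudio resources


if __name__ == "__main__":
    for index, name, rate in output_devices(query_devices()):
        print(f"{index}: {name} (default {rate} Hz)")
```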
Related Skills
- @voice-agents - General principles of voice AI.
- @agent-tool-builder - Add tools (Search, Lights, etc.) to your Friday agent.
- @llm-architect - Optimizing the LLM layer.
Validation Signals
1 runnable script
Primary Stack
Python
Tooling Surface
Scripts
Workspace Path
.agents/skills/pipecat-friday-agent