AI-to-AI testing sounds like the future of automated simulations—an intelligent system interacting with another AI to refine and optimize responses, creating scalable, self-improving workflows. But what happens when you actually try to implement it? I set out to test this concept by generating synthetic users and having an AI-powered fitness coach onboard them. What I learned was eye-opening, particularly in terms of cost, efficiency, and the unexpected limitations of current AI models.
Building the AI System: Synthetic Users & AI Coaches
To make this work, I created:

- A set of system prompts defining roles and objectives for the AI coach and synthetic user.
- A Python-based prompt loader to dynamically switch prompts based on application needs.
- A structured approach to ensure synthetic user profiles reflected realistic personas, covering everything from fitness goals to regional diversity.
Here’s an example of a system prompt I designed for generating synthetic athlete profiles:
### SYSTEM ROLE: Synthetic User Generator
### 🎯 **Primary Objective:**
Create dynamic, human-like athlete profiles that reflect:
- A wide range of sports, from popular global disciplines to niche activities.
- Diverse regions, cultures, languages, and training practices.
- Varied personal, psychological, and physiological attributes.
---
### **📌 Essential Information Guidelines:**
- **Name:** Reflect diverse global backgrounds
- **Age:** 16-45 range for competitive brackets
- **Gender:** Male, Female, Non-binary, or user-defined
- **Height & Weight:** Use region-appropriate units
- **Phone:** Use region-appropriate formats
- **Email:** Create realistic email addresses
- **Region:** Global distribution
- **Fitness Goals:** List 2-4 specific, measurable goals
---
### **📌 Performance Data Guidelines:**
- **Primary Sport/Discipline:** Include both mainstream and niche sports
- **Fitness Level:** beginner, intermediate, advanced, or elite
- **Experience Level:** Years of training in primary sport
- **Competition History:** List of recent competitions/achievements
- **Personal Bests:** Sport-specific metrics and records
- **Training Frequency:** Sessions per week
- **VO2 Max:** Realistic range based on sport and level
---
### **📌 Personality Guidelines:**
- **Type:** analytical, social, competitive, or balanced
- **Communication Style:** direct, friendly, technical, or motivational
- **Motivation Level:** low, moderate, high, or very high
- **Response Length:** short, medium, or long
- **Knowledge Level:** beginner, intermediate, advanced, or expert
- **Concerns:** List realistic training/health concerns
---
### **📌 Medical History Guidelines:**
- **Injuries:** List past and current injuries relevant to sport
- **Conditions:** Any medical conditions affecting training
- **Medications:** Only if relevant to training/performance
---
### **📌 Additional Data Guidelines:**
- **Coaching Style Preferences:** authoritative, collaborative, or supportive
- **Cardio Preferences:** Specific types and intensities
- **Equipment Access:** Home gym, commercial gym, outdoors, etc.
- **Preferred Workout Times:** Morning, afternoon, evening
- **Recovery Patterns:** Rest days, active recovery methods
- **Sleep Habits:** Hours and quality
- **Dietary Preferences:** Include cultural and personal preferences
---
### **📌 Global Diversity Rules:**
- Balance across regions, genders, sports, and experience levels
- Reflect local cultural nuances in names and preferences
- Adapt dietary preferences and training styles to match regional realities
- Include a mix of traditional and modern sports
- Consider local climate and facilities in equipment access
---
### **📌 Output the Profile in the Following JSON Structure:**
```json
{
  "profile": {
    "essentialInformation": {
      "name": "",
      "age": 0,
      "gender": "",
      "height": "",
      "weight": "",
      "email": "",
      "phone": "",
      "region": "",
      "fitnessGoals": []
    },
    "performanceData": {
      "primarySport": "",
      "fitnessLevel": "",
      "experienceLevel": "",
      "competitionHistory": [],
      "personalBests": {},
      "trainingFrequency": "",
      "vo2max": ""
    },
    "personality": {
      "type": "",
      "communicationStyle": "",
      "motivationLevel": "",
      "responseLength": "",
      "knowledgeLevel": "",
      "concerns": []
    },
    "medicalHistory": {
      "injuries": [],
      "conditions": [],
      "medications": []
    },
    "additionalData": {
      "coachingStyle": "",
      "cardioPreferences": "",
      "equipmentAccess": "",
      "preferredWorkoutTimes": "",
      "recoveryPatterns": "",
      "sleepHabits": "",
      "dietaryPreferences": ""
    }
  }
}
```
---
### **📌 Important Notes:**
1. Generate realistic, consistent profiles
2. Ensure all fields are filled with appropriate values
3. Keep data internally consistent (e.g., experience level matching age)
4. Include realistic injury histories and medical conditions when appropriate
5. Generate JSON data only, no additional text or explanations
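A prompt like this only asks the model for the right shape; it does not guarantee it. Below is a minimal sketch of a structural check that could be run on each generated profile before it is used. The section and field names are copied from the JSON structure in the prompt; the function name `validate_profile` and the wording of the problem messages are my own assumptions, not part of the original project.

```python
import json

# Sections and fields copied from the JSON structure in the prompt above.
EXPECTED_KEYS = {
    "essentialInformation": {"name", "age", "gender", "height", "weight",
                             "email", "phone", "region", "fitnessGoals"},
    "performanceData": {"primarySport", "fitnessLevel", "experienceLevel",
                        "competitionHistory", "personalBests",
                        "trainingFrequency", "vo2max"},
    "personality": {"type", "communicationStyle", "motivationLevel",
                    "responseLength", "knowledgeLevel", "concerns"},
    "medicalHistory": {"injuries", "conditions", "medications"},
    "additionalData": {"coachingStyle", "cardioPreferences", "equipmentAccess",
                       "preferredWorkoutTimes", "recoveryPatterns",
                       "sleepHabits", "dietaryPreferences"},
}


def validate_profile(raw: str) -> list[str]:
    """Return a list of structural problems (empty if the profile looks complete)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]

    profile = data.get("profile") if isinstance(data, dict) else None
    if not isinstance(profile, dict):
        return ["missing top-level 'profile' object"]

    problems = []
    for section, keys in EXPECTED_KEYS.items():
        body = profile.get(section)
        if not isinstance(body, dict):
            problems.append(f"missing section: {section}")
            continue
        # Report every expected field the model left out of this section.
        for key in sorted(keys - set(body)):
            problems.append(f"missing field: {section}.{key}")
    return problems
```

This only checks that the fields exist. It says nothing about whether the values are realistic or internally consistent, which the prompt's "Important Notes" also ask for.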
Here's the Python-based prompt loader:
```python
from pathlib import Path


class PromptLoader:
    """A class to manage and load system prompts."""

    def __init__(self):
        # Get the absolute path to the prompts directory
        self.prompts_dir = Path(__file__).parent.parent / 'prompts'
        # Ensure prompts directory exists
        if not self.prompts_dir.exists():
            raise FileNotFoundError(f"Prompts directory not found: {self.prompts_dir}")
        # Cache for loaded prompts (raw text, before any formatting)
        self._prompt_cache = {}

    def load_prompt(self, file_name: str, use_cache: bool = True, **kwargs) -> str:
        """
        Load a system prompt from a text file and format it with dynamic variables.

        Args:
            file_name (str): Name of the prompt file in the 'prompts' directory
            use_cache (bool): Whether to use cached prompts (default: True)
            **kwargs: Optional key-value pairs to dynamically inject into the prompt

        Returns:
            str: The formatted prompt

        Raises:
            FileNotFoundError: If the prompt file doesn't exist
            KeyError: If required format variables are missing
        """
        # Check cache first if enabled
        if use_cache and file_name in self._prompt_cache:
            prompt = self._prompt_cache[file_name]
        else:
            prompt_path = self.prompts_dir / file_name
            try:
                with open(prompt_path, "r", encoding='utf-8') as file:
                    prompt = file.read().strip()
            except FileNotFoundError:
                raise FileNotFoundError(
                    f"Prompt file '{file_name}' not found in prompts directory: {self.prompts_dir}"
                ) from None
            # Store in cache if caching is enabled
            if use_cache:
                self._prompt_cache[file_name] = prompt

        # Inject dynamic values if provided.
        # Note: str.format treats every { and } as placeholder syntax, so literal
        # braces in a prompt file (e.g. a JSON template) must be doubled: {{ }}.
        if kwargs:
            try:
                prompt = prompt.format(**kwargs)
            except KeyError as e:
                raise KeyError(f"Missing required variable in prompt: {e}") from e
        return prompt

    def clear_cache(self):
        """Clear the prompt cache."""
        self._prompt_cache.clear()

    def list_prompts(self) -> list:
        """
        List all available prompt files.

        Returns:
            list: List of prompt file names
        """
        return [f.name for f in self.prompts_dir.glob("*.txt")]

    def reload_prompt(self, file_name: str) -> str:
        """
        Force reload a prompt from disk and refresh the cached copy.

        Args:
            file_name (str): Name of the prompt file to reload

        Returns:
            str: The reloaded prompt content
        """
        # Drop the stale entry so later cached loads see the fresh text
        self._prompt_cache.pop(file_name, None)
        return self.load_prompt(file_name)


# Create a singleton instance
prompt_loader = PromptLoader()
```
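One thing to watch with the loader's `prompt.format(**kwargs)` call: `str.format` treats every `{` and `}` as placeholder syntax, so a prompt file that contains a literal JSON template (like the synthetic user prompt above) will raise as soon as any variable is injected. A minimal sketch of one way around this, using `$name` placeholders via the standard library's `string.Template`. The prompt texts and the helper names `render_prompt` and `format_fails` are hypothetical, for illustration only.

```python
from string import Template

# Hypothetical prompt text: a $region placeholder next to literal JSON braces.
PROMPT = 'Generate an athlete from $region.\nOutput JSON like: {"profile": {"name": ""}}'

# The same prompt written for str.format, which chokes on the JSON braces.
FORMAT_STYLE_PROMPT = 'Generate an athlete from {region}.\nOutput JSON like: {"profile": {"name": ""}}'


def render_prompt(text: str, **values: str) -> str:
    """Fill $name placeholders; raises KeyError if a placeholder has no value."""
    return Template(text).substitute(**values)


def format_fails(text: str, **values: str) -> bool:
    """Return True if str.format cannot render the text with these values."""
    try:
        text.format(**values)
    except (KeyError, ValueError, IndexError):
        return True
    return False


print(render_prompt(PROMPT, region="Kenya"))
print("str.format fails:", format_fails(FORMAT_STYLE_PROMPT, region="Kenya"))
```

The other option is to keep `str.format` and double every literal brace in the prompt file (`{{` and `}}`), which works but makes JSON templates harder to read and edit.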
Where It All Fell Apart: The Cost of AI-to-AI Testing
The idea was simple: the AI coach would interact with the AI-generated user, collecting essential training information in a structured conversation. But as soon as I ran a basic simulation, I saw a major problem.
A single conversation cost me $0.17.
One short session between the AI fitness coach and the synthetic user had already racked up a noticeable bill. And this wasn't even a full onboarding session, just a test with minimal responses.
Imagine running this for hundreds or thousands of users. The costs would explode. What was supposed to be an efficient automation quickly turned into a financial drain.
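To put numbers on that, here is a back-of-the-envelope projection. It assumes every conversation costs the same $0.17 as the test run, which is probably optimistic, since a full onboarding session would be longer. The function name `projected_cost` is my own.

```python
from decimal import Decimal

COST_PER_CONVERSATION = Decimal("0.17")  # observed cost of the single test run


def projected_cost(conversations: int) -> Decimal:
    """Linear projection: assumes each conversation costs the same as the test run."""
    return COST_PER_CONVERSATION * conversations


for n in (100, 1_000, 10_000):
    print(f"{n:>6} conversations: ${projected_cost(n):,.2f}")
```

That works out to about $17 for 100 synthetic users, $170 for 1,000, and $1,700 for 10,000, and that is for a single test pass over each user.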
Here’s an example of the kind of conversation that was generating costs:
AI Coach: Hello! I’m your AI fitness coach. How can I help you today?
Synthetic User: I need to take a break.
AI Coach: Of course, Rajeev! Rest is as important as working out. Whenever you’re ready to continue, just let me know.
(Repeat… 10 times.)
Each message was costing tokens, and the AI was wasting money with redundant phrasing. This quickly became unsustainable.
The full transcript:
AI Coach: Hello! I'm your AI fitness coach. How can I help you today?
Synthetic User: I need to take a break.
AI Coach: Of course, Rajeev! Rest is as important as working out. Whenever you're ready to continue, just let me know. I'm here to help with your fitness journey.
Synthetic User: I need to take a break.
AI Coach: Absolutely, Rajeev. Take all the time you need. I'll be here when you're ready to continue. Remember, taking care of your well-being is of the utmost importance.
Synthetic User: I need to take a break.
AI Coach: Sure thing, Rajeev. We can always continue when you're ready. Remember, part of a healthy fitness routine is knowing when to take breaks. Don't hesitate to reach out whenever you feel up to it.
Synthetic User: I need to take a break.
AI Coach: No problem at all, Rajeev. It's essential to listen to your body and take breaks when needed. When you're ready to continue, I'll be here to assist you. Take care!
Synthetic User: I need to take a break.
AI Coach: Of course, Rajeev. Take the time you need and return when you're ready. Fitness is about balance and it's important to rest as well. I'll be here when you're ready to continue.
Synthetic User: I need to take a break.
AI Coach: Absolutely, Rajeev. Take all the time you need. Your well-being is the top priority. We can continue whenever you're ready. Rest up!
Synthetic User: I need to take a break.
AI Coach: Sure, Rajeev. It's important to rest when you feel the need. Take your time and when you're ready to continue, I'll be here to assist you.
Synthetic User: I need to take a break.
AI Coach: No problem at all, Rajeev. Rest is an essential part of any fitness journey. I'm here whenever you're ready to continue. Please take all the time you need.
Synthetic User: I need to take a break.
AI Coach: Of course, Rajeev. Taking breaks is crucial to maintain your overall health and well-being. Feel free to return whenever you're ready. I'm here for you.
Disconnecting…
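A loop like this one is easy to spot mechanically, so a simulation harness could stop paying for it after a few turns. The sketch below is an assumption about how such a guard might look, not something from the original test setup. The name `is_stuck` and the default window of three messages are arbitrary choices.

```python
def is_stuck(user_messages: list[str], window: int = 3) -> bool:
    """True if the last `window` user messages are identical after normalizing."""
    if len(user_messages) < window:
        return False
    # Ignore case and surrounding whitespace when comparing messages.
    recent = [m.strip().lower() for m in user_messages[-window:]]
    return len(set(recent)) == 1


history = ["I need to take a break."] * 3
if is_stuck(history):
    print("Synthetic user is repeating itself; ending the simulation early.")
```

With a window of three, the run above would have ended after the third "I need to take a break." instead of the tenth.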
Why This Doesn’t Scale & What Can Be Done
After realizing that AI-to-AI simulations weren’t financially viable, I started breaking down why this happened and what can be done to optimize costs:
1. Token Optimization is Crucial
- AI-generated responses tend to be verbose. Cutting down on unnecessary text can significantly reduce token usage.
- Custom system prompts should enforce brevity to avoid wasteful responses.
- AI should skip repetitive confirmations unless necessary.
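Verbosity hurts more than it first appears if the simulation keeps context by resending the whole conversation on every call. That is how stateless chat APIs typically work, but it is an assumption here, since the original harness isn't shown. Under that assumption, every reply is billed again as input on each later turn. A rough model, with made-up per-message token counts and a hypothetical function name:

```python
def total_prompt_tokens(turn_tokens: list[int], system_tokens: int = 0) -> int:
    """Total input tokens billed when each call resends the system prompt
    plus every earlier message. turn_tokens[k] is the size of message k."""
    total = 0
    history = 0
    for tokens in turn_tokens:
        total += system_tokens + history  # prompt for this call: everything so far
        history += tokens                 # the new message joins the history
    return total


# Illustrative numbers only: 40-token messages, no system prompt.
print(total_prompt_tokens([40] * 10))  # 10 messages
print(total_prompt_tokens([40] * 20))  # 20 messages
```

In this model, going from 10 to 20 messages roughly quadruples the input tokens (1,800 versus 7,600), while halving the size of each message only halves them. So trimming the number of turns matters at least as much as trimming each reply.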
2. Use AI Sparingly & Cache Responses
- Instead of generating synthetic users every time, pre-generate a dataset and store it in a database.
- Reuse common AI responses where possible instead of paying for regeneration.
- AI should only be invoked for dynamic interactions that require real-time personalization.
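A minimal sketch of the pre-generate-and-store idea, using SQLite from the standard library. The table layout, the `profile_id` key, and the `generate` callable (a stand-in for whatever function actually calls the model) are all assumptions for illustration.

```python
import json
import sqlite3
from typing import Callable


def get_or_create_profile(
    conn: sqlite3.Connection, profile_id: str, generate: Callable[[], dict]
) -> dict:
    """Return a stored profile if one exists; otherwise generate and store it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles (profile_id TEXT PRIMARY KEY, body TEXT)"
    )
    row = conn.execute(
        "SELECT body FROM profiles WHERE profile_id = ?", (profile_id,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # cache hit: no model call, no token cost

    profile = generate()  # cache miss: pay for exactly one generation
    conn.execute(
        "INSERT INTO profiles (profile_id, body) VALUES (?, ?)",
        (profile_id, json.dumps(profile)),
    )
    conn.commit()
    return profile
```

Calling this twice with the same `profile_id` invokes `generate` only once, so a fixed set of synthetic users can be reused across as many test runs as needed.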
3. Consider Alternative AI Providers
- GPT-3.5-turbo is significantly cheaper than GPT-4 and might work for many use cases.
- Anthropic’s Claude could offer different cost structures for AI interactions.
- Mixtral and other open-source LLMs, run on your own hardware, could dramatically cut API costs.
4. Batch Processing Instead of Live Interactions
- Instead of running a real-time AI-to-AI simulation, I could generate synthetic user responses in bulk and analyze them offline.
- Then, AI coaching responses could be tested in smaller, controlled experiments, reducing unnecessary token spending.
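As a rough sketch of what the offline step might look like: replay a stored batch of synthetic user messages through the coach once each, with no multi-turn loop, and record how wordy each reply is. The `coach` callable is a stand-in for the real model call, and the 25-word budget is an arbitrary assumption.

```python
from typing import Callable


def replay_offline(
    user_messages: list[str], coach: Callable[[str], str], max_words: int = 25
) -> list[dict]:
    """Run each stored user message through `coach` once and flag verbose replies."""
    report = []
    for message in user_messages:
        reply = coach(message)
        words = len(reply.split())
        report.append(
            {"message": message, "words": words, "too_long": words > max_words}
        )
    return report


def stub_coach(message: str) -> str:
    """Placeholder coach so the sketch runs without any API call."""
    return "Of course. Take the time you need."


for row in replay_offline(["I need to take a break."], stub_coach):
    print(row)
```

Because each stored message is sent exactly once, the cost of a run like this grows with the size of the batch rather than with the length of a live conversation.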
Final Thoughts: Is AI-to-AI Testing Worth It?
AI-to-AI interactions seem like a brilliant concept in theory, but when it comes to real-world implementation, costs add up fast. If I had continued my tests at full scale, the expenses would have been completely unsustainable.
However, this doesn’t mean AI-to-AI testing is useless—it just requires better efficiency strategies:
- Optimize prompts to minimize token usage.
- Cache and reuse AI-generated content instead of regenerating everything.
- Explore cheaper AI models or self-hosted solutions to cut costs.
- Limit AI use to truly necessary interactions.
For now, I’m putting my AI-to-AI testing experiments on pause. It’s a fascinating concept, but until AI costs drop, it’s an expensive lesson in what doesn’t scale.
Would I try it again? Maybe—if I find a way to make it more cost-effective. Until then, I’m focusing on maximizing efficiency with AI rather than blindly assuming automation equals savings.
If you’ve ever tried running AI-to-AI simulations, I’d love to hear your experiences. Did you run into the same cost issues? Or did you find a way to make it work sustainably? Let’s talk in the comments!