Analysis
This developer diary offers a fascinating glimpse into the iterative process of making AI communications feel genuinely human. By using a Large Language Model (LLM) as an automated judge, the author systematically raised their system's human-likeness score from a mediocre 4.1 to an impressive 7.7. The key discovery, that removing robotic, overly polite phrases is far more effective than adding randomized filler words, is a valuable insight for prompt engineering.
Key Takeaways
- A Large Language Model (LLM) was successfully used as an automated judge to evaluate an AI's human-like quality, stylistic uniformity, and natural timing.
- Injecting high-context cultural rules helped the AI avoid direct negations, making its language feel much more natural and contextually aware.
- Banning typical AI pleasantries (like 'Thank you for contacting us') drastically improved the human-like score, proving that subtraction often beats addition in prompt engineering.
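The diary itself does not include code, but the "subtraction" step described above could be sketched as a simple pre-send filter: flag a draft reply whenever it contains a stock AI pleasantry, rather than injecting filler to sound casual. The phrase list and function names here are hypothetical, for illustration only.

```python
# Hypothetical list of stock AI pleasantries to ban, not the author's actual list.
BANNED_PHRASES = [
    "thank you for contacting us",
    "i hope this helps",
    "as an ai language model",
]

def flag_robotic_phrases(reply: str) -> list[str]:
    """Return the banned pleasantries found in a draft reply (case-insensitive)."""
    lowered = reply.lower()
    return [phrase for phrase in BANNED_PHRASES if phrase in lowered]

def is_human_like(reply: str) -> bool:
    """A draft passes this check only if it contains none of the stock phrases."""
    return not flag_robotic_phrases(reply)
```

A failing draft would then be regenerated or edited before sending, which mirrors the diary's finding that stripping robotic phrasing moves the human-likeness score more than adding anything does.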
Reference / Citation
"Human-like quality can sometimes be improved more by 'what you stop doing' rather than 'what you add'. Addition by subtraction. This was the biggest discovery of this time."