Ask HN: How ChatGPT Serves 700M Users

Published: Aug 8, 2025 19:27
1 min read
Hacker News

Analysis

The article poses a question about the engineering challenges of scaling a large language model (LLM) like ChatGPT to serve a massive user base. It highlights the gap between the computational resources required to run such a model locally and OpenAI's ability to serve hundreds of millions of users. The core of the inquiry concerns the specific techniques and optimizations that achieve this scale while maintaining acceptable latency. The article implicitly acknowledges the use of GPU clusters but seeks to understand the more nuanced aspects of the system's architecture and operation.
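The source does not name the techniques OpenAI actually uses, but one widely cited serving optimization is request batching: grouping many concurrent user prompts into a single forward pass so GPU cost is amortized across users. Below is a minimal, hypothetical sketch of the idea; `Request`, `batch_requests`, `serve`, and the toy `echo_model` are illustrative names, not anything from OpenAI's stack.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Request:
    prompt: str


def batch_requests(queue: List[Request], max_batch_size: int) -> List[List[Request]]:
    """Group pending requests into batches of at most max_batch_size,
    so one model invocation can serve many users at once."""
    batches = []
    while queue:
        batch, queue = queue[:max_batch_size], queue[max_batch_size:]
        batches.append(batch)
    return batches


def serve(batches: List[List[Request]], model: Callable[[List[str]], List[str]]) -> List[str]:
    """Run one 'forward pass' per batch instead of one per request."""
    results = []
    for batch in batches:
        results.extend(model([r.prompt for r in batch]))
    return results


if __name__ == "__main__":
    queue = [Request(f"q{i}") for i in range(10)]
    batches = batch_requests(queue, max_batch_size=4)
    # Stand-in for an LLM: just uppercases each prompt.
    echo_model = lambda prompts: [p.upper() for p in prompts]
    print(len(batches), serve(batches, echo_model)[:3])
```

Real serving systems go further (continuous batching, KV-cache paging, speculative decoding), but the core trade-off sketched here, batch size versus per-request latency, is the kind of nuance the question is asking about.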

Reference

The article quotes the user's observation that they cannot run a GPT-4-class model locally, and then asks what engineering tricks OpenAI uses to serve such a model at scale.