Does GPT-4o Know Company Financials?
A Look at True vs. Predicted Values

Introduction
Language models like GPT-4o have shown remarkable capabilities in generating human-like text, writing code, and engaging with complex topics—financial discussions included. At GPT Analyst, we’ve developed a platform to seamlessly integrate financial data into GPT prompts. This allows users to leverage reliable context directly in their analyses. However, it raises an important question: how does GPT’s memorized knowledge interact with the context provided in prompts, particularly when recalling or estimating historical financial metrics of major publicly listed companies? In this article, we put GPT-4o to the test, comparing its predictions to actual, publicly available historical financials for some well-known technology giants.
What We Tested
We focused on several prominent companies—such as Microsoft (MSFT), Apple (AAPL), Google (GOOGL), and Amazon (AMZN)—alongside key financial metrics like total revenue, EBIT, net income, and research & development expenses. For each date and metric, we asked GPT-4o a simple, direct question:
What was the {metric} of {ticker} on {date}?
Only provide the number in scientific notation. Nothing else.
For example: #.##e# or #.##e-# or #.##e+#
We asked for scientific notation so that each answer could be compared numerically, without ambiguity about units or formatting. The true values came from publicly disclosed historical financial statements, available from sources such as Yahoo Finance and the SEC’s EDGAR database.
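As a rough illustration of the prompt above, the sketch below fills in the template and pulls a number out of the model's reply. The function names `build_prompt` and `parse_reply` and the regular expression are assumptions made for this example, not the code used in the experiment.

```python
import re
from typing import Optional

# The prompt wording is taken from the article; the helper names are hypothetical.
PROMPT_TEMPLATE = (
    "What was the {metric} of {ticker} on {date}?\n"
    "Only provide the number in scientific notation. Nothing else.\n"
    "For example: #.##e# or #.##e-# or #.##e+#"
)

# Matches scientific notation such as 2.12e11, 2.12e+11 or -3.4e-2.
_SCI_NOTATION = re.compile(r"[-+]?\d+(?:\.\d+)?[eE][-+]?\d+")


def build_prompt(metric: str, ticker: str, date: str) -> str:
    """Fill the template with one metric, ticker and reporting date."""
    return PROMPT_TEMPLATE.format(metric=metric, ticker=ticker, date=date)


def parse_reply(reply: str) -> Optional[float]:
    """Return the first scientific-notation number in the reply, or None."""
    match = _SCI_NOTATION.search(reply)
    return float(match.group(0)) if match else None
```

Returning `None` for a reply that contains no number keeps refusals and malformed answers separate from genuine numeric predictions.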
How We Collected the Data
For this experiment, we automated the querying process. For each metric and fiscal reporting date, we prompted GPT-4o and recorded its response. We then compared these responses to the known historical figures. By systematically repeating this across multiple data points, we built a dataset of predictions and actual values, allowing us to measure accuracy and spot patterns.
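The collection loop described above might look something like the following sketch. Here `ask_model` is a hypothetical callable that sends the prompt for one (ticker, metric, date) combination and returns the parsed number, or `None` if the reply could not be parsed. The row layout is likewise an assumption for illustration.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, str, str]  # (ticker, metric, date)


def collect_predictions(
    ask_model: Callable[[str, str, str], Optional[float]],
    tickers: Iterable[str],
    metrics: Iterable[str],
    dates: Iterable[str],
    actuals: Dict[Key, float],
) -> List[dict]:
    """Query every combination that has a known actual value."""
    rows = []
    for ticker, metric, date in product(tickers, metrics, dates):
        key = (ticker, metric, date)
        if key not in actuals:
            continue  # no ground truth to compare against
        rows.append(
            {
                "ticker": ticker,
                "metric": metric,
                "date": date,
                "predicted": ask_model(ticker, metric, date),
                "actual": actuals[key],
            }
        )
    return rows
```

Passing the model call in as a parameter means the loop can be exercised with a stub before spending any API calls.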
We tested 324 data points in total and removed 51 extreme outliers, defined as cases where GPT-4o’s prediction exceeded the actual value by more than tenfold. Filtering these out lets the statistics describe predictions that look reasonable and could be mistaken for correct values.
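The tenfold rule can be sketched as a simple filter. Comparing absolute values is an assumption added here so that negative figures (for example a net loss) are handled. The article itself only states the tenfold threshold, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[float, float]  # (predicted, actual)


def is_extreme_outlier(predicted: float, actual: float, factor: float = 10.0) -> bool:
    """True when the prediction's magnitude is more than `factor` times the actual."""
    return abs(predicted) > factor * abs(actual)


def split_outliers(
    pairs: Iterable[Pair], factor: float = 10.0
) -> Tuple[List[Pair], List[Pair]]:
    """Separate (predicted, actual) pairs into kept values and extreme outliers."""
    kept, removed = [], []
    for predicted, actual in pairs:
        target = removed if is_extreme_outlier(predicted, actual, factor) else kept
        target.append((predicted, actual))
    return kept, removed
```

Returning the removed pairs alongside the kept ones makes it easy to report how many points were excluded.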
Surprising Accuracy
The plots demonstrate GPT's surprisingly accurate recall of historical revenue figures, with smooth approximations aligning closely to actual values in many cases. However, occasional spikes in specific years reveal areas where the model overestimates, likely due to inconsistent training data exposure. Notably, predictions for 2024, which fall outside GPT's training range, show extreme deviations, highlighting its inability to project values beyond its training scope accurately. These observations underscore GPT’s potential for approximating well-known data but also its limitations with out-of-sample or less common scenarios. A recent McKinsey report on AI in financial services highlights similar challenges in leveraging machine learning models for financial decision-making.




Accuracy Varies by Ticker
Our results showed that accuracy isn’t uniform across companies. Certain tickers showed smaller margins of error on some metrics.

Certain Metrics Proved More Predictable
GPT-4o was generally closer with commonly cited metrics like total revenue or net income. These figures are often well-known and frequently mentioned in the media and analyst reports, possibly making them more likely to appear in the model’s training data. Less frequently discussed metrics—such as research and development expenses—were harder for GPT-4o to predict accurately, as they may not be as widely reported or repeated in training data.
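One way to compare accuracy across tickers or metrics is to group the errors by that field. The sketch below uses mean absolute percentage error, which is an assumption. The article does not say which error measure was used, and the row layout and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable


def mean_abs_pct_error_by(rows: Iterable[dict], key: str) -> Dict[str, float]:
    """Average |predicted - actual| / |actual| for each value of `key`."""
    grouped = defaultdict(list)
    for row in rows:
        if row["actual"] == 0 or row["predicted"] is None:
            continue  # relative error is undefined, or there is no prediction
        error = abs(row["predicted"] - row["actual"]) / abs(row["actual"])
        grouped[row[key]].append(error)
    return {group: mean(errors) for group, errors in grouped.items()}
```

Calling it with `key="metric"` or `key="ticker"` gives the per-metric and per-ticker views discussed above.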

Limitations
GPT-4o relies on training data that may be outdated or incomplete, which can lead to inaccuracies, particularly for recent or less-discussed metrics. This is a known limitation of AI models, as discussed in OpenAI’s approach to data and AI. Understanding these constraints is crucial for using GPT effectively in financial research.
Conclusion
GPT-4o demonstrates potential as a supplemental tool for financial research, offering context and trend analysis. However, its reliance on incomplete training data poses risks, particularly for forecasting or decision-making. Biases may arise from wrongly memorized values or hallucinated data, leading to flawed reasoning—underscoring the importance of reliable context in prompts. Platforms like GPT Analyst can address these challenges by integrating trusted financial data directly into GPT workflows. While GPT can assist with qualitative insights and scenario modeling, it should not be solely relied upon for precise forecasting or quantitative analysis. For critical tasks, validate with trusted data sources and use GPT strategically, pairing it with reliable tools like GPT Analyst for better results. Explore GPT Analyst to see how integrated data can transform your financial research.