An XGBoost model trained on 78,000+ ATP matches with ELO ratings, clay-specific stats, and Monte Carlo simulation
Based on 10,000 tournament simulations. The defending champion, Carlos Alcaraz, is the clear frontrunner after winning the 2025 French Open.
Yes. Sinner has the highest overall ELO (~2,219) in tennis right now. But this is the French Open, and it is played on clay. On clay-specific ELO he drops to 1,870, and his blended score (1,810) still trails Alcaraz's (1,855). That said, his clay game is seriously improving: at the 2025 French Open he beat Djokovic in the semis and pushed Alcaraz to a five-set final. That blended rating and an 85% clay win rate make him a firm second favourite.
The GOAT only has a 4.5% chance? Here's why. At the 2025 French Open, Djokovic lost to Sinner in the semis on clay. At 38 years old, the model sees him as a declining force on this surface. His blended ELO is 1,701 and his clay win rate has fallen to 75%. He's still dangerous — but the data says the torch has been passed, at least on clay.
Lorenzo Musetti surprised everyone at the 2025 French Open, beating Tiafoe in the quarter-finals to reach the semis. His blended ELO of 1,692 and 79% clay win rate make him a legitimate threat and the fourth most likely champion. He has beaten Djokovic, Alcaraz, and Sinner on clay at some point. Don't sleep on him.
The model uses a blended ELO — 50% overall rating + 50% clay-specific rating. This is why rankings shift so much on clay compared to overall ATP standings.
| Player | Overall ELO | Clay ELO | Blended | Clay W% |
|---|---|---|---|---|
| Carlos Alcaraz | 1,919 | 1,791 | 1,855 | 96% |
| Jannik Sinner | 2,219 | 1,870 | 1,810 | 85% |
| Novak Djokovic | 2,107 | 2,049 | 1,701 | 75% |
| Lorenzo Musetti | — | — | 1,692 | 79% |
| Alexander Zverev | 1,993 | 1,939 | 1,691 | 72% |
| Rafael Nadal | 1,874 | 2,000 | 1,937 | 90% |
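The 50/50 blend described above can be written as a small helper. This is only a sketch: the function name `blended_elo` and the `clay_weight` parameter are made up for illustration, and the project's real code may compute the blend differently.

```python
def blended_elo(overall, clay, clay_weight=0.5):
    """Weighted average of an overall rating and a clay-only rating.

    clay_weight=0.5 gives the 50% overall + 50% clay blend from the text.
    """
    return (1.0 - clay_weight) * overall + clay_weight * clay


# Checked against two rows of the table above.
print(blended_elo(1919, 1791))  # Alcaraz: 1855.0
print(blended_elo(1874, 2000))  # Nadal: 1937.0
```

Keeping the weight as a parameter makes it easy to try a blend that leans harder on clay results and see how the order changes.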
This is the deterministic bracket — always picking the player with the higher win probability. The percentage shows the AI's confidence in the winner.
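Here is a rough sketch of how a deterministic bracket like this can be filled in. Everything in it is assumed for illustration: the players and ratings are invented, the draw size must be a power of two, and `elo_win_prob` is the standard Elo curve standing in for the real model's predicted probability.

```python
def elo_win_prob(rating_a, rating_b):
    """Standard Elo logistic curve (a stand-in for the trained model)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def deterministic_bracket(draw, ratings):
    """Advance the favourite in every match.

    Returns one list per round of (winner, confidence) pairs.
    Assumes len(draw) is a power of two.
    """
    rounds = []
    field = list(draw)
    while len(field) > 1:
        results = []
        for i in range(0, len(field), 2):
            a, b = field[i], field[i + 1]
            p_a = elo_win_prob(ratings[a], ratings[b])
            # Always pick whoever has the higher win probability.
            winner, conf = (a, p_a) if p_a >= 0.5 else (b, 1.0 - p_a)
            results.append((winner, conf))
        rounds.append(results)
        field = [winner for winner, _ in results]
    return rounds


# Hypothetical four-player draw with made-up ratings.
ratings = {"Player A": 1900, "Player B": 1800, "Player C": 1700, "Player D": 1600}
draw = ["Player A", "Player D", "Player B", "Player C"]
rounds = deterministic_bracket(draw, ratings)
for n, results in enumerate(rounds, start=1):
    for winner, conf in results:
        print(f"Round {n}: {winner} advances ({conf:.0%})")
```

Because the favourite always advances, this bracket never shows an upset, which is why it is paired with the random simulations.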
I used Jeff Sackmann's open-source ATP dataset — 78,000+ matches spanning 25+ years, with data through January 2026 including the 2025 French Open results. Every ace, every double fault, every win and loss. I verified the data by checking Nadal's French Open record match-by-match.
ELO works like a video game ranking. Every player starts at 1,500. Win and your score goes up, lose and it goes down. Beat someone way better? You get more points. I made separate ELO scores for each surface (clay, grass, and hard court) because some players are monsters on clay but average on grass.
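A minimal sketch of that rating update, kept separately per surface. The K-factor of 32 and the 400-point scale are the conventional Elo defaults, assumed here because the text does not say which values the project uses. The names `record_match` and `expected_score` are made up for this example.

```python
from collections import defaultdict

START_RATING = 1500  # every player starts here, as described above
K = 32               # assumed K-factor: how fast ratings move


def expected_score(rating_a, rating_b):
    """Chance that A beats B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def record_match(ratings, winner, loser, surface):
    """Update both players' ratings on one surface; return points gained."""
    r_w = ratings[(winner, surface)]
    r_l = ratings[(loser, surface)]
    # The less expected the win, the bigger the gain.
    gain = K * (1.0 - expected_score(r_w, r_l))
    ratings[(winner, surface)] = r_w + gain
    ratings[(loser, surface)] = r_l - gain
    return gain


# Ratings are keyed by (player, surface), so clay and grass never mix.
ratings = defaultdict(lambda: START_RATING)
even_gain = record_match(ratings, "Player A", "Player B", "clay")
upset_gain = record_match(ratings, "Player B", "Player A", "clay")
print(even_gain)                          # 16.0 for two equally rated players
print(upset_gain > even_gain)             # True: beating a higher-rated player pays more
print(ratings[("Player A", "grass")])     # 1500, untouched by clay results
```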
XGBoost is a machine learning algorithm that builds decision trees one after another, 200 of them in this model. Each tree fixes the mistakes of the previous ones. It's called "gradient boosting", and it's the same technique that wins many data science competitions. I fed it 13 features including blended ELO, recent form, head-to-head records, and clay win rates.
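To show the "each tree fixes the previous mistakes" idea, here is a toy boosting loop in plain Python. It is not XGBoost: it uses one feature, one-split trees (stumps), and squared error, while the real library is far more sophisticated. The data (ELO difference against win/loss) is invented, and the function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def boost(xs, ys, n_trees=200, learning_rate=0.1):
    """Fit stumps one after another, each on what is still unexplained."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    trees = []
    for _ in range(n_trees):
        # Residuals are the current mistakes; the next tree learns them.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)


# Made-up training data: ELO difference and whether player 1 won (1) or lost (0).
elo_diff = [-300, -200, -100, -50, 50, 100, 200, 300]
won = [0, 0, 0, 1, 0, 1, 1, 1]
model = boost(elo_diff, won)
print(round(model(300), 2), round(model(-300), 2))
```

The score for a big positive ELO difference comes out higher than for a big negative one, which is the pattern the data was built to contain.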
Instead of always picking the favourite, I run the tournament 10,000 times using weighted coin flips. If Sinner has a 60% chance of beating Zverev, he wins 60% of the simulated matches — but not always. After 10,000 tournaments, I count how often each player wins to get championship probabilities.
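The weighted coin flips can be sketched like this. The players and ratings are invented, the draw size must be a power of two, and `elo_win_prob` is the standard Elo curve used as a stand-in; in the project the match probabilities come from the trained model instead.

```python
import random
from collections import Counter


def elo_win_prob(rating_a, rating_b):
    """Standard Elo logistic curve (a stand-in for the trained model)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def simulate_tournament(draw, ratings, rng):
    """Play one knockout tournament with weighted coin flips."""
    field = list(draw)
    while len(field) > 1:
        next_round = []
        for i in range(0, len(field), 2):
            a, b = field[i], field[i + 1]
            p_a = elo_win_prob(ratings[a], ratings[b])
            # rng.random() is uniform in [0, 1), so A advances with probability p_a.
            next_round.append(a if rng.random() < p_a else b)
        field = next_round
    return field[0]


def championship_odds(draw, ratings, n_sims=10_000, seed=0):
    """Share of simulated tournaments won by each player."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    titles = Counter(simulate_tournament(draw, ratings, rng) for _ in range(n_sims))
    return {player: titles[player] / n_sims for player in draw}


# Hypothetical four-player draw with made-up ratings.
ratings = {"Player A": 1900, "Player B": 1800, "Player C": 1700, "Player D": 1600}
draw = ["Player A", "Player D", "Player B", "Player C"]
odds = championship_odds(draw, ratings)
for player, share in sorted(odds.items(), key=lambda kv: -kv[1]):
    print(f"{player}: {share:.1%}")
```

Unlike the deterministic bracket, lower-rated players do win some of the simulated tournaments, so every player ends up with a title percentage rather than a simple yes or no.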
The model figured out on its own that blended ELO difference (the gap between two players' combined overall + clay ratings) is by far the best predictor. This makes sense — if you're way better than your opponent overall AND on clay, you're probably going to win.
This project was inspired by Green Code on YouTube, who built a similar model and beat IBM's official Wimbledon predictions (66.3% vs 63.8%). He used XGBoost with ELO ratings and correctly predicted Jannik Sinner as the 2025 Australian Open champion, with 85.3% accuracy across 116 matches. I followed his approach, starting with decision trees and random forests and then upgrading to XGBoost, and added clay-specific ELO and Monte Carlo simulation on top.
The code for this project is written in Python using pandas, scikit-learn, and XGBoost. The model trains on data from Jeff Sackmann's open-source ATP dataset.
This model gets about 67% accuracy on test data. That's better than a coin flip but far from perfect. The real French Open draw isn't out yet — these predictions use estimated seedings. Tennis is unpredictable. Upsets happen. Don't bet your life savings on a 13-year-old's AI model. (But if it works, you owe me ice cream.)