🎾

I Trained AI to Predict the French Open

An XGBoost model trained on 78,000+ ATP matches with ELO ratings, clay-specific stats, and Monte Carlo simulation

🏆

The AI's Clear Favourite

29.2%
Carlos Alcaraz

Based on 10,000 tournament simulations. The defending champion is the clear frontrunner after winning the 2025 French Open.

Jannik Sinner 13.3%
Alexander Zverev 7.0%
Lorenzo Musetti 6.0%
Alexander Bublik 4.8%
Novak Djokovic 4.5%
1

Championship Probabilities

Carlos Alcaraz 29.2%
Jannik Sinner 13.3%
Alexander Zverev 7.0%
Lorenzo Musetti 6.0%
Alexander Bublik 4.8%
Novak Djokovic 4.5%
Alex De Minaur 3.5%
Jack Draper 3.0%
Casper Ruud 2.5%
Everyone Else ~26%

Wait — Isn't Sinner the World #1?

Yes. Sinner has the highest overall ELO (~2,219) in tennis right now. But this is the French Open, and it is played on clay. On the blended rating the model uses, Sinner (1,810) still trails Alcaraz (1,855). That said, his clay game is seriously improving: at the 2025 French Open he beat Djokovic in the semis and pushed Alcaraz to a five-set final. That blended rating and his 85% clay win rate make him a firm second favourite.

What About Djokovic?

The GOAT only has a 4.5% chance? Here's why. At the 2025 French Open, Djokovic lost to Sinner in the semis on clay. At 38 years old, the model sees him as a declining force on this surface. His blended ELO is 1,701 and his clay win rate has fallen to 75%. He's still dangerous — but the data says the torch has been passed, at least on clay.

Dark Horse: Lorenzo Musetti (6.0%)

The Italian surprised everyone at the 2025 French Open, beating Tiafoe in the quarter-finals to reach the semis. His blended ELO of 1,692 and 79% clay win rate make him a legitimate threat — the fourth most likely champion. He's beaten Djokovic, Alcaraz, and Sinner all on clay at some point. Don't sleep on him.

2

Why Clay Changes Everything

The model uses a blended ELO — 50% overall rating + 50% clay-specific rating. This is why rankings shift so much on clay compared to overall ATP standings.

Player             Overall ELO   Clay ELO   Blended   Clay W%
Carlos Alcaraz     1,919         1,791      1,855     96%
Jannik Sinner      2,219         1,870      1,810     85%
Novak Djokovic     2,107         2,049      1,701     75%
Lorenzo Musetti    -             -          1,692     79%
Alexander Zverev   1,993         1,939      1,691     72%
Rafael Nadal       1,874         2,000      1,937     90%
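
As a quick check of the 50/50 blend, here is a minimal Python sketch. The function name `blended_elo` and the `clay_weight` parameter are illustrative names, not necessarily the ones used in the project. The inputs are the Alcaraz and Nadal rows from the table.

```python
def blended_elo(overall, clay, clay_weight=0.5):
    """Mix overall and clay ELO; clay_weight=0.5 gives the 50/50 blend."""
    return (1 - clay_weight) * overall + clay_weight * clay


# Ratings taken from the Alcaraz and Nadal rows of the table.
alcaraz = blended_elo(1919, 1791)
nadal = blended_elo(1874, 2000)
print(round(alcaraz), round(nadal))  # 1855 1937
```

Keeping the weight as a parameter makes it easy to try other mixes, such as leaning more heavily on the clay rating.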
3

Predicted Bracket

This is the deterministic bracket — always picking the player with the higher win probability. The percentage shows the AI's confidence in the winner.

Quarter-Finals

Carlos Alcaraz def. Alex De Minaur 75%
Lorenzo Musetti def. Alexander Bublik 52%
Alexander Zverev def. Novak Djokovic 55%
Jannik Sinner def. Jack Draper 64%

Semi-Finals

Carlos Alcaraz def. Lorenzo Musetti 72%
Alexander Zverev def. Jannik Sinner 52%

🏆 Final

Carlos Alcaraz def. Alexander Zverev 72%
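
The "always pick the favourite" rule can be sketched in a few lines of Python. This is only an illustration: the `QUOTED` lookup table stands in for the real model and holds just the seven probabilities listed in the bracket, and the function names are hypothetical.

```python
# Win probabilities quoted in the bracket: (winner, loser) -> P(winner wins).
QUOTED = {
    ("Carlos Alcaraz", "Alex De Minaur"): 0.75,
    ("Lorenzo Musetti", "Alexander Bublik"): 0.52,
    ("Alexander Zverev", "Novak Djokovic"): 0.55,
    ("Jannik Sinner", "Jack Draper"): 0.64,
    ("Carlos Alcaraz", "Lorenzo Musetti"): 0.72,
    ("Alexander Zverev", "Jannik Sinner"): 0.52,
    ("Carlos Alcaraz", "Alexander Zverev"): 0.72,
}


def win_prob(a, b):
    """P(a beats b), looked up from the quoted matchups (stand-in for the model)."""
    if (a, b) in QUOTED:
        return QUOTED[(a, b)]
    return 1.0 - QUOTED[(b, a)]


def play_round(players):
    """Pair neighbours in draw order and advance the likelier winner of each pair."""
    return [a if win_prob(a, b) >= 0.5 else b
            for a, b in zip(players[0::2], players[1::2])]


def predict_bracket(players):
    """Keep playing rounds until one player is left."""
    while len(players) > 1:
        players = play_round(players)
    return players[0]


quarter_finalists = [
    "Carlos Alcaraz", "Alex De Minaur",
    "Lorenzo Musetti", "Alexander Bublik",
    "Alexander Zverev", "Novak Djokovic",
    "Jannik Sinner", "Jack Draper",
]
print(predict_bracket(quarter_finalists))  # Carlos Alcaraz
```

Because the favourite always advances, this gives the same bracket every time it runs.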
4

How I Built This

📊

The Data

I used Jeff Sackmann's open-source ATP dataset — 78,000+ matches spanning 25+ years, with data through January 2026 including the 2025 French Open results. Every ace, every double fault, every win and loss. I verified the data by checking Nadal's French Open record match-by-match.

📈

ELO Ratings

Like a video game ranking. Every player starts at 1,500. Win and your score goes up, lose and it goes down. Beat someone way better? You get more points. I made separate ELO scores for each surface — clay, grass, and hard court — because some players are monsters on clay but average on grass.
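
Here is a rough Python sketch of how an ELO update like this can work. It uses the standard Elo formula with a step size of K = 32, which is an assumption (the exact value used in this project isn't stated). The names `expected_score`, `update`, and `record_match` are illustrative.

```python
from collections import defaultdict

START_RATING = 1500.0
K = 32  # assumed step size; the value used in the project is not stated


def expected_score(rating_a, rating_b):
    """Standard Elo estimate of the chance that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(winner_rating, loser_rating, k=K):
    """Return (new_winner_rating, new_loser_rating) after one match."""
    gain = k * (1.0 - expected_score(winner_rating, loser_rating))
    return winner_rating + gain, loser_rating - gain


# One rating per (player, surface); everyone starts at 1,500.
ratings = defaultdict(lambda: START_RATING)


def record_match(winner, loser, surface):
    """Update only the ratings for the surface the match was played on."""
    w, l = (winner, surface), (loser, surface)
    ratings[w], ratings[l] = update(ratings[w], ratings[l])


record_match("Player A", "Player B", "clay")
print(ratings[("Player A", "clay")], ratings[("Player B", "clay")])  # 1516.0 1484.0
print(ratings[("Player A", "grass")])  # 1500.0
```

The `1.0 - expected_score(...)` term is what rewards upsets: the less likely the win was, the bigger the gain.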

🤖

XGBoost Model

A machine learning algorithm that builds 200 decision trees, one after another. Each tree tries to fix the mistakes of the ones before it. This is called "gradient boosting", and it's a technique behind many winning entries in data science competitions. I fed it 13 features including blended ELO, recent form, head-to-head records, and clay win rates.
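
To show the "each tree fixes the previous mistakes" idea, here is a toy gradient-boosting sketch in plain Python. It is not XGBoost and not the project's code: it uses one-split trees ("stumps"), a single feature, squared-error residuals, and made-up data, and all the names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Pick the one-split tree (threshold + two leaf values) with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest x would leave the right leaf empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def stump_output(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_trees=200, learning_rate=0.1):
    """Return a predict(x) function built from n_trees stumps fitted one after another."""
    base = sum(ys) / len(ys)  # start from the average outcome
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]  # the current mistakes
        stump = fit_stump(xs, residuals)                # the next tree targets those mistakes
        stumps.append(stump)
        preds = [p + learning_rate * stump_output(stump, x) for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(stump_output(s, x) for s in stumps)

    return predict


# Made-up toy data: blended ELO difference -> 1 if the first player won, else 0.
elo_diffs = [-300, -200, -100, -50, 50, 100, 200, 300]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

model = boost(elo_diffs, outcomes)
print(model(-300), model(300))  # low score for the big underdog, high for the big favourite
```

The real library adds a lot on top of this (deeper trees, many features, regularisation), but the loop of "measure the mistakes, fit a small tree to them, add it in" is the core idea.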

🎲

Monte Carlo Simulation

Instead of always picking the favourite, I run the tournament 10,000 times using weighted coin flips. If Sinner has a 60% chance of beating Zverev, he wins about 60% of their simulated matches, not all of them. After 10,000 tournaments, I count how often each player wins to get championship probabilities.
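
A minimal Python sketch of the weighted-coin-flip idea follows. Two loud assumptions: match probabilities come from the standard Elo formula on blended ratings rather than from the XGBoost model, and the draw is a hypothetical four-player bracket built from blended ratings quoted earlier on this page. The function names are illustrative.

```python
import random
from collections import Counter


def win_prob(rating_a, rating_b):
    """Elo-style chance that A beats B (a stand-in for the model's prediction)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def simulate_tournament(draw, ratings, rng):
    """Play one knockout tournament with weighted coin flips; return the champion."""
    players = list(draw)
    while len(players) > 1:
        next_round = []
        for a, b in zip(players[0::2], players[1::2]):
            p = win_prob(ratings[a], ratings[b])
            next_round.append(a if rng.random() < p else b)
        players = next_round
    return players[0]


def championship_probabilities(draw, ratings, n_sims=10_000, seed=0):
    """Run many tournaments and count how often each player wins the title."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    titles = Counter(simulate_tournament(draw, ratings, rng) for _ in range(n_sims))
    return {player: titles[player] / n_sims for player in draw}


# Hypothetical four-player draw using blended ratings quoted on this page.
ratings = {
    "Carlos Alcaraz": 1855,
    "Lorenzo Musetti": 1692,
    "Alexander Zverev": 1691,
    "Jannik Sinner": 1810,
}
draw = ["Carlos Alcaraz", "Lorenzo Musetti", "Alexander Zverev", "Jannik Sinner"]

for player, prob in championship_probabilities(draw, ratings).items():
    print(f"{player}: {prob:.1%}")
```

With only four players the percentages come out much higher than in the full 128-player draw, so these numbers are not comparable to the championship probabilities above.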

5

What the AI Thinks Matters Most

The model figured out on its own that blended ELO difference (the gap between two players' combined overall + clay ratings) is by far the best predictor. This makes sense — if you're way better than your opponent overall AND on clay, you're probably going to win.

Blended ELO Diff 53%
Overall ELO Diff 13%
Rank Difference 7%
Surface ELO Diff 5%
Age Difference 3%
Recent Form (10) 3%
Head-to-Head 2%
Clay Win Rate Diff 2%
6

Inspiration

This project was inspired by Green Code on YouTube, who built a similar model and beat IBM's official Wimbledon predictions (66.3% vs 63.8%). His model used XGBoost with ELO ratings, correctly picked Jannik Sinner to win the 2025 Australian Open, and was 85.3% accurate across 116 matches. I followed his approach (starting with decision trees and random forests, then upgrading to XGBoost) and added clay-specific ELO and Monte Carlo simulation on top.

The code for this project is written in Python using pandas, scikit-learn, and XGBoost. The model trains on data from Jeff Sackmann's open-source ATP dataset.

Disclaimer

This model gets about 67% accuracy on test data. That's better than a coin flip but far from perfect. The real French Open draw isn't out yet — these predictions use estimated seedings. Tennis is unpredictable. Upsets happen. Don't bet your life savings on a 13-year-old's AI model. (But if it works, you owe me ice cream.)