BUILDING THE BEST-VALUE SOCCER TEAM
What does it take to build a professional soccer team that wins titles? Using data on 2,500 players across Europe's top 5 leagues, we built an algorithm that selects the optimal starting lineup within a fixed budget.
PROJECT OVERVIEW:
The Premier League has seen 51 different teams since its founding in 1992, but only 7 have won the title. Why is success so rare? Building a winning team requires balancing player talent, budget, age profile, and team chemistry. Our team tackled this challenge by creating an optimization algorithm that selects the optimal 11-player starting lineup from Europe's top leagues.
5 collaborators
2,500+ players
€350M budget
Integer Programming
THE PROBLEM:
Building a successful soccer team is like solving a complex puzzle. You need to:
- Stay within your budget (we used €350M, the 7th-highest in the Premier League)
- Fill specific positions (1 goalkeeper, 4 defenders, 3 midfielders, 3 forwards)
- Balance experience and youth (at least 3 players aged 27 or older, at least 3 aged 24 or younger)
- Maximize team performance using advanced player statistics
With 2,500+ players to choose from across Europe's top 5 leagues, the number of valid lineups is far too large to compare by hand. That's where optimization algorithms come in.
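A quick count shows why. The sketch below assumes a hypothetical split of the pool by position (the actual per-position counts are not stated here) and counts the position-valid lineups before any budget or age rules are applied.

```python
from math import comb

# Hypothetical pool sizes per position; the project's actual split is not stated.
POOL = {"GK": 250, "DEF": 850, "MID": 800, "FWD": 600}
# Slots per position in a 4-3-3 with one goalkeeper.
SLOTS = {"GK": 1, "DEF": 4, "MID": 3, "FWD": 3}

def lineup_count(pool: dict[str, int], slots: dict[str, int]) -> int:
    """Number of distinct position-valid lineups, ignoring budget and age rules."""
    total = 1
    for pos, k in slots.items():
        total *= comb(pool[pos], k)  # ways to choose k players at this position
    return total

n = lineup_count(POOL, SLOTS)
print(f"{n:.2e} possible lineups")
```

Under this assumed split the count is on the order of 10^28 lineups, so any manual or brute-force review could only ever sample a tiny fraction of them.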
OUR SOLUTION:
Rather than relying on gut instinct or traditional scouting, we approached this as a data science problem. Our solution combines three key insights:
DATA-DRIVEN EVALUATION
Transform subjective player assessment into objective metrics using comprehensive performance data from Europe's top leagues.
MATHEMATICAL OPTIMIZATION
Use proven operations research techniques to find the globally optimal solution rather than settling for "good enough" manual selections.
REAL-WORLD TESTING
Validate theoretical results through simulation and compare them against actual Premier League performance to test whether the approach works.
The following sections detail exactly how we implemented each of these components and the mathematical framework that ties them together.
MATHEMATICAL FORMULATION:
To solve this systematically, we formulated it as an integer programming problem: each player becomes a yes/no (binary) decision variable, and both the objective and the constraints are linear in those variables. Given such a formulation, an integer programming solver can find a provably optimal solution.
Here's how we translated our soccer team building challenge into mathematical language:
🎯 What We Want to Maximize:
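Our goal is simple: select the 11 players that give us the highest total performance. Writing x_i = 1 if player i is selected (0 otherwise) and I_i for player i's performance index, this means:
$$\max \sum_{i} I_i \, x_i, \qquad x_i \in \{0, 1\}$$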
⚖️ What Rules We Must Follow:
We can't just pick the 11 highest-rated players; we have real-world constraints that must be satisfied. Each constraint is expressed as a linear equation or inequality:
Exactly 1 goalkeeper, 4 defenders, 3 midfielders, 3 forwards
Total cost of selected players can't exceed our budget
Team average age must be between 24 and 26 years
At least 3 experienced players (27+) and 3 young players (24 or under)
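One way to write these rules as linear constraints is sketched below. The notation is assumed for illustration: x_i ∈ {0, 1} indicates whether player i is selected, c_i is the player's cost, a_i the player's age, B = €350M the budget, and GK, DEF, MID, FWD the sets of players listed at each position. Because exactly 11 players are selected, the average-age rule can be written as a linear bound on total age.

```latex
\begin{aligned}
\sum_{i \in \mathrm{GK}} x_i &= 1, \quad
\sum_{i \in \mathrm{DEF}} x_i = 4, \quad
\sum_{i \in \mathrm{MID}} x_i = 3, \quad
\sum_{i \in \mathrm{FWD}} x_i = 3 \\
\sum_{i} c_i \, x_i &\le B \\
24 \cdot 11 \;\le\; \sum_{i} a_i \, x_i &\le 26 \cdot 11 \\
\sum_{i:\, a_i \ge 27} x_i &\ge 3, \qquad \sum_{i:\, a_i \le 24} x_i \ge 3 \\
x_i &\in \{0, 1\} \quad \text{for all } i
\end{aligned}
```

The third line is the average-age rule multiplied through by 11, so total age must fall between 264 and 286 years.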
🔧 How We Solve It:
We used the Gurobi solver, a commercial optimization engine that can handle problems with thousands of variables and constraints. It combines branch-and-bound with cutting planes to find a solution that is provably optimal.
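Gurobi's internals are far more sophisticated, but the core branch-and-bound idea fits in a few lines of pure Python. This toy version is an illustration, not the project's code: it uses a hypothetical six-player pool, a reduced quota of 1 goalkeeper, 1 defender, and 1 forward, and only the budget constraint. It branches on each player (take or skip) and prunes any branch whose optimistic bound, the current score plus the best remaining index values for each open slot, cannot beat the best lineup found so far.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Player:
    name: str
    pos: str
    index: float  # performance index I_i
    cost: float   # market value in €M

# Hypothetical toy data; not from the project's dataset.
PLAYERS = [
    Player("A", "GK", 0.60, 30), Player("B", "GK", 0.80, 60),
    Player("C", "DEF", 0.70, 40), Player("D", "DEF", 0.90, 90),
    Player("E", "FWD", 0.95, 120), Player("F", "FWD", 0.75, 50),
]
QUOTA = {"GK": 1, "DEF": 1, "FWD": 1}  # reduced formation for the toy example
BUDGET = 200.0

def bound(i, score, open_slots):
    """Optimistic estimate: fill each open slot with the best remaining players, ignoring cost."""
    est = score
    for pos, k in open_slots.items():
        remaining = sorted((p.index for p in PLAYERS[i:] if p.pos == pos), reverse=True)
        if len(remaining) < k:
            return float("-inf")  # quota can no longer be met on this branch
        est += sum(remaining[:k])
    return est

def branch_and_bound():
    best = {"score": float("-inf"), "team": []}

    def visit(i, team, score, cost, open_slots):
        if all(k == 0 for k in open_slots.values()):
            if score > best["score"]:
                best["score"], best["team"] = score, list(team)
            return
        if i == len(PLAYERS) or bound(i, score, open_slots) <= best["score"]:
            return  # prune: this branch cannot improve on the incumbent
        p = PLAYERS[i]
        # Branch 1: select player i if a slot is open and the budget allows it.
        if open_slots.get(p.pos, 0) > 0 and cost + p.cost <= BUDGET:
            open_slots[p.pos] -= 1
            team.append(p.name)
            visit(i + 1, team, score + p.index, cost + p.cost, open_slots)
            team.pop()
            open_slots[p.pos] += 1
        # Branch 2: skip player i.
        visit(i + 1, team, score, cost, open_slots)

    visit(0, [], 0.0, 0.0, dict(QUOTA))
    return best["score"], best["team"]

print(branch_and_bound())  # finds ['B', 'D', 'F'] with a total index of about 2.45
```

Real MIP solvers bound each branch with the linear-programming relaxation rather than this greedy estimate, and add cutting planes to tighten that relaxation, which is what lets them prune aggressively on much larger models.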
PLAYER INDEX METHODOLOGY:
The performance index I_player is the core of our optimization. We developed a weighted composite score that captures multiple dimensions of player value:
Mathematical Definition:
Sub-Index Components:
- I₁ (37.5%): Goals & Assists
- I₂ (12.5%): Clean Sheets (GK/DEF)
- I₃ (12.5%): Pass Accuracy
- I₄ (6.25%): Team Performance
- I₅ (6.25%): League Strength
Normalization Process:
- Position-specific scaling
- Min-max normalization [0,1]
- League competition adjustment
- Age performance curves
- Market value correlation check
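A minimal sketch of the min-max step and the weighted composite follows, under stated assumptions: each raw stat is scaled to [0, 1] within its position group (one reading of "position-specific scaling"), and the weights are the five sub-index weights listed above. Those five weights sum to 0.75, so this sketch produces scores on a 0 to 0.75 scale. The field names and sample rows are hypothetical, and the league adjustment, age curves, and market-value check are not modeled.

```python
from collections import defaultdict

# Weights for sub-indices I1..I5 as listed (they sum to 0.75 as given).
WEIGHTS = {"goals_assists": 0.375, "clean_sheets": 0.125, "pass_accuracy": 0.125,
           "team_perf": 0.0625, "league_strength": 0.0625}

def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def player_index(rows):
    """rows: dicts with 'name', 'pos', and one raw value per WEIGHTS key (hypothetical schema)."""
    by_pos = defaultdict(list)
    for r in rows:
        by_pos[r["pos"]].append(r)
    scores = {}
    for group in by_pos.values():
        # Normalize each stat within the position group, then take the weighted sum.
        scaled = {k: min_max([r[k] for r in group]) for k in WEIGHTS}
        for j, r in enumerate(group):
            scores[r["name"]] = sum(w * scaled[k][j] for k, w in WEIGHTS.items())
    return scores

# Hypothetical forwards; clean sheets are constant for this group, so they contribute 0.
rows = [
    {"name": "X", "pos": "FWD", "goals_assists": 30, "clean_sheets": 0,
     "pass_accuracy": 0.80, "team_perf": 70, "league_strength": 0.9},
    {"name": "Y", "pos": "FWD", "goals_assists": 10, "clean_sheets": 0,
     "pass_accuracy": 0.85, "team_perf": 50, "league_strength": 0.7},
]
print(player_index(rows))  # {'X': 0.5, 'Y': 0.125}
```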
WHY THIS WEIGHTING?
Goals and assists receive the largest weight (37.5%) because they translate most directly into match results. Defensive metrics like clean sheets (12.5%) are crucial but position-dependent. Pass accuracy and team performance provide context, while league strength helps make comparisons across competitions fairer.
VALIDATION RESULTS
Our index correctly identified 8 out of 11 players who became regular starters in top Premier League teams the following season. The correlation between our index and actual season performance was 0.73, well above the 0.41 obtained with market value alone.
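The write-up does not say which correlation coefficient was used. Assuming Pearson's r, the comparison amounts to the calculation below; the arrays are hypothetical placeholders to be replaced with each player's index, market value, and next-season performance measure, and they are not the project's data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical placeholder data, not the project's results.
index_scores = [0.82, 0.64, 0.71, 0.55, 0.90]
market_value = [80, 45, 90, 30, 60]        # €M
next_season = [7.4, 6.6, 6.9, 6.3, 7.8]    # e.g. average match rating

print(f"index vs performance:  r = {pearson_r(index_scores, next_season):.2f}")
print(f"market vs performance: r = {pearson_r(market_value, next_season):.2f}")
```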
Next: With our mathematical framework and player evaluation system defined, let's see what the optimal team looks like when we run our algorithm on the full dataset.
THE OPTIMAL TEAM:
4-3-3 FORMATION
Optimal team selected by our algorithm (total value: €347M)
[Figure: optimal 11-player 4-3-3 lineup; panels: Defense + Goalkeeper, Midfield + Attack]
KEY RESULTS & VALIDATION:
LEAGUE POSITION
EA FC 24 simulation: finished 5th in the Premier League, outperforming our 7th-place budget ranking
BUDGET SCENARIOS
Analyzed 347 different budget levels, using cutting planes, to find the optimal team at each level
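A budget sweep re-solves the same model at each budget level and records the best achievable score. The sketch below swaps in exhaustive enumeration over a hypothetical toy pool (quota of 1 goalkeeper, 2 defenders, 1 forward) in place of the Gurobi solve, so it illustrates the sweep itself rather than the cutting-plane method.

```python
from itertools import combinations, product

# Hypothetical toy pool: (name, index, cost in €M) per position.
POOL = {
    "GK": [("A", 0.6, 30), ("B", 0.8, 60)],
    "DEF": [("C", 0.7, 40), ("D", 0.9, 90), ("G", 0.5, 20)],
    "FWD": [("E", 0.95, 120), ("F", 0.75, 50)],
}
QUOTA = {"GK": 1, "DEF": 2, "FWD": 1}

def best_lineup(budget):
    """Enumerate every quota-valid lineup and return (score, names) of the best one within budget."""
    choices = [combinations(POOL[pos], k) for pos, k in QUOTA.items()]
    best = (float("-inf"), None)  # (-inf, None) means no feasible lineup
    for groups in product(*choices):
        team = [p for g in groups for p in g]
        cost = sum(p[2] for p in team)
        score = sum(p[1] for p in team)
        if cost <= budget and score > best[0]:
            best = (score, [p[0] for p in team])
    return best

# Re-solve at each budget level, as in a budget sweep.
for budget in range(100, 361, 20):
    score, team = best_lineup(budget)
    print(budget, round(score, 2) if team else None, team)
```

Because a larger budget only relaxes the cost constraint, the best score can never decrease as the budget grows; the sweep shows where extra money stops buying a better lineup.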

STAR PLAYER
Takefusa Kubo played every game and led the team in combined goals and assists, supporting our algorithm's selection
[Figure: team composition and performance]
CHALLENGES & LEARNINGS:
⚽ POSITION SPECIFICITY
Our data only had broad positions (defender, midfielder, forward) instead of specific roles like center-back vs full-back. This could put players slightly out of their optimal positions, affecting team chemistry and tactical effectiveness.
🌍 LEAGUE DIFFERENCES
Not all leagues are equally competitive. A player dominating in a weaker league might struggle in the Premier League. Future versions should weight stats by league strength and include adaptation factors for cross-league transfers.
🤝 TEAM CHEMISTRY
Our algorithm optimizes individual player performance but doesn't account for how well players work together. Team chemistry, communication, and playing style compatibility are crucial factors that pure statistics can't capture.
💡 KEY INSIGHTS
Despite these limitations, our algorithm identified undervalued players, and the simulated team finished 5th, ahead of its 7th-place budget ranking. These results suggest that data-driven approaches can meaningfully improve team-building decisions in professional sports.
FUTURE IMPROVEMENTS:
- Include more specific player positions (center-back, left-wing, etc.)
- Add league strength multipliers to normalize performance across competitions
- Expand dataset to include lower division and non-European leagues
- Incorporate pass completion and team chemistry metrics
PROJECT COLLABORATORS:
This research was conducted as a group project by five collaborators.
Each team member contributed to data collection, algorithm development, analysis, and validation.
TECHNICAL STACK:
Languages & Libraries
- Python
- Gurobi (Optimization Solver)
- Pandas & NumPy
- Matplotlib/Seaborn
Data Sources
- Kaggle Player Stats (2,500 players)
- Transfermarkt Values (market data)
- FBRef Goalkeeper Stats
- EA FC 24 (Team simulation)