Soccer Team Optimization Project - Operations Research in Professional Soccer
Data Science · Optimization

BUILDING THE BEST-VALUE SOCCER TEAM

What does it take to build a professional soccer team that wins titles? Using data from more than 2,500 players across Europe's top 5 leagues, we built an algorithm that selects the optimal starting lineup within a budget constraint.

The full technical report contains the complete mathematical derivations, additional constraints, and detailed results.

PROJECT OVERVIEW:

The Premier League has seen 51 different teams since its start, but only 7 have won titles. Why is success so rare? Building a winning team requires balancing player talent, budget constraints, age diversity, and team chemistry. Our team tackled this challenge by creating an optimization algorithm that selects the optimal 11-player starting lineup from Europe's top leagues.

Group Project: 5 collaborators
Dataset: 2,500+ players
Budget: €350M constraint
Method: Integer Programming

THE PROBLEM:

Building a successful soccer team is like solving a complex puzzle. You need to:

  • Stay within your budget (we used €350M, the 7th-highest in the Premier League)
  • Fill specific positions (1 goalkeeper, 4 defenders, 3 midfielders, 3 forwards)
  • Balance experience and youth (at least 3 players aged 27 or older, at least 3 aged 24 or younger)
  • Maximize team performance using advanced player statistics

With 2,500+ players to choose from across Europe's top 5 leagues, manual selection becomes impossible. That's where optimization algorithms come in.

OUR SOLUTION:

Rather than relying on gut instinct or traditional scouting, we approached this as a data science problem. Our solution combines three key insights:

1. DATA-DRIVEN EVALUATION

Transform subjective player assessment into objective metrics using comprehensive performance data from Europe's top leagues.

2. MATHEMATICAL OPTIMIZATION

Use proven operations research techniques to find the globally optimal solution rather than settling for "good enough" manual selections.

3. REAL-WORLD TESTING

Validate theoretical results through simulation and compare against actual Premier League performance to prove the approach works.

The following sections detail exactly how we implemented each of these components and the mathematical framework that ties them together.

MATHEMATICAL FORMULATION:

To solve this systematically, we formulated it as an integer programming problem. Integer programming optimizes a linear objective function subject to linear constraints over integer decision variables (here binary: each player is either selected or not), and a solver can certify that the solution it returns is globally optimal.

Here's how we translated our soccer team building challenge into mathematical language:

🎯 What We Want to Maximize:

Our goal is simple: select the 11 players that give us the highest total performance. Mathematically, this means:

MAXIMIZE
$$\sum_{i=1}^{n} I_i \, x_i$$
$I_i$ = performance index of player $i$ (explained in detail below)
$x_i$ = 1 if we select player $i$, 0 if we don't
$n$ = 2,500+ players in our dataset

⚖️ What Rules We Must Follow:

We can't just pick the 11 highest-rated players, because real-world constraints must be satisfied. Each constraint is expressed as a linear equation or inequality in the selection variables:

Team Formation (4-3-3):
$$\sum_{i \in GK} x_i = 1, \quad \sum_{i \in DF} x_i = 4, \quad \sum_{i \in MF} x_i = 3, \quad \sum_{i \in FW} x_i = 3$$

Exactly 1 goalkeeper, 4 defenders, 3 midfielders, 3 forwards

Budget Limit:
$$\sum_{i=1}^{n} m_i \, x_i \le \text{€}350{,}000{,}000$$

Total cost of selected players can't exceed our budget ($m_i$ is the cost of player $i$)

Age Balance:
$$\left| \frac{1}{11} \sum_{i=1}^{n} \text{age}_i \, x_i - 25 \right| \le 1$$

Team average age must be between 24 and 26 years old
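
As written, the absolute value is not a linear expression. One standard way to hand it to an integer programming solver, assuming the formation constraints already fix the lineup at exactly 11 players, is to multiply through by 11 and split it into a lower and an upper bound on the total age (264 = 24 × 11, 286 = 26 × 11):

```latex
\left| \frac{1}{11} \sum_{i=1}^{n} \text{age}_i \, x_i - 25 \right| \le 1
\quad\Longleftrightarrow\quad
264 \;\le\; \sum_{i=1}^{n} \text{age}_i \, x_i \;\le\; 286
```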

Experience Requirements:
$$\sum_{i:\, \text{age}_i \ge 27} x_i \ge 3, \quad \sum_{i:\, \text{age}_i \le 24} x_i \ge 3$$

At least 3 experienced players (27+) and 3 young players (24 or under)
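
To make the four constraints concrete, here is a minimal sketch of a checker that reports which rules a candidate lineup breaks. It is not the project's code: the `Player` class, its field names, `violated_constraints`, and the toy lineup are all hypothetical and only mirror the constraints stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    name: str
    position: str       # "GK", "DF", "MF", or "FW"
    age: int
    value_eur_m: float  # market value in millions of euros


FORMATION = {"GK": 1, "DF": 4, "MF": 3, "FW": 3}  # 4-3-3
BUDGET_EUR_M = 350


def violated_constraints(lineup):
    """Return the names of the constraints that the lineup breaks."""
    problems = []

    # Formation: exact head count per position.
    counts = {pos: 0 for pos in FORMATION}
    for p in lineup:
        counts[p.position] = counts.get(p.position, 0) + 1
    if counts != FORMATION:
        problems.append("formation")

    # Budget: total market value must not exceed the cap.
    if sum(p.value_eur_m for p in lineup) > BUDGET_EUR_M:
        problems.append("budget")

    # Average age between 24 and 26, checked on the total to avoid division.
    n = len(lineup)
    if not 24 * n <= sum(p.age for p in lineup) <= 26 * n:
        problems.append("average age")

    # At least 3 players aged 27+ and at least 3 aged 24 or under.
    veterans = sum(p.age >= 27 for p in lineup)
    youngsters = sum(p.age <= 24 for p in lineup)
    if veterans < 3 or youngsters < 3:
        problems.append("experience mix")

    return problems


# Toy lineup with made-up ages and values, for illustration only.
toy_lineup = (
    [Player("GK0", "GK", 30, 20)]
    + [Player(f"DF{i}", "DF", age, 25) for i, age in enumerate([22, 23, 28, 29])]
    + [Player(f"MF{i}", "MF", age, 30) for i, age in enumerate([24, 25, 26])]
    + [Player(f"FW{i}", "FW", age, 40) for i, age in enumerate([21, 25, 27])]
)
print(violated_constraints(toy_lineup))
```

An empty list means the lineup is feasible. In the optimization model these checks are not run after the fact; they are the constraints the solver enforces while it searches.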

🔧 How We Solve It:

We used the Gurobi solver - a powerful optimization engine that can handle problems with thousands of variables and constraints. It uses advanced algorithms like branch-and-bound and cutting planes to find the mathematically optimal solution.
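
On a pool of a dozen or so players the same model can be solved by simply trying every lineup, which shows what the solver is searching for without needing Gurobi installed. The sketch below uses exhaustive enumeration, not branch-and-bound, and is not the project's solver code: the player pool, the `Player` fields, and `best_lineup` are hypothetical. Enumeration is hopeless at 2,500+ players, which is exactly why a dedicated solver is used.

```python
from itertools import combinations, product
from typing import NamedTuple


class Player(NamedTuple):
    name: str
    position: str
    age: int
    value_eur_m: int
    score: float  # stands in for the performance index


FORMATION = {"GK": 1, "DF": 4, "MF": 3, "FW": 3}

# Made-up pool, for illustration only.
POOL = [
    Player("GK-A", "GK", 31, 40, 70), Player("GK-B", "GK", 23, 15, 55),
    Player("DF-A", "DF", 28, 45, 75), Player("DF-B", "DF", 22, 50, 72),
    Player("DF-C", "DF", 25, 20, 60), Player("DF-D", "DF", 29, 15, 58),
    Player("DF-E", "DF", 24, 30, 66),
    Player("MF-A", "MF", 26, 60, 85), Player("MF-B", "MF", 23, 25, 70),
    Player("MF-C", "MF", 30, 12, 62), Player("MF-D", "MF", 25, 35, 74),
    Player("FW-A", "FW", 34, 30, 80), Player("FW-B", "FW", 21, 35, 78),
    Player("FW-C", "FW", 22, 30, 73), Player("FW-D", "FW", 27, 70, 90),
]


def best_lineup(pool, budget_eur_m):
    """Try every 4-3-3 lineup and keep the feasible one with the top total score."""
    by_pos = {pos: [p for p in pool if p.position == pos] for pos in FORMATION}
    # Choosing k players per position satisfies the formation constraint by construction.
    groups = [combinations(by_pos[pos], k) for pos, k in FORMATION.items()]

    best, best_score = None, float("-inf")
    for parts in product(*groups):
        lineup = [p for part in parts for p in part]
        ages = [p.age for p in lineup]
        feasible = (
            sum(p.value_eur_m for p in lineup) <= budget_eur_m
            and 264 <= sum(ages) <= 286          # average age in [24, 26]
            and sum(a >= 27 for a in ages) >= 3  # experienced players
            and sum(a <= 24 for a in ages) >= 3  # young players
        )
        total = sum(p.score for p in lineup)
        if feasible and total > best_score:
            best, best_score = lineup, total
    return best, best_score


lineup, total_score = best_lineup(POOL, 350)
print([p.name for p in lineup], total_score)
```

If no lineup satisfies every constraint, `best_lineup` returns `None` as the lineup, which is the brute-force counterpart of a solver reporting the model infeasible.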

PLAYER INDEX METHODOLOGY:

The performance index I_player (the per-player value that appears in the objective function) is the core of our optimization. We developed a weighted composite score that captures multiple dimensions of player value:

Mathematical Definition:

I_player = 100 × (0.375·I₁ + 0.125·I₂ + 0.125·I₃ + 0.0625·I₄ + 0.0625·I₅)

Sub-Index Components:

  • I₁ (37.5%): Goals & Assists
  • I₂ (12.5%): Clean Sheets (GK/DEF)
  • I₃ (12.5%): Pass Accuracy
  • I₄ (6.25%): Team Performance
  • I₅ (6.25%): League Strength
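
The weighted sum can be sketched in a few lines. The weights are copied from the formula above; the function name `player_index` and the assumption that every sub-index has already been scaled to [0, 1] are ours. With the weights exactly as listed, which add up to 0.75, a player scoring 1.0 on every sub-index comes out at 75.

```python
# Weights for I1..I5, copied from the formula above.
WEIGHTS = (0.375, 0.125, 0.125, 0.0625, 0.0625)


def player_index(sub_indices):
    """Combine five sub-indices (each assumed to lie in [0, 1]) into one score."""
    if len(sub_indices) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} sub-indices, got {len(sub_indices)}")
    return 100 * sum(w * s for w, s in zip(WEIGHTS, sub_indices))


# Hypothetical forward: strong on goals and assists, no clean-sheet credit.
print(player_index([0.8, 0.0, 0.6, 0.5, 1.0]))
```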

Normalization Process:

  • Position-specific scaling
  • Min-max normalization [0,1]
  • League competition adjustment
  • Age performance curves
  • Market value correlation check
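
The first two steps, position-specific scaling and min-max normalization, can be combined by normalizing each statistic within its position group. This is a minimal sketch under that assumption; the function `min_max_by_position`, the row layout, and the sample numbers are hypothetical, and the league, age, and market-value steps are not shown.

```python
from collections import defaultdict


def min_max_by_position(rows, stat):
    """Scale `stat` to [0, 1] separately within each position group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["position"]].append(row[stat])

    scaled = []
    for row in rows:
        lo = min(groups[row["position"]])
        hi = max(groups[row["position"]])
        # If everyone in the group has the same value, map them all to 0.0.
        value = 0.0 if hi == lo else (row[stat] - lo) / (hi - lo)
        scaled.append({**row, f"{stat}_scaled": value})
    return scaled


# Made-up numbers: forwards and defenders are scaled on separate ranges.
players = [
    {"name": "FW-A", "position": "FW", "goals": 10},
    {"name": "FW-B", "position": "FW", "goals": 20},
    {"name": "FW-C", "position": "FW", "goals": 30},
    {"name": "DF-A", "position": "DF", "goals": 1},
    {"name": "DF-B", "position": "DF", "goals": 3},
]
for row in min_max_by_position(players, "goals"):
    print(row["name"], row["goals_scaled"])
```

Scaling within a position keeps a defender with 3 goals from being ranked near zero just because forwards score far more.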

WHY THIS WEIGHTING?

Goals and assists (37.5% weight) are the strongest predictors of team success and directly correlate with match outcomes. Defensive metrics like clean sheets (12.5%) are crucial but position-dependent. Pass accuracy and team performance provide context, while league strength ensures fair comparison across competitions.

VALIDATION RESULTS

Our index correctly identified 8 out of 11 players who became regular starters in top Premier League teams the following season. The correlation between our index and actual season performance was 0.73, significantly higher than market value alone (0.41).

Next: With our mathematical framework and player evaluation system defined, let's see what the optimal team looks like when we run our algorithm on the full dataset.

THE OPTIMAL TEAM:

4-3-3 FORMATION

Optimal team selected by our algorithm (total value: €347M)

[Figure: Optimal 4-3-3 formation, with the selected players positioned on a soccer field. Our algorithm selected this optimal 11-player lineup using integer programming.]

DEFENSE + GOALKEEPER

  • Marc-André ter Stegen (GK): €35M
  • Alejandro Balde (LB): €50M
  • Jules Koundé (CB): €60M
  • Danilo (CB): €15M
  • Giovanni Di Lorenzo (RB): €25M

MIDFIELD + ATTACK

  • Vincenzo Grifo (LM): €12M
  • Takefusa Kubo (CAM): €25M
  • Mattia Zaccagni (RM): €30M
  • Ansu Fati (LW): €35M
  • Robert Lewandowski (ST): €30M
  • Folarin Balogun (RW): €30M
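
As a quick arithmetic check, the listed values can be summed against the €350M cap. The figures are the ones shown above; the variable names are ours.

```python
# Market values in millions of euros, as listed above.
LINEUP_EUR_M = {
    "Marc-André ter Stegen": 35,
    "Alejandro Balde": 50,
    "Jules Koundé": 60,
    "Danilo": 15,
    "Giovanni Di Lorenzo": 25,
    "Vincenzo Grifo": 12,
    "Takefusa Kubo": 25,
    "Mattia Zaccagni": 30,
    "Ansu Fati": 35,
    "Robert Lewandowski": 30,
    "Folarin Balogun": 30,
}
BUDGET_EUR_M = 350

total = sum(LINEUP_EUR_M.values())
remaining = BUDGET_EUR_M - total
print(f"total: €{total}M, remaining: €{remaining}M")  # total: €347M, remaining: €3M
```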

KEY RESULTS & VALIDATION:

  • League position: 5th. In an EA FC 24 simulation, the team finished 5th in the Premier League, outperforming our 7th-place budget ranking.
  • Budget scenarios: 347. We analyzed 347 different budget levels using cutting planes to find optimal teams for any budget.
  • Star player: Takefusa Kubo. He played every game and led the team in goals + assists, validating our algorithm's selection.

🏆 TEAM COMPOSITION & PERFORMANCE

  • Clean sheets: 13 (defensive strength)
  • Total investment: €347M (under budget limit)
  • Average age: 25.1 (perfect balance)
  • Formation: 4-3-3 (attacking setup)

CHALLENGES & LEARNINGS:

⚽ POSITION SPECIFICITY

Our data only had broad positions (defender, midfielder, forward) instead of specific roles like center-back vs full-back. This could put players slightly out of their optimal positions, affecting team chemistry and tactical effectiveness.

🌍 LEAGUE DIFFERENCES

Not all leagues are equally competitive. A player dominating in a weaker league might struggle in the Premier League. Future versions should weight stats by league strength and include adaptation factors for cross-league transfers.

🤝 TEAM CHEMISTRY

Our algorithm optimizes individual player performance but doesn't account for how well players work together. Team chemistry, communication, and playing style compatibility are crucial factors that pure statistics can't capture.

💡 KEY INSIGHTS

Despite limitations, our algorithm successfully identified undervalued players and achieved better performance than expected. The simulation results validate that data-driven approaches can significantly improve team building decisions in professional sports.

FUTURE IMPROVEMENTS:

  • Include more specific player positions (center-back, left-wing, etc.)
  • Add league strength multipliers to normalize performance across competitions
  • Expand dataset to include lower division and non-European leagues
  • Incorporate pass completion and team chemistry metrics

PROJECT COLLABORATORS:

This research was conducted as a group project with contributions from:

Hadas Barabash
Misha Melnyk
David Lascu
Sebastian Valencia
Shamar Phillips

Each team member contributed to data collection, algorithm development, analysis, and validation.

TECHNICAL STACK:

Languages & Libraries

  • Python
  • Gurobi (Optimization Solver)
  • Pandas & NumPy
  • Matplotlib/Seaborn

Data Sources