[Technical] Argus: Federated Optimization for F1 Lap Prediction (42.9% Gain)

Hi everyone,

I wanted to share a technical project I’ve recently completed called Argus. It addresses the "data starvation" problem in Formula 1: the structural disadvantage whereby smaller teams (e.g., Williams) have significantly less clean training data than top-tier teams (e.g., Red Bull), which compounds into a growing gap in simulation accuracy.

I’ve built a federated learning simulation from scratch in PyTorch to evaluate whether constructors can collaborate on intelligence without ever sharing raw telemetry.
Methodology:
  • Dataset: 19,590 clean laps from the full 2023 Formula 1 World Championship season, extracted via FastF1 (see the first sketch after this list).
  • Architecture: A 3-layer MLP (128 -> 64 -> 1) optimized with Huber loss for outlier robustness (second sketch below).
  • Federated Protocol: Hand-coded FedAvg and FedProx implementations (third sketch below).
  • Evaluation: Chronological splitting within events to prevent data leakage from track evolution (also shown in the first sketch).
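
For context, here is a minimal sketch of how the lap extraction and chronological split could look with FastF1. The feature columns, the 80/20 split point, and the function names are illustrative assumptions for this post, not necessarily the exact Argus pipeline:

[code]
import fastf1
import pandas as pd

def load_clean_laps(year: int, event: str) -> pd.DataFrame:
    """Load one race session and keep only representative laps."""
    session = fastf1.get_session(year, event, "R")
    session.load(laps=True, telemetry=False, weather=False)
    # pick_quicklaps() keeps laps within a threshold of the session pace,
    # which filters out in-laps, out-laps and anomalously slow laps.
    laps = session.laps.pick_quicklaps()
    df = laps[["Team", "Driver", "LapNumber", "Stint", "Compound",
               "TyreLife", "LapTime"]].copy()
    df["LapTimeSeconds"] = df["LapTime"].dt.total_seconds()
    df["Event"] = event
    return df.dropna(subset=["LapTimeSeconds"])

def chronological_split(df: pd.DataFrame, train_frac: float = 0.8):
    """Split within each event by lap number, so later laps never leak
    track-evolution information back into the training set."""
    train, test = [], []
    for _, event_laps in df.groupby("Event"):
        event_laps = event_laps.sort_values("LapNumber")
        cut = int(len(event_laps) * train_frac)
        train.append(event_laps.iloc[:cut])
        test.append(event_laps.iloc[cut:])
    return pd.concat(train), pd.concat(test)
[/code]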
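
The regressor itself is deliberately simple. A sketch of the 3-layer MLP with Huber loss in PyTorch might look like the following; the input dimension and the Huber delta here are placeholders, not the report's exact hyper-parameters:

[code]
import torch
import torch.nn as nn

class LapTimeMLP(nn.Module):
    """3-layer MLP (128 -> 64 -> 1) regressing lap time in seconds."""
    def __init__(self, n_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

# Huber loss is quadratic for small residuals and linear for large ones,
# so traffic- or safety-car-affected laps don't dominate the gradient.
criterion = nn.HuberLoss(delta=1.0)
[/code]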
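
The server side of a hand-rolled FedAvg round is essentially a sample-size-weighted average of the clients' state dicts, so teams with more clean laps contribute proportionally more. A minimal sketch (the function name is mine, not the repo's):

[code]
import copy

def fedavg_aggregate(client_states, client_sizes):
    """FedAvg: weighted average of client model parameters,
    with weights proportional to each client's number of laps."""
    total = float(sum(client_sizes))
    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        global_state[key] = sum(
            state[key] * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return global_state
[/code]

Each round, every client trains locally on its own laps for a few epochs, ships only its weights back, and the server broadcasts the aggregated model; raw telemetry never leaves the team.
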
Key Findings:
  • Collective Gain: The federated model reduced MAE by 42.9% (1.677 s vs. 2.936 s) compared with models trained in isolation.
  • Small Team Leap: Backmarker teams saw the largest marginal benefit, with improvements of over 50%.
  • Non-IID Stability: Despite pace and degradation distribution imbalances across the grid, weighted FedAvg proved remarkably stable, while FedProx adds a proximal regularizer that limits client drift (short sketch of that local objective below).
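
On the drift point above: FedProx only changes each client's local objective by adding a proximal penalty that pulls local weights back towards the current global model. A sketch of that local loss (the mu value is illustrative, not the report's tuned setting):

[code]
import torch

def fedprox_loss(base_loss, local_model, global_params, mu=0.01):
    """FedProx local objective: base_loss + (mu / 2) * ||w_local - w_global||^2.
    The quadratic term discourages a client from drifting far from the global model."""
    prox = 0.0
    for w, w_global in zip(local_model.parameters(), global_params):
        prox = prox + torch.sum((w - w_global.detach()) ** 2)
    return base_loss + (mu / 2.0) * prox
[/code]
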
Engineering Note on Limitations:
I recognize that public timing data lacks internal variables such as fuel loads, ERS deployment maps, and high-fidelity sensor telemetry. Argus therefore focuses on the transferable patterns of tire degradation and track sensitivity that emerge in the timing traces. The results suggest that federated optimization significantly narrows the "privacy-performance gap" relative to isolated training.

I have published a 6-page technical report covering the pace-distribution analysis and the proximal-regularization math. I’d value this community's feedback on the personalization vs. generalization trade-offs for these kinds of multi-client models.

Technical Report (PDF): [https://github.com/Forgingalex/argus/bl ... Report.pdf]
GitHub Repository: [https://github.com/Forgingalex/argus]

Best,
Forgingalex