Lab Project · Sports Data

CPBL Analytics Portal

A full sabermetrics portal for Taiwan's professional baseball league (CPBL), built from scratch and end-to-end: ETL, API, dashboard, and visualization in one stack.

Domain: Sabermetrics · CPBL advanced stats
Status: Live, updated daily
Type: Portfolio + Research
Role: Solo Builder (full-stack, independent)

01

Why Build This

Taiwan's baseball community deserves the same depth of analysis that MLB fans take for granted. Tools like FanGraphs and Baseball Savant don't cover CPBL, and the league's own data portal stops at basic counting stats.

I wanted to prove that a single engineer could close that gap by building a production-grade sabermetrics pipeline for CPBL that covers ETL, API, statistical modeling, and interactive visualization.

02

By the Numbers

377 Games analyzed
28k Plate appearances
112k Pitch events
84% Test coverage
17 API endpoints
10 Analysis modules
19 Dashboard pages
168 Tests passing

03

Analysis Modules

LOB%

Strand rate: the share of baserunners a pitcher leaves on base. A large divergence from the league baseline is a key signal for separating pitcher luck from skill.
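As a point of reference, here is a minimal sketch of the widely published LOB% formula (the FanGraphs form). Whether the portal uses exactly this form is an assumption, and the stat line and league baseline below are made-up illustration values, not CPBL data.

```python
def lob_pct(h: int, bb: int, hbp: int, r: int, hr: int) -> float:
    """Strand rate: (H + BB + HBP - R) / (H + BB + HBP - 1.4 * HR)."""
    denom = h + bb + hbp - 1.4 * hr
    if denom <= 0:
        raise ValueError("not enough baserunners to compute LOB%")
    return (h + bb + hbp - r) / denom


# Hypothetical pitcher line, for illustration only.
pitcher = lob_pct(h=150, bb=40, hbp=5, r=70, hr=10)
league = 0.72  # assumed league baseline, not a measured CPBL value
gap = pitcher - league  # negative: stranding fewer runners than the baseline
print(f"LOB% = {pitcher:.3f}, gap vs league = {gap:+.3f}")
```

A pitcher far below the baseline with otherwise normal peripherals is a candidate for better results going forward; one far above it is a candidate for regression.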

Leverage Index & Clutch

An RE24-based framework (run expectancy across the 24 base-out states) with a leverage index, measuring how players perform in high-pressure moments.
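The core RE24 bookkeeping can be sketched as follows: each play is worth the run expectancy after the play, minus the run expectancy before it, plus any runs that scored. The table values and the `re24_delta` helper are hypothetical placeholders for illustration; a real table has all 24 base-out states and is estimated from league play-by-play data.

```python
# Run expectancy keyed by (outs, runners); runners flags 1B/2B/3B occupancy.
# Placeholder values for illustration only, not measured CPBL run expectancies.
# A full table has 24 entries (3 out counts x 8 base states).
RUN_EXP = {
    (0, "---"): 0.50,
    (1, "---"): 0.27,
    (1, "1--"): 0.52,
    (1, "-2-"): 0.68,
    (2, "---"): 0.10,
}


def re24_delta(start, end, runs_scored, table=RUN_EXP):
    """Run value of one play: RE(end) - RE(start) + runs scored on the play.

    `end` is None when the play ends the inning (run expectancy 0).
    """
    end_re = 0.0 if end is None else table[end]
    return end_re - table[start] + runs_scored


# Runner on first, one out: an RBI double, then (same situation) a double play.
double = re24_delta((1, "1--"), (1, "-2-"), runs_scored=1)
double_play = re24_delta((1, "1--"), None, runs_scored=0)
batter_re24 = double + double_play  # a player's RE24 is the sum over his plays
```

Weighting each play by how much the game situation matters (a leverage index) is a separate step that this sketch does not cover.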

Count Splits + Heat Map

Per-count performance breakdown with zone heat maps, revealing pitcher tendencies and batter chase rates. Built for studying pitch-sequencing strategy.
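A per-count chase rate can be sketched as a simple group-by over pitch events. The field names (`balls`, `strikes`, `in_zone`, `swing`) are assumptions about the pitch-event schema, and the sample rows are invented.

```python
from collections import defaultdict


def chase_rate_by_count(pitches):
    """Swings at out-of-zone pitches / out-of-zone pitches, per (balls, strikes)."""
    seen = defaultdict(int)
    chased = defaultdict(int)
    for p in pitches:
        if p["in_zone"]:
            continue  # chase rate only considers pitches outside the zone
        count = (p["balls"], p["strikes"])
        seen[count] += 1
        chased[count] += int(p["swing"])
    return {count: chased[count] / seen[count] for count in seen}


# Invented pitch events, for illustration only.
sample = [
    {"balls": 0, "strikes": 2, "in_zone": False, "swing": True},
    {"balls": 0, "strikes": 2, "in_zone": False, "swing": False},
    {"balls": 3, "strikes": 0, "in_zone": False, "swing": False},
    {"balls": 0, "strikes": 2, "in_zone": True, "swing": True},
]
rates = chase_rate_by_count(sample)
```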

Pitcher Fatigue

Performance bucketed per 15 pitches, with changepoint detection that finds the decline point marking a pitcher's effective pitch limit.
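The bucketing and changepoint steps can be sketched as below. The page does not say which changepoint method the portal uses; this sketch swaps in the simplest one, a single least-squares split, and the velocity series is invented.

```python
def bucket_means(values, size=15):
    """Average a per-pitch metric over consecutive buckets of `size` pitches."""
    buckets = [values[i:i + size] for i in range(0, len(values), size)]
    return [sum(b) / len(b) for b in buckets]


def single_changepoint(series):
    """Split index k that minimizes squared error around the two segment means."""
    if len(series) < 2:
        raise ValueError("need at least two buckets")

    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    return min(range(1, len(series)),
               key=lambda k: sse(series[:k]) + sse(series[k:]))


# Invented outing: 60 pitches at 145 km/h, then 30 pitches at 142 km/h.
velocity = [145.0] * 60 + [142.0] * 30
means = bucket_means(velocity)        # six buckets of 15 pitches
k = single_changepoint(means)         # first bucket of the slower regime
pitch_limit = k * 15                  # pitches thrown before the drop-off
```

With real data the per-bucket metric could be velocity, whiff rate, or wOBA allowed, and a minimum number of buckets per pitcher keeps the split from chasing noise.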

wRC+

Park-factor-adjusted run creation, scaled so that 100 is league average: a single-number measure of hitter value. Built from scratch for CPBL.
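For reference, a minimal sketch of the standard published wRC+ formula, with the park factor expressed as a ratio (1.00 = neutral). The league constants in the example are placeholders, not CPBL values, and the portal's own implementation may differ in detail.

```python
def wrc_plus(woba, lg_woba, woba_scale, lg_r_per_pa, park_factor, lg_wrc_per_pa):
    """wRC+ from wOBA, league constants, and a ratio-style park factor."""
    wraa_per_pa = (woba - lg_woba) / woba_scale           # runs above average per PA
    park_adj = lg_r_per_pa - park_factor * lg_r_per_pa    # credit or debit for the park
    return (wraa_per_pa + lg_r_per_pa + park_adj) / lg_wrc_per_pa * 100


# Placeholder league constants, for illustration only.
LEAGUE = dict(lg_woba=0.320, woba_scale=1.20, lg_r_per_pa=0.12, lg_wrc_per_pa=0.12)

average = wrc_plus(woba=0.320, park_factor=1.00, **LEAGUE)     # league-average hitter
slugger = wrc_plus(woba=0.380, park_factor=1.00, **LEAGUE)     # well above average
slugger_hitter_park = wrc_plus(woba=0.380, park_factor=1.05, **LEAGUE)
```

The same wOBA is worth less in a hitter-friendly park, which is why the third value comes out below the second.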

Park Factor

Park factors calculated along two dimensions, team-based (home vs. away) and venue-based, with multi-season smoothing.
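A team-based park factor and a simple multi-season smoother can be sketched like this. The 3/2/1 recency weights are an assumption (the page does not specify the smoothing scheme), and the run totals are invented.

```python
def park_factor(home_rs, home_ra, home_g, road_rs, road_ra, road_g):
    """Runs per game in a team's home games over runs per game in its road games."""
    home_rate = (home_rs + home_ra) / home_g
    road_rate = (road_rs + road_ra) / road_g
    return home_rate / road_rate


def smoothed_park_factor(seasons, weights=(3, 2, 1)):
    """Weighted mean of single-season factors, most recent season first."""
    used = weights[:len(seasons)]
    return sum(w * pf for w, pf in zip(used, seasons)) / sum(used)


# Invented run totals over 60 home and 60 road games.
pf = park_factor(home_rs=300, home_ra=280, home_g=60,
                 road_rs=270, road_ra=260, road_g=60)
pf_smooth = smoothed_park_factor([1.10, 1.00, 1.06])
```

A venue-based version would aggregate by ballpark instead of by home team, which matters when a team plays home games in more than one stadium.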

BABIP Regression

Mean-reversion analysis that identifies overperformers likely to cool off and underperformers likely to bounce back.
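The idea can be sketched with the standard BABIP formula plus a simple shrinkage toward the league mean. The `prior_bip` strength and the league BABIP are assumed values for illustration; a real model would estimate both from CPBL data.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)


def regressed_babip(observed, balls_in_play, league_babip, prior_bip=400):
    """Shrink an observed BABIP toward the league mean.

    Equivalent to adding `prior_bip` league-average balls in play,
    so small samples are pulled harder toward the mean.
    """
    total = observed * balls_in_play + league_babip * prior_bip
    return total / (balls_in_play + prior_bip)


# Invented hitter line: 147 balls in play with a .374 BABIP.
obs = babip(h=60, hr=5, ab=180, k=30, sf=2)
expected = regressed_babip(obs, balls_in_play=147, league_babip=0.310)
overperformance = obs - expected  # positive: likely to cool off
```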

Half-Season Splits

First-half vs. second-half season comparison, with rolling wOBA to detect in-season trends and regression candidates.
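A rolling wOBA line can be sketched as a trailing mean of per-plate-appearance linear weights. The weights below are rough illustrative values, not CPBL-derived, and the sketch assumes events excluded from the wOBA denominator (such as intentional walks) were filtered out beforehand.

```python
from collections import deque

# Illustrative linear weights only; real weights are re-derived per league and season.
WEIGHTS = {"BB": 0.69, "HBP": 0.72, "1B": 0.88, "2B": 1.25, "3B": 1.58, "HR": 2.03}


def rolling_woba(events, window=50):
    """Trailing wOBA after each plate appearance, over at most `window` PAs."""
    recent = deque(maxlen=window)  # oldest PA drops out automatically
    line = []
    for event in events:
        recent.append(WEIGHTS.get(event, 0.0))  # outs and other events count as 0
        line.append(sum(recent) / len(recent))
    return line


# Tiny invented sequence with a 2-PA window to make the arithmetic visible.
trend = rolling_woba(["1B", "OUT", "HR", "OUT"], window=2)
```

Plotting this line across the first-half and second-half boundary shows whether a split difference is a steady trend or a single hot or cold stretch.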

04

Tech Stack

Backend

  • Python 3.12 + uv
  • FastAPI + Pydantic v2
  • SQLAlchemy 2.0 + SQLite WAL
  • pybaseball for reference data

Frontend

  • ECharts 5.4.3 (7 chart types)
  • Tailwind CSS 3.4.1
  • Vanilla JS — no framework lock-in
  • Responsive dashboards

Quality

  • 168 pytest tests · 84% coverage
  • ruff lint · mypy strict
  • GitHub Actions CI gate
  • Docker multi-stage build

Ops

  • Daily cron ETL (22:00 UTC, 06:00 Taiwan time)
  • Cloudflare Pages auto-deploy
  • R2 for historical data offload
  • Health checks + monitoring

05

What I Learned

  • Small samples demand humility. With only ~120 games per team per year, confidence intervals matter more than point estimates. Every module has minimum PA/IP thresholds.
  • Data sources will fight you. CPBL's official API has no stable player IDs. Mapping tables, fuzzy-matching, fallbacks — half the engineering work was data plumbing, not analysis.
  • Tests paid for themselves. 168 tests at 84% coverage caught a critical AB/H column-swap bug that would have invalidated every downstream calculation. Integration tests with in-memory SQLite were the MVP.
  • Visualization is half the insight. Raw numbers are invisible. Savant-style percentile bars, diverging clutch charts, rolling wOBA lines — the chart IS the analysis.
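The data-plumbing point above (mapping tables, fuzzy matching, fallbacks) can be sketched as a three-step resolver using the standard library's `difflib`. The function name, the `(team, name)` keying, the 0.8 cutoff, and the player names are all hypothetical; the portal's actual matching logic is not described on this page.

```python
from difflib import get_close_matches


def resolve_player_id(name, team, manual_map, roster, cutoff=0.8):
    """Resolve a scraped name to an internal ID.

    Order: hand-maintained mapping table, then fuzzy match within the
    same team, then None so the caller can queue the row for review.
    """
    key = (team, name)
    if key in manual_map:
        return manual_map[key]
    candidates = {n: pid for (t, n), pid in roster.items() if t == team}
    match = get_close_matches(name, list(candidates), n=1, cutoff=cutoff)
    return candidates[match[0]] if match else None


# Hypothetical roster and mapping table, for illustration only.
roster = {("Team A", "Wang Da-Ming"): 101, ("Team A", "Lee Xiao-Hua"): 102}
manual_map = {("Team A", "D. M. Wang"): 101}

by_table = resolve_player_id("D. M. Wang", "Team A", manual_map, roster)
by_fuzzy = resolve_player_id("Wang Da Ming", "Team A", manual_map, roster)
unknown = resolve_player_id("Zzz", "Team A", manual_map, roster)
```

Restricting fuzzy candidates to the same team keeps the cutoff from matching two different players with similar names on different rosters.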

See the live portal

cpblanalysis.mursfoto.com ↗