Case Study — Portfolio Project

CPBL Sabermetrics: A Data Analytics Portfolio

An advanced analytics portfolio for the Chinese Professional Baseball League (CPBL)

Domain

Baseball Analytics

Advanced CPBL statistics

Status

Live · 2025–2026

Updated daily

Type

Portfolio Project

Portfolio + research

Role

Data Engineer + Analyst

Solo full-stack development

Live

cpblanalysis.mursfoto.com

01

Why Build This

Why I built this

Public tooling for CPBL analytics is years behind MLB. There is no public Statcast equivalent, and nothing like FanGraphs or Baseball Reference with league-specific metrics. Most analysis still relies on raw batting average and ERA from the official site. I wanted to know what the numbers actually say, so I built the infrastructure to find out.

Taiwan's baseball community deserves data analysis as deep as what MLB fans get. Neither FanGraphs nor Baseball Savant covers the CPBL, so I built one myself.

02

The Numbers

Quantified results

377

Games analyzed

CPBL 2024–2025

28K

Plate appearances

Batted ball events

112K

Pitches tracked

Pitch-level data

84%

Data completeness

vs CPBL official

17

Analysis modules

Across 5 categories

10

Chart types

ECharts interactive

19

API endpoints

FastAPI backend

168

Players profiled

Batters + pitchers

03

Analysis Modules

Seventeen modules span five analytical categories: batting, pitching, defense, game states, and season trends. Each module can be queried independently and is visualized with interactive ECharts charts. Eight of them are described here.

Batting Average on Balls in Play

BABIP by batter and pitcher. Identifies regression candidates and outliers across the full season.
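BABIP is conventionally computed as (H - HR) / (AB - K - HR + SF). The following is a minimal sketch in plain Python, assuming a hypothetical `BattingLine` record; the field names are illustrative and are not taken from the project's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BattingLine:
    """Hypothetical season line; field names are illustrative only."""
    hits: int
    home_runs: int
    at_bats: int
    strikeouts: int
    sac_flies: int


def babip(line: BattingLine) -> Optional[float]:
    """Standard BABIP: (H - HR) / (AB - K - HR + SF).

    Returns None when no balls were put in play, so callers can
    skip the player instead of dividing by zero.
    """
    balls_in_play = (
        line.at_bats - line.strikeouts - line.home_runs + line.sac_flies
    )
    if balls_in_play <= 0:
        return None
    return (line.hits - line.home_runs) / balls_in_play


# Made-up sample line, not real CPBL data.
example = BattingLine(hits=150, home_runs=20, at_bats=500,
                      strikeouts=100, sac_flies=5)
print(round(babip(example), 3))  # 130 / 385 -> 0.338
```

The same function works for pitchers if the line holds hits, home runs, and strikeouts allowed.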

xFIP & ERA- Analysis

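xFIP is a defense-independent estimate of pitching skill; ERA- scales a pitcher's ERA against the league average (100 is average, lower is better). Comparing the two surfaces pitchers whose results are running ahead of or behind their underlying skill.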
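As a rough illustration of the two metrics, the sketch below uses the commonly published xFIP formula and a simplified ERA- that omits park adjustment. The league home-run-per-fly-ball rate, the FIP constant, and the league ERA in the example are placeholder numbers, not CPBL values, and the function names are hypothetical.

```python
def xfip(fly_balls, walks, hit_by_pitch, strikeouts, innings,
         lg_hr_per_fb, fip_constant):
    """xFIP = (13 * (FB * lgHR/FB) + 3 * (BB + HBP) - 2 * K) / IP + C.

    `innings` is a true decimal (160.333 for 160 1/3 innings), not the
    box-score notation 160.1.
    """
    if innings <= 0:
        raise ValueError("innings must be positive")
    expected_home_runs = fly_balls * lg_hr_per_fb
    core = (13 * expected_home_runs
            + 3 * (walks + hit_by_pitch)
            - 2 * strikeouts)
    return core / innings + fip_constant


def era_minus(era, league_era):
    """Simplified ERA-: 100 is league average, lower is better.

    Park adjustment is deliberately left out of this sketch.
    """
    if league_era <= 0:
        raise ValueError("league_era must be positive")
    return 100 * era / league_era


# Placeholder inputs for illustration only.
sample_xfip = xfip(fly_balls=150, walks=40, hit_by_pitch=5, strikeouts=150,
                   innings=160.0, lg_hr_per_fb=0.08, fip_constant=3.10)
sample_era_minus = era_minus(era=3.20, league_era=4.00)
print(round(sample_xfip, 2), round(sample_era_minus, 1))
```

A pitcher with a low ERA- but a much higher xFIP is a candidate for regression, and the reverse suggests results that lag the underlying skill.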

Pitch Mix Breakdown

Usage rates and outcomes by pitch type. Visualized as stacked bars and polar charts per pitcher.
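A minimal sketch of how usage and one outcome rate per pitch type could be derived from pitch-level rows. The `(pitch_type, is_swinging_strike)` tuple shape and the pitch codes are assumptions made for the example, not the project's actual data model.

```python
from collections import Counter


def pitch_mix(pitches):
    """Summarize usage and swinging-strike rate per pitch type.

    `pitches` is an iterable of (pitch_type, is_swinging_strike) pairs.
    Returns {pitch_type: {"usage": ..., "swstr_rate": ...}} ordered from
    most-thrown to least-thrown; empty input returns an empty dict.
    """
    thrown = Counter()
    whiffs = Counter()
    for pitch_type, is_swinging_strike in pitches:
        thrown[pitch_type] += 1
        if is_swinging_strike:
            whiffs[pitch_type] += 1
    total = sum(thrown.values())
    return {
        pitch_type: {
            "usage": count / total,
            "swstr_rate": whiffs[pitch_type] / count,
        }
        for pitch_type, count in thrown.most_common()
    }


# Made-up sample: FF = four-seam fastball, SL = slider.
sample = [("FF", False), ("FF", True), ("SL", True),
          ("FF", False), ("SL", False), ("FF", False)]
mix = pitch_mix(sample)
print(mix)
```

The resulting dictionary maps directly onto a stacked bar (usage) or a polar chart (one axis per pitch type).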

Zone Contact Maps

Spray charts and zone-based contact rates. Shows where each batter hits the ball and where pitchers attack.
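Zone-based contact rate is contacts divided by swings within each zone. The sketch below assumes a hypothetical 1 to 9 strike-zone grid and a `(zone, swung, made_contact)` row shape; neither is confirmed by the project.

```python
from collections import defaultdict


def contact_rate_by_zone(pitches):
    """Return {zone: contacts / swings} for zones with at least one swing.

    `pitches` is an iterable of (zone, swung, made_contact) tuples.
    Pitches that were taken are ignored because contact rate is
    measured per swing.
    """
    swings = defaultdict(int)
    contacts = defaultdict(int)
    for zone, swung, made_contact in pitches:
        if not swung:
            continue
        swings[zone] += 1
        if made_contact:
            contacts[zone] += 1
    return {zone: contacts[zone] / swings[zone] for zone in sorted(swings)}


# Made-up sample rows.
rows = [
    (5, True, True), (5, True, True), (5, True, False),
    (1, True, False), (1, False, False),
    (9, True, True),
]
rates = contact_rate_by_zone(rows)
print(rates)
```

Each zone's rate can then be used as the cell value in a heatmap-style zone chart.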

Lineup Efficiency

wOBA and wRC+ by batting order position. Identifies misaligned lineups and platoon opportunities.
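To make the lineup comparison concrete, here is a sketch of wOBA per batting-order slot using the standard formula, with unintentional walks in both numerator and denominator. The linear weights below are placeholders on roughly an MLB scale, not CPBL-derived values, and the stat dictionaries are invented. wRC+ additionally needs league and park context and is not shown.

```python
# Placeholder linear weights; a real implementation would derive
# season-specific weights from CPBL run values.
PLACEHOLDER_WEIGHTS = {
    "ubb": 0.69, "hbp": 0.72, "single": 0.88,
    "double": 1.25, "triple": 1.58, "hr": 2.03,
}


def woba(stats, weights=PLACEHOLDER_WEIGHTS):
    """wOBA = weighted offensive events / (AB + BB - IBB + SF + HBP).

    Returns None when the denominator is zero.
    """
    unintentional_walks = stats["bb"] - stats["ibb"]
    denominator = (stats["ab"] + unintentional_walks
                   + stats["sf"] + stats["hbp"])
    if denominator <= 0:
        return None
    numerator = (
        weights["ubb"] * unintentional_walks
        + weights["hbp"] * stats["hbp"]
        + weights["single"] * stats["single"]
        + weights["double"] * stats["double"]
        + weights["triple"] * stats["triple"]
        + weights["hr"] * stats["hr"]
    )
    return numerator / denominator


# Invented totals for two lineup slots.
slots = {
    1: {"ab": 100, "bb": 12, "ibb": 2, "sf": 1, "hbp": 1,
        "single": 20, "double": 6, "triple": 1, "hr": 3},
    4: {"ab": 110, "bb": 8, "ibb": 1, "sf": 2, "hbp": 0,
        "single": 18, "double": 7, "triple": 0, "hr": 6},
}
woba_by_slot = {slot: woba(line) for slot, line in slots.items()}
print({slot: round(value, 3) for slot, value in woba_by_slot.items()})
```

Comparing slot-level values against where a manager actually bats each hitter is one way to spot a misaligned lineup.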

Bullpen Load Monitor

Tracks appearances, IP, and rest days for relief pitchers. Flags overused arms before performance drops.
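One simple way to express the workload check is to count recent appearances and days of rest per reliever. The sketch below covers appearances and rest days only; the 7-day window and the threshold of four outings are hypothetical defaults chosen for illustration, not the project's actual rules.

```python
from datetime import date, timedelta


def bullpen_load(appearances, today, window_days=7, max_recent=4):
    """Summarize reliever workload.

    `appearances` maps pitcher name -> list of dates pitched (all on or
    before `today`). A pitcher is flagged when the number of outings in
    the trailing `window_days` reaches `max_recent`.
    """
    window_start = today - timedelta(days=window_days)
    report = {}
    for pitcher, dates in appearances.items():
        if not dates:
            continue
        recent = sum(1 for d in dates if window_start < d <= today)
        rest_days = (today - max(dates)).days
        report[pitcher] = {
            "recent_appearances": recent,
            "rest_days": rest_days,
            "flag": recent >= max_recent,
        }
    return report


# Invented outing logs.
logs = {
    "Reliever A": [date(2025, 6, 4), date(2025, 6, 6),
                   date(2025, 6, 7), date(2025, 6, 9)],
    "Reliever B": [date(2025, 6, 1), date(2025, 6, 8)],
}
report = bullpen_load(logs, today=date(2025, 6, 10))
print(report)
```

Innings pitched or pitch counts could be added to the same report as extra weighted inputs.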

Run Expectancy Matrix

Run expectancy for each of the 24 base-out states, built from CPBL-specific data. This fits the league's run environment better than borrowing an MLB run expectancy table.
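A run expectancy table averages, for every base-out state, the runs scored from that state through the end of the half-inning. The sketch below assumes each play is a hypothetical `(bases, outs, runs_on_play)` tuple describing the state before the play; the base-state strings are illustrative. Real implementations usually also drop incomplete half-innings such as walk-offs, which is omitted here.

```python
from collections import defaultdict


def run_expectancy(half_innings):
    """Return {(bases, outs): mean runs scored to end of half-inning}.

    `half_innings` is a list of half-innings, each an ordered list of
    (bases, outs, runs_on_play) tuples, where bases/outs describe the
    state *before* the play.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for plays in half_innings:
        # Runs still to come, including the current play.
        runs_remaining = sum(runs for _, _, runs in plays)
        for bases, outs, runs in plays:
            totals[(bases, outs)] += runs_remaining
            counts[(bases, outs)] += 1
            runs_remaining -= runs
    return {state: totals[state] / counts[state] for state in counts}


# Two invented half-innings: one with a two-run homer, one scoreless.
sample = [
    [("---", 0, 0), ("1--", 0, 2), ("---", 0, 0),
     ("---", 1, 0), ("---", 2, 0)],
    [("---", 0, 0), ("---", 1, 0), ("---", 2, 0)],
]
matrix = run_expectancy(sample)
print(matrix)
```

With a full season of play-by-play rows, the same loop fills all 24 states.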

Season Trajectory

Rolling 15-game averages for key metrics. Visualizes hot/cold streaks and team-level performance trends.
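A trailing rolling average can be sketched in plain Python as below. The function name and the choice to emit `None` until a full window is available are assumptions for the example; in pandas the equivalent would be `Series.rolling(15).mean()`.

```python
from collections import deque


def rolling_mean(values, window=15):
    """Trailing mean over the last `window` games.

    Emits None until a full window is available, so early-season
    points are not plotted from partial samples.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buffer = deque(maxlen=window)
    running_total = 0.0
    result = []
    for value in values:
        if len(buffer) == window:
            # The oldest game is about to drop out of the window.
            running_total -= buffer[0]
        buffer.append(value)
        running_total += value
        result.append(running_total / window if len(buffer) == window else None)
    return result


# Small window so the behavior is easy to check by hand.
print(rolling_mean([1, 2, 3, 4, 5], window=3))  # [None, None, 2.0, 3.0, 4.0]
```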

04

Tech Stack

Data Layer

  • CPBL official API
  • Custom scraper (Python)
  • SQLite via FastAPI
  • GitHub Actions (daily sync)

Backend

  • FastAPI
  • Python 3.12
  • Pandas
  • NumPy

Frontend

  • Vanilla JS
  • Apache ECharts 5
  • Responsive grid layout
  • Dark mode support

Infrastructure

  • GitHub Pages (static)
  • GitHub Actions CI/CD
  • Cloudflare CDN
  • R2 (asset storage)

05

Takeaways

What I learned

  • Building from raw data teaches you what analytics platforms hide. The CPBL API has field inversions, missing values, and non-standard encoding that only surface when you query at scale.

  • ECharts is the right tool for data-dense, interactive sports charts. Recharts and Chart.js both hit walls when rendering 112K data points with hover and drill-down.

  • A FastAPI + SQLite stack handles 19 endpoints, 28K plate appearances, and 112K pitches trivially. Over-engineering the backend is the most common mistake on portfolio analytics projects.

  • Presenting sabermetrics to a CPBL audience requires translation: xFIP means nothing without a familiar ERA reference point and a plain-language explanation.
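To illustrate how little backend the SQLite takeaway implies, here is a sketch of the kind of aggregate query an endpoint might run, using only Python's built-in `sqlite3` module. The `plate_appearances` table, its columns, and the sample rows are hypothetical; the FastAPI route that would wrap such a function is not shown.

```python
import sqlite3


def result_counts(conn, batter):
    """Return {result: count} for one batter's plate appearances."""
    rows = conn.execute(
        "SELECT result, COUNT(*) FROM plate_appearances "
        "WHERE batter = ? GROUP BY result ORDER BY result",
        (batter,),  # parameterized to avoid SQL injection
    ).fetchall()
    return dict(rows)


# In-memory database with an invented schema and invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plate_appearances (batter TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO plate_appearances VALUES (?, ?)",
    [
        ("Batter A", "single"),
        ("Batter A", "strikeout"),
        ("Batter A", "single"),
        ("Batter B", "walk"),
    ],
)
conn.commit()

counts = result_counts(conn, "Batter A")
print(counts)  # {'single': 2, 'strikeout': 1}
```

At tens of thousands of rows, a single-file database queried this way needs no connection pool, cache, or separate database server.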

Want to talk sports data?

Get in touch →