Baseball DataBank's Statement of Purpose

Files available now

Table Definitions (Plain Text format - 26 KB)

Database in Comma-Delimited Form with Table Defs (Zipped Plain Text format - 8207 KB)

Database in mySQL Form (Zipped SQL format - 8566 KB)

Individual Tables in Comma-Delimited Form with Table Defs (Zipped Plain Text format)

Release Notes (last change 2011/03/28)

Consistency Query Results (last change 2011/03/28)

BDB Tables and row count

About this project

A volunteer effort to accumulate and redistribute baseball data in a convenient, easy-to-use form.

Mailing List

This is an open list that allows folks to discuss new features and bug fixes for the BDB.

Material below here is currently out of date

We are undergoing a big change in database format and in governance, so there will be considerable change over the next two months.

Ongoing Work

We don't want to duplicate work we don't need to, so this section lists the projects currently under way and the data accumulated so far by different project members. Contact project members through the mailing list, either with a public message or a private note.

  1. SeriesGames - line scores of all postseason games - Sean Forman
  2. PitchersStarts - box score lines for all starts from 1980-2001 - Sean Forman
  3. Umpire Data - A listing of all umpires and the years they worked, linked to playing umpires as well - Mike Crain and others
  4. Hall of Fame Voting Results - Vote totals from each HOF election - Sean Lahman, Mike Crain and others.
  5. Executives - List of executives for each team - Mike Crain
  6. Coaches - List of each team's coaches - Mike Crain
  7. Japanese stats - Assorted batting and pitching stats for Japanese league players - Derek Adair
  8. No hitters - List of all no-hitters thrown, keyed to retrosheetGameIDs - Mike Crain, Sean Lahman, Derek Adair
  9. Ineligible List - players deemed ineligible by the commissioner's office - Mike Crain
Some additional data may be available on the Data Page or in the files page of the e-mail group.

Proposed E-mail Subject Lines

Since we have a bunch of folks posting errors, and posting the same error multiple times isn't really productive, I'm going to propose the following convention for subject lines.

ERROR - teams - ID LAN used for 2001 instead of LAD

First we show that we are posting an error, second the table or tables involved, and third a brief description of the error. In this case, I'm just pointing out that the Dodgers have a different team ID for 2001 than for the previous 40 years.

Unless you can't avoid it, I think posting separate notes for separate tables is the way to go. That will make it easier for people to tick off corrections to their databases as they come in.

In addition to ERROR, you can come up with whatever terms seem appropriate.

Some possibilities (though an organic evolution is more likely to stick):

SCHEMA - regarding suggestions about table and db structure

NEWDATA - newly available information, like someone died
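For anyone who wants to sort list mail automatically, here is a minimal Python sketch of how a subject line in the proposed form could be split into its three parts. Nothing in it is part of the BDB itself: the helper name `parse_subject`, the `Subject` record, and the idea that several tables would be separated by commas or slashes are all assumptions made for illustration.

```python
import re
from typing import List, NamedTuple, Optional

# Tags mentioned on this page; others may evolve on the list (assumption:
# a tag is a single run of capital letters).
KNOWN_TAGS = {"ERROR", "SCHEMA", "NEWDATA"}


class Subject(NamedTuple):
    tag: str            # e.g. "ERROR"
    tables: List[str]   # e.g. ["teams"]
    description: str    # free-text summary


# TAG - table(s) - description, with " - " as the separator.
_SUBJECT_RE = re.compile(r"^\s*([A-Z]+)\s+-\s+(.+?)\s+-\s+(.+?)\s*$")


def parse_subject(line: str) -> Optional[Subject]:
    """Return the parts of a subject line, or None if it does not fit."""
    match = _SUBJECT_RE.match(line)
    if match is None:
        return None
    tag, tables, description = match.groups()
    # Hypothetical convention: multiple tables split on "," or "/".
    table_list = [t.strip() for t in re.split(r"[,/]", tables) if t.strip()]
    return Subject(tag, table_list, description)


if __name__ == "__main__":
    parsed = parse_subject("ERROR - teams - ID LAN used for 2001 instead of LAD")
    print(parsed)
    if parsed is not None and parsed.tag not in KNOWN_TAGS:
        print("note: unfamiliar tag", parsed.tag)
```

Lines that don't follow the convention simply return None, so ordinary discussion threads are left alone.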


Page maintained by Sean Forman

