boxball
baseballr
| | boxball | baseballr |
|---|---|---|
| Mentions | 7 | 16 |
| Stars | 110 | 353 |
| Growth | - | - |
| Activity | 5.5 | 7.4 |
| Last commit | 5 months ago | 21 days ago |
| Language | Python | R |
| License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
boxball
-
Importing Retrosheet to Tableau or Power BI
I haven't worked with BI tools, but I'll make the same recommendation for Retrosheet data that I always do: check out the Boxball distributions, specifically the CSV files (or the parquet files if you prefer). You can load those right in, rather than futzing with Retrosheet's event files and processing them manually, etc., just to get to the same spot.
-
Delta Aging Curve Python
The original data source is the Baseball Databank (aka "The Lahman Database"). I use the Boxball distributions, which include both retrosheet and Baseball Databank data.
-
What's the current standard for getting mlb data into a database? I used to use Baseball On a Stick (which accessed the old gameday data) but that doesn't work now. I found "mlbdata" which accesses the mlb API but I can't figure out how to make it put stuff into a database. Is there a good option?
-
How do I compile a list of a team's games where event x did not happen?
You know, I thought about that after posting, that the dailies might not be available directly from retrosheet. I use the Boxball distributions, which take the retrosheet data and process it into database images. These distributions include a "daily" table with the daily logs. So apparently Boxball is generating that table.
-
Aggregate Game Logs
I used to think the same thing, and then I found the Boxball pre-built images. You still have to be a bit handy to get them working, but it's way easier than starting from scratch with raw retrosheet data. I initially used the postgres image and worked in SQL, and then switched to python/pandas and the parquet files.
-
Finding a player's stats through N games
I've always been good with data, math, Excel, etc., but decided to take it to another level during the pandemic. The most difficult thing about Retrosheet data is getting it into a usable format. Luckily, the boxball project has created a number of ready-to-download images with all of the data, for different database technologies.
-
RE24 Data for wOBA Calculations
For Retrosheet data, I've been using pre-built images from Boxball, rather than going through the whole process of downloading and converting the files from retrosheet. The Boxball images are awesome. I mostly use the Parquet files for Pandas, but they have other formats (e.g., postgres Docker images, CSV, etc) there.
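Several of the posts above describe loading prepared CSV files into a database and then querying them, for example to list a team's games where some event did not happen. The following is a minimal sketch of that idea using only the Python standard library. The table names (`games`, `events`), the column names, and the sample rows are hypothetical and are not Boxball's actual schema; a real Boxball CSV would need its own file and column names substituted.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table, csv_file):
    """Create `table` from a CSV file object (header row = column names)."""
    reader = csv.reader(csv_file)
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


def games_without_event(conn, team, event_type):
    """Return IDs of the team's home games that have no event of the given type."""
    sql = """
        SELECT g.game_id
        FROM games AS g
        WHERE g.home_team = ?
          AND NOT EXISTS (
              SELECT 1 FROM events AS e
              WHERE e.game_id = g.game_id AND e.event_type = ?
          )
        ORDER BY g.game_id
    """
    return [row[0] for row in conn.execute(sql, (team, event_type))]


# Hypothetical sample data standing in for downloaded CSV files.
games_csv = io.StringIO("game_id,home_team\nG1,SEA\nG2,SEA\nG3,OAK\n")
events_csv = io.StringIO("game_id,event_type\nG1,HR\nG1,BB\nG2,BB\nG3,HR\n")

conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(conn, "games", games_csv)
load_csv_into_sqlite(conn, "events", events_csv)

print(games_without_event(conn, "SEA", "HR"))
```

With the sample rows above, only game `G2` is a SEA home game with no `HR` event. The `NOT EXISTS` subquery is one common way to express "event x did not happen"; a `LEFT JOIN` with an `IS NULL` filter would be an equivalent alternative.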
baseballr
-
[General Discussion] Around the Horn - 12/11/23
A basic understanding of R should be enough if you install the baseballr package. From there you can scrape Baseball Reference or FanGraphs for custom date ranges to get stats over whatever time frame you like. Then you can export or copy the results to Excel if you want, or do the analysis right in R.
-
Are the 2023 Yankees too dependent on Judge (and maybe Stanton)? (a) Judge/Stanton Active: .562 W-L% in 16G, 4.8 R/G (b) Judge Active, Stanton IL: .636 W-L% in 33G, 5.0 R/G (c) Judge IL, Stanton Active: .438 W-L% in 16G, 3.4 R/G (d) Judge/Stanton IL: .400 W-L% in 10G, 3.5 R/G (Source: MLB Stats API)
Source: MLB Stats API via baseballr.
-
Scraping Minor League Stats?
I like this idea, too! I use baseballr all the time. It is a godsend.
-
Question on data scraping
In order to make it, I need to get every lineup from every game in the season. I am using the baseballr package to get the game_pk number. Each game has a game_pk number, and each lineup is tied to that game_pk. So I need to create a dataframe (all_games_list) for each game with its game_pk number in it, and then use the game_pk numbers to create a new dataframe (lineup_all) that contains the lineup for said game_pk.
-
Is there a stat or a program where I can see which pitchers, during game deficits or leads, gave up a few runs due to walks, hits, RBIs? How would I go about filtering it out? I don’t mean starting pitchers or anything like that, I mean pitchers that came in for one inning and gave up 4 runs.
I just remembered there is also this R package: Acquiring and Analyzing Baseball Data • baseballr.
-
[Doyle] Multiple sources: The Seattle Mariners are calling up right-handed pitcher Bryce Miller. He will start Tuesday against Oakland.
To get all the data, I would suggest checking out baseballr if you are familiar with R. https://billpetti.github.io/baseballr/
-
[OC] The New MLB Pitch Clock is Fixing Baseball's Pace-of-Play Crisis
Visualization originally posted on my blog - I built the boxplot using R and ggplot2, and was fortunate to be able to use the excellent baseballr package to query MLB game information for the runtime source data!
-
Help!! Dataset required for Supervised Linear Regression | Learning purposes
baseballR (baseball)
-
MLB Stats API Application time?
most folks without direct access to mlb's api scrape baseball savant's data api. packages like baseballr or pybaseball can help with this. remember, this is in the open on a trust model: no commercial use, and don't hammer the api.
-
Where to get started analyzing basic baseball metrics
If you're using R, this is the gold standard package to use for getting baseball data. This helps you scrape data.
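Two of the posts above describe a common pattern: collect a `game_pk` for every game, fetch the lineup tied to each `game_pk`, and avoid hammering the API while doing so. Below is a rough Python sketch of that loop. The posts do this in R with baseballr; here `fetch_lineup` is a hypothetical callable supplied by the caller, not a real baseballr or MLB Stats API function, and the game IDs and player names are made up for illustration.

```python
import time


def collect_lineups(game_pks, fetch_lineup, delay_seconds=1.0, sleep=time.sleep):
    """Build one row per (game_pk, batting slot), pausing between requests.

    `fetch_lineup` is a hypothetical callable: it takes a game_pk and
    returns that game's batting order as a list of player names.
    """
    rows = []
    for index, game_pk in enumerate(game_pks):
        if index > 0:
            sleep(delay_seconds)  # space out calls so the API is not hammered
        for slot, player in enumerate(fetch_lineup(game_pk), start=1):
            rows.append(
                {"game_pk": game_pk, "batting_order": slot, "player": player}
            )
    return rows


# Made-up data standing in for real API responses.
fake_lineups = {
    1001: ["Player A", "Player B"],
    1002: ["Player C", "Player D"],
}

lineup_all = collect_lineups(
    game_pks=[1001, 1002],
    fetch_lineup=fake_lineups.__getitem__,
    delay_seconds=0,  # no need to wait when nothing is fetched over the network
)
print(len(lineup_all))
```

Passing the fetcher and the sleep function in as arguments keeps the loop testable without network access; in real use the fetcher would wrap whichever client library is doing the requests, and the delay would be set to something the data provider is comfortable with.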
What are some alternatives?
pybaseball - Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
ElectroCRUD - Database CRUD Application Built on Electron | MySQL, Postgres, SQLite
mlbplotR - R package to easily plot MLB logos
chadwick - Chadwick tools for manipulating baseball data
ggplot2 - An implementation of the Grammar of Graphics in R
baseball_sql - SQL scripts for working with the baseball data from retrosheet and baseball-databank, as provided by boxball
baseballr - A package written for R focused on baseball analysis. Currently in development.
condenser - Condenser is a database subsetting tool
tidycensus - Load US Census boundary and attribute data as 'tidyverse' and 'sf'-ready data frames in R
MLB-StatsAPI - Python wrapper for MLB Stats API
upm - ⠕ Universal Package Manager - Python, Node.js, Ruby, Emacs Lisp.