boxball
condenser
Metric | boxball | condenser
---|---|---
Mentions | 7 | 14
Stars | 110 | 299
Growth | - | 1.3%
Activity | 5.5 | 1.7
Latest commit | 5 months ago | 5 months ago
Language | Python | Python
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
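The exact formulas behind these numbers are not given here, so the following is only a sketch of how such metrics could be computed. The half-life weighting, the percentile mapping, and all function names are assumptions made for illustration, not the site's actual method.

```python
from bisect import bisect_left


def star_growth(previous: int, current: int) -> float:
    """Month-over-month growth in stars, as a percentage."""
    if previous <= 0:
        raise ValueError("previous star count must be positive")
    return (current - previous) / previous * 100.0


def weighted_commits(commit_ages_days, half_life_days=90.0):
    """Recency-weighted commit count.

    Assumption: a commit loses half its weight every `half_life_days`,
    so recent commits count more than older ones.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_score(project_value, all_values):
    """Map a value onto a 0-10 scale by percentile rank.

    Assumption: a score of 9.0 means the project ranks above 90% of the
    tracked projects, i.e. it sits in the top 10%.
    """
    ranked = sorted(all_values)
    below = bisect_left(ranked, project_value)
    return round(10.0 * below / len(ranked), 1)


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the table above.
    print(star_growth(200, 203))
    print(weighted_commits([0, 90]))
    print(activity_score(91, range(1, 101)))
```

Ranking by percentile rather than by raw commit counts keeps the score relative, which matches the description of activity as a relative number.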
boxball
- Importing Retrosheet to Tableau or Power BI
  I haven't worked with BI tools, but I'll make the same recommendation for Retrosheet data that I always do: check out the Boxball distributions, specifically the CSV files (or the Parquet files if you prefer). You can load those right in, rather than futzing with Retrosheet's event files and processing them manually just to get to the same spot.
- Delta Aging Curve Python
  The original data source is the Baseball Databank (aka "The Lahman Database"). I use the Boxball distributions, which include both Retrosheet and Baseball Databank data.
- What's the current standard for getting MLB data into a database? I used to use Baseball On a Stick (which accessed the old Gameday data), but that doesn't work now. I found "mlbdata", which accesses the MLB API, but I can't figure out how to make it put stuff into a database. Is there a good option?
- How do I compile a list of a team's games where event x did not happen?
  You know, I thought about that after posting, that the dailies might not be available directly from Retrosheet. I use the Boxball distributions, which take the Retrosheet data and process it into database images. These distributions include a "daily" table with the daily logs. So apparently Boxball is generating that table.
- Aggregate Game Logs
  I used to think the same thing, and then I found the Boxball pre-built images. You still have to be a bit handy to get them working, but it's way easier than starting from scratch with raw Retrosheet data. I initially used the Postgres image and worked in SQL, and then switched to Python/pandas and the Parquet files.
- Finding a player's stats through N games
  I've always been good with data, math, Excel, etc., but decided to take it to another level during the pandemic. The most difficult thing about Retrosheet data is getting it into a usable format. Luckily, the Boxball project has created a number of ready-to-download images with all of the data, for different database technologies.
- RE24 Data for wOBA Calculations
  For Retrosheet data, I've been using pre-built images from Boxball, rather than going through the whole process of downloading and converting the files from Retrosheet. The Boxball images are awesome. I mostly use the Parquet files for pandas, but they have other formats (e.g., Postgres Docker images, CSV, etc.) there.
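Several of the posts above describe loading a pre-built CSV file and querying it, for example to total a player's stats through N games. A minimal sketch of that workflow with Python's standard library follows. The table name `daily` comes from one of the posts, but the columns (`player_id`, `game_date`, `ab`, `h`) and the sample rows are invented for illustration and are not Boxball's actual schema.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for a downloaded CSV file; a real
# distribution would have different and far more numerous columns.
SAMPLE_CSV = """player_id,game_date,ab,h
doejo001,2021-04-01,4,2
doejo001,2021-04-02,3,0
doejo001,2021-04-03,5,3
roeja001,2021-04-01,4,1
"""


def load_daily(conn, csv_file):
    """Load per-game batting lines from a CSV file object into SQLite."""
    conn.execute(
        "CREATE TABLE daily "
        "(player_id TEXT, game_date TEXT, ab INTEGER, h INTEGER)"
    )
    rows = [
        (r["player_id"], r["game_date"], int(r["ab"]), int(r["h"]))
        for r in csv.DictReader(csv_file)
    ]
    conn.executemany("INSERT INTO daily VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def totals_through_n_games(conn, player_id, n):
    """Return (at_bats, hits) summed over the player's first n games."""
    return conn.execute(
        """
        SELECT COALESCE(SUM(ab), 0), COALESCE(SUM(h), 0)
        FROM (
            SELECT ab, h FROM daily
            WHERE player_id = ?
            ORDER BY game_date
            LIMIT ?
        )
        """,
        (player_id, n),
    ).fetchone()


conn = sqlite3.connect(":memory:")
load_daily(conn, io.StringIO(SAMPLE_CSV))
at_bats, hits = totals_through_n_games(conn, "doejo001", 2)
print(at_bats, hits)
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle, and the column list adjusted to the actual header.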
condenser
- Jailer, a unique open-source database tool
  [1]: https://github.com/TonicAI/condenser
- Ask HN: What are the open source tools for database subsetting?
  [1]: https://github.com/TonicAI/condenser
- Recommendation for tool or script for sanitizing data
  This may be overkill, but we have used Tonic for this: https://www.tonic.ai/
- Is it atypical to have a dev DB service on your local environment?
  A tool like https://www.tonic.ai/ might help.
- Launch HN: JumpWire (YC W22) – Easily encrypt customer data in your databases
- Anonymize test data?
  I attended a presentation a month or so ago where a co-worker was advocating for Tonic. I've never used it myself, but it definitely looks as though it would cover your bases. I do agree with onomazein, though, that this should really be handled on the infrastructure side if at all possible. Devs, testers, and anyone else should be able to pull from an already anonymized location, but someone else should be responsible for setting up the environment and the initial synchronization.
- What's the coolest automation tool you've built or been involved in?
  So I built a GitLab pipeline to create a backup of this upstream DB without downtime using various SQL utils. This archive is then staged into an image so the data will unpack and load on startup. I then used a data subsetter called condenser to create datasets for certain use cases. Now devs can load reliable dev data quicker, test against data that QA uses but within their unique envs (local and preview), and create datasets for their own use cases.
- Preserve the unique relationships between data columns while wiping sensitive information from those columns using randomization.
- Don't let your test data suffer - meet the Tonic and Google BigQuery partnership.
- Don't let your test data suffer - meet the Tonic.ai and Amazon Redshift partnership for Real. Fake. Data.
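The posts above describe database subsetting only in general terms. The sketch below illustrates the core idea: copy a sample of parent rows plus only the child rows that reference them, so foreign keys stay valid in the smaller copy. It is not condenser's algorithm or API; the `users`/`orders` schema and the sampling rule are hypothetical.

```python
import sqlite3

# Hypothetical two-table schema with one foreign key.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    total REAL
);
"""


def build_source():
    """Create an in-memory source database with invented sample rows."""
    src = sqlite3.connect(":memory:")
    src.executescript(SCHEMA)
    src.executescript("""
        INSERT INTO users VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
        INSERT INTO orders VALUES
            (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0), (13, 4, 9.0);
    """)
    return src


def subset(src, keep_every=2):
    """Copy users whose id is divisible by keep_every, plus their orders."""
    users = src.execute(
        "SELECT id, name FROM users WHERE id % ? = 0", (keep_every,)
    ).fetchall()
    kept_ids = [row[0] for row in users]

    orders = []
    if kept_ids:
        placeholders = ",".join("?" * len(kept_ids))
        orders = src.execute(
            "SELECT id, user_id, total FROM orders "
            f"WHERE user_id IN ({placeholders})",
            kept_ids,
        ).fetchall()

    dst = sqlite3.connect(":memory:")
    dst.execute("PRAGMA foreign_keys = ON")  # reject dangling references
    dst.executescript(SCHEMA)
    # Parents first, then children, so every foreign key resolves.
    dst.executemany("INSERT INTO users VALUES (?, ?)", users)
    dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    dst.commit()
    return dst


src = build_source()
dst = subset(src, keep_every=2)
print(dst.execute("SELECT id FROM users ORDER BY id").fetchall())
print(dst.execute("SELECT id FROM orders ORDER BY id").fetchall())
```

Real schemas have longer and sometimes circular dependency chains, which is where a dedicated subsetting tool earns its keep; this sketch handles a single parent-child relationship only.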
What are some alternatives?
pybaseball - Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
Replibyte - Seed your development database with real data ⚡️
ElectroCRUD - Database CRUD Application Built on Electron | MySQL, Postgres, SQLite
prisma-field-encryption - Transparent field-level encryption at rest for Prisma
chadwick - Chadwick tools for manipulating baseball data
wordscapes-bot - A Bot that Completes Levels on the Videogame WordScapes
baseball_sql - SQL scripts for working with the baseball data from retrosheet and baseball-databank, as provided by boxball
sitcom-simulator-cli - A tool that combines GPT-3, Stable Diffusion, and FakeYou to create fully automated video. [Moved to: https://github.com/joshmoody24/sitcom-simulator]
baseballr - A package written for R focused on baseball analysis. Currently in development.
infrastructure-tools - JumpWire deployment and installation scripts
MLB-StatsAPI - Python wrapper for MLB Stats API
Docker Compose - Define and run multi-container applications with Docker