boxball vs condenser

boxball

Prebuilt Docker images with Retrosheet's complete baseball history data for many analytical frameworks. Includes Postgres, cstore_fdw, MySQL, SQLite, Clickhouse, Drill, Parquet, and CSV. (by droher)


condenser

Condenser is a database subsetting tool (by TonicAI)

Database Subsetter Testing Postgresql Postgres Subsetting MySQL


tonic.ai



Project          boxball               condenser
Mentions         7                     14
Stars            110                   299
Growth           -                     1.3%
Activity         5.5                   1.7
Latest Commit    5 months ago          5 months ago
Language         Python                Python
License          Apache License 2.0    MIT License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

boxball

Posts with mentions or reviews of boxball. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-01-31.

Importing Retrosheet to Tableau or Power BI
1 project | /r/Sabermetrics | 26 Apr 2022

I haven't worked with BI tools, but I'll make the same recommendation for Retrosheet data that I always do: check out the Boxball distributions, specifically the CSV files (or the parquet files if you prefer). You can load those right in, rather than futzing with Retrosheet's event files and processing them manually, etc., just to get to the same spot.
Delta Aging Curve Python
1 project | /r/Sabermetrics | 26 Apr 2022

The original data source is the Baseball Databank (aka "The Lahman Database"). I use the Boxball distributions, which include both retrosheet and Baseball Databank data.
What's the current standard for getting mlb data into a database? I used to use Baseball On a Stick (which accessed the old gameday data) but that doesn't work now. I found "mlbdata" which accesses the mlb API but I can't figure out how to make it put stuff into a database. Is there a good option?
2 projects | /r/Sabermetrics | 31 Jan 2022
How do I compile a list of a team's games where event x did not happen?
2 projects | /r/Sabermetrics | 6 Oct 2021

You know, I thought about that after posting, that the dailies might not be available directly from retrosheet. I use the Boxball distributions, which take the retrosheet data and process it into database images. These distributions include a "daily" table with the daily logs. So apparently Boxball is generating that table.
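
The question in this thread (a team's games in which some event did not happen) is a classic anti-join. A minimal sketch using Python's built-in sqlite3 module follows. The `game` and `event` tables, their columns, and the sample rows are hypothetical stand-ins made up for illustration; they are not Boxball's actual schema.

```python
import sqlite3

# In-memory database with a hypothetical, simplified schema (NOT Boxball's).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE game (game_id TEXT PRIMARY KEY, home_team TEXT, away_team TEXT);
    CREATE TABLE event (game_id TEXT, batting_team TEXT, event_type TEXT);
    INSERT INTO game VALUES
        ('G1', 'NYA', 'BOS'), ('G2', 'BOS', 'NYA'), ('G3', 'NYA', 'TOR');
    INSERT INTO event VALUES
        ('G1', 'NYA', 'home_run'), ('G1', 'BOS', 'single'),
        ('G2', 'NYA', 'single'),   ('G2', 'BOS', 'home_run'),
        ('G3', 'TOR', 'double');
""")

def games_without_event(conn, team, event_type):
    """List the team's games in which the team never produced event_type."""
    query = """
        SELECT g.game_id
        FROM game AS g
        WHERE ? IN (g.home_team, g.away_team)      -- the team played in the game
          AND NOT EXISTS (                         -- and no matching event exists
              SELECT 1 FROM event AS e
              WHERE e.game_id = g.game_id
                AND e.batting_team = ?
                AND e.event_type = ?)
        ORDER BY g.game_id
    """
    return [row[0] for row in conn.execute(query, (team, team, event_type))]

print(games_without_event(conn, "NYA", "home_run"))  # ['G2', 'G3']
```

Using NOT EXISTS rather than filtering on the event rows directly matters here: a game with no events of the wanted type produces no row to filter on, so it has to be found by its absence.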
Aggregate Game Logs
1 project | /r/Sabermetrics | 22 Jun 2021

I used to think the same thing, and then I found the Boxball pre-built images. You still have to be a bit handy to get them working, but it's way easier than starting from scratch with raw retrosheet data. I initially used the postgres image and worked in SQL, and then switched to python/pandas and the parquet files.
Finding a player's stats through N games
2 projects | /r/Sabermetrics | 3 Mar 2021

I've always been good with data, math, Excel, etc., but decided to take it to another level during the pandemic. The most difficult thing about Retrosheet data is getting into a usable format. Luckily, the boxball project has created a number of ready-to-download images with all of the data, for different database technologies.
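
Once per-game rows are available in a usable format, "stats through N games" comes down to sorting a player's games chronologically and summing the first N. A minimal sketch, assuming a hypothetical game log of (date, at-bats, hits) tuples; the data and the `stats_through` helper are invented for illustration and do not come from Boxball or Retrosheet.

```python
from itertools import islice

# Hypothetical per-game batting lines for one player: (game_date, at_bats, hits).
GAME_LOG = [
    ("2019-04-03", 5, 0),
    ("2019-04-01", 4, 1),
    ("2019-04-05", 4, 3),
    ("2019-04-02", 3, 2),
]

def stats_through(game_log, n):
    """Total at-bats and hits, plus batting average, over the first n games."""
    # ISO dates sort chronologically as plain strings.
    first_n = list(islice(sorted(game_log), n))
    at_bats = sum(ab for _, ab, _ in first_n)
    hits = sum(h for _, _, h in first_n)
    avg = hits / at_bats if at_bats else 0.0
    return {"games": len(first_n), "at_bats": at_bats,
            "hits": hits, "avg": round(avg, 3)}

print(stats_through(GAME_LOG, 3))
# {'games': 3, 'at_bats': 12, 'hits': 3, 'avg': 0.25}
```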
RE24 Data for wOBA Calculations
3 projects | /r/Sabermetrics | 11 Jan 2021

For Retrosheet data, I've been using pre-built images from Boxball, rather than going through the whole process of downloading and converting the files from retrosheet. The Boxball images are awesome. I mostly use the Parquet files for Pandas, but they have other formats (e.g., postgres Docker images, CSV, etc) there.

condenser

Posts with mentions or reviews of condenser. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-08-07.

Jailer, a unique open-source database tool
3 projects | news.ycombinator.com | 7 Aug 2023

[1]: https://github.com/TonicAI/condenser
Ask HN: What are the open source tools for database subsetting?
2 projects | news.ycombinator.com | 17 May 2023

[1]: https://github.com/TonicAI/condenser
Recommendation for tool or script for sanitizing data
1 project | /r/SQLServer | 12 Jan 2023

This may be overkill but we have used Tonic for this: https://www.tonic.ai/
Is it atypical to have a dev DB service on your local environment?
1 project | /r/ExperiencedDevs | 1 Dec 2022

A tool like https://www.tonic.ai/ might help.
Launch HN: JumpWire (YC W22) – Easily encrypt customer data in your databases
4 projects | news.ycombinator.com | 1 Dec 2022
Anonymize test data?
1 project | /r/softwaretesting | 30 Nov 2022

I attended a presentation a month or so ago where a co-worker was advocating for Tonic. I've never used it myself, but it definitely looks as though it would cover your bases. I do agree with onomazein, though, that this should really be handled on the infrastructure side if at all possible. Devs, testers, and anyone else should be able to pull from an already anonymized location, but someone else should be responsible for setting up the environment and the initial synchronization.
What's the coolest automation tool you've built or been involved in?
8 projects | /r/Python | 2 Nov 2022

So I built a gitlab pipeline to create a backup of this upstream db without downtime using various SQL utils. This archive is then staged into an image so the data will unpack and load on startup. I then used a data subsetter called condenser to create datasets for certain use cases. Now devs can load reliable dev data quicker, test against data that QA uses but within their unique envs (local and preview) and create datasets for their own use cases.
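
The general idea behind database subsetting, as used in the pipeline above, is to pick some rows from a target table and then pull in every row they reference so that foreign keys still resolve. The sketch below illustrates that idea only; it is not Condenser's implementation or configuration format, and the `customer` and `purchase` tables and the `subset` helper are hypothetical.

```python
import sqlite3

# Hypothetical source database with one foreign-key relationship.
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(id),
                           total REAL);
    INSERT INTO customer VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO purchase VALUES (10, 1, 5.0), (11, 2, 7.5), (12, 2, 1.0), (13, 3, 9.0);
""")

def subset(src, purchase_ids):
    """Return the chosen purchase rows plus every customer row they reference."""
    marks = ",".join("?" * len(purchase_ids))
    purchases = src.execute(
        f"SELECT id, customer_id, total FROM purchase "
        f"WHERE id IN ({marks}) ORDER BY id",
        list(purchase_ids),
    ).fetchall()

    # Follow the foreign key upstream so no purchase is left without its customer.
    customer_ids = sorted({row[1] for row in purchases})
    marks = ",".join("?" * len(customer_ids))
    customers = src.execute(
        f"SELECT id, name FROM customer WHERE id IN ({marks}) ORDER BY id",
        customer_ids,
    ).fetchall()
    return {"customer": customers, "purchase": purchases}

print(subset(src, [11, 12]))
# {'customer': [(2, 'b')], 'purchase': [(11, 2, 7.5), (12, 2, 1.0)]}
```

A real subsetter has to repeat this walk across many tables and in both directions of each relationship; the single hop here only shows why the referenced rows must travel with the sample.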
Preserve the unique relationships between data columns while wiping sensitive information from those columns using randomization.
1 project | /r/u_Tonic_ai | 21 Sep 2022
Don't let your test data suffer - meet the Tonic and Google BigQuery partnership.
1 project | /r/u_Tonic_ai | 21 Sep 2022
Don't let your test data suffer - meet the Tonic.ai and Amazon Redshift partnership for Real. Fake. Data.
1 project | /r/u_Tonic_ai | 21 Sep 2022

What are some alternatives?

When comparing boxball and condenser you can also consider the following projects:

pybaseball - Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)

Replibyte - Seed your development database with real data ⚡️

ElectroCRUD - Database CRUD Application Built on Electron | MySQL, Postgres, SQLite

prisma-field-encryption - Transparent field-level encryption at rest for Prisma

chadwick - Chadwick tools for manipulating baseball data

wordscapes-bot - A Bot that Completes Levels on the Videogame WordScapes

baseball_sql - SQL scripts for working with the baseball data from retrosheet and baseball-databank, as provided by boxball

sitcom-simulator-cli - A tool that combines GPT-3, Stable Diffusion, and FakeYou to create fully automated video. [Moved to: https://github.com/joshmoody24/sitcom-simulator]

baseballr - A package written for R focused on baseball analysis. Currently in development.

infrastructure-tools - JumpWire deployment and installation scripts

MLB-StatsAPI - Python wrapper for MLB Stats API

Docker Compose - Define and run multi-container applications with Docker
