json-buffet VS Apache Arrow

Compare json-buffet and Apache Arrow and see how they differ.

Apache Arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics (by apache)

              json-buffet        Apache Arrow
Mentions      2                  82
Stars         0                  14,508
Growth        -                  1.0%
Activity      3.0                10.0
Last commit   over 1 year ago    1 day ago
Language      C++                C++
License       MIT License        Apache License 2.0

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

json-buffet

Posts with mentions or reviews of json-buffet. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-03-18.
  • Analyzing multi-gigabyte JSON files locally
    14 projects | news.ycombinator.com | 18 Mar 2023
    And here's the code: https://github.com/multiversal-ventures/json-buffet

    The API isn't the best. I'd have preferred an iterator based solution as opposed to this callback based one. But we worked with what rapidjson gave us for the proof of concept.

  • Show HN: Up to 100x Faster FastAPI with simdjson and io_uring on Linux 5.19
    20 projects | news.ycombinator.com | 6 Mar 2023
    Ha! Thanks to you, today I found out how big those uncompressed JSON files really are (the data wasn't accessible to me, so I shared the tool with my colleague and he was the one who ran the queries on his laptop): https://www.dolthub.com/blog/2022-09-02-a-trillion-prices/ .

    And yep, it was more or less the way you did it with ijson. I found ijson just a day after I finished the prototype. Rapidjson would probably be faster, especially after enabling SIMD. But the indexing was a one-time thing.

    We have open sourced the codebase. Here's the link: https://github.com/multiversal-ventures/json-buffet . Since this was a quick and dirty prototype, comments were sparse. I have updated the Readme, and added a sample json-fetcher. Hope this is more useful for you.

    Another unwritten TODO was to nudge the data providers towards more streaming-friendly compression formats, and then just create an index to fetch the data directly from their compressed archives. That would have saved everyone a LOT of $$$.
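
The indexing idea in the posts above (scan the file once, then fetch individual records directly) can be sketched roughly as follows. This is a minimal illustration, not json-buffet's code: it assumes uncompressed JSON Lines input rather than the compressed archives the post mentions, and the names `build_offset_index` and `fetch`, as well as the sample `code` and `price` fields, are hypothetical.

```python
import io
import json


def build_offset_index(stream, key):
    """Scan a JSON Lines byte stream once and map record[key] -> byte offset."""
    index = {}
    while True:
        offset = stream.tell()      # position where the next record starts
        line = stream.readline()
        if not line:                # end of stream
            break
        if line.strip():            # skip blank lines
            record = json.loads(line)
            index[record[key]] = offset
    return index


def fetch(stream, index, value):
    """Seek straight to one record instead of re-parsing the whole file."""
    stream.seek(index[value])
    return json.loads(stream.readline())


# Hypothetical sample data standing in for a multi-gigabyte file.
data = io.BytesIO(
    b'{"code": "A1", "price": 10}\n'
    b'{"code": "B2", "price": 25}\n'
    b'{"code": "C3", "price": 40}\n'
)

index = build_offset_index(data, "code")   # one-time pass
record = fetch(data, index, "B2")          # direct lookup afterwards
print(record)
```

The one-time scan costs a full pass over the data, but every later lookup only reads the bytes of the record it needs. Doing the same over a compressed archive would additionally require a compression format that supports seeking, which is the "streaming-friendly" property the post asks for.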

Apache Arrow

Posts with mentions or reviews of Apache Arrow. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-10-30.
  • Using Polars in Rust for high-performance data analysis
    9 projects | dev.to | 30 Oct 2024
    One of the main selling points of Polars over similar solutions such as Pandas is performance. Polars is written in highly optimized Rust and uses the Apache Arrow container format.
  • Kotlin DataFrame ❤️ Arrow
    3 projects | dev.to | 10 Oct 2024
    Kotlin DataFrame v0.14 comes with improvements for reading Apache Arrow format, especially loading a DataFrame from any ArrowReader. This improvement can be used to easily load results from analytical databases (such as DuckDB, ClickHouse) directly into Kotlin DataFrame.
  • Random access string compression with FSST and Rust
    3 projects | news.ycombinator.com | 12 Sep 2024
  • Declarative Multi-Engine Data Stack with Ibis
    6 projects | dev.to | 17 Jul 2024
    Apache Arrow
  • Shades of Open Source - Understanding The Many Meanings of "Open"
    9 projects | dev.to | 15 Jun 2024
    It's this kind of certainty that underscores the vital role of the Apache Software Foundation (ASF). Many first encounter Apache through its pioneering project, the open-source web server framework that remains ubiquitous in web operations today. The ASF was initially created to hold the intellectual property and assets of the Apache project, and it has since evolved into a cornerstone for open-source projects worldwide. The ASF enforces strict standards for diverse contributions, independence, and activity in its projects, ensuring they can withstand the test of time as standards in software development. Many open-source projects strive to become Apache projects to gain the community credibility necessary for adoption as standard software building blocks, such as Apache Tomcat for Java web applications, Apache Arrow for in-memory data representation, and Apache Parquet for data file formatting, among others.
  • The Simdjson Library
    4 projects | news.ycombinator.com | 3 Jun 2024
  • Arrow Flight SQL in Apache Doris for 10X faster data transfer
    2 projects | dev.to | 12 May 2024
    Apache Doris 2.1 has a data transmission channel built on Arrow Flight SQL. (Apache Arrow is a software development platform designed for high data movement efficiency across systems and languages, and the Arrow format aims for high-performance, lossless data exchange.) It allows high-speed, large-scale data reading from Doris via SQL in various mainstream programming languages. For target clients that also support the Arrow format, the whole process will be free of serialization/deserialization, thus no performance loss. Another upside is, Arrow Flight can make full use of multi-node and multi-core architecture and implement parallel data transfer, which is another enabler of high data throughput.
  • How moving from Pandas to Polars made me write better code without writing better code
    2 projects | dev.to | 5 Mar 2024
    In comes Polars: a brand new dataframe library, or how the author Ritchie Vink describes it... a query engine with a dataframe frontend. Polars is built on top of the Arrow memory format and is written in Rust, which is a modern performant and memory-safe systems programming language similar to C/C++.
  • From slow to SIMD: A Go optimization story
    10 projects | news.ycombinator.com | 23 Jan 2024
    I learned yesterday about GoLang's assembler https://go.dev/doc/asm - after browsing how arrow is implemented for different languages (my experience is mainly C/C++) - https://github.com/apache/arrow/tree/main/go/arrow/math - there are a bunch of .S ("asm" files) and I'm still not able to comprehend how these work exactly (I guess it'll take more reading) - it seems very peculiar.

    The last time I used inline assembly was back in Turbo/Borland Pascal, then a bit in Visual Studio (32-bit), until it got disabled. Then I did very little in gcc, with its stricter specification (with the former you had to know how the ABI worked; with the latter too, but it was specced out).

    Anyway - I wasn't expecting to find this in "Go" :) But I guess you can always start with .go code then produce assembly (-S) then optimize it, or find/hire someone to do it.

  • Time Series Analysis with Polars
    2 projects | dev.to | 10 Dec 2023
    One is related to the heritage of being built around the NumPy library, which is great for processing numerical data, but becomes an issue as soon as the data is anything else. Pandas 2.0 has started to bring in Arrow, but it's not yet the standard (you have to opt-in and according to the developers it's going to stay that way for the foreseeable future). Also, pandas's Arrow-based features are not yet entirely on par with its NumPy-based features. Polars was built around Arrow from the get go. This makes it very powerful when it comes to exchanging data with other languages and reducing the number of in-memory copying operations, thus leading to better performance.
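
The recurring point in the excerpts above, that a columnar in-memory layout helps reduce copying, can be illustrated with a rough standard-library sketch. It does not use Arrow or any Arrow API: Python's `array` and `memoryview` stand in for typed, contiguous column buffers, and the `ts` and `value` fields are made up for the example.

```python
from array import array

# Row-oriented layout: one Python object per record.
rows = [
    {"ts": 1, "value": 1.5},
    {"ts": 2, "value": 2.5},
    {"ts": 3, "value": 4.0},
]

# Column-oriented layout: one contiguous, typed buffer per field.
columns = {
    "ts": array("q", (r["ts"] for r in rows)),        # 64-bit signed ints
    "value": array("d", (r["value"] for r in rows)),  # 64-bit floats
}

# A memoryview exposes the column's buffer without copying it.
view = memoryview(columns["value"])
tail = view[1:]              # slicing a memoryview also shares memory

# Writing through the original column is visible through the slice,
# which shows that no copy was made.
columns["value"][2] = 8.0
print(tail[1])

# Aggregating a single field only touches that field's buffer.
total = sum(columns["value"])
print(total)
```

A consumer that understands the same buffer layout can read such a column in place, which is the general idea behind exchanging data between libraries and languages without serialization.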

What are some alternatives?

When comparing json-buffet and Apache Arrow you can also consider the following projects:

is2 - embedded RESTy http(s) server library from Edgio

Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

japronto - Screaming-fast Python 3.5+ HTTP toolkit integrated with pipelining HTTP server based on uvloop and picohttpparser.

h5py - HDF5 for Python -- The h5py package is a Pythonic interface to the HDF5 binary data format.

semi_index - Implementation of the JSON semi-index described in the paper "Semi-Indexing Semi-Structured Data in Tiny Space"

Apache Spark - Apache Spark - A unified analytics engine for large-scale data processing

json_benchmark - Python JSON benchmarking and "correctness".

FlatBuffers - FlatBuffers: Memory Efficient Serialization Library

reddit_mining

polars - Dataframes powered by a multithreaded, vectorized query engine, written in Rust

jq-zsh-plugin - jq zsh plugin

ClickHouse - ClickHouse® is a real-time analytics DBMS

Did you know that C++ is the 6th most popular programming language based on number of mentions?