arrow-julia
cylon
Metric | arrow-julia | cylon
---|---|---
Mentions | 4 | 3
Stars | 277 | 293
Growth | 1.8% | 1.0%
Activity | 6.2 | 4.8
Last commit | 16 days ago | 6 days ago
Language | Julia | C++
License | GNU General Public License v3.0 or later | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
arrow-julia
- Julia 1.8 has been released
For some examples of people porting existing C++ or Fortran libraries to Julia, you should check out https://github.com/JuliaLinearAlgebra/Octavian.jl, https://github.com/dgleich/GenericArpack.jl, and https://github.com/apache/arrow-julia (just off the top of my head). These are all ports of C++ or Fortran libraries that match (or exceed) the performance of the original; Arrow.jl in particular is faster, more general, and 10x less code.
- How to adapt Arrow.Table columns (naturally per record batch basis) into CuArrays for GPU processing?
- Reading HDF5 Files
I guess the currently preferred format is not Feather but Arrow: https://github.com/JuliaData/Arrow.jl
- Apache Arrow 3.0.0 Release
Excited to see this release's official inclusion of the pure Julia Arrow implementation [1]!
It's so cool to be able to mmap Arrow memory and natively manipulate it from within Julia with virtually no performance overhead. Since the Julia compiler can specialize on the layout of Arrow-backed types at runtime (just as it can with any other type), the notion of needing to build/work with a separate "compiler for fast UDFs" is rendered obsolete.
It feels pretty magical when two tools like this compose so well without either being designed with the other in mind - a testament to the thoughtful design of both :) mad props to Jacob Quinn for spearheading the effort to revive/restart Arrow.jl and get the package into this release.
[1] https://github.com/JuliaData/Arrow.jl
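The mmap point in the post above can be illustrated without Arrow itself. The following is a minimal Python sketch, not Arrow.jl and not the Arrow IPC format: it assumes a file that holds nothing but one contiguous buffer of native-endian int64 values, and the helper names `write_int64_column` and `sum_column_mmap` are hypothetical. It only shows the general idea of reading a memory-mapped column in place instead of parsing it into a new structure.

```python
import array
import mmap
import os
import tempfile


def write_int64_column(path, values):
    """Write values as one contiguous buffer of native-endian int64s.

    Assumption: this raw layout stands in for a single fixed-width
    column buffer; it is NOT the Arrow file format.
    """
    with open(path, "wb") as f:
        f.write(array.array("q", values).tobytes())


def sum_column_mmap(path):
    """Memory-map the file and read it as int64s without copying the buffer."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # memoryview gives a typed window onto the mapped pages;
            # cast("q") reinterprets the bytes as native int64 values.
            with memoryview(mm) as raw, raw.cast("q") as column:
                return len(column), sum(column)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "column.bin")
        write_int64_column(path, [1, 2, 3, 40])
        print(sum_column_mmap(path))  # (4, 46)
```

The views are released before the mapping is closed, because Python refuses to close an mmap that still has exported buffers. The sketch makes no claim about how Arrow.jl maps or specializes on its data.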
cylon
- Data Parallel Pipeline/MapReduce in C++?
There's also https://cylondata.org/ which is more of a Pandas approach.
- Cylon: DataFrames for MPI!
I'd like to introduce Cylon, a fast, scalable, distributed-memory-parallel runtime. From the v0.4 release onward, Cylon offers "Pandas-like DataFrames for MPI environments"! Cylon v0.4.0 is now available! :-) This is by far our most significant release.
- Apache Arrow 3.0.0 Release
cuDF and Cylon are two execution engines that natively support the Arrow format: https://github.com/rapidsai/cudf https://github.com/cylondata/cylon
What are some alternatives?
perspective - A data visualization and analytics component, especially well-suited for large and/or streaming datasets.
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
arquero - Query processing and transformation of array-backed data tables.
vega-loader-arrow - Data loader for the Apache Arrow format.
ClickHouse - ClickHouse® is a free analytics DBMS for big data
TableIO.jl - A glue package for reading and writing tabular data. It aims to provide a uniform api for reading and writing tabular data from and to multiple sources.
go-py-arrow-bridge - Bridge between Go and Python to facilitate zero-copy using Apache Arrow