feather
libvineyard
DISCONTINUED
| | feather | libvineyard |
|---|---|---|
| Mentions | 3 | 4 |
| Stars | 2,708 | 403 |
| Growth | - | - |
| Activity | 0.0 | 9.1 |
| Latest commit | over 2 years ago | almost 3 years ago |
| Language | JavaScript | C++ |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
feather
- Fun with File Formats
-
Vineyard: An open-source in-memory data manager
It'd be interesting to know how this compares with alternative solutions.
I might not understand the value proposition correctly, and I don't specifically use Python for data work, but I immediately thought of tools like feather[1], fst[2], disk.frame[3] and even DuckDB[4].
Some of these are on disk rather than in memory, but I'd still be interested in performance and use case comparisons.
[1] https://github.com/wesm/feather
libvineyard
-
GraphScope: A One-Stop Large-Scale Graph Computing System
https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/...
The graphs in GraphScope are backed by vineyard (https://github.com/alibaba/libvineyard). This lets GraphScope run multiple specifically optimized runtimes (written in C++, Rust and Python) for different tasks while sharing the distributed graph data efficiently in memory.
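The idea of several runtimes reading one in-memory copy of the data can be illustrated, very loosely, with Python's standard-library `multiprocessing.shared_memory`. This is a minimal sketch under stated assumptions, not vineyard's API or implementation: the helper names `publish`, `attach_sum` and `attach_double` are hypothetical, and both sides run in one interpreter for brevity. In practice the consumer would be a separate process that receives only the block name and the element count.

```python
from multiprocessing import shared_memory

ITEM = 8  # bytes per float64 ("d") element


def publish(values):
    """Producer (hypothetical helper): copy values once into a named shared-memory block."""
    shm = shared_memory.SharedMemory(create=True, size=max(ITEM * len(values), ITEM))
    view = shm.buf.cast("d")  # reinterpret the raw bytes as float64
    for i, v in enumerate(values):
        view[i] = v
    view.release()  # drop our typed view so the block can be closed later
    return shm


def attach_sum(name, count):
    """Consumer (hypothetical helper): attach by name and read the floats in place."""
    shm = shared_memory.SharedMemory(name=name)
    view = shm.buf.cast("d")
    total = sum(view[i] for i in range(count))
    view.release()
    shm.close()  # detach only; the producer still owns the block
    return total


def attach_double(name, count):
    """Consumer (hypothetical helper): modify the shared data in place."""
    shm = shared_memory.SharedMemory(name=name)
    view = shm.buf.cast("d")
    for i in range(count):
        view[i] *= 2
    view.release()
    shm.close()


if __name__ == "__main__":
    block = publish([1.0, 2.0, 3.5])
    try:
        print(attach_sum(block.name, 3))  # 6.5
        attach_double(block.name, 3)      # the write goes to the shared block itself
        print(attach_sum(block.name, 3))  # 13.0
    finally:
        block.close()
        block.unlink()  # free the block once every user has detached
```

Because consumers attach to the same buffer instead of receiving a serialized copy, a change made through one handle is visible through every other handle. A real store such as vineyard adds metadata, typed objects and distributed placement on top of this basic mechanism.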
It makes sense to run such tasks on other machines/systems rather than adding extra load to a graph DB, so that its quality of service is not affected.
2. Full integration with Python makes data analytics more flexible. For example, you can use numpy, pandas and mars (https://github.com/mars-project/mars) alongside GraphScope with zero-copy data sharing, thanks to our storage engine vineyard (https://github.com/alibaba/libvineyard).
3. Besides distributed processing, extra performance can also come from an efficient in-memory graph layout and from other compiler- and runtime-level optimizations. Compared with graph DBs such as JanusGraph, GraphScope is ~100x faster on Gremlin queries, and the gap is even larger on graph analytics algorithms such as PageRank.
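PageRank, cited above as an analytics workload, is usually computed by power iteration: every vertex repeatedly splits its rank among its out-neighbours, plus a uniform "teleport" share. The sketch below is a textbook single-machine version in plain Python, not GraphScope's implementation; the function name, the damping factor of 0.85 and the stopping rule are assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, max_iter=100, tol=1e-10):
    """Power-iteration PageRank over a directed edge list of (src, dst) pairs.

    Vertices with no out-edges ("dangling") spread their rank uniformly.
    """
    nodes = sorted({u for edge in edges for u in edge})
    n = len(nodes)
    out = {u: [] for u in nodes}
    for src, dst in edges:
        out[src].append(dst)

    rank = {u: 1.0 / n for u in nodes}  # start from a uniform distribution
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if not out[u])
        # Teleport share plus the redistributed rank of dangling vertices.
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:  # stop once the L1 change is negligible
            break
    return rank


if __name__ == "__main__":
    ranks = pagerank([("b", "a"), ("c", "a"), ("a", "b")])
    for vertex, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(vertex, round(score, 4))
```

Each iteration touches every edge once, so the cost is proportional to the edge count times the number of iterations. This is why the in-memory graph layout and parallel runtimes mentioned above matter for large graphs.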
-
Vineyard: An open-source in-memory data manager
6. Kubernetes integration for large-scale big data applications
GitHub: https://github.com/alibaba/libvineyard (stars are welcome!)
What are some alternatives?
GraphScope - GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba
libgrape-lite - A C++ library for parallel graph processing (GRAPE)
euler - A distributed graph deep learning framework.
tablib - Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.
tika-docker - Convenience Docker images for Apache Tika Server
DistorteD - Ruby multimedia toolkit with deep Jekyll integration
file - Read-only mirror of file CVS repository, updated every half hour. NOTE: do not make pull requests here, nor comment any commits, submit them usual way to bug tracker or to the mailing list. Maintainer(s) are not tracking this git mirror.
SheetJS js-xlsx - SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs