doris
PaddlePaddle
Metric | doris | PaddlePaddle
---|---|---
Mentions | 42 | 6
Stars | 11,363 | 21,625
Growth | 1.6% | 0.5%
Activity | 10.0 | 10.0
Latest commit | 5 days ago | about 2 hours ago
Language | Java | C++
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
doris
-
Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis
As an open-source real-time data warehouse, Apache Doris provides semi-structured data processing capabilities, and the newly released version 2.1.0 makes a stride in this direction. Before V2.1, Apache Doris stored semi-structured data as JSON files. However, during query execution, the real-time parsing of JSON data led to high CPU and I/O consumption in addition to high query latency, especially when the dataset was huge and complicated. Moreover, the lack of a pre-defined schema meant there was no handle for query optimization.
-
Five Apache projects you probably didn't know about
Apache Doris is a real-time data warehouse.
-
Log Analysis: Elasticsearch VS Apache Doris
Learn more about Apache Doris or find the Doris makers on Slack.
-
Replacing Apache Hive, Elasticsearch, and PostgreSQL With Apache Doris
As you can imagine, a long and complicated data pipeline is high-maintenance and detrimental to development efficiency. Moreover, they are not capable of ad-hoc queries. So as an upgrade to our data warehouse, we replaced most of these components with Apache Doris, a unified analytic database.
-
Apache Doris 2.0 Beta Now Available: Faster, Stabler, and More Versatile
GitHub source code: https://github.com/apache/doris/tree/branch-2.0
-
A/B Testing was a handful
The key to Architecture 3.0 is the combination of Flink and Doris, so this is how to connect them. Probably the most important code in building architecture 3. (flink-demo, stream-load-demo)
-
Ask HN: Are there any notable Chinese FLOSS projects?
https://github.com/apache/doris is a great example. Same for its cousin https://github.com/StarRocks/starrocks, which was an early fork of the Doris project.
To be fair, these are the only examples I can think of and I only learned of these as I'm standing up new data infra using starrocks.
- Apache Doris 2.0.0 Alpha Released
-
30,000 QPS Per Node: How We Increased Database Query Concurrency by 20 Times
We optimized Apache Doris to solve these problems. (Pull Request on GitHub)
-
Beginner's Guide to Data Analytics: Diving into Our Data Management Platform
So, in Storage Architecture 2.0, we introduced Apache Doris and Apache Spark. The whole data pipeline was a Y-shaped diagram.
PaddlePaddle
-
List of AI-Models
-
Ask HN: Are there any notable Chinese FLOSS projects?
PaddlePaddle?
https://github.com/PaddlePaddle/Paddle
Also, Baidu have quite a few OSS projects out there in general.
https://github.com/baidu
-
Volcano vs Yunikorn vs Knative
Volcano is a batch scheduler on top of Kube-batch targeting spark-operator, plain old MPI, chinesium paddlepaddle, and Kromwell HPC.
-
Baidu AI Researchers Introduce SE-MoE That Proposes Elastic MoE Training With 2D Prefetch And Fusion Communication Over Hierarchical Storage
- I have issue with only __habs for half datatype? Please help!
- Alternatives to google collab?
What are some alternatives?
starrocks - StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Winner of InfoWorld’s 2023 BOSSIE Award for best open source software.
tensorflow - An Open Source Machine Learning Framework for Everyone
tools
PyTorch-NLP - Basic Utilities for PyTorch Natural Language Processing (NLP)
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Keras - Deep Learning for humans
kop - Kafka-on-Pulsar - A protocol handler that brings native Kafka protocol to Apache Pulsar
xgboost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
Boost-Pretty-Printer - GDB Pretty Printers for Boost
MLflow - Open source platform for the machine learning lifecycle
esphome-yeelight-ceiling-light - ESPHome custom firmware for some Yeelight Ceiling Lights
gym - A toolkit for developing and comparing reinforcement learning algorithms.