| | doris | skywalking |
|---|---|---|
| Mentions | 42 | 23 |
| Stars | 11,363 | 23,285 |
| Growth | 1.6% | 0.6% |
| Activity | 10.0 | 9.5 |
| Latest commit | 5 days ago | 2 days ago |
| Language | Java | Java |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
doris
-
Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis
As an open-source real-time data warehouse, Apache Doris provides semi-structured data processing capabilities, and the newly released version 2.1.0 takes a stride in this direction. Before V2.1, Apache Doris stored semi-structured data as JSON files. However, during query execution, the real-time parsing of JSON data leads to high CPU and I/O consumption in addition to high query latency, especially when the dataset is huge and complicated. Moreover, the lack of a pre-defined schema means there is no handle for query optimization.
-
Five Apache projects you probably didn't know about
Apache Doris is a real-time data warehouse.
-
Log Analysis: Elasticsearch VS Apache Doris
Learn more about Apache Doris or find the Doris makers on Slack.
-
Replacing Apache Hive, Elasticsearch, and PostgreSQL With Apache Doris
As you can imagine, a long and complicated data pipeline is high-maintenance and detrimental to development efficiency. Moreover, they are not capable of ad-hoc queries. So as an upgrade to our data warehouse, we replaced most of these components with Apache Doris, a unified analytic database.
-
Apache Doris 2.0 Beta Now Available: Faster, Stabler, and More Versatile
GitHub source code: https://github.com/apache/doris/tree/branch-2.0
-
A/B Testing was a handful
The key to Architecture 3.0 is the combination of Flink and Doris, so this is how to connect them. Probably the most important code in building Architecture 3.0: flink-demo, stream-load-demo
-
Ask HN: Are there any notable Chinese FLOSS projects?
https://github.com/apache/doris is a great example. Same for its cousin https://github.com/StarRocks/starrocks that was an early fork of the doris project.
To be fair, these are the only examples I can think of and I only learned of these as I'm standing up new data infra using starrocks.
- Apache Doris 2.0.0 Alpha Released
-
30,000 QPS Per Node: How We Increased Database Query Concurrency by 20 Times
We optimized Apache Doris to solve these problems. (Pull Request on GitHub)
-
Beginner's Guide to Data Analytics: Diving into Our Data Management Platform
So, in Storage Architecture 2.0, we introduced Apache Doris and Apache Spark. The whole data pipeline was a Y-shaped diagram.
skywalking
- Show HN: OneUptime – open-source Datadog Alternative
-
Enhancing API Observability Series (Part 3): Tracing
When choosing distributed tracing tools, considerations include your technology stack, business requirements, and monitoring complexity. Zipkin, SkyWalking, and OpenTelemetry are popular distributed tracing solutions, each with its unique features.
-
Five Apache projects you probably didn't know about
Apache SkyWalking is an APM tool, focusing on microservices, Cloud Native apps, and Kubernetes architectures. It builds its architecture on four kinds of components:
- Show HN: Monitor your webapp with minimal setup
-
It's time to let go, Apache Software Foundation
Trying to play devil's advocate here.
> It needs at least a stable set of users, but maintaining a set of users is essentially managing the set of people onboarding and the set of people migrating off.
I could say that I don't care very much about how many users a piece of software has, only that it has enough information on how to use it and enough maintainers to patch any security vulnerabilities and do occasional releases with updated dependencies, as well as address any serious issues or bugs.
For example, Apache Skywalking is an APM solution that most people haven't even heard of (in contrast to something like Sentry), yet it fits those qualities and I see few to no issues with it: https://skywalking.apache.org/
> If you're shrinking then a competitor is providing better options, or your problem space has shifted.
Again, as a user, I might not care that Sentry or another piece of software is better in any number of ways than Apache Skywalking. Similarly, I might not care that something like PostgreSQL is more correct or has a large market share (at least on HN) in comparison to something like MariaDB/MySQL.
If a piece of software meets the needs of my project and won't effectively rot with time, then it's quite possibly good enough as it is, even if it's not the market leader. For my small project's APM needs Apache Skywalking is enough. For my CRUD database needs, something like MariaDB/MySQL will be okay until the time Sun burns out (or PostgreSQL if I'm feeling fancy, but even that's not one of the modern and hip solutions).
Ergo, those better options only become relevant once they're closer to being must haves than nice to haves. Same as how Docker Swarm might be enough for many, even if Kubernetes basically won in the "container wars" and has a way more active community. Swarm will only stop being an option for me once it hits EOL, at least for certain projects where simplicity is appreciated.
Then again, a counterpoint to my own argument here could be the story of LibreOffice and OpenOffice, where the latter was basically donated (instead of the rights to the name being given to the folks behind LibreOffice) and is now in decline while LibreOffice is flourishing - but at the same time they were so close to one another feature wise, that maybe it's not a good point, same as with Gogs and Gitea.
-
JDK 21 Release Notes
> Where's Java primarily used these days?
I've seen a lot of enterprise-y webdev projects use it for back end stuff (Dropwizard, Spring Boot, Vert.X, Quarkus) and in rare cases even front end (like Vaadin or JSF/PrimeFaces). The IDEs are pretty great, especially the ones by JetBrains, the tooling is pretty mature and boring, the performance is really good (memory usage aside) and the language itself is... okay.
Curiously, I wanted to run my own server for OIDC/OAuth2 authn/authz and to have common features like registration, password resets and social login available to me out of the box, for which I chose Keycloak: https://www.keycloak.org/
Surprise surprise, it's running Java under the hood. I wanted to integrate some of my services with their admin API, seems like the Java library is also updated pretty frequently: https://mvnrepository.com/artifact/org.keycloak/keycloak-adm... whereas ones I found for .NET feel like they're stagnating more: https://www.nuget.org/packages?q=keycloak (probably not a dealbreaker, though)
Then, I wanted to run an APM stack with Apache Skywalking (simpler to self-host than Sentry), which also turns out to be a Java app under the hood: https://skywalking.apache.org/
Also you occasionally see like bank auth libraries or e-signing libraries be offered in Java as well first and foremost, at least in my country (maybe PHP sometimes): https://www.eparaksts.lv/en/for_developers/Java_libraries and their app for getting certificates from the government issued eID cards also runs off of Java.
So while Java isn't exactly "hot" tech, it's used all over the place: even in some game engines, like jMonkeyEngine, or in infrastructure code where something like Go might actually be more comfortable to use.
-
OpenTelemetry in 2023
> What should people use?
I recall Apache Skywalking being pretty good, especially for smaller/medium scale projects: https://skywalking.apache.org/
The architecture is simple, the performance is adequate, it doesn't make you spend days configuring it and it even supports various different data stores: https://skywalking.apache.org/docs/main/v9.0.0/en/setup/back...
The problems with it are that it isn't super popular (although it has agents for most popular stacks), the docs could be slightly better and I recall them also working on a new UI so there is a little bit of churn: https://skywalking.apache.org/downloads/
Still better versus some of the other options when you need something that just works instead of spending a lot of time configuring something (even when that something might be superior in regards to the features): https://github.com/getsentry/self-hosted/blob/master/docker-...
Sentry is just the first thing that comes to mind (OpenTelemetry also isn't simpler due to how much it tries to do), but compare its complexity to Skywalking: https://github.com/apache/skywalking/blob/master/docker/dock...
I wish there was more self-hosted software like that out there, enough to address certain concerns in a simple way on day 1 and leave branching out to more complex options like OpenTelemetry once you have a separate team for that and the cash is rolling in.
- Apache Skywalking Application performance monitor tool for distributed systems
- Improving Observability of Go Services
-
Monitoring Microservices with Prometheus and Grafana
Personally I've also used Apache Skywalking for a decent out of the box experience: https://skywalking.apache.org/
I've also heard good things about Sentry, though if you need to self-host it, then there's a bit of complexity to deal with: https://sentry.io/welcome/
What are some alternatives?
starrocks - StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. InfoWorld’s 2023 BOSSIE Award for best open source software.
prometheus - The Prometheus monitoring system and time series database.
tools
jaeger - CNCF Jaeger, a Distributed Tracing Platform
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
signoz - SigNoz is an open-source observability platform native to OpenTelemetry with logs, traces and metrics in a single application. An open-source alternative to DataDog, NewRelic, etc. 🔥 🖥. 👉 Open source Application Performance Monitoring (APM) & Observability tool
kop - Kafka-on-Pulsar - A protocol handler that brings native Kafka protocol to Apache Pulsar
Pinpoint - APM, (Application Performance Management) tool for large-scale distributed systems.
Boost-Pretty-Printer - GDB Pretty Printers for Boost
zipkin - Zipkin is a distributed tracing system
esphome-yeelight-ceiling-light - ESPHome custom firmware for some Yeelight Ceiling Lights
Grafana - The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.