kafka-connect-elasticsearch
Elasticsearch
Metric | kafka-connect-elasticsearch | Elasticsearch
---|---|---
Mentions | 1 | 74
Stars | 684 | 62,494
Growth | 1.0% | 0.9%
Activity | 8.6 | 10.0
Last commit | 6 days ago | 2 days ago
Language | Java | Java
License | GNU General Public License v3.0 or later | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
kafka-connect-elasticsearch
-
Vinted Search Scaling Chapter 1: Indexing
Kafka Connect is a scalable and reliable tool for streaming data between Apache Kafka and other systems. It lets you quickly define connectors that move data into and out of Kafka. Luckily for us, there is an open-source connector that sends data from Kafka topics to Elasticsearch indices.
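As a rough illustration of what "defining a connector" looks like, the sketch below builds the JSON body that a Kafka Connect worker accepts on its `POST /connectors` REST endpoint. It assumes the Confluent Elasticsearch sink connector is installed on the worker; the connector name, topic name, and Elasticsearch URL are hypothetical placeholders, and the helper function is not part of any library.

```python
import json


def build_es_sink_connector(name, topics, es_url, tasks_max=1):
    """Build the JSON body for POST /connectors on a Kafka Connect worker.

    The name, topics, and es_url values are illustrative assumptions.
    """
    return {
        "name": name,
        "config": {
            # Class name of the Confluent Elasticsearch sink connector.
            "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
            "tasks.max": str(tasks_max),
            # Kafka topics to read from, as a comma-separated list.
            "topics": ",".join(topics),
            # Where the connector sends documents.
            "connection.url": es_url,
            # Do not use record keys as document IDs, and do not require schemas.
            "key.ignore": "true",
            "schema.ignore": "true",
        },
    }


payload = build_es_sink_connector("items-sink", ["items"], "http://localhost:9200")
print(json.dumps(payload, indent=2))
```

The printed JSON would then be posted to the worker's REST port. Production setups typically also tune batching, retries, and error handling, which this sketch leaves out.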
Elasticsearch
-
Elastic, Loki and SigNoz - A Perf Benchmark of Open-Source Logging Platforms
Benchmarks are always "It depends".
And what it depends on is your data volume, how you want to query, whether you value ingestion more than query speed and timeliness, and so forth.
Elastic's sweet spot is that it indexes everything, and you can query fast as a result. But it does this at the cost of ingest: it does the work of building indexes during ingestion, so ingest is more CPU intensive and can hit limits there.
Loki's sweet spot is that it has very few indexes, so ingestion is cheap and extremely capable for huge data volumes. It does this at the cost of query performance over very large data sets: without indexes it brute-forces via mapreduce, which means you really want to specify where to look (which log streams) and when to look (a time window), and in that it excels.
ClickHouse's sweet spot is that the indexes are very explicitly configured by engineers who know what the data looks like and how they want to query it. The ingest cost is balanced and the query performance is great, but this comes at the cost of you needing to know your data and how you're going to query it most of the time. It's not so good for esoteric questions that you never anticipated (though some of their column data types get you very far by allowing you to be reasonably flexible on this).
They all have sweet spots, and a benchmark is not going to answer the real questions - what data volume do you have, what do you value (ingest and preserve everything vs fastest query speed for ad-hoc queries vs a balanced approach), do you know how you want to query the data, etc?
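The trade-off described above can be shown with a toy sketch. This is not how Elastic, Loki, or ClickHouse are implemented; it only contrasts paying the cost at ingest (building an inverted index) with paying it at query time (scanning every line). All function names and sample log lines are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(lines):
    """Index-everything approach: extra work at ingest, cheap lookups later."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for token in line.lower().split():
            index[token].add(line_no)
    return index


def search_indexed(index, term):
    """Look the term up directly; cost does not depend on total log volume."""
    return sorted(index.get(term.lower(), set()))


def search_scan(lines, term):
    """Index-free approach: nothing to build, but every query reads every line."""
    term = term.lower()
    return [i for i, line in enumerate(lines) if term in line.lower().split()]


logs = ["GET /health 200", "POST /orders 500", "GET /orders 200"]
index = build_inverted_index(logs)

# Both strategies return the same matches; they differ in where the cost lands.
print(search_indexed(index, "GET"), search_scan(logs, "GET"))
```

Narrowing the scan to a subset of lines first, as with the log streams and time windows mentioned above, is what keeps the index-free approach practical at scale.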
Other thoughts:
Loki has recently moved to TSDB for the backend storage; these benchmarks don't cover that.
Elastic can use less disk if you configure for synthetic source (https://github.com/elastic/elasticsearch/issues/86603) which discards the raw byte copy of the ingested data and only retains knowledge in the indexes, and uses the indexes to reconstruct the source should you request it.
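For illustration, here is a minimal sketch of an index-creation body that asks for synthetic source. It assumes the 8.x-era mapping option `_source.mode: synthetic`; newer releases may expect this to be configured elsewhere (for example as an index setting), so check the documentation for your version. The field names and the helper function are hypothetical.

```python
import json


def synthetic_source_index_body(field_types):
    """Build a create-index body that enables synthetic _source.

    Assumes the mapping-level `_source.mode` option; field_types maps
    hypothetical field names to Elasticsearch field types.
    """
    return {
        "mappings": {
            # Do not store the raw JSON; rebuild it from indexed data on request.
            "_source": {"mode": "synthetic"},
            "properties": {
                name: {"type": field_type}
                for name, field_type in field_types.items()
            },
        }
    }


body = synthetic_source_index_body({"message": "keyword", "@timestamp": "date"})
print(json.dumps(body, indent=2))
```

This body would be sent with a `PUT /<index-name>` request when creating the index. Not every field type can be reconstructed this way, which is one reason to test against real data first.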
Nothing to add about ClickHouse. I've used all three databases and worked against all three for huge volumes of data. If I want more OLAP-style than OLTP-style querying, ClickHouse absolutely shines; Elastic and Loki shine far more for OLTP workloads (though Elastic does pretty well in more OLAP cases than Loki does today).
-
Deploy Elasticsearch 8.5 on Kubernetes with Okteto Cloud free plan
Currently WIP, waiting for this ES issue to be resolved
-
macOS Dev Setup
Elasticsearch is a distributed search and analytics engine. It uses an HTTP REST API, making it easy to work with from any programming language.
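Because the API is plain HTTP and JSON, a search can be expressed with nothing but the standard library. The sketch below only builds the request for the `_search` endpoint with a `match` query; the host, index name, and field name are assumptions, and actually sending it requires a running node that accepts unauthenticated requests.

```python
import json
import urllib.request


def build_search_request(base_url, index, field, text):
    """Build a POST /<index>/_search request carrying a simple match query."""
    body = json.dumps({"query": {"match": {field: text}}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local node, index, and field.
req = build_search_request("http://localhost:9200", "articles", "title", "kafka")
print(req.get_method(), req.full_url)

# To send it against a reachable node:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Official client libraries wrap the same endpoints and add connection pooling, retries, and authentication, which is usually preferable outside of quick experiments.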
-
GitHub Copilot investigation
What? I'm saying that companies already have to protect against their developers violating licenses. Any junior developer can already copy patent-encumbered code off Github, strip the headers, and add it to your company's codebase. They don't need Copilot to make this easier, it's already trivial to do. And yet, companies aren't getting sued. Ergo, it's a minor problem.
-
Observability with Grafana Cloud and OpenTelemetry in .net microservices
Open source, self-hosted: Prometheus + Grafana and ElasticSearch + Kibana.
-
Implement DevSecOps to Secure your CI/CD pipeline
OpenSearch/Elasticsearch: A real-time, distributed search and analytics engine that supports many kinds of search operations.
-
ZincSearch - lightweight alternative to Elasticsearch written in Go
- log search: elasticsearch/opensearch, loki[5], quickwit[6], zincsearch[7]
Vespa.ai[8] is great but I don't really know in which category to put them, user facing and enterprise search seems to work well with them.
It's interesting to note that Elasticsearch and OpenSearch are general-purpose search engines, and Solr as well. They are all powered by Lucene, the popular and performant search engine library.
I would love to see some benchmarks by category :)
Note: I don't know ZincSearch well and put it under log search, as stated on their front page.
-
Complete guide to open source licenses for developers
For example: Kubernetes example, Elasticsearch client license, CockroachDB.
-
Shitty documentation as a way to encourage paid usage
Also if you are SO annoyed, you can EDIT their documentation and submit a pull request. Here is the document.
-
DeWitt Clause, or Can You Benchmark %DATABASE% and Get Away With It
Elasticsearch (licensed under SSPL, not OSI-approved)
What are some alternatives?
OpenSearch - Open source distributed and RESTful search engine.
Whoosh
bleve - A modern text indexing library for go
Apache Superset - Apache Superset is a Data Visualization and Data Exploration Platform [Moved to: https://github.com/apache/superset]
elasticsearch-dsl-py - High level Python client for Elasticsearch
Metabase - The simplest, fastest way to get business intelligence and analytics to everyone in your company
MeiliSearch - A lightning-fast search engine that fits effortlessly into your apps, websites, and workflow.
django-haystack - Modular search for Django
GoAccess - GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
cube.js - Cube: The Semantic Layer for Building Data Applications
Typesense - Open Source alternative to Algolia and an Easier-to-Use alternative to ElasticSearch. Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
PostHog - PostHog provides open-source product analytics, session recording, feature flagging and a/b testing that you can self-host.