| | automq | awesome-public-real-time-datasets |
|---|---|---|
| Mentions | 8 | 8 |
| Stars | 1,421 | 366 |
| Growth | 50.4% | 10.4% |
| Activity | 9.9 | 5.1 |
| Last commit | 3 days ago | 10 days ago |
| Language | Java | |
| License | GNU General Public License v3.0 or later | Creative Commons Zero v1.0 Universal |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
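The page does not give exact formulas for these metrics, so the following is only a minimal sketch of how such numbers could be computed. It assumes that growth is the percentage change in stars between two monthly snapshots, that commit weights decay exponentially with a 30-day half-life, and that the activity value is the share of tracked projects with a lower raw score, scaled to 0-10. All function names and the half-life are hypothetical choices for illustration.

```python
def mom_growth(stars_prev: int, stars_now: int) -> float:
    """Month-over-month growth in stars, as a percentage."""
    if stars_prev <= 0:
        raise ValueError("stars_prev must be positive")
    return (stars_now - stars_prev) * 100.0 / stars_prev


def weighted_commit_score(commit_ages_days, half_life_days: float = 30.0) -> float:
    """Raw activity score: each commit counts less the older it is.

    Assumption: a commit's weight halves every `half_life_days`.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_rank(score: float, all_scores) -> float:
    """Map a raw score to 0-10 by the fraction of tracked projects scoring lower.

    Under this assumed mapping, 9.0 means 90% of tracked projects score
    lower, i.e. the project is in the top 10%.
    """
    if not all_scores:
        raise ValueError("all_scores must not be empty")
    below = sum(1 for s in all_scores if s < score)
    return round(10.0 * below / len(all_scores), 1)


if __name__ == "__main__":
    print(mom_growth(1000, 1100))
    print(weighted_commit_score([0, 30]))
    print(activity_rank(90, list(range(100))))
```

With this mapping, a project whose raw score beats 90 out of 100 tracked projects lands at 9.0, which matches the "top 10%" reading given above.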
automq
- Tiered storage won't fix Kafka
  I agree with your viewpoint. The crux of the matter is not whether to use tiered storage, but which trade-offs a specific storage architecture makes and what benefits it gains in return. Here (https://github.com/AutoMQ/automq?tab=readme-ov-file#-automq-...) is a qualitative comparison chart of streaming systems including Kafka, Confluent, Redpanda, WarpStream, and AutoMQ. The chart has no specific numerical comparisons; it is based purely on their trade-offs at the storage level, but I think it will be of some use to you.
- Streaming Platform Comparison: Kafka/Confluent/Pulsar/AutoMQ/Redpanda/Warpstream
- Show HN: AutoMQ – A Cost-Effective Kafka distro that can autoscale in seconds
  Yes, thank you for the clarification. AutoMQ has replaced the topic-partition storage with the cloud-native S3Stream library (https://github.com/AutoMQ/automq/tree/main/s3stream), thereby harnessing the benefits of cloud EBS and S3.
- FLaNK Stack Weekly for 20 Nov 2023
awesome-public-real-time-datasets
- List of publicly available datasets with real-time data
- FLaNK Stack Weekly for 20 Nov 2023
- Bytewax: Stream processing library built using Python and Rust
- Public Real-Time Datasets and Sources
- What are some good publicly available real-time data sources?
  Added for now - https://github.com/bytewax/awesome-public-real-time-datasets/commit/94ca4a3d40dc212690c6cdc22c107289b4268661

  I am trying to crowdsource this here. I often find it hard to find good real-time data sources for learning about streaming, prototyping, or building hobby projects. I started researching and then created an "Awesome List" in a GitHub repo - https://github.com/bytewax/awesome-public-real-time-datasets.
- Ask HN: What are some public real-time data sources?
  I started an awesome list with real-time data sources here: https://github.com/bytewax/awesome-public-real-time-datasets. Do you know of any datasets or data sources I should add to this list? Comment below; PRs are welcome :).
What are some alternatives?
TinyLlama - The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
datagen - Generate authentic looking mock data based on a SQL, JSON or Avro schema and produce to Kafka in JSON or Avro format.
memq - MemQ is an efficient, scalable cloud native PubSub system
screenshot-to-code - Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
depthai-python - DepthAI Python Library
RedfinScraper - Scrapes Redfin data.
FLaNK-SaoPauloBrazil - FLaNK-SaoPauloBrazil
superset - Apache Superset is a Data Visualization and Data Exploration Platform
trip - Elegant middleware functions for your HTTP clients.
mockingbird - Mockingbird is a mock streaming data generator
ML-For-Beginners - 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all