Top 23 Avro Open-Source Projects
- pmacct: pmacct is a small set of multi-purpose passive network monitoring tools [NetFlow IPFIX sFlow libpcap BGP BMP RPKI IGP Streaming Telemetry].
- adam: ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
- Cinchoo ETL: ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
- vscode-data-preview: Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
- SlimMessageBus: Lightweight message bus interface for .NET (pub/sub and request-response) with transport plugins for popular message brokers.
- kafka-connect-file-pulse: 🔗 A multipurpose Kafka Connect connector that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka
- rumble: ⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more (by RumbleDB)
- datagen: Generate authentic looking mock data based on a SQL, JSON or Avro schema and produce to Kafka in JSON or Avro format.
Project mention: Open Table Formats Such as Apache Iceberg Are Inevitable for Analytical Data | news.ycombinator.com | 2024-01-18
Apache AVRO [1] is one but it has been largely replaced by Parquet [2] which is a hybrid row/columnar format
[1] https://avro.apache.org/
So, is JR yet another faking library written in Go? Yes and no. JR indeed implements most of the APIs in fakerjs and Go fake it, but it's also able to stream data directly to stdout, Kafka, Redis and more (Elastic and MongoDB coming). JR can talk directly to Confluent Schema Registry, manage json-schema and Avro schemas, easily maintain coherence and referential integrity. If you need more than what is OOTB in JR, you can also easily pipe your data streams to other cli tools like kcat thanks to its flexibility.
Project mention: LongRoPE: Extending LLM Context Window Beyond 2M Tokens | news.ycombinator.com | 2024-02-22
It's been possible to skip tokenization for a long time, my team and I did it here - https://github.com/capitalone/DataProfiler
For what it's worth, we actually were working with LSTMs with nearly a billion params back in 2016-2017 area. Transformers made it far more effective to train and execute, but ultimately LSTMs are able to achieve similar results, though slow & require more training data.
If you want a tool that can ingest from a span port and generate netflow or IPFIX there is pmacct. This should work with your existing tooling that collects netflow data.
For example, see https://github.com/sksamuel/avro4s and check AvroName and AvroNamespace.
Project mention: Difficulty configuring log4j when deploying code as plugin for an app | /r/CodingHelp | 2023-10-27
I am working on a custom Kafka-Mongo sink connector (specifically, a custom WriteModelStrategy to be used with the official Mongo sink connector here: https://github.com/mongodb/mongo-kafka ). My code is not a standalone, executable Java application but rather a JAR that augments the functionality of another Java application.
Project mention: Kafka Connect Filepulse 2.13.0 is now available! This version includes support for SFTP and Alibaba OSS. It also contains many bug fixes and improvements. 🚀 | /r/apachekafka | 2023-09-15
I've just recently implemented Google Pub/Sub with Avro serialization using Avro4k library and it works fine https://github.com/avro-kotlin/avro4k
Project mention: What are your favorite tools or components in the Kafka ecosystem? | /r/apachekafka | 2023-05-31
For fake data, shameless plug for https://github.com/MaterializeInc/datagen/tree/main
Avro related posts
- People who use Spring and Kotlin...
- Scala 3 Macros: How to Read Annotations
- JR, quality Random Data from the Command line, part II
- What are some good publicly available real-time data sources?
- Haskell jobs at Standard Chartered, various locations and seniority
- Simulating Streaming Data for Fraud Detection with Datagen CLI
- How train my SQL skills with real world data engineering problems ?
Index
What are some of the best open-source Avro projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | Apache Avro | 2,756 |
2 | rq | 2,254 |
3 | schema-registry | 2,136 |
4 | examples | 1,845 |
5 | DataProfiler | 1,362 |
6 | avsc | 1,245 |
7 | pmacct | 1,014 |
8 | adam | 967 |
9 | kafkactl | 748 |
10 | Cinchoo ETL | 735 |
11 | Avro4s | 716 |
12 | vscode-data-preview | 522 |
13 | SlimMessageBus | 433 |
14 | NoProto | 362 |
15 | compendium-client | 326 |
16 | mongo-kafka | 322 |
17 | kafka-connect-file-pulse | 305 |
18 | ABRiS | 221 |
19 | srclient | 221 |
20 | rumble | 207 |
21 | avro4k | 180 |
22 | clickhouse-sink-connector | 173 |
23 | datagen | 133 |