fastkafka vs DataProfiler
| | fastkafka | DataProfiler |
|---|---|---|
| Mentions | 38 | 61 |
| Stars | 33 | 1,363 |
| Growth | - | 1.0% |
| Activity | 8.7 | 6.3 |
| Latest commit | 5 months ago | 7 days ago |
| Language | Jupyter Notebook | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
fastkafka
- FLaNK Stack Weekly 16 October 2023
- FastStream: Python's framework for Efficient Message Queue Handling
Our journey with FastStream started when we needed to integrate our machine learning models into a customer's Apache Kafka environment. To streamline this process, we created FastKafka using AIOKafka, AsyncAPI, and asyncio. It was our first step in making message queue management easier.
- How we deprecated two successful projects and joined forces to create an even more successful one
After a short discussion, we concluded that we were just too spoiled to use low-level libraries that were nothing more than tiny wrappers around C++ libs, and that we could just build our own. So, we shamelessly made one by reusing beloved paradigms from FastAPI, and we shamelessly named it FastKafka. The point was to set the expectations right: you get pretty much what you would expect, namely function decorators for consumers and producers with type hints specifying Pydantic classes for JSON encoding/decoding, automatic message routing to Kafka brokers, and documentation generation.
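The decorator-plus-type-hints pattern described above can be sketched with the standard library alone. This is a minimal, hypothetical illustration, not FastKafka's actual API: `MiniApp`, `consumes`, `produces`, and `deliver` are invented names, a dataclass stands in for a Pydantic model, and an in-memory dict stands in for the Kafka broker. The key idea is that the handler's parameter annotation tells the framework which class to decode the incoming JSON into.

```python
import asyncio
import functools
import inspect
import json
import typing
from dataclasses import asdict, dataclass
from typing import Any, Awaitable, Callable, Dict, List


@dataclass
class Ping:
    # Message schema: a stdlib stand-in for a Pydantic model
    msg: str


class MiniApp:
    """Hypothetical in-memory app that routes topics to decorated coroutines."""

    def __init__(self) -> None:
        self.consumers: Dict[str, Callable[..., Awaitable[Any]]] = {}
        self.sent: Dict[str, List[bytes]] = {}  # fake "broker": topic -> encoded messages

    def consumes(self, topic: str):
        def register(fn):
            self.consumers[topic] = fn
            return fn
        return register

    def produces(self, topic: str):
        def wrap(fn):
            @functools.wraps(fn)
            async def inner(*args, **kwargs):
                result = await fn(*args, **kwargs)
                # Encode the returned dataclass as JSON and append it to the topic
                self.sent.setdefault(topic, []).append(json.dumps(asdict(result)).encode())
                return result
            return inner
        return wrap

    async def deliver(self, topic: str, raw: bytes) -> None:
        handler = self.consumers[topic]
        # Use the type hint on the handler's first parameter to decode the JSON payload
        first_param = next(iter(inspect.signature(handler).parameters))
        model = typing.get_type_hints(handler)[first_param]
        await handler(model(**json.loads(raw)))


app = MiniApp()


@app.produces("pongs")
async def to_pongs(msg: Ping) -> Ping:
    return Ping(msg=msg.msg.upper())


@app.consumes("pings")
async def on_ping(msg: Ping) -> None:
    await to_pongs(msg)


asyncio.run(app.deliver("pings", b'{"msg": "hello"}'))
print(app.sent["pongs"])  # [b'{"msg": "HELLO"}']
```

A real library would additionally validate fields, manage broker connections, and generate AsyncAPI documentation from the same registry of decorated functions.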
- Introducing FastStream: the easiest way to write microservices for Apache Kafka and RabbitMQ in Python
FastStream simplifies the process of writing producers and consumers for message queues, handling all the parsing, networking and documentation generation automatically. It is a new package based on the ideas and experiences gained from FastKafka and Propan. By joining our forces, we picked up the best from both packages and created a unified way to write services capable of processing streamed data regardless of the underlying protocol. We'll continue to maintain both packages, but new development will be in this project.
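The idea of writing a service once, "regardless of the underlying protocol", can be sketched as business logic that depends only on a small broker interface. Everything below is hypothetical and in-memory: `Broker`, `LogBroker`, `QueueBroker`, and `build_service` are illustrative names, and the two backends only loosely mimic Kafka-style fan-out and RabbitMQ-style round-robin delivery. This is not how FastStream is implemented internally.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable, Deque, Dict, List, Protocol

Handler = Callable[[bytes], Awaitable[None]]


class Broker(Protocol):
    """The only surface the service code depends on (hypothetical interface)."""

    def subscribe(self, name: str, handler: Handler) -> None: ...

    async def publish(self, name: str, body: bytes) -> None: ...


class LogBroker:
    """Toy Kafka-like backend: an append-only log per topic, fanned out to all subscribers."""

    def __init__(self) -> None:
        self.logs: Dict[str, List[bytes]] = {}
        self.subs: Dict[str, List[Handler]] = {}

    def subscribe(self, name: str, handler: Handler) -> None:
        self.subs.setdefault(name, []).append(handler)

    async def publish(self, name: str, body: bytes) -> None:
        self.logs.setdefault(name, []).append(body)  # message is retained in the log
        for handler in self.subs.get(name, []):
            await handler(body)


class QueueBroker:
    """Toy RabbitMQ-like backend: each message goes to exactly one consumer, round-robin."""

    def __init__(self) -> None:
        self.subs: Dict[str, Deque[Handler]] = {}

    def subscribe(self, name: str, handler: Handler) -> None:
        self.subs.setdefault(name, deque()).append(handler)

    async def publish(self, name: str, body: bytes) -> None:
        consumers = self.subs.get(name)
        if not consumers:
            return  # toy simplification: drop messages nobody listens to
        handler = consumers[0]
        consumers.rotate(-1)  # the next message goes to the next consumer
        await handler(body)


def build_service(broker: Broker, seen: List[str]) -> None:
    """Business logic written once, with no knowledge of the backend."""

    async def on_order(body: bytes) -> None:
        seen.append(body.decode())

    broker.subscribe("orders", on_order)


async def main() -> Dict[str, List[str]]:
    results: Dict[str, List[str]] = {}
    for broker in (LogBroker(), QueueBroker()):
        seen: List[str] = []
        build_service(broker, seen)
        await broker.publish("orders", b"order-1")
        results[type(broker).__name__] = seen
    return results


results = asyncio.run(main())
print(results)  # {'LogBroker': ['order-1'], 'QueueBroker': ['order-1']}
```

The design choice being illustrated is dependency on an interface rather than on a client library, so swapping the transport does not touch `build_service`.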
- FastStream: the easiest way to add Kafka and RabbitMQ support to FastAPI services
FastStream (https://github.com/airtai/faststream) is a new Python framework, born from a collaboration between the Propan and FastKafka teams (both of those projects are now deprecated). It greatly simplifies event-driven system development, handling all the parsing, networking, and documentation generation automatically. FastStream currently supports RabbitMQ and Kafka, and the list of supported brokers keeps growing (NATS and Redis support is on the way). FastStream itself is a great tool for building event-driven services, and it also has native FastAPI integration: just create a StreamRouter (very close to APIRouter) and register event handlers the same way as regular HTTP endpoints.
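To make the "same way as regular HTTP endpoints" idea concrete, here is a minimal sketch of a single router that collects both HTTP routes and event subscribers and is then included into an app, loosely modeled on FastAPI's `APIRouter`/`include_router` pattern. The classes `Router` and `App` and the methods `get`, `subscriber`, `handle_http`, and `handle_event` are hypothetical; this is not FastStream's or FastAPI's actual API, and no server or broker is started.

```python
from typing import Any, Callable, Dict, Tuple


class Router:
    """Hypothetical router that collects HTTP routes and event subscribers side by side."""

    def __init__(self, prefix: str = "") -> None:
        self.prefix = prefix
        self.http_routes: Dict[Tuple[str, str], Callable[..., Any]] = {}
        self.subscribers: Dict[str, Callable[..., Any]] = {}

    def get(self, path: str):
        def register(fn):
            self.http_routes[("GET", self.prefix + path)] = fn
            return fn
        return register

    def subscriber(self, topic: str):
        def register(fn):
            self.subscribers[topic] = fn
            return fn
        return register


class App:
    def __init__(self) -> None:
        self.http_routes: Dict[Tuple[str, str], Callable[..., Any]] = {}
        self.subscribers: Dict[str, Callable[..., Any]] = {}

    def include_router(self, router: Router) -> None:
        # Merge both kinds of handlers, the way router routes are merged into an app
        self.http_routes.update(router.http_routes)
        self.subscribers.update(router.subscribers)

    def handle_http(self, method: str, path: str) -> Any:
        return self.http_routes[(method, path)]()

    def handle_event(self, topic: str, payload: Dict[str, Any]) -> Any:
        return self.subscribers[topic](payload)


router = Router(prefix="/orders")


@router.get("/health")
def health() -> Dict[str, str]:
    return {"status": "ok"}


@router.subscriber("new-orders")
def on_new_order(payload: Dict[str, Any]) -> str:
    return f"processing order {payload['id']}"


app = App()
app.include_router(router)
print(app.handle_http("GET", "/orders/health"))   # {'status': 'ok'}
print(app.handle_event("new-orders", {"id": 7}))  # processing order 7
```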
- The new release of FastKafka supports Pydantic v2.0
Inspired by FastAPI, FastKafka uses the same paradigms for routing, validation, and documentation, making it easy to learn and integrate into your existing streaming data projects. Please check out the latest version, which adds support for the newly released Pydantic v2.0, making it significantly faster. https://github.com/airtai/fastkafka
- FastKafka: A Free Open-Source Python Library for Building Kafka-Based Services
DataProfiler
- LongRoPE: Extending LLM Context Window Beyond 2M Tokens
It's been possible to skip tokenization for a long time; my team and I did it here: https://github.com/capitalone/DataProfiler
For what it's worth, we were actually working with LSTMs with nearly a billion params back around 2016-2017. Transformers made these models far more efficient to train and run, but LSTMs can ultimately achieve similar results, though they are slower and require more training data.
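"Skipping tokenization" usually means feeding a model raw characters or bytes instead of subword tokens. The sketch below shows one common byte-level encoding under stated assumptions: UTF-8 bytes are shifted by one so that id 0 can serve as padding, giving a fixed vocabulary of 257 ids with no out-of-vocabulary handling needed. The function name `encode_chars` and the padding scheme are hypothetical and not taken from DataProfiler's code.

```python
from typing import List

PAD = 0  # reserved padding id; byte values are shifted to 1..256


def encode_chars(texts: List[str], max_len: int) -> List[List[int]]:
    """Map each string to UTF-8 byte ids (+1), truncated or right-padded to max_len."""
    batch = []
    for text in texts:
        ids = [b + 1 for b in text.encode("utf-8")[:max_len]]
        ids += [PAD] * (max_len - len(ids))
        batch.append(ids)
    return batch


print(encode_chars(["hi", "a@b.co"], max_len=4))
# [[105, 106, 0, 0], [98, 65, 99, 47]]
```

The trade-off is the one the comment hints at: sequences get longer than with subword tokens, so the model (LSTM or otherwise) has more steps to process per input.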
- Data Profiler: What's in your data?
- Data Profiler 0.9.0 -- offering a massive improvement to memory usage during profiling of large datasets
Great call out -- would you be willing to write up an issue for that on the repo? Thank you! https://github.com/capitalone/DataProfiler/issues/new/choose
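One general way to bound memory when profiling a large dataset is to read it in chunks and keep only small, mergeable summaries per column. The sketch below does this for count, mean, min, max, and variance using the pairwise merge formula of Chan et al. It is a hypothetical illustration of the technique, not a description of what changed in Data Profiler 0.9.0; `ColumnStats`, `chunks`, and `profile_stream` are invented names.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class ColumnStats:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    @classmethod
    def from_chunk(cls, values: List[float]) -> "ColumnStats":
        n = len(values)
        if n == 0:
            return cls()
        mean = sum(values) / n
        m2 = sum((v - mean) ** 2 for v in values)
        return cls(n, mean, m2, min(values), max(values))

    def merge(self, other: "ColumnStats") -> "ColumnStats":
        # Chan et al. pairwise update: combine two summaries without revisiting rows
        if other.count == 0:
            return self
        if self.count == 0:
            return other
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta ** 2 * self.count * other.count / n
        return ColumnStats(n, mean, m2,
                           min(self.minimum, other.minimum),
                           max(self.maximum, other.maximum))

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 denominator)
        return self.m2 / (self.count - 1) if self.count > 1 else float("nan")


def chunks(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield lists of at most `size` values, so only one chunk is held in memory."""
    buf: List[float] = []
    for v in values:
        buf.append(v)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf


def profile_stream(values: Iterable[float], chunk_size: int = 1000) -> ColumnStats:
    total = ColumnStats()
    for chunk in chunks(values, chunk_size):
        total = total.merge(ColumnStats.from_chunk(chunk))
    return total


stats = profile_stream(range(1, 11), chunk_size=3)
print(stats.count, stats.mean, stats.minimum, stats.maximum)  # 10 5.5 1 10
```

Because `merge` is associative, the same summaries could also be computed on separate workers or files and combined afterwards.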
- FLiPN-FLaNK Stack Weekly for 20 March 2023
- Release 0.8.3 · capitalone/DataProfiler
What are some alternatives?
cookiecutter-faststream - Cookiecutter template for FastStream apps
ydata-profiling - 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
RealtimeTTS - Converts text to speech in realtime
pyWhat - Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is!
datagen - Generate authentic looking mock data based on a SQL, JSON or Avro schema and produce to Kafka in JSON or Avro format.
usaddress - A Python library for parsing unstructured United States address strings into address components
aiokafka - asyncio client for kafka
XlsxWriter - A Python module for creating Excel XLSX files.
Propan - Propan is a powerful and easy-to-use Python framework for building event-driven applications that interact with any MQ Broker
superset - Apache Superset is a Data Visualization and Data Exploration Platform
pythagora - Generate automated tests for your Node.js app via LLMs without developers having to write a single line of code.
vtuber-livechat-dataset - VTuber 1B: Billion-scale Live Chat and Moderation Event Dataset