smart_open vs Sidekiq
| | smart_open | Sidekiq |
|---|---|---|
| Mentions | 6 | 91 |
| Stars | 3,091 | 12,940 |
| Growth | 0.7% | 0.2% |
| Activity | 8.3 | 8.9 |
| Last commit | 12 days ago | 6 days ago |
| Language | Python | Ruby |
| License | MIT License | GNU Lesser General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
smart_open
- smart_open: Utils for streaming large files (S3, HDFS, gzip, bz2...)
- Use AWS to unzip all of Wikipedia in 10 minutes
We’re using smart_open, which is an amazing library that lets you open objects in S3 (and other cloud object stores) as if they’re files on your filesystem. It’s obviously critical that we’re able to seek to an arbitrary position in an S3 file without first downloading the whole thing. We’ll assume you’re using Poetry, but you should be able to follow along with any other package manager:
- Using AWS and Hyperscan to match regular expressions on 100GB of text
If you didn’t follow along with the first article in this series, you should be able to follow this article with your own dataset as long as you install smart_open and Meadowrun. smart_open is an amazing library that lets you open objects in S3 (and other cloud object stores) as if they’re files on your filesystem, and Meadowrun makes it easy to run your Python code on the cloud.
- Ask HN: Codebases with great, easy to read code?
I see that you're primarily looking into Python work, so I'd recommend `smart_open` as a nice, compact way to get started.
https://github.com/RaRe-Technologies/smart_open
- How to open an s3 binary file in lambda using python open() function?
You want smart_open. It gives you a (more complete) file-like interface to many different storage systems, including s3. You can read and seek as needed.
- Fsspec: Filesystem Interfaces for Python
See also smart_open: https://github.com/RaRe-Technologies/smart_open which might be more user-friendly? Never used it myself but it was on HN before. Discussion on their bugtracker: https://github.com/RaRe-Technologies/smart_open/issues/579
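Several of the mentions above describe smart_open as a file-like interface that supports reading and seeking. A minimal sketch of that usage pattern follows, under these assumptions: `smart_open.open` is used as a drop-in for the built-in `open()`, the sketch falls back to the built-in when smart_open is not installed, and a local temporary file stands in for a remote object. The helper name `read_range` is hypothetical. For a real S3 object you would pass an `s3://...` URI instead, which additionally requires AWS credentials and smart_open's S3 dependencies.

```python
import os
import tempfile

try:
    # smart_open.open is designed as a drop-in replacement for the built-in open().
    from smart_open import open as so_open
except ImportError:
    # Fallback so the sketch still runs without smart_open installed.
    so_open = open


def read_range(uri, offset, length):
    """Seek to `offset` in the object at `uri` and read `length` bytes."""
    with so_open(uri, "rb") as f:
        f.seek(offset)
        return f.read(length)


# A local temporary file stands in for a remote object here.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"0123456789abcdef")
    path = tmp.name

chunk = read_range(path, offset=10, length=4)
os.remove(path)
print(chunk)  # b'abcd'
```

The point of the pattern is that only the requested range is read after the seek, which is what makes random access into large remote files practical.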
Sidekiq
- solid_queue alternatives - Sidekiq and good_job
3 projects | 21 Apr 2024
I'd say Sidekiq is the top competitor here.
- Valkey Is Rapidly Overtaking Redis
There's something wrong at Redislabs, it took them over a year to get RESP3 rolled out into their hosted service, you'd expect a rollout of that to be a bit quicker when they're the owner of Redis.
It affected us when upgrading Sidekiq to version 7, which dropped support for older Redis, and their Envoy proxy setup didn't support HELLO and RESP3: https://github.com/sidekiq/sidekiq/issues/5594
- Redis Re-Implemented with SQLite
That depends on how the `maxmemory-policy` is configured, and queue systems based on Redis will tell you not to allow eviction. https://github.com/sidekiq/sidekiq/wiki/Using-Redis#memory (it even logs a warning if it detects your Redis is misconfigured IIRC).
- 3 one-person million dollar online businesses
Sidekiq https://sidekiq.org/: This one started as an open source project. Once it got enough traction, the developer made a premium version of it and now makes money by selling licenses to businesses.
- Choose Postgres Queue Technology
Sidekiq will drop in-progress jobs when a worker crashes. Sidekiq Pro can recover those jobs but with a large delay. Sidekiq is excellent overall but it’s not suitable for processing critical jobs with a low-latency guarantee.
https://github.com/sidekiq/sidekiq/wiki/Reliability
- We built the fastest CI in the world. It failed
> I'm not sure feature withholding has traditionally worked out well in the developer space.
I think it's worked out well for Sidekiq (https://sidekiq.org). I really like their model of layering valuable features between the OSS / Pro / Enterprise licenses.
- Exploring concurrent rate limiters, mutexes, semaphores
I was studying Sidekiq's page on rate limiters. The first type of rate limiting mentioned is the concurrent limiter: only n tasks are allowed to run at any point in time. Note that this is independent of time units (e.g. per second) and of how long the tasks take to run. The only limitation is the number of concurrent tasks/requests.
- Ask HN: What are some of the most elegant codebases in your favorite language?
- Sidekiq and managing resumable jobs?
- Organize Business Logic in Your Ruby on Rails Application
The code above isn't idempotent. If you run it twice, it will create two copies, which is probably not what you intended. Why is this important? Because most backend job processors like Sidekiq don't make any guarantees that your jobs will run exactly once.
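The last excerpt refers to code that is not reproduced on this page, so the following is only an illustrative sketch of the idea it describes: a job written so that running it twice has the same effect as running it once. It is in Python rather than Ruby and does not use Sidekiq's API. `CopyStore`, `create_copy_once`, and `duplicate_record_job` are hypothetical names, and the in-memory dict is a stand-in for a database table with a unique constraint.

```python
class CopyStore:
    """In-memory stand-in for a table with a unique constraint on the source id."""

    def __init__(self):
        self.copies = {}

    def create_copy_once(self, source_id, payload):
        # Only insert when no copy exists for this source; a real system
        # would enforce this with a unique index rather than a dict lookup.
        if source_id in self.copies:
            return False
        self.copies[source_id] = dict(payload)
        return True


def duplicate_record_job(store, source_id, payload):
    """Hypothetical job body: safe to run more than once for the same source_id."""
    created = store.create_copy_once(source_id, payload)
    return "created" if created else "skipped"


store = CopyStore()
first = duplicate_record_job(store, 42, {"title": "Report"})
second = duplicate_record_job(store, 42, {"title": "Report"})  # simulated retry
print(first, second, len(store.copies))  # created skipped 1
```

Keying the write on the source record means a retried or duplicated job becomes a no-op instead of producing a second copy.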
What are some alternatives?
s3fs - Amazon S3 filesystem for PyFilesystem2
Resque - Resque is a Redis-backed Ruby library for creating background jobs, placing them on multiple queues, and processing them later.
Streamz - Real-time stream processing for python
Sneakers - A fast background processing framework for Ruby and RabbitMQ
s3path - s3path is a pathlib extension for AWS S3 Service
Shoryuken - A super efficient Amazon SQS thread based message processor for Ruby
PyFilesystem2 - Python's Filesystem abstraction layer
Sucker Punch - Sucker Punch is a Ruby asynchronous processing library using concurrent-ruby, heavily influenced by Sidekiq and girl_friday.
rxsci - ReactiveX for data science
Apache Kafka - Mirror of Apache Kafka
fluvio-client-python - The Fluvio Python Client!
celery - Distributed Task Queue (development branch)