chatnoir-resiliparse
A robust web archive analytics toolkit (by chatnoir-eu)
preshed
💥 Cython hash tables that assume keys are pre-hashed (by explosion)
 | chatnoir-resiliparse | preshed
---|---|---
Mentions | 2 | 1
Stars | 42 | 78
Growth | - | -
Activity | 7.5 | 4.1
Last commit | 6 days ago | 6 months ago
Language | Cython | Cython
License | Apache License 2.0 | MIT License
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
chatnoir-resiliparse
Posts with mentions or reviews of chatnoir-resiliparse.
We have used some of these posts to build our list of alternatives
and similar projects.
-
Selenium over scrapy
bs4 is a little slow. Try https://github.com/chatnoir-eu/chatnoir-resiliparse: it's faster for working with the DOM, written in Cython, and based on lexbor (written in C and very fast).
-
Would I ever need anything besides Python (not pro)
I've been working on this for the last several days and learning a lot. I'm actually moving away from dask to Python multiprocessing: for extremely fast functions written in Cython, the overhead of adding them to a dask task graph sometimes makes things slower than running sequentially. At least that's what experiments are showing: https://github.com/chatnoir-eu/chatnoir-resiliparse/issues/23
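The overhead issue described in the post above can be illustrated with a minimal sketch using only the standard library. It is not taken from the linked issue: `fast_function`, `process_chunk`, `chunked`, and `run_parallel` are hypothetical names, and `fast_function` is a plain-Python stand-in for a very cheap Cython call. The idea is to hand each worker a whole batch of inputs so that scheduling and pickling costs are paid once per batch rather than once per call.

```python
import multiprocessing as mp


def fast_function(x: int) -> int:
    # Hypothetical stand-in for a very cheap Cython call.
    return x * x


def process_chunk(chunk: list[int]) -> list[int]:
    # One task handles a whole batch, so per-task overhead
    # (scheduling, pickling) is amortized over many cheap calls.
    return [fast_function(x) for x in chunk]


def chunked(items: list[int], size: int) -> list[list[int]]:
    # Split the input into consecutive batches of at most `size` items.
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_parallel(items: list[int], workers: int = 2, chunk_size: int = 1000) -> list[int]:
    # Each worker process receives whole chunks instead of single items.
    with mp.Pool(processes=workers) as pool:
        results = pool.map(process_chunk, chunked(items, chunk_size))
    # Flatten the per-chunk results back into one list, preserving order.
    return [y for chunk in results for y in chunk]


if __name__ == "__main__":
    print(run_parallel(list(range(10)), workers=2, chunk_size=4))
```

Whether batching beats a sequential loop depends on how cheap the function is and how large the chunks are, so it is worth timing both on real data.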
preshed
Posts with mentions or reviews of preshed.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-07-31.
-
Is anyone using PyPy for real work?
If you have very large dicts, you might find this hash table I wrote for spaCy helpful: https://github.com/explosion/preshed . You need to key the data with 64-bit keys. We use this wrapper around murmurhash for it: https://github.com/explosion/murmurhash
There are no docs, so obviously this might not be for you. But the software does work and is efficient. It has been executed many, many millions of times now.
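To make the "pre-hashed 64-bit keys" idea from the post above concrete, here is a minimal sketch that uses only standard-library stand-ins. It does not call preshed or murmurhash: `hashlib.blake2b` with an 8-byte digest takes the place of the murmurhash wrapper, a plain `dict` takes the place of the preshed table, and `hash64` is a hypothetical helper name. The point is only to show the pattern of hashing each string once and using the resulting integer as the key.

```python
import hashlib


def hash64(text: str) -> int:
    # Assumption: blake2b stands in for murmurhash here.
    # An 8-byte digest yields an unsigned 64-bit integer key.
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


# A plain dict stands in for the hash table; keys are already hashed,
# so the table never needs to see or store the original strings.
table: dict[int, int] = {}
for word in ["apple", "banana", "apple"]:
    key = hash64(word)
    table[key] = table.get(key, 0) + 1
```

Looking up a count then means hashing the query string the same way, for example `table[hash64("apple")]`. Because only the hash is stored, two different strings that hash to the same 64-bit value would be indistinguishable, which is the trade-off of this design.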