countwords
fast-sqlite3-inserts
| | countwords | fast-sqlite3-inserts |
|---|---|---|
| Mentions | 5 | 11 |
| Stars | 4 | 363 |
| Growth | - | - |
| Activity | 2.6 | 0.0 |
| Latest commit | 6 months ago | about 1 year ago |
| Language | Rust | Rust |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
countwords
- Are there benchmark results of current Forth implementations (interpreted & compiled)?
-
Open any file as bytes
See an example: https://github.com/kimono-koans/countwords/blob/master/rust/fast-simple/main.rs
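The linked `main.rs` is not reproduced here. As a hedged sketch of the general idea only: `std::fs::read` returns the whole file as a `Vec<u8>`, so files that are not valid UTF-8 can still be processed, and words can be split on ASCII whitespace directly at the byte level. The function names below are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Count words in raw bytes, treating any run of ASCII whitespace as a separator.
/// No UTF-8 validation happens, so arbitrary binary content is accepted.
pub fn count_words_bytes(data: &[u8]) -> usize {
    data.split(|b| b.is_ascii_whitespace())
        // Consecutive separators yield empty slices; skip them.
        .filter(|word| !word.is_empty())
        .count()
}

/// Read a whole file into a Vec<u8> (any file, valid UTF-8 or not) and count its words.
pub fn count_words_in_file(path: &Path) -> io::Result<usize> {
    let data = fs::read(path)?;
    Ok(count_words_bytes(&data))
}
```

For very large inputs, reading in fixed-size chunks would bound memory use, at the cost of handling words that straddle chunk boundaries.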
-
I/O is no longer the bottleneck
This is truly 1978 all over again. No flame graphs, no hardware counters, no bottleneck analysis. Using these 'optimizations' for job interviews is questionable at best.
[1] https://benhoyt.com/writings/count-words/
-
Correct name for word matching problem
This might actually be interesting to you: https://benhoyt.com/writings/count-words/
-
Performance comparison: counting words in Python, C/C++, Awk, Rust, and more
In case anyone is interested, I wrote an optimized but much simpler Rust implementation just today[0], which is faster than the optimized implementation on my machine. No indexing into arrays of bytes, etc., no "code golf" measures.
It looks like idiomatic Rust, which I think is interesting. It shows there is more than one way to skin a cat.
[0]: https://github.com/kimono-koans/countwords/blob/master/rust/...
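The linked implementation is not reproduced here (its path is truncated above). What follows is a hypothetical sketch in the same idiomatic spirit, assuming the usual countwords task rules: lowercase the input, split on whitespace (ASCII whitespace is assumed here), and print each word with its count, most frequent first. The function names are illustrative.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead, Write};

/// Count word frequencies: lowercase each line, split on ASCII whitespace,
/// and return (word, count) pairs sorted by descending count.
pub fn count_words<R: BufRead>(input: R) -> io::Result<Vec<(String, usize)>> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for line in input.lines() {
        // Lowercase once per line rather than once per word.
        let line = line?.to_lowercase();
        for word in line.split_ascii_whitespace() {
            // Look up by &str first so already-seen words cost no allocation.
            if let Some(count) = counts.get_mut(word) {
                *count += 1;
            } else {
                counts.insert(word.to_owned(), 1);
            }
        }
    }
    let mut ordered: Vec<(String, usize)> = counts.into_iter().collect();
    // Most frequent first; ties broken alphabetically so output is deterministic.
    ordered.sort_unstable_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    Ok(ordered)
}

/// Write "word count" lines to any writer (stdout, a file, a Vec<u8>, ...).
pub fn write_counts<W: Write>(mut out: W, counts: &[(String, usize)]) -> io::Result<()> {
    for (word, count) in counts {
        writeln!(out, "{} {}", word, count)?;
    }
    Ok(())
}
```

A program would call `count_words(io::stdin().lock())` and pass the result to `write_counts(io::BufWriter::new(io::stdout().lock()), &counts)`. Note that `lines()` returns an error on input that is not valid UTF-8.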
fast-sqlite3-inserts
-
SQLite performance tuning: concurrent reads, multiple GBs and 100k SELECTs/s
I am experimenting with SQLite, trying to insert 1B rows in under a minute. The current best is 100M rows in 23s. I cut many corners to get this performance, but some of the tweaks might suit your workload.
I have explained my rationale and approach here - https://avi.im/blag/2021/fast-sqlite-inserts/
the repo link - https://github.com/avinassh/fast-sqlite3-inserts
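For scale, simple arithmetic on the figures in this comment (not a new measurement):

$$
\frac{10^{8}\ \text{rows}}{23\ \text{s}} \approx 4.35 \times 10^{6}\ \text{rows/s},
\qquad
\frac{10^{9}\ \text{rows}}{60\ \text{s}} \approx 1.67 \times 10^{7}\ \text{rows/s}
$$

So the one-minute goal needs roughly $1.67 \times 10^{7} / 4.35 \times 10^{6} \approx 3.8$ times the current best rate; at the current rate, 1B rows would take about $10^{9} / (4.35 \times 10^{6}) \approx 230$ s.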
-
I/O is no longer the bottleneck
I am working on a project [0] to generate 1 billion rows in SQLite in under a minute, and so far I have inserted 100M rows in 33 seconds. First, I generate the rows and insert them into an in-memory database, then flush it to disk at the end. Flushing to disk takes only 2 seconds, so 99% of the time is spent generating rows and adding them to the in-memory B-tree.
For Python optimisation, have you tried PyPy? I ran the same code (zero changes) under PyPy and got a 3.5x speedup.
I published my findings here [1].
[0] - https://github.com/avinassh/fast-sqlite3-inserts
[1] - https://avi.im/blag/2021/fast-sqlite-inserts/
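The comment says most of the time goes into generating rows and adding them to the in-memory B-tree. The sketch below covers only the generation side, in std-only Rust, and rests on loudly stated assumptions: the `Row` shape, the `user` table, the xorshift generator and the batch size are all hypothetical, and batching rows into multi-row `INSERT` statements is one common technique, not necessarily what the linked repository does. The statements could be executed on a `:memory:` connection through any SQLite binding; SQLite's `VACUUM INTO 'file.db'` (available since 3.27.0) is one way to write an in-memory database to disk afterwards.

```rust
/// Hypothetical row shape (area code, age bucket, active flag), assumed for
/// illustration only; the real project's schema may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Row {
    pub area: u32,
    pub age: u8,
    pub active: bool,
}

/// Minimal xorshift64 generator so the sketch needs only std. Not cryptographic.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift must never hold the all-zero state.
        XorShift64(seed.max(1))
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Generate one random row from a single 64-bit draw.
pub fn gen_row(rng: &mut XorShift64) -> Row {
    let r = rng.next_u64();
    Row {
        area: (r % 1_000_000) as u32,                 // up to 6 digits
        age: [5u8, 10, 15][((r >> 20) % 3) as usize], // one of three buckets
        active: ((r >> 40) & 1) == 1,
    }
}

/// Turn rows into multi-row INSERT statements of at most `batch` rows each,
/// so the database parses far fewer statements than there are rows.
pub fn batched_inserts(rows: &[Row], batch: usize) -> Vec<String> {
    assert!(batch > 0, "batch size must be positive");
    rows.chunks(batch)
        .map(|chunk| {
            let values: Vec<String> = chunk
                .iter()
                .map(|r| format!("('{:06}',{},{})", r.area, r.age, r.active as u8))
                .collect();
            format!("INSERT INTO user (area, age, active) VALUES {};", values.join(","))
        })
        .collect()
}
```

In practice, reusing a prepared statement with bound parameters is often faster than building SQL text, because SQLite does not have to re-parse each batch; the string version is shown only because it runs without external crates.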
- Ask HN: Which personal projects got you hired?
-
Is there any language that is as similar as possible to Python in syntax, readability, and features, but is statically typed?
I have a side project where I tried to insert one billion rows into SQLite. I was able to insert 100 million rows using Python in under 210 seconds. The same thing with PyPy took 120 seconds. I am wondering what kind of speed boost I would get with Cython.
- Ask for benchmark. The owner can’t verify an 18% perf gain, could you?
-
Inserting One Billion Rows in SQLite Under A Minute
Measure, measure, measure! There is a PR that made really minor changes, but it got a 2x speed boost with the CPython version.
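In the spirit of "measure, measure, measure", here is a minimal std-only timing sketch: run a workload several times, keep the fastest wall-clock time, and express a change as a speedup ratio. The helper names are hypothetical, and wall-clock timing is only a first step; it does not replace profilers or hardware counters. As a check on the ratio, the 210 s (CPython) and 120 s (PyPy) timings quoted earlier on this page correspond to a 1.75x speedup.

```rust
use std::time::{Duration, Instant};

/// Run `f` `runs` times and return the fastest wall-clock time.
/// Taking the minimum filters out some scheduler and cache noise.
pub fn best_of<F: FnMut()>(runs: usize, mut f: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .min()
        .expect("runs > 0, so there is at least one timing")
}

/// How many times faster `candidate` is than `baseline`.
pub fn speedup(baseline: Duration, candidate: Duration) -> f64 {
    baseline.as_secs_f64() / candidate.as_secs_f64()
}
```

Comparing `best_of(5, || old_version())` against `best_of(5, || new_version())` (both hypothetical workloads) with `speedup` gives a quick before/after number for a PR.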
- Inserting One Billion Rows in SQLite Under a Minute
- Weekly Coders, Hackers & All Tech related thread - 17/07/2021
-
How we achieved write speeds of 1.4 million rows per second
[somewhat related] Recently, I was benchmarking SQLite inserts and managed to insert 3.3M records per second (100M in roughly 33 seconds) on my local machine (https://github.com/avinassh/fast-sqlite3-inserts). Of course, the comparison is not apples to apples, but I'm sharing it here in case anyone finds it interesting.
What are some alternatives?
gccontent-benchmark - Benchmarking different languages for a simple bioinformatics task (Counting the GC fraction of DNA in a FASTA file)
tsbs - Time Series Benchmark Suite, a tool for comparing and evaluating databases for time series data
countwords - Playing with counting word frequencies (and performance) in various languages.
julia - The Julia Programming Language
plum - Multiple dispatch in Python
robin-hood-hashing - Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20
sqlite_micro_logger_arduino - Fast and Lean Sqlite database logger for Arduino UNO and above
huniq - Filter out duplicates on the command line. Replacement for `sort | uniq` optimized for speed (10x faster) when sorting is not needed.
remixdb - RemixDB: A read- and write-optimized concurrent KV store. Fast point and range queries. Extremely low write-amplification.
dynamic-dns - An automated dynamic DNS solution for Docker and DigitalOcean