|about 2 months ago||5 days ago|
|Mozilla Public License 2.0||Mozilla Public License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
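As a rough illustration of how such a number could be derived, the sketch below weights recent commits more heavily and then maps a project's raw value to a 0-10 scale by percentile rank. The site's actual formula is not given here: the exponential decay, the 30-day half-life, and both function names are assumptions made for illustration only.

```python
from datetime import date


def recency_weighted_activity(commit_dates, today, half_life_days=30.0):
    """Sum commit weights; a commit's weight halves every `half_life_days`.

    The half-life decay is an assumption, not the site's documented method.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_score(raw, all_raw):
    """Map a raw activity value to 0-10 by percentile rank among tracked projects.

    A score of 9.0 means the project is more active than 90% of the
    tracked projects, i.e. it is in the top 10%.
    """
    below = sum(1 for r in all_raw if r < raw)
    return round(10.0 * below / len(all_raw), 1)
```

With 100 tracked projects whose raw values are 0 to 99, a project with a raw value of 90 outranks 90 of them and scores 9.0 under this mapping.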
C++ for data analysis
2 projects | reddit.com/r/cpp | 18 May 2022
Fast Lane to Learning R
4 projects | news.ycombinator.com | 14 May 2022
I strongly recommend data.table for R. Tidyverse is an improvement on base R, no question. Data.table has less intuitive syntax and can be harder to learn, but is lightning fast and memory efficient. If you're working with more than 1M rows, you should be using data.table.
Here are some benchmarks: https://h2oai.github.io/db-benchmark/
Friendlier SQL with DuckDB
8 projects | news.ycombinator.com | 12 May 2022
Hi, good to hear that you guys care about testing. One thing apart from the GitHub issues that led me to believe it might not be super stable yet was the benchmark results on https://h2oai.github.io/db-benchmark/ which make it look like it couldn't handle the 50GB case due to an out-of-memory error. I see that the benchmark and the versions used are about a year old, so maybe things have changed a lot since then. Can you chime in regarding the current story of running bigger DBs like 1TB on a machine with just 32GB or so of RAM? Especially regarding data mutations and DDL queries. Thanks!
I used a new dataframe library (polars) to wrangle 300M prices and discover some of the most expensive hospitals in America. Code/notebook in article
2 projects | reddit.com/r/Python | 9 May 2022
Per these benchmarks it appears Polars is an order of magnitude more performant and it's lazy and Rust is just kinda sexy.
Benchmarking for loops vs apply and others
2 projects | reddit.com/r/rstats | 1 May 2022
This is a much more comprehensive set of benchmarks: https://h2oai.github.io/db-benchmark/
Why is R viewed badly over here? Also, as a college student, should I prioritize Python instead?
1 project | reddit.com/r/datascience | 29 Apr 2022
It's not like pandas is faster than tidyverse on all the benchmarks either, and data.table is faster than both. https://h2oai.github.io/db-benchmark/
Resources for data cleaning
2 projects | reddit.com/r/rstats | 13 Apr 2022
Language isn't really important here; what's important is tooling, and R definitely has the tooling. I would look at this benchmark reference for database-like operations, and you'll see that data.table (a very fast and memory-efficient R package) consistently ranks as one of the fastest tools out there that can also support a wide range of memory loads.
The fastest tool for querying large JSON files is written in Python (benchmark)
16 projects | news.ycombinator.com | 12 Apr 2022
Polars 0.20.0 release
2 projects | reddit.com/r/rust | 14 Mar 2022
How Easy It Is to Re-use Old Pandas Code in Spark 3.2
1 project | reddit.com/r/programming | 4 Feb 2022
It seems to me that the Spark model is much more sensible in terms of performance. In Spark, individual tasks are ultimately compiled into optimized Java code. As I understand how Dask works, a separate Python process is run for each data subset. Because of this architecture, Dask is unlikely to ever match Spark's performance. By the way, both systems build and optimize the operation graph. This is confirmed by benchmarks: https://h2oai.github.io/db-benchmark/
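To make "build and optimize the operation graph" concrete, here is a minimal toy sketch in plain Python. It is not Spark's or Dask's API: the `LazyPipeline` class and its methods are hypothetical, and the only "optimization" shown is fusing the recorded steps into a single pass over the data when the result is finally requested.

```python
class LazyPipeline:
    """Toy lazy pipeline: records operations and runs them only on collect()."""

    def __init__(self, data, ops=()):
        self._data = data
        self._ops = tuple(ops)

    def map(self, fn):
        # Record the step instead of executing it.
        return LazyPipeline(self._data, self._ops + (("map", fn),))

    def filter(self, pred):
        # Record the step instead of executing it.
        return LazyPipeline(self._data, self._ops + (("filter", pred),))

    def plan(self):
        """Return the recorded operation kinds, in order."""
        return [kind for kind, _ in self._ops]

    def collect(self):
        """Run all recorded steps fused into one pass over the data."""
        out = []
        for item in self._data:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out
```

For example, `LazyPipeline(range(10)).map(lambda x: x * 2).filter(lambda x: x % 3 == 0)` does no work until `collect()` is called, and then never materializes the intermediate mapped list.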
How to run python code in your browser
4 projects | dev.to | 14 May 2022
Datasette Lite: a server-side Python web application running in a browser
5 projects | news.ycombinator.com | 4 May 2022
I tried building this with a Service Worker first and it didn't work, because Pyodide needs XMLHttpRequest.
I opened an issue about that here: https://github.com/pyodide/pyodide/issues/2432
Python is in the browser. No idea if this will lead to chaos or harmony...
4 projects | reddit.com/r/ProgrammerHumor | 1 May 2022
Run Python in Your HTML via Pyodide
8 projects | news.ycombinator.com | 30 Apr 2022
But some other low hanging fruit include unvendoring the special encodings for Asian languages (hopefully everyone uses utf8), the decimal library, and the xml library which are all quite large and only occasionally used.
8 projects | news.ycombinator.com | 30 Apr 2022
This uses Pyodide under the hood, which is CPython compiled to WebAssembly. In all my tests of it, loading takes a long time (~5 seconds). Coldbrew, another distribution of CPython on Wasm, is another option with similar load times.
If load time is important, Brython is pretty nice. If feature completeness is important, Pyodide and Coldbrew are probably best.
WebAssembly in my Browser Desktop Environment
12 projects | dev.to | 28 Mar 2022
Python via Pyodide
Python 3.11 in the Web Browser
9 projects | news.ycombinator.com | 26 Mar 2022
Those interested in this should check out Pyodide. It basically "just works" so long as the libraries you import are pure Python or are part of the core scientific stack (the Pyodide authors have done the work to manually port all the C code behind numpy, scipy, etc.).
What I really wish for is for ~all Python packages to work in the browser without manual porting of the underlying C/Rust/etc. being needed, since a lot of the interesting and useful libraries aren't pure Python, and manual porting is non-trivial.
I'm not sure what the best route to that future is, but I'm guessing it'd probably help if Python had a wasm runtime in its standard library, since then authors of libraries that use C/Rust/etc. might make cross-platform builds (perhaps by default).
Regarding this Pycon speech, it seems that it's related to this entry in the 3.11 changelog, which the speaker was heavily involved with:
> CPython now has experimental support for cross compiling to WebAssembly platform wasm32-emscripten. The effort is inspired by previous work like Pyodide. (Contributed by Christian Heimes and Ethan Smith in bpo-40280)
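Code that has to run both natively and in the browser can check which kind of build it is on. The sketch below relies on `sys.platform` reporting `"emscripten"` on Emscripten builds of CPython, including Pyodide. The helper names `is_emscripten` and `describe_runtime` are hypothetical and exist only for illustration.

```python
import sys


def is_emscripten(platform: str = sys.platform) -> bool:
    """Return True when the interpreter is an Emscripten (WebAssembly) build."""
    return platform == "emscripten"


def describe_runtime(platform: str = sys.platform) -> str:
    """Give a short human-readable description of the current runtime."""
    if is_emscripten(platform):
        return "WebAssembly (Emscripten) build, e.g. in a browser"
    return f"native build on {platform}"
```

Taking the platform as a parameter keeps the check easy to test on a native interpreter. A library could use it to skip features such as subprocesses that are unavailable in the browser.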
Switching from law to computer science
2 projects | reddit.com/r/de_EDV | 24 Mar 2022
Jupyter in the Browser, with WebAssembly
1 project | news.ycombinator.com | 19 Mar 2022
Wow this is amazing. Now all I need to deploy static machine learning demos in documentation is https://github.com/pyodide/pyodide/issues/2198
Is it possible to use Python script on a html page?
1 project | reddit.com/r/learnprogramming | 2 Mar 2022
Yes, you can do that: https://github.com/pyodide/pyodide
What are some alternatives?
brython - Brython (Browser Python) is an implementation of Python 3 running in the browser
RustPython - A Python Interpreter written in Rust
arrow-datafusion - Apache Arrow DataFusion and Ballista query engines
PyWebIO - Write interactive web app in script way.
streamlit - Streamlit — The fastest way to build data apps in Python
polars - Fast multi-threaded DataFrame library in Rust | Python | Node.js
webview - Tiny cross-platform webview library for C/C++/Golang. Uses WebKit (Gtk/Cocoa) and Edge (Windows)
jupyterlite - Wasm powered Jupyter running in the browser 💡
yet-another-speed-dial - a modern speed dial for chrome, edge and firefox
hal9ai - Web-First Composable Data Apps
databend - A modern Elasticity and Performance cloud data warehouse, activate your object storage for real-time analytics.