Top 14 Python Analytic Projects
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Project mention: How often do you use SQL query tool or service in your daily work? | reddit.com/r/SQL | 2021-11-21
Regarding the subqueries: try https://tablum.io or https://redash.io, they materialize queried data so you can do a subquery multiple times.
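As a rough illustration of why materializing a query result helps, here is a minimal sketch using Python's built-in sqlite3 module. It is not how Tablum or Redash work internally; the `orders` table, its rows, and the `customer_totals` name are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10.0), ("ann", 30.0), ("bob", 5.0), ("cy", 50.0)],
)

# Materialize the subquery once as a temporary table...
conn.execute(
    "CREATE TEMP TABLE customer_totals AS "
    "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
)

# ...then reuse the stored result in as many follow-up queries as needed.
top_customer = conn.execute(
    "SELECT customer FROM customer_totals ORDER BY total DESC LIMIT 1"
).fetchone()[0]
above_average = [
    row[0]
    for row in conn.execute(
        "SELECT customer FROM customer_totals "
        "WHERE total > (SELECT AVG(total) FROM customer_totals) "
        "ORDER BY customer"
    )
]
```

The aggregation runs once; both later queries read the stored result instead of repeating the subquery.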
A Python based monitoring and tracking tool for Plex Media Server.
Project mention: Hardware transcoding sucks? | reddit.com/r/PleX | 2021-11-27
Use Tautulli. It can tell you why the stream is transcoding.
An orchestration platform for the development, production, and observation of data assets.
Project mention: Airflow 2.0 vs Prefect | reddit.com/r/dataengineering | 2021-10-20
It has been such a pleasure to use dagster. The testability is nice. It was designed to be type aware, so you can leverage type checks, and it is also designed to be data aware when it comes to passing data between tasks. One thing I don't like is its handling of cases where a task does not produce output but still needs to be declared as a dependency of another task, so you use its Nothing abstraction. The syntax for this situation is awkward IMO, and they've recognized that. Its UI, called dagit, is hands down the best, as it provides rich information on each task in your DAG. The developer experience is definitely better with dagster compared to Airflow. I briefly looked at Airflow 2.0 examples, and I still think dagster's API is better (with version 0.13.x). However, on the managed environment side, there is no third-party managed dagster provider; only Elementl, the creator of dagster, has a cloud offering, which is currently in beta. So there are no mature managed services for dagster yet. Again, this is due to dagster being a relatively new library, less than 3 years old.
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Project mention: Best approach to get MongoDB data into BigQuery in real-time? | reddit.com/r/bigquery | 2021-11-01
b) trying not to use 3rd party ETL services, but happy to use getdbt.com
Modern, privacy-friendly, and detailed web analytics that works without cookies or JS.
Project mention: Ask HN: Who wants to be hired? (October 2021) | news.ycombinator.com | 2021-10-01
I have a strong technical background and a passion for digital safety and privacy. Especially interested in trust & safety, privacy engineering, human-centered design, tech policy, and open source software. Looking for an internship or fellowship adjacent to trust & safety or privacy engineering.
Some of the projects I’m most proud of are Shynet, PrivacySpy, PolitiTweet, and a17t. I co-instruct CS 106S at Stanford, and I worked on cyber policy for a 2020 presidential campaign. I also work at the Stanford Internet Observatory on both research and technical infrastructure.
Location: NYC or SF Bay Area
Remote: yes but ideally no
Willing to relocate: no
Technologies: Python, Rust, JS, Java, Kubernetes, C, OSINT, web dev, and more.
Email: [email protected]
PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer.
Project mention: Don't Bring a Tree to a Mesh Fight | news.ycombinator.com | 2021-11-23
It's super useful in practice!
In the table -> hypergraph transform @ https://github.com/graphistry/pygraphistry , we do `hypergraph(multicolumn_table, direct=True | False)['graph'].plot()`, which renders a hypergraph as a regular graph and lets you pick how each hyperedge is represented. Consider exploring some logs of customer activity or security events:
A hyperedge becomes either:
- a node of a bipartite graph. Ex: each log event becomes a node connecting the various entity nodes it mentions (IPs, accounts, countries, ...)
- .. or a bunch of pairwise entity<>entity edges. Ex: connect each IP<>account<>country directly, and label each edge with the hyperedge it came from.
In both cases, you can now directly leverage a lot of traditional graph thinking, and in our case, GPU acceleration.
Other systems might render hyperedges as, say, circles encompassing their nodes, but that's trickier at even small/medium scales.
I increasingly just directly equate 'logs' with 'hypergraphs' and skip the relational step :)
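The two hyperedge representations described in this comment can be sketched in plain Python. This is only an illustration of the idea, not PyGraphistry's implementation: the sample `events` rows and the helper names `to_bipartite` and `to_direct` are hypothetical.

```python
from itertools import combinations

# Hypothetical log events: each row is one hyperedge linking several entities.
events = [
    {"ip": "10.0.0.1", "account": "alice", "country": "US"},
    {"ip": "10.0.0.2", "account": "alice", "country": "DE"},
]


def to_bipartite(rows):
    """Each event becomes its own node, linked to every entity it mentions."""
    edges = []
    for i, row in enumerate(rows):
        event_node = f"event:{i}"
        for column, value in row.items():
            edges.append((event_node, f"{column}:{value}"))
    return edges


def to_direct(rows):
    """Each event becomes pairwise entity<>entity edges labeled with the event."""
    edges = []
    for i, row in enumerate(rows):
        entities = [f"{column}:{value}" for column, value in row.items()]
        for src, dst in combinations(entities, 2):
            edges.append((src, dst, f"event:{i}"))
    return edges


bipartite_edges = to_bipartite(events)
direct_edges = to_direct(events)
```

Either edge list is an ordinary graph, so standard graph tooling applies to it. The bipartite form keeps each event as a visible node, while the direct form carries the event only as an edge label.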
A portfolio tracking, analytics, accounting and tax reporting application that protects your privacy.
Project mention: How do you track your portfolio? | reddit.com/r/CryptoCurrency | 2021-11-27
Just found a mature OSS at Rotki that is more than adequate.
A flexible, easy to use, automation framework allowing users to integrate their capabilities and devices to cut through the repetitive, tedious tasks slowing them down. #nsacyber
Project mention: Current college student here. What is it like to work for defense contractors? | reddit.com/r/cscareerquestions | 2021-11-10
As for quirks, the biggest quirk is that you usually need to get a security clearance, and that means no drugs. As far as the tech goes, depends on what company you're working for and what government product they produce. If it's software for an otherwise physical product like a missile or an AGV, then it's probably gonna be some old stable language like C, with something like Java being used on the server side to talk to the machine. Meanwhile, there's definitely Python work sprinkled all throughout everything, and there's certainly parts of the government working on Docker or Kubernetes stuff. Like here's a completely unclassified government project that I've contributed to. It uses Docker and Yaml to automate tasks.
A Flask profiler which watches endpoint calls and tries to make some analysis.
Project mention: Profiling Flask application to improve performance | dev.to | 2021-02-28
There are a lot of profiling tools for Python code, and some of them are built-in, like profile or cProfile. Since I’m speaking about a Flask application, let’s see what the world has especially for it. There is a beautiful lib called flask-profiler, which has a web interface with some cool features such as route or date filters. But Flask also has a profiler built in, via werkzeug. It looked awesomely easy to use, so it was the first, and the last, one I tried. To use the built-in profiler you’ll need to add only two lines of code to your project.
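Werkzeug ships a `ProfilerMiddleware` class in `werkzeug.middleware.profiler`, and wrapping a Flask app's `wsgi_app` with it is presumably what the two lines refer to (an assumption, since the quoted post's code is not shown here). To show what such a middleware does, here is a stdlib-only sketch built on cProfile. It is not werkzeug's implementation, and `profiling_middleware` and `demo_app` are hypothetical names.

```python
import cProfile
import io
import pstats


def profiling_middleware(wsgi_app, stream, limit=10):
    """Wrap a WSGI app so each request is profiled and reported to `stream`."""

    def wrapped(environ, start_response):
        profiler = cProfile.Profile()
        # runcall executes the app under the profiler and returns its result;
        # list() consumes the response iterable so its work is measured too.
        body = profiler.runcall(lambda: list(wsgi_app(environ, start_response)))
        stats = pstats.Stats(profiler, stream=stream)
        stats.sort_stats("cumulative").print_stats(limit)
        return body

    return wrapped


def demo_app(environ, start_response):
    """Hypothetical stand-in for a Flask application's wsgi_app."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


report = io.StringIO()
app = profiling_middleware(demo_app, report)
response = app({"PATH_INFO": "/"}, lambda status, headers: None)
```

With a real Flask app, the same wrapper could be applied as `app.wsgi_app = profiling_middleware(app.wsgi_app, sys.stdout)`, so every request prints its slowest calls.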
Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application.
Project mention: Intel Extension for Scikit-Learn | news.ycombinator.com | 2021-11-01
Looks like they are responding to https://github.com/intel/scikit-learn-intelex#-acceleration
I completely agree. I hope some Intel competitor funds a scikit-learn developer to read this code and extract all the portable performance improvements.
Hashcat web interface
Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more.
Project mention: Facebook bans personal accounts of academics who researched misinformation, ad transparency on the social network | reddit.com/r/technology | 2021-08-04
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Project mention: Average reply times from some of my Facebook friends over the last few years [OC], full article here: https://medium.com/@timsugaipov/taking-your-facebook-messenger-data-further-f9da079b1409?source=friends_link&sk=3bd04bb35ad9a4b6f586300e52f96e4f | reddit.com/r/dataisbeautiful | 2021-11-01
Data Processing: SAYN
Welcome to my first full project!
Project mention: Google Summer of Code Analytics | dev.to | 2021-10-13
I used Python 3 with the Selenium and BeautifulSoup4 libraries. My project does not use the click feature; instead, it grabs the Organization ID from the internal HTML.
Python Analytics related posts
Open Source Analytics Stack: Bringing Control, Flexibility, and Data-Privacy to Your Analytics
15 projects | dev.to | 25 Nov 2021
Don't Bring a Tree to a Mesh Fight
1 project | news.ycombinator.com | 23 Nov 2021
How often do you use SQL query tool or service in your daily work?
1 project | reddit.com/r/SQL | 21 Nov 2021
Average reply times from some of my Facebook friends over the last few years [OC], full article here: https://medium.com/@timsugaipov/taking-your-facebook-messenger-data-further-f9da079b1409?source=friends_link&sk=3bd04bb35ad9a4b6f586300e52f96e4f
1 project | reddit.com/r/dataisbeautiful | 1 Nov 2021
Ask HN: How do you develop internal tools for your organization?
3 projects | news.ycombinator.com | 31 Oct 2021
Tracking the facebook instant game events
1 project | reddit.com/r/dataengineering | 24 Oct 2021
Best Dashboard Advice
3 projects | reddit.com/r/BusinessIntelligence | 24 Oct 2021