Python Analytics

Open-source Python projects categorized as Analytics

Top 14 Python Analytic Projects

  • GitHub repo Redash

    Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

    Project mention: How often do you use SQL query tool or service in your daily work? | reddit.com/r/SQL | 2021-11-21

    Regarding the subqueries: try https://tablum.io or https://redash.io, they materialize queried data so you can do a subquery multiple times.

  • GitHub repo Tautulli

    A Python based monitoring and tracking tool for Plex Media Server.

    Project mention: Hardware transcoding sucks? | reddit.com/r/PleX | 2021-11-27

    Use Tautulli. It can tell you why the stream is transcoding.

  • GitHub repo dagster

    An orchestration platform for the development, production, and observation of data assets.

    Project mention: Airflow 2.0 vs Prefect | reddit.com/r/dataengineering | 2021-10-20

    It has been such a pleasure to use dagster. The testability is nice. It was designed to be type aware, so you can leverage type checks, and it is also designed to be data aware when passing data between tasks. One thing I don't like is its handling of cases where a task produces no output but another task still needs to depend on it; for that you use its Nothing abstraction. The syntax for this situation is awkward IMO, and they've recognized that. Its UI, called dagit, is hands down the best, as it provides rich information on each task in your DAG. The developer experience is definitely better with dagster than with Airflow. I briefly looked at Airflow 2.0 examples, and I still think dagster's API is better (as of version 0.13.x). However, on the managed-environment side, there is no third-party managed dagster provider; the only option is the cloud offering from Elementl, the creator of dagster, which is currently in beta. So there are no mature managed services for dagster yet. Again, this is due to dagster being a relatively new library, less than 3 years old.

  • GitHub repo dbt-core

    dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

    Project mention: Best approach to get MongoDB data into BigQuery in real-time? | reddit.com/r/bigquery | 2021-11-01

    b) trying not to use 3rd party ETL services, but happy to use getdbt.com

  • GitHub repo Shynet

    Modern, privacy-friendly, and detailed web analytics that works without cookies or JS.

    Project mention: Ask HN: Who wants to be hired? (October 2021) | news.ycombinator.com | 2021-10-01

    I have a strong technical background and a passion for digital safety and privacy. Especially interested in trust & safety, privacy engineering, human-centered design, tech policy, and open source software. Looking for an internship or fellowship adjacent to trust & safety or privacy engineering.

    Some of the projects I’m most proud of are Shynet [0], PrivacySpy [1], PolitiTweet [2], and a17t [3]. I co-instruct CS 106S [4] at Stanford, and I worked on cyber policy for a 2020 presidential campaign. I also work at the Stanford Internet Observatory on both research and technical infrastructure.

    Location: NYC or SF Bay Area

    Remote: yes but ideally no

    Willing to relocate: no

    Technologies: Python, Rust, JS, Java, Kubernetes, C, OSINT, web dev, and more.

    Résumé/CV: https://docs.google.com/document/d/1wnIBWjEPmgdYXQ_EYZRU--KE...

    Email: [email protected]

    [0] https://github.com/milesmcc/shynet

    [1] https://privacyspy.org

    [2] https://polititweet.org

    [3] https://a17t.miles.land

    [4] http://cs106s.stanford.edu

  • GitHub repo pygraphistry

    PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer

    Project mention: Don't Bring a Tree to a Mesh Fight | news.ycombinator.com | 2021-11-23

    It's super useful in practice!

    In the table -> hypergraph transform @ https://github.com/graphistry/pygraphistry , we do `hypergraph(multicolumn_table, direct=True | False)['graph'].plot()` , which renders hypergraphs as a regular graph, this lets you pick/. Consider exploring some logs of customer activity or security events:

    A hyperedge becomes either:

    - a node of a bipartite graph. Ex: each log event becomes a node connecting the various entity nodes it mentions (IPs, accounts, countries, ...)

    - .. or a bunch of pairwise entity<>entity edges. Ex: connect each IP<>account<>country directly, and label each edge with the hyperedge it came from.

    In both cases, you can now directly leverage a lot of traditional graph thinking, and in our case, GPU acceleration.
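
    The two readings of a hyperedge described above can be sketched in plain Python. This is only an illustration of the idea, not PyGraphistry's implementation: the sample events, the `ENTITY_COLUMNS` tuple, and the helper names `bipartite_edges` and `direct_edges` are all hypothetical.

```python
from itertools import combinations

# Hypothetical log events; each row is one hyperedge over its entity columns.
events = [
    {"id": "e1", "ip": "10.0.0.1", "account": "alice", "country": "US"},
    {"id": "e2", "ip": "10.0.0.2", "account": "alice", "country": "DE"},
]
ENTITY_COLUMNS = ("ip", "account", "country")


def entity_nodes(row):
    """Name each entity node as 'column::value' so values from different columns never collide."""
    return [f"{col}::{row[col]}" for col in ENTITY_COLUMNS]


def bipartite_edges(rows):
    """Option 1: the event itself becomes a node linked to every entity it mentions."""
    return [(row["id"], entity) for row in rows for entity in entity_nodes(row)]


def direct_edges(rows):
    """Option 2: link entities pairwise and label each edge with its source event."""
    edges = []
    for row in rows:
        for left, right in combinations(entity_nodes(row), 2):
            edges.append((left, right, row["id"]))
    return edges


for edge in bipartite_edges(events):
    print("bipartite:", edge)
for edge in direct_edges(events):
    print("direct:   ", edge)
```

    In both outputs, `account::alice` is shared between the two events, which is what lets ordinary graph techniques connect otherwise separate log lines.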

    Other systems might render hyperedges as, say, circles encompassing their nodes, but that's trickier at even small/medium scales.

    I increasingly just directly equate 'logs' with 'hypergraphs' and skip the relational step :)

  • GitHub repo rotki

    A portfolio tracking, analytics, accounting and tax reporting application that protects your privacy

    Project mention: How do you track your portfolio? | reddit.com/r/CryptoCurrency | 2021-11-27

    Just found a mature OSS at Rotki that is more than adequate.

  • GitHub repo WALKOFF

    A flexible, easy to use, automation framework allowing users to integrate their capabilities and devices to cut through the repetitive, tedious tasks slowing them down. #nsacyber

    Project mention: Current college student here. What is it like to work for defense contractors? | reddit.com/r/cscareerquestions | 2021-11-10

    As for quirks, the biggest quirk is that you usually need to get a security clearance, and that means no drugs. As far as the tech goes, depends on what company you're working for and what government product they produce. If it's software for an otherwise physical product like a missile or an AGV, then it's probably gonna be some old stable language like C, with something like Java being used on the server side to talk to the machine. Meanwhile, there's definitely Python work sprinkled all throughout everything, and there's certainly parts of the government working on Docker or Kubernetes stuff. Like here's a completely unclassified government project that I've contributed to. It uses Docker and Yaml to automate tasks.

  • GitHub repo flask-profiler

    a flask profiler which watches endpoint calls and tries to make some analysis.

    Project mention: Profiling Flask application to improve performance | dev.to | 2021-02-28

    There are a lot of profiling tools for Python code, and some of them are built in, like profile or cProfile. Since I'm speaking about a Flask application, let's see what the world has especially for it. There is a beautiful lib called flask-profiler, which has a web interface with some cool features such as route or date filters. But Flask also has a built-in profiler, provided by Werkzeug. It looked awesomely easy to use, so it was the first (and the last) one I tried. To use the built-in profiler you'll need to add only two lines of code to your project:
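
    The excerpt stops before the two lines it promises, and they are left out here rather than guessed. As a rough illustration of the underlying idea (profiling each request by wrapping the WSGI callable), here is a minimal sketch that uses only the standard library's cProfile and pstats. The names `profiled` and `hello_app` are hypothetical, and this is not Werkzeug's or flask-profiler's implementation.

```python
import cProfile
import io
import pstats


def profiled(wsgi_app, stream, limit=10):
    """Wrap a WSGI callable and write a cProfile report to `stream` for each request."""
    def wrapper(environ, start_response):
        profiler = cProfile.Profile()
        # list() forces the response body to be produced inside the profiled call,
        # so lazily generated bodies are measured as well.
        body = profiler.runcall(lambda: list(wsgi_app(environ, start_response)))
        stats = pstats.Stats(profiler, stream=stream)
        stats.sort_stats("cumulative").print_stats(limit)
        return body
    return wrapper


def hello_app(environ, start_response):
    # Hypothetical stand-in for a real application's WSGI callable.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


report = io.StringIO()
app = profiled(hello_app, report)
response = app(
    {"REQUEST_METHOD": "GET", "PATH_INFO": "/"},
    lambda status, headers: None,  # dummy start_response for the demo call
)
print(response)
print(report.getvalue())
```

    With a Flask application, the same wrapper could presumably be attached as `app.wsgi_app = profiled(app.wsgi_app, sys.stderr)`; treat that line as an assumption rather than a tested recipe.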

  • GitHub repo scikit-learn-intelex

    Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application

    Project mention: Intel Extension for Scikit-Learn | news.ycombinator.com | 2021-11-01

    Looks like they are responding to https://github.com/intel/scikit-learn-intelex#-acceleration

    I completely agree. I hope some Intel competitor funds a scikit-learn developer to read this code and extract all the portable performance improvements.

  • GitHub repo WebHashcat

    Hashcat web interface

  • GitHub repo reddit-detective

    Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more

    Project mention: Facebook bans personal accounts of academics who researched misinformation, ad transparency on the social network | reddit.com/r/technology | 2021-08-04

  • GitHub repo sayn

    Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).

    Project mention: Average reply times from some of my Facebook friends over the last few years [OC], full article here: https://medium.com/@timsugaipov/taking-your-facebook-messenger-data-further-f9da079b1409?source=friends_link&sk=3bd04bb35ad9a4b6f586300e52f96e4f | reddit.com/r/dataisbeautiful | 2021-11-01

    Data Processing: SAYN

  • GitHub repo GSOC_org_analysis

    Welcome to my first full project!

    Project mention: Google Summer of Code Analytics | dev.to | 2021-10-13

    I used Python 3 with the Selenium and BeautifulSoup4 libraries. My project does not use the click feature; instead, it grabs the Organization ID from the internal HTML.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-11-27.

Index

What are some of the best open-source Analytic projects in Python? This list will help you:

 #  Project               Stars
 1  Redash                19,987
 2  Tautulli               4,112
 3  dagster                4,026
 4  dbt-core               3,780
 5  Shynet                 1,667
 6  pygraphistry           1,486
 7  rotki                  1,286
 8  WALKOFF                  985
 9  flask-profiler           669
10  scikit-learn-intelex     309
11  WebHashcat               165
12  reddit-detective         160
13  sayn                      96
14  GSOC_org_analysis          0