Python Research

Open-source Python projects categorized as Research

Top 23 Python Research Projects

  1. khoj

    Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.

    Project mention: Top 13 Self-Hosted Projects with the Most GitHub Stars | dev.to | 2024-09-10

    GitHub: https://github.com/khoj-ai/khoj | Stars: 12.4k | Forks: 627 | Issues: 64 | Pull requests: 3 | Contributors: 35 | License: AGPL-3.0 | Website: https://khoj.dev/ | Documentation: https://docs.khoj.dev/

  2. InfluxDB

    InfluxDB – Built for High-Performance Time Series Workloads. InfluxDB 3 OSS is now GA. Transform, enrich, and act on time series data directly in the database. Automate critical tasks and eliminate the need to move data externally. Download now.

  3. qlib

    Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.

  4. UI-TARS

    Project mention: GitHub – ByteDance/UI-Tars | news.ycombinator.com | 2025-05-12
  5. software-papers

    📚 A curated list of papers for Software Engineers

    Project mention: Papers for Software Engineers | news.ycombinator.com | 2025-01-21
  6. RD-Agent

    Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through our open source R&D automation tool R&D-Agent, which lets AI drive data-driven AI.

    Project mention: RD-Agent: LLM-based autonomous evolving agents for industrial data-driven R&D | news.ycombinator.com | 2024-09-25
  7. mlfinlab

    MlFinLab helps portfolio managers and traders who want to leverage the power of machine learning by providing reproducible, interpretable, and easy to use tools.

  8. acme

    A library of reinforcement learning components and agents

  9. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  10. scenic

    Scenic: A Jax Library for Computer Vision Research and Beyond (by google-research)

  11. catalyst

    Accelerated deep learning R&D (by catalyst-team)

  12. lingvo

    Lingvo

  13. habitat-lab

    A modular high-level library to train embodied AI agents across a variety of tasks and environments.

  14. diamond

    DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.

    Project mention: World Emulation via DNN | news.ycombinator.com | 2025-04-26

    I think https://diamond-wm.github.io is a reasonable place to start (they have public world-model training code, and people have successfully adapted their codebase to other games e.g. https://derewah.dev/projects/ai-mariokart). Most modern world models are essentially image generators with additional inputs (past-frames + controls) added on, so understanding how Diffusion/IADB/Flow Matching work would definitely help.

  15. Papers-in-100-Lines-of-Code

    Implementation of papers in 100 lines of code.

  16. music_source_separation

  17. yacs

    YACS -- Yet Another Configuration System (by rbgirshick)

  18. SALMONN

    SALMONN: Speech Audio Language Music Open Neural Network

  19. PyGame-Learning-Environment

    PyGame Learning Environment (PLE) -- Reinforcement Learning Environment in Python.

  20. dreamerv2

    Mastering Atari with Discrete World Models

  21. iris

    Transformers are Sample-Efficient World Models. ICLR 2023, notable top 5%. (by eloialonso)

  22. Mava

    🦁 A research-friendly codebase for fast experimentation of multi-agent reinforcement learning in JAX

  23. tldw

    tl/dw (Too Long, Didn't Watch): Your Personal Research Multi-Tool - a naive attempt at 'A Young Lady's Illustrated Primer' (Open Source NotebookLM)

    Project mention: Show HN: Morphik – Open-source RAG that understands PDF images, runs locally | news.ycombinator.com | 2025-04-22

    Hey yes, I’m building exactly that.

    https://github.com/rmusser01/tldw

    I first built a POC in Gradio and am now rebuilding it as a FastAPI app. The media processing endpoints work, but I’m still tweaking media ingestion to allow for syncing to clients (the idea is to allow for a client-first design).

  24. pybossa

    PYBOSSA is the ultimate crowdsourcing framework (aka microtasking) to analyze or enrich data that can't be processed by machines alone.

  25. pdf-to-podcast

    Convert any PDF into a podcast episode!

    Project mention: Promptic – the "requests" of LLM app development | news.ycombinator.com | 2024-11-26

    Thanks for the kind words! I'm a fan of magentic :) One of the projects I've built with promptic is https://pdf-to-podcast.com

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Research projects in Python? This list will help you:

# Project Stars
1 khoj 30,051
2 qlib 19,206
3 UI-TARS 5,950
4 software-papers 5,860
5 RD-Agent 4,348
6 mlfinlab 4,165
7 acme 3,676
8 scenic 3,538
9 catalyst 3,336
10 lingvo 2,839
11 habitat-lab 2,341
12 diamond 1,817
13 Papers-in-100-Lines-of-Code 1,505
14 music_source_separation 1,329
15 yacs 1,311
16 SALMONN 1,227
17 PyGame-Learning-Environment 1,034
18 dreamerv2 929
19 iris 839
20 Mava 794
21 tldw 790
22 pybossa 754
23 pdf-to-podcast 751
