Top 6 Python scientific-paper Projects
Project mention: Lets-Plot: An open-source plotting library by JetBrains | news.ycombinator.com | 2023-07-15
This seems quite similar to plotnine [0], which also provides a grammar of graphics interface for Python. That said, I love ggplot and I can't wait to use this in my research! I hope we can port/re-implement ggthemes, scientificplots [1], and other ggplot libraries for lets-plot.
Project mention: Oracle of Zotero: LLM QA of Your Research Library | news.ycombinator.com | 2023-11-26
Nice project!
I've spent quite a lot of time in the medical/scientific literature space. With regard to LLMs, specifically RAG, how the data is chunked is quite important. With that, I have a couple of projects that might be beneficial additions.
paperetl (https://github.com/neuml/paperetl) - supports parsing arXiv, PubMed and integrates with GROBID to handle parsing metadata and text from arbitrary papers.
paperai (https://github.com/neuml/paperai) - builds embeddings databases of medical/scientific papers. Supports LLM prompting, semantic workflows and vector search. Built with txtai (https://github.com/neuml/txtai).
While arbitrary chunking/splitting can work, I've found that integrating parsing that has knowledge of medical/scientific paper structure increases the overall accuracy and experience of downstream applications.
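A minimal sketch of the contrast drawn in the comment above, using only the Python standard library. It is not how paperetl or GROBID parse papers; the heading list and both function names are hypothetical and chosen only for illustration.

```python
# Hypothetical heading names (an assumption); real papers vary widely.
SECTION_HEADINGS = ("abstract", "introduction", "methods", "results",
                    "discussion", "conclusion")


def chunk_fixed(text, size):
    """Arbitrary chunking: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_by_section(text):
    """Structure-aware chunking: start a new chunk at each known heading line."""
    chunks = []
    name, lines = "preamble", []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.lower() in SECTION_HEADINGS:
            # Close the chunk collected so far before starting the next one.
            if lines:
                chunks.append((name, " ".join(lines)))
            name, lines = stripped.lower(), []
        elif stripped:
            lines.append(stripped)
    if lines:
        chunks.append((name, " ".join(lines)))
    return chunks


paper = (
    "Abstract\nWe study X.\n\n"
    "Methods\nWe measured Y.\nWe repeated it.\n\n"
    "Results\nY increased."
)

for section, body in chunk_by_section(paper):
    print(section, "->", body)
```

Fixed-size chunks can cut a sentence or a section in half, while the section-based chunks keep each part of the paper together and carry a label that a downstream retrieval step could filter on.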
Project mention: Show HN: Open-source Rule-based PDF parser for RAG | news.ycombinator.com | 2024-01-23
Project mention: [P] abstracts-search: A semantic search engine indexing 95 million academic publications | /r/MachineLearning | 2023-05-15
I'm releasing the entire project as open code and open data. All ~600 lines of Python, 69 GB in embeddings, and raw faiss index can be found through https://github.com/colonelwatch/abstracts-search
Python scientific-papers related posts
- Oracle of Zotero: LLM QA of Your Research Library
- [P] Parse research papers into structured data
- Parse research papers into a structured dataset
- ETL for medical and scientific papers
- Show HN: ETL for Medical and Scientific Papers
Index
What are some of the best open-source scientific-paper projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | SciencePlots | 6,407 |
2 | scibert | 1,402 |
3 | paperai | 1,189 |
4 | paperetl | 315 |
5 | findpapers | 177 |
6 | abstracts-search | 66 |