| | covid19 | paperetl |
|---|---|---|
| | 1 | 12 |
| Stars | 6 | 316 |
| Growth | - | 4.4% |
| Activity | 0.0 | 6.3 |
| | 10 months ago | 5 months ago |
| Language | Vue | Python |
| License | GNU General Public License v3.0 only | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
covid19
- Open Source Super Fast Covid19 Parser and Dashboard
Fully customizable for a country website. For more info, see the Contribution guideline.
paperetl
- Show HN: Open-source Rule-based PDF parser for RAG
- Oracle of Zotero: LLM QA of Your Research Library
Nice project!
I've spent quite a lot of time in the medical/scientific literature space. With regards to LLMs, specifically RAG, how the data is chunked is quite important. With that, I have a couple projects that might be beneficial additions.
paperetl (https://github.com/neuml/paperetl) - supports parsing arXiv, PubMed and integrates with GROBID to handle parsing metadata and text from arbitrary papers.
paperai (https://github.com/neuml/paperai) - builds embeddings databases of medical/scientific papers. Supports LLM prompting, semantic workflows and vector search. Built with txtai (https://github.com/neuml/txtai).
While arbitrary chunking/splitting can work, I've found that integrating parsing that has knowledge of medical/scientific paper structure increases the overall accuracy and experience of downstream applications.
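To illustrate the point about structure-aware chunking made in the comment above, here is a minimal sketch in plain Python. It does not use paperetl's or paperai's API; the heading list, the function names `split_sections` and `chunk_sections`, and the `max_words` parameter are all assumptions made up for this example. It assumes the paper is already available as plain text with section headings on their own lines.

```python
import re
from typing import Dict, List

# Hypothetical heading names for illustration; real papers vary widely.
HEADINGS = ("Abstract", "Introduction", "Methods", "Results", "Discussion", "Conclusion")


def split_sections(text: str) -> Dict[str, str]:
    """Split plain text into sections keyed by heading name."""
    pattern = re.compile(
        r"^(%s)[ \t]*$" % "|".join(HEADINGS),
        re.MULTILINE | re.IGNORECASE,
    )
    matches = list(pattern.finditer(text))
    sections: Dict[str, str] = {}
    for i, match in enumerate(matches):
        # A section body runs until the next heading (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1).title()] = text[match.end():end].strip()
    return sections


def chunk_sections(sections: Dict[str, str], max_words: int = 50) -> List[Dict[str, str]]:
    """Chunk each section separately so no chunk crosses a section boundary."""
    chunks: List[Dict[str, str]] = []
    for name, body in sections.items():
        words = body.split()
        for start in range(0, len(words), max_words):
            chunks.append({
                "section": name,
                "text": " ".join(words[start:start + max_words]),
            })
    return chunks
```

Each chunk carries the name of the section it came from, so a downstream search or prompt can, for example, prefer chunks from the abstract or results. A fixed-size splitter over the raw text would lose that label and could merge the end of one section with the start of the next.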
- [P] Parse research papers into structured data
paperai | paperetl
- Parse research papers into a structured dataset
- ETL for medical and scientific papers
- Show HN: ETL for Medical and Scientific Papers
- Seeking Advice: How to extract Abstract from scientific journals (.pdfs) 10k+.
paperai and paperetl are a set of projects to consider for this task.
- paperetl: ETL processes for medical and scientific papers
What are some alternatives?
dashboard - The Rancher UI
SciencePlots - Matplotlib styles for scientific plotting
tika-python - Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
ciscoconfparse - Parse, Audit, Query, Build, and Modify Cisco IOS-style configurations.
paperai - 📄 🤖 Semantic search and workflows for medical/scientific papers
rdm - Our regulatory documentation manager. Streamlines 62304, 14971, and 510(k) documentation for software projects.
dagster - An orchestration platform for the development, production, and observation of data assets.
science-parse - Science Parse parses scientific papers (in PDF form) and returns them in structured form.
llmsherpa - Developer APIs to Accelerate LLM Projects
grobid - A machine learning software for extracting information from scholarly documents