Metric | datashare | aleph |
---|---|---|
Mentions | 4 | 4 |
Stars | 545 | 1,947 |
Growth | 2.0% | 0.5% |
Activity | 9.8 | 9.4 |
Last commit | 3 days ago | 8 days ago |
Language | Java | JavaScript |
License | GNU Affero General Public License v3.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
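The page does not give the formulas behind these metrics, so the following is only a rough sketch of how such numbers could be computed. The function names, the 30-day half-life, and the percentile mapping are all assumptions made for illustration, not the site's actual method.

```python
from datetime import date


def month_over_month_growth(stars_last_month: int, stars_now: int) -> float:
    """Percentage change in stars relative to last month's count."""
    if stars_last_month <= 0:
        raise ValueError("stars_last_month must be positive")
    return (stars_now - stars_last_month) / stars_last_month * 100.0


def weighted_commit_score(commit_dates, today, half_life_days=30.0):
    """Sum of per-commit weights; recent commits count more.

    Assumption: a commit's weight halves every `half_life_days` days.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_from_scores(score, all_scores):
    """Map a raw score to a 0-10 scale by rank among tracked projects.

    Assumption: activity = 10 * (share of projects with a strictly lower
    score), so 9.0 means roughly 90% of projects score lower (top 10%).
    """
    lower = sum(1 for s in all_scores if s < score)
    return round(10.0 * lower / len(all_scores), 1)


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the table above.
    print(month_over_month_growth(500, 510))
    print(weighted_commit_score([date(2024, 1, 31), date(2024, 1, 1)], date(2024, 1, 31)))
    print(activity_from_scores(10, list(range(1, 11))))
```

Under these assumptions, a project that scores higher than nine of ten tracked projects gets an activity of 9.0, which matches the "top 10%" reading described above.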
datashare
- Leaks! how to organize them?
Datashare just came out, with full-text indexing, NLP, etc. https://github.com/ICIJ/datashare Enjoy
- Journalism/research/information linking/graph db
If you are doing investigative journalism, there is OCCRP's Aleph for working with large amounts of data and connecting entities (https://aleph.occrp.org/). It's open source and self-hostable. Also, for big messy datasets, ICIJ's Datashare might be interesting.
- Ask HN: What you up to? (Who doesn't want to be hired?)
I have slowly collected a large (several million file) ebook library from open directories over the past few years. I am now trying to set up a search solution for it.
Recoll doesn't seem to work well headless, so I am taking a look at https://github.com/ICIJ/datashare, which claims to be able to do some distributed indexing.
- Datashare a tool to better search files
There's a GitHub repository too: https://github.com/ICIJ/datashare
aleph
- 🥪 Best Sites For ebooks, articles, research papers etc..🥪
- What are Investigative Journalist Tools?
If you're talking about tools for data, it really depends on what kind of data you're looking at and why you have it. For exploring datasets whose contents you don't know in advance (like data leaks), there's OCCRP's Aleph tool: https://github.com/alephdata/aleph
- document management system with good search match highlighting?
https://github.com/paperless-ngx/paperless-ngx https://github.com/alephdata/aleph
- Journalism/research/information linking/graph db
If you are doing investigative journalism, there is OCCRP's Aleph for working with large amounts of data and connecting entities (https://aleph.occrp.org/). It's open source and self-hostable. Also, for big messy datasets, ICIJ's Datashare might be interesting.
What are some alternatives?
peterburk - Github page [Moved to: https://github.com/peterburk/peterburk.github.io]
open-semantic-search - Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
resholve - a shell resolver? :) (find and resolve shell script dependencies)
obsidian-releases - Community plugins list, theme list, and releases of Obsidian.
mpack - MPack - A C encoder/decoder for the MessagePack serialization format / msgpack.org[C]
grobid - A machine learning software for extracting information from scholarly documents
PropertyWebBuilder - Create a fully featured real estate website on Rails in minutes! ⛺
Tenma - Comic book server with in-browser reader
create-rust-app - Set up a modern rust+react web app by running one command.
Trilium Notes - Build your personal knowledge base with Trilium Notes
py_regular_expressions - Learn Python Regular Expressions step by step from beginner to advanced levels
paperless-ngx - A community-supported supercharged version of paperless: scan, index and archive all your physical documents