trafilatura
PhotoPrism
| | trafilatura | PhotoPrism |
|---|---|---|
| Mentions | 13 | 510 |
| Stars | 2,853 | 32,687 |
| Growth | - | 1.6% |
| Activity | 8.7 | 9.9 |
| Last commit | 2 days ago | 2 days ago |
| Language | Python | Go |
| License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
trafilatura
-
Trafilatura: Python tool to gather text on the Web
The feature list answers that question pretty well: https://github.com/adbar/trafilatura#features
Basically: you could implement all of this on top of BeautifulSoup - polite crawling policies, sitemap and feed parsing, URL de-duplication, parallel processing, download queues, heuristics for extracting just the main article content, metadata extraction, language detection... but it would require writing an enormous amount of extra code.
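To give a sense of the "extra code" involved, here is a minimal sketch of just one item from that list, URL de-duplication, written with the standard library only. It is not Trafilatura's implementation; the `TRACKING_PARAMS` set and the normalization rules are assumptions chosen for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters to ignore; not taken from Trafilatura.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}


def normalize_url(url: str) -> str:
    """Reduce a URL to a canonical form so that trivial variants compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    # Treat "/post" and "/post/" as the same page.
    path = parts.path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest for a stable ordering.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    query = urlencode(sorted(pairs))
    # The fragment never changes the fetched document, so it is discarded.
    return urlunsplit((scheme, netloc, path, query, ""))


def deduplicate(urls):
    """Return the URLs in their original order, keeping the first of each duplicate group."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Even this small piece needs decisions about trailing slashes, query ordering, and fragments, which is the kind of work a dedicated library takes off your hands.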
-
Show HN: Build AI Dags with Memory; Run and Validate LLM Tools in Containers
The WebScraper tool uses Trafilatura [1] to scrape and parse HTML—nothing too fancy. "Scraping" a React site would require a totally different approach, probably something more akin to Adept's ACT-1 [2].
I run a local chat app built with Griptape and I use it to give me summaries of web pages or answer specific questions all the time :)
1. https://github.com/adbar/trafilatura/
-
Powerful and free scraper with a headless browser under the hood and Readability for parsing
I've been playing with Trafilatura lately, and it's very good. There are a few very thorough comparisons to other projects and it really shines. It doesn't do anything headless from what I can tell, but it doesn't have to do the scraping itself. Maybe an option could be to use Playwright to scrape, then Trafilatura to parse. Food for thought.
-
I made a Chrome Extension that lets you ask any question about the page you are on (bluf.ai)
Cool! If you care to explain further... :) ... I tried parsing a page using https://github.com/adbar/trafilatura, JSON-stringifying it, and passing it to https://platform.openai.com/docs/api-reference/embeddings/create. How do I use the response as an input later? <3
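One common way to use embeddings later is similarity search: store one vector per text chunk, embed the question the same way, and pick the chunks whose vectors are closest. The sketch below assumes each embedding is a plain list of floats; the three-dimensional vectors and the `stored` list are made-up toy data, not real API output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_vector, stored, top_k=1):
    """Return the texts whose stored vectors are closest to the query vector."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Toy data standing in for (chunk text, embedding) pairs saved earlier.
stored = [
    ("Trafilatura extracts the main text of a page.", [0.9, 0.1, 0.0]),
    ("PhotoPrism manages a photo library.", [0.0, 0.2, 0.9]),
]

# Toy vector standing in for the embedding of the user's question.
question_vector = [0.8, 0.2, 0.1]
best_chunks = most_similar(question_vector, stored)
```

The selected chunks would then typically be pasted into the prompt of a chat or completion request alongside the question.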
-
Testing fast installation in tear-down environment
I want to test how easy it is to install a package plus special extra dependencies to run a certain script in that package: https://github.com/adbar/trafilatura
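A throwaway-environment install test along these lines could be sketched with the standard `venv` module. The helper names below are hypothetical, and the requirement string (for example `trafilatura[all]`) is an assumption about which extras you want to exercise; running the install itself needs network access.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def pip_install_command(env_dir, requirement):
    """Build the pip command that installs `requirement` inside the given venv."""
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    python = Path(env_dir) / bin_dir / "python"
    return [str(python), "-m", "pip", "install", requirement]


def installs_cleanly(requirement):
    """Create a temporary venv, try the install, and report whether pip succeeded.

    The environment is deleted when the function returns.
    """
    with tempfile.TemporaryDirectory() as env_dir:
        venv.create(env_dir, with_pip=True)
        result = subprocess.run(
            pip_install_command(env_dir, requirement),
            capture_output=True,
            text=True,
        )
        return result.returncode == 0
```

Calling `installs_cleanly("trafilatura[all]")` would then return `True` or `False` without leaving anything behind on disk.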
- Advice on standard design pattern for comparison test script
- Automate dependency installation
- Issue with sklearn
- Questions about some code
- How does Firefox's Reader View work?
PhotoPrism
-
Show HN: Memories, FOSS Google Photos alternative built for high performance
I have been using https://www.photoprism.app for a couple of years, and it works better than expected. With the latest updates it's actually quite fast, and the face tagging works reasonably well.
-
Ente: Open-Source, E2E Encrypted, Google Photos Alternative
For self-hosting, there's Photoprism[1] as well.
Ente's strength lies in end-to-end encryption[2] and its cloud[3] offering so you don't have to worry about reliability.
So if self-hosting is what you're after, Immich, Photoprism and Damselfly (TIL!) are perhaps better designed to serve your needs.
[1]: https://github.com/photoprism/photoprism
[2]: https://ente.io/architecture
[3]: https://ente.io/reliability
-
Switching to Android Was Easy
For quite a while I've also been searching for a solution that lets me share galleries with my family without asking them to jump through hoops to access them.
After some searching I'm now testing photoprism [1], which is a fantastic application, especially for self-hosting photos. There's no mobile app for it (yet) and user management is just starting to be implemented, but it shows a lot of promise. Unfortunately that is not yet enough to put it on my granny's tablet, but one can hope (and donate!)
Either way, I'm afraid that building a good mobile gallery app is an equally large task, after all the best solution would be to replace the users' native gallery-app with an equivalent that also supports custom Online-Galleries...
[1]: https://www.photoprism.app/
-
I write HTTP services in Go after 13 years (Mat Ryer, 2024)
Out of curiosity, why no sort-of-established pkg and internal dirs? What do you think of the https://github.com/photoprism/photoprism structure?
-
Escaping Surveillance Capitalism, at Scale
Thank you!
Ente was first a piece of hardware, then a self-hostable project, but we had a hard time monetizing both, which led to the E2EE pivot.
TIL about TagSpaces, thanks!
Our server can be open-sourced, but we're unsure of the value E2EE will provide, with services like Photoprism[1] and Immich[2] already doing a good job of serving customers who prefer to self host. In this context E2EE might become a constraint, rather than a feature.
[1]: https://github.com/photoprism/photoprism
[2]: https://github.com/immich-app/immich
-
Google Photos alternative with OCR
I've seen GitHub issues like this one, https://github.com/photoprism/photoprism/issues/907, in which it is implied that this is very, very difficult.
- New Release 231128-f48ff16ef ⚙️🌈
-
Photo gallery frontend with encryption and search
Hi. I want to implement an image server similar to Photoprism, using ImageAI to tag images based on objects and context. However, I don't want to spend too much time working on the frontend. At first I was thinking about using Danbooru with Flexbooru or the web interface on my phone, but it doesn't have any encryption or password protection (since its purpose is to be used as a public image board).
-
Looking for photo management software
https://www.photoprism.app in Docker.
-
Ask HN: How do you manage photos, philosophically?
PhotoPrism[0] and some ugly plumbing[1] to semantically tag all images in the gallery.
0: https://github.com/photoprism/photoprism
What are some alternatives?
newspaper - newspaper3k: news, full-text, and article metadata extraction in Python 3.
Piwigo - Manage your photos with Piwigo, a full featured open source photo gallery application for the web. Star us on Github! More than 200 plugins and themes available. Join us and contribute!
python-goose - HTML content / article extractor, web scraping library in Python
immich - High performance self-hosted photo and video management solution.
TWINT - An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
librephotos - A self-hosted open source photo management service. This is the repository of the backend.
html2text - Convert HTML to Markdown-formatted text.
Lychee - A great looking and easy-to-use photo-management-system you can run on your server, to manage and share photos.
Goose3 - A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html
Photonix - A modern, web-based photo management server. Run it on your home server and it will let you find the right photo from your collection on any device. Smart filtering is made possible by object recognition, face recognition, location awareness, color analysis and other ML algorithms.
textract - extract text from any document. no muss. no fuss.
Photoview - Photo gallery for self-hosted personal servers [Moved to: https://github.com/photoview/photoview]