trafilatura
Tautulli
Metric | trafilatura | Tautulli
---|---|---
Mentions | 13 | 419
Stars | 2,853 | 5,371
Growth | - | 1.1%
Activity | 8.7 | 8.3
Last commit | 2 days ago | 4 days ago
Language | Python | Python
License | Apache License 2.0 | GNU General Public License v3.0 only
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
trafilatura
-
Trafilatura: Python tool to gather text on the Web
The feature list answers that question pretty well: https://github.com/adbar/trafilatura#features
Basically: you could implement all of this on top of BeautifulSoup - polite crawling policies, sitemap and feed parsing, URL de-duplication, parallel processing, download queues, heuristics for extracting just the main article content, metadata extraction, language detection... but it would require writing an enormous amount of extra code.
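To give a feel for just one item on that list, here is a minimal sketch of URL de-duplication using only the standard library. It is not Trafilatura's implementation; the function names and the `TRACKING_PARAMS` set are hypothetical, chosen for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters to ignore; not taken from Trafilatura.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}


def normalize_url(url: str) -> str:
    """Reduce a URL to a canonical form so near-identical links compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    # Treat "/a" and "/a/" as the same page; an empty path becomes "/".
    path = parts.path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest so parameter order is irrelevant.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    query = urlencode(sorted(pairs))
    # The fragment is discarded because it does not change the fetched document.
    return urlunsplit((scheme, netloc, path, query, ""))


def deduplicate(urls):
    """Keep the first occurrence of each URL, comparing normalized forms."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Even this small piece needs decisions about trailing slashes, fragments and query parameters, which is the kind of extra code the comment above is referring to.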
-
Show HN: Build AI Dags with Memory; Run and Validate LLM Tools in Containers
The WebScraper tool uses Trafilatura [1] to scrape and parse HTML—nothing too fancy. "Scraping" a React site would require a totally different approach, probably something more akin to Adept's ACT-1 [2].
I run a local chat app built with Griptape and I use it to give me summaries of web pages or answer specific questions all the time :)
1. https://github.com/adbar/trafilatura/
-
Powerful and free scraper with a headless browser under the hood and Readability for parsing
I've been playing with Trafilatura lately, and it's very good. There are a few very thorough comparisons to other projects and it really shines. It doesn't do anything headless from what I can tell, but it doesn't have to do the scraping itself. Maybe an option could be to use Playwright to scrape, then Trafilatura to parse. Food for thought.
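The Playwright-then-Trafilatura idea could look roughly like the sketch below. It assumes both packages are installed (plus a Chromium build for Playwright); the three function names are hypothetical, and the imports are done lazily so the wiring can be tried with stand-in callables.

```python
from typing import Callable, Optional


def render_with_playwright(url: str) -> str:
    """Load the page in headless Chromium and return the rendered HTML."""
    # Lazy import: only needed when this default renderer is actually used.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        html = page.content()
        browser.close()
    return html


def extract_with_trafilatura(html: str) -> Optional[str]:
    """Return the main text of an HTML document, or None if nothing is found."""
    import trafilatura

    return trafilatura.extract(html)


def scrape_main_text(
    url: str,
    render: Callable[[str], str] = render_with_playwright,
    extract: Callable[[str], Optional[str]] = extract_with_trafilatura,
) -> Optional[str]:
    """Fetch with one tool, parse with the other."""
    return extract(render(url))
```

Keeping the renderer and the extractor as separate callables means either side can be swapped out, for example a plain HTTP download for static pages.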
-
I made a Chrome Extension that lets you ask any question about the page you are on (bluf.ai)
Cool! Would you care to explain a bit further? :) I tried parsing a page using https://github.com/adbar/trafilatura, JSON-stringifying the result and passing it to https://platform.openai.com/docs/api-reference/embeddings/create. How do I use the response as an input later? <3
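One common way to use embeddings later (an assumption here, not something the original thread spells out) is to store each text chunk next to its vector, embed the question the same way, and rank the chunks by cosine similarity. A minimal sketch with toy vectors standing in for real embedding responses; `cosine_similarity` and `top_chunks` are hypothetical helper names:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(query_vector, indexed_chunks, k=2):
    """Return the k chunk texts whose stored vectors are closest to the query.

    indexed_chunks is a list of (text, vector) pairs saved when the page
    was first embedded.
    """
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

The selected chunks would then be pasted into the prompt together with the question, so the model answers from the relevant parts of the page.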
-
Testing fast installation in tear-down environment
I want to test how easy it is to install a package plus special extra dependencies to run a certain script in that package: https://github.com/adbar/trafilatura
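A throwaway-environment install test could be sketched with the standard library alone. The helper names are hypothetical, and the `all` extra shown in the usage note is an assumption about which extras the package defines.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def pip_install_command(env_dir, package, extras=()):
    """Build the pip command that installs a package (with extras) into env_dir."""
    # Virtual environments keep their interpreter in Scripts/ on Windows, bin/ elsewhere.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    python = Path(env_dir) / bin_dir / "python"
    requirement = f"{package}[{','.join(extras)}]" if extras else package
    return [str(python), "-m", "pip", "install", requirement]


def install_in_throwaway_env(package, extras=()):
    """Create a temporary venv, install the package, and return pip's exit code.

    The environment is deleted when the function returns, so every run
    starts from a clean slate. Requires network access.
    """
    with tempfile.TemporaryDirectory() as env_dir:
        venv.create(env_dir, with_pip=True)
        command = pip_install_command(env_dir, package, extras)
        completed = subprocess.run(command, capture_output=True, text=True)
        return completed.returncode
```

Calling `install_in_throwaway_env("trafilatura", ["all"])` and checking for a zero return code would be one way to measure how smoothly the install goes; running the script of interest inside the same temporary environment is a natural next step.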
- Advice on standard design pattern for comparison test script
- Automate dependency installation
- Issue with sklearn
- Questions about some code
- How does Firefox's Reader View work?
Tautulli
-
I'm fine with the basics of Plex - now what can I do to really use plex to its full potential?
With Tautulli you have a better monitoring system than what Plex offers. You get streaming history split by user, and you can add notifications to a lot of services like Slack, email and so on. You can even create newsletters that are sent out to users based on what was added to your server.
-
My Overkill Home Network – Complete Details 2023
> How hard is this to configure?
Not at all. Just ensure that you have WoL enabled on the host machine and then proceed to send a magic packet. You could even do this with Home Assistant [1] if you are into that. I did this with a script that used tcpdump to monitor for incoming traffic [2] for Plex with an additional (dummy) Plex server on the Pi. I also faintly remember that I had to add 1 library and 1 video file to make this work though.
Powering down - or sleep - is a bit harder. I built a 'Sleep on LAN' app [3] for myself years ago that could power down (or sleep) a system on demand using a REST API. I used this and Tautulli [4] with Home Assistant, which would check whether there were any active streams; if there wasn't any activity for a specified amount of time, it would send a SoL request to my service.
As you can see it isn't super hard or complicated, but a bit cumbersome to find all the moving bits and make it work. But when it does, it's IMHO fantastic.
1. https://www.home-assistant.io/integrations/wake_on_lan/
2. https://gist.github.com/alex3305/8cc73ddd2c8ca6328f20235480a...
3. https://github.com/alex3305/sleep-on-lan
4. https://tautulli.com/
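The two halves of that setup can be sketched in a few lines. This is not the poster's script: the function names are hypothetical, the stream count is assumed to come from Tautulli's activity data (for example its `get_activity` API command), and the sleep request itself is left out.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a valid MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is a common choice)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


def should_sleep(stream_count: int, idle_seconds: float, idle_limit_seconds: float) -> bool:
    """Decide whether to request sleep: no active streams and idle long enough."""
    return stream_count == 0 and idle_seconds >= idle_limit_seconds
```

In practice a scheduler such as Home Assistant would poll the stream count, track how long it has been zero, and call the sleep service once `should_sleep` returns true.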
-
Can I copy the metadata from one show to another?
In that case then I don't think there will be an easy way to transfer the metadata unfortunately (outside of some user-created script). One option that would at least help with the manual re-entry would be to use Tautulli's export feature or WebTools-NG's ExportTools to create a spreadsheet that has all the information in one place, and should be easier to copy/paste.
- Finding episodes with Multiple Languages
-
Plex GPU transcoding on unRaid
Also, take a look at Tautulli, tautulli.com. Its dashboard has more info than Plex. It shows transcode speed (1.0 = real time) and if subtitles are burning/transcoding/etc.
-
Best programs to use alongside Plex?
Tautulli for monitoring and notifications, plus some scripts for "maintenance," such as killing 4K transcodes and stopping remote streams after they've been paused for X minutes. These are from the JBOPS repository.
-
Plex health check
Tautulli. There's a feature where you can trigger notifications from events on your server, not just when it's down. I'm not sure there's a function to get your server to auto restart though.
-
Why is it transcoding to SDR ?
Looks like Tautulli's mobile companion app.
-
Import Spreadsheet of Metadata
The data Tautulli exports is great if you want to throw it in Excel to analyze your library, or as a backup of all the content you have on your server, but it isn't really meant to be imported at a later date (see the exporter guide). You should instead follow the Move an Install to Another System guide to move your server data to a new machine.
-
Plex Add-ons for new Plex Setup
I enjoy https://tautulli.com/ for looking at my server's stats
What are some alternatives?
newspaper - newspaper3k is a news, full-text, and article metadata extraction in Python 3.
discord-rich-presence-plex - Displays your Plex status on Discord using Rich Presence
python-goose - Html Content / Article Extractor, web scraping lib in Python
Ombi - Want a Movie or TV Show on Plex/Emby/Jellyfin? Use Ombi!
TWINT - An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
overseerr - Request management and media discovery tool for the Plex ecosystem
html2text - Convert HTML to Markdown-formatted text.
Plex-scripts - Plex, the arr's and tautulli scripts coming from user requests
Goose3 - A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html
Plex-Trakt-Scrobbler - Add what you are watching on Plex to trakt.tv
textract - extract text from any document. no muss. no fuss.
Varken - Standalone application to aggregate data from the Plex ecosystem into InfluxDB using Grafana as a frontend