trafilatura
Bitwarden
Metric | trafilatura | Bitwarden |
---|---|---|
Mentions | 13 | 1,056 |
Stars | 2,853 | 14,371 |
Growth | - | 1.2% |
Activity | 8.7 | 9.8 |
Last commit | 2 days ago | 1 day ago |
Language | Python | C# |
License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
trafilatura
-
Trafilatura: Python tool to gather text on the Web
The feature list answers that question pretty well: https://github.com/adbar/trafilatura#features
Basically: you could implement all of this on top of BeautifulSoup - polite crawling policies, sitemap and feed parsing, URL de-duplication, parallel processing, download queues, heuristics for extracting just the main article content, metadata extraction, language detection... but it would require writing an enormous amount of extra code.
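To give a sense of what even one item on that list involves, here is a minimal standard-library sketch of URL de-duplication. It is not trafilatura's implementation; the function names `normalize_url` and `deduplicate` and the normalization rules (lowercase scheme and host, drop the fragment, strip a trailing slash, sort query parameters) are assumptions chosen for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a canonical form so trivial variants compare equal.

    Illustrative rules only: real crawlers apply many more (tracking
    parameters, default ports, percent-encoding, and so on).
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    # Treat "/a/" and "/a" as the same page; keep "/" for the site root.
    path = parts.path.rstrip("/") or "/"
    # Sort query parameters so "?b=2&a=1" and "?a=1&b=2" match.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment never reaches the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, query, ""))


def deduplicate(urls):
    """Return the URLs in their original order, skipping normalized repeats."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Each of the other listed features (polite crawling, feed parsing, content heuristics) would need a comparable amount of code and edge-case handling, which is the point the comment is making.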
-
Show HN: Build AI Dags with Memory; Run and Validate LLM Tools in Containers
The WebScraper tool uses Trafilatura [1] to scrape and parse HTML—nothing too fancy. "Scraping" a React site would require a totally different approach, probably something more akin to Adept's ACT-1 [2].
I run a local chat app built with Griptape and I use it to give me summaries of web pages or answer specific questions all the time :)
1. https://github.com/adbar/trafilatura/
-
Powerful and free scraper with a headless browser under the hood and Readability for parsing
I've been playing with Trafilatura lately, and it's very good. There are a few very thorough comparisons to other projects and it really shines. It doesn't do anything headless from what I can tell, but it doesn't have to do the scraping itself. Maybe an option could be to use Playwright to scrape, then Trafilatura to parse. Food for thought.
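The "Playwright to scrape, then Trafilatura to parse" idea could look roughly like the sketch below. It assumes both third-party packages are installed (plus a Playwright browser); the function names `render_with_playwright`, `extract_with_trafilatura`, and `scrape` are hypothetical. The imports are deferred into the functions so each step can be swapped out or stubbed.

```python
from typing import Callable, Optional


def render_with_playwright(url: str) -> str:
    """Load the page in headless Chromium and return the rendered HTML."""
    # Third-party dependency, imported lazily: pip install playwright
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        html = page.content()
        browser.close()
    return html


def extract_with_trafilatura(html: str) -> Optional[str]:
    """Extract the main text from HTML; may return None if nothing is found."""
    # Third-party dependency, imported lazily: pip install trafilatura
    import trafilatura

    return trafilatura.extract(html)


def scrape(
    url: str,
    render: Callable[[str], str] = render_with_playwright,
    extract: Callable[[str], Optional[str]] = extract_with_trafilatura,
) -> Optional[str]:
    """Fetch with one tool, parse with another."""
    return extract(render(url))
```

Keeping the two steps as separate callables means the headless browser is only needed for pages that actually require JavaScript rendering; a plain HTTP fetch can be passed as `render` for everything else.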
-
I made a Chrome Extension that lets you ask any question about the page you are on (bluf.ai)
Cool! If you care to explain further... :) ... I tried parsing a page using https://github.com/adbar/trafilatura, JSON-stringifying the result, and passing it to https://platform.openai.com/docs/api-reference/embeddings/create. How do I use the response as an input later? <3
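One common way to use an embeddings response later is to store each text chunk's vector, embed the question the same way, and rank the chunks by cosine similarity. The sketch below shows only that ranking step in plain Python; it uses short made-up vectors in place of real API output, and the names `cosine_similarity` and `top_chunks` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_chunks(question_vec, stored, k=2):
    """Return the k stored texts whose vectors are closest to the question.

    `stored` is a list of (text, vector) pairs saved when the page was
    first embedded.
    """
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(question_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

The selected chunks would then be pasted into the prompt of a chat or completion request together with the question, so the model answers from the page content rather than from the vectors themselves.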
-
Testing fast installation in tear-down environment
I want to test how easy it is to install a package plus special extra dependencies to run a certain script in that package: https://github.com/adbar/trafilatura
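A throwaway install test like this can be sketched with the standard-library `venv` and `tempfile` modules. The helper names below are hypothetical, the extras name passed in is up to the caller (check the package's own documentation for which extras exist), and the smoke test is simply "can the module be imported", which is an assumption about what counts as success.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Path to the interpreter inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_command(env_dir: Path, package: str, extras=()):
    """Build the pip command, e.g. for 'package[extra1,extra2]'."""
    requirement = f"{package}[{','.join(extras)}]" if extras else package
    return [str(venv_python(env_dir)), "-m", "pip", "install", requirement]


def trial_install(package: str, extras=(), import_name=None) -> bool:
    """Install into a fresh environment, try an import, then discard it.

    Needs network access; the temporary directory is deleted on exit.
    """
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "env"
        venv.EnvBuilder(with_pip=True).create(env_dir)
        subprocess.run(install_command(env_dir, package, extras), check=True)
        # Smoke test: the installed module should at least import cleanly.
        check = subprocess.run(
            [str(venv_python(env_dir)), "-c", f"import {import_name or package}"]
        )
        return check.returncode == 0
```

Calling `trial_install("trafilatura")` would exercise the plain install; timing that call with and without extras gives a rough measure of how heavy the optional dependencies are.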
- Advice on standard design pattern for comparison test script
- Automate dependency installation
- Issue with sklearn
- Questions about some code
- How does Firefox's Reader View work?
Bitwarden
-
Ask HN: Why does Bitwarden not comment their code?
I was looking through the Bitwarden server repository (https://github.com/bitwarden/server) and was surprised to see that no comments (XML or otherwise) were available.
Is this normal in an enterprise setting? I thought it was standard to comment every public member (Visual Studio warnings).
- Bitwarden
- End of Life for Twilio Authy Desktop App
-
What program(s) do you use to remember passwords, including crypto?
For passwords and 2FA I use Bitwarden in combination with a self-hosted Vaultwarden service (for increased security and use of pro features for free).
- I received this message from Simple today!!
-
Amazon Account with unauthorised purchases, did my google passwords get leaked
First, it's good to use a password manager; however, it's not a good idea to use the one built into your browser. I would suggest switching to BitWarden or similar (not LastPass).
-
Did I mess up?
I just noticed today when logging back in on Bitwarden (I couldn't sync my vault) that it said "Logged in as [email] on __$2__" instead of "Logged in as [email] on bitwarden.com". I don't know why or how that happened, and I have no idea what it means. Did I screw up somehow? Just to be clear, I did log in, and only afterwards did I realize that it said "__$2__" instead of what it should say.
-
Bitwarden Self-hosted not updating to 2023.12.0
bitwarden:~$ sudo ./bitwarden.sh updateself
 _     _ _                         _
| |__ (_) |___      ____ _ _ __ __| | ___ _ __
| '_ \| | __\ \ /\ / / _` | '__/ _` |/ _ \ '_ \
| |_) | | |_ \ V  V / (_| | | | (_| |  __/ | | |
|_.__/|_|\__| \_/\_/ \__,_|_|  \__,_|\___|_| |_|
Open source password management solutions
Copyright 2015-2023, 8bit Solutions LLC
https://bitwarden.com, https://github.com/bitwarden
===================================================
bitwarden.sh version 2023.10.3
Docker version 24.0.7, build afdd53b
Docker Compose version v2.21.0
Updated self.
bitwarden:~$ sudo ./bitwarden.sh update
 _     _ _                         _
| |__ (_) |___      ____ _ _ __ __| | ___ _ __
| '_ \| | __\ \ /\ / / _` | '__/ _` |/ _ \ '_ \
| |_) | | |_ \ V  V / (_| | | | (_| |  __/ | | |
|_.__/|_|\__| \_/\_/ \__,_|_|  \__,_|\___|_| |_|
Open source password management solutions
Copyright 2015-2023, 8bit Solutions LLC
https://bitwarden.com, https://github.com/bitwarden
===================================================
bitwarden.sh version 2023.10.3
Docker version 24.0.7, build afdd53b
Docker Compose version v2.21.0
Update not needed
bitwarden:~$
-
⟳ 0 apps added, 1 updated at mobileapp.bitwarden.com
Bitwarden (version 8588): A secure and free password manager for all of your devices.
-
What are some dangers that can happen if I’ve chosen not to enable 2fa on certain accounts
I would also recommend the use of a password manager such as Proton Pass, BitWarden, or 1Password if you're looking for a more premium solution.
What are some alternatives?
newspaper - newspaper3k is a news, full-text, and article metadata extraction library in Python 3.
vaultwarden - Unofficial Bitwarden compatible server written in Rust, formerly known as bitwarden_rs
python-goose - HTML content / article extractor, web scraping library in Python
Passbolt - Passbolt Community Edition (CE) API. The JSON API for the open source password manager for teams!
TWINT - An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
sysPass - Systems Password Manager
html2text - Convert HTML to Markdown-formatted text.
Teampass - Collaborative Passwords Manager
Goose3 - A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html
Padloc - A modern, open source password manager for individuals and teams.
textract - extract text from any document. no muss. no fuss.
bitwarden_rs - Unofficial Bitwarden compatible server written in Rust, formerly known as bitwarden_rs [Moved to: https://github.com/dani-garcia/vaultwarden]