scrapeghost
the-algorithm-ml
scrapeghost | the-algorithm-ml | |
---|---|---|
10 | 36 | |
1,396 | 9,881 | |
- | 0.2% | |
8.2 | 10.0 | |
5 months ago | 7 months ago | |
Python | Python | |
GNU General Public License v3.0 or later | GNU Affero General Public License v3.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
scrapeghost
-
Those of you who have developed product features using GPT4 API (or failed to do so), how did it go?
Not my project but an ex-colleague has been having some success in this direction: https://jamesturk.github.io/scrapeghost/
-
What are the best tools for web scraping and analysis of natural language to populate a dataset?
Yes, there is something like that available - ScrapeGhost.
- FLaNK Stack Weekly 3 April 2023
- Scraping Websites Using GPT
-
@TwitterDev Announces New Twitter API Tiers
With AI scraping, tools can be far more resilient than soon enough to minor dom changes. See - https://jamesturk.github.io/scrapeghost/.
-
Experimental library for scraping websites using OpenAI's GPT API
Their ToS mentions scraping but it pertains to scraping their frontend instead of using their API, which they don't want you to do.
Also - this library requests the HTML by itself [0] and ships it as a prompt but with preset system messages as the instruction [1].
[0] - https://github.com/jamesturk/scrapeghost/blob/main/src/scrap...
[1] - https://github.com/jamesturk/scrapeghost/blob/main/src/scrap...
- scrapeghost. Web scrape using gpt-4 (experimental)
the-algorithm-ml
-
Scammers posing as customer service agents on X as companies leave platform
I said “parts of the recommender system code.”
This is the kind of highly emotional reaction that’s not helpful.
Yes, I am quite familiar with building ML models, both training and building my own for which I’ve been paid large sums of money, and I’m here to tell you that you don’t know what you’re taking about.
There’s so much more information about an ML system than just the trained model that is important for understanding the effects of the system on a society, and its legal, ethical, and social ramifications.
Just seeing the type of RS being used, the ranking approach, and the information on SimClusters is enough for RAI folks to start to understand the ecosystem effects and how that can show up downstream in social effects.
https://blog.twitter.com/engineering/en_us/topics/open-sourc...
- Twitter's Recommendation Algorithm
-
AOC said Elon Musk put his 'finger on the scale' during Turkey's presidential election and is 'concerned' it will set a precedent for the 2024 US election
Blog summarising the change: https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm
-
Discussion Thread
People who don't share your interests (or at least what Twitter thinks your interests are). This blog post explains it in detail.
-
Twitter's For You Recommendation Algorithm
Twitter's announcement | Main GitHub Repo | ML GitHub Repo | Engineering Blog Post
- FLaNK Stack Weekly 3 April 2023
-
New York Times says it won't pay for Twitter verified check mark
where? I searched through the repo and couldn't find it.
- Analysis of Twitter algorithm code reveals social medium down-ranks tweets about Ukraine
What are some alternatives?
autoscraper - A Smart, Automatic, Fast and Lightweight Web Scraper for Python
the-algorithm
tmx-solver - ThreatMetrix (anti-bot/fraud-detection) solver, deobfuscator & data harvester
Finagle - A fault tolerant, protocol-agnostic RPC system
wikipedia_ql - Query language for efficient data extraction from Wikipedia
cointop - A fast and lightweight interactive terminal based UI application for tracking cryptocurrencies 🚀
Bandwhich - Terminal bandwidth utilization tool
ctop - Top-like interface for container metrics
bpytop - Linux/OSX/FreeBSD resource monitor
Apollo-11 - Original Apollo 11 Guidance Computer (AGC) source code for the command and lunar modules.
exiftool - ExifTool meta information reader/writer
jdupes - A powerful duplicate file finder and an enhanced fork of 'fdupes'.