youtube-discussions-grab
By ArchiveTeam
warcio
Streaming WARC/ARC library for fast web archive IO (by webrecorder)
Metric | youtube-discussions-grab | warcio
---|---|---
Mentions | 1 | 4
Stars | 2 | 345
Growth | - | 1.7%
Activity | 10.0 | 2.5
Last commit | over 2 years ago | 9 days ago
Language | Lua | Python
License | The Unlicense | Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
youtube-discussions-grab
Posts with mentions or reviews of youtube-discussions-grab.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2022-10-04.
- Can anyone familiar with databases of Youtube archives help me? I don't know how to find what I'm looking for. Details in post.
  Some additional references:
  - Archival project information and details.
  - Some code I wrote to convert YouTube's API responses into a more usable JSON format. Note that this script was created back when discussions were still live, so it would need to be adjusted to work with archived WARC data instead of trying to retrieve the data from YouTube directly.
  - The script used for the actual archival project. The interesting files would be pipeline.py and youtube.lua.
warcio
Posts with mentions or reviews of warcio.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2022-10-04.
- Can anyone familiar with databases of Youtube archives help me? I don't know how to find what I'm looking for. Details in post.
  What you're looking at are WARC (Web ARChive) files, which contain the raw API responses saved from YouTube. You need to parse them into usable data with something like warcio, then ingest that data into a database.
- Any very noob friendly way to extract images and videos from WARC files?
- Help with WARC files/Extracting a portion of a full crawl
  The tool you want is warcio, which is available on PyPI and has a command line interface as well. You can use it to extract contents, scan through WARCs, or build new WARCs.
- Is anyone working on a Yahoo Answers Archive yet, and if so where can we go to find it?
  (Note: I use warcio for this, but the description above explains what is actually happening)
What are some alternatives?
When comparing youtube-discussions-grab and warcio you can also consider the following projects:
youtube-discussions-archive - EXPERIMENTAL YouTube Discussion Tab Downloader
pywb - Core Python Web Archiving Toolkit for replay and recording of web archives
at-dataproc - Tools used to process/transform ArchiveTeam WARCs
ArchiveBox - 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
yahoo-answers-archiveteam-compose