youtube-discussions-grab
By ArchiveTeam
at-dataproc
Tools used to process/transform ArchiveTeam WARCs (by signalhunter)
Metric | youtube-discussions-grab | at-dataproc
---|---|---
Mentions | 1 | 2
Stars | 2 | 1
Growth | - | -
Activity | 10.0 | 10.0
Latest commit | over 2 years ago | over 1 year ago
Language | Lua | Python
License | The Unlicense | -
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
youtube-discussions-grab
Posts with mentions or reviews of youtube-discussions-grab.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2022-10-04.
-
Can anyone familiar with databases of Youtube archives help me? I don't know how to find what I'm looking for. Details in post.
Some additional references: Archival project information and details. Some code I wrote to convert YouTube's API responses into a more usable JSON format. Note that this script was written while discussions were still live, so it would need to be adjusted to work with archived WARC data instead of retrieving the data from YouTube directly. The script used for the actual archival project; the interesting files are pipeline.py and youtube.lua.
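The post mentions converting YouTube's raw API responses into a more usable JSON format. A minimal sketch of that idea follows. It assumes each comment sits under a `commentRenderer` key with `commentId`, `authorText`, `contentText`, and `publishedTimeText` fields, and that text fields use either `simpleText` or a list of `runs`. That layout is an assumption about YouTube's internal responses, not something taken from the linked script, and the helper names (`iter_comment_renderers`, `text_of`, `flatten_comments`) are hypothetical.

```python
import json
from typing import Any, Iterator


def iter_comment_renderers(node: Any) -> Iterator[dict]:
    """Recursively yield every dict stored under a "commentRenderer" key (assumed key name)."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "commentRenderer" and isinstance(value, dict):
                yield value
            # Keep descending: replies may be nested deeper in the tree.
            yield from iter_comment_renderers(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_comment_renderers(item)


def text_of(field: Any) -> str:
    """Flatten a text object of the assumed form {"simpleText": ...} or {"runs": [{"text": ...}]}."""
    if not isinstance(field, dict):
        return ""
    if "simpleText" in field:
        return str(field["simpleText"])
    return "".join(run.get("text", "") for run in field.get("runs", []))


def flatten_comments(raw_json: str) -> list[dict]:
    """Turn one raw API response (hypothetical layout) into flat, one-level comment rows."""
    data = json.loads(raw_json)
    return [
        {
            "id": r.get("commentId"),
            "author": text_of(r.get("authorText")),
            "text": text_of(r.get("contentText")),
            "published": text_of(r.get("publishedTimeText")),
        }
        for r in iter_comment_renderers(data)
    ]
```

Walking the whole tree instead of following fixed paths keeps the sketch tolerant of wrapper objects it does not know about. Each resulting row can then be written out with `json.dumps` as one line of a JSON Lines file.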
at-dataproc
Posts with mentions or reviews of at-dataproc.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2022-10-04.
-
YouTube Discussions Tab dataset (245.3 million comments)
I've been processing ArchiveTeam's YouTube discussions dataset into something more workable than the unwieldy raw JSON responses saved from YouTube, and I would like to share it with anyone who's interested in the data. This all started when a reddit user asked whether their channel's discussion tab was saved, and I challenged myself to process this dataset for fun. Here's some code that I wrote for this, if anyone is curious.
-
Can anyone familiar with databases of Youtube archives help me? I don't know how to find what I'm looking for. Details in post.
Just a quick update: I'm currently processing all of the WARCs from the ArchiveTeam project, which will take about 2 days at current transfer rates from the Internet Archive (which is notoriously slow). I wrote my own software to do this, which is available here if you want to check it out.
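Processing WARCs means iterating over their records and pulling out the archived HTTP responses. warcio is the usual Python library for this; the stdlib-only sketch below just illustrates the record layout (a `WARC/` version line, headers, a blank line, a `Content-Length`-sized block). It is not how at-dataproc works. It assumes uncompressed or per-record gzip (`.warc.gz`) files only; other compression schemes are not handled. Unlike warcio, it does not undo chunked transfer encoding or compressed HTTP bodies. The function names are hypothetical.

```python
import gzip
import io
from typing import Dict, Iterator, Tuple


def open_warc(path: str) -> io.BufferedIOBase:
    """Open .warc or .warc.gz; gzip.open reads concatenated per-record gzip members transparently."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


def iter_warc_records(stream: io.BufferedIOBase) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, content block) for each record; header names are lower-cased."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines between records (each record ends with CRLF CRLF)
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if not line.strip():
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        block = stream.read(int(headers["content-length"]))
        yield headers, block


def iter_response_bodies(stream: io.BufferedIOBase) -> Iterator[Tuple[str, bytes]]:
    """Yield (target URI, HTTP body) for each response record, skipping requests and metadata."""
    for headers, block in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        # The block is a raw HTTP response: status line and headers, a blank line, then the body.
        _http_head, _, body = block.partition(b"\r\n\r\n")
        yield headers.get("warc-target-uri", ""), body
```

Usage would look like `with open_warc(path) as f:` followed by a loop over `iter_response_bodies(f)`, handing each JSON body to a parser. Because it streams record by record, memory use stays flat even for multi-gigabyte WARCs.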
What are some alternatives?
When comparing youtube-discussions-grab and at-dataproc you can also consider the following projects:
warcio - Streaming WARC/ARC library for fast web archive IO
youtube-discussions-archive - EXPERIMENTAL YouTube Discussion Tab Downloader