RedditExtractor vs disk.frame
| | RedditExtractor | disk.frame |
|---|---|---|
| Mentions | 5 | 5 |
| Stars | 82 | 592 |
| Growth | - | 0.5% |
| Activity | 3.3 | 0.0 |
| Latest commit | 8 months ago | 3 months ago |
| Language | R | R |
| License | GNU General Public License v3.0 only | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
RedditExtractor
- Will RedditExtractoR be impacted by API changes?
  IIRC, RedditExtractoR doesn't use OAuth2, so I think the 10 requests/minute rate limit will be applied to the library/client.
- bulk subreddit datasets?
  Sounds like this package might help you reach your objective. The timeframe you can capture will depend on the amount of activity within the subreddit of interest.
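To make the suggestion above concrete, here is a minimal sketch of pulling subreddit threads with RedditExtractoR. It assumes the v3 interface (`find_thread_urls()` / `get_thread_content()`); the subreddit name, period, and the number of threads fetched are placeholders, not recommendations.

```r
# Sketch: collect recent threads from a subreddit with RedditExtractoR.
# Assumes the v3 API; "rstats" and period = "month" are placeholders.
library(RedditExtractoR)

# Find thread URLs posted in r/rstats over the past month
threads <- find_thread_urls(subreddit = "rstats", period = "month")

# Fetch full content (thread metadata plus comments) for a few threads
content <- get_thread_content(threads$url[1:5])

str(content$threads)   # thread-level metadata
str(content$comments)  # comment-level data
```

How far back you can reach is bounded by what the Reddit listing endpoints return, which is why the capturable timeframe depends on subreddit activity.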
- Has anyone here used the Reddit API before (in R)?
- Using RedditExtractoR to scrape flairs?
  My apologies if the Reddit API flair is inappropriate here - RedditExtractoR does use the Reddit API, but it's technically distinct as a simplified package for R (see: https://github.com/ivan-rivera/RedditExtractor)
- H3 Podcast YouTube Views Analysis
  Great idea - yeah, Reddit has an API too, and it looks like there are R and Python packages to access it: https://github.com/ivan-rivera/RedditExtractor
disk.frame
- Do you code from memory? Or do you reference things?
  Say hello to disk.frame.
- How can I read in only two columns from a massive 10+ GB tab file?
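For the question above, one common answer is to parse only the needed columns rather than the whole file. A minimal sketch using `data.table::fread()`'s `select` argument (the file path and column names are placeholders):

```r
# Sketch: read just two columns of a large tab-delimited file.
# fread() skips parsing the unselected columns, which keeps memory
# usage proportional to the two columns you actually need.
# "big_file.tsv", "id" and "value" are placeholder names.
library(data.table)

dt <- fread("big_file.tsv", sep = "\t", select = c("id", "value"))
```

If even two columns exceed RAM, chunking the file onto disk with disk.frame and selecting columns per chunk is an alternative.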
- Data cleaning/analysis of 100-200 million rows of data. Is this doable in R, or is there another program I should try instead?
  It depends on your hardware, but it should not be a problem. You might look into disk.frame (https://diskframe.com) or similar packages.
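A rough sketch of what a disk.frame workflow for data of that size looks like, assuming the package's `csv_to_disk.frame()` ingester and its chunk-wise dplyr-verb support; the file path and column names (`value`, `group`) are placeholders:

```r
# Sketch: process a larger-than-RAM CSV with disk.frame.
# The data is chunked onto disk, and dplyr verbs run chunk by chunk.
library(disk.frame)
library(dplyr)

setup_disk.frame()  # start background workers for parallel chunks

# Ingest the CSV into an on-disk, chunked disk.frame
df <- csv_to_disk.frame("big_data.csv", outdir = "big_data.df")

# Lazy dplyr verbs are applied per chunk; collect() materialises
# the (much smaller) aggregated result in memory
result <- df %>%
  filter(!is.na(value)) %>%
  group_by(group) %>%
  summarise(total = sum(value)) %>%
  collect()
```

Because only the aggregated result is pulled into RAM, the workable dataset size is bounded by disk space and chunk size rather than by memory.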
- Is it possible to have my environment objects and work with them on my local drive instead of RAM?
  If that doesn't work, the disk.frame package might help. It is relatively new and not widely used, but it does work with data on disk rather than in memory.
- We Test PCIe 4.0 Storage: The AnandTech 2021 SSD Benchmark Suite
  > The speeds were just stunning to say the least at 15GB/s.
  That is amazing: around DDR4-1866 speeds, and not far from DDR4-2666 (~21 GB/s). At those speeds I would happily work with data frames sitting on the disk rather than in memory [1, 2]. Did you benchmark RAID 0 with fewer than four disks?
[1] R: https://github.com/xiaodaigh/disk.frame
What are some alternatives?
Pushshift API - A searchable archive API for Reddit submissions and comments
db-benchmark - reproducible benchmark of database-like ops
police-settlements - A FiveThirtyEight/The Marshall Project effort to collect comprehensive data on police misconduct settlements from 2010-19.
drake - An R-focused pipeline toolkit for reproducibility and high-performance computing
reddit-awards-data - Dataset and visualizations of the most popular Reddit Awards, using the PRAW API.
Rcrawler - An R web crawler and scraper
r4ds - R for data science: a book
tuber - :sweet_potato: Access YouTube from R
awesome-R - A curated list of awesome R packages, frameworks and software.
polite - Be nice on the web
opentripplanner - An R package to set up and use OpenTripPlanner (OTP) as a local or remote multimodal trip planner.