How did everyone here learn how to webscrape? And have any of you made any cool projects with it?

This page summarizes the projects mentioned and recommended in the original post on /r/webscraping.

  • music-downloader

    This program first gets the metadata of songs from metadata providers like MusicBrainz, then searches for download links on sites like Bandcamp. It then downloads each song and edits its metadata accordingly.

  • The biggest project I've created is a pretty powerful music downloader that searches the whole internet for metadata as well as audio, automatically downloads the requested songs, and modifies the metadata of the MP3 files accordingly. It can even find most songs by most underground artists and download them all with one command. It also fetches the lyrics, and for the cache I use an SQL database. My GitHub repo has over 50 stars (https://github.com/HeIIow2/music-downloader), and of course I also packaged it and uploaded it to PyPI (https://pypi.org/project/music-kraken/). It is over 3k lines of Python code so far.

  • COVID19_mobility

    COVID-19 Mobility Data Aggregator. Scraper of Google, Apple, Waze and TomTom COVID-19 Mobility Reports🚶🚘🚉

  • For learning Scrapy specifically, I recommend the book Learning Scrapy by Dimitrios Kouzis-Loukas (https://www.amazon.com/Learning-Scrapy-Dimitrios-Kouzis-Loukas/dp/1784399787), which helped me get started. Of course, a book alone is not enough: only practice and learning from experienced colleagues on a real project seriously improved my skills. I also have a pet project that I'm not super proud of, but it has given me invaluable experience in maintaining an open-source repo (https://github.com/ActiveConclusion/COVID19_mobility). Despite some poor architectural decisions, it is cited in some papers, and I can tell it is used in university lab assignments because students still sometimes ask me about it. I absolutely didn't expect that, and it feels sort of funny and nice.

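The music-downloader author mentions using an SQL database as a cache. Below is a minimal sketch of that idea, assuming SQLite via Python's built-in `sqlite3` module, the request URL as the cache key, and a one-day expiry. The `ScrapeCache` class, its table layout, and the injected `fetch` callable are hypothetical and are not taken from the music-kraken code.

```python
import sqlite3
import time
from typing import Callable


class ScrapeCache:
    """Hypothetical SQLite-backed cache mapping a URL to the body fetched for it."""

    def __init__(self, path: str = ":memory:", max_age_s: float = 86_400.0) -> None:
        # ":memory:" keeps the cache in RAM; pass a file path to persist it between runs.
        self.conn = sqlite3.connect(path)
        self.max_age_s = max_age_s  # assumed expiry: entries older than this are refetched
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            " url TEXT PRIMARY KEY,"
            " body TEXT NOT NULL,"
            " fetched_at REAL NOT NULL)"
        )

    def get_or_fetch(self, url: str, fetch: Callable[[str], str]) -> str:
        """Return the cached body for `url`, calling `fetch(url)` only on a miss or stale entry."""
        row = self.conn.execute(
            "SELECT body, fetched_at FROM cache WHERE url = ?", (url,)
        ).fetchone()
        if row is not None and time.time() - row[1] < self.max_age_s:
            return row[0]  # fresh cache hit: skip the network request
        body = fetch(url)  # cache miss or expired entry: fetch again
        with self.conn:  # the connection context manager commits the upsert
            self.conn.execute(
                "INSERT OR REPLACE INTO cache (url, body, fetched_at) VALUES (?, ?, ?)",
                (url, body, time.time()),
            )
        return body


if __name__ == "__main__":
    requested = []

    def fake_fetch(url: str) -> str:
        # Stand-in for a real HTTP request, so the example runs offline.
        requested.append(url)
        return f"<html>{url}</html>"

    cache = ScrapeCache()
    cache.get_or_fetch("https://example.org/a", fake_fetch)
    cache.get_or_fetch("https://example.org/a", fake_fetch)
    print(len(requested))  # prints 1: the second call is served from SQLite
```

Passing `fetch` in as a parameter keeps the cache independent of the HTTP client, so a real scraper could plug in whatever request function it already uses.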