|3 months ago||10 days ago|
|Mozilla Public License 2.0||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
We haven't tracked posts mentioning Geziyor yet.
Tracking mentions began in Dec 2020.
Simple tool to crawl URLs from a domain
5 projects | dev.to | 30 Jul 2022
cUrls is a simple tool that crawls URLs from a domain using the colly library.
Scrape and automate with golang
2 projects | reddit.com/r/golang | 11 Jun 2022
Beautiful Soup: We called him Tortoise because he taught us
7 projects | news.ycombinator.com | 8 Jun 2022
First project - The API for engineering multiple choice questions
2 projects | reddit.com/r/golang | 8 May 2022
There is a scraper struct that spawns goroutines. Each goroutine then scrapes a web URL using go colly, and the structured MCQ questions are served from API endpoints. Users can choose between a persistent Postgres model and an in-memory data structure to store the questions and retrieve them from the API. I have used dependency injection to inject the models via an interface into my application struct.
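The design described in that post can be sketched roughly as follows. This is a minimal illustration, not the author's code: the names `Question`, `QuestionStore`, `MemoryStore`, `App`, and `fakeFetch` are all hypothetical, and `fakeFetch` stands in for the real colly-based scraping so the example needs only the standard library. A Postgres-backed store would be a second type implementing the same interface.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Question is a hypothetical structured MCQ record.
type Question struct {
	Source string
	Text   string
}

// QuestionStore is the interface the application depends on.
// An in-memory or a Postgres-backed type could both satisfy it.
type QuestionStore interface {
	Save(q Question)
	All() []Question
}

// MemoryStore is an in-memory QuestionStore guarded by a mutex,
// because several goroutines save into it concurrently.
type MemoryStore struct {
	mu        sync.Mutex
	questions []Question
}

func NewMemoryStore() *MemoryStore { return &MemoryStore{} }

func (m *MemoryStore) Save(q Question) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.questions = append(m.questions, q)
}

// All returns a copy so callers cannot mutate the store's slice.
func (m *MemoryStore) All() []Question {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make([]Question, len(m.questions))
	copy(out, m.questions)
	return out
}

// App receives its store and its fetch function from the caller
// (dependency injection) instead of constructing them itself.
type App struct {
	Store QuestionStore
	Fetch func(url string) Question
}

// ScrapeAll spawns one goroutine per URL and waits for all of them.
func (a *App) ScrapeAll(urls []string) {
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			a.Store.Save(a.Fetch(u))
		}(u)
	}
	wg.Wait()
}

// fakeFetch is a placeholder for a real scraper call.
func fakeFetch(url string) Question {
	return Question{Source: url, Text: "placeholder question from " + url}
}

func main() {
	app := &App{Store: NewMemoryStore(), Fetch: fakeFetch}
	app.ScrapeAll([]string{"https://example.com/a", "https://example.com/b"})

	// Goroutines finish in no fixed order, so sort before printing.
	qs := app.Store.All()
	sort.Slice(qs, func(i, j int) bool { return qs[i].Source < qs[j].Source })
	for _, q := range qs {
		fmt.Println(q.Source, "->", q.Text)
	}
}
```

Because `App` only sees the `QuestionStore` interface, swapping the storage backend would not require changes to `ScrapeAll` or to the API handlers that read from the store.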
How I Scraped Michelin Guide Using Go
2 projects | dev.to | 21 Mar 2022
What follows is my thought process on how I collected all restaurant details from the Michelin Guide using Go with Colly. The final dataset is available as a free download here.
Show HN: The Brutalist Report – A rolling snapshot of the day’s headlines
5 projects | news.ycombinator.com | 22 Feb 2022
The whole thing is written in Go on my end. Ingesting new headlines is handled in a goroutine that spawns within the process every 30 mins using a combo of the wonderful gofeed (https://github.com/mmcdole/gofeed) and colly (https://github.com/gocolly/colly) libraries.
When loading the front page, you're loading a 1-minute-cached HTML page of it that was constructed out of headlines already in my PostgreSQL database that were put there by the ingestion goroutine.
I like the idea of word clouds actually, I think you're on to something there. I think you just need to pre-generate them rather than doing it ad hoc (if that's what you're doing here) for speed. Additionally, perhaps consider using sentiment in a way that orients stories based on positive and negative sentiment. Right now I am not seeing how I, as a visitor/user, can act on the sentiment analysis as it is presented.
It would be neat to see a collection of uplifting stories grouped together through the sentiment analysis.
Anyway, food for thought. I hope you keep hacking away on it as it's just good fun to build things.
Web Crawling Libraries which allow me to view the network requests made from a site?
1 project | reddit.com/r/golang | 16 Dec 2021
I've been looking at various scraping/crawling libraries like colly, but I'm not exactly sure if any of them can handle a specific use case I have.
How to do web scraping in Go with the Colly framework?
2 projects | dev.to | 21 Nov 2021
Problem trying to insert data received from Colly web scraper lib into DB via pgx library
2 projects | reddit.com/r/golang | 31 Oct 2021
In the process of trying to get better with Go, I have been trying to create a program that will scrape a subreddit and then save some info into a Postgres table. In this case, I decided to use r/frugalmalefashion and save the title and flair of recent posts into the table. I am using Colly to scrape data, and pgx as the database driver.
The State of Web Scraping in 2021
9 projects | news.ycombinator.com | 11 Oct 2021
If you're familiar with Go, there's Colly too. I liked its simplicity and approach and even wrote a little wrapper around it to run it via Docker and a config file:
What are some alternatives?
GoQuery - A little like that j-thing, only in Go.
Scrapy - Scrapy, a fast high-level web crawling & scraping framework for Python.
xpath - XPath package for Golang, supports HTML, XML, JSON document query.
rod - A Devtools driver for web automation and scraping
Ferret - Declarative web scraping
Pholcus - Pholcus is a distributed high-concurrency crawler software written in pure golang
Playwright - Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
google-search-results-golang - Google Search Results GoLang API
Dataflow kit - Extract structured data from web sites. Web sites scraping.
jsonrpconion - Library for building JSON RPC services on Tor network