Scrapy
seaborn
| | Scrapy | seaborn |
|---|---|---|
| Mentions | 197 | 83 |
| Stars | 62,120 | 13,901 |
| Growth | 0.9% | 0.5% |
| Activity | 9.5 | 4.1 |
| Latest commit | about 21 hours ago | 4 months ago |
| Language | Python | Python |
| License | BSD 3-clause "New" or "Revised" License | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Scrapy
-
How to write and publish a Python package to PyPI
This guide walks through the full process using uv, a fast, modern Python toolchain that replaces pip, virtualenv, pip-tools, twine, and build with a single tool. We will write a reusable Scrapy download handler, structure it as a proper Python package, test it, and publish it to PyPI.
-
How to tell if a page uses JavaScript rendering (and what to do about it)
In Scrapy, Zyte API integrates via the scrapy-zyte-api package:
-
How to Use rs-trafilatura with Scrapy
Scrapy is the standard Python framework for web scraping. It handles crawling, scheduling, and data pipelines. rs-trafilatura plugs into Scrapy as an item pipeline — your spider yields items with HTML, and the pipeline adds structured extraction results automatically.
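As a rough illustration of the item-pipeline pattern described above, here is a minimal sketch. It does not use rs-trafilatura's real API, which is not shown here: `extract_text` is a hypothetical stand-in built on the standard library, and the `html` and `extracted_text` item fields are assumed names.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Hypothetical stand-in for a real extractor such as rs-trafilatura."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


class ExtractionPipeline:
    """Scrapy item pipelines are plain classes with a process_item method."""

    def process_item(self, item, spider):
        # "html" and "extracted_text" are assumed field names for this sketch.
        html = item.get("html")
        if html:
            item["extracted_text"] = extract_text(html)
        return item
```

A pipeline like this would be enabled through Scrapy's `ITEM_PIPELINES` setting, for example `{"myproject.pipelines.ExtractionPipeline": 300}`, where the module path is hypothetical.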
-
Best Python Web Scraping Libraries 2026
Official Documentation: Scrapy Project
-
Scrapy Middlewares: A Practical Guide for Beginners (With Real-World Examples)
User-Agent: Scrapy/2.11.0 (+https://scrapy.org)
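The header above identifies the crawler as Scrapy. A downloader middleware is one place to change it per request. The sketch below is an assumption-laden illustration, not code from the linked guide: the class name and the `USER_AGENT_CHOICES` setting are hypothetical.

```python
import random


class RotateUserAgentMiddleware:
    """Downloader middleware sketch: overrides the User-Agent per request."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT_CHOICES is a hypothetical custom setting, not a built-in.
        return cls(crawler.settings.getlist("USER_AGENT_CHOICES"))

    def process_request(self, request, spider):
        # Leave the request untouched when no alternatives are configured.
        if self.user_agents:
            request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # None tells Scrapy to continue processing the request
```

It would be registered through the `DOWNLOADER_MIDDLEWARES` setting with a module path and priority of your choosing.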
-
Progress Updates on Contribution to Scrapy
Last week, I worked on code refactoring in Scrapy, which is an essential practice in larger and more complex projects. Refactoring not only improves code maintainability but also makes it easier for other contributors to understand and extend the project. This task was a good starting point for me to verify that I had the Scrapy project correctly set up locally, as refactoring should not break existing functionality.
-
Contributing to Larger Open Source Project - Scrapy
In the past three months, I worked on various open source projects, including my own project Repo Context Packager, Math Worksheet Generator and Open Web Calendar. This month, I want to challenge myself to work on a larger and more widely used project - Scrapy, a Python module for web crawling.
-
How I Block All 26M of Your Curl Requests
From what I have seen, it is hard to tell what "serious scrapers" use. They use many things; some use this, some do not. This is what I have learned from reading the webscraping subreddit. Nobody says things like that out loud.
There are many tools; see the links below.
Personally, I think running Selenium can be a bottleneck: it does not play nice, processes sometimes break, the system sometimes needs a restart because things get blocked, it can be a memory hog, and so on. That is my experience.
To be able to scale, I think you have to have your own implementation. Serious scrapers dismiss people who use Selenium or its derivatives as noobs who will come back asking why page X does not work with their scraping mechanism.
https://github.com/lexiforest/curl_cffi
https://github.com/encode/httpx
https://github.com/scrapy/scrapy
https://github.com/apify/crawlee
- Scrapy needs to have sane defaults that do no harm
-
Top 10 Tools for Efficient Web Scraping in 2025
Scrapy is a robust and scalable open-source web crawling framework. It is highly efficient for large-scale projects and supports asynchronous scraping.
seaborn
-
How I Hacked Uber’s Hidden API to Download 4379 Rides
Below are the key insights. If you want to see the Python code I used to do this analysis and generate the charts using Seaborn, you can find my full analysis Jupyter notebook in my GitHub repo here: Tip Analysis.ipynb
-
1MinDocker #6 - Building further
seaborn
-
Scientific Visualization: Python and Matplotlib, by Nicolas Rougier
Additionally, Seaborn (https://seaborn.pydata.org/) is worth a mention for people who want to use Matplotlib with better default aesthetics, amongst other conveniences:
"Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics."
-
Data Visualisation Basics
Seaborn: built on top of matplotlib, adds a number of functions to make common statistical visualizations easier to generate.
-
Useful Python Libraries for AI/ML
pandas - The standard data analysis and manipulation tool
numpy - scientific computing library
seaborn - statistical data visualization
sklearn - basic machine learning and predictive analysis
CausalML - a suite of uplift modeling and causal inference methods
PyTorch - professional deep learning framework
PivotTablejs - Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook
LazyPredict - build and work with and compare multiple models
phidata - Build AI Assistants with memory, knowledge and tools.
Lux - automates visualization and data analysis
pycaret - low-code machine learning library. really nice
Cleanlab - for when you are working with messy data
drawdata - draw a dataset from inside Jupyter
pyforest - lazy import popular data science libs
streamlit - simple ui builder, useful for demonstrating ML results
-
Essential Deep Learning Checklist: Best Practices Unveiled
How to Accomplish: Utilize visualization libraries like Matplotlib, Seaborn, or Plotly in Python to create histograms, scatter plots, and bar charts. For image data, use tools that visualize images alongside their labels to check for labeling accuracy. For structured data, correlation matrices and pair plots can be highly informative.
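To make the correlation-matrix idea concrete, here is a small standard-library sketch of the numbers such a plot displays. The function names and sample columns are made up for illustration. In practice pandas' `DataFrame.corr` with seaborn's `heatmap` or `pairplot` would do this work.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError for a constant column; handle that upstream.
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each pair of column names to its Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up sample data: y rises with x, z falls as x rises.
data = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]}
matrix = correlation_matrix(data)
```

Values near 1 or -1 flag strongly related features, which is what a heatmap of this matrix makes visible at a glance.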
- "No" is not an actionable error message
-
Apache Superset
If you are doing data analysis I don't think any of the 3 pieces of software you mentioned are going to be that helpful.
I see these products as tools for data visualization and reporting i.e. presenting prepared datasets to users in a visually appealing way. They aren't as well suited for serious analytics.
I can't comment on Superset or Tableau, but I am familiar with Power BI (it has been rolled out across my org). The statistics you can do with it are fairly rudimentary; if you need to do anything beyond summarizing (counts, averages, min, max, etc.), it is not particularly easy.
For data analysis I use SAS or R. This software allows you to do things like multivariate regression, time series forecasting, PCA, cluster analysis, etc. There is also plotting capability.
Both these products are kind of old school; I've been using them since the early 2000s. The "new school" seems to be Python. Pretty much all the recent data science people in my organization use Python, particularly pandas and libraries like Seaborn (https://seaborn.pydata.org/).
The "power" users of Power BI in my organization tend to be finance/HR people, for use cases like drilling down into cost figures or interactively presenting KPIs and other headline figures to management.
-
Seaborn bug responsible for finding of declining disruptiveness in science
It's referring to the seaborn library (https://seaborn.pydata.org/), a Python library for data visualization (built on top of matplotlib).
-
Why Pandas feels clunky when coming from R
While it’s not perfect and it’s not ggplot2, Seaborn is definitely a big improvement over bare matplotlib. You can still use matplotlib to modify the plots it spits out if you want to, but the defaults are pretty good most of the time.
https://seaborn.pydata.org/
What are some alternatives?
colly - Elegant Scraper and Crawler Framework for Golang
bokeh - Interactive Data Visualization in the browser, from Python
portia - Visual scraping for Scrapy
Altair - Declarative visualization library for Python
feedparser - Parse feeds in Python
ggplot - ggplot port for python