|5 days ago||2 days ago|
|BSD 3-clause "New" or "Revised" License||BSD 3-clause "New" or "Revised" License|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
I keep hitting a wall when trying to install modules in Pycharm...
2 projects | reddit.com/r/learnpython | 10 Mar 2023
pip install https://github.com/pandas-dev/pandas/archive/master.zip
Reducing size of dependencies
2 projects | reddit.com/r/Python | 4 Mar 2023
Going through the venv site-packages I found 2 things:
* Once your py files are compiled into pyc or pyo, you can delete the originals at the cost of debuggability. It won't be a huge change, but it's something.
* Pandas specifically carries a huge tests folder; it's discussed here. You can delete it. This is probably true for other libraries as well to some extent (carrying non-production code into the package).
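Before deleting anything, it can help to measure how much space those test folders actually take. The following is a minimal sketch using only the standard library; the function names `dir_size` and `find_test_dirs` are hypothetical helpers made up for this illustration, and it assumes the test code lives in directories literally named `tests`.

```python
from pathlib import Path


def dir_size(path: Path) -> int:
    """Return the total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def find_test_dirs(site_packages: Path) -> dict[str, int]:
    """Map each outermost 'tests' directory (relative path) to its size in bytes."""
    sizes = {}
    for tests_dir in site_packages.rglob("tests"):
        if not tests_dir.is_dir():
            continue
        rel = tests_dir.relative_to(site_packages)
        # Skip 'tests' folders nested inside another 'tests' folder,
        # since the outer folder's size already includes them.
        if "tests" in rel.parts[:-1]:
            continue
        sizes[str(rel)] = dir_size(tests_dir)
    return sizes
```

Calling `find_test_dirs(Path(sysconfig.get_paths()["purelib"]))` (after `import sysconfig`) should list the candidates for the active environment. Whether removing them is safe depends on the package, so treat the result as a report, not a delete list.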
We are the developers behind pandas, currently preparing for the 2.0 release :) AMA
I'm Patrick Hoefler aka phofl and I'm one of the core team members developing and maintaining pandas (repo, docs), a popular data analysis library.
introducing optional behaviour comes with a huge maintenance cost (I started making such a proposal here, but then withdrew it)
I think this is an interesting question! I've opened https://github.com/pandas-dev/pandas/issues/51751
Personally polars' strictness is making me think about situations when in pandas we end up with object dtype, which we should probably avoid. Here's an example: https://github.com/pandas-dev/pandas/issues/50887 (polars would just error in such a case, which I think is the correct thing to do)
New to python
2 projects | reddit.com/r/learnpython | 28 Feb 2023
If you want to do data analysis or engineering, spend more time on Pandas or PySpark.
Join us for an AMA with the developers of pandas, the powerful data analysis toolkit, this Thursday, March 2nd at 5:30 pm UTC to celebrate the upcoming 2.0 release
2 projects | reddit.com/r/Python | 28 Feb 2023
This Thursday we'll be hosting an AMA with some of the developers of pandas. The AMA will 'officially' start at 5:30pm UTC.
We released the release candidate for 2.0 last week, so the actual release is expected shortly, possibly next week. Please help us check that everything works by testing the rc :)
Extracting git repository data with PyDriller
3 projects | dev.to | 25 Feb 2023
I chose to do this by using the popular Pandas library for tabular data analysis tasks.
fastest web scraping options
2 projects | reddit.com/r/webscraping | 8 Mar 2023
You can use automation tools like Selenium or Playwright. You can work with a full-fledged framework such as Scrapy. I also recently discovered a Python tool like selectolax Lexbor, which allows you to extract data very quickly.
How to run web scraping script every 15 minutes
2 projects | reddit.com/r/webscraping | 13 Feb 2023
You may want to check out [estela](https://estela.bitmaker.la/docs/), which is a spider management solution, developed by [Bitmaker](https://bitmaker.la) that allows you to run [Scrapy](https://scrapy.org) spiders.
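For a single script without a management platform, the 15-minute schedule asked about above can also be approximated with a plain loop. This is only a sketch using the standard library: `next_run` and `run_forever` are hypothetical names, the interval is assumed to divide 60 evenly, and a cron job or a hosted scheduler is usually the more robust choice.

```python
import time
from datetime import datetime, timedelta


def next_run(now: datetime, interval_minutes: int = 15) -> datetime:
    """Return the next wall-clock boundary that is a multiple of interval_minutes."""
    base = now.replace(second=0, microsecond=0)
    minutes_over = base.minute % interval_minutes
    return base + timedelta(minutes=interval_minutes - minutes_over)


def run_forever(job, interval_minutes: int = 15) -> None:
    """Call job() at every interval boundary; errors are printed, not raised."""
    while True:
        wait = (next_run(datetime.now(), interval_minutes) - datetime.now()).total_seconds()
        time.sleep(max(wait, 0))
        try:
            job()
        except Exception as exc:  # keep the loop alive if one run fails
            print(f"scrape failed: {exc}")
```

Aligning to clock boundaries, rather than sleeping a fixed 900 seconds after each run, keeps the schedule from drifting when a scrape takes a long time.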
How I used Scrapy for my ML Project
2 projects | dev.to | 27 Jan 2023
I wanted to invest my time and energy in learning the fastest, most efficient one, that can scale with me as my projects get more and more complex: Scrapy. After all, I want my projects to shine so bright in my CV it blinds the recruiter's eyes.
How to extract / download all URLs from a site?
2 projects | reddit.com/r/nextjs | 8 Dec 2022
Try Scrapy (Python) https://scrapy.org/
Nairobi Stock Exchange Web Scraper (MongoDB Atlas Hackathon 2022 on DEV)
4 projects | dev.to | 7 Dec 2022
Advanced Web Scraping using Python-Scrapy and Splash
3 projects | dev.to | 24 Nov 2022
Ask HN: Best way to keep the raw HTML of scraped pages?
3 projects | news.ycombinator.com | 11 Nov 2022
If you weren't already aware, Scrapy has strong support for this via their HTTPCache middleware; you can choose whether to have it actually behave like a cache, choosing to return already scraped content if matched, or merely to act as a pass-through cache: https://docs.scrapy.org/en/2.7/topics/downloader-middleware....
Their OOtB storage does what the sibling comment says about sha1-ing the request and then sharding the output filename by the first two characters: https://github.com/scrapy/scrapy/blob/2.7.1/scrapy/extension...
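The storage layout described above (hash the request, then shard by the first two characters of the digest) can be illustrated roughly as follows. This is a simplified sketch, not Scrapy's actual code: `request_key`, `cache_path`, and `store_response` are hypothetical names, and the real fingerprint takes more of the request into account than the method, URL, and body used here.

```python
import hashlib
from pathlib import Path


def request_key(method: str, url: str, body: bytes = b"") -> str:
    """Return a SHA-1 hex digest identifying a request (simplified fingerprint)."""
    h = hashlib.sha1()
    h.update(method.upper().encode())
    h.update(b"\n")
    h.update(url.encode())
    h.update(b"\n")
    h.update(body)
    return h.hexdigest()


def cache_path(cache_root: Path, key: str) -> Path:
    """Shard entries into subdirectories named after the first two hex characters."""
    return cache_root / key[:2] / key


def store_response(cache_root: Path, method: str, url: str, html: bytes) -> Path:
    """Write the raw response body under its sharded cache directory."""
    target = cache_path(cache_root, request_key(method, url))
    target.mkdir(parents=True, exist_ok=True)
    (target / "response_body").write_bytes(html)
    return target
```

Sharding on two hex characters spreads entries across at most 256 subdirectories, which keeps any single directory from growing too large on big crawls.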
Tool to Scrape Manuals and Sensitive PDFs to Generate Stronger Wordlists for Lateral Movement and Initial Access
2 projects | reddit.com/r/cybersecurity | 4 Nov 2022
Surprised at the name of this project given there is an incredibly popular project called scrapy related to web scraping. This project would really benefit from a rebrand.
‘Automate the boring stuff’ — but what do you all actually automate with python
3 projects | reddit.com/r/learnpython | 29 Oct 2022
What are some cool things you've automated with python?
3 projects | reddit.com/r/Python | 16 Oct 2022
I was looking for a used car. I wrote a scraper using Scrapy that gathered all new offers, filtered by my criteria, every hour. Then it sent me a nicely formatted email.
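The filter-and-notify half of such a workflow might look like the sketch below. It is not the commenter's code: the `Offer` fields, the criteria, and the function names are all assumptions made for illustration, and the scraping step itself is left out.

```python
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class Offer:
    # Hypothetical fields; a real spider would define its own item schema.
    title: str
    price: int
    year: int
    url: str


def filter_offers(offers, max_price: int, min_year: int):
    """Keep only offers within the price limit and not older than min_year."""
    return [o for o in offers if o.price <= max_price and o.year >= min_year]


def build_email(offers, sender: str, recipient: str) -> EmailMessage:
    """Format the matching offers as a plain-text email message."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(offers)} new car offer(s) matching your criteria"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [f"- {o.title} ({o.year}): {o.price} | {o.url}" for o in offers]
    msg.set_content("\n".join(lines) or "No new offers this hour.")
    return msg
```

The resulting message can be handed to `smtplib.SMTP(...).send_message(msg)` from the standard library; the server address and credentials depend on your mail provider.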
What are some alternatives?
requests-html - Pythonic HTML Parsing for Humans™
pyspider - A Powerful Spider(Web Crawler) System in Python.
Cubes - Light-weight Python OLAP framework for multi-dimensional data analysis
orange - 🍊 Orange: Interactive data analysis
colly - Elegant Scraper and Crawler Framework for Golang
tensorflow - An Open Source Machine Learning Framework for Everyone
MechanicalSoup - A Python library for automating interaction with websites.
playwright-python - Python version of the Playwright testing and automation library.
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
undetected-chromedriver - Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
pyexcel - Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files
Keras - Deep Learning for humans