Web scraping with Python

This page summarizes the projects mentioned and recommended in the original post on dev.to

InfluxDB - Power Real-Time Data Analytics at Scale
Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
www.influxdata.com
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  • WolfensteinCGA

    Wolfenstein 3D with a CGA renderer

  • {'URL': 'https://vpnoverview.com/news/wifi-routers-used-to-produce-3d-images-of-humans/', 'title': 'WiFi Routers Used to Produce 3D Images of Humans (vpnoverview.com)', 'rank': '1'} {'URL': 'https://openjdk.org/jeps/8300786', 'title': 'JEP draft: No longer require super() and this() to appear first in a constructor (openjdk.org)', 'rank': '2'} {'URL': 'item?id=34482433', 'title': 'Ask HN: Those making $500+/month on side projects in 2023 -- Show and tell', 'rank': '3'} {'URL': 'https://www.solipsys.co.uk/new/ThePointOfTheBanachTarskiTheorem.html?wa22hn', 'title': 'The Point of the Banach-Tarski Theorem (solipsys.co.uk)', 'rank': '4'} {'URL': 'https://initialcommit.com/blog/git-sim', 'title': 'Git-sim: Visually simulate Git operations in your own repos (initialcommit.com)', 'rank': '5'} {'URL': 'https://www.cell.com/cell-reports-medicine/fulltext/S2666-3791(22)00474-8', 'title': 'Brief structured respiration enhances mood and reduces physiological arousal (cell.com)', 'rank': '6'} {'URL': 'https://en.wikipedia.org/wiki/I,_Libertine', 'title': 'I, Libertine (wikipedia.org)', 'rank': '7'} {'URL': 'item?id=34465956', 'title': 'Ask HN: Why did BASIC use line numbers instead of a full screen editor?', 'rank': '8'} {'URL': 'https://arxiv.org/abs/2203.03456', 'title': 'Negative-weight single-source shortest paths in near-linear time (arxiv.org)', 'rank': '9'} {'URL': 'https://onesignal.com/careers', 'title': 'OneSignal (YC S11) Is Hiring Engineers (onesignal.com)', 'rank': '10'} {'URL': 'https://neelc.org/posts/chatgpt-gmail-spam/', 'title': "Bypassing Gmail's spam filters with ChatGPT (neelc.org)", 'rank': '11'} {'URL': 'https://cyber.dabamos.de/88x31/', 'title': 'The 88x31 GIF Collection (dabamos.de)', 'rank': '12'} {'URL': 'https://www.middleeasteye.net/opinion/david-graeber-vs-yuval-harari-forgotten-cities-myths-how-civilisation-began', 'title': 'The Dawn of Everything challenges a mainstream telling of prehistory (middleeasteye.net)', 'rank': '13'} {'URL': 'https://blog.thinkst.com/2023/01/swipe-right-on-our-new-credit-card-tokens.html', 'title': 'Detect breaches with Canary credit cards (thinkst.com)', 'rank': '14'} {'URL': 'https://www.atlasobscura.com/articles/heritage-appalachian-apples', 'title': 'Appalachian Apple hunter who rescued 1k 'lost' varieties (2021) (atlasobscura.com)', 'rank': '15'} {'URL': 'https://www.workingsoftware.dev/software-architecture-documentation-the-ultimate-guide/', 'title': 'The Guide to Software Architecture Documentation (workingsoftware.dev)', 'rank': '16'} {'URL': 'https://arstechnica.com/tech-policy/2023/01/supreme-court-allows-reddit-mods-to-anonymously-defend-section-230/', 'title': 'Supreme Court allows Reddit mods to anonymously defend Section 230 (arstechnica.com)', 'rank': '17'} {'URL': 'https://neurosciencenews.com/insula-empathy-pain-21818/', 'title': 'How do we experience the pain of other people? (neurosciencenews.com)', 'rank': '18'} {'URL': 'https://lwn.net/SubscriberLink/920158/313ec4305df220bb/', 'title': 'Nolibc: A minimal C-library replacement shipped with the kernel (lwn.net)', 'rank': '19'} {'URL': 'https://www.economist.com/1843/2017/05/04/the-body-in-the-buddha', 'title': 'The Body in the Buddha (2017) (economist.com)', 'rank': '20'} {'URL': 'https://simonwillison.net/2023/Jan/13/semantic-search-answers/', 'title': 'How to implement Q&A against your docs with GPT3 embeddings and Datasette (simonwillison.net)', 'rank': '21'} {'URL': 'https://destevez.net/2023/01/decoding-lunar-flashlight/', 'title': 'Decoding Lunar Flashlight (destevez.net)', 'rank': '22'} {'URL': 'https://www.hampsteadheath.net/about', 'title': 'Hampstead Heath (hampsteadheath.net)', 'rank': '23'} {'URL': 'https://www.otherlife.co/francisbacon/', 'title': 'The violent focus of Francis Bacon (otherlife.co)', 'rank': '24'} {'URL': 'https://arstechnica.com/gaming/2019/10/explaining-how-fighting-games-use-delay-based-and-rollback-netcode/', 'title': 'How fighting games use delay-based and rollback netcode (2019) (arstechnica.com)', 'rank': '25'} {'URL': 'https://essays.georgestrakhov.com/ai-is-not-a-horse/', 'title': 'AI Is Not a Horse (georgestrakhov.com)', 'rank': '26'} {'URL': 'https://lawliberty.org/features/the-mystery-of-richard-posner/', 'title': 'The Mystery of Richard Posner (lawliberty.org)', 'rank': '27'} {'URL': 'https://rodneybrooks.com/predictions-scorecard-2023-january-01/', 'title': 'Rodney Brooks Predictions Scorecard (rodneybrooks.com)', 'rank': '28'} {'URL': 'https://www.notamonadtutorial.com/how-to-transform-code-into-arithmetic-circuits/', 'title': 'How to transform code into arithmetic circuits (notamonadtutorial.com)', 'rank': '29'} {'URL': 'https://github.com/jhhoward/WolfensteinCGA', 'title': 'Wolfenstein 3D with a CGA Renderer (github.com/jhhoward)', 'rank': '30'}

  • Playwright

    Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

  • By definition, Playwright is an open-source framework for web testing and automation developed and maintained by Microsoft.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • httpx

    A next generation HTTP client for Python. 🦋

  • HTTPX is a fully featured HTTP client library for Python 3, including an integrated command-line client while providing both sync and async APIs.

  • scrapy-playwright

    🎭 Playwright integration for Scrapy

  • To integrate Playwright with Scrapy, we will use the scrapy-playwright library. Then, we will scrape https://www.mintmobile.com/product/google-pixel-7-pro-bundle/ to demonstrate how to extract data from a website using Playwright and Scrapy.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

  • A Retrospective on Requests

    2 projects | news.ycombinator.com | 27 Feb 2024
  • Show HN: Twitter API Wrapper for Python – No API Keys Needed

    3 projects | news.ycombinator.com | 3 Feb 2024
  • HTTP Rate Limit

    3 projects | dev.to | 4 Jan 2024
  • Didn't want to click on refresh to see updates, this is what I did!

    2 projects | dev.to | 15 Nov 2023
  • Python Requests 3

    1 project | news.ycombinator.com | 13 Nov 2023