scrapy-playwright
WolfensteinCGA
Metric | scrapy-playwright | WolfensteinCGA
---|---|---
Mentions | 11 | 14
Stars | 837 | 314
Growth | 3.1% | -
Activity | 7.8 | 2.8
Last commit | 3 months ago | 9 months ago
Language | Python | C
License | BSD 3-clause "New" or "Revised" License | -
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
scrapy-playwright
-
Web Scraping Dynamic Websites With Scrapy Playwright
scrapy-playwright is an integration between Scrapy and Playwright. It enables scraping dynamic web pages with Scrapy by processing the web scraping requests using a Playwright instance.
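As a rough illustration of how a request is handed to Playwright: scrapy-playwright only renders requests that opt in through their `meta` dictionary. The sketch below builds that dictionary. The `playwright` and `playwright_include_page` keys come from the scrapy-playwright README; the helper name `build_playwright_meta` is hypothetical and exists only for this example.

```python
# Sketch: the per-request switch that routes a Scrapy request through Playwright.
# "playwright" and "playwright_include_page" are meta keys documented by
# scrapy-playwright; build_playwright_meta is a hypothetical helper name.

def build_playwright_meta(include_page: bool = False) -> dict:
    """Return request meta asking scrapy-playwright to render the page in a browser."""
    meta = {"playwright": True}
    if include_page:
        # Asks the handler to expose the Playwright Page object to the callback
        # as response.meta["playwright_page"].
        meta["playwright_include_page"] = True
    return meta


# Inside a spider this would typically be passed along as:
#   yield scrapy.Request(url, meta=build_playwright_meta())
print(build_playwright_meta())
```

Requests without the `playwright` key should still go through Scrapy's regular downloader, so only the pages that need JavaScript rendering pay the cost of a browser.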
- Turning webpages into pdf
- Scrapy & splash guide
-
Web scraping with Python
To integrate Playwright with Scrapy, we will use the scrapy-playwright library. Then, we will scrape https://www.mintmobile.com/product/google-pixel-7-pro-bundle/ to demonstrate how to extract data from a website using Playwright and Scrapy.
-
which libraries/frameworks could be used for page interaction?
Scrapy-playwright
-
Implementing a Selenium backend on a web app?
Your website is dynamic. There are many integrations for Scrapy that can help you; this is the best one: https://github.com/scrapy-plugins/scrapy-playwright
-
Is Selenium still a good choice?
This concern should be lifted if you are a Scrapy lover. There is a Scrapy integration for Playwright that gives you a lot of freedom and lets you operate from a Scrapy spider.
-
Scraping Dynamic Javascript Websites with Scrapy and Scrapy-playwright
Now we need to modify Scrapy's settings to allow it to work with Playwright. Instructions can be found on the scrapy-playwright GitHub page. We need to add settings for DOWNLOAD_HANDLERS and TWISTED_REACTOR. New settings that were added can be found between ###. This is what the settings file should look like:
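A minimal sketch of such a settings fragment, assuming the handler path and reactor name given in the scrapy-playwright README (check them against the version you have installed):

```python
# settings.py (fragment): route HTTP(S) downloads through scrapy-playwright.
# Values follow the scrapy-playwright README; verify them for your version.

### scrapy-playwright settings start
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright relies on Twisted's asyncio-based reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
### scrapy-playwright settings end
```

The rest of the project's `settings.py` can stay as generated by `scrapy startproject`.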
-
Web Scraping with Python: Everything you need to know
You can use something like scrapy-playwright[0] to run a headless browser framework as your download handler. I think there are versions for some of the other headless systems, if you prefer those.
[0] https://github.com/scrapy-plugins/scrapy-playwright
-
Make an addition to scrapy_playwright source code
[1]: https://github.com/scrapy-plugins/scrapy-playwright/issues/61
WolfensteinCGA
-
Doom for 16-bit DOS computers
I was wondering that as well, since Wolfenstein got a CGA port: https://github.com/jhhoward/WolfensteinCGA
The CGA mode is playable, but pretty bad looking as would be expected. On a Tandy 1000 with the Tandy graphics mode enabled, it looks much better. It'd be cool to be able to run Doom in TGA as well.
-
Found this beast of a laptop at an estate sale. Zenith zfl-181-92 from 1986
There's an 8088 port of Wolfenstein 3D, which is basically the same thing, right? https://github.com/jhhoward/WolfensteinCGA
-
Wolfenstein 3D CGA on an IBM 5150 PC
Attempted this to see how well this project runs on the original IBM PC. Pretty amazing that it runs at all! This port was done by 'jhhoward' - the files can be found on GitHub, I found out about it at the VCF Forums.
-
An overview of single-purpose Linux distributions
I'm honestly still trying to figure out what I want to do with it. I got the compact flash to use with a XT-CF-Lite v4. I'm still trying to figure out what DOS I'd like to use. I can get the OEM MSDOS 3.22 installed onto the drive, but that version's max partition size is 32MB, which isn't ideal when the compact flash is 4GB. FreeDOS supports that size and pre-386, but it was a pain to get it on there and is a bit overkill for a computer this old. Right now I'm considering a later MSDOS, but haven't decided on which one.
As far as software goes, I'm waiting on finalizing the DOS before exploring games and development, but it was fun to be able to run this port of Wolfenstein 3D:
https://github.com/jhhoward/WolfensteinCGA
Besides all that I'm thinking of maxing out the RAM (it's 640k now, but can take an additional 128k for video), and maybe adding a real time clock and network card. I do have another Tandy 1000 TX, so I could see how 80's networking worked. That is probably another can of worms though.
-
Web scraping with Python
{'URL': 'https://vpnoverview.com/news/wifi-routers-used-to-produce-3d-images-of-humans/', 'title': 'WiFi Routers Used to Produce 3D Images of Humans (vpnoverview.com)', 'rank': '1'}
{'URL': 'https://openjdk.org/jeps/8300786', 'title': 'JEP draft: No longer require super() and this() to appear first in a constructor (openjdk.org)', 'rank': '2'}
{'URL': 'item?id=34482433', 'title': 'Ask HN: Those making $500+/month on side projects in 2023 -- Show and tell', 'rank': '3'}
{'URL': 'https://www.solipsys.co.uk/new/ThePointOfTheBanachTarskiTheorem.html?wa22hn', 'title': 'The Point of the Banach-Tarski Theorem (solipsys.co.uk)', 'rank': '4'}
{'URL': 'https://initialcommit.com/blog/git-sim', 'title': 'Git-sim: Visually simulate Git operations in your own repos (initialcommit.com)', 'rank': '5'}
{'URL': 'https://www.cell.com/cell-reports-medicine/fulltext/S2666-3791(22)00474-8', 'title': 'Brief structured respiration enhances mood and reduces physiological arousal (cell.com)', 'rank': '6'}
{'URL': 'https://en.wikipedia.org/wiki/I,_Libertine', 'title': 'I, Libertine (wikipedia.org)', 'rank': '7'}
{'URL': 'item?id=34465956', 'title': 'Ask HN: Why did BASIC use line numbers instead of a full screen editor?', 'rank': '8'}
{'URL': 'https://arxiv.org/abs/2203.03456', 'title': 'Negative-weight single-source shortest paths in near-linear time (arxiv.org)', 'rank': '9'}
{'URL': 'https://onesignal.com/careers', 'title': 'OneSignal (YC S11) Is Hiring Engineers (onesignal.com)', 'rank': '10'}
{'URL': 'https://neelc.org/posts/chatgpt-gmail-spam/', 'title': "Bypassing Gmail's spam filters with ChatGPT (neelc.org)", 'rank': '11'}
{'URL': 'https://cyber.dabamos.de/88x31/', 'title': 'The 88x31 GIF Collection (dabamos.de)', 'rank': '12'}
{'URL': 'https://www.middleeasteye.net/opinion/david-graeber-vs-yuval-harari-forgotten-cities-myths-how-civilisation-began', 'title': 'The Dawn of Everything challenges a mainstream telling of prehistory (middleeasteye.net)', 'rank': '13'}
{'URL': 'https://blog.thinkst.com/2023/01/swipe-right-on-our-new-credit-card-tokens.html', 'title': 'Detect breaches with Canary credit cards (thinkst.com)', 'rank': '14'}
{'URL': 'https://www.atlasobscura.com/articles/heritage-appalachian-apples', 'title': "Appalachian Apple hunter who rescued 1k 'lost' varieties (2021) (atlasobscura.com)", 'rank': '15'}
{'URL': 'https://www.workingsoftware.dev/software-architecture-documentation-the-ultimate-guide/', 'title': 'The Guide to Software Architecture Documentation (workingsoftware.dev)', 'rank': '16'}
{'URL': 'https://arstechnica.com/tech-policy/2023/01/supreme-court-allows-reddit-mods-to-anonymously-defend-section-230/', 'title': 'Supreme Court allows Reddit mods to anonymously defend Section 230 (arstechnica.com)', 'rank': '17'}
{'URL': 'https://neurosciencenews.com/insula-empathy-pain-21818/', 'title': 'How do we experience the pain of other people? (neurosciencenews.com)', 'rank': '18'}
{'URL': 'https://lwn.net/SubscriberLink/920158/313ec4305df220bb/', 'title': 'Nolibc: A minimal C-library replacement shipped with the kernel (lwn.net)', 'rank': '19'}
{'URL': 'https://www.economist.com/1843/2017/05/04/the-body-in-the-buddha', 'title': 'The Body in the Buddha (2017) (economist.com)', 'rank': '20'}
{'URL': 'https://simonwillison.net/2023/Jan/13/semantic-search-answers/', 'title': 'How to implement Q&A against your docs with GPT3 embeddings and Datasette (simonwillison.net)', 'rank': '21'}
{'URL': 'https://destevez.net/2023/01/decoding-lunar-flashlight/', 'title': 'Decoding Lunar Flashlight (destevez.net)', 'rank': '22'}
{'URL': 'https://www.hampsteadheath.net/about', 'title': 'Hampstead Heath (hampsteadheath.net)', 'rank': '23'}
{'URL': 'https://www.otherlife.co/francisbacon/', 'title': 'The violent focus of Francis Bacon (otherlife.co)', 'rank': '24'}
{'URL': 'https://arstechnica.com/gaming/2019/10/explaining-how-fighting-games-use-delay-based-and-rollback-netcode/', 'title': 'How fighting games use delay-based and rollback netcode (2019) (arstechnica.com)', 'rank': '25'}
{'URL': 'https://essays.georgestrakhov.com/ai-is-not-a-horse/', 'title': 'AI Is Not a Horse (georgestrakhov.com)', 'rank': '26'}
{'URL': 'https://lawliberty.org/features/the-mystery-of-richard-posner/', 'title': 'The Mystery of Richard Posner (lawliberty.org)', 'rank': '27'}
{'URL': 'https://rodneybrooks.com/predictions-scorecard-2023-january-01/', 'title': 'Rodney Brooks Predictions Scorecard (rodneybrooks.com)', 'rank': '28'}
{'URL': 'https://www.notamonadtutorial.com/how-to-transform-code-into-arithmetic-circuits/', 'title': 'How to transform code into arithmetic circuits (notamonadtutorial.com)', 'rank': '29'}
{'URL': 'https://github.com/jhhoward/WolfensteinCGA', 'title': 'Wolfenstein 3D with a CGA Renderer (github.com/jhhoward)', 'rank': '30'}
-
They manage to make Wolfenstein 3D run on a 1979 processor
https://github.com/jhhoward/WolfensteinCGA is the GitHub repo for the project; there are demo images, install instructions and whatnot.
- Wolfenstein 3D with a CGA Renderer
- Wolfenstein 3D with a CGA renderer
What are some alternatives?
scrapy-splash - Scrapy+Splash for JavaScript integration
wolf3d - The original open source release of Wolfenstein 3D
scrapy-cloudflare-middleware - A Scrapy middleware to bypass the CloudFlare's anti-bot protection
Flatcar - Flatcar project repository for issue tracking, project documentation, etc.
Scrapy - Scrapy, a fast high-level web crawling & scraping framework for Python.
scrapy-rotating-proxies - use multiple proxies with Scrapy
scrapy-fake-useragent - Random User-Agent middleware based on fake-useragent
ArchiveBox - 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
aiopath - 📁 Asynchronous pathlib for Python
scrapy-inline-requests - A decorator to write coroutine-like spider callbacks.
yt-videos-list - Create and **automatically** update a list of all videos on a YouTube channel (in txt/csv/md form) via YouTube bot with end-to-end web scraping - no API tokens required. Multi-threaded support for YouTube videos list updates.
open-gov-crawlers - Parse government documents into well formed JSON