-
spider
spider is an OD crawler that crawls through open directories and indexes the URLs (by pyDiablo)
If any of you are willing to help, I've just uploaded the code to GitHub. I've added as many comments as I can to help you understand the code.
-
ODmovieindexer
Extract and index movie information of movies found in open directories posted on r/opendirectories.
For my indexer (https://github.com/LaundroMat/ODmovieindexer) I tried crawling by myself too, but I gave up because there were too many special cases to take into account. I used the text files generated by ODScanner as the source of URLs to index.
-
I also wrote a NodeJS wrapper for ODD (https://github.com/Chaphasilor/open-directory-downloader) so that I could easily use ODD in my other projects. You might want to do the same with Python? That way everyone who knows Python could make use of ODD's edge-case handling and stability!
-
This way you can also evolve your application to become async. As you're using requests rather than aiohttp, may I suggest using gevent with a pool of parallel requests (not too many, around 10). You can look at this file as an example.
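To make the "pool of around 10 parallel requests" idea concrete, here is a minimal sketch. Note that it swaps in the standard library's `ThreadPoolExecutor` instead of gevent, so that it runs without extra dependencies; the bounded-concurrency pattern is the same. The names `fetch_status` and `crawl` are hypothetical and are not taken from spider or any other project mentioned here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch_status(url, timeout=10):
    """Fetch one URL and return its HTTP status code (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


def crawl(urls, fetch=fetch_status, pool_size=10):
    """Call fetch(url) for every URL with at most pool_size requests in flight.

    Returns a dict mapping each URL to either the fetch result or the
    exception it raised, so one broken directory does not stop the crawl.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Submit everything up front; the executor caps the concurrency.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record per-URL failures and keep going
                results[url] = exc
    return results
```

Calling `crawl(list_of_urls)` would fetch each URL with the default helper; passing a different `fetch` callable (for example one built on requests) should work the same way, assuming it takes a single URL argument.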
-
See: https://github.com/KoalaBear84/OpenDirectoryDownloader/tree/master/OpenDirectoryDownloader.Tests/Samples
-
odcrawler-scanner
A reddit bot that scans /r/OpenDirectories for new submissions and submits them to the ODCrawler discovery server
-
DiskCache
Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.
Do you know this project, which covers most of your needs? http://www.grantjenks.com/docs/diskcache/