spider
spider is an OD crawler that crawls through open directories and indexes the URLs (by pyDiablo)
-
ODmovieindexer
Extract and index information about movies found in open directories posted on r/opendirectories.
-
odcrawler-scanner
A reddit bot that scans ODs over at /r/OpenDirectories and submits the results to the ODCrawler discovery server
-
DiskCache
Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.
If any of you are willing to help, I've just uploaded the code to GitHub. I've added as many comments as I can to help you understand the code.
For my indexer (https://github.com/LaundroMat/ODmovieindexer) I tried crawling by myself too, but I gave up because there were too many special cases to take into account. I used the text files generated by ODScanner as a basis for the URLs to index.
I also wrote a NodeJS wrapper for ODD (https://github.com/Chaphasilor/open-directory-downloader) so that I could easily use ODD in my other projects; you might want to do the same with Python. That way, everyone who knows Python could make use of ODD's edge-case handling and stability!
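A minimal Python wrapper along those lines could simply shell out to the ODD binary with `subprocess`. This is only a sketch under assumptions: the binary name `OpenDirectoryDownloader` and the `--url` flag are taken from ODD's README as I remember it, so verify them against the installed version before relying on this.

```python
import shutil
import subprocess


def build_odd_command(url, binary="OpenDirectoryDownloader", extra_args=None):
    """Build the argument list for invoking ODD on a single URL.

    NOTE: the binary name and the --url flag are assumptions based on
    ODD's README; check them against your installed ODD version.
    """
    cmd = [binary, "--url", url]
    if extra_args:
        cmd.extend(extra_args)
    return cmd


def scan_directory(url, binary="OpenDirectoryDownloader", timeout=600):
    """Run ODD against one open directory and return its exit code.

    Raises FileNotFoundError if the ODD executable is not on PATH.
    """
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    result = subprocess.run(
        build_odd_command(url, binary),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode
```

From there you could parse ODD's output files to feed the indexer, the same way the NodeJS wrapper does.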
That way you can also evolve your application to become async. Since you're using requests rather than aiohttp, may I suggest using gevent with a pool of parallel requests (not too many, ~10)? You can look at this file as an example.
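A sketch of that gevent approach might look like this. The URL list and fetch function are placeholder examples, and the pool size of 10 follows the suggestion above; monkey-patching must happen before any sockets are created so that requests' blocking I/O becomes cooperative.

```python
from gevent import monkey

monkey.patch_all()  # patch blocking stdlib I/O before importing requests

from gevent.pool import Pool
import requests


def crawl(urls, fetch, pool_size=10):
    """Fetch all URLs with at most pool_size concurrent requests."""
    pool = Pool(pool_size)
    # imap_unordered yields results as they complete, not in input order
    return list(pool.imap_unordered(fetch, urls))


def fetch_status(url):
    """Example fetch function: return (url, HTTP status, or None on error)."""
    try:
        return url, requests.get(url, timeout=10).status_code
    except requests.RequestException:
        return url, None
```

Keeping the fetch function pluggable also makes the crawl loop easy to test without touching the network.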
See: https://github.com/KoalaBear84/OpenDirectoryDownloader/tree/master/OpenDirectoryDownloader.Tests/Samples
Do you know this project, which covers most of your needs? http://www.grantjenks.com/docs/diskcache/