Need help with an OD indexer that I am writing in Python

This page summarizes the projects mentioned and recommended in the original post on /r/opendirectories

  • spider

    spider is an OD crawler that crawls through opendirectories and indexes the urls (by pyDiablo)

  • If any of you is willing to help, I've just uploaded the code to Github. I've added as many comments as I can to help you understand the code.

  • ODmovieindexer

    Extract and index movie information of movies found in open directories posted on r/opendirectories.

  • For my indexer (https://github.com/LaundroMat/ODmovieindexer) I tried crawling by myself too, but I gave up because there were too many special cases to take into account. I used the text files generated by ODScanner as a basis for the URLs to index.

  • open-directory-downloader

    A NodeJS wrapper around KoalaBear84/OpenDirectoryDownloader

  • I also wrote a NodeJS wrapper for ODD (https://github.com/Chaphasilor/open-directory-downloader) so that I could easily use ODD in my other projects. You might want to do the same with Python. That way, everyone who knows Python could make use of ODD's edge-case handling and stability!

  • calishot

  • This way you can also evolve your application to become async. As you're using requests rather than aiohttp, may I suggest using gevent with a pool of parallel requests (not too many, ~10). You can look at this file as an example.

  • OpenDirectoryDownloader

    Indexes open directories

  • See: https://github.com/KoalaBear84/OpenDirectoryDownloader/tree/master/OpenDirectoryDownloader.Tests/Samples

  • odcrawler-scanner

    A reddit bot that scans ODs over at /r/OpenDirectories and submits the results to the ODCrawler discovery server

  • DiskCache

    Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.

  • Do you know this project, which covers most of your needs? http://www.grantjenks.com/docs/diskcache/
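
One of the comments above suggests capping the crawler at roughly 10 parallel requests. The sketch below illustrates that idea only; it is not code from any of the listed projects. It swaps in the standard library's ThreadPoolExecutor as a stand-in for gevent so that it runs without extra dependencies, and the names `fetch` and `fetch_all` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(url):
    """Hypothetical fetcher: download one URL and return its body as bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, max_workers=10):
    """Fetch every URL with at most `max_workers` requests in flight.

    Returns a dict mapping each URL to its body, or to the exception
    that was raised while fetching it.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front; the pool itself enforces the cap.
        futures = {pool.submit(fetcher, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # One unreachable directory should not stop the whole crawl.
                results[url] = exc
    return results
```

With gevent, as the comment proposes, a `gevent.pool.Pool(10)` combined with `pool.imap_unordered(fetch, urls)` should give a comparable cap, assuming the blocking HTTP library has been made cooperative (for example via `gevent.monkey.patch_all()`).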

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
