Need help with an OD indexer that I am writing in Python

This page summarizes the projects mentioned and recommended in the original post on /r/opendirectories

  • spider

    spider is an OD crawler that crawls through open directories and indexes the URLs (by pyDiablo)

  • If any of you are willing to help, I've just uploaded the code to GitHub. I've added as many comments as I can to help you understand the code.

  • ODmovieindexer

    Extracts and indexes information about movies found in open directories posted on r/opendirectories.

  • For my indexer (https://github.com/LaundroMat/ODmovieindexer) I tried crawling myself too, but I gave up because there were too many special cases to take into account. I used the text files generated by ODScanner as the basis for the URLs to index.

  • open-directory-downloader

    A NodeJS wrapper around KoalaBear84/OpenDirectoryDownloader

  • I also wrote a NodeJS wrapper for ODD (https://github.com/Chaphasilor/open-directory-downloader) so that I could easily use ODD in my other projects. You might want to do the same with Python? That way, everyone who knows Python could make use of ODD's edge-case handling and stability!

  • calishot

  • This way you can also evolve your application to become async. As you're using requests rather than aiohttp, may I suggest using gevent with a pool of parallel requests (not too many, around 10)? You can look at this file as an example.

  • OpenDirectoryDownloader

    Indexes open directories

  • See: https://github.com/KoalaBear84/OpenDirectoryDownloader/tree/master/OpenDirectoryDownloader.Tests/Samples

  • odcrawler-scanner

    A reddit bot that scans ODs over at /r/OpenDirectories and submits the results to the ODCrawler discovery server

  • DiskCache

    Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.

  • Do you know this project, which covers most of your needs? http://www.grantjenks.com/docs/diskcache/
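
The suggestion above to keep roughly 10 requests in flight at once can be sketched as follows. This is only an illustration of the bounded-pool idea, not the code from the thread: it swaps in the standard library's `ThreadPoolExecutor` for gevent so that it runs without extra packages, and the names `fetch_status` and `crawl_in_parallel` are hypothetical. With gevent installed, a `gevent.pool.Pool(10)` could presumably play the same role.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_status(url, timeout=10):
    """Return (url, HTTP status), or (url, None) if the request fails.

    Hypothetical helper: a real indexer would parse the directory listing
    instead of only reading the status code.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return url, response.status
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; ValueError covers
        # malformed URLs.
        return url, None


def crawl_in_parallel(urls, fetch=fetch_status, pool_size=10):
    """Apply `fetch` to every URL with at most `pool_size` calls in flight."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Executor.map yields results in the same order as the input URLs.
        return list(pool.map(fetch, urls))
```

Calling `crawl_in_parallel(list_of_urls)` returns one `(url, status)` pair per input URL. Keeping `pool_size` small, as the comment recommends, limits the load placed on any single open directory.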

NOTE: The number of mentions on this list counts mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
