tldw VS wdoc

Compare tldw and wdoc and see how they differ.

tldw

tl/dw (Too Long, Didn't Watch): Your Personal Research Multi-Tool - a naive attempt at 'A Young Lady's Illustrated Primer' (Open Source NotebookLM) (by rmusser01)

wdoc

Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, scalable (?), WIP (by thiswillbeyourgithub)
              tldw                  wdoc
Mentions      5                     7
Stars         736                   448
Growth        17.7%                 15.8%
Activity      9.9                   9.9
Last commit   1 day ago             8 days ago
Language      Python                Python
License       Apache License 2.0    GNU General Public License v3.0 only
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

tldw

Posts with mentions or reviews of tldw. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2025-04-22.

wdoc

Posts with mentions or reviews of wdoc. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2025-02-10.
  • Show HN: Sort lines semantically using LLM-sort
    4 projects | news.ycombinator.com | 10 Feb 2025
    I recently built a semantic batching function for my RAG system [wdoc](https://github.com/thiswillbeyourgithub/wdoc/) that might be interesting to others. The system splits a corpus into chunks, finds relevant ones via embeddings, and answers questions for each chunk in parallel before aggregating the answers.

    To optimize performance and reduce LLM distraction, instead of aggregating answers two by two, it does batched aggregation. The key innovation is in the batching order - I implemented a [semantic_batching function](https://github.com/thiswillbeyourgithub/wdoc/blob/18bc52128f...) that uses hierarchical clustering on the embeddings and orders texts by leaf order.

    The implementation was straightforward, runs very fast and produces great results. The function is designed to be usable as a standalone tool for others to experiment with.

  • WDoc – Summarise and query documents. Any LLM provider, any filetype, scalable
    1 project | news.ycombinator.com | 31 Oct 2024
  • Ask HN: Local RAG with private knowledge base
    11 projects | news.ycombinator.com | 29 Oct 2024
    I've made wdoc just for that: https://github.com/thiswillbeyourgithub/WDoc

    I am a medical student with thousands of pdfs, various anki databases, video conferences, audio recordings, markdown notes etc. It can query into all of them and return extremely high quality output with sources to each original document.

    It's still in alpha, though, and there's only 0.5 user besides me that I know of, so there are bugs that have yet to be found!

  • Ask HN: What have you built with LLMs?
    6 projects | news.ycombinator.com | 10 Sep 2024
    Here's a highlight (edit: more like an ego dump)

    I couldn't keep up with my news, so I made the perfect summarizer that goes through the thought process of the author: https://github.com/thiswillbeyourgithub/WDoc

    I needed an AI-based system that goes through my Anki cards, but I might as well make it able to read dozens of file formats. Now I can put in entire medical YouTube playlists, conferences, Anki databases, and hundreds of PDFs, and ask a single question across all of them at once.

    Both are the same project.

  • Ask HN: What are you using to parse PDFs for RAG?
    16 projects | news.ycombinator.com | 30 Jul 2024
    For my RAG project [WDoc](https://github.com/thiswillbeyourgithub/WDoc/tree/dev) I use multiple PDF parsers, then use heuristics to keep the best one. The code is at https://github.com/thiswillbeyourgithub/WDoc/blob/654c05c5b2...

    And the heuristics are partly based on using fasttext to detect languages: https://github.com/thiswillbeyourgithub/WDoc/blob/654c05c5b2...

    It's probably crap for tables but I don't want to rely on external parsers.

  • Ask HN: Is there any software you only made for your own use but nobody else?
    65 projects | news.ycombinator.com | 4 Jul 2024
  • Ask HN: I have many PDFs – what is the best local way to leverage AI for search?
    10 projects | news.ycombinator.com | 30 May 2024
    Don't hesitate to ask for features!

    Here's the link: https://github.com/thiswillbeyourgithub/DocToolsLLM/
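
The semantic batching idea described in the 10 Feb 2025 post above (hierarchical clustering on the embeddings, then ordering texts by leaf order before aggregating them in batches) can be sketched roughly as follows. This is a minimal illustration and not wdoc's actual `semantic_batching` code: the function names `leaf_order` and `semantic_batches`, the choice of average linkage with cosine distance, and the `batch_size` parameter are all assumptions made for the example. Embeddings are assumed to be non-zero vectors.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def leaf_order(embeddings):
    """Average-linkage agglomerative clustering (hypothetical helper).

    Each cluster is kept as a list of leaf indices. Merging two clusters
    concatenates their lists, so the last remaining list is the order in
    which the leaves would appear along the bottom of the dendrogram.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def average_distance(c1, c2):
        total = sum(
            cosine_distance(embeddings[i], embeddings[j]) for i in c1 for j in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the two closest clusters and merge them.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)

    return clusters[0] if clusters else []


def semantic_batches(texts, embeddings, batch_size):
    """Reorder texts by leaf order, then cut the sequence into batches."""
    ordered = [texts[i] for i in leaf_order(embeddings)]
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```

Because neighbouring leaves in the dendrogram are semantically close, consecutive slices of the reordered list tend to hold related texts, which is the property the post relies on when aggregating intermediate answers in batches. For larger inputs, SciPy's `scipy.cluster.hierarchy.linkage` and `leaves_list` would be a faster way to obtain the same kind of ordering than this quadratic pure-Python loop.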

What are some alternatives?

When comparing tldw and wdoc you can also consider the following projects:

M.I.L.E.S - M.I.L.E.S, a GPT-4-Turbo voice assistant, self-adapts its prompts and AI model, can play any Spotify song, adjusts system and Spotify volume, performs calculations, browses the web and internet, searches global weather, delivers date and time, autonomously chooses and retains long-term memories. Available for macOS and Windows.

unstract - No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents

augini - augini: AI-Powered Tabular Data Assistant

WAP - wet-ass plants

gptme - Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web, vision.

PaddleOCR - Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
