NoLiMa
chat
| | NoLiMa | chat |
|---|---|---|
| Mentions | 7 | 9 |
| Stars | 198 | 0 |
| Growth | 3.5% | - |
| Activity | 6.2 | 7.5 |
| Last commit | 11 months ago | about 21 hours ago |
| Language | Python | JavaScript |
| License | GNU General Public License v3.0 or later | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
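As a rough illustration of the two derived metrics, here is a minimal Python sketch. The function names are hypothetical, and the linear mapping from activity score to "top X%" is an assumption extrapolated from the single 9.0 example above; the exact formulas used by the site are not stated here.

```python
from typing import Optional


def month_over_month_growth(stars_last_month: int, stars_now: int) -> Optional[float]:
    """Return star growth as a percentage of last month's count.

    Returns None when there is no positive baseline, which is one way to
    arrive at the "-" shown for a project with 0 stars.
    """
    if stars_last_month <= 0:
        return None
    return (stars_now - stars_last_month) / stars_last_month * 100.0


def top_share_from_activity(activity: float) -> float:
    """Map a 0-10 activity score to the 'top X%' share it would imply.

    ASSUMPTION: the scale is linear. The text only states that 9.0
    corresponds to the top 10%.
    """
    if not 0.0 <= activity <= 10.0:
        raise ValueError("activity is expected to be between 0 and 10")
    return (10.0 - activity) * 10.0


if __name__ == "__main__":
    # Made-up star counts, for illustration only.
    print(month_over_month_growth(200, 207))
    print(top_share_from_activity(9.0))
```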
NoLiMa
- Agentic Pelican on a Bicycle
Thanks for the answer, sir. OK, yes. That makes a lot more sense. I am context greedy ever since I read that Adobe research paper that I shared with you months ago. [0]
The whole "context engineering" seems like a thing, though I dislike throwing around the word "engineer" all willy-nilly like that. :)
In any case, thanks for the response. I just wanted to make sure I was not missing something.
[0] https://github.com/adobe-research/NoLiMa
- Claude Sonnet 4 now supports 1M tokens of context
- NoLiMa: Long-Context Evaluation Beyond Literal Matching
- GPT-4.1 in the API – OpenAI
Updated results from the authors: https://github.com/adobe-research/NoLiMa
It's the best known performer on this benchmark, but still falls off at even relatively modest context lengths. (Cutting edge reasoning models like Gemini 2.5 Pro haven't been evaluated due to their cost and might be better.)
- Strong evidence suggesting Quasar Alpha is OpenAI's new model
I only ran the benchmark on Quasar Alpha*; the rest of the scores come from the original paper [0] which was published before 3.7 was available. This is a pretty expensive benchmark to run if you're paying for API usage - I'd actually originally set out to run it on Llama 4 but abandoned that after estimating the cost.
* - I also reproduced the Llama 3.1 8B result to check my setup.
[0] - https://arxiv.org/abs/2502.05167 / https://github.com/adobe-research/NoLiMa
- Gemini 2.5 Pro vs. Claude 3.7 Sonnet: Coding Comparison
They are testing for a very straightforward needle retrieval, as LLMs traditionally were terrible for this in longer contexts.
There are some more advanced tests where it's far less impressive. Just a couple of days ago Adobe released one such test: https://github.com/adobe-research/NoLiMa
chat
- ClickHouse acquires LibreChat: The open-source Agentic Data Stack
Full Disclosure. I am the author of https://github.com/gitsense/chat
> The idea behind the Agentic Data Stack is a higher-level integration to provide a composable software stack for agentic analytics that users can set up quickly, with room for customization.
I agree with this. For those who have been programming with LLMs, the difference between something working and not working can be a simple "sentence" conveying the required context. I strongly believe data enrichment will be one of the main ways we can make agents more effective and efficient. Data enrichment is the foundation for my personal assistant feature https://github.com/gitsense/chat/blob/main/packages/chat/wid...
Basically, instead of having agents blindly grep for things, you would provide them with analyzers that they can use to search with. By making it dead simple for domain experts to extract 'business logic' from their codebase, we can solve a lot of problems much more efficiently. Since data is the key, I can see why ClickHouse would make this move, since they probably want to become the storage for all business logic.
Note: I will be dropping a massive update to how my tool generates and analyzes metadata this week, so don't read too much into the demo or if you decide to play with it. I haven't really been promoting it because the flow hasn't been right, but it should be this week.
- Cerebras Systems Raises $1.1B Series G at $8.1B Valuation
- Context is the bottleneck for coding agents now
> A human can effectively discard or disregard prior information as the narrow window of focus moves to a new task, LLMs seem incredibly bad at this.
This is how I designed my LLM chat app (https://github.com/gitsense/chat). I think agents have their place, but I really think if you want to solve complex problems without needlessly burning tokens, you will need a human in the loop to curate the context. I will get to it, but I believe in the same way that we developed different flows for working with Git, we will have different 'Chat Flows' for working with LLMs.
I have an interactive demo at https://chat.gitsense.com which shows how you can narrow the focus of the context for the LLM. Click "Start GitSense Chat Demos" then "Context Engineering & Management" to go through the 30 second demo.
- Deploying DeepSeek on 96 H100 GPUs
I don't think you need to be big data to benefit.
A major issue we have right now is, we want the coding process to be more "Agentic", but we don't have an easy way for LLMs to determine what to pull into context to solve a problem. This is a problem that I am working on with my personal AI search assistant, which I talk about below:
https://github.com/gitsense/chat/blob/main/packages/chat/wid...
Analyzers are the "Brains" for my search, but generating the analysis is both tedious and can be costly. I'm working on the tedious part and with batch processing, you can probably process thousands of files for under 5 dollars with Gemini 2.5 Flash.
With batch processing and the ability to continuously analyze tens of thousands of files, I can see companies wanting to do this to make "Agentic" coding smarter, which should help with GPU utilization and drive down the cost of software development.
- Show HN: Vectorless RAG
- The Leverage Paradox in AI
> Do people really try to one-shot their AI tasks?
Yes. I almost always end with "Do not generate any code unless it can help in our discussions as this is the design stage". I would say 95% of my code for https://github.com/gitsense/chat in the last 6 months was AI generated, and I would say 80% of that was one-shot.
It is important to note that I can easily get into the 30+ messages of back and forth before any code is generated. For complex tasks, I will literally spend an hour or two (that can span days) chatting and thinking about a problem with the LLM and I do expect the LLM to one shot them.
- Why LLMs Can't Build Software
> I wish people could be a bit more open about what they build.
I would say for the last 6 months, 95% of the code for my chat app (https://github.com/gitsense/chat) was AI generated (98% human architected). I believe what I created in the last 6 months is far from trivial. One of the features that AI helped a lot with, was the AI Search Assistant feature. You can learn more about it here https://github.com/gitsense/chat/blob/main/packages/chat/wid...
As a debugging partner, LLMs are invaluable. I could easily load all the backend search code into context and have it trace a query and create a context bundle with just the affected files. Once I had that, I would use my tool to filter the context to just those files and then chat with the LLM to figure out what went wrong or why the search was slow.
I very much agree with the author of the blog post about why LLMs can't really build software. AI is an industry game changer as it can truly 3x to 4x senior developers in my opinion. I should also note that I spend about $2 a day on LLMs and I probably have to read 200+ LLM generated messages a day and reply back in great detail about 5 times a day (think of an email instead of a chat message).
Note: The demo that I have in the README hasn't been set up, as I am still in the process of finalizing things for release, but the NPM install instructions should work.
- Claude Sonnet 4 now supports 1M tokens of context
> But they have to get better at understanding the repo by asking the right questions.
How I am tackling this problem is by making it dead simple for users to create analyzers that are designed to enrich text data. You can read more about how it would be used in a search at https://github.com/gitsense/chat/blob/main/packages/chat/wid...
The basic idea is, users would construct analyzers with the help of LLMs to extract the proper metadata that can be semantically searched. So when the user does an AI Assisted search with my tool, I would load all the analyzers into the system prompt and the LLM can determine which analyzers can be used to answer the question.
A very simplistic analyzer would be to make it easy to identify backend and frontend code so you can just use the command `!ask find all frontend files` and the LLM will construct a deterministic search that knows to match for frontend files.
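To make the analyzer idea above concrete, here is a minimal Python sketch of the general pattern: analyzers attach metadata to files, their descriptions are listed in a system prompt, and the final search is a deterministic filter over that metadata. This is not GitSense's actual implementation. Every name here (`Analyzer`, `tag_layer`, `build_system_prompt`, `enrich`, `search`) is hypothetical, and the extension-based rule is a simple stand-in for the LLM-generated analysis the comment describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Analyzer:
    """Hypothetical analyzer: a name, a description, and a tagging function."""
    name: str
    description: str
    tag: Callable[[str, str], Dict[str, str]]  # (path, text) -> metadata


def tag_layer(path: str, text: str) -> Dict[str, str]:
    """Stand-in rule: classify a file as frontend or backend by extension."""
    frontend_exts = (".jsx", ".tsx", ".css", ".html", ".vue")
    return {"layer": "frontend" if path.endswith(frontend_exts) else "backend"}


ANALYZERS = [
    Analyzer("layer", "Labels each file as 'frontend' or 'backend'.", tag_layer),
]


def build_system_prompt(analyzers: List[Analyzer]) -> str:
    """List the available analyzers so a model could pick which ones apply."""
    lines = ["You can search using these analyzers:"]
    lines += [f"- {a.name}: {a.description}" for a in analyzers]
    return "\n".join(lines)


def enrich(files: Dict[str, str], analyzers: List[Analyzer]) -> Dict[str, Dict[str, str]]:
    """Run every analyzer over every file and collect the metadata."""
    index: Dict[str, Dict[str, str]] = {}
    for path, text in files.items():
        meta: Dict[str, str] = {}
        for analyzer in analyzers:
            meta.update(analyzer.tag(path, text))
        index[path] = meta
    return index


def search(index: Dict[str, Dict[str, str]], **criteria: str) -> List[str]:
    """Deterministic search: return paths whose metadata matches all criteria."""
    return sorted(
        path
        for path, meta in index.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    )


if __name__ == "__main__":
    files = {"web/App.tsx": "", "api/server.py": "", "web/style.css": ""}
    index = enrich(files, ANALYZERS)
    print(build_system_prompt(ANALYZERS))
    print(search(index, layer="frontend"))
```

In this sketch, a request like "find all frontend files" would be translated by the model into the call `search(index, layer="frontend")`; the model only chooses the criteria, while the matching itself stays deterministic.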
What are some alternatives?
kcores-llm-arena - LLM Arena by KCORES team
SillyTavern - LLM Frontend for Power Users.
Elemental - Distributed-memory, arbitrary-precision, dense and sparse-direct linear algebra, conic optimization, and lattice reduction
ccprompts - practical claude code commands and subagents
mcp-gemini-tutorial - Building MCP Servers with Google Gemini
open-webui - User-friendly AI Interface (Supports Ollama, OpenAI API, ...)