braq
ustrid
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
braq
-
Show HN: MikeDoc – Neat docstring format for building API references
As a neat docstring format, MikeDoc can extend existing documentation generators such as Sphinx and can be used with any programming language that supports concepts such as function parameters, return value, and exceptions.
As a standalone tool, MikeDoc would fit well into any Python project whose API reference needs to be embedded in the root directory and browsable offline and online without config changes.
MikeDoc's Python package is available on PyPI and you can already explore its own API reference on GitHub [5].
Let me know what you think of this project.
[1] https://en.wikipedia.org/wiki/Docstring
[2] https://news.ycombinator.com/item?id=21289832 [Four kinds of documentation (divio.com)]
[3] https://github.com/pyrustic/braq
[4] https://news.ycombinator.com/item?id=38684724
[5] https://github.com/pyrustic/mikedoc/?tab=readme-ov-file#demo
-
Ask HN: What is the ideal format to structure AI prompts?
User input does not need to be sanitized if it is programmatically inserted into the document as the value of a key in a regular dict section.
To work, I assume the target model needs to be trained on Braq documents with emphasis on the fact that only the top unnamed section contains root instructions (equivalent to the "system" role in ChatML).
[1] https://news.ycombinator.com/item?id=34988748
[2] https://community.openai.com/t/chatml-documentation-update/5...
[3] https://www.reddit.com/r/LocalLLaMA/comments/17u7k2d/once_an...
[4] https://github.com/pyrustic/braq?tab=readme-ov-file#ai-promp...
-
Ask HN: Can we solve AI prompt injection attacks with an indented data format?
Hi HN ! I'm Alex, a tech enthusiast. I have an idea that I can't test and that concerns an area in which I am not an expert. I am making this post to find out to what extent this idea is relevant to the state of the art.
From what little I know, raw user inputs are not directly submitted to LLMs. Typically, user input is carefully wrapped in a special format before being sent to the LLM. The format usually has tags, including special tags to tell the AI, for example, which topic is prohibited.
As with SQL injection, an attacker can craft malicious user input by introducing special tags. Input sanitization can be seen as a solution, but it seems that it isn't enough. Anyway, it doesn't seem very intuitive, I think a document intended to be read by an LLM should also be very human-readable. I also wonder what happens when an attacker uses obscure Unicode characters to forge a string that looks like a special tag.
Instead of using an XML-like language, my idea is to use a format that seamlessly interweave human-readable structured data with prose within a single document. Also, the format must natively support indentation to remove the need for input sanitization, thereby eliminating an entire class of injection attacks.
I am the author of Braq, a data format that seems to be a good candidate.
The idea to better structure a prompt is described in this Markdown section: https://github.com/pyrustic/braq?tab=readme-ov-file#ai-prompts
And here, ChatML from OpenAI: https://news.ycombinator.com/item?id=34988748
As mentioned above, I can't test this idea. Therefore, I'm asking to you: Can we solve AI prompt injection attacks with an indented data format ?
- Show HN: Braq – Customizable data format for config files, AI prompts, and more
- Show HN: Braq – The most obvious way to section a document
-
Show HN: Paradict – Streamable multi-format serialization with schema
Under the hood, Paradict uses Braq (https://github.com/pyrustic/braq), the most obvious way to section a document (as shown just above), and Ustrid (https://github.com/pyrustic/ustrid), to uniquely generate string identifiers.
Paradict is available on PyPI and you can learn more by reading its README, browsing the source code or playing with its tests.
Let me know what you think about all this !
ustrid
-
Show HN: Paradict – Streamable multi-format serialization with schema
Under the hood, Paradict uses Braq (https://github.com/pyrustic/braq), the most obvious way to section a document (as shown just above), and Ustrid (https://github.com/pyrustic/ustrid), to uniquely generate string identifiers.
Paradict is available on PyPI and you can learn more by reading its README, browsing the source code or playing with its tests.
Let me know what you think about all this !
What are some alternatives?
pyrustic - Collection of lightweight Python projects that share the same policy
paradict - Streamable multi-format serialization with schema