Ask HN: Can we solve AI prompt injection attacks with an indented data format?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • braq

    Customizable data format for config files, AI prompts, and more

  • Hi HN ! I'm Alex, a tech enthusiast. I have an idea that I can't test and that concerns an area in which I am not an expert. I am making this post to find out to what extent this idea is relevant to the state of the art.

    From what little I know, raw user inputs are not directly submitted to LLMs. Typically, user input is carefully wrapped in a special format before being sent to the LLM. The format usually has tags, including special tags to tell the AI, for example, which topic is prohibited.

    As with SQL injection, an attacker can craft malicious user input by introducing special tags. Input sanitization can be seen as a solution, but it seems that it isn't enough. Anyway, it doesn't seem very intuitive, I think a document intended to be read by an LLM should also be very human-readable. I also wonder what happens when an attacker uses obscure Unicode characters to forge a string that looks like a special tag.

    Instead of using an XML-like language, my idea is to use a format that seamlessly interweave human-readable structured data with prose within a single document. Also, the format must natively support indentation to remove the need for input sanitization, thereby eliminating an entire class of injection attacks.

    I am the author of Braq, a data format that seems to be a good candidate.

    The idea to better structure a prompt is described in this Markdown section: https://github.com/pyrustic/braq?tab=readme-ov-file#ai-prompts

    And here, ChatML from OpenAI: https://news.ycombinator.com/item?id=34988748

    As mentioned above, I can't test this idea. Therefore, I'm asking to you: Can we solve AI prompt injection attacks with an indented data format ?

  • Puts Debuggerer

    Ruby library for improved puts debugging, automatically displaying bonus useful information such as source line number and source code.

  • https://github.com

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts