llm-attacks

Universal and Transferable Attacks on Aligned Language Models (by llm-attacks)

Llm-attacks Alternatives

Similar projects and alternatives to llm-attacks

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

NOTE: The number of mentions on this list indicates mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better llm-attacks alternative or higher similarity.

llm-attacks reviews and mentions

Posts with mentions or reviews of llm-attacks. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-08-02.
  • Hacking Google Bard – From Prompt Injection to Data Exfiltration
    1 project | news.ycombinator.com | 13 Nov 2023
  • Universal and Transferable Adversarial Attacks on Aligned Language Models
    1 project | news.ycombinator.com | 4 Oct 2023
    1 project | news.ycombinator.com | 29 Jul 2023
  • Bing ChatGPT Image Jailbreak
    1 project | news.ycombinator.com | 1 Oct 2023
    Again, kind of? I do see your point.

    But in practice it's not really the same thing as cycling through call center employees until you find one that's more gullible; the point is that you're navigating a probability space within a single agent more than trying to convince the AI of anything, and getting into a discussion with the AI is more likely to move you out of that probability space. It's not "try something, fail, try again" -- the reason you dump the conversation is that any conversation that contains a refusal is (in my anecdotal experience at least) statistically more likely to contain other refusals (see the sketch after this comment).

    Which, you could argue that's not different from what's happening with social engineering; priming someone to be agreeable is part of social engineering. But it feels a little reductive to me. If social engineering is looking at a system/agent that is prone to react in a certain way when in a certain state and then creating that state -- then a lot of stuff is social engineering that we don't generally think of as being in that category?

    The big thing to me is that social engineering skills and instincts around humans are not always applicable to LLM jailbreaking. People tend to overestimate strategies like being polite or providing a justification for what's being asked. Even this Bing example is kind of eliciting an emotional reaction, and I don't think the emotional reaction is why it works; I think it works because it's nested tasks, and I suspect it would work with a lot of other nested tasks as well. I suspect the emotional "my grandma died" part adds very little to this attack.

    So I'm not sure I'd say you're wrong if you argue that's a form of social engineering, just that it feels like at this point we're defining social engineering very broadly, and I don't know that most people using the term use it that broadly. I think they attach a kind of human reasoning to it that's not always applicable to LLM attacks. I can think of justifications for even including stuff like https://llm-attacks.org/ in the category of social engineering, but it's just not the same type of attack that I suspect most people are thinking of when they talk about social engineering. I think leaning too hard on personification sometimes makes jailbreaking slightly harder.
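
    To make the "dump the conversation" point concrete, here is a minimal sketch of that reset-on-refusal loop. The send_chat callable is a hypothetical stand-in for whatever chat API is in use (it takes a message list and returns the reply text), and the refusal markers are illustrative, not exhaustive:

    REFUSAL_MARKERS = ("i'm sorry", "i can't", "i cannot", "as an ai")

    def attempt_with_fresh_context(send_chat, prompt, max_attempts=5):
        """Retry the prompt in a brand-new conversation each time, rather than
        arguing inside a conversation that already contains a refusal."""
        for _ in range(max_attempts):
            reply = send_chat([{"role": "user", "content": prompt}])
            if not reply.lower().lstrip().startswith(REFUSAL_MARKERS):
                return reply  # escaped the refusal-heavy region of the space
            # Do NOT append the refusal and keep arguing: a refusal already
            # in context makes further refusals statistically more likely.
            # Start over with a fresh conversation instead.
        return None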

  • Run Llama 2 Uncensored Locally
    3 projects | news.ycombinator.com | 2 Aug 2023
    I think Facebook did a very good job with Llama 2; I was skeptical at first with all that talk about 'safe AI'. Their Llama-2 base model is not censored in any way, nor is it fine-tuned. It's the pure raw base model; I ran some tests as soon as it was released and was surprised by how far I could go (I actually didn't get any warning whatsoever with any of my prompts). The Llama-2-chat model is fine-tuned for chat and censored.

    The fact that they provided us the raw model, so we can fine-tune it ourselves without the hassle of trying to 'uncensor' a botched model, is a really great example of how it should be done: give the user the choice! You just fine-tune it for chat or whatever other purpose you need.

    The Llama-2-chat fine-tune is very censored; none of my jailbreaks worked except for this one[1]. Still, it is a great option for production. The overall quality of the model (I tested the 7B version) has improved a lot, and for those interested, it can role-play better than any model I have seen out there with no fine-tune.

    1: https://github.com/llm-attacks/llm-attacks/
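
    As a concrete illustration of the base-vs-chat distinction described above, here is a minimal sketch of loading each variant with the Hugging Face transformers library (assuming you have been granted access to Meta's gated Llama-2 weights; the model IDs are the standard Hub names):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    BASE_ID = "meta-llama/Llama-2-7b-hf"       # raw base model, no chat tuning
    CHAT_ID = "meta-llama/Llama-2-7b-chat-hf"  # swap in to compare the censored chat variant

    tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
    model = AutoModelForCausalLM.from_pretrained(BASE_ID)

    # The base model is a pure text completer: it continues whatever it is
    # given, with no chat template and no refusal training.
    inputs = tokenizer("The quick brown fox", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))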

  • Researchers uncover "universal" jailbreak that can attack all LLMs in an automated fashion
    1 project | /r/ArtificialInteligence | 31 Jul 2023
    Their paper and code are available here. Note that the attack string they provide has already been patched out by most providers (ChatGPT, Bard, etc.), as the researchers disclosed their findings to LLM providers in advance of publication. But the paper claims that unlimited new attack strings can be generated via this method.
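
    For anyone curious what "this method" looks like, here is a heavily simplified sketch of the paper's greedy coordinate gradient (GCG) idea: the attack suffix is improved one token at a time, with the gradient over the suffix's one-hot token encodings suggesting which single-token swaps are most likely to lower the loss on an affirmative target completion (e.g. "Sure, here is..."). The loss_fn and grad_fn callables here are hypothetical stand-ins; the real implementation is in the linked repo.

    import random

    def top_k_tokens(grad_row, k):
        # Tokens whose gradient entry is most negative, i.e. the swaps the
        # gradient predicts will decrease the target loss the most.
        return sorted(range(len(grad_row)), key=lambda t: grad_row[t])[:k]

    def gcg_step(suffix_ids, loss_fn, grad_fn, top_k=256, n_candidates=64):
        """One greedy-coordinate-gradient step over the adversarial suffix."""
        grads = grad_fn(suffix_ids)  # one gradient row (over the vocab) per position
        best, best_loss = list(suffix_ids), loss_fn(suffix_ids)
        for _ in range(n_candidates):
            cand = list(suffix_ids)
            pos = random.randrange(len(cand))
            cand[pos] = random.choice(top_k_tokens(grads[pos], top_k))
            loss = loss_fn(cand)
            if loss < best_loss:
                best, best_loss = cand, loss
        return best  # iterate for a few hundred steps to optimize the suffix
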
  • Universal and Transferable Attacks on Aligned Language Models
    1 project | /r/blueteamsec | 30 Jul 2023
  • Researchers Discover New Vulnerability in Large Language Models
    2 projects | news.ycombinator.com | 29 Jul 2023
    A lot of people here are misreading what this research actually says. If you find the PDF confusing, the base website (https://llm-attacks.org/) lays out the attack in more straightforward terms.

    > We demonstrate that it is in fact possible to automatically construct adversarial attacks on LLMs [...] Unlike traditional jailbreaks, these are built in an entirely automated fashion, allowing one to create a virtually unlimited number of such attacks. Although they are built to target LLMs [...], we find that the strings transfer to many closed-source, publicly-available chatbots like ChatGPT, Bard, and Claude.
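
    As a trivial sketch of what "the strings transfer" means in practice: the same optimized suffix is appended to a request and tried against several models. Here ask is a hypothetical (model, prompt) -> reply function, and the suffix is a placeholder, not a working attack string:

    SUFFIX = "<optimized adversarial suffix>"  # placeholder only

    def check_transfer(ask, request, models):
        results = {}
        for name in models:
            reply = ask(name, f"{request} {SUFFIX}")
            # Crude heuristic: a stock refusal means the suffix did not transfer.
            refused = reply.lower().lstrip().startswith(("i'm sorry", "i cannot"))
            results[name] = not refused
        return results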


Stats

Basic llm-attacks repo stats

  • Mentions: 9
  • Stars: 2,918
  • Activity: 6.2
  • Last commit: 2 months ago

llm-attacks/llm-attacks is an open source project licensed under the MIT License, an OSI-approved license.

The primary programming language of llm-attacks is Python.
