AI Security, Open Source Tools

Introducing Augustus: Open Source LLM Prompt Injection Tool

From LLM Fingerprinting to LLM Prompt Injection

Last month we released Julius, a tool that answers the question: “what LLM service is running on this endpoint?” Julius identifies the infrastructure. But identification is only the first step. The natural follow-up: “now that I know what’s running, how do I test whether it’s secure?” That’s what Augustus does.

LLMs Are Deployed Faster Than They're Tested

OWASP ranked prompt injection as the #1 security risk in LLM applications. Recent research validates the urgency:

FlipAttack achieves 98% bypass rates against GPT-4o by simply reordering characters in prompts
DeepSeek R1 showed a 100% bypass rate against 50 HarmBench jailbreak prompts in testing by Cisco and the University of Pennsylvania
A study of 36 production LLM-integrated applications found **86% were vulnerable** to prompt injection
PoisonedRAG demonstrated that just 5 malicious documents in a corpus of millions can manipulate AI outputs 90% of the time

The tooling gap is stark. Organizations deploy LLMs into production behind API gateways, embed them in customer-facing products, and connect them to internal systems — often with no adversarial testing whatsoever. The models ship with safety training, but safety training is not security testing. These are fundamentally different disciplines.

Introducing Augustus

Augustus is an open-source LLM vulnerability scanner that tests large language models against 210+ adversarial attacks covering prompt injection, jailbreaks, encoding exploits, data extraction, and more. It ships as a single Go binary, connects to 28 LLM providers out of the box, and produces actionable vulnerability reports.

Augustus is a Go-native reimplementation inspired by garak (NVIDIA’s Python-based LLM vulnerability scanner). Key differences:

Performance: Go binary vs Python interpreter — faster execution and lower memory usage
Distribution: Single binary with no runtime dependencies vs Python package with pip install
Concurrency: Go goroutine pools (cross-probe parallelism) vs Python multiprocessing pools (within-probe parallelism)
Probe coverage: Augustus has 210+ probes; garak has 160+ probes with a longer research pedigree and published paper (arXiv:2406.11036)
Provider coverage: Augustus has 28 providers; garak has 35+ generator variants across 22 provider modules

Existing tools like garak (NVIDIA) and promptfoo serve the research and red-teaming community well. But we needed something built for the way our operators work: a fast, portable binary that fits into existing penetration testing workflows without requiring Python environments, npm installs, or runtime dependencies.

What Augustus Tests

Jail Break Attacks

The classic attack surface. DAN (“Do Anything Now”) prompts trick the model into adopting an unrestricted persona. AIM and AntiGPT use similar role-playing techniques. Grandma exploits use emotional manipulation — “my grandmother used to tell me how to…” — to bypass refusal patterns. ArtPrompts reframe harmful requests as creative writing exercises. These are the attacks that make headlines, and every model should be tested against them. Augustus includes DAN variants through version 11.0, plus Goodside-style prompt injection techniques.

Prompt Injection

Where jailbreaks manipulate the model’s persona, prompt injection manipulates the model’s input processing. Augustus tests encoding attacks across Base64, ROT13, Morse code, hex, Braille, Klingon, leet speak, and 12 more encoding schemes — probing whether the model decodes and follows instructions that bypass text-based filters. Tag smuggling embeds instructions inside XML or HTML tags. FlipAttack (16 variants) reverses or reorders characters. Prefix and suffix injection wrap payloads around legitimate prompts.

Adversarial Examples

These are the research-grade attacks. GCG (Greedy Coordinate Gradient) appends optimized adversarial suffixes. AutoDAN automates jailbreak generation. MindMap and DRA (Dynamic Reasoning Attack) exploit the model’s own reasoning capabilities against it. TreeSearch systematically explores the space of adversarial prompts.

PAIR (Prompt Automatic Iterative Refinement) and TAP (Tree of Attack Prompts) are iterative — they refine their approach across multiple rounds using a judge model to score each attempt. Augustus implements both with a multi-stream conversation manager that handles candidate pruning and scoring. These are computationally expensive (many LLM calls per test), but they represent the state of the art in automated red-teaming.

Data Extraction

Can the model be tricked into leaking sensitive information? API key probes test whether the model reveals credentials from its context. Package hallucination probes (covering Python, JavaScript, Ruby, Rust, Dart, Perl, and Raku) check if the model recommends non-existent packages — a supply chain attack vector where adversaries register the hallucinated package names. PII extraction probes test for personal data leakage. LeakReplay tests whether the model regurgitates training data.

Context Manipulation

RAG poisoning probes test whether an attacker can inject malicious content into the retrieval pipeline — both through document content and metadata injection. Context overflow probes test behavior when the input exceeds expected lengths. Continuation and divergence probes exploit how models handle conversational context — getting them to continue generating after they should have stopped, or steering the conversation away from safety guardrails. Multimodal probes target vision-language models with adversarial images.

Format Exploits

Models that generate structured output create unique attack surfaces. Markdown injection probes test whether model output can inject malicious links or images into rendered content. YAML and JSON parsing attacks exploit downstream consumers of model output. ANSI escape probes test terminal injection. Web injection probes test for XSS payloads in model-generated HTML — a real risk when LLM output is rendered in browsers.

Evasion Techniques

ObscurePrompt uses an LLM to rewrite known jailbreaks into harder-to-detect forms. Phrasing probes test whether rephrasing a blocked request makes it succeed. Character substitution probes (BadChars) use homoglyphs, zero-width characters, bidirectional text markers, and invisible Unicode — inputs that look benign to filters but are interpreted differently by the model. Glitch token probes exploit model-specific tokenization anomalies.

Safety Benchmarks

Augustus implements established research benchmarks for systematic evaluation. DoNotAnswer (941 questions across 5 risk areas) tests refusal behavior. RealToxicityPrompts measures toxicity in completions. Snowball tests whether models generate plausible-sounding but factually incorrect outputs. LMRC probes test across multiple harmful content categories.

Agent Attacks

As LLMs gain tool access, the attack surface expands. Multi-agent manipulation probes test whether one agent can influence another’s behavior. Browsing exploit probes target models with web access, testing whether adversarial web content can hijack the model’s actions. Latent injection probes embed instructions in documents the model processes — a realistic attack against RAG-enabled agents.

Security Testing

The catch-all for operational security concerns. Guardrail bypass probes (20 variants) systematically test NVIDIA NeMo Guardrails and similar frameworks. AV/spam scanning probes test whether models generate content that triggers downstream security tools. Exploitation probes test for SQL injection and code execution through model output. Steganography probes embed hidden instructions in images using LSB encoding. Malware generation probes test whether the model produces functional exploit code.

Features

Feature	Description
210+ Vulnerability Probes	47 attack categories: jailbreaks, prompt injection, adversarial examples, data extraction, safety benchmarks, agent attacks, and more
28 LLM Providers	OpenAI, Anthropic, Azure, Bedrock, Vertex AI, Ollama, and 22 more with 43 generator variants
90+ Detectors	Pattern matching, LLM-as-a-judge, HarmJudge (arXiv:2511.15304), Perspective API, unsafe content detection
7 Buff Transformations	Encoding, paraphrase, poetry (5 formats, 3 strategies), low-resource language translation, case transforms
Flexible Output	Table, JSON, JSONL, and HTML report formats
Production Ready	Concurrent scanning, rate limiting, retry logic, timeout handling
Single Binary	Go-based tool compiles to one portable executable
Extensible	Plugin-style registration via Go init() functions

How It Works

Augustus uses a pipeline architecture. You select probes, point them at a model, and the scanner handles the rest:

---
config:
  layout: elk
  theme: dark
  themeVariables:
    primaryColor: '#270A0C'
    primaryTextColor: '#ffffff'
    primaryBorderColor: '#535B61'
    lineColor: '#535B61'
    background: '#0D0D0D'
---
flowchart LR
    A[Probe Selection] --> B[Buff Transform]
    B --> C[Generator / LLM Call]
    C --> D[Detector Analysis]
    D --> E{Vulnerable?}
    E -->|Yes| F[Record Finding]
    E -->|No| G[Record Pass]

    subgraph Scanner
        B
        C
        D
        E
    end

Probes define the adversarial inputs. A DAN probe sends a role-playing prompt designed to bypass safety training. An encoding probe wraps a malicious instruction in Base64. A FlipAttack probe reverses character order to evade input filters.

Buffs are optional transformations applied before sending. You can wrap any probe in poetry (haiku, sonnet, limerick), translate it into a low-resource language, paraphrase it, or encode it — adding evasion layers that test whether guardrails handle transformed inputs.

Generators connect to the target LLM. Augustus supports 28 providers: OpenAI, Anthropic, Azure, Bedrock, Vertex AI, Ollama, Groq, Mistral, NVIDIA NIM, and 19 more. A REST generator lets you test any custom endpoint with configurable request templates and JSONPath response extraction.

Detectors analyze the model’s response. Pattern matching catches known jailbreak indicators. LLM-as-a-judge uses a second model to evaluate whether the response is harmful. HarmJudge (based on [arXiv:2511.15304](https://arxiv.org/abs/2511.15304)) provides semantic harm assessment aligned with the MLCommons AILuminate taxonomy.

The Attack Engine handles iterative probes like PAIR and TAP that require multi-turn conversations, candidate pruning, and judge-based scoring. These aren’t single-shot tests — they’re adaptive attacks that refine their approach across multiple attempts, mimicking how a real attacker would operate.

Buff Transformations

Real adversaries don’t send attacks in plain text. They encode, translate, rephrase, and obfuscate. Augustus’s buff system applies these transformations to any probe, testing whether guardrails hold up against inputs that don’t match expected patterns.

Augustus ships 5 buff categories (7 individual transformations):

Encoding (`encoding.Base64`, `encoding.CharCode`)

Base64 and character code transformations. Wraps adversarial prompts in encoding schemes that many models will decode and follow, even when the decoded content would be blocked in plain text. This tests the gap between input filters (which see encoded text) and the model (which understands the decoded intent).

Paraphrase (`paraphrase.Pegasus`, `paraphrase.Fast`)

Uses a Pegasus model to rephrase prompts while preserving adversarial intent. A DAN prompt that gets blocked might succeed when paraphrased — same meaning, different surface form. Tests whether safety training generalizes beyond memorized patterns.

Poetry (`poetry.Poetry`)

Reformats prompts as haiku, sonnets, limericks, free verse, or rhyming couplets with 3 strategy options (exemplar-based, meta-prompt, and combined). Models that robustly block a direct harmful request sometimes comply when it arrives as verse. This tests the model’s ability to recognize adversarial intent regardless of stylistic presentation.

Low-Resource Language Translation (`lrl.LRL`)

Translates prompts into low-resource languages via DeepL, exploiting the fact that safety training is often concentrated on English. A request blocked in English may succeed in Zulu, Hmong, or Scots Gaelic. Rate-limited to respect translation API quotas.

Case Transforms (`lowercase.Lowercase`)

Lowercases all input text. Simple, but effective — some input filters and keyword blocklists are case-sensitive, and lowercase transformations can bypass exact-match detection.

You can chain multiple transformations — encode, then paraphrase, then translate — creating layered evasion that tests defense-in-depth by applying a buff with `–buff`, or chain multiple with `–buffs-glob`:

28 LLM Providers

Augustus connects to the major providers and the long tail. OpenAI (including o1/o3 reasoning models), Anthropic (Claude 3/3.5/4), Azure OpenAI, AWS Bedrock, Google Vertex AI, Cohere, Replicate, HuggingFace (Inference API, endpoints, pipelines, LLaVA multimodal), Together AI, Anyscale, Groq, Mistral, Fireworks, DeepInfra, NVIDIA NIM (including multimodal and vision), NVIDIA NeMo, NVIDIA NVCF, NeMo Guardrails, IBM watsonx, LangChain, LangChain Serve, Rasa, GGML, Ollama, LiteLLM, and a generic REST connector for anything else.

The REST generator supports custom request templates with `$INPUT` placeholders, JSONPath response extraction, SSE streaming, and proxy routing — so you can test any OpenAI-compatible or custom API without writing code.

Quick Start

# Install
go install github.com/praetorian-inc/augustus/cmd/augustus@latest

# Test for DAN jailbreak against OpenAI
export OPENAI_API_KEY="your-api-key"
augustus scan openai.OpenAI \
  --probe dan.Dan \
  --detector dan.DanDetector \
  --verbose

# Run all 210+ probes against a local model (no API key needed)
augustus scan ollama.OllamaChat \
  --all \
  --config '{"model":"llama3.2:3b"}'

# Test a custom REST endpoint
augustus scan rest.Rest \
  --probe dan.Dan \
  --config '{
    "uri": "https://your-api.example.com/v1/chat/completions",
    "headers": {"Authorization": "Bearer YOUR_KEY"},
    "req_template_json_object": {
      "model": "your-model",
      "messages": [{"role": "user", "content": "$INPUT"}]
    },
    "response_json": true,
    "response_json_field": "$.choices[0].message.content"
  }'

Output is clean and actionable:

PROBE	DETECTOR	PASSED	SCORE	STATUS
dan.Dan	dan.DAN	false	.85	VULN
encoding.base64	encoding	true	.10	SAFE
smuggling.Tag	smuggling	true	.05	SAFE

You can also export to JSON, JSONL, or generate HTML reports for stakeholders.

What's Next

Augustus is the second tool release of our “The 12 Caesars” open source tool campaign where we will be releasing one open source tool per week for the next 12 weeks. Each tool follows the Unix philosophy: do one thing, do it well, compose with the others.

Contributing & Community

Augustus is available now under the Apache 2.0 license at https://github.com/praetorian-inc/augustus

We welcome contributions from the community. Whether you’re adding probes for services we haven’t covered, reporting bugs, or suggesting new features, check the repository’s CONTRIBUTING.md for guidance on probe definitions and development workflow.

Ready to start? Clone the repository, experiment with Augustus in your environment, and join the discussion on GitHub. We’re excited to see how the security community uses this tool in real-world reconnaissance workflows. Star the project if you find it useful, and let us know what LLM prompt injection techniques you’d like to see us support next.

About the Authors

Evan Leleux

Evan Leleux is a Software Engineer at Praetorian focused on building scalable, distributed systems for enterprise security operations. He loves challenging problems and is always eager to learn. Evan is a Georgia Tech alumni.

Farida Shafik

Farida Shafik is an OSCP-certified Security Engineer at Praetorian, focusing on web application security and AI/LLM vulnerabilities. With a background in software development and innovation, she brings a methodical approach to security assessments and automation.

Nathan Sportsman

Nathan Sportsman is the Founder and CEO of Praetorian. He holds a BS in Electrical and Computer Engineering from the University of Texas at Austin.

Noah Tutt

Noah is a Senior Security Engineer at Praetorian Product Security. He specializes in a wide variety of product security assessments, including web application, hardware testing, reverse engineering and secure code review.