Introducing Julius: Open Source LLM Service Fingerprinting

The Growing Shadow AI Problem

Over 14,000 Ollama server instances are publicly accessible on the internet right now. A recent Cisco analysis found that 20% of these actively host models susceptible to unauthorized access. Separately, BankInfoSecurity reported discovering more than 10,000 Ollama servers with no authentication layer—the result of hurried AI deployments by developers under pressure.                                                                                              

This is the new shadow IT: developers spinning up local LLM servers for productivity, unaware they’ve exposed sensitive infrastructure to the internet. And Ollama is just one of dozens of AI serving platforms proliferating across enterprise networks.                  

The security question is no longer “are we running AI?” but “where is AI running that we don’t know about?” 

What is LLM Service Fingerprinting?

LLM service fingerprinting identifies what **server software** is running on a network endpoint—not which AI model generated text, but which infrastructure is serving it.          

The LLM security space spans multiple tool categories, each answering a different question:

| Question | Tool Category |
|---|---|
| "What ports are open?" | Nmap |
| "What service is on this port?" | Praetorian Nerva (will be open-sourced) |
| "Is this HTTP service an LLM?" | Praetorian Julius |
| "Which LLM wrote this text?" | Model fingerprinting |
| "Is this prompt malicious?" | Input guardrails |
| "Can this model be jailbroken?" | Nvidia Garak, Praetorian Augustus (will be open-sourced) |

Julius answers the third question: during a penetration test or attack surface assessment, you’ve found an open port. Is it Ollama? vLLM? A Hugging Face deployment? Some enterprise AI gateway? Julius tells you in seconds.

Julius follows the Unix philosophy: do one thing and do it well. It doesn’t port scan. It doesn’t vulnerability scan. It identifies LLM services—nothing more, nothing less.

This design enables Julius to slot into existing security toolchains rather than replace them.

The Praetorian Guard Security Pipeline

In Praetorian's continuous offensive security platform, Julius occupies a critical position in the multi-stage scanning pipeline:

---
config:
  layout: elk
  theme: dark
  themeVariables:
    primaryColor: '#270A0C'
    primaryTextColor: '#ffffff'
    primaryBorderColor: '#535B61'
    lineColor: '#535B61'
    background: '#0D0D0D'
---
flowchart LR
 subgraph subGraph0["Asset Discovery"]
        A["🌱 Seed"]
  end
 subgraph subGraph1["LLM Reconnaissance"]
        B["🔍 Portscan
Nmap"] C["🏷️ Fingerprint
Nerva"] D["🤖 LLM Detection
Julius"] end subgraph subGraph2["LLM Attack"] E["⚔️ Augustus
Syntactic Probes
46+ patterns"] F["🧠 Aurelius
Semantic Reasoning
AI-driven attacks"] end A --> B B --> C C --> D D --> E E -- Static probes
exhausted --> F F -- Adaptive
exploitation --> G["📋 Confirmed Compromise"] E --> G style C fill:#FFCDD2,stroke:#D50000,color:#000000 style D fill:#D50000,color:#FFFFFF style E fill:#FFCDD2,stroke:#D50000,color:#000000 style F fill:#FFCDD2,stroke:#D50000,stroke-width:2px,color:#000000

Why Existing Detection Methods Fall Short

Manual Detection is Slow and Error-Prone

Each LLM platform has different API signatures, default ports, and response patterns:

  • Ollama: port 11434, /api/tags returns {"models": […]}
  • vLLM: port 8000, OpenAI-compatible /v1/models 
  • LiteLLM: port 4000, proxies to multiple backends
  • LocalAI: port 8080, /models endpoint

Manually checking each possibility during an assessment wastes time and risks missing services.
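
As a rough illustration, here is a minimal Go sketch of that manual loop against a single hypothetical host. The ports and endpoints come from the list above; the host name, timeout, and helper structure are illustrative:

package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// Manual discovery sketch: try each platform's default port and
// signature endpoint by hand. A real assessment repeats this per
// host, per platform, per port -- and still needs a human to read
// each response for a signature.
func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	checks := map[string]string{
		"ollama":  "http://target.example.com:11434/api/tags",
		"vllm":    "http://target.example.com:8000/v1/models",
		"litellm": "http://target.example.com:4000/health",
		"localai": "http://target.example.com:8080/models",
	}
	for name, url := range checks {
		resp, err := client.Get(url)
		if err != nil {
			continue // closed, filtered, or not HTTP
		}
		body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
		resp.Body.Close()
		fmt.Printf("%-8s %d %.60s\n", name, resp.StatusCode, body)
	}
}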

Shodan Queries Have Limitations

A Cisco study found ~1,100 Ollama instances indexed on Shodan. While interesting, replicating the research requires a Shodan license.

| Limitation | Impact |
|---|---|
| Ollama-only detection | Misses vLLM, LiteLLM, and 15+ other platforms |
| Passive database queries | Data lags behind real-time deployments |
| Requires Shodan subscription | Cost barrier for some teams |
| No model enumeration | Can't identify what's deployed |

Introducing Julius

Julius is an open-source LLM service fingerprinting tool that detects 17+ AI platforms through active HTTP probing. Built in Go, it compiles to a single binary with no external dependencies.

# Installation
go install github.com/praetorian-inc/julius/cmd/julius@latest

# Basic usage
julius probe https://target.example.com:11434

# Output
TARGET                      SERVICE  SPECIFICITY  CATEGORY     MODELS
https://target.example.com  ollama   100          self-hosted  llama2, mistral

Julius vs Alternatives

| Capability | Julius | Shodan Queries | Manual Discovery |
|---|---|---|---|
| Services detected | 17+ | Comprehensive | Unreliable and varied |
| External dependencies | None | Shodan API and license | None |
| Offline operation | Yes | No | Yes |
| Real-time detection | Yes | Delayed (index lag) | Yes |
| Model enumeration | Yes | No | Manual |
| Custom probe extension | Yes (YAML) | No | Not applicable |
| Time per target | Seconds | Seconds | Minutes to days |

How Julius Works

Julius uses a probe-and-match architecture optimized for speed and accuracy:

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#270A0C', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#535B61', 'lineColor': '#535B61', 'background': '#0D0D0D'}}}%%
flowchart LR
    A[Target URL] --> B[Load Probes]
    B --> C[HTTP Probes]
    C --> D[Rule Match]
    D --> E[Specificity Scoring]
    E --> F[Report Service]
    style A fill:#1a1a1a,stroke:#535B61,color:#ffffff
    style B fill:#1a1a1a,stroke:#535B61,color:#ffffff
    style C fill:#1a1a1a,stroke:#535B61,color:#ffffff
    style D fill:#270A0C,stroke:#E63948,color:#ffffff
    style E fill:#E63948,stroke:#535B61,color:#ffffff
    style F fill:#270A0C,stroke:#E63948,color:#ffffff

Architectural Decisions

Julius is designed for performance in large-scale assessments: 

| Design Decision | Purpose |
|---|---|
| Concurrent scanning with errgroup | Scan 50+ targets in parallel without race conditions |
| Response caching with singleflight | Multiple probes hitting /api/models trigger only one HTTP request |
| Embedded probes compiled into binary | True single-binary distribution with no external files |
| Port-based probe prioritization | A target on :11434 runs Ollama probes first for faster identification |
| MD5 response deduplication | Identical responses across targets are processed once |
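
To make the first two rows concrete, here is a minimal Go sketch combining errgroup-bounded concurrency with singleflight request collapsing. It uses the real golang.org/x/sync packages, but the structure, names, and limits are assumptions for illustration, not Julius's actual internals:

package main

import (
	"fmt"
	"io"
	"net/http"

	"golang.org/x/sync/errgroup"
	"golang.org/x/sync/singleflight"
)

var flight singleflight.Group

// fetch collapses concurrent requests for the same URL into a single
// HTTP call, mirroring the "response caching with singleflight" idea.
func fetch(url string) (string, error) {
	v, err, _ := flight.Do(url, func() (interface{}, error) {
		resp, err := http.Get(url)
		if err != nil {
			return nil, err
		}
		defer resp.Body.Close()
		body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
		return string(body), err
	})
	if err != nil {
		return "", err
	}
	return v.(string), nil
}

func main() {
	targets := []string{
		"http://host-a.example.com:11434/api/tags",
		"http://host-a.example.com:11434/api/tags", // duplicate probe: one request
		"http://host-b.example.com:8000/v1/models",
	}
	var g errgroup.Group
	g.SetLimit(50) // bound parallelism across targets, per the table above
	for _, t := range targets {
		t := t
		g.Go(func() error {
			body, err := fetch(t)
			if err != nil {
				return err
			}
			fmt.Printf("%s -> %d bytes\n", t, len(body))
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		fmt.Println("scan error:", err)
	}
}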

  cmd/julius/          CLI entrypoint                
  pkg/                                                                 
    runner/            Command execution (probe, list, validate)
    scanner/           HTTP client, response caching, model extraction 
    rules/             Match rule engine (status, body, header pattern)
    output/            Formatters (table, JSON, JSONL)
    probe/             Probe loader (embedded YAML + filesystem)    
    types/             Core data structures
  probes/              YAML probe definitions (one per service)

Detection Process

  1. Target Normalization: Validates and normalizes input URLs
  2. Probe Selection: Prioritizes probes matching the target’s port (if :11434, Ollama probes run first)
  3. HTTP Probing: Sends requests to service-specific endpoints
  4. Rule Matching: Compares responses against signature patterns
  5. Specificity Scoring: Ranks results 1-100 by most specific match
  6. Model Extraction: Optionally retrieves deployed models via JQ expressions
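
Step 6 can be pictured with a short sketch. Julius's own extraction code isn't shown here; this uses the gojq library as one way to evaluate a JQ expression in Go, with the same ".models[].name" expression as the custom-probe example later in this post and a made-up Ollama-style payload:

package main

import (
	"encoding/json"
	"fmt"

	"github.com/itchyny/gojq"
)

func main() {
	// Made-up Ollama-style models response.
	body := []byte(`{"models":[{"name":"llama2"},{"name":"mistral"}]}`)

	var doc interface{}
	if err := json.Unmarshal(body, &doc); err != nil {
		panic(err)
	}

	// Same expression as the custom-probe example later in this post.
	query, err := gojq.Parse(".models[].name")
	if err != nil {
		panic(err)
	}
	iter := query.Run(doc)
	for {
		v, ok := iter.Next()
		if !ok {
			break
		}
		if e, isErr := v.(error); isErr {
			panic(e)
		}
		fmt.Println("model:", v) // llama2, mistral
	}
}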
 
Specificity Scoring: Eliminating False Positives                                                                                      
                                                                                                                                        
Many LLM platforms implement OpenAI-compatible APIs. If Julius detects both “OpenAI-compatible” (specificity: 30) and “LiteLLM” (specificity: 85) on the same endpoint, it reports LiteLLM first.
 
This prevents the generic “OpenAI-compatible” match from obscuring the actual service identity.   
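
A minimal sketch of that tie-breaking logic, using a hypothetical Match type in place of Julius's internal result structure:

package main

import (
	"fmt"
	"sort"
)

// Match is a hypothetical stand-in for Julius's internal result type.
type Match struct {
	Service     string
	Specificity int // 1-100; higher means more specific
}

func main() {
	// Both probes matched the same endpoint.
	hits := []Match{
		{Service: "openai-compatible", Specificity: 30},
		{Service: "litellm", Specificity: 85},
	}
	// Sort descending so the most specific identification is reported first.
	sort.Slice(hits, func(i, j int) bool {
		return hits[i].Specificity > hits[j].Specificity
	})
	fmt.Println("reported service:", hits[0].Service) // litellm
}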

Match Rule Engine

Julius uses six rule types for fingerprinting:

| Rule Type | Purpose | Example |
|---|---|---|
| status | HTTP status code | 200 confirms endpoint exists |
| body.contains | JSON structure detection | "models": identifies list responses |
| body.prefix | Response format identification | {"object": matches OpenAI-style |
| content-type | API vs HTML differentiation | application/json |
| header.contains | Service-specific headers | X-Ollama-Version |
| header.prefix | Server identification | uvicorn ASGI fingerprint |

All rules support negation with not: true—crucial for distinguishing similar services. For example: “has /api/tags endpoint” AND “does NOT contain LiteLLM” ensures Ollama detection doesn’t match LiteLLM proxies. 
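
Here is a sketch of how such negated matching can be evaluated; the Rule type below is a simplification invented for illustration, not Julius's actual rule engine:

package main

import (
	"fmt"
	"strings"
)

// Rule is a deliberately simplified, hypothetical version of a
// body.contains rule with optional negation via not: true.
type Rule struct {
	Contains string
	Not      bool
}

// Matches reports whether the body satisfies the rule; a negated rule
// passes only when the pattern is absent.
func (r Rule) Matches(body string) bool {
	hit := strings.Contains(body, r.Contains)
	if r.Not {
		return !hit
	}
	return hit
}

func main() {
	// "Has a models list" AND "does NOT mention LiteLLM".
	rules := []Rule{
		{Contains: `"models"`},
		{Contains: "litellm", Not: true},
	}
	body := `{"models":[{"name":"llama2"}]}`
	ok := true
	for _, r := range rules {
		ok = ok && r.Matches(body)
	}
	fmt.Println("ollama match:", ok) // true: models present, LiteLLM absent
}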

Julius also caches HTTP responses during a scan, so multiple probes targeting the same endpoint don’t result in duplicate requests. You can write 100 probes that check / for different signatures without overloading the target. Julius fetches the page once and evaluates all matching rules against the cached response.

Julius prioritizes precision over breadth. Each probe includes specificity scoring to avoid false positives. An Ollama instance should be identified as Ollama, not just “something OpenAI-compatible.” The generic OpenAI-compatible probe exists as a fallback, but specific service detection always takes precedence.

Probes Included in Initial Release

Self-Hosted LLM Servers

| Service | Port | Detection Method |
|---|---|---|
| ollama | 11434 | /api/tags JSON response + "Ollama is running" banner |
| vllm | 8000 | /v1/models with Server: uvicorn header + /version endpoint |
| local.ai | 8080 | /metrics endpoint containing "LocalAI" markers |
| llama.cpp | 8080 | /v1/models with owned_by: llamacpp OR Server: llama.cpp header |
| Hugging Face | 3000 | /info endpoint with model_id field |
| LM Studio | 1234 | /api/v0/models endpoint (LM Studio-specific) |
| NVIDIA NIM | 8000 | /v1/metadata with modelInfo + /v1/health/ready |

Proxy & Gateway Services

| Service | Port | Detection Method |
|---|---|---|
| LiteLLM | 4000 | /health with healthy_endpoints or litellm_metadata JSON |
| Kong | 8000 | Server: kong header + /status endpoint |

Enterprise Cloud Platforms

| Service | Port | Detection Method |
|---|---|---|
| Salesforce Einstein | 443 | Messaging API auth endpoint error response |

ML Demo Platforms

| Service | Port | Detection Method |
|---|---|---|
| Gradio | 7860 | /config with mode and components |

RAG Platforms

| Service | Port | Detection Method |
|---|---|---|
| AnythingLLM | 3001 | HTML containing "AnythingLLM" |

Chat Frontends

| Service | Port | Detection Method |
|---|---|---|
| Open WebUI | 3000 | /api/config with "name":"Open WebUI" |
| LibreChat | 3080 | HTML containing "LibreChat" |
| SillyTavern | 8000 | HTML containing "SillyTavern" |
| Better ChatGPT | 3000 | HTML containing "Better ChatGPT" |

Generic Detection

| Service | Port | Detection Method |
|---|---|---|
| OpenAI-compatible | Varied | /v1/models with standard response structure |

Extending Julius with Custom Probes

Adding support for a new LLM service requires ~20 lines of YAML and no code changes:

# probes/my-service.yaml
name: my-llm-service
description: Custom LLM service detection
category: self-hosted
port_hint: 8080
specificity: 75
api_docs: https://example.com/api-docs

requests:
  - type: http
    path: /health
    method: GET
    match:
      - type: status
        value: 200
      - type: body.contains
        value: '"service":"my-llm"'

  - type: http
    path: /api/version
    method: GET
    match:
      - type: status
        value: 200
      - type: content-type
        value: application/json

models:
  path: /api/models
  method: GET
  extract: ".models[].name"

Validate your probe:

julius validate ./probes

Real World Usage

Single Target Assessment

julius probe https://target.example.com                               

julius probe https://target.example.com:11434

julius probe 192.168.1.100:8080  

Scan Multiple Targets From a File

julius probe -f targets.txt

Output formats, from human-readable tables to machine-parseable JSON:

# Table (default) - human readable
julius probe https://target.example.com

# JSON - structured for parsing
julius probe -o json https://target.example.com

# JSONL - streaming for large scans
julius probe -o jsonl -f targets.txt | jq '.service'
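
If a program consumes the JSONL stream instead of jq, parsing takes a few lines of Go. The service field matches the jq filter above; the target field name is an assumption for illustration:

package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// result models one JSONL line. "service" matches the jq filter above;
// "target" is an assumed field name used here for illustration.
type result struct {
	Target  string `json:"target"`
	Service string `json:"service"`
}

func main() {
	// Usage sketch: julius probe -o jsonl -f targets.txt | go run main.go
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		var r result
		if err := json.Unmarshal(scanner.Bytes(), &r); err != nil {
			continue // skip malformed lines
		}
		if r.Service != "" {
			fmt.Printf("%s -> %s\n", r.Target, r.Service)
		}
	}
}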

What's Next

Julius is the first release in our “The 12 Caesars” campaign, in which we will open-source one tool per week for the next 12 weeks. Julius focuses on HTTP-based fingerprinting of known LLM services. We’re already working on expanding its capabilities while maintaining the lightweight, fast execution that makes it practical for large-scale reconnaissance.

On our roadmap: additional probes for cloud-hosted LLM services, smarter detection of custom integrations, and the ability to analyze HTTP traffic patterns to identify LLM usage that doesn’t follow standard API conventions. We’re also exploring how Julius can work alongside AI agents to autonomously discover LLM infrastructure across complex environments.

Contributing & Community

Julius is available now under the Apache 2.0 license at https://github.com/praetorian-inc/julius

We welcome contributions from the community. Whether you’re adding probes for services we haven’t covered, reporting bugs, or suggesting new features, check the repository’s CONTRIBUTING.md for guidance on probe definitions and development workflow.

Ready to start? Clone the repository, experiment with Julius in your environment, and join the discussion on GitHub. We’re excited to see how the security community uses this tool in real-world reconnaissance workflows. Star the project if you find it useful, and let us know what LLM services you’d like to see supported next.

About the Authors

Evan Leleux

Evan Leleux is a Software Engineer at Praetorian focused on building scalable, distributed systems for enterprise security operations. He loves challenging problems and is always eager to learn. Evan is a Georgia Tech alumnus.

Ready to Discuss Your Next Continuous Threat Exposure Management Initiative?

Praetorian’s Offensive Security Experts are Ready to Answer Your Questions