TL;DR: Julius v0.2.0 nearly doubles LLM fingerprinting probe coverage from 33 to 63, adding detection for cloud-managed AI services (AWS Bedrock, Azure OpenAI, Vertex AI), high-performance inference servers (SGLang, TensorRT-LLM, Triton), AI gateways (Portkey, Helicone, Bifrost), and self-hosted RAG platforms (PrivateGPT, RAGFlow, Quivr). This release also hardens the scanner itself with response size limiting and TLS configuration for enterprise environments. Update Julius and scan your network — you almost certainly have AI infrastructure you don’t know about.
When we shipped the v0.1.1 update back in February, Julius could detect 33 LLM services. That covered the self-hosted basics (Ollama, vLLM, llama.cpp) and a growing list of orchestration tools. But the gap was obvious: we had almost no coverage for cloud-managed AI services, production inference servers, or the AI gateway layer that sits between applications and models.
That gap is now closed. Julius v0.2.0 ships with 63 probes, adding 30 new detections in a single release. More importantly, the types of infrastructure we now detect reflect where enterprise AI deployments are actually heading: cloud-managed endpoints, high-throughput inference engines, and the growing ecosystem of proxies and gateways that route traffic between them.
What’s new in v0.2.0
Cloud-managed AI services (10 probes)
This is the biggest category and the one we’ve been asked about most. Organizations deploying AI through their cloud provider often assume these endpoints are inherently private. They’re not — misconfigured API gateways, exposed proxy layers, and overly permissive network policies can put them on the open internet.
- AWS Bedrock — Control plane and runtime detection via /foundation-models and /model/{modelId}/converse
- Azure OpenAI — Azure-specific OpenAI endpoint detection
- Google Vertex AI — Vertex AI prediction and model endpoint detection
- Databricks Model Serving — Model serving endpoint detection
- Fireworks AI, Groq, Modal, Replicate, Together AI — Managed inference API detection
Self-hosted inference servers (10 probes)
These are the workhorses of production AI: high-performance inference engines that teams deploy for throughput, latency, or cost reasons. They tend to run with default configurations and minimal authentication.
- SGLang — Detected via its unique /server_info endpoint exposing mem_fraction_static and disaggregation_mode fields
- TensorRT-LLM — NVIDIA’s optimized inference runtime
- Triton Inference Server — NVIDIA’s multi-framework serving platform
- BentoML — ML model serving framework
- Baseten Truss, DeepSpeed-MII, MLC LLM, Petals, PowerInfer, Ray Serve — Various self-hosted inference engines
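To make the SGLang fingerprint concrete, here is a minimal sketch of the matching idea: classify a /server_info response body by checking for the two SGLang-specific fields named above. This is an illustrative function, not Julius’s actual probe engine (the real rules are YAML-defined):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// looksLikeSGLang reports whether a /server_info response body contains
// the SGLang-specific fields mentioned in the release notes. Sketch only;
// Julius's real matching logic may differ.
func looksLikeSGLang(body []byte) bool {
	var info map[string]json.RawMessage
	if err := json.Unmarshal(body, &info); err != nil {
		return false
	}
	_, hasMemFraction := info["mem_fraction_static"]
	_, hasDisaggregation := info["disaggregation_mode"]
	return hasMemFraction && hasDisaggregation
}

func main() {
	sglang := []byte(`{"mem_fraction_static": 0.88, "disaggregation_mode": "null"}`)
	generic := []byte(`{"status": "ok"}`)
	fmt.Println(looksLikeSGLang(sglang))  // true
	fmt.Println(looksLikeSGLang(generic)) // false
}
```

Requiring both fields together is what keeps the false-positive rate down: plenty of servers expose a JSON status endpoint, but the combination of these two keys is distinctive.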
AI gateways and proxies (5 probes)
The gateway layer is where organizations route, observe, and control traffic between their applications and LLM providers. An exposed gateway often means access to every model and API key behind it.
- Portkey AI Gateway — AI gateway with provider routing and observability
- Helicone — LLM observability and proxy platform
- Bifrost — Multi-provider AI gateway
- OmniRoute — LLM routing gateway
- TensorZero — Model gateway with experimentation support
RAG and orchestration platforms (5 probes)
Self-hosted RAG platforms are where things get particularly sensitive. These systems are purpose-built to ingest and query internal documents — contracts, HR policies, financial data, source code. An exposed RAG endpoint is, by definition, an exposed document store.
- PrivateGPT — Private document Q&A (detected via its /v1/ingest/list endpoint, which returns data even with zero ingested documents and auth disabled by default)
- RAGFlow — Open-source RAG engine with deep document understanding
- Quivr — Second brain RAG platform
- h2oGPT — H2O.ai’s document Q&A platform
- Langflow — Visual LLM orchestration framework
Why self-hosted RAG is the new shadow IT
The OpenClaw story from our last update highlighted what happens when AI agent platforms get exposed: leaked API keys, filesystem access, and user impersonation. With this release, we’re seeing the same pattern play out with RAG platforms — except the stakes are different. Instead of agent credentials, you’re looking at the documents themselves.
PrivateGPT is a good example. The entire value proposition is “keep your documents private by running everything locally.” The irony is that PrivateGPT’s API defaults to no authentication. Its /v1/ingest/list endpoint is a simple GET that returns every ingested document’s metadata, including filenames and chunk counts. The model field is hardcoded to "private-gpt", which makes detection trivial and false positives near-zero.
RAGFlow follows a similar pattern. Its /v1/system/healthz endpoint is unauthenticated and returns a JSON health check with a doc_engine field that’s unique to RAGFlow — it tracks the status of the Elasticsearch or Infinity backend that powers document retrieval. Even when RAGFlow is partially broken (HTTP 500), the health endpoint still responds with the same structure, making detection reliable in any state.
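Both fingerprints boil down to pure functions over a response body. The sketch below illustrates the logic described above; it is not Julius’s actual probe code, and real-world matching would also consider status codes and headers:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// looksLikePrivateGPT checks a response body for the model field that
// PrivateGPT hardcodes to "private-gpt". Illustrative sketch only.
func looksLikePrivateGPT(body []byte) bool {
	var resp struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return false
	}
	return resp.Model == "private-gpt"
}

// looksLikeRAGFlow checks a /v1/system/healthz body for the doc_engine
// field that tracks RAGFlow's Elasticsearch or Infinity backend.
func looksLikeRAGFlow(body []byte) bool {
	var health map[string]json.RawMessage
	if err := json.Unmarshal(body, &health); err != nil {
		return false
	}
	_, ok := health["doc_engine"]
	return ok
}

func main() {
	fmt.Println(looksLikePrivateGPT([]byte(`{"object":"list","model":"private-gpt"}`))) // true
	fmt.Println(looksLikeRAGFlow([]byte(`{"status":"green","doc_engine":"elasticsearch"}`))) // true
}
```

Note that the RAGFlow check deliberately ignores the HTTP status: as described above, the healthz structure is stable even when the service returns a 500.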
The problem isn’t that these tools are insecure by design. It’s that they’re easy to deploy, they serve an obvious need (“let me ask questions about our internal docs”), and teams spin them up without involving security. By the time anyone notices, the system has been indexing sensitive documents on an endpoint with no auth, no network restriction, and no monitoring.
This is shadow IT for the AI era, and it’s why discovery tooling matters.
What else changed
Beyond new probes, v0.2.0 includes changes to the scanner itself:
Breaking API change: scanner.NewScanner() now requires two additional parameters — maxResponseSize and tlsConfig. If you’re using Julius as a library, see the migration guide in the changelog.
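For library users, the change looks roughly like the stub below. The parameter names (maxResponseSize, tlsConfig) come from the release notes, but the types, argument order, and struct layout here are assumptions for illustration; consult the migration guide in the changelog for the real signature:

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// Scanner is a local stand-in for the real scanner type, used only to
// illustrate the shape of the new constructor. Field types are assumptions.
type Scanner struct {
	MaxResponseSize int64
	TLSConfig       *tls.Config
}

// NewScanner mirrors the hypothetical v0.2.0 signature: a response-body
// cap plus explicit TLS settings, where v0.1.x took no arguments.
func NewScanner(maxResponseSize int64, tlsConfig *tls.Config) *Scanner {
	return &Scanner{MaxResponseSize: maxResponseSize, TLSConfig: tlsConfig}
}

func main() {
	// v0.1.x: s := scanner.NewScanner()
	// v0.2.0: pass the cap and TLS config explicitly.
	s := NewScanner(10<<20, &tls.Config{MinVersion: tls.VersionTLS12}) // 10 MiB cap
	fmt.Println(s.MaxResponseSize)
}
```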
New CLI flags:
- --max-response-size — Limits response body size (default 10MB) to prevent memory exhaustion from large or malicious responses
- --insecure — Skips TLS certificate verification for testing environments
- --ca-cert — Specifies a custom CA certificate file for enterprise PKI environments
Probe quality fixes:
- Fixed the Ollama probe false-positiving on Ollama-compatible servers (SGLang, KoboldCpp) by requiring the "families" field in /api/tags responses
- Fixed header.contains rules that silently failed on HTTP/2 connections — this affected 5 cloud probes (AWS Bedrock, Cloudflare AI Gateway, Fireworks AI, Modal, OmniRoute)
- Removed overly generic detection blocks from Bifrost, DeepSpeed-MII, and Groq that caused cross-probe false positives
What this means for your assessments
If you’re running Julius as part of your attack surface discovery workflow, update to v0.2.0:
$ go install github.com/praetorian-inc/julius/cmd/julius@latest
$ julius probe
For enterprise environments with internal CAs:
$ julius probe --ca-cert /path/to/ca.pem
All 63 probes are embedded in the binary. No external config, no probe downloads, no API keys.
The coverage now spans the full AI infrastructure stack: from cloud-managed inference (Bedrock, Azure OpenAI, Vertex AI) through self-hosted serving (SGLang, TensorRT-LLM, Triton) to the RAG and orchestration layer (PrivateGPT, RAGFlow, Langflow). If an organization is running AI infrastructure, Julius should find it.
We’re continuing to expand probe coverage as new tools emerge. If there’s a service you’re seeing in the wild that Julius doesn’t cover, open an issue or submit a PR. Probes are simple YAML files — you can test locally with julius validate ./probes before submitting.
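For a rough sense of what a contribution involves, a probe file has approximately this shape. This is a hypothetical illustration only — the field names below are invented, so copy a real probe from the repository as your starting point rather than this sketch:

```yaml
# Hypothetical probe sketch; see existing probes in the repo for the real schema.
name: example-service
requests:
  - method: GET
    path: /v1/status
    match:
      status: 200
      body_contains: '"engine":"example"'
```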
FAQ
What’s the difference between Julius and model fingerprinting tools? Model fingerprinting identifies which LLM generated a piece of text. Julius identifies the server infrastructure: what software is running on the endpoint. Think of it as service detection for AI, similar to what Nmap does for traditional services.
Does Julius send anything malicious? No. Julius sends standard HTTP requests (GET/POST to known paths) and analyzes the responses. It doesn’t exploit vulnerabilities, submit prompts, or modify anything on the target. The probes are active but non-intrusive: read-only fingerprinting, not exploitation.
How do probes get validated before release? Every probe is tested against live instances of the target service and cross-tested against other LLM services to confirm zero false positives. This release also fixed several cross-probe false positives from v0.1.x.
Can I add detection for a service Julius doesn’t support yet? Yes. Probes are defined in simple YAML files. The contributing guide walks through the format, and you can test locally with julius validate ./probes before submitting a PR.
Why is there a breaking API change? The NewScanner() signature now requires maxResponseSize and tlsConfig parameters. This was necessary to add response size limiting (preventing OOM from malicious servers) and TLS configuration for enterprise environments. If you’re only using the CLI, nothing changes.