Synthetic monitoring tool for LLM inference endpoints that measures TTFT, latency, throughput, and errors across major providers such as OpenAI, Anthropic, Google, and Azure. It includes a CLI and an MCP server, and can export metrics to Prometheus and OpenTelemetry.

What it does

llmprobe is a synthetic monitoring tool that tracks the performance of LLM inference endpoints across multiple providers. It measures Time-to-First-Token (TTFT), latency, throughput, and error rate, giving teams a continuous view of how reliable and responsive the LLM services they depend on actually are.
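The page does not describe how llmprobe computes these metrics, so the following is a hypothetical sketch rather than llmprobe's code. It assumes a streamed response arrives as an iterable of token chunks and derives TTFT (start to first chunk), latency (start to end of stream), and throughput (chunks per second after the first chunk). `measure_stream` and `ProbeResult` are illustrative names, and the throughput definition is an assumption; an injectable clock keeps it testable.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class ProbeResult:
    ttft_s: float          # request start -> first chunk (NaN if none arrived)
    latency_s: float       # request start -> end of stream (or failure)
    tokens: int            # chunks received (assumed ~1 token each)
    throughput_tps: float  # chunks per second after the first chunk
    error: Optional[str] = None


def measure_stream(stream: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> ProbeResult:
    """Hypothetical probe: consume a stream and derive TTFT, latency, throughput."""
    start = clock()
    first: Optional[float] = None
    tokens = 0
    error: Optional[str] = None
    try:
        for _chunk in stream:
            now = clock()
            if first is None:
                first = now  # timestamp of the first chunk
            tokens += 1
    except Exception as exc:  # any mid-stream failure is recorded as an error
        error = repr(exc)
    end = clock()
    if first is None:
        return ProbeResult(math.nan, end - start, 0, 0.0, error or "empty response")
    decode_s = end - first
    tps = (tokens - 1) / decode_s if tokens > 1 and decode_s > 0 else 0.0
    return ProbeResult(first - start, end - start, tokens, tps, error)
```

Because `start` is taken when `measure_stream` is called, the stream should be a lazy generator that issues the request on first iteration; if the request is sent beforehand, TTFT will be undercounted.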

Supported providers

Works with OpenAI, Anthropic, Google, Azure, and AWS Bedrock, as well as local inference servers such as vLLM, SGLang, and Ollama. Because one tool covers both hosted APIs and self-hosted servers, it suits teams running heterogeneous LLM infrastructure.
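When probes span several providers, results are usually compared per provider. The sketch below is hypothetical and not taken from llmprobe: it assumes each probe is recorded as a `(provider, latency_seconds, error)` tuple and computes a per-provider error rate plus a nearest-rank p95 latency over successful probes. The tuple layout, the `summarize` function, and the choice of p95 are all illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical record: (provider, latency_seconds, error); error is None on success
Probe = Tuple[str, float, Optional[str]]


def nearest_rank(values: List[float], pct: float) -> float:
    """Nearest-rank percentile for pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(probes: List[Probe]) -> Dict[str, Dict[str, float]]:
    """Group probes by provider; report error rate and p95 latency of successes."""
    by_provider: Dict[str, List[Probe]] = defaultdict(list)
    for probe in probes:
        by_provider[probe[0]].append(probe)
    summary: Dict[str, Dict[str, float]] = {}
    for provider, rows in by_provider.items():
        ok = [latency for _, latency, err in rows if err is None]
        summary[provider] = {
            "error_rate": 1 - len(ok) / len(rows),
            "p95_latency_s": nearest_rank(ok, 95) if ok else math.nan,
        }
    return summary
```

Latency percentiles here exclude failed probes so that fast-failing errors do not make a provider look quicker than it is; failures are captured separately by the error rate.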

Key features

Real-time synthetic monitoring of LLM endpoints
Multi-provider support (OpenAI, Anthropic, Google, Azure, Bedrock, local servers)
Metrics: TTFT, latency, throughput, and error rate
Prometheus and OpenTelemetry export for integration with observability stacks
CLI interface for manual testing and automation
MCP server for integration with MCP-compatible AI tools
Cross-platform compatibility (macOS, Windows, Linux, cloud, local)

Jwrede/llmprobe