CLI tool for benchmarking and evaluating AI coding agents such as Claude, Codex, and Gemini with your own API keys. Run evals on tasks you understand, across multiple LLM providers.
nasde-toolkit is a command-line interface for benchmarking and evaluating AI coding agents across multiple providers. It lets developers test and compare how Claude, Codex, Gemini, and other LLMs perform on custom coding tasks, using their own subscriptions or API keys.
Install nasde-toolkit via pip or clone it from GitHub. Set your API keys for Claude, Codex, and/or Gemini as environment variables. Then define your benchmark tasks and run the evaluation commands through the CLI to generate detailed performance reports and comparisons.
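Before running any evals, it can help to confirm which provider keys are actually present in your environment. The following is a minimal sketch, not part of nasde-toolkit itself: the `configured_providers` helper and the `PROVIDER_ENV_VARS` mapping are hypothetical, and the variable names (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GEMINI_API_KEY`) are assumed from each vendor's own SDK conventions. Check the toolkit's documentation for the names it really reads.

```python
import os
from typing import Mapping

# Hypothetical mapping: provider label -> environment variable it is assumed to read.
# These names follow each vendor's own SDK convention; nasde-toolkit may expect others.
PROVIDER_ENV_VARS = {
    "claude": "ANTHROPIC_API_KEY",
    "codex": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return provider labels whose API key variable is set to a non-empty value."""
    return [
        provider
        for provider, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()  # treat blank or whitespace-only values as unset
    ]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("API keys found for:", ", ".join(found))
    else:
        print("No provider API keys found in the environment.")
```

Passing a plain dict instead of `os.environ` makes the check easy to test without touching your real shell environment.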