Evaluation framework for testing LLM knowledge inputs, including prompts, RAG corpora, and agent workflows. It provides statistical rigor through bootstrap confidence intervals and Krippendorff's alpha, and is aimed at researchers and engineers.
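Krippendorff's alpha measures how consistently several graders (human raters or LLM judges) label the same outputs, correcting for chance agreement. The sketch below is a minimal illustration of the standard coincidence-matrix formula for nominal labels only, assuming one row per evaluated item and one column per grader, with `None` for missing labels. The function name and input layout are hypothetical and are not oh-my-knowledge's API.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Optional, Sequence


def krippendorff_alpha_nominal(units: Sequence[Sequence[Optional[Hashable]]]) -> float:
    """Nominal Krippendorff's alpha (hypothetical helper, not oh-my-knowledge's API).

    Each row of `units` is one evaluated item; each entry is the label one
    grader assigned, or None if that grader did not label the item.
    """
    # Coincidence matrix: every ordered pair of labels from different graders
    # within an item adds 1 / (m - 1), where m is the item's label count.
    coincidences: Counter = Counter()
    for row in units:
        values = [v for v in row if v is not None]
        m = len(values)
        if m < 2:
            continue  # items with fewer than two labels cannot be paired
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals n_c and the overall number of pairable labels n.
    totals: Counter = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Nominal distance is 1 for mismatched labels and 0 for matches.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined with fewer than two distinct labels")
    # alpha = 1 - D_o / D_e, with D_o = observed / n and D_e = expected / (n (n - 1)).
    return 1.0 - (n - 1) * observed / expected
```

Values near 1 indicate strong agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement. Ordinal or interval scores would need a different distance function in place of the nominal match/mismatch used here.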
oh-my-knowledge is an evaluation framework for systematically assessing and improving LLM knowledge inputs. It lets you hold the model fixed while varying the artifacts under evaluation: prompts, RAG corpora, skills, and agent workflows. Built-in statistical methods help make evaluation results reliable and reproducible.
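Holding the model fixed makes the comparison paired: each test case is scored once with artifact A and once with artifact B, so the uncertainty of interest lies in the per-case differences. The following is a minimal percentile-bootstrap sketch of that idea, assuming the scores have already been collected as two aligned lists. The function name, parameters, and defaults are illustrative and are not oh-my-knowledge's interface.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def paired_bootstrap_ci(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_resamples: int = 10_000,
    confidence: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean difference, lower bound, upper bound) for B minus A.

    Hypothetical helper, not oh-my-knowledge's API. scores_a[i] and
    scores_b[i] must score the same test case under the same fixed model,
    using artifact A and artifact B respectively.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")

    # Per-case differences keep the pairing created by holding the model fixed.
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)  # fixed seed makes the interval reproducible

    # Resample test cases with replacement and record each resample's mean.
    boot_means = sorted(
        mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_resamples)
    )

    # Percentile method: cut equal tails off the sorted bootstrap means.
    tail = int(n_resamples * (1.0 - confidence) / 2)
    return mean(diffs), boot_means[tail], boot_means[n_resamples - 1 - tail]
```

Resampling whole test cases, rather than A and B scores independently, preserves the pairing. Under this simple percentile method, an interval that excludes 0 suggests the artifact change has an effect beyond resampling noise.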
To get started, clone the repository from GitHub and install its dependencies. The tool is a Python-based framework that integrates with Claude and other LLMs; detailed setup instructions are in the project documentation. Once it is installed, you can define evaluation scenarios and run statistical analyses on your LLM artifacts.