Evaluate {{model}} on {{evaluation_set}} within a total budget of {{token_budget}} tokens.

Constraints:
- Maximum {{input_tokens}} input tokens per query
- Maximum {{output_tokens}} output tokens per response
- No multi-turn conversations

Report the accuracy achieved within these constraints, compare it to an unconstrained baseline, and recommend an optimal token allocation strategy.
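The procedure this prompt asks for can be sketched as a single-turn evaluation loop. This is a minimal sketch under stated assumptions, not a reference harness: the `Example` type, the `model` callable, exact-match scoring, and whitespace tokenization are hypothetical stand-ins for {{model}}, {{evaluation_set}}, and the model's real tokenizer. Each query's input is truncated to the input cap, each response is cut to the output cap, and evaluation stops before any query whose worst-case cost would exceed the total budget. Passing very large limits gives the unconstrained baseline, and the hypothetical `best_allocation` helper is one simple (grid-search) way to compare allocations.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface standing in for {{model}}: takes the (truncated)
# input tokens and an output-token cap, and returns the generated tokens.
ModelFn = Callable[[List[str], int], List[str]]


@dataclass
class Example:
    prompt: str  # hypothetical item from {{evaluation_set}}
    answer: str  # expected answer, scored by exact match (assumption)


def tokenize(text: str) -> List[str]:
    # Assumption: whitespace splitting stands in for the model's real tokenizer.
    return text.split()


def evaluate_with_budget(model: ModelFn, examples: List[Example], token_budget: int,
                         input_tokens: int, output_tokens: int) -> Tuple[float, int, int]:
    """Single-turn evaluation under per-query caps and a total token budget.

    Returns (accuracy over the full set, queries answered, tokens spent).
    Queries not reached before the budget runs out count as incorrect.
    """
    spent = correct = answered = 0
    for ex in examples:
        prompt = tokenize(ex.prompt)[:input_tokens]  # enforce the input cap
        # Reserve the worst-case output so the total budget is never exceeded;
        # stop at the first query that cannot fit.
        if spent + len(prompt) + output_tokens > token_budget:
            break
        out = model(prompt, output_tokens)[:output_tokens]  # enforce the output cap
        spent += len(prompt) + len(out)
        answered += 1
        if " ".join(out) == ex.answer:
            correct += 1
    accuracy = correct / len(examples) if examples else 0.0
    return accuracy, answered, spent


def best_allocation(model: ModelFn, examples: List[Example], token_budget: int,
                    input_caps: Sequence[int],
                    output_caps: Sequence[int]) -> Tuple[float, int, int]:
    """Grid search over per-query caps; returns (accuracy, input_cap, output_cap)."""
    return max(((evaluate_with_budget(model, examples, token_budget, i, o)[0], i, o)
                for i in input_caps for o in output_caps), key=lambda t: t[0])


# Usage with a toy model that answers with its last input token.
def last_token_model(tokens: List[str], max_out: int) -> List[str]:
    return tokens[-1:][:max_out]


data = [Example("q1 a", "a"), Example("q2 b", "b"), Example("x x x x x c", "c")]
constrained = evaluate_with_budget(last_token_model, data, 100, 3, 1)
baseline = evaluate_with_budget(last_token_model, data, 10**12, 10**6, 10**6)
```

Comparing `constrained` with `baseline` shows what the caps cost: with a three-token input cap, the third prompt is truncated before its answer token, so the toy model gets it wrong.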
Details
Category: Analysis
Use Cases: Budget evaluation, Token optimization, Constrained testing
Works Best With: claude-opus-4.5, gpt-5.2, gemini-2.0-flash