Toxicity Detection Calibration

Calibrate toxicity detection with labeled examples.

Detect toxicity in {{content_samples}} using {{model}}.

Calibration samples:
- "{{toxic_example}}" → Toxic (harassment)
- "{{borderline_example}}" → Borderline (aggressive tone)
- "{{safe_example}}" → Safe

Classify the remaining content and compare the results against {{human_labels}}.
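
A minimal end-to-end sketch in Python of how this template might be used, assuming a batch-moderation workflow: the `render` helper, the placeholder values, and the from-scratch `cohen_kappa` function are all illustrative, not part of the published prompt. The actual model call is omitted; wire in whichever client you use and collect one label per sample.

```python
from collections import Counter

# The prompt template as published above.
PROMPT_TEMPLATE = """Detect toxicity in {{content_samples}} using {{model}}.

Calibration samples:
- "{{toxic_example}}" → Toxic (harassment)
- "{{borderline_example}}" → Borderline (aggressive tone)
- "{{safe_example}}" → Safe

Classify the remaining content and compare the results against {{human_labels}}."""


def render(template: str, values: dict[str, str]) -> str:
    """Fill {{key}} placeholders; unreplaced keys stay visible as a warning."""
    for key, value in values.items():
        template = template.replace("{{" + key + "}}", value)
    return template


def cohen_kappa(model_labels: list[str], human_labels: list[str]) -> float:
    """Chance-corrected agreement between model and human label sequences."""
    n = len(model_labels)
    observed = sum(m == h for m, h in zip(model_labels, human_labels)) / n
    m_freq = Counter(model_labels)
    h_freq = Counter(human_labels)
    expected = sum(m_freq[c] * h_freq[c] for c in m_freq) / (n * n)
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0


if __name__ == "__main__":
    # Illustrative placeholder values only; substitute your own dataset.
    prompt = render(PROMPT_TEMPLATE, {
        "model": "claude-opus-4.5",
        "toxic_example": "Everyone who posts here is a worthless idiot",
        "borderline_example": "Nobody asked for your opinion, so keep it to yourself",
        "safe_example": "I strongly disagree with this policy",
        "content_samples": "the attached batch of 200 comments",
        "human_labels": "the reviewer-assigned labels",
    })
    print(prompt)

    # Toy comparison: model outputs vs. human labels for five samples.
    model_out = ["Toxic", "Safe", "Borderline", "Safe", "Toxic"]
    human = ["Toxic", "Safe", "Safe", "Safe", "Toxic"]
    print(f"kappa = {cohen_kappa(model_out, human):.2f}")  # 0.67
```

Cohen's kappa is a better calibration signal than raw accuracy here because it corrects for chance agreement, which matters when one class (usually Safe) dominates the sample.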

Details

Category: Analysis

Use Cases: Toxicity calibration · Content moderation · Safety testing

Works Best With: claude-opus-4.5 · gpt-5.2 · gemini-2.0-flash