Toxicity Detection Calibration

Calibrate toxicity detection with labeled examples.

Detect toxicity in {{content_samples}} using {{model}}.

Calibration samples:
- "{{toxic_example}}" → Toxic (harassment)
- "{{borderline_example}}" → Borderline (aggressive tone)
- "{{safe_example}}" → Safe

Classify the remaining content and compare the results against {{human_labels}}.
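
A minimal end-to-end sketch in Python of how this template might be used, assuming a batch-moderation workflow: the `render` helper, the placeholder values, and the from-scratch `cohen_kappa` function are all illustrative, not part of the published prompt. The actual model call is omitted; wire in whichever client you use and collect one label per sample.

```python
from collections import Counter

# The prompt template as published above.
PROMPT_TEMPLATE = """Detect toxicity in {{content_samples}} using {{model}}.

Calibration samples:
- "{{toxic_example}}" → Toxic (harassment)
- "{{borderline_example}}" → Borderline (aggressive tone)
- "{{safe_example}}" → Safe

Classify the remaining content and compare the results against {{human_labels}}."""


def render(template: str, values: dict[str, str]) -> str:
    """Fill {{key}} placeholders; unreplaced keys stay visible as a warning."""
    for key, value in values.items():
        template = template.replace("{{" + key + "}}", value)
    return template


def cohen_kappa(model_labels: list[str], human_labels: list[str]) -> float:
    """Chance-corrected agreement between model and human label sequences."""
    n = len(model_labels)
    observed = sum(m == h for m, h in zip(model_labels, human_labels)) / n
    m_freq = Counter(model_labels)
    h_freq = Counter(human_labels)
    expected = sum(m_freq[c] * h_freq[c] for c in m_freq) / (n * n)
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0


if __name__ == "__main__":
    # Illustrative placeholder values only; substitute your own dataset.
    prompt = render(PROMPT_TEMPLATE, {
        "model": "claude-opus-4.5",
        "toxic_example": "Everyone who posts here is a worthless idiot",
        "borderline_example": "Nobody asked for your opinion, so keep it to yourself",
        "safe_example": "I strongly disagree with this policy",
        "content_samples": "the attached batch of 200 comments",
        "human_labels": "the reviewer-assigned labels",
    })
    print(prompt)

    # Toy comparison: model outputs vs. human labels for five samples.
    model_out = ["Toxic", "Safe", "Borderline", "Safe", "Toxic"]
    human = ["Toxic", "Safe", "Safe", "Safe", "Toxic"]
    print(f"kappa = {cohen_kappa(model_out, human):.2f}")  # 0.67
```

Cohen's kappa is a better calibration signal than raw accuracy here because it corrects for chance agreement, which matters when one class (usually Safe) dominates the sample.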

Details

Category: Analysis

Use Cases: Toxicity calibration · Content moderation · Safety testing

Works Best With: claude-opus-4.5 · gpt-5.2 · gemini-2.0-flash