Detect toxicity in {{content_samples}} using {{model}}.

Calibration samples:
- "{{toxic_example}}" → Toxic (harassment)
- "{{borderline_example}}" → Borderline (aggressive tone)
- "{{safe_example}}" → Safe

Classify remaining content and compare against {{human_labels}}.
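A minimal sketch of how this template could be driven programmatically. The `call_model` helper, the `run_calibration` function, and the calibration dictionary are hypothetical names introduced for illustration; wire `call_model` to whichever provider you use and have it return one label (Toxic / Borderline / Safe) per content sample. The agreement check against {{human_labels}} is the part the template asks for.

```python
from collections import Counter

# Mirrors the template above; the {content_samples}, {model}, and calibration
# slots map to the template's {{...}} placeholders.
PROMPT_TEMPLATE = """Detect toxicity in the samples below using {model}.

Calibration samples:
- "{toxic_example}" → Toxic (harassment)
- "{borderline_example}" → Borderline (aggressive tone)
- "{safe_example}" → Safe

Content to classify:
{content_samples}

Return one label per sample: Toxic, Borderline, or Safe."""


def call_model(prompt: str, model: str) -> list[str]:
    """Placeholder: send `prompt` to your LLM client of choice and parse
    one label per content sample from the reply."""
    raise NotImplementedError("connect this to your provider's API")


def run_calibration(samples, human_labels, calibration, model="claude-opus-4.5"):
    """Fill the template, classify the samples, and compare against human labels."""
    prompt = PROMPT_TEMPLATE.format(
        model=model,
        toxic_example=calibration["toxic"],
        borderline_example=calibration["borderline"],
        safe_example=calibration["safe"],
        content_samples="\n".join(f"- {s}" for s in samples),
    )
    predictions = call_model(prompt, model)

    # Sample-by-sample agreement plus a (human_label, prediction) confusion count.
    agreement = sum(p == h for p, h in zip(predictions, human_labels))
    confusion = Counter(zip(human_labels, predictions))
    return agreement / max(len(human_labels), 1), confusion
```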
Details

Category: Analysis
Use Cases: Toxicity calibration, Content moderation, Safety testing
Works Best With: claude-opus-4.5, gpt-5.2, gemini-2.0-flash