Synthetic Data Generator for RAG Testing

U

@

·

Build a synthetic data generator for RAG testing covering document generation, QA pairs, and edge cases with quality validation.

94 copies0 forks
Build a synthetic data generator for testing RAG systems.

## Domain
{{domain_description}}

## Test Scenarios
{{test_scenarios}}

## Data Requirements
{{data_requirements}}

Create a comprehensive generator:

```python
class SyntheticRAGDataGenerator:
    def generate_documents(self, count: int, config: DocConfig) -> List[Document]:
        """Generate realistic documents"""
        pass
    
    def generate_qa_pairs(self, documents: List[Document], count: int) -> List[QAPair]:
        """Generate question-answer pairs with citations"""
        pass
    
    def generate_edge_cases(self, scenario: str) -> List[TestCase]:
        """Generate challenging test cases"""
        pass
```

Edge cases to generate:
- Multi-hop reasoning queries
- Ambiguous questions
- Out-of-scope queries
- Adversarial inputs
- Long-tail topics

Include quality validation for generated data.

Details

Category

Coding

Use Cases

Test data generationRAG testingQuality assurance

Works Best With

claude-sonnet-4-20250514gpt-4o
Created Shared

Create your own prompt vault and start sharing

Synthetic Data Generator for RAG Testing | Promptsy