Contextual Bandits for Model Selection

Implement contextual bandits for adaptive LLM model selection using LinUCB or Thompson Sampling with online learning updates.

Implement contextual bandits for adaptive model selection.

## Model Options
{{model_options}}

## Context Features
{{context_features}}

## Optimization Goal
{{optimization_goal}}

Build the bandit system:

```python
from dataclasses import dataclass
from typing import Dict, List

import numpy as np


@dataclass
class ModelStats:
    """Per-model selection statistics."""
    pulls: int
    mean_reward: float
    confidence_width: float


class ModelSelectionBandit:
    def __init__(self, models: List[str], context_dim: int):
        pass

    def select_model(self, context: np.ndarray, exploration_rate: float) -> str:
        """
        Choose a model for the given context. Candidate algorithms:
        - LinUCB
        - Thompson Sampling
        - Epsilon-greedy
        """
        pass

    def update(self, context: np.ndarray, model: str, reward: float) -> None:
        """Update the selected model's estimates with the observed reward."""
        pass

    def get_model_stats(self) -> Dict[str, ModelStats]:
        """Return per-model selection counts, mean rewards, and confidence."""
        pass
```
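Of the three candidate algorithms, LinUCB is the most common starting point for contextual selection. A minimal sketch of how `select_model` and `update` could work under LinUCB (the class name `LinUCB` and the default `alpha` are illustrative choices, not part of the prompt's spec):

```python
import numpy as np


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm.

    Score for arm a: theta_a @ x + alpha * sqrt(x @ A_a^{-1} @ x),
    i.e. estimated reward plus an exploration bonus that shrinks
    as the arm accumulates observations.
    """

    def __init__(self, arms, context_dim, alpha=1.0):
        self.arms = list(arms)
        self.alpha = alpha
        # A_a = d x d design matrix (ridge prior = identity), b_a = reward-weighted context sum
        self.A = {a: np.eye(context_dim) for a in self.arms}
        self.b = {a: np.zeros(context_dim) for a in self.arms}

    def select(self, context: np.ndarray) -> str:
        scores = {}
        for a in self.arms:
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]                      # ridge estimate of reward weights
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores[a] = theta @ context + bonus
        return max(scores, key=scores.get)

    def update(self, context: np.ndarray, arm: str, reward: float) -> None:
        # Rank-one update of the design matrix and reward vector
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```

With `alpha` small, the bandit quickly converges to the empirically best arm for a given context; larger `alpha` keeps exploring longer.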

Include:
- Reward function design
- Exploration vs exploitation tuning
- Cold start handling
- Online learning updates
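To make the reward-design and cold-start bullets concrete, here is one possible sketch. The penalty caps, weights, and `min_pulls` threshold are assumptions to tune for your workload, not values specified by the prompt:

```python
import math
import random


def compute_reward(quality: float, latency_s: float, cost_usd: float,
                   w_latency: float = 0.2, w_cost: float = 0.3) -> float:
    """Blend answer quality (0-1) with capped latency and cost penalties."""
    latency_penalty = min(latency_s / 10.0, 1.0)   # assumption: 10 s counts as very slow
    cost_penalty = min(cost_usd / 0.05, 1.0)       # assumption: $0.05/request counts as expensive
    return quality - w_latency * latency_penalty - w_cost * cost_penalty


def pick_with_cold_start(stats: dict, min_pulls: int = 5,
                         eps0: float = 0.5, decay: float = 0.01) -> str:
    """Force-explore under-sampled models, then epsilon-greedy with decaying epsilon.

    `stats` maps model name -> (pulls, mean_reward).
    """
    # Cold start: any model with too few observations is tried before exploiting.
    cold = [m for m, (pulls, _) in stats.items() if pulls < min_pulls]
    if cold:
        return random.choice(cold)
    total_pulls = sum(pulls for pulls, _ in stats.values())
    eps = eps0 * math.exp(-decay * total_pulls)    # exploration rate decays as data accumulates
    if random.random() < eps:
        return random.choice(list(stats))
    return max(stats, key=lambda m: stats[m][1])   # exploit the best mean reward
```

The online-learning loop then amounts to: select a model, observe quality/latency/cost, fold them into `compute_reward`, and feed the result back through the bandit's `update`.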

## Details

- **Category:** Coding
- **Use cases:** Adaptive selection, model optimization, online learning
- **Works best with:** claude-sonnet-4-20250514, gpt-4o