# Global Configuration
This guide covers the configuration options for the Semantic Router. The system uses a single YAML configuration file that controls all aspects of routing, classification, and security.
## Configuration File

The configuration file is located at `config/config.yaml`. Here is the full structure, based on the actual implementation:

```yaml
# config/config.yaml - Actual configuration structure
# BERT model for semantic similarity
bert_model:
  model_id: sentence-transformers/all-MiniLM-L12-v2
  threshold: 0.6
  use_cpu: true
# Semantic caching
semantic_cache:
  backend_type: "memory"  # Options: "memory" or "milvus"
  enabled: false
  similarity_threshold: 0.8
  max_entries: 1000
  ttl_seconds: 3600
  eviction_policy: "fifo"  # Options: "fifo", "lru", "lfu"
# Tool auto-selection
tools:
  enabled: false
  top_k: 3
  similarity_threshold: 0.2
  tools_db_path: "config/tools_db.json"
  fallback_to_empty: true
# Jailbreak protection
prompt_guard:
  enabled: false
  use_modernbert: true
  model_id: "models/jailbreak_classifier_modernbert-base_model"
  threshold: 0.7
  use_cpu: true
# vLLM endpoints - your backend models
vllm_endpoints:
  - name: "endpoint1"
    address: "your-server.com"  # Replace with your server
    port: 11434
    models:
      - "your-model"           # Replace with your model
    weight: 1
# Model configuration
model_config:
  "your-model":
    pii_policy:
      allow_by_default: true
      pii_types_allowed: ["EMAIL_ADDRESS", "PERSON"]
    preferred_endpoints: ["endpoint1"]
# Classification models
classifier:
  category_model:
    model_id: "models/category_classifier_modernbert-base_model"
    use_modernbert: true
    threshold: 0.6
    use_cpu: true
  pii_model:
    model_id: "models/pii_classifier_modernbert-base_presidio_token_model"
    use_modernbert: true
    threshold: 0.7
    use_cpu: true
# Categories and routing rules
categories:
- name: math
  use_reasoning: true  # Enable reasoning for math
  model_scores:
  - model: your-model
    score: 1.0
- name: computer science
  use_reasoning: true  # Enable reasoning for code
  model_scores:
  - model: your-model
    score: 1.0
- name: other
  use_reasoning: false # No reasoning for general queries
  model_scores:
  - model: your-model
    score: 0.8
default_model: your-model
# Reasoning family configurations - define how different model families handle reasoning syntax
reasoning_families:
  deepseek:
    type: "chat_template_kwargs"
    parameter: "thinking"
  
  qwen3:
    type: "chat_template_kwargs"
    parameter: "enable_thinking"
  
  gpt-oss:
    type: "reasoning_effort"
    parameter: "reasoning_effort"
  
  gpt:
    type: "reasoning_effort"
    parameter: "reasoning_effort"
# Global default reasoning effort level
default_reasoning_effort: "medium"
```

Note that `model_config` is a single top-level mapping; YAML does not allow the same key to appear twice in one file. To assign a reasoning family, add a `reasoning_family` field to the model's entry in the `model_config` section shown above (as done for `your-model`). Some examples:

```yaml
model_config:
  # Example: DeepSeek model with custom name
  "ds-v31-custom":
    reasoning_family: "deepseek"  # This model uses DeepSeek reasoning syntax
    preferred_endpoints: ["endpoint1"]

  # Example: Qwen3 model with custom name
  "my-qwen3-model":
    reasoning_family: "qwen3"     # This model uses Qwen3 reasoning syntax
    preferred_endpoints: ["endpoint2"]

  # Example: Model without reasoning support
  "phi4":
    # No reasoning_family field - this model doesn't support reasoning mode
    preferred_endpoints: ["endpoint1"]
```
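At request time, the router translates these settings into the model family's reasoning syntax. The following is a sketch of the injected fields, inferred from the family definitions above rather than an exact wire format:

```yaml
# Family of type "chat_template_kwargs" (e.g. "ds-v31-custom", parameter "thinking"):
chat_template_kwargs:
  thinking: true

# Family of type "reasoning_effort" (e.g. "gpt-oss"), using default_reasoning_effort:
reasoning_effort: "medium"
```

A model with no `reasoning_family` (like `phi4` above) receives no injected fields, even when its category sets `use_reasoning: true`.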
## Key Configuration Sections
### Backend Endpoints

Configure your LLM servers:

```yaml
vllm_endpoints:
  - name: "my_endpoint"
    address: "127.0.0.1"  # Your server IP
    port: 8000                # Your server port
    models:
      - "llama2-7b"          # Model name
    weight: 1                 # Load balancing weight
```
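Multiple endpoints can serve the same model, with `weight` controlling load balancing. A hypothetical two-server setup (names and addresses are placeholders), assuming weights are applied proportionally:

```yaml
vllm_endpoints:
  - name: "gpu-a"
    address: "10.0.0.10"
    port: 8000
    models:
      - "llama2-7b"
    weight: 2                 # Receives roughly twice the traffic of gpu-b
  - name: "gpu-b"
    address: "10.0.0.11"
    port: 8000
    models:
      - "llama2-7b"
    weight: 1
```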
### Model Settings

Configure model-specific settings:

```yaml
model_config:
  "llama2-7b":
    pii_policy:
      allow_by_default: true    # Allow PII by default
      pii_types_allowed: ["EMAIL_ADDRESS", "PERSON"]
    preferred_endpoints: ["my_endpoint"]
```
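The `pii_policy` can also be inverted into an allow-list. A sketch of a more restrictive policy, using the same fields as above:

```yaml
model_config:
  "llama2-7b":
    pii_policy:
      allow_by_default: false                # Treat detected PII as disallowed unless listed
      pii_types_allowed: ["EMAIL_ADDRESS"]   # Only email addresses may pass through
    preferred_endpoints: ["my_endpoint"]
```

With `allow_by_default: false`, any detected PII type not listed in `pii_types_allowed` is treated as disallowed for this model.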