
DSPy Ruby

A DSPy framework implementation in Ruby, for programmatic LLM pipelines


DSPy.rb

Build LLM apps like you build software. Type-safe, modular, testable.

DSPy.rb brings software engineering best practices to LLM development. Instead of tweaking prompts, define what you want with Ruby types and let DSPy handle the rest.

Overview

DSPy.rb is a Ruby framework for building language model applications with programmatic prompts. It provides:

  • Type-safe signatures — Define inputs/outputs with Sorbet types
  • Modular components — Compose and reuse LLM logic
  • Automatic optimization — Use data to improve prompts, not guesswork
  • Production-ready — Built-in observability, testing, and error handling

Core Concepts

1. Signatures

Define interfaces between your app and LLMs using Ruby types:

class EmailClassifier < DSPy::Signature
  description "Classify customer support emails by category and priority"

  class Priority < T::Enum
    enums do
      Low = new('low')
      Medium = new('medium')
      High = new('high')
      Urgent = new('urgent')
    end
  end

  input do
    const :email_content, String
    const :sender, String
  end

  output do
    const :category, String
    const :priority, Priority  # Type-safe enum with defined values
    const :confidence, Float
  end
end

2. Modules

Build complex workflows from simple building blocks:

  • Predict — Basic LLM calls with signatures
  • ChainOfThought — Step-by-step reasoning
  • ReAct — Tool-using agents
  • CodeAct — Dynamic code generation agents (install the dspy-code_act gem)
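For instance, a ReAct agent is constructed from a signature plus tool instances. QuestionAnswer here is a hypothetical signature; the constructor shape mirrors the agent examples later in this document, and the tool is the CalculatorTool shown in the next section:

```ruby
# Hypothetical QuestionAnswer signature; CalculatorTool as defined below.
agent = DSPy::ReAct.new(QuestionAnswer, tools: [CalculatorTool.new])
result = agent.call(question: "What is 15% of 240?")
```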

3. Tools & Toolsets

Create type-safe tools for agents with comprehensive Sorbet support:

# Enum-based tool with automatic type conversion
class CalculatorTool < DSPy::Tools::Base
  tool_name 'calculator'
  tool_description 'Performs arithmetic operations with type-safe enum inputs'

  class Operation < T::Enum
    enums do
      Add = new('add')
      Subtract = new('subtract')
      Multiply = new('multiply')
      Divide = new('divide')
    end
  end

  sig { params(operation: Operation, num1: Float, num2: Float).returns(T.any(Float, String)) }
  def call(operation:, num1:, num2:)
    case operation
    when Operation::Add then num1 + num2
    when Operation::Subtract then num1 - num2
    when Operation::Multiply then num1 * num2
    when Operation::Divide
      return "Error: Division by zero" if num2 == 0
      num1 / num2
    end
  end
end

# Multi-tool toolset with rich types
class DataToolset < DSPy::Tools::Toolset
  toolset_name "data_processing"

  class Format < T::Enum
    enums do
      JSON = new('json')
      CSV = new('csv')
      XML = new('xml')
    end
  end

  tool :convert, description: "Convert data between formats"
  tool :validate, description: "Validate data structure"

  sig { params(data: String, from: Format, to: Format).returns(String) }
  def convert(data:, from:, to:)
    "Converted from #{from.serialize} to #{to.serialize}"
  end

  sig { params(data: String, format: Format).returns(T::Hash[String, T.any(String, Integer, T::Boolean)]) }
  def validate(data:, format:)
    { valid: true, format: format.serialize, row_count: 42, message: "Data validation passed" }
  end
end

4. Type System & Discriminators

DSPy.rb uses sophisticated type discrimination for complex data structures:

  • Automatic _type field injection — DSPy adds discriminator fields to structs for type safety
  • Union type support — T.any() types automatically disambiguated by _type
  • Reserved field name — Avoid defining your own _type fields in structs
  • Recursive filtering — _type fields filtered during deserialization at all nesting levels
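As an illustration, the JSON payload the model returns for a union-typed field might look like the following. The CreateTask struct name is hypothetical; DSPy injects the _type discriminator into each struct's schema and uses it to pick the class on deserialization:

```ruby
require 'json'

# What the model might send back for a T.any(CreateTask, DeleteTask) field.
raw = '{"_type": "CreateTask", "title": "Q4 review", "priority": "high"}'
payload = JSON.parse(raw)
payload.fetch("_type")  # selects which struct to instantiate
```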

5. Optimization

Improve accuracy with real data:

  • MIPROv2 — Advanced multi-prompt optimization with bootstrap sampling and Bayesian optimization
  • GEPA — Genetic-Pareto Reflective Prompt Evolution with feedback maps, experiment tracking, and telemetry
  • Evaluation — Comprehensive framework with built-in and custom metrics, error handling, and batch processing
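A MIPROv2 run might be wired like the following sketch, assuming MIPROv2 lives under DSPy::Teleprompt like GEPA and exposes a parallel compile interface (verify the exact options against the optimizer docs):

```ruby
optimizer = DSPy::Teleprompt::MIPROv2.new(metric: metric)
result = optimizer.compile(program, trainset: train, valset: val)
optimized = result.optimized_program
```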

Quick Start

# Install
gem 'dspy'

# Configure
DSPy.configure do |c|
  c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
end

# Define a task
class SentimentAnalysis < DSPy::Signature
  description "Analyze sentiment of text"

  input do
    const :text, String
  end

  output do
    const :sentiment, String  # positive, negative, neutral
    const :score, Float       # 0.0 to 1.0
  end
end

# Use it
analyzer = DSPy::Predict.new(SentimentAnalysis)
result = analyzer.call(text: "This product is amazing!")
puts result.sentiment  # => "positive"
puts result.score      # => 0.92

Provider Adapter Gems

Two strategies for connecting to LLM providers:

Per-provider adapters (direct SDK access)

# Gemfile
gem 'dspy'
gem 'dspy-openai'    # OpenAI, OpenRouter, Ollama
gem 'dspy-anthropic' # Claude
gem 'dspy-gemini'    # Gemini

Each adapter gem pulls in the official SDK (openai, anthropic, gemini-ai).

Unified adapter (routing via ruby_llm)

# Gemfile
gem 'dspy'
gem 'dspy-ruby_llm'  # Routes to any provider via ruby_llm
gem 'ruby_llm'

RubyLLM handles provider routing based on the model name. Use the ruby_llm/ prefix:

DSPy.configure do |c|
  c.lm = DSPy::LM.new('ruby_llm/gemini-2.5-flash', structured_outputs: true)
  # c.lm = DSPy::LM.new('ruby_llm/claude-sonnet-4-20250514', structured_outputs: true)
  # c.lm = DSPy::LM.new('ruby_llm/gpt-4o-mini', structured_outputs: true)
end

Events System

DSPy.rb ships with a structured event bus for observing runtime behavior.

Module-Scoped Subscriptions (preferred for agents)

class MyAgent < DSPy::Module
  subscribe 'lm.tokens', :track_tokens, scope: :descendants

  def track_tokens(_event, attrs)
    @total_tokens += attrs.fetch(:total_tokens, 0)
  end
end

Global Subscriptions (for observability/integrations)

subscription_id = DSPy.events.subscribe('score.create') do |event, attrs|
  Langfuse.export_score(attrs)
end

# Wildcards supported
DSPy.events.subscribe('llm.*') { |name, attrs| puts "[#{name}] tokens=#{attrs[:total_tokens]}" }

Event names use dot-separated namespaces (llm.generate, react.iteration_complete). Every event includes module metadata (module_path, module_leaf, module_scope.ancestry_token) for filtering.

Lifecycle Callbacks

Rails-style lifecycle hooks ship with every DSPy::Module:

  • before — Runs ahead of forward for setup (metrics, context loading)
  • around — Wraps forward, calls yield, and lets you pair setup/teardown logic
  • after — Fires after forward returns for cleanup or persistence

class InstrumentedModule < DSPy::Module
  before :setup_metrics
  around :manage_context
  after :log_metrics

  def forward(question:)
    @predictor.call(question: question)
  end

  private

  def setup_metrics
    @start_time = Time.now
  end

  def manage_context
    load_context
    result = yield
    save_context
    result
  end

  def log_metrics
    duration = Time.now - @start_time
    Rails.logger.info "Prediction completed in #{duration}s"
  end
end

Execution order: before → around (before yield) → forward → around (after yield) → after. Callbacks are inherited from parent classes and execute in registration order.

Fiber-Local LM Context

Override the language model temporarily using fiber-local storage:

fast_model = DSPy::LM.new("openai/gpt-4o-mini", api_key: ENV['OPENAI_API_KEY'])

DSPy.with_lm(fast_model) do
  result = classifier.call(text: "test")  # Uses fast_model inside this block
end
# Back to global LM outside the block

LM resolution hierarchy: Instance-level LM → Fiber-local LM (DSPy.with_lm) → Global LM (DSPy.configure).

Use configure_predictor for fine-grained control over agent internals:

agent = DSPy::ReAct.new(MySignature, tools: tools)
agent.configure { |c| c.lm = default_model }
agent.configure_predictor('thought_generator') { |c| c.lm = powerful_model }

Evaluation Framework

Systematically test LLM application performance with DSPy::Evals:

metric = DSPy::Metrics.exact_match(field: :answer, case_sensitive: false)
evaluator = DSPy::Evals.new(predictor, metric: metric)
result = evaluator.evaluate(test_examples, display_table: true)
puts "Pass Rate: #{(result.pass_rate * 100).round(1)}%"

Built-in metrics: exact_match, contains, numeric_difference, composite_and. Custom metrics return true/false or a DSPy::Prediction with score: and feedback: fields.

Use DSPy::Example for typed test data and export_scores: true to push results to Langfuse.
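A boolean custom metric is just a callable taking an example and a prediction. This sketch assumes the example exposes an expected_answer field; for graded feedback, return a DSPy::Prediction with score: and feedback: instead:

```ruby
# Boolean custom metric: case-insensitive exact match on the answer field.
# `example` and `prediction` are duck-typed; any objects responding to
# #expected_answer / #answer will work.
exact_answer = ->(example, prediction) do
  example.expected_answer.to_s.strip.casecmp?(prediction.answer.to_s.strip)
end
```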

GEPA Optimization

GEPA (Genetic-Pareto Reflective Prompt Evolution) uses reflection-driven instruction rewrites:

gem 'dspy-gepa'

teleprompter = DSPy::Teleprompt::GEPA.new(
  metric: metric,
  reflection_lm: DSPy::ReflectionLM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']),
  feedback_map: feedback_map,
  config: { max_metric_calls: 600, minibatch_size: 6 }
)

result = teleprompter.compile(program, trainset: train, valset: val)
optimized_program = result.optimized_program

The metric must return DSPy::Prediction.new(score:, feedback:) so the reflection model can reason about failures. Use feedback_map to target individual predictors in composite modules.

Typed Context Pattern

Replace opaque string context blobs with T::Struct inputs. Each field gets its own description: annotation in the JSON schema the LLM sees:

class NavigationContext < T::Struct
  const :workflow_hint, T.nilable(String),
        description: "Current workflow phase guidance for the agent"
  const :action_log, T::Array[String], default: [],
        description: "Compact one-line-per-action history of research steps taken"
  const :iterations_remaining, Integer,
        description: "Budget remaining. Each tool call costs 1 iteration."
end

class ToolSelectionSignature < DSPy::Signature
  input do
    const :query, String
    const :context, NavigationContext  # Structured, not an opaque string
  end

  output do
    const :tool_name, String
    const :tool_args, String, description: "JSON-encoded arguments"
  end
end

Benefits: type safety at compile time, per-field descriptions in the LLM schema, easy to test as value objects, extensible by adding const declarations.

Schema Formats (BAML / TOON)

Control how DSPy describes signature structure to the LLM:

  • JSON Schema (default) — Standard format, works with structured_outputs: true
  • BAML (schema_format: :baml) — 84% token reduction for Enhanced Prompting mode. Requires sorbet-baml gem.
  • TOON (schema_format: :toon, data_format: :toon) — Table-oriented format for both schemas and data. Enhanced Prompting mode only.

BAML and TOON apply only when structured_outputs: false. With structured_outputs: true, the provider receives JSON Schema directly.
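Enabling a non-default format is a per-LM setting; for example, BAML:

```ruby
DSPy.configure do |c|
  c.lm = DSPy::LM.new('openai/gpt-4o-mini',
    api_key: ENV['OPENAI_API_KEY'],
    schema_format: :baml  # requires the sorbet-baml gem
  )
end
```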

Storage System

Persist and reload optimized programs with DSPy::Storage::ProgramStorage:

storage = DSPy::Storage::ProgramStorage.new(storage_path: "./dspy_storage")
storage.save_program(result.optimized_program, result, metadata: { optimizer: 'MIPROv2' })

Supports checkpoint management, optimization history tracking, and import/export between environments.
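Reloading a saved program might look like this sketch; load_program and the program_id handle are assumptions here, as the counterpart to save_program (verify the exact method name against the storage docs):

```ruby
storage = DSPy::Storage::ProgramStorage.new(storage_path: "./dspy_storage")
program = storage.load_program(program_id)  # program_id: hypothetical handle from the save step
```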

Rails Integration

Directory Structure

Organize DSPy components using Rails conventions:

app/
  entities/          # T::Struct types shared across signatures
  signatures/        # DSPy::Signature definitions
  tools/             # DSPy::Tools::Base implementations
    concerns/        # Shared tool behaviors (error handling, etc.)
  modules/           # DSPy::Module orchestrators
  services/          # Plain Ruby services that compose DSPy modules
config/
  initializers/
    dspy.rb          # DSPy + provider configuration
    feature_flags.rb # Model selection per role
spec/
  signatures/        # Schema validation tests
  tools/             # Tool unit tests
  modules/           # Integration tests with VCR
  vcr_cassettes/     # Recorded HTTP interactions

Initializer

# config/initializers/dspy.rb
Rails.application.config.after_initialize do
  next if Rails.env.test? && ENV["DSPY_ENABLE_IN_TEST"].blank?

  RubyLLM.configure do |config|
    config.gemini_api_key = ENV["GEMINI_API_KEY"] if ENV["GEMINI_API_KEY"].present?
    config.anthropic_api_key = ENV["ANTHROPIC_API_KEY"] if ENV["ANTHROPIC_API_KEY"].present?
    config.openai_api_key = ENV["OPENAI_API_KEY"] if ENV["OPENAI_API_KEY"].present?
  end

  model = ENV.fetch("DSPY_MODEL", "ruby_llm/gemini-2.5-flash")
  DSPy.configure do |config|
    config.lm = DSPy::LM.new(model, structured_outputs: true)
    config.logger = Rails.logger
  end

  # Langfuse observability (optional)
  if ENV["LANGFUSE_PUBLIC_KEY"].present? && ENV["LANGFUSE_SECRET_KEY"].present?
    DSPy::Observability.configure!
  end
end

Feature-Flagged Model Selection

Use different models for different roles (fast/cheap for classification, powerful for synthesis):

# config/initializers/feature_flags.rb
module FeatureFlags
  SELECTOR_MODEL = ENV.fetch("DSPY_SELECTOR_MODEL", "ruby_llm/gemini-2.5-flash-lite")
  SYNTHESIZER_MODEL = ENV.fetch("DSPY_SYNTHESIZER_MODEL", "ruby_llm/gemini-2.5-flash")
end

Then override per-tool or per-predictor:

class ClassifyTool < DSPy::Tools::Base
  def call(query:)
    predictor = DSPy::Predict.new(ClassifyQuery)
    predictor.configure { |c| c.lm = DSPy::LM.new(FeatureFlags::SELECTOR_MODEL, structured_outputs: true) }
    predictor.call(query: query)
  end
end

Schema-Driven Signatures

Prefer typed schemas over string descriptions. Let the type system communicate structure to the LLM rather than prose in the signature description.

Entities as Shared Types

Define reusable T::Struct and T::Enum types in app/entities/ and reference them across signatures:

# app/entities/search_strategy.rb
class SearchStrategy < T::Enum
  enums do
    SingleSearch = new("single_search")
    DateDecomposition = new("date_decomposition")
  end
end

# app/entities/scored_item.rb
class ScoredItem < T::Struct
  const :id, String
  const :score, Float, description: "Relevance score 0.0-1.0"
  const :verdict, String, description: "relevant, maybe, or irrelevant"
  const :reason, String, default: ""
end

Schema vs Description: When to Use Each

Use schemas (T::Struct/T::Enum) for:

  • Multi-field outputs with specific types
  • Enums with defined values the LLM must pick from
  • Nested structures, arrays of typed objects
  • Outputs consumed by code (not displayed to users)

Use string descriptions for:

  • Simple single-field outputs where the type is String
  • Natural language generation (summaries, answers)
  • Fields where constraint guidance helps (e.g., description: "YYYY-MM-DD format")

Rule of thumb: If you’d write a case statement on the output, it should be a T::Enum. If you’d call .each on it, it should be T::Array[SomeStruct].
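Applying the rule of thumb: a value the code branches on becomes a T::Enum, and a repeated record becomes a T::Array of a struct. The Verdict, Finding, and ReviewFindings names below are hypothetical:

```ruby
class Verdict < T::Enum
  enums do
    Relevant = new('relevant')
    Maybe = new('maybe')
    Irrelevant = new('irrelevant')
  end
end

class Finding < T::Struct
  const :claim, String
  const :verdict, Verdict  # code will `case` on this
end

class ReviewFindings < DSPy::Signature
  description "Review findings against the source text"

  input do
    const :text, String
  end

  output do
    const :findings, T::Array[Finding]  # code will call .each on this
  end
end
```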

Tool Patterns

Tools That Wrap Predictions

A common pattern: tools encapsulate a DSPy prediction, adding error handling, model selection, and serialization:

class RerankTool < DSPy::Tools::Base
  tool_name "rerank"
  tool_description "Score and rank search results by relevance"

  MAX_ITEMS = 200
  MIN_ITEMS_FOR_LLM = 5

  sig { params(query: String, items: T::Array[T::Hash[Symbol, T.untyped]]).returns(T::Hash[Symbol, T.untyped]) }
  def call(query:, items: [])
    return { scored_items: items, reranked: false } if items.size < MIN_ITEMS_FOR_LLM

    capped_items = items.first(MAX_ITEMS)
    predictor = DSPy::Predict.new(RerankSignature)
    predictor.configure { |c| c.lm = DSPy::LM.new(FeatureFlags::SYNTHESIZER_MODEL, structured_outputs: true) }

    result = predictor.call(query: query, items: capped_items)
    { scored_items: result.scored_items, reranked: true }
  rescue => e
    Rails.logger.warn "[RerankTool] LLM rerank failed: #{e.message}"
    { error: "Rerank failed: #{e.message}", scored_items: items, reranked: false }
  end
end

Key patterns:

  • Short-circuit LLM calls when unnecessary (small data, trivial cases)
  • Cap input size to prevent token overflow
  • Per-tool model selection via configure
  • Graceful error handling with fallback data

Error Handling Concern

module ErrorHandling
  extend ActiveSupport::Concern

  private

  def safe_predict(signature_class, **inputs)
    predictor = DSPy::Predict.new(signature_class)
    yield predictor if block_given?
    predictor.call(**inputs)
  rescue Faraday::Error, Net::HTTPError => e
    Rails.logger.error "[#{self.class.name}] API error: #{e.message}"
    nil
  rescue JSON::ParserError => e
    Rails.logger.error "[#{self.class.name}] Invalid LLM output: #{e.message}"
    nil
  end
end
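A tool might mix the concern in like this (ClassifyTool and ClassifyQuery are hypothetical names, echoing the earlier feature-flag example):

```ruby
class ClassifyTool < DSPy::Tools::Base
  include ErrorHandling

  def call(query:)
    result = safe_predict(ClassifyQuery, query: query)
    result || { error: "classification unavailable" }  # nil on API/parse errors
  end
end
```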

Observability

Tracing with DSPy::Context

Wrap operations in spans for Langfuse/OpenTelemetry visibility:

result = DSPy::Context.with_span(
  operation: "tool_selector.select",
  "dspy.module" => "ToolSelector",
  "tool_selector.tools" => tool_names.join(",")
) do
  @predictor.call(query: query, context: context, available_tools: schemas)
end

Setup for Langfuse

# Gemfile
gem 'dspy-o11y'
gem 'dspy-o11y-langfuse'

# .env
LANGFUSE_PUBLIC_KEY=pk-...
LANGFUSE_SECRET_KEY=sk-...
DSPY_TELEMETRY_BATCH_SIZE=5

Every DSPy::Predict, DSPy::ReAct, and tool call is automatically traced when observability is configured.

Score Reporting

Report evaluation scores to Langfuse:

DSPy.score(name: "relevance", value: 0.85, trace_id: current_trace_id)

Testing

VCR Setup for Rails

VCR.configure do |config|
  config.cassette_library_dir = "spec/vcr_cassettes"
  config.hook_into :webmock
  config.configure_rspec_metadata!
  config.filter_sensitive_data('<GEMINI_API_KEY>') { ENV['GEMINI_API_KEY'] }
  config.filter_sensitive_data('<OPENAI_API_KEY>') { ENV['OPENAI_API_KEY'] }
end

Signature Schema Tests

Test that signatures produce valid schemas without calling any LLM:

RSpec.describe ClassifyResearchQuery do
  it "has required input fields" do
    schema = described_class.input_json_schema
    expect(schema[:required]).to include("query")
  end

  it "has typed output fields" do
    schema = described_class.output_json_schema
    expect(schema[:properties]).to have_key(:search_strategy)
  end
end

Tool Tests with Mocked Predictions

RSpec.describe RerankTool do
  let(:tool) { described_class.new }

  it "skips LLM for small result sets" do
    expect(DSPy::Predict).not_to receive(:new)
    result = tool.call(query: "test", items: [{ id: "1" }])
    expect(result[:reranked]).to be false
  end

  it "calls LLM for large result sets", :vcr do
    items = 10.times.map { |i| { id: i.to_s, title: "Item #{i}" } }
    result = tool.call(query: "relevant items", items: items)
    expect(result[:reranked]).to be true
  end
end

Resources

  • core-concepts.md — Signatures, modules, predictors, type system deep-dive
  • toolsets.md — Tools::Base, Tools::Toolset DSL, type safety, testing
  • providers.md — Provider adapters, RubyLLM, fiber-local LM context, compatibility matrix
  • optimization.md — MIPROv2, GEPA, evaluation framework, storage system
  • observability.md — Event system, dspy-o11y gems, Langfuse, score reporting
  • signature-template.rb — Signature scaffold with T::Enum, Date/Time, defaults, union types
  • module-template.rb — Module scaffold with .call(), lifecycle callbacks, fiber-local LM
  • config-template.rb — Rails initializer with RubyLLM, observability, feature flags

Guidelines for Claude

When helping users with DSPy.rb:

  1. Schema over prose — Define output structure with T::Struct and T::Enum types, not string descriptions
  2. Entities in app/entities/ — Extract shared types so signatures stay thin
  3. Per-tool model selection — Use predictor.configure { |c| c.lm = ... } to pick the right model per task
  4. Short-circuit LLM calls — Skip the LLM for trivial cases (small data, cached results)
  5. Cap input sizes — Prevent token overflow by limiting array sizes before sending to LLM
  6. Test schemas without LLM — Validate input_json_schema and output_json_schema in unit tests
  7. VCR for integration tests — Record real HTTP interactions, never mock LLM responses by hand
  8. Trace with spans — Wrap tool calls in DSPy::Context.with_span for observability
  9. Graceful degradation — Always rescue LLM errors and return fallback data

Signature Best Practices

Keep description concise — The signature description should state the goal, not the field details:

# Good — concise goal
class ParseOutline < DSPy::Signature
  description 'Extract block-level structure from HTML as a flat list of skeleton sections.'

  input do
    const :html, String, description: 'Raw HTML to parse'
  end

  output do
    const :sections, T::Array[Section], description: 'Block elements: headings, paragraphs, code blocks, lists'
  end
end

Use defaults over nilable arrays — For OpenAI structured outputs compatibility:

# Good — works with OpenAI structured outputs
class ASTNode < T::Struct
  const :children, T::Array[ASTNode], default: []
end

Recursive Types with $defs

DSPy.rb supports recursive types in structured outputs using JSON Schema $defs:

class TreeNode < T::Struct
  const :value, String
  const :children, T::Array[TreeNode], default: []  # Self-reference
end

The schema generator automatically creates #/$defs/TreeNode references for recursive types, compatible with OpenAI and Gemini structured outputs.

Field Descriptions for T::Struct

DSPy.rb extends T::Struct to support field-level description: kwargs that flow to JSON Schema:

class ASTNode < T::Struct
  const :node_type, NodeType, description: 'The type of node (heading, paragraph, etc.)'
  const :text, String, default: "", description: 'Text content of the node'
  const :level, Integer, default: 0  # No description — field is self-explanatory
  const :children, T::Array[ASTNode], default: []
end

When to use field descriptions: complex field semantics, enum-like strings, constrained values, nested structs with ambiguous names. When to skip: self-explanatory fields like name, id, url, or boolean flags.

Version

Current: 0.34.3


Reference: Core Concepts

DSPy.rb Core Concepts

Signatures

Signatures define the interface between application code and language models. They specify inputs, outputs, and a task description using Sorbet types for compile-time and runtime type safety.

Structure

class ClassifyEmail < DSPy::Signature
  description "Classify customer support emails by urgency and category"

  input do
    const :subject, String
    const :body, String
  end

  output do
    const :category, String
    const :urgency, String
  end
end

Supported Types

Type               JSON Schema          Notes
String             string               Required string
Integer            integer              Whole numbers
Float              number               Decimal numbers
T::Boolean         boolean              true/false
T::Array[X]        array                Typed arrays
T::Hash[K, V]      object               Typed key-value maps
T.nilable(X)       nullable             Optional fields
Date               string (ISO 8601)    Auto-converted
DateTime           string (ISO 8601)    Preserves timezone
Time               string (ISO 8601)    Converted to UTC

Date and Time Types

Date, DateTime, and Time fields serialize to ISO 8601 strings and auto-convert back to Ruby objects on output.

class EventScheduler < DSPy::Signature
  description "Schedule events based on requirements"

  input do
    const :start_date, Date                  # ISO 8601: YYYY-MM-DD
    const :preferred_time, DateTime          # ISO 8601 with timezone
    const :deadline, Time                    # Converted to UTC
    const :end_date, T.nilable(Date)         # Optional date
  end

  output do
    const :scheduled_date, Date              # String from LLM, auto-converted to Date
    const :event_datetime, DateTime          # Preserves timezone info
    const :created_at, Time                  # Converted to UTC
  end
end

predictor = DSPy::Predict.new(EventScheduler)
result = predictor.call(
  start_date: "2024-01-15",
  preferred_time: "2024-01-15T10:30:45Z",
  deadline: Time.now,
  end_date: nil
)

result.scheduled_date.class  # => Date
result.event_datetime.class  # => DateTime

Timezone conventions follow ActiveRecord: Time objects convert to UTC, DateTime objects preserve timezone, Date objects are timezone-agnostic.

Enums with T::Enum

Define constrained output values using T::Enum classes. Do not use inline T.enum([...]) syntax.

class SentimentAnalysis < DSPy::Signature
  description "Analyze sentiment of text"

  class Sentiment < T::Enum
    enums do
      Positive = new('positive')
      Negative = new('negative')
      Neutral = new('neutral')
    end
  end

  input do
    const :text, String
  end

  output do
    const :sentiment, Sentiment
    const :confidence, Float
  end
end

predictor = DSPy::Predict.new(SentimentAnalysis)
result = predictor.call(text: "This product is amazing!")

result.sentiment              # => #<Sentiment::Positive>
result.sentiment.serialize    # => "positive"
result.confidence             # => 0.92

Enum matching is case-insensitive. The LLM returning "POSITIVE" matches new('positive').

Default Values

Default values work on both inputs and outputs. Input defaults reduce caller boilerplate. Output defaults provide fallbacks when the LLM omits optional fields.

class SmartSearch < DSPy::Signature
  description "Search with intelligent defaults"

  input do
    const :query, String
    const :max_results, Integer, default: 10
    const :language, String, default: "English"
  end

  output do
    const :results, T::Array[String]
    const :total_found, Integer
    const :cached, T::Boolean, default: false
  end
end

search = DSPy::Predict.new(SmartSearch)
result = search.call(query: "Ruby programming")
# max_results defaults to 10, language defaults to "English"
# If LLM omits `cached`, it defaults to false

Field Descriptions

Add description: to any field to guide the LLM on expected content. These descriptions appear in the generated JSON schema sent to the model.

class ASTNode < T::Struct
  const :node_type, String, description: "The type of AST node (heading, paragraph, code_block)"
  const :text, String, default: "", description: "Text content of the node"
  const :level, Integer, default: 0, description: "Heading level 1-6, only for heading nodes"
  const :children, T::Array[ASTNode], default: []
end

ASTNode.field_descriptions[:node_type]  # => "The type of AST node ..."
ASTNode.field_descriptions[:children]   # => nil (no description set)

Field descriptions also work inside signature input and output blocks:

class ExtractEntities < DSPy::Signature
  description "Extract named entities from text"

  input do
    const :text, String, description: "Raw text to analyze"
    const :language, String, default: "en", description: "ISO 639-1 language code"
  end

  output do
    const :entities, T::Array[String], description: "List of extracted entity names"
    const :count, Integer, description: "Total number of unique entities found"
  end
end

Schema Formats

DSPy.rb supports three schema formats for communicating type structure to LLMs.

JSON Schema (default)

Verbose but universally supported. Access via YourSignature.output_json_schema.

BAML Schema

Compact format that reduces schema tokens by 80-85%. Requires the sorbet-baml gem.

DSPy.configure do |c|
  c.lm = DSPy::LM.new('openai/gpt-4o-mini',
    api_key: ENV['OPENAI_API_KEY'],
    schema_format: :baml
  )
end

BAML applies only in Enhanced Prompting mode (structured_outputs: false). When structured_outputs: true, the provider receives JSON Schema directly.

TOON Schema + Data Format

Table-oriented text format that shrinks both schema definitions and prompt values.

DSPy.configure do |c|
  c.lm = DSPy::LM.new('openai/gpt-4o-mini',
    api_key: ENV['OPENAI_API_KEY'],
    schema_format: :toon,
    data_format:   :toon
  )
end

schema_format: :toon replaces the schema block in the system prompt. data_format: :toon renders input values and output templates inside toon fences. Only works with Enhanced Prompting mode. The sorbet-toon gem is included automatically as a dependency.

Recursive Types

Structs that reference themselves produce $defs entries in the generated JSON schema, using $ref pointers to avoid infinite recursion.

class ASTNode < T::Struct
  const :node_type, String
  const :text, String, default: ""
  const :children, T::Array[ASTNode], default: []
end

The schema generator detects the self-reference in T::Array[ASTNode] and emits:

{
  "$defs": {
    "ASTNode": { "type": "object", "properties": { ... } }
  },
  "properties": {
    "children": {
      "type": "array",
      "items": { "$ref": "#/$defs/ASTNode" }
    }
  }
}

Access the schema with accumulated definitions via YourSignature.output_json_schema_with_defs.

Union Types with T.any()

Specify fields that accept multiple types:

output do
  const :result, T.any(Float, String)
end

For struct unions, DSPy.rb automatically adds a _type discriminator field to each struct’s JSON schema. The LLM returns _type in its response, and DSPy converts the hash to the correct struct instance.

class CreateTask < T::Struct
  const :title, String
  const :priority, String
end

class DeleteTask < T::Struct
  const :task_id, String
  const :reason, T.nilable(String)
end

class TaskRouter < DSPy::Signature
  description "Route user request to the appropriate task action"

  input do
    const :request, String
  end

  output do
    const :action, T.any(CreateTask, DeleteTask)
  end
end

result = DSPy::Predict.new(TaskRouter).call(request: "Create a task for Q4 review")
result.action.class  # => CreateTask
result.action.title  # => "Q4 Review"

Pattern matching works on the result:

case result.action
when CreateTask then puts "Creating: #{result.action.title}"
when DeleteTask then puts "Deleting: #{result.action.task_id}"
end

Union types also work inside arrays for heterogeneous collections:

output do
  const :events, T::Array[T.any(LoginEvent, PurchaseEvent)]
end

Limit unions to 2-4 types for reliable LLM comprehension. Use clear struct names since they become the _type discriminator values.


Modules

Modules are composable building blocks that wrap predictors. Define a forward method; invoke the module with .call().

Basic Structure

class SentimentAnalyzer < DSPy::Module
  def initialize
    super
    @predictor = DSPy::Predict.new(SentimentSignature)
  end

  def forward(text:)
    @predictor.call(text: text)
  end
end

analyzer = SentimentAnalyzer.new
result = analyzer.call(text: "I love this product!")

result.sentiment    # => "positive"
result.confidence   # => 0.9

API rules:

  • Invoke modules and predictors with .call(), not .forward().
  • Access result fields with result.field, not result[:field].

Module Composition

Combine multiple modules through explicit method calls in forward:

class DocumentProcessor < DSPy::Module
  def initialize
    super
    @classifier = DocumentClassifier.new
    @summarizer = DocumentSummarizer.new
  end

  def forward(document:)
    classification = @classifier.call(content: document)
    summary = @summarizer.call(content: document)

    {
      document_type: classification.document_type,
      summary: summary.summary
    }
  end
end

Lifecycle Callbacks

Modules support before, after, and around callbacks on forward. Declare them as class-level macros referencing private methods.

Execution order

  1. before callbacks (in registration order)
  2. around callbacks (before yield)
  3. forward method
  4. around callbacks (after yield)
  5. after callbacks (in registration order)

class InstrumentedModule < DSPy::Module
  before :setup_metrics
  after :log_metrics
  around :manage_context

  def initialize
    super
    @predictor = DSPy::Predict.new(MySignature)
    @metrics = {}
  end

  def forward(question:)
    @predictor.call(question: question)
  end

  private

  def setup_metrics
    @metrics[:start_time] = Time.now
  end

  def manage_context
    load_context
    result = yield
    save_context
    result
  end

  def log_metrics
    @metrics[:duration] = Time.now - @metrics[:start_time]
  end
end

Multiple callbacks of the same type execute in registration order. Callbacks inherit from parent classes; parent callbacks run first.

Around callbacks

Around callbacks must call yield to execute the wrapped method and return the result:

# Register with `around :with_retry` in the module class body
def with_retry
  retries = 0
  begin
    yield
  rescue StandardError => e
    retries += 1
    retry if retries < 3
    raise e
  end
end

Instruction Update Contract

Teleprompters (GEPA, MIPROv2) require modules to expose immutable update hooks. Include DSPy::Mixins::InstructionUpdatable and implement with_instruction and with_examples, each returning a new instance:

class SentimentPredictor < DSPy::Module
  include DSPy::Mixins::InstructionUpdatable

  def initialize
    super
    @predictor = DSPy::Predict.new(SentimentSignature)
  end

  def with_instruction(instruction)
    clone = self.class.new
    clone.instance_variable_set(:@predictor, @predictor.with_instruction(instruction))
    clone
  end

  def with_examples(examples)
    clone = self.class.new
    clone.instance_variable_set(:@predictor, @predictor.with_examples(examples))
    clone
  end
end

If a module omits these hooks, teleprompters raise DSPy::InstructionUpdateError instead of silently mutating state.


Predictors

Predictors are execution engines that take a signature and produce structured results from a language model. DSPy.rb provides four predictor types.

Predict

Direct LLM call with typed input/output. Fastest option, lowest token usage.

classifier = DSPy::Predict.new(ClassifyText)
result = classifier.call(text: "Technical document about APIs")

result.sentiment    # => #<Sentiment::Positive>
result.topics       # => ["APIs", "technical"]
result.confidence   # => 0.92

ChainOfThought

Adds a reasoning field to the output automatically. The model generates step-by-step reasoning before the final answer. Do not define a :reasoning field in the signature output when using ChainOfThought.

class SolveMathProblem < DSPy::Signature
  description "Solve mathematical word problems step by step"

  input do
    const :problem, String
  end

  output do
    const :answer, String
    # :reasoning is added automatically by ChainOfThought
  end
end

solver = DSPy::ChainOfThought.new(SolveMathProblem)
result = solver.call(problem: "Sarah has 15 apples. She gives 7 away and buys 12 more.")

result.reasoning  # => "Step by step: 15 - 7 = 8, then 8 + 12 = 20"
result.answer     # => "20 apples"

Use ChainOfThought for complex analysis, multi-step reasoning, or when explainability matters.

ReAct

Reasoning + Action agent that uses tools in an iterative loop. Define tools by subclassing DSPy::Tools::Base. Group related tools with DSPy::Tools::Toolset.

class WeatherTool < DSPy::Tools::Base
  extend T::Sig

  tool_name "weather"
  tool_description "Get weather information for a location"

  sig { params(location: String).returns(String) }
  def call(location:)
    { location: location, temperature: 72, condition: "sunny" }.to_json
  end
end

class TravelSignature < DSPy::Signature
  description "Help users plan travel"

  input do
    const :destination, String
  end

  output do
    const :recommendations, String
  end
end

agent = DSPy::ReAct.new(
  TravelSignature,
  tools: [WeatherTool.new],
  max_iterations: 5
)

result = agent.call(destination: "Tokyo, Japan")
result.recommendations  # => "Visit Senso-ji Temple early morning..."
result.history          # => Array of reasoning steps, actions, observations
result.iterations       # => 3
result.tools_used       # => ["weather"]

Use toolsets to expose multiple tool methods from a single class:

text_tools = DSPy::Tools::TextProcessingToolset.to_tools
agent = DSPy::ReAct.new(MySignature, tools: text_tools)

CodeAct

Think-Code-Observe agent that synthesizes and executes Ruby code. Ships as a separate gem.

# Gemfile
gem 'dspy-code_act', '~> 0.29'

programmer = DSPy::CodeAct.new(ProgrammingSignature, max_iterations: 10)
result = programmer.call(task: "Calculate the factorial of 20")

Predictor Comparison

Predictor        Speed      Token Usage   Best For
Predict          Fastest    Low           Classification, extraction
ChainOfThought   Moderate   Medium-High   Complex reasoning, analysis
ReAct            Slower     High          Multi-step tasks with tools
CodeAct          Slowest    Very High     Dynamic programming, calculations

Concurrent Predictions

Process multiple independent predictions simultaneously using Async::Barrier:

require 'async'
require 'async/barrier'

analyzer = DSPy::Predict.new(ContentAnalyzer)
documents = ["Text one", "Text two", "Text three"]

Async do
  barrier = Async::Barrier.new

  tasks = documents.map do |doc|
    barrier.async { analyzer.call(content: doc) }
  end

  barrier.wait
  predictions = tasks.map(&:wait)

  predictions.each { |p| puts p.sentiment }
end

Add gem 'async', '~> 2.29' to the Gemfile. Handle errors within each barrier.async block to prevent one failure from cancelling others:

barrier.async do
  begin
    analyzer.call(content: doc)
  rescue StandardError => e
    nil
  end
end

Few-Shot Examples and Instruction Tuning

classifier = DSPy::Predict.new(SentimentAnalysis)

examples = [
  DSPy::FewShotExample.new(
    input: { text: "Love it!" },
    output: { sentiment: "positive", confidence: 0.95 }
  )
]

optimized = classifier.with_examples(examples)
tuned = classifier.with_instruction("Be precise and confident.")

Type System

Automatic Type Conversion

DSPy.rb v0.9.0+ automatically converts LLM JSON responses to typed Ruby objects:

  • Enums: String values become T::Enum instances (case-insensitive)
  • Structs: Nested hashes become T::Struct objects
  • Arrays: Elements convert recursively
  • Defaults: Missing fields use declared defaults
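
The case-insensitive enum conversion can be illustrated with plain Ruby. The `coerce_enum` helper below is a hypothetical stand-in (it uses a string list instead of `T::Enum`, and is not DSPy.rb's converter):

```ruby
# Sketch of case-insensitive string-to-enum mapping; illustrative only.
SENTIMENTS = %w[positive negative neutral]

def coerce_enum(value)
  # casecmp? compares strings ignoring case, mirroring the documented behavior
  SENTIMENTS.find { |v| v.casecmp?(value.to_s) } or
    raise ArgumentError, "unknown enum value: #{value}"
end

coerce_enum('Positive')  # => "positive"
coerce_enum('NEUTRAL')   # => "neutral"
```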

Discriminators for Union Types

When a field uses T.any() with struct types, DSPy adds a _type field to each struct’s schema. On deserialization, _type selects the correct struct class:

{
  "action": {
    "_type": "CreateTask",
    "title": "Review Q4 Report"
  }
}

DSPy matches "CreateTask" against the union members and instantiates the correct struct. No manual discriminator field is needed.
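
The dispatch step can be sketched in plain Ruby. The code below is illustrative (plain `Struct` classes instead of `T::Struct`, and a hand-rolled lookup rather than DSPy.rb's internals); it shows how a `_type` value selects the struct class to instantiate:

```ruby
require 'json'

# Illustrative discriminator dispatch: pick a class from the "_type" field.
CreateTask = Struct.new(:title, keyword_init: true)
DeleteTask = Struct.new(:task_id, keyword_init: true)
UNION_MEMBERS = { 'CreateTask' => CreateTask, 'DeleteTask' => DeleteTask }

def build_action(json)
  data = JSON.parse(json)
  klass = UNION_MEMBERS.fetch(data.delete('_type'))  # select by discriminator
  klass.new(**data.transform_keys(&:to_sym))
end

action = build_action('{"_type":"CreateTask","title":"Review Q4 Report"}')
action.class  # => CreateTask
action.title  # => "Review Q4 Report"
```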

Recursive Types

Structs referencing themselves are supported. The schema generator tracks visited types and produces $ref pointers under $defs:

class TreeNode < T::Struct
  const :label, String
  const :children, T::Array[TreeNode], default: []
end

The generated schema uses "$ref": "#/$defs/TreeNode" for the children array items, preventing infinite schema expansion.

Nesting Depth

  • 1-2 levels: reliable across all providers.
  • 3-4 levels: works but increases schema complexity.
  • 5+ levels: may trigger OpenAI depth validation warnings and reduce LLM accuracy. Flatten deeply nested structures or split into multiple signatures.
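
To check where a signature falls in this guidance, you can measure nesting in the generated schema. The helper below is hypothetical (not part of DSPy.rb); it counts levels of `properties`/`items` nesting in a JSON-schema-like hash:

```ruby
# Hypothetical helper: measure nesting depth of a JSON-schema-like hash,
# to flag schemas approaching the 5+ level danger zone.
def schema_depth(node)
  return 0 unless node.is_a?(Hash)
  children = (node['properties'] || {}).values
  children += [node['items']] if node['items']
  1 + (children.map { |c| schema_depth(c) }.max || 0)
end

shallow = { 'properties' => { 'name' => {} } }
nested  = { 'properties' => { 'a' => { 'properties' => { 'b' => {} } } } }

schema_depth(shallow)  # => 2
schema_depth(nested)   # => 3
```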

Tips

  • Prefer T::Array[X], default: [] over T.nilable(T::Array[X]) — the nilable form causes schema issues with OpenAI structured outputs.
  • Use clear struct names for union types since they become _type discriminator values.
  • Limit union types to 2-4 members for reliable model comprehension.
  • Check schema compatibility with DSPy::OpenAI::LM::SchemaConverter.validate_compatibility(schema).

Reference: Observability

DSPy.rb Observability

DSPy.rb provides an event-driven observability system built on OpenTelemetry. The system replaces monkey-patching with structured event emission, pluggable listeners, automatic span creation, and non-blocking Langfuse export.

Event System

Emitting Events

Emit structured events with DSPy.event:

DSPy.event('lm.tokens', {
  'gen_ai.system' => 'openai',
  'gen_ai.request.model' => 'gpt-4',
  input_tokens: 150,
  output_tokens: 50,
  total_tokens: 200
})

Event names are strings with dot-separated namespaces (e.g., 'llm.generate', 'react.iteration_complete', 'chain_of_thought.reasoning_complete'). Do not use symbols for event names.

Attributes must be JSON-serializable. DSPy automatically merges context (trace ID, module stack) and creates OpenTelemetry spans.

Global Subscriptions

Subscribe to events across the entire application with DSPy.events.subscribe:

# Exact event name
subscription_id = DSPy.events.subscribe('lm.tokens') do |event_name, attrs|
  puts "Tokens used: #{attrs[:total_tokens]}"
end

# Wildcard pattern -- matches llm.generate, llm.stream, etc.
DSPy.events.subscribe('llm.*') do |event_name, attrs|
  track_llm_usage(attrs)
end

# Catch-all wildcard
DSPy.events.subscribe('*') do |event_name, attrs|
  log_everything(event_name, attrs)
end

Use global subscriptions for cross-cutting concerns: observability exporters (Langfuse, Datadog), centralized logging, metrics collection.
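
The wildcard semantics above can be sketched with a small matcher. This is an illustrative reimplementation, not DSPy.rb's internal matching code:

```ruby
# Sketch of dot-namespace wildcard matching for event names.
def event_matches?(pattern, event_name)
  return true if pattern == '*'            # catch-all wildcard
  if pattern.end_with?('.*')
    event_name.start_with?(pattern[0..-2]) # keep trailing dot: 'llm.'
  else
    pattern == event_name                  # exact match
  end
end

event_matches?('llm.*', 'llm.generate')  # => true
event_matches?('llm.*', 'lm.tokens')     # => false
event_matches?('*', 'score.create')      # => true
```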

Module-Scoped Subscriptions

Declare listeners inside a DSPy::Module subclass. Subscriptions automatically scope to the module instance and its descendants:

class ResearchReport < DSPy::Module
  subscribe 'lm.tokens', :track_tokens, scope: :descendants

  def initialize
    super
    @outliner = DSPy::Predict.new(OutlineSignature)
    @writer   = DSPy::Predict.new(SectionWriterSignature)
    @token_count = 0
  end

  def forward(question:)
    outline = @outliner.call(question: question)
    outline.sections.map do |title|
      draft = @writer.call(question: question, section_title: title)
      { title: title, body: draft.paragraph }
    end
  end

  def track_tokens(_event, attrs)
    @token_count += attrs.fetch(:total_tokens, 0)
  end
end

The scope: parameter accepts:

  • :descendants (default) — receives events from the module and every nested module invoked inside it.
  • DSPy::Module::SubcriptionScope::SelfOnly — restricts delivery to events emitted by the module instance itself; ignores descendants.

Inspect active subscriptions with registered_module_subscriptions. Tear down with unsubscribe_module_events.

Unsubscribe and Cleanup

Remove a global listener by subscription ID:

id = DSPy.events.subscribe('llm.*') { |name, attrs| }
DSPy.events.unsubscribe(id)

Build tracker classes that manage their own subscription lifecycle:

class TokenBudgetTracker
  def initialize(budget:)
    @budget = budget
    @usage  = 0
    @subscriptions = []
    @subscriptions << DSPy.events.subscribe('lm.tokens') do |_event, attrs|
      @usage += attrs.fetch(:total_tokens, 0)
      warn("Budget hit") if @usage >= @budget
    end
  end

  def unsubscribe
    @subscriptions.each { |id| DSPy.events.unsubscribe(id) }
    @subscriptions.clear
  end
end

Clearing Listeners in Tests

Call DSPy.events.clear_listeners in before/after blocks to prevent cross-contamination between test cases:

RSpec.configure do |config|
  config.after(:each) { DSPy.events.clear_listeners }
end

dspy-o11y Gems

Three gems compose the observability stack:

Gem                  Purpose
dspy                 Core event bus (DSPy.event, DSPy.events) — always available
dspy-o11y            OpenTelemetry spans, AsyncSpanProcessor, DSPy::Context.with_span helpers
dspy-o11y-langfuse   Langfuse adapter — configures OTLP exporter targeting Langfuse endpoints

Installation

# Gemfile
gem 'dspy'
gem 'dspy-o11y'           # core spans + helpers
gem 'dspy-o11y-langfuse'  # Langfuse/OpenTelemetry adapter (optional)

If the optional gems are absent, DSPy falls back to logging-only mode with no errors.

Langfuse Integration

Environment Variables

# Required
export LANGFUSE_PUBLIC_KEY=pk-lf-your-public-key
export LANGFUSE_SECRET_KEY=sk-lf-your-secret-key

# Optional (defaults to https://cloud.langfuse.com)
export LANGFUSE_HOST=https://us.cloud.langfuse.com

# Tuning (optional)
export DSPY_TELEMETRY_BATCH_SIZE=100        # spans per export batch (default 100)
export DSPY_TELEMETRY_QUEUE_SIZE=1000       # max queued spans (default 1000)
export DSPY_TELEMETRY_EXPORT_INTERVAL=60    # seconds between timed exports (default 60)
export DSPY_TELEMETRY_SHUTDOWN_TIMEOUT=10   # seconds to drain on shutdown (default 10)

Automatic Configuration

Call DSPy::Observability.configure! once at boot (it is already called automatically when require 'dspy' runs and Langfuse env vars are present):

require 'dspy'
# If LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are set,
# DSPy::Observability.configure! runs automatically and:
#   1. Configures the OpenTelemetry SDK with an OTLP exporter
#   2. Creates dual output: structured logs AND OpenTelemetry spans
#   3. Exports spans to Langfuse using proper authentication
#   4. Falls back gracefully if gems are missing

Verify status with DSPy::Observability.enabled?.

Automatic Tracing

With observability enabled, every DSPy::Module#forward call, LM request, and tool invocation creates properly nested spans. Langfuse receives hierarchical traces:

Trace: abc-123-def
+-- ChainOfThought.forward [2000ms]  (observation type: chain)
    +-- llm.generate [1000ms]        (observation type: generation)
        Model: gpt-4-0613
        Tokens: 100 in / 50 out / 150 total

DSPy maps module classes to Langfuse observation types automatically via DSPy::ObservationType.for_module_class:

Module                 Observation Type
DSPy::LM (raw chat)    generation
DSPy::ChainOfThought   chain
DSPy::ReAct            agent
Tool invocations       tool
Memory/retrieval       retriever
Embedding engines      embedding
Evaluation modules     evaluator
Generic operations     span

Score Reporting

DSPy.score API

Report evaluation scores with DSPy.score:

# Numeric (default)
DSPy.score('accuracy', 0.95)

# With comment
DSPy.score('relevance', 0.87, comment: 'High semantic similarity')

# Boolean
DSPy.score('is_valid', 1, data_type: DSPy::Scores::DataType::Boolean)

# Categorical
DSPy.score('sentiment', 'positive', data_type: DSPy::Scores::DataType::Categorical)

# Explicit trace binding
DSPy.score('accuracy', 0.95, trace_id: 'custom-trace-id')

Available data types: DSPy::Scores::DataType::Numeric, ::Boolean, ::Categorical.

score.create Events

Every DSPy.score call emits a 'score.create' event. Subscribe to react:

DSPy.events.subscribe('score.create') do |event_name, attrs|
  puts "#{attrs[:score_name]} = #{attrs[:score_value]}"
  # Also available: attrs[:score_id], attrs[:score_data_type],
  # attrs[:score_comment], attrs[:trace_id], attrs[:observation_id],
  # attrs[:timestamp]
end

Async Langfuse Export with DSPy::Scores::Exporter

Configure the exporter to send scores to Langfuse in the background:

exporter = DSPy::Scores::Exporter.configure(
  public_key: ENV['LANGFUSE_PUBLIC_KEY'],
  secret_key: ENV['LANGFUSE_SECRET_KEY'],
  host: 'https://cloud.langfuse.com'
)

# Scores are now exported automatically via a background Thread::Queue
DSPy.score('accuracy', 0.95)

# Shut down gracefully (waits up to 5 seconds by default)
exporter.shutdown

The exporter subscribes to 'score.create' events internally, queues them for async processing, and retries with exponential backoff on failure.

Automatic Export with DSPy::Evals

Pass export_scores: true to DSPy::Evals to export per-example scores and an aggregate batch score automatically:

evaluator = DSPy::Evals.new(
  program,
  metric: my_metric,
  export_scores: true,
  score_name: 'qa_accuracy'
)

result = evaluator.evaluate(test_examples)

DSPy::Context.with_span

Create manual spans for custom operations. Requires dspy-o11y.

DSPy::Context.with_span(operation: 'custom.retrieval', 'retrieval.source' => 'pinecone') do |span|
  results = pinecone_client.query(embedding)
  span&.set_attribute('retrieval.count', results.size)
  results
end

Pass semantic attributes as keyword arguments alongside operation:. The block receives an OpenTelemetry span object (or nil when observability is disabled). The span automatically nests under the current parent span and records duration.ms, langfuse.observation.startTime, and langfuse.observation.endTime.

Assign a Langfuse observation type to custom spans:

DSPy::Context.with_span(
  operation: 'evaluate.batch',
  **DSPy::ObservationType::Evaluator.langfuse_attributes,
  'batch.size' => examples.length
) do |span|
  run_evaluation(examples)
end

Scores reported inside a with_span block automatically inherit the current trace context.

Module Stack Metadata

When DSPy::Module#forward runs, the context layer maintains a module stack. Every event includes:

{
  module_path: [
    { id: "root_uuid",    class: "DeepSearch",    label: nil },
    { id: "planner_uuid", class: "DSPy::Predict", label: "planner" }
  ],
  module_root: { id: "root_uuid", class: "DeepSearch", label: nil },
  module_leaf: { id: "planner_uuid", class: "DSPy::Predict", label: "planner" },
  module_scope: {
    ancestry_token: "root_uuid>planner_uuid",
    depth: 2
  }
}

Key                           Meaning
module_path                   Ordered array of {id, class, label} entries from root to leaf
module_root                   The outermost module in the current call chain
module_leaf                   The innermost (currently executing) module
module_scope.ancestry_token   Stable string of joined UUIDs representing the nesting path
module_scope.depth            Integer depth of the current module in the stack

Labels are set via module_scope_label= on a module instance or derived automatically from named predictors. Use this metadata to power Langfuse filters, scoped metrics, or custom event routing.
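
The derived fields follow directly from the module path. The helper below is an illustrative sketch (not DSPy.rb API) showing how `ancestry_token` and `depth` relate to the `module_path` array shown above:

```ruby
# Illustrative derivation of module_scope from a module_path array.
def module_scope(module_path)
  {
    ancestry_token: module_path.map { |m| m[:id] }.join('>'),  # root-to-leaf IDs
    depth: module_path.length
  }
end

path = [
  { id: 'root_uuid',    class: 'DeepSearch',    label: nil },
  { id: 'planner_uuid', class: 'DSPy::Predict', label: 'planner' }
]
module_scope(path)
# => { ancestry_token: "root_uuid>planner_uuid", depth: 2 }
```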

Dedicated Export Worker

The DSPy::Observability::AsyncSpanProcessor (from dspy-o11y) keeps telemetry export off the hot path:

  • Runs on a Concurrent::SingleThreadExecutor — LLM workflows never compete with OTLP networking.
  • Buffers finished spans in a Thread::Queue (max size configurable via DSPY_TELEMETRY_QUEUE_SIZE).
  • Drains spans in batches of DSPY_TELEMETRY_BATCH_SIZE (default 100). When the queue reaches batch size, an immediate async export fires.
  • A background timer thread triggers periodic export every DSPY_TELEMETRY_EXPORT_INTERVAL seconds (default 60).
  • Applies exponential backoff (0.1 * 2^attempt seconds) on export failures, up to DEFAULT_MAX_RETRIES (3).
  • On shutdown, flushes all remaining spans within DSPY_TELEMETRY_SHUTDOWN_TIMEOUT seconds, then terminates the executor.
  • Drops the oldest span when the queue is full, logging 'observability.span_dropped'.

No application code interacts with the processor directly. Configure it entirely through environment variables.
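
The documented retry schedule (0.1 * 2^attempt seconds, up to 3 retries) can be tabulated directly. This is a standalone sketch using the stated constants, not the processor's actual code:

```ruby
# Tabulate the documented exponential backoff: 0.1 * 2^attempt seconds.
DEFAULT_MAX_RETRIES = 3

def backoff_delays(max_retries = DEFAULT_MAX_RETRIES)
  (0...max_retries).map { |attempt| 0.1 * (2**attempt) }
end

backoff_delays  # => [0.1, 0.2, 0.4]
```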

Built-in Events Reference

Event Name                            Emitted By                Key Attributes
lm.tokens                             DSPy::LM                  gen_ai.system, gen_ai.request.model, input_tokens, output_tokens, total_tokens
chain_of_thought.reasoning_complete   DSPy::ChainOfThought      dspy.signature, cot.reasoning_steps, cot.reasoning_length, cot.has_reasoning
react.iteration_complete              DSPy::ReAct               iteration, thought, action, observation
codeact.iteration_complete            dspy-code_act gem         iteration, code_executed, execution_result
optimization.trial_complete           Teleprompters (MIPROv2)   trial_number, score
score.create                          DSPy.score                score_name, score_value, score_data_type, trace_id
span.start                            DSPy::Context.with_span   trace_id, span_id, parent_span_id, operation

Best Practices

  • Use dot-separated string names for events. Follow OpenTelemetry gen_ai.* conventions for LLM attributes.
  • Always call unsubscribe (or unsubscribe_module_events for scoped subscriptions) when a tracker is no longer needed to prevent memory leaks.
  • Call DSPy.events.clear_listeners in test teardown to avoid cross-contamination.
  • Wrap risky listener logic in a rescue block. The event system isolates listener failures, but explicit rescue prevents silent swallowing of domain errors.
  • Prefer module-scoped subscribe for agent internals. Reserve global DSPy.events.subscribe for infrastructure-level concerns.

Reference: Optimization

DSPy.rb Optimization

MIPROv2

MIPROv2 (Multi-prompt Instruction Proposal Optimizer, version 2) is the primary instruction tuner in DSPy.rb. It proposes new instructions and few-shot demonstrations per predictor, evaluates them on mini-batches, and retains candidates that improve the metric. It ships as a separate gem to keep the Gaussian Process dependency tree out of apps that do not need it.

Installation

# Gemfile
gem "dspy"
gem "dspy-miprov2"

Bundler auto-requires dspy/miprov2. No additional require statement is needed.

AutoMode presets

Use DSPy::Teleprompt::MIPROv2::AutoMode for preconfigured optimizers:

light  = DSPy::Teleprompt::MIPROv2::AutoMode.light(metric: metric)   # 6 trials, greedy
medium = DSPy::Teleprompt::MIPROv2::AutoMode.medium(metric: metric)  # 12 trials, adaptive
heavy  = DSPy::Teleprompt::MIPROv2::AutoMode.heavy(metric: metric)   # 18 trials, Bayesian

Preset   Trials   Strategy    Use case
light    6        :greedy     Quick wins on small datasets or during prototyping.
medium   12       :adaptive   Balanced exploration vs. runtime for most pilots.
heavy    18       :bayesian   Highest accuracy targets or multi-stage programs.

Manual configuration with dry-configurable

DSPy::Teleprompt::MIPROv2 includes Dry::Configurable. Configure at the class level (defaults for all instances) or instance level (overrides class defaults).

Class-level defaults:

DSPy::Teleprompt::MIPROv2.configure do |config|
  config.optimization_strategy = :bayesian
  config.num_trials = 30
  config.bootstrap_sets = 10
end

Instance-level overrides:

optimizer = DSPy::Teleprompt::MIPROv2.new(metric: metric)
optimizer.configure do |config|
  config.num_trials = 15
  config.num_instruction_candidates = 6
  config.bootstrap_sets = 5
  config.max_bootstrapped_examples = 4
  config.max_labeled_examples = 16
  config.optimization_strategy = :adaptive       # :greedy, :adaptive, :bayesian
  config.early_stopping_patience = 3
  config.init_temperature = 1.0
  config.final_temperature = 0.1
  config.minibatch_size = nil                     # nil = auto
  config.auto_seed = 42
end

The optimization_strategy setting accepts symbols (:greedy, :adaptive, :bayesian) and coerces them internally to DSPy::Teleprompt::OptimizationStrategy T::Enum values.

The old config: constructor parameter is removed. Passing config: raises ArgumentError.

Auto presets via configure

Instead of AutoMode, set the preset through the configure block:

optimizer = DSPy::Teleprompt::MIPROv2.new(metric: metric)
optimizer.configure do |config|
  config.auto_preset = DSPy::Teleprompt::AutoPreset.deserialize("medium")
end

Compile and inspect

program = DSPy::Predict.new(MySignature)

result = optimizer.compile(
  program,
  trainset: train_examples,
  valset: val_examples
)

optimized_program = result.optimized_program
puts "Best score: #{result.best_score_value}"

The result object exposes:

  • optimized_program — ready-to-use predictor with updated instruction and demos.
  • optimization_trace[:trial_logs] — per-trial record of instructions, demos, and scores.
  • metadata[:optimizer] — "MIPROv2", useful when persisting experiments from multiple optimizers.

Multi-stage programs

MIPROv2 generates dataset summaries for each predictor and proposes per-stage instructions. For a ReAct agent with thought_generator and observation_processor predictors, the optimizer handles credit assignment internally. The metric only needs to evaluate the final output.

Bootstrap sampling

During the bootstrap phase MIPROv2:

  1. Generates dataset summaries from the training set.
  2. Bootstraps few-shot demonstrations by running the baseline program.
  3. Proposes candidate instructions grounded in the summaries and bootstrapped examples.
  4. Evaluates each candidate on mini-batches drawn from the validation set.

Control the bootstrap phase with bootstrap_sets, max_bootstrapped_examples, and max_labeled_examples.

Bayesian optimization

When optimization_strategy is :bayesian (or when using the heavy preset), MIPROv2 fits a Gaussian Process surrogate over past trial scores to select the next candidate. This replaces random search with informed exploration, reducing the number of trials needed to find high-scoring instructions.


GEPA

GEPA (Genetic-Pareto Reflective Prompt Evolution) is a feedback-driven optimizer. It runs the program on a small batch, collects scores and textual feedback, and asks a reflection LM to rewrite the instruction. Improved candidates are retained on a Pareto frontier.

Installation

# Gemfile
gem "dspy"
gem "dspy-gepa"

The dspy-gepa gem depends on the gepa core optimizer gem automatically.

Metric contract

GEPA metrics return DSPy::Prediction with both a numeric score and a feedback string. Do not return a plain boolean.

metric = lambda do |example, prediction|
  expected  = example.expected_values[:label]
  predicted = prediction.label

  score = predicted == expected ? 1.0 : 0.0
  feedback = if score == 1.0
    "Correct (#{expected}) for: \"#{example.input_values[:text][0..60]}\""
  else
    "Misclassified (expected #{expected}, got #{predicted}) for: \"#{example.input_values[:text][0..60]}\""
  end

  DSPy::Prediction.new(score: score, feedback: feedback)
end

Keep the score in [0, 1]. Always include a short feedback message explaining what happened — GEPA hands this text to the reflection model so it can reason about failures.

Feedback maps

feedback_map targets individual predictors inside a composite module. Each entry receives keyword arguments and returns a DSPy::Prediction:

feedback_map = {
  'self' => lambda do |predictor_output:, predictor_inputs:, module_inputs:, module_outputs:, captured_trace:|
    expected  = module_inputs.expected_values[:label]
    predicted = predictor_output.label

    DSPy::Prediction.new(
      score: predicted == expected ? 1.0 : 0.0,
      feedback: "Classifier saw \"#{predictor_inputs[:text][0..80]}\" -> #{predicted} (expected #{expected})"
    )
  end
}

For single-predictor programs, key the map with 'self'. For multi-predictor chains, add entries per component so the reflection LM sees localized context at each step. Omit feedback_map entirely if the top-level metric already covers the basics.

Configuring the teleprompter

teleprompter = DSPy::Teleprompt::GEPA.new(
  metric: metric,
  reflection_lm: DSPy::ReflectionLM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']),
  feedback_map: feedback_map,
  config: {
    max_metric_calls: 600,
    minibatch_size: 6,
    skip_perfect_score: false
  }
)

Key configuration knobs:

Knob                 Purpose
max_metric_calls     Hard budget on evaluation calls. Set to at least the validation set size plus a few minibatches.
minibatch_size       Examples per reflective replay batch. Smaller = cheaper iterations, noisier scores.
skip_perfect_score   Set true to stop early when a candidate reaches score 1.0.

Minibatch sizing

Goal                                            Suggested size   Rationale
Explore many candidates within a tight budget   3-6              Cheap iterations, more prompt variants, noisier metrics.
Stable metrics when each rollout is costly      8-12             Smoother scores, fewer candidates unless budget is raised.
Investigate specific failure modes              3-4, then 8+     Start with breadth, increase once patterns emerge.

Compile and evaluate

program = DSPy::Predict.new(MySignature)

result = teleprompter.compile(program, trainset: train, valset: val)
optimized_program = result.optimized_program

test_metrics = evaluate(optimized_program, test)

The result object exposes:

  • optimized_program — predictor with updated instruction and few-shot examples.
  • best_score_value — validation score for the best candidate.
  • metadata — candidate counts, trace hashes, and telemetry IDs.

Reflection LM

Swap DSPy::ReflectionLM for any callable object that accepts the reflection prompt hash and returns a string. The default reflection signature extracts the new instruction from triple backticks in the response.

Experiment tracking

Plug GEPA::Logging::ExperimentTracker into a persistence layer:

tracker = GEPA::Logging::ExperimentTracker.new
tracker.with_subscriber { |event| MyModel.create!(payload: event) }

teleprompter = DSPy::Teleprompt::GEPA.new(
  metric: metric,
  reflection_lm: reflection_lm,
  experiment_tracker: tracker,
  config: { max_metric_calls: 900 }
)

The tracker emits Pareto update events, merge decisions, and candidate evolution records as JSONL.

Pareto frontier

GEPA maintains a diverse candidate pool and samples from the Pareto frontier instead of mutating only the top-scoring program. This balances exploration and prevents the search from collapsing onto a single lineage.
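
The frontier idea can be sketched with per-example score vectors. The code below is an illustrative non-dominated filter, not GEPA's internals: a candidate is kept unless some other candidate scores at least as well on every example and strictly better on at least one:

```ruby
# Illustrative Pareto frontier over per-example score vectors.
def dominated?(a, b)
  # b dominates a: >= everywhere, strictly better somewhere
  b.zip(a).all? { |x, y| x >= y } && b.zip(a).any? { |x, y| x > y }
end

def pareto_frontier(candidates)
  candidates.reject do |scores|
    candidates.any? { |other| !other.equal?(scores) && dominated?(scores, other) }
  end
end

pool = [[1.0, 0.2], [0.4, 0.9], [0.3, 0.3]]
pareto_frontier(pool)  # => [[1.0, 0.2], [0.4, 0.9]]
```

Because both surviving candidates excel on different examples, sampling from the frontier preserves multiple lineages instead of collapsing onto one.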

Enable the merge proposer after multiple strong lineages emerge:

config: {
  max_metric_calls: 900,
  enable_merge_proposer: true
}

Premature merges eat budget without meaningful gains. Gate merge on having several validated candidates first.

Advanced options

  • acceptance_strategy: — plug in bespoke Pareto filters or early-stop heuristics.
  • Telemetry spans emit via GEPA::Telemetry. Enable global observability with DSPy.configure { |c| c.observability = true } to stream spans to an OpenTelemetry exporter.

Evaluation Framework

DSPy::Evals provides batch evaluation of predictors against test datasets with built-in and custom metrics.

Basic usage

metric = proc do |example, prediction|
  prediction.answer == example.expected_values[:answer]
end

evaluator = DSPy::Evals.new(predictor, metric: metric)

result = evaluator.evaluate(
  test_examples,
  display_table: true,
  display_progress: true
)

puts "Pass rate: #{(result.pass_rate * 100).round(1)}%"
puts "Passed: #{result.passed_examples}/#{result.total_examples}"

DSPy::Example

Convert raw data into DSPy::Example instances before passing to optimizers or evaluators. Each example carries input_values and expected_values:

examples = rows.map do |row|
  DSPy::Example.new(
    input_values: { text: row[:text] },
    expected_values: { label: row[:label] }
  )
end

train, val, test = split_examples(examples, train_ratio: 0.6, val_ratio: 0.2, seed: 42)

Hold back a test set from the optimization loop. Optimizers work on train/val; only the test set proves generalization.
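
DSPy.rb does not ship a `split_examples` helper; the call above assumes one. A minimal sketch, using a seeded shuffle so splits are reproducible:

```ruby
# Hypothetical split_examples: seeded shuffle, then slice by ratios.
def split_examples(examples, train_ratio:, val_ratio:, seed:)
  shuffled = examples.shuffle(random: Random.new(seed))
  n_train = (shuffled.size * train_ratio).round
  n_val   = (shuffled.size * val_ratio).round
  [
    shuffled[0, n_train],                    # train
    shuffled[n_train, n_val],                # val
    shuffled[(n_train + n_val)..] || []      # test: the held-out remainder
  ]
end

train, val, test = split_examples((1..10).to_a, train_ratio: 0.6, val_ratio: 0.2, seed: 42)
[train.size, val.size, test.size]  # => [6, 2, 2]
```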

Built-in metrics

# Exact match -- prediction must exactly equal expected value
metric = DSPy::Metrics.exact_match(field: :answer, case_sensitive: true)

# Contains -- prediction must contain expected substring
metric = DSPy::Metrics.contains(field: :answer, case_sensitive: false)

# Numeric difference -- numeric output within tolerance
metric = DSPy::Metrics.numeric_difference(field: :answer, tolerance: 0.01)

# Composite AND -- all sub-metrics must pass
metric = DSPy::Metrics.composite_and(
  DSPy::Metrics.exact_match(field: :answer),
  DSPy::Metrics.contains(field: :reasoning)
)

Custom metrics

quality_metric = lambda do |example, prediction|
  return false unless prediction

  score = 0.0
  score += 0.5 if prediction.answer == example.expected_values[:answer]
  score += 0.3 if prediction.explanation && prediction.explanation.length > 50
  score += 0.2 if prediction.confidence && prediction.confidence > 0.8
  score >= 0.7
end

evaluator = DSPy::Evals.new(predictor, metric: quality_metric)

Access prediction fields with dot notation (prediction.answer), not hash notation.

Observability hooks

Register callbacks without editing the evaluator:

DSPy::Evals.before_example do |payload|
  example = payload[:example]
  DSPy.logger.info("Evaluating example #{example.id}") if example.respond_to?(:id)
end

DSPy::Evals.after_batch do |payload|
  result = payload[:result]
  Langfuse.event(
    name: 'eval.batch',
    metadata: {
      total: result.total_examples,
      passed: result.passed_examples,
      score: result.score
    }
  )
end

Available hooks: before_example, after_example, before_batch, after_batch.

Langfuse score export

Enable export_scores: true to emit score.create events for each evaluated example and a batch score at the end:

evaluator = DSPy::Evals.new(
  predictor,
  metric: metric,
  export_scores: true,
  score_name: 'qa_accuracy'   # default: 'evaluation'
)

result = evaluator.evaluate(test_examples)
# Emits per-example scores + overall batch score via DSPy::Scores::Exporter

Scores attach to the current trace context automatically and flow to Langfuse asynchronously.

Evaluation results

result = evaluator.evaluate(test_examples)

result.score            # Overall score (0.0 to 1.0)
result.passed_count     # Examples that passed
result.failed_count     # Examples that failed
result.error_count      # Examples that errored

result.results.each do |r|
  r.passed              # Boolean
  r.score               # Numeric score
  r.error               # Error message if the example errored
end

Integration with optimizers

metric = proc do |example, prediction|
  expected  = example.expected_values[:answer].to_s.strip.downcase
  predicted = prediction.answer.to_s.strip.downcase
  !expected.empty? && predicted.include?(expected)
end

optimizer = DSPy::Teleprompt::MIPROv2::AutoMode.medium(metric: metric)

result = optimizer.compile(
  DSPy::Predict.new(QASignature),
  trainset: train_examples,
  valset: val_examples
)

evaluator = DSPy::Evals.new(result.optimized_program, metric: metric)
test_result = evaluator.evaluate(test_examples, display_table: true)
puts "Test accuracy: #{(test_result.pass_rate * 100).round(2)}%"

Storage System

DSPy::Storage persists optimization results, tracks history, and manages multiple versions of optimized programs.

ProgramStorage (low-level)

storage = DSPy::Storage::ProgramStorage.new(storage_path: "./dspy_storage")

# Save
saved = storage.save_program(
  result.optimized_program,
  result,
  metadata: {
    signature_class: 'ClassifyText',
    optimizer: 'MIPROv2',
    examples_count: examples.size
  }
)
puts "Stored with ID: #{saved.program_id}"

# Load
saved = storage.load_program(program_id)
predictor = saved.program
score = saved.optimization_result[:best_score_value]

# List
storage.list_programs.each do |p|
  puts "#{p[:program_id]} -- score: #{p[:best_score]} -- saved: #{p[:saved_at]}"
end

StorageManager (high-level)

manager = DSPy::Storage::StorageManager.new

# Save with tags
saved = manager.save_optimization_result(
  result,
  tags: ['production', 'sentiment-analysis'],
  description: 'Optimized sentiment classifier v2'
)

# Find programs
programs = manager.find_programs(
  optimizer: 'MIPROv2',
  min_score: 0.85,
  tags: ['production']
)

recent = manager.find_programs(
  max_age_days: 7,
  signature_class: 'ClassifyText'
)

# Get best program for a signature
best = manager.get_best_program('ClassifyText')
predictor = best.program

Global shorthand:

DSPy::Storage::StorageManager.save(result, metadata: { version: '2.0' })
DSPy::Storage::StorageManager.load(program_id)
DSPy::Storage::StorageManager.best('ClassifyText')

Checkpoints

Create and restore checkpoints during long-running optimizations:

# Save a checkpoint
manager.create_checkpoint(
  current_result,
  'iteration_50',
  metadata: { iteration: 50, current_score: 0.87 }
)

# Restore
restored = manager.restore_checkpoint('iteration_50')
program = restored.program

# Auto-checkpoint every N iterations
if iteration % 10 == 0
  manager.create_checkpoint(current_result, "auto_checkpoint_#{iteration}")
end

Import and export

Share programs between environments:

storage = DSPy::Storage::ProgramStorage.new

# Export
storage.export_programs(['abc123', 'def456'], './export_backup.json')

# Import
imported = storage.import_programs('./export_backup.json')
puts "Imported #{imported.size} programs"

Optimization history

history = manager.get_optimization_history

history[:summary][:total_programs]
history[:summary][:avg_score]

history[:optimizer_stats].each do |optimizer, stats|
  puts "#{optimizer}: #{stats[:count]} programs, best: #{stats[:best_score]}"
end

history[:trends][:improvement_percentage]

Program comparison

comparison = manager.compare_programs(id_a, id_b)
comparison[:comparison][:score_difference]
comparison[:comparison][:better_program]
comparison[:comparison][:age_difference_hours]
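A plain-Ruby sketch of how those fields could be derived from two saved records (assumed semantics; the compare_programs helper here is hypothetical):

```ruby
# Assumed semantics of the comparison fields, sketched with a hypothetical helper
def compare_programs(a, b)
  score_diff = a[:best_score] - b[:best_score]
  {
    score_difference:     score_diff.abs.round(4),
    better_program:       score_diff >= 0 ? a[:program_id] : b[:program_id],
    age_difference_hours: ((a[:saved_at] - b[:saved_at]).abs / 3600.0).round(2)
  }
end

a = { program_id: 'abc123', best_score: 0.91, saved_at: Time.now }
b = { program_id: 'def456', best_score: 0.85, saved_at: Time.now - 7200 }
compare_programs(a, b)
# => { score_difference: 0.06, better_program: "abc123", age_difference_hours: 2.0 }
```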

Storage configuration

config = DSPy::Storage::StorageManager::StorageConfig.new
config.storage_path = Rails.root.join('dspy_storage')
config.auto_save = true
config.save_intermediate_results = false
config.max_stored_programs = 100

manager = DSPy::Storage::StorageManager.new(config: config)

Cleanup

Remove old programs. Cleanup retains the best performing and most recent programs using a weighted score (70% performance, 30% recency):

deleted_count = manager.cleanup_old_programs
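The weighting can be sketched in plain Ruby. The retention_score helper and the linear 30-day recency decay are illustrative assumptions, not the library's exact formula:

```ruby
# Illustrative retention score: 70% performance, 30% recency.
# The linear 30-day decay horizon is an assumption, not DSPy's exact formula.
def retention_score(best_score, saved_at, now: Time.now, horizon_days: 30.0)
  age_days = (now - saved_at) / 86_400.0
  recency  = [1.0 - (age_days / horizon_days), 0.0].max
  (0.7 * best_score) + (0.3 * recency)
end

now = Time.now
retention_score(0.9, now, now: now)                 # fresh: full recency credit
retention_score(0.9, now - (60 * 86_400), now: now) # old: recency clamps to 0.0
```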

Storage events

The storage system emits structured log events for monitoring:

  • dspy.storage.save_start, dspy.storage.save_complete, dspy.storage.save_error
  • dspy.storage.load_start, dspy.storage.load_complete, dspy.storage.load_error
  • dspy.storage.delete, dspy.storage.export, dspy.storage.import, dspy.storage.cleanup

File layout

dspy_storage/
  programs/
    abc123def456.json
    789xyz012345.json
  history.json

API rules

  • Call predictors with .call(), not .forward().
  • Access prediction fields with dot notation (result.answer), not hash notation (result[:answer]).
  • GEPA metrics return DSPy::Prediction.new(score:, feedback:), not a boolean.
  • MIPROv2 metrics may return true/false, a numeric score, or DSPy::Prediction.

Reference: Providers

DSPy.rb LLM Providers

Adapter Architecture

DSPy.rb ships provider SDKs as separate adapter gems. Install only the adapters the project needs. Each adapter gem depends on the official SDK for its provider and auto-loads when present — no explicit require necessary.

# Gemfile
gem 'dspy'              # core framework (no provider SDKs)
gem 'dspy-openai'       # OpenAI, OpenRouter, Ollama
gem 'dspy-anthropic'    # Claude
gem 'dspy-gemini'       # Gemini
gem 'dspy-ruby_llm'     # RubyLLM unified adapter (12+ providers)

Per-Provider Adapters

dspy-openai

Covers any endpoint that speaks the OpenAI chat-completions protocol: OpenAI itself, OpenRouter, and Ollama.

SDK dependency: openai ~> 0.17

# OpenAI
lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])

# OpenRouter -- access 200+ models behind a single key
lm = DSPy::LM.new('openrouter/x-ai/grok-4-fast:free',
  api_key: ENV['OPENROUTER_API_KEY']
)

# Ollama -- local models, no API key required
lm = DSPy::LM.new('ollama/llama3.2')

# Remote Ollama instance
lm = DSPy::LM.new('ollama/llama3.2',
  base_url: 'https://my-ollama.example.com/v1',
  api_key: 'optional-auth-token'
)

All three sub-adapters share the same request handling, structured-output support, and error reporting. Swap providers without changing higher-level DSPy code.

For OpenRouter models that lack native structured-output support, disable it explicitly:

lm = DSPy::LM.new('openrouter/deepseek/deepseek-chat-v3.1:free',
  api_key: ENV['OPENROUTER_API_KEY'],
  structured_outputs: false
)

dspy-anthropic

Provides the Claude adapter. Install it for any anthropic/* model id.

SDK dependency: anthropic ~> 1.12

lm = DSPy::LM.new('anthropic/claude-sonnet-4-20250514',
  api_key: ENV['ANTHROPIC_API_KEY']
)

Structured outputs default to tool-based JSON extraction (structured_outputs: true). Set structured_outputs: false to use enhanced-prompting extraction instead.

# Tool-based extraction (default, most reliable)
lm = DSPy::LM.new('anthropic/claude-sonnet-4-20250514',
  api_key: ENV['ANTHROPIC_API_KEY'],
  structured_outputs: true
)

# Enhanced prompting extraction
lm = DSPy::LM.new('anthropic/claude-sonnet-4-20250514',
  api_key: ENV['ANTHROPIC_API_KEY'],
  structured_outputs: false
)

dspy-gemini

Provides the Gemini adapter. Install it for any gemini/* model id.

SDK dependency: gemini-ai ~> 4.3

lm = DSPy::LM.new('gemini/gemini-2.5-flash',
  api_key: ENV['GEMINI_API_KEY']
)

Environment variable: GEMINI_API_KEY (also accepts GOOGLE_API_KEY).


RubyLLM Unified Adapter

The dspy-ruby_llm gem provides a single adapter that routes to 12+ providers through RubyLLM. Use it when a project talks to multiple providers or needs access to Bedrock, VertexAI, DeepSeek, or Mistral without dedicated adapter gems.

SDK dependency: ruby_llm ~> 1.3

Model ID Format

Prefix every model id with ruby_llm/:

lm = DSPy::LM.new('ruby_llm/gpt-4o-mini')
lm = DSPy::LM.new('ruby_llm/claude-sonnet-4-20250514')
lm = DSPy::LM.new('ruby_llm/gemini-2.5-flash')

The adapter detects the provider from RubyLLM’s model registry automatically. For models not in the registry, pass provider: explicitly:

lm = DSPy::LM.new('ruby_llm/llama3.2', provider: 'ollama')
lm = DSPy::LM.new('ruby_llm/anthropic/claude-3-opus',
  api_key: ENV['OPENROUTER_API_KEY'],
  provider: 'openrouter'
)

Using Existing RubyLLM Configuration

When RubyLLM is already configured globally, omit the api_key: argument. DSPy reuses the global config automatically:

RubyLLM.configure do |config|
  config.openai_api_key = ENV['OPENAI_API_KEY']
  config.anthropic_api_key = ENV['ANTHROPIC_API_KEY']
end

# No api_key needed -- picks up the global config
DSPy.configure do |c|
  c.lm = DSPy::LM.new('ruby_llm/gpt-4o-mini')
end

When an api_key: (or any of base_url:, timeout:, max_retries:) is passed, DSPy creates a scoped context instead of reusing the global config.

Cloud-Hosted Providers (Bedrock, VertexAI)

Configure RubyLLM globally first, then reference the model:

# AWS Bedrock
RubyLLM.configure do |c|
  c.bedrock_api_key = ENV['AWS_ACCESS_KEY_ID']
  c.bedrock_secret_key = ENV['AWS_SECRET_ACCESS_KEY']
  c.bedrock_region = 'us-east-1'
end
lm = DSPy::LM.new('ruby_llm/anthropic.claude-3-5-sonnet', provider: 'bedrock')

# Google VertexAI
RubyLLM.configure do |c|
  c.vertexai_project_id = 'your-project-id'
  c.vertexai_location = 'us-central1'
end
lm = DSPy::LM.new('ruby_llm/gemini-pro', provider: 'vertexai')

Supported Providers Table

| Provider | Example Model ID | Notes |
|---|---|---|
| OpenAI | ruby_llm/gpt-4o-mini | Auto-detected from registry |
| Anthropic | ruby_llm/claude-sonnet-4-20250514 | Auto-detected from registry |
| Gemini | ruby_llm/gemini-2.5-flash | Auto-detected from registry |
| DeepSeek | ruby_llm/deepseek-chat | Auto-detected from registry |
| Mistral | ruby_llm/mistral-large | Auto-detected from registry |
| Ollama | ruby_llm/llama3.2 | Use provider: 'ollama' |
| AWS Bedrock | ruby_llm/anthropic.claude-3-5-sonnet | Configure RubyLLM globally |
| VertexAI | ruby_llm/gemini-pro | Configure RubyLLM globally |
| OpenRouter | ruby_llm/anthropic/claude-3-opus | Use provider: 'openrouter' |
| Perplexity | ruby_llm/llama-3.1-sonar-large | Use provider: 'perplexity' |
| GPUStack | ruby_llm/model-name | Use provider: 'gpustack' |

Rails Initializer Pattern

Configure DSPy inside an after_initialize block so Rails credentials and environment are fully loaded:

# config/initializers/dspy.rb
Rails.application.config.after_initialize do
  next if Rails.env.test? # skip in test -- use VCR cassettes (`return` inside this block would raise LocalJumpError)

  DSPy.configure do |config|
    config.lm = DSPy::LM.new(
      'openai/gpt-4o-mini',
      api_key: Rails.application.credentials.openai_api_key,
      structured_outputs: true
    )

    config.logger = if Rails.env.production?
      Dry.Logger(:dspy, formatter: :json) do |logger|
        logger.add_backend(stream: Rails.root.join("log/dspy.log"))
      end
    else
      Dry.Logger(:dspy) do |logger|
        logger.add_backend(level: :debug, stream: $stdout)
      end
    end
  end
end

Key points:

  • Wrap in after_initialize so Rails.application.credentials is available.
  • Return early in the test environment. Rely on VCR cassettes for deterministic LLM responses.
  • Set structured_outputs: true (the default) for provider-native JSON extraction.
  • Use Dry.Logger with :json formatter in production for structured log parsing.

Fiber-Local LM Context

DSPy.with_lm sets a temporary language-model override scoped to the current Fiber. Every predictor call inside the block uses the override; outside the block the previous LM takes effect again.

fast = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
powerful = DSPy::LM.new('anthropic/claude-sonnet-4-20250514', api_key: ENV['ANTHROPIC_API_KEY'])

classifier = Classifier.new

# Uses the global LM
result = classifier.call(text: "Hello")

# Temporarily switch to the fast model
DSPy.with_lm(fast) do
  result = classifier.call(text: "Hello")   # uses gpt-4o-mini
end

# Temporarily switch to the powerful model
DSPy.with_lm(powerful) do
  result = classifier.call(text: "Hello")   # uses claude-sonnet-4
end

LM Resolution Hierarchy

DSPy resolves the active language model in this order:

  1. Instance-level LM — set directly on a module instance via configure
  2. Fiber-local LM — set via DSPy.with_lm
  3. Global LM — set via DSPy.configure
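The precedence reduces to a nil-coalescing chain. A conceptual sketch (not DSPy's internal code):

```ruby
# Conceptual sketch of the LM resolution order; not DSPy's actual internals
def resolve_lm(instance_lm:, fiber_lm:, global_lm:)
  instance_lm || fiber_lm || global_lm
end

resolve_lm(instance_lm: nil,     fiber_lm: nil,   global_lm: :global) # => :global
resolve_lm(instance_lm: nil,     fiber_lm: :fast, global_lm: :global) # => :fast
resolve_lm(instance_lm: :claude, fiber_lm: :fast, global_lm: :global) # => :claude
```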

Instance-level configuration always wins, even inside a DSPy.with_lm block:

classifier = Classifier.new
classifier.configure { |c| c.lm = DSPy::LM.new('anthropic/claude-sonnet-4-20250514', api_key: ENV['ANTHROPIC_API_KEY']) }

fast = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])

DSPy.with_lm(fast) do
  classifier.call(text: "Test")  # still uses claude-sonnet-4 (instance-level wins)
end

configure_predictor for Fine-Grained Agent Control

Complex agents (ReAct, CodeAct, DeepResearch, DeepSearch) contain internal predictors. Use configure for a blanket override and configure_predictor to target a specific sub-predictor:

agent = DSPy::ReAct.new(MySignature, tools: tools)

# Set a default LM for the agent and all its children
agent.configure { |c| c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']) }

# Override just the reasoning predictor with a more capable model
agent.configure_predictor('thought_generator') do |c|
  c.lm = DSPy::LM.new('anthropic/claude-sonnet-4-20250514', api_key: ENV['ANTHROPIC_API_KEY'])
end

result = agent.call(question: "Summarize the report")

Both methods support chaining:

agent
  .configure { |c| c.lm = cheap_model }
  .configure_predictor('thought_generator') { |c| c.lm = expensive_model }

Available Predictors by Agent Type

| Agent | Internal Predictors |
|---|---|
| DSPy::ReAct | thought_generator, observation_processor |
| DSPy::CodeAct | code_generator, observation_processor |
| DSPy::DeepResearch | planner, synthesizer, qa_reviewer, reporter |
| DSPy::DeepSearch | seed_predictor, search_predictor, reader_predictor, reason_predictor |

Propagation Rules

  • Configuration propagates recursively to children and grandchildren.
  • Children with an already-configured LM are not overwritten by a later parent configure call.
  • Configure the parent first, then override specific children.

Feature-Flagged Model Selection

Use a FeatureFlags module backed by ENV vars to centralize model selection. Each tool or agent reads its model from the flags, falling back to a global default.

module FeatureFlags
  module_function

  def default_model
    ENV.fetch('DSPY_DEFAULT_MODEL', 'openai/gpt-4o-mini')
  end

  def default_api_key
    ENV.fetch('DSPY_DEFAULT_API_KEY') { ENV.fetch('OPENAI_API_KEY', nil) }
  end

  def model_for(tool_name)
    env_key = "DSPY_MODEL_#{tool_name.upcase}"
    ENV.fetch(env_key, default_model)
  end

  def api_key_for(tool_name)
    env_key = "DSPY_API_KEY_#{tool_name.upcase}"
    ENV.fetch(env_key, default_api_key)
  end
end
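A quick demonstration of the fallback order (a condensed, runnable copy of the module above):

```ruby
# Condensed copy of the FeatureFlags module above, runnable standalone
module FeatureFlags
  module_function

  def default_model
    ENV.fetch('DSPY_DEFAULT_MODEL', 'openai/gpt-4o-mini')
  end

  def model_for(tool_name)
    ENV.fetch("DSPY_MODEL_#{tool_name.upcase}", default_model)
  end
end

ENV['DSPY_MODEL_CLASSIFIER'] = 'anthropic/claude-sonnet-4-20250514'
FeatureFlags.model_for('classifier')  # per-tool override wins
FeatureFlags.model_for('summarizer')  # no override: falls back to default_model
```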

Per-Tool Model Override

Override an individual tool’s model without touching application code:

# .env
DSPY_DEFAULT_MODEL=openai/gpt-4o-mini
DSPY_DEFAULT_API_KEY=sk-...

# Override the classifier to use Claude
DSPY_MODEL_CLASSIFIER=anthropic/claude-sonnet-4-20250514
DSPY_API_KEY_CLASSIFIER=sk-ant-...

# Override the summarizer to use Gemini
DSPY_MODEL_SUMMARIZER=gemini/gemini-2.5-flash
DSPY_API_KEY_SUMMARIZER=...

Wire each agent to its flag at initialization:

class ClassifierAgent < DSPy::Module
  def initialize
    super
    model = FeatureFlags.model_for('classifier')
    api_key = FeatureFlags.api_key_for('classifier')

    @predictor = DSPy::Predict.new(ClassifySignature)
    configure { |c| c.lm = DSPy::LM.new(model, api_key: api_key) }
  end

  def forward(text:)
    @predictor.call(text: text)
  end
end

This pattern keeps model routing declarative and avoids scattering DSPy::LM.new calls across the codebase.


Compatibility Matrix

Feature support across direct adapter gems. All features listed assume structured_outputs: true (the default).

| Feature | OpenAI | Anthropic | Gemini | Ollama | OpenRouter | RubyLLM |
|---|---|---|---|---|---|---|
| Structured Output | Native JSON mode | Tool-based extraction | Native JSON schema | OpenAI-compatible JSON | Varies by model | Via with_schema |
| Vision (Images) | File + URL | File + Base64 | File + Base64 | Limited | Varies | Delegates to underlying provider |
| Image URLs | Yes | No | No | No | Varies | Depends on provider |
| Tool Calling | Yes | Yes | Yes | Varies | Varies | Yes |
| Streaming | Yes | Yes | Yes | Yes | Yes | Yes |

Notes:

  • Structured Output is enabled by default on every adapter. Set structured_outputs: false to fall back to enhanced-prompting extraction.
  • Vision / Image URLs: Only OpenAI supports passing a URL directly. For Anthropic and Gemini, load images from file or Base64:
    DSPy::Image.from_url("https://example.com/img.jpg")    # OpenAI only
    DSPy::Image.from_file("path/to/image.jpg")             # all providers
    DSPy::Image.from_base64(data, mime_type: "image/jpeg")  # all providers
  • RubyLLM delegates to the underlying provider, so feature support matches the provider column in the table.

Choosing an Adapter Strategy

| Scenario | Recommended Adapter |
|---|---|
| Single provider (OpenAI, Claude, or Gemini) | Dedicated gem (dspy-openai, dspy-anthropic, dspy-gemini) |
| Multi-provider with per-agent model routing | dspy-ruby_llm |
| AWS Bedrock or Google VertexAI | dspy-ruby_llm |
| Local development with Ollama | dspy-openai (Ollama sub-adapter) or dspy-ruby_llm |
| OpenRouter for cost optimization | dspy-openai (OpenRouter sub-adapter) |

| Provider | Model ID | Use Case |
|---|---|---|
| OpenAI | openai/gpt-4o-mini | Fast, cost-effective |
| Anthropic | anthropic/claude-sonnet-4-20250514 | Balanced reasoning |
| Gemini | gemini/gemini-2.5-flash | Fast, cost-effective |
| Ollama | ollama/llama3.2 | Local, zero API cost |

Reference: Toolsets

DSPy.rb Toolsets

Tools::Base

DSPy::Tools::Base is the base class for single-purpose tools. Each subclass exposes one operation to an LLM agent through a call method.

Defining a Tool

Set the tool’s identity with the tool_name and tool_description class-level DSL methods. Define the call instance method with a Sorbet sig declaration so DSPy.rb can generate the JSON schema the LLM uses to invoke the tool.

class WeatherLookup < DSPy::Tools::Base
  extend T::Sig

  tool_name "weather_lookup"
  tool_description "Look up current weather for a given city"

  sig { params(city: String, units: T.nilable(String)).returns(String) }
  def call(city:, units: nil)
    # Fetch weather data and return a string summary
    "72F and sunny in #{city}"
  end
end

Key points:

  • Inherit from DSPy::Tools::Base, not DSPy::Tool.
  • Use tool_name (class method) to set the name the LLM sees. Without it, the class name is lowercased as a fallback.
  • Use tool_description (class method) to set the human-readable description surfaced in the tool schema.
  • The call method should use keyword arguments. Positional arguments are supported, but keyword arguments produce better schemas.
  • Always attach a Sorbet sig to call. Without a signature, the generated schema has empty properties and the LLM cannot determine parameter types.

Schema Generation

call_schema_object introspects the Sorbet signature on call and returns a hash representing the JSON Schema parameters object:

WeatherLookup.call_schema_object
# => {
#   type: "object",
#   properties: {
#     city:  { type: "string", description: "Parameter city" },
#     units: { type: "string", description: "Parameter units (optional)" }
#   },
#   required: ["city"]
# }

call_schema wraps this in the full LLM tool-calling format:

WeatherLookup.call_schema
# => {
#   type: "function",
#   function: {
#     name: "call",
#     description: "Call the WeatherLookup tool",
#     parameters: { ... }
#   }
# }

Using Tools with ReAct

Pass tool instances in an array to DSPy::ReAct:

agent = DSPy::ReAct.new(
  MySignature,
  tools: [WeatherLookup.new, AnotherTool.new]
)

result = agent.call(question: "What is the weather in Berlin?")
puts result.answer

Access output fields with dot notation (result.answer), not hash access (result[:answer]).


Tools::Toolset

DSPy::Tools::Toolset groups multiple related methods into a single class. Each exposed method becomes an independent tool from the LLM’s perspective.

Defining a Toolset

class DatabaseToolset < DSPy::Tools::Toolset
  extend T::Sig

  toolset_name "db"

  tool :query,  description: "Run a read-only SQL query"
  tool :insert, description: "Insert a record into a table"
  tool :delete, description: "Delete a record by ID"

  sig { params(sql: String).returns(String) }
  def query(sql:)
    # Execute read query
  end

  sig { params(table: String, data: T::Hash[String, String]).returns(String) }
  def insert(table:, data:)
    # Insert record
  end

  sig { params(table: String, id: Integer).returns(String) }
  def delete(table:, id:)
    # Delete record
  end
end

DSL Methods

toolset_name(name) — Set the prefix for all generated tool names. If omitted, the class name minus Toolset suffix is lowercased (e.g., DatabaseToolset becomes database).

toolset_name "db"
# tool :query produces a tool named "db_query"
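The fallback rule can be sketched as follows (the namespace handling is an assumption about the implementation):

```ruby
# Sketch of the default name rule: strip the Toolset suffix, then lowercase.
# Namespace handling here is an assumption about the implementation.
def default_toolset_name(class_name)
  class_name.split('::').last.sub(/Toolset\z/, '').downcase
end

default_toolset_name('DatabaseToolset')    # => "database"
default_toolset_name('Tools::ApiToolset')  # => "api"
```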

tool(method_name, tool_name:, description:) — Expose a method as a tool.

  • method_name (Symbol, required) — the instance method to expose.
  • tool_name: (String, optional) — override the default <toolset_name>_<method_name> naming.
  • description: (String, optional) — description shown to the LLM. Defaults to a humanized version of the method name.

tool :word_count, tool_name: "text_wc", description: "Count lines, words, and characters"
# Produces a tool named "text_wc" instead of "text_word_count"

Converting to a Tool Array

Call to_tools on the class (not an instance) to get an array of ToolProxy objects compatible with DSPy::Tools::Base:

agent = DSPy::ReAct.new(
  AnalyzeText,
  tools: DatabaseToolset.to_tools
)

Each ToolProxy wraps one method, delegates call to the underlying toolset instance, and generates its own JSON schema from the method’s Sorbet signature.

Shared State

All tool proxies from a single to_tools call share one toolset instance. Store shared state (connections, caches, configuration) in the toolset’s initialize:

class ApiToolset < DSPy::Tools::Toolset
  extend T::Sig

  toolset_name "api"

  tool :get,  description: "Make a GET request"
  tool :post, description: "Make a POST request"

  sig { params(base_url: String).void }
  def initialize(base_url:)
    @base_url = base_url
    @client = HTTP.persistent(base_url)
  end

  sig { params(path: String).returns(String) }
  def get(path:)
    @client.get("#{@base_url}#{path}").body.to_s
  end

  sig { params(path: String, body: String).returns(String) }
  def post(path:, body:)
    @client.post("#{@base_url}#{path}", body: body).body.to_s
  end
end

Type Safety

Sorbet signatures on tool methods drive both JSON schema generation and automatic type coercion of LLM responses.

Basic Types

sig { params(
  text: String,
  count: Integer,
  score: Float,
  enabled: T::Boolean,
  threshold: Numeric
).returns(String) }
def analyze(text:, count:, score:, enabled:, threshold:)
  # ...
end

| Sorbet Type | JSON Schema |
|---|---|
| String | `{"type": "string"}` |
| Integer | `{"type": "integer"}` |
| Float | `{"type": "number"}` |
| Numeric | `{"type": "number"}` |
| T::Boolean | `{"type": "boolean"}` |
| T::Enum | `{"type": "string", "enum": [...]}` |
| T::Struct | `{"type": "object", "properties": {...}}` |
| T::Array[Type] | `{"type": "array", "items": {...}}` |
| T::Hash[K, V] | `{"type": "object", "additionalProperties": {...}}` |
| T.nilable(Type) | `{"type": [original, "null"]}` |
| T.any(T1, T2) | `{"oneOf": [{...}, {...}]}` |
| T.class_of(X) | `{"type": "string"}` |

T::Enum Parameters

Define a T::Enum and reference it in a tool signature. DSPy.rb generates a JSON Schema enum constraint and automatically deserializes the LLM’s string response into the correct enum instance.

class Priority < T::Enum
  enums do
    Low = new('low')
    Medium = new('medium')
    High = new('high')
    Critical = new('critical')
  end
end

class Status < T::Enum
  enums do
    Pending = new('pending')
    InProgress = new('in-progress')
    Completed = new('completed')
  end
end

sig { params(priority: Priority, status: Status).returns(String) }
def update_task(priority:, status:)
  "Updated to #{priority.serialize} / #{status.serialize}"
end

The generated schema constrains the parameter to valid values:

{
  "priority": {
    "type": "string",
    "enum": ["low", "medium", "high", "critical"]
  }
}

Case-insensitive matching: When the LLM returns "HIGH" or "High" instead of "high", DSPy.rb first tries an exact try_deserialize, then falls back to a case-insensitive lookup. This prevents failures caused by LLM casing variations.
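The two-step lookup can be sketched without sorbet-runtime, using the serialized values in place of enum instances (assumed behavior per the description above):

```ruby
# Sketch of exact-then-case-insensitive matching; plain strings stand in for T::Enum
SERIALIZED_PRIORITIES = %w[low medium high critical].freeze

def coerce_priority(raw)
  return raw if SERIALIZED_PRIORITIES.include?(raw)   # exact try_deserialize first
  SERIALIZED_PRIORITIES.find { |v| v.casecmp?(raw) }  # case-insensitive fallback
end

coerce_priority('high')   # exact match
coerce_priority('HIGH')   # fallback recovers "high"
coerce_priority('bogus')  # nil: no match either way
```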

T::Struct Parameters

Use T::Struct for complex nested objects. DSPy.rb generates nested JSON Schema properties and recursively coerces the LLM’s hash response into struct instances.

class TaskMetadata < T::Struct
  prop :id, String
  prop :priority, Priority
  prop :tags, T::Array[String]
  prop :estimated_hours, T.nilable(Float), default: nil
end

class TaskRequest < T::Struct
  prop :title, String
  prop :description, String
  prop :status, Status
  prop :metadata, TaskMetadata
  prop :assignees, T::Array[String]
end

sig { params(task: TaskRequest).returns(String) }
def create_task(task:)
  "Created: #{task.title} (#{task.status.serialize})"
end

The LLM sees the full nested object schema and DSPy.rb reconstructs the struct tree from the JSON response, including enum fields inside nested structs.

Nilable Parameters

Mark optional parameters with T.nilable(...) and provide a default value of nil in the method signature. These parameters are excluded from the JSON Schema required array.

sig { params(
  query: String,
  max_results: T.nilable(Integer),
  filter: T.nilable(String)
).returns(String) }
def search(query:, max_results: nil, filter: nil)
  # query is required; max_results and filter are optional
end
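The effect on the schema can be sketched with a hypothetical helper that mirrors the rule (nilable parameters stay in properties but leave required):

```ruby
# Hypothetical helper mirroring the rule: nilable params are excluded from "required"
def required_list(params)
  params.reject { |_name, meta| meta[:nilable] }.keys.map(&:to_s)
end

required_list(
  query:       { nilable: false },
  max_results: { nilable: true },
  filter:      { nilable: true }
)
# => ["query"]
```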

Collections

Typed arrays and hashes generate precise item/value schemas:

sig { params(
  tags: T::Array[String],
  priorities: T::Array[Priority],
  config: T::Hash[String, T.any(String, Integer, Float)]
).returns(String) }
def configure(tags:, priorities:, config:)
  # Array elements and hash values are validated and coerced
end

Union Types

T.any(...) generates a oneOf JSON Schema. When one of the union members is a T::Struct, DSPy.rb uses the _type discriminator field to select the correct struct class during coercion.

sig { params(value: T.any(String, Integer, Float)).returns(String) }
def handle_flexible(value:)
  # Accepts multiple types
end

Built-in Toolsets

TextProcessingToolset

DSPy::Tools::TextProcessingToolset provides Unix-style text analysis and manipulation operations. Toolset name prefix: text.

| Tool Name | Method | Description |
|---|---|---|
| text_grep | grep | Search for patterns with optional case-insensitive and count-only modes |
| text_wc | word_count | Count lines, words, and characters |
| text_rg | ripgrep | Fast pattern search with context lines |
| text_extract_lines | extract_lines | Extract a range of lines by number |
| text_filter_lines | filter_lines | Keep or reject lines matching a regex |
| text_unique_lines | unique_lines | Deduplicate lines, optionally preserving order |
| text_sort_lines | sort_lines | Sort lines alphabetically or numerically |
| text_summarize_text | summarize_text | Produce a statistical summary (counts, averages, frequent words) |

Usage:

agent = DSPy::ReAct.new(
  AnalyzeText,
  tools: DSPy::Tools::TextProcessingToolset.to_tools
)

result = agent.call(text: log_contents, question: "How many error lines are there?")
puts result.answer

GitHubCLIToolset

DSPy::Tools::GitHubCLIToolset wraps the gh CLI for read-oriented GitHub operations. Toolset name prefix: github.

| Tool Name | Method | Description |
|---|---|---|
| github_list_issues | list_issues | List issues filtered by state, labels, assignee |
| github_list_prs | list_prs | List pull requests filtered by state, author, base |
| github_get_issue | get_issue | Retrieve details of a single issue |
| github_get_pr | get_pr | Retrieve details of a single pull request |
| github_api_request | api_request | Make an arbitrary GET request to the GitHub API |
| github_traffic_views | traffic_views | Fetch repository traffic view counts |
| github_traffic_clones | traffic_clones | Fetch repository traffic clone counts |

This toolset uses T::Enum parameters (IssueState, PRState, ReviewState) for state filters, demonstrating enum-based tool signatures in practice.

agent = DSPy::ReAct.new(
  RepoAnalysis,
  tools: DSPy::Tools::GitHubCLIToolset.to_tools
)

Testing

Unit Testing Individual Tools

Test DSPy::Tools::Base subclasses by instantiating and calling call directly:

RSpec.describe WeatherLookup do
  subject(:tool) { described_class.new }

  it "returns weather for a city" do
    result = tool.call(city: "Berlin")
    expect(result).to include("Berlin")
  end

  it "exposes the correct tool name" do
    expect(tool.name).to eq("weather_lookup")
  end

  it "generates a valid schema" do
    schema = described_class.call_schema_object
    expect(schema[:required]).to include("city")
    expect(schema[:properties]).to have_key(:city)
  end
end

Unit Testing Toolsets

Test toolset methods directly on an instance. Verify tool generation with to_tools:

RSpec.describe DatabaseToolset do
  subject(:toolset) { described_class.new }

  it "executes a query" do
    result = toolset.query(sql: "SELECT 1")
    expect(result).to be_a(String)
  end

  it "generates tools with correct names" do
    tools = described_class.to_tools
    names = tools.map(&:name)
    expect(names).to contain_exactly("db_query", "db_insert", "db_delete")
  end

  it "generates tool descriptions" do
    tools = described_class.to_tools
    query_tool = tools.find { |t| t.name == "db_query" }
    expect(query_tool.description).to eq("Run a read-only SQL query")
  end
end

Mocking Predictions Inside Tools

When a tool calls a DSPy predictor internally, stub the predictor to isolate tool logic from LLM calls:

class SmartSearchTool < DSPy::Tools::Base
  extend T::Sig

  tool_name "smart_search"
  tool_description "Search with query expansion"

  sig { void }
  def initialize
    @expander = DSPy::Predict.new(QueryExpansionSignature)
  end

  sig { params(query: String).returns(String) }
  def call(query:)
    expanded = @expander.call(query: query)
    perform_search(expanded.expanded_query)
  end

  private

  def perform_search(query)
    # actual search logic
  end
end

RSpec.describe SmartSearchTool do
  subject(:tool) { described_class.new }

  before do
    expansion_result = double("result", expanded_query: "expanded test query")
    allow_any_instance_of(DSPy::Predict).to receive(:call).and_return(expansion_result)
  end

  it "expands the query before searching" do
    allow(tool).to receive(:perform_search).with("expanded test query").and_return("found 3 results")
    result = tool.call(query: "test")
    expect(result).to eq("found 3 results")
  end
end

Testing Enum Coercion

Verify that string values from LLM responses deserialize into the correct enum instances:

RSpec.describe "enum coercion" do
  it "handles case-insensitive enum values" do
    toolset = GitHubCLIToolset.new
    # The LLM may return "OPEN" instead of "open"
    result = toolset.list_issues(state: IssueState::Open)
    expect(result).to be_a(String)
  end
end

Constraints

  • Exposed tool methods should use keyword arguments. Positional-only parameters still generate schemas, but keyword arguments produce more reliable LLM interactions.
  • Each exposed method becomes a separate, independent tool. Method chaining or multi-step sequences within a single tool call are not supported.
  • Shared state across tool proxies is scoped to a single to_tools call. Separate to_tools invocations create separate toolset instances.
  • Methods without a Sorbet sig produce an empty parameter schema. The LLM will not know what arguments to pass.
#dspy #ruby
