Skip to content

Feedback & Calibration Guide

Drift includes a Bayesian learning model that adjusts signal weights based on your feedback. Over time, drift learns which signals are accurate for your codebase and which produce false positives — automatically tuning detection to your context.

How it works

Drift uses three evidence sources to calibrate signal weights:

                    ┌──────────────────┐
                    │  Default Weights  │
                    │  (ablation study) │
                    └────────┬─────────┘
         ┌───────────────────┼───────────────────┐
         │                   │                   │
   ┌─────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
   │   Manual    │    │    Git      │    │   GitHub    │
   │  Feedback   │    │ Correlation │    │  Correlation│
   │  (CLI/API)  │    │ (auto)      │    │  (auto)     │
   └─────┬──────┘    └──────┬──────┘    └──────┬──────┘
         │                   │                   │
         └───────────────────┼───────────────────┘
                    ┌────────▼─────────┐
                    │  Bayesian Engine  │
                    │  build_profile()  │
                    └────────┬─────────┘
                    ┌────────▼─────────┐
                    │ Calibrated Weights│
                    │   (per-repo)      │
                    └──────────────────┘

Evidence source 1: Manual feedback

You mark findings as true positive (TP), false positive (FP), or false negative (FN):

# This finding is real — I fixed it
drift feedback mark --mark tp --signal PFS --file src/core/handler.py

# This is a false alarm — intentional pattern
drift feedback mark --mark fp --signal AVS --file src/api/routes.py \
    --reason "Allowed cross-layer import by design"

# Drift missed this problem entirely
drift feedback mark --mark fn --signal MDS --file src/utils/helpers.py

Evidence source 2: Git correlation (automatic)

Drift correlates historical findings with subsequent defect-fix commits. If a file flagged by a signal later receives a bugfix commit (matching patterns like fix:, bug, hotfix, revert), that counts as automatic TP evidence.

If no defect-fix appears within a configurable window (default 60 days), that counts as weak FP evidence.

Evidence source 3: GitHub issue correlation (automatic)

When a GitHub token is configured, drift correlates closed bug-labeled issues with the files changed in their fixing PRs. If a signal flagged those files, it gains TP evidence. If no signal flagged buggy files, those signals gain FN evidence.

The Bayesian formula

For each signal, drift computes calibrated weights using confidence-gated Bayesian interpolation:

\[ \text{confidence} = \min\left(1.0,\ \frac{TP + FP}{\text{min\_samples}}\right) \]
\[ \text{precision} = \frac{TP}{TP + FP} \]
\[ w_{\text{calibrated}} = (1 - \text{confidence}) \times w_{\text{default}} + \text{confidence} \times w_{\text{default}} \times \text{precision} \]

In plain language:

  • With few observations → weight stays close to the default (conservative)
  • With many observations and high precision → weight stays high
  • With many observations but low precision → weight drops (signal has too many false positives)
  • A safety floor prevents any signal from being fully suppressed (minimum 0.001)

FN boost

If a signal has false negatives (missed real problems), drift can boost its weight:

\[ w_{\text{calibrated}} \mathrel{+}= w_{\text{default}} \times \text{fn\_boost\_factor} \times \frac{FN}{TP + FN} \times \text{confidence} \]

Quick start

1. Enable calibration

# drift.yaml
calibration:
  enabled: true

2. Run analysis and review findings

drift analyze --repo .

3. Mark findings as you review them

# Real problem — true positive
drift feedback mark -m tp -s PFS -f src/core/handler.py

# False alarm — false positive
drift feedback mark -m fp -s AVS -f src/api/routes.py --reason "Intentional"

# Missed problem — false negative
drift feedback mark -m fn -s MDS -f src/utils/helpers.py

4. Check feedback summary

drift feedback summary

Output shows TP/FP/FN counts, precision, recall, and F1 per signal.

5. Run calibration

# Preview changes without applying
drift calibrate run --dry-run

# Apply calibrated weights to drift.yaml
drift calibrate run

# See detailed evidence per signal
drift calibrate explain

6. Verify calibration status

drift calibrate status

Auto-calibration

Enable continuous calibration — weights are recomputed on every drift analyze run:

calibration:
  enabled: true
  auto_recalibrate: true

GitHub correlation setup

For automatic issue↔finding correlation:

calibration:
  enabled: true
  github_token: null  # Or set DRIFT_GITHUB_TOKEN env var
  bug_labels:
    - bug
    - regression
    - defect

Import existing feedback

If you have prior feedback data (e.g., from another system), import it:

drift feedback import /path/to/prior_feedback.jsonl

Format: one JSON object per line with fields signal_type, file_path, verdict (tp/fp/fn).

Calibration lifecycle

Week 1:  drift analyze → review findings → mark TP/FP → calibrate run
Week 2:  drift analyze (auto-recalibrate) → fewer false alarms
Week 4:  confidence reaches 100% for active signals
Week 8:  Git correlation adds automatic evidence
Week 12: GitHub correlation enriches the profile further

How much feedback is actually needed?

Confidence is a linear ramp per signal — it kicks in immediately and grows with every observation:

TP+FP observations (per signal) Confidence Effect on weights
0 0 % No effect — default weight is used
5 25 % Minor shift toward observed precision
10 50 % Weight is a 50/50 blend of default and precision-scaled
15 75 % Weight mostly follows observed precision
20 100 % Full calibration: weight = default × observed_precision

The threshold (default min_samples: 20) applies per signal and per repo. A repo with 6 active signals needs up to 120 individual feedback entries for full coverage across all signals — but partial calibration is already effective from the first few entries.

Reset calibration

To revert to default weights:

drift calibrate reset

Configuration reference

Field Default Description
calibration.enabled false Master switch
calibration.min_samples 20 Observations for full confidence
calibration.correlation_window_days 30 Days to look for defect-fix commits
calibration.decay_days 90 Stale profile threshold
calibration.weak_fp_window_days 60 No fix in window → weak FP
calibration.fn_boost_factor 0.1 FN boost strength (0.0–1.0)
calibration.auto_recalibrate false Auto-calibrate on each analyze
calibration.github_token null GitHub API token
calibration.bug_labels ["bug", "regression", "defect"] Bug issue labels
calibration.feedback_path ".drift/feedback.jsonl" Feedback storage
calibration.history_dir ".drift/history" Scan history snapshots
calibration.max_snapshots 20 Max retained snapshots

Storage

File Purpose
.drift/feedback.jsonl Append-only feedback log (TP/FP/FN verdicts)
.drift/history/scan_*.json Historical scan snapshots for outcome correlation
.drift/history/calibration_profile.json Computed calibration profile

All paths are relative to repository root and configurable in drift.yaml.