Cohesion Deficit Signal (COD)¶
Signal ID: COD
Full name: Cohesion Deficit
Type: Scoring signal (contributes to drift score)
Default weight: 0.01
Scope: cross_file
What COD detects¶
COD detects "god modules" — files containing many semantically unrelated responsibilities. When a file has functions that share little naming overlap, it's likely doing too many things at once. This is especially common in AI-generated code, where an LLM adds new functionality to the nearest available file.
Before — low cohesion¶
# utils.py
def validate_email(email): ...
def connect_database(url): ...
def format_currency(amount): ...
def resize_image(path, width): ...
def send_notification(user, message): ...
def parse_csv_headers(content): ...
Six functions with zero semantic overlap — a classic "utility dumping ground."
After — cohesive modules¶
# validation.py
def validate_email(email): ...
def validate_phone(phone): ...
# formatting.py
def format_currency(amount): ...
def format_date(dt): ...
# notifications.py
def send_notification(user, message): ...
Why cohesion deficits matter¶
- Change amplification — any feature change might touch the god module, increasing merge conflicts.
- Testing difficulty — a test for one function requires importing unrelated dependencies.
- AI grows god modules — code assistants see the file and keep adding to it.
- Ownership ambiguity — no one team or developer owns the full module.
How the score is calculated¶
COD uses pairwise name-overlap (Jaccard similarity) on function and class names within a file:
- Tokenize names — split
validate_emailinto{validate, email}. - Compute pairwise Jaccard — measure word overlap between all function name pairs.
- Average similarity — lower average = lower cohesion = higher score.
Severity thresholds:
| Score range | Severity |
|---|---|
| ≥ 0.7 | HIGH |
| ≥ 0.5 | MEDIUM |
| ≥ 0.3 | LOW |
| < 0.3 | INFO |
How to fix COD findings¶
- Split the module — group related functions into separate files.
- Use the "single reason to change" rule — each module should have one cohesive responsibility.
- Create a package — replace
utils.pywith autils/directory containing focused modules. - Move incrementally — introduce the new structure and migrate function by function.
Configuration¶
Detection details¶
- Parse all functions and classes from AST.
- Tokenize names using underscore/camelCase splitting.
- Compute pairwise Jaccard similarity for all name pairs in the file.
- Average across all pairs to get a cohesion score.
- Invert — low cohesion = high deficit score.
COD is deterministic, AST-based, and uses no LLM calls.
Related signals¶
- MDS (Mutant Duplicates) — detects duplicate functions. COD detects unrelated functions in the same file.
- FOE (Fan-Out Explosion) — detects files importing too many modules. COD detects files containing too many responsibilities.