False positives in PII/secret/crypto rules after indexer schema update #19

@dev-corelift

## Summary

Several security/privacy rules have been producing false positives since the object-literal and assignment normalization landed in the indexer. The rules
still rely on substring heuristics instead of the structured data (`assignments.value_type`, literal tables, resolved call metadata), so they report routine
code paths as critical findings.

## Affected rules

| Rule name | Example | What actually happens |
| --- | --- | --- |
| `secret-hardcoded-assignment` | `packages/edge/src/api/cache.ts:35` (`const apiKey = request.headers.get('X-API-Key')`) | Header lookup is treated as a literal secret because the rule doesn’t check `value_type` |
| `crypto-weak-encryption` | `packages/edge/src/api/github.ts:240` (`changes.some(...includes('robots.txt'))`) | The helper sees `includes(` and matches `des(`; there’s no DES usage |
| `pii-api-exposure` | `packages/edge/src/api/dashboard.ts:77` (`GET /api/dashboard`) | The stored path string contains `age` (e.g. `packages/...` or query param `time_range`), so the rule assumes `GET-age` in the route |
| `pii-error-response` | `packages/edge/src/api/sites.ts:989` (`message: 'Site updated successfully'`) | Flagged because `message` contains `age` |
| `pii-unencrypted-storage` | `packages/frontend/src/app/pages/sites/RemovalWizard.tsx:453` (React render loop) | Literal strings such as `package.json` contain `age` → rule thinks `ssn` is being stored |

Additional hits can be reproduced by running:

    aud full --offline
    jq '.findings[] | select(.pattern_name=="secret-hardcoded-assignment")' .pf/raw/patterns.json

(Replace the pattern name to inspect the other buckets.)
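
To see all buckets at once instead of re-running `jq` per rule, a small script along these lines can tally findings per pattern. This is only a sketch: it assumes `.pf/raw/patterns.json` has the top-level `findings` array with a `pattern_name` field that the `jq` filter above relies on, and `count_findings` is a hypothetical helper name, not part of TheAuditor.

```python
import json
from collections import Counter
from pathlib import Path


def count_findings(report: dict) -> Counter:
    """Count findings per pattern_name in a parsed patterns.json payload."""
    return Counter(
        finding.get("pattern_name", "<unknown>")
        for finding in report.get("findings", [])
    )


if __name__ == "__main__":
    # Path taken from the jq command above; adjust if your output dir differs.
    path = Path(".pf/raw/patterns.json")
    if path.exists():
        report = json.loads(path.read_text(encoding="utf-8"))
        for name, total in count_findings(report).most_common():
            print(f"{total:5d}  {name}")
    else:
        print(f"{path} not found; run `aud full --offline` first")
```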

## Root cause

The indexer now records assignments and call arguments in normalized tables with `value_type`, `literal_value`, and function metadata. The rules still match
against raw strings, so any substring that coincidentally contains `des`, `age`, `ssn`, etc. triggers a finding.
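
The failure mode is easy to reproduce outside the rules. The snippet below is an illustration only, not the rules' actual code: `substring_hit` stands in for the current heuristic and `tokens` is a hypothetical tokenizer that splits on punctuation and camelCase boundaries, so a keyword only matches as a whole identifier part.

```python
import re


def substring_hit(source: str, needle: str) -> bool:
    """Stand-in for the current heuristic: plain substring containment."""
    return needle in source.lower()


def tokens(text: str) -> set[str]:
    """Split text into lowercase identifier parts (punctuation, _ and camelCase)."""
    out: set[str] = set()
    for part in re.split(r"[^A-Za-z0-9]+", text):
        # Acronym runs, capitalized/lowercase words, and digit runs.
        for piece in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part):
            out.add(piece.lower())
    return out


# Examples taken from the table above.
weak_crypto = "changes.some(...includes('robots.txt'))"
error_msg = "message: 'Site updated successfully'"

print(substring_hit(weak_crypto, "des("), "des" in tokens(weak_crypto))
print(substring_hit(error_msg, "age"), "age" in tokens(error_msg))
print("age" in tokens("userAge"))  # a real `age` field still matches
```

Token matching is only a stopgap; the structured columns are the better source of truth where they exist.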

## Proposed fix

1. Update the affected rules to query the normalized schema:
    - Use assignments.value_type == 'string' or the new literal table when checking for hardcoded secrets.
    - Inspect resolved call names (e.g. assignment_id → function_call_args.callee_function) instead of substring matching.
    - For PII detectors, normalize tokens via the DB columns rather than raw string comparisons.
2. Add regression tests that cover the lovaseo examples so these code paths stay quiet.
3. Document the schema dependency in each rule module.
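
A rough sketch of what steps 1 and 2 could look like against an in-memory SQLite database. Only `assignments.value_type`, `literal_value`, `assignment_id`, and `function_call_args.callee_function` come from the description above; every other column (`id`, `file`, `line`, `target_var`), the `'call'` value type, the name list, the helper names, and the second demo row are assumptions made for illustration, so adjust to the real schema.

```python
import sqlite3

# Hypothetical, trimmed-down version of the normalized tables.
SCHEMA = """
CREATE TABLE assignments (
    id INTEGER PRIMARY KEY,
    file TEXT, line INTEGER, target_var TEXT,
    value_type TEXT,      -- assumed values: 'string', 'call', ...
    literal_value TEXT
);
CREATE TABLE function_call_args (
    assignment_id INTEGER REFERENCES assignments(id),
    callee_function TEXT
);
"""

# Illustrative list of secret-looking variable names (lowercase, no underscores).
SECRET_NAMES = {"apikey", "secret", "token", "password"}


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO assignments VALUES (?, ?, ?, ?, ?, ?)",
        [
            # Header lookup from the issue: a call result, not a literal.
            (1, "packages/edge/src/api/cache.ts", 35, "apiKey", "call", None),
            # Made-up true positive so the rule still has something to find.
            (2, "example/config.ts", 3, "apiKey", "string", "not-a-real-key"),
        ],
    )
    conn.execute("INSERT INTO function_call_args VALUES (1, 'request.headers.get')")
    return conn


def find_hardcoded_secrets(conn: sqlite3.Connection) -> list[tuple]:
    """Flag only string-literal assignments to secret-looking names."""
    rows = conn.execute(
        "SELECT file, line, target_var FROM assignments "
        "WHERE value_type = 'string' AND literal_value <> '' "
        "ORDER BY file, line"
    ).fetchall()
    return [r for r in rows if r[2].lower().replace("_", "") in SECRET_NAMES]


def assignments_calling(conn: sqlite3.Connection, callees: set[str]) -> list[tuple]:
    """Match resolved callee names exactly instead of scanning source text."""
    if not callees:
        return []
    marks = ",".join("?" * len(callees))
    return conn.execute(
        "SELECT a.file, a.line, c.callee_function FROM assignments a "
        "JOIN function_call_args c ON c.assignment_id = a.id "
        f"WHERE c.callee_function IN ({marks}) ORDER BY a.file, a.line",
        sorted(callees),
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(find_hardcoded_secrets(db))  # cache.ts:35 should not appear
    print(assignments_calling(db, {"request.headers.get"}))
```

The same fixture doubles as a regression test: assert that the `cache.ts:35` row never shows up in `find_hardcoded_secrets`, and add one row per example from the table.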

## Environment

- TheAuditor v1.4.2-RC1
- Python 3.12.3, aud full --offline
