Auto-Categorization Guide
Version: 1.0
Last Updated: November 28, 2025
Overview
The auto-categorization system classifies uploaded documents to decide which ones need full AI analysis. This cuts API calls by 74% while ensuring that critical documents are always analyzed.
How It Works
Two-Stage Process
1. Filename Pattern Matching (instant, ~80% accuracy)
   - Uses 60+ patterns to match common document types
   - Returns category and confidence score
2. AI Scan Fallback (when confidence < 0.8)
   - Analyzes first page with AI
   - Provides reasoning and higher confidence
Decision Flow
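The two-stage process can be sketched roughly as follows. This is a minimal illustration, not the service's actual implementation: the `Result` type, the function names, the stubbed AI scan, the `uncategorized` fallback, and the 0.95 score assigned to a pattern hit are all assumptions. Only the 0.8 threshold and the pattern strings come from this guide.

```python
import re
from typing import Callable, NamedTuple


class Result(NamedTuple):
    category: str
    confidence: float
    reasoning: str


# Small subset of the documented patterns, for illustration only.
PATTERNS = {
    "critical": [r"(?i)inspection.*report", r"(?i)seller.*disclosure"],
    "important": [r"(?i)loan.*estimate", r"(?i)appraisal.*report"],
    "optional": [r"(?i)purchase.*agreement", r"(?i)addendum"],
    "noise": [r"(?i)receipt", r"(?i)checklist"],
}

AI_FALLBACK_THRESHOLD = 0.8  # from the guide: AI scan runs when confidence < 0.8


def match_filename(filename: str) -> Result:
    """Stage 1: return the first category with a matching pattern."""
    for category, patterns in PATTERNS.items():
        if any(re.search(p, filename) for p in patterns):
            # 0.95 is an assumed score for a pattern hit.
            return Result(category, 0.95, f"Filename matches {category} patterns")
    return Result("uncategorized", 0.0, "No filename pattern matched")


def categorize(filename: str, ai_scan: Callable[[str], Result]) -> Result:
    """Stage 2: fall back to the AI scan only when stage 1 is not confident."""
    result = match_filename(filename)
    if result.confidence < AI_FALLBACK_THRESHOLD:
        return ai_scan(filename)
    return result


def fake_ai_scan(filename: str) -> Result:
    # Stand-in for the real first-page AI scan (hypothetical).
    return Result("optional", 0.9, "Stubbed AI scan of first page")


print(categorize("Home_Inspection_Report.pdf", fake_ai_scan).category)
print(categorize("scan.pdf", fake_ai_scan).reasoning)
```

In this sketch the categories are checked in the order critical, important, optional, noise, so a filename that matches patterns in several categories gets the highest-priority one. Whether the real system resolves overlaps this way is not stated in this guide.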
Categories
Critical (Auto-Analyzed)
Purpose: Documents essential for safety, legal compliance, or major financial decisions.
Filename Patterns:
(?i)inspection.*report
(?i)home.*inspection
(?i)property.*inspection
(?i)structural.*report
(?i)pest.*inspection
(?i)termite.*report
(?i)seller.*disclosure
(?i)property.*disclosure
(?i)natural.*hazard.*disclosure
(?i)lead.*based.*paint
(?i)environmental.*disclosure
(?i)material.*facts.*disclosure
Example Filenames:
- Home_Inspection_Report_123_Main_St.pdf
- Seller_Property_Disclosure_Statement.pdf
- Pest_Termite_Inspection_2024.pdf
- Natural_Hazard_Disclosure_Form.pdf
- Lead_Based_Paint_Disclosure.pdf
Auto-Analysis: ✅ Yes (queued immediately on upload)
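To see how these patterns behave, the sketch below runs the critical patterns against a filename using Python's `re` module. It assumes the patterns are applied as unanchored searches (so they can match anywhere in the filename); the helper name `matching_patterns` is hypothetical.

```python
import re

# Critical patterns exactly as listed in this guide.
CRITICAL_PATTERNS = [
    r"(?i)inspection.*report",
    r"(?i)home.*inspection",
    r"(?i)property.*inspection",
    r"(?i)structural.*report",
    r"(?i)pest.*inspection",
    r"(?i)termite.*report",
    r"(?i)seller.*disclosure",
    r"(?i)property.*disclosure",
    r"(?i)natural.*hazard.*disclosure",
    r"(?i)lead.*based.*paint",
    r"(?i)environmental.*disclosure",
    r"(?i)material.*facts.*disclosure",
]

# Compile each pattern separately: the (?i) flag must stay at the start
# of its own expression.
COMPILED = [re.compile(p) for p in CRITICAL_PATTERNS]


def matching_patterns(filename: str) -> list[str]:
    """Return every critical pattern that matches somewhere in the filename."""
    return [c.pattern for c in COMPILED if c.search(filename)]


print(matching_patterns("Pest_Termite_Inspection_2024.pdf"))
# ['(?i)pest.*inspection']
print(matching_patterns("scan.pdf"))
# []
```

Because `.*` matches any run of characters, separators such as underscores, hyphens, or spaces between the keywords do not affect the result, and `(?i)` makes the match case-insensitive.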
Important (Auto-Analyzed)
Purpose: Documents providing significant financial or legal value.
Filename Patterns:
(?i)loan.*estimate
(?i)closing.*disclosure
(?i)mortgage.*application
(?i)pre.*approval
(?i)financing.*terms
(?i)lending.*disclosure
(?i)appraisal.*report
(?i)property.*valuation
(?i)comparative.*market.*analysis
(?i)cma.*report
(?i)title.*report
(?i)preliminary.*title
(?i)title.*commitment
Example Filenames:
- Loan_Estimate_Wells_Fargo.pdf
- Closing_Disclosure_Final.pdf
- Appraisal_Report_456_Oak_Ave.pdf
- Preliminary_Title_Report.pdf
- Mortgage_Pre_Approval_Letter.pdf
Auto-Analysis: ✅ Yes (queued immediately on upload)
Optional (On-Demand)
Purpose: Documents analyzed only when the user specifically asks about them.
Filename Patterns:
(?i)purchase.*agreement
(?i)sales.*contract
(?i)listing.*agreement
(?i)addendum
(?i)amendment
(?i)buyer.*representation
(?i)agency.*disclosure
(?i)escrow.*instructions
(?i)wire.*transfer.*authorization
Example Filenames:
- Purchase_Agreement_Residential.pdf
- Sales_Contract_Signed.pdf
- Addendum_A_Repairs.pdf
- Buyer_Representation_Agreement.pdf
- Escrow_Instructions.pdf
Auto-Analysis: ❌ No (analyzed when user asks)
Trigger Example:
User: "What does the purchase agreement say about contingencies?"
System: Detects "purchase" + "agreement" keywords
→ Finds unanalyzed Purchase_Agreement.pdf
→ Queues for background analysis
→ Notifies user analysis is in progress
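The trigger flow above could be approximated like this. It is only a sketch under stated assumptions: the `Document` type and function names are hypothetical, and the matching rule (every word in the filename must appear in the question) is a deliberately strict simplification, not the system's documented keyword logic.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    filename: str
    category: str
    analyzed: bool = False


def keywords(text: str) -> set[str]:
    """Lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def find_documents_to_analyze(question: str, documents: list[Document]) -> list[Document]:
    """Return unanalyzed optional documents whose filename words all appear in the question."""
    asked = keywords(question)
    hits = []
    for doc in documents:
        if doc.analyzed or doc.category != "optional":
            continue
        stem = doc.filename.rsplit(".", 1)[0]  # drop the file extension
        name_words = keywords(stem)
        if name_words and name_words <= asked:
            hits.append(doc)
    return hits


docs = [
    Document("Purchase_Agreement.pdf", "optional"),
    Document("Escrow_Instructions.pdf", "optional"),
]
question = "What does the purchase agreement say about contingencies?"
to_queue = find_documents_to_analyze(question, docs)
print([d.filename for d in to_queue])
# ['Purchase_Agreement.pdf']
```

A production version would likely need a looser match (for example, so that Purchase_Agreement_Residential.pdf is also found), followed by the queueing and user notification steps shown in the trigger example.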
Noise (Not Analyzed)
Purpose: Administrative documents that don't require AI analysis.
Filename Patterns:
(?i)receipt
(?i)acknowledgement
(?i)acknowledgment
(?i)notice.*to.*perform
(?i)contingency.*removal
(?i)hoa.*rules
(?i)cc&r
(?i)covenants.*conditions.*restrictions
(?i)calendar
(?i)schedule
(?i)checklist
Example Filenames:
- Receipt_Earnest_Money.pdf
- Acknowledgement_Of_Receipt.pdf
- HOA_Rules_And_Regulations.pdf
- CC&Rs_Community_Association.pdf
- Closing_Checklist.pdf
Auto-Analysis: ❌ No (metadata only)
Performance Metrics
Categorization Speed
| Method | Speed | Accuracy |
|---|---|---|
| Filename Patterns | <1ms | ~80% |
| AI Scan | 500-1500ms | ~95% |
| Hybrid (filename + AI fallback) | <1-1500ms | ~90% |
Resource Savings
Scenario: 50 documents uploaded
Without categorization:
- All 50 documents analyzed
With categorization:
- Only critical/important documents analyzed automatically
- Optional documents analyzed on-demand
- 74% reduction in processing time
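As a rough worked example, suppose the 74% figure applies directly to the number of documents analyzed up front. That is an assumption for illustration; the guide states the figure as a reduction in calls and processing time, not as a document count.

```python
def documents_analyzed(total: int, reduction: float) -> int:
    """Documents still analyzed automatically after the stated reduction."""
    return round(total * (1 - reduction))


analyzed = documents_analyzed(50, 0.74)
print(analyzed)       # 13
print(50 - analyzed)  # 37 deferred or skipped
```

Under that assumption, roughly 13 of the 50 uploads would be analyzed automatically, and the remaining 37 would be analyzed on demand or not at all.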
Best Practices
1. Use Descriptive Filenames
✅ Good: Home_Inspection_Report_123_Main_St_2024.pdf
❌ Bad: scan.pdf, document1.pdf, file.pdf
2. Monitor Categorization Results
Check the upload response for categorization details:
{
  "documents": [
    {
      "filename": "Home_Inspection_Report.pdf",
      "category": "critical",
      "confidence": 0.95,
      "reasoning": "Filename matches critical patterns"
    }
  ]
}
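One way to monitor results is to flag documents whose confidence falls below the 0.8 fallback threshold. The sketch below assumes the response shape shown above; the second sample entry and the `needs_review` helper are hypothetical.

```python
import json

# Sample payload: the first entry is from this guide; the second is hypothetical.
response_body = """
{
  "documents": [
    {"filename": "Home_Inspection_Report.pdf", "category": "critical",
     "confidence": 0.95, "reasoning": "Filename matches critical patterns"},
    {"filename": "scan.pdf", "category": "optional",
     "confidence": 0.55, "reasoning": "Hypothetical low-confidence result"}
  ]
}
"""


def needs_review(payload: dict, threshold: float = 0.8) -> list[str]:
    """Return filenames whose categorization confidence is below the threshold."""
    return [
        doc["filename"]
        for doc in payload.get("documents", [])
        if doc.get("confidence", 0.0) < threshold
    ]


flagged = needs_review(json.loads(response_body))
print(flagged)
# ['scan.pdf']
```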
3. Handle Skipped Documents
Documents in the noise category, or left uncategorized, are skipped by default. To analyze one anyway, call the force-analyze endpoint:
POST /v1/analyses/{analysis_id}/analyze
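A request to this endpoint could be built as follows with Python's standard library. The base URL and the helper name are placeholders, and the guide does not describe authentication or a request body, so none are shown; the sketch only constructs the request and does not send it.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://api.example.com"  # hypothetical; substitute the real API host


def build_force_analyze_request(analysis_id: str) -> Request:
    """Build (but do not send) a POST to the force-analyze endpoint."""
    # Percent-encode the ID so it is safe to place in the URL path.
    path = f"/v1/analyses/{quote(analysis_id, safe='')}/analyze"
    return Request(BASE_URL + path, method="POST")


req = build_force_analyze_request("abc123")
print(req.get_method(), req.full_url)
# POST https://api.example.com/v1/analyses/abc123/analyze
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`, after adding whatever authentication headers the API requires.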
Support
- Report Miscategorization: Email support@homeinsightai.com with the filename
- Custom Categories: Contact enterprise@homeinsightai.com