Auto-Categorization Guide

Version: 1.0
Last Updated: November 28, 2025


Overview

The auto-categorization system classifies uploaded documents to decide which ones need full AI analysis. This cuts API calls by 74% while ensuring critical documents are always analyzed.


How It Works

Two-Stage Process

  1. Filename Pattern Matching (instant, ~80% accuracy)

    • Uses 60+ patterns to match common document types
    • Returns category and confidence score
  2. AI Scan Fallback (when confidence < 0.8)

    • Analyzes first page with AI
    • Provides reasoning and higher confidence

Decision Flow
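
The two-stage process above can be sketched as follows. This is a minimal illustration, not the production implementation: the names Result, match_filename, categorize, and fake_ai_scan are hypothetical, the 0.95 and 0.0 scores are assumed values, and only the 0.8 threshold comes from this guide.

```python
import re
from typing import Callable, NamedTuple

CONFIDENCE_THRESHOLD = 0.8  # from this guide: the AI scan runs when confidence < 0.8


class Result(NamedTuple):
    category: str
    confidence: float
    reasoning: str


# Two patterns copied from the category lists in this guide; the real system uses 60+.
PATTERNS = {
    "critical": re.compile(r"(?i)home.*inspection"),
    "noise": re.compile(r"(?i)receipt"),
}


def match_filename(filename: str) -> Result:
    """Stage 1: filename pattern matching."""
    for category, pattern in PATTERNS.items():
        if pattern.search(filename):
            # 0.95 is an assumed score for a pattern hit.
            return Result(category, 0.95, f"Filename matches {category} patterns")
    # The 0.0 score for the no-match case is an assumption.
    return Result("uncategorized", 0.0, "No filename pattern matched")


def categorize(filename: str, ai_scan: Callable[[str], Result]) -> Result:
    """Stage 2: fall back to the AI scan only when stage 1 is not confident."""
    result = match_filename(filename)
    if result.confidence < CONFIDENCE_THRESHOLD:
        return ai_scan(filename)
    return result


def fake_ai_scan(filename: str) -> Result:
    """Stand-in for the first-page AI scan; a real scan would read the file."""
    return Result("important", 0.9, "First page looks like a loan estimate")


print(categorize("Home_Inspection_Report.pdf", fake_ai_scan).category)
print(categorize("scan.pdf", fake_ai_scan).category)
```

Passing the AI scan in as a callable keeps the sketch runnable without any API access; a descriptive filename never reaches the fallback, while scan.pdf does.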


Categories

Critical (Auto-Analyzed)

Purpose: Documents essential for safety, legal compliance, or major financial decisions.

Filename Patterns:

Regex
(?i)inspection.*report
(?i)home.*inspection
(?i)property.*inspection
(?i)structural.*report
(?i)pest.*inspection
(?i)termite.*report
(?i)seller.*disclosure
(?i)property.*disclosure
(?i)natural.*hazard.*disclosure
(?i)lead.*based.*paint
(?i)environmental.*disclosure
(?i)material.*facts.*disclosure

Example Filenames:

  • Home_Inspection_Report_123_Main_St.pdf
  • Seller_Property_Disclosure_Statement.pdf
  • Pest_Termite_Inspection_2024.pdf
  • Natural_Hazard_Disclosure_Form.pdf
  • Lead_Based_Paint_Disclosure.pdf

Auto-Analysis: ✅ Yes (queued immediately on upload)


Important (Auto-Analyzed)

Purpose: Documents providing significant financial or legal value.

Filename Patterns:

Regex
(?i)loan.*estimate
(?i)closing.*disclosure
(?i)mortgage.*application
(?i)pre.*approval
(?i)financing.*terms
(?i)lending.*disclosure
(?i)appraisal.*report
(?i)property.*valuation
(?i)comparative.*market.*analysis
(?i)cma.*report
(?i)title.*report
(?i)preliminary.*title
(?i)title.*commitment

Example Filenames:

  • Loan_Estimate_Wells_Fargo.pdf
  • Closing_Disclosure_Final.pdf
  • Appraisal_Report_456_Oak_Ave.pdf
  • Preliminary_Title_Report.pdf
  • Mortgage_Pre_Approval_Letter.pdf

Auto-Analysis: ✅ Yes (queued immediately on upload)


Optional (On-Demand)

Purpose: Documents analyzed only when the user specifically asks about them.

Filename Patterns:

Regex
(?i)purchase.*agreement
(?i)sales.*contract
(?i)listing.*agreement
(?i)addendum
(?i)amendment
(?i)buyer.*representation
(?i)agency.*disclosure
(?i)escrow.*instructions
(?i)wire.*transfer.*authorization

Example Filenames:

  • Purchase_Agreement_Residential.pdf
  • Sales_Contract_Signed.pdf
  • Addendum_A_Repairs.pdf
  • Buyer_Representation_Agreement.pdf
  • Escrow_Instructions.pdf

Auto-Analysis: ❌ No (analyzed when user asks)

Trigger Example:

User: "What does the purchase agreement say about contingencies?"
System: Detects "purchase" + "agreement" keywords
        → Finds unanalyzed Purchase_Agreement.pdf
        → Queues for background analysis
        → Notifies user analysis is in progress
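
One way to picture the trigger above is a simple word-overlap check between the user's question and the filenames of unanalyzed documents. The guide does not describe the actual keyword detection, so everything here (the documents list, words, documents_to_queue, and the min_overlap parameter) is a hypothetical sketch.

```python
import re

# Hypothetical in-memory view of uploaded documents and their analysis state.
documents = [
    {"filename": "Purchase_Agreement.pdf", "category": "optional", "analyzed": False},
    {"filename": "Escrow_Instructions.pdf", "category": "optional", "analyzed": False},
    {"filename": "Home_Inspection_Report.pdf", "category": "critical", "analyzed": True},
]


def words(text):
    """Lowercase alphabetic words in a string; underscores and dots act as separators."""
    return set(re.findall(r"[a-z]+", text.lower()))


def documents_to_queue(question, docs, min_overlap=2):
    """Return unanalyzed documents whose filename shares enough words with the question."""
    asked = words(question)
    hits = []
    for doc in docs:
        name_words = words(doc["filename"]) - {"pdf"}
        if not doc["analyzed"] and len(asked & name_words) >= min_overlap:
            hits.append(doc["filename"])
    return hits


question = "What does the purchase agreement say about contingencies?"
print(documents_to_queue(question, documents))
```

With the sample question, only Purchase_Agreement.pdf shares two words ("purchase" and "agreement") with the question, so it is the only document returned for queuing.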

Noise (Not Analyzed)

Purpose: Administrative documents that don't require AI analysis.

Filename Patterns:

Regex
(?i)receipt
(?i)acknowledgement
(?i)acknowledgment
(?i)notice.*to.*perform
(?i)contingency.*removal
(?i)hoa.*rules
(?i)cc&r
(?i)covenants.*conditions.*restrictions
(?i)calendar
(?i)schedule
(?i)checklist

Example Filenames:

  • Receipt_Earnest_Money.pdf
  • Acknowledgement_Of_Receipt.pdf
  • HOA_Rules_And_Regulations.pdf
  • CC&Rs_Community_Association.pdf
  • Closing_Checklist.pdf

Auto-Analysis: ❌ No (metadata only)
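
The four categories and their upload-time behavior can be summarized in a short sketch. It uses a representative subset of the patterns listed above and assumes that categories are checked in priority order (critical first, noise last) with the first match winning; the guide does not state the actual matching order, and the names CATEGORY_PATTERNS, AUTO_ANALYZE, category_for, and should_auto_analyze are illustrative.

```python
import re

# Representative subset of the patterns in this guide, in an ASSUMED priority order.
CATEGORY_PATTERNS = [
    ("critical", [r"(?i)home.*inspection", r"(?i)property.*disclosure", r"(?i)lead.*based.*paint"]),
    ("important", [r"(?i)loan.*estimate", r"(?i)closing.*disclosure", r"(?i)title.*report"]),
    ("optional", [r"(?i)purchase.*agreement", r"(?i)addendum", r"(?i)escrow.*instructions"]),
    ("noise", [r"(?i)receipt", r"(?i)hoa.*rules", r"(?i)checklist"]),
]

# Upload-time behavior per category, as described in the category sections.
AUTO_ANALYZE = {"critical": True, "important": True, "optional": False, "noise": False}


def category_for(filename):
    """Return the first category with a matching pattern, or None if nothing matches."""
    for category, patterns in CATEGORY_PATTERNS:
        if any(re.search(p, filename) for p in patterns):
            return category
    return None


def should_auto_analyze(filename):
    """True only for categories that are queued immediately on upload."""
    return AUTO_ANALYZE.get(category_for(filename), False)


for name in ["Closing_Disclosure_Final.pdf", "Closing_Checklist.pdf", "scan.pdf"]:
    print(name, category_for(name), should_auto_analyze(name))
```

Order matters when a filename matches more than one category: under the assumed order, a name such as Receipt_For_Home_Inspection.pdf would be treated as critical rather than noise.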


Performance Metrics

Categorization Speed

Method                            Speed        Accuracy
Filename Patterns                 <1ms         ~80%
AI Scan                           500-1500ms   ~95%
Hybrid (filename + AI fallback)   1-1500ms     ~90%

Resource Savings

Scenario: 50 documents uploaded

Without categorization:

  • All 50 documents analyzed

With categorization:

  • Only critical/important documents analyzed automatically
  • Optional documents analyzed on-demand
  • 74% reduction in processing time
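
As a rough worked example, assume the 74% figure applies directly to the number of documents analyzed at upload time (the guide does not say how the figure was measured, so treat this as an illustration only):

```python
uploaded = 50
reduction_percent = 74  # figure quoted in this guide

# Integer arithmetic: 50 * 74 // 100 = 37 documents not analyzed at upload time.
skipped = uploaded * reduction_percent // 100
analyzed = uploaded - skipped

print(f"{analyzed} analyzed automatically, {skipped} deferred or skipped")
```

Under that assumption, about 13 of the 50 documents are analyzed on upload and the remaining 37 are either analyzed on demand or not at all.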

Best Practices

1. Use Descriptive Filenames

Good: Home_Inspection_Report_123_Main_St_2024.pdf

Bad: scan.pdf, document1.pdf, file.pdf

2. Monitor Categorization Results

Check the upload response for categorization details:

JSON
{
  "documents": [
    {
      "filename": "Home_Inspection_Report.pdf",
      "category": "critical",
      "confidence": 0.95,
      "reasoning": "Filename matches critical patterns"
    }
  ]
}
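
A client could scan this response for documents worth a second look. The sketch below assumes the response shape shown above; the second document in the sample, the needs_review helper, and the choice to reuse 0.8 as the review threshold are illustrative assumptions.

```python
import json

# Sample body in the shape shown above; the second entry is made-up test data.
response_body = """
{
  "documents": [
    {"filename": "Home_Inspection_Report.pdf", "category": "critical",
     "confidence": 0.95, "reasoning": "Filename matches critical patterns"},
    {"filename": "scan.pdf", "category": "uncategorized",
     "confidence": 0.4, "reasoning": "No filename pattern matched"}
  ]
}
"""


def needs_review(body, threshold=0.8):
    """Return filenames that were categorized with low confidence or will be skipped."""
    docs = json.loads(body)["documents"]
    return [
        d["filename"]
        for d in docs
        if d["confidence"] < threshold or d["category"] in ("noise", "uncategorized")
    ]


print(needs_review(response_body))
```

Anything this returns is a candidate for renaming and re-uploading, or for a manual analysis request.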

3. Handle Skipped Documents

Documents categorized as "uncategorized" or "noise" are skipped by default. To analyze one anyway, call the force-analyze endpoint:

HTTP
POST /v1/analyses/{analysis_id}/analyze
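
A minimal sketch of building that request with Python's standard library follows. Only the path and the POST method come from this guide; BASE_URL is a placeholder, build_force_analyze_request is a hypothetical helper, and any authentication headers or request body the API requires are not documented here and are therefore omitted.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://api.example.com"  # placeholder; substitute the real API host


def build_force_analyze_request(analysis_id):
    """Build (but do not send) the POST request for the force-analyze endpoint."""
    # Percent-encode the ID so unexpected characters cannot alter the path.
    url = f"{BASE_URL}/v1/analyses/{quote(analysis_id, safe='')}/analyze"
    return Request(url, method="POST")


req = build_force_analyze_request("abc123")
print(req.get_method(), req.full_url)
```

To send it, pass the request to urllib.request.urlopen after adding whatever credentials the API expects.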

Support

  • Report Miscategorization: Email support@homeinsightai.com with the filename
  • Custom Categories: Contact enterprise@homeinsightai.com