Database Schema Updates - Phase 2
Migration: 004_multi_file_auto_categorization.sql
Applied: November 28, 2025
Purpose: Support multi-file uploads, auto-categorization, and on-demand analysis
Overview
Phase 2 introduces database schema changes to support intelligent document management with auto-categorization and selective analysis. This migration adds new columns to track document categories, analysis status, and automated property-level counters.
Schema Changes
1. Analyses Table - New Columns
The analyses table now includes the following columns:
| Column | Type | Default | Description |
|---|---|---|---|
| category | TEXT | 'uncategorized' | Document importance category |
| is_analyzed | BOOLEAN | false | Whether full analysis (vision + embeddings) is complete |
| categorization_metadata | JSONB | '{}'::jsonb | Details about how the document was categorized |
| original_filename | TEXT | NULL | Original uploaded filename for display |
| file_size_bytes | BIGINT | NULL | File size in bytes |
| error_message | TEXT | NULL | Processing error if analysis failed |
Category Values
The category column is constrained to these values:
CHECK (category IN ('critical', 'important', 'optional', 'noise', 'uncategorized'))
Category Definitions:
- critical: Inspection reports, property disclosures, structural reports (auto-analyzed)
- important: Loan documents, appraisals, title reports (auto-analyzed)
- optional: Purchase agreements, contracts, forms (analyzed on-demand)
- noise: Receipts, acknowledgements, HOA rules (not analyzed)
- uncategorized: Documents that couldn't be categorized
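The CHECK constraint rejects any value outside this set, so a typo'd category fails at insert time rather than silently persisting. A quick illustration (the id and property values here are made up):

```sql
-- Fails: 'criticl' is not in the allowed set
INSERT INTO analyses (id, property_id, category)
VALUES ('ana_999', 'prop_456', 'criticl');
-- ERROR: new row for relation "analyses" violates check constraint
```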
Categorization Metadata Structure
The categorization_metadata JSONB field stores details about the categorization process:
{
"method": "filename_pattern",
"confidence": 0.95,
"matched_patterns": [
"(?i)inspection.*report",
"(?i)home.*inspection"
],
"reasoning": "Filename matches critical patterns",
"timestamp": "2025-11-28T10:30:00Z"
}
Fields:
- method: How the document was categorized (filename_pattern, ai_scan, manual)
- confidence: Confidence score from 0.0 to 1.0
- matched_patterns: Regex patterns that matched (for the filename_pattern method)
- reasoning: Human-readable explanation
- timestamp: When categorization occurred
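These fields can be queried directly with PostgreSQL's JSONB operators. A sketch (the operators are standard Postgres; the 0.7 threshold is an illustrative value, not a system setting):

```sql
-- Find low-confidence AI categorizations worth human review
SELECT id, original_filename,
    categorization_metadata->>'method' AS method,
    (categorization_metadata->>'confidence')::numeric AS confidence
FROM analyses
WHERE categorization_metadata->>'method' = 'ai_scan'
AND (categorization_metadata->>'confidence')::numeric < 0.7;
```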
Analysis Status Flow
Documents go through the following states:
1. Upload → is_analyzed: false, status: 'processing'
2. Categorization → category: 'critical', categorization_metadata: {...}
3. Background Analysis → is_analyzed: true, status: 'completed'
Example Queries:
-- Find all unanalyzed critical documents for a property
SELECT id, original_filename, category
FROM analyses
WHERE property_id = '550e8400-e29b-41d4-a716-446655440000'
AND is_analyzed = false
AND category IN ('critical', 'important');
-- Get analysis completion rate for a property
SELECT
COUNT(*) as total_docs,
COUNT(*) FILTER (WHERE is_analyzed = true) as analyzed_docs,
ROUND(100.0 * COUNT(*) FILTER (WHERE is_analyzed = true) / COUNT(*), 2) as completion_rate
FROM analyses
WHERE property_id = '550e8400-e29b-41d4-a716-446655440000';
2. Properties Table - New Columns
The properties table now includes denormalized counters for performance:
| Column | Type | Default | Description |
|---|---|---|---|
| total_documents | INTEGER | 0 | Total number of documents for this property |
| analyzed_documents | INTEGER | 0 | Count of documents with is_analyzed = true |
| critical_documents | INTEGER | 0 | Count of critical/important documents |
These counters are automatically maintained by database triggers (see below).
Example Queries:
-- Get properties with unanalyzed critical documents
SELECT id, address, total_documents, analyzed_documents, critical_documents
FROM properties
WHERE critical_documents > analyzed_documents;
-- Get properties sorted by document count
SELECT id, address, total_documents, analyzed_documents
FROM properties
ORDER BY total_documents DESC
LIMIT 10;
Database Triggers
Auto-Update Property Counters
A trigger automatically updates property-level counters whenever documents are added, updated, or deleted.
Trigger: trigger_update_property_counts
Function: update_property_document_counts()
Fires: AFTER INSERT, UPDATE, DELETE on analyses
How It Works
-- Trigger fires on any change to analyses table
INSERT INTO analyses (...) VALUES (...); -- Trigger updates property counts
UPDATE analyses SET is_analyzed = true WHERE id = 'ana_123'; -- Trigger updates
DELETE FROM analyses WHERE id = 'ana_123'; -- Trigger updates
The trigger function:
- Identifies which property was affected
- Recalculates all three counters from the analyses table
- Updates the properties table
- Sets updated_at to the current timestamp
Performance Note: The trigger uses COUNT queries which are fast for moderate document counts (<1000 per property). For high-volume properties, consider async counter updates.
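One incremental alternative is to adjust counters by a delta instead of recounting. The sketch below covers INSERT only and is an untested variant for illustration, not part of the applied migration:

```sql
-- Sketch: O(1) counter maintenance for INSERTs
-- (UPDATE and DELETE handling omitted for brevity)
CREATE OR REPLACE FUNCTION increment_property_document_counts()
RETURNS TRIGGER AS $$
BEGIN
UPDATE properties
SET
    total_documents = total_documents + 1,
    critical_documents = critical_documents
        + CASE WHEN NEW.category IN ('critical', 'important') THEN 1 ELSE 0 END,
    updated_at = NOW()
WHERE id = NEW.property_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```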
Example Trigger Execution
-- Before: property has no documents
SELECT total_documents, analyzed_documents FROM properties WHERE id = 'prop_123';
-- Result: 0, 0
-- Upload 3 documents
INSERT INTO analyses (property_id, category, is_analyzed) VALUES
('prop_123', 'critical', false),
('prop_123', 'important', false),
('prop_123', 'optional', false);
-- After: trigger automatically updated counts
SELECT total_documents, analyzed_documents, critical_documents FROM properties WHERE id = 'prop_123';
-- Result: 3, 0, 2
-- Background analysis completes for 2 documents
UPDATE analyses SET is_analyzed = true WHERE property_id = 'prop_123' AND category IN ('critical', 'important');
-- After: trigger updated analyzed count
SELECT total_documents, analyzed_documents, critical_documents FROM properties WHERE id = 'prop_123';
-- Result: 3, 2, 2
Indexes
New indexes optimize common query patterns:
Single-Column Indexes
-- Fast lookup by property
CREATE INDEX idx_analyses_property_id ON analyses(property_id);
-- Filter by category
CREATE INDEX idx_analyses_category ON analyses(category);
-- Filter by analysis status
CREATE INDEX idx_analyses_is_analyzed ON analyses(is_analyzed);
Composite Index
-- Optimize multi-condition queries
CREATE INDEX idx_analyses_property_category ON analyses(property_id, category, is_analyzed);
Query Optimization Examples:
-- Uses idx_analyses_property_category
SELECT * FROM analyses
WHERE property_id = 'prop_123'
AND category = 'critical'
AND is_analyzed = false;
-- Uses idx_analyses_property_id
SELECT * FROM analyses
WHERE property_id = 'prop_123'
ORDER BY created_at DESC;
-- Uses idx_analyses_category
SELECT category, COUNT(*) FROM analyses
GROUP BY category;
Migration SQL
Full migration file: /backend/migrations/004_multi_file_auto_categorization.sql
Key Sections
1. Add Columns
ALTER TABLE analyses
ADD COLUMN IF NOT EXISTS category TEXT DEFAULT 'uncategorized'
CHECK (category IN ('critical', 'important', 'optional', 'noise', 'uncategorized')),
ADD COLUMN IF NOT EXISTS is_analyzed BOOLEAN DEFAULT false,
ADD COLUMN IF NOT EXISTS categorization_metadata JSONB DEFAULT '{}'::jsonb,
ADD COLUMN IF NOT EXISTS original_filename TEXT,
ADD COLUMN IF NOT EXISTS file_size_bytes BIGINT,
ADD COLUMN IF NOT EXISTS error_message TEXT;
2. Create Indexes
CREATE INDEX IF NOT EXISTS idx_analyses_property_id ON analyses(property_id);
CREATE INDEX IF NOT EXISTS idx_analyses_category ON analyses(category);
CREATE INDEX IF NOT EXISTS idx_analyses_is_analyzed ON analyses(is_analyzed);
CREATE INDEX IF NOT EXISTS idx_analyses_property_category ON analyses(property_id, category, is_analyzed);
3. Add Property Counters
ALTER TABLE properties
ADD COLUMN IF NOT EXISTS total_documents INTEGER DEFAULT 0,
ADD COLUMN IF NOT EXISTS analyzed_documents INTEGER DEFAULT 0,
ADD COLUMN IF NOT EXISTS critical_documents INTEGER DEFAULT 0;
4. Create Trigger Function
CREATE OR REPLACE FUNCTION update_property_document_counts()
RETURNS TRIGGER AS $$
DECLARE
target_property_id UUID;
BEGIN
-- Determine which property to update
IF TG_OP = 'DELETE' THEN
target_property_id := OLD.property_id;
ELSE
target_property_id := NEW.property_id;
END IF;
-- Update all counters atomically
UPDATE properties
SET
total_documents = (SELECT COUNT(*) FROM analyses WHERE property_id = target_property_id),
analyzed_documents = (SELECT COUNT(*) FROM analyses WHERE property_id = target_property_id AND is_analyzed = true),
critical_documents = (SELECT COUNT(*) FROM analyses WHERE property_id = target_property_id AND category IN ('critical', 'important')),
updated_at = NOW()
WHERE id = target_property_id;
-- Return value is ignored for AFTER triggers, but NEW is NULL on DELETE,
-- so return whichever row is available
RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;
5. Attach Trigger
CREATE TRIGGER trigger_update_property_counts
AFTER INSERT OR UPDATE OR DELETE ON analyses
FOR EACH ROW
EXECUTE FUNCTION update_property_document_counts();
6. Backfill Existing Data
-- Mark existing documents as analyzed
-- (the new columns are added with defaults and are never NULL,
-- so match the default value instead)
UPDATE analyses
SET
category = 'critical',
is_analyzed = (status = 'completed'),
categorization_metadata = jsonb_build_object('method', 'migration_backfill', 'timestamp', NOW())
WHERE category = 'uncategorized';
-- Backfill property counters
UPDATE properties p
SET
total_documents = (SELECT COUNT(*) FROM analyses WHERE property_id = p.id),
analyzed_documents = (SELECT COUNT(*) FROM analyses WHERE property_id = p.id AND is_analyzed = true),
critical_documents = (SELECT COUNT(*) FROM analyses WHERE property_id = p.id AND category IN ('critical', 'important'))
WHERE id IN (SELECT DISTINCT property_id FROM analyses WHERE property_id IS NOT NULL);
Rollback Strategy
To rollback this migration:
-- 1. Drop trigger
DROP TRIGGER IF EXISTS trigger_update_property_counts ON analyses;
DROP FUNCTION IF EXISTS update_property_document_counts();
-- 2. Drop indexes
DROP INDEX IF EXISTS idx_analyses_property_id;
DROP INDEX IF EXISTS idx_analyses_category;
DROP INDEX IF EXISTS idx_analyses_is_analyzed;
DROP INDEX IF EXISTS idx_analyses_property_category;
-- 3. Remove columns from properties
ALTER TABLE properties
DROP COLUMN IF EXISTS total_documents,
DROP COLUMN IF EXISTS analyzed_documents,
DROP COLUMN IF EXISTS critical_documents;
-- 4. Remove columns from analyses
ALTER TABLE analyses
DROP COLUMN IF EXISTS category,
DROP COLUMN IF EXISTS is_analyzed,
DROP COLUMN IF EXISTS categorization_metadata,
DROP COLUMN IF EXISTS original_filename,
DROP COLUMN IF EXISTS file_size_bytes,
DROP COLUMN IF EXISTS error_message;
Warning: Rollback will delete all categorization data. Ensure you have a database backup before rolling back.
Performance Considerations
Index Maintenance
- Indexes are automatically maintained by PostgreSQL
- No manual REINDEX required for normal operations
- Consider VACUUM ANALYZE after bulk inserts
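After a bulk insert or the backfill above, a maintenance pass refreshes planner statistics and reclaims dead tuples (table names from this schema):

```sql
-- Refresh statistics and reclaim dead tuples after bulk changes
VACUUM ANALYZE analyses;
VACUUM ANALYZE properties;
```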
Trigger Performance
- Trigger fires on EVERY row change
- For bulk operations, consider disabling trigger temporarily:
-- Disable trigger for bulk insert
ALTER TABLE analyses DISABLE TRIGGER trigger_update_property_counts;
-- Bulk insert
INSERT INTO analyses (...) SELECT ... FROM ...;
-- Re-enable trigger
ALTER TABLE analyses ENABLE TRIGGER trigger_update_property_counts;
-- Manually update counts once
UPDATE properties p
SET total_documents = (SELECT COUNT(*) FROM analyses WHERE property_id = p.id),
analyzed_documents = (SELECT COUNT(*) FROM analyses WHERE property_id = p.id AND is_analyzed = true),
critical_documents = (SELECT COUNT(*) FROM analyses WHERE property_id = p.id AND category IN ('critical', 'important'));
Query Optimization
Use the composite index for common queries:
-- Optimized: uses composite index
EXPLAIN ANALYZE
SELECT * FROM analyses
WHERE property_id = 'prop_123'
AND category IN ('critical', 'important')
AND is_analyzed = false;
-- Expected: Index Scan using idx_analyses_property_category
Monitoring
Check Migration Status
-- Verify columns exist
SELECT column_name, data_type, column_default
FROM information_schema.columns
WHERE table_name = 'analyses'
AND column_name IN ('category', 'is_analyzed', 'categorization_metadata', 'original_filename', 'file_size_bytes', 'error_message');
-- Verify indexes exist
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'analyses'
AND indexname LIKE 'idx_analyses_%';
-- Verify trigger exists
SELECT trigger_name, event_manipulation, event_object_table
FROM information_schema.triggers
WHERE trigger_name = 'trigger_update_property_counts';
Data Quality Checks
-- Check category distribution (watch for a large 'uncategorized' share)
SELECT category, COUNT(*)
FROM analyses
GROUP BY category;
-- Check for documents missing filenames
SELECT COUNT(*) FROM analyses WHERE original_filename IS NULL;
-- Check for properties with incorrect counters
SELECT p.id, p.address,
p.total_documents as stored_count,
COUNT(a.id) as actual_count
FROM properties p
LEFT JOIN analyses a ON a.property_id = p.id
GROUP BY p.id
HAVING p.total_documents != COUNT(a.id);
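The same pattern extends to the other two counters. One way to audit all three at once, using the FILTER clause as in the completion-rate query above:

```sql
-- Compare all three stored counters against live counts
SELECT p.id, p.address,
    p.total_documents, COUNT(a.id) AS actual_total,
    p.analyzed_documents,
    COUNT(a.id) FILTER (WHERE a.is_analyzed) AS actual_analyzed,
    p.critical_documents,
    COUNT(a.id) FILTER (WHERE a.category IN ('critical', 'important')) AS actual_critical
FROM properties p
LEFT JOIN analyses a ON a.property_id = p.id
GROUP BY p.id
HAVING p.total_documents != COUNT(a.id)
OR p.analyzed_documents != COUNT(a.id) FILTER (WHERE a.is_analyzed)
OR p.critical_documents != COUNT(a.id) FILTER (WHERE a.category IN ('critical', 'important'));
```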
Best Practices
1. Always Set Category on Insert
-- Good: category specified
INSERT INTO analyses (id, property_id, category, is_analyzed)
VALUES ('ana_123', 'prop_456', 'critical', false);
-- Acceptable: uses default
INSERT INTO analyses (id, property_id)
VALUES ('ana_123', 'prop_456'); -- category defaults to 'uncategorized'
2. Update is_analyzed After Processing
-- Mark document as analyzed after background processing completes
UPDATE analyses
SET
is_analyzed = true,
status = 'completed'
WHERE id = 'ana_123';
-- Trigger automatically updates property counters
3. Store Categorization Details
-- Store rich categorization metadata
UPDATE analyses
SET
category = 'critical',
categorization_metadata = jsonb_build_object(
'method', 'filename_pattern',
'confidence', 0.95,
'matched_patterns', ARRAY['(?i)inspection.*report'],
'reasoning', 'Filename matches critical patterns',
'timestamp', NOW()
)
WHERE id = 'ana_123';
4. Handle Errors Gracefully
-- Record processing errors
UPDATE analyses
SET
status = 'failed',
error_message = 'Gemini Vision API timeout after 30s',
is_analyzed = false
WHERE id = 'ana_123';
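Documents recorded as failed this way can later be picked up for a retry pass. A minimal sketch (created_at is the existing timestamp column referenced earlier):

```sql
-- Select failed documents eligible for reprocessing, oldest first
SELECT id, original_filename, error_message
FROM analyses
WHERE status = 'failed'
AND is_analyzed = false
ORDER BY created_at;
```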
Support
For database-related questions:
- Migration Issues: Check the /backend/migrations/ directory
- Query Performance: Use EXPLAIN ANALYZE
- Data Integrity: Run data quality checks above
- Supabase Console: https://app.supabase.com/project/omwmzlmmdrkfhylvgkcy