Bank Statement Converter
AI-powered PDF to CSV converter using Gemini 2.5 Flash. Supports 1000+ banks with 98% extraction accuracy and 5-second processing.
TL;DR: I built an AI-powered PDF converter that extracts bank transactions using Gemini 2.5 Flash. It supports 1000+ banks with 98% extraction accuracy and processes statements in under 5 seconds with zero data storage.
The Problem
Converting bank statement PDFs to spreadsheets is a surprisingly painful process:
- Manual data entry takes 30+ minutes per statement
- Copy-paste doesn't work because PDFs render text inconsistently
- Existing tools either cost $20+/month or require uploading to unknown servers
- Multi-page statements with hundreds of transactions? Forget about it.
I needed a tool for tracking expenses that could handle any bank format, work instantly, and not store my financial data.
My Approach
I built a two-phase processing pipeline:
- Inspect Phase: Validates the PDF structure, detects the bank format, and estimates page count
- Convert Phase: Sends pages to Gemini 2.5 Flash with a structured extraction schema
The key insight was using schema-based extraction rather than free-form parsing. I defined exactly what fields I wanted (date, description, amount, type), and Gemini returns structured JSON that maps directly to CSV columns.
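To make the schema idea concrete, here is a minimal sketch of how a structured response could be validated and mapped to CSV columns. The `Transaction` shape uses the four fields named above; the function names, the `"debit" | "credit"` values, and the ISO date example are assumptions for illustration, not the project's actual code.

```typescript
// Hypothetical shape of one extracted transaction, using the four fields
// named in the text. The allowed `type` values are an assumption.
interface Transaction {
  date: string; // e.g. "2024-01-15"
  description: string;
  amount: number;
  type: "debit" | "credit";
}

const CSV_COLUMNS = ["date", "description", "amount", "type"] as const;

// Runtime check that a parsed JSON value matches the Transaction shape.
function isTransaction(value: unknown): value is Transaction {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["date"] === "string" &&
    typeof v["description"] === "string" &&
    typeof v["amount"] === "number" &&
    Number.isFinite(v["amount"]) &&
    (v["type"] === "debit" || v["type"] === "credit")
  );
}

// Parse the model's JSON text and reject anything that is off-schema.
function parseTransactions(json: string): Transaction[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data) || !data.every(isTransaction)) {
    throw new Error("Response does not match the transaction schema");
  }
  return data;
}

// Quote a cell when it contains a comma, double quote, or line break.
function escapeCsvCell(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// One header row, then one row per transaction in schema column order.
function transactionsToCsv(rows: Transaction[]): string {
  const header = CSV_COLUMNS.join(",");
  const lines = rows.map((row) =>
    CSV_COLUMNS.map((col) => escapeCsvCell(String(row[col]))).join(",")
  );
  return [header, ...lines].join("\n");
}
```

Because the column list and the interface share the same field names, adding a field means changing the schema in one place and the CSV follows.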
For rate limiting without accounts, I track conversions in localStorage with a 24-hour rolling window.
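A rolling 24-hour window like this can be sketched by storing the timestamps of recent conversions and discarding any that have aged out. The storage key, the function names, and the small `KeyValueStore` interface are hypothetical; in the browser, `window.localStorage` satisfies that interface.

```typescript
// Minimal subset of the Web Storage API, so the same logic works with
// window.localStorage in the browser or an in-memory stub elsewhere.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const WINDOW_MS = 24 * 60 * 60 * 1000; // 24 hours
const STORAGE_KEY = "conversionTimestamps"; // hypothetical key name

// Load stored timestamps and keep only those inside the rolling window.
function loadRecent(store: KeyValueStore, now: number): number[] {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed)) return [];
    return parsed.filter(
      (t): t is number => typeof t === "number" && now - t < WINDOW_MS
    );
  } catch {
    return []; // corrupted value: treat as no history
  }
}

// Returns true and records the conversion if the caller is under the limit;
// returns false without recording anything otherwise.
function tryConsumeConversion(
  store: KeyValueStore,
  limit: number,
  now: number = Date.now()
): boolean {
  const recent = loadRecent(store, now);
  if (recent.length >= limit) return false;
  recent.push(now);
  store.setItem(STORAGE_KEY, JSON.stringify(recent));
  return true;
}
```

In the browser this would be called as `tryConsumeConversion(window.localStorage, 2)`. Since the counter lives on the client, clearing site data resets it, so this approach is a soft limit.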
Architecture
[Figure: Bank Statement Converter architecture diagram]
Key Features
- Multi-bank support: Automatically detects and processes 1000+ bank formats
- Advanced OCR: Handles both digitally-generated and scanned PDFs
- Zero data storage: Files processed in-memory, deleted immediately after conversion
- Rate limiting: 2 free conversions per 24 hours for anonymous users, 5 for registered users
- Preview mode: See extracted transactions before downloading
- Error recovery: 3 retries with exponential backoff for API failures
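The error-recovery behaviour can be sketched as a small wrapper that retries a failing async call up to three times and doubles the wait between attempts. The 500 ms base delay, the helper names, and the injectable `sleep` parameter are assumptions for illustration; the source only states the retry count and that the backoff is exponential.

```typescript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseMs: number): number {
  return baseMs * 2 ** attempt;
}

// Calls `fn` once, then retries up to `maxRetries` times on failure.
// Rethrows the last error if every attempt fails.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3,
  baseMs: number = 500, // assumed base delay
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt is still allowed.
      if (attempt < maxRetries) {
        await sleep(backoffDelayMs(attempt, baseMs));
      }
    }
  }
  throw lastError;
}
```

With the defaults, a call that keeps failing waits 500 ms, 1000 ms, and 2000 ms before the three retries. A production version would likely retry only on transient errors (timeouts, rate-limit responses) and add jitter.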
Results & Metrics
| Metric | Value |
|---|---|
| Extraction Accuracy | 98% on standard statements |
| Processing Time | <5 seconds per statement |
| File Size Limit | 10MB |
| Supported Banks | 1000+ |
| Page Support | Unlimited (multi-page) |
| Data Retention | 0 seconds (immediate deletion) |
What I Learned
The biggest challenge was handling edge cases in bank formats. Some banks use weird date formats (DD/MMM/YYYY), others split transactions across multiple lines, and a few put credits and debits in separate columns instead of signed amounts.
I solved this by making the Gemini prompt extremely specific about edge cases:
# Prompt includes explicit handling for:
# - Date formats: MM/DD/YYYY, DD/MM/YYYY, MMM DD YYYY
# - Amount formats: -$100.00, ($100.00), 100.00 CR
# - Multi-line descriptions: concatenate with spaces
# - Header/footer detection: skip non-transaction rows
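Even with a specific prompt, it can help to normalize amounts in code as a safety net. The sketch below converts the amount formats listed above into signed numbers and handles statements that use separate debit and credit columns. The function names and the sign convention (debits negative, credits positive, unmarked values taken as written) are assumptions, not the project's actual logic.

```typescript
// Convert an amount string such as "-$100.00", "($100.00)", or "100.00 CR"
// into a signed number. Assumed convention: a leading minus, parentheses,
// or a trailing "DR" means negative; "CR" and unmarked values are positive.
function parseAmount(raw: string): number {
  const text = raw.trim();
  const negative =
    /^-/.test(text) || /^\(.*\)$/.test(text) || /DR$/i.test(text);
  // Drop currency symbols, thousands separators, and CR/DR markers.
  const digits = text.replace(/[^0-9.]/g, "");
  if (!/^\d+(\.\d+)?$/.test(digits)) {
    throw new Error(`Unparseable amount: ${raw}`);
  }
  const value = Number(digits);
  return negative ? -value : value;
}

// For banks that print debits and credits in separate columns: exactly one
// of the two cells is expected to be filled in for each row.
function amountFromColumns(debit: string, credit: string): number {
  if (debit.trim() !== "") return -Math.abs(parseAmount(debit));
  if (credit.trim() !== "") return Math.abs(parseAmount(credit));
  throw new Error("Row has neither a debit nor a credit value");
}
```

For example, `parseAmount("($1,234.56)")` yields `-1234.56`, and `amountFromColumns("", "1,200.00")` yields `1200`.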
If I were starting over, I'd add a feedback loop where users can correct misextracted transactions, and I'd use that to fine-tune the extraction prompt per bank.
Frequently Asked Questions
What problem does Bank Statement Converter solve?
It eliminates manual data entry when converting bank PDFs to spreadsheets. Instead of spending 30+ minutes copying transactions by hand, you get a clean CSV in under 5 seconds.
What technologies power this project?
Next.js 15 with App Router, Google Gemini 2.5 Flash for AI extraction, Supabase for authentication, and Vercel for deployment. The PDF is processed entirely in-memory with no server-side storage.
How accurate is the extraction?
98% accuracy on standard bank statements from major US banks. Edge cases like handwritten notes or unusual layouts may require manual correction. The tool works best with digitally-generated PDFs.
Built by Abhinav Sinha
AI-First Product Manager who builds production-grade tools. Passionate about turning complex problems into elegant solutions using AI, automation, and modern web technologies.