
How Cently Categorizes Your Transactions (Reliably, Fast, and With Context)

2025-09-12


So I got this email that made me realize we'd been thinking about transaction categorization all wrong.

A customer uploaded a file of 248 transactions, each with a category, but 37 of them came through blank. They re-uploaded several times and got the same result every time. They were frustrated, and honestly, so was I.

The problem wasn't that our AI was bad. It was that we were treating categorization as a one-shot problem instead of learning from what users actually do.

The obvious solution that doesn't work

My first instinct was to build better rules. You know, "if merchant contains 'STARBUCKS' then category = 'Coffee'." But merchant names are chaos. You get "SBUX Store #1234", "Starbucks Coffee #567", "SBX*DOWNTOWN", and about fifteen other variations for the same coffee shop.

Rules break constantly. Merchants change their payment processors, add location codes, use abbreviations. You end up spending more time fixing categories than actually budgeting.
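To make the brittleness concrete, here's a tiny Python sketch of that kind of keyword rule. The `RULES` table and `categorize_by_rule` function are hypothetical illustrations, not Cently code; the merchant strings are the variations listed above.

```python
from typing import Optional

# Hypothetical keyword rules of the "if merchant contains X" kind.
RULES = {
    "STARBUCKS": "Coffee",
}

def categorize_by_rule(merchant: str) -> Optional[str]:
    """Return the category of the first keyword found in the merchant string."""
    upper = merchant.upper()
    for keyword, category in RULES.items():
        if keyword in upper:
            return category
    return None  # no rule matched

for name in ["Starbucks Coffee #567", "SBUX Store #1234", "SBX*DOWNTOWN"]:
    print(f"{name:24} -> {categorize_by_rule(name)}")
```

Only the first string contains the literal keyword, so the other two fall through to `None`, and every new variation means writing yet another rule by hand.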

What we built instead

We built something that learns from your own spending patterns while still being smart about new merchants. Here's how it actually works:

First, we try to map everything at once. When you upload a file, we look at all your categories and ask an AI to map them to standard categories. Things like "Coffee" → "food and drink coffee" or "Gas" → "transportation gas." This handles most of your transactions in one go.
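A minimal Python sketch of that one-shot mapping step, assuming the model is asked for a single JSON object covering every distinct category in the file. `STANDARD_CATEGORIES`, `build_mapping_prompt`, `map_categories`, and `fake_llm` are all hypothetical names; `fake_llm` stands in for the real model call so the sketch runs offline.

```python
import json
from typing import Callable, Dict, Iterable, List

# Hypothetical subset of the standard labels, taken from the examples above.
STANDARD_CATEGORIES = ["food and drink coffee", "transportation gas"]

def build_mapping_prompt(user_categories: Iterable[str]) -> str:
    """One prompt covering every distinct category in the uploaded file."""
    distinct = sorted(set(user_categories))
    return (
        "Map each user category to exactly one standard category, or to null "
        "if none fits. Reply with a JSON object only.\n"
        f"Standard categories: {json.dumps(STANDARD_CATEGORIES)}\n"
        f"User categories: {json.dumps(distinct)}"
    )

def map_categories(user_categories: List[str], call_llm: Callable[[str], str]) -> Dict[str, str]:
    """Make a single model call, then keep only answers we can trust."""
    wanted = set(user_categories)
    reply = json.loads(call_llm(build_mapping_prompt(user_categories)))
    return {
        user: std
        for user, std in reply.items()
        if user in wanted and std in STANDARD_CATEGORIES  # drops nulls and invented labels
    }

# Offline stand-in for the real model call (hypothetical reply).
def fake_llm(prompt: str) -> str:
    return json.dumps({"Coffee": "food and drink coffee", "Gas": "transportation gas", "Misc": None})

print(map_categories(["Coffee", "Gas", "Coffee", "Misc"], fake_llm))
```

Because the request is per distinct category rather than per transaction, a file with hundreds of rows but a dozen categories needs only one call; anything the model leaves as null ("Misc" here) drops through to the next step.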

Then we check your history. For anything that didn't map cleanly, we search through your past transactions using fuzzy matching. If you've ever categorized something similar before, we use that. So "SBUX Store #1234" matches your previous "Starbucks Coffee" transaction and gets the same category.
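Here's a minimal Python sketch of that history lookup. Two loud assumptions: the `normalize` cleanup rules are hypothetical, and the sketch swaps in the standard library's `difflib.SequenceMatcher` as a stand-in for the PostgreSQL trigram similarity described further down.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

def normalize(description: str) -> str:
    """Hypothetical cleanup: uppercase, drop processor suffixes, IDs, punctuation."""
    s = description.upper().split("*", 1)[0]  # "AMZN Mktp US*AB12" -> "AMZN MKTP US"
    s = re.sub(r"#?\d+", " ", s)              # drop store numbers like "#1234"
    s = re.sub(r"[^A-Z ]+", " ", s)           # drop remaining punctuation
    return " ".join(s.split())

def lookup_history(description: str, history: List[Tuple[str, str]],
                   cutoff: float = 0.6) -> Optional[str]:
    """Return the category of the most similar past transaction, if similar enough."""
    target = normalize(description)
    best_score, best_category = 0.0, None
    for past_description, category in history:
        score = SequenceMatcher(None, target, normalize(past_description)).ratio()
        if score > best_score:
            best_score, best_category = score, category
    return best_category if best_score >= cutoff else None

history = [("Starbucks Coffee #567", "Coffee"), ("Shell Oil 5748", "Gas")]
print(lookup_history("STARBUCKS COFFEE #1234", history))  # Coffee
print(lookup_history("Shell Oil #0042", history))         # Gas
print(lookup_history("Netflix.com", history))             # None
```

Normalizing first does most of the work: once store numbers and processor suffixes are stripped, "Starbucks Coffee #567" and "STARBUCKS COFFEE #1234" become the same string. The 0.6 cutoff is an arbitrary illustrative value.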

Finally, we analyze the weird ones individually. If we still can't figure it out, we look at each remaining transaction on its own (the merchant name, the amount, and any category you provided) and make our best guess.
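The three steps form a fallback chain: each transaction goes through the stages in order and the first confident answer wins. A minimal Python sketch of that shape, where every stage is a hypothetical stand-in (a precomputed dict for the batch mapping, an exact lookup in place of the fuzzy history search, and a keyword check in place of the per-transaction model call):

```python
from typing import Callable, Dict, List, Optional, Tuple

Transaction = Dict[str, str]
Stage = Callable[[Transaction], Optional[str]]

def categorize_all(transactions: List[Transaction],
                   stages: List[Tuple[str, Stage]]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Run each transaction through the stages in order; first non-None answer wins."""
    results = []
    for txn in transactions:
        answer: Tuple[Optional[str], Optional[str]] = (None, None)
        for name, stage in stages:
            category = stage(txn)
            if category is not None:
                answer = (category, name)  # remember which stage decided
                break
        results.append(answer)
    return results

# Hypothetical stand-ins for the three stages.
BATCH_MAPPING = {"Coffee": "food and drink coffee"}  # output of the one-shot mapping
HISTORY = {"SHELL OIL": "transportation gas"}        # past user-confirmed categories

def from_batch_mapping(txn: Transaction) -> Optional[str]:
    return BATCH_MAPPING.get(txn.get("category", ""))

def from_history(txn: Transaction) -> Optional[str]:
    return HISTORY.get(txn["merchant"].upper())  # exact match stands in for fuzzy search

def from_individual_analysis(txn: Transaction) -> Optional[str]:
    # Placeholder for a per-transaction model call using merchant, amount, and category.
    return "rent and utilities rent" if "RENT" in txn["merchant"].upper() else None

STAGES = [("batch", from_batch_mapping), ("history", from_history),
          ("individual", from_individual_analysis)]

txns = [
    {"merchant": "SBUX Store #1234", "amount": "4.75", "category": "Coffee"},
    {"merchant": "Shell Oil", "amount": "41.20", "category": ""},
    {"merchant": "ACH DEBIT RENT PAYMENT", "amount": "1850.00", "category": ""},
    {"merchant": "XYZ*8841", "amount": "12.00", "category": ""},
]
for txn, (category, stage) in zip(txns, categorize_all(txns, STAGES)):
    print(f"{txn['merchant']:24} {category} (via {stage})")
```

Recording which stage answered is a cheap way to see how much work each layer is doing as a user's history grows.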

How we categorize your transactions

The key insight is that your own spending patterns are way more valuable than any generic rules we could write.

Why this approach works

Let me give you some real examples:

  • "AMZN Mktp US*AB12" looks like gibberish, but if you've bought from Amazon before, we know it's shopping.
  • "ACH DEBIT RENT PAYMENT" is obviously rent, even though it doesn't mention your landlord's name.
  • "VENMO PAYMENT 1234" could be anything, but if you always categorize Venmo as "Personal Transfer," we remember that.
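The Venmo example boils down to remembering, per merchant, which category you pick most often. A minimal sketch, assuming a hypothetical `CategoryMemory` class keyed on the first word of the merchant string (a deliberately crude merchant key chosen for illustration):

```python
from collections import Counter, defaultdict
from typing import Dict, Optional

class CategoryMemory:
    """Hypothetical per-user memory of how each merchant key was categorized."""

    def __init__(self) -> None:
        self._counts: Dict[str, Counter] = defaultdict(Counter)

    @staticmethod
    def key(merchant: str) -> str:
        # Assumption: the first word is a good-enough key ("VENMO PAYMENT 1234" -> "VENMO").
        return merchant.upper().split()[0] if merchant.strip() else ""

    def record(self, merchant: str, category: str) -> None:
        """Call this whenever the user confirms or corrects a category."""
        self._counts[self.key(merchant)][category] += 1

    def suggest(self, merchant: str) -> Optional[str]:
        """Return the user's most frequent category for this merchant, if any."""
        counts = self._counts.get(self.key(merchant))
        if not counts:
            return None
        return counts.most_common(1)[0][0]

memory = CategoryMemory()
memory.record("VENMO PAYMENT 1234", "Personal Transfer")
memory.record("VENMO PAYMENT 5678", "Personal Transfer")
memory.record("VENMO CASHOUT", "Income")
print(memory.suggest("VENMO PAYMENT 9999"))  # Personal Transfer
```

Every upload adds more `record` calls, which is the mechanical sense in which the suggestions get better over time.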

The system gets smarter every time you upload transactions because it's learning from your actual behavior, not trying to guess what you might want.

The technical stuff (for the curious)

We use PostgreSQL's trigram similarity (from the pg_trgm extension) to find fuzzy matches in your transaction history. It's fast enough to search thousands of transactions in under a second, and it catches variations that exact string matching would miss.
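For the curious, here's roughly what trigram similarity computes, approximated in Python. Following the pg_trgm documentation, the text is lowercased, non-alphanumeric characters are ignored, each word is padded with two leading spaces and one trailing space, and similarity is shared trigrams divided by distinct trigrams across both strings. This sketch only handles ASCII letters and digits, and it is an approximation, not the extension's code.

```python
import re
from typing import Set

SIMILARITY_THRESHOLD = 0.3  # pg_trgm's default threshold for the % operator

def trigrams(text: str) -> Set[str]:
    """Approximate pg_trgm: lowercase, split on non-alphanumerics, pad each word."""
    result: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

score = similarity("SBUX Store #1234", "SBUX Store #567")
print(round(score, 3))                # 0.526
print(score >= SIMILARITY_THRESHOLD)  # True
```

In the database, the same idea is available through the `similarity()` function and the `%` operator, and a GIN or GiST index using `gin_trgm_ops` or `gist_trgm_ops` is what keeps these lookups fast on larger tables.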

For the AI parts, we use Claude to understand messy merchant names and map categories. But we don't rely on it to be perfect; it's just one part of a system that has multiple fallbacks.
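Not relying on the model being perfect mostly means checking every reply before trusting it. A minimal sketch, assuming the model is asked to answer with JSON like `{"category": "..."}` and using a hypothetical `ALLOWED` subset of the taxonomy; anything malformed or off-list returns `None`, so the next fallback gets a turn.

```python
import json
from typing import Optional

# Hypothetical subset of the standard taxonomy.
ALLOWED = {"food and drink coffee", "transportation gas",
           "general merchandise online marketplaces"}

def parse_model_category(raw_reply: str) -> Optional[str]:
    """Accept the model's answer only if it is well-formed and inside the taxonomy."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return None  # not JSON at all: let the next fallback handle it
    if not isinstance(data, dict):
        return None
    category = data.get("category")
    if not isinstance(category, str):
        return None
    category = category.strip().lower()
    return category if category in ALLOWED else None

print(parse_model_category('{"category": "Transportation Gas"}'))  # transportation gas
print(parse_model_category("Sure! That looks like gas."))          # None
```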

Everything maps to our standard category system (which is compatible with Plaid's taxonomy), so your reports stay consistent whether you upload files or connect your bank directly.
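A small sketch of how readable labels like "food and drink coffee" could line up with Plaid-style codes and roll up for reports. It assumes the standard categories mirror Plaid's detailed personal-finance-category codes and uses an assumed subset of the primary categories; `to_code` and `primary_of` are hypothetical helpers.

```python
from typing import Optional

# Assumed subset of Plaid's primary personal-finance categories.
PRIMARY_CATEGORIES = ["FOOD_AND_DRINK", "GENERAL_MERCHANDISE",
                      "RENT_AND_UTILITIES", "TRANSFER_OUT", "TRANSPORTATION"]

def to_code(label: str) -> str:
    """Convert a readable label to an upper-snake code: 'transportation gas' -> 'TRANSPORTATION_GAS'."""
    return "_".join(label.upper().split())

def primary_of(code: str) -> Optional[str]:
    """Roll a detailed code up to its primary; the longest matching prefix wins."""
    matches = [p for p in PRIMARY_CATEGORIES if code == p or code.startswith(p + "_")]
    return max(matches, key=len) if matches else None

print(to_code("food and drink coffee"))     # FOOD_AND_DRINK_COFFEE
print(primary_of("FOOD_AND_DRINK_COFFEE"))  # FOOD_AND_DRINK
```

Prefix matching against a known list is needed because primary names contain underscores themselves, so splitting a code at its first underscore would turn "FOOD_AND_DRINK_COFFEE" into "FOOD".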

What you can expect

The first upload might have some misses while the system learns your patterns. But it gets dramatically better with each file because it's building up a history of how you categorize things.

Most customers see significant improvement by their second or third upload. And if you connect your bank accounts directly through Plaid, you skip the upload process entirely: transactions sync automatically with their proper categories.

The bigger lesson

This whole experience taught me that the best solutions often come from combining different approaches instead of trying to find one perfect method. AI is great at understanding context, but your historical data is great at consistency. Fuzzy matching handles variations, but exact rules work for clear cases.

The magic happens when you layer these approaches so they complement each other's weaknesses.

If you're dealing with messy, inconsistent data in your own product, maybe the answer isn't finding the perfect algorithm. Maybe it's building a system that tries multiple approaches and learns from what works.
