A friend of ours was beta-testing the agent. His agency is called Kolect. He set it up, ran the first scan, and what came back was… advice on how to organize a Pokémon trading-card collection.
He was good-natured about it. We spent the next three days finding out why.
Layer 1: the obvious culprit
The first guess was that our query expansion was being too aggressive, turning “Kolect” into “collect” through over-eager spell-correction. We logged the actual queries hitting the search providers and confirmed the suspicion: yes, three of the five providers were silently substituting “collect” for “Kolect.”
We turned spell-correction off across the board and re-ran. Things got worse. Now the queries returned almost nothing, and what they did return was unrelated: random Reddit threads where someone had mistyped “collect” as “Kolect.”
Layer 2: the disambiguation gap
It turned out spell-correction wasn't the cause; it was a symptom. The real issue was that our query planner had no idea what kind of entity “Kolect” was supposed to be. Without context — industry, geography, what they actually do — the planner was treating the brand name as a free-form keyword. And free-form keywords sit on top of a power-law distribution: most documents matching the literal string are noise.
Stripe doesn't have this problem because everyone knows what Stripe is. Kolect, with 38 employees and a niche in creator-matching, does. So do almost all our users — that's the market we serve.
Layer 3: the rewrite
We rebuilt the query planner around a strict rule: every search query carries the full brand context — industry, market, product description, and disambiguation phrases — whether the underlying provider supports structured queries or not.
For providers that support boolean operators, this becomes:
(Kolect OR "Kolect agency") AND (
"creator agency" OR "ugc" OR "influencer marketing"
) AND (
"us" OR "united states"
)
For providers that only accept a single string, we generate a denser query:
Kolect creator agency UGC influencer marketing US
And we score every returned result against the same context vector before we surface it to the user. Anything below a 0.6 relevance score gets dropped silently. Results between 0.6 and 0.8 get a small “low-confidence” flag in the UI so the user can see we're less sure.
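The planner rule and the relevance triage can be sketched together. This is a minimal Python sketch under stated assumptions: `BrandContext`, `boolean_query`, `dense_query`, `classify` and `triage` are hypothetical names, and the post doesn't say how the relevance score is computed, so cosine similarity between a result's embedding and the brand-context vector stands in for it here. Only the 0.6 and 0.8 cutoffs come from the text; treating a score of exactly 0.8 as unflagged is our assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class BrandContext:
    # Hypothetical shape for the context every query carries.
    name: str
    aliases: list[str]         # alternate brand phrasings, quoted as exact phrases
    industry_terms: list[str]  # disambiguation phrases for what the brand does
    market_terms: list[str]    # geography / market phrases


# Cutoffs from the post: below 0.6 is dropped, 0.6 up to 0.8 is flagged.
DROP_BELOW = 0.6
FLAG_BELOW = 0.8  # assumption: a score of exactly 0.8 is shown unflagged


def _any_of(terms: list[str]) -> str:
    return "(" + " OR ".join(terms) + ")"


def boolean_query(ctx: BrandContext) -> str:
    """Render the context for providers that accept boolean operators."""
    brand = [ctx.name] + [f'"{a}"' for a in ctx.aliases]
    industry = [f'"{t}"' for t in ctx.industry_terms]
    market = [f'"{t}"' for t in ctx.market_terms]
    # OR within a group, AND across groups; empty groups are skipped.
    return " AND ".join(_any_of(g) for g in (brand, industry, market) if g)


def dense_query(ctx: BrandContext) -> str:
    """Flatten the context into one string for single-string providers."""
    return " ".join([ctx.name, *ctx.industry_terms, *ctx.market_terms[:1]])


def cosine(a: list[float], b: list[float]) -> float:
    # Stand-in relevance score (assumption): cosine similarity of two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(score: float) -> str:
    if score < DROP_BELOW:
        return "drop"
    if score < FLAG_BELOW:
        return "flag"
    return "show"


def triage(results, context_vec):
    """Return (title, score, low_confidence) for every result worth surfacing."""
    kept = []
    for title, vec in results:
        score = cosine(vec, context_vec)
        verdict = classify(score)
        if verdict == "drop":
            continue  # dropped silently; the user never sees it
        kept.append((title, score, verdict == "flag"))
    return kept


KOLECT = BrandContext(
    name="Kolect",
    aliases=["Kolect agency"],
    industry_terms=["creator agency", "ugc", "influencer marketing"],
    market_terms=["us", "united states"],
)

print(boolean_query(KOLECT))
print(dense_query(KOLECT))
```

Run as-is, the first `print` emits the boolean query above on a single line, and the second emits `Kolect creator agency ugc influencer marketing us`, which differs from the dense query above only in letter case. Rendering both query shapes from one `BrandContext` is one way to make sure no query leaves the planner without its context.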
What it taught us
Three things that have shaped how we think about the agent:
- The user's brand name is rarely enough. Treat every search as a context-rich operation, even if that triples the prompt.
- Silent quality is more important than apparent quantity. Better to show 7 relevant results than 30 of which 23 are noise.
- The user has to be able to feel us being careful. The “low-confidence” flag added cost; it also added trust. People who saw the flag trusted that the unflagged results really were high-confidence.
Kolect now sees Kolect-relevant signals. Pokémon collectors are, presumably, also seeing Pokémon-relevant signals — somewhere else.
There's also a “Not my brand” reject button on every signal. Each press updates the per-brand filter within minutes. It's the most-used button in the beta.
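The post doesn't describe what the per-brand filter keys on. One plausible design, sketched here purely as an assumption, remembers the vector of every signal rejected with “Not my brand” and suppresses later results that are near-duplicates of one. The `RejectFilter` class and its 0.9 similarity cutoff are hypothetical, and the sketch applies a rejection immediately rather than modeling the within-minutes update.

```python
import math
from collections import defaultdict


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class RejectFilter:
    """Hypothetical per-brand filter fed by a "Not my brand" button."""

    def __init__(self, threshold: float = 0.9):  # 0.9 is an illustrative cutoff
        self.threshold = threshold
        # brand id -> vectors of signals the user rejected for that brand
        self._rejected: dict[str, list[list[float]]] = defaultdict(list)

    def reject(self, brand: str, signal_vec: list[float]) -> None:
        """Record one press of the button for this brand."""
        self._rejected[brand].append(signal_vec)

    def allows(self, brand: str, result_vec: list[float]) -> bool:
        """True unless the result is a near-duplicate of a rejected signal."""
        return all(cosine(result_vec, r) < self.threshold
                   for r in self._rejected.get(brand, []))


f = RejectFilter()
f.reject("kolect", [1.0, 0.0])
print(f.allows("kolect", [0.99, 0.05]), f.allows("kolect", [0.0, 1.0]))
```

Keying on vectors rather than, say, source domains avoids one rejected Reddit thread blocking every Reddit result, at the price of having to pick a similarity cutoff.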