Deduplicate HubSpot contacts safely, at any scale

HubSpot's native duplicate manager stops at 5,000 records and cannot prevent new duplicates. Here is how an engineered, name-safe deduplication clears the backlog at scale and keeps it clean.

John Kelleher

Most teams discover the limits of HubSpot's native duplicate manager at the worst moment: when they finally try to clean up a database that has quietly filled with duplicates. You open the tool, and it shows you a sample. It will not show you the whole problem, it will not stop new duplicates arriving, and every merge it makes is permanent. For a small list that is fine. For a database of tens or hundreds of thousands of contacts, it is not the tool for the job.

This guide is about the job native dedup cannot finish: clearing a large duplicate backlog safely, and keeping it clean afterwards.

What HubSpot's native duplicate manager does, and where it stops

HubSpot's built-in tool genuinely helps for small, manual clean-ups, and it is worth using for that. But it has hard limits that matter at scale:

  • It caps the view at 5,000 records. Beyond that you cannot see the full extent of the problem, let alone clear it.
  • Merges are manual and permanent. You work through pairs by hand, and a merge cannot be cleanly undone.
  • Its matching rules are fixed. It does not correct mistyped email domains, so a misspelled domain hides a duplicate instead of exposing it.
  • It has no safety check on identity. It will let you merge two records that share a detail even when they are clearly two different people.
  • It does nothing going forward. Once you have cleaned up, the next import or form fill starts the problem again.

Why "just add an app" is not automatically the answer

The usual next step is a dedup app. For some teams that is reasonable. But an app is an ongoing subscription that you have to learn, configure and run, and whose risk you own. What most mid-market teams actually want is the duplicate problem solved: the backlog cleared once, correctly, with a guardrail so it stays clear. That is an engineering outcome a partner delivers, not a tool you administer in perpetuity.

The engineered approach to a large backlog

We are a UK HubSpot Diamond partner and a software engineering firm with HubSpot's Custom Integration Accreditation. For a large duplicate backlog we run a supervised, one-time clean-up built to scale past the native cap:

  • It sees the whole database, not a 5,000-record sample, so the figure at the end is the real one.
  • It matches intelligently. Mobile numbers are normalised to a single international format so the same number written five ways still matches, and email matching corrects common typo domains first, so a misspelled domain no longer hides a duplicate.
  • It resolves groups to a single survivor using clear, deterministic rules: a correctly spelled email beats a typo, then the most recently engaged record, then the most recently created.
  • It reports in your terms, including a clean split of B2B and B2C, in a branded run report.
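The matching and survivor rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production implementation: the typo map, the default country code of 44, and the `Contact` fields are all hypothetical, and a real run would use a reviewed typo list and a proper phone-number library.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical typo map for illustration; a real run would use a reviewed list.
TYPO_DOMAINS = {
    "gmial.com": "gmail.com",
    "gmai.com": "gmail.com",
    "hotmial.com": "hotmail.com",
    "outlok.com": "outlook.com",
}


def normalise_mobile(raw: str, default_country_code: str = "44") -> Optional[str]:
    """Return the number as +<country code><number>, or None if it has no digits."""
    text = (raw or "").strip().replace("(0)", "")  # handles "+44 (0)7700 ..." style
    digits = re.sub(r"\D", "", text)
    if not digits:
        return None
    if text.startswith("+"):
        return "+" + digits
    if digits.startswith("00"):   # international dialling prefix
        return "+" + digits[2:]
    if digits.startswith("0"):    # national format: assume the default country
        return "+" + default_country_code + digits[1:]
    return "+" + digits           # assumed to include the country code already


def normalise_email(raw: str) -> tuple[str, bool]:
    """Return (matching key, whether a typo domain was corrected)."""
    local, _, domain = raw.strip().lower().rpartition("@")
    fixed = TYPO_DOMAINS.get(domain, domain)
    return f"{local}@{fixed}", fixed != domain


@dataclass
class Contact:
    id: str
    email: str
    created: datetime                      # naive datetimes assumed throughout
    last_engaged: Optional[datetime] = None


def pick_survivor(group: list[Contact]) -> Contact:
    """Correctly spelled email first, then most recently engaged, then most recently created."""
    def rank(contact: Contact):
        _, was_typo = normalise_email(contact.email)
        return (not was_typo, contact.last_engaged or datetime.min, contact.created)
    return max(group, key=rank)
```

Because the ranking is a plain tuple comparison, the same group always resolves to the same survivor, which is what makes a dry run reviewable before anything is changed.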

The safety layer that makes scale survivable

Merging at scale is only responsible if it is safe at scale. The core safeguard is the name-gate: a record is auto-merged only when the name agrees too. Where two records share a phone number or email but the names do not match, they are not merged; they are held for a person to review. This prevents the most damaging error in deduplication: fusing two different people who happen to share a handset, which a bulk merge cannot cleanly undo. We go deeper in our article on why merging on phone number alone is dangerous.
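As a rough illustration of the name-gate, the sketch below routes a candidate pair to either an automatic merge or a review queue. It assumes records are dictionaries keyed by HubSpot's default `firstname` and `lastname` properties, and it uses the strictest possible comparison (an exact match after normalising case, spacing and accents); the function names and the two outcome labels are hypothetical.

```python
import unicodedata


def _name_key(first, last) -> str:
    """Lower-case, accent-free, single-spaced form of a full name."""
    text = unicodedata.normalize("NFKD", f"{first or ''} {last or ''}")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(text.casefold().split())


def name_gate(record_a: dict, record_b: dict) -> str:
    """Return "auto_merge" only when both names are present and agree; otherwise "review"."""
    key_a = _name_key(record_a.get("firstname"), record_a.get("lastname"))
    key_b = _name_key(record_b.get("firstname"), record_b.get("lastname"))
    if key_a and key_a == key_b:
        return "auto_merge"
    return "review"  # shared phone or email, but the names do not confirm it
```

Note that a pair with missing names is held for review rather than merged: when the gate cannot confirm identity, it defaults to the safe outcome.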

Around that sit the rest of the guardrails: a short-name guard so initials cannot drive a fuzzy match, automatic safety checks that abort the run if anything looks inconsistent, a full backup of every affected record before any merge, a dry run that is reviewed before anything is changed, and a complete audit log. Nothing is merged until the dry run is approved.
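The ordering of those guardrails matters, so here is a small sketch of how they might fit together: safety checks while the plan is built, an approval flag that blocks execution, a backup written before the first merge, and an audit log built as merges run. Every name here (`MergePlan`, `build_plan`, `execute`, `SafetyCheckFailed`) is hypothetical, and `merge_fn` stands in for whatever actually performs the merge.

```python
import json
from dataclasses import dataclass


class SafetyCheckFailed(Exception):
    """Raised to abort the run when the plan looks inconsistent."""


@dataclass
class MergePlan:
    merges: list            # (survivor_id, duplicate_id) pairs
    approved: bool = False  # set to True only after the dry run is reviewed


def build_plan(groups: dict) -> MergePlan:
    """Dry run: turn {survivor_id: [duplicate_ids]} into a checked plan. Changes nothing."""
    merges = [(s, d) for s, dups in groups.items() for d in dups]
    survivors = {s for s, _ in merges}
    duplicates = [d for _, d in merges]
    if len(duplicates) != len(set(duplicates)):
        raise SafetyCheckFailed("a record is scheduled to be merged twice")
    if survivors & set(duplicates):
        raise SafetyCheckFailed("a record is both a survivor and a duplicate")
    return MergePlan(merges)


def execute(plan: MergePlan, records: dict, merge_fn, backup_stream) -> list:
    """Back up every affected record, then merge, returning the audit log."""
    if not plan.approved:
        raise PermissionError("dry run has not been approved")
    affected = sorted({record_id for pair in plan.merges for record_id in pair})
    json.dump({record_id: records[record_id] for record_id in affected}, backup_stream)
    audit = []
    for survivor_id, duplicate_id in plan.merges:
        merge_fn(survivor_id, duplicate_id)
        audit.append({"survivor": survivor_id, "merged": duplicate_id})
    return audit
```

The backup is written in full before the first call to `merge_fn`, so a failure part-way through still leaves a complete record of what each contact looked like beforehand.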

Companies as well as contacts

The same approach extends to duplicate companies, matched on domain root and reduced to a single surviving record, so your account-level reporting is as clean as your contact-level reporting.
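To make "matched on domain root" concrete, the sketch below reduces a company's website or domain value to a root and groups company IDs that share one. It is a simplification under a loud assumption: the short `TWO_LEVEL_SUFFIXES` set is illustrative only, and a production version would consult the full Public Suffix List instead.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Illustrative only; a real implementation would use the Public Suffix List.
TWO_LEVEL_SUFFIXES = {"co.uk", "org.uk", "ac.uk", "com.au", "co.nz"}


def domain_root(value: str) -> str:
    """Reduce a URL or hostname to its registrable root, e.g. www.example.co.uk -> example.co.uk."""
    text = value.strip().lower()
    if "//" not in text:
        text = "//" + text  # lets urlsplit treat a bare hostname as the network location
    host = urlsplit(text).hostname or ""
    labels = host.rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def group_companies(companies: dict) -> dict:
    """Map domain root -> company IDs sharing it, keeping only groups with duplicates."""
    groups = defaultdict(list)
    for company_id, domain in companies.items():
        root = domain_root(domain)
        if root:
            groups[root].append(company_id)
    return {root: ids for root, ids in groups.items() if len(ids) > 1}
```

Each resulting group can then go through the same survivor selection and dry-run approval as a contact group.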

Stopping duplicates coming back

A clean-up that drifts straight back is wasted effort, so the second half of the job is prevention: a go-forward guardrail built as a HubSpot custom-code workflow that catches and resolves new duplicates as contacts arrive. Merging is not a native workflow action, which is why this needs custom-coded workflow actions.
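For readers who want a feel for the shape of such an action, here is a hedged sketch in Python, one of the languages HubSpot custom code actions accept. It relies on HubSpot's CRM v3 contact search and merge endpoints; everything else is an assumption made for illustration: the custom property `normalised_mobile`, the secret name `HUBSPOT_TOKEN`, the output field `dedupe_outcome`, the exact fields read from `event`, and the decision to treat the existing record as the survivor.

```python
import json
import os
import urllib.request

API_BASE = "https://api.hubapi.com"
SEARCH_PATH = "/crm/v3/objects/contacts/search"
MERGE_PATH = "/crm/v3/objects/contacts/merge"


def _name(props: dict) -> str:
    full = f"{props.get('firstname') or ''} {props.get('lastname') or ''}"
    return " ".join(full.casefold().split())


def resolve_new_contact(contact: dict, post) -> str:
    """Return "unique", "review" or "merged" for a newly arrived contact."""
    mobile = contact.get("normalised_mobile")  # hypothetical custom property
    if not mobile:
        return "unique"
    found = post(SEARCH_PATH, {
        "filterGroups": [{"filters": [
            {"propertyName": "normalised_mobile", "operator": "EQ", "value": mobile},
        ]}],
        "properties": ["firstname", "lastname"],
        "limit": 10,
    })
    others = [r for r in found.get("results", []) if str(r["id"]) != str(contact["id"])]
    if not others:
        return "unique"
    existing = others[0]
    # Name-gate: several candidates, a missing name or a mismatch all go to review.
    if len(others) > 1 or not _name(contact) or _name(existing["properties"]) != _name(contact):
        return "review"
    post(MERGE_PATH, {"primaryObjectId": existing["id"], "objectIdToMerge": contact["id"]})
    return "merged"


def _http_post(path: str, payload: dict, token: str) -> dict:
    request = urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else {}


def main(event):
    # Assumed event shape: input fields configured on the action, plus the enrolled object's ID.
    token = os.environ["HUBSPOT_TOKEN"]  # hypothetical secret name
    contact = dict(event["inputFields"], id=str(event["object"]["objectId"]))
    outcome = resolve_new_contact(contact, lambda path, payload: _http_post(path, payload, token))
    return {"outputFields": {"dedupe_outcome": outcome}}
```

Passing the HTTP function in as `post` keeps the decision logic testable without touching a live portal, and the returned outcome lets a later workflow branch send "review" cases to a person.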

Native HubSpot dedup vs an engineered service

| | Native duplicate manager | Engineered deduplication |
|---|---|---|
| Scale | Caps the view at 5,000 records | Sees and clears the whole database |
| Matching | Fixed; no typo-domain correction | Normalised mobile and typo-corrected email |
| Safety | Will let you merge different people | Name-gate holds mismatches for review |
| Reversibility | Permanent merges | Dry-run approval and full pre-merge backup |
| Going forward | No prevention | Custom-code workflow guardrail |
| Who runs it | You do, by hand | Delivered and owned by your partner |

Is this right for you?

This is worth doing when duplicates are distorting your reporting and attribution, your database has outgrown the native tool, and you want the problem solved rather than managed by hand. It suits mid-market B2B firms, and the same engineering serves larger data estates.

Want your HubSpot duplicates cleared at scale, safely? See our HubSpot data engineering work, or tell us what you are running and we will scope it. Related: HubSpot data migration and managed RevOps.

FAQ

Can I deduplicate more than 5,000 contacts in HubSpot?
Not with the native duplicate manager, which caps the view at 5,000 records. An engineered clean-up works across the whole database, however large.

Are HubSpot merges reversible?
No. A HubSpot merge cannot be cleanly undone, which is why a dry run, a full pre-merge backup and a name-gate that holds uncertain matches for review all come first.

Can a HubSpot workflow merge duplicates?
Not natively; merge is not a standard workflow action. A custom-code workflow can do it, which is how go-forward prevention is built.

Will deduplication delete data?
No. Records are merged to a single survivor under clear winner rules, with a backup of every affected record taken first.

John Kelleher

Author
John is the founder and the Chief Executive at SpotDev.
