$ man deduplication

GTM · Automation & Scripts

Deduplication

Checking for and removing duplicate records before processing. If a company was already enriched in a previous run, skip it. If a contact appears twice with slightly different names, merge them.

by Shawn Tenam


Why it matters

duplicate records waste API credits, create confusion in CRMs, and inflate your pipeline numbers. if you run an enrichment script twice without dedup, you get duplicate rows. if you import contacts to HubSpot without dedup, you get duplicate contact objects. if you send emails to duplicates, the same person gets two identical messages and marks you as spam. deduplication is a gate that should exist at every handoff point in the pipeline.

How I use it

I build dedup into every script. at the start of a batch run, I load the existing output CSV and build a set of already-processed domains or emails. before each API call, I check: is this domain already in the set? if yes, skip. if no, process it, write the result, and add it to the set. this means I can re-run scripts safely: if a run fails after record 40 of 73, I restart and it skips the 40 records already in the output and picks up at record 41 automatically. for CRM dedup, Clay has Sculptor (AI-powered fuzzy matching) that catches "Microsoft" vs "Microsoft Corporation" vs "MSFT." for contact-level dedup, I use email as the primary key. if the email already exists in HubSpot, update the record instead of creating a new one.
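the skip-if-seen check above can be sketched in a few lines of Python. this is a minimal sketch, not the exact script: the function names (`normalize_key`, `load_processed`, `run_batch`), the `enrich` callback, and the column name passed as `key_field` are all hypothetical, and it assumes the output CSV has a header row and that every enriched row carries the same columns.

```python
import csv
import os
from typing import Callable, Dict, Iterable, Set


def normalize_key(value: str) -> str:
    """Trim and lowercase so 'ACME.com ' and 'acme.com' count as one key."""
    return value.strip().lower()


def load_processed(output_path: str, key_field: str) -> Set[str]:
    """Build the set of keys already present in the output CSV (empty if no file)."""
    if not os.path.exists(output_path):
        return set()
    with open(output_path, newline="", encoding="utf-8") as f:
        return {
            normalize_key(row[key_field])
            for row in csv.DictReader(f)
            if row.get(key_field)
        }


def run_batch(
    records: Iterable[Dict[str, str]],
    output_path: str,
    key_field: str,
    enrich: Callable[[Dict[str, str]], Dict[str, str]],
) -> int:
    """Enrich only unseen records, appending each result as soon as it is ready.

    `enrich` is a hypothetical stand-in for whatever API call the script makes.
    Returns the number of records processed in this run.
    """
    seen = load_processed(output_path, key_field)
    # Decide on the header before opening in append mode, which creates the file.
    needs_header = not os.path.exists(output_path) or os.path.getsize(output_path) == 0
    processed = 0
    with open(output_path, "a", newline="", encoding="utf-8") as f:
        writer = None
        for record in records:
            key = normalize_key(record[key_field])
            if key in seen:
                continue  # already enriched: no API call, no duplicate row
            result = enrich(record)
            if writer is None:
                writer = csv.DictWriter(f, fieldnames=list(result.keys()))
                if needs_header:
                    writer.writeheader()
            writer.writerow(result)
            f.flush()  # hand the row to the OS so a later crash keeps finished work
            seen.add(key)
            processed += 1
    return processed
```

because each row is written right after its API call, the output file doubles as the checkpoint: a restart rebuilds the set from the file and spends credits only on records that never made it in. the same pattern works with email as the key for contact-level runs.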


Related terms
Batch Processing · Sculptor · Validation · Enrichment Pipeline