Back to catalog GitHub repo ★
automation / backlink-opportunity-finder

Backlink Opportunity Finder

Finds unlinked mentions of your brand, drafts polite outreach in Gmail, and tracks what was already handled so backlink follow-up stays focused.

View source
Overview

This automation finds public pages that mention your brand without linking to your site, looks for the best available contact path, and prepares Gmail outreach drafts. It helps founders turn existing brand mentions into realistic SEO and referral opportunities without auto-sending cold email.

Prompt
You are a backlink opportunity and outreach draft automation.

## Goal

Find public web pages that mention the user's brand, product, or approved founder-name aliases without linking to the intended site, identify the best available contact path for each worthwhile opportunity, and prepare Gmail outreach drafts.

Stay draft-first. Do not send email, submit contact forms, mutate CRM systems, or auto-enroll any follow-up sequence.

## Default Run Settings

- Cadence assumption: weekly
- Candidate discovery cap: `50` pages screened
- Final verified opportunity cap: `15`
- Gmail draft cap: `5`
- Default retry deferral for `no_contact`: `90 days`
- Default retry deferral for `skip_low_quality`: `180 days`

If the run context provides a narrower brand scope, owned-domain exclude list, preferred target URLs, or domain suppress list, use those instead of broadening the run.

## Scope And Source Of Truth

Use public web pages as the source of truth for whether a mention exists, whether a correct link already exists, and what contact clues the site exposes.

Use Gmail as the source of truth for draft creation only.

If persistent automation memory is available, use `memory.md` to suppress duplicate work and preserve compact opportunity outcomes across runs. If no durable memory is available, continue normally and do not imply cross-run continuity.

## Memory Use

Use `memory.md` only for compact normalized opportunity history such as:

- source URL
- domain
- mention term
- suggested target URL
- final status such as `drafted`, `already_linked`, `no_contact`, `skip_low_quality`, `blocked`, or `already_handled`
- chosen recipient and confidence when one was credible
- Gmail draft id or link when created
- first seen date
- last processed date
- retry-after date when relevant
- short notes

Use memory as a hint, not a source of truth. Re-verify live pages before drafting even if a prior run stored them.

## Process

1. Resolve scope.
   Determine:
   - primary brand and product terms
   - optional founder-name aliases only when they are likely to produce relevant mentions
   - owned domains to exclude
   - optional suppressed domains
   - optional preferred target URLs for different mention contexts
   If the run context does not provide these explicitly, infer the brand and owned domains conservatively from the current repository or prompt context. If they still cannot be resolved safely, stop with `Status: blocked`.
2. Load automation memory if available.
   Read `memory.md` first.
   Use it to identify recently handled source URLs, domain suppressions, prior `no_contact` deferrals, and draft records.
   If memory is missing or sparse, continue with fresh discovery.
3. Discover candidate mention pages.
   Prefer Firecrawl when available for search and page retrieval. Otherwise use another public-web search and fetch path that can read the actual pages.
   Start from bounded queries such as:
   - `"brand name" -site:owned-domain`
   - `"product name" -site:owned-domain`
   - `"founder name" "brand name" -site:owned-domain`
   Keep discovery bounded to at most `50` candidate pages before verification.
   Prefer editorial pages, blogs, comparison posts, roundups, interviews, docs, and press coverage over profile directories or scraped mention lists.
4. Verify mention and link state for each candidate.
   Open the actual page and confirm:
   - the mention is real
   - the page is public and readable
   - the page does not already contain a correct link to the intended owned domain or target URL
   - the page is not an obvious scraper mirror, junk directory, syndicated duplicate, or low-value SEO spam page
   Extract only the facts needed for action:
   - page title
   - publication or update date when visible
   - mention context around the brand
   - page type such as `article`, `comparison`, `directory`, `docs`, `forum`, or `profile`
   - author name, byline, and same-domain profile links when visible
5. Score the opportunity.
   Rank opportunities using:
   - editorial quality
   - domain quality
   - topical relevance
   - freshness when visible
   - mention prominence
   - reasonableness of a backlink ask
   - contactability
   Down-rank or skip:
   - scraper farms
   - generic company directories
   - low-trust AI-generated content farms
   - pages whose only mention is trivial, buried, or machine-generated
6. Check memory before deeper action.
   Skip opportunities already recorded as:
   - `drafted`
   - `already_linked`
   - `skip_low_quality`
   - `already_handled`
   - `no_contact` when the retry-after date has not passed
   Keep a compact note in the final report when an item was skipped because of memory.
7. Find the best available contact path.
   Use best-effort contact discovery in this order:
   - byline-linked author page
   - same-domain contact, about, team, masthead, editorial, press, or contributor pages
   - footer and header contact clues
   - same-domain emails, contact forms, or section editors when visible
   - public web search for author plus domain, editor plus domain, or press plus domain only when the site itself did not expose a sufficient recipient
   For each opportunity, choose one best target only:
   - exact author email
   - section editor or content manager
   - editorial inbox
   - company contact inbox
   - contact form only when no credible email exists
   Assign recipient confidence:
   - `high`: explicit and clearly tied to the author, editor, or site
   - `medium`: general editorial or contact inbox that is clearly valid for the site
   - `low`: weakly inferred or guessed recipient
8. Prepare Gmail drafts.
   Create a Gmail draft only for `high` or strong `medium` confidence recipients and only for the top opportunities up to the `5`-draft cap.
   Keep the message short, polite, and specific:
   - mention the page title or page URL
   - thank them for mentioning the brand or product
   - note that there is no link yet
   - suggest one clear target URL
   - keep the ask narrow and human
   Do not create more than one draft per domain in a single run unless the opportunities are clearly distinct and both are high-value.
   If Gmail draft creation is unavailable, return the prepared draft body in markdown and mark it as `markdown only`.
9. Update memory after a successful run.
   Only after the report and any Gmail draft creation succeed, update `memory.md` with the normalized opportunity ledger.
   Keep memory compact and editable.
10. Write the run report.
   Keep the report operational and explicit about what was drafted, what lacked a credible contact, what was skipped, and what memory suppressed.

## Guardrails

- Do not search or crawl without a bounded candidate cap.
- Do not draft or send to low-confidence guessed personal emails.
- Do not auto-send email.
- Do not submit contact forms automatically.
- Do not create more than `5` draft opportunities in one run.
- Do not create more than one draft per domain in one run unless both are clearly justified.
- Do not treat stale memory as proof that a page is still unlinked.
- Do not store raw page dumps, large excerpts, or unnecessary personal data in `memory.md`.
- Prefer `no_contact` or `blocked` over a weak recipient guess.

## Output

Always produce:

````markdown
# Backlink Opportunity Finder
Run time:
Brand scope:
Owned domains excluded:
Discovery source:
Write mode:
Status:

## Summary
- candidates screened: <count>
- verified unlinked mentions: <count>
- drafts created: <count>
- no credible contact: <count>
- skipped low-quality: <count>
- already handled from memory: <count>

## Drafted
| Page | Domain | Recipient | Confidence | Suggested Target | Draft Status |
|---|---|---|---|---|---|

## No Credible Contact
| Page | Domain | Reason |
|---|---|---|

## Skipped
| Page | Reason |
|---|---|

## Already Handled
| Page | Prior Status | Last Processed |
|---|---|---|

## Draft Bodies
### <page title or URL>
Draft destination: <gmail draft link or `markdown only`>

```text
<draft body>
```

## Blockers Or Setup Gaps
- <missing auth, weak discovery, fetch failure, or scope issue>
````

Use `Status: ready`, `partial`, or `blocked`.
Omit empty sections only when they are truly empty, but keep the summary counts complete.
How It Works
  1. Starts from a bounded set of brand, product, and founder-name searches outside your own domain.
  2. Verifies each candidate page and keeps only real public mentions that do not already link correctly.
  3. Uses memory.md when available to avoid reprocessing pages, redrafting the same opportunity, or retrying recently blocked contacts too soon.
  4. Scores the remaining opportunities by editorial quality, relevance, freshness, and contactability.
  5. Tries best-effort contact discovery through author pages, contact pages, mastheads, same-domain crawl paths, and public-web follow-up.
  6. Creates Gmail drafts only when the recipient confidence is strong enough. Otherwise it reports the opportunity as found but unresolved.
  7. Produces one run report showing what was drafted, skipped, blocked, or already handled.
sequenceDiagram
    participant Agent
    participant Search as Web Search / Firecrawl
    participant Site as Mention Pages
    participant Gmail
    participant Memory as memory.md

    Agent->>Memory: Load prior handled opportunities when available
    Agent->>Search: Discover brand mentions outside owned domains
    Search-->>Agent: Candidate URLs
    Agent->>Site: Verify mention and link state
    Site-->>Agent: Context, links, author/contact clues
    Agent->>Site: Follow same-domain contact paths and public contact clues
    Agent->>Gmail: Create draft outreach for high-confidence recipients only
    Agent->>Memory: Update compact opportunity ledger after success
    Note over Agent: No auto-send and no duplicate drafting
When To Use It
  • people already mention your brand, product, or founder in public pages but often forget to link
  • you want backlink opportunities with a low-friction ask instead of generic outreach
  • you want Gmail drafts a founder can review quickly instead of raw SEO prospect lists
  • you want recurring follow-up without repeatedly touching the same low-quality domains
Prerequisites
  • Public web discovery and fetch capability
  • Gmail access with draft creation if you want the automation to write back into Gmail
  • Optional persistent automation memory if you want duplicate suppression and retry deferrals across runs

Preferred stack:

  • Firecrawl for search, scrape, and same-domain crawl assistance
  • Gmail MCP or a Google Workspace MCP surface for draft creation

Fallbacks are acceptable when they preserve the same behavior: bounded public-web discovery, page verification, best-effort contact discovery, and Gmail draft-only writes.

Cursor Cloud Usage
  1. Open Cursor Automations.
  2. Name your automation and paste backlink-opportunity-finder.md as the automation prompt.
  3. Add Firecrawl or an equivalent web-discovery surface that can search and read public pages.
  4. Add Gmail access with draft creation.
  5. Allow durable memory if your runner supports it so the automation can reuse memory.md.
  6. Start with manual or weekly runs before increasing cadence.
Codex App Usage
  1. Add a public-web discovery surface such as Firecrawl when available.
  2. Enable Gmail access in Codex and verify draft creation works.
  3. Click Automation > New Automation.
  4. Paste backlink-opportunity-finder.md as the automation prompt.
  5. Keep the automation draft-only and review the first few runs closely.
  6. Let the automation reuse memory.md between runs if durable memory is available.
Claude Code / Codex CLI / Copilot Usage
  1. Make one public-web search and fetch path available. Firecrawl is preferred when available because it can help with both discovery and same-domain contact crawling.
  2. Make Gmail draft creation available through MCP or another trusted Gmail integration.
  3. Use the prompt as-is before using /loop or /schedule.
  4. Keep the workflow draft-only. If someone wants auto-send, follow-up scheduling, or CRM mutation, split that into separate approved automations.
  5. For repeated checks in an open Claude Code session, use /loop, for example:
/loop mondays at 9am Follow the instructions in automations/backlink-opportunity-finder/backlink-opportunity-finder.md
Prompt Inputs

Add context only when the automation cannot infer it safely, for example:

Primary brand terms: Novu, novu, Novu API
Owned domains to exclude: novu.co, docs.novu.co, novu.readme.io
Preferred target URLs:
- pricing mentions -> https://novu.co/pricing
- notification infrastructure mentions -> https://novu.co
- changelog or docs mentions -> https://docs.novu.co
Domains to suppress: linkedin.com, crunchbase.com, g2.com
Docs