Manual SEO audits are a time sink. A full technical and content audit on a 200-page site eats 10 to 20 hours of analyst time — clicking through Screaming Frog exports, cross-referencing Google Search Console, and building spreadsheets nobody reads twice.
We built something different. Our SEO audit AI agents run as a swarm of 6 Python scripts, each handling one audit dimension. An orchestrator coordinates them. The full suite finishes in about 15 minutes.
This post walks through every script in the toolkit: what it checks, what it outputs, and how to run the whole swarm on your own site. If you’ve read our full-stack AI SEO breakdown, this is the audit layer in action.
What Our Audit Swarm Does (Overview Table)
Here’s the full toolkit at a glance. Six scripts, one unified report.
| Script | Purpose | Input | Output | Run Time |
|---|---|---|---|---|
| `06_seo_audit_swarm.py` | Orchestrator | Site URL | Combined report | 12–15 min |
| `audit_onpage.py` | On-page SEO | URL list | Issue scores | 3–5 min |
| `smart_interlinker.py` | Internal links | Content + URLs | Link suggestions | 2–3 min |
| `content_freshness_auditor.py` | Stale content | Published dates | Freshness report | 1–2 min |
| `orphan_audit.py` | Orphan pages | Sitemap + links | Orphan list | 2–3 min |
| `schema_injector.py` | Schema markup | Post content | JSON-LD snippets | 1–2 min |
Each script works on its own, but the orchestrator is where the real value sits: it sequences them, merges results, and flags the highest-priority fixes.
Script 1 — SEO Audit Swarm (Orchestrator)
06_seo_audit_swarm.py is the control center. It accepts a site URL, triggers each audit script in sequence, and collects everything into one JSON report.
What it does:
- Accepts a root URL and optional flags (e.g., `--parallel`, `--skip-schema`)
- Crawls the sitemap to build a URL inventory
- Passes URL lists and content data to each child script
- Merges all outputs into `audit_report.json`
- Sends a Telegram summary (or prints to terminal if no bot token is configured)
- Logs errors per-script so one failure doesn’t kill the whole run
RUNNING THE ORCHESTRATOR
```bash
python 06_seo_audit_swarm.py --url https://yoursite.com

# With parallel execution (faster, higher memory)
python 06_seo_audit_swarm.py --url https://yoursite.com --parallel

# Skip specific scripts
python 06_seo_audit_swarm.py --url https://yoursite.com --skip-schema
```
The orchestrator handles retries. If the on-page audit fails mid-crawl (timeout, rate limit), it logs the failure and moves on to the next script. You don’t lose the entire run because one URL returned a 503.
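That sequence-and-isolate pattern is easy to sketch. Here's a minimal version, assuming each child script prints JSON to stdout; the real orchestrator's flag handling and retry logic are richer than this:

```python
import json
import subprocess
import sys

# Child scripts in execution order (from the overview table)
AUDIT_SCRIPTS = [
    "audit_onpage.py",
    "content_freshness_auditor.py",
    "orphan_audit.py",
    "smart_interlinker.py",
    "schema_injector.py",
]

def run_swarm(site_url):
    """Run each audit script in sequence; one failure never kills the run."""
    report, errors = {}, {}
    for script in AUDIT_SCRIPTS:
        try:
            result = subprocess.run(
                [sys.executable, script, "--url", site_url],
                capture_output=True, text=True, timeout=600, check=True,
            )
            report[script] = json.loads(result.stdout)
        except Exception as exc:  # timeout, non-zero exit, malformed JSON
            errors[script] = repr(exc)  # log it and move on to the next script
    report["errors"] = errors
    return report
```

Because each script is wrapped in its own try/except, a 503 mid-crawl shows up as one entry under `errors` while the other results still land in the merged report.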
💡 Pro Tip
Set up a cron job to run the orchestrator weekly. Pipe the Telegram notification to a dedicated SEO-alerts channel so your team sees audit results without logging into any dashboard.
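A weekly crontab entry for this might look like the following; the install path and schedule are placeholders:

```cron
# Every Monday at 06:00, run the full swarm and append output to a log
0 6 * * 1  cd /opt/seo-audit-swarm && python 06_seo_audit_swarm.py --url https://yoursite.com >> swarm.log 2>&1
```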
Script 2 — On-Page Analysis (audit_onpage.py)
This is the most granular script in the swarm. It pulls each URL and runs a checklist of on-page SEO factors.
What it checks:
- Title tags — Length (50–60 chars), keyword presence, uniqueness across the site
- Meta descriptions — Length (150–160 chars), duplication, missing entries
- Heading hierarchy — H1 presence, multiple H1s, skipped heading levels (H2 → H4)
- Keyword density — Focus keyword in first 100 words, overall density, stuffing flags
- Image alt text — Missing alt attributes, generic alt text (“image1.png”), oversized images without compression
- Page speed indicators — Render-blocking resources, image file sizes, total page weight
Each page gets scored 0–100. Scores below 60 are flagged as critical. The output is a CSV with one row per URL and columns for every check.
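Several of those checks are only a few lines each. A sketch of the title, H1, and alt-text checks using only the standard library (the real script's scoring weights aren't shown here; the thresholds are the ones listed above):

```python
from html.parser import HTMLParser

class OnPageChecker(HTMLParser):
    """Collect title length, H1 count, and img alt coverage from raw HTML."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False
        self.h1_count = 0
        self.imgs_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "img" and not dict(attrs).get("alt"):
            self.imgs_missing_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def check_onpage(html_text):
    """Return a list of issue strings for one page's HTML."""
    checker = OnPageChecker()
    checker.feed(html_text)
    issues = []
    if not 50 <= len(checker.title.strip()) <= 60:
        issues.append("title length outside 50-60 chars")
    if checker.h1_count == 0:
        issues.append("missing H1 tag")  # auto-flagged as critical
    elif checker.h1_count > 1:
        issues.append("multiple H1 tags")
    if checker.imgs_missing_alt:
        issues.append(f"{checker.imgs_missing_alt} image(s) missing alt text")
    return issues
```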
Critical issues auto-flagged:
- ❌ Missing H1 tag
- ❌ Duplicate title tags across multiple pages
- ❌ Empty meta descriptions
- ❌ Images over 500KB without alt text
- ❌ Pages exceeding 3MB total weight
💡 Pro Tip
Run the on-page audit after every batch publish. Catching missing alt text across 12 new posts is faster than finding and fixing them individually three weeks later.
Script 3 — Smart Internal Linking (smart_interlinker.py)
Internal links are one of the most underused ranking signals. Most sites have a messy link structure because nobody goes back to add links after publishing.
smart_interlinker.py fixes that automatically.
How it works:
- Pulls all published content via the WordPress REST API
- Generates embeddings for each post using a local model (or OpenAI if configured)
- Calculates semantic similarity between every post pair
- Filters for relevance scores above 0.7
- Generates anchor text suggestions based on the target post’s focus keyword
- Outputs a CSV: source URL, anchor text, target URL, relevance score
Prioritization logic:
- Orphan pages first — Pages with zero incoming links get top priority
- Pillar pages second — Hub pages need the most incoming links to signal authority
- Fresh content third — New posts need quick integration into the link graph
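The similarity-and-cutoff steps above are plain cosine similarity over embedding vectors. A sketch with toy 3-dimensional vectors standing in for real embeddings (sentence-transformers vectors have hundreds of dimensions, but the math is identical):

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def link_candidates(embeddings, threshold=0.7):
    """Return (source, target, score) pairs above the relevance cutoff."""
    pairs = []
    for (url_a, vec_a), (url_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine(vec_a, vec_b)
        if score >= threshold:
            pairs.append((url_a, url_b, round(score, 2)))
    return sorted(pairs, key=lambda p: -p[2])

# Toy "embeddings" for three posts: two related, one off-topic
posts = {
    "/seo-audit-swarm/":        [0.9, 0.1, 0.2],
    "/internal-linking/":       [0.8, 0.2, 0.3],
    "/cooking-with-cast-iron/": [0.1, 0.9, 0.1],
}
```

With these vectors, only the two SEO posts clear the 0.7 threshold; the off-topic post produces no link suggestions.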
RANKING FACTOR
Top 5
Internal links rank among Google’s top 5 most impactful on-site ranking signals — Moz
The output CSV looks like this:
SAMPLE OUTPUT — smart_interlinker.py
```csv
source_url,anchor_text,target_url,relevance_score
/ai-keyword-research-guide/,"SEO audit AI agents",/seo-audit-swarm-ai-agents-toolkit/,0.87
/content-pipeline-automated/,"internal linking strategy",/smart-internal-linking-guide/,0.82
/batch-publish-workflow/,"orphan page detection",/orphan-page-audit-fix/,0.79
```
You can review the suggestions manually or feed them directly into a WordPress bulk-editor plugin.
Script 4 — Content Freshness Auditor (content_freshness_auditor.py)
Old content decays. Posts that ranked 6 months ago with accurate data start slipping when competitors publish updated versions.
This script catches staleness before it costs you traffic.
What it does:
- Pulls `published_date` and `modified_date` for every post via the WordPress API
- Flags anything not updated in the last 6 months
- Cross-references with Google Search Console data (if API key is configured) to pull traffic numbers
- Sorts the output: high-traffic stale pages appear first
Output columns:
- URL
- Last modified date
- Days since update
- Monthly organic sessions (if GSC is connected)
- Priority level (Critical / High / Medium / Low)
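The core of each row is one date subtraction plus a priority bucket. A sketch using the 180-day window from above; the session thresholds here are illustrative, not the script's actual tiers:

```python
from datetime import date

def freshness_row(url, modified, sessions, today=None):
    """Build one freshness-report row: days stale plus a priority bucket."""
    today = today or date.today()
    days = (today - modified).days
    if days <= 180:
        priority = "Fresh"
    elif sessions >= 1000:
        priority = "Critical"  # high-traffic stale page: fix first
    elif sessions >= 100:
        priority = "High"
    else:
        priority = "Medium" if days <= 365 else "Low"
    return {"url": url, "days_since_update": days,
            "monthly_sessions": sessions, "priority": priority}
```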
⚠️ Warning
Stale content loses rankings. Google's ranking systems favor fresh, up-to-date content on queries where freshness matters. A post from 6 months ago with outdated statistics, broken links, or superseded advice is actively dragging down your site's perceived quality.
The freshness auditor pairs well with the orchestrator’s Telegram notifications. You’ll get a weekly ping: “12 posts haven’t been updated in 180+ days. Top 3 by traffic: [URL1], [URL2], [URL3].”
Script 5 — Orphan Page Finder (orphan_audit.py)
An orphan page has zero incoming internal links. Google can still find it via the sitemap, but it gets almost no crawl priority. Users won’t find it through navigation.
orphan_audit.py catches these invisible pages.
How it works:
- Fetches your XML sitemap to get all indexed URLs
- Crawls your site to build a complete internal link graph
- Compares the two lists
- Any URL in the sitemap but absent from the link graph is flagged as orphaned
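That comparison is a set difference. A sketch with the sitemap fetching and crawling elided, showing only the core logic:

```python
def find_orphans(sitemap_urls, link_graph):
    """Pages in the sitemap that no crawled page links to.

    link_graph maps each page URL to the set of internal URLs it links to.
    """
    linked_to = set()
    for targets in link_graph.values():
        linked_to.update(targets)
    return sorted(set(sitemap_urls) - linked_to)

# /d/ is in the sitemap but nothing links to it: an orphan
sitemap = ["/a/", "/b/", "/c/", "/d/"]
links = {
    "/a/": {"/b/", "/c/"},
    "/b/": {"/a/"},
    "/c/": set(),  # /c/ links out to nothing, but /a/ links to it
}
```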
INDUSTRY AVERAGE
10–20%
Percentage of pages on the average website that are orphaned — Ahrefs Site Audit data
On a 200-page site, that’s 20 to 40 pages getting almost no organic visibility. That’s wasted content investment.
Output includes:
- Orphan URL
- Page title
- Published date
- Word count
- Suggested linking targets (if `smart_interlinker.py` has run first)
💡 Pro Tip
We found 14 orphan pages after our first batch publish of 50 posts. Running `smart_interlinker.py` once generated link suggestions for all 14. The fix took 20 minutes instead of a manual afternoon.
Script 6 — Schema Markup Injector (schema_injector.py)
Structured data helps search engines classify your content. It also qualifies pages for rich snippets — FAQ dropdowns, how-to steps, breadcrumb trails in search results.
schema_injector.py reads your posts and generates the right JSON-LD markup.
Supported schema types:
- Article — Default for all blog posts. Includes headline, author, datePublished, dateModified
- FAQPage — Automatically detected when a post contains an FAQ section with Q&A formatting
- HowTo — Triggered by numbered step-by-step sections
- BreadcrumbList — Generated from your site’s URL hierarchy and category structure
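For the Article case, generation is mostly dictionary assembly. A sketch assuming a simple post dict; the field names follow schema.org, but the post shape is an assumption:

```python
import json

def article_jsonld(post, org_name="DesignCopy Editorial"):
    """Build Article JSON-LD for one post (dates as ISO strings)."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": post["title"],
        "author": {"@type": "Organization", "name": org_name},
        "datePublished": post["published"],
        # Fall back to the publish date if the post was never updated
        "dateModified": post.get("modified", post["published"]),
    }, indent=2)
```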
Two output modes:
- `--output file` — Writes JSON-LD to a `.json` file per post for manual review
- `--output inject` — Pushes the schema directly to WordPress via the REST API (requires auth)
SAMPLE JSON-LD OUTPUT
```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Building an SEO Audit Swarm with AI Agents",
  "author": {
    "@type": "Organization",
    "name": "DesignCopy Editorial"
  },
  "datePublished": "2026-03-02",
  "dateModified": "2026-03-02"
}
```

Schema markup doesn't directly boost rankings. But it increases click-through rates by making your search listings more visible and informative. FAQ schema alone can double the vertical space your listing occupies on page one.
How They Work Together (Architecture)
The real power isn’t in any single script. It’s in the pipeline.
Here’s the execution flow when you run the orchestrator:
1. Crawl phase — `06_seo_audit_swarm.py` fetches your sitemap and builds a URL inventory
2. On-page audit — `audit_onpage.py` scores every page for technical SEO issues
3. Freshness check — `content_freshness_auditor.py` flags stale content by age and traffic
4. Orphan detection — `orphan_audit.py` compares sitemap URLs against the crawled link graph
5. Internal linking — `smart_interlinker.py` generates link suggestions, prioritizing orphans and pillars
6. Schema generation — `schema_injector.py` creates JSON-LD for any page missing structured data
7. Report merge — All results collected into `audit_report.json`
8. Notification — Summary pushed to Telegram, Slack, or terminal output
Steps 5 and 6 aren’t just diagnostic — they’re prescriptive. The interlinker and schema injector produce actionable fixes, not just reports.
“The best audit is the one that fixes things automatically. Reports that sit in a Google Drive folder don’t move rankings. Scripts that generate patches and push them upstream — that’s operational SEO.”
— DesignCopy Engineering Team
Want the Full AI SEO Stack?
This audit swarm is one layer of our complete operation. Read the full-stack breakdown to see how content generation, publishing, and monitoring all connect.
Running an Audit (Step-by-Step)
Here’s how to get the swarm running on your own site.
Step 1: Clone the repository
CLONE & SETUP
```bash
git clone https://github.com/designcopy/seo-audit-swarm.git
cd seo-audit-swarm
```
Step 2: Install dependencies
INSTALL REQUIREMENTS
```bash
pip install -r requirements.txt
```
Key dependencies: `requests`, `beautifulsoup4`, `sentence-transformers`, `python-wordpress-xmlrpc`, `pandas`
Step 3: Configure your site
Edit `site_config.yaml` with your WordPress URL, API credentials, and optional integrations:
site_config.yaml
```yaml
site_url: "https://yoursite.com"
wp_user: "your-username"
wp_app_password: "xxxx-xxxx-xxxx-xxxx"
telegram_bot_token: ""   # Optional
telegram_chat_id: ""     # Optional
gsc_credentials: ""      # Optional — path to GSC JSON key
```
Step 4: Run the audit
EXECUTE
```bash
python 06_seo_audit_swarm.py --url https://yoursite.com
```
Step 5: Review the report
Open `audit_report.json` in your editor. Each section maps to one script's output. Look for the `critical_issues` array first — those are the highest-impact fixes.
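Pulling those items out programmatically is a few lines. A sketch, assuming the critical-issues array holds objects with `url` and `issue` fields (the real report's shape may differ):

```python
import json

def top_critical(report_path, limit=10):
    """Return the first N critical issues from the merged audit report."""
    with open(report_path) as fh:
        report = json.load(fh)
    return [f"{item['url']}: {item['issue']}"
            for item in report.get("critical_issues", [])[:limit]]
```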
Step 6: Auto-fix with the interlinker
If orphan pages were found, run the interlinker standalone to generate fix suggestions:
AUTO-FIX ORPHANS
```bash
python smart_interlinker.py --priority orphans --output suggestions.csv
```
☑ Quick-Start Checklist
- ☐ Clone the repository
- ☐ Run `pip install -r requirements.txt`
- ☐ Add WordPress credentials to `site_config.yaml`
- ☐ Run the orchestrator with your site URL
- ☐ Review `audit_report.json` — focus on critical issues first
- ☐ Run `smart_interlinker.py` for orphan page fixes
- ☐ Schedule a weekly cron job for recurring audits
FAQ
How often should I run an SEO audit?
Weekly for active sites publishing 5+ posts per month. Biweekly for sites publishing less frequently. The orchestrator’s Telegram integration makes weekly runs painless — set it and forget it.
Can I use these scripts on any WordPress site?
Yes. The scripts use the WordPress REST API, which is enabled by default on all WordPress installations running version 4.7+. You’ll need an application password for authenticated endpoints (schema injection, content pulling).
Do the scripts need API keys?
Only optionally. The core audit runs without any external API. If you want Google Search Console traffic data in the freshness report, you’ll need a GSC API credential. If you want OpenAI-powered embeddings for the interlinker (instead of local sentence-transformers), you’ll need an OpenAI key.
What’s the difference between this and Semrush/Ahrefs audits?
Semrush and Ahrefs are great for competitive analysis and backlink data. This toolkit focuses on operational fixes — things you can patch programmatically. The interlinker doesn’t just report missing links; it generates the exact anchor text and target URL. The schema injector doesn’t just flag missing markup; it writes the JSON-LD. These tools produce patches, not just dashboards.
Are these scripts open-source?
Yes. MIT licensed. Fork them, modify them, use them commercially. The repo includes documentation for each script and example outputs.
🔎 Key Takeaways
- SEO audit AI agents reduce a 10–20 hour manual audit to a 15-minute automated run
- Six scripts cover on-page analysis, internal linking, content freshness, orphan detection, and schema markup
- The orchestrator coordinates all scripts and produces a unified JSON report
- The interlinker and schema injector don’t just report problems — they generate fixes
- Weekly automated audits catch issues before they cost you rankings
- The entire toolkit is open-source Python — no SaaS subscriptions required
See the Full AI SEO Operation
This audit swarm is one piece of the stack. Read our pillar post to see how content generation, batch publishing, and SEO monitoring all fit together.
What to Read Next
- Pillar post: AI SEO Operation: Full-Stack Breakdown — the complete system architecture
- Content pipeline: AI SEO Content Pipeline, Automated — how we generate and publish at scale
- Hub page: AI-Powered SEO — all posts in this cluster
Get the Toolkit
Clone the repo, configure your site, and run your first audit in under 5 minutes. All 6 scripts are MIT licensed and ready to use.
