Broken Link Checker

Scan all published posts and pages for broken outbound links and manage the results from the admin panel.

TL;DR

Run php spark links:check from the command line, schedule it as a weekly cron job, or (in 2.3.x) click Run Scan on the Broken Links admin page. View results at Admin → Broken Links. Each result can be rechecked or dismissed individually. URLs that return 200 on recheck are removed automatically.

Details

Running the Scanner

The broken link checker is a Spark command:

php spark links:check

Run this from your site root. Output is printed to the console. For automated weekly runs, add a cron job:

0 3 * * 0 /usr/bin/php /var/www/yoursite/spark links:check >> /dev/null 2>&1

This example runs every Sunday at 3:00 AM server time and discards all command output, since both stdout and stderr are redirected to /dev/null.

New in 2.3.x: You can also run a broken link scan on demand by clicking the Run Scan button on the Broken Links admin page. Previously, a scan could only be started with the CLI command php spark links:check.

What Gets Scanned

The command scans all published posts and pages. For each piece of content it:

  1. Parses the stored HTML body and extracts every <a href> URL.
  2. Skips internal links (same domain), anchors (#), mailto:, tel:, javascript:, and data: URIs.
  3. Sends an HTTP HEAD request to each external URL with a 10-second timeout.
  4. Falls back to a GET request if the server responds with 405 Method Not Allowed.
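Steps 1 and 2 above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the checker's actual code. The names external_links and site_host are hypothetical, and two simplifications are assumptions: "same domain" is treated as an exact host match, and any URL without an http or https scheme (including relative and protocol-relative URLs) is treated as internal.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Schemes the scanner is documented to skip.
SKIPPED_SCHEMES = {"mailto", "tel", "javascript", "data"}


class _HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value.strip())


def external_links(html_body, site_host):
    """Return the outbound http(s) links found in html_body.

    site_host is the site's own host name (hypothetical parameter).
    """
    collector = _HrefCollector()
    collector.feed(html_body)
    collector.close()

    links = []
    for href in collector.hrefs:
        if href.startswith("#"):
            continue  # in-page anchor
        parsed = urlparse(href)
        scheme = parsed.scheme.lower()
        if scheme in SKIPPED_SCHEMES:
            continue  # mailto:, tel:, javascript:, data:
        if scheme not in ("http", "https"):
            continue  # assumption: relative URLs are internal
        if (parsed.hostname or "").lower() == site_host.lower():
            continue  # same domain (assumption: exact host match)
        links.append(href)
    return links
```

Only the URLs returned by a filter like this would go on to the HEAD request in step 3.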

Results Storage

Scan results are stored in the broken_link_results table. Each row contains:

Column           Description
source_type      post or page
source_id        ID of the content containing the link
source_title     Title of the content (for display)
url              The external URL that was checked
http_status      HTTP response code, or 0 for connection errors
error_message    Curl error message if the request failed
last_checked_at  Timestamp of the most recent check
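For readers who find a typed record easier to scan than a column list, one row of this table could be modelled roughly as below. The class name BrokenLinkResult, the field types, and the is_connection_error helper are assumptions for illustration; the actual column types are not stated here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BrokenLinkResult:
    """Hypothetical in-memory view of one broken_link_results row."""

    source_type: str               # "post" or "page"
    source_id: int                 # ID of the content containing the link
    source_title: str              # title of the content, for display
    url: str                       # the external URL that was checked
    http_status: int               # HTTP response code, or 0 for connection errors
    error_message: Optional[str]   # error text if the request failed (type assumed)
    last_checked_at: datetime      # time of the most recent check

    @property
    def is_connection_error(self) -> bool:
        # A status of 0 means no HTTP response was received at all.
        return self.http_status == 0
```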

Admin UI

View all broken link results at Admin → Broken Links. The table shows:

  • Source post/page (linked to admin edit)
  • Broken URL
  • HTTP status code (colour-coded: red for 4xx/5xx, orange for timeouts)
  • Last checked timestamp
  • Actions: Recheck and Dismiss

Recheck re-sends the HTTP request immediately. If the URL now returns 200, the row is deleted automatically. If it still fails, last_checked_at and http_status are updated.
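The recheck flow, combined with the HEAD-then-GET fallback described under What Gets Scanned, might look roughly like this. It is a Python sketch under stated assumptions, not the checker's own code: fetch_status, check_url, and recheck are hypothetical names, and the results table is stood in for by a plain dict keyed by URL.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone


def fetch_status(url, method, timeout=10.0):
    """Send one request; return (http_status, error_message).

    A status of 0 means the request failed before any HTTP response arrived.
    """
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, ""
    except urllib.error.HTTPError as exc:
        return exc.code, ""      # server answered with an error status
    except (urllib.error.URLError, OSError) as exc:
        return 0, str(exc)       # DNS failure, refused connection, timeout, ...


def check_url(url, fetch=fetch_status):
    """HEAD first; retry with GET if the server answers 405."""
    status, error = fetch(url, "HEAD")
    if status == 405:
        status, error = fetch(url, "GET")
    return status, error


def recheck(results, url, fetch=fetch_status):
    """Recheck one stored URL. Returns True if its row was removed.

    results is a dict mapping URL to a row dict (stand-in for the table).
    """
    status, _error = check_url(url, fetch)
    if status == 200:
        results.pop(url, None)   # link works again: drop the row
        return True
    row = results[url]
    row["http_status"] = status
    row["last_checked_at"] = datetime.now(timezone.utc)
    return False
```

Passing fetch as a parameter keeps the sketch testable without network access; a stub that returns fixed status codes can be swapped in.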

Dismiss removes the row from the results table without fixing the link. Use this for false positives or links you intend to remove later.

Self-Healing

On each full scan run, URLs that previously failed but now return 200 are automatically removed from the results table. This means the broken link list stays accurate without any manual cleanup after you fix a link in a post.
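The reconciliation a full scan performs can be pictured with a small Python sketch. The function apply_scan is hypothetical, and two points are assumptions: any status other than 200 is recorded as a failure, and rows are keyed by URL alone, whereas the real table also records the source post or page.

```python
def apply_scan(stored, scanned):
    """Merge one scan's results into the stored failures.

    stored:  dict of URL -> last failing http_status (stand-in for the table)
    scanned: dict of URL -> http_status observed during this scan
    """
    for url, status in scanned.items():
        if status == 200:
            stored.pop(url, None)   # healed (or never broken): no row kept
        else:
            stored[url] = status    # new or still-failing link
    return stored
```

For example, a URL stored with status 404 that returns 200 in the next scan disappears from stored, while a URL that keeps timing out stays listed with status 0.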