Broken Link Checker
Scan all published posts and pages for broken outbound links and manage the results from the admin panel.
TL;DR
Run php spark links:check from the command line (or set up a weekly cron job). View results at Admin → Broken Links. Each result can be individually rechecked or dismissed. URLs that return 200 on recheck are automatically removed.
Details
Running the Scanner
The broken link checker is a Spark command:
php spark links:check
Run this from your site root. Output is logged to the console. For automated weekly runs, add a cron job:
0 3 * * 0 /usr/bin/php /var/www/yoursite/spark links:check >> /dev/null 2>&1
This example runs every Sunday at 3:00 AM server time.
New for 2.3.x: You can run a broken link scan on demand by clicking the Run Scan button on the Broken Links admin page. Previously this was only available via the CLI command php spark links:check.
What Gets Scanned
The command scans all published posts and pages. For each piece of content it:
- Parses the stored HTML body and extracts every <a href> URL.
- Skips internal links (same domain), anchors (#), mailto:, tel:, javascript:, and data: URIs.
- Sends an HTTP HEAD request to each external URL with a 10-second timeout.
- Falls back to a GET request if the server responds with 405 Method Not Allowed.
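The scanning steps above can be sketched in a few lines. This is an illustrative Python sketch, not the checker's actual code: the names `LinkCollector`, `external_links`, `fetch_status`, and `check_url` are hypothetical, and treating every relative or protocol-relative URL as internal is an assumption made to keep the example short.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urlsplit

TIMEOUT_SECONDS = 10  # the 10-second timeout described above


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value.strip())


def external_links(html_body, site_host):
    """Return the http(s) links in html_body that point away from site_host."""
    collector = LinkCollector()
    collector.feed(html_body)
    collector.close()
    links = []
    for href in collector.hrefs:
        if href.startswith("#"):
            continue  # in-page anchor
        parts = urlsplit(href)
        # Drops mailto:, tel:, javascript:, data: and (by assumption) relative URLs.
        if parts.scheme.lower() not in ("http", "https"):
            continue
        if (parts.hostname or "").lower() == site_host.lower():
            continue  # same domain, so internal
        links.append(href)
    return links


def fetch_status(url, method):
    """Send one request; return (http_status, error_message), with 0 for connection errors."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
            return response.status, ""
    except urllib.error.HTTPError as exc:  # 4xx/5xx responses
        return exc.code, str(exc.reason)
    except (urllib.error.URLError, OSError, ValueError) as exc:  # DNS, timeout, bad URL
        return 0, str(exc)


def check_url(url, fetch=fetch_status):
    """Try HEAD first; retry with GET only when the server answers 405."""
    status, error = fetch(url, "HEAD")
    if status == 405:
        status, error = fetch(url, "GET")
    return status, error
```

Passing `fetch` as a parameter lets the HEAD-then-GET fallback be exercised with a stub instead of real network traffic.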
Results Storage
Scan results are stored in the broken_link_results table. Each row contains:
| Column | Description |
|---|---|
| source_type | post or page |
| source_id | ID of the content containing the link |
| source_title | Title of the content (for display) |
| url | The external URL that was checked |
| http_status | HTTP response code, or 0 for connection errors |
| error_message | Curl error message if the request failed |
| last_checked_at | Timestamp of the most recent check |
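For readers who consume this table from their own scripts, one row could be modelled roughly as follows. This is a Python sketch only: the field types and the `is_connection_error` helper are assumptions, since the column list above does not state the actual schema types.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BrokenLinkResult:
    """One row of broken_link_results; field types are assumed, not taken from the schema."""

    source_type: str           # "post" or "page"
    source_id: int             # ID of the content containing the link
    source_title: str          # title shown in the admin list
    url: str                   # the external URL that was checked
    http_status: int           # HTTP response code, or 0 for connection errors
    error_message: str         # curl error text when the request failed
    last_checked_at: datetime  # time of the most recent check

    def is_connection_error(self) -> bool:
        """Hypothetical helper: True when no HTTP response was received."""
        return self.http_status == 0
```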
Admin UI
View all broken link results at Admin → Broken Links. The table shows:
- Source post/page (linked to admin edit)
- Broken URL
- HTTP status code (colour-coded: red for 4xx/5xx, orange for timeouts)
- Last checked timestamp
- Actions: Recheck and Dismiss
Recheck re-sends the HTTP request immediately. If the URL now returns 200, the row is automatically deleted. If it still fails, last_checked_at and http_status are updated.
Dismiss removes the row from the results table without fixing the link. Use this for false positives or links you intend to remove later.
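The two actions can be pictured as small operations on the stored rows. The Python sketch below is illustrative only: it keeps the rows in a plain dict keyed by a hypothetical row ID, and `check` stands in for whatever function performs the HTTP request and returns a status code.

```python
from datetime import datetime, timezone


def recheck(results, row_id, check):
    """Re-run the check for one row: delete it on 200, otherwise refresh it."""
    row = results[row_id]
    status = check(row["url"])
    if status == 200:
        del results[row_id]  # the link works again, so the row goes away
        return None
    row["http_status"] = status
    row["last_checked_at"] = datetime.now(timezone.utc)
    return row


def dismiss(results, row_id):
    """Remove a row from the results without changing the post or page itself."""
    results.pop(row_id, None)
```

Note that `dismiss` never touches the content: the link stays in the post, and only the result row is removed.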
Self-Healing
On each full scan run, URLs that previously failed but now return 200 are automatically removed from the results table. This means the broken link list stays accurate without any manual cleanup after you fix a link in a post.
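As a rough illustration of this clean-up pass, the Python sketch below filters a set of stored rows against fresh check results. The function name `prune_healed`, the dict-based storage, and the `check` callable are all hypothetical stand-ins for the real scan.

```python
def prune_healed(results, check):
    """Return the rows whose URL still fails; rows that now answer 200 are left out."""
    still_broken = {}
    for row_id, row in results.items():
        status = check(row["url"])
        if status != 200:
            # Keep the row, recording the status seen on this run.
            still_broken[row_id] = {**row, "http_status": status}
    return still_broken
```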