PHP Job Scraper: Build a Complete Job Listing Monitor With MySQL

A PHP job scraper automatically collects job listings from websites so you don’t have to check manually. Point it at a job board, run it daily, and get notified when new positions matching your criteria appear.

This guide builds a complete PHP job scraper: fetching listings, extracting structured data, storing results in MySQL, filtering by keyword and location, detecting new postings, and automating with cron. The scraping examples run against a live practice site.

What You Need

  • PHP 7.4 or higher with cURL enabled
  • MySQL 5.7 or higher
  • Basic PHP and SQL knowledge

All examples use realpython.github.io/fake-jobs, a static HTML job board built for scraping practice. The same code adapts to other static job boards once you update the XPath selectors and the base URL used to resolve relative links.

If you’re new to PHP scraping, read the PHP web scraper beginner guide first. This guide assumes familiarity with cURL and DOMDocument.

Database Setup

CREATE DATABASE IF NOT EXISTS job_scraper
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;

USE job_scraper;

CREATE TABLE IF NOT EXISTS jobs (
    id           INT AUTO_INCREMENT PRIMARY KEY,
    title        VARCHAR(255)  NOT NULL,
    company      VARCHAR(255)  DEFAULT NULL,
    location     VARCHAR(255)  DEFAULT NULL,
    description  TEXT          DEFAULT NULL,
    url          VARCHAR(500)  NOT NULL,
    source       VARCHAR(100)  DEFAULT NULL,
    is_new       TINYINT       DEFAULT 1,
    scraped_at   TIMESTAMP     DEFAULT CURRENT_TIMESTAMP,
    UNIQUE KEY unique_url (url),
    INDEX idx_title    (title),
    INDEX idx_location (location),
    INDEX idx_is_new   (is_new)
);

The is_new flag marks jobs inserted on the latest run – useful for sending notifications only about new postings rather than everything in the database.

Fetching the Job Page

<?php
function fetch_page($url) {
    $ch = curl_init();

    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_ENCODING       => '',
        CURLOPT_HTTPHEADER     => [
            'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.5',
        ],
    ]);

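    $html     = curl_exec($ch);
    $errno    = curl_errno($ch);
    $error    = curl_error($ch);   // read before closing the handle
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($errno) {
        // Transport-level failure (DNS, timeout, TLS): there is no HTTP status
        echo "Fetch failed: cURL error $errno ($error) on $url" . PHP_EOL;
        return false;
    }

    if ($httpCode !== 200) {
        echo "Fetch failed: HTTP $httpCode on $url" . PHP_EOL;
        return false;
    }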

    return $html;
}

$html = fetch_page("https://realpython.github.io/fake-jobs/");

if ($html) {
    echo "Page fetched: " . strlen($html) . " bytes." . PHP_EOL;
}
?>

Output:

Page fetched: 84123 bytes.

Extracting Job Listings

Inspect the job board in Chrome DevTools to find the HTML structure. Each listing on this site is a div.card whose inner div.card-content holds the title, company, location, and footer links, so the code below selects on card-content.

<?php
function extract_jobs($html, $source = '') {
    if (!$html) return [];

    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);

    // Each job card is a div with class "card-content"
    $cards = $xpath->query('//div[contains(@class,"card-content")]');

    if ($cards->length === 0) {
        echo "No job cards found - check selector." . PHP_EOL;
        return [];
    }

    $jobs = [];

    foreach ($cards as $card) {
        // Job title
        $titleNode = $xpath->query('.//h2[contains(@class,"title")]', $card)->item(0);
        $title     = $titleNode ? trim($titleNode->textContent) : null;

        // Company name
        $companyNode = $xpath->query('.//h3[contains(@class,"subtitle")]', $card)->item(0);
        $company     = $companyNode ? trim($companyNode->textContent) : null;

        // Location
        $locationNode = $xpath->query('.//p[contains(@class,"location")]', $card)->item(0);
        $location     = $locationNode ? trim($locationNode->textContent) : null;

        // Job detail URL: the footer's "Apply" link points to the detail page
        $linkNode = $xpath->query('.//a[normalize-space()="Apply"]', $card)->item(0);
        $url      = $linkNode ? trim($linkNode->getAttribute('href')) : null;

        // Use full URL if relative
        if ($url && strpos($url, 'http') !== 0) {
            $url = "https://realpython.github.io/fake-jobs/" . ltrim($url, '/');
        }

        // Skip cards without a title or URL; the URL is the unique key in MySQL
        if (!$title || !$url) continue;

        $jobs[] = [
            'title'       => $title,
            'company'     => $company,
            'location'    => $location,
            'description' => null,
            'url'         => $url,
            'source'      => $source,
        ];
    }

    return $jobs;
}

$html = fetch_page("https://realpython.github.io/fake-jobs/");
$jobs = extract_jobs($html, 'fake-jobs');

echo "Jobs found: " . count($jobs) . PHP_EOL . PHP_EOL;

foreach (array_slice($jobs, 0, 3) as $job) {
    echo $job['title'] . " at " . $job['company'] . PHP_EOL;
    echo "  Location: " . $job['location'] . PHP_EOL;
    echo "  URL: " . $job['url'] . PHP_EOL;
    echo PHP_EOL;
}
?>

Output:

Jobs found: 100

Senior Python Developer at Payne, Roberts and Davis
  Location: Stewartbury, AA
  URL: https://realpython.github.io/fake-jobs/jobs/senior-python-developer-0.html

Energy engineer at Vasquez-Davidson
  Location: Christopherville, AA
  URL: https://realpython.github.io/fake-jobs/jobs/energy-engineer-1.html

Legal executive at Jackson, Savage and Walton
  Location: Port Ericaburgh, AA
  URL: https://realpython.github.io/fake-jobs/jobs/legal-executive-2.html
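
Selectors are easier to debug against a small inline fragment than against the live page. The sketch below assumes the card markup shown in $sampleHtml, a simplified copy of what DevTools shows for this board that may differ from the live page. The parse_card() helper and the variable names are illustrative and not part of the scraper above.

```php
<?php
// Simplified sample of one job card; the markup is an assumption based on
// inspecting the practice board and may differ from the live page.
$sampleHtml = <<<HTML
<div class="card"><div class="card-content">
  <h2 class="title is-5">Senior Python Developer</h2>
  <h3 class="subtitle is-6 company">Payne, Roberts and Davis</h3>
  <p class="location"> Stewartbury, AA </p>
  <footer class="card-footer">
    <a href="https://www.realpython.com">Learn</a>
    <a href="https://realpython.github.io/fake-jobs/jobs/senior-python-developer-0.html">Apply</a>
  </footer>
</div></div>
HTML;

// Hypothetical helper: extract the fields of the first card in a fragment.
function parse_card(string $html): array {
    libxml_use_internal_errors(true);   // silence warnings about HTML5 tags
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $card  = $xpath->query('//div[contains(@class,"card-content")]')->item(0);

    // Return the trimmed text (or attribute value) of the first match, else null.
    $first = function (string $query, ?string $attr = null) use ($xpath, $card) {
        $node = $xpath->query($query, $card)->item(0);
        if (!$node) return null;
        return trim($attr ? $node->getAttribute($attr) : $node->textContent);
    };

    return [
        'title'    => $first('.//h2[contains(@class,"title")]'),
        'company'  => $first('.//h3[contains(@class,"subtitle")]'),
        'location' => $first('.//p[contains(@class,"location")]'),
        'url'      => $first('.//a[normalize-space()="Apply"]', 'href'),
    ];
}

$card = parse_card($sampleHtml);
print_r($card);
```

If a field comes back null here, the selector is wrong for that markup, which is quicker to spot than a "No job cards found" message after a full fetch.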

Filtering Jobs by Keyword and Location

<?php
function filter_jobs($jobs, $keywords = [], $locations = []) {
    return array_values(array_filter($jobs, function($job) use ($keywords, $locations) {
        $titleLower    = strtolower($job['title']    ?? '');
        $locationLower = strtolower($job['location'] ?? '');
        $descLower     = strtolower($job['description'] ?? '');

        // Keyword filter - job must match at least one keyword
        if (!empty($keywords)) {
            $keywordMatch = false;
            foreach ($keywords as $keyword) {
                if (strpos($titleLower, strtolower($keyword)) !== false ||
                    strpos($descLower,  strtolower($keyword)) !== false) {
                    $keywordMatch = true;
                    break;
                }
            }
            if (!$keywordMatch) return false;
        }

        // Location filter - job must match at least one location
        if (!empty($locations)) {
            $locationMatch = false;
            foreach ($locations as $location) {
                if (strpos($locationLower, strtolower($location)) !== false) {
                    $locationMatch = true;
                    break;
                }
            }
            if (!$locationMatch) return false;
        }

        return true;
    }));
}

// Filter for Python or developer roles
$keywords  = ['python', 'developer', 'engineer'];
$locations = []; // empty = all locations

$filtered = filter_jobs($jobs, $keywords, $locations);

echo "Jobs after filtering: " . count($filtered) . PHP_EOL . PHP_EOL;

foreach (array_slice($filtered, 0, 3) as $job) {
    echo $job['title'] . " at " . $job['company'] . PHP_EOL;
}
?>

Output:

Jobs after filtering: 23

Senior Python Developer at Payne, Roberts and Davis
Energy engineer at Vasquez-Davidson
Python Programmer at Incorporated Refunds
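
filter_jobs() matches substrings, so a keyword also matches inside longer words: "java" would match "JavaScript". If that produces too many false positives, whole-word matching is one option. A minimal sketch, assuming UTF-8 input; keyword_matches() is an illustrative name, not part of the code above.

```php
<?php
// Hypothetical helper: true if $keyword appears in $text as a whole word,
// ignoring case. The "u" flag treats pattern and subject as UTF-8.
function keyword_matches(string $text, string $keyword): bool {
    $pattern = '/\b' . preg_quote($keyword, '/') . '\b/iu';
    return preg_match($pattern, $text) === 1;
}

var_dump(keyword_matches('Senior Java Developer', 'java'));  // bool(true)
var_dump(keyword_matches('JavaScript Engineer', 'java'));    // bool(false)
var_dump(keyword_matches('PYTHON programmer', 'python'));    // bool(true)
```

Word boundaries only work for keywords that start and end with a letter or digit. A keyword such as "C++" needs the substring approach or a custom pattern.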

Saving Jobs to MySQL

<?php
function get_db_connection() {
    try {
        return new PDO(
            "mysql:host=localhost;dbname=job_scraper;charset=utf8mb4",
            'your_username',
            'your_password',
            [
                PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
                PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
                PDO::ATTR_EMULATE_PREPARES   => false,
            ]
        );
    } catch (PDOException $e) {
        echo "DB failed: " . $e->getMessage() . PHP_EOL;
        return null;
    }
}

function save_jobs($pdo, $jobs) {
    if (empty($jobs)) return ['saved' => 0, 'skipped' => 0];

    $sql  = "INSERT INTO jobs (title, company, location, description, url, source, is_new)
             VALUES (:title, :company, :location, :description, :url, :source, 1)
             ON DUPLICATE KEY UPDATE
                 company     = VALUES(company),
                 location    = VALUES(location),
                 scraped_at  = CURRENT_TIMESTAMP";

    $stmt  = $pdo->prepare($sql);
    $saved = $skipped = 0;

    try {
        $pdo->beginTransaction();

        // Reset is_new on all existing jobs inside the transaction, so a
        // failed batch rolls the reset back together with the inserts
        $pdo->exec("UPDATE jobs SET is_new = 0");

        foreach ($jobs as $job) {
            $stmt->execute([
                ':title'       => $job['title'],
                ':company'     => $job['company'],
                ':location'    => $job['location'],
                ':description' => $job['description'],
                ':url'         => $job['url'],
                ':source'      => $job['source'],
            ]);

            // MySQL reports 1 affected row for an insert, 2 for an update of
            // an existing row, 0 if nothing changed; only 1 counts as saved
            $stmt->rowCount() === 1 ? $saved++ : $skipped++;
        }

        $pdo->commit();

    } catch (PDOException $e) {
        $pdo->rollBack();
        $saved = $skipped = 0; // nothing was persisted
        echo "Batch save failed: " . $e->getMessage() . PHP_EOL;
    }

    return ['saved' => $saved, 'skipped' => $skipped];
}

$pdo = get_db_connection();
if (!$pdo) exit(1);   // get_db_connection() has already printed the error

$result = save_jobs($pdo, $jobs);

echo "Saved: {$result['saved']} | Skipped: {$result['skipped']}" . PHP_EOL;
?>

Output on first run:

Saved: 100 | Skipped: 0

Output on second run:

Saved: 0 | Skipped: 100
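
The saved/skipped split relies on the affected-rows value MySQL reports for INSERT ... ON DUPLICATE KEY UPDATE: 1 for a new row, 2 for an existing row that was updated, and 0 for an existing row left unchanged. The sketch below makes the three cases explicit. classify_upsert() is an illustrative name, the sample counts stand in for real $stmt->rowCount() values, and it assumes PDO::MYSQL_ATTR_FOUND_ROWS is not enabled, since that option changes the reported counts.

```php
<?php
// Hypothetical helper: map the affected-rows count of a single-row
// INSERT ... ON DUPLICATE KEY UPDATE to a readable outcome.
function classify_upsert(int $rowCount): string {
    switch ($rowCount) {
        case 1:  return 'inserted';   // no row with this unique key existed
        case 2:  return 'updated';    // existing row had at least one column changed
        case 0:  return 'unchanged';  // existing row already held these values
        default: return 'unknown';    // not expected for a single-row statement
    }
}

$counts = ['inserted' => 0, 'updated' => 0, 'unchanged' => 0, 'unknown' => 0];
foreach ([1, 2, 2, 0] as $rowCount) {   // sample values standing in for $stmt->rowCount()
    $counts[classify_upsert($rowCount)]++;
}
print_r($counts);
```

Tracking "updated" separately from "unchanged" would show how many existing listings were refreshed on a run, which the two-way saved/skipped count hides.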

Detecting New Job Postings

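save_jobs() resets is_new to 0 on every stored job before inserting. New jobs are inserted with is_new = 1. Jobs that already exist keep is_new = 0 through the ON DUPLICATE KEY UPDATE, because is_new is not listed in the UPDATE clause. Two caveats follow from this design. First, the reset only happens when save_jobs() receives at least one job, so a run that finds nothing leaves the previous run’s flags in place. Second, when several sources are saved in one run, each call clears the flags set by the call before it. If you scrape more than one source, move the reset out of save_jobs() and run it once before the loop.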

<?php
function get_new_jobs($pdo) {
    $stmt = $pdo->query(
        "SELECT title, company, location, url, scraped_at
         FROM jobs
         WHERE is_new = 1
         ORDER BY scraped_at DESC"
    );
    return $stmt->fetchAll();
}

$newJobs = get_new_jobs($pdo);
echo "New jobs this run: " . count($newJobs) . PHP_EOL . PHP_EOL;

foreach ($newJobs as $job) {
    echo $job['title'] . " at " . $job['company'] . PHP_EOL;
    echo "  " . $job['location'] . PHP_EOL;
}
?>

Example output when new jobs are found (truncated):

New jobs this run: 5

Backend PHP Developer at TechCorp
  Remote
Full Stack Engineer at StartupXYZ
  New York, NY
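
The reset-then-insert cycle can be traced without a database. The sketch below is an in-memory stand-in, not the MySQL logic itself: $store maps each URL to its is_new flag, and run_cycle() is a hypothetical helper that mirrors what the UPDATE and INSERT statements do on one run.

```php
<?php
// In-memory stand-in for the jobs table: URL => is_new flag.
// run_cycle() is a hypothetical helper for illustration only.
function run_cycle(array $store, array $scrapedUrls): array {
    // Step 1: clear the flag on every stored job (UPDATE jobs SET is_new = 0).
    $store = array_fill_keys(array_keys($store), 0);

    // Step 2: insert unseen URLs with is_new = 1; known URLs keep is_new = 0
    // because the ON DUPLICATE KEY UPDATE clause never touches the flag.
    foreach ($scrapedUrls as $url) {
        if (!array_key_exists($url, $store)) {
            $store[$url] = 1;
        }
    }
    return $store;
}

$store = run_cycle([], ['/jobs/a', '/jobs/b']);                // first run
$store = run_cycle($store, ['/jobs/a', '/jobs/b', '/jobs/c']); // second run

$newUrls = array_keys(array_filter($store));  // URLs still flagged as new
print_r($newUrls);                             // only /jobs/c remains flagged
```

After the first run both URLs are flagged. After the second run only the URL that was not seen before keeps the flag, which is the set get_new_jobs() would return.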

Sending Email Alerts for New Jobs

<?php
function send_job_alert($jobs, $email) {
    if (empty($jobs)) {
        echo "No new jobs - skipping alert." . PHP_EOL;
        return false;
    }

    $count   = count($jobs);
    $date    = date('D, d M Y');
    $subject = "PHP Job Alert - $count new listings - $date";

    $html  = "<!DOCTYPE html><html><body style='font-family:Arial,sans-serif;max-width:700px;margin:0 auto;padding:20px;'>";
    $html .= "<h2 style='color:#2c3e50;border-bottom:2px solid #3498db;padding-bottom:10px;'>$count New Job Listings - $date</h2>";

    foreach ($jobs as $job) {
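        // ENT_QUOTES also escapes single quotes, which matters because the
        // attributes below are single-quoted (the default before PHP 8.1 skips them)
        $title    = htmlspecialchars($job['title'], ENT_QUOTES, 'UTF-8');
        $company  = htmlspecialchars($job['company']  ?? 'Unknown', ENT_QUOTES, 'UTF-8');
        $location = htmlspecialchars($job['location'] ?? 'Not specified', ENT_QUOTES, 'UTF-8');
        $url      = htmlspecialchars($job['url'], ENT_QUOTES, 'UTF-8');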

        $html .= "<div style='border-left:3px solid #3498db;padding:10px 15px;margin:10px 0;background:#f9f9f9;'>";
        $html .= "<p style='margin:0 0 5px;font-weight:bold;'><a href='$url' style='color:#2c3e50;text-decoration:none;'>$title</a></p>";
        $html .= "<p style='margin:3px 0;color:#666;font-size:13px;'>$company - $location</p>";
        $html .= "</div>";
    }

    $html .= "<p style='color:#999;font-size:12px;margin-top:20px;'>PHP Job Scraper - " . date('Y-m-d H:i:s') . "</p>";
    $html .= "</body></html>";

    $headers = implode("\r\n", [
        'From: Job Scraper <scraper@yoursite.com>',
        'MIME-Version: 1.0',
        'Content-Type: text/html; charset=UTF-8',
    ]);

    $sent = mail($email, $subject, $html, $headers);
    echo $sent ? "Alert sent: $subject" . PHP_EOL : "Email failed." . PHP_EOL;
    return $sent;
}

$newJobs = get_new_jobs($pdo);
send_job_alert($newJobs, 'your@email.com');
?>

Output:

Alert sent: PHP Job Alert - 5 new listings - Sun, 03 May 2026

Complete PHP Job Scraper Script

Save as job_scraper.php and run with php job_scraper.php.

<?php
// ============================================
// PHP Job Scraper - Complete Script
// ============================================

set_time_limit(0);
error_reporting(E_ALL);
ini_set('log_errors', 1);
ini_set('error_log', __DIR__ . '/job_scraper_errors.log');

$logFile    = __DIR__ . '/job_scraper.log';
$alertEmail = 'your@email.com';
$startTime  = microtime(true);

// Keywords and locations to filter by
$filterKeywords  = ['php', 'developer', 'engineer', 'python'];
$filterLocations = []; // empty = all locations

function log_msg($msg) {
    global $logFile;
    $entry = '[' . date('Y-m-d H:i:s') . '] ' . $msg . PHP_EOL;
    file_put_contents($logFile, $entry, FILE_APPEND);
    echo $entry;
}

// Paste get_db_connection(), fetch_page(), extract_jobs(),
// filter_jobs(), save_jobs(), get_new_jobs(), send_job_alert()
// functions from above sections here

// ---- MAIN ----
log_msg("Job scraper started.");

$pdo = get_db_connection();
if (!$pdo) exit("Database connection failed." . PHP_EOL);

// Define sources to scrape
$sources = [
    [
        'url'    => 'https://realpython.github.io/fake-jobs/',
        'name'   => 'fake-jobs',
    ],
    // Add more job board URLs here
];

$totalFound = $totalSaved = $totalSkipped = 0;

foreach ($sources as $source) {
    log_msg("Scraping: {$source['name']}");

    $html = fetch_page($source['url']);

    if (!$html) {
        log_msg("Failed to fetch: {$source['url']}");
        continue;
    }

    $jobs = extract_jobs($html, $source['name']);
    log_msg("Found: " . count($jobs) . " jobs");

    // Apply keyword and location filters
    if (!empty($filterKeywords) || !empty($filterLocations)) {
        $jobs = filter_jobs($jobs, $filterKeywords, $filterLocations);
        log_msg("After filtering: " . count($jobs) . " jobs");
    }

    $result = save_jobs($pdo, $jobs);
    log_msg("Saved: {$result['saved']} | Skipped: {$result['skipped']}");

    $totalFound   += count($jobs);
    $totalSaved   += $result['saved'];
    $totalSkipped += $result['skipped'];

    sleep(2);
}

// Get new jobs and send alert
$newJobs = get_new_jobs($pdo);

if (!empty($newJobs)) {
    log_msg("New jobs detected: " . count($newJobs));
    send_job_alert($newJobs, $alertEmail);
} else {
    log_msg("No new jobs found.");
}

$duration = round(microtime(true) - $startTime, 2);

log_msg("
============================================
Job Scraper Complete: " . date('Y-m-d H:i:s') . "
Duration:  {$duration}s
Found:     $totalFound
Saved:     $totalSaved
Skipped:   $totalSkipped
New jobs:  " . count($newJobs) . "
============================================");
?>

Output:

[2026-05-03 09:00:01] Job scraper started.
[2026-05-03 09:00:01] Scraping: fake-jobs
[2026-05-03 09:00:02] Found: 100 jobs
[2026-05-03 09:00:02] After filtering: 23 jobs
[2026-05-03 09:00:02] Saved: 23 | Skipped: 0
[2026-05-03 09:00:04] New jobs detected: 23
Alert sent: PHP Job Alert - 23 new listings - Sun, 03 May 2026
[2026-05-03 09:00:04] 
============================================
Job Scraper Complete: 2026-05-03 09:00:04
Duration:  3.84s
Found:     23
Saved:     23
Skipped:   0
New jobs:  23
============================================

Automating With Cron

Run the scraper every 6 hours to catch new postings throughout the day:

0 */6 * * * /usr/bin/php /var/www/html/job_scraper.php >> /var/www/html/job_scraper_cron.log 2>&1

For the complete cron setup including cPanel configuration and debugging, the PHP cron job guide covers every step.
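
If a run ever takes longer than the cron interval, two copies of the script could overlap and both reset is_new. One common guard is an exclusive, non-blocking file lock taken at the top of the script. A minimal sketch: the acquire_lock() name and the temp-directory lock path are assumptions, so pick a path that suits your server.

```php
<?php
// Hypothetical helper: try to take an exclusive lock without waiting.
// Returns the open handle on success (keep it open for the whole run),
// or null if the file cannot be opened or another process holds the lock.
function acquire_lock(string $path) {
    $handle = fopen($path, 'c');            // create if missing, never truncate
    if ($handle === false) return null;

    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}

$lockPath = sys_get_temp_dir() . '/job_scraper.lock';  // assumed location
$lock     = acquire_lock($lockPath);

if ($lock === null) {
    exit("Another scraper run is still in progress." . PHP_EOL);
}

// ... run the scraper here ...

flock($lock, LOCK_UN);   // the lock is also released when the process exits
fclose($lock);
```

Because the operating system drops the lock when the process ends, a crashed run does not leave a stale lock behind, unlike a plain "lock file exists" check.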

Querying Stored Jobs

<?php
$pdo = get_db_connection();

// All PHP developer jobs
$stmt = $pdo->prepare(
    "SELECT title, company, location, scraped_at
     FROM jobs
     WHERE title LIKE :keyword
     ORDER BY scraped_at DESC
     LIMIT 10"
);
$stmt->execute([':keyword' => '%php%']);
$results = $stmt->fetchAll();

echo "PHP jobs in database: " . count($results) . PHP_EOL . PHP_EOL;

foreach ($results as $job) {
    echo $job['title'] . " at " . $job['company'] . PHP_EOL;
    echo "  " . $job['location'] . " - scraped " . $job['scraped_at'] . PHP_EOL;
}
?>

Example output (truncated):

PHP jobs in database: 5

Senior PHP Developer at WebAgency
  Remote - scraped 2026-05-03 09:00:02
PHP Backend Engineer at TechStartup
  Berlin, Germany - scraped 2026-05-03 09:00:02

Frequently Asked Questions

Can I scrape LinkedIn or Indeed for jobs?

Both sites explicitly prohibit scraping in their terms of service and use aggressive bot detection. LinkedIn blocks cURL requests almost immediately. Indeed has rate limiting and JavaScript rendering that basic cURL can’t handle. Both have offered official APIs (LinkedIn’s Jobs API and Indeed’s Publisher API), but access is restricted and availability changes over time, so check each site’s current developer documentation before relying on them. Where an official API is available to you, use it instead of scraping.

How often should I run the PHP job scraper?

Every 6-12 hours is practical for most job boards. New listings on typical job sites appear a few times per day – running more frequently than every 6 hours rarely captures anything new and increases the chance of getting blocked. For high-volume boards that post dozens of new jobs per hour, every 2-4 hours with proper delays between requests is more appropriate.

How do I scrape multiple pages of job listings?

Most job boards paginate with URL parameters like ?page=2 or &start=25. Loop through pages until no jobs are found or a maximum page limit is reached. The pagination approach is identical to the multi-page scraping covered in the PHP cURL web scraping complete guide.
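
A minimal sketch of that loop follows. To keep it runnable without network access, it takes the page fetcher as a callable; in the scraper that callable would wrap fetch_page() and extract_jobs(). The scrape_all_pages() name, the ?page=N URL pattern, and the example.com URLs are assumptions, not details of any particular job board.

```php
<?php
// Hypothetical helper: call $fetchJobs(pageUrl) for page 1, 2, 3, ... and
// stop at the first page with no jobs or after $maxPages pages.
function scrape_all_pages(string $baseUrl, callable $fetchJobs, int $maxPages = 20, int $delaySeconds = 0): array {
    $all = [];

    for ($page = 1; $page <= $maxPages; $page++) {
        $jobs = $fetchJobs($baseUrl . '?page=' . $page);   // assumed URL pattern

        if (empty($jobs)) break;          // an empty page marks the end of the listing

        $all = array_merge($all, $jobs);

        if ($delaySeconds > 0) sleep($delaySeconds);       // pause between requests
    }

    return $all;
}

// Stand-in fetcher with two fake pages; a real one might be
// fn($url) => extract_jobs(fetch_page($url), 'some-board').
$fakePages = [
    'https://example.com/jobs?page=1' => [['title' => 'Job A'], ['title' => 'Job B']],
    'https://example.com/jobs?page=2' => [['title' => 'Job C']],
];
$fetcher = function (string $url) use ($fakePages): array {
    return $fakePages[$url] ?? [];
};

$jobs = scrape_all_pages('https://example.com/jobs', $fetcher);
echo count($jobs) . PHP_EOL;   // 3
```

The $maxPages cap protects against boards that keep returning the last page for any page number, which would otherwise loop forever.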

What is the best way to store scraped job data?

MySQL with PDO and a UNIQUE constraint on the job URL. Use ON DUPLICATE KEY UPDATE so re-running the scraper updates existing jobs rather than creating duplicates. The PHP MySQL scraping guide covers the complete database storage implementation with prepared statements, transactions, and duplicate handling.

How do I handle job sites that load listings with JavaScript?

Check Chrome DevTools Network tab for API calls – many job boards load listings from a JSON endpoint that cURL can hit directly without rendering JavaScript. If there’s no accessible API, Symfony Panther or Puppeteer can render the page fully before extracting. The dynamic content web scraping guide covers all three approaches.
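
When the listings come from a JSON endpoint, parsing is a json_decode() call plus a field mapping. The payload shape and field names below (results, jobTitle, companyName, city, link) are invented for illustration, as is the jobs_from_json() helper. Inspect the real response in the Network tab and adjust the mapping to match.

```php
<?php
// Hypothetical helper: convert a JSON payload into the same row shape that
// extract_jobs() produces. All payload field names are assumptions.
function jobs_from_json(string $json, string $source = ''): array {
    $data = json_decode($json, true);

    if (!is_array($data) || !isset($data['results']) || !is_array($data['results'])) {
        return [];   // invalid JSON or unexpected structure
    }

    $jobs = [];
    foreach ($data['results'] as $item) {
        // Skip entries missing the two fields the database requires
        if (empty($item['jobTitle']) || empty($item['link'])) continue;

        $jobs[] = [
            'title'       => trim($item['jobTitle']),
            'company'     => $item['companyName'] ?? null,
            'location'    => $item['city'] ?? null,
            'description' => null,
            'url'         => $item['link'],
            'source'      => $source,
        ];
    }
    return $jobs;
}

$sample = '{"results":[{"jobTitle":"PHP Developer","companyName":"Acme","city":"Remote","link":"https://example.com/jobs/1"},{"companyName":"No Title Ltd"}]}';
$jobs = jobs_from_json($sample, 'example-api');
print_r($jobs);
```

Because the rows have the same keys as the output of extract_jobs(), they can go through filter_jobs() and save_jobs() unchanged.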


Summary

A PHP job scraper built on cURL, DOMDocument, and MySQL gives you automated job market monitoring with minimal ongoing effort:

  • Extract structured data – title, company, location, and URL from any static job board using XPath selectors
  • Filter by keyword and location – store only jobs matching your criteria, ignore everything else
  • Detect new postings – the is_new flag identifies jobs that appeared since the last run without comparing full datasets
  • Email alerts – get notified on the next scheduled run after matching jobs appear
  • Cron automation – runs every 6 hours without manual intervention

For the complete PHP cURL and DOMDocument scraping foundation this project builds on, the PHP cURL web scraping complete guide covers every request option and parsing pattern in detail. For avoiding blocks on job sites with basic bot detection, the avoiding blocks guide covers headers, delays, and proxy rotation with working code.


Note: This tutorial is for educational purposes. Always respect website terms before scraping.
