7 Best PHP Web Scraping Tools in 2026: Free and Paid Compared

Choosing the best web scraping tools PHP has available depends entirely on what you’re building. PHP cURL handles most scraping jobs on its own. Add DOMDocument for parsing and you can scrape the majority of static websites without installing anything extra. But as projects grow – larger volumes, JavaScript sites, block avoidance, proxy rotation – the built-in tools start showing their limits.

This guide covers every PHP web scraping tool worth knowing in 2026 – free libraries, paid services, and headless browser options. Each one includes working code, honest pros and cons, pricing where relevant, and clear guidance on when it’s the right choice.

Best PHP Web Scraping Tools: Quick Comparison

Tool	Type	Best For	Cost	JS Support
PHP cURL + DOMDocument	Built-in	Static sites, full control	Free	No
Guzzle	Library	HTTP requests in Laravel/Composer projects	Free	No
Goutte	Library	Simple scraping with CSS selectors	Free	No
Symfony DomCrawler	Library	HTML parsing with CSS and XPath	Free	No
Symfony Panther	Library	JavaScript-rendered pages in PHP	Free	Yes
ScraperAPI	Service	Avoiding blocks at scale	From $49/mo	Optional
Bright Data	Service	Enterprise scraping, large proxy pool	Custom pricing	Yes

What You Need Before Choosing a Tool

Answer these three questions first – they determine which tool is right before you write a single line of code:

Is the target site static or JavaScript-rendered? – Static HTML: any free library works. JavaScript content: you need Panther, Puppeteer, or a paid service with rendering support.
What volume are you scraping? – Under roughly 1,000 pages per day: free tools are usually fine. Well beyond that: expect IP blocks on many sites and plan for proxies or a paid service.
Are you using a framework? – Laravel or Symfony project: Guzzle or DomCrawler fit naturally. Standalone PHP script: cURL with DOMDocument is the simplest choice.

The free tools covered in this guide handle the vast majority of real scraping projects. The paid services are worth considering only when volume, blocks, or JavaScript rendering become problems you can’t solve cheaply.
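
The three questions above can be folded into a small decision helper. This is only a sketch of this section's rule of thumb: the function name suggestTool is hypothetical, and the 1,000-pages-per-day cutoff is the rough threshold mentioned above, not a hard limit.

```php
<?php
// Hypothetical helper: encodes this guide's rule of thumb, nothing more.
function suggestTool(bool $needsJavaScript, int $pagesPerDay, bool $usesComposer): string
{
    // High volume usually means IP blocks, so proxies or a service come first
    if ($pagesPerDay > 1000) {
        return 'Proxies or a paid service (ScraperAPI, Bright Data)';
    }

    // JavaScript-rendered pages need a real browser
    if ($needsJavaScript) {
        return 'Symfony Panther';
    }

    // Static pages: choose by project setup
    return $usesComposer
        ? 'Guzzle + Symfony DomCrawler'
        : 'PHP cURL + DOMDocument';
}

echo suggestTool(false, 200, false) . PHP_EOL; // PHP cURL + DOMDocument
echo suggestTool(true, 200, true) . PHP_EOL;   // Symfony Panther
```

Real projects rarely split this cleanly, so treat the result as a starting point rather than a verdict.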

1. PHP cURL + DOMDocument (Free – Built-in)

cURL handles HTTP requests. DOMDocument parses the HTML response. Together they cover everything you need to scrape static websites – no Composer packages and no third-party dependencies. Both ship with PHP as bundled extensions (curl and dom) and are enabled on the vast majority of hosting environments.

This is the right starting point for anyone new to PHP scraping, and the right tool for most production scraping projects that target static HTML sites.

Basic Usage

<?php
// Fetch the page
$ch = curl_init();

curl_setopt_array($ch, [
    CURLOPT_URL            => "https://books.toscrape.com/",
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_CONNECTTIMEOUT => 10,
    CURLOPT_TIMEOUT        => 30,
    CURLOPT_ENCODING       => '',
    CURLOPT_HTTPHEADER     => [
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.5',
    ],
]);

$html     = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($httpCode !== 200 || !$html) {
    exit("Request failed: HTTP $httpCode" . PHP_EOL);
}

// Parse with DOMDocument
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$books = $xpath->query('//article[contains(@class,"product_pod")]');

foreach ($books as $book) {
    $titleNode = $xpath->query('.//h3/a', $book)->item(0);
    $priceNode = $xpath->query('.//*[contains(@class,"price_color")]', $book)->item(0);

    $title = $titleNode ? $titleNode->getAttribute('title') : 'N/A';
    $price = $priceNode ? trim($priceNode->textContent)     : 'N/A';

    echo "$title - $price" . PHP_EOL;
}
?>

Output:

A Light in the Attic - £51.77
Tipping the Velvet - £53.74
Soumission - £50.10
Sharp Objects - £47.82
...
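
The prices above come back as display strings. If the goal is to sort or compare them, one option is to convert them to numbers first. The helper below is a minimal sketch: parsePrice is a hypothetical name, and it assumes a dot as the decimal separator, as on books.toscrape.com.

```php
<?php
// Hypothetical helper: turns "£51.77" into 51.77, or null when no number is present.
// Assumes a dot decimal separator and optional comma thousands separators.
function parsePrice(string $raw): ?float
{
    // Drop thousands separators, then grab the first number in the string
    $clean = str_replace(',', '', $raw);

    if (preg_match('/\d+(?:\.\d+)?/', $clean, $match) !== 1) {
        return null;
    }

    return (float) $match[0];
}

var_dump(parsePrice('£51.77')); // float(51.77)
var_dump(parsePrice('N/A'));    // NULL
```

Sites that format prices as 51,77 would need a different rule, so check the target's format before reusing this.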

Pros

Zero installation – works on almost every PHP environment out of the box
Full control over every request option – headers, timeouts, cookies, proxies, redirects
No dependencies to maintain or update
Fast – no framework overhead
Handles POST requests, file uploads, cookie sessions, and authentication

Cons

More verbose than library alternatives – setting up each request takes more code
No JavaScript execution – cannot scrape content loaded after page render
XPath syntax has a learning curve compared to CSS selectors
No built-in retry logic or rate limiting – you write that yourself
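
Since retry logic is left to you, here is one way it might look. This is a sketch rather than a complete solution: fetchWithRetry and curlGet are hypothetical names, and the delays (500 ms, doubling on each attempt) are arbitrary defaults to tune per site.

```php
<?php
// Hypothetical helper: retries any fetch callable with exponential backoff.
// $fetch must return the response body as a string, or false/null on failure.
function fetchWithRetry(callable $fetch, int $maxAttempts = 3, int $baseDelayMs = 500): ?string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $body = $fetch();

        if (is_string($body) && $body !== '') {
            return $body;
        }

        // Wait 1x, 2x, 4x ... the base delay before the next try
        if ($attempt < $maxAttempts) {
            usleep($baseDelayMs * 1000 * (2 ** ($attempt - 1)));
        }
    }

    return null; // every attempt failed
}

// Example fetcher built on cURL: returns the body on HTTP 200, null otherwise
function curlGet(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
    ]);

    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return ($body !== false && $code === 200) ? $body : null;
}

// Usage:
// $html = fetchWithRetry(function () {
//     return curlGet('https://books.toscrape.com/');
// });
```

Passing the fetcher in as a callable keeps the retry loop independent of cURL, so the same function could wrap a Guzzle call as well.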

When to Use It

Use cURL with DOMDocument when you’re scraping static HTML sites, when you need full control over the request, or when you’re working on a server without Composer. It’s the most flexible option and handles everything from simple one-page scrapers to complex multi-page jobs with session handling and proxy rotation.

For a complete working implementation with error handling, retry logic, pagination, and MySQL storage, the PHP cURL web scraping complete guide covers every detail.

Cost

Free. Built into PHP.

2. Guzzle (Free – Composer)

Guzzle is a PHP HTTP client library that makes sending requests cleaner and more readable than raw cURL. It handles the same tasks – fetching pages, sending headers, managing cookies, following redirects – but with a more expressive API. It’s the standard HTTP client in Laravel and widely used in Symfony projects.

If you’re already using Composer and want cleaner request code without the verbose cURL setup, Guzzle is the natural choice.

Installation

composer require guzzlehttp/guzzle

Basic Usage

<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;

$client = new Client([
    'timeout'         => 30,
    'connect_timeout' => 10,
    'headers'         => [
        'User-Agent'      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
        'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language' => 'en-US,en;q=0.5',
    ],
]);

try {
    $response = $client->get("https://books.toscrape.com/");
    $html     = (string) $response->getBody();
    $status   = $response->getStatusCode();

    echo "Status: $status" . PHP_EOL;
    echo "Response size: " . strlen($html) . " bytes." . PHP_EOL;

} catch (GuzzleException $e) {
    // GuzzleException also covers connection errors and timeouts,
    // which RequestException alone does not catch in Guzzle 7
    echo "Request failed: " . $e->getMessage() . PHP_EOL;
}
?>

Output:

Status: 200
Response size: 51274 bytes.

Using Guzzle With DOMDocument for Parsing

Guzzle only handles HTTP requests – it doesn’t parse HTML. Combine it with DOMDocument for extraction:

<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;

$client = new Client([
    'timeout'         => 30,
    'connect_timeout' => 10,
    'headers'         => [
        'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    ],
]);

try {
    $response = $client->get("https://books.toscrape.com/");
    $html     = (string) $response->getBody();

} catch (GuzzleException $e) {
    // Catches HTTP errors as well as connection failures and timeouts
    exit("Failed: " . $e->getMessage() . PHP_EOL);
}

// Parse with DOMDocument - same as with cURL
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$books = $xpath->query('//article[contains(@class,"product_pod")]');

echo "Books found: " . $books->length . PHP_EOL . PHP_EOL;

foreach ($books as $book) {
    $titleNode = $xpath->query('.//h3/a', $book)->item(0);
    $priceNode = $xpath->query('.//*[contains(@class,"price_color")]', $book)->item(0);

    $title = $titleNode ? $titleNode->getAttribute('title') : 'N/A';
    $price = $priceNode ? trim($priceNode->textContent)     : 'N/A';

    echo "$title - $price" . PHP_EOL;
}
?>

Output:

Books found: 20

A Light in the Attic - £51.77
Tipping the Velvet - £53.74
Soumission - £50.10
...

Guzzle Concurrency – Fetching Multiple Pages Simultaneously

Guzzle’s biggest practical advantage over raw cURL for scraping is easy concurrency – sending multiple requests at the same time instead of one after another. Raw cURL can do the same through the curl_multi_* functions, but with considerably more code. On a 50-page scrape, concurrency can cut total time substantially:

<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client([
    'timeout'         => 30,
    'connect_timeout' => 10,
    'headers'         => [
        'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    ],
]);

// Build list of URLs to fetch
$urls = [];
for ($i = 1; $i <= 10; $i++) {
    $urls[] = "https://books.toscrape.com/catalogue/page-$i.html";
}

// Create request generator
$requests = function($urls) {
    foreach ($urls as $url) {
        yield new Request('GET', $url);
    }
};

$results  = [];
$failed   = [];
$startTime = microtime(true);

// Send up to 3 concurrent requests at a time
$pool = new Pool($client, $requests($urls), [
    'concurrency' => 3,
    'fulfilled'   => function($response, $index) use (&$results, $urls) {
        $results[$urls[$index]] = (string) $response->getBody();
        echo "Fetched: " . $urls[$index] . PHP_EOL;
    },
    'rejected'    => function($reason, $index) use (&$failed, $urls) {
        $failed[] = $urls[$index];
        echo "Failed: " . $urls[$index] . " - " . $reason->getMessage() . PHP_EOL;
    },
]);

$promise = $pool->promise();
$promise->wait();

$duration = round(microtime(true) - $startTime, 2);
echo PHP_EOL . "Fetched " . count($results) . " pages in {$duration}s." . PHP_EOL;
echo "Failed: " . count($failed) . PHP_EOL;
?>

Output:

Fetched: https://books.toscrape.com/catalogue/page-1.html
Fetched: https://books.toscrape.com/catalogue/page-2.html
Fetched: https://books.toscrape.com/catalogue/page-3.html
...

Fetched 10 pages in 3.84s.
Failed: 0

Fetched one at a time with a 1-second delay between requests, the same 10 pages would take more than 10 seconds. In this sample run, 3 concurrent requests finished in under 4 seconds; actual timings depend on the target server and your network. The gap widens when scraping hundreds of pages.

Keep concurrency low – 3 to 5 simultaneous requests is enough for most sites. Higher values increase the chance of triggering rate limits.
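
Concurrency caps how many requests are in flight, not how many start per second. One way to also cap the start rate is a small pacing helper like the sketch below. RequestPacer is a hypothetical class, not part of Guzzle, and the rate of 2 requests per second is an arbitrary example.

```php
<?php
// Hypothetical helper: enforces a minimum gap between request starts.
final class RequestPacer
{
    private float $minGapSeconds;
    private float $lastStart = 0.0;

    public function __construct(float $requestsPerSecond)
    {
        if ($requestsPerSecond <= 0) {
            throw new InvalidArgumentException('Rate must be positive.');
        }
        $this->minGapSeconds = 1.0 / $requestsPerSecond;
    }

    // Blocks until at least $minGapSeconds have passed since the previous call
    public function wait(): void
    {
        $elapsed = microtime(true) - $this->lastStart;

        if ($elapsed < $this->minGapSeconds) {
            usleep((int) round(($this->minGapSeconds - $elapsed) * 1000000));
        }

        $this->lastStart = microtime(true);
    }
}

$pacer = new RequestPacer(2.0); // at most about 2 request starts per second

foreach (['page-1', 'page-2', 'page-3'] as $page) {
    $pacer->wait();
    echo "Would request $page" . PHP_EOL;
}
```

With the Pool example, one place this could go is just before the yield in the request generator. Keep in mind that sleeping blocks the whole script, in-flight transfers included, so treat it as a coarse limiter.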

Pros

Cleaner, more readable request code than raw cURL
Built-in concurrent requests via Pool – significant speed improvement on multi-page scrapes
Standard in Laravel – no extra setup if already using the framework
Clean exception handling with specific error types
Middleware support for adding retry logic, logging, and authentication globally

Cons

Requires Composer – not available on all shared hosting
Doesn’t parse HTML – still need DOMDocument or another parser
No JavaScript execution
Adds a dependency to manage and keep updated

When to Use It

Use Guzzle when you’re already in a Composer-based project – Laravel, Symfony, or any modern PHP app. The concurrent request pool is particularly useful when you need to scrape many pages quickly. For standalone scripts or simple one-off scrapers, raw cURL is less setup.

Cost

Free. Open source under the MIT license.

3. Goutte (Free – Composer)

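Goutte is a PHP screen scraping library that pairs an HTTP client with Symfony DomCrawler for HTML parsing. Versions up to 3 used Guzzle for requests; version 4 is a thin wrapper around Symfony BrowserKit’s HttpBrowser and is marked deprecated upstream, so new projects can use HttpBrowser directly with the same API. Instead of writing separate cURL and DOMDocument code, you fetch a page and immediately use CSS selectors to extract data – no separate parsing setup required.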

It’s a good middle ground for developers who find raw cURL too verbose but don’t need the full power of a paid service.

Installation

composer require fabpot/goutte

Basic Usage

<?php
require 'vendor/autoload.php';

use Goutte\Client;

$client   = new Client();
$crawler  = $client->request('GET', 'https://books.toscrape.com/');

echo "Status: " . $client->getInternalResponse()->getStatusCode() . PHP_EOL;
echo "Title: "  . $crawler->filter('title')->text() . PHP_EOL;
?>

Output:

Status: 200
Title: All products | Books to Scrape - Sandbox

Extracting Data With CSS Selectors

Goutte uses CSS selectors instead of XPath – more familiar to developers who work with frontend code:

<?php
require 'vendor/autoload.php';

use Goutte\Client;

$client  = new Client();
$crawler = $client->request('GET', 'https://books.toscrape.com/');

// Select all book articles
$books = $crawler->filter('article.product_pod');

echo "Books found: " . $books->count() . PHP_EOL . PHP_EOL;

// Loop through each book and extract data
$books->each(function($book) {
    // CSS selector within the book context
    $title  = $book->filter('h3 a')->attr('title');
    $price  = $book->filter('.price_color')->text();
    $rating = $book->filter('.star-rating')->attr('class');

    // Clean up rating - "star-rating Three" -> "Three"
    $rating = str_replace('star-rating ', '', $rating);

    echo "$title - $price - $rating stars" . PHP_EOL;
});
?>

Output:

Books found: 20

A Light in the Attic - £51.77 - Three stars
Tipping the Velvet - £53.74 - One stars
Soumission - £50.10 - One stars
Sharp Objects - £47.82 - Four stars
...
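
The str_replace() cleanup above depends on star-rating being the first class name. A slightly more defensive variant, sketched below, looks the rating word up in a map and returns a number. ratingFromClass is a hypothetical helper, and the word list assumes ratings run from One to Five as on books.toscrape.com.

```php
<?php
// Hypothetical helper: maps a class attribute such as "star-rating Three" to 3.
// Returns null when no known rating word is present.
function ratingFromClass(string $classAttr): ?int
{
    $words = ['One' => 1, 'Two' => 2, 'Three' => 3, 'Four' => 4, 'Five' => 5];

    // Split on whitespace so the order of the class names does not matter
    foreach (preg_split('/\s+/', trim($classAttr)) as $class) {
        if (isset($words[$class])) {
            return $words[$class];
        }
    }

    return null;
}

echo ratingFromClass('star-rating Three') . PHP_EOL; // 3
```

A numeric rating is also easier to sort and store than the word form.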

Following Links and Scraping Multiple Pages

Goutte can follow links directly without building URLs manually – useful for navigating pagination:

<?php
require 'vendor/autoload.php';

use Goutte\Client;

$client    = new Client();
$url       = 'https://books.toscrape.com/';
$page      = 1;
$allBooks  = [];

while ($url) {
    echo "Scraping page $page..." . PHP_EOL;

    $crawler = $client->request('GET', $url);

    // Extract books on current page
    $crawler->filter('article.product_pod')->each(function($book) use (&$allBooks) {
        $allBooks[] = [
            'title' => $book->filter('h3 a')->attr('title'),
            'price' => $book->filter('.price_color')->text(),
        ];
    });

    echo "Page $page - " . count($allBooks) . " total books." . PHP_EOL;

    // Find next page link
    $nextLink = $crawler->filter('li.next a');

    if ($nextLink->count() > 0) {
        // Goutte resolves relative URLs automatically
        $url = $nextLink->link()->getUri();
        $page++;
        sleep(1);
    } else {
        echo "Last page reached." . PHP_EOL;
        $url = null;
    }
}

echo PHP_EOL . "Total books scraped: " . count($allBooks) . PHP_EOL;
?>

Output:

Scraping page 1...
Page 1 - 20 total books.
Scraping page 2...
Page 2 - 40 total books.
...
Last page reached.

Total books scraped: 1000

Adding Custom Headers to Goutte

By default Goutte sends a Symfony BrowserKit user agent, which many sites flag as a bot. Override it with server parameters, which BrowserKit converts into request headers. The example targets Goutte 4, where transport options such as timeouts are set on the underlying Symfony HttpClient:

<?php
require 'vendor/autoload.php';

use Goutte\Client;
use Symfony\Component\HttpClient\HttpClient;

// Goutte 4 wraps Symfony HttpClient - transport options go here
$client = new Client(HttpClient::create([
    'timeout'      => 10, // idle timeout in seconds
    'max_duration' => 30, // total time allowed per request
]));

// HTTP_* server parameters are sent as request headers
$client->setServerParameter('HTTP_USER_AGENT', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36');
$client->setServerParameter('HTTP_ACCEPT', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8');
$client->setServerParameter('HTTP_ACCEPT_LANGUAGE', 'en-US,en;q=0.5');

$crawler = $client->request('GET', 'https://books.toscrape.com/');
echo "Fetched with custom headers. Title: " . $crawler->filter('title')->text() . PHP_EOL;
?>

Output:

Fetched with custom headers. Title: All products | Books to Scrape - Sandbox

Submitting Forms With Goutte

Goutte can fill and submit HTML forms – useful for scraping sites that require search queries or login:

<?php
require 'vendor/autoload.php';

use Goutte\Client;

$client  = new Client();
$crawler = $client->request('GET', 'https://example.com/search');

// Find the search form and fill it
$form = $crawler->selectButton('Search')->form();
$form['q'] = 'php scraping';

// Submit the form
$resultCrawler = $client->submit($form);

// Extract results
$results = $resultCrawler->filter('.search-result');
echo "Results found: " . $results->count() . PHP_EOL;

$results->each(function($result) {
    echo $result->filter('h3')->text() . PHP_EOL;
});
?>

Pros

CSS selectors are more intuitive than XPath for most developers
Combines fetching and parsing in one object – less boilerplate than cURL + DOMDocument
Handles relative URL resolution automatically when following links
Built-in form submission support
Well documented and stable

Cons

No JavaScript execution
Slower than raw cURL for high-volume scraping
Requires Composer
Default user agent gets blocked – always override with a real browser string
Less control over low-level request options compared to raw cURL

When to Use It

Use Goutte when you prefer CSS selectors over XPath, when you need to submit forms as part of the scraping flow, or when you want less setup code than raw cURL. It’s a solid choice for beginner to intermediate scraping projects that don’t need JavaScript rendering or high concurrency.

Cost

Free. Open source under the MIT license.

4. Symfony DomCrawler (Free – Composer)

Symfony DomCrawler is the HTML and XML parsing component that powers Goutte. You can use it standalone – without the full Goutte package – when you already have HTML from a cURL or Guzzle request and just need to extract data from it. It supports both CSS selectors and XPath queries, making it more flexible than either alone.

If you’re building a Laravel or Symfony application and want the cleanest possible HTML parsing without pulling in the full Goutte package, DomCrawler is the right choice.

Installation

composer require symfony/dom-crawler symfony/css-selector

The css-selector component is required separately if you want to use CSS selectors. Without it only XPath queries work.
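
If you do stay with XPath only, the CSS selector article.product_pod has no direct shorthand. A common idiom is to pad the class attribute with spaces and match the padded class name, which avoids partial matches. The sketch below uses PHP’s built-in DOMXPath on made-up inline HTML so it runs without Composer; xpathHasClass is a hypothetical helper, and the same expression string could be passed to DomCrawler’s filterXPath().

```php
<?php
// Inline sample markup (made up for this example)
$html = <<<HTML
<div>
  <article class="product_pod featured"><h3><a title="Book A">A</a></h3></article>
  <article class="product_pod_archived"><h3><a title="Old Book">Old</a></h3></article>
  <article class="product_pod"><h3><a title="Book B">B</a></h3></article>
</div>
HTML;

// Builds an XPath test equivalent to the CSS class selector ".$class"
function xpathHasClass(string $class): string
{
    return sprintf(
        "contains(concat(' ', normalize-space(@class), ' '), ' %s ')",
        $class
    );
}

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Plain contains() also matches "product_pod_archived"
$loose = $xpath->query('//article[contains(@class, "product_pod")]');

// The token-aware version matches only elements with the exact class
$strict = $xpath->query('//article[' . xpathHasClass('product_pod') . ']');

echo "Loose matches: "  . $loose->length  . PHP_EOL; // 3
echo "Strict matches: " . $strict->length . PHP_EOL; // 2
```

The looser contains(@class, ...) form used elsewhere in this guide is fine when no other class name shares the same prefix.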

Basic Usage With cURL

<?php
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Fetch HTML with cURL
$ch = curl_init();

curl_setopt_array($ch, [
    CURLOPT_URL            => "https://books.toscrape.com/",
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_TIMEOUT        => 30,
    CURLOPT_HTTPHEADER     => [
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    ],
]);

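$html = curl_exec($ch);
curl_close($ch);

// Stop early if the request failed - Crawler needs an HTML string
if ($html === false) {
    exit("Request failed." . PHP_EOL);
}

// Pass HTML to DomCrawler
$crawler = new Crawler($html);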

// Use CSS selectors to extract data
$title = $crawler->filter('title')->text();
$books = $crawler->filter('article.product_pod');

echo "Page title: $title" . PHP_EOL;
echo "Books found: " . $books->count() . PHP_EOL;
?>

Output:

Page title: All products | Books to Scrape - Sandbox
Books found: 20

CSS Selectors and XPath Together

DomCrawler lets you mix CSS selectors and XPath in the same script – use whichever is cleaner for each extraction:

<?php
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html    = file_get_contents('https://books.toscrape.com/');
$crawler = new Crawler($html);

// CSS selector approach
$crawler->filter('article.product_pod')->each(function(Crawler $book) {
    // CSS selector for title
    $title = $book->filter('h3 a')->attr('title');

    // CSS selector for price
    $price = $book->filter('.price_color')->text();

    // XPath for availability
    $availability = $book->filterXPath(
        './/*[contains(@class,"availability")]'
    )->text();

    echo "$title - $price - " . trim($availability) . PHP_EOL;
});
?>

Output:

A Light in the Attic - £51.77 - In stock
Tipping the Velvet - £53.74 - In stock
Soumission - £50.10 - In stock
Sharp Objects - £47.82 - In stock
...

Extracting Attributes and Text

<?php
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html    = file_get_contents('https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html');
$crawler = new Crawler($html);

// Get text content
$title       = $crawler->filter('h1')->text();
$price       = $crawler->filter('.price_color')->text();
$description = $crawler->filter('#product_description ~ p')->text();

// Get attribute value - on product pages the cover <img> sits inside #product_gallery
$coverImage = $crawler->filter('#product_gallery img')->attr('src');

// Get multiple values as array
$tableData = $crawler->filter('table.table tr')->each(function(Crawler $row) {
    return [
        'label' => $row->filter('th')->text(),
        'value' => $row->filter('td')->text(),
    ];
});

echo "Title: $title" . PHP_EOL;
echo "Price: $price" . PHP_EOL;
echo "Cover: $coverImage" . PHP_EOL;
echo PHP_EOL . "Product details:" . PHP_EOL;

foreach ($tableData as $row) {
    echo "  {$row['label']}: {$row['value']}" . PHP_EOL;
}
?>

Output:

Title: A Light in the Attic
Price: £51.77
Cover: ../../../media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg

Product details:
  UPC: a897fe39b1053632
  Product Type: Books
  Price (excl. tax): £51.77
  Price (incl. tax): £51.77
  Tax: £0.00
  Availability: In stock (22 available)
  Number of reviews: 0

Reducing a Crawler to a Subsection

When scraping complex pages, narrow the crawler to a specific section before extracting – avoids false matches from other parts of the page:

<?php
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html    = file_get_contents('https://books.toscrape.com/');
$crawler = new Crawler($html);

// Narrow to the product section only
$productSection = $crawler->filter('section');

// Now all queries run within that section
$books = $productSection->filter('article.product_pod');

echo "Books in product section: " . $books->count() . PHP_EOL;

// Get first book only
$firstBook = $books->first();
echo "First book: " . $firstBook->filter('h3 a')->attr('title') . PHP_EOL;

// Get last book only
$lastBook = $books->last();
echo "Last book: " . $lastBook->filter('h3 a')->attr('title') . PHP_EOL;

// Get specific book by index
$thirdBook = $books->eq(2); // 0-indexed
echo "Third book: " . $thirdBook->filter('h3 a')->attr('title') . PHP_EOL;
?>

Output:

Books in product section: 20
First book: A Light in the Attic
Last book: It's Only the Himalayas
Third book: Soumission

Handling Missing Elements Safely

<?php
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html    = file_get_contents('https://books.toscrape.com/');
$crawler = new Crawler($html);

$crawler->filter('article.product_pod')->each(function(Crawler $book) {
    // count() check prevents exceptions on missing elements
    $title = $book->filter('h3 a')->count() > 0
             ? $book->filter('h3 a')->attr('title')
             : 'N/A';

    $price = $book->filter('.price_color')->count() > 0
             ? $book->filter('.price_color')->text()
             : 'N/A';

    echo "$title - $price" . PHP_EOL;
});
?>

Output:

A Light in the Attic - £51.77
Tipping the Velvet - £53.74
Soumission - £50.10
...

Pros

Supports both CSS selectors and XPath in the same script
Cleaner API than DOMDocument for most extraction tasks
Well maintained as part of the Symfony ecosystem
Works well alongside Guzzle in Laravel and Symfony projects
Strong documentation and large community

Cons

Parsing only – requires a separate HTTP client for fetching pages
Requires Composer and two packages for CSS selector support
No JavaScript execution
Slightly more verbose than Goutte for combined fetch-and-parse workflows

When to Use It

Use DomCrawler when you’re already using Guzzle for HTTP requests and want a cleaner parsing layer on top. It’s the best free HTML parser for PHP projects that use Composer – more flexible than DOMDocument alone and lighter than pulling in the full Goutte package. Ideal for Laravel and Symfony applications where it fits naturally into the existing dependency stack.

Cost

Free. Open source under the MIT license.

5. Symfony Panther (Free – Composer)

Symfony Panther is a browser testing and web scraping library for PHP that controls a real browser – Chrome or Firefox – programmatically. Unlike every other free tool in this list, Panther actually executes JavaScript. It’s the only free option covered here that scrapes JavaScript-rendered websites from PHP without calling an external Node.js script.

Panther uses WebDriver under the hood – the same protocol used by Selenium. If you’ve used browser automation testing tools before, the API will feel familiar.

Installation

# Install the package
composer require symfony/panther

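# Install ChromeDriver (must match your Chrome version)
# On Ubuntu/Debian
apt-get install chromium-chromedriver

# Or let the dbrekelmans/bdi helper detect and download a matching driver
composer require --dev dbrekelmans/bdi
vendor/bin/bdi detect drivers

# Or download manually (Chrome 115 and later) from
# https://googlechromelabs.github.io/chrome-for-testing/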

Basic Usage

<?php
require 'vendor/autoload.php';

use Symfony\Component\Panther\Client;

// Launch headless Chrome
$client  = Client::createChromeClient();
$crawler = $client->request('GET', 'https://books.toscrape.com/');

echo "Title: " . $crawler->filter('title')->text() . PHP_EOL;
echo "Books found: " . $crawler->filter('article.product_pod')->count() . PHP_EOL;

// Always close the browser when done
$client->quit();
?>

Output:

Title: All products | Books to Scrape - Sandbox
Books found: 20

Scraping JavaScript-Rendered Content

This is where Panther is genuinely different from every other free PHP tool. The browser executes JavaScript and Panther waits for the content to appear before extracting:

<?php
require 'vendor/autoload.php';

use Symfony\Component\Panther\Client;

$client = Client::createChromeClient(null, [
    '--headless',
    '--no-sandbox',
    '--disable-dev-shm-usage',
    '--disable-gpu',
    '--window-size=1280,800',
]);

// Navigate to a JavaScript-rendered page
$crawler = $client->request('GET', 'https://example-js-site.com/products');

// Wait until the product cards appear - up to 10 seconds.
// waitFor() returns a fresh crawler for the updated page.
$crawler = $client->waitFor('.product-card', 10);

// Now extract - JavaScript has finished loading
$products = $crawler->filter('.product-card');
echo "Products loaded: " . $products->count() . PHP_EOL;

$products->each(function($product) {
    $name  = $product->filter('.product-name')->text();
    $price = $product->filter('.product-price')->text();
    echo "$name - $price" . PHP_EOL;
});

$client->quit();
?>

Output:

Products loaded: 24
Laptop Stand Pro - $49.99
Mechanical Keyboard - $89.99
USB-C Hub - $34.99
...

Waiting Strategies

Panther gives you several ways to wait for content to load – picking the right one prevents both timeouts and capturing incomplete pages:

<?php
require 'vendor/autoload.php';

use Symfony\Component\Panther\Client;
use Facebook\WebDriver\WebDriverExpectedCondition;

$client  = Client::createChromeClient();
$crawler = $client->request('GET', 'https://example-js-site.com/products');

// Wait for a CSS selector to appear
$client->waitFor('.product-card');

// Wait for a CSS selector to contain specific text
$client->waitForElementToContain('.status', 'loaded');

// Wait for an element to be removed from the DOM - useful for loading spinners
$client->waitForStaleness('.loading-spinner');

// Wait for the page title to match, using a php-webdriver condition
$client->wait(10)->until(
    WebDriverExpectedCondition::titleIs('Products - Example Site')
);

// Custom wait - poll until a callback returns true
$client->wait(10)->until(function () use ($client) {
    $items = $client->getCrawler()->filter('.product-card');
    return $items->count() > 0;
});

echo "Page fully loaded." . PHP_EOL;
$client->quit();
?>
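
Conceptually, each of these waits is a poll loop: check a condition, pause briefly, and give up after a timeout. The sketch below shows that general shape in plain PHP. It is not Panther’s implementation; pollUntil is a hypothetical function, and the counter stands in for a real "has the content loaded" check.

```php
<?php
// Hypothetical helper: polls $condition until it returns true or the timeout expires.
// Not Panther's implementation, just the general shape of a wait.
function pollUntil(callable $condition, float $timeoutSeconds = 10.0, int $intervalMs = 250): bool
{
    $deadline = microtime(true) + $timeoutSeconds;

    do {
        if ($condition() === true) {
            return true;
        }
        usleep($intervalMs * 1000);
    } while (microtime(true) < $deadline);

    return false; // timed out
}

// Stand-in for "content has loaded": becomes true on the third check
$checks = 0;
$ready  = pollUntil(function () use (&$checks) {
    return ++$checks >= 3;
}, 2.0, 50);

echo $ready ? "Ready after $checks checks" : "Timed out";
echo PHP_EOL;
```

The interval is the trade-off to watch: a short one reacts faster but checks the page more often, and a long one can overshoot the moment the content appears. This sketch returns false on timeout, whereas a browser-driven wait may raise an exception instead, so error handling around real waits can look different.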

Handling Infinite Scroll Pagination

Sites that load more content as you scroll down can’t be handled with URL-based pagination. Panther can simulate scrolling to trigger loading:

<?php
require 'vendor/autoload.php';

use Symfony\Component\Panther\Client;

$client  = Client::createChromeClient();
$crawler = $client->request('GET', 'https://example-infinite-scroll.com/products');

$client->waitFor('.product-card');

$allProducts = [];
$scrollCount = 0;
$maxScrolls  = 10; // safety cap

while ($scrollCount < $maxScrolls) {
    // Extract the products currently in the DOM
    $cards       = $client->getCrawler()->filter('.product-card');
    $countBefore = $cards->count();

    // each() hands every card over as its own crawler
    $cards->each(function ($card) use (&$allProducts) {
        $nameNode = $card->filter('.name');
        $name     = $nameNode->count() > 0 ? $nameNode->text() : 'N/A';

        if (!in_array($name, $allProducts, true)) {
            $allProducts[] = $name;
        }
    });

    echo "Scroll $scrollCount - " . count($allProducts) . " unique products." . PHP_EOL;

    // Scroll to bottom of page
    $client->executeScript('window.scrollTo(0, document.body.scrollHeight)');

    // Wait for new content to load
    sleep(2);

    $scrollCount++;

    // Stop if scrolling did not add any new cards
    $countAfter = $client->getCrawler()->filter('.product-card')->count();
    if ($countAfter === $countBefore) {
        echo "No new products - reached the end." . PHP_EOL;
        break;
    }
}

echo "Total unique products: " . count($allProducts) . PHP_EOL;
$client->quit();
?>

Output:

Scroll 0 - 20 unique products.
Scroll 1 - 40 unique products.
Scroll 2 - 60 unique products.
No new products - reached the end.
Total unique products: 60

Taking Screenshots for Debugging

When a scraper isn’t finding the expected elements, a screenshot shows exactly what the browser actually loaded:

<?php
require 'vendor/autoload.php';

use Symfony\Component\Panther\Client;

$client  = Client::createChromeClient();
$crawler = $client->request('GET', 'https://books.toscrape.com/');

// Take a screenshot to debug what the browser sees
$client->takeScreenshot(__DIR__ . '/debug_screenshot.png');
echo "Screenshot saved to debug_screenshot.png" . PHP_EOL;

$books = $crawler->filter('article.product_pod');

if ($books->count() === 0) {
    echo "No books found - check the screenshot." . PHP_EOL;
} else {
    echo "Books found: " . $books->count() . PHP_EOL;
}

$client->quit();
?>

Output:

Screenshot saved to debug_screenshot.png
Books found: 20

Pros

Full JavaScript execution – the only free tool in this list that handles JavaScript-rendered content
CSS selectors via Symfony DomCrawler – clean familiar API
Built-in waiting strategies for dynamic content
Can simulate scrolling, clicking, and form submission
Screenshot support for debugging
No Node.js required – runs entirely in PHP

Cons

Slow – launching a real browser typically takes around 2-5 seconds per instance, much slower than cURL
High memory usage – Chrome typically uses roughly 200-400MB per browser instance
Requires ChromeDriver installed and matching your Chrome version
Not available on most shared hosting environments
Overkill for static sites where cURL works fine

When to Use It

Use Panther when you need JavaScript execution and want to stay in PHP without setting up a Node.js environment. It’s the right choice for scraping single-page applications, infinite scroll pages, or any site that loads content after the initial page render. On a VPS or dedicated server where Chrome can run, Panther is the cleanest PHP solution for dynamic content.

For the alternative approach using Node.js Puppeteer called from PHP, the dynamic content web scraping guide covers the full integration.

Cost

Free. Open source under the MIT license.

6. ScraperAPI (Paid)

ScraperAPI is a web scraping service that handles the hard parts of scraping at scale – rotating proxies, browser fingerprinting, CAPTCHA solving, and JavaScript rendering. Instead of managing your own proxy pool and fighting bot detection, you send requests through ScraperAPI’s endpoint and it handles everything automatically.

The integration is simple – replace your target URL with a ScraperAPI URL containing your API key and the original URL as a parameter. Your existing cURL code works with minimal changes.

How It Works

<?php
// Without ScraperAPI - direct request that may get blocked
$url = "https://example.com/products";

// With ScraperAPI - routes through their proxy network
$apiKey     = 'your_api_key_here';
$targetUrl  = urlencode("https://example.com/products");
$url        = "https://api.scraperapi.com/?api_key={$apiKey}&url={$targetUrl}";

// Your cURL code stays the same apart from the URL and a longer timeout
$ch = curl_init();

curl_setopt_array($ch, [
    CURLOPT_URL            => $url,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_TIMEOUT        => 70, // ScraperAPI can take up to 60s on hard targets - leave headroom
    CURLOPT_HTTPHEADER     => [
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    ],
]);

$html     = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($httpCode === 200) {
    echo "Fetched via ScraperAPI. Length: " . strlen($html) . " bytes." . PHP_EOL;
} else {
    echo "Request failed: HTTP $httpCode" . PHP_EOL;
}
?>

Output:

Fetched via ScraperAPI. Length: 48291 bytes.

ScraperAPI Parameters

ScraperAPI accepts parameters that control how it handles each request:

<?php
$apiKey    = 'your_api_key_here';
$targetUrl = urlencode("https://example.com/products");

// Basic request - rotating proxy only
$url = "https://api.scraperapi.com/?api_key={$apiKey}&url={$targetUrl}";

// Render JavaScript before returning HTML
$url = "https://api.scraperapi.com/?api_key={$apiKey}&url={$targetUrl}&render=true";

// Use residential proxies - higher success rate on tough sites
$url = "https://api.scraperapi.com/?api_key={$apiKey}&url={$targetUrl}&premium=true";

// Set country for the proxy
$url = "https://api.scraperapi.com/?api_key={$apiKey}&url={$targetUrl}&country_code=us";

// Combine parameters
$params = http_build_query([
    'api_key'      => $apiKey,
    'url'          => "https://example.com/products",
    'render'       => 'true',
    'country_code' => 'us',
    'premium'      => 'true',
]);

$url = "https://api.scraperapi.com/?" . $params;

echo "ScraperAPI URL built." . PHP_EOL;
echo "Target: https://example.com/products" . PHP_EOL;
?>

Building a Reusable ScraperAPI Function

<?php
function scrape_via_api($targetUrl, $apiKey, $options = []) {
    $params = array_merge([
        'api_key' => $apiKey,
        'url'     => $targetUrl,
    ], $options);

    $apiUrl = "https://api.scraperapi.com/?" . http_build_query($params);

    $ch = curl_init();

    curl_setopt_array($ch, [
        CURLOPT_URL            => $apiUrl,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 15,
        CURLOPT_TIMEOUT        => 70, // ScraperAPI can take up to 60s on hard targets
    ]);

    $html     = curl_exec($ch);
    $errno    = curl_errno($ch);
    $error    = curl_error($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($errno) {
        echo "cURL error: $error" . PHP_EOL;
        return false;
    }

    if ($httpCode === 401 || $httpCode === 403) {
        echo "API key rejected or credits exhausted (HTTP $httpCode)." . PHP_EOL;
        return false;
    }

    if ($httpCode !== 200) {
        echo "ScraperAPI returned HTTP $httpCode for: $targetUrl" . PHP_EOL;
        return false;
    }

    return $html;
}

$apiKey = 'your_api_key_here';

// Basic scrape
$html = scrape_via_api("https://books.toscrape.com/", $apiKey);

if ($html) {
    echo "Basic scrape: " . strlen($html) . " bytes." . PHP_EOL;
}

// JavaScript rendered scrape
$html = scrape_via_api("https://example-js-site.com/products", $apiKey, [
    'render' => 'true',
]);

if ($html) {
    echo "JS rendered scrape: " . strlen($html) . " bytes." . PHP_EOL;
}
?>

Output:

Basic scrape: 51274 bytes.
JS rendered scrape: 187432 bytes.

Tracking API Credits

ScraperAPI bills by credits rather than by request: a basic request uses 1 credit, a request with JavaScript rendering uses 5, and a request through premium residential proxies uses 10-25. Track your usage to avoid hitting your monthly limit mid-scrape:

<?php
function get_scraperapi_usage($apiKey) {
    $url = "https://api.scraperapi.com/account?api_key={$apiKey}";

    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 15,
    ]);

    $response = curl_exec($ch);
    curl_close($ch);

    $data = json_decode($response, true);

    if (!$data) {
        echo "Could not fetch usage data." . PHP_EOL;
        return false;
    }

    echo "ScraperAPI Account Usage:" . PHP_EOL;
    echo "  Plan: "            . ($data['plan']              ?? 'N/A') . PHP_EOL;
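    $used  = $data['requestCount'] ?? null;
    $limit = $data['requestLimit'] ?? null; // plan limit, not the remaining balance
    echo "  Credits used: "      . ($used ?? 'N/A') . PHP_EOL;
    echo "  Credits remaining: " . (isset($used, $limit) ? $limit - $used : 'N/A') . PHP_EOL;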
    echo "  Concurrent limit: " . ($data['concurrencyLimit'] ?? 'N/A') . PHP_EOL;

    return $data;
}

$usage = get_scraperapi_usage('your_api_key_here');
?>

Output:

ScraperAPI Account Usage:
  Plan: Hobby
  Credits used: 1250
  Credits remaining: 98750
  Concurrent limit: 5

Pricing

ScraperAPI pricing as of 2026:

Free trial – 1,000 API credits, no credit card required
Hobby – $49/month – 100,000 credits, 5 concurrent threads
Startup – $149/month – 500,000 credits, 10 concurrent threads
Business – $299/month – 3,000,000 credits, 25 concurrent threads

Remember that JavaScript rendering costs 5 credits per request and premium residential proxies cost 10-25 credits. A plan that looks large enough can run out faster than expected if you’re using those features heavily.
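
To make that budgeting concrete, here is a rough sketch that adds up the credits a job would consume. The helper name estimate_credits and the job mix are hypothetical, and the per-request costs are simply the figures quoted above, passed in as data so you can swap in the numbers from ScraperAPI's current documentation.

```php
<?php
// Rough credit estimate for a scraping job (illustrative helper, not part of ScraperAPI).
// $jobs maps a label to [number of requests, credits charged per request].
function estimate_credits(array $jobs): int {
    $total = 0;
    foreach ($jobs as [$requests, $creditsPerRequest]) {
        $total += $requests * $creditsPerRequest;
    }
    return $total;
}

// Assumed job mix. Per-request costs are the figures quoted in this article.
$jobs = [
    'basic'    => [20000, 1],  // plain requests
    'rendered' => [10000, 5],  // render=true
    'premium'  => [2000, 25],  // premium proxies, worst case
];

$needed = estimate_credits($jobs);
$plan   = 100000; // Hobby plan credits

echo "Estimated credits: $needed of $plan" . PHP_EOL;
echo ($needed > $plan ? "Over the plan limit." : "Within the plan limit.") . PHP_EOL;
```

Output:

Estimated credits: 120000 of 100000
Over the plan limit.

Only 32,000 requests are involved here, yet the job overshoots a 100,000-credit plan because most of the spend comes from rendering and premium proxies.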

Pros

Minimal code change – drop-in replacement for your existing cURL URL
Handles proxy rotation, CAPTCHA solving, and browser fingerprinting automatically
JavaScript rendering available without setting up a browser
Free trial with 1,000 credits – enough to test properly
Reliable success rates on sites that block regular cURL requests

Cons

Monthly cost starting at $49 – not justified for small scraping projects
Slower than direct requests – adds network latency routing through their servers
JavaScript rendering costs 5x the credits of a basic request
You’re dependent on their service availability and pricing changes
Credit limits can be hit faster than expected on large scraping jobs

When to Use It

Use ScraperAPI when your scraper is getting blocked at scale and you’ve already tried proper headers, delays, and cookie handling. It makes economic sense when the alternative is spending developer time managing your own proxy infrastructure. For small projects under 10,000 pages per month, free tools with proper configuration handle most blocking issues without paying $49/month.

Cost

Free trial with 1,000 credits. Paid plans from $49/month. See scraperapi.com/pricing for current rates.

7. Bright Data (Paid)

Bright Data is the largest commercial web scraping infrastructure provider. It offers residential proxies, datacenter proxies, ISP proxies, and a full scraping browser – a real Chromium instance with built-in unblocking. Unlike ScraperAPI, which is a simple API wrapper, Bright Data is a full platform with multiple products targeting different scraping needs.

It’s built for enterprise scale – thousands of concurrent requests, millions of pages per day, and access to data from sites that block every other approach.

Bright Data Products

Bright Data has several distinct products – understanding which one fits your use case prevents paying for features you don’t need:

Residential Proxies – real home IP addresses from real devices. Hardest to detect, highest success rate on tough targets. Priced per GB of data transferred.
Datacenter Proxies – server IPs, faster and cheaper than residential. Good for sites without aggressive bot detection.
ISP Proxies – static residential IPs assigned by ISPs. More stable than rotating residential proxies.
Scraping Browser – a real Chromium browser with built-in unblocking. Handles JavaScript, CAPTCHAs, and fingerprinting automatically.
SERP API – specifically for scraping search engine results pages.
Datasets – pre-scraped data you can purchase instead of scraping yourself.

Using Bright Data Residential Proxies With cURL

<?php
function scrape_via_bright_data($url, $proxyConfig) {
    $ch = curl_init();

    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 15,
        CURLOPT_TIMEOUT        => 60,
        CURLOPT_ENCODING       => '',

        // Bright Data proxy configuration
        CURLOPT_PROXY          => $proxyConfig['host'] . ':' . $proxyConfig['port'],
        CURLOPT_PROXYTYPE      => CURLPROXY_HTTP,
        CURLOPT_PROXYUSERPWD   => $proxyConfig['username'] . ':' . $proxyConfig['password'],

        CURLOPT_HTTPHEADER     => [
            'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.5',
            // No manual Accept-Encoding header: CURLOPT_ENCODING => '' already
            // advertises only the encodings this cURL build can decode
            'Connection: keep-alive',
        ],
    ]);

    $html     = curl_exec($ch);
    $errno    = curl_errno($ch);
    $error    = curl_error($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($errno) {
        echo "Proxy error: $error" . PHP_EOL;
        return false;
    }

    if ($httpCode !== 200) {
        echo "HTTP $httpCode via Bright Data proxy." . PHP_EOL;
        return false;
    }

    return $html;
}

// Bright Data proxy credentials - copy the exact host, port, and username from your dashboard
$proxyConfig = [
    'host'     => 'brd.superproxy.io',
    'port'     => '22225',
    'username' => 'your-username-country-us', // append country targeting
    'password' => 'your-proxy-password',
];

$html = scrape_via_bright_data("https://example.com/products", $proxyConfig);

if ($html) {
    echo "Fetched via Bright Data: " . strlen($html) . " bytes." . PHP_EOL;
}
?>

Output:

Fetched via Bright Data: 48291 bytes.

Using the Scraping Browser

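The Scraping Browser is Bright Data’s most powerful product – a real Chromium instance hosted on their infrastructure that handles JavaScript, CAPTCHAs, and browser fingerprinting automatically. It is normally driven over WebSocket with the Chrome DevTools Protocol, which in practice means Puppeteer or Playwright. From PHP, the simpler route is to send the target URL to Bright Data’s REST endpoint and let their side do the rendering: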

<?php
// Bright Data Scraping Browser uses WebSocket + CDP (Chrome DevTools Protocol)
// The simplest integration from PHP is via their REST endpoint

function scrape_via_bright_data_browser($url, $credentials) {
    $apiEndpoint = "https://api.brightdata.com/request";

    $payload = json_encode([
        'zone'    => $credentials['zone'],
        'url'     => $url,
        'format'  => 'raw',
        'render'  => true,   // execute JavaScript
    ]);

    $ch = curl_init();

    curl_setopt_array($ch, [
        CURLOPT_URL            => $apiEndpoint,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $payload,
        CURLOPT_TIMEOUT        => 90,
        CURLOPT_HTTPHEADER     => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . $credentials['api_token'],
        ],
    ]);

    $response = curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($httpCode !== 200) {
        echo "Bright Data Browser error: HTTP $httpCode" . PHP_EOL;
        return false;
    }

    return $response;
}

$credentials = [
    'zone'      => 'your_zone_name',
    'api_token' => 'your_api_token',
];

$html = scrape_via_bright_data_browser(
    "https://example-js-site.com/products",
    $credentials
);

if ($html) {
    echo "JS-rendered page fetched: " . strlen($html) . " bytes." . PHP_EOL;

    // Parse with DOMDocument as normal
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath    = new DOMXPath($dom);
    $products = $xpath->query('//div[contains(@class,"product-card")]');
    echo "Products found: " . $products->length . PHP_EOL;
}
?>

Output:

JS-rendered page fetched: 187432 bytes.
Products found: 24

Targeting Specific Countries

Bright Data lets you route requests through specific countries – useful when scraping sites that show different prices or content based on location:

<?php
// Route through specific countries by modifying the username
$proxyConfigs = [
    'us' => [
        'host'     => 'brd.superproxy.io',
        'port'     => '22225',
        'username' => 'your-username-country-us',
        'password' => 'your-password',
    ],
    'uk' => [
        'host'     => 'brd.superproxy.io',
        'port'     => '22225',
        'username' => 'your-username-country-gb',
        'password' => 'your-password',
    ],
    'de' => [
        'host'     => 'brd.superproxy.io',
        'port'     => '22225',
        'username' => 'your-username-country-de',
        'password' => 'your-password',
    ],
];

// Compare prices across regions
foreach ($proxyConfigs as $country => $config) {
    $html = scrape_via_bright_data(
        "https://example.com/product/12345",
        $config
    );

    if ($html) {
        libxml_use_internal_errors(true);
        $dom = new DOMDocument();
        $dom->loadHTML($html);
        libxml_clear_errors();

        $xpath = new DOMXPath($dom);
        $price = $xpath->query('//*[@class="price"]')->item(0);
        $priceText = $price ? trim($price->textContent) : 'N/A';

        echo strtoupper($country) . ": $priceText" . PHP_EOL;
    }

    sleep(2);
}
?>

Output:

US: $49.99
UK: £42.99
DE: €46.99

Pricing

Bright Data pricing as of 2026 – billed by data transferred, not by request count:

Residential Proxies – from $8.40/GB. Minimum commitment varies by plan.
Datacenter Proxies – from $0.60/GB. Significantly cheaper than residential.
ISP Proxies – from $7.14/GB.
Scraping Browser – from $0.001 per page. Custom pricing for high volume.
Free trial – available on request with account verification.

Costs add up quickly on large scraping jobs. A project fetching 100,000 pages averaging 50KB each transfers 5GB of data – $42 at residential proxy rates. Factor this into your budget before choosing Bright Data over cheaper alternatives.
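
The same arithmetic can be wrapped in a small helper so you can compare proxy types before committing. This is a back-of-the-envelope sketch: estimate_transfer_cost is a hypothetical name, the per-GB rates are the ones listed above, and it assumes decimal units (1 GB = 1,000,000 KB) to match the calculation in the previous paragraph.

```php
<?php
// Back-of-the-envelope bandwidth cost (hypothetical helper).
// Uses decimal units (1 GB = 1,000,000 KB) to match the arithmetic above.
function estimate_transfer_cost(int $pages, float $avgPageKb, float $pricePerGb): float {
    $gigabytes = ($pages * $avgPageKb) / 1000000;
    return round($gigabytes * $pricePerGb, 2);
}

// 100,000 pages at 50KB each, priced at the per-GB rates listed in this article
echo '$' . number_format(estimate_transfer_cost(100000, 50, 8.40), 2) . ' residential' . PHP_EOL;
echo '$' . number_format(estimate_transfer_cost(100000, 50, 0.60), 2) . ' datacenter' . PHP_EOL;
```

Output:

$42.00 residential
$3.00 datacenter

Treat the result as a rough figure. Retries, request overhead, and response compression all change how much data is actually transferred.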

Pros

Largest residential proxy network available – 72 million IPs across 195 countries
Highest success rates on sites that block everything else
Country-level targeting for geo-specific scraping
Scraping Browser handles JavaScript and CAPTCHAs automatically
Enterprise-grade reliability and SLA
Dedicated account managers on higher plans

Cons

Expensive – pricing by GB means costs scale directly with data volume
Complex platform with multiple products – takes time to understand which one fits your use case
Overkill for small to medium scraping projects
Minimum commitments on some plans
Account verification required before starting

When to Use It

Use Bright Data when you’re scraping at enterprise scale – millions of pages per month – or when you need to collect geo-specific data from multiple countries simultaneously. For most PHP scraping projects, free tools with proper configuration or ScraperAPI at $49/month are more appropriate. Bright Data makes sense when ScraperAPI’s limits or success rates aren’t enough for your volume.

Cost

Residential proxies from $8.40/GB. See brightdata.com/pricing for current rates across all products.

How to Choose the Right PHP Web Scraping Tool

The right tool depends on four factors – target site type, scraping volume, your hosting environment, and whether you’re using a framework. Work through these questions in order:

Step 1: Does the Target Site Use JavaScript?

Site is static HTML (content in cURL response)
→ Use cURL + DOMDocument, Guzzle, or Goutte

Site uses JavaScript rendering (content missing from cURL response)
→ Free option: Symfony Panther (needs Chrome on server)
→ Paid option: ScraperAPI with render=true or Bright Data Scraping Browser
→ DIY option: Puppeteer via Node.js called from PHP

Step 2: What Volume Are You Scraping?

Under 1,000 pages per day
→ Any free tool works. No need for paid services.

1,000 - 50,000 pages per day
→ Free tools with proxy rotation.
→ ScraperAPI if getting blocked consistently.

Over 50,000 pages per day
→ Guzzle with concurrent requests + proxy rotation.
→ ScraperAPI Business plan or Bright Data.

Step 3: Are You Using a Framework?

Laravel project
→ Guzzle (already included), DomCrawler for parsing

Symfony project
→ DomCrawler or Panther (same ecosystem)

Standalone PHP script
→ cURL + DOMDocument (no Composer needed)

Any Composer project
→ Goutte (simplest combined fetch and parse)

Step 4: What Does Your Hosting Support?

Shared hosting
→ cURL + DOMDocument or Goutte only
→ Panther won't work (no Chrome available)
→ Paid API services work (just HTTP requests)

VPS or dedicated server
→ All free tools available
→ Panther and Puppeteer both work

Serverless / cloud functions
→ ScraperAPI or Bright Data
→ No persistent processes for browser tools

Decision Summary

Situation	Best Tool
Learning PHP scraping for the first time	cURL + DOMDocument
Laravel or Symfony project	Guzzle + DomCrawler
Want CSS selectors without framework setup	Goutte
JavaScript sites, free option, VPS available	Symfony Panther
Getting blocked at medium volume	ScraperAPI
Enterprise scale or geo-specific data	Bright Data
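
If you find yourself making this decision repeatedly, the steps above can be written down as a small function. This is only a sketch of this guide's rules of thumb: suggest_tool and its parameters are hypothetical, and the thresholds are the ones used in Step 2.

```php
<?php
// Hypothetical helper that mirrors the decision steps above.
// $hosting is one of 'shared', 'vps', or 'serverless'.
function suggest_tool(bool $needsJs, int $pagesPerDay, string $hosting, bool $blocked = false): string {
    if ($needsJs) {
        // Panther needs Chrome, which only a VPS or dedicated server can run
        return $hosting === 'vps' ? 'Symfony Panther' : 'ScraperAPI with render=true';
    }
    if ($pagesPerDay > 50000) {
        return 'Guzzle with concurrent requests and proxy rotation, or a paid service';
    }
    if ($pagesPerDay >= 1000) {
        return $blocked ? 'ScraperAPI' : 'Free tools with proxy rotation';
    }
    return 'cURL + DOMDocument';
}

echo suggest_tool(false, 200, 'shared') . PHP_EOL;
echo suggest_tool(true, 200, 'shared') . PHP_EOL;
echo suggest_tool(true, 200, 'vps') . PHP_EOL;
```

Output:

cURL + DOMDocument
ScraperAPI with render=true
Symfony Panther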

Frequently Asked Questions

What is the best PHP web scraping tool for beginners?

PHP cURL with DOMDocument. It’s built into PHP, requires no installation, works on every hosting environment, and teaches you exactly what’s happening at the HTTP level. Once you understand how requests and HTML parsing work at this level, moving to Goutte or Guzzle is straightforward. The PHP web scraper beginner guide walks through the complete setup with working code.

Can PHP scrape JavaScript websites without paid tools?

Yes – with Symfony Panther on a VPS or dedicated server where Chrome can run. It’s free, PHP-native, and handles full JavaScript execution including infinite scroll and dynamic content. The limitation is hosting – shared hosting environments don’t support running Chrome processes. If you’re on shared hosting, ScraperAPI with render=true is the practical alternative.

Is Goutte still actively maintained?

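Goutte is no longer actively developed. Its GitHub repository is archived, and version 4 is a thin wrapper around Symfony’s HttpBrowser class from the BrowserKit component. The components that do the real work, including Symfony DomCrawler, are still actively maintained, and that’s what matters for functionality. For new projects the more future-proof approach is using Guzzle and DomCrawler directly rather than through the Goutte wrapper. The parsing API is the same, since Goutte returns DomCrawler objects, and you avoid the extra dependency.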

Do I need a paid scraping service for most projects?

No. The majority of scraping projects – price monitoring, content aggregation, data collection from public sites – work fine with free tools and proper configuration. Add complete browser headers, random delays, and cookie handling and most blocking issues disappear. The avoiding blocks guide covers all seven techniques with working code. Paid services make sense when you’re hitting volume limits that free tools can’t overcome.
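
As a minimal sketch of those three measures with plain cURL: the function names below are hypothetical, and the header values, delay range, and cookie file location are illustrative assumptions to adjust per site.

```php
<?php
// Browser-like headers plus a persistent cookie jar.
// Header values and the cookie file location are illustrative assumptions.
function polite_curl_options(string $url, string $cookieFile): array {
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_ENCODING       => '',          // advertise every encoding this cURL build can decode
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is released
        CURLOPT_COOKIEFILE     => $cookieFile, // and read back on the next request
        CURLOPT_HTTPHEADER     => [
            'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.5',
        ],
    ];
}

// Random pause in microseconds so requests don't arrive at a fixed rhythm.
function random_delay_us(float $minSeconds = 1.5, float $maxSeconds = 4.0): int {
    return random_int((int) round($minSeconds * 1000000), (int) round($maxSeconds * 1000000));
}

// Fetch each URL in turn, sharing cookies and pausing between requests.
function crawl_politely(array $urls, string $cookieFile): array {
    $pages = [];
    $first = true;
    foreach ($urls as $url) {
        if (!$first) {
            usleep(random_delay_us());
        }
        $first = false;

        $ch = curl_init();
        curl_setopt_array($ch, polite_curl_options($url, $cookieFile));
        $pages[$url] = curl_exec($ch); // HTML string, or false on failure
        curl_close($ch);
        unset($ch); // release the handle so the cookie jar is flushed to disk
    }
    return $pages;
}
```

Calling crawl_politely() with a list of URLs and a writable cookie file path, for example sys_get_temp_dir() . '/scraper_cookies.txt', returns an array keyed by URL. Nothing runs until you call it, so the block is safe to paste into an existing script.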

What is the difference between ScraperAPI and Bright Data?

ScraperAPI is simpler and cheaper – a single API endpoint you point at any URL, starting at $49/month for 100,000 credits. Good for medium-scale scraping where you’re getting blocked and want a quick fix. Bright Data is a full infrastructure platform – multiple proxy types, country targeting, enterprise SLAs, and a dedicated scraping browser. It’s significantly more expensive and complex but handles scale and success rates that ScraperAPI can’t match on the toughest targets.

Can I use multiple tools in the same project?

Yes – and it’s often the right approach. Use cURL with DOMDocument for static pages, ScraperAPI for pages that block you, and Panther for JavaScript-rendered content. Wrap each in a function with a consistent interface so your scraping logic doesn’t need to change based on which tool is fetching. The decision of which tool to use for a given URL can be handled by checking the response – if cURL returns a blocked page or empty content, fall back to a more capable tool automatically.
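
One way to sketch that fallback logic is shown below. The names looks_blocked and fetch_with_fallback are hypothetical, the block markers and the 500-byte threshold are assumptions you would tune per site, and the fetchers are stubs so the example runs offline. In a real project they would wrap your cURL function, scrape_via_api(), and a Panther client.

```php
<?php
// Heuristic block check. The markers and size threshold are assumptions; tune them per target site.
function looks_blocked($html): bool {
    if (!is_string($html) || strlen(trim($html)) < 500) {
        return true; // failed fetch or suspiciously small body
    }
    foreach (['captcha', 'access denied', 'unusual traffic'] as $marker) {
        if (stripos($html, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// Try each fetcher in order and return the first usable result.
// $fetchers maps a label to a callable that takes a URL and returns HTML or false.
function fetch_with_fallback(string $url, array $fetchers): ?array {
    foreach ($fetchers as $name => $fetcher) {
        $html = $fetcher($url);
        if (!looks_blocked($html)) {
            return ['tool' => $name, 'html' => $html];
        }
    }
    return null; // every tool failed
}

// Stub fetchers standing in for the real tools, cheapest first
$fetchers = [
    'curl'       => fn($url) => '<html>Access denied</html>',
    'scraperapi' => fn($url) => '<html>' . str_repeat('<p>product</p>', 100) . '</html>',
];

$result = fetch_with_fallback('https://example.com/products', $fetchers);
echo ($result ? "Fetched with: {$result['tool']}" : "All tools failed.") . PHP_EOL;
```

Output:

Fetched with: scraperapi

Ordering the fetchers from cheapest to most expensive means paid credits are only spent on URLs the free tools could not handle.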

Summary: Best Web Scraping Tools PHP in 2026

The best PHP web scraping tools in 2026 cover every scraping scenario, from simple static sites to enterprise-scale JavaScript rendering:

cURL + DOMDocument – start here. Free, built-in, full control. Handles most static scraping projects without any additional tools.
Guzzle – cleaner HTTP requests with concurrent fetching. The standard choice in Laravel and Symfony projects.
Goutte – combined fetch and parse with CSS selectors. Less setup than cURL + DOMDocument for simple scraping jobs.
Symfony DomCrawler – the best free HTML parser for Composer projects. Use alongside Guzzle for clean separation of HTTP and parsing logic.
Symfony Panther – full JavaScript execution in PHP. The only free option in this guide for dynamic content on a server where Chrome is available.
ScraperAPI – drop-in proxy and unblocking service. Worth the $49/month when free tools consistently fail at scale.
Bright Data – enterprise scraping infrastructure. The right choice for millions of pages per month or geo-specific data collection.

Start with cURL and DOMDocument. Add tools only when you hit a problem they can’t solve. Most scraping projects never need to go beyond the free options.

For implementing your first scraper with cURL and DOMDocument, the PHP cURL web scraping complete guide covers every detail with working code. For handling the errors that come up in real projects, the web scraping errors guide covers the seven most common failures and their fixes.

Learn more about cURL from the official PHP documentation.