Choosing the best PHP web scraping tool depends entirely on what you're building. PHP's cURL extension handles most scraping jobs on its own. Add DOMDocument for parsing and you can scrape the majority of static websites without installing anything extra. But as projects grow – larger volumes, JavaScript sites, block avoidance, proxy rotation – the built-in tools start showing their limits.
This guide covers the PHP web scraping tools worth knowing in 2026 – free libraries, paid services, and headless browser options. Each one includes working code, honest pros and cons, pricing where relevant, and clear guidance on when it's the right choice.
Best PHP Web Scraping Tools: Quick Comparison
| Tool | Type | Best For | Cost | JS Support |
|---|---|---|---|---|
| PHP cURL + DOMDocument | Built-in | Static sites, full control | Free | No |
| Guzzle | Library | HTTP requests in Laravel/Composer projects | Free | No |
| Goutte | Library (deprecated) | Simple scraping with CSS selectors | Free | No |
| Symfony DomCrawler | Library | HTML parsing with CSS and XPath | Free | No |
| Symfony Panther | Library | JavaScript-rendered pages in PHP | Free | Yes |
| ScraperAPI | Service | Avoiding blocks at scale | From $49/mo | Optional |
| Bright Data | Service | Enterprise scraping, large proxy pool | Custom pricing | Yes |
What You Need Before Choosing a Tool
Answer these three questions first – they determine which tool is right before you write a single line of code:
- Is the target site static or JavaScript-rendered? – Static HTML: any free library works. JavaScript content: you need Panther, Puppeteer, or a paid service with rendering support.
- What volume are you scraping? – As a rough guide, under about 1,000 pages per day free tools are usually fine. Beyond that, or on sites with aggressive bot protection, you're increasingly likely to hit IP blocks and need proxies or a paid service.
- Are you using a framework? – Laravel or Symfony project: Guzzle or DomCrawler fit naturally. Standalone PHP script: cURL with DOMDocument is the simplest choice.
The free tools covered in this guide handle the vast majority of real scraping projects. The paid services are worth considering only when volume, blocks, or JavaScript rendering become problems you can’t solve cheaply.
1. PHP cURL + DOMDocument (Free – Built-in)
cURL handles HTTP requests. DOMDocument parses the HTML response. Together they cover everything you need to scrape static websites – no Composer packages, no extra dependencies. Both ship with PHP as standard extensions (curl and dom) and are enabled on nearly every hosting environment.
This is the right starting point for anyone new to PHP scraping, and the right tool for most production scraping projects that target static HTML sites.
Basic Usage
<?php
// Fetch the page
$ch = curl_init();
curl_setopt_array($ch, [
CURLOPT_URL => "https://books.toscrape.com/",
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_CONNECTTIMEOUT => 10,
CURLOPT_TIMEOUT => 30,
CURLOPT_ENCODING => '', // accept every compression format cURL supports and decode it automatically
CURLOPT_HTTPHEADER => [
'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.5',
],
]);
$html = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($httpCode !== 200 || !$html) {
exit("Request failed: HTTP $httpCode" . PHP_EOL);
}
// Parse with DOMDocument
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$books = $xpath->query('//article[contains(@class,"product_pod")]');
foreach ($books as $book) {
$titleNode = $xpath->query('.//h3/a', $book)->item(0);
$priceNode = $xpath->query('.//*[contains(@class,"price_color")]', $book)->item(0);
$title = $titleNode ? $titleNode->getAttribute('title') : 'N/A';
$price = $priceNode ? trim($priceNode->textContent) : 'N/A';
echo "$title - $price" . PHP_EOL;
}
?>
Output:
A Light in the Attic - £51.77
Tipping the Velvet - £53.74
Soumission - £50.10
Sharp Objects - £47.82
...
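The null checks in the loop above (`$titleNode ? ... : 'N/A'`) repeat for every field you extract. One way to cut that down is a small helper that runs a relative XPath query and falls back to a default. This is a minimal sketch; `xpath_value` is a hypothetical name, not a DOMXPath method, and it only looks at the first match:

```php
<?php
// Hypothetical helper (not part of DOMXPath): return the trimmed text of the first
// node matching $query, or the value of $attribute on that node if one is given.
// Falls back to $default when nothing matches.
function xpath_value(
    DOMXPath $xpath,
    string $query,
    ?DOMNode $context = null,
    ?string $attribute = null,
    string $default = 'N/A'
): string {
    $nodes = $xpath->query($query, $context);
    $node = $nodes === false ? null : $nodes->item(0);
    if ($node === null) {
        return $default;
    }
    if ($attribute !== null) {
        // Only elements carry attributes; anything else falls back to the default
        return ($node instanceof DOMElement && $node->hasAttribute($attribute))
            ? $node->getAttribute($attribute)
            : $default;
    }
    return trim($node->textContent);
}
```

Inside the loop, `$title = xpath_value($xpath, './/h3/a', $book, 'title');` and `$price = xpath_value($xpath, './/*[contains(@class,"price_color")]', $book);` replace the four lines that fetch and check each node.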
Pros
- Zero installation – works on any PHP install with the curl and dom extensions enabled, which is the default on most hosts
- Full control over every request option – headers, timeouts, cookies, proxies, redirects
- No dependencies to maintain or update
- Fast – no framework overhead
- Handles POST requests, file uploads, cookie sessions, and authentication
Cons
- More verbose than library alternatives – setting up each request takes more code
- No JavaScript execution – cannot scrape content loaded after page render
- XPath syntax has a learning curve compared to CSS selectors
- No built-in retry logic or rate limiting – you write that yourself
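Retry logic doesn't need a library. Below is a minimal sketch of a retry wrapper with exponential backoff. `with_retry` is a hypothetical helper, and it assumes your fetch closure returns `false` on failure, so a legitimate `false` result can't be told apart from a failed attempt:

```php
<?php
// Hypothetical helper: call $attempt up to $maxAttempts times until it returns
// something other than false. Between failed tries it sleeps for
// $baseDelayMs, then 2x, then 4x ... (exponential backoff).
function with_retry(callable $attempt, int $maxAttempts = 3, int $baseDelayMs = 1000): mixed
{
    for ($try = 1; $try <= $maxAttempts; $try++) {
        // The attempt number is passed in, e.g. for logging
        $result = $attempt($try);
        if ($result !== false) {
            return $result;
        }
        if ($try < $maxAttempts) {
            // Delay doubles after each failure: base * 2^(try - 1) milliseconds
            usleep($baseDelayMs * 1000 * (2 ** ($try - 1)));
        }
    }
    return false; // every attempt failed
}
```

To use it with the cURL example, move the `curl_init()` through `curl_close()` lines into a closure that returns `$html` when `$httpCode === 200` and `false` otherwise, then call `$html = with_retry($fetch, 3, 1000);`.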
When to Use It
Use cURL with DOMDocument when you’re scraping static HTML sites, when you need full control over the request, or when you’re working on a server without Composer. It’s the most flexible option and handles everything from simple one-page scrapers to complex multi-page jobs with session handling and proxy rotation.
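As a sketch of what session handling and proxy rotation look like with plain cURL options: `CURLOPT_COOKIEFILE` and `CURLOPT_COOKIEJAR` let cookies persist across requests through a file, and `CURLOPT_PROXY` routes a request through a proxy. The helper names and the round-robin strategy are assumptions for illustration, not a fixed recipe:

```php
<?php
// Hypothetical helper: cURL options for one request in a scraping session.
function session_curl_options(string $url, string $cookieFile, ?string $proxy = null): array
{
    $options = [
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT => 30,
        // Send cookies stored in this file with the request...
        CURLOPT_COOKIEFILE => $cookieFile,
        // ...and save cookies from responses back to it when the handle is released
        CURLOPT_COOKIEJAR => $cookieFile,
    ];
    if ($proxy !== null) {
        // e.g. "http://user:pass@203.0.113.10:8080" (placeholder address)
        $options[CURLOPT_PROXY] = $proxy;
    }
    return $options;
}

// Hypothetical round-robin rotation: request N uses proxy N mod count(proxies)
function pick_proxy(array $proxies, int $requestIndex): ?string
{
    $proxies = array_values($proxies);
    return $proxies === [] ? null : $proxies[$requestIndex % count($proxies)];
}
```

In a loop, `curl_setopt_array($ch, session_curl_options($url, __DIR__ . '/cookies.txt', pick_proxy($proxies, $i)));` gives each request the next proxy while sharing one cookie file. Round-robin is the simplest rotation; skipping proxies that keep failing is a common refinement.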
For a complete working implementation with error handling, retry logic, pagination, and MySQL storage, the PHP cURL web scraping complete guide covers every detail.
Cost
Free. Built into PHP.
2. Guzzle (Free – Composer)
Guzzle is a PHP HTTP client library that makes sending requests cleaner and more readable than raw cURL. It handles the same tasks – fetching pages, sending headers, managing cookies, following redirects – but with a more expressive API. Laravel's built-in HTTP client is a wrapper around Guzzle, and it's a common choice in other Composer-based projects.
If you’re already using Composer and want cleaner request code without the verbose cURL setup, Guzzle is the natural choice.
Installation
composer require guzzlehttp/guzzle
Basic Usage
<?php
require 'vendor/autoload.php';
use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;
$client = new Client([
'timeout' => 30,
'connect_timeout' => 10,
'headers' => [
'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language' => 'en-US,en;q=0.5',
],
]);
try {
$response = $client->get("https://books.toscrape.com/");
$html = (string) $response->getBody();
$status = $response->getStatusCode();
echo "Status: $status" . PHP_EOL;
echo "Response size: " . strlen($html) . " bytes." . PHP_EOL;
} catch (RequestException $e) {
echo "Request failed: " . $e->getMessage() . PHP_EOL;
}
?>
Output:
Status: 200
Response size: 51274 bytes.
Using Guzzle With DOMDocument for Parsing
Guzzle only handles HTTP requests – it doesn’t parse HTML. Combine it with DOMDocument for extraction:
<?php
require 'vendor/autoload.php';
use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;
$client = new Client([
'timeout' => 30,
'connect_timeout' => 10,
'headers' => [
'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
],
]);
try {
$response = $client->get("https://books.toscrape.com/");
$html = (string) $response->getBody();
} catch (RequestException $e) {
exit("Failed: " . $e->getMessage() . PHP_EOL);
}
// Parse with DOMDocument - same as with cURL
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$books = $xpath->query('//article[contains(@class,"product_pod")]');
echo "Books found: " . $books->length . PHP_EOL . PHP_EOL;
foreach ($books as $book) {
$titleNode = $xpath->query('.//h3/a', $book)->item(0);
$priceNode = $xpath->query('.//*[contains(@class,"price_color")]', $book)->item(0);
$title = $titleNode ? $titleNode->getAttribute('title') : 'N/A';
$price = $priceNode ? trim($priceNode->textContent) : 'N/A';
echo "$title - $price" . PHP_EOL;
}
?>
Output:
Books found: 20
A Light in the Attic - £51.77
Tipping the Velvet - £53.74
Soumission - £50.10
...
Guzzle Concurrency – Fetching Multiple Pages Simultaneously
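Guzzle's biggest practical advantage over raw cURL for scraping is easy concurrent requests – sending several requests at the same time instead of one after another. (Raw cURL can do this with the curl_multi_* functions, but it takes considerably more code.) On a multi-page scrape this can cut total time substantially: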
<?php
require 'vendor/autoload.php';
use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
$client = new Client([
'timeout' => 30,
'connect_timeout' => 10,
'headers' => [
'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
],
]);
// Build list of URLs to fetch
$urls = [];
for ($i = 1; $i <= 10; $i++) {
$urls[] = "https://books.toscrape.com/catalogue/page-$i.html";
}
// Create request generator
$requests = function($urls) {
foreach ($urls as $url) {
yield new Request('GET', $url);
}
};
$results = [];
$failed = [];
$startTime = microtime(true);
// Send up to 3 concurrent requests at a time
$pool = new Pool($client, $requests($urls), [
'concurrency' => 3,
'fulfilled' => function($response, $index) use (&$results, $urls) {
$results[$urls[$index]] = (string) $response->getBody();
echo "Fetched: " . $urls[$index] . PHP_EOL;
},
'rejected' => function($reason, $index) use (&$failed, $urls) {
$failed[] = $urls[$index];
echo "Failed: " . $urls[$index] . " - " . $reason->getMessage() . PHP_EOL;
},
]);
$promise = $pool->promise();
$promise->wait();
$duration = round(microtime(true) - $startTime, 2);
echo PHP_EOL . "Fetched " . count($results) . " pages in {$duration}s." . PHP_EOL;
echo "Failed: " . count($failed) . PHP_EOL;
?>
Output:
Fetched: https://books.toscrape.com/catalogue/page-1.html
Fetched: https://books.toscrape.com/catalogue/page-2.html
Fetched: https://books.toscrape.com/catalogue/page-3.html
...
Fetched 10 pages in 3.84s.
Fetched sequentially with cURL and a 1-second delay between requests, the same 10 pages would take well over 10 seconds. With 3 concurrent requests, this run finished in under 4 seconds – a difference that adds up when scraping hundreds of pages.
Keep concurrency low – 3 to 5 simultaneous requests is enough for most sites. Higher values increase the chance of triggering rate limits.
Pros
- Cleaner, more readable request code than raw cURL
- Built-in concurrent requests via Pool – significant speed improvement on multi-page scrapes
- Standard in Laravel – no extra setup if already using the framework
- Clean exception handling with specific error types
- Middleware support for adding retry logic, logging, and authentication globally
Cons
- Requires Composer – not available on all shared hosting
- Doesn’t parse HTML – still need DOMDocument or another parser
- No JavaScript execution
- Adds a dependency to manage and keep updated
When to Use It
Use Guzzle when you’re already in a Composer-based project – Laravel, Symfony, or any modern PHP app. The concurrent request pool is particularly useful when you need to scrape many pages quickly. For standalone scripts or simple one-off scrapers, raw cURL is less setup.
Cost
Free. Open source under the MIT license.
3. Goutte (Free – Composer)
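Goutte is a PHP screen scraping library that combines an HTTP client with Symfony DomCrawler for HTML parsing. Instead of writing separate cURL and DOMDocument code, Goutte handles both in one object. You fetch a page and immediately use CSS selectors to extract data – no separate parsing setup required.
Note that Goutte is deprecated and no longer maintained. Since v4 it has been a thin wrapper around Symfony's HttpBrowser (from symfony/browser-kit, built on symfony/http-client rather than Guzzle), and its Client class extends HttpBrowser, so the same calls work on both. For new projects, install symfony/browser-kit, symfony/http-client, and symfony/css-selector, and replace use Goutte\Client; with use Symfony\Component\BrowserKit\HttpBrowser as Client;.
It's a good middle ground for developers who find raw cURL too verbose but don't need the full power of a paid service.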
Installation
composer require fabpot/goutte
Basic Usage
<?php
require 'vendor/autoload.php';
use Goutte\Client;
$client = new Client();
$crawler = $client->request('GET', 'https://books.toscrape.com/');
echo "Status: " . $client->getInternalResponse()->getStatusCode() . PHP_EOL;
echo "Title: " . $crawler->filter('title')->text() . PHP_EOL;
?>
Output:
Status: 200
Title: All products | Books to Scrape - Sandbox
Extracting Data With CSS Selectors
Goutte lets you use CSS selectors instead of XPath – more familiar to developers who work with frontend code:
<?php
require 'vendor/autoload.php';
use Goutte\Client;
$client = new Client();
$crawler = $client->request('GET', 'https://books.toscrape.com/');
// Select all book articles
$books = $crawler->filter('article.product_pod');
echo "Books found: " . $books->count() . PHP_EOL . PHP_EOL;
// Loop through each book and extract data
$books->each(function($book) {
// CSS selector within the book context
$title = $book->filter('h3 a')->attr('title');
$price = $book->filter('.price_color')->text();
$rating = $book->filter('.star-rating')->attr('class');
// Clean up rating - "star-rating Three" -> "Three"
$rating = str_replace('star-rating ', '', $rating);
echo "$title - $price - Rating: $rating" . PHP_EOL;
});
?>
Output:
Books found: 20
A Light in the Attic - £51.77 - Rating: Three
Tipping the Velvet - £53.74 - Rating: One
Soumission - £50.10 - Rating: One
Sharp Objects - £47.82 - Rating: Four
...
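Both values come back as display strings (a rating word and a price with a currency symbol), which are awkward to sort or store. Here is a minimal sketch of two hypothetical normalization helpers for use inside the callback above. `parse_price` assumes `.` is the decimal separator and `,` a thousands separator. That holds for this site but not for every locale:

```php
<?php
// Hypothetical helper: "star-rating Three" -> 3. Returns null if no rating word is found.
function rating_to_int(string $classAttr): ?int
{
    $words = ['One' => 1, 'Two' => 2, 'Three' => 3, 'Four' => 4, 'Five' => 5];
    // The class attribute is a space-separated list, so check each class name
    foreach (preg_split('/\s+/', trim($classAttr)) as $class) {
        if (isset($words[$class])) {
            return $words[$class];
        }
    }
    return null;
}

// Hypothetical helper: "£51.77" -> 51.77, "$1,299.00" -> 1299.0, "N/A" -> null.
// Strips everything except digits and dots, so "," is treated as a thousands separator.
function parse_price(string $text): ?float
{
    $clean = preg_replace('/[^0-9.]/', '', $text);
    return is_numeric($clean) ? (float) $clean : null;
}
```

Calling `rating_to_int($book->filter('.star-rating')->attr('class'))` and `parse_price($book->filter('.price_color')->text())` in the `each()` callback gives you an `int` and a `float` you can sort, filter, or store in numeric database columns.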
Following Links and Scraping Multiple Pages
Goutte can follow links directly without building URLs manually – useful for navigating pagination:
<?php
require 'vendor/autoload.php';
use Goutte\Client;
$client = new Client();
$url = 'https://books.toscrape.com/';
$page = 1;
$allBooks = [];
while ($url) {
echo "Scraping page $page..." . PHP_EOL;
$crawler = $client->request('GET', $url);
// Extract books on current page
$crawler->filter('article.product_pod')->each(function($book) use (&$allBooks) {
$allBooks[] = [
'title' => $book->filter('h3 a')->attr('title'),
'price' => $book->filter('.price_color')->text(),
];
});
echo "Page $page - " . count($allBooks) . " total books." . PHP_EOL;
// Find next page link
$nextLink = $crawler->filter('li.next a');
if ($nextLink->count() > 0) {
// Goutte resolves relative URLs automatically
$url = $nextLink->link()->getUri();
$page++;
sleep(1);
} else {
echo "Last page reached." . PHP_EOL;
$url = null;
}
}
echo PHP_EOL . "Total books scraped: " . count($allBooks) . PHP_EOL;
?>
Output:
Scraping page 1...
Page 1 - 20 total books.
Scraping page 2...
Page 2 - 40 total books.
...
Last page reached.
Total books scraped: 1000
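The loop above pauses with a fixed `sleep(1)` on top of however long the request and parsing took. If what you actually want is "at most one request per second", a small throttle that sleeps only for the remainder of the interval keeps the pace steady. A minimal sketch; `Throttle` is a hypothetical class, not part of Goutte:

```php
<?php
// Hypothetical helper: guarantees at least $minIntervalSeconds between calls to wait().
final class Throttle
{
    private ?float $lastCall = null;

    public function __construct(private float $minIntervalSeconds)
    {
    }

    public function wait(): void
    {
        if ($this->lastCall !== null) {
            $elapsed = microtime(true) - $this->lastCall;
            if ($elapsed < $this->minIntervalSeconds) {
                // Sleep only for the part of the interval that hasn't passed yet
                usleep((int) round(($this->minIntervalSeconds - $elapsed) * 1_000_000));
            }
        }
        $this->lastCall = microtime(true);
    }
}
```

Create it once with `$throttle = new Throttle(1.0);` and call `$throttle->wait();` right before each `$client->request(...)` in place of `sleep(1)`. The first call returns immediately.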
Adding Custom Headers to Goutte
By default Goutte sends "Symfony BrowserKit" as its User-Agent, which many sites flag as a bot. Override it, along with any other headers, using server parameters (HTTP_ plus the header name in upper case):
<?php
require 'vendor/autoload.php';
use Goutte\Client;
use Symfony\Component\HttpClient\HttpClient;
// Goutte 4 runs on Symfony HttpClient (not Guzzle) - configure timeouts there
$client = new Client(HttpClient::create([
'timeout' => 30,
]));
// BrowserKit turns HTTP_* server parameters into request headers,
// so these replace the default "Symfony BrowserKit" user agent
$client->setServerParameter('HTTP_USER_AGENT', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36');
$client->setServerParameter('HTTP_ACCEPT', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8');
$client->setServerParameter('HTTP_ACCEPT_LANGUAGE', 'en-US,en;q=0.5');
$crawler = $client->request('GET', 'https://books.toscrape.com/');
echo "Fetched with custom headers. Title: " . $crawler->filter('title')->text() . PHP_EOL;
?>
Output:
Fetched with custom headers. Title: All products | Books to Scrape - Sandbox
Submitting Forms With Goutte
Goutte can fill and submit HTML forms – useful for scraping sites that require search queries or login:
<?php
require 'vendor/autoload.php';
use Goutte\Client;
$client = new Client();
$crawler = $client->request('GET', 'https://example.com/search');
// Find the search form and fill it
$form = $crawler->selectButton('Search')->form();
$form['q'] = 'php scraping';
// Submit the form
$resultCrawler = $client->submit($form);
// Extract results
$results = $resultCrawler->filter('.search-result');
echo "Results found: " . $results->count() . PHP_EOL;
$results->each(function($result) {
echo $result->filter('h3')->text() . PHP_EOL;
});
?>
Pros
- CSS selectors are more intuitive than XPath for most developers
- Combines fetching and parsing in one object – less boilerplate than cURL + DOMDocument
- Handles relative URL resolution automatically when following links
- Built-in form submission support
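- Stable, well-documented API (the same calls work on Symfony HttpBrowser)
Cons
- Deprecated and no longer maintained – Symfony HttpBrowser is the recommended replacement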
- No JavaScript execution
- Slower than raw cURL for high-volume scraping
- Requires Composer
- Default user agent gets blocked – always override with a real browser string
- Less control over low-level request options compared to raw cURL
When to Use It
Use Goutte (or, for new projects, Symfony HttpBrowser) when you prefer CSS selectors over XPath, when you need to submit forms as part of the scraping flow, or when you want less setup code than raw cURL. It's a solid choice for beginner to intermediate scraping projects that don't need JavaScript rendering or high concurrency.
Cost
Free. Open source under the MIT license.
4. Symfony DomCrawler (Free – Composer)
Symfony DomCrawler is the HTML and XML parsing component that powers Goutte. You can use it standalone – without the full Goutte package – when you already have HTML from a cURL or Guzzle request and just need to extract data from it. It supports both CSS selectors and XPath queries, making it more flexible than either alone.
If you’re building a Laravel or Symfony application and want the cleanest possible HTML parsing without pulling in the full Goutte package, DomCrawler is the right choice.
Installation
composer require symfony/dom-crawler symfony/css-selector
The css-selector component is required separately if you want to use CSS selectors. Without it only XPath queries work.
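Under the hood, symfony/css-selector translates each CSS selector into XPath before DomCrawler runs it. For a class selector, the generated XPath matches whole class names rather than substrings, which is stricter than the `contains(@class, ...)` shorthand used in the DOMDocument examples earlier. The small sketch below uses plain DOMXPath on inline HTML, with no Composer packages, to show the difference. The token-matching expression mirrors the shape the component generates, though its exact output string may differ:

```php
<?php
// Inline HTML so the comparison runs without a network request
$html = '<div><p class="price_color">51.77</p><p class="price_color_old">60.00</p></div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Substring match: "price_color_old" also contains "price_color"
$loose = $xpath->query('//p[contains(@class, "price_color")]');

// Whole-token match, the same idea as CSS ".price_color":
// pad the normalized class list with spaces and look for " price_color "
$strict = $xpath->query('//p[contains(concat(" ", normalize-space(@class), " "), " price_color ")]');

echo "Substring match: " . $loose->length . PHP_EOL; // 2
echo "Token match: " . $strict->length . PHP_EOL;    // 1
```

The looser form is fine when you know a page's class names. On unfamiliar markup, CSS selectors through DomCrawler (or the padded XPath form) avoid picking up extra elements.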
Basic Usage With cURL
<?php
require 'vendor/autoload.php';
use Symfony\Component\DomCrawler\Crawler;
// Fetch HTML with cURL
$ch = curl_init();
curl_setopt_array($ch, [
CURLOPT_URL => "https://books.toscrape.com/",
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_TIMEOUT => 30,
CURLOPT_HTTPHEADER => [
'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
],
]);
$html = curl_exec($ch);
curl_close($ch);
// Pass HTML to DomCrawler
$crawler = new Crawler($html);
// Use CSS selectors to extract data
$title = $crawler->filter('title')->text();
$books = $crawler->filter('article.product_pod');
echo "Page title: $title" . PHP_EOL;
echo "Books found: " . $books->count() . PHP_EOL;
?>
Output:
Page title: All products | Books to Scrape - Sandbox
Books found: 20
CSS Selectors and XPath Together
DomCrawler lets you mix CSS selectors and XPath in the same script – use whichever is cleaner for each extraction:
<?php
require 'vendor/autoload.php';
use Symfony\Component\DomCrawler\Crawler;
$html = file_get_contents('https://books.toscrape.com/');
$crawler = new Crawler($html);
// CSS selector approach
$crawler->filter('article.product_pod')->each(function(Crawler $book) {
// CSS selector for title
$title = $book->filter('h3 a')->attr('title');
// CSS selector for price
$price = $book->filter('.price_color')->text();
// XPath for availability
$availability = $book->filterXPath(
'.//*[contains(@class,"availability")]'
)->text();
echo "$title - $price - " . trim($availability) . PHP_EOL;
});
?>
Output:
A Light in the Attic - £51.77 - In stock
Tipping the Velvet - £53.74 - In stock
Soumission - £50.10 - In stock
Sharp Objects - £47.82 - In stock
...
Extracting Attributes and Text
<?php
require 'vendor/autoload.php';
use Symfony\Component\DomCrawler\Crawler;
$html = file_get_contents('https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html');
$crawler = new Crawler($html);
// Get text content
$title = $crawler->filter('h1')->text();
$price = $crawler->filter('.price_color')->text();
$description = $crawler->filter('#product_description ~ p')->text();
// Get attribute value
$coverImage = $crawler->filter('#product_gallery img')->attr('src');
// Get multiple values as array
$tableData = $crawler->filter('table.table tr')->each(function(Crawler $row) {
return [
'label' => $row->filter('th')->text(),
'value' => $row->filter('td')->text(),
];
});
echo "Title: $title" . PHP_EOL;
echo "Price: $price" . PHP_EOL;
echo "Cover: $coverImage" . PHP_EOL;
echo PHP_EOL . "Product details:" . PHP_EOL;
foreach ($tableData as $row) {
echo " {$row['label']}: {$row['value']}" . PHP_EOL;
}
?>
Output:
Title: A Light in the Attic
Price: £51.77
Cover: ../../media/cache/fe/72/fe72f0532301ec28892ae79a629a293c.jpg
Product details:
UPC: a897fe39b1053632
Product Type: Books
Price (excl. tax): £51.77
Price (incl. tax): £51.77
Tax: £0.00
Availability: In stock (22 available)
Number of reviews: 0
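The cover path in the output is relative to the product page, so it can't be downloaded as-is. When DomCrawler knows the page URL (new Crawler($html, $pageUrl)), link()->getUri() resolves href values for you, as Goutte does when following pagination. For src attributes or plain DOMDocument, a small resolver does the job. Below is a minimal sketch of RFC 3986-style resolution. `resolve_url` is a hypothetical helper. It covers absolute, protocol-relative, root-relative, and `../` paths, but skips edge cases such as user info in the base URL or keeping the base query on fragment-only links:

```php
<?php
// Hypothetical helper: resolve a relative href/src against the URL of the page it came from.
// Covers absolute, protocol-relative ("//host/x"), root-relative ("/x") and
// document-relative ("x", "./x", "../x") references. Requires PHP 8+.
function resolve_url(string $base, string $relative): string
{
    // Already absolute (has a scheme such as https:) - return unchanged
    if (is_string(parse_url($relative, PHP_URL_SCHEME))) {
        return $relative;
    }

    $parts = parse_url($base) ?: [];
    $scheme = $parts['scheme'] ?? 'https';
    if (str_starts_with($relative, '//')) {
        return $scheme . ':' . $relative;
    }
    $origin = $scheme . '://' . ($parts['host'] ?? '') . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Split off "?query" / "#fragment" so only the path gets dot-segment processing
    $cut = strcspn($relative, '?#');
    $relPath = substr($relative, 0, $cut);
    $suffix = substr($relative, $cut);

    $basePath = $parts['path'] ?? '/';
    if ($relPath === '') {
        $path = $basePath; // "?page=2" keeps the base path
    } elseif ($relPath[0] === '/') {
        $path = $relPath; // root-relative
    } else {
        // Directory of the base URL (everything up to the last "/") plus the relative path
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $relPath;
    }

    // Remove "." and ".." segments; ".." never climbs above the root
    $segments = explode('/', ltrim($path, '/'));
    $output = [];
    foreach ($segments as $segment) {
        if ($segment === '..') {
            array_pop($output);
        } elseif ($segment !== '.') {
            $output[] = $segment;
        }
    }
    // A trailing "." or ".." points at a directory, so keep the trailing slash
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $output[] = '';
    }

    return $origin . '/' . implode('/', $output) . $suffix;
}
```

For the product page above, resolve_url('https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html', $coverImage) returns an absolute https://books.toscrape.com/media/cache/... URL that you can download directly.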
Reducing a Crawler to a Subsection
When scraping complex pages, narrow the crawler to a specific section before extracting – avoids false matches from other parts of the page:
<?php
require 'vendor/autoload.php';
use Symfony\Component\DomCrawler\Crawler;
$html = file_get_contents('https://books.toscrape.com/');
$crawler = new Crawler($html);
// Narrow to the product section only
$productSection = $crawler->filter('section');
// Now all queries run within that section
$books = $productSection->filter('article.product_pod');
echo "Books in product section: " . $books->count() . PHP_EOL;
// Get first book only
$firstBook = $books->first();
echo "First book: " . $firstBook->filter('h3 a')->attr('title') . PHP_EOL;
// Get last book only
$lastBook = $books->last();
echo "Last book: " . $lastBook->filter('h3 a')->attr('title') . PHP_EOL;
// Get specific book by index
$thirdBook = $books->eq(2); // 0-indexed
echo "Third book: " . $thirdBook->filter('h3 a')->attr('title') . PHP_EOL;
?>
Output:
Books in product section: 20
First book: A Light in the Attic
Last book: It's Only the Himalayas
Third book: Soumission
Handling Missing Elements Safely
<?php
require 'vendor/autoload.php';
use Symfony\Component\DomCrawler\Crawler;
$html = file_get_contents('https://books.toscrape.com/');
$crawler = new Crawler($html);
$crawler->filter('article.product_pod')->each(function(Crawler $book) {
// count() check prevents exceptions on missing elements (on Symfony 4.3+, text('N/A') also accepts a fallback)
$title = $book->filter('h3 a')->count() > 0
? $book->filter('h3 a')->attr('title')
: 'N/A';
$price = $book->filter('.price_color')->count() > 0
? $book->filter('.price_color')->text()
: 'N/A';
echo "$title - $price" . PHP_EOL;
});
?>
Output:
A Light in the Attic - £51.77
Tipping the Velvet - £53.74
Soumission - £50.10
...
Pros
- Supports both CSS selectors and XPath in the same script
- Cleaner API than DOMDocument for most extraction tasks
- Well maintained as part of the Symfony ecosystem
- Works well alongside Guzzle in Laravel and Symfony projects
- Strong documentation and large community
Cons
- Parsing only – requires a separate HTTP client for fetching pages
- Requires Composer and two packages for CSS selector support
- No JavaScript execution
- Slightly more verbose than Goutte for combined fetch-and-parse workflows
When to Use It
Use DomCrawler when you're already using Guzzle for HTTP requests and want a cleaner parsing layer on top. It's a strong free HTML parser for PHP projects that use Composer – more flexible than DOMDocument alone and lighter than pulling in the full Goutte package. Ideal for Laravel and Symfony applications where it fits naturally into the existing dependency stack.
Cost
Free. Open source under the MIT license.
5. Symfony Panther (Free – Composer)
Symfony Panther is a browser testing and web scraping library for PHP that controls a real browser – Chrome or Firefox – programmatically. Unlike the other libraries in this list, Panther actually executes JavaScript. It's one of the few free PHP-native options for scraping JavaScript-rendered websites without calling an external Node.js script.
Panther uses WebDriver under the hood – the same protocol used by Selenium. If you’ve used browser automation testing tools before, the API will feel familiar.
Installation
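# Install the package
composer require symfony/panther
# Install a ChromeDriver that matches your installed Chrome/Chromium version
# On Ubuntu (on Debian the package is called chromium-driver)
apt-get install chromium-chromedriver
# Or let the driver installer recommended by Panther detect and download it
composer require --dev dbrekelmans/bdi
vendor/bin/bdi detect drivers
# Manual downloads (drivers for Chrome 115+ are published via Chrome for Testing)
# https://chromedriver.chromium.org/downloads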
Basic Usage
<?php
require 'vendor/autoload.php';
use Symfony\Component\Panther\Client;
// Launch headless Chrome
$client = Client::createChromeClient();
$crawler = $client->request('GET', 'https://books.toscrape.com/');
echo "Title: " . $crawler->filter('title')->text() . PHP_EOL;
echo "Books found: " . $crawler->filter('article.product_pod')->count() . PHP_EOL;
// Always close the browser when done
$client->quit();
?>
Output:
Title: All products | Books to Scrape - Sandbox
Books found: 20
Scraping JavaScript-Rendered Content
This is where Panther is genuinely different from every other free PHP tool. The browser executes JavaScript and Panther waits for the content to appear before extracting:
<?php
require 'vendor/autoload.php';
use Symfony\Component\Panther\Client;
$client = Client::createChromeClient(null, [
'--headless',
'--no-sandbox',
'--disable-dev-shm-usage',
'--disable-gpu',
'--window-size=1280,800',
]);
// Navigate to a JavaScript-rendered page
$crawler = $client->request('GET', 'https://example-js-site.com/products');
// Wait until the product cards appear - up to 10 seconds
$crawler = $client->waitFor('.product-card', 10); // returns a crawler for the updated page
// Now extract - JavaScript has finished loading
$products = $crawler->filter('.product-card');
echo "Products loaded: " . $products->count() . PHP_EOL;
$products->each(function($product) {
$name = $product->filter('.product-name')->text();
$price = $product->filter('.product-price')->text();
echo "$name - $price" . PHP_EOL;
});
$client->quit();
?>
Output:
Products loaded: 24
Laptop Stand Pro - $49.99
Mechanical Keyboard - $89.99
USB-C Hub - $34.99
...
Waiting Strategies
Panther gives you several ways to wait for content to load – picking the right one prevents both timeouts and capturing incomplete pages:
<?php
require 'vendor/autoload.php';
use Symfony\Component\Panther\Client;
use Facebook\WebDriver\WebDriverExpectedCondition;
$client = Client::createChromeClient();
$crawler = $client->request('GET', 'https://example-js-site.com/products');
// Wait for a CSS selector to appear
$client->waitFor('.product-card');
// Wait for a CSS selector to contain specific text
$client->waitForElementToContain('.status', 'loaded');
// Wait for an element to be removed from the DOM - useful for loading spinners
$client->waitForStaleness('.loading-spinner');
// Wait for the page title to match, using a php-webdriver expected condition
$client->wait(10)->until(WebDriverExpectedCondition::titleIs('Products - Example Site'));
// Custom wait - poll until the callback returns true (times out after 10 seconds)
$client->wait(10)->until(function () use ($client) {
$items = $client->getCrawler()->filter('.product-card');
return $items->count() > 0;
});
echo "Page fully loaded." . PHP_EOL;
$client->quit();
?>
Handling Infinite Scroll Pagination
Sites that load more content as you scroll down can’t be handled with URL-based pagination. Panther can simulate scrolling to trigger loading:
<?php
require 'vendor/autoload.php';
use Symfony\Component\Panther\Client;
$client = Client::createChromeClient();
$crawler = $client->request('GET', 'https://example-infinite-scroll.com/products');
$client->waitFor('.product-card');
$allProducts = [];
$scrollCount = 0;
$maxScrolls = 10; // safety cap
while ($scrollCount < $maxScrolls) {
// Extract currently loaded products - keying by name de-duplicates them
// (each() hands the callback a Panther crawler scoped to one card)
$client->getCrawler()->filter('.product-card')->each(function ($product) use (&$allProducts) {
$name = $product->filter('.name')->count() > 0
? $product->filter('.name')->text()
: 'N/A';
$allProducts[$name] = true;
});
echo "Scroll $scrollCount - " . count($allProducts) . " unique products." . PHP_EOL;
// Remember how many cards were loaded before scrolling
$countBefore = $client->getCrawler()->filter('.product-card')->count();
// Scroll to bottom of page
$client->executeScript('window.scrollTo(0, document.body.scrollHeight)');
// Wait for new content to load
sleep(2);
$scrollCount++;
// Stop if scrolling didn't load any new cards
$countAfter = $client->getCrawler()->filter('.product-card')->count();
if ($countAfter === $countBefore) {
echo "No new products - reached the end." . PHP_EOL;
break;
}
}
echo "Total unique products: " . count($allProducts) . PHP_EOL;
$client->quit();
?>
Output:
Scroll 0 - 20 unique products.
Scroll 1 - 40 unique products.
Scroll 2 - 60 unique products.
No new products - reached the end.
Total unique products: 60
Taking Screenshots for Debugging
When a scraper isn’t finding the expected elements, a screenshot shows exactly what the browser actually loaded:
<?php
require 'vendor/autoload.php';
use Symfony\Component\Panther\Client;
$client = Client::createChromeClient();
$crawler = $client->request('GET', 'https://books.toscrape.com/');
// Take a screenshot to debug what the browser sees
$client->takeScreenshot(__DIR__ . '/debug_screenshot.png');
echo "Screenshot saved to debug_screenshot.png" . PHP_EOL;
$books = $crawler->filter('article.product_pod');
if ($books->count() === 0) {
echo "No books found - check the screenshot." . PHP_EOL;
} else {
echo "Books found: " . $books->count() . PHP_EOL;
}
$client->quit();
?>
Output:
Screenshot saved to debug_screenshot.png
Books found: 20
Pros
- Full JavaScript execution – the only library in this list that handles JavaScript-rendered content
- CSS selectors via Symfony DomCrawler – clean familiar API
- Built-in waiting strategies for dynamic content
- Can simulate scrolling, clicking, and form submission
- Screenshot support for debugging
- No Node.js required – runs entirely in PHP
Cons
- Slow – launching a real browser typically takes 2-5 seconds per instance, much slower than cURL
- High memory usage – Chrome often uses 200-400MB per browser instance
- Requires ChromeDriver installed and matching your Chrome version
- Not available on most shared hosting environments
- Overkill for static sites where cURL works fine
When to Use It
Use Panther when you need JavaScript execution and want to stay in PHP without setting up a Node.js environment. It’s the right choice for scraping single-page applications, infinite scroll pages, or any site that loads content after the initial page render. On a VPS or dedicated server where Chrome can run, Panther is the cleanest PHP solution for dynamic content.
For the alternative approach using Node.js Puppeteer called from PHP, the dynamic content web scraping guide covers the full integration.
Cost
Free. Open source under the MIT license.
6. ScraperAPI (Paid)
ScraperAPI is a web scraping service that handles the hard parts of scraping at scale – rotating proxies, browser fingerprinting, CAPTCHA solving, and JavaScript rendering. Instead of managing your own proxy pool and fighting bot detection, you send requests through ScraperAPI’s endpoint and it handles everything automatically.
The integration is simple – replace your target URL with a ScraperAPI URL containing your API key and the original URL as a parameter. Your existing cURL code works with minimal changes.
How It Works
<?php
// Without ScraperAPI - direct request that may get blocked
$url = "https://example.com/products";
// With ScraperAPI - routes through their proxy network
$apiKey = 'your_api_key_here';
$targetUrl = urlencode("https://example.com/products");
$url = "https://api.scraperapi.com/?api_key={$apiKey}&url={$targetUrl}";
// Your cURL code stays exactly the same - only the URL changes
$ch = curl_init();
curl_setopt_array($ch, [
CURLOPT_URL => $url,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_TIMEOUT => 60, // ScraperAPI requests take longer - increase timeout
CURLOPT_HTTPHEADER => [
'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
],
]);
$html = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($httpCode === 200) {
echo "Fetched via ScraperAPI. Length: " . strlen($html) . " bytes." . PHP_EOL;
} else {
echo "Request failed: HTTP $httpCode" . PHP_EOL;
}
?>
Output:
Fetched via ScraperAPI. Length: 48291 bytes.
ScraperAPI Parameters
ScraperAPI accepts parameters that control how it handles each request:
<?php
$apiKey = 'your_api_key_here';
$targetUrl = urlencode("https://example.com/products");
// Basic request - rotating proxy only
$url = "https://api.scraperapi.com/?api_key={$apiKey}&url={$targetUrl}";
// Render JavaScript before returning HTML
$url = "https://api.scraperapi.com/?api_key={$apiKey}&url={$targetUrl}&render=true";
// Use residential proxies - higher success rate on tough sites
$url = "https://api.scraperapi.com/?api_key={$apiKey}&url={$targetUrl}&premium=true";
// Set country for the proxy
$url = "https://api.scraperapi.com/?api_key={$apiKey}&url={$targetUrl}&country_code=us";
// Combine parameters
$params = http_build_query([
'api_key' => $apiKey,
'url' => "https://example.com/products",
'render' => 'true',
'country_code' => 'us',
'premium' => 'true',
]);
$url = "https://api.scraperapi.com/?" . $params;
echo "ScraperAPI URL built." . PHP_EOL;
echo "Target: https://example.com/products" . PHP_EOL;
?>
Building a Reusable ScraperAPI Function
<?php
function scrape_via_api($targetUrl, $apiKey, $options = []) {
$params = array_merge([
'api_key' => $apiKey,
'url' => $targetUrl,
], $options);
$apiUrl = "https://api.scraperapi.com/?" . http_build_query($params);
$ch = curl_init();
curl_setopt_array($ch, [
CURLOPT_URL => $apiUrl,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_CONNECTTIMEOUT => 15,
CURLOPT_TIMEOUT => 70, // ScraperAPI can take up to 60s on hard targets
]);
$html = curl_exec($ch);
$errno = curl_errno($ch);
$error = curl_error($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($errno) {
echo "cURL error: $error" . PHP_EOL;
return false;
}
if ($httpCode === 403) {
echo "API key invalid or quota exceeded." . PHP_EOL;
return false;
}
if ($httpCode !== 200) {
echo "ScraperAPI returned HTTP $httpCode for: $targetUrl" . PHP_EOL;
return false;
}
return $html;
}
$apiKey = 'your_api_key_here';
// Basic scrape
$html = scrape_via_api("https://books.toscrape.com/", $apiKey);
if ($html) {
echo "Basic scrape: " . strlen($html) . " bytes." . PHP_EOL;
}
// JavaScript rendered scrape
$html = scrape_via_api("https://example-js-site.com/products", $apiKey, [
'render' => 'true',
]);
if ($html) {
echo "JS rendered scrape: " . strlen($html) . " bytes." . PHP_EOL;
}
?>
Output:
Basic scrape: 51274 bytes.
JS rendered scrape: 187432 bytes.
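Requests routed through a proxy service fail intermittently, and a transient failure is often worth a second attempt before giving up. The sketch below wraps any fetch function in a retry loop with exponential backoff. The function name fetch_with_retry and the retry policy are assumptions for illustration; it accepts any callable that returns the HTML string or false, such as a closure around scrape_via_api() above.

```php
<?php
// Hypothetical helper: retries a fetch callable with exponential backoff.
// $fetch must return the HTML string on success or false on failure.
function fetch_with_retry(callable $fetch, int $maxAttempts = 3, int $baseDelayMs = 1000): string|false
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $html = $fetch();
        if ($html !== false) {
            return $html;
        }
        if ($attempt < $maxAttempts) {
            // Wait 1s, 2s, 4s, ... between attempts (doubling each time)
            $delayMs = $baseDelayMs * (2 ** ($attempt - 1));
            usleep($delayMs * 1000);
        }
    }
    return false;
}

// Usage with the scrape_via_api() function defined above:
// $html = fetch_with_retry(fn() => scrape_via_api("https://books.toscrape.com/", $apiKey));
```

Backoff spaces out retries so a struggling target or a temporarily saturated concurrency limit has time to recover, instead of burning credits on immediate repeats.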
Tracking API Credits
ScraperAPI bills by credits – each basic request uses 1 credit, a request with JavaScript rendering uses 5 credits, and a request through premium residential proxies uses 10-25 credits. Track your usage to avoid hitting your monthly limit mid-scrape:
<?php
function get_scraperapi_usage($apiKey) {
$url = "https://api.scraperapi.com/account?api_key={$apiKey}";
$ch = curl_init();
curl_setopt_array($ch, [
CURLOPT_URL => $url,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_TIMEOUT => 15,
]);
$response = curl_exec($ch);
curl_close($ch);
$data = json_decode($response, true);
if (!$data) {
echo "Could not fetch usage data." . PHP_EOL;
return false;
}
echo "ScraperAPI Account Usage:" . PHP_EOL;
echo " Plan: " . ($data['plan'] ?? 'N/A') . PHP_EOL;
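// requestLimit is the plan's total allowance, so remaining = limit - used
$used = $data['requestCount'] ?? null;
$limit = $data['requestLimit'] ?? null;
echo " Credits used: " . ($used ?? 'N/A') . PHP_EOL;
echo " Credits remaining: " . (($used !== null && $limit !== null) ? $limit - $used : 'N/A') . PHP_EOL;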
echo " Concurrent limit: " . ($data['concurrencyLimit'] ?? 'N/A') . PHP_EOL;
return $data;
}
$usage = get_scraperapi_usage('your_api_key_here');
?>
Output:
ScraperAPI Account Usage:
Plan: Hobby
Credits used: 1250
Credits remaining: 98750
Concurrent limit: 5
Pricing
ScraperAPI pricing as of 2026:
- Free trial – 1,000 API credits, no credit card required
- Hobby – $49/month – 100,000 credits, 5 concurrent threads
- Startup – $149/month – 500,000 credits, 10 concurrent threads
- Business – $299/month – 3,000,000 credits, 25 concurrent threads
Remember that JavaScript rendering costs 5 credits per request and premium residential proxies cost 10-25 credits. A plan that looks large enough can run out faster than expected if you’re using those features heavily.
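Because credits, not requests, are what run out, it helps to estimate a job's credit cost before picking a plan. The sketch below uses the per-request figures quoted in this section (1 credit basic, 5 with rendering, and the low end of 10 for premium proxies). The function names and the credit table are assumptions, so check ScraperAPI's current credit costs before relying on the numbers.

```php
<?php
// Hypothetical estimator. Credit costs mirror the figures quoted in this
// section (premium uses the low end of the 10-25 range); verify current rates.
const CREDITS_PER_REQUEST = [
    'basic'   => 1,
    'render'  => 5,
    'premium' => 10,
];

function estimate_credits(int $pages, string $mode): int
{
    if (!array_key_exists($mode, CREDITS_PER_REQUEST)) {
        throw new InvalidArgumentException("Unknown mode: $mode");
    }
    return $pages * CREDITS_PER_REQUEST[$mode];
}

function fits_plan(int $pages, string $mode, int $planCredits): bool
{
    return estimate_credits($pages, $mode) <= $planCredits;
}

// 30,000 rendered pages against the 100,000-credit Hobby plan:
echo estimate_credits(30000, 'render') . " credits needed" . PHP_EOL; // 150000 credits needed
var_dump(fits_plan(30000, 'render', 100000)); // bool(false)
```

A job that would fit easily as basic requests (30,000 credits) overshoots the same plan once rendering is switched on.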
Pros
- Minimal code change – drop-in replacement for your existing cURL URL
- Handles proxy rotation, CAPTCHA solving, and browser fingerprinting automatically
- JavaScript rendering available without setting up a browser
- Free trial with 1,000 credits – enough to test properly
- Reliable success rates on sites that block regular cURL requests
Cons
- Monthly cost starting at $49 – not justified for small scraping projects
- Slower than direct requests – adds network latency routing through their servers
- JavaScript rendering costs 5x the credits of a basic request
- You’re dependent on their service availability and pricing changes
- Credit limits can be hit faster than expected on large scraping jobs
When to Use It
Use ScraperAPI when your scraper is getting blocked at scale and you’ve already tried proper headers, delays, and cookie handling. It makes economic sense when the alternative is spending developer time managing your own proxy infrastructure. For small projects under 10,000 pages per month, free tools with proper configuration handle most blocking issues without paying $49/month.
Cost
Free trial with 1,000 credits. Paid plans from $49/month. See scraperapi.com/pricing for current rates.
7. Bright Data (Paid)
Bright Data is the largest commercial web scraping infrastructure provider. It offers residential proxies, datacenter proxies, ISP proxies, and a full scraping browser – a real Chromium instance with built-in unblocking. Unlike ScraperAPI, which is a simple API wrapper, Bright Data is a full platform with multiple products targeting different scraping needs.
It’s built for enterprise scale – thousands of concurrent requests, millions of pages per day, and access to data from sites that block every other approach.
Bright Data Products
Bright Data has several distinct products – understanding which one fits your use case prevents paying for features you don’t need:
- Residential Proxies – real home IP addresses from real devices. Hardest to detect, highest success rate on tough targets. Priced per GB of data transferred.
- Datacenter Proxies – server IPs, faster and cheaper than residential. Good for sites without aggressive bot detection.
- ISP Proxies – static residential IPs assigned by ISPs. More stable than rotating residential proxies.
- Scraping Browser – a real Chromium browser with built-in unblocking. Handles JavaScript, CAPTCHAs, and fingerprinting automatically.
- SERP API – specifically for scraping search engine results pages.
- Datasets – pre-scraped data you can purchase instead of scraping yourself.
Using Bright Data Residential Proxies With cURL
<?php
function scrape_via_bright_data($url, $proxyConfig) {
$ch = curl_init();
curl_setopt_array($ch, [
CURLOPT_URL => $url,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_CONNECTTIMEOUT => 15,
CURLOPT_TIMEOUT => 60,
CURLOPT_ENCODING => '', // cURL advertises only the encodings it supports and decodes the response
// Bright Data proxy configuration
CURLOPT_PROXY => $proxyConfig['host'] . ':' . $proxyConfig['port'],
CURLOPT_PROXYTYPE => CURLPROXY_HTTP,
CURLOPT_PROXYUSERPWD => $proxyConfig['username'] . ':' . $proxyConfig['password'],
CURLOPT_HTTPHEADER => [
'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.5',
// No manual Accept-Encoding header: CURLOPT_ENCODING handles it, and advertising
// "br" without Brotli support in your cURL build can produce undecodable responses
'Connection: keep-alive',
],
]);
$html = curl_exec($ch);
$errno = curl_errno($ch);
$error = curl_error($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($errno) {
echo "Proxy error: $error" . PHP_EOL;
return false;
}
if ($httpCode !== 200) {
echo "HTTP $httpCode via Bright Data proxy." . PHP_EOL;
return false;
}
return $html;
}
// Bright Data proxy credentials: copy the host, port, and username format from your zone's access details in the dashboard
$proxyConfig = [
'host' => 'brd.superproxy.io',
'port' => '22225',
'username' => 'your-username-country-us', // append country targeting
'password' => 'your-proxy-password',
];
$html = scrape_via_bright_data("https://example.com/products", $proxyConfig);
if ($html) {
echo "Fetched via Bright Data: " . strlen($html) . " bytes." . PHP_EOL;
}
?>
Output:
Fetched via Bright Data: 48291 bytes.
Using the Scraping Browser
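The Scraping Browser is Bright Data’s most powerful product – a real Chromium instance hosted on their infrastructure that handles JavaScript, CAPTCHAs, and browser fingerprinting automatically. It is normally driven over a WebSocket connection using the Chrome DevTools Protocol (CDP), which PHP has no built-in client for. The example below takes the simpler route of sending an HTTP request to Bright Data’s REST request endpoint; confirm in your dashboard which zone types and parameters that endpoint accepts before relying on it: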
<?php
// Bright Data Scraping Browser uses WebSocket + CDP (Chrome DevTools Protocol)
// The simplest integration from PHP is via their REST endpoint
function scrape_via_bright_data_browser($url, $credentials) {
$apiEndpoint = "https://api.brightdata.com/request";
$payload = json_encode([
'zone' => $credentials['zone'],
'url' => $url,
'format' => 'raw',
'render' => true, // execute JavaScript
]);
$ch = curl_init();
curl_setopt_array($ch, [
CURLOPT_URL => $apiEndpoint,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_POST => true,
CURLOPT_POSTFIELDS => $payload,
CURLOPT_TIMEOUT => 90,
CURLOPT_HTTPHEADER => [
'Content-Type: application/json',
'Authorization: Bearer ' . $credentials['api_token'],
],
]);
$response = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($httpCode !== 200) {
echo "Bright Data Browser error: HTTP $httpCode" . PHP_EOL;
return false;
}
return $response;
}
$credentials = [
'zone' => 'your_zone_name',
'api_token' => 'your_api_token',
];
$html = scrape_via_bright_data_browser(
"https://example-js-site.com/products",
$credentials
);
if ($html) {
echo "JS-rendered page fetched: " . strlen($html) . " bytes." . PHP_EOL;
// Parse with DOMDocument as normal
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$products = $xpath->query('//div[contains(@class,"product-card")]');
echo "Products found: " . $products->length . PHP_EOL;
}
?>
Output:
JS-rendered page fetched: 187432 bytes.
Products found: 24
Targeting Specific Countries
Bright Data lets you route requests through specific countries – useful when scraping sites that show different prices or content based on location:
<?php
// Route through specific countries by modifying the username
$proxyConfigs = [
'us' => [
'host' => 'brd.superproxy.io',
'port' => '22225',
'username' => 'your-username-country-us',
'password' => 'your-password',
],
'uk' => [
'host' => 'brd.superproxy.io',
'port' => '22225',
'username' => 'your-username-country-gb',
'password' => 'your-password',
],
'de' => [
'host' => 'brd.superproxy.io',
'port' => '22225',
'username' => 'your-username-country-de',
'password' => 'your-password',
],
];
// Compare prices across regions
foreach ($proxyConfigs as $country => $config) {
$html = scrape_via_bright_data(
"https://example.com/product/12345",
$config
);
if ($html) {
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$price = $xpath->query('//*[@class="price"]')->item(0);
$priceText = $price ? trim($price->textContent) : 'N/A';
echo strtoupper($country) . ": $priceText" . PHP_EOL;
}
sleep(2);
}
?>
Output:
US: $49.99
UK: £42.99
DE: €46.99
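The three configs above differ only in the country suffix on the username. A small helper can derive them from one base config instead of repeating the host, port, and password for every country. This sketch assumes the -country-xx username suffix used in the example above; bright_data_country_config is a hypothetical name, and your zone's exact username format comes from the Bright Data dashboard.

```php
<?php
// Hypothetical helper: derive a per-country proxy config from a base config,
// assuming country targeting is expressed as a "-country-xx" username suffix.
function bright_data_country_config(array $base, string $countryCode): array
{
    $config = $base; // arrays are copied by value, so $base is left unchanged
    $config['username'] = $base['username'] . '-country-' . strtolower($countryCode);
    return $config;
}

$base = [
    'host'     => 'brd.superproxy.io',
    'port'     => '22225',
    'username' => 'your-username',
    'password' => 'your-password',
];

// The United Kingdom uses the ISO code "gb", not "uk"
$proxyConfigs = [];
foreach (['us' => 'us', 'uk' => 'gb', 'de' => 'de'] as $label => $code) {
    $proxyConfigs[$label] = bright_data_country_config($base, $code);
}

echo $proxyConfigs['uk']['username'] . PHP_EOL; // your-username-country-gb
```

The resulting $proxyConfigs array has the same shape as the hand-written one, so the price-comparison loop above works unchanged.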
Pricing
Bright Data pricing as of 2026 – billed by data transferred, not by request count:
- Residential Proxies – from $8.40/GB. Minimum commitment varies by plan.
- Datacenter Proxies – from $0.60/GB. Significantly cheaper than residential.
- ISP Proxies – from $7.14/GB.
- Scraping Browser – from $0.001 per page. Custom pricing for high volume.
- Free trial – available on request with account verification.
Costs add up quickly on large scraping jobs. A project fetching 100,000 pages averaging 50KB each transfers 5GB of data – $42 at residential proxy rates. Factor this into your budget before choosing Bright Data over cheaper alternatives.
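The estimate above is simple arithmetic, and wrapping it in a function makes it easy to rerun for different volumes and proxy types. This sketch assumes decimal units (1 GB = 1,000,000 KB) and the per-GB rates listed above; the function name is hypothetical.

```php
<?php
// Hypothetical estimator: bandwidth cost for per-GB proxy billing.
// Uses decimal units (1 GB = 1,000,000 KB), matching the example above.
function estimate_bandwidth_cost(int $pages, float $avgPageKb, float $ratePerGb): float
{
    $totalGb = ($pages * $avgPageKb) / 1000000;
    return round($totalGb * $ratePerGb, 2);
}

// 100,000 pages at 50KB each through residential proxies ($8.40/GB)
printf('Residential: $%.2f' . PHP_EOL, estimate_bandwidth_cost(100000, 50, 8.40)); // Residential: $42.00
// Same job through datacenter proxies ($0.60/GB)
printf('Datacenter: $%.2f' . PHP_EOL, estimate_bandwidth_cost(100000, 50, 0.60)); // Datacenter: $3.00
```

Page weight matters as much as page count: pages that pull in large inline scripts or images through the proxy can easily double the average size and the bill.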
Pros
- Largest residential proxy network available – 72 million IPs across 195 countries
- Highest success rates on sites that block everything else
- Country-level targeting for geo-specific scraping
- Scraping Browser handles JavaScript and CAPTCHAs automatically
- Enterprise-grade reliability and SLA
- Dedicated account managers on higher plans
Cons
- Expensive – pricing by GB means costs scale directly with data volume
- Complex platform with multiple products – takes time to understand which one fits your use case
- Overkill for small to medium scraping projects
- Minimum commitments on some plans
- Account verification required before starting
When to Use It
Use Bright Data when you’re scraping at enterprise scale – millions of pages per month – or when you need to collect geo-specific data from multiple countries simultaneously. For most PHP scraping projects, free tools with proper configuration or ScraperAPI at $49/month are more appropriate. Bright Data makes sense when ScraperAPI’s limits or success rates aren’t enough for your volume.
Cost
Residential proxies from $8.40/GB. See brightdata.com/pricing for current rates across all products.
How to Choose the Right PHP Web Scraping Tool
The right tool depends on four factors – target site type, scraping volume, your hosting environment, and whether you’re using a framework. Work through these questions in order:
Step 1: Does the Target Site Use JavaScript?
Site is static HTML (content in cURL response)
→ Use cURL + DOMDocument, Guzzle, or Goutte
Site uses JavaScript rendering (content missing from cURL response)
→ Free option: Symfony Panther (needs Chrome on server)
→ Paid option: ScraperAPI with render=true or Bright Data Scraping Browser
→ DIY option: Puppeteer via Node.js called from PHP
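Step 1 hinges on whether the content you want is present in the raw cURL response. A quick check is to run the XPath query you plan to use against the fetched HTML: no matches usually means the content is injected by JavaScript, though a block page or a wrong selector produces the same result. The function name below is hypothetical and the sample HTML is made up for illustration.

```php
<?php
// Hypothetical check: does the raw (unrendered) HTML contain nodes matching
// the XPath query? Zero matches suggests JavaScript-rendered content, but a
// block page or a wrong selector give the same result, so inspect manually too.
function content_in_static_html(string $html, string $xpathQuery): bool
{
    if (trim($html) === '') {
        return false; // DOMDocument::loadHTML() rejects empty input
    }
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    $nodes = (new DOMXPath($dom))->query($xpathQuery);
    return $nodes !== false && $nodes->length > 0;
}

// Static page: product cards are already in the HTML
$static = '<html><body><div class="product-card">Book</div></body></html>';
// JS shell: an empty mount point filled in later by a script
$jsShell = '<html><body><div id="app"></div><script src="app.js"></script></body></html>';

$query = '//div[contains(@class,"product-card")]';
var_dump(content_in_static_html($static, $query));  // bool(true)
var_dump(content_in_static_html($jsShell, $query)); // bool(false)
```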
Step 2: What Volume Are You Scraping?
Under 1,000 pages per day
→ Any free tool works. No need for paid services.
1,000 - 50,000 pages per day
→ Free tools with proxy rotation.
→ ScraperAPI if getting blocked consistently.
Over 50,000 pages per day
→ Guzzle with concurrent requests + proxy rotation.
→ ScraperAPI Business plan or Bright Data.
Step 3: Are You Using a Framework?
Laravel project
→ Guzzle (already included), DomCrawler for parsing
Symfony project
→ DomCrawler or Panther (same ecosystem)
Standalone PHP script
→ cURL + DOMDocument (no Composer needed)
Any Composer project
→ Goutte (simplest combined fetch and parse)
Step 4: What Does Your Hosting Support?
Shared hosting
→ cURL + DOMDocument or Goutte only
→ Panther won't work (no Chrome available)
→ Paid API services work (just HTTP requests)
VPS or dedicated server
→ All free tools available
→ Panther and Puppeteer both work
Serverless / cloud functions
→ ScraperAPI or Bright Data
→ No persistent processes for browser tools
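Steps 1, 2, and 4 can be condensed into a single function, which is handy when a scraper decides at runtime how to fetch each target. This is a sketch, not a rule: the thresholds mirror the lists above, while the function name and the 'shared', 'vps', and 'serverless' hosting labels are assumptions.

```php
<?php
// Hypothetical sketch of Steps 1, 2 and 4 above as one function.
// $hosting is one of 'shared', 'vps' or 'serverless' (labels are assumptions).
function recommend_fetcher(bool $needsJs, int $pagesPerDay, string $hosting): string
{
    if ($needsJs) {
        // Only a VPS or dedicated server can run Chrome for Panther
        return $hosting === 'vps'
            ? 'Symfony Panther'
            : 'ScraperAPI (render=true) or Bright Data Scraping Browser';
    }
    if ($pagesPerDay < 1000) {
        return 'cURL + DOMDocument';
    }
    if ($pagesPerDay <= 50000) {
        return 'Free tools + proxy rotation (ScraperAPI if blocked)';
    }
    return 'Guzzle concurrency + proxy rotation, or a paid service';
}

echo recommend_fetcher(true, 500, 'shared') . PHP_EOL;
// ScraperAPI (render=true) or Bright Data Scraping Browser
```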
Decision Summary
| Situation | Best Tool |
|---|---|
| Learning PHP scraping for the first time | cURL + DOMDocument |
| Laravel or Symfony project | Guzzle + DomCrawler |
| Want CSS selectors without framework setup | Goutte |
| JavaScript sites, free option, VPS available | Symfony Panther |
| Getting blocked at medium volume | ScraperAPI |
| Enterprise scale or geo-specific data | Bright Data |
Frequently Asked Questions
What is the best PHP web scraping tool for beginners?
PHP cURL with DOMDocument. It’s built into PHP, requires no installation, works on every hosting environment, and teaches you exactly what’s happening at the HTTP level. Once you understand how requests and HTML parsing work at this level, moving to Goutte or Guzzle is straightforward. The PHP web scraper beginner guide walks through the complete setup with working code.
Can PHP scrape JavaScript websites without paid tools?
Yes – with Symfony Panther on a VPS or dedicated server where Chrome can run. It’s free, PHP-native, and handles full JavaScript execution including infinite scroll and dynamic content. The limitation is hosting – shared hosting environments don’t support running Chrome processes. If you’re on shared hosting, ScraperAPI with render=true is the practical alternative.
Is Goutte still actively maintained?
No. Goutte has been deprecated and its repository archived. Its last major version is a thin wrapper whose Client class extends Symfony BrowserKit’s HttpBrowser, which is built on Symfony HttpClient and DomCrawler – components that are actively maintained. For new projects, use HttpBrowser directly (the API is the same, because Goutte’s client simply extends it) or pair Guzzle with DomCrawler. Either way you avoid the deprecated dependency.
Do I need a paid scraping service for most projects?
No. The majority of scraping projects – price monitoring, content aggregation, data collection from public sites – work fine with free tools and proper configuration. Add complete browser headers, random delays, and cookie handling and most blocking issues disappear. The avoiding blocks guide covers all seven techniques with working code. Paid services make sense when you’re hitting volume limits that free tools can’t overcome.
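The configuration described above (complete headers, random delays, cookie handling) maps onto a handful of cURL options. The sketch below is one way to bundle them; the function names, header values, and delay range are assumptions rather than a tuned anti-blocking setup.

```php
<?php
// Hypothetical helper: cURL options for browser-like headers plus a cookie jar.
function polite_curl_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_ENCODING       => '',          // let cURL negotiate and decode compression
        CURLOPT_COOKIEJAR      => $cookieFile, // cURL writes cookies here when the handle is released
        CURLOPT_COOKIEFILE     => $cookieFile, // and sends previously saved cookies
        CURLOPT_HTTPHEADER     => [
            'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.5',
        ],
    ];
}

// Random pause between requests, returned in microseconds for usleep()
function random_delay_us(int $minMs = 1500, int $maxMs = 4000): int
{
    return random_int($minMs, $maxMs) * 1000;
}

// Usage:
// $ch = curl_init();
// curl_setopt_array($ch, polite_curl_options($url, sys_get_temp_dir() . '/cookies.txt'));
// $html = curl_exec($ch);
// usleep(random_delay_us());
```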
What is the difference between ScraperAPI and Bright Data?
ScraperAPI is simpler and cheaper – a single API endpoint you point at any URL, starting at $49/month for 100,000 API credits (fewer pages if you use rendering or premium proxies). Good for medium-scale scraping where you’re getting blocked and want a quick fix. Bright Data is a full infrastructure platform – multiple proxy types, country targeting, enterprise SLAs, and a dedicated scraping browser. It’s significantly more expensive and complex but handles scale and success rates that ScraperAPI can’t match on the toughest targets.
Can I use multiple tools in the same project?
Yes – and it’s often the right approach. Use cURL with DOMDocument for static pages, ScraperAPI for pages that block you, and Panther for JavaScript-rendered content. Wrap each in a function with a consistent interface so your scraping logic doesn’t need to change based on which tool is fetching. The decision of which tool to use for a given URL can be handled by checking the response – if cURL returns a blocked page or empty content, fall back to a more capable tool automatically.
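A consistent interface can be as simple as a list of fetch callables tried in order, plus a check that decides whether a response is usable. The sketch below assumes each fetcher returns HTML or false; the block-page markers and the minimum length are illustrative heuristics, not a reliable block detector, and both function names are hypothetical.

```php
<?php
// Hypothetical heuristic: treat empty, tiny, or challenge-looking pages as unusable.
function looks_usable(string|false $html, int $minLength = 500): bool
{
    if ($html === false || strlen($html) < $minLength) {
        return false;
    }
    $lower = strtolower($html);
    foreach (['captcha', 'access denied', 'unusual traffic'] as $marker) {
        if (str_contains($lower, $marker)) {
            return false;
        }
    }
    return true;
}

// Try each named fetcher in order; return the first usable result and its tool name.
function fetch_with_fallback(string $url, array $fetchers): ?array
{
    foreach ($fetchers as $name => $fetch) {
        $html = $fetch($url);
        if (looks_usable($html)) {
            return ['tool' => $name, 'html' => $html];
        }
    }
    return null;
}

// Usage (wiring is illustrative; my_curl_fetch is hypothetical):
// $result = fetch_with_fallback($url, [
//     'curl'       => fn($u) => my_curl_fetch($u),
//     'scraperapi' => fn($u) => scrape_via_api($u, $apiKey),
// ]);
```

Ordering the fetchers from cheapest to most capable means paid credits are only spent on URLs the free path could not handle.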
Summary: Best Web Scraping Tools PHP in 2026
The best PHP web scraping tools in 2026 cover every scraping scenario, from simple static sites to enterprise-scale JavaScript rendering:
- cURL + DOMDocument – start here. Free, built-in, full control. Handles most static scraping projects without any additional tools.
- Guzzle – cleaner HTTP requests with concurrent fetching. The standard choice in Laravel and Symfony projects.
- Goutte – combined fetch and parse with CSS selectors. Less setup than cURL + DOMDocument for simple scraping jobs, though the package is now deprecated in favor of Symfony’s HttpBrowser, which offers the same API.
- Symfony DomCrawler – the best free HTML parser for Composer projects. Use alongside Guzzle for clean separation of HTTP and parsing logic.
- Symfony Panther – full JavaScript execution in PHP. The only free PHP-native option for dynamic content on a server where Chrome is available.
- ScraperAPI – drop-in proxy and unblocking service. Worth the $49/month when free tools consistently fail at scale.
- Bright Data – enterprise scraping infrastructure. The right choice for millions of pages per month or geo-specific data collection.
Start with cURL and DOMDocument. Add tools only when you hit a problem they can’t solve. Most scraping projects never need to go beyond the free options.
For implementing your first scraper with cURL and DOMDocument, the PHP cURL web scraping complete guide covers every detail with working code. For handling the errors that come up in real projects, the web scraping errors guide covers the seven most common failures and their fixes.
Learn more about cURL from the official PHP documentation.
