PHPScraper is a universal web utility for PHP. Its main goal is to get things done without getting distracted by selectors, preparing and converting data structures, and so on. You simply go to a website and get the relevant information for your project.
All scraping functionality can be accessed either as a method call or as a property. For example, the title can be accessed in two ways:
// Prep
$web = new \Spekulatius\PHPScraper\PHPScraper;
$web->go('https://google.com');
// Returns "Google"
echo $web->title;
// Also returns "Google"
echo $web->title();
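As a rough illustration of how a PHP class can offer both access styles, the sketch below forwards property reads to same-named methods through the `__get` magic method. The `PageInfo` class and its regex-based `title()` are hypothetical and written for this example only; they are not PHPScraper's source code, and the sketch assumes PHP 8.0 or newer.

```php
<?php
// Hypothetical class for illustration only; not PHPScraper's source code.
class PageInfo
{
    public function __construct(private string $html) {}

    // Regular method: returns the contents of the <title> tag, or null.
    public function title(): ?string
    {
        if (preg_match('/<title>(.*?)<\/title>/is', $this->html, $m)) {
            return trim($m[1]);
        }
        return null;
    }

    // PHP calls __get when an undefined or inaccessible property is read.
    // Here the read is forwarded to the method with the same name.
    public function __get(string $name): mixed
    {
        if (method_exists($this, $name)) {
            return $this->$name();
        }
        throw new \LogicException("Unknown property: {$name}");
    }
}

$page = new PageInfo('<html><head><title>Example</title></head></html>');
echo $page->title, PHP_EOL;   // property-style access
echo $page->title(), PHP_EOL; // method-style access
```

Both lines print the same value, because the property read ends up calling the method.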
Many common use cases are covered already. You can find prepared extractors for various HTML tags, including interesting attributes. You can filter and combine these to suit your needs. In some cases you can choose between a simple and a detailed version, as with linksWithDetails here:
$web = new \Spekulatius\PHPScraper\PHPScraper;
// Contains:
// <a href="https://placekitten.com/456/500" rel="ugc">
// <img src="https://placekitten.com/456/400">
// <img src="https://placekitten.com/456/300">
// </a>
$web->go('https://test-pages.phpscraper.de/links/image-urls.html');
// Get the first link on the page and print the result
print_r($web->linksWithDetails[0]);
// [
// 'url' => 'https://placekitten.com/456/500',
// 'protocol' => 'https',
// 'text' => '',
// 'title' => null,
// 'target' => null,
// 'rel' => 'ugc',
// 'image' => [
// 'https://placekitten.com/456/400',
// 'https://placekitten.com/456/300'
// ],
// 'isNofollow' => false,
// 'isUGC' => true,
// 'isSponsored' => false,
// 'isMe' => false,
// 'isNoopener' => false,
// 'isNoreferrer' => false,
// ]
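Because the detailed result is a plain PHP array, filtering and combining can be done with standard array functions. The following is a minimal sketch using sample data shaped like the output above. The second entry and the variable names are made up for illustration, and only a few of the keys are reproduced.

```php
<?php
// Sample data shaped like the linksWithDetails output shown above.
// The second entry is invented for this example.
$links = [
    [
        'url' => 'https://placekitten.com/456/500',
        'rel' => 'ugc',
        'isNofollow' => false,
        'isUGC' => true,
    ],
    [
        'url' => 'https://example.com/about',
        'rel' => null,
        'isNofollow' => false,
        'isUGC' => false,
    ],
];

// Keep only links flagged as user-generated content.
$ugcLinks = array_filter($links, fn (array $link): bool => $link['isUGC']);

// Reduce the filtered entries to a plain list of URLs.
$ugcUrls = array_column($ugcLinks, 'url');

print_r($ugcUrls);
```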
If there aren’t any matching elements (here links) on the page, an empty array will be returned. If a method normally returns a string, it may return null instead. Details such as follow_redirects are optional configuration parameters (see below).
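Calling code should therefore be prepared for both cases. A minimal sketch follows, using stand-in values rather than a live request; the variable names and the placeholder text are assumptions for illustration.

```php
<?php
// Stand-ins for values a scraper might return when nothing matches.
// With PHPScraper these would come from calls such as $web->links or $web->title.
$links = [];   // no matching elements: empty array
$title = null; // no matching string: null

// The null coalescing operator avoids an "undefined array key" warning.
$firstLink = $links[0] ?? null;

// Fall back to a placeholder when the title is missing.
$pageTitle = $title ?? '(untitled)';

if ($links === []) {
    echo 'No links found', PHP_EOL;
}
echo $pageTitle, PHP_EOL;
```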
Most of the DOM should be covered by the available methods. A full list of methods with example code can be found on phpscraper.de.
Besides processing the content on the page itself, you can download files using fetchAsset:
// Absolute URL
$csvString = $web->fetchAsset('https://test-pages.phpscraper.de/test.csv');
// Relative URL after navigation
$csvString = $web
->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html')
->fetchAsset('/test.csv');
You will only need to write the content into a file or cloud storage.
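Writing the downloaded string to disk could look like the sketch below. It uses a stand-in CSV string in place of a real fetchAsset result, and the target path is a hypothetical choice for this example.

```php
<?php
// Stand-in for the string returned by $web->fetchAsset(...).
$csvString = "id,name\n1,Alice\n2,Bob\n";

// Hypothetical target path; adjust to your project.
$path = sys_get_temp_dir() . '/test.csv';

// file_put_contents returns the number of bytes written, or false on failure.
$bytes = file_put_contents($path, $csvString);
if ($bytes === false) {
    throw new \RuntimeException("Could not write to {$path}");
}

echo "Wrote {$bytes} bytes to {$path}", PHP_EOL;
```

For cloud storage, the same string would be handed to the storage client of your choice instead of file_put_contents.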
There is more!
There are plenty of examples on the PHPScraper website and in the tests.
For more details, please visit GitHub.
Published at : 01-12-2022
I am a results-driven professional with 12+ years of combined experience in web application development (especially Laravel), native Android application development in Java, and desktop application development on the .NET Framework. I now manage a team of expert developers at Codebrisk.