
PHPScraper - A Universal PHP Web Tool for Scraping


PHPScraper is a universal web utility for PHP. Its main goal is to get things done without getting distracted by selectors, preparing and converting data structures, and so on. You simply go to a website and pull the information that is relevant for your project.

Basics: Flexible Calling as an Attribute or Method

All scraping functionality can be accessed either as a method call or as a property. For example, the title can be accessed in two ways:

// Prep
$web = new \Spekulatius\PHPScraper\PHPScraper;
$web->go('https://google.com');

// Returns "Google"
echo $web->title;

// Also returns "Google"
echo $web->title();
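
In PHP, this kind of dual access is commonly built on the `__get` magic method. The following is a minimal sketch of that general technique and is not taken from PHPScraper's source; the `Page` class and its regex-based `title()` method are hypothetical and exist only for illustration.

```php
<?php

// Hypothetical class for illustration only; not PHPScraper's implementation.
class Page
{
    private string $html;

    public function __construct(string $html)
    {
        $this->html = $html;
    }

    // Regular method: returns the <title> text, or null if there is none.
    public function title(): ?string
    {
        return preg_match('/<title>(.*?)<\/title>/is', $this->html, $m)
            ? trim($m[1])
            : null;
    }

    // Invoked when an undefined or inaccessible property is read.
    // Forwards $page->title to $page->title().
    public function __get(string $name)
    {
        if (method_exists($this, $name)) {
            return $this->$name();
        }

        throw new BadMethodCallException("Unknown property: {$name}");
    }
}

$page = new Page('<html><head><title>Example</title></head></html>');

echo $page->title, "\n";   // property-style access
echo $page->title(), "\n"; // method call
```

Both lines print the same value because the property read is routed to the method of the same name.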

Batteries included: Meta data, Links, Images, Headings, Content, Keywords, …

Many common use cases are already covered. There are prepared extractors for various HTML tags, including their most relevant attributes, and you can filter and combine the results as needed. Some extractors come in a simple and a detailed version, as with linksWithDetails here:

$web = new \Spekulatius\PHPScraper\PHPScraper;

// Contains:
// <a href="https://placekitten.com/456/500" rel="ugc">
//   <img src="https://placekitten.com/456/400">
//   <img src="https://placekitten.com/456/300">
// </a>
$web->go('https://test-pages.phpscraper.de/links/image-urls.html');

// Get the first link on the page and print the result
print_r($web->linksWithDetails[0]);
// [
//     'url' => 'https://placekitten.com/456/500',
//     'protocol' => 'https',
//     'text' => '',
//     'title' => null,
//     'target' => null,
//     'rel' => 'ugc',
//     'image' => [
//         'https://placekitten.com/456/400',
//         'https://placekitten.com/456/300'
//     ],
//     'isNofollow' => false,
//     'isUGC' => true,
//     'isSponsored' => false,
//     'isMe' => false,
//     'isNoopener' => false,
//     'isNoreferrer' => false,
// ]

If there are no matching elements (here, links) on the page, an empty array is returned. A method that normally returns a string may return null instead. Details such as follow_redirects are optional configuration parameters.
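
Because results can be an empty array or null, it may help to guard against both before using them. The sketch below uses plain stand-in values rather than a live scrape; `$links` and `$title` are assumptions that mimic a page with no links and no title.

```php
<?php

// Stand-ins for scraper results (hypothetical; no network call is made):
// an empty array when nothing matched, and null for a missing string.
$links = [];
$title = null;

// Guard list results before indexing into them.
$firstLink = $links[0] ?? null;

// Fall back to a default when a string result is null.
$displayTitle = $title ?? '(no title)';

echo $firstLink === null ? "No links found\n" : $firstLink['url'] . "\n";
echo $displayTitle . "\n";
```

The null coalescing operator (`??`) avoids an undefined-index warning on the empty array and supplies a default for the missing string.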

Most of the DOM should be covered using these methods:

Several meta tags and other meta information
Social-Media information like Twitter Card and Facebook Open Graph
Content: Headings, Outline, Texts and Lists
Images
Links
Keywords

A full list of methods with example code can be found on phpscraper.de.

Download Files

Besides processing the content on the page itself, you can download files using fetchAsset:

// Absolute URL
$csvString = $web->fetchAsset('https://test-pages.phpscraper.de/test.csv');

// Relative URL after navigation
$csvString = $web
  ->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html')
  ->fetchAsset('/test.csv');

You then only need to write the content to a file or to cloud storage.
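
As a minimal sketch of that last step, the downloaded string can be written to disk with PHP's built-in `file_put_contents`. The sample CSV data and the target path in the temp directory are assumptions for illustration; in practice `$csvString` would hold the value returned by fetchAsset.

```php
<?php

// Stand-in for the string returned by fetchAsset() (hypothetical sample data).
$csvString = "name,age\nAlice,30\n";

// Hypothetical target path in the system temp directory.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'test.csv';

// file_put_contents() returns the number of bytes written, or false on failure.
$bytes = file_put_contents($path, $csvString);
if ($bytes === false) {
    throw new RuntimeException("Could not write to {$path}");
}

echo "Wrote {$bytes} bytes to {$path}\n";
```

Checking the return value against `false` matters here, since a failed write would otherwise go unnoticed.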

There is more!

There are plenty of examples on the PHPScraper website and in the tests.

For more details, please visit GitHub.

Published at : 01-12-2022

AUTHOR

Rizwan Aslam

I am a highly results-driven professional with 12+ years of combined experience in web application development (especially Laravel), native Android application development in Java, and desktop application development with the .NET Framework. I now manage a team of expert developers at Codebrisk.
