Roach is a complete web scraping toolkit for PHP. It is heavily inspired by the popular Scrapy package for Python. Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler but includes an entire pipeline to clean, persist and otherwise process extracted data as well. It’s your all-in-one resource for web scraping in PHP.
The Laravel adapter mostly provides the necessary container bindings for the various services Roach uses, and makes certain configuration options available via a config file. To learn how to actually start using Roach itself, check out the rest of the documentation.
Note: The Laravel integration for Roach requires Laravel 9.x.
Instead of installing the core Roach package, we are going to install Roach’s Laravel adapter.
composer require roach-php/laravel
We can also publish the configuration file that comes with the package.
php artisan vendor:publish --provider='RoachPHP\Laravel\RoachServiceProvider'
This will publish a roach.php configuration file to our app’s config folder.
The Laravel adapter of Roach registers a few Artisan commands to make our development experience as pleasant as possible.
Generating new Spiders
To quickly stub out a new spider, we can use the roach:spider Artisan command.
php artisan roach:spider LaravelDocsSpider
This command will create a new spider with the provided name inside our app’s Spider directory.
Check out the section about getting started with spiders to learn about how to proceed from this point.
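For orientation, a spider usually looks something like the sketch below. It is an illustration, not the exact stub the command generates: the App\Spiders namespace and the h1 selector are assumptions, and the start URL is simply the documentation page used elsewhere in this post. BasicSpider, $startUrls, parse() and item() come from the core Roach package.

```php
<?php

// Assumed namespace; adjust it to wherever roach:spider placed the file.
namespace App\Spiders;

use Generator;
use RoachPHP\Http\Response;
use RoachPHP\Spider\BasicSpider;

class LaravelDocsSpider extends BasicSpider
{
    /**
     * URLs the spider requests when a run starts.
     *
     * @var string[]
     */
    public array $startUrls = [
        'https://roach-php.dev/docs/introduction',
    ];

    /**
     * Called for every response to the start URLs.
     * Each yielded item is sent through the item pipeline.
     */
    public function parse(Response $response): Generator
    {
        // Illustrative selector: grab the text of the first <h1> on the page.
        $title = $response->filter('h1')->text();

        yield $this->item([
            'title' => $title,
        ]);
    }
}
```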
Starting the REPL
Roach ships with an interactive shell (often called a Read-Evaluate-Print-Loop, or REPL for short) which makes prototyping our spiders a breeze. We can use the provided roach:shell command to launch a new REPL session.
php artisan roach:shell https://roach-php.dev/docs/introduction
Running a Spider
To start a run for a given spider directly from the CLI, we can use the roach:run command and pass it the name of our spider.
php artisan roach:run MySpider
It’s also possible to pass in a relative namespace.
php artisan roach:run Secret\\MySpider
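A run can also be started from application code rather than the CLI. The sketch below wraps it in a custom Artisan command, assuming a LaravelDocsSpider class in an App\Spiders namespace. The ScrapeDocs class and its docs:scrape signature are hypothetical names chosen for illustration. Roach::collectSpider() is part of the core Roach package and returns the scraped items.

```php
<?php

namespace App\Console\Commands;

use App\Spiders\LaravelDocsSpider; // assumed location of the spider
use Illuminate\Console\Command;
use RoachPHP\Roach;

// Hypothetical command; the class name and signature are illustrative only.
class ScrapeDocs extends Command
{
    protected $signature = 'docs:scrape';

    protected $description = 'Run the LaravelDocsSpider and print the scraped items';

    public function handle(): int
    {
        // Runs the spider and returns every item that made it through the pipeline.
        $items = Roach::collectSpider(LaravelDocsSpider::class);

        foreach ($items as $item) {
            // all() exposes the item's data as a plain array.
            $this->line(json_encode($item->all()));
        }

        return self::SUCCESS;
    }
}
```

If the scraped items are not needed in the calling code, Roach::startSpider() starts the run without collecting them.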
Spiders
Define how websites get crawled and how data is scraped from their pages.
Scraping Responses
Learn how to extract data from web documents and send them through the processing pipeline.
Items
An abstraction for data extracted from web documents.
Item Pipeline
Process extracted data by sending it through a series of sequential steps.
Interactive Shell
Quickly prototype spiders with Roach’s interactive shell.
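To make the item pipeline idea from the list above more concrete, here is a conceptual sketch in plain PHP. It does not use Roach's own processor classes. The runPipeline() function and the two steps are hypothetical and only show how an item can be passed through sequential steps, each one receiving the output of the previous one.

```php
<?php

declare(strict_types=1);

// Conceptual sketch only: these helpers are hypothetical and are not Roach's API.
// An "item" is modelled as a plain array, a "step" as a callable returning an array.

// Step 1: trim surrounding whitespace from every string field.
$trimFields = static fn (array $item): array => array_map(
    static fn ($value) => is_string($value) ? trim($value) : $value,
    $item,
);

// Step 2: add a URL-friendly slug derived from the title.
$addSlug = static function (array $item): array {
    $slug = strtolower((string) preg_replace('/[^A-Za-z0-9]+/', '-', $item['title']));
    $item['slug'] = trim($slug, '-');

    return $item;
};

/**
 * Sends an item through the given steps in order,
 * feeding each step's output into the next one.
 */
function runPipeline(array $item, callable ...$steps): array
{
    return array_reduce(
        $steps,
        static fn (array $carry, callable $step): array => $step($carry),
        $item,
    );
}

$processed = runPipeline(['title' => '  Getting Started  '], $trimFields, $addSlug);
// $processed now holds the trimmed title plus a generated "slug" field.
```

Keeping each step small and single-purpose makes it easy to reorder steps or swap one out.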
The above is a basic introduction to this package. If you want to learn more, you can visit its complete documentation here.
Note
The team of Codebrisk Laravel developers is always ready to execute even your boldest ideas. Our expert team can design and develop any type of custom CRM solution, SAAS app, or e-commerce app to meet our customer’s needs and transform our customer’s experiences. Get in touch with our team to discuss your bespoke ideas and learn more about the next steps to launching cooperation.
Published at : 25-07-2022
I am a highly results-driven professional with 12+ years of collective experience in web application development, especially in Laravel, native Android application development in Java, and desktop application development in the .NET Framework. I now manage a team of expert developers at Codebrisk.