Laravel Adapter for Roach - A Web Scraping Toolkit for PHP

Roach is a complete web scraping toolkit for PHP. It is heavily inspired by the popular Scrapy package for Python. Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler but includes an entire pipeline to clean, persist and otherwise process extracted data as well. It’s your all-in-one resource for web scraping in PHP.

The Laravel adapter for Roach

The Laravel adapter mostly provides the necessary container bindings for the various services Roach uses, and makes certain configuration options available via a config file. To learn how to actually start using Roach itself, check out the rest of the Roach documentation.

Installing the Laravel Adapter

Note: The Laravel integration for Roach requires Laravel 9.x.

Instead of installing the core Roach package, we are going to install Roach’s Laravel adapter.

composer require roach-php/laravel

We can also publish the configuration file that comes with the package.

php artisan vendor:publish --provider='RoachPHP\Laravel\RoachServiceProvider'

This will publish a roach.php configuration file to our app’s config folder.

Available Commands

The Laravel adapter for Roach registers a few Artisan commands to make our development experience as pleasant as possible.

Generating new Spiders

To quickly stub out a new spider, we can use the roach:spider Artisan command.

php artisan roach:spider LaravelDocsSpider

This command will create a new spider with the provided name inside our app’s Spiders directory.

Check out the section about getting started with spiders to learn about how to proceed from this point.
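
As a rough idea of where this leads, here is a minimal sketch of what the stubbed-out spider could look like once it is filled in. It assumes the class lives in the App\Spiders namespace and extends Roach’s BasicSpider; the start URL is borrowed from the shell example in this post, and the h1 selector is only a guess at the page’s markup, so adjust both to the site being scraped.

```php
<?php

namespace App\Spiders;

use Generator;
use RoachPHP\Http\Response;
use RoachPHP\Spider\BasicSpider;

// Sketch only: namespace, start URL and selector are assumptions for illustration.
class LaravelDocsSpider extends BasicSpider
{
    /**
     * The URLs the spider requests when a run starts.
     *
     * @var string[]
     */
    public array $startUrls = [
        'https://roach-php.dev/docs/introduction',
    ];

    /**
     * Called for each response of the start URLs.
     * Yields one item holding the page's main heading.
     */
    public function parse(Response $response): Generator
    {
        // Assumes the page has an <h1> element; text() throws if nothing matches.
        $title = $response->filter('h1')->text();

        yield $this->item([
            'title' => $title,
        ]);
    }
}
```

Every item yielded from parse() is handed to the item pipeline, which is where cleaning and persisting the data would take place.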

Starting the REPL

Roach ships with an interactive shell (often called a read-eval-print loop, or REPL for short), which makes prototyping our spiders a breeze. We can use the provided roach:shell command to launch a new REPL session for a given URL.

php artisan roach:shell https://roach-php.dev/docs/introduction

Running a Spider

To start a run for a given spider directly from the CLI, we can use the roach:run command and pass it the name of our spider.

php artisan roach:run MySpider

It’s also possible to pass in a relative namespace.

php artisan roach:run Secret\\MySpider
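
To make the idea of a relative namespace more concrete, here is a small, self-contained sketch of how such a name could be expanded into a fully qualified class name. The resolveSpiderClass() helper is hypothetical and not part of Roach, and the App\Spiders base namespace is an assumption; the adapter’s actual lookup may work differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Roach): expands a spider name given
 * relative to a base namespace into a fully qualified class name.
 */
function resolveSpiderClass(string $name, string $baseNamespace = 'App\\Spiders'): string
{
    // Trim stray separators so the two parts join with exactly one backslash.
    return rtrim($baseNamespace, '\\') . '\\' . ltrim($name, '\\');
}

echo resolveSpiderClass('MySpider'), PHP_EOL;         // App\Spiders\MySpider
echo resolveSpiderClass('Secret\\MySpider'), PHP_EOL; // App\Spiders\Secret\MySpider
```

The doubled backslash in the Artisan command is there because most shells treat an unquoted backslash as an escape character, so Secret\\MySpider reaches PHP as Secret\MySpider.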

Spiders

Define how websites get crawled and how data is scraped from their pages.

Scraping Responses

Learn how to extract data from web documents and send them through the processing pipeline.

Items

An abstraction for data extracted from web documents.

Item Pipeline

Process extracted data by sending it through a series of sequential steps.

Interactive Shell

Quickly prototype spiders with Roach’s interactive shell.

This covers the basics of the package. If you want to learn more, you can visit the complete Roach documentation.


Published at : 25-07-2022

Author : Rizwan Aslam

I am a results-driven professional with 12+ years of combined experience in web application development (especially Laravel), native Android application development in Java, and desktop application development with the .NET Framework. I now manage a team of expert developers at Codebrisk.
