
Sainsbury’s Software Engineering Test


This task is intended to test your ability to consume a webpage, process some data and present it.

Using best practice coding methods, build a console application that scrapes the Sainsbury’s grocery site - Ripe Fruits page and returns a JSON array of all the products on the page.

You need to follow each product link and include the size (in kb) of the linked HTML page (no assets) and the description in the JSON output.

Each element in the JSON results array should contain title, unit_price, size and description keys corresponding to items in the table. Additionally, there should be a total field which is a sum of all unit prices on the page.

The link to use is: http://hiring-tests.s3-website-eu-west-1.amazonaws.com/2015_Developer_Scrape/5_products.html

Example JSON:

{
   "results":[
      {
         "title":"Sainsbury's Avocado, Ripe & Ready x2",
         "size":"90.6kb",
         "unit_price":1.80,
         "description":"Great to eat now - refrigerate at home 1 of 5 a day 1 avocado counts as 1 of your 5..."
      },
      {
         "title":"Sainsbury's Avocado, Ripe & Ready x4",
         "size":"87kb",
         "unit_price":2.00,
         "description":"Great to eat now - refrigerate at home 1 of 5 a day 1 "
      }
   ],
   "total":3.80
}
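
A minimal sketch of how the linked page size (in kb) and the total field could be computed. It assumes Guzzle for the HTTP calls, and the function names are illustrative only, not part of this codebase:

    <?php

    use GuzzleHttp\Client;

    require __DIR__ . '/vendor/autoload.php';

    // Size of the linked HTML document only (no assets), formatted in kb.
    function pageSizeInKb(Client $client, string $url): string
    {
        $html = (string) $client->get($url)->getBody();

        return round(strlen($html) / 1024, 1) . 'kb';
    }

    // The "total" field is the sum of all unit_price values in the results array.
    function totalUnitPrice(array $results): float
    {
        return round(array_sum(array_column($results, 'unit_price')), 2);
    }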

Installation

Clone the repository and install dependencies:

    ./composer.phar install

Run

Just launch the default command below in your shell:

    bin/scraper

Optionally, you can pass a specific grocery list URL:

    bin/scraper products-scraper http://another-sainsburys-grocery-list-url

You can get formatted JSON output using the --pretty option:

    bin/scraper products-scraper --pretty

Tests

PHPUnit

    bin/phpunit

Behat

    bin/behat

Design

I designed the app as per the requirements: the code should be as concise as possible and get straight to the point, without giving up decoupling. I could have used a dependency injection container, even a small one like Pimple, but I preferred to keep things simple and wire the objects together directly in the app (see the sketch after the Structure list below).

Structure

  • Command: ProductsScraperCommand is responsible for starting the app and handling the command options
  • Service: ProductsInfoScraper calls the scrapers and collects the product info
  • Scrapers: ProductListScraper and ProductDetailScraper
  • Models: Product, Products and Url are the main objects in the domain
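
A rough illustration of that manual wiring, using the classes listed above (the constructor signatures are assumptions and may not match the actual code):

    <?php

    // Illustrative wiring only; the real constructors may take different arguments.
    $client = new GuzzleHttp\Client();

    $listScraper   = new ProductListScraper($client);
    $detailScraper = new ProductDetailScraper($client);
    $infoScraper   = new ProductsInfoScraper($listScraper, $detailScraper);

    // The console command receives the fully wired service.
    $command = new ProductsScraperCommand($infoScraper);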

