I have worked for Evo Pricing exclusively and remotely from July 2015 to the present.

This work involved building a smart Scrapy spider that is trainable via a Django web GUI. The GUI allows a semi-technical user (one comfortable with XPath/JSONPath) to configure the spider for different retail target sites, including the dynamic creation of a database schema to store data on a per-target basis, custom loading functions and error-emailing triggers. The spider handles targets that use AJAX or pagination, and even deals with JSON responses.
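
A minimal sketch of that configuration-driven pattern is shown below; the `TargetConfig` model, its field names and the pagination attribute are hypothetical stand-ins for the real GUI-edited configuration:

```python
import scrapy

class TrainableSpider(scrapy.Spider):
    """Generic spider driven entirely by per-target configuration rows."""
    name = "trainable"

    def __init__(self, target_id=None, *args, **kwargs):
        super(TrainableSpider, self).__init__(*args, **kwargs)
        # TargetConfig is a hypothetical Django model edited through the
        # web GUI (assumes Django has been set up in this process)
        from targets.models import TargetConfig
        self.config = TargetConfig.objects.get(pk=target_id)

    def start_requests(self):
        yield scrapy.Request(self.config.start_url)

    def parse(self, response):
        # each field row pairs a column name with a user-supplied XPath
        for node in response.xpath(self.config.item_xpath):
            yield {field.name: node.xpath(field.xpath).get()
                   for field in self.config.fields.all()}
        # follow user-configured pagination, if present
        next_url = response.xpath(self.config.next_page_xpath).get()
        if next_url:
            yield response.follow(next_url, callback=self.parse)
```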

Through a RESTful JSON API, the user can schedule periodic runs of the spider for their target sites of choice, distribute scrapes over multiple nodes of a cluster and get statistics on how the jobs went (achieved via a custom Celery monitor and Django REST Framework). Images are stored centrally on Azure via a custom images pipeline that builds upon Scrapy's S3 support.
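
The job statistics could be gathered along these lines using Celery's event protocol; this is a sketch only, with the broker URL as a placeholder and a print statement standing in for the persistence the real monitor fed to the REST API:

```python
from celery import Celery

app = Celery(broker="amqp://guest@localhost//")  # placeholder broker URL

def run_monitor():
    state = app.events.State()

    def record(event):
        state.event(event)
        task = state.tasks.get(event["uuid"])
        # the real monitor persisted these stats to models served by the
        # Django REST Framework API; printing stands in for that here
        print("%s [%s] runtime=%s" % (task.name, event["type"],
                                      event.get("runtime")))

    with app.connection() as connection:
        receiver = app.events.Receiver(connection, handlers={
            "task-succeeded": record,
            "task-failed": record,
        })
        receiver.capture(limit=None, timeout=None, wakeup=True)

if __name__ == "__main__":
    run_monitor()
```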

I also built hotel spiders that recursively parse large regions of a map, as well as other simple spiders, such as one that gathers historical weather data.
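
A sketch of the recursive map strategy, assuming a hypothetical JSON map-search endpoint and a made-up cap on results per query:

```python
import json
import scrapy

class HotelMapSpider(scrapy.Spider):
    """Recursively splits a map bounding box into quadrants until each
    tile returns fewer results than the site's per-query cap."""
    name = "hotel_map"
    MAX_RESULTS = 200  # assumed cap on results per map query

    def start_requests(self):
        # the starting box is illustrative (roughly Italy)
        yield self.box_request({"lat_min": 35.0, "lat_max": 47.0,
                                "lon_min": 6.0, "lon_max": 19.0})

    def box_request(self, box):
        # the endpoint and its parameters are placeholders
        url = ("https://example.com/map-search"
               "?lat_min={lat_min}&lat_max={lat_max}"
               "&lon_min={lon_min}&lon_max={lon_max}").format(**box)
        return scrapy.Request(url, callback=self.parse, meta={"box": box})

    def parse(self, response):
        box = response.meta["box"]
        hotels = json.loads(response.text)["results"]  # assumed JSON shape
        if len(hotels) < self.MAX_RESULTS:
            for hotel in hotels:
                yield hotel
        else:
            # too dense: split into four quadrants and recurse
            mid_lat = (box["lat_min"] + box["lat_max"]) / 2.0
            mid_lon = (box["lon_min"] + box["lon_max"]) / 2.0
            for lat_lo, lat_hi in ((box["lat_min"], mid_lat),
                                   (mid_lat, box["lat_max"])):
                for lon_lo, lon_hi in ((box["lon_min"], mid_lon),
                                       (mid_lon, box["lon_max"])):
                    yield self.box_request({"lat_min": lat_lo, "lat_max": lat_hi,
                                            "lon_min": lon_lo, "lon_max": lon_hi})
```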

I was involved with DevOps too, helping to build an Ansible playbook for automatic provisioning of the Linux servers. Fabric was used for deployment and for other tasks that needed automating across multiple nodes.
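
A multi-node deployment task with Fabric might look roughly like this (Fabric 1.x API; the host names, paths and supervisor group are illustrative placeholders):

```python
# fabfile.py: a sketch of a fan-out deployment task
from fabric.api import cd, env, run, sudo, task

env.hosts = ["node1.example.com", "node2.example.com"]  # placeholder nodes

@task
def deploy(branch="master"):
    """Update the checkout and restart the workers on every node."""
    with cd("/srv/scrapers"):  # assumed project path
        run("git fetch origin && git checkout %s && git pull" % branch)
        run("venv/bin/pip install -r requirements.txt")
        sudo("supervisorctl restart celery:*")  # assumed supervisor group
```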

The work mostly involved Django, Scrapy, MSSQL and Celery, but also used jQuery, Bootstrap, Rundeck, MongoDB and Azure Storage.
