I need a web scraper written for the following URL:
[login to view URL]
All pages must be retrieved, not just page one.
The data on this site changes and the number of pages varies, so the scraper must collect data from every available page.
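One way to meet the all-pages requirement is sketched below in Perl, under some assumptions. The site is assumed to expose a "next page" link on every page except the last. The page-fetching code is passed in as a code ref. The name scrape_all_pages and the fetcher's return shape (\@rows, $next_url) are hypothetical. The loop stops when there is no next page. It also refuses to revisit a URL and aborts after a page cap, so an unattended run cannot loop forever.

```perl
use strict;
use warnings;

# Hypothetical pagination driver. $fetch_page is a code ref supplied by the
# caller; it takes a page URL and returns (\@rows, $next_url), where
# $next_url is undef on the last page. That contract is an assumption.
sub scrape_all_pages {
    my ($start_url, $fetch_page, $max_pages) = @_;
    $max_pages //= 500;    # arbitrary safety cap for unattended runs

    my (@all_rows, %seen);
    my $url = $start_url;

    # Stop when there is no next page, or when a URL repeats (pagination loop).
    while (defined $url && !$seen{$url}++) {
        die "More than $max_pages pages; pagination may be looping\n"
            if keys(%seen) > $max_pages;
        my ($rows, $next_url) = $fetch_page->($url);
        push @all_rows, @{ $rows // [] };
        $url = $next_url;
    }
    return \@all_rows;
}
```

In the real script, the fetcher could wrap WWW::Mechanize. It would call get() on the URL, parse the results table, and use find_link(text_regex => qr/next/i) to locate the following page. Whether the site actually labels that link "Next" is an assumption to check against the live pages.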
The output should be a pipe-delimited (|) file with the following column mappings:
origin_city --> data is located in the "Origin City" column
origin_state --> data is located in the "Origin State" column
ship_date --> data is located in the "Ship Date" column, converted to YYYY-MM-DD format
destination_city --> data is located in the "Destination City" column
destination_state --> data is located in the "Destination State" column
receive_date --> data is located in the "Delivery Date" column, converted to YYYY-MM-DD format
trailer_type --> data is located in the "Eq Type" column; if blank, use the text "VR"
load_size --> always set to the text "Full"
weight --> leave blank
length --> leave blank
width --> leave blank
height --> leave blank
trip_miles --> leave blank
pay_rate --> leave blank
contact_phone --> leave blank
contact_name --> leave blank
tarp_required --> leave blank
comment --> leave blank
load_number --> data is located in the "Order#" column
commodity --> leave blank
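A minimal sketch of the mapping above rests on two assumptions. First, each scraped table row has already been read into a hash keyed by the site's column headings. Second, dates appear as MM/DD/YYYY or MM/DD/YY. The site's actual date format is not stated here and must be confirmed. The helper names to_iso_date and map_row are hypothetical. The sketch uses strict and warnings so it runs on a stock Perl; the deliverable would use Modern::Perl instead, which enables both.

```perl
use strict;
use warnings;

# Output columns, in the order listed in the requirements.
my @OUTPUT_COLUMNS = qw(
    origin_city origin_state ship_date destination_city destination_state
    receive_date trailer_type load_size weight length width height
    trip_miles pay_rate contact_phone contact_name tarp_required comment
    load_number commodity
);

# Hypothetical helper: convert MM/DD/YYYY or MM/DD/YY to YYYY-MM-DD.
# The source date format is an assumption; anything unrecognised
# becomes an empty string rather than a placeholder word.
sub to_iso_date {
    my ($date) = @_;
    return '' unless defined $date;
    return '' unless $date =~ m{^\s*(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})\s*$};
    my ($month, $day, $year) = ($1, $2, $3);
    $year += 2000 if length($year) == 2;    # assumes two-digit years are 20xx
    return sprintf '%04d-%02d-%02d', $year, $month, $day;
}

# Hypothetical helper: map one scraped row (hashref keyed by the site's
# column headings) to a hashref keyed by the output column names.
sub map_row {
    my ($src) = @_;

    # Trimmed value of a source column, or '' if it is missing.
    my $get = sub {
        my $v = $src->{ $_[0] };
        return '' unless defined $v;
        $v =~ s/^\s+|\s+$//g;
        return $v;
    };

    my %out = map { $_ => '' } @OUTPUT_COLUMNS;    # "leave blank" columns stay empty
    $out{origin_city}       = $get->('Origin City');
    $out{origin_state}      = $get->('Origin State');
    $out{ship_date}         = to_iso_date($get->('Ship Date'));
    $out{destination_city}  = $get->('Destination City');
    $out{destination_state} = $get->('Destination State');
    $out{receive_date}      = to_iso_date($get->('Delivery Date'));
    my $eq_type             = $get->('Eq Type');
    $out{trailer_type}      = length($eq_type) ? $eq_type : 'VR';
    $out{load_size}         = 'Full';
    $out{load_number}       = $get->('Order#');
    return \%out;
}
```

Every output column starts out as an empty string, so the "leave blank" columns never need individual handling.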
The first line of the output should contain all of the column headers.
Any field that contains no data should be left blank.
Please do not use words like "null" or "blank" in blank columns.
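The header and empty-field rules can be sketched as follows, as an illustration rather than a required design. Undefined values are written as empty strings. Any "|" or line break inside a value is replaced with a space so it cannot shift the columns. That sanitising rule is an assumption and is not part of the requirements above. The helper names to_pipe_line and to_pipe_lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: format one record as a pipe-delimited line.
# Missing or undefined values become empty strings, never words such as
# "null"; '|' and line breaks inside a value become a space so they
# cannot shift the columns (an assumed sanitising rule).
sub to_pipe_line {
    my ($columns, $record) = @_;
    my @fields = map {
        my $value = $record->{$_} // '';
        $value =~ s/[|\r\n]+/ /g;
        $value;
    } @$columns;
    return join '|', @fields;
}

# Hypothetical helper: the header line first, then one line per record.
sub to_pipe_lines {
    my ($columns, @records) = @_;
    return (join('|', @$columns), map { to_pipe_line($columns, $_) } @records);
}
```

In the script, these lines would then be printed to the output file, one per line, with the column list passed in the required order.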
Below is a sample output of the first 5 columns using sample data:
The deliverable will be a Perl .pl file that must run on
Ubuntu Linux and must use Modern::Perl. The Perl .pl file
should be called '[login to view URL]' and the output file should be
called '[login to view URL]'.
It will be scheduled in cron to run unattended every 15 minutes.
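Because the script runs unattended every 15 minutes, two safeguards are worth considering. Neither is stated as a requirement above. The first is an exclusive non-blocking lock, so a slow run is not overlapped by the next cron invocation. The second is writing the output to a temporary file in the same directory and then renaming it into place, so anything reading the output file never sees a half-written version. A hedged sketch follows; the helper names acquire_lock and write_atomically and the temp-file prefix are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical helper: try to take an exclusive lock without waiting.
# Returns the lock handle (keep it open for the whole run) or undef if
# another run still holds the lock.
sub acquire_lock {
    my ($lock_path) = @_;
    open my $lock_fh, '>>', $lock_path or die "Cannot open $lock_path: $!";
    return flock($lock_fh, LOCK_EX | LOCK_NB) ? $lock_fh : undef;
}

# Hypothetical helper: write @lines to a temp file next to $path, then
# rename it over $path. rename() is atomic within one filesystem, so a
# reader sees either the old file or the complete new one.
sub write_atomically {
    my ($path, @lines) = @_;
    my ($tmp_fh, $tmp_path) =
        tempfile('.scrape-XXXXXX', DIR => dirname($path), UNLINK => 0);
    print {$tmp_fh} map { "$_\n" } @lines or die "Write to $tmp_path failed: $!";
    close $tmp_fh or die "Close of $tmp_path failed: $!";
    chmod 0644, $tmp_path or die "chmod of $tmp_path failed: $!";    # tempfile creates files as 0600
    rename $tmp_path, $path or die "Rename to $path failed: $!";
    return;
}
```

A run would typically call acquire_lock first and exit quietly if it returns undef. It would keep the returned handle open until the output has been written.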
We suggest WWW::Mechanize but you are free to use other Perl libraries.
Please specify what language/OS/tools you will be using in your bid.
Also, please include the word "raccoon" in your bid so I know that
you read this description.