I need a web scraper written for the URL:
[login to view URL]
The blue box that says "ZACH'S LOAD BOARD" will need to be clicked.
All information needed is available on the main page. The number of rows will vary. If there is a row without an origin city, skip that row.
Data will be listed in blocks, each with its own contact information. The contact information will be located above the block of data it applies to.
The output should be a pipe (|) delimited file with the following column mappings:
origin_city --> data located in the "Pickup Location" column, before the space or comma and the two letter state abbreviation
origin_state --> data is the two letter state abbreviation located after the origin_city and space or comma in the "Pickup Location" column
ship_date --> data located in the "Pick Up Date" column, converted to the YYYY-MM-DD format; if the text "Daily" is in the column, use the current day's date in the YYYY-MM-DD format
destination_city --> data located in the "Delivery Location" column, before the space or comma and the two letter state abbreviation
destination_state --> data is the two letter state abbreviation located after the destination_city and space or comma in the "Delivery Location" column
receive_date --> data located in the "Delivery Date" column, change to the YYYY-MM-DD format, use current year if one is not listed; if column contains "ASAP", leave blank
trailer_type --> data located in the "Equipment" column
load_size --> add the text "Full"
weight --> data located in the "Per" column, leave off commas and/or #; if column contains "cwt" or "flat", leave blank
length --> leave blank
width --> leave blank
height --> leave blank
trip_miles --> leave blank
pay_rate --> leave blank
contact_phone --> data located in the contact cell above each block of loads, after the email address (e.g., 719-628-2929)
contact_name --> data located in the contact cell above each block of loads; the contact name is the first text in the contact cell (e.g., Zach)
tarp_required --> leave blank
comment --> data located in the "Type" and "Product" columns, followed by the data located in the "Rate" column with the text "Rate=" added before it
load_number --> leave blank
commodity --> leave blank
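To make the parsing rules in the mappings above concrete, here is a rough Perl sketch of how they could be read. It is only an illustration, not a required implementation. The function names are made up for this example, the input date layout (M/D with an optional year) and the contact cell layout are assumptions about the page, and one date helper handles both "Daily" and "ASAP" for brevity. It uses strict and warnings so it runs without CPAN modules; the deliverable would use Modern::Perl instead.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Split "Denver, CO" or "Denver CO" into (city, state).
# Returns two empty strings when no city plus trailing two-letter state is found.
sub split_location {
    my ($text) = @_;
    $text //= '';
    $text =~ s/^\s+|\s+$//g;
    if ($text =~ /^(.*?)[\s,]+([A-Za-z]{2})$/) {
        return ($1, uc $2);
    }
    return ('', '');
}

# Convert "M/D", "M/D/YY" or "M/D/YYYY" (assumed input layouts) to YYYY-MM-DD.
# "Daily" becomes today's date, "ASAP" becomes blank, a missing year becomes the current year.
# $today is an optional "YYYY-MM-DD" override, mainly useful for testing.
sub normalize_date {
    my ($text, $today) = @_;
    $today //= strftime('%Y-%m-%d', localtime);
    $text  //= '';
    $text =~ s/^\s+|\s+$//g;
    return $today if $text =~ /daily/i;
    return ''     if $text =~ /asap/i;
    if ($text =~ m{^(\d{1,2})[/-](\d{1,2})(?:[/-](\d{2}|\d{4}))?$}) {
        my ($m, $d, $y) = ($1, $2, $3);
        $y //= substr($today, 0, 4);
        $y += 2000 if length($y) == 2;
        return sprintf('%04d-%02d-%02d', $y, $m, $d);
    }
    return '';    # unrecognized text is left blank
}

# Strip commas and '#' from the "Per" column; blank when it contains "cwt" or "flat".
sub clean_weight {
    my ($text) = @_;
    $text //= '';
    return '' if $text =~ /cwt|flat/i;
    $text =~ s/[,#]//g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Pull (name, phone) from a contact cell.
# Assumptions: the name is the first whitespace-delimited word,
# and the phone is the first 10-digit number after the email address.
sub parse_contact {
    my ($cell) = @_;
    $cell //= '';
    my ($name)  = $cell =~ /^\s*([^\s,:]+)/;
    my ($phone) = $cell =~ /\S+@\S+\D*?(\d{3}\D?\d{3}\D?\d{4})/;
    return ($name // '', $phone // '');
}

# Hypothetical inputs, for illustration only.
my ($city, $state) = split_location('Denver, CO');
print join('|', $city, $state, normalize_date('Daily'), clean_weight('45,000#')), "\n";
```

A row whose origin comes back with an empty city from split_location would be skipped, per the rule above.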
The first line of the output should contain all of the column headers.
Any field that contains no data should be left blank.
Please do not use words like "null" or "blank" in blank columns.
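As a rough illustration of the output rules above (header line first, pipe delimiter, truly empty blank fields), something along these lines is what I have in mind. The function names and the sample row are hypothetical, and replacing stray pipes or newlines inside a field with a space is my own assumption rather than a stated requirement.

```perl
use strict;
use warnings;

# Column order taken from the mapping list in this posting.
my @COLUMNS = qw(
    origin_city origin_state ship_date destination_city destination_state
    receive_date trailer_type load_size weight length width height
    trip_miles pay_rate contact_phone contact_name tarp_required comment
    load_number commodity
);

# Header line: all column names joined with pipes.
sub format_header {
    return join '|', @COLUMNS;
}

# Build one output line from a hash reference of field => value.
# Missing or undefined fields become empty strings, never "null" or "blank".
sub format_row {
    my ($row) = @_;
    return join '|', map {
        my $v = $row->{$_} // '';
        $v =~ s/[|\r\n]+/ /g;    # assumption: keep delimiters and newlines out of a field
        $v;
    } @COLUMNS;
}

# Hypothetical sample row, for illustration only.
my %sample = (origin_city => 'Denver', origin_state => 'CO', load_size => 'Full');

print format_header(), "\n";
# Rows without an origin city are skipped.
print format_row(\%sample), "\n" if length($sample{origin_city} // '');
```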
Below is a sample output of the first 5 columns using sample data:
The deliverable will be a Perl .pl file that must run on
Ubuntu Linux and must use Modern::Perl. The Perl .pl file
should be called '[login to view URL]' and the output file should be
called '[login to view URL]'
It will be scheduled in cron to run unattended every 15 minutes.
Please specify which language/OS/modules you plan to use.
Also, please include the word "raccoon" in your bid so I know that
you read this description.