I am looking for someone to write a program that scrapes a horse racing webpage for a race, including all the runners in that race. An example horse is Kauto Star: if you click this link [url removed, login to view] and then click Kauto Star, the popup shows his previous race details.
I need the data scraped/extracted for every horse entered in this race, then output and sorted. The information I want output is listed below, using the last race he ran, on 19 Nov 11, as my example; the full result for each race is listed by the date of the race:
Information under race details:
* min - the shortest distance at which the horse has run and won. This is derived from the number 24, which is in furlongs (see R, W and P below for result details)
* max - the longest distance at which the horse has run and won (see min above and R, W and P below for result details)
* note 1 - if the horse has won a race at course Che, input Cheltenham; otherwise leave it blank. This race was at Hay, so it is blank in this case
* note 2 - if the horse has won a race on heavy ground (Hy), input heavy; otherwise leave it blank. This race was on Gs, so it is blank in this case
* Top W - the heaviest weight the horse has carried when winning a race. The weight for this race is 11-7 (stone and lbs)
* Mark - the highest number under the OR heading when the horse has won a race
* R - the number of runs the horse has had on this type of ground and at this course. In this case the ground is Gs and the course is Hay, so this horse has run 12 times on good/soft (total of Gs) and 5 times at Hay (total of Hay)
* W - the number of wins the horse has from his runs on this ground and at this course. In this example the result is 1/6, so the horse won. A win is indicated by a 1 before the / under the race outcome heading.
* P - the number of places the horse has from his runs on this ground and at this course. In this example the horse won (1/6), so he did not place. A place should be counted if the horse finishes second in a race with 5 or more horses (e.g. 2/5, 2/6); second or third in a race with 8 or more horses (e.g. 3/8, 3/12); second, third or fourth in a race with 20 or more horses (e.g. 4/20, 4/28); or second, third, fourth or fifth in a race with 30 or more horses.
* Succ - the total of W and P divided by R, expressed as a percentage.
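To make the rules in the list above concrete, here is a rough Python sketch of how the columns might be computed once the race history has been scraped. It is only an illustration under assumptions: the `Run` record, its field names, and the functions `outcome`, `win_profile` and `tally` are hypothetical, the course code `"Che"` is assumed, and the actual scraping of the page is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    """One past race. Field names are illustrative, not taken from the site."""
    distance_f: int                  # distance in furlongs, e.g. 24
    course: str                      # course code, e.g. "Hay"
    going: str                       # ground code, e.g. "Gs"
    weight: str                      # "stone-lbs", e.g. "11-7"
    official_rating: Optional[int]   # value under OR, None if not shown
    result: str                      # "position/runners", e.g. "1/6"


def outcome(result: str) -> str:
    """Return "W" for a win, "P" for a place, "" for anything else."""
    position, _, runners = result.partition("/")
    if not (position.isdigit() and runners.isdigit()):
        return ""                    # non-numeric codes such as UR/6
    pos, field = int(position), int(runners)
    if pos == 1:
        return "W"
    # Last placed position for each field size, as in the P rule.
    if field >= 30:
        last_place = 5
    elif field >= 20:
        last_place = 4
    elif field >= 8:
        last_place = 3
    elif field >= 5:
        last_place = 2
    else:
        last_place = 1               # fewer than 5 runners: no places
    return "P" if 2 <= pos <= last_place else ""


def weight_lbs(weight: str) -> int:
    """Convert "11-7" (stone-lbs) to pounds so weights can be compared."""
    stone, lbs = weight.split("-")
    return int(stone) * 14 + int(lbs)


def win_profile(runs):
    """min, max, note 1, note 2, Top W and Mark, from winning runs only."""
    wins = [r for r in runs if outcome(r.result) == "W"]
    ratings = [r.official_rating for r in wins if r.official_rating is not None]
    return {
        "min": min((r.distance_f for r in wins), default=None),
        "max": max((r.distance_f for r in wins), default=None),
        "note 1": "Cheltenham" if any(r.course == "Che" for r in wins) else "",
        "note 2": "heavy" if any(r.going == "Hy" for r in wins) else "",
        "Top W": max((r.weight for r in wins), key=weight_lbs, default=""),
        "Mark": max(ratings, default=None),
    }


def tally(runs, attr, value):
    """R, W, P and Succ for one ground ("going") or one course ("course")."""
    matching = [r for r in runs if getattr(r, attr) == value]
    r_count = len(matching)
    w_count = sum(1 for r in matching if outcome(r.result) == "W")
    p_count = sum(1 for r in matching if outcome(r.result) == "P")
    succ = round(100 * (w_count + p_count) / r_count, 1) if r_count else 0.0
    return {"R": r_count, "W": w_count, "P": p_count, "Succ": succ}
```

With this sketch, `tally(runs, "going", "Gs")` would give the ground totals and `tally(runs, "course", "Hay")` the course totals; a result such as 3/8 counts as a place, while 2/4 does not.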
I would also like columns labelled PU (horse pulled up), U (horse unseated rider) and F (horse fell) between columns R and W. These should contain the totals for each type of ground and each racecourse. The values are found under the result and are always to the left of the number of runners; see the race on 22 Nov 06, where the result is expressed as UR/6.
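As a rough illustration of the PU, U and F columns, the sketch below counts the codes that appear to the left of the number of runners. The function name `non_finish_counts` and the code-to-column mapping are assumptions: only UR is confirmed by the 22 Nov 06 example, and the codes PU and F are guessed from the column names.

```python
# Assumed mapping from result code to output column. Only "UR" is taken
# from the example race; "PU" and "F" are guesses based on the column names.
CODE_TO_COLUMN = {"PU": "PU", "UR": "U", "F": "F"}


def non_finish_counts(results):
    """Count pulled up, unseated and fell results in a list like ["1/6", "UR/6"]."""
    counts = {"PU": 0, "U": 0, "F": 0}
    for result in results:
        # The code is whatever sits to the left of the "/" and runner count.
        code = result.split("/", 1)[0].strip().upper()
        column = CODE_TO_COLUMN.get(code)
        if column is not None:
            counts[column] += 1
    return counts
```

To get the totals for one ground or one racecourse, the list passed in would first be filtered to the results from that ground or course.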
The project should also include the other information in the attached Excel file, all of which comes from the top line of the horse's race history page.
Finally, I would like this program to update itself and to give me the ability to input a URL for any horse I wish to add to this database.
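One possible way to handle the list of horse URLs is sketched below, purely as an assumption about how the database could be organised. It uses Python's built-in `sqlite3` module; the table name `horses` and the functions `open_db`, `add_horse` and `urls_to_refresh` are hypothetical, and the fetching and scraping step that an update would run for each URL is left out.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database that holds the tracked horse URLs."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS horses (url TEXT PRIMARY KEY)")
    return conn


def add_horse(conn, url):
    """Register a horse URL. Returns True if it was new, False if already tracked."""
    url = url.strip()
    already = conn.execute("SELECT 1 FROM horses WHERE url = ?", (url,)).fetchone()
    if already:
        return False
    conn.execute("INSERT INTO horses (url) VALUES (?)", (url,))
    conn.commit()
    return True


def urls_to_refresh(conn):
    """Return every tracked URL; an update run would re-scrape each one."""
    return [row[0] for row in conn.execute("SELECT url FROM horses ORDER BY url")]
```

A scheduled job could then loop over `urls_to_refresh(conn)` and re-scrape each page, which is one way the program could keep itself up to date.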
Please do not quote if you do not understand the project or will not supply a sample of a similar scraping project you have done.
Thanks for taking the time to read this and I look forward to hearing from you. :)