Create a PHP script, to be called from the command line, that parses a small piece of HTML.
Please explain which HTML/XML parsing routines you would use - we want fast code.
- HTML must be passed to the script for processing when the script is called from command line (example html pasted below)
- The script must print the required parsed values to STDOUT (the screen) in CSV format (no header needed, just comma separation)
- Command line examples will be needed to show how the final script is used
Required variables (all extracted from the source to be passed to your script) are:
1 - a raw url (specifically the one where the link target is datamercsexternalframe)
2 - 55 chars of plain text, which in the example below would start "My current Shiny project..." etc.
3 - the raw src url of the first img in the source (allowing for cases where there is more than one)
4 - the file extension of the file specified in item 3 above, e.g. png, jpg, etc.
Example source HTML:
<img src="images/feedsin/2016/04/23/[url removed, login to view]"/>My current Shiny project contains at least five tables and I
constantly forget how they are called. So I whipped up a little
bookmarklet that uses jQuery to show the id of each div and input.
Some of those can be ignored as they are internal names set<p class="trackback"><a class="shortlink " rel="" title="Display element ids for debugging Shiny apps" href="[url removed, login to view]" target="datamercsexternalframe">Read more </a></p>
Native PHP HTML/XML routines will keep this efficient, as this is not a big job; the markup processing is already implemented in core PHP.
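One way the native routines could cover items 1, 3 and 4 is sketched below, using DOMDocument and DOMXPath from PHP's bundled DOM extension. This is an illustration under assumptions, not a finished deliverable: the function name extractFields and its return shape are hypothetical, and the sketch assumes the attribute values are plain ASCII, since loadHTML does not treat input as UTF-8 by default.

```php
<?php
// Hypothetical helper: returns [link url, first img src, img file extension].
function extractFields(string $html): array
{
    // loadHTML rejects an empty string in PHP 8, so bail out early.
    if (trim($html) === '') {
        return ['', '', ''];
    }

    $doc = new DOMDocument();
    // The input is a fragment, not a full page; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Item 1: href of the first link whose target is datamercsexternalframe.
    $link = $xpath->query('//a[@target="datamercsexternalframe"]/@href')->item(0);
    // Item 3: src of the first img in document order, even if there are several.
    $img = $xpath->query('(//img/@src)[1]')->item(0);

    $url = $link ? $link->nodeValue : '';
    $src = $img ? $img->nodeValue : '';

    // Item 4: extension taken from the path part of the src, ignoring any query string.
    $ext = pathinfo((string) parse_url($src, PHP_URL_PATH), PATHINFO_EXTENSION);

    return [$url, $src, $ext];
}
```

For a fragment this small, a single DOM parse plus two XPath queries should be fast enough, and it tolerates attribute order and quoting variations that a regular expression would need to handle explicitly.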
Clarifying point 2 regarding plain text: all tags need to be removed so that the text is entirely human-readable, i.e. tagless / markupless.
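The tag removal for point 2 could look roughly like the sketch below. The function name plainTextExcerpt is hypothetical, and the sketch makes two assumptions the brief does not state: that runs of whitespace and line breaks should collapse to single spaces, and that the mbstring extension is available so the 55 characters are counted as characters rather than bytes.

```php
<?php
// Hypothetical helper: tagless text, cut to the first $length characters.
function plainTextExcerpt(string $html, int $length = 55): string
{
    // Remove all markup, then turn entities such as &amp; back into characters.
    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse line breaks and repeated spaces (assumption, see above).
    // preg_replace returns null on invalid UTF-8, so fall back to the input.
    $text = trim(preg_replace('/\s+/u', ' ', $text) ?? $text);

    return mb_substr($text, 0, $length, 'UTF-8');
}
```

Note that the cut is made at exactly 55 characters, so the result may end mid-word or with a trailing space.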
There will be no need to make HTTP calls - the HTML will be provided at the command line.
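A sketch of the command-line input and CSV output side follows. The helper names readHtmlInput and writeCsvRow are hypothetical, as is the script name parse.php in the comments. The brief does not say whether the HTML arrives as an argument or on STDIN, so the sketch accepts either.

```php
<?php
// Hypothetical usage, assuming the script is saved as parse.php:
//   php parse.php '<img src="a.png"/>Some text...'
//   php parse.php < snippet.html

// Returns the HTML from the first argument if present, otherwise from the stream.
function readHtmlInput(array $argv, $stdin): string
{
    if (isset($argv[1]) && $argv[1] !== '') {
        return $argv[1];
    }
    return (string) stream_get_contents($stdin);
}

// Writes one comma-separated line; fputcsv quotes fields containing
// commas, quotes or spaces. The empty escape argument needs PHP 7.4+.
function writeCsvRow($stream, array $fields): void
{
    fputcsv($stream, $fields, ',', '"', '');
}
```

In a real script the calls would be readHtmlInput($argv, STDIN) and writeCsvRow(STDOUT, $fields), with $fields holding the four required values in order.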
Hi, I would use PHP's preg_match function to parse the HTML. I'm available to complete this today. Thanks for considering my services. Stan.