html - Unable to extract all spans with matching class or id

January 15, 2013

This is probably a stupid question. I'm trying to write a simple scraper to grab a listing from a website: https://online.ncat.nsw.gov.au/hearing/hearinglist.aspx?locationcode=2000

Well, I will run it for each locationcode; this is just an example page.

I want to extract both the <span> headings and the table data for each date.

The general form of the data is:

<span id="lblsubheader1242017" class="clsgriditem">1:15 pm wednesday, 12 apr 2017 @ room 15.6 level 15, 66 goulburn st </span>
<hr />
<table id="dg1242017">
    <tr class="clsgriditem">
        <td width="15%">rt 17/11111</td>
        <td width="30%">name of party</td>
        <td width="55%">name of party</td>
    </tr>
    ...
</table>
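Given markup of this shape, one possible way to keep each heading together with its table rows is to match the numeric id suffix that the span and the table appear to share (`lblsubheader1242017` and `dg1242017`). This is only a sketch: it assumes that pairing holds across the whole page, it runs against an inline copy of the sample rather than the live site, and the helper name `pair_headings_with_rows` is hypothetical.

```python
from lxml import html

# Inline copy of the sample markup, so the sketch runs without network access.
SAMPLE = """
<div>
<span id="lblsubheader1242017" class="clsgriditem">1:15 pm wednesday, 12 apr 2017 @ room 15.6 level 15, 66 goulburn st </span>
<hr />
<table id="dg1242017">
    <tr class="clsgriditem">
        <td width="15%">rt 17/11111</td>
        <td width="30%">name of party</td>
        <td width="55%">name of party</td>
    </tr>
</table>
</div>
"""


def pair_headings_with_rows(tree):
    """Hypothetical helper: return (heading, rows) for each sub-header span.

    Assumes a span with id "lblsubheader<N>" belongs to the table with id "dg<N>".
    """
    results = []
    for span in tree.xpath('//span[starts-with(@id, "lblsubheader")]'):
        suffix = span.get("id")[len("lblsubheader"):]
        heading = span.text_content().strip()
        rows = []
        # XPath variables ($table_id) are passed to lxml as keyword arguments.
        for tr in tree.xpath('//table[@id=$table_id]//tr', table_id="dg" + suffix):
            rows.append([td.text_content().strip() for td in tr.xpath("./td")])
        results.append((heading, rows))
    return results


tree = html.fromstring(SAMPLE)
pairs = pair_headings_with_rows(tree)
for heading, rows in pairs:
    print(heading)
    for row in rows:
        print("   ", row)
```

Using `text_content()` instead of `/text()` means the heading is still captured if the real page wraps it in a nested element.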

It's rough, but I can grab the table data pretty easily with code of the form:

import requests
from lxml import html

page = requests.get('https://online.ncat.nsw.gov.au/hearing/hearinglist.aspx?locationcode=2000')
tree = html.fromstring(page.content)
events = tree.xpath('//table//td/text()')

But when I try to grab the spans outside the table, so that I can have the location and date information, with something like:

days = tree.xpath('//span[starts-with(@id,"lbl")]/text()')

days = tree.xpath('//span[@class="clsgriditem"]/text()')

I only get the following 2 results:

days:  ['there no matters listed in sydney today', 'there no matters listed in sydney today']

These refer to 2 spans about 2/3 of the way down the page:

<span id="lbl1442017" style="font-weight:bold;">sydney: friday, 14 apr 2017</span><br /><br />
<span id="lblerror1442017" class="clsgriditem">there no matters listed in sydney today</span><br /><br /><br />
<span id="lbl1742017" style="font-weight:bold;">sydney: monday, 17 apr 2017</span><br /><br />
<span id="lblerror1742017" class="clsgriditem">there no matters listed in sydney today</span>

Could someone explain to me what I'm doing wrong?

Why are the other spans being skipped?

You can use the code below to get every text node inside each <span class="clsgriditem">:

days = tree.xpath('//span[@class="clsgriditem"]//text()')

But I have no idea why //span[@class="clsgriditem"]/text() is not working, as it should be applicable as well...
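One possible explanation, which the thread itself does not confirm: in XPath, `/text()` selects only text nodes that are direct children of the span, while `//text()` selects text nodes at any depth below it. If the live page wraps the heading text in a child element, `/text()` would skip those spans. A minimal sketch of the difference, using made-up markup (the ids `lblnested` and `lbldirect` and the `<b>` wrapper are assumptions, not taken from the real page):

```python
from lxml import html

# Assumed markup: the first span wraps its text in a child element,
# the second span holds its text directly.
SAMPLE = """
<div>
<span id="lblnested" class="clsgriditem"><b>heading inside a child element</b></span>
<span id="lbldirect" class="clsgriditem">text directly inside the span</span>
</div>
"""

tree = html.fromstring(SAMPLE)

# Only text nodes that are direct children of the span.
direct_only = tree.xpath('//span[@class="clsgriditem"]/text()')

# Text nodes at any depth below the span.
all_descendants = tree.xpath('//span[@class="clsgriditem"]//text()')

# One string per span, regardless of nesting.
per_span = [s.text_content().strip()
            for s in tree.xpath('//span[@class="clsgriditem"]')]

print(direct_only)
print(all_descendants)
print(per_span)
```

If that is what is happening, `text_content()` may be the more convenient choice, since it returns one string per span instead of a flat list of text fragments.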
