html - Unable to extract all spans with matching class or id
This is probably something stupid. I'm trying to write a simple scraper to grab the listings from this website: https://online.ncat.nsw.gov.au/hearing/hearinglist.aspx?locationcode=2000
Eventually I'll run it for each locationcode; this is just an example page.
I want to extract both the <span> headings and the table data for each date.
The general form of the data is:
    <span id="lblsubheader1242017" class="clsgriditem">1:15 pm wednesday, 12 apr 2017 @ room 15.6 level 15, 66 goulburn st </span>
    <hr />
    <table id="dg1242017">
      <tr class="clsgriditem">
        <td width="15%">rt 17/11111</td>
        <td width="30%">name of party</td>
        <td width="55%">name of party</td>
      </tr>
      ...
    </table>
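In the markup above, the heading span and its table share the same date suffix in their ids (lblsubheader1242017 and dg1242017). A minimal sketch of pairing them that way, assuming every heading id starts with lblsubheader and every table id is dg followed by the same digits; both conventions are inferred from this one snippet, not checked against the live page. It parses a trimmed offline copy of the snippet, and pair_headings_with_tables is a hypothetical helper name.

```python
from lxml import html

# Trimmed copy of the snippet quoted above; the "..." rows are left out.
SAMPLE = """<html><body>
<span id="lblsubheader1242017" class="clsgriditem">1:15 pm wednesday, 12 apr 2017 @ room 15.6 level 15, 66 goulburn st </span>
<hr />
<table id="dg1242017">
<tr class="clsgriditem"><td width="15%">rt 17/11111</td><td width="30%">name of party</td><td width="55%">name of party</td></tr>
</table>
</body></html>"""

# Assumed id conventions, inferred from the single snippet above.
HEADING_PREFIX = "lblsubheader"
TABLE_PREFIX = "dg"


def pair_headings_with_tables(tree):
    """Return (heading text, table id) pairs linked by the shared date suffix."""
    pairs = []
    # $p and $tid are XPath variables, passed to lxml as keyword arguments
    for span in tree.xpath("//span[starts-with(@id, $p)]", p=HEADING_PREFIX):
        suffix = span.get("id")[len(HEADING_PREFIX):]  # e.g. "1242017"
        tables = tree.xpath("//table[@id = $tid]", tid=TABLE_PREFIX + suffix)
        if tables:
            pairs.append((span.text_content().strip(), tables[0].get("id")))
    return pairs


tree = html.fromstring(SAMPLE)
pairs = pair_headings_with_tables(tree)
for heading, table_id in pairs:
    print(heading, "->", table_id)
```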
It's rough, but I can grab the table data pretty easily with code of the form:
    import requests
    from lxml import html
    page = requests.get('https://online.ncat.nsw.gov.au/hearing/hearinglist.aspx?locationcode=2000')
    tree = html.fromstring(page.content)
    events = tree.xpath('//table//td/text()')
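The //table//td/text() expression returns one flat list of cells, so the row boundaries are lost. A sketch that keeps one list per <tr> instead, run on a trimmed offline copy of the table; against the live page you would pass html.fromstring(page.content) as the tree. table_rows is a hypothetical helper name.

```python
from lxml import html

# Trimmed offline copy of the table from the question.
SAMPLE = """<html><body>
<table id="dg1242017">
<tr class="clsgriditem"><td width="15%">rt 17/11111</td><td width="30%">name of party</td><td width="55%">name of party</td></tr>
</table>
</body></html>"""


def table_rows(tree):
    """Return one list of stripped cell strings per <tr>, across all tables."""
    rows = []
    for tr in tree.xpath("//table//tr"):
        # text_content() also includes text inside nested tags such as <a> or <b>
        rows.append([td.text_content().strip() for td in tr.xpath("./td")])
    return rows


tree = html.fromstring(SAMPLE)  # on the live page: html.fromstring(page.content)
rows = table_rows(tree)
print(rows)
```

Using //tr rather than table/tr keeps the expression working whether or not the parser inserts a <tbody> element.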
But when I try to grab the spans outside the table, so that I also have the location and date information, with something like:
    days = tree.xpath('//span[starts-with(@id,"lbl")]/text()')
or
    days = tree.xpath('//span[@class="clsgriditem"]/text()')
I get the following 2 results:
    days: ['there are no matters listed in sydney today', 'there are no matters listed in sydney today']
These refer to the 2 spans about two-thirds of the way down the page:
    <span id="lbl1442017" style="font-weight:bold;">sydney: friday, 14 apr 2017</span><br /><br />
    <span id="lblerror1442017" class="clsgriditem">there are no matters listed in sydney today</span><br /><br /><br />
    <span id="lbl1742017" style="font-weight:bold;">sydney: monday, 17 apr 2017</span><br /><br />
    <span id="lblerror1742017" class="clsgriditem">there are no matters listed in sydney today</span>
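The ids in this snippet follow a pattern too: each day heading is lbl plus the date digits, and its "no matters" message is lblerror plus the same digits. Assuming that pattern holds across the whole page (only this snippet confirms it), a sketch that maps each day heading to its message, or to None when no lblerror span exists for that date. day_messages is a hypothetical helper name.

```python
from lxml import html

# The snippet quoted above, wrapped in <html><body> so it parses as a full document.
SAMPLE = """<html><body>
<span id="lbl1442017" style="font-weight:bold;">sydney: friday, 14 apr 2017</span><br /><br />
<span id="lblerror1442017" class="clsgriditem">there are no matters listed in sydney today</span><br /><br /><br />
<span id="lbl1742017" style="font-weight:bold;">sydney: monday, 17 apr 2017</span><br /><br />
<span id="lblerror1742017" class="clsgriditem">there are no matters listed in sydney today</span>
</body></html>"""


def day_messages(tree):
    """Map each day heading ("lbl" + digits) to its "lblerror" + digits message, or None."""
    result = {}
    for span in tree.xpath('//span[starts-with(@id, "lbl")]'):
        date = span.get("id")[len("lbl"):]
        if not date.isdigit():
            continue  # skips "lblerror...", "lblsubheader..." and similar ids
        errors = tree.xpath("//span[@id = $eid]", eid="lblerror" + date)
        message = errors[0].text_content().strip() if errors else None
        result[span.text_content().strip()] = message
    return result


tree = html.fromstring(SAMPLE)
messages = day_messages(tree)
for day, message in messages.items():
    print(day, "->", message)
```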
Could someone explain what I'm doing wrong?
Why are the other spans being skipped?
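For comparison, here are the two expressions from the question run against the markup exactly as quoted (spans only, table left out), printing the ids of the selected spans. This only covers the quoted snippets, not the live page, so it shows what each expression matches rather than why the live page returns fewer results.

```python
from lxml import html

# The spans from the two snippets quoted in the question (table omitted).
SAMPLE = """<html><body>
<span id="lblsubheader1242017" class="clsgriditem">1:15 pm wednesday, 12 apr 2017 @ room 15.6 level 15, 66 goulburn st </span>
<span id="lbl1442017" style="font-weight:bold;">sydney: friday, 14 apr 2017</span><br /><br />
<span id="lblerror1442017" class="clsgriditem">there are no matters listed in sydney today</span>
</body></html>"""

tree = html.fromstring(SAMPLE)

# ids of the spans selected by each expression, in document order
by_id = [s.get("id") for s in tree.xpath('//span[starts-with(@id, "lbl")]')]
by_class = [s.get("id") for s in tree.xpath('//span[@class="clsgriditem"]')]

print(by_id)     # ['lblsubheader1242017', 'lbl1442017', 'lblerror1442017']
print(by_class)  # ['lblsubheader1242017', 'lblerror1442017']
```

On this markup the id prefix also matches the bold day headings, which have no class attribute, while the class test matches only the subheader and error spans.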
You can use the code below to get every text content of <span class="clsgriditem">:
    days = tree.xpath('//span[@class="clsgriditem"]//text()')
but I have no idea why //span[@class="clsgriditem"]/text() is not working, since it should be applicable as well...
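One difference between the two expressions: /text() selects only the text nodes that are direct children of the span, while //text() selects text nodes at any depth. A span whose text sits entirely inside a nested tag therefore yields nothing at all with /text(). The markup below is hypothetical (the quoted snippets show no nested tags) and only illustrates that behaviour; it also shows lxml's text_content(), which returns one joined string per element instead of separate fragments.

```python
from lxml import html

# Hypothetical markup: span 2 nests all of its text in <b>, span 3 nests part of it.
SAMPLE = """<html><body>
<span class="clsgriditem">direct text</span>
<span class="clsgriditem"><b>nested text</b></span>
<span class="clsgriditem">1:15 pm <b>wed, 12 apr</b> @ room 15.6</span>
</body></html>"""

tree = html.fromstring(SAMPLE)

# /text(): direct child text nodes only (span 2 contributes nothing)
direct = tree.xpath('//span[@class="clsgriditem"]/text()')
# //text(): every descendant text node, returned as separate fragments
descendant = tree.xpath('//span[@class="clsgriditem"]//text()')
# text_content(): one joined string per matched span
per_span = [s.text_content() for s in tree.xpath('//span[@class="clsgriditem"]')]

print(direct)      # ['direct text', '1:15 pm ', ' @ room 15.6']
print(descendant)  # ['direct text', 'nested text', '1:15 pm ', 'wed, 12 apr', ' @ room 15.6']
print(per_span)    # ['direct text', 'nested text', '1:15 pm wed, 12 apr @ room 15.6']
```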