python - Beautiful Soup: Data Values Not Matching Headings

March 15, 2014

I'm new to Python, and I'm working on a learning project in which I'm attempting to scrape data on college football players. The source code of the website looks like this:

</thead>
<tbody>
<tr ><th scope="row" class="right " data-stat="year_id" ><a href="/cfb/years/1957.html">1957</a></th>
<td class="left " data-stat="school_name" csk="san jose state.1957" ><a href="/cfb/schools/san-jose-state/1957.html">san jose state</a></td>
<td class="left " data-stat="conf_abbr" ><a href="/cfb/conferences/independent/1957.html">ind</a></td>
<td class="center " data-stat="class" ></td>
<td class="center " data-stat="pos" >rb</td>
<td class="right " data-stat="g" >10</td>
<td class="right " data-stat="rec" >1</td>
<td class="right " data-stat="rec_yds" >6</td>
<td class="right " data-stat="rec_yds_per_rec" >6.0</td>
<td class="right " data-stat="rec_td" >0</td>
<td class="right " data-stat="rush_att" >1</td>
<td class="right " data-stat="rush_yds" >3</td>
<td class="right " data-stat="rush_yds_per_att" >3.0</td>
<td class="right " data-stat="rush_td" >0</td>
<td class="right " data-stat="scrim_att" >2</td>
<td class="right " data-stat="scrim_yds" >9</td>
<td class="right " data-stat="scrim_yds_per_att" >4.5</td>
<td class="right " data-stat="scrim_td" >0</td></tr>

Here is how far I've gotten with the code:

# soup is the BeautifulSoup object for the downloaded page
headers = [item["data-stat"] for item in soup.find_all(attrs={"data-stat": True})]
cellstrings = [cell.find(text=True) for cell in soup.find_all('td')]
print headers, cellstrings  # Python 2 print statement

This prints out the following:

[u'', u'header_receiving', u'header_rushing', u'header_scrimmage', u'year_id', u'school_name', u'conf_abbr', u'class', u'pos', u'g', u'rec', u'rec_yds', u'rec_yds_per_rec', u'rec_td', u'rush_att', u'rush_yds', u'rush_yds_per_att', u'rush_td', u'scrim_att', u'scrim_yds', u'scrim_yds_per_att', u'scrim_td', u'year_id', u'school_name', u'conf_abbr', u'class', u'pos', u'g', u'rec', u'rec_yds', u'rec_yds_per_rec', u'rec_td', u'rush_att', u'rush_yds', u'rush_yds_per_att', u'rush_td', u'scrim_att', u'scrim_yds', u'scrim_yds_per_att', u'scrim_td', u'year_id', u'school_name', u'conf_abbr', u'class', u'pos', u'g', u'rec', u'rec_yds', u'rec_yds_per_rec', u'rec_td', u'rush_att', u'rush_yds', u'rush_yds_per_att', u'rush_td', u'scrim_att', u'scrim_yds', u'scrim_yds_per_att', u'scrim_td'] [u'san jose state', u'ind', None, u'rb', u'10', u'1', u'6', u'6.0', u'0', u'1', u'3', u'3.0', u'0', u'2', u'9', u'4.5', u'0', u'san jose state', None, None, None, None, u'1', u'6', u'6.0', u'0', u'1', u'3', u'3.0', u'0', u'2', u'9', u'4.5', u'0']

The problem is that some of the headings appear earlier in the source code, so the two lists, data and headings, do not match.
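To make the mismatch concrete, here is a minimal sketch on a cut-down, hypothetical table (three columns instead of the full stats table, parsed with the built-in html.parser). It assumes the page has the usual thead/tbody layout. The attribute search matches every tag carrying data-stat, including the header cells and the th year cell in the body, while the 'td' search only sees body td cells, so the two lists come out with different lengths.

```python
from bs4 import BeautifulSoup

# Hypothetical, cut-down version of the stats table (not the real page).
SAMPLE = """
<table>
  <thead>
    <tr><th data-stat="year_id">Year</th><th data-stat="school_name">School</th><th data-stat="g">G</th></tr>
  </thead>
  <tbody>
    <tr><th data-stat="year_id">1957</th><td data-stat="school_name">san jose state</td><td data-stat="g">10</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# Every tag with a data-stat attribute: header cells, the <th> year cell, and the <td> cells.
headers = [tag["data-stat"] for tag in soup.find_all(attrs={"data-stat": True})]

# Only the <td> cells, so the header row and the <th> year cell are skipped.
cells = [td.get_text(strip=True) for td in soup.find_all("td")]

print(len(headers), headers)
print(len(cells), cells)
```

Because the two searches walk different sets of tags, pairing the lists by position cannot work; the name and the value have to be read from the same tag.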

My question is: how can I pull the 'data-stat' attribute along with its associated value, instead of pulling them separately? Ideally, I would pull them into a dictionary.

If I'm understanding you correctly, you want a dictionary consisting of {'data-stat-value': 'value of td'}; you can do this:

# html is the parsed BeautifulSoup object (called soup in the question)
data_stats = {e['data-stat']: e.get_text().strip()
              for e in html.find_all(attrs={'data-stat': True})}

This way you will be sure to pull the text associated with each data-stat tag.
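One caveat: a dictionary holds each key only once, and the output in the question shows every data-stat name appearing three times, so in a single page-wide dictionary later rows overwrite earlier ones. A possible variation, sketched here under the assumption that each season is one tr inside tbody, is to build one dictionary per row. The sample HTML (including the second row) and the name rows are illustrative, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row sample in the shape of the question's table.
SAMPLE = """
<table>
  <thead>
    <tr><th data-stat="year_id">Year</th><th data-stat="pos">Pos</th><th data-stat="g">G</th></tr>
  </thead>
  <tbody>
    <tr><th data-stat="year_id">1957</th><td data-stat="pos">rb</td><td data-stat="g">10</td></tr>
    <tr><th data-stat="year_id">1958</th><td data-stat="pos">rb</td><td data-stat="g">9</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

rows = []
# Restrict the search to <tbody> so the header row is left out.
for tr in soup.find("tbody").find_all("tr"):
    # Name and value come from the same cell, so they cannot drift apart.
    rows.append({
        cell["data-stat"]: cell.get_text(strip=True)
        for cell in tr.find_all(attrs={"data-stat": True})
    })

for row in rows:
    print(row)
```

Each entry in rows is then a self-contained record for one season, and empty cells (such as the blank class column in the question) simply come through as empty strings.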
