python - Scraping using Beautiful Soup leads to error only in a particular section (NullType object encountered)
I'm trying to get the list of injuries for a particular team (Liverpool in this case) from the following website:
http://www.physioroom.com/news/english_premier_league/epl_injury_table.php
It works fine for some teams (Swansea), but exits with the following error for others (Liverpool, Everton):
TypeError: Can't convert 'NoneType' object to str implicitly
Here is the code I am using.
    from bs4 import BeautifulSoup
    import urllib.request

    url = "http://www.physioroom.com/news/english_premier_league/epl_injury_table.php"
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, "html.parser")
    #lp = soup.find(alt="Liverpool away shirt").parent.parent.parent.next_sibling.next_sibling
    lp = soup.find(alt="Swansea City away shirt").parent.parent.parent.next_sibling.next_sibling
    player_info = ""
    player_list = []

    while True:
        # A row with an id attribute marks the start of the next team
        if lp.has_attr('id'):
            break
        else:
            tdlist = lp.find_all('td')
            #player_info = tdlist[0].string + "\t" + tdlist[1].string + "\t" + tdlist[3].string
            #print(tdlist[0].find('a').string.strip() + "\t" + tdlist[1].string.strip() + "\t" + tdlist[3].string.strip())
            # .string is None for some cells, which raises the TypeError above
            print(tdlist[0].string + "\t" + tdlist[1].string + "\t" + tdlist[3].string)
            lp = lp.findNext('tr')

Please let me know how I can fix this.
    from bs4 import BeautifulSoup
    import requests

    url = "http://www.physioroom.com/news/english_premier_league/epl_injury_table.php"
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "lxml")
    table = soup.find('table', id='epl-table')
    for tr in table('tr', id=None):
        print(tr.get_text('\t', strip=True))

Output:
    Player       Condition            Latest News  Expected Return   Available?
    D Meyler     knock                no return date    slight doubt
    S Maloney    ear infection        no return date    slight doubt
    M Henriksen  shoulder separation  April 1, 2017     major doubt
    McGregor     fitness              no return date    major doubt
    W Keane      ACL knee injury      no return date
    M Odubajo    patella fracture     May 1, 2017
    G Luer       knee injury          February 1, 2017

If you only want the text part of a document or tag, you can use the get_text() method. It returns all the text in a document or beneath a tag, as a single Unicode string.
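As a rough illustration of why get_text() helps here: .string is None whenever a tag has more than one child, and concatenating None with a str raises the TypeError from the question. That is one plausible cause for the failing rows, not something confirmed against the live page. The markup below is made up for the example and is not the real injury table.

```python
from bs4 import BeautifulSoup

# Hypothetical cell contents (an assumption, not copied from the site):
# the first <td> holds a link plus trailing text, so it has two children.
html = (
    '<table><tr>'
    '<td><a href="#">D Meyler</a> (doubt)</td>'
    '<td>Knock</td>'
    '</tr></table>'
)
soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

print(cells[0].string)      # None, because the tag has more than one child
print(cells[1].string)      # Knock
print(cells[0].get_text())  # D Meyler (doubt)
```

Because get_text() always returns a str, it is safe to concatenate or print even when a cell contains nested tags.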
You can specify a string to be used to join the bits of text together.
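A minimal sketch of the separator argument, using a made-up row rather than the real page. The first positional argument to get_text() is placed between the text fragments.

```python
from bs4 import BeautifulSoup

# Hypothetical row for illustration only.
html = "<table><tr><td>S Maloney</td><td>Ear Infection</td><td>Slight Doubt</td></tr></table>"
row = BeautifulSoup(html, "html.parser").find("tr")

# Join the text of each cell with a tab character.
joined = row.get_text("\t")
print(repr(joined))  # 'S Maloney\tEar Infection\tSlight Doubt'
```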
You can tell Beautiful Soup to strip whitespace from the beginning and end of each bit of text.
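A small sketch of strip=True, again on invented markup. Without it, the newlines and indentation between tags end up in the result. With it, each fragment is stripped and whitespace-only fragments are dropped.

```python
from bs4 import BeautifulSoup

# Hypothetical row with indentation and padding around the cell text.
html = """<table><tr>
  <td> M Odubajo </td>
  <td> Patella Fracture </td>
</tr></table>"""
row = BeautifulSoup(html, "html.parser").find("tr")

raw = row.get_text("\t")                # keeps newlines and padding
clean = row.get_text("\t", strip=True)  # strips each fragment, skips empty ones

print(repr(raw))
print(repr(clean))  # 'M Odubajo\tPatella Fracture'
```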