Python: download a table and save it to Excel
I am trying to download a table from HTML that is not in the usual td/tr format and includes images, and to save the result to Excel.
The HTML code looks like this:
<div class="dynamicbottom">
  <div class="dynamicleft">
    <div class="content_block details_block scroll_tabs" data-tab="tabs_details">
      <div class="header_with_improve wrap">
        <a href="/updatelisting.html" onclick="ta.setevtcookie('updatelisting', 'entry-detail-moreinfo', null, 0, '/updatelistingredesign')"><div class="improve_listing_btn ui_button primary small">improve entry</div></a>
        <h3 class="tabs_header">details</h3>
      </div>
      <div class="details_tab">
        <div class="table_section">
          <div class="row">
            <div class="ratingsummary wrap">
              <div class="histogramcommon bubblehistogram wrap">
                <div class="coltitle"> rating </div>
                <ul class="barchart">
                  <li>
                    <div class="ratingrow wrap">
                      <div class="label part "> <span class="text">location</span> </div>
                      <div class="wrap row part ">
                        <span class="rate sprite-rating_s rating_s">
                          <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
                        </span>
                      </div>
                    </div>
                    <div class="ratingrow wrap">
                      <div class="label part "> <span class="text">service</span> </div>
                      <div class="wrap row part ">
                        <span class="rate sprite-rating_s rating_s">
                          <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
                        </span>
                      </div>
                    </div>
                  </li>
I would like the table as [location 45 out of fifty points, service 45 out of fifty points], and to save the result to an Excel file. The first column header in the Excel file should state "location", and the cell below it "45" or "45 out of fifty points". The next column header should state "service", and the cell in the row below should state "45" or "45 out of fifty points". I manage to save the name and the rating for location, but the cell for the service rating remains empty.
My Python code looks like this:
import requests
import xlsxwriter
from bs4 import BeautifulSoup

workbook = xlsxwriter.Workbook('file.xlsx')
worksheet = workbook.add_worksheet()
row = 1
col = 0
for url in urls:
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "lxml")
    worksheet.write('A1', 'name')
    worksheet.write('B1', 'location')
    worksheet.write('C1', 'service')
    row += 1
    name = soup.find_all("div", {"class": "locationname"})
    for item in name:
        worksheet.write_string(row, col, item.text)
    for div in soup.find_all('div', class_="ratingrow wrap"):
        text = div.text.strip()
        alt = div.find('img').get('alt')
        print(text, alt)
        worksheet.write_string(row, col+1, alt)
The print function outputs:
location 45 out of fifty points
service 45 out of fifty points
The console prints the results for both location and service, but in the Excel sheet only the rating for location appears, while the cell for the service rating remains empty. I also tried the enumerate function, but then a single character of the location rating ends up in each cell of one row in Excel, and no results for the service rating appear either:
0 4
1 5
2
3 o
4 u
5 t
6
7 o
8 f
9
10 f
11 i
12 f
13 t
14 y
15
16 p
17 o
18 i
19 n
20 t
21 s
Is there a way to tell Python to save the second line of the printed text, "45 out of fifty points", in the cell below "service" in Excel? I have searched thoroughly but have not found a solution yet. Thank you for your help!
I can't understand why you have two separate loops, and I can't find the class locationname anywhere within the HTML you posted. Because I would expect no results from that search, I would expect nothing to be written in the first loop, which would be consistent with your report. It seems you should write the text to (row, col) in the second loop.
Following the discussion: the first loop is used for the name and other data in the HTML that occurs only once per page.
My suggestion, to avoid overwriting the (row, col+1) cell:
import requests
import xlsxwriter
from bs4 import BeautifulSoup

workbook = xlsxwriter.Workbook('file.xlsx')
worksheet = workbook.add_worksheet()
row = 1
for url in urls:
    col = 0  # start again at the first column for every page
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "lxml")
    worksheet.write('A1', 'name')
    worksheet.write('B1', 'location')
    worksheet.write('C1', 'service')
    row += 1
    name = soup.find_all("div", {"class": "locationname"})
    for item in name:
        worksheet.write_string(row, col, item.text)
    for div in soup.find_all('div', class_="ratingrow wrap"):
        col += 1  # move one column to the right for each rating
        text = div.text.strip()
        alt = div.find('img').get('alt')
        print(text, alt)
        worksheet.write_string(row, col, alt)
workbook.close()  # xlsxwriter only writes the file when the workbook is closed
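The column logic can also be sketched without the network or Excel parts. This is only an illustration: the helper names `rating_value` and `layout_row` and the placeholder name "Some Place" are hypothetical and not part of the original code. It enumerates the list of alt texts, one column per rating, rather than enumerating a single alt string, which is what puts one character in each cell. It also assumes the alt text starts with the number, as in the HTML from the question.

```python
import re


def rating_value(alt):
    """Return the leading number of an alt text such as '45 out of fifty points'.

    Assumption: the alt text starts with digits; if it does not,
    the text is returned unchanged.
    """
    match = re.match(r"\s*(\d+)", alt)
    return match.group(1) if match else alt


def layout_row(row, name, alts):
    """Map one page to {(row, col): value}.

    The name goes in column 0 and each rating gets its own column from 1 on.
    """
    cells = {(row, 0): name}
    # Enumerate the list of alt texts, not the characters of one alt text.
    for col, alt in enumerate(alts, start=1):
        cells[(row, col)] = rating_value(alt)
    return cells


# Hypothetical page data; "Some Place" is a placeholder name.
cells = layout_row(2, "Some Place",
                   ["45 out of fifty points", "45 out of fifty points"])
for (row, col), value in sorted(cells.items()):
    print(row, col, value)
```

Each (row, col) pair could then be passed to worksheet.write_string(row, col, value). If the full text "45 out of fifty points" is wanted instead of "45", the call to `rating_value` can simply be dropped.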