Subtracting Columns in DataFrames based on results of another column - python

Say I have a Pandas DataFrame like the following table:
CarFuel Volume Mazda 311.3 Mazda 310.4 F-15014.3 F-1509.7
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg .tg-yw4l{vertical-align:top}
</style>
<table class="tg" style="undefined;table-layout: fixed; width: 122px">
<colgroup>
<col style="width: 56px">
<col style="width: 66px">
</colgroup>
<tr>
<th class="tg-yw4l">Car</th>
<th class="tg-yw4l">Fuel Volume</th>
</tr>
<tr>
<td class="tg-yw4l">F-150</td>
<td class="tg-yw4l">25.01</td>
</tr>
<tr>
<td class="tg-yw4l">F-150</td>
<td class="tg-yw4l">22.47</td>
</tr>
<tr>
<td class="tg-yw4l">F-150</td>
<td class="tg-yw4l">19.56</td>
</tr>
<tr>
<td class="tg-yw4l">F-250</td>
<td class="tg-yw4l">9.87</td>
</tr>
<tr>
<td class="tg-yw4l">F-250</td>
<td class="tg-yw4l">6.32</td>
</tr>
<tr>
<td class="tg-yw4l">F-250</td>
<td class="tg-yw4l">1.32</td>
</tr>
</table>
I want to create another column based on the difference of fuel volume, but only if the fuel volume being subtracted is from the same model of car. So the resulting DataFrame would look like the following:
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 0px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 0px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg .tg-yw4l{vertical-align:top}
</style>
<table class="tg" style="undefined;table-layout: fixed; width: 144px">
<colgroup>
<col style="width: 52px">
<col style="width: 62px">
<col style="width: 30px">
</colgroup>
<tr>
<th class="tg-yw4l">Car</th>
<th class="tg-yw4l">Fuel Volume</th>
<th class="tg-yw4l">Difference in Fuel</th>
</tr>
<tr>
<td class="tg-yw4l">F-150</td>
<td class="tg-yw4l">25.01</td>
<td class="tg-yw4l">NaN</td>
</tr>
<tr>
<td class="tg-yw4l">F-150</td>
<td class="tg-yw4l">22.47</td>
<td class="tg-yw4l">2.54</td>
</tr>
<tr>
<td class="tg-yw4l">F-150</td>
<td class="tg-yw4l">19.56</td>
<td class="tg-yw4l">2.91</td>
</tr>
<tr>
<td class="tg-yw4l">F-250</td>
<td class="tg-yw4l">9.87</td>
<td class="tg-yw4l">NaN</td>
</tr>
<tr>
<td class="tg-yw4l">F-250</td>
<td class="tg-yw4l">6.32</td>
<td class="tg-yw4l">3.55</td>
</tr>
<tr>
<td class="tg-yw4l">F-250</td>
<td class="tg-yw4l">1.32</td>
<td class="tg-yw4l">5</td>
</tr>
</table>

I think you need groupby with diff, last add abs:
df['Difference in Fuel'] = df.groupby('Car')['Fuel Volume'].diff().abs()
print (df)
Car Fuel Volume Difference in Fuel
0 F-150 25.01 NaN
1 F-150 22.47 2.54
2 F-150 19.56 2.91
3 F-250 9.87 NaN
4 F-250 6.32 3.55
5 F-250 1.32 5.00

Related

Page Number and Total Pages in Header When Printing HTML to PDF

Background: I have a large HTML file that has 8 different pages. Some of the pages in the HTML can be larger than the 11in container size and the 11in stipulated in the #page CSS due to a lot of data in some of the tables.
What am I trying to do?: I am trying to send in context data (written in Django / Python) to each table of unknown length. Once the data has been entered then I will use weasyprint to create the pdf. At the top of every page the page number and total number of pages should be added dynamically.
The issue: When I print to PDF the header on the pages with a lot of rows (ones that are >11in) add the header but the header shows the same page number for of the split pages. In the example below it is on page 2 that is split into two pages and when you print to pdf the header on page 3 and 4 are incorrect.
What have I tried?:
Basically everything I could think of. At first I thought about just using paged media but I couldn't figure out how to put this complex of a header on the pages using that method.
Is it possible with just HTML and CSS to do what I want? If it isn't then I may have to figure out a way to add the headers using JS and then saving the HTML before sending it into weasyprint (since weasyprint doesn't support JS). Any suggestions would be appreciated.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<style>
*{
margin: 0;
padding: 0;
}
#page{
size: 8.5in 11in;
}
table.container{
page-break-after: always;
}
td{
padding: 0;
margin: 0;
}
table tbody tr{
vertical-align: top;
page-break-after: always;
}
.container{
height: 11in;
width: 8.5in;
border: 1px solid black;
padding: 10px;
margin: 10px auto ;
}
.container thead{
height: 225px;
vertical-align:top;
}
.top{
display: flex;
align-items:center;
}
.header{
page-break-before: always;
}
.bottom{
font-family: sans-serif;
font-size: 10px;
}
.main{
margin: 0 30px;
}
.thead{
display: flex;
background-color: rgb(123, 199, 157);
color: white;
font-size: 14px;
border-bottom: 1px solid black;
border: 1px solid black;
}
.thead .col-one{
flex:3;
}
.thead .col-two{
flex: 1;
}
.thead .col-three{
flex: 2;
text-align: center;
}
.thead p{
padding: 10px;
}
.tbody{
display: flex;
color: rgb(23, 184, 109);
font-size: 12px;
font-family: sans-serif;
}
.main-data{
padding: 10px 15px ;
}
.sub-data{
padding: 0 30px 0 ;
}
.test i{
padding-top: 13.3px ;
}
.result i{
padding-top: 10px;
}
.result p{
padding: 1px;
}
.tbody tr td:nth-child(2), tr td:nth-child(3){
padding-left: 230px;
}
.tbody td{
padding: 2px 0;
}
.tbody{
border: 1px solid black;
}
.entry p{
color: rgb(78, 208, 143);
}
.entry{
font-family: sans-serif;
font-size: 13px;
padding: 50px 0 0 ;
}
.sub{
padding: 5px;
}
body{
counter-reset: page pages my-counter 0;
}
.header{
display: table-header-group;
}
.footer{
display: table-footer-group;
}
#media print{
.tbody tr td:nth-child(2), tr td:nth-child(3){
padding-left: 220px;
}
.thead{
display: table-header-group
color: black;
background-color: rgb(69, 213, 129);
}
.thead tr { page-break-inside: avoid; }
}
#page {
#bottom-right {
content: counter(page) " of " counter(pages);
}
}
.dot::after {
content: " : "counter(page) " of " counter(pages);
counter-increment: page 1;
}
</style>
</head>
<body>
<table class="container">
<thead>
<tr>
<td>
<div class="haeder">
<div class="top">
<div class="address">
<h2>Company title</h2>
<p>contact info</p>
</div>
<div class="customer">
<p>Report ID: </p>
</div>
</div>
<div class="head-line">
<h1>Document Title</h1>
<p>Page No<span class="dot"></span> </p>
</div>
<div class="details">
<div class="lab-info">
<p>Info: <span>Lab Number</span></p>
<p>Info: <span>Sample ID from Sample Received</span></p>
</div>
<div class="date">
<p>Date Received: </p>
<p>Date Reported: </p>
</div>
</div>
</td>
</tr>
</thead>
<tfoot>
<tr>
<td>
<div class="footer" >
<div class="bottom">
<div class="first-col">
<p>Some Info</p>
<p>Some more info</p>
</div>
</div>
</div>
</td>
</tr>
</tfoot>
<tbody>
<tr>
<td>
<div class="main" >
<div class="thead">
<p class="col-one" >Table Name</p>
<p class="col-two">Result</p>
<p class="col-three">Result 2</p>
</div>
<div class="tbody">
<div class="main">
<table>
<tr>
<td>Test</td>
<td></td>
<td></td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td></td>
<td></td>
</tr>
</table>
</div>
<div class="result">
</div>
<div class="test">
</div>
</div>
</div>
</td>
</tr>
</tbody>
</table>
<table class="container">
<thead>
<tr>
<td>
<div class="haeder">
<div class="top">
<div class="address">
<h2>Company title</h2>
<p>contact info</p>
</div>
<div class="customer">
<p>Report ID: </p>
</div>
</div>
<div class="head-line">
<h1>Document Title</h1>
<p>Page No<span class="dot"</p>
</div>
<div class="details">
<div class="lab-info">
<p>Info: <span>Lab Number</span></p>
<p>Info: <span>Sample ID from Sample Received</span></p>
</div>
<div class="date">
<p>Date Received: </p>
<p>Date Reported: </p>
</div>
</div>
</td>
</tr>
</thead>
<tfoot>
<tr>
<td>
<div class="footer" >
<div class="bottom">
<div class="first-col">
<p>Some Info</p>
<p>Some more info</p>
</div>
</div>
</div>
</td>
</tr>
</tfoot>
<tbody>
<tr>
<td>
<div class="main" >
<div class="thead">
<p class="col-one" >Table Name</p>
<p class="col-two">Result</p>
<p class="col-three">Result 2</p>
</div>
<div class="tbody">
<div class="main">
<table>
<tr>
<td>Test</td>
<td></td>
<td></td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td> %</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
</tr>
<tr>
<td class="sup">Test</td>
<td>%</td>
<td></td>
</tr>
<tr>
<td class="sup">Test</td>
<td></td>
<td></td>
</tr>
</table>
</div>
<div class="result">
</div>
<div class="test">
</div>
</div>
</div>
</td>
</tr>
</tbody>
</table>
</body>
</html>

Python web scraping problems - result different from soruce code

The code below is not able to scrape the class_=datarow. i have tried to use read_html as well, but there are tables inside this table class="table_equities", which makes the read_html not working for me. I have no idea how to get the table.
from bs4 import BeautifulSoup
import pandas as pd
import requests
import openpyxl
import scrapy
path = 'C:/Users/pacc_/OneDrive/Desktop/Eric/Investment/Trading record python.xlsx'
page_link = 'https://www.hkex.com.hk/Market-Data/Securities-Prices/Equities?sc_lang=en'
page_response = requests.get(page_link, timeout=2)
page_content = BeautifulSoup(page_response.content, "html.parser")
data = page_content.find(class_='table_equities')
print(data)
My result:
<table class="table_equities">
<tr>
<th class="th code">
<table>
<thead>
<tr>
<th class="text"></th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
<th class="th name">
<table>
<thead>
<tr>
<th class="text"></th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
<th class="th price">
<table>
<thead>
<tr>
<th class="text"></th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
<th class="th turnover selected uppercase">
<table>
<thead>
<tr>
<th class="text"></th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
<th class="th mktcap">
<table>
<thead>
<tr>
<th class="text"></th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
<th class="th pe">
<table>
<thead>
<tr>
<th class="text"></th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
<th class="th div_yield">
<table>
<thead>
<tr>
<th class="text"></th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
<th class="th intraday">
<table>
<thead>
<tr>
<th class="text"></th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
</tr>
</table>
Target url structure:
<table class="table_equities">
<tbody>
<tr>
<th class="th code">
<table>
<thead>
<tr>
<th class="text">Stock Code</th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
<th class="th name">
<table>
<thead>
<tr>
<th class="text">Name</th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
<th class="th price">
<table>
<thead>
<tr>
<th class="text">Nominal Price</th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
<th class="th turnover selected uppercase">
<table>
<thead>
<tr>
<th class="text">Turnover (HK$)</th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
<th class="th mktcap">
<table>
<thead>
<tr>
<th class="text">Market Cap (HK$)</th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
<th class="th pe">
<table>
<thead>
<tr>
<th class="text">P/E</th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
<th class="th div_yield">
<table>
<thead>
<tr>
<th class="text">Dividend Yield (%)</th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
<th class="th intraday">
<table>
<thead>
<tr>
<th class="text">Intraday Movement</th>
<th class="ico"><i></i></th>
</tr>
</thead>
</table>
</th>
</tr>
<tr class="datarow">
<td class="code"><a>909</a></td>
<td class="name"><a>MING YUAN CLOUD</a></td>
<td class="price"><bdo>HK$30.700</bdo>
<br>
<div><span>0.000</span> (<span>0.00%</span>)</div>
</td>
<td class="turnover">8.35B</td>
<td class="market">57.44B</td>
<td class="pe">-</td>
<td class="dividend">-</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=0909.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (909)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>700</a></td>
<td class="name"><a>TENCENT</a></td>
<td class="price"><bdo>HK$503.500</bdo>
<br>
<div class="downval"><span>-1.500</span> (<span>-0.30%</span>)</div>
</td>
<td class="turnover">6.69B</td>
<td class="market">4,824.90B</td>
<td class="pe">46.13x</td>
<td class="dividend">0.24%</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=0700.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (700)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>9988</a></td>
<td class="name"><a>BABA-SW</a></td>
<td class="price"><bdo>HK$258.000</bdo>
<br>
<div class="downval"><span>-3.000</span> (<span>-1.15%</span>)</div>
</td>
<td class="turnover">3.97B</td>
<td class="market">5,584.43B</td>
<td class="pe">-</td>
<td class="dividend">-</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=9988.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (9988)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>3690</a></td>
<td class="name"><a>MEITUAN-W</a></td>
<td class="price"><bdo>HK$232.000</bdo>
<br>
<div class="downval"><span>-6.600</span> (<span>-2.77%</span>)</div>
</td>
<td class="turnover">3.75B</td>
<td class="market">1,364.39B</td>
<td class="pe">547.17x</td>
<td class="dividend">-</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=3690.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (3690)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>3333</a></td>
<td class="name"><a>EVERGRANDE</a></td>
<td class="price"><bdo>HK$13.780</bdo>
<br>
<div class="downval"><span>-1.440</span> (<span>-9.46%</span>)</div>
</td>
<td class="turnover">2.80B</td>
<td class="market">180.00B</td>
<td class="pe">9.59x</td>
<td class="dividend">5.15%</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=3333.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (3333)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>1810</a></td>
<td class="name"><a>XIAOMI-W</a></td>
<td class="price"><bdo>HK$19.720</bdo>
<br>
<div class="downval"><span>-0.120</span> (<span>-0.60%</span>)</div>
</td>
<td class="turnover">2.69B</td>
<td class="market">475.78B</td>
<td class="pe">42.69x</td>
<td class="dividend">-</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=1810.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (1810)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>2318</a></td>
<td class="name"><a>PING AN</a></td>
<td class="price"><bdo>HK$80.350</bdo>
<br>
<div class="downval"><span>-0.100</span> (<span>-0.12%</span>)</div>
</td>
<td class="turnover">1.49B</td>
<td class="market">598.41B</td>
<td class="pe">8.61x</td>
<td class="dividend">2.89%</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=2318.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (2318)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>388</a></td>
<td class="name"><a>HKEX</a></td>
<td class="price"><bdo>HK$355.800</bdo>
<br>
<div class="downval"><span>-1.800</span> (<span>-0.50%</span>)</div>
</td>
<td class="turnover">1.49B</td>
<td class="market">451.09B</td>
<td class="pe">47.50x</td>
<td class="dividend">1.88%</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=0388.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (388)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>1918</a></td>
<td class="name"><a>SUNAC</a></td>
<td class="price"><bdo>HK$28.950</bdo>
<br>
<div class="downval"><span>-1.600</span> (<span>-5.24%</span>)</div>
</td>
<td class="turnover">1.36B</td>
<td class="market">134.94B</td>
<td class="pe">4.44x</td>
<td class="dividend">4.64%</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=1918.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (1918)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>981</a></td>
<td class="name"><a>SMIC</a></td>
<td class="price"><bdo>HK$18.580</bdo>
<br>
<div class="downval"><span>-0.760</span> (<span>-3.93%</span>)</div>
</td>
<td class="turnover">1.35B</td>
<td class="market">143.03B</td>
<td class="pe">59.55x</td>
<td class="dividend">-</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=0981.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (981)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>1299</a></td>
<td class="name"><a>AIA</a></td>
<td class="price"><bdo>HK$77.650</bdo>
<br>
<div class="upval"><span>+1.000</span> (<span>+1.30%</span>)</div>
</td>
<td class="turnover">1.34B</td>
<td class="market">939.05B</td>
<td class="pe">18.03x</td>
<td class="dividend">1.65%</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=1299.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (1299)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>2300</a></td>
<td class="name"><a>AMVIG HOLDINGS</a></td>
<td class="price"><bdo>HK$2.140</bdo>
<br>
<div class="upval"><span>+0.700</span> (<span>+48.61%</span>)</div>
</td>
<td class="turnover">1.31B</td>
<td class="market">1.98B</td>
<td class="pe">6.35x</td>
<td class="dividend">5.33%</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=2300.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (2300)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>1398</a></td>
<td class="name"><a>ICBC</a></td>
<td class="price"><bdo>HK$3.990</bdo>
<br>
<div class="downval"><span>-0.030</span> (<span>-0.75%</span>)</div>
</td>
<td class="turnover">1.28B</td>
<td class="market">346.30B</td>
<td class="pe">4.32x</td>
<td class="dividend">7.20%</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=1398.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (1398)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>5</a></td>
<td class="name"><a>HSBC HOLDINGS</a></td>
<td class="price"><bdo>HK$28.200</bdo>
<br>
<div class="downval"><span>-0.400</span> (<span>-1.40%</span>)</div>
</td>
<td class="turnover">1.20B</td>
<td class="market">583.52B</td>
<td class="pe">12.21x</td>
<td class="dividend">2.78%</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=0005.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (5)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>939</a></td>
<td class="name"><a>CCB</a></td>
<td class="price"><bdo>HK$5.020</bdo>
<br>
<div class="downval"><span>-0.040</span> (<span>-0.79%</span>)</div>
</td>
<td class="turnover">1.10B</td>
<td class="market">1,206.89B</td>
<td class="pe">4.36x</td>
<td class="dividend">6.97%</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=0939.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (939)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>9618</a></td>
<td class="name"><a>JD-SW</a></td>
<td class="price"><bdo>HK$282.200</bdo>
<br>
<div class="downval"><span>-6.200</span> (<span>-2.15%</span>)</div>
</td>
<td class="turnover">1.08B</td>
<td class="market">883.22B</td>
<td class="pe">-</td>
<td class="dividend">-</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=9618.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (9618)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>1658</a></td>
<td class="name"><a>PSBC</a></td>
<td class="price"><bdo>HK$3.160</bdo>
<br>
<div class="upval"><span>+0.060</span> (<span>+1.94%</span>)</div>
</td>
<td class="turnover">1.05B</td>
<td class="market">62.74B</td>
<td class="pe">4.01x</td>
<td class="dividend">7.23%</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=1658.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (1658)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>9633</a></td>
<td class="name"><a>NONGFU SPRING</a></td>
<td class="price"><bdo>HK$35.150</bdo>
<br>
<div class="downval"><span>-2.850</span> (<span>-7.50%</span>)</div>
</td>
<td class="turnover">1.01B</td>
<td class="market">174.92B</td>
<td class="pe">-</td>
<td class="dividend">-</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=9633.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (9633)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>1211</a></td>
<td class="name"><a>BYD COMPANY</a></td>
<td class="price"><bdo>HK$103.100</bdo>
<br>
<div class="downval"><span>-0.400</span> (<span>-0.39%</span>)</div>
</td>
<td class="turnover">844.11M</td>
<td class="market">94.33B</td>
<td class="pe">188.21x</td>
<td class="dividend">0.06%</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=1211.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (1211)"></td>
</tr>
<tr class="datarow">
<td class="code"><a>708</a></td>
<td class="name"><a>EVERG VEHICLE</a></td>
<td class="price"><bdo>HK$16.820</bdo>
<br>
<div class="downval"><span>-2.460</span> (<span>-12.76%</span>)</div>
</td>
<td class="turnover">789.41M</td>
<td class="market">148.29B</td>
<td class="pe">-</td>
<td class="dividend">-</td>
<td class="intraday"><img width="82" height="33" src="https://www1.hkex.com.hk/hkexwidget/chart/genchart?&sym=0708.HK&int=2&per=1&w=82&h=33&bottom=2&mode=8&tm=1601021280000" alt="Stock Chart (708)"></td>
</tr>
</tbody>
</table>
The table is being loaded dynamically using javascript. If you inspect element and use the networking tab you can see the XHR calls being made to get the data. I would try looking at these and then scrape from the api directly.

How to extract the following lines after pattern match

the web source is like this:
<div class="MT12">
<table class="tblchart" border="0" cellspacing="0" cellpadding="0">
<tr>
<th rowspan="2" width="100" align="left" valign="top">Date</th>
<th rowspan="2" width="100" style="text-align:right;" valign="top">Open</th>
<th rowspan="2" width="100" style="text-align:right;" valign="top">High</th>
<th rowspan="2" width="100" style="text-align:right;" valign="top">Low</th>
<th rowspan="2" width="100" style="text-align:right;" valign="top">Close</th>
<th colspan="2" style="text-align:center;" valign="top">- SPREAD -</th>
</tr>
<tr>
<th width="100" style="text-align:right;" valign="top">(High-Low)</th>
<th width="100" style="text-align:right;" valign="top" class="last">(Open-Close)</th>
</tr>
<tr>
<td align="left" valign="top">2019-12-24</td>
<td valign="top" style="text-align:right;">12269.25</td>
<td valign="top" class="b_12vv" style="text-align:right">12283.70</td>
<td valign="top" style="text-align:right;">12202.10</td>
<td valign="top" style="text-align:right;">12214.55</td>
<td valign="top" style="text-align:right;">81.60</td>
<td align="right" valign="top" class="last" style="text-align:right;">54.70</td>
</tr>
<tr>
<td align="left" valign="top">2019-12-23</td>
<td valign="top" style="text-align:right;">12235.45</td>
<td valign="top" class="b_12vv" style="text-align:right">12287.15</td>
<td valign="top" style="text-align:right;">12213.25</td>
<td valign="top" style="text-align:right;">12262.75</td>
<td valign="top" style="text-align:right;">73.90</td>
<td align="right" valign="top" class="last" style="text-align:right;">-27.30</td>
</tr>
<tr>
<td align="left" valign="top">2019-12-20</td>
<td valign="top" style="text-align:right;">12266.45</td>
<td valign="top" class="b_12vv" style="text-align:right">12293.90</td>
<td valign="top" style="text-align:right;">12252.75</td>
<td valign="top" style="text-align:right;">12271.80</td>
<td valign="top" style="text-align:right;">41.15</td>
<td align="right" valign="top" class="last" style="text-align:right;">-5.35</td>
</tr>
</table>
</div>
I want to get the following numbers for every date:
say for example I have to get the numbers 12269.25, 12283.70, 12202.10 and 12214.55 for a particular date (2019-12-24). Then proceed for the next date given.
I am facing difficulty because I need to select next 4 lines(whose xpath is not exatly related much as shown above) following each date in the page. The dates can range from single date to 100-200 dates.
Can anybody please help with webdriver code snippet for the same.
Thanks a lot
Can this meet your needs
from simplified_scrapy.simplified_doc import SimplifiedDoc
html = '''<div class="MT12">
<table class="tblchart" border="0" cellspacing="0" cellpadding="0">
<tr>
<th rowspan="2" width="100" align="left" valign="top">Date</th>
<th rowspan="2" width="100" style="text-align:right;" valign="top">Open</th>
<th rowspan="2" width="100" style="text-align:right;" valign="top">High</th>
<th rowspan="2" width="100" style="text-align:right;" valign="top">Low</th>
<th rowspan="2" width="100" style="text-align:right;" valign="top">Close</th>
<th colspan="2" style="text-align:center;" valign="top">- SPREAD -</th>
</tr>
<tr>
<th width="100" style="text-align:right;" valign="top">(High-Low)</th>
<th width="100" style="text-align:right;" valign="top" class="last">(Open-Close)</th>
</tr>
<tr>
<td align="left" valign="top">2019-12-24</td>
<td valign="top" style="text-align:right;">12269.25</td>
<td valign="top" class="b_12vv" style="text-align:right">12283.70</td>
<td valign="top" style="text-align:right;">12202.10</td>
<td valign="top" style="text-align:right;">12214.55</td>
<td valign="top" style="text-align:right;">81.60</td>
<td align="right" valign="top" class="last" style="text-align:right;">54.70</td>
</tr>
<tr>
<td align="left" valign="top">2019-12-23</td>
<td valign="top" style="text-align:right;">12235.45</td>
<td valign="top" class="b_12vv" style="text-align:right">12287.15</td>
<td valign="top" style="text-align:right;">12213.25</td>
<td valign="top" style="text-align:right;">12262.75</td>
<td valign="top" style="text-align:right;">73.90</td>
<td align="right" valign="top" class="last" style="text-align:right;">-27.30</td>
</tr>
<tr>
<td align="left" valign="top">2019-12-20</td>
<td valign="top" style="text-align:right;">12266.45</td>
<td valign="top" class="b_12vv" style="text-align:right">12293.90</td>
<td valign="top" style="text-align:right;">12252.75</td>
<td valign="top" style="text-align:right;">12271.80</td>
<td valign="top" style="text-align:right;">41.15</td>
<td align="right" valign="top" class="last" style="text-align:right;">-5.35</td>
</tr>
</table>
</div>'''
doc = SimplifiedDoc(html)
table = doc.getElement(tag='table',value='tblchart')
trs = table.trs.notContains('<th') # get tr
for tr in trs:
tds = tr.tds # get all td
data = [td.text for td in tds]
print (data[0],data[1],data[2],data[3],data[4])

Get <td> text using python selenium

<html>
<body>
<table style="border:0">
<tbody>
<tr class="">
<td class="pr10">Mon</td>
<td class="pl10">11am – 11pm</td>
</tr>
<tr class="">
<td class="pr10">Tue</td>
<td class="pl10">11am – 11pm</td>
</tr>
<tr class="bold">
<td class="pr10">Wed</td>
<td class="pl10">11am – 11pm</td>
</tr>
<tr class="">
<td class="pr10">Thu</td>
<td class="pl10">11am – 11pm</td>
</tr>
<tr class="">
<td class="pr10">Fri</td>
<td class="pl10">11am – 11pm</td>
</tr>
<tr class="">
<td class="pr10">Sat</td>
<td class="pl10">11am – 11pm</td>
</tr>
<tr class="">
<td class="pr10">Sun</td>
<td class="pl10">11am – 11pm</td>
</tr>
</tbody>
</table>
</html>
</body>
Try 1:
driver.find_elements_by_xpath("//*[#class='pr10']")
Try 2:
driver.find_element_by_xpath("//tr[td='Mon']/td").text
But it not fetching the text "Mon" "11am - 11pm"
text_area = driver.find_elements_by_xpath("//*[#class='pr10']")
for items2 in text_area:
print(items2.text)
try this instead:
text_area = driver.find_elements_by_xpath("""//*[#id="body"]/table/tbody/tr[1]/td[1]""")
print([elm.get_attribute('innerHTML') for elm in text_area])

parsing part of the table for data using BeautifulSoup

I have a self-project to scrape data online using BeautifulSoup and Python, and I think historical stocks data would be a good one for me to practice. I looked at the source code here to analyze how I can use BeautifulSoup's select() or findall() to parse part of the data from the table. Here is the code I use, but it parsed things other than the table.
soup = bs4.BeautifulSoup(res.text, 'lxml')
table = soup.findAll( 'td', {'class':'yfnc_tabledata1'} )
print table
My Question: How to I parse only the 2 rows showing the 2 days of data from the table?
Here is the table that has 2 days of the historical data:
<table class="yfnc_datamodoutline1" width="100%" cellpadding="0" cellspacing="0" border="0">
<tr valign="top">
<td>
<table border="0" cellpadding="2" cellspacing="1" width="100%">
<tr>
<th scope="col" class="yfnc_tablehead1" align="right" width="16%">Date</th>
<th scope="col" class="yfnc_tablehead1" align="right" width="12%">Open</th>
<th scope="col" class="yfnc_tablehead1" align="right" width="12%">High</th>
<th scope="col" class="yfnc_tablehead1" align="right" width="12%">Low</th>
<th scope="col" class="yfnc_tablehead1" align="right" width="12%">close</th>
<th scope="col" class="yfnc_tablehead1" align="right" width="16%">Volume</th>
<th scope="col" class="yfnc_tablehead1" align="right" width="15%">Adj Close*</th>
</tr>
<tr>
<td class="yfnc_tabledata1" nowrap align="right">12 Aug 2016</td>
<td class="yfnc_tabledata1" align="right">107.78</td>
<td class="yfnc_tabledata1" align="right">108.44</td>
<td class="yfnc_tabledata1" align="right">107.78</td>
<td class="yfnc_tabledata1" align="right">108.18</td>
<td class="yfnc_tabledata1" align="right">18,612,300</td>
<td class="yfnc_tabledata1" align="right">108.18</td>
</tr>
<tr>
<td class="yfnc_tabledata1" nowrap align="right">11 Aug 2016</td>
<td class="yfnc_tabledata1" align="right">108.52</td>
<td class="yfnc_tabledata1" align="right">108.93</td>
<td class="yfnc_tabledata1" align="right">107.85</td>
<td class="yfnc_tabledata1" align="right">107.93</td>
<td class="yfnc_tabledata1" align="right">27,484,500</td>
<td class="yfnc_tabledata1" align="right">107.93</td>
</tr>
<tr>
<td class="yfnc_tabledata1" colspan="7" align="center">
* <small>Close price adjusted for dividends and splits.</small>
</td>
</tr>
</table>
</td>
</tr>
</table>
I only need the specific 2 rows of data from above:
<tr>
<td class="yfnc_tabledata1" nowrap align="right">12 Aug 2016</td>
<td class="yfnc_tabledata1" align="right">107.78</td>
<td class="yfnc_tabledata1" align="right">108.44</td>
<td class="yfnc_tabledata1" align="right">107.78</td>
<td class="yfnc_tabledata1" align="right">108.18</td>
<td class="yfnc_tabledata1" align="right">18,612,300</td>
<td class="yfnc_tabledata1" align="right">108.18</td>
</tr>
<tr>
<td class="yfnc_tabledata1" nowrap align="right">11 Aug 2016</td>
<td class="yfnc_tabledata1" align="right">108.52</td>
<td class="yfnc_tabledata1" align="right">108.93</td>
<td class="yfnc_tabledata1" align="right">107.85</td>
<td class="yfnc_tabledata1" align="right">107.93</td>
<td class="yfnc_tabledata1" align="right">27,484,500</td>
<td class="yfnc_tabledata1" align="right">107.93</td>
</tr>
You can select the all the rows from the nested table inside the yfnc_datamodoutline1 table and index the first two:
soup = BeautifulSoup(html)
table_rows = soup.select("table.yfnc_datamodoutline1 table tr + tr")
row1, row2 = table_rows[0:2]
print(row1)
print(row2)
Which would give you:
<tr>
<td align="right" class="yfnc_tabledata1" nowrap="">12 Aug 2016</td>
<td align="right" class="yfnc_tabledata1">107.78</td>
<td align="right" class="yfnc_tabledata1">108.44</td>
<td align="right" class="yfnc_tabledata1">107.78</td>
<td align="right" class="yfnc_tabledata1">108.18</td>
<td align="right" class="yfnc_tabledata1">18,612,300</td>
<td align="right" class="yfnc_tabledata1">108.18</td>
</tr>
<tr>
<td align="right" class="yfnc_tabledata1" nowrap="">11 Aug 2016</td>
<td align="right" class="yfnc_tabledata1">108.52</td>
<td align="right" class="yfnc_tabledata1">108.93</td>
<td align="right" class="yfnc_tabledata1">107.85</td>
<td align="right" class="yfnc_tabledata1">107.93</td>
<td align="right" class="yfnc_tabledata1">27,484,500</td>
<td align="right" class="yfnc_tabledata1">107.93</td>
</tr>
To get the td data just extract the text from each td:
print([td.text for td in row1.find_all("td")])
print([td.text for td in row2.find_all("td")])
Which would give you:
[u'12 Aug 2016', u'107.78', u'108.44', u'107.78', u'108.18', u'18,612,300', u'108.18']
[u'11 Aug 2016', u'108.52', u'108.93', u'107.85', u'107.93', u'27,484,500', u'107.93']
table.yfnc_datamodoutline1 table tr + tr selects all the rows inside the inner table skipping the first which is the header row.

Categories