I'm trying to scrape the data from the table in the specifications section of this webpage:
Lochinvar Water Heaters
I'm using Beautiful Soup 4. I've tried searching for it by class, for example class="Table__Cell-sc-1e0v68l-0 kdksLO", but bs4 can't find that class on the page. I listed all the classes it could find, and none of them look useful. Any help is appreciated.
Here's the code I used to list the classes:
import requests
from bs4 import BeautifulSoup

URL = "https://www.lochinvar.com/products/commercial-water-heaters/armor-condensing-water-heater"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

results = soup.find_all("div", class_='Table__Wrapper-sc-1e0v68l-3 iFOFNW')

classes = [value
           for element in soup.find_all(class_=True)
           for value in element["class"]]
classes = sorted(classes)
for cass in classes:
    print(cass)
The page is populated with JavaScript, but fortunately in this case much of the data (including the specs table you want) seems to be inside a script tag in the fetched HTML. The script contains just one statement, so it's fairly easy to extract it as JSON:
import json
### copied from your q ####
import requests
from bs4 import BeautifulSoup
URL = "https://www.lochinvar.com/products/commercial-water-heaters/armor-condensing-water-heater"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
###########################
wrInf = soup.find(lambda l: l.name == 'script' and '__routeInfo' in l.text)
wrInf = wrInf.text.replace('window.__routeInfo = ', '', 1) # remove variable name
wrInf = wrInf.strip()[:-1] # get rid of ; at end
wrInf = json.loads(wrInf) # convert to python dictionary
specsTables = wrInf['data']['product']['specifications'][0]['table'] # get table (tsv string)
specsTables = [tuple(row.split('\t')) for row in specsTables.split('\n')] # convert rows to tuples
To view it, you could use pandas,
import pandas
headers = specsTables[0]
st_df = pandas.DataFrame([dict(zip(headers, r)) for r in specsTables[1:]])
# or just
# st_df = pandas.DataFrame(specsTables[1:], columns=headers)
print(st_df.head())
or you could simply print it
for i, r in enumerate(specsTables):
    print(" | ".join([f'{c:^18}' for c in r]))
    if i == 0: print()
output:
Model Number | Btu/Hr Input | Thermal Efficiency | GPH # 100ºF Rise | A | B | C | D | E | F | G | H | I | J | K | L | M | Gas Conn. | Water Conn. | Air Inlet | Vent Size | Ship. Wt.
AWH0400NPM | 399,000 | 99% | 479 | 45" | 24" | 30-1/2" | 42-1/2" | 29-3/4" | 20-1/4" | 12" | 20" | 38" | 3-1/2" | 10-1/2" | 19-1/4" | 20" | 1" | 2" | 4" | 4" | 326
AWH0500NPM | 500,000 | 99% | 600 | 45" | 24" | 30-1/2" | 42-1/2" | 29-3/4" | 20-1/4" | 12" | 20" | 38" | 3-1/2" | 10-1/2" | 19-1/4" | 20" | 1" | 2" | 4" | 4" | 333
AWH0650NPM | 650,000 | 98% | 772 | 45" | 24" | 41" | 53" | 30-1/2" | 15-1/4" | 12" | 20" | 38" | 3-1/2" | 10-1/2" | 19-1/4" | 20" | 1-1/4" | 2" | 4" | 6" | 424
AWH0800NPM | 800,000 | 98% | 950 | 45" | 24" | 41" | 53" | 30-1/2" | 15-1/4" | 12" | 20" | 38" | 3-1/2" | 10-1/2" | 19-1/4" | 20" | 1-1/4" | 2" | 4" | 6" | 434
AWH1000NPM | 999,000 | 98% | 1,187 | 45" | 24" | 48" | 62" | 30-1/2" | 15-3/4" | 12" | 20" | 38" | 3-1/2" | 10-1/2" | 19-1/4" | 20" | 1-1/4" | 2-1/2" | 6" | 6" | 494
AWH1250NPM | 1,250,000 | 98% | 1,485 | 51-1/2" | 34" | 49" | 59" | 5-1/2" | 5-1/2" | 13-1/2" | 6-3/4" | 46-3/4" | 5-3/4" | 19-3/4" | 23" | 22-1/2" | 1-1/2" | 2-1/2" | 8" | 8" | 1,568
AWH1500NPM | 1,500,000 | 98% | 1,782 | 51-1/2" | 34" | 52-3/4" | 62-3/4" | 4-1/2" | 4-1/2" | 13-1/2" | 6-3/4" | 46-3/4" | 5-3/4" | 19-3/4" | 23" | 22-1/2" | 1-1/2" | 2-1/2" | 8" | 8" | 1,649
AWH2000NPM | 1,999,000 | 98% | 2,375 | 51-1/2" | 34" | 65-1/2" | 75-1/2" | 7" | 5-3/4" | 14-3/4" | 7-1/4" | 46-3/4" | 6-3/4" | 18-3/4" | 23" | 23-1/2" | 1-1/2" | 2-1/2" | 8" | 8" | 1,911
AWH3000NPM | 3,000,000 | 98% | 3,564 | 67-1/4" | 48-1/4" | 79-3/4" | 93-3/4" | 4-3/4" | 6-3/4" | 17-3/4" | 8-3/4" | 60-1/4" | 8-1/2" | 25-1/2" | 29-1/2" | 40" | 2" | 4" | 10" | 10" | 3,147
AWH4000NPM | 4,000,000 | 98% | 4,752 | 67-1/4" | 48-1/4" | 96" | 110" | 5" | 7-1/2" | 17-3/4" | 8-3/4" | 60-1/4" | 8-1/2" | 25-1/2" | 29-1/2" | 40" | 2-1/2" | 4" | 12" | 12" | 3,694
If you wanted a specific model's specs:
modelNo = 'AWH1000NPM'
mSpecs = [r for r in specsTables if r[0] == modelNo]
mSpecs = [[]] if mSpecs == [] else mSpecs # in case there is no match
mSpecs = dict(zip(specsTables[0], mSpecs[0])) # convert to dictionary
print(mSpecs)
output:
{'Model Number': 'AWH1000NPM', 'Btu/Hr Input': '999,000', 'Thermal Efficiency': '98%', 'GPH # 100ºF Rise': '1,187', 'A': '45"', 'B': '24"', 'C': '48"', 'D': '62"', 'E': '30-1/2"', 'F': '15-3/4"', 'G': '12"', 'H': '20"', 'I': '38"', 'J': '3-1/2"', 'K': '10-1/2"', 'L': '19-1/4"', 'M': '20"', 'Gas Conn.': '1-1/4"', 'Water Conn.': '2-1/2"', 'Air Inlet': '6"', 'Vent Size': '6"', 'Ship. Wt.': '494'}
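If you need the same table from more than one product page, the extraction above could be wrapped in a small helper. A minimal sketch along the same lines (the function name and the empty-list fallback are mine, not something the site provides):

import json
import requests
from bs4 import BeautifulSoup

def get_specs_table(url):
    # fetch the page and pull the window.__routeInfo JSON out of its script tag
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    script = soup.find(lambda t: t.name == 'script' and '__routeInfo' in t.text)
    if script is None:
        return []  # layout changed or the request was blocked
    raw = script.text.replace('window.__routeInfo = ', '', 1).strip()[:-1]  # drop variable name and trailing ;
    info = json.loads(raw)
    tsv = info['data']['product']['specifications'][0]['table']  # tab-separated table string
    return [tuple(row.split('\t')) for row in tsv.split('\n')]

specsTables = get_specs_table("https://www.lochinvar.com/products/commercial-water-heaters/armor-condensing-water-heater")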
The contents for constructing the table are within a script tag. You can extract the relevant string and re-create the table through string manipulation.
import requests, re
import pandas as pd

r = requests.get('https://www.lochinvar.com/products/commercial-water-heaters/armor-condensing-water-heater/').text
# pull the tab/newline-delimited table string out of the embedded JSON and unescape the quotes
s = re.sub(r'\\"', '"', re.search(r'table":"([\s\S]+?)(?:","tableFootNote)', r).group(1))
lines = [i.split('\\t') for i in s.split('\\n')]  # \t and \n are still escaped in the raw source
df = pd.DataFrame(lines[1:], columns=lines[0])
df.head(5)
I have been using the code below to pull MLB lineups from BaseballPress.com. However, this pulls the official MLB lineups, which don't normally get posted until about an hour before the game.
import requests
import pandas as pd
import openpyxl
from bs4 import BeautifulSoup

url = "https://www.baseballpress.com/lineups/2022-08-09"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

def get_name(tag):
    if tag.select_one(".desktop-name"):
        return tag.select_one(".desktop-name").get_text()
    elif tag.select_one(".mobile-name"):
        return tag.select_one(".mobile-name").get_text()
    else:
        return tag.get_text()

data = []
for card in soup.select(".lineup-card"):
    header = [
        c.get_text(strip=True, separator=" ")
        for c in card.select(".lineup-card-header .c")
    ]
    h_p1, h_p2 = [
        get_name(p) for p in card.select(".lineup-card-header .player")
    ]
    data.append([*header, h_p1, h_p2])
    for p1, p2 in zip(
        card.select(".col--min:nth-of-type(1) .player"),
        card.select(".col--min:nth-of-type(2) .player"),
    ):
        p1 = get_name(p1).split(maxsplit=1)[-1]
        p2 = get_name(p2).split(maxsplit=1)[-1]
        data.append([*header, p1, p2])

df = pd.DataFrame(
    data, columns=["Team1", "Date", "Team2", "Player1", "Player2"]
)
df.to_excel("MLB Games.xlsx", sheet_name='sheet1', index=False)
print(df.head(10).to_markdown(index=False))
In order to get around this, I found out that Rotowire releases the projected lineups about 24 hours in advance, which is what I need for this analysis. I have changed the Python script to match that website, except I am not sure how to alter the get_name() function. Does anyone know how I would address this portion of the code? See the new code below:
import requests
import pandas as pd
import openpyxl
from bs4 import BeautifulSoup

url = "https://www.rotowire.com/baseball/daily-lineups.php"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

def get_name(tag):
    if tag.select_one(".desktop-name"):
        return tag.select_one(".desktop-name").get_text()
    elif tag.select_one(".mobile-name"):
        return tag.select_one(".mobile-name").get_text()
    else:
        return tag.get_text()

data = []
for card in soup.select(".lineup__main"):
    header = [
        c.get_text(strip=True, separator=" ")
        for c in card.select(".lineup__teams .c")
    ]
    h_p1, h_p2 = [
        get_name(p) for p in card.select(".lineup__teams .lineup__player")
    ]
    data.append([*header, h_p1, h_p2])
    for p1, p2 in zip(
        card.select(".lineup__list is-visit:nth-of-type(1) .lineup__player"),
        card.select(".lineup__list is-home:nth-of-type(2) .lineup__player"),
    ):
        p1 = get_name(p1).split(maxsplit=1)[-1]
        p2 = get_name(p2).split(maxsplit=1)[-1]
        data.append([*header, p1, p2])

df = pd.DataFrame(
    data, columns=["Team1", "Date", "Team2", "Player1", "Player2"]
)
df.to_excel("MLB Predicted Lineups.xlsx", sheet_name='sheet1', index=False)
print(df.head(10).to_markdown(index=False))
You need to look at the actual HTML to see what tags and attributes the source is using, in order to correctly identify the content you want. I made a script to do exactly what you are asking a while back, so I'm just posting that here.
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd

def get_players(home_away_dict):
    rows = []
    for home_away, v in home_away_dict.items():
        players = v['players']
        print("\n{} - {}".format(v['team'], v['lineupStatus']))
        for idx, player in enumerate(players):
            if home_away == 'Home':
                team = home_away_dict['Home']['team']
                opp = home_away_dict['Away']['team']
            else:
                team = home_away_dict['Away']['team']
                opp = home_away_dict['Home']['team']
            if player.find('span', {'class': 'lineup__throws'}):
                # pitchers list a throwing hand, batters a batting hand
                playerPosition = 'P'
                handedness = player.find('span', {'class': 'lineup__throws'}).text
            else:
                playerPosition = player.find('div', {'class': 'lineup__pos'}).text
                handedness = player.find('span', {'class': 'lineup__bats'}).text
            if 'title' in list(player.find('a').attrs.keys()):
                playerName = player.find('a')['title'].strip()
            else:
                playerName = player.find('a').text.strip()
            playerRow = {
                'Bat Order': idx,
                'Name': playerName,
                'Position': playerPosition,
                'Team': team,
                'Opponent': opp,
                'Home/Away': home_away,
                'Handedness': handedness,
                'Lineup Status': home_away_dict[home_away]['lineupStatus']}
            rows.append(playerRow)
            print('{} {}'.format(playerRow['Position'], playerRow['Name']))
    return rows

rows = []
url = 'https://www.rotowire.com/baseball/daily-lineups.php'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

lineupBoxes = soup.find_all('div', {'class': 'lineup__box'})
for lineupBox in lineupBoxes:
    try:
        awayTeam = lineupBox.find('div', {'class': 'lineup__team is-visit'}).text.strip()
        homeTeam = lineupBox.find('div', {'class': 'lineup__team is-home'}).text.strip()
        print(f'\n\n############\n {awayTeam} # {homeTeam}\n############')

        awayLineup = lineupBox.find('ul', {'lineup__list is-visit'})
        homeLineup = lineupBox.find('ul', {'lineup__list is-home'})

        awayLineupStatus = awayLineup.find('li', {'class': re.compile('lineup__status.*')}).text.strip()
        homeLineupStatus = homeLineup.find('li', {'class': re.compile('lineup__status.*')}).text.strip()

        awayPlayers = awayLineup.find_all('li', {'class': re.compile('lineup__player.*')})
        homePlayers = homeLineup.find_all('li', {'class': re.compile('lineup__player.*')})

        home_away_dict = {
            'Home': {
                'team': homeTeam, 'players': homePlayers, 'lineupStatus': homeLineupStatus},
            'Away': {
                'team': awayTeam, 'players': awayPlayers, 'lineupStatus': awayLineupStatus}}

        playerRows = get_players(home_away_dict)
        rows += playerRows
    except:
        # skip boxes that don't follow the expected structure (e.g. postponed games)
        continue

df = pd.DataFrame(rows)
Output: First 20 of 300 rows
print(df.head(20).to_markdown(index=False))
| Bat Order | Name | Position | Team | Opponent | Home/Away | Handedness | Lineup Status |
|------------:|:-----------------|:-----------|:-------|:-----------|:------------|:-------------|:----------------|
| 0 | Nick Lodolo | P | CIN | PHI | Home | L | Expected Lineup |
| 1 | Jonathan India | 2B | CIN | PHI | Home | R | Expected Lineup |
| 2 | Nick Senzel | CF | CIN | PHI | Home | R | Expected Lineup |
| 3 | Kyle Farmer | 3B | CIN | PHI | Home | R | Expected Lineup |
| 4 | Joey Votto | 1B | CIN | PHI | Home | L | Expected Lineup |
| 5 | Aristides Aquino | DH | CIN | PHI | Home | R | Expected Lineup |
| 6 | Albert Almora | LF | CIN | PHI | Home | R | Expected Lineup |
| 7 | Matt Reynolds | RF | CIN | PHI | Home | R | Expected Lineup |
| 8 | Jose Barrero | SS | CIN | PHI | Home | R | Expected Lineup |
| 9 | Austin Romine | C | CIN | PHI | Home | R | Expected Lineup |
| 0 | Ranger Suarez | P | PHI | CIN | Away | L | Expected Lineup |
| 1 | Jean Segura | 2B | PHI | CIN | Away | R | Expected Lineup |
| 2 | Kyle Schwarber | LF | PHI | CIN | Away | L | Expected Lineup |
| 3 | Rhys Hoskins | 1B | PHI | CIN | Away | R | Expected Lineup |
| 4 | J.T. Realmuto | C | PHI | CIN | Away | R | Expected Lineup |
| 5 | Nick Castellanos | RF | PHI | CIN | Away | R | Expected Lineup |
| 6 | Alec Bohm | 3B | PHI | CIN | Away | R | Expected Lineup |
| 7 | Darick Hall | DH | PHI | CIN | Away | L | Expected Lineup |
| 8 | Bryson Stott | SS | PHI | CIN | Away | L | Expected Lineup |
| 9 | Matt Vierling | CF | PHI | CIN | Away | R | Expected Lineup |
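If you want the same Excel output your original BaseballPress script produced, the DataFrame built above can be written out the same way. A minimal follow-up (the filename is just an example, and openpyxl must be installed as in your original imports):

# write the scraped Rotowire lineups to Excel, mirroring the original script's last step
df.to_excel("MLB Predicted Lineups.xlsx", sheet_name="sheet1", index=False)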
I am trying to scrape a table which in some cells has a "graphical" element (an up/down arrow), using R. Unfortunately, the rvest function html_table seems to skip these elements. This is how such a cell with an arrow looks in HTML:
<td>
    <span style="font-weight: bold; color: darkgreen">Ba2</span>
    <i class="glyphicon glyphicon-arrow-down" title="negative outlook"></i>
</td>
The code I am using is:
require(rvest)
require(tidyverse)
url = "https://tradingeconomics.com/country-list/rating"
#bypass company firewall
download.file(url, destfile = "scrapedpage.html", quiet=TRUE)
content <- read_html("scrapedpage.html")
tables <- content %>% html_table(fill = TRUE, trim=TRUE)
But for the cell above, for example, it gives me only the string Ba2. Is there a way to also include the arrows somehow (as text, e.g. Ba2 neg)? A solution in Python would also be useful if R does not have such functionality.
Thank you!
I don't know if this is possible in R but in Python this will give you the required results.
I have tried to print the first few rows to give you an idea of how the data looks.
pos denotes arrow-up and neg denotes arrow-down.
from bs4 import BeautifulSoup
import requests

url = 'https://tradingeconomics.com/country-list/rating'
resp = requests.get(url)
soup = BeautifulSoup(resp.text, 'html.parser')

t = soup.find('table', attrs={'id': 'ctl00_ContentPlaceHolder1_ctl01_GridView1'})
tr = t.findAll('tr')
for i in range(1, 10):
    tds = tr[i].findAll('td')
    for j in tds:
        fa_down = j.find('i', class_='glyphicon-arrow-down')
        fa_up = j.find('i', class_='glyphicon-arrow-up')
        if fa_up:
            print(f'{j.text.strip()} (pos)')
        elif fa_down:
            print(f'{j.text.strip()} (neg)')
        else:
            print(f'{j.text.strip()}')
Output:
+------------+---------+-----------+-----------+---------+---------+
| Field 1 | Field 2 | Field 3 | Field 4 | Field 5 | Field 6 |
+------------+---------+-----------+-----------+---------+---------+
| Albania | B+ | B1 | | | 35 |
| Andorra | BBB | | BBB+ | | 62 |
| Angola | CCC+ | Caa1 | CCC | | 21 |
| Argentina | CCC+ | Ca | CCC | CCC | 15 |
| Armenia | | Ba3 | B+ | | 16 |
| Aruba | BBB | | BB | | 52 |
| Australia | AAA | Aaa | AAA (neg) | AAA | 100 |
| Austria | AA+ | Aa1 | AA+ | AAA | 96 |
| Azerbaijan | BB+ | Ba2 (pos) | BB+ | | 48 |
+------------+---------+-----------+-----------+---------+---------+
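If you would rather end up with a data frame (closer to what html_table gives you in R) than printed rows, the same arrow check can be applied to every row. A rough sketch continuing from the t table found above; treating the first tr as the header row is an assumption, so adjust it if the page structure differs:

import pandas as pd

data = []
for row in t.findAll('tr')[1:]:  # skip the header row
    cells = []
    for td in row.findAll('td'):
        text = td.text.strip()
        if td.find('i', class_='glyphicon-arrow-up'):
            text += ' (pos)'
        elif td.find('i', class_='glyphicon-arrow-down'):
            text += ' (neg)'
        cells.append(text)
    data.append(cells)

headers = [c.text.strip() for c in t.findAll('tr')[0].findAll(['th', 'td'])]
df = pd.DataFrame(data, columns=headers if data and len(headers) == len(data[0]) else None)
print(df.head())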
from bs4 import BeautifulSoup
import numpy as np
import requests
from selenium import webdriver
from nltk.tokenize import sent_tokenize, word_tokenize

html = webdriver.Firefox(executable_path=r'D:\geckodriver.exe')
html.get("https://www.tsa.gov/coronavirus/passenger-throughput")

def TSA_travel_numbers(html):
    soup = BeautifulSoup(html, 'lxml')
    for i, rows in enumerate(soup.find('div', class_='view-content'), 1):
        # print(rows.content)
        for header in rows.find('tr'):
            number = rows.find_all('td', class_='views-field views-field field-2021-throughput views-align-center')
            print(number.text)

TSA_travel_numbers(html.page_source)
My error as follows :
Traceback (most recent call last):
File "TSA_travel.py", line 23, in <module>
TSA_travel_numbers(html.page_source)
File "TSA_travel.py", line 15, in TSA_travel_numbers
for header in rows.find('tr'):
TypeError: 'int' object is not iterable
What is happening here?
I can't iterate through the 'tr' tags. Please help me solve this problem.
Sorry for your time, and thanks in advance!
As the error says, you can't iterate over an int, which is what rows.find('tr') returns here: iterating over that div yields plain-text NavigableString children as well as Tags, and on a string .find is str.find, which returns an integer index.
Also, there's no need for a webdriver as data on the page is static.
Here's my take on it:
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate

def get_page(url):
    return requests.get(url).text

def get_data(page):
    soup = BeautifulSoup(page, 'lxml')
    return [
        item.getText(strip=True) for item in soup.select(".views-align-center")
    ]

def build_table(table_rows):
    t = [table_rows[i:i + 4] for i in range(0, len(table_rows[1:]), 4)]
    h = t[0]
    return t[1:], h

if __name__ == '__main__':
    source = "https://www.tsa.gov/coronavirus/passenger-throughput"
    table, header = build_table(get_data(get_page(source)))
    print(tabulate(table, headers=header, tablefmt="pretty"))
Output:
+------------+--------------------------+--------------------------+--------------------------+
| Date | 2021 Traveler Throughput | 2020 Traveler Throughput | 2019 Traveler Throughput |
+------------+--------------------------+--------------------------+--------------------------+
| 5/9/2021 | 1,707,805 | 200,815 | 2,419,114 |
| 5/8/2021 | 1,429,657 | 169,580 | 1,985,942 |
| 5/7/2021 | 1,703,267 | 215,444 | 2,602,631 |
| 5/6/2021 | 1,644,050 | 190,863 | 2,555,342 |
| 5/5/2021 | 1,268,938 | 140,409 | 2,270,662 |
| 5/4/2021 | 1,134,103 | 130,601 | 2,106,597 |
| 5/3/2021 | 1,463,672 | 163,692 | 2,470,969 |
| 5/2/2021 | 1,626,962 | 170,254 | 2,512,598 |
| 5/1/2021 | 1,335,535 | 134,261 | 1,968,278 |
| 4/30/2021 | 1,558,553 | 171,563 | 2,546,029 |
| 4/29/2021 | 1,526,681 | 154,695 | 2,499,461 |
| 4/28/2021 | 1,184,326 | 119,629 | 2,256,442 |
| 4/27/2021 | 1,077,199 | 110,913 | 2,102,068 |
| 4/26/2021 | 1,369,410 | 119,854 | 2,412,770 |
| 4/25/2021 | 1,571,220 | 128,875 | 2,506,809 |
| 4/24/2021 | 1,259,724 | 114,459 | 1,990,464 |
| 4/23/2021 | 1,521,393 | 123,464 | 2,521,897 |
| 4/22/2021 | 1,509,649 | 111,627 | 2,526,961 |
| 4/21/2021 | 1,164,099 | 98,968 | 2,254,209 |
| 4/20/2021 | 1,082,443 | 92,859 | 2,227,475 |
| 4/19/2021 | 1,412,500 | 99,344 | 2,594,171 |
| 4/18/2021 | 1,572,383 | 105,382 | 2,356,802 |
| 4/17/2021 | 1,277,815 | 97,236 | 1,988,205 |
| 4/16/2021 | 1,468,218 | 106,385 | 2,457,133 |
| 4/15/2021 | 1,491,435 | 95,085 | 2,616,158 |
| 4/14/2021 | 1,152,703 | 90,784 | 2,317,381 |
| 4/13/2021 | 1,085,034 | 87,534 | 2,208,688 |
| 4/12/2021 | 1,468,972 | 102,184 | 2,484,580 |
| 4/11/2021 | 1,561,495 | 90,510 | 2,446,801 |
and so on ...
Or, for an even shorter approach, just use pandas:
import pandas as pd
import requests
from tabulate import tabulate

if __name__ == '__main__':
    source = "https://www.tsa.gov/coronavirus/passenger-throughput"
    df = pd.read_html(requests.get(source).text, flavor="bs4")[0]
    print(tabulate(df.head(10), tablefmt="pretty", showindex=False))
Output:
+-----------+-----------+--------+---------+
| 5/9/2021 | 1707805.0 | 200815 | 2419114 |
| 5/8/2021 | 1429657.0 | 169580 | 1985942 |
| 5/7/2021 | 1703267.0 | 215444 | 2602631 |
| 5/6/2021 | 1644050.0 | 190863 | 2555342 |
| 5/5/2021 | 1268938.0 | 140409 | 2270662 |
| 5/4/2021 | 1134103.0 | 130601 | 2106597 |
| 5/3/2021 | 1463672.0 | 163692 | 2470969 |
| 5/2/2021 | 1626962.0 | 170254 | 2512598 |
| 5/1/2021 | 1335535.0 | 134261 | 1968278 |
| 4/30/2021 | 1558553.0 | 171563 | 2546029 |
+-----------+-----------+--------+---------+
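pd.read_html normally picks up the column names from the table's header row; they just aren't shown above because tabulate was called without headers. Assuming the headers were parsed, you can print them the same way as in the first snippet:

print(tabulate(df.head(10), headers="keys", tablefmt="pretty", showindex=False))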
I am trying to scrape the th element, but the result keeps returning None. What am I doing wrong?
This is the code I have tried:
import requests
import bs4
import urllib3
dateList = []
openList = []
closeList = []
highList = []
lowList = []
r = requests.get(
'https://coinmarketcap.com/currencies/bitcoin/historical-data/')
soup = bs4.BeautifulSoup(r.text, 'lxml')
td = soup.find('th')
print(td)
There's an API endpoint so you can fetch the data from there.
Here's how:
import pandas as pd
import requests
from tabulate import tabulate
api_endpoint = "https://web-api.coinmarketcap.com/v1/cryptocurrency/ohlcv/historical?id=1&convert=USD&time_start=1609804800&time_end=1614902400"
bitcoin = requests.get(api_endpoint).json()
df = pd.DataFrame([q["quote"]["USD"] for q in bitcoin["data"]["quotes"]])
print(tabulate(df, headers="keys", showindex=False, disable_numparse=True, tablefmt="pretty"))
Output:
+----------------+----------------+----------------+----------------+--------------------+-------------------+--------------------------+
| open | high | low | close | volume | market_cap | timestamp |
+----------------+----------------+----------------+----------------+--------------------+-------------------+--------------------------+
| 34013.614533 | 36879.69856854 | 33514.03374162 | 36824.36441009 | 75289433810.59091 | 684671246323.6501 | 2021-01-06T23:59:59.999Z |
| 36833.87435728 | 40180.3679073 | 36491.18981083 | 39371.04235311 | 84762141031.49448 | 732062681138.1346 | 2021-01-07T23:59:59.999Z |
| 39381.76584266 | 41946.73935079 | 36838.63599637 | 40797.61071993 | 88107519479.50471 | 758625941266.7522 | 2021-01-08T23:59:59.999Z |
| 40788.64052286 | 41436.35000639 | 38980.87690625 | 40254.54649816 | 61984162837.0747 | 748563483043.1383 | 2021-01-09T23:59:59.999Z |
| 40254.21779758 | 41420.19103255 | 35984.62712175 | 38356.43950662 | 79980747690.35463 | 713304617760.9486 | 2021-01-10T23:59:59.999Z |
| 38346.52950301 | 38346.52950301 | 30549.59876946 | 35566.65594049 | 123320567398.62296 | 661457321418.0524 | 2021-01-11T23:59:59.999Z |
| 35516.36114084 | 36568.52697414 | 32697.97662163 | 33922.9605815 | 74773277909.4566 | 630920422745.0479 | 2021-01-12T23:59:59.999Z |
| 33915.11958124 | 37599.96059774 | 32584.66767186 | 37316.35939997 | 69364315979.27992 | 694069582193.7559 | 2021-01-13T23:59:59.999Z |
| 37325.10763475 | 39966.40524241 | 36868.5632453 | 39187.32812109 | 63615990033.01017 | 728904366964.3611 | 2021-01-14T23:59:59.999Z |
| 39156.7080858 | 39577.71118833 | 34659.58974449 | 36825.36585131 | 67760757880.723885 | 685005864471.3622 | 2021-01-15T23:59:59.999Z |
| 36821.64873201 | 37864.36887891 | 35633.55401669 | 36178.13890106 | 57706187875.104546 | 673000645230.8221 | 2021-01-16T23:59:59.999Z |
| 36163.64923243 | 36722.34987621 | 34069.32218533 | 35791.27792129 | 52359854336.21185 | 665831621390.9865 | 2021-01-17T23:59:59.999Z |
| 35792.23666766 | 37299.28580604 | 34883.84404829 | 36630.07568284 | 49511702429.3542 | 681470030572.0747 | 2021-01-18T23:59:59.999Z |
| 36642.23272357 | 37755.89185872 | 36069.80639361 | 36069.80639361 | 57244195485.50075 | 671081200699.8711 | 2021-01-19T23:59:59.999Z |
and so on ...
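The time_start and time_end query parameters in that endpoint look like Unix timestamps (1609804800 is 2021-01-05 00:00 UTC), so a different window can presumably be requested by rebuilding the URL. A sketch under that assumption (I can't guarantee the endpoint still accepts these exact parameters):

from datetime import datetime, timezone
import requests

def ohlcv_url(start, end, coin_id=1, convert="USD"):
    # time_start / time_end appear to be Unix epoch seconds; id=1 is Bitcoin in the URL above
    def ts(d):
        return int(d.replace(tzinfo=timezone.utc).timestamp())
    return ("https://web-api.coinmarketcap.com/v1/cryptocurrency/ohlcv/historical"
            f"?id={coin_id}&convert={convert}&time_start={ts(start)}&time_end={ts(end)}")

bitcoin = requests.get(ohlcv_url(datetime(2021, 1, 5), datetime(2021, 3, 5))).json()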
The Response object does have a text attribute, but you could try parsing the raw bytes instead:
soup = bs4.BeautifulSoup(r.content, 'lxml')
More likely, though, the historical-data table is rendered client-side, so the th elements simply aren't in the fetched HTML, which is why the API approach above works better.