import requests
import pandas as pd
shot_chart_url = ('http://stats.nba.com/stats/shotchartdetail?CFID=33&CFPAR'
                  'AMS=2014-15&ContextFilter=&ContextMeasure=FGA&DateFrom=&D'
                  'ateTo=&GameID=&GameSegment=&LastNGames=0&LeagueID=00&Loca'
                  'tion=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&'
                  'PaceAdjust=N&PerMode=PerGame&Period=0&PlayerID=201935&Plu'
                  'sMinus=N&Position=&Rank=N&RookieYear=&Season=2014-15&Seas'
                  'onSegment=&SeasonType=Regular+Season&TeamID=0&VsConferenc'
                  'e=&VsDivision=&mode=Advanced&showDetails=0&showShots=1&sh'
                  'owZones=0')
# Get the webpage containing the data
response = requests.get(shot_chart_url)
# Grab the headers to be used as column headers for our DataFrame
headers = response.json()['resultSets'][0]['headers']
# Grab the shot chart data
shots = response.json()['resultSets'][0]['rowSet']
shot_df = pd.DataFrame(shots, columns=headers)
# View the head of the DataFrame and all its columns
from IPython.display import display
with pd.option_context('display.max_columns', None):
    display(shot_df.head())
I want to dump the data from the pandas table into a CSV file, but I'm unsure how to use pandas.DataFrame.to_csv.
Call the DataFrame instance's to_csv method with at least the path argument
shot_df.to_csv('/path/to/file.csv')
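If you need more control over the output, to_csv takes keyword arguments as well. A minimal sketch for the shot chart DataFrame built above (the filename is just a placeholder):

shot_df.to_csv('shot_chart.csv',
               index=False,       # drop the row index from the file
               encoding='utf-8')  # explicit encoding for portability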
I am experiencing this error with the code below:
File "\<stdin\>", line 1, in \<module\>
AttributeError: 'list' object has no attribute 'to_excel'
I want to save the table I am scraping from Wikipedia to an Excel file, but I can't work out how to adjust the code to get the data list from the terminal into the Excel file using to_excel.
I can see it works for a similar problem when a dataset is set out as a DataFrame
(i.e. df = pd.DataFrame(data, columns=['Product', 'Price'])).
But I can't work out how to adjust my code for the df = pd.read_html(str(congress_table)) line, which I think is the issue (i.e. using read_html and sourcing the data from a table id).
How can I adjust the code to make it save an excel file to the path specified?
from bs4 import BeautifulSoup
import requests
import pandas as pd
wiki_url = 'https://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_House_of_Representatives'
table_id = 'votingmembers'
response = requests.get(wiki_url)
soup = BeautifulSoup(response.text, 'html.parser')
congress_table = soup.find('table', attrs={'id': table_id})
df = pd.read_html(str(congress_table))
df.to_excel (r'C:\Users\name\OneDrive\Code\.vscode\Test.xlsx', index = False, header=True)
print(df)
I was expecting the data list to be saved to Excel at the folder path specified.
I tried following multiple guides, but they don't show the read_html item, only DataFrame solutions.
pandas.read_html() returns a list of DataFrame objects, one per table, so you have to pick one by index, in your case [0]. You also do not need requests and BeautifulSoup separately; just go with pandas.read_html().
pd.read_html(wiki_url,attrs={'id': table_id})[0]
Example
import pandas as pd
wiki_url = 'https://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_House_of_Representatives'
table_id = 'votingmembers'
df = pd.read_html(wiki_url,attrs={'id': table_id})[0]
df.to_excel(r'C:\Users\name\OneDrive\Code\.vscode\Test.xlsx', index=False, header=True)
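If you would rather keep the requests/BeautifulSoup version from the question, the only change needed is to index the list that read_html returns; a sketch using the names from the question (the output path is the question's placeholder, and to_excel needs openpyxl installed):

df = pd.read_html(str(congress_table))[0]  # read_html returns a list; take the first table
df.to_excel(r'C:\Users\name\OneDrive\Code\.vscode\Test.xlsx', index=False, header=True)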
I have an issue trying to convert my data table to a background-gradient style. Every time I run the script, I'm not able to convert it. I think some data values won't convert correctly because they are in the wrong data type. Does anyone know how to help me with this issue?
try:
    # For Python 3.0 and later
    from urllib.request import urlopen
except ImportError:
    print("Wrong version")

import json

def get_jsonparsed_data(url):
    """
    Receive the content of ``url``, parse it as JSON and return the object.

    Parameters
    ----------
    url : str

    Returns
    -------
    dict
    """
    response = urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)

url = ("https://financialmodelingprep.com/api/v3/income-statement/AAPL?apikey=*******************")
print(get_jsonparsed_data(url))
data = get_jsonparsed_data(url)
import pandas as pd
import numpy as np
# Set the pandas DataFrame display wide for visualization
desired_width=1000
pd.set_option('display.width', desired_width)
np.set_printoptions(linewidth=desired_width)
pd.set_option('display.max_columns',100)
# Gradient color
df = pd.DataFrame(data)
df.info()
df.style.background_gradient(cmap='Blues',
                             low=0,
                             high=0,
                             axis=0,
                             subset=None,
                             text_color_threshold=0.408,
                             vmin=None,
                             vmax=None)
print(df)
Calling .style.* doesn't convert anything.
So the print(df) at the end makes your call useless: the styled result gets evaluated and then thrown away.
If you want to "convert" your DataFrame (you can't, actually), assign the result to a new variable:
df_styled = df.style.background_gradient(...)
But note that
df is a DataFrame,
df_styled is a Styler object that renders as HTML, not a DataFrame...
They are really different things.
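To actually see the gradient outside a notebook, you can render the Styler to an HTML file. A minimal sketch, assuming pandas 1.3+ (where Styler.to_html exists) and that only the numeric columns should be styled:

import pandas as pd

df = pd.DataFrame(data)                       # data as fetched above
numeric = df.select_dtypes('number').columns  # a gradient only makes sense on numbers
df_styled = df.style.background_gradient(cmap='Blues', subset=numeric)

# Write the rendered HTML to disk and open it in a browser to see the colors.
with open('styled.html', 'w') as f:
    f.write(df_styled.to_html())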
I am new to scraping and Python. I am trying to scrape multiple tables from this URL: https://en.wikipedia.org/wiki/List_of_Game_of_Thrones_episodes. I did the scraping, and now I am trying to save the dataframes to a csv file. I tried, but it just stores the first table from the page.
code:
from pandas.io.html import read_html
page = 'https://en.wikipedia.org/wiki/List_of_Game_of_Thrones_episodes'
wikitables = read_html(page, index_col=0, attrs={"class":"wikitable plainrowheaders wikiepisodetable"})
print ("Extracted {num} wikitables".format(num=len(wikitables)))
for line in range(7):
    df = pd.DataFrame(wikitables[line].head())
    df.to_csv('file1.csv')
You need to reshape the list of dataframes into a single dataframe, and then you can export it to a csv file.
wikitable = wikitables[0]
for i in range(1, len(wikitables)):
    wikitable = wikitable.append(wikitables[i], sort=True)
wikitable.to_csv('wikitable.csv')
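Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; pd.concat does the same job in a single call. A sketch under that assumption:

import pandas as pd

# Concatenate all scraped tables into one DataFrame, then export once.
wikitable = pd.concat(wikitables, sort=True)
wikitable.to_csv('wikitable.csv')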
You forgot
import pandas as pd
but you don't need it, because read_html gives a list of dataframes and you don't have to convert them to dataframes. You can write them directly.
from pandas.io.html import read_html
url = 'https://en.wikipedia.org/wiki/List_of_Game_of_Thrones_episodes'
wikitables = read_html(url, index_col=0, attrs={"class":"wikitable plainrowheaders wikiepisodetable"})
print("Extracted {num} wikitables".format(num=len(wikitables)))
for i, dataframe in enumerate(wikitables):
    dataframe.to_csv('file{}.csv'.format(i))
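If you would rather end up with one file instead of file0.csv through file6.csv, another option is to append to the same CSV, assuming the episode tables share the same columns; a sketch:

# Write the first table with headers, then append the rest without them.
wikitables[0].to_csv('episodes.csv')
for dataframe in wikitables[1:]:
    dataframe.to_csv('episodes.csv', mode='a', header=False)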
Trying to extract the table from this page "https://www.hkex.com.hk/Market-Data/Statistics/Consolidated-Reports/Monthly-Bulletin?sc_lang=en#select1=0&select2=28". Using the Inspect/Network function of Chrome, the data request link is "https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?_=1574650413485". This link looks like JSON when accessed directly. However, the code using this link does not work.
My code:
import pandas as pd
url="https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?_=1574650413485"
df = pd.read_json(url)
print(df.info(verbose=True))
print(df)
also tried:
url="https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?"
You can try downloading the JSON first and then converting it to a DataFrame:
import pandas as pd
url='https://www.hkex.com.hk/eng/stat/smstat/mthbull/rpt_turnover_short_selling_current_month_1910.json?_=1574650413485'
import urllib.request, json
with urllib.request.urlopen(url) as r:
    data = json.loads(r.read().decode())

df = pd.DataFrame(data['tables'][0]['body'])
columns = [item['text'] for item in data['tables'][0]['header']]
row_count = max(df['row'])
new_df = pd.DataFrame(df.text.values.reshape((row_count, -1)), columns=columns)
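pd.read_json most likely failed because the payload is not a flat table: the rows live under a nested 'tables' key, which is why it has to be unpacked by hand as above. A quick check of the result, reusing the names from the answer (the CSV filename is just an example):

print(new_df.head())                                  # first rows with the proper column names
new_df.to_csv('hkex_short_selling.csv', index=False)  # persist if needed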
I am new to Python and am stuck. I can't figure out how to output only one of the tables given. The output includes the desired table, but three versions of it: the first two are awfully formatted, and the last one is the table I want.
I have tried running a for loop and counting to only print the third table.
import pandas as pd
from bs4 import BeautifulSoup
import requests
url = 'https://www.espn.com/golf/leaderboard'
dfs = pd.read_html(url, header = 0)
for df in dfs:
    print(df[0:])
Just use the index to print the table.
import pandas as pd
url = 'https://www.espn.com/golf/leaderboard'
dfs = pd.read_html(url, header = 0)
print(dfs[2])
OR
print(dfs[-1])
Or, if you want to use a loop, try this:
import pandas as pd
url = 'https://www.espn.com/golf/leaderboard'
dfs = pd.read_html(url, header = 0)
for i, df in enumerate(dfs):
    if i == 2:
        print(df)
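Hardcoding index 2 is brittle if ESPN rearranges the page. A quick way to find the right index is to inspect each table's shape first; a sketch making no assumptions about the layout beyond what the question describes:

import pandas as pd

url = 'https://www.espn.com/golf/leaderboard'
dfs = pd.read_html(url, header=0)
for i, df in enumerate(dfs):
    print(i, df.shape)  # pick the index whose shape matches the leaderboard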