AttributeError: 'DataFrame' object has no attribute 'to_flat_index' - python

I am importing an HTML page with pd.read_html. The data is laid out awkwardly and the table has MultiIndex columns.
I am particularly interested in the 'Photovoltaic' table, which starts at line 10 of the big table.
Code:
net_met_cus = 'https://www.eia.gov/electricity/annual/html/epa_04_10.html'
net_met = pd.read_html(net_met_cus)
print(len(net_met))
net_met_pv = net_met[1]
# Photovoltaic table starts at row 12
print(net_met_pv.loc[12])
Unnamed: 0_level_0 Year Photovoltaic
Capacity (MW) Residential Photovoltaic
Commercial Photovoltaic
Industrial Photovoltaic
Transportation Photovoltaic
Total Photovoltaic
Customers Residential Photovoltaic
Commercial Photovoltaic
Industrial Photovoltaic
Transportation Photovoltaic
Total Photovoltaic
Name: 12, dtype: object
# Is it multiindex
print(net_met_pv.loc[12].index)
MultiIndex([('Unnamed: 0_level_0', 'Year'),
( 'Capacity (MW)', 'Residential'),
( 'Capacity (MW)', 'Commercial'),
( 'Capacity (MW)', 'Industrial'),
( 'Capacity (MW)', 'Transportation'),
( 'Capacity (MW)', 'Total'),
( 'Customers', 'Residential'),
( 'Customers', 'Commercial'),
( 'Customers', 'Industrial'),
( 'Customers', 'Transportation'),
( 'Customers', 'Total')],
)
# Okay, let's flatten it
net_met_pv.to_flat_index()
Present output:
AttributeError: 'DataFrame' object has no attribute 'to_flat_index'

.to_flat_index() is a method of Index and MultiIndex, not of DataFrame, so call it on the index object, for example net_met_pv.columns.to_flat_index() or net_met_pv.loc[12].index.to_flat_index().
Ref: https://pandas.pydata.org/docs/reference/api/pandas.Index.to_flat_index.html?highlight=to_flat_index#pandas.Index.to_flat_index
https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.to_flat_index.html?highlight=to_flat_index#pandas.MultiIndex.to_flat_index
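As a minimal sketch of the point above, the snippet below flattens MultiIndex columns and writes the result back as plain string labels. The small DataFrame is made up for illustration (it only imitates the column layout shown in the question), and joining the levels with a space while dropping the "Unnamed" placeholder is one possible naming choice, not the only one.

```python
import pandas as pd

# Hypothetical stand-in for the table returned by pd.read_html
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Year"),
    ("Capacity (MW)", "Residential"),
    ("Customers", "Residential"),
])
df = pd.DataFrame([[2020, 1.5, 10]], columns=columns)

# to_flat_index() lives on the column index, not on the DataFrame itself
flat = df.columns.to_flat_index()

# Join the levels into one label, skipping the "Unnamed" placeholder level
df.columns = [
    " ".join(level for level in col if not level.startswith("Unnamed"))
    for col in flat
]
print(list(df.columns))
```

After this step the columns are ordinary strings, so a column can be selected with a single label such as df["Capacity (MW) Residential"].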

Related

how to retrieve sector and industry for a list of tickers with python?

I have a list of tickers (tick1 below) that comes from the Earnings Report.
I would like to add the "shortname", "sector" and "industry" next to each ticker while creating a dataframe.
Unfortunately, the columns always get shuffled and the values are not matched to the right ticker. For instance, VFC comes out as sector: Technology; industry: Semiconductors, which is wrong. It should be sector: Consumer Cyclical; industry: Apparel Manufacturing.
Here is my code. Can you please help me adjust it?
---tickers to be read---
import yfinance as yf
with open("/Users/Doc/AB/Earnings/tickers.txt") as fh:
    tick1 = fh.read().split()
---tickers in txt file---
ABOS
ACRX
ADI
ADMP
ADOCY
AER
AGYS
AINV
ALBO
ALLT
AMAT
AMPS
AOZOY
ARCO
AREC
ARZGY
ATAI
AUTO
AVAL
AXDX
BAH
BBAR
BBWI
BHIL
BJ
BKYI
BLBX
BPCGY
BPTH
BRDS
BZFD
CAAP
CAE
CALT
CCHWF
CCSI
CELC
CFRHF
CGEN
CINT
CLSN
CMRX
CRLBF
CRXT
CSCO
CSWI
CVSI
CWBHF
CWBR
DAC
DADA
DE
DECK
DESP
DLO
DOYU
DTST
DUOT
EAST
EBR
EBR.B
EDAP
ENJY
EVTV
EXP
FATH
FL
FLO
FSI
FTK
FUV
FXLV
GAN
GBOX
GDS
GLBE
GLOB
GNLN
GOED
GOGL
GRAB
GRAMF
GRCL
HD
HOOK
HPK
HUYA
HWKN
HYRE
IBEX
IGIC
IKT
IMPL
INLB
INLX
INVO
IONM
IONQ
IPW
IPWR
ISUN
ITCTY
JBI
JD
JHX
JMIA
KALA
KBNT
KEYS
KMDA
KORE
KSLLF
KSS
KULR
LOW
LTRY
LUNA
LVLU
MARK
MBT
MCG
MCLD
MDWD
MDWT
MIGI
MIRO
MNDY
MNMD
MNRO
MSADY
MSGM
MUFG
MVST
NEXCF
NGS
NNOX
NOVN
NRDY
NRGV
NU
NXGN
OBSV
OEG
OMQS
ONON
PANW
PASG
PCYG
PEAR
PLNHF
PLX
PTE
PTN
PXS
QIPT
QRHC
QTEK
QUIK
RCRT
RDY
REE
REED
REKR
RKLB
RMED
RMTI
ROST
RSKD
RYAAY
SANW
SCVL
SDIG
SE
SHLS
SHPW
SHWZ
SLGG
SNPS
SPRO
SQM
SRAD
SSYS
SUNL
SUNW
SUPV
SYN
SYRS
TCEHY
TCRT
TCS
TGI
TGT
THBRF
TJX
TKOMY
TLLTF
TME
TRMR
TSEM
TSHA
TTWO
TXMD
USWS
VBLT
VERB
VEV
VFC
VIPS
VJET
VOXX
VTRU
VVOS
VWE
VYGVF
VYNT
WEBR
WEDXF
WEJO
WIX
WMS
WMT
WRBY
WYY
YALA
YOU
ZIM
---adding the shortname, sector, industry ---
from yahooquery import Ticker
import pandas as pd
symbols = tick1
tickers = Ticker(symbols, asynchronous=True)
datasi = tickers.get_modules("summaryProfile quoteType")
dfsi = pd.DataFrame.from_dict(datasi).T
dataframes = [pd.json_normalize([x for x in dfsi[module] if isinstance(x, dict)]) for
module in ['summaryProfile', 'quoteType']]
dfsi = pd.concat(dataframes, axis=1)
dfsi
import pandas as pd
from yahooquery import Ticker
symbols = ['TSHA', 'GRAMF', 'VFC', 'ABOS', 'INLX', 'INVO', 'IONM', 'IONQ']
tickers = Ticker(symbols, asynchronous=True)
datasi = tickers.get_modules("summaryProfile quoteType")
dfsi = pd.DataFrame.from_dict(datasi).T
dataframes = [pd.json_normalize([x for x in dfsi[module] if isinstance(x, dict)]) for
module in ['summaryProfile', 'quoteType']]
dfsi = pd.concat(dataframes, axis=1)
dfsi = dfsi.set_index('symbol')
dfsi = dfsi.loc[symbols]
print(dfsi[['industry', 'sector']])
Output
industry sector
symbol
TSHA Biotechnology Healthcare
GRAMF Drug Manufacturers—Specialty & Generic Healthcare
VFC Apparel Manufacturing Consumer Cyclical
ABOS Biotechnology Healthcare
INLX Software—Application Technology
INVO Medical Devices Healthcare
IONM Medical Care Facilities Healthcare
IONQ Computer Hardware Technology
Try the code above. It sets the 'symbol' column as the index and then selects the rows with the ticker list via .loc, so each row is labelled by its own ticker. Please verify the result on your side.
I ran it for 'VFC' several times and got industry: Apparel Manufacturing, sector: Consumer Cyclical.
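One possible cause of the mismatch is that the list comprehension skips tickers whose module is not a dict, so the two normalized frames can have different row counts, and pd.concat(axis=1) then pairs rows by position rather than by ticker. A minimal sketch of a per-symbol alternative follows. It assumes get_modules returns a dict keyed by symbol whose values are either a dict of modules or an error string; the sample dict and the helper name profile_table are hypothetical, made up so the snippet runs without network access.

```python
import pandas as pd

def profile_table(modules_by_symbol, fields=("shortName", "sector", "industry")):
    """Build one row per symbol so values can never shift between tickers."""
    symbols, records = [], []
    for symbol, modules in modules_by_symbol.items():
        merged = {}
        if isinstance(modules, dict):  # an error string is skipped here
            for module in ("summaryProfile", "quoteType"):
                part = modules.get(module)
                if isinstance(part, dict):
                    merged.update(part)
        symbols.append(symbol)
        records.append({field: merged.get(field) for field in fields})
    return pd.DataFrame(records, index=symbols, columns=list(fields))

# Hypothetical sample in the assumed shape of the get_modules result
sample = {
    "VFC": {
        "summaryProfile": {"sector": "Consumer Cyclical", "industry": "Apparel Manufacturing"},
        "quoteType": {"symbol": "VFC", "shortName": "V.F. Corporation"},
    },
    "BAD": "No fundamentals data found",
    "IONQ": {
        "summaryProfile": {"sector": "Technology", "industry": "Computer Hardware"},
        "quoteType": {"symbol": "IONQ", "shortName": "IonQ, Inc."},
    },
}
table = profile_table(sample)
print(table)
```

Because each row is built from a single ticker's own data, a ticker with missing modules simply gets empty cells instead of pushing the following rows out of alignment.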

ValueError: Unknown format code 'f' for object of type 'str' create table

I keep getting the same error message from my code:
Associate Business Budget ($) Setup Fee ($) Mgmt Fee ($) Annual Revenue ($) Commission ($) .
Traceback (most recent call last):
File "/Users/Group10Asg03.py", line 50, in <module>
printResults(salesassociateList)
File "/Users/Group10Asg03.py", line 30, in <module>
print("{0:16}{1:20}{2:<20.2f}{3:<20.2f}{4:<20.2f}{5:<20.2f}{6:<20.2f}.".format(row[0], row[1], float(row[2]), float(row[3]), float(row[4]), float(row[5]), float(row[6])))
builtins.ValueError: Unknown format code 'f' for object of type 'str'
I have tried multiple ways to fix it, but nothing works. I need the code to print a table.
def printResults(salesassociateList):
    """ This function displays the results for each data in the salesassociateList, it displays all seven variables"""
    total_f = total_tar = total_c = 0
    print("*********\n\nThe Results of Total Revenue and Commission Calculations\n\n*********")
    print("{0:16}{1:20}{2:20}{3:20}{4:20}{5:20}{6:20}.".format("Associate", "Business", "Budget ($)", "Setup Fee ($)", "Mgmt Fee ($)", "Annual Revenue ($)", "Commission ($)"))
    for row in salesassociateList:
        print("{0:16}{1:20}{2:<20.2f}{3:<20.2f}{4:<20.2f}{5:<20.2f}{6:<20.2f}.".format(row[0], row[1], float(row[2]), float(row[3]), float(row[4]), float(row[5]), float(row[6])))
        total_f += row[4]
        total_tar += row[5]
        total_c += row[6]
    print("\nThe total monthly management fee is ${4:.2f}. The total annual revenue from all projects is ${5:.2f} and the total commission to all sales associates is ${6:.2f}.".format(total_f, total_tar, total_c))
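The 'f' format code only accepts numbers, so this error appears whenever a value that is still a string reaches a "{:.2f}" field. A minimal sketch of one way around it is shown below, under the assumption that every row holds two text fields followed by five numeric fields stored as strings. The sample row and the helper name format_row are hypothetical. Two further points are worth checking in the function above: the totals add row[4], row[5] and row[6] without converting them, and the final summary passes three arguments but refers to positions 4, 5 and 6.

```python
def format_row(row):
    """Format one record, converting the numeric fields with float() first."""
    name, business, *amounts = row
    numbers = [float(amount) for amount in amounts]
    template = "{0:16}{1:20}{2:<20.2f}{3:<20.2f}{4:<20.2f}{5:<20.2f}{6:<20.2f}."
    return template.format(name, business, *numbers)

# Hypothetical row: numbers arrive as strings, e.g. when read from a text file
row = ["Ann", "Bakery", "1000", "50", "20.5", "240", "12"]
line = format_row(row)
print(line)

# Reproducing the error: a string passed to an 'f' field
try:
    "{0:<20.2f}".format("1000")
except ValueError as exc:
    error_text = str(exc)
    print(error_text)

# Totals must be summed as numbers, and three arguments are positions 0, 1, 2
total_f = float(row[4])
total_tar = float(row[5])
total_c = float(row[6])
summary = "Fees ${0:.2f}, revenue ${1:.2f}, commission ${2:.2f}.".format(total_f, total_tar, total_c)
print(summary)
```

Converting once, close to where the data is read, keeps both the row formatting and the running totals working on numbers.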

Plotly Choropleth Map Not Showing Up

I'm trying to display a Plotly Choropleth Map in Jupyter Notebooks (I'm a beginner with this type of stuff) and for some reason it won't display correctly.
The csv file I am using for it can be found here:
https://www.kaggle.com/ajaypalsinghlo/world-happiness-report-2021
Here is the code leading up to the choropleth:
# here we're assigning the hover data columns to use for our choropleth map below
hover_data_cols_df = ['Country', 'Life Ladder', 'Log GDP per capita', 'Social support', 'Healthy life expectancy at birth', 'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']
df.groupby('Year').Country.count()
and here is the code for the actual choropleth:
choropleth_map = px.choropleth(df,
locations="Country",
color='Life Ladder',
hover_name = 'Life Ladder',
hover_data = hover_data_cols_df,
color_continuous_scale = px.colors.sequential.Oranges,
animation_frame="Year"
).update_layout (title_text = 'World Happiness Index - year wise data', title_x = 0.5,);
iplot(choropleth_map)
I'm not getting any error messages attached to it currently, however when I check my console log on my browser, I do find this error:
Wolrd-Happiness-Report.ipynb:1 Uncaught ReferenceError: require is not defined
at <anonymous>:1:17
at t.attachWidget (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at t.insertWidget (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at x._insertOutput (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at x.onModelChanged (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at m (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at Object.l [as emit] (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at e.emit (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at c._onListChanged (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at m (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
I'm not too sure if this is related or not!
Thanks all!
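Plotly needs to know how to match each value in the locations column to a country on the map. Because the column holds country names rather than ISO codes, set locationmode='country names'. The code below also uses the column names 'Country name' and 'year'.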
import pandas as pd
df = pd.read_csv('./data/world-happiness-report.csv', sep=',')
df.sort_values('year', ascending=True, inplace=True)
hover_data_cols_df = ['Country name', 'year', 'Life Ladder', 'Log GDP per capita', 'Social support', 'Healthy life expectancy at birth', 'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']
import plotly.express as px
fig = px.choropleth(df,
locations="Country name",
locationmode='country names',
color='Life Ladder',
hover_name = 'Life Ladder',
hover_data = hover_data_cols_df,
color_continuous_scale = px.colors.sequential.Oranges,
animation_frame="year"
)
fig.update_layout(title_text='World Happiness Index - year wise data', title_x=0.5)
fig.show()

A list of tickers to get sector and name

import pandas as pd
import datetime as dt
from pandas_datareader import data as web
import yfinance as yf
yf.pdr_override()
filename=r'C:\Users\User\Desktop\from_python\data_from_python.xlsx'
yeah = pd.read_excel(filename, sheet_name='entry')
stock = []
stock = list(yeah['name'])
stock = [ s.replace('\xa0', '') for s in stock if not pd.isna(s) ]
adj_close=pd.DataFrame([])
high_price=pd.DataFrame([])
low_price=pd.DataFrame([])
volume=pd.DataFrame([])
print(stock)
['^GSPC', 'NQ=F', 'AAU', 'ALB', 'AOS', 'APPS', 'AQB', 'ASPN', 'ATHM', 'AZRE', 'BCYC', 'BGNE', 'CAT', 'CC', 'CLAR', 'CLCT', 'CMBM', 'CMT', 'CRDF', 'CYD', 'DE', 'DKNG', 'EARN', 'EMN', 'FBIO', 'FBRX', 'FCX', 'FLXS', 'FMC', 'FMCI', 'GME', 'GRVY', 'HAIN', 'HBM', 'HIBB', 'IEX', 'IOR', 'KFS', 'MAXR', 'MPX', 'MRTX', 'NSTG', 'NVCR', 'NVO', 'OESX', 'PENN', 'PLL', 'PRTK', 'RDY', 'REGI', 'REKR', 'SBE', 'SQM', 'TCON', 'TCS', 'TGB', 'TPTX', 'TRIL', 'UEC', 'VCEL', 'VOXX', 'WIT', 'WKHS', 'XNCR']
for symbol in stock:
    adj_close[symbol] = web.get_data_yahoo([symbol],start,end)['Adj Close']
I have a list of tickers and have already got the adjusted close prices. How can I get the NAME and SECTOR of these tickers?
For a single ticker, I found on the web that it can be done as below:
sbux = yf.Ticker("SBUX")
tlry = yf.Ticker("TLRY")
print(sbux.info['sector'])
print(tlry.info['sector'])
How can I turn this into a dataframe that I can write to Excel, as I am doing for the adjusted close price?
Thanks a lot!
You can try this answer using a package called yahooquery. Disclaimer: I am the author of the package.
from yahooquery import Ticker
import pandas as pd
symbols = ['^GSPC', 'NQ=F', 'AAU', 'ALB', 'AOS', 'APPS', 'AQB', 'ASPN', 'ATHM', 'AZRE', 'BCYC', 'BGNE', 'CAT', 'CC', 'CLAR', 'CLCT', 'CMBM', 'CMT', 'CRDF', 'CYD', 'DE', 'DKNG', 'EARN', 'EMN', 'FBIO', 'FBRX', 'FCX', 'FLXS', 'FMC', 'FMCI', 'GME', 'GRVY', 'HAIN', 'HBM', 'HIBB', 'IEX', 'IOR', 'KFS', 'MAXR', 'MPX', 'MRTX', 'NSTG', 'NVCR', 'NVO', 'OESX', 'PENN', 'PLL', 'PRTK', 'RDY', 'REGI', 'REKR', 'SBE', 'SQM', 'TCON', 'TCS', 'TGB', 'TPTX', 'TRIL', 'UEC', 'VCEL', 'VOXX', 'WIT', 'WKHS', 'XNCR']
# Create Ticker instance, passing symbols as first argument
# Optional asynchronous argument allows for asynchronous requests
tickers = Ticker(symbols, asynchronous=True)
data = tickers.get_modules("summaryProfile quoteType")
df = pd.DataFrame.from_dict(data).T
# flatten dicts within each column, creating new dataframes
dataframes = [pd.json_normalize([x for x in df[module] if isinstance(x, dict)]) for module in ['summaryProfile', 'quoteType']]
# concat dataframes from previous step
df = pd.concat(dataframes, axis=1)
# View columns
df.columns
Index(['address1', 'address2', 'city', 'state', 'zip', 'country', 'phone',
'fax', 'website', 'industry', 'sector', 'longBusinessSummary',
'fullTimeEmployees', 'companyOfficers', 'maxAge', 'exchange',
'quoteType', 'symbol', 'underlyingSymbol', 'shortName', 'longName',
'firstTradeDateEpochUtc', 'timeZoneFullName', 'timeZoneShortName',
'uuid', 'messageBoardId', 'gmtOffSetMilliseconds', 'maxAge'],
dtype='object')
# Data you're looking for
df[['symbol', 'shortName', 'sector']].head(10)
symbol shortName sector
0 NQZ20.CME Nasdaq 100 Dec 20 NaN
1 ALB Albemarle Corporation Basic Materials
2 AOS A.O. Smith Corporation Industrials
3 ASPN Aspen Aerogels, Inc. Industrials
4 AAU Almaden Minerals, Ltd. Basic Materials
5 ^GSPC S&P 500 NaN
6 ATHM Autohome Inc. Communication Services
7 AQB AquaBounty Technologies, Inc. Consumer Defensive
8 APPS Digital Turbine, Inc. Technology
9 BCYC Bicycle Therapeutics plc Healthcare
This processes prices and sectors at the same time. Some symbols have no sector, so a fallback is added for that case.
Each column name is made up of the sector and the issue name, so the columns are converted to a hierarchical (MultiIndex) column index once the data has been retrieved. Finally, the result is saved in CSV format so it can be imported into Excel. I have only tried a few of the symbols because the list is long, so there may be some issues.
import datetime
import pandas as pd
import yfinance as yf
import pandas_datareader.data as web
yf.pdr_override()
start = "2018-01-01"
end = "2019-01-01"
# symbol = ['^GSPC', 'NQ=F', 'AAU', 'ALB', 'AOS', 'APPS', 'AQB', 'ASPN', 'ATHM', 'AZRE', 'BCYC', 'BGNE', 'CAT',
#'CC', 'CLAR', 'CLCT', 'CMBM', 'CMT', 'CRDF', 'CYD', 'DE', 'DKNG', 'EARN', 'EMN', 'FBIO', 'FBRX', 'FCX', 'FLXS',
#'FMC', 'FMCI', 'GME', 'GRVY', 'HAIN', 'HBM', 'HIBB', 'IEX', 'IOR', 'KFS', 'MAXR', 'MPX', 'MRTX', 'NSTG', 'NVCR',
#'NVO', 'OESX', 'PENN', 'PLL', 'PRTK', 'RDY', 'REGI', 'REKR', 'SBE', 'SQM', 'TCON', 'TCS', 'TGB', 'TPTX', 'TRIL',
#'UEC', 'VCEL', 'VOXX', 'WIT', 'WKHS', 'XNCR']
stock = ['^GSPC', 'NQ=F', 'AAU', 'ALB', 'AOS', 'APPS']
adj_close = pd.DataFrame([])
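for symbol in stock:
    # Indexes and futures have no sector, so fall back to 'None'
    try:
        info = yf.Ticker(symbol).info
        sector = info['sector']
        name = info['shortName']
    except Exception:
        sector = 'None'
        name = 'None'
    # Second column level is "<symbol>_<name>", matching the output shown below
    adj_close[sector, symbol + '_' + name] = web.get_data_yahoo(symbol, start=start, end=end)['Adj Close']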
idx = pd.MultiIndex.from_tuples(adj_close.columns)
adj_close.columns = idx
adj_close.head()
None Basic Materials Industrials Technology
^GSPC_None NQ=F_None AAU_None ALB_Albemarle Corporation AOS_A.O. Smith Corporation APPS_Digital Turbine, Inc.
2018-01-02 2695.810059 6514.75 1.03 125.321663 58.657742 1.79
2018-01-03 2713.060059 6584.50 1.00 125.569397 59.010468 1.87
2018-01-04 2723.989990 6603.50 0.98 124.073502 59.286930 1.86
2018-01-05 2743.149902 6667.75 1.00 125.502716 60.049587 1.96
2018-01-08 2747.709961 6688.00 0.95 130.962250 60.335583 1.96
# for excel
adj_close.to_csv('stock.csv', sep=',')

how to merge two or more list in custom order in python

I have the following code:
import pandas as pd
y = pd.ExcelFile('C:\\Users\\vibhu\\Desktop\\Training docs\\excel training\\super store data transformation\\Sample - Superstore data transformation by Vaibhav.xlsx')
superstore_orders = y.parse(sheet_name='Orders Input data')
superstore_orders.dtypes
factual_table= superstore_orders[['Order ID','Customer ID','Postal Code','Product ID','Product Name','Sales','Quantity','Discount','Profit' ]]
Order_table= superstore_orders[['Order ID','Order Date','Ship Date','Ship Mode']]
Order_table1= Order_table.drop_duplicates(subset='Order ID', keep='first', inplace=False)
Customer_table= superstore_orders[['Customer ID','Customer Name','Segment']]
Customer_table1= Customer_table.drop_duplicates(subset='Customer ID', keep='first', inplace=False)
Geographical_table= superstore_orders[['Postal Code','Country','City','State','Region']]
Geographical_table1= Geographical_table.drop_duplicates(subset='Postal Code', keep='first', inplace=False)
Product_table= superstore_orders[['Product ID','Category','Sub-Category','Product Name']]
Product_table1= Product_table.drop_duplicates(subset=['Product ID','Product Name'], keep='first', inplace=False)
Final_factual_data = pd.merge(Order_table1, factual_table, how='left', on='Order ID')
Final_factual_data = pd.merge(Customer_table1, Final_factual_data, how='left', on='Customer ID')
Final_factual_data = pd.merge(Geographical_table1,Final_factual_data,how='left', on='Postal Code')
Final_factual_data = pd.merge(Product_table1,Final_factual_data,how='left', on=['Product ID','Product Name'] )
The output columns come in this order: Product ID, Category, Sub-Category, Product Name, Postal Code, Country, City, State, Region, Customer ID, Customer Name, Segment, Order ID, Order Date, Ship Date, Ship Mode, Sales, Quantity, Discount, Profit
I need them in this order:
Order ID, Order Date, Ship Date, Ship Mode, Customer ID, Customer Name, Segment, Postal Code, Country, City, State, Region, Product ID, Product Name, product key, Category, Sub-Category, Sales, Quantity, Discount, Profit
Final_factual_data1 = Final_factual_data [['Order ID','Order Date','Ship Date','Ship Mode','Customer ID','Customer Name','Segment','Country','City','State','Postal Code','Region','Product ID','Category','Sub-Category','Product Name','Sales','Quantity','Discount','Profit']]
This code gives me the desired result.
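Note that assigning a list to the columns attribute only renames the columns in their current positions; it does not move the data, so the values would end up under the wrong labels. To reorder, select the columns by their existing names in the desired order, as in the line above, or equivalently use reindex:
Final_factual_data = Final_factual_data.reindex(columns=['Order ID', 'Order Date', 'Ship Date', 'Ship Mode', 'Customer ID', 'Customer Name', 'Segment', 'Country', 'City', 'State', 'Postal Code', 'Region', 'Product ID', 'Category', 'Sub-Category', 'Product Name', 'Sales', 'Quantity', 'Discount', 'Profit'])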
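The difference between reordering and relabelling can be seen on a tiny example. This is a minimal sketch with a made-up three-column frame; only the column names are borrowed from the question.

```python
import pandas as pd

# Hypothetical miniature of the merged table
df = pd.DataFrame({"Product ID": ["P1"], "Order ID": ["O1"], "Sales": [9.5]})
wanted = ["Order ID", "Product ID", "Sales"]

# Selecting with a list of names moves each column together with its data
reordered = df[wanted]

# Assigning to .columns only swaps the labels; the data stays where it was
relabeled = df.copy()
relabeled.columns = wanted

print(reordered)
print(relabeled)
```

In reordered the "Order ID" column still holds the order value, while in relabeled the "Order ID" label now sits on top of the product value.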
