How to add a DataFrame to MultiIndex DataFrame columns? - python

import pandas as pd
import yfinance as yf
import numpy as np
tickers = ['BRIGADE.NS', 'DLF.NS', 'GODREJPROP.NS', 'OBEROIRLTY.NS', 'PRESTIGE.NS']
Tickers_Data = yf.download(tickers, period='5y')
display(Tickers_Data)
Ret_Log = np.log(Tickers_Data['Adj Close'] / Tickers_Data['Adj Close'].shift(1))
Ret_Cumulative = Ret_Log.cumsum().apply(np.exp)
Ret_Absolute = Tickers_Data['Adj Close'].pct_change()
MA50 = Tickers_Data['Open'].rolling(50).mean()
MA200 = Tickers_Data['Open'].rolling(200).mean()
display(Tickers_Data.columns)
I want to add Ret_Log, Ret_Cumulative, Ret_Absolute, MA50, and MA200 at the end of my MultiIndex DataFrame Tickers_Data. If I run the for loop below (added at the end of the code) I get the expected output, but I need to achieve this without a for loop.
Any help is highly appreciated.
Thanks in advance.
The code below gives the expected output, but I need to achieve this without a for loop:
for i in range(len(tickers)):
    Tickers_Data['MA50', tickers[i]] = Tickers_Data['Open'][tickers[i]].rolling(50).mean()
for i in range(len(tickers)):
    Tickers_Data['MA200', tickers[i]] = Tickers_Data['Open'][tickers[i]].rolling(200).mean()
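One way to do this without a loop (a sketch, using a small random stand-in for the yfinance frame, since the real download needs network access): compute each indicator on the whole wide 'Open'/'Adj Close' block at once, then attach all of them with pd.concat, whose keys argument creates the new top-level column labels in one shot:

```python
import numpy as np
import pandas as pd

# Small stand-in for the yfinance download: MultiIndex columns
# with ('Adj Close', ticker) and ('Open', ticker) pairs.
tickers = ['AAA', 'BBB']
idx = pd.date_range('2020-01-01', periods=300)
rng = np.random.default_rng(0)
data = {(field, t): rng.random(300) + 100
        for field in ['Adj Close', 'Open'] for t in tickers}
tickers_data = pd.DataFrame(data, index=idx)

# Compute every indicator on the whole wide block at once -- no loop.
ret_log = np.log(tickers_data['Adj Close'] / tickers_data['Adj Close'].shift(1))
ma50 = tickers_data['Open'].rolling(50).mean()
ma200 = tickers_data['Open'].rolling(200).mean()

# concat with keys adds the new top-level column labels in one step.
extra = pd.concat([ret_log, ma50, ma200], axis=1,
                  keys=['Ret_Log', 'MA50', 'MA200'])
tickers_data = pd.concat([tickers_data, extra], axis=1)
print(tickers_data.columns.get_level_values(0).unique().tolist())
```

The same pattern extends to Ret_Cumulative and Ret_Absolute: add them to the list and give each a key.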

Related

Python Panda drop() function not working after dictionary conversion; using Yfinance API

Using the yfinance API I pulled data from the option chain object and converted it to a dictionary. I tried to delete all rows that contain True in the column labeled "inTheMoney", but when I run the program it does not do so.
import yfinance as yf
import pandas as pd
price = 100
ticker = yf.Ticker("SPY")
opt = ticker.option_chain('2022-11-18')
df = pd.DataFrame(opt.puts)
#df = df.drop(df[(df['inTheMoney'] != 'True')].index)
df = df.drop(['contractSymbol', 'lastTradeDate', 'change', 'percentChange', 'volume', 'openInterest', 'impliedVolatility', 'contractSize', 'currency'], axis = 1)
print(df)
I also tried to use a for loop and loc but that did not work either.
for index in range(len(df)):
    # print(df.loc[index, 'strike'])
    if df.loc[index, 'strike'] < 100:
        print(df.loc[index])
Any help is greatly appreciated
just:
df = df.drop(df[(df['inTheMoney'] != True)].index) #do not use quotes
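Equivalently, since inTheMoney is a real boolean column, you can keep the rows you want instead of dropping the inverse (a sketch with a made-up frame, as the real option chain needs network access):

```python
import pandas as pd

# Hypothetical option-chain frame; inTheMoney is a real boolean, not the string 'True'.
df = pd.DataFrame({'strike': [90, 100, 110],
                   'inTheMoney': [True, True, False]})

# Boolean mask: keep only in-the-money rows.
itm_only = df[df['inTheMoney']]
print(itm_only)
```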

Delete rows < x to create plot in Pandas

I have a dataframe like this in a .csv:
Consequence,N_samples
A,227
B,413
C,194
D,1
E,1610
F,10
G,7
H,1
I,1
J,5
K,1
L,5
M,5
N,30
O,7
P,3
And I want to make a pie chart out of it, grouping all values lower than 150 into an "Other" category. I've tried running this code but it's not working.
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plot

other = {'Consequence': 'Other', 'N_samples': 0}
df=pd.read_csv('df.csv', sep=',')
df = df.append(other,ignore_index=True)
for i in df:
    if (x in df['N_samples']) < 150:
        df['N_samples'].iloc[-1] = df['N_samples'].iloc[-1] + (x in df['N_samples'])
        df.drop([x])
df.plot.pie(label="", title="Consequence", startangle=90);
plot.savefig('Consequence.svg')
Once I run it I get the following error:
KeyError: "['Consequence'] not found in axis"
I would really appreciate any help.
You are making it more difficult than it is.
First get all the rows where the sample size is below 150:
small_sizes = df[df['N_samples'] < 150]
Then sum up their values:
other_samples = small_sizes['N_samples'].sum()
Finally drop those rows and add the "Other" row (note the parentheses: ~ binds tighter than <, and the column is N_samples, not N_Samples):
df = df[~(df['N_samples'] < 150)]
df.loc['other', 'N_samples'] = other_samples
That should do the trick.
you can do this as follows:
import pandas as pd
from matplotlib import pyplot as plt
df = pd.read_csv('df.csv')
Collect the rows < 150 into a new df:
df_other=pd.DataFrame([{'Consequence':'Other','N_samples':df[df.N_samples<150].N_samples.sum()}])
Add that to the rows >= 150 and plot:
df2=df[df.N_samples>=150]
df3=pd.concat([df2,df_other],axis=0)
df3.plot.pie(y='N_samples',labels=df3['Consequence'])
plt.show()
If you find yourself iterating through a DataFrame, be aware there's often a built-in way to do whatever you're trying to do.
Define your filtering condition:
cond = df.N_samples < 150
Sum values from filtering condition:
other_sum = df.N_samples[cond].sum()
Filter by the opposite of the condition and add the 'other' row at the bottom (DataFrame.append was removed in pandas 2.0, so pd.concat is the durable spelling):
df = pd.concat([df.loc[~cond], pd.DataFrame([{'Consequence': 'other', 'N_samples': other_sum}])], ignore_index=True)
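Another loop-free variant (a sketch using a few of the sample counts from the question): relabel the small categories with Series.where and aggregate with groupby, so nothing has to be dropped or appended at all:

```python
import pandas as pd

df = pd.DataFrame({'Consequence': list('ABCDE'),
                   'N_samples': [227, 413, 194, 1, 1610]})

# Relabel categories under 150 as 'Other', then sum per label.
grouped = (df.assign(Consequence=df['Consequence']
                     .where(df['N_samples'] >= 150, 'Other'))
             .groupby('Consequence', sort=False)['N_samples'].sum())
print(grouped)
```

The result is ready to hand straight to grouped.plot.pie().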

Count occurrences of number from specific column in python

I am trying to do the equivalent of Excel's COUNTIF() function. I am stuck on how to tell the .count() function to read from a specific column.
I have
df = pd.read_csv('testdata.csv')
df.count('1')
but this does not work, and even if it did it is not specific enough.
I am thinking I may have to use read_csv to read specific columns individually.
Example:
Column name
4
4
3
2
4
1
the function would output that there is one '1' and I could run it again and find out that there are three '4' answers. etc.
I got it to work! Thank you.
I used:
print(df.col.value_counts().loc['x'])
Here is an example of a simple 'countif' recipe you could try:
import pandas as pd
def countif(rng, criteria):
    return rng.eq(criteria).sum()
Example use:
df = pd.DataFrame({'column1': [4, 4, 3, 2, 4, 1],
                   'column2': [1, 2, 3, 4, 5, 6]})
countif(df['column1'], 1)
If all else fails, why not try something like this?
import numpy as np
import pandas
import matplotlib.pyplot as plt
df = pandas.DataFrame(data=np.random.randint(0, 100, size=100), columns=["col1"])
counters = {}
for i in range(len(df)):
    if df.iloc[i]["col1"] in counters:
        counters[df.iloc[i]["col1"]] += 1
    else:
        counters[df.iloc[i]["col1"]] = 1
print(counters)
plt.bar(counters.keys(), counters.values())
plt.show()
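The hand-rolled counter above is what Series.value_counts does in one call (a sketch using the sample column from the question):

```python
import pandas as pd

s = pd.Series([4, 4, 3, 2, 4, 1])

# value_counts returns a Series mapping each value to its frequency.
counts = s.value_counts()
print(counts[4])  # count of 4s
print(counts[1])  # count of 1s
```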

CSV Storing data incorrectly or is it just me?

So pretty much I have a script that takes one set of a time series, drops the time and some other information from the second time series, and then appends it to the outer end of a CSV file. The problem I am having is that it constantly stores three blank ,,,,,,, lines at the end of my file, but updates the lines as the script goes on. The code is this:
import pandas as pd
import time
def compiler():
    for i in range(1000):
        # Read file
        df = pd.read_csv(r'C:/Users/J/Desktop/dropmarketdata/xz.csv')
        # Remove useless info
        df.pop('cached')
        df.pop('id')
        df.pop('name')
        df.pop('last_updated')
        df.pop('max_supply')
        # Read 2nd file
        ohlc = pd.read_csv(r'C:/Users/J/Desktop/dropmarketdata/ohlc/ohlc.csv')
        main_df = pd.DataFrame()
        # Drop datetime because I'm already indexing by it on the other file
        del ohlc['datetime']
        # Join to the outside (end) of each line where both files have
        # the same number of lines
        main_df = df.join(ohlc, how='outer')
        main_df.set_index('datetime', inplace=True)
        main_df.to_csv(r'C:/Users/J/Desktop/dropmarketdata/ohlcomp.csv',
                       float_format='%.8f')
        print('saving....')
        time.sleep(900)
        print('15m has surpassed....')
compiler()
The problem is my file always looks like this:
2018-04-16 01:57:09.021924,85409.30000000,18473609990.00000000,77146350.00000000,-0.11000000,-1.92000000,-7.11000000,0.00000052,0.00417603,147,DROP,30000000000.00000000,,,,,
2018-04-16 02:12:10.098678,85061.30000000,18473609990.00000000,74266498.00000000,-4.09000000,-5.59000000,-10.38000000,0.00000050,0.00402014,148,DROP,30000000000.00000000,,,,,
2018-04-16 02:27:10.916329,87757.50000000,18473609990.00000000,76921156.00000000,1.22000000,-2.24000000,-6.99000000,0.00000052,0.00416384,147,DROP,30000000000.00000000,,,,,
Each row is indexed by date. Where all the ,,,,, appear at the ends of the rows there are actually supposed to be H, L, O, C data. I'm quite new to Python, so sorry if this sounds like a dumb question. Thanks for the help.
EDIT:
For anyone who needs to stream the data on their own, this code should work:
import pandas as pd
import time
from datetime import datetime
import coinmarketcap
from coinmarketcap import Market
import ccxt
def compiler():
    # Read files
    df = pd.read_csv('other.csv')
    ohlc = pd.read_csv('ohlc.csv')
    # Remove useless info
    df.pop('cached')
    df.pop('id')
    df.pop('name')
    df.pop('last_updated')
    df.pop('max_supply')
    main_df = pd.DataFrame()
    # Drop datetime because I'm already indexing by it on the other file
    del ohlc['datetime']
    # Join to the outside (end) of each line where both files have
    # the same number of lines
    main_df = df.join(ohlc, how='outer')
    main_df.set_index('datetime', inplace=True)
    main_df.to_csv('file.csv', float_format='%.8f')
    print('saving compiled list....')

def collect1():
    # Pulling from Tidex
    tidex = ccxt.tidex()
    tidex.load_markets(True)
    ticker = tidex.fetch_ticker('DROP/BTC')
    ticker_df = pd.DataFrame(ticker, index=['f'], columns=['ask', 'bid', 'close', 'high', 'low', 'datetime'])
    ticker_df['ask'] = '%.8f' % ticker_df['ask']
    ticker_df['bid'] = '%.8f' % ticker_df['bid']
    ticker_df['close'] = '%.8f' % ticker_df['close']
    ticker_df['high'] = '%.8f' % ticker_df['high']
    ticker_df['low'] = '%.8f' % ticker_df['low']
    ticker_df.loc[:, 'datetime'] = pd.Series("{:}".format(datetime.now()), index=ticker_df.index)
    ticker_df.set_index(pd.DatetimeIndex(ticker_df.loc[:, 'datetime']), inplace=True)
    ticker_df.pop('datetime')
    ticker_df.to_csv('ohlc.csv', float_format='%.8f')

def collect2():
    # Pulling information from CoinMarketCap
    market = Market()
    ticker2 = market.ticker("dropil")
    dropArray = pd.DataFrame(ticker2)
    dropArray.loc[:, 'datetime'] = pd.Series("{:}".format(datetime.now()), index=dropArray.index)
    dropArray.reset_index()
    dropArray.set_index(pd.DatetimeIndex(dropArray.loc[:, 'datetime']), inplace=True)
    dropArray.pop('datetime')
    dropArray.to_csv('other.csv', float_format='%.8f')

for i in range(1000):
    collect1()
    collect2()
    compiler()
    time.sleep(900)
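The trailing ,,,,, columns come from an outer join on mismatched indexes: rows present in one frame but not the other get NaN in the missing frame's columns, and to_csv writes NaN as an empty field. A minimal sketch (made-up frames) showing both the symptom and a fix, joining on a shared datetime index instead:

```python
import pandas as pd

idx_a = pd.to_datetime(['2018-04-16 01:57', '2018-04-16 02:12', '2018-04-16 02:27'])
a = pd.DataFrame({'price': [1, 2, 3]}, index=idx_a)
b = pd.DataFrame({'high': [10, 20]}, index=idx_a[:2])  # one row shorter

joined = a.join(b, how='outer')  # the last row gets NaN in 'high'
print(joined)

inner = a.join(b, how='inner')   # keep only timestamps present in both frames
print(inner)
```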

Grouping levels - multiindex in python pandas pivot_table

I have a MultiIndex DataFrame in pandas (created using pivot_table) that looks like this:
I need help to add a level above (or below) the Date level showing the day of the week for each date, like this:
I know I can get the day of a date like this:
lt.DATE.dt.strftime('%a')
# lt is a dataframe and DATE is a column in it.
Here is the code to reproduce a similar pivot_table:
import pandas as pd
import numpy as np
dlist = pd.date_range('2015-01-01',periods=5)
df = pd.DataFrame(dlist, columns=['DATE'])
df['EC'] = range(7033,7033+len(df))
df['HS'] = np.random.randint(0,9,5)
df['AH'] = np.random.randint(0,9,5)
pv = pd.pivot_table(df, columns=[df.DATE, 'EC'], values=['HS','AH'])
pv = pv.unstack(level=1).unstack(level=0)
I got the solution! Here it goes:
import pandas as pd
import numpy as np
dlist = pd.date_range('2015-01-01',periods=5)
df = pd.DataFrame(dlist, columns=['DATE'])
df['EC'] = range(7033,7033+len(df))
df['HS'] = np.random.randint(0,9,5)
df['AH'] = np.random.randint(0,9,5)
df['DAY'] = df.DATE.dt.strftime('%a')
pv = pd.pivot_table(df, columns=[df.DATE.dt.date, df.DAY, 'EC'], values=['HS','AH'])
pv = pv.unstack(level=[1,2]).unstack(level=0)
pv.to_excel('solution.xlsx')
And it produces an output like this:
Pay attention to the unstack function, and pass the list of levels that need to be unstacked at a time.
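The DAY level can also be attached after the fact, without rebuilding the pivot, by rewriting the columns with MultiIndex.from_arrays (a sketch assuming a frame whose columns are plain dates):

```python
import pandas as pd

dates = pd.date_range('2015-01-01', periods=3)
df = pd.DataFrame([[1, 2, 3]], columns=dates)

# Insert a day-name level above the existing date level.
df.columns = pd.MultiIndex.from_arrays(
    [dates.strftime('%a'), dates], names=['DAY', 'DATE'])
print(df.columns.tolist())
```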
