I can't find a way to loop over my data frame (df_yf) and extract all the "Close" prices into a new df_adj. The data frame is grouped by coin (ticker).
Initially I tried something like the following, but it throws an error:
for i in range(len(df_yf.columns)):
    df_adj.append(df_yf[i]["Close"])
I also tried using .get and .filter, but those throw errors too: "list indices must be integers or slices, not str; perhaps you missed a comma?"
EDIT:
Thank you for the answers. They made me realize my mistake :D. I shouldn't group by tickers, so I changed it to group by prices (Low, Close, etc.) and was then able to extract the right columns simply with df_adj = df_yf["Close"], as was mentioned.
df_adj = np.array(df_yf["Close"])
A DataFrame extracts columns with dict-style indexing; .values then gives the underlying ndarray:
df_adj = df_yf["Close"].values
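As a quick check, here is a minimal sketch with made-up prices (assuming single-level columns, i.e. data grouped by price field, as in the question's EDIT):

```python
import pandas as pd
import numpy as np

# Hypothetical stand-in for a price-grouped download: the columns are
# the price fields, so "Close" can be selected directly.
df_yf = pd.DataFrame({
    "Open":  [1.0, 2.0, 3.0],
    "Close": [1.5, 2.5, 3.5],
})

df_adj = df_yf["Close"].values  # .values returns a NumPy ndarray
print(type(df_adj))             # <class 'numpy.ndarray'>
print(df_adj)                   # [1.5 2.5 3.5]
```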
If you group by tickers (so the columns are a MultiIndex of (ticker, field)), you could use:
df_adj = pd.DataFrame()
for ticker in df_yf.columns.get_level_values(0).unique():
    df_adj[ticker] = df_yf[ticker]['Close']
Result:
   Ticker1  Ticker2  Ticker3
0        1        1        1
1        3        3        3
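An alternative to the loop, assuming the same MultiIndex column layout (the ticker names here are made up for illustration), is a cross-section with xs():

```python
import pandas as pd

# Hypothetical two-ticker frame with MultiIndex columns (ticker, field),
# the shape you get when data is grouped by ticker.
cols = pd.MultiIndex.from_product([["BTC", "ETH"], ["Open", "Close"]])
df_yf = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=cols)

# xs() selects the "Close" field across every ticker in one call.
df_adj = df_yf.xs("Close", axis=1, level=1)
print(df_adj)
#    BTC  ETH
# 0    2    4
# 1    6    8
```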
I want to sort a column by the length of its messages without adding a new column to the dataframe. I tried the method below and it didn't work. Is there any way to sort values based on a custom function?
df.sort_values(df['message'].apply(len), ascending=False)
Regards,
Michael
You can use the len() of the string (message) as the key parameter in sort_values().
Consider a random df:
df = pd.DataFrame({'messages':['come here please as I need you','why would i come there','fine i will be there soon']})
df
messages
0 come here please as I need you
1 why would i come there
2 fine i will be there soon
Use:
df.sort_values(by='messages', key=lambda x: x.str.len(), ascending=False, inplace=True)
df
messages
0 come here please as I need you
2 fine i will be there soon
1 why would i come there
So you were almost there.
For more information on the parameters of sort_values, check the pandas documentation.
import pandas as pd

l1 = ['ab', 'effg', 'hjj', 'klllh', 'm', 'n', 'abbc']
df = pd.DataFrame(data={'message': l1})
# Get the index sorted by message length, then reorder the column with it.
sorted_index = df['message'].apply(len).sort_values(ascending=False).index
df['message'] = list(df['message'][sorted_index])
print(df)
Get the sorted index and apply that index to the messages column.
I am currently working on a project where my goal is to get the game scores for each NCAA men's basketball game. To do this, I use the Python package sportsreference. I need two dataframes: one called df, which has the game date, and one called box_index (shown below), which has the unique link of each game. I need the date column replaced by the unique link of each game. These unique links start with the date (formatted exactly as in the date column of df), which makes it easier to do this with regex or .contains(). I keep getting a KeyError: 0. Can someone help me figure out what is wrong with my logic below?
from sportsreference.ncaab.schedule import Schedule
def get_team_schedule(name):
    combined = Schedule(name).dataframe
    box_index = combined["boxscore_index"]
    box = box_index.to_frame()
    #print(box)
    for i in range(len(df)):
        for j in range(len(box)):
            if box.loc[i, "boxscore_index"].contains(df.loc[i, "date"]):
                df.loc[i, "date"] = box.loc[i, "boxscore_index"]

get_team_schedule("Virginia")
It seems like "box" and "df" are pandas data frames, and since you are iterating through all the rows, it may be more efficient to use iterrows (instead of searching by index with ".loc"):
for i, row_df in df.iterrows():
    for j, row_box in box.iterrows():
        if row_df["date"] in row_box["boxscore_index"]:
            df.at[i, 'date'] = row_box["boxscore_index"]
The ".at" accessor will overwrite the value at a given cell. (Note that plain Python strings have no .contains method; the "in" operator does the substring check.)
Just FYI, iterrows is more efficient than .loc; however, itertuples is about 10x faster, and zip about 100x.
The KeyError: 0 is saying you can't get the row at index 0, because there is no index value of 0 when using box.loc[i, "boxscore_index"] (the index values are dates, for example '2020-12-22-14-virginia'). You could use .iloc instead, like box.iloc[i]["boxscore_index"]; you'd have to convert all the .loc calls accordingly.
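To see the difference concretely, here is a small sketch with a made-up boxscore-style index (.loc is label-based, .iloc is positional):

```python
import pandas as pd

# A frame indexed by boxscore-style strings, not integers, so
# .loc[0] raises KeyError while .iloc[0] works positionally.
box = pd.DataFrame(
    {"boxscore_index": ["2020-12-22-14-virginia", "2020-12-26-17-virginia"]},
    index=["2020-12-22-14-virginia", "2020-12-26-17-virginia"],
)

try:
    box.loc[0, "boxscore_index"]   # label 0 does not exist
except KeyError:
    print("KeyError: 0")

print(box.iloc[0]["boxscore_index"])  # positional access works
```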
Like the other post said, though, I wouldn't go that path. I actually wouldn't even use iterrows here. I would put the box_index values into a list, then iterate through that and use pandas to filter your df dataframe. I'm making some assumptions about what df looks like, so if this doesn't work or isn't what you're looking to do, please share some sample rows of df:
from sportsreference.ncaab.schedule import Schedule
def get_team_schedule(name):
    combined = Schedule(name).dataframe
    box_index_list = list(combined["boxscore_index"])
    for box_index in box_index_list:
        temp_game_data = df[df["date"] == box_index]
        print(box_index)
        print(temp_game_data, '\n')

get_team_schedule("Virginia")
I have a dataframe with three columns. The first column has 3 unique values. I used the code below to create a separate dataframe per unique value; however, I am unable to iterate over those dataframes and am not sure how to do so.
df = pd.read_excel("input.xlsx")
unique_groups = list(df.iloc[:,0].unique()) ### lets assume Unique values are 0,1,2
mtlist = []
for index, value in enumerate(unique_groups):
    globals()['df%s' % index] = df[df.iloc[:,0] == value]
    mtlist.append('df%s' % index)
print(mtlist)
O/P
['df0', 'df1', 'df2']
For example, let's say I want to find out the length of the first unique dataframe.
If I manually type the name of the df, I get the correct output:
len(df0)
O/P
35
But I am trying to automate the code, so I want to find the length and iterate over that dataframe just as I would by typing its name.
What I'm looking for is
if I try the below code
len('df%s' % 0)
I want to get the actual length of the dataframe instead of the length of the string.
Could someone please guide me how to do this?
I have also tried to create a dictionary using the code below, but I can't figure out how to iterate over the dictionary when the df has more than two columns, where the key is the unique group and the value contains the remaining columns.
df = pd.read_excel("input.xlsx")
unique_groups = list(df["Assignment Group"].unique())
length_of_unique_groups = len(unique_groups)
mtlist = []
df_dict = {name: df.loc[df['Assignment Group'] == name] for name in unique_groups}
Can someone please provide a better solution?
UPDATE
SAMPLE DATA
Assignment_group Description Document
Group A Text to be updated on the ticket 1 doc1.pdf
Group B Text to be updated on the ticket 2 doc2.pdf
Group A Text to be updated on the ticket 3 doc3.pdf
Group B Text to be updated on the ticket 4 doc4.pdf
Group A Text to be updated on the ticket 5 doc5.pdf
Group B Text to be updated on the ticket 6 doc6.pdf
Group C Text to be updated on the ticket 7 doc7.pdf
Group C Text to be updated on the ticket 8 doc8.pdf
Lets assume there are 100 rows of data
I'm trying to automate ServiceNow ticket creation with the above data.
So my end goal is: Group A tickets should go to one group; however, for each description a unique task has to be created, but we can club 10 tasks at once and submit them as one request. So if I divide the df into different dfs based on Assignment_group, it would be easier to iterate over (that's the only idea I could think of).
For example lets say we have REQUEST001
within that request it will have multiple sub tasks such as STASK001,STASK002 ... STASK010.
Hope this helps.
Your problem is easily solved by groupby, one of the most useful tools in pandas:
length_of_unique_groups = df.groupby('Assignment Group').size()
You can do all kinds of operations (sum, count, std, etc.) on your remaining columns, like getting the mean value of price for each group if that were a column.
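Here is a minimal sketch on data shaped like the UPDATE section of the question (the rows are abbreviated). Iterating the groupby directly yields (name, sub-DataFrame) pairs, so no dynamically named variables or eval are needed:

```python
import pandas as pd

# Sample rows shaped like the question's UPDATE section.
df = pd.DataFrame({
    "Assignment_group": ["Group A", "Group B", "Group A", "Group C"],
    "Description": ["ticket 1", "ticket 2", "ticket 3", "ticket 4"],
    "Document": ["doc1.pdf", "doc2.pdf", "doc3.pdf", "doc4.pdf"],
})

# Size of each group, all at once.
sizes = df.groupby("Assignment_group").size()
print(sizes)

# Each iteration gives the group name and its own sub-DataFrame,
# with every column available.
for name, group in df.groupby("Assignment_group"):
    print(name, len(group))
```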
I think you want to try something like len(eval('df%s' % 0))
I have a big data frame that contains 6 columns. When I want to print the info out of one cell, I use the following code:
df = pd.read_excel(Path_files_data)
info_rol = df.loc[df.Rank == Ranknumber]
print(info_rol['Art_Nr'])
Here Rank is the column that gives the rank of every item, and Ranknumber is the rank of the item I try to look up. What I get back looks like this:
0 10399
Name: Art_Nr, dtype: object
Here 0 is the rank and 10399 is the Art_Nr. How do I get it to print only the Art_Nr and leave out the extras like dtype: object?
PS. I tried strip, but that didn't work for me.
I think you need to select the first value of the Series by iat or iloc to get a scalar:
print(info_rol['Art_Nr'].iat[0])
print(info_rol['Art_Nr'].iloc[0])
If string or numeric output:
print(info_rol['Art_Nr'].values[0])
But after filtering it is possible you get multiple values; then the second, third, ... values are lost.
So converting to a list is a more general solution:
print(info_rol['Art_Nr'].tolist())
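A small sketch with made-up ranks and article numbers shows the difference between taking the first scalar and keeping every match:

```python
import pandas as pd

# Hypothetical data: two items share rank 1.
df = pd.DataFrame({"Rank": [0, 1, 1],
                   "Art_Nr": ["10399", "10400", "10401"]})

info_rol = df.loc[df.Rank == 1]

print(info_rol["Art_Nr"].iat[0])    # first match only: 10400
print(info_rol["Art_Nr"].tolist())  # all matches: ['10400', '10401']
```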
Good morning,
I am trying to iterate through a CSV to produce a title for each stock chart that I am making.
The CSV is formatted as: Ticker, Description spanning about 200 rows.
The code is shown below:
df_symbol_description = pd.read_csv('C:/TS/Combined/Tickers Desc.csv')
print(df_symbol_description['Description'])
for r in df_symbol_description['Description']:
    plt.suptitle(df_symbol_description['Description'][r], size='20')
It errors out with: "KeyError: 'iShrs MSCI ACWI ETF'".
This error is just showing me the first ticker description in the CSV. If anyone knows how to fix this, it is much appreciated!
Thank you
I don't know how to fix the error, since it's unclear what you are trying to achieve, but we can have a look at the problem itself.
Consider this example, which is essentially your code in small.
import pandas as pd

df = pd.DataFrame({"x": ["A", "B", "C"]})
for r in df['x']:
    print(r, df['x'][r])
The dataframe consists of one column, called x which contains the values "A","B","C". In the for loop you select those values, such that for the first iteration r is "A". You are then using "A" as an index to the column, which is not possible, since the column would need to be indexed by 0,1 or 2, but not the string that it contains.
So in order to print the column values, you can simply use
for r in df['x']:
    print(r)
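If you also need a positional counter alongside each value (e.g. to title one chart per row), enumerate the column instead of using the value as an index. The descriptions below are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Description": ["iShrs MSCI ACWI ETF", "SPDR S&P 500"]})

# enumerate pairs each description with its position, avoiding the
# KeyError caused by indexing the column with its own string values.
titles = list(enumerate(df["Description"]))
for i, desc in titles:
    print(i, desc)
```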