this is my first post here, I hope you will understand what troubles me.
So, I have a DataFrame that contains prices for some 1200 companies for each day, beginning in 2010. Now I want to calculate the total return for each one. My DataFrame is indexed by date. I could use the
df.iloc[-1]/df.iloc[0] method, but some companies started trading publicly at a later date, so I can't get the results for those companies, as they are divided by a NaN value. I've tried by creating a list which contains the first valid indexes for every stock(column), then when I try to calculate the total returns, I get - the wrong result!
I've tried a classic for loop:
for l in list:
returns = df.iloc[-1]/df.iloc[l]
For instance, last price of one stock was around $16, and first data I have is $1.5, which would be over 10 times return, yet my result is only about 1.1! I would also like to add that the aforementioned list includes first valid indexes for Date aswell, and it is in the first position.
Can somebody please help me? Thank you very much
Many ways you can go about this actually. But I do recommend you brush up on your python skills with basic examples before you get into more complicated examples.
If you want to do it your way, you can do it like this:
returns = {}
for stock_name in df.columns:
returns[stock_name] = df[stock_name].dropna().iloc[-1] / df[stock_name].dropna().iloc[0]
A more pythonic way would be to do it in a vectorized form, like this:
returns = ((1 + data.ffill().pct_change())
.cumprod()
.iloc[-1])
Related
first of all, I'm quite new to programming overall (< 2 Months), so I'm sorry if that's an 'simple, no need to ask for help, try it yourself until you get it done' problem.
I have two data-frames with partially the same content (general overview of mobile-numbers including their cost centers in the company and monthly invoices with the affected mobile-numbers and their invoice amount).
I'd like to compare the content of the 'mobile-numbers' column of the monthly invoices DF to the content of the 'mobile-numbers' column of the general overview DF and if matching, assign the respective cost center to the mobile-number in the monthly invoices DF.
I'd love to share my code with you, but unfortunately I have absolutely zero clue how to solve that problem in any way.
Thanks
Edit: I'm from germany, I tried my best to explain the problem in english. If there is anything I messed up (so u dont get it) just tell me :)
Example of desired result
program meets your needs, in the second dataframe I put the value '40' to demonstrate that the dataframes already filled will not be zeroed, the replacement will only occur if there is a similar value between the dataframes, if you want a better explanation about the program , comment below, and don't forget to vote and mark as solved, I also put some 'prints' for a better view, but in general they are not necessary
import pandas as pd
general_df = pd.DataFrame({"mobile_number": [1234,3456,6545,4534,9874],
"cost_center": ['23F','67F','32W','42W','98W']})
invoice_df = pd.DataFrame({"mobile_number": [4534,5567,1234,4871,1298],
"invoice_amount": ['19,99E','19,99E','19,99E','19,99E','19,99E'],
"cost_center": ['','','','','40']})
print(f"""GENERAL OVERVIEW DF
{general_df}
________________________________________
INVOICE DF
{invoice_df}
_________________________________________
INVOICE RESULT
""")
def func(line):
t = 0
for x in range(0, len(general_df['mobile_number'])):
t = general_df.loc[general_df['mobile_number'] == line[0]]
if t.empty:
return line[2]
else:
return t.values.tolist()[0][1]
invoice_df['cost_center'] = invoice_df.apply(func, axis = 1)
print(invoice_df)
I'm fairly new to Python and still learning the ropes, so I need help with a step by step program without using any functions. I understand how to count through an unknown column range and output the quantity. However, for this program, I'm trying to loop through a column, picking out unique numbers and counting its frequency.
So I have an excel file with random numbers down column A. I only put in 20 numbers but let's pretend the range is unknown. How would I go about extracting the unique numbers and inputting them into a separate column along with how many times they appeared in the list?
I'm not really sure how to go about this. :/
unique = 1
while xw.Range((unique,1)).value != None:
frequency = 0
if unique != unique: break
quantity += 1
"end"
I presume as you can't use functions this may be homework...so, high level:
You could first go through the column and then put all the values in a list?
Secondly take the first value from the list and go through the rest of the list - is it in there? If so then it is not unique. Now remove the value where you have found the duplicate from the list. Keep going if you find another remove that too.
Take the second value and so on?
You would just need list comprehension, some loops and perhaps .pop()
Using pandas library would be the easiest way to do. I created a sample excel sheet having only one column called "Random_num"
import pandas
data = pandas.read_excel("sample.xlsx", sheet_name = "Sheet1")
print(data.head()) # This would give you a sneak peek of your data
print(data['Random_num'].value_counts()) # This would solve the problem you asked for
# Make sure to pass your column name within the quotation marks
#eg: data['your_column'].value_counts()
Thanks
I am trying to find the values inside dataframe that has been grouped.
I grouped payment data with time the person borrowed the money and months it took for the person to pay and summed the amount they paid. My goal is to find the list of months it took for people to pay back.
For example, how can I know the list of 'month_taken' when start_yyyymm is 201807?
payment_sum_monthly =
payment_data.groupby(['start_yyyymm','month_taken'])
[['amount']].sum()
If I use R and put the payment data in data.table form, I can find out the list of month_taken by
payment_sum_monthly[start_yyyymm == '201807',month_taken]
How can I do this in Python? Thanks.
is_date = payment_data['start_yyyymm'] == "201807"
It should give you all the entities that has 'start_yyyymm' is 201807. Then to call those entities, you can code following:
date_set = payment_data[is_date].copy()
payment_sum_monthly = date_set .groupby('month_taken').aggregate(sum)
payment_sum_monthly
And if you need one more condition you can do following:
condition2 = payment_data['column name'] == condition
payment_data[is_date & condition2]
I hope I got your question right and it helps
I'd like to find a simple way to increment values in in one column that correspond to a particular date in Pandas. This is what I have so far:
old_casual_value = tstDF['casual'].loc[tstDF['ds'] == '2012-10-08'].values[0]
old_registered_value = tstDF['registered'].loc[tstDF['ds'] == '2012-10-08'].values[0]
# Adjusting the numbers of customers for that day.
tstDF.set_value(406, 'casual', old_casual_value*1.05)
tstDF.set_value(406, 'registered', old_registered_value*1.05)
If I could find a better and simpler way to do this (a one-liner), I'd greatly appreciate it.
Thanks for your help.
The following one liner should work based on your limited description of your problem. If not, please provide more information.
#The code below first filters out the records based on specified date and then increase casual and regisitered column values by 1.05 times.
tstDF.loc[tstDF['ds'] == '2012-10-08',['casual','registered']]*=1.05
I have two dataframes with different lengths(df,df1). They share one similar label "collo_number". I want to search the second dataframe for every collo_number in the first data frame. Problem is that the second date frame contains multiple rows for different dates for every collo_nummer. So i want to sum these dates and add this in a new column in the first database.
I now use a loop but it is rather slow and has to perform this operation for al 7 days in a week. Is there a way to get a better performance? I tried multiple solutions but keep getting the error that i cannot use the equal sign for two databases with different lenghts. Help would really be appreciated! Here is an example of what is working but with a rather bad performance.
df5=[df1.loc[(df1.index == nasa) & (df1.afleverdag == x1) & (df1.ind_init_actie=="N"), "aantal_colli"].sum() for nasa in df.collonr]
Your description is a bit vague (hence my comment). First what you good do is to select the rows of the dataframe that you want to search:
dftmp = df1[(df1.afleverdag==x1) & (df1.ind_init_actie=='N')]
so that you don't do this for every item in the loop.
Second, use .groupby.
newseries = dftmp['aantal_colli'].groupby(dftmp.index).sum()
newseries = newseries.ix[df.collonr.unique()]