Summing rows based on keyword within index - python

I am trying to sum multiple rows together based on a keyword that is part of the index - but it is not the entire index. For example, the index could look like
Count
1234_Banana_Green 43
4321_Banana_Yellow 34
2244_Banana_Brown 23
12345_Apple_Red 45
I would like to sum all of the rows that have the same "keyword" within them and create a total "banana" row. Is there a way to do this without searching for the keyword "banana"? For my purposes, this keyword changes every time and I would like to be able to automate this summing process. Any help is very much appreciated.

May be this:
df.groupby(df.index.to_series()
.str.split('_', expand=True)[1]
)['Count'].sum()
Output:
1
Apple 45
Banana 100
Name: Count, dtype: int64

Given the following dataframe:
raw_data = {'id': ['1234_Banana_Green', '4321_Banana_Yellow',
'2244_Banana_Brown', '12345_Apple_Red',
'1267_Apple_Blue']}
df = pd.DataFrame(raw_data).set_index(['id'])
Try this code:
df = df.reset_index()
df['extracted_keyword'] = df['id'].apply(lambda x: x.split('_')[1])
df.groupby(["extracted_keyword"]).count()
And gives:
id
extracted_keyword
Apple 2
Banana 3
if you want restore the index, add in the end:
df = df.set_index(['id'])

Related

Create new dataframes in python pandas based on the value of a column

I have a dataset that looks like that:
There are 15 unique values in the column 'query id', so I am trying to create new dataframes for each unique value. I thought of having a loop for every unique value in column 'query id' with a code like this:
df_list = []
i = 0
for x in df['query id'].unique():
df{i} = pd.DataFrame(columns=df.columns)
df_list.append()
i+=1
But I am definitely doing something wrong there and got stuck. Do you have any ideas of how to do that?
Sample dataset:
relevance query id 1 2 3
1 WT04-170 10 40 80
1 WT04-170 20 60 70
1 WT04-176 30 70 50
1 WT04-176 40 90 20
1 WT04-173 50 100 10
Pandas has a built-in function for iterating unique values in a column and selecting the matching rows. The function is groupby
In your case, you can create the dictionary as a one-liner using:
dfs = {query_id: grp.copy() for query_id, grp in df.groupby("query id")}
Once you have your dictionary of dataframes, you can access each one using the query id as your key:
my_df = dfs["WT04-170"] # Access each dataframe using the appropriate key
my_df.describe() # Do your work with the dataframe here
Does something like this help?
df_list = []
for x in set(df['query id'].to_list()):
df = df[df['query id'] == x].copy()
df_list.append(df)
It sounds like what you want is a filtered dataframe for each unique query id. So you would end up with 15 dataframes, each containing only the rows for that specific query id from the combined df. Is that right?
In that case, your approach is close, but you could just filter the df in your loop. I also used a dict to store the resulting dataframes too, but you could do it with the list as well.
If my understanding of what you're looking for is correct, I think this should work for you:
df_dict = {}
for (i,x) in enumerate(df['query id'].unique()):
df_dict[i] = df[df['query id']==x].copy()
You could also just use the query_ids as the dict keys too, like this:
df_dict = {}
for x in df['query id'].unique():
df_dict[x] = df[df['query id']==x].copy()

Pandas - partition a dataframe into two groups with an approximate mean value

I want to split all rows into two groups that have similar means.
I have a dataframe of about 50 rows but this could go into several thousands with a column of interest called 'value'.
value total bucket
300048137 3.0741 3.0741 0
352969997 2.1024 5.1765 0
abc13.com 4.5237 9.7002 0
abc7.com 5.8202 15.5204 0
abcnews.go.com 6.7270 22.2474 0
........
www.legacy.com 12.6609 263.0797 1
www.math-aids.com 10.9832 274.0629 1
So far I tried using cumulative sum for which total column was created then I essentially made the split based on where the mid-point of the total column is. Based on this solution.
test['total'] = test['value'].cumsum()
df_sum = test['value'].sum()//2
test['bucket'] = np.where(test['total'] <= df_sum, 0,1)
If I try to group them and take the average for each group then the difference is quite significant
display(test.groupby('bucket')['value'].mean())
bucket
0 7.456262
1 10.773905
Is there a way I could achieve this partition based on means instead of sums? I was thinking about using expanding means from pandas but couldn't find a proper way to do it.
I am not sure I understand what you are trying to do, but possibly you want to groupy by quantiles of a column. If so:
test['bucket'] = pd.qcut(test['value'], q=2, labels=False)
which will have bucket=0 for the half of rows with the lesser value values. And 1 for the rest. By tweakign the q parameter you can have as many groups as you want (as long as <= number of rows).
Edit:
New attemp, now that I think I understand better your aim:
df = pd.DataFrame( {'value':pd.np.arange(100)})
df['group'] = df['value'].argsort().mod(2)
df.groupby('group')['value'].mean()
# group
# 0 49
# 1 50
# Name: value, dtype: int64
​
df['group'] = df['value'].argsort().mod(3)
df.groupby('group')['value'].mean()
#group
# 0 49.5
# 1 49.0
# 2 50.0
# Name: value, dtype: float64

Python, lambda function as argument for groupby

I'm trying to figure out what a piece of code is doing, but I'm getting kinda lost on it.
I have a pandas dataframe, which has been loaded by the following .csv file:
origin_census_block_group,date_range_start,date_range_end,device_count,distance_traveled_from_home,bucketed_distance_traveled,median_dwell_at_bucketed_distance_traveled,completely_home_device_count,median_home_dwell_time,bucketed_home_dwell_time,at_home_by_each_hour,part_time_work_behavior_devices,full_time_work_behavior_devices,destination_cbgs,delivery_behavior_devices,median_non_home_dwell_time,candidate_device_count,bucketed_away_from_home_time,median_percentage_time_home,bucketed_percentage_time_home,mean_home_dwell_time,mean_non_home_dwell_time,mean_distance_traveled_from_home
010539707003,2020-06-25T00:00:00-05:00,2020-06-26T00:00:00-05:00,49,626,"{""16001-50000"":5,""0"":11,"">50000"":4,""2001-8000"":3,""1-1000"":9,""1001-2000"":7,""8001-16000"":1}","{""16001-50000"":110,"">50000"":155,""<1000"":40,""2001-8000"":237,""1001-2000"":27,""8001-16000"":180}",12,627,"{""721-1080"":11,""361-720"":9,""61-360"":1,""<60"":11,"">1080"":12}","[32,32,28,30,30,31,27,23,20,20,20,17,19,19,15,14,17,20,20,21,25,22,24,23]",7,3,"{""120330012011"":1,""010030107031"":1,""010030114052"":2,""120330038001"":1,""010539701003"":1,""010030108001"":1,""010539707002"":14,""010539705003"":2,""120330015001"":1,""121130102003"":1,""010539701002"":1,""120330040001"":1,""370350101014"":2,""120330033081"":2,""010030106003"":1,""010539706001"":2,""010539707004"":3,""120330039001"":1,""010539699003"":1,""120330030003"":1,""010539707003"":41,""010970029003"":1,""010539705004"":1,""120330009002"":1,""010539705001"":3,""010539704003"":1,""120330028012"":1,""120330035081"":1,""120330036102"":1,""120330036142"":1,""010030114062"":1,""010539706004"":7,""010539706002"":1,""120330036082"":1,""010539707001"":7,""010030102001"":1,""120330028011"":1}",2,241,71,"{""21-45"":4,""481-540"":2,""541-600"":1,""721-840"":1,""1201-1320"":1,""301-360"":3,""<20"":13,""61-120"":3,""241-300"":3,""121-180"":1,""421-480"":3,""1321-1440"":4,""1081-1200"":1,""961-1080"":2,""601-660"":1,""181-240"":1,""661-720"":2,""361-420"":3}",72,"{""0-25"":13,""76-100"":21,""51-75"":6,""26-50"":3}",657,413,1936
010730144081,2020-06-25T00:00:00-05:00,2020-06-26T00:00:00-05:00,139,2211,"{""16001-50000"":17,""0"":41,"">50000"":15,""2001-8000"":22,""1-1000"":8,""1001-2000"":12,""8001-16000"":24}","{""16001-50000"":143,"">50000"":104,""<1000"":132,""2001-8000"":39,""1001-2000"":15,""8001-16000"":102}",41,806,"{""721-1080"":32,""361-720"":16,""61-360"":12,""<60"":30,"">1080"":46}","[91,92,93,91,91,90,86,83,78,64,64,61,64,62,65,62,60,74,61,64,75,78,81,84]",8,6,"{""131350501064"":1,""131350502151"":1,""010730102002"":1,""011170302131"":2,""010730038024"":1,""010730108041"":1,""010730144133"":1,""010730132003"":1,""011210118002"":1,""011170303053"":1,""010730111084"":2,""011170302142"":1,""010730119011"":1,""010730129063"":2,""010730107063"":1,""010730059083"":1,""010730058003"":1,""011270204003"":1,""010730049012"":2,""130879701001"":1,""010730120021"":1,""130890219133"":1,""010730144082"":4,""170310301031"":1,""010730129112"":1,""010730024002"":1,""011170303034"":2,""481390616004"":1,""121270826052"":1,""010730128021"":2,""121270825073"":1,""010730004004"":1,""211959313002"":1,""010730100012"":1,""011170302151"":1,""010730142041"":1,""010730129123"":1,""010730129084"":1,""010730042002"":1,""010730059033"":2,""170318306001"":1,""130519800001"":1,""010730027003"":1,""121270826042"":1,""481610001002"":1,""010730100011"":1,""010730023032"":1,""350250004002"":1,""010730056003"":1,""010730132001"":1,""011170302171"":2,""120910227003"":1,""011239620001"":1,""130351503002"":1,""010730129155"":1,""010730001001"":2,""010730110021"":1,""170310104003"":1,""010730059082"":2,""010730120022"":1,""011170303151"":1,""010730139022"":1,""011170303441"":4,""010730144092"":3,""010730129151"":1,""011210119001"":2,""010730144081"":117,""010730108052"":1,""010730129122"":9,""370710321003"":1,""010730142034"":2,""010730042001"":2,""010570201003"":1,""010730144132"":6,""010730059032"":1,""010730012001"":2,""010730102003"":1,""011170303332"":1,""010730128032"":2,""010730129081"":1,""010730103011"":1,""010730058001"":3,""011150401041"":1,""010730045001"":3,""010730110013"":1,""010730119041"":1,""010730042003"":1,""010730141041"":1,""010730144091"":1,""010730129154"":1,""484759501002"":1,""010730144063"":1,""010730144102"":12,""011170303141"":1,""011250106011"":1,""011170303152"":1,""010730059104"":1,""010730107021"":1,""010730100014"":1,""010730008004"":1,""011170303451"":1,""010730127041"":2,""370559704003"":1,""010730047011"":2,""010730129132"":2,""011010014002"":1,""010730144131"":1,""011170302133"":1,""010730030011"":1,""131350506063"":1,""010730118023"":1,""010890110141"":1,""010730128023"":1,""010730106022"":2,""130879703004"":1,""010730108015"":1,""131390010041"":1,""011170305013"":1,""010730134002"":1,""010730031004"":1,""010730138012"":1,""010730011004"":1,""011250102041"":1,""010730129131"":4,""010730144101"":4,""011170303331"":2,""010730003001"":1,""011010033012"":1,""483539504004"":1,""010550104021"":1,""011170303411"":1,""010730106031"":1,""011170303153"":5,""010730128034"":1,""010730129061"":1,""131390010023"":1,""010730051042"":1,""130510107002"":1,""010730027001"":2,""120090686011"":1,""010730107042"":1,""010730123052"":1,""010730129102"":1,""011210115003"":1,""010730129083"":4,""011170303142"":1,""011010014001"":1,""010730107064"":2}",7,176,205,"{""21-45"":7,""481-540"":10,""541-600"":4,""46-60"":2,""721-840"":3,""1201-1320"":3,""301-360"":7,""<20"":46,""61-120"":6,""241-300"":4,""121-180"":9,""421-480"":2,""1321-1440"":3,""1081-1200"":5,""961-1080"":1,""601-660"":1,""181-240"":5,""661-720"":1,""361-420"":7}",78,"{""0-25"":29,""76-100"":71,""51-75"":27,""26-50"":8}",751,338,38937
010890017002,2020-06-25T00:00:00-05:00,2020-06-26T00:00:00-05:00,78,1934,"{""16001-50000"":2,""0"":12,"">50000"":9,""2001-8000"":27,""1-1000"":12,""1001-2000"":8,""8001-16000"":8}","{""16001-50000"":49,"">50000"":99,""<1000"":111,""2001-8000"":37,""1001-2000"":24,""8001-16000"":28}",11,787,"{""721-1080"":17,""361-720"":11,""61-360"":11,""<60"":15,"">1080"":23}","[49,42,48,48,47,48,44,44,39,32,34,32,36,31,32,36,40,37,36,38,49,45,46,46]",5,1,"{""010890101002"":1,""010730108041"":1,""010890020003"":2,""010890010001"":2,""010890025011"":3,""010890026001"":4,""280819505003"":1,""281059504004"":1,""010890103022"":1,""120990056011"":1,""010890109012"":2,""010890019021"":6,""010890013021"":4,""010890015004"":3,""010890108003"":1,""010890014022"":6,""281059501003"":1,""281059503001"":1,""010890007022"":3,""010890017001"":3,""010890107023"":1,""010890021002"":1,""010890009011"":1,""010890109013"":1,""010730120022"":1,""010890031003"":15,""011170303151"":1,""010890019011"":9,""010890030002"":2,""010890110221"":1,""011170305021"":1,""010890026003"":2,""010890025012"":3,""010730117034"":1,""010830208022"":1,""010890031002"":2,""010890112002"":1,""010210602001"":1,""010890002022"":1,""010890017002"":65,""281059506021"":1,""010890010003"":2,""010890106222"":1,""120990059182"":1,""010890110222"":1,""010890020001"":1,""010890101003"":1,""010890018013"":1,""010890021001"":1,""010890109021"":1,""010890108001"":1,""010770106005"":1,""281059506011"":1,""010030114032"":2,""010830209001"":1,""010890027222"":1,""010730128023"":1,""010890009021"":1,""010030114051"":1,""010030109031"":1,""010030103003"":1,""010890031001"":1,""010890021003"":1,""010030114062"":4,""010890106241"":1,""281059504003"":1,""010890018011"":10,""010890019031"":5,""010890027012"":1,""010730108054"":1,""010890106223"":2,""010890111001"":1,""010210603002"":1,""010890109011"":1,""010890019012"":2,""010890113001"":1,""010890028013"":3}",1,229,99,"{""481-540"":3,""541-600"":2,""46-60"":1,""721-840"":1,""1201-1320"":7,""301-360"":6,""<20"":18,""61-120"":10,""241-300"":5,""121-180"":2,""1321-1440"":2,""841-960"":1,""1081-1200"":1,""961-1080"":3,""601-660"":3,""181-240"":2,""661-720"":3}",78,"{""0-25"":16,""76-100"":44,""51-75"":11,""26-50"":7}",708,353,14328
010950308022,2020-06-25T00:00:00-05:00,2020-06-26T00:00:00-05:00,100,2481,"{""16001-50000"":11,""0"":19,"">50000"":11,""2001-8000"":40,""1-1000"":6,""1001-2000"":3,""8001-16000"":4}","{""16001-50000"":150,"">50000"":23,""<1000"":739,""2001-8000"":23,""1001-2000"":12,""8001-16000"":208}",17,703,"{""721-1080"":21,""361-720"":19,""61-360"":10,""<60"":24,"">1080"":26}","[62,64,64,63,65,67,54,48,37,37,34,33,30,34,32,33,35,43,50,56,58,56,56,57]",8,6,"{""010950306004"":1,""010950302023"":1,""011030054051"":1,""010950311002"":1,""010950309023"":1,""010499606003"":1,""121319506023"":2,""010950308022"":86,""121319506016"":2,""010950304013"":1,""010950307024"":1,""010950309041"":1,""010890019021"":2,""010950312001"":5,""010499607002"":1,""011150402013"":1,""010550102003"":1,""120050027043"":3,""010719509003"":1,""010950302022"":1,""010950308023"":2,""120050027051"":2,""471079701022"":1,""010890106221"":1,""010950306001"":1,""010950302011"":2,""011150405013"":1,""011150402041"":2,""010950312002"":16,""011030054042"":1,""010950301002"":2,""130459105011"":1,""010730001001"":1,""130459102001"":1,""010890109013"":2,""010950308013"":14,""010719508004"":1,""120050027041"":3,""010550110021"":3,""010730049022"":1,""010950308024"":1,""010950312004"":6,""010950312003"":1,""010550104012"":2,""010550110013"":1,""120860004111"":1,""010890027222"":1,""010950306002"":2,""010950304015"":1,""011030054041"":1,""010950309031"":8,""010950308021"":1,""010950302024"":1,""010950307011"":5,""010550110012"":2,""011150404013"":1,""130459103003"":1,""120050027032"":3,""010950307012"":5,""010950309022"":2,""010950307023"":1,""010719508003"":1,""010499608001"":2,""010950310003"":1,""011150402043"":1,""120860099063"":1,""010950309021"":4,""010950309043"":2,""010950308011"":1,""010950306003"":3,""120050027042"":1,""010950308025"":5,""010950309032"":6,""010499607001"":1}",1,199,132,"{""21-45"":8,""481-540"":6,""541-600"":4,""46-60"":3,""721-840"":3,""1201-1320"":4,""301-360"":3,""<20"":20,""61-120"":10,""241-300"":2,""121-180"":4,""421-480"":3,""1321-1440"":1,""841-960"":3,""961-1080"":2,""601-660"":1,""181-240"":3,""661-720"":1,""361-420"":2}",74,"{""0-25"":20,""76-100"":48,""51-75"":23,""26-50"":4}",661,350,5044
df = pd.read_csv(csv_file,
usecols=[
'origin_census_block_group',
'date_range_start',
'date_range_end',
'device_count',
'distance_traveled_from_home',
'completely_home_device_count',
'median_home_dwell_time',
'part_time_work_behavior_devices',
'full_time_work_behavior_devices'
],
dtype={'origin_census_block_group': str},
).set_index('origin_census_block_group')
and, later in the code, the dataframe is modified by:
df = df.groupby(lambda cbg: cbg[:5]).sum()
I don't quite understand what this line is doing precisely.
Groupby generally groups a dataframe by column, so...is it grouping the dataframe using multiple columns (0 to 5)? What is the effect of .sum() at the end?
If you run your code exactly as you wrote it (both the creation of df and the groupby) you can see the result. I print first couple of columns of the output of groupby
device_count distance_traveled_from_home
----- -------------- -----------------------------
01053 49 626
01073 139 2211
01089 78 1934
01095 100 2481
What happens here is the function lambda cbg: cbg[:5] is applied to each of the index values (strings that look like numbers in column origin_census_block_group). As a side, note the statement
...
dtype={'origin_census_block_group': str},
when creating the df, so somebody went into trouble to make sure they are actually str
So the function is applied to string like '010539707003' and returns a substring which is the first 5 characters of that string:
'010539707003'[:5]
produces
'01053'
so I assume there are multiple keys that share the first 5 characters (in the actual file -- the snippet has them all unique so not very interesting) and all these rows are grouped together
Then .sum() is applied to each numerical column of each group and returns, well, the column sum per each groupby key. This is what you see in my output in column 'device_count' and so on.
Hope this is clear now
Pandas' read_csv() will render a csv-formatted file a Pandas Dataframe.
I recommend having a ready at the Pandas' documentation, as it's very exhaustive -> https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
usecols=[
'origin_census_block_group',
'date_range_start',
'date_range_end',
'device_count',
'distance_traveled_from_home',
'completely_home_device_count',
'median_home_dwell_time',
'part_time_work_behavior_devices',
'full_time_work_behavior_devices'
],
The usecols parameter will take as input an array of desired columns and will only load the specified columns into the dataframe.
dtype={'origin_census_block_group': str}
The dtype parameter will take a dict as input and is to specify the data type of the values, like {'column' : datatype}
.set_index('origin_census_block_group')
.set_index() will set the specificed column as the index column (ie: the first column). The usual index of Pandas' Dataframe is the row's index number, which appears as the first column of the dataframe. By setting the index, the first column now becomes the specified column. See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html
Panda's .groupby() function will take a dataframe a regroup it basing on the occurrences of he values from the specified column.
That is to say, if we a dataframe such as df =
Fruit Name Quality Count
Apple Marco High 4
Pear Lucia Medium 10
Apple Francesco Low 3
Banana Carlo Medium 6
Pear Timmy Low 7
Apple Roberto High 8
Banana Joe High 21
Banana Jack Low 3
Pear Rob Medium 5
Apple Louis Medium 6
Pear Jennifer Low 7
Pear Laura High 8
Performing a groupby operations, such as:
df = df.groupby(lambda x: x[:2]).sum()
Will take all the elements in the index, slice them from index 0 through index 2 and return the sum of all the corresponding values, ie:
Ap 21
Ba 30
Pe 37
Now, you might be wondering about that final .sum() method. If you try to print the dataframe without applying it, you'll likely get something like this:
<bound method GroupBy.sum of <pandas.core.groupby.generic.DataFrameGroupBy object at 0x109d260a0>>
This is because Pandas has created a groubpy object and does not yet now how to display it to you. Do you want to have it displayed by the number of the occurrences in the index? You'd do this:
df = df.groupby(lambda x: x[:2]).size()
And that would output:
Ap 4
Ba 3
Pe 5
Or maybe the sum of their respective summable values? (Which is what is done in the example)
df = df.groupby(lambda x: x[:2]).sum()
Which again, will output:
Ap 21
Ba 30
Pe 37
Notice it has taken the first two letters of the string in the index. Had it been x[:3], it would have taken the first three letters, of course.
Summing it up:
-> .groupby() takes the elements in the index, i.e. the first column of the dataframe and organises the dataframe in groups relating to the index
-> The input you have given to groubpy is an anonymous function, i.e. lambda function, slicing from index 0 through 5 of its mapped input
-> You may choose how to have the results of groubpy by appending the methos .sum() or .size() to a groubpy object
I also recommend reading about Python's lambda functions:
https://docs.python.org/3/reference/expressions.html

Can I update the value of a column based on the same column value in a python dataframe?

I have a dataframe to capture characteristics of people accessing a webpage. The list of time spent by each user in the page is one of the characteristic feature that I get as an input. I want to update this column with maximum value of the list. Is there a way in which I can do this?
Assume that my data is:
df = pd.DataFrame({Page_id:{1,2,3,4}, User_count:{5,3,3,6}, Max_time:{[45,56,78,90,120],[87,109,23],[78,45,89],[103,178,398,121,431,98]})
What I want to do is convert the column Max_time in df to Max_time:{120,109,89,431}
I am not supposed to add another column for computing the max separately as this table structure cannot be altered.
I tried the following:
for i in range(len(df)):
df.loc[i]["Max_time"] = max(df.loc[i]["Max_time"])
But this is not changing the column as I intended it to. Is there something that I missed?
df = pd.DataFrame({'Page_id':[1,2,3,4],'User_count':[5,3,3,6],'Max_time':[[45,56,78,90,120],[87,109,23],[78,45,89],[103,178,398,121,431,98]]})
df.Max_time = df.Max_time.apply(max)
Result:
Page_id User_count Max_time
0 1 5 120
1 2 3 109
2 3 3 89
3 4 6 431
You can use this:
df['Max_time'] = df['Max_time'].map(lambda x: np.max(x))

Loop zip, append in Python

I am strugling with something in python...
I have a function that when I input a date, it gives me back a column with 30 prices (one in each line) and in index 30 names.
[in] getPrice('14/07/2015')
[out]
apple 10
pear 20
orange 12
banana 23
etc...
The number of fruit are the same.
What I am trying to do is, to loop this function to obtain a big file with the price of these fruits for all the days I have in a list.
with the ZIP function I dont really understand how it could work and with the append function it is not 'zipping' the price, meaning it recreate everytime the index, etc..
Any idea?
I have a list with all my dates already and the index is the same all along, no other fruit apprear or disapear.
It could look like something
Def Alltogether():
Alltogether = []
x = HistoricalDates
For _ in Historicaldates :
k = getPrice....
and then I block...
I ultimately want something like
print Alltogether
[out]
apple 10 40 60 20 ...
pear 20 20 20 20 ...
orange 12 13 14 29 ...
banana 23 14 41 54 ...
etc...
I am using panda dataframe.
Thank you very much!
This could probably be streamlined by changing the format you get your data in, but here's an example that should get you on your way. I'm totally skipping the zip approach and doing this in more of a Pandas way. For testing I created a dummy function to return random prices for a fixed set of produce:
import pandas as pd
def getPrice(date):
return pd.Series(np.random.randn(5), index=['apple', 'pear', 'orange', 'nanner', 'etc...'])
And then we can create a pandas dataframe with some dates in it:
df = pd.DataFrame( pd.date_range('1/1/2017', periods=3, freq='D') )
df.columns=['MyDate'] # name the date column
That gives us a simple df with 3 dates:
MyDate
0 2017-01-01
1 2017-01-02
2 2017-01-03
While iterating over rows is a little less idiomatic than some sort of apply function, I think it's very readable and easy to comprehend to simply iterate over the dates, get the prices, then shove them into a new dataframe and name the column the same as the date. Given your question, I suspect this is your desired output.
outputDF = pd.DataFrame() ## dump results into this df
for index, row in df.iterrows(): #iterate through every row of the date df
outputDF[row.MyDate] = getPrice(row.MyDate) #shove values into output
which gives us a pretty df like this:
2017-01-01 2017-01-02 2017-01-03
apple 0.150646 0.209668 0.398204
pear 0.131142 0.046473 -0.261545
orange 0.822508 0.456384 -0.774957
nanner -0.996102 -0.260049 -0.558503
etc... 0.622459 -0.173556 -0.681957
Per your comment about handling situations where the date is invalid, there are a few ways to handle this. If the getPrice() function throws an error when you pass it a bad date, you might use try/except:
try:
getPrice(date)
except:
# do something else... return nulls maybe?
if bad dates don't throw an error but instead they return nulls or an empty list, then just check for that condition after calling getPrice() but before putting it in the dataframe.
Is THIS what are you have problem with? Just use '*' to pass a list of lists to zip(). Here an example:
L1=['L1-i1','L1-i2', 'L1-i3', 'L1-i4']
L2=['L2-i1','L2-i2', 'L2-i3', 'L2-i4']
L3=['L3-i1','L3-i2', 'L3-i3', 'L3-i4']
Lalltogether = [L1, L2, L3]
# in Python 3 you should use list() for zip():
print( list(zip( *Lalltogether )) )
print( list(zip( L1, L2, L3 )) )
gives:
[('L1-i1', 'L2-i1', 'L3-i1'), ('L1-i2', 'L2-i2', 'L3-i2'), ('L1-i3', 'L2-i3', 'L3-i3'), ('L1-i4', 'L2-i4', 'L3-i4')]
For more about '*' see
Python - use list as function parameters

Categories