How to select multiple columns after applying group by a column - python
I'm trying to group by the "sender" column and extract some related columns. Here is part of my dataset:
row number,type,rcvTime,sender,pos_x,pos_y,pos_z,spd_x,spd_y,spd_z,acl_x,acl_y,acl_z,hed_x,hed_y,hed_z
0,2,25207.0,15,136.07,1118.46,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09,-1.0,0.0
1,2,25208.0,15,136.19,1117.14,0.0,0.22,-2.31,0.0,0.14,-1.48,0.0,0.09,-1.0,0.0
2,3,25208.81,21,152.66,904.56,0.0,0.06,-0.75,0.0,0.18,-2.43,0.0,0.07,-1.0,0.0
3,2,25209.0,15,136.69,1113.79,0.0,0.39,-4.18,0.0,0.15,-1.64,0.0,0.09,-1.0,0.0
4,3,25209.81,21,152.98,902.59,0.0,0.22,-2.91,0.0,0.12,-1.68,0.0,0.07,-1.0,0.0
5,2,25210.0,15,133.77,1108.01,0.0,0.58,-6.17,0.0,0.16,-1.76,0.0,0.09,-1.0,0.0
6,3,25210.81,21,153.25,898.68,0.0,0.37,-4.65,0.0,0.11,-1.35,0.0,0.08,-1.0,0.0
7,2,25211.0,15,134.37,1100.75,0.0,0.76,-8.14,0.0,0.18,-1.93,0.0,0.09,-1.0,0.0
8,3,25211.81,21,153.82,893.0,0.0,0.65,-6.67,0.0,0.25,-2.54,0.0,0.1,-1.0,0.0
9,3,25211.93,27,122.87,892.12,0.0,5.63,0.32,0.0,-1.57,-0.09,0.0,1.0,0.04,0.0
Here is what I have tried. The result is just all the 'rcvTime' data for each sender; however, I need all the other columns, like pos_x and spd_x, as well:
import numpy as np
import pandas as pd
df = pd.read_csv(r"/Users/h/trace.csv")
df.head()
df1 = df.groupby('sender')['rcvTime'].apply(list).reset_index(name='new')
print(df1)
What I need is the following data (I only filled in the values for sender=15):
rowNumber,sender,rcvTime,pos_x,spd_x,rcvTime,pos_x,spd_x,rcvTime,pos_x,spd_x,...
0,15,25207.0,136.07,0.0,25208.0,136.19,0.22, 25209.0,... 25210.0,..., 25211.0, ...
1,21,25208.81,152.66,0.06, 25209.81,..., 25210.81,..., 25211.81,..., 25212...
2,27,25211.93..., 25212.93..., 25213.93..., 25214.93..., 25215...
IIUC, you are looking for something like this:
df1 = df.groupby('sender', as_index=False).agg(list)
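For example, on a small made-up frame (values invented for illustration, standing in for trace.csv), this collects each sender's values into per-group lists:

```python
import pandas as pd

# Toy data standing in for the original trace.csv (values are made up)
df = pd.DataFrame({
    'sender':  [15, 21, 15],
    'rcvTime': [25207.0, 25208.81, 25208.0],
    'pos_x':   [136.07, 152.66, 136.19],
})

# One row per sender; every other column becomes a list of that sender's values
df1 = df.groupby('sender', as_index=False).agg(list)
print(df1)
```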
EDIT
I'm sure there is a better way, but here is how I managed to achieve your desired output:
cols = ['rcvTime', 'pos_x', 'spd_x']
grouped = df.groupby('sender')[cols]
# Flatten each sender's sub-frame into one long list of values
list_of_lists = [g.values.flatten().tolist() for _, g in grouped]
res = pd.DataFrame({'sender': list(grouped.groups.keys()),
                    f'{cols * len(grouped.groups)}': list_of_lists})
print(res)
sender ['rcvTime', 'pos_x', 'spd_x', 'rcvTime', 'pos_x', 'spd_x', 'rcvTime', 'pos_x', 'spd_x']
0 15 [25207.0, 136.07, 0.0, 25208.0, 136.19, 0.22, ...
1 21 [25208.81, 152.66, 0.06, 25209.81, 152.98, 0.2...
2 27 [25211.93, 122.87, 5.63]
Still, I think you lose the benefits pandas offers when you format your data like this.
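As a sketch of what staying in pandas could look like instead (an alternative I'm suggesting, not the answer's method): number each sender's observations with cumcount, then pivot to a wide frame with properly labelled columns. Sample values are made up:

```python
import pandas as pd

# Minimal stand-in for the trace data
df = pd.DataFrame({
    'sender':  [15, 21, 15],
    'rcvTime': [25207.0, 25208.81, 25208.0],
    'pos_x':   [136.07, 152.66, 136.19],
    'spd_x':   [0.0, 0.06, 0.22],
})

# Number each sender's observations, then pivot to wide form;
# columns become a MultiIndex of (measurement, observation number)
df['obs'] = df.groupby('sender').cumcount()
wide = df.pivot(index='sender', columns='obs',
                values=['rcvTime', 'pos_x', 'spd_x'])
print(wide)
```

Senders with fewer observations simply get NaN in the trailing columns, so the ragged lengths in the desired output are handled for free.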
Related
How do I capture the properties I want from a string?
I hope you are well. I have the following string:

"{\"code\":0,\"description\":\"Done\",\"response\":{\"id\":\"8-717-2346\",\"idType\":\"CIP\",\"suscriptionId\":\"92118213\"},....\"childProducts\":[]}}"

I'm trying to capture the attributes id, idType and suscriptionId and map them into a dataframe, but the entire body of the .csv ends up in a single row, so it is almost impossible for me to work without an index. Desired output:

id, idType, suscriptionId
0. '7-84-1811', 'CIP', 21312421412
1. '1-232-42', 'IO', 21421e324

My code:

import pandas as pd
import json
path = '/example.csv'
df = pd.read_csv(path)
normalize_df = json.load(df)
print(df)
Considering your string is in JSON format, you can do this: drop columns, transpose, and get the headers right.

toEscape = "{\"code\":0,\"description\":\"Done\",\"response\":{\"id\":\"8-717-2346\",\"idType\":\"CIP\",\"suscriptionId\":\"92118213\"}}"
json_string = toEscape.encode('utf-8').decode('unicode_escape')
df = pd.read_json(json_string)
df = df.drop(["code", "description"], axis=1)
df = df.transpose().reset_index().drop("index", axis=1)
df.to_csv("user_details.csv")

The output looks like this:

           id idType suscriptionId
0  8-717-2346    CIP      92118213

Thank you for the question.
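If the payload nests deeper (as the elided childProducts part suggests), json.loads plus pd.json_normalize is a sketch of a more general route; the sample string below is a shortened version of the one in the question:

```python
import json
import pandas as pd

# Shortened sample of the question's JSON payload
raw = ('{"code": 0, "description": "Done", '
       '"response": {"id": "8-717-2346", "idType": "CIP", '
       '"suscriptionId": "92118213"}}')

record = json.loads(raw)
# Flatten only the nested "response" object into one row
df = pd.json_normalize(record['response'])
print(df)
```

Each parsed string contributes one row, so several of them can be accumulated in a list and concatenated with pd.concat to get the indexed frame the question asks for.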
How to aggregate a dataframe then transpose it with Pandas
I'm trying to achieve this kind of transformation with Pandas. I wrote this code, but unfortunately it doesn't give the result I'm looking for.

CODE:

import pandas as pd
df = pd.read_csv('file.csv', delimiter=';')
df = df.count().reset_index().T.reset_index()
df.columns = df.iloc[0]
df = df[1:]
df

RESULT:

Do you have any suggestions? Any help will be appreciated.
First create boolean columns for the nonOK tests, then use named aggregation: count the rows, sum the Values column, and use sum again to count the True values; last, add both test counts together:

df = (df.assign(NumberOfTest1=df['Test one'].eq('nonOK'),
                NumberOfTest2=df['Test two'].eq('nonOK'))
        .groupby('Category', as_index=False)
        .agg(NumberOfID=('ID', 'size'),
             Values=('Values', 'sum'),
             NumberOfTest1=('NumberOfTest1', 'sum'),
             NumberOfTest2=('NumberOfTest2', 'sum'))
        .assign(TotalTest=lambda x: x['NumberOfTest1'] + x['NumberOfTest2']))
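The question's table isn't reproduced here, so this is a minimal sketch with made-up data matching the column names the answer assumes, just to see the named aggregation run:

```python
import pandas as pd

# Made-up data matching the column names used above
df = pd.DataFrame({
    'ID':       [1, 2, 3, 4],
    'Category': ['A', 'A', 'B', 'B'],
    'Values':   [10, 20, 30, 40],
    'Test one': ['OK', 'nonOK', 'OK', 'nonOK'],
    'Test two': ['nonOK', 'OK', 'OK', 'nonOK'],
})

# Boolean helper columns, then named aggregation per Category
out = (df.assign(NumberOfTest1=df['Test one'].eq('nonOK'),
                 NumberOfTest2=df['Test two'].eq('nonOK'))
         .groupby('Category', as_index=False)
         .agg(NumberOfID=('ID', 'size'),
              Values=('Values', 'sum'),
              NumberOfTest1=('NumberOfTest1', 'sum'),
              NumberOfTest2=('NumberOfTest2', 'sum'))
         .assign(TotalTest=lambda x: x['NumberOfTest1'] + x['NumberOfTest2']))
print(out)
```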
How to merge multiple columns with same names in a dataframe
I have the following dataframe as below:

df = pd.DataFrame({'Field': 'FAPERF',
                   'Form': 'LIVERID',
                   'Folder': 'ALL',
                   'Logline': '9',
                   'Data': 'Yes',
                   'Data': 'Blank',
                   'Data': 'No',
                   'Logline': '10'})

I need this dataframe:

'''
df = pd.DataFrame({'Field': ['FAPERF', 'FAPERF'],
                   'Form': ['LIVERID', 'LIVERID'],
                   'Folder': ['ALL', 'ALL'],
                   'Logline': ['9', '10'],
                   'Data': ['Yes', 'Blank', 'No']})
'''

I had tried the code below but was not able to achieve the desired output:

res3.set_index(res3.groupby(level=0).cumcount(), append=True)['Data'].unstack(0)

Can anyone please help me?
I believe your best option is to create multiple data frames with the same column name (for example, three data frames with a "Data" column) and then simply perform a concat over the data frames:

df1 = pd.DataFrame({'Field': ['FAPERF'], 'Form': ['LIVERID'],
                    'Folder': ['ALL'], 'Logline': ['9'], 'Data': ['Yes']})
df2 = pd.DataFrame({'Data': ['No'], 'Logline': ['10']})
df3 = pd.DataFrame({'Data': ['Blank']})
frames = [df1, df2, df3]
result = pd.concat(frames)
You just need to add to the lists in which you specify the Logline and Data value for each row.

import pandas as pd

list_df = []
data_type_list = ['Yes', 'No', 'Blank']
logline_type = ['9', '10', '10']
for x in range(len(data_type_list)):
    new_dict = {'Field': ['FAPERF'],
                'Form': ['LIVERID'],
                'Folder': ['ALL'],
                'Data': [data_type_list[x]],
                'Logline': [logline_type[x]]}
    df = pd.DataFrame(new_dict)
    list_df.append(df)

new_df = pd.concat(list_df)
print(new_df)
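Since only Data and Logline vary per row, the same frame can also be built in a single constructor call; a minimal sketch of that shortcut:

```python
import pandas as pd

# Build the repeated columns directly; only Data and Logline vary per row
new_df = pd.DataFrame({
    'Field':   ['FAPERF'] * 3,
    'Form':    ['LIVERID'] * 3,
    'Folder':  ['ALL'] * 3,
    'Data':    ['Yes', 'Blank', 'No'],
    'Logline': ['9', '10', '10'],
})
print(new_df)
```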
Count occurrences of number from specific column in python
I am trying to do the equivalent of a COUNTIF() function in Excel. I am stuck at how to tell the .count() function to read from a specific column. I have:

df = pd.read_csv('testdata.csv')
df.count('1')

but this does not work, and even if it did, it is not specific enough. I am thinking I may have to use read_csv to read specific columns individually. Example:

Column name
4
4
3
2
4
1

The function would output that there is one '1', and I could run it again and find out that there are three '4' answers, etc.

I got it to work, thank you! I used:

print(df.col.value_counts().loc['x'])
Here is an example of a simple 'countif' recipe you could try:

import pandas as pd

def countif(rng, criteria):
    return rng.eq(criteria).sum()

# Example use
df = pd.DataFrame({'column1': [4, 4, 3, 2, 4, 1],
                   'column2': [1, 2, 3, 4, 5, 6]})
countif(df['column1'], 1)
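The same counts can also come straight from value_counts, as the asker found; a short sketch using the example column from the question:

```python
import pandas as pd

df = pd.DataFrame({'column1': [4, 4, 3, 2, 4, 1]})

# value_counts tallies every value at once;
# .get avoids a KeyError when the value never occurs
counts = df['column1'].value_counts()
print(counts.get(1, 0))   # how many 1s
print(counts.get(4, 0))   # how many 4s
```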
If all else fails, why not try something like this?

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(data=np.random.randint(0, 100, size=100), columns=['col1'])
counters = {}
for i in range(len(df)):
    value = df.iloc[i]['col1']
    if value in counters:
        counters[value] += 1
    else:
        counters[value] = 1
print(counters)
plt.bar(counters.keys(), counters.values())
plt.show()
Dictionary in Pandas DataFrame, how to split the columns
I have a DataFrame that consists of one column ('Vals') whose entries are dictionaries. The DataFrame looks more or less like this:

In[215]: fff
Out[213]:
                                            Vals
0  {u'TradeId': u'JP32767', u'TradeSourceNam...
1  {u'TradeId': u'UUJ2X16', u'TradeSourceNam...
2  {u'TradeId': u'JJ35A12', u'TradeSourceNam...

When looking at an individual row, the dictionary looks like this:

In[220]: fff['Vals'][100]
Out[218]:
{u'BrdsTraderBookCode': u'dffH',
 u'Measures': [{u'AssetName': u'Ie0',
   u'DefinitionId': u'6dbb',
   u'MeasureValues': [{u'Amount': -18.64}],
   u'ReportingCurrency': u'USD',
   u'ValuationId': u'669bb'}],
 u'SnapshotId': 12739,
 u'TradeId': u'17304M',
 u'TradeLegId': u'31827',
 u'TradeSourceName': u'xxxeee',
 u'TradeVersion': 1}

How can I split the columns and create a new DataFrame, so that I get one column with TradeId and another one with MeasureValues?
Try this:

l = []
for idx, row in df['Vals'].items():  # iteritems() in older pandas
    temp_df = pd.DataFrame(row['Measures'][0]['MeasureValues'])
    temp_df['TradeId'] = row['TradeId']
    l.append(temp_df)
pd.concat(l, axis=0)
Here's a way to get TradeId and MeasureValues (using your sample row twice to illustrate the iteration):

new_df = pd.DataFrame()
for id, data in fff.iterrows():
    d = {'TradeId': data.iloc[0]['TradeId']}
    d.update(data.iloc[0]['Measures'][0]['MeasureValues'][0])
    new_df = pd.concat([new_df, pd.DataFrame.from_dict(d, orient='index').T])

   Amount TradeId
0  -18.64  17304M
0  -18.64  17304M
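A flatter route (my suggestion, not taken from the answers above) is pd.json_normalize with a record_path, assuming every row's dict has the same Measures shape as the sample:

```python
import pandas as pd

# One sample record shaped like the dicts in the 'Vals' column
records = [{
    'TradeId': '17304M',
    'Measures': [{'AssetName': 'Ie0',
                  'MeasureValues': [{'Amount': -18.64}]}],
}]

# Walk Measures -> MeasureValues, carrying TradeId along as metadata
flat = pd.json_normalize(records,
                         record_path=['Measures', 'MeasureValues'],
                         meta=['TradeId'])
print(flat)
```

On the real data this would be called as pd.json_normalize(fff['Vals'].tolist(), ...), producing one row per MeasureValues entry.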