How to generate a result list from pandas DataFrame - python

I need to generate a list from a pandas DataFrame. I am able to print the result, but I don't know how to convert it to list format. The code I used to print the result (without converting to a list) is:
import pandas
df = pandas.DataFrame(processed_data_format, columns=["file_name", "innings", "over", "ball", "individual ball", "runs", "batsman", "wicket_status", "bowler_name", "fielder_name"])
# processed_data_format is the list passed to the DataFrame
t = df.groupby(['batsman', 'over'])[['runs', 'ball']].sum()
print(t)
I am getting a result like:
Sangakara 1 10 5
2 0 2
3 3 1
sewag 1 2 1
2 1 1
I would like to convert this data into a list format like:
[ [sangakara,1,10,5],[sangakara,2,0,2],[sangakara,3,3,1],[sewag,1,2,1],[sewag,2,1,1] ]

You can use to_records:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 3, 4, 5]})
>>> list(df.to_records())
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
This will make a list of tuples. Converting this to a list of lists (if you really need this at all) is easy.
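For the grouped result in the question, one possible route to a list of lists is to turn the group keys back into ordinary columns and then call tolist() on the values. This is only a sketch: the sample rows below are made up to reproduce the totals shown in the question, and only the four relevant columns are included.

```python
import pandas as pd

# Hypothetical ball-by-ball rows (not the asker's real data).
df = pd.DataFrame({
    "batsman": ["Sangakara", "Sangakara", "Sangakara", "sewag", "sewag"],
    "over":    [1, 1, 2, 1, 2],
    "runs":    [4, 6, 0, 2, 1],
    "ball":    [2, 3, 2, 1, 1],
})

# Sum runs and balls per (batsman, over), as in the question.
t = df.groupby(["batsman", "over"])[["runs", "ball"]].sum()

# reset_index() moves 'batsman' and 'over' out of the index,
# so every row carries its own keys; tolist() gives nested lists.
result = t.reset_index().values.tolist()
print(result)
```

Each inner list holds the batsman, the over, the summed runs and the summed balls, in that order.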

Related

Extract data from table column and make variables in Python

I have a dataset where I want to make a new variable every time the 'Recording' number changes. I want the new variable to include the 'Duration' data for the specific 'Recording' and the previous data. So for the below table it would be:
Var1 = (3, 3, 3)
Var2 = (3, 3, 3, 4, 6)
Var3 = (3, 3, 3, 4, 6, 4, 3, 1, 4)
And so on. I have several datasets that can have a different number of recordings (but always starting from 1) and a different number of durations for each recording. Any help is greatly appreciated.
Recording  Duration
1          3
1          3
1          3
2          4
2          6
3          4
3          3
3          1
3          4
You can aggregate to lists, take the cumulative sum of the lists, then convert to tuples and a dictionary:
d = df.groupby('Recording')['Duration'].agg(list).cumsum().apply(tuple).to_dict()
print (d)
{1: (3, 3, 3), 2: (3, 3, 3, 4, 6), 3: (3, 3, 3, 4, 6, 4, 3, 1, 4)}
print (d[1])
print (d[2])
print (d[3])
Your output is possible, but not recommended:
s = df.groupby('Recording')['Duration'].agg(list).cumsum().apply(tuple)
for k, v in s.items():
    globals()[f'Var{k}'] = v
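If cumsum() over a Series of lists is not available or feels too implicit, the same running concatenation can be sketched with itertools.accumulate from the standard library. This is a variation on the answer above, not a replacement for it, and the DataFrame simply re-creates the question's table.

```python
from itertools import accumulate

import pandas as pd

# The example table from the question.
df = pd.DataFrame({
    "Recording": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Duration":  [3, 3, 3, 4, 6, 4, 3, 1, 4],
})

# One list of durations per recording, ordered by recording number.
lists = df.groupby("Recording")["Duration"].agg(list)

# accumulate() concatenates each list onto everything seen so far;
# 'acc + cur' builds a new list, so the per-group lists are not mutated.
running = accumulate(lists, lambda acc, cur: acc + cur)

# Map each recording number to its cumulative tuple.
d = {k: tuple(v) for k, v in zip(lists.index, running)}
print(d[2])
```

Looking values up as d[1], d[2], and so on keeps the data in one place instead of creating a separate variable per recording.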
@jezrael's answer is beautiful and definitely better :). But if you really wanted to do this as a loop (perhaps because you might want to modify the logic further in the future), then you might:
import pandas as pd
df = pd.DataFrame({
    "Recording": [1,1,1,2,2,3,3,3,3],
    "Duration": [3,3,3,4,6,4,3,1,4]
}) # your example data
records = {}
record = []
last_recording = None # flag to track change in recording
for r, d in zip(df.Recording, df.Duration):
    if record and not r == last_recording:
        records[last_recording] = (tuple(record))
    record.append(d)
    last_recording = r
records[last_recording] = (tuple(record)) # capture final group
print(records)
Modified to provide a dict (which seems sensible). This will be slow for large datasets.

Is there any method or function in Python to name the sides of a 3-dimensional matrix, like the pandas DataFrame does in 2-D?

I want to name/index the sides of a 3D matrix (plane, row, column) in Python code, like we can do in 2D with the pandas DataFrame:
import numpy as np
import pandas as pd
matrix = np.reshape((1, 2, 3, 4, 5, 6, 7, 8, 9), (3, 3))
df = pd.DataFrame(matrix, columns=['a', 'b', 'c'], index=[1, 2, 3])
print(df)
I want a result where, when we write A[0][0][0], we can also track what the first, second, and third indices represent.
In 2D we get something like:
a b c
1 1 2 3
2 4 5 6
3 7 8 9
In pandas we can use a MultiIndex:
matrix = np.reshape((1, 2, 3, 4, 5, 6, 7, 8), (2, 2, 2))
df = pd.concat([pd.DataFrame(x) for x in matrix], keys=np.arange(matrix.shape[0]))
df[0][0][0]
1
df[0][0][1]
3
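To actually attach names to the three axes, one option is to build the MultiIndex explicitly and give its levels names. The sketch below assumes made-up labels ("p0", "r0", "a", and so on) purely for illustration; the plane and row labels go into the index and the column labels stay as columns.

```python
import numpy as np
import pandas as pd

# 2 planes x 2 rows x 2 columns, holding the values 1..8.
matrix = np.arange(1, 9).reshape(2, 2, 2)

# Hypothetical labels for each axis.
planes = ["p0", "p1"]
rows = ["r0", "r1"]
cols = ["a", "b"]

# Every (plane, row) pair becomes one row of the flattened table.
index = pd.MultiIndex.from_product([planes, rows], names=["plane", "row"])
df = pd.DataFrame(matrix.reshape(-1, matrix.shape[2]), index=index, columns=cols)

# Look up by label: plane 'p1', row 'r0', column 'b'
# corresponds to matrix[1][0][1].
value = df.loc[("p1", "r0"), "b"]
print(df)
print(value)
```

With this layout, printing the DataFrame shows the plane and row names alongside the data, so each position can be traced back to its labels.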

Pandas - Filter dataframe using list of tuples

I need to create a function that filters a dataframe using a list of tuples - taking as arguments a dataframe and a tuple list, as follows:
tuplelist=[('A', 5, 10), ('B', 0, 4),('C', 10, 11)]
What is the proper way to do this?
I have tried the following:
def multcolfilter(data_frame, tuplelist):
    def apply_single_cond(df_0, cond):
        df_1 = df_0[(df_0[cond[0]] > cond[1]) & (df_0[cond[0]] < cond[2])]
        return df_1
    for x in range(len(tuplelist)-1):
        df = apply_single_cond(apply_single_cond(data_frame, tuplelist[x-1]), tuplelist[x])
    return df
Example dataframe and tuplelist:
df = pd.DataFrame({'A':range(1,10), 'B':range(1,10), 'C':range(1,10)})
tuplelist=[('A', 2, 10), ('B', 0, 4),('C', 3, 5)]
Instead of working with tuples, create a dictionary from them:
filters = {x[0]:x[1:] for x in tuplelist}
print(filters)
{'A': (2, 10), 'B': (0, 4), 'C': (3, 5)}
You can use pd.cut to bin the values of the dataframe's columns:
rows = np.asarray([~pd.cut(df[i], filters[i], retbins=False, include_lowest=True).isnull()
                   for i in filters.keys()]).all(axis=0)
Use rows as a boolean indexer of df:
df[rows]
A B C
2 3 3 3
3 4 4 4
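Note that the pd.cut approach appears to treat both ends of each range as inclusive (right-closed bins plus include_lowest=True), whereas the attempt in the question used strict > and <. If strict bounds are what is wanted, a plain boolean mask combined across all tuples is one way to sketch it. The bounds in the tuple list below are chosen only so that the strict filter returns some rows; they are not the ones from the question.

```python
import numpy as np
import pandas as pd

def multcolfilter(data_frame, tuplelist):
    """Keep rows where, for every (column, low, high), low < value < high."""
    mask = np.ones(len(data_frame), dtype=bool)
    for col, low, high in tuplelist:
        # Strict comparison on both sides, as in the question's attempt.
        in_range = (data_frame[col] > low) & (data_frame[col] < high)
        mask &= in_range.to_numpy()
    return data_frame[mask]

df = pd.DataFrame({'A': range(1, 10), 'B': range(1, 10), 'C': range(1, 10)})
# Illustrative bounds (an assumption for this example).
tuplelist = [('A', 2, 10), ('B', 0, 6), ('C', 3, 8)]
result = multcolfilter(df, tuplelist)
print(result)
```

Because every condition is AND-ed into one mask, the order of the tuples does not matter and the DataFrame is indexed only once.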

Convert Pandas Dataframe to Dictionary with Tuple Keys for Ternary plot

I am plotting ternary diagrams with python-ternary.
My data is in a pandas dataframe. I need to convert it to a dictionary mapping (i, j) to a float as input for the heatmap function in ternary.
My dataframe (df) looks like this:
i j value
0 1 2 7
1 3 4 8
2 5 6 9
I need to make a dictionary like this:
{(1, 2): 7, (5, 6): 9, (3, 4): 8}
My current workaround is a brute force loop that is very slow:
import pandas as pd
df = pd.DataFrame({'i': [1, 3, 5], 'j': [2, 4, 6], 'value': [7, 8, 9]})
data = dict()
for k in range(0, len(df)):
    data[(df.iloc[k]['i'], df.iloc[k]['j'])] = \
        df.iloc[k]['value']
Please, could someone help me with a faster or more pythonic way of doing this?
Use set_index with to_dict:
d = df.set_index(['i','j'])['value'].to_dict()
Alternative with zip and dict comprehension:
d = {(a,b):c for a,b,c in zip(df['i'], df['j'], df['value'])}
print (d)
{(1, 2): 7, (3, 4): 8, (5, 6): 9}

Combine multiple columns into 1 column [python,pandas]

I have a pandas data frame with 2 columns:
{'A':[1, 2, 3],'B':[4, 5, 6]}
I want to create a new column where:
{'C':[1 4,2 5,3 6]}
Setup
df = pd.DataFrame({'A':[1, 2, 3],'B':[4, 5, 6]})
Solution
Keep in mind, per your expected output, [1 4,2 5,3 6] isn't a thing. I'm interpreting you to mean either [(1, 4), (2, 5), (3, 6)] or ["1 4", "2 5", "3 6"]
First assumption
df.apply(lambda x: tuple(x.values), axis=1)
0 (1, 4)
1 (2, 5)
2 (3, 6)
dtype: object
Second assumption
df.apply(lambda x: ' '.join(x.astype(str)), axis=1)
0 1 4
1 2 5
2 3 6
dtype: object
Another option is zip. In Python 3, zip returns a lazy iterator, so wrap it in list() to get a column of tuples: df['C'] = list(zip(df.A, df.B)).
It's pretty handy to use zip in this kind of scenario.
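Putting both interpretations into new columns could look like the sketch below. The column names C_tuple and C_str are made up for this example, and the string version uses vectorised string concatenation rather than apply, which is just one of several ways to do it.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# First interpretation: one tuple per row.
# list() materialises the zip iterator before assignment.
df['C_tuple'] = list(zip(df['A'], df['B']))

# Second interpretation: the two values joined by a space.
df['C_str'] = df['A'].astype(str) + ' ' + df['B'].astype(str)

print(df)
```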
