How to generate a result list from pandas DataFrame

I need to generate a list from a pandas DataFrame. I am able to print the result, but I don't know how to convert it to list format. The code I used to print the result (without converting to a list) is:
import pandas
df = pandas.DataFrame(processed_data_format, columns=["file_name", "innings", "over", "ball", "individual ball", "runs", "batsman", "wicket_status", "bowler_name", "fielder_name"])
# processed_data_format is the list passed to the DataFrame
t = df.groupby(['batsman', 'over'])[['runs', 'ball']].sum()
print(t)
I am getting the result like:
Sangakara 1 10 5
2 0 2
3 3 1
sewag 1 2 1
2 1 1
I would like to convert this data into a list format like:
[['Sangakara', 1, 10, 5], ['Sangakara', 2, 0, 2], ['Sangakara', 3, 3, 1], ['sewag', 1, 2, 1], ['sewag', 2, 1, 1]]

You can use to_records:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 3, 4, 5]})
>>> list(df.to_records())
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
This gives a list of NumPy records, which behave like tuples (the first field is the index). Converting them to a list of lists, if you really need that at all, is easy.
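Applied to the grouped result in the question, the conversion can be sketched as follows. The input rows are hypothetical, chosen only so the sums reproduce the output shown in the question. `reset_index()` moves `batsman` and `over` out of the index and back into ordinary columns, after which `.values.tolist()` yields a list of lists. The `to_records` route from the answer gives the same rows once each record is passed through `list()`.

```python
import pandas as pd

# Hypothetical input rows, chosen only so the sums match the output in the question
df = pd.DataFrame({
    "batsman": ["Sangakara", "Sangakara", "Sangakara", "Sangakara", "sewag", "sewag"],
    "over": [1, 1, 2, 3, 1, 2],
    "runs": [4, 6, 0, 3, 2, 1],
    "ball": [2, 3, 2, 1, 1, 1],
})

# Select several columns with a list (double brackets)
t = df.groupby(["batsman", "over"])[["runs", "ball"]].sum()

# reset_index() turns the (batsman, over) index levels back into columns,
# so each row becomes [batsman, over, runs, ball]
rows = t.reset_index().values.tolist()
print(rows)

# Same rows via to_records: each record is tuple-like, so list() converts it
rows_via_records = [list(rec) for rec in t.reset_index().to_records(index=False)]
```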

Related

Extract data from table column and make variables in Python

I have a dataset where I want to make a new variable every time the 'Recording' number changes. I want the new variable to include the 'Duration' data for that 'Recording' and all previous data. So for the table below it would be:
Var1 = (3, 3, 3)
Var2 = (3, 3, 3, 4, 6)
Var3 = (3, 3, 3, 4, 6, 4, 3, 1, 4)
And so on. I have several datasets that can have a different number of recordings (but always starting from 1) and a different number of durations for each recording. Any help is greatly appreciated.
Recording  Duration
1          3
1          3
1          3
2          4
2          6
3          4
3          3
3          1
3          4

You can aggregate each group's durations into a list, take the cumulative sum of those lists (which concatenates them), then convert to tuples and a dictionary:
d = df.groupby('Recording')['Duration'].agg(list).cumsum().apply(tuple).to_dict()
print (d)
{1: (3, 3, 3), 2: (3, 3, 3, 4, 6), 3: (3, 3, 3, 4, 6, 4, 3, 1, 4)}
print (d[1])
print (d[2])
print (d[3])
Your desired output (separate variables Var1, Var2, ...) is possible, but not recommended:
s = df.groupby('Recording')['Duration'].agg(list).cumsum().apply(tuple)
for k, v in s.items():
    globals()[f'Var{k}'] = v
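If you would rather not rely on `cumsum` over an object column of lists, the same dict can be sketched with the standard library's `itertools.accumulate`, whose default operation (`+`) concatenates lists. This is a swapped-in technique, not part of the answer above. The sample frame reuses the data from the question.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({
    "Recording": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Duration": [3, 3, 3, 4, 6, 4, 3, 1, 4],
})

# One list of durations per recording; groupby sorts the recording numbers
per_recording = df.groupby("Recording")["Duration"].agg(list)

# accumulate() with its default + concatenates lists: [g1], [g1 + g2], ...
running = accumulate(per_recording.tolist())

# int() keeps the dict keys as plain Python ints
d = {int(rec): tuple(vals) for rec, vals in zip(per_recording.index, running)}
print(d)
```

`d[2]` then holds `(3, 3, 3, 4, 6)`, just like the `cumsum` version.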

jezrael's answer is beautiful and definitely better :). But if you really want to do this as a loop (perhaps because you might want to modify the logic further in future), then you could:
import pandas as pd
df = pd.DataFrame({
    "Recording": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Duration": [3, 3, 3, 4, 6, 4, 3, 1, 4]
})  # your example data
records = {}
record = []
last_recording = None  # flag to track change in recording
for r, d in zip(df.Recording, df.Duration):
    if record and r != last_recording:
        records[last_recording] = tuple(record)
    record.append(d)
    last_recording = r
records[last_recording] = tuple(record)  # capture final group
print(records)
I modified it to produce a dict instead of separate variables (which seems sensible). Because it loops in pure Python, this will be slow for large datasets.

Is there any method or function in Python to name the sides of a 3-dimensional matrix, like the pandas DataFrame does in 2-D?

I want to name/index the sides of a 3D matrix (plane, row, column) in Python code, like we can do in 2D with the pandas DataFrame:
import numpy as np
import pandas as pd
column_names, row_names = ['a', 'b', 'c'], [1, 2, 3]
matrix = np.reshape((1, 2, 3, 4, 5, 6, 7, 8, 9), (3, 3))
df = pd.DataFrame(matrix, columns=column_names, index=row_names)
print(df)
I want a result where, when we write A[0][0][0], we can also tell what the first, second and third indices represent.
In 2D we get something like:
a b c
1 1 2 3
2 4 5 6
3 7 8 9

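In pandas you can use a MultiIndex:
import numpy as np
import pandas as pd
matrix = np.reshape((1, 2, 3, 4, 5, 6, 7, 8), (2, 2, 2))
# stack one DataFrame per plane; the plane number becomes the outer index level
df = pd.concat([pd.DataFrame(x) for x in matrix], keys=np.arange(matrix.shape[0]))
# chained indexing here reads df[column][plane][row]
df[0][0][0]
1
df[0][0][1]
3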
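A fully labelled alternative is a Series whose MultiIndex names all three axes. It is sketched here as an assumption, not taken from the answer above. `pd.MultiIndex.from_product` enumerates (plane, row, column) in the same C order in which `ravel()` flattens the array, so every value lines up with its labels. The level names and the column labels `a`/`b` are illustrative.

```python
import numpy as np
import pandas as pd

matrix = np.reshape((1, 2, 3, 4, 5, 6, 7, 8), (2, 2, 2))  # (plane, row, column)

index = pd.MultiIndex.from_product(
    [range(matrix.shape[0]), range(matrix.shape[1]), ["a", "b"]],
    names=["plane", "row", "column"],
)
# ravel() flattens in C order: plane slowest, column fastest, matching from_product
s = pd.Series(matrix.ravel(), index=index)

print(s.loc[(0, 1, "a")])          # same element as matrix[0, 1, 0]
print(s.loc[1].unstack("column"))  # plane 1 shown as a rows x columns table
```

Unlike chained `df[...][...][...]` lookups, `s.loc[(plane, row, column)]` states the meaning of each position, and `s.index.names` reports it.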

Pandas - Filter dataframe using list of tuples

I need to create a function that filters a dataframe using a list of tuples. It takes a dataframe and a tuple list as arguments, where each tuple is (column, lower bound, upper bound), for example:
tuplelist=[('A', 5, 10), ('B', 0, 4),('C', 10, 11)]
What is the proper way to do this?
I have tried the following:
def multcolfilter(data_frame, tuplelist):
    def apply_single_cond(df_0, cond):
        df_1 = df_0[(df_0[cond[0]] > cond[1]) & (df_0[cond[0]] < cond[2])]
        return df_1
    for x in range(len(tuplelist) - 1):
        df = apply_single_cond(apply_single_cond(data_frame, tuplelist[x - 1]), tuplelist[x])
    return df

Example dataframe and tuplelist:
df = pd.DataFrame({'A':range(1,10), 'B':range(1,10), 'C':range(1,10)})
tuplelist=[('A', 2, 10), ('B', 0, 4),('C', 3, 5)]
Instead of working with tuples, create a dictionary from them:
filters = {x[0]:x[1:] for x in tuplelist}
print(filters)
{'A': (2, 10), 'B': (0, 4), 'C': (3, 5)}
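You can use pd.cut to bin each column's values into the single interval given by its bounds. Values outside that interval become NaN, so ~...isnull() marks the rows that pass each filter, and .all(axis=0) keeps the rows that pass all of them:
import numpy as np
rows = np.asarray([~pd.cut(df[i], filters[i], retbins=False, include_lowest=True).isnull()
                   for i in filters.keys()]).all(axis=0)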
Use rows as a boolean indexer of df:
df[rows]
A B C
2 3 3 3
3 4 4 4
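Without `pd.cut`, the function the question asks for can be sketched by combining one boolean mask per tuple with `Series.between`. By default it includes both bounds, which is the same inclusivity the `pd.cut` version gets from `include_lowest=True` on a single bin. Reading each tuple as (column, low, high) is inferred from the question.

```python
import pandas as pd


def multcolfilter(data_frame, tuplelist):
    """Keep rows where every (column, low, high) condition holds, bounds inclusive."""
    mask = pd.Series(True, index=data_frame.index)
    for col, low, high in tuplelist:
        # AND this column's range check into the running mask
        mask &= data_frame[col].between(low, high)
    return data_frame[mask]


df = pd.DataFrame({'A': range(1, 10), 'B': range(1, 10), 'C': range(1, 10)})
tuplelist = [('A', 2, 10), ('B', 0, 4), ('C', 3, 5)]
result = multcolfilter(df, tuplelist)
print(result)
```

If you want the strict bounds from the question's own attempt, `between(low, high, inclusive="neither")` (pandas 1.3+) excludes both ends.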

Convert Pandas Dataframe to Dictionary with Tuple Keys for Ternary plot

I am plotting ternary diagrams with python-ternary.
My data is in a pandas dataframe. I need to convert it to a dictionary mapping (i, j) to a float as input for the heatmap function in ternary.
My dataframe (df) looks like this:
i j value
0 1 2 7
1 3 4 8
2 5 6 9
I need to make a dictionary like this:
{(1, 2): 7, (5, 6): 9, (3, 4): 8}
My current workaround is a brute force loop that is very slow:
import pandas as pd
df = pd.DataFrame({'i': [1, 3, 5], 'j': [2, 4, 6], 'value': [7, 8, 9]})
data = dict()
for k in range(0, len(df)):
    data[(df.iloc[k]['i'], df.iloc[k]['j'])] = \
        df.iloc[k]['value']
Please, could someone help me with a faster or more pythonic way of doing this?

Use set_index with to_dict:
d = df.set_index(['i','j'])['value'].to_dict()
Alternative with zip and dict comprehension:
d = {(a,b):c for a,b,c in zip(df['i'], df['j'], df['value'])}
print (d)
{(1, 2): 7, (3, 4): 8, (5, 6): 9}

Combine multiple columns into 1 column [python,pandas]

I have a pandas data frame with 2 columns:
{'A':[1, 2, 3],'B':[4, 5, 6]}
I want to create a new column where:
{'C':[1 4,2 5,3 6]}

Setup
df = pd.DataFrame({'A':[1, 2, 3],'B':[4, 5, 6]})
Solution
Keep in mind, per your expected output, [1 4,2 5,3 6] isn't valid Python. I'm interpreting you to mean either [(1, 4), (2, 5), (3, 6)] or ["1 4", "2 5", "3 6"].
First assumption
df.apply(lambda x: tuple(x.values), axis=1)
0 (1, 4)
1 (2, 5)
2 (3, 6)
dtype: object
Second assumption
df.apply(lambda x: ' '.join(x.astype(str)), axis=1)
0 1 4
1 2 5
2 3 6
dtype: object

You can build the column with zip. In Python 3, zip returns a lazy iterator, so cast it with list() to store one tuple per row: df['C'] = list(zip(df.A, df.B)).
Please refer to this post. zip is pretty handy in this kind of scenario.
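Both interpretations can also be built without a row-wise `apply`. This sketch uses standard pandas operations and is not from either answer. `C_tuple` and `C_str` are illustrative column names. Column-wise string concatenation is typically faster than `apply(..., axis=1)` on large frames.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# First assumption: one tuple per row
df['C_tuple'] = list(zip(df['A'], df['B']))

# Second assumption: one space-separated string per row, built column-wise
df['C_str'] = df['A'].astype(str) + ' ' + df['B'].astype(str)

print(df)
```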

We Keep Coding
