What I have, and what I need
I have a pandas DataFrame p with cols 'a', 'b', 'c' (col names stored in pc).
From that I would like to create a DataFrame pn of the same shape, but each cell as a list of values from selected rows.
The DataFrame n tells me which rows to select from p for each row in pn.
import pandas as pd
pc = ['a', 'b', 'c']
p = pd.DataFrame([[11, 12, 13],
[21, 22, 23]],
columns=pc,
index=[1001,
1002])
n = pd.DataFrame([[[1001] ],
[[1001, 1002]]],
columns=['sel_row'],
index=[1001,
1002])
What I could (and want to) achieve
The farthest I could get... gives me a list of cols, rather than rows.
So, am I confusing the nested for loops ?
pn = pd.DataFrame([ [p.loc[ix, pc].values for ix in n.loc[indx].values[0]]
for indx in n.index ])
print (pn)
# The actual output:
# 0 1
# 0 [11, 12, 13] None
# 1 [11, 12, 13] [21, 22, 23]
# The required output:
# 0 1 2
# 0 [11] [12] [13]
# 1 [11, 21] [12, 22] [13, 23]
Stray thoughts
Maybe I should also iterate something like p.loc[ix, c] ... for c in pc... but how can there be 3 loops ??
A further (optional) wish
Is this possible with lambda too ? My intuition is: that would be faster-- but not sure !
Thanks for going through the question or any help offered.
You can explode n, use the result to select rows from p, and then group by the original index:
s = n['sel_row'].explode()
p.loc[s].groupby(s.index).agg(list)
Output:
a b c
1001 [11] [12] [13]
1002 [11, 21] [12, 22] [13, 23]
You can write a custom function here.
pc = ['a', 'b', 'c']
p = pd.DataFrame([[11, 12, 13],
[21, 22, 23]],
columns=pc,
index=[1001,
1002])
n = pd.DataFrame([[[1001] ],
[[1001, 1002]]],
columns=['sel_row'],
index=[1001,
1002])
def f(idx):
    return pd.Series(p.loc[idx, :].values.T.tolist())
n.sel_row.apply(f)
0 1 2
1001 [11] [12] [13]
1002 [11, 21] [12, 22] [13, 23]
With a lambda, the above could be rewritten as:
n.sel_row.apply(lambda idx: pd.Series(p.loc[idx, :].values.T.tolist()))
IIUC, you could do:
data = [[[*x] for x in zip(*p.loc[idxs].values)] for idxs in n['sel_row']]
result = pd.DataFrame(data=data, columns=p.columns, index=p.index)
print(result)
Output
a b c
1001 [11] [12] [13]
1002 [11, 21] [12, 22] [13, 23]
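On the "three loops" idea from the question, here is a minimal sketch, assuming the same p, pc and n as above. The outer loop walks the row selections in n, the inner loop walks the columns in pc, and the third (per-row) loop is replaced by one label-based selection with p.loc. It is only one possible arrangement and is not claimed to be the fastest.

```python
import pandas as pd

pc = ['a', 'b', 'c']
p = pd.DataFrame([[11, 12, 13], [21, 22, 23]], columns=pc, index=[1001, 1002])
n = pd.DataFrame([[[1001]], [[1001, 1002]]], columns=['sel_row'], index=[1001, 1002])

# Outer loop: one output row per selection list in n.
# Inner loop: one cell per column; p.loc[rows, c] picks all selected rows at once.
pn = pd.DataFrame(
    [[p.loc[rows, c].tolist() for c in pc] for rows in n['sel_row']],
    columns=pc,
    index=n.index,
)
print(pn)
```

Each cell is built with .tolist(), so it holds plain Python ints rather than NumPy scalars.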
Related
I'm stuck trying to modify a 2D array... Nothing I try seems to work... I'm trying to write a function that will add a value at its proper location in the numbers column...
import pandas as pd
def twod_array(num):
    data = {"group": [-1, 0, 1, 2],
            'numbers': [[2], [14, 15], [16, 17], [19, 20, 21]],
            }
    df = pd.DataFrame(data=data)
    print(df)
    return 0
Currently it prints this:
group numbers
0 -1 [2]
1 0 [14, 15]
2 1 [16, 17]
3 2 [19, 20, 21]
What I'd like to do is to add a value based on the passed input, so for example if I pass 14.5 as a num, this is the output I'd like to see:
group numbers
0 -1 [2]
1 0 [14, 14.5, 15]
2 1 [16, 17]
3 2 [19, 20, 21]
Another example:
If I pass 18 as a num:
group numbers
0 -1 [2]
1 0 [14, 15]
2 1 [16, 17, 18]
3 2 [19, 20, 21]
I'm hoping someone can help with this.
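import pandas as pd
df = pd.DataFrame({"group": [-1, 0, 1, 2],
                   'numbers': [[2], [14, 15], [16, 17], [19, 20, 21]],
                   })
arr = df['numbers'].to_list()
in_num = 18
inserted = False
for i, sub_arr in enumerate(arr):
    for j, num in enumerate(sub_arr):
        if num > in_num:
            # First larger value found: insert before it, or append to the
            # previous group when it is the first value of this group.
            # (Assumes in_num is not smaller than the very first value.)
            if j != 0:
                sub_arr.insert(j, in_num)
            else:
                arr[i - 1].append(in_num)
            inserted = True
            break  # stop scanning, otherwise the value is inserted repeatedly
    if inserted:
        break
df['numbers'] = arr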
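As an alternative sketch of the same rule, the standard-library bisect module can place the value inside the chosen group. The helper name insert_number is hypothetical, and the sketch assumes every list is non-empty and sorted, and that the groups themselves are in ascending order.

```python
import bisect

import pandas as pd


def insert_number(df, num, col="numbers"):
    """Return a copy of df with num inserted into the matching list in col."""
    lists = [list(x) for x in df[col]]  # copy so the input frame is untouched
    for i, lst in enumerate(lists):
        if num < lst[0]:
            # Falls in the gap before this group: goes to the previous group
            # (or to the first group if there is no previous one).
            target = lists[max(i - 1, 0)]
            break
        if num <= lst[-1]:
            # Falls inside this group's range.
            target = lst
            break
    else:
        # Larger than every existing value: goes to the last group.
        target = lists[-1]
    bisect.insort(target, num)  # keeps the chosen list sorted
    out = df.copy()
    out[col] = lists
    return out


df = pd.DataFrame({"group": [-1, 0, 1, 2],
                   "numbers": [[2], [14, 15], [16, 17], [19, 20, 21]]})
print(insert_number(df, 14.5))
print(insert_number(df, 18))
```

Returning a new frame rather than mutating df is a design choice made for this sketch; it keeps repeated calls independent of each other.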
I'm hoping I can explain this well. I have this df with 2 columns: group and numbers. I'm trying to take that np.nan and pop it into its new group.
def check_for_nan():
    # for example let's say my new value is 14.5
    new_nan_value=14.5
    data = {"group:" : [-1,0,1,2,3],
            'numbers': [[np.nan], [11, 12], [14, 15], [16, 17], [18, 19]],
            }
    df = pd.DataFrame(data=data)
    # *** add some code ***
    # I created a new dataframe to visually show how it should look like but we would want to operate only on the same df from above
    data_2 = {"group" : [0,1,2,3],
              'numbers': [[11, 12], [14,np.nan, 15], [16, 17], [18, 19]],
              }
    df_2 = pd.DataFrame(data=data_2)
    # should return the new group number where the nan would live
    return data_2["group"][1]
Output:
current:
group: numbers
0 -1 [nan]
1 0 [11, 12]
2 1 [14, 15]
3 2 [16, 17]
4 3 [18, 19]
Desired output when new_nan_value =14.5
group numbers
0 0 [11, 12]
1 1 [14, nan, 15]
2 2 [16, 17]
3 3 [18, 19]
return 1
With the dataframe you provided:
import pandas as pd
df = pd.DataFrame(
{
"group": [-1, 0, 1, 2, 3],
"numbers": [[pd.NA], [11, 12], [14, 15], [16, 17], [18, 19]],
}
)
new_nan_value = 14.5
Here is one way to do it:
def move_nan(df, new_nan_value):
    """Helper function.
    Args:
        df: input dataframe.
        new_nan_value: insertion value.
    Returns:
        Dataframe with nan value at insertion point, new group.
    """
    # Reshape dataframe along row axis
    df = df.explode("numbers").dropna().reset_index(drop=True)
    # Insert new row
    insert_pos = df.loc[df["numbers"] < new_nan_value, "numbers"].index[-1] + 1
    df = pd.concat(
        [
            df.loc[: insert_pos - 1, :],
            pd.DataFrame({"group": [pd.NA], "numbers": pd.NA}, index=[insert_pos]),
            df.loc[insert_pos:, :],
        ]
    )
    # Back-fill the group of the inserted row from the row that follows it
    df["group"] = df["group"].bfill()
    # Find new group value
    new_group = df.loc[df["numbers"].isna(), "group"].values[0]
    # Groupby and reshape dataframe along column axis
    df = df.groupby("group").agg(list).reset_index(drop=False)
    return df, new_group
So that:
df, new_group = move_nan(df, 14.5)
print(df)
# Output
group numbers
0 0 [11, 12]
1 1 [14, nan, 15]
2 2 [16, 17]
3 3 [18, 19]
print(new_group) # 1
I have a dataset that has a particular column having values similar to the dummy data frame below with column col2. The column entries are either a list or a list of lists and I want to flatten only the list of lists to a single list.
col1 col2
0 tom [10]
1 nick [15, 24]
2 juli [[16, 14], [19, 17]]
3 harry [23, 15]
4 frank [[15, 16], [50, 30]]
I want my expected dataframe to resemble something like this -
col1 col2
0 tom [10]
1 nick [15, 24]
2 juli [16, 14, 19, 17]
3 harry [23, 15]
4 frank [15, 16, 50, 30]
I tried using DF['col2'] = DF.col2.apply(lambda x: sum(x, [])), but it failed with the error: TypeError: can only concatenate list (not "str") to list
How can I solve this elegantly?
You can use np.ravel, as follows:
import numpy as np
df['col2'] = df['col2'].map(np.ravel)
Note that this assumes your lists are real lists rather than strings that look like lists. If they are strings, you can convert them to real lists first, as follows:
import ast
df['col2'] = df['col2'].apply(ast.literal_eval)
# Then, run the code:
df['col2'] = df['col2'].map(np.ravel)
Result:
print(df)
col1 col2
0 tom [10]
1 nick [15, 24]
2 juli [16, 14, 19, 17]
3 harry [23, 15]
4 frank [15, 16, 50, 30]
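One thing to keep in mind is that np.ravel returns NumPy arrays, so each cell ends up as an array rather than a Python list. If plain lists are wanted, a minimal pure-Python sketch could look like the following. The helper name flatten_once is hypothetical, and the sketch assumes the cells are real lists nested at most one level deep.

```python
import pandas as pd


def flatten_once(value):
    """Flatten a list of lists by one level; leave a flat list unchanged."""
    if not any(isinstance(item, list) for item in value):
        return value  # already flat
    flat = []
    for item in value:
        if isinstance(item, list):
            flat.extend(item)
        else:
            flat.append(item)
    return flat


df = pd.DataFrame({
    "col1": ["tom", "nick", "juli", "harry", "frank"],
    "col2": [[10], [15, 24], [[16, 14], [19, 17]], [23, 15], [[15, 16], [50, 30]]],
})
df["col2"] = df["col2"].apply(flatten_once)
print(df)
```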
I have an OP: {'2017-05-06': [3, 7, 8],'2017-05-07': [3, 9, 10],'2017-05-08': [4]}
From this OP I want to derive another OP:
{'2017-05-06': [15, 11, 10],'2017-05-07': [19, 13, 12],'2017-05-08': [4]}
which means:
In cleand, take 2017-05-06:
the element total is 18, so '2017-05-06': [18-3, 18-7, 18-8] = '2017-05-06': [15, 11, 10],
and likewise for the elements of every other key.
So final output is {'2017-05-06': [15, 11, 10],'2017-05-07': [19, 13, 12],'2017-05-08': [4]}
How to do this?
Note : I am using python 3.6.2 and pandas 0.22.0
code so far :
import pandas as pd
dfs = pd.read_excel('ff2.xlsx', sheet_name=None)
dfs1 = {i:x.groupby(pd.to_datetime(x['date']).dt.strftime('%Y-%m-%d'))['duration'].sum() for i, x in dfs.items()}
d = pd.concat(dfs1).groupby(level=1).apply(list).to_dict()
actuald = pd.concat(dfs1).div(80).astype(int)
sum1 = actuald.groupby(level=1).transform('sum')
m = actuald.groupby(level=1).transform('size') > 1
cleand = sum1.sub(actuald).where(m, actuald).groupby(level=1).apply(list).to_dict()
print (cleand)
How can I do this starting from cleand?
In a compact (but somewhat inefficient) way:
>>> op = {'2017-05-06': [3, 7, 8],'2017-05-07': [3, 9, 10],'2017-05-08': [4]}
>>> { x:[sum(y)-i for i in y] if len(y)>1 else y for x,y in op.items() }
#output:
{'2017-05-06': [15, 11, 10], '2017-05-07': [19, 13, 12], '2017-05-08': [4]}
def get_list_manipulation(list_):
    # Single-element lists are returned unchanged
    subtraction = list_
    if len(list_) != 1:
        total = sum(list_)
        subtraction = [total-val for val in list_]
    return subtraction
data = {'2017-05-06': [3, 7, 8],'2017-05-07': [3, 9, 10],'2017-05-08': [4]}
for key, values in data.items():
    data[key] = get_list_manipulation(values)
print(data)
#output:
{'2017-05-06': [15, 11, 10], '2017-05-07': [19, 13, 12], '2017-05-08': [4]}
I have an occurrences DataFrame :
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(1,3,size=(4,3)))
Out[0] :
0 1 2
0 2 2 1
1 2 2 2
2 1 1 1
3 2 1 2
and a list of values :
L = np.random.random_integers(10,15,size=df.values.sum())
Out[1] :
array([13, 11, 15, 11, 15, 13, 12, 11, 12, 15, 11, 11, 10, 11, 13, 11, 14,
10, 12])
I need your help creating a new DataFrame of the same size as df, whose cells hold the values of the list L, distributed according to the occurrences matrix df:
0 1 2
0 [13, 11] [15, 11] [15]
1 [13, 12] [11, 12] [15, 11]
2 [11] [10] [11]
3 [13, 11] [14] [10, 12]
Simple nested loop variant:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(1,3,size=(4,3)))
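# randint's upper bound is exclusive; random_integers (inclusive) is deprecated
L = np.random.randint(10, 16, size=df.values.sum())
new_df = df.astype(object).copy()
L_ind = 0
for i in range(df.shape[0]):
    for j in range(df.shape[1]):
        # .at sets a single cell, so the list is stored as one object
        new_df.at[i, j] = list(L[L_ind: L_ind + df.iloc[i, j]])
        L_ind += df.iloc[i, j]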
df:
0 1 2
0 2 2 1
1 1 1 2
2 1 2 2
3 2 2 2
L:
array([15, 12, 10, 12, 13, 15, 13, 13, 15, 13, 15, 15, 12, 11, 14, 11, 10,
15, 15, 13])
new_df:
0 1 2
0 [15, 12] [10, 12] [13]
1 [15] [13] [13, 15]
2 [13] [15, 15] [12, 11]
3 [14, 11] [10, 15] [15, 13]
This code might help:
import numpy as np
import pandas as pd
np.random.seed(7)
df = pd.DataFrame(np.random.randint(1,3,size=(4,3)))
# print df
L = np.random.random_integers(10,15,size=df.values.sum())
currentIndex=0
new_df = pd.DataFrame()
for c in df.columns.tolist():
    new_list = []
    for val in df[c]:
        small_list = []
        for i in range(val):
            small_list.append(L[currentIndex])
            currentIndex+=1
        new_list.append(small_list)
    new_df.insert(c,c,new_list)
print(new_df)
new_df
0 1 2
0 [10, 11] [14] [14, 15]
1 [12] [10, 13] [10, 10]
2 [12, 10] [12, 13] [15]
3 [14, 10] [14] [10, 13]
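For comparison, here is a loop-free sketch that cuts L at the running totals of the counts with np.split. It fills the cells row by row, which matches the expected output in the question (the column-by-column loop above consumes L in a different order). The seed and the use of np.random.default_rng are assumptions made only to keep the example self-contained.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, only for reproducibility
df = pd.DataFrame(rng.integers(1, 3, size=(4, 3)))       # counts of 1 or 2
L = rng.integers(10, 16, size=int(df.values.sum()))      # values 10..15

counts = df.to_numpy().ravel()        # counts in row-major order
cuts = np.cumsum(counts)[:-1]         # positions where one cell ends and the next begins
chunks = [chunk.tolist() for chunk in np.split(L, cuts)]

n_cols = df.shape[1]
new_df = pd.DataFrame(
    [chunks[r * n_cols:(r + 1) * n_cols] for r in range(df.shape[0])],
    index=df.index,
    columns=df.columns,
)
print(new_df)
```

Every cell of new_df has as many values as the matching cell of df, and reading the cells left to right, top to bottom gives back L.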