I'm stuck on trying to modify a 2D array... Nothing I try seems to work... I'm trying to write a function that will add a value to its specific location in the numbers column...
import pandas as pd
def twod_array(num):
    data = {"group": [-1, 0, 1, 2],
            'numbers': [[2], [14, 15], [16, 17], [19, 20, 21]],
            }
    df = pd.DataFrame(data=data)
    print(df)
    return 0
Currently it prints this:
group numbers
0 -1 [2]
1 0 [14, 15]
2 1 [16, 17]
3 2 [19, 20, 21]
What I'd like to do is to add a value based on the passed input, so for example if I pass 14.5 as a num, this is the output I'd like to see:
group numbers
0 -1 [2]
1 0 [14, 14.5, 15]
2 1 [16, 17]
3 2 [19, 20, 21]
Another example:
If I pass 18 as a num:
group numbers
0 -1 [2]
1 0 [14, 15]
2 1 [16, 17, 18]
3 2 [19, 20, 21]
I'm hoping someone can help with this.
df = pd.DataFrame({"group": [-1, 0, 1, 2],
                   'numbers': [[2], [14, 15], [16, 17], [19, 20, 21]],
                   })

arr = df['numbers'].to_list()
in_num = 18

inserted = False
for i, sub_arr in enumerate(arr):
    for j, num in enumerate(sub_arr):
        if num > in_num:
            if j != 0 or i == 0:
                arr[i].insert(j, in_num)   # fits inside this sub-list
            else:
                arr[i - 1].append(in_num)  # belongs at the end of the previous sub-list
            inserted = True
            break                          # stop here: inserting while iterating would repeat forever
    if inserted:
        break
if not inserted:
    arr[-1].append(in_num)                 # larger than every existing value

df['numbers'] = arr
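For reference, here is a shorter sketch of the same insertion rule using the standard-library bisect module. It assumes, as in the example, that each sub-list is sorted and the sub-lists cover increasing ranges; insert_num is a hypothetical helper name, not part of the question.

```python
import bisect
import pandas as pd

df = pd.DataFrame({"group": [-1, 0, 1, 2],
                   "numbers": [[2], [14, 15], [16, 17], [19, 20, 21]]})

def insert_num(df, num):
    """Insert num into the sub-list whose value range contains it."""
    lists = df["numbers"]
    for i, sub in enumerate(lists):
        pos = bisect.bisect_left(sub, num)
        if pos == len(sub):
            continue                       # num is larger than this whole sub-list
        if pos > 0 or i == 0:
            sub.insert(pos, num)           # num falls inside this sub-list
        else:
            lists.iloc[i - 1].append(num)  # sub-list starts above num: end of previous one
        return df
    lists.iloc[-1].append(num)             # num is larger than every value
    return df

insert_num(df, 14.5)
insert_num(df, 18)
print(df["numbers"].tolist())
# [[2], [14, 14.5, 15], [16, 17, 18], [19, 20, 21]]
```

bisect_left finds the insertion point within each sub-list in O(log n), so the only remaining scan is over the rows.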
I'm hoping I can explain this well. I have this df with 2 columns: group and numbers. I'm trying to take that np.nan and pop it into its new group.
import numpy as np
import pandas as pd

def check_for_nan():
    # for example, let's say my new value is 14.5
    new_nan_value = 14.5
    data = {"group": [-1, 0, 1, 2, 3],
            'numbers': [[np.nan], [11, 12], [14, 15], [16, 17], [18, 19]],
            }
    df = pd.DataFrame(data=data)
    # *** add some code ***
    # I created a second dataframe to show visually how it should look,
    # but we would want to operate only on the same df from above
    data_2 = {"group": [0, 1, 2, 3],
              'numbers': [[11, 12], [14, np.nan, 15], [16, 17], [18, 19]],
              }
    df_2 = pd.DataFrame(data=data_2)
    # should return the new group number where the nan would live
    return data_2["group"][1]
Output:
current:
group numbers
0 -1 [nan]
1 0 [11, 12]
2 1 [14, 15]
3 2 [16, 17]
4 3 [18, 19]
Desired output when new_nan_value = 14.5
group numbers
0 0 [11, 12]
1 1 [14, nan, 15]
2 2 [16, 17]
3 3 [18, 19]
and the function should return 1 (the new group where the nan lives).
With the dataframe you provided:
import pandas as pd
df = pd.DataFrame(
{
"group": [-1, 0, 1, 2, 3],
"numbers": [[pd.NA], [11, 12], [14, 15], [16, 17], [18, 19]],
}
)
new_nan_value = 14.5
Here is one way to do it:
def move_nan(df, new_nan_value):
    """Helper function.

    Args:
        df: input dataframe.
        new_nan_value: insertion value.

    Returns:
        Dataframe with nan value at insertion point, new group.
    """
    # Reshape dataframe along row axis
    df = df.explode("numbers").dropna().reset_index(drop=True)
    # Insert new row
    insert_pos = df.loc[df["numbers"] < new_nan_value, "numbers"].index[-1] + 1
    df = pd.concat(
        [
            df.loc[: insert_pos - 1, :],
            pd.DataFrame({"group": [pd.NA], "numbers": pd.NA}, index=[insert_pos]),
            df.loc[insert_pos:, :],
        ]
    )
    df["group"] = df["group"].bfill()  # fillna(method="bfill") is deprecated
    # Find new group value
    new_group = df.loc[df["numbers"].isna(), "group"].values[0]
    # Groupby and reshape dataframe along column axis
    df = df.groupby("group").agg(list).reset_index(drop=False)
    return df, new_group
So that:
df, new_group = move_nan(df, 14.5)
print(df)
# Output
group numbers
0 0 [11, 12]
1 1 [14, nan, 15]
2 2 [16, 17]
3 3 [18, 19]
print(new_group) # 1
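A plain-Python alternative, as a sketch under the assumption that each sub-list is sorted and new_nan_value falls strictly inside exactly one of them (move_nan_simple is a hypothetical name): drop the all-NA row, renumber the groups, and splice the nan in directly.

```python
import bisect
import pandas as pd

def move_nan_simple(df, new_nan_value):
    # drop rows whose list holds only missing values, then renumber groups from 0
    keep = df["numbers"].apply(lambda xs: not all(pd.isna(v) for v in xs))
    out = df.loc[keep].reset_index(drop=True)
    out["group"] = range(len(out))
    for i, sub in out["numbers"].items():
        pos = bisect.bisect_left(sub, new_nan_value)
        if 0 < pos < len(sub):           # value falls strictly inside this list
            sub.insert(pos, float("nan"))
            return out, out.loc[i, "group"]
    return out, None                     # no interior slot found

df = pd.DataFrame({"group": [-1, 0, 1, 2, 3],
                   "numbers": [[float("nan")], [11, 12], [14, 15], [16, 17], [18, 19]]})
df, new_group = move_nan_simple(df, 14.5)
print(new_group)  # 1
```

This avoids the explode/concat round-trip, at the cost of a Python-level loop over the rows.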
What I have, and what I need
I have a pandas DataFrame p with cols 'a', 'b', 'c' (col names stored in pc).
From that I would like to create a DataFrame pn of the same shape, but each cell as a list of values from selected rows.
The DataFrame n tells me which rows to select from p for each row in pn.
import pandas as pd
pc = ['a', 'b', 'c']
p = pd.DataFrame([[11, 12, 13],
                  [21, 22, 23]],
                 columns=pc, index=[1001, 1002])
n = pd.DataFrame([[[1001]],
                  [[1001, 1002]]],
                 columns=['sel_row'], index=[1001, 1002])
What I could (and want to) achieve
The farthest I could get gives me a list of columns, rather than rows. So, am I confusing the nested for loops?
pn = pd.DataFrame([ [p.loc[ix, pc].values for ix in n.loc[indx].values[0]]
for indx in n.index ])
print (pn)
# The actual output:
# 0 1
# 0 [11, 12, 13] None
# 1 [11, 12, 13] [21, 22, 23]
# The required output:
# 0 1 2
# 0 [11] [12] [13]
# 1 [11, 21] [12, 22] [13, 23]
Stray thoughts
Maybe I should also iterate something like p.loc[ix, c] ... for c in pc... but how can there be 3 loops?
A further (optional) wish
Is this possible with a lambda too? My intuition says that would be faster, but I'm not sure!
Thanks for going through the question or any help offered.
You can explode the n, use that to slice p and groupby:
s = n['sel_row'].explode()
p.loc[s].groupby(s.index).agg(list)
Output:
a b c
1001 [11] [12] [13]
1002 [11, 21] [12, 22] [13, 23]
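Put together as a runnable snippet on the question's data:

```python
import pandas as pd

pc = ['a', 'b', 'c']
p = pd.DataFrame([[11, 12, 13], [21, 22, 23]], columns=pc, index=[1001, 1002])
n = pd.DataFrame([[[1001]], [[1001, 1002]]], columns=['sel_row'], index=[1001, 1002])

s = n['sel_row'].explode()               # one row per selected label; n's index is kept
pn = p.loc[s].groupby(s.index).agg(list)
print(pn)
```

The key step is grouping by `s.index` (not by the index of `p.loc[s]`): that is what maps each selected row of `p` back to the row of `n` that requested it.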
You can write a custom function here.
pc = ['a', 'b', 'c']
p = pd.DataFrame([[11, 12, 13],
[21, 22, 23]],
columns=pc,
index=[1001,
1002])
n = pd.DataFrame([[[1001] ],
[[1001, 1002]]],
columns=['sel_row'],
index=[1001,
1002])
def f(idx):
    return pd.Series(p.loc[idx, :].values.T.tolist())
n.sel_row.apply(f)
0 1 2
1001 [11] [12] [13]
1002 [11, 21] [12, 22] [13, 23]
With a lambda, the above could be rewritten as:
n.sel_row.apply(lambda idx: pd.Series(p.loc[idx, :].values.T.tolist()))
IIUC, you could do:
data = [[[*x] for x in zip(*p.loc[idxs].values)] for idxs in n['sel_row']]
result = pd.DataFrame(data=data, columns=p.columns, index=p.index)
print(result)
Output
a b c
1001 [11] [12] [13]
1002 [11, 21] [12, 22] [13, 23]
I have a dict: {'2017-05-06': [3, 7, 8], '2017-05-07': [3, 9, 10], '2017-05-08': [4]}
and from it I want another dict:
{'2017-05-06': [15, 11, 10], '2017-05-07': [19, 13, 12], '2017-05-08': [4]}
which means:
In cleand, the first key is 2017-05-06;
its element total is 18, so '2017-05-06': [18-3, 18-7, 18-8] = '2017-05-06': [15, 11, 10],
and likewise for all the other elements.
So the final output is {'2017-05-06': [15, 11, 10], '2017-05-07': [19, 13, 12], '2017-05-08': [4]}
How to do this?
Note : I am using python 3.6.2 and pandas 0.22.0
code so far :
import pandas as pd
dfs = pd.read_excel('ff2.xlsx', sheet_name=None)
dfs1 = {i:x.groupby(pd.to_datetime(x['date']).dt.strftime('%Y-%m-%d'))['duration'].sum() for i, x in dfs.items()}
d = pd.concat(dfs1).groupby(level=1).apply(list).to_dict()
actuald = pd.concat(dfs1).div(80).astype(int)
sum1 = actuald.groupby(level=1).transform('sum')
m = actuald.groupby(level=1).transform('size') > 1
cleand = sum1.sub(actuald).where(m, actuald).groupby(level=1).apply(list).to_dict()
print (cleand)
Starting from cleand, how can I do this?
In a compact (but somehow inefficient) way:
>>> op = {'2017-05-06': [3, 7, 8],'2017-05-07': [3, 9, 10],'2017-05-08': [4]}
>>> { x:[sum(y)-i for i in y] if len(y)>1 else y for x,y in op.items() }
#output:
{'2017-05-06': [15, 11, 10], '2017-05-07': [19, 13, 12], '2017-05-08': [4]}
def get_list_manipulation(list_):
    subtraction = list_
    if len(list_) != 1:
        total = sum(list_)
        subtraction = [total - val for val in list_]
    return subtraction

for key, values in data.items():
    data[key] = get_list_manipulation(values)
>>>{'2017-05-06': [15, 11, 10], '2017-05-07': [19, 13, 12], '2017-05-08': [4]}
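Since the question already works in pandas, the same transform can also be written with explode/transform on the example dict (a sketch on the small dict only; the original Excel pipeline is untouched):

```python
import pandas as pd

op = {'2017-05-06': [3, 7, 8], '2017-05-07': [3, 9, 10], '2017-05-08': [4]}

s = pd.Series(op).explode().astype(int)
total = s.groupby(level=0).transform('sum')   # per-key sum, aligned element-wise
size = s.groupby(level=0).transform('size')   # used to leave single-element keys alone
out = (s.where(size == 1, total - s)
        .groupby(level=0)
        .apply(lambda g: g.tolist())
        .to_dict())
print(out)  # {'2017-05-06': [15, 11, 10], '2017-05-07': [19, 13, 12], '2017-05-08': [4]}
```

`where(size == 1, total - s)` keeps single-element lists as-is and replaces every other element by the group total minus itself, matching the 18-3, 18-7, 18-8 example.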
I have an occurrences DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(1,3,size=(4,3)))
Out[0] :
0 1 2
0 2 2 1
1 2 2 2
2 1 1 1
3 2 1 2
and a list of values:
L = np.random.randint(10, 16, size=df.values.sum())  # random_integers is deprecated; randint's upper bound is exclusive
Out[1] :
array([13, 11, 15, 11, 15, 13, 12, 11, 12, 15, 11, 11, 10, 11, 13, 11, 14,
10, 12])
I need your assistance for creating a new DataFrame of the same size as df which holds the values of the list L according to the occurrences matrix df:
0 1 2
0 [13, 11] [15, 11] [15]
1 [13, 12] [11, 12] [15, 11]
2 [11] [10] [11]
3 [13, 11] [14] [10, 12]
Simple nested loop variant:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 3, size=(4, 3)))
L = np.random.randint(10, 16, size=df.values.sum())  # random_integers is deprecated

new_df = df.astype(object).copy()
L_ind = 0
for i in range(df.shape[0]):
    for j in range(df.shape[1]):
        new_df.at[i, j] = list(L[L_ind: L_ind + df.iloc[i, j]])  # .at sets a single cell, so the list is stored as-is
        L_ind += df.iloc[i, j]
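The explicit loop can also be replaced by splitting L at the running totals of the counts; a sketch (np.split consumes the counts in the same row-major order as the loop above):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randint(1, 3, size=(4, 3)))
L = np.random.randint(10, 16, size=df.values.sum())

counts = df.to_numpy().ravel()                 # cell counts in row-major order
pieces = [p.tolist() for p in np.split(L, np.cumsum(counts)[:-1])]
rows = [pieces[i * df.shape[1]:(i + 1) * df.shape[1]] for i in range(df.shape[0])]
new_df = pd.DataFrame(rows, index=df.index, columns=df.columns)
print(new_df)
```

np.cumsum gives the split points, so each cell receives exactly as many values from L as its count says.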
df:
0 1 2
0 2 2 1
1 1 1 2
2 1 2 2
3 2 2 2
L:
array([15, 12, 10, 12, 13, 15, 13, 13, 15, 13, 15, 15, 12, 11, 14, 11, 10,
15, 15, 13])
new_df:
0 1 2
0 [15, 12] [10, 12] [13]
1 [15] [13] [13, 15]
2 [13] [15, 15] [12, 11]
3 [14, 11] [10, 15] [15, 13]
this code might help
import numpy as np
import pandas as pd

np.random.seed(7)
df = pd.DataFrame(np.random.randint(1, 3, size=(4, 3)))
# print(df)
L = np.random.randint(10, 16, size=df.values.sum())  # random_integers is deprecated

currentIndex = 0
new_df = pd.DataFrame()
for c in df.columns.tolist():
    new_list = []
    for val in df[c]:
        small_list = []
        for i in range(val):
            small_list.append(L[currentIndex])
            currentIndex += 1
        new_list.append(small_list)
    new_df.insert(c, c, new_list)
print(new_df)
new_df
0 1 2
0 [10, 11] [14] [14, 15]
1 [12] [10, 13] [10, 10]
2 [12, 10] [12, 13] [15]
3 [14, 10] [14] [10, 13]
What is the Pythonic way to get a list of diagonal elements in a matrix passing through entry (i,j)?
For e.g., given a matrix like:
[1 2 3 4 5]
[6 7 8 9 10]
[11 12 13 14 15]
[16 17 18 19 20]
[21 22 23 24 25]
and an entry, say, (1,3) (representing element 9) how can I get the elements in the diagonals passing through 9 in a Pythonic way? Basically, [3,9,15] and [5,9,13,17,21] both.
Using np.diagonal with a little offset logic.
import numpy as np
lst = np.array([[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
i, j = 1, 3
major = np.diagonal(lst, offset=(j - i))
print(major)
array([ 3, 9, 15])
minor = np.diagonal(np.rot90(lst), offset=-lst.shape[1] + (j + i) + 1)
print(minor)
array([ 5, 9, 13, 17, 21])
The indices i and j are the row and column. By specifying the offset, numpy knows where to begin selecting elements for the diagonal.
For the major diagonal, you want to start collecting from 3 in the first row, so you subtract the current row index from the current column index to find the correct column at row 0. Similarly for the minor diagonal, where the array is flipped (rotated by 90°) and the process repeats.
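The rotation step can also be avoided with np.fliplr: flipping left-right maps column j to column n-1-j, which turns anti-diagonals into ordinary diagonals, so the same offset reasoning applies.

```python
import numpy as np

lst = np.array([[ 1,  2,  3,  4,  5],
                [ 6,  7,  8,  9, 10],
                [11, 12, 13, 14, 15],
                [16, 17, 18, 19, 20],
                [21, 22, 23, 24, 25]])
i, j = 1, 3
n = lst.shape[1]

major = np.diagonal(lst, offset=j - i)
# after the flip, entry (i, j) sits at column n-1-j, hence this offset
minor = np.diagonal(np.fliplr(lst), offset=(n - 1 - j) - i)
print(major)  # [ 3  9 15]
print(minor)  # [ 5  9 13 17 21]
```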
As an alternative method, by raveling the array (for a matrix of shape (n, n)):
import numpy as np

array = np.array([[ 1,  2,  3,  4,  5],
                  [ 6,  7,  8,  9, 10],
                  [11, 12, 13, 14, 15],
                  [16, 17, 18, 19, 20],
                  [21, 22, 23, 24, 25]])
x, y = 1, 3

a_mod = array.ravel()
size = array.shape[0]

if y >= x:
    diag = a_mod[y - x:(x + size - y) * size:size + 1]
else:
    diag = a_mod[(x - y) * size::size + 1]

if x - (size - 1 - y) >= 0:
    reverse_diag = array[:, ::-1].ravel()[(x - (size - 1 - y)) * size::size + 1]
else:
    # the anti-diagonal through (x, y) starts at row 0, column x + y
    reverse_diag = a_mod[x + y:(x + y) * size + 1:size - 1]

# diag         --> [ 3  9 15]
# reverse_diag --> [ 5  9 13 17 21]
The correctness of the resulting arrays should be checked further. This can be extended to handle matrices of other shapes, e.g. (n, m).
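As that last sentence suggests, slice arithmetic like this is easy to get subtly wrong, so one way to verify it is a brute-force comparison against np.diagonal for every (x, y). This sketch starts the short anti-diagonal slice at x + y, i.e. at row 0, column x + y:

```python
import numpy as np

array = np.arange(1, 26).reshape(5, 5)
size = array.shape[0]
a_mod = array.ravel()
rev = array[:, ::-1].ravel()

for x in range(size):
    for y in range(size):
        # major diagonal through (x, y)
        if y >= x:
            diag = a_mod[y - x:(x + size - y) * size:size + 1]
        else:
            diag = a_mod[(x - y) * size::size + 1]
        assert np.array_equal(diag, np.diagonal(array, offset=y - x)), (x, y)

        # anti-diagonal through (x, y)
        if x - (size - 1 - y) >= 0:
            reverse_diag = rev[(x - (size - 1 - y)) * size::size + 1]
        else:
            reverse_diag = a_mod[x + y:(x + y) * size + 1:size - 1]
        expected = np.diagonal(array[:, ::-1], offset=(size - 1 - y) - x)
        assert np.array_equal(reverse_diag, expected), (x, y)
```

If any slice were off by one, the failing (x, y) pair would be reported by the assertion.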