I have a 4x4 matrix like this
ds1=
4 13 6 9
7 12 5 7
7 0 4 22
9 8 12 0
and another file with two columns:
ds2 =
4 1
5 3
6 1
7 2
8 2
9 3
12 1
13 2
22 3
ds1 = ds1.apply(lambda x: ds2_mean[1] if [condition])
What condition should be added to compare and check that the elements from ds1 and ds2 are equal?
I want each value in matrix 1 that matches a col1 value of the 2nd matrix to be replaced by the corresponding col2 value, so the resultant matrix should look like:
1 2 1 3
2 1 3 2
2 0 1 3
3 2 1 0
Please see "Replacing mean value from one dataset to another"; it does not answer my question.
If you are working with numpy arrays, you could do this -
# Make a copy of ds1 to initialize output array
out = ds1.copy()
# For each element of ds1 that appears in ds2's first column,
# find the index of the matching row in ds2
_,C = np.where(ds1.ravel()[:,None] == ds2[:,0])
# New values taken from the second column of ds2 to be put in output
newvals = ds2[C,1]
# Valid positions in output array to be changed
valid = np.in1d(ds1.ravel(),ds2[:,0])
# Finally make the changes to get desired output
out.ravel()[valid] = newvals
Sample input, output -
In [79]: ds1
Out[79]:
array([[ 4, 13, 6, 9],
[ 7, 12, 5, 7],
[ 7, 0, 4, 22],
[ 9, 8, 12, 0]])
In [80]: ds2
Out[80]:
array([[ 4, 1],
[ 5, 3],
[ 6, 1],
[ 7, 2],
[ 8, 2],
[ 9, 3],
[12, 1],
[13, 2],
[22, 3]])
In [81]: out
Out[81]:
array([[1, 2, 1, 3],
[2, 1, 3, 2],
[2, 0, 1, 3],
[3, 2, 1, 0]])
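As a side note, since the first column of ds2 happens to be sorted in this sample, np.searchsorted offers an equivalent approach. This is only a sketch of an assumed alternative; it requires ds2's first column to be sorted.
import numpy as np

ds1 = np.array([[4, 13, 6, 9], [7, 12, 5, 7], [7, 0, 4, 22], [9, 8, 12, 0]])
ds2 = np.array([[4, 1], [5, 3], [6, 1], [7, 2], [8, 2],
                [9, 3], [12, 1], [13, 2], [22, 3]])

idx = np.searchsorted(ds2[:, 0], ds1.ravel())   # candidate row in ds2 per element
idx[idx == len(ds2)] = 0                        # clamp out-of-range candidates
match = ds2[idx, 0] == ds1.ravel()              # keep exact matches only
out = ds1.copy()
out.ravel()[match] = ds2[idx[match], 1]         # write the mapped values back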
Here is another solution, using the DataFrame.replace() function:
df1.replace(to_replace=df2[0].tolist(), value=df2[1].tolist(), inplace=True)
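For completeness, here is a sketch of how that line might be used end to end; the setup building df1 and df2 from the sample data is an assumption, since the answer does not show it.
import pandas as pd

# Assumed setup: df1 holds the 4x4 matrix, df2 the two-column mapping.
df1 = pd.DataFrame([[4, 13, 6, 9], [7, 12, 5, 7], [7, 0, 4, 22], [9, 8, 12, 0]])
df2 = pd.DataFrame([[4, 1], [5, 3], [6, 1], [7, 2], [8, 2],
                    [9, 3], [12, 1], [13, 2], [22, 3]])

# Replace each df2 col-0 value found in df1 with the matching col-1 value.
df1.replace(to_replace=df2[0].tolist(), value=df2[1].tolist(), inplace=True)
print(df1)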
Related
Given an id in a pandas dataframe, how can I create a new column that has an additional id that maxes out at a count of 5 for each id, almost like "batches" of rows?
df = pd.DataFrame([[1, 1],
[2, 1],
[3, 1],
[4, 1],
[5, 1],
[6, 1],
[7, 1],
[8, 2],
[9, 2],
[10, 3],
[11, 3],
[12, 3],
[13, 4],
[14, 5],
[15, 5],
[16, 5],
[17, 5],
[18, 5],
[19, 5],
[20, 5]])
df.columns = ['ln_num', 'id']
print(df)
#expected output
expected = pd.DataFrame([[1, 1, 1],
[2, 1, 1],
[3, 1, 1],
[4, 1, 1],
[5, 1, 1],
[6, 1, 2],
[7, 1, 2],
[8, 2, 3],
[9, 2, 3],
[10, 3, 4],
[11, 3, 4],
[12, 1, 2],
[13, 1, 2],
[14, 1, 2],
[15, 1, 5],
[16, 4, 6],
[17, 4, 6],
[18, 4, 6],
[19, 3, 4],
[20, 3, 4]])
expected.columns = ['ln_num', 'id', 'grp_id']
print(expected)
So, for example, if I have 11 rows with id=1, I need 3 different unique ids for this subset of rows: 1. lines 1-5, 2. lines 6-10, 3. line 11.
The closest I've gotten so far is using a groupby with a +1 offset, which gives me a new grp_id for each id but doesn't limit it to 5:
df = df.groupby('id').ngroup() + 1
I've also tried head() and nlargest(), but these don't sort ALL lines into batches, only the first or top 5.
I would start by getting all the points where you know the transition will happen:
# Show where column 1 differs from the previous row,
# then cast it to a boolean (True/False) mask
df[1].diff().astype(bool)
We can use this selection on the index of the dataframe to get the indices of rows that change:
df.index[df[1].diff().astype(bool)]
This gives the output Int64Index([0, 7, 9, 12, 13], dtype='int64'), and we can check that rows 0, 7, 9, 12, and 13 are where column 1 changes (row 0 is always included because diff() yields NaN there, which astype(bool) treats as True).
Next, we need to break down any segments that are longer than 5 rows into smaller batches. We'll iterate through each pair of steps and use the range function to split them into batches:
all_steps = [] # Start with an empty list of steps
for i, step in enumerate(steps[:-1]):
all_steps += list(range(step, steps[i+1], 5)) # Add each step, but also any needed 5-steps
Last, we can use all_steps to assign values to the dataframe by index (note that .loc slicing is end-inclusive, but each boundary row is simply overwritten by the next iteration, so the final values come out right):
df['group'] = 0
for i, step in enumerate(all_steps[:-1]):
df.loc[step:all_steps[i+1], 'group'] = i
Putting it all together, we also need to use len(df) a few times, so that the range function knows how long the interval is on the last group.
steps = df.index[df[1].diff().astype(bool)].tolist() + [len(df)] # range needs to know how long the last interval is
all_steps = []
for i, step in enumerate(steps[:-1]):
all_steps += list(range(step, steps[i+1], 5))
all_steps += [len(df)] # needed for indexing
df['group'] = 0
for i, step in enumerate(all_steps[:-1]):
df.loc[step:all_steps[i+1], 'group'] = i
Our final output:
0 1 group
0 1 1 0
1 2 1 0
2 3 1 0
3 4 1 0
4 5 1 0
5 6 1 1
6 7 1 1
7 8 2 2
8 9 2 2
9 10 3 3
10 11 3 3
11 12 3 3
12 13 4 4
13 14 5 5
14 15 5 5
15 16 5 5
16 17 5 5
17 18 5 5
18 19 5 6
19 20 5 6
If you want the groups to start at 1, use the start=1 keyword in the enumerate function.
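As an aside, a more compact groupby-based approach is possible. This is a sketch of an assumed alternative, not part of the answer above, and it uses the question's named columns (ln_num, id) rather than the positional df[1]:
# Number rows within each id, integer-divide by 5 to form sub-batches of
# at most 5 rows, then give each (id, sub-batch) pair its own 1-based id.
sub = df.groupby('id').cumcount() // 5
df['grp_id'] = df.groupby(['id', sub]).ngroup() + 1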
I have a 2D array, and I would like to find the min value in every column and subtract this min value from every column.
For example,
array = [
[1, 2, 4],
[2, 4, 6],
[5, 7, 9]]
The smallest values in the columns are 1, 2, 4.
I would like the result to be
array = [
[0, 0, 0],
[1, 2, 2],
[4, 5, 5]]
How can I achieve this?
If you use a real numpy.array or pandas.DataFrame, then you have arr.min(axis=0), and arr - arr.min(axis=0) does the whole job: the subtraction broadcasts the row of column minimums across every row.
For numpy.array
import numpy as np
data = [
[1, 2, 4],
[2, 4, 6],
[5, 7, 9]
]
arr = np.array(data)
print( arr.min(axis=0) )
print( arr - arr.min(axis=0) )
Result
[1 2 4]
[[0 0 0]
[1 2 2]
[4 5 5]]
Similarly for pandas.DataFrame
import pandas as pd
data = [
[1, 2, 4],
[2, 4, 6],
[5, 7, 9]
]
df = pd.DataFrame(data)
print( df.min(axis=0) )
print( df - df.min(axis=0) )
Result
0 1
1 2
2 4
dtype: int64
0 1 2
0 0 0 0
1 1 2 2
2 4 5 5
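If the data is a plain list of lists rather than a real array, a minimal sketch without numpy or pandas (an addition here, not from the answer above) could look like this:
data = [
    [1, 2, 4],
    [2, 4, 6],
    [5, 7, 9]
]
# Transpose with zip(*data) to get columns, take each column's minimum,
# then subtract the matching minimum within every row.
mins = [min(col) for col in zip(*data)]
result = [[v - m for v, m in zip(row, mins)] for row in data]
print(result)   # [[0, 0, 0], [1, 2, 2], [4, 5, 5]]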
(This question was closed as a duplicate of "Flatten or group array in blocks of columns - NumPy / Python".)
I've got a problem with reshaping a simple 2-d array into another.
Let's assume the matrix:
[[4 1 2 1 2 4 1 2 4]
[2 3 0 3 0 2 3 0 2]
[5 5 1 5 1 5 5 1 5]
[6 6 6 6 6 6 6 6 6]]
What I want to do is reshape it into a (12, 3) matrix, taking it in (4, 3) blocks. That is, I want to get a matrix like:
[[4 1 2
2 3 0
5 5 1
6 6 6
1 2 4
3 0 2
5 1 5
6 6 6
1 2 4
3 0 2
5 1 5
6 6 6]]
I have highlighted the "edge" where the matrix is cut with an additional newline.
I've tried numpy reshape (with every available order parameter value), but I still get an array with "mixed" values.
You can always do this manually for custom reshapes:
import numpy as np
data = [[4, 1, 2, 1, 2, 4, 1, 2, 4],
[2, 3, 0, 3, 0, 2, 3, 0, 2],
[5, 5, 1, 5, 1, 5, 5, 1, 5],
[6, 6, 6, 6, 6, 6, 6, 6, 6]]
X = np.array(data)
Z = np.r_[X[:, 0:3], X[:, 3:6], X[:, 6:9]]
print(Z)
yields
array([[4, 1, 2],
[2, 3, 0],
[5, 5, 1],
[6, 6, 6],
[1, 2, 4],
[3, 0, 2],
[5, 1, 5],
[6, 6, 6],
[1, 2, 4],
[3, 0, 2],
[5, 1, 5],
[6, 6, 6]])
Note the special np.r_ object, which concatenates the arrays along the first axis (rows); here it is just a handy shorthand for np.concatenate.
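If the number of column blocks is not fixed at three, a generalized sketch (an assumption here, requiring the array width to be an exact multiple of the block width) can use np.hsplit:
# Split X into equal-width column blocks, then stack them vertically.
block_width = 3
n_blocks = X.shape[1] // block_width   # assumes width is an exact multiple
Z = np.vstack(np.hsplit(X, n_blocks))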
If you have an x*n matrix, how do you check for a row that contains a certain number, and if so, how do you delete that row?
If you are using pandas, you can create a mask that you can use to index the dataframe, negating the mask with ~:
df = pd.DataFrame(np.arange(12).reshape(3, 4))
# 0 1 2 3
# 0 0 1 2 3
# 1 4 5 6 7
# 2 8 9 10 11
value = 2
If you want to check if the value is contained in a specific column:
df[~(df[2] == value)]
# 0 1 2 3
# 1 4 5 6 7
# 2 8 9 10 11
Or if it can be contained in any column:
df[~(df == value).any(axis=1)]
# 0 1 2 3
# 1 4 5 6 7
# 2 8 9 10 11
Just reassign it to df afterwards.
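For example, combining the mask with the reassignment (this is just the answer's own two steps in one line):
df = df[~(df == value).any(axis=1)]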
This also works if you are using just numpy:
x = np.arange(12).reshape(3, 4)
# array([[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]])
x[~(x == value).any(axis=1)]
# array([[ 4, 5, 6, 7],
# [ 8, 9, 10, 11]])
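An equivalent numpy formulation, offered as an assumed alternative rather than part of the answer above, deletes the offending rows by index with np.delete:
# Assumed alternative, reusing the x and value defined above.
bad_rows = np.where((x == value).any(axis=1))[0]   # indices of rows containing value
x_clean = np.delete(x, bad_rows, axis=0)           # drop those rows
# array([[ 4, 5, 6, 7],
#        [ 8, 9, 10, 11]])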
And finally, if you are using plain Python and have a list of lists, use the built-in any in a list comprehension:
y = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
[row for row in y if not any(x == value for x in row)]
# [[4, 5, 6, 7], [8, 9, 10, 11]]
I have the following dataframe:
index = range(14)
data = [1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2, 1]
df = pd.DataFrame(data=data, index=index, columns = ['A'])
How can I fill the zeros with the previous non-zero value using pandas? Is there a fillna that is not just for NaN?
The output should look like:
[1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2, 1]
(This question was asked before here: "Fill zero values of 1d numpy array with last non-zero values", but that one asks exclusively for a numpy solution.)
You can use replace with method='ffill'
In [87]: df['A'].replace(to_replace=0, method='ffill')
Out[87]:
0 1
1 1
2 1
3 2
4 2
5 4
6 6
7 8
8 8
9 8
10 8
11 8
12 2
13 1
Name: A, dtype: int64
To get a numpy array, work on the values:
In [88]: df['A'].replace(to_replace=0, method='ffill').values
Out[88]: array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2, 1], dtype=int64)
This is a better answer than the previous one, since the previous answer returns a dataframe that hides all zero values.
Instead, if you use the following line of code:
df['A'].mask(df['A'] == 0).ffill(downcast='infer')
then this resolves the problem: it replaces all 0 values with the previous non-zero value.
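Since the question links to a numpy-only thread, here is a numpy sketch of the same forward-fill idea. It is an addition here, not part of either answer, and it assumes the first element is non-zero (otherwise leading zeros would wrongly copy index 0):
import numpy as np

arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2, 1])
# At each position, record the index of the most recent non-zero element,
# then gather those elements to forward-fill the zeros.
idx = np.maximum.accumulate(np.where(arr != 0, np.arange(len(arr)), 0))
out = arr[idx]
print(out)   # [1 1 1 2 2 4 6 8 8 8 8 8 2 1]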