Let's assume I have the following data frame:
x y
1 -1.808909 0.093380
2 1.733595 -0.380938
3 -1.385898 0.714071
And I want to insert a value in the column after "y".
However, it's possible that I might insert more than one value.
So, I need to check whether the cell after "y" is empty, to avoid overwriting it.
The expected output might look like this:
x y
1 -1.808909 0.093380 5
2 1.733595 -0.380938 6 7
3 -1.385898 0.714071 8
Compared to the input above, I first need to check whether the cell is empty.
I thought I might use: x = df.iloc[1,:].last_valid_index()
but that method returns the label "y", not the positional index of "y", which is 1.
Later I'll use that index to insert "5":
x += 1
df.iloc[1, x] = 5
I want to use this approach of finding the last non-empty cell because of the 2nd row in the output.
You can see that I need to insert "6" and then "7".
If I always ended up using the same fixed position, like this:
df.iloc[1,2] = 6
df.iloc[1,2] = 7
it'll overwrite the "6" when inserting the "7".
One more thing: I can't look up the position with something like (df['y'].iloc[2]).index, because later I'll have two "y" columns, so that might return a smaller index than the one required.
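To make the goal concrete, here is a sketch of the behaviour I'm after (insert_after_last is a hypothetical helper; it assumes empty cells are NaN and works positionally, so duplicate column labels don't matter):

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [-1.808909, 1.733595, -1.385898],
                   'y': [0.093380, -0.380938, 0.714071]})

def insert_after_last(df, row, value):
    # Positional index of the first empty (NaN) cell after the last filled one
    filled = np.where(df.iloc[row].notna().to_numpy())[0]
    pos = filled[-1] + 1 if len(filled) else 0
    if pos >= df.shape[1]:          # no empty cell left: append a new column
        df[df.shape[1]] = np.nan
    df.iloc[row, pos] = value

insert_after_last(df, 1, 6)
insert_after_last(df, 1, 7)   # lands one cell to the right, not on top of the 6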
It is easy to identify the position of the first zero in each row in a numpy array or a dataframe. Let's create a dataframe with zeros after a certain position:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 10, size=(5, 10)))
for i in range(len(df)):      # zero out each row after a growing cutoff
    df.iloc[i, 5 + i:] = 0
df
0 1 2 3 4 5 6 7 8 9
0 4 1 4 2 6 0 0 0 0 0
1 5 4 9 5 5 4 0 0 0 0
2 6 6 6 5 4 8 6 0 0 0
3 5 3 9 5 3 9 6 3 0 0
4 3 2 7 9 7 6 6 7 5 0
For instance, the code below will give you all the positions in the dataframe where the value is 0:
np.argwhere(df.values == 0)
array([[0, 5],
[0, 6],
[0, 7],
[0, 8],
[0, 9],
[1, 6],
[1, 7],
[1, 8],
[1, 9],
[2, 7],
[2, 8],
[2, 9],
[3, 8],
[3, 9],
[4, 9]], dtype=int64)
Or you can get the positions where the values are not zero:
np.argwhere(df.values != 0)
array([[0, 0],
[0, 1],
[0, 2],
[0, 3],
[0, 4],
[1, 0],
[1, 1],
[1, 2],
[1, 3],
[1, 4],
[1, 5],
[2, 0],
[2, 1],
[2, 2],
[2, 3],
[2, 4],
[2, 5],
[2, 6],
[3, 0],
[3, 1],
[3, 2],
[3, 3],
[3, 4],
[3, 5],
[3, 6],
[3, 7],
[4, 0],
[4, 1],
[4, 2],
[4, 3],
[4, 4],
[4, 5],
[4, 6],
[4, 7],
[4, 8]], dtype=int64)
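If you specifically want the first zero in each row, a minimal sketch (it assumes every row contains at least one zero, because np.argmax also returns 0 when a row has no match):

first_zero = np.argmax(df.values == 0, axis=1)
# -> [5, 6, 7, 8, 9] for the frame above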
I hope it helps.
I suggest this less complicated solution:
import random
nums = [0, 7, 78, 843, 34893, 0 , 2, 23, 4, 0]
random.shuffle(nums)
thg = [x for x in nums if x != 0]
print(thg[0])
What this does is shuffle the nums list and filter out all the zeros. Then it prints the first non-zero value.
Given an id in a pandas dataframe, how can I create a new column that has an additional id that maxes out at a count of 5 for each ID, almost like "batches" of rows?
df = pd.DataFrame([[1, 1],
[2, 1],
[3, 1],
[4, 1],
[5, 1],
[6, 1],
[7, 1],
[8, 2],
[9, 2],
[10, 3],
[11, 3],
[12, 3],
[13, 4],
[14, 5],
[15, 5],
[16, 5],
[17, 5],
[18, 5],
[19, 5],
[20, 5]])
df.columns = ['ln_num', 'id']
print(df)
#expected output
expected = pd.DataFrame([[1, 1, 1],
[2, 1, 1],
[3, 1, 1],
[4, 1, 1],
[5, 1, 1],
[6, 1, 2],
[7, 1, 2],
[8, 2, 3],
[9, 2, 3],
[10, 3, 4],
[11, 3, 4],
[12, 1, 2],
[13, 1, 2],
[14, 1, 2],
[15, 1, 5],
[16, 4, 6],
[17, 4, 6],
[18, 4, 6],
[19, 3, 4],
[20, 3, 4]])
expected.columns = ['ln_num', 'id', 'grp_id']
print(expected)
So, for example, if I have 11 rows with ID=1, I need 3 different unique IDs for this subset of alerts: 1. lines 1-5, 2. lines 6-10, 3. line 11.
The closest I've gotten so far is using a groupby with +1 offset that gives me a new grp_id for each id, but doesn't limit this to 5.
df = df.groupby('id').ngroup() + 1
I've also tried head() and nlargest(), but these don't sort ALL lines into batches, only the first or top 5.
I would start by getting all the points where you know the transition will happen:
(df[1].diff()        # Show where column 1 differs from the previous row
      .astype(bool)) # Make it a boolean (true/false)
We can use this selection on the index of the dataframe to get the indices of rows that change:
df.index[df[1].diff().astype(bool)]
This gives output: Int64Index([0, 7, 9, 12, 13], dtype='int64') and we can check that rows 0, 7, 9, 12, and 13 are where column 1 changes.
Next, we need to break down any segments that are longer than 5 rows into smaller batches. We'll iterate through each pair of steps and use the range function to batch them:
all_steps = []  # Start with an empty list of steps
for i, step in enumerate(steps[:-1]):
    all_steps += list(range(step, steps[i+1], 5))  # Add each step, but also any needed 5-steps
Last, we can use all_steps to assign values to the dataframe by index:
df['group'] = 0
for i, step in enumerate(all_steps[:-1]):
    df.loc[step:all_steps[i+1], 'group'] = i
Putting it all together, we also need to use len(df) a few times, so that the range function knows how long the interval is on the last group.
steps = df.index[df[1].diff().astype(bool)].tolist() + [len(df)]  # range needs to know how long the last interval is
all_steps = []
for i, step in enumerate(steps[:-1]):
    all_steps += list(range(step, steps[i+1], 5))
all_steps += [len(df)]  # needed for indexing
df['group'] = 0
for i, step in enumerate(all_steps[:-1]):
    df.loc[step:all_steps[i+1], 'group'] = i
Our final output:
0 1 group
0 1 1 0
1 2 1 0
2 3 1 0
3 4 1 0
4 5 1 0
5 6 1 1
6 7 1 1
7 8 2 2
8 9 2 2
9 10 3 3
10 11 3 3
11 12 3 3
12 13 4 4
13 14 5 5
14 15 5 5
15 16 5 5
16 17 5 5
17 18 5 5
18 19 5 6
19 20 5 6
If you want the groups to start at 1, use the start=1 keyword in the enumerate function.
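For what it's worth, a shorter alternative sketch using groupby/cumcount (this assumes the named columns from the original question, 'id' rather than the positional column 1 used above, and that rows with the same id are contiguous):

within = df.groupby('id').cumcount() // 5             # 0-based batch number inside each id
df['grp_id'] = df.groupby(['id', within]).ngroup() + 1

Note that ngroup numbers the groups in sorted key order, which coincides with order of appearance here because the ids are already sorted.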
I have this task where I need to create a dataset based on two other connected datasets.
df = pd.DataFrame(columns=['ID','P1','P2'],
data=[[1, 2, 0], [2,1,0], [3, 1, 2], [4, 2, 1],
[5, 1, 2], [6, 0, 1], [7, 1, 0]])
fp = pd.DataFrame(columns=['ID','FP'],
data=[[1, 'fp'], [2,'i'], [3, 'i'], [4, 'fp'],
[5, 'fp'], [6, 'fp'], [7, 'i']])
My task is to create a third dataset which only contains the ID, P1, and P2 from the df dataset where the fp dataset's 'FP' column shows 'fp'.
I tried this
df2 = np.where((df['ID']==fp['ID'])&fp['FP']=='fp')
But it didn't work
I am sure there is a better way than mine, but this is what I would do:
import pandas as pd
df = pd.DataFrame(columns=['ID','P1','P2'],
data=[[1, 2, 0], [2,1,0], [3, 1, 2], [4, 2, 1],
[5, 1, 2], [6, 0, 1], [7, 1, 0]])
fp = pd.DataFrame(columns=['ID','FP'],
data=[[1, 'fp'], [2,'i'], [3, 'i'], [4, 'fp'],
[5, 'fp'], [6, 'fp'], [7, 'i']])
# Merging dataframes on the common 'ID' column
res = df.merge(fp)
# Filtering
res = res[res['FP'] == 'fp'].drop(columns=['FP'])
res
Result
ID P1 P2
0 1 2 0
3 4 2 1
4 5 1 2
5 6 0 1
You can use Series.isin with boolean indexing.
idx = fp['ID'][fp['FP'].eq('fp')]
df.loc[df['ID'].isin(idx)]
ID P1 P2
0 1 2 0
3 4 2 1
4 5 1 2
5 6 0 1
I've got a problem with reshaping a simple 2-D array into another one.
Let's assume this matrix:
[[4 1 2 1 2 4 1 2 4]
[2 3 0 3 0 2 3 0 2]
[5 5 1 5 1 5 5 1 5]
[6 6 6 6 6 6 6 6 6]]
What I want to do is reshape it into a (12, 3) matrix using (4, 3) blocks, i.e. to get a matrix like:
[[4 1 2
  2 3 0
  5 5 1
  6 6 6

  1 2 4
  3 0 2
  5 1 5
  6 6 6

  1 2 4
  3 0 2
  5 1 5
  6 6 6]]
I have highlighted the "edges" where the matrix is cut with additional blank lines.
I've tried numpy reshape (with all available values of the order parameter), but I still get an array with "mixed" values.
You can always do this manually for custom reshapes:
import numpy as np
data = [[4, 1, 2, 1, 2, 4, 1, 2, 4],
[2, 3, 0, 3, 0, 2, 3, 0, 2],
[5, 5, 1, 5, 1, 5, 5, 1, 5],
[6, 6, 6, 6, 6, 6, 6, 6, 6]]
X = np.array(data)
Z = np.r_[X[:, 0:3], X[:, 3:6], X[:, 6:9]]
print(Z)
yields
array([[4, 1, 2],
[2, 3, 0],
[5, 5, 1],
[6, 6, 6],
[1, 2, 4],
[3, 0, 2],
[5, 1, 5],
[6, 6, 6],
[1, 2, 4],
[3, 0, 2],
[5, 1, 5],
[6, 6, 6]])
Note the special np.r_ object, which concatenates arrays along rows (the first axis); here it is a handy shorthand for np.concatenate.
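If the number of blocks isn't fixed at three, a reshape/transpose sketch generalizes the same idea (assuming the width is an exact multiple of the block width, 3 here):

Z = X.reshape(X.shape[0], -1, 3).transpose(1, 0, 2).reshape(-1, 3)

This splits each row into blocks of 3 columns, moves the block axis to the front, and then stacks the blocks vertically, producing the same (12, 3) result as above.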
I am working with python 3.7 and I would like to get all the odd columns of a matrix.
To give a example, I have a 4x4 matrix of this style right now.
[[0, 9, 1, 6], [0, 3, 1, 5], [0, 2, 1, 7], [0, 6, 1, 2]]
That is...
0 9 1 6
0 3 1 5
0 2 1 7
0 6 1 2
And I would like to get:
9 6
3 5
2 7
6 2
The numbers and the size of the matrix will change but the structure will always be
[[0, (int), 1, (int), 2...], [0, (int), 1, (int), 2 ...], [0, (int), 1, (int), 2...], [0, (int), 1, (int), 2...], ...]
To get rows I can do [::2], but that wonderful solution does not work for me here. I tried to access the matrix with:
for i in matrix:
    for j in matrix:
But none of these works either.
How can I solve it?
Thank you.
Without using numpy, you can use something similar to your indexing scheme ([1::2]) in a list comprehension (mat is your matrix):
>>> [i[1::2] for i in mat]
[[9, 6], [3, 5], [2, 7], [6, 2]]
Using numpy, you can do something similar:
>>> import numpy as np
>>> np.array(mat)[:,1::2]
array([[9, 6],
[3, 5],
[2, 7],
[6, 2]])
If you can't use NumPy for whatever reason, write a custom implementation:
def getColumns(matrix, columns):
    return {c: [matrix[r][c] for r in range(len(matrix))] for c in columns}
It takes a 2D array and a list of columns, and it returns a dictionary where the column indexes are keys and the actual columns are values. Note that if you passed all indices you would get a transposed matrix.
In your case,
M = [[0, 9, 1, 6],
[0, 3, 1, 5],
[0, 2, 1, 7],
[0, 6, 1, 2]]
All odd columns have even indices (because the index of the first one is 0). Therefore:
L = list(range(0, len(M[0]), 2))
And then you would do:
myColumns = getColumns(M, L)
print(list(myColumns.values()))
#result: [[0, 0, 0, 0], [1, 1, 1, 1]]
But since you showed the values as if they were in rows:
def f(matrix, columns):
    return [[matrix[row][i] for i in columns] for row in range(len(matrix))]
print(f(M, L))
#result: [[0, 1], [0, 1], [0, 1], [0, 1]]
And I believe that the latter is what you wanted.
I'm finding it difficult to understand what exactly the doctest in the following code is asking the function to do.
Can someone make it clearer for me, please?
Question 14:
Write functions row_times_column and matrix_mult:
def row_times_column(m1, row, m2, column):
    """
    >>> row_times_column([[1, 2], [3, 4]], 0, [[5, 6], [7, 8]], 0)
    19
    >>> row_times_column([[1, 2], [3, 4]], 0, [[5, 6], [7, 8]], 1)
    22
    >>> row_times_column([[1, 2], [3, 4]], 1, [[5, 6], [7, 8]], 0)
    43
    >>> row_times_column([[1, 2], [3, 4]], 1, [[5, 6], [7, 8]], 1)
    50
    """

def matrix_mult(m1, m2):
    """
    >>> matrix_mult([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    [[19, 22], [43, 50]]
    >>> matrix_mult([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 1], [2, 3]])
    [[31, 19], [85, 55]]
    >>> matrix_mult([[7, 8], [9, 1], [2, 3]], [[1, 2, 3], [4, 5, 6]])
    [[39, 54, 69], [13, 23, 33], [14, 19, 24]]
    """
Add your new functions to matrices.py and be sure they pass the doctests above.
It's asking you to implement some matrix multiplication functions.
In the first one, given two matrices, you multiply a row of m1 by a column of m2. The first example given is

[1 2]   [5 6]
[3 4] x [7 8]

but we only want row 0 times column 0, so it is

        [5]
[1 2] x [7]

= 5 + 14 = 19. And so on for the others...
The second function wants a full matrix multiplication. See http://en.wikipedia.org/wiki/Matrix_multiplication
The first function, first example in the docstring: take the zeroth row of the first matrix ([1, 2]) and multiply it by the zeroth column of the second matrix ([5, 7]):
1 x 5 + 2 x 7 = 19
The second function is just standard matrix multiplication. You can use the first function to answer this problem.
The first function, row_times_column multiplies the nth row of the first matrix by the mth column of the second matrix. In the first doctest, for example, n = 0 and m = 0, so we multiply the row matrix [1, 2] by the column matrix [5, 7] to get 1 * 5 + 2 * 7 which is equal to 19 as specified.
The second function is the generalization, and you have to multiply the first matrix by the second. You are probably supposed to use the first function. The linked article shows how to get matrix multiplication from a combination of row by column multiplications.
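For reference, a minimal sketch that passes the doctests above (plain lists, mirroring the doctest inputs; not the only possible solution):

def row_times_column(m1, row, m2, column):
    # Dot product of one row of m1 with one column of m2
    return sum(m1[row][k] * m2[k][column] for k in range(len(m2)))

def matrix_mult(m1, m2):
    # One entry per (row of m1, column of m2) pair
    return [[row_times_column(m1, i, m2, j) for j in range(len(m2[0]))]
            for i in range(len(m1))]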