pandas replace zeros with previous non zero value - python

I have the following dataframe:
index = range(14)
data = [1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2, 1]
df = pd.DataFrame(data=data, index=index, columns = ['A'])
How can I fill the zeros with the previous non-zero value using pandas? Is there a fillna that works for values other than NaN?
The output should look like:
[1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2, 1]
(This question was asked before in "Fill zero values of 1d numpy array with last non-zero values", but that asker wanted a numpy-only solution.)

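You can use replace with method='ffill' (note that the method argument of replace is deprecated in recent pandas releases, so this may emit a warning):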
In [87]: df['A'].replace(to_replace=0, method='ffill')
Out[87]:
0 1
1 1
2 1
3 2
4 2
5 4
6 6
7 8
8 8
9 8
10 8
11 8
12 2
13 1
Name: A, dtype: int64
To get a numpy array, use .values:
In [88]: df['A'].replace(to_replace=0, method='ffill').values
Out[88]: array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2, 1], dtype=int64)

This improves on the previous answer, which returns a dataframe that hides all the zero values.
Instead, use the following line of code:
df['A'].mask(df['A'] == 0).ffill(downcast='infer')
Here mask turns the zeros into NaN, ffill propagates the last valid value forward, and downcast='infer' converts the result back to integers. Every 0 is replaced with the previous non-zero value.
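As a sketch of the same mask-and-forward-fill idea without the optional downcast argument, the integer dtype can be restored explicitly. The fillna(0) step is an assumption about what to do with leading zeros (which have no previous value); adjust it if you want different behavior.

```python
import pandas as pd

data = [1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2, 1]
df = pd.DataFrame(data=data, index=range(14), columns=["A"])

# Turn zeros into NaN, forward-fill, then restore the original integer dtype.
# Leading zeros have nothing to copy from, so they are put back as 0 here.
filled = (
    df["A"]
    .mask(df["A"] == 0)
    .ffill()
    .fillna(0)
    .astype(df["A"].dtype)
)

print(filled.tolist())
```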

Related

Using numpy to select rows based on a condition of one column

I have a file with various columns. Say
1 2 3 4 5 6
2 4 5 6 7 4
3 4 5 6 7 6
2 0 1 5 6 0
2 4 6 8 9 9
I would like to select the rows whose value in column two is in the range [0, 2] and save them, with all their columns, to a new file.
The answer in the new file should be
1 2 3 4 5 6
2 0 1 5 6 0
Kindly assist me. I prefer doing this with numpy in python.
For array a, you can use:
a[(a[:,1] <= 2) & (a[:,1] >= 0)]
Here, the condition filters the values in your second column.
For your example:
>>> a
array([[1, 2, 3, 4, 5, 6],
[2, 4, 5, 6, 7, 4],
[3, 4, 5, 6, 7, 6],
[2, 0, 1, 5, 6, 0],
[2, 4, 6, 8, 9, 9]])
>>> a[(a[:,1] <= 2) & (a[:,1] >= 0)]
array([[1, 2, 3, 4, 5, 6],
[2, 0, 1, 5, 6, 0]])
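Since the question also asks about reading from and saving to a file, here is a minimal end-to-end sketch. It uses in-memory io.StringIO buffers as stand-ins for the real input and output files (an assumption made so the example is self-contained); with real files you would pass file names to np.loadtxt and np.savetxt instead.

```python
import io
import numpy as np

# Stand-in for the input file.
text = """1 2 3 4 5 6
2 4 5 6 7 4
3 4 5 6 7 6
2 0 1 5 6 0
2 4 6 8 9 9
"""
a = np.loadtxt(io.StringIO(text), dtype=int)

# Keep rows whose second column (index 1) lies in [0, 2].
mask = (a[:, 1] >= 0) & (a[:, 1] <= 2)
selected = a[mask]

# Stand-in for the output file.
out = io.StringIO()
np.savetxt(out, selected, fmt="%d")
print(out.getvalue())
```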

Pandas: back unique values to column in order

I'm not sure how I should proceed in this case.
Consider a df like the one below. When I call df.A.unique(), I get an array like [1, 2, 3, 4].
But I also want the index of these values, like numpy.unique() with return_inverse=True.
df = pd.DataFrame({'A': [1,1,1,2,2,2,3,3,4], 'B':[9,8,7,6,5,4,3,2,1]})
df.A.unique()
>>> array([1, 2, 3, 4])
And
np.unique([1,1,1,2,2,2,3,3,4], return_inverse=True)
>>> (array([1, 2, 3, 4]), array([0, 0, 0, 1, 1, 1, 2, 2, 3]))
How can I do it in Pandas? Unique values with index.
In pandas we have drop_duplicates
df.A.drop_duplicates()
Out[22]:
0 1
3 2
6 3
8 4
Name: A, dtype: int64
To match the np.unique output, use factorize:
pd.factorize(df.A)
Out[21]: (array([0, 0, 0, 1, 1, 1, 2, 2, 3]), Int64Index([1, 2, 3, 4], dtype='int64'))
You can also use a dict to .map() with index of .unique():
df.A.map({i:e for e,i in enumerate(df.A.unique())})
0 0
1 0
2 0
3 1
4 1
5 1
6 2
7 2
8 3
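One difference worth checking: pd.factorize numbers values in order of first appearance, while np.unique returns sorted values. For the sorted column in the question the two agree, but on unsorted data they differ unless sort=True is passed. A small sketch, with a made-up unsorted Series for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 1, 3, 2])  # hypothetical unsorted data

# Default: codes follow order of first appearance.
codes, uniques = pd.factorize(s)

# sort=True: uniques are sorted, so codes line up with np.unique's inverse.
sorted_codes, sorted_uniques = pd.factorize(s, sort=True)

values, inverse = np.unique(s.to_numpy(), return_inverse=True)

print(codes, list(uniques))
print(sorted_codes, list(sorted_uniques))
print(inverse, values)
```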

Remove all columns that are of a certain value in a specific row

I'm looking for a way to remove all columns from my pandas df based on the value of a single row, e.g., return a new df with all rows but only those columns that are zero in row X.
You can do this with loc and iloc
df = pd.DataFrame({'a':[1, 20, 30, 4, 0],
'b':[1, 0, 3, 4, 0],
'c':[1, 3, 7, 7, 5],
'd':[1, 8, 3, 8, 5],
'e':[1, 11, 3, 4, 0]})
df.loc[:, df.iloc[4,:] == 0]
    a  b   e
0   1  1   1
1  20  0  11
2  30  3   3
3   4  4   4
4   0  0   0
Ok, I found a way. Is there a better, quicker, or more idiomatic pandas solution?
zero_cols = df1.loc['X'] == 0
df2 = df1.loc[:, zero_cols]
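A runnable sketch of the label-based version, assuming 'X' is a row label in the index of df1. The small df1 below is made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose index contains the row label "X".
df1 = pd.DataFrame(
    {"a": [1, 0], "b": [2, 5], "c": [3, 0]},
    index=["W", "X"],
)

# Boolean Series indexed by column name: True where row "X" is zero.
zero_cols = df1.loc["X"] == 0

# Keep all rows, but only the columns flagged above.
df2 = df1.loc[:, zero_cols]
print(df2)
```

The comparison already yields booleans, so there is no need to compare the mask against True again.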

Filter a 2D numpy array from an array of values

Let's say I have a numpy array with the following shape :
nonSortedNonFiltered=np.array([[9,8,5,4,6,7,1,2,3],[1,3,2,6,4,5,7,9,8]])
I want to :
- Sort the array according to nonSortedNonFiltered[1]
- Filter the array according to nonSortedNonFiltered[0] and an array of values
I currently do the sorting with :
sortedNonFiltered=nonSortedNonFiltered[:,nonSortedNonFiltered[1].argsort()]
Which gives : np.array([[9, 5, 8, 6, 7, 4, 1, 3, 2],[1, 2, 3, 4, 5, 6, 7, 8, 9]])
Now I want to filter sortedNonFiltered from an array of values, for example :
sortedNonFiltered=np.array([[9, 5, 8, 6, 7, 4, 1, 3, 2],[1, 2, 3, 4, 5, 6, 7, 8, 9]])
listOfValues=np.array([8, 6, 5, 2, 1])
...Something here...
> np.array([[5, 8, 6, 1, 2],[2, 3, 4, 7, 9]]) #What I want to get in the end
Note : Each value in a column of my 2D array is exclusive.
You can use np.in1d to get a boolean mask and use it to filter columns in the sorted array, something like this -
output = sortedNonFiltered[:,np.in1d(sortedNonFiltered[0],listOfValues)]
Sample run -
In [76]: nonSortedNonFiltered
Out[76]:
array([[9, 8, 5, 4, 6, 7, 1, 2, 3],
[1, 3, 2, 6, 4, 5, 7, 9, 8]])
In [77]: sortedNonFiltered
Out[77]:
array([[9, 5, 8, 6, 7, 4, 1, 3, 2],
[1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [78]: listOfValues
Out[78]: array([8, 6, 5, 2, 1])
In [79]: sortedNonFiltered[:,np.in1d(sortedNonFiltered[0],listOfValues)]
Out[79]:
array([[5, 8, 6, 1, 2],
[2, 3, 4, 7, 9]])
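The same sort-then-filter steps can also be written with np.isin, which the NumPy documentation recommends over np.in1d in new code. A self-contained sketch using the arrays from the question:

```python
import numpy as np

nonSortedNonFiltered = np.array([[9, 8, 5, 4, 6, 7, 1, 2, 3],
                                 [1, 3, 2, 6, 4, 5, 7, 9, 8]])
listOfValues = np.array([8, 6, 5, 2, 1])

# Reorder the columns so that the second row is ascending.
sortedNonFiltered = nonSortedNonFiltered[:, nonSortedNonFiltered[1].argsort()]

# Boolean mask over columns: True where the first-row value is wanted.
mask = np.isin(sortedNonFiltered[0], listOfValues)
output = sortedNonFiltered[:, mask]
print(output)
```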

Any function in numpy/pandas/python to search and replace

I have a 4x4 matrix like this:
ds1=
4 13 6 9
7 12 5 7
7 0 4 22
9 8 12 0
and another file with two columns:
ds2 =
4 1
5 3
6 1
7 2
8 2
9 3
12 1
13 2
22 3
ds1 = ds1.apply(lambda x: ds2_mean[1] if [condition])
What condition should I add to check that elements of ds1 and ds2 are equal?
I want every value in ds1 that appears in the first column of ds2 to be replaced by the corresponding value from the second column of ds2, so the resulting matrix should look like
1 2 1 3
2 1 3 2
2 0 1 3
3 2 1 0
Please see "Replacing mean value from one dataset to another"; it does not answer my question.
If you are working with numpy arrays, you could do this -
# Make a copy of ds1 to initialize output array
out = ds1.copy()
# Find out the row indices in ds2 that have intersecting elements between
# its first column and ds1
_,C = np.where(ds1.ravel()[:,None] == ds2[:,0])
# New values taken from the second column of ds2 to be put in output
newvals = ds2[C,1]
# Valid positions in output array to be changed
valid = np.in1d(ds1.ravel(),ds2[:,0])
# Finally make the changes to get desired output
out.ravel()[valid] = newvals
Sample input, output -
In [79]: ds1
Out[79]:
array([[ 4, 13, 6, 9],
[ 7, 12, 5, 7],
[ 7, 0, 4, 22],
[ 9, 8, 12, 0]])
In [80]: ds2
Out[80]:
array([[ 4, 1],
[ 5, 3],
[ 6, 1],
[ 7, 2],
[ 8, 2],
[ 9, 3],
[12, 1],
[13, 2],
[22, 3]])
In [81]: out
Out[81]:
array([[1, 2, 1, 3],
[2, 1, 3, 2],
[2, 0, 1, 3],
[3, 2, 1, 0]])
Here is another solution, using the DataFrame.replace() function:
df1.replace(to_replace=df2[0].tolist(), value=df2[1].tolist(), inplace=True)
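A self-contained sketch of the pandas route, assuming ds1 and ds2 are loaded as DataFrames named df1 and df2 with default integer column labels. Here the lookup table is passed to replace as a dict built from the two columns of df2, which is one of several equivalent ways to express the mapping.

```python
import pandas as pd

df1 = pd.DataFrame([[4, 13, 6, 9],
                    [7, 12, 5, 7],
                    [7, 0, 4, 22],
                    [9, 8, 12, 0]])
df2 = pd.DataFrame([[4, 1], [5, 3], [6, 1], [7, 2], [8, 2],
                    [9, 3], [12, 1], [13, 2], [22, 3]])

# Map each value in the first column of df2 to its second-column value.
mapping = dict(zip(df2[0], df2[1]))

# Values missing from the mapping (the zeros here) are left unchanged.
result = df1.replace(mapping)
print(result)
```

In this data the old values and the new values do not overlap, so there is no ambiguity about a replaced value being replaced a second time.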
