Select n-th lowest value from DataFrame (Every Row!) - python

I am looking for a solution to pick values (row wise) from a Dataframe.
Here is what I already have:
import numpy as np
import pandas as pd
np.random.seed(1)
df = pd.DataFrame(np.random.randint(1,10, (10, 10)))
df.columns = list('ABCDEFGHIJ')
N = 2
idx = np.argsort(df.values, 1)[:, 0:N]
df= pd.concat([pd.DataFrame(df.values.take(idx), index=df.index), pd.DataFrame(df.columns[idx], index=df.index)],keys=['Value', 'Columns']).sort_index(level=1)
Now I have the index/position for every value but if I try to get the values from the Dataframe it only takes the values from the first row.
What do I have to change in the code?
df looks like:
A B C D E F G H I J
0 6 9 6 1 1 2 8 7 3 5
1 6 3 5 3 5 8 8 2 8 1
2 7 8 7 2 1 2 9 9 4 9
....
My output should look like:
0 D E
0 1 1
1 J H
1 1 2

You can use np.take_along_axis to take the values from the DataFrame row by row. Then use np.insert to interleave the selected values with their corresponding column names.
# idx is the same as the one used in the question.
vals = np.take_along_axis(df.values, idx, axis=1)
cols = df.columns.values[idx]
indices = np.r_[: len(vals)] # same as np.arange(len(vals))
out = np.insert(vals.astype(str), indices , cols, axis=0)
index = np.repeat(indices, 2)
df = pd.DataFrame(out, index=index)
0 1
0 D E
0 1 1
1 J H
1 1 2
2 E D
2 1 2
3 E I
3 2 2
4 A D
4 1 1
5 I J
5 1 3
6 E I
6 1 2
7 B H
7 1 3
8 G I
8 1 1
9 E A
9 1 2
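To see why the original attempt only returned first-row values, here is a minimal sketch on a small hand-made frame (the 2x4 data and the names flat, vals and cols are illustrative assumptions, not taken from the question). ndarray.take without an axis indexes into the flattened array, whereas np.take_along_axis applies each row of idx to the matching row of the data.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (assumed data, not the one from the question)
df = pd.DataFrame([[6, 9, 1, 2], [3, 5, 8, 2]], columns=list('ABCD'))
N = 2

# Column positions of the N smallest values in every row
idx = np.argsort(df.to_numpy(), axis=1, kind='stable')[:, :N]

# take() without an axis treats idx as positions in the flattened array,
# so the small position numbers mostly land in the first row
flat = df.to_numpy().take(idx)

# take_along_axis pairs row i of idx with row i of the data
vals = np.take_along_axis(df.to_numpy(), idx, axis=1)
cols = df.columns.to_numpy()[idx]

print(flat)
print(vals)
print(cols)
```

Here vals holds the two smallest values of each row and cols the column labels they came from, while flat shows the unintended result of the axis-less take.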

Related

How to count cumulatively with conditions on a groupby?

Say I have a data-frame, filled as below, with the column 'Key' having one of five possible values A, B, C, D, X. I would like to add a new column 'Res' that counts these letters cumulatively and resets to 0 each time it hits an X.
For example:
Key Res
0 D 1
1 X 0
2 B 1
3 C 2
4 D 3
5 X 0
6 A 1
7 C 2
8 X 0
9 X 0
May anyone assist in how I can achieve this?
A possible solution:
a = df.Key.ne('X')
df['new'] = ((a.cumsum()-a.cumsum().where(~a).ffill().fillna(0)).astype(int))
Another possible solution, which is more basic than the previous one, but much faster (several orders of magnitude):
s = np.zeros(len(df), dtype=int)
for i in range(len(df)):
    # for i == 0, s[i-1] is s[-1], which is still 0 at that point
    if df.Key[i] != 'X':
        s[i] = s[i-1] + 1
df['new'] = s
Output:
Key Res new
0 D 1 1
1 X 0 0
2 B 1 1
3 C 2 2
4 D 3 3
5 X 0 0
6 A 1 1
7 C 2 2
8 X 0 0
9 X 0 0
Example
df = pd.DataFrame(list('DXBCDXACXX'), columns=['Key'])
df
Key
0 D
1 X
2 B
3 C
4 D
5 X
6 A
7 C
8 X
9 X
Code
df1 = pd.concat([df.iloc[[0]], df])
grouper = df1['Key'].eq('X').cumsum()
df1.assign(Res=df1.groupby(grouper).cumcount()).iloc[1:]
result:
Key Res
0 D 1
1 X 0
2 B 1
3 C 2
4 D 3
5 X 0
6 A 1
7 C 2
8 X 0
9 X 0
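The same reset logic can also be written as a grouped cumulative sum, sketched below as one possible variant. The names is_x and block are illustrative assumptions: every X opens a new block, and within a block only the non-X rows are counted.

```python
import pandas as pd

df = pd.DataFrame({'Key': list('DXBCDXACXX')})

is_x = df['Key'].eq('X')   # True where the counter must reset
block = is_x.cumsum()      # block id that increases at every X

# Count non-X rows within each block; the X row itself contributes 0
df['Res'] = (~is_x).astype(int).groupby(block).cumsum()
print(df)
```

Because the X row starts its own block with a contribution of 0, no extra padding row is needed at the top of the frame.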

pandas-how can I replace rows in a dataframe

I am new to Python and am trying to replace rows.
I have a dataframe such as:
X Y
1 a
2 d
3 c
4 a
5 b
6 e
7 a
8 b
I have two questions:
1- How can I swap the 2nd row with the 5th, such as:
X Y
1 a
5 b
3 c
4 a
2 d
6 e
7 a
8 b
2- How can I put the 6th row above the 3rd row, such as:
X Y
1 a
2 d
6 e
3 c
4 a
5 b
7 a
8 b
For the first question, use DataFrame.iloc. Python counts from 0, so use position 1 to select the second row and position 4 for the fifth:
df.iloc[[1, 4]] = df.iloc[[4, 1]]
print (df)
X Y
0 1 a
1 5 b
2 3 c
3 4 a
4 2 d
5 6 e
6 7 a
7 8 b
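For the second question, starting again from the original dataframe, rename the index label of the row to move (here 5) to the label of the row it should follow (here 1), then sort the index with a stable algorithm (mergesort) so that rows with equal labels keep their relative order: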
df = df.rename({5:1}).sort_index(kind='mergesort', ignore_index=True)
print (df)
X Y
0 1 a
1 2 d
2 6 e
3 3 c
4 4 a
5 5 b
6 7 a
7 8 b
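Both operations can also be expressed as a reordering of row positions, which avoids relying on index labels. The following is a minimal sketch under that assumption; the names order, swapped and moved are illustrative, and the frame is rebuilt from the question's data.

```python
import pandas as pd

df = pd.DataFrame({'X': range(1, 9), 'Y': list('adcabeab')})

# 1) Swap the 2nd and 5th rows (positions 1 and 4)
order = list(range(len(df)))
order[1], order[4] = order[4], order[1]
swapped = df.iloc[order].reset_index(drop=True)

# 2) Move the 6th row (position 5) so it sits above the 3rd row (position 2)
order = list(range(len(df)))
order.insert(2, order.pop(5))
moved = df.iloc[order].reset_index(drop=True)

print(swapped)
print(moved)
```

Building the permutation as a plain Python list keeps each column's dtype intact, since whole rows are selected rather than assigned.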

Calculating reciprocal of last row in dataframe and including it as a 'new' last row in the dataframe

I have a pandas dataframe:
DF: A B C D E F G H
0 J S T 1 2 3 4 5
1 R A M 2 3 4 5 6
sum 0 0 0 3 5 7 9 11
and I would like to add a new row to DF that takes the last row of the dataframe (in this case 'sum') and holds its reciprocal.
So it should read as
DF:
A B C D E F G H
0 J S T 1 2 3 4 5
1 R A M 2 3 4 5 6
sum 0 0 0 3 5 7 9 11
rec 0 0 0 0.3 .25 etc etc
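Try the following code. Note that it relies on DataFrame.append, which was deprecated in pandas 1.4 and removed in pandas 2.0: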
df = df.append(df.loc['sum', 'D':'H'].apply(lambda x: 1 / x)\
.rename('rec')).fillna(0)
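On pandas versions without DataFrame.append, the same idea can be sketched with pd.concat. The frame below is rebuilt from the question's table as an assumption about its dtypes (strings in A to C, integers in D to H), and the names rec and out are illustrative.

```python
import pandas as pd

# Frame reconstructed from the question (assumed dtypes)
df = pd.DataFrame(
    {'A': ['J', 'R', 0], 'B': ['S', 'A', 0], 'C': ['T', 'M', 0],
     'D': [1, 2, 3], 'E': [2, 3, 5], 'F': [3, 4, 7],
     'G': [4, 5, 9], 'H': [5, 6, 11]},
    index=[0, 1, 'sum'],
)

# Reciprocal of the numeric part of the 'sum' row, labelled 'rec'
rec = (1 / df.loc['sum', 'D':'H']).rename('rec')

# Turn the Series into a one-row frame and stack it under df;
# columns A to C are missing in that row, so fill them with 0
out = pd.concat([df, rec.to_frame().T]).fillna(0)
print(out)
```

The numeric columns become floats after the concatenation because the new row holds fractional values.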

Repeating rows of a dataframe based on a column value

I have a data frame like this:
df1 = pd.DataFrame({'a': [1,2],
'b': [3,4],
'c': [6,5]})
df1
Out[150]:
a b c
0 1 3 6
1 2 4 5
Now I want to create a df that repeats each row based on difference between col b and c plus 1. So diff between b and c for first row is 6-3 = 3. I want to repeat that row 3+1=4 times. Similarly for second row the difference is 5-4 = 1, so I want to repeat it 1+1=2 times. The column d is added to have value from min(b) to diff between b and c (i.e.6-3 = 3. So it goes from 3->6). So I want to get this df:
a b c d
0 1 3 6 3
0 1 3 6 4
0 1 3 6 5
0 1 3 6 6
1 2 4 5 4
1 2 4 5 5
Do it with reindex + repeat, then use groupby cumcount to assign the new column d:
df1.reindex(df1.index.repeat(df1.eval('c-b').add(1))).\
assign(d=lambda x : x.c-x.groupby('a').cumcount(ascending=False))
Out[572]:
a b c d
0 1 3 6 3
0 1 3 6 4
0 1 3 6 5
0 1 3 6 6
1 2 4 5 4
1 2 4 5 5
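A variant of the same repeat-then-count idea is sketched below. It groups on the repeated index instead of column a, on the assumption that values in a might not be unique per original row; the names counts and out are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [6, 5]})

# Number of copies per row: (c - b) + 1
counts = df1['c'] - df1['b'] + 1

# Repeat each row by its count; the original index label is kept on every copy
out = df1.loc[df1.index.repeat(counts)].copy()

# Within each original row, count 0, 1, 2, ... and offset by b
out['d'] = out['b'] + out.groupby(level=0).cumcount()
print(out)
```

Counting upward from b gives the same d values as counting downward from c, since each row is repeated exactly c - b + 1 times.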

Creating a list of sliced dataframes

I am trying to create a list of dataframes where each dataframe is 3 rows of a larger dataframe.
dframes = [df[0:3], df[3:6],...,df[2000:2003]]
I am still fairly new to programming. Why does the following code fail with the error shown after it?
x = 3
dframes = []
for i in range(0, len(df)):
    dframes = dframes.append(df[i:x])
    i = x
    x = x + 3
    dframes = dframes.append(df[i:x])
AttributeError: 'NoneType' object has no attribute 'append'
Use np.split
Setup
Consider the dataframe df
df = pd.DataFrame(dict(A=range(15), B=list('abcdefghijklmno')))
Solution
dframes = np.split(df, range(3, len(df), 3))
Output
for d in dframes:
print(d, '\n')
A B
0 0 a
1 1 b
2 2 c
A B
3 3 d
4 4 e
5 5 f
A B
6 6 g
7 7 h
8 8 i
A B
9 9 j
10 10 k
11 11 l
A B
12 12 m
13 13 n
14 14 o
Python raises this error because list.append modifies the list in place and returns None, so after the first assignment the variable dframes is None and the next call in your loop fails.
You can use this instead:
dframes = [df[i:i+3] for i in range(0, len(df), 3)]
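The point about append returning None can be checked directly, and the original loop can be repaired by calling append without reassigning its result. This is a minimal sketch; the 15-row frame and the name result are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'A': range(15), 'B': list('abcdefghijklmno')})

# list.append mutates the list and returns None
result = [].append(1)
print(result)  # None

# Corrected loop: step through the rows three at a time and
# call append for its side effect only
dframes = []
for start in range(0, len(df), 3):
    dframes.append(df.iloc[start:start + 3])

print(len(dframes))
```

Stepping the range by 3 replaces the manual bookkeeping of i and x in the original loop.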
You can use a list comprehension with groupby, grouping by a NumPy array built from the row positions floor-divided by 3:
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(10,5)), columns=list('ABCDE'))
print (df)
A B C D E
0 8 8 3 7 7
1 0 4 2 5 2
2 2 2 1 0 8
3 4 0 9 6 2
4 4 1 5 3 4
5 4 3 7 1 1
6 7 7 0 2 9
7 9 3 2 5 8
8 1 0 7 6 2
9 0 8 2 5 1
dfs = [x for i, x in df.groupby(np.arange(len(df.index)) // 3)]
print (dfs)
[ A B C D E
0 8 8 3 7 7
1 0 4 2 5 2
2 2 2 1 0 8, A B C D E
3 4 0 9 6 2
4 4 1 5 3 4
5 4 3 7 1 1, A B C D E
6 7 7 0 2 9
7 9 3 2 5 8
8 1 0 7 6 2, A B C D E
9 0 8 2 5 1]
If the dataframe has the default monotonic index (0, 1, 2, ...), the solution can be simplified:
dfs = [x for i, x in df.groupby(df.index // 3)]
