I have a sorted Series. Is there a simple way to change it from
A 0.064467
B 0.042283
C 0.037581
D 0.017410
dtype: float64
to
A 1
B 2
C 3
D 4
You can just use rank. Since the Series is already sorted in descending order, ranking with ascending=False yields 1 through 4:
s.rank(ascending=False)
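A minimal, self-contained sketch (the name s is an assumption; the question doesn't name the Series):
import pandas as pd

s = pd.Series([0.064467, 0.042283, 0.037581, 0.017410],
              index=['A', 'B', 'C', 'D'])

# the Series is already sorted in descending order, so ranking with
# ascending=False yields 1, 2, 3, 4 top to bottom; rank returns floats,
# hence the cast to int to match the desired output
s.rank(ascending=False).astype(int)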
I have the following dataframe
df
A B C D
1 2 NA 3
2 3 NA 1
3 NA 1 2
A, B, C, and D are answers to a question. Respondents ranked the answers from 1 to 3, which means one row cannot contain the same value twice. I am trying to make new columns that summarize the top 3, something such as:
1st 2nd 3rd
A B D
D A B
C D A
This format will make it easier for me to draw conclusions such as "these are the answers ranked third".
I didn't find any way to do this. Could you help me, please?
Thank you very much!
One way is to use argsort and index the columns. numpy sorts NaN last, so slicing off the last column drops the missing answer in each row:
pd.DataFrame(df.columns[df.values.argsort()[:, :-1]],
             columns=['1st', '2nd', '3rd'])
  1st 2nd 3rd
0   A   B   D
1   D   A   B
2   C   D   A
Another way is to use stack()/pivot() (note that pivot's arguments are keyword-only in recent pandas):
(df.stack().astype(int)
   .reset_index(name='val')
   .pivot(index='level_0', columns='val', values='level_1')
)
Output:
val 1 2 3
level_0
0 A B D
1 D A B
2 C D A
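To get the 1st/2nd/3rd headers from the question, a small optional cleanup (assuming the pivoted result above was stored as out):
out.columns = ['1st', '2nd', '3rd']  # replace the val 1/2/3 header
out.index.name = None                # drop the leftover level_0 label
out
  1st 2nd 3rd
0   A   B   D
1   D   A   B
2   C   D   A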
I have the following Pandas dataframe:
name1 name2
A B
A A
A C
A A
B B
B A
I want to add a column named new that counts occurrences across name1 and name2 combined (over the distinct values appearing in either column). Hence, the expected output is the following dataframe:
name new
A 7
B 4
C 1
I've tried
df.groupby(["name1"]).count().groupby(["name2"]).count(), among many other things... but although that last one seems to give me the correct results, I can't get the joined datasets.
You can use value_counts with df.stack():
df[['name1','name2']].stack().value_counts()
# df.stack().value_counts() for all columns
A 7
B 4
C 1
To match the desired output exactly:
(df[['name1','name2']].stack().value_counts()
   .to_frame('new').rename_axis('name').reset_index())
name new
0 A 7
1 B 4
2 C 1
Let us try melt:
df.melt().value.value_counts()
Out[17]:
A 7
B 4
C 1
Name: value, dtype: int64
Alternatively,
df.name1.value_counts().add(df.name2.value_counts(), fill_value=0).astype(int)
gives you
A 7
B 4
C 1
dtype: int64
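For context, without fill_value=0 any name that appears in only one of the columns comes back as NaN, since here C never occurs in name1:
df.name1.value_counts().add(df.name2.value_counts())
A    7.0
B    4.0
C    NaN
dtype: float64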
Using pd.concat with Series.value_counts (Series.append was removed in pandas 2.0):
pd.concat([df['name1'], df['name2']]).value_counts()
A 7
B 4
C 1
dtype: int64
value_counts moves the counted values into the index. To get your desired output, use rename_axis with reset_index:
pd.concat([df['name1'], df['name2']]).value_counts().rename_axis('name').reset_index(name='new')
name new
0 A 7
1 B 4
2 C 1
Python's Counter is another solution:
from collections import Counter

# flatten the frame to a 1-D array and let Counter tally every value
s = pd.Series(Counter(df.to_numpy().flatten()))
In [1325]: s
Out[1325]:
A 7
B 4
C 1
dtype: int64
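To get the same name/new frame as the earlier answers, a small follow-up (reusing s from the snippet above):
s.sort_values(ascending=False).rename_axis('name').reset_index(name='new')
  name  new
0    A    7
1    B    4
2    C    1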
I have a data set of 60-plus computers, with each column being a computer and the rows being the software installed on each PC. I want to count each unique value (software) so I can see how many copies of each are currently installed.
data = [['a','a','c'],['a','b','d'],['a','c','c']]
df = pd.DataFrame(data,columns=['col1','col2','col3'])
df
col1 col2 col3
a a c
a b d
a c c
I expect the following output
a 4
b 1
c 3
value_counts after melt:
df.melt().value.value_counts()
Out[648]:
a 4
c 3
b 1
d 1
Name: value, dtype: int64
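If you prefer the alphabetical ordering from the expected output, sort by index instead of by count:
df.melt().value.value_counts().sort_index()
a    4
b    1
c    3
d    1
Name: value, dtype: int64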
numpy.unique for a speed-up:
# np.unique returns (values, counts); reversing the pair lets pd.Series
# build counts indexed by value
pd.Series(*np.unique(df.values.ravel(), return_counts=True)[::-1])
Out[653]:
a 4
b 1
c 3
d 1
dtype: int64
This is my table:
A B C E
0 1 1 5 4
1 1 1 1 1
2 3 3 8 2
Now, I want to group all rows by columns A and B. Column C should be summed, and for column E I want to keep the value from the row where C is at its maximum.
I did the first part of grouping A and B and summing C. I did this with:
df = df.groupby(['A', 'B'])['C'].sum()
But at this point, I am not sure how to tell that column E should take the value where C is max.
The end result should look like this:
A B C E
0 1 1 6 4
1 3 3 8 2
Can somebody help me with this last piece?
Thanks!
Using groupby with agg after sorting by C.
In general, if you are applying different functions to different columns, DataFrameGroupBy.agg allows you to pass a dictionary specifying which operation is applied to each column:
df.sort_values('C').groupby(['A', 'B'], sort=False).agg({'C': 'sum', 'E': 'last'})
C E
A B
1 1 6 4
3 3 8 2
By sorting by column C first, and not sorting as part of groupby, we can select the last value of E per group, which will align with the maximum value of C for each group.
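If you would rather not depend on sort order, here is an alternative sketch using idxmax; it assumes the index labels of df are unique so that loc can pick out single rows:
g = df.groupby(['A', 'B'])
out = g['C'].sum().to_frame()
# take E from the row where C is largest within each group; the
# (A, B) MultiIndex aligns the values during assignment
out['E'] = df.loc[g['C'].idxmax(), ['A', 'B', 'E']].set_index(['A', 'B'])['E']
out.reset_index()
   A  B  C  E
0  1  1  6  4
1  3  3  8  2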
Given the following data frame:
import pandas as pd
import numpy as np
d = pd.DataFrame({'a': [1, 2, 3], 'b': [np.nan, 5, 6]})
d
a b
0 1 NaN
1 2 5.0
2 3 6.0
I would like to replace all non-null values with the column name.
Desired result:
a b
0 a NaN
1 a b
2 a b
In reality, I have many columns.
Thanks in advance!
Update to the answer from root below:
To perform this on a subset of columns:
d.loc[:, d.columns[3:]] = np.where(d.loc[:, d.columns[3:]].notnull(),
                                   d.columns[3:],
                                   d.loc[:, d.columns[3:]])
Using numpy.where and notnull:
d[:] = np.where(d.notnull(), d.columns, d)
The resulting output:
a b
0 a NaN
1 a b
2 a b
Edit
To select specific columns:
cols = d.columns[3:] # or whatever Index/list-like of column names
d[cols] = np.where(d[cols].notnull(), cols, d[cols])
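A concrete usage sketch on the two-column toy frame, starting from the original d (cols is just ['b'] here, since the example has no fourth column):
cols = ['b']
d[cols] = np.where(d[cols].notnull(), cols, d[cols])
d
   a    b
0  1  NaN
1  2    b
2  3    b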
I can think of one possibility using apply/transform. Note that np.where upcasts the mix of floats and strings to a string array here, so the missing value comes back as the string 'nan' rather than NaN:
In [1610]: d.transform(lambda x: np.where(x.isnull(), x, x.name))
Out[1610]:
a b
0 a nan
1 a b
2 a b
You could also use df.where, which keeps values where the condition is True and otherwise substitutes from a same-shaped array of column names (np.tile repeats the column labels once per row):
In [1627]: d.where(d.isnull(), np.tile(d.columns, (len(d), 1)))
Out[1627]:
   a    b
0  a  NaN
1  a    b
2  a    b