There are a few solutions on SO for reversing the order of all columns, for example:
df = df.iloc[:,::-1]
What I want to achieve is to reverse only the columns from the third column on: keep A and B in place and reverse the order of everything after them. I have a couple of hundred columns, so I don't want to write the order out manually.
My initial columns are:
| A | B | C | D | E | F |
|---|---|---|---|---|---|
And The Result I am looking for is:
| A | B | F | E | D | C |
|---|---|---|---|---|---|
Thanks!
df[list(df.columns[:2]) + list(df.columns[:1:-1])]
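A quick check of that expression on a hypothetical one-row frame with columns A–F:

```python
import pandas as pd

# hypothetical frame just to demonstrate the reordering
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=list('ABCDEF'))

# keep the first two columns, then take the rest in reverse order
out = df[list(df.columns[:2]) + list(df.columns[:1:-1])]
print(list(out.columns))  # ['A', 'B', 'F', 'E', 'D', 'C']
```

The slice df.columns[:1:-1] walks backwards from the last column and stops before index 1, so A and B stay put no matter how many columns follow.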
Use numpy.r_ to concatenate the index ranges, then select with DataFrame.iloc:
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=list('ABCDEF'))
print(df.iloc[:, np.r_[0:2, len(df.columns)-1:1:-1]])
Empty DataFrame
Columns: [A, B, F, E, D, C]
Index: []
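To see what numpy.r_ builds here (with six columns, len(df.columns)-1 is 5):

```python
import numpy as np

# 0:2 keeps the first two positions; 5:1:-1 walks back from the
# last column index down to (but not including) index 1
idx = np.r_[0:2, 5:1:-1]
print(idx)  # [0 1 5 4 3 2]
```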
Or select both slices and combine them with join:
print(df.iloc[:, :2].join(df.iloc[:, :1:-1]))
Empty DataFrame
Columns: [A, B, F, E, D, C]
Index: []
Related
I have a dataframe row like the one below
a | b | c | d
1 |-700.5;-1000.0;200.0| yes | blue
I want to change column b to be numeric so I can do data work like sorting on it, but when I try the code below
df = pd.to_numeric(df["b"])
print(df)
I get the error ValueError: Unable to parse string, or an issue with the "-".
Maybe you are looking for something like this:
df.assign(b = df.b.str.split(';')).explode('b')
Output:
a b c d
0 1 -700.5 yes blue
0 1 -1000.0 yes blue
0 1 200.0 yes blue
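If the goal is sorting, the exploded column can then be cast with pd.to_numeric — a sketch using the sample row:

```python
import pandas as pd

# hypothetical frame matching the example row
df = pd.DataFrame({'a': [1], 'b': ['-700.5;-1000.0;200.0'],
                   'c': ['yes'], 'd': ['blue']})

out = df.assign(b=df['b'].str.split(';')).explode('b')
out['b'] = pd.to_numeric(out['b'])  # strings -> floats; '-' parses fine
out = out.sort_values('b')
print(out['b'].tolist())  # [-1000.0, -700.5, 200.0]
```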
If you need the values kept in the same column, here is one way to do it:
# split, sort descending, then assign back to the same column
df['b'] = df['b'].str.split(';').apply(lambda x: sorted([float(i) for i in x], reverse=True))
df
a b c d
0 1 [200.0, -700.5, -1000.0] yes blue
Suppose I have a data frame as follows:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column A': [12, 12, 12, 13, 15, 16, 141, 141, 141, 141],
    'Column B': ['Apple', 'Apple', 'Orange', 'Apple', np.nan,
                 'Orange', 'Apple', np.nan, 'Apple', 'Apple']})
Based on these conditions:
If values in column A are repeated, count the word 'Orange' in Column B and put the count in a new Column C (for example, there are 3 rows for 12 and the count of 'Orange' is 1, so this 1 should be in new Column C). For the non-repeated rows, just paste the corresponding values.
If values in column A are repeated, count the word 'Apple' in Column B and put the count in a new Column D (for example, there are 3 rows for 12 and the count of 'Apple' is 2, so this 2 should be in new Column D). For the non-repeated rows, just paste the corresponding values.
For both repeated and non-repeated rows from Column A, if the word 'Orange' is present in Column B write 'Yes', else 'No', in Column E.
I would like the output below. I was trying this in a Python Jupyter notebook; can anyone please help me get an output like this:
|   | Column A | Column B | Column C | Column D | Column E |
|---|----------|----------|----------|----------|----------|
| 0 | 12       | Apple    | 1        | 2        | Yes      |
| 1 | 13       | Apple    | 0        | 1        | No       |
| 2 | 15       | NaN      | NaN      | NaN      | NaN      |
| 3 | 16       | Orange   | 1        | 0        | Yes      |
| 4 | 141      | Apple    | 0        | 3        | No       |
Thanks in advance:)
I don't think there is a compact one-liner for this, but the following code works.
First, define a function count(x, a) which returns NaN if x contains NaN, and otherwise the number of occurrences of a in x.
The function will be used for the apply function.
Then, use groupby and apply list function.
temp = df.copy().groupby('Column A')['Column B'].apply(list)
After that, temp becomes
Column A
12 [Apple, Apple, Orange]
13 [Apple]
15 [nan]
16 [Orange]
141 [Apple, nan, Apple, Apple]
Name: Column B, dtype: object
So, based on temp, we can count the number of apples and oranges.
Since df has duplicates, remove them and add the new columns (Column C, D, and E):
df.drop_duplicates(subset = ['Column A'], keep = "first", inplace = True)
df['Column C'] = temp.apply(count, a = "Orange").values
df['Column D'] = temp.apply(count, a = "Apple").values
df['Column E'] = df['Column C'].apply(lambda x: np.nan if pd.isna(x) else ('Yes' if x >= 1 else 'No'))
Edit
I am sorry, I missed the function count:
def count(x, a):
    if type(x[0]) == float:
        return np.nan
    else:
        return x.count(a)
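As an alternative sketch on the same sample data, groupby().transform can compute the per-group counts without building lists first (Column E here emits Yes/No/NaN to match the expected output):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column A': [12, 12, 12, 13, 15, 16, 141, 141, 141, 141],
    'Column B': ['Apple', 'Apple', 'Orange', 'Apple', np.nan,
                 'Orange', 'Apple', np.nan, 'Apple', 'Apple']})

g = df.groupby('Column A')['Column B']
# per-group fruit counts; NaN when the whole group is NaN
orange = g.transform(lambda s: np.nan if s.isna().all() else s.eq('Orange').sum())
apple = g.transform(lambda s: np.nan if s.isna().all() else s.eq('Apple').sum())

out = df.drop_duplicates('Column A').copy()
out['Column C'] = orange[out.index]
out['Column D'] = apple[out.index]
out['Column E'] = out['Column C'].map(
    lambda c: np.nan if pd.isna(c) else ('Yes' if c >= 1 else 'No'))
print(out)
```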
I need to select rows from a column (A) if it's not nan else from a another column (B), how to do this in Python?
The use case is that I'm inserting a new column C that will contain the result of an operation (func) on values from column A. Sometimes the values in A are NaN, and in these cases I want to calculate the value in C from B. An example of the result would be this:
| A | B | C |
| bla | bla2 | func(bla) | #read from A
| nan | bla3 | func(bla3) | #read from B
What you want is combine_first:
df['C'] = df['A'].combine_first(df['B']).apply(func)
No explicit iteration required here...
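A minimal runnable sketch (the frame and func are hypothetical stand-ins; func here is just str.upper):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['bla', np.nan], 'B': ['bla2', 'bla3']})
func = str.upper  # stand-in for the real operation

# combine_first takes A where present, otherwise falls back to B
df['C'] = df['A'].combine_first(df['B']).apply(func)
print(df['C'].tolist())  # ['BLA', 'BLA3']
```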
This code should solve your problem.
df['C'] = [row['B'] if pd.isna(row['A']) else row['A'] for _, row in df.iterrows()]
Basically you create a new list by iterating over all rows and selecting the correct value, then you add it to the df.
By applying a function to the dataframe that checks for NaN before it returns, you have the ability to switch to another value in the same row. It's important to pass axis=1 if you're using apply on a dataframe (all columns) rather than on a series (one column).
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, -2], 'B': [1, 2, 2]})

def fn(x):
    c = np.sqrt(x['A'])      # NaN when A is negative
    if np.isnan(c):
        c = np.sqrt(x['B'])  # fall back to column B
    return c

df['C'] = df.apply(fn, axis=1)
I have a dataframe of the form:
| A | B | C | D |
|---|---|---|---|
| a | x | r | 1 |
| a | x | s | 2 |
| a | y | r | 1 |
| b | w | t | 4 |
| b | z | v | 2 |
I'd like to be able to return something like (showing unique values and frequency)
A| freq of most common value in Column B |maximum of column D based on the most common value in Column B | most common value in Column B
a 2 2 x
b 1 4 w
At the moment I can calculate everything but the 3rd column of the result dataframe quite fast via
df = (df.groupby('A', sort=False)['B']
        .apply(lambda x: x.value_counts().head(1))
        .reset_index())
but to calculate the 2nd column ("maximum of column D based on the most common value in Column B") I have written a for-loop, which is slow for a lot of data.
Is there a fast way?
The question is linked to: Count values in dataframe based on entry
Use merge, then get the row with the maximum D per group via DataFrameGroupBy.idxmax:
df1 = (df.groupby('A', sort=False)['B']
.apply(lambda x: x.value_counts().head(1))
.reset_index()
.rename(columns={'level_1':'E'}))
#print (df1)
df = df1.merge(df, left_on=['A','E'], right_on=['A','B'], suffixes=('','_'))
df = df.loc[df.groupby('A')['D'].idxmax(), ['A','B','D','E']]
print (df)
A B D E
1 a 2 2 x
2 b 1 4 w
Consider doing this in 3 steps:
find most common B (as in your code):
df2 = (df.groupby('A', sort=False)['B']).apply(lambda x: x.value_counts().head(1)).reset_index()
build DataFrame with max D for each combination of A and B
df3 = df.groupby(['A','B']).agg({'D':max}).reset_index()
merge 2 DataFrames to find max Ds matching the A-B pairs selected earlier
df2.merge(df3, left_on=['A','level_1'], right_on=['A','B'])
The column D in the resulting DataFrame will be what you need
A level_1 B_x B_y D
0 a x 2 x 2
1 b w 1 w 4
I have a list of items that I loop over to find values, for example dfexample1:
#List of documents:
list1 = ['doc1.pdf', 'doc2.pdf', 'doc3.pdf']
#each document contains a DF with the same Test column and the same Test letters:
doc1.pdf: df1
Test | Value
a | 5.5
b | 6.5
c | 8.5
doc2.pdf: df2
Test | Value
a | 6.5
b | 11.5
c | 13.5
doc3.pdf: df3
Test | Value
a | 12.5
b | 3.5
c | 9.5
I have a df with the Test values being the columns
a | b | c
In each loop iteration I am trying to extract the value of each test and add them to the df.
In the first example, the same analysis will be repeated over a list of pdf documents, so I need to extract each of those values and add them into the df.
I tried this:
for item in list1:
    for index, row in dfexample1.iterrows():
        if item in row[0]:
            value1 = row[1]
Verifying if the value exists as a column in the df:
if item in df.columns:
    # How can I insert this value in the next row?
My expected result would be:
| a    | b    | c    |
|------|------|------|
| 5.5  | 6.5  | 8.5  |
| 6.5  | 11.5 | 13.5 |
| 12.5 | 3.5  | 9.5  |
.T would not work, I guess, because dfexample1 will be repeated with different values while looping across the documents.
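One way to sketch the loop (df1–df3 below are hypothetical stand-ins for the frames extracted from each document): collect one Series per document and build the wide frame once at the end, instead of inserting row by row:

```python
import pandas as pd

# hypothetical stand-ins for the frames parsed from each pdf
df1 = pd.DataFrame({'Test': ['a', 'b', 'c'], 'Value': [5.5, 6.5, 8.5]})
df2 = pd.DataFrame({'Test': ['a', 'b', 'c'], 'Value': [6.5, 11.5, 13.5]})
df3 = pd.DataFrame({'Test': ['a', 'b', 'c'], 'Value': [12.5, 3.5, 9.5]})

rows = []
for d in (df1, df2, df3):          # in practice: one frame per document
    rows.append(d.set_index('Test')['Value'])

df = pd.DataFrame(rows).reset_index(drop=True)
df.columns.name = None             # drop the leftover 'Test' axis label
print(df)  # one row per document, columns a, b, c
```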