There are a few solutions on SO for reversing the order of all columns, for example:
df = df.iloc[:,::-1]
What I want to achieve is to reverse only the columns from the third column on: keep A and B in place and reverse the order of everything after them. I have a couple of hundred columns, so I don't want to write the order out manually.
My initial columns are:
| A | B | C | D | E | F |
|---|---|---|---|---|---|
And The Result I am looking for is:
| A | B | F | E | D | C |
|---|---|---|---|---|---|
Thanks!
df[list(df.columns[:2]) + list(df.columns[:1:-1])]
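A quick check of that expression on a hypothetical one-row frame with columns A–F:

```python
import pandas as pd

# hypothetical frame just to demonstrate the reordering
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=list('ABCDEF'))

# keep the first two columns, then take the rest in reverse order
out = df[list(df.columns[:2]) + list(df.columns[:1:-1])]
print(list(out.columns))  # ['A', 'B', 'F', 'E', 'D', 'C']
```

The slice df.columns[:1:-1] walks backwards from the last column and stops before index 1, so A and B stay put no matter how many columns follow.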
Use numpy.r_ to concatenate the index ranges, then select with DataFrame.iloc:
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=list('ABCDEF'))
print(df.iloc[:, np.r_[0:2, len(df.columns)-1:1:-1]])
Empty DataFrame
Columns: [A, B, F, E, D, C]
Index: []
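To see what numpy.r_ builds here (with six columns, len(df.columns)-1 is 5):

```python
import numpy as np

# 0:2 keeps the first two positions; 5:1:-1 walks back from the
# last column index down to (but not including) index 1
idx = np.r_[0:2, 5:1:-1]
print(idx)  # [0 1 5 4 3 2]
```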
Or select both slices and combine them with join:
print(df.iloc[:, :2].join(df.iloc[:, :1:-1]))
Empty DataFrame
Columns: [A, B, F, E, D, C]
Index: []
Related
I have a dataframe row like the one below
a | b | c | d
1 |-700.5;-1000.0;200.0| yes | blue
I want to change column b to be numeric so I can do data work like sorting on it, but when I try the code below
df = pd.to_numeric(df["b"])
print(df)
I get the error ValueError: Unable to parse string, or an issue with the "-".
Maybe you are looking for something like this:
df.assign(b = df.b.str.split(';')).explode('b')
Output:
a b c d
0 1 -700.5 yes blue
0 1 -1000.0 yes blue
0 1 200.0 yes blue
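If the goal is sorting, the exploded column can then be cast with pd.to_numeric — a sketch using the sample row:

```python
import pandas as pd

# hypothetical frame matching the example row
df = pd.DataFrame({'a': [1], 'b': ['-700.5;-1000.0;200.0'],
                   'c': ['yes'], 'd': ['blue']})

out = df.assign(b=df['b'].str.split(';')).explode('b')
out['b'] = pd.to_numeric(out['b'])  # strings -> floats; '-' parses fine
out = out.sort_values('b')
print(out['b'].tolist())  # [-1000.0, -700.5, 200.0]
```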
If you need the values kept in the same column, here is one way to do it:
# split, sort descending, then assign back to the same column
df['b'] = df['b'].str.split(';').apply(lambda x: sorted([float(i) for i in x], reverse=True))
df
a b c d
0 1 [200.0, -700.5, -1000.0] yes blue
Suppose I have a data frame as follows:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column A': [12, 12, 12, 13, 15, 16, 141, 141, 141, 141],
    'Column B': ['Apple', 'Apple', 'Orange', 'Apple', np.nan,
                 'Orange', 'Apple', np.nan, 'Apple', 'Apple']})
Based on these conditions:
If values in column A are repeated, count the word 'Orange' in Column B and put the count in a new Column C (for example, there are 3 rows for 12 and the count of 'Orange' is 1, so this 1 should be in new Column C). For the non-repeated rows, just paste the corresponding values.
If values in column A are repeated, count the word 'Apple' in Column B and put the count in a new Column D (for example, there are 3 rows for 12 and the count of 'Apple' is 2, so this 2 should be in new Column D). For the non-repeated rows, just paste the corresponding values.
For both repeated and non-repeated rows from Column A, if the word 'Orange' is present in Column B write 'Yes', else 'No', in Column E.
I would like the output below. I was trying this in a Python Jupyter notebook; can anyone please help me get an output like this:
|   | Column A | Column B | Column C | Column D | Column E |
|---|----------|----------|----------|----------|----------|
| 0 | 12       | Apple    | 1        | 2        | Yes      |
| 1 | 13       | Apple    | 0        | 1        | No       |
| 2 | 15       | NaN      | NaN      | NaN      | NaN      |
| 3 | 16       | Orange   | 1        | 0        | Yes      |
| 4 | 141      | Apple    | 0        | 3        | No       |
Thanks in advance:)
I don't think there is a compact one-liner for this, but the following code works.
First, define a function count(x, a) which returns NaN if x contains NaN, and otherwise the number of occurrences of a in x.
The function will be used for the apply function.
Then, use groupby and apply list function.
temp = df.copy().groupby('Column A')['Column B'].apply(list)
After that, temp becomes
Column A
12 [Apple, Apple, Orange]
13 [Apple]
15 [nan]
16 [Orange]
141 [Apple, nan, Apple, Apple]
Name: Column B, dtype: object
So, based on temp, we can count the number of apples and oranges.
Since df has duplicates, remove them and add the new columns (Column C, D, and E):
df.drop_duplicates(subset = ['Column A'], keep = "first", inplace = True)
df['Column C'] = temp.apply(count, a = "Orange").values
df['Column D'] = temp.apply(count, a = "Apple").values
df['Column E'] = df['Column C'].apply(lambda x: np.nan if pd.isna(x) else ('Yes' if x >= 1 else 'No'))
Edit
I am sorry, I missed the function count:
def count(x, a):
    if type(x[0]) == float:
        return np.nan
    else:
        return x.count(a)
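As an alternative sketch on the same sample data, groupby().transform can compute the per-group counts without building lists first (Column E here emits Yes/No/NaN to match the expected output):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column A': [12, 12, 12, 13, 15, 16, 141, 141, 141, 141],
    'Column B': ['Apple', 'Apple', 'Orange', 'Apple', np.nan,
                 'Orange', 'Apple', np.nan, 'Apple', 'Apple']})

g = df.groupby('Column A')['Column B']
# per-group fruit counts; NaN when the whole group is NaN
orange = g.transform(lambda s: np.nan if s.isna().all() else s.eq('Orange').sum())
apple = g.transform(lambda s: np.nan if s.isna().all() else s.eq('Apple').sum())

out = df.drop_duplicates('Column A').copy()
out['Column C'] = orange[out.index]
out['Column D'] = apple[out.index]
out['Column E'] = out['Column C'].map(
    lambda c: np.nan if pd.isna(c) else ('Yes' if c >= 1 else 'No'))
print(out)
```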
I need to select rows from a column (A) if it's not nan else from a another column (B), how to do this in Python?
The use case is that I'm inserting a new column C that will contain the result of an operation (func) on values from column A. Sometimes the values in A are NaN, and in these cases I want to calculate the value in C from B. An example of the result would be this:
| A | B | C |
| bla | bla2 | func(bla) | #read from A
| nan | bla3 | func(bla3) | #read from B
What you want is combine_first:
df['C'] = df['A'].combine_first(df['B']).apply(func)
No explicit iteration required here...
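A minimal runnable sketch (the frame and func are hypothetical stand-ins; func here is just str.upper):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['bla', np.nan], 'B': ['bla2', 'bla3']})
func = str.upper  # stand-in for the real operation

# combine_first takes A where present, otherwise falls back to B
df['C'] = df['A'].combine_first(df['B']).apply(func)
print(df['C'].tolist())  # ['BLA', 'BLA3']
```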
This code should solve your problem.
df['C'] = [row['B'] if pd.isna(row['A']) else row['A'] for _, row in df.iterrows()]
Basically you create a new list by iterating over all rows and selecting the correct value, then you add it to the df.
By applying a function to the dataframe that checks for NaN before it returns, you have the ability to switch to another value in the same row. It's important to pass axis=1 if you're using apply on a dataframe (all columns) rather than on a series (one column).
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, -2], 'B': [1, 2, 2]})

def fn(x):
    c = np.sqrt(x['A'])      # NaN when A is negative
    if np.isnan(c):
        c = np.sqrt(x['B'])  # fall back to column B
    return c

df['C'] = df.apply(fn, axis=1)
I have a dataframe of the form:
| A | B | C | D |
|---|---|---|---|
| a | x | r | 1 |
| a | x | s | 2 |
| a | y | r | 1 |
| b | w | t | 4 |
| b | z | v | 2 |
I'd like to be able to return something like (showing unique values and frequency)
A| freq of most common value in Column B |maximum of column D based on the most common value in Column B | most common value in Column B
a 2 2 x
b 1 4 w
At the moment I can calculate everything but the 3rd column of the result dataframe quite fast via
df = (df.groupby('A', sort=False)['B']
        .apply(lambda x: x.value_counts().head(1))
        .reset_index())
but to calculate the 2nd column ("maximum of column D based on the most common value in Column B") I have written a for-loop, which is slow for a lot of data.
Is there a fast way?
The question is linked to: Count values in dataframe based on entry
Use merge, then get the row with the maximum D per group via DataFrameGroupBy.idxmax:
df1 = (df.groupby('A', sort=False)['B']
.apply(lambda x: x.value_counts().head(1))
.reset_index()
.rename(columns={'level_1':'E'}))
#print (df1)
df = df1.merge(df, left_on=['A','E'], right_on=['A','B'], suffixes=('','_'))
df = df.loc[df.groupby('A')['D'].idxmax(), ['A','B','D','E']]
print (df)
A B D E
1 a 2 2 x
2 b 1 4 w
Consider doing this in 3 steps:
find most common B (as in your code):
df2 = (df.groupby('A', sort=False)['B']).apply(lambda x: x.value_counts().head(1)).reset_index()
build DataFrame with max D for each combination of A and B
df3 = df.groupby(['A','B']).agg({'D':max}).reset_index()
merge 2 DataFrames to find max Ds matching the A-B pairs selected earlier
df2.merge(df3, left_on=['A','level_1'], right_on=['A','B'])
The column D in the resulting DataFrame will be what you need
A level_1 B_x B_y D
0 a x 2 x 2
1 b w 1 w 4
I have a list of items that I loop over to find values, for example dfexample1:
#List of documents:
list1 = ['doc1.pdf', 'doc2.pdf', 'doc3.pdf']
#each document contains a DF with the same Test column and the same Test letters:
doc1.pdf: df1
Test | Value
a | 5.5
b | 6.5
c | 8.5
doc2.pdf: df2
Test | Value
a | 6.5
b | 11.5
c | 13.5
doc3.pdf: df3
Test | Value
a | 12.5
b | 3.5
c | 9.5
I have a df with the Test values being the columns
a | b | c
In each loop iteration I am trying to extract the value of each test and add them to the df.
In the first example, the same analysis will be repeated over a list of pdf documents, so I need to extract each of those values and add them into the df.
I tried this:
for item in list1:
    for index, row in dfexample1.iterrows():
        if item in row[0]:
            value1 = row[1]
Verifying if the value exists as a column in the df:
if item in df.columns:
    # How can I insert this value in the next row?
My expected result would be:
| a    | b    | c    |
|------|------|------|
| 5.5  | 6.5  | 8.5  |
| 6.5  | 11.5 | 13.5 |
| 12.5 | 3.5  | 9.5  |
.T would not work, I guess, because dfexample1 will be repeated with different values while looping across the documents.
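One way to sketch the loop (df1–df3 below are hypothetical stand-ins for the frames extracted from each document): collect one Series per document and build the wide frame once at the end, instead of inserting row by row:

```python
import pandas as pd

# hypothetical stand-ins for the frames parsed from each pdf
df1 = pd.DataFrame({'Test': ['a', 'b', 'c'], 'Value': [5.5, 6.5, 8.5]})
df2 = pd.DataFrame({'Test': ['a', 'b', 'c'], 'Value': [6.5, 11.5, 13.5]})
df3 = pd.DataFrame({'Test': ['a', 'b', 'c'], 'Value': [12.5, 3.5, 9.5]})

rows = []
for d in (df1, df2, df3):          # in practice: one frame per document
    rows.append(d.set_index('Test')['Value'])

df = pd.DataFrame(rows).reset_index(drop=True)
df.columns.name = None             # drop the leftover 'Test' axis label
print(df)  # one row per document, columns a, b, c
```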