I am using pandas with Python.
I have a column in which the first value is zero.
There are other zeros in the column as well, but I don't want to delete those.
I want to delete only this first cell and move the rest of the column up by one position.
If it is easier, I can turn the first zero into an empty cell and then delete it, but I can't find any way to delete one specific cell and shift the rest of the column up.
So far I have searched existing Stack Overflow, Quora, and GitHub posts, but I can't find what I am looking for.
I believe you need to shift the column first and then replace the last value, which becomes NaN:
import pandas as pd

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})
print (df)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b
If the column has no NaNs, fillna alone is enough to replace the NaN that shift introduces:
df['A'] = df['A'].shift(-1).fillna('AAA')
print (df)
A B C D E F
0 b 4 7 1 5 a
1 c 5 8 3 3 a
2 d 4 9 5 6 a
3 e 5 4 7 9 b
4 f 5 2 1 2 b
5 AAA 4 3 0 4 b
If the column may already contain NaNs, set only the last value with iloc; the get_loc function returns the position of column A:
df['A'] = df['A'].shift(-1)
df.iloc[-1, df.columns.get_loc('A')] = 'AAA'
print (df)
A B C D E F
0 b 4 7 1 5 a
1 c 5 8 3 3 a
2 d 4 9 5 6 a
3 e 5 4 7 9 b
4 f 5 2 1 2 b
5 AAA 4 3 0 4 b
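For the numeric column described in the question, the same idea can be sketched as follows. This is only a minimal illustration: the column names and the placeholder -1 are made up, and it assumes a pandas version in which shift accepts the fill_value parameter.

```python
import pandas as pd

# Hypothetical data: column A starts with a zero that should be removed,
# while the other zeros further down must stay.
df = pd.DataFrame({'A': [0, 3, 0, 7, 1],
                   'B': [10, 20, 30, 40, 50]})

# Shift column A up by one position; the first cell is dropped and the
# vacated last cell is filled with a placeholder instead of NaN.
df['A'] = df['A'].shift(-1, fill_value=-1)
print(df)
```

Only column A moves; column B keeps its original alignment, so the rows no longer describe the same records after the shift.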
Related
I am new to Python and am trying to rearrange rows.
I have a dataframe such as:
X Y
1 a
2 d
3 c
4 a
5 b
6 e
7 a
8 b
I have two questions:
1- How can I swap the 2nd row with the 5th, such as:
X Y
1 a
5 b
3 c
4 a
2 d
6 e
7 a
8 b
2- How can I put the 6th row above the 3rd row, such as:
X Y
1 a
2 d
6 e
3 c
4 a
5 b
7 a
8 b
For the first question use DataFrame.iloc. Python counts from 0, so select the second row with 1 and the fifth with 4:
df.iloc[[1, 4]] = df.iloc[[4, 1]]
print (df)
X Y
0 1 a
1 5 b
2 3 c
3 4 a
4 2 d
5 6 e
6 7 a
7 8 b
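For the second question, rename the index label of the row to move (here 5) to the label of the row it should follow (here 1), then sort the index with a stable algorithm, mergesort, so the renamed row lands directly after the original row 1 (the output below starts from the original DataFrame):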
df = df.rename({5:1}).sort_index(kind='mergesort', ignore_index=True)
print (df)
X Y
0 1 a
1 2 d
2 6 e
3 3 c
4 4 a
5 5 b
6 7 a
7 8 b
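Both rearrangements can also be expressed as a list of row positions passed to iloc. The following is a rough sketch of that alternative, not the method used above; the names order, swapped and moved are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'X': range(1, 9), 'Y': list('adcabeab')})

# Question 1: swap the 2nd and 5th rows (positions 1 and 4).
order = list(range(len(df)))
order[1], order[4] = order[4], order[1]
swapped = df.iloc[order].reset_index(drop=True)

# Question 2: move the 6th row (position 5) above the 3rd row (position 2).
order = list(range(len(df)))
order.insert(2, order.pop(5))
moved = df.iloc[order].reset_index(drop=True)

print(swapped)
print(moved)
```

Building the order explicitly avoids relying on index labels, which helps when the index is not a simple 0..n-1 range.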
I would like to obtain the 'Value' column below, from the original df:
A B C Column_To_Use
0 2 3 4 A
1 5 6 7 C
2 8 0 9 B
A B C Column_To_Use Value
0 2 3 4 A 2
1 5 6 7 C 7
2 8 0 9 B 0
Use DataFrame.lookup (deprecated in pandas 1.2 and removed in 2.0, so this works only on older versions):
df['Value'] = df.lookup(df.index, df['Column_To_Use'])
print (df)
A B C Column_To_Use Value
0 2 3 4 A 2
1 5 6 7 C 7
2 8 0 9 B 0
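On pandas versions where DataFrame.lookup is no longer available, one possible replacement is to factorize the selector column and index the underlying NumPy array. This is a sketch of that approach, with the sample frame rebuilt from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2, 5, 8],
                   'B': [3, 6, 0],
                   'C': [4, 7, 9],
                   'Column_To_Use': ['A', 'C', 'B']})

# idx holds, for each row, the position of its column name within cols.
idx, cols = pd.factorize(df['Column_To_Use'])

# Reorder the columns to match cols, then pick one value per row.
df['Value'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
print(df)
```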
Given the following DataFrame:
>>> pd.DataFrame(data=[['a',1],['a',2],['b',3],['b',4],['c',5],['c',6],['d',7],['d',8],['d',9],['e',10]],columns=['key','value'])
key value
0 a 1
1 a 2
2 b 3
3 b 4
4 c 5
5 c 6
6 d 7
7 d 8
8 d 9
9 e 10
I'm looking for a method that will change the structure based on the key value, like so:
a b c d e
0 1 3 5 7 10
1 2 4 6 8 10 <- 10 is duplicated
2 2 4 6 9 10 <- 10 is duplicated
The result row number is as the longest group count (d in the above example) and the missing values are duplicates of the last available value.
Create a MultiIndex with set_index, using a counter column from cumcount, reshape with unstack, replace missing values with the last non-missing ones using ffill, and finally convert all data to integers if necessary:
df = df.set_index([df.groupby('key').cumcount(),'key'])['value'].unstack().ffill().astype(int)
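The one-liner can be easier to follow when split into its intermediate steps. The sketch below does that with the question's data; the variable names counter, wide and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame(data=[['a', 1], ['a', 2], ['b', 3], ['b', 4], ['c', 5],
                        ['c', 6], ['d', 7], ['d', 8], ['d', 9], ['e', 10]],
                  columns=['key', 'value'])

# Position of each row within its key group: 0, 1, 0, 1, ...
counter = df.groupby('key').cumcount()

# One column per key, one row per position; shorter groups get NaN.
wide = df.set_index([counter, 'key'])['value'].unstack()

# Fill each gap with the last available value above it, then restore integers.
result = wide.ffill().astype(int)
print(result)
```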
Another solution with custom lambda function:
df = (df.groupby('key')['value']
.apply(lambda x: pd.Series(x.values))
.unstack(0)
.ffill()
.astype(int))
print (df)
key a b c d e
0 1 3 5 7 10
1 2 4 6 8 10
2 2 4 6 9 10
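Using pivot with groupby + cumcount (keyword arguments are used because recent pandas versions no longer accept positional arguments in pivot):
df.assign(key2=df.groupby('key').cumcount()).pivot(index='key2', columns='key', values='value').ffill().astype(int)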
Out[214]:
key a b c d e
key2
0 1 3 5 7 10
1 2 4 6 8 10
2 2 4 6 9 10
I am trying to update the last column's value for all rows in a CSV file using pandas, but while updating the value, the other columns' values are dropped.
file = r'Test.csv'
# Read the file
df = pd.read_csv(file, error_bad_lines=False)
# df.at[3, "ingestion"] = '20'
df.set_value(1, "ingestion", '30')
df.to_csv("Test.csv", index=False, sep='|')
Use DataFrame.iloc with -1 to select the last column and : to select all rows:
import pandas as pd

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})
print (df)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b
df.iloc[:, -1] = '20'
print (df)
A B C D E F
0 a 4 7 1 5 20
1 b 5 8 3 3 20
2 c 4 9 5 6 20
3 d 5 4 7 9 20
4 e 5 2 1 2 20
5 f 4 3 0 4 20
EDIT:
To instead set every value in the last row to the last value, swap -1 and : and read the value in the last row and last column with DataFrame.iat:
df.iloc[-1, :] = df.iat[-1, -1]
print (df)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 b b b b b b
pd.DataFrame.set_value is not appropriate for setting all the values in a column. As per the docs, it is used to set a scalar at a specific row and column label combination.
Moreover, since v0.21, it has been deprecated in favour of the .at / .iat accessors.
Instead, you can set the value directly by extracting the final column label, assuming you have no duplicate column names:
df[df.columns[-1]] = '20'
Or, more directly, you can use the iloc accessor:
df.iloc[:, -1] = '20'
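Putting the pieces together, a round trip might look like the sketch below. It uses an in-memory, pipe-delimited stand-in for Test.csv (the sample contents are invented) and passes the same sep when reading and writing. The question's code reads with the default comma separator but writes with sep='|'; whether that mismatch explains the dropped values is only a guess.

```python
import io
import pandas as pd

# Invented stand-in for Test.csv, assumed here to be pipe-delimited.
raw = "id|name|ingestion\n1|x|10\n2|y|10\n"
df = pd.read_csv(io.StringIO(raw), sep='|')

# Set every value in the last column.
df[df.columns[-1]] = 20

# Set a single cell by row label and column label with .at.
df.at[1, 'ingestion'] = 30

# Write back with the same separator that was used for reading.
out = io.StringIO()
df.to_csv(out, index=False, sep='|')
print(out.getvalue())
```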
In SQL, select a.*, count(a.id) as N from table a group by a.name would give me a new column 'N' containing the count as per my group by specification.
However, in pandas, if I try df['name'].value_counts(), I get the count, but not as a column in the original dataframe.
Is there a way to get the count as a column in the original dataframe in a single step/statement?
It seems you need groupby with transform and the size function:
import pandas as pd

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'name':list('aaabcc')})
print (df)
A B C D E name
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 c
5 f 4 3 0 4 c
df['new'] = df.groupby('name')['name'].transform('size')
print (df)
A B C D E name new
0 a 4 7 1 5 a 3
1 b 5 8 3 3 a 3
2 c 4 9 5 6 a 3
3 d 5 4 7 9 b 1
4 e 5 2 1 2 c 2
5 f 4 3 0 4 c 2
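Since the question starts from value_counts, another way to reach the same column is to map each name onto its count. This is a small sketch of that alternative, trimmed to the name column:

```python
import pandas as pd

df = pd.DataFrame({'name': list('aaabcc')})

# value_counts returns a Series indexed by name; map looks up each row's name in it.
df['new'] = df['name'].map(df['name'].value_counts())
print(df)
```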
What is the difference between size and count in pandas?
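In short, size counts every row in a group, while count counts only the non-missing values. A small sketch with made-up data containing one NaN shows the difference:

```python
import numpy as np
import pandas as pd

# Made-up data: group 'a' has two rows, one of which is missing a value.
df = pd.DataFrame({'name': ['a', 'a', 'b'],
                   'val': [1.0, np.nan, 3.0]})

sizes = df.groupby('name')['val'].size()    # rows per group, NaN included
counts = df.groupby('name')['val'].count()  # non-NaN values per group

print(sizes)
print(counts)
```

For a column without missing values the two give the same numbers.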