pandas: Unable to write values to single row dataframe - python

I have a single-row DataFrame (df) in which I want to set the value of every column, addressing cells only by their integer positions.
The DataFrame df has the following form:
   a  b  c
1  0  0  0
2  0  0  0
3  0  0  0
df.iloc[[0],[1]] = predictions[:1]
This gives me the following warning and does not write anything to the row:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
However, when I try:
pred_row.iloc[0,1] = predictions[:1]
I get this error:
ValueError: Incompatible indexer with Series
Is there a way to write a value to a single-row DataFrame?
predictions holds arbitrary values that I am trying to write into a particular cell of df.

To set one element of a Series into a DataFrame cell, select a scalar with predictions[0] instead of the one-element Series predictions[:1]:
print (df)
   a  b  c
1  0  0  0
2  0  0  0
3  0  0  0
predictions = pd.Series([1,2,3])
print (predictions)
0 1
1 2
2 3
dtype: int64
df.iloc[0, 1] = predictions[0]
# more general: select one element of the Series by position
# df.iloc[0, 1] = predictions.iat[0]
print (df)
   a  b  c
1  0  1  0
2  0  0  0
3  0  0  0
Details:
#scalar
print (predictions[0])
1
#one element Series
print (predictions[:1])
0 1
dtype: int64
Converting the one-element Series to a one-element array also works, but setting a scalar is simpler:
df.iloc[0, 1] = predictions[:1].values
print (df)
   a  b  c
1  0  1  0
2  0  0  0
3  0  0  0
print (predictions[:1].values)
[1]
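As a minimal sketch of the point above (the frame and the Series are rebuilt here from the printed output, so their construction is an assumption), selecting by position with .iat always yields a scalar, which can then be written to a single cell:

```python
import pandas as pd

# Rebuild the example frame: three rows labelled 1..3, all zeros.
df = pd.DataFrame(0, index=[1, 2, 3], columns=["a", "b", "c"])
predictions = pd.Series([1, 2, 3])

# predictions[:1] is a one-element Series; predictions.iat[0] is a scalar.
one_element = predictions[:1]
scalar = predictions.iat[0]

# Write the scalar into the cell at row position 0, column position 1.
df.iloc[0, 1] = scalar

# .iat on the DataFrame is an equivalent way to set a single cell by position.
df.iat[1, 2] = predictions.iat[1]
```

Using .iat on the Series avoids depending on the Series having a label 0, which predictions[0] would require.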

Related

pandas: replace values in column with the last character in the column name

I have a dataframe as follows:
import pandas as pd
df = pd.DataFrame({'sent.1':[0,1,0,1],
'sent.2':[0,1,1,0],
'sent.3':[0,0,0,1],
'sent.4':[1,1,0,1]
})
I am trying to replace the non-zero values with the digit at the end of each column name (the character at index 5, which is the numeric part of the name), so the output should be:
sent.1 sent.2 sent.3 sent.4
0 0 0 0 4
1 1 2 0 4
2 0 2 0 0
3 1 0 3 4
I have tried the following but it does not work,
print(df.replace(1, pd.Series([i[5] for i in df.columns], [i[5] for i in df.columns])))
However, when I use the full column names instead, the code works, so I am not sure which part is wrong.
print(df.replace(1, pd.Series(df.columns, df.columns)))
Since you're dealing with 1's and 0's, you can simply multiply the dataframe by a range:
df = df * range(1, df.shape[1] + 1)
Output:
sent.1 sent.2 sent.3 sent.4
0 0 0 0 4
1 1 2 0 4
2 0 2 0 0
3 1 0 3 4
Or, if you want to take the numbers from the column names:
df = df * df.columns.str.split('.').str[-1].astype(int)
You could also multiply a boolean mask by the strings to place them where the condition holds, then use where to restore the zeros:
mask = df.ne(0)
(mask*df.columns.str[5]).where(mask, 0)
To have integers:
mask = df.ne(0)
(mask*df.columns.str[5].astype(int))
output:
sent.1 sent.2 sent.3 sent.4
0 0 0 0 4
1 1 2 0 4
2 0 2 0 0
3 1 0 3 4
And another one, working with an arbitrary condition (here s.ne(0)):
df.apply(lambda s: s.mask(s.ne(0), s.name.rpartition('.')[-1]))
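For reference, here is a self-contained sketch of the same idea that does not rely on the values already being exactly 0 or 1. The names suffixes and out are only illustrative, and the approach assumes every column name ends in ".<integer>":

```python
import pandas as pd

df = pd.DataFrame({'sent.1': [0, 1, 0, 1],
                   'sent.2': [0, 1, 1, 0],
                   'sent.3': [0, 0, 0, 1],
                   'sent.4': [1, 1, 0, 1]})

# Take the part after the last '.' in each column name as an integer.
suffixes = [int(name.rsplit('.', 1)[-1]) for name in df.columns]

# Non-zero cells become 1 and zeros stay 0; then each column is
# multiplied by its own suffix (one multiplier per column).
out = df.ne(0).astype(int).mul(suffixes, axis='columns')
print(out)
```

Splitting on the last '.' rather than taking the character at index 5 keeps working if a suffix has more than one digit.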

Python Selecting and Adding row values of columns in dataframe to create an aggregated dataframe

I need to process my dataframe in Python such that I add the numeric values of numeric columns that lie between 2 rows of the dataframe.
The dataframe can be created using
df = pd.DataFrame(np.array([['a',0,1,0,0,0,0,'i'],
['b',1,0,0,0,0,0,'j'],
['c',0,0,1,0,0,0,'k'],
['None',0,0,0,1,0,0,'l'],
['e',0,0,0,0,1,0,'m'],
['f',0,1,0,0,0,0,'n'],
['None',0,0,0,1,0,0,'o'],
['h',0,0,0,0,1,0,'p']]),
columns=[0,1,2,3,4,5,6,7],
index=[0,1,2,3,4,5,6,7])
I need to add all rows that occur before the 'None' entries and move the aggregated row to a new dataframe that should look like:
Your DataFrame's dtypes are messed up because you built it from a NumPy array. An array holds a single dtype, so all the integers were converted to strings. Convert them back to numbers first:
df = df.apply(pd.to_numeric, errors='ignore')  # convert numeric-looking columns back to numbers
df['newkey'] = df[0].eq('None').cumsum()  # running count of 'None' rows gives the group key
df.loc[df[0].ne('None'), :].groupby('newkey').agg(lambda x: x.sum() if x.dtype == 'int64' else x.head(1))  # drop the 'None' rows, then aggregate
Out[742]:
        0  1  2  3  4  5  6  7
newkey
0       a  1  1  1  0  0  0  i
1       e  0  1  0  0  1  0  m
2       h  0  0  0  0  1  0  p
You can also specify the aggregation function for each column:
s = lambda s: sum(int(k) for k in s)
d = {i: s for i in range(8)}
d.update({0: 'first', 7: 'first'})
df.groupby((df[0] == 'None').cumsum().shift().fillna(0)).agg(d)
     0  1  2  3  4  5  6  7
0
0.0  a  1  1  1  1  0  0  i
1.0  e  0  1  0  1  1  0  m
2.0  h  0  0  0  0  1  0  p
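A possible way to sidestep the dtype problem altogether is to build the frame from a list of rows instead of a NumPy array, so the integer columns stay integers. The sketch below assumes the 'None' separator rows should be dropped, as in the first answer; the names is_sep, key, funcs and the group label 'block' are made up for illustration:

```python
import pandas as pd

rows = [['a', 0, 1, 0, 0, 0, 0, 'i'],
        ['b', 1, 0, 0, 0, 0, 0, 'j'],
        ['c', 0, 0, 1, 0, 0, 0, 'k'],
        ['None', 0, 0, 0, 1, 0, 0, 'l'],
        ['e', 0, 0, 0, 0, 1, 0, 'm'],
        ['f', 0, 1, 0, 0, 0, 0, 'n'],
        ['None', 0, 0, 0, 1, 0, 0, 'o'],
        ['h', 0, 0, 0, 0, 1, 0, 'p']]
df = pd.DataFrame(rows)  # columns are labelled 0..7, integers stay integers

# Each 'None' row starts a new block; the running count is the group key.
is_sep = df[0].eq('None')
key = is_sep.cumsum().rename('block')

# Sum the numeric columns, keep the first label of the text columns.
funcs = {col: 'sum' for col in range(1, 7)}
funcs.update({0: 'first', 7: 'first'})

# Drop the separator rows, aggregate per block, restore the column order.
result = df[~is_sep].groupby(key[~is_sep]).agg(funcs)[list(range(8))]
print(result)
```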

Create new column with max value with groupby

From the following dataframe, I am trying to add a new column that holds, for every row, the maximum value within that row's id group.
df
id value
1 0
1 0
1 0
2 0
2 1
3 1
3 1
Expected result:
id value new_column
1 0 0
1 0 0
1 0 0
2 0 1
2 1 1
3 1 1
3 1 1
I have tried:
df['new_column'] = df.groupby(['id'])['value'].idxmax()
Or:
df['new_column'] = df.groupby(['id'])['value'].max()
But neither of these gives the desired result.
You need to use transform for this:
df['new_column'] = df.groupby(['id'])['value'].transform('max')
This more succinctly replicates the following:
df['new_column'] = df['id'].map(df.groupby(['id'])['value'].max())
Remember that the result of a groupby aggregation is a Series whose index is set to the grouper column(s).
Since that index does not align with the index of your original dataframe, plain assignment will not put each group's value on the right rows.
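To make the alignment point concrete, here is a small sketch using the data from the question (the names per_group and via_map are only illustrative). The aggregated Series is indexed by id, while transform returns a result indexed like the original rows:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3],
                   'value': [0, 0, 0, 0, 1, 1, 1]})

# Aggregation: one row per group, indexed by the id values 1, 2, 3.
per_group = df.groupby('id')['value'].max()

# transform: same length and index as df, so it can be assigned directly.
df['new_column'] = df.groupby('id')['value'].transform('max')

# Equivalent: look up each row's id in the per-group result.
df['via_map'] = df['id'].map(per_group)
print(df)
```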

Mode of each digit in binary strings of a pandas column

I would like to find the mode value of each digit in binary strings of a pandas column. Suppose I have the following data
df = pd.DataFrame({'categories':['A','B','C'],'values':['001','110','111']})
so my data look like this
categories values
A 001
B 110
C 111
If we consider the column "values" at the first digit (0, 1, 1) of A, B, and C respectively, the mode value is 1. If we do the same for other digits, my expected output should be 111.
I can find the mode of a particular column, so I could split each bit into its own column, find the mode of each, and concatenate the results to get the expected output. However, when the data has many more columns of binary strings, I'm not sure this is still a good approach. I'm looking for a more elegant method. Do you have any suggestions?
I think you can use apply with Series and list to convert the digits to columns, and then mode:
print (df['values'].apply(lambda x: pd.Series(list(x))))
0 1 2
0 0 0 1
1 1 1 0
2 1 1 1
df1 = df['values'].apply(lambda x: pd.Series(list(x))).mode()
print (df1)
0 1 2
0 1 1 1
Finally, select the first row, convert it to a list and join:
print (''.join(df1.iloc[0].tolist()))
111
Another possible solution with list comprehension:
df = pd.DataFrame([list(x) for x in df['values']])
print (df)
0 1 2
0 0 0 1
1 1 1 0
2 1 1 1
If mode returns a DataFrame with several rows (which happens on ties), you can use apply with join on each row:
df = pd.DataFrame({'categories':['A','B','C', 'D'],'values':['001','110','111', '000']})
print (df)
categories values
0 A 001
1 B 110
2 C 111
3 D 000
print (pd.DataFrame([list(x) for x in df['values']]).mode())
0 1 2
0 0 0 0
1 1 1 1
df1 = pd.DataFrame([list(x) for x in df['values']]).mode().apply(''.join, axis=1)
print (df1)
0 000
1 111
dtype: object
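Putting the steps together for the original three-row data, a compact sketch might look like this (it assumes all strings have the same length and that there are no ties, so only the first row of the mode result is used):

```python
import pandas as pd

df = pd.DataFrame({'categories': ['A', 'B', 'C'],
                   'values': ['001', '110', '111']})

# One column per digit position, one row per original string.
digits = pd.DataFrame([list(s) for s in df['values']])

# Column-wise mode; with no ties the result has a single row.
modes = digits.mode()

# Join the per-position modes back into one string.
result = ''.join(modes.iloc[0])
print(result)
```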

Using two different data frames to compute new variable

I have two dataframes of the same dimensions that look like:
df1
ID flag
0 1
1 0
2 1
df2
ID flag
0 0
1 1
2 0
In both dataframes I want to create a new variable that denotes an additive flag. So the new variable will look like this:
df1
ID flag new_flag
0 1 1
1 0 1
2 1 1
df2
ID flag new_flag
0 0 1
1 1 1
2 0 1
So if either flag column is 1, the new flag will be 1.
I tried this code:
df1['new_flag']= 1
df2['new_flag']= 1
df1['new_flag'][(df1['flag']==0)&(df1['flag']==0)]=0
df2['new_flag'][(df2['flag']==0)&(df2['flag']==0)]=0
I would expect the same number of 1s in both new_flag columns, but they differ. Is this because I'm not going row by row, as in this question?
pandas create new column based on values from other columns
If so, how do I include criteria from both dataframes?
You can use np.logical_or to achieve this. Here df1 is set to all 0's except for the last row, so that the result is not just a column of 1's. Casting the result of np.logical_or with astype(int) converts the boolean array to 1 and 0:
In [108]:
df1['new_flag'] = np.logical_or(df1['flag'], df2['flag']).astype(int)
df2['new_flag'] = np.logical_or(df1['flag'], df2['flag']).astype(int)
df1
Out[108]:
ID flag new_flag
0 0 0 0
1 1 0 1
2 2 1 1
In [109]:
df2
Out[109]:
ID flag new_flag
0 0 0 0
1 1 1 1
2 2 0 1
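The answer above combines the two flag columns row by row, which relies on both frames having the same index. As a hedged variant, the sketch below lines the rows up on the ID column instead and uses the | operator in place of np.logical_or; the names flags1, flags2 and combined are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [0, 1, 2], 'flag': [0, 0, 1]})
df2 = pd.DataFrame({'ID': [0, 1, 2], 'flag': [0, 1, 0]})

# Index each flag column by ID so the rows are matched on ID.
flags1 = df1.set_index('ID')['flag'].astype(bool)
flags2 = df2.set_index('ID')['flag'].astype(bool)

# Element-wise OR, converted back to 0/1.
combined = (flags1 | flags2).astype(int)

# Look up the combined flag for each row's ID in both frames.
df1['new_flag'] = df1['ID'].map(combined)
df2['new_flag'] = df2['ID'].map(combined)
print(df1)
print(df2)
```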
