I have a dataset in pandas with column pid (patient id), and code (drug code), sorted in rows as the example shows. I need to convert them to 1 patient/row, and list all the drugs as attributes for each patient.
What I have now:
pid code
1 Az
1 Bn
2 Az
2 Bn
2 C4
3 Bn
3 C4
3 Dx
4 Az
4 Bn
4 Dx
4 E
5 C4
5 Dx
5 E
I need to convert it to:
pid Az Bn C4 Dx E
1 y y n n n
2 y y y n n
3 n y y y n
4 y y n y y
5 n n y y y
If I understand correctly, you can use crosstab:
pd.crosstab(df.pid,df.code).replace({1:'y',0:'n'})
Out[231]:
code Az Bn C4 Dx E
pid
1 y y n n n
2 y y y n n
3 n y y y n
4 y y n y y
5 n n y y y
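One caveat with the crosstab approach: crosstab counts occurrences, so a repeated (pid, code) pair would produce a 2 that the 1/0 replacement leaves untouched. A minimal sketch that tolerates repeats, assuming the same pid and code columns (the helper name to_flags is made up for illustration):

```python
import numpy as np
import pandas as pd

def to_flags(df):
    """Return one row per pid with 'y'/'n' for each drug code."""
    counts = pd.crosstab(df["pid"], df["code"])
    # Any count above zero means the patient has the drug at least once.
    return pd.DataFrame(
        np.where(counts > 0, "y", "n"),
        index=counts.index,
        columns=counts.columns,
    )

df = pd.DataFrame({
    "pid": [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5],
    "code": ["Az", "Bn", "Az", "Bn", "C4", "Bn", "C4", "Dx",
             "Az", "Bn", "Dx", "E", "C4", "Dx", "E"],
})
flags = to_flags(df)
print(flags)
```

Comparing with > 0 instead of replacing exact values keeps the result strictly 'y'/'n' even when the input has duplicate rows.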
One way is to pivot your dataframe
new_df = df.assign(values='y').pivot(index='pid', columns='code', values='values').fillna('n')
>>> new_df
code Az Bn C4 Dx E
pid
1 y y n n n
2 y y y n n
3 n y y y n
4 y y n y y
5 n n y y y
Having fun!
Fun 1
Create a Series with a MultiIndex and unstack
pd.Series('y', df.values.T.tolist()).unstack(fill_value='n')
Az Bn C4 Dx E
1 y y n n n
2 y y y n n
3 n y y y n
4 y y n y y
5 n n y y y
Fun 2
Use defaultdict
from collections import defaultdict
d = defaultdict(dict)
# itertuples yields (index, pid, code) for each row
for i, p, c in df.itertuples():
    d[c][p] = 'y'
pd.DataFrame(d).fillna('n')
Az Bn C4 Dx E
1 y y n n n
2 y y y n n
3 n y y y n
4 y y n y y
5 n n y y y
Fun 3
i, r = pd.factorize(df.pid)
j, c = pd.factorize(df.code)
e = np.empty((len(r), len(c)), str)
e.fill('n')
e[i, j] = 'y'
pd.DataFrame(e, r, c)
Az Bn C4 Dx E
1 y y n n n
2 y y y n n
3 n y y y n
4 y y n y y
5 n n y y y
Hey guys I hope you're doing well.
The problem I have is that my loop is not well defined, therefore the condition that I give it is not met. The print of the DataFrame that I implemented inside the while loop is not performed when the condition is not met.
This is the code I have so far. By implementing the while loop it stopped returning me the modified dataframe. As I said before, the loop is poorly constructed.
Dataframe content:
1 2 3 4 5 6 7 8 9
A 5 3 X X 7 X X X X
B 6 X X 1 9 5 X X X
C X 9 8 X X X X 6 X
D 8 X X X 6 X X X 3
E 4 X X 8 X 3 X X 1
F 7 X X X 2 X X X 6
G X 6 X X X X 2 8 X
H X X X 4 1 9 X X 5
I X X X X 8 X X 7 9
Code:
import pandas as pd

def modifyDF():
    T = pd.read_fwf('file', header= None, names=['1','2','3','4','5','6','7','8','9'])
    T = T.rename(index={0:'A',1:'B',2:'C',3:'D',4:'E',5:'F',6:'G',7:'H',8:'I'})
    df = pd.DataFrame(T)
    print(T,'\n')
    x= input('row: ')
    y= input('column: ')
    v= input('value: ')
    while 'X' in df:
        f = df.loc[x,y]= v
        print(f)
        while 'X' not in df:
            break

modifyDF()
Expected OUTPUT:
1 2 3 4 5 6 7 8 9
A 5 3 X X 7 X X X X
B 6 X X 1 9 5 X X X
C X 9 8 X X X X 6 X
D 8 X X X 6 X X X 3
E 4 X X 8 X 3 X X 1
F 7 X X X 2 X X X 6
G X 6 X X X X 2 8 X
H X X X 4 1 9 X X 5
I X X X X 8 X X 7 9
row: D #For example
column: 2 #For example
value: 1 #For example
#The modified dataframe:
1 2 3 4 5 6 7 8 9
A 5 3 X X 7 X X X X
B 6 X X 1 9 5 X X X
C X 9 8 X X X X 6 X
D 8 1 X X 6 X X X 3
E 4 X X 8 X 3 X X 1
F 7 X X X 2 X X X 6
G X 6 X X X X 2 8 X
H X X X 4 1 9 X X 5
I X X X X 8 X X 7 9
#The goal would be for this to run like a loop until there are no 'X' left in the dataframe.
I really appreciate your help :)
Generally speaking, you'd better not loop through a pandas DataFrame, but use more pythonic methods. In this case, you need to move your while loop a bit higher in your code, before the input statements, so your function would become:
import pandas as pd

def modifyDF():
    T = pd.read_fwf('file', header=None, names=['1','2','3','4','5','6','7','8','9'])
    T = T.rename(index={0:'A',1:'B',2:'C',3:'D',4:'E',5:'F',6:'G',7:'H',8:'I'})
    df = pd.DataFrame(T)
    print(T,'\n')
    # keep asking for input while any cell still holds 'X'
    while df.isin(['X']).any().any():
        x = input('row: ')
        y = input('column: ')
        v = input('value: ')
        df.loc[x,y] = v
        # show the modified dataframe after each change
        print(df,'\n')
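Also note that f = df.loc[x,y] = v is a chained assignment: it binds v to both f and df.loc[x,y], so print(f) shows the entered value, not the DataFrame. In addition, 'X' in df tests the column labels rather than the cell values, which is why the original while condition was never true.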
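To make the loop condition easier to check without typing at a prompt, the editing logic can be separated from input(). The following is only a sketch: the helper names has_blanks and fill_cells and the tiny 2x2 board are made up for illustration, and all cells are assumed to be strings.

```python
import pandas as pd

def has_blanks(df, marker="X"):
    """True if any cell (not any column label) equals the marker."""
    return bool(df.eq(marker).any().any())

def fill_cells(df, moves, marker="X"):
    """Apply (row, column, value) moves until no marker is left."""
    df = df.copy()
    for row, col, value in moves:
        if not has_blanks(df, marker):
            break  # nothing left to fill, ignore remaining moves
        df.loc[row, col] = value
    return df

board = pd.DataFrame(
    [["5", "X"], ["X", "1"]],
    index=["A", "B"],
    columns=["1", "2"],
)
moves = [("A", "2", "3"), ("B", "1", "4"), ("A", "1", "9")]
result = fill_cells(board, moves)
print(result)
```

In an interactive version, moves could be replaced by a generator that yields the three input() answers each time it is asked for the next move.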
Comp sci student here,
Very lost on how to add those X's on a multiplication table like the added photo. https://i.stack.imgur.com/cdHoZ.png
How on earth would I add those X's while also using functions? Here's my code if this helps:
for i in range(1,11):
    for j in range(1,11):
        print(i * j, end='\t')
    print('')
The rule for the X is i>3 and j>2 and i*j != 81
for i in range(1, 10):
    for j in range(1, 10):
        if i > 3 and j > 2 and i * j != 81:
            print('X', end='\t')
        else:
            print(i * j, end='\t')
    print()
1 2 3 4 5 6 7 8 9
2 4 6 8 10 12 14 16 18
3 6 9 12 15 18 21 24 27
4 8 X X X X X X X
5 10 X X X X X X X
6 12 X X X X X X X
7 14 X X X X X X X
8 16 X X X X X X X
9 18 X X X X X X 81
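Since the question also asks about using functions, the same rule can be split into small functions. This is just one possible arrangement; the names cell, build_table and print_table are arbitrary.

```python
def cell(i, j):
    """Return 'X' where the rule applies, otherwise the product as text."""
    if i > 3 and j > 2 and i * j != 81:
        return "X"
    return str(i * j)

def build_table(size=9):
    """Build the table as a list of rows, each a list of strings."""
    return [[cell(i, j) for j in range(1, size + 1)]
            for i in range(1, size + 1)]

def print_table(size=9):
    """Print the table with tab-separated columns."""
    for row in build_table(size):
        print("\t".join(row))

print_table()
```

Keeping the rule in cell means the condition for the X's lives in one place and can be changed without touching the printing code.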
I have the following data frames:
A.
k m n
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
5 x x x
6 x x x
7 x x x
8 x x x
9 x x x
B1.
l i j
1 x 46 x
2 x 64 x
3 x 83 x
9 x 70 x
B2.
l i j
0 x 23 x
4 x 34 x
6 x 54 x
8 x 32 x
B3.
l i j
0 x 11 x
5 x 98 x
7 x 94 x
9 x 80 x
How can I add the column "i" (from data frames B1, B2, and B3) to the data frame A?
Regarding the duplicate values (e.g. index 9 in B1 and B3 & index 0 in B2 and B3), I want to keep the leftmost value from [B1, B2, B3] (e.g. 23 for index 0 & 70 for index 9).
A desired output would be:
k m n i
0 x x x 23
1 x x x 46
2 x x x 64
3 x x x 83
4 x x x 34
5 x x x 98
6 x x x 54
7 x x x 94
8 x x x 32
9 x x x 70
You can concat the Bx dataframes in priority order and use duplicated on the index to drop repeated index labels, keeping the first occurrence. The assignment to A['i'] then aligns on the index.
A['i'] = (pd.concat([B1, B2, B3])
.loc[lambda x: ~x.index.duplicated(keep='first'), 'i'])
print(A)
k m n i
0 x x x 23
1 x x x 46
2 x x x 64
3 x x x 83
4 x x x 34
5 x x x 98
6 x x x 54
7 x x x 94
8 x x x 32
9 x x x 70
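For anyone who wants to run this end to end, here is a self-contained version with the frames from the question rebuilt by hand. The 'x' placeholders stand in for the values not shown in the question, and the intermediate names stacked and first_seen are only for readability.

```python
import pandas as pd

A = pd.DataFrame({"k": ["x"] * 10, "m": ["x"] * 10, "n": ["x"] * 10})
B1 = pd.DataFrame({"l": ["x"] * 4, "i": [46, 64, 83, 70], "j": ["x"] * 4},
                  index=[1, 2, 3, 9])
B2 = pd.DataFrame({"l": ["x"] * 4, "i": [23, 34, 54, 32], "j": ["x"] * 4},
                  index=[0, 4, 6, 8])
B3 = pd.DataFrame({"l": ["x"] * 4, "i": [11, 98, 94, 80], "j": ["x"] * 4},
                  index=[0, 5, 7, 9])

# Stack in priority order, so the leftmost frame comes first for each label.
stacked = pd.concat([B1, B2, B3])
# Keep only the first row seen for each index label.
first_seen = stacked.loc[~stacked.index.duplicated(keep="first"), "i"]
# Column assignment aligns on the index, so the row order does not matter.
A["i"] = first_seen
print(A)
```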
I have a dataframe which has 10 different columns, A1, A2, ...,A10. These columns contain y or n. I'd like to create another column whose value is y if the majority of columns (A1, A2, ...,A10) are y and n otherwise. How can I do this?
Use DataFrame.mode:
df['majority'] = df.mode(axis=1)[0]
Example
np.random.seed(0)
df = pd.DataFrame(np.random.choice(['y', 'n'], size=(10, 10)))
print(df)
0 1 2 3 4 5 6 7 8 9
0 y n n y n n n n n n
1 n y y n y y y y y n
2 y n n y y n n n n y
3 n y n y n n y n n y
4 y n y n n n n n y n
5 y n n n n y n y y n
6 n y n y n y y y y y
7 n n y y y n n y n y
8 y n y n n n n n n y
9 n n y y n y y n n y
df['majority'] = df.mode(axis=1)[0]
print(df)
0 1 2 3 4 5 6 7 8 9 majority
0 y n n y n n n n n n n
1 n y y n y y y y y n y
2 y n n y y n n n n y n
3 n y n y n n y n n y n
4 y n y n n n n n y n n
5 y n n n n y n y y n n
6 n y n y n y y y y y y
7 n n y y y n n y n y n
8 y n y n n n n n n y n
9 n n y y n y y n n y n
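If you need to distinguish a true majority from a split decision, you could use numpy.where. A tied row has two modes, so it has no NaN in the mode result, while a row with a clear majority has NaN in the second column. E.g.:
mode = df.mode(axis=1)
df['majority'] = np.where(mode.isna().any(axis=1), mode[0], 'split')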
print(df)
0 1 2 3 4 5 6 7 8 9 majority
0 y n n y n n n n n n n
1 n y y n y y y y y n y
2 y n n y y n n n n y n
3 n y n y n n y n n y n
4 y n y n n n n n y n n
5 y n n n n y n y y n n
6 n y n y n y y y y y y
7 n n y y y n n y n y split
8 y n y n n n n n n y n
9 n n y y n y y n n y split
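Note that the mode-based check relies on at least one row being tied: if no row is tied, mode returns a single column with no NaN, and every row would be labelled 'split'. A count-based sketch avoids that edge case, assuming the columns contain only 'y' and 'n' (the function name majority_vote is made up here):

```python
import numpy as np
import pandas as pd

def majority_vote(votes):
    """Label each row 'y', 'n', or 'split' by counting the votes."""
    y_count = votes.eq("y").sum(axis=1)
    n_count = votes.eq("n").sum(axis=1)
    labels = np.select(
        [y_count > n_count, n_count > y_count],
        ["y", "n"],
        default="split",
    )
    return pd.Series(labels, index=votes.index)

votes = pd.DataFrame([
    ["y", "y", "y", "n"],
    ["n", "n", "n", "y"],
    ["y", "y", "n", "n"],
])
votes["majority"] = majority_vote(votes)
print(votes)
```

When reusing this on a frame that already has a majority column, pass only the vote columns, for example df[['A1', ..., 'A10']] in the question's naming.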
I am trying to create an empty data frame and fill it with columns from another file. It works when I use this simple code.
InputData['Quote'] = store['QUOTE_ID']
But when I add conditions before the assignment, the conditions are ignored and the output has the same values as the original file (store).
Below is the code I am trying to use.
original data set
InputData = pd.read_csv('datalink')
creating empty data frame
OutputData=pd.DataFrame()
code with conditions
for i in xrange(len(InputData.index)):
    if (i % 5000) == 0:
        print i,
    if ((InputData.ix[i,'WIN']=='Y') and ((InputData.ix[i,'COM_C']=='H') or (InputData.ix[i,'COM_C']=='S')) and(InputData.ix[i,'COM_L']!=0)):
        OutputData['Quote']=InputData['QUOTE_ID']
        OutputData['ComList']=InputData['COM_LISTPR']
        OutputData['WIN']=1
        OutputData['COM_C']=InputData['COM_C']
OutputData.to_csv(link,index=False)
original data set
QUOTE_ID WIN COM_C COM_L
1400453-IT N H 1.46E+05
1400453-IT N H 7.12E+04
1400453-IT N H 2.74E+04
1403796-IT Y S 3.11E+04
1400453-IT N M 3.12E+02
1403796-IT Y H 3.97E+04
1403796-IT Y H 3.97E+04
1403796-IT Y M 1.99E+02
1403796-IT Y M 1.99E+02
1403796-IT Y H 7.40E+04
1403796-IT Y H 7.40E+04
1403796-IT Y M 3.19E+02
1403796-IT Y M 3.19E+02
1403796-IT Y H 9.56E+04
expected data set
I only need the rows where WIN is Y in InputData, with Y replaced by 1.
Quote WIN COM_C COM_LISTPR
1403796-IT 1 S 3.11E+04
1403796-IT 1 H 3.97E+04
1403796-IT 1 H 3.97E+04
1403796-IT 1 H 7.40E+04
1403796-IT 1 H 7.40E+04
1403796-IT 1 H 9.56E+04
many thanks in advance
Python code -
import pandas as pd
df = pd.read_csv('a.csv', sep=r'\s+')  # read the whitespace-delimited file
# keep rows with WIN == 'Y', COM_C of 'S' or 'H', and a non-zero COM_L
modified_df = df[(df['WIN'] == 'Y')
                 & df['COM_C'].isin(['S', 'H'])
                 & (df['COM_L'] != 0)].copy()
modified_df['WIN'] = 1  # recode 'Y' as 1
print(modified_df)
Output -
QUOTE_ID WIN COM_C COM_L
3 1403796-IT 1 S 31100
5 1403796-IT 1 H 39700
6 1403796-IT 1 H 39700
9 1403796-IT 1 H 74000
10 1403796-IT 1 H 74000
13 1403796-IT 1 H 95600
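The reason the loop in the question has no effect is that each assignment inside the if copies the whole column, regardless of which row triggered it; a boolean mask filters rows instead. Here is a small sketch that can be run without a.csv, using a few of the question's rows inlined through io.StringIO. Renaming QUOTE_ID to Quote is an assumption based on the expected output.

```python
import io
import pandas as pd

raw = """QUOTE_ID WIN COM_C COM_L
1400453-IT N H 1.46E+05
1403796-IT Y S 3.11E+04
1400453-IT N M 3.12E+02
1403796-IT Y H 3.97E+04
1403796-IT Y M 1.99E+02
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# One boolean value per row; only rows where all three conditions hold pass.
mask = (df["WIN"] == "Y") & df["COM_C"].isin(["H", "S"]) & (df["COM_L"] != 0)
out = df.loc[mask].copy()
out["WIN"] = 1  # recode 'Y' as 1
out = out.rename(columns={"QUOTE_ID": "Quote"})
print(out)
```

Writing the result with out.to_csv(path, index=False) would then give a file containing only the filtered rows.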