Merge multiple Series as a single column into a DataFrame - python

I have the following data frames:
A.
k m n
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
5 x x x
6 x x x
7 x x x
8 x x x
9 x x x
B1.
l i j
1 x 46 x
2 x 64 x
3 x 83 x
9 x 70 x
B2.
l i j
0 x 23 x
4 x 34 x
6 x 54 x
8 x 32 x
B3.
l i j
0 x 11 x
5 x 98 x
7 x 94 x
9 x 80 x
How can I add the column "i" (from data frames B1, B2, and B3) to the data frame A?
Regarding the duplicate values (e.g. index 9 in B1 and B3 & index 0 in B2 and B3), I want to keep the leftmost value from [B1, B2, B3] (e.g. 23 for index 0 & 70 for index 9).
A desired output would be:
k m n i
0 x x x 23
1 x x x 46
2 x x x 64
3 x x x 83
4 x x x 34
5 x x x 98
6 x x x 54
7 x x x 94
8 x x x 32
9 x x x 70

You can concatenate the Bx data frames and use duplicated on the index to drop repeated index labels, keeping the first occurrence. Because B1 comes first in the list, its values win over B2 and B3.
A['i'] = (pd.concat([B1, B2, B3])
          .loc[lambda x: ~x.index.duplicated(keep='first'), 'i'])
print(A)
k m n i
0 x x x 23
1 x x x 46
2 x x x 64
3 x x x 83
4 x x x 34
5 x x x 98
6 x x x 54
7 x x x 94
8 x x x 32
9 x x x 70
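For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The frames below are stand-ins that keep only the index and the "i" values from the question (column "k" is filled with placeholder "x" values), and the names stacked and first_seen are mine.

```python
import pandas as pd

# Stand-ins for A, B1, B2, B3: only the index and the "i" values matter here.
A = pd.DataFrame({"k": ["x"] * 10}, index=range(10))
B1 = pd.DataFrame({"i": [46, 64, 83, 70]}, index=[1, 2, 3, 9])
B2 = pd.DataFrame({"i": [23, 34, 54, 32]}, index=[0, 4, 6, 8])
B3 = pd.DataFrame({"i": [11, 98, 94, 80]}, index=[0, 5, 7, 9])

# Stack the three "i" columns in priority order.
stacked = pd.concat([B1["i"], B2["i"], B3["i"]])
# Keep the first occurrence of every index label: B1 beats B2, B2 beats B3.
first_seen = stacked[~stacked.index.duplicated(keep="first")]
# Assignment aligns on the index, so the row order of first_seen does not matter.
A["i"] = first_seen
print(A)
```

Index labels that appear in none of the Bx frames would end up as NaN in A["i"].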

Related

Implement a loop that keeps the user entering the input until the condition is not met

Hey guys I hope you're doing well.
The problem I have is that my loop is not well defined, therefore the condition that I give it is not met. The print of the DataFrame that I implemented inside the while loop is not performed when the condition is not met.
This is the code I have so far. By implementing the while loop it stopped returning me the modified dataframe. As I said before, the loop is poorly constructed.
Dataframe content:
1 2 3 4 5 6 7 8 9
A 5 3 X X 7 X X X X
B 6 X X 1 9 5 X X X
C X 9 8 X X X X 6 X
D 8 X X X 6 X X X 3
E 4 X X 8 X 3 X X 1
F 7 X X X 2 X X X 6
G X 6 X X X X 2 8 X
H X X X 4 1 9 X X 5
I X X X X 8 X X 7 9
Code:
import pandas as pd
def modifyDF():
    T = pd.read_fwf('file', header= None, names=['1','2','3','4','5','6','7','8','9'])
    T = T.rename(index={0:'A',1:'B',2:'C',3:'D',4:'E',5:'F',6:'G',7:'H',8:'I'})
    df = pd.DataFrame(T)
    print(T,'\n')
    x= input('row: ')
    y= input('column: ')
    v= input('value: ')
    while 'X' in df:
        f = df.loc[x,y]= v
        print(f)
        while 'X' not in df:
            break
modifyDF()
Expected OUTPUT:
1 2 3 4 5 6 7 8 9
A 5 3 X X 7 X X X X
B 6 X X 1 9 5 X X X
C X 9 8 X X X X 6 X
D 8 X X X 6 X X X 3
E 4 X X 8 X 3 X X 1
F 7 X X X 2 X X X 6
G X 6 X X X X 2 8 X
H X X X 4 1 9 X X 5
I X X X X 8 X X 7 9
row: D #For example
column: 2 #For example
value: 1 #For example
#The modified dataframe:
1 2 3 4 5 6 7 8 9
A 5 3 X X 7 X X X X
B 6 X X 1 9 5 X X X
C X 9 8 X X X X 6 X
D 8 1 X X 6 X X X 3
E 4 X X 8 X 3 X X 1
F 7 X X X 2 X X X 6
G X 6 X X X X 2 8 X
H X X X 4 1 9 X X 5
I X X X X 8 X X 7 9
#The goal would be for this to run like a loop until there are no 'X' left in the dataframe.
I really appreciate your help :)
Generally speaking, it is better not to loop over a pandas DataFrame cell by cell but to use vectorized operations. Here, though, the loop is driven by user input, so the fix is to move the while loop higher up, before the input statements, and to test the cell values for 'X' ('X' in df only checks the column labels). Your function would become:
def modifyDF():
    T = pd.read_fwf('file', header=None, names=['1','2','3','4','5','6','7','8','9'])
    T = T.rename(index={0:'A',1:'B',2:'C',3:'D',4:'E',5:'F',6:'G',7:'H',8:'I'})
    df = pd.DataFrame(T)
    print(T,'\n')
    # keep asking while any cell still holds 'X'
    while df.isin(['X']).any().any():
        x = input('row: ')
        y = input('column: ')
        v = input('value: ')
        df.loc[x,y] = v
        print(df,'\n')  # show the modified DataFrame after each entry
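Also note that f = df.loc[x,y]= v is a chained assignment: it is valid Python, but it binds f to v (the value you typed), not to the DataFrame, so print(f) only shows that value.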
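To try the loop without typing at a prompt, here is a minimal sketch that takes the moves from a list instead of input(). The function name fill_until_done, the bounds check, and the 2x2 board are made up for illustration; only the isin test and the .loc assignment come from the answer above.

```python
import pandas as pd

def fill_until_done(df, moves):
    """Apply (row, column, value) moves until no 'X' is left or the moves run out."""
    moves = iter(moves)
    while df.isin(["X"]).any().any():
        try:
            row, col, value = next(moves)
        except StopIteration:
            break  # no more input available
        if row not in df.index or col not in df.columns:
            print(f"Unknown cell {row}/{col}, try again")
            continue
        df.loc[row, col] = value
        print(df, "\n")  # show the board after every accepted move
    return df

# Hypothetical 2x2 board instead of the 9x9 file from the question.
board = pd.DataFrame([["5", "X"], ["X", "1"]], index=["A", "B"], columns=["1", "2"])
fill_until_done(board, [("A", "2", "3"), ("Z", "1", "9"), ("B", "1", "4")])
```

For interactive use, moves could be a generator that yields (input('row: '), input('column: '), input('value: ')) forever.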

Multiplication table with X's while using functions?

Comp sci student here,
Very lost on how to add those X's on a multiplication table like the added photo. https://i.stack.imgur.com/cdHoZ.png
How on earth would I add those X's while also using functions? Here's my code if this helps:
for i in range(1,11):
    for j in range(1,11):
        print(i * j, end='\t')
    print('')
The rule for the X is i>3 and j>2 and i*j != 81
for i in range(1, 10):
    for j in range(1, 10):
        if i > 3 and j > 2 and i * j != 81:
            print('X', end='\t')
        else:
            print(i * j, end='\t')
    print()
1 2 3 4 5 6 7 8 9
2 4 6 8 10 12 14 16 18
3 6 9 12 15 18 21 24 27
4 8 X X X X X X X
5 10 X X X X X X X
6 12 X X X X X X X
7 14 X X X X X X X
8 16 X X X X X X X
9 18 X X X X X X 81
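Since the question asks for functions, the same rule can be split into two small ones. This is only a sketch; the names cell and table are mine, and the rule is the one stated above.

```python
def cell(i, j):
    """Return the text for row i, column j, using the rule stated above."""
    if i > 3 and j > 2 and i * j != 81:
        return "X"
    return str(i * j)

def table(size=9):
    """Build the table as a list of tab-separated lines."""
    return ["\t".join(cell(i, j) for j in range(1, size + 1))
            for i in range(1, size + 1)]

print("\n".join(table()))
```

Keeping the rule in cell means the printing code does not change if the rule does.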

create a matrix from columns and horizontal lines

How can I create a matrix from rows and columns?
When I print the matrix, the output should look like this:
O X X X X X X X X X X X X X X X
N X X X X X X X X X X X X X X X
M X X X X X X X X X X X X X X X
L X X X X X X X X X X X X X X X
K X X X X X X X X X X X X X X X
J X X X X X X X X X X X X X X X
I X X X X X X X X X X X X X X X
H X X X X X X X X X X X X X X X
G X X X X X X X X X X X X X X X
F X X X X X X X X X X X X X X X
E X X X X X X X X X X X X X X X
D X X X X X X X X X X X X X X X
C X X X X X X X X X X X X X X X
B X X X X X X X X X X X X X X X
A X X X X X X X X X X X X X X X
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
I think I need to store a list in a dictionary and use a matrix, so that the "X"s can be edited later.
hall_dictionary = {}
hall_dictionary["merhaba"] = []
rows = 10
columns = 15
x = [[hall_dictionary["merhaba"] for i in range(columns)] for j in range(rows)]
You can encapsulate the whole data storage in a class. It handles all the "book-keeping", and you simply use A to ... and 1 to ... to change the X.
Internally it uses a simple 1-dim list:
class Field:
    def __init__(self, rows, cols, init_piece="x"):
        self.rows = rows
        self.cols = cols
        self.field = [init_piece] * rows * cols
    def place_at(self, row, col, piece):
        """Changes one tile on the field. Does all the reverse-engineering to compute
        1-dim place of A..?,1..? given tuple of coords."""
        def validation():
            """Raises error when out of bounds."""
            error = []
            last_row = chr(ord("A") + self.rows - 1)
            # also reject letters beyond the last row (e.g. "Z" on a 10-row field)
            if not (isinstance(row, str) and len(row) == 1
                    and "A" <= row.upper() <= last_row):
                error.append("Use rows between A and {}".format(last_row))
            if not (isinstance(col, int) and 0 < col <= self.cols):
                error.append("Use columns between 1 and {}".format(self.cols))
            if error:
                error = ["Invalid row/column: {}/{}".format(row,col)] + error
                raise ValueError('\n- '.join(error))
        validation()
        row = ord(row.upper()[0]) - ord("A")
        self.field[row * self.cols + col - 1] = piece
    def print_field(self):
        """Prints the playing field."""
        for c in range(self.rows - 1,-1,-1):
            ch = chr(ord("A") + c)
            print("{:<4} ".format(ch), end = "")
            print(("{:>2} " * self.cols).format(*self.field[c * self.cols:
                                                            (c + 1) * self.cols]))
        print("{:<4} ".format(""), end = "")
        print(("{:>2} " * self.cols).format(*range(1,self.cols + 1)))
Then you can use it like so:
rows = 10
cols = 15
f = Field(rows,cols)
f.print_field()
# this uses A...? and 1...? to set things
for r,c in [(0,0),("A",1),("ZZ",99),("A",99),("J",15)]:
    try:
        f.place_at(r,c,"i") # set to 'i'
    except ValueError as e:
        print(e)
f.print_field()
Output (before):
J x x x x x x x x x x x x x x x
I x x x x x x x x x x x x x x x
H x x x x x x x x x x x x x x x
G x x x x x x x x x x x x x x x
F x x x x x x x x x x x x x x x
E x x x x x x x x x x x x x x x
D x x x x x x x x x x x x x x x
C x x x x x x x x x x x x x x x
B x x x x x x x x x x x x x x x
A x x x x x x x x x x x x x x x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Output (setting things && after):
Invalid row/column: 0/0
- Use rows between A and J
- Use columns between 1 and 15
Invalid row/column: ZZ/99
- Use rows between A and J
- Use columns between 1 and 15
Invalid row/column: A/99
- Use columns between 1 and 15
J x x x x x x x x x x x x x x i
I x x x x x x x x x x x x x x x
H x x x x x x x x x x x x x x x
G x x x x x x x x x x x x x x x
F x x x x x x x x x x x x x x x
E x x x x x x x x x x x x x x x
D x x x x x x x x x x x x x x x
C x x x x x x x x x x x x x x x
B x x x x x x x x x x x x x x x
A i x x x x x x x x x x x x x x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Sounds like a 2-D array (similar to the answer to "How to define a two-dimensional array in Python"), so something like this:
import string
vertical = list(string.ascii_uppercase[0:15][::-1]) # ['O', 'N', ..., 'A']
columns = 15
hall_dictionary = {}
hall_dictionary["merhaba"] = [[x for x in range(columns)] for y in vertical]
for i in range(len(vertical)):
    for j in range(columns):
        hall_dictionary["merhaba"][i][j] = 'X'
Then you can index as desired:
hall_dictionary["merhaba"][0][1] # evaluates to 'X' at this point
Display the entire array:
for row in hall_dictionary["merhaba"]:
    print(row)
['X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X']
... 15 rows ...
Assign or update values:
hall_dictionary["merhaba"][0][2] = 'O'
for row in hall_dictionary["merhaba"]:
    print(row)
['X', 'X', 'O', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X']
...
This confirms that element [0][2] has been updated.
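To get the printed layout from the question (last row letter on top, column numbers at the bottom), one option is to key the rows by letter and format them on output. This is a rough sketch, not part of the answer above: make_hall and render are hypothetical helper names, and it assumes at most 26 rows.

```python
import string

def make_hall(rows, columns, fill="X"):
    """Return a dict mapping each row letter to its own list of seats."""
    return {letter: [fill] * columns for letter in string.ascii_uppercase[:rows]}

def render(hall):
    """Return the hall as text: last row letter on top, column numbers at the bottom."""
    lines = [f"{letter} " + " ".join(f"{seat:>2}" for seat in seats)
             for letter, seats in sorted(hall.items(), reverse=True)]
    columns = len(next(iter(hall.values())))
    lines.append("  " + " ".join(f"{n:>2}" for n in range(columns)))
    return "\n".join(lines)

hall = make_hall(rows=15, columns=15)
hall["A"][2] = "O"  # edit one seat; the other rows are separate lists
print(render(hall))
```

Each row gets its own list, so changing a seat in one row does not affect the others.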
If you're interested, you can use a pandas DataFrame as well.
import pandas as pd
rows = 10
columns = 15
def indexToLetter(index: int) -> str:
    # Convert an integer index [0, ∞) to an
    # alphabetical index [A..Z, AA..ZZ, AAA...]
    # (bijective base 26, like spreadsheet column names)
    ret = ''
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        ret = chr(ord('A') + rem) + ret
    return ret
# create the row labels
rLabels = [*map(indexToLetter, range(rows))]
# create the dataframe, note that we can simplify
# [['X' for i in range(columns)] for j in range(rows)]
# to [['X'] * columns] * rows
df = pd.DataFrame([['X'] * columns] * rows, index=rLabels)
print(df)
Output:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
A X X X X X X X X X X X X X X X
B X X X X X X X X X X X X X X X
C X X X X X X X X X X X X X X X
D X X X X X X X X X X X X X X X
E X X X X X X X X X X X X X X X
F X X X X X X X X X X X X X X X
G X X X X X X X X X X X X X X X
H X X X X X X X X X X X X X X X
I X X X X X X X X X X X X X X X
J X X X X X X X X X X X X X X X
The output looks slightly ugly, and might not be what you're looking for. But with a DataFrame, it's very convenient to manipulate matrices and tables of data.
You can set a single seat by position with df.iloc[row, column]. (Chained indexing such as df[1][0] = 'O' is unreliable in recent pandas versions and may not modify df.)
df.iloc[0, 1] = 'O'
df.iloc[2, 1] = 'O'
df.iloc[3, 1] = 'O'
print(df)
Output:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
A X O X X X X X X X X X X X X X
B X X X X X X X X X X X X X X X
C X O X X X X X X X X X X X X X
D X O X X X X X X X X X X X X X
E X X X X X X X X X X X X X X X
F X X X X X X X X X X X X X X X
G X X X X X X X X X X X X X X X
H X X X X X X X X X X X X X X X
I X X X X X X X X X X X X X X X
J X X X X X X X X X X X X X X X
Say, someone wants to book the entire row 'E' of the hall.
if any(df.loc['E'] == 'O'): # check if any seats were taken
    print('Error: some seats in Row <E> are taken.')
else:
    df.loc['E'] = 'O'
print(df)
Output:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
A X O X X X X X X X X X X X X X
B X X X X X X X X X X X X X X X
C X O X X X X X X X X X X X X X
D X O X X X X X X X X X X X X X
E O O O O O O O O O O O O O O O
F X X X X X X X X X X X X X X X
G X X X X X X X X X X X X X X X
H X X X X X X X X X X X X X X X
I X X X X X X X X X X X X X X X
J X X X X X X X X X X X X X X X
Note: you can also do df.iloc[4] to access row E. Want to access rows B to E? Use df.loc['B':'E'] or df.iloc[1:5].
You can do the same with a whole column by assigning df[<column_index>] = 'O'.
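The check-then-assign pattern above can be wrapped in a small helper. This is only a sketch: the function book, its parameters, and the 3x5 hall are hypothetical, and it assumes 'X' means free and 'O' means taken, as in this answer.

```python
import pandas as pd

def book(df, row, seats, free="X", taken="O"):
    """Mark the given seats in a row as taken; return False if any is not free."""
    if (df.loc[row, seats] != free).any():
        return False
    df.loc[row, seats] = taken
    return True

# Small hypothetical hall: rows A..C, seats 0..4.
hall = pd.DataFrame([["X"] * 5 for _ in range(3)], index=list("ABC"))
print(book(hall, "B", [1, 2]))  # both seats are free, so they get booked
print(book(hall, "B", [2, 3]))  # seat 2 is already taken, so nothing changes
print(hall)
```

Using a single .loc[row, seats] call for both the check and the assignment avoids chained indexing.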

How to convert row values to attributes (columns) in pandas

I have a dataset in pandas with column pid (patient id), and code (drug code), sorted in rows as the example shows. I need to convert them to 1 patient/row, and list all the drugs as attributes for each patient.
What I have now:
pid code
1 Az
1 Bn
2 Az
2 Bn
2 C4
3 Bn
3 C4
3 Dx
4 Az
4 Bn
4 Dx
4 E
5 C4
5 Dx
5 E
I need to convert it to:
pid Az Bn C4 Dx E
1 y y n n n
2 y y y n n
3 n y y y n
4 y y n y y
5 n n y y y
If I understand correctly, you can use crosstab:
pd.crosstab(df.pid,df.code).replace({1:'y',0:'n'})
Out[231]:
code Az Bn C4 Dx E
pid
1 y y n n n
2 y y y n n
3 n y y y n
4 y y n y y
5 n n y y y
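A runnable sketch of the crosstab idea on a shortened copy of the data (pids 1 to 3 only). It differs slightly from the one-liner above: it tests counts > 0 with numpy.where, so a pid/code pair that appears more than once still maps to 'y'. The names counts and flags are mine.

```python
import numpy as np
import pandas as pd

# Shortened copy of the question's data.
df = pd.DataFrame({
    "pid":  [1, 1, 2, 2, 2, 3, 3, 3],
    "code": ["Az", "Bn", "Az", "Bn", "C4", "Bn", "C4", "Dx"],
})

# One row per pid, one column per code, cells hold occurrence counts.
counts = pd.crosstab(df["pid"], df["code"])
# Turn counts into y/n flags; any count above zero means the drug is present.
flags = pd.DataFrame(np.where(counts > 0, "y", "n"),
                     index=counts.index, columns=counts.columns)
print(flags)
```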
One way is to pivot your dataframe and fill the missing combinations with 'n':
new_df = df.assign(values='y').pivot(index='pid', columns='code', values='values').fillna('n')
>>> new_df
code Az Bn C4 Dx E
pid
1 y y n n n
2 y y y n n
3 n y y y n
4 y y n y y
5 n n y y y
Having fun!
Fun 1
Create a Series with a MultiIndex and unstack
pd.Series('y', df.values.T.tolist()).unstack(fill_value='n')
Az Bn C4 Dx E
1 y y n n n
2 y y y n n
3 n y y y n
4 y y n y y
5 n n y y y
Fun 2
Use defaultdict
from collections import defaultdict
d = defaultdict(dict)
for i, p, c in df.itertuples():
    d[c][p] = 'y'
pd.DataFrame(d).fillna('n')
Az Bn C4 Dx E
1 y y n n n
2 y y y n n
3 n y y y n
4 y y n y y
5 n n y y y
Fun 3
i, r = pd.factorize(df.pid)
j, c = pd.factorize(df.code)
e = np.empty((len(r), len(c)), str)
e.fill('n')
e[i, j] = 'y'
pd.DataFrame(e, r, c)
Az Bn C4 Dx E
1 y y n n n
2 y y y n n
3 n y y y n
4 y y n y y
5 n n y y y

How to calculate an expression based on names in the second level of a multi-index column

Suppose I have a dataframe with a multiindex columns object where the first level defines some category and the second level defines a component of a formula. Consider the dataframe df
np.random.seed([3,1415])
mux = pd.MultiIndex.from_product([list('XYZ'), list('kap'), ])
df = pd.DataFrame(np.random.randint(1, 5, size=(2, 9)), columns=mux)
df
X Y Z
k a p k a p k a p
0 1 4 3 4 3 3 4 3 4
1 2 4 2 3 4 4 1 4 3
I want to calculate the formula k * a ** p for each of X, Y, and Z.
For a single category, I can pull out a separate dataframe:
x = df.X
x.eval('k * a ** p')
0 64
1 32
dtype: int64
But how do I get this for X, Y, and Z all at once?
The final result should look like:
X Y Z
0 64 108 324
1 32 768 64
1). One way would be groupby on level
In [1841]: df.groupby(level=0, axis=1).apply(lambda x: x[x.name].eval('k*a**p'))
Out[1841]:
X Y Z
0 64 108 324
1 32 768 64
2). Another, loop by levels.
In [1818]: pd.DataFrame({c: df[c].eval('k*a**p') for c in df.columns.levels[0]})
Out[1818]:
X Y Z
0 64 108 324
1 32 768 64
Solution without eval:
d = {c: df[c].assign(A=lambda x: x.k*x.a**x.p)['A'] for c in df.columns.levels[0]}
df1 = pd.DataFrame(d)
print (df1)
X Y Z
0 64 108 324
1 32 768 64
Option 1
df.stack(0).eval('k * a ** p').unstack()
X Y Z
0 64 108 324
1 32 768 64
Option 2
df.swaplevel(0, 1, 1).pipe(lambda d: d.k * d.a ** d.p)
X Y Z
0 64 108 324
1 32 768 64
A bit ugly, but involves sorting the columns and then calling .mul and .pow.
df2 = df.sort_index(level=[0, 1], axis=1)
v = df2.loc[:, (slice(None), 'a')]\
.pow(df2.loc[:, (slice(None), 'p')].values, 1)
out = df2.loc[:, (slice(None), 'k')].mul(v.values, 1)
print(out)
X Y Z
k k k
0 64 108 324
1 32 768 64
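One more eval-free variant, as a sketch: take a cross-section of the second column level for each component and combine the three resulting frames, which all share the columns X, Y, Z. The data is typed in from the table in the question rather than generated randomly, and the helper part is a name I made up.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([list("XYZ"), list("kap")])
# Values copied from the frame shown in the question.
df = pd.DataFrame([[1, 4, 3, 4, 3, 3, 4, 3, 4],
                   [2, 4, 2, 3, 4, 4, 1, 4, 3]], columns=columns)

def part(name):
    """Select one formula component for every category (columns X, Y, Z)."""
    return df.xs(name, axis=1, level=1)

# ** binds tighter than *, so this is k * (a ** p), as in the question.
result = part("k") * part("a") ** part("p")
print(result)
```

The arithmetic aligns on the remaining column labels, so the column order within each category does not matter.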
