How can I replace pd intervals with integers
import pandas as pd
df = pd.DataFrame()
df['age'] = [43, 76, 27, 8, 57, 32, 12, 22]
age_band = [0,10,20,30,40,50,60,70,80,90]
df['age_bands']= pd.cut(df['age'], bins=age_band, ordered=True)
output:
age age_bands
0 43 (40, 50]
1 76 (70, 80]
2 27 (20, 30]
3 8 (0, 10]
4 57 (50, 60]
5 32 (30, 40]
6 12 (10, 20]
7 22 (20, 30]
Now I want to add another column that replaces each band with a single integer, but I could not get it to work.
For example, this did not work:
df['age_code']= df['age_bands'].replace({'(40, 50]':4})
How can I get a column that looks like this?
age_bands age_code
0 (40, 50] 4
1 (70, 80] 7
2 (20, 30] 2
3 (0, 10] 0
4 (50, 60] 5
5 (30, 40] 3
6 (10, 20] 1
7 (20, 30] 2
Assuming you want the first digit of every interval, you can use Series.apply to achieve that as follows:
df["age_code"] = df["age_bands"].apply(lambda band: str(band)[1])
However, note that this may not be very efficient for a large DataFrame.
To convert the column values to an integer dtype, you can use pd.to_numeric:
df["age_code"] = pd.to_numeric(df['age_code'])
As the column contains pd.Interval objects, use the interval's left property:
df['age_code'] = df['age_bands'].apply(lambda interval: interval.left // 10)
You can do that by simply adding a second pd.cut call and passing the labels argument.
import pandas as pd
df = pd.DataFrame()
df['age'] = [43, 76, 27, 8, 57, 32, 12, 22]
age_band = [0,10,20,30,40,50,60,70,80,90]
df['age_bands']= pd.cut(df['age'], bins=age_band, ordered=True)
#This is the part of code you need to add
age_labels = [0, 1, 2, 3, 4, 5, 6, 7, 8]
df['age_code']= pd.cut(df['age'], bins=age_band, labels=age_labels, ordered=True)
>>> print(df)
You can create a dictionary of bins and map it to the age_bands column:
bins_sorted = sorted(pd.cut(df['age'], bins=age_band, ordered=True).unique())
bins_dict = {key: idx for idx, key in enumerate(bins_sorted)}
df['age_code'] = df.age_bands.map(bins_dict).astype(int)
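Since pd.cut returns a categorical column, another option worth considering is to read the integer category codes through the .cat.codes accessor. The sketch below assumes the evenly spaced age_band list from the question, so that the position of each category happens to coincide with the desired code; ages falling outside every bin would get the code -1.

```python
import pandas as pd

df = pd.DataFrame({"age": [43, 76, 27, 8, 57, 32, 12, 22]})
age_band = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
df["age_bands"] = pd.cut(df["age"], bins=age_band)

# Each category's position in the categorical is its code:
# (0, 10] -> 0, (10, 20] -> 1, and so on. Missing bands map to -1.
df["age_code"] = df["age_bands"].cat.codes.astype(int)

print(df)
```

This avoids both string parsing and a Python-level apply, at the cost of relying on the bin order rather than on the interval values themselves.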
I am a newbie to Python. I have an NxN matrix and I want to find the maximum value of each row. Next, I want to set every other value in the row to zero, keeping only the maximum. If a row contains the maximum value more than once, all of those occurrences should be preserved.
Using a DataFrame, I tried to get the maximum of each row. Then I tried to get the indices of these max values. The code is given below.
matrix = [(22, 16, 23),
(12, 6, 43),
(24, 67, 11),
(87, 9,11),
(66, 36,66)
]
dfObj = pd.DataFrame(matrix, index=list('abcde'), columns=list('xyz'))
maxValuesObj = dfObj.max(axis=1)
maxValueIndexObj = dfObj.idxmax(axis=1)
The above code doesn't handle multiple maximum values; only the first occurrence is returned.
Also, I am stuck on how to update the matrix accordingly. My expected output is:
matrix = [(0, 0, 23),
(0, 0, 43),
(0, 67, 0),
(87, 0,0),
(66, 0,66)
]
Can you please help me sort this out?
Using df.where():
dfObj.where(dfObj.eq(dfObj.max(1),axis=0),0)
x y z
a 0 0 23
b 0 0 43
c 0 67 0
d 87 0 0
e 66 0 66
For an ND array instead of a DataFrame, call .values on the result of the above code:
dfObj.where(dfObj.eq(dfObj.max(1),axis=0),0).values
Or, better, use to_numpy():
dfObj.where(dfObj.eq(dfObj.max(1),axis=0),0).to_numpy()
Or np.where:
np.where(dfObj.eq(dfObj.max(1),axis=0),dfObj,0)
array([[ 0, 0, 23],
[ 0, 0, 43],
[ 0, 67, 0],
[87, 0, 0],
[66, 0, 66]], dtype=int64)
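If the data starts out as a NumPy array rather than a DataFrame, the same masking idea can be sketched without pandas at all. This is only an illustration on the matrix from the question; the names row_max and result are chosen here for the example.

```python
import numpy as np

matrix = np.array([(22, 16, 23),
                   (12, 6, 43),
                   (24, 67, 11),
                   (87, 9, 11),
                   (66, 36, 66)])

# keepdims=True keeps the row maxima as a column vector (shape 5x1),
# so the comparison broadcasts across each row.
row_max = matrix.max(axis=1, keepdims=True)

# Keep entries equal to their row maximum (including ties), zero the rest.
result = np.where(matrix == row_max, matrix, 0)
print(result)
```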
I'll show how to do it with Python built-ins instead of Pandas, since you're new to Python and should know how to do it outside of Pandas (and the Pandas syntax isn't as clean).
matrix = [(22, 16, 23),
(12, 6, 43),
(24, 67, 11),
(87, 9,11),
(66, 36,66)
]
new_matrix = []
for row in matrix:
    row_max = max(row)
    new_row = tuple(element if element == row_max else 0 for element in row)
    new_matrix.append(new_row)
You can do this with a short for loop pretty easily:
import numpy as np
matrix = np.array([(22, 16, 23), (12, 6, 43), (24, 67, 11), (87, 9,11), (66, 36,66)])
for i in range(0, len(matrix)):
    matrix[i] = [x if x == max(matrix[i]) else 0 for x in matrix[i]]
print(matrix)
output:
[[ 0 0 23]
[ 0 0 43]
[ 0 67 0]
[87 0 0]
[66 0 66]]
I would also use numpy for matrices, not pandas.
This isn't the most performant solution, but you can write a function for the row operation then apply it to each row:
def max_row(row):
    row.loc[row != row.max()] = 0
    return row
dfObj.apply(max_row, axis=1)
Out[17]:
x y z
a 0 0 23
b 0 0 43
c 0 67 0
d 87 0 0
e 66 0 66
This is a follow-up question from my last question:
Python3 Numpy np.where Error.
I have 2 lists like these:
x = [None,[1, 15, 175, 20],
[150, 175, 18, 20],
[150, 175, 18],
[192, 150, 177],...]
y = [None,[12, 43, 55, 231],
[243, 334, 44, 12],
[656, 145, 138],
[12, 150, 177],
[150, 177, 188],...]
I want to remove the x values lower than 30 and y values that correspond to the removed x values. (For example, (x,y) = (1,12) in x[1] and y[1])
In order to do that, I got the corrected x list:
In : [[v2 for v2 in v1 if v2>=30] for v1 in x[1:]]
Out: [[175], [150, 175], [150, 175], [192, 150, 177]]
I also got the coordinates of the removed x values:
In : [(i,j) for i,v1 in enumerate(x[1:]) for j,v2 in enumerate(v1) if v2<30]
Out: [(0, 0), (0, 1), (0, 3), (1, 2), (1, 3), (2, 2)]
Now I want to use these coordinates to remove items from y.
How can I implement this?
To get the corrected y values, I would recommend bypassing the coordinates entirely as a first approach. The reason is that you may end up with empty lists along the way, which will throw off the shape of the output if you don't keep special track of them. Also, removing elements is generally much more awkward than not including them in the first place.
It would be much easier to make a corrected version of y in the same way you corrected x:
y_corr = [[n for m, n in zip(row_x, row_y) if m >= 30] for row_x, row_y in zip(x[1:], y[1:])]
Here zip steps along both sets of lists in the same way you did with one; the [1:] slices skip the leading None, as in your own code.
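As a rough end-to-end check of the zip approach, here is a small sketch on the visible part of the lists from the question. The data is truncated to four rows because the rest is elided in the question, and the names pairs and x_corr are only illustrative.

```python
x = [None,
     [1, 15, 175, 20],
     [150, 175, 18, 20],
     [150, 175, 18],
     [192, 150, 177]]
y = [None,
     [12, 43, 55, 231],
     [243, 334, 44, 12],
     [656, 145, 138],
     [12, 150, 177]]

# Skip the leading None, then filter x and y in lockstep.
pairs = list(zip(x[1:], y[1:]))
x_corr = [[m for m in row_x if m >= 30] for row_x, _ in pairs]
y_corr = [[n for m, n in zip(row_x, row_y) if m >= 30] for row_x, row_y in pairs]

print(x_corr)
print(y_corr)
```

Rows in which every x value is below 30 would simply come out as empty lists, so the outer shape stays aligned with the input.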
If you absolutely insist on using the coordinates, I would recommend just copying y entirely and removing the elements from the copy. You have to go backwards in each row to avoid shifting the meaning of the coordinates (e.g. with reversed). You could use itertools.groupby to do the actual iteration for each row:
from itertools import groupby
from operator import itemgetter
# coord is the list of discarded coordinates computed in the question
y_corr = [row.copy() for row in y[1:]]  # skip the leading None, as coord does
for r, group in groupby(reversed(coord), itemgetter(0)):
    for c in map(itemgetter(1), group):
        del y_corr[r][c]
Instead of reversing coord, you can reverse each group individually. A groupby group is a one-shot iterator, so it has to be materialized first, e.g. with map(itemgetter(1), reversed(list(group))).
A better approach might be to compute the coordinates of the retained values instead of the discarded ones. I would recommend pre-allocating the output list, to help keep track of the empty lists and preserve the shape:
from itertools import groupby
from operator import itemgetter
# Assumes the leading None has already been dropped, e.g. x, y = x[1:], y[1:]
coord = [(r, c) for r, row in enumerate(x) for c, n in enumerate(row) if n >= 30]
y_corr = [[] for _ in x]  # one independent empty list per row
for r, group in groupby(coord, itemgetter(0)):
    y_corr[r] = [y[r][c] for c in map(itemgetter(1), group)]
If you don't care about preserving the empty rows, you can skip the loop and use a one-liner instead:
y_corr = [[y[r][c] for c in map(itemgetter(1), group)] for r, group in groupby(coord, itemgetter(0))]
new_y = []
y_rows = y[1:]  # skip the leading None so the indices line up with BadList
for i in range(len(y_rows)):
    new_y.append([y_rows[i][j] for j in range(len(y_rows[i])) if (i,j) not in BadList])
where BadList is
[(i,j) for i,v1 in enumerate(x[1:]) for j,v2 in enumerate(v1) if v2<30]
You can get the matching (x, y) pairs using zip:
In [395]: [(a, b) for z in list(zip(x, y))[1:] for a, b in list(zip(*z)) if a >= 30]
Out[395]:
[(175, 55),
(150, 243),
(175, 334),
(150, 656),
(175, 145),
(192, 12),
(150, 150),
(177, 177)]
This is the equivalent of
In [396]: v = []
In [398]: for z in list(zip(x, y))[1:]:
     ...:     for a, b in list(zip(*z)):
     ...:         if a >= 30:
     ...:             v.append((a,b))
     ...:
Where
In [388]: list(zip(x, y))[1:]
Out[388]:
[([1, 15, 175, 20], [12, 43, 55, 231]),
([150, 175, 18, 20], [243, 334, 44, 12]),
([150, 175, 18], [656, 145, 138]),
([192, 150, 177], [12, 150, 177])]
and
In [392]: list(zip(*list(zip(x, y))[1]))
Out[392]: [(1, 12), (15, 43), (175, 55), (20, 231)]
I have two arrays:
Array_a = [20, 30, 50, 20]
Array_b = [1 ,2 ,3 , 4]
I would like to have the following output:
(20, '(1,Days Learning)')
(30, '(2,Days Learning)')
(50, '(3,Days Learning)')
(20, '(4,Days Learning)')
My code looks like the following:
for i,j in zip(Array_a, Array_b):
msg = (i, "(" + str(j) + ",Days Learning)")
print(msg)
but I would like to write it more simply, something like:
for a, b in []
Try this one:
msg = [(a, '({},Days Learning)'.format(b)) for a, b in zip(Array_a, Array_b)]
print(msg)
This will output:
[(20, '(1,Days Learning)'), (30, '(2,Days Learning)'), (50, '(3,Days Learning)'), (20, '(4,Days Learning)')]
NOTE:
To print the elements line by line you can use print with join and a generator expression:
print('\n'.join(str(m) for m in msg))
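On Python 3.6 or later, the same idea could also be written with an f-string. This is just a sketch; the lowercase names array_a, array_b and msgs are chosen here for illustration and mirror the arrays from the question.

```python
array_a = [20, 30, 50, 20]
array_b = [1, 2, 3, 4]

# Pair the two lists element by element and format the second item.
msgs = [(a, f"({b},Days Learning)") for a, b in zip(array_a, array_b)]

for m in msgs:
    print(m)
```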
How about this? It looks more Pythonic to me:
Array_a = [20, 30, 50, 20]
Array_b = [1 ,2 ,3 , 4]
sample = tuple(zip(Array_a,zip(Array_b,["Days Learning" for i in range(len(Array_b))])))
print(sample)
It will give you this result (nested tuples rather than formatted strings):
((20, (1, 'Days Learning')), (30, (2, 'Days Learning')), (50, (3, 'Days Learning')), (20, (4, 'Days Learning')))
I am trying to display a list in vertically sorted columns, with the number of columns decided by the user. I want to use zip() but I can't figure out how to tell it to go through n lists.
#!/usr/bin/env python
import random
lst = random.sample(range(100), 30)
lst = sorted(lst)
col_size = int(raw_input('How many columns do you want?: '))
sorting = 'Vertical'
if sorting == 'Vertical':
    # vertically sorted
    columns = []
    n = len(lst)//col_size
    for i in range(0, len(lst), n):
        columns.append(lst[i:i+n])
    print '\nVertically Sorted:'
    print columns
    print zip(*columns)
This gives this result:
How many columns do you want?: 4
Vertically Sorted:
[[0, 2, 4, 11, 12, 16, 23], [24, 31, 32, 36, 41, 48, 50], [52, 54, 61, 62, 63, 64, 67], [76, 80, 81, 89, 91, 92, 94], [96, 97]]
[(0, 24, 52, 76, 96), (2, 31, 54, 80, 97)]
If I knew the number of columns (e.g. 4), I could have coded:
for c1, c2, c3, c4 in zip(columns[0], columns[1], columns[2], columns[3]):
    print str(c1), str(c2).rjust(8), str(c3).rjust(8), str(c4).rjust(8)
But since I don't, how do I use zip? As you can see, I tried zip(*columns), but that failed because the last list has fewer items than the others.
zip doesn't do what you're after because the columns have different lengths: it stops at the shortest one. In Python 2, map(None, *columns) transposes uneven lists and pads the short ones with None.
See the following, which borrows code from Create nice column output in python.
PROGRAM
import random
lst = random.sample(range(100), 30)
lst = sorted(lst)
col_size = int(raw_input('How many columns do you want?: '))
sorting = 'Vertical'
if sorting == 'Vertical':
    # vertically sorted
    columns = []
    n = len(lst)//col_size
    for i in range(0, len(lst), n):
        columns.append(lst[i:i+n])
    print '\nColumns:'
    columns = map(None,*columns)
    print columns
    print '\nVertically Sorted:'
    col_width = max(len(str(word)) for row in columns for word in row) + 2 # padding
    for row in columns:
        print "".join(str(word).ljust(col_width) for word in row if word is not None)
OUTPUT
How many columns do you want?: 4
Columns:
[(0, 19, 45, 62, 92), (1, 24, 47, 64, 93), (5, 29, 48, 72, None), (6, 31, 50, 80, None), (9, 34, 56, 85, None), (14, 36, 58, 87, None), (15, 37, 61, 90, None)]
Vertically Sorted:
0 19 45 62 92
1 24 47 64 93
5 29 48 72
6 31 50 80
9 34 56 85
14 36 58 87
15 37 61 90
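The program above is Python 2 only (raw_input, print statements, map(None, ...)). On Python 3 a comparable result can be sketched with itertools.zip_longest, which pads short columns with None by default. The helper name vertical_rows is hypothetical, and the sketch uses ceiling division for the column height, on the assumption that the user wants at most the requested number of columns (the floor division in the question produces five columns when four are requested).

```python
import math
from itertools import zip_longest

def vertical_rows(items, num_columns):
    """Return the rows of a column-major layout; short columns are padded with None."""
    # Ceiling division so that at most num_columns columns are produced.
    per_column = max(1, math.ceil(len(items) / num_columns))
    columns = [items[i:i + per_column] for i in range(0, len(items), per_column)]
    return list(zip_longest(*columns))

items = list(range(30))
rows = vertical_rows(items, 4)

# Pad every cell to the width of the longest value plus two spaces.
width = max(len(str(v)) for v in items) + 2
for row in rows:
    print("".join(str(v).ljust(width) for v in row if v is not None))
```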
Use the grouper recipe IT.izip_longest(*[iterable]*n) to collect the items in lst into groups of size n. (See this page for a more detailed explanation of how the grouper recipe works.)
import random
import itertools as IT
# lst = random.sample(range(100), 30)
lst = range(30)
lst = sorted(lst)
col_size = int(raw_input('How many columns do you want?: '))
sorting = 'Vertical'
if sorting == 'Vertical':
    # vertically sorted
    n = len(lst)//col_size
    lst = iter(lst)
    columns = IT.izip_longest(*[lst]*n, fillvalue='')
    print '\nVertically Sorted:'
    print('\n'.join(
        [''.join(map('{:4}'.format, row))
         for row in IT.izip(*columns)]))
yields
0 7 14 21 28
1 8 15 22 29
2 9 16 23
3 10 17 24
4 11 18 25
5 12 19 26
6 13 20 27
Extend the last list with None elements so that all lists have equal length, zip them, and then remove the None elements from the result.
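The padding approach can be sketched as follows. The sample columns are made-up data for illustration, and the sketch assumes None never occurs as a real value in the list.

```python
# Hypothetical sample columns; the last one is shorter than the others.
columns = [[0, 2, 4], [11, 12, 16], [23, 24]]

# Pad every column with None up to the length of the longest one.
longest = max(len(col) for col in columns)
padded = [col + [None] * (longest - len(col)) for col in columns]

# Transpose with zip, then drop the padding from each row.
rows = [[v for v in row if v is not None] for row in zip(*padded)]
print(rows)
```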