I have a pandas dataframe like:
I need to style it using a list of lists like:
[[3, 7, 4, 5],
[6, 17, 5, 10, 13, 16],
[7, 22, 6, 17, 19, 12],
[12, 26, 24, 25, 23, 18, 20],
[21, 20, 18, 27, 25]]
Values in R1 that appear in the first list should be colored blue, values in R2 that appear in the second list should be colored blue, and so on.
In other words, color the numbers in each column whose value is in the corresponding list.
I have tried:
def posclass(val):
    color = 'black'
    for i in range(5):
        if (val in list[i]):
            color = 'blue'
    return 'color: %s' % color
df.style.applymap(posclass, subset=['R1','R2','R3','R4','R5'])
But this does not work properly: it does not apply each list to its own column.
The desired result is a dataframe with colored numbers (those that match, in each column, the corresponding list).
Try something like this:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(40).reshape(-1,4), columns=[f'R{i}' for i in range(1,5)])
Input df:
R1 R2 R3 R4
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15
4 16 17 18 19
5 20 21 22 23
6 24 25 26 27
7 28 29 30 31
8 32 33 34 35
9 36 37 38 39
and
list_l = [[3, 7, 4, 5],
[6, 17, 5, 10, 13, 16],
[7, 22, 6, 17, 19, 12],
[12, 26, 24, 25, 23, 18, 20],
[21, 20, 18, 27, 25]]
Then:
def f(x):
    # x is one column; look up its position to pick the matching list
    colpos = df.columns.get_loc(x.name)
    return ['color: blue' if n in list_l[colpos] else '' for n in x]
df.style.apply(f)
Output: the dataframe is rendered with the matching values in each column colored blue.
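The per-column matching can also be checked without rendering anything. The following is a minimal sketch, assuming the same df and list_l as above; the name mask is introduced here only for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(40).reshape(-1, 4),
                  columns=[f"R{i}" for i in range(1, 5)])
list_l = [[3, 7, 4, 5],
          [6, 17, 5, 10, 13, 16],
          [7, 22, 6, 17, 19, 12],
          [12, 26, 24, 25, 23, 18, 20],
          [21, 20, 18, 27, 25]]

# Boolean mask (hypothetical helper name): True where a value of the
# i-th column appears in the i-th list. The fifth list is unused here
# because this example frame has only four columns.
mask = pd.DataFrame(
    {name: df[name].isin(list_l[pos]) for pos, name in enumerate(df.columns)},
    index=df.index,
)
print(mask.sum())  # number of values that would be colored, per column

# In a notebook, the same mask could drive the styling, for example:
# df.style.apply(lambda col: ["color: blue" if m else "" for m in mask[col.name]])
```

Inspecting the mask first makes it easy to confirm that each list is applied to its own column before any styling is involved.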
I have a problem sorting a matrix.
I need to create an M×M matrix whose size comes from user input, built as nested lists filled with randrange.
from random import randrange
matrix_size = int(input("Enter size of the matrix: "))
matrix = [[randrange(1, 51) for column in range(matrix_size)] for row in range(matrix_size)]
In the next step I need to find the sum of each column of the matrix, so I do this:
for i in range(matrix_size):
    sum_column = 0
    for j in range(matrix_size):
        sum_column += matrix[j][i]
        print(f'{matrix[i][j]:>5}', end='')
    print(f'{sum_column:>5}')
The problem is that I should add the sums as a row at the end of the matrix. But this is what I get:
Enter the size of the matrix: 5
15 23 14 22 20 73
7 26 26 27 27 160
17 36 9 13 42 104
1 32 41 2 29 113
33 43 14 49 12 130
The sums are computed correctly, but how can I add them at the end of the matrix, and then sort in ascending order of the column sums? I hope you understand what I need. Thanks.
Do you mean something like this?
import numpy as np
matrix = np.array(matrix)
rowsum = matrix.sum(axis=1) # sum of rows
idx = np.argsort(rowsum) # permutation that makes rowsum sorted
result = np.hstack([matrix, rowsum[:, None]]) # join matrix and rowsum
result = result[idx] # sort rows in ascending order
For the matrix
array([[31, 13, 29, 5, 1],
[21, 9, 34, 31, 22],
[13, 38, 29, 20, 50],
[21, 12, 26, 5, 15],
[19, 24, 38, 44, 41]])
the output would be:
array([[ 31, 13, 29, 5, 1, 79],
[ 21, 12, 26, 5, 15, 79],
[ 21, 9, 34, 31, 22, 117],
[ 13, 38, 29, 20, 50, 150],
[ 19, 24, 38, 44, 41, 166]])
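If the goal is instead what the question literally describes (column sums appended as a final row, with the columns ordered by those sums), the same idea can be applied along the other axis. This is a sketch under that assumption; the names colsum and order are illustrative only.

```python
import numpy as np

matrix = np.array([[31, 13, 29,  5,  1],
                   [21,  9, 34, 31, 22],
                   [13, 38, 29, 20, 50],
                   [21, 12, 26,  5, 15],
                   [19, 24, 38, 44, 41]])

colsum = matrix.sum(axis=0)                # one sum per column
order = np.argsort(colsum, kind="stable")  # column order, smallest sum first
# Append the sums as a last row, then reorder the columns.
result = np.vstack([matrix, colsum])[:, order]
print(result)
```

The stable sort keeps columns with equal sums in their original relative order.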
I make bins out of my column using pandas' pd.qcut(). I would then like to apply smoothing by replacing each value with the mean of its bin.
I generate my bins with something like
pd.qcut(col, 3)
For example,
Given the column values [4, 8, 15, 21, 21, 24, 25, 28, 34]
and the generated bins
Bin1 [4, 15]: 4, 8, 15
Bin2 [21, 24]: 21, 21, 24
Bin3 [25, 34]: 25, 28, 34
I would like to replace the values with the following means
Mean of Bin1 (4, 8, 15) = 9
Mean of Bin2 (21, 21, 24) = 22
Mean of Bin3 (25, 28, 34) = 29
Therefore:
Bin1: 9, 9, 9
Bin2: 22, 22, 22
Bin3: 29, 29, 29
making the final dataset: [9, 9, 9, 22, 22, 22, 29, 29, 29]
How can one also add a column with closest bin boundaries?
Bin1: 4, 4, 15
Bin2: 21, 21, 24
Bin3: 25, 25, 34
making the final dataset: [4, 4, 15, 21, 21, 24, 25, 25, 34]
This is very similar to this question, which is for R.
It's exactly as you laid out, using this technique to get the nearest boundary:
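import pandas as pd
df = pd.DataFrame({"col":[4, 8, 15, 21, 21, 24, 25, 28, 34]})
df2 = df.assign(bin=pd.qcut(df.col, 3),
                # mean of col within each bin, broadcast back to every row
                colbmean=lambda dfa: dfa.groupby("bin")["col"].transform("mean"),
                # whichever bin edge (left or right) is closest to the value
                colbin=lambda dfa: dfa.apply(lambda r: min([r.bin.left,r.bin.right], key=lambda x: abs(x-r.col)), axis=1))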
   col              bin  colbmean  colbin
0    4    (3.999, 19.0]         9   3.999
1    8    (3.999, 19.0]         9   3.999
2   15    (3.999, 19.0]         9  19
3   21   (19.0, 24.333]        22  19
4   21   (19.0, 24.333]        22  19
5   24   (19.0, 24.333]        22  24.333
6   25   (24.333, 34.0]        29  24.333
7   28   (24.333, 34.0]        29  24.333
8   34   (24.333, 34.0]        29  34
You'll find below the solution I came up with to answer your problem.
There is still a limitation: pandas.qcut does not return closed intervals, so the results are not exactly the ones you described.
import pandas as pd
df = pd.DataFrame({'value': [4, 8, 15, 21, 21, 24, 25, 28, 34]})
df['bin'] = pd.qcut(df['value'], 3)
df = df.join(df.groupby('bin')['value'].mean(), on='bin', rsuffix='_average_in_bin')
df['bin_left'] = df['bin'].apply(lambda x: x.left)
df['bin_right'] = df['bin'].apply(lambda x: x.right)
df['nearest_boundary'] = df.apply(lambda x: x['bin_left'] if abs(x['value'] - x['bin_left']) < abs(x['value'] - x['bin_right']) else x['bin_right'], axis=1)
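To get the exact numbers from the question, one option is to treat the smallest and largest value actually present in each bin as its boundaries, rather than the interval edges reported by qcut. The following is a sketch under that assumption; bin_id, lo and hi are illustrative names, and ties are resolved toward the lower boundary.

```python
import pandas as pd

df = pd.DataFrame({"value": [4, 8, 15, 21, 21, 24, 25, 28, 34]})

# Integer bin codes (0, 1, 2) instead of Interval objects.
bin_id = pd.qcut(df["value"], 3, labels=False).rename("bin_id")
grouped = df["value"].groupby(bin_id)

df["bin_mean"] = grouped.transform("mean")
lo = grouped.transform("min")  # smallest observed value in the bin
hi = grouped.transform("max")  # largest observed value in the bin

# Keep lo where the value is at least as close to lo as to hi, else take hi.
df["nearest_boundary"] = lo.where((df["value"] - lo) <= (hi - df["value"]), hi)
print(df)
```

With the sample data this gives the means 9, 22, 29 and the boundaries 4, 4, 15, 21, 21, 24, 25, 25, 34 listed in the question.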
seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(len(seq)):
    print(seq[i],end ="\t")
How do I get my output table to look like this?
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
One of many ways is to iterate over the seq list in steps of 6 and print the elements in each slice:
seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(0, len(seq), 6):
    print(*seq[i:i+6], sep=' ')
output
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
You probably want to make use of string formatting. Below, f"{seq[i]:<4d}" means "A string of length 4, left-aligned, containing the string representation of seq[i]". If you want to right-align, just remove <.
seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(len(seq)):
    print(f"{seq[i]:<4d}", end = "")
    if not (i+1) % 6:
        print("")
print("")
Output:
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
The simplest relevant technique is padding:
for i in range(0, len(seq), 6):
    print(" ".join(str(k).ljust(2, " ") for k in seq[i: i + 6]))
but string formatting, as in "Printing Lists as Tabular Data", makes for a more sophisticated solution.
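The slicing and padding ideas from the answers above can be combined so that the column width is derived from the data instead of being hard-coded. This is only a sketch; format_rows and per_row are hypothetical names, and right alignment is an arbitrary choice.

```python
def format_rows(seq, per_row=6):
    """Return one string per output row, with every cell padded to equal width."""
    width = max(len(str(v)) for v in seq)  # widest number decides the cell width
    return [
        " ".join(str(v).rjust(width) for v in seq[i:i + per_row])
        for i in range(0, len(seq), per_row)
    ]

seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
rows = format_rows(seq)
for line in rows:
    print(line)
```

Returning the rows as strings rather than printing inside the helper keeps the formatting reusable, for instance when writing to a file.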
Does anyone know why I cannot use "|" to combine multiple numpy.flatiter objects after converting them into sets? I am trying to collect all the numbers from row 1, column 1, and the section from (2,2) to (3,3). If I use np.concatenate I get the right answer, but when I use "|" I get an empty set. Or is there a better way to write it?
import numpy as np
matrix = np.matrix(np.arange(36).reshape(6, 6))
rnum = matrix[1, :].flat
cnum = matrix[:, 1].flat
snum = matrix[2:4, 2:4].flat
print(matrix)
print(rnum)
print(set(rnum))
print(set(cnum))
print(set(snum))
print(set(np.concatenate((rnum, cnum, snum))))
print(set(rnum) | set(cnum) | set(snum))
#[[ 0 1 2 3 4 5]
# [ 6 7 8 9 10 11]
# [12 13 14 15 16 17]
# [18 19 20 21 22 23]
# [24 25 26 27 28 29]
# [30 31 32 33 34 35]]
#<numpy.flatiter object at 0x7faf52966c00>
#{6, 7, 8, 9, 10, 11}
#{1, 7, 13, 19, 25, 31}
#{20, 21, 14, 15}
#{1, 6, 7, 8, 9, 10, 11, 13, 14, 15, 19, 20, 21, 25, 31} => expect result
#set() => why?
The first call of set(rnum) in print(set(rnum)) consumes the iterator rnum. When you use set(rnum) again in set(rnum) | set(cnum) | set(snum), there are no more values left in the iterator rnum, so set(rnum) is the empty set.
Here's a more direct demonstration:
In [621]: matrix = np.matrix(np.arange(36).reshape(6, 6))
In [622]: rnum = matrix[1, :].flat
In [623]: set(rnum)
Out[623]: {6, 7, 8, 9, 10, 11}
In [624]: set(rnum)
Out[624]: set()
Instead of using rnum, you could create another iterator by repeating matrix[1, :].flat:
In [625]: set(matrix[1, :].flat)
Out[625]: {6, 7, 8, 9, 10, 11}
Alternatively, skip the use of numpy.matrix and iterators, and just index into a regular NumPy array:
In [639]: a = np.arange(36).reshape(6, 6)
In [640]: set(a[1,:])
Out[640]: {6, 7, 8, 9, 10, 11}
In [641]: set(a[:,1])
Out[641]: {1, 7, 13, 19, 25, 31}
In [642]: set(a[2:4, 2:4].ravel())
Out[642]: {20, 21, 14, 15}
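Building on the plain-array approach, the three selections can be combined either with the set operator from the question or with np.union1d. This is a minimal sketch; the names parts, as_sets, union_set and union_arr are illustrative.

```python
from functools import reduce

import numpy as np

a = np.arange(36).reshape(6, 6)

# Plain arrays (not iterators), so they can be read as many times as needed.
parts = [a[1, :], a[:, 1], a[2:4, 2:4].ravel()]

# Option 1: Python sets combined with "|".
as_sets = [set(p.tolist()) for p in parts]
union_set = as_sets[0] | as_sets[1] | as_sets[2]

# Option 2: stay in NumPy; union1d returns a sorted array of unique values.
union_arr = reduce(np.union1d, parts)

print(sorted(union_set))
print(union_arr)
```

Both options can be evaluated repeatedly with the same result, because nothing here is consumed by iteration.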
So I found this:
When converting MATLAB code it might be necessary to first reshape a
matrix to a linear sequence, perform some indexing operations and then
reshape back. As reshape (usually) produces views onto the same
storage, it should be possible to do this fairly efficiently.
Note that the scan order used by reshape in Numpy defaults to the 'C'
order, whereas MATLAB uses the Fortran order. If you are simply
converting to a linear sequence and back this doesn't matter. But if
you are converting reshapes from MATLAB code which relies on the scan
order, then this MATLAB code:
z = reshape(x,3,4);
should become
z = x.reshape(3,4,order='F').copy()
in Numpy.
I have a 16×2 array called mafs. When I do this in MATLAB:
mafs2 = reshape(mafs,[4,4,2])
I get something different from what I get when I do this in Python:
mafs2 = reshape(mafs,(4,4,2))
or even
mafs2 = mafs.reshape((4,4,2),order='F').copy()
Any help on this? Thank you all.
Example:
MATLAB:
>> mafs = [(1:16)' (17:32)']
mafs =
1 17
2 18
3 19
4 20
5 21
6 22
7 23
8 24
9 25
10 26
11 27
12 28
13 29
14 30
15 31
16 32
>> reshape(mafs,[4 4 2])
ans(:,:,1) =
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
ans(:,:,2) =
17 21 25 29
18 22 26 30
19 23 27 31
20 24 28 32
Python:
>>> import numpy as np
>>> mafs = np.c_[np.arange(1,17), np.arange(17,33)]
>>> mafs.shape
(16, 2)
>>> mafs[:,0]
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
>>> mafs[:,1]
array([17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32])
>>> r = np.reshape(mafs, (4,4,2), order="F")
>>> r.shape
(4, 4, 2)
>>> r[:,:,0]
array([[ 1, 5, 9, 13],
[ 2, 6, 10, 14],
[ 3, 7, 11, 15],
[ 4, 8, 12, 16]])
>>> r[:,:,1]
array([[17, 21, 25, 29],
[18, 22, 26, 30],
[19, 23, 27, 31],
[20, 24, 28, 32]])
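To see how much the scan order matters for this example, the same array can be reshaped with both orders side by side. This is a small sketch using the mafs array from above; r_f and r_c are illustrative names.

```python
import numpy as np

mafs = np.c_[np.arange(1, 17), np.arange(17, 33)]  # shape (16, 2)

r_f = mafs.reshape((4, 4, 2), order="F")  # column-major scan, as MATLAB does
r_c = mafs.reshape((4, 4, 2))             # NumPy default, row-major ('C')

print(r_f[:, :, 0])  # matches MATLAB's ans(:,:,1)
print(r_c[:, :, 0])  # same values, but laid out differently
```

For this particular input, the first slice of the 'C' result happens to be the transpose of the 'F' result, which is an easy difference to overlook when comparing against MATLAB output.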
I was having a similar issue myself, as I am also trying to make the transition from MATLAB to Python. I was finally able to convert a numpy array stored in (depth, row, col) format into a single sheet of column vectors (one per image).
In MATLAB I would have done something like:
output = reshape(imStack,[row*col,depth])
In Python this seems to translate to:
import numpy as np
output=np.transpose(imStack)
output=output.reshape((row*col, depth), order='F')
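A quick way to check what these two lines produce is to run them on a small stack with known values. The sketch below assumes imStack has shape (depth, row, col); the sizes are arbitrary and chosen only for illustration.

```python
import numpy as np

depth, row, col = 2, 3, 4  # arbitrary small sizes for the check
imStack = np.arange(depth * row * col).reshape(depth, row, col)

output = np.transpose(imStack)                        # shape (col, row, depth)
output = output.reshape((row * col, depth), order="F")

# Column d of output holds image d flattened row by row.
for d in range(depth):
    print(np.array_equal(output[:, d], imStack[d].ravel()))
```

Under this assumption each column is the row-major flattening of one image, so the same result could also be written as imStack.reshape(depth, row * col).T. MATLAB flattens each image column by column, so whether this matches the MATLAB output depends on how the stack was laid out there.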