How to do a substring using pandas or numpy - python

I'm trying to do a substring on data from column "ORG". I only need the 2nd and 3rd character. So for 413 I only need 13. I've tried the following:
Attempt 1: dr2['unit'] = dr2[['ORG']][1:2]
Attempt 2: dr2['unit'] = dr2[['ORG'].str[1:2]
Attempt 3: dr2['unit'] = dr2[['ORG'].str([1:2])
My dataframe:
REGION ORG
90 4 413
91 4 413
92 4 413
93 5 503
94 5 503
95 5 503
96 5 503
97 5 504
98 5 504
99 1 117
100 1 117
101 1 117
102 1 117
103 1 117
104 1 117
105 1 117
106 3 3
107 3 3
108 3 3
109 3 3
Expected output:
REGION ORG UNIT
90 4 413 13
91 4 413 13
92 4 413 13
93 5 503 03
94 5 503 03
95 5 503 03
96 5 503 03
97 5 504 04
98 5 504 04
99 1 117 17
100 1 117 17
101 1 117 17
102 1 117 17
103 1 117 17
104 1 117 17
105 1 117 17
106 3 3 03
107 3 3 03
108 3 3 03
109 3 3 03
thanks for any and all help!

Your square brackets do not match (dr2[['ORG'] is never closed), and you can take the last two characters with .str[-2:].
Then apply str.zfill with a width of 2 to pad the items in the new series:
>>> import pandas as pd
>>> ld = [{'ORG': '413', 'REGION': '4'}, {'ORG': '414', 'REGION': '4'}, {'ORG': '3', 'REGION': '4'}]
>>> df = pd.DataFrame(ld)
>>> df
ORG REGION
0 413 4
1 414 4
2 3 4
>>> df['UNIT'] = df['ORG'].str[-2:].apply(str.zfill, args=(2,))
>>> df
ORG REGION UNIT
0 413 4 13
1 414 4 14
2 3 4 03
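As a variation on the answer above, here is a minimal sketch that uses the vectorized .str.zfill accessor instead of apply. The sample values are taken from the question; the astype(str) step is an assumption that only matters if ORG is stored as integers rather than strings.

```python
import pandas as pd

# Sample rows taken from the question's dataframe
df = pd.DataFrame({'REGION': [4, 5, 1, 3], 'ORG': [413, 503, 117, 3]})

# astype(str) is an assumption: needed only when ORG holds integers.
# .str[-2:] keeps the last two characters, .str.zfill(2) left-pads with zeros.
df['UNIT'] = df['ORG'].astype(str).str[-2:].str.zfill(2)
print(df)
```

Slicing from the end means a one-character value such as 3 is kept whole and then padded to 03.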

Related

concat result of apply in python

I am trying to apply a function on a column of a dataframe.
After getting multiple results as dataframes, I want to concat them all in one.
Why does the first option work and the second not?
import numpy as np
import pandas as pd
def testdf(n):
    test = pd.DataFrame(np.random.randint(0,n*100,size=(n*3, 3)), columns=list('ABC'))
    test['index'] = n
    return test
test = pd.DataFrame({'id': [1,2,3,4]})
testapply = test['id'].apply(func = testdf)
#option 1
pd.concat([testapply[0],testapply[1],testapply[2],testapply[3]])
#option2
pd.concat([testapply])
pd.concat expects a sequence of pandas objects, but your option 2 passes a list containing a single pd.Series (which itself holds the dataframes), so nothing is concatenated and you just get that Series back as is. To fix your second approach, use unpacking:
print(pd.concat([*testapply]))
A B C index
0 91 15 91 1
1 93 85 91 1
2 26 87 74 1
0 195 103 134 2
1 14 26 159 2
2 96 143 9 2
3 18 153 35 2
4 148 146 130 2
5 99 149 103 2
0 276 150 115 3
1 232 126 91 3
2 37 242 234 3
3 144 73 81 3
4 96 153 145 3
5 144 94 207 3
6 104 197 49 3
7 0 93 179 3
8 16 29 27 3
0 390 74 379 4
1 78 37 148 4
2 350 381 260 4
3 279 112 260 4
4 115 387 173 4
5 70 213 378 4
6 43 37 149 4
7 240 399 117 4
8 123 0 47 4
9 255 172 1 4
10 311 329 9 4
11 346 234 374 4
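A possible alternative, sketched under the assumption that the intermediate Series is not needed for anything else: build the frames in a plain list and hand that list to pd.concat directly. The name make_frame is a hypothetical stand-in for testdf from the question.

```python
import numpy as np
import pandas as pd

def make_frame(n):
    """Build a random frame with n*3 rows tagged with n (mirrors testdf)."""
    frame = pd.DataFrame(np.random.randint(0, n * 100, size=(n * 3, 3)),
                         columns=list('ABC'))
    frame['index'] = n
    return frame

ids = [1, 2, 3, 4]
# A list of DataFrames is exactly the kind of sequence pd.concat expects
frames = [make_frame(n) for n in ids]
# ignore_index=True renumbers the rows 0..N-1 instead of repeating 0, 1, 2, ...
combined = pd.concat(frames, ignore_index=True)
print(combined.shape)
```

Leaving out ignore_index=True keeps each frame's own row labels, as in the output shown above.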

Add a new line to .txt file in python

I have the following .txt file:
0 40 50 0 0 1236 0 0 0
1 45 70 -20 825 870 90 3 0
2 42 68 -10 727 782 90 4 0
3 40 69 20 621 702 90 0 1
4 38 70 10 534 605 90 0 2
5 25 85 -20 652 721 90 11 0
6 22 75 30 30 92 90 0 10
7 22 85 -40 567 620 90 9 0
8 20 80 -10 384 429 90 12 0
9 20 85 40 475 528 90 0 7
10 18 75 -30 99 148 90 6 0
11 15 75 20 179 254 90 0 5
12 15 80 10 278 345 90 0 8
I need to copy the first line and add it to the .txt file as last line in order to get this:
0 40 50 0 0 1236 0 0 0
1 45 70 -20 825 870 90 3 0
2 42 68 -10 727 782 90 4 0
3 40 69 20 621 702 90 0 1
4 38 70 10 534 605 90 0 2
5 25 85 -20 652 721 90 11 0
6 22 75 30 30 92 90 0 10
7 22 85 -40 567 620 90 9 0
8 20 80 -10 384 429 90 12 0
9 20 85 40 475 528 90 0 7
10 18 75 -30 99 148 90 6 0
11 15 75 20 179 254 90 0 5
12 15 80 10 278 345 90 0 8
13 40 50 0 0 1236 0 0 0
How can I do that? (Notice the 13 as the first entry of the last line)
Try the following. I have added some comments to describe the steps
with open('yourfile.txt') as f:
    t=f.readlines()
row=t[-1].split()[0] #get the last row index
row=str(int(row)+1) #increase the last row index
new_line=t[0] #copy first line
new_line=new_line.replace('0', row, 1).replace(' ', '',len(row)-1) #add the next row index to new line, taking care of spaces
t[-1]=t[-1]+'\n'
t.append(new_line) #append the new line
with open('yourfile.txt', 'w') as f:
    f.writelines(t)
Applied to your existing .txt, result of the above code is:
0 40 50 0 0 1236 0 0 0
1 45 70 -20 825 870 90 3 0
2 42 68 -10 727 782 90 4 0
3 40 69 20 621 702 90 0 1
4 38 70 10 534 605 90 0 2
5 25 85 -20 652 721 90 11 0
6 22 75 30 30 92 90 0 10
7 22 85 -40 567 620 90 9 0
8 20 80 -10 384 429 90 12 0
9 20 85 40 475 528 90 0 7
10 18 75 -30 99 148 90 6 0
11 15 75 20 179 254 90 0 5
12 15 80 10 278 345 90 0 8
13 40 50 0 0 1236 0 0 0
You can use:
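with open('fileName.txt') as file:
    first_line = file.readline()
    # readline() already consumed one line, so add 1 to get the total line count,
    # which is also the index of the new row (rows are numbered from 0)
    count = 1 + sum(1 for _ in file)
line1 = first_line.split()
line1[0] = str(count)  # join() needs strings, not ints
new_line = ' '.join(line1)
# then you can add it to the end of the file with
# (assuming the file does not already end with a newline):
with open('fileName.txt', 'a') as file_object:
    file_object.write('\n' + new_line)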
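If the exact column spacing of the file does not matter, the same idea can be sketched as a small helper that splits every row on whitespace. The function name append_copy_of_first_line is hypothetical, and the sketch assumes that rewriting the file with single spaces between columns is acceptable.

```python
from pathlib import Path

def append_copy_of_first_line(path):
    """Append a copy of the first row, renumbered to follow the last row."""
    lines = Path(path).read_text().splitlines()
    # Skip blank lines so a trailing newline does not produce an empty row
    rows = [line.split() for line in lines if line.strip()]
    new_row = rows[0].copy()
    # New index = index of the last row + 1
    new_row[0] = str(int(rows[-1][0]) + 1)
    rows.append(new_row)
    # Note: this normalizes the separators to single spaces (an assumption)
    Path(path).write_text('\n'.join(' '.join(row) for row in rows) + '\n')
```

Reading the index from the last row, rather than counting lines, keeps the numbering correct even if the file does not start at 0.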

column select from transpose data in data frame [closed]

Hi all, I have a data frame whose column names are dates. I want to get the 4 weeks from every column, so I tried to transpose the data: once the columns become rows, the dates end up in one column and I can select the 4 weeks from it.
But when I transpose it, the columns become rows but are not added to the data frame as index data, and I can't select them any more.
I attach pictures for a clearer view.
Any help with that? Regards.
First image: data frame before transpose
Second image: data frame after transpose
It is still not clear to me what you want to do with the 5th weeks, but this will get you the number for the week of the month.
# this will only work if the first date belongs to the first week of that month
# and if there is only one date per week
wom = ( # week of month
    df.index.to_series()
    .groupby([df.index.year, df.index.month])
    .cumcount() + 1 # create the 1, 2, 3, 4, 5 tags for week of month
)
# you can keep it as a separate indexer
sales_month_over = df.loc[wom < 5, :]
# or you can create a MultiIndex
df.index = pd.MultiIndex.from_arrays([df.index, wom], names=['date', 'wom'])
sales_month_over = df.loc[df.index.get_level_values('wom') < 5]
fifth_weeks = df.loc[~df.index.isin(sales_month_over.index)]
>>> print(sales_month_over)
0 1 2 3 4 5 6 7 ... 107 108 109 110 111 112 113 114
date wom ...
2019-01-05 1 78 135 66 68 64 69 109 70 ... 58 166 122 81 162 193 74 196
2019-01-12 2 138 191 130 80 177 60 139 114 ... 147 188 59 149 126 131 133 178
2019-01-19 3 198 111 181 145 91 60 128 184 ... 80 54 110 152 114 165 86 68
2019-01-26 4 154 169 134 90 173 122 140 182 ... 186 140 150 65 68 92 128 169
2019-02-02 1 105 55 82 74 125 163 91 95 ... 199 67 116 155 128 162 133 110
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2020-05-23 4 163 176 92 78 88 64 55 79 ... 142 156 134 158 63 157 77 75
2020-06-06 1 133 167 117 91 180 106 169 154 ... 58 170 115 101 108 89 57 56
2020-06-13 2 78 86 93 192 53 143 182 184 ... 193 139 68 179 55 61 131 167
2020-06-20 3 119 123 91 145 71 193 97 182 ... 146 163 52 120 195 56 153 126
2020-06-27 4 50 191 72 89 76 151 166 89 ... 132 95 111 134 83 64 188 150
[72 rows x 115 columns]
>>> print(fifth_weeks)
0 1 2 3 4 5 6 7 ... 107 108 109 110 111 112 113 114
date wom ...
2019-03-30 5 199 120 147 81 61 85 132 174 ... 99 162 177 104 118 168 117 92
2019-06-29 5 113 72 92 64 192 188 51 164 ... 143 137 126 117 162 157 53 102
2019-08-31 5 129 192 60 156 153 137 183 117 ... 155 115 57 92 124 99 143 119
2019-11-30 5 133 190 156 179 79 107 158 118 ... 165 180 91 139 176 159 61 103
2020-02-29 5 123 195 182 170 155 145 189 84 ... 152 115 74 128 190 72 53 104
2020-05-30 5 176 121 132 155 60 57 120 182 ... 57 136 52 190 152 168 65 164
[6 rows x 115 columns]
Now you can take the percentage change compared to the previous month
sales_month_over = sales_month_over.groupby(level='wom').pct_change()
>>> print(sales_month_over)
0 1 2 3 ... 111 112 113 114
date wom ...
2019-01-05 1 NaN NaN NaN NaN ... NaN NaN NaN NaN
2019-01-12 2 NaN NaN NaN NaN ... NaN NaN NaN NaN
2019-01-19 3 NaN NaN NaN NaN ... NaN NaN NaN NaN
2019-01-26 4 NaN NaN NaN NaN ... NaN NaN NaN NaN
2019-02-02 1 -0.063953 -0.259259 -0.259067 -0.401961 ... 0.084416 0.452055 -0.481250 0.012579
... ... ... ... ... ... ... ... ... ...
2020-05-23 4 -0.554878 -0.191860 0.285714 0.265734 ... -0.658824 0.943820 0.444444 0.950000
2020-06-06 1 -0.598540 1.763889 -0.155844 -0.338983 ... -0.248000 -0.006757 -0.512821 -0.043243
2020-06-13 2 0.130435 0.390244 0.423358 -0.460177 ... -0.013158 -0.167702 0.015385 0.305785
2020-06-20 3 -0.437500 -0.100000 1.650000 0.175439 ... 0.666667 -0.088235 0.155556 0.246753
2020-06-27 4 0.452055 -0.474820 -0.374269 -0.414365 ... 1.172414 -0.710983 -0.525641 -0.521368
[72 rows x 115 columns]
You can use df.index to select the index column.
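To make the week-of-month tagging from the first answer easier to try, here is a small self-contained sketch. The data is synthetic: one row per Saturday starting on 2019-01-05, with a hypothetical sales column. It carries the same assumptions as that answer (one date per week, first date in the first week of its month).

```python
import numpy as np
import pandas as pd

# Synthetic weekly data: 13 consecutive Saturdays from 2019-01-05 to 2019-03-30
dates = pd.date_range('2019-01-05', periods=13, freq='W-SAT')
df = pd.DataFrame({'sales': np.arange(13)}, index=dates)

# Number the dates 1, 2, 3, ... within each (year, month) group
wom = (
    df.index.to_series()
    .groupby([df.index.year, df.index.month])
    .cumcount() + 1
)

first_four = df.loc[wom < 5]    # weeks 1 to 4 of every month
fifth_weeks = df.loc[wom == 5]  # the occasional fifth week
print(fifth_weeks)
```

In this sample only March 2019 has five Saturdays, so fifth_weeks holds the single row for 2019-03-30.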

Matplotlib ScalarMappable returning only black color

I am using matplotlib to create many plots. The plots involve making many FancyBboxPatches and setting the color for each patch using a ScalarMappable. Each plot corresponds to a "time step" from a physical process. I have made the following minimal working example to illustrate what I am trying to do and the problem I am having.
Suppose there is a file data.txt. If a line has one entry, that value is the time step. If a line has three entries, then the first entry is the x value, the second entry is the y value, and the third entry is the value that will use the ScalarMappable. Here is an example of data.txt:
1
0 0 0.1
0 1 1
0 2 2
0 3 3
0 4 4
1 0 10
1 1 11
1 2 12
1 3 13
1 4 14
2 0 20
2 1 21
2 2 22
2 3 23
2 4 24
3 0 30
3 1 31
3 2 32
3 3 33
3 4 34
2
1 0 10
1 1 11
1 2 12
1 3 13
1 4 14
2 0 110
2 1 111
2 2 112
2 3 113
2 4 114
3 0 120
3 1 121
3 2 122
3 3 123
3 4 124
4 0 130
4 1 131
4 2 132
4 3 133
4 4 134
3
2 0 110
2 1 111
2 2 112
2 3 113
2 4 114
3 0 1110
3 1 1111
3 2 1112
3 3 1113
3 4 1114
4 0 1120
4 1 1121
4 2 1122
4 3 1123
4 4 1124
5 0 1130
5 1 1131
5 2 1132
5 3 1133
5 4 1134
4
3 0 1110
3 1 1111
3 2 1112
3 3 1113
3 4 1114
4 0 11110
4 1 11111
4 2 11112
4 3 11113
4 4 11114
5 0 11120
5 1 11121
5 2 11122
5 3 11123
5 4 11124
6 0 11130
6 1 11131
6 2 11132
6 3 11133
6 4 11134
Here is the script I use to generate the plots:
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from matplotlib.colors import LogNorm
from matplotlib.patches import FancyBboxPatch
def parse_file(file_name):
    output = {}
    with open(file_name, 'r') as data_file:
        for line in data_file:
            entries = line.strip().split()
            if len(entries) == 1:
                time_step = int(entries[0])
                output[time_step] = {}
            elif len(entries) == 3:
                x = float(entries[0])
                y = float(entries[1])
                value = float(entries[2])
                output[time_step][(x, y)] = value
            else:
                raise RuntimeError('Anomalous line {} in file {}'.format(line, data_file.name))
    return output
def main():
    fig, axes = plt.subplots()
    axes.set_xlim(-1,10)
    axes.set_ylim(-1,10)
    cmap = cm.plasma
    norm = LogNorm(vmin = 1e-2, vmax = 1.2e4)
    smap = cm.ScalarMappable(norm = norm, cmap = cmap)
    smap.set_array([])
    color_bar = fig.colorbar(mappable = smap, ax = axes, orientation = 'vertical', label = 'label')
    data = parse_file(file_name = 'data.txt')
    for time_step, information in data.items():
        cells = []
        for (x,y), value in information.items():
            cell = FancyBboxPatch(xy = (x - 0.5, y - 0.5),
                                  width = 1, height = 1,
                                  boxstyle = 'square,pad=0.',
                                  edgecolor = 'black',
                                  facecolor = smap.to_rgba(value))
            #print(time_step, '\t', x, '\t', y, '\t', value, '\t', smap.to_rgba(value))
            axes.add_patch(cell)
            cells.append(cell)
        fig.savefig('time-step_{}.png'.format(time_step))
        for cell in cells:
            cell.remove()
if __name__ == '__main__':
    main()
And here is one of the plots that is created from running that script:
This plot (and the other three that are created, but not shown here) look fine. So I am confident that I am using ScalarMappable correctly. Now I take the actual data I want to plot, again in a file called data.txt. The format is the same as before, except if a line has four entries, then the first entry is the time step (and I do not care about the other entries). Here is an example of data.txt:
2 0.424066E-02 0.200000E+01 0.885500E+08
0 1 0.850703E+00
1 3 0.388551E-09
2 4 0.141948E-06
2 6 0.126299E-09
3 9 0.166871E-08
4 12 0.340738E-08
5 13 0.246948E-09
5 14 0.129005E-09
6 16 0.140043E-08
6 17 0.885307E-09
26 76 0.591676E-08
26 78 0.745985E-08
27 77 0.263136E-08
27 78 0.131857E-08
27 79 0.151193E-05
27 80 0.265941E-07
27 81 0.170975E-05
27 82 0.206355E-08
27 83 0.334444E-07
28 80 0.569439E-05
28 81 0.864904E-07
28 82 0.114196E-02
28 83 0.130067E-06
28 84 0.608045E-04
28 85 0.351649E-07
28 86 0.543117E-07
28 88 0.202115E-08
29 83 0.225374E-07
29 84 0.125586E-07
29 85 0.253383E-04
29 86 0.943810E-06
29 87 0.104539E-04
29 88 0.210241E-06
29 89 0.196533E-03
29 90 0.707278E-06
29 91 0.565096E-05
29 92 0.840856E-08
29 93 0.277478E-07
30 86 0.707234E-09
30 88 0.549048E-07
30 89 0.281776E-08
30 90 0.259219E-04
30 91 0.298973E-06
30 92 0.311047E-04
30 93 0.144465E-05
30 94 0.632642E-04
30 95 0.787893E-08
30 96 0.252900E-08
31 91 0.425350E-08
31 92 0.371105E-08
31 93 0.621869E-05
31 94 0.680069E-06
31 95 0.315149E-04
31 96 0.670790E-07
31 97 0.568911E-06
31 98 0.187946E-08
31 99 0.135024E-07
32 94 0.384693E-09
32 96 0.174407E-06
32 97 0.480216E-08
32 98 0.244989E-05
32 99 0.876257E-07
32 100 0.189371E-04
32 101 0.264917E-06
32 102 0.297745E-05
32 103 0.213684E-09
33 99 0.110356E-08
33 100 0.131345E-08
33 101 0.448076E-06
33 102 0.106369E-06
33 103 0.128984E-04
33 104 0.230382E-07
33 105 0.266535E-07
34 102 0.428166E-08
34 103 0.668242E-08
34 104 0.842244E-05
34 105 0.843016E-07
34 106 0.137510E-05
34 107 0.879097E-08
34 108 0.758233E-07
35 105 0.280844E-06
35 106 0.639110E-07
35 107 0.497335E-05
35 108 0.260105E-06
35 109 0.188060E-05
35 110 0.375853E-09
35 111 0.935430E-09
35 112 0.138533E-07
35 113 0.101658E-06
35 114 0.504823E-09
35 115 0.989704E-09
35 116 0.152468E-06
35 117 0.220735E-07
36 114 0.430884E-08
36 116 0.115980E-07
36 117 0.128436E-05
36 118 0.814433E-05
37 117 0.316595E-09
37 118 0.141531E-06
37 119 0.965141E-05
38 119 0.459954E-08
38 120 0.114088E-04
38 121 0.198695E-09
39 120 0.109457E-08
39 121 0.105160E-04
39 122 0.254984E-08
40 122 0.717566E-05
40 123 0.179081E-08
40 124 0.352463E-09
41 123 0.454357E-05
41 124 0.629608E-07
41 125 0.777480E-07
42 124 0.453866E-05
42 125 0.108592E-06
42 126 0.320262E-06
42 127 0.252596E-09
42 128 0.114714E-09
43 125 0.372578E-06
43 126 0.344297E-07
43 127 0.188018E-05
43 128 0.631276E-08
43 129 0.368003E-08
44 126 0.170090E-07
44 127 0.121695E-07
44 128 0.147407E-05
44 129 0.349674E-07
44 130 0.767494E-06
45 128 0.193141E-09
45 129 0.361851E-06
45 130 0.573704E-07
45 131 0.457287E-06
45 132 0.148004E-08
45 133 0.164772E-07
45 134 0.386942E-09
45 135 0.539603E-08
45 136 0.227778E-09
45 137 0.640126E-08
45 138 0.189604E-09
45 139 0.754561E-09
46 132 0.215880E-07
46 134 0.102847E-08
46 136 0.628736E-08
46 137 0.427124E-09
46 138 0.711664E-07
46 139 0.749082E-08
46 140 0.425043E-06
46 141 0.776307E-08
46 142 0.102985E-06
46 143 0.693232E-09
46 144 0.215846E-08
47 141 0.660244E-08
47 142 0.901189E-09
47 143 0.299062E-07
47 144 0.195833E-08
47 145 0.178405E-07
47 146 0.558550E-09
47 147 0.235167E-08
48 144 0.393065E-09
48 146 0.493252E-08
48 147 0.299176E-09
48 148 0.130504E-07
48 149 0.244654E-09
48 150 0.143702E-08
49 149 0.565286E-09
49 151 0.122230E-08
3 0.424066E-02 0.200000E+01 0.885500E+08
0 1 0.850710E+00
1 3 0.388551E-09
2 4 0.141948E-06
2 6 0.126299E-09
3 9 0.166871E-08
4 12 0.340738E-08
5 13 0.246948E-09
5 14 0.129005E-09
6 16 0.140043E-08
6 17 0.885307E-09
26 76 0.593799E-08
26 78 0.747463E-08
27 77 0.283934E-08
27 78 0.115725E-08
27 79 0.153613E-05
27 80 0.236099E-08
27 81 0.171178E-05
27 83 0.334426E-07
28 80 0.575684E-05
28 81 0.242170E-07
28 82 0.114208E-02
28 83 0.133947E-07
28 84 0.608362E-04
28 85 0.335522E-08
28 86 0.543624E-07
28 88 0.202170E-08
29 83 0.258149E-07
29 84 0.107337E-07
29 85 0.261133E-04
29 86 0.167223E-06
29 87 0.108977E-04
29 88 0.432469E-08
29 89 0.196993E-03
29 90 0.997563E-08
29 91 0.565922E-05
29 92 0.127589E-09
29 93 0.277365E-07
30 86 0.731139E-09
30 88 0.613936E-07
30 89 0.984612E-09
30 90 0.261316E-04
30 91 0.845314E-07
30 92 0.324848E-04
30 93 0.656773E-07
30 94 0.632706E-04
30 95 0.335583E-09
30 96 0.252938E-08
31 91 0.529954E-08
31 92 0.394099E-08
31 93 0.681605E-05
31 94 0.104800E-06
31 95 0.315602E-04
31 96 0.231610E-08
31 97 0.566868E-06
31 99 0.135330E-07
32 94 0.450380E-09
32 96 0.178679E-06
32 97 0.955313E-09
32 98 0.252946E-05
32 99 0.770340E-08
32 100 0.191937E-04
32 101 0.825856E-08
32 102 0.297762E-05
33 99 0.128999E-08
33 100 0.146516E-08
33 101 0.616111E-06
33 102 0.539415E-07
33 103 0.128046E-04
33 104 0.865090E-09
33 105 0.266759E-07
34 102 0.899336E-08
34 103 0.331924E-08
34 104 0.850733E-05
34 105 0.462457E-08
34 106 0.137714E-05
34 107 0.199044E-09
34 108 0.758844E-07
35 105 0.308602E-06
35 106 0.470668E-07
35 107 0.520013E-05
35 108 0.458893E-07
35 109 0.185756E-05
35 111 0.159320E-07
35 112 0.729552E-09
35 113 0.101697E-06
35 114 0.135746E-09
35 115 0.128676E-06
35 116 0.231448E-07
35 117 0.220783E-07
36 114 0.480979E-08
36 116 0.921582E-06
36 117 0.373798E-06
36 118 0.814449E-05
37 117 0.888355E-08
37 118 0.132905E-06
37 119 0.965147E-05
38 118 0.360663E-09
38 119 0.423745E-08
38 120 0.114090E-04
39 120 0.109122E-08
39 121 0.105186E-04
40 122 0.717737E-05
40 124 0.352428E-09
41 123 0.460618E-05
41 124 0.358205E-09
41 125 0.777514E-07
42 124 0.464136E-05
42 125 0.589035E-08
42 126 0.320503E-06
42 128 0.114709E-09
43 125 0.408148E-06
43 126 0.567978E-08
43 127 0.187958E-05
43 129 0.368007E-08
44 126 0.258868E-07
44 127 0.348446E-08
44 128 0.150718E-05
44 129 0.167101E-08
44 130 0.767515E-06
45 128 0.176686E-09
45 129 0.403334E-06
45 130 0.162718E-07
45 131 0.458273E-06
45 132 0.196826E-09
45 133 0.167474E-07
45 135 0.563904E-08
45 137 0.655709E-08
45 139 0.751998E-09
46 132 0.216010E-07
46 134 0.107901E-08
46 136 0.673825E-08
46 138 0.784839E-07
46 139 0.220743E-09
46 140 0.432287E-06
46 141 0.427029E-09
46 142 0.103696E-06
46 144 0.211976E-08
47 141 0.696394E-08
47 142 0.585710E-09
47 143 0.315456E-07
47 144 0.425448E-09
47 145 0.181981E-07
47 146 0.136911E-09
47 147 0.226765E-08
48 144 0.442465E-09
48 146 0.553370E-08
48 147 0.138932E-09
48 148 0.128376E-07
48 150 0.144107E-08
49 149 0.624360E-09
49 151 0.123765E-08
The script that I use to create the plots is almost the same as before. The only differences are (1) how data.txt is parsed, (2) setting the limits of the x and y axes, and (3) the variable norm. Here is the script:
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from matplotlib.colors import LogNorm
from matplotlib.patches import FancyBboxPatch
def parse_file(file_name):
    output = {}
    with open(file_name, 'r') as data_file:
        for line in data_file:
            entries = line.strip().split()
            if len(entries) == 4:
                time_step = int(entries[0])
                output[time_step] = {}
            elif len(entries) == 3:
                x = float(entries[0])
                y = float(entries[1])
                value = float(entries[2])
                output[time_step][(x, y)] = value
            else:
                raise RuntimeError('Anomalous line {} in file {}'.format(line, data_file.name))
    return output
def main():
    fig, axes = plt.subplots()
    axes.set_xlim(0,150)
    axes.set_ylim(0,250)
    cmap = cm.plasma
    norm = LogNorm(vmin = pow(10, -10), vmax = pow(10, -2.2))
    smap = cm.ScalarMappable(norm = norm, cmap = cmap)
    smap.set_array([])
    color_bar = fig.colorbar(mappable = smap, ax = axes, orientation = 'vertical', label = 'label')
    data = parse_file(file_name = 'data.txt')
    for time_step, information in data.items():
        cells = []
        for (x,y), value in information.items():
            cell = FancyBboxPatch(xy = (x - 0.5, y - 0.5),
                                  width = 1, height = 1,
                                  boxstyle = 'square,pad=0.',
                                  edgecolor = 'black',
                                  facecolor = smap.to_rgba(value))
            #print(time_step, '\t', x, '\t', y, '\t', value, '\t', smap.to_rgba(value))
            axes.add_patch(cell)
            cells.append(cell)
        fig.savefig('time-step_{}.png'.format(time_step))
        for cell in cells:
            cell.remove()
if __name__ == '__main__':
    main()
Now all of the patches are black. Here is one of the plots that is created:
I do not see anything obviously wrong using the print statement (which is commented out in the script):
print(time_step, '\t', x, '\t', y, '\t', value, '\t', smap.to_rgba(value))
Why are all the FancyBboxPatches black instead of the color I have chosen with the ScalarMappable (and how can I make them be the color I have chosen with the ScalarMappable)?
The patches do not appear to be actually black. My guess is that they are simply too small, so that the edge (which is black) covers the whole area of each patch. You can use a thinner edge, no edge at all, or set the edgecolor to the mapped color as well. In general, you can also use a simple patch such as Rectangle instead of FancyBboxPatch.
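A minimal sketch of that suggestion, assuming the same axis limits and LogNorm range as the question: Rectangle patches drawn with edgecolor='none', so only the mapped fill color is visible. The three sample points are taken from the question's data; the output file name is hypothetical, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from matplotlib.colors import LogNorm
from matplotlib.patches import Rectangle

fig, axes = plt.subplots()
axes.set_xlim(0, 150)
axes.set_ylim(0, 250)
norm = LogNorm(vmin=1e-10, vmax=10 ** -2.2)
smap = cm.ScalarMappable(norm=norm, cmap='plasma')

# A few (x, y, value) triples from the question's data file
samples = [(28, 82, 0.114196E-02), (29, 89, 0.196533E-03), (35, 107, 0.497335E-05)]
patches = []
for x, y, value in samples:
    cell = Rectangle((x - 0.5, y - 0.5), 1, 1,
                     facecolor=smap.to_rgba(value),
                     edgecolor='none')  # no outline, so the fill is not covered
    axes.add_patch(cell)
    patches.append(cell)
fig.savefig('cells_without_edges.png')  # hypothetical output name
```

Passing edgecolor=smap.to_rgba(value) or a small linewidth instead would keep an outline without hiding the fill.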

python pandas: Grouping dataframe by ranges

I have a DataFrame object with date and calltime columns.
I was trying to build a histogram based on the second column, e.g.:
df.groupby('calltime').head(10).plot(kind='hist', y='calltime')
Got the following:
The thing is that I want to get more detail for the first bar. The range 0-2500 is huge, and all the data is hidden there. Is it possible to group by smaller ranges instead, e.g. by 50 or something like that?
UPD
date calltime
0 1491928756414930 4643
1 1491928756419607 166
2 1491928756419790 120
3 1491928756419927 142
4 1491928756420083 121
5 1491928756420217 109
6 1491928756420409 52
7 1491928756420476 105
8 1491928756420605 35
9 1491928756420654 120
10 1491928756420787 105
11 1491928756420907 93
12 1491928756421013 37
13 1491928756421062 112
14 1491928756421187 41
15 1491928756421240 122
16 1491928756421375 28
17 1491928756421416 158
18 1491928756421587 65
19 1491928756421667 108
20 1491928756421790 55
21 1491928756421858 145
22 1491928756422018 37
23 1491928756422068 63
24 1491928756422145 57
25 1491928756422214 43
26 1491928756422270 73
27 1491928756422357 90
28 1491928756422460 72
29 1491928756422546 77
... ... ...
9845 1491928759997328 670
9846 1491928759998255 372
9848 1491928759999116 659
9849 1491928759999897 369
9850 1491928760000380 746
9851 1491928760001245 823
9852 1491928760002189 634
9853 1491928760002869 335
9856 1491928760003929 4162
9865 1491928760009368 531
Use bins:
import numpy as np
import pandas as pd
s = pd.Series(np.abs(np.random.randn(100)) ** 3 * 2000)
s.hist(bins=20)
Or you can use pd.cut to produce your own custom bins:
pd.cut(
    s, [-np.inf] + [100 * i for i in range(10)] + [np.inf]
).value_counts(sort=False).plot.bar()
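Applied to the question's own numbers, the pd.cut approach might look like the sketch below. It uses the first ten calltime values from the sample shown in the question and bins of width 50 up to 500, with one open-ended bin for everything larger. The cut-off at 500 is an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

# First ten calltime values from the sample in the question
calltime = pd.Series([4643, 166, 120, 142, 121, 109, 52, 105, 35, 120])

# Edges 0, 50, 100, ..., 500, plus an open-ended last bin for large outliers
edges = list(range(0, 501, 50)) + [np.inf]
binned = pd.cut(calltime, edges)

# Count how many values fall into each bin, in bin order
counts = binned.value_counts(sort=False)
print(counts)
```

Calling counts.plot.bar() then draws one bar per 50-wide range, so the detail inside the old 0-2500 bar becomes visible.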
