How to properly index a NumPy array with stored images? - python

I am trying to understand the output of cv2.imread.
I have loaded PNG images into NumPy arrays as grayscale, and I think I have succeeded in doing so. I understand that imagesList[0] gives me a NumPy array of the first image, but I don't understand what the numbers within imagesList[0][1] correspond to. In the same vein, what would the numbers correspond to in an (m, n, 3) image?
# cv2 is openCV, an image processing package
import numpy as np
import cv2
# start and end slice indices
startSlice = 753
endSlice = 823
# each image array will be stored in the list.
# imagesList[0] = first slice, imagesList[-1] = last slice
imagesList = []
# reading the image as a greyscale array, and storing into imagesList
for i in range(startSlice, endSlice + 1):
    fileName = "/somefilelocation" + '{0:04}'.format(i) + '.png'
    im = cv2.imread(fileName, 0)
    imagesList.append(im)
    print(fileName)

In your for loop, you read in several images with cv2.imread, all of which are stored as NumPy arrays in your list imagesList:
First level of indexing in imagesList, e.g. imagesList[0] will
give you the corresponding (whole) image (NumPy array).
Second level of indexing, e.g. imagesList[0][1] will
give you the corresponding row in that image (NumPy array).
Third level of indexing, e.g. imagesList[0][1][2] will
give you the corresponding row and column (i.e. an actual pixel) in that image (NumPy array).
Fourth level of indexing, e.g. imagesList[0][1][2][0] will
give you the corresponding color value (blue, green, or red) at the corresponding row and column (pixel) in that image (NumPy array). Attention: a fourth index is only applicable to color images!
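As a small aside (standard NumPy behavior, not something the answer above spells out): because each stored image is a NumPy array, chained indexing is equivalent to a single tuple index, which is the more idiomatic form:

```python
import numpy as np

# Tiny stand-in for one (m, n, 3) color image: 2 rows, 4 columns, 3 channels.
img = np.arange(24, dtype=np.uint8).reshape(2, 4, 3)
imagesList = [img]

# Chained indexing and tuple indexing address the same element:
# row 1, column 2, channel 0 (blue, in OpenCV's BGR order).
assert imagesList[0][1][2][0] == imagesList[0][1, 2, 0]
print(imagesList[0][1, 2, 0])  # 18
```

Tuple indexing (arr[i, j, k]) also avoids building the intermediate row and pixel arrays that chained indexing creates.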
Let's have a small test:
import cv2
# Read image.
image = cv2.imread('ithMo.png')
# Store image as color and grayscale in list.
imagesList = [image, cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)]
# Access first image (color)
print('whole image (BGR values): \n\n', imagesList[0], '\n')
print('second row of image (BGR values): \n\n', imagesList[0][1], '\n')
print('second row, third column of image (BGR value): \n\n', imagesList[0][1][2], '\n')
print('second row, third column of image, first channel (B value): \n\n', imagesList[0][1][2][0], '\n')
# Access second image (gray)
print('whole image (gray values): \n\n', imagesList[1], '\n')
print('second row of image (gray values): \n\n', imagesList[1][1], '\n')
print('second row, third column of image (gray value): \n\n', imagesList[1][1][2], '\n')
This image
is stored as color and grayscale in imagesList, and the aforementioned levels of indexing are tested.
I tried to shorten the output as much as possible, and marked the important parts:
whole image (BGR values):
[[[239 124 63]
[239 123 65]
[240 124 64]
...
[[238 128 74] <<< Here begins the second row
[239 122 66]
[239 125 68]
...
[[244 200 173]
[239 134 86]
[240 132 80]
...
second row of image (BGR values):
[[238 128 74]
[239 122 66]
[239 125 68] <<< Here is the third column
...
second row, third column of image (BGR value):
[239 125 68]
second row, third column of image, first channel (B value):
239
whole image (gray values):
[[119 119 119 ... 231 230 228]
[124 119 121 ... 229 228 228] <<< Here begins the second row
[197 132 129 ... 227 228 230]
...
[ 49 56 54 ... 52 53 54]
[ 45 48 55 ... 51 54 50]
[ 57 56 55 ... 48 48 46]]
second row of image (gray values):
[124 119 121 126 143 119 120 123 133 128 122 117 117 115 116 120 157 171
162 178 173 177 173 137 144 158 124 116 117 117 116 123 131 132 122 127
141 136 127 126 130 148 168 162 163 137 132 124 118 121 120 121 120 118
119 119 121 121 125 125 127 129 127 128 130 132 132 130 129 129 135 132
134 135 135 133 136 143 149 129 131 132 132 135 138 139 139 140 148 154
157 185 211 222 224 223 221 215 209 208 212 221 230 235 237 238 235 230
211 166 159 164 169 173 179 186 190 197 211 217 211 212 211 212 217 215
209 201 202 194 193 188 184 183 185 188 182 183 173 167 159 152 147 142
139 138 137 137 132 125 123 124 124 122 121 122 126 130 132 135 141 140
145 148 151 148 149 151 160 159 158 154 157 160 158 166 166 165 162 156
165 178 180 169 172 169 173 193 188 181 172 165 164 177 139 157 176 177
157 131 132 131 145 146 140 127 132 150 195 205 224 239 242 243 242 242
242 242 241 243 243 241 244 244 244 245 246 245 245 245 245 245 244 242
242 243 243 243 243 244 241 241 242 238 226 216 229 234 238 235 239 240
240 240 240 240 242 242 242 243 242 242 242 241 242 241 241 240 240 241
241 240 241 242 242 242 242 242 242 242 240 240 239 238 236 235 234 234
234 235 236 236 235 233 231 232 230 229 228 228]
second row, third column of image (gray value):
121
Hope that helps!

Related

Python Print List Elements with Defined Range

I have a list that is a column of numbers in a df called "doylist", for day-of-year list. I need to figure out how to print a user-defined range of rows, in ascending order, from the doylist df. For example, let's say I need to print the last daysback = 60 days from today's day of year through daysforward = 19 days from today's day of year. So, if today's day of year is 47, my new list would range from day of year 352 to day of year 67.
day_of_year = (today - datetime.datetime(today.year, 1, 1)).days + 1
doylist =
doylist
Out[106]:
     dyofyr
0         1
1         2
2         3
3         4
4         5
..      ...
360     361
361     362
362     363
363     364
364     365

[365 rows x 1 columns]
daysback = doylist.iloc[day_of_year-61] # 60 days back from today
daysforward = doylist.iloc[day_of_year+19] # 20 days forward from today
I need my final df or list to look like this:
final_list = [352, 353, 354, ..., 364, 365, 1, 2, 3, ..., 66, 67]
I have tried variations of this, but I get the following error when using it with the df called "doylist". Thank you!
finallist = list(range(doylist.iloc[day_of_year-61],doylist.iloc[day_of_year+19]))
Traceback (most recent call last):
Cell In[113], line 1
finallist = list(range(doylist.iloc[day_of_year-61],doylist.iloc[day_of_year+19]))
TypeError: 'Series' object cannot be interpreted as an integer
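One hedged reading of that traceback, with a reconstructed doylist (the real frame isn't shown): .iloc on a DataFrame returns a Series for a row, and range() only accepts plain integers, so selecting the column first and converting to int removes the TypeError (though not the wrap-around problem):

```python
import pandas as pd

# Hypothetical reconstruction of the asker's one-column frame.
doylist = pd.DataFrame({'dyofyr': range(1, 366)})
day_of_year = 47

# doylist.iloc[row] returns a Series; range() needs plain ints.
start = int(doylist['dyofyr'].iloc[day_of_year - 61])
end = int(doylist['dyofyr'].iloc[day_of_year + 19])
print(start, end)  # 352 67

# Note: range(352, 67) is still empty, so the wrap around the year end
# needs separate handling, as the answers below do.
```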
I can't understand why you are using a dataframe to do this. It could be done with a simple list and the modulus operator.
def days_between_forward_back(day_of_year, days_since, days_forward):
    doylist = [x + 1 for x in range(365)]
    lower_index = (day_of_year - days_since - 1) % 365
    upper_index = day_of_year + days_forward
    assert upper_index < 365
    if lower_index > upper_index:
        result = doylist[lower_index:]
        result.extend(doylist[:upper_index])
        return result
    else:
        return doylist[lower_index:upper_index]
days = days_between_forward_back(47, 60, 20)
print(f"For day of year 47, 60 days before, 20 days ahead, days are {days}")
days = days_between_forward_back(300, 61, 10)
print(f"For day of year 300, 61 days before, 10 days ahead, days are {days}")
Handling the case where both days_since and days_forward will move us to another year is left as an exercise for the asker.
I think this will help you:
import datetime

this_date = datetime.datetime.now()
days_to_go_back = 80
days_in_each_month = {1: 31, 2: 28, 3: 31, 4: 30, 5: 31, 6: 30,
                      7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31}

# Days elapsed in this year, counted to the end of the current month.
days_in_this_year = 0
for i in range(1, this_date.month + 1):
    days_in_this_year += days_in_each_month.get(i)

if days_to_go_back < days_in_this_year:
    # The whole window fits inside the current year.
    for i in range(days_in_this_year - days_to_go_back, days_in_this_year + 1):
        print(i)
else:
    # Part of the window falls into the previous year.
    rest_in_the_last_year = days_to_go_back - days_in_this_year
    for i in range(365 - rest_in_the_last_year, 366):
        print(i)
    for i in range(days_in_this_year + 1):
        print(i)
And yes, you can improve the code to use it anywhere.
It seems like you're getting hung up converting back and forth between data types (int, datetime, etc.). This kind of error is much easier to track down and fix if you use Python's type hints to make sure you're being careful with data types. To that end, it is also useful to stay in datetime as much as possible, to take better advantage of the library (so you don't have to keep track of things like leap years yourself). I wrote a few functions to help you convert:
from datetime import datetime, timedelta

def dt_from_doy(year: int, doy: int) -> datetime:
    # Useful if you need to use doy from your dataframe to get a datetime.
    # If you can convert the input to be a datetime in the first place,
    # that might be even better (fewer conversions of data type).
    return datetime.strptime("{:04d}-{:03d}".format(year, doy), "%Y-%j")

def doy_from_dt(dt: datetime) -> int:
    # Used in the example below.
    return int(dt.strftime("%j"))

# Example
today = datetime(2023, 2, 16)
list_of_dt = [today + timedelta(days=x) for x in range(-20, 20)]
list_of_doy = [doy_from_dt(dt) for dt in list_of_dt]
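A possible usage sketch (my addition, building on the helper idea above; self-contained) that reproduces the asker's wrap-around example for day of year 47:

```python
from datetime import datetime, timedelta

def doy_from_dt(dt: datetime) -> int:
    # Day of year as an int, e.g. 2023-02-16 -> 47.
    return int(dt.strftime("%j"))

# 60 days back through 20 days forward, wrapping across the year end.
today = datetime(2023, 2, 16)  # day of year 47
window = [today + timedelta(days=x) for x in range(-60, 21)]
doys = [doy_from_dt(dt) for dt in window]

print(doys[0], doys[-1])  # 352 67
```

Because the arithmetic is done on datetime objects, the wrap from 365 back to 1 falls out for free, leap years included.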

finding the size of an array in python

for i in set(data['MovementNumber'].values):
    data2 = data.loc[data['MovementNumber'] == i]
    x = len(data2)
    print(x)
This for loop basically builds a cell array for the 83 trials I have. The issue I'm having is that when I use len on the cell array, I get the length of each individual array, and I get this array:
[957, 280, 305, 217, 204, 321, 228, 291, ..., 242, 174, 261, 460]
I want the length of this array, but when I try to use len(x) I get the following error:
object of type 'int' has no len()
Can someone help me with this? Thanks!
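A minimal sketch of one likely fix (the DataFrame here is a hypothetical stand-in, since the original data isn't shown): x is overwritten on every pass through the loop, so after the loop it is a single int; collecting each length in a list makes len() meaningful again:

```python
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame.
data = pd.DataFrame({'MovementNumber': [1, 1, 2, 2, 2, 3]})

# Append each trial's length instead of reassigning a scalar.
lengths = []
for i in set(data['MovementNumber'].values):
    data2 = data.loc[data['MovementNumber'] == i]
    lengths.append(len(data2))

print(sorted(lengths))  # [1, 2, 3]
print(len(lengths))     # 3  (the number of trials)
```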

Python tuples TypeError: 'bool' object is not subscriptable

I wrote the code below for a python image graph. I have a bug that I cannot resolve.
Can someone take a look for me? Thanks
Below is the error output I am getting from the console. I think it has something to do with tuples. The code in the main should not be touched.
routes is a tuple of a boolean list of lists and an integer elevation number.
Traceback (most recent call last):
File "c:/Users/James/Desktop/mountain.py", line 91, in <module>
show_image(data, [min_route[0]])
File "c:/Users/James/Desktop/mountain.py", line 16, in show_image
if any(route[i][j] for route in routes):
File "c:/Users/James/Desktop/mountain.py", line 16, in <genexpr>
if any(route[i][j] for route in routes):
TypeError: 'bool' object is not subscriptable
mountain.py
import math, random
import numpy as np
import skimage.io as io

# Please do not alter this function.
def show_image(data, routes=[]):
    '''
    Given a list of lists of integers "data",
    and an optional list of boolean list of lists "routes",
    show the data as an image and overlay the routes on the image in red.
    '''
    image_data = [x[:] for x in data]
    for i in range(len(image_data)):
        for j in range(len(image_data[i])):
            image_data[i][j] = [image_data[i][j]] * 3
            if any(route[i][j] for route in routes):
                image_data[i][j] = [255, 0, 0]
    io.imshow(np.array(image_data, dtype=np.uint8))
    io.show()

def load_dat_file(filename):
    data = []
    data = np.loadtxt(filename)
    data_temp = []
    try:
        for item in data:
            item = str(item).replace('.', '').replace('[', '').replace(']', '').split()
            data_temp.append(item)
        data = data_temp
    except:
        print("error")
    return data

def find_elevation_route_for_starting_row(grid, starting_row):
    boolean_list = []
    elevation_change = 0
    for index_grid in range(len(grid)):
        boolean_route = []
        for index_grid_item in range(len(grid[index_grid])):
            boolean_grid_item = False
            if (index_grid == starting_row):
                boolean_grid_item = True
                if (index_grid_item + 1 < len(grid[index_grid])):
                    absolute_value = abs(int(grid[index_grid][index_grid_item + 1]) - int(grid[index_grid][index_grid_item]))
                    elevation_change = elevation_change + absolute_value
            boolean_route.append(boolean_grid_item)
        boolean_list.append(boolean_route)
    return (boolean_list, elevation_change)

def get_all_elevation_routes(grid):
    boolean_list_grid = []
    elevation_change_grid = []
    for index_grid in range(len(grid)):
        current_elevation_route = find_elevation_route_for_starting_row(grid, index_grid)
        boolean_list_grid.append(current_elevation_route[0])
        elevation_change_grid.append(current_elevation_route[1])
    return tuple(boolean_list_grid), tuple(elevation_change_grid)

def get_min_elevation_route(routes):
    routes_boolean_list = routes[0]
    routes_elevation_change_list = routes[1]
    lowest_route_index = 0
    for index_routes_elevation_change_list in range(len(routes_elevation_change_list)):
        if (routes_elevation_change_list[index_routes_elevation_change_list] < routes_elevation_change_list[lowest_route_index]):
            lowest_route_index = index_routes_elevation_change_list
    return tuple(routes_boolean_list[lowest_route_index])

# Please do not alter anything below this line.
if __name__ == '__main__':
    data = load_dat_file("mountroyal.dat")
    show_image(data, [])
    routes = get_all_elevation_routes(data)
    min_route = get_min_elevation_route(routes)
    assert(isinstance(min_route, tuple))
    show_image(data, [min_route[0]])
    show_image(data, [route for route, change in routes])
This is a sample of the .dat file. It should be a lot longer, but the post has a limit on allowed characters; the full .dat file can be found here
mountroyal.dat
189 203 203 203 189 189 203 196 189 190 185 186 187 180 187 193 186 186 187 180 179 186 193 193 187 180 187 193 193 193 193 193 193 185 174 174 177 186 186 186 193 200 200 193 187 180 187 200 205 205 205 205 205 205 200 200 205 205 200 193 200 200 199 205 199 205 215 219 219 218 218 214 218 217 204 204 211 211 206 200 206 211 212 214 218 217 217 217 211 211 209 212 216 216 205 200 212 216 216 216 210 202 212 220 214 207 200 197 187 174 184 184 184 191 185 185 197 197 185 186 193 192 197 203 203 195 191 197 197 191 185 199 211 211 211 211 211 217 218 209 203 203 203 215 215 203 203 203 197 184 190 203 203 203 203 189 189 203 197 184 184 177 170 178 178 178 178 178 178 178 171 164 163 170 170 163 157 166 166 149 166 166 149
189 203 203 203 195 193 202 202 193 187 189 190 185 186 193 187 180 187 186 179 180 187 193 193 186 186 193 193 193 193 193 193 193 192 182 172 174 178 180 187 193 193 187 180 187 193 200 205 205 205 205 205 205 200 194 200 205 205 205 204 204 199 205 210 209 224 228 223 223 223 217 217 218 209 202 206 211 211 204 204 211 211 211 217 223 217 217 217 216 218 216 216 216 216 207 205 214 216 216 216 214 209 213 225 216 207 202 190 187 181 191 187 182 185 185 197 197 185 185 197 203 203 209 212 211 204 197 197 191 185 191 204 211 211 211 211 211 212 214 212 206 202 209 219 215 203 203 197 185 185 197 203 203 203 203 189 189 203 190 177 178 170 170 178 178 178 178 178 178 178 170 170 170 163 157 149 149 166 166 149 166 166 149
189 203 203 203 202 199 204 206 187 187 193 187 189 196 197 185 179 180 180 179 186 193 193 187 180 187 193 193 193 193 193 193 193 193 185 174 172 174 178 180 180 180 180 187 200 200 200 205 205 205 205 205 205 200 205 211 205 205 211 215 212 211 218 219 223 230 227 223 223 223 217 212 214 212 211 211 211 211 204 204 211 211 211 217 223 221 220 216 217 221 218 216 216 216 214 209 213 219 219 219 219 213 213 222 211 205 200 192 190 184 190 187 183 187 196 203 190 184 197 203 209 212 217 217 211 204 197 191 185 191 197 204 211 211 211 211 211 211 217 217 204 204 217 218 209 203 203 190 184 197 203 203 203 203 203 189 189 197 184 176 177 177 176 177 178 178 178 178 178 178 170 163 157 149 149 149 149 166 166 149 166 166 157
189 203 203 203 203 202 206 199 185 193 193 191 191 195 190 177 178 186 187 180 187 193 193 186 186 193 193 193 193 193 193 193 193 193 192 182 172 172 174 178 180 180 187 200 205 200 205 211 211 211 211 211 211 210 215 215 210 220 229 231 224 216 218 225 234 229 221 223 223 218 214 212 217 217 211 211 211 211 204 204 211 211 211 221 225 225 224 220 224 226 222 216 216 216 216 214 216 219 219 219 223 214 210 219 213 202 197 202 192 184 192 192 192 194 202 204 193 199 211 211 217 217 212 214 212 209 200 187 195 200 200 212 216 216 216 216 216 212 211 209 202 206 212 209 203 203 197 184 190 203 203 203 203 203 203 189 189 190 177 177 176 177 170 170 178 178 178 178 178 171 164 163 162 162 162 155 148 166 166 157 172 165 165
190 203 203 203 203 203 203 195 191 197 197 191 185 193 187 174 184 191 185 186 193 193 187 180 187 193 193 193 193 193 193 193 193 193 193 185 174 179 183 177 184 199 210 215 215 215 224 223 223 223 218 215 215 220 219 219 223 234 237 237 228 216 220 229 236 228 218 225 225 216 221 224 218 216 216 218 216 212 207 212 220 222 223 231 231 231 231 229 223 223 225 219 219 219 219 219 219 219 219 219 228 221 212 218 208 196 199 206 196 190 194 198 199 196 199 201 203 213 219 219 222 216 209 221 224 216 202 192 209 206 206 223 223 221 217 213 214 205 202 211 206 202 203 203 203 197 185 185 197 203 203 203 203 203 203 189 189 189 174 177 177 176 169 170 171 172 179 178 178 170 170 170 162 162 162 146 147 172 165 165 172 164 172
197 203 203 203 203 203 203 202 193 193 193 179 187 193 179 179 193 195 189 196 203 197 185 186 193 193 193 193 193 193 193 193 193 193 193 192 182 193 205 187 203 229 226 224 224 225 234 231 229 226 222 222 224 230 229 231 234 236 237 237 228 216 226 232 232 229 225 231 228 223 231 228 219 219 225 225 219 213 213 223 231 231 229 228 228 227 226 226 214 219 228 223 221 221 221 221 221 221 221 221 229 229 221 212 202 199 195 200 199 192 196 208 209 200 196 196 200 212 218 218 217 210 213 227 226 216 207 206 221 215 212 225 225 220 212 199 207 206 206 216 202 202 211 206 202 196 189 196 203 203 203 203 203 203 197 184 184 184 174 174 177 170 162 163 164 172 180 179 178 170 163 157 149 149 149 140 155 171 164 172 165 165 172 ```
If you try printing route, you will get either True or False, which are both of bool type and are not subscriptable.
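A minimal sketch of that failure mode (my reading of the code above, with made-up values): min_route[0] is a single 1-D row of booleans, so inside show_image each route[i] is a plain bool and route[i][j] fails:

```python
# A route must be a 2-D grid of booleans; passing a 1-D row instead
# makes route[i] a plain bool, which cannot be indexed again.
row = [True, False, True]               # 1-D: what min_route[0] provides
grid = [[True, False], [False, True]]   # 2-D: what show_image expects

try:
    row[0][1]                           # True[1] -> TypeError
except TypeError as exc:
    print(exc)                          # 'bool' object is not subscriptable

assert grid[0][1] is False              # the 2-D shape indexes fine
```

So the fix is to pass a whole grid (e.g. show_image(data, [min_route]) if min_route were kept as a grid) rather than a single row.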

Probabilistic neural network

I am implementing a probabilistic neural network on my dataset. Below is my code, which I tested on the iris dataset without error, but when I applied it to my dataset I got the following error:
KeyError Traceback (most recent call last)
<ipython-input-30-230e6aa7ae95> in <module>()
13 for i, (train, test) in enumerate(skfold, start=1):
14 pnn_network = PNN(std=std, step=0.2, verbose=False, batch_size=2)
---> 15 pnn_network.train(input_dataset_data[train], input_dataset_target[train])
16 predictions = pnn_network.predict(input_dataset_data[test])
17 print("Positive in predictions:", 1 in predictions)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2677 if isinstance(key, (Series, np.ndarray, Index, list)):
2678 # either boolean or fancy integer index
-> 2679 return self._getitem_array(key)
2680 elif isinstance(key, DataFrame):
2681 return self._getitem_frame(key)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_array(self, key)
2721 return self._take(indexer, axis=0)
2722 else:
-> 2723 indexer = self.loc._convert_to_indexer(key, axis=1)
2724 return self._take(indexer, axis=1)
2725
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _convert_to_indexer(self, obj, axis, is_setter)
1325 if mask.any():
1326 raise KeyError('{mask} not in index'
-> 1327 .format(mask=objarr[mask]))
1328
1329 return com._values_from_object(indexer)
KeyError: '[ 0 1 2 4 5 6 7 8 9 10 11 12 15 16 17 18 19 20\n 21 22 23 25 26 27 28 29 30 31 32 33 34 35 36 38 39 40\n 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58\n 59 60 61 62 63 64 65 66 67 68 69 71 72 73 74 75 76 77\n 78 79 80 82 83 84 85 86 87 88 90 92 93 94 95 96 97 98\n 99 100 101 102 104 105 106 108 109 110 112 114 115 116 117 118 119 120\n 121 122 123 125 126 127 128 131 132 133 134 136 137 138 139 140 141 142\n 143 144 145 146 147 148 149 151 153 154 155 156 157 159 160 161 162 163\n 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 181 182 183\n 185 186 187 188 189 190 192 193 194 195 196 197 198 199 200 201 202 204\n 205 206 207 208 209 211 212 213 214 215 216 217 218 219 220 221 222 223\n 224 225 226 227 228 229 230 231 232 233 234 236 237 238 239 240 241 242\n 243 244 245 246 247 248 249 250 251 252 253 255 257 258 259 260 261 262\n 263 264 265 267 269 270 271 272 273 274 275 276 277 278 279 280 281 282\n 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 300 301\n 302 303 304 305 306 307 308 309 310 311 312 313 314 315 317 318 320 321\n 322 323 324 325 326 327] not in index'
The code on iris example is below:
from sklearn import datasets

iris = datasets.load_iris()
input_dataset_data = iris.data
input_dataset_target = iris.target
print(input_dataset_data.shape)
print(input_dataset_target.shape)

kfold_number = 10
skfold = StratifiedKFold(input_dataset_target, kfold_number, shuffle=True)
# print("> Start classify input_dataset dataset")
for std in [0.2, 0.4, 0.6, 0.8, 1]:
    average_results = []
    for i, (train, test) in enumerate(skfold, start=1):
        pnn_network = PNN(std=std, step=0.2, verbose=False, batch_size=2)
        pnn_network.train(input_dataset_data[train], input_dataset_target[train])
        predictions = pnn_network.predict(input_dataset_data[test])
        print("Positive in predictions:", 1 in predictions)
        average_results.append(np.sum(predictions == input_dataset_target[test]) / float(len(predictions)))
    print(std, np.average(average_results))
Below are the shapes of my dataset:
X.shape
(328, 13)
Y.shape
(328,)
You need to access the dataframe by position with .iloc:
pnn_network.train(input_dataset_data.iloc[train], input_dataset_target.iloc[train])
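A hedged sketch of why the plain [] fails here (a hypothetical small frame standing in for the asker's (328, 13) X): integer arrays given to df[...] are treated as column labels, while .iloc selects rows by position, which is what the KFold indices mean:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the asker's X.
df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=['a', 'b', 'c'])
train = np.array([0, 2])      # positional row indices from StratifiedKFold

try:
    df[train]                 # plain [] looks these up as column labels
except KeyError:
    print("KeyError: not in index, as in the question")

rows = df.iloc[train]         # .iloc selects rows by position
assert rows.shape == (2, 3)
```

The iris example works because iris.data is a NumPy array, where integer-array indexing selects rows directly.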

ValueError: Data must be positive (boxcox scipy)

I'm trying to transform my dataset to a normal distribution.
0      8.298511e-03
1      3.055319e-01
2      6.938647e-02
3      2.904091e-02
4      7.422441e-02
           ...
134    0.000000e+00 <<< the exact zero
           ...
401    2.451973e-02
402    7.567017e-02
403    4.837843e-03
404    6.477297e-03
405    7.664675e-02
Name: value, dtype: float64
This is the code I used for transforming dataset:
from scipy import stats
x,_ = stats.boxcox(df)
I get this error:
if any(x <= 0):
-> 1031 raise ValueError("Data must be positive.")
1032
1033 if lmbda is not None: # single transformation
ValueError: Data must be positive
Is it because my values are too small that it's producing an error? Not sure what I'm doing wrong. New to using boxcox, could be using it incorrectly in this example. Open to suggestions and alternatives. Thanks!
Your data contains the value 0 (at index 134). When boxcox says the data must be positive, it means strictly positive.
What is the meaning of your data? Does 0 make sense? Is that 0 actually a very small number that was rounded down to 0?
You could simply discard that 0. Alternatively, you could do something like the following. (This amounts to temporarily discarding the 0, and then using -1/λ for the transformed value of 0, where λ is the Box-Cox transformation parameter.)
First, create some data that contains one 0 (all other values are positive):
In [13]: np.random.seed(8675309)
In [14]: data = np.random.gamma(1, 1, size=405)
In [15]: data[100] = 0
(In your code, you would replace that with, say, data = df.values.)
Copy the strictly positive data to posdata:
In [16]: posdata = data[data > 0]
Find the optimal Box-Cox transformation, and verify that λ is positive. This work-around doesn't work if λ ≤ 0.
In [17]: bcdata, lam = boxcox(posdata)
In [18]: lam
Out[18]: 0.244049919975582
Make a new array to hold that result, along with the limiting value of the transform of 0 (which is -1/λ):
In [19]: x = np.empty_like(data)
In [20]: x[data > 0] = bcdata
In [21]: x[data == 0] = -1/lam
The following plot shows the histograms of data and x.
Rather than plain boxcox, you can use boxcox1p. It adds 1 to x, so there won't be any 0 values:
from scipy.special import boxcox1p
boxcox1p(x, lmbda)
For more info check out the docs at https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.boxcox1p.html
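A minimal sketch of that approach (the values echo the question's data, including the zero; the lambda is illustrative, not fitted):

```python
import numpy as np
from scipy.special import boxcox1p

# Sample values from the question's series, with the zero from index 134.
data = np.array([8.298511e-03, 3.055319e-01, 6.938647e-02, 0.0, 2.904091e-02])

lam = 0.25                        # illustrative lambda; in practice, fit it
transformed = boxcox1p(data, lam)

# boxcox1p(0, lam) = ((1 + 0)**lam - 1) / lam = 0, so the zero no longer
# triggers "Data must be positive".
assert transformed[3] == 0.0
assert np.isfinite(transformed).all()
```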
Is the data you are sending to boxcox a 1-dimensional ndarray?
A second way could be adding a shift parameter: add the shift (see details at the link) to all of the ndarray elements before sending them to boxcox, and subtract the shift from the resulting array elements. If I have understood the boxcox algorithm correctly, that could be a solution in your case, too.
https://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.stats.boxcox.html
