Python Print List Elements with Defined Range - python

I have a list that is a column of numbers in a df called "doylist" for day of year list. I need to figure out how to print a range of user-defined rows in ascending order from the doylist df. For example, let's say I need to print the last daysback=60 days in the list from today's day of year to daysforward = 19 days from today's day of year. So, if today's day of year is 47, then my new list would look like this ranging from day of year 352 to day of year 67.
day_of_year =
day_of_year = (today - datetime.datetime(today.year, 1, 1)).days + 1
doylist =
doylist
Out[106]:
dyofyr
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11
11 12
12 13
13 14
14 15
15 16
16 17
17 18
18 19
19 20
20 21
21 22
22 23
23 24
24 25
25 26
26 27
27 28
28 29
29 30
30 31
31 32
32 33
33 34
34 35
35 36
36 37
37 38
38 39
39 40
40 41
41 42
42 43
43 44
44 45
45 46
46 47
47 48
48 49
49 50
50 51
51 52
52 53
53 54
54 55
55 56
56 57
57 58
58 59
59 60
60 61
61 62
62 63
63 64
64 65
65 66
66 67
67 68
68 69
69 70
70 71
71 72
72 73
73 74
74 75
75 76
76 77
77 78
78 79
79 80
80 81
81 82
82 83
83 84
84 85
85 86
86 87
87 88
88 89
89 90
90 91
91 92
92 93
93 94
94 95
95 96
96 97
97 98
98 99
99 100
100 101
101 102
102 103
103 104
104 105
105 106
106 107
107 108
108 109
109 110
110 111
111 112
112 113
113 114
114 115
115 116
116 117
117 118
118 119
119 120
120 121
121 122
122 123
123 124
124 125
125 126
126 127
127 128
128 129
129 130
130 131
131 132
132 133
133 134
134 135
135 136
136 137
137 138
138 139
139 140
140 141
141 142
142 143
143 144
144 145
145 146
146 147
147 148
148 149
149 150
150 151
151 152
152 153
153 154
154 155
155 156
156 157
157 158
158 159
159 160
160 161
161 162
162 163
163 164
164 165
165 166
166 167
167 168
168 169
169 170
170 171
171 172
172 173
173 174
174 175
175 176
176 177
177 178
178 179
179 180
180 181
181 182
182 183
183 184
184 185
185 186
186 187
187 188
188 189
189 190
190 191
191 192
192 193
193 194
194 195
195 196
196 197
197 198
198 199
199 200
200 201
201 202
202 203
203 204
204 205
205 206
206 207
207 208
208 209
209 210
210 211
211 212
212 213
213 214
214 215
215 216
216 217
217 218
218 219
219 220
220 221
221 222
222 223
223 224
224 225
225 226
226 227
227 228
228 229
229 230
230 231
231 232
232 233
233 234
234 235
235 236
236 237
237 238
238 239
239 240
240 241
241 242
242 243
243 244
244 245
245 246
246 247
247 248
248 249
249 250
250 251
251 252
252 253
253 254
254 255
255 256
256 257
257 258
258 259
259 260
260 261
261 262
262 263
263 264
264 265
265 266
266 267
267 268
268 269
269 270
270 271
271 272
272 273
273 274
274 275
275 276
276 277
277 278
278 279
279 280
280 281
281 282
282 283
283 284
284 285
285 286
286 287
287 288
288 289
289 290
290 291
291 292
292 293
293 294
294 295
295 296
296 297
297 298
298 299
299 300
300 301
301 302
302 303
303 304
304 305
305 306
306 307
307 308
308 309
309 310
310 311
311 312
312 313
313 314
314 315
315 316
316 317
317 318
318 319
319 320
320 321
321 322
322 323
323 324
324 325
325 326
326 327
327 328
328 329
329 330
330 331
331 332
332 333
333 334
334 335
335 336
336 337
337 338
338 339
339 340
340 341
341 342
342 343
343 344
344 345
345 346
346 347
347 348
348 349
349 350
350 351
351 352
352 353
353 354
354 355
355 356
356 357
357 358
358 359
359 360
360 361
361 362
362 363
363 364
364 365
daysback = doylist.iloc[day_of_year-61] # 60 days back from today
daysforward = doylist.iloc[day_of_year+19] # 20 days forward from today
I need my final df or list to look like this:
final_list =
352
353
354
355
356
357
358
359
360
361
362
363
364
365
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
I have tried variations of this, but I get the following error when using it with a df called "doylist". Thank you!
finallist = list(range(doylist.iloc[day_of_year-61],doylist.iloc[day_of_year+19]))
Traceback (most recent call last):
Cell In[113], line 1
finallist = list(range(doylist.iloc[day_of_year-61],doylist.iloc[day_of_year+19]))
TypeError: 'Series' object cannot be interpreted as an integer
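The TypeError above arises because `doylist` is a DataFrame, so `doylist.iloc[n]` returns a whole row as a Series rather than an integer, and `range()` rejects it. Below is a minimal sketch of one way around this. It assumes the column is named `dyofyr` as in the output above, and it rebuilds a 365-row frame only for illustration. The idea is to pick the column first, then select positions computed modulo the frame length so the window wraps across the year boundary.

```python
import pandas as pd

# Illustrative stand-in for the asker's frame: one column, days 1..365.
doylist = pd.DataFrame({"dyofyr": range(1, 366)})
day_of_year = 47

# Row selection on a DataFrame yields a Series, which range() cannot use.
row = doylist.iloc[day_of_year - 61]

# Selecting the column first yields a scalar that can be cast to int.
value = int(doylist["dyofyr"].iloc[day_of_year - 61])

# Positions for 60 days back through 20 days forward, wrapped modulo the length.
n = len(doylist)
positions = [(day_of_year - 1 + k) % n for k in range(-60, 21)]
final_list = doylist["dyofyr"].iloc[positions].tolist()

print(final_list[0], final_list[-1], len(final_list))
```

Because the positions are plain integers, the same list works with `.iloc` on any other column of the frame.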

I can't understand why you are using a dataframe to do this. This could be done with a simple list and modulus.
def days_between_forward_back(day_of_year, days_since, days_forward):
    doylist = [x + 1 for x in range(365)]
    lower_index = (day_of_year - days_since - 1) % 365
    upper_index = day_of_year + days_forward
    assert upper_index < 365
    if lower_index > upper_index:
        result = doylist[lower_index:]
        result.extend(doylist[:upper_index])
        return result
    else:
        return doylist[lower_index:upper_index]
days = days_between_forward_back(47, 60, 20)
print(f"For day of year 47, 60 days before, 20 days ahead, days are {days}")
days = days_between_forward_back(300, 61, 10)
print(f"For day of year 300, 61 days before, 10 days ahead, days are {days}")
Handling the case where both days_since and days_forward will move us to another year is left as an exercise for the asker.
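The exercise left above can be sketched with modular arithmetic alone. The function name `doy_window` and the `days_in_year` parameter are illustrative and not part of the answer. The sketch assumes a fixed 365-day year (no leap-year handling) and a window shorter than one year.

```python
def doy_window(day_of_year, days_back, days_forward, days_in_year=365):
    """Return day-of-year numbers from days_back before to days_forward after."""
    first = day_of_year - days_back
    last = day_of_year + days_forward
    # (d - 1) % n + 1 maps any integer onto 1..n, wrapping in both directions.
    return [(d - 1) % days_in_year + 1 for d in range(first, last + 1)]


# Window that wraps backwards into the previous year.
window = doy_window(47, 60, 20)
print(window[0], window[-1], len(window))

# Window that wraps forwards into the next year.
late = doy_window(360, 10, 20)
print(late[0], late[-1], len(late))
```

Since every day number passes through the same modulo expression, no separate branch is needed for the backward and forward cases.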

I think this will help you:
import datetime
this_date = datetime.datetime.now()
how_many_dayes_do_you_want_to_go_back = 80
how_many_dayes_in_each_munth = {1:31
    ,2:28
    ,3:31
    ,4:30
    ,5:31
    ,6:30
    ,7:31
    ,8:31
    ,9:30
    ,10:31
    ,11:30
    ,12:31}
# days elapsed up to the end of the current month (not up to today's date)
dayes_in_this_year = 0
for i in range(1,this_date.month+1):
    dayes_in_this_year += how_many_dayes_in_each_munth.get(i)
if how_many_dayes_do_you_want_to_go_back % dayes_in_this_year == how_many_dayes_do_you_want_to_go_back and how_many_dayes_do_you_want_to_go_back < dayes_in_this_year:
    # the whole window lies inside the current year
    for i in range(dayes_in_this_year-how_many_dayes_do_you_want_to_go_back,dayes_in_this_year+1):
        print(i)
else:
    # the window reaches back into the previous year (assumed to have 365 days)
    the_rest_to_the_last_year = how_many_dayes_do_you_want_to_go_back - dayes_in_this_year
    for i in range(365-the_rest_to_the_last_year,366):
        print(i)
    # start at 1: there is no day 0
    for i in range(1,dayes_in_this_year+1):
        print(i)
And yes, you can improve the code so that it can be reused anywhere.

It seems like you're getting hung up while converting back and forth between data types such as int and datetime. This type of error is much easier to track down and fix if you use Python's type hints to make sure you're being careful with data types. It also helps to keep using datetime as much as possible to take better advantage of the library (so you don't have to keep track of things like leap years on your own). I wrote a few functions to help you convert:
from datetime import datetime, timedelta

def dt_from_doy(year: int, doy: int) -> datetime:
    # useful if you need to use doy from your dataframe to get datetime.
    # if you can convert the input to be a datetime in the first place that
    # might be even better (fewer conversions of data type)
    return datetime.strptime("{:04d}-{:03d}".format(year, doy), "%Y-%j")

def doy_from_dt(dt: datetime) -> int:
    # used in the example below
    return int(dt.strftime("%j"))

# example
today = datetime(2023,2,16)
list_of_dt = [today + timedelta(days=x) for x in range(-20,20)]
list_of_doy = [doy_from_dt(dt) for dt in list_of_dt]

Related

How to turn a dictionary into a dataframe with all the keys in a column

def weights():
    saved = {}
    for i in range(len(bread_pairs["key_id"])):
        drawing = np.array(bread_pairs['bitmap'][i], dtype=np.uint8)
        new_test_cnn = drawing.reshape(1, 28, 28, 1).astype('float32')
        new_cnn_predict = model.predict(new_test_cnn, batch_size=32, verbose=0)
        w = model.layers[8].get_weights()
        w = list(w[0].flatten())
        saved[bread_pairs["key_id"][i]] = w
    return saved
I have this function that is creating a dictionary of key_ids and mapping them to an associated list of values of length 200. So for example my dictionary looks something like saved = {key_id_1: [1,2,3...200], key_id_2: [1,2,...,200], ....}
I would like to turn this dictionary into a dataframe with a column of key_ids, where each element of the associated list of 200 becomes its own column. That gives a total of 201 columns: the first column holds the key_id, the second column the first element of its list, the third column the second element, and so on. Each row corresponds to one key_id, so the second row holds the second key_id followed by the elements of its list. Is there a way to convert this dictionary to a df? I have 10000 key_ids, so the dimensions would be 10000x201. Thanks!
Load the dict into a DataFrame using pandas.DataFrame.from_dict with the orient parameter, and reset the index with .reset_index()
This will create the DataFrame as requested, however, I recommend leaving the keys as the index, which should make it easier to perform calculations and address specific rows.
If the columns should be named 0...201, then use df.columns = list(range(202)), or use pandas.DataFrame.rename to rename specific columns.
import pandas as pd
# test data
saved = {'key_id_1': list(range(201)), 'key_id_2': list(range(201))}
# create the DataFrame
df = pd.DataFrame.from_dict(saved, orient='index')
# reset the index
df = df.reset_index()
# display(df)
index 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200
0 key_id_1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200
1 key_id_2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200
Alternative Implementation
Create the DataFrame with pandas.DataFrame, transpose the DataFrame with pandas.DataFrame.T, and then reset with .reset_index().
df = pd.DataFrame(saved)
df = df.T.reset_index()
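Both approaches above leave the key column with the default name `index`. A minimal sketch of naming it during the reset is shown below. The name `key_id` comes from the question, while the two-entry `saved` dict is made-up test data with lists of length 200.

```python
import pandas as pd

# Made-up test data: two keys, each mapped to a list of 200 values.
saved = {"key_id_1": list(range(200)), "key_id_2": list(range(200, 400))}

# Keys become the row index; rename_axis labels that index before it is
# moved into a regular column by reset_index.
df = (
    pd.DataFrame.from_dict(saved, orient="index")
    .rename_axis("key_id")
    .reset_index()
)

print(df.shape)
print(df.columns[:3].tolist())
```

With 10000 keys, the same call would give a 10000x201 frame, the shape asked for in the question.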

TypeError: only integer scalar arrays can be converted to a scalar index, while trying KFold CV

I am trying to perform KFold CV on a dataset containing 279 files. After performing k-means, the data has shape (279, 5, 90). I reshaped it to (279, 5*90) in order to fit it with an SVM. Trying the KFold CV approach gives me the error
"TypeError: only integer scalar arrays can be converted to a scalar index"
#input
with open("dataset.pkl", "rb") as file:
    dataset = pkl.load(file)
print(len(dataset))
x = [i[0] for i in dataset] #k-means cc
y = [i[1] for i in dataset] #label for the data
X = np.reshape(x,[279,5*90])
#cv
from sklearn.model_selection import KFold
kf = KFold(n_splits=5,random_state=42)
kf.get_n_splits(X)
for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index,"\n TEST:", test_index)
    X_train, X_test, y_train, y_test = X[train_index], X[test_index], y[train_index], y[test_index] #this is where i'm getting the error.
Output:
TRAIN: [ 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91
92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109
110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145
146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163
164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181
182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199
200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217
218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235
236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253
254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271
272 273 274 275 276 277 278]
TEST: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55]
----------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-923-a534f873feb4> in <module>
2 for train_index, test_index in kf.split(X):
3 print("TRAIN:", train_index,"\n TEST:", test_index)
----> 4 X_train, X_test, y_train, y_test = X[train_index], X[test_index], y[train_index], y[test_index]
TypeError: only integer scalar arrays can be converted to a scalar index
y is a list, and a list cannot be indexed with an array of indices the way a NumPy array can.
Example:
y = [1,2,3,4,6]
idx = np.array([0,1])
print (y[idx]) # This raises the TypeError because a list cannot be indexed this way
print (np.array(y)[idx]) # This is fine because it is a NumPy array now
Solution
If y is a flat list, convert it into a NumPy array first:
y = np.array([i[1] for i in dataset]) #label for the data
If y is a nested list, then:
y = np.array([np.array(i[1]) for i in dataset]) #label for the data
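Put together, the fix might look like the sketch below. The random `X` and the alternating labels in `y` are placeholders standing in for the pickled dataset, which is not available here. `shuffle=True` is added because recent scikit-learn versions reject a `random_state` on an unshuffled `KFold`.

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder data with the shapes described in the question.
rng = np.random.default_rng(0)
X = rng.normal(size=(279, 5 * 90))
y = np.array([i % 2 for i in range(279)])  # labels as an ndarray, not a list

kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_sizes = []
for train_index, test_index in kf.split(X):
    # Both X and y are ndarrays, so index arrays work on each of them.
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    fold_sizes.append((len(y_train), len(y_test)))

print(fold_sizes)
```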

Probabilistic neural network

I am implementing a probabilistic neural network on my dataset. Below is my code, which I tested on the iris dataset without errors, but when I applied it to my own dataset I got the following error:
KeyError Traceback (most recent call last)
<ipython-input-30-230e6aa7ae95> in <module>()
13 for i, (train, test) in enumerate(skfold, start=1):
14 pnn_network = PNN(std=std, step=0.2, verbose=False, batch_size=2)
---> 15 pnn_network.train(input_dataset_data[train], input_dataset_target[train])
16 predictions = pnn_network.predict(input_dataset_data[test])
17 print("Positive in predictions:", 1 in predictions)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2677 if isinstance(key, (Series, np.ndarray, Index, list)):
2678 # either boolean or fancy integer index
-> 2679 return self._getitem_array(key)
2680 elif isinstance(key, DataFrame):
2681 return self._getitem_frame(key)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_array(self, key)
2721 return self._take(indexer, axis=0)
2722 else:
-> 2723 indexer = self.loc._convert_to_indexer(key, axis=1)
2724 return self._take(indexer, axis=1)
2725
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _convert_to_indexer(self, obj, axis, is_setter)
1325 if mask.any():
1326 raise KeyError('{mask} not in index'
-> 1327 .format(mask=objarr[mask]))
1328
1329 return com._values_from_object(indexer)
KeyError: '[ 0 1 2 4 5 6 7 8 9 10 11 12 15 16 17 18 19 20\n 21 22 23 25 26 27 28 29 30 31 32 33 34 35 36 38 39 40\n 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58\n 59 60 61 62 63 64 65 66 67 68 69 71 72 73 74 75 76 77\n 78 79 80 82 83 84 85 86 87 88 90 92 93 94 95 96 97 98\n 99 100 101 102 104 105 106 108 109 110 112 114 115 116 117 118 119 120\n 121 122 123 125 126 127 128 131 132 133 134 136 137 138 139 140 141 142\n 143 144 145 146 147 148 149 151 153 154 155 156 157 159 160 161 162 163\n 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 181 182 183\n 185 186 187 188 189 190 192 193 194 195 196 197 198 199 200 201 202 204\n 205 206 207 208 209 211 212 213 214 215 216 217 218 219 220 221 222 223\n 224 225 226 227 228 229 230 231 232 233 234 236 237 238 239 240 241 242\n 243 244 245 246 247 248 249 250 251 252 253 255 257 258 259 260 261 262\n 263 264 265 267 269 270 271 272 273 274 275 276 277 278 279 280 281 282\n 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 300 301\n 302 303 304 305 306 307 308 309 310 311 312 313 314 315 317 318 320 321\n 322 323 324 325 326 327] not in index'
The code on iris example is below:
from sklearn import datasets
iris=datasets.load_iris()
input_dataset_data = iris.data
input_dataset_target = iris.target
print(input_dataset_data.shape)
print(input_dataset_target.shape)
kfold_number = 10
skfold = StratifiedKFold(input_dataset_target, kfold_number, shuffle=True)
#print("> Start classify input_dataset dataset")
for std in [0.2, 0.4, 0.6, 0.8, 1]:
    average_results = []
    for i, (train, test) in enumerate(skfold, start=1):
        pnn_network = PNN(std=std, step=0.2, verbose=False, batch_size=2)
        pnn_network.train(input_dataset_data[train], input_dataset_target[train])
        predictions = pnn_network.predict(input_dataset_data[test])
        print("Positive in predictions:", 1 in predictions)
        average_results.append(np.sum(predictions == input_dataset_target[test]) /float(len(predictions)))
    print(std, np.average(average_results))
Below are the shapes of my dataset:
X.shape
(328, 13)
Y.shape
(328,)
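Your data is a pandas DataFrame, so indexing it with an array of integers looks those integers up as column labels. You need to select rows by position with .iloc: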
pnn_network.train(input_dataset_data.iloc[train], input_dataset_target.iloc[train])
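A small sketch of the difference is shown below, using a made-up 10-row frame. The column names `f0` and `f1`, the `label` series, and the `train` positions are illustrative only. Converting the selection with `.to_numpy()` is an optional extra step, on the assumption that the network expects plain arrays, as it received in the iris example.

```python
import numpy as np
import pandas as pd

# Made-up data: 10 rows, 2 feature columns with string labels.
data = pd.DataFrame(np.arange(20).reshape(10, 2), columns=["f0", "f1"])
target = pd.Series([0, 1] * 5, name="label")
train = np.array([0, 2, 4, 6, 8])  # row positions, as produced by a k-fold split

# data[train] looks for columns named 0, 2, 4, ... and fails.
try:
    data[train]
    raised = False
except KeyError:
    raised = True

# .iloc selects rows by position instead.
train_X = data.iloc[train].to_numpy()
train_y = target.iloc[train].to_numpy()

print(raised, train_X.shape, train_y.shape)
```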

ValueError: Data must be positive (boxcox scipy)

I'm trying to transform my dataset to a normal distribution.
0 8.298511e-03
1 3.055319e-01
2 6.938647e-02
3 2.904091e-02
4 7.422441e-02
5 6.074046e-02
6 9.265747e-04
7 7.521846e-02
8 5.960521e-02
9 7.405019e-04
10 3.086551e-02
11 5.444835e-02
12 2.259236e-02
13 4.691038e-02
14 6.463911e-02
15 2.172805e-02
16 8.210005e-02
17 2.301189e-02
18 4.073898e-07
19 4.639910e-02
20 1.662777e-02
21 8.662539e-02
22 4.436425e-02
23 4.557591e-02
24 3.499897e-02
25 2.788340e-02
26 1.707958e-02
27 1.506404e-02
28 3.207647e-02
29 2.147011e-03
30 2.972746e-02
31 1.028140e-01
32 2.183737e-02
33 9.063370e-03
34 3.070437e-02
35 1.477440e-02
36 1.036309e-02
37 2.000609e-01
38 3.366233e-02
39 1.479767e-03
40 1.137169e-02
41 1.957088e-02
42 4.921303e-03
43 4.279257e-02
44 4.363429e-02
45 1.040123e-01
46 2.930958e-02
47 1.935434e-03
48 1.954418e-02
49 2.980253e-02
50 3.643772e-02
51 3.411437e-02
52 4.976063e-02
53 3.704608e-02
54 7.044161e-02
55 8.101365e-03
56 9.310477e-03
57 7.626637e-02
58 8.149728e-03
59 4.157399e-01
60 8.200258e-02
61 2.844295e-02
62 1.046601e-01
63 6.565680e-02
64 9.825436e-04
65 9.353639e-02
66 6.535298e-02
67 6.979044e-04
68 2.772859e-02
69 4.378422e-02
70 2.020185e-02
71 4.774493e-02
72 6.346146e-02
73 2.466264e-02
74 6.636585e-02
75 2.548934e-02
76 1.113937e-06
77 5.723409e-02
78 1.533288e-02
79 1.027341e-01
80 4.294570e-02
81 4.844853e-02
82 5.579620e-02
83 2.531824e-02
84 1.661426e-02
85 1.430836e-02
86 3.157232e-02
87 2.241722e-03
88 2.946256e-02
89 1.038383e-01
90 1.868837e-02
91 8.854596e-03
92 2.391759e-02
93 1.612714e-02
94 1.007823e-02
95 1.975513e-01
96 3.581289e-02
97 1.199747e-03
98 1.263381e-02
99 1.966746e-02
100 4.040786e-03
101 4.497264e-02
102 4.030524e-02
103 8.627087e-02
104 3.248317e-02
105 5.727582e-03
106 1.781355e-02
107 2.377991e-02
108 4.299568e-02
109 3.664353e-02
110 5.167902e-02
111 4.006848e-02
112 7.072990e-02
113 6.744938e-03
114 1.064900e-02
115 9.823497e-02
116 8.992714e-03
117 1.792453e-01
118 6.817763e-02
119 2.588843e-02
120 1.048027e-01
121 6.468491e-02
122 1.035536e-03
123 8.800684e-02
124 5.975065e-02
125 7.365861e-04
126 4.209485e-02
127 4.232421e-02
128 2.371866e-02
129 5.894714e-02
130 7.177195e-02
131 2.116566e-02
132 7.579219e-02
133 3.174744e-02
134 0.000000e+00
135 5.786439e-02
136 1.458493e-02
137 9.820156e-02
138 4.373873e-02
139 4.271649e-02
140 5.532575e-02
141 2.311324e-02
142 1.644508e-02
143 1.328273e-02
144 3.908473e-02
145 2.355468e-03
146 2.519321e-02
147 1.131868e-01
148 1.708967e-02
149 1.027661e-02
150 2.439899e-02
151 1.604058e-02
152 1.134323e-02
153 2.247722e-01
154 3.408590e-02
155 2.222239e-03
156 1.659830e-02
157 2.284733e-02
158 4.618550e-03
159 3.674162e-02
160 4.131283e-02
161 8.846273e-02
162 2.504404e-02
163 6.004396e-03
164 1.986309e-02
165 2.347111e-02
166 3.865636e-02
167 3.672307e-02
168 6.658419e-02
169 3.726879e-02
170 7.600138e-02
171 7.184871e-03
172 1.142840e-02
173 9.741311e-02
174 8.165448e-03
175 1.529210e-01
176 6.648081e-02
177 2.617601e-02
178 9.547816e-02
179 6.857775e-02
180 8.129399e-04
181 7.107914e-02
182 5.884794e-02
183 8.398721e-04
184 6.972981e-02
185 4.461767e-02
186 2.264404e-02
187 5.566633e-02
188 6.595136e-02
189 2.301914e-02
190 7.488919e-02
191 3.108619e-02
192 4.989364e-07
193 4.834949e-02
194 1.422578e-02
195 9.398186e-02
196 4.870391e-02
197 3.841369e-02
198 6.406801e-02
199 2.603315e-02
200 1.692629e-02
201 1.409982e-02
202 4.099215e-02
203 2.093724e-03
204 2.640732e-02
205 1.032129e-01
206 1.581881e-02
207 8.977325e-03
208 1.941141e-02
209 1.502126e-02
210 9.923589e-03
211 2.757357e-01
212 3.096234e-02
213 4.388900e-03
214 1.784778e-02
215 2.179550e-02
216 3.944159e-03
217 3.703552e-02
218 4.033897e-02
219 1.157076e-01
220 2.400446e-02
221 5.761179e-03
222 1.899621e-02
223 2.401468e-02
224 4.458745e-02
225 3.357898e-02
226 5.331003e-02
227 3.488753e-02
228 7.466599e-02
229 6.075236e-03
230 9.815318e-03
231 9.598735e-02
232 7.103607e-03
233 1.100602e-01
234 5.677641e-02
235 2.420500e-02
236 9.213369e-02
237 4.024043e-02
238 6.987694e-04
239 8.612055e-02
240 5.663353e-02
241 4.871693e-04
242 4.533811e-02
243 3.593244e-02
244 1.982537e-02
245 5.490786e-02
246 5.603109e-02
247 1.671653e-02
248 6.522711e-02
249 3.341356e-02
250 2.378629e-06
251 4.299939e-02
252 1.223163e-02
253 8.392798e-02
254 4.272826e-02
255 3.183946e-02
256 4.431299e-02
257 2.661024e-02
258 1.686707e-02
259 4.070924e-03
260 3.325947e-02
261 2.023611e-03
262 2.402284e-02
263 8.369778e-02
264 1.375093e-02
265 8.899898e-03
266 2.148740e-02
267 1.301483e-02
268 8.355791e-03
269 2.549934e-01
270 2.792516e-02
271 4.652563e-03
272 1.556313e-02
273 1.936942e-02
274 3.547794e-03
275 3.412516e-02
276 3.932606e-02
277 5.305868e-02
278 2.354438e-02
279 5.379380e-03
280 1.904203e-02
281 2.045495e-02
282 3.275855e-02
283 3.007389e-02
284 8.227664e-02
285 2.479949e-02
286 6.573835e-02
287 5.165842e-03
288 7.599650e-03
289 9.613557e-02
290 6.690175e-03
291 1.779880e-01
292 5.076263e-02
293 3.117607e-02
294 7.495692e-02
295 3.707768e-02
296 7.086975e-04
297 8.935981e-02
298 5.624249e-02
299 7.105331e-04
300 3.339868e-02
301 3.354603e-02
302 2.041988e-02
303 3.862522e-02
304 5.977081e-02
305 1.730081e-02
306 6.909621e-02
307 3.729478e-02
308 3.940647e-07
309 4.385336e-02
310 1.391891e-02
311 8.898305e-02
312 3.840141e-02
313 3.214408e-02
314 4.284080e-02
315 1.841022e-02
316 1.528207e-02
317 3.106559e-03
318 3.945481e-02
319 2.085094e-03
320 2.464190e-02
321 7.844914e-02
322 1.526590e-02
323 9.922147e-03
324 1.649218e-02
325 1.341602e-02
326 8.124446e-03
327 2.867380e-01
328 2.663867e-02
329 5.342012e-03
330 1.752612e-02
331 2.010863e-02
332 3.581845e-03
333 3.652284e-02
334 4.484362e-02
335 4.600939e-02
336 2.213280e-02
337 5.494917e-03
338 2.016594e-02
339 2.118010e-02
340 2.964000e-02
341 3.405549e-02
342 1.014185e-01
343 2.451624e-02
344 7.966998e-02
345 5.301538e-03
346 8.198895e-03
347 8.789368e-02
348 7.222417e-03
349 1.448276e-01
350 5.676056e-02
351 2.987054e-02
352 6.851434e-02
353 4.193034e-02
354 7.025054e-03
355 8.557358e-02
356 5.812736e-02
357 2.263676e-02
358 2.922588e-02
359 3.363161e-02
360 1.495056e-02
361 5.871619e-02
362 6.235094e-02
363 1.691340e-02
364 5.361939e-02
365 3.722318e-02
366 9.828477e-03
367 4.155345e-02
368 1.327760e-02
369 7.205372e-02
370 4.151130e-02
371 3.265365e-02
372 2.879418e-02
373 2.314340e-02
374 1.653692e-02
375 1.077611e-02
376 3.481427e-02
377 1.815487e-03
378 2.232305e-02
379 1.005192e-01
380 1.491262e-02
381 3.752658e-02
382 1.271613e-02
383 1.223707e-02
384 8.088923e-03
385 2.572550e-01
386 2.300194e-02
387 2.847960e-02
388 1.782098e-02
389 1.900759e-02
390 3.647629e-03
391 3.723368e-02
392 4.079514e-02
393 5.510332e-02
394 3.072313e-02
395 4.183566e-03
396 1.891549e-02
397 1.870293e-02
398 3.182769e-02
399 4.167840e-02
400 1.343152e-01
401 2.451973e-02
402 7.567017e-02
403 4.837843e-03
404 6.477297e-03
405 7.664675e-02
Name: value, dtype: float64
This is the code I used for transforming dataset:
from scipy import stats
x,_ = stats.boxcox(df)
I get this error:
if any(x <= 0):
-> 1031 raise ValueError("Data must be positive.")
1032
1033 if lmbda is not None: # single transformation
ValueError: Data must be positive
Is it because my values are too small that it's producing an error? Not sure what I'm doing wrong. New to using boxcox, could be using it incorrectly in this example. Open to suggestions and alternatives. Thanks!
Your data contains the value 0 (at index 134). When boxcox says the data must be positive, it means strictly positive.
What is the meaning of your data? Does 0 make sense? Is that 0 actually a very small number that was rounded down to 0?
You could simply discard that 0. Alternatively, you could do something like the following. (This amounts to temporarily discarding the 0, and then using -1/λ for the transformed value of 0, where λ is the Box-Cox transformation parameter.)
First, create some data that contains one 0 (all other values are positive):
In [13]: np.random.seed(8675309)
In [14]: data = np.random.gamma(1, 1, size=405)
In [15]: data[100] = 0
(In your code, you would replace that with, say, data = df.values.)
Copy the strictly positive data to posdata:
In [16]: posdata = data[data > 0]
Find the optimal Box-Cox transformation, and verify that λ is positive. This work-around doesn't work if λ ≤ 0.
In [17]: bcdata, lam = boxcox(posdata)
In [18]: lam
Out[18]: 0.244049919975582
Make a new array to hold that result, along with the limiting value of the transform of 0 (which is -1/λ):
In [19]: x = np.empty_like(data)
In [20]: x[data > 0] = bcdata
In [21]: x[data == 0] = -1/lam
The following plot shows the histograms of data and x.
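The choice of -1/λ for the transformed zero can be checked numerically. The sketch below uses only NumPy and a hand-written Box-Cox formula for a fixed λ. The helper `boxcox_fixed` is illustrative and is not the SciPy function, and λ is simply the value printed in `Out[18]` above.

```python
import numpy as np


def boxcox_fixed(x, lam):
    """Box-Cox transform for a given lambda: (x**lam - 1) / lam, or log(x) if lam == 0."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (x**lam - 1.0) / lam


lam = 0.244049919975582
limit = -1.0 / lam

# Ever smaller positive inputs approach the limit from above.
tiny = np.array([1e-4, 1e-8, 1e-16, 1e-32])
values = boxcox_fixed(tiny, lam)
print(values, limit)

# For lam > 0 the formula itself evaluates to -1/lam at exactly 0.
at_zero = boxcox_fixed(np.array([0.0]), lam)[0]
print(at_zero)
```

This only holds for λ > 0. For λ ≤ 0 the transform diverges as x approaches 0, which is why the work-around above checks the sign of λ first.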
Rather than the normal boxcox, you can use boxcox1p. It adds 1 to x before transforming, so a 0 in the data is no longer a problem. Note that boxcox1p does not estimate lmbda for you; you have to supply it.
from scipy.special import boxcox1p
boxcox1p(x, lmbda)
For more info check out the docs at https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.boxcox1p.html
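One possible way to obtain a λ for `boxcox1p` is to let `scipy.stats.boxcox` estimate it on the shifted data `1 + x`. This pairing is a suggestion rather than something the answer prescribes, and the six data values below are illustrative only.

```python
import numpy as np
from scipy import stats
from scipy.special import boxcox1p

# Illustrative values, including one exact zero.
data = np.array([0.0, 0.0083, 0.3055, 0.0694, 0.0290, 0.0742])

# stats.boxcox needs strictly positive input, which 1 + data satisfies;
# with no lmbda given it returns the transformed data and the fitted lambda.
shifted, lam = stats.boxcox(data + 1.0)

# boxcox1p applies the same transform to 1 + data for the supplied lambda.
direct = boxcox1p(data, lam)

print(lam)
print(np.allclose(shifted, direct))
```

Keep in mind that λ fitted on `1 + x` is generally different from λ fitted on `x`, so the result is not the same as a plain Box-Cox transform of the original data.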
Is the data you are sending to boxcox a 1-dimensional ndarray?
A second approach could be to use a shift: add a constant (see the link for details) to all of the ndarray elements before sending them to boxcox, and subtract it from the resulting array elements afterwards (if I have understood the boxcox algorithm correctly, that could be a solution in your case, too).
https://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.stats.boxcox.html

Efficient finite field multiplication with log-antilog-table lookup in numpy

I am trying to implement efficient multiplication in GF(2^8), whose elements are most naturally represented as uint8 NumPy values, in a numpy-thonic way. To that end, I implemented GF arithmetic (in pure Python, not NumPy) in order to build log-antilog tables (I took a random generator, 9). In particular, I implemented a (non-NumPy) Python function multGF which implements GF multiplication. It works great but is slow (since it uses polynomial modulo calculations). A common trick to speed up multiplication is to use the following equation:
a * b = antilog[(log(a) + log(b)) mod 255], for a, b != 0
Building the log-antilog uint8 ndarrays is easily performed like this:
gen = 9 ; K = [1] ; g = gen
for i in range(1,255):
    K.append(g)
    g = multGF(g,gen)
antilog = np.array(K, dtype='uint8')
log = np.full(256,0, dtype='uint8')
for i in range(255): log[antilog[i]] = i
But, and this is my question, how do I implement the multiplication in a numpy-thonic way? Both the log table and the antilog table are of size 255 (not 256; there is no log for 0), and the exponents have to be added modulo 255, not modulo 256. I came up with the following, IMHO non-numpy-thonic, solution:
def multGF2(a,b):
    return antilog[(int(log[a]) + log[b]) % 255]
I had to convert the uint8 addition (which naturally works mod 256) into an int addition in order to perform the addition mod 255. This is neither elegant nor efficient, and I am quite sure that someone has a better solution.
For testing: here are both logtables as arrays:
log = [ nan 0 250 214 245 173 209 42 240 1 168 71 204 187 37 132 235 91
251 191 163 84 66 146 199 212 182 215 32 30 127 247 230 206 86 229
246 65 186 244 158 87 79 171 61 174 141 180 194 113 207 50 177 150
210 54 27 105 25 231 122 93 242 43 225 2 201 156 81 142 224 52
241 53 60 64 181 190 239 254 153 119 82 72 74 9 166 62 56 13
169 143 136 34 175 109 189 80 108 165 202 188 45 99 172 203 145 126
205 157 49 24 22 139 100 159 20 111 226 133 117 233 88 46 237 130
38 3 220 217 252 35 196 96 151 89 76 6 137 192 219 5 47 178
236 110 48 98 55 118 59 155 176 92 185 179 234 211 249 70 148 18
114 39 77 124 67 14 69 58 4 195 161 7 57 147 51 238 8 135
164 144 138 116 131 208 29 162 170 85 104 193 184 97 75 216 103 115
160 123 197 11 183 10 40 222 94 101 167 213 198 90 140 243 121 149
200 63 152 12 44 23 19 129 17 68 134 28 95 218 154 248 15 16
106 227 221 102 128 120 112 26 228 78 83 31 41 36 232 21 125 107
33 73 253 223]
antilog = [ 1 9 65 127 170 141 137 173 178 85 203 201 219 89 167 232 233 224
161 222 116 249 112 221 111 58 241 56 227 186 29 245 28 252 93 131
247 14 126 163 204 246 7 63 220 102 123 142 146 110 51 176 71 73
55 148 88 174 169 150 74 44 87 217 75 37 22 166 225 168 159 11
83 253 84 194 136 164 243 42 97 68 82 244 21 189 34 41 122 135
211 17 153 61 206 228 133 193 147 103 114 207 237 196 190 57 234 251
98 95 145 117 240 49 162 197 183 120 149 81 239 214 60 199 165 250
107 30 238 223 125 184 15 119 226 179 92 138 182 113 212 46 69 91
181 106 23 175 160 215 53 134 218 80 230 151 67 109 40 115 198 172
187 20 180 99 86 208 10 90 188 43 104 5 45 94 152 52 143 155
47 76 26 202 192 154 38 13 101 96 77 19 139 191 48 171 132 200
210 24 216 66 100 105 12 108 33 50 185 6 54 157 25 209 3 27
195 129 229 140 128 236 205 255 70 64 118 235 242 35 32 59 248 121
156 16 144 124 177 78 8 72 62 213 39 4 36 31 231 158 2 18
130 254 79 ]
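One possible vectorized approach is sketched below. It stores the logs in a wider integer type and doubles the antilog table, so that the sum of two logs (at most 508) indexes the table directly and no `% 255` is needed. The names `mult_gf_slow`, `exp`, and `mult_gf` are illustrative. The reduction polynomial 0x11B is an assumption, chosen because it reproduces the first antilog entries 1, 9, 65, 127, 170 listed above. Zero operands are handled separately with `np.where`, since 0 has no logarithm.

```python
import numpy as np


def mult_gf_slow(a, b, poly=0x11B):
    """Shift-and-add GF(2^8) product; slow reference used only to build tables."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly  # reduce modulo the field polynomial
        b >>= 1
    return result


GEN = 9
# exp[i] = GEN**i for i in 0..509 (the 255-entry cycle stored twice).
exp = np.zeros(510, dtype=np.uint8)
# int16 so that log[a] + log[b] cannot wrap around as uint8 would.
log = np.zeros(256, dtype=np.int16)
g = 1
for i in range(255):
    exp[i] = g
    exp[i + 255] = g
    log[g] = i
    g = mult_gf_slow(g, GEN)


def mult_gf(a, b):
    """Elementwise GF(2^8) product of uint8 arrays (broadcasting allowed)."""
    a = np.asarray(a, dtype=np.uint8)
    b = np.asarray(b, dtype=np.uint8)
    product = exp[log[a] + log[b]]
    # log[0] is only a placeholder, so force the result to 0 there.
    return np.where((a == 0) | (b == 0), np.uint8(0), product)


# Full 256 x 256 multiplication table in one call.
elems = np.arange(256, dtype=np.uint8)
table = mult_gf(elems[:, None], elems[None, :])
print(table.shape, table.dtype)
```

The doubled table costs 255 extra bytes and replaces the modulo with a single fancy-indexing step. Whether it is faster in practice than `(log[a] + log[b]) % 255` on int16 logs would need to be measured.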
