Python, calculating lag of the center of the data

My two data sets are:
fnamerp1=([ 93, 87, 96, 93, 90, 123, 111, 82, 87, 115, 103,
101, 93, 92, 111, 107, 114, 106, 116, 106, 128, 115,
141, 134, 120, 149, 140, 166, 152, 171, 192, 207, 227,
266, 270, 286, 355, 385, 397, 488, 462, 531, 579, 622,
711, 720, 801, 858, 906, 915, 915, 956, 1004, 1012, 1045,
1076, 1063, 1013, 985, 924, 959, 838, 766, 763, 742, 642,
587, 557, 484, 393, 353, 341, 284, 240, 221, 209, 147,
109, 113, 102, 71, 63, 63, 50, 29, 39, 36, 25,
30, 23, 27, 23, 19, 19, 24, 15, 23, 21, 26,
15])
fnamerp2=([ 105, 89, 120, 121, 103, 105, 113, 94, 104, 115, 122, 116, 121,
129, 118, 126, 138, 146, 161, 163, 178, 192, 194, 222, 268, 272,
285, 342, 380, 378, 373, 448, 493, 511, 571, 603, 691, 772, 738,
796, 839, 832, 883, 930, 963, 975, 972, 931, 947, 941, 934, 964,
871, 869, 826, 793, 733, 708, 606, 610, 515, 483, 409, 352, 358,
264, 266, 205, 191, 167, 136, 138, 99, 102, 82, 57, 65, 53,
51, 32, 26, 27, 39, 21, 29, 23, 25, 24, 16, 17, 27,
33, 19, 13, 24, 26, 18, 22, 18, 20])
I want to find the lag between the centers of the two peaks (not just between their maxima). My plan is to use np.argmax(signal.correlate(fnamerp1, fnamerp2)).
What is the right way to do this, both mathematically and elegantly in Python?
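A note on the plan in the question: np.argmax of a "full" cross-correlation returns an index into the output array, not the lag itself, so len(a) - 1 has to be subtracted (in SciPy 1.6 and later, scipy.signal.correlation_lags(len(b), len(a)) returns the same lag axis). The sketch below does this with NumPy only, and also computes the lag between intensity-weighted centroids, which is one possible definition of a peak's "center". The helper names are hypothetical, and removing the mean (for the correlation) or the minimum (for the centroid) is an assumed, crude way of handling the nonzero baseline visible in both data sets.

```python
import numpy as np


def xcorr_lag(a, b):
    """Samples by which b is delayed relative to a (positive: b comes later).

    Assumption: subtracting each signal's mean is enough baseline removal.
    """
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    c = np.correlate(b, a, mode="full")
    # Output index i of the "full" correlation corresponds to lag i - (len(a) - 1)
    return int(np.argmax(c)) - (len(a) - 1)


def centroid(y):
    """Intensity-weighted mean position of y, after subtracting its minimum."""
    y = np.asarray(y, dtype=float)
    y = y - y.min()  # crude baseline removal (assumption)
    idx = np.arange(len(y))
    return float(np.sum(idx * y) / np.sum(y))


def centroid_lag(a, b):
    """Distance between the centroids of b and a, in samples (can be fractional)."""
    return centroid(b) - centroid(a)
```

For the question's data, xcorr_lag(fnamerp1, fnamerp2) and centroid_lag(fnamerp1, fnamerp2) give two estimates of the lag. The correlation version compares the whole peak shapes but is limited to whole samples; the centroid version can return a fractional lag but is more sensitive to how the baseline is removed.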

Related

How to save printed console output to pandas dataframe in python?

I have a printed output:
{-1: [2, 10, 11, 13, 16, 19, 24, 28, 30, 32, 34, 35, 36, 40, 42, 49, 54, 56, 59, 64, 66, 78, 94, 99, 101, 102, 103, 106, 107, 109, 110, 114, 117, 123, 126, 127, 129, 131, 132, 133, 136, 144, 146, 147, 150, 155, 156, 164, 166, 177, 179, 181, 182, 188, 190, 192, 194, 201, 202, 204, 209, 214, 217, 220, 221, 225, 231, 232, 234, 235, 236, 240, 244, 246, 248, 253, 254, 257, 259, 260, 261, 262, 263, 264, 265, 266, 268, 271, 275, 277, 279, 280, 281, 285, 286, 287, 288, 297, 302, 309, ...], 0: [3, 6, 8, 25, 27, 33, 38, 57, 62, 63, 67, 69, 70, 72, 74, 83, 89, 91, 92, 98, 111, 112, 122, 124, 135, 158, 175, 187, 197, 198, 199, 200, 205, 206, 207, 215, 216, 242, 243, 258, 267, 272, 283, 299, 300, 303, 305, 306, 307, 310, 311, 312, 313, 314, 315, 316, 319, 326, 329, 348, 353, 355, 376, 377, 378, 380, 385, 386, 387, 389, 399, 402, 406, 418, 424, 425, 426, 427, 431, 432, 433, 434, 435, 447, 486, 487, 503, 511, 512, 514, 515, 524, 525, 535, 536, 539, 547, 549, 550, 554, ...], 1: [0, 5, 21, 44, 46, 48, 51, 82, 115, 118, 274, 293, 330, 331, 332, 361, 401, 413, 507, 520, 522, 523, 558, 560, 643, 650, 681, 700, 734, 747, 753, 782, 784, 836, 839, 893, 905, 934, 951, 976, 999, 1037, 1048, 1052, 1053, 1082, 1109, 1113, 1115, 1121, 1139, 1146, 1219, 1221, 1264, 1355, 1382, 1392, 1432, 1467, 1485, 1490, 1497, 1513, 1526, 1565, 1682, 1728, 1737, 1738, 1806, 1815, 1824, 1828, 1844, 1845, 1885, 1959, 2014, 2017, 2029, 2052, 2072, 2153, 2157, 2168, 2193, 2199, 2214, 2228, 2232, 2240, 2243, 2264, 2300, 2317, 2353, 2376, 2402, 2405, ...], 2: [15, 39, 60, 61, 149, 157, 222, 250, 289, 320, 448, 538, 630, 658, 662, 665, 709, 759, 810, 837, 897, 901, 917, 924, 925, 945, 946, 954, 959, 1049, 1050, 1090, 1131, 1140, 1154, 1172, 1251, 1300, 1313, 1328, 1387, 1393, 1431, 1440, 1448, 1475, 1507, 1535, 1591, 1597, 1603, 1615, 1636, 1705, 1725, 1736, 1771, 1777, 1791, 1796, 1855, 1867, 1903, 1918, 1928, 1930, 1942, 1943, 1989, 2021, 2039, 2095, 2119, 2169, 2195, 2309, 2337, 2418, 2426, 2429, 2522, 2582, 
2598, 2678, 2679, 2682], 3: [50, 113, 160, 213, 224, 229, 238, 239, 352, 400, 409, 506, 545, 570, 701, 703, 712, 716, 830, 838, 858, 921, 1008, 1078, 1124, 1130, 1194, 1214, 1305, 1308, 1311, 1360, 1421, 1441, 1473, 1476, 1532, 1533, 1548, 1580, 1616, 1622, 1649, 1679, 1735, 1883, 1897, 1920, 1985, 2015, 2084, 2091, 2097, 2118, 2152, 2181, 2212, 2223, 2237, 2249, 2310, 2313, 2347, 2369, 2381, 2390, 2470, 2496, 2511, 2514, 2529, 2549, 2569, 2601, 2626, 2666, 2688],
Is it possible to put this into a DataFrame with two columns? For example:
Number | Value
-1 | [2, 10, 11, 13, 16, 19, 24, 28, 30, 32, 34, 35, 36, 40, 42, 49, 54, 56, 59, 64, 66, 78, 94, 99, 101, 102, 103, 106, 107, 109, 110, 114, 117, 123, 126, 127, 129, 131, 132, 133, 136, 144, 146, 147, 150, 155, 156, 164, 166, 177, 179, 181, 182, 188, 190, 192, 194, 201, 202, 204, 209, 214, 217, 220, 221, 225, 231, 232, 234, 235, 236, 240, 244, 246, 248, 253, 254, 257, 259, 260, 261, 262, 263, 264, 265, 266, 268, 271, 275, 277, 279, 280, 281, 285, 286, 287, 288, 297, 302, 309, ...]
0 | [3, 6, 8, 25, 27, 33, 38, 57, 62, 63, 67, 69, 70, 72, 74, 83, 89, 91, 92, 98, 111, 112, 122, 124, 135, 158, 175, 187, 197, 198, 199, 200, 205, 206, 207, 215, 216, 242, 243, 258, 267, 272, 283, 299, 300, 303, 305, 306, 307, 310, 311, 312, 313, 314, 315, 316, 319, 326, 329, 348, 353, 355, 376, 377, 378, 380, 385, 386, 387, 389, 399, 402, 406, 418, 424, 425, 426, 427, 431, 432, 433, 434, 435, 447, 486, 487, 503, 511, 512, 514, 515, 524, 525, 535, 536, 539, 547, 549, 550, 554, ...]
Try:
import pandas as pd

dct = {
    -1: [2, 10, 11],
    0: [3, 6, 27, 33],
    1: [0, 5, 21],
    2: [15],
    3: [50, 113, 160, 213, 224],
}
df = pd.DataFrame({"Number": list(dct.keys()), "Value": list(dct.values())})
print(df)
Prints:
   Number                     Value
0      -1               [2, 10, 11]
1       0            [3, 6, 27, 33]
2       1                [0, 5, 21]
3       2                      [15]
4       3  [50, 113, 160, 213, 224]
import pandas as pd

# d is the dictionary from the question
df = pd.DataFrame()
df["Value"] = list(d.values())
df.index = list(d.keys())
# OR
df = pd.DataFrame.from_dict({k: [v] for k, v in d.items()},
                            orient="index",
                            columns=["Value"])
print(df)
# Value
# -1 [2, 10, 11, 13, 16, 19, 24, 28, 30, 32, 34, 35...
# 0 [3, 6, 8, 25, 27, 33, 38, 57, 62, 63, 67, 69, ...
# 1 [0, 5, 21, 44, 46, 48, 51, 82, 115, 118, 274, ...
# 2 [15, 39, 60, 61, 149, 157, 222, 250, 289, 320,...
# 3 [50, 113, 160, 213, 224, 229, 238, 239, 352, 4...
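Another option, not taken from the answers above, is to pass the dictionary's items as rows, which produces the two-column layout (Number and Value) asked for in the question. The dictionary below is a shortened stand-in for the one in the question.

```python
import pandas as pd

# Shortened stand-in for the dictionary from the question
d = {-1: [2, 10, 11], 0: [3, 6, 27, 33], 1: [0, 5, 21]}

# Each (key, value) pair becomes one row; each list stays whole in its "Value" cell
df = pd.DataFrame(list(d.items()), columns=["Number", "Value"])
print(df)
```

This keeps the keys as an ordinary column instead of the index, which is closer to the "Number / Value" example in the question.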

Unique combination of numbers from list

The following list is generated from a function.
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
Now I want to combine the numbers to generate new numbers, and I expect the new list to be:
[23, 25, 27, 211, 213, 217, 219, 223, 229, 231, 32, 35, 37, 311, 313, 319, 323, 329, 331, 337, 52, 53, 57, 511, 513, 517, 519, 523, 529, 531, 537, 72, 73, 75, 711, 713, 717, .. ...]
How can this be done in Python?
Use itertools.combinations (note that it yields each unordered pair only once, in list order, so reversed pairs such as 32 are not produced):
>>> import itertools
>>> [int(f"{a}{b}") for a, b in itertools.combinations([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37], 2)]
[23, 25, 27, 211, 213, 217, 219, 223, 229, 231, 237, 35, 37, 311, 313, 317, 319, 323, 329, 331, 337, 57, 511, 513, 517, 519, 523, 529, 531, 537, 711, 713, 717, 719, 723, 729, 731, 737, 1113, 1117, 1119, 1123, 1129, 1131, 1137, 1317, 1319, 1323, 1329, 1331, 1337, 1719, 1723, 1729, 1731, 1737, 1923, 1929, 1931, 1937, 2329, 2331, 2337, 2931, 2937, 3137]
x = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
new_list = []
for i in x:
    for j in x:
        if j != i:
            new_list.append(int(str(i) + str(j)))
# new_list is [23, 25, 27, 211, 213, 217, 219, 223, 229, 231, 237, 32, ...]
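The expected list in the question contains reversed pairs such as 32 and 52, which itertools.combinations never produces. If ordered pairs are what is wanted, itertools.permutations gives the same result as the nested loop, because the numbers in the list are distinct (the loop compares values, permutations compares positions).

```python
import itertools

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

# permutations(primes, 2) yields every ordered pair (a, b) taken from two
# different positions, so both 23 and 32 appear in the result.
joined = [int(f"{a}{b}") for a, b in itertools.permutations(primes, 2)]
print(joined[:12])
```

With 12 numbers this gives 12 * 11 = 132 values, versus 66 from combinations.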

Dask distributed calculation performance drop in loop

I'm new to Dask and still trying to figure out how to get it running smoothly. I've been experimenting with the futures API and I get some surprising results.
I have a simple while loop in my code that calls the function cpci5. When I %timeit the execution of the function I get:
%timeit min(cpci5(x,N,M,n,lciw,uciw))
6.74 s ± 178 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Now I've run the same function using the distributed scheduler in Dask, and I get this:
from dask.distributed import Client
client = Client()
%timeit B.append(client.submit(cpci5,x,N,M,n,lciw,uciw)); bb = np.array(client.gather(B))
29 ms ± 6.01 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
So far so good: the expected improvement is there. But when I time the execution of the loop itself, where the function is called, I barely see any difference: both versions run in approximately 19 s. (I've profiled the initial loop, and more than 90% of the computation time is spent in the function, so the improvement should be there.)
The results are consistent and identical.
What could cause such a difference?
You'll find the relevant piece of code below.
PS: I've already gone as far as I could with code optimization, but it's not enough in my case.
import numpy as np
import pandas as pd
from scipy.stats import hypergeom

N = 392
n = 326
x = np.arange(0, n+1)
if np.floor(n/2) == n/2:
    xvalue = int(n/2 + 1)
else:
    xvalue = int((n+1)/2)
aa = np.arange(lciw[xvalue-1], np.floor(N/2)).astype(int)
lciw:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26,
27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41,
42, 43, 44, 45, 46, 47, 49, 50, 51, 52, 53, 54, 55,
57, 58, 59, 60, 61, 62, 64, 65, 66, 67, 68, 69, 70,
72, 73, 74, 75, 76, 77, 79, 80, 81, 82, 83, 84, 86,
87, 88, 89, 90, 91, 93, 94, 95, 96, 97, 98, 100, 101,
102, 103, 104, 105, 107, 108, 109, 110, 111, 112, 114, 115, 116,
117, 118, 120, 121, 122, 123, 124, 125, 127, 128, 129, 130, 131,
132, 134, 135, 136, 137, 138, 140, 141, 142, 143, 144, 146, 147,
148, 149, 150, 151, 153, 154, 155, 156, 157, 159, 160, 161, 162,
163, 165, 166, 167, 168, 169, 170, 172, 173, 174, 175, 176, 178,
179, 180, 181, 182, 184, 185, 186, 189, 188, 190, 191, 192, 193,
194, 196, 197, 198, 199, 200, 202, 203, 204, 205, 206, 208, 209,
210, 211, 212, 214, 215, 216, 217, 218, 220, 221, 222, 223, 225,
226, 227, 228, 229, 231, 232, 233, 234, 235, 237, 238, 239, 240,
241, 243, 244, 245, 246, 248, 249, 250, 251, 252, 254, 255, 256,
257, 258, 260, 261, 262, 263, 265, 266, 267, 268, 269, 271, 272,
273, 274, 276, 277, 278, 279, 280, 282, 283, 284, 285, 287, 288,
289, 290, 292, 293, 294, 295, 296, 298, 299, 300, 301, 303, 304,
305, 306, 308, 309, 310, 311, 313, 314, 315, 316, 317, 319, 320,
321, 322, 324, 325, 326, 327, 329, 330, 331, 332, 334, 335, 336,
338, 339, 340, 341, 343, 344, 345, 346, 348, 349, 350, 351, 353,
354, 355, 357, 358, 359, 360, 362, 363, 364, 366, 367, 368, 369,
371, 372, 373, 375, 376, 377, 379, 380, 382, 383, 384, 386, 387,
389, 390])
uciw:
array([2, 3, 5, 6, 8, 9, 10, 12, 13, 15, 16, 17, 19,
20, 21, 23, 24, 25, 26, 28, 29, 30, 32, 33, 34, 35,
37, 38, 39, 41, 42, 43, 44, 46, 47, 48, 49, 51, 52,
53, 54, 56, 57, 58, 60, 61, 62, 63, 65, 66, 67, 68,
70, 71, 72, 73, 75, 76, 77, 78, 79, 81, 82, 83, 84,
86, 87, 88, 89, 91, 92, 93, 94, 96, 97, 98, 99, 100,
102, 103, 104, 105, 107, 108, 109, 110, 112, 113, 114, 115, 116,
118, 119, 120, 121, 123, 124, 125, 126, 127, 129, 130, 131, 132,
134, 135, 136, 137, 138, 140, 141, 142, 143, 144, 146, 147, 148,
149, 151, 152, 153, 154, 155, 157, 158, 159, 160, 161, 163, 164,
165, 166, 167, 169, 170, 171, 172, 174, 175, 176, 177, 178, 180,
181, 182, 183, 184, 186, 187, 188, 189, 190, 192, 193, 194, 195,
196, 198, 199, 200, 201, 202, 204, 203, 206, 207, 208, 210, 211,
212, 213, 214, 216, 217, 218, 219, 220, 222, 223, 224, 225, 226,
227, 229, 230, 231, 232, 233, 235, 236, 237, 238, 239, 241, 242,
243, 244, 245, 246, 248, 249, 250, 251, 252, 254, 255, 256, 257,
258, 260, 261, 262, 263, 264, 265, 267, 268, 269, 270, 271, 272,
274, 275, 276, 277, 278, 280, 281, 282, 283, 284, 285, 287, 288,
289, 290, 291, 292, 294, 295, 296, 297, 298, 299, 301, 302, 303,
304, 305, 306, 308, 309, 310, 311, 312, 313, 315, 316, 317, 318,
319, 320, 322, 323, 324, 325, 326, 327, 328, 330, 331, 332, 333,
334, 335, 337, 338, 339, 340, 341, 342, 343, 345, 346, 347, 348,
349, 350, 351, 352, 354, 355, 356, 357, 358, 359, 360, 361, 363,
364, 365, 366, 367, 368, 369, 370, 371, 373, 374, 375, 376, 377,
378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390,
391, 392])
def cpci5(x, N, M, n, lciw, uciw):
    f = np.vectorize(hypergeom.pmf)
    idd = np.vectorize(ind)
    X, m = np.meshgrid(x, M)
    kk = idd(m, lciw, uciw) * f(X, N, m, n)  # idd just implements the test lciw <= m <= uciw
    return min(pd.Series(kk.sum(axis=1)))
M = np.arange(0, N+1)  # Initial implementation of the function
ii = 0
while (ii < len(aa)+1):
    lciw[xvalue-1] = aa[ii]
    uciw[xvalue-1] = N - aa[ii]
    bb = min(cpci5(x, N, M, n, lciw, uciw))
    if bb >= 1-alpha:
        ii1 = ii
        ii += 1
    else:
        ii = len(aa)+1
lciw[xvalue-1] = aa[ii1]
uciw[xvalue-1] = N - lciw[xvalue-1]
M = np.arange(0, N+1)  # Distributed version
ii = 0
B = []
while (ii < len(aa)):
    lciw[xvalue-1] = aa[ii]
    uciw[xvalue-1] = N - aa[ii]
    B.append(client.submit(cpci5, x, N, M, n, lciw, uciw))
    ii += 1
bb = np.array(client.gather(B))
ii1 = len(bb[bb > 1-alpha]) - 1
lciw[xvalue-1] = aa[ii1]
uciw[xvalue-1] = N - lciw[xvalue-1]
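Separately from the Dask question: the two np.vectorize wrappers in cpci5 call hypergeom.pmf and ind once per grid element from Python, while scipy.stats.hypergeom.pmf already broadcasts over NumPy arrays. The sketch below is a vectorized variant under the assumption that ind(m, lciw, uciw) returns 1 when lciw <= m <= uciw and 0 otherwise, as the comment in the question says; cpci5_vectorized is a hypothetical name, and it returns the same quantity as cpci5 (the smallest row sum).

```python
import numpy as np
from scipy.stats import hypergeom


def cpci5_vectorized(x, N, M, n, lciw, uciw):
    """Hypothetical vectorized variant of cpci5 (no np.vectorize).

    Assumes ind(m, lciw, uciw) means lciw <= m <= uciw, elementwise.
    """
    # Same grids as the original: X[i, j] = x[j], m[i, j] = M[i]
    X, m = np.meshgrid(x, M)
    # lciw and uciw have one entry per x value, so they broadcast along each row
    inside = (lciw <= m) & (m <= uciw)
    # hypergeom.pmf(k, M, n, N) broadcasts natively; argument order as in the question
    kk = inside * hypergeom.pmf(X, N, m, n)
    # Sum over x for every candidate M, then take the smallest row sum
    return kk.sum(axis=1).min()
```

With the question's sizes (N = 392, n = 326) this evaluates one 393 x 327 grid per call in compiled code instead of about 128,000 Python-level calls.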

How to deal with large array in numpy/python when memory becomes issue

I have an array all with shape (19494500, 376). I need to rearrange these 376 columns into a particular order that I have generated:
l
array([ 0, 94, 188, 282, 1, 95, 189, 283, 2, 96, 190, 284, 3,
97, 191, 285, 4, 98, 192, 286, 5, 99, 193, 287, 6, 100,
194, 288, 7, 101, 195, 289, 8, 102, 196, 290, 9, 103, 197,
291, 10, 104, 198, 292, 11, 105, 199, 293, 12, 106, 200, 294,
13, 107, 201, 295, 14, 108, 202, 296, 15, 109, 203, 297, 16,
110, 204, 298, 17, 111, 205, 299, 18, 112, 206, 300, 19, 113,
207, 301, 20, 114, 208, 302, 21, 115, 209, 303, 22, 116, 210,
304, 23, 117, 211, 305, 24, 118, 212, 306, 25, 119, 213, 307,
26, 120, 214, 308, 27, 121, 215, 309, 28, 122, 216, 310, 29,
123, 217, 311, 30, 124, 218, 312, 31, 125, 219, 313, 32, 126,
220, 314, 33, 127, 221, 315, 34, 128, 222, 316, 35, 129, 223,
317, 36, 130, 224, 318, 37, 131, 225, 319, 38, 132, 226, 320,
39, 133, 227, 321, 40, 134, 228, 322, 41, 135, 229, 323, 42,
136, 230, 324, 43, 137, 231, 325, 44, 138, 232, 326, 45, 139,
233, 327, 46, 140, 234, 328, 47, 141, 235, 329, 48, 142, 236,
330, 49, 143, 237, 331, 50, 144, 238, 332, 51, 145, 239, 333,
52, 146, 240, 334, 53, 147, 241, 335, 54, 148, 242, 336, 55,
149, 243, 337, 56, 150, 244, 338, 57, 151, 245, 339, 58, 152,
246, 340, 59, 153, 247, 341, 60, 154, 248, 342, 61, 155, 249,
343, 62, 156, 250, 344, 63, 157, 251, 345, 64, 158, 252, 346,
65, 159, 253, 347, 66, 160, 254, 348, 67, 161, 255, 349, 68,
162, 256, 350, 69, 163, 257, 351, 70, 164, 258, 352, 71, 165,
259, 353, 72, 166, 260, 354, 73, 167, 261, 355, 74, 168, 262,
356, 75, 169, 263, 357, 76, 170, 264, 358, 77, 171, 265, 359,
78, 172, 266, 360, 79, 173, 267, 361, 80, 174, 268, 362, 81,
175, 269, 363, 82, 176, 270, 364, 83, 177, 271, 365, 84, 178,
272, 366, 85, 179, 273, 367, 86, 180, 274, 368, 87, 181, 275,
369, 88, 182, 276, 370, 89, 183, 277, 371, 90, 184, 278, 372,
91, 185, 279, 373, 92, 186, 280, 374, 93, 187, 281, 375])
So I am doing the following:
all_c = all[:,l]
but I get a "memory error".
Can you suggest the most memory-efficient way to do this?
Rather than permuting the whole array at once, you can do it row by row, in place. Try:
for r in range(all.shape[0]):
    all[r] = all[r, l]
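all[:, l] allocates a full second copy of the array: 19494500 x 376 = 7,329,932,000 elements, which for a float64 array (the dtype is not stated in the question) is about 58.6 GB. The row-by-row loop avoids that but runs about 19.5 million Python iterations. A middle ground, sketched below under the assumption that a temporary block of chunk_rows x 376 elements fits in memory, is to permute blocks of rows in place; permute_columns_inplace and chunk_rows are hypothetical names. Incidentally, the order l in the question appears to equal np.arange(376).reshape(4, 94).T.ravel().

```python
import numpy as np


def permute_columns_inplace(arr, order, chunk_rows=100_000):
    """Reorder the columns of arr in place, one block of rows at a time.

    Only a (chunk_rows, n_cols) temporary exists at any moment, instead of
    the full copy that arr[:, order] would allocate.
    """
    order = np.asarray(order)
    for start in range(0, arr.shape[0], chunk_rows):
        stop = min(start + chunk_rows, arr.shape[0])
        # Fancy indexing on the right-hand side copies just this block,
        # so writing it back over the same rows is safe.
        arr[start:stop] = arr[start:stop][:, order]
    return arr


# Column order from the question: 0, 94, 188, 282, 1, 95, 189, 283, ...
order = np.arange(376).reshape(4, 94).T.ravel()
```

Larger chunks mean fewer Python iterations but a bigger temporary; the variable name all in the question also shadows Python's built-in all().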

Unable to find the index in array

Sorry for the very basic question, but I have an issue with Python (this is my very first Python script) and I don't understand the root cause.
Below is my code:
# Split our dataset into train and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
# Compute the coefficients
est = sm.OLS(Y_train, X_train).fit()
# Store the coefficients in variables
coefSurface = est.params["surface"]
coefArrondissement = est.params["arrondissement"]
indices = [i for i, x in enumerate(X_train["surface"]) if x != "whatever"]
indices2 = [i for i, x in enumerate(X_train["arrondissement"]) if x != "whatever"]
print(indices)
print(indices2)
# TRAINING
predicted_prices = []
for n in range(0, len(Y_train)):
    print((coefSurface * X_train["surface"][n]) + (coefArrondissement * X_train["arrondissement"][n]))
This code displays:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 
421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 
421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656]
1670.5735272748664
1481.1109472016328
2001.2043042654109
1666.8585747244108
1778.3071512380775
2446.9986103200777
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-332-7502a7c24282> in <module>()
20 predicted_prices = []
21 for n in range(0, len(Y_train)):
---> 22 print((coefSurface * X_train["surface"][n]) + (coefArrondissement * X_train["arrondissement"][n]))
23 # #predicted_prices.append( (coefSurface * X_train["surface"][n]) + (coefArrondissement * X_train["arrondissement"][n]))
24
/opt/conda/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
621 key = com._apply_if_callable(key, self)
622 try:
--> 623 result = self.index.get_value(self, key)
624
625 if not is_scalar(result):
/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
2558 try:
2559 return self._engine.get_value(s, k,
-> 2560 tz=getattr(series.dtype, 'tz', None))
2561 except KeyError as e1:
2562 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 6
My first two prints show all the available indices in both lists (0 to 656).
But when I use a for loop to do a calculation on each value of my array, Python crashes (not always at the same index) because it seems unable to retrieve an index from the list: 'KeyError: 6'.
If you need the full code, I can give it to you.
Many thanks for your help
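Python raises a KeyError whenever a key is looked up in a dict() object and the key is not in the dictionary.
Here X_train["surface"] is not a list but a pandas Series (the traceback goes through pandas/core/series.py), and X_train["surface"][n] looks n up as an index label, much like a dictionary key. train_test_split shuffles the rows and keeps only part of them, so the labels in X_train are not 0, 1, 2, ... in order, some values of n are not labels at all, and which ones are missing changes from run to run.
enumerate(X_train["surface"]) only counts the items (0, 1, 2, ...); those positions do not necessarily exist as labels in X_train["surface"], because enumerate makes no reference to the actual index.
For example, with a plain dictionary: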
my_dict = {1: 'a', 2: 'b', 3: 'c', 5: 'e', 6: 'f'}
indices = [i for i, x in enumerate(my_dict)]
print(indices)
# [0, 1, 2, 3, 4]
print(my_dict[4])
# KeyError! Nothing in my_dict has the key 4!
To get the actual keys in your example, try:
indices = [i for i, x in X_train["surface"].items() if x != "whatever"]
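Two ways to avoid the label lookup in the prediction loop are sketched below: positional access with .iloc, or computing every prediction at once with Series arithmetic. The data and coefficients are made up for illustration (the shuffled index mimics what train_test_split produces); with statsmodels, est.predict(X_train) should give the same values for this model, since it has no intercept.

```python
import pandas as pd

# Made-up stand-in for X_train after train_test_split: the index labels are
# shuffled, so label 0 is not the first row and label 1 does not exist.
X_train = pd.DataFrame(
    {"surface": [30.0, 55.0, 42.0], "arrondissement": [3, 11, 5]},
    index=[2, 0, 7],
)
coefSurface, coefArrondissement = 40.0, 10.0  # made-up coefficients

# Option 1: .iloc[n] means "the n-th row", whatever its label is
predicted_prices = []
for n in range(len(X_train)):
    predicted_prices.append(
        coefSurface * X_train["surface"].iloc[n]
        + coefArrondissement * X_train["arrondissement"].iloc[n]
    )

# Option 2: vectorized; the result keeps X_train's index labels
predicted = coefSurface * X_train["surface"] + coefArrondissement * X_train["arrondissement"]
print(predicted)
```

The vectorized form avoids the Python loop entirely and keeps each prediction aligned with its original row label.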
