I have a printed output:
{-1: [2, 10, 11, 13, 16, 19, 24, 28, 30, 32, 34, 35, 36, 40, 42, 49, 54, 56, 59, 64, 66, 78, 94, 99, 101, 102, 103, 106, 107, 109, 110, 114, 117, 123, 126, 127, 129, 131, 132, 133, 136, 144, 146, 147, 150, 155, 156, 164, 166, 177, 179, 181, 182, 188, 190, 192, 194, 201, 202, 204, 209, 214, 217, 220, 221, 225, 231, 232, 234, 235, 236, 240, 244, 246, 248, 253, 254, 257, 259, 260, 261, 262, 263, 264, 265, 266, 268, 271, 275, 277, 279, 280, 281, 285, 286, 287, 288, 297, 302, 309, ...], 0: [3, 6, 8, 25, 27, 33, 38, 57, 62, 63, 67, 69, 70, 72, 74, 83, 89, 91, 92, 98, 111, 112, 122, 124, 135, 158, 175, 187, 197, 198, 199, 200, 205, 206, 207, 215, 216, 242, 243, 258, 267, 272, 283, 299, 300, 303, 305, 306, 307, 310, 311, 312, 313, 314, 315, 316, 319, 326, 329, 348, 353, 355, 376, 377, 378, 380, 385, 386, 387, 389, 399, 402, 406, 418, 424, 425, 426, 427, 431, 432, 433, 434, 435, 447, 486, 487, 503, 511, 512, 514, 515, 524, 525, 535, 536, 539, 547, 549, 550, 554, ...], 1: [0, 5, 21, 44, 46, 48, 51, 82, 115, 118, 274, 293, 330, 331, 332, 361, 401, 413, 507, 520, 522, 523, 558, 560, 643, 650, 681, 700, 734, 747, 753, 782, 784, 836, 839, 893, 905, 934, 951, 976, 999, 1037, 1048, 1052, 1053, 1082, 1109, 1113, 1115, 1121, 1139, 1146, 1219, 1221, 1264, 1355, 1382, 1392, 1432, 1467, 1485, 1490, 1497, 1513, 1526, 1565, 1682, 1728, 1737, 1738, 1806, 1815, 1824, 1828, 1844, 1845, 1885, 1959, 2014, 2017, 2029, 2052, 2072, 2153, 2157, 2168, 2193, 2199, 2214, 2228, 2232, 2240, 2243, 2264, 2300, 2317, 2353, 2376, 2402, 2405, ...], 2: [15, 39, 60, 61, 149, 157, 222, 250, 289, 320, 448, 538, 630, 658, 662, 665, 709, 759, 810, 837, 897, 901, 917, 924, 925, 945, 946, 954, 959, 1049, 1050, 1090, 1131, 1140, 1154, 1172, 1251, 1300, 1313, 1328, 1387, 1393, 1431, 1440, 1448, 1475, 1507, 1535, 1591, 1597, 1603, 1615, 1636, 1705, 1725, 1736, 1771, 1777, 1791, 1796, 1855, 1867, 1903, 1918, 1928, 1930, 1942, 1943, 1989, 2021, 2039, 2095, 2119, 2169, 2195, 2309, 2337, 2418, 2426, 2429, 2522, 2582, 
2598, 2678, 2679, 2682], 3: [50, 113, 160, 213, 224, 229, 238, 239, 352, 400, 409, 506, 545, 570, 701, 703, 712, 716, 830, 838, 858, 921, 1008, 1078, 1124, 1130, 1194, 1214, 1305, 1308, 1311, 1360, 1421, 1441, 1473, 1476, 1532, 1533, 1548, 1580, 1616, 1622, 1649, 1679, 1735, 1883, 1897, 1920, 1985, 2015, 2084, 2091, 2097, 2118, 2152, 2181, 2212, 2223, 2237, 2249, 2310, 2313, 2347, 2369, 2381, 2390, 2470, 2496, 2511, 2514, 2529, 2549, 2569, 2601, 2626, 2666, 2688],
Is it possible to put this into a DataFrame with two columns? For example:
Number
Value
-1
[2, 10, 11, 13, 16, 19, 24, 28, 30, 32, 34, 35, 36, 40, 42, 49, 54, 56, 59, 64, 66, 78, 94, 99, 101, 102, 103, 106, 107, 109, 110, 114, 117, 123, 126, 127, 129, 131, 132, 133, 136, 144, 146, 147, 150, 155, 156, 164, 166, 177, 179, 181, 182, 188, 190, 192, 194, 201, 202, 204, 209, 214, 217, 220, 221, 225, 231, 232, 234, 235, 236, 240, 244, 246, 248, 253, 254, 257, 259, 260, 261, 262, 263, 264, 265, 266, 268, 271, 275, 277, 279, 280, 281, 285, 286, 287, 288, 297, 302, 309, ...]
0
[3, 6, 8, 25, 27, 33, 38, 57, 62, 63, 67, 69, 70, 72, 74, 83, 89, 91, 92, 98, 111, 112, 122, 124, 135, 158, 175, 187, 197, 198, 199, 200, 205, 206, 207, 215, 216, 242, 243, 258, 267, 272, 283, 299, 300, 303, 305, 306, 307, 310, 311, 312, 313, 314, 315, 316, 319, 326, 329, 348, 353, 355, 376, 377, 378, 380, 385, 386, 387, 389, 399, 402, 406, 418, 424, 425, 426, 427, 431, 432, 433, 434, 435, 447, 486, 487, 503, 511, 512, 514, 515, 524, 525, 535, 536, 539, 547, 549, 550, 554, ...],
Try:
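import pandas as pd

dct = {
    -1: [2, 10, 11],
    0: [3, 6, 27, 33],
    1: [0, 5, 21],
    2: [15],
    3: [50, 113, 160, 213, 224],
}
# One row per dictionary entry: the key goes to "Number", the list to "Value"
df = pd.DataFrame({"Number": list(dct.keys()), "Value": list(dct.values())})
print(df)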
Prints:
   Number                     Value
0      -1               [2, 10, 11]
1       0            [3, 6, 27, 33]
2       1                [0, 5, 21]
3       2                      [15]
4       3  [50, 113, 160, 213, 224]
# d is the dictionary from the question
df = pd.DataFrame()
df["Value"] = list(d.values())
df.index = list(d.keys())

# OR
df = pd.DataFrame.from_dict({k: [v] for k, v in d.items()},
                            orient="index",
                            columns=["Value"])
print(df)
# Value
# -1 [2, 10, 11, 13, 16, 19, 24, 28, 30, 32, 34, 35...
# 0 [3, 6, 8, 25, 27, 33, 38, 57, 62, 63, 67, 69, ...
# 1 [0, 5, 21, 44, 46, 48, 51, 82, 115, 118, 274, ...
# 2 [15, 39, 60, 61, 149, 157, 222, 250, 289, 320,...
# 3 [50, 113, 160, 213, 224, 229, 238, 239, 352, 4...
The following list is generated from a function.
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
Now I want to combine the numbers to generate new numbers, and I expect the new list to be:
[23, 25, 27, 211, 213, 217, 219, 223, 229, 231, 32, 35, 37, 311, 313, 319, 323, 329, 331, 337, 52, 53, 57, 511, 513, 517, 519, 523, 529, 531, 537, 72, 73, 75, 711, 713, 717, .. ...]
How can this be done in Python?
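Use itertools.combinations to pair each number with every later one (for both orders, such as 32 as well as 23, use itertools.permutations instead):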
>>> import itertools
>>> [int(f"{a}{b}") for a, b in itertools.combinations([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37], 2)]
[23, 25, 27, 211, 213, 217, 219, 223, 229, 231, 237, 35, 37, 311, 313, 317, 319, 323, 329, 331, 337, 57, 511, 513, 517, 519, 523, 529, 531, 537, 711, 713, 717, 719, 723, 729, 731, 737, 1113, 1117, 1119, 1123, 1129, 1131, 1137, 1317, 1319, 1323, 1329, 1331, 1337, 1719, 1723, 1729, 1731, 1737, 1923, 1929, 1931, 1937, 2329, 2331, 2337, 2931, 2937, 3137]
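The expected list in the question also contains reversed pairs such as 32 and 52, which combinations never produces. A minimal sketch of the ordered-pair variant, assuming every ordered pair of distinct list positions is wanted (the names primes and combined are only illustrative):

```python
from itertools import permutations

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

# permutations(..., 2) yields every ordered pair of distinct positions,
# so both (2, 3) and (3, 2) appear in the result.
combined = [int(f"{a}{b}") for a, b in permutations(primes, 2)]

print(combined[:13])
# [23, 25, 27, 211, 213, 217, 219, 223, 229, 231, 237, 32, 35]
print(len(combined))
# 132
```

With 12 input numbers this gives 12 * 11 = 132 results, twice the 66 produced by combinations.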
x = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
new_list = []
for i in x:
    for j in x:
        if j != i:
            new_list.append(int(str(i) + str(j)))
# new_list is [23, 25, 27, 211, 213, 217, 219, 223, 229, 231, 237, 32, ...]
I'm new to Dask and still trying to figure out how to get this running smoothly. I've been experimenting with the futures API and I get some surprising results.
I have a simple while loop in my code that calls the function cpci5. When I %timeit the execution of the function I get:
%timeit min(cpci5(x,N,M,n,lciw,uciw))
6.74 s ± 178 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Now I've run the same function using the distributed scheduler in Dask, and I get this:
from dask.distributed import Client
client= Client()
%timeit B.append(client.submit(cpci5,x,N,M,n,lciw,uciw)); bb = np.array(client.gather(B))
29 ms ± 6.01 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
So far so good: the expected improvement is there. But when I time the execution of the loop itself, where the function is called, I see barely any difference, and both versions run in approximately 19 s. (I've profiled the initial loop, and more than 90% of the computation time is spent in the function, so the improvement should be there.)
The results of the two versions are consistent and identical.
What could cause such a difference?
You'll find the relevant piece of code below.
PS: I've already gone as far as I could in code optimization, but it's not enough in my case.
N = 392
n = 326
x = np.arange(0, n+1)
if np.floor(n/2) == n/2:
    xvalue = int(n/2 + 1)
else:
    xvalue = int((n+1)/2)
aa = np.arange(lciw[xvalue-1], np.floor(N/2)).astype(int)
lciw :
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26,
27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41,
42, 43, 44, 45, 46, 47, 49, 50, 51, 52, 53, 54, 55,
57, 58, 59, 60, 61, 62, 64, 65, 66, 67, 68, 69, 70,
72, 73, 74, 75, 76, 77, 79, 80, 81, 82, 83, 84, 86,
87, 88, 89, 90, 91, 93, 94, 95, 96, 97, 98, 100, 101,
102, 103, 104, 105, 107, 108, 109, 110, 111, 112, 114, 115, 116,
117, 118, 120, 121, 122, 123, 124, 125, 127, 128, 129, 130, 131,
132, 134, 135, 136, 137, 138, 140, 141, 142, 143, 144, 146, 147,
148, 149, 150, 151, 153, 154, 155, 156, 157, 159, 160, 161, 162,
163, 165, 166, 167, 168, 169, 170, 172, 173, 174, 175, 176, 178,
179, 180, 181, 182, 184, 185, 186, 189, 188, 190, 191, 192, 193,
194, 196, 197, 198, 199, 200, 202, 203, 204, 205, 206, 208, 209,
210, 211, 212, 214, 215, 216, 217, 218, 220, 221, 222, 223, 225,
226, 227, 228, 229, 231, 232, 233, 234, 235, 237, 238, 239, 240,
241, 243, 244, 245, 246, 248, 249, 250, 251, 252, 254, 255, 256,
257, 258, 260, 261, 262, 263, 265, 266, 267, 268, 269, 271, 272,
273, 274, 276, 277, 278, 279, 280, 282, 283, 284, 285, 287, 288,
289, 290, 292, 293, 294, 295, 296, 298, 299, 300, 301, 303, 304,
305, 306, 308, 309, 310, 311, 313, 314, 315, 316, 317, 319, 320,
321, 322, 324, 325, 326, 327, 329, 330, 331, 332, 334, 335, 336,
338, 339, 340, 341, 343, 344, 345, 346, 348, 349, 350, 351, 353,
354, 355, 357, 358, 359, 360, 362, 363, 364, 366, 367, 368, 369,
371, 372, 373, 375, 376, 377, 379, 380, 382, 383, 384, 386, 387,
389, 390])
uciw :
array([2, 3, 5, 6, 8, 9, 10, 12, 13, 15, 16, 17, 19,
20, 21, 23, 24, 25, 26, 28, 29, 30, 32, 33, 34, 35,
37, 38, 39, 41, 42, 43, 44, 46, 47, 48, 49, 51, 52,
53, 54, 56, 57, 58, 60, 61, 62, 63, 65, 66, 67, 68,
70, 71, 72, 73, 75, 76, 77, 78, 79, 81, 82, 83, 84,
86, 87, 88, 89, 91, 92, 93, 94, 96, 97, 98, 99, 100,
102, 103, 104, 105, 107, 108, 109, 110, 112, 113, 114, 115, 116,
118, 119, 120, 121, 123, 124, 125, 126, 127, 129, 130, 131, 132,
134, 135, 136, 137, 138, 140, 141, 142, 143, 144, 146, 147, 148,
149, 151, 152, 153, 154, 155, 157, 158, 159, 160, 161, 163, 164,
165, 166, 167, 169, 170, 171, 172, 174, 175, 176, 177, 178, 180,
181, 182, 183, 184, 186, 187, 188, 189, 190, 192, 193, 194, 195,
196, 198, 199, 200, 201, 202, 204, 203, 206, 207, 208, 210, 211,
212, 213, 214, 216, 217, 218, 219, 220, 222, 223, 224, 225, 226,
227, 229, 230, 231, 232, 233, 235, 236, 237, 238, 239, 241, 242,
243, 244, 245, 246, 248, 249, 250, 251, 252, 254, 255, 256, 257,
258, 260, 261, 262, 263, 264, 265, 267, 268, 269, 270, 271, 272,
274, 275, 276, 277, 278, 280, 281, 282, 283, 284, 285, 287, 288,
289, 290, 291, 292, 294, 295, 296, 297, 298, 299, 301, 302, 303,
304, 305, 306, 308, 309, 310, 311, 312, 313, 315, 316, 317, 318,
319, 320, 322, 323, 324, 325, 326, 327, 328, 330, 331, 332, 333,
334, 335, 337, 338, 339, 340, 341, 342, 343, 345, 346, 347, 348,
349, 350, 351, 352, 354, 355, 356, 357, 358, 359, 360, 361, 363,
364, 365, 366, 367, 368, 369, 370, 371, 373, 374, 375, 376, 377,
378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390,
391, 392])
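import numpy as np
import pandas as pd
from scipy.stats import hypergeom  # assumed source of hypergeom

def cpci5(x, N, M, n, lciw, uciw):
    f = np.vectorize(hypergeom.pmf)
    idd = np.vectorize(ind)  # ind (not shown) just implements the test lciw <= m <= uciw
    X, m = np.meshgrid(x, M)
    kk = idd(m, lciw, uciw) * f(X, N, m, n)
    return min(pd.Series(kk.sum(axis=1)))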
M = np.arange(0, N+1)  # Initial implementation of the function
ii = 0
while ii < len(aa)+1:
    lciw[xvalue-1] = aa[ii]
    uciw[xvalue-1] = N - aa[ii]
    bb = min(cpci5(x, N, M, n, lciw, uciw))
    if bb >= 1-alpha:
        ii1 = ii
        ii += 1
    else:
        ii = len(aa)+1
lciw[xvalue-1] = aa[ii1]
uciw[xvalue-1] = N - lciw[xvalue-1]
M = np.arange(0, N+1)  # Distributed version
ii = 0
B = []
while ii < len(aa):
    lciw[xvalue-1] = aa[ii]
    uciw[xvalue-1] = N - aa[ii]
    B.append(client.submit(cpci5, x, N, M, n, lciw, uciw))
    ii += 1
bb = np.array(client.gather(B))
ii1 = len(bb[bb > 1-alpha]) - 1
lciw[xvalue-1] = aa[ii1]
uciw[xvalue-1] = N - lciw[xvalue-1]
I have an array, all, with dimensions (19494500, 376). I need to arrange these 376 columns in a particular sequence that I have generated:
l
array([ 0, 94, 188, 282, 1, 95, 189, 283, 2, 96, 190, 284, 3,
97, 191, 285, 4, 98, 192, 286, 5, 99, 193, 287, 6, 100,
194, 288, 7, 101, 195, 289, 8, 102, 196, 290, 9, 103, 197,
291, 10, 104, 198, 292, 11, 105, 199, 293, 12, 106, 200, 294,
13, 107, 201, 295, 14, 108, 202, 296, 15, 109, 203, 297, 16,
110, 204, 298, 17, 111, 205, 299, 18, 112, 206, 300, 19, 113,
207, 301, 20, 114, 208, 302, 21, 115, 209, 303, 22, 116, 210,
304, 23, 117, 211, 305, 24, 118, 212, 306, 25, 119, 213, 307,
26, 120, 214, 308, 27, 121, 215, 309, 28, 122, 216, 310, 29,
123, 217, 311, 30, 124, 218, 312, 31, 125, 219, 313, 32, 126,
220, 314, 33, 127, 221, 315, 34, 128, 222, 316, 35, 129, 223,
317, 36, 130, 224, 318, 37, 131, 225, 319, 38, 132, 226, 320,
39, 133, 227, 321, 40, 134, 228, 322, 41, 135, 229, 323, 42,
136, 230, 324, 43, 137, 231, 325, 44, 138, 232, 326, 45, 139,
233, 327, 46, 140, 234, 328, 47, 141, 235, 329, 48, 142, 236,
330, 49, 143, 237, 331, 50, 144, 238, 332, 51, 145, 239, 333,
52, 146, 240, 334, 53, 147, 241, 335, 54, 148, 242, 336, 55,
149, 243, 337, 56, 150, 244, 338, 57, 151, 245, 339, 58, 152,
246, 340, 59, 153, 247, 341, 60, 154, 248, 342, 61, 155, 249,
343, 62, 156, 250, 344, 63, 157, 251, 345, 64, 158, 252, 346,
65, 159, 253, 347, 66, 160, 254, 348, 67, 161, 255, 349, 68,
162, 256, 350, 69, 163, 257, 351, 70, 164, 258, 352, 71, 165,
259, 353, 72, 166, 260, 354, 73, 167, 261, 355, 74, 168, 262,
356, 75, 169, 263, 357, 76, 170, 264, 358, 77, 171, 265, 359,
78, 172, 266, 360, 79, 173, 267, 361, 80, 174, 268, 362, 81,
175, 269, 363, 82, 176, 270, 364, 83, 177, 271, 365, 84, 178,
272, 366, 85, 179, 273, 367, 86, 180, 274, 368, 87, 181, 275,
369, 88, 182, 276, 370, 89, 183, 277, 371, 90, 184, 278, 372,
91, 185, 279, 373, 92, 186, 280, 374, 93, 187, 281, 375])
So I am doing the following:
all_c = all[:,l]
but I am getting a "memory error".
Can you suggest what could be the most memory-efficient way?
Rather than permuting the whole array at once, you can do it row by row, in place. Try:
for r in range(all.shape[0]):
    all[r] = all[r, l]
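A Python-level loop over 19 million rows may be slow, so the same idea can be applied to blocks of rows instead. The following is only a sketch under assumptions: the helper name permute_columns_inplace and the chunk_rows default are hypothetical, and the small demo array stands in for the real data.

```python
import numpy as np

def permute_columns_inplace(arr, order, chunk_rows=100_000):
    """Reorder the columns of arr in place, one block of rows at a time."""
    for start in range(0, arr.shape[0], chunk_rows):
        stop = start + chunk_rows
        # Fancy indexing copies only this block; the copy is fully built
        # before it is written back over the same rows.
        arr[start:stop] = arr[start:stop, order]
    return arr

# Small demo with the same interleaving pattern as l in the question
rng = np.random.default_rng(0)
data = rng.integers(0, 100, size=(10, 8))
order = np.arange(8).reshape(4, 2).T.ravel()  # [0, 2, 4, 6, 1, 3, 5, 7]
expected = data[:, order].copy()

permute_columns_inplace(data, order, chunk_rows=3)
print(np.array_equal(data, expected))
```

The temporary memory needed at any moment is one block of chunk_rows rows rather than a second copy of the whole array, so chunk_rows can be tuned to whatever fits in RAM.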
Sorry for the very basic question, but I have an issue in Python (this is my very first Python script) and I don't understand the root cause.
Here is my code:
# Split the dataset into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
# Compute the coefficients
est = sm.OLS(Y_train, X_train).fit()
# Store the coefficients in variables
coefSurface = est.params["surface"]
coefArrondissement = est.params["arrondissement"]
indices = [i for i, x in enumerate(X_train["surface"]) if x != "whatever"]
indices2 = [i for i, x in enumerate(X_train["arrondissement"]) if x != "whatever"]
print(indices)
print(indices2)
#TRAINING
predicted_prices = []
for n in range(0, len(Y_train)):
    print((coefSurface * X_train["surface"][n]) + (coefArrondissement * X_train["arrondissement"][n]))
This code displays:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 
421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 
421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656]
1670.5735272748664
1481.1109472016328
2001.2043042654109
1666.8585747244108
1778.3071512380775
2446.9986103200777
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-332-7502a7c24282> in <module>()
20 predicted_prices = []
21 for n in range(0, len(Y_train)):
---> 22 print((coefSurface * X_train["surface"][n]) + (coefArrondissement * X_train["arrondissement"][n]))
23 # #predicted_prices.append( (coefSurface * X_train["surface"][n]) + (coefArrondissement * X_train["arrondissement"][n]))
24
/opt/conda/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
621 key = com._apply_if_callable(key, self)
622 try:
--> 623 result = self.index.get_value(self, key)
624
625 if not is_scalar(result):
/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
2558 try:
2559 return self._engine.get_value(s, k,
-> 2560 tz=getattr(series.dtype, 'tz', None))
2561 except KeyError as e1:
2562 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 6
My first two prints show all the available indices in both lists (0 to 656).
But when I use a for loop to do a calculation on each value of my array, Python crashes (not always at the same index) because it seems unable to retrieve an index from the list: 'KeyError: 6'.
If you need the full code, I can give it to you.
Many thanks for your help.
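Python raises a KeyError whenever a mapping is asked for a key that it does not contain.
The traceback shows the lookup happening in pandas' Series.__getitem__, so X_train["surface"] is a pandas Series, not a list. With an integer index, X_train["surface"][n] looks n up as an index label, much as a dictionary looks up a key, and not as a position. train_test_split shuffles the rows and keeps their original labels, so the label 6 evidently ended up in the test set and does not exist in X_train.
enumerate(X_train["surface"]) simply counts the items from 0 upwards; those positions do not necessarily exist as labels in X_train["surface"], because the count makes no reference to the actual keys.
For example, with a plain dictionary: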
my_dict = {1: 'a', 2: 'b', 3: 'c', 5: 'e', 6: 'f'}
indices = [i for i, x in enumerate(my_dict)]
print(indices)
# [0, 1, 2, 3, 4]
print(my_dict[4])
# KeyError! Nothing in my_dict has the key 4!
To get the actual keys in your example, try:
indices = [i for i, x in X_train["surface"].items() if x != "whatever"]
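To make the difference between labels and positions concrete, here is a small sketch with made-up data. The DataFrame contents and the two coefficient values are hypothetical, and the .loc selection merely stands in for a shuffled train_test_split result. It shows the label lookup failing, then two ways to compute the predictions: by position with .iloc, or without a loop at all.

```python
import pandas as pd

# Made-up data; the real X comes from the asker's dataset
X = pd.DataFrame({"surface": [30.0, 45.0, 60.0, 75.0, 90.0],
                  "arrondissement": [1, 2, 3, 4, 10]})

# Stand-in for a shuffled training split: rows keep their original labels
X_train = X.loc[[4, 0, 3]]

coef_surface, coef_arrondissement = 20.0, 5.0  # hypothetical coefficients

# Label 1 is not in X_train, so a label-based lookup fails
try:
    X_train["surface"][1]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Option 1: look rows up by position with .iloc
by_position = [
    coef_surface * X_train["surface"].iloc[n]
    + coef_arrondissement * X_train["arrondissement"].iloc[n]
    for n in range(len(X_train))
]

# Option 2: let pandas do the arithmetic on whole columns
vectorized = (coef_surface * X_train["surface"]
              + coef_arrondissement * X_train["arrondissement"])

print(label_lookup_failed)   # True
print(by_position)           # [1850.0, 605.0, 1520.0]
print(vectorized.tolist())   # [1850.0, 605.0, 1520.0]
```

The vectorized form keeps the original index labels on the result, which makes it easy to line the predictions up with Y_train afterwards.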