Unable to find the index in array

Unable to find the index in array - python

Sorry for the very baisc question, but I have an issue on Python (this is my very first python script). I don't understand the root cause
Below my code:
# On découpe notre dataset en train et en test
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
# On calcule les coefficients
est = sm.OLS(Y_train, X_train).fit()
# Export les coef dans des variables
coefSurface = est.params["surface"]
coefArrondissement = est.params["arrondissement"]
indices = [i for i, x in enumerate(X_train["surface"]) if x != "whatever"]
indices2 = [i for i, x in enumerate(X_train["arrondissement"]) if x != "whatever"]
print(indices)
print(indices2)
#TRAINING
predicted_prices = []
for n in range(0, len(Y_train)):
print((coefSurface * X_train["surface"][n]) + (coefArrondissement * X_train["arrondissement"][n]))
This code display:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656]
1670.5735272748664
1481.1109472016328
2001.2043042654109
1666.8585747244108
1778.3071512380775
2446.9986103200777
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-332-7502a7c24282> in <module>()
20 predicted_prices = []
21 for n in range(0, len(Y_train)):
---> 22 print((coefSurface * X_train["surface"][n]) + (coefArrondissement * X_train["arrondissement"][n]))
23 # #predicted_prices.append( (coefSurface * X_train["surface"][n]) + (coefArrondissement * X_train["arrondissement"][n]))
24
/opt/conda/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
621 key = com._apply_if_callable(key, self)
622 try:
--> 623 result = self.index.get_value(self, key)
624
625 if not is_scalar(result):
/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
2558 try:
2559 return self._engine.get_value(s, k,
-> 2560 tz=getattr(series.dtype, 'tz', None))
2561 except KeyError as e1:
2562 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 6
My two first print show my all available index on both list (0 to 656).
But when I doing a 'for' function to do a calculation on each value of my array Python crash (not always at the same index) because it seems that he is not able to retrieve a index on the list 'KeyError: 6'
If you need the full code and I give it to you
Many thanks for your help

Python raises a KeyError whenever a dict() object is requested and the key is not in the dictionary.
The fact that you see a KeyError here means that X_train["surface"] is a dictionary, not a list.
enumerate(X_train["surface"]) will create a new list of indices which do not necessarily exist as keys in the dictionary X_train["surface"] (it's basically just a count of each item in the dict, with no reference to its actual key).
For example
my_dict = {1: 'a', 2: 'b', 3: 'c', 5: 'e', 6: 'f'}
indices = [i for i, x in enumerate(my_dict)]
print(indices)
# [0, 1, 2, 3, 4]
print(my_dict[4])
# KeyError! Nothing in my_dict has the key 4!
To get the actual keys in your example, try:
indices = [i for i, x in X_train["surface"].items() if x != "whatever"]

Related

How to save printed console output to pandas dataframe in python?

I have a printed output:
{-1: [2, 10, 11, 13, 16, 19, 24, 28, 30, 32, 34, 35, 36, 40, 42, 49, 54, 56, 59, 64, 66, 78, 94, 99, 101, 102, 103, 106, 107, 109, 110, 114, 117, 123, 126, 127, 129, 131, 132, 133, 136, 144, 146, 147, 150, 155, 156, 164, 166, 177, 179, 181, 182, 188, 190, 192, 194, 201, 202, 204, 209, 214, 217, 220, 221, 225, 231, 232, 234, 235, 236, 240, 244, 246, 248, 253, 254, 257, 259, 260, 261, 262, 263, 264, 265, 266, 268, 271, 275, 277, 279, 280, 281, 285, 286, 287, 288, 297, 302, 309, ...], 0: [3, 6, 8, 25, 27, 33, 38, 57, 62, 63, 67, 69, 70, 72, 74, 83, 89, 91, 92, 98, 111, 112, 122, 124, 135, 158, 175, 187, 197, 198, 199, 200, 205, 206, 207, 215, 216, 242, 243, 258, 267, 272, 283, 299, 300, 303, 305, 306, 307, 310, 311, 312, 313, 314, 315, 316, 319, 326, 329, 348, 353, 355, 376, 377, 378, 380, 385, 386, 387, 389, 399, 402, 406, 418, 424, 425, 426, 427, 431, 432, 433, 434, 435, 447, 486, 487, 503, 511, 512, 514, 515, 524, 525, 535, 536, 539, 547, 549, 550, 554, ...], 1: [0, 5, 21, 44, 46, 48, 51, 82, 115, 118, 274, 293, 330, 331, 332, 361, 401, 413, 507, 520, 522, 523, 558, 560, 643, 650, 681, 700, 734, 747, 753, 782, 784, 836, 839, 893, 905, 934, 951, 976, 999, 1037, 1048, 1052, 1053, 1082, 1109, 1113, 1115, 1121, 1139, 1146, 1219, 1221, 1264, 1355, 1382, 1392, 1432, 1467, 1485, 1490, 1497, 1513, 1526, 1565, 1682, 1728, 1737, 1738, 1806, 1815, 1824, 1828, 1844, 1845, 1885, 1959, 2014, 2017, 2029, 2052, 2072, 2153, 2157, 2168, 2193, 2199, 2214, 2228, 2232, 2240, 2243, 2264, 2300, 2317, 2353, 2376, 2402, 2405, ...], 2: [15, 39, 60, 61, 149, 157, 222, 250, 289, 320, 448, 538, 630, 658, 662, 665, 709, 759, 810, 837, 897, 901, 917, 924, 925, 945, 946, 954, 959, 1049, 1050, 1090, 1131, 1140, 1154, 1172, 1251, 1300, 1313, 1328, 1387, 1393, 1431, 1440, 1448, 1475, 1507, 1535, 1591, 1597, 1603, 1615, 1636, 1705, 1725, 1736, 1771, 1777, 1791, 1796, 1855, 1867, 1903, 1918, 1928, 1930, 1942, 1943, 1989, 2021, 2039, 2095, 2119, 2169, 2195, 2309, 2337, 2418, 2426, 2429, 2522, 2582, 2598, 2678, 2679, 2682], 3: [50, 113, 160, 213, 224, 229, 238, 239, 352, 400, 409, 506, 545, 570, 701, 703, 712, 716, 830, 838, 858, 921, 1008, 1078, 1124, 1130, 1194, 1214, 1305, 1308, 1311, 1360, 1421, 1441, 1473, 1476, 1532, 1533, 1548, 1580, 1616, 1622, 1649, 1679, 1735, 1883, 1897, 1920, 1985, 2015, 2084, 2091, 2097, 2118, 2152, 2181, 2212, 2223, 2237, 2249, 2310, 2313, 2347, 2369, 2381, 2390, 2470, 2496, 2511, 2514, 2529, 2549, 2569, 2601, 2626, 2666, 2688],
Is it possible i can put this to dataframe
, suppose to column: For example:
Number
Value
-1
[2, 10, 11, 13, 16, 19, 24, 28, 30, 32, 34, 35, 36, 40, 42, 49, 54, 56, 59, 64, 66, 78, 94, 99, 101, 102, 103, 106, 107, 109, 110, 114, 117, 123, 126, 127, 129, 131, 132, 133, 136, 144, 146, 147, 150, 155, 156, 164, 166, 177, 179, 181, 182, 188, 190, 192, 194, 201, 202, 204, 209, 214, 217, 220, 221, 225, 231, 232, 234, 235, 236, 240, 244, 246, 248, 253, 254, 257, 259, 260, 261, 262, 263, 264, 265, 266, 268, 271, 275, 277, 279, 280, 281, 285, 286, 287, 288, 297, 302, 309, ...]
0
[3, 6, 8, 25, 27, 33, 38, 57, 62, 63, 67, 69, 70, 72, 74, 83, 89, 91, 92, 98, 111, 112, 122, 124, 135, 158, 175, 187, 197, 198, 199, 200, 205, 206, 207, 215, 216, 242, 243, 258, 267, 272, 283, 299, 300, 303, 305, 306, 307, 310, 311, 312, 313, 314, 315, 316, 319, 326, 329, 348, 353, 355, 376, 377, 378, 380, 385, 386, 387, 389, 399, 402, 406, 418, 424, 425, 426, 427, 431, 432, 433, 434, 435, 447, 486, 487, 503, 511, 512, 514, 515, 524, 525, 535, 536, 539, 547, 549, 550, 554, ...],

Try:
dct = {
-1: [2, 10, 11],
0: [3, 6, 27, 33],
1: [0, 5, 21],
2: [15],
3: [50, 113, 160, 213, 224],
}
df = pd.DataFrame({"Number": dct.keys(), "Value": dct.values()})
print(df)
Prints:
Number Value
0 -1 [2, 10, 11]
1 0 [3, 6, 27, 33]
2 1 [0, 5, 21]
3 2 [15]
4 3 [50, 113, 160, 213, 224]

df = pd.DataFrame()
df["Value"] = list(d.values())
df.index = d.keys()
# OR
df = pd.DataFrame.from_dict({k: [v] for k, v in d.items()},
orient="index",
columns=["Value"])
print(df)
# Value
# -1 [2, 10, 11, 13, 16, 19, 24, 28, 30, 32, 34, 35...
# 0 [3, 6, 8, 25, 27, 33, 38, 57, 62, 63, 67, 69, ...
# 1 [0, 5, 21, 44, 46, 48, 51, 82, 115, 118, 274, ...
# 2 [15, 39, 60, 61, 149, 157, 222, 250, 289, 320,...
# 3 [50, 113, 160, 213, 224, 229, 238, 239, 352, 4...

Dask distributed calculation performance drop in loop

I'm new to Dask and still trying to figure out how to get this running smoothly. I've been experimenting with future API and I get some surprising results.
I have a simple while loop in my code that call the function cpi5. When I %timeit the execution of the fucntion I get :
%timeit min(cpci5(x,N,M,n,lciw,uciw))
6.74 s ± 178 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Now I've run the same function but using a distributed sheduler in dask and I get this :
from dask.distributed import Client
client= Client()
%timeit B.append(client.submit(cpci5,x,N,M,n,lciw,uciw)); bb = np.array(client.gather(B))
29 ms ± 6.01 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
So far so good, the expected improvement is there but when I time the execution of the loop itself where the function is called, I barely get any difference and they both run in approximately 19s. (I've profile the initial loop and more than 90% of the computation time is due to the function so the improvement should be there.)
The results are coherent and identical.
What could cause such a difference ?
You'll find below the relevant piece of code.
PS : I've already gone as far as i could in code optimization but it's not enough in my case.
N = 392
n = 326
x = np.arange(0,n+1)
if np.floor(n/2) == n/2:
xvalue = int(n/2 +1)
else :
xvalue = int((n+1)/2)
aa = np.arange(lciw[xvalue-1],np.floor(N/2)).astype(int)
lciw :
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26,
27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41,
42, 43, 44, 45, 46, 47, 49, 50, 51, 52, 53, 54, 55,
57, 58, 59, 60, 61, 62, 64, 65, 66, 67, 68, 69, 70,
72, 73, 74, 75, 76, 77, 79, 80, 81, 82, 83, 84, 86,
87, 88, 89, 90, 91, 93, 94, 95, 96, 97, 98, 100, 101,
102, 103, 104, 105, 107, 108, 109, 110, 111, 112, 114, 115, 116,
117, 118, 120, 121, 122, 123, 124, 125, 127, 128, 129, 130, 131,
132, 134, 135, 136, 137, 138, 140, 141, 142, 143, 144, 146, 147,
148, 149, 150, 151, 153, 154, 155, 156, 157, 159, 160, 161, 162,
163, 165, 166, 167, 168, 169, 170, 172, 173, 174, 175, 176, 178,
179, 180, 181, 182, 184, 185, 186, 189, 188, 190, 191, 192, 193,
194, 196, 197, 198, 199, 200, 202, 203, 204, 205, 206, 208, 209,
210, 211, 212, 214, 215, 216, 217, 218, 220, 221, 222, 223, 225,
226, 227, 228, 229, 231, 232, 233, 234, 235, 237, 238, 239, 240,
241, 243, 244, 245, 246, 248, 249, 250, 251, 252, 254, 255, 256,
257, 258, 260, 261, 262, 263, 265, 266, 267, 268, 269, 271, 272,
273, 274, 276, 277, 278, 279, 280, 282, 283, 284, 285, 287, 288,
289, 290, 292, 293, 294, 295, 296, 298, 299, 300, 301, 303, 304,
305, 306, 308, 309, 310, 311, 313, 314, 315, 316, 317, 319, 320,
321, 322, 324, 325, 326, 327, 329, 330, 331, 332, 334, 335, 336,
338, 339, 340, 341, 343, 344, 345, 346, 348, 349, 350, 351, 353,
354, 355, 357, 358, 359, 360, 362, 363, 364, 366, 367, 368, 369,
371, 372, 373, 375, 376, 377, 379, 380, 382, 383, 384, 386, 387,
389, 390])
uciw :
array([2, 3, 5, 6, 8, 9, 10, 12, 13, 15, 16, 17, 19,
20, 21, 23, 24, 25, 26, 28, 29, 30, 32, 33, 34, 35,
37, 38, 39, 41, 42, 43, 44, 46, 47, 48, 49, 51, 52,
53, 54, 56, 57, 58, 60, 61, 62, 63, 65, 66, 67, 68,
70, 71, 72, 73, 75, 76, 77, 78, 79, 81, 82, 83, 84,
86, 87, 88, 89, 91, 92, 93, 94, 96, 97, 98, 99, 100,
102, 103, 104, 105, 107, 108, 109, 110, 112, 113, 114, 115, 116,
118, 119, 120, 121, 123, 124, 125, 126, 127, 129, 130, 131, 132,
134, 135, 136, 137, 138, 140, 141, 142, 143, 144, 146, 147, 148,
149, 151, 152, 153, 154, 155, 157, 158, 159, 160, 161, 163, 164,
165, 166, 167, 169, 170, 171, 172, 174, 175, 176, 177, 178, 180,
181, 182, 183, 184, 186, 187, 188, 189, 190, 192, 193, 194, 195,
196, 198, 199, 200, 201, 202, 204, 203, 206, 207, 208, 210, 211,
212, 213, 214, 216, 217, 218, 219, 220, 222, 223, 224, 225, 226,
227, 229, 230, 231, 232, 233, 235, 236, 237, 238, 239, 241, 242,
243, 244, 245, 246, 248, 249, 250, 251, 252, 254, 255, 256, 257,
258, 260, 261, 262, 263, 264, 265, 267, 268, 269, 270, 271, 272,
274, 275, 276, 277, 278, 280, 281, 282, 283, 284, 285, 287, 288,
289, 290, 291, 292, 294, 295, 296, 297, 298, 299, 301, 302, 303,
304, 305, 306, 308, 309, 310, 311, 312, 313, 315, 316, 317, 318,
319, 320, 322, 323, 324, 325, 326, 327, 328, 330, 331, 332, 333,
334, 335, 337, 338, 339, 340, 341, 342, 343, 345, 346, 347, 348,
349, 350, 351, 352, 354, 355, 356, 357, 358, 359, 360, 361, 363,
364, 365, 366, 367, 368, 369, 370, 371, 373, 374, 375, 376, 377,
378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390,
391, 392])
def cpci5(x,N,M,n,lciw,uciw):
f = np.vectorize(hypergeom.pmf)
idd = np.vectorize(ind)
X, m = np.meshgrid(x,M)
kk = idd(m,lciw,uciw) * f(X, N, m, n) #idd just implement a test lciw<= m <=uciw
return min(pd.Series(kk.sum(axis=1)))
M = np.arange(0,N+1) # Initial implementation of the function
ii = 0
while (ii <len(aa)+1):
lciw[xvalue-1] = aa[ii]
uciw[xvalue-1] = N - aa[ii]
bb = min(cpci5(x,N,M,n,lciw,uciw))
if bb >= 1-alpha:
ii1 = ii
ii += 1
else :
ii = len(aa)+1
lciw[xvalue-1] = aa[ii1]
uciw[xvalue-1] = N - lciw[xvalue-1]
M = np.arange(0,N+1) # Distributed version
ii = 0
B = []
while (ii <len(aa)):
lciw[xvalue-1] = aa[ii]
uciw[xvalue-1] = N - aa[ii]
B.append(client.submit(cpci5,x,N,M,n,lciw,uciw))
ii += 1
bb = np.array(client.gather(B))
ii1 = len(bb[bb>1-alpha])-1
lciw[xvalue-1] = aa[ii1]
uciw[xvalue-1] = N - lciw[xvalue-1]

How to deal with large array in numpy/python when memory becomes issue

I have an array all with dimensions (19494500, 376) I need to arrange these 376 columns in a particular sequence I have generated,
l
array([ 0, 94, 188, 282, 1, 95, 189, 283, 2, 96, 190, 284, 3,
97, 191, 285, 4, 98, 192, 286, 5, 99, 193, 287, 6, 100,
194, 288, 7, 101, 195, 289, 8, 102, 196, 290, 9, 103, 197,
291, 10, 104, 198, 292, 11, 105, 199, 293, 12, 106, 200, 294,
13, 107, 201, 295, 14, 108, 202, 296, 15, 109, 203, 297, 16,
110, 204, 298, 17, 111, 205, 299, 18, 112, 206, 300, 19, 113,
207, 301, 20, 114, 208, 302, 21, 115, 209, 303, 22, 116, 210,
304, 23, 117, 211, 305, 24, 118, 212, 306, 25, 119, 213, 307,
26, 120, 214, 308, 27, 121, 215, 309, 28, 122, 216, 310, 29,
123, 217, 311, 30, 124, 218, 312, 31, 125, 219, 313, 32, 126,
220, 314, 33, 127, 221, 315, 34, 128, 222, 316, 35, 129, 223,
317, 36, 130, 224, 318, 37, 131, 225, 319, 38, 132, 226, 320,
39, 133, 227, 321, 40, 134, 228, 322, 41, 135, 229, 323, 42,
136, 230, 324, 43, 137, 231, 325, 44, 138, 232, 326, 45, 139,
233, 327, 46, 140, 234, 328, 47, 141, 235, 329, 48, 142, 236,
330, 49, 143, 237, 331, 50, 144, 238, 332, 51, 145, 239, 333,
52, 146, 240, 334, 53, 147, 241, 335, 54, 148, 242, 336, 55,
149, 243, 337, 56, 150, 244, 338, 57, 151, 245, 339, 58, 152,
246, 340, 59, 153, 247, 341, 60, 154, 248, 342, 61, 155, 249,
343, 62, 156, 250, 344, 63, 157, 251, 345, 64, 158, 252, 346,
65, 159, 253, 347, 66, 160, 254, 348, 67, 161, 255, 349, 68,
162, 256, 350, 69, 163, 257, 351, 70, 164, 258, 352, 71, 165,
259, 353, 72, 166, 260, 354, 73, 167, 261, 355, 74, 168, 262,
356, 75, 169, 263, 357, 76, 170, 264, 358, 77, 171, 265, 359,
78, 172, 266, 360, 79, 173, 267, 361, 80, 174, 268, 362, 81,
175, 269, 363, 82, 176, 270, 364, 83, 177, 271, 365, 84, 178,
272, 366, 85, 179, 273, 367, 86, 180, 274, 368, 87, 181, 275,
369, 88, 182, 276, 370, 89, 183, 277, 371, 90, 184, 278, 372,
91, 185, 279, 373, 92, 186, 280, 374, 93, 187, 281, 375])
So I am doing following
all_c = all[:,l]
but I am getting
"memory error"
Can you suggest what could be the most memory-efficient way?

Rather than permute the whole array at once you can do it row by row in place. Try
for r in range(all.shape[0]):
all[r] = all[r, l]

Finding middle points between two lines of figure

I have a figure in the coordinates system that you can see here:
As you can see it consists of 2 mostly parallel lines, what I want is to make, instead of two lines, just one line that goes through middle of those 2 lines, for example:
Just note I did this manually so it doesn't look like the original picture, but you should get the point.
Here is the code I am using for plotting:
from matplotlib import pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
A = 155, 158, 182, 186, 195, 256, 263, 275, 284, 317, 317, 317, 316, 316, 312, 310, 305, 295, 296, 300, 294, 288, 280, 266, 251, 194, 189, 188, 187, 183, 142, 126, 101, 101, 101, 131, 189, 232, 290, 306, 309, 309, 309, 288, 266, 227, 218, 214, 208, 175, 156, 134, 89 , 79 , 72 , 58 , 57 , 56 , 69 , 87 , 132, 154, 167, 196, 234, 252, 260, 274, 281, 281, 281, 233, 201, 176, 132, 110, 90 , 74 , 74 , 74 , 75 , 75 , 77 , 94 , 108, 151
B = 475, 475, 492, 492, 492, 519, 521, 524, 524, 524, 495, 494, 488, 487, 471, 471, 471, 484, 488, 492, 496, 499, 499, 499, 492, 467, 467, 467, 467, 467, 436, 420, 397, 387, 375, 360, 329, 284, 224, 178, 170, 163, 142, 132, 122, 122, 122, 124, 128, 133, 136, 136, 136, 126, 119, 124, 134, 146, 152, 161, 161, 161, 158, 151, 151, 151, 152, 153, 158, 165, 178, 248, 283, 308, 329, 340, 358, 374, 381, 382, 386, 388, 400, 421, 437, 475
plt.plot(A, B)
for xy in zip(A, B):
ax.annotate('(%s, %s)' % xy, xy=xy, textcoords='data') # <--
plt.grid()
plt.show()

Not the perfect solution but you can get a good approximation using a Nearest Neighbors algorithm (the best option would be using dijkstra distance or other hierarchical clustering procedures but I am not sure how much level of detail you need).
Here is a simple implementation using KMeans
from sklearn.cluster import KMeans
import numpy as np
A = 155, 158, 182, 186, 195, 256, 263, 275, 284, 317, 317, 317, 316, 316, 312, 310, 305, 295, 296, 300, 294, 288, 280, 266, 251, 194, 189, 188, 187, 183, 142, 126, 101, 101, 101, 131, 189, 232, 290, 306, 309, 309, 309, 288, 266, 227, 218, 214, 208, 175, 156, 134, 89 , 79 , 72 , 58 , 57 , 56 , 69 , 87 , 132, 154, 167, 196, 234, 252, 260, 274, 281, 281, 281, 233, 201, 176, 132, 110, 90 , 74 , 74 , 74 , 75 , 75 , 77 , 94 , 108, 151
B = 475, 475, 492, 492, 492, 519, 521, 524, 524, 524, 495, 494, 488, 487, 471, 471, 471, 484, 488, 492, 496, 499, 499, 499, 492, 467, 467, 467, 467, 467, 436, 420, 397, 387, 375, 360, 329, 284, 224, 178, 170, 163, 142, 132, 122, 122, 122, 124, 128, 133, 136, 136, 136, 126, 119, 124, 134, 146, 152, 161, 161, 161, 158, 151, 151, 151, 152, 153, 158, 165, 178, 248, 283, 308, 329, 340, 358, 374, 381, 382, 386, 388, 400, 421, 437, 475
df=pd.DataFrame([list(A),list(B)]).T
df.columns=['A','B']
km = KMeans(n_clusters=20).fit(df)
pts=km.cluster_centers_[km.labels_]
Here is the output:
plt.plot(pts[:,0],pts[:,1])
plt.plot(X, Y)
Here are the centroids:
plt.plot(km.cluster_centers_[:,0], km.cluster_centers_[:,1],'.')
plt.plot(X, Y)
You need to play around with the number of clusters but you can get pretty accurate approximations. Here is with 12 centroids:
km = KMeans(n_clusters=12).fit(df)
pts=km.cluster_centers_[km.labels_]
plt.plot(pts[:,0], pts[:,1])
plt.plot(X, Y)

Python, calculating lag of the center of the data

My two data sets are:
fnamerp1=([ 93, 87, 96, 93, 90, 123, 111, 82, 87, 115, 103,
101, 93, 92, 111, 107, 114, 106, 116, 106, 128, 115,
141, 134, 120, 149, 140, 166, 152, 171, 192, 207, 227,
266, 270, 286, 355, 385, 397, 488, 462, 531, 579, 622,
711, 720, 801, 858, 906, 915, 915, 956, 1004, 1012, 1045,
1076, 1063, 1013, 985, 924, 959, 838, 766, 763, 742, 642,
587, 557, 484, 393, 353, 341, 284, 240, 221, 209, 147,
109, 113, 102, 71, 63, 63, 50, 29, 39, 36, 25,
30, 23, 27, 23, 19, 19, 24, 15, 23, 21, 26,
15])
fnamerp2=([ 105, 89, 120, 121, 103, 105, 113, 94, 104, 115, 122, 116, 121,
129, 118, 126, 138, 146, 161, 163, 178, 192, 194, 222, 268, 272,
285, 342, 380, 378, 373, 448, 493, 511, 571, 603, 691, 772, 738,
796, 839, 832, 883, 930, 963, 975, 972, 931, 947, 941, 934, 964,
871, 869, 826, 793, 733, 708, 606, 610, 515, 483, 409, 352, 358,
264, 266, 205, 191, 167, 136, 138, 99, 102, 82, 57, 65, 53,
51, 32, 26, 27, 39, 21, 29, 23, 25, 24, 16, 17, 27,
33, 19, 13, 24, 26, 18, 22, 18, 20])
I want to find the lag between the center of the two peaks (not just their max). And my plan is to use np.argmax(signal.correlate(fnamerp1,fnamerp2)).
What is the right way to do this both from a mathematical perspective and also elegant in Python?

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Unable to find the index in array - python

Related

How to save printed console output to pandas dataframe in python?

Dask distributed calculation performance drop in loop

How to deal with large array in numpy/python when memory becomes issue

Finding middle points between two lines of figure

Python, calculating lag of the center of the data

Categories

Resources