OperationalError 1241: Operand should contain 1 column(s) - python

Code:
import pandas as pd
from sqlalchemy import create_engine

# inverted_index maps each word to a set of document ids
df = pd.DataFrame(list(inverted_index.items()), columns=['words', 'docids'])
engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
                       .format(user="root",
                               pw="shreshre",
                               db="nltk"))
df.to_sql(con=engine, name='documents', if_exists='replace')
I want to convert my inverted index, which is a dictionary, into a DataFrame and write it to MySQL, but I get this error:
OperationalError: (pymysql.err.OperationalError) (1241, 'Operand should contain 1 column(s)')
[SQL: INSERT INTO documents (`index`, words, docids) VALUES (%(index)s, %(words)s, %(docids)s)]
[parameters: ({'index': 0, 'words': 'bank', 'docids': {0, 1, 2, 3, 4, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 37, 38, 39, 40, 41, 43, 44, 45, 46, 48, 52, 53, 54, 55, 56, 59, 60, 62, 64, 66, 67, 68, 69 ... (1314 characters truncated) ... 719, 720, 721, 722, 724, 726, 728, 733, 734, 735, 736, 737, 739, 740, 743, 746, 748, 752, 753, 755, 756, 757, 758, 759, 762, 765, 766, 767, 768, 772}}, {'index': 1, 'words': 'defin', 'docids': {0, 2, 354, 612, 773, 76, 84}}, {'index': 2, 'words': 'establish', 'docids': {0, 161, 391, 328, 330, 718, 719, 720, 722, 245, 217, 411, 156}}, {'index': 3, 'words': 'custodi', 'docids': {0, 405}}, {'index': 4, 'words': ',', 'docids': {0, 1, 2, 3, 8, 14, 17, 20, 22, 24, 25, 26, 27, 30, 31, 41, 43, 45, 48, 49, 51, 52, 54, 55, 59, 62, 63, 65, 67, 69, 70, 72, 73, 74, 76, 78, 79, 80, 81 ... (1428 characters truncated) ... 705, 706, 708, 710, 711, 712, 716, 718, 719, 721, 722, 724, 729, 730, 732, 735, 736, 741, 743, 745, 749, 756, 757, 758, 762, 766, 768, 769, 771, 773}}, {'index': 5, 'words': 'loan', 'docids': {0, 512, 517, 519, 538, 29, 33, 34, 557, 558, 47, 559, 564, 574, 578, 580, 70, 73, 76, 79, 616, 621, 113, 114, 115, 116, 117, 123, 124, 127, 128, 129, ... (75 characters truncated) ... 711, 200, 219, 227, 228, 234, 235, 241, 758, 771, 309, 310, 340, 343, 346, 349, 354, 365, 368, 380, 383, 384, 385, 386, 440, 447, 448, 451, 453, 474}}, {'index': 6, 'words': 'exchang', 'docids': {0, 416, 290, 354, 357, 425, 10, 302, 430, 405, 376, 415}}, {'index': 7, 'words': 'issu', 'docids': {0, 386, 419, 676, 390, 397, 302, 272, 274, 306, 722, 700, 350}} ... displaying 10 of 1969 total bound parameter sets ... {'index': 1967, 'words': '86', 'docids': {774}}, {'index': 1968, 'words': 'separ', 'docids': {774}})]
I can't follow the existing Stack Overflow answers for similar errors. Can someone please help me out?
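The error most likely comes from the docids column: every cell holds a Python set, and the driver expands a set into several comma-separated values, so MySQL sees a multi-column operand where it expects a single value (error 1241). A minimal sketch, assuming it is acceptable to store each set as one JSON string. It uses a hypothetical three-word index and an in-memory sqlite3 database only so that it runs without a MySQL server; with the question's SQLAlchemy engine the to_sql call is the same.

```python
import json
import sqlite3

import pandas as pd

# Hypothetical toy inverted index (word -> set of document ids) standing in for the real one.
inverted_index = {"bank": {0, 1, 2}, "defin": {0, 2, 354}, "custodi": {0, 405}}

df = pd.DataFrame(list(inverted_index.items()), columns=["words", "docids"])

# A set cannot be bound to a single SQL column, so serialise each one to a
# JSON array string (sorted so the stored text is stable and readable).
df["docids"] = df["docids"].apply(lambda ids: json.dumps(sorted(ids)))

# In-memory SQLite only so the sketch runs anywhere; with the question's
# MySQL engine you would pass con=engine instead.
conn = sqlite3.connect(":memory:")
df.to_sql(name="documents", con=conn, if_exists="replace", index=False)

rows = conn.execute("SELECT words, docids FROM documents").fetchall()
print(dict(rows)["bank"])  # [0, 1, 2]

# Reading back: json.loads turns each string into a list again.
restored = {word: set(json.loads(ids)) for word, ids in rows}
print(restored == inverted_index)  # True
```

index=False drops the extra `index` column; keep the default if you need it. If you later need to query by document id, one row per (word, docid) pair (for example via df.explode('docids') on the original set-valued column) is usually a better relational layout than a serialised list.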

Related

How to save printed console output to pandas dataframe in python?

I have a printed output:
{-1: [2, 10, 11, 13, 16, 19, 24, 28, 30, 32, 34, 35, 36, 40, 42, 49, 54, 56, 59, 64, 66, 78, 94, 99, 101, 102, 103, 106, 107, 109, 110, 114, 117, 123, 126, 127, 129, 131, 132, 133, 136, 144, 146, 147, 150, 155, 156, 164, 166, 177, 179, 181, 182, 188, 190, 192, 194, 201, 202, 204, 209, 214, 217, 220, 221, 225, 231, 232, 234, 235, 236, 240, 244, 246, 248, 253, 254, 257, 259, 260, 261, 262, 263, 264, 265, 266, 268, 271, 275, 277, 279, 280, 281, 285, 286, 287, 288, 297, 302, 309, ...], 0: [3, 6, 8, 25, 27, 33, 38, 57, 62, 63, 67, 69, 70, 72, 74, 83, 89, 91, 92, 98, 111, 112, 122, 124, 135, 158, 175, 187, 197, 198, 199, 200, 205, 206, 207, 215, 216, 242, 243, 258, 267, 272, 283, 299, 300, 303, 305, 306, 307, 310, 311, 312, 313, 314, 315, 316, 319, 326, 329, 348, 353, 355, 376, 377, 378, 380, 385, 386, 387, 389, 399, 402, 406, 418, 424, 425, 426, 427, 431, 432, 433, 434, 435, 447, 486, 487, 503, 511, 512, 514, 515, 524, 525, 535, 536, 539, 547, 549, 550, 554, ...], 1: [0, 5, 21, 44, 46, 48, 51, 82, 115, 118, 274, 293, 330, 331, 332, 361, 401, 413, 507, 520, 522, 523, 558, 560, 643, 650, 681, 700, 734, 747, 753, 782, 784, 836, 839, 893, 905, 934, 951, 976, 999, 1037, 1048, 1052, 1053, 1082, 1109, 1113, 1115, 1121, 1139, 1146, 1219, 1221, 1264, 1355, 1382, 1392, 1432, 1467, 1485, 1490, 1497, 1513, 1526, 1565, 1682, 1728, 1737, 1738, 1806, 1815, 1824, 1828, 1844, 1845, 1885, 1959, 2014, 2017, 2029, 2052, 2072, 2153, 2157, 2168, 2193, 2199, 2214, 2228, 2232, 2240, 2243, 2264, 2300, 2317, 2353, 2376, 2402, 2405, ...], 2: [15, 39, 60, 61, 149, 157, 222, 250, 289, 320, 448, 538, 630, 658, 662, 665, 709, 759, 810, 837, 897, 901, 917, 924, 925, 945, 946, 954, 959, 1049, 1050, 1090, 1131, 1140, 1154, 1172, 1251, 1300, 1313, 1328, 1387, 1393, 1431, 1440, 1448, 1475, 1507, 1535, 1591, 1597, 1603, 1615, 1636, 1705, 1725, 1736, 1771, 1777, 1791, 1796, 1855, 1867, 1903, 1918, 1928, 1930, 1942, 1943, 1989, 2021, 2039, 2095, 2119, 2169, 2195, 2309, 2337, 2418, 2426, 2429, 2522, 2582, 
2598, 2678, 2679, 2682], 3: [50, 113, 160, 213, 224, 229, 238, 239, 352, 400, 409, 506, 545, 570, 701, 703, 712, 716, 830, 838, 858, 921, 1008, 1078, 1124, 1130, 1194, 1214, 1305, 1308, 1311, 1360, 1421, 1441, 1473, 1476, 1532, 1533, 1548, 1580, 1616, 1622, 1649, 1679, 1735, 1883, 1897, 1920, 1985, 2015, 2084, 2091, 2097, 2118, 2152, 2181, 2212, 2223, 2237, 2249, 2310, 2313, 2347, 2369, 2381, 2390, 2470, 2496, 2511, 2514, 2529, 2549, 2569, 2601, 2626, 2666, 2688],
Is it possible to put this into a DataFrame with two columns? For example:
Number | Value
-1 | [2, 10, 11, 13, 16, 19, 24, 28, 30, 32, 34, 35, 36, 40, 42, 49, 54, 56, 59, 64, 66, 78, 94, 99, 101, 102, 103, 106, 107, 109, 110, 114, 117, 123, 126, 127, 129, 131, 132, 133, 136, 144, 146, 147, 150, 155, 156, 164, 166, 177, 179, 181, 182, 188, 190, 192, 194, 201, 202, 204, 209, 214, 217, 220, 221, 225, 231, 232, 234, 235, 236, 240, 244, 246, 248, 253, 254, 257, 259, 260, 261, 262, 263, 264, 265, 266, 268, 271, 275, 277, 279, 280, 281, 285, 286, 287, 288, 297, 302, 309, ...]
0 | [3, 6, 8, 25, 27, 33, 38, 57, 62, 63, 67, 69, 70, 72, 74, 83, 89, 91, 92, 98, 111, 112, 122, 124, 135, 158, 175, 187, 197, 198, 199, 200, 205, 206, 207, 215, 216, 242, 243, 258, 267, 272, 283, 299, 300, 303, 305, 306, 307, 310, 311, 312, 313, 314, 315, 316, 319, 326, 329, 348, 353, 355, 376, 377, 378, 380, 385, 386, 387, 389, 399, 402, 406, 418, 424, 425, 426, 427, 431, 432, 433, 434, 435, 447, 486, 487, 503, 511, 512, 514, 515, 524, 525, 535, 536, 539, 547, 549, 550, 554, ...]
Try:
import pandas as pd

dct = {
    -1: [2, 10, 11],
    0: [3, 6, 27, 33],
    1: [0, 5, 21],
    2: [15],
    3: [50, 113, 160, 213, 224],
}
# One row per key: the keys go in "Number", the lists in "Value"
df = pd.DataFrame({"Number": list(dct.keys()), "Value": list(dct.values())})
print(df)
Prints:
Number Value
0 -1 [2, 10, 11]
1 0 [3, 6, 27, 33]
2 1 [0, 5, 21]
3 2 [15]
4 3 [50, 113, 160, 213, 224]
Alternatively, with the numbers as the index (d is the dictionary from the question):
df = pd.DataFrame()
df["Value"] = list(d.values())
df.index = list(d.keys())
# OR
df = pd.DataFrame.from_dict({k: [v] for k, v in d.items()},
                            orient="index",
                            columns=["Value"])
print(df)
# Value
# -1 [2, 10, 11, 13, 16, 19, 24, 28, 30, 32, 34, 35...
# 0 [3, 6, 8, 25, 27, 33, 38, 57, 62, 63, 67, 69, ...
# 1 [0, 5, 21, 44, 46, 48, 51, 82, 115, 118, 274, ...
# 2 [15, 39, 60, 61, 149, 157, 222, 250, 289, 320,...
# 3 [50, 113, 160, 213, 224, 229, 238, 239, 352, 4...

How should I find the line of best fit for a possible logarithmic scatter plot?

This is my first time trying scikit-learn, and I am struggling to get the closest line of best fit for the following data:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
y = [0, 187, 262, 296, 319, 340, 359, 376, 388, 401, 411, 414, 423, 430, 433, 439, 446, 452, 457, 461, 465, 469, 470, 470, 472, 474, 479, 484, 486, 487, 489, 489, 491, 491, 491, 494, 494, 498, 500, 500, 500, 500, 505, 506, 506, 506, 506, 507, 508, 509, 509, 509, 511, 511, 512, 514, 515, 515, 515, 517, 517, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519]
I was able to produce the following plot with matplotlib (image not included) using the code below:
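import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.optimize import curve_fit

# Assumption: frequency['pct'] and frequency['Facility Count Military'] hold the x and y lists above
xdata = np.asarray(x, dtype=float)
ydata = np.asarray(y, dtype=float)

fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter(xdata, ydata, c='brown')

def func(x, a, b, c):
    # logarithmic model: y = a * log2(b + x) + c
    return a * np.log2(b + x) + c

popt, pcov = curve_fit(func, xdata, ydata)
print(popt)
# popt was the following: [4.28209689e+01 1.46600585e-02 2.59467635e+02]
# draw the fitted curve on the same axes (seaborn needs keyword x= and y=)
sns.lineplot(x=xdata, y=func(xdata, *popt), color='black', ax=ax)
plt.xlabel('x')
plt.ylabel('y')
plt.ylim([0, 530])
plt.xlim([0, 100])
plt.title('y over x', y=1, fontsize=15, fontweight='semibold')
plt.show()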
(a) Is my methodology correct?
(b) Does it make sense to fit a base-2 logarithmic curve, or should I use something different?
Edit:
Never mind part (c); I edited the code accordingly and figured it out on my own.
(c) Is there a way to translate the "popt" into the line graph that will eventually be used?
Any assistance on this is truly appreciated.
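For (b): the base of the logarithm does not change the shape of the fit, because log2(z) = ln(z) / ln(2), so switching bases only rescales the coefficient a, while b and c are unaffected. A NumPy-only sketch that checks this on the question's data, assuming a fixed shift b = 1 (a hypothetical choice, made so the model is linear in a and c and np.polyfit can stand in for curve_fit):

```python
import numpy as np

# y values from the question; x = 0, 1, ..., len(y) - 1 as in the question.
y = np.array([0, 187, 262, 296, 319, 340, 359, 376, 388, 401, 411, 414, 423, 430, 433, 439, 446,
              452, 457, 461, 465, 469, 470, 470, 472, 474, 479, 484, 486, 487, 489, 489, 491, 491,
              491, 494, 494, 498, 500, 500, 500, 500, 505, 506, 506, 506, 506, 507, 508, 509, 509,
              509, 511, 511, 512, 514, 515, 515, 515, 517, 517, 518, 518, 518, 518, 518, 518, 518,
              518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518,
              519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519],
             dtype=float)
x = np.arange(len(y), dtype=float)


def fit_log(x, y, log, shift=1.0):
    """Least-squares fit of y = a * log(shift + x) + c for a fixed (hypothetical) shift."""
    z = log(shift + x)
    a, c = np.polyfit(z, y, 1)  # linear in a and c once the shift is fixed
    pred = a * z + c
    r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
    return a, c, r2


a2, c2, r2_2 = fit_log(x, y, np.log2)  # base-2 logarithm
ae, ce, r2_e = fit_log(x, y, np.log)   # natural logarithm
print(a2, c2, r2_2)
print(ae, ce, r2_e)
# Same intercept and R^2; only the slope differs, by a factor of ln(2).
print(np.isclose(a2, ae * np.log(2)), np.isclose(c2, ce), np.isclose(r2_2, r2_e))
```

Because y levels off at 519 while any logarithm keeps growing, whether a log curve is appropriate is better judged from the residuals than from the choice of base.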

Unique combination of numbers from list

The following list is generated from a function.
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
I want to concatenate pairs of these numbers to generate new numbers, and I expect the new list to be:
[23, 25, 27, 211, 213, 217, 219, 223, 229, 231, 32, 35, 37, 311, 313, 319, 323, 329, 331, 337, 52, 53, 57, 511, 513, 517, 519, 523, 529, 531, 537, 72, 73, 75, 711, 713, 717, .. ...]
How can this be done in Python?
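Use itertools.combinations (note: it produces each pair in list order only, e.g. 23 but not 32; use itertools.permutations if you need both orders, as in the expected list):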
>>> import itertools
>>> [int(f"{a}{b}") for a, b in itertools.combinations([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37], 2)]
[23, 25, 27, 211, 213, 217, 219, 223, 229, 231, 237, 35, 37, 311, 313, 317, 319, 323, 329, 331, 337, 57, 511, 513, 517, 519, 523, 529, 531, 537, 711, 713, 717, 719, 723, 729, 731, 737, 1113, 1117, 1119, 1123, 1129, 1131, 1137, 1317, 1319, 1323, 1329, 1331, 1337, 1719, 1723, 1729, 1731, 1737, 1923, 1929, 1931, 1937, 2329, 2331, 2337, 2931, 2937, 3137]
x = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
new_list = []
for i in x:
    for j in x:
        if j != i:
            # concatenate the digits of i and j, e.g. 2 and 3 -> 23
            new_list.append(int(str(i) + str(j)))
# new_list is [23, 25, 27, 211, 213, 217, 219, 223, 229, 231, 237, 32, ...]
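A sketch with itertools.permutations, which yields ordered pairs of distinct elements and so builds the same list as the nested loop, in the same order. The results are not all unique, though: 3 and 17, and 31 and 7, both give 317. If "unique" in the title means distinct values, dict.fromkeys removes the repeats while keeping the first-seen order:

```python
from itertools import permutations

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

# Every ordered pair (a, b) taken from different positions, concatenated as digits.
new_list = [int(f"{a}{b}") for a, b in permutations(primes, 2)]
print(new_list[:12])  # [23, 25, 27, 211, 213, 217, 219, 223, 229, 231, 237, 32]
print(len(new_list))  # 132

# Drop repeated values (237, 313 and 317 each occur twice), keeping the first occurrence.
unique = list(dict.fromkeys(new_list))
print(len(unique))  # 129
```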

Unable to find the index in array

Sorry for the very basic question, but I have an issue in Python (this is my very first Python script) and I don't understand the root cause.
Here is my code:
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

# Split the dataset into a training set and a test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
# Compute the coefficients
est = sm.OLS(Y_train, X_train).fit()
# Store the coefficients in variables
coefSurface = est.params["surface"]
coefArrondissement = est.params["arrondissement"]
indices = [i for i, x in enumerate(X_train["surface"]) if x != "whatever"]
indices2 = [i for i, x in enumerate(X_train["arrondissement"]) if x != "whatever"]
print(indices)
print(indices2)
#TRAINING
predicted_prices = []
for n in range(0, len(Y_train)):
    print((coefSurface * X_train["surface"][n]) + (coefArrondissement * X_train["arrondissement"][n]))
This code displays:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 
421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 
421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656]
1670.5735272748664
1481.1109472016328
2001.2043042654109
1666.8585747244108
1778.3071512380775
2446.9986103200777
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-332-7502a7c24282> in <module>()
20 predicted_prices = []
21 for n in range(0, len(Y_train)):
---> 22 print((coefSurface * X_train["surface"][n]) + (coefArrondissement * X_train["arrondissement"][n]))
23 # #predicted_prices.append( (coefSurface * X_train["surface"][n]) + (coefArrondissement * X_train["arrondissement"][n]))
24
/opt/conda/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
621 key = com._apply_if_callable(key, self)
622 try:
--> 623 result = self.index.get_value(self, key)
624
625 if not is_scalar(result):
/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
2558 try:
2559 return self._engine.get_value(s, k,
-> 2560 tz=getattr(series.dtype, 'tz', None))
2561 except KeyError as e1:
2562 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 6
My first two prints show all the available indices in both lists (0 to 656).
But when I use a for loop to do a calculation on each value of my array, Python crashes (not always at the same index) because it seems unable to retrieve an index in the list: 'KeyError: 6'.
If you need the full code, I can give it to you.
Many thanks for your help.
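Python raises a KeyError whenever you look up a key that is not present in a dict-like object.
The traceback goes through pandas/core/series.py: X_train["surface"] is a pandas Series, and X_train["surface"][n] looks up the index label n, like a dictionary key, not the n-th position as in a list. train_test_split shuffles the rows and keeps their original index labels, so label 6 (for example) may have ended up in X_test; because the split is random, the failing label changes from run to run.
enumerate(X_train["surface"]) just counts positions 0, 1, 2, ... and makes no reference to the actual labels, so those numbers do not necessarily exist in the Series' index.
For example: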
my_dict = {1: 'a', 2: 'b', 3: 'c', 5: 'e', 6: 'f'}
indices = [i for i, x in enumerate(my_dict)]
print(indices)
# [0, 1, 2, 3, 4]
print(my_dict[4])
# KeyError! Nothing in my_dict has the key 4!
To get the actual keys in your example, try:
indices = [i for i, x in X_train["surface"].items() if x != "whatever"]
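A minimal sketch of two other fixes, using hypothetical toy data (a three-row X_train whose shuffled index labels are 6, 2 and 9) and hypothetical coefficients: .iloc[n] selects by position, while plain [n] selects by label; a vectorised expression avoids the loop altogether.

```python
import pandas as pd

# Hypothetical stand-in for X_train after train_test_split:
# rows are shuffled, so the index labels are no longer 0..n-1.
X_train = pd.DataFrame(
    {"surface": [45.0, 30.0, 72.0], "arrondissement": [10, 3, 15]},
    index=[6, 2, 9],
)
coefSurface, coefArrondissement = 2.0, 5.0  # hypothetical coefficients

# Label-based lookup fails: there is no row labelled 0.
try:
    X_train["surface"][0]
except KeyError as err:
    print("KeyError:", err)

# Position-based lookup with .iloc works for n = 0 .. len - 1.
predicted_prices = []
for n in range(len(X_train)):
    predicted_prices.append(float(
        coefSurface * X_train["surface"].iloc[n]
        + coefArrondissement * X_train["arrondissement"].iloc[n]
    ))
print(predicted_prices)  # [140.0, 75.0, 219.0]

# Vectorised equivalent: whole columns at once, aligned on the index labels.
predicted = coefSurface * X_train["surface"] + coefArrondissement * X_train["arrondissement"]
print(predicted.tolist())  # [140.0, 75.0, 219.0]
```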

Python, calculating lag of the center of the data

My two data sets are:
fnamerp1=([ 93, 87, 96, 93, 90, 123, 111, 82, 87, 115, 103,
101, 93, 92, 111, 107, 114, 106, 116, 106, 128, 115,
141, 134, 120, 149, 140, 166, 152, 171, 192, 207, 227,
266, 270, 286, 355, 385, 397, 488, 462, 531, 579, 622,
711, 720, 801, 858, 906, 915, 915, 956, 1004, 1012, 1045,
1076, 1063, 1013, 985, 924, 959, 838, 766, 763, 742, 642,
587, 557, 484, 393, 353, 341, 284, 240, 221, 209, 147,
109, 113, 102, 71, 63, 63, 50, 29, 39, 36, 25,
30, 23, 27, 23, 19, 19, 24, 15, 23, 21, 26,
15])
fnamerp2=([ 105, 89, 120, 121, 103, 105, 113, 94, 104, 115, 122, 116, 121,
129, 118, 126, 138, 146, 161, 163, 178, 192, 194, 222, 268, 272,
285, 342, 380, 378, 373, 448, 493, 511, 571, 603, 691, 772, 738,
796, 839, 832, 883, 930, 963, 975, 972, 931, 947, 941, 934, 964,
871, 869, 826, 793, 733, 708, 606, 610, 515, 483, 409, 352, 358,
264, 266, 205, 191, 167, 136, 138, 99, 102, 82, 57, 65, 53,
51, 32, 26, 27, 39, 21, 29, 23, 25, 24, 16, 17, 27,
33, 19, 13, 24, 26, 18, 22, 18, 20])
I want to find the lag between the centers of the two peaks (not just between their maxima). My plan is to use np.argmax(signal.correlate(fnamerp1, fnamerp2)).
What is the right way to do this, both mathematically and elegantly in Python?
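Note that np.argmax of a full-mode correlation is an index into the output array, not the lag itself: the lag is argmax - (len(fnamerp2) - 1). A NumPy-only sketch of two ways to get a "centre" lag, using np.correlate in place of scipy.signal.correlate (for 1-D real inputs in full mode they follow the same lag convention). The centroid (centre of mass) gives a fractional lag; the cross-correlation peak gives a whole-sample lag. Subtracting the means before correlating is a design choice, not something from the question, so the flat baselines do not dominate; likewise, no baseline is subtracted before taking the centroids, which you may want to change.

```python
import numpy as np

fnamerp1 = np.array([93, 87, 96, 93, 90, 123, 111, 82, 87, 115, 103,
                     101, 93, 92, 111, 107, 114, 106, 116, 106, 128, 115,
                     141, 134, 120, 149, 140, 166, 152, 171, 192, 207, 227,
                     266, 270, 286, 355, 385, 397, 488, 462, 531, 579, 622,
                     711, 720, 801, 858, 906, 915, 915, 956, 1004, 1012, 1045,
                     1076, 1063, 1013, 985, 924, 959, 838, 766, 763, 742, 642,
                     587, 557, 484, 393, 353, 341, 284, 240, 221, 209, 147,
                     109, 113, 102, 71, 63, 63, 50, 29, 39, 36, 25,
                     30, 23, 27, 23, 19, 19, 24, 15, 23, 21, 26,
                     15], dtype=float)
fnamerp2 = np.array([105, 89, 120, 121, 103, 105, 113, 94, 104, 115, 122, 116, 121,
                     129, 118, 126, 138, 146, 161, 163, 178, 192, 194, 222, 268, 272,
                     285, 342, 380, 378, 373, 448, 493, 511, 571, 603, 691, 772, 738,
                     796, 839, 832, 883, 930, 963, 975, 972, 931, 947, 941, 934, 964,
                     871, 869, 826, 793, 733, 708, 606, 610, 515, 483, 409, 352, 358,
                     264, 266, 205, 191, 167, 136, 138, 99, 102, 82, 57, 65, 53,
                     51, 32, 26, 27, 39, 21, 29, 23, 25, 24, 16, 17, 27,
                     33, 19, 13, 24, 26, 18, 22, 18, 20], dtype=float)


def centroid(signal):
    """Centre of mass of a 1-D signal, in sample units (no baseline subtraction)."""
    idx = np.arange(len(signal))
    return np.sum(idx * signal) / np.sum(signal)


# 1) Difference of centroids (can be fractional).
centroid_lag = centroid(fnamerp1) - centroid(fnamerp2)

# 2) Peak of the cross-correlation of the mean-removed signals.
a = fnamerp1 - fnamerp1.mean()
v = fnamerp2 - fnamerp2.mean()
xcorr = np.correlate(a, v, mode="full")
# Output index k corresponds to lag k - (len(v) - 1);
# a positive lag means fnamerp1 is delayed relative to fnamerp2.
xcorr_lag = int(np.argmax(xcorr)) - (len(v) - 1)

print(centroid_lag, xcorr_lag)
```

Recent SciPy versions also provide scipy.signal.correlation_lags, which returns the lag for every index of the correlation output so the offset does not have to be computed by hand.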
