Unique combination of numbers from list - python

The following list is generated from a function.
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
Now I want to concatenate pairs of these numbers to generate new numbers, and I expect the new list to be:
[23, 25, 27, 211, 213, 217, 219, 223, 229, 231, 32, 35, 37, 311, 313, 319, 323, 329, 331, 337, 52, 53, 57, 511, 513, 517, 519, 523, 529, 531, 537, 72, 73, 75, 711, 713, 717, .. ...]
How can this be done in Python?

Use itertools.combinations, which pairs each number only with the numbers that follow it in the list:
>>> import itertools
>>> [int(f"{a}{b}") for a, b in itertools.combinations([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37], 2)]
[23, 25, 27, 211, 213, 217, 219, 223, 229, 231, 237, 35, 37, 311, 313, 317, 319, 323, 329, 331, 337, 57, 511, 513, 517, 519, 523, 529, 531, 537, 711, 713, 717, 719, 723, 729, 731, 737, 1113, 1117, 1119, 1123, 1129, 1131, 1137, 1317, 1319, 1323, 1329, 1331, 1337, 1719, 1723, 1729, 1731, 1737, 1923, 1929, 1931, 1937, 2329, 2331, 2337, 2931, 2937, 3137]

x = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
new_list = []
for i in x:
    for j in x:
        # skip pairing a number with itself
        if j != i:
            new_list.append(int(str(i) + str(j)))
# new_list should be [23, 25, 27, 211, 213, 217, 219, 223, 229, 231, 237, 32...]
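The expected list in the question contains both 23 and 32, so it appears to want ordered pairs. A minimal sketch of that reading, using itertools.permutations from the standard library (the name primes is only an illustrative label for the question's list):

```python
from itertools import permutations

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

# permutations(..., 2) yields every ordered pair of distinct positions,
# so both (2, 3) -> 23 and (3, 2) -> 32 are produced.
new_list = [int(f"{a}{b}") for a, b in permutations(primes, 2)]

print(new_list[:14])
# [23, 25, 27, 211, 213, 217, 219, 223, 229, 231, 237, 32, 35, 37]
print(len(new_list))
# 132
```

This gives the same result as the nested loop above, in the same order, because the input numbers are all distinct.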

Related

How to save printed console output to pandas dataframe in python?

I have a printed output:
{-1: [2, 10, 11, 13, 16, 19, 24, 28, 30, 32, 34, 35, 36, 40, 42, 49, 54, 56, 59, 64, 66, 78, 94, 99, 101, 102, 103, 106, 107, 109, 110, 114, 117, 123, 126, 127, 129, 131, 132, 133, 136, 144, 146, 147, 150, 155, 156, 164, 166, 177, 179, 181, 182, 188, 190, 192, 194, 201, 202, 204, 209, 214, 217, 220, 221, 225, 231, 232, 234, 235, 236, 240, 244, 246, 248, 253, 254, 257, 259, 260, 261, 262, 263, 264, 265, 266, 268, 271, 275, 277, 279, 280, 281, 285, 286, 287, 288, 297, 302, 309, ...], 0: [3, 6, 8, 25, 27, 33, 38, 57, 62, 63, 67, 69, 70, 72, 74, 83, 89, 91, 92, 98, 111, 112, 122, 124, 135, 158, 175, 187, 197, 198, 199, 200, 205, 206, 207, 215, 216, 242, 243, 258, 267, 272, 283, 299, 300, 303, 305, 306, 307, 310, 311, 312, 313, 314, 315, 316, 319, 326, 329, 348, 353, 355, 376, 377, 378, 380, 385, 386, 387, 389, 399, 402, 406, 418, 424, 425, 426, 427, 431, 432, 433, 434, 435, 447, 486, 487, 503, 511, 512, 514, 515, 524, 525, 535, 536, 539, 547, 549, 550, 554, ...], 1: [0, 5, 21, 44, 46, 48, 51, 82, 115, 118, 274, 293, 330, 331, 332, 361, 401, 413, 507, 520, 522, 523, 558, 560, 643, 650, 681, 700, 734, 747, 753, 782, 784, 836, 839, 893, 905, 934, 951, 976, 999, 1037, 1048, 1052, 1053, 1082, 1109, 1113, 1115, 1121, 1139, 1146, 1219, 1221, 1264, 1355, 1382, 1392, 1432, 1467, 1485, 1490, 1497, 1513, 1526, 1565, 1682, 1728, 1737, 1738, 1806, 1815, 1824, 1828, 1844, 1845, 1885, 1959, 2014, 2017, 2029, 2052, 2072, 2153, 2157, 2168, 2193, 2199, 2214, 2228, 2232, 2240, 2243, 2264, 2300, 2317, 2353, 2376, 2402, 2405, ...], 2: [15, 39, 60, 61, 149, 157, 222, 250, 289, 320, 448, 538, 630, 658, 662, 665, 709, 759, 810, 837, 897, 901, 917, 924, 925, 945, 946, 954, 959, 1049, 1050, 1090, 1131, 1140, 1154, 1172, 1251, 1300, 1313, 1328, 1387, 1393, 1431, 1440, 1448, 1475, 1507, 1535, 1591, 1597, 1603, 1615, 1636, 1705, 1725, 1736, 1771, 1777, 1791, 1796, 1855, 1867, 1903, 1918, 1928, 1930, 1942, 1943, 1989, 2021, 2039, 2095, 2119, 2169, 2195, 2309, 2337, 2418, 2426, 2429, 2522, 2582, 
2598, 2678, 2679, 2682], 3: [50, 113, 160, 213, 224, 229, 238, 239, 352, 400, 409, 506, 545, 570, 701, 703, 712, 716, 830, 838, 858, 921, 1008, 1078, 1124, 1130, 1194, 1214, 1305, 1308, 1311, 1360, 1421, 1441, 1473, 1476, 1532, 1533, 1548, 1580, 1616, 1622, 1649, 1679, 1735, 1883, 1897, 1920, 1985, 2015, 2084, 2091, 2097, 2118, 2152, 2181, 2212, 2223, 2237, 2249, 2310, 2313, 2347, 2369, 2381, 2390, 2470, 2496, 2511, 2514, 2529, 2549, 2569, 2601, 2626, 2666, 2688],
Is it possible to put this into a DataFrame with two columns? For example:
Number  Value
-1      [2, 10, 11, 13, 16, 19, 24, 28, 30, 32, 34, 35, 36, 40, 42, 49, 54, 56, 59, 64, 66, 78, 94, 99, 101, 102, 103, 106, 107, 109, 110, 114, 117, 123, 126, 127, 129, 131, 132, 133, 136, 144, 146, 147, 150, 155, 156, 164, 166, 177, 179, 181, 182, 188, 190, 192, 194, 201, 202, 204, 209, 214, 217, 220, 221, 225, 231, 232, 234, 235, 236, 240, 244, 246, 248, 253, 254, 257, 259, 260, 261, 262, 263, 264, 265, 266, 268, 271, 275, 277, 279, 280, 281, 285, 286, 287, 288, 297, 302, 309, ...]
0       [3, 6, 8, 25, 27, 33, 38, 57, 62, 63, 67, 69, 70, 72, 74, 83, 89, 91, 92, 98, 111, 112, 122, 124, 135, 158, 175, 187, 197, 198, 199, 200, 205, 206, 207, 215, 216, 242, 243, 258, 267, 272, 283, 299, 300, 303, 305, 306, 307, 310, 311, 312, 313, 314, 315, 316, 319, 326, 329, 348, 353, 355, 376, 377, 378, 380, 385, 386, 387, 389, 399, 402, 406, 418, 424, 425, 426, 427, 431, 432, 433, 434, 435, 447, 486, 487, 503, 511, 512, 514, 515, 524, 525, 535, 536, 539, 547, 549, 550, 554, ...]
Try:
import pandas as pd

dct = {
    -1: [2, 10, 11],
    0: [3, 6, 27, 33],
    1: [0, 5, 21],
    2: [15],
    3: [50, 113, 160, 213, 224],
}
# one row per key: the key goes in "Number", the whole list in "Value"
df = pd.DataFrame({"Number": list(dct.keys()), "Value": list(dct.values())})
print(df)
Prints:
   Number                     Value
0      -1               [2, 10, 11]
1       0            [3, 6, 27, 33]
2       1                [0, 5, 21]
3       2                      [15]
4       3  [50, 113, 160, 213, 224]
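Alternatively, use the dictionary keys as the index (here d is the dictionary from the question):
import pandas as pd

df = pd.DataFrame()
df["Value"] = list(d.values())
df.index = d.keys()
# OR
df = pd.DataFrame.from_dict({k: [v] for k, v in d.items()},
                            orient="index",
                            columns=["Value"])
print(df)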
# Value
# -1 [2, 10, 11, 13, 16, 19, 24, 28, 30, 32, 34, 35...
# 0 [3, 6, 8, 25, 27, 33, 38, 57, 62, 63, 67, 69, ...
# 1 [0, 5, 21, 44, 46, 48, 51, 82, 115, 118, 274, ...
# 2 [15, 39, 60, 61, 149, 157, 222, 250, 289, 320,...
# 3 [50, 113, 160, 213, 224, 229, 238, 239, 352, 4...

OperationalError 1241: Operand should contain 1 column(s)

Code:
import pandas as pd
from pandas.io import sql
from sqlalchemy import create_engine

# inverted_index is a dict mapping each word to a set of document ids
df = pd.DataFrame(list(inverted_index.items()), columns=['words', 'docids'])
engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
                       .format(user="root",
                               pw="shreshre",
                               db="nltk"))
df.to_sql(con=engine, name='documents', if_exists='replace')
Here I want to convert my inverted index, which is a dictionary, into a DataFrame and write it to MySQL, but I am receiving this error:
OperationalError: (pymysql.err.OperationalError) (1241, 'Operand should contain 1 column(s)')
[SQL: INSERT INTO documents (`index`, words, docids) VALUES (%(index)s, %(words)s, %(docids)s)]
[parameters: ({'index': 0, 'words': 'bank', 'docids': {0, 1, 2, 3, 4, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 37, 38, 39, 40, 41, 43, 44, 45, 46, 48, 52, 53, 54, 55, 56, 59, 60, 62, 64, 66, 67, 68, 69 ... (1314 characters truncated) ... 719, 720, 721, 722, 724, 726, 728, 733, 734, 735, 736, 737, 739, 740, 743, 746, 748, 752, 753, 755, 756, 757, 758, 759, 762, 765, 766, 767, 768, 772}}, {'index': 1, 'words': 'defin', 'docids': {0, 2, 354, 612, 773, 76, 84}}, {'index': 2, 'words': 'establish', 'docids': {0, 161, 391, 328, 330, 718, 719, 720, 722, 245, 217, 411, 156}}, {'index': 3, 'words': 'custodi', 'docids': {0, 405}}, {'index': 4, 'words': ',', 'docids': {0, 1, 2, 3, 8, 14, 17, 20, 22, 24, 25, 26, 27, 30, 31, 41, 43, 45, 48, 49, 51, 52, 54, 55, 59, 62, 63, 65, 67, 69, 70, 72, 73, 74, 76, 78, 79, 80, 81 ... (1428 characters truncated) ... 705, 706, 708, 710, 711, 712, 716, 718, 719, 721, 722, 724, 729, 730, 732, 735, 736, 741, 743, 745, 749, 756, 757, 758, 762, 766, 768, 769, 771, 773}}, {'index': 5, 'words': 'loan', 'docids': {0, 512, 517, 519, 538, 29, 33, 34, 557, 558, 47, 559, 564, 574, 578, 580, 70, 73, 76, 79, 616, 621, 113, 114, 115, 116, 117, 123, 124, 127, 128, 129, ... (75 characters truncated) ... 711, 200, 219, 227, 228, 234, 235, 241, 758, 771, 309, 310, 340, 343, 346, 349, 354, 365, 368, 380, 383, 384, 385, 386, 440, 447, 448, 451, 453, 474}}, {'index': 6, 'words': 'exchang', 'docids': {0, 416, 290, 354, 357, 425, 10, 302, 430, 405, 376, 415}}, {'index': 7, 'words': 'issu', 'docids': {0, 386, 419, 676, 390, 397, 302, 272, 274, 306, 722, 700, 350}} ... displaying 10 of 1969 total bound parameter sets ... {'index': 1967, 'words': '86', 'docids': {774}}, {'index': 1968, 'words': 'separ', 'docids': {774}})]
I am not able to understand the existing solutions posted on Stack Overflow regarding a similar error. Could someone please help me out?
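A likely cause, judging from the bound parameters in the error, is that each docids cell holds a Python set, which the driver cannot send as a single column value. A minimal sketch of one workaround is to serialize each set to a string before calling to_sql. The sample inverted_index below is hypothetical, and sqlite3 stands in for MySQL only so the sketch runs without a server:

```python
import json
import sqlite3

import pandas as pd

# Hypothetical stand-in for the asker's inverted_index (word -> set of doc ids).
inverted_index = {"bank": {0, 1, 2}, "defin": {0, 2, 354}, "custodi": {0, 405}}

df = pd.DataFrame(list(inverted_index.items()), columns=["words", "docids"])

# Serialize each set to a JSON string so every cell holds one scalar value.
df["docids"] = df["docids"].apply(lambda ids: json.dumps(sorted(ids)))

# sqlite3 is used here only to keep the sketch self-contained;
# with MySQL, pass the SQLAlchemy engine as `con` instead.
con = sqlite3.connect(":memory:")
df.to_sql(name="documents", con=con, if_exists="replace")

rows = con.execute(
    "SELECT words, docids FROM documents ORDER BY words"
).fetchall()
print(rows)
```

Reading the table back, json.loads turns each string into a list of ids again. A separate table with one row per (word, docid) pair would be another option if the ids need to be queried in SQL.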

How should I find the line of best fit for a possible logarithmic scatter plot?

This is my first time using scikit-learn, and I am struggling to find a line of best fit for the following data:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
y = [0, 187, 262, 296, 319, 340, 359, 376, 388, 401, 411, 414, 423, 430, 433, 439, 446, 452, 457, 461, 465, 469, 470, 470, 472, 474, 479, 484, 486, 487, 489, 489, 491, 491, 491, 494, 494, 498, 500, 500, 500, 500, 505, 506, 506, 506, 506, 507, 508, 509, 509, 509, 511, 511, 512, 514, 515, 515, 515, 517, 517, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 518, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519, 519]
I was able to plot the data with matplotlib using the code below:
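import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy.optimize import curve_fit

# frequency is a DataFrame defined elsewhere; its 'pct' and
# 'Facility Count Military' columns presumably hold the x and y values above
fig, ax = plt.subplots(figsize=(10, 8))
ax1 = plt.scatter(x, y, c='brown')

def func(x, a, b, c):
    return a * np.log2(b + x) + c

popt, pcov = curve_fit(func, frequency['pct'], frequency['Facility Count Military'])
print(popt)
# popt was the following: [4.28209689e+01 1.46600585e-02 2.59467635e+02]
# evaluate the fitted curve with the optimized parameters
ax2 = sns.lineplot(x=frequency['pct'], y=func(frequency['pct'], *popt), color='black')
plt.xlabel('x')
plt.ylabel('y')
plt.ylim([0, 530])
plt.xlim([0, 100])
plt.title('y over x', y=1, fontsize=15, fontweight='semibold')
plt.show()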
(a) Is my methodology correct?
(b) Does it make sense to fit a base-2 logarithmic curve as the line of best fit, or is this a different kind of relationship?
(c) Is there a way to translate "popt" into the line that will eventually be plotted?
Edit: Never mind about part (c). I edited the code accordingly and figured that out on my own.
Any assistance on this is truly appreciated.
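One way to judge whether a logarithmic model is reasonable is to compute a goodness-of-fit number such as R^2 and compare it against other candidate models. The sketch below is a simplification, not the asker's method: it fixes the shift at b = 1 (an assumption, so that the logarithm is defined at x = 0), which makes the model linear in a and c and lets np.polyfit solve it directly. It uses only every tenth point of the question's data for brevity.

```python
import numpy as np

# Every tenth point of the x/y data from the question (a subset, for brevity).
x = np.arange(0, 101, 10)
y = np.array([0, 411, 465, 489, 500, 509, 517, 518, 518, 519, 519], dtype=float)

# Assumption: fix the shift at b = 1 so that log2(x + 1) is defined at x = 0.
# The model y = a * log2(x + 1) + c is then linear in a and c.
t = np.log2(x + 1)
a, c = np.polyfit(t, y, 1)

# Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
y_hat = a * t + c
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"a={a:.2f}, c={c:.2f}, R^2={r_squared:.3f}")
```

The same R^2 calculation can be applied to the curve_fit result with a free b, or to a model that levels off, since the data flatten near 519 while a logarithm keeps growing.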

How to deal with large array in numpy/python when memory becomes issue

I have an array named all with dimensions (19494500, 376). I need to arrange its 376 columns in a particular sequence that I have generated:
l
array([ 0, 94, 188, 282, 1, 95, 189, 283, 2, 96, 190, 284, 3,
97, 191, 285, 4, 98, 192, 286, 5, 99, 193, 287, 6, 100,
194, 288, 7, 101, 195, 289, 8, 102, 196, 290, 9, 103, 197,
291, 10, 104, 198, 292, 11, 105, 199, 293, 12, 106, 200, 294,
13, 107, 201, 295, 14, 108, 202, 296, 15, 109, 203, 297, 16,
110, 204, 298, 17, 111, 205, 299, 18, 112, 206, 300, 19, 113,
207, 301, 20, 114, 208, 302, 21, 115, 209, 303, 22, 116, 210,
304, 23, 117, 211, 305, 24, 118, 212, 306, 25, 119, 213, 307,
26, 120, 214, 308, 27, 121, 215, 309, 28, 122, 216, 310, 29,
123, 217, 311, 30, 124, 218, 312, 31, 125, 219, 313, 32, 126,
220, 314, 33, 127, 221, 315, 34, 128, 222, 316, 35, 129, 223,
317, 36, 130, 224, 318, 37, 131, 225, 319, 38, 132, 226, 320,
39, 133, 227, 321, 40, 134, 228, 322, 41, 135, 229, 323, 42,
136, 230, 324, 43, 137, 231, 325, 44, 138, 232, 326, 45, 139,
233, 327, 46, 140, 234, 328, 47, 141, 235, 329, 48, 142, 236,
330, 49, 143, 237, 331, 50, 144, 238, 332, 51, 145, 239, 333,
52, 146, 240, 334, 53, 147, 241, 335, 54, 148, 242, 336, 55,
149, 243, 337, 56, 150, 244, 338, 57, 151, 245, 339, 58, 152,
246, 340, 59, 153, 247, 341, 60, 154, 248, 342, 61, 155, 249,
343, 62, 156, 250, 344, 63, 157, 251, 345, 64, 158, 252, 346,
65, 159, 253, 347, 66, 160, 254, 348, 67, 161, 255, 349, 68,
162, 256, 350, 69, 163, 257, 351, 70, 164, 258, 352, 71, 165,
259, 353, 72, 166, 260, 354, 73, 167, 261, 355, 74, 168, 262,
356, 75, 169, 263, 357, 76, 170, 264, 358, 77, 171, 265, 359,
78, 172, 266, 360, 79, 173, 267, 361, 80, 174, 268, 362, 81,
175, 269, 363, 82, 176, 270, 364, 83, 177, 271, 365, 84, 178,
272, 366, 85, 179, 273, 367, 86, 180, 274, 368, 87, 181, 275,
369, 88, 182, 276, 370, 89, 183, 277, 371, 90, 184, 278, 372,
91, 185, 279, 373, 92, 186, 280, 374, 93, 187, 281, 375])
So I am doing the following:
all_c = all[:,l]
but I am getting a "memory error".
Can you suggest the most memory-efficient way to do this?
Rather than permuting the whole array at once, you can do it row by row, in place. Try:
for r in range(all.shape[0]):
    all[r] = all[r, l]
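A Python-level loop over roughly 19 million rows may be slow, so a variation on the same idea is to permute a block of rows at a time. The sketch below is illustrative only: permute_columns_inplace and chunk_rows are hypothetical names, and a small array stands in for the real one.

```python
import numpy as np

def permute_columns_inplace(arr, order, chunk_rows=100_000):
    """Reorder the columns of arr in place, one block of rows at a time."""
    for start in range(0, arr.shape[0], chunk_rows):
        block = arr[start:start + chunk_rows]  # a view into arr, no copy
        block[:] = block[:, order]             # temporary copy of one block only
    return arr

# Small stand-in for the real (19494500, 376) array.
data = np.arange(24).reshape(3, 8)

# Interleave 4 groups of 2 columns: [0, 2, 4, 6, 1, 3, 5, 7]
# (the question's order has the same pattern with 4 groups of 94).
order = np.arange(8).reshape(4, 2).T.ravel()

expected = data[:, order].copy()  # reference result, fine at this small size
permute_columns_inplace(data, order, chunk_rows=2)
print(np.array_equal(data, expected))
```

Peak extra memory is then about one block (chunk_rows x 376 values) rather than a second copy of the whole array, and chunk_rows can be tuned to what the machine can hold.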

Python, calculating lag of the center of the data

My two data sets are:
fnamerp1=([ 93, 87, 96, 93, 90, 123, 111, 82, 87, 115, 103,
101, 93, 92, 111, 107, 114, 106, 116, 106, 128, 115,
141, 134, 120, 149, 140, 166, 152, 171, 192, 207, 227,
266, 270, 286, 355, 385, 397, 488, 462, 531, 579, 622,
711, 720, 801, 858, 906, 915, 915, 956, 1004, 1012, 1045,
1076, 1063, 1013, 985, 924, 959, 838, 766, 763, 742, 642,
587, 557, 484, 393, 353, 341, 284, 240, 221, 209, 147,
109, 113, 102, 71, 63, 63, 50, 29, 39, 36, 25,
30, 23, 27, 23, 19, 19, 24, 15, 23, 21, 26,
15])
fnamerp2=([ 105, 89, 120, 121, 103, 105, 113, 94, 104, 115, 122, 116, 121,
129, 118, 126, 138, 146, 161, 163, 178, 192, 194, 222, 268, 272,
285, 342, 380, 378, 373, 448, 493, 511, 571, 603, 691, 772, 738,
796, 839, 832, 883, 930, 963, 975, 972, 931, 947, 941, 934, 964,
871, 869, 826, 793, 733, 708, 606, 610, 515, 483, 409, 352, 358,
264, 266, 205, 191, 167, 136, 138, 99, 102, 82, 57, 65, 53,
51, 32, 26, 27, 39, 21, 29, 23, 25, 24, 16, 17, 27,
33, 19, 13, 24, 26, 18, 22, 18, 20])
I want to find the lag between the centers of the two peaks (not just between their maxima). My plan is to use np.argmax(signal.correlate(fnamerp1,fnamerp2)).
What is the right way to do this, both mathematically and in idiomatic Python?
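Note that the argmax of a full cross-correlation is an index into the correlation array, not a lag; it has to be mapped onto the lag axis. The sketch below shows that mapping with np.correlate, plus a centroid (center of mass) difference as a second estimate. The helper names and the synthetic Gaussian peaks are hypothetical, chosen so the expected shift is known.

```python
import numpy as np

def xcorr_lag(a, b):
    """Lag (in samples) at which the full cross-correlation of a and b peaks.

    A positive result means the feature in a sits to the right of the one in b.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    corr = np.correlate(a, b, mode="full")
    # In "full" mode the lags run from -(len(b) - 1) to len(a) - 1.
    lags = np.arange(-(len(b) - 1), len(a))
    return lags[np.argmax(corr)]

def centroid(signal):
    """Intensity-weighted mean index (center of mass) of a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    idx = np.arange(len(signal))
    return np.sum(idx * signal) / np.sum(signal)

# Synthetic peaks: same shape, centered at samples 50 and 40.
n = np.arange(100)
peak_a = np.exp(-0.5 * ((n - 50) / 5.0) ** 2)
peak_b = np.exp(-0.5 * ((n - 40) / 5.0) ** 2)

lag = xcorr_lag(peak_a, peak_b)                # 10
shift = centroid(peak_a) - centroid(peak_b)    # approximately 10.0
print(lag, shift)
```

On real data with a nonzero background, subtracting a baseline before either calculation may matter, since the background contributes to both the correlation and the centroid. If SciPy is available, scipy.signal.correlation_lags can build the lag axis for scipy.signal.correlate.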
