Node.js equivalent of Python's b'string'?

Given this salt in Python:
salt = b"0000000000000000004d6ec16dafe9d8370958664c1dc422f452892264c59526"
What's the equivalent in Nodejs?
I have this
const salt = Buffer.from("0000000000000000004d6ec16dafe9d8370958664c1dc422f452892264c59526", "hex");
But upon conversion to string, they don't match.

This isn't a duplicate, but it should explain why your code doesn't work: What is the difference between a string and a byte string? A byte string in Python isn't a hex encoding of a string.
The equivalent Node.js code is:
const salt = Buffer.from("0000000000000000004d6ec16dafe9d8370958664c1dc422f452892264c59526");
without 'hex'. The default encoding is 'utf8'.
You can see it with:
salt = b"0000000000000000004d6ec16dafe9d8370958664c1dc422f452892264c59526"
print(' '.join(map(str, salt)))
# output:
# 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 52 100 54 101 99 49 54 100 97 102 101 57 100 56 51 55 48 57 53 56 54 54 52 99 49 100 99 52 50 50 102 52 53 50 56 57 50 50 54 52 99 53 57 53 50 54
and
const salt = Buffer.from("0000000000000000004d6ec16dafe9d8370958664c1dc422f452892264c59526");
console.log(salt.join(' '));
// output:
// 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 52 100 54 101 99 49 54 100 97 102 101 57 100 56 51 55 48 57 53 56 54 54 52 99 49 100 99 52 50 50 102 52 53 50 56 57 50 50 54 52 99 53 57 53 50 54
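As a side note, if the goal really were to decode those hex digits into raw bytes (what the "hex" variant does in Node.js), the Python counterpart would be bytes.fromhex rather than a b'' literal. A minimal sketch:
salt_bytes = bytes.fromhex("0000000000000000004d6ec16dafe9d8370958664c1dc422f452892264c59526")
print(len(salt_bytes))   # 32 bytes, same as Buffer.from(..., "hex").length in Node.js
print(salt_bytes.hex())  # round-trips back to the original hex string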

Related

How to extract the same column from a large dictionary of dataframes efficiently? (PerformanceWarning)

Below is the code to build the dictionary of dataframes for the purpose of this example:
import pandas as pd
import numpy as np
import random
from datetime import datetime

# Generate the large dictionary of dataframes
dic = {}
index = pd.date_range(datetime.strptime('01/01/2021', '%d/%m/%Y'), periods=100).tolist()
for i in np.arange(0, 1000):
    # Initialise data:
    data = {'col1': random.sample(range(0, 100), 100),
            'col2': random.sample(range(0, 100), 100),
            'col3': random.sample(range(0, 100), 100),
            'desired_col': random.sample(range(0, 100), 100),
            'coln': random.sample(range(0, 100), 100),
            }
    # Create DataFrame
    df = pd.DataFrame(index=index, data=data)
    dic['df' + str(i)] = df
I now have a dictionary containing df0, df1, df2, ... dfn, each with unique data but with the same column names and index.
My objective is to extract the same column from all these dfs and save it into a final dataframe.
My temporary solution is a for loop over the keys:
# Generate the new df's index
# Generate the new df's index
df_filtered = pd.DataFrame(index=dic[list(dic.keys())[0]].index)
for key in list(dic.keys()):
    # Extract the df of the dictionary
    df_i = dic[key].copy()
    # Save the relevant column into the final df
    df_filtered[key] = df_i[['desired_col']].copy()
But I get a PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider using pd.concat instead.
I am sure there would be a more elegant way to accomplish the same task.
Many thanks in advance.
You can use .concat + .unstack:
df = pd.concat(dic)
print(df["desired_col"].unstack(level=0))
Prints:
            df0  df1  df2  df3  df4  ...
2021-01-01   82   26   60    7   21  ...
2021-01-02   46   77   94   40   15  ...
...
[100 rows x 1000 columns] (one column per dataframe in dic; full output truncated)
How about something akin to this:
pd.concat([i['desired_col'] for i in dic.values()], axis=1)
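If you also want the resulting columns to be labelled by the dictionary keys (df0, df1, ...) rather than all being named desired_col, a small variation of the same idea (a sketch building on the dic from the question) is to pass a dict comprehension to pd.concat:
# One column per dataframe, labelled by its key in dic; rows stay aligned on the shared date index.
df_filtered = pd.concat({k: v['desired_col'] for k, v in dic.items()}, axis=1)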

Downsample a matrix from an image in Python

I need help to understand how to downsample a matrix from an image to a matrix of 10x10.
Libraries:
import sys
import cv2
import numpy
from PIL import Image
from numpy import array
I used PIL to open the image and NumPy to get the matrix as an array:
im_1 = Image.open(r"C:\Users\DELL\Downloads\FaceDataset\s1\1.pgm")
ar = array(im_1)
numpy.set_printoptions(threshold=sys.maxsize)
print("Orignal Image Matrix")
print(ar)
output:
[[ 48  49  45  47  49  57  39  42  53  49  53  60  76  91  99  95  80  75
   66  54  47  49  50  43  46  53  61  70  84 105 133 130 110  94  81 107
   95  80  57  55  66  86  80  74  65  71  62  84  52  74  71  67  64  88
   68  71  75  66  57  61  62  52  47  50  58  60  64  66  57  46  54  66
   80  80  68  71  87  64  77  66  83  77  58  46  41  43  56  55  51  56
   56  54]
This is not the complete output. How can I downsample this into a 10x10 matrix?
Note: I used OpenCV's pyrDown, but it didn't give the output I need.
I solved it by slicing the 2D array:
dsAr = ar[::12, ::10]
print(dsAr)
Note: the slicing notation is start:stop:step.
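If the goal is an exact 10x10 result that averages pixels rather than just picking every nth one, a short alternative (a sketch using the cv2 import already listed above) is to resize with area interpolation:
# Shrink the full image array to 10x10; INTER_AREA averages the pixels in each block.
small = cv2.resize(ar, (10, 10), interpolation=cv2.INTER_AREA)
print(small.shape)  # (10, 10)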

Unable to return correct output from created function in JES

I am fairly new to Jython and am trying to create a function that deals with a list. Basically, the function below should loop through the entirety of my list, find the lowest value within that list, and return it. However, I keep getting a return of function min at 0x26. Every time I execute main() I receive the same message, except the address counts up, e.g. 0x27, 0x28... Not sure why this is, as my list only contains integers between a minimum of 0 and a maximum of 99.
Here is the source code:
def min(dataset):  # defining a function minimum, with input dataset (the list we are using)..
    min = dataset[0]
    for num in range(0, len(dataset)):
        if dataset[num] < min:
            min = dataset(num)
    return min

minimum = min(dataset)
print(str(minimum))
Here is the code in its entirety. I currently have a working way to find the min/max values in the list, but I am looking to move that logic into a function, as I want to learn how to use a function correctly.
def main():
    dataset = [0]
    file = open("D:\numbs.dat", "r")
    for line in file:                        # loop over every line and store it
        num = int(float(line))               # converting the string to an int
        dataset.append(num)                  # appending the value to the 'dataset' list
    max = dataset[0]                         # setting an imaginary max initially
    low = dataset[0]                         # setting an imaginary low initially
    for i in range(0, len(dataset)):         # for loop to scan through the entire list
        if dataset[i] > max:                 # find the highest value, replacing max as we go
            max = dataset[i]
        if dataset[i] < low:                 # find the lowest value, replacing low as we go
            low = dataset[i]
    #printNow(dataset)                       # printing the list in its entirety
    #printNow("The maximum is " + str(max))  # print the highest and lowest values
    #printNow("The lowest is " + str(low))

    def min(dataset):                        # defining a function minimum..
        min = dataset[0]
        for num in range(0, len(dataset)):
            if dataset[num] < min:
                min = dataset(num)
        return min

    minimum = min(dataset)
    print(str(minimum))                      # test to see what the output is
As mentioned above, the for loop for finding the max/min values works. I tried doing exactly the same thing inside the function I am trying to create...
The contents of numbs.dat (1001 entries):
70
75
76
49
73
76
52
63
11
25
19
89
17
48
5
48
29
41
23
84
28
39
67
48
97
34
0
24
47
98
0
64
24
51
45
11
37
77
5
54
53
33
91
0
27
0
80
5
11
66
45
57
48
25
72
8
38
29
93
29
58
5
72
36
94
18
92
17
43
82
44
93
10
38
31
52
44
10
50
22
39
71
46
40
33
51
51
57
27
24
40
61
88
87
40
85
91
99
6
3
56
10
85
38
61
91
31
69
39
74
9
17
80
96
49
0
47
68
12
5
6
60
81
51
62
87
70
66
50
30
30
22
45
35
2
39
23
63
35
69
83
84
69
6
54
74
3
29
31
54
45
79
21
74
30
77
77
80
26
63
84
21
58
54
69
2
50
79
90
26
45
29
97
28
57
22
59
2
72
1
92
35
38
2
47
23
52
77
87
34
84
15
84
13
23
93
19
50
99
74
59
4
73
93
29
61
8
45
10
20
15
95
58
43
75
19
61
39
68
47
69
58
88
82
33
30
72
21
74
12
18
0
52
50
62
21
66
26
56
84
16
12
7
45
58
22
26
95
82
6
74
12
16
2
61
58
22
39
0
53
88
79
71
13
54
25
31
93
48
91
90
45
23
54
42
39
78
25
95
58
2
41
61
72
98
91
48
97
93
11
12
1
35
80
81
86
38
70
67
55
55
87
73
79
31
43
97
79
3
51
17
58
70
34
59
61
28
46
13
42
18
0
18
75
75
62
50
62
85
49
83
71
63
32
27
59
42
46
8
13
39
25
13
94
17
48
73
40
31
31
86
23
81
40
92
24
94
67
30
18
74
78
62
89
1
27
95
99
33
53
74
5
84
88
8
52
0
24
21
99
1
74
84
94
29
25
83
93
98
40
21
66
93
28
72
63
77
9
71
18
87
50
77
48
68
88
22
33
16
79
68
69
94
64
5
28
33
22
21
74
44
62
68
47
93
69
9
42
44
87
64
97
42
34
90
70
91
12
18
84
65
23
99
1
55
6
1
23
92
50
96
96
68
27
17
98
42
10
27
26
20
13
94
73
75
12
12
25
33
1
33
67
61
0
98
71
35
75
68
56
45
11
1
69
57
9
15
96
69
2
0
65
44
86
78
97
17
4
81
23
4
43
24
72
70
57
21
91
84
94
40
96
40
78
46
67
6
7
16
49
24
14
12
82
73
60
42
76
62
10
84
49
75
89
43
47
31
68
15
11
32
37
98
72
40
25
69
30
64
60
48
21
11
74
54
24
60
10
96
29
39
53
48
24
68
4
52
12
6
91
15
86
77
65
68
22
91
36
72
82
81
9
77
0
5
83
27
88
17
35
66
76
78
81
19
51
87
66
26
59
65
2
37
37
73
34
98
37
78
92
17
52
62
40
50
84
34
22
25
42
90
19
86
76
68
42
9
89
57
78
64
89
12
34
94
9
77
58
32
27
97
93
79
35
32
75
97
79
65
90
53
43
98
4
99
5
79
38
99
60
78
64
90
2
39
42
52
2
21
77
15
8
87
13
0
4
7
43
76
31
74
16
87
50
73
49
14
35
10
37
91
44
88
71
95
75
98
7
17
23
13
16
77
20
50
50
74
78
58
30
21
74
76
93
5
74
94
83
23
67
18
5
50
47
56
79
26
84
78
48
71
43
41
8
91
23
7
11
96
87
12
42
32
44
99
67
99
64
96
52
19
79
60
66
52
62
17
61
54
24
25
36
4
78
3
94
91
62
65
76
94
2
52
25
61
55
49
88
85
96
5
46
56
48
17
25
3
70
62
3
50
45
47
58
12
41
27
42
90
91
71
53
4
79
47
68
43
87
35
63
10
49
4
81
45
88
80
6
92
47
70
40
7
33
70
61
30
9
55
42
83
26
72
57
77
91
13
15
33
13
62
49
43
65
73
98
59
56
77
62
12
25
33
53
78
73
1
17
44
56
95
10
33
89
33
20
56
69
66
60
53
83
58
43
33
25
21
8
28
65
51
70
53
78
49
30
64
17
76
9
2
32
87
77
39
25
21
66
65
54
81
49
15
27
7
14
4
11
94
9
84
23
13
95
45
67
57
20
3
58
50
97
35
68
47
41
84
59
46
34
19
25
77
29
41
89
80
61
70
40
1
18
32
70
86
76
25
98
99
40
43
92
43
4
70
78
72
71
85
14
84
73
92
60
23
57
44
56
6
96
39
91
63
43
39
71
80
18
93
54
1
4
46
68
93
74
74
88
52
88
55
24
19
92
53
59
1
91
48
47
Let me know what the heck I am doing wrong. Thanks!
@ohGosh welcome to Stack Overflow. You are almost there with the solution. There are a few problems with your program:
1) The numbs.dat file contains just one line with numbers separated by spaces, not newlines (\n). To read the numbers from the file, do the following:
dataset = []                      # Create an empty list
file = open("D:\numbs.dat", "r")  # Open the file
for line in file:
    tempData = line.split(" ")    # Returns a list with space used as delimiter
    dataset = map(int, tempData)  # Convert string data to int
2) Wrong way to get data from the list in the min function. Use
min = dataset[num]
instead of
min = dataset(num)
(square brackets index the list; parentheses try to call it).
Fix this and your program will work. Cheers.
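For reference, here is a sketch of the min function with that indexing fix applied (renamed to find_min here only to avoid shadowing the built-in min; keeping your original name also works in JES):
def find_min(dataset):
    lowest = dataset[0]
    for num in range(0, len(dataset)):
        if dataset[num] < lowest:
            lowest = dataset[num]  # square brackets: index the list, not call it
    return lowest

minimum = find_min(dataset)
print(str(minimum))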

Object Similarity Pandas and Scikit Learn

Is there a way to find and rank rows in a Pandas DataFrame by their similarity to a row from another DataFrame?
My understanding of your question: you have two dataframes, hopefully with the same column count. You want to rate the members of the first (subject) dataframe by how close, i.e. how similar, they are to any of the members of the target dataframe.
I am not aware of a built-in method.
It is probably not the most efficient way, but here is how I'd approach it:
#! /usr/bin/python3
import pandas as pd
import numpy as np
import pprint

pp = pprint.PrettyPrinter(indent=4)

# Simulate data
df_subject = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))  # The one we iterate over to check similarity to the target.
df_target = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))   # The one we check distance to.

# This will hold the min distances.
distances = []

# Loop to iterate over the subject DF
for ix1, subject in df_subject.iterrows():
    distances_cur = []
    # Loop to iterate over the target DF
    for ix2, target in df_target.iterrows():
        distances_cur.append(np.linalg.norm(target - subject))
    # Get the minimum distance for the subject set member.
    distances.append(min(distances_cur))

# Distances to df
distances = pd.DataFrame(distances)

# Normalize.
distances = 0.5 - (distances - distances.mean(axis=0)) / distances.max(axis=0)

# Column index joining, ordering and beautification.
Proximity_Ratings_name = 'Proximity Ratings'
distances = distances.rename(columns={0: Proximity_Ratings_name})
df_subject = df_subject.join(distances)
pp.pprint(df_subject.sort_values(Proximity_Ratings_name, ascending=False))
It should yield something like the table below. A higher rating means there's a similar member in the target dataframe:
A B C D Proximity Ratings
55 86 21 91 78 0.941537
38 91 31 35 95 0.901638
43 49 89 49 6 0.878030
98 28 98 98 36 0.813685
77 67 23 78 84 0.809324
35 52 16 36 58 0.802223
54 2 25 61 44 0.788591
95 76 3 60 46 0.766896
5 55 39 88 37 0.756049
52 79 71 90 70 0.752520
66 52 27 82 82 0.751353
41 45 67 55 33 0.739919
76 12 93 50 62 0.720323
94 99 84 39 63 0.716123
26 62 6 97 60 0.715081
40 64 50 37 27 0.714042
68 70 21 8 82 0.698824
47 90 54 60 65 0.676680
7 85 95 45 71 0.672036
2 14 68 50 6 0.661113
34 62 63 83 29 0.659322
8 87 90 28 74 0.647873
75 14 61 27 68 0.633370
60 9 91 42 40 0.630030
4 46 46 52 35 0.621792
81 94 19 82 44 0.614510
73 67 27 34 92 0.608137
30 92 64 93 51 0.608137
11 52 25 93 50 0.605770
51 17 48 57 52 0.604984
.. .. .. .. .. ...
64 28 56 0 9 0.397054
18 52 84 36 79 0.396518
99 41 5 32 34 0.388519
27 19 54 43 94 0.382714
92 69 56 73 93 0.382714
59 1 29 46 16 0.374878
58 2 36 8 96 0.362525
69 58 92 16 48 0.361505
31 27 57 80 35 0.349887
10 59 23 47 24 0.345891
96 41 77 76 33 0.345891
78 42 71 87 65 0.344398
93 12 31 6 27 0.329152
23 6 5 10 42 0.320445
14 44 6 43 29 0.319964
6 81 51 44 15 0.311840
3 17 60 13 22 0.293066
70 28 40 22 82 0.251549
36 95 72 35 5 0.249354
49 78 10 30 18 0.242370
17 79 69 57 96 0.225168
46 42 95 86 81 0.224742
84 58 81 59 86 0.221346
9 9 62 8 30 0.211659
72 11 51 74 8 0.159265
90 74 26 80 1 0.138993
20 90 4 6 5 0.117652
50 3 12 5 53 0.077088
42 90 76 42 1 0.075284
45 94 46 88 14 0.054244
Hope I understood correctly. Don't use this if performance matters; I'm sure there's an algebraic way (multiplying matrices) to approach this that would run way faster.
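For what it's worth, that faster vectorized version can be sketched with SciPy's cdist (assuming SciPy is available and reusing df_subject / df_target from above); it computes all pairwise Euclidean distances in one call and then takes the per-row minimum:
from scipy.spatial.distance import cdist

# 100x100 matrix of Euclidean distances between every subject row and every target row.
dist_matrix = cdist(df_subject[list('ABCD')].values, df_target[list('ABCD')].values)
min_distances = dist_matrix.min(axis=1)  # distance to the nearest target row, per subject row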

Python index error pops up when I have a longer input

I am solving a sorting problem and have run into an issue that has been troubling me for 2 days.
I ran the code on a shorter input, and it worked just as I expected.
However, when I feed a much longer input into the program, a runtime error emerges.
Here is the code:
row_number, row_length = input().split()
row_number, row_length = int(row_number), int(row_length)

def row_input():
    data_input = []
    for i in range(0, row_number):
        row = list(map(int, input().split()))
        data_input.append(row)
    return data_input

def sort_data(data):
    k = int(input())
    sorted_data = []
    for row in data:
        sorted_data.append(row[k])
    sorted_data.sort()
    n = 0
    while n < row_number:
        for m in data:
            if sorted_data[n] == m[k]:
                print_data(m)
                n = n + 1

def print_data(data):
    b = ''
    for n in data:
        b = b + str(n).ljust(len(str(n)) + 1)
    print(b)

data = row_input()
sort_data(data)
Here is the short input:
10 3
1 1 1
1 1 2
1 1 3
1 1 4
2 2 5
2 3 6
2 3 7
2 3 8
2 3 9
2 4 0
1
Here is the longer input:
100 10
64 79 18 94 46 81 74 97 71 92
46 24 23 20 68 15 53 93 24 91
17 66 34 64 28 5 55 25 44 96
16 71 80 84 5 79 63 77 69 77
33 77 24 13 58 81 41 36 73 62
93 26 16 55 61 51 39 69 29 45
44 85 1 48 23 59 52 82 50 37
77 74 9 21 35 54 81 57 32 76
82 21 72 49 98 21 77 64 6 63
68 17 93 83 12 43 84 28 96 86
9 16 3 89 38 11 70 25 41 38
49 99 31 19 85 97 80 63 16 69
50 85 80 75 36 48 56 69 63 94
78 80 83 86 92 60 56 90 22 73
69 81 45 9 67 25 82 46 68 82
98 38 23 31 38 83 37 76 69 82
95 48 21 64 25 6 38 96 69 23
44 97 46 54 21 56 65 51 66 34
87 22 27 24 55 48 90 10 8 51
21 6 74 78 8 88 26 63 72 43
64 4 42 20 54 91 2 51 79 40
93 76 52 58 40 78 98 27 53 48
85 23 86 30 91 49 81 4 59 9
88 96 77 95 36 71 7 52 14 20
69 98 21 94 14 35 28 97 3 9
60 47 56 34 35 61 9 44 80 92
4 76 57 28 60 3 46 4 6 17
59 44 88 7 71 60 84 12 91 38
76 57 5 2 25 12 46 62 32 68
14 15 11 1 34 20 54 58 45 38
89 49 16 43 74 51 80 22 88 31
8 98 51 73 32 13 59 12 56 92
36 82 9 63 77 79 77 25 52 91
63 82 58 75 13 20 79 89 55 89
58 37 93 1 29 72 78 95 47 35
90 82 58 60 55 86 82 22 44 94
55 17 51 99 29 92 1 79 96 34
32 78 41 1 24 52 11 80 3 25
30 32 32 71 85 80 63 23 80 97
35 22 11 71 10 48 43 58 31 33
30 98 60 58 28 71 95 28 21 29
74 4 13 99 90 64 28 27 73 4
52 21 52 31 35 82 35 64 21 71
92 85 13 48 5 32 92 70 15 85
47 55 25 80 24 22 19 78 17 43
3 91 71 53 49 39 96 88 59 61
79 26 98 2 95 95 70 38 82 85
69 67 41 11 95 39 20 19 96 36
11 74 48 23 84 49 47 43 27 90
4 28 35 14 70 62 52 94 46 91
72 11 14 82 59 51 93 98 55 79
90 84 84 24 21 81 11 57 27 78
98 97 59 51 89 40 96 35 25 59
73 85 64 17 46 9 79 54 27 15
48 91 7 56 41 6 4 26 96 39
43 22 34 89 52 59 55 52 38 42
10 31 9 8 21 46 29 4 97 4
44 49 78 31 53 29 11 35 46 14
44 39 57 35 9 63 85 5 97 24
9 72 49 50 41 47 23 71 15 45
51 6 98 64 75 35 39 48 2 50
92 22 72 60 96 15 17 4 79 27
90 30 98 28 92 8 83 71 24 62
5 54 86 14 71 96 87 2 58 78
37 61 60 30 46 96 49 58 27 48
14 59 22 35 75 60 55 28 91 85
21 1 85 85 78 67 24 69 22 17
76 61 84 64 33 76 61 10 33 95
71 9 1 32 31 80 69 7 25 59
69 64 78 85 21 88 56 70 92 74
79 12 8 9 54 56 37 44 1 84
6 66 54 5 82 17 41 25 3 71
8 44 63 17 75 43 87 15 85 3
15 42 15 59 38 22 46 27 19 13
54 71 76 93 67 39 46 12 78 46
23 82 71 34 31 61 94 58 10 62
30 8 43 38 7 23 77 38 93 32
32 72 46 59 64 45 14 73 62 72
76 26 47 89 25 73 79 28 60 48
41 58 85 55 29 64 39 84 20 87
24 8 70 16 69 32 17 26 58 16
40 53 40 63 22 37 11 74 7 8
23 4 56 39 27 94 91 72 14 61
41 86 3 29 41 15 99 50 82 84
33 5 22 93 73 86 99 87 26 66
73 25 55 46 69 38 99 14 43 55
43 21 82 30 90 66 6 67 49 25
81 38 65 40 80 7 90 82 33 13
18 45 1 90 53 51 51 96 32 90
32 69 51 22 71 85 80 61 99 23
88 8 41 92 4 25 64 89 30 75
93 85 99 87 67 3 54 16 98 57
33 54 31 83 64 93 3 24 65 81
74 19 15 66 17 14 34 50 57 16
10 30 20 97 32 85 83 89 68 18
46 82 9 14 54 50 55 28 26 96
29 96 3 33 12 52 11 26 19 22
50 81 95 59 76 53 10 9 72 87
25 85 54 43 53 13 52 70 38 76
20 14 30 80 23 43 27 67 42 11
5
Here is the error while running the longer input:
Traceback (most recent call last):
  File "solution.py", line 30, in <module>
    sort_data(data)
  File "solution.py", line 19, in sort_data
    if sorted_data[n] == m[k]:
IndexError: list index out of range
The problem is in your sorting logic: if there are multiple matching rows in the dataset, you can increment n more than once within a single pass of the while loop, so sorted_data[n] may be evaluated after n has already run past the end of the list.
The right solution is simpler than you think:
def sort_data(data):
    k = int(input())
    output = sorted(data, key=lambda row: row[k])
    for r in output:
        print_data(r)
UPDATE: The smallest dataset on which your algorithm fails is:
2 1
2
1
0
A small modification to your function will stop it from over-indexing. The key is to store sorted_data[n] in a variable before the inner loop, so that it will not try to index sorted_data again when no more output is expected.
def sort_data(data):
    k = int(input())
    sorted_data = []
    for row in data:
        sorted_data.append(row[k])
    sorted_data.sort()
    n = 0
    while n < row_number:
        key = sorted_data[n]
        for m in data:
            if key == m[k]:
                print_data(m)
                n = n + 1
UPDATE:
The sorted function's key parameter is a function that selects the value to sort by. In your case, it selects the kth column, which is what you want to sort on.
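As a side note (not needed for the fix), operator.itemgetter expresses the same column selection without a lambda; a small sketch:
from operator import itemgetter

def sort_data(data):
    k = int(input())
    for r in sorted(data, key=itemgetter(k)):  # itemgetter(k) behaves like lambda row: row[k]
        print_data(r)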
