I have a DataFrame with more than 2 million rows that looks like this:
+-------------------+--------------+--------+----------------------------+-------------+
| PartitionKey | RowKey | Type | Path | Name |
+-------------------+--------------+--------+----------------------------+-------------+
| / | /People | Folder | /People | People |
| /People | /index1.xlsx | File | /People/index1.xlsx | index1.xlsx |
| /People | /index2.xlsx | File | /People/index2.xlsx | index2.xlsx |
| /People | /index3.xlsx | File | /People/index3.xlsx | index3.xlsx |
| /People | /Employees | Folder | /People/Employees | Employees |
| /People/Employees | /cv1.pdf | File | /People/Employees/cv1.pdf | cv1.pdf |
| /People/Employees | /cv2.pdf | File | /People/Employees/cv2.pdf | cv2.pdf |
| /People/Employees | /cv3.pdf | File | /People/Employees/cv3.pdf | cv3.pdf |
| / | /Buildings | Folder | /Buildings | Buildings |
| /Buildings | /index1.xlsx | File | /Buildings/index1.xlsx | index1.xlsx |
| /Buildings | /index2.xlsx | File | /Buildings/index2.xlsx | index2.xlsx |
| /Buildings | /index3.xlsx | File | /Buildings/index3.xlsx | index3.xlsx |
| /Buildings | /Rooms | Folder | /Buildings/Rooms | Rooms |
| /Buildings/Rooms | /room1.pdf | File | /Buildings/Rooms/room1.pdf | room1.pdf |
| /Buildings/Rooms | /room2.pdf | File | /Buildings/Rooms/room2.pdf | room2.pdf |
| /Buildings/Rooms | /room3.pdf | File | /Buildings/Rooms/room3.pdf | room3.pdf |
+-------------------+--------------+--------+----------------------------+-------------+
I want to add two new columns: DirectFileCount and RecursiveFileCount.
For each folder, these should give the number of files directly inside the folder, and the number of files inside the folder and all of its subfolders recursively, following the Path --> PartitionKey relationship between folders and files.
It should make the DataFrame look like this:
+-------------------+--------------+--------+---------------------------+-------------+-----------------+--------------------+
| PartitionKey | RowKey | Type | Path | Name | DirectFileCount | RecursiveFileCount |
+-------------------+--------------+--------+---------------------------+-------------+-----------------+--------------------+
| / | /People | Folder | /People | People | 3 | 6 |
| /People | /index1.xlsx | File | /People/index1.xlsx | index1.xlsx | 0 | 0 |
| /People | /index2.xlsx | File | /People/index2.xlsx | index2.xlsx | 0 | 0 |
| /People | /index3.xlsx | File | /People/index3.xlsx | index3.xlsx | 0 | 0 |
| /People | /Employees | Folder | /People/Employees | Employees | 3 | 3 |
| /People/Employees | /cv1.pdf | File | /People/Employees/cv1.pdf | cv1.pdf | 0 | 0 |
| /People/Employees | /cv2.pdf | File | /People/Employees/cv2.pdf | cv2.pdf | 0 | 0 |
| /People/Employees | /cv3.pdf | File | /People/Employees/cv3.pdf | cv3.pdf | 0 | 0 |
+-------------------+--------------+--------+---------------------------+-------------+-----------------+--------------------+
I have something for direct count that works:
# Count rows per (Type, PartitionKey) and keep only the file counts, indexed by parent folder
df_count = df.groupby(['Type', 'PartitionKey']).size().reset_index(name='counts')
df_file_count = df_count[df_count['Type'] == 'File'].set_index('PartitionKey')

def direct_count(row):
    if row['Type'] == 'Folder':
        try:
            return df_file_count.loc[row['Path']].counts
        except KeyError:
            # Folder contains no files directly
            pass
    return 0

df['DirectFileCount'] = df.apply(direct_count, axis=1)
The above code takes care of DirectFileCount and completes in less than 2 minutes.
UPDATE 16th Oct. 2019
I got RecursiveFileCount completed, but it took 1h 52mins. Code below:
dfc = df[df['Type'] == 'Folder'][['PartitionKey', 'DirectFileCount']].set_index('PartitionKey').groupby('PartitionKey').sum()

def recursive_count(row):
    count = 0
    if row['Type'] == 'Folder':
        count = dfc[dfc.index.str.startswith(row['Path'])]['DirectFileCount'].sum()
    return count

df['RecursiveFileCount'] = df.apply(recursive_count, axis=1)
I got it working and it produces the results I need. However, it is fairly slow with 2.7 million rows, so hopefully someone has an idea for improving performance.
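One possible way to avoid the row-wise apply is to count files per parent folder once and then push each count up to every ancestor folder. The following is only a sketch on a small made-up DataFrame, not a benchmarked solution. It assumes that paths use "/" as the separator, have no trailing slash, and that a folder's Path is spelled exactly like the PartitionKey of its children.

```python
from collections import Counter

import pandas as pd

# Small made-up sample with the same columns as the question
df = pd.DataFrame({
    "PartitionKey": ["/", "/People", "/People", "/People", "/People/Employees"],
    "Path": ["/People", "/People/index1.xlsx", "/People/index2.xlsx",
             "/People/Employees", "/People/Employees/cv1.pdf"],
    "Type": ["Folder", "File", "File", "Folder", "File"],
})
is_folder = df["Type"] == "Folder"

# Number of files directly inside each folder, indexed by the folder path
direct = df.loc[df["Type"] == "File", "PartitionKey"].value_counts()

# Add each folder's direct count to the folder itself and to every ancestor
recursive = Counter()
for folder, n in direct.items():
    path = folder
    while path not in ("", "/"):
        recursive[path] += n
        path = path.rsplit("/", 1)[0]  # step up to the parent folder

# Look the counts up by Path; files and empty folders get 0
df["DirectFileCount"] = (
    df["Path"].map(direct).where(is_folder, 0).fillna(0).astype(int)
)
df["RecursiveFileCount"] = (
    df["Path"].map(dict(recursive)).where(is_folder, 0).fillna(0).astype(int)
)
print(df[["Path", "DirectFileCount", "RecursiveFileCount"]])
```

The Python loop runs once per folder that contains files, and once more for each of its ancestors, rather than once per row, which is why this may scale better than scanning the whole index with startswith for every row.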
Related
I have unordered (not systematically arranged) data as follows:
+-------------+------------------+
| x | y |
+-------------+------------------+
| 0.049098 | 82854.2105263158 |
| 0.049058 | 82472.2368421053 |
| 0.066427 | 84358.3421052632 |
| 0.066465 | 83842.9210526316 |
| 0.06095 | 71843.6052631579 |
| 0.060989 | 71951.7368421053 |
| 0.066999 | 84068.5526315789 |
| 0.067037 | 83808.5263157895 |
| 0.089523 | 101753.684210526 |
| 0.089483 | 101556.842105263 |
| 0.084839 | 97206.7105263158 |
| 0.084876 | 97108.8421052632 |
| 0.063842 | 88679.5263157895 |
| 0.063802 | 88309.9210526316 |
| 0.037228 | 11944.3947368421 |
| 0.037268 | 11993.3421052632 |
| 0.029821 | 8830.4131578947 |
| 0.029861 | 8822.0815789474 |
| 0.03014 | 8938.6973684211 |
| 0.03018 | 8964.6868421053 |
| 0.00817 | 138170 |
| 0.0081363 | 137640 |
| 0.093207 | 103233.947368421 |
| 0.093244 | 103177.631578947 |
| 0.097776 | 106011.578947368 |
| 0.097814 | 106073.421052632 |
| 0.0089158 | 135440 |
| 0.0088818 | 136660 |
| 0.02952 | 90309.3421052632 |
| 0.029481 | 89523 |
| 0.034049 | 10589.8342105263 |
| 0.034089 | 10666.1973684211 |
| 0.086063 | 98010.6315789474 |
| 0.0861 | 98108.9736842105 |
| 0.045509 | 82204.8947368421 |
| 0.045469 | 81673.8947368421 |
| 0.03045 | 87057.7105263158 |
| 0.030411 | 89830.2105263158 |
| 0.0072205 | 5150.6763157895 |
| 0.0072587 | 5151.1710526316 |
| 0.068255 | 83407.7894736842 |
| 0.068293 | 83492.1052631579 |
| 0.06145 | 73967.8684210526 |
| 0.061488 | 74132.5789473684 |
| 0.027424 | 8204.1210526316 |
| 0.027464 | 8205.0763157895 |
| 0.027184 | 8141.3947368421 |
| 0.027224 | 8146.002631579 |
| 0.046346 | 81611.4736842105 |
| 0.046306 | 81550.8947368421 |
| 0.058526 | 58270.5526315789 |
| 0.058564 | 58725.2631578947 |
| 0.051402 | 29829.4473684211 |
| 0.05144 | 29684.2105263158 |
| 0.0014855 | 5757.1894736842 |
| 0.0015227 | 5742.7289473684 |
| 0.068954 | 91718 |
| 0.068914 | 91719.2894736842 |
| 0.091635 | 104896.052631579 |
| 0.091595 | 104854.210526316 |
| 0.038524 | 82972.3421052632 |
| 0.038484 | 83128.8684210526 |
| 0.0094275 | 133930 |
| 0.0093933 | 133770 |
| 0.098839 | 105576.842105263 |
| 0.098833 | 105552.105263158 |
| 0.0087119 | 136020 |
| 0.0086779 | 136080 |
| 0.049537 | 82553.5789473684 |
| 0.049497 | 82109.6578947368 |
| 0.0099132 | 5289.7473684211 |
| 0.0099519 | 5290.3421052632 |
| 0.069273 | 91867.1842105263 |
| 0.069233 | 91812 |
| 0.039564 | 12888.1578947368 |
| 0.039603 | 13071.3947368421 |
| 0.023351 | 7234.2473684211 |
| 0.023391 | 7244.1789473684 |
| 0.085419 | 94590.4210526316 |
| 0.085379 | 94557.6578947368 |
| 0.077463 | 90603.6315789474 |
| 0.0775 | 90673.7105263158 |
| 0.015378 | 5389.0578947369 |
| 0.015417 | 5376.9263157895 |
| 0.090262 | 101246.315789474 |
| 0.090299 | 101226.578947368 |
| 0.090969 | 101686.052631579 |
| 0.091006 | 101689.210526316 |
| 0.040275 | 13386.1578947368 |
| 0.040314 | 13415.2368421053 |
| 0.065053 | 84553.3157894737 |
| 0.065091 | 84354.8157894737 |
| 0.041064 | 13609.8684210526 |
| 0.041103 | 13574 |
| 0.028143 | 8369.052631579 |
| 0.028183 | 8367.3710526316 |
| 0.041057 | 81182.5789473684 |
| 0.041018 | 82941.2368421053 |
| 0.049623 | 23859.1315789474 |
| 0.049662 | 24037.2368421053 |
| 0.036984 | 83875.2368421053 |
| 0.036945 | 84167.6315789474 |
| 0.083188 | 94965.8947368421 |
| 0.083148 | 94978.3421052632 |
| 0.058336 | 57671 |
| 0.058374 | 57815.0263157895 |
| 0.089369 | 100686.315789474 |
| 0.089406 | 100711.052631579 |
| 0.069592 | 92141.2368421053 |
| 0.069552 | 92152.5789473684 |
| 0.025793 | 94431.7894736842 |
| 0.025755 | 94648.2368421053 |
| 0.098749 | 105945.526315789 |
| 0.098743 | 105963.421052632 |
| 0.0098043 | 132600 |
| 0.00977 | 133260 |
| 0.060082 | 62819.4210526316 |
| 0.06012 | 62454.0789473684 |
| 0.033059 | 86467 |
| 0.03302 | 86483.9736842105 |
| 0.0036852 | 153970 |
| 0.003653 | 154510 |
| 0.0085422 | 136250 |
| 0.0085083 | 136650 |
| 0.056689 | 84997.6578947368 |
| 0.056649 | 85002 |
| 0.096125 | 105024.736842105 |
| 0.096162 | 105118.421052632 |
| 0.021898 | 101320 |
| 0.02186 | 101500 |
| 0.076134 | 94468.3157894737 |
| 0.076094 | 94467.5789473684 |
| 0.053904 | 42192.3157894737 |
| 0.053942 | 42010.052631579 |
| 0.098707 | 106005 |
| 0.098701 | 106008.947368421 |
| 0.049546 | 23698.6052631579 |
| 0.049584 | 23714.5263157895 |
| 0.067636 | 90985 |
| 0.067596 | 90934 |
| 0.053851 | 84300.7368421053 |
| 0.053811 | 83469.3157894737 |
| 0.075979 | 88750.5 |
| 0.076016 | 88815.7894736842 |
| 0.071428 | 93062.2894736842 |
| 0.071388 | 93069.7368421053 |
| 0.0036544 | 5232.8736842105 |
| 0.0036921 | 5239.252631579 |
| 0.047332 | 17610.8947368421 |
| 0.047371 | 17750.7894736842 |
| 0.03975 | 82067.8684210526 |
| 0.03971 | 83408.6578947368 |
| 0.0038426 | 5238.65 |
| 0.0038802 | 5240.4131578947 |
| 0.014999 | 116980 |
| 0.014964 | 116700 |
| 0.06467 | 84685.7368421053 |
| 0.064709 | 84580.6578947368 |
| 0.047137 | 17372.6578947368 |
| 0.047176 | 17571.9473684211 |
| 0.083467 | 96096.2368421053 |
| 0.083504 | 96070.5263157895 |
| 0.037576 | 82819.5789473684 |
| 0.037536 | 84310.5526315789 |
| 0.016188 | 114852.368421053 |
| 0.016152 | 116141.842105263 |
| 0.016477 | 113262.631578947 |
| 0.016441 | 113310 |
| 0.02246 | 100100 |
| 0.022423 | 100270 |
| 0.005354 | 5153.0815789474 |
| 0.005392 | 5142.1552631579 |
| 0.0284 | 90929 |
| 0.028362 | 91147.2631578947 |
| 0.083626 | 94884.6578947368 |
| 0.083586 | 94890.3421052632 |
| 0.06532 | 89854.8947368421 |
| 0.06528 | 89501 |
| 0.094505 | 106278.947368421 |
| 0.094465 | 106231.578947368 |
| 0.076387 | 88890.1315789474 |
| 0.076424 | 89590.1578947368 |
| 0.055207 | 47767.2894736842 |
| 0.055245 | 47816.7631578947 |
| 0.013148 | 121650 |
| 0.013113 | 122170 |
| 0.058754 | 59487.1842105263 |
| 0.058792 | 59770.1578947368 |
| 0.089592 | 100777.631578947 |
| 0.089629 | 100783.421052632 |
| 0.038813 | 12513.2368421053 |
| 0.038852 | 12835.4473684211 |
| 0.040621 | 82482.0789473684 |
| 0.040581 | 82730.3947368421 |
| 0.079242 | 92299.3157894737 |
| 0.079279 | 92281.5789473684 |
| 0.088328 | 98621.8947368421 |
| 0.088288 | 98518.9736842105 |
| 0.044673 | 81556.3157894737 |
| 0.044633 | 81528.5789473684 |
| 0.0033321 | 155138.947368421 |
| 0.0033001 | 155460 |
| 0.056449 | 84852 |
| 0.056409 | 84876.5 |
| 0.056289 | 84755 |
| 0.056249 | 84690.4473684211 |
| 0.083825 | 94857.6052631579 |
| 0.083786 | 94850.7105263158 |
| 0.097551 | 105735.526315789 |
| 0.097589 | 105823.947368421 |
| 0.086991 | 98877.4473684211 |
| 0.087028 | 98838.8421052632 |
| 0.017095 | 111495.263157895 |
| 0.017058 | 111965.263157895 |
| 0.071779 | 82918.1315789474 |
| 0.071817 | 82398.5789473684 |
| 0.054287 | 43715.9736842105 |
| 0.054326 | 43708.8421052632 |
| 0.082112 | 95046.4210526316 |
| 0.082072 | 95044 |
| 0.068369 | 83746.3157894737 |
| 0.068407 | 83829.8684210526 |
| 0.027104 | 8103.8631578947 |
| 0.027144 | 8118.85 |
| 0.011047 | 129060 |
| 0.011012 | 128640 |
| 0.084582 | 94701.0263157895 |
| 0.084543 | 94672.7368421053 |
| 0.050495 | 83239.7631578947 |
| 0.050455 | 83211.9210526316 |
| 0.087251 | 99002.3421052632 |
| 0.087288 | 99021.7368421053 |
| 0.060446 | 86629 |
| 0.060406 | 86340 |
| 0.036315 | 83643.4210526316 |
| 0.036275 | 83463.7368421053 |
| 0.030699 | 9294.2289473684 |
| 0.030739 | 9340.0578947369 |
| 0.039682 | 13377.1842105263 |
| 0.039722 | 13303.4736842105 |
| 0.071364 | 83794.9210526316 |
| 0.071401 | 83361.5 |
| 0.080319 | 94917.5 |
| 0.080279 | 94918.3157894737 |
| 0.072505 | 93484.7894736842 |
| 0.072465 | 93468.6315789474 |
| 0.034743 | 84187.2894736842 |
| 0.034704 | 84421.552631579 |
| 0.066159 | 89952.6842105263 |
| 0.066119 | 89935 |
| 0.007565 | 5184.3157894737 |
| 0.0076033 | 5181.4763157895 |
| 0.020563 | 6035.7105263158 |
| 0.020603 | 6043.1578947369 |
| 0.046007 | 15889.9736842105 |
| 0.046046 | 15788.5526315789 |
| 0.017351 | 5446.0736842105 |
| 0.01739 | 5445.7078947369 |
| 0.05481 | 84160.3684210526 |
| 0.05477 | 84138.1578947368 |
| 0.063714 | 84575.8947368421 |
| 0.063752 | 84555.3684210526 |
| 0.038338 | 12504.8421052632 |
| 0.038377 | 12557.4736842105 |
| 0.011037 | 5331.7342105263 |
| 0.011076 | 5325.8736842105 |
| 0.002124 | 158640 |
| 0.0020924 | 158730 |
| 0.014356 | 5376.3447368421 |
| 0.014396 | 5381.3842105263 |
| 0.066359 | 90386 |
| 0.066319 | 90076.0789473684 |
| 0.017715 | 109750 |
| 0.017678 | 109140 |
| 0.013846 | 5385.45 |
| 0.013886 | 5373.5157894737 |
| 0.032356 | 87047.1842105263 |
| 0.032317 | 87202.2368421053 |
| 0.00022592 | 6227.5342105263 |
| 0.00026284 | 6525.5394736842 |
| 0.026944 | 8170.5052631579 |
| 0.026984 | 8161.0552631579 |
| 0.067684 | 84077.5 |
| 0.067722 | 84406.1578947368 |
| 0.00056151 | 154602.631578947 |
| 0.00053058 | 154025.263157895 |
| 0.01369 | 5392.0289473684 |
| 0.013729 | 5388.5763157895 |
| 0.069712 | 92249.6315789474 |
| 0.069672 | 92203 |
| 0.090336 | 101292.631578947 |
| 0.041221 | 13574.5 |
| 0.041261 | 13649.3157894737 |
| 0.088248 | 98377.2631578947 |
| 0.032747 | 86214.6578947368 |
| 0.032707 | 86199.7105263158 |
| 0.018063 | 5426.3657894737 |
| 0.018102 | 5400.5368421053 |
| 0.037221 | 82838 |
| 0.037182 | 82739.0526315789 |
| 0.079316 | 92217.8421052632 |
| -0.00010567 | 22281.2894736842 |
| -0.00010105 | 16333.2105263158 |
| 0.073491 | 86313.1315789474 |
| 0.073528 | 85860.3157894737 |
| 0.069113 | 91703.1052631579 |
| 0.069073 | 91686.2894736842 |
| 0.077489 | 94662.4210526316 |
| 0.077449 | 94656.2631578947 |
| 0.064242 | 89154.7368421053 |
| 0.064202 | 88866.8947368421 |
| 0.032512 | 86583 |
| 0.032473 | 86524 |
| 0.062564 | 88165 |
| 0.062524 | 87488.0789473684 |
| 0.036792 | 11964.3947368421 |
| 0.036831 | 11973.1315789474 |
| 0.027823 | 8258.2552631579 |
| 0.027863 | 8251.5105263158 |
| 0.078008 | 94725.7105263158 |
| 0.077968 | 94721.5263157895 |
| 0.071747 | 93315.2105263158 |
| 0.071707 | 93235.3421052632 |
| 0.04546 | 15240.2631578947 |
| 0.045499 | 15235.0526315789 |
| 0.044482 | 14225 |
| 0.044521 | 14200.3947368421 |
| 0.024748 | 7701.9210526316 |
| 0.024788 | 7717.5815789474 |
| 0.046545 | 82522.7105263158 |
| 0.046505 | 83732.5789473684 |
| 0.033646 | 85096.8947368421 |
| 0.033607 | 85049 |
| 0.042923 | 82511.9473684211 |
| 0.042883 | 82387.2894736842 |
| 0.03076 | 88367.9210526316 |
| 0.030722 | 88564.7105263158 |
| 0.087696 | 99199.4473684211 |
| 0.087733 | 99362.7368421053 |
| 0.026106 | 8012.5921052632 |
| 0.026145 | 8007.1894736842 |
| 0.098174 | 106645.263157895 |
| 0.098134 | 106593.421052632 |
| 0.086654 | 95285.7368421053 |
| 0.086614 | 95258.9210526316 |
| 0.047382 | 82383.3421052632 |
| 0.047342 | 81833.7105263158 |
| 0.085247 | 97339.8157894737 |
| 0.085284 | 97644.3157894737 |
| 0.04409 | 14122.2368421053 |
| 0.044129 | 14113.7368421053 |
| 0.047901 | 81497 |
| 0.047861 | 81418.3157894737 |
| 0.026825 | 92594.1315789474 |
| 0.026787 | 92662.3684210526 |
| 0.088625 | 100252.184210526 |
| 0.088662 | 100399.131578947 |
| 0.029461 | 8646.752631579 |
| 0.029501 | 8660.6105263158 |
| 0.01543 | 115970 |
| 0.015394 | 115980 |
| 0.021836 | 6580.3368421053 |
| 0.021876 | 6564.8184210526 |
| 0.057528 | 85296 |
| 0.057488 | 85006 |
| 0.032903 | 86497 |
| 0.032864 | 86557 |
| 0.042874 | 13655.0263157895 |
| 0.042913 | 13685.4473684211 |
| 0.026175 | 94264.5 |
| 0.026136 | 94268 |
| 0.023931 | 97683.052631579 |
| 0.023893 | 97162.5789473684 |
| 0.086434 | 98644.9736842105 |
| 0.086471 | 98392.6052631579 |
| 0.0097586 | 5293.0263157895 |
| 0.0097972 | 5290.9763157895 |
| 0.0019695 | 5567.652631579 |
| 0.0020067 | 5540.9947368421 |
| 0.062218 | 78864.3947368421 |
| 0.062257 | 79100.7105263158 |
| 0.085658 | 94532.6052631579 |
| 0.085618 | 94520.1842105263 |
| 0.07039 | 92623.2631578947 |
| 0.070351 | 92675.2894736842 |
| 0.096459 | 106720 |
| 0.096419 | 106790.789473684 |
| 0.059167 | 85700 |
| 0.059127 | 85778 |
| 0.034272 | 84811.6842105263 |
| 0.034233 | 84255.4736842105 |
| 0.024987 | 7870.9342105263 |
| 0.025027 | 7889.0263157895 |
| 0.037743 | 11876 |
| 0.037783 | 12060.9473684211 |
| 0.042991 | 13413.9210526316 |
| 0.043031 | 13357.1315789474 |
| 0.097974 | 107210.263157895 |
| 0.097935 | 107250.263157895 |
| 0.095788 | 104838.947368421 |
| 0.095825 | 104789.473684211 |
| 0.098491 | 105937.894736842 |
| 0.098529 | 105891.842105263 |
| 0.062124 | 87670 |
| 0.062084 | 87664 |
| 0.046864 | 82101.0263157895 |
| 0.046824 | 82290.8157894737 |
| 0.016087 | 5395.5868421053 |
| 0.016127 | 5394.0657894737 |
| 0.065015 | 84096.0526315789 |
| 0.090918 | 104228.684210526 |
| 0.090878 | 104018.421052632 |
| 0.065664 | 84754.8947368421 |
| 0.065702 | 84734.4473684211 |
| 0.012619 | 123970 |
| 0.012584 | 124412.631578947 |
| 0.064561 | 89115.0263157895 |
| 0.064521 | 88996.7368421053 |
| 0.048382 | 20419.7368421053 |
| 0.048421 | 20430.5526315789 |
| 0.07741 | 94648.2631578947 |
| 0.010148 | 131200 |
| 0.010114 | 131180 |
| 0.031421 | 88356.7105263158 |
| 0.031382 | 88397.8157894737 |
| 0.0018949 | 5593.8789473684 |
| 0.0019322 | 5588.7631578947 |
| 0.056048 | 50006.8157894737 |
| 0.056086 | 50654.4736842105 |
| 0.024468 | 7700.4473684211 |
| 0.024508 | 7719.8421052632 |
| 0.078566 | 94788.3947368421 |
| 0.078526 | 94786.3421052632 |
| 0.089045 | 100410 |
| 0.089005 | 100279.210526316 |
| 0.026225 | 7953.7 |
| 0.026265 | 7956.9105263158 |
| 0.021636 | 100290 |
| 0.021598 | 100420 |
| 0.013715 | 120830 |
| 0.01368 | 121700 |
| 0.05765 | 55683.2631578947 |
| 0.057689 | 55756.9210526316 |
| 0.0025295 | 5411.2236842105 |
| 0.0025668 | 5389.6973684211 |
| 0.046704 | 82967 |
| 0.046665 | 82828.552631579 |
| 0.050615 | 81974 |
| 0.050575 | 82113 |
| 0.071907 | 93276.4210526316 |
| 0.071867 | 93275.5526315789 |
| 0.077569 | 94671.2631578947 |
| 0.077529 | 94667.1578947368 |
| 0.057079 | 53966 |
| 0.057117 | 54124.1842105263 |
| 0.010649 | 5281.5394736842 |
| 0.010688 | 5270.6394736842 |
| 0.051093 | 29069.0263157895 |
| 0.051132 | 29441.1315789474 |
| 0.019257 | 105790 |
| 0.01922 | 105980 |
| 0.059931 | 62444.0789473684 |
| 0.059968 | 62989.5263157895 |
| 0.087491 | 96730.8684210526 |
| 0.087451 | 96664.7894736842 |
| 0.091379 | 101955.263157895 |
| 0.091416 | 101995.526315789 |
| 0.095143 | 106449.210526316 |
| 0.095103 | 106461.578947368 |
| 0.087889 | 97553.9210526316 |
| 0.08785 | 97504 |
| 0.024428 | 7661.7315789474 |
| 0.014788 | 5407.6947368421 |
| 0.014828 | 5407.6315789474 |
| 0.083541 | 96009.3947368421 |
| 0.037664 | 12100.8157894737 |
| 0.037704 | 11912.6842105263 |
| 0.070311 | 92669.6842105263 |
| 0.031458 | 9692.202631579 |
| 0.031498 | 9714.7368421053 |
| 0.072561 | 84201.3421052632 |
| 0.072598 | 84699.2105263158 |
| 0.067876 | 91029.6578947368 |
| 0.067836 | 91104.0263157895 |
| 0.035998 | 11560.7105263158 |
| 0.036037 | 11357.4473684211 |
| 0.042963 | 82322 |
| 0.022912 | 100400 |
| 0.022874 | 99962.6842105263 |
| 0.0064626 | 143520 |
| 0.0064294 | 143710 |
| 0.085499 | 94484.4210526316 |
| 0.085459 | 94543.5 |
| 0.091528 | 102023.684210526 |
+-------------+------------------+
When I plot it as a line plot, I get a really messy plot.
But when I plot it with symbols (a scatter plot), it makes more sense.
As you can see, there are two separate curves (upper and lower) in the plot. I want to average every 5-10 adjacent points on the upper curve and on the lower curve separately to make the plot smoother (shown by the arrows in the previous figure), and then plot the result as a line plot. The main problem is that, because the rows are in random order, Python/pandas averages rows that are adjacent in the file, which can belong to either the upper or the lower part of the graph. I tried to write a small script to plot it but could not achieve the desired output.
import pandas as pd
import matplotlib.pyplot as plt

dataframe1 = pd.read_csv('excel_file_trial.csv')

# Figure 1: scatter plot
plt.figure(1)
plt.scatter(dataframe1.iloc[:, 0], dataframe1.iloc[:, 1], s=1, marker="o", facecolors='none', edgecolor='black', linewidths=2)
plt.xlabel("x", fontsize=17)
plt.ylabel("y", fontsize=17)
plt.tight_layout()
plt.savefig("trialv0.png", bbox_inches="tight", format='png', dpi=300)

# Figure 3: line plot
plt.figure(3)
plt.plot(dataframe1.iloc[:, 0], dataframe1.iloc[:, 1], color="green", linewidth=3.5)
plt.xlabel("x", fontsize=17)
plt.ylabel("y", fontsize=17)
plt.tight_layout()
plt.savefig("trialv2.png", bbox_inches="tight", format='png', dpi=300)
Could anyone please help or suggest an approach? Many thanks in advance.
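A rough sketch of one way to do this: sort by x, label each point as upper or lower, then smooth each group on its own. It uses synthetic data in place of the CSV. The window sizes (21 and 7) and the labelling rule are assumptions made for illustration only. The rule compares each point with the midpoint between the local maximum and minimum of y. It only works where both curves are present in every window and are clearly separated, so it may be unreliable wherever the two curves come close together.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV: two well-separated noisy branches, rows shuffled
rng = np.random.default_rng(0)
x = np.linspace(0.0, 0.1, 400)
is_upper = np.arange(400) % 2 == 0
y = np.where(is_upper, 150_000 - 600_000 * x, 5_000 + 400_000 * x)
y = y + rng.normal(0, 300, size=400)
data = pd.DataFrame({"x": x, "y": y}).sample(frac=1, random_state=0)

# 1. Order the points along x so that neighbouring rows are neighbours on the plot
data = data.sort_values("x", ignore_index=True)

# 2. Label each point by comparing it with the local midpoint between the
#    highest and lowest y among nearby points
window = data["y"].rolling(window=21, center=True, min_periods=1)
midpoint = (window.max() + window.min()) / 2
data["branch"] = np.where(data["y"] > midpoint, "upper", "lower")

# 3. Smooth each branch separately with a centred 7-point rolling mean
smoothed = {
    name: group[["x", "y"]].rolling(window=7, center=True, min_periods=1).mean()
    for name, group in data.groupby("branch")
}
print({name: len(curve) for name, curve in smoothed.items()})
```

Each entry of smoothed can then be drawn as its own line, for example with plt.plot(smoothed["upper"]["x"], smoothed["upper"]["y"]), so that the line never jumps between the two curves.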
I have created a pandas DataFrame that holds the distances between location i and location j. Beginning with a start point P1 and an end point P2, I want to find the sub-DataFrame (distance matrix) that has P1 and P2 on one axis and the rest of the locations on the other axis.
I'm using a pandas DataFrame because I think it's the most efficient way.
# dm_dict: distance matrix in dict form, where dm_dict[i][j] is the distance from i to j
dm_df = pd.DataFrame.from_dict(dm_dict)
P1 = dm_df.max(axis=0).idxmax()
P2 = dm_df[P1].idxmax()
route = [P1, P2]
remaining_locs = dm_df[dm_df[~dm_df.isin(route)].isin(route)]
while not_done:
    # go through remaining_locs until all the locations have been added
    ...
There are no error messages, but the remaining_locs DataFrame is full of NaNs rather than distances.
Using dm_df[~dm_df.isin(route)].isin(route) seems to give me a boolean DataFrame that is accurate.
Sample data: technically it's the haversine distance, but the Euclidean distance should be fine for filling the matrix:
import numpy
def dist(i, j):
    a = numpy.array((i[1], i[2]))
    b = numpy.array((j[1], j[2]))
    return numpy.linalg.norm(a-b)

locations = [
    ("Ottawa", 45.424722,-75.695),
    ("Edmonton", 53.533333,-113.5),
    ("Victoria", 48.428611,-123.365556),
    ("Winnipeg", 49.899444,-97.139167),
    ("Fredericton", 49.899444,-97.139167),
    ("StJohns", 47.561389, -52.7125),
    ("Halifax", 44.647778, -63.571389),
    ("Toronto", 43.741667, -79.373333),
    ("Charlottetown",46.238889, -63.129167),
    ("QuebecCity",46.816667, -71.216667 ),
    ("Regina", 50.454722, -104.606667),
    ("Yellowknife", 62.442222, -114.3975),
    ("Iqaluit", 63.748611, -68.519722)
]
dm_dict = {i: {j: dist(i, j) for j in locations if j != i} for i in locations}
It looks like you want scipy's distance_matrix:
import pandas as pd
from scipy.spatial import distance_matrix

df = pd.DataFrame(locations)
x = df[[1, 2]]  # latitude and longitude columns
dm = pd.DataFrame(distance_matrix(x, x),
                  index=df[0],
                  columns=df[0])
Output:
+----------------+------------+------------+------------+------------+--------------+------------+------------+------------+----------------+-------------+------------+--------------+-----------+
| | Ottawa | Edmonton | Victoria | Winnipeg | Fredericton | StJohns | Halifax | Toronto | Charlottetown | QuebecCity | Regina | Yellowknife | Iqaluit |
+----------------+------------+------------+------------+------------+--------------+------------+------------+------------+----------------+-------------+------------+--------------+-----------+
| 0 | | | | | | | | | | | | | |
+----------------+------------+------------+------------+------------+--------------+------------+------------+------------+----------------+-------------+------------+--------------+-----------+
| Ottawa | 0.000000 | 38.664811 | 47.765105 | 21.906059 | 21.906059 | 23.081609 | 12.148481 | 4.045097 | 12.592181 | 4.689667 | 29.345960 | 42.278586 | 19.678657 |
| Edmonton | 38.664811 | 0.000000 | 11.107987 | 16.759535 | 16.759535 | 61.080146 | 50.713108 | 35.503607 | 50.896264 | 42.813477 | 9.411122 | 8.953983 | 46.125669 |
| Victoria | 47.765105 | 11.107987 | 0.000000 | 26.267600 | 26.267600 | 70.658378 | 59.913580 | 44.241193 | 60.276176 | 52.173796 | 18.867990 | 16.637528 | 56.945306 |
| Winnipeg | 21.906059 | 16.759535 | 26.267600 | 0.000000 | 0.000000 | 44.488147 | 33.976105 | 18.802741 | 34.206429 | 26.105163 | 7.488117 | 21.334745 | 31.794214 |
| Fredericton | 21.906059 | 16.759535 | 26.267600 | 0.000000 | 0.000000 | 44.488147 | 33.976105 | 18.802741 | 34.206429 | 26.105163 | 7.488117 | 21.334745 | 31.794214 |
| StJohns | 23.081609 | 61.080146 | 70.658378 | 44.488147 | 44.488147 | 0.000000 | 11.242980 | 26.933071 | 10.500284 | 18.519147 | 51.974763 | 63.454538 | 22.625084 |
| Halifax | 12.148481 | 50.713108 | 59.913580 | 33.976105 | 33.976105 | 11.242980 | 0.000000 | 15.827902 | 1.651422 | 7.946971 | 41.444115 | 53.851052 | 19.731392 |
| Toronto | 4.045097 | 35.503607 | 44.241193 | 18.802741 | 18.802741 | 26.933071 | 15.827902 | 0.000000 | 16.434995 | 8.717042 | 26.111037 | 39.703942 | 22.761342 |
| Charlottetown | 12.592181 | 50.896264 | 60.276176 | 34.206429 | 34.206429 | 10.500284 | 1.651422 | 16.434995 | 0.000000 | 8.108112 | 41.691201 | 53.767927 | 18.320711 |
| QuebecCity | 4.689667 | 42.813477 | 52.173796 | 26.105163 | 26.105163 | 18.519147 | 7.946971 | 8.717042 | 8.108112 | 0.000000 | 33.587610 | 45.921044 | 17.145385 |
| Regina | 29.345960 | 9.411122 | 18.867990 | 7.488117 | 7.488117 | 51.974763 | 41.444115 | 26.111037 | 41.691201 | 33.587610 | 0.000000 | 15.477744 | 38.457705 |
| Yellowknife | 42.278586 | 8.953983 | 16.637528 | 21.334745 | 21.334745 | 63.454538 | 53.851052 | 39.703942 | 53.767927 | 45.921044 | 15.477744 | 0.000000 | 45.896374 |
| Iqaluit | 19.678657 | 46.125669 | 56.945306 | 31.794214 | 31.794214 | 22.625084 | 19.731392 | 22.761342 | 18.320711 | 17.145385 | 38.457705 | 45.896374 | 0.000000 |
+----------------+------------+------------+------------+------------+--------------+------------+------------+------------+----------------+-------------+------------+--------------+-----------+
I am pretty sure this is what I wanted:
filtered = dm_df.filter(items=route,axis=1).filter(items=set(locations).difference(set(route)), axis=0)
filtered is a DataFrame with [2 rows x 10 columns], and from there I can find the minimum value.
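As a side note on why the original attempt produced NaNs: indexing a DataFrame with a boolean DataFrame masks it cell by cell and leaves NaN wherever the mask is False. It does not select rows or columns. A label-based selection with .loc is one alternative to the chained filter calls. The sketch below uses only four of the cities and plain city names as labels, which is an assumption made to keep the example short.

```python
import numpy as np
import pandas as pd

names = ["Ottawa", "Edmonton", "Victoria", "Winnipeg"]
coords = np.array([
    [45.424722, -75.695],
    [53.533333, -113.5],
    [48.428611, -123.365556],
    [49.899444, -97.139167],
])

# Pairwise Euclidean distances via broadcasting
diff = coords[:, None, :] - coords[None, :, :]
dm = pd.DataFrame(np.linalg.norm(diff, axis=-1), index=names, columns=names)

# Start and end points: the pair of locations that are farthest apart
P1 = dm.max(axis=0).idxmax()
P2 = dm[P1].idxmax()
route = [P1, P2]

# Rows: every location not yet on the route; columns: the route end points
remaining = dm.index.difference(route)
sub = dm.loc[remaining, route]
print(sub)

# Closest remaining location to each end point
print(sub.idxmin())
```

Here sub has one row per remaining location and one column per route end point, so sub.min() and sub.idxmin() give the nearest candidate for each end.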
I'm trying to create a new DataFrame for each possible combination in 'combinations', reading in some values from an existing DataFrame. An example of the DataFrame:
+-------------------------------+-----+----------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Species | OGT | Domain | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
+-------------------------------+-----+----------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Aeropyrum pernix | 95 | Archaea | 9.7659115711 | 0.6720465616 | 4.3895390781 | 7.6501943794 | 2.9344881615 | 8.8666657183 | 1.5011817208 | 5.6901432494 | 4.1428307243 | 11.0604191603 | 2.21143353 | 1.9387130928 | 5.1038552753 | 1.6855017182 | 7.7664358772 | 6.266067034 | 4.2052190807 | 9.2692433532 | 1.318690698 | 3.5614200159 |
| Argobacterium fabrum | 26 | Bacteria | 11.5698896021 | 0.7985475923 | 5.5884500155 | 5.8165463343 | 4.0512504104 | 8.2643271309 | 2.0116736244 | 5.7962804605 | 3.8931525401 | 9.9250463349 | 2.5980609708 | 2.9846761128 | 4.7828063605 | 3.1262365491 | 6.5684282943 | 5.9454781844 | 5.3740045968 | 7.3382308193 | 1.2519739683 | 2.3149400984 |
| Anaeromyxobacter dehalogenans | 27 | Bacteria | 16.0337898849 | 0.8860252895 | 5.1368827707 | 6.1864992608 | 2.9730203513 | 9.3167603253 | 1.9360386851 | 2.940143349 | 2.3473650439 | 10.898494736 | 1.6343905351 | 1.5247123262 | 6.3580285706 | 2.4715303021 | 9.2639057482 | 4.1890063803 | 4.3992339725 | 8.3885969061 | 1.2890166336 | 1.8265589289 |
| Aquifex aeolicus | 85 | Bacteria | 5.8730327277 | 0.795341216 | 4.3287799008 | 9.6746388172 | 5.1386954322 | 6.7148035486 | 1.5438364179 | 7.3358775924 | 9.4641440609 | 10.5736658776 | 1.9263080969 | 3.6183861236 | 4.0518679067 | 2.0493569604 | 4.9229955632 | 4.7976564501 | 4.2005259246 | 7.9169763709 | 0.9292167138 | 4.1438942987 |
| Archaeoglobus fulgidus | 83 | Archaea | 7.8742687687 | 1.1695110027 | 4.9165979364 | 8.9548767369 | 4.568636662 | 7.2640358917 | 1.4998752909 | 7.2472039919 | 6.8957233203 | 9.4826333048 | 2.6014466253 | 3.206476915 | 3.8419576418 | 1.7789787933 | 5.7572748236 | 5.4763351139 | 4.1490633048 | 8.6330814159 | 1.0325605451 | 3.6494619148 |
+-------------------------------+-----+----------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
Here is my code at the moment.
import itertools
import pandas as pd
letters = ['G','A','L','M','F','W','K','Q','E','S','P','V','I','C','Y','H','R','N','D','T']
combinations = [''.join(i) for j in range(1,len(letters) + 1) for i in itertools.combinations(letters,r=j)]
df = pd.read_csv('COMPLETECOPYFORR.csv')
for combination in combinations:
    new_df = df[['Species', 'OGT']]
    new_df['Sum of percentage'] = df[list(combination)]
    new_df.to_csv(combination + '.csv')
The desired output is roughly 1 million CSV files (20 letters give 2^20 - 1 = 1,048,575 combinations), each named after its combination,
from G.csv and A.csv through to GALMFWKQESPVICYHRNDT.csv, with contents like:
Species OGT Sum of percentage
------------------------------- ----- -------------------
Aeropyrum pernix 95 23.4353
Anaeromyxobacter dehalogenans 26 20.3232
Argobacterium fabrum 27 14.2312
Aquifex aeolicus 85 15.0403
Archaeoglobus fulgidus 83 34.0532
It looks like you need:
new_df['Sum of percentage'] = df[list(combination)].sum(axis=1)
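For context, here is a minimal sketch of the whole loop with that fix applied. The three-letter list, the rounded sample values and the temporary output directory are stand-ins chosen to keep the example small. The .copy() is an addition of mine, intended to avoid pandas' SettingWithCopyWarning when assigning to a slice.

```python
import itertools
import tempfile
from pathlib import Path

import pandas as pd

# Tiny stand-in for the real CSV (values rounded from the question's table)
df = pd.DataFrame({
    "Species": ["Aeropyrum pernix", "Aquifex aeolicus"],
    "OGT": [95, 85],
    "A": [9.77, 5.87],
    "C": [0.67, 0.80],
    "D": [4.39, 4.33],
})
letters = ["A", "C", "D"]  # shortened list for illustration
out_dir = Path(tempfile.mkdtemp())

written = []
for r in range(1, len(letters) + 1):
    for combo in itertools.combinations(letters, r):
        # Copy so that adding a column does not touch a view of df
        new_df = df[["Species", "OGT"]].copy()
        # Row-wise sum over the columns named in this combination
        new_df["Sum of percentage"] = df[list(combo)].sum(axis=1)
        target = out_dir / ("".join(combo) + ".csv")
        new_df.to_csv(target, index=False)
        written.append(target.name)

print(written)
```

With the full list of 20 letters the same loop would write 1,048,575 files, so it may be worth checking whether all of them are really needed before running it.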
I parse HTML tags from an HTML page with BeautifulSoup and extract the text content of these tags.
Somehow there are empty/blank lines in my output, and I cannot find the reason for this.
Then I tried to format the output with PrettyTable. It looks quite good, but it does not help me, because I want to import the file into MS Access later. The output can also be a .csv file.
I hope you can help me.
Part of my code:
import urllib.request
from bs4 import BeautifulSoup
from prettytable import PrettyTable

text_file = open("Output3.txt", "w")
table = PrettyTable(['Names', 'Addresses'])
with urllib.request.urlopen("file:///C:/Users/x/Desktop/test.html") as url:
    soup = BeautifulSoup(url, "html.parser")
    # Text content of every name and address div
    names = [name.get_text() for name in soup.findAll("div", {"class": "name m08_name"})]
    addresses = [address.get_text() for address in soup.findAll("div", {"class": "adresse m08_adresse"})]
for line in zip(names, addresses):
    table.add_row(line)
text_file.write(str(table))
text_file.close()
print("Process finished")
My current output:
The output comes from an open-source phone book. The addresses belong to doctors in the city. It is not private data or information.
+-------------------------------------------------------------------------------------------------------------------------+-------------------------------+
| Namen | Adressen |
+-------------------------------------------------------------------------------------------------------------------------+-------------------------------+
| | |
| | |
| Augenarzt Dr.med. Wolf Eckard Weingärtner | |
| | Königstr. 70 |
| | |
| | 70173 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| Baumann Achim Dr. med., Facharzt für Hautkrankheiten | |
| | Königstr. 66 |
| | |
| | 70173 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| Ärztehaus am Diakonie-Klinikum Praxis für Mund-, Kiefer- und Plastische | |
| | Falkertstr. 46 |
| | |
| | 70176 Stuttgart-West |
| | |
| | |
| | |
| | |
| | |
| Ihr Hautarzt Dr. Malte-Christian Thode, Dr. Sabrina Germann-Samara und Kollegen | |
| | Wilhelmstr. 40 |
| | |
| | 71638 Ludwigsburg-Mitte |
| | |
| | |
| | |
| | |
| | |
| Ärztehaus-West Dr.med.Angela Faller, Dr.med. Claudia Lerschmacher Faller, Lerschmacher, Vogt | |
| | Kornbergstr. 29 |
| | |
| | 70176 Stuttgart-West |
| | |
| | |
| | |
| | |
| | |
| Richter Constanze Dr.med., Fachärztin für Innere Medizin | |
| | Seelbergstr. 11 |
| | |
| | 70372 Stuttgart-Bad Cannstatt |
| | |
| | |
| | |
| | |
| | |
| Dr.med. Ulrich Schreiber, Ihre Frauenarztpraxis in Stuttgart | |
| | Hirschstr. 31 |
| | |
| | 70173 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| Günther Eck Dr.med., FA für HNO | |
| | Marienstr. 5 |
| | |
| | 70178 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| Ambulante Gastroenterologie Dres.med. Karl M. Teubner, Albrecht G. Maier, Diet | |
| | Industriestr. 4 |
| | |
| | 70565 Stuttgart-Vaihingen |
| | |
| | |
| | |
| | |
| | |
| Bergener Malte Dr.med., Reuter Matthias Dr.med., Ungemach Gerd Dr.med. | |
| | Wilhelmsplatz 11 |
| | |
| | 70182 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| Dr. med. Holger Lange Facharzt für Urologie - Belegarzt | |
| | Hirschstr. 31 |
| | |
| | 70173 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| Abel Theophil Dr.med., Arzt für Orthopädie | |
| | Schwabstr. 91 |
| | |
| | 70193 Stuttgart-West |
| | |
| | |
| | |
| | |
| | |
| Ambulante Pneumologie mit Allergiezentrum (BAG) - Dr.med.Frank Heimann Dr.med.Rainer Ehmann, Dr.med.K. Seyfahrt-Jürgens | |
| | Rotebühlplatz 19 |
| | |
| | 70178 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| Aerzte am HautTherapieZentrum Stuttgart | |
| | Calwer Str. 11 |
| | |
| | 70173 Stuttgart-Mitte |
| | |
| | |
| | |
| | |
| | |
| auge im fokus Dr.med. Pervanidis, Dr.med. Wagner, Dr.med. Stergiou | |
| | Rotebühlplatz 17 |
| | |
| | 70178 Stuttgart-Mitte |
| | |
| | |
| | |
+-------------------------------------------------------------------------------------------------------------------------+-------------------------------+
My expected output:
names; adresses;
name1; adress1
name2; adress2
.. ; ..
A short snippet from test.html:
<div class="adresse m08_adresse" data-role="adresse" itemprop="address" itemscope itemtype="https://schema.org/PostalAddress">
<address>
<a
href="https://adresse.gelbeseiten.de/1124105191531/baumann-achim-dr-med-facharzt-fuer-hautkrankheiten/stuttgart/mitte#originIndex=1;origin=/arzt/stuttgart"
data-wipe='{"listener":"click","name":"Trefferliste Adresse zur Detailseite","id":"1124105191531", "synchron": true}'
>
<span itemprop="streetAddress">Königstr. 66</span>
<br />
<span itemprop="postalCode">70173</span> <span itemprop="addressLocality">Stuttgart-Mitte</span>
</a>
</address>
</div>
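Judging by the snippet above, the text inside each div is spread over several lines with indentation, which would explain the blank rows in the table. A minimal sketch of one way to get the expected "name; address" shape, using only the standard library: the two raw strings are hypothetical stand-ins for what get_text() returns (their exact whitespace is an assumption), and squash is a helper name invented for this example.

```python
import csv
import io

# Hypothetical stand-ins for the strings returned by get_text() on the real page.
raw_names = ["\n\n  Baumann Achim Dr. med., Facharzt für Hautkrankheiten\n"]
raw_addresses = ["\n\nKönigstr. 66\n\n70173 Stuttgart-Mitte\n\n"]


def squash(text):
    """Collapse every run of whitespace (newlines included) into one space."""
    return " ".join(text.split())


# An in-memory buffer keeps the sketch self-contained; for a real file one
# could use open("Output3.txt", "w", newline="", encoding="utf-8") instead.
out = io.StringIO()
writer = csv.writer(out, delimiter=";", lineterminator="\n")
writer.writerow(["names", "addresses"])
for name, address in zip(raw_names, raw_addresses):
    writer.writerow([squash(name), squash(address)])

print(out.getvalue())
```

In the original script the same helper could be applied to each get_text() result before the rows are written, in place of the PrettyTable step.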
Hello. Can someone explain to me what happens in this code? I would like to know exactly how it works (it comes from http://rosettacode.org/wiki/Maze_generation#Python).
from random import shuffle, randrange
def make_maze(w = 16, h = 8):
    vis = [[0] * w + [1] for _ in range(h)] + [[1] * (w + 1)]
    ver = [["|  "] * w + ['|'] for _ in range(h)] + [[]]
    hor = [["+--"] * w + ['+'] for _ in range(h + 1)]
    def walk(x, y):
        vis[y][x] = 1
        d = [(x - 1, y), (x, y + 1), (x + 1, y), (x, y - 1)]
        shuffle(d)
        for (xx, yy) in d:
            if vis[yy][xx]: continue
            if xx == x: hor[max(y, yy)][x] = "+  "
            if yy == y: ver[y][max(x, xx)] = "   "
            walk(xx, yy)
    walk(randrange(w), randrange(h))
    for (a, b) in zip(hor, ver):
        print(''.join(a + ['\n'] + b))
make_maze()
I know nothing about maze generation, but I also got curious about how this piece of code works. Here are some insights:
These two lines print the maze:
for (a, b) in zip(hor, ver):
    print(''.join(a + ['\n'] + b))
So what happens if we put these lines right after the three lines that define vis, ver and hor? We get this:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
You can even put these two lines right before the recursive call walk(xx,yy) and see some steps of maze evolution:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+ + +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+ + +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+ + +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | |
+--+--+--+ + +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+ + +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | |
+--+--+ + + +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+ + +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | |
+--+--+ + + +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+ + + +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | |
+--+--+ + + +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+ + + +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | |
+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | |
+--+--+ + + +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+ + + +--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | |
+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
[...]
Now let's focus on walk(x,y). As its name and the printed output suggest, this function walks around the maze, removing the walls in a random fashion so as to build a path.
This call:
walk(randrange(w), randrange(h))
initializes the walk in a random location within the maze. Every cell in the grid is visited exactly once; the visited cells are marked in vis.
This line initializes an array with all four neighbors of the current cell:
d = [(x - 1, y), (x, y + 1), (x + 1, y), (x, y - 1)]
These are tried in a random order (thanks to shuffle(d)); any neighbor already marked in vis is skipped.
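One detail that is easy to miss is why walk never checks whether a neighbor lies outside the grid. The sketch below rebuilds vis for a tiny 3x2 grid (the size is chosen only for illustration) and looks at the neighbors of the top-left cell. It suggests that the extra column and row of 1s act as a border of cells that already count as visited, and that Python's negative indexing makes x - 1 == -1 and y - 1 == -1 land on that border.

```python
w, h = 3, 2  # tiny grid so the whole structure fits on screen

# Same construction as in make_maze: one extra column and one extra row of 1s.
vis = [[0] * w + [1] for _ in range(h)] + [[1] * (w + 1)]
for row in vis:
    print(row)

# Neighbors of the top-left cell (0, 0), in the same order as the list d.
x, y = 0, 0
neighbors = [(x - 1, y), (x, y + 1), (x + 1, y), (x, y - 1)]

# Index -1 wraps around to the last column or last row, which hold only 1s,
# so off-grid neighbors read as "already visited".
blocked = [(xx, yy) for xx, yy in neighbors if vis[yy][xx]]
open_cells = [(xx, yy) for xx, yy in neighbors if not vis[yy][xx]]
print("blocked:", blocked)
print("open:", open_cells)
```

The same border handles the right and bottom edges: for the bottom-right cell (2, 1), x + 1 hits the extra column and y + 1 hits the extra row.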
And these two lines are the ones that remove the walls as the maze path is being built:
# remove horizontal wall, "+--" turns into "+  "
if xx == x: hor[max(y, yy)][x] = "+  "
# remove vertical wall, "|  " turns into "   "
if yy == y: ver[y][max(x, xx)] = "   "
There is more to say about this algorithm (see Jongware's comment), but as far as understanding this particular piece of code goes, these are the kind of things you can do.
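To check the claims above by experiment, here is a sketch of a variant that returns the grids instead of printing them. The name build_maze, the seed parameter and the visits counter are additions for this example and are not part of the Rosetta Code version; the walk itself is unchanged, with three-character cells so that the columns line up.

```python
from random import Random


def build_maze(w=16, h=8, seed=0):
    """Same walk as make_maze, but returns the result instead of printing it."""
    rng = Random(seed)  # seeded generator: an addition, for reproducible runs
    vis = [[0] * w + [1] for _ in range(h)] + [[1] * (w + 1)]
    ver = [["|  "] * w + ["|"] for _ in range(h)] + [[]]
    hor = [["+--"] * w + ["+"] for _ in range(h + 1)]
    visits = 0  # counts how many times walk() is entered

    def walk(x, y):
        nonlocal visits
        vis[y][x] = 1
        visits += 1
        d = [(x - 1, y), (x, y + 1), (x + 1, y), (x, y - 1)]
        rng.shuffle(d)
        for xx, yy in d:
            if vis[yy][xx]:
                continue
            if xx == x:  # vertical move: open the wall between the two rows
                hor[max(y, yy)][x] = "+  "
            if yy == y:  # horizontal move: open the wall between the two columns
                ver[y][max(x, xx)] = "   "
            walk(xx, yy)

    walk(rng.randrange(w), rng.randrange(h))
    text = "\n".join("".join(a + ["\n"] + b) for a, b in zip(hor, ver))
    return text, visits, hor, ver


text, visits, hor, ver = build_maze()
removed = sum(row.count("+  ") for row in hor) + sum(row.count("   ") for row in ver)
print(text)
print("cells visited:", visits, "walls removed:", removed)
```

Each call to walk after the first is preceded by exactly one wall removal, so the number of removed walls is one less than the number of visited cells. That is why any two cells end up connected by exactly one path.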