sum of occurrences in bins - python

I'm looking at the popularity of food stalls in a pop-up market:
Unnamed: 0 Shop1 Shop2 Shop3 ... shop27 shop28 shop29 shop30 shop31 shop32 shop33 shop34
0 0 484 516 484 ... 348 146 1445 1489 623 453 779 694
1 1 276 564 941 ... 1463 178 700 996 1151 364 111 1243
2 2 74 1093 961 ... 1260 1301 1151 663 1180 723 1477 1198
3 3 502 833 22 ... 349 1105 835 1 938 921 745 14
4 4 829 983 952 ... 568 1435 518 807 874 197 81 573
.. ... ... ... ... ... ... ... ... ... ... ... ... ...
114 114 1 187 706 ... 587 1239 1413 850 1324 788 687 687
115 115 398 733 298 ... 864 981 100 80 1322 381 430 349
116 116 11 312 904 ... 34 508 850 1278 432 395 601 213
117 117 824 261 593 ... 1026 147 488 69 25 286 1229 1028
118 118 461 966 183 ... 850 817 1411 863 950 987 415 130
I then sum the overall visits per shop and split the totals into five bins (pd.cut(df.sum(axis=0), 5, labels=['lowest','lower','medium','higher','highest'])):
Unnamed: 0 lowest
Shop1 medium
Shop2 medium
Shop3 lower
Shop4 lower
... ...
shop31 higher
shop32 medium
shop33 higher
shop34 higher
I then want to see the popularity of each category over time; a manually constructed example of the first row:
6891-33086 33087-59151 59152-85216 85217-111281 111282-137346
0 0 1373 3546 13999 1238
How can I do this in Python?
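One way to get those per-category totals over time (a minimal sketch, assuming the visit counts are in a DataFrame df exactly as shown above; the bin labels are the ones from the question):
import pandas as pd

# drop the index-like column so only the shop columns remain
data = df.drop(columns='Unnamed: 0')
totals = data.sum(axis=0)  # total visits per shop
labels = ['lowest', 'lower', 'medium', 'higher', 'highest']
bins = pd.cut(totals, 5, labels=labels)  # popularity category per shop

# group the shop columns by their category and sum across shops,
# giving one column per category and one row per time step
per_category = data.T.groupby(bins).sum().T
print(per_category.head())
If you want the raw bin ranges as column headers (like 6891-33086 in the manual example), drop the labels argument so pd.cut returns the intervals themselves.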

Related

Error when trying to run a loop with multiprocessing in Python

I am trying to run a function in a loop many times using parallel multiprocessing.
When I run this simple code:
import time
from multiprocessing import Pool

def heavy_processing(number):
    time.sleep(0.05)  # simulate a long-running operation
    output = number + 1
    return output

with Pool(4) as p:
    numbers = list(range(0, 1000))
    results = p.map(heavy_processing, numbers)
I get the following error:
Process SpawnPoolWorker-1:
Traceback (most recent call last):
File "C:\ProgramData\Miniconda3\lib\multiprocessing\process.py", line 315, in _bootstrap
self.run()
File "C:\ProgramData\Miniconda3\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "C:\ProgramData\Miniconda3\lib\multiprocessing\pool.py", line 114, in worker
task = get()
File "C:\ProgramData\Miniconda3\lib\multiprocessing\queues.py", line 367, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'heavy_processing' on <module '__main__' (built-in)>
I'm not sure why, since I'm pretty much pulling this example straight from other sources. Any idea what's going on?
You always have to run multiprocessing code under an if __name__ == '__main__': guard, otherwise it doesn't work. The last line of your traceback, AttributeError: Can't get attribute 'heavy_processing' on <module '__main__' (built-in)>, shows that the spawned worker process could not find heavy_processing in __main__: on Windows, multiprocessing starts workers with the spawn method, which re-imports the main module in each worker, so the script must be importable without side effects.
Full Code
import time
from multiprocessing import Pool

def heavy_processing(number):
    time.sleep(0.05)  # simulate a long-running operation
    output = number + 1
    print(output)
    return output

if __name__ == '__main__':
    with Pool(4) as p:
        numbers = list(range(0, 1000))
        results = p.map(heavy_processing, numbers)
Output
1
64
127
190
2
65
128
191
3
66
129
192
4
67
130
193
... (output continues up to 1000; the exact ordering varies from run to run because the four worker processes print concurrently)
Hope this helps. Happy Coding :)

How do I use pandas to organize a dataframe by both row and column?

I'm learning Python and pandas, and I know how to do basic operations like groupby() and sum(), but I'm trying to do more complex operations, like categorizing using both rows and columns, and I'm not sure how to begin the problem below.
Here's the dataset from GitHub:
https://github.com/KeithGalli/pandas/blob/master/pokemon_data.csv
Here's what I'm trying to produce:
Generation  Fire A-M  Fire N-Z  Water A-M  Water N-Z  Grass A-M  Grass N-Z
1           #pokemon
2
3
4
5
6
Here's my approach:
df = pd.read_csv('pokemon_data.csv', header=0)
fire = df.loc[df['Type 1'] == 'Fire']
water = df.loc[df['Type 1'] == 'Water']
grass = df.loc[df['Type 1'] == 'Grass']
# Trim down columns to only related data
fire = fire[['Name', 'Type 1', 'Generation']]
water = water[['Name', 'Type 1', 'Generation']]
grass = grass[['Name', 'Type 1', 'Generation']]
Next steps: Should I begin to sort by Generation first, or by alphabetical range (A-M and N-Z)? I can't wrap my head around this.
An explanation of your work is much appreciated. Thank you!
Create a helper column for the new columns in the final DataFrame by comparing the first letter of Name, then use DataFrame.pivot_table; because you need to aggregate the strings in Name, the aggregation function has to be join:
import numpy as np

df['cat'] = df['Type 1'] + ' ' + np.where(df['Name'].str[0].gt('M'), 'N-Z', 'A-M')
print (df)
# Name Type 1 Type 2 HP Attack Defense \
0 1 Bulbasaur Grass Poison 45 49 49
1 2 Ivysaur Grass Poison 60 62 63
2 3 Venusaur Grass Poison 80 82 83
3 3 VenusaurMega Venusaur Grass Poison 80 100 123
4 4 Charmander Fire NaN 39 52 43
.. ... ... ... ... .. ... ...
795 719 Diancie Rock Fairy 50 100 150
796 719 DiancieMega Diancie Rock Fairy 50 160 110
797 720 HoopaHoopa Confined Psychic Ghost 80 110 60
798 720 HoopaHoopa Unbound Psychic Dark 80 160 60
799 721 Volcanion Fire Water 80 110 120
Sp. Atk Sp. Def Speed Generation Legendary cat
0 65 65 45 1 False Grass A-M
1 80 80 60 1 False Grass A-M
2 100 100 80 1 False Grass N-Z
3 122 120 80 1 False Grass N-Z
4 60 50 65 1 False Fire A-M
.. ... ... ... ... ... ...
795 100 150 50 6 True Rock A-M
796 160 110 110 6 True Rock A-M
797 150 130 70 6 True Psychic A-M
798 170 130 80 6 True Psychic A-M
799 130 90 70 6 True Fire N-Z
df = df.pivot_table(index='Generation', columns='cat', values='Name', aggfunc=','.join)
# print (df)
Create your column names first, then pivot your dataframe:
df['Group'] = df['Type 1'] + ' ' + np.where(df['Name'].str[0].between('A', 'M'), 'A-M', 'N-Z')
out = df.astype({'#': str}).pivot_table('#', 'Generation', 'Group', aggfunc=' '.join)
Output
>>> out
Group Bug A-M Bug N-Z Dark A-M ... Steel N-Z Water A-M Water N-Z
Generation ...
1 10 11 12 14 15 15 13 46 47 48 49 123 127 127 NaN ... NaN 9 9 55 87 91 98 99 116 118 129 130 130 131 7 8 54 60 61 62 72 73 79 80 80 86 90 117 119 1...
2 165 166 168 205 214 214 167 193 204 212 212 213 198 228 229 229 ... 208 208 227 159 160 170 171 183 184 222 226 230 158 186 194 195 199 211 223 224 245
3 267 268 269 284 314 265 266 283 290 291 292 313 262 359 359 ... 379 258 259 270 271 272 318 339 341 342 349 350 36... 260 260 278 279 319 319 320 321 340 369
4 401 402 412 414 415 413 413 413 416 469 430 491 ... NaN 395 418 419 423 456 457 458 490 393 394 422 484 489
5 542 557 558 588 589 595 596 617 632 636 649 540 541 543 544 545 616 637 510 625 630 633 635 ... NaN 502 550 565 580 592 593 594 647 647 501 503 515 516 535 536 537 564 581
6 NaN 664 665 666 686 687 ... NaN 656 657 658 692 693 NaN
[6 rows x 35 columns]
Transposed view for readability:
>>> out.T
Generation 1 2 3 4 5 6
Group
Bug A-M 10 11 12 14 15 15 165 166 168 205 214 214 267 268 269 284 314 401 402 412 414 415 542 557 558 588 589 595 596 617 632 636 649 NaN
Bug N-Z 13 46 47 48 49 123 127 127 167 193 204 212 212 213 265 266 283 290 291 292 313 413 413 413 416 469 540 541 543 544 545 616 637 664 665 666
Dark A-M NaN 198 228 229 229 262 359 359 430 491 510 625 630 633 635 686 687
Dark N-Z NaN 197 215 261 302 302 461 509 559 560 570 571 624 629 634 717
Dragon A-M 147 148 149 NaN 334 334 371 380 380 381 381 443 444 445 445 610 611 612 621 646 646 646 704 706
Dragon N-Z NaN NaN 372 373 373 384 384 NaN 643 644 705 718
Electric A-M 81 82 101 125 135 179 180 181 181 239 309 310 310 312 404 405 462 466 522 587 603 604 694 695 702
Electric N-Z 25 26 100 145 172 243 311 403 417 479 479 479 479 479 479 523 602 642 642 NaN
Fairy A-M 35 36 173 210 NaN NaN NaN 669 670 671 683
Fairy N-Z NaN 175 176 209 NaN 468 NaN 682 684 685 700 716
Fighting A-M 56 66 67 68 106 107 237 296 297 307 308 308 448 448 533 534 619 620 701
Fighting N-Z 57 236 NaN 447 532 538 539 674 675
Fire A-M 4 5 6 6 6 58 59 126 136 146 155 219 240 244 250 256 257 257 323 323 390 391 392 467 485 500 554 555 555 631 653 654 655 662 667
Fire N-Z 37 38 77 78 156 157 218 255 322 324 NaN 498 499 513 514 663 668 721
Flying N-Z NaN NaN NaN NaN 641 641 714 715
Ghost A-M 92 93 94 94 200 354 354 355 356 425 426 429 477 487 487 563 607 608 609 711 711 711 711
Ghost N-Z NaN NaN 353 442 562 708 709 710 710 710 710
Grass A-M 1 2 44 69 102 103 152 153 154 182 187 189 253 286 331 332 388 406 420 421 455 460 460 470 546 549 556 590 591 597 598 650 652 673
Grass N-Z 3 3 43 45 70 71 114 188 191 192 252 254 254 273 274 275 285 315 357 387 389 407 459 465 492 492 495 496 497 511 512 547 548 640 651 672
Ground A-M 50 51 104 105 207 232 330 343 344 383 383 449 450 472 529 530 552 553 622 623 645 645 NaN
Ground N-Z 27 28 111 112 231 328 329 464 551 618 NaN
Ice A-M 124 144 225 362 362 471 473 478 613 614 615 712 713
Ice N-Z NaN 220 221 238 361 363 364 365 378 NaN 582 583 584 NaN
Normal A-M 22 39 52 83 84 85 108 113 115 115 132 133 162 163 174 190 203 206 241 242 264 294 295 298 301 351 352 399 400 424 427 428 428 431 440 441 446 463 493 506 507 531 531 572 573 585 626 628 648 648 659 660 661 676
Normal N-Z 16 17 18 18 19 20 21 40 53 128 137 143 161 164 216 217 233 234 235 263 276 277 287 288 289 293 300 327 333 335 396 397 398 432 474 486 504 505 508 519 520 521 586 627 NaN
Poison A-M 23 24 42 88 89 109 169 316 452 453 569 691
Poison N-Z 29 30 31 32 33 34 41 110 NaN 317 336 434 435 451 454 568 690
Psychic A-M 63 64 65 65 96 97 122 150 150 150 151 196 249 251 281 282 282 326 358 386 386 386 386 433 439 475 475 481 482 488 517 518 574 575 576 578 605 606 677 678 678 720 720
Psychic N-Z NaN 177 178 201 202 280 325 360 480 494 527 528 561 577 579 NaN
Rock A-M 74 75 76 140 141 142 142 246 337 345 346 347 348 408 411 438 525 526 566 567 688 689 698 699 703 719 719
Rock N-Z 95 138 139 185 247 248 248 299 338 377 409 410 476 524 639 696 697
Steel A-M NaN NaN 303 303 304 305 306 306 374 375 376 376 385 436 437 483 599 600 601 638 679 680 681 681 707
Steel N-Z NaN 208 208 227 379 NaN NaN NaN
Water A-M 9 9 55 87 91 98 99 116 118 129 130 130 131 159 160 170 171 183 184 222 226 230 258 259 270 271 272 318 339 341 342 349 350 36... 395 418 419 423 456 457 458 490 502 550 565 580 592 593 594 647 647 656 657 658 692 693
Water N-Z 7 8 54 60 61 62 72 73 79 80 80 86 90 117 119 1... 158 186 194 195 199 211 223 224 245 260 260 278 279 319 319 320 321 340 369 393 394 422 484 489 501 503 515 516 535 536 537 564 581 NaN

Filtering columns based on row values in Pandas

I am trying to create dataframes from this "master" dataframe based on unique entries in row 2.
DATE PROP1 PROP1 PROP1 PROP1 PROP1 PROP1 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2
1 DAYS MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN
2 UNIT1 UNIT2 UNIT3 UNIT4 UNIT5 UNIT6 UNIT7 UNIT8 UNIT3 UNIT4 UNIT11 UNIT12 UNIT1 UNIT2
3
4 1/1/2020 677 92 342 432 878 831 293 88 69 621 586 576 972 733
5 2/1/2020 515 11 86 754 219 818 822 280 441 11 123 36 430 272
6 3/1/2020 253 295 644 401 574 184 354 12 680 729 823 822 174 602
7 4/1/2020 872 568 505 652 366 982 159 131 218 961 52 85 679 923
8 5/1/2020 93 58 864 682 346 19 293 19 206 500 793 962 630 413
9 6/1/2020 696 262 833 418 876 695 900 781 179 138 143 526 9 866
10 7/1/2020 810 58 579 244 81 858 362 440 186 425 55 920 345 596
11 8/1/2020 834 609 618 214 547 834 301 875 783 216 834 609 550 274
12 9/1/2020 687 935 976 380 885 246 339 904 627 460 659 352 361 793
13 10/1/2020 596 300 810 248 475 718 350 574 825 804 245 209 212 925
14 11/1/2020 584 984 711 879 916 107 277 412 122 683 151 811 129 4
15 12/1/2020 616 515 101 743 650 526 475 991 796 227 880 692 734 799
16 1/1/2021 106 441 305 964 452 249 282 486 374 620 652 793 115 697
17 2/1/2021 969 504 936 678 67 42 985 791 709 689 520 503 102 731
18 3/1/2021 823 169 412 177 783 601 613 251 533 463 13 127 516 15
19 4/1/2021 348 588 140 966 143 576 419 611 128 830 68 209 952 935
20 5/1/2021 96 711 651 121 708 360 159 229 552 951 79 665 709 165
21 6/1/2021 805 657 729 629 249 547 581 583 236 828 636 248 412 535
22 7/1/2021 286 320 908 765 336 286 148 168 821 567 63 908 248 320
23 8/1/2021 707 975 565 699 47 712 700 439 497 106 288 105 872 158
24 9/1/2021 346 523 142 181 904 266 28 740 125 64 287 707 553 437
25 10/1/2021 245 42 773 591 492 512 846 487 983 180 372 306 785 691
26 11/1/2021 785 577 448 489 425 205 672 358 868 637 104 422 873 919
so the output will look something like this
df_unit1
DATE PROP1 PROP2
1 DAYS MEAN MEAN
2 UNIT1 UNIT1
3
4 1/1/2020 677 972
5 2/1/2020 515 430
6 3/1/2020 253 174
7 4/1/2020 872 679
8 5/1/2020 93 630
9 6/1/2020 696 9
10 7/1/2020 810 345
11 8/1/2020 834 550
12 9/1/2020 687 361
13 10/1/2020 596 212
14 11/1/2020 584 129
15 12/1/2020 616 734
16 1/1/2021 106 115
17 2/1/2021 969 102
18 3/1/2021 823 516
19 4/1/2021 348 952
20 5/1/2021 96 709
21 6/1/2021 805 412
22 7/1/2021 286 248
23 8/1/2021 707 872
24 9/1/2021 346 553
25 10/1/2021 245 785
26 11/1/2021 785 873
df_unit2
DATE PROP1 PROP2
1 DAYS MEAN MEAN
2 UNIT2 UNIT2
3
4 1/1/2020 92 733
5 2/1/2020 11 272
6 3/1/2020 295 602
7 4/1/2020 568 923
8 5/1/2020 58 413
9 6/1/2020 262 866
10 7/1/2020 58 596
11 8/1/2020 609 274
12 9/1/2020 935 793
13 10/1/2020 300 925
14 11/1/2020 984 4
15 12/1/2020 515 799
16 1/1/2021 441 697
17 2/1/2021 504 731
18 3/1/2021 169 15
19 4/1/2021 588 935
20 5/1/2021 711 165
21 6/1/2021 657 535
22 7/1/2021 320 320
23 8/1/2021 975 158
24 9/1/2021 523 437
25 10/1/2021 42 691
26 11/1/2021 577 919
I have extracted the unique units from the row
unitName = pd.Series(pd.Series(df[2,:]).unique(), name = "Unit Names")
unitName = unitName.tolist()
Next I was planning to loop through this list of unique units and create dataframes with each units
for unit in unitName:
    df_unit = df.iloc[[df.iloc[2:, :].str.match(unit)], :]
    print(df_unit)
I am getting an error that 'DataFrame' object has no attribute 'str'. My plan was to match all the cells in row 2 that match a given unit and then extract the entire column for each matched cell.
This response has two parts:
Solution 1: Strip columns based on common name in dataframe
With the assumption that your dataframe columns look as follows:
['DATE DAYS', 'PROP1 MEAN UNIT1', 'PROP1 MEAN UNIT2', 'PROP1 MEAN UNIT3', 'PROP1 MEAN UNIT4', 'PROP1 MEAN UNIT5', 'PROP1 MEAN UNIT6', 'PROP2 MEAN UNIT7', 'PROP2 MEAN UNIT8', 'PROP2 MEAN UNIT3', 'PROP2 MEAN UNIT4', 'PROP2 MEAN UNIT11', 'PROP2 MEAN UNIT12', 'PROP2 MEAN UNIT1', 'PROP2 MEAN UNIT2']
and the first few records of your dataframe look like this...
DATE DAYS PROP1 MEAN UNIT1 ... PROP2 MEAN UNIT1 PROP2 MEAN UNIT2
0 1/1/2020 677 ... 972 733
1 2/1/2020 515 ... 430 272
2 3/1/2020 253 ... 174 602
3 4/1/2020 872 ... 679 923
4 5/1/2020 93 ... 630 413
5 6/1/2020 696 ... 9 866
6 7/1/2020 810 ... 345 596
The following lines of code should give you what you want:
cols = df.columns.tolist()
units = sorted(set(x[x.rfind('UNIT'):] for x in cols[1:]))
s_units = sorted(cols[1:], key=lambda x: x.split()[2])
for i in units:
    unit_sublist = ['DATE DAYS'] + [j for j in s_units if j[-6:].strip() == i]
    print('df_' + i.lower())
    print(df[unit_sublist])
I got the following:
df_unit1
DATE DAYS PROP1 MEAN UNIT1 PROP2 MEAN UNIT1
0 1/1/2020 677 972
1 2/1/2020 515 430
2 3/1/2020 253 174
3 4/1/2020 872 679
4 5/1/2020 93 630
5 6/1/2020 696 9
6 7/1/2020 810 345
df_unit11
DATE DAYS PROP2 MEAN UNIT11
0 1/1/2020 586
1 2/1/2020 123
2 3/1/2020 823
3 4/1/2020 52
4 5/1/2020 793
5 6/1/2020 143
6 7/1/2020 55
df_unit12
DATE DAYS PROP2 MEAN UNIT12
0 1/1/2020 576
1 2/1/2020 36
2 3/1/2020 822
3 4/1/2020 85
4 5/1/2020 962
5 6/1/2020 526
6 7/1/2020 920
df_unit2
DATE DAYS PROP1 MEAN UNIT2 PROP2 MEAN UNIT2
0 1/1/2020 92 733
1 2/1/2020 11 272
2 3/1/2020 295 602
3 4/1/2020 568 923
4 5/1/2020 58 413
5 6/1/2020 262 866
6 7/1/2020 58 596
df_unit3
DATE DAYS PROP1 MEAN UNIT3 PROP2 MEAN UNIT3
0 1/1/2020 342 69
1 2/1/2020 86 441
2 3/1/2020 644 680
3 4/1/2020 505 218
4 5/1/2020 864 206
5 6/1/2020 833 179
6 7/1/2020 579 186
df_unit4
DATE DAYS PROP1 MEAN UNIT4 PROP2 MEAN UNIT4
0 1/1/2020 432 621
1 2/1/2020 754 11
2 3/1/2020 401 729
3 4/1/2020 652 961
4 5/1/2020 682 500
5 6/1/2020 418 138
6 7/1/2020 244 425
df_unit5
DATE DAYS PROP1 MEAN UNIT5
0 1/1/2020 878
1 2/1/2020 219
2 3/1/2020 574
3 4/1/2020 366
4 5/1/2020 346
5 6/1/2020 876
6 7/1/2020 81
df_unit6
DATE DAYS PROP1 MEAN UNIT6
0 1/1/2020 831
1 2/1/2020 818
2 3/1/2020 184
3 4/1/2020 982
4 5/1/2020 19
5 6/1/2020 695
6 7/1/2020 858
df_unit7
DATE DAYS PROP2 MEAN UNIT7
0 1/1/2020 293
1 2/1/2020 822
2 3/1/2020 354
3 4/1/2020 159
4 5/1/2020 293
5 6/1/2020 900
6 7/1/2020 362
df_unit8
DATE DAYS PROP2 MEAN UNIT8
0 1/1/2020 88
1 2/1/2020 280
2 3/1/2020 12
3 4/1/2020 131
4 5/1/2020 19
5 6/1/2020 781
6 7/1/2020 440
Solution 2: Create column names based on first 3 rows in the source data
Let us assume the first 6 rows of your dataframe looks like this.
DATE PROP1 PROP1 PROP1 PROP1 PROP1 PROP1 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2
DAYS MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN
UNIT1 UNIT2 UNIT3 UNIT4 UNIT5 UNIT6 UNIT7 UNIT8 UNIT3 UNIT4 UNIT11 UNIT12 UNIT1 UNIT2
4 1/1/2020 677 92 342 432 878 831 293 88 69 621 586 576 972 733
5 2/1/2020 515 11 86 754 219 818 822 280 441 11 123 36 430 272
6 3/1/2020 253 295 644 401 574 184 354 12 680 729 823 822 174 602
Then you can write the below code to create the dataframe.
data = '''DATE PROP1 PROP1 PROP1 PROP1 PROP1 PROP1 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2
DAYS MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN
UNIT1 UNIT2 UNIT3 UNIT4 UNIT5 UNIT6 UNIT7 UNIT8 UNIT3 UNIT4 UNIT11 UNIT12 UNIT1 UNIT2
4 1/1/2020 677 92 342 432 878 831 293 88 69 621 586 576 972 733
5 2/1/2020 515 11 86 754 219 818 822 280 441 11 123 36 430 272
6 3/1/2020 253 295 644 401 574 184 354 12 680 729 823 822 174 602
7 4/1/2020 872 568 505 652 366 982 159 131 218 961 52 85 679 923
8 5/1/2020 93 58 864 682 346 19 293 19 206 500 793 962 630 413
9 6/1/2020 696 262 833 418 876 695 900 781 179 138 143 526 9 866
10 7/1/2020 810 58 579 244 81 858 362 440 186 425 55 920 345 596
11 8/1/2020 834 609 618 214 547 834 301 875 783 216 834 609 550 274
12 9/1/2020 687 935 976 380 885 246 339 904 627 460 659 352 361 793
13 10/1/2020 596 300 810 248 475 718 350 574 825 804 245 209 212 925
14 11/1/2020 584 984 711 879 916 107 277 412 122 683 151 811 129 4
15 12/1/2020 616 515 101 743 650 526 475 991 796 227 880 692 734 799
16 1/1/2021 106 441 305 964 452 249 282 486 374 620 652 793 115 697
17 2/1/2021 969 504 936 678 67 42 985 791 709 689 520 503 102 731
18 3/1/2021 823 169 412 177 783 601 613 251 533 463 13 127 516 15
19 4/1/2021 348 588 140 966 143 576 419 611 128 830 68 209 952 935
20 5/1/2021 96 711 651 121 708 360 159 229 552 951 79 665 709 165
21 6/1/2021 805 657 729 629 249 547 581 583 236 828 636 248 412 535
22 7/1/2021 286 320 908 765 336 286 148 168 821 567 63 908 248 320
23 8/1/2021 707 975 565 699 47 712 700 439 497 106 288 105 872 158
24 9/1/2021 346 523 142 181 904 266 28 740 125 64 287 707 553 437
25 10/1/2021 245 42 773 591 492 512 846 487 983 180 372 306 785 691
26 11/1/2021 785 577 448 489 425 205 672 358 868 637 104 422 873 919'''
data_list = data.split('\n')
data_line1 = data_list[0].split()
data_line2 = data_list[1].split()
data_line3 = [''] + data_list[2].split()
data_header = [' '.join([data_line1[i],data_line2[i],data_line3[i]]) for i in range(len(data_line1))]
data_header[0] = data_header[0][:-1]
new_data= data_list[3:]
import pandas as pd
df = pd.DataFrame(data = None,columns=data_header)
for i in range(len(new_data)):
    df.loc[i] = new_data[i].split()[1:]
print(df)
Here is what worked for me.
# Assign unique column names to the dataframe
df.columns = range(df.shape[1])
# Get all the unique units in the dataframe
unitName = pd.Series(pd.Series(df.loc[2, :]).unique(), name="Unit Names")
# Convert them to a list to loop through
unitName = unitName.tolist()
for var in unitName:
    # this looks for an exact match for the unit in row index 2 and
    # extracts the entire column with the match
    df_item = df[df.columns[df.iloc[3].str.fullmatch(var)]]
    print(df_item)

When I change the sampling rate in librosa.load, how should librosa.onset.onset_strength change?

I'm trying to extract the tempo and beats from an audio file (sample rate: 2000) with the code below:
data, sr = librosa.load(path, mono=True, sr=2000)
print ("self.sr :", sr)
onset_env = librosa.onset.onset_strength(data, sr=sr)
tempo, beats = librosa.beat.beat_track(data, sr=sr, onset_envelope=onset_env)
print ("tempo :", tempo)
beats = librosa.frames_to_time(beats, sr=sr)
print ("beats :", beats)
I changed only the sample rate, but the output is weird:
/usr/lib/python3.6/site-packages/librosa/filters.py:284: UserWarning: Empty filters detected in mel frequency basis. Some channels will produce empty responses. Try increasing your sampling rate (and fmax) or reducing n_mels.
warnings.warn('Empty filters detected in mel frequency basis. '
/usr/lib64/python3.6/site-packages/scipy/fftpack/basic.py:160: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
z[index] = x
tempo : 117.1875
beats : [ 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74
 ... (every second frame, continuing up to 466)]
So I removed the sr parameter and ran the code below:
data, sr = librosa.load(path, mono=True)
print ("self.sr :", sr)
onset_env = librosa.onset.onset_strength(data, sr=sr)
tempo, beats = librosa.beat.beat_track(data, sr=sr, onset_envelope=onset_env)
print ("tempo :", tempo)
beats = librosa.frames_to_time(beats, sr=sr)
print ("beats :", beats)
Here is the output with sr removed:
self.sr : 22050
/usr/lib64/python3.6/site-packages/scipy/fftpack/basic.py:160: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
z[index] = x
tempo : 161.4990234375
beats : [ 7 23 39 55 71 87 102 118 134 150 166 182 197 213
 228 244 260 276 292 307 323 339 355 371 387 404 420 438
 ... (continuing up to 5124)]
How can I make this work properly when I change sr?
Thank you.
When calling
data, sr = librosa.load(path, mono=True, sr=2000)
you are asking librosa to resample whatever your input is to 2000 Hz (see docs: "target sampling rate"). 2000 Hz is a highly unusual sampling frequency for music and it's likely IMHO that a bunch of the algorithms in librosa will not work properly with it. Instead, typical rates are 44.1 kHz (CD quality) or 22050 Hz (the librosa default).
I assume that the beat tracker is trying to split your data into mel bands, and then process those bands individually, perhaps with some novelty curve or onset signal function, but 2 kHz is just not a whole lot to work with, which is probably why you see that empty filter message. But if the result (for the sr=2000) is correct, you could simply ignore the warning.
However, it seems like a safer bet to me to simply not set sr, let librosa resample your audio (whatever it is) to 22050 Hz, and then run the beat tracking algorithm on it. 22050 Hz is the kind of sampling rate the algorithm was most likely developed and tested at, and the one at which it is most likely to succeed.
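For reference, here is a minimal sketch of that approach (assuming path points at your audio file; keyword arguments are used throughout, which works across librosa versions):
import librosa

# let librosa resample the file to its 22050 Hz default
data, sr = librosa.load(path, mono=True)
onset_env = librosa.onset.onset_strength(y=data, sr=sr)
tempo, beats = librosa.beat.beat_track(y=data, sr=sr, onset_envelope=onset_env)
beat_times = librosa.frames_to_time(beats, sr=sr)  # convert frame indices to seconds
print("tempo :", tempo)
print("beats :", beat_times)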
Regarding:
/usr/lib64/python3.6/site-packages/scipy/fftpack/basic.py:160: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
This looks like a warning tied to how librosa implemented something. You should be able to ignore it without consequence.

How to read input in Python multiple times, given n and then n integers line by line?

The first line gives n, the number of integers on the next line.
Then the n integers are given.
My problem is how to accept the inputs. I tried using
ab = list(map(int, input().split()))
but it didn't work:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list() takes at most 1 argument (2 given)
The format of input is:
409 //this is n
1 4 6 7 9 11 12 13 16 18 19 21 23 24 32 35 39 41 43 44 46 48 50 52 54 56 60 61 63 64 ... 990 991 992 995 998 999
Read the input line by line: the first input() call returns n, and the next call returns the whole line of integers.
n = int(input())
ab = list(map(int, input().split()))
print(n)
print(ab)
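If the integers might be spread over several lines (or you don't want to rely on the line layout at all), a common alternative, sketched here with only the standard library, is to read every token at once:
import sys

# read all whitespace-separated tokens from stdin in one go,
# so it does not matter how the integers are split across lines
tokens = sys.stdin.read().split()
n = int(tokens[0])
ab = list(map(int, tokens[1:1 + n]))
print(n)
print(ab)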
