How do I convert gray to RGB in an NRRD file - Python

I am trying to convert a grayscale 3D image to an RGB 3D image.
I can get each slice's array; the array values are grayscale pixel values.
But I don't know how to convert them to RGB values.
I tried converting the color with an OpenCV function:
import numpy as np
import nrrd
import cv2
data = nrrd.read('C:\\Users\\admin\\Desktop\\sample data\\sample_nrrd.nrrd')
print(data[0][442][376])
cv2.cvtColor(data,cv2.COLOR_GRAY2BGR)
But it's not working. This is my first time using an NRRD file; how can I convert gray to RGB? The array below is an example of my data. Thanks.
[-1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000
-1000 -1000 -1000 -1000 -1000 80 236 1830 1901 1852 1742 1430
1147 1088 1285 1240 989 969 787 791 1073 1098 1380 1320
1125 1075 1209 1433 1505 1349 1114 1261 1463 1454 1696 1435
1301 1448 1384 1146 1220 1054 829 1189 1245 1319 1293 986
695 672 594 709 583 503 601 562 440 418 764 967
1275 911 842 761 652 479 691 715 505 442 768 650
705 938 1079 1076 969 936 907 902 755 588 614 770
738 646 971 802 625 890 1020 929 941 824 800 803
920 843 793 834 937 877 737 494 621 605 763 825
642 548 527 427 552 529 572 345 442 455 603 614
712 521 603 687 770 665 744 604 642 791 971 980
1059 1020 842 781 793 845 860 982 916 1077 907 491
806 533 327 709 817 913 977 735 958 624 547 651
952 1171 1184 1033 1262 2015 2193 2444 2830 2678 2650 2473
2528 2766 2915 2991 2654 2403 2700 2646 2302 2276 2706 3003
2639 2499 2414 1948 1456 1908 1409 852 500 946 747 715
864 899 960 977 807 954 1348 1053 1242 1346 1732 1634
1600 1690 1730 1797 1833 1963 1795 1775 2016 2182 2260 2132
1912 1651 1380 1576 1768 2275 1934 1790 1740 1908 2061 2068
1879 1714 1801 1678 1588 1669 1717 1596 1573 2080 1869 1922
2080 1701 2003 1617 1917 1810 1437 1292 1110 813 1079 1166
1037 1111 1518 1417 1037 603 120 137 15 -30 -197 -409
-133 -72 80 7 10 -7 -28 29 -219 -12 3 18
144 120 -89 -4 101 143 66 -162 96 218 153 120
36 188 275 58 -64 28 9 -77 89 202 206 243
349 234 54 163 262 313 282 131 175 234 102 263
109 93 57 143 282 235 175 189 217 200 297 345
314 150 -24 105 111 202 -58 20 -67 -175 -39 271
292 -2 -153 -181 -41 200 67 104 128 91 -154 -171
-42 -125 67 -172 -101 -59 -130 -94 -146 -175 23 -51
230 104 91 -16 -75 -169 -246 -203 -90 45 -99 11
72 287 149 57 111 79 -12 -104 206 0 41 68
78 -65 -255 -136 -115 53 52 61 -30 119 -155 -229
-190 -36 -163 -240 98 84 85 -17 1 54 81 -173
-205 -172 -351 -19 -86 -172 -98 -90 -169 257 126 83
171 284 297 159 50 -150 -94 -45 -39 -12 230 201
215 328 144 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000
-1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000
-1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000
-1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000
-1000 -1000 -1000 -1000 -1000 -1000 -1000 -1000]

Your question is not very precise, and there are multiple ways to read it.
First of all, there is no real grayscale-to-color conversion here: COLOR_GRAY2BGR only converts a 1-channel gray image to a 3-channel gray image. It does not add any color; it just changes the internal image representation, for example so the frame can be stored in a video. Color that was never there cannot be recovered. Since your data looks like a 3D depth map, I think what you want is a color-mapping (pseudocolor) function.
I am not sure whether your data is the final disparity or the final depth, so you will have to figure that out yourself. But the general idea is to put the values into a NumPy array and then use OpenCV's color-mapping function to colorize them.
Assume data_image_1D is a NumPy array:
import numpy as np
import cv2

data_image_1D  # the 1-D np array that you got from somewhere
# rows and cols are the size of the depth image that you have;
# try to see if you can get this conversion working
bw_img = np.reshape(data_image_1D, (rows, cols)).astype(np.float32)
# applyColorMap expects 8-bit input, so rescale the values into 0-255 first
bw_img = cv2.normalize(bw_img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
im_color = cv2.applyColorMap(bw_img, cv2.COLORMAP_JET)
cv2.imshow("im_color", im_color)
cv2.imshow("bw_img", bw_img)
cv2.waitKey(0)
cv2.destroyAllWindows()
But you still have to deal with the negative, minimum, and maximum disparity values; I will leave that to you.
For more on data mapping or custom color maps, you can follow this guide: https://www.learnopencv.com/applycolormap-for-pseudocoloring-in-opencv-c-python/
Edit
I think what you want might be just a simple merge. See this:
import numpy as np
import nrrd
import cv2

# ### your original code goes here ###
bw_img = np.reshape(data_image_1D, (rows, cols))
im_color = cv2.merge([bw_img, bw_img, bw_img])
nrrd.write(filename, im_color)  # write to a NRRD file


How do I use pandas to organize dataframe by both row and column?

I'm learning Python and pandas, and I know basic operations like groupby() and sum(). But now I'm trying something more complex, categorizing using both rows and columns, and I'm not sure how to begin with the problem below.
Here's the dataset from GitHub:
https://github.com/KeithGalli/pandas/blob/master/pokemon_data.csv
Here's what I'm trying to produce: a table with one row per Generation and one column per type/letter-range pair, where each cell lists the matching pokemon:

Generation  Fire A-M  Fire N-Z  Water A-M  Water N-Z  Grass A-M  Grass N-Z
1           #pokemon
2
3
4
5
6
Here's my approach:
df = pd.read_csv('pokemon_data.csv', header=0)
fire = df.loc[df['Type 1'] == 'Fire']
water = df.loc[df['Type 1'] == 'Water']
grass = df.loc[df['Type 1'] == 'Grass']
# Trim down columns to only related data
fire = fire[['Name', 'Type 1', 'Generation']]
water = water[['Name', 'Type 1', 'Generation']]
grass = grass[['Name', 'Type 1', 'Generation']]
Next steps: should I sort by Generation first, or by alphabetical range (A-M and N-Z)? I can't wrap my head around this.
An explanation of your work is much appreciated. Thank you!
Create a helper column for the final DataFrame's new column names by comparing the first letter of Name, then use DataFrame.pivot_table; to aggregate the strings in Name, use join as the aggregate function:
df['cat'] = df['Type 1'] + ' ' + np.where(df['Name'].str[0].gt('M'), 'N-Z','A-M')
print (df)
# Name Type 1 Type 2 HP Attack Defense \
0 1 Bulbasaur Grass Poison 45 49 49
1 2 Ivysaur Grass Poison 60 62 63
2 3 Venusaur Grass Poison 80 82 83
3 3 VenusaurMega Venusaur Grass Poison 80 100 123
4 4 Charmander Fire NaN 39 52 43
.. ... ... ... ... .. ... ...
795 719 Diancie Rock Fairy 50 100 150
796 719 DiancieMega Diancie Rock Fairy 50 160 110
797 720 HoopaHoopa Confined Psychic Ghost 80 110 60
798 720 HoopaHoopa Unbound Psychic Dark 80 160 60
799 721 Volcanion Fire Water 80 110 120
Sp. Atk Sp. Def Speed Generation Legendary cat
0 65 65 45 1 False Grass A-M
1 80 80 60 1 False Grass A-M
2 100 100 80 1 False Grass N-Z
3 122 120 80 1 False Grass N-Z
4 60 50 65 1 False Fire A-M
.. ... ... ... ... ... ...
795 100 150 50 6 True Rock A-M
796 160 110 110 6 True Rock A-M
797 150 130 70 6 True Psychic A-M
798 170 130 80 6 True Psychic A-M
799 130 90 70 6 True Fire N-Z
df = df.pivot_table(index='Generation', columns='cat', values='Name', aggfunc=','.join)
# print (df)
Create your column names first, then pivot your dataframe:
df['Group'] = df['Type 1'] + ' ' + np.where(df['Name'].str[0].between('A', 'M'), 'A-M', 'N-Z')
out = df.astype({'#': str}).pivot_table('#', 'Generation', 'Group', aggfunc=' '.join)
Output
>>> out
Group Bug A-M Bug N-Z Dark A-M ... Steel N-Z Water A-M Water N-Z
Generation ...
1 10 11 12 14 15 15 13 46 47 48 49 123 127 127 NaN ... NaN 9 9 55 87 91 98 99 116 118 129 130 130 131 7 8 54 60 61 62 72 73 79 80 80 86 90 117 119 1...
2 165 166 168 205 214 214 167 193 204 212 212 213 198 228 229 229 ... 208 208 227 159 160 170 171 183 184 222 226 230 158 186 194 195 199 211 223 224 245
3 267 268 269 284 314 265 266 283 290 291 292 313 262 359 359 ... 379 258 259 270 271 272 318 339 341 342 349 350 36... 260 260 278 279 319 319 320 321 340 369
4 401 402 412 414 415 413 413 413 416 469 430 491 ... NaN 395 418 419 423 456 457 458 490 393 394 422 484 489
5 542 557 558 588 589 595 596 617 632 636 649 540 541 543 544 545 616 637 510 625 630 633 635 ... NaN 502 550 565 580 592 593 594 647 647 501 503 515 516 535 536 537 564 581
6 NaN 664 665 666 686 687 ... NaN 656 657 658 692 693 NaN
[6 rows x 35 columns]
Transposed view for readability:
>>> out.T
Generation 1 2 3 4 5 6
Group
Bug A-M 10 11 12 14 15 15 165 166 168 205 214 214 267 268 269 284 314 401 402 412 414 415 542 557 558 588 589 595 596 617 632 636 649 NaN
Bug N-Z 13 46 47 48 49 123 127 127 167 193 204 212 212 213 265 266 283 290 291 292 313 413 413 413 416 469 540 541 543 544 545 616 637 664 665 666
Dark A-M NaN 198 228 229 229 262 359 359 430 491 510 625 630 633 635 686 687
Dark N-Z NaN 197 215 261 302 302 461 509 559 560 570 571 624 629 634 717
Dragon A-M 147 148 149 NaN 334 334 371 380 380 381 381 443 444 445 445 610 611 612 621 646 646 646 704 706
Dragon N-Z NaN NaN 372 373 373 384 384 NaN 643 644 705 718
Electric A-M 81 82 101 125 135 179 180 181 181 239 309 310 310 312 404 405 462 466 522 587 603 604 694 695 702
Electric N-Z 25 26 100 145 172 243 311 403 417 479 479 479 479 479 479 523 602 642 642 NaN
Fairy A-M 35 36 173 210 NaN NaN NaN 669 670 671 683
Fairy N-Z NaN 175 176 209 NaN 468 NaN 682 684 685 700 716
Fighting A-M 56 66 67 68 106 107 237 296 297 307 308 308 448 448 533 534 619 620 701
Fighting N-Z 57 236 NaN 447 532 538 539 674 675
Fire A-M 4 5 6 6 6 58 59 126 136 146 155 219 240 244 250 256 257 257 323 323 390 391 392 467 485 500 554 555 555 631 653 654 655 662 667
Fire N-Z 37 38 77 78 156 157 218 255 322 324 NaN 498 499 513 514 663 668 721
Flying N-Z NaN NaN NaN NaN 641 641 714 715
Ghost A-M 92 93 94 94 200 354 354 355 356 425 426 429 477 487 487 563 607 608 609 711 711 711 711
Ghost N-Z NaN NaN 353 442 562 708 709 710 710 710 710
Grass A-M 1 2 44 69 102 103 152 153 154 182 187 189 253 286 331 332 388 406 420 421 455 460 460 470 546 549 556 590 591 597 598 650 652 673
Grass N-Z 3 3 43 45 70 71 114 188 191 192 252 254 254 273 274 275 285 315 357 387 389 407 459 465 492 492 495 496 497 511 512 547 548 640 651 672
Ground A-M 50 51 104 105 207 232 330 343 344 383 383 449 450 472 529 530 552 553 622 623 645 645 NaN
Ground N-Z 27 28 111 112 231 328 329 464 551 618 NaN
Ice A-M 124 144 225 362 362 471 473 478 613 614 615 712 713
Ice N-Z NaN 220 221 238 361 363 364 365 378 NaN 582 583 584 NaN
Normal A-M 22 39 52 83 84 85 108 113 115 115 132 133 162 163 174 190 203 206 241 242 264 294 295 298 301 351 352 399 400 424 427 428 428 431 440 441 446 463 493 506 507 531 531 572 573 585 626 628 648 648 659 660 661 676
Normal N-Z 16 17 18 18 19 20 21 40 53 128 137 143 161 164 216 217 233 234 235 263 276 277 287 288 289 293 300 327 333 335 396 397 398 432 474 486 504 505 508 519 520 521 586 627 NaN
Poison A-M 23 24 42 88 89 109 169 316 452 453 569 691
Poison N-Z 29 30 31 32 33 34 41 110 NaN 317 336 434 435 451 454 568 690
Psychic A-M 63 64 65 65 96 97 122 150 150 150 151 196 249 251 281 282 282 326 358 386 386 386 386 433 439 475 475 481 482 488 517 518 574 575 576 578 605 606 677 678 678 720 720
Psychic N-Z NaN 177 178 201 202 280 325 360 480 494 527 528 561 577 579 NaN
Rock A-M 74 75 76 140 141 142 142 246 337 345 346 347 348 408 411 438 525 526 566 567 688 689 698 699 703 719 719
Rock N-Z 95 138 139 185 247 248 248 299 338 377 409 410 476 524 639 696 697
Steel A-M NaN NaN 303 303 304 305 306 306 374 375 376 376 385 436 437 483 599 600 601 638 679 680 681 681 707
Steel N-Z NaN 208 208 227 379 NaN NaN NaN
Water A-M 9 9 55 87 91 98 99 116 118 129 130 130 131 159 160 170 171 183 184 222 226 230 258 259 270 271 272 318 339 341 342 349 350 36... 395 418 419 423 456 457 458 490 502 550 565 580 592 593 594 647 647 656 657 658 692 693
Water N-Z 7 8 54 60 61 62 72 73 79 80 80 86 90 117 119 1... 158 186 194 195 199 211 223 224 245 260 260 278 279 319 319 320 321 340 369 393 394 422 484 489 501 503 515 516 535 536 537 564 581 NaN
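Both answers follow the same pattern; here is a tiny self-contained sketch of it (the five rows below are made up, standing in for the CSV):

```python
import numpy as np
import pandas as pd

# a small stand-in for the pokemon CSV (hypothetical rows)
df = pd.DataFrame({
    'Name': ['Bulbasaur', 'Venusaur', 'Charmander', 'Squirtle', 'Totodile'],
    'Type 1': ['Grass', 'Grass', 'Fire', 'Water', 'Water'],
    'Generation': [1, 1, 1, 1, 2],
})

# label each row "<type> A-M" or "<type> N-Z" from the first letter of Name,
# then pivot with Generation as the index and the label as the columns
df['cat'] = df['Type 1'] + ' ' + np.where(df['Name'].str[0].gt('M'), 'N-Z', 'A-M')
out = df.pivot_table(index='Generation', columns='cat',
                     values='Name', aggfunc=','.join)
```

Cells with no matching pokemon come out as NaN, which matches the sparse look of the outputs above.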

Filtering columns based on row values in Pandas

I am trying to create dataframes from this "master" dataframe based on the unique entries in row 2.
DATE PROP1 PROP1 PROP1 PROP1 PROP1 PROP1 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2
1 DAYS MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN
2 UNIT1 UNIT2 UNIT3 UNIT4 UNIT5 UNIT6 UNIT7 UNIT8 UNIT3 UNIT4 UNIT11 UNIT12 UNIT1 UNIT2
3
4 1/1/2020 677 92 342 432 878 831 293 88 69 621 586 576 972 733
5 2/1/2020 515 11 86 754 219 818 822 280 441 11 123 36 430 272
6 3/1/2020 253 295 644 401 574 184 354 12 680 729 823 822 174 602
7 4/1/2020 872 568 505 652 366 982 159 131 218 961 52 85 679 923
8 5/1/2020 93 58 864 682 346 19 293 19 206 500 793 962 630 413
9 6/1/2020 696 262 833 418 876 695 900 781 179 138 143 526 9 866
10 7/1/2020 810 58 579 244 81 858 362 440 186 425 55 920 345 596
11 8/1/2020 834 609 618 214 547 834 301 875 783 216 834 609 550 274
12 9/1/2020 687 935 976 380 885 246 339 904 627 460 659 352 361 793
13 10/1/2020 596 300 810 248 475 718 350 574 825 804 245 209 212 925
14 11/1/2020 584 984 711 879 916 107 277 412 122 683 151 811 129 4
15 12/1/2020 616 515 101 743 650 526 475 991 796 227 880 692 734 799
16 1/1/2021 106 441 305 964 452 249 282 486 374 620 652 793 115 697
17 2/1/2021 969 504 936 678 67 42 985 791 709 689 520 503 102 731
18 3/1/2021 823 169 412 177 783 601 613 251 533 463 13 127 516 15
19 4/1/2021 348 588 140 966 143 576 419 611 128 830 68 209 952 935
20 5/1/2021 96 711 651 121 708 360 159 229 552 951 79 665 709 165
21 6/1/2021 805 657 729 629 249 547 581 583 236 828 636 248 412 535
22 7/1/2021 286 320 908 765 336 286 148 168 821 567 63 908 248 320
23 8/1/2021 707 975 565 699 47 712 700 439 497 106 288 105 872 158
24 9/1/2021 346 523 142 181 904 266 28 740 125 64 287 707 553 437
25 10/1/2021 245 42 773 591 492 512 846 487 983 180 372 306 785 691
26 11/1/2021 785 577 448 489 425 205 672 358 868 637 104 422 873 919
so the output will look something like this
df_unit1
DATE PROP1 PROP2
1 DAYS MEAN MEAN
2 UNIT1 UNIT1
3
4 1/1/2020 677 972
5 2/1/2020 515 430
6 3/1/2020 253 174
7 4/1/2020 872 679
8 5/1/2020 93 630
9 6/1/2020 696 9
10 7/1/2020 810 345
11 8/1/2020 834 550
12 9/1/2020 687 361
13 10/1/2020 596 212
14 11/1/2020 584 129
15 12/1/2020 616 734
16 1/1/2021 106 115
17 2/1/2021 969 102
18 3/1/2021 823 516
19 4/1/2021 348 952
20 5/1/2021 96 709
21 6/1/2021 805 412
22 7/1/2021 286 248
23 8/1/2021 707 872
24 9/1/2021 346 553
25 10/1/2021 245 785
26 11/1/2021 785 873
df_unit2
DATE PROP1 PROP2
1 DAYS MEAN MEAN
2 UNIT2 UNIT2
3
4 1/1/2020 92 733
5 2/1/2020 11 272
6 3/1/2020 295 602
7 4/1/2020 568 923
8 5/1/2020 58 413
9 6/1/2020 262 866
10 7/1/2020 58 596
11 8/1/2020 609 274
12 9/1/2020 935 793
13 10/1/2020 300 925
14 11/1/2020 984 4
15 12/1/2020 515 799
16 1/1/2021 441 697
17 2/1/2021 504 731
18 3/1/2021 169 15
19 4/1/2021 588 935
20 5/1/2021 711 165
21 6/1/2021 657 535
22 7/1/2021 320 320
23 8/1/2021 975 158
24 9/1/2021 523 437
25 10/1/2021 42 691
26 11/1/2021 577 919
I have extracted the unique units from the row:
unitName = pd.Series(pd.Series(df[2,:]).unique(), name = "Unit Names")
unitName = unitName.tolist()
Next I was planning to loop through this list of unique units and create a dataframe for each unit:
for unit in unitName:
    df_unit = df.iloc[[df.iloc[2:,:].str.match(unit)],:]
    print(df_unit)
I am getting the error that 'DataFrame' object has no attribute 'str'. My plan was to match all the cells in row 2 that match a given unit and then extract the entire column for each matched cell.
This response has two parts:
Solution 1: Strip columns based on common name in dataframe
With the assumption that your dataframe columns look as follows:
['DATE DAYS', 'PROP1 MEAN UNIT1', 'PROP1 MEAN UNIT2', 'PROP1 MEAN UNIT3', 'PROP1 MEAN UNIT4', 'PROP1 MEAN UNIT5', 'PROP1 MEAN UNIT6', 'PROP2 MEAN UNIT7', 'PROP2 MEAN UNIT8', 'PROP2 MEAN UNIT3', 'PROP2 MEAN UNIT4', 'PROP2 MEAN UNIT11', 'PROP2 MEAN UNIT12', 'PROP2 MEAN UNIT1', 'PROP2 MEAN UNIT2']
and the first few records of your dataframe looks like this...
DATE DAYS PROP1 MEAN UNIT1 ... PROP2 MEAN UNIT1 PROP2 MEAN UNIT2
0 1/1/2020 677 ... 972 733
1 2/1/2020 515 ... 430 272
2 3/1/2020 253 ... 174 602
3 4/1/2020 872 ... 679 923
4 5/1/2020 93 ... 630 413
5 6/1/2020 696 ... 9 866
6 7/1/2020 810 ... 345 596
The following lines of code should give you what you want:
cols = df.columns.tolist()
units = sorted(set(x[x.rfind('UNIT'):] for x in cols[1:]))
s_units = sorted(cols[1:],key = lambda x: x.split()[2])
for i in units:
    unit_sublist = ['DATE DAYS'] + [j for j in s_units if j[-6:].strip() == i]
    print ('df_' + i.lower())
    print (df[unit_sublist])
I got the following:
df_unit1
DATE DAYS PROP1 MEAN UNIT1 PROP2 MEAN UNIT1
0 1/1/2020 677 972
1 2/1/2020 515 430
2 3/1/2020 253 174
3 4/1/2020 872 679
4 5/1/2020 93 630
5 6/1/2020 696 9
6 7/1/2020 810 345
df_unit11
DATE DAYS PROP2 MEAN UNIT11
0 1/1/2020 586
1 2/1/2020 123
2 3/1/2020 823
3 4/1/2020 52
4 5/1/2020 793
5 6/1/2020 143
6 7/1/2020 55
df_unit12
DATE DAYS PROP2 MEAN UNIT12
0 1/1/2020 576
1 2/1/2020 36
2 3/1/2020 822
3 4/1/2020 85
4 5/1/2020 962
5 6/1/2020 526
6 7/1/2020 920
df_unit2
DATE DAYS PROP1 MEAN UNIT2 PROP2 MEAN UNIT2
0 1/1/2020 92 733
1 2/1/2020 11 272
2 3/1/2020 295 602
3 4/1/2020 568 923
4 5/1/2020 58 413
5 6/1/2020 262 866
6 7/1/2020 58 596
df_unit3
DATE DAYS PROP1 MEAN UNIT3 PROP2 MEAN UNIT3
0 1/1/2020 342 69
1 2/1/2020 86 441
2 3/1/2020 644 680
3 4/1/2020 505 218
4 5/1/2020 864 206
5 6/1/2020 833 179
6 7/1/2020 579 186
df_unit4
DATE DAYS PROP1 MEAN UNIT4 PROP2 MEAN UNIT4
0 1/1/2020 432 621
1 2/1/2020 754 11
2 3/1/2020 401 729
3 4/1/2020 652 961
4 5/1/2020 682 500
5 6/1/2020 418 138
6 7/1/2020 244 425
df_unit5
DATE DAYS PROP1 MEAN UNIT5
0 1/1/2020 878
1 2/1/2020 219
2 3/1/2020 574
3 4/1/2020 366
4 5/1/2020 346
5 6/1/2020 876
6 7/1/2020 81
df_unit6
DATE DAYS PROP1 MEAN UNIT6
0 1/1/2020 831
1 2/1/2020 818
2 3/1/2020 184
3 4/1/2020 982
4 5/1/2020 19
5 6/1/2020 695
6 7/1/2020 858
df_unit7
DATE DAYS PROP2 MEAN UNIT7
0 1/1/2020 293
1 2/1/2020 822
2 3/1/2020 354
3 4/1/2020 159
4 5/1/2020 293
5 6/1/2020 900
6 7/1/2020 362
df_unit8
DATE DAYS PROP2 MEAN UNIT8
0 1/1/2020 88
1 2/1/2020 280
2 3/1/2020 12
3 4/1/2020 131
4 5/1/2020 19
5 6/1/2020 781
6 7/1/2020 440
Solution 2: Create column names based on first 3 rows in the source data
Let us assume the first 6 rows of your dataframe looks like this.
DATE PROP1 PROP1 PROP1 PROP1 PROP1 PROP1 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2
DAYS MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN
UNIT1 UNIT2 UNIT3 UNIT4 UNIT5 UNIT6 UNIT7 UNIT8 UNIT3 UNIT4 UNIT11 UNIT12 UNIT1 UNIT2
4 1/1/2020 677 92 342 432 878 831 293 88 69 621 586 576 972 733
5 2/1/2020 515 11 86 754 219 818 822 280 441 11 123 36 430 272
6 3/1/2020 253 295 644 401 574 184 354 12 680 729 823 822 174 602
Then you can write the below code to create the dataframe.
data = '''DATE PROP1 PROP1 PROP1 PROP1 PROP1 PROP1 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2 PROP2
DAYS MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN MEAN
UNIT1 UNIT2 UNIT3 UNIT4 UNIT5 UNIT6 UNIT7 UNIT8 UNIT3 UNIT4 UNIT11 UNIT12 UNIT1 UNIT2
4 1/1/2020 677 92 342 432 878 831 293 88 69 621 586 576 972 733
5 2/1/2020 515 11 86 754 219 818 822 280 441 11 123 36 430 272
6 3/1/2020 253 295 644 401 574 184 354 12 680 729 823 822 174 602
7 4/1/2020 872 568 505 652 366 982 159 131 218 961 52 85 679 923
8 5/1/2020 93 58 864 682 346 19 293 19 206 500 793 962 630 413
9 6/1/2020 696 262 833 418 876 695 900 781 179 138 143 526 9 866
10 7/1/2020 810 58 579 244 81 858 362 440 186 425 55 920 345 596
11 8/1/2020 834 609 618 214 547 834 301 875 783 216 834 609 550 274
12 9/1/2020 687 935 976 380 885 246 339 904 627 460 659 352 361 793
13 10/1/2020 596 300 810 248 475 718 350 574 825 804 245 209 212 925
14 11/1/2020 584 984 711 879 916 107 277 412 122 683 151 811 129 4
15 12/1/2020 616 515 101 743 650 526 475 991 796 227 880 692 734 799
16 1/1/2021 106 441 305 964 452 249 282 486 374 620 652 793 115 697
17 2/1/2021 969 504 936 678 67 42 985 791 709 689 520 503 102 731
18 3/1/2021 823 169 412 177 783 601 613 251 533 463 13 127 516 15
19 4/1/2021 348 588 140 966 143 576 419 611 128 830 68 209 952 935
20 5/1/2021 96 711 651 121 708 360 159 229 552 951 79 665 709 165
21 6/1/2021 805 657 729 629 249 547 581 583 236 828 636 248 412 535
22 7/1/2021 286 320 908 765 336 286 148 168 821 567 63 908 248 320
23 8/1/2021 707 975 565 699 47 712 700 439 497 106 288 105 872 158
24 9/1/2021 346 523 142 181 904 266 28 740 125 64 287 707 553 437
25 10/1/2021 245 42 773 591 492 512 846 487 983 180 372 306 785 691
26 11/1/2021 785 577 448 489 425 205 672 358 868 637 104 422 873 919'''
data_list = data.split('\n')
data_line1 = data_list[0].split()
data_line2 = data_list[1].split()
data_line3 = [''] + data_list[2].split()
data_header = [' '.join([data_line1[i], data_line2[i], data_line3[i]]) for i in range(len(data_line1))]
data_header[0] = data_header[0].strip()
new_data = data_list[3:]
import pandas as pd
df = pd.DataFrame(data=None, columns=data_header)
for i in range(len(new_data)):  # iterate over every data row
    df.loc[i] = new_data[i].split()[1:]
print (df)
Here is what worked for me:
# Assign unique column names to the dataframe
df.columns = range(df.shape[1])
# Get all the unique units in the dataframe
unitName = pd.Series(pd.Series(df.loc[2,:]).unique(), name = "Unit Names")
# Convert them to a list to loop through
unitName = unitName.tolist()
for var in unitName:
    # look for an exact match for the unit in the unit row and
    # extract the entire column with the match
    df_item = df[df.columns[df.iloc[3].str.fullmatch(var)]]
    print (df_item)
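For a self-contained sketch of the fullmatch approach (the five-row frame below is made up; here the unit names sit at positional row index 2, so adjust that to wherever the unit row lives in your file):

```python
import pandas as pd

# a small frame whose third row (index 2) names the unit of each column,
# mimicking the question's layout (hypothetical values)
df = pd.DataFrame([
    ['DATE',     'PROP1', 'PROP1', 'PROP2', 'PROP2'],
    ['DAYS',     'MEAN',  'MEAN',  'MEAN',  'MEAN'],
    ['',         'UNIT1', 'UNIT2', 'UNIT1', 'UNIT2'],
    ['1/1/2020', 677,     92,      972,     733],
    ['2/1/2020', 515,     11,      430,     272],
])
df.columns = range(df.shape[1])

frames = {}
for unit in [u for u in df.loc[2].unique() if u]:
    # boolean mask over the columns: True where the unit row matches exactly
    mask = df.loc[2].astype(str).str.fullmatch(unit)
    # keep the date column (position 0) next to every matched unit column
    cols = [0] + [c for c in df.columns[mask] if c != 0]
    frames[unit] = df[cols]
```

Each entry of frames is then one per-unit dataframe, e.g. frames['UNIT1'] holds the date column plus both UNIT1 columns.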

sum of occurrences in bins

I'm looking at the popularity of food stalls in a pop-up market:
Unnamed: 0 Shop1 Shop2 Shop3 ... shop27 shop28 shop29 shop30 shop31 shop32 shop33 shop34
0 0 484 516 484 ... 348 146 1445 1489 623 453 779 694
1 1 276 564 941 ... 1463 178 700 996 1151 364 111 1243
2 2 74 1093 961 ... 1260 1301 1151 663 1180 723 1477 1198
3 3 502 833 22 ... 349 1105 835 1 938 921 745 14
4 4 829 983 952 ... 568 1435 518 807 874 197 81 573
.. ... ... ... ... ... ... ... ... ... ... ... ... ...
114 114 1 187 706 ... 587 1239 1413 850 1324 788 687 687
115 115 398 733 298 ... 864 981 100 80 1322 381 430 349
116 116 11 312 904 ... 34 508 850 1278 432 395 601 213
117 117 824 261 593 ... 1026 147 488 69 25 286 1229 1028
118 118 461 966 183 ... 850 817 1411 863 950 987 415 130
I then sum the overall visits per shop and split the shops into bins (pd.cut(df.sum(axis=0),5,labels=['lowest','lower','medium','higher','highest'])):
Unnamed: 0 lowest
Shop1 medium
Shop2 medium
Shop3 lower
Shop4 lower
... ...
shop31 higher
shop32 medium
shop33 higher
shop34 higher
I then want to see the popularity of each category over time; here is a manual example:
6891-33086 33087-59151 59152-85216 85217-111281 111282-137346
0 0 1373 3546 13999 1238
How can I do this with python?
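One way to get there, sketched on made-up data (the shop names and counts below are hypothetical, and the real frame's "Unnamed: 0" index column should be excluded before summing): cut the per-shop totals into bins, then sum the shops of each bin row by row:

```python
import numpy as np
import pandas as pd

# hypothetical visit counts: 119 time steps x 34 shops
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 1500, size=(119, 34)),
                  columns=[f'Shop{i}' for i in range(1, 35)])

# bin each shop by its overall visit total
bins = pd.cut(df.sum(axis=0), 5,
              labels=['lowest', 'lower', 'medium', 'higher', 'highest'])

# group the columns (shops) by their bin label and sum within each bin,
# time step by time step: rows stay time, columns become the five categories
by_cat = df.T.groupby(bins).sum().T
```

The transpose-groupby-transpose is just a version-safe way to group columns; every visit is counted exactly once, so the category totals add back up to the original grand total.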

When the sampling rate is changed in librosa.load, how does librosa.onset.onset_strength change?

I'm trying to extract the tempo and beats from an audio file (sample rate: 2000) with the code below:
data, sr = librosa.load(path, mono=True, sr=2000)
print ("self.sr :", sr)
onset_env = librosa.onset.onset_strength(data, sr=sr)
tempo, beats = librosa.beat.beat_track(data, sr=sr, onset_envelope=onset_env)
print ("tempo :", tempo)
beats = librosa.frames_to_time(beats, self.sr)
print ("beats :", beats)
I changed only the sample rate, but the output is weird:
/usr/lib/python3.6/site-packages/librosa/filters.py:284: UserWarning: Empty filters detected in mel frequency basis. Some channels will produce empty responses. Try increasing your sampling rate (and fmax) or reducing n_mels.
warnings.warn('Empty filters detected in mel frequency basis. '
/usr/lib64/python3.6/site-packages/scipy/fftpack/basic.py:160: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
z[index] = x
tempo : 117.1875
beats : [ 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74
76 78 80 82 84 86 88 90 92 94 96 98 100 102 104 106 108 110
112 114 116 118 120 122 124 126 128 130 132 134 136 138 140 142 144 146
148 150 152 154 156 158 160 162 164 166 168 170 172 174 176 178 180 182
184 186 188 190 192 194 196 198 200 202 204 206 208 210 212 214 216 218
220 222 224 226 228 230 232 234 236 238 240 242 244 246 248 250 252 254
256 258 260 262 264 266 268 270 272 274 276 278 280 282 284 286 288 290
292 294 296 298 300 302 304 306 308 310 312 314 316 318 320 322 324 326
328 330 332 334 336 338 340 342 344 346 348 350 352 354 356 358 360 362
364 366 368 370 372 374 376 378 380 382 384 386 388 390 392 394 396 398
400 402 404 406 408 410 412 414 416 418 420 422 424 426 428 430 432 434
436 438 440 442 444 446 448 450 452 454 456 458 460 462 464 466]
So I removed the sr parameter and ran the code below:
data, sr = librosa.load(path, mono=True)
print ("self.sr :", sr)
onset_env = librosa.onset.onset_strength(data, sr=sr)
tempo, beats = librosa.beat.beat_track(data, sr=sr, onset_envelope=onset_env)
print ("tempo :", tempo)
beats = librosa.frames_to_time(beats, self.sr)
print ("beats :", beats)
Here is the output without sr:
self.sr : 22050
/usr/lib64/python3.6/site-packages/scipy/fftpack/basic.py:160: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
z[index] = x
tempo : 161.4990234375
beats : [ 7 23 39 55 71 87 102 118 134 150 166 182 197 213
228 244 260 276 292 307 323 339 355 371 387 404 420 438
454 470 486 501 517 533 549 565 581 596 612 628 644 659
675 691 706 722 738 754 770 786 801 817 833 850 868 884
900 916 932 948 964 980 996 1011 1027 1043 1059 1074 1090 1106
1121 1137 1153 1168 1184 1201 1216 1232 1248 1264 1279 1293 1312 1331
1347 1363 1379 1394 1410 1426 1442 1458 1474 1489 1505 1520 1536 1552
1568 1584 1599 1615 1631 1647 1663 1679 1696 1712 1730 1746 1762 1778
1793 1809 1825 1841 1857 1873 1888 1904 1920 1936 1951 1967 1983 1998
2014 2030 2046 2062 2078 2093 2109 2125 2142 2160 2176 2192 2208 2224
2240 2256 2272 2288 2303 2319 2335 2351 2366 2382 2398 2413 2429 2445
2460 2476 2492 2508 2524 2540 2556 2571 2585 2604 2623 2639 2655 2671
2686 2702 2718 2734 2750 2766 2781 2797 2812 2828 2844 2860 2876 2891
2907 2923 2939 2955 2971 2988 3004 3022 3038 3054 3070 3085 3101 3117
3133 3149 3165 3180 3196 3212 3228 3243 3259 3275 3290 3306 3322 3338
3354 3370 3385 3401 3417 3434 3452 3468 3484 3500 3516 3532 3548 3564
3580 3595 3611 3627 3643 3658 3674 3690 3705 3721 3737 3752 3768 3784
3800 3816 3832 3848 3863 3877 3896 3915 3931 3947 3963 3978 3994 4010
4026 4042 4058 4073 4089 4104 4120 4136 4152 4168 4183 4199 4215 4231
4247 4263 4280 4296 4314 4330 4346 4362 4377 4393 4409 4425 4441 4457
4472 4488 4504 4520 4535 4551 4567 4582 4598 4614 4630 4646 4662 4677
4693 4709 4726 4744 4760 4776 4792 4808 4824 4840 4856 4872 4887 4903
4919 4935 4950 4966 4982 4997 5013 5029 5044 5060 5076 5092 5108 5124]
How can I make this work properly when I change sr?
Thank you.
When calling
data, sr = librosa.load(path, mono=True, sr=2000)
you are asking librosa to resample whatever your input is to 2000 Hz (see docs: "target sampling rate"). 2000 Hz is a highly unusual sampling frequency for music and it's likely IMHO that a bunch of the algorithms in librosa will not work properly with it. Instead, typical rates are 44.1 kHz (CD quality) or 22050 Hz (the librosa default).
I assume that the beat tracker is trying to split your data into mel bands and then process those bands individually, perhaps with some novelty curve or onset signal function, but 2 kHz is just not a lot to work with, which is probably why you see that empty-filter message. If the result for sr=2000 is nevertheless correct, you could simply ignore the warning.
However, it seems like a safer bet to simply not set sr, let librosa resample your audio (whatever it is) to 22050 Hz, and then run the beat-tracking algorithm on that. 22050 Hz is the kind of sampling rate the algorithm was most likely developed and tested on, and at which it is most likely to succeed.
Regarding:
/usr/lib64/python3.6/site-packages/scipy/fftpack/basic.py:160: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
This looks like a warning tied to how librosa implemented something. You should be able to ignore it without consequence.

How to reshape a pandas dataframe

The dataframe looks like this:
날짜 역번호 역명 구분 a b c d e f ... k l m n o p q r s t
2008-01-01 150 서울역(150) 승차 379 287 371 876 965 1389 ... 2520 3078 3495 3055 2952 2726 3307 2584 1059 264
2008-01-01 150 서울역(150) 하차 145 707 689 1037 1170 1376 ... 1955 2304 2203 2128 1747 1593 1078 744 406 558
2008-01-01 151 시청(151) 승차 131 131 101 152 191 202 ... 892 900 1154 1706 1444 1267 928 531 233 974
2008-01-01 151 시청(151) 하차 35 158 203 393 375 460 ... 1157 1153 1303 1190 830 454 284 141 107 185
2008-01-01 152 종각(152) 승차 1287 867 400 330 345 338 ... 1867 2269 2777 2834 2646 2784 2920 2290 802 1559
I have a dataframe like the one above, and I want to stack the a~t columns into a single column, reshaping the dataframe like below:
날짜 역번호 역명 구분 a
2018-01-01 150 서울역 승차 379
2018-01-01 150 서울역 승차 287
2018-01-01 150 서울역 승차 371
2018-01-01 150 서울역 승차 876
2018-01-01 150 서울역 승차 965
....
2008-01-01 152 종각 승차 802
2008-01-01 152 종각 승차 1559
something like df = df.reshape(len(data2)*a~t, 1).
How can I do this?
import numpy as np
import pandas as pd

# A sample dataframe with 5 columns
df = pd.DataFrame(np.random.randn(100, 5))
# Columns 0 and 1 are retained; the remaining columns are melted into rows
# with their corresponding values. Finally we drop the 'variable' column.
df = df.melt([0, 1], value_name='A').drop('variable', axis=1)
The wide frame is converted to long format, with the retained id columns repeated for each melted value.
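Applied to a small made-up slice of the frame in the question (only columns a and b stand in for a~t):

```python
import pandas as pd

# a two-row slice mimicking the question's frame (hypothetical values),
# with only columns a and b standing in for a~t
df = pd.DataFrame({
    '날짜': ['2008-01-01', '2008-01-01'],
    '역번호': [150, 150],
    '역명': ['서울역(150)', '서울역(150)'],
    '구분': ['승차', '하차'],
    'a': [379, 145],
    'b': [287, 707],
})

# keep the four id columns and stack a~t into a single value column 'A'
long_df = (df.melt(['날짜', '역번호', '역명', '구분'], value_name='A')
             .drop('variable', axis=1))
```

Each original row is repeated once per melted column, so 2 rows x 2 value columns become 4 long rows.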
