Matplotlib boxplot not showing the expected output - python
Why is my boxplot not showing the expected output? I can only see circles, but I want to see a traditional boxplot. How can I fix it?
import matplotlib.pyplot as plt
collection_0 = [826, 58, 305, 161, 341, 25, 50, 1303, 1241, 406, 4318, 14330, 62, 45, 17, 809, 2560, 2901, 1988, 1755, 2584, 1924, 218, 13, 140, 156, 591, 109, 17, 563, 242, 23, 156, 179, 85, 59, 78, 55, 57, 27, 33, 62, 499, 685, 1418, 70, 155, 388, 205, 62, 22, 358, 688, 273, 27, 107, 85, 856, 375, 144, 476, 161, 33, 1748, 315, 106, 347, 85, 43, 157, 770, 616, 220, 13, 170, 156, 200, 165, 1211, 138, 163, 61, 78, 140, 318, 1296, 14, 386, 19, 918, 193, 381, 178, 106, 91, 109, 261, 72, 436, 194, 176, 237, 28, 201, 36, 166, 1928, 358, 611, 58, 82, 59, 37, 269, 223, 836, 45, 425, 166, 26, 63, 387, 270, 180, 331, 342, 629, 610, 46, 67, 151, 57, 188, 70, 96, 41, 92, 79, 26, 56, 188, 466, 214, 45, 39, 161, 70, 134, 370, 70, 401, 85, 113, 224, 60, 508, 58, 71, 49, 56, 400, 1308, 22, 124, 74, 63, 56, 84, 144, 26, 29, 33, 20, 241, 25, 17, 25, 45, 37, 100, 93, 175, 27, 308, 134, 28, 203, 195, 161, 168, 364, 102, 66, 53, 57, 195, 30, 55, 108, 110, 75, 42, 531, 25, 17, 156, 24, 29, 303, 77, 36, 184, 67, 15, 92, 124, 206, 51, 87, 83, 23, 134, 64, 50, 99, 451, 144, 265, 228, 96, 357, 39, 14, 91, 46, 110, 75, 18, 30, 93, 61, 31, 203, 226, 92, 162, 415, 30, 48, 86, 51, 79, 130, 181, 17, 64, 57, 168, 153, 72, 57, 34, 234, 18, 30, 72, 98, 44, 114, 58, 23, 54, 24, 126, 37, 28, 73, 8, 38, 86, 214, 46, 34, 63, 79, 72, 111, 37, 499, 382, 76, 589, 72, 139, 108, 301, 63, 158, 17, 12, 103, 337, 65, 17, 56, 32, 27, 14, 224, 33, 40, 55, 60, 76, 18, 24, 56, 99, 135, 23, 50, 102, 74, 114, 29, 24, 50, 84, 33, 316, 52, 38, 112, 61, 10, 22, 17, 71, 22, 99, 51, 84, 34, 32, 18, 91, 240, 29, 141, 121, 67, 40, 303, 78, 86, 48, 149, 102, 57, 42, 88, 137, 133, 89, 88, 70, 31, 24, 73, 7, 53, 46, 156, 17, 133, 85, 103, 70, 26, 145, 26, 112, 81, 37, 27, 98, 14, 84, 26, 31, 43, 42, 19, 38, 32, 35, 92, 168, 53, 175, 25, 30, 48, 84, 98, 57, 62, 32, 38, 75, 11, 33, 29, 38, 48, 52, 244, 303, 135, 10, 52, 12, 43, 78, 34, 50, 51, 49, 68, 68, 53, 18, 50, 64, 17, 27, 17, 21, 12, 46, 29, 35, 31, 93, 93, 25, 20, 18, 
18, 43, 61, 29, 16, 40, 28, 26, 15, 30, 41, 67, 75, 53, 64, 105, 15, 35, 41, 22, 54, 20, 38, 31, 21, 105, 23, 37, 12, 29, 38, 16, 16, 21, 57, 66, 83, 44, 43, 14, 28, 48, 51, 17, 21, 16, 7, 34, 50, 23, 14, 18, 23, 32, 91, 29, 31, 23, 9, 14, 17, 15, 43, 16, 17, 20, 11, 16, 7, 13, 11, 49, 42, 13, 23, 18, 28, 38, 23, 10, 32, 9, 34, 16, 18, 9, 23, 16, 12, 65, 31, 37, 16, 9, 34, 8, 12, 22, 55, 17, 30, 13, 25, 27, 14, 7, 78, 19, 11, 41, 54, 22, 27, 8, 18, 22, 6, 29, 16, 35, 27, 8, 10, 7, 51, 9, 23, 12, 9, 6, 15, 16, 8, 7, 14, 12, 10, 14, 17, 10, 13, 18, 8, 7, 9, 10, 10]
collection_1 = [1353, 25, 2430, 1995, 1209, 1291, 564, 68, 1184, 81, 132, 140, 1463, 258, 143, 338, 63, 38, 144, 534, 130, 2742, 392, 157, 301, 193, 620, 2303, 2269, 84, 1464, 148, 593, 191, 102, 1194, 211, 11, 2498, 359, 808, 552, 96, 334, 238, 46, 1771, 536, 160, 195, 318, 193, 684, 280, 249, 19, 235, 15, 144, 2030, 104, 619, 523, 106, 902, 31, 13, 55, 9, 21, 68, 51, 45, 92, 41, 432, 436, 137, 81, 57, 210, 254, 34, 28, 301, 72, 134, 409, 30, 53, 112, 106, 267, 33, 57, 35, 18, 143, 52, 45, 36, 183, 43, 66, 40, 100, 194, 139, 18, 280, 262, 62, 331, 196, 604, 56, 43, 181, 82, 171, 57, 22, 34, 52, 46, 260, 125, 50, 46, 23, 69, 83, 28, 219, 94, 32, 82, 31, 200, 20, 78, 725, 225, 107, 58, 59, 31, 44, 18, 136, 180, 74, 20, 44, 28, 90, 69, 48, 47, 50, 74, 18, 50, 20, 75, 127, 19, 80, 23, 163, 30, 103, 27, 10, 37, 37, 44, 41, 46, 49, 48, 55, 14, 19, 42, 79, 50, 45, 36, 15, 45, 128, 122, 46, 38, 21, 21, 81, 24, 12, 15, 53, 9, 26, 43, 23, 16, 79, 9, 45, 146, 58, 17, 30, 13, 8, 17, 24, 56, 6, 12, 17, 9, 15, 11, 13, 13, 12, 14, 21, 10, 8, 15, 8, 28, 8, 11, 24, 9, 13, 30, 14, 15, 7, 9, 25, 7, 8, 10, 5, 7, 7, 6, 7, 6, 7, 8, 9]
data_to_plot = [collection_0, collection_1]
box = plt.boxplot(data_to_plot,patch_artist=True, labels=["Contracting", "Expanding"])
colors = ['red', 'green']
for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)
plt.ylabel("Unique adopters")
plt.show()
Your data spans several orders of magnitude (from single digits to over 14,000), with most values below 1000. The mean of your data is around 170, so the whole box plot appears compressed because of the huge outlier above 14,000. You can see this in a histogram:
plt.hist(collection_0);
Try a log scale on the y-axis to get the visualization you expect:
plt.yscale('log')
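To put rough numbers on the compression described above, here is a minimal sketch that uses only the standard library. The `sample` list is a small hand-picked subset of values from `collection_0` (an assumption made for brevity, not the full data), and `box_fraction` is a hypothetical helper, not part of matplotlib. It estimates how much of the axis the box (first to third quartile) would occupy on a linear axis versus a log axis. Quartile conventions differ slightly between libraries, so treat the figures as indicative only.

```python
import math
import statistics

# Hypothetical subset: a few values picked from collection_0 for brevity.
sample = [7, 17, 25, 45, 58, 85, 161, 305, 826, 1303, 4318, 14330]


def box_fraction(values, transform=lambda v: v):
    """Return the share of the axis range covered by the box (Q1 to Q3)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    lo, hi = transform(min(values)), transform(max(values))
    return (transform(q3) - transform(q1)) / (hi - lo)


# Share of the axis taken by the box on a linear and on a log10 axis.
linear_share = box_fraction(sample)
log_share = box_fraction(sample, transform=math.log10)

print(f"box covers {linear_share:.1%} of a linear axis")
print(f"box covers {log_share:.1%} of a log axis")
```

On a linear axis the single extreme value stretches the range so far that the box shrinks to a thin sliver, which is why only the outlier circles are visible.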
Related
Permutations on lists within lists sequentially
I have two lists and I would like to calculate the permutations between the two. I have been able to do this using itertools, but am having trouble taking it further. I have two nested lists:
list_1 = [[0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 16, 114, 210, 22, 87, 28, 125], [10, 216, 67, 120, 70, 717, 42, 43, 445, 14, 87, 289, 125]]
list_2 = [[10, 9, 2, 1, 0], [10, 216, 7, 10, 70, 717, 42, 3, 445, 14, 162, 87, 289, 125]]
The first entry of list_1 ([0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 16, 114, 210, 22, 87, 28, 125]) needs to be permuted with the first entry of list_2 ([10, 9, 2, 1, 0]). Then I need the permutations of the second entry of list_1 with the second entry of list_2, and so on. The number of entries in each list is not fixed, so it is not feasible to simply create variables for list_1[0], list_2[0], etc. What would be the simplest way to do this?
import itertools

list_1 = ([0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 16, 114, 210, 22, 87, 28, 125],
          [10, 216, 67, 120, 70, 717, 42, 43, 445, 14, 87, 289, 125])
list_2 = ([10, 9, 2, 1, 0],
          [10, 216, 7, 10, 70, 717, 42, 3, 445, 14, 162, 87, 289, 125])

count = 0
for list1_item, list2_item in zip(list_1, list_2):
    print(f"{list1_item=} {list2_item=}")
    for permutation in itertools.permutations(itertools.chain(list1_item, list2_item)):
        if count % 10**8 == 0:  # print once in a while
            print(permutation)
        count += 1
    print(count)
    print(f"last permutation: {permutation}")

gives

list1_item=[0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 16, 114, 210, 22, 87, 28, 125] list2_item=[10, 9, 2, 1, 0]
(0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 16, 114, 210, 22, 87, 28, 125, 10, 9, 2, 1, 0)
(0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 210, 125, 10, 9, 114, 22, 0, 87, 2, 1, 16, 28)
(0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 28, 16, 210, 22, 125, 9, 1, 2, 87, 10, 114, 0)
...
list1_item=[10, 216, 67, 120, 70, 717, 42, 43, 445, 14, 87, 289, 125] list2_item=[10, 216, 7, 10, 70, 717, 42, 3, 445, 14, 162, 87, 289, 125]
(10, 216, 67, 120, 70, 717, 42, 43, 445, 14, 87, 289, 125, 10, 216, 7, 10, 70, 717, 42, 3, 445, 14, 162, 87, 289, 125)
...
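The pairing step in the answer above is done by zip, which works for any number of entries. Note that the first pair alone has 22 elements, so it has 22! orderings (on the order of 10^21), far too many to enumerate in full. The following is a minimal sketch, assuming you only need the count and a small preview per pair; `preview_permutations` and its `limit` parameter are hypothetical names introduced for illustration.

```python
import itertools
import math

list_1 = ([0, 226, 68, 100, 70, 71, 42, 43, 44, 14, 16, 114, 210, 22, 87, 28, 125],
          [10, 216, 67, 120, 70, 717, 42, 43, 445, 14, 87, 289, 125])
list_2 = ([10, 9, 2, 1, 0],
          [10, 216, 7, 10, 70, 717, 42, 3, 445, 14, 162, 87, 289, 125])


def preview_permutations(first, second, limit=3):
    """Return (number of orderings, first `limit` permutations) for one pair."""
    combined = list(first) + list(second)
    # itertools.permutations treats positions as distinct, so it yields n! tuples.
    total = math.factorial(len(combined))
    # islice stops after `limit` items instead of walking all n! permutations.
    preview = list(itertools.islice(itertools.permutations(combined), limit))
    return total, preview


# zip pairs entry i of list_1 with entry i of list_2, however many there are.
results = [preview_permutations(a, b) for a, b in zip(list_1, list_2)]

for total, preview in results:
    print(f"{total} permutations, starting with {preview[0]}")
```

Because itertools.permutations is lazy, taking a slice costs almost nothing; only a full loop over it is expensive.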
Remove noise using logarithmic binning
I'm interested in plotting the probability distribution of a set of points that are distributed as a power law. Further, I would like to use logarithmic binning to smooth out the large fluctuations, especially those observed in the tail. I wrote this code:
plt.figure()
plt.grid(True)
plt.loglog(x, y, 'bo')
plt.savefig('distribution.png', dpi=400)
plt.show()
plt.close()
where x and y are lists with the data. I know I should use numpy.logspace, but I'm not sure how to do it. I attach the lists and an image of the graph:
Graphic: 1
x=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 44, 45, 46, 48, 50, 53, 54, 55, 56, 57, 58, 59, 63, 64, 66, 71, 72, 73, 76, 79, 81, 84, 85, 86, 90, 95, 97, 99, 100, 101, 103, 105, 114, 117, 118, 120, 122, 129, 141, 159, 166, 168, 172, 199, 201, 206, 218, 226, 243, 260, 262, 263, 265, 273, 274, 278, 281, 292, 300, 390, 404, 420, 443, 491, 849, 939, 1036, 1156, 1191, 1389, 1551, 1742, 2082]
y=[0.0, 0.3508771929824561, 0.4259259259259261, 0.4400278940027895, 0.439337474120083, 0.43933333333333335, 0.4165445665445665, 0.4361247947454843, 0.4325877825877826, 0.4820728291316526, 0.42828042328042315, 0.35761299632267374, 0.3491461529923068, 0.4079423222280365, 0.43694194694194693, 0.34069215098626865, 0.3449795896319961, 0.3633688071188071, 0.30852671293847767, 0.4242381075714409, 0.20068791049183207, 0.24466260863319686, 0.12237645395540135, 0.37624875124875123, 0.28918557997841887, 0.25374977395437753, 0.4761346678013344, 0.41219336219336217, 0.19267411510058569, 0.30895915678524377, 0.18104998922645982, 0.2407892107892108, 0.23937740965604742, 0.3727204759813455, 0.23712669683257917, 0.2567023619655199, 0.33474793703626654, 0.3520767731294047, 0.2475947884643537, 0.3738888888888889, 0.5274725274725275, 0.33489003749873314, 0.18518518518518517, 0.15181358496575886, 0.3152953084067635, 0.17919413919413918, 0.20858299108299105,
0.21746880570409982, 0.1915602105707053, 0.2972972972972973, 0.18115942028985507, 0.25, 0.32707722385141735, 0.33894302848575714, 0.21774193548387097, 0.34782608695652173, 0.27608756290137165, 0.17296320127462694, 0.2727272727272727, 0.2879728132387707, 0.06535947712418301, 0.083710407239819, 0.28118393234672306, 0.1951219512195122, 0.09254361251031618, 0.3062211259885678, 0.002663622526636225, 0.27311522048364156, 0.0506558118498417, 0.1044776119402985, 0.06284153005464481, 0.18588399720475193, 0.2129032258064516, 0.14903846153846154, 0.021532091097308487, 0.3089430894308943, 0.301010101010101, 0.3761904761904762, 0.10466269841269842, 0.07138047138047138, 0.21709633649932158, 0.019401589527816735, 0.017575757575757574, 0.15817805383022773, 0.025306629405371837, 0.20850040096230954, 0.0001638001638001638, 0.04357084357084357, 0.09221213569039656, 0.14047410008779632, 0.002560163850486431, 0.0031680440771349864, 0.12334152334152335, 0.6428571428571429, 0.012745098039215686, 0.0058073399287151255, 0.0012413644214162348, 0.013532269257460098, 0.04368752313957793, 0.20265151515151514, 0.0018470281790196543, 0.023099982366425676, 0.03265807243707796, 0.00695970695970696, 0.003737745098039216, 0.009634076615208691, 0.024085079762277136, 0.0062196422224854876, 0.030849549121974372, 0.01636020744931636, 0.003922512815882666, 0.005677708965459911, 0.04833570605382686, 0.014331723027375202]
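One possible way to approach the logarithmic binning asked about above is to split the x range into bins of equal width in log10 space and average the y values that fall into each bin. This is a minimal sketch using only the standard library; `log_bin`, `n_bins`, and the demo data are hypothetical names and values chosen for illustration. The bin edges it uses are the same values that numpy.logspace(lo, hi, n_bins + 1) would produce, and it assumes all x values are positive and not all equal.

```python
import math


def log_bin(x, y, n_bins=10):
    """Average y inside bins that are equally wide in log10(x).

    Returns (bin centers, mean y per bin), skipping empty bins.
    """
    lo, hi = math.log10(min(x)), math.log10(max(x))
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for xi, yi in zip(x, y):
        # Clamp so that the largest x lands in the last bin.
        idx = min(int((math.log10(xi) - lo) / width), n_bins - 1)
        sums[idx] += yi
        counts[idx] += 1
    centers, means = [], []
    for i in range(n_bins):
        if counts[i]:
            # Geometric midpoint of the bin, i.e. the middle on a log axis.
            centers.append(10 ** (lo + (i + 0.5) * width))
            means.append(sums[i] / counts[i])
    return centers, means


# Hypothetical demo data spanning three decades.
x_demo = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
y_demo = [0.5, 0.4, 0.45, 0.3, 0.35, 0.2, 0.1, 0.12, 0.02, 0.01]

centers, means = log_bin(x_demo, y_demo, n_bins=3)
for c, m in zip(centers, means):
    print(f"x ~ {c:.1f}: mean y = {m:.4f}")
```

The returned centers and means can then be passed to plt.loglog in place of the raw x and y lists. Wider bins in the tail pool more points, which is what damps the fluctuations there.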
Combine numpy subarrays of varying dimensions
I have a nested numpy array (dtype=object) containing 333 arrays whose sizes increase steadily from 52x1 to 52x333. I would like to extract and concatenate these arrays so that I have a single 52x55611 array. I imagine this may be straightforward, but my attempts using numpy.reshape have been unsuccessful.
If you want to stack them along the second axis, you can use numpy.hstack.
list_of_arrays = [array_1, ..., array_n]  # all these arrays have the same shape[0]
big_array = np.hstack(list_of_arrays)
If I have understood you correctly, you could use numpy.concatenate.
>>> import numpy as np
>>> a = np.array([range(52)])
>>> b = np.array([range(52,104), range(104, 156)])
>>> np.concatenate((a,b))
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51],
       [ 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103],
       [104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155]])
Note that np.concatenate joins along axis 0 (rows) by default, as in this example. For the 52xN arrays in the question, which share their first dimension, you would pass axis=1.
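To check the approach against the shapes given in the question, here is a minimal sketch. The `nested` array is a hypothetical stand-in built to match the description (333 arrays growing from 52x1 to 52x333, stored in an object array); the real data will differ. It converts the object array to a list first, since both functions expect a sequence of arrays.

```python
import numpy as np

# Hypothetical stand-in for the nested array described in the question:
# element i is a 52 x (i + 1) array filled with the value i.
n_arrays = 333
nested = np.empty(n_arrays, dtype=object)
for i in range(n_arrays):
    nested[i] = np.full((52, i + 1), i, dtype=np.int16)

# Join along the second axis; every sub-array has 52 rows.
big_array = np.hstack(list(nested))

# np.concatenate with axis=1 is an equivalent spelling.
same_result = np.concatenate(list(nested), axis=1)

print(big_array.shape)
```

numpy.reshape cannot do this because it only rearranges the elements of one regular array, whereas here the sub-arrays have different widths and must be joined.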
Doing operations with only Nth element of a list
I want to multiply only every second number in a list (0-100), but I just can't get it to work.
[x*10 for x in range(100) if x%2==0]  # replace 10 with the value to multiply by; use x to multiply the number by itself
The code below multiplies only the even elements and keeps the others as they are.
def iterate_lis(get_list):
    ls = []
    for x in get_list:
        if x%2==0:
            ls.append(x*2)
        else:
            ls.append(x)
    print(ls)
    return ls

iterate_count = 5  # the list will be iterated 5 times
for i in range(iterate_count):
    if i ==0:
        get_lis = iterate_lis(range(100))
    else:
        get_lis = iterate_lis(get_lis)
The result for iterate_count=5 will be as follows:
>>> [0, 1, 4, 3, 8, 5, 12, 7, 16, 9, 20, 11, 24, 13, 28, 15, 32, 17, 36, 19, 40, 21, 44, 23, 48, 25, 52, 27, 56, 29, 60, 31, 64, 33, 68, 35, 72, 37, 76, 39, 80, 41, 84, 43, 88, 45, 92, 47, 96, 49, 100, 51, 104, 53, 108, 55, 112, 57, 116, 59, 120, 61, 124, 63, 128, 65, 132, 67, 136, 69, 140, 71, 144, 73, 148, 75, 152, 77, 156, 79, 160, 81, 164, 83, 168, 85, 172, 87, 176, 89, 180, 91, 184, 93, 188, 95, 192, 97, 196, 99]
[0, 1, 8, 3, 16, 5, 24, 7, 32, 9, 40, 11, 48, 13, 56, 15, 64, 17, 72, 19, 80, 21, 88, 23, 96, 25, 104, 27, 112, 29, 120, 31, 128, 33, 136, 35, 144, 37, 152, 39, 160, 41, 168, 43, 176, 45, 184, 47, 192, 49, 200, 51, 208, 53, 216, 55, 224, 57, 232, 59, 240, 61, 248, 63, 256, 65, 264, 67, 272, 69, 280, 71, 288, 73, 296, 75, 304, 77, 312, 79, 320, 81, 328, 83, 336, 85, 344, 87, 352, 89, 360, 91, 368, 93, 376, 95, 384, 97, 392, 99]
[0, 1, 16, 3, 32, 5, 48, 7, 64, 9, 80, 11, 96, 13, 112, 15, 128, 17, 144, 19, 160, 21, 176, 23, 192, 25, 208, 27, 224, 29, 240, 31, 256, 33, 272, 35, 288, 37, 304, 39, 320, 41, 336, 43, 352, 45, 368, 47, 384, 49, 400, 51, 416, 53, 432, 55, 448, 57, 464, 59, 480, 61, 496, 63, 512, 65, 528, 67, 544, 69, 560, 71, 576, 73, 592, 75, 608, 77, 624, 79, 640, 81, 656, 83, 672, 85, 688, 87, 704, 89, 720, 91, 736, 93, 752, 95, 768, 97, 784, 99]
[0, 1, 32, 3, 64, 5, 96, 7, 128, 9, 160, 11, 192, 13, 224, 15, 256, 17, 288, 19, 320, 21, 352, 23, 384, 25, 416, 27, 448, 29, 480, 31, 512, 33, 544, 35, 576, 37, 608, 39, 640, 41, 672, 43, 704, 45, 736, 47, 768, 49, 800, 51, 832, 53, 864, 55, 896, 57, 928, 59, 960, 61, 992, 63, 1024, 65, 1056, 67, 1088, 69, 1120, 71, 1152, 73, 1184, 75, 1216, 77, 1248, 79, 1280, 81, 1312, 83, 1344, 85, 1376, 87, 1408, 89, 1440, 91, 1472, 93, 1504, 95, 1536, 97, 1568, 99]
[0, 1, 64, 3, 128, 5, 192, 7, 256, 9, 320, 11, 384, 13, 448, 15, 512, 17, 576, 19, 640, 21, 704, 23, 768, 25, 832, 27, 896, 29, 960, 31, 1024, 33, 1088, 35, 1152, 37, 1216, 39, 1280, 41, 1344, 43, 1408, 45, 1472, 47, 1536, 49, 1600, 51, 1664, 53, 1728, 55, 1792, 57, 1856, 59, 1920, 61, 1984, 63, 2048, 65, 2112, 67, 2176, 69, 2240, 71, 2304, 73, 2368, 75, 2432, 77, 2496, 79, 2560, 81, 2624, 83, 2688, 85, 2752, 87, 2816, 89, 2880, 91, 2944, 93, 3008, 95, 3072, 97, 3136, 99]
You can't multiply by 0, because the result will always be 0.
result = 1
for i in range(1, 10):
    if i % 2 == 0:
        result *= i
print(result)
import numpy as np
l = range(100)
np.prod(l[0::2])
This takes every second element of your list and multiplies them all together.
"I want to change to list so: 1,2,3,4,5,6,7,8 becomes 1,4,3,8,5,16 etc."
Though I don't understand how you multiply 6 to end up with 16, I assume this is what you need:
new_list = []
for x in range(1,100):
    if x % 2 == 0:
        new_list.append(x*2)
    else:
        new_list.append(x)
print(new_list)
If the number is divisible by 2, you multiply it by 2 and append it to a new list. If not, you just append it without multiplying. Running this program, you get the following output:
[1, 4, 3, 8, 5, 12, 7, 16, 9, 20, 11, 24, 13, 28, 15, 32, 17, 36, 19, 40, 21, 44, 23, 48, 25, 52, 27, 56, 29, 60, 31, 64, 33, 68, 35, 72, 37, 76, 39, 80, 41, 84, 43, 88, 45, 92, 47, 96, 49, 100, 51, 104, 53, 108, 55, 112, 57, 116, 59, 120, 61, 124, 63, 128, 65, 132, 67, 136, 69, 140, 71, 144, 73, 148, 75, 152, 77, 156, 79, 160, 81, 164, 83, 168, 85, 172, 87, 176, 89, 180, 91, 184, 93, 188, 95, 192, 97, 196, 99]
Use range() to generate the indexes of the entries you want to change:
numbers = [1, 2, 3, 4, 5, 6, 7, 8]
for i in range(1, len(numbers), 2):
    numbers[i] *= 2
This will result in numbers containing the list [1, 4, 3, 8, 5, 12, 7, 16].
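The index-based loop above can also be written with an extended slice, which selects every second position directly. This is a minimal sketch of that alternative, not the approach any of the answers used; `double_every_second` and its `start` parameter are hypothetical names.

```python
def double_every_second(numbers, start=1):
    """Return a copy of `numbers` with every second entry doubled.

    `start` is the index of the first entry to change (1 = second item).
    """
    result = list(numbers)  # copy so the caller's list is left untouched
    # Slice assignment replaces positions start, start + 2, start + 4, ...
    result[start::2] = [n * 2 for n in result[start::2]]
    return result


doubled = double_every_second([1, 2, 3, 4, 5, 6, 7, 8])
print(doubled)  # [1, 4, 3, 8, 5, 12, 7, 16]
```

This selects by position rather than by value, so it still changes every second entry when the list does not consist of consecutive integers, which a test such as x % 2 == 0 would not guarantee.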
Fitting a distribution to a histogram
I have some data which I've plotted using sns.distplot The x axis is the number of days a certain player is injured for: I want to get the cdf for this distribution. It does not have to be extremely accurate but I want to run some simulations and effectively I want to have the distribution for this data, call it X. Then I want to run X.rvs(1000) which would give me an array of 1000 random numbers which represent the number of days players are injured. e.g. if it returns array(2,35,140,3,4,6,7,23,55,63,...,87) Those should represent the number of days players are injured. Really not sure how to do this as all I have to go on is the data behind this histogram which is simply plotted with sns.distplot(data,kde=True) Hope someone can help data: data = pd.DataFrame([78, 58, 124, 62, 30, 46, 31, 34, 94, 15, 41, 18, 63, 15, 63, 31, 35, 23, 19, 19, 47, 154, 113, 29, 35, 58, 62, 93, 93, 93, 37, 31, 16, 17, 16, 17, 62, 31, 145, 116, 183, 183, 183, 93, 148, 183, 13, 160, 183, 183, 68, 15, 183, 57, 91, 183, 86, 133, 20, 183, 89, 183, 43, 30, 183, 183, 136, 183, 183, 12, 183, 60, 161, 67, 183, 40, 121, 52, 58, 183, 183, 9, 151, 183, 183, 183, 116, 9, 95, 183, 27, 16, 183, 52, 167, 12, 183, 183, 94, 65, 183, 30, 183, 19, 14, 183, 54, 37, 183, 152, 33, 22, 67, 183, 40, 17, 183, 50, 7, 183, 106, 72, 183, 183, 22, 80, 183, 183, 58, 183, 183, 183, 183, 183, 15, 183, 183, 183, 183, 127, 156, 183, 26, 183, 183, 59, 9, 183, 183, 55, 183, 183, 183, 28, 18, 51, 18, 11, 18, 26, 77, 65, 61, 19, 61, 61, 61, 30, 30, 30, 182, 54, 182, 22, 121, 26, 64, 91, 91, 15, 18, 60, 17, 16, 60, 29, 15, 31, 181, 15, 16, 120, 24, 26, 30, 28, 90, 28, 90, 27, 25, 27, 32, 28, 28, 28, 28, 180, 150, 28, 47, 51, 60, 25, 43, 9, 16, 24, 89, 15, 13, 58, 106, 16, 59, 29, 19, 22, 16, 16, 51, 52, 33, 26, 178, 148, 148, 42, 72, 28, 86, 17, 17, 56, 18, 25, 17, 28, 41, 37, 15, 81, 25, 147, 147, 8, 36, 32, 18, 37, 42, 23, 86, 38, 36, 55, 24, 15, 60, 54, 41, 18, 15, 17, 31, 146, 115, 64, 115, 25, 32, 85, 85, 38, 15, 23, 
175, 175, 175, 84, 145, 18, 22, 38, 35, 21, 22, 53, 43, 22, 21, 37, 15, 19, 49, 25, 52, 28, 21, 21, 15, 172, 11, 109, 20, 22, 32, 29, 168, 168, 15, 28, 166, 16, 31, 86, 165, 15, 25, 48, 163, 56, 15, 162, 38, 19, 17, 40, 95, 160, 56, 56, 27, 48, 158, 32, 157, 157, 157, 15, 27, 37, 16, 46, 141, 141, 15, 31, 19, 52, 43, 15, 51, 39, 74, 119, 23, 15, 134, 43, 17, 15, 54, 33, 79, 133, 133, 21, 72, 17, 118, 26, 24, 15, 106, 30, 15, 53, 16, 21, 127, 127, 21, 126, 126, 33, 112, 17, 93, 33, 21, 17, 43, 36, 15, 44, 53, 110, 17, 17, 109, 17, 47, 38, 12, 16, 33, 62, 16, 77, 78, 88, 52, 69, 24, 63, 104, 28, 104, 15, 101, 59, 42, 99, 68, 74, 41, 33, 97, 96, 96, 20, 29, 30, 58, 41, 15, 95, 15, 33, 34, 25, 78, 51, 77, 15, 74, 76, 27, 76, 76, 31, 20, 47, 75, 35, 15, 74, 60, 73, 72, 39, 45, 35, 39, 70, 70, 38, 44, 51, 15, 17, 68, 68, 37, 18, 15, 15, 66, 49, 20, 65, 64, 64, 24, 43, 42, 23, 19, 20, 50, 49, 20, 49, 18, 41, 45, 15, 47, 15, 20, 15, 47, 47, 46, 15, 15, 45, 15, 16, 7, 44, 13, 32, 21, 17, 17, 25, 16, 26, 31, 41, 40, 19, 18, 39, 38, 15, 15, 37, 17, 20, 35, 35, 17, 33, 20, 20, 20, 19, 18, 17, 17, 16, 15, 15, 11, 11, 10, 7, 6, 5, 41, 15, 33, 28, 59, 182, 28, 15, 20, 49, 161, 157, 22, 8, 56, 33, 182, 26, 54, 46, 23, 27, 153, 28, 28, 29, 21, 45, 30, 60, 6, 182, 17, 83, 16, 22, 120, 30, 17, 20, 17, 19, 15, 67, 20, 9, 172, 182, 182, 76, 88, 55, 161, 154, 182, 25, 66, 16, 18, 38, 15, 141, 182, 19, 10, 23, 27, 145, 179, 46, 67, 84, 18, 17, 50, 32, 46, 16, 4, 62, 29, 47, 33, 16, 20, 141, 77, 30, 47, 77, 15, 62, 88, 50, 19, 45, 142, 22, 42, 12, 33, 60, 17, 26, 7, 12, 182, 33, 182, 15, 18, 85, 182, 31, 75, 18, 98, 23, 37, 39, 104, 182, 30, 51, 149, 47, 172, 39, 21, 43, 26, 24, 25, 56, 27, 24, 158, 38, 26, 66, 40, 38, 63, 8, 48, 10, 131, 16, 20, 14, 38, 49, 58, 130, 39, 110, 136, 40, 67, 63, 30, 27, 41, 33, 174, 34, 15, 19, 102, 28, 22, 47, 10, 18, 28, 69, 37, 16, 31, 27, 28, 32, 42, 81, 38, 26, 24, 15, 15, 37, 14, 149, 7, 64, 133, 99, 17, 39, 18, 11, 40, 26, 34, 134, 76, 13, 162, 39, 
34, 41, 47, 182, 15, 36, 47, 80, 15, 15, 32, 16, 41, 182, 49, 27, 46, 48, 16, 38, 40, 35, 76, 15, 17, 39, 107, 143, 182, 15, 84, 19, 24, 87, 79, 16, 41, 20, 42, 74, 23, 8, 73, 21, 16, 93, 23, 171, 53, 93, 49, 15, 26, 23, 34, 167, 18, 90, 38, 12, 13, 15, 6, 180, 18, 36, 22, 59, 61, 91, 35, 19, 65, 110, 41, 91, 77, 125, 33, 93, 34, 15, 24, 32, 35, 88, 31, 27, 28, 182, 18, 16, 29, 50, 46, 182, 120, 33, 7, 117, 15, 11, 13, 182, 31, 41, 112, 110, 17, 69, 27, 41, 43, 40, 44, 37, 33, 8, 57, 106, 20, 22, 115, 31, 102, 39, 17, 50, 182, 9, 16, 32, 182, 50, 38, 15, 16, 31, 109, 26, 159, 182, 38, 16, 103, 32, 40, 106, 22, 105, 90, 78, 16, 88, 18, 65, 90, 38, 47, 36, 88, 61, 64, 52, 19, 46, 42, 27, 20, 147, 41, 15, 29, 26, 16, 19, 182, 38, 86, 34, 15, 13, 66, 34, 122, 182, 43, 41, 73, 41, 89, 23, 30, 53, 182, 7, 36, 90, 30, 127, 90, 43, 105, 36, 19, 158, 28, 41, 20, 29, 20, 150, 27, 23, 116, 67, 38, 20, 53, 36, 15, 15, 61, 91, 69, 48, 143, 15, 16, 20, 52, 17, 51, 86, 182, 40, 24, 111, 182, 56, 18, 40, 15, 63, 24, 34, 33, 35, 57, 15, 40, 50, 12, 17, 16, 182, 118, 23, 36, 98, 22, 156, 27, 124, 15, 61, 38, 40, 51, 18, 50, 43, 129, 182, 18, 91, 15, 30, 182, 31, 63, 31, 94, 31, 82, 34, 66, 42, 36, 42, 7, 20, 25, 26, 182, 58, 15, 115, 182, 15, 15, 87, 15, 93, 25, 66, 18, 16, 160, 91, 39, 47, 17, 54, 91, 20, 40, 40, 33, 105, 26, 28, 52, 56, 11, 52, 182, 23, 100, 15, 56, 9, 24, 145, 174, 55, 13, 39, 23, 9, 16, 182, 60, 81, 19, 182, 15, 98, 67, 7, 39, 15, 40, 182, 16, 9, 31, 8, 16, 29, 55, 53, 123, 43, 50, 28, 23, 18, 80, 15, 16, 35, 15, 98, 15, 36, 63, 23, 25, 20, 15, 63, 92, 34, 40, 152, 13, 51, 60, 36, 17, 145, 39, 24, 46, 9, 178, 21, 7, 26, 182, 22, 19, 182, 43, 71, 32, 15, 141, 50, 6, 15, 182, 11, 15, 74, 182, 19, 30, 30, 18, 25, 17, 15, 182, 38, 19, 15, 17, 77, 40, 92, 83, 16, 21, 142, 135, 19, 13, 53, 159, 39, 101, 34, 47, 17, 128, 36, 70, 74, 99, 11, 128, 48, 100, 15, 182, 28, 22, 182, 59, 12, 25, 36, 81, 21, 16, 15, 27, 57, 7, 93, 51, 37, 31, 17, 75, 41, 77, 182, 32, 24, 17, 
54, 29, 23, 55, 40, 48, 15, 118, 150, 14, 65, 138, 27, 30, 46, 182, 15, 15, 98, 150, 182, 7, 182, 152, 24, 31, 154, 20, 18, 182, 7, 19, 33, 168, 29, 27, 41, 36, 24, 24, 24, 32, 33, 182, 15, 178, 55, 20, 35, 182, 85, 4, 44, 36, 15, 28, 159, 15, 16, 24, 15, 75, 76, 54, 43, 63, 59, 35, 22, 84, 32, 11, 17, 7, 30, 35, 18, 29, 182, 62, 37, 48, 31, 58, 38, 32, 19, 110, 14, 47, 20, 26, 15, 25, 34, 40, 43, 27, 27, 8, 26, 15, 182, 20, 58, 182, 7, 20, 20, 76, 32, 50, 174, 182, 113, 82, 15, 15, 57, 122, 5, 31, 32, 50, 15, 26, 15, 81, 15, 16, 6, 32, 39, 16, 162, 15, 94, 75, 182, 36, 21, 68, 33, 53, 182, 181, 80, 163, 115, 84, 150, 65, 15, 28, 70, 141, 39, 39, 40, 18, 99, 15, 93, 13, 56, 182, 162, 37, 66, 163, 15, 36, 43, 64, 15, 24, 15, 15, 21, 15, 36, 36, 23, 23, 151, 15, 18, 182, 39, 15, 34, 185, 40, 25, 182, 15, 15, 41, 18, 18, 56, 37, 32, 26, 36, 23, 17, 141, 21, 34, 18, 21, 45, 65, 98, 11, 21, 119, 34, 53, 59, 16, 15, 48, 110, 15, 33, 9, 102, 66, 60, 15, 64, 26, 59, 56, 31, 108, 17, 7, 71, 22, 19, 23, 41, 33, 16, 50, 74, 40, 15, 40, 114, 80, 71, 29, 19, 36, 15, 21, 24, 182, 19, 28, 60, 24, 56, 37, 25, 85, 78, 36, 15, 71, 54, 182, 155, 141, 2, 49, 15, 23, 131, 15, 66, 15, 22, 15, 66, 59, 51, 15, 64, 21, 182, 19, 20, 36, 55, 51, 44, 39, 16, 47, 41, 98, 127, 125, 24, 32, 182, 20, 104, 20, 48, 16, 12, 57, 55, 13, 32, 15, 52, 25, 15, 119, 18, 15, 15, 182, 53, 66, 24, 15, 172, 15, 120, 16, 45, 15, 32, 8, 22, 132, 31, 69, 13, 89, 40, 63, 53, 36, 96, 156, 39, 59, 24, 40, 24, 118, 109, 29, 21, 119, 120, 39, 36, 175, 15, 134, 26, 15, 15, 29, 26, 182, 36, 17, 71, 40, 31, 104, 43, 66, 45, 30, 37, 26, 74, 182, 35, 32, 15, 64, 152, 182, 52, 28, 182, 16, 76, 182, 8, 35, 134, 36, 15, 182, 126, 15, 15, 182, 15, 17, 15, 182, 35, 16, 39, 33, 125, 27, 29, 51, 52, 125, 63, 161, 24, 75, 52, 9, 109, 70, 100, 65, 48, 13, 182, 16, 6, 28, 16, 52, 23, 15, 111, 28, 47, 61, 61, 15, 20, 33, 166, 10, 20, 19, 31, 29, 20, 68, 138, 22, 56, 26, 15, 182, 17, 169, 33, 28, 49, 24, 20, 33, 38, 94, 15, 43, 16, 
88, 15, 53, 18, 18, 50, 15, 15, 80, 53, 69, 34, 22, 105, 131, 31, 32, 32, 182, 12, 147, 157, 15, 24, 15, 30, 18, 16, 50, 63, 60, 70, 89, 19, 18, 38, 38, 167, 15, 125, 182, 178, 78, 84, 33, 41, 95, 44, 40, 17, 182, 64, 70, 68, 133, 79, 10, 60, 48, 29, 160, 20, 117, 42, 50, 128, 182, 51, 61, 35, 45, 33, 37, 35, 25, 20, 27, 15, 35, 182, 15, 15, 32, 87, 28, 19, 67, 15, 15, 43, 49, 15, 86, 6, 38, 5, 17, 77, 51, 15, 57, 30, 41, 20, 37, 75, 58, 149, 111, 51, 60, 64, 17, 99, 22, 182, 18, 15, 34, 71, 32, 182, 182, 50, 71, 17, 84, 35, 77, 29, 32, 31, 30, 13, 35, 105, 36, 60, 45, 62, 15, 88, 101, 15, 111, 85, 28, 23, 74, 61, 41, 129, 55, 42, 23, 182, 26, 16, 39, 33, 63, 16, 16, 118, 19, 34, 115, 15, 46, 20, 20, 182, 27, 28, 35, 98, 62, 56, 42, 175, 40, 86, 15, 154, 57, 15, 62, 30, 101, 58, 47, 15, 182, 182, 30, 17, 36, 82, 78, 21, 84, 24, 93, 15, 36, 19, 62, 12, 58, 48, 145, 16, 20, 20, 182, 18, 21, 10, 17, 58, 76, 25, 79, 36, 15, 54, 38, 30, 182, 30, 20, 131, 118, 18, 15, 29, 52, 15, 45, 22, 182, 24, 15, 15, 37, 33, 27, 60, 15, 27, 24, 8, 12, 15, 85, 21, 182, 37, 27, 49, 26, 7, 16, 14, 19, 33, 30, 4, 28, 60, 15, 162, 40, 39, 14, 22, 7, 47, 121, 40, 39, 15, 8, 30, 13, 150, 16, 7, 78, 30, 41, 26, 21, 65, 155, 54, 44, 85, 54, 72, 19, 21, 26, 127, 17, 4, 75, 23, 69, 44, 71, 102, 23, 21, 17, 61, 155, 27, 38, 27, 30, 107, 33, 24, 103, 61, 98, 6, 60, 22, 51, ]
Given the raw data it's fairly easy to provide a reasonable approximation.
from scipy import stats

p_data = [1, 1, 2, 4, 3, 8, 19, 11, 12, 7, 11, 12, 16, 7, 176, 62, 55, 44, 38, 48, 29, 28, 29, 35, 22, 32, 30, 36, 22, 33, 30, 30, 35, 20, 24, 33, 22, 28, 27, 32, 28, 13, 20, 9, 11, 12, 21, 13, 12, 18, 18, 16, 14, 12, 10, 15, 8, 14, 11, 19, 14, 11, 14, 12, 10, 12, 8, 7, 7, 7, 8, 5, 3, 10, 8, 10, 10, 8, 4, 6, 5, 3, 2, 8, 8, 8, 3, 8, 5, 7, 9, 2, 12, 5, 4, 3, 1, 9, 5, 3, 4, 4, 2, 5, 5, 5, 2, 1, 5, 6, 4, 2, 2, 1, 6, 3, 2, 6, 4, 5, 3, 2, 1, 2, 5, 3, 6, 3, 2, 1, 4, 1, 5, 4, 1, 2, 1, 2, 2, 2, 8, 2, 2, 4, 6, 1, 4, 3, 3, 6, 2, 4, 1, 4, 3, 3, 5, 3, 3, 4, 4, 5, 3, 2, 1, 2, 3, 3, 1, 1, 1, 4, 3, 3, 5, 4, 4, 4, 1, 148]
thesum = sum(p_data)
p_data = [_/thesum for _ in p_data]
x_k = range(1, 1+len(p_data))
custom = stats.rv_discrete(name='custm', values=(x_k, p_data))
R = custom.rvs(size=1000)
print(R[:20])

First I used Counter from collections to count how many times each duration occurred. As might be expected, some durations do not occur in the data and were represented as zero counts. Bearing in mind the operative word 'approximate', I felt justified in replacing the few zero values with numbers that are close to the neighbouring values. The results are what you see in p_data. Then I normalise (so that the values sum to one, as probabilities must). There was a dribble of values at the right end, from about 180 days on up. I put all this mass at 180 days. There was no mass at zero days. This explains why x_k runs from one through 180 inclusive. Finally I use the rv_discrete object, whose rvs method proves to be reasonably fast.
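The counting and sampling steps described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the answer's method: it swaps scipy's rv_discrete for random.choices, it uses only the first 20 durations from the question as a hypothetical sample, and it skips the answer's manual smoothing of zero counts and pooling of the tail at 180 days.

```python
import random
from collections import Counter
from itertools import accumulate

# Hypothetical small sample: the first 20 durations from the question's data.
durations = [78, 58, 124, 62, 30, 46, 31, 34, 94, 15,
             41, 18, 63, 15, 63, 31, 35, 23, 19, 19]

# Count how often each duration occurs.
counts = Counter(durations)
values = sorted(counts)
weights = [counts[v] for v in values]
total = sum(weights)

# Empirical CDF: P(X <= v) for each observed duration v.
cdf = [(v, running / total) for v, running in zip(values, accumulate(weights))]

# Draw 1000 simulated durations in proportion to the observed counts.
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
samples = rng.choices(values, weights=weights, k=1000)

print(cdf[:3])
print(samples[:20])
```

Sampling this way can only return durations that appear in the data, which is why the answer fills in the zero counts by hand before building its distribution.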