I have a dataset in an xlsx file with some discrete values (name: saoi), and I want to see which discrete distribution fits them best.
I made some histograms:
[Full Histogram]
[Hist with values until 5000]
[Hist with values until 10000]
The code is this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_excel('dataset.xlsx', sheet_name=0)
aoi = df ["social AoI"]
saoi = pd.Series(aoi).array
saoi = np.around(saoi)
saoi = saoi.astype(int)
h = plt.hist(saoi)
plt.title('Hist of Social AoI')
plt.xlabel('Values')
plt.ylabel('Freq')
plt.axis([0,20000, 0, 200])
plt.show()
The values are these:
In [21]:saoi
Out[21]:
array([ 0, 13, 101, 106, 10, 22, 73, 30, 1,
54, 44, 2, 4, 52, 106, 70, 1, 11,
3, 50, 2, 9, 2, 28, 32, 15, 2,
42, 53, 16, 13, 70, 12, 91, 11, 43,
18, 53, 91, 9, 52, 9, 19, 27, 18,
53, 19, 242, 19, 22, 24, 53, 90, 82,
100, 62, 111, 20, 22, 8, 41, 134, 51,
72, 10, 1, 23, 3, 32, 1, 30, 18,
164, 10, 32, 35, 65, 79, 19, 21, 37,
20, 55, 32, 75, 489, 61, 111, 54, 46,
68, 53, 12, 7, 95, 43, 48, 11, 241,
7, 295, 284, 55, 69, 223, 4, 66, 278,
33, 22, 26, 197, 117, 242, 252, 29, 325,
289, 76, 28, 84, 21, 204, 74, 189, 11,
162, 85, 35, 510, 4, 135, 299, 211, 406,
149, 99, 2, 10, 1150, 427, 337, 16, 157,
620, 95, 257, 45, 368, 428, 108, 1041, 189,
32, 246, 38, 351, 578, 151, 240, 905, 309,
7, 8, 25, 226, 22, 50, 637, 74, 825,
152, 543, 1484, 893, 524, 866, 5, 236, 1608,
387, 1038, 83, 147, 2871, 6669, 2058, 577, 1634,
2522, 4915, 9, 298, 3074, 856, 29, 7164, 1641,
1270, 143, 508, 476, 2145, 1678, 2135, 86, 1085,
4106, 967, 266, 1302, 11875, 6011, 63, 1470, 2321,
9080, 19216])
I tried to fit some discrete distributions with this code (using likelihoods):
import pandas as pd
from scipy.stats import nbinom, poisson, geom, dlaplace, randint, yulesimon
import math
import numpy as np
x = pd.Series(saoi)
mean = x.mean()
var = x.var()
likelihoods = {}
#nbinom
p = mean / var
r = p * mean / (1-p)
likelihoods['nbinom'] = x.map(lambda val: nbinom.pmf(val, r, p)).prod()
#poisson
lambda_ = mean
likelihoods['poisson'] = x.map(lambda val: poisson.pmf(val, lambda_)).prod()
#geometric
p = 1 / mean
likelihoods['geometric'] = x.map(lambda val: geom.pmf(val, p)).prod()
#dlaplace
a = math.sqrt(var/2)
likelihoods['dlaplace'] = x.map(lambda val: dlaplace.pmf(val, a)).prod()
#randint
low = 0
high = 242
likelihoods['randint'] = x.map(lambda val: randint.pmf(val, low, high)).prod()
#yulesimon
p = mean / (mean-1)
likelihoods['yulesimon'] = x.map(lambda val: yulesimon.pmf(val, p)).prod()
best_fit = max(likelihoods, key=lambda x: likelihoods[x])
print("Best fit:", best_fit)
print("Likelihood:", likelihoods[best_fit])
But the results are bad, as all the likelihoods are 0.
How could I find a better-fitting discrete distribution? There are many ways to do this for continuous distributions, but what about discrete ones?
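One likely reason the likelihoods come out as 0 is floating-point underflow: a product of a couple of hundred probabilities, each far below 1, is too small for a float to represent. A common workaround is to sum log-probabilities instead. The sketch below is only an illustration under stated assumptions: it uses a short hypothetical sample rather than the full saoi array, keeps the method-of-moments estimates from the question, and shifts the geometric distribution with loc=-1 (an assumption added here) so that its support includes 0.

```python
import numpy as np
from scipy.stats import geom, nbinom, poisson

# Short illustrative sample (hypothetical stand-in for the full saoi array).
x = np.array([0, 13, 101, 106, 10, 22, 73, 30, 1, 54, 44, 2, 4, 52, 242, 489, 1150])

mean = x.mean()
var = x.var(ddof=1)

# Method-of-moments estimates for the negative binomial, as in the question.
# These only make sense when the data is overdispersed (var > mean).
p_nb = mean / var
r_nb = mean * p_nb / (1 - p_nb)

# Sum log-probabilities instead of multiplying probabilities.
log_likelihoods = {
    "nbinom": nbinom.logpmf(x, r_nb, p_nb).sum(),
    "poisson": poisson.logpmf(x, mean).sum(),
    # scipy's geom starts at 1; loc=-1 shifts it to start at 0,
    # and the matching moment estimate is p = 1 / (mean + 1).
    "geometric": geom.logpmf(x, 1 / (mean + 1), loc=-1).sum(),
}

best_fit = max(log_likelihoods, key=log_likelihoods.get)
print("Best fit:", best_fit)
print("Log-likelihood:", log_likelihoods[best_fit])
```

The largest (least negative) log-likelihood marks the best candidate among those tried; it does not by itself show that the fit is good.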
I have some data that varies by day (x-axis), and I would like to fit it better. I have used scipy.optimize.curve_fit, and it fits reasonably well with the following function, but I would like to improve the fit.
A polynomial would not be useful for me, since the y-axis values are cumulative, so the curve should never drop.
Could someone suggest what I could add to the function to make it fit better?
Here is the data I use, as well as the function with its fitted parameter values.
>>> X
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66])
>>> y
array([ 1, 1, 1, 1, 1, 1, 2, 2, 4, 15, 17, 38, 46,
53, 61, 75, 100, 111, 133, 142, 152, 159, 170, 174, 179, 182,
188, 190, 192, 196, 198, 199, 202, 205, 207, 214, 215, 216, 218,
224, 229, 231, 233, 236, 236, 236, 237, 237, 237, 237, 237, 237,
237, 238, 238, 238, 238, 238, 238, 238, 238, 238, 238, 238, 239,
243])
>>> def func(x,a,b):
...     return a*np.exp(b/x)
>>> popt,pcov=curve_fit(func,X,y)
>>> a=popt[0]
>>> b=popt[1]
>>> yvals=func(X,a,b)
>>> plot1=plt.plot(X,y,'*',label='original values')
>>> plot2=plt.plot(X,yvals,'r',label='curve_fit values')
>>> plt.show()
>>> popt
array([357.97373884, -20.60549425])
[Curve Fit Plot]
Thanks!
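One possible direction, which is not from the original post: cumulative counts that rise and then level off often look S-shaped, so a logistic curve is a natural candidate to try with curve_fit. The sketch below assumes that shape; the function name logistic, its parameters L, k and x0, and the starting guess p0 are all choices made here for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

X = np.arange(1, 67)
y = np.array([1, 1, 1, 1, 1, 1, 2, 2, 4, 15, 17, 38, 46,
              53, 61, 75, 100, 111, 133, 142, 152, 159, 170, 174, 179, 182,
              188, 190, 192, 196, 198, 199, 202, 205, 207, 214, 215, 216, 218,
              224, 229, 231, 233, 236, 236, 236, 237, 237, 237, 237, 237, 237,
              237, 238, 238, 238, 238, 238, 238, 238, 238, 238, 238, 238, 239,
              243])

def logistic(x, L, k, x0):
    """Logistic curve: plateau L, growth rate k, midpoint x0 (assumed model)."""
    return L / (1.0 + np.exp(-k * (x - x0)))

# Rough starting guess: plateau near the largest value, midpoint mid-range.
p0 = [y.max(), 0.1, np.median(X)]
popt, pcov = curve_fit(logistic, X, y, p0=p0)

# Sum of squared residuals, useful for comparing against other candidate functions.
sse = float(np.sum((y - logistic(X, *popt)) ** 2))
print(popt, sse)
```

Comparing sse with the same quantity for a*np.exp(b/x) gives a simple way to judge whether the alternative is actually an improvement on this data.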
Why is my boxplot not showing the expected output? I can see only circles, but I want to see a traditional boxplot. How can I fix it?
import matplotlib.pyplot as plt
collection_0 = [826, 58, 305, 161, 341, 25, 50, 1303, 1241, 406, 4318, 14330, 62, 45, 17, 809, 2560, 2901, 1988, 1755, 2584, 1924, 218, 13, 140, 156, 591, 109, 17, 563, 242, 23, 156, 179, 85, 59, 78, 55, 57, 27, 33, 62, 499, 685, 1418, 70, 155, 388, 205, 62, 22, 358, 688, 273, 27, 107, 85, 856, 375, 144, 476, 161, 33, 1748, 315, 106, 347, 85, 43, 157, 770, 616, 220, 13, 170, 156, 200, 165, 1211, 138, 163, 61, 78, 140, 318, 1296, 14, 386, 19, 918, 193, 381, 178, 106, 91, 109, 261, 72, 436, 194, 176, 237, 28, 201, 36, 166, 1928, 358, 611, 58, 82, 59, 37, 269, 223, 836, 45, 425, 166, 26, 63, 387, 270, 180, 331, 342, 629, 610, 46, 67, 151, 57, 188, 70, 96, 41, 92, 79, 26, 56, 188, 466, 214, 45, 39, 161, 70, 134, 370, 70, 401, 85, 113, 224, 60, 508, 58, 71, 49, 56, 400, 1308, 22, 124, 74, 63, 56, 84, 144, 26, 29, 33, 20, 241, 25, 17, 25, 45, 37, 100, 93, 175, 27, 308, 134, 28, 203, 195, 161, 168, 364, 102, 66, 53, 57, 195, 30, 55, 108, 110, 75, 42, 531, 25, 17, 156, 24, 29, 303, 77, 36, 184, 67, 15, 92, 124, 206, 51, 87, 83, 23, 134, 64, 50, 99, 451, 144, 265, 228, 96, 357, 39, 14, 91, 46, 110, 75, 18, 30, 93, 61, 31, 203, 226, 92, 162, 415, 30, 48, 86, 51, 79, 130, 181, 17, 64, 57, 168, 153, 72, 57, 34, 234, 18, 30, 72, 98, 44, 114, 58, 23, 54, 24, 126, 37, 28, 73, 8, 38, 86, 214, 46, 34, 63, 79, 72, 111, 37, 499, 382, 76, 589, 72, 139, 108, 301, 63, 158, 17, 12, 103, 337, 65, 17, 56, 32, 27, 14, 224, 33, 40, 55, 60, 76, 18, 24, 56, 99, 135, 23, 50, 102, 74, 114, 29, 24, 50, 84, 33, 316, 52, 38, 112, 61, 10, 22, 17, 71, 22, 99, 51, 84, 34, 32, 18, 91, 240, 29, 141, 121, 67, 40, 303, 78, 86, 48, 149, 102, 57, 42, 88, 137, 133, 89, 88, 70, 31, 24, 73, 7, 53, 46, 156, 17, 133, 85, 103, 70, 26, 145, 26, 112, 81, 37, 27, 98, 14, 84, 26, 31, 43, 42, 19, 38, 32, 35, 92, 168, 53, 175, 25, 30, 48, 84, 98, 57, 62, 32, 38, 75, 11, 33, 29, 38, 48, 52, 244, 303, 135, 10, 52, 12, 43, 78, 34, 50, 51, 49, 68, 68, 53, 18, 50, 64, 17, 27, 17, 21, 12, 46, 29, 35, 31, 93, 93, 25, 20, 18, 
18, 43, 61, 29, 16, 40, 28, 26, 15, 30, 41, 67, 75, 53, 64, 105, 15, 35, 41, 22, 54, 20, 38, 31, 21, 105, 23, 37, 12, 29, 38, 16, 16, 21, 57, 66, 83, 44, 43, 14, 28, 48, 51, 17, 21, 16, 7, 34, 50, 23, 14, 18, 23, 32, 91, 29, 31, 23, 9, 14, 17, 15, 43, 16, 17, 20, 11, 16, 7, 13, 11, 49, 42, 13, 23, 18, 28, 38, 23, 10, 32, 9, 34, 16, 18, 9, 23, 16, 12, 65, 31, 37, 16, 9, 34, 8, 12, 22, 55, 17, 30, 13, 25, 27, 14, 7, 78, 19, 11, 41, 54, 22, 27, 8, 18, 22, 6, 29, 16, 35, 27, 8, 10, 7, 51, 9, 23, 12, 9, 6, 15, 16, 8, 7, 14, 12, 10, 14, 17, 10, 13, 18, 8, 7, 9, 10, 10]
collection_1 = [1353, 25, 2430, 1995, 1209, 1291, 564, 68, 1184, 81, 132, 140, 1463, 258, 143, 338, 63, 38, 144, 534, 130, 2742, 392, 157, 301, 193, 620, 2303, 2269, 84, 1464, 148, 593, 191, 102, 1194, 211, 11, 2498, 359, 808, 552, 96, 334, 238, 46, 1771, 536, 160, 195, 318, 193, 684, 280, 249, 19, 235, 15, 144, 2030, 104, 619, 523, 106, 902, 31, 13, 55, 9, 21, 68, 51, 45, 92, 41, 432, 436, 137, 81, 57, 210, 254, 34, 28, 301, 72, 134, 409, 30, 53, 112, 106, 267, 33, 57, 35, 18, 143, 52, 45, 36, 183, 43, 66, 40, 100, 194, 139, 18, 280, 262, 62, 331, 196, 604, 56, 43, 181, 82, 171, 57, 22, 34, 52, 46, 260, 125, 50, 46, 23, 69, 83, 28, 219, 94, 32, 82, 31, 200, 20, 78, 725, 225, 107, 58, 59, 31, 44, 18, 136, 180, 74, 20, 44, 28, 90, 69, 48, 47, 50, 74, 18, 50, 20, 75, 127, 19, 80, 23, 163, 30, 103, 27, 10, 37, 37, 44, 41, 46, 49, 48, 55, 14, 19, 42, 79, 50, 45, 36, 15, 45, 128, 122, 46, 38, 21, 21, 81, 24, 12, 15, 53, 9, 26, 43, 23, 16, 79, 9, 45, 146, 58, 17, 30, 13, 8, 17, 24, 56, 6, 12, 17, 9, 15, 11, 13, 13, 12, 14, 21, 10, 8, 15, 8, 28, 8, 11, 24, 9, 13, 30, 14, 15, 7, 9, 25, 7, 8, 10, 5, 7, 7, 6, 7, 6, 7, 8, 9]
data_to_plot = [collection_0, collection_1]
box = plt.boxplot(data_to_plot,patch_artist=True, labels=["Contracting", "Expanding"])
colors = ['red', 'green']
for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)
plt.ylabel("Unique adopters")
plt.show()
Your data spans about 4 orders of magnitude, with the majority of values below 1000. The mean of your data is around 170, so the whole box plot appears compressed because of the huge outlier above 14000. You can see this in a histogram:
plt.hist(collection_0);
Try using a log scale on the y-axis to get the visualization you expect:
plt.yscale('log')
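Putting the suggestion together with the plotting code from the question might look like the sketch below. It uses only the first 15 values of each collection as stand-ins, sets the tick labels with set_xticklabels rather than the labels argument, and selects the Agg backend so that it runs without a display; all three are choices made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt

# First 15 values of each list from the question, used as short stand-ins.
collection_0 = [826, 58, 305, 161, 341, 25, 50, 1303, 1241, 406, 4318, 14330, 62, 45, 17]
collection_1 = [1353, 25, 2430, 1995, 1209, 1291, 564, 68, 1184, 81, 132, 140, 1463, 258, 143]

fig, ax = plt.subplots()
box = ax.boxplot([collection_0, collection_1], patch_artist=True)
for patch, color in zip(box["boxes"], ["red", "green"]):
    patch.set_facecolor(color)

ax.set_xticklabels(["Contracting", "Expanding"])
ax.set_yscale("log")  # spread out the compressed boxes
ax.set_ylabel("Unique adopters")
# Call plt.show() or fig.savefig(...) here to view the result.
```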
I have the following strange problem. I am trying to make a 3D plot, and that works fine. I also want to put the projections on the walls of the plot. At the moment my code looks like this:
fig = plt.figure(figsize = (10,8))
ax = fig.add_subplot(111, projection='3d')
ax.plot_trisurf(xarr, yarr, zarr, cmap=cm.coolwarm, linewidth=50)
ax.set_xlabel('\nMAE', fontsize = 14, linespacing = 1.5)
ax.set_ylabel('\nDIFF', fontsize = 14)
ax.set_zlabel('\nCounts', fontsize = 14, linespacing=1.5)
cset = ax.contour(np.array(xx), np.array(yy),
np.array(zz), zdir='z', offset=-100, cmap=cm.coolwarm)
cset = ax.contour(xx, yy, np.array(zz), zdir='x', offset=-40, cmap=cm.coolwarm)
cset = ax.contour(xx, yy, np.array(zz), zdir='y', offset=40, cmap=cm.coolwarm)
plt.show()
What is not working is the following line
cset = ax.contour(np.array(xx), np.array(yy),
np.array(zz), zdir='z', offset=-100, cmap=cm.coolwarm)
Here are the vectors
np.array(yy)
array([ 21, 6, 30, 3, 27, 61, 56, 52, 38, 14, 33, 12, 93,
129, 36, 11, 59, 9, 113, 18, 26, 8, 17, 10, 29, 2,
4, 16, 85, 55, 58, 45, 7, 15, 19, 5, 69, 57, 20,
158, 86, 118, 31, 107, 34, 92, 32, 28, 66, 54, 87, 25,
13, 99, 23, 60, 81, 24, 72, 123, 49, 63, 64, 71, 67,
40, 46, 48, 47, 95, 43, 159, 22, 37, 35, 105, 104, 42,
128, 53, 76, 75, 103, 65, 136, 144, 68, 77, 278, 98, 111,
114, 41, 84, 154, 62, 214, 124, 210, 1, 155, 79, 74, 80,
83, 318, 70, 120, 78, 44, 88, 73, 50, 110, 178, 51, 134,
106, 189, 91, 411, 135, 138, 143, 127, 122, 160, 94, 109, 226,
140, 117, 100, 133, 191, 141, 89, 288, 126, 97, 653, 121, 172,
161, 39, 96, 90, 130, 169, 142, 82, 132, 156, 137, 119, 102,
112, 188, 610, 115, 146, 234, 108, 150, 182, 170, 116, 223, 139,
197, 194, 241, 131, 181, 183, 152, 147, 250, 203, 165, 199, 218,
334, 167, 151, 384, 163, 162, 125, 148, 233, 354, 184, 168, 186,
180, 166, 369, 192, 101, 201, 157, 164, 419, 239], dtype=int64)
and
np.array(xx)
array([ 500., 1500., 2500., 3500., 4500., 5500., 6500.,
7500., 8500., 9500., 10500., 11500., 12500., 13500.,
14500., 15500., 16500., 17500., 18500., 19500., 20500.,
21500., 22500., 23500., 24500., 25500., 26500., 27500.,
28500.])
the zz has dimensions
np.array(zz).shape
(205,29)
as it should. Can anyone guess what is wrong? The complete error is
ValueError: setting an array element with a sequence.
Unfortunately I cannot publish the data, but I hope the error is linked to how the data is structured...
Thanks in advance, Umberto
If you check the shapes of X, Y and Z in the contour3d example, you will find that they are all the same.
So, to make your code work, you should expand your xx and yy into 2D arrays with np.meshgrid before creating the plot.
xx, yy = np.meshgrid(xx, yy)
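To check that the shapes line up after np.meshgrid, a small sketch with placeholder data can help. The xx values follow the question (29 values from 500 to 28500); the yy values and the zero-filled zz are hypothetical stand-ins, since the real data was not published.

```python
import numpy as np

xx = np.linspace(500.0, 28500.0, 29)  # 29 values, as in the question
yy = np.arange(1, 206)                # 205 placeholder values (hypothetical)
zz = np.zeros((205, 29))              # placeholder counts with the question's shape

# meshgrid expands the two 1D vectors into 2D coordinate arrays.
XX, YY = np.meshgrid(xx, yy)

print(XX.shape, YY.shape, zz.shape)   # all three shapes now match
```

With the default indexing, np.meshgrid returns arrays of shape (len(yy), len(xx)), which is why a zz of shape (205, 29) fits without transposing.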
I want to multiply only every second number in a list (0-100), but I just can't get it to work.
[x*10 for x in range(100) if x%2==0] # replace 10 with the value to multiply by; use x to multiply each number by itself
The code below multiplies only the even elements and keeps the others as they are.
def iterate_lis(get_list):
    ls = []
    for x in get_list:
        if x%2==0:
            ls.append(x*2)
        else:
            ls.append(x)
    print(ls)
    return ls
iterate_count = 5 # list will be iterate 5 times
for i in range(iterate_count):
    if i ==0:
        get_lis = iterate_lis(range(100))
    else:
        get_lis = iterate_lis(get_lis)
The result for iterate_count=5 will be as follows:
>>>
[0, 1, 4, 3, 8, 5, 12, 7, 16, 9, 20, 11, 24, 13, 28, 15, 32, 17, 36, 19, 40, 21, 44, 23, 48, 25, 52, 27, 56, 29, 60, 31, 64, 33, 68, 35, 72, 37, 76, 39, 80, 41, 84, 43, 88, 45, 92, 47, 96, 49, 100, 51, 104, 53, 108, 55, 112, 57, 116, 59, 120, 61, 124, 63, 128, 65, 132, 67, 136, 69, 140, 71, 144, 73, 148, 75, 152, 77, 156, 79, 160, 81, 164, 83, 168, 85, 172, 87, 176, 89, 180, 91, 184, 93, 188, 95, 192, 97, 196, 99]
[0, 1, 8, 3, 16, 5, 24, 7, 32, 9, 40, 11, 48, 13, 56, 15, 64, 17, 72, 19, 80, 21, 88, 23, 96, 25, 104, 27, 112, 29, 120, 31, 128, 33, 136, 35, 144, 37, 152, 39, 160, 41, 168, 43, 176, 45, 184, 47, 192, 49, 200, 51, 208, 53, 216, 55, 224, 57, 232, 59, 240, 61, 248, 63, 256, 65, 264, 67, 272, 69, 280, 71, 288, 73, 296, 75, 304, 77, 312, 79, 320, 81, 328, 83, 336, 85, 344, 87, 352, 89, 360, 91, 368, 93, 376, 95, 384, 97, 392, 99]
[0, 1, 16, 3, 32, 5, 48, 7, 64, 9, 80, 11, 96, 13, 112, 15, 128, 17, 144, 19, 160, 21, 176, 23, 192, 25, 208, 27, 224, 29, 240, 31, 256, 33, 272, 35, 288, 37, 304, 39, 320, 41, 336, 43, 352, 45, 368, 47, 384, 49, 400, 51, 416, 53, 432, 55, 448, 57, 464, 59, 480, 61, 496, 63, 512, 65, 528, 67, 544, 69, 560, 71, 576, 73, 592, 75, 608, 77, 624, 79, 640, 81, 656, 83, 672, 85, 688, 87, 704, 89, 720, 91, 736, 93, 752, 95, 768, 97, 784, 99]
[0, 1, 32, 3, 64, 5, 96, 7, 128, 9, 160, 11, 192, 13, 224, 15, 256, 17, 288, 19, 320, 21, 352, 23, 384, 25, 416, 27, 448, 29, 480, 31, 512, 33, 544, 35, 576, 37, 608, 39, 640, 41, 672, 43, 704, 45, 736, 47, 768, 49, 800, 51, 832, 53, 864, 55, 896, 57, 928, 59, 960, 61, 992, 63, 1024, 65, 1056, 67, 1088, 69, 1120, 71, 1152, 73, 1184, 75, 1216, 77, 1248, 79, 1280, 81, 1312, 83, 1344, 85, 1376, 87, 1408, 89, 1440, 91, 1472, 93, 1504, 95, 1536, 97, 1568, 99]
[0, 1, 64, 3, 128, 5, 192, 7, 256, 9, 320, 11, 384, 13, 448, 15, 512, 17, 576, 19, 640, 21, 704, 23, 768, 25, 832, 27, 896, 29, 960, 31, 1024, 33, 1088, 35, 1152, 37, 1216, 39, 1280, 41, 1344, 43, 1408, 45, 1472, 47, 1536, 49, 1600, 51, 1664, 53, 1728, 55, 1792, 57, 1856, 59, 1920, 61, 1984, 63, 2048, 65, 2112, 67, 2176, 69, 2240, 71, 2304, 73, 2368, 75, 2432, 77, 2496, 79, 2560, 81, 2624, 83, 2688, 85, 2752, 87, 2816, 89, 2880, 91, 2944, 93, 3008, 95, 3072, 97, 3136, 99]
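The same "double the even values, keep the rest" step can also be written as a list comprehension with a conditional expression. This is a compact sketch of the idea above; the helper name double_evens and the short range(8) input are chosen here for illustration.

```python
def double_evens(values):
    """Return a new list with even values doubled and odd values unchanged."""
    return [x * 2 if x % 2 == 0 else x for x in values]

result = list(range(8))
for _ in range(2):  # apply the step twice
    result = double_evens(result)

print(result)  # [0, 1, 8, 3, 16, 5, 24, 7]
```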
You can't multiply by 0, because the result will always be 0.
result=1
for i in range (1 , 10 ):
    if i%2==0:
        result*=i
print(result)
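For reference, the same product of the even numbers from 1 to 9 can be computed in one expression with math.prod, which is available in Python 3.8 and later. This is an equivalent sketch, not a replacement for the loop above.

```python
import math

# Product of the even numbers in 1..9, i.e. 2 * 4 * 6 * 8.
result = math.prod(i for i in range(1, 10) if i % 2 == 0)
print(result)  # 384
```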
import numpy as np
l = range(100)
np.prod(l[0::2])
This will give you every second element of your list and multiply all.
I want to change the list so that 1,2,3,4,5,6,7,8 becomes 1,4,3,8,5,16 etc.
Though I don't understand how you multiply 6 to end up with 16, I assume this is what you need:
new_list = []
for x in range(1,100):
    if x % 2 == 0: new_list.append(x*2)
    else: new_list.append(x)
print(new_list)
If the number is divisible by 2, you multiply it by 2 and append it to a new list. If not, you just append it without multiplying.
Running this program, you get the following output:
[1, 4, 3, 8, 5, 12, 7, 16, 9, 20, 11, 24, 13, 28, 15, 32, 17, 36, 19, 40, 21, 44, 23, 48, 25, 52, 27, 56, 29, 60, 31, 64, 33, 68, 35, 72, 37, 76, 39, 80, 41, 84, 43, 88, 45, 92, 47, 96, 49, 100, 51, 104, 53, 108, 55, 112, 57, 116, 59, 120, 61, 124, 63, 128, 65, 132, 67, 136, 69, 140, 71, 144, 73, 148, 75, 152, 77, 156, 79, 160, 81, 164, 83, 168, 85, 172, 87, 176, 89, 180, 91, 184, 93, 188, 95, 192, 97, 196, 99]
Use range() to generate the indexes of the entries you want to change...
numbers = [1, 2, 3, 4, 5, 6, 7, 8]
for i in range(1, len(numbers), 2):
    numbers[i] *= 2
will result in numbers containing the list
[1, 4, 3, 8, 5, 12, 7, 16]
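An alternative to looping over indexes is slice assignment, which replaces every second entry in one statement. This is a sketch of the same in-place update, assuming the list is short enough that building the temporary list is not a concern.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8]

# numbers[1::2] selects the entries at indexes 1, 3, 5, ...;
# assigning to the slice updates the list in place.
numbers[1::2] = [n * 2 for n in numbers[1::2]]

print(numbers)  # [1, 4, 3, 8, 5, 12, 7, 16]
```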