Converting discrete values into real values in Python

I have a numpy array with discrete values. I used the numpy.digitize() function to get these discrete values from continuous values. Now I want to convert these discrete values back to the original continuous values. Is there a function in Python which can help me do that? A sample code has been added below:
A = [437.479, 438.536, 440.026,............,471.161]
bins = numpy.linspace(numpy.amin(A),numpy.amax(A),255)
discretized_A = numpy.digitize(A, bins)
discretized_A = [1,8,18,................,237]
As you can see, I had a vector of real values. I used the digitize function to project that vector onto the range from the minimum to the maximum of A with 255 equally spaced values, giving the end result discretized_A. Now I want to reverse-engineer the steps and recover my original real values.
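Note that numpy.digitize is lossy: once the values are binned, the only information that survives is which bin each value fell into, so the original values can only be recovered to within one bin width. A minimal sketch of that approximate inversion, mapping each index back to its bin's left edge (reusing the names from the question):
import numpy as np

A = np.array([437.479, 438.536, 440.026, 471.161])
bins = np.linspace(np.amin(A), np.amax(A), 255)
discretized_A = np.digitize(A, bins)

# digitize returns i such that bins[i-1] <= x < bins[i], so bins[i-1] is
# the left edge of the bin that x fell into; clip guards the boundary case.
approx_A = bins[np.clip(discretized_A - 1, 0, len(bins) - 1)]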

Related

Normalise max value of probability function for all frames

I have working code that plots a bivariate Gaussian distribution. The distribution is produced by adjusting the COV matrix to account for specific variables. Specifically, a radius is applied to every XY coordinate. The COV matrix is then adjusted by a scaling factor to expand the radius in the x-direction and contract it in the y-direction. The direction of this is measured by theta. The output is expressed as a probability density function (PDF).
I have normalised the PDF values. However, I'm calling a separate PDF for each frame. As such, the maximum value changes and hence the probability will be transformed differently for each frame.
Question: Using @Prasanth's suggestion, is it possible to create normalized arrays for each frame before plotting, and then plot these arrays?
Below is the function I'm currently using to normalise the PDF for a single frame.
normPDF = (PDFs[0]-PDFs[1])/max(PDFs[0].max(),PDFs[1].max())
Is it possible to create normalized arrays for each frame before plotting, and then plot these arrays?
Indeed it is possible. In your case you probably need to rescale your arrays between two values, say -1 and 1, before plotting, so that the minimum becomes -1, the maximum 1, and the intermediate values are scaled accordingly.
You could also choose 0 and 1 or whatever else as minimum and maximum, but let's go with -1 and 1 so that the middle value is 0.
To do this, in your code replace:
normPDF = (PDFs[0]-PDFs[1])/max(PDFs[0].max(),PDFs[1].max())
with:
renormPDF = PDFs[0] - PDFs[1]
renormPDF -= renormPDF.min()
normPDF = (renormPDF * 2 / renormPDF.max()) - 1
These three lines ensure that normPDF.min() == -1 and normPDF.max() == 1.
Now when plotting the animation the axis on the right of your image does not change.
Your remaining problem is to find the maximum of PDFs[0] and PDFs[1] across all frames.
Why don't you run plotmvs on all your planned frames in order to find the absolute maximum for PDFs[0] and PDFs[1] and then run your animation with these absolute maxima to normalize your plots? This way, the colorbar will be the same for all frames.
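A minimal sketch of that two-pass idea; compute_pdfs and frames below are hypothetical stand-ins for the asker's own frame-generation code:
# First pass: build the PDF pair for every frame and record the global maximum.
all_pdfs = [compute_pdfs(frame) for frame in frames]  # hypothetical helper
global_max = max(max(p0.max(), p1.max()) for p0, p1 in all_pdfs)

# Second pass: normalise every frame by the same constant, so the colour
# scale (and hence the colorbar) is identical across the whole animation.
norm_pdfs = [(p0 - p1) / global_max for p0, p1 in all_pdfs]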

2D histogram colour by "label fraction" of data in each bin

Following on from the post found here: 2D histogram coloured by standard deviation in each bin
I would like to colour each bin in a 2D grid by the fraction of points whose label values are below a certain threshold in Python.
Note that, in this dataset, each point has a continuous label value between 0-1.
For example, here is a histogram I made where the colour denotes the standard deviation of the label values of all points in each bin.
This was done using scipy.stats.binned_statistic_2d() (see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic_2d.html) and setting the statistic argument to 'std'.
But is there a way to change this kind of plot so that the colouring is representative of the fraction of points in each bin with a label value below, for example, 0.5?
It could be that the only way to do this is by explicitly defining a grid of some kind and calculating the fractions, but I'm not sure of the best way to do that, so any help on this matter would be greatly appreciated!
Maybe using scipy.stats.binned_statistic_2d or numpy.histogram2d, with a way to return the raw data values in each bin as a multi-dimensional array, would help in quickly computing the fractions explicitly.
The fraction of elements in an array below a threshold can be calculated as
fraction = lambda a, threshold: len(a[a<threshold])/len(a)
Hence you can call
scipy.stats.binned_statistic_2d(x, y, values, statistic=lambda a: fraction(a, 0.5))
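A self-contained sketch of that suggestion on random data (x, y, and labels are placeholders for the asker's own columns). np.mean over a boolean mask computes the same fraction, and scipy fills empty bins with NaN rather than raising:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic_2d

rng = np.random.default_rng(0)
x, y = rng.uniform(0, 1, 1000), rng.uniform(0, 1, 1000)
labels = rng.uniform(0, 1, 1000)  # continuous label per point, in [0, 1]

# The mean of a boolean mask is exactly the fraction of True entries.
stat, xedges, yedges, _ = binned_statistic_2d(
    x, y, labels, statistic=lambda a: np.mean(a < 0.5), bins=20)

plt.pcolormesh(xedges, yedges, stat.T)  # transpose: stat's first axis is x
plt.colorbar(label='fraction of labels < 0.5')
plt.show()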

Python: histogram / binning data from 2 arrays

I have two arrays of data: one holds radius values and the other the corresponding intensity reading at that radius.
Here is a small section of the data; the first column is the radius and the second the intensity:
29.77036614 0.04464427
29.70281027 0.07771409
29.63523525 0.09424901
29.3639355 1.322793
29.29596385 2.321502
29.22783249 2.415751
29.15969437 1.511504
29.09139827 1.01704
29.02302068 0.9442765
28.95463729 0.3109002
28.88609766 0.162065
28.81754446 0.1356054
28.74883612 0.03637681
28.68004928 0.05952569
28.61125036 0.05291172
28.54229804 0.08432806
28.4732599 0.09950128
28.43877462 0.1091304
28.40421016 0.09629156
28.36961249 0.1193614
28.33500089 0.102711
28.30037503 0.07161685
How can I bin the radius data and find the average intensity corresponding to each binned radius?
The aim is to then use the average intensity to assign an intensity value to radius entries with missing (NaN) data points.
I've never had to use the histogram functions before and have very little idea of how they work or whether it's possible to do this with them. The full data set is large, with 336622 data points, so I don't really want to use loops or if statements to achieve this.
Many thanks for any help.
If you only need to do this for a handful of points, you could do something like this.
If intensities and radius are numpy arrays of your data:
bin_width = 0.1  # depending on how narrow you want your bins

def get_avg(rad):
    # Average all intensities whose radius falls within half a bin width of rad.
    average_intensity = intensities[(radius >= rad - bin_width / 2.) & (radius < rad + bin_width / 2.)].mean()
    return average_intensity

# This will return the average intensity in the bin 27.95 <= rad < 28.05:
average = get_avg(28.)
It's not really histogramming that you are after. A histogram is more a count of items that fall into a specific bin. What you want to do is more a group-by operation, where you group your intensities by radius intervals and apply some aggregation method, like average or median, to each group of intensities.
What you are describing, however, sounds a lot more like some sort of interpolation you want to perform. So I would suggest thinking about interpolation as an alternative to solve your problem. Anyway, here's a suggestion for how you can achieve what you asked for (assuming you can use numpy) - I'm using random inputs to illustrate:
import random
import numpy

radius = numpy.fromiter((random.random() * 10 for i in range(1000)), dtype=float)
intensities = numpy.fromiter((random.random() * 10 for i in range(1000)), dtype=float)
# group your radius input into 20 equally spaced bins
bins = numpy.linspace(radius.min(), radius.max(), 20)
groups = numpy.digitize(radius, bins)
# groups now holds the index of the bin into which radius[i] falls
# loop through all bin indexes and select the corresponding intensities
# perform your aggregation on the selected intensities
# i'm keeping the aggregation for the group in a dict
aggregated = {}
for i in range(len(bins) + 1):
    selected_intensities = intensities[groups == i]
    aggregated[i] = selected_intensities.mean()
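As a loop-free alternative (a sketch reusing the radius and intensities arrays from above), two numpy.histogram calls give the per-bin averages directly: the intensity-weighted histogram sums the intensities in each bin, the unweighted one counts the points, and their ratio is the mean.
sums, edges = numpy.histogram(radius, bins=20, weights=intensities)
counts, _ = numpy.histogram(radius, bins=20)
bin_means = sums / counts  # bins with no points divide 0/0 and yield NaN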

Numpy cumulative distribution function (CDF)

I have an array of values and have created a histogram of the data using numpy.histogram, as follows:
histo = numpy.histogram(arr, nbins)
where nbins is the number of bins derived from the range of the data (max-min) divided by a desired bin width.
From the output I create a cumulative distribution function using:
cdf = np.cumsum(histo[0])
normCdf = cdf/np.amax(cdf)
However, I need an array of normCdf values that corresponds to the values in the original array (arr). For example, if a value in the original array arr is near the minimum value of arr then its corresponding normCdf value will be high (e.g. 0.95). (In this example, as I am working with radar data, my data is in decibels and is negative. Therefore the lowest value is where the CDF reaches its maximum.)
I'm struggling, conceptually, with how to achieve an array whereby each value in the array has its corresponding value under the CDF (its normCdf value). Any help would be appreciated. The histogram with the CDF is below.
This is old, but may still be of help to someone.
Consider the OP's last sentence:
I'm struggling, conceptually, with how to achieve an array whereby each value in the array has its corresponding value under the CDF (its normCdf value).
If I understand correctly, what the OP is asking for actually boils down to the (normalized) ordinal rank of the array elements.
The ordinal rank of an array element i basically indicates how many elements in the array have a value smaller than that of element i. This is equivalent to the discrete cumulative density.
Ordinal ranking is related to sorting by the following equality (where u is an unsorted list):
u == [sorted(u)[i] for i in ordinal_rank(u)]
Based on the implementation of scipy.stats.rankdata, the ordinal rank can be computed as follows:
def ordinal_rank(data):
    rank = numpy.empty(data.size)
    rank[numpy.argsort(data)] = numpy.arange(data.size)
    return rank
So, to answer the OP's question:
The normalized (empirical) cumulative density corresponding to the values in the OP's arr can then be computed as follows:
normalized_cdf = ordinal_rank(arr) / len(arr)
And the result can be displayed using:
pyplot.plot(arr, normalized_cdf, marker='.', linestyle='')
Note that, if you only need the plot, there is an easier way:
n = len(arr)
pyplot.plot(numpy.sort(arr), numpy.arange(n) / n)
And, finally, we can verify this by plotting the cumulative normalized histogram as follows (using an arbitrary number of bins):
pyplot.hist(arr, bins=100, cumulative=True, density=True)
Here's an example comparing the three approaches, using 30 bins for the cumulative histogram:
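A minimal self-contained version of such a comparison (arr below is synthetic stand-in data; the original answer showed the resulting figure):
import numpy
from matplotlib import pyplot

arr = numpy.random.normal(size=1000)  # stand-in for the OP's radar data
n = len(arr)

# Approach 1: ordinal rank, as defined above.
rank = numpy.empty(n)
rank[numpy.argsort(arr)] = numpy.arange(n)
pyplot.plot(arr, rank / n, marker='.', linestyle='', label='ordinal rank')

# Approach 2: sorted values against their normalized positions.
pyplot.plot(numpy.sort(arr), numpy.arange(n) / n, label='sorted values')

# Approach 3: cumulative normalized histogram with 30 bins.
pyplot.hist(arr, bins=30, cumulative=True, density=True,
            histtype='step', label='cumulative histogram')

pyplot.legend()
pyplot.show()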

Scipy normalization -- localize values to set discrete points

I am currently displaying two separate 2D images (the x,y plane and the z,y plane) that are derived from 96x512 arrays of 0-255 values. I would like to be able to filter the data so that anything under a certain value is done away with (the highest values are indicative of targets). What I would like to be able to do is, from these images, separate out discrete points that may then be mapped three-dimensionally as points, rather than mapping two intersecting planes. I'm not entirely sure how to do this or where to start (I'm very new to Python). I am producing the images using scipy and have done some normalization and noise reduction, but I'm not sure how to then separate out anything over the threshold as its own individual point. Is this possible?
If I understand correctly what you want, filtering points can be done like this:
A = numpy.random.rand(5, 5)
B = A > 0.5
Now B is a binary mask, and you can use it in a number of ways:
A[B]
will return an array with all values of A that are true in B.
A[B]=0
will assign 0 to all values in A that are true in B.
numpy.nonzero(B)
will give you the x,y coordinates of each point that is true in B.
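Tying this back to the question, here is a sketch of how the same mask idea could pull thresholded pixels out of one of the 96x512 arrays as individual points; the threshold of 200 is an arbitrary placeholder:
import numpy

image = numpy.random.randint(0, 256, size=(96, 512))  # stand-in for the real array
mask = image >= 200  # True where a pixel is bright enough to keep
image[~mask] = 0     # suppress everything below the threshold
rows, cols = numpy.nonzero(mask)
points = numpy.column_stack((rows, cols, image[rows, cols]))  # (row, col, value) triples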
