How to bin a series of float values into a histogram in Python?

I have a set of float values (always less than 1) that I want to bin into a histogram,
i.e. each bar in the histogram covers a range of values within [0, 0.150).
The data I have looks like this:
0.000
0.005
0.124
0.000
0.004
0.000
0.111
0.112
With my code below I expect to get a result that looks like this:
[0, 0.005) 5
[0.005, 0.011) 0
...etc..
I tried to do the binning with the code below,
but it doesn't seem to work. What's the right way to do it?
#! /usr/bin/env python
import fileinput, math
log2 = math.log(2)
def getBin(x):
    return int(math.log(x+1)/log2)
diffCounts = [0] * 5
for line in fileinput.input():
    words = line.split()
    diff = float(words[0]) * 1000;
    diffCounts[ str(getBin(diff)) ] += 1
maxdiff = [i for i, c in enumerate(diffCounts) if c > 0][-1]
print maxdiff
maxBin = max(maxdiff)
for i in range(maxBin+1):
    lo = 2**i - 1
    hi = 2**(i+1) - 1
    binStr = '[' + str(lo) + ',' + str(hi) + ')'
    print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))

When possible, don't reinvent the wheel. NumPy has everything you need:
#!/usr/bin/env python
import numpy as np
a = np.fromfile(open('file', 'r'), sep='\n')
# [ 0. 0.005 0.124 0. 0.004 0. 0.111 0.112]
# You can set arbitrary bin edges:
bins = [0, 0.150]
hist, bin_edges = np.histogram(a, bins=bins)
# hist: [8]
# bin_edges: [ 0. 0.15]
# Or, if bins is an integer, it sets the number of equal-width bins:
bins = 4
hist, bin_edges = np.histogram(a, bins=bins)
# hist: [5 0 0 3]
# bin_edges: [ 0. 0.031 0.062 0.093 0.124]

from pylab import *
data = []
inf = open('pulse_data.txt')
for line in inf:
    data.append(float(line))
inf.close()
# binning
B = 50
minv = min(data)
maxv = max(data)
bincounts = []
for i in range(B+1):
    bincounts.append(0)
for d in data:
    b = int((d - minv) / (maxv - minv) * B)
    bincounts[b] += 1
# plot histogram
plot(bincounts,'o')
show()
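For the layout the question asks for ([0, 0.005), [0.005, 0.010), ...), a standard-library sketch could look like the following. It assumes the inputs carry at most three decimal places, so it counts in integer thousandths to keep a value such as 0.005 exactly on its bin edge. The name fixed_width_histogram and its width_thousandths parameter are illustrative and do not come from any of the answers.

```python
from collections import Counter


def fixed_width_histogram(values, width_thousandths=5):
    """Count values into equal-width, half-open bins [lo, hi).

    Assumption: values have at most three decimals, so converting them
    to integer thousandths loses no information.
    """
    counts = Counter(round(v * 1000) // width_thousandths for v in values)
    last_bin = max(counts)  # highest bin index that received a value
    return [
        (i * width_thousandths / 1000, (i + 1) * width_thousandths / 1000, counts[i])
        for i in range(last_bin + 1)
    ]


data = [0.000, 0.005, 0.124, 0.000, 0.004, 0.000, 0.111, 0.112]
hist = fixed_width_histogram(data)
for lo, hi, n in hist:
    if n:  # print only the non-empty bins
        print("[%.3f, %.3f)\t%d" % (lo, hi, n))
```

Working on integers avoids having to reason about floating-point rounding at the bin edges; for data with more decimals the scale factor would need to change.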

The first error is:
Traceback (most recent call last):
File "C:\foo\foo.py", line 17, in <module>
diffCounts[ str(getBin(diff)) ] += 1
TypeError: list indices must be integers
Why are you converting an int to a str when an int is needed? Fix that, then we get:
Traceback (most recent call last):
File "C:\foo\foo.py", line 17, in <module>
diffCounts[ getBin(diff) ] += 1
IndexError: list index out of range
because you've only made 5 buckets. I don't understand your bucketing scheme, but let's make it 50 buckets and see what happens:
6
Traceback (most recent call last):
File "C:\foo\foo.py", line 21, in <module>
maxBin = max(maxdiff)
TypeError: 'int' object is not iterable
maxdiff is a single value out of your list of ints, so what is max doing here? Remove it, now we get:
6
Traceback (most recent call last):
File "C:\foo\foo.py", line 28, in <module>
print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))
TypeError: argument 2 to map() must support iteration
Sure enough, you're using a single value as the second argument to map. Let's simplify the last two lines from this:
binStr = '[' + str(lo) + ',' + str(hi) + ')'
print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))
to this:
print "[%f, %f)\t%r" % (lo, hi, diffCounts[i])
Now it prints:
6
[0.000000, 1.000000) 3
[1.000000, 3.000000) 0
[3.000000, 7.000000) 2
[7.000000, 15.000000) 0
[15.000000, 31.000000) 0
[31.000000, 63.000000) 0
[63.000000, 127.000000) 3
I'm not sure what else to do here, since I don't really understand the bucketing you are hoping to use. It seems to involve binary powers, but isn't making sense to me...
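Putting the fixes from this answer together, a Python 3 sketch of the same power-of-two bucketing might look like this. It keeps the question's scaling by 1000; rounding the scaled value to an integer and the names get_bin and log2_histogram are assumptions made for the example.

```python
import math
from collections import Counter


def get_bin(x):
    """Return i such that x falls in [2**i - 1, 2**(i + 1) - 1)."""
    return int(math.log2(x + 1))


def log2_histogram(values, scale=1000):
    # Counter grows as needed, so there is no fixed number of buckets to overflow.
    counts = Counter(get_bin(round(v * scale)) for v in values)
    max_bin = max(counts)
    return [(2**i - 1, 2**(i + 1) - 1, counts[i]) for i in range(max_bin + 1)]


data = [0.000, 0.005, 0.124, 0.000, 0.004, 0.000, 0.111, 0.112]
buckets = log2_histogram(data)
for lo, hi, n in buckets:
    print("[%d, %d)\t%d" % (lo, hi, n))
```

For the sample data this yields the same seven buckets and counts as the output listed above.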

Related

Distance between point to list of points

I'm trying to calculate the distance from point p to each of the points in list s.
import math
s= [(1,4),(4,2),(6,3)]
p= (3,7)
p0,p1=p
dist=[]
for s0,s1 in s:
    dist=math.sqrt((p0[0] - p1[0])**2 + (s0[1] - s1[1])**2)
    dist= dist+1
    print(dist)
TypeError Traceback (most recent call last)
<ipython-input-7-77e000c3374a> in <module>
3 dist=[]
4 for s0,s1 in s:
----> 5 dist=math.sqrt((p0[0] - p1[0])**2 + (s0[1] - s1[1])**2)
6
7
TypeError: 'int' object is not subscriptable
I can see that the indexing fails because p0 and p1 are ints, but I can't work out how to fix it.
You are accidentally indexing into your data even though you have already unpacked each point into x and y. In addition, you overwrite your list on every iteration instead of saving the results. The distance formula is also incorrect: each term should be the difference between the corresponding coordinates of the two points. Try this:
import math
s = [(1,4),(4,2),(6,3)]
p = (3,7)
p0, p1 = p
dist = []
for s0, s1 in s:
    dist_ = math.sqrt((p0 - s0)**2 + (p1 - s1)**2)  # no [0]/[1] indexing needed
    dist_ = dist_ + 1  # kept from the question; rename or delete as needed
    # print(dist)
    dist.append(dist_)  # save each result to the list
dist=math.sqrt((p0[0] - p1[0])**2 + (s0[1] - s1[1])**2)
Here, you are indexing an integer.
Moreover, you've made a mistake in the calculation. It should be:
dist=math.sqrt((p0 - s0)**2 + (p1 - s1)**2)
If what is desired is a list of the distances, that can be done in a single line of code with a list comprehension:
import math
import pprint
s = [(1,2),(3,4),(-1,1),(6,-7),(0, 6),(-5,-8),(-1,-1),(6,0),(1,-1)]
p = (3,-4)
dists = [math.sqrt((p[0]-s0)**2 + (p[1]-s1)**2) for s0, s1 in s]
pprint.pprint(dists)
The other thing here is that I've removed the dist = dist + 1 from the OP's code. I don't see how that can be correct: why add 1 to each computed distance?
Result:
[6.324555320336759,
8.0,
6.4031242374328485,
4.242640687119285,
10.44030650891055,
8.94427190999916,
5.0,
5.0,
3.605551275463989]
Maybe try to change this line:
dist=math.sqrt((p0[0] - p1[0])**2 + (s0[1] - s1[1])**2)
To:
dist=math.sqrt((p0 - s0)**2 + (p1 - s1)**2)
If you would like the Euclidean distance, you could do something like this (even without import math)
s = [(1, 4), (4, 2), (6, 3)]
p = (3, 7)
for point in s:
    sum_ = sum((p[i] - point[i]) ** 2 for i in range(len(p)))
    distance = sum_ ** (1 / 2)  # take the square root, the same thing as math.sqrt()
    print(p, point, round(distance, 1))
Results:
(3, 7) (1, 4) 3.6
(3, 7) (4, 2) 5.1
(3, 7) (6, 3) 5.0
The error you get in your code is because you used indexing on an integer. Just like doing this:
>>> a = 3
>>> a[0]
Traceback (most recent call last):
File "<input>", line 1, in <module>
a[0]
TypeError: 'int' object is not subscriptable
If you are not constrained in which packages you can use, a NumPy implementation would be quicker.
import numpy as np
s = np.array([(1,4),(4,2),(6,3)])
p = np.array((3,7))
dist = np.linalg.norm(p - s, axis=1)
Result:
array([3.60555128, 5.09901951, 5.])
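If NumPy is not available, the standard library offers a similar shortcut. The sketch below assumes Python 3.8 or newer, where math.dist was added; it reuses the points from the question.

```python
import math

s = [(1, 4), (4, 2), (6, 3)]
p = (3, 7)

# math.dist takes two coordinate sequences of equal length
# and returns the Euclidean distance between them.
dists = [math.dist(p, point) for point in s]
print([round(d, 1) for d in dists])
```

On older Python versions, math.hypot(p[0] - s0, p[1] - s1) gives the same result for two-dimensional points.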

Convert str to float python

Why does this throw an error : ValueError: could not convert string to float:
frequencies.append(float(l[start+1:stop1].strip()))
losses.append(float(l[stop1+5:stop2].strip()))
Doesn't the float() command parse values into the float type? Where am I wrong here? Both frequencies and losses are lists
This is the code:
def Capture():
    impedance = 0
    losses = []
    frequencies = []
    Xtalk = []
    start = 0
    stop1 = 0
    stop2 =0
    for filename in glob.glob(os.path.join(user_input, '*.txt')):
        with open(filename, 'r') as f:
            for l in f.readlines():
                if l.startswith(' Impedance'):
                    v = l[12:-7]
                    impedance = float(v)
                if l.startswith(' Xtalk'):
                    Xtalk.append(l[7:].strip())
                if l.startswith(' Loss per inch'):
                    start = l.find('#')
                    stop1 = l.find('GHz', start)
                    stop2 = l.find('dB', start)
                    frequencies.append(float(l[start+1:stop1].strip()))
                    losses.append(float(l[stop1+5:stop2].strip()))
    print(impedance, frequencies, losses, Xtalk)
It basically takes values from a text file and prints them onto the console
And the text files look like this:
Impedance = 71.28 ohms
Begin Post processing
Frequency multiplier = 1Hz
number of ports = 12
Start Frequency = 0
End Frequency = 40000000000
Number of Frequency points = 4001
Touchstone Output file = C:\Users\Aravind_Sampathkumar\Desktop\IMLC\BO\Output_TW_3.5-TS_3-core_h_2.xml_5000mil.s12p
Output format = Real - Imaginary
Loss per inch # 2.500000e+00 GHz = -0.569 dB
Loss per inch # 5 GHz = -0.997 dB
Xtalk #1 (Conductor 1 2):
Step response Next= -0.56 mV
Step response Fext peak # 5 inches= 0.11 mV
Xtalk #2 (Conductor 5 6):
Step response Next= -0.56 mV
Step response Fext peak # 5 inches= 0.11 mV
Finished post processing
First, check the exact format of the string you are passing in.
A string that uses a comma as the decimal separator can't be converted by float():
>>> a = "1,2345"
>>> float(a)
Traceback (most recent call last):
File "<input>", line 1, in <module>
ValueError: could not convert string to float: '1,2345'
>>> a = "1.2345"
>>> float(a)
1.2345
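One way to see exactly which string float() is being handed is to pull the numbers out with a pattern instead of hand-counted slice offsets. The sketch below assumes the "Loss per inch # <freq> GHz = <loss> dB" line shape from the sample file; LOSS_RE and parse_loss are hypothetical names, not part of the original code.

```python
import re

# Assumed line shape, taken from the sample file in the question:
#   Loss per inch # 2.500000e+00 GHz = -0.569 dB
LOSS_RE = re.compile(r"Loss per inch\s*#\s*(\S+)\s*GHz\s*=\s*(\S+)\s*dB")


def parse_loss(line):
    """Return (frequency_ghz, loss_db), or None if the line does not match."""
    match = LOSS_RE.search(line)
    if match is None:
        return None
    try:
        # float() accepts scientific notation and surrounding whitespace,
        # but raises ValueError for an empty string or trailing units.
        return float(match.group(1)), float(match.group(2))
    except ValueError:
        return None


print(parse_loss(" Loss per inch # 2.500000e+00 GHz = -0.569 dB"))
print(parse_loss(" Impedance = 71.28 ohms"))
```

Returning None for lines that do not match makes it easy to print the offending line while debugging instead of letting float() fail on an unexpected slice.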

IndexError: list index out of range (NumPy)

I am really new to Python. I am getting an error stating "IndexError: list index out of range". Kindly help me out. Thanks in advance.
Edit 1
x = np.array([10,0])
Phi = np.array([[ 1. , 0.01],
                [ 0. , 1. ]])
Gamma = np.array([[ 0.0001048 ],
                  [ 0.02096094]])
Z = np.array([[ 0.0001048 ],
              [ 0.02096094]])
wd = 0
u_new = 0
x1d = 0
x2d = 0
xd = [[0 for col in range(len(x))] for row in range(1000)]
xd[0][0] = 10
xd[1][0] = 0
k = 10
DistPeriodNo1 = 500
FirstPeriod = 1
k=k+1 #Update PeriodNo(so PeriodNo is now equal to No. of current period)
if (k == 100): #If maximum value of PeriodNo is reached,
    k = 11 #set it to 1
    DistPeriodNo1 = random.randint(11,99)
if (FirstPeriod == 0):
    if (k == DistPeriodNo1):
        wd = random.randint(-1,1)
    else:
        wd = 0
xd[0][k] = Phi*xd[0][k-1] - Gamma*u_new + Z*wd
# >>indexerror list index out of range
xd[1][k] = Phi*xd[1][k-1] - Gamma*u_new + Z*wd
x1d = xd[0][k]
x2d = xd[1][k]
To answer your question in the comments about tracebacks (stack traces): running the following
a = [1,2,3]
b = [True, False]
print(a[2])
print(b[2])
produces one answer and one traceback.
>>>
3
Traceback (most recent call last):
File "C:\Programs\python34\tem.py", line 4, in <module>
print(b[2])
IndexError: list index out of range
The traceback shows which line and which code raised the error. People were asking you to copy the last four lines of the traceback and paste them into your question (by editing it).
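Judging only from the code shown in the question, one likely source of the IndexError is the shape of xd: the comprehension builds 1000 rows of length len(x), which is 2, so the second index can only be 0 or 1. A small sketch of that, using plain lists; the transposed layout xd_by_state is a hypothetical alternative, not something the question confirms as the intent.

```python
x = [10, 0]

# Same construction as in the question: 1000 rows, each of length len(x) == 2.
xd = [[0 for col in range(len(x))] for row in range(1000)]
print(len(xd), len(xd[0]))

k = 11
try:
    previous = xd[0][k - 1]  # index 10 does not exist in a row of length 2
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)

# Hypothetical alternative layout: one row per state variable,
# 1000 time steps per row, so xd_by_state[0][k] is valid for k < 1000.
xd_by_state = [[0 for step in range(1000)] for state in range(len(x))]
xd_by_state[0][0] = 10
xd_by_state[0][k] = xd_by_state[0][k - 1]
```

Printing len(xd) and len(xd[0]) before indexing is a quick way to confirm which dimension is which.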

Trying to find the std of an array results in an error

I have a bunch of files in the following format (tab separated):
h local average
1 4654 4654
2 5564 5564
3 6846 6846
... ... ...
I read the files in a loop (attached below) and store them in a two-dimensional list. I then convert the list to an array and apply std to it. This results in:
Traceback (most recent call last):
File "plot2.py", line 56, in <module>
e0028 = np.std(ar, axis=0)
File "/usr/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 2467, in std
return std(axis, dtype, out, ddof)
TypeError: unsupported operand type(s) for /: 'list' and 'float'
Which baffles me. I tried to find an element in the array which is not float and nothing popped.
import numpy as np
import matplotlib.pyplot as plt
from math import fabs, sqrt, pow, pi
h0028 = []
p0028 = []
headLines = 2
fig=plt.figure()
ax1 = fig.add_subplot(1,1,1)
for i in range (0,24):
    n = 0
    j = i + 560
    p = []
    f = open('0028/'+str(j)+'.0028/ionsDist.dat')
    for line in f:
        if n < headLines:
            n += 1
            continue
        words = line.split()
        p.append (float(words[1]))
        if i == 0:
            h0028.append (fabs(int(words[0])))
        n += 1
    print (n)
    p0028.append(p)
    f.close()
ar = np.array(p0028)
for a in ar:
    for b in a:
        if not isinstance(b,float):
            print type(a)
e0028 = np.std(ar, axis=0)
p0028 = np.mean(ar, axis=0)
h0028 = np.array(h0028)/10 -2.6
p0028 /= max(p0028)
e0028 /= (sum(p0028)*sqrt(23))
ax1.errorbar(h0028 , p0028, yerr=e0028, color = 'red')
ax1.set_xlim(-0.1,10)
plt.show()
plt.savefig('plot2.png', format='png')
I can't figure out why your code does not work, but maybe this will help you.
You can read the file like this:
>>> a = np.loadtxt("p0028.csv", dtype="float", skiprows=1)
>>> a
array([[ 1.00000000e+00, 4.65400000e+03, 4.65400000e+03],
       [ 2.00000000e+00, 5.56400000e+03, 5.56400000e+03],
       [ 3.00000000e+00, 6.84600000e+03, 6.84600000e+03]])
Now you can get the std of, e.g., the column local like this:
>>> a_std = np.std(a[:, 1])  # column 1 is "local"; a[:1] would select the first row instead
For the three rows shown, this is roughly 899.17.
When you loop over several files, you can use the vstack method to collect the data together, that way you do not depend on the number of rows in the file:
>>> a = np.loadtxt("p0028.csv", dtype="float", skiprows=1)
>>> a
array([[ 1.00000000e+00, 4.65400000e+03, 4.65400000e+03],
       [ 2.00000000e+00, 5.56400000e+03, 5.56400000e+03],
       [ 3.00000000e+00, 6.84600000e+03, 6.84600000e+03]])
>>> b = np.loadtxt("p0028.csv", dtype="float", skiprows=1)
>>> np.vstack((a,b))
array([[ 1, 4654, 4654],
       [ 2, 5564, 5564],
       [ 3, 6846, 6846],
       [ 1, 4654, 4654],
       [ 2, 5564, 5564],
       [ 3, 6846, 6846]])
I have found the error: my files were not all the same length, which led to accessing empty elements. I have added a loop that appends zeros to the end of each list until they all have the same length. Schuh noted that padding with zeros might give a wrong std. This is not the case for my data, but it should be kept in mind.
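As an alternative to zero padding, the rows can be checked for unequal lengths and the per-column std computed only over the values that are actually present. The sketch below uses the standard library; column_std is a hypothetical helper, and pstdev is used because it is the population standard deviation, which matches the default (ddof=0) of np.std.

```python
from itertools import zip_longest
from statistics import pstdev


def column_std(rows):
    """Population std per column, skipping entries missing from short rows."""
    result = []
    for column in zip_longest(*rows, fillvalue=None):
        present = [value for value in column if value is not None]
        result.append(pstdev(present))
    return result


rows = [[1.0, 2.0, 3.0], [3.0, 4.0], [5.0, 6.0, 7.0]]

# More than one distinct length means the rows are ragged.
lengths = {len(row) for row in rows}
print("row lengths:", sorted(lengths))

stds = column_std(rows)
print(stds)
```

Skipping missing entries avoids the bias that padded zeros would introduce, at the cost of each column being based on a different number of samples.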

Problems with Smoothing graphs in Python

I have been trying to smooth a plot which is noisy due to the sampling rate I'm using and what it's counting. I've been using the help on here, mainly "Plot smooth line with PyPlot" (although I couldn't find the "spline" function and so am using UnivariateSpline instead).
However, whatever I do I keep getting errors: either the pyplot error that "x and y are not of the same length", or that scipy's UnivariateSpline has a value for w that is incorrect. I am not sure quite how to fix this (not really a Python person!). I've attached the code, although it's just the plotting bit at the end that is causing problems. Thanks
import os.path
import matplotlib.pyplot as plt
import scipy.interpolate as sci
import numpy as np
def main():
    jcc = "0050"
    dj = "005"
    l = "060"
    D = 20
    hT = 4 * D
    wT1 = 2 * D
    wT2 = 5 * D
    for jcm in ["025","030","035","040","045","050","055","060"]:
        characteristic = "LeadersOnly/Jcm" + jcm + "/Jcc" + jcc + "/dJ" + dj + "/lambda" + l + "/Seed000"
        fingertime1 = []
        fingertime2 = []
        stamp =[]
        finger=[]
        for x in range(0,2500,50):
            if x<10000:
                z=("00"+str(x))
            if x<1000:
                z=("000"+str(x))
            if x<100:
                z=("0000"+str(x))
            if x<10:
                z=("00000"+str(x))
            stamp.append(x)
            path = "LeadersOnly/Jcm" + jcm + "/Jcc" + jcc + "/dJ" + dj + "/lambda" + l + "/Seed000/profile_" + str(z) + ".txt"
            if os.path.exists(path):
                f = open(path, 'r')
                pr1,pr2=np.genfromtxt(path, delimiter='\t', unpack=True)
                p1=[]
                p2=[]
                h1=[]
                h2=[]
                a1=[]
                a2=[]
                finger1 = 0
                finger2 = 0
                for b in range(len(pr1)):
                    p1.append(pr1[b])
                    p2.append(pr2[b])
                for elem in range(len(pr1)-80):
                    h1.append((p1[elem + (2*D)]-0.5*(p1[elem]+p1[elem + (4*D)])))
                    h2.append((p2[elem + (2*D)]-0.5*(p2[elem]+p2[elem + (4*D)])))
                    if h1[elem] >= hT:
                        a1.append(1)
                    else:
                        a1.append(0)
                    if h2[elem]>=hT:
                        a2.append(1)
                    else:
                        a2.append(0)
                for elem in range(len(a1)-1):
                    if (a1[elem] - a1[elem + 1]) != 0:
                        finger1 = finger1 + 1
                finger1 = finger1 / 2
                for elem in range(len(a2)-1):
                    if (a2[elem] - a2[elem + 1]) != 0:
                        finger2 = finger2 + 1
                finger2 = finger2 / 2
                fingertime1.append(finger1)
                fingertime2.append(finger2)
                finger.append((finger1+finger2)/2)
        namegraph = jcm
        stampnew = np.linspace(stamp[0],stamp[-1],300)
        fingernew = sci.UnivariateSpline(stamp, finger, stampnew)
        plt.plot(stampnew,fingernew,label=namegraph)
    plt.show()
main()
For information, the data input files are simply lists of integers (two lists separated by tabs, as the code suggests).
Here is one of the errors that I get:
0-th dimension must be fixed to 50 but got 300
error Traceback (most recent call last)
/group/data/Cara/JCMMOTFingers/fingercount_jcm_smooth.py in <module>()
116
117 if __name__ == '__main__':
--> 118 main()
119
120
/group/data/Cara/JCMMOTFingers/fingercount_jcm_smooth.py in main()
93 #print(len(stamp))
94 stampnew = np.linspace(stamp[0],stamp[-1],300)
---> 95 fingernew = sci.UnivariateSpline(stamp, finger, stampnew)
96 #print(len(stampnew))
97 #print(len(fingernew))
/usr/lib/python2.6/dist-packages/scipy/interpolate/fitpack2.pyc in __init__(self, x, y, w, bbox, k, s)
86 #_data == x,y,w,xb,xe,k,s,n,t,c,fp,fpint,nrdata,ier
87 data = dfitpack.fpcurf0(x,y,k,w=w,
---> 88 xb=bbox[0],xe=bbox[1],s=s)
89 if data[-1]==1:
90 # nest too small, setting to maximum bound
error: failed in converting 1st keyword `w' of dfitpack.fpcurf0 to C/Fortran array
Let's analyze your code a bit, starting from the for x in range(0, 2500, 50): loop.
You define z as a string of 6 digits padded with 0s. You should really use some string formatting like z = "{0:06d}".format(x) or z = "%06d" % x instead of these multiple tests.
At the end of your loop, stamp will have (2500//50)=50 elements.
You check for the existence of your file path, then open it and read it, but you never close it. A more Pythonic way is to do:
try:
    with open(path,"r") as f:
        do...
except IOError:
    do something else
With the with syntax, your file is automatically closed.
pr1 and pr2 are likely to be 1D arrays, right? You can really simplify the construction of your p1 and p2 lists as:
p1 = pr1.tolist()
p2 = pr2.tolist()
Your lists a1 and a2 have the same size: you could combine your for elem in range(len(a..)-1) loops into a single one. You could also use the np.diff function.
At the end of the for x in range(...) loop, finger will have 50 elements minus the number of missing files. As you don't say what to do in case of a missing file, your stamp and finger lists may not have the same number of elements, which will crash your scipy UnivariateSpline. An easy fix would be to update your stamp list only if the path file exists (that way, it always has the same number of elements as finger).
Your stampnew array has 300 elements, when your stamp and finger can only have at most 50. That's a second problem: the third positional argument of UnivariateSpline is the weight array w, so stampnew is being passed as weights, and the weight array must be the same size as the inputs.
You're eventually trying to plot fingernew against stampnew. The problem is that fingernew is not an array, it's an instance of UnivariateSpline. You still need to calculate some actual points, for example with fingernew(stampnew), then use that in your plot function.
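Two of the smaller suggestions above can be checked with the standard library alone. The sketch below compares the format-based zero padding with the chain of tests from the question, and counts 0/1 transitions in a single pass; zero_padded, padded_by_tests and count_transitions are hypothetical helper names chosen for the example.

```python
def zero_padded(x):
    """Six-digit, zero-padded string, as suggested in the answer."""
    return "{0:06d}".format(x)


def padded_by_tests(x):
    """The chain of tests from the question, kept for comparison."""
    if x < 10000:
        z = "00" + str(x)
    if x < 1000:
        z = "000" + str(x)
    if x < 100:
        z = "0000" + str(x)
    if x < 10:
        z = "00000" + str(x)
    return z


def count_transitions(flags):
    """Number of positions where consecutive 0/1 flags differ."""
    return sum(1 for a, b in zip(flags, flags[1:]) if a != b)


same = all(zero_padded(x) == padded_by_tests(x) for x in range(0, 2500, 50))
print(same)
print(count_transitions([0, 0, 1, 1, 0, 1]))
```

The question then halves the transition count to get the number of fingers; that step is left out here because it depends on how partial fingers at the edges should be treated.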
