Trying to find the std of an array results in a fault


I have a bunch of files in the following format (tab-separated):
h local average
1 4654 4654
2 5564 5564
3 6846 6846
... ... ...
I read the files in a loop (code below) and store the values in a two-dimensional list. I then convert the list to an array and apply np.std to it. This results in:
Traceback (most recent call last):
File "plot2.py", line 56, in <module>
e0028 = np.std(ar, axis=0)
File "/usr/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 2467, in std
return std(axis, dtype, out, ddof)
TypeError: unsupported operand type(s) for /: 'list' and 'float'
This baffles me. I tried to find an element in the array that is not a float, and nothing turned up.
import numpy as np
import matplotlib.pyplot as plt
from math import fabs, sqrt, pow, pi
h0028 = []
p0028 = []
headLines = 2
fig=plt.figure()
ax1 = fig.add_subplot(1,1,1)
for i in range (0,24):
    n = 0
    j = i + 560
    p = []
    f = open('0028/'+str(j)+'.0028/ionsDist.dat')
    for line in f:
        if n < headLines:
            n += 1
            continue
        words = line.split()
        p.append (float(words[1]))
        if i == 0:
            h0028.append (fabs(int(words[0])))
        n += 1
    print (n)
    p0028.append(p)
    f.close()
ar = np.array(p0028)
for a in ar:
    for b in a:
        if not isinstance(b,float):
            print type(a)
e0028 = np.std(ar, axis=0)
p0028 = np.mean(ar, axis=0)
h0028 = np.array(h0028)/10 -2.6
p0028 /= max(p0028)
e0028 /= (sum(p0028)*sqrt(23))
ax1.errorbar(h0028 , p0028, yerr=e0028, color = 'red')
ax1.set_xlim(-0.1,10)
plt.show()
plt.savefig('plot2.png', format='png')
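A quicker check than looping over the elements is to look at the row lengths, and at the array's dtype and shape, before calling np.std. The sketch below uses a small made-up p0028 with one short row (hypothetical data, not from the original files). A ragged list of lists cannot become a 2-D float array: older NumPy versions silently build a 1-D object array of lists, and NumPy 1.24 and later raise a ValueError unless dtype=object is passed explicitly.

```python
import numpy as np

def row_lengths(rows):
    """Return the distinct row lengths, sorted; more than one means the data is ragged."""
    return sorted({len(r) for r in rows})

# Hypothetical data: the second "file" has one value fewer than the others.
p0028 = [[4654.0, 5564.0, 6846.0],
         [4700.0, 5500.0],
         [4600.0, 5600.0, 6800.0]]

lengths = row_lengths(p0028)
print(lengths)  # [2, 3] -> the rows do not all have the same length

# dtype=object is passed explicitly because NumPy >= 1.24 refuses ragged input otherwise.
ar = np.array(p0028, dtype=object)
print(ar.dtype, ar.shape)  # object (3,): a 1-D array of lists, not a 3 x 3 float array
```

Iterating over such an array yields the inner lists, whose items are all floats, which would explain why an element-by-element isinstance check finds nothing.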

I can't figure out why your code does not work, but maybe this will help you.
You can read the file like this:
>>>a = np.loadtxt("p0028.csv",dtype="float",skiprows = 1)
>>> a
array([[ 1.00000000e+00, 4.65400000e+03, 4.65400000e+03],
[ 2.00000000e+00, 5.56400000e+03, 5.56400000e+03],
[ 3.00000000e+00, 6.84600000e+03, 6.84600000e+03]])
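Now you can get the std of, e.g., the column local (the second column, index 1) like this:
>>> a_std = np.std(a[:, 1])
Note that a[:1] selects the first row rather than a column: np.std(a[:1]) gives 2193.4452352406706, the std of the row values 1, 4654 and 4654.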
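A self-contained version of the loadtxt call, with the three sample rows fed in through io.StringIO instead of a real file (an assumption so the sketch runs on its own). It computes both the std of the local column (a[:, 1]) and, for comparison, the std of the first row (a[:1]).

```python
import io
import numpy as np

# Tab-separated sample from the question, with its one header line.
text = "h\tlocal\taverage\n1\t4654\t4654\n2\t5564\t5564\n3\t6846\t6846\n"

a = np.loadtxt(io.StringIO(text), dtype=float, skiprows=1)
print(a.shape)  # (3, 3)

local_std = np.std(a[:, 1])  # std of the "local" column
row_std = np.std(a[:1])      # std of the first row (h, local, average)
print(local_std, row_std)
```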
When you loop over several files, you can use np.vstack to stack the data together; that way you do not depend on the number of rows in each file (only the number of columns has to match):
>>> a = np.loadtxt("p0028.csv", dtype="float", skiprows=1)
>>> b = np.loadtxt("p0028.csv", dtype="float", skiprows=1)
>>> np.vstack((a, b))
array([[ 1.00000000e+00, 4.65400000e+03, 4.65400000e+03],
[ 2.00000000e+00, 5.56400000e+03, 5.56400000e+03],
[ 3.00000000e+00, 6.84600000e+03, 6.84600000e+03],
[ 1.00000000e+00, 4.65400000e+03, 4.65400000e+03],
[ 2.00000000e+00, 5.56400000e+03, 5.56400000e+03],
[ 3.00000000e+00, 6.84600000e+03, 6.84600000e+03]])
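A sketch of the loop-and-vstack idea, with io.StringIO objects standing in for the real files (hypothetical contents). Each loaded array goes into a list and the list is stacked once at the end, which avoids re-copying a growing array on every iteration. vstack only needs the column counts to match, so files with different numbers of rows stack fine, but the result lists the rows of all files one after another rather than lining the files up side by side.

```python
import io
import numpy as np

# Hypothetical file contents: same three columns, different numbers of rows.
files = [
    "h\tlocal\taverage\n1\t4654\t4654\n2\t5564\t5564\n3\t6846\t6846\n",
    "h\tlocal\taverage\n1\t4700\t4700\n2\t5500\t5500\n",
]

blocks = []
for text in files:
    # ndmin=2 keeps a one-row file 2-D so vstack still lines up the columns.
    blocks.append(np.loadtxt(io.StringIO(text), skiprows=1, ndmin=2))

data = np.vstack(blocks)
print(data.shape)  # (5, 3)
```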

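I have found the error: my files were not all of the same length. With lists of different lengths, np.array(p0028) could not build a 2-D float array; it produced a 1-D array of Python lists instead, so np.std ended up dividing a list by a float, hence the TypeError (and the items inside each list were floats, which is why my isinstance check found nothing). I added a loop that appends zeros to the end of each list until they all have the same length. Schuh noted that padding with zeros can give a wrong std. That is not a problem for my data, but it is worth keeping in mind.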
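One way to pad without the zero problem Schuh pointed out is to pad the short rows with NaN instead and then use np.nanmean and np.nanstd, which ignore NaN entries. This swaps NaN in for the zeros described above; the helper name pad_ragged and the sample data are hypothetical.

```python
import numpy as np

def pad_ragged(rows, fill=np.nan):
    """Pad a list of 1-D sequences to a common length with `fill` (hypothetical helper)."""
    width = max(len(r) for r in rows)
    out = np.full((len(rows), width), fill, dtype=float)
    for k, r in enumerate(rows):
        out[k, :len(r)] = r
    return out

# Hypothetical sample: three "files" of different lengths.
p0028 = [[4654.0, 5564.0, 6846.0],
         [4700.0, 5500.0],
         [4600.0, 5600.0, 6800.0]]

ar = pad_ragged(p0028)
mean = np.nanmean(ar, axis=0)  # NaN padding is skipped, unlike zero padding
std = np.nanstd(ar, axis=0)
print(ar.shape)
print(mean)
print(std)
```

With zero padding, the last column would be averaged as (6846 + 0 + 6800) / 3; with NaN padding it is averaged over the two real values only.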

Related

Simple 1D Array Iteration Error (TypeError: 'int' object is not iterable)

I'm currently trying to iterate over arrays for random walks. A for loop works when there are multiple numbers per element of the array, but I'm having trouble applying math.dist to a one-dimensional array with one number per element.
Here is the problematic code:
import math
import numpy as np
from numpy import cumsum
origin = 0
all_walks1 = []
W = 100
N = 10
list_points = []
for i in range(W):
    x = 2*np.random.randint(0,2,size=N)-1
    xs = cumsum(x)
    all_walks1.append(xs)
    list_points.append(xs[-1])
list_dist = []
for i in list_points:
    d = math.dist(origin, i)
    list_dist.append(d)
When I try to compute the distances and append them to a new list, I get TypeError: 'int' object is not iterable:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_1808/1512057642.py in <module>
16
17 for i in list_points:
---> 18 d = math.dist(origin, i)
19 list_dist.append(d)
20
TypeError: 'int' object is not iterable
However, if each element of the list I iterate over in the for loop holds multiple numbers, as in the following code, everything works fine:
import math
import numpy as np
from numpy import cumsum
origin = (0, 0, 0)
all_walks_x = []
all_walks_y = []
all_walks_z = []
W = 100
N = 10
list_points = []
for i in range(W):
    x = 2*np.random.randint(0,2,size=N)-1
    y = 2*np.random.randint(0,2,size=N)-1
    z = 2*np.random.randint(0,2,size=N)-1
    xs = cumsum(x)
    ys = cumsum(y)
    zs = cumsum(z)
    all_walks_x.append(xs)
    all_walks_y.append(ys)
    all_walks_z.append(zs)
    list_points.append((xs[-1], ys[-1], zs[-1]))
list_dist = []
for i in list_points:
    d = math.dist(origin, i)
    list_dist.append(d)
I have tried using for i in range(len(list_points)): and for key, value in enumerate(list_points):, with no success. The only difference between the first and the second list_points arrays appears to be that the elements are enclosed in parentheses when there are multiple numbers per element. It seems like there should be a simple solution that is somehow eluding me. Thanks for reading; any help would be appreciated.
EDIT: I may be using the terms 1D and 3D arrays incorrectly. The first array is a list of numbers such as [6, 4, -1, 5 ... ] and the second is a list with multiple numbers per element, such as [(-10, -2, 14), (12, 2, 8), (-4, 8, 24), (10, 10, 0), (2, 8, 10) ... ]

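It seems you are passing integers to math.dist. math.dist(p, q) returns the Euclidean distance between two points p and q, each given as a sequence (or other iterable) of coordinates, and both points must have the same dimension. Therefore you have to wrap the values in a list or tuple, even if each point has just a single coordinate.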
Example:
# not valid
math.dist(1, 2)
# valid
math.dist([1], [2])
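Applied to the 1-D walk from the question, a sketch could look like the following. It assumes Python 3.8 or later (when math.dist was added) and uses a seeded np.random.default_rng generator instead of np.random.randint, purely for reproducibility. In one dimension the Euclidean distance is just the absolute difference, so a vectorized np.abs gives the same numbers without any loop.

```python
import math
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (an assumption, for reproducibility)
W, N = 100, 10
origin = 0

steps = 2 * rng.integers(0, 2, size=(W, N)) - 1  # W walks of N steps of +1 or -1
final = np.cumsum(steps, axis=1)[:, -1]          # end point of each walk

# Wrap each 1-D point in a list so math.dist receives sequences of coordinates.
list_dist = [math.dist([origin], [int(p)]) for p in final]

# Vectorized equivalent: in 1-D the distance is the absolute difference.
vec_dist = np.abs(final - origin)
```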

IndexError: list index out of range / numpy

I am really new to Python. I am getting an error stating IndexError: list index out of range. Kindly help me out. Thanks in advance.
Edit 1
x = np.array([10,0])
Phi = np.array([[ 1. , 0.01],
                [ 0. , 1. ]])
Gamma = np.array([[ 0.0001048 ],
                  [ 0.02096094]])
Z = np.array([[ 0.0001048 ],
              [ 0.02096094]])
wd = 0
u_new = 0
x1d = 0
x2d = 0
xd = [[0 for col in range(len(x))] for row in range(1000)]
xd[0][0] = 10
xd[1][0] = 0
k = 10
DistPeriodNo1 = 500
FirstPeriod = 1
k=k+1 #Update PeriodNo(so PeriodNo is now equal to No. of current period)
if (k == 100): #If maximum value of PeriodNo is reached,
    k = 11 #set it to 1
    DistPeriodNo1 = random.randint(11,99)
if (FirstPeriod == 0):
    if (k == DistPeriodNo1):
        wd = random.randint(-1,1)
    else:
        wd = 0
xd[0][k] = Phi*xd[0][k-1] - Gamma*u_new + Z*wd
# >>indexerror list index out of range
xd[1][k] = Phi*xd[1][k-1] - Gamma*u_new + Z*wd
x1d = xd[0][k]
x2d = xd[1][k]

To answer your question in the comments about tracebacks (stack traces): running the following
a = [1,2,3]
b = [True, False]
print(a[2])
print(b[2])
produces one answer and one traceback.
>>>
3
Traceback (most recent call last):
File "C:\Programs\python34\tem.py", line 4, in <module>
print(b[2])
IndexError: list index out of range
The traceback shows which line and which code raised the error. People were asking you to copy the last 4 lines and paste them into your question (by editing it).
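In the question's code, the line marked with the error is xd[0][k]. xd is built as 1000 rows of len(x) == 2 entries each, so xd[0] only has indices 0 and 1, and xd[0][k] with k = 11 is out of range. The code appears to use the first index for the state component and the second for the period, which is the transpose of how xd was allocated. The sketch below assumes one row per period and one column per state component. It flattens Gamma and Z to 1-D, replaces the elementwise Phi*xd[...] with the matrix-vector product Phi @ xd[k - 1] (an assumption about what the update is meant to compute), drops the FirstPeriod bookkeeping, and uses an arbitrary n_periods = 100.

```python
import random
import numpy as np

Phi = np.array([[1.0, 0.01],
                [0.0, 1.0]])
Gamma = np.array([0.0001048, 0.02096094])  # column vector from the question, flattened
Z = np.array([0.0001048, 0.02096094])

# Why the original indexing fails: 1000 rows, each of length 2.
xd_list = [[0 for col in range(2)] for row in range(1000)]
try:
    xd_list[0][11]
except IndexError as err:
    print("xd_list[0][11]:", err)  # list index out of range

n_periods = 100                  # arbitrary length for the sketch
xd = np.zeros((n_periods, 2))    # row = period k, column = state component
xd[0] = [10.0, 0.0]              # initial state, as in xd[0][0] = 10, xd[1][0] = 0
u_new = 0.0
DistPeriodNo1 = random.randint(11, 99)  # period that receives a random disturbance

for k in range(1, n_periods):
    wd = random.randint(-1, 1) if k == DistPeriodNo1 else 0
    xd[k] = Phi @ xd[k - 1] - Gamma * u_new + Z * wd  # 2-vector state update

x1d, x2d = xd[-1]
print(x1d, x2d)
```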

Generate 4 columns of data such that each row sums to 100

How do I write a python program that can randomly generate 4 columns of data such that the sum of the numbers of each row is 100?

>>> import numpy as np
>>> A = np.random.rand(10,4)
>>> A /= A.sum(axis=1)[:,np.newaxis]
>>> A *= 100
>>> A
array([[ 52.65020485, 8.39068184, 4.89730114, 34.06181217],
[ 58.32667159, 8.99338257, 13.7326809 , 18.94726494],
[ 8.23847677, 36.27990343, 14.73440883, 40.74721097],
[ 37.10408209, 5.31467062, 39.47977538, 18.10147191],
[ 21.5697797 , 14.80630725, 12.69891923, 50.92499382],
[ 15.46006657, 24.62499701, 37.37736874, 22.53756768],
[ 6.66777748, 25.62326117, 11.80042839, 55.90853296],
[ 38.81602256, 26.74457165, 3.4365655 , 31.00284028],
[ 5.67431732, 7.57571558, 44.01330459, 42.73666251],
[ 33.09837171, 26.66421892, 10.90188895, 29.33552043]])
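This generates non-negative real numbers, as you asked. The raw values from np.random.rand are uniformly distributed on [0, 1), but once each row is rescaled to sum to 100, the rows are no longer uniformly distributed over all possible ways of splitting 100. If you want a different distribution, you can find several other choices in np.random.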
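If you want rows that are spread uniformly over all ways of splitting 100 into four non-negative real parts, one option is a flat Dirichlet distribution, available as dirichlet on NumPy's Generator. This is a swapped-in technique, not what the code above does.

```python
import numpy as np

rng = np.random.default_rng()
# alpha = (1, 1, 1, 1) gives a uniform distribution over 4-part splits.
A = rng.dirichlet(np.ones(4), size=10) * 100
print(A.shape)        # (10, 4)
print(A.sum(axis=1))  # each row sums to 100, up to floating-point rounding
```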

import random
def Column(n):
    integers = []
    for i in range(n):
        A = random.randrange(0,100)
        B = random.randrange(0,100-A)
        C = random.randrange(0,100-(A+B))
        D = (100 - (A+B+C))
        integers.append((A,B,C,D))
    return integers
Returns = Column(4)
for i in Returns:
    print(i)
    print(i[0]+i[1]+i[2]+i[3])
Sorry if it's messy, got to go.
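The loop above always sums to 100, but the split is not uniform: A is drawn from the full range first, so it tends to be larger than the later values, and D is never 0. For integers, a stars-and-bars sketch (a different technique from that answer) picks 3 distinct "bar" positions among 103 slots. The gaps between the bars are the four parts, and every tuple of non-negative integers summing to 100 is equally likely. The function name random_split is hypothetical.

```python
import random

def random_split(total=100, parts=4):
    """Return `parts` non-negative ints summing to `total`, uniform over all such tuples."""
    # Choose parts-1 distinct bar positions among total+parts-1 slots.
    bars = sorted(random.sample(range(total + parts - 1), parts - 1))
    edges = [-1] + bars + [total + parts - 1]
    # The number of free slots between consecutive bars is the size of each part.
    return tuple(edges[i + 1] - edges[i] - 1 for i in range(parts))

rows = [random_split() for _ in range(4)]
for row in rows:
    print(row, sum(row))
```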

Python: How does one typically get around "Maximum allowed dimension exceeded" error?

I'm trying to make a 2^n x 2^n numpy array of all possible dot product permutations of a very large set of vectors. My test array, "data", is a (129L, 222L) numpy array. My function seems (in my novice opinion) to be pretty straightforward. It's just the fact that I have too much data to process. How do programmers typically get around this issue? Any suggestions?
My data:
>>> data
array([[ 1.36339199e-07, 6.71355407e-09, 2.13336419e-07, ...,
8.44471296e-10, 6.02566662e-10, 3.38577178e-10],
[ 7.19224620e-08, 5.64739121e-08, 1.49689547e-07, ...,
3.85361972e-10, 3.17756751e-10, 1.68563023e-10],
[ 1.93443482e-10, 1.11626853e-08, 2.66691759e-09, ...,
2.20938084e-11, 2.56114420e-11, 1.31865060e-11],
...,
[ 7.12584509e-13, 7.70844451e-13, 1.09718565e-12, ...,
2.08390730e-13, 3.05264153e-13, 1.62286818e-13],
[ 2.57153616e-13, 6.08747557e-13, 2.00768488e-12, ...,
6.29901984e-13, 1.19631816e-14, 1.05109078e-13],
[ 1.74618064e-13, 5.03695393e-13, 1.29632351e-14, ...,
7.60145676e-13, 3.19648911e-14, 8.72102078e-15]])
My function:
import numpy as np
from itertools import product, count
def myFunction(data):
    S = np.array([])
    num = 2**len(data)
    y = product(data, repeat = 2)
    for x in count():
        while x <= num:
            z = y.next()
            i, j = z
            s = np.dot(i, j)
            S = np.insert(S, x, s)
            break #for the 'StopIteration' issue
    return np.reshape(S, (num,num))
My error:
>>> theMatrix = myFunction(data)
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\IPython\core\interactiveshell.py", line 2721, in run_code
exec code_obj in self.user_global_ns, self.user_ns
File "", line 1, in <module>
matrix = myFunction(data)
File "E:\Folder1\Folder2\src\myFunction.py", line 16, in myFunction
return np.reshape(S, (num,num))
File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 171, in reshape
return reshape(newshape, order=order)
ValueError: Maximum allowed dimension exceeded

Make sure you pass the array you are reshaping to reshape, not just the new shape (num, num): something like return np.reshape(S, (num, num)).
As for the actual error, 2^129 is a pretty darn large number: even a regular 64-bit integer can only index up to 2^64, and a 2^129 x 2^129 matrix has 2^258 elements, far more than any machine's memory can hold.
Are you sure you really want to process quite that much? Even on a GHz processor (roughly 2^30 operations per second), handling one element per CPU cycle (which you probably can't), that is still about 2^228 seconds of processing.

The Cartesian product is O(n^2), not O(2^n) (lucky for you). That is probably also the cause of your "StopIteration" issue: the loop asks product for more pairs than it can produce.
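import numpy as np
from itertools import product

def myFunction(data):
    n = len(data)
    num = n ** 2  # This is not the same as 2 ** len(data) !!
    # One dot product per (row, row) pair; the comprehension stops when product
    # is exhausted, so there is no StopIteration to work around.
    S = np.array([np.dot(i, j) for i, j in product(data, repeat=2)])
    # S has num = n**2 entries, which fill an n x n matrix (not num x num).
    return np.reshape(S, (n, n))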
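Because each entry is the dot product of one row of data with another, the whole n x n result can also be computed in one vectorized call as data @ data.T (the Gram matrix), with entry [i, j] equal to np.dot(data[i], data[j]). The sketch below uses random data of the question's shape (129, 222) as a stand-in and checks the vectorized version against a pairwise loop; for that shape the result is only 129 x 129.

```python
import numpy as np
from itertools import product

def pairwise_dots_loop(data):
    """Reference version: one np.dot per (row, row) pair, reshaped to n x n."""
    n = len(data)
    S = np.array([np.dot(i, j) for i, j in product(data, repeat=2)])
    return S.reshape(n, n)

def pairwise_dots(data):
    """Vectorized version: entry [i, j] is np.dot(data[i], data[j])."""
    return data @ data.T

rng = np.random.default_rng(0)
data = rng.random((129, 222))  # stand-in for the question's data

G = pairwise_dots(data)
print(G.shape)  # (129, 129)
print(np.allclose(G, pairwise_dots_loop(data)))  # True
```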

How to bin a series of float values into a histogram in Python?

I have a set of float values (always between 0 and 1) which I want to bin into a histogram,
i.e. each bar in the histogram should cover a sub-range of the values in [0, 0.150).
The data I have looks like this:
0.000
0.005
0.124
0.000
0.004
0.000
0.111
0.112
With my code below, I expect to get a result that looks like this:
[0, 0.005) 5
[0.005, 0.011) 0
...etc..
I tried to do this binning with the code below, but it doesn't seem to work. What's the right way to do it?
#! /usr/bin/env python
import fileinput, math
log2 = math.log(2)
def getBin(x):
    return int(math.log(x+1)/log2)
diffCounts = [0] * 5
for line in fileinput.input():
    words = line.split()
    diff = float(words[0]) * 1000;
    diffCounts[ str(getBin(diff)) ] += 1
maxdiff = [i for i, c in enumerate(diffCounts) if c > 0][-1]
print maxdiff
maxBin = max(maxdiff)
for i in range(maxBin+1):
    lo = 2**i - 1
    hi = 2**(i+1) - 1
    binStr = '[' + str(lo) + ',' + str(hi) + ')'
    print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))

When possible, don't reinvent the wheel. NumPy has everything you need:
#!/usr/bin/env python
import numpy as np
a = np.fromfile(open('file', 'r'), sep='\n')
# [ 0. 0.005 0.124 0. 0.004 0. 0.111 0.112]
# You can set arbitrary bin edges:
bins = [0, 0.150]
hist, bin_edges = np.histogram(a, bins=bins)
# hist: [8]
# bin_edges: [ 0. 0.15]
# Or, if bins is an integer, you can set the number of bins:
bins = 4
hist, bin_edges = np.histogram(a, bins=bins)
# hist: [5 0 0 3]
# bin_edges: [ 0. 0.031 0.062 0.093 0.124]
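For the fixed-width bins the question describes ([0, 0.005), [0.005, 0.010), ...), the edges can be passed explicitly. The sketch below assumes a bin width of 0.005 over [0, 0.150] and uses the eight sample values from the question. np.linspace is used for the edges because np.arange with a float step can drift. np.histogram treats every bin as half-open except the last, which also includes its right edge. A value sitting exactly on an edge, such as 0.005, may land on either side because the edges themselves are rounded floats.

```python
import numpy as np

# Sample values from the question.
a = np.array([0.000, 0.005, 0.124, 0.000, 0.004, 0.000, 0.111, 0.112])

# 30 bins of width 0.005 covering [0, 0.150].
edges = np.linspace(0.0, 0.150, 31)
hist, bin_edges = np.histogram(a, bins=edges)

# Print only the non-empty bins, in the "[lo, hi)  count" layout from the question.
for lo, hi, count in zip(bin_edges[:-1], bin_edges[1:], hist):
    if count:
        print(f"[{lo:.3f}, {hi:.3f})\t{count}")
```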

from pylab import *
data = []
inf = open('pulse_data.txt')
for line in inf:
    data.append(float(line))
inf.close()
#binning
B = 50
minv = min(data)
maxv = max(data)
bincounts = []
for i in range(B+1):
    bincounts.append(0)
for d in data:
    b = int((d - minv) / (maxv - minv) * B)
    bincounts[b] += 1
# plot histogram
plot(bincounts,'o')
show()

The first error is:
Traceback (most recent call last):
File "C:\foo\foo.py", line 17, in <module>
diffCounts[ str(getBin(diff)) ] += 1
TypeError: list indices must be integers
Why are you converting an int to a str when an int is needed? Fix that, and then we get:
Traceback (most recent call last):
File "C:\foo\foo.py", line 17, in <module>
diffCounts[ getBin(diff) ] += 1
IndexError: list index out of range
because you've only made 5 buckets. I don't understand your bucketing scheme, but let's make it 50 buckets and see what happens:
6
Traceback (most recent call last):
File "C:\foo\foo.py", line 21, in <module>
maxBin = max(maxdiff)
TypeError: 'int' object is not iterable
maxdiff is already a single value taken from your list of ints, so what is max doing here? Remove it, and now we get:
6
Traceback (most recent call last):
File "C:\foo\foo.py", line 28, in <module>
print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))
TypeError: argument 2 to map() must support iteration
Sure enough, you're using a single value as the second argument to map. Let's simplify the last two lines from this:
binStr = '[' + str(lo) + ',' + str(hi) + ')'
print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))
to this:
print "[%f, %f)\t%r" % (lo, hi, diffCounts[i])
Now it prints:
6
[0.000000, 1.000000) 3
[1.000000, 3.000000) 0
[3.000000, 7.000000) 2
[7.000000, 15.000000) 0
[15.000000, 31.000000) 0
[31.000000, 63.000000) 0
[63.000000, 127.000000) 3
I'm not sure what else to do here, since I don't really understand the bucketing you are hoping to use. It seems to involve binary powers, but isn't making sense to me...
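For reference, the fixes described in this answer can be combined into one Python 3 sketch. It keeps the question's power-of-two bucketing (after multiplying by 1000, bucket i covers [2**i - 1, 2**(i+1) - 1)), uses an integer index, makes 50 buckets, drops the max call and prints each bucket with a format string. It reads the sample values from a list instead of fileinput, an assumption made so the sketch runs on its own.

```python
import math

def get_bin(x):
    """Power-of-two bucket index, as in the question's getBin."""
    return int(math.log2(x + 1))

values = [0.000, 0.005, 0.124, 0.000, 0.004, 0.000, 0.111, 0.112]

diff_counts = [0] * 50
for v in values:
    diff_counts[get_bin(v * 1000)] += 1  # integer index, not str

max_bin = [i for i, c in enumerate(diff_counts) if c > 0][-1]
for i in range(max_bin + 1):
    lo = 2**i - 1
    hi = 2**(i + 1) - 1
    print("[%d, %d)\t%d" % (lo, hi, diff_counts[i]))
```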
