I have a numpy array like this:
foo_array = [38,26,14,55,31,0,15,8,0,0,0,18,40,27,3,19,0,49,29,21,5,38,29,17,16]
I want to replace all the zeros with the median value of the whole array (the zero values should not be included in the calculation of the median).
So far I have this going on:
foo_array = [38,26,14,55,31,0,15,8,0,0,0,18,40,27,3,19,0,49,29,21,5,38,29,17,16]
foo = np.array(foo_array)
foo = np.sort(foo)
print "foo sorted:",foo
#foo sorted: [ 0 0 0 0 0 3 5 8 14 15 16 17 18 19 21 26 27 29 29 31 38 38 40 49 55]
nonzero_values = foo[0::] > 0
nz_values = foo[nonzero_values]
print "nonzero_values?:",nz_values
#nonzero_values?: [ 3 5 8 14 15 16 17 18 19 21 26 27 29 29 31 38 38 40 49 55]
size = np.size(nz_values)
middle = size / 2
print "median is:",nz_values[middle]
#median is: 26
Is there a clever way to achieve this with numpy syntax?
Thank you
This solution takes advantage of numpy.median:
import numpy as np
foo_array = [38,26,14,55,31,0,15,8,0,0,0,18,40,27,3,19,0,49,29,21,5,38,29,17,16]
foo = np.array(foo_array)
# Compute the median of the non-zero elements
m = np.median(foo[foo > 0])
# Assign the median to the zero elements
foo[foo == 0] = m
Just a note of caution: the median of your array (with the zeros excluded) is 23.5, but because foo has an integer dtype, the assignment above stores 23.
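If the fractional median should be kept, one option is to work on a float copy of the array before assigning. This is only a minimal sketch, assuming a float result is acceptable; the name foo_float is illustrative and not part of the original code.

```python
import numpy as np

foo = np.array([38, 26, 14, 55, 31, 0, 15, 8, 0, 0, 0, 18, 40, 27, 3,
                19, 0, 49, 29, 21, 5, 38, 29, 17, 16])

# Work on a float copy so the median is stored without truncation
foo_float = foo.astype(float)

# Median of the non-zero values only
m = np.median(foo_float[foo_float > 0])

# Replace every zero with that median
foo_float[foo_float == 0] = m
print(m)
```

The original integer array foo is left untouched, because astype returns a new array.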
foo2 = foo.copy()  # foo[:] would only be a view, so changes would also alter foo
foo2[foo2 == 0] = nz_values[middle]
Instead of foo2, you could just update foo if you want. Numpy's smart array syntax can combine a few lines of the code you made. For example, instead of,
nonzero_values = foo[0::] > 0
nz_values = foo[nonzero_values]
You can just do
nz_values = foo[foo > 0]
You can find out more about "fancy indexing" in the documentation.
I am writing code to generate the Ulam spiral diagonal numbers, and this is the code I wrote myself:
t = 1
i = 2
H = [1]
while i < 25691:
    for n in range(4):
        t += i
        H.append(t)
    i += 2
print(H)
The number 25691 in the code is the side length of the spiral. If it were 7, the spiral would contain 49 numbers, etc.
Here H will give you all the numbers on the diagonals. But I wonder whether there is a much faster way to do this.
For example, if I increase the side length by a large amount, it takes forever to calculate the next H.
Code Example:
t = 1
i = 2
H = [1]
for j in range(25000,26000):
    while i < j:
        for n in range(4):
            t += i
            H.append(t)
        i += 2
My computer cannot finish this calculation, so is there a faster way to do this?
You don't need to calculate the intermediate values:
Diagonal, horizontal, and vertical lines in the number spiral correspond to polynomials of the form
f(n) = 4*n*n + b*n + c
where b and c are integer constants.
(Wikipedia)
You can find b and c by solving a system of two linear equations, one for each of two known numbers on the line.
17 16 15 14 13
18 5 4 3 12 ..
19 6 1 2 11 28
20 7 8 9 10 27
21 22 23 24 25 26
E.g., for the line 1, 2, 11, 28, etc.:
f(0) = 4*0*0+0*b+c = 1 => c = 1
f(1) = 4*1*1+1*b+1 = 2 => 5+b = 2 => b = -3
f(2) = 4*2*2+2*(-3)+1 = 11
f(3) = 4*3*3+3*(-3)+1 = 28
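The same idea can be applied to the diagonal list H from the question. The sketch below assumes that ring k of the spiral (side length 2k + 1) starts counting from the odd square (2k - 1)^2 and that its four corners are reached in steps of 2k, which is what the question's loop does. The function name spiral_diagonals is made up for this example.

```python
import numpy as np

def spiral_diagonals(side):
    """Return 1 followed by the four corner values of every ring below `side`."""
    # Rings 1 .. (side - 1) // 2 are the ones the question's loop visits (i = 2k < side)
    k = np.arange(1, (side - 1) // 2 + 1, dtype=np.int64)
    j = np.arange(1, 5, dtype=np.int64)  # the four corners of each ring
    # Corner j of ring k: previous odd square plus j steps of length 2k
    corners = (2 * k[:, None] - 1) ** 2 + 2 * k[:, None] * j
    return np.concatenate(([1], corners.ravel()))

H = spiral_diagonals(7)
print(H)
```

Because each corner is computed directly from k and j, no running total is needed, and the whole list for a large side length is built in one array operation.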
I have a numpy array with the following integer numbers:
[10 30 16 18 24 18 30 30 21 7 15 14 24 27 14 16 30 12 18]
I want to normalize them to a range between 1 and 10.
I know that the general formula to normalize arrays is:
x_normalized = (x - min(x)) / (max(x) - min(x))
But how am I supposed to scale them between 1 and 10?
Question: What is the simplest/fastest way to normalize this array to values between 1 and 10?
Your range is actually 9 long: from 1 to 10. If you multiply the normalized array by 9 you get values from 0 to 9, which you need to shift back by 1:
start = 1
end = 10
width = end - start
res = (arr - arr.min())/(arr.max() - arr.min()) * width + start
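Note that the denominator here is the peak-to-peak range, which NumPy provides as np.ptp(arr) (older versions also had the method form arr.ptp(), which was removed in NumPy 2.0):
res = (arr - arr.min())/np.ptp(arr) * width + start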
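The formula above can be wrapped in a small helper. This is only a sketch: the function name rescale and its default arguments are illustrative, and it assumes the array is not constant (otherwise the range is zero and the division fails).

```python
import numpy as np

def rescale(arr, start=1.0, end=10.0):
    """Linearly map arr so its minimum becomes `start` and its maximum becomes `end`."""
    arr = np.asarray(arr, dtype=float)
    span = np.ptp(arr)  # max - min; assumed to be non-zero
    return (arr - arr.min()) / span * (end - start) + start

arr = np.array([10, 30, 16, 18, 24, 18, 30, 30, 21, 7,
                15, 14, 24, 27, 14, 16, 30, 12, 18])
res = rescale(arr)
print(res.min(), res.max())
```

With the array from the question, the smallest value (7) maps to 1 and the largest (30) maps to 10.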
I'm very new to Python and to coding in general. I'm trying to complete this assignment, and there's just one more thing I need help with to finish it.
The task is to generate a square matrix with user-supplied dimensions and to create a new matrix from it by removing every row and column that contains an element whose absolute value is the maximum absolute value in the matrix.
Here's what I got so far:
import numpy as np
print ("input number of rows in rmatr:")
n = int(input())
print ("input number of columns rmatr:")
m = int(input())
def form_rmatr():
    rmatr = np.ndarray(shape=(n,m),dtype=int)
    for i in range(n):
        for j in range(m):
            rmatr[i,j] = np.random.randint(-50,50)
    return rmatr
a = form_rmatr()
print (a)
b=np.abs(np.max(a))
print ("absolute max value of rmatr = ", b)
max = (0,0)
for i in range(n):
    for j in range(m):
        if np.abs(a[i,j]) == np.abs(b):
            max = i, j
            new_a = np.delete(np.delete(a,[i],0),[j],1)
print(new_a)
Now, it does work, but it removes only one intersection, the first one where it finds an absolute max value. I need it to remove all such intersections. I tried using a while statement instead of if, but obviously the loop just runs forever, since it keeps searching for absolute max values in the original matrix a. The solution I need is probably to put conditions inside the np.delete call, something along the lines of np.delete(np.where...), but I have no idea how to actually write it.
Edit: an example of what it does would be
input number of rows in rmatr rmatr:
8
input number of columns rmatr:
8
[[ 29 -24 -42 14 12 18 -23 44]
[-50 9 -41 -3 -14 30 11 -33]
[ 14 -22 -43 -12 35 42 3 48]
[-26 34 23 -9 47 -5 -33 6]
[-33 29 0 -32 -26 24 -31 1]
[ 15 -31 -40 1 47 30 33 -41]
[ 48 -41 9 44 -4 0 17 -3]
[-32 -23 31 5 -35 3 8 -31]]
absolute max value of rmatr = 48
[[-24 -42 14 12 18 -23 44]
[ 9 -41 -3 -14 30 11 -33]
[-22 -43 -12 35 42 3 48]
[ 34 23 -9 47 -5 -33 6]
[ 29 0 -32 -26 24 -31 1]
[-31 -40 1 47 30 33 -41]
[-23 31 5 -35 3 8 -31]]
It deletes the row and the column that intersect at one occurrence of the number 48.
What I need is for it to delete every row and column that contains 48 or -48. Since there is one more such intersection, I need the result to look like:
[[-24 -42 14 12 18 -23 ]
[ 9 -41 -3 -14 30 11 ]
[ 34 23 -9 47 -5 -33 ]
[ 29 0 -32 -26 24 -31 ]
[-31 -40 1 47 30 33 ]
[-23 31 5 -35 3 8 ]]
NumPy is designed to allow you to vectorize your computations, that is, to express an operation over whole arrays at once instead of element by element. This generally means you should rarely, if ever, need native Python for-loops.
You can do this in 5 lines of code:
a = np.random.randint(-50, 50, size=(n, m), dtype=int)
ne = np.abs(a) != np.abs(a).max()
cols = np.nonzero(ne.all(axis=0))[0]
rows = np.nonzero(ne.all(axis=1))[0]
new_a = a[rows[:, None], cols]
print(a)
[[ -2 20 10 10 -25]
[-15 -24 22 -43 -37]
[-48 29 23 -16 23]
[-26 -25 1 -48 -32]
[ 22 15 -24 -24 -40]]
print(new_a)
[[ 20 10 -25]
[-24 22 -37]
[ 15 -24 -40]]
Here's a walkthrough of the above:
Instead of creating a with a nested for-loop, we can specify its size (really, its shape) directly as a tuple. It is an (n x m) array.
ne is an array with the same shape as a. It is False only in the places that meet your condition, i.e. where a cell's absolute value equals the maximum absolute value of the entire array. (If I'm not interpreting that right, you should be able to revise it easily.)
[[ True True True True True]
[ True True True True True]
[False True True True True]
[ True True True False True]
[ True True True True True]]
Now we need the indexes of rows and columns, respectively, that contain all Trues.
print(rows)
[0 1 4]
print(cols)
[1 2 4]
Finally, you can use a bit of advanced indexing to slice a on both of these 1-dimensional arrays at once.
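The walkthrough can be reproduced on the example matrix shown above. The sketch below is a variation rather than the exact code from the answer: it keeps the row and column selections as boolean masks and combines them with np.ix_, which builds the same two-dimensional selection. The names keep_rows and keep_cols are illustrative.

```python
import numpy as np

# Example matrix taken from the printed output above
a = np.array([[ -2,  20,  10,  10, -25],
              [-15, -24,  22, -43, -37],
              [-48,  29,  23, -16,  23],
              [-26, -25,   1, -48, -32],
              [ 22,  15, -24, -24, -40]])

# True wherever the cell is NOT an absolute maximum
ne = np.abs(a) != np.abs(a).max()

# A row or column is kept only if it contains no absolute maximum
keep_rows = ne.all(axis=1)
keep_cols = ne.all(axis=0)

# np.ix_ turns the two 1-D masks into a row/column cross-selection
new_a = a[np.ix_(keep_rows, keep_cols)]
print(new_a)
```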
Hope this works:
import numpy as np
print ("input number of rows/columns in rmatr:")
n = int(input())
m = n
def form_rmatr():
    rmatr = np.ndarray(shape=(n,m),dtype=int)
    for i in range(n):
        for j in range(m):
            rmatr[i,j] = np.random.randint(-50,50)
    return rmatr
a = form_rmatr()
print (a)
b=np.abs(a).max()
print(b)
print ("absolute max value of rmatr = ", b)
max_rows=[]
max_cols=[]
for i in range(0,n):
    for j in range(0,m):
        if abs(a[i][j])==b:
            max_rows.append(i)
            max_cols.append(j)
a=np.delete(a,max_rows, axis=0)
a=np.delete(a,max_cols, axis=1)
print(a)
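The two loops that collect max_rows and max_cols can also be replaced by np.where, which is close to the np.delete(np.where...) idea mentioned in the question. This is a minimal sketch on a made-up 4x4 matrix (not data from the question), chosen so that one row contains the absolute maximum twice.

```python
import numpy as np

# Made-up example: |9| is the absolute maximum and occurs three times
a = np.array([[5, -9,  1, 9],
              [2,  3,  4, 1],
              [9,  0, -2, 6],
              [7,  8,  1, 3]])

# Row and column indices of every cell whose absolute value is the maximum
rows, cols = np.where(np.abs(a) == np.abs(a).max())

# Delete the affected rows first, then the affected columns
new_a = np.delete(np.delete(a, rows, axis=0), cols, axis=1)
print(new_a)
```

Repeated indices in rows or cols are harmless here, since deleting the same row or column twice has the same effect as deleting it once.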
I'm very new to python, but have been using it to calculate and filter through data. I'm trying to output my array so I can pass it to other programs, but the output is one solid piece of text, with brackets and commas separating it.
I understand there are ways of manipulating this, but I want to understand why my code has output it in this format, and how to make it output it in nice columns instead.
The array was generated with:
#! /usr/bin/env python
import numpy as np
import networkx
import gridData
from scipy.spatial.distance import euclidean
INPUT1=open("test_area.xvg",'r')
INPUT2=open("test_atom.xvg",'r')
OUTPUT1= open("negdist.txt",'w')
area = []
pointneg = []
posneg = []
negdistance =[ ]
negresarea = []
while True:
    line = INPUT1.readline()
    if not line:
        break
    col = line.split()
    if col:
        area.append(((col[0]),float(col[1])))
pointneg.append((-65.097000,5.079000,-9.843000))
while True:
    line = INPUT2.readline()
    if not line:
        break
    col = line.split()
    if col:
        pointneg.append((float(col[5]),float(col[6]),float(col[7])))
        posneg.append((col[4]))
for col in posneg:
    negresarea.append(area[int(col)-1][1])
a=len(pointneg)
for x in xrange(a-1):
    negdistance.append((-1,(negresarea[x]),euclidean((pointneg[0]),(pointneg[x]))))
print >> OUTPUT1, negdistance
example output:
[(-1, 1.22333, 0.0), (-1, 1.24223, 153.4651968428021), (-1, 1.48462, 148.59335545709976), (-1, 1.39778, 86.143305392816202), (-1, 0.932278, 47.914688322058403), (-1, 1.04997, 28.622555546282022),
desired output:
[-1, 1.22333, 0.0
-1, 1.24223, 153.4651968428021
-1, 1.48462, 148.59335545709976
-1, 1.39778, 86.143305392816202
-1, 0.932278, 47.914688322058403
-1, 1.04997, 28.622555546282022...
Example inputs:
example input1
1 2.12371 0
2 1.05275 0
3 0.865794 0
4 0.933986 0
5 1.09092 0
6 1.22333 0
7 1.54639 0
8 1.24223 0
9 1.10928 0
10 1.16232 0
11 0.60942 0
12 1.40117 0
13 1.58521 0
14 1.00011 0
15 1.18881 0
16 1.68442 0
17 0.866275 0
18 1.79196 0
19 1.4375 0
20 1.198 0
21 1.01645 0
22 1.82221 0
23 1.99409 0
24 1.0728 0
25 0.679654 0
26 1.15578 0
27 1.28326 0
28 1.00451 0
29 1.48462 0
30 1.33399 0
31 1.13697 0
32 1.27483 0
33 1.18738 0
34 1.08141 0
35 1.15163 0
36 0.93699 0
37 0.940171 0
38 1.92887 0
39 1.35721 0
40 1.85447 0
41 1.39778 0
42 1.97309 0
Example Input2
ATOM 35 CA GLU 6 56.838 -5.202 -102.459 1.00273.53 C
ATOM 55 CA GLU 8 54.729 -6.650 -96.930 1.00262.73 C
ATOM 225 CA GLU 29 5.407 -2.199 -58.801 1.00238.62 C
ATOM 321 CA GLU 41 -24.633 -0.327 -34.928 1.00321.69 C
The problem is the multiple parentheses when you append: you are appending tuples.
What you want is to append lists, i.e. the ones with square brackets.
import numpy as np
area = []
with open('example2.txt') as filehandle:
    for line in filehandle:
        if line.strip() == '':
            continue
        line = line.strip().split(',')
        area.append([int(line[0]),float(line[1]),float(line[2])])
area = np.array(area)
print(area)
'example2.txt' is the data you provided made into a csv
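As for why the output is one solid piece of text: printing a list writes the text representation of the whole list, brackets and commas included, on a single line. A minimal sketch of writing one row per line instead, using the first rows of the example output as sample data (the names lines and text are illustrative):

```python
# Sample rows copied from the example output in the question
negdistance = [(-1, 1.22333, 0.0),
               (-1, 1.24223, 153.4651968428021),
               (-1, 1.48462, 148.59335545709976)]

# Join the values of each row with ", " and put every row on its own line
lines = [", ".join(str(value) for value in row) for row in negdistance]
text = "\n".join(lines)
print(text)
```

The resulting text can be written to the output file in place of the list itself.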
I didn't really get an answer that enabled me to understand the problem; the one suggested above just prevented the whole code from working properly. I did find a workaround by including the print command in the loop that builds my final output.
for x in xrange(a-1):
    negdistance.append((-1,(negresarea[x]),euclidean((pointneg[0]),(pointneg[x]))))
    print negdistance
    negdistance =[]
I have data of the following form in a text file.
Text file entry
#x y z
1 1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64 512
9 81 729
10 100 1000
11 121
12 144 1728
13 169
14 196
15 225
16 256 4096
17 289
18 324
19 361 6859
20 400
21 441 9261
22 484
23 529 12167
24 576
25 625
Some of the entries in the third column are empty. I am trying to create an array of x (column 1) and z (column 3) ignoring nan. Let the array be B. The contents of B should be:
1 1
8 512
9 729
10 1000
12 1728
16 4096
19 6859
21 9261
23 12167
I tried doing this using the code:
import numpy as np
A = np.genfromtxt('data.dat', comments='#', delimiter='\t')
B = []
for i in range(len(A)):
    if ~ np.isnan(A[i, 2]):
        B = np.append(B, np.column_stack((A[i, 0], A[i, 2])))
print B.shape
This does not work. It creates a column vector. How can this be done in Python?
Using pandas would make your life much easier (note the regular expression used to define the delimiter):
import numpy as np
from pandas import read_csv
data = read_csv('data.dat', delimiter=r'\s+').values
print(data[~np.isnan(data[:, 2])][:, [0, 2]])
Which results in:
array([[ 8.00000000e+00, 5.12000000e+02],
[ 9.00000000e+00, 7.29000000e+02],
[ 1.00000000e+01, 1.00000000e+03],
[ 1.20000000e+01, 1.72800000e+03],
[ 1.60000000e+01, 4.09600000e+03],
[ 1.90000000e+01, 6.85900000e+03],
[ 2.10000000e+01, 9.26100000e+03],
[ 2.30000000e+01, 1.21670000e+04]])
If you read your data.dat file and assign its content to a variable, say data, you can iterate over the lines, split them, and process only the ones that have 3 elements:
B=[]
for line in data.split('\n'):
    if len(line.split()) == 3:
        x,y,z = line.split()
        B.append((x,z)) # or B.append(str(x)+'\t'+str(z)+'\n')
                        # or any other format you need
The functions provided by the libraries are not always easy to use, as you found out. The following program does it manually and creates an array with the values from the data file.
import numpy as np

def main():
    # Start with an empty (0, 2) array and stack one [x, z] row at a time
    B = np.empty([0, 2], dtype=int)
    with open("data.dat") as inf:
        for line in inf:
            if line[0] == "#":
                continue  # skip the header comment
            l = line.split()
            if len(l) == 3:
                # keep x (column 1) and z (column 3)
                l = [int(l[0]), int(l[2])]
                B = np.vstack((B, l))
    print(B.shape)
    print(B)
    return 0

if __name__ == '__main__':
    main()
Note that:
1) The append() function works on lists, not on arrays - at least not in the syntax you used. The easiest way to extend arrays is 'piling' rows, using vstack (or hstack for columns)
2) Specifying a delimiter in genfromtxt() can come back to bite you. By default the delimiter is any whitespace, which is normally what you want.
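Once the data is in an array with NaN in the empty cells, the filtering can also be done without a loop, using a boolean mask. This is a minimal sketch on a hand-built array that stands in for the first few rows of the file; it does not read data.dat, and the name mask is illustrative.

```python
import numpy as np

# Stand-in for the parsed file: NaN marks an empty third column
A = np.array([[1,  1,   1],
              [2,  4,   np.nan],
              [3,  9,   np.nan],
              [8,  64,  512],
              [9,  81,  729]])

# True for rows whose third column holds a number
mask = ~np.isnan(A[:, 2])

# Keep those rows, then pick columns x (index 0) and z (index 2)
B = A[mask][:, [0, 2]]
print(B.shape)
print(B)
```

The result stays two-dimensional, with one [x, z] pair per row, which is the shape the question was trying to build.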
From your input dataframe:
In [33]: df.head()
Out[33]:
x y z
0 1 1 1
1 2 4 NaN
2 3 9 NaN
3 4 16 NaN
4 5 25 NaN
you can get to the output dataframe B by doing this:
In [34]: df.dropna().head().drop('y', axis=1)
Out[34]:
x z
0 1 1
7 8 512
8 9 729
9 10 1000
11 12 1728