Trying to add a column to a data file - python

I have a data file with 2 columns, x ranging from -5 to 4 and f(x). I need to add a third column with |f(x)| the absolute value of f(x). Then I need to export the 3 columns as a new data file.
Currently my code looks like this:
from numpy import *
data = genfromtxt("task1.dat")
c = []
ab = abs(data[:,1])
ablist = ab.tolist()
datalist = data.tolist()
c.append(ablist)
c.append (datalist)
A = asarray (c)
savetxt("task1b.dat", A)
It gives me the following error message for line "A = asarray(c)":
ValueError: setting an array element with a sequence.
Does someone know a quick and efficient way to add this column and export the data file?

You are getting a list within a list in c.
Anyway, I think this is much clearer:
import numpy as np
data = np.genfromtxt("task1.dat")
data_new = np.hstack((data, np.abs(data[:,-1]).reshape((-1,1))))
np.savetxt("task_out.dat", data_new)

c is a list and when you execute
c.append(ablist)
c.append (datalist)
it appends two lists of different shapes to the list c. It will probably end up looking like this
c == [ [....], [[....],[....]] ]
which numpy.asarray cannot turn into an array because of that shape difference
(I am saying "probably" because I am assuming genfromtxt("task1.dat") returns a 2D array).
what you can do to concatenate the columns is
from numpy import *
data = genfromtxt("task1.dat")
ab = abs(data[:,1])
c = concatenate((data, ab.reshape(-1,1)), axis=1)
savetxt("task1b.dat", c)

data is a 2d array like:
In [54]: data=np.arange(-5,5).reshape(5,2)
In [55]: data
Out[55]:
array([[-5, -4],
[-3, -2],
[-1, 0],
[ 1, 2],
[ 3, 4]])
In [56]: ab=abs(data[:,1])
There are various ways to concatenate 2 arrays. In this case, data is 2d, and ab is 1d, so you have to take some steps to ensure they are both 2d. np.column_stack does that for us.
In [58]: np.column_stack((data,ab))
Out[58]:
array([[-5, -4, 4],
[-3, -2, 2],
[-1, 0, 0],
[ 1, 2, 2],
[ 3, 4, 4]])
With a little change in indexing we could make ab a column array from that start, and simply concatenate on the 2nd axis:
ab=abs(data[:,[1]])
np.concatenate((data,ab),axis=1)
==================
The same numbers with your tolist produce a c like
In [72]: [ab.tolist()]+[data.tolist()]
Out[72]: [[4, 2, 0, 2, 4], [[-5, -4], [-3, -2], [-1, 0], [1, 2], [3, 4]]]
That is not good input for array.
To go the list route you need to do an iteration over a zip:
In [86]: list(zip(data,ab))
Out[86]:
[(array([-5, -4]), 4),
(array([-3, -2]), 2),
(array([-1, 0]), 0),
(array([1, 2]), 2),
(array([3, 4]), 4)]
In [87]: c=[]
In [88]: for i,j in zip(data,ab):
   ....:     c.append(i.tolist()+[j])
   ....:
In [89]: c
Out[89]: [[-5, -4, 4], [-3, -2, 2], [-1, 0, 0], [1, 2, 2], [3, 4, 4]]
In [90]: np.array(c)
Out[90]:
array([[-5, -4, 4],
[-3, -2, 2],
[-1, 0, 0],
[ 1, 2, 2],
[ 3, 4, 4]])
Obviously this will be slower than the array concatenate, but studying this might help you understand both arrays and lists.
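If you want to check that speed claim yourself, a rough timeit sketch (exact numbers depend on the array size and your machine):
import numpy as np
import timeit

data = np.arange(-5, 5).reshape(5, 2)
ab = np.abs(data[:, 1])

# column_stack versus building the rows as Python lists
print(timeit.timeit(lambda: np.column_stack((data, ab)), number=10000))
print(timeit.timeit(lambda: np.array([row.tolist() + [v] for row, v in zip(data, ab)]),
                    number=10000))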

Related

Numpy.where used with list of values

I have a 2d array and a 1d array. For each value in the 1d array I want to find the rows of the 2d array that contain it at least once (here each value appears in two rows), as follows:
import numpy as np
A = np.array([[0, 3, 1],
[9, 4, 6],
[2, 7, 3],
[1, 8, 9],
[6, 2, 7],
[4, 8, 0]])
B = np.array([0,1,2,3])
results = []
for elem in B:
    results.append(np.where(A==elem)[0])
This works and results in the following array:
[array([0, 5], dtype=int64),
array([0, 3], dtype=int64),
array([2, 4], dtype=int64),
array([0, 2], dtype=int64)]
But this is probably not the best way of proceeding. Following the answers given in this question (Search Numpy array with multiple values) I tried the following solutions:
out1 = np.where(np.in1d(A, B))
num_arr = np.sort(B)
idx = np.searchsorted(B, A)
idx[idx==len(num_arr)] = 0
out2 = A[A == num_arr[idx]]
But these give me incorrect values:
In [36]: out1
Out[36]: (array([ 0, 1, 2, 6, 8, 9, 13, 17], dtype=int64),)
In [37]: out2
Out[37]: array([0, 3, 1, 2, 3, 1, 2, 0])
Thanks for your help
If you need to know whether each row of A contains ANY element of array B, regardless of which particular element of B it is, the following can be used:
input:
np.isin(A,B).sum(axis=1)>0
output:
array([ True, False, True, True, True, True])
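Equivalently (a small sketch), .any(axis=1) expresses the same per-row test without the sum, and np.where then gives the row indices:
np.isin(A, B).any(axis=1)
# -> array([ True, False,  True,  True,  True,  True])
np.where(np.isin(A, B).any(axis=1))[0]
# -> array([0, 2, 3, 4, 5])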
Since you're dealing with a 2D array* you can use broadcasting to compare B with a raveled version of A. This will give you the respective indices in a raveled shape. Then you can reverse the result and get the corresponding indices in the original array using np.unravel_index.
In [50]: d = np.where(B[:, None] == A.ravel())[1]
In [51]: np.unravel_index(d, A.shape)
Out[51]: (array([0, 5, 0, 3, 2, 4, 0, 2]), array([0, 2, 2, 0, 0, 1, 1, 2]))
The first array here holds the row indices, which is the expected result; the second array holds the corresponding column indices.
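If you also want the matches grouped per element of B, as in the original loop, the first array returned by np.where tells you which element of B each match came from; a short sketch:
match_b, flat = np.where(B[:, None] == A.ravel())
rows, cols = np.unravel_index(flat, A.shape)
[rows[match_b == i] for i in range(len(B))]
# -> [array([0, 5]), array([0, 3]), array([2, 4]), array([0, 2])]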
* From documentation: For 3-dimensional arrays this is certainly efficient in terms of lines of code, and, for small data sets, it can also be computationally efficient. For large data sets, however, the creation of the large 3-d array may result in sluggish performance.
Also, Broadcasting is a powerful tool for writing short and usually intuitive code that does its computations very efficiently in C. However, there are cases when broadcasting uses unnecessarily large amounts of memory for a particular algorithm. In these cases, it is better to write the algorithm's outer loop in Python. This may also produce more readable code, as algorithms that use broadcasting tend to become more difficult to interpret as the number of dimensions in the broadcast increases.
Is something like this what you are looking for?
import numpy as np
from itertools import combinations
A = np.array([[0, 3, 1],
[9, 4, 6],
[2, 7, 3],
[1, 8, 9],
[6, 2, 7],
[4, 8, 0]])
B = np.array([0,1,2,3])
for i in combinations(A, 2):
    if np.all(np.isin(B, np.hstack(i))):
        print(i[0], ' ', i[1])
which prints the following:
[0 3 1] [2 7 3]
[0 3 1] [6 2 7]
note: this solution does NOT require the rows to be consecutive. Please let me know if that is required.
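Purely as a hedged sketch of the adjacent-rows variant mentioned in that note (with this particular A and B it happens to print nothing):
for pair in zip(A, A[1:]):                    # only consecutive row pairs
    if np.all(np.isin(B, np.hstack(pair))):
        print(pair[0], ' ', pair[1])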

Put numpy arrays split with np.split() back together

I have split a numpy array like so:
x = np.random.randn(10,3)
x_split = np.split(x,5)
which splits x equally into five numpy arrays each with shape (2,3) and puts them in a list. What is the best way to combine a subset of these back together (e.g. x_split[:k] and x_split[k+1:]) so that the resulting shape is similar to the original x i.e. (something,3)?
I found that for k > 0 this is possible if you do:
np.vstack((np.vstack(x_split[:k]),np.vstack(x_split[k+1:])))
but this does not work when k = 0 as x_split[:0] = [] so there must be a better and cleaner way. The error message I get when k = 0 is:
ValueError: need at least one array to concatenate
The comment by Paul Panzer is right on target, but since NumPy now gently discourages vstack, here is the concatenate version:
x = np.random.randn(10, 3)
x_split = np.split(x, 5, axis=0)
k = 0
np.concatenate(x_split[:k] + x_split[k+1:], axis=0)
Note the explicit axis argument passed both times (it has to be the same); this makes it easy to adapt the code to work for other axes if needed. E.g.,
x_split = np.split(x, 3, axis=1)
k = 0
np.concatenate(x_split[:k] + x_split[k+1:], axis=1)
np.r_ can turn several slices into an array of indices.
In [20]: np.r_[0:3, 4:5]
Out[20]: array([0, 1, 2, 4])
In [21]: np.vstack([xsp[i] for i in _])
Out[21]:
array([[9, 7, 5],
[6, 4, 3],
[9, 8, 0],
[1, 2, 2],
[3, 3, 0],
[8, 1, 4],
[2, 2, 5],
[4, 4, 5]])
In [22]: np.r_[0:0, 1:5]
Out[22]: array([1, 2, 3, 4])
In [23]: np.vstack([xsp[i] for i in _])
Out[23]:
array([[9, 8, 0],
[1, 2, 2],
[3, 3, 0],
[8, 1, 4],
[3, 2, 0],
[0, 3, 8],
[2, 2, 5],
[4, 4, 5]])
Internally np.r_ has a lot of ifs and loops to handle the slices and their boundaries, but it hides it all from us.
If xsp (your x_split) were an array, we could do xsp[np.r_[...]], but since it is a list we have to iterate. Alternatively, we can hide that iteration with an operator.itemgetter object.
In [26]: operator.itemgetter(*Out[22])
Out[26]: operator.itemgetter(1, 2, 3, 4)
In [27]: np.vstack(operator.itemgetter(*Out[22])(xsp))
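Putting the np.r_ and itemgetter pieces together into one self-contained sketch (the name keep is just illustrative):
import operator
import numpy as np

x = np.random.randn(10, 3)
xsp = np.split(x, 5)

k = 0
keep = np.r_[0:k, k+1:len(xsp)]                    # indices of the blocks to keep
np.vstack(operator.itemgetter(*keep)(xsp)).shape   # (8, 3)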

Numpy: Subtract 2 numpy arrays row wise

I have 2 numpy arrays a and b as below:
a = np.random.randint(0,10,(3,2))
Out[124]:
array([[0, 2],
[6, 8],
[0, 4]])
b = np.random.randint(0,10,(2,2))
Out[125]:
array([[5, 9],
[2, 4]])
I want to subtract each row in b from each row in a and the desired output is of shape(3,2,2):
array([[[-5, -7], [-2, -2]],
[[ 1, -1], [ 4, 4]],
[[-5, -5], [-2, 0]]])
I can do this using:
print(np.c_[(a - b[0]),(a - b[1])].reshape(3,2,2))
But I need a fully vectorized solution or a built in numpy function to do this.
Just use np.newaxis (which is just an alias for None) to add a singleton dimension to a, and let broadcasting do the rest:
In [45]: a[:, np.newaxis] - b
Out[45]:
array([[[-5, -7],
[-2, -2]],
[[ 1, -1],
[ 4, 4]],
[[-5, -5],
[-2, 0]]])
I'm not sure what a fully vectorized solution means, but maybe this will help:
np.append(a, a, axis=1).reshape(3, 2, 2) - b
You can shave a little time off using np.subtract(), and a good bit more using np.concatenate()
import numpy as np
import time
start = time.time()
for i in range(100000):
    a = np.random.randint(0,10,(3,2))
    b = np.random.randint(0,10,(2,2))
    c = np.c_[(a - b[0]),(a - b[1])].reshape(3,2,2)
print(time.time() - start)

start = time.time()
for i in range(100000):
    a = np.random.randint(0,10,(3,2))
    b = np.random.randint(0,10,(2,2))
    c = np.c_[np.subtract(a,b[0]),np.subtract(a,b[1])].reshape(3,2,2)
print(time.time() - start)

start = time.time()
for i in range(100000):
    a = np.random.randint(0,10,(3,2))
    b = np.random.randint(0,10,(2,2))
    c = np.concatenate([np.subtract(a,b[0]),np.subtract(a,b[1])],axis=1).reshape(3,2,2)
print(time.time() - start)
>>>
3.14023900032
3.00368094444
1.16146492958
reference:
confused about numpy.c_ document and sample code
np.c_ is another way of doing array concatenation
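A minimal illustration of that equivalence, reusing a and b from the question:
np.c_[a - b[0], a - b[1]]                       # columns stacked side by side
np.concatenate([a - b[0], a - b[1]], axis=1)    # same (3, 4) result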
Reading from the doc on broadcasting, it says:
When operating on two arrays, NumPy compares their shapes
element-wise. It starts with the trailing dimensions, and works its
way forward. Two dimensions are compatible when
they are equal, or
one of them is 1
Back to your case: you want the result to be of shape (3, 2, 2), so following these rules you have to play around with your dimensions.
Here's the code to do it:
In [1]: a_ = np.expand_dims(a, axis=0)
In [2]: b_ = np.expand_dims(b, axis=1)
In [3]: c = a_ - b_
In [4]: c
Out[4]:
array([[[-5, -7],
[ 1, -1],
[-5, -5]],
[[-2, -2],
[ 4, 4],
[-2, 0]]])
In [5]: result = c.swapaxes(1, 0)
In [6]: result
Out[6]:
array([[[-5, -7],
[-2, -2]],
[[ 1, -1],
[ 4, 4]],
[[-5, -5],
[-2, 0]]])
In [7]: result.shape
Out[7]: (3, 2, 2)
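For comparison, a short sketch that avoids the final swap by expanding a along a new middle axis instead (equivalent to the a[:, np.newaxis] - b answer above):
result = np.expand_dims(a, axis=1) - b   # (3, 1, 2) and (2, 2) broadcast to (3, 2, 2)
result.shape                             # (3, 2, 2)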

I have a numpy array and an array of indexes, how can I access these positions at the same time?

For example, I have a numpy array like this
a =
array([[1, 2, 3],
[4, 3, 2]])
and an index array like this that selects the max values
max_idx =
array([[0, 2],
[1, 0]])
How can I access these positions at the same time in order to modify them?
Something like "a[max_idx] = 0", getting the following:
array([[1, 2, 0],
[0, 3, 2]])
Simply use subscripted-indexing -
a[max_idx[:,0],max_idx[:,1]] = 0
If you are working with higher dimensional arrays and don't want to type out slices of max_idx for each axis, you can use linear-indexing to assign zeros, like so -
a.ravel()[np.ravel_multi_index(max_idx.T,a.shape)] = 0
Sample run -
In [28]: a
Out[28]:
array([[1, 2, 3],
[4, 3, 2]])
In [29]: max_idx
Out[29]:
array([[0, 2],
[1, 0]])
In [30]: a[max_idx[:,0],max_idx[:,1]] = 0
In [31]: a
Out[31]:
array([[1, 2, 0],
[0, 3, 2]])
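And a hypothetical 3-D sketch of the linear-indexing route (the arrays here are made up purely for illustration):
a3 = np.arange(24).reshape(2, 3, 4)
idx3 = np.array([[0, 1, 2],
                 [1, 2, 3]])        # two (i, j, k) index triplets
a3.ravel()[np.ravel_multi_index(idx3.T, a3.shape)] = 0
# a3[0, 1, 2] and a3[1, 2, 3] are now 0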
NumPy supports advanced indexing like this:
a[b[:, 0], b[:, 1]] = 0
The code above fits your requirement.
If b is more than 2-D, a better way is:
a[np.split(b, 2, axis=1)]
np.split splits the ndarray into columns here.
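To make that concrete with max_idx from the question (what this answer calls b); note that recent NumPy versions want a tuple rather than a list of index arrays, so wrapping the split result in tuple() is safer:
np.split(max_idx, 2, axis=1)
# -> [array([[0], [1]]), array([[2], [0]])]
a[tuple(np.split(max_idx, 2, axis=1))] = 0   # sets a[0, 2] and a[1, 0] to 0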

How to get a value from every column in a Numpy matrix

I'd like to get the index of a value for every column in a matrix M. For example:
M = matrix([[0, 1, 0],
[4, 2, 4],
[3, 4, 1],
[1, 3, 2],
[2, 0, 3]])
In pseudocode, I'd like to do something like this:
for col in M:
    idx = numpy.where(M[col]==0) # Only for columns!
and have idx be 0, 4, 0 for each column.
I have tried to use where, but I don't understand the return value, which is a tuple of matrices.
The tuple of matrices is a collection of items suited for indexing. The output will have the shape of the indexing matrices (or arrays), and each item in the output will be selected from the original array using the first array as the index of the first dimension, the second as the index of the second dimension, and so on. In other words, this:
>>> numpy.where(M == 0)
(matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
>>> row, col = numpy.where(M == 0)
>>> M[row, col]
matrix([[0, 0, 0]])
>>> M[numpy.where(M == 0)] = 1000
>>> M
matrix([[1000, 1, 1000],
[ 4, 2, 4],
[ 3, 4, 1],
[ 1, 3, 2],
[ 2, 1000, 3]])
The sequence may be what's confusing you. It proceeds in flattened order -- so M[0,2] appears second, not third. If you need to reorder them, you could do this:
>>> row[0,col.argsort()]
matrix([[0, 4, 0]])
You also might be better off using arrays instead of matrices. That way you can manipulate the shape of the arrays, which is often useful! Also note ajcr's transpose-based trick, which is probably preferable to using argsort.
Finally, there is also a nonzero method that does the same thing as where in this case. Using the transpose trick now:
>>> (M == 0).T.nonzero()
(matrix([[0, 1, 2]]), matrix([[0, 4, 0]]))
As an alternative to np.where, you could perhaps use np.argwhere to return an array of indexes where the array meets the condition:
>>> np.argwhere(M == 0)
array([[[0, 0]],
[[0, 2]],
[[4, 1]]])
This tells you each the indexes in the format [row, column] where the condition was met.
If you'd prefer the format of this output array to be grouped by column rather than row, (that is, [column, row]), just use the method on the transpose of the array:
>>> np.argwhere(M.T == 0).squeeze()
array([[0, 0],
[1, 4],
[2, 0]])
I also used np.squeeze here to get rid of axis 1, so that we are left with a 2D array. The sequence you want is the second column, i.e. np.argwhere(M.T == 0).squeeze()[:, 1].
The result of where(M == 0) would look something like this:
(matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
The first matrix tells you the rows where the 0s are, and the second matrix tells you the columns.
In [4]: M
Out[4]:
matrix([[0, 1, 0],
[4, 2, 4],
[3, 4, 1],
[1, 3, 2],
[2, 0, 3]])
In [5]: np.where(M == 0)
Out[5]: (matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
In [6]: M[0,0]
Out[6]: 0
In [7]: M[0,2] #0th row 2nd column
Out[7]: 0
In [8]: M[4,1] #4th row 1st column
Out[8]: 0
This isn't anything new over what's already been suggested, but a one-line solution is:
>>> np.where(np.array(M.T)==0)[-1]
array([0, 4, 0])
(I agree that NumPy matrix objects are more trouble than they're worth).
>>> M = np.array([[0, 1, 0],
... [4, 2, 4],
... [3, 4, 1],
... [1, 3, 2],
... [2, 0, 3]])
>>> [np.where(M[:,i]==0)[0][0] for i in range(M.shape[1])]
[0, 4, 0]
