Write many columns (more than 20) - python

I want to know an efficient and simple way to write many columns to a single file with Python.
For example, I have arrays a and b, each holding 20 values per row, for N rows;
each row has different a and b values.
I would like to write a file with a format like this:
Names of each column
0 a[0] b[0] a[1] b[1] ... a[19] b[19]
1 a[0] b[0] a[1] b[1] ... a[19] b[19]
I can only think of doing it this way:
data = open(output_filename, 'w')
for i in range(0, N):
    data.write('{} {} {} ...\n'.format(i, a[0], b[0], ...))

import numpy as np
# Assuming a and b are N x 20 numpy arrays; otherwise convert them first:
# a = np.array(a)
# b = np.array(b)
c = np.zeros((N, 40))
for i in range(20):
    c[:, 2*i] = a[:, i]
    c[:, 2*i + 1] = b[:, i]
np.savetxt("test.txt", c)
This is the simplest way I could think of.
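The column-by-column loop can also be replaced with two strided slice assignments; a minimal sketch, assuming a and b are N x 20 arrays (the arange data below is just a stand-in):

```python
import numpy as np

N = 100
# stand-in data; the question's real a and b are assumed to be N x 20 arrays
a = np.arange(1, N * 20 + 1).reshape(N, 20)
b = -a

c = np.empty((N, 40))
c[:, 0::2] = a   # even-numbered columns take a
c[:, 1::2] = b   # odd-numbered columns take b
np.savetxt("test.txt", c)
```

The slice c[:, 0::2] selects every second column starting at 0, so the whole of a lands in the even columns in one assignment.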

If I understand correctly, you want to interleave two lists. You can do that with zip and some post-processing of it:
>>> a = ['a', 'b', 'c']
>>> b = ['d', 'e', 'f']
>>> print(list(zip(a, b)))
[('a', 'd'), ('b', 'e'), ('c', 'f')]
>>> from itertools import chain
>>> print(list(chain.from_iterable(zip(a, b))))
['a', 'd', 'b', 'e', 'c', 'f']
>>> print(' '.join(chain.from_iterable(zip(a, b))))
a d b e c f
You probably want to apply it like this (converting the values to strings with map(str, ...) so join works on numbers too):
data.write('{} {}\n'.format(i, ' '.join(map(str, chain.from_iterable(zip(a, b))))))
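Put together for the original question, a sketch could look like this (the data and the filename columns.txt are made up for illustration; each row's a and b are assumed to be sequences of numbers):

```python
from itertools import chain

# hypothetical stand-in data: N rows, 20 values per row in each of a and b
N = 3
a = [[row * 100 + k for k in range(20)] for row in range(N)]
b = [[row * 100 + k + 50 for k in range(20)] for row in range(N)]

with open("columns.txt", "w") as data:
    for i in range(N):
        # interleave this row's a and b values: a[0] b[0] a[1] b[1] ...
        interleaved = chain.from_iterable(zip(a[i], b[i]))
        data.write("{} {}\n".format(i, " ".join(map(str, interleaved))))
```

Each output line starts with the row index followed by the 40 interleaved values.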


python: sort array when sorting other array

I have two arrays:
a = np.array([1,3,4,2,6])
b = np.array(['c', 'd', 'e', 'f', 'g'])
These two arrays are linked (in the sense that there is a one-to-one correspondence between their elements), so when I sort a in decreasing order I would like b to be sorted in the same order.
For instance, when I do:
a = np.sort(a)[::-1]
I get:
a = [6, 4, 3, 2, 1]
and I would like to be able to get also:
b = ['g', 'e', 'd', 'f', 'c']
I would do something like this:
import numpy as np
a = np.array([1,3,4,2,6])
b = np.array(['c', 'd', 'e', 'f', 'g'])
idx_order = np.argsort(a)[::-1]
a = a[idx_order]
b = b[idx_order]
output:
a = [6 4 3 2 1]
b = ['g' 'e' 'd' 'f' 'c']
I don't know whether you can do this directly with numpy arrays, but there is a way using standard lists, albeit slightly convoluted. Consider this:
a = [1, 3, 4, 2, 6]
b = ['c', 'd', 'e', 'f', 'g']
assert len(a) == len(b)
c = []
for i in range(len(a)):
    c.append((a[i], b[i]))
r = sorted(c)
for i in range(len(r)):
    a[i], b[i] = r[i]
print(a)
print(b)
In your problem statement there is no explicit link between the two lists. What happens here is that we create one by grouping the corresponding elements of each list into a temporary list of tuples. sorted() then performs an ascending sort on the first element of each tuple (pass reverse=True for the descending order in your example). We then just rebuild the original lists.
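The same idea can be written more compactly with zip: sorted() orders the pairs by their first element, and zip(*...) splits them back into two sequences.

```python
a = [1, 3, 4, 2, 6]
b = ['c', 'd', 'e', 'f', 'g']

pairs = sorted(zip(a, b), reverse=True)   # sort pairs by a, descending
a, b = (list(t) for t in zip(*pairs))     # split back into two lists
print(a)   # [6, 4, 3, 2, 1]
print(b)   # ['g', 'e', 'd', 'f', 'c']
```

This reproduces the descending order asked for in the question.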

functools reduce In-Place modifies original dataframe

I am currently facing the issue that functools.reduce(operator.iadd, ...) alters its original input. E.g.
I have a simple dataframe
df = pd.DataFrame([[['A', 'B']], [['C', 'D']]])
        0
0  [A, B]
1  [C, D]
Applying the iadd operator leads to following result:
functools.reduce(operator.iadd, df[0])
['A', 'B', 'C', 'D']
Now, the original df changed to
              0
0  [A, B, C, D]
1        [C, D]
Also copying the df using df.copy(deep=True) beforehand does not help.
Does anyone have an idea how to overcome this issue?
Thanks, Lazloo
Use operator.add instead of operator.iadd:
In [8]: functools.reduce(operator.add, df[0])
Out[8]: ['A', 'B', 'C', 'D']
In [9]: df
Out[9]:
        0
0  [A, B]
1  [C, D]
After all, operator.iadd(a, b) is the same as a += b. So it modifies df[0]. In contrast, operator.add(a, b) returns a + b, so there is no modification of df[0].
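The mutation is easy to demonstrate on a plain list:

```python
import operator

x = [1, 2]
y = operator.add(x, [3])    # returns a new list; x is untouched
z = operator.iadd(x, [3])   # in-place: extends x itself and returns it

print(z is x)   # True: iadd handed back the mutated original
```

After this, x has become [1, 2, 3], while operator.add left it alone and produced a fresh list.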
Or, you could compute the same quantity using df[0].sum():
In [39]: df[0].sum()
Out[39]: ['A', 'B', 'C', 'D']
The docs for df.copy warns:
When deep=True, data is copied but actual Python objects
will not be copied recursively, only the reference to the object.
Since df[0] contains Python lists, the lists are not copied even with df.copy(deep=True). This is why modifying the copy still affects df.
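A quick identity check confirms that the inner lists survive the deep copy; a small sketch, assuming pandas is installed:

```python
import pandas as pd

df = pd.DataFrame([[['A', 'B']], [['C', 'D']]])
df2 = df.copy(deep=True)

# The cell values are the very same list objects in both frames:
print(df2[0][0] is df[0][0])   # True
```

Because the copy only duplicates the references, mutating a list through df2 is visible through df as well.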
In addition to unutbu's good answer, you can also call list.__add__ explicitly; like operator.add, it returns a new list instead of mutating its operands:
df = pd.DataFrame([[['A', 'B']], [['C', 'D']]])
functools.reduce(lambda x, y: x.__add__(y), df[0])
print(df)
and you can see that df is unchanged:
        0
0  [A, B]
1  [C, D]

Python: finding index of all elements of array in another including repeating arrays

I have an array A of size 100 which may contain repeated elements. I have another array B of size 10 with unique elements; every element of B appears in A and vice versa. I also have an array C of the same size as B, where each element of C corresponds to the element of B at the same index.
I want to create an array A2 composed of elements of C, such that I can achieve the following:
import numpy as np
A = np.array([1,1,4,5,5,6])
B = np.array([4,6,5,1])
C = np.array(['A','B','C','D'])
I want to create A2 such that:
A2 = np.array(['D','D','A','C','C','B'])
A2 has elements from C based on matching index of elements of B in A.
No need for numpy here. Just zip the B and C arrays into a dict and map the values of A (in Python 3, wrap map in list before building the array):
>>> btoc = dict(zip(B, C))
>>> A2 = np.array(list(map(btoc.get, A)))
>>> A2
array(['D', 'D', 'A', 'C', 'C', 'B'], dtype='<U1')
Here's a NumPythonic approach using np.searchsorted -
sidx = B.argsort()
out = C[sidx[np.searchsorted(B,A,sorter = sidx)]]
Sample run -
In [17]: A = np.array([1,1,4,5,5,6])
    ...: B = np.array([4,6,5,1])
    ...: C = np.array(['A','B','C','D'])

In [18]: sidx = B.argsort()

In [19]: C[sidx[np.searchsorted(B, A, sorter=sidx)]]
Out[19]: array(['D', 'D', 'A', 'C', 'C', 'B'], dtype='<U1')
The numpy_indexed package (disclaimer: I am its author) contains functionality to do this in a single call; npi.indices, which is a vectorized equivalent of list.index.
import numpy as np
A = np.array([1,1,4,5,5,6])
B = np.array([4,6,5,1])
C = np.array(['A','B','C','D'])
import numpy_indexed as npi
i = npi.indices(B, A)
print(C[i])
Performance should be similar to the solution of Divakar, since it operates along the same lines; but all wrapped up in a convenient package with tests and all.

Loop print of 2 columns into 1

Suppose I have an array with 2 columns. It looks like this:
column1 = [1, 2, 3, ..., 830]
column2 = [a, b, c, ...]
I want to print a single column that interleaves the values of both columns one by one, in the form: column = [1, a, 2, b, ...]
I tried to do by this code,
dat0 = np.genfromtxt("\", delimiter = ',')
mu = dat0[:,0]
A = dat0[:,1]
print(mu, A)
R = np.arange(0, 829, 1)
l = len(mu)
K = np.zeros((l, 1))
txtfile = open("output_all.txt", 'w')
for x in mu:
    i = 0
    K[i,0] = x
    dat0[i,1] = M
    txtfile.write(str(x))
    txtfile.write('\n')
    txtfile.write(str(M))
    txtfile.write('\n')
print K
I do not understand your code completely; is the reference to numpy really relevant to your question? What is M?
If you have two lists of the same lengths you can get pairs of elements using the zip builtin.
A = [1, 2, 3]
B = ['a', 'b', 'c']
for a, b in zip(A, B):
    print(a)
    print(b)
This will print
1
a
2
b
3
c
I'm sure there is a better way to do this, but one method is
>>> a = numpy.array([[1,2,3], ['a','b','c'], ['d','e','f']])
>>> new_a = []
>>> for column in range(0, a.shape[1]):   # a.shape[1] is the number of columns in a
...     for row in range(0, a.shape[0]):  # a.shape[0] is the number of rows in a
...         new_a.append(a[row][column])
...
>>> numpy.array(new_a)
array(['1', 'a', 'd', '2', 'b', 'e', '3', 'c', 'f'], dtype='<U1')
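For what it's worth, the same column-major flattening is a one-liner in NumPy itself: transpose, then ravel (everything becomes a string here because NumPy upcasts the mixed input):

```python
import numpy as np

a = np.array([[1, 2, 3], ['a', 'b', 'c'], ['d', 'e', 'f']])
new_a = a.T.ravel()   # walk the array column by column
print(new_a)
```

a.T swaps rows and columns, so raveling it in the default row-major order reads the original array column by column.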

How to iterate over two lists?

I am trying to do something in pyGTk where I build a list of HBoxes:
self.keyvalueboxes = []
for keyval in range(1,self.keyvaluelen):
self.keyvalueboxes.append(gtk.HBox(False, 5))
But I then want to run over the list and assign A text entry & a label into each one both of which are stored in a list.
If your lists are of equal length, use zip:
>>> x = ['a', 'b', 'c', 'd']
>>> y = [1, 2, 3, 4]
>>> z = list(zip(x, y))
>>> z
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> for l in z:
...     print(l[0], l[1])
...
a 1
b 2
c 3
d 4
Check out http://docs.python.org/library/functions.html#zip. It lets you iterate over two lists at the same time.
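One caveat worth knowing: in Python 3, zip stops at the end of the shorter list. If the two lists can differ in length, itertools.zip_longest pads the shorter one instead:

```python
from itertools import zip_longest

x = ['a', 'b', 'c', 'd']
y = [1, 2]

pairs = list(zip_longest(x, y, fillvalue=None))
print(pairs)   # [('a', 1), ('b', 2), ('c', None), ('d', None)]
```

With plain zip the same input would yield only the first two pairs.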
