I have a number of arrays that I would like to write to a text file in a specific format. For example:
'present form'
a= [1 2 3 4 5 ]
b= [ 1 2 3 4 5 6 7 8 ]
c= [ 8 9 10 12 23 43 45 56 76 78]
d= [ 1 2 3 4 5 6 7 8 45 56 76 78 12 23 43 ]
The 'required format' in a txt file,
a '\t' b '\t' d '\t' c
1 '\t' 1
2 '\t' 2
3 '\t' 3
4 '\t' 4
5 '\t' 5
6
7
8
('\t' stands for one tab character.)
The problem is:
My arrays are in linear form ([a], [b], [c] and [d]). I need to transpose them into the 'required format' above, with the columns in the order [a], [b], [d], [c], and write the result to a txt file.
from __future__ import with_statement
import csv
import itertools
a= [1, 2, 3, 4, 5]
b= [1, 2, 3, 4, 5, 6, 7, 8]
c= [8, 9, 10, 12, 23, 43, 45, 56, 76, 78]
d= [1, 2, 3, 4, 5, 6, 7, 8, 45, 56, 76, 78, 12, 23, 43]
with open('destination.txt', 'w') as f:
    cf = csv.writer(f, delimiter='\t')
    cf.writerow(['a', 'b', 'd', 'c'])  # header
    cf.writerows(itertools.izip_longest(a, b, d, c))
Results on destination.txt (<tab>s are in fact real tabs on the file):
a<tab>b<tab>d<tab>c
1<tab>1<tab>1<tab>8
2<tab>2<tab>2<tab>9
3<tab>3<tab>3<tab>10
4<tab>4<tab>4<tab>12
5<tab>5<tab>5<tab>23
<tab>6<tab>6<tab>43
<tab>7<tab>7<tab>45
<tab>8<tab>8<tab>56
<tab><tab>45<tab>76
<tab><tab>56<tab>78
<tab><tab>76<tab>
<tab><tab>78<tab>
<tab><tab>12<tab>
<tab><tab>23<tab>
<tab><tab>43<tab>
Here's the izip_longest function, if you have python < 2.6:
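def izip_longest(*iterables, **kwds):
    # Python 2 has no keyword-only arguments, so read fillvalue from **kwds
    fillvalue = kwds.get('fillvalue')
    def sentinel(counter=([fillvalue]*(len(iterables)-1)).pop):
        yield counter()  # yields fillvalue, or raises IndexError once every input is exhausted
    fillers = itertools.repeat(fillvalue)
    iters = [itertools.chain(it, sentinel(), fillers)
             for it in iterables]
    try:
        for tup in itertools.izip(*iters):
            yield tup
    except IndexError:
        pass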
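On Python 3 no backport is needed, because itertools.izip_longest was renamed itertools.zip_longest. The following is a minimal sketch of the same approach for Python 3; the helper name write_columns and its fill parameter are made up for illustration and are not part of the original answer.

```python
import csv
from itertools import zip_longest


def write_columns(path, headers, columns, fill=""):
    """Write lists of unequal length side by side as tab-separated columns."""
    # newline="" lets the csv module control the line endings itself
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(headers)
        # zip_longest pads the shorter lists with `fill` instead of stopping early
        writer.writerows(zip_longest(*columns, fillvalue=fill))


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5]
    b = [1, 2, 3, 4, 5, 6, 7, 8]
    c = [8, 9, 10, 12, 23, 43, 45, 56, 76, 78]
    d = [1, 2, 3, 4, 5, 6, 7, 8, 45, 56, 76, 78, 12, 23, 43]
    write_columns("destination.txt", ["a", "b", "d", "c"], [a, b, d, c])
```

Passing an explicit fill value keeps genuine zeros in the data distinct from the padding.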
Have a look at matplotlib.mlab.rec2csv and csv2rec:
>>> from matplotlib.mlab import rec2csv,csv2rec
# note: these are also imported automatically when you do ipython -pylab
>>> rec = csv2rec('csv file.csv')
>>> rec2csv(rec, 'copy csv file', delimiter='\t')
Just for fun with no imports:
a= [1, 2, 3, 4, 5]
b= [1, 2, 3, 4, 5, 6, 7, 8]
c= [8, 9, 10, 12, 23, 43, 45, 56, 76, 78]
d= [1, 2, 3, 4, 5, 6, 7, 8, 45, 56, 76, 78, 12, 23, 43]
fh = open("out.txt","w")
# header line
fh.write("a\tb\td\tc\n")
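# rest of file
# Python 2 only: map() pads the shorter lists with None (Python 3 stops at the shortest).
# Note that `elem or ""` also turns a genuine 0 into an empty string.
for i in map(lambda *row: [elem or "" for elem in row], *[a,b,d,c]):
    fh.write("\t".join(map(str,i))+"\n")
fh.close()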
seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(len(seq)):
    print(seq[i],end ="\t")
How do I get my output table to look like this?
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
One of many ways is to iterate over seq in steps of 6 and print the elements in each slice:
seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(0, len(seq), 6):
    print(*seq[i:i+6], sep=' ')
output
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
You probably want to make use of string formatting. Below, f"{seq[i]:<4d}" means "A string of length 4, left-aligned, containing the string representation of seq[i]". If you want to right-align, just remove <.
seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(len(seq)):
    print(f"{seq[i]:<4d}", end = "")
    if not (i+1) % 6:
        print("")
print("")
Output:
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
The simplest relevant technique is padding:
for i in range(0, len(seq), 6):
    print(" ".join(str(k).ljust(2, " ") for k in seq[i: i + 6]))
but string formatting, as in "Printing Lists as Tabular Data", makes for a more sophisticated solution.
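The padding width does not have to be hard-coded. As a rough sketch (format_rows is a hypothetical helper, not something from the answers above), the width can be taken from the widest item so the columns line up whatever the values are:

```python
def format_rows(seq, per_row=6):
    """Return one string per row, with `per_row` right-aligned items each."""
    # width of the widest item, so every column lines up (0 for an empty list)
    width = max((len(str(x)) for x in seq), default=0)
    lines = []
    for start in range(0, len(seq), per_row):
        chunk = seq[start:start + per_row]
        lines.append(" ".join(f"{x:>{width}}" for x in chunk))
    return lines


seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
print("\n".join(format_rows(seq)))
```

Returning the lines instead of printing them directly makes the helper easy to reuse when writing to a file.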
I'd like to know whether these two lines can be combined into one:
data = [ line.strip().split() for line in f ] # f = file
data = [ [ int(num) for num in nums ] for nums in data ]
Example lines of file:
9 3 14 3 10 17
9 8 19 12 5 9
Example result:
[[9, 3, 14, 3, 10, 17], [9, 8, 19, 12, 5, 9]]
Try:
f = open("file.txt", "r")
data = [[int(num) for num in line.split()] for line in f.readlines()]
print(data)
[[9, 3, 14, 3, 10, 17], [9, 8, 19, 12, 5, 9]]
or using numpy can be slightly neater:
import numpy as np
data = np.loadtxt("file.txt", dtype=int).tolist()
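If the file should be closed automatically and may contain blank lines, the one-liner can be wrapped in a small function. This is only a sketch; read_int_rows is a made-up name, and it assumes every non-blank line holds whitespace-separated integers.

```python
def read_int_rows(path):
    """Read whitespace-separated integers, one list per non-blank line."""
    # the with-statement closes the file even if int() raises
    with open(path) as f:
        # split() with no argument also discards the trailing newline
        return [[int(tok) for tok in line.split()] for line in f if line.strip()]
```

Iterating over the file object directly avoids building the intermediate list that readlines() creates.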
I have a matrix (3x5) where a number is randomly selected in this matrix. I want to swap the selected number with the one down-right. I'm able to locate the index of the randomly selected number but not sure how to replace it with the one that is down then right. For example, given the matrix:
[[169 107 229 317 236]
[202 124 114 280 106]
[306 135 396 218 373]]
If the selected number is 280 (at position [1,3]), it needs to be swapped with 373 at [2,4]. I'm having trouble working out how to move around using the index. I can hard-code it, but that becomes more complex when the number to swap is randomly selected.
If the selected number is on [0,0], then hard-coded would look like:
selected_task = tard_generator1[0,0]
right_swap = tard_generator1[1,1]
tard_generator1[1,1] = selected_task
tard_generator1[0,0] = right_swap
Any suggestions are welcome!
How about something like
chosen = (1, 2)
right_down = chosen[0] + 1, chosen[1] + 1
matrix[chosen], matrix[right_down] = matrix[right_down], matrix[chosen]
In an interactive session this gives:
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
>>> index = (1, 2)
>>> right_down = index[0] + 1, index[1] + 1
>>> a[index], a[right_down] = a[right_down], a[index]
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 13, 8, 9],
[10, 11, 12, 7, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
A boundary check is needed when the chosen element is in the last row or last column, but it is omitted here.
Try this:
import numpy as np
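def swap_rdi(mat, index):
    row, col = index
    rows, cols = mat.shape
    # the element must have a down-right neighbour
    assert row + 1 < rows and col + 1 < cols
    mat[row, col], mat[row+1, col+1] = mat[row+1, col+1], mat[row, col]
    return mat  # return the matrix so that the result can be printed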
Example:
mat = np.matrix([[1,2,3], [4,5,6]])
print('Before:\n{}'.format(mat))
print('After:\n{}'.format(swap_rdi(mat, (0,1))))
Outputs:
Before:
[[1 2 3]
[4 5 6]]
After:
[[1 6 3]
[4 5 2]]
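Since the element in the question is chosen at random, one option is to draw the position only from cells that actually have a down-right neighbour, so that the boundary check can never fail. A minimal sketch, assuming a 2-D NumPy array; swap_random_down_right is a hypothetical name:

```python
import numpy as np


def swap_random_down_right(mat, rng=None):
    """Swap a randomly chosen element with its down-right neighbour, in place.

    Returns the (row, col) position that was chosen.
    """
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = mat.shape
    if rows < 2 or cols < 2:
        raise ValueError("need at least a 2x2 array")
    # exclude the last row and the last column: they have no down-right neighbour
    r = int(rng.integers(0, rows - 1))
    c = int(rng.integers(0, cols - 1))
    mat[r, c], mat[r + 1, c + 1] = mat[r + 1, c + 1], mat[r, c]
    return r, c


mat = np.array([[169, 107, 229, 317, 236],
                [202, 124, 114, 280, 106],
                [306, 135, 396, 218, 373]])
print(swap_random_down_right(mat))
print(mat)
```

If elements in the last row or column must also be selectable, a different rule (for example wrapping around) would be needed; that choice is not specified in the question.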
I have a 3 row x 96 column dataframe. I'm trying to compute the average of the two rows beneath the index (row 1:96) for every 12 data points. Here is my dataframe:
Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 \
0 1461274.92 1458079.44 1456807.1 1459216.08 1458643.24 1457145.19
1 478167.44 479528.72 480316.08 475569.52 472989.01 476054.89
2 ------ ------ ------ ------ ------ ------
Run 7 Run 8 Run 9 Run 10 ... Run 87 \
0 1458117.08 1455184.82 1455768.69 1454738.07 ... 1441822.45
1 473630.89 476282.93 475530.87 474200.22 ... 468525.2
2 ------ ------ ------ ------ ... ------
Run 88 Run 89 Run 90 Run 91 Run 92 Run 93 \
0 1445339.53 1461050.97 1446849.43 1438870.43 1431275.76 1430781.28
1 460076.8 473263.06 455885.07 475245.64 483875.35 487065.25
2 ------ ------ ------ ------ ------ ------
Run 94 Run 95 Run 96
0 1436007.32 1435238.23 1444300.51
1 474328.87 475789.12 458681.11
2 ------ ------ ------
[3 rows x 96 columns]
Currently I am trying to use df.irow(0) to select all the data in row index 0.
Something along the lines of:
selection = np.arange(0,13)
for i in selection:
    new_df = pd.DataFrame()
    data = df.irow(0)
    ........
Then I get lost.
I just don't know how to link this range with the dataframe in order to compute the mean for every 12 data points in each row.
To summarize, I want the average of every 12 runs in each row. So I should end up with a separate dataframe with 2 * 8 average values (96/12 = 8).
Any ideas?
Thanks.
You can do a groupby on axis=1 (using some dummy data I made up):
>>> h = df.iloc[:2].astype(float)
>>> h.groupby(np.arange(len(h.columns))//12, axis=1).mean()
0 1 2 3 4 5 6 7
0 0.609643 0.452047 0.536786 0.377845 0.544321 0.214615 0.541185 0.544462
1 0.382945 0.596034 0.659157 0.437576 0.490161 0.435382 0.476376 0.423039
First we extract the data and force recognition of a float (the presence of the ------ row means that you've probably got an object dtype, which will make the mean unhappy.)
Then we make an array saying what groups we want to put the different columns in:
>>> np.arange(len(df.columns))//12
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7], dtype=int32)
which we feed as an argument to groupby. .mean() handles the rest.
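As far as I know, recent pandas releases warn that axis=1 in groupby is deprecated. The same grouping can be expressed by transposing first, as in this sketch (block_means is a made-up helper name, and the frame is assumed to be fully numeric already):

```python
import numpy as np
import pandas as pd


def block_means(df, block=12):
    """Mean of every `block` consecutive columns, for each row."""
    # one group label per column: 0,0,...,0,1,1,...
    groups = np.arange(df.shape[1]) // block
    # group the transposed frame along its rows, then transpose back
    return df.T.groupby(groups).mean().T


demo = pd.DataFrame(np.arange(48, dtype=float).reshape(2, 24))
print(block_means(demo))
```

With the 2 x 96 frame from the question this would give a 2 x 8 result, one column per group of 12 runs.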
It's always best to try to use pandas methods when you can, rather than iterating over the rows. The DataFrame's iloc method is useful for extracting any number of rows.
The following example shows you how to do what you want in a two-column DataFrame. The same technique will work independent of the number of columns:
In [14]: df = pd.DataFrame({"x": [1, 2, "-"], "y": [3, 4, "-"]})
In [15]: df
Out[15]:
x y
0 1 3
1 2 4
2 - -
In [16]: df.iloc[2] = df.iloc[0:2].sum()
In [17]: df
Out[17]:
x y
0 1 3
1 2 4
2 3 7
However, in your case you want to sum each group of eight cells in df.iloc[2], so you might be better off simply taking the result of the summing expression with the statement
ds = df.iloc[0:2].sum()
which with your data will have the form
col1 0
col2 1
col3 2
col4 3
...
col93 92
col94 93
col95 94
col96 95
(These numbers are representative; you will see your own column sums.) You can then turn this into a 12x8 matrix with
ds.values.reshape(12, 8)
whose value is
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53, 54, 55],
[56, 57, 58, 59, 60, 61, 62, 63],
[64, 65, 66, 67, 68, 69, 70, 71],
[72, 73, 74, 75, 76, 77, 78, 79],
[80, 81, 82, 83, 84, 85, 86, 87],
[88, 89, 90, 91, 92, 93, 94, 95]])
but summing this array will give you the sum of all elements, so instead create another DataFrame with
rs = pd.DataFrame(ds.values.reshape(12, 8))
and then sum that:
rs.sum()
giving
0 528
1 540
2 552
3 564
4 576
5 588
6 600
7 612
dtype: int64
You may find in practice that it is easier to simply create two 12x8 matrices in the first place, which you can add together before creating a dataframe which you can then sum. Much depends on how you are reading your data.
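If the aim is the mean of each consecutive block of 12 runs rather than column sums, a reshape needs the block length as its last axis. A minimal NumPy sketch, assuming the two numeric rows have already been converted to floats; block_means_np is a hypothetical helper name:

```python
import numpy as np


def block_means_np(values, block=12):
    """Average each run of `block` consecutive columns in a 2-D array."""
    values = np.asarray(values, dtype=float)
    rows, cols = values.shape
    if cols % block:
        raise ValueError("number of columns must be a multiple of block")
    # shape (rows, n_blocks, block): the last axis holds one block of columns
    return values.reshape(rows, cols // block, block).mean(axis=2)


demo = np.arange(48, dtype=float).reshape(2, 24)
print(block_means_np(demo))
```

For a 2 x 96 input this returns a 2 x 8 array, which can be wrapped in pd.DataFrame if a dataframe is wanted.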
I have a list A of the form:
A = ['P', 'Q', 'R', 'S', 'T', 'U']
and an array B of the form:
B = [[ 1 2 3 4 5 6]
[ 7 8 9 10 11 12]
[13 14 15 16 17 18]
[19 20 21 22 23 24]]
now I would like to create a structured array C of the form:
C = [[ P Q R S T U]
[ 1 2 3 4 5 6]
[ 7 8 9 10 11 12]
[13 14 15 16 17 18]
[19 20 21 22 23 24]]
so that I can extract columns with column names P, Q, R, etc. I tried the following code but it does not create a structured array and gives the following error.
Code
import numpy as np
A = (['P', 'Q', 'R', 'S', 'T', 'U'])
B = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24]])
C = np.vstack((A, B))
print (C)
D = C['P']
Error
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
How can I create a structured array in Python in this case?
Update
Both are variables, their shape changes during runtime but both list and array will have the same number of columns.
If you want to do it in pure numpy you can do
A = np.array(['P', 'Q', 'R', 'S', 'T', 'U'])
B = np.array([[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24]])
# define the structured array with the names from A
C = np.zeros(B.shape[0],dtype={'names':A,'formats':['f8','f8','f8','f8','f8','f8']})
# copy the data from B into C
for i,n in enumerate(A):
    C[n] = B[:,i]
C['Q']
array([ 2., 8., 14., 20.])
Edit: you can generate the format list automatically by using instead
C = np.zeros(B.shape[0],dtype={'names':A,'formats':['f8' for x in range(A.shape[0])]})
Furthermore, the names do not appear in C as data but in dtype. In order to get the names from C you can use
C.dtype.names
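The copy loop can also be avoided with numpy.lib.recfunctions.unstructured_to_structured. The sketch below is one possible variant, not the method from the answer above; to_structured is a made-up helper, and it keeps B's integer dtype instead of converting to 'f8'.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

A = ['P', 'Q', 'R', 'S', 'T', 'U']
B = np.arange(1, 25).reshape(4, 6)


def to_structured(names, data):
    """Build a structured array with one field per column of `data`."""
    # one (name, dtype) pair per column; every field reuses the dtype of `data`
    dtype = np.dtype([(name, data.dtype) for name in names])
    return rfn.unstructured_to_structured(data, dtype)


C = to_structured(A, B)
print(C['Q'])
print(C.dtype.names)
```

Because the dtype is built from the names at run time, this works when the number of columns changes, as long as the list and the array have the same number of columns.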
This is what the pandas library is for:
>>> A = ['P', 'Q', 'R', 'S', 'T', 'U']
>>> B = np.arange(1, 25).reshape(4, 6)
>>> B
array([[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24]])
>>> import pandas as pd
>>> pd.DataFrame(B, columns=A)
P Q R S T U
0 1 2 3 4 5 6
1 7 8 9 10 11 12
2 13 14 15 16 17 18
3 19 20 21 22 23 24
>>> df = pd.DataFrame(B, columns=A)
>>> df['P']
0 1
1 7
2 13
3 19
Name: P, dtype: int64
>>> df['T']
0 5
1 11
2 17
3 23
Name: T, dtype: int64
>>>
http://pandas.pydata.org/pandas-docs/dev/tutorials.html
Your error occurs on:
D = C['P']
Here is a simple approach, using regular Python lists on the title row.
import numpy as np
A = (['P', 'Q', 'R', 'S', 'T', 'U'])
B = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24]])
C = np.vstack((A, B))
print (C)
D = C[0:len(C), list(C[0]).index('P')]
print (D)