Reading non-uniform data from file into array with NumPy - python

Suppose I have a text file that looks like this:
33 3
46 12
23 10 23 11 23 12 23 13 23 14 23 15 23 16 24 10 24 11 24 12 24 13 24 14 24 15 24 16 25 14 25 15 25 16 26 16 27 16 28 16 29 16
33 17 33 18 33 19 34 17 34 18 34 19 35 17 35 18 35 19 36 19
41 32 41 33 42 32 42 33
I would like to read each line into a separate array of integers, as in (pseudo code):
for line in textfile:
    currentArray = firstLine
    do stuff with currentArray
where in the first iteration, currentArray would be
array([33, 3])
and in the second iteration, currentArray would be
array([46, 12])
until the last iteration, when currentArray would be
array([41, 32, 41, 33, 42, 32, 42, 33])
Basically, I would like to have the functionality of the numpy function loadtxt:
currentArray = loadtxt('scienceVertices.txt', usecols=() )
Except instead of usecols, being able to specify the row, e.g.,
currentArray = loadtxt('scienceVertices.txt', userows=(line) )

Here's a one-liner:
arrays = [np.array(list(map(int, line.split()))) for line in open('scienceVertices.txt')]
arrays is a list of numpy arrays.

for line in textfile:
    a = np.array([int(v) for v in line.strip().split(" ")])
    # Work on your array

You can also use numpy.fromstring():
for line in f:
    a = numpy.fromstring(line.strip(), dtype=int, sep=" ")
or, if you want full flexibility, even numpy.loadtxt():
for line in f:
    # On Python 3, io.StringIO replaces Python 2's StringIO.StringIO
    a = numpy.loadtxt(io.StringIO(line), dtype=int)
For long lines, these solutions will perform better than the Python code in the other answers.
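For reference, the line-by-line approaches above can be put together into a small Python 3 sketch. The file contents are simulated with io.StringIO so that the example runs on its own; the helper name read_ragged_rows is made up for illustration, and a real script would pass an open file object instead.

```python
import io

import numpy as np


def read_ragged_rows(lines):
    """Return one integer array per non-empty line (hypothetical helper)."""
    rows = []
    for line in lines:
        tokens = line.split()  # split() without arguments tolerates repeated whitespace
        if tokens:  # skip blank lines
            rows.append(np.array([int(tok) for tok in tokens]))
    return rows


# Stand-in for open('scienceVertices.txt'); a shortened copy of the question's data.
sample = io.StringIO("33 3\n46 12\n41 32 41 33 42 32 42 33\n")
arrays = read_ragged_rows(sample)
for current_array in arrays:
    print(current_array)
```

Because the rows have different lengths, the result is kept as a list of 1-D arrays rather than a single 2-D array.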

f = open("file", "r")
array = []
line = f.readline()
index = 0
while line:
    line = line.strip("\n")
    line = line.split()
    array.append([])
    for item in line:
        array[index].append(int(item))
    line = f.readline()
    index += 1
f.close()
print(array)

Related

how to sequentially assign two numbers in an array?

I am trying to pair up numbers that lie diagonally to each other in a matrix, following a certain order.
First, the 1st number in the penultimate row is paired with the 2nd number in the last row, then the 1st number in the row above with the 2nd number in the penultimate row, and so on. The sequence is shown in the example below. The matrix does not always have the same size.
Example
a=np.array([[11,12,13],
[21,22,23],
[31,32,33]])
required output:
21 32
11 22
11 33
22 33
12 23
or
a=np.array([[11,12,13,14],
[21,22,23,24],
[31,32,33,34],
[41,42,43,44]])
required output:
31 42
21 32
21 43
32 43
11 22
11 33
11 44
22 33
22 44
12 23
12 34
23 34
13 24
Is it possible?
Here's an iterative solution, assuming a square matrix. Modifying this for non-square matrices shouldn't be hard.
import numpy as np
a=np.array([[11,12,13,14],
[21,22,23,24],
[31,32,33,34],
[41,42,43,44]])
w,h = a.shape
for y0 in range(1,h):
    y = h-y0-1
    for x in range(h-y-1):
        print( a[y+x,x], a[y+x+1,x+1] )
for x in range(1,w-1):
    for y in range(w-x-1):
        print( a[y,x+y], a[y+1,x+y+1] )
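Note that the required output also lists non-adjacent pairs on the same diagonal, such as 21 43. One way to get those, sketched here under the assumption that the goal is every pair of elements along each top-left to bottom-right diagonal, is to combine a.diagonal() with itertools.combinations. The function name diagonal_pairs is invented for this example. For the 4x4 matrix it would also yield 33 44, which the listing in the question leaves out.

```python
from itertools import combinations

import numpy as np


def diagonal_pairs(a):
    """Pairs of elements sharing a diagonal, from the bottom-left diagonal upward."""
    rows, cols = a.shape
    pairs = []
    # Offsets run from the lowest diagonal (-(rows - 1)) to the highest (cols - 1).
    for offset in range(-(rows - 1), cols):
        diag = a.diagonal(offset)
        # combinations() yields nothing for single-element diagonals (the corners).
        pairs.extend((int(p), int(q)) for p, q in combinations(diag, 2))
    return pairs


a = np.array([[11, 12, 13],
              [21, 22, 23],
              [31, 32, 33]])
for first, second in diagonal_pairs(a):
    print(first, second)
```

Since the offsets are derived from a.shape, the same function should also handle non-square matrices.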

Reading a txt file with numbers and summing them - python

I have a txt file with the following text in it:
2
4 8 15 16 23 42
1 3 5
6
66 77
77
888
888 77
34
23 234 234
1
32
3
23 23 23
365
22 12
I need a way to read the file and sum all the numbers.
I have this code for now, but I'm not sure what to do next. Thanks in advance.
lstComplete = []
fichNbr = open("nombres.txt", "r")
lstComplete = fichNbr
somme = 0
for i in lstComplete:
    i = i.split()
Turn them into a list and sum them:
with open('nombres.txt', 'r') as f:
    num_list = f.read().split()
print(sum([int(n) for n in num_list]))
Returns 3227
Open the file, use the read() method to get its content, convert the strings to int, and use sum() to get the result:
>>> sum(map(int,open('nombres.txt').read().split()))
3227
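Both answers read the whole file into memory at once. For a large file, a line-by-line variant may be preferable. Below is a minimal sketch, with io.StringIO standing in for nombres.txt and sum_numbers being a name chosen only for this example.

```python
import io


def sum_numbers(lines):
    """Add up every whitespace-separated integer, one line at a time."""
    return sum(int(token) for line in lines for token in line.split())


# Stand-in for open('nombres.txt'); only the first three lines of the question's file.
sample = io.StringIO("2\n4 8 15 16 23 42\n1 3 5\n")
total = sum_numbers(sample)
print(total)  # 2 + 108 + 9 = 119
```

With a real file, the call would be sum_numbers(f) inside a with open('nombres.txt') as f: block.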

Read in file as arrays then rearrange columns

I would like to read in a file with multiple columns and write out a new file with columns in a different order than the original file. One of the columns has some extra text that I want eliminated in the new file as well.
For instance, if I read in file: data.txt
1 6 omi=11 16 21 26
2 7 omi=12 17 22 27
3 8 omi=13 18 23 28
4 9 omi=14 19 24 29
5 10 omi=15 20 25 30
I would like the written file to be: dataNEW.txt
26 1 11 16
27 2 12 17
28 3 13 18
29 4 14 19
30 5 15 20
With the help of inspectorG4dget, I came up with this:
import csv
import sys
infile = open('Rearrange Column Test.txt')
sys.stdout = open('Rearrange Column TestNEW.txt', 'w')
for line in csv.reader(infile, delimiter='\t'):
    newline = [line[i] for i in [5, 0, 2, 3]]
    newline[2] = newline[2].split('=')[1]
    print(newline[0], newline[1], newline[2], newline[3])
sys.stdout.close()
Is there a more concise way to get an output without any commas than listing each line index from 0 to the total number of lines?
import csv
with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    # A space delimiter keeps commas out of the output file
    writer = csv.writer(outfile, delimiter=' ')
    for line in csv.reader(infile, delimiter='\t'):
        newline = [line[i] for i in [-1, 0, 2, 3]]
        newline[2] = newline[2].split('=')[1]
        writer.writerow(newline)
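The same reordering can be tried without touching any files. The sketch below assumes whitespace-separated fields (split() handles tabs and spaces alike) and uses io.StringIO in place of data.txt; the function name rearrange and its order parameter are made up for this example.

```python
import io


def rearrange(lines, order=(5, 0, 2, 3)):
    """Yield each line with its fields reordered and any 'key=' prefix removed."""
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        picked = [fields[i] for i in order]
        # 'omi=11' -> '11'; fields without '=' are left unchanged.
        yield " ".join(field.split("=")[-1] for field in picked)


# First two lines of the question's data.txt.
sample = io.StringIO("1 6 omi=11 16 21 26\n2 7 omi=12 17 22 27\n")
result = list(rearrange(sample))
print("\n".join(result))
```

Joining the fields with " ".join gives space-separated output directly, so no csv writer is needed for this case.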

Python: How to write values to a csv file from another csv file

For index.csv file, its fourth column has ten numbers ranging from 1-5. Each number can be regarded as an index, and each index corresponds with an array of numbers in filename.csv.
The row number of filename.csv represents the index, and each row has three numbers. My question is about using a nesting loop to transfer the numbers in filename.csv to index.csv.
from numpy import genfromtxt
import numpy as np
import csv
import collections
data1 = genfromtxt('filename.csv', delimiter=',')
data2 = genfromtxt('index.csv', delimiter=',')
out = np.zeros((len(data2),len(data1)))
for row in data2:
    for ch_row in range(len(data1)):
        if (row[3] == ch_row + 1):
            out = row.tolist() + data1[ch_row].tolist()
            print(out)
writer = csv.writer(open('dn.csv','w'), delimiter=',',quoting=csv.QUOTE_ALL)
writer.writerow(out)
For example, the fourth column of index.csv contains 1,2,5,3,4,1,4,5,2,3 and filename.csv contains:
# filename.csv
20 30 50
70 60 45
35 26 77
93 37 68
13 08 55
What I need is to write the indexed row from filename.csv to index.csv and store these numbers in the 5th, 6th and 7th columns:
# index.csv
# 4 5 6 7
... 1 20 30 50
... 2 70 60 45
... 5 13 08 55
... 3 35 26 77
... 4 93 37 68
... 1 20 30 50
... 4 93 37 68
... 5 13 08 55
... 2 70 60 45
... 3 35 26 77
If I do "print(out)", it prints the correct answer. However, when I enter "out" in the shell, only one row appears, like [1.0, 1.0, 1.0, 1.0, 20.0, 30.0, 50.0]
What I need is to store all the values in the "out" variable and write them to the dn.csv file.
This ought to do the trick for you:
Code:
from csv import reader, writer
data = list(reader(open("filename.csv", "r"), delimiter=" "))
out = writer(open("output.csv", "w"), delimiter=" ")
for row in reader(open("index.csv", "r"), delimiter=" "):
    out.writerow(row + data[int(row[3])])
index.csv:
0 0 0 1
0 0 0 2
0 0 0 3
filename.csv:
20 30 50
70 60 45
35 26 77
93 37 68
13 08 55
This produces the output:
0 0 0 1 70 60 45
0 0 0 2 35 26 77
0 0 0 3 93 37 68
Note: There's no need to use numpy here. The standard library csv module will do most of the work for you.
I also had to modify your sample datasets a bit, as what you showed had indexes out of bounds of the sample data in filename.csv.
Please also note that Python (like most languages) uses zero-based indexing, so you may have to adjust the above code to fit your needs exactly.
with open('dn.csv','w') as f:
    writer = csv.writer(f, delimiter=',',quoting=csv.QUOTE_ALL)
    for row in data2:
        idx = int(row[3])  # genfromtxt yields floats; an index must be an int
        out = [idx] + [x for x in data1[idx-1]]
        writer.writerow(out)
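To make the 1-based lookup explicit, here is a self-contained sketch that works on in-memory text instead of real files. It assumes comma-separated input, as the genfromtxt calls in the question suggest; the function name append_indexed_rows, its index_col parameter, and the zero-filled first three columns are invented for illustration.

```python
import csv
import io


def append_indexed_rows(index_lines, data_lines, index_col=3):
    """Append to each index row the data row named by its 1-based index column."""
    data = [row for row in csv.reader(data_lines) if row]
    out_rows = []
    for row in csv.reader(index_lines):
        if row:  # skip blank lines
            # Subtract 1 to convert the 1-based index into a 0-based list position.
            out_rows.append(row + data[int(row[index_col]) - 1])
    return out_rows


data_file = io.StringIO("20,30,50\n70,60,45\n35,26,77\n93,37,68\n13,08,55\n")
index_file = io.StringIO("0,0,0,1\n0,0,0,5\n0,0,0,3\n")  # made-up leading columns
rows = append_indexed_rows(index_file, data_file)

out = io.StringIO()  # stand-in for open('dn.csv', 'w', newline='')
csv.writer(out).writerows(rows)
print(out.getvalue())
```

Collecting every row in a list and writing once at the end avoids the problem in the question, where only the last value of out reached the file.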

Printing a rather specific matrix

I have a list consisting of 148 entries. Each entry is a four digit number. I would like to print out the result as this:
1 14 27 40
2 15 28 41
3 16 29 42
4 17 30 43
5 18 31 44
6 19 32 45
7 20 33 46
8 21 34 47
9 22 35 48
10 23 36 49
11 24 37 50
12 25 38 51
13 26 39 52
53
54
55... and so on
I have some code that works for the first 13 rows and 4 columns:
kort_identifier = [my_list_with_the_entries]
print_val = 0
print_num_1 = 0
print_num_2 = 13
print_num_3 = 26
print_num_4 = 39
while (print_val <= 36):
    print(kort_identifier[print_num_1], '%10s' % kort_identifier[print_num_2], '%10s' % kort_identifier[print_num_3], '%10s' % kort_identifier[print_num_4])
    print_val += 1
    print_num_1 += 1
    print_num_2 += 1
    print_num_3 += 1
    print_num_4 += 1
I feel this is an awful solution and there has to be a better and simpler way of doing this. I have searched through here (for printing tables and matrices) and tried those solutions, but none seems to work with the odd table/matrix layout that I need.
Please point me in the right direction.
A bit tricky, but here you go. I opted to manipulate the list until it had the right shape, instead of messing around with indexes.
lst = list(range(1, 149))
lst = [lst[i:i+13] for i in range(0, len(lst), 13)]
lst = zip(*[lst[i] + lst[i+4] + lst[i+8] for i in range(4)])
for row in lst:
    for col in row:
        print(col, end=' ')
    print()
It might be overkill, but you could just make a numpy array.
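import numpy as np
# reshape needs exactly 2 * 4 * 13 = 104 entries, so only the first two full blocks are used here
x = np.array(kort_identifier[:104]).reshape(2, 4, 13)
for subarray in x:
    for row in subarray.T:  # transpose so that each column holds 13 consecutive entries
        print(row)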
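Since 148 is not a multiple of 52, the last block is incomplete. A plain-Python sketch that copes with that is shown below. It assumes the intended layout is blocks of 13 rows by 4 columns, filled column by column; the function name column_major_blocks and its parameters are hypothetical, and the numbers 1 to 148 stand in for the real four-digit entries.

```python
from itertools import zip_longest


def column_major_blocks(entries, rows=13, cols=4):
    """Split entries into blocks of rows*cols and lay each block out column by column."""
    block_size = rows * cols
    blocks = []
    for start in range(0, len(entries), block_size):
        block = entries[start:start + block_size]
        columns = [block[i:i + rows] for i in range(0, len(block), rows)]
        # zip_longest transposes columns into rows, padding short columns with None,
        # and the padding is dropped again so short rows simply end early.
        table = [[v for v in row if v is not None] for row in zip_longest(*columns)]
        blocks.append(table)
    return blocks


blocks = column_major_blocks(list(range(1, 149)))
for table in blocks:
    for row in table:
        print("".join("%10s" % v for v in row))
    print()  # blank line between blocks
```

The None filter assumes that the entries themselves are never None, which holds for a list of numbers.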
