Read in a file as arrays, then rearrange columns - Python

I would like to read in a file with multiple columns and write out a new file with columns in a different order than the original file. One of the columns has some extra text that I want eliminated in the new file as well.
For instance, if I read in file: data.txt
1 6 omi=11 16 21 26
2 7 omi=12 17 22 27
3 8 omi=13 18 23 28
4 9 omi=14 19 24 29
5 10 omi=15 20 25 30
I would like the written file to be: dataNEW.txt
26 1 11 16
27 2 12 17
28 3 13 18
29 4 14 19
30 5 15 20
With the help of inspectorG4dget, I came up with this:
import csv
import sys

infile = open('Rearrange Column Test.txt')
sys.stdout = open('Rearrange Column TestNEW.txt', 'w')
for line in csv.reader(infile, delimiter='\t'):
    newline = [line[i] for i in [5, 0, 2, 3]]
    newline[2] = newline[2].split('=')[1]
    print(newline[0], newline[1], newline[2], newline[3])
sys.stdout.close()
Is there a more concise way to get comma-free output than listing each index of newline by hand?

import csv

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    writer = csv.writer(outfile, delimiter=' ')  # space-delimited output, so no commas
    for line in csv.reader(infile, delimiter='\t'):
        newline = [line[i] for i in [-1, 0, 2, 3]]
        newline[2] = newline[2].split('=')[1]
        writer.writerow(newline)
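If plain space-separated output is all that is needed, the csv module can be skipped entirely. A minimal sketch: the sample rows and the column order come from the question, while the file names data.txt and dataNEW.txt are illustrative (the sample file is recreated here so the snippet is self-contained).

```python
# Recreate the sample input from the question.
sample = """1 6 omi=11 16 21 26
2 7 omi=12 17 22 27
3 8 omi=13 18 23 28
4 9 omi=14 19 24 29
5 10 omi=15 20 25 30
"""
with open('data.txt', 'w') as f:
    f.write(sample)

with open('data.txt') as infile, open('dataNEW.txt', 'w') as outfile:
    for line in infile:
        fields = line.split()                   # split on any whitespace
        picked = [fields[i] for i in (-1, 0, 2, 3)]
        picked[2] = picked[2].split('=')[1]     # strip the 'omi=' prefix
        outfile.write(' '.join(picked) + '\n')  # plain spaces, no commas
```

str.join gives you exact control over the separator, which is the whole point of the "no commas" requirement.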


How to convert a text file to an Excel file without deleting the spaces between data

I want to convert a text file to an Excel file without deleting the spaces in each line.
Note that the number of columns is the same for all lines of the file.
The text file follows this format:
First row
05100079 0000001502 5 01 2 070 1924 02 06 1994 C508 2 8500 3 8500 3 3 1 1 012 10 0 98 00 4 8 8 9 0 40 01 2 15 26000 1748 C508 116 102 3 09 98 013 1 1 0 1 10 10 0 09003 50060 50060 0 0 369 99 9 1 4 4 5 8 0 0181 1 80 00 01 0 9 9 8 1 0 00 00 020 0
Second row
05100095 0000001502 2 01 2 059 1917 02 03 1977 C504 2 8500 3 8500 3 9 1 1 54-11-0999-00 2 9 0 90 01 2 12 26000 1744 C504 116 102 3 09 98 013 1 1 0 2 0 09011 50060 50060 0 36 9 9 1 9 9 5 8 0 3161 9 9 8 020 0
How should the code be edited so that it converts the text file to an Excel file without deleting the spaces between data?
The code below deletes the spaces in each line.
I mean to convert the file to an Excel sheet without any modification to the original file:
the spaces stay spaces and all other data keeps the same format.
import xlwt

book = xlwt.Workbook()
ws = book.add_sheet('First Sheet')  # Add a sheet
f = open('testval.txt', 'r+')
data = f.readlines()  # read all lines at once
for i in range(len(data)):
    row = data[i].split()  # This returns a list of strings; you may need to convert to other formats depending on your use case
    for j in range(len(row)):
        ws.write(i, j, row[j])  # Write to cell i, j
book.save('testval' + '.xls')
f.close()
Expected output:
An Excel file in the same format as the original text file.
If you have fixed-length fields, you need to split each line using index intervals.
For instance, you can do:
import io
import xlwt

book = xlwt.Workbook()
ws = book.add_sheet('First Sheet')  # Add a sheet
with io.open("testval.txt", mode="r", encoding="utf-8") as f:
    for row_idx, row in enumerate(f):
        row = row.rstrip()
        ws.write(row_idx, 0, row[0:8])
        ws.write(row_idx, 1, row[9:19])
        ws.write(row_idx, 2, row[20:21])
        ws.write(row_idx, 3, row[22:24])
        # and so on...
book.save("sample.xls")  # xlwt writes the legacy .xls format, not .xlsx
Each fixed-width field then lands in its own cell.
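If the record really is fixed-width with single-space separators, the (start, end) slices can be derived from a list of field widths instead of being typed by hand. A sketch, with made-up widths for the first four fields only (the real record layout would need the full width list):

```python
# Hypothetical widths of the first four fields -- not the real layout.
widths = [8, 10, 1, 2]

slices = []
pos = 0
for w in widths:
    slices.append((pos, pos + w))
    pos += w + 1  # +1 skips the single separating space

line = "05100079 0000001502 5 01"
cells = [line[a:b] for a, b in slices]
print(cells)
```

Each (a, b) pair can then drive ws.write(row_idx, col_idx, row[a:b]) exactly as in the answer above.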

Reading a txt file with numbers and summing them in Python

I have a txt file with the following text in it:
2
4 8 15 16 23 42
1 3 5
6
66 77
77
888
888 77
34
23 234 234
1
32
3
23 23 23
365
22 12
I need a way to read the file and sum all the numbers.
I have this code for now but am not sure what to do next. Thanks in advance.
lstComplete = []
fichNbr = open("nombres.txt", "r")
lstComplete = fichNbr
somme = 0
for i in lstComplete:
    i = i.split()
Turn them into a list and sum them:
with open('nombres.txt', 'r') as f:
    num_list = f.read().split()
print(sum([int(n) for n in num_list]))
Returns 3227
Open the file and use the read() method to get the content, then convert each string to an int and use sum() to get the result:
>>> sum(map(int,open('nombres.txt').read().split()))
3227
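Both answers read the whole file into memory at once. A Python 3 sketch that sums line by line instead, which scales to files too large to slurp (the sample here is a shortened, illustrative version of nombres.txt, recreated so the snippet is self-contained):

```python
# Recreate a small illustrative sample of nombres.txt.
data = """2
4 8 15 16 23 42
1 3 5
"""
with open('nombres.txt', 'w') as f:
    f.write(data)

total = 0
with open('nombres.txt') as f:
    for line in f:  # one line at a time, constant memory
        total += sum(int(tok) for tok in line.split())
print(total)  # 119 for this sample
```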

Need to read from a file, add the elements, and get the average in Python 3.4

I have an input file as following:
75647485 10 20 13 12 14 17 13 16
63338495 15 20 11 17 18 20 17 20
00453621 3 10 4 10 20 18 15 10
90812341 18 18 16 20 8 20 7 15
I need to find the mean of each row, from the second element to the end ([1:8]), and give the output as:
ID Mean Lowest number Highest number
75647485 14.37 10 20
90812341 ... ... ...
I am new to Python, so can someone please help? I don't need to write the output to a file; just displaying it on the console would work.
Thank you.
array = [[int(s) for s in line.split()] for line in open('file')]
for line in array:
    print('%08i %3.1f %3i %3i' % (line[0], sum(line[1:])/len(line[1:]), min(line[1:]), max(line[1:])))
This produces the output:
75647485 14.4 10 20
63338495 17.2 11 20
00453621 11.2 3 20
90812341 15.2 7 20
Alternate Version
To ensure that the file handle is properly closed, this version uses with. Also, string formatting is done with the more modern format function:
with open('file') as f:
    array = [[int(s) for s in line.split()] for line in f]
for line in array:
    print('{:08.0f} {:3.1f} {:3.0f} {:3.0f}'.format(line[0], sum(line[1:])/len(line[1:]), min(line[1:]), max(line[1:])))
You can do this using NumPy:
import numpy
# mylist is one parsed row, e.g. [75647485, 10, 20, 13, 12, 14, 17, 13, 16]
numpy.mean(mylist[1:])  # everything after the ID
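numpy.mean needs the row already parsed; a fuller sketch (assuming NumPy is installed) that parses the file contents and reduces every row at once. The two sample rows are taken from the question; loading them from an inline string keeps the snippet self-contained.

```python
import numpy as np

rows = """75647485 10 20 13 12 14 17 13 16
63338495 15 20 11 17 18 20 17 20"""
data = np.array([[int(s) for s in line.split()] for line in rows.splitlines()])

ids = data[:, 0]              # first column is the ID
scores = data[:, 1:]          # the eight marks
means = scores.mean(axis=1)   # per-row mean
lows = scores.min(axis=1)
highs = scores.max(axis=1)
for i, m, lo, hi in zip(ids, means, lows, highs):
    print('%08d %.2f %d %d' % (i, m, lo, hi))
```

Note that storing the ID as an int drops leading zeros (00453621 becomes 453621), so keep IDs as strings if that matters.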
from collections import namedtuple

FileRecord = namedtuple('FileRecord', 'id num1 num2 num3 num4 num5 num6 num7 num8')

with open("file.txt") as f:
    for line in f:
        record = FileRecord._make(line.split())
        nums = [int(n) for n in record[1:]]  # the eight scores
        lowest = min(nums)
        highest = max(nums)
        mean = sum(nums) / 8
        print(lowest, highest, mean)
I would recommend using pandas. It is much more scalable, has many more features, and is built on NumPy.
import pandas as pd
x='''75647485 10 20 13 12 14 17 13 16
63338495 15 20 11 17 18 20 17 20
00453621 3 10 4 10 20 18 15 10
90812341 18 18 16 20 8 20 7 15'''
from cStringIO import StringIO # py27
df = pd.read_csv(StringIO(x), delim_whitespace=True, header=None, index_col=0)
print df.T.max()
#75647485 20
#63338495 20
#453621 20
#90812341 20
print df.T.min()
#75647485 10
#63338495 11
#453621 3
#90812341 7
print df.T.mean()
#75647485 14.375
#63338495 17.250
#453621 11.250
#90812341 15.250
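The three reductions above can also be collected into one table that matches the asker's requested ID/mean/low/high layout. A sketch assuming pandas on Python 3 (io.StringIO replaces cStringIO there; the column names are made up):

```python
import pandas as pd
from io import StringIO  # Python 3 replacement for cStringIO

x = '''75647485 10 20 13 12 14 17 13 16
63338495 15 20 11 17 18 20 17 20'''
df = pd.read_csv(StringIO(x), sep=r'\s+', header=None, index_col=0)

# One row per ID, one column per statistic.
summary = pd.DataFrame({'mean': df.mean(axis=1),
                        'low': df.min(axis=1),
                        'high': df.max(axis=1)})
print(summary)
```

Reducing along axis=1 directly avoids the df.T transposes used above.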

Python: How to write values to a csv file from another csv file

For the index.csv file, the fourth column has ten numbers ranging from 1-5. Each number can be regarded as an index, and each index corresponds to a row of numbers in filename.csv.
The row number of filename.csv represents the index, and each row has three numbers. My question is how to use a nested loop to transfer the numbers in filename.csv to index.csv.
from numpy import genfromtxt
import numpy as np
import csv
import collections

data1 = genfromtxt('filename.csv', delimiter=',')
data2 = genfromtxt('index.csv', delimiter=',')
out = np.zeros((len(data2), len(data1)))
for row in data2:
    for ch_row in range(len(data1)):
        if row[3] == ch_row + 1:
            out = row.tolist() + data1[ch_row].tolist()
            print(out)
writer = csv.writer(open('dn.csv', 'w'), delimiter=',', quoting=csv.QUOTE_ALL)
writer.writerow(out)
For example, the fourth column of index.csv contains 1,2,5,3,4,1,4,5,2,3 and filename.csv contains:
# filename.csv
20 30 50
70 60 45
35 26 77
93 37 68
13 08 55
What I need is to write the indexed row from filename.csv to index.csv and store these number in 5th, 6th and 7th column:
# index.csv
# 4 5 6 7
... 1 20 30 50
... 2 70 60 45
... 5 13 08 55
... 3 35 26 77
... 4 93 37 68
... 1 20 30 50
... 4 93 37 68
... 5 13 08 55
... 2 70 60 45
... 3 35 26 77
If I do print(out), the printed rows are correct. However, when I inspect out in the shell afterwards, only one row appears, like [1.0, 1.0, 1.0, 1.0, 20.0, 30.0, 50.0].
I need to store all the values of out and write them to the dn.csv file.
This ought to do the trick for you:
Code:
from csv import reader, writer

data = list(reader(open("filename.csv", "r"), delimiter=" "))
out = writer(open("output.csv", "w"), delimiter=" ")
for row in reader(open("index.csv", "r"), delimiter=" "):
    out.writerow(row + data[int(row[3])])
index.csv:
0 0 0 1
0 0 0 2
0 0 0 3
filename.csv:
20 30 50
70 60 45
35 26 77
93 37 68
13 08 55
This produces the output:
0 0 0 1 70 60 45
0 0 0 2 35 26 77
0 0 0 3 93 37 68
Note: There's no need to use numpy here. The standard library csv module will do most of the work for you.
I also had to modify your sample datasets a bit, as what you showed had indexes out of bounds of the sample data in filename.csv.
Please also note that Python (like most languages) uses 0-based indexing, so you may have to fiddle with the above code to fit your needs exactly.
import csv

with open('dn.csv', 'w') as f:
    writer = csv.writer(f, delimiter=',', quoting=csv.QUOTE_ALL)
    for row in data2:
        idx = int(row[3])  # genfromtxt yields floats, so cast before indexing
        out = [idx] + [x for x in data1[idx - 1]]
        writer.writerow(out)
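Tying the pieces together as a runnable, stdlib-only sketch: the file names match the question, but the small filename.csv/index.csv samples below are illustrative and are written out first so the snippet is self-contained. Like the answer just above, it treats the fourth column as a 1-based index.

```python
import csv

# Illustrative sample data standing in for the asker's real files.
with open('filename.csv', 'w') as f:
    f.write('20 30 50\n70 60 45\n35 26 77\n93 37 68\n13 08 55\n')
with open('index.csv', 'w') as f:
    f.write('0 0 0 1\n0 0 0 3\n0 0 0 5\n')

with open('filename.csv') as f:
    data = [line.split() for line in f]   # keep fields as strings

with open('index.csv') as src, open('dn.csv', 'w', newline='') as dst:
    out = csv.writer(dst, delimiter=' ')
    for row in (line.split() for line in src):
        idx = int(row[3])                 # fourth column: 1-based row index
        out.writerow(row + data[idx - 1]) # append the indexed row
```

Keeping everything as strings avoids the float-index problem genfromtxt introduces, and preserves values like "08" verbatim.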

Reading non-uniform data from file into array with NumPy

Suppose I have a text file that looks like this:
33 3
46 12
23 10 23 11 23 12 23 13 23 14 23 15 23 16 24 10 24 11 24 12 24 13 24 14 24 15 24 16 25 14 25 15 25 16 26 16 27 16 28 16 29 16
33 17 33 18 33 19 34 17 34 18 34 19 35 17 35 18 35 19 36 19
41 32 41 33 42 32 42 33
I would like to read each line into a separate array of integers, as in (pseudo code):
for line in textfile:
    currentArray = line
    do stuff with currentArray
where in the first iteration, currentArray would be
array([33, 3])
and in the second iteration, currentArray would be
array([46, 12])
until the last iteration, when currentArray would be
array([41, 32, 41, 33, 42, 32, 42, 33])
Basically, I would like to have the functionality of the numpy function loadtxt:
currentArray = loadtxt('scienceVertices.txt', usecols=() )
Except that instead of usecols, I would specify the row, e.g.,
currentArray = loadtxt('scienceVertices.txt', userows=(line) )
Here's a one-liner (plus the import):
import numpy as np
arrays = [np.array([int(tok) for tok in line.split()]) for line in open('scienceVertices.txt')]
arrays is a list of numpy arrays.
for line in textfile:
    a = np.array([int(v) for v in line.strip().split(" ")])
    # Work on your array
You can also use numpy.fromstring()
for line in f:
    a = numpy.fromstring(line.strip(), dtype=int, sep=" ")
or -- if you want full flexibility -- even numpy.loadtxt():
for line in f:
    a = numpy.loadtxt(StringIO.StringIO(line), dtype=int)
For long lines, these solutions will perform better than the pure-Python code in the other answers.
f = open("file", "r")
array = []
line = f.readline()
index = 0
while line:
    line = line.strip("\n")
    line = line.split()
    array.append([])
    for item in line:
        array[index].append(int(item))
    line = f.readline()
    index += 1
f.close()
print(array)
