Loop for two files only prints first line (Python)

The first file, f1, has two columns: an ID number and a value associated with it.
The second file, f2, is a bigger version of the first with six columns and more values, but it includes the two columns from the first file.
The second file has a column I want to associate with the values in the first file, and I want the output to be a new text file containing the ID,
the associated value, and another column with the values I want to pull from the bigger second file.
So far I've written code that does what I want, but it only prints the first line.
I'm not fantastic at Python, which is probably noticeable in my code, and I was hoping someone would have the answer to my problem.
import csv
with open('output1.txt','w') as out1, open('list1.csv') as f1, open('list2.csv') as f2:
    csvf1 = csv.reader(f1)
    csvf2 = csv.reader(f2)
    for txt1 in csvf1:
        id1 = txt1[0]
        z1 = txt1[1]
        for txt2 in csvf2:
            id2 = txt2[0]
            z2 = txt2[3]
            ra = txt2[1]
            if id1 == id2:
                out1.write("{} {} {}\n".format(id2, z1, ra))
out1.close()
f1.close()
f2.close()
I would also like to point out that using .split(',') does not work on my files for some reason just in case someone tries to use it in an answer.

Move the line csvf2=csv.reader(f2) inside the first loop, and rewind the file with f2.seek(0) before creating the new reader. As written, the inner loop only executes for the first line of f1: after that first pass, the file pointer of f2 is already at the end of the file, so there is nothing left for the inner loop to read.
import csv
with open('output1.txt','w') as out1, open('list1.csv') as f1, open('list2.csv') as f2:
    csvf1 = csv.reader(f1)
    for txt1 in csvf1:
        id1 = txt1[0]
        z1 = txt1[1]
        f2.seek(0)  # rewind f2 so it can be read again for this row
        csvf2 = csv.reader(f2)
        for txt2 in csvf2:
            id2 = txt2[0]
            z2 = txt2[3]
            ra = txt2[1]
            if id1 == id2:
                out1.write("{} {} {}\n".format(id2, z1, ra))

csv.reader() returns an iterator, so you can only loop over its result once. (It behaves like a generator: once the underlying file has been read to the end, there is nothing left to yield.)
So you may need to save every row in a list for later use:
import csv

list1 = []
with open('list1.txt') as fp:
    for row in csv.reader(fp):
        list1.append(row)
By the way, you won't need to close fp explicitly when you open it with a with statement; the context manager does that for you when you leave the with block.

I managed to get my answer from another programmer, and this was the code that ended up working.
Thank you so much for your answers; they were close to what worked.
import csv
with open('output1.txt','w') as out1, open('file1.csv') as f1:
    csvf1 = csv.reader(f1)
    for txt1 in csvf1:
        id1 = txt1[0]
        z1 = txt1[1]
        with open('file2.csv') as f2:
            csvf2 = csv.reader(f2)
            for txt2 in csvf2:
                id2 = txt2[0]
                z2 = txt2[3]
                ra = txt2[1]
                if id1 == id2:
                    out1.write("{} {} {}\n".format(id2, z1, ra))
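The accepted version works, but it re-opens and re-reads file2 once for every row of file1. A variant that reads the big file only once, by building an ID-to-value lookup dict first (a sketch, not the asker's code; the hypothetical path arguments and the assumption that column 1 of file2 holds the ra value follow the snippets above):

```python
import csv

def merge_by_id(path1, path2, out_path):
    # Build an ID -> ra lookup from the bigger file in a single pass.
    lookup = {}
    with open(path2, newline='') as f2:
        for row in csv.reader(f2):
            lookup[row[0]] = row[1]  # column 1 held 'ra' in the code above
    # Stream through the small file and join on the ID column.
    with open(path1, newline='') as f1, open(out_path, 'w') as out:
        for row in csv.reader(f1):
            if row[0] in lookup:
                out.write("{} {} {}\n".format(row[0], row[1], lookup[row[0]]))
```

merge_by_id('file1.csv', 'file2.csv', 'output1.txt') then produces the same three-column output while reading each file exactly once.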

Related

IndexError when printing with readlines()

I keep encountering an IndexError when trying to print a line from a text file. I'm new to Python and still trying to learn, so I'd appreciate it if you can be patient with me; if there is anything else needed from me, please let me know!
The traceback reads as
...
print(f2.readlines()[1])
IndexError: list index out of range
When trying to print line 2 (...[1]), I am getting this out of range error.
Here's the current script.
with open("f2.txt", "r") as f2:
    print(f2.readlines()[1])
There are 3 lines with text in the file.
contents of f2.txt
peaqwenasd
lasnebsat
kikaswmors
It seems that f2.seek(0) was necessary here to solve the issue.
with open("f2.txt", "r") as f2:
    f2.seek(0)
    print(f2.readlines()[1])
You haven't given all the code needed to reproduce your problem, but your symptoms point to multiple calls to readlines.
Read the documentation: readlines() reads the entire file and returns its contents as a list of lines. As a consequence, the file pointer is then at the end of the file. If you call readlines() again at that point, it returns an empty list.
You apparently have a readlines() call before the code you showed us. seek(0) resets the file pointer to the start of the file, so you're reading the entire file a second time.
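The effect is easy to demonstrate in isolation (a small sketch; only the file name f2.txt is taken from the question):

```python
def read_three_ways(path):
    with open(path) as f:
        first = f.readlines()   # consumes the whole file
        second = f.readlines()  # the pointer is now at EOF, so this is []
        f.seek(0)               # rewind to the start of the file
        third = f.readlines()   # the full contents again
    return first, second, third
```

On the three-line file above, first and third hold all three lines, while second is an empty list.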
There are many tutorials that show you canonical ways to iterate through the contents of a file. I strongly recommend that you use one of those. For instance:
with open("f2.txt", "r") as f2:
    for line in f2:
        # Here you can work with the lines in sequence
If you need to deal with the lines in non-sequential order, then
with open("f2.txt", "r") as f2:
    content = f2.readlines()
    # Now you can access content[2], content[1], etc.

How to import and write text in a for cycle

I have the following code:
dat11=np.genfromtxt('errors11.txt')
dat12=np.genfromtxt('errors12.txt')
dat13=np.genfromtxt('errors13.txt')
dat22=np.genfromtxt('errors22.txt')
dat23=np.genfromtxt('errors23.txt')
dat33=np.genfromtxt('errors33.txt')
zip(dat11,dat12,dat13,dat22,dat23,dat33)
import csv
with open('Allerrors.txt', "w+") as output:
    writer = csv.writer(output, delimiter='\t')
    writer.writerows(zip(dat11,dat12,dat13,dat22,dat23,dat33))
quit
Each of the 'errorsxy.txt' files consists of a single column of numbers. With this program I created the 'Allerrors.txt' file, where all those columns sit next to one another. I need to do the same thing with a for loop (or any other kind of loop), because I'll actually have many more files and I can't do it by hand. But I don't know how to generate these various datxy variables in a loop. I tried (for the first part of the code) with:
for x in range(1,Nbin+1):
    for y in range(1,Nbin+1):
        'dat'+str(x)+str(y)=np.genfromtxt('errors'+str(x)+str(y)+'.txt')
But of course I get the following error:
SyntaxError: can't assign to operator
I understand why I get this error, but I couldn't find any other way to write it. Also, I have no idea how to write the second part of the code.
I'm using Python 2.7
Can anyone help me?
Instead of making separate variables for each data file, you could append each read-in file to a list, then zip and print the list after the for loop has run.
errorfiles = []
for x in range(1,Nbin+1):
    for y in range(1,Nbin+1):
        dat = np.genfromtxt('errors'+str(x)+str(y)+'.txt')
        errorfiles.append(dat)
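To finish the second part of the question, the collected columns can be zipped and written out just like the hand-written version. A sketch (I've replaced np.genfromtxt with a plain file read, which works here because each errorsXY.txt is a single column, and I've restricted y to y >= x to match the file names listed in the question):

```python
import csv

def merge_error_files(Nbin, out_path='Allerrors.txt'):
    # Collect each single-column errorsXY.txt file as a list of strings.
    columns = []
    for x in range(1, Nbin + 1):
        for y in range(x, Nbin + 1):  # only files like errors12.txt (y >= x) exist
            with open('errors%d%d.txt' % (x, y)) as f:
                columns.append([line.strip() for line in f if line.strip()])
    # Write the columns side by side, tab-delimited, one row per line.
    with open(out_path, 'w') as output:
        writer = csv.writer(output, delimiter='\t')
        writer.writerows(zip(*columns))
```

zip(*columns) transposes the list of columns into rows, which is exactly what the original zip(dat11, dat12, ...) call did with the named variables.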

Using Python v3.5 to load a tab-delimited file, omit some rows, and output max and min floating numbers in a specific column to a new file

I've tried for several hours to research this, but every possible solution hasn't suited my particular needs.
I have written the following in Python (v3.5) to download a tab-delimited .txt file.
#!/usr/bin/env /Library/Frameworks/Python.framework/Versions/3.5/bin/python3.5
import urllib.request
import time
timestr = time.strftime("%Y-%m-%d %H-%M-%S")
filename="/data examples/"+ "ace-magnetometer-" + timestr + '.txt'
urllib.request.urlretrieve('http://services.swpc.noaa.gov/text/ace-magnetometer.txt', filename=filename)
This downloads the file from here and renames it based on the current time. It works perfectly.
I am hoping that I can then use the "filename" variable to then load the file and do some things to it (rather than having to write out the full file path and file name, because my ultimate goal is to do the following to several hundred different files, so using a variable will be easier in the long run).
This using-the-variable idea seems to work, because adding the following to the above prints the contents of the file to STDOUT... (so it's able to find the file without any issues):
import csv
with open(filename, 'r') as f:
    reader = csv.reader(f, dialect='excel', delimiter='\t')
    for row in reader:
        print(row)
As you can see from the file, the first 18 lines are informational.
Line 19 provides the actual column names. Then there is a line of dashes.
The actual data I'm interested in starts on line 21.
I want to find the minimum and maximum numbers in the "Bt" column (third column from the right). One possible solution I found would only work with integers, and this dataset has floating-point numbers.
Another possible solution involved importing the pyexcel module, but I can't seem to install that correctly...
import pyexcel as pe
data = pe.load(filename, name_columns_by_row=19)
min(data.column["Bt"])
I'd like to be able to print the minimum Bt and maximum Bt values into two separate files called minBt.txt and maxBt.txt.
I would appreciate any pointers anyone may have, please.
This is meant to be a comment on your latest question to Apoc, but I'm new, so I'm not allowed to comment. One thing that might create problems is that bz_values (and bt_values, for that matter) might be a list of strings (at least it was when I tried to run Apoc's script on the example file you linked to). You could solve this by substituting this:
min_bz = min([float(x) for x in bz_values])
max_bz = max([float(x) for x in bz_values])
for this:
min_bz = min(bz_values)
max_bz = max(bz_values)
The following will work as long as all the files are formatted in the same way, i.e. the data starts 21 lines in, with the same number of columns and so on. Also, the file that you linked did not appear to be tab-delimited, so I've simply used the string split method on each row instead of the csv reader. The column is read from the file into a list, and that list is used to calculate the maximum and minimum values:
from itertools import islice

# Line that data starts from, zero-indexed.
START_LINE = 20
# The column containing the data in question, zero-indexed.
DATA_COL = 10
# The value present when a measurement failed.
FAILED_MEASUREMENT = '-999.9'

with open('data.txt', 'r') as f:
    bt_values = []
    for val in (row.split()[DATA_COL] for row in islice(f, START_LINE, None)):
        if val != FAILED_MEASUREMENT:
            bt_values.append(float(val))

min_bt = min(bt_values)
max_bt = max(bt_values)

with open('minBt.txt', 'a') as minFile:
    print(min_bt, file=minFile)
with open('maxBt.txt', 'a') as maxFile:
    print(max_bt, file=maxFile)
I have assumed that since you are doing this to multiple files you are looking to accumulate multiple max and min values in the maxBt.txt and minBt.txt files, and hence I've opened them in 'append' mode. If this is not the case, please swap out the 'a' argument for 'w', which will overwrite the file contents each time.
Edit: Updated to include workaround for failed measurements, as discussed in comments.
Edit 2: Updated to fix problem with negative numbers, also noted by Derek in separate answer.

While Loop Not Performing Main Function

I'm trying to write a Python script that uses a particular external application belonging to the company I work for. I can generally figure things out for myself when it comes to programming and scripting, but this time I am truly lost!
I can't figure out why the while loop won't function as it is meant to. It doesn't give any errors, which doesn't help me. It just seems to skip past the important part of the code in the centre of the loop, and then goes on to increment "count" like it should afterwards!
f = open('C:/tmp/tmp1.txt', 'w')  # Create a temporary text file
f.write("TEXTFILE\nTEXTFILE\nTEXTFILE\nTEXTFILE\nTEXTFILE\nTEXTFILE\n")  # Put some simple text in there
f.close()  # Close the file
count = 0  # Insert the line number from the text file you want to begin with (first line starts at 0)
num_lines = sum(1 for line1 in open('C:/tmp/tmp1.txt'))  # Get the number of lines in the text file
f = open('C:/tmp/tmp2.txt', 'w')  # Create a new text file
f.close()  # Close it
while (count < num_lines):  # Keep the loop within the starting line and total number of lines from the first text file
    with open('C:/tmp/tmp1.txt', 'r') as f:  # Open the first text file
        line2 = f.readlines()  # Read these lines for later input
        for line2[count] in f:  # For each line from the chosen starting line until the last line of the first text file,...
            with open('C:/tmp/tmp2.txt', 'a') as g:  # ...with the second text file open for appending strings,...
                g.write("hello\n")  # ...write 'hello\n' each time while "count" < "num_lines"
    count = count + 1  # Increment the "count"
I think everything works up until: "for line2[count] in f:"
The real code I'm working on is somewhat more complicated, and the application I'm using isn't exactly for sharing, so I have simplified the code to give silly outputs instead just to fix the problem.
I'm not looking for alternative code, I'm just looking for a reason why the loop isn't working so I can try to fix it myself.
All answers will be appreciated, and thanking everyone in advance!
Cormac
Some comments:
num_lines = sum(1 for line1 in open('C:/tmp/tmp1.txt'))
Why? What's wrong with len(open(filename, 'rb').readlines())?
while (count < num_lines):
    ...
    count = count + 1
This is bad style; you could use:
for i in range(num_lines):
    ...
Note that I named your index i, which is universally recognized, and that I used range and a for loop.
Now, your problem, as I said in the comment, is that f is a file (that is, a stream of bytes with a position pointer), and you've already read all the lines from it. So when you do for line2[count] in f:, it tries to read a line into line2[count] (this is a bit weird, actually; you almost never use a list element as a for-loop target, but apparently you can), sees that there is no line left to read, and never executes the loop body.
Anyway, you want to read a file, line by line, starting from a given line number? Here's a better way to do that:
from itertools import islice

start_line = 0  # change this
filename = "foobar"  # also this

with open(filename, 'rb') as f:
    for line in islice(f, start_line, None):
        print(line)
I realize you don't want alternative code, but your code really is needlessly complicated.
If you want to iterate over the lines in the file f, I suggest replacing your "for" line with
for line in line2:
    # do something with "line"...
You put the lines in an array called line2, so use that array! Using line2[count] as a loop variable doesn't make sense to me.
You seem to have misunderstood how the for line in f loop works. It iterates over a file, reading one line at a time until there are no lines left. But by the time you start the loop, all the lines have already been read (via f.readlines()) and the file's current position is at the end. You could achieve what you want by calling f.seek(0), but that doesn't seem like a good decision anyway, since you'd be reading the file again, and that's slow I/O.
Instead you want to do something like:
for line in line2[count:]:  # iterate over the lines read, starting with line number `count`
    do_something_with(line)

Why can't I repeat the 'for' loop for csv.Reader?

I am a beginner at Python. I am now trying to figure out why the second 'for' loop doesn't work in the following script. I mean that I only get the result of the first 'for' loop, and nothing from the second one. I copied and pasted my script and the CSV data below.
It would be helpful if you could tell me why it behaves this way and how to make the second 'for' loop work as well.
My SCRIPT:
import csv
file = "data.csv"
fh = open(file, 'rb')
read = csv.DictReader(fh)
for e in read:
    print(e['a'])
for e in read:
    print(e['b'])
"data.csv":
a,b,c
tree,bough,trunk
animal,leg,trunk
fish,fin,body
The csv reader is an iterator over the file. Once you have gone through it, you have read to the end of the file, so there is nothing more to read. If you need to go through it again, you can seek back to the beginning of the file:
fh.seek(0)
This will reset the file to the beginning so you can read it again. Depending on the code, it may also be necessary to skip the field name header:
next(fh)
This is necessary for your code, since the DictReader consumed that line the first time around to determine the field names, and it's not going to do that again. It may not be necessary for other uses of csv.
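Both steps together look like this (a Python 3 sketch, where the csv module wants text mode rather than 'rb'; the column names come from the question's data.csv):

```python
import csv

def read_twice(path):
    with open(path, newline='') as fh:
        read = csv.DictReader(fh)
        first = [e['a'] for e in read]   # first pass reads to the end of the file
        fh.seek(0)                       # rewind the underlying file...
        next(fh)                         # ...and skip the header DictReader already consumed
        second = [e['b'] for e in read]  # the same reader now works a second time
    return first, second
```

With the data.csv from the question, the first pass yields the 'a' column and the second pass the 'b' column.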
If the file isn't too big and you need to do several things with the data, you could also just read the whole thing into a list:
data = list(read)
Then you can do what you want with data.
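For the question's script, that could look like this (a sketch; the Python 3 text-mode open is my assumption, since the question's 'rb' mode is Python 2 style):

```python
import csv

def load_rows(path):
    # Read the entire CSV into a list of dicts so it can be iterated repeatedly.
    with open(path, newline='') as fh:
        return list(csv.DictReader(fh))
```

After rows = load_rows('data.csv'), both for e in rows: print(e['a']) and a second loop printing e['b'] work, because rows is a plain list rather than a one-shot iterator.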
I have created a small function which takes the path of a CSV file, reads it, and returns a list of dicts in one go; then you can loop through the list very easily.
import csv

def read_csv_data(path):
    """
    Reads the CSV at the given path and returns a list of dicts keyed by column name
    """
    data = csv.reader(open(path))
    # Read the column names from the first line of the file
    fields = next(data)
    data_lines = []
    for row in data:
        items = dict(zip(fields, row))
        data_lines.append(items)
    return data_lines
Regards
Regards
