Reading comma separated values in python until new blank line character - python

I want to read input from STDIN like this:
1,0
0,0
1,0
1,0
and so on until the new line is empty(\n). This signifies the end of the input.
I did this
while (raw_input()!='\n'):
    actual,predicted=raw_input().split(',')
Gave me this error when I entered "enter" in last input
0,0
0,1
1,0
1,1
1,1
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-23-3ec5186ad531> in <module>()
5
6 while (raw_input()!='\n'):
----> 7 actual,predicted=raw_input().split(',')
8 if (actual==1 and predicted==1):
9 t_p+=50
ValueError: need more than 1 value to unpack
What's wrong?

OK, there are two problems here: raw_input strips the trailing newline, so a blank input becomes an empty string, not a newline.
The larger issue is that raw_input consumes the input, so your code won't work properly - it will only process every second line. The while loop calls raw_input (which uses up some input and discards it), then the body of the loop calls it again and assigns it to actual and predicted.
The python idiom for this task is a while True: loop containing a break:
while True:
    line = raw_input()
    if not line:
        break
    actual, predicted = line.split(",")
    print("a, p", actual, predicted)
print("done!")

raw_input() strips the trailing newline, so you want to compare to ''.
But you're also reading two lines at a time in your loop, where I think you want something more like:
data = raw_input()
while (data != ''):
    actual,predicted=data.split(',')
    data = raw_input()

raw_input is really used when you're writing a user-interactive program. It sounds like your program is a more typical UNIX program which processes file input.
sys.stdin is an open file. Because of this, you can use my favorite feature of Python which is iterating over each line in a file. Ditch the raw_input altogether and just treat your data as if it were a file:
import sys
for line in sys.stdin:
    line = line.strip()
    parts = line.split(',')
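The answers above can be combined into one small helper. The following is only a minimal sketch, written for Python 3 (where raw_input is called input); the names read_pairs and main are made up for illustration. It accepts any iterable of lines, such as sys.stdin, and stops at the first blank line:

```python
import sys


def read_pairs(lines):
    """Yield (actual, predicted) string pairs until the first blank line."""
    for line in lines:
        line = line.strip()
        if not line:  # a blank line (or one with only whitespace) ends the input
            break
        actual, predicted = line.split(",")
        yield actual, predicted


def main():
    # sys.stdin is iterated lazily, one line at a time
    for actual, predicted in read_pairs(sys.stdin):
        print("a, p", actual, predicted)
    print("done!")
```

Call main() to run it against standard input. Note that the yielded values are still strings, so a comparison such as actual==1 from the question would never be true; compare against "1" or convert with int() first.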

Related

Error with .readlines()[n]

I'm a beginner with Python.
I tried to solve the problem: "If we have a file containing <1000 lines, how to print only the odd-numbered lines? ". That's my code:
with open(r'C:\Users\Savina\Desktop\rosalind_ini5.txt')as f:
    n=1
    num_lines=sum(1 for line in f)
    while n<num_lines:
        if n/2!=0:
            a=f.readlines()[n]
            print(a)
            break
        n=n+2
where n is a counter and num_lines calculates how many lines the file contains.
But when I try to execute the code, it says:
"a=f.readlines()[n]
IndexError: list index out of range"
Why doesn't it recognize n as a counter?
You call readlines inside a loop, but that is not its intended use: readlines reads the whole file at once and returns a list of newline-terminated strings. Because the earlier sum(...) has already consumed the file, readlines returns an empty list here, hence the IndexError.
You may want to save such a list and operate on it:
with open(filename) as f:  # the with block closes the file when done
    list_of_lines = f.readlines()
odd = 1
for line in list_of_lines:
    if odd:
        print(line, end='')
    odd = 1 - odd
Two remarks:
odd alternates between 1 (true when used as the condition of an if) and 0 (false when used as the condition of an if);
the optional argument end='' to the print function is needed because each line in list_of_lines already ends with a newline character; if you omit it, print will output a second newline at the end of each line.
Coming back to your code, you can fix its behavior by calling
f.seek(0)
before the loop to rewind the file to its beginning, and by using the f.readline() method (note: readline, not readlines) inside the loop, but proceeding like this is, let's say, a bit unconventional.
Finally, it is possible to do everything you want with a one-liner
print(''.join(open(filename).readlines()[::2]))
that uses slice notation for lists and the string method .join().
Well, I'd personally do it like this:
def print_odd_lines(some_file):
    with open(some_file) as my_file:
        # count lines from 1 so that the index matches the line number
        for index, each_line in enumerate(my_file, start=1):
            if index % 2 == 1:  # check if the line number is odd
                print(each_line)  # if it is, print it
if __name__ == '__main__':
    # raw string, so the backslashes are not treated as escape sequences
    print_odd_lines(r'C:\Users\Savina\Desktop\rosalind_ini5.txt')
Be aware that this prints a blank line after each printed line, because each line already ends with a newline and print adds another. I'm sure you can figure out how to get rid of it.
This code will do exactly as you asked:
with open(r'C:\Users\Savina\Desktop\rosalind_ini5.txt')as f:
    for i, line in enumerate(f.readlines()): # Iterate over each line and add an index (i) to it.
        if i % 2 == 0: # i starts at 0 in python, so if i is even, the line is odd
            print(line)
To explain what happens in your code:
A file object can only be read through once; after that it has to be rewound with f.seek(0), or closed and reopened.
You first iterate over the entire file in num_lines=sum(1 for line in f). Now the file object f is exhausted.
Then, when the if condition holds, you call f.readlines(). This tries to read all the lines again, but none are left in f, so it returns an empty list, and indexing that list with [n] raises the IndexError. Even with the file rewound, calling readlines() for every odd n would read the entire file each time. It is faster to go through it once (as in the solutions offered to your question).
As a fix, you need to type
f.close()
f = open(r'C:\Users\Savina\Desktop\rosalind_ini5.txt')
every time after you read through the file, in order to get back to the start.
As a side note, you should look up the modulo operator % for finding odd numbers: n/2!=0 does not test whether n is odd, whereas n%2!=0 does.
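The single-pass idea from the answers above can also be written with itertools.islice, which avoids loading the whole file into a list before slicing. This is only a sketch for Python 3; the function name odd_numbered_lines is made up, and io.StringIO stands in for the real file so that the example is self-contained:

```python
import io
from itertools import islice


def odd_numbered_lines(file_obj):
    """Return the 1st, 3rd, 5th, ... lines of an open file in one pass."""
    # islice(iterable, start, stop, step): start at index 0, take every 2nd line
    return list(islice(file_obj, 0, None, 2))


# io.StringIO stands in for a real file here
sample = io.StringIO("one\ntwo\nthree\nfour\nfive\n")
for line in odd_numbered_lines(sample):
    print(line, end="")  # each line already ends with a newline
```

With a real file, the same function can be called on the object returned by open(...) inside a with block.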

What qualifies collection of strings to become a line?

The following code takes every character and runs the loop that many times. But when I save the same line in a text file and perform the same operation, the loop runs only once for one line. It is a bit confusing. A possible reason I can think of is that the first method runs the loop by treating "a" as a list. Kindly correct me if I am wrong. Also, let me know how to create a line in code itself, rather than first saving it in a file and then using it.
>>> a="In this world\n"
>>> i=0
>>> for lines in a:
...     i=i+1
...     print i
...
1
2
3
4
5
6
7
8
9
10
11
12
13
You're trying to loop over a, which is a string. Regardless of how many newlines you have in a string, when you loop over it, you're going to go character by character.
If you want to loop through a bunch of lines, you have to use a list:
lines = ["this is line 1", "this is another line", "etc"]
for line in lines:
    print line
If you have a string containing a bunch of newlines and want to convert it to a list of lines, use the split method:
text = "This is line 1\nThis is another line\netc"
lines = text.split("\n")
for line in lines:
    print line
The reason why you go line by line when reading from a file is because the people who implemented Python decided that it would be more useful if iterating over a file yielded a collection of lines instead of a collection of characters.
However, a file and a string are different things, and you should not necessarily expect that they work in the same way.
Just change the name of the variable when looping over the line:
i = 0
worldLine = "In this world\n"
for character in worldLine:
    i = i + 1
    print i
count = 0
readFile = open('myFile', 'r')
for line in readFile:
    count += 1
Now it should be clear what's going on.
Keeping meaningful names will save you a lot of debugging time.
Consider doing the following:
i = 0
worldLines = ["In this world\n"]
for line in worldLines:
    i = i + 1
    print i
if you want to loop over a list of lines that contains only that single line.
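To answer the last part of the question (creating "lines" in code without saving a file first), here is a minimal Python 3 sketch. The sample text and variable names are invented for illustration. It contrasts iterating over the string itself, splitting it with str.splitlines(), and wrapping it in io.StringIO, which behaves like an open text file:

```python
import io

text = "In this world\nthere are two lines\n"

# Iterating over the string itself visits one character at a time
char_count = sum(1 for _ in text)

# str.splitlines() gives a list of lines without the trailing newlines
lines = text.splitlines()

# io.StringIO wraps the string in a file-like object that yields whole lines,
# newline included, just as iterating over a real file would
file_like_lines = list(io.StringIO(text))

print(char_count, len(lines), len(file_like_lines))
```

The io.StringIO variant is handy when code expects a file object, because the same loop then works for both a real file and an in-memory string.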

Converting strings from txt to int results in an error

Trying to grab these numbers for example from a text file :
00000
11111
22222
33333
44444
I'm trying to slice the full string that I read from the text file, so that a function can work with each row as an integer, like this:
import linecache
with file('textfiletest.txt', 'r') as original: testfile = str(original.read())
lines = len(testfile.splitlines())
for i in range(lines):
    SID = int(testfile[5*i: -(len(testfile)-5*(i+1))])
    print SID
This code prints all the lines and then raises an error on the last one, saying the value cannot be converted:
ValueError: invalid literal for int() with base 10: ''
Note: every line is 5 characters long.
Line reading in Python is so much simpler than that; you are really overcomplicating things here:
with open('textfiletest.txt', 'r') as original:
    for line in original:
        if not line.strip():
            continue
        SID = int(line)
        print SID
This loops over the lines in the file directly, one by one. int() can handle extra whitespace, including the newline character, so all we need to take care of is making sure we skip lines that are empty apart from whitespace.
Your solution doesn't take into account that the newline characters take up space too; lines are 6 characters long with the newline. But why make it so hard for yourself when you clearly already found the str.splitlines() function; you could just have looped over that:
for line in testfile.splitlines():
    # ...
would have given you a loop over the lines in the file contents without trailing newlines and certainly no need for complicated slicing computations.
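The splitlines() approach can be sketched as a small function. This is an illustration for Python 3, not the original poster's code; the name parse_ids is hypothetical, and the sample string mirrors the five rows from the question:

```python
def parse_ids(text):
    """Convert each non-blank line of text to an int."""
    # splitlines() drops the newline characters, so no slicing arithmetic is needed
    return [int(line) for line in text.splitlines() if line.strip()]


print(parse_ids("00000\n11111\n22222\n33333\n44444\n"))
# prints [0, 11111, 22222, 33333, 44444]
```

Note that int("00000") is 0, so the leading zeros are lost; keep the values as strings if the zeros matter.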

Python Index out of range on Cash flow

Having trouble with a code that should read comma separated values out of .txt file, sort into arrays based on negativity, and then plot data.
Here is the code, followed by 2 .txt files, the first one works, but the second one doesn't
#check python is working
print "hello world"
#import ability to plot and use matrices
import matplotlib.pylab as plt
import numpy as np
#declare variables
posdata=[]
negdata=[]
postime=[]
negtime=[]
interestrate=.025
#open file
f= open('/Users/zacharygastony/Desktop/CashFlow_2.txt','r')
data = f.readlines()
#split data into arrays
for y in data:
    w= y.split(",")
    if float(w[1])>0:
        postime.append(int(w[0]))
        posdata.append(float(w[1]))
    else:
        negtime.append(int(w[0]))
        negdata.append(float(w[1]))
print "Inflow Total: ", posdata
print "Inflow Time: ", postime
print "Outflow Total: ", negdata
print "Outflow Time: ", negtime
#plot the data
N=len(postime)
M=len(negtime)
ind = np.arange(N+M) # the x locations for the groups
width = 0.35 # the width of the bars
fig, ax = plt.subplots()
rects1 = ax.bar(ind, posdata+negdata, width, color='r')
# add some
ax.set_ylabel('Cash Amount')
ax.set_title('Cash Flow Diagram')
ax.set_xlabel('Time')
plt.plot(xrange(0,M+N))
plt.show()
.txt 1______
0,3761.97
1,-1000
2,-1000
3,-1000
4,-1000
.txt 2______
0,1000
1,-1000
2,1000
3,-1000
My error is as follows:
>>> runfile('/Users/zacharygastony/cashflow.py', wdir=r'/Users/zacharygastony')
hello world
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/zacharygastony/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 540, in runfile
execfile(filename, namespace)
File "/Users/zacharygastony/cashflow.py", line 24, in <module>
if float(w[1])>0:
IndexError: list index out of range
One error that I can spot is with "if float(w[1])>0:". It should take into account that w[1] could be a pair of values separated by a space. If the contents of the second file are all on one line, w would look like this: "['0', '1000 1', '-1000 2', '1000 3', '-1000\n']". So w[1] would be "1000 1", and converting that value to a float would be an error. If you really want to access the second element, one way is to split it on the default whitespace delimiter and pick the first part (or the second one). Something like: "if float((w[1].split())[0])>0:".
Without having your actual files (or, better, an SSCCE that demonstrates the same problem), there's no way to be exactly sure what's going wrong. When I run your code (just changing the hardcoded pathname) with your exact data, everything works fine.
But, if if float(w[1])>0: is raising an IndexError, clearly w has only 0 or 1 elements.
Since w came from w= y.split(","), that means that y didn't have any commas in it.
Since y is each line from your file, one of the lines doesn't have any commas in it.
Which line? Well, none of them in the example you gave.
Most likely, your real file has something like a blank line at the end, so w ends as the single-element list [''].
Or… maybe that 2______ is actually a header line at the top of your file, in which case w will end up as ['2______'].
Or the actual file you're running against is a longer, hand-edited file, where you've made a typo somewhere, like 4.1000 instead of 4,1000.
Or…
To actually figure out the problem instead of just guessing, you will need to debug things, using a debugger or an interactive visualizer, or just adding print statements to log all the intermediate values:
print(y)
w = y.split(",")
print(w)
w1 = w[1]
print(w1)
amount = float(w1)  # named amount rather than f, which is already the open file
print(amount)
if amount > 0:
    # ...
So, your actual problem is blank lines at the end of the file. How can you deal with that?
You can skip over blank lines, or skip over lines without enough commas, or just handle the exception and continue on.
For example, let's skip over blank lines. Note that readlines leaves the newline characters on the end, so they won't actually be blank; they'll be '\n' or maybe, depending on your platform and Python version, something else like '\r\n'. But really, you probably want to skip over a line with nothing but spaces too, right? So, let's just call strip on it, and if the result is empty, skip the line:
for y in data:
    if not y.strip():
        continue
    w = y.split(",")
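The other option mentioned above, handling the exception and continuing, might look like the following. This is a hedged sketch for Python 3 with a made-up function name, parse_rows; it relies on the fact that unpacking a split with the wrong number of fields, and int() or float() on non-numeric text, all raise ValueError:

```python
def parse_rows(lines):
    """Return (time, amount) tuples, skipping lines that cannot be parsed."""
    rows = []
    for line in lines:
        try:
            time_text, amount_text = line.split(",")
            rows.append((int(time_text), float(amount_text)))
        except ValueError:
            # raised for blank lines, lines without exactly one comma,
            # and fields that are not numbers
            continue
    return rows


print(parse_rows(["0,1000\n", "\n", "1,-1000\n", "4.1000\n"]))
# prints [(0, 1000.0), (1, -1000.0)]
```

Silently skipping bad lines hides typos such as 4.1000, so logging the skipped line inside the except block may be worth considering.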
If you'd prefer to preprocess things, you can:
data = f.readlines()
data = [line for line in data if line.strip()]
The problem with this is that, on top of reading in the whole file and searching for newlines to split on and building up a big list (all of which you were already doing just by calling readlines), you're also now going over the whole list again and building up another list. And all that before you even get started. And there is no reason to do that.
You can just iterate over a file, without ever calling readlines on it, which will grab the lines as you need them.
And you can use a generator expression instead of a list comprehension to "preprocess" without actually doing the work up-front. So:
data = (line for line in f if line.strip())
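Putting the generator expression together with the splitting logic from the question gives something like the sketch below. It is written for Python 3 and is not the original script: the function name split_cash_flows is hypothetical, and io.StringIO stands in for the real file, using the second sample file plus a trailing blank line:

```python
import io


def split_cash_flows(lines):
    """Split 'time,amount' lines into inflows and outflows, ignoring blank lines."""
    inflows, outflows = [], []
    # generator expression: lines are filtered lazily as the loop consumes them
    non_blank = (line for line in lines if line.strip())
    for line in non_blank:
        time_text, amount_text = line.split(",")
        amount = float(amount_text)
        target = inflows if amount > 0 else outflows
        target.append((int(time_text), amount))
    return inflows, outflows


sample = io.StringIO("0,1000\n1,-1000\n2,1000\n3,-1000\n\n")
inflows, outflows = split_cash_flows(sample)
print("Inflows:", inflows)
print("Outflows:", outflows)
```

Because the function only iterates over its argument, it accepts an open file, a list of strings, or any other iterable of lines.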

PYTHON how to search a text file for a number

There's a text file that I'm reading line by line. It looks something like this:
3
3
67
46
67
3
46
Each time the program encounters a new number, it writes it to a text file. The way I'm thinking of doing this is writing the first number to the file, then looking at the second number and checking if it's already in the output file. If it isn't, it writes THAT number to the file. If it is, it skips that line to avoid repetitions and goes on to the next line. How do I do this?
Rather than searching your output file, keep a set of the numbers you've written, and only write numbers that are not in the set.
Instead of checking output file for the number if it was already written it is better to keep this information in a variable (a set or list). It will save you on disk reads.
To search a file for numbers you need to loop through each line of that file, you can do that with for line in open('input'): loop, where input is the name of your file. On each iteration line would contain one line of input file ended with end of line character '\n'.
In each iteration you should try to convert the value on that line to a number, int() function may be used. You may want to protect yourself against empty lines or non-number values with try statement.
In each iteration having the number you should check if the value you found wasn't already written to the output file by checking a set of already written numbers. If value is not in the set yet, add it and write to the output file.
#!/usr/bin/env python
numbers = set()  # numbers that were already written
# the with block closes both files, so the output is flushed to disk
with open('input') as inp, open('output', 'w') as out:
    for line in inp:  # loop through each line of the 'input' file
        try:
            i = int(line)  # try to convert the line to an integer
        except ValueError:  # if the conversion fails, display a warning
            print("Warning: cannot convert to number string '%s'" % line.strip())
            continue  # skip to the next line on error
        if i not in numbers:  # check if the number wasn't already written
            out.write('%d\n' % i)  # write the number followed by EOL
            numbers.add(i)  # mark the number as already written
This example assumes that your input file contains a single number on each line. In case of an empty or incorrect line, a warning will be displayed on stdout.
You could also use list in the above example, but it may be less efficient.
Instead of numbers = set() use numbers = [] and instead of numbers.add(i): numbers.append(i). The if condition stays the same.
Don't do that. Use a set() to keep track of all the numbers you have seen. It will only have one of each.
numbers = set()
for line in open("numberfile"):
    numbers.add(int(line.strip()))
open("outputfile", "w").write("\n".join(str(n) for n in numbers))
Note this reads them all, then writes them all out at once. This will put them in a different order than in the original file: a set is unordered, so the output order is arbitrary (wrap numbers in sorted() if you want ascending numeric order). If you don't want that, you can also write them as you read them, but only if they are not already in the set:
numbers = set()
with open("outfile", "w") as outfile:
    for line in open("numberfile"):
        number = int(line.strip())
        if number not in numbers:
            outfile.write(str(number) + "\n")
            numbers.add(number)
Are you working with exceptionally large files? You probably don't want to try to "search" the file you're writing to for a value you just wrote. You (probably) want something more like this:
encountered = set()
with open('file1') as fhi, open('file2', 'w') as fho:
    for line in fhi:
        if line not in encountered:
            encountered.add(line)
            fho.write(line)
If you want to scan through a file to see if it contains a number on any line, you could do something like this:
def file_contains(f, n):
    with f:
        for line in f:
            if int(line.strip()) == n:
                return True
    return False
However as Ned points out in his answer, this isn't a very efficient solution; if you have to search through the file again for each line, the running time of your program will increase proportional to the square of the number of numbers.
If the number of values is not incredibly large, it would be more efficient to use a set. Sets are designed to very efficiently keep track of unordered values. For example:
with open("input_file.txt", "rt") as in_file:
    with open("output_file.txt", "wt") as out_file:
        encountered_numbers = set()
        for line in in_file:
            n = int(line.strip())
            if n not in encountered_numbers:
                encountered_numbers.add(n)
                out_file.write(line)
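The pattern shared by all of these answers (remember what has been seen in a set, and emit each value only the first time) can be separated from the file handling. The following is a minimal Python 3 sketch; the generator name unique_in_order is made up, and the sample list mirrors the numbers from the question:

```python
def unique_in_order(values):
    """Yield each value the first time it appears, preserving input order."""
    seen = set()
    for value in values:
        if value not in seen:  # set membership tests take constant time on average
            seen.add(value)
            yield value


numbers = [3, 3, 67, 46, 67, 3, 46]
print(list(unique_in_order(numbers)))
# prints [3, 67, 46]
```

Since it is a generator, it can be fed the lines of an open input file directly, and each yielded value can be written to the output file as it arrives.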
