Python when splitting line in file it leaves newline in the end

Python when splitting line in file it leaves newline in the end - python

I'm trying to make a simple question/answer program, where the questions are written in a normal text file like
this
Problem is when i split the code (at the #) it also leaves the newline, meaning anyone using the program would have to add the newline to the answer. Any way to remove that newline so only the answer is required?
Code:
file1 = open("file1.txt", "r")
p = 0
for line in file:
list = line.split("#")
answer = input(list[0])
if answer == list[1]:
p = p + 1
print("points:",p)

Strip all the whitespace from the right side of your input:
list[1] = list[1].rstrip()
Or just the newlines:
list[1] = list[1].rstrip('\n')
Using list as a variable name is a bad idea because it shadows the built-in class name. You can use argument unpacking to avoid the issue entirely and make your code more legible:
prompt, expected = line.split('#')
expected = expected.rstrip()

You can use rstrip(), which, without arguments, will default to stripping any whitespace character. So just modify the conditional slightly, like this.
if answer == list[1].rstrip():

Related

What is the purpose of readline().strip()?

What is the purpose of readline().strip() (especially in the below code)?
Context:
I was taking a look at the following code:
op = open('encyin.txt', 'r')
n, q = op.readline().split()
n = int(n)
q = int(q)
dic = {}
for i in range(1, n + 1):
dic[str(i)]=(op.readline().strip())
And trying to interpret it.
My Interpretation:
The start is simple enough - it opens a file encyin.txt in read mode. It takes input - n & p - from the line, the .split() separating the two inputs. They are then classified as integers, and an empty list dict is created?
From there, a for loop is utilised.
But what does the last line mean? I am not familiar with (a) readline().strip() and (b) how this affects list dict and the values of the input:
For Example
If ency.txt was the following:
6 5
1151
723
1321
815
780
931
What happens to the other numbers from the 2nd line downwards? Does the readline().split assign them a line number? Does it add it to the list dict, a bit like .append?
What does the last line mean of the top code do? I am not familiar with (a) readline().strip() and (b) how this affects list dict and the values of the input:

In your text file, you have these things called whitespace characters. Often, these are spaces or enters ('\n') that you want to get rid of. The strip() helps you remove these whitespace characters.
If you were to print the numbers after reading them and without stripping, you would get:
number1
number2
number3
...
Because you haven't removed the hidden 'enter' character.

When reading a python script and you come across some function that you don't know, your goal should first to be understand the function out of context, and then you can figure out what they are doing in context.
The first port of call for understanding builtin/standard library functions (as opposed to functions from some extra library) should be the python docs. When the docs fail you, move on to other sources (there are plenty).
In this case, you want to know what op.readline() does. Well, what is op? I would go to open, and see that it creates a file object, which tells you that the actual implementation used is in io. Here we can search the page for readline.
What do the docs have to say about readline?
Read and return one line from the stream.
Here, I would assume, since it's a text file, "a line from the stream" is a string object (but you could always open a python interpreter to check), and look up string.strip(), which says:
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace.
Now put them together. They call (op.readline().strip()).
We know op is a "file object" using io
io's readline reads a single line from the stream
some_string.strip() called without parameters removes all whitespace from the start and end of some_string
Although python uses duck-typing, objects still have types/behaviours and understanding code often involves knowing what kind of object you are dealing with at any point so you can look into how it should work.
For example, if you know something is a dictionary, but you don't know what a dictionary is, you should search the docs for some info and try to understand what it does out of context first.

op = open('encyin.txt', 'r')
n, q = op.readline().split()
n = int(n)
q = int(q)
dic = {}
for i in range(1, n + 1):
# Here you're creating a key-value pair using the str value of the loop variable
# i as the dictionary key i.e. key dic[str(i)] creates the key, and the value is
# op.readline().strip(). strip() is a str method that removes trailing characters.
# the default is to remove whitespace at the beginning and ends of the string.
# These spaces get trimmed off if the method is called
dic[str(i)]=(op.readline().strip())
https://docs.python.org/3/library/stdtypes.html?highlight=str#str.strip

readline() returning a single line as string from your file.
ex: for the given txt file info:
Danni Loss
Shani Amari
Michele favarotti
readline() will return the first line:
Danni Loss\n
then there is a use of strip() removes all empty chars from the start and end of the string, so you will get:
Danni Loss

.readline() reads a line from a file. The result includes a trailing '\n'.
.strip() removes all leading & trailing whitespace (e.g. the above-mentioned '\n') from a string.
Thus, the last line of code dic[str(i)]=(op.readline().strip()) does the following:
Reads line from the open file
Strips whitespace from the line
Stores the stripped line in the dictionary using the index (converted to string) as a key

Unexpected output from textfile - cleaning read in lines correctly

I am trying to use a very basic text file as a settings file. Three lines repeat in this order/format that govern some settings/input for my program. Text file is as follows:
Facebook
1#3#5#2
Header1#Header2#Header3#Header4
...
This is read in using the following Python code:
f = open('settings.txt', 'r')
for row in f:
platform = f.readline()
rows_to_keep = int(f.readline().split('#'))
row_headers = f.readline().split('#')
clean_output(rows_to_keep, row_headers, platform)
I would expect single string to be read in platform, an array of ints in the second and an array of strings in the third. These are then passed to the function and this is repeated numerous times.
However, the following three things are happening:
Int doesn't convert and I get a TypeError
First line in text file is ignored and I get rows to keep in platform
\n at the end of each line
I suspect these are related and so am only posting one question.

You cannot call int on a list, you need do do some kind of list comprehension like
rows_to_keep = [int(a) for a in f.readline().split('#')]
You're reading a line, then reading another line from the file. You should either do some kind of slicing (see Python how to read N number of lines at a time) or call a function with the three lines after every third iteration.
use .strip() to remove end of lines and other whitespace.

Try this:
with open('settings.txt', 'r') as f:
platform, rows_to_keep, row_headers = f.read().splitlines()
rows_to_keep = [int(x) for x in rows_to_keep.split('#')]
row_headers = row_headers.split('#')
clean_output(rows_to_keep, row_headers, platform)

There are several things going on here. First, when you do the split on the second line, you're trying to cast a list to type int. That won't work. You can, instead, use map.
rows_to_keep = map(int,f.readline().strip().split("#"))
Additionally, you see the strip() method above. That removes trailing whitespace chars from your line, ie: \n.
Try that change and also using strip() on each readline() call.

With as few changes as possible, I've attempted to solve your issues and show you where you went wrong. #Daniel's answer is how I would personally solve the issues.
f = open('settings.txt', 'r')
#See 1. We remove the unnecessary for loop
platform = f.readline()
#See 4. We make sure there are no unwanted leading or trailing characters by stripping them out
rows_to_keep = f.readline().strip().split('#')
#See 3. The enumerate function creates a list of pairs [index, value]
for row in enumerate(rows_to_keep):
rows_to_keep[row[0]] = int(row[1])
row_headers = f.readline().strip().split('#')
#See 2. We close the file when we're done reading
f.close()
clean_output(rows_to_keep, row_headers, platform)
You don't need (and don't want) a for loop on f, as well as calls to readline. You should pick one or the other.
You need to close f with f.close().
You cannot convert a list to an int, you want to convert the elements in the list to int. This can be accomplished with a for loop.
You probably want to call .strip to get rid of trailing newlines.

Trying to replace \t with \s with regex in Python, but as a result "Unhashable type:list" error

I am new to programming and have already checked other people's questions to make sure that I am using a good method to replace tabs with spaces, know my regex is correct, and also understand what exactly my error is ("Unhashable type 'list'). But even still, I'm at a loss of what to do. Any help would be great!
I have a large file that I have broken up into lines. Ultimately I will need to access the first 3 elements of each line. Currently when I print a line, without the additional re.sub line of code, I get something like this: ['blah\tblah\tblah'], when I want ['blah blah blah'].
My code to do this is
f = open(text.txt)
raw = f.read()
raw = raw.lower()
lines = raw.splitlines()
lines = re.sub(r'\t', lines, '\s')
print lines[0:2] #just to see the first few examples
f.close()
When I print the first few lines without the regex sub bit, it works fine. And then when I add that line in attempt to change the lines, I get the error. I understand that lists are changeable and thus can't be a hashed... but I'm not trying to work with a hash. I'm just trying to replace \t with \s in a large text file to make the program easier to work with. I don't think there is a problem with how I am changing \t's to \s's, because according to this error, any way I change it will break my code. What do I do?! Any help is super appreciated. :')

You need to change the order of params present inside the re.sub function. And also note that you can't use regex \s as a second param in re.sub function. Syntax of re.sub must be re.sub(regex,replacement,string) .
lines = raw.splitlines()
lines = [re.sub(r'\t', ' ', line) for line in lines]
raw.splitlines() returns a list which was then assigned to a variable called lines. So you need to apply the re.sub function to each item present in the list, since re.sub won't directly be applied on a list.

Reading and correctly understanding/interpreting control characters from a file (python)

I'm a python beginner and just ran into a simple problem: I have a list of names (designators) and then a very simple code that reads lines in a csv file and prints the csv lines that has a name in the first column (row[0]) in common with my "designator list". So:
import csv
DesignatorList = ["AAX-435", "AAX-961", "HHX-9387", "HHX-58", "K-58", "K-14", "K-78524"]
with open('DesignatorFile.csv','rb') as FileReader:
for row in csv.reader(FileReader, delimiter=';'):
if row[0] in DesignatorList:
print row
My csv files is only a list of names, like this:
AAX-435
AAX-961
HHX-58
HHX-9387
I would like to be able to use wildcards like * and ., example: let's say that I put this on my csv file:
AAX*
H.X-9387
*58
I need my code to be able to interpret those wild cards/control characters, printing the following:
every line that starts with "AAX";
every line that starts with "H", then any following character, then finally ends with "X-9387";
every line that ends with "58".
Thank you!
EDIT: For future reference (in case somebody runs into the same problem), this is how I solved my problem following Roman advice:
import csv
import re
DesignatorList = ["AAX-435", "AAX-961", "HHX-9387", "HHX-58", "K-58", "K-14", "K-78524"]
with open('DesignatorFile.txt','rb') as FileReader:
for row in csv.reader(FileReader, delimiter=';'):
designator_col0 = row[0]
designator_col0_re = re.compile("^" + ".*".join(re.escape(i) for i in designator_col0.split("*")) + "$")
for d in DesignatorList:
if designator_col0_re.match(d):
print d

Try the re module.
You may need to prepare regular expression (regex) for use by replacing '*' with '.*' and adding ^ (beginning of a string) and $ (end of string) to the beginning and the end of the regular expression. In addition, you may need to escape everything else by re.escape function (that is, function escape from module re).
In case you do not have any other "control characters" (as you call them), splitting the string by "*" and joining by ".*" after applying escape.
For example,
import re
def make_rule(rule): # where rule for example "H*X-9387"
return re.compile("^" + ".*".join(re.escape(i) for i in rule.split("*")) + "$")
Then you can match (I guess, your rule is row):
...
rule_re = make_rule(row)
for d in DesignatorList:
if rule_re.match(d):
print row # or maybe print d
(I have understood, that rules are coming from CSV file while designators are from a list. It's easy to do it the other way around).
The examples above are examples. You still need to adapt them into your program.

Python's string object does have a startswith and an endswith method, which you could use here if you only had a small number of rules. The most general way to go with this, since you seem to have fairly simple patterns, is regular expressions. That way you can encode those rules as patterns.
import re
rules = ['^AAX.*$', # starts with AAX
'^H.*X-9387$', # starts with H, ends with X-9387
'^.*58$'] # ends with 58
for line in reader:
if any(re.match(rule, line) for rule in rules):
print line

Python: How to ignore #comment lines when reading in a file

In Python, I have just read a line form a text file and I'd like to know how to code to ignore comments with a hash # at the beginning of the line.
I think it should be something like this:
for
if line !contain #
then ...process line
else end for loop
But I'm new to Python and I don't know the syntax

you can use startswith()
eg
for line in open("file"):
li=line.strip()
if not li.startswith("#"):
print line.rstrip()

I recommend you don't ignore the whole line when you see a # character; just ignore the rest of the line. You can do that easily with a string method function called partition:
with open("filename") as f:
for line in f:
line = line.partition('#')[0]
line = line.rstrip()
# ... do something with line ...
partition returns a tuple: everything before the partition string, the partition string, and everything after the partition string. So, by indexing with [0] we take just the part before the partition string.
EDIT:
If you are using a version of Python that doesn't have partition(), here is code you could use:
with open("filename") as f:
for line in f:
line = line.split('#', 1)[0]
line = line.rstrip()
# ... do something with line ...
This splits the string on a '#' character, then keeps everything before the split. The 1 argument makes the .split() method stop after a one split; since we are just grabbing the 0th substring (by indexing with [0]) you would get the same answer without the 1 argument, but this might be a little bit faster. (Simplified from my original code thanks to a comment from #gnr. My original code was messier for no good reason; thanks, #gnr.)
You could also just write your own version of partition(). Here is one called part():
def part(s, s_part):
i0 = s.find(s_part)
i1 = i0 + len(s_part)
return (s[:i0], s[i0:i1], s[i1:])
#dalle noted that '#' can appear inside a string. It's not that easy to handle this case correctly, so I just ignored it, but I should have said something.
If your input file has simple enough rules for quoted strings, this isn't hard. It would be hard if you accepted any legal Python quoted string, because there are single-quoted, double-quoted, multiline quotes with a backslash escaping the end-of-line, triple quoted strings (using either single or double quotes), and even raw strings! The only possible way to correctly handle all that would be a complicated state machine.
But if we limit ourselves to just a simple quoted string, we can handle it with a simple state machine. We can even allow a backslash-quoted double quote inside the string.
c_backslash = '\\'
c_dquote = '"'
c_comment = '#'
def chop_comment(line):
# a little state machine with two state varaibles:
in_quote = False # whether we are in a quoted string right now
backslash_escape = False # true if we just saw a backslash
for i, ch in enumerate(line):
if not in_quote and ch == c_comment:
# not in a quote, saw a '#', it's a comment. Chop it and return!
return line[:i]
elif backslash_escape:
# we must have just seen a backslash; reset that flag and continue
backslash_escape = False
elif in_quote and ch == c_backslash:
# we are in a quote and we see a backslash; escape next char
backslash_escape = True
elif ch == c_dquote:
in_quote = not in_quote
return line
I didn't really want to get this complicated in a question tagged "beginner" but this state machine is reasonably simple, and I hope it will be interesting.

I'm coming at this late, but the problem of handling shell style (or python style) # comments is a very common one.
I've been using some code almost everytime I read a text file.
Problem is that it doesn't handle quoted or escaped comments properly. But it works for simple cases and is easy.
for line in whatever:
line = line.split('#',1)[0].strip()
if not line:
continue
# process line
A more robust solution is to use shlex:
import shlex
for line in instream:
lex = shlex.shlex(line)
lex.whitespace = '' # if you want to strip newlines, use '\n'
line = ''.join(list(lex))
if not line:
continue
# process decommented line
This shlex approach not only handles quotes and escapes properly, it adds a lot of cool functionality (like the ability to have files source other files if you want). I haven't tested it for speed on large files, but it is zippy enough of small stuff.
The common case when you're also splitting each input line into fields (on whitespace) is even simpler:
import shlex
for line in instream:
fields = shlex.split(line, comments=True)
if not fields:
continue
# process list of fields

This is the shortest possible form:
for line in open(filename):
if line.startswith('#'):
continue
# PROCESS LINE HERE
The startswith() method on a string returns True if the string you call it on starts with the string you passed in.
While this is okay in some circumstances like shell scripts, it has two problems. First, it doesn't specify how to open the file. The default mode for opening a file is 'r', which means 'read the file in binary mode'. Since you're expecting a text file it is better to open it with 'rt'. Although this distinction is irrelevant on UNIX-like operating systems, it's important on Windows (and on pre-OS X Macs).
The second problem is the open file handle. The open() function returns a file object, and it's considered good practice to close files when you're done with them. To do that, call the close() method on the object. Now, Python will probably do this for you, eventually; in Python objects are reference-counted, and when an object's reference count goes to zero it gets freed, and at some point after an object is freed Python will call its destructor (a special method called __del__). Note that I said probably: Python has a bad habit of not actually calling the destructor on objects whose reference count drops to zero shortly before the program finishes. I guess it's in a hurry!
For short-lived programs like shell scripts, and particularly for file objects, this doesn't matter. Your operating system will automatically clean up any file handles left open when the program finishes. But if you opened the file, read the contents, then started a long computation without explicitly closing the file handle first, Python is likely to leave the file handle open during your computation. And that's bad practice.
This version will work in any 2.x version of Python, and fixes both the problems I discussed above:
f = open(file, 'rt')
for line in f:
if line.startswith('#'):
continue
# PROCESS LINE HERE
f.close()
This is the best general form for older versions of Python.
As suggested by steveha, using the "with" statement is now considered best practice. If you're using 2.6 or above you should write it this way:
with open(filename, 'rt') as f:
for line in f:
if line.startswith('#'):
continue
# PROCESS LINE HERE
The "with" statement will clean up the file handle for you.
In your question you said "lines that start with #", so that's what I've shown you here. If you want to filter out lines that start with optional whitespace and then a '#', you should strip the whitespace before looking for the '#'. In that case, you should change this:
if line.startswith('#'):
to this:
if line.lstrip().startswith('#'):
In Python, strings are immutable, so this doesn't change the value of line. The lstrip() method returns a copy of the string with all its leading whitespace removed.

I've found recently that a generator function does a great job of this. I've used similar functions to skip comment lines, blank lines, etc.
I define my function as
def skip_comments(file):
for line in file:
if not line.strip().startswith('#'):
yield line
That way, I can just do
f = open('testfile')
for line in skip_comments(f):
print line
This is reusable across all my code, and I can add any additional handling/logging/etc. that I need.

I know that this is an old thread, but this is a generator function that I
use for my own purposes. It strips comments no matter where they
appear in the line, as well as stripping leading/trailing whitespace and
blank lines. The following source text:
# Comment line 1
# Comment line 2
# host01 # This host commented out.
host02 # This host not commented out.
host03
host04 # Oops! Included leading whitespace in error!
will yield:
host02
host03
host04
Here is documented code, which includes a demo:
def strip_comments(item, *, token='#'):
"""Generator. Strips comments and whitespace from input lines.
This generator strips comments, leading/trailing whitespace, and
blank lines from its input.
Arguments:
item (obj): Object to strip comments from.
token (str, optional): Comment delimiter. Defaults to ``#``.
Yields:
str: Next uncommented non-blank line from ``item`` with
comments and leading/trailing whitespace stripped.
"""
for line in item:
s = line.split(token, 1)[0].strip()
if s:
yield s
if __name__ == '__main__':
HOSTS = """# Comment line 1
# Comment line 2
# host01 # This host commented out.
host02 # This host not commented out.
host03
host04 # Oops! Included leading whitespace in error!""".split('\n')
hosts = strip_comments(HOSTS)
print('\n'.join(h for h in hosts))
The normal use case will be to strip the comments from a file (i.e., a hosts file, as in my example above). If this is the case, then the tail end of the above code would be modified to:
if __name__ == '__main__':
with open('aa.txt', 'r') as f:
hosts = strip_comments(f)
for host in hosts:
print('\'%s\'' % host)

A more compact version of a filtering expression can also look like this:
for line in (l for l in open(filename) if not l.startswith('#')):
# do something with line
(l for ... ) is called "generator expression" which acts here as a wrapping iterator that will filter out all unneeded lines from file while iterating over it. Don't confuse it with the same thing in square brakets [l for ... ] which is a "list comprehension" that will first read all the lines from the file into memory and only then will start iterating over it.
Sometimes you might want to have it less one-liney and more readable:
lines = open(filename)
lines = (l for l in lines if ... )
# more filters and mappings you might want
for line in lines:
# do something with line
All the filters will be executed on the fly in one iteration.

Use regex re.compile("^(?:\s+)*#|(?:\s+)") to skip the new lines and comments.

I tend to use
for line in lines:
if '#' not in line:
#do something
This will ignore the whole line, though the answer which includes rpartition has my upvote as it can include any information from before the #

a good thing to get rid of coments that works for both inline and on a line
def clear_coments(f):
new_text = ''
for line in f.readlines():
if "#" in line: line = line.split("#")[0]
new_text += line
return new_text

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.