Simple question here, but I can't find a definitive answer. I'm writing some python code and I'm confused as to what this line does exactly:
a = []
for x in sys.stdin:
c = x.split()
a.extend(c)
When I run it, it defaults to making a list of words, but why is that? Why does python default to words instead of lines, or even characters from the stdin? I know about the readlines and readline methods, but I'm confused as to what exactly this code is doing with the stdin.
File objects, including sys.stdin, are iterable. Looping over a file object yields lines, so the text from that file up to the next line separator.
It never produces words. Your code produces words because you explicitly split each line:
c = x.split()
str.split() without arguments splits on arbitrary-length whitespace sequences (removing any whitespace from the start and end), effectively giving you a list of words, which are then added to the a list with list.extend().
Related
What is the purpose of readline().strip() (especially in the below code)?
Context:
I was taking a look at the following code:
op = open('encyin.txt', 'r')
n, q = op.readline().split()
n = int(n)
q = int(q)
dic = {}
for i in range(1, n + 1):
dic[str(i)]=(op.readline().strip())
And trying to interpret it.
My Interpretation:
The start is simple enough - it opens a file encyin.txt in read mode. It takes input - n & p - from the line, the .split() separating the two inputs. They are then classified as integers, and an empty list dict is created?
From there, a for loop is utilised.
But what does the last line mean? I am not familiar with (a) readline().strip() and (b) how this affects list dict and the values of the input:
For Example
If ency.txt was the following:
6 5
1151
723
1321
815
780
931
What happens to the other numbers from the 2nd line downwards? Does the readline().split assign them a line number? Does it add it to the list dict, a bit like .append?
What does the last line mean of the top code do? I am not familiar with (a) readline().strip() and (b) how this affects list dict and the values of the input:
In your text file, you have these things called whitespace characters. Often, these are spaces or enters ('\n') that you want to get rid of. The strip() helps you remove these whitespace characters.
If you were to print the numbers after reading them and without stripping, you would get:
number1
number2
number3
...
Because you haven't removed the hidden 'enter' character.
When reading a python script and you come across some function that you don't know, your goal should first to be understand the function out of context, and then you can figure out what they are doing in context.
The first port of call for understanding builtin/standard library functions (as opposed to functions from some extra library) should be the python docs. When the docs fail you, move on to other sources (there are plenty).
In this case, you want to know what op.readline() does. Well, what is op? I would go to open, and see that it creates a file object, which tells you that the actual implementation used is in io. Here we can search the page for readline.
What do the docs have to say about readline?
Read and return one line from the stream.
Here, I would assume, since it's a text file, "a line from the stream" is a string object (but you could always open a python interpreter to check), and look up string.strip(), which says:
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace.
Now put them together. They call (op.readline().strip()).
We know op is a "file object" using io
io's readline reads a single line from the stream
some_string.strip() called without parameters removes all whitespace from the start and end of some_string
Although python uses duck-typing, objects still have types/behaviours and understanding code often involves knowing what kind of object you are dealing with at any point so you can look into how it should work.
For example, if you know something is a dictionary, but you don't know what a dictionary is, you should search the docs for some info and try to understand what it does out of context first.
op = open('encyin.txt', 'r')
n, q = op.readline().split()
n = int(n)
q = int(q)
dic = {}
for i in range(1, n + 1):
# Here you're creating a key-value pair using the str value of the loop variable
# i as the dictionary key i.e. key dic[str(i)] creates the key, and the value is
# op.readline().strip(). strip() is a str method that removes trailing characters.
# the default is to remove whitespace at the beginning and ends of the string.
# These spaces get trimmed off if the method is called
dic[str(i)]=(op.readline().strip())
https://docs.python.org/3/library/stdtypes.html?highlight=str#str.strip
readline() returning a single line as string from your file.
ex: for the given txt file info:
Danni Loss
Shani Amari
Michele favarotti
readline() will return the first line:
Danni Loss\n
then there is a use of strip() removes all empty chars from the start and end of the string, so you will get:
Danni Loss
.readline() reads a line from a file. The result includes a trailing '\n'.
.strip() removes all leading & trailing whitespace (e.g. the above-mentioned '\n') from a string.
Thus, the last line of code dic[str(i)]=(op.readline().strip()) does the following:
Reads line from the open file
Strips whitespace from the line
Stores the stripped line in the dictionary using the index (converted to string) as a key
I currently have a list hard coded into my python code. As it keeps expanding, I wanted to make it more dynamic by reading the list from a file. I have read through many articles about how to do this, but in practice I can't get this working. So firstly, here is an example of the existing hardcoded list:
serverlist = []
serverlist.append(("abc.com", "abc"))
serverlist.append(("def.com", "def"))
serverlist.append(("hji.com", "hji"))
When I enter the command 'print serverlist' the output is shown below and my list works perfectly when I access it:
[('abc.com', 'abc'), ('def.com', 'def'), ('hji.com', 'hji')]
Now I've replaced the above code with the following:
serverlist = []
with open('/server.list', 'r') as f:
serverlist = [line.rstrip('\n') for line in f]
With the contents of server.list being:
'abc.com', 'abc'
'def.com', 'def'
'hji.com', 'hji'
When I now enter the command print serverlist, the output is shown below:
["'abc.com', 'abc'", "'def.com', 'def'", "'hji.com', 'hji'"]
And the list is not working correctly. So what exactly am I doing wrong? Am I reading the file incorrectly or am I formatting the file incorrectly? Or something else?
The contents of the file are not interpreted as Python code. When you read a line in f, it is a string; and the quotation marks, commas etc. in your file are just those characters as parts of a string.
If you want to create some other data structure from the string, you need to parse it. The program has no way to know that you want to turn the string "'abc.com', 'abc'" into the tuple ('abc.com', 'abc'), unless you instruct it to.
This is the point where the question becomes "too broad".
If you are in control of the file contents, then you can simplify the data format to make this more straightforward. For example, if you just have abc.com abc on the line of the file, so that your string ends up as 'abc.com abc', you can then just .split() that; this assumes that you don't need to represent whitespace inside either of the two items. You could instead split on another character (like the comma, in your case) if necessary (.split(',')). If you need a general-purpose hammer, you might want to look into JSON. There is also ast.literal_eval which can be used to treat text as simple Python literal expressions - in this case, you would need the lines of the file to include the enclosing parentheses as well.
If you are willing to let go of the quotes in your file and rewrite it as
abc.com, abc
def.com, def
hji.com, hji
the code to load can be reduced to a one liner using the fact that files are iterables
with open('servers.list') as f:
servers = [tuple(line.split(', ')) for line in f]
Remember that using a file as an iterator already strips off the newlines.
You can allow arbitrary whitespace by doing something like
servers = [tuple(word.strip() for word in line.split(',')) for line in f]
It might be easier to use something like regex to parse the original format. You could use an expression that captures the parts of the line you care about and matches but discards the rest:
import re
pattern = re.compile('\'(.+)\',\\s*\'(.+)\'')
You could then extract the names from the matched groups
with open('servers.list') as f:
servers = [pattern.fullmatch(line).groups() for line in f]
This is just a trivialized example. You can make it as complicated as you wish for your real file format.
Try this:
serverlist = []
with open('/server.list', 'r') as f:
for line in f:
serverlist.append(tuple(line.rstrip('\n').split(',')))
Explanation
You want an explicit for loop so you cycle through each line as expected.
You need list.append for each line to append to your list.
You need to use split(',') in order to split by commas.
Convert to tuple as this is your desired output.
List comprehension method
The for loop can be condensed as below:
with open('/server.list', 'r') as f:
serverlist = [tuple(line.rstrip('\n').split(',')) for line in f]
I am trying to use a very basic text file as a settings file. Three lines repeat in this order/format that govern some settings/input for my program. Text file is as follows:
Facebook
1#3#5#2
Header1#Header2#Header3#Header4
...
This is read in using the following Python code:
f = open('settings.txt', 'r')
for row in f:
platform = f.readline()
rows_to_keep = int(f.readline().split('#'))
row_headers = f.readline().split('#')
clean_output(rows_to_keep, row_headers, platform)
I would expect single string to be read in platform, an array of ints in the second and an array of strings in the third. These are then passed to the function and this is repeated numerous times.
However, the following three things are happening:
Int doesn't convert and I get a TypeError
First line in text file is ignored and I get rows to keep in platform
\n at the end of each line
I suspect these are related and so am only posting one question.
You cannot call int on a list, you need do do some kind of list comprehension like
rows_to_keep = [int(a) for a in f.readline().split('#')]
You're reading a line, then reading another line from the file. You should either do some kind of slicing (see Python how to read N number of lines at a time) or call a function with the three lines after every third iteration.
use .strip() to remove end of lines and other whitespace.
Try this:
with open('settings.txt', 'r') as f:
platform, rows_to_keep, row_headers = f.read().splitlines()
rows_to_keep = [int(x) for x in rows_to_keep.split('#')]
row_headers = row_headers.split('#')
clean_output(rows_to_keep, row_headers, platform)
There are several things going on here. First, when you do the split on the second line, you're trying to cast a list to type int. That won't work. You can, instead, use map.
rows_to_keep = map(int,f.readline().strip().split("#"))
Additionally, you see the strip() method above. That removes trailing whitespace chars from your line, ie: \n.
Try that change and also using strip() on each readline() call.
With as few changes as possible, I've attempted to solve your issues and show you where you went wrong. #Daniel's answer is how I would personally solve the issues.
f = open('settings.txt', 'r')
#See 1. We remove the unnecessary for loop
platform = f.readline()
#See 4. We make sure there are no unwanted leading or trailing characters by stripping them out
rows_to_keep = f.readline().strip().split('#')
#See 3. The enumerate function creates a list of pairs [index, value]
for row in enumerate(rows_to_keep):
rows_to_keep[row[0]] = int(row[1])
row_headers = f.readline().strip().split('#')
#See 2. We close the file when we're done reading
f.close()
clean_output(rows_to_keep, row_headers, platform)
You don't need (and don't want) a for loop on f, as well as calls to readline. You should pick one or the other.
You need to close f with f.close().
You cannot convert a list to an int, you want to convert the elements in the list to int. This can be accomplished with a for loop.
You probably want to call .strip to get rid of trailing newlines.
I want to find strings listed in list.txt (one string per line) in another text file in case I found it print 'string,one_sentence' in case didn't find 'string,another_sentence'. I'm using following code, but it is finding only last string in the strings list from file list.txt. Cannot understand what could be the reason?
data = open('c:/tmp/textfile.TXT').read()
for x in open('c:/tmp/list.txt').readlines():
if x in data:
print(x,',one_sentence')
else:
print(x,',another_sentence')
When you read a file with readlines(), the resulting list elements do have a trailing newline characters. Likely, these are the reason why you have less matches than you expected.
Instead of writing
for x in list:
write
for x in (s.strip() for s in list):
This removes leading and trailing whitespace from the strings in list. Hence, it removes trailing newline characters from the strings.
In order to consolidate your program, you could do something like this:
with open('c:/tmp/textfile.TXT') as f:
haystack = f.read()
if not haystack:
sys.exit("Could not read haystack data :-(")
with open('c:/tmp/list.txt') as f:
for needle in (line.strip() for line in f):
if needle in haystack:
print(needle, ',one_sentence')
else:
print(needle, ',another_sentence')
I did not want to make too drastic changes. The most important difference is that I am using the context manager here via the with statement. It ensures proper file handling (mainly closing) for you. Also, the 'needle' lines are stripped on the fly using a generator expression. The above approach reads and processes the needle file line by line instead of loading the whole file into memory at once. Of course, this only makes a difference for large files.
readlines() keeps a newline character at the end of each string read from your list file. Call strip() on those strings to remove those (and every other whitespace) characters.
I have a function that can only accept strings. (it creates the image with the string, but the string has little formatting and no word wrapping, so a long string will just bleed right through the edge of the image and keep going into the abyss, when in reality I would have liked it to create a paragraph, instead of a one line infinity).
I need it print with line breaks. Currently the file is being readin using
inputFiles.readlines()
so that this reads the entire file. Storing file.readLines() creates a list. So this list cannot be passed to my function looking for a string.
I used
inputFileContent = ' \n'.join(inputFiles.readLines())
in an attempt to force hard line breaks into the string between each list item. This does not work (edit: elaboration here) which means that the inputFileContent string does not have line breaks even though I put '\n' between the list elements. From my understanding, the readLines() function puts the individual lines into individual elements of a list.
any suggestions? Thank you
Use inputFiles.read() which creates a string. Does that help?
The 'join' should have worked. Your problem may be that the writing of the string ignores newline characters. You could maybe try '\r\n'.join(...)