rstrip not working as expected (Python 2.7) - python

I have the following code:
file = open("file", "r")
array = file.readlines()
stats = [1, 1, 1, 1, 1] # creating an array to fill
print array
sh1 = array[1] # breaking the array extracted from the text file up for editing
sh2 = array[2]
sh3 = array[3]
sh4 = array[4]
stats[0] = string.rstrip(sh1[1])
stats[1] = string.rstrip(sh2[1])
stats[2] = string.rstrip(sh3[1])
stats[3] = string.rstrip(sh4[1])
print stats
I was expecting it to strip the newlines from the array extracted from the text file and place the new data into a separate array. What is instead happening is I'm having a seemingly random amount of characters stripped from either end of my variables. Please could someone explain what I've done wrong?

sh1,sh2,sh3,sh4 are strings, so sh1[1] is the second character from the string.
rstrip will remove trailing whitespace, so you will put either 1 or 0 character strings into your result array.
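To make that concrete, here is a minimal sketch with made-up sample strings (they are not from the asker's file) showing what indexing a line does before rstrip ever runs:

```python
sh1 = "hello\n"                 # a line as readlines() would return it (sample data)
second_char = sh1[1]            # indexing a string yields a single character: 'e'
print(repr(second_char.rstrip()))   # stripping one non-space character leaves it unchanged

blank = "a \n"                  # here the second character is a space
print(repr(blank[1].rstrip()))      # stripping a lone space leaves an empty string

print(repr(sh1.rstrip()))           # stripping the whole line is what was intended
```

So each entry in stats ends up as a string of length 1 or 0, never the full line.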
I suspect you want something like:
stats = []
for line in open("file").readlines():
    line = line.rstrip()
    stats.append(line)
print stats
or all on one line:
print [ l.rstrip() for l in open("file").readlines() ]

Use a list comprehension:
array = file.readlines()
print [i.rstrip() for i in array]

You should open the file using with, you don't need to call readlines first. You can simply iterate over the file object in a list comprehension calling rstrip on each line:
with open("file") as f: # with closes your file automatically
    stats = [line.rstrip() for line in f]
Your code gives seemingly random results because of what you hand to rstrip: you index the second character of the second, third, fourth and fifth lines (sh1[1], sh2[1], and so on) instead of working with the whole line, so the outcome depends on whatever that character happens to be. Called with no argument, rstrip removes trailing whitespace; called with a string, it removes any trailing characters that appear in that string (the argument is a set of characters, not a substring):
In [3]: "foobar".rstrip("bar")
Out[3]: 'foo'
In [4]: "foobar \n".rstrip()
Out[4]: 'foobar'
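A short sketch of the "set of characters" behaviour, using sample strings chosen only for illustration:

```python
# The argument to rstrip is a set of characters, not a suffix to match.
a = "foobar".rstrip("rab")        # same result as rstrip("bar")
b = "mississippi".rstrip("ip")    # strips trailing 'i' and 'p' in any order
c = "foobar \n".rstrip()          # no argument: strips trailing whitespace
print(a, b, c)
```

Stripping stops at the first character, counting from the right, that is not in the set.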
Note that rstrip never removes anything from the front of a string; the front only changes if the entire string is stripped away. Lastly, if you actually want to skip the first line and start at line 2, simply call next(f) on the file object before you iterate in the comprehension.
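Skipping the first line with next can be sketched like this. An io.StringIO object with invented contents stands in for the real file, so the snippet runs without any file on disk:

```python
import io

# Stand-in for open("file"); the contents are made up for illustration.
f = io.StringIO(u"header\nalpha  \nbeta\t\ngamma\n")

next(f)                                  # consume and discard the first line
stats = [line.rstrip() for line in f]    # iteration continues from line 2
print(stats)
```

With a real file, the same two statements would sit inside the with block.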

Related

How to convert txt file into 2d array of each char

I am trying to read a text file I created, which looks like this:
small.txt
%%%%%%%%%%%%%%%%%%%%%%%
%eeeeeee%eeeee%eeeee%G%
%%%e%e%%%%%e%e%%%e%e%e%
%e%e%eeeeeee%eee%e%eee%
%e%e%e%e%%%e%%%e%e%%%e%
%eeeee%eee%eeeeeeeee%e%
%e%%%e%e%e%e%e%e%%%%%e%
%e%e%eee%e%e%eeeeeee%e%
%e%e%e%%%e%%%%%e%e%%%e%
%Pee%eeeeeeeee%e%eeeee%
%%%%%%%%%%%%%%%%%%%%%%%
I want to create a 2D array board[21][11] in the specific situation.
I want each char to be in a cell, because I want to implement BFS and other algorithms to reach a specific path, it's a kind of Pacman game.
Here is my code:
f = open("small.txt", "r")
output_list = []
for rec in f:
    chars = rec.split()
    print chars
inner_list = []
for each in chars:
    inner_list.append(each)
output_list.append(inner_list)
print output_list
As you see, the output I get now is [[%%%%%%%%%%%%%%%%%%%%%%%]]
You can just do:
with open('small.txt') as f:
    board = f.readlines()
The file.readlines() method will return a list of strings, which you can then use as a 2D array:
>>> board[1][5]
'e'
Note that with this approach, the newline characters ('\n') will be kept at the last index of each row. To get rid of them, you can use str.rstrip:
board = [row.rstrip('\n') for row in board]
As another answer noted, the line strings are already indexable by integer, but if you really want a list of lists:
array = [list(line.strip()) for line in f]
That removes the line endings and converts each string to a list.
There are a few problems with your code:
you try to split lines into lists of chars using split, but that will only split at spaces
assuming your indentation is correct, you are only ever treating the last value of chars in your second loop
that second loop just wraps each of the (unsplit) lines in chars (which, due to the previous issue, is only the last one) into a list
Instead, you can just convert str to list...
>>> list("abcde")
['a', 'b', 'c', 'd', 'e']
... and put those into output_list directly. Also, don't forget to strip the \n:
f = open("small.txt", "r")
output_list = []
for rec in f:
    chars = list(rec.strip())
    output_list.append(chars)
Or using with for autoclosing and a list-comprehension:
with open("small.txt") as f:
    output_list = [list(line.strip()) for line in f]
Note, however, that if you do not want to change the values in that grid, you do not have to convert to a list of lists of chars at all; a list of strings will work just as well.
output_list = list(map(str.strip, f))
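The difference between the two representations can be sketched as follows. The tiny maze and the names rows, grid and start are invented for the example; io.StringIO stands in for small.txt:

```python
import io

# A tiny made-up maze standing in for small.txt.
maze_file = io.StringIO(u"%%%%%\n%PeG%\n%%%%%\n")

rows = [line.rstrip("\n") for line in maze_file]   # list of strings: cells are read-only
grid = [list(row) for row in rows]                  # list of lists: cells can be changed

print(rows[1][1])          # both forms support [row][column] indexing
grid[1][2] = "."           # marking a visited cell only works on the list form
print("".join(grid[1]))

# Locate the start cell by scanning every row and column.
start = next((r, c) for r, row in enumerate(rows)
             for c, ch in enumerate(row) if ch == "P")
print(start)
```

For a read-only search such as BFS with a separate visited set, the list of strings is enough; the list of lists is only needed if the board itself gets modified.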

extract the dimensions from the head lines of text file

Please see following attached image showing the format of the text file. I need to extract the dimensions of data matrix indicated by the first line in the file, here 49 * 70 * 1 for the case shown by the image. Note that the length of name "gd_fac" can be varying. How can I extract these numbers as integers? I am using Python 3.6.
The specification is not very clear. I am assuming that the information you want will always be in the first line, and always be in parentheses. After that:
with open(filename) as infile:
    line = infile.readline()
string = line[line.find('(')+1:line.find(')')]
lst = [int(n) for n in string.split('x')]  # convert each piece to an integer
This will create the list lst = [49, 70, 1].
What is happening here:
First I open the file (you will need to replace filename with the name of your file, as a string). The with ... as ... structure ensures that the file is closed after use. Then I read the first line. After that, I select only the part of that line that falls after the open paren ( and before the close paren ). Finally, I break the string into parts, with the character x as the separator, and convert each part to an integer. This creates a list that contains the values in the first line of the file which fall between parentheses and are separated by x.
Since you have mentioned that the length of 'gd_fac' can vary, the best solution is to use a regular expression.
import re
with open("a.txt") as fh:
    for line in fh:
        if '(' in line and ')' in line:
            dimension = re.findall(r'.*\((.*)\)', line)[0]
            break
print(dimension)
Output:
49x70x1
This looks for a line containing "gd_fac". If one is found, it removes everything you don't need and leaves just the dimensions.
with open('test.txt', 'r') as infile:
    for line in infile:
        if "gd_fac" in line:
            line = line.replace("gd_fac", "")
            line = line.replace("x", "*")
            line = line.replace("(", "")
            line = line.replace(")", "")
            print(line)
            break
OUTPUT: "49*70*1"
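Since the question asks for integers, here is a minimal sketch that combines the regular-expression idea with an int conversion. The header string below is a guess at the file's first line (the original image is not available), so treat its exact layout as an assumption:

```python
import re

# Hypothetical first line of the file; the real layout is an assumption.
header = "gd_fac (49x70x1)\n"

# Capture whatever sits between the first '(' and the next ')'.
match = re.search(r"\(([^)]*)\)", header)
dims = [int(n) for n in match.group(1).split("x")] if match else []
print(dims)
```

Because the pattern only looks at the parentheses, it does not matter how long the name in front of them is.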

Removes white spaces while reading in a file

with open(filename, "r") as f:
    for line in f:
        line = (' '.join(line.strip().split())).split()
Can anyone break down the line where whitespaces get removed?
I understand line.strip().split() first removes leading and trailing spaces from line then the resulting string gets split on whitespaces and stores all words in a list.
But what does the remaining code do?
The line ' '.join(line.strip().split()) creates a string consisting of all the list elements separated by exactly one whitespace character. Applying split() method on this string again returns a list containing all the words in the string which were separated by a whitespace character.
Here's a breakdown:
# Opens the file
with open(filename, "r") as f:
    # Iterates through each line
    for line in f:
        # Rewriting this line, below:
        # line = (' '.join(line.strip().split())).split()
        # Assuming line was " foo bar quux "
        stripped_line = line.strip()     # "foo bar quux"
        parts = stripped_line.split()    # ["foo", "bar", "quux"]
        joined = ' '.join(parts)         # "foo bar quux"
        parts_again = joined.split()     # ["foo", "bar", "quux"]
Is this what you were looking for?
That code is pointlessly complicated is what it is.
There is no need to strip if you're doing a no-arg split next (a no-arg split drops leading and trailing whitespace as a side effect), so line.strip().split() can simplify to line.split().
The join and re-split don't change a thing: join sticks the pieces from the first split back together with single spaces, then split re-splits on those very same spaces. So you could save the time spent joining only to split again and just keep the results of the first split, changing it to:
line = line.split()
and it would be functionally identical to the original:
line = (' '.join(line.strip().split())).split()
and faster to boot. I'm guessing the code you were handed was written by someone who didn't understand splitting and joining either, and just threw stuff at their problem without understanding what it did.
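The equivalence is easy to check on a few sample lines. The strings below are made up and cover leading, trailing and mixed whitespace as well as empty input:

```python
samples = ["  foo   bar\tquux \n", "", "   \n", "single"]

results = []
for line in samples:
    long_form = (' '.join(line.strip().split())).split()   # the original expression
    short_form = line.split()                               # the simplified version
    results.append(long_form == short_form)
    print(repr(line), short_form)

print(all(results))
```

This is only a spot check on a handful of inputs, but it matches the reasoning above: a no-arg split already discards all runs of whitespace.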
Here is an explanation of the code:
with open(filename, "r") as f:
    for line in f:
        line = (' '.join(line.strip().split())).split()
First, line.strip() removes leading and trailing whitespace from the line, and .split() breaks the result into a list on whitespace.
Then .join converts that list back into a single string with the words separated by one space each. Finally, .split() converts it into a list again.
The expression line = (' '.join(line.strip().split())).split() is superfluous. It should be:
line = line.split()
If you want to strip each word again, use:
line = map(str.strip, line.split())
I think they are doing this to maintain a constant amount of whitespace. The strip is removing all whitespace (could be 5 spaces and a tab), and then they are adding back in the single space in its place.

read line from file but store as list (python)

I want to read a specific line in a text file and store the elements in a list.
My textfile looks like this
'item1' 'item2' 'item3'
I always end up with a list with every letter as an element
What I tried:
line = file.readline()
for u in line:
    #do something
line = file.readline()
for u in line.split():
    # do stuff
This assumes the items are split by whitespace.
Split the line by spaces and then add the pieces to the list:
# line = ('item1' 'item2' 'item3') example of line
listed = []
line = file.readline()
for u in line.split(' '):
    listed.append(u)
for e in listed:
    print(e)
What you have there will read one whole line in, and then loop through each character that was in that line. What you probably want to do is split that line into your 3 items. Provided they are separated by a space, you could do this:
line = file.readline() # Read the line in as before
singles = line.split(' ') # Split the line wherever there are spaces found. You can choose any character though
for item in singles: # Loop through all items, in your example there will be 3
    #Do something
You can reduce the number of lines (and variables) here by stringing the various functions used together, but I left them separate for ease of understanding.
You can try:
for u in line.split():
Which assumes there are whitespaces between each item. Otherwise you'll simply iterate over a str and thus iterate character by character.
You might also want to do:
u = u.strip('\'')
to get rid of the '
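Putting the split and the quote stripping together might look like this. The sample line mirrors the one in the question, and items is just an illustrative name:

```python
line = "'item1' 'item2' 'item3'\n"   # sample line, as readline() would return it

# A no-arg split also discards the trailing newline; strip("'") drops the quotes.
items = [u.strip("'") for u in line.split()]
print(items)
```

This assumes that no item contains a space of its own.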
I'd use with and re, and basically take anything between apostrophes. This will work for strings that have spaces inside them (e.g. 'item 1' 'item 2'), but obviously nested quotes or string escape sequences won't be caught.
import re
with open('somefile') as fin:
    print(re.findall("'(.*?)'", next(fin)))
# ['item1', 'item2', 'item3']
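As an alternative to the regular expression (a different technique, not what the answer above uses), the standard library's shlex.split understands shell-style quoting and so also keeps quoted items with spaces together. A minimal sketch with a made-up line:

```python
import shlex

line = "'item 1' 'item2' 'item 3'\n"   # sample line with spaces inside the quotes

# shlex.split honours the quotes and removes them from the results.
items = shlex.split(line)
print(items)
```

Whether shell-style rules are appropriate depends on how the file was written, so this is only worth considering if the quoting really follows that convention.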
If you want all the characters of the line in a list, you could try this.
It uses a nested list comprehension.
with open('stackoverflow.txt', 'r') as file:
    charlist = [c for word in file.readline().split(' ') for c in word ]
print(charlist)
If you want to get rid of certain characters, you can add a filter. For example, if I don't want the ' character in my list:
with open('stackoverflow.txt', 'r') as file:
    charlist = [c for word in file.readline().split(' ') for c in word if(c != "'")]
print(charlist)
If this nested list comprehension looks strange, it is equivalent to this:
with open('stackoverflow.txt', 'r') as file:
    charlist = []
    line = file.readline()
    for word in line.split(' '):
        for c in word:
            if(c != "'"):
                charlist.append(c)
print(charlist)

python string assignment

I have a StringIO object that is filled correctly. I then have the following code:
val = log_fp.getvalue()
lines = val.split('\n')
newval = ''
for line in lines:
    if (not line.startswith('[output]')):
        newval = line
        print 'test1'+newval
print 'test2' +newval
In the loop, I have the correct value for newval printed, but in the last print, I have an empty string. Any ideas what I am doing wrong? What I need is to extract one of the lines in the StringIO object that is marked [output], but newval seems to be empty in 'test2'.
Splitting on '\n' for a string such as 'foo\n' will produce ['foo', ''].
"What I need is to extract one of the lines in the stringIO object that is marked [output],"
Untested:
content = log_fp.getvalue().splitlines()
output_lines = [x for x in content if x.startswith('[output]')]
Then get the first element of output_lines, if that is what you need.
Is log_fp a text file?
If so, the last value in lines will be everything after the last newline character. Your file probably terminates in a newline, or a newline and some whitespace.
For the former case, the last value of line will be an empty string.
For the latter case, the last value of line will be the whitespace.
To avoid this, you could add a new clause to the if statement to check the trimmed string is not empty, eg.
val = log_fp.getvalue()
lines = val.split('\n')
newval = ''
for line in lines:
    if ( len(line.strip()) > 0):
        if (not line.startswith('[output]')):
            newval = line
            print 'test1'+newval
print 'test2' +newval
(I haven't tried running this, but it should give you the idea)
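The trailing empty string and one way around it can be sketched as follows. The log contents are invented for the example, and io.StringIO stands in for whatever log_fp really is:

```python
import io

# Made-up log contents standing in for log_fp.
log_fp = io.StringIO(u"starting\n[output] 42\ndone\n")
val = log_fp.getvalue()

print(val.split('\n'))     # the final element is '' because val ends with a newline
print(val.splitlines())    # splitlines() does not produce that trailing element

# Keep only the lines that are marked [output].
output_lines = [line for line in val.splitlines() if line.startswith('[output]')]
newval = output_lines[0] if output_lines else ''
print(newval)
```

Note that this selects the lines that do start with [output], which is what the question describes wanting, whereas the loop in the question keeps the lines that do not.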
