List to str adding backslash \ to string using str() - python

I am trying to extract file paths from a txt file. My file says C:\logs.
I use
with open(pathfile, "r") as f:
    pathlist = f.readlines()
to produce a list with the path in, and then
path1 = str(pathlist)
to produce the line as a string. The list shows the line as it is in the file, but the second command puts in an extra backslash: C:\\logs.
I then do
os.chdir(path1)
to look at the path and I get the error
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: "['C:\\logs']"
Why is this? How can I prevent it?
I am looking to have many paths in the file and have the script search each path individually. Is this the best way to do it?
Thank you very much.

The extra backslash you see is an "escape" character, which is how the representation (repr) of the string disambiguates the existing backslash. There are not actually two backslashes in the string.
The problem is actually that pathlist is a list, and you're forcing it to be a str. Instead, take the first element of pathlist:
path1 = pathlist[0]
You may also have a line break at the end (another use of escape: \n or \r). To remove it, use .strip():
path1 = pathlist[0].strip()
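To see the difference between the string itself and its representation, here is a minimal sketch. The hard-coded pathlist is an assumed stand-in for what f.readlines() would return for a file containing C:\logs:

```python
# Assumed stand-in for f.readlines() on a file whose only line is C:\logs
pathlist = ["C:\\logs\n"]  # in source code, "\\" denotes ONE backslash

as_text = str(pathlist)        # repr of the whole list: brackets, quotes, escapes
path1 = pathlist[0].strip()    # the first element, trailing newline removed

print(as_text)              # shows the doubled backslash from the repr
print(path1)                # shows the actual string, with a single backslash
print(path1.count("\\"))    # number of backslashes actually stored
```

print() shows the real characters, while the list repr shows the escaped form, which is why the error message displays two backslashes.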

str(pathlist) converts the whole list to a string, which results in ['C:\\logs'], which is definitely not a valid path. Iterate over the lines of the file instead:
with open(pathfile, "r") as f:
    for line in f:
        # print the path (= line)
        # strip() removes whitespace as well as the line break '\n' at the end
        print(line.strip())
Or you could do:
for line in f:
    # '\n' is the newline character; '\\n' would match a literal backslash followed by n
    print(line.replace('\n', ''))
Or:
for line in f:
    if line:
        print(line.splitlines()[0])

Let's say the contents of pathfile are as follows:
C:\Path1
C:\Path2
C:\Path3
readlines returns a list of all lines in pathfile (each element also keeps its trailing newline, omitted here):
[ 'C:\Path1', 'C:\Path2', 'C:\Path3' ]
Using str on a Python list creates a string that is a literal representation of the list, parseable by Python. It's not what you want.
"[ \"C:\\Path1\", \"C:\\Path2\", \"C:\\Path3\" ]"
What you want is something like
import os
with open(pathfile, "r") as f:
    for line in f.readlines():
        path = line.strip()  # strip newline characters from end of line
        os.chdir(path)
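Since the question mentions having many paths in one file, a rough sketch of one way to handle that follows. The helper name read_paths and the temporary directory used for the demo are assumptions for illustration, not part of the original code; checking os.path.isdir first avoids an error from os.chdir on a bad line:

```python
import os
import tempfile

def read_paths(pathfile):
    """Return the non-empty lines of pathfile, stripped of surrounding whitespace."""
    with open(pathfile, "r") as f:
        return [line.strip() for line in f if line.strip()]

# Demo with a throwaway directory so the sketch runs on any machine.
with tempfile.TemporaryDirectory() as tmp:
    pathfile = os.path.join(tmp, "paths.txt")
    missing = os.path.join(tmp, "no_such_dir")   # deliberately does not exist
    with open(pathfile, "w") as f:
        f.write(tmp + "\n\n" + missing + "\n")    # includes a blank line

    paths = read_paths(pathfile)
    # Keep only the entries that are real directories before searching them.
    found = [p for p in paths if os.path.isdir(p)]
```

Each entry in found could then be searched individually, for example with os.listdir(p), without changing the working directory at all.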

Related

Checking if string is in text file is not working

I am writing in Python 3.6 and am having trouble making my code match strings in a short text document. This is a simple example of the exact logic that is breaking my bigger program:
PATH = "C:\\Users\\JoshLaptop\\PycharmProjects\\practice\\commented.txt"
file = open(PATH, 'r')
words = ['bah', 'dah', 'gah', "fah", 'mah']
print(file.read().splitlines())
if 'bah' not in file.read().splitlines():
    print("fail")
with the text document formatted like so:
bah
gah
fah
dah
mah
and it is indeed printing out fail each time I run this. Am I using the incorrect method of reading the data from the text document?
The issue is that you first call print(file.read().splitlines()), which exhausts the file, so the next call to file.read().splitlines() returns an empty list.
A better way to "grep" your pattern would be to iterate on the file lines instead of reading it fully. So if you find the string early in the file, you save time:
with open(PATH, 'r') as f:
    for line in f:
        if line.rstrip() == "bah":
            break
    else:
        # else is reached when no break is called from the for loop: fail
        print("fail")
The small catch here is not to forget to call line.rstrip(), because iterating over the file yields each line with its line terminator. Also, if there's trailing whitespace on a line in your file, this code will still match the word (make it strip() if you want to match even with leading blanks).
If you want to match a lot of words, consider creating a set of lines:
lines = {line.rstrip() for line in f}
so your in lines call will be a lot faster.
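A small sketch of the set approach, using io.StringIO as an assumed stand-in for the opened file; the extra word 'zah' is hypothetical and only there to show a miss:

```python
import io

# io.StringIO stands in for the opened text file (note the trailing space after gah)
f = io.StringIO("bah\ngah \nfah\ndah\nmah\n")

# Build the set once; each membership test afterwards is a hash lookup
lines = {line.rstrip() for line in f}

words = ['bah', 'dah', 'gah', 'fah', 'mah', 'zah']   # 'zah' is not in the file
missing = [w for w in words if w not in lines]
print(missing)
```

The file is read exactly once, and every word is then checked against the set rather than rescanning the lines.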
Try it:
PATH = "C:\\Users\\JoshLaptop\\PycharmProjects\\practice\\commented.txt"
file = open(PATH, 'r')
words = file.read().splitlines()
print(words)
if 'bah' not in words:
    print("fail")
You can't read the file twice without rewinding it.
When you do print(file.read().splitlines()), the file is read and the next call to this function will return nothing because you are already at the end of file.
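The effect is easy to reproduce. This is a minimal sketch that uses io.StringIO as an assumed stand-in for the real file; seek(0) is one way to rewind if a second read is really needed:

```python
import io

f = io.StringIO("bah\ngah\n")       # stands in for open(PATH, 'r')

first = f.read().splitlines()       # reads everything, position is now at the end
second = f.read().splitlines()      # nothing left to read
f.seek(0)                           # rewind to the start
third = f.read().splitlines()       # reads everything again

print(first, second, third)
```

Storing the first result in a variable, as in the code above, is usually simpler than rewinding.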
PATH = "your_file"
file = open(PATH, 'r')
words = ['bah', 'dah', 'gah', "fah", 'mah']
if 'bah' not in file.read().splitlines():
    print("fail")
As you can see, the output is no longer "fail". Call file.read().splitlines() only once in your code, or save its result in a variable; otherwise the second call returns an empty list and you get the "fail" message.

Remove backslash instances from a file in Python

Okay, this may sound like a stupid question, but I can't solve this problem...
I need to remove all instances of backslash from a downloaded file... But,
output.replace("\","")
doesn't work. Python considers "\"," a string, rather than "\" one string and "" the other one.
How can I remove backslashes?
EDIT:
New problem...
Originally, downloaded file had to be processed, which I did using:
fn = "result_cache.txt"
f = open(fn)
output = []
for line in f:
    if content in line:
        output.append(line)
f.close()
f = open(fn, "w")
f.writelines(output)
f.close()
output=str(output)
#irrelevant stuff
with open("result_cache.txt", "wt") as out:
    out.write(output.replace("\\n","\n"))
Which worked okay, reducing file's content to only one line...
And finally ended with having this contents only:
Line of text\
Another line of text\
There\\\'s more text here\
Last line of text
I can't use the same thing again, because it would transform every line to a value in a list, leaving brackets and commas... So, I need to have:
out.write(output.replace("\\n","\n"))
out.write(output.replace("\\",""))
in the same line... How? Or is there another way?
Just escape the backslash with a backslash:
output.replace("\\","")
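As a rough illustration, assuming sample strings shaped like the ones in the question: a raw string keeps backslashes as typed, "\\" in normal source code is a single backslash, and replace calls can be chained to do both substitutions in one expression:

```python
# Raw string: the backslashes are kept exactly as typed (1 + 3 = 4 of them)
text = r"Line of text\ There\\\'s more text"
cleaned = text.replace("\\", "")     # "\\" is ONE backslash character
print(cleaned)

# Chaining: turn the two-character sequence backslash+n into a real newline
# FIRST, then drop the remaining backslashes. The order matters.
sample = "first\\nsecond\\"
chained = sample.replace("\\n", "\n").replace("\\", "")
print(repr(chained))
```

If the backslashes were removed first, the "\\n" sequences would no longer exist and the lines would never be split.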

how to properly read and modify a file using python

I'm trying to remove all (non-space) whitespace characters from a file and replace all spaces with commas. Here is my current code:
def file_get_contents(filename):
    with open(filename) as f:
        return f.read()
content = file_get_contents('file.txt')
content = content.split
content = str(content).replace(' ',',')
with open ('file.txt', 'w') as f:
    f.write(content)
when this is run, it replaces the contents of the file with:
<built-in,method,split,of,str,object,at,0x100894200>
The main issue you have is that you're assigning the method content.split to content, rather than calling it and assigning its return value. If you print out content after that assignment, it will be: <built-in method split of str object at 0x100894200> which is not what you want. Fix it by adding parentheses, to make it a call of the method, rather than just a reference to it:
content = content.split()
I think you might still have an issue after fixing that, though. str.split returns a list, which you're then turning back into a string using str (before trying to substitute commas for spaces). That's going to give you square brackets and quotation marks, which you probably don't want, and you'll get a bunch of extra commas. Instead, I suggest using the str.join method like this:
content = ",".join(content) # joins all members of the list with commas
I'm not exactly sure if this is what you want though. Using split is going to replace all the newlines in the file, so you're going to end up with a single line with many, many words separated by commas.
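A short sketch of the difference, using an assumed sample string in place of the file contents. The last variant is one possible way to keep the original line breaks, in case a single long line is not what you want:

```python
content = "one two\tthree\nfour  five\n"   # assumed sample file contents

words = content.split()                    # splits on ANY run of whitespace
broken = str(words).replace(' ', ',')      # list repr: brackets, quotes, doubled commas
joined = ",".join(words)                   # just the words, comma separated

# Variant that keeps the line structure: join each line's words separately
per_line = "\n".join(",".join(line.split()) for line in content.splitlines())

print(broken)
print(joined)
print(per_line)
```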
When you split the content, you forgot to call the function. Also, once you split, it's a list, so join its items with commas instead of calling replace on a string.
def file_get_contents(filename):
    with open(filename) as f:
        return f.read()
content = file_get_contents('file.txt')
content = content.split()  # <- HERE: call the method
# split() has already removed all whitespace, so join the words with commas
content = ",".join(content)
with open('file.txt', 'w') as f:
    f.write(content)
If you are looking to replace characters, I think you would be better off using Python's re module for regular expressions. Sample code would be as follows:
import re
def file_get_contents(filename):
    with open(filename) as f:
        return f.read()
if __name__=='__main__':
    content = file_get_contents('file.txt')
    # First replace any spaces with commas, then remove any other whitespace
    # (raw string so that \s reaches the regex engine unchanged)
    new_content = re.sub(r'\s', '', re.sub(' ', ',', content))
    with open ('new_file.txt', 'w') as f:
        f.write(new_content)
It's more succinct than trying to split all the time and gives you a little more flexibility. Just be careful with how large a file you are opening and reading with your code: you may want to consider using a line iterator instead of reading all the file contents at once.
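A possible line-by-line version is sketched below. The function name convert is hypothetical, and io.StringIO objects stand in for the real input and output files so the sketch runs anywhere:

```python
import io
import re

def convert(src, dst):
    """Copy src to dst one line at a time: spaces become commas,
    all other whitespace (tabs, newlines) is removed."""
    for line in src:
        dst.write(re.sub(r"\s", "", line.replace(" ", ",")))

src = io.StringIO("a b\tc\nd e\n")   # stands in for open('file.txt')
dst = io.StringIO()                  # stands in for open('new_file.txt', 'w')
convert(src, dst)
result = dst.getvalue()
print(result)
```

Only one line is held in memory at a time, so this should behave the same on a large file as the read-everything version does on a small one.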

Remove whitespaces in the beginning of every string in a file in python?

How to remove whitespaces in the beginning of every string in a file with python?
I have a file myfile.txt with the strings as shown below in it:
_ _ Amazon.inc
Arab emirates
_ Zynga
Anglo-Indian
Those underscores are spaces.
The code must be in a way that it must go through each and every line of a file and remove all those whitespaces, in the beginning of a line.
I've tried using lstrip, but that doesn't work across multiple lines, and neither does readlines().
Would using a for loop work better?
All you need to do is read the lines of the file one by one and remove the leading whitespace for each line. After that, you can join again the lines and you'll get back the original text without the whitespace:
with open('myfile.txt') as f:
    line_lst = [line.lstrip() for line in f.readlines()]
lines = ''.join(line_lst)
print(lines)
Assuming that your input data is in infile.txt, and you want to write this file to output.txt, it is easiest to use a list comprehension:
inf = open("infile.txt")
stripped_lines = [l.lstrip() for l in inf.readlines()]
inf.close()
# write the new, stripped lines to a file
outf = open("output.txt", "w")
outf.write("".join(stripped_lines))
outf.close()
To read the lines from myfile.txt and write them to output.txt, use
with open("myfile.txt") as input:
    with open("output.txt", "w") as output:
        for line in input:
            output.write(line.lstrip())
That will make sure that you close the files after you're done with them, and it'll make sure that you only keep a single line in memory at a time.
The above code works in Python 2.5 and later because of the with keyword. For Python 2.4 you can use
input = open("myfile.txt")
output = open("output.txt", "w")
for line in input:
    output.write(line.lstrip())
if this is just a small script where the files will be closed automatically at the end. If this is part of a larger program, then you'll want to explicitly close the files like this:
input = open("myfile.txt")
try:
    output = open("output.txt", "w")
    try:
        for line in input:
            output.write(line.lstrip())
    finally:
        output.close()
finally:
    input.close()
You say you already tried with lstrip and that it didn't work for multiple lines. The "trick" is to run lstrip on each individual line, like I do above. You can try the code out online if you want.
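To illustrate why lstrip on the whole text is not enough, here is a minimal sketch with an assumed string that mirrors the sample file:

```python
# Assumed sample mirroring myfile.txt (leading spaces on lines 1 and 3)
text = "  Amazon.inc\nArab emirates\n Zynga\nAnglo-Indian\n"

# lstrip on the whole string only touches the very start of the text
whole = text.lstrip()

# lstrip on each line handles every line; keepends=True preserves the newlines
per_line = "".join(line.lstrip() for line in text.splitlines(keepends=True))

print(repr(whole))
print(repr(per_line))
```

One caveat: lstrip() also removes a leading newline, so a completely blank line becomes an empty string and disappears from the output.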

Delete newline / return carriage in file output

I have a wordlist that contains returns to separate each new letter. Is there a way to programmatically delete each of these returns using file I/O in Python?
Edit: I know how to manipulate strings to delete returns. I want to physically edit the file so that those returns are deleted.
I'm looking for something like this:
wfile = open("wordlist.txt", "r+")
for line in wfile:
    if len(line) == 0:
        # note, the following is not real... this is what I'm aiming to achieve.
        wfile.delete(line)
>>> string = "testing\n"
>>> string
'testing\n'
>>> string = string[:-1]
>>> string
'testing'
This basically says "chop off the last thing in the string". The : is the "slice" operator. It would be a good idea to read up on how it works, as it is very useful.
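One thing to keep in mind is that slicing removes the last character whatever it is. A small sketch comparing it with rstrip, using assumed sample strings:

```python
with_newline = "testing\n"
without_newline = "testing"      # e.g. a last line with no trailing newline

# Slicing always drops the final character
sliced_a = with_newline[:-1]
sliced_b = without_newline[:-1]

# rstrip only removes the characters listed, and only if they are present
stripped_a = with_newline.rstrip("\r\n")
stripped_b = without_newline.rstrip("\r\n")

print(sliced_a, sliced_b, stripped_a, stripped_b)
```

So the slice is fine when you know a newline is there, while rstrip is the safer choice when the last line of a file may lack one.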
EDIT
I just read your updated question. I think I understand now. You have a file, like this:
aqua:test$ cat wordlist.txt
Testing
This
Wordlist
With
Returns
Between
Lines
and you want to get rid of the empty lines. Instead of modifying the file while you're reading from it, create a new file that you can write the non-empty lines from the old file into, like so:
# script
rf = open("wordlist.txt")
wf = open("newwordlist.txt","w")
for line in rf:
    newline = line.rstrip('\r\n')
    if newline:  # skip the empty lines
        wf.write(newline)
        wf.write('\n') # remove to leave out line breaks
rf.close()
wf.close()
You should get:
aqua:test$ cat newwordlist.txt
Testing
This
Wordlist
With
Returns
Between
Lines
If you want something like
TestingThisWordlistWithReturnsBetweenLines
just comment out
wf.write('\n')
You can use a string's rstrip method to remove the newline characters from a string.
>>> 'something\n'.rstrip('\r\n')
'something'
The most efficient approach is to not specify a strip value:
'\nsomething\n'.strip() will remove all leading and trailing whitespace, including newlines, from the string
Simply use the following; it solves the issue:
string.strip("\r\n")
Remove empty lines in the file:
#!/usr/bin/env python
import fileinput
for line in fileinput.input("wordlist.txt", inplace=True):
    if line != '\n':
        # Python 3 form of the Python 2 statement "print line,"
        print(line, end='')
The file is moved to a backup file and standard output is directed to the input file.
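A self-contained Python 3 sketch of the same in-place technique follows. The function name remove_blank_lines and the temporary file are assumptions for the demo, and line.strip() is used so that whitespace-only lines are dropped as well:

```python
import fileinput
import os
import tempfile

def remove_blank_lines(path):
    """Rewrite the file at path in place, dropping blank lines."""
    # With inplace=True, print() writes into the file instead of the console
    with fileinput.input(path, inplace=True) as f:
        for line in f:
            if line.strip():
                print(line, end="")   # the line already ends with its newline

# Demo on a temporary file so no real wordlist is touched
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("Testing\n\nThis\n\nWordlist\n")

remove_blank_lines(path)

with open(path) as f:
    result = f.read()
os.remove(path)
```

Using fileinput as a context manager makes sure standard output is restored and the file is closed once the loop finishes.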
'whatever\r\r\r\r\r\r\r\r\n\n\n\n\n'.translate(None, '\r\n')
returns (in Python 2; in Python 3, pass str.maketrans('', '', '\r\n') as the only argument to translate)
'whatever'
This is also a possible solution
file1 = open('myfile.txt','r')
conv_file = open("numfile.txt","w")
temp = file1.read().splitlines()
for element in temp:
    conv_file.write(element)
file1.close()
conv_file.close()
