I have a text file with a list of HTML/DOC file URLs. I want to download them using Python and save them as 1.html, 2.doc, 3.doc, ...
http://example.com/kran.doc
http://example.com/loj.doc
http://example.com/sks.html
I've managed to create a fully functional script, except that Python always adds a question mark to the end of each newly created file name (when viewed from Linux); viewed from Windows, the file name looks something like 5CFB43~X.
import urllib2
st = 1
for line in open('links.txt', 'r'):
    u = urllib2.urlopen(line)
    ext = line.split(".")
    imagefile = str(st) + "." + ext[-1]
    # file created should be something.doc but it's something.doc? -> notice the question mark
    fajl = open(imagefile, "w+")
    fajl.write(u.read())
    fajl.close()
    print imagefile
    st += 1
The line terminator is two characters, not one.
for line in open('links.txt', 'rU'):
Opened in universal newlines mode ('rU'), it no longer is: every line ending is translated to a single '\n'.
Work on line.strip() instead of line.
That's because lines read this way end with '\n', hence the ? in the file name.
Just add the following at the beginning of your loop:
if line.endswith('\n'):
    line = line[:-1]
Or as AKX pointed out in the comments, just:
line = line.rstrip('\r\n')
That way you cover any kind of line ending.
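For readers on Python 3, here is a minimal sketch of the corrected loop, assuming the same links.txt layout as in the question. The helper names numbered_name and download_all are made up for illustration, and urllib.request stands in for the Python 2 urllib2 module used above.

```python
import urllib.request


def numbered_name(index, url):
    """Build '<index>.<ext>' from a URL, ignoring any trailing line ending."""
    url = url.strip()                 # drops line endings and stray spaces
    ext = url.rsplit(".", 1)[-1]      # text after the last dot
    return "{}.{}".format(index, ext)


def download_all(list_path="links.txt"):
    """Download every URL listed in list_path to 1.<ext>, 2.<ext>, ..."""
    with open(list_path) as links:
        # Skip blank lines so they do not consume a number.
        urls = [line.strip() for line in links if line.strip()]
    for index, url in enumerate(urls, start=1):
        target = numbered_name(index, url)
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            out.write(response.read())   # binary mode keeps .doc files intact
        print(target)
```

Calling download_all() would then fetch each link in turn. Stripping the URL once, before it is used for both the request and the file name, is what removes the stray character from the name.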
I want to edit specific lines in a big file.
So it isn't a good idea to read the whole file before editing, which is why I don't
want to use:
myfile.readlines()
I have to read each line, check whether it contains some special content, and then edit that line.
So far I'm reading every line:
file = open("file.txt", "r+")
i = 0
for line in file:
    if line ......:
        # edit this line
        # this is where I need help
file.close()
So the question is:
How can I edit the current line in the if statement? For example:
If the current line is "test", I want to replace it with "test2" and then write "test2" back into the file on the line where "test" was before.
This will help:
import fileinput
with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for line in file:
        print(line.replace(text_to_search, replacement_text), end='')
OK, so as @EzzatA mentioned in the comments below the question, the best way seems to be to read the original file and create a new one with the edited data.
So something like this:
original_file = open("example.txt", "r")
new_file = open("example_converted.xml", "w")
string_tobe_replace = "test"
replacement_string = "test2"
for line in original_file:
    if string_tobe_replace in line:
        new_line = line.replace(string_tobe_replace, replacement_string)
        new_file.write(new_line)
    else:
        new_file.write(line)
original_file.close()
new_file.close()
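If the edited data should end up under the original file name, one possible variation is to write the new file as a temporary file and then swap it into place. This is only a sketch: the function name replace_in_file is hypothetical, and it assumes the file is plain text in the platform's default encoding.

```python
import os
import tempfile


def replace_in_file(path, old, new):
    """Rewrite path line by line, replacing old with new, without loading it whole."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as tmp, open(path, "r") as src:
            for line in src:
                tmp.write(line.replace(old, new))
        os.replace(tmp_path, path)   # swap the edited copy into place
    except BaseException:
        os.remove(tmp_path)          # do not leave a half-written copy behind
        raise
```

For example, replace_in_file("example.txt", "test", "test2") would process the file one line at a time, so memory use stays small even for a big file.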
I have a text file that looks like this:
1,004,59
1,004,65
1,004,69
1,005,55
1,005,57
1,006,53
1,006,59
1,007,65
1,007,69
1,007,55
1,007,57
1,008,53
I want to create a new text file in which 'input' is appended to each line, something like this:
1,004,59,input
1,004,65,input
1,004,69,input
1,005,55,input
1,005,57,input
1,006,53,input
1,006,59,input
1,007,65,input
1,007,69,input
1,007,55,input
1,007,57,input
1,008,53,input
I have attempted something like this:
with open('data.txt', 'a') as f:
    lines = f.readlines()
    for i, line in enumerate(lines):
        line[i] = line[i].strip() + 'input'
    for line in lines:
        f.writelines(line)
Not able to get the right approach though.
What you want is to be able to read and write to the file in place (at the same time). Python comes with the fileinput module which is good for this purpose:
import fileinput
for line in fileinput.input('data.txt', inplace=True):
    line = line.rstrip()
    print(line + ",input")
Discussion
The fileinput.input() function returns an iterable object that reads your file line by line. Each line ends with a newline (either \n or \r\n, depending on the operating system).
The code then strips that newline from each line, adds the ",input" part, and prints the result. Note that because of the fileinput magic, the printed output goes back into the file instead of to the console.
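The same idea can be wrapped in a small function for Python 3. This is a sketch only: append_field is a hypothetical name, and using fileinput.input() as a context manager assumes Python 3.2 or later.

```python
import fileinput


def append_field(path, suffix=",input"):
    """Append suffix to every line of path, editing the file in place."""
    with fileinput.input(path, inplace=True) as lines:
        for line in lines:
            # While inplace=True is active, print() writes into the file,
            # not to the console.
            print(line.rstrip("\r\n") + suffix)
```

Calling append_field("data.txt") would rewrite the sample file from the question. Standard output is restored once the with block ends.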
There is a newline '\n' at the end of every line in your file, so you should handle it.
edit: oh I forgot about the rstrip() function!
tmp = []
with open("input.txt", 'r') as file:
    appendtext = ",input\n"
    for line in file:
        tmp.append(line.rstrip() + appendtext)
with open("input.txt", 'w') as file:
    file.writelines(tmp)
Added:
Answer by Hai_Vu is great if you use fileinput since you don't have to open the file twice as I did.
To do only the thing you're asking I would go for something like
newLines = list()
with open('data.txt', 'r') as f:
    lines = f.readlines()
    for line in lines:
        newLines.append(line.strip() + ',input\n')
with open('data2.txt', 'w') as f2:
    f2.writelines(newLines)
But there are definitely more elegant solutions
I'm making a file type to store information from my program. The file type can include lines starting with #, like:
# This is a comment.
As shown, the # in front of a line denotes a comment.
I've written a program in Python that can read these files:
fileData = []
file = open("Tutorial.rdsf", "r")
line = file.readline()
while line != "":
    fileData.append(line)
    line = file.readline()
for item in list(fileData):
    item.strip()
fileData = list(map(lambda s: s.strip(), fileData))
print(fileData)
As you can see, it takes the file, adds every line as an item in a list, and strips the items of \n. So far, so good.
But these files often contain comments I've made, and as such the program adds them to the list.
Is there a way to delete all items in the list starting with #?
Edit: To make things a bit clearer: Comments won't be like this:
Some code:
{Some Code} #Foo
They'll be like this:
#Foo
Some code:
{Some Code}
You can process lines directly in a for loop:
with open("Tutorial.rdsf", "r") as file:
    for line in file:
        if line.startswith('#'):
            continue  # skip comments
        line = line.strip()
        # do more things with this line
Only put them into a list if you need random access (e.g. you need to access lines at specific indices).
I used a with statement to manage the open file; when Python reaches the end of the with block, the file is automatically closed for you.
It's easy to check for leading # signs.
Change this:
while line != "":
    fileData.append(line)
    line = file.readline()
to this:
while line != "":
    if not line.startswith("#"):
        fileData.append(line)
    line = file.readline()
But your program is a bit complicated for what it does. Look in the documentation where it explains about for line in file:.
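As a rough illustration of the for line in file: form applied to the whole task, the reading, stripping, and comment filtering could be combined like this. The function name read_data_lines is an assumption, not something from the question.

```python
def read_data_lines(path):
    """Return the stripped, non-comment lines of path as a list."""
    with open(path, "r") as handle:
        # Strip first, so a comment with leading spaces is also recognised.
        stripped = (line.strip() for line in handle)
        return [line for line in stripped if not line.startswith("#")]
```

With the file from the question, fileData = read_data_lines("Tutorial.rdsf") would give the same list as before, minus the lines that start with #.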
There are multiple files in a directory with extensions .txt, .dox, .qcr, etc.
I need to list the .txt files and search and replace text in the .txt files only.
I need to search for $$\d, where \d stands for a number: 1, 2, 3, ..., 100.
I need to replace it with xxx.
Please let me know a Python script for this.
Thanks in advance.
-Shrinivas
I created the following script. It works for a single .txt file, but it does not work when there is more than one .txt file in the directory.
-----
def replaceAll(file,searchExp,replaceExp):
    for line in fileinput.input(file, inplace=1):
        if searchExp in line:
            line = line.replace(searchExp,replaceExp)
        sys.stdout.write(line)
# The following code is not working. I expect it to list the files whose names match "um_*.txt", open each file, and replace "$$\d" using the replaceAll function.
for um_file in glob.glob('*.txt'):
    t = open(um_file, 'r')
    replaceAll("t.read","$$\d","xxx")
    t.close()
fileinput.input(...) is supposed to process a bunch of files, and must be ended with a corresponding fileinput.close(). So you can either process all in one single call:
import fileinput
import glob
import sys
def replaceAll(file,searchExp,replaceExp):
    for line in fileinput.input(file, inplace=True):
        if searchExp in line:
            line = line.replace(searchExp,replaceExp)
        dummy = sys.stdout.write(line)  # to avoid a possible output of the size
    fileinput.close()  # to close everything in an orderly way
replaceAll(glob.glob('*.txt'), "$$\d","xxx")
or consistently close fileinput after processing each file, but that largely ignores fileinput's main feature.
Try this:
import glob
import re
import sys
def replaceAll(file,searchExp,replaceExp):
    for line in file.readlines():
        try:
            line = line.replace(re.findall(searchExp,line)[0],replaceExp)
        except IndexError:  # no match on this line
            pass
        sys.stdout.write(line)
# List the .txt files, open each one, and pass the file handle to replaceAll.
for um_file in glob.glob('*.txt'):
    t = open(um_file, 'r')
    replaceAll(t, r"\d+", "xxx")
    t.close()
Here we are sending a file handle to the replaceAll function rather than a string.
You can try this:
import os
import re
folder = "foldername"
the_files = [i for i in os.listdir(folder) if i.endswith(".txt")]
for file in the_files:
    path = os.path.join(folder, file)  # os.listdir() returns bare names, not paths
    with open(path) as source:
        new_data = re.sub(r"\d+", "xxx", source.read())
    with open(path, 'w') as final_file:
        final_file.write(new_data)
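Note that str.replace() treats "$$\d" as literal text, and the pattern "\d+" matches every run of digits, not only those after "$$". A sketch that targets just the "$$" plus digits markers might look like the following. The names MARKER, replace_markers, and replace_in_txt_files are made up for illustration, and the pattern assumes any number of digits may follow the two dollar signs.

```python
import glob
import re

# '$' is a regex metacharacter, so both dollar signs are escaped.
MARKER = re.compile(r"\$\$\d+")


def replace_markers(text, replacement="xxx"):
    """Replace every '$$<digits>' marker in text with replacement."""
    return MARKER.sub(replacement, text)


def replace_in_txt_files(pattern="*.txt"):
    """Apply replace_markers to every file matching the glob pattern."""
    for name in glob.glob(pattern):
        with open(name, "r") as handle:
            original = handle.read()
        updated = replace_markers(original)
        if updated != original:          # leave untouched files alone
            with open(name, "w") as handle:
                handle.write(updated)
```

Calling replace_in_txt_files() from inside the directory would process every .txt file there; passing "um_*.txt" instead would restrict it to the um_ files mentioned in the question.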
I'm trying to open a .txt file and am getting confused about which part goes where. I also want the spaces removed when I open the text file in Python. And when answering, could you make the file name 'clues'?
My first try is:
def clues():
    file = open("clues.txt", "r+")
    for line in file:
        string = ("clues.txt")
        print (string)
My second try is:
def clues():
    f = open('clues.txt')
    lines = [line.strip('\n') for line in open ('clues.txt')]
The third try is:
def clues():
    f = open("clues.txt", "r")
    print f.read()
    f.close()
Building upon @JonKiparsky: it would be safer for you to use the Python with statement, and to keep the result in a variable:
with open("clues.txt") as f:
    contents = f.read().replace(" ", "")
If you want to read the whole file with the spaces removed, f.read() is on the right track. Unlike your other attempts, it gives you the whole file as a single string, not one line at a time. But you still need to replace the spaces, and you need to do that explicitly. For example:
f.read().replace(' ', '')
Or, if you want to replace all whitespace, not just spaces:
''.join(f.read().split())
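To see the difference between the two variants, here is a small sketch. The function names are hypothetical, and read_clues assumes a file called clues.txt, as requested in the question.

```python
def strip_spaces(text):
    """Remove only the space character; tabs and newlines stay."""
    return text.replace(" ", "")


def strip_whitespace(text):
    """Remove all whitespace, including tabs and newlines."""
    return "".join(text.split())


def read_clues(path="clues.txt"):
    """Return the contents of path with the spaces removed."""
    with open(path) as f:
        return strip_spaces(f.read())
```

The first variant keeps the line structure of the file, while the second collapses everything into one unbroken string, so which one fits depends on whether the line breaks matter.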
This line:
f = open("clues.txt")
will open the file; that is, it returns a file handle that you can read from.
This line:
open("clues.txt").read().replace(" ", "")
will open the file and return its contents, with all spaces removed.