I'm creating an application that looks at a website's text and then checks whether the input string appears in any of the URLs on that page. The way I'm doing it is:
1. Replace the spaces (' ') in the given string with underscores (because URLs can't contain spaces, duh).
2. Use requests to get the text of the website.
3. Create a new file and write all of the website's text to it.
4. Read the file line by line, and if a line contains the string, open it in a web browser.
I hope I explained it well. Here is my code:
def getGame():
    game = gameEntry.get()
    gameClean = game.replace(' ', '_')
    print(gameClean)
    gameCheck1 = requests.get('INSERT LINK HERE')
    game2 = gameCheck1.text
    with open('Links.txt', 'w+') as f:
        f.write(game2)
        readLinks = f.readlines()
        for link in readLinks:
            if game in link:
                print(f'Found working link: {link}')
Thanks in advance.
When you write to the file, the file pointer ends up at the end of the file; a subsequent read begins at the end of the file and finds nothing. To fix, call f.seek(0) after the write call to move the file pointer back to the beginning of the file.
Also, just as a side-note, there's no reason to call .readlines(); just delete the readlines line entirely and change the loop to:
for link in f:
and you'll read the lines on demand (instead of creating a whole list of them up front when you only need a line at a time).
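Putting both fixes together, the loop might look like the minimal sketch below. It assumes the page text has already been fetched (for example with requests.get(...).text) and is passed in as `page_text`; the function name `find_links` and its `needle` and `path` parameters are hypothetical, and you would pass the cleaned string (gameClean) as `needle`.

```python
def find_links(page_text, needle, path="Links.txt"):
    """Write page_text to path, rewind, and return the lines containing needle."""
    found = []
    with open(path, "w+") as f:
        f.write(page_text)
        # The write left the file pointer at the end; move it back to the start
        f.seek(0)
        # Iterate the file object directly: one line at a time, no readlines()
        for link in f:
            if needle in link:
                found.append(link.rstrip("\n"))
    return found
```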
Related
I want to write a file that says "hello guys how are you", with each word as an item of a list. Here is my code. It shows nothing when I run it; when I run it a second time, it shows the items one by one as I want. But when I open the text file, the words are written twice.
with open('stavanger.txt', 'r+') as f:  # the with statement closes the file automatically
    words = ['hello\n', 'guys\n', 'how\n', 'are\n', 'you\n']
    f.writelines(words)
    for i in f:
        x = i.rstrip().split(',')  # turn each line into a list, separating items by commas
        print(x)
The problem is that writing to a file goes through a buffer. So after the line f.writelines(words), nothing has really happened to the file yet; only the buffer has changed.
In effect, the file has not changed and the file pointer is still at the beginning of the file, so the loop reads whatever the file already contained. On the first run the file is still empty, so nothing is printed. On the second run the loop prints the words saved by the first run, which leaves the file pointer at the end of the file; only then is the buffer written to the file, so the words are appended again and you get duplicated content.
Simply use mode='w' if you only want to write to the file.
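If you would rather keep a single file handle, here is a sketch assuming the goal is to overwrite the file each time: open it in 'w+' mode, write, then seek(0) before reading. The helper name `write_then_read` is hypothetical.

```python
def write_then_read(path, words):
    """Overwrite path with words, then read the lines back as a list."""
    with open(path, "w+") as f:  # 'w+' truncates first, so nothing is duplicated
        f.writelines(words)
        f.seek(0)  # rewind; seeking also flushes the pending writes
        return [line.rstrip("\n") for line in f]
```

Running it twice on the same path returns the same five words both times, because the file is truncated on every open.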
You start reading the file from where the writing stopped. It is better to open the file for writing first, and then open it again for reading.
Something like this:
with open('stavanger.txt', 'w') as f:  # the with statement closes the file automatically
    words = ['hello\n', 'guys\n', 'how\n', 'are\n', 'you\n']
    f.writelines(words)

with open('stavanger.txt', 'r') as f:
    for i in f:
        x = i.rstrip().split(',')  # turn each line into a list, separating items by commas
        print(x)
Is it possible to use both read() and readline() on one text file in Python?
When I do that, only the first read call seems to work.
file = open(name, "r")
inside = file.readline()
inside2 = file.read()
print(name)
print(inside)
print(inside2)
The result shows only the inside variable, not inside2.
Reading a file is like reading a book. When you call .read(), it reads through the book until the end. If you call .read() again, you have forgotten one step: you can't read the book again unless you flip back to the beginning. If you call .readline(), think of that as reading one page: it tells you the contents of the page and then turns the page. Calling .read() after that starts where you left off and reads to the end, so that first page isn't included. To start again from the beginning, you need to turn back the pages, which is what the .seek() method does. It takes a single argument, the position to seek to; for a text file, 0 (the start) or a value previously returned by .tell() is safe:
with open(name, 'r') as file:
    inside = file.readline()
    file.seek(0)
    inside2 = file.read()
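To see the file positions in action, here is a small self-contained demo. It writes a hypothetical three-line file named `demo.txt` into a temporary directory and uses a hypothetical helper, `readline_then_read`.

```python
import os
import tempfile

def readline_then_read(path):
    """Return (first line, rest after readline, whole file after seek(0))."""
    with open(path) as file:
        first = file.readline()   # reads one line; position is now just after it
        rest = file.read()        # continues from there to the end of the file
        file.seek(0)              # rewind to the beginning
        everything = file.read()  # reads the whole file again
    return first, rest, everything

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("one\ntwo\nthree\n")

first, rest, everything = readline_then_read(path)
print(repr(first))       # 'one\n'
print(repr(rest))        # 'two\nthree\n'
print(repr(everything))  # 'one\ntwo\nthree\n'
```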
There is also another way to read information from the file. It is used under the hood when you use a for loop:
with open(name) as file:
    for line in file:
        ...
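That way is next(file), which gives you the next line. This way is a little special, though. In Python 2, if file.readline() or file.read() comes after next(file), you get an error saying that mixing iteration and read methods would lose data. (Credits to Sven Marnach for pointing this out.) Python 3 allows this mixing, but after next(file) has been called, file.tell() raises an OSError until iteration reaches the end of the file.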
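A sketch of the Python 3 behavior described above, using a hypothetical temporary file: next() followed by readline() works, while tell() is refused right after next().

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w") as f:
    f.write("a\nb\nc\n")

with open(path) as file:
    first = next(file)        # the same call a for loop makes on each pass
    try:
        file.tell()           # disabled after next() in Python 3
        tell_disabled = False
    except OSError:
        tell_disabled = True
    second = file.readline()  # Python 3 allows a read method after next()

print(repr(first), repr(second), tell_disabled)
```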
Yes, you can.
file.readline() reads a line from the file (the first line in this case), and then file.read() reads the rest of the file starting from the current position, which in this case is where file.readline() left off.
You are probably receiving an empty string from file.read() because you reached EOF (end of file) immediately after reading the first line with file.readline(), which implies that your file contains only one line.
You can, however, return to the start of the file by moving the position back with file.seek(0).
I am trying to write a script that automates browsing to my most commonly visited websites. I have put the websites into a list and am trying to open them using the webbrowser module in Python. My code currently looks like this:
import webbrowser
f = open("URLs", "r")
list = f.readline()
for line in list:
    webbrowser.open_new_tab(list)
This only reads the first line from my file "URLs" and opens it in the browser. Could anyone please help me understand how to read through the entire file and open each URL in a different tab?
Other options that achieve the same thing are also welcome.
You have two main problems.
The first problem is that you are using readline and not readlines. readline gives you only the first line in the file, while readlines gives you a list of all the lines in the file.
Take this file as an example:
# urls.txt
http://www.google.com
http://www.imdb.com
Also, get into the habit of using a context manager, as it will close the file for you once you have finished reading from it. Right now you are leaving your file open; for what you are doing there is no real danger, but it is a good habit to build.
The Python documentation on reading and writing files covers this, including the best practice of handling files with with.
The second problem in your code is that, when you are iterating over list (which you should not use as a variable name, since it shadows the builtin list), you are passing list to your webbrowser call. This is definitely not what you are trying to do: because list holds a single string, the loop runs once per character and opens that same URL every time. You want to pass the loop variable instead.
So, taking all this into account, your final solution will be:
import webbrowser
with open("urls.txt") as f:
    for url in f:
        webbrowser.open_new_tab(url.strip())
Note the call to strip(), which removes the trailing newline character from each line.
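If the file may contain blank lines (a trailing empty line is common), a slightly more defensive sketch skips them before opening tabs. The helper names `read_urls` and `open_all` are hypothetical; reading is kept separate from opening so the URL list can be checked without launching a browser.

```python
import webbrowser

def read_urls(path):
    """Return the non-empty, stripped lines of path (one URL per line)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def open_all(urls):
    """Open each URL in a new browser tab."""
    for url in urls:
        webbrowser.open_new_tab(url)
```

Usage: open_all(read_urls("urls.txt")).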
You're not reading the file properly. You're only reading the first line. Also, assuming you were reading the file properly, you're still trying to open list, which is incorrect. You should be trying to open line.
This should work for you:
import webbrowser
with open('file name goes here') as f:
    all_urls = f.read().split('\n')
for each_url in all_urls:
    if each_url:  # skip the empty string left by a trailing newline
        webbrowser.open_new_tab(each_url)
My answer assumes that you have one URL per line in the text file. If they are separated by spaces, simply change that line to all_urls = f.read().split(' '). If they're separated in another way, change the split argument accordingly. (Calling f.read().split() with no argument splits on any run of whitespace, which covers both cases.)
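A quick illustration of why the separator matters, using made-up sample text: when the file ends with a newline, split('\n') leaves a trailing empty string, while splitlines() and a bare split() do not.

```python
text = "http://www.google.com\nhttp://www.imdb.com\n"  # file contents with a final newline

print(text.split("\n"))   # ['http://www.google.com', 'http://www.imdb.com', '']
print(text.splitlines())  # ['http://www.google.com', 'http://www.imdb.com']
print(text.split())       # splits on any whitespace run: the same two URLs
```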
I have a problem whereby I am trying to first check a text file for the existence of a known string, and based on this, loop over the file and insert a different line.
For some reason, after calling file.read() to check for the test string, the for loop appears not to work. I have tried calling file.seek(0) to get back to the start of the file, but this has not helped.
My current code is as follows:
try:
    f_old = open(text_file)
    f_new = open(text_file + '.new','w')
except:
    print 'Unable to open text file!'
    logger.info('Unable to open text file, exiting')
    sys.exit()

wroteOut = False
# first check if file contains a test string
if '<dir>' in f_old.read():
    #f_old.seek(0) # <-- do we need to do this??
    for line in f_old: # loop thru file
        print line
        if '<test string>' in line:
            line = ' <found the test string!>'
        if '<test string2>' in line:
            line = ' <found test string2!>'
        f_new.write(line) # write out the line
    wroteOut = True # set flag so we know it worked
f_new.close()
f_old.close()
You already know the answer:
#f_old.seek(0) # <-- do we need to do this??
Yes, you need to seek back to the start of the file before you can read the contents again.
All file operations work with the current file position. Using file.read() reads all of the file, leaving the current position set to the end of the file. If you wanted to re-read data from the start of the file, a file.seek(0) call is required. The alternatives are to:
Not read the file again: you have just read all of the data, so use that instead. File operations are slow; using the same data from memory is much, much faster:
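contents = f_old.read()
if '<dir>' in contents:
    # splitlines(True) keeps each line's newline, so f_new.write(line) still works
    for line in contents.splitlines(True):
        pass  # process each line as before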
Re-open the file. Opening a file in read mode puts the current file position back at the start.
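For completeness, here is a hedged Python 3 sketch of the question's loop with the seek(0) in place. The function name `rewrite_markers` is hypothetical, the marker strings are the placeholders from the question, and, unlike the original, the replacement lines keep a trailing newline so the output stays one entry per line.

```python
def rewrite_markers(text_file):
    """Copy text_file to text_file + '.new', replacing marker lines.

    Returns True if the file contained '<dir>' and was rewritten.
    """
    with open(text_file) as f_old, open(text_file + ".new", "w") as f_new:
        if "<dir>" not in f_old.read():
            return False
        f_old.seek(0)  # read() left the position at EOF; rewind before looping
        for line in f_old:
            if "<test string>" in line:
                line = "    <found the test string!>\n"
            if "<test string2>" in line:
                line = "  <found test string2!>\n"
            f_new.write(line)
    return True
```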
I am currently trying to read a txt file from a website.
My script so far is:
webFile = urllib.urlopen(currURL)
This way, I can work with the file. However, when I try to store the file (in webFile), I only get a link to the socket. Another solution I tried was to use read()
webFile = urllib.urlopen(currURL).read()
However, this seems to remove the formatting (\n, \t, etc.).
If I open the file like this:
webFile = urllib.urlopen(currURL)
I can read it line by line:
for line in webFile:
    print line
This should result in:
"this"
"is"
"a"
"textfile"
But I get:
't'
'h'
'i'
...
I wish to get the file on my computer, but maintain the format at the same time.
You should use readlines() to read entire lines:
response = urllib.urlopen(currURL)
lines = response.readlines()
for line in lines:
    pass  # process each line here
That said, I strongly recommend using the requests library:
http://docs.python-requests.org/en/latest/
This happens because you are iterating over a string, and iterating over a string yields one character at a time.
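A small illustration of the difference, using made-up text in place of what .read() would return: iterating the string yields characters, while splitlines(True) yields whole lines and keeps their endings.

```python
text = "this\nis\na\ntextfile\n"  # stands in for the string returned by .read()

chars = list(text)[:4]          # iterating a string gives one character at a time
lines = text.splitlines(True)   # keepends=True preserves each "\n"

print(chars)  # ['t', 'h', 'i', 's']
print(lines)  # ['this\n', 'is\n', 'a\n', 'textfile\n']
```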
Why not save the whole file at once?
import urllib
webf = urllib.urlopen('http://stackoverflow.com/questions/32971752/python-read-file-from-web-site-url')
txt = webf.read()
f = open('destination.txt', 'w+')
f.write(txt)
f.close()
If you really want to loop over the file line by line, use txt = webf.readlines() and iterate over that.
If you're just trying to save a remote file to your local server as part of a python script, you could use the PycURL library to download and save it without parsing it. More info here - http://pycurl.sourceforge.net
Alternatively, if you want to read and then write the output, I think you've just got the methods out of sequence. Try the following:
# Assign the open file to a variable
webFile = urllib.urlopen(currURL)
# Read the file contents to a variable
file_contents = webFile.read()
print(file_contents)  # this prints the file contents
# Then write to a new local file (the with block closes it)
with open('local file.txt', 'w') as f:
    f.write(file_contents)
If neither applies, please update the question to clarify.
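The snippets in this thread use Python 2's urllib.urlopen. In Python 3 the function is urllib.request.urlopen and it returns bytes. A hedged sketch that keeps the formatting by writing those bytes in binary mode; the helper name `save_url` is hypothetical, and decoding as UTF-8 is an assumption about the remote file.

```python
import urllib.request

def save_url(url, dest):
    """Download url, write its raw bytes to dest, and return the decoded text."""
    with urllib.request.urlopen(url) as response:
        data = response.read()  # bytes; newlines and tabs are untouched
    with open(dest, "wb") as f:  # binary mode writes the bytes exactly as received
        f.write(data)
    return data.decode("utf-8")  # assumes the remote file is UTF-8
```

Usage: text = save_url(currURL, 'local file.txt'), after which text.splitlines() gives the lines.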
You can directly download the file and save it using a name that you prefer. After that, you can read the file and later you can delete it if you don't need the file anymore.
# pip install wget  (in Jupyter or Colab, run !pip install wget in a cell instead)
import wget
url = "https://raw.githubusercontent.com/apache/commons-validator/master/src/example/org/apache/commons/validator/example/ValidateExample.java"
wget.download(url, 'myFile.java')
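If installing a third-party package is not an option, the standard library's urllib.request.urlretrieve (listed in the Python 3 docs under the legacy interface) performs the same download-and-save step. A sketch following the download, read, then delete workflow described above; the helper `download_and_read` is hypothetical, and UTF-8 is assumed for the file's encoding.

```python
import os
import urllib.request

def download_and_read(url, filename):
    """Save url to filename, return its text, then delete the local copy."""
    urllib.request.urlretrieve(url, filename)
    try:
        with open(filename, encoding="utf-8") as f:  # assumes UTF-8 content
            return f.read()
    finally:
        os.remove(filename)  # drop the file once it has been read
```

Usage: source = download_and_read(url, 'myFile.java').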