Python: read a file from a web URL

I am currently trying to read a txt file from a website.
My script so far is:
webFile = urllib.urlopen(currURL)
This way, I can work with the file. However, when I try to store the file (in webFile), I only get a link to the socket. Another solution I tried was to use read()
webFile = urllib.urlopen(currURL).read()
However, this seems to remove the formatting (\n, \t, etc.).
If I open the file like this:
webFile = urllib.urlopen(currURL)
I can read it line by line:
for line in webFile:
    print line
This should result in:
"this"
"is"
"a"
"textfile"
But I get:
't'
'h'
'i'
...
I wish to get the file on my computer, but maintain the format at the same time.

You should use readlines() to read the file as a list of whole lines:
response = urllib.urlopen(currURL)
lines = response.readlines()
for line in lines:
.
.
However, I strongly recommend using the requests library.
Link here http://docs.python-requests.org/en/latest/

This is because you are iterating over a string, which yields one character at a time.
Why not save the whole file at once?
import urllib
webf = urllib.urlopen('http://stackoverflow.com/questions/32971752/python-read-file-from-web-site-url')
txt = webf.read()
f = open('destination.txt', 'w+')
f.write(txt)
f.close()
If you really want to loop over the file line for line use txt = webf.readlines() and iterate over that.
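For readers on Python 3, where urllib.urlopen has moved to urllib.request.urlopen, the same idea might look like the sketch below. To keep it runnable without network access, a temporary local file exposed through a file:// URL stands in for the remote text file; the names curr_url, source, and destination are illustrative assumptions, not part of the original question.

```python
import tempfile
import urllib.request
from pathlib import Path

# Stand-in for the remote file (assumption): a local temp file
# reached through a file:// URL instead of a real web address.
tmp_dir = Path(tempfile.mkdtemp())
source = tmp_dir / "remote.txt"
source.write_bytes(b"this\nis\na\ttextfile\n")
curr_url = source.as_uri()

# Read the whole body once. read() returns bytes with every
# newline and tab intact; nothing is stripped.
with urllib.request.urlopen(curr_url) as response:
    raw = response.read()

# Decode to text, then split into lines explicitly. Looping over
# `text` itself would yield single characters, not lines.
text = raw.decode("utf-8")
lines = text.splitlines()

# Save a local copy with the original bytes unchanged.
destination = tmp_dir / "destination.txt"
destination.write_bytes(raw)
```

Writing the raw bytes rather than the decoded text avoids any newline translation on the way to disk, so the saved copy matches the source byte for byte.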

If you're just trying to save a remote file to your local server as part of a python script, you could use the PycURL library to download and save it without parsing it. More info here - http://pycurl.sourceforge.net
Alternatively, if you want to read and then write the output, I think you've just got the methods out of sequence. Try the following:
# Assign the open file to a variable
webFile = urllib.urlopen(currURL)
# Read the file contents to a variable
file_contents = webFile.read()
print(file_contents)
> This will be the file contents
# Then write to a new local file
f = open('local file.txt', 'w')
f.write(file_contents)
# Close the file so the contents are flushed to disk
f.close()
If neither applies, please update the question to clarify.

You can directly download the file and save it using a name that you prefer. After that, you can read the file and later you can delete it if you don't need the file anymore.
!pip install wget
import wget
url = "https://raw.githubusercontent.com/apache/commons-validator/master/src/example/org/apache/commons/validator/example/ValidateExample.java"
wget.download(url, 'myFile.java')

Related

Reading and Typing in a new File IO

Why does the text not show up when I click on the file_io_reverse.ipynb file??
##I am trying to read 'file_io.ipynb' and put the reverse of it into 'file_io_reverse.ipynb', this code doesn't work at all
f = open('file_io_reverse.ipynb', "a")
with open('file_io.ipynb', "r") as f2:
    for i in f2:
        x = i[::-1]
        print(x)
        f.write(x)
f.close()
As @olvin pointed out, your mixture of ways of opening and closing files is inconsistent, but it is not functionally incorrect and should work.
What are you trying to open the file_io_reverse.ipynb file in?
IPYNB notebooks are plain text files formatted using JSON, making them human-readable and easy to share with others. So if you are trying to reverse contents of each line in the file and trying to save it in another file, then that would make the new ipynb file invalid.
Try opening the file in a text editor, and it should have the reversed lines for each line in the file_io.ipynb.
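The point about the reversed notebook being invalid can be checked with a small sketch. The notebook content here is a made-up minimal JSON document (an assumption standing in for a real file_io.ipynb), reversed line by line the same way the question's loop does.

```python
import json

# A tiny stand-in for a notebook file (assumption): notebooks are JSON.
notebook_text = json.dumps(
    {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5},
    indent=1,
)

# Reverse the characters of every line, as in the question.
reversed_text = "\n".join(line[::-1] for line in notebook_text.splitlines())

# The original parses as JSON; the reversed text does not.
original_is_valid = isinstance(json.loads(notebook_text), dict)
try:
    json.loads(reversed_text)
    reversed_is_valid = True
except json.JSONDecodeError:
    reversed_is_valid = False
```

So the reversed lines are written to disk as expected, but a notebook viewer has nothing valid to render, which is probably why the file appears empty when clicked.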

Python open() doesn't create a file when in w+ mode

I'm creating an application that fetches a website's text and then checks whether the input string appears in any of the URLs on that page. The way I'm doing it is:
Replace the spaces (' ') in the given string (because URLs can't contain spaces)
Use requests to get the text of the website URL
Create a new file and write every string found on the website to the file.
Read the file line by line and, if a line contains the string, open it in a web browser.
I hope I explained it well. Here is my code:
def getGame():
    game = gameEntry.get()
    gameClean = game.replace(' ', '_')
    print(gameClean)
    gameCheck1 = requests.get('INSERT LINK HERE')
    game2 = gameCheck1.text
    with open('Links.txt', 'w+') as f:
        f.write(game2)
        readLinks = f.readlines()
        for link in readLinks:
            if game in link:
                print(f'Found working link: {link}')
Thanks in advance.
When you write to the file, the file pointer ends up at the end of the file; a subsequent read begins at the end of the file and finds nothing. To fix, call f.seek(0) after the write call to move the file pointer back to the beginning of the file.
Also, just as a side-note, there's no reason to call .readlines(); just delete the readlines line entirely and change the loop to:
for link in f:
and you'll read the lines on demand (instead of creating a whole list of them up front when you only need a line at a time).
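Put together, the two suggestions above might look like the following sketch. The page text and search string are invented stand-ins (assumptions) for the requests response and the user's input, and the file is created in a temporary directory so the example runs on its own.

```python
import os
import tempfile

# Stand-ins (assumptions) for gameCheck1.text and the cleaned input string.
page_text = "http://example.com/some_game\nhttp://example.com/other\n"
game_clean = "some_game"

path = os.path.join(tempfile.mkdtemp(), "Links.txt")
matches = []

with open(path, "w+") as f:
    f.write(page_text)
    # After the write the file position is at the end; rewind before reading.
    f.seek(0)
    # Iterate over the file directly to read one line at a time.
    for link in f:
        if game_clean in link:
            matches.append(link.strip())
```

Without the f.seek(0) call the loop body never runs, because the read starts where the write finished.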

Can anyone help me figure out how to import this .txt file into my code?

I'm working in VS on a repository in Github. I'm importing this stats.csv file into my code but the .readlines() call isn't printing anything. Does anyone know why? Thank you
Tried many different import methods
#this is our main code
import os
cmd = 'curl https://raw.githubusercontent.com/ksu-is/NFLQuarterbackstatIdentifier/master/stats.csv -o stats.txt'
os.system(cmd)
stats = open('stats.txt', 'a+')
statheadings = stats.readlines()
print(statheadings)
print("123123")
Should print the stats.csv file lines
I tried your code, and it works once the 'a+' option is removed from the open() call.
Your code prints nothing because 'a+' opens the file for appending, which places the file position at the end, so readlines() finds nothing left to read.
You should pass 'r' or 'r+' as the mode, or just leave it as the default.
'r' : open for reading (default)
'a' : open for writing, appending to the end of the file if it exists.
'+' : open a disk file for updating (reading and writing)
Try:
stats = open('stats.txt') # select
#stats = open('stats.txt','r') # one of
#stats = open('stats.txt','r+') # these
statheadings = stats.readlines()
print(statheadings)
This works, and the result is: ['404: Not Found\n']
If you want to check only a value, you can add index also.
Print only the last line:
print(statheadings[-1])
Result:
404: Not Found
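The effect of the mode on readlines() can be seen in a small self-contained sketch. The file contents are invented sample data (an assumption, not the real stats.csv), written to a temporary directory.

```python
import os
import tempfile

# Invented sample data standing in for the downloaded stats file.
path = os.path.join(tempfile.mkdtemp(), "stats.txt")
with open(path, "w") as f:
    f.write("name,yards\nexample,123\n")

# 'a+' allows reading, but the position starts at the end of the file.
with open(path, "a+") as stats:
    read_at_end = stats.readlines()
    stats.seek(0)  # move back to the beginning
    read_after_seek = stats.readlines()

# The default mode ('r') starts at the beginning.
with open(path) as stats:
    read_default = stats.readlines()
```

Here read_at_end is an empty list, while the other two hold both lines of the file.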
Rather than attempting to save the file to the disk first, you can just open it directly:
import requests
response = requests.get('https://raw.githubusercontent.com/ksu-is/NFLQuarterbackstatIdentifier/master/stats.csv')
print(response.text)
However, the URL that you're trying to access is giving me a 404. Is this because it's in a private repository? If so, you'll want to store it somewhere where it's publicly accessible so your program can reach it (or otherwise set up a more complicated authentication scheme).

How does one read a .dif file with Python

I am working on a project that requires me to read a file with a .dif extension. DIF stands for Data Interchange Format. The file opens nicely in Open Office Calc, and from there it is easy to save as a CSV file. However, when I open it in Python, all I get are random characters that don't make sense. Here is the last code that I tried, just to see if I could read it.
txt = open(r'C:\myfile.dif', 'rb').read()
print txt
I would even be open to programmatically converting the file to CSV before opening it, if someone knows how to do that. As always, any help is much appreciated. Below is a partial screenshot of what I get when I run the code.
Hadn't heard of this file format. Went and got a sample here.
I tested your method and it works fine:
>>> content = open(r"E:\sample.dif", 'rb').read()
>>> print (content)
b'TABLE\r\n0,1\r\n"EXCEL"\r\nVECTORS\r\n0,8\r\n""\r\nTUPLES\r\n0,3\r\n""\r\nDATA\r\n0,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n"Welcome to File Extension FYI Center!"\r\n1,0\r\n""\r\n1,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n""\r\n1,0\r\n""\r\n1,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n"ID"\r\n1,0\r\n"Type"\r\n1,0\r\n"Description"\r\n-1,0\r\nBOT\r\n0,1\r\nV\r\n1,0\r\n"ASP"\r\n1,0\r\n"Active Server Pages"\r\n-1,0\r\nBOT\r\n0,2\r\nV\r\n1,0\r\n"JSP"\r\n1,0\r\n"JavaServer Pages"\r\n-1,0\r\nBOT\r\n0,3\r\nV\r\n1,0\r\n"PNG"\r\n1,0\r\n"Portable Network Graphics"\r\n-1,0\r\nBOT\r\n0,4\r\nV\r\n1,0\r\n"GIF"\r\n1,0\r\n"Graphics Interchange Format"\r\n-1,0\r\nBOT\r\n0,5\r\nV\r\n1,0\r\n"WMV"\r\n1,0\r\n"Windows Media Video"\r\n-1,0\r\nEOD\r\n'
>>>
The question is what is in the file and how do you want to handle it. Personally I liked:
with open(r"E:\sample.dif", 'rb') as f:
    for line in f:
        print (line)
In the first code block, the long value with b'' in front of it (for bytes!) is the whole file. Iterating over the file instead splits it at each \r\n:
b'TABLE\r\n'
b'0,1\r\n'
b'"EXCEL"\r\n'
b'VECTORS\r\n'
b'0,8\r\n'
b'""\r\n'
b'TUPLES\r\n'
b'0,3\r\n'
b'""\r\n'
b'DATA\r\n'
b'0,0\r\n'
.
.
.
b'"Windows Media Video"\r\n'
b'-1,0\r\n'
b'EOD\r\n'
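Since the question also asks about converting to CSV, here is a rough sketch of turning that output into rows. It assumes the layout visible in the sample above: header items of three lines each up to and including DATA, then pairs of lines where "-1,0" followed by BOT starts a row, "-1,0" followed by EOD ends the data, "0,<number>" carries a numeric value, and "1,0" is followed by a quoted string. The function name dif_to_rows and the shortened sample are hypothetical, and quoting edge cases (such as embedded quotes) are not handled.

```python
def dif_to_rows(text):
    """Parse DIF text into a list of rows (simplified sketch)."""
    lines = text.splitlines()
    # Skip the header: the DATA item occupies three lines.
    start = lines.index("DATA") + 3
    rows = []
    current = None
    # The data section is read as (type line, value line) pairs.
    for i in range(start, len(lines) - 1, 2):
        type_line, value_line = lines[i], lines[i + 1]
        kind, _, number = type_line.partition(",")
        if kind == "-1":
            if value_line == "BOT":      # beginning of a new row
                current = []
                rows.append(current)
            elif value_line == "EOD":    # end of data
                break
        elif kind == "0":                # numeric cell: value is on the type line
            current.append(number)
        elif kind == "1":                # string cell: value is on the next line
            current.append(value_line.strip('"'))
    return rows


# Shortened sample in the same shape as the output above (assumption).
sample = (
    b'TABLE\r\n0,1\r\n"EXCEL"\r\nVECTORS\r\n0,2\r\n""\r\n'
    b'TUPLES\r\n0,2\r\n""\r\nDATA\r\n0,0\r\n""\r\n'
    b'-1,0\r\nBOT\r\n1,0\r\n"ID"\r\n1,0\r\n"Type"\r\n'
    b'-1,0\r\nBOT\r\n0,1\r\nV\r\n1,0\r\n"ASP"\r\n'
    b'-1,0\r\nEOD\r\n'
)
rows = dif_to_rows(sample.decode("ascii"))
```

The resulting list of rows could then be passed to csv.writer(...).writerows(rows) from the standard csv module to produce a CSV file.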

Find&Replace using Python - Binary file

I'm attempting to do a "find and replace" in a file on a Mac OS X computer. Although it appears to work correctly, the file seems to be altered somehow. The text editor that I use (Text Wrangler) is unable to even open the file once this is completed.
Here is the code as I have it:
import fileinput
for line in fileinput.FileInput("testfile.txt",inplace=1):
    line = line.replace("newhost",host)
    print line,
When I view the file from the terminal, it does say "testfile" may be a binary file. See it anyway? Is there a chance that this replace is corrupting the file? Do I have another option for this to work? I really appreciate the help.
Thank you,
Aaron
UPDATE: the actual file is NOT a .txt file. It is a .plist file, which is a preference file in Mac OS X, if that makes any difference.
LINK to plist file:
http://www.queencitytech.com/plist.zip
Your code worked for me fine. However, I would suggest a different approach: don't try overwriting the file directly. I never like changing the file directly because if you have a bug or something like that the file is lost. Generate a new file then copy it over manually (or within python, if you really want to).
PATH = 'testfile.txt'
FILE = open(PATH)
OUT_FILE = open('out_' + PATH, 'w')
for line in FILE.readlines():
    print >> OUT_FILE, line.replace('newhost', host),
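In Python 3 the same write-to-a-new-file approach might be sketched as follows. The file contents and the host value are invented for illustration (assumptions), and the files live in a temporary directory so the example is self-contained. This only applies to text files; a binary .plist would need a different approach, for example the standard plistlib module.

```python
import os
import tempfile

# Invented input file and replacement value (assumptions).
work_dir = tempfile.mkdtemp()
path = os.path.join(work_dir, "testfile.txt")
with open(path, "w", newline="") as f:
    f.write("server=newhost\nport=80\n")
host = "example.local"

out_path = os.path.join(work_dir, "out_testfile.txt")

# newline="" keeps the original line endings untouched on both sides.
# write() adds nothing of its own, so lines are not double spaced.
with open(path, newline="") as src, open(out_path, "w", newline="") as dst:
    for line in src:
        dst.write(line.replace("newhost", host))

with open(out_path, newline="") as f:
    result = f.read()
```

The original file is left as it was, so a bug in the replacement logic cannot destroy it; the new file can be copied over once it has been checked.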
Try using sys.stdout.write instead of print. readlines() retains the new line characters at the end of the read line. The print statement adds an additional new line character, so it's likely double spacing the file.
