I have a set of URLs in a file, and I need to open each link, fetch its output, and store that output in a file. But when I try to print the output, I only get empty lines.
Please see the code below and help me with this:
import urllib2
import webbrowser

with open('C:\\Users\\home\\Desktop\\11.txt','r') as fp:
    for line in fp:
        password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
        top_level_url = "https://facebook.com"
        password_mgr.add_password(None, top_level_url, "appsdev", "--omitted--")
        handler = urllib2.HTTPBasicAuthHandler(password_mgr)
        opener = urllib2.build_opener(handler)
        r=opener.open(top_level_url)
        r.read()
        print r.read()
If the code you posted is correct and the second r.read() isn't a typo, then the empty output is caused by the two reads.
On file-like objects (like the return value from opener.open()), calling read() will return the entire contents and set the current position to the end of the file. Subsequent calls to read() will return empty strings, since the cursor is already at the end of the file.
In your code:
r.read() # This returns the entire contents
print r.read() # Empty string
Just get rid of the first r.read().
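To see the cursor behaviour without a network connection, here is a small Python 3 sketch that uses io.BytesIO as a stand-in for the response object. The assumption is only that both are file-like, so read() with no argument consumes everything that is left.

```python
import io

# io.BytesIO stands in for the object returned by opener.open();
# both are file-like, so read() advances the same kind of cursor.
response = io.BytesIO(b"<html>hello</html>")

first = response.read()   # whole body; the cursor is now at the end
second = response.read()  # nothing left to read, so this is b""
print(repr(first))   # b'<html>hello</html>'
print(repr(second))  # b''

# The fix: read once, keep the result in a variable, reuse the variable.
response = io.BytesIO(b"<html>hello</html>")
body = response.read()
print(body.decode("utf-8"))  # <html>hello</html>
```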
Before writing to another file, assign the content to a variable, like this:
out_data = r.read()
new_file = open('file.txt','w')
new_file.write(out_data)
new_file.close()
That's it: your scraped data will be written to file.txt.
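A minimal Python 3 sketch of the whole task (read URLs from a file, fetch each one, save the results), assuming one URL per line and no authentication. The question's HTTPBasicAuthHandler setup is left out; it could be added by building an opener as in the question and calling its open() inside fetch_url instead. fetch_url and save_all are hypothetical names, and the fetch function is a parameter so the loop can be exercised without network access.

```python
import urllib.request


def fetch_url(url, timeout=30):
    """Download one URL and return the body as bytes (read exactly once)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_all(url_list_path, output_path, fetch=fetch_url):
    """Fetch every URL listed (one per line) in url_list_path and write
    each body, followed by a newline, to output_path.

    Returns the number of URLs fetched.
    """
    count = 0
    with open(url_list_path, "r") as urls, open(output_path, "wb") as out:
        for line in urls:
            url = line.strip()  # drop the trailing newline
            if not url:
                continue  # skip blank lines
            body = fetch(url)  # one read per URL, stored in a variable
            out.write(body)
            out.write(b"\n")
            count += 1
    return count
```

For example, save_all(r"C:\Users\home\Desktop\11.txt", "output.txt") would write every page, one after another, into output.txt.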
Related
I have written some code to read the contents from a specific url:
import requests
import os

def read_doc(doc_ID):
    filename = doc_ID + ".txt"
    if not os.path.exists(filename):
        my_url = encode_url(doc_ID) #this is a call to another function that would encode the url
        my_response = requests.get(my_url)
        if my_response.status_code == requests.codes.ok:
            return my_response.text
    return None
This checks whether there's a file named doc_ID.txt (where doc_ID could be any name provided). If there's no such file, it reads the contents from a specific URL and returns them. What I would like to do is store those returned contents in a file called doc_ID.txt. That is, I would like to finish my function by creating a new file in case it didn't exist at the beginning.
How can I do that? I tried this:
my_text = my_response.text
output = os.rename(my_text, filename)
return output
But then the contents are treated as the name of a file, and I get an error saying the filename is too long.
So the issue I think I'm seeing is that you want to put the contents of your request's response into the file, rather than naming the file with the contents. The code below should create a file with the filename you want, and insert the text from your response!
import requests
import os

def read_doc(doc_ID):
    filename = doc_ID + ".txt"
    if not os.path.exists(filename):
        my_url = encode_url(doc_ID) #this is a call to another function that would encode the url
        my_response = requests.get(my_url)
        if my_response.status_code == requests.codes.ok:
            # save the downloaded text so the next call finds the file
            with open(filename, "w") as file:
                file.write(my_response.text)
            return my_response.text
    return None
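If the goal is a simple cache (download once, afterwards read the saved file), a minimal Python 3 sketch could look like this. read_doc_cached is a hypothetical name, and fetch_text is a hypothetical callable that takes the document ID and returns its text or None; in the question that role is played by encode_url() plus requests.get(). UTF-8 is an assumed file encoding.

```python
import os


def read_doc_cached(doc_id, fetch_text):
    """Return the text for doc_id, downloading it only the first time.

    fetch_text(doc_id) must return the document text, or None on failure.
    """
    filename = doc_id + ".txt"
    if os.path.exists(filename):
        # Downloaded on an earlier call: read the saved copy instead.
        with open(filename, "r", encoding="utf-8") as f:
            return f.read()
    text = fetch_text(doc_id)
    if text is None:
        return None  # nothing downloaded, so no file is created
    # First successful download: save it so later calls hit the file.
    with open(filename, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

With requests, fetch_text could be a small function that calls requests.get(encode_url(doc_id)) and returns .text when status_code == requests.codes.ok, otherwise None.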
To write the response text to the file, you can simply use a Python file object: https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files
with open(filename, "w") as file:
    file.write(my_text)
Please correct me if I am wrong, as I am a beginner in Python.
I have a web service URL that returns an XML file:
http://abc.tch.xyz.edu:000/patientlabtests/id/1345
I have a list of values, and I want to append each value in that list to the URL and download a file for each value. The name of each downloaded file should match the value appended from the list.
I can download one file at a time, but I have thousands of values in the list. I was trying to write a function with a for loop, and I am stuck.
x = [ 1345, 7890, 4729]
for i in x :
    url = http://abc.tch.xyz.edu:000/patientlabresults/id/{}.format(i)
    response = requests.get(url2)
    ****** Missing part of the code ********
    with open('.xml', 'wb') as file:
        file.write(response.content)
        file.close()
The files downloaded from the URL should be named like this:
"1345patientlabresults.xml"
"7890patientlabresults.xml"
"4729patientlabresults.xml"
I know part of the code is missing, and I am unable to fill it in. I would really appreciate it if anyone can help me with this.
Your web service URL doesn't seem to be accessible, but check this:
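import requests

x = [ 1345, 7890, 4729]
for i in x :
    url2 = "http://abc.tch.xyz.edu:000/patientlabresults/id/"
    response = requests.get(url2+str(i)) # i must be converted to a string
    filename = str(i) + "patientlabresults.xml"  # e.g. 1345patientlabresults.xml
    with open(filename, 'wb') as file:
        file.write(response.content)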
Note: when you use 'with' to open a file, you do not have to close the file, since it will be closed automatically.
with open(filename, mode) as file:
    file.write(data)
Since the URL you provided is not working, I am going to use a different URL. I hope you get the idea of how to write to a file with a custom name.
import requests

categories = ['fruit', 'car', 'dog']
for category in categories :
    url = "https://icanhazdadjoke.com/search?term="
    file_name = category + "_JOKES_2018" # files will be saved as fruit_JOKES_2018.txt, etc.
    r = requests.get(url + category)  # one request per category
    data = r.status_code # storing the status code in the 'data' variable
    with open(file_name+".txt", 'w+') as f:
        f.write(str(data)) # writing the status code of each URL to its file
After running this code, each file contains the status code of its request, and the files are named as follows:
car_JOKES_2018.txt
dog_JOKES_2018.txt
fruit_JOKES_2018.txt
I hope this gives you an understanding of how to name the files and write into the files.
I think you just want to create a path using str.format, as you (almost) are doing for the URL. Maybe something like the following:
import os.path
import requests

x = [ 1345, 7890, 4729]
for i in x:
    path = '{}patientlabresults.xml'.format(i)
    # ignore this file if we've already got it
    if os.path.exists(path):
        continue
    # try and get the file, throwing an exception on failure
    url = 'http://abc.tch.xyz.edu:000/patientlabresults/id/{}'.format(i)
    res = requests.get(url)
    res.raise_for_status()
    # write the successful file out (binary mode, since res.content is bytes)
    with open(path, 'wb') as fd:
        fd.write(res.content)
I've added some error handling (raise_for_status()) and better behaviour on retry: files that already exist are skipped.
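For thousands of IDs, it can help to keep going when one download fails and report the failures at the end. Below is a minimal stdlib-only Python 3 sketch of that idea (urllib.request instead of requests; urlopen raises an HTTPError, a subclass of OSError, on 4xx/5xx responses, much like raise_for_status()). BASE_URL reuses the placeholder URL from the question as given; fetch_bytes and download_all are hypothetical names, and the fetch function is a parameter so the loop can be tested offline.

```python
import os
import urllib.request

# Placeholder URL pattern taken from the question; replace with the real one.
BASE_URL = "http://abc.tch.xyz.edu:000/patientlabresults/id/{}"
NAME_PATTERN = "{}patientlabresults.xml"


def fetch_bytes(url, timeout=30):
    """Download url and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(ids, out_dir=".", fetch=fetch_bytes):
    """Download one XML file per id into out_dir.

    Returns (saved, failed): the paths written and the ids whose download
    raised OSError. Existing files are skipped, so the function can simply
    be rerun after a partial failure.
    """
    saved, failed = [], []
    for i in ids:
        path = os.path.join(out_dir, NAME_PATTERN.format(i))
        if os.path.exists(path):
            continue  # already downloaded on an earlier run
        try:
            body = fetch(BASE_URL.format(i))
        except OSError:  # URLError/HTTPError/timeouts: record and move on
            failed.append(i)
            continue
        with open(path, "wb") as f:  # XML bytes, so write in binary mode
            f.write(body)
        saved.append(path)
    return saved, failed
```

Usage would be something like saved, failed = download_all([1345, 7890, 4729]), then retrying only the IDs in failed.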
The user may give a bunch of URLs as command-line arguments. All URLs given in the past are serialized with pickle. The script checks all given URLs; if they are unique, they are serialized and appended to a file. At least, that's what should be happening, but nothing is being appended. However, when I open the file in write mode, the new, unique URL is written. So what gives? The code is:
def get_new_urls():
    if(len(urls.URLs) != 0): # check if empty
        with open(urlFile, 'rb') as f:
            try:
                cereal = pickle.load(f)
                print(cereal)
                toDump = []
                for arg in urls.URLs:
                    if (arg in cereal):
                        print("Duplicate URL {0} given, ignoring it.".format(arg))
                    else:
                        toDump.append(arg)
            except Exception as e:
                print("Holy bleep something went wrong: {0}".format(e))
    return(toDump)

urlsToDump = get_new_urls()
print(urlsToDump)
# TODO: append new URLs
if(urlsToDump):
    with open(urlFile, 'ab') as f:
        pickle.dump(urlsToDump, f)

# TODO check HTML of each page against the serialized copy
with open(urlFile, 'rb') as f:
    try:
        cereal = pickle.load(f)
        print(cereal)
    except EOFError: # your URL file is empty, bruh
        pass
Pickle writes out the data you give it in a special format; for example, it writes some header/metadata to the file you give it.
It is not intended to work this way: appending a second pickle does not merge it with the first, and a single pickle.load() call only reads back the first object in the file. To achieve a concatenation of your data, you'd need to first read whatever is in the file into your urlsToDump, then update your urlsToDump with any new data, and then finally dump it out again (overwriting the whole file, not appending).
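A minimal Python 3 sketch of that read, update, rewrite approach. add_urls is a hypothetical name and, unlike the question's code, it takes the file path and the new URLs as arguments instead of reading the globals urlFile and urls.URLs.

```python
import os
import pickle


def add_urls(url_file, new_urls):
    """Merge new_urls into the list pickled in url_file and rewrite the file.

    Returns the URLs that were actually new.
    """
    known = []
    if os.path.exists(url_file):
        with open(url_file, "rb") as f:
            try:
                known = pickle.load(f)  # one pickled list holds every URL seen so far
            except EOFError:
                known = []  # the file exists but is empty
    fresh = []
    for url in new_urls:
        if url in known or url in fresh:
            print("Duplicate URL {0} given, ignoring it.".format(url))
        else:
            fresh.append(url)
    if fresh:
        # 'wb' overwrites the whole file with a single, combined pickle
        with open(url_file, "wb") as f:
            pickle.dump(known + fresh, f)
    return fresh
```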
After
with open(urlFile, 'rb') as f:
you need a while loop to repeatedly unpickle (that is, repeatedly read) from the file until you hit EOF.
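The while-loop approach can be sketched like this in Python 3; load_all and all_urls are hypothetical helper names. It keeps the question's 'ab' appends and simply reads every pickle back in order:

```python
import pickle


def load_all(path):
    """Yield every object pickled into path, in order.

    Each pickle.dump() in 'ab' mode appends one complete pickle, so calling
    pickle.load() repeatedly reads them back one by one until EOF.
    """
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return  # no more pickles in the file


def all_urls(path):
    """Combine the appended lists of URLs into a single set."""
    seen = set()
    for batch in load_all(path):
        seen.update(batch)
    return seen
```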
I am trying to send data from a text file to a server that looks for matches to the sent data, and then store the matched data it returns in an existing text file. If I hard-code a list of names in the script, it works fine. However, I want to repeat the request using the names in a text file as the names to be matched and returned. Here is my code so far:
import json
import urllib2
values = r'E:\names.txt'  # raw string: in 'E:\names.txt', \n would be a newline
url = 'https://myurl.com/get?name=values&key=##########'
response = json.load(urllib2.urlopen(url))
with open(r'E:\data.txt', 'w') as outfile:
    json.dump(response, outfile, sort_keys = True, indent = 4,ensure_ascii=False);
This code just sends back a one-line file showing that nothing matched. I assume the server is treating the literal word values as the name instead of the data in the values text file.
Update (trial 1): I updated my code as suggested below to use urllib.urlencode. Here is my updated code:
import json
import urllib
import urllib2
file = r'E:\names.txt'  # raw string: in 'E:\names.txt', \n would be a newline
url = 'https://myurl.com/get'
values = {'name' : file,
          'key' : '##########'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = json.load(urllib2.urlopen(req))
with open(r'E:\data.txt', 'w') as outfile:
    json.dump(response, outfile, sort_keys = True, indent = 4,ensure_ascii=False);
I fixed the traceback errors by editing the URL. However, it is just passing "e:\names.txt" as the name in the JSON request. So my remaining issue seems to be sending the data in the names.txt file as the 'name' value properly. Any thoughts?
When sending parameters to the server, make sure they're encoded; see urllib.urlencode().
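A Python 3 sketch of reading the names from the file and sending one properly encoded request per name (in Python 3, urllib.urlencode lives at urllib.parse.urlencode). It assumes the server accepts a single name per GET request as ?name=...&key=... and returns JSON, and that names.txt is UTF-8 with one name per line; if the API accepts several names at once, build_url would change accordingly. BASE_URL and API_KEY are the question's placeholders; read_names, build_url, fetch_json and lookup_all are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://myurl.com/get"  # placeholder endpoint from the question
API_KEY = "##########"              # placeholder key from the question


def read_names(path):
    """Return the non-blank lines of the names file, one name per line."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def build_url(name, base_url=BASE_URL, key=API_KEY):
    """Percent-encode the query string so spaces, '#', etc. are sent safely."""
    return base_url + "?" + urllib.parse.urlencode({"name": name, "key": key})


def fetch_json(url):
    """GET url and parse the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def lookup_all(names_path, out_path, fetch=fetch_json):
    """Send one request per name and save {name: response} as JSON."""
    results = {}
    for name in read_names(names_path):
        results[name] = fetch(build_url(name))
    with open(out_path, "w", encoding="utf-8") as out:
        json.dump(results, out, sort_keys=True, indent=4, ensure_ascii=False)
    return results
```

Usage would be something like lookup_all(r"E:\names.txt", r"E:\data.txt").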
OK, so I'm trying to do a sentiment analysis of Twitter tweets, and all my code works perfectly to get a response of live tweets. However, the shell deletes all the tweets after a certain amount is reached. I have been changing my code to try to write all the tweets to a text file, but after 5 hours of struggling I still can't figure it out. The lines with a # comment are the code I added to try to write the information to my text file. I'm fairly new to Python, so if someone can help me out I would very much appreciate it.
I would use Git because I know how to write all the data to a text file in that program, but I can't figure out how to get it to run my Python files.
def twitterreq(url, method, parameters):
    req = oauth.Request.from_consumer_and_token(oauth_consumer,
                                                token=oauth_token,
                                                http_method=http_method,
                                                http_url=url,
                                                parameters=parameters)

    req.sign_request(signature_method_hmac_sha1, oauth_consumer, oauth_token)

    headers = req.to_header()

    if http_method == "POST":
        encoded_post_data = req.to_postdata()
    else:
        encoded_post_data = None
        url = req.to_url()

    opener = urllib.OpenerDirector()
    opener.add_handler(http_handler)
    opener.add_handler(https_handler)

    response = opener.open(url, encoded_post_data)

    return response

def fetchsamples():
    url = "https://stream.twitter.com/1/statuses/sample.json"
    parameters = []
    response = twitterreq(url, "GET", parameters)
    f=open("C:\\Users\\name\\Desktop\\datasci_course_materials\\assignment1", "w") # my attempt
    for line in response:
        f.write(str(line) + "\n") # 100% sure im not using this command properly
        print line.strip()

if __name__ == '__main__':
    fetchsamples()
I have left out the top of my code because we shouldn't need my access and consumer keys to answer this question. This code is in Python 2.7.
You could try something along the lines of:
try:
    with open("filename.txt", "a") as f:
        for n in response:
            f.write(n + "\n")
    # no f.close() needed: the with block closes the file
except IOError as e:
    print e
except TypeError as t:
    print t
This will attempt to open filename.txt and append each item in "response" to a new line. It will capture IO errors and Type errors.
The line f=open("<filename>", "w") # my attempt means that every time your program runs this line, it erases the file and then opens it, so if the program stops and you run it again, everything written before is lost.
Try changing the mode to "a", which means that each subsequent run just adds data to the end of the file.
f = open("<filename>", "a") # Appending instead of overwriting.
Extra information: https://docs.python.org/2/library/functions.html#open
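A Python 3 sketch of saving streamed lines as they arrive, combining the "a" mode and the with statement suggested above. It assumes lines is any iterable of text lines (for example the response from twitterreq(), after decoding each bytes line with line.decode("utf-8"); in Python 2.7 this would need adapting). save_stream and the optional limit parameter are hypothetical. It skips blank keep-alive lines the stream may send and flushes after each line:

```python
def save_stream(lines, path, limit=None):
    """Append each non-blank streamed line to path as it arrives.

    limit optionally stops after that many lines. Returns the number of
    lines written.
    """
    count = 0
    # "a" keeps tweets from earlier runs; the with block closes the file
    # even if the loop is interrupted by an exception such as Ctrl+C.
    with open(path, "a", encoding="utf-8") as f:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # blank keep-alive line, nothing to save
            f.write(line + "\n")
            f.flush()  # hand each line to the OS right away
            count += 1
            if limit is not None and count >= limit:
                break
    return count
```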