Reading the content of robots.txt in Python and printing it [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I want to check whether a given website has a robots.txt file, read the entire content of that file, and print it. Adding the content to a dictionary would also be very useful.
I've tried playing with the robotparser module but can't figure out how to do it.
I would like to use only modules that come with the standard Python 2.7 package.
I did as @Stefano Sanfilippo suggested:
from urllib.request import urlopen
returned
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
from urllib.request import urlopen
ImportError: No module named request
So I tried:
import urllib2
from urllib2 import Request
from urllib2 import urlopen
with urlopen("https://www.google.com/robots.txt") as stream:
    print(stream.read().decode("utf-8"))
but got:
Traceback (most recent call last):
File "", line 1, in
with urlopen("https://www.google.com/robots.txt") as stream:
AttributeError: addinfourl instance has no attribute '__exit__'
From bugs.python.org, it seems that using the result of urlopen in a with statement is not supported in Python 2.7.
As a matter of fact, the same code works fine with Python 3.
Any idea how to work around this?

Yes, robots.txt is just a file, download and print it!
Python 3:
from urllib.request import urlopen
with urlopen("https://www.google.com/robots.txt") as stream:
    print(stream.read().decode("utf-8"))
Python 2:
from urllib import urlopen
from contextlib import closing
with closing(urlopen("https://www.google.com/robots.txt")) as stream:
    print stream.read()
Note that the path is always /robots.txt.
If you need to put the content in a dictionary, .split(":") and .strip() are your friends.
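As a rough sketch of the split-and-strip idea, the function below groups directives into a dictionary of lists. The name parse_robots and the sample text are made up for illustration, and the layout (one list per field name) is only one possible choice. It splits on the first colon only, because values such as Sitemap URLs contain colons themselves.

```python
def parse_robots(text):
    """Group robots.txt directives into a dict of lists keyed by field name."""
    rules = {}
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue  # blank line or something that is not "field: value"
        # Split on the first colon only, so URLs in values stay intact.
        field, value = line.split(":", 1)
        rules.setdefault(field.strip().lower(), []).append(value.strip())
    return rules


# Hypothetical sample content; a real file would come from stream.read().
sample = """\
User-agent: *
Disallow: /search
Disallow: /private  # internal
Sitemap: https://www.example.com/sitemap.xml
"""
rules = parse_robots(sample)
print(rules)
```

This flattens the file: it does not remember which Disallow lines belong to which User-agent group, so it is only suitable for a quick overview of the content.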

Related

What is causing this error and how can it be fixed? [duplicate]

This question already has answers here:
How to avoid circular imports in Python? [duplicate]
(3 answers)
Closed 2 years ago.
I am having the error below and I am not sure how to fix it. I know it has something to do with my imports but I am not sure what needs to be done in order to fix this issue.
Traceback (most recent call last):
File "WebOutput.py", line 1, in <module>
import DatabaseInteractor
File "/Users/yaminhimani/Desktop/tweetybird/DatabaseInteractor.py", line 3, in <module>
import WebOutput
File "/Users/yaminhimani/Desktop/tweetybird/WebOutput.py", line 4, in <module>
db = DatabaseInteractor.DatabaseInteractor()
AttributeError: partially initialized module 'DatabaseInteractor' has no attribute 'DatabaseInteractor' (most likely due to a circular import)
WebOutput.py file
import DatabaseInteractor
import nltk
db = DatabaseInteractor.DatabaseInteractor()
class WebOutput:
    def __init__(self, text):
        self.text = text
        #self.hashtag = input("Enter Hashtag")
DatabaseInteractor.py file
import mysql.connector
import Tweet
import WebOutput
import re
class DatabaseInteractor:
    def __init__(self):
        # connects to the mysql server
        # config settings should be changed based on where you are trying to connect (they are currently set for my local sql server)
        config = {
        }
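WebOutput imports DatabaseInteractor, and DatabaseInteractor in turn imports WebOutput. As the traceback shows, DatabaseInteractor starts loading, imports WebOutput, and WebOutput then reaches db = DatabaseInteractor.DatabaseInteractor() before the DatabaseInteractor class has been defined. I suggest moving the shared code into another file that both modules import, or rearranging the code so that the two modules no longer need each other.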
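A minimal sketch of the one-way arrangement, assuming the database module does not actually need anything from the web module. The module names demo_db_layer and demo_web_layer and the describe method are invented for this illustration; the two files are written to a temporary folder only so that the example runs as a single script.

```python
import importlib
import os
import sys
import tempfile
import textwrap


def write_module(directory, name, source):
    """Write `source` to <directory>/<name>.py."""
    with open(os.path.join(directory, name + ".py"), "w") as handle:
        handle.write(textwrap.dedent(source))


workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)

# The database module imports nothing from the web module.
write_module(workdir, "demo_db_layer", """
    class DatabaseInteractor:
        def describe(self):
            return "connected"
""")

# The web module depends on the database module, never the other way round.
write_module(workdir, "demo_web_layer", """
    import demo_db_layer

    db = demo_db_layer.DatabaseInteractor()

    class WebOutput:
        def __init__(self, text):
            self.text = text
""")

importlib.invalidate_caches()
web = importlib.import_module("demo_web_layer")
status = web.db.describe()
print(status)
```

With the dependency pointing in one direction only, the database module is fully initialized before the web module creates its db object.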

AttributeError: module 'requests' has no attribute 'post'. is it deprecated and new function has been introduced? [duplicate]

This question already has answers here:
Importing installed package from script with the same name raises "AttributeError: module has no attribute" or an ImportError or NameError
(2 answers)
Closed 3 years ago.
I've been trying to send requests to a local server built with Flask.
The requests are sent using Python's requests module.
I don't know whether requests.post has been deprecated and replaced by a new function, or whether something is really wrong with my code. I've done everything exactly as described in this article.
Here's my code:
import requests
host_url = "http://127.0.0.1:5000"
# get the data
data_for_prediction = [int(input()) for _ in range(10)]
r = requests.post(url=host_url,json={data_for_prediction})
print(r.json())
The error I'm getting for above code is:
Traceback (most recent call last):
File "C:/Users/--/requests.py", line 1, in <module>
import requests
File "C:\Users\--\requests.py", line 8, in <module>
r = requests.post(url=host_url,json={data_for_prediction})
AttributeError: module 'requests' has no attribute 'post'
Process finished with exit code 1
my server code is:
from flask import Flask, jsonify, request
flask_server_app = Flask(__name__)
# let's make the server now
@flask_server_app.route("/api/predict", methods=["GET", "POST"])
# my prediction function goes here
def predict():
    # Get the data from the POST request & reads the received json
    json_data = request.get_json(force=True)
    # making prediction
    # Make prediction using model loaded from disk as per the data.
    prediction = ml_model.predict([[json_data]])
    # return json version of the prediction
    return jsonify(prediction[0])
# run the app now
if __name__ == '__main__':
    flask_server_app.run(port=5000, debug=True)
I've tried checking the documentation, read many articles online, and rewrote the whole code, but nothing helped.
So, has requests.post been deprecated and replaced by a new function, or is there something wrong with my code?
It seems like your code is in a file called requests.py, so when you try to import the requests module, Python imports your own file as the module instead. Try renaming your file.
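When a script is run, its own folder comes first on the import path, which is why a local requests.py wins over the installed package. The helper below is a small sketch for spotting such a clash. The name shadowing_candidates is hypothetical, and the temporary folder only stands in for the project folder.

```python
import os
import tempfile


def shadowing_candidates(module_name, directory):
    """Return files in `directory` that would be imported instead of the installed `module_name`."""
    candidates = [
        os.path.join(directory, module_name + ".py"),
        os.path.join(directory, module_name, "__init__.py"),
    ]
    return [path for path in candidates if os.path.exists(path)]


# Reproduce the situation: a script named requests.py in the working folder.
project_dir = tempfile.mkdtemp()
script_path = os.path.join(project_dir, "requests.py")
with open(script_path, "w") as handle:
    handle.write("import requests\n")

found = shadowing_candidates("requests", project_dir)
print(found)
```

An empty list means nothing in that folder shadows the module. Any path that is returned is a file or package to rename.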

Pull JSON data from internet and print in Python? [duplicate]

This question already has answers here:
Python 3.2 Unable to import urllib2 (ImportError: No module named urllib2) [duplicate]
(3 answers)
Closed 6 years ago.
I'm developing a Twitch chat bot in Python. However, I'm having some trouble with a feature that has been requested a lot. I need to pull the "gameserverid" and "gameextrainfo" data from a JSON file. example file
import urllib2
import json
req = urllib2.Request("http://api.steampowered.com/ISteamUser/GetPlayerSummaries/v0002/?key=605C90955CFDE6B1CB7D2EFF5FE824A0&steamids=76561198022404556")
opener = urllib2.build_opener()
f = opener.open(req)
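data = json.loads(f.read().decode("utf-8"))
# assumption: the fields sit under response -> players, one entry per requested Steam ID
player = data["response"]["players"][0]
currentlyPlaying = player["gameextrainfo"]
gameServer = player["gameserverid"]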
This is the code I've got at the moment. I want to get it so that other commands can print the variables "currentlyPlaying" and "gameServer" to the IRC chat. However, when I do this, I get this in the console :
Traceback (most recent call last):
File "N:/_DEVELOPMENT/Atlassian Cloud/TwitchChatBot/Testing/grabplayerinfofromsteam.py", line 1, in <module>
import urllib2
ImportError: No module named 'urllib2'
Any ideas? I'm in a Windows environment, running on the latest version of Python (Python 3.5.1)
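urllib2 does not exist in Python 3; its functionality moved to urllib.request and urllib.error. An import that works on both versions:
try:
    import urllib.request as urllib2
except ImportError:
    import urllib2
But don't use urllib2, use requests!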
pip install requests
http://docs.python-requests.org/en/master/
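If you would rather stay with the standard library, the sketch below combines the version-tolerant import with JSON decoding. The helper names fetch_json and first_player_fields are made up, and the sample payload (a response object holding a players list) is an assumption about the shape of the data, not something confirmed by the question. Only the parsing step is exercised here; fetch_json needs a real URL.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def fetch_json(url):
    """Download `url` and decode the body as JSON."""
    response = urlopen(url)
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()


def first_player_fields(payload, fields):
    """Pick `fields` from the first player entry, or return {} if there is none."""
    players = payload.get("response", {}).get("players", [])
    if not players:
        return {}
    return {name: players[0].get(name) for name in fields}


# Hypothetical payload, shaped the way this sketch assumes the API answers.
sample = {
    "response": {
        "players": [{"gameextrainfo": "Example Game", "gameserverid": "12345"}]
    }
}
info = first_player_fields(sample, ["gameextrainfo", "gameserverid"])
print(info)
```

Using .get means a player who is not currently in a game yields None for those fields instead of raising KeyError.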

Python 3.4.3 save image from url to file using urllib

I tried to make a python program that would allow me to download a jpg file from a website.
Why I'm doing this is really for no reason at all, I just wanted to try it for fun.
Anyways, here is the code:
import urllib
a = 1
while a == 1:
urllib.urlretrieve("http://lemerg.com/data/wallpapers/38/957049.jpg","D:\\Users\\Elias\\Desktop\\FolderName-957049.jpg")
(You may have to properly tab it in, it wouldn't let me here)
So basically what I want it to do is to repeatedly download the same file until I close the program. Just don't ask why.
The error code I get is:
Traceback (most recent call last):
urllib.urlretrieve("http://lemerg.com/data/wallpapers/38/957049.jpg","D:\Users\Elias\Desktop\FolderName-957049.jpg")
AttributeError: 'module' object has no attribute 'urlretrieve'
In Python 3, urlretrieve() lives in the urllib.request module. Do this:
from urllib import request
a = 1
while a == 1:
    request.urlretrieve("http://lemerg.com/data/wallpapers/38/957049.jpg","D:\\Users\\Elias\\Desktop\\FolderName-957049.jpg")
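To try the call without hitting a website, the sketch below points urlretrieve at a local file through a file:// URL and repeats the download a fixed number of times instead of forever. The file names and the loop count of 3 are arbitrary choices for this illustration.

```python
import os
import tempfile
from pathlib import Path
from urllib import request

workdir = tempfile.mkdtemp()

# Stand-in for the remote image: a small local file served via a file:// URL.
source = Path(workdir) / "source.jpg"
source.write_bytes(b"not really a jpeg")

saved = []
for attempt in range(3):
    target = os.path.join(workdir, "copy-{}.jpg".format(attempt))
    # urlretrieve returns the local file name and the response headers.
    path, headers = request.urlretrieve(source.as_uri(), target)
    saved.append(path)
print(saved)
```

Swapping source.as_uri() for an http:// address gives the same behaviour against a real server.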

Get sourcecode for urls [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I have the following code:
import urllib2
from itertools import product
with open('urllist.txt') as urllist:
    urls=[line.strip() for line in urllist]
for url in product(urls):
    usock = urllib2.urlopen(url)
    data = usock.read()
    usock.close()
    sourcecode=open('./sourcecode', 'w+')
    sourcecode.write(data)
When I ran it, it gave:
Traceback (most recent call last):
File "12.py", line 8, in <module>
usock = urllib2.urlopen(url)
File "/opt/python2.7.1/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/opt/python2.7.1/lib/python2.7/urllib2.py", line 383, in open
req.timeout = timeout
AttributeError: 'tuple' object has no attribute 'timeout'
Any idea how to fix it? Many thanks!
itertools.product yields tuples, not the items themselves, even when it is given a single iterable:
>>> from itertools import product
>>> lis = ['a','b','c']
>>> for p in product(lis):
...     print p
...
('a',)
('b',)
('c',)
Use a simple loop over urls:
for url in urls:
    usock = urllib2.urlopen(url)
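The difference can be checked without any network access. In this sketch the two URLs are placeholders, and the names wrapped and unwrapped are chosen only for the comparison.

```python
from itertools import product

# Placeholder URLs standing in for the lines of urllist.txt.
urls = ["http://example.com/a", "http://example.com/b"]

# product() over a single list wraps every item in a 1-tuple.
wrapped = list(product(urls))

# Unpacking the 1-tuple gives the plain strings back, same as looping over urls.
unwrapped = [url for (url,) in product(urls)]

print(wrapped)
print(unwrapped)
```

Since unwrapped is identical to urls, product adds nothing here, and the plain loop is the simpler choice.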
