Python Requests Get XML

Python Requests Get XML - python

If I go to http://boxinsider.cratejoy.com/feed/ I can see the XML just fine. But when I try to access it using python requests, I get a 403 error.
blog_url = 'http://boxinsider.cratejoy.com/feed/'
headers = {'Accepts': 'text/html,application/xml'}
blog_request = requests.get(blog_url, timeout=10, headers=headers)
Any ideas on why?

Because it's hosted by WPEngine and they filter user agents.
Try this:
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36"
requests.get('http://boxinsider.cratejoy.com/feed/', headers={'User-agent': USER_AGENT})

Related

Python request returning 403

I am trying to use requests to make an api call that this page is making https://www.betonline.ag/sportsbook/martial-arts/mma.
requests.post(
url='https://api.betonline.ag/offering/api/offering/sports/offering-by-league',
headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Safari/605.1.15'}
json={"Sport":"martial-arts","League":"mma","ScheduleText":None,"Period":0}
)
I have also tried including all the headers I see in the image but am still unable to get a 200 and get the response.
What am I missing?

Unable to read website intermittently with requests

I tried to read a website using Python requests.
However, it sometimes succeeds but sometimes fails.
Here is my code:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
url = 'https://hn.house.ifeng.com/homedetail/25762.shtml'
res = requests.get(url, headers = headers, timeout = 10, verify=False)
What are the reasons and how to solve it?
Thank you very much.

403 forbidden error. can't access to this site

I want to scrape https://health.usnews.com/doctors/specialists-index while sending a request to this site through scrapy spider it shows status code as 403. In my request, I added user_agent but also it's not working.
I referred these two answer Python Doesn't Have Permission To Access On This Server / Return City/State from ZIP and 403:You don't have permission to access /index.php on this server but it's not working for me.
my user_agent is Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36. Some one help me to scrape the above mentioned site.

Try to add 'authority' in the headers as well. The below works for me in scrapy shell:
from scrapy import Request
headers = {
'authority': 'health.usnews.com',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
}
url = "https://health.usnews.com/doctors/specialists-index"
req = Request(url, headers=headers)
fetch(req)

Requests posts with Python

This is the first time I am trying requests.post() because I have always used requests.get(). So I'm trying to navigate to a website and search. I am using yellowpages.com, and before I get negative feedback about using the site to scrape or about an API, I just want to try it out. The problem I am running into is that it spits out some html that isn't remotely what I am looking for. I'll post my code below to show you what I am talking about.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
url = "https://www.yellowpages.com"
search_terms = "Car Dealership"
location = "Jackson, MS"
q = {'search_terms': search_terms, 'geo_locations_terms': location}
page = requests.post(url, headers=headers, params=q)
print(page.text)

Your request boils down to
$ curl -X POST \
-H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36' \
'https://www.yellowpages.com/?search_terms=Car Dealership&geo_locations_terms=Jackson, MS'
For this the server returns a 502 Bad Gateway status code.
The reason is that you use POST together wihy query parameters params. The two don't go well together. Use data instead:
requests.post(url, headers=headers, data=q)

How to change user agent urllib2

I'm trying to access a page using the following
page = urllib2.urlopen(full_url)
soup = BeautifulSoup(page, 'html.parser')
li_post_id = "post-" + str(post_id)
li_soup = soup.find('li', attrs={'id':li_post_id})
This works fine on my ubuntu machine, but when running it on my Windows server I get 403 Forbidden error, so I assume the issue is with the user agent.
How do I change this, say, to Firefox? I have only seen tutorials to change the user agent using requests, but I don't want to change all of my code to this.

You could try this.
import random
import requests, bs4
agents= [
'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko)',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko)',
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)',
'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)']
headers = {"User-Agent":random.choice(agents)}
response = requests.get(full_url,headers=headers)
soup = BeautifulSoup(response.text, 'lxml')

Changing the header doesn't have anything to do with BeautifulSoup. It is meant for HTML parsing only. You need to change it in your urllib request like so:
Python3
import urllib.request
req = urllib.request.build_opener()
req.addheaders = [('User-Agent', 'Some user agent')]
response = req.open('http://www.stackoverflow.com')
Python2.7
import urllib2
req = urllib2.build_opener()
req.addheaders = [('User-Agent', 'Some user agent')]
response = req.open('http://www.stackoverflow.com')

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python Requests Get XML - python

Because it's hosted by WPEngine and they filter user agents. Try this: USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36" requests.get('http://boxinsider.cratejoy.com/feed/', headers={'User-agent': USER_AGENT})

Related

Python request returning 403

Unable to read website intermittently with requests

403 forbidden error. can't access to this site

Requests posts with Python

How to change user agent urllib2

Categories

Resources