I want to download JSON from URL of dextools URL is
https://www.dextools.io/chain-bsc/api/Pancakeswap/pools?timestampToShow=1669287222&range=1
my code:
from urllib.request import urlopen, Request
import json
download_url = "https://www.dextools.io/chain-bsc/api/Pancakeswap/pools?timestampToShow=1657123280&range=1"
header = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
req = Request(download_url, headers=header)
webpage = urlopen(req).read()
data = json.loads(webpage)
Related
i wont to download json from url of dextools
url is >> :https://www.dextools.io/chain-bsc/api/Pancakeswap/pools?timestampToShow=1669287222&range=1
my code:
from urllib.request import urlopen, Request
import json
download_url = "https://www.dextools.io/chain-bsc/api/Pancakeswap/pools?timestampToShow=1657123280&range=1"
header = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
req = Request(download_url, headers=header)
webpage = urlopen(req).read()
data = json.loads(webpage)
I am trying to extract the redirected link of this link. When I click on this link I am redirected to this page and I want to store this page link. So, for this I have tried with urllib module but it didn't give any response.
from urllib import request
headers = headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko)'}
url = 'https://www.forexfactory.com/news/403059-manufacturing-in-us-expands-after-reaching-three-year-low/hit'
response = requests.get(url, headers=headers)
print(response) # Output: <Response [503]>
So, how can I extract this link?
You can use cloudscraper to process the cloudflare redirect:
import cloudscraper
scraper = cloudscraper.create_scraper()
url = 'https://www.forexfactory.com/news/403059-manufacturing-in-us-expands-after-reaching-three-year-low/hit'
r = scraper.get(url)
print(r.url)
you can use the requests library
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko)'}
url = 'https://www.forexfactory.com/news/403059-manufacturing-in-us-expands-after-reaching-three-year-low/hit'
response = requests.get(url, headers=headers)
print(response.url)
I'm trying to scrape carrefour website data through python. I've used scrappy, beautiful soup, selenium but nothing seems to work. I'm getting the error that you don't have the permission to access. Is there any way to scrape this website? The code is attached below, NEED HELP!
from requests_html import HTMLSession
session = HTMLSession()
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
resp = session.get("https://www.carrefour.pk/",headers=headers)
resp.html.render()
a=resp.html.html
print(a)
think you are using the wrong headers. These headers work fine for me.
headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Cafari/537.36'}
Or full:
import requests
from bs4 import BeautifulSoup as bs
# Block cookies
from http import cookiejar # Python 2: import cookielib as cookiejar
class BlockAll(cookiejar.CookiePolicy):
return_ok = set_ok = domain_return_ok = path_return_ok = lambda self, *args, **kwargs: False
netscape = True
rfc2965 = hide_cookie2 = False
s = requests.Session()
s.cookies.set_policy(BlockAll())
#Get URL
url = "https://www.carrefour.pk"
headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Cafari/537.36'}
r = s.get(url, headers=headers)
soup = bs(r.text, 'html.parser')
print(soup)
I'm using python requests library and I'm trying to login to https://www.udemy.com/join/login-popup/, the problem is when I use the following header:
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}
it returns CSRF verification failed. Request aborted.
When I change it to:
headers = {'Referer': url}
it returns Please verify that you are a human.
any suggestions?
My code:
import requests
with requests.session() as s:
url = 'https://www.udemy.com/join/login-popup/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/74.0.3729.131 Safari/537.36'}
request = s.get(url, headers=headers)
cookies = dict(cookies=request.cookies)
csrf = request.cookies['csrftoken']
data_login = {'csrfmiddlewaretoken': csrf, 'locale': 'en_US', 'email': 'myemail',
'password': 'maypassword'}
request = s.post(url, data=data_login, headers={'Referer': url}, cookies=cookies['cookies'])
print(request.content)
There are a couple of issues with your current code:
The header you are using is missing a few things
The value that you are passing for csrfmiddlewaretoken isn't correct
As you're using requests.session() you shouldn't include cookies manually (in this case)
Try this code:
import requests
from bs4 import BeautifulSoup
with requests.session() as s:
url = 'https://www.udemy.com/join/login-popup/'
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0", "Referer": "https://www.udemy.com/join/login-popup/", "Upgrade-Insecure-Requests": "1"}
request = s.get(url, headers=headers)
soup = BeautifulSoup(request.text, "lxml")
csrf = soup.find("input",{"name":"csrfmiddlewaretoken"})["value"]
data_login = {'csrfmiddlewaretoken': csrf, 'locale': 'en_US', 'email': 'myemail#test.com','password': 'maypassword'}
request = s.post(url, data=data_login, headers=headers)
print(request.content)
(PS: I'm using the Beautifulsoup library in order to find the value of csrfmiddlewaretoken)
Hope this helps
I am trying to SCRAPE DATA FROM TWITTER. previously working code does not work any more the twitter is sending weard html commends. i am using different urllib.request.Request commands stil not getting the html structure as i seen it in browers. i could not find a solution to my problem
alternative 1
import urllib.parse
import urllib.request
url = 'https://twitter.com/search?l=&q=%22gsm%22%20since%3A2017-01-01%20until%3A2017-05-02&src=typd'
user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
values = {'name': 'Michael Foord',
'location': 'Northampton',
'language': 'Python' }
headers = {'User-Agent': user_agent}
data = urllib.parse.urlencode(values)
data = data.encode('ascii')
req = urllib.request.Request(url, data, headers)
with urllib.request.urlopen(req) as response:
the_page = response.read()
alternative 2:
headers = {}
headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
req = urllib.request.Request(url, headers = headers)
resp = urllib.request.urlopen(req)
respData = resp.read()
is there any way that i could solve this issue with urllib or any other way