I have tried a lot of things to get this to work.
There are CSV resources on a website that I want to download and save. I can do this using Firefox, but I would like to automate the process.
So, something like:
import requests

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}
payload = {'email': username, 'password': password}

with requests.Session() as s:
    p = s.post('https://isaacphysics.org/api/v2.10.4/api/auth/segue/authenticate', json=payload, headers=headers)
    print(p.status_code)
    webpage = s.get('https://isaacphysics.org/api/v2.10.4/api/assignments/assign/group/24684/progress/download', headers=headers)
The POST call returns status 200, returns some user data (which verifies it has worked) and creates the necessary cookies, but the GET fails to authenticate.
In Firefox the request looks like this:
Method: GET
URL: https://isaacphysics.org/api/v2.10.4/api/assignments/assign/group/25090/progress/download
Request Headers:
Host: isaacphysics.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Cookie: JSESSIONID=9g94t7JUNKJUNK3blif6bkx; isaacCookiesAccepted=1; SEGUE_AUTH_COOKIE="{\"currentUserId\":\"47878\",\"token\":\"0\",\"DATE_EXPIRES\":\"Tue May 12 13:41:48 +0000 2020\",\"HMAC\":\"Tr+eAeSQWU9JUNKJUNKJUNKJUNKJUNKJUNK2MQVrBQ=\"}"
Upgrade-Insecure-Requests: 1
and in Python
print(webpage.request.headers)
gives:
{'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*',
'Connection': 'keep-alive',
'Cookie': 'JSESSIONID=1jr9h3wJUNKJUNKJUNKrpfzx; SEGUE_AUTH_COOKIE="{currentUserId:47878,token:0,DATE_EXPIRES:Tue May 12 18:08:00 +0000 2020,HMAC:3LlaQTN/JUNKJUNKJUNKJUNKJUNKJUNKQW4mo=}"; isaacCookiesAccepted=1'}
The obvious difference seems to be in SEGUE_AUTH_COOKIE: some of the escape characters have been dropped when the cookie is sent back. I have tried several things, but if this is the issue I cannot figure out how to fix it. Any help greatly appreciated.
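If the mangled SEGUE_AUTH_COOKIE is indeed the problem, one workaround is to stop the cookie jar from re-serialising the value and send back exactly what the server set. A minimal sketch, assuming the login response carries the cookies in its Set-Cookie headers and reusing the headers/payload defined above (not tested against the Isaac Physics API):

import requests

# Log in exactly as before (headers and payload as defined above)
p = requests.post('https://isaacphysics.org/api/v2.10.4/api/auth/segue/authenticate',
                  json=payload, headers=headers)

# Take each raw Set-Cookie line (quotes and backslashes intact) and rebuild the
# Cookie header by hand instead of letting the cookie jar rewrite it.
raw_set_cookies = p.raw.headers.getlist('Set-Cookie')
cookie_header = '; '.join(c.split(';', 1)[0] for c in raw_set_cookies)

# Plain requests.get so the hand-built Cookie header is sent unmodified
webpage = requests.get(
    'https://isaacphysics.org/api/v2.10.4/api/assignments/assign/group/24684/progress/download',
    headers={**headers, 'Cookie': cookie_header})
print(webpage.status_code)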
Related
I wanted to get the Fear and Greed index from https://alternative.me/crypto/ and I found the URL that serves the XHR data by using the Chrome network tool, but when trying to scrape it I got a 404 or 500 error page.
import requests

url = 'https://alternative.me/api/crypto/fear-and-greed-index/history'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
    'Referer': 'https://alternative.me/crypto/fear-and-greed-index/',
    'Accept-Language': 'ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7',
    'accept-encoding': 'gzip, deflate, br',
    'accept': 'application/json, text/plain, */*'
}
res = requests.post(url, headers=headers)
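One thing worth double-checking is the HTTP method: the browser XHR may well be a GET rather than a POST, and alternative.me also publishes a JSON API for this index (the api.alternative.me URL below comes from their public API docs, not from this question, so treat it as something to verify). A hedged sketch:

import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
    'Referer': 'https://alternative.me/crypto/fear-and-greed-index/',
    'accept': 'application/json, text/plain, */*',
}

# Same endpoint, but with GET instead of POST
res = requests.get('https://alternative.me/api/crypto/fear-and-greed-index/history',
                   headers=headers)
print(res.status_code)

# Documented public API; limit=0 asks for the full history
res = requests.get('https://api.alternative.me/fng/', params={'limit': 0})
print(res.status_code)
print(res.json())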
I have Python code like this:
import requests
headers = {
    'Host': 'www.google.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0',
    'Accept': '*/*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'Referer': 'https://www.google.com/',
    'Origin': 'https://www.google.com',
    'Connection': 'keep-alive',
    'Content-Length': '0',
    'TE': 'Trailers'
}
response = requests.get("https://www.google.co.in/search?tbs=sbi:AMhZZivWlHh9fYSFQ1SYSgdWdYroq7vlNqRWbgzAeOHgb1_1aVO6EfHf9oo4N6kMf9pR-MjgXMeP5EG4VTTeZ5UujHI12znActXxMoyDqKsqI0cgI9YJ_11xd5R0DiKpo2drjWKnK2lNgGSGJYKdDFJ0ZNKhfBTUn3WKSmG72gLR07uPXdby9jCXC1KJFqBSpaGNrJ6Zc6LSQymwNqqJZrO8iwNYRPzJsoHlWUZNSoZ1X18Ii8X7x0TrlgSz0HySJ_1QO3E8LLbaE0rZluLVBsk6t0GDW2MR4IXs3dCuCcTMPDgqZS-CMks6Tgc6xkDLyLLBC051S6gNxRhXpZ3FVg75Vlt_1nAptI_1Vpw",headers=headers)
print(len(response.text))
and Flutter code like this:
import 'package:dio/dio.dart';
Dio dio = Dio();
var headers = {
  'Host': 'www.google.com',
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0',
  'Accept': '*/*',
  'Accept-Language': 'en-US,en;q=0.5',
  'Accept-Encoding': 'gzip, deflate, br',
  'Referer': 'https://www.google.com/',
  'Origin': 'https://www.google.com',
  'Connection': 'keep-alive',
  'Content-Length': '0',
  'TE': 'Trailers'
};
var response = await dio.get("https://www.google.co.in/search?tbs=sbi:AMhZZivWlHh9fYSFQ1SYSgdWdYroq7vlNqRWbgzAeOHgb1_1aVO6EfHf9oo4N6kMf9pR-MjgXMeP5EG4VTTeZ5UujHI12znActXxMoyDqKsqI0cgI9YJ_11xd5R0DiKpo2drjWKnK2lNgGSGJYKdDFJ0ZNKhfBTUn3WKSmG72gLR07uPXdby9jCXC1KJFqBSpaGNrJ6Zc6LSQymwNqqJZrO8iwNYRPzJsoHlWUZNSoZ1X18Ii8X7x0TrlgSz0HySJ_1QO3E8LLbaE0rZluLVBsk6t0GDW2MR4IXs3dCuCcTMPDgqZS-CMks6Tgc6xkDLyLLBC051S6gNxRhXpZ3FVg75Vlt_1nAptI_1Vpw",options: Options(headers: headers,followRedirects: true));
var a=response.data;
print(a.length);
The problem is that the results I get from the two packages are not the same. I want the output that Python gives, but I have to implement it in Flutter. A Flutter solution with any other package would also be fine...
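One common source of such differences is redirects and content decoding rather than the libraries themselves: for example, requests only decodes Brotli ('br') responses if the brotli package is installed, so advertising 'br' in Accept-Encoding can change what response.text contains. A small diagnostic sketch on the Python side (using the response object from the snippet above) to see what was actually sent and received before changing the Flutter code:

# Diagnostic prints for the requests response defined above
print(response.status_code)
print(response.url)                               # final URL after any redirects
print([r.status_code for r in response.history])  # redirect chain, if any
print(response.request.headers)                   # headers requests actually sent
print(response.headers.get('Content-Encoding'))   # compression the server chose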
I'm facing an issue accessing the following URL via Python code:
https://www1.nseindia.com/content/historical/EQUITIES/2021/JAN/cm01JAN2021bhav.csv.zip
This was working for the last 3 years, until 31-Dec-2020. It seems that the site has implemented some restrictions.
There's a solution for a similar problem here:
VB NSE ACCESS DENIED
This addition is made:
"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11"
"Referer": "https://www1.nseindia.com/products/content/equities/equities/archieve_eq.htm"
The original code is here:
https://github.com/naveen7v/Bhavcopy/blob/master/Bhavcopy.py
It's not working even after adding the following in the requests section:
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11'}
# print(Path)
a = requests.get(Path, headers=headers)  # headers must be passed as a keyword argument, not positionally
Can someone help?
import io
import zipfile

import requests


def download_bhavcopy(formated_date):
    # formated_date is day-month-year; month_dict (defined elsewhere) maps the
    # month part to NSE's three-letter month code used in the URL.
    url = "https://www1.nseindia.com/content/historical/DERIVATIVES/{0}/{1}/fo{2}{1}{0}bhav.csv.zip".format(
        formated_date.split('-')[2],
        month_dict[formated_date.split('-')[1]],
        formated_date.split('-')[0])
    print(url)

    hdr = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
        'Accept-Language': 'en-IN,en;q=0.9,en-GB;q=0.8,en-US;q=0.7,hi;q=0.6',
        'Connection': 'keep-alive',
        'Host': 'www1.nseindia.com',
        'Cache-Control': 'max-age=0',
        'Referer': 'https://www1.nseindia.com/products/content/derivatives/equities/fo.htm',
    }
    cookie_dict = {'bm_sv': 'E2109FAE3F0EA09C38163BBF24DD9A7E~t53LAJFVQDcB/+q14T3amyom/sJ5dm1gV7z2R0E3DKg6WiKBpLgF0t1Mv32gad4CqvL3DIswsfAKTAHD16vNlona86iCn3267hHmZU/O7DrKPY73XE6C4p5geps7yRwXxoUOlsqqPtbPsWsxE7cyDxr6R+RFqYMoDc9XuhS7e18='}

    session = requests.session()
    for cookie in cookie_dict:
        session.cookies.set(cookie, cookie_dict[cookie])
    response = session.get(url, headers=hdr)

    if response.status_code == 200:
        print('Success!')
    elif response.status_code == 404:
        print('Not Found.')
    else:
        print('response.status_code ', response.status_code)

    file_name = "none"
    try:
        zipT = zipfile.ZipFile(io.BytesIO(response.content))
        zipT.extractall()
        file_name = zipT.filelist[0].filename
        print('file name ' + file_name)
    except zipfile.BadZipFile:
        print('Error: Zip file is corrupted')
    except zipfile.LargeZipFile:  # raised when the archive needs Zip64 but it is not enabled
        print('Error: File size is too large')

    print(file_name)
    return file_name
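Note that the hard-coded bm_sv cookie above appears to be an Akamai anti-bot cookie that expires quickly, so this approach tends to break after a short while. A sketch of a commonly suggested alternative (an assumption, not verified against NSE's current protection): let a fresh session pick up its own cookies by visiting the referer page first, then download the zip with that same session. The helper name fetch_nse_zip is made up for illustration.

import requests

def fetch_nse_zip(url, referer='https://www1.nseindia.com/products/content/derivatives/equities/fo.htm'):
    """Warm up a session on the referer page so the site's bot protection sets
    fresh cookies, then download the zip with those cookies."""
    hdr = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36',
        'Accept-Language': 'en-IN,en;q=0.9',
        'Referer': referer,
    }
    with requests.Session() as s:
        s.get(referer, headers=hdr, timeout=10)      # picks up cookies automatically
        return s.get(url, headers=hdr, timeout=10)   # sends them back on the download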
Inspect the request in your web browser's dev tools and find the GET for the required download link.
Go to Headers and check the User-Agent, e.g.:
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0
Now modify your code as follows:
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0'
}
result = requests.get(URL, headers=headers)
I'm trying to submit info to this site: https://cxkes.me/xbox/xuid
The info: e = {'gamertag': "Xi Fall iX"}
Every time I try, I get WinError 10054. I can't seem to find a fix for this.
My Code:
import urllib.parse
import urllib.request
import json
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
url = "https://cxkes.me/xbox/xuid"
e = {'gamertag' : "Xi Fall iX"}
f = {'accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
'accept-encoding': "gzip, deflate, br",
'accept-language': "en-GB,en-US;q=0.9,en;q=0.8",
'cache-control': "max-age=0",
'content-length': "76",
'content-type': "application/x-www-form-urlencoded",
'cookie': "__cfduid=d2f371d250727dc4858ad1417bdbcfba71593253872; XSRF-TOKEN=eyJpdiI6IjVcL2dHMGlYSGYwd3ZPVEpTRGlsMnFBPT0iLCJ2YWx1ZSI6InA4bDJ6cEtNdzVOT3UxOXN4c2lcLzlKRTlYaVNvZjdpMkhqcmllSWN3eFdYTUxDVHd4Y2NiS0VqN3lDSll4UDhVMHM1TXY4cm9lNzlYVGE0dkRpVWVEZz09IiwibWFjIjoiYjdlNjU3ZDg3M2Y0MDBlZDY3OWE5YTdkMWUwNGRiZTVkMTc5OWE1MmY1MWQ5OTQ2ODEzNzlhNGFmZGNkZTA1YyJ9; laravel_session=eyJpdiI6IjJTdlFhK0dacFZ4cFI5RFFxMHgySEE9PSIsInZhbHVlIjoia2F6UTJXVmNSTEt1M3lqekRuNVFqVE5ZQkpDang4WWhraEVuNm0zRmlVSjVTellNTDRUb1wvd1BaKzNmV2lISGNUQ0l6Z21jeFU3VlpiZzY0TzFCOHZ3PT0iLCJtYWMiOiIwODU3YzMxYzg2N2UzMjdkYjcxY2QyM2Y4OTVmMTY1YTcxZTAxZWI0YTExZDE0ZjFhYWI2NzRlODcyOTg3MjIzIn0%3D",
'origin': "https://cxkes.me",
'referer': "https://cxkes.me/xbox/xuid",
'sec-fetch-dest': "document",
'sec-fetch-mode': "navigate",
'sec-fetch-site': "same-origin",
'sec-fetch-user': "?1",
'upgrade-insecure-requests': "1",
'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"}
data = urllib.parse.urlencode(e)
data = json.dumps(data)
data = str(data)
data = data.encode('ascii')
req = urllib.request.Request(url, data, f)

with urllib.request.urlopen(req) as response:
    the_page = response.read()
    print(the_page)
Having run the code, I get the following error
[WinError 10054] An existing connection was forcibly closed by the remote host
That could be caused by any of the following:
The network link between server and client may be temporarily going down.
The server running out of system resources.
The client sending malformed data (see the sketch below).
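On the malformed-data point: in the code above the form body is URL-encoded and then passed through json.dumps, so what gets posted is a JSON-quoted string rather than plain application/x-www-form-urlencoded data. A minimal sketch of posting the form body with the extra encoding steps removed (the field name 'gamertag' is taken from the question; the site may additionally require its XSRF token and session cookie, which this sketch does not handle):

import urllib.parse
import urllib.request

url = "https://cxkes.me/xbox/xuid"
e = {'gamertag': "Xi Fall iX"}

# URL-encode once and send the bytes directly; no json.dumps for a form POST
data = urllib.parse.urlencode(e).encode('ascii')

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
    'content-type': 'application/x-www-form-urlencoded',
}

req = urllib.request.Request(url, data, headers)
with urllib.request.urlopen(req) as response:
    print(response.read())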
I am not sure what you are trying to achieve here. But if your aim is simply to read the XUID for a gamertag, then use a web automation tool like Selenium to retrieve that value.
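A rough sketch of that Selenium approach, assuming the page has a form field named gamertag (as in the question's payload) and that the XUID shows up in the page after the form is submitted; the selector is a guess, not taken from the actual page:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()  # needs a matching chromedriver (or a Selenium version that manages it)
try:
    driver.get("https://cxkes.me/xbox/xuid")

    # 'gamertag' is assumed to be the input's name attribute
    box = driver.find_element(By.NAME, "gamertag")
    box.send_keys("Xi Fall iX")
    box.send_keys(Keys.RETURN)   # submit the form

    print(driver.page_source)    # the XUID should appear somewhere in here
finally:
    driver.quit()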
I am trying to retrieve some HTML text behind a login page using a Python requests POST. But my code fails to do so: the returned HTML contains "The page has expired due to inactivity".
Below is my code.
import requests

url_login = "https://savethewater-game.com/login"

headers = {
    'referer': 'https://savethewater-game.com/login',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
}
payload = {
    '_token': 'mUaaXNup3vCEtiln5QeQNJNwoO8LrCH9opoVE4GH',
    'email': 'someone#gmail.com',  # fake email
    'password': '12345678'  # fake pass
}

with requests.Session() as session:
    p = session.post(url_login, headers=headers, data=payload)
    print(p.text)
The intercepted login request in the Chrome dev tools is shown below:
:authority: savethewater-game.com
:method: POST
:path: /login
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
accept-encoding: gzip, deflate, br
accept-language: en,it;q=0.9,zh-CN;q=0.8,zh;q=0.7,zh-TW;q=0.6,en-US;q=0.5
cache-control: max-age=0
content-length: 93
content-type: application/x-www-form-urlencoded
cookie: XSRF-TOKEN=eyJpdiI6ImtvRW5UT1BMNjkxNVFBc1d2OVJKZ3c9PSIsInZhbHVlIjoicVVCYlhFRG50QmVKd3V1Yzh4NnNldUhvRXpZOWVSRDFiUGNsT1E4aG9oOUFpYlZ0M1BaRFwvR3VkK1Q4MkhLOFlBZDlxUWp4R0s4YjU4aTZGc0I0RVZ3PT0iLCJtYWMiOiIzYzMxMmI0ZjlhOTM0YzVjZjA5NDk2MDkxMDJlY2VlMjVmNjhiYTJiM2E2OTlkYmYzOTIyYzJiYTM0NTJhMWMyIn0%3D; savethewater_session=eyJpdiI6IjltY2M3alp2endPdWY4VmVpNGhKMXc9PSIsInZhbHVlIjoiVjR2T2lHempPVGM1YW04YldtbGkxcWU3TlwvU1N2RTRcL0VoMzFPY2RLb245bXo0bVJreDl0UnBMYlFjaDNOZlZlMEQ2YVpKVXU3QVYxWWRGNW13bE9wdz09IiwibWFjIjoiNjk0YTdmNTFmYzJiMzg2MDA3NmRiOGU5OTUwMWVkMDE3ZmRkZDY1NzUzMjVjMTYxNzljNjNlZTc4NzE5ODYyNiJ9
origin: https://savethewater-game.com
referer: https://savethewater-game.com/login
sec-fetch-mode: navigate
sec-fetch-site: same-origin
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36
Some posts mention that the reason could be the website blocking automated scraping. I would like to know whether my code is wrong or whether it is some other issue. Thanks very much!
It's hard to give a concrete solution without trial and error using those credentials. The "page has expired" message is Laravel's CSRF-token-mismatch response, so the _token has to be taken from the login page served to your own session rather than hard-coded. Try the following; it should work.
import requests
from bs4 import BeautifulSoup

url_login = "https://savethewater-game.com/login"

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
}
payload = {
    '_token': '',
    'email': 'someone#gmail.com',
    'password': '12345678'
}

with requests.Session() as session:
    # Load the login page first so the session cookie and CSRF token match
    res = session.get(url_login)
    cookie_val = res.headers['Set-Cookie'].split(";")[0]
    headers['cookie'] = cookie_val

    # Pull the fresh _token out of the login form and post it back
    soup = BeautifulSoup(res.text, "lxml")
    token = soup.select_one('input[name="_token"]')['value']
    payload['_token'] = token

    p = session.post(url_login, data=payload, headers=headers)
    print(p.content)