I am trying to scrape data from Twitter. Code that previously worked no longer does: Twitter is now sending back strange HTML. I have tried different urllib.request.Request variations but still do not get the HTML structure I see in the browser, and I could not find a solution to my problem.
Alternative 1:
import urllib.parse
import urllib.request
url = 'https://twitter.com/search?l=&q=%22gsm%22%20since%3A2017-01-01%20until%3A2017-05-02&src=typd'
user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
headers = {'User-Agent': user_agent}
data = urllib.parse.urlencode(values)
data = data.encode('ascii')
req = urllib.request.Request(url, data, headers)
with urllib.request.urlopen(req) as response:
    the_page = response.read()
Alternative 2:
headers = {}
headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
req = urllib.request.Request(url, headers = headers)
resp = urllib.request.urlopen(req)
respData = resp.read()
Is there any way to solve this issue with urllib, or any other approach?
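One thing worth noting about alternative 1: passing a data argument to urllib.request.Request turns the request into a POST, while Twitter's search endpoint expects a GET with the parameters in the URL. A minimal sketch of a GET-only request (the build_search_url helper is illustrative, not part of the original code; the network call is left commented out):

```python
import urllib.parse
import urllib.request

def build_search_url(query, since, until):
    """Put the search parameters in the query string, where a GET expects them."""
    params = {'l': '', 'q': f'"{query}" since:{since} until:{until}', 'src': 'typd'}
    return 'https://twitter.com/search?' + urllib.parse.urlencode(params)

url = build_search_url('gsm', '2017-01-01', '2017-05-02')
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'}
req = urllib.request.Request(url, headers=headers)  # no data argument -> GET
# with urllib.request.urlopen(req) as response:
#     the_page = response.read()
```

Even with a correct GET, the HTML can still differ from what the browser shows, because Twitter renders much of the search page with JavaScript after the initial load.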
I want to download JSON from the following dextools URL:
https://www.dextools.io/chain-bsc/api/Pancakeswap/pools?timestampToShow=1669287222&range=1
my code:
from urllib.request import urlopen, Request
import json
download_url = "https://www.dextools.io/chain-bsc/api/Pancakeswap/pools?timestampToShow=1657123280&range=1"
header = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
req = Request(download_url, headers=header)
webpage = urlopen(req).read()
data = json.loads(webpage)
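This may work as-is when the endpoint answers with JSON, but endpoints like this often reject non-browser clients; checking the status code and Content-Type before calling json.loads makes failures easier to diagnose. A sketch under that assumption (parse_json_bytes is an illustrative helper; the network calls are left commented out):

```python
import json
from urllib.request import urlopen, Request

def parse_json_bytes(raw):
    """Decode a raw HTTP response body as JSON; raises ValueError if it is not JSON."""
    return json.loads(raw)

download_url = "https://www.dextools.io/chain-bsc/api/Pancakeswap/pools?timestampToShow=1657123280&range=1"
header = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}

# req = Request(download_url, headers=header)
# with urlopen(req) as resp:
#     print(resp.status, resp.headers.get('Content-Type'))  # inspect before parsing
#     data = parse_json_bytes(resp.read())
```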
I am trying to access a Comfort Panel running Windows CE from Windows 10 and retrieve an audittrail.csv with Python. After logging in with a username and password, I try to download the CSV file with the audit-trail data, but a CSV containing the HTML of the page gets downloaded instead. How do I download the actual file?
import requests
from bs4 import BeautifulSoup
loginurl = ("http://10.70.148.11/FormLogin")
secure_url = ("http://10.70.148.11/StorageCardSD?UP=TRUE&FORCEBROWSE")
downloadurl = ("http://10.70.148.11/StorageCardSD/AuditTrail0.csv?UP=TRUE&FORCEBROWSE")
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
payload = {
    'Login': 'Admin',
    'Password': 'Pass'
}
with requests.session() as s:
    s.post(loginurl, data=payload)
    r = s.get(secure_url)
    soup = BeautifulSoup(r.content, 'html.parser')
    print(soup.prettify())
    req = requests.get(downloadurl, headers=headers, allow_redirects=True)
    url_content = req.content
    csv_file = open('audittrail.csv', 'wb')
    csv_file.write(url_content)
    csv_file.close()
When you request the file with requests.get, you are no longer using the logged-in session and therefore do not have the necessary cookies.
Make that request through the session instead (s.get rather than requests.get).
It should then work.
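A minimal sketch of the fix (URLs and form fields are taken from the question; the key change is using s.get so the session cookies from the login are sent with the download request):

```python
import requests

loginurl = "http://10.70.148.11/FormLogin"
downloadurl = "http://10.70.148.11/StorageCardSD/AuditTrail0.csv?UP=TRUE&FORCEBROWSE"
payload = {'Login': 'Admin', 'Password': 'Pass'}

def save_csv(content, path):
    """Write the raw response body to disk and return the byte count."""
    with open(path, 'wb') as f:
        f.write(content)
    return len(content)

def download_audit_trail():
    with requests.Session() as s:
        s.post(loginurl, data=payload)   # login; cookies are stored on s
        r = s.get(downloadurl)           # same session -> cookies are sent
        r.raise_for_status()
        return save_csv(r.content, 'audittrail.csv')
```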
I'm trying to push a value into the search box of amazon.com.
I'm using requests rather than selenium (and its send-keys option).
I've identified the XPath of the search box and now want to push a value into it, i.e. the character "a", "apple", or any other string, and then collect the results.
However, when I push the data with the requests post method I get an error.
Here is my code:
import requests
from lxml import html
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}
page = requests.get('https://www.amazon.com/', headers=headers)
response_code = page.status_code
if response_code == 200:
    htmlText = page.text
    tree = html.fromstring(page.content)
    search_box = tree.xpath('//input[@id="twotabsearchtextbox"]')
    pushing_keys = requests.post(search_box, 'a')
    print(search_box)
However I get this error code:
requests.exceptions.MissingSchema: Invalid URL "[<InputElement 20b94374a98 name='field-keywords' type='text'>]": No schema supplied. Perhaps you meant http://[<InputElement 20b94374a98 name='field-keywords' type='text'>]?
How do I correctly push any char in the search box with requests?
Thanks
Try using this approach:
import requests
base_url = 'https://www.amazon.com'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}
page = requests.get(base_url, headers=headers)
response_code = page.status_code
if response_code == 200:
    key_word_to_search = 'bean bags'
    pushing_keys = requests.get(f'{base_url}/s/ref=nb_sb_noss', headers=headers, params={'k': key_word_to_search})
    print(pushing_keys.content)
The search box submits its query via a GET request, so there is nothing to "push" into the box: you pass the keyword as a URL parameter instead.
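To see how the params argument turns the keyword into the query string, you can prepare the request without sending anything over the network:

```python
import requests

base_url = 'https://www.amazon.com'
# Build and prepare the request; .prepare() resolves params into the final URL.
req = requests.Request('GET', f'{base_url}/s/ref=nb_sb_noss',
                       params={'k': 'bean bags'}).prepare()
print(req.url)  # https://www.amazon.com/s/ref=nb_sb_noss?k=bean+bags
```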
I'm using the Python requests library and trying to log in to https://www.udemy.com/join/login-popup/. The problem is that when I use the following header:
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}
it returns CSRF verification failed. Request aborted.
When I change it to:
headers = {'Referer': url}
it returns Please verify that you are a human.
Any suggestions?
My code:
import requests
with requests.session() as s:
    url = 'https://www.udemy.com/join/login-popup/'
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/74.0.3729.131 Safari/537.36'}
    request = s.get(url, headers=headers)
    cookies = dict(cookies=request.cookies)
    csrf = request.cookies['csrftoken']
    data_login = {'csrfmiddlewaretoken': csrf, 'locale': 'en_US', 'email': 'myemail',
                  'password': 'maypassword'}
    request = s.post(url, data=data_login, headers={'Referer': url}, cookies=cookies['cookies'])
    print(request.content)
There are a couple of issues with your current code:
The header you are using is missing a few things
The value that you are passing for csrfmiddlewaretoken isn't correct
As you're using requests.session() you shouldn't include cookies manually (in this case)
Try this code:
import requests
from bs4 import BeautifulSoup
with requests.session() as s:
    url = 'https://www.udemy.com/join/login-popup/'
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0", "Referer": "https://www.udemy.com/join/login-popup/", "Upgrade-Insecure-Requests": "1"}
    request = s.get(url, headers=headers)
    soup = BeautifulSoup(request.text, "lxml")
    csrf = soup.find("input", {"name": "csrfmiddlewaretoken"})["value"]
    data_login = {'csrfmiddlewaretoken': csrf, 'locale': 'en_US', 'email': 'myemail@test.com', 'password': 'maypassword'}
    request = s.post(url, data=data_login, headers=headers)
    print(request.content)
(PS: I'm using the BeautifulSoup library to find the value of csrfmiddlewaretoken.)
Hope this helps
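The important difference from the question's code is where the token comes from: the hidden form field in the page, not the csrftoken cookie. A self-contained illustration on made-up HTML (the sample markup is invented for demonstration, and html.parser is used so it runs without lxml installed):

```python
from bs4 import BeautifulSoup

# An illustrative login form; the real page's markup may differ.
sample = '''
<form method="post">
  <input type="hidden" name="csrfmiddlewaretoken" value="abc123">
  <input type="email" name="email">
</form>
'''
soup = BeautifulSoup(sample, "html.parser")
csrf = soup.find("input", {"name": "csrfmiddlewaretoken"})["value"]
print(csrf)  # abc123
```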
I am trying to log in to http://site24.way2sms.com/content/index.html.
This is the script I've written:
import urllib
import urllib2
url = 'http://site21.way2sms.com/content/index.html'
values = {'username': 'myusername',
          'password': 'mypassword'}
headers = {'Accept': '*/*',
           'Accept-Encoding': 'gzip, deflate, sdch',
           'Accept-Language': 'en-US,en;q=0.8',
           'Cache-Control': 'max-age=0',
           'Connection': 'keep-alive',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'If-Modified-Since': 'Fri, 13 Nov 2015 17:47:23 GMT',
           'Referer': 'https://packetforger.wordpress.com/2013/09/13/changing-user-agent-in-python-requests-and-requesocks-and-using-it-in-an-exploit/',
           'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers=headers)
response = urllib2.urlopen(req)
the_page = response.read()
print the_page
I am getting a response from the website, but it looks encrypted or garbled:
��:�����G��ʯ#��C���G�X�*�6�?���ך��5�\���:�tF�D1�٫W��<�bnV+w\���q�����$�Q��͇���Aq`��m�*��Օ���)���)�
in my Ubuntu terminal. How can I fix this? Am I being logged in correctly?
Please help.
The form on that page doesn't post back to the same URL; it posts to http://site21.way2sms.com/content/Login.action.
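As for the garbled output: it is consistent with a gzip-compressed body, since the request advertises 'Accept-Encoding': 'gzip, deflate, sdch' and urllib2 does not decompress responses for you. A minimal sketch of gunzipping the body (written for Python 3; decompress_body is an illustrative helper, and the requests library would do this automatically):

```python
import gzip
import io

def decompress_body(raw):
    """Gunzip a response body that was sent with Content-Encoding: gzip."""
    return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()

# the_page = decompress_body(response.read())  # only when the response is gzipped
```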