I am trying to fill in an HTML form programmatically and get the same result as when I fill it in manually, but I am failing.
I am trying to submit the form at https://www.desco.org.bd/ebill/login.php with the value 32000001. My attempt so far is below:
import requests
#LOGIN_URL = 'https://www.desco.org.bd/ebill/login.php'
#LOGIN_URL = 'https://www.desco.org.bd/ebill/authentication.php'
LOGIN_URL = 'https://www.desco.org.bd/ebill/billinformation.php'
payload = {
    'username': '32000001',
    # a dict cannot hold two 'login' keys; use the value the form actually submits
    'login': 'Login'
}

with requests.Session() as s:
    p = s.post(LOGIN_URL, data=payload)  # , verify=False)
    # print the html returned or something more intelligent to see if it's a successful login page.
    print(p.text)
I have found that login.php redirects to authentication.php, which in turn redirects to billinformation.php, which delivers the data I actually need.
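Based on that, I would expect posting to login.php itself and letting the session follow the redirect chain to work. Here is a minimal sketch of that idea (the field names are the ones from my payload above, and there may be hidden fields I am missing):

import requests

LOGIN_URL = 'https://www.desco.org.bd/ebill/login.php'
payload = {'username': '32000001', 'login': 'Login'}

with requests.Session() as s:
    # post to login.php and let the session follow the redirects through
    # authentication.php to billinformation.php, carrying cookies along the way
    r = s.post(LOGIN_URL, data=payload)
    print(r.url)   # should end at billinformation.php if the login was accepted
    print(r.text)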
Thanks in advance.
N.B. I am not planning to use Selenium, since it is too slow for my use case, i.e. collecting a large amount of data from this site.
I am working on a similar case; maybe you could try using websockets:
import asyncio
import websockets

def process_function(message):
    # process the message and return the reply to send back
    return message

async def server(ws, path):
    while True:
        message_received = await ws.recv()  # receive from ui
        print(f'Msg [{message_received}]')
        message_to_send = process_function(message_received)
        await ws.send(message_to_send)  # send feedback to ui

# set the html page to connect to the same ip:port
start_server = websockets.serve(server, '127.0.0.1', 5678)
asyncio.get_event_loop().run_until_complete(start_server)
asyncio.get_event_loop().run_forever()
Another attempt, authenticating first and then reusing the token:

import json, requests

def do_auth(url):
    headers = {"Content-Type": "application/json", "Accept": '*/*'}
    body = json.dumps({'username': 'user', 'password': 'pass'})
    r = requests.post(url=url, headers=headers, data=body, verify=False)
    print(r.status_code)
    d = json.loads(r.text)
    print(d['access_token'])
    print(d['refresh_token'])
    return d['access_token'], d['refresh_token']

access_token, refresh_token = do_auth(url_auth)  # authorization
# assumes the API expects a Bearer token; adjust to whatever scheme your API uses
headers = {'Authorization': 'Bearer %s' % access_token}
requests.get(url_content, headers=headers, verify=False)  # get data
I am trying to connect to Splunk via its API using Python. I can connect and get a 200 status code, but when I read the response it does not contain the content of the page. See below.
Here is my code:
import json
import requests
import re
baseurl = 'https://my_splunk_url:8888'
username = 'my_username'
password = 'my_password'
headers={"Content-Type": "application/json"}
s = requests.Session()
s.proxies = {"http": "my_proxy"}
r = s.get(baseurl, auth=(username, password), verify=False, headers=None, data=None)
print(r.status_code)
print(r.text)
I am new to Splunk and Python, so any ideas or suggestions as to why this is happening would help.
You need to authenticate first to get a token; then you'll be able to hit the rest of the REST endpoints. The auth endpoint is at /servicesNS/admin/search/auth/login, which will give you the session_key, which you then provide to subsequent requests.
Here is some code that uses requests to authenticate to a Splunk instance and then start a search. It then checks whether the search is complete; if not, it waits a second and checks again, repeating until the search is done, and then prints the results.
import time  # needed for sleep
from xml.dom import minidom
import json, pprint

import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

base_url = 'https://localhost:8089'
username = 'admin'
password = 'changeme'
search_query = "search=search index=*"

r = requests.get(base_url + "/servicesNS/admin/search/auth/login",
                 data={'username': username, 'password': password}, verify=False)
session_key = minidom.parseString(r.text).getElementsByTagName('sessionKey')[0].firstChild.nodeValue
print("Session Key:", session_key)

r = requests.post(base_url + '/services/search/jobs/', data=search_query,
                  headers={'Authorization': ('Splunk %s' % session_key)},
                  verify=False)
sid = minidom.parseString(r.text).getElementsByTagName('sid')[0].firstChild.nodeValue
print("Search ID", sid)

done = False
while not done:
    r = requests.get(base_url + '/services/search/jobs/' + sid,
                     headers={'Authorization': ('Splunk %s' % session_key)},
                     verify=False)
    response = minidom.parseString(r.text)
    for node in response.getElementsByTagName("s:key"):
        if node.hasAttribute("name") and node.getAttribute("name") == "dispatchState":
            dispatchState = node.firstChild.nodeValue
            print("Search Status: ", dispatchState)
            if dispatchState == "DONE":
                done = True
            else:
                time.sleep(1)

r = requests.get(base_url + '/services/search/jobs/' + sid + '/results/',
                 headers={'Authorization': ('Splunk %s' % session_key)},
                 data={'output_mode': 'json'},
                 verify=False)
pprint.pprint(json.loads(r.text))
Many of the request calls here include the flag verify=False to avoid issues with the default self-signed SSL certificates, but you can drop it if you have legitimate certificates.
Published a while ago at https://gist.github.com/sduff/aca550a8df636fdc07326225de380a91
Nice piece of coding. One of the wonderful aspects of Python is the ability to use other people's well written packages. In this case, why not use Splunk's Python packages to do all of that work, with a lot less coding around it.
pip install splunklib.
Then add the following to your import block
import splunklib.client as client
import splunklib.results as results
pypi.org has documentation on some of the usage, and Splunk has an excellent set of how-to documents. Remember, be lazy: use someone else's work to make your work look better.
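For example, a rough sketch of running a search with the SDK (host, port, credentials, and the search string are placeholders, and the exact results reader may vary between SDK versions):

import splunklib.client as client
import splunklib.results as results

# connect to the Splunk management port (placeholder host/credentials)
service = client.connect(host='localhost', port=8089,
                         username='admin', password='changeme')

# run a blocking search so the job is already finished when create() returns
job = service.jobs.create('search index=* | head 5', exec_mode='blocking')

# iterate over the results
for result in results.ResultsReader(job.results()):
    print(result)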
How do you use Bitbucket's 2.0 API to decline a pull request via Python?
According to their documentation, it should be something like:
import requests
kwargs = {
'username': MY_BITBUCKET_ACCOUNT,
'repo_slug': MY_BITBUCKET_REPO,
'pull_request_id': pull_request_id
}
url = 'https://api.bitbucket.org/2.0/repositories/{username}/{repo_slug}/pullrequests/{pull_request_id}/decline'.format(**kwargs)
headers = {'Content-Type': 'application/json'}
response = requests.post(url, auth=(USERNAME, PASSWORD), headers=headers)
However, this fails with response.text simply saying "Bad Request".
This similar code works for me with their other API endpoints, so I'm not sure why the decline method is failing.
What am I doing wrong?
You have to authenticate with OAuth. I wrote a wrapper for making these requests. Here is a simple example that works. The only thing I couldn't figure out was how to add a reason the PR was declined, so I ended up making a request that added a comment explaining why before declining it (a sketch of that comment request follows the example below).
import os

from oauthlib.oauth2 import BackendApplicationClient
from requests_oauthlib import OAuth2Session


class Bitbucket(object):
    def __init__(self, client_id, client_secret, workplace, repo_slug):
        self.workplace = workplace  # username or company username
        self.repo_slug = repo_slug
        self.token_url = 'https://bitbucket.org/site/oauth2/access_token'
        self.api_url = 'https://api.bitbucket.org/2.0/'
        self.max_pages = 10

        self.client = BackendApplicationClient(client_id=client_id)
        self.oauth = OAuth2Session(client=self.client)
        self.oauth.fetch_token(
            token_url=self.token_url,
            client_id=client_id,
            client_secret=client_secret
        )

    def get_api_url(self, endpoint):
        return f'{self.api_url}repositories/{self.workplace}/{self.repo_slug}/{endpoint}'


bitbucket = Bitbucket(os.environ['BITBUCKET_KEY'], os.environ['BITBUCKET_SECRET'],
                      workplace='foo', repo_slug='bar')
pr_id = 1234
resp = bitbucket.oauth.post(f"{bitbucket.get_api_url('pullrequests')}/{pr_id}/decline")

if resp.status_code == 200:
    print('Declined')
else:
    print('Something went wrong.')
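A rough sketch of that comment-then-decline workflow, assuming the standard 2.0 pull request comments endpoint (the decline reason text is just an example):

# post a comment with the decline reason, then decline the PR
comment_body = {'content': {'raw': 'Declining this PR: superseded by a newer change.'}}
comment_resp = bitbucket.oauth.post(
    f"{bitbucket.get_api_url('pullrequests')}/{pr_id}/comments",
    json=comment_body,
)
decline_resp = bitbucket.oauth.post(f"{bitbucket.get_api_url('pullrequests')}/{pr_id}/decline")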
I am currently trying to manage a website with a Python script in order to control an IoT object.
From what I have discovered, control is done in two steps:
A POST request to get an ID, which is needed to control the device.
A POST request using that ID.
The first one works with the Python script below, and the ID is displayed in the response.
import requests
url = 'http://local_IP/login.cgi'
payload = {'lgname': 'theLogin', 'lgpin': 'thePin'}
r = requests.post(url, data=payload)
For the second POST (to control the device when the user is logged in), I captured the command with Wireshark and here is the information:
POST /user/keyfunction.cgi HTTP/1.1\r\n
Content-Type: text/plain;charset=UTF-8\r\n
Referer: http://LOCAL_IP/login.cgi\r\n
and then I have:
Line-based text data: text/plain
sess=IDReceivedWithTheFirstPOST&comm=80&Data0=2&data2=18&data1=1
So basically, I need a way to do a POST in Python with this "Line-based text data: text/plain" body, but I have no idea how to deal with it.
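Here is roughly what I imagine the request should look like, sending the captured line as a raw text/plain body, though I am not sure this is the right approach (the 'sess' value below is just the placeholder from the capture):

import requests

url = 'http://LOCAL_IP/user/keyfunction.cgi'
headers = {'Content-Type': 'text/plain;charset=UTF-8',
           'Referer': 'http://LOCAL_IP/login.cgi'}
# raw body exactly as captured; 'sess' must be the ID returned by the first POST
body = 'sess=IDReceivedWithTheFirstPOST&comm=80&Data0=2&data2=18&data1=1'

r = requests.post(url, data=body, headers=headers)
print(r.status_code, r.text)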
Hope you'll be able to help me,
Thank you,
Baptiste
EDIT: If it can help anyone someday, here is my working code:
import requests
from collections import OrderedDict

session = requests.Session()

url = 'http://LOCAL_IP/login.cgi'
payload = {'lgname': 'User', 'lgpin': 'Password'}
r_login = session.post(url, data=payload)
with open('data.txt', 'w') as output:
    output.write(r_login.text)

text = 'function getSession(){return'
with open('./data.txt', 'rb') as f:
    for line in f:
        if line.find(text) == 1:
            id = line.split()[2][1:17]
            print(id)

data = OrderedDict()
data['sess'] = id
data['comm'] = 80
data['data0'] = 2
data['data2'] = 1
data['data1'] = 16

url = 'http://LOCAL_IP/user/keyfunction.cgi'
r_keyfunction = session.post(url, data=data)
with open('data2.txt', 'w') as output:
    output.write(r_keyfunction.text)
Updated based on the OP's results to use requests.Session().
Using requests.Session() will capture all cookies and forward them on subsequent requests. It also pools connections and does lots of other useful things.
import requests

session = requests.Session()

payload = {'lgname': 'theLogin', 'lgpin': 'thePin'}
r_login = session.post('http://local_IP/login.cgi', data=payload)

# Figure out the ID here somehow
id_thing = 'IDReceivedWithTheFirstPOST'

payload = {
    'sess': id_thing,
    'comm': 80,
    'Data0': 2,
    'data2': 18,
    'data1': 1
}
# send the values in the request body (the capture shows them as body data, not query-string parameters)
r_keyfunction = session.post('http://local_IP/user/keyfunction.cgi', data=payload)
# do something here
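For the "figure out the ID" step, a rough sketch of pulling it straight out of the login response with a regular expression (the pattern is a guess based on the getSession() snippet in the OP's edit; adjust it to the real HTML):

import re

# hypothetical pattern: match something like  function getSession(){return 'abc123'}
match = re.search(r"getSession\(\)\s*\{\s*return\s*'([^']+)'", r_login.text)
if match:
    id_thing = match.group(1)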
Lately I've come across a number of questions and articles very briefly covering the use of urllib, requests, mwapi, poster, and various other tools to either perform an HTTP POST or work with the API to upload one or more files to a MediaWiki instance. So far, nothing has worked.
So, could someone kindly provide a simple code block that will reliably upload a file to such a wiki? My preference is requests and/or Python 3, but at this point I'm pretty desperate and am open to almost anything.
Edit:
Per the request in the comments, below is the last bit of code I attempted. It completes with no errors, but no file is uploaded and nothing appears in the wiki's logs.
from urllib.parse import quote
import requests
user = 'username'
passw = quote('password')
baseurl = 'http://127.0.0.1:8020/mediawiki/'
apiurl = baseurl + 'api.php'
login_params = '?action=login&lgname=%s&lgpassword=%s&format=json'% (user, passw)
# Login request
r1 = requests.post(apiurl+login_params)
login_token = r1.json()['login']['token']
# Login confirm
login_params2 = login_params+'&lgtoken=%s'% login_token
r2 = requests.post(apiurl+login_params2, cookies=r1.cookies)
# Get edit token
params3 = '?format=json&action=query&meta=tokens&continue='
r3 = requests.get(apiurl+params3, cookies=r2.cookies)
edit_token = r3.json()['query']['tokens']['csrftoken']
edit_cookie = r2.cookies.copy()
edit_cookie.update(r3.cookies)
# Upload file
with open('91.png', 'rb') as f:
    headers = {'content-type': 'multipart/form-data'}
    payload = {'action': 'upload', 'filename': 'Image', 'file': '91.png', 'token': edit_token}
    files = {'files': f}
    r4 = requests.post(apiurl, headers=headers, data=payload, files=files, cookies=edit_cookie)
I'm glad you got mwclient to work, but I think I can address your preference for using just Python 3 and requests.
I was having major headaches doing the same thing and finally got the following to work. I also posted it at https://www.mediawiki.org/wiki/API_talk:Upload#Python_with_requests, but since this was a question I found while trying to solve my problem, I'll reproduce it below.
Strictly speaking, you probably do not need a BotPassword, but it's a good idea.
import requests

api_url = 'https://project/w/api.php'
USER, PASS = u'BotUsername#Instancename', u'[[Special:BotPasswords]] password'
# Ensure the bot instance is permissioned for createeditmovepage, uploadfile, uploadeditmovefile
FILENAME = '/path/to/file'
REMOTENAME = 'remote_filename.ext'
USER_AGENT = 'Descriptive User Agent per [[:meta:User-Agent_policy]]'

# get login token and log in
payload = {'action': 'query', 'format': 'json', 'utf8': '',
           'meta': 'tokens', 'type': 'login'}
r1 = requests.post(api_url, data=payload)
login_token = r1.json()['query']['tokens']['logintoken']

login_payload = {'action': 'login', 'format': 'json', 'utf8': '',
                 'lgname': USER, 'lgpassword': PASS, 'lgtoken': login_token}
r2 = requests.post(api_url, data=login_payload, cookies=r1.cookies)
cookies = r2.cookies.copy()

# We have now logged in and can request edit tokens thusly:
def get_edit_token(cookies):
    edit_token_response = requests.post(api_url,
                                        data={'action': 'query', 'format': 'json', 'meta': 'tokens'},
                                        cookies=cookies)
    return edit_token_response.json()['query']['tokens']['csrftoken']

# Now actually perform the upload:
upload_payload = {'action': 'upload',
                  'format': 'json',
                  'filename': REMOTENAME,
                  'comment': '<MY UPLOAD COMMENT>',
                  'text': 'Text on the File: page... description, license, etc.',
                  'token': get_edit_token(cookies)}
files = {'file': (REMOTENAME, open(FILENAME, 'rb'))}
headers = {'User-Agent': USER_AGENT}
upload_response = requests.post(api_url, data=upload_payload, files=files, cookies=cookies, headers=headers)
To me, it seems weird that your last code didn't produce an error. Assuming you set up MediaWiki with the standard configuration, api.php is at baseurl/w/api.php, not baseurl/api.php. This means that you requested the wrong page the whole time. Try again, but this time replace apiurl = baseurl + 'api.php' with apiurl = baseurl + 'w/api.php'.
It turns out that the one I hadn't tried worked. Using the documented examples for mwclient, uploading was successful.
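For reference, a minimal sketch of that mwclient approach (host, path, scheme, credentials, and file name are placeholders; adjust them to the actual wiki setup):

import mwclient

# placeholders: point this at your own wiki (here, a local install served over http)
site = mwclient.Site('127.0.0.1:8020', path='/mediawiki/', scheme='http')
site.login('username', 'password')

with open('91.png', 'rb') as f:
    # upload the file under the given remote name with a short description
    site.upload(f, filename='Image.png', description='Uploaded via mwclient')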
I have a simple website crawler. It works fine, but sometimes it gets stuck on large content such as ISO images, .exe files, and other big downloads. Guessing the content type from the file extension is probably not the best idea.
Is it possible to get the content type and content length/size without fetching the whole content/page?
Here is my code:
requests.adapters.DEFAULT_RETRIES = 2

url = url.decode('utf8', 'ignore')
urlData = urlparse.urlparse(url)
urlDomain = urlData.netloc

session = requests.Session()
customHeaders = {}

if maxRedirects == None:
    session.max_redirects = self.maxRedirects
else:
    session.max_redirects = maxRedirects

self.currentUserAgent = self.userAgents[random.randrange(len(self.userAgents))]
customHeaders['User-agent'] = self.currentUserAgent

try:
    response = session.get(url, timeout=self.pageOpenTimeout, headers=customHeaders)

    currentUrl = response.url
    currentUrlData = urlparse.urlparse(currentUrl)
    currentUrlDomain = currentUrlData.netloc
    domainWWW = 'www.' + str(urlDomain)

    headers = response.headers
    contentType = str(headers['content-type'])
except:
    logging.basicConfig(level=logging.DEBUG, filename=self.exceptionsFile)
    logging.exception("Get page exception:")
    response = None
Yes.
You can use the Session.head method to create HEAD requests:
response = session.head(url, timeout=self.pageOpenTimeout, headers=customHeaders)
contentType = response.headers['content-type']
A HEAD request is similar to a GET request, except that no message body is returned in the response.
Here is a quote from Wikipedia:
HEAD
Asks for the response identical to the one that would correspond to a GET request, but without the response body. This is useful for retrieving meta-information written in response headers, without having to transport the entire content.
Use requests.head() for this. It will not return the message body. Use the head method when you are interested only in the headers. Check this link for details.
h = requests.head(some_link)
header = h.headers
content_type = header.get('content-type')
Sorry, my mistake, I should have read the documentation better. Here is the answer:
http://docs.python-requests.org/en/latest/user/advanced/#advanced (Body Content Workflow)
tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
r = requests.get(tarball_url, stream=True)
if int(r.headers['content-length']) > TOO_LONG:
    r.connection.close()
    # log request too long
Because requests.head() does NOT automatically follow redirects, if a URL is redirected, requests.head() will get 0 for Content-Length. So make sure allow_redirects=True is added.
r = requests.head(url, allow_redirects=True)
length = r.headers['Content-Length']
Refer to Requests Redirection And History