I'm trying to send data to Wikipedia (basically trying to input the word 'python' into the Wikipedia search bar and print the content).
Here's what I tried:
import requests
payload = {'family': 'wikipedia',
           'language': 'en',
           'search': 'python',
           'go': 'Go'}

with requests.Session() as s:
    url = 'https://www.wikipedia.org/'
    r = s.get(url)
    r = s.post(url, data=payload)
    print(r.content)
But it doesn't seem to work
This is the website I'm trying to send data to: https://www.wikipedia.org/
If all you want to do is get the content from submitting "python" into the Wikipedia search bar, you don't need to create a POST request. A simple GET request will work fine:
search_term = "python"
response = requests.get(f'https://en.wikipedia.org/wiki/{search_term}')
print(response.content)
So to answer your remaining questions:
I'll be using POST requests for logins etc., so I want to learn via a POST request
GET, POST, PUT, DELETE and the other HTTP methods are server-side implementations. They don't magically exist for everything. So if Wikipedia decides not to support a POST request for searching in the search bar, then that's too bad: you can't use POST to make a search. You will have to search in whatever way they do support (which, from my tests, appears to be via a GET request).
So even though they might implement POST for logins (as they should), not everything necessarily has an associated POST request.
Can't I use POST to automate things like logging in and pressing buttons, like what Selenium does?
Sort of. You can use HTTP requests to make the same HTTP calls that a button would make when clicked. It's not exactly the same as clicking a button, though, since clicking a button can still do many other things behind the scenes in your web browser. And not every button's HTTP call is necessarily a POST request.
But that aside, even if you search in Wikipedia using Selenium, it would still end up being a GET request because Wikipedia changed the way that searches work (at least based on what you have posted). They made searches require a GET request so you have to make a GET request.
TLDR: It may have been possible with POST in the past, but it isn't anymore because that was a decision that Wikipedia made.
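If you want to reproduce what the search bar does, a GET request with query parameters is enough. Here is a minimal sketch; the index.php endpoint and the search/title parameters are what the English Wikipedia appears to use today, so treat them as assumptions that may change:
import requests

# Special:Search on the English Wikipedia; 'search' carries the query term
params = {'search': 'python', 'title': 'Special:Search'}
response = requests.get('https://en.wikipedia.org/w/index.php', params=params)
print(response.url)      # final URL after any redirect to the matching article
print(response.content)  # HTML of the article or of the search results page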
Related
I am trying to automate a test case wherein I first log in to a website and then modify a drop-down. I have automated the part up to where I reach the correct page. The issue is that this drop-down option is part of a table. The table has many similar elements. Also, a few table tags are used, as the first column is locked and the others are horizontally scrollable. I am finding it very difficult to locate the correct drop-down option to click.
When I check in the developer tools, the drop-down modification is actually a POST request.
I know we can use API testing with pytest, but is it possible to integrate this within an existing Selenium framework?
Can I create a framework wherein test_navigate will navigate me to the necessary page (pure Selenium), then test_modify_dropdown will use an API call to send the POST request and modify the option, and then I can continue with a further test_three?
All this in pytest, by the way.
You should simply be able to use the Python requests module. A POST request could look like this:
import requests
url = "url/for/your/api"
myobj = {'somekey': 'somevalue'}  # JSON in request payload
x = requests.post(url, json=myobj)
print(x.text)  # and/or get whatever data you want from the response
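If the POST has to happen inside your logged-in Selenium session, one way is to copy the browser's cookies into a requests.Session before sending the request. This is only a sketch; the URL, the payload keys and the driver fixture are placeholders, not your application's real names:
import requests

def session_from_driver(driver):
    # reuse the cookies of the already logged-in Selenium driver
    session = requests.Session()
    for cookie in driver.get_cookies():
        session.cookies.set(cookie['name'], cookie['value'])
    return session

def test_modify_dropdown(driver):  # assumes a pytest fixture that yields the logged-in driver
    session = session_from_driver(driver)
    payload = {'row_id': 'some-row', 'value': 'new-option'}  # placeholder keys
    response = session.post('https://your-app.example.com/api/dropdown', json=payload)
    assert response.status_code == 200
test_navigate and test_three can then stay pure Selenium, as long as the fixture keeps the same driver alive across the tests.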
For a personal project, I'm trying to get a full friends list of a user (myself for now) from Facebook using Requests and BeautifulSoup.
The main friends page however displays only 20, and the rest are loaded with Ajax when you scroll down.
The request url looks something like this (method is GET):
https://www.facebook.com/ajax/pagelet/generic.php/AllFriendsAppCollectionPagelet?dpr=1&data={"collection_token":"1244314824:2256358349:2","cursor":"MDpub3Rfc3RydWN0dXJlZDoxMzU2MDIxMTkw","tab_key":"friends","profile_id":1244214828,"overview":false,"ftid":null,"order":null,"sk":"friends","importer_state":null}&__user=1364274824&__a=1&__dyn=aihaFayfyGmagngDxfIJ3G85oWq2WiWF298yeqrWo8popyUW3F6wAxu13y78awHx24UJi28cWGzEgDKuEjKeCxicxabwTz9UcTCxaFEW58nVV8-cxnxm1typ9Voybx24oqyUf9UgC_UrQ4bBv-2jAxEhw&__af=o&__req=5&__be=-1&__pc=EXP1:DEFAULT&__rev=2677430&__srp_t=1474288976
My question is, is it possible to recreate the dynamically generated tokens such as the __dyn, cursor, collection_token etc. to send manually in my request? Is there some way to figure out how they are generated or is it a lost cause?
I know that the current Facebook API does not support viewing a full friends list. I also know that I can do this with Selenium, or some other browser simulator, but that feels way too slow, ideally I want to scrape thousands of friends lists (of users whose friends lists are public) in a reasonable time.
My current code is this:
import requests
from bs4 import BeautifulSoup
with requests.Session() as S:
    requests.utils.add_dict_to_cookiejar(S.cookies, {'locale': 'en_US'})
    form = {}
    form['email'] = 'myusername'
    form['pass'] = 'mypassword'
    response = S.post('https://www.facebook.com/login.php?login_attempt=1&lwv=110', data=form)
    # I'm logged in
    page = S.get('https://www.facebook.com/yoshidakai/friends?source_ref=pb_friends_tl')
Any help will be appreciated, including other methods to achieve this :)
As of this writing, you can extract this information by parsing the page and then get the cursor for later pages by parsing the preceding Ajax response. However, as Facebook regularly makes updates to its backend, I have had more stable results using Selenium to drive a headless Chrome browser to scroll through the page, and then parsing the resulting HTML.
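A rough sketch of that headless-browser approach is below; the scroll loop and the timing are illustrative, and it assumes you have already logged in with this driver (Facebook's markup changes often, so the actual parsing is left out):
import time
from bs4 import BeautifulSoup
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)  # older Selenium versions use chrome_options=

driver.get('https://www.facebook.com/yoshidakai/friends')
last_height = driver.execute_script('return document.body.scrollHeight')
while True:
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(2)  # give the Ajax call time to append the next batch of friends
    new_height = driver.execute_script('return document.body.scrollHeight')
    if new_height == last_height:  # nothing new was loaded, we reached the end of the list
        break
    last_height = new_height

soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()
# parse `soup` for the friend entries here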
I'm trying to make a simple POST request with the requests module, like this:
s = requests.Session()
s.post(link, data=payload)
To do it properly, the payload must be an id taken from the page itself, and a new one is generated on every access to the page.
So I need to get the data from the page and then proceed with the request.
The problem is that every time you access the page, a new id is generated.
So if we do this:
s = requests.Session()
payload = get_payload(s.get(link).text)
s.post(link, data=payload)
It will not work, because when you accessed the page with s.get the right id was generated, but by the time you make the POST request a new id has been generated, so you'd be sending an old one.
Is there any way to get the data from the page right before the post request?
Something like:
s.post(link, data=get_data(s.get(link)))
When you make a POST (or GET) request, the page will generate another id and send it back to you. There is no way of sending data to the page while it is being generated, because you need to receive a response first in order to process the data on the page; and once you have received the response, the server will create a new id for you the next time you view the page.
See https://www3.ntu.edu.sg/home/ehchua/programming/webprogramming/images/HTTP.png for a simple example image of an HTTP request.
In general, there is no way to do this. The server's response is potentially affected by the data you send, so it can't be available before you have sent the data. To persist this kind of information across requests, the server would usually set a cookie for you to send with each subsequent request - but using a requests.Session will handle that for you automatically. It is possible that you need to set the cookie yourself based on the first response, but cookies are a key/value pair, and you only appear to have the value. To find the key, and more generally to find out if this is what the server expects you to do, requires specific knowledge of the site you are working with - if this is a documented API, the documentation would be a good place to start. Otherwise you might need to look at what the website itself does - most browsers allow you to look at the cookies that are set for that site, and some (possibly via extensions) will let you look through the HTTP headers that are sent and received.
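If inspecting the site shows that the server does expect the freshly generated value back, a minimal sketch of that flow could look like this; the cookie key 'session_token' and the form field 'id' are hypothetical, while link and get_payload come from the question's own code:
import requests

with requests.Session() as s:              # cookies set by the server are reused automatically
    first = s.get(link)
    value = get_payload(first.text)        # extract the freshly generated id from the page
    s.cookies.set('session_token', value)  # hypothetical cookie key
    result = s.post(link, data={'id': value})  # hypothetical form field name
    print(result.status_code)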
I want, with a Python script, to be able to log in to a website and retrieve some data, all from behind my company's proxy.
I know that this question seems a duplicate of others that you can find searching, but it isn't.
I already tried using the solutions proposed in the answers to those questions, but they didn't work... I don't only need a piece of code to log in and get a specific webpage, but also some "concepts" of how this whole mechanism works.
Here is a description of what I want to be able to do:
Log into a website > Get to page X > Insert data in some form of page X and push "Calculate" button > Capture the results of my query
Once I have the results I'll see how to sort out the data.
How can I achieve this behind a proxy? Every time I try to use the "requests" library to log in, it doesn't work, saying I am unable to get page X since I did not authenticate... or worse, I am unable to get to that site at all because I didn't set up the proxy first.
Clarification of Requirements
First, make sure you understand the context for getting the results of your calculation.
(F12 shows DevTools in Chrome or Firebug in Firefox, where you can learn most of the details discussed below.)
can you reach the target page from your web browser?
is it really necessary to use a proxy? If yes, then test it in the browser and note exactly which proxy to use
what sort of authentication do you have to use to access the target web app? Options are "basic", "digest", or something custom that requires filling in some form, having something in cookies, etc.
when you access the calculation form in your browser, does pressing the "Calculate" button result in a visible HTTP request? Is it a POST? What is the content of the request?
Simple: HTTP based scenario
It is very likely that your situation will allow the use of simple HTTP communication. I will assume the following situation:
a proxy is used and you know its URL, and possibly the user name and password needed to use it
all pages on the target web application require either basic or digest authentication
the Calculate button uses a classical HTML form and results in an HTTP POST request with all the data visible in the form parameters
Complex: Browser emulation scenario
There is some chance that part of the interaction needed to get your result depends on JavaScript code doing something on the page. Often this can be converted into the HTTP scenario by investigating what the final HTTP requests are, but here I will assume this is not feasible or possible and we will emulate a real browser.
For this scenario I will assume:
you are able to perform the task yourself in a web browser and have all the required information available:
proxy url
proxy user name and password, if required
url to log in
user name and password to fill into some login form to get in
knowing "where to follow" after login to reach your calculation form
you are able to find enough information about each page element to use (form to fill, button to press etc.) like name of it, id, or something else, which will allow to target it at the moment of simulation.
Resolving HTTP based scenario
Python provides the excellent requests package, which shall serve our needs:
Proxy
Assuming a proxy at http://10.10.1.10:3128, with username user and password pass:
import requests
proxies = {
    "http": "http://user:pass@10.10.1.10:3128/",
}
# ready for `req = requests.get(url, proxies=proxies)`
Basic Authentication
Assuming the web app allows access for the user appuser with password apppass:
url = "http://example.com/form"
auth=("appuser", "apppass")
req = requests.get(url, auth=auth)
or, using HTTPBasicAuth explicitly:
from requests.auth import HTTPBasicAuth
url = "http://example.com/path"
auth = HTTPBasicAuth("appuser", "apppass")
req = requests.get(url, auth=auth)
Digest authentication differs only in the class name, which is HTTPDigestAuth.
Other authentication methods are documented in the requests documentation.
HTTP POST for an HTML Form
import requests
a = 4
b = 5
data = {"a": a, "b": b}
url = "http://example.com/formaction/url"
req = requests.post(url, data=data)
Note that this URL is not the URL of the form, but the URL of the "action" taken when you press the submit button.
All together
Users often reach the final HTML form in two steps: first they log in, then they navigate to the form.
However, web applications typically allow direct access (if you know the form URL). This performs authentication in the same step, and this is the way described below.
Note: if this does not work, you will have to use sessions with requests, which is possible, but I will not elaborate on that here.
import requests
from requests.auth import HTTPBasicAuth

proxies = {
    "http": "http://user:pass@10.10.1.10:3128/",
}
auth = HTTPBasicAuth("appuser", "apppass")
a = 4
b = 5
data = {"a": a, "b": b}
url = "http://example.com/formaction/url"
req = requests.post(url, data=data, proxies=proxies, auth=auth)
By now, you should have your result available via req and you are done.
Resolving Browser emulation scenario
Proxy
The Selenium documentation for configuring a proxy recommends configuring the proxy in your web browser. The same page provides details on how to set up a proxy from your script, but here I will assume you use Firefox and have already (during manual testing) succeeded in configuring the proxy; a sketch of the scripted approach follows below.
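For completeness, here is a sketch of setting the proxy from the script via a Firefox profile; the host and port are placeholders, and the network.proxy preferences are standard Firefox settings:
from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.set_preference('network.proxy.type', 1)              # 1 = manual proxy configuration
profile.set_preference('network.proxy.http', '10.10.1.10')   # placeholder proxy host
profile.set_preference('network.proxy.http_port', 3128)      # placeholder proxy port
profile.set_preference('network.proxy.ssl', '10.10.1.10')
profile.set_preference('network.proxy.ssl_port', 3128)
driver = webdriver.Firefox(firefox_profile=profile)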
Basic or Digest Authentication
The following modified snippet originates from an SO answer by Mimi, using basic authentication:
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_preference('network.http.phishy-userpass-length', 255)
driver = webdriver.Firefox(firefox_profile=profile)
driver.get("https://appuser:apppass#somewebsite.com/")
Note that Selenium does not seem to provide a complete solution for basic/digest authentication. The sample above is likely to work, but if not, you may check this Selenium Developer Activity Google Group thread and see that you are not alone; some of the solutions there might work for you.
The situation with digest authentication seems even worse than with basic: some people report success with AutoIt or with blindly sending keys, and the discussion referenced above shows some attempts.
Authentication via Login Form
If the web site allows logging in by entering credentials into a form, you might be in luck, as this is a rather easy task to do with Selenium; see the sketch below and the next section about filling in forms.
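As an illustration, a login form interaction might look roughly like this; the URL and the element ids are placeholders that you would replace with what you find in the page source:
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://example.com/login")                      # placeholder login URL
driver.find_element_by_id("username").send_keys("appuser")  # element ids are hypothetical
driver.find_element_by_id("password").send_keys("apppass")
driver.find_element_by_id("login-button").click()           # or submit() the form element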
Fill in a Form and Submit
In contrast to Authentication, filling data into forms, clicking buttons and similar activities are where Selenium works very well.
import time
from selenium import webdriver
a = 4
b = 5
url = "http://example.com/form"
# formactionurl = "http://example.com/formaction/url" # this is not relevant in Selenium
# Start up Firefox
browser = webdriver.Firefox()
# Assume, you get somehow authenticated now
# You might succeed with Basic Authentication by using url = "http://appuser:apppass@example.com/form"
# Navigate to your url
browser.get(url)
# find the element whose id is param_a and fill it in
inputElement = browser.find_element_by_id("param_a")
inputElement.send_keys(str(a))
# repeat for "b"
inputElement = browser.find_element_by_id("param_b")
inputElement.send_keys(str(b))
# submit the form (if having problems, try to set inputElement to the Submit button)
inputElement.submit()
time.sleep(10) # wait 10 seconds (better methods can be used)
page_text = browser.page_source
# now you have what you asked for
browser.quit()
Conclusions
The information provided in the question describes what is to be done in a rather general manner, but lacks the specific details that would allow a tailored solution. That is why this answer focuses on proposing a general approach.
There are two scenarios: the first is HTTP based, the second uses an emulated browser.
The HTTP solution is preferable, despite the fact that it requires a bit more preparation to find out which HTTP requests are to be used. The big advantage is that in production it is much faster, requires much less memory, and should be more robust.
In rare cases, when there is some essential JavaScript activity in the browser, we may use the browser emulation solution. However, this is much more complex to set up and has major problems at the authentication step.
I am using Python 2.7.1 to access an online website. I need to load a URL, then submit a POST request to that URL that causes the website to redirect to a new URL. I would then like to POST some data to the new URL. This would be easy to do, except that the website in question does not allow the user to use browser navigation. (As in, you cannot just type in the URL of the new page or press the back button, you must arrive there by clicking the "Next" button on the website). Therefore, when I try this:
import urllib, urllib2, cookielib
url = "http://www.example.com/"
jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
form_data_login = "POSTDATA"
form_data_try = "POSTDATA2"
resp = opener.open(url, form_data_login)
resp2 = opener.open(resp.geturl(), form_data_try)
print resp2.read()
I get a "Do not use the back button on your browser" message from the website in resp2. Is there any way to POST data to the website resp gives me? Thanks in advance!
EDIT: I'll look into Mechanize, so thanks for that pointer. For now, though, is there a way to do it with just Python?
Have you taken a look at mechanize? I believe it has the functionality you need.
You're probably getting to that page by posting something via that Next button. You'll have to take a look at the POST parameters sent when pressing that button and add all of these post parameters to your call.
The website could, though, be set up in such a way that it only accepts a particular POST parameter that ensures you have to go through the website itself (e.g. by hashing a timestamp in a certain way or something like that), but it's not very likely.
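As a rough illustration in the question's own urllib2 style, once you have read those parameters off the developer tools you would merge them into the second request; the parameter names below are made up:
import urllib

# values copied from the Next button's request in the browser's developer tools (hypothetical names)
next_params = {'next': 'Next', 'csrf_token': 'value-from-devtools'}
form_data_try = urllib.urlencode(next_params)
resp2 = opener.open(resp.geturl(), form_data_try)  # opener and resp come from the question's code
print resp2.read()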