I have a certain situation which I'm out of ideas on how exactly to proceed. I have a very repetitive task to do which consists of:
Choose file from list of files
Press submit
Repeat until all files in folder have been submitted/uploaded
Sometimes I have 100's of files at a time, which can be very time consuming. I would like to write a script to automate this routine.
This is the visual of the page in question:
Menu Format
Of course this is represented by the following html code:
<input type="file" class="inputFile" data-name="userNumListFile">
<form class="navbar-form navbar-left" method="post" action="/give/giveItemBatch" enctype="multipart/form-data"><button type="submit" class="btn btn-default">Submit</button></form>
Those are the two entries that represent what I need to send a HTTP request to. I have done something similar in Python where I used the following code to access a authorization only webpage and then use bs4 to gather info needed.
import requests
payload = {'username': 'user',
'password': 'pw',
'rememberMe': 'true'}
with requests.Session() as s:
url = "http://yada.com"
p = s.post(url, data=payload)
soup = BeautifulSoup(p.text, "html.parser")
I was wondering if there is something similar to the above where I can submit a file to be uploaded and then press the submit button.
I would then cycle through all the files on my folder, that's the easy part.
Just use requests.post inside a loop, the name of the remote folder. First read local files and store it inside one array then start a Loop and put inside requests.post with the remote target.
Related
I want to login to the site below using requests module in python.
https://accounts.dmm.com/service/login/password
But I cannot find the "login_id" and "password" fields in the requests' response.
I CAN find them using "Inspect" menu in Chrome.
<input type="text" name="login_id" id="login_id" placeholder="メールアドレス" value="">
and
<input type="password" name="password" id="password" placeholder="パスワード" value="">
I tried to find them in the response from requests, but couldn't.
Here is my code:
import requests
url = 'https://accounts.dmm.com/service/login/password'
session = requests.session()
response = session.get(url)
with open('test_saved_login.html','w',encoding="utf-8")as file:
file.write(response.text) # Neither "login_id" nor "password" field found in the file.
How should I do?
Selenium is an easy solution, but I do not want to use it.
The login form is created with javascript. Try viewing the page in a browser with javascript disabled there will be no form. The people who control that site are trying to prevent people from doing exactly what you're trying to do. In addition to the fact the form elements don't appear (which really doesn't matter with requests,) they are also using a special token that you won't be able to guess which I expect is also in obfuscated javascript. So it is likely impracticable to script a login with requests and unless you have special permission from this company it is highly inadvisable that you continue with doing what you're trying to do.
I am trying to post a request to log in to a website using the Requests module in Python but its not really working. I'm new to this...so I can't figure out if I should make my Username and Password cookies or some type of HTTP authorization thing I found (??).
from pyquery import PyQuery
import requests
url = 'http://www.locationary.com/home/index2.jsp'
So now, I think I'm supposed to use "post" and cookies....
ck = {'inUserName': 'USERNAME/EMAIL', 'inUserPass': 'PASSWORD'}
r = requests.post(url, cookies=ck)
content = r.text
q = PyQuery(content)
title = q("title").text()
print title
I have a feeling that I'm doing the cookies thing wrong...I don't know.
If it doesn't log in correctly, the title of the home page should come out to "Locationary.com" and if it does, it should be "Home Page."
If you could maybe explain a few things about requests and cookies to me and help me out with this, I would greatly appreciate it. :D
Thanks.
...It still didn't really work yet. Okay...so this is what the home page HTML says before you log in:
</td><td><img src="http://www.locationary.com/img/LocationaryImgs/icons/txt_email.gif"> </td>
<td><input class="Data_Entry_Field_Login" type="text" name="inUserName" id="inUserName" size="25"></td>
<td><img src="http://www.locationary.com/img/LocationaryImgs/icons/txt_password.gif"> </td>
<td><input class="Data_Entry_Field_Login" type="password" name="inUserPass" id="inUserPass"></td>
So I think I'm doing it right, but the output is still "Locationary.com"
2nd EDIT:
I want to be able to stay logged in for a long time and whenever I request a page under that domain, I want the content to show up as if I were logged in.
I know you've found another solution, but for those like me who find this question, looking for the same thing, it can be achieved with requests as follows:
Firstly, as Marcus did, check the source of the login form to get three pieces of information - the url that the form posts to, and the name attributes of the username and password fields. In his example, they are inUserName and inUserPass.
Once you've got that, you can use a requests.Session() instance to make a post request to the login url with your login details as a payload. Making requests from a session instance is essentially the same as using requests normally, it simply adds persistence, allowing you to store and use cookies etc.
Assuming your login attempt was successful, you can simply use the session instance to make further requests to the site. The cookie that identifies you will be used to authorise the requests.
Example
import requests
# Fill in your details here to be posted to the login form.
payload = {
'inUserName': 'username',
'inUserPass': 'password'
}
# Use 'with' to ensure the session context is closed after use.
with requests.Session() as s:
p = s.post('LOGIN_URL', data=payload)
# print the html returned or something more intelligent to see if it's a successful login page.
print p.text
# An authorised request.
r = s.get('A protected web page url')
print r.text
# etc...
If the information you want is on the page you are directed to immediately after login...
Lets call your ck variable payload instead, like in the python-requests docs:
payload = {'inUserName': 'USERNAME/EMAIL', 'inUserPass': 'PASSWORD'}
url = 'http://www.locationary.com/home/index2.jsp'
requests.post(url, data=payload)
Otherwise...
See https://stackoverflow.com/a/17633072/111362 below.
Let me try to make it simple, suppose URL of the site is http://example.com/ and let's suppose you need to sign up by filling username and password, so we go to the login page say http://example.com/login.php now and view it's source code and search for the action URL it will be in form tag something like
<form name="loginform" method="post" action="userinfo.php">
now take userinfo.php to make absolute URL which will be 'http://example.com/userinfo.php', now run a simple python script
import requests
url = 'http://example.com/userinfo.php'
values = {'username': 'user',
'password': 'pass'}
r = requests.post(url, data=values)
print r.content
I Hope that this helps someone somewhere someday.
The requests.Session() solution assisted with logging into a form with CSRF Protection (as used in Flask-WTF forms). Check if a csrf_token is required as a hidden field and add it to the payload with the username and password:
import requests
from bs4 import BeautifulSoup
payload = {
'email': 'email#example.com',
'password': 'passw0rd'
}
with requests.Session() as sess:
res = sess.get(server_name + '/signin')
signin = BeautifulSoup(res._content, 'html.parser')
payload['csrf_token'] = signin.find('input', id='csrf_token')['value']
res = sess.post(server_name + '/auth/login', data=payload)
Find out the name of the inputs used on the websites form for usernames <...name=username.../> and passwords <...name=password../> and replace them in the script below. Also replace the URL to point at the desired site to log into.
login.py
#!/usr/bin/env python
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
payload = { 'username': 'user#email.com', 'password': 'blahblahsecretpassw0rd' }
url = 'https://website.com/login.html'
requests.post(url, data=payload, verify=False)
The use of disable_warnings(InsecureRequestWarning) will silence any output from the script when trying to log into sites with unverified SSL certificates.
Extra:
To run this script from the command line on a UNIX based system place it in a directory, i.e. home/scripts and add this directory to your path in ~/.bash_profile or a similar file used by the terminal.
# Custom scripts
export CUSTOM_SCRIPTS=home/scripts
export PATH=$CUSTOM_SCRIPTS:$PATH
Then create a link to this python script inside home/scripts/login.py
ln -s ~/home/scripts/login.py ~/home/scripts/login
Close your terminal, start a new one, run login
Some pages may require more than login/pass. There may even be hidden fields. The most reliable way is to use inspect tool and look at the network tab while logging in, to see what data is being passed on.
im having a trouble understanding the module requests, i understand that http have post,get, put, delete methods, but i think i need to know more about how requests works, i have read the documentation but still i have a lot of questions about how to do something, this is the first time i try to make a script for web without selenium or mechanize
im trying to interact with vubey.yt, but i cant make my vubey url change at what i want(or what i see when i manually use the pag) i can send my data, and it changes the url, but if i copy that url and navigate manually, it does nothing... so i dont understand whats happening, because i dont have any visual clue
here is my code (python 3.5):
def Descarga(youtubeid):
# also i have tried only sending videoURL without quality and sub, but is the same
r = requests.get('https://vubey.yt/', params={'videoURL': youtubeid, 'quality': '320', 'submit': 'Convert+To+MP3'})
print(r.url, r.status_code)
Descarga("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
if someone could link me a tutorial for really understand how to use this module or tell me what im doing wrong or misunderstanding about this module i ll thank so much
See with me the site code:
<form class="w-clearfix" name="wf-form-signup-form" data-name="conversionForm" form action="/" method="post" id="conversionForm">
It's a form. The form is using the method 'post' in the same page.
<input class="w-input field" id="videoURL" type="text" placeholder="Video URL" name="videoURL" data-name="videoURL" required="required">
The first data "videoURL".
<select class="w-select" id="quality" name="quality" data-name="quality" required="required">
The second data "quality".
<input class="w-button button" type="submit" name="submit" value="Convert To MP3">
</form>
The submit button is not important. Ignore it.
Now, lets pythonify.
import requests
video_url = 'https://www.youtube.com/watch?v=C0DPdy98e4c'
quality = '320'
post_data={ 'videoURL': video_url, 'quality': quality }
response = requests.post('https://vubey.yt/', data=post_data)
print(response.url, response.status_code)
Now you can parse the response.content and search for "Please wait" until the conversion is completed.
I am trying to web-scrape some elements and their values off a page with Python; However, to get more elements, I need to simulate a click on the next button. There is a post back tied to these buttons, so I am trying to call it. Unfortunately, Python is only printing the same values over and over again [meaning the post back for the next button isn't being called]. I am using requests to do my POST/GET.
import re
import time
import requests
TARGET_GROUP_ID = 778092
SESSION = requests.Session()
REQUEST_HEADERS = {"Accept-Encoding": "gzip,deflate"}
GROUP_URL = "http://roblox.com/groups/group.aspx?gid=%d"%(TARGET_GROUP_ID)
POST_BUTTON_HTML = 'pagerbtns next'
EVENTVALIDATION_REGEX = re.compile(r'id="__EVENTVALIDATION" value="(.+)"').search
VIEWSTATE_REGEX = re.compile(r'id="__VIEWSTATE" value="(.+)"').search
VIEWSTATEGENERATOR_REGEX = re.compile(r'id="__VIEWSTATEGENERATOR" value="(.+)"').search
TITLE_REGEX = re.compile(r'<a id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_ctrl\d+_hlAvatar".*?title="(\w+)".*?ID=(\d+)"')
page = SESSION.get(GROUP_URL, headers = REQUEST_HEADERS).text
while 1:
if POST_BUTTON_HTML in page:
for (ids,names) in re.findall(TITLE_REGEX, page):
print ids,names
postData = {
"__EVENTVALIDATION": EVENTVALIDATION_REGEX(page).group(1),
"__VIEWSTATE": VIEWSTATE_REGEX(page).group(1),
"__VIEWSTATEGENERATOR": VIEWSTATEGENERATOR_REGEX(page).group(1),
"__ASYNCPOST": True,
"ct1000_cphRoblox_rbxGroupRoleSetMembersPane_currentRoleSetID": "4725789",
"ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl02$ctl00": "",
"ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$HiddenInputButton": "",
"ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$PageTextBox": "3"
}
page=SESSION.post(GROUP_URL, data = postData, stream = True).text
time.sleep(2)
How can I properly call the post back in ASP.NET from Python to fix this issue? As stated before, it's only printing out the same values each time.
This is the HTML Element of the button
<a class="pagerbtns next" href="javascript:__doPostBack('ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl02$ctl00','')"> </a>
And this is the div it is in:
<div id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_MembersPagerPanel" onkeypress="javascript:return WebForm_FireDefaultButton(event, 'ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_HiddenInputButton')">
<div id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_Div1" class="paging_wrapper">
Page <input name="ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$PageTextBox" type="text" value="1" id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_PageTextBox" class="paging_input"> of
<div class="paging_pagenums_container">125</div>
<input type="submit" name="ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$HiddenInputButton" value="" onclick="loading('members');" id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_HiddenInputButton" class="pagerbtns translate" style="display:none;">
</div>
</div>
I was thinking of using a JS library and executing the JS __postback method, however, I would like to first see if this can be achieved in pure Python.
Yes it should be achievable you just have to submit correct values on correct fields. But i assume web page you are trying parse uses asp.net web forms so it should be really time consuming to find values and such. I suggest you to look into selenium with that you can easily call click and events on a webpage without writing so much code.
driver = webdriver.Firefox()
driver.get("http://site you are trying to parse")
driver.find_element_by_id("button").click()
//then get the data you want
I'm using Python 3.3 and the Requests library to do a basic POST request.
I want to simulate what happens if you manually enter information into the browser from the webpage:
https://www.dspayments.com/FAIRFAX. For example, at that url, enter "x" for the license plate and Virginia as the state. Then the url changes to: https://www.dspayments.com/FAIRFAX/Home/PayOption, and it displays the desired information (I care about the source code of this second webpage).
I looked through the source code of the above two url's. Doing "inspect element" on the text boxes of the first url I found some things that need to be included in the post request: {'Plate':"x", 'PlateStateProv':"VA", "submit":"Search"}.
Then the second website (ending in /PayOption), had the raw html:
<form action="/FAIRFAX/Home/PayOption" method="post"><input name="__RequestVerificationToken" type="hidden" value="6OBKbiFcSa6tCqU8k75uf00m_byjxANUbacPXgK2evexESNDz_1cwkUpVVePA2czBLYgKvdEK-Oqk4WuyREi9advmDAEkcC2JvfG2VaVBWkvF3O48k74RXqx7IzwWqSB5PzIJ83P7C5EpTE1CwuWM9MGR2mTVMWyFfpzLnDfFpM1" /><div class="validation-summary-valid" data-valmsg-summary="true">
I then used the name:value pairs from the above html as keys and values in my payload dictionary of the post request. I think the problem is that in the second url, there is the "__RequestVerificationToken" which seems to have a randomly generated value every time.
How can I properly POST to this website? A "correct" answer would be one that produces the same source code on the website ending in "/PayOption" as if you manually enter "x" as the plate number and Virginia as the state and click submit on the first url.
My code is:
import requests
url1 = r'https://www.dspayments.com/FAIRFAX'
url2 = r'https://www.dspayments.com/FAIRFAX/Home/PayOption'
s = requests.Session()
#GET request
r = s.get(url1)
text1 = r.text
startstr = '<input name="__RequestVerificationToken" type="hidden" value="'
start_ind = text1.find(startstr)+len(startstr)
end_ind = text1.find('"',start_ind)
auth_string = text1[start_ind:end_ind]
#POST request
payload = {'Plate':'x', 'PlateStateProv':'VA',"submit":"Search",
"__RequestVerificationToken":auth_string,"validation-summary-valid":"true"}
post = s.post(url2, headers=user_agent, data=payload)
source_code = post.text
Thanks, -K.
You should only need the data from the first page, and as you say, the __RequestVerificationToken changes with each request.
You'll have to do something like:
GET request to https://www.dspayments.com/FAIRFAX
harvest __RequestVerificationToken value (Requests Session will take care of any associated cookies)
POST using the data you scraped from the GET request
extract whatever you need from the 2nd page
So, just focus on creating a form that's exactly like the one in the first page. Have a stab at it and if you're still struggling I can help dig into the particulars.