I'm having trouble understanding the requests module. I understand that HTTP has POST, GET, PUT and DELETE methods, but I think I need to know more about how requests works. I have read the documentation and still have a lot of questions about how to do things; this is the first time I've tried to write a script for the web without selenium or mechanize.
I'm trying to interact with vubey.yt, but I can't make the vubey URL change to what I want (or to what I see when I use the page manually). I can send my data and the URL changes, but if I copy that URL and navigate to it manually, nothing happens. I don't understand what's going on, because I have no visual clue.
Here is my code (Python 3.5):
import requests

def Descarga(youtubeid):
    # I have also tried sending only videoURL, without quality and submit, but the result is the same
    r = requests.get('https://vubey.yt/', params={'videoURL': youtubeid, 'quality': '320', 'submit': 'Convert+To+MP3'})
    print(r.url, r.status_code)

Descarga("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
If someone could link me to a tutorial that really explains how to use this module, or tell me what I'm doing wrong or misunderstanding about it, I'd be very grateful.
Let's look at the site's code together:
<form class="w-clearfix" name="wf-form-signup-form" data-name="conversionForm" form action="/" method="post" id="conversionForm">
It's a form, and it submits to the same page (action="/") using the 'post' method.
<input class="w-input field" id="videoURL" type="text" placeholder="Video URL" name="videoURL" data-name="videoURL" required="required">
The first field is "videoURL".
<select class="w-select" id="quality" name="quality" data-name="quality" required="required">
The second field is "quality".
<input class="w-button button" type="submit" name="submit" value="Convert To MP3">
</form>
The submit button is not important. Ignore it.
Now, let's pythonify.
import requests
video_url = 'https://www.youtube.com/watch?v=C0DPdy98e4c'
quality = '320'
post_data={ 'videoURL': video_url, 'quality': quality }
response = requests.post('https://vubey.yt/', data=post_data)
print(response.url, response.status_code)
Now you can parse response.content and keep checking it for "Please wait" until the conversion is completed.
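To make the difference between the two attempts visible, here is a minimal sketch that only prepares both requests and sends nothing. It assumes the same URL and field values as above: with params= the fields end up in the query string of a GET, while with data= they travel in the body of a POST, which is what the form's method="post" asks for.

```python
import requests

FORM_URL = "https://vubey.yt/"
fields = {
    "videoURL": "https://www.youtube.com/watch?v=C0DPdy98e4c",
    "quality": "320",
}

# Build both requests without sending anything over the network.
as_get = requests.Request("GET", FORM_URL, params=fields).prepare()
as_post = requests.Request("POST", FORM_URL, data=fields).prepare()

# GET: the fields are appended to the URL and there is no body.
print(as_get.url)
print(as_get.body)

# POST: the URL stays untouched and the fields are form-encoded in the body.
print(as_post.url)
print(as_post.body)
print(as_post.headers["Content-Type"])
```

That would also explain why pasting the URL printed by the GET version into a browser does nothing: the page only reacts to the POSTed form body.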
Related
I want to log in to the site below using the requests module in Python.
https://accounts.dmm.com/service/login/password
But I cannot find the "login_id" and "password" fields in the response from requests.
I CAN find them using the "Inspect" menu in Chrome.
<input type="text" name="login_id" id="login_id" placeholder="メールアドレス" value="">
and
<input type="password" name="password" id="password" placeholder="パスワード" value="">
I tried to find them in the response from requests, but couldn't.
Here is my code:
import requests
url = 'https://accounts.dmm.com/service/login/password'
session = requests.session()
response = session.get(url)
with open('test_saved_login.html', 'w', encoding="utf-8") as file:
    file.write(response.text)  # Neither the "login_id" nor the "password" field is found in the file.
What should I do?
Selenium is an easy solution, but I do not want to use it.
The login form is created with JavaScript. Try viewing the page in a browser with JavaScript disabled: there will be no form. The people who control that site are trying to prevent exactly what you're trying to do. Besides the form elements not appearing in the HTML (which doesn't really matter with requests), they also use a special token that you won't be able to guess, and which I expect is also generated in obfuscated JavaScript. So it is likely impractical to script a login with requests, and unless you have special permission from this company, it is highly inadvisable to continue with what you're trying to do.
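If you want to confirm that a form is built by JavaScript rather than sent by the server, one rough check is to look for the input names in the raw HTML. The sketch below uses only the standard library; missing_fields and the two sample strings are made up for illustration, and in practice you would pass in response.text.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collects the name attribute of every <input> tag it sees."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.names.add(name)


def missing_fields(html_text, wanted):
    """Return the wanted field names that do not appear as <input name=...>."""
    collector = InputNameCollector()
    collector.feed(html_text)
    return sorted(set(wanted) - collector.names)


# Hypothetical samples: a JavaScript-only shell versus a fully rendered form.
static_html = '<div id="app"></div><script src="login.js"></script>'
rendered_html = (
    '<form><input type="text" name="login_id">'
    '<input type="password" name="password"></form>'
)

print(missing_fields(static_html, ["login_id", "password"]))
print(missing_fields(rendered_html, ["login_id", "password"]))
```

A non-empty result for the HTML that requests received, next to an empty one for what the browser's inspector shows, suggests the fields are added client-side.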
I have a situation where I'm out of ideas on how exactly to proceed. I have a very repetitive task, which consists of:
Choose file from list of files
Press submit
Repeat until all files in folder have been submitted/uploaded
Sometimes I have hundreds of files at a time, which makes this very time-consuming. I would like to write a script to automate this routine.
This is what the page in question looks like:
[image: Menu Format]
Of course this is represented by the following html code:
<input type="file" class="inputFile" data-name="userNumListFile">
<form class="navbar-form navbar-left" method="post" action="/give/giveItemBatch" enctype="multipart/form-data"><button type="submit" class="btn btn-default">Submit</button></form>
Those are the two elements I need to target with an HTTP request. I have done something similar in Python, where I used the following code to access an authorization-only web page and then used bs4 to gather the info I needed.
import requests
from bs4 import BeautifulSoup

payload = {'username': 'user',
           'password': 'pw',
           'rememberMe': 'true'}
with requests.Session() as s:
    url = "http://yada.com"
    p = s.post(url, data=payload)
    soup = BeautifulSoup(p.text, "html.parser")
I was wondering if there is something similar to the above where I can submit a file to be uploaded and then press the submit button.
I would then cycle through all the files in my folder; that's the easy part.
Just use requests.post inside a loop, pointed at the remote target. First collect the local files in a list, then loop over that list and call requests.post on each file with the remote target URL.
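A minimal sketch of that loop, under a few assumptions that are not in the question: the host in UPLOAD_URL is a placeholder, the field name "userNumListFile" is guessed from the data-name attribute (the real name may differ, so check the network tab), and session is expected to be a requests.Session that is already logged in. The files= argument makes requests build the multipart/form-data body that the form's enctype asks for.

```python
from pathlib import Path

# Assumptions: placeholder host, and a field name guessed from data-name.
UPLOAD_URL = "http://example.com/give/giveItemBatch"
FIELD_NAME = "userNumListFile"


def upload_folder(session, folder, url=UPLOAD_URL, field_name=FIELD_NAME):
    """POST every regular file in `folder`, one request per file.

    `session` is any object with a requests-style post() method,
    typically a requests.Session. Returns {file name: HTTP status code}.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip sub-directories
        with path.open("rb") as handle:
            # files= triggers a multipart/form-data upload in requests.
            response = session.post(url, files={field_name: (path.name, handle)})
        results[path.name] = response.status_code
    return results
```

Usage would look something like `with requests.Session() as s: print(upload_folder(s, "to_upload"))`, after importing requests and logging in with the same session. A 200 status only means the server answered, so it is worth checking one upload by hand first.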
I am struggling to post to a JavaScript/React form with Python's Requests. I understand the regular way would be something like this:
import requests

payload = {"user": "me", "password": "12345"}
s = requests.Session()
html = s.post(url, data=payload)
The url part is the problem, since I cannot find it in the source. The source of the form looks like this:
<form class="Login-form" method="POST" data-reactid="19"> … </form>
I expected to find a URL in the action attribute but, well, it isn't there. I also tried to find a URL in the JavaScript but, to be honest, I can't read it very well.
So my question would be: how, if at all, can I make a POST with Requests to a React form?
Edit:
To make the question more concise and reflect the accepted answer:
If an HTML form driven by JavaScript has no obvious URL in the source for where it posts to, how can I find out the URL?
A form without an action= attribute POSTs to the current URL. But since you're dealing with React, the submission is probably handled by a call to an API endpoint behind the scenes. Watch the network tab under developer tools in your browser of choice to see how it's actually implemented and what URL the React application talks to.
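The first rule (no action means the current URL) can be sketched with the standard library. resolve_form_target is a hypothetical helper and the example.com URLs are placeholders. It only covers plain HTML forms; for a React app the network tab is still the reliable source.

```python
from urllib.parse import urljoin


def resolve_form_target(page_url, action=None):
    """Return the URL a plain HTML form would submit to.

    A missing or empty action means "submit to the page itself";
    otherwise the action is resolved relative to the page URL.
    """
    if not action:
        return page_url
    return urljoin(page_url, action)


print(resolve_form_target("https://example.com/login"))
print(resolve_form_target("https://example.com/login", "/"))
print(resolve_form_target("https://example.com/admin/items", "/give/giveItemBatch"))
```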
I am trying to web-scrape some elements and their values off a page with Python; however, to get more elements, I need to simulate a click on the next button. There is a postback tied to these buttons, so I am trying to call it. Unfortunately, Python only prints the same values over and over again (meaning the postback for the next button isn't being called). I am using requests to do my POST/GET.
import re
import time
import requests

TARGET_GROUP_ID = 778092
SESSION = requests.Session()
REQUEST_HEADERS = {"Accept-Encoding": "gzip,deflate"}
GROUP_URL = "http://roblox.com/groups/group.aspx?gid=%d" % (TARGET_GROUP_ID)
POST_BUTTON_HTML = 'pagerbtns next'
EVENTVALIDATION_REGEX = re.compile(r'id="__EVENTVALIDATION" value="(.+)"').search
VIEWSTATE_REGEX = re.compile(r'id="__VIEWSTATE" value="(.+)"').search
VIEWSTATEGENERATOR_REGEX = re.compile(r'id="__VIEWSTATEGENERATOR" value="(.+)"').search
TITLE_REGEX = re.compile(r'<a id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_ctrl\d+_hlAvatar".*?title="(\w+)".*?ID=(\d+)"')

page = SESSION.get(GROUP_URL, headers=REQUEST_HEADERS).text
while 1:
    if POST_BUTTON_HTML in page:
        for (ids, names) in re.findall(TITLE_REGEX, page):
            print(ids, names)
        postData = {
            "__EVENTVALIDATION": EVENTVALIDATION_REGEX(page).group(1),
            "__VIEWSTATE": VIEWSTATE_REGEX(page).group(1),
            "__VIEWSTATEGENERATOR": VIEWSTATEGENERATOR_REGEX(page).group(1),
            "__ASYNCPOST": True,
            "ct1000_cphRoblox_rbxGroupRoleSetMembersPane_currentRoleSetID": "4725789",
            "ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl02$ctl00": "",
            "ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$HiddenInputButton": "",
            "ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$PageTextBox": "3"
        }
        page = SESSION.post(GROUP_URL, data=postData, stream=True).text
        time.sleep(2)
How can I properly call the post back in ASP.NET from Python to fix this issue? As stated before, it's only printing out the same values each time.
This is the HTML Element of the button
<a class="pagerbtns next" href="javascript:__doPostBack('ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl02$ctl00','')"> </a>
And this is the div it is in:
<div id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_MembersPagerPanel" onkeypress="javascript:return WebForm_FireDefaultButton(event, 'ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_HiddenInputButton')">
<div id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_Div1" class="paging_wrapper">
Page <input name="ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$PageTextBox" type="text" value="1" id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_PageTextBox" class="paging_input"> of
<div class="paging_pagenums_container">125</div>
<input type="submit" name="ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$HiddenInputButton" value="" onclick="loading('members');" id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_HiddenInputButton" class="pagerbtns translate" style="display:none;">
</div>
</div>
I was thinking of using a JS library and executing the JS __doPostBack function; however, I would first like to see if this can be achieved in pure Python.
Yes, it should be achievable; you just have to submit the correct values in the correct fields. But I assume the page you are trying to parse uses ASP.NET Web Forms, so finding those values could be really time-consuming. I suggest you look into selenium; with it you can easily trigger clicks and events on a web page without writing much code.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("http://site you are trying to parse")
driver.find_element(By.ID, "button").click()
# then get the data you want
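For the pure-Python route, here is a rough sketch of "correct values in correct fields". In ASP.NET Web Forms, __doPostBack(target, argument) normally copies its two arguments into the hidden fields __EVENTTARGET and __EVENTARGUMENT and then submits the form. The payload in the question sets neither, which might be why the server keeps returning the same page. build_postback and the sample HTML are made up for illustration; this is a sketch of the idea, not a tested fix for that site.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        input_type = (attrs.get("type") or "").lower()
        if tag == "input" and input_type == "hidden" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_postback(page_html, event_target, event_argument=""):
    """Build the form data that a __doPostBack(target, argument) call would submit."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    payload = dict(parser.fields)  # __VIEWSTATE, __EVENTVALIDATION, ...
    payload["__EVENTTARGET"] = event_target
    payload["__EVENTARGUMENT"] = event_argument
    return payload


# Hypothetical page fragment, only to show the shape of the result.
sample_html = (
    '<form><input type="hidden" name="__VIEWSTATE" value="abc" />'
    '<input type="hidden" name="__EVENTVALIDATION" value="xyz">'
    '<input type="text" name="page" value="1"></form>'
)
next_button = "ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl02$ctl00"
print(build_postback(sample_html, next_button))
```

The resulting dict would then go into session.post(url, data=payload), rebuilt from each new page because the hidden values change between responses.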
I have a critical issue. I would like to integrate my application with another, much older application. This service is simply a web form, probably behind a framework (I think ASP Classic, maybe). I have an action URL, and I have the HTML code for replicating this service.
This is a piece of the old service (the HTML page):
<FORM method="POST"
url="https://host/path1/path2/AdapterHTTP?action_name=myactionWebAction&NEW_SESSION=true"
enctype="multipart/form-data">
<INPUT type="text" name="AAAWebView-FormAAA-field1" />
<INPUT type="hidden" name="AAAWebView-FormAAA-field2" value="" />
<INPUT type="submit" name="NAV__BUTTON__press__AAAWebView-FormAAA-enter" value="enter" />
</FORM>
My application should simulate the form submission of this old application from code-behind with Python. So far, I haven't had much luck.
This is what I do for now:
import requests

payload = {'AAAWebView-FormAAA-field1': field1Value,
           'AAAWebView-FormAAA-field2': field2Value,
           'NAV__BUTTON__press__AAAWebView-FormAAA-enter': "enter"}
url = "https://host/path1/path2/AdapterHTTP?action_name=myactionWebAction&NEW_SESSION=true"
headers = {'content-type': 'multipart/form-data'}
r = requests.post(url, data=payload, headers=headers)
print(r.status_code)
I receive a 200 HTTP response code. When I click the submit button on the HTML page, the action saves the values, but my code does not have the same effect. How do I fix this problem?
The owner of an old application sent me this Java exception log. Any ideas?
org.apache.commons.fileupload.FileUploadException: the request was rejected because no multipart boundary was found
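Try sending the form fields through the files argument of requests.post and dropping the hand-written content-type header, so that requests constructs the multipart body and its boundary itself. An empty files dictionary is not enough, I think, because requests only switches to multipart when files is non-empty.
r = requests.post(url, files={name: (None, value) for name, value in payload.items()})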
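One way to check what requests would actually send, without contacting the server, is to prepare the request and look at its Content-Type header. As far as I can tell, an empty files dict combined with a hand-set header still leaves the boundary out, while passing the fields as (None, value) tuples with no manual header produces one. The field values below are placeholders.

```python
import requests

URL = "https://host/path1/path2/AdapterHTTP?action_name=myactionWebAction&NEW_SESSION=true"
payload = {
    "AAAWebView-FormAAA-field1": "some value",  # placeholder value
    "NAV__BUTTON__press__AAAWebView-FormAAA-enter": "enter",
}

# Variant 1: manual header plus an empty files dict. Nothing is sent here.
manual = requests.Request(
    "POST", URL, data=payload,
    headers={"content-type": "multipart/form-data"}, files={},
).prepare()

# Variant 2: no manual header; each field is a (filename, value) tuple with
# filename=None, so it is encoded as a plain multipart form field.
multipart_fields = {name: (None, value) for name, value in payload.items()}
automatic = requests.Request("POST", URL, files=multipart_fields).prepare()

print(manual.headers["Content-Type"])     # the hand-written header, unchanged
print(automatic.headers["Content-Type"])  # generated by requests, with boundary
```

If the second header shows a boundary, `requests.post(URL, files=multipart_fields)` should give the old application a body it can parse.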