Here's the link: http://nikeplus.nike.com/plus/
The email/password fields only show up when I click the "Log in" button. So how can I use Python to log into this website?
I tried twill and listed the forms on the page, but the only form it found was the search bar, so I'm not sure how to proceed.
While not a python solution, I wrote a PHP class that actually lets you get the data from the Nike+ website: https://nikeplusphp.charanj.it
The class works by faking the login on the website and then making requests to the feeds. If you look through the code, you'll find all the URLs needed for the GET requests, and there is a method called _login() that should give you an idea of which parameters are posted.
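The same idea can be sketched in Python with only the standard library: a cookie-aware opener keeps whatever session cookie the login POST sets, so later GET requests to the feeds are sent as the logged-in user. This is a minimal sketch, not a port of the PHP class; the login URL and the `email`/`password` field names are hypothetical placeholders, so check `_login()` or the browser's Network tab for the real ones.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Return an opener that keeps cookies between requests, and its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, email, password):
    """Build (but do not send) a form-encoded POST, like a browser login form.

    Hypothetical: the field names "email" and "password" are placeholders;
    use whatever names _login() or the browser's Network tab shows.
    """
    body = urllib.parse.urlencode({"email": email, "password": password})
    return urllib.request.Request(
        login_url,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Usage: `opener, jar = make_session_opener()`, then `opener.open(build_login_request(LOGIN_URL, email, password))`. Any cookies the server sets are stored in `jar` and sent automatically with later `opener.open(feed_url)` calls (`LOGIN_URL` and `feed_url` are placeholders).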
I have a problem getting JavaScript-generated content into HTML so I can use it in scripts. I have tried several methods, such as PhantomJS and the Python Qt library, and they all load most of the content nicely, but the problem is that the page contains JavaScript buttons.
When I load this page from a script, these buttons don't default to any value, so I get back 0 for all of the SELL/NEUTRAL/BUY values below. Is there a way to set these values when loading the page from a script?
An example page with all the values: https://www.tradingview.com/symbols/NEBLBTC/technicals/
Any help would be greatly appreciated.
If you are trying to do this with Scrapy, or with a cURL-like library such as urllib, I am afraid you can't. Python has other external packages, such as Selenium, that let you interact with the page's JavaScript, but Selenium is slow. If you want something closer to Scrapy, you could check how the site works (as far as I can see, it works through AJAX or WebSockets) and fetch the information you want with urllib, as you would with an API.
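A minimal sketch of the "treat it like an API" approach with urllib, assuming you have already found a JSON endpoint in the browser's Network tab (no real TradingView endpoint is given here, so the URL is up to you). If the site actually pushes the values over WebSockets, a plain HTTP request like this will not be enough.

```python
import json
import urllib.request


def parse_json_bytes(raw, encoding="utf-8"):
    """Decode a raw HTTP response body into Python objects."""
    return json.loads(raw.decode(encoding))


def fetch_json(url, timeout=10):
    """GET an XHR/JSON endpoint found in the browser's Network tab.

    A browser-like User-Agent is sent because some sites reject urllib's default.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_json_bytes(resp.read())
```

Splitting out `parse_json_bytes` keeps the decoding testable without network access.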
Please let me know whether this helps or whether I misunderstood your question.
I used Selenium, which was perfect for this job; it is indeed slow, but it fits my purpose. I also used the Selenium Firefox plugin to generate the Python script, because it was very hard to find exactly where in the page the button I had to press was.
I'm trying to scrape the price of this product
http://www.asos.com/au/fila/fila-vintage-plus-ringer-t-shirt-with-small-logo-in-green/prd/9065343?clr=green&SearchQuery=&cid=7616&gridcolumn=2&gridrow=1&gridsize=4&pge=1&pgesize=72&totalstyles=4699
I'm using the following code, but it returns an empty list:
response.xpath('//*[@id="product-price"]/div/span[2]/text()').extract()
Any help is appreciated, Thanks.
This is because the site is dynamic: when I open the downloaded page with the view(response) command in the Scrapy shell, the price info doesn't show up.
Solutions:
1. Splash
2. Selenium + PhantomJS
It might also help to check this answer: Empty List From Scrapy When Using Xpath to Extract Values
The price is added later by the browser, which renders the page using the JavaScript code found in the HTML. If you disable JavaScript in your browser, you'll notice that the page looks a bit different. Also, take a look at the page source (which is usually unaltered) and you'll see that the tag you're looking for doesn't exist (yet).
Scrapy doesn't execute any javascript code. It receives the plain html and that's what you have to work with.
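A quick way to confirm this from Python is to check whether the element exists at all in the HTML the server returns. This sketch uses the standard library's `html.parser` rather than Scrapy, and the `product-price` id from the question's XPath; the sample HTML in the test is made up.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Records whether any start tag carries the given id attribute."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names
        if ("id", self.target_id) in attrs:
            self.found = True


def has_element_with_id(html, target_id):
    """Return True if the raw (un-rendered) HTML contains an element with this id."""
    finder = IdFinder(target_id)
    finder.feed(html)
    finder.close()
    return finder.found
```

Pass it the body you downloaded (for example `response.text` in Scrapy); if it returns False, the element is created later by JavaScript.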
If you want to extract data from pages that look the same as in the browser, I recommend using a headless browser like Splash (if you're already using Scrapy): https://github.com/scrapinghub/splash
You can programmatically tell it to download your page, render it, and then select the data points you're interested in.
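Splash is driven over HTTP; its `render.html` endpoint takes the target `url` and an optional `wait` (seconds to let JavaScript run) and returns the rendered HTML. A sketch assuming a local Splash instance (for example from the `scrapinghub/splash` Docker image) on its default port 8050. Inside Scrapy you would normally go through the scrapy-splash plugin instead; this plain-urllib version only shows what is being requested.

```python
import urllib.parse
import urllib.request

# Assumption: Splash is running locally on its default port (8050).
SPLASH_RENDER = "http://localhost:8050/render.html"


def splash_render_url(page_url, wait=2.0, splash_render=SPLASH_RENDER):
    """URL asking Splash to load page_url, run its JavaScript, wait `wait`
    seconds, and return the rendered HTML."""
    query = urllib.parse.urlencode({"url": page_url, "wait": wait})
    return f"{splash_render}?{query}"


def fetch_rendered_html(page_url, wait=2.0, timeout=60):
    """Fetch the rendered HTML from Splash (needs a running Splash instance)."""
    with urllib.request.urlopen(splash_render_url(page_url, wait), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

The rendered HTML can then be fed to the same XPath you used before.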
The other way is to look for the request the page makes to the ASOS API for the product data. In your case, for this product:
http://www.asos.com/api/product/catalogue/v2/stockprice?productIds=9065343&currency=AUD&keyStoreDataversion=0ggz8b-4.1&store=AU
I got this url by taking a look at all the XMLHttpRequest (XHR) requests sent in the Network tab found in Developers Tools (on Google Chrome).
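A sketch of calling that endpoint directly with urllib, rebuilding the query string from the parameters in the captured URL above. Assumptions: the endpoint still behaves as it did when captured, `keyStoreDataversion` is a version token that may change over time, and since the JSON structure of the response isn't shown here, the function just returns the decoded JSON for you to inspect.

```python
import json
import urllib.parse
import urllib.request

STOCKPRICE_ENDPOINT = "http://www.asos.com/api/product/catalogue/v2/stockprice"


def stockprice_url(product_id, currency="AUD", store="AU", key_store_dataversion="0ggz8b-4.1"):
    """Rebuild the XHR URL seen in the Network tab for one product.

    key_store_dataversion is copied from the captured request and may go stale.
    """
    params = {
        "productIds": product_id,
        "currency": currency,
        "keyStoreDataversion": key_store_dataversion,
        "store": store,
    }
    return f"{STOCKPRICE_ENDPOINT}?{urllib.parse.urlencode(params)}"


def fetch_stockprice(product_id, **kwargs):
    """GET the endpoint and return the decoded JSON (structure not documented here)."""
    req = urllib.request.Request(
        stockprice_url(product_id, **kwargs), headers={"User-Agent": "Mozilla/5.0"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```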
You can also try to find the JSON embedded in the HTML (using a regular expression) and parse it:
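import json
# Take the <script> that defines "function (view) {" and capture the JSON string passed to view('...')
json_string = response.xpath('//script[contains(., "function (view) {")]/text()').re_first(r"view\('([^']+)")
data = json.loads(json_string)
price = data["price"]["current"]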
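The same extraction can be tried outside Scrapy with the standard `re` and `json` modules. This sketch assumes, as the regex above does, that the product JSON is passed as a single-quoted string to a `view(...)` call and contains no single quotes itself; the sample page in the test is invented.

```python
import json
import re

# Captures the single-quoted argument of the first view('...') call.
VIEW_ARG_RE = re.compile(r"view\('([^']+)")


def extract_current_price(html):
    """Return data["price"]["current"] from the embedded JSON, or None if absent."""
    match = VIEW_ARG_RE.search(html)
    if match is None:
        return None
    data = json.loads(match.group(1))
    return data["price"]["current"]
```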
I want to build a Python script to submit a form on a website, for example a form that automatically publishes an item on a site like eBay.
Is it possible to do this with BeautifulSoup, or is it only for parsing websites?
Is it possible to do it with Selenium, but quickly and without actually opening the browser?
Are there any other ways to do it?
Look at the requests library. Also, check out the Chrome developer tools to watch the requests fly by. There is also a utility called Postman, where you can "design" queries and then generate code in many different flavors (including Python's requests library).
BeautifulSoup is for parsing HTML.
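Parsing is still useful for the submit step, because you need to know which fields the form posts and where. A minimal sketch using the standard library's `html.parser` in place of BeautifulSoup (to stay dependency-free): it collects each form's name, action, method and named `input`/`textarea` fields with their default values; `select` options and checkbox state are deliberately ignored. You would then POST those fields, with your own values filled in, via requests or urllib. The sample HTML in the test is invented.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect each <form> with its name, action, method and named fields.

    Simplification: only <input> and <textarea> fields are recorded; <select>
    options and checkbox/radio checked state are ignored.
    """

    def __init__(self):
        super().__init__()
        self.forms = []
        self._form = None      # form currently being parsed
        self._textarea = None  # name of the textarea whose text we are reading

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._form = {
                "name": attrs.get("name"),
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            }
            self.forms.append(self._form)
        elif self._form is not None and attrs.get("name"):
            if tag == "input":
                self._form["fields"][attrs["name"]] = attrs.get("value") or ""
            elif tag == "textarea":
                self._textarea = attrs["name"]
                self._form["fields"][self._textarea] = ""

    def handle_data(self, data):
        if self._form is not None and self._textarea is not None:
            self._form["fields"][self._textarea] += data

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._textarea = None
        elif tag == "form":
            self._form = None


def list_forms(html):
    """Return a list of dicts describing the forms in `html`."""
    collector = FormCollector()
    collector.feed(html)
    collector.close()
    return collector.forms
```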
You can use Selenium with PhantomJS to do this without a browser window opening. You have to use Selenium's send_keys (and its Keys helpers) to type data into the form before it is submitted. It is also worth noting that this method will not work if there is a CAPTCHA on the form.
The mechanize library can fill and submit forms.
I am wondering how I can fill in an online form automatically. From my research, it turned out that one can use Python (I am more interested in doing it with Python because it is a scripting language I know), but the documentation about it is not very good. This is what I found:
Fill form values in a web page via a Python script (not testing)
Even the "mechanize" package itself does not have enough documentation:
http://wwwsearch.sourceforge.net/mechanize/
More specifically, I want to fill in the textarea on this page (Addresses):
http://stevemorse.org/jcal/latlonbatch.html?direction=forward
So what should I look for? Should I look for the "id" of the textarea? It doesn't look like it has an "id" (or I am very naive!). How can I "select_form"?
Python, web gurus, please help.
Thanks
See if my answer to the other question you linked helps:
https://stackoverflow.com/a/5685569/711017
EDIT:
Here is the explicit code for your example. I don't have mechanize installed right now, so I haven't been able to test the code, and none of the online IDEs I checked have it either. But even if it doesn't work, toy around with it and you should eventually get there:
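from mechanize import Browser
br = Browser()
br.open("http://stevemorse.org/jcal/latlonbatch.html?direction=forward")
# Select the form whose name attribute is "display"
br.select_form(name="display")
# "locations" is the textarea's name; a text control takes a plain string, not a list
br["locations"] = "Hollywood and Vine, Hollywood CA"
# Submit the form and print whatever the server sends back
response = br.submit()
print(response.read())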
Explanation: br emulates a browser that opens your URL and selects the desired form, which is named display on the website. The textarea for entering addresses is named locations; I fill the address into it and then submit the form. Whatever the server returns is available from response.read(), and your latitudes/longitudes should be somewhere in there. Install mechanize and check it out.
I'm trying to scrape some information from a web site, but am having trouble reading the relevant pages. The pages seem to first send a basic setup, then more detailed info. My download attempts only seem to capture the basic setup. I've tried urllib and mechanize so far.
Firefox and Chrome have no trouble displaying the pages, although I can't see the parts I want when I view page source.
A sample url is https://personal.vanguard.com/us/funds/snapshot?FundId=0542&FundIntExt=INT
I'd like, for example, average maturity and average duration from the lower right of the page. The problem isn't extracting that info from the page, it's downloading the page so that I can extract the info.
The page uses JavaScript to load the data. Firefox and Chrome are only working because you have JavaScript enabled - try disabling it and you'll get a mostly empty page.
Python isn't going to be able to do this by itself - your best compromise would be to control a real browser (Internet Explorer is easiest, if you're on Windows) from Python using something like Pamie.
The website loads the data via ajax. Firebug shows the ajax calls. For the given page, the data is loaded from https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundOverviewTabContent.jsf?FundIntExt=INT&FundId=0542
See the corresponding javascript code on the original page:
<script>populator = new Populator({parentId:"profileForm:vanguardFundTabBox:tab0", execOnLoad:true,
populatorUrl:"/us/JSP/Funds/VGITab/VGIFundOverviewTabContent.jsf?FundIntExt=INT&FundId=0542",
inline:false, type:"once"});
</script>
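Rather than hard-coding the AJAX URL, you can pull every `populatorUrl` out of the downloaded page and resolve it against the page's own URL. A sketch with `re` and `urllib.parse.urljoin`, assuming the inline script keeps the `populatorUrl:"..."` form shown above; fetching each resulting URL is then an ordinary request with urllib or mechanize.

```python
import re
from urllib.parse import urljoin

# Matches populatorUrl:"/some/path?query" inside the page's inline scripts.
POPULATOR_RE = re.compile(r'populatorUrl\s*:\s*"([^"]+)"')


def find_populator_urls(html, page_url):
    """Return absolute URLs for every populatorUrl found in the raw HTML."""
    return [urljoin(page_url, path) for path in POPULATOR_RE.findall(html)]
```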
This is because the page performs AJAX calls after it loads. You will need to find those URLs and scrape their content as well.
As RichieHindle mentioned, your best bet on Windows is to use the WebBrowser class to create an instance of an IE rendering engine and then use that to browse the site.
The class gives you full access to the DOM tree, so you can do whatever you want with it.
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser(loband).aspx