I'm attempting a hack because I can't find a better way (a native, less hacky solution would be much appreciated too).
I'm using neo4j to analyse some graphs. I generate Cypher queries and can use py2neo or similar to run them and get back a result. Sometimes I need to show the result; I've used vis.js and toyed with others, but I find the best solution is the one built into the browser on port 7474. I can't find a standalone version of that JS package that I could use to display plots in a notebook or website (an answer for this would be great), so I'm using Selenium to send the generated queries (the hack).
The submission field for queries is a div with a ReactCodeMirror class and a textarea. I can't work out how to submit a query to it, as it goes out of focus upon calling send_keys.
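For reference, the kind of workaround I'm attempting looks roughly like this (a sketch only; the '.CodeMirror' selector and the Ctrl+Enter shortcut are guesses based on how CodeMirror editors usually work, not confirmed Neo4j behaviour):

# Sketch: bypass send_keys and set the CodeMirror editor's value via JS.
# The selector and keyboard shortcut below are assumptions.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Firefox()
driver.get("http://localhost:7474/browser/")

query = "MATCH (n) RETURN n LIMIT 25"

# CodeMirror hides the real <textarea>, which is why send_keys loses
# focus; instead, grab the editor instance and set its value directly.
driver.execute_script(
    "var cm = document.querySelector('.CodeMirror').CodeMirror;"
    "cm.setValue(arguments[0]);",
    query,
)

# Run the query (the editor appears to submit on Ctrl/Cmd+Enter).
ActionChains(driver).key_down(Keys.CONTROL).send_keys(Keys.ENTER).key_up(Keys.CONTROL).perform()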
Similar question (but no answer): How can I execute a cypher query (from java) to neo4j's browser?
You may want to take a look at the Javascript version of the Neo4j Movies Example Application. It shows how to write a web app that dynamically displays query results.
This article on the example should also be informative.
I looked into this question, which asked how a bot could input text on a webpage. One of the answers recommended using Selenium, but a comment there suggested it was an inefficient way of accomplishing that task.
Say I wanted to create a bot that looks up a set of words on Wikipedia (using the search bar on Wikipedia) and gives me the first 20 words in each article. Would Selenium be the best tool for this?
(Note that I'm aware I could do this manually by just looking up https://en.wikipedia.org/wiki/<word I want> for each item in the list, but I'm specifically looking for how a bot would interact with search bars.)
"Efficient" and "bot" don't seem to intersect for what you're describing: why bother using a framework that renders the entire view as a human would see it when you aren't using any of that visual content? The most efficient way for a Python bot to search Wikipedia is to use the API and get the results back as JSON for the bot to parse.
Searching Wikipedia using API
There is nothing magical about a search bar: when input is entered there, the browser is redirected to the other URL location, as you stated (https://en.wikipedia.org/wiki/<word you want>). I believe the inefficiency being referenced is exactly the fact that you can search without the search bar. Rendering the page, finding the bar, typing something in, and then submitting takes hundreds of milliseconds; querying the API directly can be done in milliseconds, which is much more efficient.
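To illustrate, here is roughly what the API route looks like with requests. The endpoints are part of the public MediaWiki API; the particular parameters below are just one way to reproduce your "first 20 words" example:

# Sketch: search for each word via the MediaWiki API and print the
# first 20 words of the matching article, no browser involved.
import requests

API = "https://en.wikipedia.org/w/api.php"

def first_20_words(word):
    # 1) search for the word to get the best-matching article title
    search = requests.get(API, params={
        "action": "query", "list": "search",
        "srsearch": word, "format": "json",
    }).json()
    title = search["query"]["search"][0]["title"]
    # 2) fetch the plain-text lead section of that article
    page = requests.get(API, params={
        "action": "query", "prop": "extracts",
        "explaintext": 1, "exintro": 1,
        "titles": title, "format": "json",
    }).json()
    extract = next(iter(page["query"]["pages"].values())).get("extract", "")
    return " ".join(extract.split()[:20])

for word in ["Python", "Selenium", "Wikipedia"]:
    print(word, "->", first_20_words(word))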
I am trying to scrape a web site using Python and Beautiful Soup. I have found that on some sites, image links that are visible in the browser cannot be seen in the source code; using Chrome Inspect or Fiddler, however, we can see the corresponding code.
What I see in the source code is:
<div id="cntnt"></div>
But in Chrome Inspect, I can see a whole bunch of HTML/CSS code generated within this div. Is there a way to load the generated content within Python as well? I am using the regular urllib in Python, and I am able to get the source, but without the generated part.
I am not a web developer, hence I am not able to express the behaviour in better terms. Please feel free to ask for clarification if my question seems vague!
You need a JavaScript engine to parse and run the JavaScript code inside the page.
There are a bunch of headless browsers that can help you:
http://code.google.com/p/spynner/
http://phantomjs.org/
http://zombie.labnotes.org/
http://github.com/ryanpetrello/python-zombie
http://jeanphix.me/Ghost.py/
http://webscraping.com/blog/Scraping-JavaScript-webpages-with-webkit/
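As a rough sketch of how any of these get used in practice (here with PhantomJS driven through Selenium; any headless browser follows the same pattern):

# Sketch: render the page in a headless browser so its JavaScript runs,
# then hand the generated DOM to BeautifulSoup as usual.
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.PhantomJS()   # or another headless browser driver
driver.get("http://example.com/page-with-generated-content")

# page_source now contains the DOM *after* scripts have run,
# including whatever was injected into <div id="cntnt">.
soup = BeautifulSoup(driver.page_source, "html.parser")
print(soup.find("div", id="cntnt"))

driver.quit()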
The content of the website may be generated after load via JavaScript. In order to obtain the generated content via Python, refer to this answer.
A regular scraper gets just the HTML document. To get any content generated by JavaScript logic, you instead need a headless browser that also generates the DOM and loads and runs the scripts like a regular browser would. The Wikipedia article and some other pages on the net have lists of those and their capabilities.
Keep in mind when choosing that some previously major products among them are now abandoned.
TRY THIS FIRST!
Perhaps the data technically is in the JavaScript itself and all this JavaScript-engine business is needed. (Some GREAT links here!)
But from experience, my first guess is that the JS is pulling the data in via an AJAX request. If you can get your program to simulate that, you'll probably get everything you need handed right to you without any tedious parsing/executing/scraping involved!
It will take a little detective work, though. I suggest turning on your network traffic logger (such as the "Web Developer Toolbar" in Firefox) and then visiting the site. Focus your attention on any and all XmlHTTPRequests. The data you need should be found somewhere in one of those responses, probably in the middle of some JSON text.
Now, see if you can re-create that request and get the data directly. (NOTE: You may have to set the User-Agent of your request so the server thinks you're a "real" web browser.)
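A hedged sketch of what recreating such a request might look like (the URL, parameters, and response shape are placeholders you would copy from the logged XHR):

# Sketch: replay the XmlHTTPRequest the page's JavaScript makes.
# Everything below is a placeholder to be filled in from what your
# network logger actually shows.
import requests

headers = {
    # Some servers only answer requests that look like a "real" browser.
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    # Many AJAX endpoints also check for this header.
    "X-Requested-With": "XMLHttpRequest",
}

resp = requests.get(
    "http://example.com/ajax/endpoint",   # copied from the XHR log
    params={"id": "123"},                 # ditto
    headers=headers,
)
print(resp.json())   # the payload is often JSON, ready to use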
I was hoping someone might be able to provide some insight into the feasibility of using the scrapy Python framework for creating a real-time wrapper.
To clarify my definition of the term "wrapper" in this context, let me describe my situation... I was hoping to use scrapy to essentially script a solution that allows a user to execute a search query on a website, which in turn calls a scrapy spider in real time, within which that spider is told to:
1) log in to a third-party website
2) execute the user's search query
3) retrieve only the actual HTML results for the returned query (by extracting the result-set container via its unique class and/or XPath)
4) modify the extracted HTML results (by reforming the HTML and/or injecting a new header/footer or CSS elements)
5) and finally return the modified HTML results in real time, so the HTML can be injected directly into the original domain, all transparently to the user.
I should point out that I am familiar with writing scrapy spiders for large-scale crawling in bulk, but I am less familiar with the prospect or feasibility of using it to construct a real-time kind of "wrapper".
If anyone has any insight, advice or examples which illustrate a similar situation I would greatly appreciate it. CH
You may try the HTQL browser interface for Python at http://htql.net/. An example of a Bing search in real time is:
import htql

a = htql.Browser()                       # create a browser instance
b = a.goUrl("http://www.bing.com/")      # load the Bing home page
c = a.goForm("<form>1", {"q": "test"})   # fill and submit the first form

# iterate over result links whose text contains "test"
for d in htql.HTQL(c[0], "<a (tx like '%test%')>"):
    print(d)

# click the first matching link that is not a search link
e = a.click("<a (tx like '%test%' and not (href like '/search%'))>1")
It can be coupled with the IRobotSoft scraper to do everything visually, by changing the browser to:
a = htql.Browser(2)
More details can be found in the manual at http://htql.net/htql-python-manual.pdf, or ask at http://irobotsoft.org/bb/.
OK, so I have used this code to set up a simple program. Basically it just sets points on a map in Python. Very basic. The problem now is that I have a condition which, when it happens, refreshes the page with a simple QWebView.setHtml(). What I want to do now is find a way to get the current zoom and center information of the map and save it. Is there any way to go into the JavaScript and grab that information, store it, and then write it into the HTML the same way this code already does, before I do the refresh?
Sorry if this is confusing or broad; I just can't think of a way to get the information from the map using Python.
There are a number of ways to interact with client-side code from the server (Python). You could set a cookie on the client, send an AJAX request to the server from Javascript, submit a form, or go a more exotic route like WebSockets. Without more information, I'm not sure we can tell you which is the best.
What web framework are you using?
EDIT:
Oh, I see- you're using a scriptable web view... maybe something like zoom = view.evaluateJavaScript('map.getZoom();')? From what I can see of the library, the difficult part might be getting a reference to the map var in JS.
EDIT:
I don't think this is possible without modifying or extending pymaps, since it scopes the GMap locally in JS and doesn't expose it anywhere. I've done just that in a gist. You can then access the zoom with something similar to the above- maybe zoom = view.evaluateJavaScript("PyMaps[0].gmap.getZoom();").
EDIT:
In case this wasn't clear: the gist I included requires that you use MyPyMap instead of PyMap.
From another StackOverflow question, I realized you can't call evaluateJavaScript straight on a view. The subsequent code would look more like this:
doc = view.page().mainFrame().documentElement()   # root element of the page
zoom_level = doc.evaluateJavaScript("PyMaps[0].gmap.getZoom();")
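Building on that, here is a sketch of saving the zoom and center and feeding them back in before the refresh. The getCenter()/lat()/lng() calls follow the Google Maps JS API, but the pymaps attribute names (maps[0].zoom, .center, showhtml()) are assumptions about the library, so check them against your copy:

# Sketch: capture the map state via JS, then reuse it when the HTML is
# regenerated. Assumes the patched MyPyMap exposes PyMaps[0].gmap.
doc = view.page().mainFrame().documentElement()

zoom = doc.evaluateJavaScript("PyMaps[0].gmap.getZoom();")
lat = doc.evaluateJavaScript("PyMaps[0].gmap.getCenter().lat();")
lng = doc.evaluateJavaScript("PyMaps[0].gmap.getCenter().lng();")

# Write the saved state back into the map definition before the
# refresh so setHtml() rebuilds the page at the same view.
gmap.maps[0].zoom = zoom             # hypothetical pymaps attributes
gmap.maps[0].center = (lat, lng)
view.setHtml(gmap.showhtml())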
I want to create a special wiki page on my local Redmine server. It should contain an inventory of some executables from my server. My goal is a script which scans certain folders on my server for these files and put them (with some additional information) in a nice Redmine wiki page.
My first thought was to traverse my server's file system with a simple batch file and to create a SQL expression to put the results directly into the underlying MySQL database (which contains Redmine's wiki pages). But I consider this too risky and too error-prone.
Then I had the idea of using a scripting language like Python (which I always wanted to learn) to retrieve the information and send it back to the Redmine server, like a web browser would. This should be a much safer way. But it doesn't seem to be an easy beginner's task when just starting with Python: I fail to authenticate myself on the Redmine server.
My last idea was to create an HTML page with Python, which could be displayed within a Redmine wiki page using the 'Redmine Wiki Extensions' plugin. But I consider this only a lightweight solution, because it's not very elegant.
So what I seek is either a new idea for solving this problem or some clues on how to do proper authentication with Python on my Redmine server. Maybe I could use a cookie for easier authentication...
I'm not familiar with Redmine, but if you are looking for something like a script that performs some actions the same way you would in a browser, then mechanize is a library that might be helpful, unless there's some JavaScript involved. In that case, I'd look into something like windmill or selenium to let you drive the web browser.
However, please note that web scraping is also error-prone, since any change in the design of the web pages involved might break your scripts.
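In case it helps, here is a rough sketch of the mechanize route for the login step (the URL and form field names are assumptions; inspect your Redmine login form for the real ones):

# Sketch: log in to Redmine with mechanize by driving the login form
# the way a browser would. The form index and field names are guesses.
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)
br.open("http://redmine.example.com/login")

br.select_form(nr=0)          # assume the first form is the login form
br["username"] = "myuser"
br["password"] = "mypassword"
br.submit()

# The browser object now carries the session cookie, so subsequent
# requests are authenticated.
page = br.open("http://redmine.example.com/projects/myproj/wiki")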
Regarding the option of using an API, as pointed out in the comment from AdamKG: that would be a good option, since there's a REST API that you can use from Python if you like. Unfortunately, I don't see anything that lets you do what you're looking for, and it seems it hasn't reached stable status yet. Anyway, as I said, it's still a good option to consider for the future.
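For completeness, a sketch of what the REST route could look like once the wiki part of the API is available to you. The endpoint shape and the X-Redmine-API-Key header follow Redmine's documented REST conventions, but whether your version supports wiki pages is exactly the stability caveat above:

# Sketch: create or update a wiki page through Redmine's REST API
# using an API key instead of a login form. Hostname, project, and
# page name are placeholders.
import requests

BASE = "http://redmine.example.com"
HEADERS = {
    "X-Redmine-API-Key": "your-api-key",
    "Content-Type": "application/json",
}

page_text = "h1. Executable inventory\n\n* /usr/local/bin/foo\n* /usr/local/bin/bar"

resp = requests.put(
    "%s/projects/myproj/wiki/Inventory.json" % BASE,
    headers=HEADERS,
    json={"wiki_page": {"text": page_text}},
)
print(resp.status_code)   # 200/201 on success; 404 if the wiki API is absent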