How to create a Redmine wiki page via script (Python)?

I want to create a special wiki page on my local Redmine server. It should contain an inventory of some executables from my server. My goal is a script that scans certain folders on my server for these files and puts them (with some additional information) into a nice Redmine wiki page.
My first thought was to traverse my server's file system with a simple batch file and to create an SQL statement for putting the results directly into the underlying MySQL database (which contains Redmine's wiki pages). But I consider this too risky and too error-prone.
Then I had the idea of using a scripting language like Python (which I always wanted to learn) to retrieve the information and send it back to the Redmine server, the way a web browser would. This should be a much safer approach. But it doesn't seem to be an easy beginner's task when just starting with Python: I fail to authenticate myself on the Redmine server.
My last idea was to create an HTML page with Python, which could be displayed within a Redmine wiki page via the plugin 'Redmine Wiki Extensions'. But I consider this only a lightweight workaround, because it's not very elegant.
So what I seek is either a new idea for solving this problem or some clues on how to do proper authentication with Python on my Redmine server; maybe I could use a cookie for easier authentication...

I'm not familiar with Redmine, but if you are looking for a script that performs actions the same way you would in a browser, then mechanize is a library that might be helpful, unless there's JavaScript involved. In that case, I'd look into something like Windmill or Selenium to let you drive the web browser.
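For the authentication problem specifically, mechanize can fill in and submit the login form and will then carry the session cookie for you. Below is a minimal sketch; the server URL and project name are examples, and the form position and the field names "username" and "password" are assumptions based on a default Redmine login page, so check them against your own installation.

```python
# A minimal mechanize login sketch. The URL, project name, and form field
# names ("username"/"password") are assumptions -- verify them against
# your own Redmine login page.
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)           # ignore robots.txt for this script
br.open("http://localhost:3000/login")
br.select_form(nr=0)                  # assume the login form is the first form
br["username"] = "me"
br["password"] = "secret"
br.submit()

# The Browser object now holds the session cookie, so follow-up requests
# are authenticated.
response = br.open("http://localhost:3000/projects/myproject/wiki")
print(response.read()[:200])
```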
However, please note that web scraping is also error-prone, since any change in the design of the web pages involved might break your scripts.
Regarding the option of using an API, as pointed out in AdamKG's comment: that would be a good route, since there's a REST API you can use from Python if you like. Unfortunately, I don't see anything there that lets you do what you're looking for, and the API doesn't seem to have reached stable status yet. Anyway, as I said, it's still a good option to consider in the future.
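For reference, later Redmine versions did add wiki pages to the REST API. If your installation exposes that endpoint, creating or updating a page is a single authenticated PUT; here is a sketch, where the server URL, project identifier, page title, and API key are all placeholders:

```python
# A sketch of creating/updating a wiki page through Redmine's REST API.
# Requires a Redmine version whose API includes wiki pages, plus an API key
# from "My account"; all names below are placeholders.
import requests

REDMINE = "http://localhost:3000"
API_KEY = "your-api-key"

page_text = "h1. Executable inventory\n\n* /usr/local/bin/foo\n* /usr/local/bin/bar\n"

resp = requests.put(
    f"{REDMINE}/projects/myproject/wiki/Inventory.json",
    json={"wiki_page": {"text": page_text}},
    headers={"X-Redmine-API-Key": API_KEY},
)
resp.raise_for_status()   # 200 = page updated, 201 = page created
```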

Related

How do you use a web.py application in WordPress?

I have written an application in Python that collects data from a JavaScript form and returns the processed text. It is based entirely on the code here (but is a lot more complex, so I have to use Python for this).
https://kooneiform.wordpress.com/2010/02/28/python-and-ajax-for-beginners-with-webpy-and-jquery/
(Note to people who like to edit: please leave this link in place, since it shows all the relevant code sections in Python and JavaScript.)
I need to use this in WordPress (since that's what runs my site), and I honestly have no idea how to pull this off. web.py can run with Apache CGI, but the documentation (http://webpy.org/cookbook/cgi-apache) is only clear if one wants to navigate directly to the Python app as its own page.
I'm hoping someone here has expertise in how to embed all this within a WordPress page/post?
Thanks!!
As far as I know, there is no native way to run Python code inside a WordPress site the way you can run PHP. In fact, if you are not doing anything unique to Python, I would suggest using PHP, which supports regular expressions and can be used in WordPress by installing the plugin "Insert PHP".
If you really want to use Python, then you need an API endpoint that connects the function to your website. You could look into Azure Functions or AWS Lambda and write a function app there to serve as the backend. Then, whenever someone visits your page, the page makes an HTTP request to that API.
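As a hedged sketch of that backend half, here is what a minimal AWS Lambda handler behind API Gateway (proxy integration) could look like; the field name "text" and the processing are illustrative stand-ins for your real form data and logic.

```python
# Sketch of an AWS Lambda handler (API Gateway proxy integration) that the
# WordPress page can call. The "text" field and the .upper() processing are
# placeholders for your real form data and logic.
import json

def handler(event, context):
    payload = json.loads(event.get("body") or "{}")
    processed = payload.get("text", "").upper()   # your real processing here
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json",
            # CORS header so the browser allows the cross-origin call
            "Access-Control-Allow-Origin": "*",
        },
        "body": json.dumps({"result": processed}),
    }
```

The WordPress post itself then only needs a small piece of JavaScript (for example, a fetch() call in a custom HTML block) that posts the form data to the endpoint and inserts the response into the page.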
Can you explain what exactly you want to do on your website?

How to read an HTML page that takes some time to load?

I am trying to scrape a website using Python and Beautiful Soup. I have found that on some sites, image links that are visible in the browser cannot be seen in the page source. However, using Chrome's Inspect tool or Fiddler, I can see the corresponding code.
What I see in the source code is:
<div id="cntnt"></div>
But in Chrome's Inspect view, I can see a whole bunch of HTML/CSS code generated within this div. Is there a way to load the generated content in Python as well? I am using the regular urllib in Python, and I am able to get the source, but without the generated part.
I am not a web developer, hence I am not able to express the behaviour in better terms. Please feel free to ask if my question seems vague!
You need a JavaScript engine to parse and run the JavaScript code inside the page. There are a bunch of headless browsers that can help you with that (a short sketch follows the list):
http://code.google.com/p/spynner/
http://phantomjs.org/
http://zombie.labnotes.org/
http://github.com/ryanpetrello/python-zombie
http://jeanphix.me/Ghost.py/
http://webscraping.com/blog/Scraping-JavaScript-webpages-with-webkit/
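Most of the projects above are unmaintained these days; the same idea with a currently maintained tool is Selenium driving a headless browser. A sketch, assuming Firefox and geckodriver are installed, with a placeholder URL:

```python
# Sketch: load the page in headless Firefox, let its JavaScript run, then
# read the resulting DOM. Assumes Firefox + geckodriver are installed.
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument("-headless")
driver = webdriver.Firefox(options=options)
try:
    driver.get("http://example.com/page-with-generated-content")
    html = driver.page_source   # DOM after the scripts have filled the div
finally:
    driver.quit()

print(html[:500])
```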
The content of the website may be generated after load via JavaScript. To obtain the generated content with Python, refer to this answer.
A regular scraper gets just the HTML document. To get any content generated by JavaScript logic, you need a headless browser that also builds the DOM and loads and runs the scripts, just like a regular browser would. The Wikipedia article on headless browsers and some other pages on the net maintain lists of them and their capabilities.
Keep in mind when choosing one that some previously major products in this space are now abandoned.
TRY THIS FIRST!
Perhaps the data really is buried in the JavaScript itself, and all this JavaScript-engine business is needed. (Some GREAT links here!)
But from experience, my first guess is that the JS is pulling the data in via an AJAX request. If you can get your program to simulate that request, you'll probably get everything you need handed right to you, without any tedious parsing/executing/scraping involved!
It will take a little detective work, though. I suggest turning on your network traffic logger (such as the "Web Developer Toolbar" in Firefox) and then visiting the site. Focus your attention on any and all XMLHttpRequests. The data you need should be found somewhere in one of these responses, probably in the middle of some JSON text.
Now, see if you can re-create that request and get the data directly. (NOTE: You may have to set the User-Agent of your request so the server thinks you're a "real" web browser.)
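As a sketch of that replay step with the requests library (the endpoint URL and parameters below are invented; copy the real ones from your network log):

```python
# Sketch: call the XHR endpoint found in the network log directly.
# The URL and parameters are invented examples.
import requests

headers = {
    # Some servers only answer clients that look like a real browser.
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/115.0",
}
resp = requests.get(
    "http://example.com/api/content",   # hypothetical XHR endpoint
    params={"id": 123},
    headers=headers,
)
resp.raise_for_status()
data = resp.json()   # often the payload is JSON, ready to use without scraping
print(data)
```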

Creating a URL filter for a proxy in python

I've been working on a small project which requires me to create a proxy with access to only a few sites.
Am using the code from here: https://github.com/labnol/google-proxy
Now, basically I am a PHP guy, but I haven't found anything better than the above for setting up a web proxy server.
What I need here is:
A way to filter URLs, so that people can access only the sites I allow.
On the allowed sites, a way to block certain scripts; e.g. on wikipedia.org, people shouldn't be allowed to log in.
I'm a complete noob in Python. Can anyone suggest something here, or provide a code snippet I can use?
Thanks! :)
P.S.: On the PHP side, I've tried Glype and miniProxy, but they are not as good as the one I mentioned; they break the CSS/JS of the websites.
You could install Squid and write your own filtering rules (ACLs) for it.
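If you would rather keep everything in Python, below is a toy sketch of a whitelist-only proxy built from the standard library alone. It handles plain-HTTP GET requests only (no HTTPS CONNECT tunnelling), and the allowed hosts are placeholders, so treat it as a starting point rather than production code.

```python
# Toy whitelist-only HTTP forward proxy -- a sketch, not production code.
# Plain-HTTP GETs only; ALLOWED_HOSTS entries are example placeholders.
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlparse
import urllib.request

ALLOWED_HOSTS = {"en.wikipedia.org", "www.example.com"}

class FilteringProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # A proxy client puts the absolute URL in the request line,
        # so self.path is the full target URL.
        host = urlparse(self.path).hostname
        if host not in ALLOWED_HOSTS:
            self.send_error(403, "This proxy does not allow that host")
            return
        try:
            with urllib.request.urlopen(self.path) as upstream:
                body = upstream.read()
                content_type = upstream.headers.get("Content-Type", "text/html")
        except Exception as exc:
            self.send_error(502, "Upstream error: %s" % exc)
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    ThreadingHTTPServer(("", 8080), FilteringProxy).serve_forever()
```

Blocking specific actions on an allowed site (for example Wikipedia's login page) could be approximated by also checking urlparse(self.path).path before forwarding, but for anything serious, Squid's ACLs are the more robust route.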

Python: How to Capture WebPage as Image File?

I want to cache a webpage as an image upon a user request, but I don't know where to start with this.
I'm developing on App Engine with Python.
Here's a good library for capturing a webpage as a PNG image:
http://github.com/AdamN/python-webkit2png
One way is to use a web service such as Thumbalizr, since a lot of the programs for this type of thing can't be installed on App Engine (because they use C++, etc.). Other options include Girafa and Browsershots.
There are websites that do this for you; Google is your friend. If you build a script around one of them, you have what you need. As a demonstration, see http://webshots.velocitysc.com/sandbox/.
There are also downloadable programs that do it, such as the one at http://download.cnet.com/Advanced-Website-to-Image-JPG-BMP-Converter-Free/3000-2094_4-10900902.html. These are just examples; google a while and you'll find better implementations.
If you want to do it yourself, you basically need to duplicate a web browser (the HTML rendering part, anyway), which is unrealistic, or use a preexisting rendering engine like WebKit, as Zach suggests. If I were you, I would forget about doing it myself and use a preexisting web service, unless this is going to be the core of your application.
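If you do go the do-it-yourself route, one hedged sketch is Selenium driving a headless browser and saving a screenshot. Note that this needs a real browser binary, which App Engine's sandbox will not run, so it would have to live on a separate machine or service; the URL and window size below are examples.

```python
# Sketch: render a page in headless Chrome and save the viewport as a PNG.
# Assumes Chrome and chromedriver are installed; URL/size are examples.
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--window-size=1280,1024")

driver = webdriver.Chrome(options=options)
try:
    driver.get("http://example.com/")
    driver.save_screenshot("page.png")   # PNG of the rendered viewport
finally:
    driver.quit()
```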

Python server side AJAX library?

I want to have a browser page that updates some information on a timer or on events. I'd like to use Python on the server side. It's quite simple; I don't need anything massively complex.
I can spend some time figuring out how to do all this the "AJAX way", but I'm sure someone has written a nice Python library to do the heavy lifting. If you have used such a library, please let me know the details.
Note: I saw how-to-implement-a-minimal-server-for-ajax-in-python but I want a library to hide the implementation details.
AJAX stands for Asynchronous JavaScript and XML. You don't need any special library other than the JavaScript already in the browser to make AJAX calls. The AJAX requests come from the client-side JavaScript code and go to the server side, which in your case would be handled in Python.
You probably want to use the Django web framework.
Check out this tutorial on Django tips: A simple AJAX example.
Here is a simple client-side tutorial on XMLHttpRequest/AJAX.
You can also write both the client and the server side of the AJAX code in Python, with Pyjamas. Here's an RPC-style server and a simple example:
http://www.machine-envy.com/blog/2006/12/10/howto-pyjamas-pylons-json/
Lots of people use it with Django, but as the above example shows, it works fine with Pylons and can be used with TurboGears2 just as easily.
I'm generally in favor of learning enough JavaScript to do this kind of thing yourself, but if your problem fits what Pyjamas can do, you'll get results from it very quickly and easily.
I suggest implementing the server part in Django, which is in my opinion a fantastic toolkit. With Django you produce your XML responses (although I suggest using JSON instead, which is easier to handle on the web-browser side).
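As a minimal sketch of that server part (assumes Django 2.0+ for django.urls.path; the view name, route, and payload fields are illustrative):

```python
# views.py -- a tiny Django view that answers an AJAX poll with JSON.
from django.http import JsonResponse

def status(request):
    # Replace this dict with whatever data the page should display.
    return JsonResponse({"count": 42, "message": "updated"})

# urls.py -- route the view so the browser can poll /status/.
from django.urls import path
from . import views

urlpatterns = [
    path("status/", views.status),
]
```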
Once you have something that generates your reply on the server side, you have to write the JavaScript that invokes it (through an asynchronous call), gets the result (in JSON), and uses it to do something clever to the DOM tree of the page. For this, you need a JavaScript library.
I have some experience with various JavaScript libraries for "Web 2.0". script.aculo.us is cool, and Dojo as well, but my absolute favourite is MochiKit, because it focuses on a very Pythonic syntax, which hides the differences between JavaScript and Python quite well.
