Python, Cloud - Online Web Scraping Tool with scheduling capabilities - python

I'm planning to develop a Web/cloud application using python which does the following,
1. Upload Perl/Python scraping scripts and execute.
2. Uploaded scripts to run on a Schedule.
3. Run multiple instance of same script with different input parameters.
4. Measure scripts performance.
5. View Scripts standard output/log.
I got limited/no knowledge about technologies which can satisfy my requirement,
Tips, Pointers, Ideas to existing work, libraries, open source implementation etc are appreciated.
Thanks,
Rajesh.

Use Scrapy as the base for your work:
http://scrapy.org/
For uploading, script performance and output (through web interface, I assume) you need to write your custom web front end which stores this information in a database where you can then explore it. One option for this is Pyramid:
http://pypi.python.org/pypi/pyramid/
For Python cloud deployments see Heroku:
http://www.heroku.com/

Related

How to use python and wix for web development [duplicate]

I'm trying to integrate backend code into a Wix site. Im not too picky about how I want to do this, or what language to write in (ideally, I have a locally-hosted Java code that I'd love to simply call). I wouldn't mind re-writing it in JavaScript though, or another language. But before I decide that I'm confused about my options. I can code but I'm new to the concepts like modules, APIs, & servers.
According to my research, back-end code with Wix is supposed to be easy (or at least do-able and not THAT complicated)....
From this webpage https://support.wix.com/en/article/corvid-calling-server-side-code-from-the-front-end-with-web-modules,
"Web modules are exclusive to Corvid and enable you to write functions that run server-side in the backend, and easily call them in your client-side code. With web modules you can import functions from backend into files or scripts in public, knowing they will run server-side. Corvid handles all the client-server communication required to enable this access."
And from this: https://www.sitepoint.com/what-is-wix-code/
"It’s serverless: All this added functionality comes in a serverless environment that lets you get your work done without any of the normal full-stack development headaches.
Just code and go: Wix Code has a built-in, online IDE and backend so you can just add the code you need to your page or your site, publish, and you’re live."
So, I thought they have a backend IDE where I can write backend code directly, or I could call my Java program. But, as I tried doing this and finding tutorials, it seems I can really only do this by calling a public API from the backend...?
https://youtu.be/tuu0D1izrUU
But ive also read (and someone who supposedly has done it before told me this) that Wix integrates with node.js, which is a backend version of JavaScript.
Can I use a Wix domain for a NodeJS app?
But, when I go into my Wix site I cannot find any option for using Node JS, and doing research on that gives me no useful results.
So, I'm thoroughly confused on what the capabilities are here. Can someone help me make sense of this?
Why are there no tutorials showing explicit code in the Corvid backend module? What's stopping me from simply writing my Java program there in a module? Do I really need an API endpoint to call and pass to the front end?
Is Node JS supported or not - has anyone done this before?
Also, in one link above they said everything is "serverless". But if I have to set up my own API endpoint won't I need to set up my own server??
There are basically two ways to go about this, which you seem to have already discovered.
Write your backend code in your Wix site. Indeed, the backend is built on Node.js as you can see here. Using this approach you will have to use JavaScript. As you seem to have found, you write this code in the Backend section of your site in a Web Module. Pros: you don't need to worry about managing a server and all your code is in one place.
Expose your already existing Java code as an API that your Wix site can call using the wix-fetch API. Pros: you don't need to rewrite your code.

Where to run online Python 3.8 script

I have a folder with different folders in it.
In one of the folder I have a python script.
The python script reads an excel file (which is in the same folder), scrapes information from the internet, updates the excel file and creates another excel file in the main directory.
My question is:
As I can't run my computer non stop, I imagine it's possible (easy? and free) to upload all my folders on a website which will allow me to run my python (3.8) script. Do you have any suggestions ? Which website could be appropriate ? Pythonanywhere.com ?
Plus, I'd like to run this script every morning at 6am.
Thank you for your answers ! :)
Yes, you could use PythonAnywhere -- free accounts allow you to create one scheduled task, which can run once a day. If you have an account, you can set it up on the "Tasks" page.
Some public cloud providers, such as GCP, AWS, and Azure, offer free tier VMs. Simply run the code on those and set up a cron job. Though the network usage probably still costs you a few cents a month, this is a very cheap way to go. You could also consider setting up a FaaS solution against very low cost.
As #Klaus said, this is not a programming question. If you are on linux you can use crontab to schedule your process.
crontab
And if you want to run it on the cloud you can use free services like Heroku

Need advice on how to incorporate Python into an Azure, specifically an ASP.NET web application environment

Need advice on how to incorporate Python into an Azure ASP.NET web application environment. Please excuse this question but I am new to Azure and I'm not clear on how to proceed. Every option that I look into looks promising but they all seem to have their own issues. Below is a more thorough explanation but the deal is that I have an Azure account with all kinds of goodies, a full fledged ASP.NET (C#) web app running via App Service, I am new to Azure (but not Python), and I'm hoping to add Python functionality to this whole setup. In short:
I want to add Python to this setup mainly to run scheduled jobs and also to trigger Python code from ASP.NET web form submissions
ideally I want a solution that resembles a non-cloud setup. I know this sounds silly but I'm finding the cloud/Azure functionality to be nuanced and not straightforward. I want a place to put a bunch of Python scripts, run, edit, schedule and trigger them from ASP.NET
for example: I created a WebJob that runs manually and from the documentation it wasn't clear how it should be called. I just figured out that you need to POST with Basic Auth (and the credentials provided).
!Also, Azure CMD does NOT like files with 'underscore _' in them! You cannot submit a Web Job with a py file with an underscore nor can you write output with a file with an underscore
!Also, I don't see an option for this Web Job to run Python 3.6.4 (which I installed via extension). Right now it is using 2.7.15...
!Also, CRON expression in Azure has six *, not five plus a command. Again, more weird stuff to worry about
I tried these instructions but the updates to the web page's Web.config file breaks the ASP.NET web pages
ideally the most cost effective option
Any info is greatly appreciated
MORE DETAILED EXPLANATION
Currently I have an ASP.NET site running via Azure App Service and I would like to add Python scripts and possibly Flask/Rest functionality. Note that I am not expecting to serve any content via Python and will largely be running Python scripts either on a scheduled basis or call them from ASP.NET. As a matter of fact, and this is an important point, I'm hoping to have ASP.NET trigger/run a Python script when a web form is submitted. I realize that I could get a similar effect if I make a web call to a Rest api that is running Python. In any event, I can't tell if I should:
add a Python extension to the current App Service running the web page (I tried this) OR
I did install Python 3.6.4 and some packages via pip
These instructions were useful, however the updates to the web page's Web.config file breaks the ASP.NET web pages
set up a VM that will have all of the Python code (but how can I have the .NET web page(s) call the Python in the VM?) OR
use Azure functions (I'm completely new to this and must admit that I prefer to have my old school Python environment instead although I see the benefit of using functions. But how do you deal with logging and debugging?)
or what about a custom windows container (Docker)?
This requires installing VS Code and that is OK but I'm looking for a solution that another user can get into with as few interruptions as possible
The idea is to ramp up the use of Python although, like I said, I don't expect Python to be serving any of the web content. It will be used to run in the background and to run scheduled jobs. What is the most robust and hopefully easiest way to add Python functionality to Azure (most importantly in a way to be able to trigger/use Python from an App Service running .NET?)? I've searched online and stack overflow so far with interesting finds but nothing to my liking.
For example, the following link discusses how to schedule WebJobs. I just created a manual one and when I called the webhook I got the message: "No route registered for '/api/triggeredwebjobs/TestPython/run'" How to schedule python web jobs on azure
The Docker method looks very promising, however, I'm looking for a simple solution as there is another person who will be involved in all of this and he's busy with other projects
Thank you very much!
I found a solution, though I'm open to more info. Like I mentioned in my post, I used the 'add extension' tool to add Python 3.6.4 to my Azure (installed in D:\home\python364x64).
Then I installed a bunch of packages via pip, these installed into D:\home\python364x64\Lib\site-packages.
I created a Python folder in webpages\Python where I put my scripts.
Finally, in ASP.NET I used the Diagnostics.Process call to run my code in ~\webpages\Python\somecode_2.py
The main issue is that Azure came with Python 2.7.15 installed. And for some reason when my Python code got executed it was using 3.4 (where that version came from beats me). So for each script, I had to create an _2.py version where I simply did the following in order to call the original script via Python 3.6.4. Looks a little nasty but it works. So like I said, I would welcome more info for ways to do this better...
--
import os<br>
os.system("D:\\home\python364x64\python.exe SomePython.py {0}".format(add arguments here)

How to expose an NLTK based ML(machine learning) Python Script as a Web Service?

Let me explain what I'm trying to achieve. In the past while working on Java platform, I used to write Java codes(say, to push or pull data from MySQL database etc.) then create a war file which essentially bundles all the class files, supporting files etc and put it under a servlet container like Tomcat and this becomes a web service and can be invoked from any platform.
In my current scenario, I've majority of work being done in Java, however the Natural Language Processing(NLP)/Machine Learning(ML) part is being done in Python using the NLTK, Scipy, Numpy etc libraries. I'm trying to use the services of this Python engine in existing Java code. Integrating the Python code to Java through something like Jython is not that straight-forward(as Jython does not support calling any python module which has C based extensions, as far as I know), So I thought the next option would be to make it a web service, similar to what I had done with Java web services in the past. Now comes the actual crux of the question, how do I run the ML engine as a web service and call the same from any platform, in my current scenario this happens to be Java. I tried looking in the web, for various options to achieve this and found things like CherryPy, Werkzeug etc but not able to find the right approach or any sample code or anything that shows how to invoke a NLTK-Python script and serve the result through web, and eventually replicating the functionality Java web service provides. In the Python-NLTK code, the ML engine does a data-training on a large corpus(this takes 3-4 minutes) and we don't want the Python code to go through this step every time a method is invoked. If I make it a web service, the data-training will happen only once, when the service starts and then the service is ready to be invoked and use the already trained engine.
Now coming back to the problem, I'm pretty new to this web service things in Python and would appreciate any pointers on how to achieve this .Also, any pointers on achieving the goal of calling NLTK based python scripts from Java, without using web services approach and which can deployed on production servers to give good performance would also be helpful and appreciable. Thanks in advance.
Just for a note, I'm currently running all my code on a Linux machine with Python 2.6, JDK 1.6 installed on it.
One method is to build an XML-RPC server, but you may wish to fork a new process for each connection to prevent the server from seizing up. I have written a detailed tutorial on how to go about this: https://speakerdeck.com/timclicks/case-studies-of-python-in-parallel?slide=68.
NLTK based system tends to be slow at response per request, but good throughput can be achieved given enough RAM.

XAMPP - Execute Python script on web page

How can I execute a Python script on a webpage?
I've used XAMPP to create the Apache server. Are there any tutorials/examples or guides on how to execute a .py script? I'm using Windows 7 and have installed Python on my local machine. If I access the .py script via the web link, it looks as if its HTML code and nothing is executed.
Typically, you don't execute Python in the browser. Instead, the browser accesses a resource (or "webpage", like http://example.com/mypage) by requesting the resource from the server. The server (for example, Apache), when administered correctly, passes off handling of the request to some Python script. Then, your Python script creates some output (for example, HTML) which the server then returns to the browser for the browser to display.
However, some web sites have found it useful to have logic (scripts) run in the browser, rather than on the server. The standard way of doing this is using JavaScript (although in the past there WERE other languages built into browsers, such as VBScript in Internet Explorer).
Right now, pretty much all browsers have settled on JavaScript as THE scripting language in the browser. In order for you to use any other language in the browser (including Python), the browser must support that scripting language (or needs to have an add-on to support that scripting language). Simply having Python installed on your client alongside the browser is not enough. For more information, please see the Python documentation Web Browser Programming.
Another option is to use something like Pyjs. This is a library that has you write your code in Python, and converts the necessary parts to JavaScript. This isn't exactly "Python in the browser", but it might be something you are looking for.
I guess this links will help....
https://community.apachefriends.org/f/viewtopic.php?t=42975
Usually a good resource for Python is the official documentation.
They do a great job of explaining many aspects of Python. Using Python on the web is a big part and they have a great overview.

Categories