Implementing my own scrapyd service - python

I want to create my own service for the scrapyd API that returns a little more information about a running crawler. I got stuck at the very beginning: where should I place the module that will contain that service? If we look at the default "scrapyd.conf", it has a section called services:
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
So these are the absolute (dotted) paths to each service in the scrapyd package, which is installed in the dist-packages folder. Is there any way to place my own module containing a service somewhere other than the dist-packages folder?
Update:
I realized that the question may be unclear. Scrapy is a framework for extracting data from websites. I have a simple Django site from which I can start/stop crawlers for a specific region, etc. (http://54.186.79.236, it's in Russian). The crawlers are controlled through the scrapyd API. By default it offers only a few APIs, for starting/stopping/listing crawlers, viewing their logs, etc. These APIs are listed in the docs: http://scrapyd.readthedocs.org/en/latest/api.html
So the above was a little intro; now to the question. I want to extend the existing API to retrieve more info from a running crawler and render it on the website mentioned above. For this I need to inherit from the existing scrapyd.webservice.WsResource and write a service. That part works fine if I place the service module in one of the 'sys.path' paths. But I want to keep the module containing this service in the scrapy project folder (for aesthetic reasons). If I keep it there, scrapyd (predictably) fails on launch with 'No module named'.
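The 'No module named' error arises because each value in the [services] section is a dotted import path that has to be importable when scrapyd starts. The sketch below imitates that lookup with the standard library only: `load_dotted_path`, the `myservices` module, and the `JobStats` class are hypothetical stand-ins (a real service would subclass scrapyd.webservice.WsResource), and it assumes scrapyd resolves these paths with an ordinary import. It shows that a module living in an arbitrary project folder becomes loadable once that folder is on sys.path.

```python
import configparser
import importlib
import sys
import tempfile
from pathlib import Path


def load_dotted_path(path):
    """Hypothetical helper: import 'package.module.Name' and return Name."""
    module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Hypothetical project folder holding a custom service module.
project_dir = Path(tempfile.mkdtemp())
(project_dir / "myservices.py").write_text(
    "class JobStats:\n"
    "    '''Stand-in for a WsResource subclass.'''\n"
)

config = configparser.ConfigParser()
config.read_string("""
[services]
jobstats.json = myservices.JobStats
""")

# Before the folder is on sys.path, the import fails with "No module named ...".
try:
    load_dotted_path(config["services"]["jobstats.json"])
except ModuleNotFoundError as exc:
    print("before:", exc)

# Adding the project folder to sys.path makes the dotted path resolvable.
sys.path.insert(0, str(project_dir))
importlib.invalidate_caches()
for name, dotted in config.items("services"):
    cls = load_dotted_path(dotted)
    print(name, "->", cls.__name__)
```

Outside of Python code, the same effect is usually achieved by putting the project folder on PYTHONPATH in the environment that launches scrapyd, e.g. `PYTHONPATH=/path/to/project scrapyd` (path hypothetical).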

So, I solved my problem according to this.

Related

How do I structure a project that uses Django and the eBay Python SDK?

I am completely new to Django and to making non-static websites, so I'm sorry if this is a stupid question, but I couldn't find info anywhere on how to actually structure the files for a Django project.
I built a program that scrapes data from an online store, formats the data, and then adds the item to eBay via the eBay Python SDK. For this, I wrote three .py files: one that uses the requests library to get product data, one that formats the data into the structure the API expects, and one that makes the API call. The way I understand the eBay SDK, these files all have to be in the ebaysdk directory in order to work (at least, the one that makes the API call does).
My problem is that I now want to make a simple Django website that takes a product URL for that store from the user and then automatically uploads the product. I am confused about how to make my three .py files work with a Django website. Do I have to restructure the whole thing and put them into my views.py file as functions? But then, how can I make the API call with the eBay SDK if it's not in the right directory?

Linking a Python file into a functioning HTML/CSS website

I'm giving myself a project to better learn these languages. I already know a lot of each; it's syncing them together that I need to get better at. The project is a pretty basic "SIM" game: generate some animals into your profile, with login/logout. So far I've got the website part done in HTML/CSS and working with all the pages I currently need, all running on localhost on my desktop. Now I'm moving on to Python, and possibly some PHP, to handle login/logout and generating a new animal into your account.
Everything I've done with Python so far has been done in IDLE. I'm wondering how to link my Python file to my HTML document. Like you would with CSS? If that's not possible, how do I connect the two so that Python interacts with the HTML/CSS I've created? I'm guessing I'll need MySQL for the database, but how much can I get done on a simple localhost without hosting online?
If you want to set up a localhost with PHP and MySQL, I can recommend XAMPP (https://www.apachefriends.org/). For your web app to talk to your Python scripts, you will either need to use Flask or Django to create a Python web server, or use PHP to run Python scripts. Either way, you will need to make AJAX requests to an API to get this done.
Edit: I forgot to mention that you will also need JavaScript to do this.
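The answer recommends Flask or Django; as a dependency-free illustration of the same idea (a Python process answering HTTP requests that the page's JavaScript calls via AJAX), here is a minimal sketch using the standard library's http.server instead. The /api/animal endpoint and the ANIMALS list are hypothetical, and the request is made with urllib so the sketch runs on its own; in the browser the equivalent call would be something like `fetch("/api/animal")`.

```python
import json
import random
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical animals the game can hand out.
ANIMALS = ["cat", "dog", "fox", "owl"]


class AnimalHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the hypothetical JSON endpoint is served; everything else is 404.
        if self.path != "/api/animal":
            self.send_error(404)
            return
        body = json.dumps({"animal": random.choice(ANIMALS)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Port 0 asks the OS for any free port; the server runs in a background thread.
server = HTTPServer(("127.0.0.1", 0), AnimalHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Stand-in for the browser's AJAX request.
with urllib.request.urlopen(f"http://127.0.0.1:{port}/api/animal") as resp:
    data = json.loads(resp.read())
print(data)

server.shutdown()
server.server_close()
```

If the HTML page is served from a different origin (for example by XAMPP's Apache on another port), the browser will additionally require CORS headers on the Python responses; Flask and Django have their own ways of handling that.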

How to structure a web scraper project?

I have a project to collect posts from several second-hand vehicle websites using BeautifulSoup and store them in a database. My client also requested that this functionality be built on top of a content management system he is familiar or semi-familiar with, like WordPress.
Can this be done using WordPress without making a big mess of it? If not, how would you suggest structuring my project, and what CMS should I use?
WordPress seems to support only MySQL and MariaDB, according to their site: https://codex.wordpress.org/Using_Alternative_Databases. Those seem to be your only database options if you want to keep WordPress support.
From there, to be honest, it comes down to whichever is easier for your Python code to access.

Relative import of a package in a Python Flask application

I am trying to make the sample Flask application more modular. I am new to Python and Flask and am building a sample application for which I plan to keep the folder structure shown below, where the packages are as follows:
config ---> database configuration details
flaskApp
1. model ---> holds the MongoDB schema
2. viewController ---> the endpoints to be accessed
static ---> contains the single HTML page, which I just need to serve (not render)
The code repo for this is on GitHub:
https://github.com/dhanalakshmiZendynamix/python-Flask-relative-module.git
I am facing the following problems:
1: I can't find an easy way to access one package from another in this folder structure (i.e., models inside viewController, where the endpoints are).
2: I'm not sure how to serve the HTML page inside the static folder.
I tried reading many sources:
https://exploreflask.com/en/latest/preface.html
http://pyvideo.org/pycon-us-2014/writing-restful-web-services-with-flask.html
But I'm still not sure how to get it working.
Please help me adapt the application to the above folder structure and access the endpoints; I'm really not sure how to go about it.
Any suggestions and pointers would help a lot. Thank you
#dhana lakshmi
Check the registered URL endpoints in your app.
Start Python on the command line in your project directory and execute the following commands:
>>> import flaskApp
>>> app = flaskApp.create_app()
>>> app.url_map
Please add the output to your question.
And I really think you need to read up a bit on Python and Flask first; here is a list of some great Flask resources: https://github.com/humiaozuzu/awesome-flask

How can I update a Plone page via a script?

I have a large number of automatically generated HTML files that I would like to push to my Plone website with a script. Currently I generate the files, log into Plone, click edit on each individual page, and copy and paste the HTML into the editor. I'd like to automate this. It would be nice to retain Plone versioning, have an auto-generated comment for the edit, and have the edit come from a specific user.
I've read about and tried WebDAV, with little luck getting it to work consistently, and I know there is a way to connect to Plone via FTP, but I haven't tried it. I'm not sure whether these are the methods I need.
My google searches aren't leading me to anything useful. Any ideas on where to start looking for a solution to this? Or any tips on implementing it?
You can script anything in Plone via the following methods:
Through-the-web via API calls (e.g. XML-RPC, wsapi, etc.)
The bin/instance run script provided by plone.recipe.zope2instance (See charm for an example of this).
You can also use a migration framework like:
collective.transmogrifier
which allows you to write migration code, and trigger it via GenericSetup or Browser view. Additionally, there are applications written on top of Transmogrifier aimed roughly at what you are describing, the most popular of which is:
funnelweb
I would recommend that you consider using or writing a Transmogrifier "blueprint(s)" to do your import, and execute the pipeline with a tool that makes that easy:
mr.migrator
You can find blueprints by searching PyPI for "transmogrify". One popular set of blueprints is:
quintagroup.transmogrifier
One of the main attractions to the Transmogrifier approach, aside from getting the job done, is the ability to share useful blueprints with others.
I think Transmogrifier is the best tool for this job, but this will definitely be a programming task no matter how you do it. It's used for many such migration jobs, such as migrating from Drupal.
There's an add-on, wsapi4plone.core, started by pumazi at WebLion, which provides web services for portals that you can then hook into. You can create, modify, and delete content via XML-RPC calls. The only caveat is that it doesn't yet work with Collections (criteria specifically).
project: http://pypi.python.org/pypi/wsapi4plone.core
docs: http://packages.python.org/wsapi4plone.core/
You can also do it programmatically by hooking into the ZODB via Python (zopepy or some other method).
These should get you started:
http://plone.org/documentation/kb/manipulating-plone-objects-programmatically/reading-and-writing-field-values - this should give you an understanding of accessors and mutators (getters and setters); in your case you will most likely be working with obj.Text (getter) and obj.setText (setter).
https://weblion.psu.edu/trac/weblion/wiki/AutomatingObjectCreation - lots of examples (slightly outdated but still relevant)
http://plone.org/documentation/faq/upload-images-files
Try enabling WebDAV or FTP in Plone; then you can access Plone with WebDAV or FTP clients and push the HTML files. Plone (Zope) will recognise the HTML files as Pages.
