Build a Scrapy project using setuptools - python

I have a Scrapy project that needs to be built. The project structure is below:
.
├── cli.py
├── docs
│   ├── README.md
│   └── README.md.html
├── __init__.py
├── my_scrapy_project
│   ├── exporters.py
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares
│   │   ├── __init__.py
│   │   ├── randomproxy.py
│   │   └── random_user_agent.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   ├── signals.py
│   └── spiders
│       ├── website_extractor_1.py
│       ├── __init__.py
│       └── website_extractor_2.py
├── MANIFEST.in
├── requirements.txt
├── scrapy.cfg
├── setup.py
└── VERSION
The Scrapy spiders are executed by cli.py. When I build this package, cli.py is not included. If I list cli.py under the scripts key in setup.py, it cannot find the Scrapy settings when I call get_project_settings.
Any help, please?
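One possible direction, sketched under assumptions rather than taken from the post: get_project_settings() finds the settings module through the SCRAPY_SETTINGS_MODULE environment variable, and otherwise through a scrapy.cfg located from the current working directory. An installed script no longer sits next to scrapy.cfg, so the lookup can come back empty. The sketch assumes cli.py is moved inside my_scrapy_project so that find_packages() picks it up, and that it sets the variable itself. The main() entry point and the default spider name are hypothetical.

```python
import os

# Assumption: the settings module path matches the package in the tree above.
SETTINGS_MODULE = "my_scrapy_project.settings"


def configure_settings_module(module_path=SETTINGS_MODULE):
    """Point Scrapy at the settings module without relying on scrapy.cfg."""
    # setdefault keeps any value the user exported explicitly.
    os.environ.setdefault("SCRAPY_SETTINGS_MODULE", module_path)
    return os.environ["SCRAPY_SETTINGS_MODULE"]


def main(spider_name="website_extractor_1"):
    """Hypothetical entry point; the spider name is a placeholder."""
    configure_settings_module()
    # Scrapy is imported lazily so the helper above works without it.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    process.crawl(spider_name)
    process.start()
```

With that layout, setup.py could expose the function through entry_points={"console_scripts": ["my-scraper = my_scrapy_project.cli:main"]} instead of the scripts key; the command name my-scraper is made up for illustration.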

Related

NotSupportedError at / deterministic=True requires SQLite 3.8.3 or higher

I am trying to deploy a Django application using the default SQLite database to Elastic Beanstalk. The application works fine locally; however, on the server, I get the error shown in the title.
Any idea what's wrong? Can we not deploy a SQLite app on AWS EBS?
Here is the project structure that is being deployed:
├── .ebextensions
│   ├── django.config
├── app
│   ├── __init__.py
│   ├── admin.py
│   ├── apps.py
│   ├── migrations
│   │   ├── 0001_initial.py
│   │   ├── __init__.py
│   ├── models.py
│   ├── static
│   │   └── app
│   │       ├── app.js
│   │       ├── logo.svg
│   │       └── style.css
│   ├── templates
│   │   └── app
│   │       ├── hello.html
│   │       ├── index.html
│   │       └── job_detail.html
│   ├── tests.py
│   ├── urls.py
│   └── views.py
├── db.sqlite3
├── env
│   ├── All Virtual env related files
├── images
│   ├── Photo_on_5-9-14_at_2.31_PM.jpg
│   ├── Photo_on_5-9-14_at_2.32_PM.jpg
│   └── Photo_on_5-9-14_at_2.32_PM_v4McLzE.jpg
├── myproject-test
│   ├── __init__.py
│   ├── asgi.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── manage.py
├── requirements.txt
├── staticfiles
│   |-- All static files collected using collectstatic
│   ├── app
│   │   ├── app.js
│   │   ├── logo.svg
│   │   └── style.css
│   └── subscribe
│       ├── email-icon.png
│       ├── main.css
│       └── main.js
├── subscribe
│   ├── __init__.py
│   ├── admin.py
│   ├── apps.py
│   ├── form.py
│   ├── migrations
│   │   ├── 0001_initial.py
│   │   ├── 0002_subscribe_option_alter_subscribe_email_and_more.py
│   │   ├── 0003_alter_subscribe_option.py
│   ├── models.py
│   ├── static
│   │   └── subscribe
│   │       ├── email-icon.png
│   │       ├── main.css
│   │       └── main.js
│   ├── templates
│   │   └── subscribe
│   │       ├── subscribe.html
│   │       └── thank_you.html
│   ├── tests.py
│   ├── urls.py
│   └── views.py
├── templates
│   └── base.html
├── upload
│   └── images
│       ├── Photo_on_5-9-14_at_2.31_PM.jpg
│       └── Photo_on_5-9-14_at_2.31_PM_3.jpg
└── uploads
    ├── __init__.py
    ├── admin.py
    ├── apps.py
    ├── forms.py
    ├── images
    │   └── Photo_on_5-9-14_at_2.31_PM_3.jpg
    ├── migrations
    │   ├── 0001_initial.py
    │   ├── 0002_uploadfile.py
    │   ├── 0003_alter_uploadfile_file.py
    │   ├── __init__.py
    ├── models.py
    ├── templates
    │   └── uploads
    │       ├── add_file.html
    │       └── add_image.html
    ├── tests.py
    ├── urls.py
    └── views.py
Deployment process:
Zip all the necessary files
Create AWS EBS application with Python
Add environment variables
Create app and access URL
Please help.
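The error message suggests that the Python on the server is linked against an older SQLite library than the one used locally. A minimal sketch for confirming that, meant to be run with the same interpreter the application uses; the helper name is made up for illustration:

```python
import sqlite3

# Minimum SQLite version named in the error message.
REQUIRED = (3, 8, 3)


def sqlite_is_new_enough(required=REQUIRED):
    """Return True if the SQLite library linked into Python meets the minimum."""
    return sqlite3.sqlite_version_info >= required


if __name__ == "__main__":
    print("SQLite library version:", sqlite3.sqlite_version)
    print("Meets 3.8.3 requirement:", sqlite_is_new_enough())
```

If the check fails on the server, the choice is between a platform with a newer SQLite and a different database backend; the post does not say which platform version was used.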

Django error with test module while running tests

When I run python manage.py test I get an error saying that some test module is not found.
I am using PyCharm, Django 2.1.4 and W10 on Ubuntu.
The error:
======================================================================
ERROR: projectname.projectname (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: projectname.projectname
Traceback (most recent call last):
File "/usr/lib/python3.6/unittest/loader.py", line 462, in _find_test_path
package = self._get_module_from_name(name)
File "/usr/lib/python3.6/unittest/loader.py", line 369, in _get_module_from_name
__import__(name)
ModuleNotFoundError: No module named 'projectname.projectname'
What I've tried
python manage.py runserver and it runs just fine.
Add projectname to INSTALLED_APPS
Create an app called tests
My project structure
Django
│   ├── requirements.txt
│   └── projectname
│       ├── __init__.py
│       ├── manage.py
│       └── projectname
│           ├── apps
│           │   ├── accounts
│           │   │   ├── admin.py
│           │   │   ├── apps.py
│           │   │   ├── __init__.py
│           │   │   ├── migrations
│           │   │   │   ├── __init__.py
│           │   │   ├── models
│           │   │   │   ├── __init__.py
│           │   │   │   ├── profiles.py
│           │   │   │   └── users.py
│           │   │   ├── serializers
│           │   │   │   └── __init__.py
│           │   │   ├── tests.py
│           │   │   ├── urls.py
│           │   │   └── views
│           │   │       └── __init__.py
│           │   ├── __init__.py
│           ├── db.sqlite3
│           ├── __init__.py
│           ├── settings
│           │   ├── base.py
│           │   ├── development.py
│           │   ├── production.py
│           ├── static
│           ├── templates
│           ├── urls.py
│           └── wsgi.py
I just want to run my tests like in any other django project...
I have never encountered this problem before so any help is appreciated! :)
Well, well... it turns out that renaming the folder was the solution, though I have other projects that work with the same folder name in both places, so I don't really know what happened with this one.
Before:
Django
│ └── projectname
│     └── projectname
After:
Django
│ └── othername
│     └── projectname
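One commonly reported cause of this error, which the post itself does not confirm, is the __init__.py sitting next to manage.py. As far as I understand it, test discovery walks upward while it finds __init__.py files, so the outer folder gets treated as a package and the dotted name becomes projectname.projectname. A minimal sketch of that walk, with a made-up function name:

```python
import os


def discovery_top_level(start_dir):
    """Walk upward while __init__.py is present and return the first
    directory that is not itself a package."""
    top_level = os.path.abspath(start_dir)
    while os.path.exists(os.path.join(top_level, "__init__.py")):
        parent = os.path.dirname(top_level)
        if parent == top_level:  # reached the filesystem root
            break
        top_level = parent
    return top_level
```

Calling discovery_top_level(".") from the directory that holds manage.py should return that same directory. If it returns the parent instead, removing the __init__.py next to manage.py is worth trying before renaming folders.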

Using setuptools to copy non .py files

My python project installs via setup.py. The project structure looks like:
├── Makefile
├── README.rst
├── circle.yml
├── docs
│   ├── Makefile
│   ├── conf.py
│   ├── deps.txt
│   ├── guide_installation.rst
│   ├── guide_model.rst
│   ├── guide_transliteration.rst
│   ├── index.rst
│   ├── make.bat
│   └── module_trans.rst
├── indictrans
│   ├── __init__.py
│   ├── _decode
│   ├── _utils
│   ├── base.py
│   ├── iso_code_transformer.py
│   ├── libindic_
│   ├── mappings
│   ├── models
│   ├── polyglot_tokenizer
│   ├── script_transliterate.py
│   ├── test.py
│   ├── tests
│   ├── transliterator.py
│   ├── trunk
│   └── unicode_marks.py
├── requirements.txt
├── setup.cfg
├── setup.py
├── test-requirements.txt
└── tox.ini
where the subfolder indictrans/models looks like
├── ben-eng
│   ├── classes.npy
│   ├── coef.npy
│   ├── intercept_final.npy
│   ├── intercept_init.npy
│   ├── intercept_trans.npy
│   └── sparse.vec
├── ben-guj
│   ├── classes.npy
│   ├── coef.npy
│   ├── intercept_final.npy
│   ├── intercept_init.npy
│   ├── intercept_trans.npy
│   └── sparse.vec
so I have .npy and .vec files to be included in the project.
In my setup.py I'm trying to explicitly include this folder models via the include_package_data directive like:
setup(
    setup_requires=['pbr'],
    pbr=True,
    packages=find_packages(),
    include_package_data=True,
    package_data={'models': ['*.npy', '*.vec']},
    ext_modules=cythonize(extensions)
)
and in the setup.cfg I have
[files]
packages =
    indictrans
but running python setup.py install does not copy the models folder to the installation folder /usr/local/lib/python2.7/dist-packages/indictrans/.
If I print the output of find_packages() I get
['indictrans', 'indictrans.tests', 'indictrans.libindic_', 'indictrans._utils', 'indictrans._decode', 'indictrans.polyglot_tokenizer', 'indictrans.models', 'indictrans.trunk', 'indictrans.libindic_.utils', 'indictrans.libindic_.soundex', 'indictrans.libindic_.utils.tests', 'indictrans.libindic_.soundex.utils', 'indictrans.libindic_.soundex.tests', 'indictrans.libindic_.soundex.utils.tests', 'indictrans.polyglot_tokenizer.tests', 'indictrans.trunk.tests']
so I will assume that indictrans/models would be included, but it is not.
Add include_package_data=True to your setup-function (you already did that).
Create a file MANIFEST.in in the same directory as setup.py
MANIFEST.in can look as follows:
include indictrans/models/ben-eng/*
include indictrans/models/ben-guj/*
You don't need setup.cfg for doing this.
Source: This great writeup of python packaging
EDIT about recursive-include:
According to the documentation this should also work:
recursive-include indictrans/models *.npy *.vec
include_package_data=True requires MANIFEST.in.
To include data for the module indictrans.models you have to provide the full name:
package_data={'indictrans.models': ['*.npy','*.vec']},
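One detail worth checking: package_data patterns are globs evaluated relative to the package directory, and the .npy and .vec files here sit one level down, in folders such as ben-eng. A pattern like '*.npy' may therefore match nothing, while '*/*.npy' would reach into the subfolders. The sketch below only imitates that matching with the standard glob module so the patterns can be tried before building; match_package_data is a made-up helper, not part of setuptools.

```python
import glob
import os


def match_package_data(package_dir, patterns):
    """Return files under package_dir matched by package_data-style globs,
    as sorted paths relative to package_dir."""
    matches = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(package_dir, pattern)):
            if os.path.isfile(path):
                matches.append(os.path.relpath(path, package_dir))
    return sorted(matches)


if __name__ == "__main__":
    # Compare the flat patterns with ones that descend one directory level.
    print(match_package_data("indictrans/models", ["*.npy", "*.vec"]))
    print(match_package_data("indictrans/models", ["*/*.npy", "*/*.vec"]))
```

Run from the directory that contains setup.py, the two printed lists show which pattern set actually reaches the model files.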

How to run pip installed elasticsearch?

I'm trying to use Django + Haystack + Elasticsearch.
So I installed Elasticsearch 2.4 with pip because Django gave me errors saying that it couldn't import Elasticsearch. Now it can, and I can run ./manage.py rebuild_index in my Django project, which gives this output:
Indexing 23 journal entrys
GET /haystack/_mapping [status:404 request:0.006s]
but only if I somehow run Elasticsearch. So I also installed elasticsearch2 from the AUR and ran that. But, as I suspected, when I try to get all documents by running curl -X GET "localhost:9200/_cat/indices?v", it returns:
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open haystack 5 1 0 0 795b 795b
If I understand correctly it is empty.
I went where pip installs packages(/usr/lib/python3.6/site-packages) and found two folders related to Elasticsearch:
elasticsearch
├── client
│   ├── cat.py
│   ├── cluster.py
│   ├── indices.py
│   ├── ingest.py
│   ├── __init__.py
│   ├── nodes.py
│   ├── __pycache__
│   │   ├── cat.cpython-36.pyc
│   │   ├── cluster.cpython-36.pyc
│   │   ├── indices.cpython-36.pyc
│   │   ├── ingest.cpython-36.pyc
│   │   ├── __init__.cpython-36.pyc
│   │   ├── nodes.cpython-36.pyc
│   │   ├── snapshot.cpython-36.pyc
│   │   ├── tasks.cpython-36.pyc
│   │   └── utils.cpython-36.pyc
│   ├── snapshot.py
│   ├── tasks.py
│   └── utils.py
├── compat.py
├── connection
│   ├── base.py
│   ├── http_requests.py
│   ├── http_urllib3.py
│   ├── __init__.py
│   ├── pooling.py
│   └── __pycache__
│       ├── base.cpython-36.pyc
│       ├── http_requests.cpython-36.pyc
│       ├── http_urllib3.cpython-36.pyc
│       ├── __init__.cpython-36.pyc
│       └── pooling.cpython-36.pyc
├── connection_pool.py
├── exceptions.py
├── helpers
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-36.pyc
│   │   └── test.cpython-36.pyc
│   └── test.py
├── __init__.py
├── __pycache__
│   ├── compat.cpython-36.pyc
│   ├── connection_pool.cpython-36.pyc
│   ├── exceptions.cpython-36.pyc
│   ├── __init__.cpython-36.pyc
│   ├── serializer.cpython-36.pyc
│   └── transport.cpython-36.pyc
├── serializer.py
└── transport.py
elasticsearch-2.4.1.dist-info
├── DESCRIPTION.rst
├── INSTALLER
├── METADATA
├── metadata.json
├── pbr.json
├── RECORD
├── top_level.txt
└── WHEEL
I don't see a start_elasticsearch.sh or bin/elasticsearch so how can I start it?
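For context, the elasticsearch package on PyPI is only the Python client library, which is why the folders above contain no server launcher; the server itself is a separate installation, such as the AUR package mentioned earlier. A minimal sketch for checking whether a server answers on the default port, using only the standard library; server_is_up is a made-up helper name and the URL is an assumption taken from the curl command above:

```python
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:9200", timeout=2.0):
    """Return True if an HTTP server answers url with a 2xx status."""
    # An empty ProxyHandler skips any proxy configured in the environment,
    # which matters when probing a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    print("Elasticsearch reachable:", server_is_up())
```

If this prints False, the client has nothing to talk to, and rebuild_index cannot reach an index regardless of what pip installed.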

Python - how do I load my custom class methods from a root directory?

How do I import my class from a specific directory (in my case the root directory, which I want to keep)?
So, I have the following directory map, and I need to load the class parsePresets from myglobal.py, which is located in the root directory, /var/tmp/mypython.
But I want to import that class and its methods from my new module, /var/tmp/mypython/media/test.py, with:
from myglobal import parsePresets
But I am getting:
from myglobal import parsePresets
ImportError: No module named myglobal
I also have __init__.py in the root directory and in the media directory.
$ cd /var/tmp/mypython; tree
.
├── arduino
│   ├── arduino.diest.c
│   ├── arduino.gent.c
│   ├── arduino.lalouvier.c
│   ├── arduino.makenoise.c
│   ├── arduino.servo.c
│   ├── arduino.string.c
│   ├── arduino.tcpserver.c
│   ├── arduino.tcpserver.c~
│   ├── arduino.test.sh
├── bash
│   ├── all.sh
│   ├── alsa-info.sh
│   ├── asound.conf
│   ├── autoreboot.sh
│   ├── diskfix.sh
│   ├── kernelfix.sh
│   ├── update.sh
│   └── usbformat.sh
├── chrome.py
├── download.py
├── download.sh
├── gui.py
├── image
│   ├── a.png
│   ├── b.gif
│   ├── cross_new.png
│   ├── e150
│   │   ├── 1.png
│   │   ├── de.png
│   │   ├── en.png
│   │   ├── __init__.py
│   │   └── nl.png
│   ├── __init__.py
│   ├── logo.png
│   ├── menu.jpg
│   └── slider_btn.png
├── __init__.py
├── INSTALL
├── ip.py
├── loading.py
├── logout.py
├── media
│   ├── __init__.py
│   ├── test.py
├── menu.py
├── myglass.py
├── myglass.pyc
├── myglobal.py
├── myglobal.pyc
├── rightclick.py
├── runme.sh
├── server.py
├── server.pyc
├── src.nja
├── test
│   ├── Button.py
│   ├── json.py
│   ├── json.pyc
│   ├── keyboard.py
│   ├── loop.sh
│   ├── mytimer.py
│   ├── qtclick.py
│   ├── qtmouse.py
│   ├── qt.py
│   ├── qtwindows7.py
│   ├── shape.py
│   ├── skeleton.py
│   ├── slider.py
│   ├── testpreview.py
│   ├── test.py
│   ├── Text.py
│   ├── transparent.py
│   ├── transparentwindow.py
│   └── Vscale.py
├── test.py
├── unavailable.py
├── upload.sh
└── internet
    ├── backup
    ├── protocol.txt
    └── server.py
You can add sys.path.append("/var/tmp/mypython/") to your script.
EDIT:
$ cat >> /var/tmp/mypython/stackoverflow.py <<\EOF
import sys
sys.path.append("/var/tmp/mypython/")
from myglobal import parsePresets
EOF
$ python /var/tmp/mypython/stackoverflow.py
or with NINJA-IDE
Running: /var/tmp/mypython/media/stackoverflow.py (Wed Dec 11 13:37:25 2013)
Execution Successful!
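Use relative imports:
from ..myglobal import parsePresets
Each extra period takes you "out" one level in your hierarchy. Note that a relative import only works when test.py is imported or run as part of the package (for example with python -m), not when it is run directly as a script.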
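A variation on the sys.path approach that avoids hard-coding /var/tmp/mypython is to compute the parent directory from the file's own location. This is a minimal sketch; add_parent_dir_to_path is a made-up helper name.

```python
import os
import sys


def add_parent_dir_to_path(module_file):
    """Put the directory above module_file's directory on sys.path (once)."""
    parent = os.path.dirname(os.path.dirname(os.path.abspath(module_file)))
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent


# In /var/tmp/mypython/media/test.py this would be used as:
#     add_parent_dir_to_path(__file__)
#     from myglobal import parsePresets
```

This keeps working if the project is moved to another location, at the cost of modifying sys.path at import time.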
