I am trying to deploy a Django + Scrapy project on Ubuntu 16.04. When I run scrapyd-deploy as described in the docs, I get:
Packing version 1526639948
Deploying to project "first_scrapy" in http://my_ip/addversion.json
Deploy failed (404): <full HTML code of '404.html' page>
When I run scrapyd-deploy -l, I see:
default http://my_ip
My scrapy.cfg:
[settings]
default = first_scrapy.settings
[deploy]
url = http://my_ip
username = root
password = rootpassword
project = first_scrapy
What am I doing wrong?
UPDATE 1:
If I change the URL in my scrapy.cfg to url = http://my_ip:6800, it still throws a 404 error. Next I tried running scrapyd in a second console, and this was the first time I saw a different response - details are here.
So the question now is: how do I keep scrapyd running so that it keeps working after I close the console?
Change directory into your project folder and run the scrapyd command with "nohup". That makes sure scrapyd is not stopped when you disconnect from the server:
cd /path/to/your/project && nohup scrapyd > /dev/null 2>&1 &
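If you would rather start the daemon from a Python script, the same idea can be sketched with the standard library. This is only a rough equivalent of the nohup line, not something from the scrapyd docs: the helper name start_detached is made up for illustration, and it assumes a POSIX system.

```python
import subprocess


def start_detached(command, cwd=None):
    """Start `command` in its own session with all output discarded.

    `start_detached` is a hypothetical helper, not part of scrapyd.
    """
    return subprocess.Popen(
        command,
        cwd=cwd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,  # discard output, like redirecting to /dev/null
        stderr=subprocess.DEVNULL,
        start_new_session=True,     # new session, so closing the terminal does not hang it up (POSIX)
    )
```

Usage would look like start_detached(["scrapyd"], cwd="/path/to/your/project"), assuming the scrapyd executable is on your PATH. For a long-running server a proper service manager is usually the more robust choice.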
I have a Flask app with a single route and nothing complex going on, running in a Docker container. I cannot for the life of me get print statements to show up in the logs (docker-compose logs -f <containername>). So far I have tried various answers that supposedly fixed this problem for others, including:
Calling print("test", flush=True)
Setting PYTHONUNBUFFERED=1 and verifying it is set in the actual container with echo
Setting PYTHONUNBUFFERED=0
Running python with the -u flag
Using the logging module (logger.warning, logger.info, etc)
So far nothing has worked. The Flask app starts perfectly fine, but no output from my print statements is shown. I have sanity-checked that I'm editing the correct file by adding random syntax errors and watching the app break. I'm using Python 3.8 and docker-compose 2.
Try this:
import sys
print('It is working', file=sys.stderr)
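A small sketch that combines two of the suggestions from this thread, writing to stderr and flushing explicitly. The helper name debug_print is hypothetical, and whether it helps depends on how your container captures output.

```python
import sys


def debug_print(message, stream=None):
    """Write `message` to stderr (or the given stream) and flush right away."""
    stream = sys.stderr if stream is None else stream
    # flush=True pushes the line out immediately instead of leaving it in a buffer
    print(message, file=stream, flush=True)
```

Calling debug_print("It is working") inside the route should then produce a line on the process's stderr as soon as the call returns.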
I found this question while looking for answers to a similar problem. I was running a Flask app in a conda environment in a container and wasn't getting any log output, even though the Flask app itself was working fine. I added the following lines to my Dockerfile and it started logging as expected:
ENV PYTHONUNBUFFERED=1
RUN echo "source activate my_env" > ~/.bashrc
ENV PATH /opt/conda/envs/my_env/bin:$PATH
CMD ["python", "api.py"]
You can view the logs with either docker-compose or docker.
With docker-compose you have to pass the service name.
Note: you passed a container name, but the command expects a service name.
NOT: $ docker-compose logs -f <containername>
USE: $ docker-compose logs -f <SERVICE_NAME>
With docker you have to pass the container name or the container ID:
docker logs -f CONTAINER_ID | CONTAINER_NAME
I added a health check to my Dockerfile:
HEALTHCHECK --interval=1m --timeout=5s --retries=2 --start-period=10s \
CMD wget -qO- http://localhost:8070/healthcheck || exit 1
In my project's main urls.py file I added an entry:
url(r'^healthcheck/', lambda r: HttpResponse())
The project is up and deployed, so I take it the healthcheck itself is valid. However, I keep getting:
2017-12-17 13:25:27,891 WARNING base 51 140551932685128 Not Found: /healthcheck
written to the logs (once a minute).
The log entry is also added when I run the wget from inside the server.
Is it an issue with the healthcheck syntax, the Django URL entry, or wget in Docker?
Please assist. Thanks.
The healthcheck URL:
http://localhost:8070/healthcheck
should be:
http://localhost:8070/healthcheck/
because of how Django handles trailing slashes: the pattern r'^healthcheck/' only matches a path that includes the trailing slash.
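To see why the slash matters, here is a minimal sketch using plain re, outside of Django. It assumes Django's usual behaviour of matching regex URL patterns against the request path without its leading slash. The names STRICT, LENIENT and matches are made up for this illustration, and the lenient pattern is just one possible alternative, not something from the question.

```python
import re

# The pattern from the question: requires the trailing slash.
STRICT = re.compile(r'^healthcheck/')
# A hypothetical alternative that accepts the path with or without the slash.
LENIENT = re.compile(r'^healthcheck/?$')


def matches(pattern, request_path):
    """Match the way a regex URL pattern would: leading slash removed first."""
    return pattern.search(request_path.lstrip('/')) is not None


print(matches(STRICT, '/healthcheck'))    # False
print(matches(STRICT, '/healthcheck/'))   # True
print(matches(LENIENT, '/healthcheck'))   # True
```

So either the wget in the HEALTHCHECK has to request /healthcheck/, or the URL pattern has to be relaxed.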
Could you please help me figure out what I'm doing wrong? Here are the steps:
followed the portia install manual found here https://github.com/scrapinghub/portia - all ok
created a new project, entered a URL, tagged an item - all ok
clicked "continue browsing", browsed through site, items were being extracted as expected - all ok
Next I wanted to deploy my spider:
1st try: I tried to run, as the docs specified, scrapyd-deploy your_scrapyd_target -p project_name - got an error - scrapyd wasn't installed
fix: pip install scrapyd
2nd try: I launched the scrapyd server and accessed http://localhost:6800/ - all ok
After briefly reading the scrapyd docs I found out I had to edit the scrapy.cfg file of my project: slyd/data/projects/new_project/scrapy.cfg
added the following:
[deploy:local]
url = http://localhost:6800/
Went back to the console and checked that all is ok:
$:> scrapyd-deploy -l
local http://localhost:6800/
$:> scrapyd-deploy -L local
default
Seemed ok, so I gave it another try:
$:> scrapyd-deploy local -p default
Packing version 1418722113
Deploying to project "default" in http://localhost:6800/addversion.json
Server response (200):
{"status": "error", "message": "IOError: [Errno 21] Is a directory: '/Users/Mike/www/portia/slyd/data/projects/new_project'"}
What am I missing?
For anyone who stumbles upon this issue, the fix is to run scrapyd from a directory other than that of the project.
See details here: https://github.com/scrapinghub/portia/issues/128
I am trying to deploy a Scrapy project, but I am getting an error.
My scrapy.cfg file is:
[settings]
default = eScraper.settings
[deploy]
url = http://localhost:8680/
project = eScraper
And I used this command to deploy: scrapy deploy default -p eScraper
But I get this error:
Building egg of eScraper-1369325126
'build/scripts-2.7' does not exist -- can't clean it
zip_safe flag not set; analyzing archive contents...
Deploying eScraper-1369325126 to http://localhost:8680/addversion.json
Deploy failed: <urlopen error [Errno 111] Connection refused>
I also tried changing the port, but it didn't work. I also tried running the above command with sudo, but nothing changed. Can someone help me?
If you're using the latest version of Scrapy (mine is 0.24.2), then
scrapy server
no longer exists; it was moved to a separate package called scrapyd.
Simply run
scrapyd
to start the service.
Please first run the command scrapy server, and then run the deploy command in another terminal.
I have multiple spiders in my project folder and want to run all of them at once, so I decided to run them using the scrapyd service.
I started by following the instructions here.
First of all, I am in the current project folder.
I opened the scrapy.cfg file and uncommented the url line after
[deploy]
I ran the scrapy server command; that works fine and the scrapyd server runs.
I tried this command: scrapy deploy -l
Result: default http://localhost:6800/
When I tried the command scrapy deploy -L scrapyd, I got the following output:
Result:
Usage
=====
scrapy deploy [options] [ [target] | -l | -L <target> ]
deploy: error: Unknown target: scrapyd
When I tried to deploy the project with the command scrapy deploy scrapyd -p default, I got the following error:
Usage
=====
scrapy deploy [options] [ [target] | -l | -L <target> ]
deploy: error: Unknown target: scrapyd
I am really unable to identify why scrapyd is showing the above errors. Can someone point me to the correct way to deploy a project to scrapyd?
Thanks in advance.
Edited Code:
After seeing Peter Kirby's answer, I named the target in scrapy.cfg and tried the following command in my project folder:
command:
scrapy deploy ebsite -p ebsite
Then I got the error below:
Building egg of ebsite-1341808241
'build/lib' does not exist -- can't clean it
'build/bdist.linux-x86_64' does not exist -- can't clean it
'build/scripts-2.7' does not exist -- can't clean it
zip_safe flag not set; analyzing archive contents...
Deploying ebsite-1341808241 to http://localhost:6800/addversion.json
Deploy failed: <urlopen error [Errno 111] Connection refused>
How can I solve this?
From the scrapyd service documentation (http://scrapy.readthedocs.org/en/latest/topics/scrapyd.html?highlight=scrapyd):
You can define targets by adding them to your project’s scrapy.cfg
file... Here’s an example of defining a new target scrapyd2 with
restricted access through HTTP basic authentication:
[deploy:scrapyd2]
url = http://scrapyd.mydomain.com/api/scrapyd/
username = john
password = secret
Essentially what your error means is that your "target" name is not correct. If I remember correctly, the scrapy.cfg file sets the initial target name as "default". What you should be typing is something like:
scrapy deploy default -p project_name
Just type scrapy deploy if you have no named targets and left settings at default!
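To check which target names your scrapy.cfg actually defines, here is a rough sketch with configparser. It mirrors the convention described above ([deploy] is the "default" target, [deploy:NAME] is a target called NAME), but it is my own illustration and not the code the deploy command really uses. The function name deploy_targets is hypothetical.

```python
import configparser


def deploy_targets(cfg_text):
    """Return {target_name: url} for the deploy sections found in scrapy.cfg text."""
    # interpolation=None so that a '%' inside a URL is read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    targets = {}
    for section in parser.sections():
        if section == "deploy":
            name = "default"                 # unnamed section is the default target
        elif section.startswith("deploy:"):
            name = section[len("deploy:"):]  # named target, e.g. [deploy:scrapyd2]
        else:
            continue                         # skip [settings] and anything else
        targets[name] = parser.get(section, "url", fallback=None)
    return targets
```

With the scrapy.cfg from the question, which only has a plain [deploy] section, this yields a single target named "default", which is why "scrapyd" is reported as an unknown target.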
I got this error when I tried to deploy my project without scrapyd running, so simply running
scrapyd
in another terminal fixed the error.
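A quick way to confirm that something is listening before you deploy is to probe the server from Python. This is a sketch under the assumption that your scrapyd version exposes the daemonstatus.json endpoint (older "scrapy server" setups may not). The function name scrapyd_is_up is made up for illustration.

```python
import http.client
import urllib.error
import urllib.request


def scrapyd_is_up(base_url, timeout=3.0):
    """Return True if GET <base_url>/daemonstatus.json answers with HTTP 200."""
    url = base_url.rstrip("/") + "/daemonstatus.json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, 404 and similar all count as "not up"
        return False
```

If scrapyd_is_up("http://localhost:6800") returns False, the "Connection refused" from the deploy command is expected, and starting scrapyd first should clear it.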
This happens because the scrapyd process does not have permission!
You need to kill the process and then, as the root user, just type:
scrapy server
Then the new scrapyd will run, and you can proceed as the scrapyd documentation says.