How can I run a Portia spider by its port? - python

I am trying to run a spider with Portia in its Docker version, but I don't want to execute the spider using a terminal command like docker exec ... portiacrawl .... Is there any way I can run a spider that is already created by making a request to its localhost port, and save the output in a specific folder?
Something like:
https://localhost:9001/execute/spider_name/folder_path
Example of my own usage:
First I run the container and leave it running, because I can't stop it for other reasons:
docker run -i -t -d --rm -v <PROJECTS_FOLDER>:/app/data/projects:rw -p 9001:9001 scrapinghub/portia
Next I execute the portiacrawl:
docker exec <CONTAINER_ID> portiacrawl <PROJECT_NAME_PATH> <SPIDER_NAME> -o /some/path/in/my/pc/<SPIDER_NAME>.json
Now, what I want is to replace the docker exec step with some HTTP request to the localhost server that is running.
Thanks very much for your time

Yes, you can, by doing a port mapping. When you start a Docker container, no ports are published publicly or exposed internally unless you tell Docker to do so.
For example:
If you wish to expose a port internally (inside the Docker network itself), you need to add EXPOSE in the Dockerfile. Note that EXPOSE by itself does not publish the port to the host.
If you wish to publish a port so that it can be accessed through localhost or the public IP, use the -p option and pass the ports. In your case it will look like this:
docker run -p 9001:9001 imagename
The command above tells Docker to map port 9001 on the host (reachable through localhost or any other interface) to port 9001 inside the container. You can change the ports according to your actual setup.
If you wish to publish it on localhost only, you can change the command to something like this:
docker run -p 127.0.0.1:9001:9001 imagename
For more information check the following docs
According to the updated question, the other and safest way to accomplish this is to implement an API inside portiacrawl that can be called through HTTP to do the needed tasks, instead of using docker exec.
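One way to picture such an API is a small HTTP wrapper that runs inside the container next to portiacrawl and shells out to it. The sketch below is only an illustration under several assumptions: the /execute/<spider_name> route, the PROJECT_PATH and OUTPUT_DIR values, the helper names and port 9002 are all made up for the example (9001 is already used by Portia itself), and the portiacrawl arguments simply follow the docker exec command from the question. The extra port would also have to be published with -p when the container is started.

```python
import re
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values: adjust to where the project is mounted in the container.
PROJECT_PATH = "/app/data/projects/my_project"
OUTPUT_DIR = "/app/data/projects/output"

# Only allow simple spider names so the URL cannot smuggle in extra arguments.
SPIDER_NAME = re.compile(r"[A-Za-z0-9_-]+")


def build_command(spider_name):
    """Return the portiacrawl command for one spider, or None if the name is unsafe."""
    if not SPIDER_NAME.fullmatch(spider_name):
        return None
    output_file = f"{OUTPUT_DIR}/{spider_name}.json"
    return ["portiacrawl", PROJECT_PATH, spider_name, "-o", output_file]


class CrawlHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expected path: /execute/<spider_name>
        parts = self.path.strip("/").split("/")
        command = None
        if len(parts) == 2 and parts[0] == "execute":
            command = build_command(parts[1])
        if command is None:
            self.send_error(404, "use /execute/<spider_name>")
            return
        # Blocks until the crawl finishes; acceptable for a sketch only.
        result = subprocess.run(command)
        self.send_response(200 if result.returncode == 0 else 500)
        self.end_headers()


def serve(port=9002):
    """Start the wrapper; meant to be called from inside the Portia container."""
    HTTPServer(("0.0.0.0", port), CrawlHandler).serve_forever()
```

Calling serve() inside the container and then requesting http://localhost:9002/execute/<SPIDER_NAME> from the host would take the place of the docker exec step. The output folder is fixed on the server side here rather than taken from the URL, so a request cannot write to arbitrary paths.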

Related

Container didn't respond to HTTP pings on port: 80, failing site start

I have deployed the python:3.8-slim-buster image to the App Service. Generally it runs correctly, as I can see the processing in the logs. However, the health-check mechanism tries to ping the hosted server, and it does not respond because it is only code that runs in a loop and processes the messages from the queue.
It would be fine, but the application is being killed with the error:
Container didn't respond to HTTP pings on port: 80, failing site start.
Stopping site because it failed during startup.
Is there a way either to remove this "Waiting for response to warmup request for container" check, or to specify in the Dockerfile that the container should respond with OK to those requests?
Currently my dockerfile is a 2 liner, that only copies the scripts and then runs python script.
The code that is inside this script is copied from https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-python-get-started-send#create-a-python-script-to-receive-events
The Dockerfile:
FROM python:3.8-slim-buster
COPY ./Scripts .
CMD [ "python3","-u","./calculate.py"]
The fix for that is either to host the script in a server (e.g. Node.js or the equivalent for the given language), or to create a separate process that returns something on port 80.
There might also be a problem with the port configured by default; here is an answer on how to fix that case:
https://serverfault.com/questions/1003418/azure-docker-app-error-site-did-not-start-within-expected-time-limit-and-co
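The "separate process that returns something on port 80" idea can also be done from within the Python script itself, by answering the pings from a background thread while the main loop keeps processing messages. This is a minimal sketch using only the standard library; the HealthHandler and start_health_server names are made up for illustration, and port 80 is taken from the error message above.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every GET with a plain 200 OK so the warmup ping succeeds.
        body = b"OK"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the health pings out of the processing logs.
        pass


def start_health_server(port=80):
    """Serve HealthHandler on a daemon thread and return the server object."""
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

In calculate.py this would mean calling start_health_server() once before entering the receive loop. Because the thread is a daemon, it stops when the main script exits.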
step-1
add EXPOSE 8080 inside Dockerfile
step-2
build image from Dockerfile:
docker build . -t python-calculator
step-3
docker images #search the image with the tag you mentioned earlier i.e python-calculator
step-4
docker run -p 8080:8080 -d python-calculator
step-5
open http://localhost:8080 in a browser

Azure App Service: specifying docker run port on container side

I want to use scrapinghub/splash container on Azure App Service (Web App for Containers) on Linux.
But docker run command on deploy randomly changes the binding port of container side (see the log below, port 8961 is automatically assigned. this number varies every deploy)
2020-01-21 08:56:47.494 INFO - docker run -d -p 8961:8050 --name b2scraper-splash_3_d89ce1f2 -e WEBSITES_ENABLE_APP_SERVICE_STORAGE=false -e WEBSITES_PORT=8050 -e WEBSITE_SITE_NAME=b2scraper-splash -e WEBSITE_AUTH_ENABLED=False -e PORT=8050 -e WEBSITE_ROLE_INSTANCE_ID=0 -e WEBSITE_HOSTNAME=b2scraper-splash.azurewebsites.net -e WEBSITE_INSTANCE_ID=5446f93a2cbcb25300f091395c54ce738773ce47489c2818322ffabbc23e3413 scrapinghub/splash:latest python3 /app/bin/splash --proxy-profiles-path /etc/splash/proxy-profiles --js-profiles-path /etc/splash/js-profiles --filters-path /etc/splash/filters --lua-package-path "/etc/splash/lua_modules/?.lua" --disable-private-mode --port 8050
Changing the host port binding is possible using WEBSITES_PORT, but there seems to be no way to change the container side.
Is there a way to fix the container-side port binding, like -p 8050:80 or -p 8050:443 of the docker run command?
e.g. Using the container on Azure Container Instances is possible, without changing service port 8050.
--publish in the docker run command creates a firewall rule which maps a container port to a port on the Docker host.
https://docs.docker.com/config/containers/container-networking/
For the command: docker run -d -p 8961:8050 imagename, TCP port 8050 in the container is mapped to 8961 on the Docker Host. On App Services, this docker run command cannot be changed. The container port (8050 in this case) can be set to a specific value using WEBSITES_PORT application setting.
That doesn't work. You get 443 as the port with HTTPS.
Neither EXPOSE XXXX, nor WEBSITES_PORT or PORT as configuration parameters...
You do see "docker run -d -p 8961:8050" in the logs, but it doesn't matter to Azure when it comes to exposing the app...

get a python docker container to interact with a redis docker container

I'm new to docker, redis and any kind of networking, (I know python at least!). Firstly I have figured out how to get a redis docker image and run it in a docker container:
docker run --name some-redis -d redis
As I understand this redis instance has port 6379 available to connect to other containers.
docker network inspect bridge
"Containers": {
"2ecceba2756abf20d5396078fd9b2ecf0d60ab04ca6b8df5e1b631b6fb5e9a85": {
"Name": "some-redis",
"EndpointID": "09f0069dae3632a2456cb4d82ad5e7c9782a2b58cb7a4ee655f57b5c410c3e87",
"MacAddress": "02:42:ac:11:00:02",
"IPv4Address": "172.17.0.2/16",
"IPv6Address": ""
}
If I run the following command I can interact with the redis instance and generate key:value pairs:
docker run -it --link some-redis:redis --rm redis redis-cli -h redis -p 6379
set 'a' 'abc'
>OK
get 'a'
>"abc"
quit
I have figured out how to make and run a docker container with the redis library installed that will run a python script as follows:
Here is my Dockerfile:
FROM python:3
ADD redis_test_script.py /
RUN pip install redis
CMD [ "python", "./redis_test_script.py" ]
Here is redis_test_script.py:
import redis
print("hello redis-py")
Build the docker image:
docker build -t python-redis-py .
If I run the following command the script runs in its container:
docker run -it --rm --name pyRed python-redis-py
and returns the expected:
>hello redis-py
It seems like both containers are working OK; the problem is connecting them together. I would ultimately like to use Python to perform operations on the redis container. If I modify the script as follows and rebuild the image for the Python container, it fails:
import redis
print("hello redis-py")
r = redis.Redis(host="localhost", port=6379, db=0)
r.set('z', 'xyz')
r.get('z')
I get several errors:
...
OSError: [Errno 99] Cannot assign requested address
...
redis.exceptions.ConnectionError: Error 99 connecting to localhost:6379. Cannot assign requested address.
.....
It looks like they're not connecting, I tried again using the bridge IP in the python script:
r = redis.Redis(host="172.17.0.0/16", port=6379, db=0)
and get this error:
redis.exceptions.ConnectionError: Error -2 connecting to 172.17.0.0/16:6379. Name or service not known.
and I tried the redis sub IP:
r = redis.Redis(host="172.17.0.2/16", port=6379, db=0)
and I get this error:
redis.exceptions.ConnectionError: Error -2 connecting to 172.17.0.2/16:6379. Name or service not known.
It feels like I'm fundamentally misunderstanding something about how to get the containers to talk to each other. I've read quite a lot of documentation and tutorials but as I say have no networking experience and have not previously used docker so any helpful explanations and/or solutions would be really great.
Many thanks
That's all about Docker networking. Fast solution - use host network mode for both containers. Drawback is low isolation, but you will get it working fast:
docker run -d --network=host redis ...
docker run --network=host python-redis-py ...
Then to connect from python to redis just use localhost as a hostname.
A better solution is to use a Docker user-defined bridge network:
# create network
docker network create foo
docker run -d --network=foo --name my-db redis ...
docker run --network=foo python-redis-py ...
Note that in this case you cannot use localhost; use my-db as the hostname instead. That's why I've used the --name my-db parameter when starting the first container. In user-defined bridge networks, containers reach each other by their names.
Do:
Explicitly create a Docker network for your application, and run your containers connected to that network. (If you use Docker Compose, this happens for you automatically and you don’t need to do anything.)
docker network create foo
docker run -d --net foo --name some-redis redis
docker run -it --rm --net foo --name pyRed python-redis-py
Use containers’ --name as DNS hostnames: you connect to some-redis:6379 to reach the container. (In Docker Compose the name of the service block works too.)
Make the locations of external services configurable, most likely using an environment variable. In your Python code (with os imported) you can connect to
redis.Redis(host=os.environ.get("REDIS_HOST", "localhost"),
            port=int(os.environ.get("REDIS_PORT", "6379")))
and then set the variable when you start the container:
docker run --rm -it \
--name py-red \
--net foo \
-e REDIS_HOST=some-redis \
python-redis-py
Don’t:
docker inspect anything to find the container-private IP addresses. Between containers you can always use hostnames as described above. The container-private IP addresses are unreachable from other hosts, and may even be unreachable from the same hosts on some platforms.
Use localhost in Docker for anything, except the specific case of connecting from a browser or other process running directly on the host (not in a container) to a port you've published with docker run -p on the same host. (Inside a container, localhost generally means "this container".)
Hard-code host names in your code like this; it makes it hard to run the service in a different environment. (For databases in particular it’s not uncommon to run them outside of Docker or even in a hosted cloud service.)
Use --link, it’s outdated and unnecessary.
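When it is unclear whether the Python container can see the Redis container at all, a quick check with only the standard library can separate name-resolution problems from connection problems. This is a minimal sketch: the REDIS_HOST and REDIS_PORT variable names follow the example above, and the check_service helper is made up for illustration.

```python
import os
import socket


def check_service(host, port, timeout=2.0):
    """Return 'ok', 'dns-failure' or 'unreachable' for host:port."""
    try:
        # Resolve the name first: a host that cannot be resolved fails here.
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        return "dns-failure"
    try:
        # Then try an actual TCP connection to the resolved address.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError:
        return "unreachable"


if __name__ == "__main__":
    host = os.environ.get("REDIS_HOST", "localhost")
    port = int(os.environ.get("REDIS_PORT", "6379"))
    print(f"{host}:{port} -> {check_service(host, port)}")
```

A host string such as 172.17.0.2/16 is expected to fail at the resolution step, because the /16 suffix makes it a network range rather than a host name or address; that matches the "Name or service not known" error in the question. Run inside a container started with --net foo -e REDIS_HOST=some-redis, the same check should get past resolution.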

Running Jupyter notebook in Docker

I want to run Jupyter in a Docker container, but I am not able to launch the Jupyter notebook. When I copy and paste the URL given in the terminal, the server cannot be reached. I would appreciate any ideas to try.
You are forwarding port 8080 in the docker run call with -p 8080:8080, but you also need to forward port 8888 by adding -p 8888:8888. More specifically, you want to run:
docker run -it -p 8080:8080 -p 8888:8888 jupyter/minimal-notebook
First, Jupyter Notebook runs on port 8888. If you want to access the notebook on a different port on your host, you should map it like this: -p 80:8888.
If you don't mind using the defaults, run this command: docker run -p 8888:8888 jupyter/minimal-notebook. Then replace the host name in the URL given in the terminal with localhost, like this:
http://localhost:8888/?token=<TOKEN>&token=<TOKEN>
This should work.
Note: If you map it to a different port, you should replace the port in the URL you get in the terminal. Ex. http://localhost:80/?token=<TOKEN>&token=<TOKEN>
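The host and port replacement described above can also be done programmatically. This is a small sketch with the standard library; the to_local_url helper, the container host name and the token value are hypothetical and only serve as an example.

```python
from urllib.parse import urlsplit, urlunsplit


def to_local_url(printed_url, host_port=8888):
    """Swap the host and port of the URL Jupyter prints for localhost:<host_port>."""
    parts = urlsplit(printed_url)
    return urlunsplit(parts._replace(netloc=f"localhost:{host_port}"))


# Container started with -p 80:8888, so the host side of the mapping is port 80.
print(to_local_url("http://0d3f2a1b9c4e:8888/?token=abc123", host_port=80))
# prints http://localhost:80/?token=abc123
```

The path and the token query string are left untouched; only the part between the scheme and the path changes.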

Getting MYSQL_TCP_ADDR From within Docker container using Python?

So I am having a dilemma. I made a Flask app that uses a MySQL DB for storing usernames and passwords when people log into the app. My question is: is there a dynamic way to get the MySQL TCP address from within my Python code itself? What I am currently doing is just hard-coding the host like so:
app.config['MYSQL_DATABASE_USER'] = 'root'
app.config['MYSQL_DATABASE_PASSWORD'] = ''
app.config['MYSQL_DATABASE_DB'] = 'UserList'
app.config['MYSQL_DATABASE_HOST'] = '172.17.0.3'
But what I would like is to make the host dynamic so if I ever build another mysql container, I won't have to manually change the IP every time. I know you can do this command,
env | grep MYSQL
MYSQL_PORT_5123_TCP_ADDR=172.17.0.3
MYSQL_ENV_MYSQL_ROOT_PASSWORD=test
MYSQL_PORT_5123_TCP=tcp://172.17.0.3:5123
MYSQL_PORT_5123_TCP_PROTO=tcp
MYSQL_ENV_GOSU_VERSION=1.7
MYSQL_PORT_3306_TCP_PORT=3306
MYSQL_PORT_3306_TCP=tcp://172.17.0.3:3306
MYSQL_PORT_5123_TCP_PORT=5123
MYSQL_ENV_MYSQL_VERSION=5.7.18-1debian8
MYSQL_NAME=/site-metrics/mysql
MYSQL_PORT_3306_TCP_PROTO=tcp
MYSQL_PORT_3306_TCP_ADDR=172.17.0.3
MYSQL_ENV_MYSQL_MAJOR=5.7
MYSQL_PORT=tcp://172.17.0.3:3306
But is there a way to do this within my Python script so I do not have to fiddle with this every time? Thanks for the help!
You can achieve this by running both containers on the same network. I'll assume a local network for now but overlay works in the same way.
$ docker network create my-network
$ docker run --name db --net my-network <other args...>
$ docker run --name web --net my-network <other args...>
Then in your app config:
app.config['MYSQL_DATABASE_HOST'] = 'db'
Because both containers are on the same network, they will be able to resolve each other via DNS (using their container names).
Edit: In fact an even better way of doing it would be with an env variable that you pass in. Then you have the best of both worlds. Something like:
app.config['MYSQL_DATABASE_HOST'] = os.getenv('DB_HOST', 'db')
The second argument to os.getenv() is a default value. So by default it will use 'db' but if for whatever reason in your environment you need to change it or have a different Docker service name, you can just run the container with:
$ docker run --name other_db --net my-network <other args...>
$ docker run --name web --net my-network -e DB_HOST=other_db <other args...>
Then it would try to connect to other_db:<port> instead.
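Putting the pieces together, the lookup can live in a small helper that reads the environment from Python. This is a sketch under assumptions: DB_HOST is the variable name suggested above, MYSQL_PORT_3306_TCP_ADDR and MYSQL_PORT_3306_TCP_PORT are the link-style variables shown in the question's env output, and the mysql_host and mysql_settings helpers as well as the MYSQL_DATABASE_PORT key are introduced here for illustration.

```python
import os


def mysql_host(environ=os.environ, default="db"):
    """Pick the MySQL host: explicit DB_HOST, then the link-style variable, then a default."""
    return (
        environ.get("DB_HOST")
        or environ.get("MYSQL_PORT_3306_TCP_ADDR")
        or default
    )


def mysql_settings(environ=os.environ):
    """Build host and port settings in the style of the app config from the question."""
    return {
        "MYSQL_DATABASE_HOST": mysql_host(environ),
        "MYSQL_DATABASE_PORT": int(environ.get("MYSQL_PORT_3306_TCP_PORT", "3306")),
    }
```

In the Flask app this could be applied with app.config.update(mysql_settings()), so the IP address never has to be edited by hand when the database container is rebuilt.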
