I have data stored in an S3 bucket that I don't want to be public. I'm attempting to call pandas.read_csv("s3_file_path") to load a pandas DataFrame in a script that runs in a Docker container, and I get a permission denied error. How do I pull the DataFrame while giving AWS the credentials it wants?
The end goal of this project is to create a REST API that will process some data with a statistical model and return the results. I am also open to a completely different approach that avoids this problem altogether.
As I am the only user of this AWS account, just to get it working, I tried putting my AWS keys directly in the Dockerfile and running aws configure, essentially copying the exact process I would use if I were doing this without Docker. Obviously that is insecure, but I was simply trying to get it to work before implementing anything more complex. Unfortunately, it didn't.
Current Dockerfile
FROM python:3
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install -r requirements.txt
ARG AWS_KEY=My_Actual_Public_Key_In_Plain_Text
ARG AWS_SECRET_KEY=My_Actual_Secret_Key_In_Plain_Text
ARG AWS_REGION='us-east-1'
RUN aws configure set aws_access_key_id $AWS_KEY \
&& aws configure set aws_secret_access_key $AWS_SECRET_KEY \
&& aws configure set default.region $AWS_REGION
COPY . .
CMD [ "python", "./run.py" ]
run.py
from module import app
app.run(host="0.0.0.0", port=80, debug=True)
from __init__.py in module
from flask import Flask
import pandas as pd
import numpy as np

app = Flask(__name__)

file_name = "s3://foo/bar.csv"
df = pd.read_csv(file_name)

@app.route("/")
def index():
    return "Hello World!"
The error I get is:
PermissionError: Access Denied
Assuming that you have s3fs installed, as per the pandas docs. Adding a print for debugging:
from flask import Flask
import pandas as pd
import numpy as np

file_name = "s3://foo/bar.csv"
df = pd.read_csv(file_name)
print(df)

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello World!"
Per the Dockerfile reference, "The ARG instruction defines a variable that users can pass at build-time". In this case you need the credentials to be available at runtime, not during the build, so you can pass them in the container's runtime environment, for example:
FROM python:3
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install -r requirements.txt
# Credentials are passed at run time via `docker run -e ...`; do not bake them into the image
ENV AWS_REGION='us-east-1'
COPY . .
ENTRYPOINT ["flask"]
CMD ["run"]
Build the image: docker build --rm -t so:57700120 .
Run the container: docker run --rm -it -p 5000:5000 -e AWS_ACCESS_KEY_ID=... -e AWS_SECRET_ACCESS_KEY=... so:57700120
Note: boto does not recognize AWS_KEY / AWS_SECRET_KEY; see the boto3 documentation for the environment variables it does recognize (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, etc.).
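If you would rather not pass raw keys on the command line at all, an alternative (a sketch, assuming you have a working ~/.aws directory on the host) is to bind-mount it read-only, so that boto3/s3fs pick the credentials up from the shared credentials file:

docker run --rm -it -p 5000:5000 \
  -v $HOME/.aws:/root/.aws:ro \
  so:57700120

In a real deployment you would typically attach an IAM role (an ECS task role or EC2 instance profile) instead of shipping long-lived keys into the container at all.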
Related
I'm working on an app for my homelab, where I have an Intel NUC that I'm using for some web scraping tasks. The NUC is accessible on my home network at 192.xxx.x.xx.
On the NUC I've set up nginx to proxy incoming HTTP requests to a Docker container. In that container I've got a basic FastAPI app to handle the requests.
app.main.py
import os
from pathlib import Path
from fastapi import FastAPI
app = FastAPI()
cron_path = Path(os.getcwd(), "app", "cron.log")
#app.get("/cron")
def cron():
with cron_path.open("rt") as cron:
return {"cron_state": cron.read().split("\n")}
app.cronjob.py
import os
from pathlib import Path
from datetime import datetime
cron_path = Path(os.getcwd(), "app", "cron.log")
def append_time():
    with cron_path.open("rt") as filein:
        text = filein.read()
    text += f"\n{datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ')}"
    with cron_path.open("wt") as fileout:
        fileout.write(text)
if __name__ == "__main__":
    append_time()
cron-job
* * * * * python3 /code/app/cronjob.py
# An empty line is required at the end of this file for a valid cron file.
Dockerfile
FROM python:3.10-slim-buster
#
WORKDIR /code
COPY ./cron-job /etc/cron.d/cron-job
COPY ./app /code/app
COPY ./requirements.txt /code/requirements.txt
# Set the required permissions on the cron job file
RUN chmod 0644 /etc/cron.d/cron-job
#Install Cron
RUN apt-get update
RUN apt-get -y install cron
# Apply cron job
RUN crontab /etc/cron.d/cron-job
#
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
#
EXPOSE 8080
CMD crontab ; uvicorn app.main:app --host 0.0.0.0 --port 8080
I can access the app without issues, but I can't seem to get the cron job to run while FastAPI is running. Is what I'm attempting to do better suited to a pure-Python solution like from fastapi_utils.tasks import repeat_every, or is there something I'm missing?
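One thing worth checking before switching to repeat_every: in the CMD above, crontab with no arguments only tries to read a new crontab from stdin; it never starts the cron daemon, so the installed job never fires. A minimal sketch of the fix, keeping the rest of the Dockerfile as-is:

# Start the cron daemon in the background, then run the API server in the foreground
CMD cron && uvicorn app.main:app --host 0.0.0.0 --port 8080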
When running my Flask app from a Docker container and calling the endpoints it defines, I receive the error message ERR_EMPTY_RESPONSE. The app uses Python's subprocess to run scrapy within Flask, as specified here (How to integrate Flask & Scrapy?). Executing the Flask app outside of my Docker container (python app.py, where app.py has my Flask code), everything works as intended and my spiders are called via subprocess.
Instead of using Flask & subprocess to call my spiders within a web app, I also tried the twisted & twisted-klein Python libraries, with the same result when called from a Docker container. I have also created a new, clean scrapy project with no specific code of my own, just the standard scrapy code and project structure upon creation; this resulted in the same error. I am not quite certain whether my approach is an anti-pattern, since Flask and scrapy are bundled into one image, resulting in one container for two purposes.
Here is my server.py code. When executing outside a container (using python interpreter) everything works as intended.
When running it from a container, then I receive the error message (ERR_EMPTY_RESPONSE).
# server.py
import subprocess
from flask import Flask
from medien_crawler.spiders.firstclassspider import FirstClassSpider

app = Flask(__name__)

@app.route("/")
def return_hello():
    return "Hello!"

@app.route("/firstclass")
def return_firstclass_comments():
    spider_name = "firstclass"
    response = subprocess.call(['scrapy', 'crawl', spider_name, '-a', 'start_url=https://someurl.com'])
    return "OK!"

if __name__ == "__main__":
    app.run(debug=True)
FROM python:3
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000
CMD [ "python", "./server.py" ]
Finally I run docker run -p 5000:5000 . It does not work. Any ideas?
Please try this:
Dockerfile
FROM python:3.6
RUN apt-get update && apt-get install -y wget
WORKDIR /usr/src/app
ADD . /usr/src/app
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 5000
CMD [ "python", "./server.py" ]
I have the following simple Cloud Run service from the Python quickstart:
app.py:
import os
from flask import Flask
app = Flask(__name__)
@app.route('/')
def hello_world():
    return 'Hello World!\n'

if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))
Dockerfile:
FROM python:3.7
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . .
RUN pip install Flask
CMD python app.py
How can I run & test this locally?
Similar to any other Dockerfile, you can use these two commands to build your image and then run it locally:
$ docker build -t your_service .
$ docker run --rm -p 8080:8080 -e PORT=8080 your_service
It's important to specify the PORT environment variable here, and ensure that your app uses it appropriately.
Afterwards, your service will be running on http://localhost:8080
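A quick smoke test from another terminal (assuming the defaults above):

$ curl http://localhost:8080/
Hello World!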
I'm trying to create a Docker container so I can build a GUI with Flask for using a TensorFlow model.
The thing is that I would like to be able to modify my Python files in real time without having to rebuild my container every time.
So for now I've created 3 files :
requirements.txt
Flask
tensorflow
keras
Dockerfile
# Use an official Python runtime as a parent image
FROM python:3.5.6-slim
# Set the working directory to /app
WORKDIR /app
# Copy the current directory contents into the container at /app
ADD . /app
# Install any needed packages specified in requirements.txt
RUN pip install --trusted-host pypi.python.org -r requirements.txt
# Make port 80 available to the world outside this container
EXPOSE 80
# Define environment variable
ENV NAME World
# Run app.py when the container launches
CMD ["python3", "app.py"]
app.py
from flask import Flask
import os
import socket
app = Flask(__name__)
#app.route("/")
def test():
html = "<h3>Hello {name}!</h3>" \
"<b>Hostname:</b> {hostname}<br/>"
return html.format(name=os.getenv("NAME", "world"), hostname=socket.gethostname())
if __name__ == '__main__':
app.run(host='0.0.0.0', port=80)
So after all this I build my container with this command
docker build -t modelgui .
And then I use this command to run my container, bind-mounting the app directory I want to modify on the host over the one in the container:
docker run -p 4000:80 -v /home/Documents/modelGUI:/app modelgui
But I get this error and I really don't know why
/usr/local/bin/python3: can't find '__main__' module in 'app.py'
My problem might be dumb to resolve but I'm really stuck here.
Check that /home/Documents/modelGUI in your bind mount is really the path where your code files reside, and that app.py in that path is a Python file with the code you intend to run, not a directory.
If app.py in /home/Documents/modelGUI is a directory, then you are not calling your script app.py at all; you are giving the Python interpreter the name of a similarly named directory, which it then tries to execute by looking for a __main__ module inside it.
I've tried to replicate:
$ ls -lFs
Dockerfile
app.py/
requirements.txt
Then called the Python interpreter with app.py:
$ python3 app.py
/usr/local/bin/python3: can't find '__main__' module in 'app.py'
Running this locally, it looks like mounting your volume is overwriting your directory:
No volume
docker run -it test_image bash
root@c3870b9845c3:/app# ls
Dockerfile  app.py  requirements.txt
root@c3870b9845c3:/app# python app.py
* Serving Flask app "app" (lazy loading)
* Environment: production
WARNING: Do not use the development server in a production environment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://0.0.0.0:80/ (Press CTRL+C to quit)
With volume
docker run -it -v ~/Barings_VSTS/modelGUI:/app test_image bash
root@f6349f899079:/app# ls
somefile.txt
root@f6349f899079:/app#
That could be part of the issue. If you want to mount a filesystem in, I would mount it into a different directory. A bind mount shadows the image's contents: whatever you copied into /app at build time is hidden by the contents of modelGUI at run time.
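If the goal is editing the Flask code live without rebuilds, the bind mount into /app can still work, as a sketch, assuming the host directory actually contains app.py and all dependencies are already baked into the image:

# Run from the directory that contains app.py; the mount shadows the image's
# /app, so everything the CMD needs must exist on the host side
docker run -p 4000:80 -v "$(pwd)":/app modelgui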
I'm building a simple app using a Dockerfile, app.py, and requirements.txt. When the image builds I get the error "No such file or directory". However, when I change ADD to COPY in the Dockerfile, it works. Do you know why this is?
I'm using the tutorial: https://docs.docker.com/get-started/part2/#define-a-container-with-a-dockerfile
app.py
from flask import Flask
from redis import Redis, RedisError
import os
import socket
# Connect to Redis
redis = Redis(host="redis", db=0, socket_connect_timeout=2, socket_timeout=2)
app = Flask(__name__)
#app.route("/")
def hello():
try:
visits = redis.incr("counter")
except RedisError:
visits = "<i>cannot connect to Redis, counter disabled</i>"
html = "<h3>Hello {name}!</h3>" \
"<b>Hostname:</b> {hostname}<br/>" \
"<b>Visits:</b> {visits}"
return html.format(name=os.getenv("NAME", "world"), hostname=socket.gethostname(), visits=visits)
if __name__ == "__main__":
app.run(host='0.0.0.0', port=80)
requirements.txt
Flask
Redis
Dockerfile
# Use an official Python runtime as a parent image
FROM python:2.7-slim
# Set the working directory to /app
WORKDIR /app
# Copy the current directory contents into the container at /app
ADD . /app
# Install any needed packages specified in requirements.txt
RUN pip install -r requirements.txt
# Make port 80 available to the world outside this container
EXPOSE 80
# Define environment variable
ENV NAME World
# Run app.py when the container launches
CMD ["python", "app.py"]
In the first one, your working directory inside the container is /app, but you copy the contents to /tmp; to correct this behavior, copy the contents to /app and it will work fine.
The second one, where you use ADD, is correct, since you are adding the contents to /app and not to /tmp.
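In other words, whichever instruction you use, make the destination match the WORKDIR that later steps rely on. A minimal sketch:

WORKDIR /app
# For local files COPY and ADD behave the same; the destination is what matters
COPY . /app
RUN pip install -r requirements.txt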