The problem is related to using the LibreOffice headless converter to automatically convert uploaded files. I am getting this error:
LibreOffice 7 fatal error - Application cannot be started
Ubuntu version: 21.04
What I have tried:
Get the file from Azure Blob Storage,
put it into BASE_DIR/Input_file,
convert it to PDF using a Linux command that I run via subprocess,
put the result into the BASE_DIR/Output_file folder.
Below is my code:
I am installing LibreOffice into the Docker image this way:
RUN apt-get update \
&& ACCEPT_EULA=Y apt-get -y install LibreOffice
The main logic:
blob_client = container_client.get_blob_client(f"Folder_with_reports/")
with open(os.path.join(BASE_DIR, f"input_files/{filename}"), "wb") as source_file:
    source_file.write(data)
source_file = os.path.join(BASE_DIR, f"input_files/{filename}") # original docs here
output_folder = os.path.join(BASE_DIR, "output_files") # pdf files will be here
# assign the command of converting files through LibreOffice
command = rf"lowriter --headless --convert-to pdf {source_file} --outdir {output_folder}"
# running the command
subprocess.run(command, shell=True)
# reading the file and uploading it back to Azure Storage
with open(os.path.join(BASE_DIR, f"output_files/MyFile.pdf"), "rb") as outp_file:
    outp_data = outp_file.read()
blob_name_ = f"test"
container_client.upload_blob(name=blob_name_, data=outp_data, blob_type="BlockBlob")
Should I install lowriter instead of LibreOffice? Is it okay to use BASE_DIR for this kind of operation? I would appreciate any suggestions.
Partial solution:
Here I have simplified the case and created an additional Docker image with the Dockerfile below.
I apply both methods: unoconv and direct conversion.
Dockerfile:
FROM ubuntu:21.04
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get -y upgrade && \
apt-get -y install python3.10 && \
apt update && apt install python3-pip -y
# Method1 - installing LibreOffice and java
RUN apt-get --no-install-recommends install libreoffice -y
RUN apt-get install -y libreoffice-java-common
# Method2 - additionally installing unoconv
RUN apt-get install -y unoconv
ARG CACHEBUST=1
ADD BASE.py /code/BASE.py
# copying input doc/docx files to the docker's linux
COPY /input_files /code/input_files
CMD ["/code/BASE.py"]
ENTRYPOINT ["python3"]
BASE.py
import os
import subprocess

BASE_DIR = "/code"

# subprocess.run(f"ls {BASE_DIR}/input_files", shell=True)
for filename in os.listdir(f"{BASE_DIR}/input_files"):
    source_file = f"{BASE_DIR}/input_files/{filename}"  # original document
    output_filename = os.path.splitext(filename)[0] + ".pdf"
    output_file = f"{BASE_DIR}/output_files/{output_filename}"
    output_folder = f"{BASE_DIR}/output_files"  # pdf files will be here

    # METHOD 1 - LibreOffice directly
    # assign and run the command that converts files through LibreOffice
    convert_to_pdf = f"libreoffice --headless --convert-to pdf {source_file} --outdir {output_folder}"
    subprocess.run(convert_to_pdf, shell=True)
    subprocess.run(f"ls {output_folder}", shell=True)

    ## METHOD 2 - Using unoconv - also working
    # convert_to_pdf = f"unoconv -f pdf {source_file}"
    # subprocess.run(convert_to_pdf, shell=True)
    # print(f'file {filename} converted')
The above-mentioned methods work if the files are already in the Linux filesystem at build time. But I still haven't found a way to get files into the filesystem after the Docker image has been built.
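One way to get files into the container after the image is built is a bind mount at run time. A minimal sketch, assuming the image is tagged libreoffice-convert (a hypothetical name):

docker build -t libreoffice-convert .
# mount host folders over the image's input/output paths at run time
docker run --rm \
    -v "$(pwd)/input_files:/code/input_files" \
    -v "$(pwd)/output_files:/code/output_files" \
    libreoffice-convert

The container then converts whatever is in the host's input_files at run time, and the resulting PDFs appear in the host's output_files.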
Hello, I am trying to convert PNG images to SVG. On my Windows computer I could convert PNGs with this code:
import aspose.words as aw
doc = aw.Document()
builder = aw.DocumentBuilder(doc)
shape = builder.insert_image("negative.png")
shape.image_data.save("Output.svg")
But now I'm on Pop!_OS and it gives this error:
No usable version of libssl was found
Aborted (core dumped)
I tried updating openssl and installing libssl-dev.
Any ideas on how to fix that?
Actually, the code you are using does not convert PNG to SVG. The ImageData.save method saves the image in its original format, so the Output.svg file will simply be a PNG file with an SVG extension.
If you need to wrap your PNG to SVG, you can use ShapeRenderer:
import aspose.words as aw

doc = aw.Document()
builder = aw.DocumentBuilder(doc)
shape = builder.insert_image("C:\\Temp\\in.png")
shape.get_shape_renderer().save("C:\\Temp\\out.svg", aw.saving.ImageSaveOptions(aw.SaveFormat.SVG))
Also, please see Linux system requirements of Aspose.Words for Python.
You should install libssl on your system. For example, here is an Ubuntu Docker configuration:
FROM ubuntu:22.04
RUN apt update \
&& apt install -y python3.10 python3-pip libgdiplus wget \
&& wget http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1l-1ubuntu1_amd64.deb \
&& dpkg -i ./libssl1.1_1.1.1l-1ubuntu1_amd64.deb \
&& rm libssl1.1_1.1.1l-1ubuntu1_amd64.deb \
&& python3.10 -m pip install unittest-xml-reporting==3.2.0
ENTRYPOINT ["/usr/bin/python3.10"]
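A minimal sketch of building and using this image (the tag and script path are hypothetical, and it assumes aspose-words has also been pip-installed in the image):

docker build -t aspose-linux .
docker run --rm -v "$(pwd):/work" aspose-linux /work/convert.py

where convert.py contains the ShapeRenderer snippet above, with Linux paths instead of C:\Temp.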
I'm trying to train YoloV5 on AWS SageMaker with custom data (stored in S3) via a Docker image (ECR), and I keep getting "AlgorithmError: , exit code: 1". Can someone please tell me how to debug this problem?
Here's the Docker Image :
# GET THE AWS IMAGE
FROM 763104351884.dkr.ecr.eu-west-3.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker
# UPDATES
RUN apt update
RUN DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt install -y tzdata
RUN apt install -y python3-pip git zip curl htop screen libgl1-mesa-glx libglib2.0-0
RUN alias python=python3
# INSTALL REQUIREMENTS
COPY requirements.txt .
RUN python3 -m pip install --upgrade pip
RUN pip install --no-cache -r requirements.txt albumentations gsutil notebook \
coremltools onnx onnx-simplifier onnxruntime openvino-dev tensorflow-cpu tensorflowjs
COPY code /opt/ml/code
WORKDIR /opt/ml/code
RUN git clone https://github.com/ultralytics/yolov5 /opt/ml/code/yolov5
ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/code
ENV SAGEMAKER_PROGRAM trainYolo.py
ENTRYPOINT ["python", "trainYolo.py"]
And here's trainYolo.py :
import json
import os
import numpy as np
import cv2 as cv
import subprocess
import yaml
import shutil
trainSet = os.environ["SM_CHANNEL_TRAIN"]
valSet = os.environ["SM_CHANNEL_VAL"]
output_dir = os.environ["SM_CHANNEL_OUTPUT"]
#Creating the data.yaml for yolo
dict_file = [{'names': ['block']},
             {'nc': ['1']},
             {'train': [trainSet]},
             {'val': [valSet]}]
with open(r'data.yaml', 'w') as file:
    documents = yaml.dump(dict_file, file)
#Execute this command to train Yolo
res = subprocess.run(["python3", "yolov5/train.py", "--batch", "16", "--epochs", "100", "--data", "data.yaml", "--cfg", "yolov5/models/yolov5s.yaml", "--weights", "yolov5s.pt", "--cache"], shell=True)
shutil.copy("yolov5", output_dir)
Note: I'm not sure if subprocess.run() works in an environment such as SageMaker.
Thank you
So your training script is not configured properly. When using a SageMaker estimator or Script Mode, you must configure it in a format that will save the model properly. Here's an example notebook with TensorFlow and Script Mode. If you would like to build your own Dockerfile (Bring Your Own Container), then you would have to configure your train file as shown in the second link.
Script-Mode: https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/RealTime/Script-Mode/TensorFlow/Classification
BYOC: https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/RealTime/BYOC/Sklearn/Sklearn-Regressor/container/randomForest
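The key convention (a minimal sketch, not a full training script; the best.pt path is an assumption based on YOLOv5's default output location) is to write the trained artifacts to SM_MODEL_DIR so SageMaker uploads them when the job ends:

import os
import shutil
import subprocess

train_set = os.environ["SM_CHANNEL_TRAIN"]
val_set = os.environ["SM_CHANNEL_VAL"]
model_dir = os.environ["SM_MODEL_DIR"]  # SageMaker uploads this folder to S3 after training

# run training with an argument list and no shell=True,
# so every flag is passed through to train.py unchanged
subprocess.run(
    ["python3", "yolov5/train.py",
     "--batch", "16", "--epochs", "100",
     "--data", "data.yaml",
     "--cfg", "yolov5/models/yolov5s.yaml",
     "--weights", "yolov5s.pt", "--cache"],
    check=True,
)

# copy the resulting weights into SM_MODEL_DIR
# (runs/train/exp/weights/best.pt is YOLOv5's default; adjust if your run differs)
shutil.copy("yolov5/runs/train/exp/weights/best.pt", model_dir)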
I have written a Dockerfile which adds my Python script inside the container:
ADD test_pclean.py /test_pclean.py
My directory structure is:
.
├── Dockerfile
├── README.md
├── pipeline.json
└── test_pclean.py
My JSON file, which acts as the configuration file for creating a pipeline in Pachyderm, is as follows:
{
  "pipeline": {
    "name": "mopng-beneficiary-v2"
  },
  "transform": {
    "cmd": ["python3", "/test_pclean.py"],
    "image": "avisrivastava254084/mopng-beneficiary-v2-image-7"
  },
  "input": {
    "atom": {
      "repo": "mopng_beneficiary_v2",
      "glob": "/*"
    }
  }
}
Even though I have copied the official documentation's example, I am facing an error:
python3: can't open file '/test_pclean.py': [Errno 2] No such file or directory
My Dockerfile is:
FROM debian:stretch
# Install opencv and matplotlib.
RUN apt-get update \
&& apt-get upgrade -y \
&& apt-get install -y unzip wget build-essential \
cmake git pkg-config libswscale-dev \
&& apt-get clean \
&& rm -rf /var/lib/apt
RUN apt update
RUN apt-get -y install python3-pip
RUN pip3 install matplotlib
RUN pip3 install pandas
ADD test_pclean.py /test_pclean.py
ENTRYPOINT [ "/bin/bash/" ]
As some of the comments above suggest, it looks like your test_pclean.py file isn't in the Docker image. Here's what should fix it.
Make sure your test_pclean.py file is in your Docker image by having it included as part of the build process. Put this as the last step in your Dockerfile:
COPY test_pclean.py .
Ensure that your Pachyderm pipeline spec has the following for the cmd portion:
"cmd": ["python3", "./test_pclean.py"]
And this is more of a suggestion than a requirement: you'll make life easier for yourself if you use image tags as part of your docker build. If you default to the latest tag, any future iterations/builds of this step in your pipeline could have negative effects (new bugs in your code, etc.). Therefore the best practice is to use a particular version in your pipeline: mopng-beneficiary-v2-image-7:v1, then mopng-beneficiary-v2-image-7:v2, and so on. That way you can iterate on, say, version 3 and it won't affect the already running pipeline.
docker build -t avisrivastava254084/mopng-beneficiary-v2-image-7:v1 .
Then just update your pipeline spec to use avisrivastava254084/mopng-beneficiary-v2-image-7:v1
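For example, the transform section of the pipeline spec above would then read:

"transform": {
  "cmd": ["python3", "./test_pclean.py"],
  "image": "avisrivastava254084/mopng-beneficiary-v2-image-7:v1"
}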
I was not changing the tag on my Docker image with each build, and hence Kubernetes was using the local image that it had (without new tags, it doesn't acknowledge any change). Once I started using a new tag with each build, Kubernetes started downloading the intended Docker image.
Running Docker for Windows (Version 18.06.1-ce-win73 (19507))
Calling a behave application (a Python testing framework) using a Docker Compose file:
version: "3"
services:
  behave:
    build:
      context: .
    environment:
      NODE_ENV: test
      DB_DATABASE: testdb
    volumes:
      - ".:/app"
    command:
      - bash
      - run_test.sh
      - docker
      - --capture
      - --stop
      - ${FEATURE:-feature}/
      - ${TAGS}
    network_mode: host
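For reference, the FEATURE and TAGS variables are supplied from the host shell when the service is started; roughly like this (the values here are hypothetical):

FEATURE=login TAGS=--tags=@smoke docker-compose up behave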
The Dockerfile is pretty vanilla:
FROM ubuntu:16.04
RUN apt-get update \
&& apt-get -y upgrade \
&& apt-get -y install build-essential software-properties-common curl bzip2 libfreetype6 libfontconfig wget libcurl4-openssl-dev
RUN cd /usr/local/share \
&& wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-1.9.7-linux-x86_64.tar.bz2 \
&& tar xjf phantomjs-1.9.7-linux-x86_64.tar.bz2 \
&& ln -s /usr/local/share/phantomjs-1.9.7-linux-x86_64/bin/phantomjs /usr/local/share/phantomjs \
&& ln -s /usr/local/share/phantomjs-1.9.7-linux-x86_64/bin/phantomjs /usr/local/bin/phantomjs \
&& ln -s /usr/local/share/phantomjs-1.9.7-linux-x86_64/bin/phantomjs /usr/bin/phantomjs
RUN apt-get -y install python python-dev python-setuptools python-pycurl python-tz python-pymongo python-cffi python-openssl python-pip
ADD . /app
RUN cd /app \
&& apt-get -y install python-httplib2 \
&& pip install -r requirements.txt \
&& python -m easy_install --upgrade pyOpenSSL
WORKDIR /app
ENV HOME=/app
In the behave app, I use standard Python logging, as such:
fileHandler = logging.FileHandler(BASEDIR + "logs/" + name + ".log", mode='w')
fileHandler.setLevel(flevel)
fileHandler.setFormatter(logFormatter)
rootLogger.addHandler(fileHandler)
This runs fine in linux directly. And, it also runs fine in WSL directly (i.e. not via docker).
The failure is:
File "/app/testlib/log_helpers/__init__.py", line 55, in get_dual_logger
fileHandler = logging.FileHandler(BASEDIR + "logs/" + name + ".log", mode='a')
File "/usr/lib/python2.7/logging/__init__.py", line 913, in __init__
StreamHandler.__init__(self, self._open())
File "/usr/lib/python2.7/logging/__init__.py", line 944, in _open
stream = open(self.baseFilename, self.mode)
IOError: [Errno 2] No such file or directory: '/app/logs/behave.log'
To see if anything strange was happening, I added a simple print statement in /usr/lib/python2.7/logging/__init__.py before it opens the file. As my test starts, it prints:
Opening /app/logs/behave.log with: a
Opening /app/logs/behave.log with: a
before failing (so the first open works, the second fails).
If I update my logger to always write to a unique file with code like this:
while os.path.exists(BASEDIR + "logs/" + name + ".log"):
    name += "_"
fileHandler = logging.FileHandler(BASEDIR + "logs/" + name + ".log", mode='a')
...
then it works fine.
Opening /app/logs/behave.log with: a
Opening /app/logs/behave_.log with: a
Opening /app/logs/behave__.log with: a
Opening /app/logs/behave___.log with: a
Opening /app/logs/behave____.log with: a
Opening /app/logs/behave_____.log with: a
Opening /app/logs/behave______.log with: a
Opening /app/logs/behave_______.log with: a
Opening /app/logs/behave________.log with: a
Opening /app/logs/behave_________.log with: a
Opening /app/logs/behave__________.log with: a
Opening /app/logs/behave___________.log with: a
Opening /app/logs/behave____________.log with: a
So, the core of the problem is this standard-library call (in logging/__init__.py):
stream = open(self.baseFilename, self.mode)
Whether I use a mode of "w" or "a", the second time a log file is opened by Python inside a Docker container running on WSL, it fails.
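Distilled down, the failing pattern looks like this (a sketch of the behavior above, independent of the logging module):

# first open of the path succeeds
f = open("/app/logs/behave.log", "a")
f.close()
# a second open of the same path inside the container fails with
# IOError: [Errno 2] No such file or directory
f = open("/app/logs/behave.log", "a")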
Has anybody ever seen anything like this in WSL? Any workaround?
Seems very, very specific to my use-case, not sure if this is known or not
This is my Dockerfile:
FROM python:3.6.5-alpine3.7
RUN mkdir folder_1
RUN mkdir folder_2
RUN apk --update add build-base libffi-dev openssl-dev python-dev py-pip p7zip libc6-compat libstdc++
RUN pip install fabric3 boto3 csvsort
EXPOSE <port>
ADD directory/ /
CMD ["python", "scriptname.py"]
The application runs a series of steps, one of which is to extract 7z files from folder_1 to folder_2. It is able to find folder_1 and the source folder, but is unable to find folder_2. I logged into the container to make sure the folder exists, and it does.
I found another question with a similar problem: https://serverfault.com/questions/883625/alpine-shell-cant-find-file-in-docker and also installed libc6-compat and libstdc++ according to the answer.
This is the line of code that's failing:
os.system('7za x ' + source_path + file_name + ' -' +
          file_decryption_password +
          ' -o' + destination_path)
Here, destination_path is 'folder_2/' and the exact error that I get is:
sh: -ofolder_2/: not found
The command and the Docker image work fine on my Mac laptop, but the Docker image fails on the Linux-based server.
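For comparison, here is a list-based call that bypasses the shell entirely (a sketch using the same variable names; -p and -o are 7za's password and output-directory switches, written with no space before the value):

import subprocess

# an argument list is passed to 7za directly: nothing is re-split
# on whitespace or newlines hidden inside the variables
subprocess.run(
    ['7za', 'x', source_path + file_name,
     '-p' + file_decryption_password,
     '-o' + destination_path],
    check=True,
)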