Converting RGB PDFs to CMYK PDFs via GhostScript CLI / Python (quality problem)

I am trying to achieve the following using a python script:
Read in an SVG design file (with images)
Manipulate the SVG file
Convert this to a web-ready PDF and a print-ready PDF
My problem is with the conversion of the RGB PDF to the CMYK PDF. An SVG with a 15MB photo in it will export as a 15MB RGB PDF, but then convert (using GhostScript) to a 3MB CMYK PDF. With ImageMagick, the resolution of the output PDF is determined by the density, and I can't find out how to keep the PDF's canvas size while setting the density.
So far, I have a script which reads in the SVG files and does some manipulation (add a logo using svgutils, change some text by scanning through the SVG text file). It then uses Inkscape to export the web-ready PDF (using "--export-area-page" and converting the text to paths) and a temporary PDF (using "--export-margin=X" where X is the bleed size, also converting text to paths). The temporary PDF is what I need, except it is RGB rather than CMYK. So, I then want to convert this file (Inkscape does not work with CMYK).
This is the function I am using to convert the file (it is set up for GhostScript, and I was also trialling ImageMagick):
import os
import subprocess

converter_program = "GHOSTSCRIPT"

def convertPDFtoPrintReadyPDF(pdf_in, new_filename=None, output_location=None):
    global converter_program
    if new_filename is None:
        new_filename = os.path.basename(pdf_in).replace(".svg", ".pdf")
    if output_location is None:
        output_location = os.path.dirname(pdf_in)
    output_file = output_location + "\\" + new_filename
    argument_list = []
    if (converter_program == "GHOSTSCRIPT"):
        pdf_tool_loc = r'"C:\Program Files\gs\gs9.55.0\bin\gswin64c.exe"' # Added "c" at end for non-window version (command line)
        argument_list.append('-o "' + output_file + '"')
        argument_list.append(r"-sDEVICE=pdfwrite")
        argument_list.append(r"-dUseBleedBox")
        argument_list.append(r"-dQUIET")
        argument_list.append(r"-dPDFSETTINGS=/printer")
        argument_list.append(r"-dCompressPages=false")
        argument_list.append(r"-dMaxInlineImageSize=200000")
        argument_list.append(r"-dDetectDuplicateImages")
        #argument_list.append(r"-dJPEGQ=100")
        argument_list.append(r"-dAutoFilterColorImages=false")
        argument_list.append(r"-dAutoFilterGrayImages=false")
        #argument_list.append(r"-sCompression=Flate")
        #breaks the code: argument_list.append(r"-sColorImageFilter=/Flate")
        #argument_list.append(r"-r600")
        argument_list.append(r"-dColorImageResolution=600")
        argument_list.append(r"-dGrayImageResolution=300")
        argument_list.append(r"-dMonoImageResolution=1200")
        argument_list.append(r"-dDownsampleColorImages=false")
        argument_list.append(r"-sProcessColorModel=DeviceCMYK")
        argument_list.append(r"-sColorConversionStrategy=CMYK")
        argument_list.append(r"-sColorConversionStrategyForImages=CMYK")
        argument_list.append('"' + pdf_in + '"')
    elif (converter_program == "IMAGEMAGICK"):
        pdf_tool_loc = 'magick'
        argument_list.append(r'convert "' + pdf_in + '"')
        argument_list.append(r"-density 300")
        argument_list.append(r"-resize 100%")
        argument_list.append(r"-colorspace CMYK")
        argument_list.append('"' + output_file + '"')
        #convert tp_rgb.pdf -verbose -density 300 -colorspace CMYK tp_cmyk.pdf
    argument_string = " ".join(argument_list)
    subprocess.run(pdf_tool_loc + " " + argument_string, shell=True, check=True)
    return output_file
Versions:
Python 3.8.10
GhostScript 9.55.0
ImageMagick 7.1.0-16

I found some GhostScript parameters to add to the conversion process:
argument_list.append(r"-dAutoFilterColorImages=false")
argument_list.append(r"-dAutoFilterGrayImages=false")
argument_list.append(r"-dColorImageFilter=/FlateEncode")
argument_list.append(r"-dGrayImageFilter=/FlateEncode")
argument_list.append(r"-dDownsampleMonoImages=false")
argument_list.append(r"-dDownsampleGrayImages=false")
So, the full argument list looks like this:
argument_list.append('-o "' + output_file + '"')
argument_list.append(r"-sDEVICE=pdfwrite")
argument_list.append(r"-dUseBleedBox")
argument_list.append(r"-dQUIET")
argument_list.append(r"-dDetectDuplicateImages")
argument_list.append(r"-dAutoFilterColorImages=false")
argument_list.append(r"-dAutoFilterGrayImages=false")
argument_list.append(r"-dColorImageFilter=/FlateEncode")
argument_list.append(r"-dGrayImageFilter=/FlateEncode")
argument_list.append(r"-dDownsampleMonoImages=false")
argument_list.append(r"-dDownsampleGrayImages=false")
argument_list.append(r"-dColorImageResolution=300")
argument_list.append(r"-dGrayImageResolution=300")
argument_list.append(r"-sProcessColorModel=DeviceCMYK")
argument_list.append(r"-sColorConversionStrategy=CMYK")
argument_list.append(r"-sColorConversionStrategyForImages=CMYK")
argument_list.append('"' + pdf_in + '"')
This turned the 15MB to 3MB conversion into a 15MB to 53MB conversion. The growth is consistent with FlateEncode being a lossless filter, so the images are no longer recompressed as JPEG.
It still needs some tweaking, but it is now on the right track (I will update this answer if I improve the process).
I found the information thanks to this post: http://zeroset.mnim.org/2014/07/14/save-a-pdf-to-cmyk-with-inkscape/
The documentation is here (when searching it, drop the leading letter of the switch, e.g. search for "ColorImageFilter" rather than "dColorImageFilter"): https://www.ghostscript.com/doc/current/VectorDevices.htm
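As a rough sketch, the same switches can be passed to subprocess.run as a list rather than one shell string, which avoids the manual quoting of paths. The function names build_gs_cmyk_args and convert_to_cmyk are hypothetical, and the executable name "gswin64c" is an assumption taken from the Windows path in the question; the switches themselves are the ones listed above.

```python
import subprocess


def build_gs_cmyk_args(gs_exe, pdf_in, pdf_out):
    """Return the Ghostscript command as a list, so no shell quoting is needed.

    gs_exe is whatever Ghostscript executable is installed (assumption:
    "gswin64c" on Windows, as in the question).
    """
    return [
        gs_exe,
        "-o", pdf_out,
        "-sDEVICE=pdfwrite",
        "-dUseBleedBox",
        "-dQUIET",
        "-dDetectDuplicateImages",
        # Turn off automatic (JPEG) filter selection and use lossless Flate
        "-dAutoFilterColorImages=false",
        "-dAutoFilterGrayImages=false",
        "-dColorImageFilter=/FlateEncode",
        "-dGrayImageFilter=/FlateEncode",
        "-dDownsampleMonoImages=false",
        "-dDownsampleGrayImages=false",
        "-dColorImageResolution=300",
        "-dGrayImageResolution=300",
        # Colour conversion to CMYK
        "-sProcessColorModel=DeviceCMYK",
        "-sColorConversionStrategy=CMYK",
        "-sColorConversionStrategyForImages=CMYK",
        pdf_in,
    ]


def convert_to_cmyk(gs_exe, pdf_in, pdf_out):
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_gs_cmyk_args(gs_exe, pdf_in, pdf_out), check=True)
    return pdf_out


args = build_gs_cmyk_args("gswin64c", "temp_rgb.pdf", "print_cmyk.pdf")
```

Calling convert_to_cmyk("gswin64c", "temp_rgb.pdf", "print_cmyk.pdf") would then run the conversion, assuming the executable is on the PATH; the file names here are placeholders.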

Related

Use relative path to get all png files in python

Trying to retrieve all the paths of the pngs in different sub folders.
All sub folders are located within a main folder - logs.
pngs = []
for idx, device in enumerate(udid):
    pngs += glob.glob(os.getcwd() + "/logs/" + device + "_" + get_model_of_android_phone(device) + "/" + "*.png")
File structure
logs/123456789_SM-G920I/123456789google_search_android.png
The values in bold will change. I have added in *.png for the changing pngs.
But how do I get the paths of the pngs when I do not have an absolute path to the png file?
Update
get_model_of_android_phone(device) is a method to get the following value here.
E.g. 123456789_SM-G920I
I am thinking of removing it because it is not really working as intended. I would like to replace the method with something like *

You can use the following, simpler approach to get all the file names:
for name in glob.glob(os.getcwd() + "/logs/**/*.png", recursive=True):
    print('\t', name)
When recursive is set, ** matches zero or more subdirectories when followed by a separator.
If you just want to make a list, use the following code snippet:
pngs = glob.glob(os.getcwd() + "/logs/**/*.png", recursive=True)
It will return a list of all png file paths.
Reference : https://docs.python.org/3/library/glob.html
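A small self-contained sketch of the recursive pattern, run against a temporary folder so it does not depend on the real logs directory. The helper name find_pngs and the sample file names are made up for illustration; the folder name mirrors the example in the question.

```python
import glob
import os
import tempfile


def find_pngs(logs_dir):
    """Return all .png paths under logs_dir, at any depth, in sorted order."""
    pattern = os.path.join(logs_dir, "**", "*.png")
    return sorted(glob.glob(pattern, recursive=True))


# Build a throwaway folder layout that mimics logs/<device>_<model>/<file>.png
with tempfile.TemporaryDirectory() as tmp:
    device_dir = os.path.join(tmp, "logs", "123456789_SM-G920I")
    os.makedirs(device_dir)
    for name in ("a.png", "b.png", "notes.txt"):
        open(os.path.join(device_dir, name), "w").close()

    found = find_pngs(os.path.join(tmp, "logs"))
    names = [os.path.basename(path) for path in found]
```

Passing a relative directory such as "logs" to find_pngs yields paths relative to the current working directory, so os.getcwd() is not strictly needed.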

pngs = []
for idx, device in enumerate(udid):
    path_device = os.getcwd() + "/logs/" + device + "_" + get_model_of_android_phone(device) + "/"
    file_list = os.listdir(path_device)
    # Extend the list so that results from earlier devices are not overwritten
    pngs += [path_device + file_png for file_png in file_list if str(file_png).endswith(".png")]

Process 24bit wavs using pydub

I'm trying to write a basic program that picks 2 audio files at random, layers them, and writes the result out as a new file. I'm using pydub and it all works, but the results are distorted. I suspect this is because, from what I've learnt, pydub cannot handle 24 bit wavs, which happen to be the norm in sample packs.
So I need a small snippet of code that converts the wav to 16 bit before it enters pydub, preferably one that does not require writing it to disk first.
from pydub import AudioSegment
import os
import random
import shutil

def process(user_folder):
    new_library_folder = user_folder + " Generated Combo Samples"
    files_list = []
    for root, directory, files in os.walk(user_folder):
        for file in files:
            if file_is_valid_ext(file):
                filepath = str(root) + "/" + str(file)
                # print filepath
                files_list.append(filepath)
    # removes previously created folder
    shutil.rmtree(new_library_folder)
    os.makedirs(new_library_folder)
    i = 0
    for number in range(gen_count): # global at 100
        i = i + 1
        file1 = random.choice(files_list)
        file2 = random.choice(files_list)
        sound1 = AudioSegment.from_file(file1)
        sound2 = AudioSegment.from_file(file2)
        sound1 = match_target_amplitude(sound1, -20)
        sound2 = match_target_amplitude(sound2, -20)
        combinedsound = sound1.overlay(sound2)
        combinedsoundnormalised = match_target_amplitude(combinedsound, -6)
        combinedsound_path = new_library_folder + "/" + "Sample " + str(i) + ".wav"
        combinedsoundnormalised.export(combinedsound_path, format='wav')

It has been some months since you posted this question but I will answer it for others who may need a solution to this. As far as I have found, PySoundFile is the only python package that can deal with 24 bit audio (I am sure there are others but I haven't found them). My suggestion would be to initially read in the audio using PySoundFile and then create a pydub.AudioSegment using that data.
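As an alternative to PySoundFile, here is a minimal sketch that does the 24 bit to 16 bit step in memory with only the standard library wave module. This is a different technique from the one suggested above, and it rests on assumptions: the files are plain PCM WAVs that wave can open (older Python versions reject the WAVE_FORMAT_EXTENSIBLE header that some 24 bit files use), and simple truncation without dithering is acceptable. The function names are hypothetical.

```python
import io
import wave


def pcm24_to_pcm16(frames):
    """Keep the two most significant bytes of each little-endian 24-bit sample."""
    sample_count = len(frames) // 3
    out = bytearray(sample_count * 2)
    out[0::2] = frames[1:sample_count * 3:3]  # middle byte becomes the low byte
    out[1::2] = frames[2:sample_count * 3:3]  # high byte stays the high byte
    return bytes(out)


def read_wav_as_16bit(path_or_file):
    """Read a PCM WAV and return (frames, sample_width, frame_rate, channels)."""
    with wave.open(path_or_file, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.readframes(wav.getnframes())
    if width == 3:  # 24-bit samples: truncate to 16 bits
        frames = pcm24_to_pcm16(frames)
        width = 2
    return frames, width, rate, channels


# Demonstration on an in-memory 24-bit WAV with two samples (0x123456, 0xABCDEF)
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(3)
    wav.setframerate(44100)
    wav.writeframes(bytes([0x56, 0x34, 0x12, 0xEF, 0xCD, 0xAB]))
buffer.seek(0)

frames, width, rate, channels = read_wav_as_16bit(buffer)
```

The returned values should map onto pydub's raw-data constructor, AudioSegment(data=frames, sample_width=width, frame_rate=rate, channels=channels), so nothing needs to be written to disk first.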

python filenames encoded wrong

I've got some python 2.7.3 code on the Raspberry Pi B+ that loops and creates some dummy files. Unfortunately, the filenames get garbled somewhere between python and the filesystem.
settime = strftime("%Y-%m-%d %H:%M:%S")
for n in range(int(setlength)):
    time.sleep(GetShutterSpeed(shutter))
    image = 'img/'
    image += settime
    image += shutter + ' '
    image += 'ISO' + iso
    image += ' #' + str(n+1).rjust(len(setlength), '0')
    image += ' of ' + setlength
    image += '.jpg'
    with open(image, 'w') as f:
        f.write('')
    ro(strftime("%H:%M:%S") + ' > ' + image)
Here's what I get when I browse to where the images are saved from a windows share over samba.
Why are these files named so strangely?
Is this because I'm trying to save text files named as .jpgs? Or am I messing up the character encoding somewhere? I don't have any problems opening files in the same python code that already exist. I tried removing the #s, but that didn't matter. If I pass in a /, it balks completely, which is as expected. Just can't figure out what I'm doing wrong with the filename itself.

Convert a color image to input vector for libsvm

I have a set of labeled training images (color). I want to use these images as input to libsvm.
Is there a Python library/function/code that can help me convert a color image into a format that libsvm will accept as input?
Thanks.

When I have done that in python I just converted the image into an array (I always had worked on grayscale images though), something like:
outpath="images\\ts\\"
fp = open(outpath + 'trainingset.dump','wb')
dirList = os.listdir(path)
for fname in dirList:
    if '.jpg' in fname:
        im = array(Image.open(path + "\\" + fname).convert('L'))
        pickle.dump(im,fp)
        pickle.dump(1,fp)
fp.close()
At the moment of passing that to libsvm you would need to turn it into a list:
path="images\\ts\\trainingset.dump"
fp = open(path, "rb")  # the dump was written in binary mode, so read it the same way
im = pickle.load(fp)
label = pickle.load(fp)
imlist = im.tolist()
imlist = [item for sublist in imlist for item in sublist]
y.append(label)
x.append(imlist)
print "Setting up the SVM problem"
prob = svm_problem(y, x)
param = svm_parameter('-t 2 -g 0.00001')
param.C = 1
print "Starting the training process"
m=svm_train(prob, param)
print "Storing the model"
svm_save_model(model, m)
Please note that although I am experienced with libsvm, I have mostly used it from C++. I am new to Python, so this might not be the best approach; it is just what worked for me. Also, if you would like to keep the colours, you could try not converting the image to grayscale. I haven't tried it myself, but it should work. Instead of:
im = array(Image.open(path + "\\" + fname).convert('L'))
use this:
im = array(Image.open(path + "\\" + fname))
Hope this would help.
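To illustrate the flattening step for colour images without any image library, here is a small sketch that turns rows of (R, G, B) pixels into one feature list and, for reference, into a line of libsvm's text format ("label index:value ...", with 1-based indices). The function names and the scaling to [0, 1] are illustrative choices, not part of the answer above.

```python
def flatten_pixels(pixels):
    """Flatten rows of (R, G, B) tuples into one list scaled to [0, 1]."""
    return [channel / 255.0 for row in pixels for pixel in row for channel in pixel]


def to_libsvm_line(label, features):
    """Format one sample as a libsvm text line, skipping zero-valued features."""
    parts = [str(label)]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)


# A 1x2 "image": one pure red pixel and one pure green pixel
pixels = [[(255, 0, 0), (0, 255, 0)]]
features = flatten_pixels(pixels)
line = to_libsvm_line(1, features)
```

The flat features list is the same shape of data that the answer appends to x, so it can be passed to svm_problem directly; the text line is only needed when writing a training file for the command-line tools.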

Raster to polygon script loop failing!! error 99999!

I am trying to write a script that selects every .png file in a folder whose name begins with the letters "LG". The script should then create a shapefile, replacing the "LG" with "SH", and then buffer that shapefile, naming the buffer so that its first 2 letters are "SB".
I keep getting an error 99999 message at line 37:
gp.RasterToPolygon_conversion(INPUT_RASTER, Output_polygon_features, "SIMPLIFY", "VALUE")
Can anyone see why this isn't working? I am very, very new to this and have been staring at this script for days!
Here is the script:
# Load required toolboxes...
gp.AddToolbox("C:/Program Files/ArcGIS/ArcToolbox/Toolboxes/Conversion Tools.tbx")
gp.AddToolbox("C:/Program Files/ArcGIS/ArcToolbox/Toolboxes/Analysis Tools.tbx")
# Script arguments...
folder = "D:\\J04-0083\\IMAGEFILES"
for root, dirs, filenames in os.walk(folder): # returns root, dirs, and files
    for filename in filenames:
        filename_split = os.path.splitext(filename) # filename and extensionname (extension in [1])
        filename_zero = filename_split[0]
        try:
            first_2_letters = filename_zero[0] + filename_zero[1]
        except:
            first_2_letters = "XX"
        if first_2_letters == "LG":
            Output_polygon_features = "D:\\J04-0083\\ShapeFiles.gdb\\" + "SH_" + filename + ".shp"
            # Process: Raster to Polygon...
            INPUT_RASTER = os.path.join(root + "\\" + filename_zero + ".png")
            gp.RasterToPolygon_conversion(INPUT_RASTER, Output_polygon_features, "SIMPLIFY", "VALUE")
            Distance__value_or_field_ = "5 Meters"
            Raster_Buffer_shp = "SB_" + filename + ".shp"
            # Process: Buffer...
            gp.Buffer_analysis(Output_polygon_features, Raster_Buffer_shp, Distance__value_or_field_, "FULL", "ROUND", "NONE", "")

Is .png the format that this function wants? PNG is a compressed format, so I would think that something like this would be expecting an uncompressed format. In fact, since the name of the function is RasterToPolygon_conversion, wouldn't the function be expecting a raster format? The docs say that the input should be an integer raster dataset, and that the input raster can have any cell size and may be any valid integer raster dataset. Anyway, I suspect that is the real problem.
The last thing to check, if the file is in the correct format as per above, is if there is a field VALUE in the file.

Try using a GRID or TIFF file instead of a PNG.
You can convert the PNG with:
http://webhelp.esri.com/arcgiSDEsktop/9.3/index.cfm?TopicName=raster_to_other_format_(multiple)_(conversion)
and then pass its output to the Raster to Polygon conversion.
You could also check that the file path of the INPUT_RASTER looks correct by printing it:
INPUT_RASTER = os.path.join(root + "\\" + filename_zero + ".png")
print INPUT_RASTER
gp.RasterToPolygon_conversion(INPUT_RASTER, Output_polygon_features, "SIMPLIFY", "VALUE")
There is also a method of building a filepath by:
import os
root + os.sep + filename_zero + '.png'
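As a sketch of the path handling only (no ArcGIS calls), os.path.join can be given the folder and file name as separate arguments, and the "SH" and "SB" names can be derived from the file name without its extension. The helper name derive_names and the sample file name are hypothetical.

```python
import os


def derive_names(root, filename):
    """Return input and output names for an "LG" raster, or None for other files."""
    stem, _ext = os.path.splitext(filename)
    if not stem.startswith("LG"):
        return None
    suffix = stem[2:]  # everything after the leading "LG"
    return {
        "input_raster": os.path.join(root, filename),
        "polygons": "SH" + suffix,  # "LG" replaced with "SH"
        "buffer": "SB" + suffix,    # "LG" replaced with "SB"
    }


names = derive_names("IMAGEFILES", "LG_tile01.png")
skipped = derive_names("IMAGEFILES", "other.png")
```

Deriving the output names from the stem keeps the ".png" extension out of them, which the original script includes by concatenating the full filename.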
