matplotlib: saved imshow pdf looks different from the plot window - python

The following figure was plotted using imshow in matplotlib with option interpolation='none':
However, after I saved it as a pdf file, the saved pdf file looks quite different:
The problem is: the blue patterns become very blurry.
My question is: How can I save a pdf figure that looks exactly like the plot window?

I solved this problem by specifying the dpi in savefig for the pdf file type. Even though I read online that, in theory, dpi is not supposed to make a difference in the vector-based pdf format, it did solve the problem for me in practice.
import numpy as np
import matplotlib.pyplot as plt
plt.imshow(np.random.random((10,10)))
plt.savefig("test.pdf", dpi=300)
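One possible explanation, offered as an assumption rather than something this answer confirms: with interpolation='none' the pdf backend embeds the array at its native resolution and leaves the scaling to the viewer, whereas interpolation='nearest' lets matplotlib resample the image itself at the dpi passed to savefig. A minimal sketch of that variant (the random data and the file name are placeholders):

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no plot window is needed
import matplotlib.pyplot as plt
import numpy as np

data = np.random.random((10, 10))  # placeholder data

fig, ax = plt.subplots()
# 'nearest' asks matplotlib to do the resampling instead of the pdf viewer
ax.imshow(data, interpolation="nearest")

# Placeholder output location
out_path = os.path.join(tempfile.gettempdir(), "test_nearest.pdf")
fig.savefig(out_path, dpi=300)
plt.close(fig)
```

Whether this removes the blur may depend on the pdf viewer, so it is worth comparing the result in more than one program.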

PDF is a vector image format. This means it is up to the program you open it in to decide how it should be drawn. That is a benefit when you want to zoom in and out of an image arbitrarily while keeping high quality. However, some programs modify the image through anti-aliasing.
Your best bet for consistency is to use a pixel-based image format. I suggest saving it as a .png.

Related

How to save grayscale image in Python?

I am trying to save a grayscale image using matplotlib savefig(). I find that the png file which is saved after the use of matplotlib savefig() is a bit different from the output image which is showed when the code runs. The output image which is generated when the code is running contains more details than the saved figure.
How can I save the output plot in such a manner that all details are stored in the output image?
My code is given below:
import cv2
import matplotlib.pyplot as plt
plt.figure(1)
img_DR = cv2.imread('image.tif',0)
edges_DR = cv2.Canny(img_DR,20,40)
plt.imshow(edges_DR,cmap = 'gray')
plt.savefig('DR.png')
plt.show()
The input file ('image.tif') can be found here.
Following is the output image which is generated when the code is running:
Below is the saved image:
Although the two aforementioned images denote the same picture, one can notice that they are slightly different. A keen look at the circular periphery of the two images shows that they are different.
Save the actual image to file, not the figure. The DPI between the figure and the actual created image from your processing will be different. Since you're using OpenCV, use cv2.imwrite. In your case:
cv2.imwrite('DR.png', edges_DR)
Use the PNG format as JPEG is lossy and would thus give you a reduction in quality to promote small file sizes. If accuracy is the key here, use a lossless compression standard and PNG is one example.
If you are somehow opposed to using OpenCV, Matplotlib has an equivalent image writing method called imsave which has the same syntax as cv2.imwrite:
plt.imsave('DR.png', edges_DR, cmap='gray')
Note that I am enforcing the colour map to be grayscale for imsave as it is not automatically inferred like how OpenCV writes images to file.
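As a rough check of the imsave route, the sketch below writes a small synthetic array (a stand-in for edges_DR, since the original image.tif is not available here) and reads it back to confirm that the saved file has one pixel per array element. The array contents and the output path are assumptions made for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no plot window is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for edges_DR: a black image with a white rectangle
edges = np.zeros((120, 160), dtype=np.uint8)
edges[30:90, 40:120] = 255

# Placeholder output location
out_path = os.path.join(tempfile.gettempdir(), "DR_imsave.png")

# imsave writes the array itself, with no axes, margins or figure dpi involved
plt.imsave(out_path, edges, cmap="gray")

# Read the file back and compare pixel dimensions with the array
saved = plt.imread(out_path)
print("array shape:", edges.shape, "saved shape:", saved.shape[:2])
```

Because no figure is involved, the saved image dimensions follow the array shape rather than the figure size and dpi.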
Since you are using cv2 to load the image, why not also use it to save the image?
I think the command you are looking for is:
cv2.imwrite('gray.jpg', gray_image)
Using a DPI that matches the image size seems to make a difference.
The image is of size width=2240 and height=1488 (img_DR.shape). Using fig.get_size_inches() I see that the image size in inches is array([7.24, 5.34]). So an appropriate dpi is about 310 since 2240/7.24=309.4 and 1488/5.34=278.65.
Now I do plt.savefig('DR.png', dpi=310) and get
One experiment to do would be to choose a high enough DPI, calculate height and width of figure in inches, for example width_inch = width_pixel/DPI and set figure size using plt.figure(figsize=(width_inch, height_inch)), and see if the displayed image itself would increase/decrease in quality.
Hope this helps.
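The experiment described above can be sketched as follows. The image here is random placeholder data with an arbitrary size, and the dpi of 100 is an assumed value; the axes are stretched over the whole figure so that one image pixel maps to one saved pixel.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no plot window is needed
import matplotlib.pyplot as plt
import numpy as np

# Placeholder image; in the question this would be edges_DR
height_pixel, width_pixel = 200, 400
img = np.random.randint(0, 256, size=(height_pixel, width_pixel), dtype=np.uint8)

dpi = 100  # assumed value; any dpi works as long as figsize is derived from it
width_inch = width_pixel / dpi
height_inch = height_pixel / dpi

fig = plt.figure(figsize=(width_inch, height_inch), dpi=dpi)
ax = fig.add_axes([0, 0, 1, 1])  # axes fill the whole figure, no margins
ax.axis("off")
ax.imshow(img, cmap="gray", interpolation="nearest", aspect="auto")

# Placeholder output location
out_path = os.path.join(tempfile.gettempdir(), "DR_matched_dpi.png")
fig.savefig(out_path, dpi=dpi)
plt.close(fig)

# The saved file should have the same pixel dimensions as the input array
saved = plt.imread(out_path)
print("input:", img.shape, "saved:", saved.shape[:2])
```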

Is it possible to generate vector based pdf using wordcloud

I am using wordcloud in python to generate word clouds.
I was able to reproduce this example on my machine, and then tried to change the last line plt.show() to plt.savefig('image.pdf') to have a pdf output.
I got a pdf with the same result; however, the pdf seems to be pixel-based instead of vector-based. When I zoom in on a particular point in the pdf, it becomes a very low-quality picture.
Is there any way to produce a vector-based pdf using wordcloud? If not, is there any other Python library with which I can produce vector-based (pdf) word clouds?
If wordcloud can generate any sort of vector output such as ps or svg, inkscape can usually convert it to a PDF without rasterizing it. You can even do this headless, e.g. inkscape my.svg -A my.pdf.
Hmm, looking at wordcloud, it looks like it uses PIL. I don't think that PIL can produce vector images. But if you could use the logic in wordcloud and separate it from PIL, you can get vector fonts onto PDFs by drawing onto a reportlab canvas.
You can save the images in a vector format so that they will be scalable without quality loss. Such formats are PDF and EPS. Just change the extension to .pdf or .eps and matplotlib will write the correct image format.
plt.savefig('destination_path.eps', format='eps')
plt.savefig('destination_path.pdf', format='pdf')
I have found that eps/pdf files work best.

How do you improve matplotlib image quality?

I am using a python program to produce some data, plotting the data using matplotlib.pyplot and then displaying the figure in a latex file.
I am currently saving the figure as a .png file but the image quality isn't great. I've tried changing the DPI in matplotlib.pyplot.figure(dpi=200) etc. but this seems to make little difference. I've also tried using different image formats but they all look a little faded and not very sharp.
Has anyone else had this problem?
Any help would be much appreciated
You can save the images in a vector format so that they will be scalable without quality loss. Such formats are PDF and EPS. Just change the extension to .pdf or .eps and matplotlib will write the correct image format. Remember that LaTeX likes EPS and PDFLaTeX likes PDF images, although most modern LaTeX executables are PDFLaTeX in disguise and convert EPS files on the fly (the same effect as if you included the epstopdf package in your preamble, which may not perform as well as you'd like).
Alternatively, increase the DPI, a lot. These are the numbers you should keep in mind:
300dpi: plain paper prints
600dpi: professional paper prints. Most commercial office printers reach this in their output.
1200dpi: professional poster/brochure grade quality.
I use these to adapt the quality of PNG figures in conjunction with figure's figsize option, which allows for correctly scaled text and graphics as you improve the quality through dpi.
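A small sketch of how figsize and dpi combine for a PNG: the saved pixel size is the figure size in inches multiplied by the dpi passed to savefig. The 4 x 3 inch figure, the 300 dpi value, and the plotted data are assumptions chosen for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no plot window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

# figsize fixes the physical size, so text keeps its proportions
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(x, np.sin(x))
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")

# Placeholder output location
out_path = os.path.join(tempfile.gettempdir(), "figure_300dpi.png")

# dpi controls how many pixels are used for each inch of the figure
fig.savefig(out_path, dpi=300)
plt.close(fig)

saved = plt.imread(out_path)
print("pixel size (height, width):", saved.shape[:2])
```

Keeping figsize close to the size at which the figure is included in the LaTeX document means the text is not rescaled, and raising dpi then only adds detail.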

How to save figures to pdf as raster images in matplotlib

I have some complex graphs made using matplotlib. Saving them to a pdf using the savefig command uses a vector format, and the pdf takes ages to open. Is there any way to save the figure to pdf as a raster image to get around this problem?
You can force individual figure elements to be rasterized like this:
import matplotlib.pyplot as plt
plt.text(1, 1, 'foobar', rasterized=True)
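The same keyword can be applied to the heavy parts of a plot while leaving axes and labels as vector text. The sketch below is only an illustration: the data, the dpi value, and the file name are assumptions, and the dpi given to savefig sets the resolution of the rasterized artists.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no plot window is needed
import matplotlib.pyplot as plt
import numpy as np

# A line with many points, standing in for a "complex graph"
x = np.linspace(0, 10, 20000)
y = np.sin(x ** 2)

fig, ax = plt.subplots()
# Only this artist is rasterized; tick labels and axis labels stay vector
ax.plot(x, y, rasterized=True)
ax.set_xlabel("x")
ax.set_ylabel("sin(x^2)")

# Placeholder output location
out_path = os.path.join(tempfile.gettempdir(), "rasterized_line.pdf")
fig.savefig(out_path, dpi=150)  # dpi applies to the rasterized parts
plt.close(fig)
```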
Not that I know of, but you can use the 'convert' program (ImageMagick) to convert a jpg to a pdf: 'convert file.jpg file.pdf'.

Is there a way to extract text information from a postscript file? (.ps .eps)

I want to extract the text information contained in a postscript image file (the captions to my axis labels).
These images were generated with pgplot. I have tried ps2ascii and ps2txt on Ubuntu but they didn't produce any useful results. Does anyone know of another method?
Thanks
It's likely that pgplot drew the fonts in the text directly with lines rather than using text. Especially since pgplot is designed to output to a huge range of devices including plotters where you would have to do this.
Edit:
If you have enough plots to be worth the effort, then it's a very simple image processing task. Convert each page to something like TIFF, in monochrome. Threshold the image to binary; the text will be the max pixel value.
Use a template matching technique. If you have a limited set of possible labels then just match the entire label; you can even start with a template of the correct size and rotation. Then just flag each plot as containing label[1-n], no need to read the actual text.
If you don't know the label then you can still do OCR fairly easily: just extract the region around the axis, rotate it for the vertical, and use Google's free OCR lib.
If you have pgplot you can even build the training set for OCR or the template images directly rather than having to harvest them from the image list.
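The threshold-then-match idea can be sketched with plain NumPy. This is a brute-force illustration under stated assumptions, not the answerer's actual method: the function names threshold and match_template are hypothetical, the threshold level of 128 is arbitrary, and the "page" and "label" are tiny synthetic arrays rather than rendered plots.

```python
import numpy as np


def threshold(img, level=128):
    """Turn a grayscale image into a binary one (1 where pixel >= level)."""
    return (img >= level).astype(np.uint8)


def match_template(binary, template):
    """Slide the template over the image and return the best (row, col) and score.

    The score is the fraction of pixels in the window that equal the template.
    """
    rows, cols = binary.shape
    t_rows, t_cols = template.shape
    best_score, best_pos = -1.0, (0, 0)
    for r in range(rows - t_rows + 1):
        for c in range(cols - t_cols + 1):
            window = binary[r:r + t_rows, c:c + t_cols]
            score = float(np.mean(window == template))
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


# Synthetic "label" glyph and a blank "page" with the glyph pasted in
label = np.array([[255, 0, 255],
                  [0, 255, 0],
                  [255, 255, 255]], dtype=np.uint8)
page = np.zeros((20, 30), dtype=np.uint8)
page[5:8, 7:10] = label

pos, score = match_template(threshold(page), threshold(label))
print("best match at", pos, "with score", score)
```

For real pages a library routine would be far faster than this double loop; the sketch only shows the logic of flagging a plot as containing a known label.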
