I'm trying to create curved-text images with Python to use as synthetic images for training my network.
Sample of the desired image:
I tried using TextRecognitionDataGenerator, but its output warps the characters; I want the characters to sit on a curve without being distorted.
I also saw this matplotlib example but couldn't get it to work. I could create the images in Photoshop, but with the large number of images needed that is impractical.
Regards
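One way to keep the glyphs undistorted is to compute a position and a rotation for each character separately, then draw each character upright and rotate it as a whole. The following is a minimal sketch of that layout step only, assuming a circular arc and a fixed angular spacing; `arc_layout` and its parameters are hypothetical names, not part of any library.

```python
import math

def arc_layout(text, radius, center=(0.0, 0.0), spacing_deg=12.0):
    """Return (char, x, y, rotation_deg) for each character on a circular arc.

    The text is centred on the top of the circle. Coordinates are image
    coordinates (y grows downward). rotation_deg is counter-clockwise
    positive, so the glyph's baseline follows the tangent of the arc.
    """
    cx, cy = center
    n = len(text)
    placements = []
    for i, ch in enumerate(text):
        # Angular offset from the top of the circle; negative is to the left.
        offset = (i - (n - 1) / 2) * spacing_deg
        theta = math.radians(offset)
        x = cx + radius * math.sin(theta)
        y = cy - radius * math.cos(theta)
        # Characters right of the top lean clockwise, hence the sign flip.
        placements.append((ch, x, y, -offset))
    return placements

for ch, x, y, rot in arc_layout("CURVED", radius=150, center=(200, 200)):
    print(f"{ch}: x={x:.1f} y={y:.1f} rotate={rot:+.1f} deg")
```

Each character could then be rendered onto its own small transparent tile, rotated by `rotation_deg`, and pasted at `(x, y)`. With a fixed angular spacing, wide and narrow letters get the same gap; spacing by measured glyph width would be a refinement.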
Related
I am trying to make a custom YOLOv4-tiny dataset using Google Colab. I am using labelImg.py for image annotation, as shown in https://github.com/tzutalin/labelImg.
I have annotated one image as shown below:
The .txt file with the annotated coordinates looks like this:
0 0.580859 0.502083 0.303906 0.404167
I only have one class, the calculator class. I want to use this one image to produce 4 more annotated images: each time, I want to rotate the annotated image by 45 degrees and create a new annotated image and a .txt coordinate file. I have seen something like this done in Roboflow, but I can't figure out how to do it manually with a Python script. Is it possible? If so, how?
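For the coordinate file, the usual approach is to rotate the four corners of the box along with the image and take the axis-aligned rectangle that encloses them. Below is a minimal sketch of that calculation, assuming the image is rotated counter-clockwise about its centre and the canvas is expanded so nothing is cropped. `rotate_yolo_box` is a hypothetical helper, and the 1280x720 image size is an assumption since the question does not state it.

```python
import math

def rotate_yolo_box(box, img_w, img_h, angle_deg):
    """Rotate a YOLO box (cx, cy, w, h, all normalised) together with its image.

    Returns the new normalised box and the size of the expanded canvas.
    """
    cx, cy, w, h = box
    # Box centre and half-extents in pixels, relative to the image centre.
    px, py = cx * img_w - img_w / 2, cy * img_h - img_h / 2
    hw, hh = w * img_w / 2, h * img_h / 2
    corners = [(px - hw, py - hh), (px + hw, py - hh),
               (px + hw, py + hh), (px - hw, py + hh)]

    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    # Image y points down, so an on-screen counter-clockwise rotation is:
    rotated = [(x * cos_a + y * sin_a, -x * sin_a + y * cos_a) for x, y in corners]

    # Size of the canvas that contains the whole rotated image.
    new_w = abs(img_w * cos_a) + abs(img_h * sin_a)
    new_h = abs(img_w * sin_a) + abs(img_h * cos_a)

    xs = [x + new_w / 2 for x, _ in rotated]
    ys = [y + new_h / 2 for _, y in rotated]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    new_box = ((x_min + x_max) / 2 / new_w, (y_min + y_max) / 2 / new_h,
               (x_max - x_min) / new_w, (y_max - y_min) / new_h)
    return new_box, (new_w, new_h)

# Box from the annotation file; the image size here is assumed, not known.
box = (0.580859, 0.502083, 0.303906, 0.404167)
for angle in (45, 90, 135, 180):
    new_box, _ = rotate_yolo_box(box, 1280, 720, angle)
    print(angle, "0 " + " ".join(f"{v:.6f}" for v in new_box))
```

The image itself would need to be rotated with the same convention; Pillow's `Image.rotate(angle, expand=True)` also rotates counter-clockwise on an expanded canvas, although its rounded canvas size may differ by a pixel. Note that at angles such as 45 degrees the enclosing box is looser than the original, because it has to contain the tilted rectangle.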
You can look into the repo and article below for Python-based data augmentation, including rotation, shearing, resizing, translation and flipping.
https://github.com/Paperspace/DataAugmentationForObjectDetection
https://blog.paperspace.com/data-augmentation-for-bounding-boxes/
If you are using AlexeyAB's darknet repo for yolov4, then there are some augmentations you can use to increase training data size and variation.
https://github.com/AlexeyAB/darknet/wiki/CFG-Parameters-in-the-%5Bnet%5D-section
Look at the data augmentation section there: the augmentations it defines for object detection can be enabled by adding them to the YOLO cfg file.
I am trying to analyse an image and extract each number so that I can process it with a CNN trained on MNIST. The images show garments with a grid-like pattern, and at each intersection of the grid there is a number (e.g. 0412). I want to detect which number it is and then store its coordinates. Does anyone have recommendations on how to preprocess the image, given that it is quite noisy and contains multiple numbers? I have tried using contours and it didn't work. I also converted the image to binary, but some areas of the image are unreadable. My initial idea was to isolate each number and then process it.
Thanks in advance!
I am trying to write a script (in bash using ImageMagick, or in Python) to generate an image similar to this example:
The source is 25 separate JPEGs. So far I have written a script (ImageMagick) which takes each of the images, detects the contours of the person, and replaces the white background with a transparent one.
The next step is to fit the contours randomly into one large image. Each image should fit into the larger image without overlapping its neighbors. It seems I need some type of collision detection.
I am looking for pointers on how to tackle this problem.
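One simple starting point is rejection sampling: pick a random position for each cut-out, test it against everything already placed, and retry on a collision. The sketch below uses bounding rectangles as a coarse stand-in for the real contours; `overlaps` and `place_randomly` are hypothetical names, and the sizes in the usage lines are made up.

```python
import random

def overlaps(a, b):
    """True if two rectangles given as (x, y, w, h) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def place_randomly(sizes, canvas_w, canvas_h, max_tries=1000, seed=None):
    """Place each (w, h) rectangle at a random non-overlapping position."""
    rng = random.Random(seed)
    placed = []
    for w, h in sizes:
        if w > canvas_w or h > canvas_h:
            raise ValueError(f"{w}x{h} does not fit on the canvas")
        for _ in range(max_tries):
            candidate = (rng.randint(0, canvas_w - w),
                         rng.randint(0, canvas_h - h), w, h)
            if not any(overlaps(candidate, other) for other in placed):
                placed.append(candidate)
                break
        else:
            # No free spot found; the canvas is probably too crowded.
            raise RuntimeError(f"could not place a {w}x{h} rectangle")
    return placed

# 25 cut-outs of an assumed size on an assumed 2000x1500 canvas.
positions = place_randomly([(120, 300)] * 25, 2000, 1500, seed=0)
print(positions[:3])
```

Placing the largest pieces first tends to reduce failed attempts. For tighter packing, the rectangle test could be followed by a pixel-level check that compares the alpha masks of two candidates wherever their rectangles intersect.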
I am using wordcloud in python to generate word clouds.
I was able to reproduce this example on my machine, and then changed the last line from plt.show() to plt.savefig('image.pdf') to get PDF output.
I got a PDF with the same result; however, the PDF appears to be pixel-based rather than vector-based. When I zoom in on a particular point in the PDF, it turns into a very low-quality picture.
Is there any way to produce a vector-based PDF using wordcloud? If not, is there any other library that can produce vector-based (PDF) word clouds in Python?
If wordcloud can generate any sort of vector output such as ps or svg, inkscape can usually convert it to a PDF without rasterizing it. You can even do this headless, e.g. inkscape my.svg -A my.pdf.
Hmm, looking at wordcloud, it looks like it uses PIL. I don't think that PIL can produce vector images. But if you could use the logic in wordcloud and separate it from PIL, you can get vector fonts onto PDFs by drawing onto a reportlab canvas.
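You can save the figure in a vector format so that it scales without quality loss. Such formats are PDF and EPS. Just change the extension to .pdf or .eps and matplotlib will write the corresponding format. Note that this only vectorizes what matplotlib draws itself (axes, lines, text); a bitmap shown with imshow is still embedded as a raster image inside the file.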
plt.savefig('destination_path.eps', format='eps')
plt.savefig('destination_path.pdf', format='pdf')
I have found that eps/pdf files work best.
I want to extract the text information contained in a postscript image file (the captions to my axis labels).
These images were generated with pgplot. I have tried ps2ascii and ps2txt on Ubuntu but they didn't produce any useful results. Does anyone know of another method?
Thanks
It's likely that pgplot drew the characters directly as line strokes rather than emitting them as text, especially since pgplot is designed to output to a huge range of devices, including plotters, where you would have to do this.
Edit:
If you have enough plots to be worth the effort, then it's a very simple image processing task. Convert each page to something like TIFF, in monochrome. Threshold the image to binary; the text will be the max pixel value.
Use a template matching technique. If you have a limited set of possible labels, then just match the entire label; you can even start with a template of the correct size and rotation. Then just flag each plot as containing label[1-n]; there is no need to read the actual text.
If you don't know the label, then you can still do OCR fairly easily: just extract the region around the axis, rotate it for the vertical axis, and use Google's free OCR library.
If you have pgplot, you can even build the training set for OCR or the template images directly, rather than having to harvest them from the image list.
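The threshold-then-match step can be sketched in a few lines. This is only an illustration on plain nested lists, assuming greyscale pixel values from 0 to 255 with bright text; the function names are hypothetical, and for real pages an image library routine such as OpenCV's `cv2.matchTemplate` would be the practical choice.

```python
def threshold(gray, cutoff=128):
    """Binarise a greyscale image (list of rows): 1 where pixel >= cutoff."""
    return [[1 if p >= cutoff else 0 for p in row] for row in gray]

def match_score(image, template, top, left):
    """Fraction of template pixels that agree with the image at this offset."""
    th, tw = len(template), len(template[0])
    agree = sum(image[top + r][left + c] == template[r][c]
                for r in range(th) for c in range(tw))
    return agree / (th * tw)

def best_match(image, template):
    """Slide the template over the image; return (score, top, left) of the best fit."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    return max((match_score(image, template, r, c), r, c)
               for r in range(ih - th + 1)
               for c in range(iw - tw + 1))

# Tiny made-up page with an L-shaped "label" whose top-left is row 2, column 3.
page = [[0] * 8 for _ in range(6)]
page[2][3] = page[3][3] = page[3][4] = 255
label = [[255, 0], [255, 255]]

score, top, left = best_match(threshold(page), threshold(label))
print(score, top, left)
```

To flag which of a known set of labels a plot contains, one could run `best_match` once per label template and keep the label with the highest score, treating anything below a chosen score as "not present".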