Selecting multiple files for input and getting respective output

I have this bit of code, which clips the shapefile of a tree out of a Lidar point cloud. For a single shapefile it works well.
What I want to do: I have 180 individual tree shapefiles and want to clip each one out of the same point cloud and save it as an individual .las file.
So in the end I should have 180 .las files, e.g. Input_shp: Tree11.shp -> Output_las: Tree11.las
I am sure there is a way to do all of this at once. I just don't know how to select all the shapefiles and save the output to 180 individual .las files.
I'm really new to Python and any help would be appreciated.
I already tried to do this with placeholders (.format()) but couldn't really get anywhere.
from WBT.whitebox_tools import WhiteboxTools
wbt = WhiteboxTools()
wbt.work_dir = "/home/david/Documents/Masterarbeit/Pycrown/Individual Trees/"
wbt.clip_lidar_to_polygon(i="Pointcloud_to_clip.las", polygons="tree_11.shp", output="Tree11.las")

I don't have the plugin you are using, but you may be looking for this code snippet:
import os
from WBT.whitebox_tools import WhiteboxTools
wbt = WhiteboxTools()
workDir = "/home/david/Documents/Masterarbeit/Pycrown/Individual Trees/"
wbt.work_dir = workDir
# List everything in your work dir. The path may need to be absolute,
# depending on where you run this:
filesInFolder = os.listdir(workDir)
numberOfShapeFiles = len([_ for _ in filesInFolder if _.endswith('.shp')])
# Assumes the shapefiles are numbered from 0 to n-1 (tree_0.shp ... tree_{n-1}.shp).
# Loop over all your shapefiles.
for fileNumber in range(numberOfShapeFiles):
    wbt.clip_lidar_to_polygon(
        i="Pointcloud_to_clip.las",
        polygons=f"tree_{fileNumber}.shp",
        output=f"Tree{fileNumber}.las"
    )
This makes use of Python f-strings (formatted string literals), along with the os.listdir function.
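If the shapefiles are not numbered consecutively from 0 (the question mentions tree_11.shp), one alternative is to loop over the .shp files that actually exist and derive each output name from the input name. This is a minimal sketch, assuming every .shp file in the work dir is a tree polygon. The helper names plan_clips and clip_all are made up for illustration; only clip_lidar_to_polygon and its i, polygons and output arguments come from the question.

```python
from pathlib import Path


def plan_clips(work_dir):
    """Return (shapefile name, output .las name) pairs for every .shp in work_dir."""
    shapefiles = sorted(Path(work_dir).glob("*.shp"))
    # tree_11.shp -> tree_11.las: keep the stem, swap the extension.
    return [(shp.name, shp.stem + ".las") for shp in shapefiles]


def clip_all(wbt, work_dir, pointcloud="Pointcloud_to_clip.las"):
    """Clip the same point cloud once per shapefile (hypothetical helper)."""
    wbt.work_dir = str(work_dir)
    for shp_name, las_name in plan_clips(work_dir):
        wbt.clip_lidar_to_polygon(i=pointcloud, polygons=shp_name, output=las_name)
```

With the setup from the question, this would be called as clip_all(WhiteboxTools(), "/home/david/Documents/Masterarbeit/Pycrown/Individual Trees/"). Printing plan_clips(...) first is a cheap way to check the input/output pairing before running 180 clips.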

Related

Errno 20: Not a directory when saving into zip file

When I try to save a pyplot figure as a jpg, I keep getting a directory error saying that the given file name is not a directory. I am working in Colab. I have a numpy array called z_img and have opened a zip file.
import matplotlib.pyplot as plt
from zipfile import ZipFile
zipObj = ZipFile('slices.zip', 'w') # opening zip file
plt.imshow(z_img, cmap='binary')
The plotting works fine. I did a test of saving the image into Colab's regular memory like so:
plt.savefig(str(ii)+'um_slice.jpg')
And this works perfectly, except I am intending to use this code in a for loop. ii is an index to differentiate between each image, and several hundred images would be created so I want them going in the zipfile. Now when I try adding the path to the zipfile:
plt.savefig('/content/slices.zip/'+str(ii)+'um_slice.jpg')
I get: NotADirectoryError: [Errno 20] Not a directory: '/content/slices.zip/150500um_slice.jpg'
I assume it's because the {}.jpg string is a filename, and not a directory per se. But I am quite new to Python, and don't know how to get the plot into the zip file. That's all I want. Would love any advice!

First off, for anything that's not photographic content (i.e. nice and soft), JPEG is the wrong format. You'll have a better time using a different file format. PNG is nice for pixels, SVG for vector graphics (in case you embed this in a website later!), and PDF for vector graphics, too.
The error message is quite on point: you cannot just save to a zip file as if it was a directory.
Multiple ways around:
use the tempfile module's mkdtemp to make a temporary directory, save into that, and zip the result
save not into a filename, but into a buffer (BytesIO I guess) and append that to the compressed stream (I'm not too familiar with ZipFile)
use PDF as output and simply generate a multipage PDF; it's not hard, and probably much nicer in the long term. You can still convert that vector graphic result to PNG (or any other pixel format) as desired, but for the time being, it's space efficient, arbitrarily scalable and keeps all your pages in one place. It's easy to import selected pages into LaTeX (as a matter of fact, \includegraphics does it directly) or into websites (pdf.js).
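The in-memory option from the list above could look roughly like this. It is a sketch rather than a tested solution for the asker's data: save_into_zip and write_image are hypothetical names, and the matplotlib call mentioned in the docstring assumes plt.savefig is given an explicit format, because a buffer has no file extension to infer it from.

```python
import io
import zipfile


def save_into_zip(zip_file, name, write_image):
    """Render one image into memory and store it in the open archive as `name`.

    write_image is any callable that writes the image bytes to a binary
    file-like object, e.g. lambda fp: plt.savefig(fp, format="png").
    """
    buffer = io.BytesIO()
    write_image(buffer)
    zip_file.writestr(name, buffer.getvalue())


# Demo with a stand-in writer so the sketch runs without matplotlib.
# The archive itself is kept in memory here; pass a filename such as
# 'slices.zip' to ZipFile to write it to disk instead.
archive_bytes = io.BytesIO()
with zipfile.ZipFile(archive_bytes, "w") as archive:
    for ii in (150500, 151000):
        save_into_zip(archive, f"{ii}um_slice.png",
                      lambda fp: fp.write(b"fake image bytes"))
```

In the loop from the question, the lambda would wrap the plt.savefig call and the ZipFile would be opened once, outside the loop.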

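From the docs, matplotlib.pyplot.savefig accepts a binary file-like object, and ZipFile.open creates binary file-like objects. These two have to get together! Since a file object carries no extension for savefig to infer the type from, pass the format explicitly:
with zipObj.open(str(ii)+'um_slice.jpg', 'w') as fp:
    plt.savefig(fp, format='jpg')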

trying to make a video using moviepy

I'm having an issue with my code. I'm getting a "list index out of range" error.
import os
import moviepy.video.io.ImageSequenceClip
image_folder= r'C:\Users\Porsche\OneDrive - Imperial College London\Documents\plates'
fps=1
image_files = [image_folder+'/'+img for img in os.listdir(image_folder) if img.endswith(".jpeg")]
clip = moviepy.video.io.ImageSequenceClip.ImageSequenceClip(image_files, fps=fps)
clip.write_videofile('my_video.mp4')
I'm new to Python and I can't see where the index is, and the documentation I found for moviepy was not clear.
The error is on this line:
clip = moviepy.video.io.ImageSequenceClip.ImageSequenceClip(image_files, fps=fps)

I just used ImageSequenceClip 5 minutes ago to successfully make a small video so maybe I can assist.
Below are issues related to what I saw in your code and related to problems I had with ImageSequenceClip also. I didn't experience an index error similar to what you've mentioned, but that may be due to your list comprehension line.
A few general suggestions--maybe this will be enough or helpful in other projects you work on:
Careful with your '/' and '\'; keep these uniform to avoid any unwanted issues popping up. I typically use / in all cases for Windows filesystems and it seems to work fine. Also, when manually combining path+filename variables don't forget to include a final '/' at the end of the path variable.
Print out and check the length of the image_files variable you create to make sure you are actually adding the files you wanted and there are no other obvious issues with your list comprehension line.
If you can't locate the issue causing the index error, you can try adding just the folder with the image files only (instead of a list of individual image file locations). In this case, you might need to make a new folder with the files you want included only.
The fps argument was counterintuitive, for me at least. The lower the value, the longer each individual image stays on screen in the video.
Finally, if you pass a directory to the ImageSequenceClip function, it sorts the files in alphanumeric order based on the filenames. Keep this in mind as, for example, a1, a2, a11 will be reordered into a1, a11, a2.
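Two of the suggestions above (checking that the file list is not empty, and controlling the order yourself) can be sketched as follows. An empty image_files list is a plausible cause of the index error, for example if the files end in .jpg rather than .jpeg, though that is an assumption about the asker's folder. natural_key and collect_images are hypothetical helper names, not part of moviepy.

```python
import os
import re


def natural_key(name):
    """Sort key that compares digit runs as numbers, so a2 comes before a11."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def collect_images(folder, extensions=(".jpeg", ".jpg")):
    """Return image paths in natural order; fail early if nothing matches."""
    names = [n for n in os.listdir(folder) if n.lower().endswith(extensions)]
    if not names:
        raise ValueError(f"no files ending in {extensions} found in {folder!r}")
    return [os.path.join(folder, n) for n in sorted(names, key=natural_key)]


print(sorted(["a1", "a11", "a2"], key=natural_key))  # ['a1', 'a2', 'a11']
```

The list returned by collect_images(image_folder) can then be passed to ImageSequenceClip in place of image_files, since a list of paths keeps the order you give it.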

How to find objects in floor plan image in Tkinter python through svg file?

I have a vectorized floorplan image. I want to identify the objects in the image through the vector data in its SVG file. The SVG path data does not contain any closepath commands (z), so I cannot tell where one object ends and the next one begins. Can somebody help me, please?
I have very little knowledge of SVG files and of using them in Tkinter, so any help or suggestions on what I can do would be appreciated.
This is the vector data of the image.

use in conjunction with SO floorplan question.
Jump to z_final_floorplan.svg for final file.
A
Create 4 files:
w_original_floorplan.svg
x_rough_static_floorplan.svg
y_rough_live_floorplan.svg
z_final_floorplan.svg
w_original_floorplan.svg and x_rough_static_floorplan.svg are identical apart from filename.
y_rough_live_floorplan.svg and z_final_floorplan.svg are empty; to be populated.
Copy x_rough_static_floorplan.svg to y_rough_live_floorplan.svg.
Open y_rough_live_floorplan.svg on browser using server.
In x_rough_static_floorplan.svg, find every M (case sensitive) and replace it with two newlines followed by /M (shift + enter, shift + enter, /M).
B
[this section takes the time]
Take away 1st '/' in path in y_rough_live_floorplan.svg [shows blackout_floorplan]
Label x_rough_static_floorplan.svg code section blackout_floorplan where code is.
(this file is used as rough-work, so being xml / svg valid is irrelevant)
In y_rough_live_floorplan.svg find next '/' and delete it [shows floorplan_top_left_whiteout]
Label x_rough_static_floorplan.svg code section floorplan_top_left_whiteout where code is.
Have x_rough_static_floorplan.svg and y_rough_live_floorplan.svg open in 2 windows, will be going back and forth to each of them. Keep repeating until at end.
(Hint: in VS Code the find tool seems to stay on when switching between files, so you can search for / and jump to the next match with cmd + g easily.) It may be handy to have a paper printout of the original SVG as a reference and to label the names of the objects you create, e.g. bath, sink, table, as you go along. Settle on a naming scheme early: if one table is 'table', is the second chair chair2, chair_2 or chair_two?
C
Reorder the whole labels and corresponding code in path x_rough_static_floorplan.svg so the labels are ordered next to each other, but in the order they are found in the path:
e.g.
…
floorplan
bath
sink
table_chairs
sofa
…
Use the 'find' tool here. This process will itself require a temp file to copy and paste into, rather than reordering within the file you are working on; afterwards, write the temp file back to the working file. It might be a good idea to create a checklist of objects and cross them off as they are done.
E.g. floorplan, bath, table_chairs, sink…
D
Create path elements from your grouped objects, giving each one an id such as id="floorplan_main", id="bath", id="sink", etc.
Bear in mind, the data of how this is drawn is really, really bad. Really they should be drawn with rect elements for a rectangle when possible and a lot of the path data is very unnecessary, but that’s obviously how the application generates the svg.
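The manual splitting in steps A and B can be approximated with a short script. This is only a sketch under assumptions: it splits at every absolute moveto (uppercase M, matching the case-sensitive search in step A) and ignores relative m commands, and the ids still have to be chosen by hand after looking at each chunk. An object drawn with several M commands will come out as several chunks that need regrouping. split_subpaths and as_path_elements are hypothetical names, and the path data in the demo is made up.

```python
import re


def split_subpaths(d):
    """Split SVG path data into chunks that each start at an absolute moveto (M)."""
    chunks = re.split(r"(?=M)", d)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def as_path_elements(d, ids):
    """Pair each chunk with a hand-chosen id and emit one <path> element per chunk."""
    return [f'<path id="{name}" d="{chunk}"/>'
            for name, chunk in zip(ids, split_subpaths(d))]


example = "M0 0 L10 0 L10 10 M20 20 L30 20 L30 30"
for element in as_path_elements(example, ["floorplan_main", "bath"]):
    print(element)
# <path id="floorplan_main" d="M0 0 L10 0 L10 10"/>
# <path id="bath" d="M20 20 L30 20 L30 30"/>
```

Rendering one chunk at a time in the browser, as in step B, remains the way to find out which object each chunk belongs to.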

save multi directory images in a single file after preprocessing

I am working on DICOM images. I have 5 scans (folders), and each scan contains multiple images. After some preprocessing, I want to save the processed images in a single file using np.save. The code below saves each folder to a separate file:
data_path = 'E:/jupyter/test/LIDC-IDRI/'
patients_data = os.listdir(data_path)
for pd in range(len(patients_data)):
    full_path = load_scan(data_path + patients_data[pd])
    after_pixel_hu = get_pixels_hu(full_path)
    after_resample, spacing = resample(after_pixel_hu, full_path, [1,1,1])
    np.save(output_path + "images_of_%s_patient.npy" % (patients_data[pd]), after_resample)
load_scan is a function for loading (reading) DICOM files. What I want is to save all processed images in a single file, not in five files. Can anyone tell me how to do that, please?

The first thing to notice is that you are using %s with patients_data[pd]. I assume patients_data is a list of the names of the patients, which means you are constructing a different output path for each patient - you are asking numpy to save each of your processed images to a new location.
Secondly, .npy is probably not the file type you want to use for your purposes, as it does not handle appending data. You probably want to pick a different file type, and then np.save() to the same file path each time.
Edit: Regarding file type, a pdf may be your best option, where you can make each of your images a separate page.
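If the goal is simply one file on disk, NumPy's .npz container is another option: np.savez stores several named arrays in a single archive, and the arrays do not need to have the same shape. This is a minimal sketch, assuming the resampled volumes fit in memory together. save_all_scans, load_all_scans and the patient names in the demo are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_all_scans(output_file, scans):
    """Write every array in `scans` (name -> ndarray) into one .npz archive."""
    np.savez_compressed(output_file, **scans)


def load_all_scans(output_file):
    """Read the archive back into a plain dict of arrays."""
    with np.load(output_file) as archive:
        return {name: archive[name] for name in archive.files}


# Demo with small random volumes standing in for the resampled scans.
scans = {
    "patient_a": np.random.rand(4, 8, 8),
    "patient_b": np.random.rand(6, 8, 8),
}
output_file = os.path.join(tempfile.mkdtemp(), "all_patients.npz")
save_all_scans(output_file, scans)
restored = load_all_scans(output_file)
```

In the loop from the question, each after_resample would be collected into a dict keyed by patients_data[pd], with a single save after the loop finishes.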

Extract image position from .docx file using python-docx

I'm trying to get the image index from a .docx file using the python-docx library. I'm able to extract the name of the image and its height and width, but not the index of where it is in the Word file.
import docx
doc = docx.Document(filename)
for s in doc.inline_shapes:
    print(s.height.cm, s.width.cm, s._inline.graphic.graphicData.pic.nvPicPr.cNvPr.name)
Output:
21.228 15.920 IMG_20160910_220903848.jpg
In fact, I would like to know if there is a simpler way to get the image name, the way s.height.cm fetches the height in cm. My primary requirement is to know where the image is in the document, because I need to extract the image, do some work on it and then put it back in the same location.

This operation is not directly supported by the API.
However, if you're willing to dig into the internals a bit and use the underlying lxml API it's possible.
The general approach would be to access the ImagePart instance corresponding to the picture you want to inspect and modify, then read and write the ._blob attribute (which holds the image file as bytes).
This specimen XML might be helpful:
http://python-docx.readthedocs.io/en/latest/dev/analysis/features/shapes/picture.html#specimen-xml
From the inline shape containing the picture, you get the <a:blip> element with this:
blip = inline_shape._inline.graphic.graphicData.pic.blipFill.blip
The relationship id (r:id generally, but r:embed in this case) is available at:
rId = blip.embed
Then you can get the image part from the document part
document_part = document.part
image_part = document_part.related_parts[rId]
And then the binary image is available for read and write on ._blob.
If you write a new blob, it will replace the prior image when saved.
You probably want to get it working with a single image and get a feel for it before scaling up to multiple images in a single document.
There might be one or two image characteristics that are cached, so you might not get all the finer points working until you save and reload the file, so just be alert for that.
Not for the faint of heart as you can see, but should work if you want it bad enough and can trace through the code a bit :)
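Put together, the steps above might look like the following sketch. It only chains the attribute accesses already described (blip, embed, related_parts, _blob), so it depends on python-docx internals and may break between library versions. replace_inline_image and my_image_filter are hypothetical names.

```python
def replace_inline_image(document, inline_shape, transform):
    """Replace the image behind one inline shape with transform(old_bytes).

    Relies on python-docx internals (_inline, _blob) as described above.
    """
    blip = inline_shape._inline.graphic.graphicData.pic.blipFill.blip
    image_part = document.part.related_parts[blip.embed]
    image_part._blob = transform(image_part._blob)
    return image_part


# Intended use (not run here; my_image_filter is a placeholder that takes
# the image file as bytes and returns the processed image as bytes):
#   doc = docx.Document(filename)
#   replace_inline_image(doc, doc.inline_shapes[0], my_image_filter)
#   doc.save(filename)
```

Because the picture keeps its relationship id, the new bytes end up at the same position in the document as the old image.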

You can also inspect the paragraphs with a simple loop and check which ones contain an image, for example by testing whether the paragraph's XML contains "graphicData" (you can do the same with runs):
from docx import Document
image_paragraphs = []
doc = Document(path_to_docx)
for par in doc.paragraphs:
    if 'graphicData' in par._p.xml:
        image_paragraphs.append(par)
Then unzip the docx file: the images are in the word/media folder, usually in the same order as in the image_paragraphs list. Each paragraph element gives you several ways to change it. If you want to extract an image, process it and then insert it in the same place, then:
paragraph.clear()
paragraph.add_run('your description, if needed')
run = paragraph.runs[0]
run.add_picture(path_to_pic, width, height)

I've never really written any answers here, but I think this might be the solution to your problem. This little piece of code shows the positions of your images among all the paragraphs. Hope it helps.
import docx
doc = docx.Document(filename)
paraGr = []
index = []
par = doc.paragraphs
for i in range(len(par)):
    paraGr.append(par[i].text)
    if 'graphicData' in par[i]._p.xml:
        index.append(i)

If you are using Python 3
pip install python-docx
import docx
doc = docx.Document(document_path)
P = []
I = []
par = doc.paragraphs
for i in range(len(par)):
    P.append(par[i].text)
    if 'graphicData' in par[i]._p.xml:
        I.append(i)
print(I)
# prints the list of indexes of the paragraphs that contain an image
