Loading WSI slide images to python and converting images to array

Loading WSI slide images to python and converting images to array - python

I have been new to WSI image processing and have been trying load images to python. I was successfull in loading slide images but it is not getting converted to an array for further processing. any help is appreciated. Following is the code i used
filenames = os.listdir("C:/Users/DELL/Downloads/dataset1")
X_train= np.array(filenames)
following is the output i get instead of an array of numbers representing an image
'TCGA-BA-4074-01A-01-BS1.9c51e4d0-cb30-412a-995a-97ac4f860a87.svs'

You should use specialized libraries for the reading of WSI image. Check these:
slideio
openslide
Keep in mind that normally the WSI slides are too large to load them
in the memory in the original resolution. You can load them partially or on the low resolutions. Both of the libraries support such a functionality.

Related

Convert a .jpg image to rgb in micropython?

I am trying to create a program on the Raspberry Pi Pico W that downloads a jpg image and displays a very pixilated version of the image on an addressable led matrix. I have decided to attempt to use micro python as my coding language for this project since I have already completed a similar project in the full python language. Since I am downloading the images from an online source I am stuck using the '.jpg' format at a fixed size.
I have been running into some difficulties processing the image. To run on the addressable LEDs (Neopixel) I want to collect the rgb data from each individual pixel of the .jpg and add them to a list.
Working on python I know that the PIL/Pillow library is a great solution to this problem.
from PIL import Image
image = Image.open('256256.jpg',formats=None)
print(image)
from numpy import asarray
data = asarray(image)
Unfortunately the RP2040 doesn't seem have enough storage space to handle the module.
I need to find a way to decode the image using the modules readily available to micro python.
I have attempted to reverse engineer the PIL open feature but haven't had any luck so far.
Thank you in advance!

Compressing PIL image without saving the file

I am having trouble with compressing image in python without saving the image at the disk. The image has a save function as described here. Here it optimizes the image by saving it. Is it possible to use the same procedure without saving the image. I want to do it like another python function.
image=image.quantize() [here it reduces the quality a lot ]
Thanks in advance :)

In PIL or opencv the image is just a large matrix with values for its pixels. If you want to do something with the image(e.g. display it), the function needs to know all the pixel values, and thus needs the extracted image.
However, there is a method to keep the image compressed in memory until you really need to do something with the image. Have a look at this answer: How can i load a image in Python, but keep it compressed?

How to extract individual JPEG images from a HDF5 file

I have a big HDF5 file with the images and its corresponding ground truth density map.
I want to put them into the network CRSNet and it requires the images in separate files.
How can I achieve that? Thank you very much.
-- Basic info I have a HDF5 file with two keys "images" and "density_maps". Their shapes are (300, 380, 676, 1).
300 stands for the number of images, 380 and 676 refer to the height and width respectively.
-- What I need to put into the CRSNet network are the images (jpg) with their corresponding HDF5 files. The shape of them would be (572, 945).
Thanks a lot for any comment and discussion!

For starters, a quick clarification on h5py and HDF5. h5py is a Python package to read HDF5 files. You can also read HDF5 files with the PyTables package (and with other languages: C, C++, FORTRAN).
I'm not entirely sure what you mean by "the images (jpg) with their corresponding h5py (HDF5) files" As I understand all of your data is in 1 HDF5 file. Also, I don't understand what you mean by: "The shape of them would be (572, 945)." This is different from the image data, right? Please update your post to clarify these items.
It's relatively easy to extract data from a dataset. This is how you can get the "images" as NumPy arrays and and use cv2 to write as individual jpg files. See code below:
with h5py.File('yourfile.h5','r') as h5f:
for i in range(h5f['images'].shape[0]):
img_arr = h5f['images'][i,:] # slice notation gets [i,:,:,:]
cv2.imwrite(f'test_img_{i:03}.jpg',img_arr)
Before you start coding, are you sure you need the images as individual image files, or individual image data (usually NumPy arrays)? I ask because the first step in most CNN processes is reading the images and converting them to arrays for downstream processing. You already have the arrays in the HDF5 file. All you may need to do is read each array and save to the appropriate data structure for CRSNet to process them. For example, here is the code to create a list of arrays (used by TensorFlow and Keras):
image_list = []
with h5py.File('yourfile.h5','r') as h5f:
for i in range(h5f['images'].shape[0]):
image_list.append( h5f['images'][i,:] ) # gets slice [i,:,:,:]

Extract tiles from tiled TIFF and store in numpy array

My overall goal is to crop several regions from an input mirax (.mrxs) slide image to JPEG output files.
Here is what one of these images looks like:
Note that the darker grey area is part of the image, and the regions I ultimately wish to extract in JPEG format are the 3 black square regions.
Now, for the specifics:
I'm able to extract the color channels from the mirax image into 3 separate TIFF files using vips on the command line:
vips extract_band INPUT.mrxs OUTPUT.tiff[tile,compression=jpeg] C --n 1
Where C corresponds to the channel number (0-2), and each output file is about 250 MB in size.
The next job is to somehow recognize and extract the regions of interest from the images, so I turned to several python imaging libraries, and this is where I encountered difficulties.
When I try to load any of the TIFFs using OpenCV using:
i = cv2.imread('/home/user/input_img.tiff',cv2.IMREAD_ANYDEPTH)
I get an error error: (-211) The total matrix size does not fit to "size_t" type in function setSize
I managed to get a little more traction with Pillow, by doing:
from PIL import Image
tiff = Image.open('/home/user/input_img.tiff')
print len(tiff.tile)
print tiff.tile[0]
print tiff.info
which outputs:
636633
('jpeg', (0, 0, 128, 128), 8, ('L', ''))
{'compression': 'jpeg', 'dpi': (25.4, 25.4)}
However, beyond loading the image, I can't seem to perform any useful operations; for example doing tiff.tostring() results in a MemoryError (I do this in an attempt to convert the PIL object to a numpy array) I'm not sure this operation is even valid given the existence of tiles.
From my limited understanding, these TIFFs store the image data in 'tiles' (of which the above image contains 636633) in a JPEG-compressed format.
It's not clear to me, however, how would one would extract these tiles for use as regular JPEG images, or even whether the sequence of steps in the above process I outlined is a potentially useful way of accomplishing the overall goal of extracting the ROIs from the mirax image.
If I'm on the right track, then some guidance would be appreciated, or, if there's another way to accomplish my goal using vips/openslide without python I would be interested in hearing ideas. Additionally, more information about how I could deal with or understand the TIFF files I described would also be helpful.
The ideal situations would include:
1) Some kind of autocropping feature in vips/openslide which can generate JPEGs from either the TIFFs or original mirax image, along the lines of what the following command does, but without generated tens of thousands of images:
vips dzsave CMU-1.mrxs[autocrop] pyramid
2) Being able to extract tiles from the TIFFs and store the data corresponding to the image region as a numpy array in order to detect the 3 ROIs using OpenCV or another methd.

I would use the vips Python binding, it's very like PIL but can handle these huge images. Try something like:
from gi.repository import Vips
slide = Vips.Image.new_from_file(sys.argv[1])
tile = slide.extract_area(left, top, width, height)
tile.write_to_file(sys.argv[2])
You can also extract areas on the command-line, of course:
$ vips extract_area INPUT.mrxs OUTPUT.tiff left top width height
Though that will be a little slower than a loop in Python. You can use crop as a synonym for extract_area.
openslide attaches a lot of metadata to the image describing the layout and position of the various subimages. Try:
$ vipsheader -a myslide.mrxs
And have a look through the output. You might be able to calculate the position of your subimages from that. I would also ask on the openslide mailing list, they are very expert and very helpful.
One more thing you could try: get a low-res overview, corner-detect on that, then extract the tiles from the high-res image. To get a low-res version of your slide, try:
$ vips copy myslide.mrxs[level=7] overview.tif
Level 7 is downsampled by 2 ** 7, so 128x.

Can you reduce memory consumption by ReportLab when embedding very large images, or is there a Python PDF toolkit that can?

Right now reportlab is making PDFs most of the time. However when one file gets several large images (125 files with a total on disk size of 7MB), we end up running out of memory and crashing trying to build a PDF that should ultimately be smaller than 39MB. The problem stems from:
elif mode not in ('L','RGB','CMYK'):
im = im.convert('RGB')
self.mode = 'RGB'
Where nice b&w (bitonal) images are converted to RGB and when you have images with sizes in the 2595x3000, they consume a lot of memory. (Not sure why they consume 2GB, but that point is moot. When we add them to reportlab our entire python memory footprint is about 50MB, when we call
doc.build(elements, canvasmaker=canvasmaker)
Memory usage skyrockets as we go from bitonal PNGs to RGB and then render them onto the page.
While I try to see if I can figure out how to inject bitonal images into reportlab PDFs, I thought I would see if anyone else had an idea of how to fix this problem either in reportlab or with another tool.
We have a working PDF maker using PODOFO in C++, one of my possible solutions is to write a script/outline for that tool that will simply generate the PDF in a subprocess and then return that via a file or stdout.

Short of redoing PIL you are out of luck. The Images are converted internally in PIL to 24 bit color TIFs. This is not something you can easily change.
We switched to Podofo and generate the PDF outside of python.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Loading WSI slide images to python and converting images to array - python

Related

Convert a .jpg image to rgb in micropython?

Compressing PIL image without saving the file

How to extract individual JPEG images from a HDF5 file

Extract tiles from tiled TIFF and store in numpy array

Can you reduce memory consumption by ReportLab when embedding very large images, or is there a Python PDF toolkit that can?

Categories

Resources