Get the Values of an image in Python - python

I have a GeoTIFF and I need to get the values of each pixel.
I proceeded this way :
import gdal
from gdalconst import *
im = gdal.Open("test.tif", GA_ReadOnly)
band = im.GetRasterBand(1)
bandtype = gdal.GetDataTypeName(band.DataType)
scanline = band.ReadRaster( 0, 0, band.XSize, 1,band.XSize, 1, band.DataType)
scanline contains uninterpretable values :
>>> scanline
'\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19
\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\x19\xfc\
x19\xfc\x19\xfc\x19...
I need to convert this data to readable values.
In other words, I need to get the values of the image in order to count the number of pixels having values greater than a specified threshold.

Use ReadAsArray instead.
# for float data type (numpy.float was removed from recent NumPy releases; use numpy.float64)
scanline = band.ReadAsArray(0, 0, band.XSize, band.YSize).astype(numpy.float64)
refer to website : link

From the gdal tutorial, "Note that the returned scanline is of type string, and contains xsize*4 bytes of raw binary floating point data. This can be converted to Python values using the struct module from the standard library:"
import struct
tuple_of_floats = struct.unpack('f' * band.XSize, scanline)
Alternatively, depending on what you are ultimately trying to do with the data, you could read it in as an array (which opens the door to using numpy for computations).
from osgeo import gdal  # plain "import gdal" on older GDAL versions
im = gdal.Open("test.tif", gdal.GA_ReadOnly)
data_array = im.GetRasterBand(1).ReadAsArray()
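The struct format has to match the band's data type. The scanline in the question repeats a two-byte pattern, which suggests (this is an assumption, not something the question states) a 16-bit signed band rather than 32-bit floats. A minimal sketch of decoding those bytes and counting the pixels above a threshold, using only the standard library; `count_above` is a hypothetical helper name:

```python
import struct

# The first bytes of the scanline shown in the question.
scanline = b"\x19\xfc" * 4

# Assumption: 2 bytes per pixel, signed, little-endian (GDAL's Int16).
n_pixels = len(scanline) // 2
values = struct.unpack("<%dh" % n_pixels, scanline)
print(values)


def count_above(values, threshold):
    """Count how many pixel values are strictly greater than threshold."""
    return sum(1 for v in values if v > threshold)


print(count_above(values, -1000))
```

With the NumPy array returned by ReadAsArray, the same count can be written as `(data_array > threshold).sum()`.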

Related

Reading "a flat, binary array of 16-bit signed, little-endian (LSB) integers" from file in python

I'm trying to read an old file of snow data from here, but I'm having a ton of trouble just opening a single file and getting data out. In the user guide, it says "Each monthly binary data file with the file extension ".NSIDC8" contains a flat, binary array of 16-bit signed, little-endian (LSB) integers, 721 columns by 721 rows (row-major order, i.e. the top row of the array comprises the first 721 values in the file, etc.)." The data is 20 to 50 years old, so there's not much coding documentation.
If I just open the file and run readlines, with this code:
with open(os.path.join(folder,file), 'rb') as f:
    # contents = f.read()
    lines = f.readlines()
I get something looking like this:
\x00P\x00#\x00\x19\x00\x13\x00C\x00F\x00\x11\x00\r\x00:\x00.\x00\x02
If I use np.load(), the results are number like: -6.85682214e+304
I imagine I need to use the struct package and the unpack function, but I have no idea what format to use, and my attempts are not getting reasonable answers. For instance, I've tried just reading the first four bytes and using '<i' as the format, as shown in the code below:
with open(os.path.join(folder,file), 'rb') as f:
    print(struct.unpack('<i', f.read(4)))
And the print statement showed (-13041864,), which doesn't make sense. Any insights would be greatly appreciated
You can unpack the data 16 bits at a time and specify this in your unpack format string. You're using <i, which wants 4 bytes. The data is in 16 bit numbers, which wants 2 bytes. Instead, use <h.
For example,
# I chose a random file from their setup
with open("NL198303.v01.NSIDC8", "rb") as dfile:
    print(struct.unpack("<h", dfile.read(2)))
# prints (-200,); -200 is a "fixed value for corners" according to their docs
Here, h means "signed short".
I looked at several random locations in the file and only saw -200 and -250, corresponding to some sort of fixed boundary and ocean spots. Presumably there are other values somewhere, but I didn't look.
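To go from a single value to the whole grid, the same format character can be repeated for every value in the file. The sketch below is one way to do that with the standard library; `unpack_int16_grid` is a hypothetical helper name, and the tiny 2x3 buffer stands in for the real 721x721 file so the example runs without the data.

```python
import struct


def unpack_int16_grid(raw, rows, cols):
    """Decode a flat little-endian int16 buffer into a list of rows."""
    expected = rows * cols * 2  # 2 bytes per value
    if len(raw) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(raw)))
    flat = struct.unpack("<%dh" % (rows * cols), raw)
    # Row-major order: the first `cols` values form the top row.
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


# Tiny synthetic 2x3 grid standing in for the real 721x721 file.
raw = struct.pack("<6h", -200, -250, 0, 1, 2, 300)
grid = unpack_int16_grid(raw, rows=2, cols=3)
print(grid)
```

For a real file, pass the result of `open(path, "rb").read()` with `rows=721, cols=721`. If NumPy is available, `np.fromfile(path, dtype="<i2").reshape(721, 721)` should give the same grid as an array.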
If your output is a str, you can translate it to a bytes object (with .encode()); a bytes object can then be converted to a list. Note that this gives the individual byte values, not the 16-bit integers:
dat = b'\x00P\x00#\x00\x19\x00\x13\x00C\x00F\x00\x11\x00\r\x00:\x00.\x00\x02'
lst = list(dat)
print(lst)
>>> [0, 80, 0, 35, 0, 25, 0, 19, 0, 67, 0, 70, 0, 17, 0, 13, 0, 58, 0, 46, 0, 2]

ValueError: could not convert string to float: '2,3972E-7'---loadtxt (numpy)

This is some sample from large txt file: [0, 0, 0, 2.3972E-7, 2.3972E-6, 1.23, 100.5, 1000.78, 2012.99] and I get ValueError: could not convert string to float: '2,3972E-7'. Here is code:
# read the data sample
W_data = open("power.txt").read().split()
W_data1 = np.array(W_data).astype('float64')
In [22]: a = [0, 0, 0, 2.3972E-7, 2.3972E-6, 1.23, 100.5, 1000.78, 2012.99]
In [25]: np.array(a).astype(np.float64)
Out[25]: array([0.00000e+00, 0.00000e+00, 0.00000e+00, 2.39720e-07, 2.39720e-06,
1.23000e+00, 1.00500e+02, 1.00078e+03, 2.01299e+03])
One thing that doesn't match up here is that the error message uses a decimal comma (,) while the given sample uses a decimal point (.). This typically occurs when the locale field LC_NUMERIC has been set inappropriately or misapplied. Your code also doesn't support the sample format, so this difference is probably stored in the file, which appears to be a whitespace-separated list of numbers.
Two ways to parse this:
import numpy as np
import locale
# Simply replace the commas with periods
numstrings = open("power.txt").read().replace(',', '.').split()
nums = np.array(numstrings).astype('float64')
# Parse according to a locale that uses decimal comma
locale.setlocale(locale.LC_NUMERIC, 'sv_SE')
numstrings = open("power.txt").read().split()
nums = np.array(list(map(locale.atof, numstrings)))
Your default locale may well work, which you can set using locale.setlocale(locale.LC_ALL, '') (or locale.resetlocale() on Python versions before 3.13, where it was removed). Locale may also be set by default, but some functions such as numpy's astype do not use it.
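The replace-based approach can also be applied token by token, which avoids touching the process-wide locale. A minimal sketch, assuming the file is whitespace-separated and contains no thousands separators; `parse_decimal_comma` is a hypothetical helper name and the sample string is made up from the values in the question:

```python
def parse_decimal_comma(text):
    """Parse whitespace-separated numbers that may use ',' as the decimal mark."""
    return [float(token.replace(",", ".")) for token in text.split()]


sample = "0 0 0 2,3972E-7 2,3972E-6 1,23 100,5 1000,78 2012,99"
values = parse_decimal_comma(sample)
print(values)
```

The resulting list can be passed straight to `np.array(values)` if an array is needed.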

Python Pillow Image.frombytes mode '1' bad result

Where am I going wrong? I want to create a basic white picture from bytes.
from PIL import Image
if __name__ == "__main__":
    data = [chr(1)] * 8192
    data = "".join(data)
    im = Image.frombytes('1', (128,64), data, 'raw')
    im = im.convert("RGB")
    im.save("image.png", "PNG")
But I get this:
Just use Image.new instead:
im = Image.new(mode='RGB', size=(128,64), color=(255,255,255))
If you really want to make it from bytes, it would be like this:
Image.frombytes(mode='RGB', size=(128,64), data=b'\xff'*128*64*3)
edit: Image.frombytes expects bytes, not a list of integers. To convert a list of integers to the right type, use this:
>>> bytes([0,1,2]) # Python 3
b'\x00\x01\x02'
>>> bytes(bytearray([0,1,2])) # Python 2
'\x00\x01\x02'
edit 2: Either mode='1' or the docs have a bug (see comment thread). Assuming you have a list of zeros and ones, 8192 elements long (128*64, which packs into 1024 bytes), and you want to convert this to a 128x64 monochromatic image (one bit per pixel), then you'll have to pack the bits into bytes manually:
bits = [int(not (y%13 and x%7)) for x in range(64) for y in range(128)]
# asymmetric grid
octets = [bits[i:i+8] for i in range(0, len(bits), 8)]
def bits2byte(bits8):
    result = 0
    for bit in bits8:
        result <<= 1
        result |= bit
    return result
data = bytes(bytearray([bits2byte(octet) for octet in octets]))
im = Image.frombytes(mode='1', size=(128,64), data=data)
im.show()
Result:
In mode 1 each byte represents 8 pixels (there might be zero padding at the end of each row if the width is not divisible by 8). So to get a white image, you have to pass in only the byte b'\xff':
data = b'\xff' * 1024
im = Image.frombytes('1', (128,64), data)
Even if the Pillow docs say that there's one pixel per byte in this mode, that is not true for the frombytes and tobytes methods, at least.
Any other repeating input other than \xff (all white) or \x00 (all black) will give some sort of pinstripe pattern, like the one in your question.
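The packing described above (8 pixels per byte, leftmost pixel in the most significant bit, rows padded to whole bytes) can be sketched without Pillow. `pack_rows` is a hypothetical helper name; its output is meant to be passed as the data argument of Image.frombytes('1', size, data).

```python
def pack_rows(bits, width):
    """Pack a flat list of 0/1 pixels into bytes, MSB first.

    Each row is padded with zero bits to a whole number of bytes.
    """
    out = bytearray()
    for start in range(0, len(bits), width):
        row = bits[start:start + width]
        row = row + [0] * (-len(row) % 8)  # pad the row to a multiple of 8 bits
        for i in range(0, len(row), 8):
            byte = 0
            for bit in row[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
    return bytes(out)


white = pack_rows([1] * (128 * 64), width=128)
print(len(white))
```

An all-ones input of 8192 pixels gives 1024 bytes of 0xff, matching the white image above.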

reorder byte order in hex string (python)

I want to build a small formatter in Python that gives me back the numeric values embedded in lines of hex strings.
It is a central part of my formatter and should be reasonably fast, able to format more than 100 lines/sec (each line about 100 chars).
The code below gives an example of where I'm currently blocked.
'data_string_in_orig' shows the given input format. It has to be byte swapped for each word. The swap from 'data_string_in_orig' to 'data_string_in_swapped' is needed. In the end I need the structure access as shown. The expected result is within the comment.
Thanks in advance
Wolfgang R
#!/usr/bin/python
import binascii
import struct
## 'uint32 double'
data_string_in_orig = 'b62e000052e366667a66408d'
data_string_in_swapped = '2eb60000e3526666667a8d40'
print data_string_in_orig
packed_data = binascii.unhexlify(data_string_in_swapped)
s = struct.Struct('<Id')
unpacked_data = s.unpack_from(packed_data, 0)
print 'Unpacked Values:', unpacked_data
## Unpacked Values: (46638, 943.29999999943209)
exit(0)
array.arrays have a byteswap method:
import binascii
import struct
import array
x = binascii.unhexlify('b62e000052e366667a66408d')
y = array.array('h', x)
y.byteswap()
s = struct.Struct('<Id')
print(s.unpack_from(y))
# (46638, 943.2999999994321)
The h in array.array('h', x) was chosen because it tells array.array to regard the data in x as an array of 2-byte shorts. The important thing is that each item be regarded as being 2-bytes long. H, which signifies 2-byte unsigned short, works just as well.
This should do exactly what unutbu's version does, but might be slightly easier to follow for some...
from binascii import unhexlify
from struct import pack, unpack
orig = unhexlify('b62e000052e366667a66408d')
swapped = pack('<6h', *unpack('>6h', orig))
print(unpack('<Id', swapped))
# (46638, 943.2999999994321)
Basically, unpack 6 shorts big-endian, repack as 6 shorts little-endian.
Again, same thing that unutbu's code does, and you should use his.
edit Just realized I get to use my favorite Python idiom for this... Don't do this either:
orig = 'b62e000052e366667a66408d'
swap =''.join(sum([(c,d,a,b) for a,b,c,d in zip(*[iter(orig)]*4)], ()))
# '2eb60000e3526666667a8d40'
The swap from 'data_string_in_orig' to 'data_string_in_swapped' may also be done with comprehensions without using any imports:
>>> d = 'b62e000052e366667a66408d'
>>> "".join([m[2:4]+m[0:2] for m in [d[i:i+4] for i in range(0,len(d),4)]])
'2eb60000e3526666667a8d40'
The comprehension works for swapping byte order in hex strings representing 16-bit words. Modifying it for a different word-length is trivial. We can make a general hex digit order swap function also:
def swap_order(d, wsz=4, gsz=2):
    return "".join(["".join([m[i:i+gsz] for i in range(wsz-gsz,-gsz,-gsz)]) for m in [d[i:i+wsz] for i in range(0,len(d),wsz)]])
The input params are:
d : the input hex string
wsz: the word-size in nibbles (e.g for 16-bit words wsz=4, for 32-bit words wsz=8)
gsz: the number of nibbles which stay together (e.g for reordering bytes gsz=2, for reordering 16-bit words gsz = 4)
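The same reordering may be easier to follow when written as explicit loops. The sketch below is a hypothetical rewrite (`swap_hex_words` and its parameter names are not from the answers above) that assumes the string length is a multiple of the word size. It then decodes the swapped string with struct, as in the question.

```python
import struct


def swap_hex_words(hex_string, word_nibbles=4, group_nibbles=2):
    """Reverse the order of fixed-size groups inside each word of a hex string."""
    out = []
    for start in range(0, len(hex_string), word_nibbles):
        word = hex_string[start:start + word_nibbles]
        groups = [word[i:i + group_nibbles]
                  for i in range(0, len(word), group_nibbles)]
        out.append("".join(reversed(groups)))
    return "".join(out)


orig = "b62e000052e366667a66408d"
swapped = swap_hex_words(orig)
print(swapped)

# Decode as a little-endian uint32 followed by a double (no padding with '<').
number, value = struct.unpack("<Id", bytes.fromhex(swapped))
print(number, value)
```

With `word_nibbles=8, group_nibbles=2` the bytes of each 32-bit word are reversed, and with `word_nibbles=8, group_nibbles=4` the two 16-bit halves of each 32-bit word are exchanged.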
import array, binascii
from tkinter import filedialog
infile = filedialog.askopenfilename()
with open(infile, 'rb') as infile_:
    infile_read = infile_.read()
# 'i' is a 4-byte int on most platforms; 'l' can be 8 bytes on 64-bit systems.
# The file length must be a multiple of the item size.
y = array.array('i', infile_read)
y.byteswap()
swapped = binascii.hexlify(y)
This is a 32-bit swap that I achieved with code very similar to unutbu's answer. Technically, binascii is not needed for the swap itself; only array.byteswap is needed.

Reading bmp files in Python

Is there a way to read in a bmp file in Python that does not involve using PIL? PIL doesn't work with version 3, which is the one I have. I tried to use the Image object from graphics.py, Image(anchorPoint, filename), but that only seems to work with gif files.
In Python it can simply be read as follows (note that scipy.misc.imread was removed in SciPy 1.2, so this requires an older SciPy):
import os
from scipy import misc
path = 'your_file_path'
image= misc.imread(os.path.join(path,'image.bmp'), flatten= 0)
## flatten=0 if image is required as it is
## flatten=1 to flatten the color layers into a single gray-scale layer
I realize that this is an old question, but I found it when solving this problem myself and I figured that this might help someone else in the future.
It's actually pretty easy to read a BMP file as binary data, depending of course on how broad the support needs to be and how many corner cases you need to handle.
Below is a simple parser that ONLY works for 1920x1080 24-bit BMP's (like ones saved from MS Paint). It should be easy to extend though. It spits out the pixel values as a python list like (255, 0, 0, 255, 0, 0, ...) for a red image as an example.
If you need more robust support there's information on how to properly read the header in answers to this question: How to read bmp file header in python?. Using that information you should be able to extend the simple parser below with any features you need.
There's also more information on the BMP file format over at wikipedia https://en.wikipedia.org/wiki/BMP_file_format if you need it.
def read_rows(path):
    image_file = open(path, "rb")
    # Blindly skip the BMP header.
    image_file.seek(54)
    # We need to read pixels in as rows to later swap the order
    # since BMP stores pixels starting at the bottom left.
    rows = []
    row = []
    pixel_index = 0
    while True:
        if pixel_index == 1920:
            pixel_index = 0
            rows.insert(0, row)
            if len(row) != 1920 * 3:
                raise Exception("Row length is not 1920*3 but " + str(len(row)) + " / 3.0 = " + str(len(row) / 3.0))
            row = []
        pixel_index += 1
        # BMP stores each pixel as blue, green, red, so despite the names
        # r_string holds the blue byte and b_string the red byte.
        r_string = image_file.read(1)
        g_string = image_file.read(1)
        b_string = image_file.read(1)
        if len(r_string) == 0:
            # This is expected to happen when we've read everything.
            if len(rows) != 1080:
                print("Warning!!! Read to the end of the file at the correct sub-pixel (red) but we've not read 1080 rows!")
            break
        if len(g_string) == 0:
            print("Warning!!! Got 0 length string for green. Breaking.")
            break
        if len(b_string) == 0:
            print("Warning!!! Got 0 length string for blue. Breaking.")
            break
        r = ord(r_string)
        g = ord(g_string)
        b = ord(b_string)
        # Appending in this order yields red, green, blue in the output.
        row.append(b)
        row.append(g)
        row.append(r)
    image_file.close()
    return rows
def repack_sub_pixels(rows):
    print("Repacking pixels...")
    sub_pixels = []
    for row in rows:
        for sub_pixel in row:
            sub_pixels.append(sub_pixel)
    diff = len(sub_pixels) - 1920 * 1080 * 3
    print("Packed", len(sub_pixels), "sub-pixels.")
    if diff != 0:
        print("Error! Number of sub-pixels packed does not match 1920*1080: (" + str(len(sub_pixels)) + " - 1920 * 1080 * 3 = " + str(diff) + ").")
    return sub_pixels
rows = read_rows("my image.bmp")
# This list is raw sub-pixel values. A red image is for example (255, 0, 0, 255, 0, 0, ...).
sub_pixels = repack_sub_pixels(rows)
Use Pillow for this. After you have installed it, simply import it:
from PIL import Image
Then you can load the BMP file (a raw string keeps the backslash in a Windows path from being read as an escape such as \f):
img = Image.open(r'path_to_file\file.bmp')
If you need the image to be a numpy array, use np.array:
import numpy as np
img = np.array(Image.open(r'path_to_file\file.bmp'))
For an RGB image the array normally already has the shape (height, width, 3). If you end up with a flat array instead, use reshape() to bring it into the right shape. For example:
np.array(Image.open(r'path_to_file\file.bmp')).reshape(512,512,3)
I had to work on a project where I needed to read a BMP file using Python, and it was quite interesting. The best way is to review the BMP file format (https://en.wikipedia.org/wiki/BMP_file_format) and then read the file as binary to extract the data.
You will need to use the struct python library to perform the extraction
You can use this tutorial to see how it proceeds https://youtu.be/0Kwqdkhgbfw
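As a rough illustration of the struct-based approach, the sketch below parses the 14-byte file header and the first fields of the DIB header. It assumes the common 40-byte BITMAPINFOHEADER layout; other header variants exist and are not handled. `parse_bmp_header` is a hypothetical helper name, and the header bytes are synthesized so the example runs without an image file.

```python
import struct


def parse_bmp_header(data):
    """Parse the BMP file header and the start of a BITMAPINFOHEADER."""
    magic, file_size, _res1, _res2, pixel_offset = struct.unpack_from("<2sIHHI", data, 0)
    if magic != b"BM":
        raise ValueError("not a BMP file")
    dib_size, width, height, planes, bpp, compression = struct.unpack_from("<IiiHHI", data, 14)
    # Each stored row is padded to a multiple of 4 bytes.
    row_size = (bpp * width + 31) // 32 * 4
    return {
        "pixel_offset": pixel_offset,  # where the pixel data starts
        "width": width,
        "height": height,              # negative means rows are stored top-down
        "bits_per_pixel": bpp,
        "compression": compression,    # 0 means uncompressed
        "row_size": row_size,
    }


# Synthetic header for a 1920x1080, 24-bit, uncompressed image.
header = struct.pack("<2sIHHI", b"BM", 54 + 5760 * 1080, 0, 0, 54)
header += struct.pack("<IiiHHIIiiII", 40, 1920, 1080, 1, 24, 0, 5760 * 1080, 2835, 2835, 0, 0)
info = parse_bmp_header(header)
print(info)
```

For a real file, read the first 54 bytes with `open(path, "rb").read(54)` and seek to `pixel_offset` before reading rows of `row_size` bytes each.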
Use the excellent matplotlib library
import matplotlib.pyplot as plt
im = plt.imread('image.bmp')
It depends what you are trying to achieve and on which platform?
Anyway using a C library to load BMP may work e.g. http://code.google.com/p/libbmp/ or http://freeimage.sourceforge.net/, and C libraries can be easily called from python e.g. using ctypes or wrapping it as a python module.
or you can compile this version of PIL https://github.com/sloonz/pil-py3k
If you're doing this in Windows, this site, should allow you to get PIL (and many other popular packages) up and running with most versions of Python: Unofficial Windows Binaries for Python Extension Packages
The common port of PIL to Python 3.x is called "Pillow".
Also I would suggest pygame library for simple tasks. It is a library, full of features for creating games - and reading from some common image formats is among them. Works with Python 3.x as well.
