Compressing bitmap fails (Python implementation) - python

I have an application that uses bitmaps for image storage. Currently these are stored as uncompressed bitmaps, but since the number of colors is almost always <=16, I should be able to use a compressed bitmap format, which would make the files nearly 6 times smaller. With the help of the Wikipedia entry at https://en.wikipedia.org/wiki/BMP_file_format I tried to implement a compression program in Python - shown below. It successfully reads the input bmp and outputs a smaller bmp, but the resulting bmp is not readable, so I must be doing something wrong. In Windows Explorer, the properties of the file include "Bit Depth: 32", which is clearly not the intention - but I'm pretty sure I got the header fields for bit depth and compression right. It would help if I could find an example of a compressed bitmap for inspection, but I haven't been able to track one down. Is this format not generally supported, or am I doing something wrong?
import math

def rgb(r, g, b):
    return hex(256*256*256 + r*256*256 + g*256 + b)[3:9]

def read_bmp(path):
    bm = {}
    with open(path, "rb") as f:
        byte = f.read(54)                      # BITMAPFILEHEADER (14) + BITMAPINFOHEADER (40)
        header_size = byte[14] + byte[15]*256
        if header_size != 40: return None      # only plain BITMAPINFOHEADER is handled
        bm['w'] = w = byte[18] + byte[19]*256
        bm['h'] = byte[22] + byte[23]*256
        bm['color_depth'] = byte[28]
        bm['compression'] = byte[30]
        bm['im_size'] = byte[34] + byte[35]*256 + byte[36]*256*256
        if bm['compression'] != 0:
            ct = f.read(4*bm['color_depth'])
            color_table = []
            for i in range(0, bm['color_depth']):
                color_table.append((ct[i*4], ct[i*4+1], ct[i*4+2]))
        if bm['color_depth'] == 24:
            bmp = f.read(bm['im_size'])
            row_size = bm['im_size']//bm['h']
            col = {}
            pixel = []
            for y in range(0, bm['h']):
                s = ""
                row = []
                for x in range(0, w):
                    row.append((bmp[y*row_size+x*3], bmp[y*row_size+x*3+1], bmp[y*row_size+x*3+2]))
                    color = rgb(bmp[y*row_size+x*3], bmp[y*row_size+x*3+1], bmp[y*row_size+x*3+2])
                    if color in col:
                        col[color] += 1
                    else:
                        col[color] = 1
                pixel.append(row)
            bm['pix'] = pixel
            bm['color_count'] = len(col)
    return bm
def putint(b, p, i):
    b[p] = i & 255
    b[p+1] = (i >> 8) & 255
    if i > 65535:
        b[p+2] = (i >> 16) & 255
        b[p+3] = (i >> 24) & 255
def write_compressed_bmp(bm, path):
    pix = bm['pix']
    col = []
    color_index = []
    for row in pix:
        rowindex = []
        for color in row:
            if not color in col: col.append(color)
            rowindex.append(col.index(color))
        color_index.append(rowindex)
    print(col)
    if len(col) > 16: return
    with open(path, "wb") as f:
        row_size = math.ceil(bm['w']/8)*4      # 4-bit rows, padded to a multiple of 4 bytes
        image_size = row_size*bm['h']
        file_len = 54 + 64 + image_size        # headers + 16-entry palette + pixel data
        header = bytearray(54)
        header[0] = ord('B')
        header[1] = ord('M')
        putint(header, 2, file_len)
        putint(header, 10, 54+64)              # offset of the pixel data
        putint(header, 14, 40)                 # BITMAPINFOHEADER size
        putint(header, 18, bm['w'])
        putint(header, 22, bm['h'])
        putint(header, 26, 1)                  # planes
        putint(header, 28, 4)                  # bits per pixel
        putint(header, 34, image_size)
        f.write(header)
        color_map = bytearray(64)
        for i in range(0, len(col)):
            color_map[i*4] = col[i][0]
            color_map[i*4+1] = col[i][1]
            color_map[i*4+2] = col[i][2]
        f.write(color_map)
        for row in pix:
            row_bytes = bytearray(row_size)
            i = 0
            for color in row:
                ci = col.index(color)
                if i & 1:
                    row_bytes[i >> 1] |= ci
                else:
                    row_bytes[i >> 1] |= ci << 4
                i += 1
            f.write(row_bytes)

bm = read_bmp("bitmap.bmp")
if bm['color_count'] <= 16:
    write_compressed_bmp(bm, "bitmap2.bmp")

I found out what was wrong - I had forgotten that Python's range() function excludes its stop value, so one too few rows was being written. Fixing that, and leaving the "compression" field at 0, fixed the problem - I edited the code above accordingly. If it's OK, I'll leave this up here for possible use by others.
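In case it helps anyone check their own output: a quick sanity check (a minimal sketch, assuming Pillow is installed and the output file is named bitmap2.bmp as above) is to open the result and look at the reported mode and palette use:
from PIL import Image

img = Image.open("bitmap2.bmp")
print(img.mode)               # a valid 4-bit BMP should come back as "P" (palettized)
print(img.size)
print(len(img.getcolors()))   # number of distinct palette indices actually used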


np.savetxt isn't storing array information

I have the following code:
import numpy as np

# Time integration
T = 28
AT = 5/(1440)
L = T/AT
tr = np.linspace(AT, T, AT)  # I set minimum_value to AT, to avoid a DivisionByZero error (in function beta_cc(i))
np.savetxt('tiempo_real.csv', tr, delimiter=",")
# Parameters
fcm28 = 40
beta_cc = 0
fcm = 0
s = 0
# Hardening coefficient (s)
ct = input("Cement Type (1, 2 or 3): ")
print("Cement Type: " + str(ct))
if int(ct) == 1:
    s = 0.2
elif int(ct) == 2:
    s = 0.25
elif int(ct) == 3:
    s = 0.38
else:
    print("Invalid answer")
# fcm determination
iter = 1
maxiter = 8065
while iter < maxiter:
    iter += 1
    beta_cc = np.exp(s*(1 - (28/tr))**0.5)
    fcm = beta_cc*fcm28
    np.savetxt('Fcm_Results.csv', fcm, delimiter=",")
The code runs without errors, and it creates the two desired files, but no information is stored in either of them.
What I would like np.savetxt to do is create a .csv file with the result of fcm at every iteration (so a 1×8064 array).
Instead of the while loop, I had previously tried using a for loop, but as the timestep is a float, I had some problems with it.
Thank you very much.
PS. Not sure if I should mention it: I used Python 3 on Ubuntu.
If anyone has the same issue: I solved this by changing the loop to a for loop, appending the iterative values of the functions (beta_cc & fcm) to a list, and using the savetxt command.
# tr, AT, s and fcm28 are defined as in the question above
iteration = 0
maxiteration = 8064
fcmM1 = []
tiemporeal = []
for i in range(iteration, maxiteration):
    def beta_cc(i):
        return np.exp(s*(1 - (28/tr)**0.5))
    def fcm(i):
        return beta_cc(i)*fcm28
    tr = tr + AT
    fcmM1.append(fcm(i))
    tiemporeal.append(tr)
np.savetxt('M1_Resultados_fcm.csv', fcmM1, delimiter=",", header="Fcm", fmt="%s")
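As an aside (this is not what the answer above does), the loop can be avoided entirely by letting NumPy evaluate the whole time vector at once; a minimal sketch, assuming the same T, AT, s and fcm28 values as in the question:
import numpy as np

T, AT = 28, 5/1440
s, fcm28 = 0.25, 40                      # example values; s depends on the cement type chosen
tr = np.arange(AT, T + AT, AT)           # all ~8064 time steps at once, starting at AT to avoid dividing by zero
beta_cc = np.exp(s*(1 - (28/tr)**0.5))   # same hardening expression as in the fixed answer above
fcm = beta_cc*fcm28
np.savetxt('M1_Resultados_fcm.csv', np.column_stack((tr, fcm)), delimiter=",", header="t,Fcm", fmt="%s")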

What is the fastest way to sort and unpack a large bytearray?

I have a large binary file that needs to be converted into the HDF5 file format.
I am using Python 3.6. My idea is to read in the file, sort the relevant information, unpack it and store it away. My information is stored so that an 8-byte timestamp is followed by 2 bytes of energy and then 2 bytes of extra information, then the next timestamp, and so on. My current way of doing it is the following (my information is read as a bytearray, with the name byte_array):
for i in range(0, len(byte_array)+1, 12):
    if i == 0:
        timestamp_bytes = byte_array[i:i+8]
        energy_bytes = byte_array[i+8:i+10]
        extras_bytes = byte_array[i+10:i+12]
    else:
        timestamp_bytes += byte_array[i:i+8]
        energy_bytes += byte_array[i+8:i+10]
        extras_bytes += byte_array[i+10:i+12]

timestamp_array = np.ndarray((len(timestamp_bytes)//8,), '<Q', timestamp_bytes)
energy_array = np.ndarray((len(energy_bytes)//2,), '<h', energy_bytes)
extras_array = np.ndarray((len(timestamp_bytes)//8,), '<H', extras_bytes)
I assume there is a much faster way of doing this, maybe by avoiding looping over the whole thing. My files are up to 15GB in size, so every bit of improvement would help a lot.
You should be able to just tell NumPy to interpret the data as a structured array and extract fields:
import numpy

as_structured = numpy.ndarray(shape=(len(byte_array)//12,),
                              dtype='<Q, <h, <H',
                              buffer=byte_array)

timestamps = as_structured['f0']
energies = as_structured['f1']
extras = as_structured['f2']
This will produce three arrays backed by the input bytearray. Creating these arrays should be effectively instant, but I can't guarantee that working with them will be fast - I think NumPy may need to do some implicit copying to handle alignment issues with these arrays. It's possible (I don't know) that explicitly copying them yourself with .copy() first might speed things up.
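For example, just to illustrate the .copy() suggestion (nothing beyond that):
timestamps = as_structured['f0'].copy()   # contiguous, independent copy of the timestamp field
energies = as_structured['f1'].copy()
extras = as_structured['f2'].copy()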
You can use numpy.frombuffer with a custom datatype:
import struct
import random
import numpy as np

data = [
    (random.randint(0, 255**8), random.randint(0, 255*255), random.randint(0, 255*255))
    for _ in range(20)
]

Bytes = b''.join(struct.pack('<Q2H', *row) for row in data)

dtype = np.dtype([('time', np.uint64),
                  ('energy', np.uint16),  # you may need to change that to `np.int16`, if energy can be negative
                  ('extras', np.uint16)])

original = np.array(data, dtype=np.uint64)
result = np.frombuffer(Bytes, dtype)

print((result['time'] == original[:, 0]).all())
print((result['energy'] == original[:, 1]).all())
print((result['extras'] == original[:, 2]).all())
print(result)
Example output:
True
True
True
[(6048800706604665320, 52635, 291) (8427097887613035313, 15520, 4976)
(3250665110135380002, 44078, 63748) (17867295175506485743, 53323, 293)
(7840430102298790024, 38161, 27601) (15927595121394361471, 47152, 40296)
(8882783920163363834, 3480, 46666) (15102082728995819558, 25348, 3492)
(14964201209703818097, 60557, 4445) (11285466269736808083, 64496, 52086)
(6776526382025956941, 63096, 57267) (5265981349217761773, 19503, 32500)
(16839331389597634577, 49067, 46000) (16893396755393998689, 31922, 14228)
(15428810261434211689, 32003, 61458) (5502680334984414629, 59013, 42330)
(6325789410021178213, 25515, 49850) (6328332306678721373, 59019, 64106)
(3222979511295721944, 26445, 37703) (4490370317582410310, 52413, 25364)]
I'm not an expert on numpy, but here are my five cents:
You have lots of data, probably more than fits in your RAM.
This points to the simplest solution - don't try to fit all the data in your program.
When you read a file into a variable, those X GB are read into RAM. If that is more than the available RAM, your OS starts swapping. Swapping slows you down: not only do you have disk reads from the source file, you now also have disk writes to dump RAM contents into the swap file.
Instead of that, write the script so that it uses parts of the input file only as necessary (in your case you read the file sequentially anyway and don't go back or jump far ahead).
Try opening the input file as a memory-mapped data structure (please note the differences in usage between Unix and Windows environments).
Then you can simply read(n) bytes at a time and append them to your arrays.
Behind the scenes, the data is read into RAM page by page as needed and will not exceed the available memory, which also leaves more room for your arrays to grow.
Also consider that the resulting arrays can outgrow RAM as well, which will cause a slowdown similar to reading a big file.
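A minimal sketch of the memory-mapping idea (not from the answer above; it assumes the 12-byte record layout from the question and a hypothetical file name data.bin), using numpy.memmap so that only the pages you actually touch are pulled into RAM:
import numpy as np

record = np.dtype([('time', '<u8'), ('energy', '<u2'), ('extras', '<u2')])  # 12 bytes per record

# Map the file instead of reading it; nothing is loaded until a slice is accessed.
records = np.memmap('data.bin', dtype=record, mode='r')

chunk = 10_000_000  # records per chunk; tune this to your RAM
for start in range(0, len(records), chunk):
    part = records[start:start + chunk]
    timestamps = part['time']
    energies = part['energy']
    extras = part['extras']
    # ... write this chunk to the HDF5 file here ...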

Python input/output optimisation

I think this code takes too long to execute, so maybe there are better ways to do this. I'm not looking for an answer related to parallelising the for loops, or using more than one processor.
What I'm trying to do is to read values from "file" using "np.genfromtxt(file)". I have 209*500*16 of these files. For each of the 16 groups, and for each of the 500 z values, I want to extract the minimum of the highest 1000 values gathered over the 209-file loop, and put these 500 values into 16 different files. If a file is missing or the data doesn't have the adequate size, the info is written to the "missing_all" file.
The questions are:
Is this the best method to open a file?
Is this the best method to write to files?
How can I make this code faster?
Code:
import numpy as np
import os.path

output_filename2 = '/home/missing_all.txt'
target2 = open(output_filename2, 'w')

for w in range(16):
    group = 1200 + 50*w
    output_filename = '/home/veto_%s.txt' % (group)
    target = open(output_filename, 'w')
    for z in range(1, 501):
        sig_b = np.zeros((209*300))
        y = 0
        for index in range(1, 210):
            file = '/home/BandNo_%s_%s/%s_209.dat' % (group, z, index)
            if not os.path.isfile(file):
                sig_b[y:y+300] = 0
                y = y + 300
                target2.write('%s %s %s\n' % (group, z, index))
                continue
            data = np.genfromtxt(file)
            if (data.shape[0] < 300):
                sig_b[y:y+300] = 0
                y = y + 300
                target2.write('%s %s %s\n' % (group, z, index))
                continue
            sig_b[y:y+300] = np.sort(data[:, 4])[::-1][0:300]
            y = y + 300
        sig_b = np.sort(sig_b[:])[::-1][0:1000]
        target.write('%s\n' % (sig_b[-1]))
Profiler
You can use a profiler to figure out which parts of your script take the most time. This way you know exactly what takes the most time and can optimize those lines instead of blindly trying to optimize your code. The time invested in figuring out how the profiler works will easily pay for itself later on.
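For instance (a generic sketch, not tied to this particular script; process_all is a hypothetical wrapper around your loops):
# Easiest: run the whole script under the profiler from the shell:
#     python -m cProfile -s cumulative your_script.py
# Or profile a specific call from within Python:
import cProfile
import pstats

cProfile.run('process_all()', 'profile.out')
pstats.Stats('profile.out').sort_stats('cumulative').print_stats(20)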
Some possible slow-downs
Here are some guesses, but they really are only guesses.
You open() only 17 files, so it probably doesn't matter how exactly you do this.
I don't know much about writing-performance. Using file.write() seems fine to me.
genfromtxt probably takes quite a while (depending on your input files); is loadtxt an alternative for you? The docs state you can use it for data without missing values.
Using a binary file format instead of text could speed up reading the file.
You sort your array on every iteration. Is there a way to sort it only at the end?
Usually asking the file system something is not very fast, i.e. os.path.isfile(file) is potentially slow. You could try creating a dict of all the children of the parent directory and use that cached version (there is a sketch of this after the exception example below).
Similarly, if most of your files exist, using exceptions can be faster:
try:
    data = np.genfromtxt(file)
except FileNotFoundError:  # not sure if this is the correct exception
    sig_b[y:y+300] = 0
    y += 300
    target2.write('%s %s %s\n' % (group, z, index))
    continue
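Here is a hedged sketch of the cached-listing idea mentioned above (it assumes the directory layout from the question; adjust paths as needed):
import os

existing = {}
for w in range(16):
    for z in range(1, 501):
        group = 1200 + 50*w
        directory = '/home/BandNo_%s_%s' % (group, z)
        try:
            existing[(group, z)] = set(os.listdir(directory))   # one filesystem call per directory
        except FileNotFoundError:
            existing[(group, z)] = set()

# later, instead of os.path.isfile(file):
# if '%s_209.dat' % index in existing[(group, z)]:
#     ...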
I didn't try to understand your code in detail. Maybe you can reduce the necessary work by using a smarter algorithm?
PS: I like that you try to put all equal signs on the same column. Unfortunately here it makes it harder to read your code.

python's putpixel() isn't working

So I was working on this school project (I know only really basic programming, and Python is the only language I know) where I need to change pixel colours to encode a message in a picture, but PIL's putpixel doesn't seem to be working. Here is my code.
P.S.: all my PIL knowledge is self-taught and English isn't my first language, so I'd be grateful if you could keep it simple.
from PIL import Image

e = input('file and location? ')
img = Image.open(e)
pmap = img.load()
imy = img.height
imx = img.width
if int(input('1 for encoding, 2 for decoding ')) == 1:
    a = input('Your message? ')
    for i in range(len(a)):
        r, g, b = img.getpixel((i+10, imy//2))
        img.putpixel((i+10, imy//2), (ord(a[i]), g, b))
    r, g, b = img.getpixel((len(a)+10, imy//2))
    img.putpixel((len(a)+10, imy//2), (999, g, b))  # 999 is the stop code in decoding
else:
    r = u = 0
    m = ''
    while r != 999:
        r, g, b = img.getpixel((10+u, imy//2))
        m += chr(r)
        u += 1
    print(m[:len(a)-1])
img.save(e)
Please bear in mind that I'm not trying to make a visible difference, and I've already done some debugging. There are also no errors; putpixel just isn't working for some reason.
As I said, I'm new to programming, so sorry if this includes stupid mistakes.
After using your code and trying it out on an image, putpixel is working as expected. The change in the pixels is very hard to see and that may be why you believe that it isn't working. Believe me, it is working, you just can't see it.
However, there are two problems I see with your code:
1) 999 is not encodable
999 cannot be encoded in a single channel of a pixel. The maximum value for a channel is 255 (the range is 0-255). You need to choose a different stop code/sequence. I recommend changing the stop code to 255.
2) When decoding, a has never been defined
You need to get the length of the message by another means. I suggest doing this with a counter:
counter = 0
while something:
    counter += 1
    # do something with counter here
All in all, a working version of your code would look like:
from PIL import Image

e = input('file and location? ')
img = Image.open(e)
pmap = img.load()
imy = img.height
imx = img.width
if int(input('1 for encoding, 2 for decoding ')) == 1:
    a = input('Your message? ')
    for i in range(len(a)):
        r, g, b = img.getpixel((i+10, imy//2))
        img.putpixel((i+10, imy//2), (ord(a[i]), g, b))
    r, g, b = img.getpixel((len(a)+10, imy//2))
    img.putpixel((len(a)+10, imy//2), (255, g, b))  # 255 is the stop code in decoding
else:
    r = u = 0
    m = ''
    message_length = 0
    while r != 255:
        message_length += 1
        r, g, b = img.getpixel((10+u, imy//2))
        m += chr(r)
        u += 1
    print(m[:message_length-1])
img.save(e)
The difference is there, but it's just a few single pixels. If I calculate the difference between the original and the new image, you can see it at the middle left; it is stored in test2.png. To enhance the contrast I have "equalized" the image.
from PIL import Image, ImageChops, ImageOps

img = Image.open("image.jpg")
pmap = img.load()
img2 = img.copy()
imy = img.height
imx = img.width
if int(input('1 for encoding, 2 for decoding ')) == 1:
    a = input('Your message? ')
    for i in range(len(a)):
        r, g, b = img.getpixel((i+10, imy//2))
        img.putpixel((i+10, imy//2), (ord(a[i]), g, b))
    r, g, b = img.getpixel((len(a)+10, imy//2))
    img.putpixel((len(a)+10, imy//2), (999, g, b))  # 999 is the stop code in decoding
else:
    r = u = 0
    m = ''
    while r != 999:
        r, g, b = img.getpixel((10+u, imy//2))
        m += chr(r)
        u += 1
    print(m[:len(a)-1])
img.save("test.png")

img3 = ImageChops.difference(img, img2)
img3 = ImageOps.equalize(img3)
img3.save("test2.png")
This is the result: [equalized difference image (test2.png) not reproduced here]

im.getcolors() returns None

I am using simple code to compare an image to a desktop screenshot through the getcolors() function from PIL. When I open an image from disk, it works:
im = Image.open('sprites\Bowser\BowserOriginal.png')
current_sprite = im.getcolors()
print current_sprite
However, using both pyautogui.screenshot() and ImageGrab.grab() for the screenshot, my code returns None. I have tried using the RGB conversion as shown here: Cannot use im.getcolors.
Additionally, even when I save the screenshot to a .png, it STILL returns None.
i = pyautogui.screenshot('screenshot.png')
f = Image.open('screenshot.png')
im = f.convert('RGB')
search_image = im.getcolors()
print search_image
First time posting, help is much appreciated.
Pretty old question, but for those who see this now:
Image.getcolors() takes as a parameter "maxcolors – Maximum number of colors." (from the docs here).
The maximum number of colors an image can have equals the number of pixels it contains.
For example, an image of 50*60 px will have at most 3,000 colors.
To translate it into code, try this:
from PIL import Image

# Open the image.
img = Image.open("test.jpg")
# Set maxcolors to the number of pixels in the image.
colors = img.getcolors(img.size[0]*img.size[1])
If you check the docs, getcolors returns None if the number of colors in the image is greater than the maxcolors parameter, which defaults to 256.
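A quick demonstration of that behaviour (a minimal sketch; 'screenshot.png' stands in for any image with more than 256 distinct colors):
from PIL import Image

img = Image.open("screenshot.png").convert("RGB")
print(img.getcolors())                            # None: more colors than the default maxcolors of 256
colors = img.getcolors(img.size[0]*img.size[1])   # maxcolors = total number of pixels
print(len(colors))                                # now a list of (count, color) tuples; this prints its length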
