Reading input lines for int objects separated with whitespace? - python

I'm trying to solve a programming problem about uploaded profile pics: each picture's resolution is checked against a dimension I provide as input, and one of the statements shown below is returned. This is one test case that gives me errors:
180
3
640 480 CROP IT
320 200 UPLOAD ANOTHER
180 180 ACCEPTED
The first line gives the dimension that needs to be matched, the second line gives the number of test cases, and the remaining lines contain the resolutions, with width and height separated by whitespace. For each resolution, the output shown on its line needs to be printed.
Being very new to Python I/O, I tried this, since it was the most natural thing I could think of:
from sys import stdin, stdout
dim = int(input())
n = int(input())
out = ''
for cases in range(0, n):
    in1 = int(stdin.readline().rstrip('\s'))
    in2 = int(stdin.readline().rstrip('\s'))
    out += str(prof_pic(in1, in2, dim))+'\n'
stdout.write(out)
ValueError: invalid literal for int() with base 10: '640 480\n'
prof_pic is a function I'm not describing here, to keep the post from getting too long. It compares both the width and the height with dim and returns an output. The problem is reading those lines: what is the best way to read lines whose values are separated by spaces rather than newlines?

You can try this (Python 3.x):
dimension = int(input())
t = int(input())
for i in range(t):
    a = list(map(int, input().split()))  # a[0] is the width, a[1] the height
    print(prof_pic(a[0], a[1], dimension))

Instead of:
in2 = int(stdin.readline().rstrip('\s'))
you may try:
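in2 = list(map(int, stdin.readline().split()[:2]))
and you get
in2 = [640, 480]
In Python 3, map returns a lazy iterator rather than a list, hence the list() call. Alternatively, unpack it directly: in1, in2 = map(int, stdin.readline().split()[:2]).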

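You're calling readline. As the name implies, this reads in a whole line. (If you're not sure what you're getting, try printing it out.) So you get something like this, as the ValueError in your question shows:
'640 480\n'
You can't call int on that.
What you want to do is split that line into separate pieces like this:
['640', '480']
For example:
line = stdin.readline()
in1, in2 = line.split()[:2]
Now you can convert those first two into ints:
in1 = int(in1)
in2 = int(in2)
Note that rstrip('\s') does not strip whitespace: rstrip takes a set of literal characters, not a regex, so it removes trailing backslashes and 's' characters. split() with no arguments already ignores the trailing newline, and the [:2] also copes with any extra text after the two numbers.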
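Another common approach, shown here as a minimal sketch: read the whole input at once and split on any whitespace, so newlines and spaces are treated alike and the line layout stops mattering. It assumes the input contains only the numbers (the ValueError in the question shows lines like '640 480'). parse_cases is a hypothetical helper name, and prof_pic is the question's own undescribed function, so it only appears in the usage comment.

```python
def parse_cases(text):
    """Parse the whole input: the dimension, the number of cases,
    then one (width, height) pair per case.

    Assumes every token is an integer; newlines and spaces are
    treated the same because str.split() splits on any whitespace.
    """
    tokens = text.split()
    dim = int(tokens[0])
    n = int(tokens[1])
    values = [int(tok) for tok in tokens[2:2 + 2 * n]]
    if len(values) != 2 * n:
        raise ValueError("expected %d width/height pairs" % n)
    # Pair up consecutive values: (w0, h0), (w1, h1), ...
    cases = list(zip(values[0::2], values[1::2]))
    return dim, cases

# Usage (prof_pic is the question's own function, not defined here):
# import sys
# dim, cases = parse_cases(sys.stdin.read())
# print("\n".join(str(prof_pic(w, h, dim)) for w, h in cases))
```

Reading everything in one call also tends to be faster than many readline calls when there are lots of test cases.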

Related

Converting list from string to int but there's a catch

I'll start off by saying that I don't know much about programming and I tried searching for answers but I didn't even know what to type in the search engine. So here goes.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __str__(self):
        return "Members are: %s, %s" % (self.x, self.y)
I have this class which represents a point with its x and y coordinate.
I have a list points = [], and if I manually append a point to that list, e.g. points.append(Point(-1.0, 3)), the output is (-1.0, 3). I'm doing some calculations with these points, but I don't think the code for that matters here.
Things get tricky because I have to read the numbers from a file. I already added them to another list and appended them using a loop. The problem is that the list contains strings, and if I convert them to int I get an error because of the decimal .0. My assignment says I have to keep the same format as the input.
What I don't understand is how the decimal .0 is kept when I input it like this: points.append(Point(-1.0, 3)), and whether it's possible to get the same output format with numbers from a file.
I tried converting to float, but then all the coordinates get decimal places.
You can use this code to convert the inputs appropriately. It uses try/except: it first tries int, and if that fails, it falls back to float.
def float_or_int(inp):
    try:
        n = int(inp)
    except ValueError:
        try:
            n = float(inp)
        except ValueError:
            print("it's not int or float")
            n = None  # otherwise the return below raises UnboundLocalError
    return n
input_1 = '10.3'
input_2 = '10.0'
input_3 = '10'
res1 = float_or_int(input_1)
res2 = float_or_int(input_2)
res3 = float_or_int(input_3)
print(res1, type(res1)) # 10.3 <class 'float'>
print(res2, type(res2)) # 10.0 <class 'float'>
print(res3, type(res3)) # 10 <class 'int'>
I don't know how your inputs are stored in the file or the other list you are reading from, but this shows how to parse a single input.
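Point(-1.0, 3) keeps its format because the literal -1.0 is a float and 3 is an int, and str() always shows a float with a decimal point. A minimal sketch that preserves that distinction for numbers read from a file, assuming one point per line with x and y separated by whitespace. number_from_text and points_from_lines are hypothetical names; unlike float_or_int above, number_from_text lets the ValueError propagate for text that is neither an int nor a float.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "Members are: %s, %s" % (self.x, self.y)


def number_from_text(text):
    """Return an int if the text parses as one, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)  # raises ValueError if it is neither


def points_from_lines(lines):
    """Build Points from lines like '-1.0 3', keeping each value's type."""
    points = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        x_text, y_text = line.split()
        points.append(Point(number_from_text(x_text), number_from_text(y_text)))
    return points

# With a real file ('points.txt' is a hypothetical name):
# with open('points.txt') as f:
#     points = points_from_lines(f)
```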
You could use this (assuming points holds (x, y) pairs of strings):
proper_points = []
for x, y in points:
    float_x = float(x)
    int_y = int(y)
    proper_points.append((float_x, int_y))
For your calculations, you could use the proper_points list instead of points.
Man, do not reinvent the wheel. If you need to import data from a file, you can use numpy, for example its loadtxt function: https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html
I do not understand how your file is made and the format of the coordinate. In the bunch of code points.append(Point(-1.0, 3)) the first number is a float the second one is an integer. Which format do you want?
For example in file.dat you have the x,y positions:
1 1
2 3
4 5
where the first column is the x position and the second one represents y. Then you can use this code
import numpy as np
points = np.loadtxt('file.dat', dtype = 'int')
Now points holds all the positions from the file, and you can use slicing to access them (e.g. points[:, 0] for the x column and points[:, 1] for y).

How do I get the least significant bits of many bytes?

I'm trying to get the LSBs from a line of an image. I've managed to get this far:
from PIL import Image
import sys
challengeImg = Image.open('image.png')
pixels = challengeImg.load()
for x in range(2944):
    red = (pixels[x,310][0])
    bred = format(red,"b")
    #print(green)
    #print(bred)
    green = (pixels[x,310][1])
    bgreen = format(green,"b")
    #print(bgreen)
    #print(green)
Up to that point I'm fine, but here's my problem. I managed to write the following code:
num = 10100001
n = 0
lsb = num >> n &1
print(lsb)
It works, but only with one byte. I suppose I can achieve something with a for loop, but I am very much a beginner and haven't managed to make it work. How can I extract the LSB from each byte in the line of pixels of the red channel (or green; I guess it's the same procedure)?
It occurs to me that I could use a dictionary to group the bits into bytes (1: 10011001, 2: 01100110 ...) and then use the loop to apply the LSB code to each byte. Anyway, I don't know how to do this, and I don't think it's the best way (maybe it's not even valid).
I have a 2944x1912 .png image that contains information hidden in the least significant bits. The first snippet is the script I am developing; so far it gets the pixel values of the red channel in row 310 and converts them to binary.
The second snippet gets the LSB of a single byte, and I need to integrate it into the first one: take the last bit of each pixel value, group those bits in eights, and save the result in a variable, giving 2944/8 = 368 bytes.
The solution that came to me might not be the most optimal. I'll look for a better solution if it does not suffice, but in the meanwhile:
num = 10100001
num_string = str(num)
lsb_string = num_string[len(num_string)-1]
lsb = int(lsb_string)
print(lsb)
# output: 1
It works. Here's the code:
from PIL import Image
import sys
challengeImg = Image.open('challenge.png')
pixels = challengeImg.load()
for x in range(2944):
    red = (pixels[x,310][0])
    bred = format(red,"b")
    #print(green)
    #print(bred)
    green = (pixels[x,310][1])
    bgreen = format(green,"b")
    #print(bgreen)
    #print(green)
    rnum = format(red,"b")
    rnum_string = str(rnum)
    rlsb_string = rnum_string[len(rnum_string)-1]
    rlsb = int(rlsb_string)
    print(rlsb, end="")
Thanks!
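The string round trip can be skipped entirely: for an integer channel value v, v & 1 is its least significant bit. (Also note that 10100001 in the question is a decimal literal that merely happens to be odd; a binary literal is written 0b10100001.) A minimal sketch of extracting the LSBs and packing them into bytes, assuming the hidden data is stored most significant bit first; that bit order is an assumption, since some challenges use LSB-first. lsb_bits and bits_to_bytes are hypothetical helper names.

```python
def lsb_bits(values):
    """Least significant bit of each channel value, as 0/1 ints."""
    return [v & 1 for v in values]


def bits_to_bytes(bits):
    """Pack bits into bytes, 8 at a time, most significant bit first.

    Leftover bits that do not fill a whole byte are ignored.
    """
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit  # shift earlier bits left, append this one
        out.append(byte)
    return bytes(out)

# Hypothetical usage with the question's PIL objects:
# red_row = [pixels[x, 310][0] for x in range(2944)]
# hidden = bits_to_bytes(lsb_bits(red_row))  # 2944 bits -> 368 bytes
```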

Python - Efficient way to flip bytes in a file?

I've got a folder full of very large files that need to be byte-flipped in groups of 4. Essentially, I need to read each file as binary, reorder the bytes, and then write a new binary file with the bytes reordered.
In essence, what I'm trying to do is read a hex string hexString that looks like this:
"00112233AABBCCDD"
And write a file that looks like this:
"33221100DDCCBBAA"
(i.e. every two characters are one byte, and I need to reverse the byte order within each group of 4 bytes)
I am very new to python and coding in general, and the way I am currently accomplishing this task is extremely inefficient. My code currently looks like this:
import binascii
with open(myFile, 'rb') as f:
    content = f.read()
hexString = str(binascii.hexlify(content))
flippedBytes = ""
inc = 0
while inc < len(hexString):
    flippedBytes += hexString[inc + 6:inc + 8]
    flippedBytes += hexString[inc + 4:inc + 6]
    flippedBytes += hexString[inc + 2:inc + 4]
    flippedBytes += hexString[inc:inc + 2]
    inc += 8
# ... write flippedBytes to a file, etc.
The code I pasted above does what I need (note: my actual code has a few extra hexString.replace() calls to remove unnecessary characters, but I've left those out to make the above easier to read). My problem is that it takes EXTREMELY long to run on larger files. Some of the files I need to flip are almost 2 GB in size, and the code was going to take almost half a day to complete a single file. I've got dozens of files to run this on, so that timeframe simply isn't practical.
Is there a more efficient way to flip the HEX values in a file by a power of 4?
For what it's worth, there is a tool called WinHEX that can do this manually and takes a minute at most to flip the whole file. I was just hoping to automate this with Python so we don't have to use WinHEX manually each time.
You want to convert your 4-byte integers from little-endian to big-endian, or vice-versa. You can use the struct module for that:
import struct
with open(myfile, 'rb') as infile, open(myoutput, 'wb') as of:
    while True:
        d = infile.read(4)
        if not d:
            break
        le = struct.unpack('<I', d)
        be = struct.pack('>I', *le)
        of.write(be)
Here is a little struct awesomeness to get you started:
>>> import struct
>>> s = b'\x00\x11\x22\x33\xAA\xBB\xCC\xDD'
>>> a, b = struct.unpack('<II', s)
>>> s = struct.pack('>II', a, b)
>>> ''.join([format(x, '02x') for x in s])
'33221100ddccbbaa'
To do this at full speed for a large input, read much larger chunks than 4 bytes and use struct.iter_unpack on each chunk.
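A minimal sketch of that chunked approach: each read pulls in several megabytes, struct.iter_unpack decodes every 4-byte word as little-endian, and one struct.pack call re-encodes the whole chunk big-endian. CHUNK_SIZE, swap_words and flip_file are hypothetical names, and the 4 MiB size is an assumed value that only needs to be a multiple of 4 so every chunk boundary falls between words.

```python
import struct

CHUNK_SIZE = 1 << 22  # 4 MiB per read (assumed value); must be a multiple of 4


def swap_words(data):
    """Reverse the byte order of every 4-byte word in data."""
    if len(data) % 4:
        raise ValueError("data length is not a multiple of 4 bytes")
    words = [w for (w,) in struct.iter_unpack('<I', data)]
    return struct.pack('>%dI' % len(words), *words)


def flip_file(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Copy src_path to dst_path, byte-swapping each 4-byte word."""
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        while True:
            chunk = src.read(chunk_size)  # full size until the final chunk
            if not chunk:
                break
            dst.write(swap_words(chunk))
```

Because the file is processed chunk by chunk, memory use stays bounded even for the 2 GB files, and the per-call overhead is paid once per chunk instead of once per word. A file whose length is not a multiple of 4 raises ValueError on its last chunk.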

Python - Merge many big numpy arrays with unknown shape, that would not fit in memory

Let's suppose I have a large number of NumPy arrays saved as files (np.save(), ".npy" files). All these have shape e.g. (n,20), where I don't know n without opening the file. n is different for every file.
I want to merge these into a single dataset and then, using a set of selection methods, split it into three different NumPy arrays written to disk.
Usually I would loop over all files and use np.concatenate(). However the final array is likely not to fit in memory.
The other option I have is to use np.memmap(), though I'm really not sure how it works. As I understand it, I'd have to do something like this:
a = np.memmap('output.npy',dtype='float64',mode='w+',shape=(N,20))
for i,f in enumerate(myfiles):
    a[i,:] = np.load(f)
a.flush()
# And then find a way to split "a" into three, does the following work?
part_one = a[ [0,2,10,42,58] , : ]
The problem is that I don't know N, the final number of rows. Therefore I would need to open each file, read its number of rows, close the file, and sum all the row counts before declaring the memmap. That seems highly inefficient, and there must be a better method.
Do you have any suggestion on this problem? Am I doing something wrong?
The .npy file specification defines the header for npy files. I couldn't find an already-baked way to read it, but the format is easy and you can pull the information out yourself. The file information is encoded in a python dict including a shape tuple. This is a short read of the top of the file and will be much faster than reading in the data.
import struct
import ast
# structs to decode .npy file header consisting of a "magic"
# string verifying the file type, major and minor version numbers,
# header length, and literal string representation of a python dict
# holding file's type and shape.
npy_magic = b"\x93NUMPY"
npy_v1_header = struct.Struct(
"<" # little-endian encoding
"6s" # 6 byte magic string
"B" # 1 byte major number
"B" # 1 byte minor number
"H" # 2 byte header length
# ... header string follows
)
npy_v2_header = struct.Struct(
"<" # little-endian encoding
"6s" # 6 byte magic string
"B" # 1 byte major number
"B" # 1 byte minor number
"L" # 4 byte header length
# ... header string follows
)
def read_npy_file_header(filename):
    with open(filename, 'rb') as fp:
        buf = fp.read(npy_v1_header.size)
        magic, major, minor, hdr_size = npy_v1_header.unpack(buf)
        if magic != npy_magic:
            raise IOError("Not an npy file")
        # known .npy format versions are 1.0, 2.0 and 3.0
        if major not in (1, 2, 3):
            raise IOError("Unknown npy file version")
        if major >= 2:
            # versions 2.0 and 3.0 store a 4-byte header length
            fp.seek(0)
            buf = fp.read(npy_v2_header.size)
            magic, major, minor, hdr_size = npy_v2_header.unpack(buf)
        # version 3.0 headers are UTF-8; older ASCII headers decode the same way
        return ast.literal_eval(fp.read(hdr_size).decode('utf-8'))
# test
from glob import glob
for fn in glob('*.npy'):
    print(fn, read_npy_file_header(fn))
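Once each file's shape can be read cheaply from its header, the remaining step is to turn the row counts into the total N and the row range each file occupies, so the memmap is filled with a[start:stop] = np.load(f) rather than one row per file as in the question's a[i, :]. A minimal pure-Python sketch of that bookkeeping; plan_merge and 'merged.dat' are hypothetical names. Note that np.memmap writes raw data with no .npy header; numpy.lib.format.open_memmap can create a memory-mapped array with a proper .npy header instead.

```python
from itertools import accumulate


def plan_merge(shapes):
    """Given each input file's shape, return the total row count and the
    (start, stop) row range each file should occupy in the merged array."""
    if not shapes:
        return 0, []
    trailing = tuple(shapes[0][1:])
    for shape in shapes:
        if tuple(shape[1:]) != trailing:
            raise ValueError("all files must have the same trailing shape")
    rows = [shape[0] for shape in shapes]
    stops = list(accumulate(rows))      # running row totals
    starts = [0] + stops[:-1]           # each file starts where the last ended
    return stops[-1], list(zip(starts, stops))

# Hypothetical usage with read_npy_file_header from the answer above:
# shapes = [read_npy_file_header(f)['shape'] for f in myfiles]
# N, ranges = plan_merge(shapes)
# a = np.memmap('merged.dat', dtype='float64', mode='w+', shape=(N, 20))
# for f, (start, stop) in zip(myfiles, ranges):
#     a[start:stop] = np.load(f)
# a.flush()
```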

reorder byte order in hex string (python)

I want to build a small formatter in Python that gives me back the numeric
values embedded in lines of hex strings.
It is a central part of my formatter and should be reasonably fast: more
than 100 lines/sec (each line about ~100 chars).
The code below shows where I'm currently stuck.
'data_string_in_orig' shows the given input format. It has to be
byte-swapped for each word, i.e. the swap from 'data_string_in_orig' to
'data_string_in_swapped' is needed. In the end I need the struct
access as shown. The expected result is in the comment.
Thanks in advance
Wolfgang R
#!/usr/bin/python
import binascii
import struct
## 'uint32 double'
data_string_in_orig = 'b62e000052e366667a66408d'
data_string_in_swapped = '2eb60000e3526666667a8d40'
print data_string_in_orig
packed_data = binascii.unhexlify(data_string_in_swapped)
s = struct.Struct('<Id')
unpacked_data = s.unpack_from(packed_data, 0)
print 'Unpacked Values:', unpacked_data
## Unpacked Values: (46638, 943.29999999943209)
exit(0)
array.arrays have a byteswap method:
import binascii
import struct
import array
x = binascii.unhexlify('b62e000052e366667a66408d')
y = array.array('h', x)
y.byteswap()
s = struct.Struct('<Id')
print(s.unpack_from(y))
# (46638, 943.2999999994321)
The h in array.array('h', x) was chosen because it tells array.array to regard the data in x as an array of 2-byte shorts. The important thing is that each item be regarded as 2 bytes long. H, which signifies a 2-byte unsigned short, works just as well.
This should do exactly what unutbu's version does, but might be slightly easier to follow for some...
from binascii import unhexlify
from struct import pack, unpack
orig = unhexlify('b62e000052e366667a66408d')
swapped = pack('<6h', *unpack('>6h', orig))
print(unpack('<Id', swapped))
# (46638, 943.2999999994321)
Basically, unpack 6 shorts big-endian, repack as 6 shorts little-endian.
Again, same thing that unutbu's code does, and you should use his.
Edit: Just realized I get to use my favorite Python idiom for this... Don't do this either:
orig = 'b62e000052e366667a66408d'
swap =''.join(sum([(c,d,a,b) for a,b,c,d in zip(*[iter(orig)]*4)], ()))
# '2eb60000e3526666667a8d40'
The swap from 'data_string_in_orig' to 'data_string_in_swapped' may also be done with comprehensions without using any imports:
>>> d = 'b62e000052e366667a66408d'
>>> "".join([m[2:4]+m[0:2] for m in [d[i:i+4] for i in range(0,len(d),4)]])
'2eb60000e3526666667a8d40'
The comprehension works for swapping byte order in hex strings representing 16-bit words. Modifying it for a different word-length is trivial. We can make a general hex digit order swap function also:
def swap_order(d, wsz=4, gsz=2):
    return "".join(["".join([m[i:i+gsz] for i in range(wsz-gsz, -gsz, -gsz)]) for m in [d[i:i+wsz] for i in range(0, len(d), wsz)]])
The input params are:
d : the input hex string
wsz: the word size in nibbles (e.g. for 16-bit words wsz=4, for 32-bit words wsz=8)
gsz: the number of nibbles which stay together (e.g. for reordering bytes gsz=2, for reordering 16-bit words gsz=4)
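For the question's speed target, the swap can also be done at the bytes level with extended-slice assignment, which avoids building and joining many small strings. A minimal sketch, assuming every line has the question's layout (16-bit words to swap, then a little-endian uint32 followed by a double); swap_16bit_words, decode_line and RECORD are hypothetical names.

```python
import struct

RECORD = struct.Struct('<Id')  # uint32 followed by double, as in the question


def swap_16bit_words(raw):
    """Swap the two bytes of every 16-bit word in raw."""
    if len(raw) % 2:
        raise ValueError("odd number of bytes; cannot swap 16-bit words")
    swapped = bytearray(len(raw))
    swapped[0::2] = raw[1::2]  # odd-index bytes move to even positions
    swapped[1::2] = raw[0::2]  # even-index bytes move to odd positions
    return bytes(swapped)


def decode_line(hex_line):
    """Hex text -> word-swapped bytes -> (uint32, double)."""
    return RECORD.unpack_from(swap_16bit_words(bytes.fromhex(hex_line)), 0)
```

Usage: decode_line('b62e000052e366667a66408d') gives the same (46638, ~943.3) pair as the answers above.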
import array
import binascii
from tkinter import filedialog  # 'from tkinter import *' does not import the filedialog submodule
infile = filedialog.askopenfilename()
with open(infile, 'rb') as infile_:
    infile_read = infile_.read()
# 'I' is 4 bytes on common platforms; 'l' is 8 bytes on 64-bit Linux/macOS
y = array.array('I', infile_read)
y.byteswap()
swapped = binascii.hexlify(y)
This is a 32-bit word swap, achieved with code very similar to unutbu's answer, just a little easier to understand. Technically binascii is not needed for the swap; only array.byteswap is.
