Reading 32 bytes of a file at a time - python

Working on reading and interpreting FAT12 directory entries. I am trying to determine how to read 32 bytes of the file at a time. So far I have the following:
import struct
import sys

f_name = sys.argv[1]  # set file name as the argument to be passed in command line
with open(f_name, mode='rb') as file:
    data = file.read()
    struct.unpack(, data[:])  # incomplete -- not sure what format string to pass here
Most of the things I have seen say to use struct.unpack(). I have looked at the documentation on this, but I am having trouble understanding how to use it. Is there an easier way to read 32 bytes at a time until I have read a full 512 bytes?

file.read() takes an upper limit on the number of bytes/characters to read, and advances the read pointer by that amount:
data = file.read(32)
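If you just need 32 bytes at a time until you have covered a 512-byte sector, a plain loop over read(32) (or over slices of one read(512)) is enough. A minimal sketch along those lines, assuming the filename still comes from sys.argv[1]; the format string describes the standard 32-byte FAT12 directory-entry layout (8.3 name, attribute byte, reserved/timestamp bytes, 16-bit date/time/cluster fields, 32-bit file size):
import struct
import sys

with open(sys.argv[1], mode='rb') as file:
    sector = file.read(512)                     # one 512-byte sector
    for offset in range(0, len(sector), 32):
        entry = sector[offset:offset + 32]
        if len(entry) < 32:                     # short read at end of file
            break
        # 11s = name, 3B = attributes + reserved + creation tenths,
        # 7H = times/dates/cluster numbers, I = file size
        fields = struct.unpack("<11s3B7HI", entry)
        name, attrs, size = fields[0], fields[1], fields[-1]
        print(name, hex(attrs), size)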


Reading a binary file from memory in chunks of 10 bytes with python

I have a very big .BIN file and I am loading it into the available RAM (128 GB) by using:
ice.Load_data_to_memory("global.bin", True)
(see: https://github.com/iceland2k14/secp256k1)
Now I need to read the content of the file in chunks of 10 bytes, and for that I am using:
with open('global.bin', 'rb') as bf:
    while True:
        data = bf.read(10)
        if data == y:
            ...  # do this!
This works well with the rest of the code if the .BIN file is small, but not if the file is big. My suspicion is that, by writing the code this way, I will open the .BIN file twice OR I won't get any result, because with open('global.bin', 'rb') as bf is not "synchronized" with ice.Load_data_to_memory("global.bin", True). Thus, I would like to find a way to read the chunks of 10 bytes directly from memory, without having to open the file with "with open('global.bin', 'rb') as bf".
I found a working approach here: LOAD FILE INTO MEMORY
This works well with a small .BIN file containing 3 strings of 10 bytes each:
import mmap

with open('0x4.bin', 'rb') as f:
    # Size 0 will read the ENTIRE file into memory!
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)  # file is open read-only
    # Proceed with your code here -- note the file is already in memory,
    # so reading here will be as fast as could be
    data = m.read(10)  # using read(10) instead of readline()
    while data:
        # do something!
        data = m.read(10)
Now the point: with a much bigger .BIN file it takes much more time to load the whole file into memory, yet the while data: part starts working immediately, so I would need a delay here so that the script only starts to work AFTER the file is completely loaded into memory...

Reconstructing files uploaded to SQL Server with Python

I am working with a SQL Server database table similar to this
USER_ID varchar(50), FILE_NAME ntext, FILE_CONTENT ntext
sample data:
USER_ID: 1
FILE_NAME: (AttachedFiles:1)=file1.pdf
FILE_CONTENT: (AttachedFiles:1)=H4sIAAAAAAAAAOy8VXQcy7Ku….
By means of regular expressions I have successfully isolated the "content" of the FILE_CONTENT field by removing the "(AttachedFiles:1)=" part, resulting in a string similar to this:
content_str = "H4sIAAAAAAAAAOy8VXQcy7Ku22JmZmZmspiZGS2WLGa0xc…"
My plan was to reconstruct the file using this string to download it from the database. During my investigation process, I found this post and proceeded to replicate the code like this:
content_str = 'H4sIAAAAAAAAAO19B0AUR/v33...'
with open(os.path.expanduser('test.pdf'), 'wb') as f:
    f.write(base64.decodestring(content_str))
...getting a TypeError: expected bytes-like object, not str
Investigating further, I found this other post and proceeded like this:
content_str = 'H4sIAAAAAAAAAO19B0AUR/v33...'
encoded = content_str.encode('ascii')
with open(os.path.expanduser('test.pdf'), 'wb') as f:
    f.write(base64.decodestring(encoded))
...resulting in the successful creation of a PDF. However, when trying to open it, I get an error saying that the file is corrupt.
I kindly ask you for any suggestions on how to proceed. I am even open to rethinking the process I've come up with if necessary. Many thanks in advance!
The value of FILE_CONTENT is base64-encoded. This means it's a string built from 64 possible characters that represents raw bytes. All you need to do is base64-decode the string and write the resulting bytes directly to a file.
import base64
content_str = "H4sIAAAAAAAAAOy8VXQcy7Ku22JmZmZmspiZGS2WLGa0xc=="
with open(os.path.expanduser('test.pdf'), 'wb') as fp:
    fp.write(base64.b64decode(content_str))
The base64 sequence "H4sI" at the start of your content string translates to the bytes 0x1f, 0x8b, 0x08. These bytes are not normally at the start of a PDF file, but indicate a gzip-compressed data stream. It's possible that a PDF reader won't understand this.
I don't know for certain if gzip compression is a valid part of the PDF file format, but it's a valid part of web communication, so maybe the file stream was compressed for transfer/download and has not been decompressed before writing it to the database.
If your PDF reader does not accept the data as is, decompress it before saving it to file:
import gzip
# ...
with open(os.path.expanduser('test.pdf'), 'wb') as fp:
    fp.write(gzip.decompress(base64.b64decode(content_str)))
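If you are not sure whether a given row was gzip-compressed before being stored, one option (a sketch only, assuming content_str already holds the full base64 payload) is to check the decoded bytes for the 0x1f 0x8b gzip magic number and decompress only when it is present:
import base64
import gzip
import os

raw = base64.b64decode(content_str)
if raw[:2] == b'\x1f\x8b':        # gzip magic number
    raw = gzip.decompress(raw)
with open(os.path.expanduser('test.pdf'), 'wb') as fp:
    fp.write(raw)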

wave.error: # channels not specified

I am trying to edit the length of WAV files using the wave module. However, it seems I can't get anywhere because I keep getting the same error: that the number of channels is not specified. I still get that error when I write something to check the number of channels, or when I try to set the number of channels, as seen here:
import wave

def editLength(wavFile):
    file = wave.open(wavFile, 'w')
    file.setnchannels(file.getnchannels())
    x = file.getnchannels()
    print(x)
from https://docs.python.org/3.7/library/wave.html#wave.open
wave.open(file, mode=None)
If file is a string, open the file by that name, otherwise treat it as a file-like
object.
mode can be:
'rb' Read only mode.
'wb' Write only mode.
Note that it does not allow read/write WAV files.
You attempt to read and write from the same WAV file: at the time of the first file.getnchannels() call, the file object (opened for writing) has not had its number of channels specified yet.
def editLength(wavFile):
    with wave.open(wavFile, "rb") as file:
        x = file.getnchannels()
        print(x)
If you want to edit the file, you should first read from the original file and write to a temporary file, then copy the temporary file over the original file, as in the sketch below.
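A rough illustration of that temp-file approach (a sketch only: the question never says what length change is wanted, so the "edit" below just keeps the first half of the frames as a placeholder):
import os
import shutil
import tempfile
import wave

def editLength(wavFile):
    # read params and audio data from the original file
    with wave.open(wavFile, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)
    # placeholder edit: keep the first half, cut on a frame boundary
    half = params.nframes // 2
    frames = frames[: half * params.nchannels * params.sampwidth]
    # write to a temporary file, then copy it over the original
    fd, tmp_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    with wave.open(tmp_path, "wb") as dst:
        dst.setparams(params)      # writeframes() patches nframes on close
        dst.writeframes(frames)
    shutil.move(tmp_path, wavFile)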
It's maybe not super obvious from the docs: https://docs.python.org/3/library/wave.html
The Wave_write object expects you to explicitly set all params for the object.
After a little trial and error I was able to read my wav file and write a specific duration to disk.
For example if I have a 44.1k sample rate wav file with 2 channels...
import wave

with wave.open("some_wavfile.wav", "rb") as handle:
    params = handle.getparams()
    # only read the first 10 seconds of audio
    frames = handle.readframes(441000)
    print(handle.tell())
    print(params)

params = list(params)
params[3] = len(frames)
print(params)

with wave.open("output_wavfile.wav", "wb") as handle:
    handle.setparams(params)
    handle.writeframes(frames)
This should leave you with output on stdout looking something like this.
441000
_wave_params(nchannels=2, sampwidth=2, framerate=44100, nframes=10348480, comptype='NONE', compname='not compressed')
[2, 2, 44100, 1764000, 'NONE', 'not compressed']
nframes here is 1764000 probably because nchannels=2 and sampwidth=2, so 1764000/4=441000 (I guess)
Oddly enough, setparams was able to accept a list instead of a tuple.
ffprobe shows exactly 10 seconds of audio for the output file, and it sounds perfect to me.

How can I truncate an mp3 audio file by 30%?

I am trying to truncate an audio file by 30%: if the audio file was 4 minutes long, after truncating it, it should be around 72 seconds. I have written the code below to do it, but it only produces a 0-byte output file. Please tell me where I went wrong.
def loadFile():
    with open('music.mp3', 'rb') as in_file:
        data = len(in_file.read())
        with open('output.mp3', 'wb') as out_file:
            ndata = newBytes(data)
            out_file.write(in_file.read()[:ndata])

def newBytes(bytes):
    newLength = (bytes/100) * 30
    return int(newLength)

loadFile()
You are trying to read your file a second time, which will result in no data: the first read in len(in_file.read()) already consumed the whole file. Instead, read the whole file into a variable and then calculate the length of that. The variable can then be used a second time.
def newBytes(bytes):
    return (bytes * 70) // 100  # integer division, so the result can be used as a slice index

def loadFile():
    with open('music.mp3', 'rb') as in_file:
        data = in_file.read()
    with open('output.mp3', 'wb') as out_file:
        ndata = newBytes(len(data))
        out_file.write(data[:ndata])
Also, it is better to multiply first and then use integer division, so you avoid having to work with floating point numbers (a float cannot be used as a slice index).
You cannot reliably truncate an MP3 file by byte size and expect it to be equivalently truncated in audio time length.
MP3 frames can change bitrate. While your method will sort of work, it won't be all that accurate. Additionally, you'll undoubtedly break frames leaving glitches at the end of the file. You will also lose ID3v1 tags (if you still use them... better to use ID3v2 anyway).
Consider executing FFmpeg with -acodec copy instead. This will simply copy the bytes over while maintaining the integrity of the file, and ensuring a good clean cut where you want it to be.
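For example (a sketch only: the filenames are placeholders, and the 168-second cut point assumes a 4-minute input trimmed by 30%; in practice you would compute it from the real duration, e.g. with ffprobe):
import subprocess

# keep the first 168 seconds (70% of a 240-second file) without re-encoding
subprocess.run([
    "ffmpeg", "-i", "music.mp3",
    "-t", "168",          # duration to keep
    "-acodec", "copy",    # copy the MP3 frames as-is, no re-encode
    "output.mp3",
], check=True)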

Stop a Python script from writing to a file after it reaches a certain size on Linux

Somewhat new to Python and new to Linux. I created a script that mines Twitter's streaming API. The script writes to a .csv file when things in the stream match my parameters.
I'd like to know if there's any way to stop my script once the file has reached 1 gig. I know cron can be used to time the script and everything, but I'm more concerned about the file size than the time it takes.
Thanks for your input and consideration.
In your case you probably don't need os.stat, and os.stat may give you a false size in some cases (namely when buffers haven't been flushed). Why not just use f.tell() to read the size, with something like this:
import csv

with open('out.txt', 'w', encoding='utf-8') as f:
    csvfile = csv.writer(f)
    maxsize = 1024  # max file size in bytes
    for row in data():  # data() stands in for rows coming from your Twitter stream
        csvfile.writerow(row)
        if f.tell() > maxsize:  # f.tell() gives byte offset, no need to worry about multiwide chars
            break
Use Python's os.stat() to get info on the file, then check the total number of bytes of the existing file (fileInfo.st_size) plus the size of the data you are about to write.
import os

fileInfo = os.stat('twitter_stream.csv')
fileSize = fileInfo.st_size
print(fileSize)

# Now get data from twitter
# determine the number of bytes in that data
# write the data only if file size + data bytes < 1 GB
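A minimal sketch of how those three comment steps could fit together (the filename and the append_row helper are assumptions, not part of the original script):
import os

MAX_BYTES = 1024 ** 3  # 1 GB limit

def append_row(path, row_bytes, max_bytes=MAX_BYTES):
    # append row_bytes to path; return False once the limit would be exceeded
    current = os.stat(path).st_size if os.path.exists(path) else 0
    if current + len(row_bytes) >= max_bytes:
        return False  # tell the caller to stop the stream
    with open(path, 'ab') as f:
        f.write(row_bytes)
    return True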
