Python: how to insert gaps of silence into a generated .wav file?

I'm writing a simple script that generates random "glitches" out of a source audio file. It reads a random chunk of the source audio file and writes that chunk to a new .wav file, followed by a gap of silence, followed by a different chunk, then silence, and so on.
I tried reading about the .wav format to understand what "silence" is, but most of it is over my head. Silence seems to simply be a value of 0. I made a pure-silence file for analysis; in Notepad the silence was represented as spaces, whereas in Sublime Text it was a bunch of 0s. So my approach was to take the silence character (either a " " or a 0), multiply it by how many characters one frame is for the source audio (so it automatically adjusts for the different possible .wav attributes, mono or stereo, etc.), and then multiply that by how many frames of silence one wants for the gap.
Everything works fine except that whatever I'm writing as silence is inevitably written at some value that, though consistent, does have amplitude, and therefore shows up as a visible nonzero band in the waveform; in a hex editor the gap is always some repeating pattern like 30303030303030. I'm obviously doing something wrong or misunderstanding the nature of .wav data, but I can't figure it out. Here is a basic stripped-down version of the code:
import sys
import wave

script, filename = sys.argv

sourceFile = wave.open(filename, 'r')
sampleParams = sourceFile.getparams()

def randChunk(source):
    blahblah
    # Returns random chunk of audio from sourceFile

numGlitch = int(raw_input('How many glitches do you want?: '))
silenceSpace = int(raw_input('How many frames of silence between glitches?: '))

singleglitchFile = filename[:-4] + '_glitch.wav'
outfile = wave.open(singleglitchFile, 'w')

# set the outfile params to whatever sourceFile params were
outfile.setparams(sampleParams)

# WHERE EVERYTHING GOES WRONG
silence = 0  # or " " or hex(0) or whatever the hell silence is supposed to be
frameLength = len(sourceFile.readframes(1))
emptyspace = (silence * frameLength) * silenceSpace

for n in range(numGlitch):
    outfile.writeframes(randChunk(sourceFile))
    outfile.writeframes(emptyspace)
outfile.close()
Figuring out a solution would not only get this script working but would also help me with the next phase: how to get the average amplitude of sequential frames in the source and filter out any that don't meet a certain threshold (i.e. filter out chunks that are too quiet).

The character with the value 0 is chr(0), i.e. '\x00'. Plain 0 is an integer, so multiplying it performs integer multiplication: 0 * anything is 0, but '\x00' * 3 is '\x00\x00\x00'. That also explains the repeating 30303030... pattern you saw: writing the character '0' writes its ASCII code, 0x30, and writing ' ' writes 0x20, both of which are nonzero sample values.
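A minimal sketch of the fix, reusing the variables from the question (Python 2, to match raw_input; the rewind() call is an assumption, to undo the readframes(1) peek):

silence = '\x00'                             # one zero byte, not the integer 0
frameLength = len(sourceFile.readframes(1))  # bytes per frame (sampwidth * nchannels)
sourceFile.rewind()                          # assumption: reset position after peeking
emptyspace = silence * frameLength * silenceSpace
outfile.writeframes(emptyspace)

For the planned next phase, one possible starting point (not part of the answer above) is the standard-library audioop module, which computes the RMS amplitude of a chunk of raw frames; THRESHOLD and the 1024-frame chunk size below are placeholders:

import audioop

chunk = sourceFile.readframes(1024)                       # arbitrary chunk size
loudness = audioop.rms(chunk, sourceFile.getsampwidth())  # average amplitude (RMS)
if loudness > THRESHOLD:                                  # THRESHOLD is hypothetical
    outfile.writeframes(chunk)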

Related

How can I truncate an mp3 audio file by 30%?

I am trying to truncate an audio file by 30%: if the audio file was 4 minutes long, after truncating it, it should be around 72 seconds. I have written the code below to do it, but it only returns a 0-byte file. Please tell me where I went wrong?
def loadFile():
    with open('music.mp3', 'rb') as in_file:
        data = len(in_file.read())
    with open('output.mp3', 'wb') as out_file:
        ndata = newBytes(data)
        out_file.write(in_file.read()[:ndata])

def newBytes(bytes):
    newLength = (bytes/100) * 30
    return int(newLength)

loadFile()
You are trying to read your file a second time, which will result in no data: the first read, len(in_file.read()), already consumed the whole file. Instead, read the whole file into a variable, calculate the length of that, and then use the variable a second time.
def newBytes(bytes):
    return (bytes * 70) / 100

def loadFile():
    with open('music.mp3', 'rb') as in_file:
        data = in_file.read()
    with open('output.mp3', 'wb') as out_file:
        ndata = newBytes(len(data))
        out_file.write(data[:ndata])
Also, it is better to multiply first and then divide, to avoid having to work with floating point numbers (and, with Python 2 integer division, to avoid losing precision: 419 / 100 * 70 is 280, while 419 * 70 / 100 is 293).
You cannot reliably truncate an MP3 file by byte size and expect it to be equivalently truncated in audio time length.
MP3 frames can change bitrate. While your method will sort of work, it won't be all that accurate. Additionally, you'll undoubtedly break frames, leaving glitches at the end of the file. You will also lose ID3v1 tags (if you still use them... better to use ID3v2 anyway).
Consider executing FFmpeg with -acodec copy instead. This will simply copy the bytes over while maintaining the integrity of the file, and ensuring a good clean cut where you want it to be.
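If you want to drive that from Python, here is a hedged sketch (it assumes ffmpeg is installed and on PATH; 168 is 70% of a 240-second file):

import subprocess

# -acodec copy copies the audio stream without re-encoding;
# -t limits the output duration, in seconds.
subprocess.check_call([
    'ffmpeg', '-i', 'music.mp3',
    '-acodec', 'copy', '-t', '168',
    'output.mp3',
])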

How do I replace a specific value in a file in Python

I'm trying to replace the zeros with a value. So far this is my code, but what do I do next?
g = open("January.txt", "r+")
for i in range(3):
    dat_month = g.readline()

January.txt looks like this:

Month: January
Item: Lawn
Total square metres purchased:
0
monthly value = 0
You could do that, but it is not the usual approach, and it certainly is not the correct approach for text files.
The correct way to do it is to write another file, with the information you want updated in place, and then rename the new file to the old one. That is the only sane way of doing this with text files, since the size in bytes of the fields is variable.
As for the impression that you are "writing 200 bytes to disk" instead of a single byte: don't let that fool you. At the operating-system level, all file access has to be done in blocks, which are usually a couple of kilobytes long (in special cases, with tuned filesystems, a couple hundred bytes). In a user-space program, much less in a high-level language like Python, you will never trigger a disk write of less than a few hundred bytes.
Now, for the code:
import os

my_number = <number you want to place in the line you want to rewrite>

with open("January.txt", "r") as in_file, open("newfile.txt", "w") as out_file:
    for line in in_file:
        if line.strip() == "0":
            out_file.write(str(my_number) + "\n")
        else:
            out_file.write(line)

os.unlink("January.txt")
os.rename("newfile.txt", "January.txt")
So, that is the general idea. Of course you should not write code with all the values hardcoded in that way (i.e. the values to be checked and written, and the filenames, fixed in the program code).
As for the with statement: it is a language construct which is very appropriate for opening files and manipulating them within a block, as in this case, but it is not strictly required.
Programming apart, the concept to keep in mind is this: when you use an application that lets you edit a text file, a spreadsheet, or an image, you, as the user, may have the impression that after you are done and have saved your work, the updates are committed to the same file. In the vast, vast majority of use cases, that is not what happens: the application internally uses a pattern like the one presented above, where a completely new file is written to disk and the old one is deleted or renamed. The few exceptions might be simple database applications, which could replace fixed-width fields inside the file itself on updates. Modern databases certainly do not do that, resorting instead to appending the most recent, updated information to the end of the file. PDF files are another kind designed so that the whole file need not be rewritten on each update: but there too, the updated information is written at the end of the file, even if the update applies to a page at the beginning of the rendered document.
dat_month = dat_month.replace("0", "45678")

To write to a file you do:

with open("Outfile.txt", "wt") as outfile:
    outfile.write(dat_month)
Try this:
import fileinput
import itertools
import sys

with fileinput.input('January.txt', inplace=True) as file:
    beginning = tuple(itertools.islice(file, 3))
    sys.stdout.writelines(beginning)
    sys.stdout.write(next(file).replace('0', 'a value'))
    sys.stdout.write(next(file).replace('0', 'a value'))
    sys.stdout.writelines(file)

Python Disk Imaging

Trying to make a script for disk imaging (such as the .dd format) in Python. This originally started as a project to write another hex debugger, but I got more interested in trying to get raw data from the drive, which turned into wanting to image the drive first. Anyway, I've been looking around for about a week or so, and the best way to read data from smaller drives appears to be something like:
with file("/dev/sda") as f:
i=file("~/imagingtest.dd", "wb")
i.write(f.read(SIZE))
with SIZE being the disk size. The problem, which seems to be a well-known issue, is that with large disks (in my case a total size of 250059350016 bytes) this fails with:
OverflowError: Python int too large to convert to C long
Is there a more appropriate way to get around this issue? It works fine for a small flash drive, but trying to image a full drive fails.
I've seen mention of simply iterating by sector size (512 bytes) for the number of sectors (in my case 488397168), but I would like to verify exactly how to do this in a way that actually works.
Thanks in advance for any assistance, sorry for any ignorance you easily notice.
Yes, that's how you should do it. Though you could go higher than the sector size if you wanted.
with open("/dev/sda",'rb') as f:
with open("~/imagingtest.dd", "wb") as i:
while True:
if i.write(f.read(512)) == 0:
break
Read the data in blocks. When you reach the end of the device, .read(blocksize) will return the empty string.
You can use iter() with a sentinel to do this easily in a loop:
from functools import partial

blocksize = 12345

with open("/dev/sda", 'rb') as f:
    for block in iter(partial(f.read, blocksize), ''):
        pass  # do something with the data block
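(On Python 3 a file opened in 'rb' yields bytes, so the sentinel would need to be b'' rather than ''.)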
You really want to open the device in binary mode, 'rb', if you want to make sure no line translations take place.
However, if you are trying to create copy into another file, you want to look at shutil.copyfile():
import shutil
shutil.copyfile('/dev/sda', 'destinationfile')
and it'll take care of the opening, reading and writing for you. If you want to have more control of the blocksize used for that, use shutil.copyfileobj(), open the file objects yourself and specify a blocksize:
import shutil
blocksize = 12345
with open("/dev/sda", 'rb') as f, open('destinationfile', 'wb') as dest:
    shutil.copyfileobj(f, dest, blocksize)

Keep Track of Number of Bytes Read

I would like to implement a command-line progress bar in Python for one of my programs, which reads text from a file line by line.
I can implement the progress scale in one of two ways:
(number of lines / total lines) or
(number of bytes completed / bytes total)
I don't care which, but "number of lines" would seem to require me to loop through the entire document (which could be VERY large) just to get the value for "total lines". That seems extremely inefficient.
Thinking outside the box, perhaps I could take the size of the file (easier to get?) and keep track of the number of bytes that have been read; that might make a good progress-bar metric.
I can use os.path.getsize(file) or os.stat(file).st_size to retrieve the size of the file, but I have not yet found a way to keep track of the number of bytes read by readline(). The files I am working with should be encoded in ASCII, or maybe Unicode, so... should I determine the encoding used and then record the number of characters read, or use sys.getsizeof() or some len() function for each line read?
I am sure there will be problems here. Any suggestions?
(P.S. I don't think manually specifying the number of bytes to read at a time will work, because I need to work with each line individually, or else I would need to split the data up afterwards on "\n"s.)
bytesread = 0
while True:
    line = fh.readline()
    if line == '':
        break
    bytesread += len(line)
Or, a little shorter:
bytesread = 0
for line in fh:
    bytesread += len(line)
Using os.path.getsize() (or os.stat) is an efficient way of determining the file size.
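A minimal sketch tying the two together (the filename is a placeholder; the file is opened in binary mode so that len(line) counts bytes rather than decoded characters, which matters for multi-byte encodings and newline translation):

import os

filename = 'input.txt'  # placeholder path
total = os.path.getsize(filename)
bytesread = 0

with open(filename, 'rb') as fh:
    for line in fh:
        bytesread += len(line)
        percent = 100.0 * bytesread / total
        # process the line, then redraw the progress bar using percent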

Reading only the end of huge text file [duplicate]

Possible Duplicate:
Get last n lines of a file with Python, similar to tail
Read a file in reverse order using python
I have a file that's about 15GB in size; it's a log file whose output I'm supposed to analyze. I already did a basic parse of a similar but GREATLY smaller file, with just a few lines of logging. Parsing the strings is not the issue. The issue is the huge file and the amount of redundant data it contains.
Basically I'm attempting to make a Python script that I could tell, for example: give me the last 5000 lines of the file. That's again basic argument handling and all that, nothing special there; I can do that.
But how do I tell the file reader to ONLY read the number of lines I specified from the end of the file? I'm trying to skip the huuuuuuge number of lines at the beginning of the file, since I'm not interested in those, and to be honest, reading about 15GB of lines from a text file takes too long. Is there a way to, err... start reading from the end of the file? Does that even make sense?
It all boils down to the issue that reading a 15GB file line by line takes too long. So I want to skip the data at the beginning (redundant to me, at least) and only read the number of lines I want from the end of the file.
The obvious answer is to manually copy the last N lines of the file to another file, but is there a way to do this semi-auto-magically, just reading the last N lines from the end of the file with Python?
Farm this out to unix:
import os
os.popen('tail -n 1000 filepath').read()
Use subprocess.Popen instead of os.popen if you need to be able to access stderr (and some other features).
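For example, a hedged equivalent with subprocess (check_output is available from Python 2.7 on):

import subprocess

# Returns the last 1000 lines as one string;
# raises CalledProcessError if tail exits nonzero.
last_lines = subprocess.check_output(['tail', '-n', '1000', 'filepath'])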
You need to seek to the end of the file, then read some chunks in blocks from the end, counting lines, until you've found enough newlines to read your n lines.
Basically, you are re-implementing a simple form of tail.
Here's some lightly tested code that does just that:
import os, errno

def lastlines(hugefile, n, bsize=2048):
    # get newlines type, open in universal mode to find it
    with open(hugefile, 'rU') as hfile:
        if not hfile.readline():
            return  # empty, no point
        sep = hfile.newlines  # After reading a line, python gives us this
    assert isinstance(sep, str), 'multiple newline types found, aborting'

    # find a suitable seek position in binary mode
    with open(hugefile, 'rb') as hfile:
        hfile.seek(0, os.SEEK_END)
        linecount = 0
        pos = 0

        while linecount <= n + 1:
            # read at least n lines + 1 more; we need to skip a partial line later on
            try:
                hfile.seek(-bsize, os.SEEK_CUR)            # go backwards
                linecount += hfile.read(bsize).count(sep)  # count newlines
                hfile.seek(-bsize, os.SEEK_CUR)            # go back again
            except IOError, e:
                if e.errno == errno.EINVAL:
                    # Attempted to seek past the start, can't go further
                    bsize = hfile.tell()
                    hfile.seek(0, os.SEEK_SET)
                    pos = 0
                    linecount += hfile.read(bsize).count(sep)
                    break
                raise  # Some other I/O exception, re-raise
            pos = hfile.tell()

    # Re-open in text mode
    with open(hugefile, 'r') as hfile:
        hfile.seek(pos, os.SEEK_SET)  # our file position from above
        for line in hfile:
            # We've located n lines *or more*, so skip if needed
            if linecount > n:
                linecount -= 1
                continue
            # The rest we yield
            yield line
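Usage would look something like this (the filename is a placeholder, and the per-line processing is up to you):

for line in lastlines('hugefile.txt', 5000):
    print line,  # Python 2 print; the trailing comma avoids doubling the newline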
Even though I would prefer the tail solution: if you know the maximum number of characters per line, you can implement another possible solution by getting the size of the file, opening a file handle, and using the seek method with an estimate of the number of characters you are looking for.
The final code should look something like this; it also explains why I still prefer the tail solution :) Good luck!
import os

MAX_CHARS_PER_LINE = 80

size_of_file = os.path.getsize('15gbfile.txt')
file_handler = open('15gbfile.txt', 'rb')
seek_index = size_of_file - (number_of_requested_lines * MAX_CHARS_PER_LINE)
file_handler.seek(seek_index)
buffer = file_handler.read()
You can improve this code by analyzing the newlines in the buffer you read.
Good luck (and you should use the tail solution ;-) I am quite sure you can get tail for every OS).
The preferred method at this point was just to use unix's tail for the job and modify the python script to accept input through standard input.
tail hugefile.txt -n1000 | python magic.py
It's nothing sexy, but at least it takes care of the job. The big file is too big a burden to handle, I found out, at least for my Python skills. So it was a lot easier just to add a pinch of nix magic to cut down the file size. tail was a new one for me, so I learned something and figured out another way of using the terminal to my advantage again. Thank you everyone.
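For completeness, the receiving script (the magic.py in the command above) only needs to iterate over standard input; a minimal sketch, with the actual parsing left as a placeholder:

import sys

for line in sys.stdin:
    # parse/analyze one log line here
    pass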
