I am trying to truncate an audio file by 30%: if the audio file is 4 minutes long, then after truncating it, it should be around 72 seconds. I have written the code below to do it, but it only produces a 0-byte file. Can you tell me where I went wrong?
def loadFile():
    with open('music.mp3', 'rb') as in_file:
        data = len(in_file.read())
        with open('output.mp3', 'wb') as out_file:
            ndata = newBytes(data)
            out_file.write(in_file.read()[:ndata])

def newBytes(bytes):
    newLength = (bytes/100) * 30
    return int(newLength)

loadFile()
You are trying to read your file a second time, which will return no data: the first in_file.read() inside len(in_file.read()) has already consumed the whole file. Instead, read the whole file into a variable and then calculate the length of that. The variable can then be used a second time.
def newBytes(bytes):
    return (bytes * 70) // 100

def loadFile():
    with open('music.mp3', 'rb') as in_file:
        data = in_file.read()
    with open('output.mp3', 'wb') as out_file:
        ndata = newBytes(len(data))
        out_file.write(data[:ndata])
Also, it is better to multiply first and then divide (using integer division) to avoid having to work with floating-point numbers.
You cannot reliably truncate an MP3 file by byte size and expect it to be equivalently truncated in audio time length.
MP3 frames can change bitrate. While your method will sort of work, it won't be all that accurate. Additionally, you'll undoubtedly break frames leaving glitches at the end of the file. You will also lose ID3v1 tags (if you still use them... better to use ID3v2 anyway).
Consider executing FFmpeg with -acodec copy instead. This will simply copy the bytes over while maintaining the integrity of the file, and ensuring a good clean cut where you want it to be.
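As a sketch of that approach from Python (assuming the ffmpeg binary is on your PATH; build_trim_cmd is an illustrative helper, and 72 seconds is the figure from the question):

```python
import subprocess

def build_trim_cmd(src, dst, keep_seconds):
    """Build an ffmpeg argv that keeps the first keep_seconds of audio,
    copying the stream (-acodec copy) so frames are never re-encoded."""
    return ["ffmpeg", "-y", "-i", src,
            "-t", str(keep_seconds),
            "-acodec", "copy", dst]

cmd = build_trim_cmd("music.mp3", "output.mp3", 72)
# subprocess.run(cmd, check=True)  # uncomment to actually invoke ffmpeg
```

Because -t is given as an output option, ffmpeg stops writing after the requested duration and cuts on a frame boundary, avoiding the glitches a raw byte cut produces.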
Hi and happy holidays to everyone!
I have to cope with big CSV files (around 5 GB each) on a simple laptop, so I am learning to read files in chunks (I am a complete noob at this), using Python 2.7 in particular. I found this very nice example:
# chunked file reading
from __future__ import division
import os

def get_chunks(file_size):
    chunk_start = 0
    chunk_size = 0x20000  # 131072 bytes, default max ssl buffer size
    while chunk_start + chunk_size < file_size:
        yield(chunk_start, chunk_size)
        chunk_start += chunk_size
    final_chunk_size = file_size - chunk_start
    yield(chunk_start, final_chunk_size)

def read_file_chunked(file_path):
    with open(file_path) as file_:
        file_size = os.path.getsize(file_path)
        print('File size: {}'.format(file_size))
        progress = 0
        for chunk_start, chunk_size in get_chunks(file_size):
            file_chunk = file_.read(chunk_size)
            # do something with the chunk, encrypt it, write to another file...
            progress += len(file_chunk)
            print('{0} of {1} bytes read ({2}%)'.format(
                progress, file_size, int(progress / file_size * 100))
            )

if __name__ == '__main__':
    read_file_chunked('some-file.gif')
(source: https://gist.github.com/richardasaurus/21d4b970a202d2fffa9c)
but something is still not very clear to me. For example, let's say I write a piece of code and want to test it on a small fraction of my dataset, just to check that it runs properly. How could I read, say, only the first 10% of my CSV file and run my code on that chunk, without having to store the rest of the dataset in memory?
I appreciate any hint; even some reading or an external reference is good, if related to chunking files with Python. Thank you!
Let's consider the following CSV file. If you open it with Notepad or any simple text editor, you can see this:
CU-C2376;Airbus A380;50.00;259.00
J2-THZ;Boeing 737;233.00;213.00
SU-XBG;Embraer ERJ-195;356.00;189.00
TI-GGH;Boeing 737;39.00;277.00
HK-6754J;Airbus A380;92.00;93.00
6Y-VBU;Embraer ERJ-195;215.00;340.00
9N-ABU;Embraer ERJ-195;151.00;66.00
YV-HUI;Airbus A380;337.00;77.00
If you observe carefully, each line corresponds to one row and each value is separated with a ";".
Let's say I want to read only the first three rows; then:
with open('data.csv') as f:
    lines = list()
    for i in range(3):
        lines.append(f.readline())
    # Do some stuff with the first three lines
This is a better way of reading a chunk of a file, because reading by byte count can split records: if the file is 10 MB and you read the first 3 MB, the last bytes you read may fall in the middle of a row and not represent anything on their own.
Alternatively, you can use libraries like pandas.
I am trying to edit the length of WAV files using the wave module. However, I can't get anywhere because I keep getting the same error that the number of channels is not specified. I still get that error when I try to print the number of channels, or even when I try to set the number of channels, as seen here:
def editLength(wavFile):
    file = wave.open(wavFile, 'w')
    file.setnchannels(file.getnchannels())
    x = file.getnchannels()
    print(x)
From https://docs.python.org/3.7/library/wave.html#wave.open:

wave.open(file, mode=None)
    If file is a string, open the file by that name, otherwise treat it as a file-like object.

    mode can be:
        'rb'    Read only mode.
        'wb'    Write only mode.

    Note that it does not allow read/write WAV files.
You open the file in write mode ('w'), so at the time of the first file.getnchannels() call the file object has not yet had its number of channels specified. To read the channel count, open the file in read mode instead:
def editLength(wavFile):
    with wave.open(wavFile, "rb") as file:
        x = file.getnchannels()
        print(x)
If you want to edit the file, you should first read from the original file and write to a temporary file, then copy the temporary file over the original file.
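A minimal sketch of that read-temp-replace pattern (truncate_wav and keep_frames are illustrative names; this assumes an uncompressed PCM WAV that the wave module can read):

```python
import os
import shutil
import tempfile
import wave

def truncate_wav(path, keep_frames):
    """Read from the original, write the first keep_frames to a
    temporary file, then replace the original with it."""
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(keep_frames)
    fd, tmp_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    with wave.open(tmp_path, "wb") as dst:
        dst.setparams(params)       # nframes in the header is patched on close
        dst.writeframes(frames)
    shutil.move(tmp_path, path)     # copy the temp file over the original
```

Writing to a temp file first means the original is never opened for reading and writing at the same time, which the wave module does not allow.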
It's maybe not super obvious from the docs: https://docs.python.org/3/library/wave.html
The Wave_write object expects you to explicitly set all params for the object.
After a little trial and error I was able to read my wav file and write a specific duration to disk.
For example, if I have a 44.1 kHz sample rate WAV file with 2 channels...
import wave

with wave.open("some_wavfile.wav", "rb") as handle:
    params = handle.getparams()
    # only read the first 10 seconds of audio
    frames = handle.readframes(441000)
    print(handle.tell())
    print(params)

params = list(params)
params[3] = len(frames)
print(params)

with wave.open("output_wavfile.wav", "wb") as handle:
    handle.setparams(params)
    handle.writeframes(frames)
This should leave you with stdout looking something like this:
441000
_wave_params(nchannels=2, sampwidth=2, framerate=44100, nframes=10348480, comptype='NONE', compname='not compressed')
[2, 2, 44100, 1764000, 'NONE', 'not compressed']
nframes here is 1764000 because len(frames) is a byte count, not a frame count: with nchannels=2 and sampwidth=2 each frame is 4 bytes, so 1764000/4 = 441000 frames.
Oddly enough, setparams was able to accept a list instead of a tuple.
ffprobe shows exactly 10 seconds of audio for the output file and sounds perfect to me.
Somewhat new to Python and new to Linux. I created a script that mines Twitter's streaming API. The script writes to a .csv file when things in the stream match my parameters.
I'd like to know if there's any way to stop my script once the file has reached 1 gig. I know cron can be used to time the script and everything, but I'm more concerned about the file size than the time it takes.
Thanks for your input and consideration.
In your case, you probably don't need os.stat, and os.stat may give you a false size in some cases (namely, buffers not flushing). Why not just use f.tell() to read the size, with something like this:
import csv

with open('out.txt', 'w', encoding='utf-8') as f:
    csvfile = csv.writer(f)
    maxsize = 1024  # max file size in bytes
    for row in data():  # data() stands in for your row source
        csvfile.writerow(row)
        if f.tell() > maxsize:  # f.tell() gives byte offset, no need to worry about multiwide chars
            break
Use Python's os.stat() to get info on the file, then check the total number of bytes of the existing file (fileInfo.st_size) plus the size of the data you are about to write.
import os

fileInfo = os.stat('twitter_stream.csv')
fileSize = fileInfo.st_size
print(fileSize)

# Now get data from twitter
# determine number of bytes in data
# write data if file size + data bytes < 1GB
I'm writing a simple script that generates random "glitches" out of a source audio file. It reads a random chunk of a source audio file and writes that chunk to a new .wav file, followed by a gap of silence, followed by a different chunk, then silence, etc .
I tried reading about the .wav format to understand what "silence" is, but most of it is over my head. Silence simply seems to be a value of 0. I made a pure-silence file for analysis; in Notepad the silence was represented as spaces, whereas in Sublime Text it was a bunch of 0s. So my approach was to take the silence character (either a " " or a 0), multiply it by how many characters a frame is for the source audio (so it automatically corrects for different possible .wav attributes, mono/stereo, etc.), and then multiply that by how many frames of silence one wants for the gap. Everything works fine except that whatever I'm writing as the silence is inevitably written at some value that, though consistent, does have amplitude, and therefore shows up like this in the wave:
and in a hex editor the gap is always some repeating pattern like 30303030303030 or something. I'm obviously doing something wrong or misunderstanding the nature of .wav data, but I can't figure it out. Here is a basic stripped-down version of the code:
import sys
import wave

script, filename = sys.argv

sourceFile = wave.open(filename, 'r')
sampleParams = sourceFile.getparams()

def randChunk(source):
    blahblah
    # Returns random chunk of audio from sourceFile

numGlitch = int(raw_input('How many glitches do you want?: '))
silenceSpace = int(raw_input('How many frames of silence between glitches?: '))

singleglitchFile = filename[:-4] + '_glitch.wav'
outfile = wave.open(singleglitchFile, 'w')
# set the outfile params to whatever sourceFile params were
outfile.setparams(sampleParams)

# WHERE EVERYTHING GOES WRONG
silence = 0  # or " " or hex(0) or whatever the hell silence is supposed to be
frameLength = len(sourceFile.readframes(1))
emptyspace = (silence * frameLength) * silenceSpace

for n in range(numGlitch):
    outfile.writeframes(randChunk(sourceFile))
    outfile.writeframes(emptyspace)

outfile.close()
Figuring out a solution not only would get this script working but would help me figure out the next phase: how to get the average amplitude of sequential frames in the source and filter out any that dont meet a certain threshold (i.e. filter out chunks that are too quiet).
The character with the value 0 is chr(0) or '\x00'. 0 is an integer, so multiplying it performs integer multiplication: 0 * anything is 0, but '\x00' * 3 is '\x00\x00\x00'.
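In Python 3 the wave module expects bytes, so the same idea applies with b'\x00'. A minimal sketch, assuming a hypothetical 2-channel, 16-bit file (the frame geometry here is illustrative, not read from a real file):

```python
# hypothetical frame geometry: 2 channels, 2 bytes per sample
nchannels, sampwidth = 2, 2
frameLength = nchannels * sampwidth   # bytes per frame
silenceSpace = 100                    # frames of silence wanted

silence = b'\x00'                     # one zero byte, not the integer 0
emptyspace = silence * frameLength * silenceSpace
# emptyspace is 400 bytes of zeros, suitable for writeframes()
```

The repeating 0x30 pattern seen in the hex editor is the ASCII digit "0": writing the character '0' instead of the zero byte is exactly what produces a gap with nonzero amplitude.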
I need to assemble a binary file in pieces, with the pieces arriving in random order (yes, it's a P2P project):
def write(filename, offset, data):
    file = open(filename, "ab")
    file.seek(offset)
    file.write(data)
    file.close()
Say I have a 32 KB write(f, o, d) at offset 1 MB into the file, and then another 32 KB write(f, o, d) at offset 0.
I end up with a file 64 KB in length (i.e. the gap consisting of 0s between 32 KB and 1 MB is truncated/disappears).
I am aware this may appear an incredibly stupid question, but I cannot seem to figure it out from the open(..) modes.
Advice gratefully received.
*** UPDATE
My method to write P2P pieces ended up as follows (for those who may glean some value from it):
def writePiece(self, filename, pieceindex, bytes, ipsrc, ipdst, ts):
    file = open(filename, "r+b")
    if not self.piecemap[ipdst].has_key(pieceindex):
        little = struct.pack('<' + 'B' * len(bytes), *bytes)
        # Seek to offset based on piece index
        file.seek(pieceindex * self.piecesize)
        file.write(little)
        file.flush()
        self.procLog.info("Wrote (%d) bytes of piece (%d) to %s" % (len(bytes), pieceindex, filename))
        # Remember we have this piece now in case duplicates arrive
        self.piecemap[ipdst][pieceindex] = True
    file.close()
Note: I also addressed some endian issues using struct.pack which plagued me for a while.
For anyone wondering, the project I am working on is to analyse BT messages captured directly off the wire.
>>> import os
>>> filename = 'tempfile'
>>> def write(filename, data, offset):
...     try:
...         f = open(filename, 'r+b')
...     except IOError:
...         f = open(filename, 'wb')
...     f.seek(offset)
...     f.write(data)
...     f.close()
...
>>> write(filename, '1' * (1024*32), 1024*1024)
>>> write(filename, '1' * (1024*32), 0)
>>> os.path.getsize(filename)
1081344
You opened the file in append ("a") mode. All writes are going to the end of the file, irrespective of the calls to seek().
Try using 'r+b' rather than 'ab'.
It seems to me like there's not a lot of point in trying to assemble the file until all the pieces of it are there. Why not keep the pieces separate until all are present, then write them to the final file in order? That's what most P2P apps do, AFAIK.