I have a multi-track midi file that I'm reading with music21:
import music21
f = music21.midi.MidiFile()
f.open('1079-02.mid')
f.read()
stream = music21.midi.translate.midiFileToStream(f).flat
note_filter = music21.stream.filters.ClassFilter('Note')
for n in stream.recurse().addFilter(note_filter):
    offset = n.offset # offset from song start in beats
    note = n.pitch # letter of the note, e.g. C4, F5
    midi_note = n.pitch.midi # midi number of the pitch, e.g. 60, 72
    duration = n.duration # duration of the note in beats
    instrument = n.activeSite.getInstrument() # instrument voice
I'd like to figure out which track each note in this stream belongs to. For example, when I open the file in GarageBand, the notes are organized into separate tracks.
In mido, each MidiFile has a tracks attribute that contains one list of notes for each track.
Is there a way to get the same with music21? Any help would be appreciated!
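For reference, here is a minimal sketch of the mido behaviour mentioned above (it assumes the same '1079-02.mid' file; each MidiTrack is just a list of messages):
import mido

mid = mido.MidiFile('1079-02.mid')
for i, track in enumerate(mid.tracks):
    # count the note-on messages in each track
    note_ons = [msg for msg in track if msg.type == 'note_on' and msg.velocity > 0]
    print(i, track.name, len(note_ons))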
The music tracks are parsed into separate stream.Part objects, so you can just walk through the parts of the stream.Score that you produced, as long as you avoid flattening it (here, I've just produced a stream with converter.parse()):
s = converter.parse('1079-02.mid')
for part in s.parts:
    for note in part.recurse().notes:
        print("I WAS IN PART ", part)
or look up the containing part:
s = converter.parse('1079-02.mid')
for note in s.recurse().notes:
    part = note.sites.getObjByClass('Part')
    print("I WAS IN PART ", part)
I doubt you really need to flatten anything. Good luck!
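As a small follow-up, here is a hedged sketch that builds a per-track mapping of notes from the parts (the '1079-02.mid' file name comes from the question; part.partName can be empty for some files, hence the fallback label):
from music21 import converter

s = converter.parse('1079-02.mid')
notes_by_track = {}
for i, part in enumerate(s.parts):
    label = part.partName or 'Track {}'.format(i)
    for n in part.flat.getElementsByClass('Note'):
        # offset from the start of the part (in quarter lengths) and the pitch name
        notes_by_track.setdefault(label, []).append((float(n.offset), n.pitch.nameWithOctave))

for label, notes in notes_by_track.items():
    print(label, len(notes))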
I have a load of 3 hour MP3 files, and every ~15 minutes a distinct 1 second sound effect is played, which signals the beginning of a new chapter.
Is it possible to identify each time this sound effect is played, so I can note the time offsets?
The sound effect is similar every time, but because it's been encoded in a lossy file format, there will be a small amount of variation.
The time offsets will be stored in the ID3 Chapter Frame MetaData.
Example Source, where the sound effect plays twice.
ffmpeg -ss 0.9 -i source.mp3 -t 0.95 sample1.mp3 -acodec copy -y
Sample 1 (Spectrogram)
ffmpeg -ss 4.5 -i source.mp3 -t 0.95 sample2.mp3 -acodec copy -y
Sample 2 (Spectrogram)
I'm very new to audio processing, but my initial thought was to extract a sample of the 1 second sound effect, then use librosa in python to extract a floating point time series for both files, round the floating point numbers, and try to get a match.
import numpy
import librosa
print("Load files")
source_series, source_rate = librosa.load('source.mp3') # 3 hour file
sample_series, sample_rate = librosa.load('sample.mp3') # 1 second file
print("Round series")
source_series = numpy.around(source_series, decimals=5);
sample_series = numpy.around(sample_series, decimals=5);
print("Process series")
source_start = 0
sample_matching = 0
sample_length = len(sample_series)
for source_id, source_sample in enumerate(source_series):
    if source_sample == sample_series[sample_matching]:
        sample_matching += 1
        if sample_matching >= sample_length:
            print(float(source_start) / source_rate)
            sample_matching = 0
        elif sample_matching == 1:
            source_start = source_id;
    else:
        sample_matching = 0
This does not work with the MP3 files above, but did with an MP4 version - where it was able to find the sample I extracted, but it was only that one sample (not all 12).
I should also note this script takes just over 1 minute to process the 3 hour file (which includes 237,426,624 samples). So I can imagine that some kind of averaging on every loop would cause this to take considerably longer.
Trying to directly match waveform samples in the time domain is not a good idea. The mp3 encoding will preserve the perceptual properties, but it is quite likely that the phases of the frequency components will be shifted, so the sample values will not match.
You could instead try to match the volume envelopes of your effect and your sample.
This is less likely to be affected by the mp3 process.
First, normalise your sample so the embedded effects are at the same level as your reference effect. Then construct new waveforms from the effect and the sample by taking the average of the peak values over time frames that are just short enough to capture the relevant features (better still, use overlapping frames). Finally, use cross-correlation in the time domain.
If this does not work, you could analyze each frame using an FFT, which gives you a feature vector per frame, and then try to find matches of the sequence of features in your effect within the sample. This is similar to jonnor's suggestion (https://stackoverflow.com/users/1967571/jonnor). MFCC is used in speech recognition, but since you are not detecting speech, FFT is probably OK.
I am assuming the effect plays by itself (no background noise), and that it was added to the recording electronically (as opposed to being recorded via a microphone). If this is not the case, the problem becomes more difficult.
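For what it's worth, here is a rough sketch of the envelope approach described above (the file names, frame sizes, and threshold are assumptions to be tuned, not the answerer's exact method):
import numpy as np
import librosa

source, sr = librosa.load('source.mp3')        # the long recording
effect, _ = librosa.load('effect.wav', sr=sr)  # the reference sound effect

frame_length, hop_length = 1024, 512  # short, overlapping frames

def envelope(y):
    # peak value per frame, normalised so level differences matter less
    frames = librosa.util.frame(y, frame_length=frame_length, hop_length=hop_length)
    env = np.abs(frames).max(axis=0)
    return env / (env.max() + 1e-9)

source_env = envelope(source)
effect_env = envelope(effect)

# cross-correlate the envelopes; peaks mark candidate positions
corr = np.correlate(source_env, effect_env, mode='valid')
corr /= corr.max()
for frame_idx in np.where(corr > 0.9)[0]:  # threshold is a guess
    print(frame_idx * hop_length / sr, 'seconds')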
This is an Audio Event Detection problem. If the sound is always the same and there are no other sounds at the same time, it can probably be solved with a Template Matching approach - at least if there are no other sounds with other meanings that sound similar.
The simplest kind of template matching is to compute the cross-correlation between your input signal and the template.
Cut out an example of the sound to detect (using Audacity, for example). Take as much as possible, but avoid the start and end. Store this as a .wav file.
Load the .wav template using librosa.load().
Chop up the input file into a series of overlapping frames, each the same length as your template. This can be done with librosa.util.frame.
Iterate over the frames, and compute the cross-correlation between each frame and the template using numpy.correlate.
High values of cross-correlation indicate a good match. A threshold can be applied in order to decide what is an event or not, and the frame number can be used to calculate the time of the event.
You should probably prepare some shorter test files which have both some examples of the sound to detect as well as other typical sounds.
If the volume of the recordings is inconsistent you'll want to normalize that before running detection.
If cross-correlation in the time-domain does not work, you can compute the melspectrogram or MFCC features and cross-correlate that. If this does not yield OK results either, a machine learning model can be trained using supervised learning, but this requires labeling a bunch of data as event/not-event.
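Here is a minimal sketch of the frame/cross-correlation recipe above, under the assumption that the template fits in a single frame; the file names and the threshold are placeholders:
import numpy as np
import librosa

template, sr = librosa.load('template.wav')
signal, _ = librosa.load('input.mp3', sr=sr)

frame_length = len(template)
hop_length = frame_length // 4
# librosa.util.frame returns a strided view, so the long signal is not copied
frames = librosa.util.frame(signal, frame_length=frame_length, hop_length=hop_length)

template = template / (np.linalg.norm(template) + 1e-9)
for i in range(frames.shape[1]):
    frame = frames[:, i]
    frame = frame / (np.linalg.norm(frame) + 1e-9)
    # zero-lag normalised cross-correlation between this frame and the template
    score = float(np.correlate(frame, template, mode='valid')[0])
    if score > 0.5:  # threshold to be decided on the shorter test files
        print('possible event at', i * hop_length / sr, 'seconds')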
To follow up on the answers by @jonnor and @paul-john-leonard: they are both correct; by using frames (FFT) I was able to do Audio Event Detection.
I've written up the full source code at:
https://github.com/craigfrancis/audio-detect
Some notes though:
To create the templates, I used ffmpeg:
ffmpeg -ss 13.15 -i source.mp4 -t 0.8 -acodec copy -y templates/01.mp4;
I decided to use librosa.core.stft, but I needed to make my own implementation of this stft function for the 3 hour file I'm analysing, as it's far too big to keep in memory.
When using stft I tried using a hop_length of 64 at first, rather than the default (512), as I assumed that would give me more data to work with... the theory might be true, but 64 was far too detailed, and caused it to fail most of the time.
I still have no idea how to get cross-correlation between frame and template to work (via numpy.correlate)... instead I took the results per frame (the 1025 buckets, not 1024, which I believe relate to the Hz frequencies found) and did a very simple average-difference check, then made sure that average was above a certain value (my test case worked at 0.15; the main files I'm using this on required 0.55, presumably because they had been compressed quite a bit more):
hz_score = abs(source[0:1025,x] - template[2][0:1025,y])
hz_score = sum(hz_score)/float(len(hz_score))
When checking these scores, it's really useful to show them on a graph. I often used something like the following:
import matplotlib.pyplot as plt

plt.figure(figsize=(30, 5))
plt.axhline(y=hz_match_required_start, color='y')

while x < source_length:
    # ... the scoring code from the main loop goes here ...
    debug.append(hz_score)
    if x == mark_frame:
        plt.axvline(x=len(debug), ymin=0.1, ymax=1, color='r')

plt.plot(debug)
plt.show()
When you create the template, you need to trim off any leading silence (to avoid bad matching), and an extra ~5 frames (it seems that the compression / re-encoding process alters this). Likewise, remove the last 2 frames (I think the frames include a bit of data from their surroundings, and the last one in particular can be a bit off).
When you start finding a match, you might find it's OK for the first few frames, and then it fails... you will probably need to try again a frame or two later. I found it easier to have a process that supported multiple templates (slight variations on the sound) and would check their first testable (e.g. 6th) frame; if that matched, they went into a list of potential matches. Then, as it progressed on to the next frames of the source, it could compare them to the next frames of each template, until all frames in the template had been matched (or failed).
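Regarding the memory issue and the cross-correlation question mentioned in these notes, here is a hedged sketch of reading the long file in blocks (librosa.stream, so librosa >= 0.7 is assumed) and scoring each STFT frame against a single template frame with numpy.corrcoef rather than the average-difference check; the template file and all parameters here are assumptions:
import numpy as np
import librosa

n_fft, hop_length = 2048, 512  # n_fft of 2048 gives the 1025 frequency buckets mentioned above

# assumed: abs(stft) of the effect, saved earlier, shape (1025, n_frames)
template = np.load('template_stft.npy')
template_frame = template[:, 0]  # first usable frame of the template

sr = librosa.get_samplerate('source.mp3')
stream = librosa.stream('source.mp3', block_length=256,
                        frame_length=n_fft, hop_length=hop_length)

frame_offset = 0
for block in stream:
    stft_block = np.abs(librosa.stft(block, n_fft=n_fft,
                                     hop_length=hop_length, center=False))
    for i in range(stft_block.shape[1]):
        # correlation coefficient between this frame and the template frame
        score = np.corrcoef(stft_block[:, i], template_frame)[0, 1]
        if score > 0.9:  # threshold is a guess
            print('candidate at', (frame_offset + i) * hop_length / sr, 'seconds')
    frame_offset += stft_block.shape[1]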
This might not be an answer; it's just where I got to before I started researching the answers by @jonnor and @paul-john-leonard.
I was looking at the spectrograms you can get by using librosa stft and amplitude_to_db, and thinking that if I took the data that goes into the graphs, with a bit of rounding, I could potentially find the sound effect being played:
https://librosa.github.io/librosa/generated/librosa.display.specshow.html
The code I've written below kind of works, although it:
Returns quite a few false positives, which might be fixed by tweaking the parameters of what is considered a match.
Needs the librosa functions replaced with something that can parse, round, and do the match checks in one pass, as a 3 hour audio file causes Python to run out of memory on a computer with 16GB of RAM after ~30 minutes, before it even gets to the rounding bit.
import sys
import numpy
import librosa
#--------------------------------------------------
if len(sys.argv) == 3:
    source_path = sys.argv[1]
    sample_path = sys.argv[2]
else:
    print('Missing source and sample files as arguments');
    sys.exit()
#--------------------------------------------------
print('Load files')
source_series, source_rate = librosa.load(source_path) # The 3 hour file
sample_series, sample_rate = librosa.load(sample_path) # The 1 second file
source_time_total = float(len(source_series) / source_rate);
#--------------------------------------------------
print('Parse Data')
source_data_raw = librosa.amplitude_to_db(abs(librosa.stft(source_series, hop_length=64)))
sample_data_raw = librosa.amplitude_to_db(abs(librosa.stft(sample_series, hop_length=64)))
sample_height = sample_data_raw.shape[0]
#--------------------------------------------------
print('Round Data') # Also switches X and Y indexes, so X becomes time.
def round_data(raw, height):
    length = raw.shape[1]
    data = [];
    range_length = range(1, (length - 1))
    range_height = range(1, (height - 1))
    for x in range_length:
        x_data = []
        for y in range_height:
            # neighbours = []
            # for a in [(x - 1), x, (x + 1)]:
            #     for b in [(y - 1), y, (y + 1)]:
            #         neighbours.append(raw[b][a])
            #
            # neighbours = (sum(neighbours) / len(neighbours));
            #
            # x_data.append(round(((raw[y][x] + raw[y][x] + neighbours) / 3), 2))
            x_data.append(round(raw[y][x], 2))
        data.append(x_data)
    return data
source_data = round_data(source_data_raw, sample_height)
sample_data = round_data(sample_data_raw, sample_height)
#--------------------------------------------------
sample_data = sample_data[50:268] # Temp: Crop the sample_data (318 to 218)
#--------------------------------------------------
source_length = len(source_data)
sample_length = len(sample_data)
sample_height -= 2;
source_timing = float(source_time_total / source_length);
#--------------------------------------------------
print('Process series')
hz_diff_match = 18 # For every comparison, how much of a difference is still considered a match - With the Source, using Sample 2, the maximum diff was 66.06, with an average of ~9.9
hz_match_required_switch = 30 # After matching "start" for X, drop to the lower "end" requirement
hz_match_required_start = 850 # Out of a maximum match value of 1023
hz_match_required_end = 650
hz_match_required = hz_match_required_start
source_start = 0
sample_matched = 0
x = 0;
while x < source_length:
    hz_matched = 0
    for y in range(0, sample_height):
        diff = source_data[x][y] - sample_data[sample_matched][y];
        if diff < 0:
            diff = 0 - diff
        if diff < hz_diff_match:
            hz_matched += 1
    # print(' {} Matches - {} # {}'.format(sample_matched, hz_matched, (x * source_timing)))
    if hz_matched >= hz_match_required:
        sample_matched += 1
        if sample_matched >= sample_length:
            print(' Found # {}'.format(source_start * source_timing))
            sample_matched = 0 # Prep for next match
            hz_match_required = hz_match_required_start
        elif sample_matched == 1: # First match, record where we started
            source_start = x;
        if sample_matched > hz_match_required_switch:
            hz_match_required = hz_match_required_end # Go to a weaker match requirement
    elif sample_matched > 0:
        # print(' Reset {} / {} # {}'.format(sample_matched, hz_matched, (source_start * source_timing)))
        x = source_start # Matched something, so try again with x+1
        sample_matched = 0 # Prep for next match
        hz_match_required = hz_match_required_start
    x += 1
#--------------------------------------------------
I am making a program that should be able to extract the notes, rests, and chords from a certain midi file and write the respective pitch (in midi tone numbers - they go from 0-127) of the notes and chords to a csv file for later use.
For this project, I am using the Python Library "Music21".
from music21 import *
import pandas as pd
#SETUP
path = r"Pirates_TheCarib_midi\1225766-Pirates_of_The_Caribbean_Medley.mid"
#create a function for parsing and extracting the notes
def extract_notes(path):
    stm = converter.parse(path)

    treble = stm[0] #access the first part (if there is only one part)
    bass = stm[1]

    #note extraction
    notes_treble = []
    notes_bass = []

    for thisNote in treble.getElementsByClass("Note"):
        indiv_note = [thisNote.name, thisNote.pitch.midi, thisNote.offset]
        notes_treble.append(indiv_note) #add the note and the note's offset

    for thisNote in bass.getElementsByClass("Note"):
        indiv_note = [thisNote.name, thisNote.pitch.midi, thisNote.offset]
        notes_bass.append(indiv_note) #add the notes to the bass

    return notes_treble, notes_bass

#write to csv
def to_csv(notes_array):
    df = pd.DataFrame(notes_array, index=None, columns=None)
    df.to_csv("attempt1_v1.csv")

#using the functions
notes_array = extract_notes(path)
#to_csv(notes_array)

#DEBUGGING
stm = converter.parse(path)
print(stm.parts)
Here is the link to the score I am using as a test.
https://musescore.com/user/1699036/scores/1225766
When I run the extract_notes function, it returns two empty arrays, and the line
print(stm.parts)
returns
<music21.stream.iterator.StreamIterator for Score:0x1b25dead550 #:0>
I am confused as to why it does this. The piece should have two parts, treble and bass. How can I get each note, chord and rest into an array so I can put it in a csv file?
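As an aside on that debugging output: stm.parts is a StreamIterator (hence the <music21.stream.iterator.StreamIterator ...> line), and the notes of a parsed score sit inside Measures within each Part, so calling getElementsByClass("Note") directly on a part finds nothing unless you recurse. A hedged sketch of what that suggests:
stm = converter.parse(path)
for part in stm.parts:
    # recurse into the measures of each part to reach the notes
    extracted = [[n.name, n.pitch.midi, n.offset] for n in part.recurse().getElementsByClass("Note")]
    print(part.partName, len(extracted))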
Here is a small snippet showing how I did it. I needed to get all the notes, chords and rests for a specific instrument, so I first iterated through the parts to find that instrument, and then checked what type each element is and appended it.
You can call this method like
notes = get_notes_chords_rests(keyboard_instruments, "Pirates_of_The_Caribbean.mid")
where keyboard_instruments is a list of instruments:
keyboard_instruments = ["KeyboardInstrument", "Piano", "Harpsichord", "Clavichord", "Celesta"]
def get_notes_chords_rests(instrument_type, path):
    try:
        midi = converter.parse(path)
        parts = instrument.partitionByInstrument(midi)
        note_list = []
        for music_instrument in range(len(parts)):
            if parts.parts[music_instrument].id in instrument_type:
                for element_by_offset in stream.iterator.OffsetIterator(parts[music_instrument]):
                    for entry in element_by_offset:
                        if isinstance(entry, note.Note):
                            note_list.append(str(entry.pitch))
                        elif isinstance(entry, chord.Chord):
                            note_list.append('.'.join(str(n) for n in entry.normalOrder))
                        elif isinstance(entry, note.Rest):
                            note_list.append('Rest')
        return note_list
    except Exception as e:
        print("failed on ", path)
        pass
P.S. It is important to use a try block, because a lot of MIDI files on the web are corrupted.
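To tie this back to the CSV requirement in the question, here is a minimal usage sketch (the output file name and column label are assumptions):
import pandas as pd

notes = get_notes_chords_rests(keyboard_instruments, "Pirates_of_The_Caribbean.mid")
pd.DataFrame({"value": notes}).to_csv("notes.csv", index=False)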
I used to decode AIS messages with this package (Python) https://github.com/schwehr/noaadata/tree/master/ais until I started getting a new format of the messages.
As you may know, AIS messages mostly come in two forms: single part (one message) or multi part (two messages). Message #5 always comes in two parts. For example:
!AIVDM,2,1,1,A,55?MbV02;H;s<HtKR20EHE:address#hidden#Dn2222222216L961O5Gf0NSQEp6ClRp8,0*1C
!AIVDM,2,2,1,A,88888888880,2*25
I used to decode this just fine using the following piece of code:
nmeamsg = fields.split(',')
if nmeamsg[0] != '!AIVDM':
return
total = eval(nmeamsg[1])
part = eval(nmeamsg[2])
aismsg = nmeamsg[5]
nmeastring = string.join(nmeamsg[0:-1],',')
bv = binary.ais6tobitvec(aismsg)
msgnum = int(bv[0:6])
--
elif (total>1):
    # Multi Slot Messages: 5,6,8,12,14,17,19,20?,21,24,26
    global multimsg
    if total==2:
        if msgnum==5:
            if nmeastring.count('!AIVDM')==2 and len(nmeamsg)==13: # make sure there are two parts concatenated together
                aismsg = nmeamsg[5]+nmeamsg[11]
                bv = binary.ais6tobitvec(aismsg)
                msg5 = ais_msg_5.decode(bv)
                print "message5 :",msg5
                return msg5
Now I'm getting a new format of the messages:
!SAVDM,2,1,7,A,55#0hd01sq`pQ3W?O81L5#E:1=0U8U#000000016000006H0004m8523k#Dp,0*2A,1410825672
!SAVDM,2,2,7,A,4hC`2U#C`40,2*76,1410825672,1410825673
Note: the number at the last index is the time in epoch format.
I tried to adjust my code to decode this new format. I succeeded in decoding messages with one part; my problem is the multi-part type.
nmeamsg = fields.split(',')
if nmeamsg[0] != '!AIVDM' and nmeamsg[0] != '!SAVDM':
return
total = eval(nmeamsg[1])
part = eval(nmeamsg[2])
aismsg = nmeamsg[5]
nmeastring = string.join(nmeamsg[0:-1],',')
dbtimestring = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(float(nmeamsg[7])))
bv = binary.ais6tobitvec(aismsg)
msgnum = int(bv[0:6])
The decoder can't combine the two lines into one, so decoding fails because message #5 should contain two strings, not one. The error I get is in these lines:
if nmeastring.count('!SAVDM')==2 and len(nmeamsg)==13:
    aismsg = nmeamsg[5]+nmeamsg[11]
Where len(nmeamsg) is always 8 (second line) and nmeastring.count('!SAVDM') is always 1
I hope I explained this clearly so someone can let me know what I'm missing here.
UPDATE
Okay, I think I found the reason. I pass messages from the file to the script line by line:
for line in file:
    i=i+1
    try:
        doais(line)
Message #5, however, should be passed as two lines. Any idea on how I can accomplish that?
UPDATE
I did it by modifying the code a little bit:
for line in file:
    i=i+1
    try:
        nmeamsg = line.split(',')
        aismsg = nmeamsg[5]
        bv = binary.ais6tobitvec(aismsg)
        msgnum = int(bv[0:6])
        print msgnum
        if nmeamsg[0] != '!AIVDM' and nmeamsg[0] != '!SAVDM':
            print "wrong format"
        total = eval(nmeamsg[1])
        if total == 1:
            dbtimestring = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(float(nmeamsg[8])))
            doais(line,msgnum,dbtimestring,aismsg)
        if total == 2: #Multi-line messages
            lines= line+file.next()
            nmeamsg = lines.split(',')
            dbtimestring = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(float(nmeamsg[15])))
            aismsg = nmeamsg[5]+nmeamsg[12]
            doais(lines,msgnum,dbtimestring,aismsg)
Be aware that noaadata is my old research code. libais is my production library that is in use for NOAA's ERMA and WhaleAlert.
I usually make decoding a two pass process. First, join multi-line messages; I refer to this as normalization (ais_normalize.py). You have several issues in this step. First, the two component lines have different timestamps on the right of the second string. By the old USCG metadata standard, the last one matters, so my code will assume that these two lines are not related. Second, you don't have the required station id field.
Where are you getting the SA from in SAVDM? What device ("talker" in the NMEA vocab) is receiving these messages?
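As an illustration of the normalization pass described above (this is a rough sketch, not the noaadata or libais code; field positions follow the standard !AIVDM/!SAVDM sentence layout, and the trailing epoch timestamps are simply ignored):
def join_multipart(lines):
    pending = {}  # sequence id -> {part number: payload}
    for line in lines:
        fields = line.strip().split(',')
        if fields[0] not in ('!AIVDM', '!SAVDM') or len(fields) < 7:
            continue
        total, part, seq = int(fields[1]), int(fields[2]), fields[3]
        payload = fields[5]
        if total == 1:
            yield payload
            continue
        parts = pending.setdefault(seq, {})
        parts[part] = payload
        if len(parts) == total:
            # emit the concatenated payload once every fragment has arrived
            yield ''.join(parts[i] for i in range(1, total + 1))
            del pending[seq]

Each yielded payload can then be handed to binary.ais6tobitvec() as before.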
If you're in Ruby, I can recommend the NMEA and AIS decoder ruby gem that I wrote, available on github. It's based on the unofficial AIS spec at catb.org which is maintained by one of Kurt's colleagues.
It handles combining of multipart messages, reads from streams, and supports a large number of NMEA and AIS messages. Decoding the 50 binary subtypes of AIS messages 6 and 8 is presently in development.
To handle the nonstandard lines you posted:
!SAVDM,2,1,7,A,55#0hd01sq`pQ3W?O81L5#E:1=0U8U#000000016000006H0004m8523k#Dp,0*2A,1410825672
!SAVDM,2,2,7,A,4hC`2U#C`40,2*76,1410825672,1410825673
It would be necessary to add a new parse rule that accepts fields after the checksum, but aside from that it should go smoothly. In other words, you'd copy the parser line here:
| BANG DATA CSUM { result = NMEAPlus::AISMessageFactory.create(val[0], val[1], val[2]) }
and have something like
| BANG DATA CSUM COMMA DATA { result = NMEAPlus::AISMessageFactory.create(val[0], val[1], val[2], val[4]) }
What do you do with those extra timestamp(s)? It almost looks like they've been appended by whatever software is doing the logging, rather than being part of the actual message.
I want to code a GStreamer pipeline in Python to convert a WebM video into an AVI one.
I made the pipeline to display the WebM video, which works.
How do I perform what I want?
I thought that just adding an "x264" element to the video queue and "lame" to the audio one was sufficient.
I then noted that a muxer is necessary, and I added that.
What I get is:
gst.element_link_many(self.queuev, self.video_decoder,colorspace,x264enc)
gst.element_link_many(self.queuea, self.audio_decoder, audioconv,lame)
gst.element_link_many(avimux,filesink)
where there's a specific callback function used to link the audio decoder and video decoder, which is:
def demuxer_callback(self, demuxer, pad):
    if pad.get_property("template").name_template == "video_%02d":
        qv_pad = self.queuev.get_pad("sink")
        pad.link(qv_pad)
    elif pad.get_property("template").name_template == "audio_%02d":
        qa_pad = self.queuea.get_pad("sink")
        pad.link(qa_pad)
I think I have to code something similar for avimux, and I've done this:
def avimux_callback(self, avimux, pad1):
    if pad1.get_property("template").name_template == "video_%02d":
        qv_pad1 = self.queuev.get_pad("sink")
        pad1.link(qv_pad1)
    elif pad1.get_property("template").name_template == "audio_%02d":
        qa_pad1 = self.queuea.get_pad("sink")
        pad1.link(qa_pad1)
but I get an error about the file source and the script doesn't work.
Suggestions??
Thanks
FrankBr
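Not a fix for the pad callbacks above, but for comparison, here is a hedged sketch of the same WebM-to-AVI conversion built with gst.parse_launch() (GStreamer 0.10 / pygst assumed; element names such as vp8dec and lamemp3enc may need adjusting to whatever plugins are installed):
import pygst
pygst.require('0.10')
import gst

# one string, same branches as the hand-built pipeline: demux the WebM,
# re-encode video to H.264 and audio to MP3, and mux both into an AVI
pipeline = gst.parse_launch(
    'filesrc location=input.webm ! matroskademux name=demux '
    'avimux name=mux ! filesink location=output.avi '
    'demux. ! queue ! vp8dec ! ffmpegcolorspace ! x264enc ! mux. '
    'demux. ! queue ! vorbisdec ! audioconvert ! lamemp3enc ! mux. ')

pipeline.set_state(gst.STATE_PLAYING)
# block until the conversion finishes or fails
pipeline.get_bus().poll(gst.MESSAGE_EOS | gst.MESSAGE_ERROR, gst.CLOCK_TIME_NONE)
pipeline.set_state(gst.STATE_NULL)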