Music21: remove percussion based on channel number

I have some midi files [sample] from which I'd like to remove percussion.
Here's what I've been using to read midi files and then save midi back to disk. The resulting sound is great:
import music21

path = 'lld.midi'
score = music21.converter.parse(path,
    forceSource=False,
    quantizePost=False,
).stripTies(inPlace=True)
score.write('midi', 'score.midi')
Since percussion is stored on channel 10 in midi, I thought I could strip the percussion with something like:
m = music21.midi.MidiFile()
m.open(path)
m.read()

tracks = []
for track in m.tracks:
    keep = True
    for event in track.events:
        if event.channel == 10:
            keep = False
    if keep:
        tracks.append(track)

s = music21.midi.translate.midiTracksToStreams(tracks, quantizePost=False)
s.write('midi', 'no-percussion.midi')
This does strip the percussion, but it seems to mess up the note timing as well.
What am I missing? If others can offer advice as to how I can correct the timings of the MidiFile approach, I'd be very grateful!

Lord have mercy, I needed to pass forceSource=False (along with the other parse keyword arguments shown below) into the midiTracksToStreams call as well:
m = music21.midi.MidiFile()
m.open(path)
m.read()

tracks = [t for t in m.tracks if not any(e.channel == 10 for e in t.events)]

score = music21.stream.Score()
music21.midi.translate.midiTracksToStreams(tracks,
    inputM21=score,
    forceSource=False,
    quantizePost=False,
    ticksPerQuarter=m.ticksPerQuarterNote,
    quarterLengthDivisors=(4, 3),
)
score.write('midi', fp='out.midi')
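An alternative I considered but haven't tested: drop the channel-10 tracks from the MidiFile object itself and write it straight back out with music21's midi writer, skipping the stream round trip (and any re-quantizing) entirely:

m = music21.midi.MidiFile()
m.open(path)
m.read()
m.close()
# keep only tracks that contain no channel-10 (percussion) events
m.tracks = [t for t in m.tracks if not any(e.channel == 10 for e in t.events)]
m.open('no-percussion.midi', 'wb')  # reopen the same MidiFile object for binary writing
m.write()
m.close()

This is only a sketch; I haven't checked how music21 handles the surviving track numbering when writing the file back out.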


Improving speed while iterating over ~400k XML files

This is more of a theoretical question, to better understand objects, garbage collection, and performance in Python.
Let's say I have a ton of XML files and want to iterate over each one, get all the tags, store them in a dict, increase counters for each tag, etc. When I do this, the first roughly 15k iterations process really quickly, but afterwards the script slows down significantly, even though memory usage, CPU load, etc. look fine. Why is that? Am I creating hidden objects on each iteration that are not cleaned up, and can I do something to improve it? I tried using regex instead of ElementTree, but it wasn't worth the effort since I only want to extract first-level tags and it would make the code more complex.
Unfortunately I cannot give a reproducible example without providing the XML files; however, this is my code:
import os
import datetime
import xml.etree.ElementTree as ElementTree

start_time = datetime.datetime.now()

original_implemented_tags = os.path.abspath("/path/to/file")
required_tags = {}
optional_tags = {}
new_tags = {}

# read original tags
for _ in open(original_implemented_tags, "r"):
    if "#XmlElement(name =" in _:
        _xml_attr = _.split('"')[1]
        if "required = true" in _:
            required_tags[_xml_attr] = 1  # I set this to 1 so I can use if dict.get(_xml_attr) (0 returns False)
        else:
            optional_tags[_xml_attr] = 1

# read all XML files from a nested folder containing XML dumps and other files
clinical_trial_root_dir = os.path.abspath("/path/to/dump/folder")
xml_files = []
for root, dirs, files in os.walk(clinical_trial_root_dir):
    xml_files.extend([os.path.join(root, _) for _ in files if os.path.splitext(_)[-1] == '.xml'])

# function for parsing a file and extracting unique tags
def read_via_etree(file):
    _root = ElementTree.parse(file).getroot()
    _main_tags = list(set([_.tag for _ in _root.findall("./")]))  # some tags occur twice
    for _attr in _main_tags:
        # if the tag doesn't exist in the original document, increase counts in new_tags
        if _attr not in required_tags.keys() and _attr not in optional_tags.keys():
            if _attr not in new_tags.keys():
                new_tags[_attr] = 1
            else:
                new_tags[_attr] += 1
        # otherwise, increase counts in either required_tags or optional_tags
        if required_tags.get(_attr):
            required_tags[_attr] += 1
        if optional_tags.get(_attr):
            optional_tags[_attr] += 1

# actual parsing with an indicator
for idx, xml in enumerate(xml_files):
    if idx % 1000 == 0:
        print(f"Analyzed {idx} files")
    read_via_etree(xml)

# undoing the initial 1
for k in required_tags.keys():
    required_tags[k] -= 1
for k in optional_tags.keys():
    optional_tags[k] -= 1

print(f"Done parsing {len(xml_files)} documents in {datetime.datetime.now() - start_time}")
Example of one XML file:
<parent_element>
    <tag_i_need>
        <tag_i_dont_need>Some text i dont need</tag_i_dont_need>
    </tag_i_need>
    <another_tag_i_need>Some text i also dont need</another_tag_i_need>
</parent_element>
After the helpful comments, I added a timestamp to my loop indicating how much time has passed since the last 1k documents, and flushed sys.stdout:
import sys

loop_timer = datetime.datetime.now()
for idx, xml in enumerate(xml_files):
    if idx % 1000 == 0:
        print(f"Analyzed {idx} files in {datetime.datetime.now() - loop_timer}")
        sys.stdout.flush()
        loop_timer = datetime.datetime.now()
    read_via_etree(xml)
I think it makes sense now, since the XML files vary in size and the standard output stream is buffered. Thanks to Albert Winestein!
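For the record, on Python 3 the explicit sys.stdout.flush() isn't strictly needed, since print takes a flush keyword:

loop_timer = datetime.datetime.now()
for idx, xml in enumerate(xml_files):
    if idx % 1000 == 0:
        # flush=True pushes the line out immediately instead of waiting for the buffer to fill
        print(f"Analyzed {idx} files in {datetime.datetime.now() - loop_timer}", flush=True)
        loop_timer = datetime.datetime.now()
    read_via_etree(xml)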

Music21: get track index of a note

I have a multi-track midi file that I'm reading with music21:
import music21

f = music21.midi.MidiFile()
f.open('1079-02.mid')
f.read()
stream = music21.midi.translate.midiFileToStream(f).flat

note_filter = music21.stream.filters.ClassFilter('Note')
for n in stream.recurse().addFilter(note_filter):
    offset = n.offset  # offset from song start in beats
    note = n.pitch  # letter of the note, e.g. C4, F5
    midi_note = n.pitch.midi  # midi number of the pitch, e.g. 60, 72
    duration = n.duration  # duration of the note in beats
    instrument = n.activeSite.getInstrument()  # instrument voice
I'd like to figure out which track each note in this stream belongs to. For example, when I open the file in GarageBand, the notes are organized into separate tracks.
In mido, each MidiFile has a tracks attribute that contains one list of notes for each track.
Is there a way to get the same with music21? Any help would be appreciated!
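For comparison, this is roughly the per-track access I have in mind with mido (illustrative only, assuming mido is installed):

import mido

mid = mido.MidiFile('1079-02.mid')
for i, track in enumerate(mid.tracks):
    # each track is a list of MIDI messages, including the note_on/note_off events
    print(i, track.name, len(track))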
The music tracks are parsed into separate stream.Part objects, so you can just walk through the parts of the stream.Score that you produced, as long as you avoid flattening it (here, I've just produced a score with converter.parse()):
s = converter.parse('1079-02.mid')
for part in s.parts:
    for note in part.recurse().notes:
        print("I WAS IN PART ", part)
or look up the containing part:
s = converter.parse('1079-02.mid')
for note in s.recurse().notes:
    part = note.sites.getObjByClass('Part')
    print("I WAS IN PART ", part)
I doubt you really need to flatten anything. Good luck!
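Putting the two together, a rough sketch (untested, and assuming s.parts comes back in the same order as the MIDI tracks) for getting an actual track index per note:

s = converter.parse('1079-02.mid')
parts = list(s.parts)
for n in s.recurse().notes:
    p = n.sites.getObjByClass('Part')  # the Part that contains this note
    if p is not None:
        # compare by identity to find which Part in the score this is
        track_index = next(i for i, candidate in enumerate(parts) if candidate is p)
        print(track_index, n.pitch, n.offset)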

Finding image similarities in a folder of thousands

I've cobbled together/written some code (thanks, Stack Overflow users!) that checks for similarities in images using imagehash, but now I am having issues checking thousands of images (roughly 16,000). Is there anything I could improve in the code (or a different route entirely) that would more accurately find matches and/or decrease the time required? Thanks!
I first changed the list that is created to itertools.combinations, so it only compares unique pairs of images.
import csv
import itertools
import os

import imagehash
from PIL import Image

new_loc = os.chdir(r'''myimagelocation''')
dirloc = os.listdir(r'''myimagelocation''')
duplicates = []
dup = []

for f1, f2 in itertools.combinations(dirloc, 2):
    # Honestly not sure which hash method to use, so I went with dhash.
    dhash1 = imagehash.dhash(Image.open(f1))
    dhash2 = imagehash.dhash(Image.open(f2))
    hashdif = dhash1 - dhash2
    if hashdif < 5:  # May change the 5 to find more accurate matches
        print("images are similar due to dhash", "image1", f1, "image2", f2)
        duplicates.append(f1)
        dup.append(f2)
        # Setting up a CSV file with the similar images to review before deleting
        with open("duplicates.csv", "w") as myfile:
            wr = csv.writer(myfile)
            wr.writerows(zip(duplicates, dup))
Currently, this code may take days to process the number of images I have in the folder. I'm hoping to reduce this down to hours if possible.
Try this: instead of hashing each image at comparison time (127,992,000 hashes), hash ahead of time and compare the stored hashes, since those are not going to change (16,000 hashes).
new_loc = os.chdir(r'''myimagelocation''')
dirloc = os.listdir(r'''myimagelocation''')
duplicates = []
dup = []
hashes = []

for file in dirloc:
    hashes.append((file, imagehash.dhash(Image.open(file))))

for pair1, pair2 in itertools.combinations(hashes, 2):
    f1, dhash1 = pair1
    f2, dhash2 = pair2
    # Honestly not sure which hash method to use, so I went with dhash.
    hashdif = dhash1 - dhash2
    if hashdif < 5:  # May change the 5 to find more accurate matches
        print("images are similar due to dhash", "image1", f1, "image2", f2)
        duplicates.append(f1)
        dup.append(f2)
        # Setting up a CSV file with the similar images to review before deleting
        with open("duplicates.csv", "w") as myfile:  # also move this out of the loop so you aren't rewriting the file every time
            wr = csv.writer(myfile)
            wr.writerows(zip(duplicates, dup))
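One further refinement you could try (an untested sketch on my side): bucket the files by the hex string of their hash first, so exact duplicates fall out without any pairwise comparison, and only the near-miss check goes through the combinations loop:

from collections import defaultdict

buckets = defaultdict(list)
for file, dhash in hashes:
    buckets[str(dhash)].append(file)  # identical images end up in the same bucket

for hex_hash, files in buckets.items():
    if len(files) > 1:
        print("exact hash match:", files)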

How to Extract Individual Chords, Rests, and Notes from a midi file?

I am making a program that should be able to extract the notes, rests, and chords from a certain midi file and write the respective pitches (in midi tone numbers, which go from 0 to 127) of the notes and chords to a csv file for later use.
For this project, I am using the Python Library "Music21".
from music21 import *
import pandas as pd

# SETUP
path = r"Pirates_TheCarib_midi\1225766-Pirates_of_The_Caribbean_Medley.mid"

# create a function for parsing the file and extracting the notes
def extract_notes(path):
    stm = converter.parse(path)
    treble = stm[0]  # access the first part (if there is only one part)
    bass = stm[1]
    # note extraction
    notes_treble = []
    notes_bass = []
    for thisNote in treble.getElementsByClass("Note"):
        indiv_note = [thisNote.name, thisNote.pitch.midi, thisNote.offset]
        notes_treble.append(indiv_note)  # add the note's name, midi number, and offset to the treble list
    for thisNote in bass.getElementsByClass("Note"):
        indiv_note = [thisNote.name, thisNote.pitch.midi, thisNote.offset]
        notes_bass.append(indiv_note)  # add the notes to the bass list
    return notes_treble, notes_bass

# write to csv
def to_csv(notes_array):
    df = pd.DataFrame(notes_array, index=None, columns=None)
    df.to_csv("attempt1_v1.csv")

# using the functions
notes_array = extract_notes(path)
# to_csv(notes_array)

# DEBUGGING
stm = converter.parse(path)
print(stm.parts)
Here is the link to the score I am using as a test: https://musescore.com/user/1699036/scores/1225766
When I run the extract_notes function, it returns two empty arrays, and the line print(stm.parts) returns:
<music21.stream.iterator.StreamIterator for Score:0x1b25dead550 #:0>
I am confused as to why it does this. The piece should have two parts, treble and bass. How can I get each note, chord and rest into an array so I can put it in a csv file?
Here is a small snippet showing how I did it. I needed to get all notes, chords, and rests for a specific instrument, so I first iterated through the parts to find that instrument, and then checked what type each element is and appended it.
You can call this method like:
notes = get_notes_chords_rests(keyboard_instruments, "Pirates_of_The_Caribbean.mid")
where keyboard_instruments is a list of instrument names:
keyboard_instruments = ["KeyboardInstrument", "Piano", "Harpsichord", "Clavichord", "Celesta"]

def get_notes_chords_rests(instrument_type, path):
    try:
        midi = converter.parse(path)
        parts = instrument.partitionByInstrument(midi)
        note_list = []
        for music_instrument in range(len(parts)):
            if parts.parts[music_instrument].id in instrument_type:
                for element_by_offset in stream.iterator.OffsetIterator(parts[music_instrument]):
                    for entry in element_by_offset:
                        if isinstance(entry, note.Note):
                            note_list.append(str(entry.pitch))
                        elif isinstance(entry, chord.Chord):
                            note_list.append('.'.join(str(n) for n in entry.normalOrder))
                        elif isinstance(entry, note.Rest):
                            note_list.append('Rest')
        return note_list
    except Exception as e:
        print("failed on ", path)
        pass
P.S. It is important to use a try block because a lot of midi files on the web are corrupted.
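Since the goal was a csv file, one possible follow-up (just a sketch, not tested against this score) is to dump the returned list with pandas, mirroring the to_csv helper from the question; "pitch_or_rest" and "notes.csv" are only illustrative names:

notes = get_notes_chords_rests(keyboard_instruments, "Pirates_of_The_Caribbean.mid")
df = pd.DataFrame(notes, columns=["pitch_or_rest"])  # one row per note, chord, or rest
df.to_csv("notes.csv", index=False)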

python function not recursing properly (adding nodes to graph)

I'm having a rare honest-to-goodness computer science problem (as opposed to the usual how-do-I-make-this-language-I-don't-write-often-enough-do-what-I-want problem), and really feeling my lack of a CS degree for a change.
This is a bit messy, because I'm using several dicts of lists, but the basic concept is this: a Twitter-scraping function that adds retweets of a given tweet to a graph, node-by-node, building outwards from the original author (with follower relationships as edges).
for t in RTs_list:
    g = nx.DiGraph()
    followers_list = collections.defaultdict(list)
    level = collections.defaultdict(list)
    hoppers = collections.defaultdict(list)
    retweets = []
    retweeters = []
    try:
        u = api.get_status(t)
        original_tweet = u.retweeted_status.id_str
        print original_tweet
        ot = api.get_status(original_tweet)
        node_adder(ot.user.id, 1)
        # Can't paginate -- can only get about ~20 RTs max. Need to work on small data here.
        retweets = api.retweets(original_tweet)
        for r in retweets:
            retweeters.append(r.user.id)
        followers_list["0"] = api.followers_ids(ot.user.id)[0]
        print len(retweets), "total retweets"
        level["1"] = ot.user.id
        g.node[ot.user.id]['crossover'] = 1
        if g.node[ot.user.id]["followers_count"] < 4000:
            bum_node_adder(followers_list["0"], level["1"], 2)
        for r in retweets:
            rt_iterator(r, retweets, 0, followers_list, hoppers, level)
    except:
        print ""
def rt_iterator(r, retweets, q, followers_list, hoppers, level):
    q = q + 1
    if r.user.id in followers_list[str(q-1)]:
        hoppers[str(q)].append(r.user.id)
        node_adder(r.user.id, q+1)
        g.add_edge(level[str(q)], r.user.id)
        try:
            followers_list[str(q)] = api.followers_ids(r.user.id)[0]
            level[str(q+1)] = r.user.id
            if g.node[r.user.id]["followers_count"] < 4000:
                bum_node_adder(followers_list[str(q)], level[str(q+1)], q+2)
            crossover = pull_crossover(followers_list[str(q)], followers_list[str(q-1)])
            if q < 10:
                for r in retweets:
                    rt_iterator(r, retweets, q, followers_list, hoppers, level)
        except:
            print ""
There are some other function calls in there, but they're not related to the problem. The main issue is how q counts when going from (e.g.) a 2-hop node to a 3-hop node. I need it to build out to the maximum depth (10) for every branch from the center, whereas right now I believe it's just building out to the maximum depth for the first branch it tries. Hope that makes sense. If not, typing it up here has helped me; I think I'm just missing a loop in there somewhere, but it's tough for me to see.
Also, ignore that various dicts refer to q+1 or q-1; that's an artifact of how I implemented this before I refactored it to recurse.
Thanks!
I'm not totally sure what you mean by "the center" but I think you want something like this:
def rt_iterator(depth, other-args):
    # store whatever info you need from this point in the tree
    if depth >= MAX_DEPTH:
        return
    # look at the nodes you want to expand from here
    for each node, in the order you want them expanded:
        rt_iterator(depth+1, other-args)
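Filled out as runnable Python with placeholder names (this is just an illustration of the shape, not the Twitter code), a depth-limited recursion looks like:

MAX_DEPTH = 10

def visit(node, depth, graph, seen):
    seen.add(node)                      # store whatever info you need for this node
    if depth >= MAX_DEPTH:
        return
    for child in graph.get(node, []):   # expand the children in the order you want
        if child not in seen:
            visit(child, depth + 1, graph, seen)

graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
seen = set()
visit("a", 0, graph, seen)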
I think I've fixed it... this way q isn't incremented when it shouldn't be.
def rt_iterator(r, retweets, q, depth, followers_list, hoppers, level):
    def node_iterator(r, retweets, q, depth, followers_list, hoppers, level):
        for r in retweets:
            if r.user.id in followers_list[str(q-1)]:
                hoppers[str(q)].append(r.user.id)
                node_adder(r.user.id, q+1)
                g.add_edge(level[str(q)], r.user.id)
                try:
                    level[str(q+1)] = r.user.id
                    if g.node[r.user.id]["followers_count"] < 4000:
                        followers_list[str(q)] = api.followers_ids(r.user.id)[0]
                        bum_node_adder(followers_list[str(q)], level[str(q+1)], q+2)
                    crossover = pull_crossover(followers_list[str(q)], followers_list[str(q-1)])
                    if q < 10:
                        node_iterator(r, retweets, q+1, depth, followers_list, hoppers, level)
                except:
                    print ""
    depth = depth + 1
    q = depth
    if q < 10:
        rt_iterator(r, retweets, q, depth, followers_list, hoppers, level)
