jesussaddle
Power User
I need to detect pitch fundamentals in music and use each detected note's pitch (as accurately as possible) and amplitude envelope. I was discouraged recently to see a couple of Melodyne audio-to-MIDI videos that seem to show that this problem, for polyphonic music, is still very far from being solved. The harmonics should all be visible when plotted on an FFT graph, such as here:
https://www.seventhstring.com/xscribe/PianoRoll.png
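To make the point about visible harmonics concrete, here is a minimal sketch, assuming NumPy and a single synthetic note (so it sidesteps the polyphonic problem entirely). The function name `harmonic_peaks` and the test tone are made up for illustration; this is not how Melodyne or any other product works.

```python
import numpy as np

def harmonic_peaks(signal, sample_rate, n_peaks=4):
    """Return the frequencies (Hz) of the n_peaks strongest local maxima
    in the magnitude spectrum, sorted from low to high."""
    window = np.hanning(len(signal))          # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # A bin is a peak if it is larger than both of its neighbours.
    is_peak = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:])
    peak_bins = np.where(is_peak)[0] + 1
    strongest = peak_bins[np.argsort(spectrum[peak_bins])[-n_peaks:]]
    return np.sort(freqs[strongest])

# Hypothetical test tone: 220 Hz fundamental plus three weaker harmonics.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate      # one second of audio
f0 = 220.0
note = sum((1.0 / k) * np.sin(2 * np.pi * k * f0 * t) for k in range(1, 5))

peaks = harmonic_peaks(note, sample_rate)
print(peaks)  # the lowest peak is the fundamental, the rest are its harmonics
```

For one isolated note the lowest strong peak is the fundamental. With several notes sounding at once the harmonic series overlap, which is where picking peaks like this stops being enough.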
But evidently, even for polyphonic material from a single recorded instrument with no reverb or delay, and even for an instrument with what one would presume to be fairly obvious transients, working out where the notes begin still seems to be a challenge.
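For reference, one textbook way to look for note starts is spectral flux: take short overlapping FFT frames and flag the moments where the spectrum suddenly gains energy. Below is a rough sketch of that idea, assuming NumPy and two clean synthetic notes. The function name, frame sizes, threshold and minimum gap are all my own illustrative choices, and I am not claiming any commercial program works this way.

```python
import numpy as np

def spectral_flux_onsets(signal, sample_rate, frame=1024, hop=256,
                         threshold=0.3, min_gap=0.06):
    """Return estimated onset times (seconds) from positive spectral flux."""
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    mags = np.array([
        np.abs(np.fft.rfft(signal[i * hop:i * hop + frame] * window))
        for i in range(n_frames)
    ])
    # Sum only the increases in magnitude between consecutive frames.
    flux = np.sum(np.maximum(np.diff(mags, axis=0), 0.0), axis=1)
    flux = flux / flux.max()                  # assumes the signal is not silent
    onsets = []
    for i in range(len(flux)):
        left = flux[i - 1] if i > 0 else 0.0
        right = flux[i + 1] if i + 1 < len(flux) else 0.0
        if flux[i] > threshold and flux[i] >= left and flux[i] > right:
            # flux[i] compares frames i and i+1; report the centre of frame i+1.
            time = ((i + 1) * hop + frame / 2) / sample_rate
            # Ignore extra peaks that belong to the same onset.
            if not onsets or time - onsets[-1] >= min_gap:
                onsets.append(time)
    return onsets

# Hypothetical test signal: two decaying notes that overlap in time.
sample_rate = 22050
t = np.arange(2 * sample_rate) / sample_rate
def decaying_note(freq, start):
    env = np.where(t >= start, np.exp(-3.0 * (t - start)), 0.0)
    return env * np.sin(2 * np.pi * freq * t)

mix = decaying_note(440.0, 0.5) + decaying_note(660.0, 1.2)
onsets = spectral_flux_onsets(mix, sample_rate)
print(onsets)  # two times, close to 0.5 s and 1.2 s
```

The timing resolution here is limited by the hop and frame sizes, and a real piano recording has soft attacks, sympathetic resonance and overlapping decays that a fixed threshold like this will not cope with.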
See 00:32
Mr. Russell uses the phrase "Easily and accurately", but he never actually plays the result of the polyphonic detection; instead he plays the audio recording as if it's the result. A couple of viewers in the comment section point this out and say that Melodyne basically does not yet do an accurate job: missed notes, and even notes placed at significantly inaccurate times, the latter being the most problematic for my needs. My experience with such programs, including WIDI Pro, agrees with the viewer comments: this problem, for a simple piano track, isn't even mostly solved (much less for a full band or a piece of music with fx processing, which I don't even care about for this post).
I'm not knowledgeable by any means, but I have a theory on how to solve this - not with anything near a real-time approach, but at least by throwing significant software and hardware resources at it. Is there anyone out there with an inkling of the calculation methods involved that I could bounce my idea off of?
(Sorry, my critique of Western music theory is that you can conceive of chords as stacked thirds, with the chord "root" simply being the starting point for stacking (an obviously useful definition). But the term for "root" in German apparently meant "fundamental" - in other words, not just the bottom note. [I think the language where it meant "fundamental" was German, but now I'm not sure - Wikipedia removed the paragraph describing this, and my memory is not clear on it.] In terms of common use, the majority of chord structures that flow into popular harmony contain an interval of a perfect fifth. In many cases the root of the P5 is the chord root.