Wish Reference IRs by waveform instead of location

It is, but it's unnecessarily complex for this problem. We can simplify the problem because the input "text" is constant length. In fact I'd be willing to wager a 32-bit CRC would provide sufficient uniqueness.
I'm really not sure a CRC would be good: https://stackoverflow.com/questions/3645461/creating-a-fast-hash-function-for-fixed-length-input

Computing MD5 is very fast, and it's quite good.

Anyway, I guess you can try the CRC32 method on a very large IR collection and see if you get collisions?
 
The "unique fingerprint" problem has been solved to many different levels of collision tolerance. I'm pretty sure CRC isn't high on that list.

I will grant you that the odds of an MD5 collision seem pretty low, but I'm not sure why optimizing this for minimum something (hash size? speed? compute power?) is a priority. Collision avoidance clearly is.

How about a real-world test, for anybody who can code a little and has a lot of IRs: Calculate the MD5 hash of all of them, and check for duplicates. I can do that myself, with my medium-size collection, but not until after work. Not that that'd "prove" anything, but it'd be a potentially interesting data point. The complication is that some of them actually ARE duplicates, for instance in folders with different formats (Helix, generic .wav).
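
For anyone who wants to try it, here is a minimal Python sketch of that test. It is only an illustration: the function names and the default extension list are made up for this example, and it hashes raw file bytes, so the same IR saved in two different formats will not show up as a match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 16):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_sharing_md5(root, extensions=(".wav",)):
    """Map each MD5 digest seen more than once to the files that produced it.

    `extensions` is an assumed filter; adjust it to whatever your IR folder holds.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            groups[md5_of_file(path)].append(path)
    # Keep only digests shared by two or more files.
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Every group it returns is either a set of genuine duplicate files or a true MD5 collision; telling the two apart still needs a byte-for-byte compare.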
 
You don’t have to do this. The math above suffices to tell you how many IRs you’d need to scan to get a 50% chance of a collision with MD5. For any IR library the answer is: you won’t see any collisions. Feel free to spend your time convincing yourself of this, but it’s largely unnecessary.
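
A rough sketch of that kind of estimate, using the standard birthday-bound approximation and assuming the hash behaves like a uniformly random function (an idealization, especially for a CRC). The function names are invented for illustration.

```python
import math


def collision_probability(num_items, hash_bits):
    """Approximate chance that at least two of num_items distinct inputs share a hash."""
    pairs = num_items * (num_items - 1) / 2
    # 1 - exp(-pairs / 2**bits), computed with expm1 to stay accurate for tiny values.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


def items_for_even_odds(hash_bits):
    """Approximate number of distinct inputs that gives a 50% chance of a collision."""
    return math.sqrt(2 * math.log(2) * 2.0 ** hash_bits)
```

Under that assumption, a 32-bit hash reaches even odds at around 77,000 distinct items, while a 128-bit hash such as MD5 needs on the order of 10^19.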
 
Yes but the question was with a crc
 
Ok, I bit the bullet and wrote a quick C++ program to enumerate 35,039 *.syx, *.ir, *.wav in my IRs directory and check for collisions. My cheesy hash was not good because so many of those files had tons of zero bytes (it is actually a remarkably effective string hash though because it's a pseudo polynomial). Using a crc32 (sans table) I had like over 200 collisions. Aha! You skeptics think that crc32 has been defeated. Tcha, as if. I had duplicate files. So after adding a file compare when a collision was detected I had exactly 0 collisions.
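
The approach described here (hash every file, then do a full file compare whenever two hashes match) could look roughly like this in Python. This is not the C++ program from the post: it uses `zlib.crc32` from the standard library rather than a tableless CRC, and the function names are hypothetical.

```python
import filecmp
import zlib
from collections import defaultdict
from itertools import combinations


def crc32_of_file(path, chunk_size=1 << 16):
    """Return the CRC32 of a file, read in chunks."""
    checksum = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = zlib.crc32(chunk, checksum)
    return checksum


def classify_crc_matches(paths):
    """Split CRC32 matches into genuine duplicates and false collisions."""
    by_crc = defaultdict(list)
    for path in paths:
        by_crc[crc32_of_file(path)].append(path)

    duplicates, false_collisions = [], []
    for group in by_crc.values():
        for first, second in combinations(group, 2):
            # shallow=False forces a byte-for-byte comparison of the contents.
            if filecmp.cmp(first, second, shallow=False):
                duplicates.append((first, second))
            else:
                false_collisions.append((first, second))
    return duplicates, false_collisions
```

Only pairs that end up in `false_collisions` count against the hash; pairs in `duplicates` are the same data stored twice.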
 
Yes but the question was with a crc
No, it wasn't. Read the quoted text and his post: @Dave Merrill only talks about testing MD5. Quoting it here:

I will grant you that the odds of an MD5 collision seem pretty low, but I'm not sure why optimizing this for minimum something (hash size? speed? compute power?) is a priority. Collision avoidance clearly is.

How about a real-world test, for anybody who can code a little and has a lot of IRs: Calculate the MD5 hash of all of them, and check for duplicates.
I'm 100% confident that for any IR library out there, there will be zero collisions that aren't the result of actual, duplicate data, if you use MD5.
 
Ok, I bit the bullet and wrote a quick C++ program to enumerate 35,039 *.syx, *.ir, *.wav in my IRs directory and check for collisions. My cheesy hash was not good because so many of those files had tons of zero bytes (it is actually a remarkably effective string hash though because it's a pseudo polynomial). Using a crc32 (sans table) I had like over 200 collisions. Aha! You skeptics think that crc32 has been defeated. Tcha, as if. I had duplicate files. So after adding a file compare when a collision was detected I had exactly 0 collisions.
Yeah, across 30k IRs I had no false collisions using MD5. But that's unsurprising.
 
Ok, I bit the bullet and wrote a quick C++ program to enumerate 35,039 *.syx, *.ir, *.wav in my IRs directory and check for collisions. My cheesy hash was not good because so many of those files had tons of zero bytes (it is actually a remarkably effective string hash though because it's a pseudo polynomial). Using a crc32 (sans table) I had like over 200 collisions. Aha! You skeptics think that crc32 has been defeated. Tcha, as if. I had duplicate files. So after adding a file compare when a collision was detected I had exactly 0 collisions.
Yup, my back-of-the-envelope calculations say that CRC32 is more than adequate. MD5 is probably safer, and if it doesn't add significant time to the boot task we'll use that.
 
This is gonna be huge :)
Will you store only the IR signature in the cab block, or will you still store the slot number as well?
(Users can have multiple identical IRs in different slots.)

Another problem that could come up is when converting from Gen 2, for example: you'll only have the slot number information...
 
This should increase the value of free IR packs, I reckon: since everyone can theoretically have them in their units, presets using them will be much more readily shareable.
 