If you want to torment(?) yourself further, or want to come back to this issue another time, I recommend reading this post, where I did something similar.
It shows how I dealt with some of the caveats in a similar case, and it might give you some ideas about what kind of regexes you’ll need to handle your case as well.