I recently landed a web design client whose website needs a little sprucing up, an entirely new admin backend that can be administered by real people, and a lot of code updating. The code is hideous: it’s nowhere near XHTML-compliant, with capitalized tags scattered everywhere, and some pages are seas of Word-generated HTML (the worst nightmare of any decent web designer). The rest isn’t much better.
Fortunately for me, I have a lovely utility called TextSoap Deluxe. TextSoap has any number of ways to clean text, and one of the nicest is the ability to create a custom cleaner from regex rules, which then churns through a chunk of text and applies each rule to it. Of course, I knew very little regex at the time, but after a few hours with the excellent Regular-Expressions.info tutorial and a bit of trial and error in TextSoap, I was able to create a cleaner that at least simplifies my life by lowercasing the things that need lowercasing (among a few other things). If you use TextSoap and want to see what I’ve done, I’ve posted an early version of the cleaner in the TextSoap forums: XHTML Cleaner for TextSoap.
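For anyone without TextSoap, here is a rough sketch of the same idea in Python. To be clear, this is not the actual rule set from my cleaner; the pattern and the `lowercase_tag_names` function name are just assumptions for illustration. It lowercases only the element names in opening and closing tags, and leaves attribute names and values alone.

```python
import re

# Assumed pattern (not taken from the TextSoap cleaner): "<" or "</"
# followed by an element name made of letters and digits.
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")


def lowercase_tag_names(html: str) -> str:
    """Lowercase element names in opening and closing tags.

    Attribute names, attribute values, and text content are left as-is.
    """
    return TAG_NAME.sub(lambda m: m.group(1) + m.group(2).lower(), html)


if __name__ == "__main__":
    messy = '<P CLASS="Intro">Hello <B>World</B><BR /></P>'
    print(lowercase_tag_names(messy))
    # prints: <p CLASS="Intro">Hello <b>World</b><br /></p>
```

A regex like this knows nothing about context, so it will also happily rewrite anything that merely looks like a tag inside comments, scripts, or stray text. That is fine for a quick cleanup pass you review by hand, but it is no substitute for a real HTML parser.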