I recently landed a web design client whose website needs a little sprucing up, an entirely new admin backend that real people can actually administer, and a lot of code updating. The code is hideous: it is nowhere near XHTML compliant, capitalized tags are scattered everywhere, some pages are seas of Word-generated HTML (the worst nightmare of any decent web designer), and the rest isn't much prettier.
Fortunately for me, I have a lovely utility called TextSoap Deluxe. TextSoap has any number of ways to clean text, and one of the nicest is the ability to create a custom cleaner from regex rules, which then churns through a chunk of text and transforms it. Of course, I knew very little regex at the time, but after a few hours with the excellent Regular-Expressions.info tutorial and a bit of trial and error in TextSoap, I was able to create a cleaner that at least simplifies my life by lowercasing the things that need lowercasing (among a few other things). If you use TextSoap and want to see what I've done, I've posted an early version of the cleaner in the TextSoap forums: XHTML Cleaner for TextSoap.
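For anyone curious what a "lowercase the tags" rule looks like outside TextSoap, here is a minimal sketch in Python. It is not the cleaner posted in the forums; the pattern and the `lowercase_tags` name are assumptions made up for illustration, and it only handles element names, not attribute names.

```python
import re

# Matches the start of an opening or closing tag plus its element name,
# e.g. "<P", "</DIV", "<Br". (Hypothetical pattern, for illustration only.)
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")


def lowercase_tags(html: str) -> str:
    """Lowercase element names; leave attributes and text content alone."""
    return TAG_NAME.sub(lambda m: m.group(1) + m.group(2).lower(), html)


if __name__ == "__main__":
    print(lowercase_tags('<P CLASS="Intro">Hello <B>World</B><BR></P>'))
```

A regex like this is only a rough tool: it will also rewrite anything in the text that merely looks like a tag (a stray `<` followed by letters), so it is worth eyeballing the result before trusting it on a whole site.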
Whenever I think of regular expressions, Lewis Carroll springs to mind:
“Beware the Jabberwock, my son! The jaws that bite, the claws that catch!”
TR
Posted 2:19 PM on May. 11, 2007 ↑
Sweet. UNIXisms all sound like Jabberwocky. grep, grok, sed, perl, vim, bash, root: it’s all profoundly nonsensical.
TR
Posted 5:31 PM on May. 11, 2007 ↑
“And, as in uffish thought he stood,
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
And grep-ed madly as it came!”
Posted 5:11 PM on May. 11, 2007 ↑