Regex: oh my

I recently landed a web design client whose website needs a little sprucing up, an entirely new admin backend that real people can actually administer, and a lot of code updating. The code is hideous: it is nowhere near XHTML-compliant, capitalized tags are scattered everywhere, and some pages are seas of Word-generated HTML (the worst nightmare of any decent web designer).

Fortunately for me, I have a lovely utility called TextSoap Deluxe. TextSoap has any number of ways to clean text, and one of the nicest is the ability to create a custom cleaner from regex rules, which then churns through a chunk of text and transforms it. I knew very little regex at the time, but after a few hours with the excellent Regular-Expressions.info tutorial and a bit of trial and error in TextSoap, I was able to create a cleaner that at least simplifies my life by lowercasing the things that need lowercasing (among a few other things). If you use TextSoap and want to see what I’ve done, I’ve posted an early version of the cleaner in the TextSoap forums: XHTML Cleaner for TextSoap.
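For anyone without TextSoap, here is a rough sketch of the tag-lowercasing idea in Python. This is not the actual cleaner from the forums, just an illustration of the kind of rule involved; the pattern and the `lowercase_tags` name are my own assumptions, and it only touches tag names, not attribute names.

```python
import re

# Matches the "<" or "</" that opens a tag, followed by the tag name,
# e.g. "<DIV" or "</P". Hypothetical pattern, written for illustration only.
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")


def lowercase_tags(html: str) -> str:
    """Return html with every tag name lowercased; everything else is left alone."""
    return TAG_NAME.sub(lambda m: m.group(1) + m.group(2).lower(), html)


print(lowercase_tags('<P CLASS="Intro">Hello <B>World</B></P>'))
```

A regex like this is fine for a quick cleanup pass, but it knows nothing about comments or scripts, so a stray "<" followed by a letter in those places would be lowercased too.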

3 responses to “Regex: oh my”

  1. Todd Ransom says:

    Whenever I think of regular expressions, Lewis Carroll springs to mind:

    “Beware the Jabberwock, my son! The jaws that bite, the claws that catch!”

    TR

  2. Todd Ransom says:

    Sweet. UNIXisms all sound like Jabberwocky. grep, grok, sed, perl, vim, bash, root: it’s all profoundly nonsensical.

    TR

  3. Ian Beck says:

    “And, as in uffish thought he stood,
    The Jabberwock, with eyes of flame,
    Came whiffling through the tulgey wood,
    And grep-ed madly as it came!”
