Regex: oh my

I recently landed a web design client whose website needs a little sprucing up, an entirely new admin backend that real people can actually administer, and a lot of code updating. The existing code is hideous: it is completely non-XHTML-compliant, with capitalized tags scattered everywhere, and some pages are seas of Word-generated HTML (the worst nightmare of any decent web designer).

Fortunately for me, I have a lovely utility called TextSoap Deluxe. TextSoap has any number of ways to clean text, and one of the nicest is the ability to create a custom cleaner from regex rules, which then churns through a chunk of text and transforms whatever the rules match. Of course, I knew very little regex at the time, but after a few hours with the excellent Regular-Expressions.info tutorial and a bit of trial and error in TextSoap, I was able to create a cleaner that at least simplifies my life by lowercasing the things that need lowercasing (among a few other things). If you use TextSoap and want to see what I’ve done, I’ve posted an early version of the cleaner in the TextSoap forums: XHTML Cleaner for TextSoap.
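For anyone curious what a rule like that looks like outside TextSoap, here is a minimal sketch in Python. It is not the cleaner from the forums, just an illustration of the lowercasing idea; the pattern and the function name lowercase_tag_names are my own assumptions. It only touches element names, so attribute names and text are left alone, and it would also rewrite anything that merely looks like a tag inside a script or comment.

```python
import re

# Matches the start of an opening or closing tag: "<" or "</" plus the tag name.
# Hypothetical pattern for illustration; declarations like <!DOCTYPE> do not match.
TAG_NAME = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")

def lowercase_tag_names(html: str) -> str:
    """Lowercase element names only; attributes and text are left untouched."""
    return TAG_NAME.sub(lambda m: "<" + m.group(1) + m.group(2).lower(), html)

sample = '<P CLASS="Intro">Hello <B>World</B></P>'
print(lowercase_tag_names(sample))
# prints: <p CLASS="Intro">Hello <b>World</b></p>
```

Lowercasing attribute names as well would need a second rule that only looks inside tags, which is where the trial and error really starts.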

3 Responses


  1. Todd Ransom says...

    Whenever I think of Regular Expressions, Lewis Carroll springs to mind:

    “Beware the Jabberwock, my son! The jaws that bite, the claws that catch!”

    TR

  2. Ian Beck says...

    “And, as in uffish thought he stood,
    The Jabberwock, with eyes of flame,
    Came whiffling through the tulgey wood,
    And grep-ed madly as it came!”

  3. Todd Ransom says...

    Sweet. UNIXisms all sound like Jabberwocky. grep, grok, sed, perl, vim, bash, root: it’s all profoundly nonsensical.

    TR
