Adventures with Yahoo Pipes

When I was testing Tumblr as a platform for beckbits, I discovered that their RSS feeds didn’t offer quite what I was hoping. Since I was primarily planning to use beckbits as a link blog, I wanted it to work Daring Fireball-style: link items should go straight to the source material, and all other items should be permalinks back to the site. I’ve always thought that was a brilliant design choice on John Gruber’s part, and I’ve always been a big believer in integrating aspects of great design that I find around the web into my own projects.

In the hopes that I could take the RSS feed and remix it easily on my own, I turned to Yahoo Pipes. Yahoo Pipes provides a relatively easy graphical interface to parse, modify, and combine data streams, JSON, and RSS, and then spit the result back out as RSS, JSON, or serialized PHP. I first discovered Pipes when I found someone’s “lifestream”, a website that used Yahoo Pipes to display a list of their activity across numerous different services. These kinds of RSS feed mashups seem to be the most common use of the service, but the service is quite flexible, so it seemed well suited to repurposing Tumblr’s RSS feed for my own uses.

Over the course of creating the feed, I discovered some interesting features that I feel are worth sharing. For those of you who are curious about my specific implementation, here’s the actual pipe that is powering beckbits.

Moving data between elements

Very early on in my exploration of the default Tumblr RSS feed I realized that I was going to need to use the Tumblr API if I wanted to get direct access to the various components that make up Tumblr posts. Of course, the API provides completely different elements than an RSS feed, so one of the main tasks of my pipe is to move data around between elements.

For things with a one-to-one relationship, the “Rename” operator typically does the trick. For instance, using the external site’s URL for the RSS item link on Tumblr link items was as easy as throwing in a Rename with the rule “item.link-url renamed to link”.
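
For anyone who thinks better in code, here is a rough Python sketch of what that Rename rule does to a single item. It is only an analogy: the `rename` helper and the idea of an item as a plain dict are assumptions for illustration, not anything Pipes itself exposes.

```python
def rename(item, source, target, copy=False):
    """Move (or copy) the value stored under `source` to `target`.

    `item` is a plain dict standing in for one feed item. If `source`
    is missing, the item is returned unchanged (an assumption; Pipes may
    behave differently here).
    """
    if source not in item:
        return item
    result = dict(item)  # work on a copy so the input is left untouched
    result[target] = result[source] if copy else result.pop(source)
    return result


link_post = {"type": "link", "link-url": "http://example.com/article"}
renamed = rename(link_post, "link-url", "link")
```

After the call, `renamed` carries the external URL under `link` and no longer has a `link-url` key.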

But what about when I needed to combine multiple fields into one? Rename certainly wasn’t going to help me there, and I couldn’t find any way to pipe data between operators and string functions.

Fortunately, the Regex operator came to the rescue.

Although I couldn’t find any documentation for this, the Regex module offers a couple of great features:

  • If you reference an element in the first column that doesn’t exist, it will be created.
  • You can include the contents of other elements using the “named backreference” syntax (${element-name}). For instance, when I wanted my description element to include both the Tumblr “quote-text” and “quote-source” elements, the “replacement” column looked something like this:

    <blockquote>${quote-text}</blockquote><p><cite>${quote-source}</cite></p>
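
To make the two behaviors above concrete, here is a minimal Python sketch that imitates them outside of Pipes. The `fill` and `regex_set` helpers are hypothetical names, items are modeled as plain dicts, and treating a missing element as an empty string is an assumption rather than documented Pipes behavior.

```python
import re

# Matches ${element-name} placeholders, including names with hyphens.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def fill(template, item):
    """Replace each ${name} in `template` with the value of item[name].

    Missing names become empty strings (an assumption for this sketch).
    """
    return PLACEHOLDER.sub(lambda m: str(item.get(m.group(1), "")), template)


def regex_set(item, field, replacement):
    """Write the filled-in replacement to `field`, creating it if needed."""
    result = dict(item)
    result[field] = fill(replacement, item)
    return result


quote_post = {"quote-text": "Stay hungry.", "quote-source": "Someone"}
template = "<blockquote>${quote-text}</blockquote><p><cite>${quote-source}</cite></p>"
with_description = regex_set(quote_post, "description", template)
```

Here `description` did not exist on the item beforehand, which mirrors the first bullet, and its value is assembled from two other elements, which mirrors the second.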

Simple conditionals using regex

Thanks to the fact that everything in a Yahoo pipe is evaluated sequentially, you can use the Regex operator to set up simple conditionals. For instance, regular Tumblr posts have an optional title. If the title existed, I wanted to use it as my RSS item’s title; otherwise it should default to a short excerpt of the text. To accomplish this, I set up the following rules:

Rename
item.regular-title copy as title

Regex
In item.title replace (.*) with $1```${excerpt}
In item.title replace ^```(.*)$ with $1
In item.title replace ^(.+?)```.* with $1

The basic idea is to combine two fields into one separated by some delimiter characters that are unlikely to ever show up otherwise. I chose to use three backticks, since it kept things pretty legible and I rarely use backticks. If you’re not comfortable with regex, the rules in order say:

  1. Copy regular-title to title (because regular-title might not exist, this may result in an empty title element)
  2. Append ``` plus the excerpt element to whatever is in the title element
  3. If the title element starts with backticks (^```), replace its contents with whatever follows the backticks (the excerpt)
  4. If the title element starts with one or more characters followed by backticks (^(.+?)```), replace everything with that starting content
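
The same four steps can be sketched in Python to check the logic away from the Pipes editor. This is an approximation under a few assumptions: items are plain dicts, titles and excerpts are single-line strings, each rule replaces only the first match (hence `count=1`), and `pick_title` is a hypothetical helper name.

```python
import re

# Three backticks, built with multiplication to keep this listing readable.
DELIM = "`" * 3


def pick_title(item):
    """Return regular-title if it is non-empty, otherwise the excerpt."""
    excerpt = item.get("excerpt", "")

    # Step 1: copy regular-title to title (empty if it does not exist).
    title = item.get("regular-title", "")

    # Step 2: append the delimiter plus the excerpt to the title.
    # A function is used as the replacement so backslashes in the
    # excerpt are not interpreted as escape sequences.
    title = re.sub(r"(.*)", lambda m: m.group(1) + DELIM + excerpt, title, count=1)

    # Step 3: title starts with the delimiter, so there was no real
    # title; keep only what follows (the excerpt).
    title = re.sub("^" + DELIM + r"(.*)$", r"\1", title, count=1)

    # Step 4: title has content before the delimiter, so keep only
    # that leading content and drop the delimiter and excerpt.
    title = re.sub(r"^(.+?)" + DELIM + r".*", r"\1", title, count=1)
    return title


titled = pick_title({"regular-title": "Hello", "excerpt": "Some text"})
untitled = pick_title({"excerpt": "Some text"})
```

With these inputs, `titled` ends up as the post’s own title and `untitled` falls back to the excerpt. The order matters: step 4 only behaves because step 3 has already stripped the leading delimiter from title-less posts.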

Replacing your site feed with a pipe

Once I’d created my pipe it was time to replace beckbits’ feed with the pipe, and I discovered that Yahoo Pipes has a serious downside when you use it in place of your default site feed: the pipe’s output always links back to the pipe page as the feed’s homepage. On its own this wouldn’t be a huge deal, but it has the unfortunate side effect of causing the favicon associated with your feed to be the Yahoo Pipes favicon, which is far from ideal. In order to fix this, you actually have to post-process the pipe output.

For myself, I opted to do this by reading in the pipe as serialized PHP and then constructing my own simple RSS feed. If you’re interested in doing something similar and would like a starting point, here’s the gist of it.
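
My own version is PHP, but the general shape of the post-processing step can be sketched in Python as well. This is not the script that powers beckbits: it assumes the pipe output has already been fetched and decoded into a list of dicts, and the `build_rss` helper and the `title`, `link`, and `description` keys are placeholders for illustration.

```python
import xml.etree.ElementTree as ET


def build_rss(items, site_title, site_url, site_description=""):
    """Build a bare-bones RSS 2.0 document whose channel link is your site.

    `items` is a list of dicts; only title, link, and description are
    copied across. ElementTree escapes characters such as & and < in
    text, which helps keep the resulting XML well-formed.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    # Pointing the channel link at your own site is the whole purpose
    # of this step: it replaces the link back to the pipe page.
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = site_description

    for item in items:
        node = ET.SubElement(channel, "item")
        for key in ("title", "link", "description"):
            ET.SubElement(node, key).text = item.get(key, "")

    return ET.tostring(rss, encoding="unicode")


sample_items = [
    {"title": "A & B", "link": "http://example.com/a", "description": "<p>hi</p>"},
]
feed = build_rss(sample_items, "beckbits", "http://example.com/")
```

Building the feed through an XML library rather than string concatenation is a deliberate choice here, since hand-assembled feeds are an easy place to leave an ampersand unescaped.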

4 responses to “Adventures with Yahoo Pipes”


  1. Brian Kelly says:

    That regex trick was very helpful!

  2. Brad says:

    This was very helpful, esp. as I am trying to replace the Yahoo favicon with mine. I followed your PHP approach in constructing my own simple RSS feed, but I get an “xmlParseEntityRef: no name” error when I try to view the RSS feed in my Google Chrome browser. It does appear to work well in my feed reader. Anything I am missing? Thanks for your help.

  3. Ian Beck says:

    Sounds like you have malformed XML (this most often happens to me when I have an unescaped character like an ampersand or less-than symbol, but there could be other reasons). First step to debug is to run it through an XML validator. Good luck!

  4. Great tip on the “named backreference” – very helpful for solving a case where I needed to combine two RSS fields into one.
