
Building a Syndication Framework for the Domain of One's Own

I have to admit I feel a little bit like an amateur cook who has a lot of really fancy ingredients in front of him and no idea how to make them sing. I've been playing with a lot of different tools lately, and they all seem very powerful in their own way. Yet for my specific need I can't seem to put them together in a way that works. I figured it wouldn't hurt to put all this into a blog post and shop it around to people much smarter than I am for further insight. I'll break it down into three sections: The Scenario, The Tools, and The Duct Tape.

### The Scenario

We are currently running a pilot of almost 400 students and faculty who have their own domains and web hosting. This fall we'll open it up to all incoming students, and we'll do the same every year after that. Each user can have unlimited subdomains and install any LAMP-based software they want (most choose Wordpress, but they're certainly not limited to that).

We want to create a central site that pulls content from all of these web properties into a single source that can be broken down by category. We could then take category feeds and push them out to other properties. This means having a History department highlight the work their students are doing is as simple as aggregating a categorized feed of that work. We could break it down by course, discipline, semester, year, faculty member, etc.

We are already able to do this using UMW Blogs because everyone is on the same system and it's all Wordpress, so a Tags Blog is simple. In this case there are a lot of unknowns (Did they create a subdomain? What did they install? What's the RSS feed for the site?) that make automation a lot trickier. I've got a few tools and options I've been banging around, though, and while I'm not successful yet I think everything is there.

### The Tools

FeedWordpress - A plugin that we use quite often in DTLT for aggregation. It grabs a bunch of feeds and brings their items in as actual posts on the blog. It can also bring in and create categories from the source sites, and each site can have specific categories or authors applied to it. Problem: A lot of this is manual work. You can hand it a list of sites, but you still have to manually choose feeds from an autodiscovered list, and any new feed means you're back in the FWP admin area. FWP seems ideal for the last leg of the framework since Wordpress does RSS feeds for categories so well and FWP does a great job of syndicating, but the automation piece needs to be figured out.

Installatron - I mentioned that due to the nature of the Domain of One's Own project we don't always know where people are installing these websites. That's only partly true. If someone creates a subdomain and uploads a set of files pointing to a database they created, we won't have any idea it exists without constant monitoring. The good news is that the vast majority install web-based software using the automated tool we implemented, called Installatron. Installatron has an API that will spit out a list of installations, including everything you ever wanted to know about each one. I've only briefly played with this and got a JSON-formatted document with everything installed on the server. Ideally what I'd want from this is a hook that writes specific information from the available data to a separate file (probably an OPML file) every time a new install is created. There are customization hooks where I can do various things when an install happens, so that's likely the best place to start. Information on their API is available here.

SimplePie - Started looking at this today because of another tool, Moonmoon, that uses it as a backend. It does feed discovery and aggregation of multiple sites. I don't know enough about it yet, but their documentation seems pretty good.

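
To make "feed discovery" a bit more concrete: most blogging software advertises its feeds with `<link rel="alternate">` tags in the page head, and discovery largely means reading those tags. The sketch below is not SimplePie (which is a PHP library) and not how FWP does it; it is a rough Python illustration of the general idea using only the standard library. The names `FeedLinkParser` and `discover_feeds` are made up for this example, and fetching the page HTML is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects the href of every <link rel="alternate"> tag that advertises a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. <link rel>), so normalise to "".
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").strip().lower()
        href = attr.get("href", "").strip()
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    """Return the feed URLs advertised in an HTML page, in document order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

Given the HTML of a student's home page and the URL it was fetched from, `discover_feeds(page_html, page_url)` returns a list of candidate feed URLs. A site that advertises several feeds (posts, comments, Atom and RSS variants) would still need a rule for picking one, which is exactly the manual step described above.
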
Yahoo Pipes - I get a bit wary about using Pipes for critical glue like this because ultimately it's owned by Yahoo and nothing is sacred there, but there's no doubt it could be useful for massaging the content of various sources into the right formats.

### The Duct Tape

Back in the fall I wrote about some of these ideas and even pulled together an aggregate blog that was feeding in most of the fall semester work. But as I've outlined here, it was a manual process using FWP and pasting in a long list of sites. With most students opting to use subdomains these days, that process becomes even more difficult.

I'm imagining some combination of the tools above could be hacked together to give us what we want. Ideally the less duct tape involved the better, but as long as we're using solid tools I'm not afraid of daisy-chaining multiple processes together to get the desired output.

So, as is probably quite obvious at this point, I have almost no idea what I'm doing, and rather than bang my head against the problem for months on end I figured it couldn't hurt to outline my goals and ideas here in the hopes that someone may have some ideas. Help me Obi Wan!
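
For anyone who wants something concrete to poke at, here is a rough Python sketch of one possible piece of duct tape: turning a JSON list of installs into an OPML file. Everything about the input shape is an assumption. The keys `title`, `url`, and `feed_url` are hypothetical stand-ins, not Installatron's actual field names, and falling back to `/feed/` only makes sense for Wordpress-style sites with pretty permalinks. The function name `installs_to_opml` is likewise invented for this example.

```python
import json
import xml.etree.ElementTree as ET


def installs_to_opml(installs, title="Domain of One's Own feeds"):
    """Build an OPML 2.0 document (as a string) from a list of install records.

    Each record is a dict with a required "url" key and optional "title" and
    "feed_url" keys. These key names are hypothetical, chosen for illustration.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")

    for install in installs:
        site_url = install["url"].rstrip("/")
        ET.SubElement(body, "outline", {
            "type": "rss",
            "text": install.get("title") or site_url,
            "htmlUrl": site_url,
            # Assumption: with no known feed URL, guess the Wordpress-style /feed/ path.
            "xmlUrl": install.get("feed_url") or site_url + "/feed/",
        })

    return ET.tostring(opml, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample data standing in for whatever the install list really contains.
    sample = json.loads("""
    [
      {"title": "History 101 Blog", "url": "http://blog.student.example.com/"},
      {"url": "http://example.org/wiki", "feed_url": "http://example.org/wiki/feed.xml"}
    ]
    """)
    print(installs_to_opml(sample))
```

Run from an install hook or on a schedule, something along these lines would regenerate the list of feeds without anyone pasting URLs by hand. Getting an aggregator to pick up the new file automatically would still be the open question.
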