I’ve had an idea kicking around in the back of my head for a few months, but not the time to write it down. As I’m currently on a flight with some time to kill, I thought I’d write an abstract of the idea for the blog to whet people’s appetites for a white paper to come soon.
In talking with Realtors and developers who make use of our various RETS projects like libRETS, ezRETS, and vieleRETS, it seems a majority of users are replicating the MLS database for web-based advertising purposes. Following CRT’s mission of playing with a diverse set of technologies, and Perl’s motto of “there’s more than one way to do it,” I’ve been tinkering with an idea of how MLSs can share that data in alternative ways.
Since every good project needs a catchy name that you can turn into a fun acronym, I’ve decided to call this the Syndicated Advertising-based Listings Engine, or SALE. Besides, this makes promoting the project fun with catchphrases like “Everybody wants a SALE!” But what is this project, you ask?
Being the hip, up-to-date netizen that I am, I obviously read tons of websites and listen to a fair amount of podcasts via their RSS feeds. In some ways, syndication is just another word for replication, so I wondered: how could we replicate the MLS’s database easily using something like RSS and/or ATOM? Since I talk to Realtors more than MLS staff, I’ve heard lots of talk of Realtors using RSS/ATOM feeds of listings in their business, but I haven’t heard of an MLS doing so. It’s also worth pointing out that some companies, such as Google and Trulia, take in listings in an RSS-ish format.
In this thought experiment I’ve made a few assumptions. 1) In the general case, a Realtor only wants to pull listings via this technology, not update them. 2) RSS/ATOM is a widely deployed technology that most VAR/consulting software developers have been exposed to. 3) An advertising feed doesn’t contain ALL the data in the MLS listing. 4) This is advertising data we, as real estate professionals, WANT spread all over the earth, i.e., only public data.
Using those assumptions, the idea is that an MLS could publish a feed of listings that could be hit periodically and would contain a few key pieces of data that let the person receiving it do useful things like track price and status changes. In my mind, to be on the safe side, this feed should hold the last week’s worth of changes. So if someone chose to hit the feed every three days that would still work, and the feed would stay a reasonable size.
For every change in the MLS database, there would be an entry in the feed. Say a property was listed, had its price changed, and then was closed: that would generate three entries in the feed over the lifetime of that listing, even if that whole lifecycle happened in a single day.
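To make that a little more concrete, here’s a rough sketch (in Python, purely illustrative and not any kind of spec) of how a server might assemble that rolling one-week feed of changes. The tuple shape, URLs, and element layout are all my own assumptions:

```python
# Hypothetical sketch: build an ATOM feed holding the last week's worth of
# MLS changes, one <entry> per change. Nothing here is a real SALE schema.
from datetime import datetime, timedelta, timezone
from xml.etree import ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def build_live_feed(changes):
    """changes: iterable of (change_id, listing_id, change_type, timestamp, listing_url)."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=7)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = "MLS listing changes (last 7 days)"
    ET.SubElement(feed, f"{{{ATOM_NS}}}updated").text = datetime.now(timezone.utc).isoformat()

    for change_id, listing_id, change_type, ts, url in changes:
        if ts < cutoff:
            continue  # outside the one-week window
        # One entry per change: a listing that was listed, re-priced, and
        # closed in the same week shows up three times.
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = f"urn:uuid:{change_id}"
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = f"{change_type}: listing {listing_id}"
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = ts.isoformat()
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=url)
    return ET.tostring(feed, encoding="unicode")
```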
There is the problem of someone just coming onto the system for the first time. There will need to be a “full database” version of the feed available as well. In theory, this should only be hit once per entity pulling the data, but in reality it will probably be more due to system crashes and the like. I see this as something that needs to be regenerated once a week or so; the “live feed” can then be used to synchronize up to the current date.
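And just to show how little machinery the client side needs, here’s a hedged sketch of that bootstrap-then-poll workflow: pull the full dump once, then hit the live feed every day or three and skip changes you’ve already applied. The URLs and apply_entry() are placeholders I made up; feedparser is the usual third-party Python feed library.

```python
# Client-side sketch: bootstrap from the weekly full dump, then poll the
# live feed, deduplicating on each entry's change ID. All names/URLs invented.
import feedparser

FULL_FEED_URL = "https://mls.example.com/sale/full.atom"     # assumed layout
LIVE_FEED_URL = "https://mls.example.com/sale/changes.atom"  # assumed layout

seen_change_ids = set()

def apply_entry(entry):
    """Placeholder: upsert this listing change into the local replica."""
    print(entry.get("id"), entry.get("title"))

def bootstrap():
    # In theory a one-time pull of the full-database feed.
    for entry in feedparser.parse(FULL_FEED_URL).entries:
        seen_change_ids.add(entry.get("id"))
        apply_entry(entry)

def poll_live_feed():
    # Run every day or three; the one-week window leaves plenty of slack.
    for entry in feedparser.parse(LIVE_FEED_URL).entries:
        if entry.get("id") in seen_change_ids:
            continue  # already applied this change
        seen_change_ids.add(entry.get("id"))
        apply_entry(entry)
```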
Based on the needs of the live feed and the full dump feed, I see that each listing should have the following as required items: 1) the Listing ID, 2) a unique ID of some sort, maybe a UUID, to uniquely identify the change, 3) the URL to where the property is listed on the MLS or Realtor web site, and 4) URL(s) to image(s) of the property that can be snagged easily via some media server.
My original thought in emphasizing the change is that the client receiving the data would be responsible for figuring out what changed, since each entry in the feed would carry the full data. However, I’m thinking a required field that lists the fields that have changed, or NONE, might also be a good idea. This does, however, put more of the burden on the server-side implementer.
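Put together, the required items plus that optional “changed fields” list might be modeled something like this; to be clear, it’s just a sketch with field names I invented, not a proposed schema:

```python
# Illustrative data model for one feed entry; the names are my own guesses.
from dataclasses import dataclass, field
from typing import List
import uuid

@dataclass
class ListingChangeEntry:
    listing_id: str            # 1) the MLS Listing ID
    change_id: str             # 2) unique ID (e.g. a UUID) identifying this change
    listing_url: str           # 3) where the property lives on the MLS/Realtor site
    image_urls: List[str]      # 4) media-server URLs for the property photos
    changed_fields: List[str] = field(default_factory=list)  # e.g. ["ListPrice"]; empty means NONE

example = ListingChangeEntry(
    listing_id="ML81234567",
    change_id=str(uuid.uuid4()),
    listing_url="https://mls.example.com/listings/ML81234567",
    image_urls=["https://media.example.com/ML81234567/1.jpg"],
    changed_fields=["ListPrice"],
)
```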
The security-minded will notice I didn’t talk about authentication. Because these feeds are just basic HTTP grabs, you can secure them any way you want using your web server’s authentication options, as well as by running them over SSL. Also, thanks to assumption 4, we can ignore things like field-level encryption and the like, as this is all public data we want seen.
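For example, a client hitting a feed that sits behind plain HTTP Basic auth over SSL needs nothing exotic; the URL and credentials below are obviously made up:

```python
# Fetching a protected feed with HTTP Basic auth over TLS, using the
# requests library; placeholder URL and credentials.
import requests

resp = requests.get(
    "https://mls.example.com/sale/changes.atom",
    auth=("broker-office-42", "s3cret"),  # whatever the web server is configured for
    timeout=30,
)
resp.raise_for_status()
atom_xml = resp.text  # hand this off to your feed parser of choice
```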
One of the things that I haven’t really thought about yet is what the data inside the feed would be or how it would look. In my conversations around the industry, a magic number of “about 50 fields” keeps coming up as what would be needed for a schema like this. I know of at least one regional MLS data-sharing agreement that shares just about that number of data items. For something like SALE, I could see a similar number. Of course, because RSS/ATOM feeds are namespaced XML, a particular server vendor could extend it; I haven’t thought in that direction yet either.
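Just to illustrate that extension point, a vendor could hang extra listing data off an entry in its own XML namespace without breaking generic feed readers; the namespace URI and element name here are invented for illustration:

```python
# Sketch of a vendor extension element living in its own namespace.
from xml.etree import ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
VENDOR_NS = "http://vendor.example.com/sale-extensions"  # hypothetical namespace

entry = ET.Element(f"{{{ATOM_NS}}}entry")
ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = "Price change: listing ML81234567"
ET.SubElement(entry, f"{{{VENDOR_NS}}}ListPrice").text = "450000"  # vendor-specific field

print(ET.tostring(entry, encoding="unicode"))
```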
In any case, this was my brain dump/abstract of the idea. In the white paper version I’ll have more of the specifics fleshed out in terms of what the feed could look like, and maybe a workflow example of that first meshing of the full feed with the live feed.