How cool is Yahoo? I was poking around their developer network page after looking up some YUI documentation and came across their Terms Extraction service. You post content to the service and it returns a list of “significant words or phrases” ordered by importance. I ran a couple of test posts and, sure enough, it did a pretty good job of figuring out the relevant terms and ranking them more or less how I would. I assume it’s the same as, or at least similar to, the indexer Yahoo’s own search uses.
However it’s done, I think their service has a huge amount of potential from an ECM perspective. Nearly every client of mine has asked about offering some level of automation for meta-tagging while contributing content. While many of the enterprise vendors offer applications to automatically tag content, I find that they are usually focused on batch loading or migrating content, rather than assisting the everyday contributor. In my ideal world I’d love to see contribution forms come up pre-loaded with system-recommended meta-data, allowing contributors to make only the necessary edits and then simply approve their submissions.
I was pretty pumped when I ran across Yahoo’s service, so I decided to throw together a quick component to take advantage of it for Stellent / Oracle Universal Content Server. Nothing too crazy here: the component basically adds an option to the Content Action menu called “Add Yahoo Terms”, which executes a service called “ADD_YAHOO_TERMS”.
The service then does the following:
- Performs a dynamic conversion on the document
- Reads the contents of the resulting HCST output file
- Posts the HTML to Yahoo’s Terms Extraction service
- Parses the terms out of the XML that comes back
- Creates a comma-delimited list of the best terms
- Saves the list to the binder in a predesignated memo meta-data field (the default is xComments)
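To make the last three steps a bit more concrete, here is a rough sketch in Python. It is not the component's code (the component itself is not shown here), just an illustration of the idea. The sample XML, the `Result` element name, the namespace, the `max_terms` cutoff, and the use of a plain dict as a stand-in for the binder are all assumptions for the sake of the example. The only things taken from the steps above are that the terms come back as XML ordered by importance and that the default target field is xComments.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body. The element names and namespace are assumptions
# made for illustration; check them against the real service output.
SAMPLE_RESPONSE = """<ResultSet xmlns="urn:yahoo:cate">
  <Result>content management</Result>
  <Result>meta-data</Result>
  <Result>content management</Result>
  <Result>dynamic conversion</Result>
  <Result>update form</Result>
</ResultSet>"""


def local_name(tag):
    """Strip any '{namespace}' prefix so matching does not depend on it."""
    return tag.rsplit("}", 1)[-1]


def extract_terms(xml_text):
    """Return the terms in response order, skipping blanks and duplicates."""
    root = ET.fromstring(xml_text)
    terms = []
    for element in root.iter():
        if local_name(element.tag) == "Result" and element.text:
            term = element.text.strip()
            if term and term not in terms:
                terms.append(term)
    return terms


def build_term_list(terms, max_terms=3):
    """Keep the first max_terms terms (assumed to be the best ones, since the
    response is ordered by importance) and join them with commas."""
    return ", ".join(terms[:max_terms])


def add_terms_to_binder(binder, xml_text, field="xComments", max_terms=3):
    """Store the term list in the given field. 'binder' is a plain dict here,
    standing in for the content server's binder."""
    binder[field] = build_term_list(extract_terms(xml_text), max_terms)
    return binder


if __name__ == "__main__":
    print(add_terms_to_binder({"dDocName": "example_doc"}, SAMPLE_RESPONSE))
```

Taking the first few terms leans on the service's own ranking, which keeps the field short enough for a contributor to skim and correct on the form.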
The remainder of the service is basically the GET_UPDATE_FORM service, which brings up the update form. Since the terms are in the binder as a meta-data field, the update form comes up pre-loaded with the document’s meta-data as well as the Yahoo terms in the predesignated field. From there on you are on the standard Stellent / Oracle update form, so saving will commit your changes.
It’s a pretty simple integration, nothing revolutionary, but I think it’s a decent example of how one might integrate Stellent with Yahoo’s service.
Feel free to download the component here: