Lazy Web Request: MT-NoRepeats

As I generate more and more content for this site, I increasingly worry that I’m repeating myself. Have I already written that anecdote about burying Han Solo in the backyard? Have I already referenced that article about gay robots? My memory is poor, and I often write entries at maximum thinking and typing speed, which leaves plenty of room for error.

I want a Movable Type plug-in that, when I click ‘Save’, searches my database for possible matches to the current entry. I’m not going to explore how this might work programmatically, as I’ll only embarrass myself. And it’s not only a question of checking for repeated links; I also want to make sure I’m not retelling anecdotes and related info. Whatever the method, such a plug-in would cut down on my anti-repeat anxiety.

    What you’re describing is a process called “text fingerprinting,” and the algorithms are being tested at leading universities right now. The programs scan through texts and, disregarding articles and the like, generate unique “fingerprints” of the content. Each fingerprint is then compared against the fingerprints generated from other texts. This allows matches on content without requiring exact text matches, and ultimately it will be used to search for the usual thing: cheating on essays. Who knows? An MT plug-in might be next.

    If you do succeed in your search, please share your code with the Slashdot team. They are famous for double-posting the same story. :-)
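For the curious, here is a minimal sketch of the fingerprinting idea the comment describes, using a simple, well-known technique (word shingling plus Jaccard similarity) rather than whatever the university research actually uses. Everything here is an assumption for illustration: the stopword list, the shingle length `k`, the `threshold`, and the helper names `fingerprint`, `similarity`, and `possible_repeats` are hypothetical, and it is not Movable Type's plug-in API (a real MT plug-in would be written in Perl). Note that it only catches repeated runs of wording; an anecdote retold in entirely different words would slip past it.

```python
import hashlib
import re

# Hypothetical stopword list: the "articles and the like" that get ignored.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on",
             "is", "was", "i", "my", "that", "it"}


def normalize(text: str) -> list[str]:
    """Lowercase the text, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def fingerprint(text: str, k: int = 3) -> set[int]:
    """Hash every run of k consecutive content words (a "shingle")."""
    words = normalize(text)
    shingles = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    # A stable hash (unlike Python's per-process randomized hash()) means
    # fingerprints could be stored alongside entries and reused later.
    return {int(hashlib.md5(s.encode()).hexdigest()[:16], 16) for s in shingles}


def similarity(a: set[int], b: set[int]) -> float:
    """Jaccard similarity: 0.0 means nothing shared, 1.0 means identical sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def possible_repeats(new_entry: str, archive: dict[str, str],
                     threshold: float = 0.2) -> list[tuple[str, float]]:
    """Compare the new entry's fingerprint against every archived entry's."""
    new_fp = fingerprint(new_entry)
    matches = []
    for title, body in archive.items():
        score = similarity(new_fp, fingerprint(body))
        if score >= threshold:
            matches.append((title, score))
    # Most similar entries first.
    return sorted(matches, key=lambda m: m[1], reverse=True)


if __name__ == "__main__":
    archive = {
        "Backyard funeral": "I buried Han Solo in the backyard when I was eight years old.",
        "Robots": "An article about gay robots caught my eye this week.",
    }
    draft = "Have I told you that I buried Han Solo in the backyard?"
    for title, score in possible_repeats(draft, archive):
        print(f"{title}: {score:.2f}")  # prints "Backyard funeral: 0.22"
```

Here the draft shares two three-word shingles ("buried han solo" and "han solo backyard") with the old entry out of nine distinct shingles in total, so it scores 2/9, while the robots post shares none and is left out. Recomputing every archived fingerprint on each save is fine for a small blog; a larger archive would want the fingerprints cached.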


