educate others about Zeitgeist
These are not the conspirators you are looking for

We are seven people working together on a website that has 200,000 registered users. We have no QA department, no customer service department, nobody but us monitoring the site. Meanwhile, thousands of entries and comments are added to our database every day. Most of them are innocent, positive entries about people accomplishing their goals. Others are vile attacks on other users, childish pranks, heated religious debates, or perfectly valid content that happens to be meant for a mature audience. There is a lot of truly disgusting, malicious, and irrelevant material on our site, but luckily most people don’t know about it, because we are able to keep it off the main pages of our site. (Note that we don’t censor or forbid it except in extreme circumstances; we merely keep it out of the “display window” pages of our site.)

How do we do that? Well, we can’t read every entry or comment and determine if it is tasteful and representative of the site’s purpose. Like every web company facing this situation and serving user-created content, we have to use automated filtering on the incoming stream of entries. Since computers aren’t smart enough to determine the “gist” of what an entry contains, they need to filter text using keywords. The keywords can be the four-letter variety or they can be standard words that are often used in racy contexts. Consider something like “meet young girls” as a goal. None of the words are racy, but the phrase can be considered inappropriate depending on who is writing it, and their purpose. The computer can’t understand any of this, and can’t parse phrases well (e.g. “meet young girls” vs “meet some young girls” vs “meet me a young girl”); it can only find words (and parts of words) in the text streaming in. So that’s what it does.
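A minimal sketch of that kind of word and word-part matching, in Python. The keyword lists and the `find_keywords` name are hypothetical, chosen only for illustration; they are not the site's real list or code.

```python
import re

# Hypothetical keywords, for illustration only (not the real list).
WHOLE_WORDS = {"hate"}            # must match an entire word
WORD_PARTS = {"girl", "relig"}    # may match anywhere inside a word


def find_keywords(text):
    """Return the sorted keywords found in text, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    # Whole-word keywords: the word itself must be on the list.
    hits = {w for w in words if w in WHOLE_WORDS}
    # Word-part keywords: the part may appear inside any word.
    hits |= {part for part in WORD_PARTS for w in words if part in w}
    return sorted(hits)


print(find_keywords("Meet young girls"))
print(find_keywords("meet me a young girl"))
print(find_keywords("Visit a chateau"))
```

Matching on the part "girl" catches all three phrasings of the example goal without the program having to parse the phrase. Treating "hate" as a whole word keeps "chateau" from being caught just because it happens to contain those letters.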

There’s no point trying to list all the words, word parts, and word patterns we use, or having us justify each keyword being there. It’s a big list built from lots of real-world use. But religion seems to be the focus of this particular conspiracy theory, so I’ll touch on it. Many discussions about religion become combative and inflammatory; in fact, religion could be considered one of the most likely candidates to inspire heated debate rather than simple progress-tracking on life goals. So some of our keywords involve religion. Keep in mind that the computer isn’t taking sides on one religion or another, on atheism vs theism, or whatever. It’s also not making a statement about how the site is meant to be used. It is merely flagging text as possibly containing inflammatory arguments or mature content, and keeping that text from appearing on the high-traffic pages of the site, just in case. More often than not, as some have noticed, the filter catches “false positives”: completely innocent content that happens to contain keywords which are sometimes found in inappropriate contexts. But, since we can’t read all the content, we have to use this imperfect, automated filtering to keep the Zeitgeist and other high-traffic pages relatively innocent and relatively representative of what 43 Things is all about.
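The difference between flagging and censoring can be sketched in a few lines of Python. Everything here is hypothetical (the `Entry` type, the `FLAG_PARTS` keywords, the `front_page` function); it only illustrates the idea that a flagged entry stays on the site but is skipped when the high-traffic pages are built.

```python
from dataclasses import dataclass

# Hypothetical keyword parts, for illustration only.
FLAG_PARTS = ("relig", "church")


@dataclass
class Entry:
    author: str
    text: str
    flagged: bool = False


def flag(entry):
    """Mark an entry if any keyword part appears in its text."""
    lowered = entry.text.lower()
    entry.flagged = any(part in lowered for part in FLAG_PARTS)
    return entry


def front_page(entries):
    """Entries eligible for high-traffic pages; flagged ones stay on the site."""
    return [e for e in entries if not e.flagged]


entries = [flag(e) for e in (
    Entry("ann", "Ran my first 10k today!"),
    Entry("bob", "Photographed an old church at sunrise"),  # false positive
    Entry("cat", "Your religion is wrong and here is why"),
)]

print([e.author for e in front_page(entries)])  # only ann's entry qualifies
print(len(entries))                             # all three entries still exist
```

Note that the innocent entry about photographing a church is held back along with the combative one. That is the false-positive cost of matching on keywords alone.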

This is a standard methodology of dynamic websites, and it actually works pretty well.



Comments:

Thanks for your thoughts! True, I didn’t mention more advanced algorithms we could be using, like those used in spam filters. I was definitely aiming for the non-technical audience in that post, trying to explain the basic “how and why” of our keyword filtering to our users in order to distinguish it from the topical censorship of which we were being accused. But maybe “the computer can’t” was less direct than “our algorithm doesn’t”. Fair enough. I definitely could have been more exhaustive in explaining the various filtering options available to us. But I successfully avoided using the word “algorithm” in a non-technical post, you have to give me points for that!

In all honesty, the filter will probably become more selective over time rather than less, but it will likely involve other quality and relevance metrics like the reputation score of the author, the nature of the goal it’s written about, or even the relationships between the author and viewer (a personalized Zeitgeist). Not sure when we’ll be rethinking how the Zeitgeist page works, but if you have more feature requests for it, or come up with a better name than Zeitgeist, make sure your thoughts get recorded for posterity on our ideas site. That’s where feature requests are guaranteed to be seen and registered with the Robots.


 
