We the News is a project to examine, analyze, and classify blogosphere reactions to controversial news events. Blogs are often interesting not for their “truth value” but because they are personal and opinionated. We attempt to collect and classify blog entries as a gauge of public reaction toward news events, with the intent of automating the process by which core arguments and positions on a given topic can be identified and presented.
The project set out to build two primary components: a web service framework that allows semi-real-time retrieval of news stories and the blog entries that relate to them, and an NLP tool that uses the structured output from the retrieval service to rank, classify, and extract highly affective, charged statements from blog texts.
The retrieval web service is a logistical framework that uses multiple public web service APIs and RSS feeds to obtain blog entries that relate to a specific event, whether that event is extracted from a news service or specified by the user manually typing in search parameters. In addition to API service adaptors and XML input and output processing modules, the service possesses a text extraction module that uses predefined heuristics to identify and extract the full-text body of a blog post, given a link to the HTML page.
The affect/opinion extraction and classification toolkit is of more interest in NLP terms. Given structured XML input describing a set of blog entries, the classification tool extracts highly opinionated statements. These statements can be considered “sound-bite” summaries of the entry in question, and should be in some sense very revealing of the blog author’s core opinions regarding the subject. The toolkit also contains a set of three human-retrieved and human-constructed training corpora, each containing 20 to 30 blog posts on a central topic of controversy.
We make the fundamental hypothesis that highly affective sentences are proxies for highly opinionated statements, and thus attempt to identify and extract these charged statements from the blog text. While this is not always the case, we assume that it is a reasonable approximation for the majority of cases where we seek to identify sentences that are revealing of core opinion. We assess the performance of our system, with respect to this hypothesis, in our discussion of results.
Team members held weekly meetings to coordinate project direction. Further, individual tasks were allocated as follows:
The Orchestration.py module (implemented in Python) controls the web service that retrieves blog posts (along with Flickr pictures) relevant to the news story. The orch.py file implements an object of the Orchestration class. blogservice.php is a thin PHP layer that enables its deployment as a web service.
Parameters are passed to blogservice.php as HTTP GET or HTTP POST variables. They include:
The Orchestration class uses the following data fields to store the extracted contents:
The collection is filled as Orchestration uses the following modules to implement its main functionality:
The NLP Analyzer consists of two Classifiers, supported by a number of feature scorers. Both classifiers operate on the NewsCollection data structure: they take a NewsCollection as input and output the same NewsCollection structure, with the necessary <emotionalSummary> fields filled with the sentences they judge to be highly opinionated.
Both Classifiers are designed with Cross Validation and Testing modes. When run on the command line, each *Classifier.py module will execute a self-diagnostic, cross-validation routine. The threshold for cross-validation selection is 0.2 by default, which means 20 percent of posts will be selected as test posts. The topic corpus used for evaluation defaults to kElection, the election 2006 corpus. Classifiers accept a topic argument, however, which can be "ie" or "truth", to switch to a different corpus.
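As a rough illustration of the hold-out step described above, the sketch below splits a list of posts using a threshold of 0.2. The function name `split_posts` and the `seed` parameter are hypothetical, and the sketch assumes the threshold is applied as a fixed fraction of the corpus; the actual modules may instead make a random draw per post.

```python
import random

def split_posts(posts, threshold=0.2, seed=None):
    """Randomly hold out roughly `threshold` of the posts as test posts.

    Hypothetical helper: assumes the threshold is a fraction of the corpus.
    """
    rng = random.Random(seed)
    shuffled = list(posts)          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * threshold))
    test = shuffled[:n_test]
    train = shuffled[n_test:]
    return train, test

# Example: a 25-post corpus yields 20 training posts and 5 test posts.
posts = ["post%d" % i for i in range(25)]
train, test = split_posts(posts, threshold=0.2, seed=0)
```

Passing a fixed `seed` makes a particular split reproducible, which can help when comparing two classifiers on the same held-out posts.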
For Testing mode, the ClassifierTest.py driver script will accept a topic argument and use its corresponding corpus for training. The classifier will then be run on the testing corpus, a collection of about 10 posts for each topic that have not been preprocessed by humans for the emotionalSummary.
SummaryEvaluator is a class designed to assess the performance of the Classifiers during cross-validation. In essence, it compares the original NewsCollection data structure, as deserialized from the human-prepared gold standard XML files, to the newly generated NewsCollection data structure from the Classifier. Discrepancies between each blog post's extracted sentences and the extracted sentences in the gold standard are noted. The Evaluator then generates a recall and precision report for the validation run, with some additional miscellaneous statistics.
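The comparison can be made concrete with the standard definitions of precision and recall over sets of sentences. This is a minimal sketch, not the SummaryEvaluator code itself; the function name `precision_recall` is hypothetical, and sentences are assumed to match only when their text is identical.

```python
def precision_recall(extracted, gold):
    """Compare machine-extracted sentences with gold-standard sentences.

    precision = correct extractions / all extractions
    recall    = correct extractions / all gold sentences
    """
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall

# Two sentences extracted, both in the gold standard, one gold sentence missed.
precision, recall = precision_recall(["a", "b"], ["a", "b", "c"])
```

In this example precision is 1.0 (nothing wrong was extracted) while recall is 2/3 (one gold sentence was missed).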
Each feature scorer that we implemented followed this convention:
Scorers: Technique description, feature rationales
The rationale here is that sentences with strongly positive words (such as "affection," valence score 8.39) or strongly negative words (such as "whore," valence score 1.61) are more likely to be strongly emotional than sentences without these words.
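One way to turn this rationale into a number is to sum, for each sentence, how far each known word's valence lies from a neutral midpoint. The sketch below is an assumption about how such a scorer could work, not the project's implementation: the midpoint of 5.0 (for a 1-9 scale), the function name `valence_score`, and the two-entry lexicon (taken from the examples in the text) are all illustrative.

```python
import re

# The two entries quoted in the text; a real lexicon would have many more.
VALENCE = {"affection": 8.39, "whore": 1.61}
NEUTRAL = 5.0  # assumed midpoint of a 1-9 valence scale

def valence_score(sentence, lexicon=VALENCE):
    """Sum each known word's distance from neutral valence.

    Strongly positive and strongly negative words both raise the score.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(abs(lexicon[w] - NEUTRAL) for w in words if w in lexicon)
```

Under these assumptions a sentence with no lexicon words scores 0, and both example words contribute the same amount (3.39) because they lie equally far from the midpoint.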
We decided that sentences that included cursing were likely to be highly charged sentences: "Why must that ######## Hilary be so ######?"
So for each inquirer scorer, we load the appropriate dictionary and return the match counts for each sentence in each blog post for a given topic. Since the inquirer dictionary lists root words, and stemming is expensive, we defined a list of suffixes to be applied to each dictionary word. For example, "disgusting" is not in the inquirer dictionary, but with "ing" listed as a suffix, we try affixing "ing" to each dictionary word when looking for matches. This general word-counting behavior is implemented in the inqGeneralScoring.py module.
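The suffix approach can be sketched as follows. This is an illustration rather than the contents of inqGeneralScoring.py: the suffix list, the function names, and the one-word dictionary are hypothetical.

```python
import re

SUFFIXES = ["", "s", "ed", "ing"]  # hypothetical suffix list

def expand(roots, suffixes=SUFFIXES):
    """Build every root+suffix form once, instead of stemming each token."""
    return {root + suffix for root in roots for suffix in suffixes}

def count_matches(sentence, roots):
    """Count the words in a sentence that match an expanded dictionary form."""
    forms = expand(roots)
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(1 for w in words if w in forms)

# "disgusting" matches because "disgust" + "ing" is among the expanded forms.
hits = count_matches("That was disgusting, truly disgusting.", {"disgust"})
```

Plain concatenation is cheap but misses forms whose spelling changes: "hate" plus "ing" yields "hateing", so "hating" would not be counted.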
We decided that sentences including pleasure words are more likely to be affective than those that don't.
We decided that sentences including pain words are more likely to be affective than those that don't.
We decided that sentences including Strong words are more likely to be affective than those that don't.
We believed our domain-specific development sets to be good for finding good bonus words and stigma words for that type of blog post. For instance "scandal" is an especially charged word, given the domain of the 2006 American election results. Instead of relying on another authoritative list (such as ANEW or General Inquirer), we decided it would be a strength to use our own development sets to create a list of words. And in our cross validation trials, the Cue Word scorers benefitted both recall and precision.
We determined that words that showed up disproportionately little in the affective sentences of our development corpora would indicate lower affect in general.
We determined that words that showed up a disproportionately high number of times in the affective sentences of our development corpora would indicate higher affect/charge in general.
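A simple way to derive such bonus and stigma lists is to compare each word's relative frequency in the hand-selected affective sentences against its frequency in the remaining sentences. The sketch below is one possible reading of that idea; the ratio threshold, minimum count, crude add-one smoothing, and function names are all assumptions rather than the project's actual settings.

```python
import re
from collections import Counter

def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())

def cue_words(affective, other, ratio=2.0, min_count=2):
    """Return (bonus, stigma) word sets from two groups of sentences."""
    aff = Counter(w for s in affective for w in tokenize(s))
    oth = Counter(w for s in other for w in tokenize(s))
    aff_total = sum(aff.values())
    oth_total = sum(oth.values())
    bonus, stigma = set(), set()
    for word in set(aff) | set(oth):
        if aff[word] + oth[word] < min_count:
            continue  # too rare to judge
        # Add-one smoothing avoids division by zero for unseen words.
        aff_rate = (aff[word] + 1) / (aff_total + 1)
        oth_rate = (oth[word] + 1) / (oth_total + 1)
        if aff_rate / oth_rate >= ratio:
            bonus.add(word)    # over-represented in affective sentences
        elif oth_rate / aff_rate >= ratio:
            stigma.add(word)   # under-represented in affective sentences
    return bonus, stigma

bonus, stigma = cue_words(
    ["What a scandal", "This scandal is outrageous"],
    ["The report was published today", "The committee met today"],
)
```

On this toy input "scandal" comes out as a bonus word, while "the" and "today" come out as stigma words.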
Challenges in scorers:
XML Reader receives an XML document either as input from the Orchestration module or from a static file (such as a Gold Standard blog collection for each of the sample news events). It extracts data about the news articles and relevant blogs, and produces a Python data structure with the data fields filled with the extracted data.
The Simple classifier is implemented in the SimpleClassifier.py module. It is an unsupervised method that does no self-training; it uses all of the available feature scorers and selects the sentences with the highest feature count.
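The baseline behavior can be sketched in a few lines: sum the scorer outputs for each sentence and keep the top-ranked ones. The two toy scorers, the `top_n` cutoff, and the function names below are hypothetical stand-ins, not the scorers or selection rule used in SimpleClassifier.py.

```python
def exclamations(sentence):
    """Toy scorer: count exclamation marks."""
    return sentence.count("!")

def caps_words(sentence):
    """Toy scorer: count fully capitalized words longer than one character."""
    return sum(1 for w in sentence.split() if len(w) > 1 and w.isupper())

def simple_summarize(sentences, scorers, top_n=2):
    """Pick the top_n sentences by summed, unweighted scorer output."""
    totals = [sum(scorer(s) for scorer in scorers) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: totals[i], reverse=True)
    chosen = sorted(ranked[:top_n])  # restore document order
    return [sentences[i] for i in chosen]

post = ["I went to the store.", "RUMSFELD TO STEP DOWN!!!", "It rained."]
summary = simple_summarize(post, [exclamations, caps_words], top_n=1)
```

Because every scorer counts equally, a single noisy scorer can dominate the ranking, which is one motivation for a classifier that learns weights.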
The classifier is used to classify each blog sentence as either a good representative of the blog's opinion or not. It is called from the SummarizeTest.py file.
The NaiveBayesClassifier class requires the following input arguments to initialize:
The classifier is implemented in the NaiveBayesClassifier.py file. It trains itself by assigning higher weights to the scorers that were most successful in identifying the correct summary sentences in the Gold Standard (the train() method). When training is complete, it attempts to summarize the test set of blogs (the summarize() method) and uses SummaryEvaluator.py to analyze how well the classifier was able to pick the emotional sentences from the test collection.
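For readers unfamiliar with the technique, the following is a textbook multinomial Naive Bayes over per-sentence feature counts, with add-one (Laplace) smoothing. It is a generic sketch, not the code in NaiveBayesClassifier.py: the class name, the three-column feature layout (cue-word hits, exclamation marks, plain words), and the training numbers are invented for illustration.

```python
import math

class TinyMultinomialNB:
    """Minimal multinomial Naive Bayes over count vectors (illustrative only)."""

    def train(self, vectors, labels, alpha=1.0):
        n_features = len(vectors[0])
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(vectors))
            # Smoothed total count of each feature within this class.
            totals = [sum(r[j] for r in rows) + alpha for j in range(n_features)]
            denom = sum(totals)
            self.log_likelihood[c] = [math.log(t / denom) for t in totals]

    def log_score(self, vector, c):
        # log P(c) + sum_j count_j * log P(feature_j | c)
        return self.log_prior[c] + sum(
            x * w for x, w in zip(vector, self.log_likelihood[c]))

    def predict(self, vector):
        return max(self.classes, key=lambda c: self.log_score(vector, c))

# Invented counts per sentence: [cue-word hits, exclamation marks, plain words]
X = [[3, 2, 5], [2, 1, 6], [0, 0, 10], [1, 0, 12], [0, 0, 8]]
y = [True, True, False, False, False]  # True = gold-standard summary sentence
nb = TinyMultinomialNB()
nb.train(X, y)
```

With these toy numbers, a sentence with counts `[2, 1, 4]` is classified as a summary sentence and one with `[0, 0, 9]` is not. Features that fire mostly on gold-standard sentences receive a relatively higher likelihood in the summary class, which plays the role of a learned weight.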
To facilitate training and testing of the classifiers, three training corpora of blog posts were developed, ranging from 20 to 30 blog posts each. The posts in each corpus pertain to its central theme; together, the corpora cover three prominent news events:
Each corpus is manually developed by a member of the project team, who also makes a qualitative judgment on the valence of the blogger's opinion with respect to the topic on a 1-9 scale, 9 being most favorable and 1 being least favorable. The team member also selects a number of sentences that he or she believes are most reflective of the blog's core opinion (and are thus the targets for automatic extraction). This metadata is inserted into the markup for use by the evaluator module.
Using the Simple Classifier as the baseline extraction tool and the Naive Bayes Classifier as the main extraction tool, we performed cross-validation with all feature scorers on the three gold standard corpora. From each corpus, 20 percent of its blog posts are randomly designated as test posts, while the remaining 80 percent are designated training posts. The classifiers are then trained and tested on the respective posts. The SummaryEvaluator is then used to compare machine-selected sentences with human-selected sentences in the corpora, and precision/recall scores are output.
Because posts are selected at random for cross-validation, we run each classifier 5 times and take the average recall and precision ratings for comparison.
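The averaging step can be sketched as below. The helper `average_runs` and its `run_once` callback are hypothetical; `run_once` stands for one complete split, train, and evaluate cycle that returns a (precision, recall) pair.

```python
import random
from statistics import mean

def average_runs(run_once, runs=5, seed=0):
    """Average precision and recall over several randomized runs.

    `run_once(rng)` is assumed to perform one random split, train the
    classifier, evaluate it, and return a (precision, recall) pair.
    """
    rng = random.Random(seed)
    results = [run_once(rng) for _ in range(runs)]
    precision = mean(p for p, _ in results)
    recall = mean(r for _, r in results)
    return precision, recall
```

Averaging smooths out the luck of any single split, which matters with corpora of only 20 to 30 posts, where a 20 percent hold-out contains just a handful of posts.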
Finally, we train the classifiers on a training corpus, run them on a separate test set of posts on the same subject, and qualitatively assess the accuracy.
| Classifier | Recall | Precision |
|---|---|---|
| Simple | 0.32 | 0.30 |
| NaiveBayes | 0.71 | 0.79 |
Al Jazeera (and others) report that Islamic radicals in the middle-east are praising American voters for "defeating and rejecting Bush’s failed policies" following yesterday’s midterm elections. This is all that really needs to be said about yesterday’s big Democrat victory(s), (UPDATE: but here's more)
But I can’t resist digging deeper. How miserable is your political party when you have the enemy of your country cheering for your victrory as a sign that they have won the "hearts and minds" of Americans? Terrorists are cheering because Democrats have been championing their cause since 2003 and they believe all of the Democrat rhetoric and benefit from it. (Is the above photo from Al Jazeera or a Democrat rally?) Islamic throat-cutting fascists know that a Democrat win is a win for Islamic throat-cutting fascists. How so? They have been doing this for a while now. They believed Democrats when they said "this is a war America cannot win". They believe Democrats when they say America must get out of Iraq (cut and run) ASAP. They agree with John Kerry and think our troops are uneducated idiots. They agree with Democrats who want to let them phone America without the CIA/NSA/FBI etc listening in on their calls. They love the fact that Democrats want to give them a free lawyer and full US citizenship rights if captured while trying to kill US troops. They repeat Democrat talking points (almost word for word) in their speeches when they say that America is the problem in the world and should concentrate on our own healthcare system and feeding the poor instead of the war on terror in Iraq.
What is this so-called New Direction? If siding with the enemy of America during a time of war is Democrats idea of a "New Direction" then "God help us". How can Democrats be proud when their victory is considered even more of a victory by America’s enemy? (Islamic throat-cutting fascists) It’s easy to understand why terrorists would support Democrats, but why would American voters? "Props" go to CNN, MSNBC, ABC News, CBS News, NBC News, Jon Stewart, David Letterman, Whoopie Goldberg, Sean Penn, Rosie O'Donnell, The Dixie Chicks, Bill Mahr, Al Franken, Michael Moore and the NY Times and virtually every other liberal newspaper.(All of whom are cheering along with the terrorists) Now, what happens if America doesn’t live-up to Democrats promises to "cut and run"?
The problem is that liberals in America have eagerly let yourselves be used as "useful Idiots" in order to bring down Bush.The plan is to kill you along with the rest of us when the time comes. Yesterday was a victory for all of you useful idiots who claim to be smarter than everyone else and a victory for the terrorists who played you like idiots against your own government. Congratulations on your "victory". One other interesting note: Democrats and all the news media talking about nothing but rampant voter fraud and problems with the voting process for days and even on election day right up until the time Democrats started winning, then suddenly no further mention by anyone anywhere of ANY problems with the count. Hmmmmmm Had Democrats lost, we would be knee-deep in lawyers right now. I'm just sayin'...
EXTRACTED: Terrorists are cheering because Democrats have been championing their cause since 2003 and they believe all of the Democrat rhetoric and benefit from it. (Is the above photo from Al Jazeera or a Democrat rally?) Islamic throat-cutting fascists know that a Democrat win is a win for Islamic throat-cutting fascists.
EXTRACTED: How miserable is your political party when you have the enemy of your country cheering for your victrory as a sign that they have won the "hearts and minds" of Americans?
MISSED: Yesterday was a victory for all of you useful idiots who claim to be smarter than everyone else and a victory for the terrorists who played you like idiots against your own government.
| Classifier | Recall | Precision |
|---|---|---|
| Simple | 0.15 | 0.17 |
| NaiveBayes | 0.55 | 0.73 |
| Classifier | Recall | Precision |
|---|---|---|
| Simple | 0.19 | 0.40 |
| NaiveBayes | 0.38 | 0.57 |
The exhaustion (and usually the grief of the loss) would hit the morning after, I'd be dead to the world, often miss school , and usually come down with a head cold. O Joy, O Rapture, O Bliss. There it was, the headline: "RUMSFELD TO STEP DOWN!!!" And to my Dems who may now consider themselves the Congressional Leadership...don't blow it. I've been alternating between fits of spontaneous, hysterical giggles and wanting to burst into tears of sheer relief. Maybe I was feeling it sympathetically for all the stalwarts who still managed to trudge through that ghastly campaign after years of heartbreak.
We hypothesized that identifying highly opinionated statements can be approximated by identifying and extracting highly emotional sentences from the blog text. Overall, this has proven to be a reasonable hypothesis, especially in political or politicized domains where the event in question generates very charged statements, such as the election of 2006 and "An Inconvenient Truth". When bloggers emphasized their particular viewpoints, they often included some form of emphasis using loaded words, capitalization, punctuation, and other features that can be captured by our scorers. Instances such as "RUMSFELD TO STEP DOWN!!!" and "I've been alternating between fits of spontaneous, hysterical giggles and wanting to burst into tears of sheer relief" were very prominent examples of this.
This hypothesis proved less accurate for the Internet Explorer event, where the posts contain some emotion but the opinions are usually less well-defined: they are buried under sarcastic or bitter musings ("IE7 is clearly a Microsoft Product.") or consist of very factual, very frank comparisons of technical features (such as IE 7 vs Firefox 2). In this latter case, affect detection was not a very effective approximation for opinion.
We also noted a phenomenon in which highly charged sentences were not core opinion statements in and of themselves, but served as cues indicating that core opinion statements were in close proximity. In several instances, the sentences that were clearly opinion-summary statements preceded or followed a very affectively charged sentence. We did not have enough time to extend our scoring framework to capture this feature, but it is an interesting possible extension of the current methodology that could capture the desired statements more accurately.
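One possible form such an extension could take is to let each sentence pass a fraction of its affect score to its immediate neighbors before ranking. This was not implemented in the project; the function name and the `weight` of 0.5 below are purely illustrative.

```python
def spread_to_neighbors(scores, weight=0.5):
    """Add a fraction of each sentence's score to the adjacent sentences.

    `scores` holds one affect score per sentence, in document order.
    """
    boosted = list(scores)
    for i, score in enumerate(scores):
        if i > 0:
            boosted[i - 1] += weight * score   # sentence before
        if i + 1 < len(scores):
            boosted[i + 1] += weight * score   # sentence after
    return boosted

# A charged second sentence lifts the otherwise neutral sentences around it.
boosted = spread_to_neighbors([0, 4, 0, 0])
```

Here the result is `[2.0, 4.0, 2.0, 0.0]`: the neighbors of the charged sentence now outrank the distant neutral sentence, so an opinion statement next to an outburst has a better chance of being extracted.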
The analysis tool generally achieved higher precision than recall for any given subject domain, implying that it was reasonably accurate at identifying opinion-summary statements but was not sufficiently sensitive to produce many of these statements from the general body of text.
Multinomial Naive Bayes classification outperformed the baseline on all measures, achieving roughly 40 percent higher precision on most corpora, despite its assumption that feature probabilities are independent.
Challenges:
Future Work: the Mechanical Pundit
Ultimately, work in opinion detection, extraction, summarization, and affect scoring leads to an interesting possibility: building an advocacy agent. The distilled sentiments, opinions, and assertions of various bloggers can be stored in a database, along with metadata that extracts core opinions and evaluates them as points or counterpoints. Thus, when queried on any topic of controversy, the agent may 1) inform the user as to the state of the debate, the sentiments involved, and the arguments being presented, 2) play Devil's advocate in testing out arguments, and 3) become a pundit of at least equal caliber to late-night radio talk show hosts. An agent such as this may be of some interest academically, but also as a kind of technology entertainment/art piece for others.