Python Assignment Writing: CSCI1310 How Do You Feel On Twitter

Write a small program about the US election that draws a heat map from Twitter data.

Requirement

Sentiment analysis is the process of computationally identifying a writer’s attitude towards a topic expressed in a piece of text. Some companies apply sentiment analysis to opinions expressed in social media about their products.

In this assignment, we are providing you with all tweets generated in the second week of November and you are going to use that data to generate a geographic visualization of the sentiment expressed about particular topics. As an example, consider the following map that shows how people feel about Justin Bieber using the sentiments expressed in their tweets. States that are red have the most positive view, while states that are dark blue have the most negative view; yellow represents a more neutral view, while states in gray have insufficient data.

To generate this image, thousands of tweets that included the word “bieber” were collected. Each tweet contained the latitude and longitude of the tweet’s location, which could be used to associate the tweet with a state. To determine if the tweet was overall positive or negative, the individual words in the tweet were analyzed.
Words were assigned a score between -1 and +1 using a pre-defined dictionary of word sentiments. For example, a few of the words in the dictionary and their scores are:

'DEPLORABLE' = -1.0
'BAD' = -0.625
'GOOD' = 0.875
'EXCELLENT' = 1.0

If a word of the tweet is not found in the sentiment dictionary, it is ignored. The overall sentiment of the tweet is the average of the sentiment scores that are found. If no sentiment scores are found for any of the words of the tweet, this tweet is ignored. The overall sentiment of a state is computed as the average sentiment score for all tweets that are associated with that state (ignoring those tweets that did not have a sentiment score). The state’s sentiment score is then mapped to a color between blue (negative) and red (positive) using a prescribed color gradient.
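
The averaging rule for a single tweet can be sketched as follows. This is a minimal illustration, not the provided code: the function name tweet_sentiment and the toy dictionary are hypothetical, and the lowercased keys are an assumption about how words are matched against the dictionary.

```python
def tweet_sentiment(words, sentiments):
    """Average the scores of the words found in `sentiments`.

    Words missing from the dictionary are ignored. Returns None when no
    word of the tweet has a score, so the caller can skip the tweet.
    """
    scores = [sentiments[w] for w in words if w in sentiments]
    if not scores:
        return None
    return sum(scores) / len(scores)


# Toy dictionary using the example scores above (keys lowercased by assumption).
sentiments = {"deplorable": -1.0, "bad": -0.625, "good": 0.875, "excellent": 1.0}

print(tweet_sentiment(["a", "good", "but", "bad", "idea"], sentiments))  # 0.125
print(tweet_sentiment(["no", "scored", "words"], sentiments))            # None
```

Returning None rather than 0.0 keeps "no data" distinct from a genuinely neutral tweet.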

Data provided

There is a file on Moodle called tweets.zip that includes nine json files of tweets collected using the Twitter API. Some of the files have a timestamp, while others do not have a timestamp. All of the files contain the text in the tweet and the latitude and longitude of the tweeter.

Code provided

There are several files provided in finalProjectFiles.zip that provide the functionality for calculating the sentiment from the tweet text and graphically rendering the sentiment for each state. The files include:

  • geo.py contains a GeoPosition class to represent a geographic location in terms of latitude and longitude. Each tweet will have a latitude and longitude that can be used to get its location relative to the states. State descriptions also have a latitude and longitude. Also included in GeoPosition is a distance method that properly computes the shortest distance between two geographic locations (based on the distance traveled on the great circle that connects them).
    The class also provides methods latitude and longitude, to access the individual components in a tweet.
  • tweet.py contains the Tweet class. An instance of that class represents a single twitter message. The class includes the following methods:
    • message() – returns a string that comprises the full body of the tweet
    • position() – returns a GeoPosition instance describing the location of the tweet.
    • timestamp() – returns a datetime instance describing the day and time at which the tweet was posted. (This information is only relevant for the extra credit challenge.)
  • state.py defines a State class used to represent information about a state. Each state has a standard two-letter abbreviation (e.g., MO for Missouri), that is returned by the abbrev() method. The boundaries of each state are defined with a series of geographic positions. The relevant information about State for you is that the State class supports a method, centroid(), that returns a single GeoPosition for the centroid of the state. Informally, the centroid is an “average” of all positions in the state, which can be used as an approximation for the entire state for determining the closest state for a tweet.
  • us_states.py module contains the actual data needed for representing the United States. You will not need to examine this file; it will be used by other parts of the project.
  • country.py defines a Country class that handles the actual rendering of the states. It supports the following two methods:
    • setFillColor(stateCode, color)
      This method causes the state with the given two-letter state code (e.g., ‘MO’) to be filled with the given color (specified either as a string or an RGB triple).
    • setTitle(title)
      This method sets the title of the window (it is ‘United States’ by default).
  • colors.py provides support for translating the numeric “sentiment” values into an appropriate color based on a fixed gradient suggested by Cynthia Brewer of Penn State University. In particular, the module defines a function, get_sentiment_color(sentimentValue), that returns an RGB triple of an appropriate color for the given numeric sentiment value. If None is passed as the parameter, it returns the color gray (which is different from the color indicated by a neutral sentiment of 0.0).
  • parse.py includes load_sentiments to load the sentiments dictionary.
  • The data folder contains the raw data for sentiment scores and tweets.
  • The samples folder contains four examples of complete images for the respective terms: bacon, bieber, cat, and dog. The bieber image is the one shown at the beginning of this page; others can be viewed for bacon, cat, and dog.

What you need to do

You need to use the data and code provided to generate a sentiment analysis on some topic. All of your code should go in the file trends.py. The file currently has a very basic class definition for a SentimentAnalysis class that loads the sentiments dictionary, the states list, and the Country instance.

Your code needs to read in the data files you are using. There are nine files provided, and you can use either the files with the created date or the ones without it. You only want to include tweets that contain a specified search term, hashtag, or keyword. For example, if you are analyzing the sentiment towards the recent election, you might want to include tweets only if they include Hillary or Trump in the text. You need to write the code to filter the data.
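
One possible way to filter is sketched below. The helper name matches_any is hypothetical, and case-insensitive substring matching is an assumption; the assignment does not say how a match should be defined. With the provided classes, the text would presumably come from each tweet's message() method.

```python
def matches_any(message, terms):
    """Return True if any search term occurs in the message, ignoring case."""
    lowered = message.lower()
    return any(term.lower() in lowered for term in terms)


# Made-up messages for illustration only.
messages = ["I voted for Hillary today", "Trump rally tonight", "nice weather"]
kept = [m for m in messages if matches_any(m, ["Hillary", "Trump"])]
print(kept)  # ['I voted for Hillary today', 'Trump rally tonight']
```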

Your primary tasks in this assignment are to loop through the provided data, and for each tweet that you include, compute the average sentiment for that tweet. You can do that by breaking the tweet into a sequence of words and looking up each word in the sentiment dictionary. The sentiment for the tweet is the average of all word sentiments for the tweet.

For example, if the original tweet were

justin bieber...doesn't deserve the award..eminem deserves it.

The words of the tweet should be considered:

['justin', 'bieber', 'doesn', 't', 'deserve', 'the', 'award', 'eminem', 'deserves', 'it']
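
A split of this kind can be produced with a regular expression. The sketch below assumes that a word is any maximal run of ASCII letters after lowercasing; the function name extract_words is hypothetical, and the provided code may already offer its own helper for this.

```python
import re


def extract_words(text):
    """Lowercase the text and return its maximal runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


tweet = "justin bieber...doesn't deserve the award..eminem deserves it."
print(extract_words(tweet))
# ['justin', 'bieber', 'doesn', 't', 'deserve', 'the', 'award', 'eminem', 'deserves', 'it']
```

Because punctuation acts as a separator, the apostrophe splits "doesn't" into 'doesn' and 't', as in the list above.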

Assuming the tweet has a sentiment score (that is, at least one word of the tweet was identified in the sentiments dictionary), assign this tweet’s sentiment score to the “closest” state. The rule that you should use is to assign the tweet to whichever state has its centroid closest to the location of the tweet. This is an imperfect rule (for example, because tweets from New York City will actually be closer to the centroids of Connecticut and New Jersey than to the centroid of New York state), but it is an easy rule to implement, and it will do for now.
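
The nearest-centroid rule can be sketched without the provided classes. In the project itself you would use each state's centroid() and the distance method of GeoPosition; since their exact signatures are not shown here, this illustration substitutes a stand-alone haversine function. The names great_circle_km and closest_state are hypothetical, and the centroid coordinates are rough values for demonstration, not the project's data.

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def closest_state(lat, lon, centroids):
    """Return the abbreviation whose centroid is nearest to (lat, lon)."""
    return min(centroids, key=lambda abbrev: great_circle_km(lat, lon, *centroids[abbrev]))


# Rough illustrative centroids as (latitude, longitude); not the project's data.
centroids = {"MO": (38.4, -92.5), "CO": (39.0, -105.5), "FL": (28.6, -82.4)}

print(closest_state(39.74, -104.99, centroids))  # a point in Denver -> CO
```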

Once you have scored all tweets and assigned those scores to the appropriate state, compute the cumulative sentiment for each state as the average of all sentiments that were assigned. Then use that sentiment to pick an appropriate color (using the get_sentiment_color function from our colors module), and set the state’s color in the visualization.
You should feel free to define any additional functions within the trends.py file that help you organize your code in a more clear and modular fashion.
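
The per-state averaging step might look like the sketch below. The function name average_by_state and the (state, score) pair layout are assumptions made for illustration. In the project, each resulting average would then be passed to get_sentiment_color and the returned color to the Country instance's setFillColor.

```python
from collections import defaultdict


def average_by_state(scored_tweets):
    """Map each state abbreviation to the mean of its tweet sentiments.

    `scored_tweets` is an iterable of (state_abbrev, sentiment) pairs that
    contains only tweets which actually received a sentiment score.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for abbrev, score in scored_tweets:
        totals[abbrev] += score
        counts[abbrev] += 1
    return {abbrev: totals[abbrev] / counts[abbrev] for abbrev in totals}


# Made-up scores for illustration only.
scored = [("MO", 0.5), ("MO", -0.25), ("CO", 0.875)]
print(average_by_state(scored))  # {'MO': 0.125, 'CO': 0.875}
```

States that never appear in the result have no data; for those, passing None to get_sentiment_color yields gray, as described for colors.py.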

Command-line arguments

Your program needs to take the search terms as command-line arguments, such as

>> python trends.py Trump #MakeAmericaGreatAgain

if you want to include tweets that match either of the search terms provided. If you only want one search term, you would call your program using

>> python trends.py Hillary
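
Reading the terms could be sketched as below. The helper name parse_terms and the usage message are hypothetical; in trends.py you would pass sys.argv instead of the hand-written list used here for demonstration.

```python
def parse_terms(argv):
    """Return the search terms that follow the script name in argv."""
    terms = argv[1:]
    if not terms:
        # Exit with a usage message when no search term is given.
        raise SystemExit("usage: python trends.py TERM [TERM ...]")
    return terms


# In the real program: terms = parse_terms(sys.argv), after `import sys`.
print(parse_terms(["trends.py", "Trump", "#MakeAmericaGreatAgain"]))
# ['Trump', '#MakeAmericaGreatAgain']
```

In some shells (bash, for example) an unquoted word that begins with # starts a comment, so a hashtag term may need to be quoted on the command line, as in '#MakeAmericaGreatAgain'.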

Some options for how you could use this data

  • Determine what people are saying in different states. This could include the sentiment only, or the sentiment weighted by the volume of tweets in a state.
  • Examine median sentiment values instead of the average sentiment.
  • Compare the results of different keywords or hashtags in the results.
  • Compare results by region instead of individual states.
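
For the median option, the standard library's statistics module is one way to compare the two summaries. The scores below are made up purely to show how a single strongly negative tweet pulls the mean down while the median stays put.

```python
from statistics import mean, median

# Made-up tweet sentiments for one state, for illustration only.
scores = [-1.0, 0.125, 0.25, 0.25, 0.375]

print(mean(scores))    # 0.0
print(median(scores))  # 0.25
```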

Report

Write a short, 1-2 page report describing what you did and any interesting results you generated. Your report should include the following three sections:
Purpose: What is the purpose of the assignment?
Procedure: What did you do? What code did you write? What functionality did you implement? What analysis did you do on the data?
Results: What were the results of the project? How did sentiments in different states compare to each other?