## Question 1: Identifying undervalued stocks

The main objectives of this question are as follows:

1. The website http://finviz.com/ provides comprehensive information on the stocks listed on NASDAQ and NYSE, so we are interested in obtaining the table it provides for further analysis.
2. However, this is not straightforward. When we apply getURL() to the page http://finviz.com/screener.ashx?v=152&r=0, the result is an empty string, which indicates that this is not the right approach. Alternatively, we observe that if we have the ticker symbol of a stock, say PIH, the same table of information can be obtained from the page http://finviz.com/quote.ashx?t=PIH.
3. Fortunately, a table of listed stocks can be found on the NASDAQ webpage http://www.nasdaq.com/screening/companies-by-name.aspx. Our first task is to collect this table of listed stocks from the NASDAQ webpage. Then, for each listed stock, the corresponding table of information is collected from finviz.com and stored in a data frame.
4. The next step is to prepare the data frame in a suitable format for analysis. The following are some of the variables that require cleaning:
• a) Earnings
• b) Optionable
• c) Shortable
• d) country
• e) Sector
• f) industry
• g) 52W Range
• h) IPOyear
• i) Index
• j) Volume
• k) Volatility
• l) Variables which contain the characters %, M, K, B and \$
5. If your cleaning in step 4 is done properly, the command summary(dataset) should produce useful answers.
6. Obtain a histogram of the prices of all stocks and a histogram of the prices of stocks priced below 150 in the dataset. Compare the two histograms.
7. Obtain a horizontal bar chart of the average price per Sector.
8. Obtain a horizontal bar chart of the 50 industries with the highest average prices.
9. Obtain a horizontal bar chart of the average price of each industry in the financial sector.
10. Since the property casualty insurers industry has the second-highest average price in the financial sector, obtain a horizontal bar chart of the 20 highest stock prices among property casualty insurers.
11. Create variables to locate stocks which sell below their sector averages on PE, PEG, PS, PB and Price respectively.
12. Create variables to locate stocks which sell below their industry averages on PE, PEG, PS, PB and Price respectively.
13. Items 11 and 12 together define 10 simplifying criteria for an undervalued stock. Create an index that counts the number of criteria each stock satisfies. We call this index the relative_value_index.
14. Besides the relative_value_index, suppose that other criteria for identifying an undervalued stock are as follows:
• a) Price per share is between \$20 and \$150
• b) Volume must be greater than 10,000
• c) Positive earnings per share and positive projected earnings per share
• d) Total debt to equity ratio less than 1
• e) Beta less than 1.5
• f) Institutional ownership less than 30 percent
• g) relative_value_index greater than 8

Identify the stocks in the dataset that satisfy all of the stated criteria.
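
The cleaning, indexing and screening steps above can be sketched in base R as follows. This is only a minimal illustration on a made-up four-row data frame, not the expected solution: the column names (`EPS`, `EPS_next`, `DebtEq`, `Beta`, `InstOwn`), the helper names `parse_finviz_number` and `below_group_mean`, and all the numbers are assumptions chosen for the example. It also assumes that "between \$20 and \$150" is inclusive and that percentages stay on a 0 to 100 scale.

```r
# Step 4(l): turn strings such as "12.5%", "$3.40", "1.2K", "45.6M", "2B" into numbers.
# A lone "-" or an empty string is treated as missing (an assumption).
parse_finviz_number <- function(x) {
  x <- gsub("[$,%]", "", trimws(x))
  x[x %in% c("", "-")] <- NA
  last <- substr(x, nchar(x), nchar(x))
  mult <- c(K = 1e3, M = 1e6, B = 1e9)[last]   # NA when there is no K/M/B suffix
  has_suffix <- !is.na(mult)
  x[has_suffix] <- substr(x[has_suffix], 1, nchar(x[has_suffix]) - 1)
  mult[!has_suffix] <- 1
  as.numeric(x) * unname(mult)
}

# Steps 11 and 12: TRUE when a stock sells below the mean of its group.
below_group_mean <- function(value, group) {
  value < ave(value, group, FUN = function(v) mean(v, na.rm = TRUE))
}

# Toy data (hypothetical tickers and values, already cleaned to numeric).
stocks <- data.frame(
  Ticker   = c("AAA", "BBB", "CCC", "DDD"),
  Sector   = c("Financial", "Financial", "Technology", "Technology"),
  Industry = c("Property & Casualty Insurance", "Property & Casualty Insurance",
               "Software", "Software"),
  PE       = c(8, 20, 15, 30),
  PEG      = c(0.8, 2.0, 1.5, 1.0),
  PS       = c(1, 3, 4, 6),
  PB       = c(0.9, 2.5, 3, 5),
  Price    = c(40, 120, 30, 200),
  Volume   = c(50000, 80000, 20000, 90000),
  EPS      = c(2.1, 1.0, 0.5, 3.0),
  EPS_next = c(2.4, 1.2, 0.7, 3.5),
  DebtEq   = c(0.4, 1.8, 0.2, 0.6),
  Beta     = c(0.9, 1.1, 1.2, 1.7),
  InstOwn  = c(25, 60, 10, 80)
)

# Step 13: count how many of the 10 criteria each stock satisfies.
metrics <- c("PE", "PEG", "PS", "PB", "Price")
flags <- list()
for (g in c("Sector", "Industry")) {
  for (m in metrics) {
    flags[[paste(m, "below", g, sep = "_")]] <- below_group_mean(stocks[[m]], stocks[[g]])
  }
}
# Missing comparisons count as "criterion not satisfied" because of na.rm = TRUE.
stocks$relative_value_index <- rowSums(as.data.frame(flags), na.rm = TRUE)

# Step 14: apply the remaining screening criteria.
undervalued <- subset(
  stocks,
  Price >= 20 & Price <= 150 & Volume > 10000 & EPS > 0 & EPS_next > 0 &
    DebtEq < 1 & Beta < 1.5 & InstOwn < 30 & relative_value_index > 8
)
print(undervalued[, c("Ticker", "Price", "relative_value_index")])
```

In the real dataset the same `parse_finviz_number` idea would be applied column by column before `summary(dataset)` is called. Each group average here is computed with `na.rm = TRUE`, which is one reasonable choice among several.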

## Question 2: Sentiment Analysis on Mr. Donald Trump's Election Campaign

The webpage http://datascienceplus.com/sentiment-analysis-on-donaldtrump-using-r-and-tableau/ provides a comprehensive sentiment analysis of Donald Trump's election campaign. The program downloads current tweets through the Twitter API. After the tweet messages are cleaned (each message contains no more than 140 characters), each tweet is given a score based on whether its words lean more positive or more negative. However, there are errors in the given program.
Our task is to rewrite the program so that it works and then report the results of this sentiment analysis in an Rmd file.
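
The scoring idea described above (positive words minus negative words per cleaned tweet) can be sketched in base R as shown below. This is an illustrative sketch rather than the program from the webpage: the function name `score_tweet`, the tiny word lists and the sample tweets are all made up for the example, and a real analysis would use a full opinion lexicon and tweets fetched from the API.

```r
# Score one tweet: +1 for each positive word, -1 for each negative word.
score_tweet <- function(text, pos_words, neg_words) {
  text <- tolower(text)
  text <- gsub("http[^[:space:]]*", " ", text)   # drop URLs
  text <- gsub("[^a-z[:space:]]", " ", text)     # keep letters and spaces only
  words <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  sum(words %in% pos_words) - sum(words %in% neg_words)
}

# Hypothetical mini-lexicon and sample tweets (not real data).
pos <- c("great", "good", "win")
neg <- c("bad", "sad", "terrible")
tweets <- c(
  "Great rally, great crowd! https://t.co/xyz",
  "Sad and terrible, but a good win",
  "Bad idea"
)

scores <- vapply(tweets, score_tweet, numeric(1),
                 pos_words = pos, neg_words = neg, USE.NAMES = FALSE)
print(data.frame(tweet = tweets, score = scores))
```

A score above zero suggests a tweet leaning positive, below zero leaning negative, and zero either neutral or balanced.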

## Question 3: US Election

As seen in the tutorial, the website http://www.elections.state.md.us/elections/2016/index.html contains data files of US election results since 2000. The program USElections.R demonstrated a way to download the data files for all years in which csv files are provided. However, we can also do the same task using an RSelenium server.
Our task is to write a program that drives an RSelenium server to the desired webpages automatically. Once a webpage is reached, the csv files on that page are downloaded.

## Question 4: A Shiny Application

This task is to write a Shiny application on the energy consumption of different appliances.
Suppose that we have the following appliances:

```
Appliance      Power Consumption
Clothes dryer  3000 Watts
Clothes iron   2400 Watts
Dishwasher     1600 Watts
```

The method to calculate energy consumption is given as follows:
Energy consumption per day (kWh/day) = Power Consumption times the number of usage hours per day divided by 1000.
Energy consumption per month (kWh/month) = Power Consumption times the number of usage hours per month divided by 1000.
Energy consumption per year (kWh/year) = Power Consumption times the number of usage hours per year divided by 1000.
The user selects one of the three appliances; once a selection is made, its power consumption is displayed automatically. The user then enters the number of usage hours per day. Once all the required information has been obtained from the user, the energy consumption per day, month and year is calculated and displayed on the right-hand side.
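
The calculation behind the application can be sketched as a plain R function, which a Shiny server function could then call whenever the inputs change. The names `appliance_watts` and `energy_consumption` are made up for this sketch. It also assumes that a month has 30 days and a year 365 days, since the task statement does not say how daily usage is converted to monthly and yearly hours.

```r
# Power consumption in watts, taken from the table above.
appliance_watts <- c("Clothes dryer" = 3000, "Clothes iron" = 2400, "Dishwasher" = 1600)

# Energy consumption in kWh per day, month and year for a given daily usage.
# days_per_month and days_per_year are assumptions, not part of the task statement.
energy_consumption <- function(appliance, hours_per_day,
                               days_per_month = 30, days_per_year = 365) {
  stopifnot(appliance %in% names(appliance_watts),
            hours_per_day >= 0, hours_per_day <= 24)
  watts <- appliance_watts[[appliance]]
  kwh_day <- watts * hours_per_day / 1000
  c(day = kwh_day,
    month = kwh_day * days_per_month,
    year = kwh_day * days_per_year)
}

# Example: a clothes dryer used 2 hours per day.
print(energy_consumption("Clothes dryer", 2))
```

Keeping the arithmetic in a separate function makes it easy to check the numbers before wiring them to the user interface.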