## Introduction

This assignment deals with location-based mobile marketing. You have data from a location-based marketing agency that runs geo-fencing campaigns on behalf of advertisers. Because the full data set is very large, you are given a random sample from two campaigns of a single advertiser, AMC Theaters. The advertising impressions are inserted into the mobile app in use on the device. The data include the following elements: impression size (e.g., 320x50 pixels), app category (e.g., IAB1), app review volume and valence, device OS (e.g., iOS), geo-fence lat/long coordinates, mobile device lat/long coordinates, and click outcome (0 or 1). The column names are self-explanatory, and a data dictionary file is also provided on Canvas.

## Analysis

## Data Processing

- a. Create dummy variable imp_large for the large impression
- b. Create dummy variables cat_entertainment, cat_social and cat_tech for app categories
- c. Create dummy variable os_ios for iOS devices
- d. Create variable distance using the Haversine formula to calculate the distance between a pair of latitude/longitude coordinates. Distance (in kilometers) = 6371 * acos( cos(radians(LATITUDE1)) * cos(radians(LATITUDE2)) * cos(radians(LONGITUDE1) - radians(LONGITUDE2)) + sin(radians(LATITUDE1)) * sin(radians(LATITUDE2)) )
- e. Create variable distance_squared by squaring variable distance
- f. Create variable ln_app_review_vol by taking natural log of app_review_vol
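
The processing steps above could be sketched in R roughly as follows. This is only an illustration under stated assumptions: the raw column names (`imp_size`, `app_topcat`, `device_os`, `device_lat`, `device_lon`, `geofence_lat`, `geofence_lon`) are hypothetical, and so are the value chosen to mean "large" (`"728x90"`) and the category codes mapped to each dummy. Check all of them against the data dictionary. The distance expression is the one given in item d, which is the spherical law-of-cosines form of great-circle distance; the clamp to [-1, 1] is an added guard against floating-point rounding.

```r
# Degrees to radians (base R has no radians() function).
to_rad <- function(deg) deg * pi / 180

# Great-circle distance in km, following the expression in item d.
# pmin/pmax clamp the acos argument to [-1, 1] so rounding cannot produce NaN.
geo_distance_km <- function(lat1, lon1, lat2, lon2) {
  x <- cos(to_rad(lat1)) * cos(to_rad(lat2)) *
         cos(to_rad(lon1) - to_rad(lon2)) +
       sin(to_rad(lat1)) * sin(to_rad(lat2))
  6371 * acos(pmin(pmax(x, -1), 1))
}

# Column names, the "large" size and the IAB codes are ASSUMPTIONS for
# illustration; replace them with the names in the data dictionary.
add_features <- function(df, large_size = "728x90") {
  df$imp_large         <- as.integer(df$imp_size == large_size)
  df$cat_entertainment <- as.integer(df$app_topcat == "IAB1")
  df$cat_social        <- as.integer(df$app_topcat == "IAB14")
  df$cat_tech          <- as.integer(df$app_topcat == "IAB19")
  df$os_ios            <- as.integer(df$device_os == "iOS")
  df$distance          <- geo_distance_km(df$device_lat, df$device_lon,
                                          df$geofence_lat, df$geofence_lon)
  df$distance_squared  <- df$distance^2
  # log(0) is -Inf, so inspect app_review_vol for zeros before modelling.
  df$ln_app_review_vol <- log(df$app_review_vol)
  df
}

# Two made-up rows, used only to show that the functions run.
toy <- data.frame(
  imp_size       = c("320x50", "728x90"),
  app_topcat     = c("IAB1", "IAB19"),
  device_os      = c("iOS", "Android"),
  device_lat     = c(0, 40),
  device_lon     = c(0, -75),
  geofence_lat   = c(0, 40),
  geofence_lon   = c(1, -75),
  app_review_vol = c(100, 2500),
  stringsAsFactors = FALSE
)
out <- add_features(toy)
```

In the toy rows, the first device sits one degree of longitude from its geo-fence along the equator and the second sits on top of its geo-fence, which gives two easy sanity checks on the distance function.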

## Descriptive Statistics

- a. Summarize the data by calculating the summary statistics (i.e., mean, median, std. dev., minimum and maximum) for didclick, distance, imp_large, cat_entertainment, cat_social, cat_tech, os_ios, ln_app_review_vol and app_review_val.
- b. Report the correlations among the above variables.
- c. Plot the relationship of distance (x-axis) and click-through-rate (y-axis), and any other pairs of variables of interest.
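
One possible way to produce these descriptives in base R is sketched below. The helper names (`summarize_vars`, `ctr_by_distance`) are hypothetical, and the data frame is synthetic, generated only so the code runs end to end; it says nothing about the real campaign data. Because didclick is binary, click-through rate is computed here as the mean of didclick within equal-width distance bins, which is one reasonable choice among several.

```r
# Summary statistics (one row per variable) for the requested columns.
summarize_vars <- function(df, vars) {
  stats <- sapply(df[vars], function(x) c(
    mean = mean(x), median = median(x), sd = sd(x),
    min = min(x), max = max(x)
  ))
  t(stats)
}

# Click-through rate by distance bin: mean of didclick within each bin.
ctr_by_distance <- function(df, n_bins = 10) {
  bins <- cut(df$distance, breaks = n_bins)
  data.frame(
    distance_mid = as.vector(tapply(df$distance, bins, mean)),
    ctr          = as.vector(tapply(df$didclick, bins, mean))
  )
}

# SYNTHETIC data, for illustration only.
set.seed(1)
n <- 500
toy <- data.frame(
  distance = runif(n, 0, 20),
  os_ios   = rbinom(n, 1, 0.5)
)
toy$didclick <- rbinom(n, 1, plogis(-1 - 0.1 * toy$distance))

vars  <- c("didclick", "distance", "os_ios")
stats <- summarize_vars(toy, vars)   # variables in rows, statistics in columns
cors  <- cor(toy[vars])              # pairwise Pearson correlations
ctr   <- ctr_by_distance(toy, n_bins = 5)

# Draw the plot only in an interactive session.
if (interactive()) {
  plot(ctr$distance_mid, ctr$ctr, type = "b",
       xlab = "Distance (km)", ylab = "Click-through rate")
}
```

With the real data, `vars` would list all nine variables named in item a.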

## Logistic Regression

- a. Specify the following logistic regression model:

Dependent variable: didclick

Independent variables: distance, distance_squared, imp_large, cat_entertainment, cat_social, cat_tech, os_ios, ln_app_review_vol and app_review_val.

- b. Estimate the model in R (using the glm function) and report the coefficients and p-values of the estimates.
- c. Discuss your findings and their implications, limiting your answer to about one page.
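
A minimal sketch of the estimation step with `glm` might look like the following. The wrapper names (`fit_click_model`, `coef_table`) are hypothetical, and the data are synthetic, simulated only to make the example runnable; the coefficients it produces have no bearing on the real campaigns. The category dummies are drawn from four mutually exclusive groups so that one group acts as the baseline, which is an assumption about how the real categories are coded.

```r
# Logistic regression with the variables listed in the assignment.
fit_click_model <- function(df) {
  glm(didclick ~ distance + distance_squared + imp_large +
        cat_entertainment + cat_social + cat_tech + os_ios +
        ln_app_review_vol + app_review_val,
      data = df, family = binomial(link = "logit"))
}

# Pull the coefficient estimates and p-values out of the model summary.
coef_table <- function(fit) {
  tab <- summary(fit)$coefficients
  data.frame(term     = rownames(tab),
             estimate = tab[, "Estimate"],
             p_value  = tab[, "Pr(>|z|)"],
             row.names = NULL)
}

# SYNTHETIC data, for illustration only.
set.seed(42)
n <- 2000
app_cat <- sample(c("ent", "soc", "tech", "other"), n, replace = TRUE)
toy <- data.frame(
  distance          = runif(n, 0, 20),
  imp_large         = rbinom(n, 1, 0.3),
  cat_entertainment = as.integer(app_cat == "ent"),
  cat_social        = as.integer(app_cat == "soc"),
  cat_tech          = as.integer(app_cat == "tech"),
  os_ios            = rbinom(n, 1, 0.5),
  ln_app_review_vol = rnorm(n, 8, 2),
  app_review_val    = runif(n, 1, 5)
)
toy$distance_squared <- toy$distance^2
toy$didclick <- rbinom(n, 1, plogis(-2 - 0.05 * toy$distance))

fit <- fit_click_model(toy)
tab <- coef_table(fit)
print(tab)
```

The estimates are on the log-odds scale, so `exp(tab$estimate)` gives odds ratios, which may be easier to interpret in the discussion for item c.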