Introduction

This assignment broadly deals with location-based mobile marketing. You have data from a location-based marketing agency which handles geo-fencing campaigns on behalf of advertisers. Due to the very large volume of data, you are given a random sample for two campaigns of a single advertiser - AMC Theaters. The advertising impressions are inserted into the mobile app being used on the device. The data include the following elements: impression size (e.g., 320x50 pixels), app category (e.g., IAB1), app review volume and valence, device OS (e.g., iOS), geo-fence lat/long coordinates, mobile device lat/long coordinates, and click outcome (0 or 1). The column names are self-explanatory, although we have provided a data dictionary file on Canvas.

Analysis

Data Processing

• a. Create dummy variable imp_large for the large impression
• b. Create dummy variables cat_entertainment, cat_social and cat_tech for app categories
• c. Create dummy variable os_ios for iOS devices
• d. Create variable distance, the great-circle distance between a pair of latitude/longitude coordinates, using the Haversine formula (the expression below is the equivalent spherical law of cosines). Distance (in kilometers) = 6371 * acos( cos(radians(LATITUDE1)) * cos(radians(LATITUDE2)) * cos(radians(LONGITUDE1) - radians(LONGITUDE2)) + sin(radians(LATITUDE1)) * sin(radians(LATITUDE2)) )
• e. Create variable distance_squared by squaring variable distance
• f. Create variable ln_app_review_vol by taking natural log of app_review_vol
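The processing steps above can be sketched in R as follows. This is only an illustration on a toy data frame, not the assignment data: every raw column name except app_review_vol (imp_size, app_cat, device_os, geofence_lat, geofence_lon, device_lat, device_lon) is hypothetical, as are the choice of "728x90" as the large impression and the mapping of IAB1, IAB14 and IAB19 to entertainment, social and tech. Check the data dictionary for the actual names and codes.

```r
# Great-circle distance in km, using the expression from item d
# (spherical law of cosines). Inputs are in decimal degrees.
great_circle_km <- function(lat1, lon1, lat2, lon2) {
  rad <- pi / 180
  cos_angle <- cos(lat1 * rad) * cos(lat2 * rad) * cos((lon1 - lon2) * rad) +
    sin(lat1 * rad) * sin(lat2 * rad)
  # Clamp to [-1, 1]: rounding error can push the value just outside acos's domain
  cos_angle <- pmin(1, pmax(-1, cos_angle))
  6371 * acos(cos_angle)
}

# Toy data; column names and values are assumptions for illustration only
ads <- data.frame(
  imp_size       = c("320x50", "728x90", "320x50"),
  app_cat        = c("IAB1", "IAB14", "IAB19"),
  device_os      = c("iOS", "Android", "iOS"),
  geofence_lat   = c(40.75, 34.05, 41.88),
  geofence_lon   = c(-73.99, -118.24, -87.63),
  device_lat     = c(40.76, 34.10, 41.90),
  device_lon     = c(-73.98, -118.30, -87.65),
  app_review_vol = c(1200, 35, 560)
)

# a-c. Dummy variables (1 if the condition holds, 0 otherwise)
ads$imp_large         <- as.integer(ads$imp_size == "728x90")  # assumed "large" size
ads$cat_entertainment <- as.integer(ads$app_cat == "IAB1")     # assumed code
ads$cat_social        <- as.integer(ads$app_cat == "IAB14")    # assumed code
ads$cat_tech          <- as.integer(ads$app_cat == "IAB19")    # assumed code
ads$os_ios            <- as.integer(ads$device_os == "iOS")

# d-f. Distance, its square, and log review volume
ads$distance <- great_circle_km(ads$geofence_lat, ads$geofence_lon,
                                ads$device_lat, ads$device_lon)
ads$distance_squared  <- ads$distance^2
ads$ln_app_review_vol <- log(ads$app_review_vol)

print(ads[, c("imp_large", "os_ios", "distance", "ln_app_review_vol")])
```

Note that log() returns -Inf for a review volume of 0, so it is worth checking the minimum of app_review_vol before taking logs.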

Descriptive Statistics

• a. Summarize the data by calculating the summary statistics (i.e., mean, median, std. dev., minimum and maximum) for didclick, distance, imp_large, cat_entertainment, cat_social, cat_tech, os_ios, ln_app_review_vol and app_review_val.
• b. Report the correlations among the above variables.
• c. Plot the relationship between distance (x-axis) and click-through rate (y-axis), as well as any other pairs of variables of interest.
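One possible way to produce these descriptives in base R is sketched below. The data are simulated purely so the snippet runs on its own; the 1 km bin width used to turn the 0/1 click outcome into a click-through rate is an assumption, and other binning choices are equally defensible.

```r
set.seed(1)
n <- 500

# Simulated stand-in for the processed data (values are not real)
app_cat <- sample(c("ent", "soc", "tech", "other"), n, replace = TRUE)
ads <- data.frame(
  didclick          = rbinom(n, 1, 0.05),
  distance          = runif(n, 0, 10),
  imp_large         = rbinom(n, 1, 0.3),
  cat_entertainment = as.integer(app_cat == "ent"),
  cat_social        = as.integer(app_cat == "soc"),
  cat_tech          = as.integer(app_cat == "tech"),
  os_ios            = rbinom(n, 1, 0.5),
  ln_app_review_vol = log(sample(10:5000, n, replace = TRUE)),
  app_review_val    = runif(n, 1, 5)
)

vars <- c("didclick", "distance", "imp_large", "cat_entertainment",
          "cat_social", "cat_tech", "os_ios", "ln_app_review_vol",
          "app_review_val")

# a. One row per variable: mean, median, std. dev., minimum, maximum
summary_stats <- t(sapply(ads[vars], function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), min = min(x), max = max(x))
}))
print(round(summary_stats, 3))

# b. Pairwise Pearson correlations
cor_matrix <- cor(ads[vars])
print(round(cor_matrix, 2))

# c. Click-through rate by 1 km distance bin (share of impressions clicked)
breaks <- seq(0, 10, by = 1)
ads$dist_bin <- cut(ads$distance, breaks = breaks, include.lowest = TRUE)
ctr_by_bin <- tapply(ads$didclick, ads$dist_bin, mean)
bin_mid <- head(breaks, -1) + 0.5
plot(bin_mid, ctr_by_bin, type = "b",
     xlab = "Distance (km)", ylab = "Click-through rate")
```

Because didclick is binary, plotting it directly against distance is hard to read; averaging within bins gives an empirical click-through rate per distance range.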

Logistic Regression

• a. Specify the following logistic regression model:
Dependent variable: didclick
Independent variables: distance, distance_squared, imp_large, cat_entertainment, cat_social, cat_tech, os_ios, ln_app_review_vol and app_review_val.
• b. Estimate the model in R (using the glm function) and report the coefficients and p-values of the estimates.
• c. Discuss your findings and their implications, limiting your answer to a page or so.
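As a rough sketch of the estimation step, the model can be fitted with glm and family = binomial. The data below are simulated only so that the call runs; the coefficients used to generate the clicks are arbitrary and say nothing about the real campaigns.

```r
set.seed(42)
n <- 2000

# Simulated stand-in for the processed data (values are not real)
app_cat <- sample(c("ent", "soc", "tech", "other"), n, replace = TRUE)
ads <- data.frame(
  distance          = runif(n, 0, 10),
  imp_large         = rbinom(n, 1, 0.3),
  cat_entertainment = as.integer(app_cat == "ent"),
  cat_social        = as.integer(app_cat == "soc"),
  cat_tech          = as.integer(app_cat == "tech"),
  os_ios            = rbinom(n, 1, 0.5),
  ln_app_review_vol = log(sample(10:5000, n, replace = TRUE)),
  app_review_val    = runif(n, 1, 5)
)
ads$distance_squared <- ads$distance^2

# Clicks generated from an arbitrary logit model, for illustration only
ads$didclick <- rbinom(n, 1, plogis(-1.5 - 0.2 * ads$distance + 0.5 * ads$imp_large))

# Logistic regression with the specified dependent and independent variables
fit <- glm(
  didclick ~ distance + distance_squared + imp_large + cat_entertainment +
    cat_social + cat_tech + os_ios + ln_app_review_vol + app_review_val,
  data = ads,
  family = binomial(link = "logit")
)

# Coefficient table: estimate, standard error, z value, p-value
coef_table <- summary(fit)$coefficients
print(round(coef_table, 4))

# Exponentiated coefficients can be read as odds ratios
odds_ratios <- exp(coef(fit))
print(round(odds_ratios, 3))
```

The coefficients are on the log-odds scale, so a positive estimate means higher odds of a click. With both distance and distance_squared in the model, the effect of distance depends on the distance itself and the two coefficients should be interpreted together.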