Your challenge in this assignment is to develop an interactive, data-driven, web-based Python application that shows your mastery of many coding concepts as you interact with real-world data. You will use the Pandas and NumPy modules for managing and interacting with data, MatPlotLib or Pandas charts for plotting, and the Streamlit.io package for creating interactive web applications using Python.
Choose one of these data sets:
- Cambridge, MA AirBnB Data (695 rows, data from insideairbnb.com)
- USA Earthquake Data (20,000 rows, from the US Geological Survey website and visualizations at https://usgs.gov)
- McDonald's Locations in the USA (14,171 rows, from GavinR’s GitHub)
- Boston Uber and Lyft Rideshare Data from 2018, downloaded from Kaggle (693,702 rows)
To ensure students create a variety of projects, you will sign up for the data set you wish to use in class on Thursday. If you miss class, or if the signups are not approximately equally distributed, I will assign a data set for you to use.
If you decide to use the McDonald's data, please run the clean_mcdonalds.py file that accompanies the data to add leading zeros to ZIP codes in the Northeast.
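The contents of clean_mcdonalds.py are not reproduced here, so the following is only a minimal sketch of the underlying idea, assuming a hypothetical `zip` column: when pandas parses ZIP codes as numbers, a code such as 02134 becomes 2134. Reading the column as text and padding it to five characters with `str.zfill` puts the leading zero back.

```python
import io

import pandas as pd

# Made-up sample rows; the real file's column names may differ (assumption).
SAMPLE_CSV = """name,city,state,zip
Store A,Boston,MA,2134
Store B,Providence,RI,2903
Store C,Austin,TX,78701
"""


def pad_zip_codes(df: pd.DataFrame, column: str = "zip") -> pd.DataFrame:
    """Return a copy of df with ZIP codes stored as 5-character strings."""
    cleaned = df.copy()
    # astype(str) covers a column that was already parsed as integers;
    # zfill(5) left-pads each value with zeros until it is 5 characters long.
    cleaned[column] = cleaned[column].astype(str).str.zfill(5)
    return cleaned


# dtype={"zip": str} keeps the digits as text instead of converting to numbers.
stores = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype={"zip": str})
stores = pad_zip_codes(stores)
print(stores["zip"].tolist())  # ['02134', '02903', '78701']
```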
Your Python code should demonstrate your Python coding skills as you implement several of the concepts that we studied throughout the course that are appropriate for your project, such as:
- Coding Fundamentals: data types, if statements, loops, formatting, etc.
- Data Structures: Interact with Lists, Tuples, Dictionaries (keys, values, items)
- Functions: passing parameters, returning values
- Files: Reading data from a CSV File
- Statistics or Pandas module functions for calculating mean, median, etc.
- MatPlotLib or Pandas for creating different types of charts
- Streamlit.io for making interactive applications and displaying charts and maps
- NumPy functions for interacting with arrays (such as np.arange)
- Pandas DataFrames for interacting and manipulating large data sets using filtering, sorting, pivot tables, etc.
You are not required to use all of these. For example, if your application does not need to use a tuple, do not worry about trying to find a way to include one.
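As a small illustration of how the NumPy and Pandas items above can work together, the sketch below uses `np.arange` to build evenly spaced bin edges and `pd.cut` to count how many earthquakes fall into each magnitude range. The `mag` column name and the magnitudes are assumptions made up for the example; check the actual column names in your own data set.

```python
import numpy as np
import pandas as pd

# Made-up magnitudes in a hypothetical "mag" column.
quakes = pd.DataFrame({"mag": [2.1, 2.7, 3.4, 3.9, 4.2, 5.6, 2.5]})

# np.arange(start, stop, step) excludes stop, so this yields 2.0, 3.0, 4.0, 5.0, 6.0.
bin_edges = np.arange(2.0, 7.0, 1.0)

# right=False makes each bin half-open, e.g. [2.0, 3.0) means 2.0 <= mag < 3.0.
quakes["mag_range"] = pd.cut(quakes["mag"], bins=bin_edges, right=False)

# Frequency count per bin; sort_index() lists the bins from low to high.
counts = quakes["mag_range"].value_counts().sort_index()
print(counts)
```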
The purpose of this part is to get you thinking about what you might do before you start coding. Identify two different queries or questions you can ask about your data set and ways to interact with and present the data based on your understanding of Pandas DataFrames, MatPlotLib, and the Streamlit.io packages.
Describe how your queries will be interactive by incorporating Streamlit’s user interface elements to obtain user input. Describe how you will visually present this data using charts, graphs, Streamlit tables, or maps. For example, if analyzing housing data, you might use a dropdown list to choose a neighborhood and a slider to specify a price range. You could then display all rooms for rent in that neighborhood within that price range using a table, chart, or map. (That’s an easy one. At least one of your queries needs to be more complex than this!)
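The neighborhood-and-price example above might be sketched as follows. This is a hypothetical outline, not a required design: it assumes an AirBnB-style DataFrame with `neighbourhood`, `price` (already numeric), `latitude`, and `longitude` columns. The filtering logic lives in a plain Pandas function, and the Streamlit widgets (`st.sidebar.selectbox`, a two-value `st.sidebar.slider`, `st.dataframe`, `st.map`) sit in a separate function that imports Streamlit only when it is called.

```python
import pandas as pd


def filter_listings(listings: pd.DataFrame, neighborhood: str,
                    price_range: tuple) -> pd.DataFrame:
    """Return listings in one neighborhood whose price lies in price_range."""
    low, high = price_range
    # Column names are assumptions; between() includes both endpoints.
    mask = ((listings["neighbourhood"] == neighborhood)
            & listings["price"].between(low, high))
    return listings[mask].sort_values("price")


def render_sidebar_query(listings: pd.DataFrame) -> None:
    """Draw sidebar controls and show matching rows; requires Streamlit."""
    import streamlit as st  # imported here so filter_listings works without it

    st.sidebar.header("Find a rental")
    neighborhood = st.sidebar.selectbox(
        "Neighborhood", sorted(listings["neighbourhood"].unique()))
    lowest, highest = int(listings["price"].min()), int(listings["price"].max())
    # Passing a (low, high) tuple as value turns the slider into a range slider.
    price_range = st.sidebar.slider("Nightly price range ($)", min_value=lowest,
                                    max_value=highest, value=(lowest, highest))
    results = filter_listings(listings, neighborhood, price_range)
    st.write(f"{len(results)} listings match your choices.")
    st.dataframe(results)
    # st.map reads columns named latitude/longitude (or lat/lon).
    st.map(results[["latitude", "longitude"]])
```

In an app file you would load the CSV once, call `render_sidebar_query(listings)`, and launch it with `streamlit run`. Keeping the filter separate from the widgets also makes it easy to check the query logic on its own.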
Be sure your page is “user friendly” and as “polished” as possible. Provide ample user instructions, label every value that is part of the user interaction, and make sure your charts have titles, legends, or explanations that would help the user.
Create a Word document describing your plans and submit it on Blackboard only. Within 24 hours I will respond on Blackboard, either approving your proposed questions or making suggestions if they appear to be too complicated or too easy. Due dates for proposal.
You may change your queries or visualizations after you start coding if your plans change. If you do, please notify me during the coding week.
Feel free to add to your project as you explore Pandas and Streamlit capabilities and find cool ways to implement new features. Part of your grade will be a “complexity/originality” score. If you use a module or do something cool that we may not have discussed in class, that will give you a higher score.
Create your Python application with a Streamlit UI and the various visualizations. Create at least two different visualizations, such as charts or graphs of different types (with custom legends, axis labels, tick marks, colors, or other features) or a map showing latitude and longitude. Be sure to include appropriate context or labels in your user interface to cue the user about which values to specify and the purpose of each chart or graph. You may wish to add a few sentences explaining each chart. Place all UI controls in the left sidebar and your visualizations in the main content area. Make your application as professional looking as you can.
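One way to meet the labeling requirements is to build each chart in a function that returns a Matplotlib figure, then hand that figure to Streamlit. The sketch below is a hypothetical example: the function name, the color, and the room-type counts are made up, and you would replace the counts with something like a `value_counts()` result from your own data.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; Streamlit displays the finished figure
import matplotlib.pyplot as plt
import pandas as pd


def build_bar_chart(counts: pd.Series, title: str, x_label: str,
                    y_label: str = "Count"):
    """Return a bar chart figure with a title, axis labels, colors, and a legend."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(counts.index.astype(str), counts.values, color="#4C72B0",
           edgecolor="black", label=y_label)
    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.tick_params(axis="x", labelrotation=30)  # tilt long category names
    ax.legend(loc="upper right")
    fig.tight_layout()
    return fig


# Made-up counts for illustration only.
room_counts = pd.Series({"Entire home/apt": 5, "Private room": 3, "Shared room": 1})
chart = build_bar_chart(room_counts, "Listings by Room Type", "Room type",
                        "Number of listings")
```

In the app, `st.pyplot(chart)` places the figure in the main content area while the controls stay in `st.sidebar`.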
As you write your code, be sure to demonstrate your mastery of these capabilities in your project:
- At least one function that has two parameters and returns a value
- At least one function that does not return a value
- Interacting with dictionaries, lists, and tuples
- Using a Python module to calculate a statistical function such as average, median, mode, etc.
- User Interface and dashboard with Streamlit.io
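The first four requirements above can be combined in a few lines of plain Python. The sketch below is only an illustration with made-up prices and hypothetical function names: `prices_by_area` takes two parameters and returns a dictionary, `print_summary` returns nothing, the records are tuples inside a list, and the `statistics` module supplies the mean and median.

```python
import statistics

# A list of (neighborhood, nightly price) tuples; the values are made up.
listings = [("Cambridgeport", 180), ("Harvard Square", 250),
            ("Cambridgeport", 120), ("Harvard Square", 300), ("Riverside", 95)]


def prices_by_area(records, min_price):
    """Two parameters, returns a value: {area: [prices at or above min_price]}."""
    grouped = {}
    for area, price in records:  # unpack each tuple
        if price >= min_price:
            grouped.setdefault(area, []).append(price)
    return grouped


def print_summary(grouped):
    """Print one line per area; this function does not return a value."""
    for area, prices in sorted(grouped.items()):  # dictionary items as (key, value)
        print(f"{area}: median ${statistics.median(prices):,.2f}, "
              f"mean ${statistics.mean(prices):,.2f}")


summary = prices_by_area(listings, 100)
print_summary(summary)
```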
Your code should demonstrate your mastery of at least three Pandas capabilities as appropriate for your queries and data. These include:
- Sorting data in ascending or descending order, multi-column sorting
- Filtering data by one or more conditions
- Analyzing data with pivot tables
- Managing rows or columns
- Adding, dropping, selecting, creating, or grouping columns; frequency counts; and other features as you wish
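The Pandas capabilities listed above might look like this on a small DataFrame. This is a sketch only: the rows are invented, and the column names (`cab_type`, `source`, `distance`, `price`) are assumed to resemble the rideshare data set, so confirm them against the actual file before reusing the code.

```python
import pandas as pd

# Made-up rides; column names are assumptions modeled on a rideshare file.
rides = pd.DataFrame({
    "cab_type": ["Uber", "Lyft", "Uber", "Lyft", "Uber", "Lyft"],
    "source": ["Back Bay", "Back Bay", "Fenway", "Fenway", "Back Bay", "Fenway"],
    "distance": [1.2, 2.5, 0.8, 3.1, 2.0, 1.5],
    "price": [9.5, 16.0, 7.0, 22.5, 13.0, 11.0],
})

# Filtering by two conditions: parenthesize each condition and combine with &.
longer_rides = rides[(rides["distance"] >= 1.0) & (rides["price"] < 20)]

# Multi-column sort: source A to Z, then price high to low within each source.
ranked = longer_rides.sort_values(["source", "price"], ascending=[True, False])

# Create a new column derived from existing ones.
ranked = ranked.assign(price_per_mile=ranked["price"] / ranked["distance"])

# Pivot table: mean price for each pickup area (rows) and company (columns).
pivot = pd.pivot_table(rides, values="price", index="source",
                       columns="cab_type", aggfunc="mean")

# Frequency count of rides per company.
ride_counts = rides["cab_type"].value_counts()
print(ranked, pivot, ride_counts, sep="\n\n")
```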
Usual rules about writing “good” code apply:
- Make your code as modular and easy to follow as possible
- Include a docstring, comments, and meaningful variable names.
- If you did something “cool” in your code that you are incredibly proud of, please write a comment calling attention to what you did.
- If you referred to any online articles or other information beyond class examples, please be sure to list them as references in your code.
- Make sure the program runs and the output is correct.
Use this documentation string at the top of your code file:
CS230: Section XXX
Name: Your Name
Data: Which data set you used
This program ... (a few sentences about your program and the queries and charts)
I pledge that I have completed the programming assignment independently.
I have not copied the code from a student or any source.
I have not given my code to any student.
URL: Link to your web application online (see extra credit)
All presentations will be done in class over two class periods. Please let me know by Wednesday, December 9, whether you plan to present earlier in the week (Monday for HB1 and Tuesday for HB3) or on Thursday (both classes). If the signups are not approximately equally distributed, I will assign a day for you to present.
Post your application to the web by following these Streamlit Sharing instructions. This is a newly released feature, and it may take a few days before your request is fulfilled, so sign up for the invite now! As an alternative, you can deploy it to a server on Heroku by following these instructions or similar tutorials you find online by searching for “streamlit deploy heroku”. The extra credit is five points added to your assignment score.