In this project, you are expected to write some SQL queries and execute them in a database server we built for you. The schemas of the tables will be given and such tables are loaded with real NBA statistics data. Specifically, the database contains the following six tables:
- coaches_season, each tuple of which describes the performance of one coach in one season;
- teams, each tuple of which gives the basic information of a team;
- players, each tuple of which gives the basic information of one player;
- player_rs, each tuple of which gives the detailed performance of one player in one regular season;
- player_rs_career, each tuple of which gives the detailed regular-season performance of one player in his career;
- draft, each tuple of which shows the information of an NBA draft.
Your job should be done in the machine (the same one you used for project 1). The DBMS used in this project is PostgreSQL (www.postgresql.org). After remotely logging into catalyst by ssh, you can connect to the database using the following command:
and you should see a prompt like this:
and now you are ready to type in any SQL statements you want. Try some simple things like “select * from teams;” to make sure you are really connected. Query against all the tables to get familar with the schema of each table. Note that there has to be a semicolon after each SQL statement you write, otherwise the system will not start processing your query.
Your task in this project is to write the following queries in SQL and make sure they run in the PostgreSQL database. If you make any assumptions, clearly state them as comments in your submitted file.
- Find all the coaches who have coached exactly ONE team. List their first names followed by their last names;
- Find all the players who played in a Boston team and a Denver team (this does not have to happen in the same season). List their first names only.
- Find those who happened to be a coach and a player in the same team in the same season. List their first names, last names, the team where this happened, and the year(s) when this happened.
- Find the average height (in centimeters) of each team coached by Phil Jackson in each season. Print the team name, season and the average height value (in centimeters), and sort the results by the average height.
- Find the coach(es) (first name and last name) who have coached the largest number of players in year 1999.
- Find the coaches who coached in ALL leagues. List their first names followed by their last names.
- Find those who happened to be a coach and a player in the same season, but in different teams. List their first names, last names, the season and the teams this happened.
- Find the players who have scored more points than Michael Jordan did. Print out the first name, last name, and total number of points they scored.
- Find the second most successful coach in regular seasons in history. The level of success of a coach is measured as season_win /(season_win + season_loss). Note that you have to count in all seasons a coach attended to calculate this value.
- List the top 10 schools that sent the largest number of drafts to NBA. List the name of each school and the number of drafts sent. Order the results by number of drafts (hint: use “order by” to sort the results and ‘limit xxx’ to limit the number of rows returned);
Note that the data is not perfectly formatted for our use. For that, you have to try some other techniques that are not a part of the SQL, but are supported by PsotgreSQL. For example, the ID of the same palyer (e.g., ILKID) can be shown as all upper case in one table (e.g., players) but in a mixture of upper and lower cases in another table (e.g., draft). To find more matches, you can use the ‘LOWER’ or ‘UPPER’ function to transform a string to all lower or upper case and then do the match.
You can download a sample script file named proj2.sql from ‘Files/proj2” of Canvas. When you feel comfortable with your queries, copy and paste them to proj2.sql and rename the file to proj2-xxx.sql where xxx is your NetID. Submit this file only! Put all comments, assumption statements in this file (note that a line that starts with “–” is regarded as comments and will not be processed by DBMS). The purpose of having this script file is to run all queries written in the file by typing one command in psql:
NBA=> \i proj2-xxx.sql
The output of this script can be dumped to a file named “proj2-xxx.out” by
NBA=> \o proj2-xxx.out
You can type ‘ \o ‘ to change the output device back to your psql console.
Attention!! You must try the above command to run all your queries as a whole batch in the submitted file and make sure all queries work in the way you expected. You will get heavy deductions (up to 30%) if your query only works when typed into the console one by one!
Data used in this project is provided by basketballconference.com.