CSSE 290 Web Programming
Lecture 26: Multi-table SQL Queries (Joins)
Reading: 13.3 - 13.5
Attribution:Except where otherwise noted, the contents of this document are
Copyright 2012 Marty Stepp, Jessica Miller, and Victoria Kirst. All rights reserved.
Any redistribution, reproduction, transmission, or storage of part
or all of the contents in any form is prohibited without the author's
expressed written permission.
Otherwise noted: Claude Anderson was given permission to modify the slides for CSSE 290 at Rose-Hulman by author Jessica Miller.
The authors' original slides, based on Web Programming Step by Step, can be seen at http://webstepbook.com.
Some of the examples in some days' slides are from David Fisher at Rose-Hulman, who was kind enough to allow me to use them.
My intention is to mark these examples with [DSF].
Your database on wwwuser
- http://wwwuser.csse.rose-hulman.edu/phpMyAdmin/
- You can also access the database from the command line
-
Use SecureCRT, puTTY or some other SSH program to commect to server
wwwuser.csse.rose-hulman.edu
-
mysql -u username -p
- Your initial password is your username (change it and don't forget it)
- You have one database that you can change
- Can add the tables from the three small databases, and any other table you'd like
- You also have read-only access to the full
imdb
database
Discovering databases and tables
SHOW DATABASES;
SHOW TABLES;
DESCRIBE table;
SHOW TABLES;
+-----------+
| students |
| courses |
| grades |
| teachers |
+-----------+ 4 rows in set
The SQL statement
INSERT INTO table
VALUES (value, value, ..., value);
INSERT INTO students
VALUES (789, "Nelson", "muntz@fox.com", "haha!");
- adds a new row to the given table
- columns' values should be listed in the same order as in the table
- How would we record that Nelson took CSSSE290 and got a B+ in it?
More about INSERT
INSERT INTO table (columnName, columnName, ..., columnName)
VALUES (value, value, ..., value);
INSERT INTO students (name, email)
VALUES ("Lewis", "lewis@fox.com");
- some columns have default or auto-assigned values (such as IDs)
- omitting them from the INSERT statement uses the defaults
The SQL statement
REPLACE INTO table (columnName, columnName, ..., columnName)
VALUES (value, value, ..., value);
REPLACE INTO students
VALUES (789, "Martin", "prince@fox.com");
- just like INSERT, but if an existing row exists for that key (ID), it will be replaced
- can pass optional list of column names, like with INSERT
The SQL statement
UPDATE table
SET column = value,
...,
column = value
WHERE column = value;
UPDATE students
SET email = "lisasimpson@gmail.com"
WHERE id = 888;
- modifies an existing row(s) in a table
- BE CAREFUL! If you omit the WHERE, it modifies ALL rows
The SQL statement
DELETE FROM table
WHERE condition;
DELETE FROM students
WHERE id = 888;
- removes existing row(s) in a table
- can be used with other syntax like LIMIT, LIKE, ORDER BY, etc.
- BE CAREFUL! If you omit the WHERE, it deletes ALL rows
and
an entire database
CREATE DATABASE name;
DROP DATABASE name;
CREATE DATABASE warcraft;
- adds/deletes an entire database from the server
Creating and deleting a table
CREATE TABLE name (
columnName type constraints,
...
columnName type constraints
);
DROP TABLE name;
CREATE TABLE students (
id INTEGER,
name VARCHAR(20),
email VARCHAR(32),
password VARCHAR(16)
);
- adds/deletes a table from this database
- all columns' names and types must be listed (see next slide)
SQL
BOOLEAN
: either TRUE
or FALSE
INTEGER
DOUBLE
VARCHAR(length)
: a string
ENUM(value, ..., value)
: a fixed set of values
DATE
, TIME
, DATETIME
BLOB
: binary data
Column constraints
CREATE TABLE students (
id INTEGER UNSIGNED NOT NULL PRIMARY KEY,
name VARCHAR(20) NOT NULL,
email VARCHAR(32),
password VARCHAR(16) NOT NULL DEFAULT "12345"
);
NOT NULL
: not allowed to insert a null/empty value in any row for that column
PRIMARY KEY
/ UNIQUE
: no two rows can have the same value
DEFAULT value
: if no value is provided, use the given default
AUTO_INCREMENT
: default value is the last row's value plus 1 (useful for IDs)
UNSIGNED
: don't allow negative numbers (INTEGER
only)
Rename a table
ALTER TABLE name RENAME TO newName;
ALTER TABLE students RENAME TO children;
- changes the name of an existing table
Add/remove/modify a column in a table
ALTER TABLE name
ADD COLUMN columnName type constraints;
ALTER TABLE name DROP COLUMN columnName;
ALTER TABLE name
CHANGE COLUMN oldColumnName newColumnName type constraints;
- adds/deletes/respecifies a column in an existing table
- if a column is added, all existing rows are given a default value for that column
13.3: Multi-table Queries
-
13.1: Database Basics
-
13.2: SQL
-
13.3: Multi-table Queries
-
13.4: Databases and PHP
Related tables and keys
students
id | name | email |
123 | Bart | bart@fox.com |
456 | Milhouse | milhouse@fox.com |
888 | Lisa | lisa@fox.com |
404 | Ralph | ralph@fox.com |
|
courses
id | name | teacher_id |
10001 | Computer Science 142 | 1234 |
10002 | Computer Science 143 | 5678 |
10003 | Computer Science 190M | 9012 |
10004 | Informatics 100 | 1234 |
|
grades
student_id | course_id | grade |
123 | 10001 | B- |
123 | 10002 | C |
456 | 10001 | B+ |
888 | 10002 | A+ |
888 | 10003 | A+ |
404 | 10004 | D+ |
|
teachers
id | name |
1234 | Krabappel |
5678 | Hoover |
9012 | Stepp |
|
- primary key: a column guaranteed to be unique for each record (e.g. Lisa Simpson's ID 888)
- foreign key: a column in table A storing a primary key value from table B
- (e.g. records in
grades
with student_id
of 888 are Lisa's grades)
- normalizing: splitting tables to improve structure / redundancy (linked by unique IDs)
Querying multi-table databases
When we have larger datasets spread across multiple tables, we need queries that can answer high-level questions such as:
- What courses has Bart taken and gotten a B- or better?
- What courses have been taken by both Bart and Lisa?
- Who are all the teachers Bart has had?
- How many total students has Ms. Krabappel taught, and what are their names?
To do this, we'll have to join data from several tables in our SQL queries.
Cross product with JOIN
SELECT column(s) FROM table1 JOIN table2;
SELECT * FROM students JOIN grades;
id | name | email | student_id | course_id | grade |
123 | Bart | bart@fox.com | 123 | 10001 | B- |
404 | Ralph | ralph@fox.com | 123 | 10001 | B- |
456 | Milhouse | milhouse@fox.com | 123 | 10001 | B- |
888 | Lisa | lisa@fox.com | 123 | 10001 | B- |
123 | Bart | bart@fox.com | 123 | 10002 | C |
404 | Ralph | ralph@fox.com | 123 | 10002 | C |
... (24 rows returned) |
- cross product or Cartesian product: combines each row of first table with each row of second
- produces M * N rows, where table 1 has M rows and table 2 has N
- problem: produces too much irrelevant/meaningless data
Joining with ON
clauses
SELECT column(s)
FROM table1
JOIN table2 ON condition(s)
...
JOIN tableN ON condition(s);
SELECT *
FROM students
JOIN grades ON id = student_id;
- join: combines records from two or more tables if they satisfy certain conditions
- the
ON
clause specifies which records from each table are matched
- the rows are often linked by their key columns (id)
Join example
SELECT *
FROM students
JOIN grades ON id = student_id;
id | name | email | student_id | course_id | grade |
123 | Bart | bart@fox.com | 123 | 10001 | B- |
123 | Bart | bart@fox.com | 123 | 10002 | C |
404 | Ralph | ralph@fox.com | 404 | 10004 | D+ |
456 | Milhouse | milhouse@fox.com | 456 | 10001 | B+ |
888 | Lisa | lisa@fox.com | 888 | 10002 | A+ |
888 | Lisa | lisa@fox.com | 888 | 10003 | A+ |
Filtering columns in a join
SELECT name, course_id, grade
FROM students
JOIN grades ON id = student_id;
name | course_id | grade |
Bart | 10001 | B- |
Bart | 10002 | C |
Ralph | 10004 | D+ |
Milhouse | 10001 | B+ |
Lisa | 10002 | A+ |
Lisa | 10003 | A+ |
Filtered join (JOIN
with WHERE
)
SELECT name, course_id, grade
FROM students
JOIN grades ON id = student_id
WHERE name = 'Bart';
name | course_id | grade |
Bart | 10001 | B- |
Bart | 10002 | C |
FROM
/ JOIN
glue the proper tables together, and WHERE
filters the results
- what goes in the
ON
clause, and what goes in WHERE
?
ON
directly links columns of the joined tables
WHERE
sets additional constraints such as particular values (123
, 'Bart'
)
What's wrong with this?
SELECT name, id, course_id, grade
FROM students
JOIN grades ON id = 123
WHERE id = student_id;
name | id | course_id | grade |
Bart | 123 | 10001 | B- |
Bart | 123 | 10002 | C |
- The above query produces the same rows as the previous one, but it is poor style. Why?
-
The
JOIN ON
clause is poorly chosen. It doesn't really say what connects a grades
record to a students
record.
- They are related when they are for a student with the same
id
.
- Filtering out by a specific ID or name should be done in the
WHERE
clause, not JOIN ON
.
Giving names to tables
SELECT s.name, g.*
FROM students s
JOIN grades g ON s.id = g.student_id
WHERE g.grade <= 'C';
name | student_id | course_id | grade |
Bart | 123 | 10001 | B- |
Bart | 123 | 10002 | C |
Milhouse | 456 | 10001 | B+ |
Lisa | 888 | 10002 | A+ |
Lisa | 888 | 10003 | A+ |
- can give names to tables, like a variable name in Java
- to specify all columns from a table, write
table.*
- (
grade
column sorts alphabetically, so grades C or better are
ones <=
'C')
- What's the flaw in this, how to fix it?
Multi-way join
SELECT c.name
FROM courses c
JOIN grades g ON g.course_id = c.id
JOIN students bart ON g.student_id = bart.id
WHERE bart.name = 'Bart' AND g.grade <= 'B-';
name |
Computer Science 142 |
- More than 2 tables can be joined, as shown above
- What does the above query represent?
- The names of all courses in which Bart has gotten a B- or better.
- Is this test okay?
A suboptimal query
- Exercise: What courses have been taken by both Bart and Lisa?
- problem: requires us to know Bart/Lisa's Student IDs, and only spits back course IDs, not names.
- Write a version of this query that gets us the course names, and only requires us to know Bart/Lisa's names, not their IDs.
Improved query
- What courses have been taken by both Bart and Lisa?
Practice queries
- What are the names of all teachers Bart has had?
- How many total students has Ms. Krabappel taught, and what are their names?
Designing a query
- Figure out the proper SQL queries in the following way:
- Which table(s) contain the critical data? (
FROM
)
- Which columns do I need in the result set? (
SELECT
)
- How are tables connected (
JOIN
) and values filtered (WHERE
)?
- Test on a small data set (
imdb_small
).
- Confirm on the real data set (
imdb
).
- Try out the queries first in phpMyAdmin or from the command line.
- Write the PHP code to run those same queries.
- Make sure to check for SQL errors at every step.
Example imdb
database
actors
id | first_name | last_name | gender |
433259 | William | Shatner | M |
797926 | Britney | Spears | F |
831289 | Sigourney | Weaver | F |
... |
movies
id | name | year | rank |
112290 | Fight Club | 1999 | 8.5 |
209658 | Meet the Parents | 2000 | 7 |
210511 | Memento | 2000 | 8.7 |
... |
roles
actor_id | movie_id | role |
433259 | 313398 | Capt. James T. Kirk |
433259 | 407323 | Sgt. T.J. Hooker |
797926 | 342189 | Herself |
... |
movies_genres
movie_id | genre |
209658 | Comedy |
313398 | Action |
313398 | Sci-Fi |
... |
directors
id | first_name | last_name |
24758 | David | Fincher |
66965 | Jay | Roach |
72723 | William | Shatner |
... |
movies_directors
director_id | movie_id |
24758 | 112290 |
66965 | 209658 |
72723 | 313398 |
... |
- also available,
imdb_small
with fewer records (for testing queries)
IMDb table relationships / ids
IMDb query example
[stepp@webster ~]$ mysql -u myusername -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
mysql> use imdb_small;
Database changed
mysql> select * from actors where first_name like '%mick%';
+--------+------------+-----------+--------+
| id | first_name | last_name | gender |
+--------+------------+-----------+--------+
| 71699 | Mickey | Cantwell | M |
| 115652 | Mickey | Dee | M |
| 470693 | Mick | Theo | M |
| 716748 | Mickie | McGowan | F |
+--------+------------+-----------+--------+
4 rows in set (0.01 sec)
IMDb practice queries
- What are the names of all movies released in 1995?
- How many people played a part in the movie "Lost in Translation"?
- What are the names of all the people who played a part in the movie "Lost in Translation"?
- Who directed the movie "Fight Club"?
- How many movies has Clint Eastwood directed?
- What are the names of all movies Clint Eastwood has directed?
- What are the names of all directors who have directed at least one horror film?
- What are the names of every actor who has appeared in a movie directed by Christopher Nolan?