Problem Set 7: Movies
Evidence of a 5-point Submission
- Uses
COUNT(*)
whenever possible (as opposed to specifying an arbitrary column) - Uses a numeric type to represent numbers (as opposed to strings, e.g. “2010”)
Evidence of a 4-point Submission
- Uses
id
s to determine unique or specific entries whenever possible (as opposed to names) - May use the
INTERSECT
keyword to find overlapping information (i.e. Query 12)
Evidence of a 3-point Submission
- Uses more
JOIN
operations in one query than are necessary to execute the query- very technical quibble: if the student specifies that egregious
JOIN
s would work fine if you properly index the appropriate columns in movies.db, thenJOIN
s are OK and will work great; but if the given submission runs much slower than would be ideal (~8-10 seconds) without explanation, then feel free to deduct - we can’t imagine the above quibble applying to any but the most comfortable of students
- very technical quibble: if the student specifies that egregious
- May unnecessarily specify
ASC
ordering (ORDER BY
does this by default) - May unnecessarily specify
IS NOT NULL
(running .schema would tell you that the tables already specifyNOT NULL
, so this is redundant)
Evidence of a 2-point Submission
- Uses 2+ more
JOIN
operations in one query than are necessary to execute the query - May unnecessarily cast data types
- May unnecessarily use the
LOWER
function to compare strings (i.e. Query 5) - May unnecessarily use
ROUND
after taking theAVG
of a column
Evidence of a 1-point Submission
- Hard-codes
id
s into queries
Example Implementations (Worse vs. Better)
Query 4
Worse Implementation
The example below represents the number using a string and unnecessarily specifies the movie_id
column be counted
SELECT COUNT(movie_id)
FROM ratings
WHERE rating = '10';
Better Implementation
The example below uses a numeric type to represent the rating and counts the number of rows in the entire table (instead of specifying an arbitrary column)
SELECT COUNT(*)
FROM ratings
WHERE rating = 10;
Query 6
Worse Implementation
The example below uses an extraneous CAST
in the last line
SELECT AVG(rating) AS avg_rating
FROM ratings
JOIN movies ON ratings.movie_id = movies.id
WHERE CAST(year AS INT) >= 2018 ORDER BY title;
Better Implementation
The example below uses no extraneous operations
SELECT AVG(rating)
FROM movies
JOIN ratings ON movies.id = ratings.movie_id
WHERE year = 2012;
Query 13
Worse Implementation
The example below uses two identical pairs of JOIN
statements in each subsection of the query, leading to slow and repetitive calculation
SELECT name FROM people
JOIN stars ON people.id = stars.person_id
JOIN movies ON stars.movie_id = movies.id
WHERE movies.id IN
(
SELECT movies.id FROM movies
JOIN stars ON stars.movie_id = movies.id
JOIN people ON stars.person_id = people.id
WHERE people.name = "Kevin Bacon"
AND people.birth = 1958
)
AND people.name != "Kevin Bacon";
Better Implementation
The example below uses clean SELECT
statements and specifies “Kevin Bacon” using his unique id
SELECT name
FROM people
WHERE id IN
(
SELECT person_id
FROM stars
WHERE movie_id IN
(
SELECT movie_id
FROM stars
WHERE person_id =
(
SELECT id
FROM people
WHERE name = "Kevin Bacon"
AND birth = 1958
)
)
)
AND id !=
(
SELECT id
FROM people
WHERE name = "Kevin Bacon"
AND birth = 1958
);