Problem Set 7: Movies

Evidence of a 5-point Submission

  • Uses COUNT(*) whenever possible (as opposed to specifying an arbitrary column)
  • Uses a numeric type to represent numbers (as opposed to strings, e.g. “2010”)

Evidence of a 4-point Submission

  • Uses ids to determine unique or specific entries whenever possible (as opposed to names)
  • May use the INTERSECT keyword to find overlapping information (i.e. Query 12)

Evidence of a 3-point Submission

  • Uses more JOIN operations in one query than are necessary to execute the query
    • very technical quibble: if the student specifies that egregious JOINs would work fine if you properly index the appropriate columns in movies.db, then JOINs are OK and will work great; but if the given submission runs much slower than would be ideal (~8-10 seconds) without explanation, then feel free to deduct
    • we can’t imagine the above quibble applying to any but the most comfortable of students
  • May unnecessarily specify ASC ordering (ORDER BY does this by default)
  • May unnecessarily specify IS NOT NULL (running .schema would tell you that the tables already specify NOT NULL, so this is redundant)

Evidence of a 2-point Submission

  • Uses 2+ more JOIN operations in one query than are necessary to execute the query
  • May unnecessarily cast data types
  • May unnecessarily use the LOWER function to compare strings (i.e. Query 5)
  • May unnecessarily use ROUND after taking the AVG of a column

Evidence of a 1-point Submission

  • Hard-codes ids into queries

Example Implementations (Worse vs. Better)

Query 4

Worse Implementation

The example below represents the number using a string and unnecessarily specifies the movie_id column be counted

SELECT COUNT(movie_id)
FROM ratings
WHERE rating = '10';

Better Implementation

The example below uses a numeric type to represent the rating and counts the number of rows in the entire table (instead of specifying an arbitrary column)

SELECT COUNT(*)
FROM ratings
WHERE rating = 10;

Query 6

Worse Implementation

The example below uses an extraneous CAST in the last line

SELECT AVG(rating) AS avg_rating
FROM ratings
JOIN movies ON ratings.movie_id = movies.id
WHERE CAST(year AS INT) >= 2018 ORDER BY title;

Better Implementation

The example below uses no extraneous operations

SELECT AVG(rating)
FROM movies
JOIN ratings ON movies.id = ratings.movie_id
WHERE year = 2012;

Query 13

Worse Implementation

The example below uses two identical pairs of JOIN statements in each subsection of the query, leading to slow and repetitive calculation

SELECT name FROM people
JOIN stars ON people.id = stars.person_id
JOIN movies ON stars.movie_id = movies.id
WHERE movies.id IN
(
    SELECT movies.id FROM movies
    JOIN stars ON stars.movie_id = movies.id
    JOIN people ON stars.person_id = people.id
    WHERE people.name = "Kevin Bacon"
    AND people.birth = 1958
)
AND people.name != "Kevin Bacon";

Better Implementation

The example below uses clean SELECT statements and specifies “Kevin Bacon” using his unique id

SELECT name
FROM people
WHERE id IN
(
    SELECT person_id
    FROM stars
    WHERE movie_id IN
    (
        SELECT movie_id
        FROM stars 
        WHERE person_id = 
        (
            SELECT id
            FROM people
            WHERE name = "Kevin Bacon"
            AND birth = 1958
        )
    )
)
AND id !=
(
    SELECT id
    FROM people
    WHERE name = "Kevin Bacon"
    AND birth = 1958
);