Example SQL queries: distinct, joins, group by, order by, count, and sorting

As a brief note to self, this is a SQL query I use in the Scala Cookbook to get a distinct list of all movie titles in the MovieLens dataset that are five-star rated in the Comedy/Romance genre:

val q = """
    select distinct(m.title), r.rating, m.genres
    from movies m, ratings r
    where m.movieId == r.movieId 
    and m.genres like '%Comedy%Romance%'
    and r.rating == 5.0
    order by m.title"""

spark.sql(q).show

Using Apache Spark — which I demonstrate in the Scala Cookbook — the output of that SQL query looks like this:

+--------------------+------+--------------------+
|               title|rating|              genres|
+--------------------+------+--------------------+
|(500) Days of Sum...|   5.0|Comedy|Drama|Romance|
|           10 (1979)|   5.0|      Comedy|Romance|
|10 Items or Less ...|   5.0|Comedy|Drama|Romance|
|10 Things I Hate ...|   5.0|      Comedy|Romance|

Similarly, here’s another SQL query where I use count, group by, and order by to get a sorted count of the number of people who gave each Comedy/Romance movie a five-star rating:

val q = """
    select m.title, count(1) as the_count
    from movies m, ratings r
    where m.movieId == r.movieId 
    and m.genres like '%Comedy%Romance%'
    and r.rating == 5.0
    group by m.title
    order by the_count desc"""

// using 'false' in 'show' tells spark to print the full column widths
spark.sql(q).show(100, false)

That query’s output looks like this:

+-------------------------------------------------------------+---------+
|title                                                        |the_count|
+-------------------------------------------------------------+---------+
|Forrest Gump (1994)                                          |115      |
|Princess Bride, The (1987)                                   |61       |
|Life Is Beautiful (La Vita è bella) (1997)                   |43       |
|Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)         |42       |
|Groundhog Day (1993)                                         |39       |
... this post is sponsored by my books ...

#1 New Release!

FP Best Seller

As a summary, if you wanted to see a couple of SQL queries that demonstrate group by, order by, distinct, joins, count, and sorting — and also use Apache Spark — I hope these examples are helpful. And for more details on these and other queries using Spark, see the Scala Cookbook.