This open dataset I made for movie fans compiles several ‘greatest films’ lists to find the greatest of the great, and the analysis reveals seven films to be the best of the best.
Here’s a Movie Recommender visualization and interface for the data hosted on Tableau Public:
There’s nothing worse than sitting through a bad movie.
(OK, there are many things that are worse, but it’s still a bummer.)
So I always check reviews, recommendations, and ratings before committing to a film. If it doesn’t pass a certain threshold on a few of my trusted sources, I don’t watch it.
I want to be generally familiar with the history of film, and I want to see the movies that many other informed folks agree are worth seeing. I’m a sucker for a good story.
Some films influenced the culture in big ways and changed the art of filmmaking. I’d like to see as many of those movies as I can.
To give myself a reliable source of movie ideas, I have been collecting lists of “The Greatest Films of All Time.” I have lists created by film critics, film industry leaders, and screenwriters, and I decided to put them all together in one giant spreadsheet. The complete dataset I compiled is posted here as a CSV.
I will be using it to navigate movie choices, and since there are 1,212 titles on the master list, it’s going to be a multi-year journey. The source lists are “greatest films” publications from the American Film Institute, the Writers Guild of America, The Sight & Sound Top 50, The Guardian, and 1001 Movies to See Before You Die.
I will be moving through the master list watching the films that many film experts, critics, and screenwriters agree are the best movies ever made.
What Movies Are On All ‘Greatest Films’ Lists?
Using Python in a Jupyter Notebook, I imported the CSV as a pandas DataFrame and ran a basic query to find which movies appear on every list.
There are 7 films that appear on every list and they are: Citizen Kane, The Searchers, Some Like It Hot, Psycho, The Godfather: Part I, The Godfather: Part II, and Apocalypse Now.
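For anyone who wants to try the same kind of query, here is a minimal sketch of how it might look. It is not the notebook code itself. The column names (`title`, `list_1`, and so on), the 0/1 layout, and the "Film A" style titles are all made-up placeholders, so adjust them to whatever the real CSV uses.

```python
import io

import pandas as pd

# Toy stand-in for the master CSV. Titles, column names, and flags are
# invented for illustration; the real file may be laid out differently.
csv_text = """title,list_1,list_2,list_3
Film A,1,1,1
Film B,1,0,1
Film C,0,1,1
Film D,1,1,0
"""

# Read the CSV into a DataFrame (pass a file path instead of StringIO for a real file).
films = pd.read_csv(io.StringIO(csv_text))

# Assumed layout: one 0/1 column per source list.
list_columns = ["list_1", "list_2", "list_3"]

# Count how many lists each film appears on.
films["list_count"] = films[list_columns].sum(axis=1)

# Keep only the films that appear on every list.
on_every_list = films[films["list_count"] == len(list_columns)]
print(on_every_list["title"].tolist())
```

Summing the flag columns row by row gives a per-film list count, which is handy to keep around for later filtering.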
I’ve seen all those films, so I used another query to find which movies appear on 6 of the 7 lists. (I dropped the Sight & Sound Top 50 list.)
The movies on 6 out of 7 lists were: Gone With the Wind, The Wizard of Oz, Casablanca, Double Indemnity, North by Northwest, The Apartment, Dr. Strangelove, The Graduate, Butch Cassidy and the Sundance Kid, The Wild Bunch, Chinatown, Annie Hall, Star Wars, Raiders of the Lost Ark, E.T. The Extra Terrestrial, Goodfellas, and Pulp Fiction.
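A "one list short" query can be sketched the same way. The helper name `films_on_n_lists`, the column names, and the toy rows below are all hypothetical, and this version counts films that appear on exactly n of the lists, which may not be precisely how the original query was written.

```python
import pandas as pd


def films_on_n_lists(films, list_columns, n):
    """Return the titles of films that appear on exactly n of the given lists."""
    counts = films[list_columns].sum(axis=1)
    return films.loc[counts == n, "title"].tolist()


# Toy data with made-up titles and 0/1 membership flags.
films = pd.DataFrame(
    {
        "title": ["Film A", "Film B", "Film C", "Film D"],
        "list_1": [1, 1, 0, 1],
        "list_2": [1, 0, 1, 1],
        "list_3": [1, 1, 1, 0],
    }
)
list_columns = ["list_1", "list_2", "list_3"]

# Films that miss exactly one list.
one_short = films_on_n_lists(films, list_columns, len(list_columns) - 1)
print(one_short)
```

An alternative would be to drop one column from `list_columns` and ask for films on all of the remaining lists; which reading fits depends on how the lists are being counted.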
I’m going to start this hero’s journey through cinema history with North by Northwest. It’s the oldest movie I haven’t seen yet that appears on 6 of the 7 “greatest films of all time” lists.
To get a sense of when the films on these lists were released, I used matplotlib, a Python plotting library, to make a histogram of the number of movies on the list by release year.
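A release-year histogram along these lines might look like the sketch below. The years are invented toy values, the `release_year` column name is an assumption, and the decade-wide bins are just one reasonable choice, not necessarily the binning used for the chart in this post.

```python
import matplotlib

# Non-interactive backend so the sketch runs without a display;
# drop this line when working in a Jupyter Notebook.
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import pandas as pd

# Made-up release years standing in for the real column.
films = pd.DataFrame(
    {"release_year": [1941, 1942, 1959, 1960, 1972, 1974, 1979, 1994]}
)

# One bin per decade, from 1940 up to (but not including) 2000.
decade_edges = list(range(1940, 2001, 10))

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(films["release_year"], bins=decade_edges)
ax.set_xlabel("Release year")
ax.set_ylabel("Number of films")
ax.set_title("Films by release decade (toy data)")

# counts holds the number of films that fell into each decade bin.
print(counts.tolist())
```

In a notebook the figure displays inline; elsewhere, `fig.savefig("some_name.png")` writes it to disk.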
I also made a histogram of the number of movies at each IMDB star rating. The mean IMDB user rating for the films on the master list is 7.7 stars (SD = 0.54) and the mean Metascore rating is 79 (SD = 10.7).
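Summary numbers like these can be computed in one line with pandas. In the sketch below the column names `imdb_rating` and `metascore` and the values are made up, and note that pandas' `std` returns the sample standard deviation (dividing by n - 1) by default, which may or may not match how the SDs above were calculated.

```python
import pandas as pd

# Toy ratings with invented values; the real columns may be named differently.
films = pd.DataFrame(
    {
        "imdb_rating": [7.0, 8.0, 9.0],
        "metascore": [70, 80, 90],
    }
)

# Mean and sample standard deviation for both rating columns at once.
summary = films[["imdb_rating", "metascore"]].agg(["mean", "std"])
print(summary)
```

Passing `ddof=0` to `std` would give the population standard deviation instead.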
What sources do you use for picking the next movie you see? Rotten Tomatoes Percent Fresh score, IMDB star rating, or a list you keep around? How many of the top 7 have you seen?