
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2018

What are the main factors affecting movie profitability?

KARL WALLSTRÖM
MARKUS WAHLGREN

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ENGINEERING SCIENCES

Degree Projects in Applied Mathematics and Industrial Economics
Degree Programme in Industrial Engineering and Management
KTH Royal Institute of Technology, year 2018
Supervisors at KTH: Daniel Berglund, Julia Liljegren
Examiner at KTH: Henrik Hult

TRITA-SCI-GRU 2018:188
MAT-K 2018:07

Royal Institute of Technology
School of Engineering Sciences
KTH SCI
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci

Abstract

This thesis consists of two parts, both of which focus on the profitability of movies with a theatrical release. The first part examines whether it is possible to create a model for predicting and understanding the profitability of movies. Factors that influence the profitability have also been identified. This has been achieved through a regression analysis with data on movies from 2015-2017. A final multiple regression model has been created which explains 41% of the variability in the data. Influencing factors that are significant at the 5% level or better have been identified, such as the genres Horror and Musical, the MPAA ratings PG and PG-13, the creative types Historical Fiction and Dramatization, Rotten Tomatoes Score and Sequel.

The second part is an analysis of movie theaters as a distribution channel with respect to profitability for distribution companies, using Porter’s five forces. The five forces analysis suggests that movie theaters are a good distribution channel for studios which can provide mainstream movies, while studios should consider releasing their low-profile movies directly to streaming services since it might be more profitable.

Summary (Sammanfattning)

This thesis consists of two parts, both of which focus on the profitability of movies released in theaters. The first part examined whether it is possible to create a model for predicting and understanding the profitability of movies. The factors that affected profitability were also identified. This was done through a regression analysis with data on movies from 2015-2017. A final multiple regression model was created that can explain 41% of the variability in the data. Significant influencing factors at a significance level of at least 5% were identified, such as the genres Horror and Musical, the MPAA ratings PG and PG-13, the creative types Historical Fiction and Dramatization, Rotten Tomatoes Score and Sequel. The second part of the report is an analysis of how suitable movie theaters are as a distribution channel with respect to profitability for the distribution companies. This was done using Porter’s five forces model. The five forces model suggests that movie theaters are a good distribution channel for studios that can create mainstream movies, while studios should consider releasing their low-profile movies directly to streaming services since it may be more profitable.

Karl Wallström, Markus Wahlgren: Bachelor Thesis

Acknowledgements

We would like to thank OpusData for supplying us with the necessary data, making this thesis possible.

Contents

1 Introduction
  1.1 Background
  1.2 Purpose and motivation
  1.3 Disposition
  1.4 Scope
  1.5 Problem Statement

2 Mathematical Theory
  2.1 Multiple linear regression
  2.2 Ordinary Least Squares
  2.3 Assumptions
  2.4 Dealing with violation of assumptions
    2.4.1 Residuals vs fitted
    2.4.2 Residuals vs regressors plot
    2.4.3 Q-Q plot
    2.4.4 Histogram of residuals
    2.4.5 Scale-location plot
    2.4.6 Box-Cox Transform
  2.5 Multicollinearity
    2.5.1 VIF
  2.6 Detecting leverage and influential observations
    2.6.1 Cook’s distance
    2.6.2 Residuals vs Leverage plot
  2.7 Hypothesis testing
    2.7.1 F-Statistic
    2.7.2 t-test
  2.8 Variable selection and model selection
    2.8.1 All possible regression
    2.8.2 Cross-validation
    2.8.3 R2 and adjusted R2
    2.8.4 Mallows’s Cp statistic
    2.8.5 AIC
  2.9 Dummy variables

3 Method
  3.1 Data collection
  3.2 Software
  3.3 Dataset from Opus Data
  3.4 Preprocessing of Opus Dataset
  3.5 Creation of new variables
  3.6 Initial transformation of variables
  3.7 Choice of response variable
  3.8 Choice of regressors
  3.9 Initial model
    3.9.1 Residuals plotted against the fitted values
    3.9.2 The residuals plotted against the regressor variables
    3.9.3 Normal Q-Q plot
    3.9.4 Histogram of residuals
    3.9.5 Scale-Location plot
    3.9.6 Residual vs leverage plot
    3.9.7 Multicollinearity
  3.10 Transformation of the model
  3.11 Reduction of the model
    3.11.1 Models generated by the first approach
    3.11.2 Models generated by the second approach
    3.11.3 Cross validation results
    3.11.4 Choice, verification and transformation of final model

4 Result
  4.1 Final model
    4.1.1 Residuals plotted against the fitted values
    4.1.2 Normal Q-Q plot
    4.1.3 Histogram of residuals
    4.1.4 Scale-location plot
    4.1.5 Residual vs leverage plot
    4.1.6 Multicollinearity

5 Discussion
  5.1 Evaluation of final model
  5.2 Impact of the regressor variables
    5.2.1 Rating
    5.2.2 Genre
    5.2.3 Sequel
    5.2.4 RottenTomatoesScore
    5.2.5 IMDbScore
    5.2.6 DirectorScore
    5.2.7 CreativeType
    5.2.8 Source
    5.2.9 ProductionMethod
    5.2.10 Confidence intervals of the regressors
  5.3 Limitations of model and approach
    5.3.1 Budget estimations
    5.3.2 Marketing
    5.3.3 Casting
    5.3.4 Initial model formulation
    5.3.5 Limitation due to scope
  5.4 Conclusion

6 Porter’s five forces analysis
  6.1 Introduction
  6.2 Scope
  6.3 Problem statement
  6.4 Method
    6.4.1 Types of movies
    6.4.2 Choice of Porter’s five forces analysis
  6.5 Theory
  6.6 Five forces applied on the cinematic theater industry
    6.6.1 Threat of new entrants
    6.6.2 Threat of substitutes
    6.6.3 Bargaining power of customers
    6.6.4 Bargaining power of suppliers
    6.6.5 Industry rivalry
  6.7 Discussion
  6.8 Conclusion
  6.9 Limitation of approach

1 Introduction

1.1 Background

Motion pictures have been a staple of modern society ever since the emergence of dedicated movie theaters at the beginning of the 20th century. Global box office revenue is forecast to increase from about 38 billion U.S. dollars in 2016 to nearly 50 billion in 2020.1

Creating a movie is an intricate process with multiple stages and hurdles. Production requires a lot of capital upfront and a long wait until any returns materialize. Movie production is consequently a risky venture, since there is considerable uncertainty about whether a movie will be profitable. This is especially a problem for big-budget movies since there is a lot of capital at stake. The production of a movie can be broken down into five important stages:2

- Development: The stage where the movie idea is determined, the necessary rights are acquired, the screenplay is written and the financing for the movie is obtained.
- Pre-production: Preparations before shooting are made, such as hiring cast and crew, finding locations to shoot in and constructing the necessary sets and props.
- Production: The actual shooting of the movie, where raw footage and the additional sound are recorded.
- Post-production: The recorded footage and sound are edited and combined with visual effects and music into the final product.
- Distribution: The movie is then marketed, distributed and released to cinemas or other media.

A select number of large studios are attempting to capture the audience’s attention. In 2017, 7 studios accounted for 89% of total box office revenue.3 But not all studios are successful: Sony’s Pictures division, for instance, reported a 913 million loss in the quarter ending Dec 31, 2016.4 Viacom reported a 364 million loss in 2016.5

The risk associated with creating blockbusters could be an explanation for the rise of movie sequels, remakes and adaptations seen in today’s movie industry, where these kinds of movies are a safer investment since they already have an established audience.6 Multiple investment decisions apart from making sequels exist in the different production phases. To which screenplay should the rights be bought? Should A-list actors be cast? When should the movie be released? How should the movie be distributed? Examining how these types of decisions affect profitability is the goal of this thesis.

1.2 Purpose and motivation

An aim of this project is to use regression analysis to gain insight into what influences the profitability of a movie. If this thesis is able to produce a satisfactory model for profitability, it could mitigate some of the risk associated with investment decisions in movie production. A movie is still artistic in its nature, so believing that a mathematical model can rule is nonsensical, but knowing how a certain decision during the production of a movie will shift the profitability horizon could still be of value.

How a movie should be distributed with regard to profitability is also a main factor which the thesis will examine. This will specifically be done with a Porter’s five forces analysis focused on the movie theater industry. The reasoning behind this analysis is to complement the regression analysis, which targets intrinsic values of a movie and their effect on profitability, by targeting the effect of distribution on profitability.

1.3 Disposition

Since this thesis consists of two parts (the regression analysis and Porter’s five forces analysis), each part has its own scope and problem statement. The scope and problem statement for the regression analysis are presented below, and the scope and problem statement for Porter’s five forces analysis are presented later in section 6.

1.4 Scope

The thesis will focus on motion pictures with at least a cinematic release and an estimated budget greater than 5 million. As a result, the thesis will focus on movies produced by relatively established movie studios, which is our goal. The thesis will investigate movie profitability with respect to multiple variables, and movies released within 2015-2017 will be used for this part. This restriction is made to ensure that the model is not affected by changes, such as trend shifts, that depend on greater time differences. It would not make sense to compare movies from this year to movies released in the eighties, since the influencing factors have probably changed since then. Which potential profitability-influencing factors to examine, and the scope associated with those decisions, will be discussed in section 3.1.

1 Statista. Global box office revenue 2016 to 2020 (in billion U.S. Dollars). June 2016. box-office-revenue/ (retrieved 2018-03-12)
2 Dems, Kristina. What are The Five Stages of Filmmaking? Brighthub. 2010-07-12. s/77345.aspx (retrieved 2018-03-8)
3 Box Office Mojo. Studio Market Share. http://www.boxofficemojo.com/studio/?view=company&view2=yearly&yr=2017&p=.htm (retrieved 2018-03-15)
4 Frater, Patrick. Sony Pictures’ 913-Million Loss Cuts Group Profits. Variety. 2017-02-01. ctures-division-cut-group-profits-1201976056/ (retrieved 2018-03-23)
5 Szalai, Georg. Studio-by-Studio Profitability Ranking: Disney Surges, Sony Sputters. Hollywood -studio-profitability-977490 (retrieved 2018-04-01)
6 Koehler, Michael. Risk and Originality in Today’s Hollywood The Age of Presold Concepts. Lights film and-originality-in-todays-hollywood (retrieved 2018-03-17)

1.5 Problem Statement

- Is it possible to build a model with multiple linear regression to predict and/or understand movie profitability?
- Which factors influence profitability the most?

2 Mathematical Theory

2.1 Multiple linear regression

Regression analysis is a statistical technique for explaining and modeling the relationship between different variables. The general idea of linear regression is to model the relationship between a response variable $y$ and another variable $x$, called the regressor variable. The response can thereby be explained by the regressor variable since it depends on it. The multiple linear regression model is a linear regression model with multiple regressor variables $x_i$ that have a linear relationship with the response variable $y$. The relationship can be described by the following equation.7

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon \tag{3.1}$$

Here $y$ is the response variable, which depends on the $k$ regressor variables $x_1, \dots, x_k$ and the error $\varepsilon$. The parameters $\beta_j$, $j = 0, 1, 2, \dots, k$, are called the regression coefficients, where $\beta_0$ is the intercept. The coefficients can be determined using collected data and the method of least squares. A more convenient way of displaying the model is to use matrix notation, which transforms equation (3.1) into the following equation.8

$$y = X\beta + \varepsilon \tag{3.2}$$

where

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

7 Montgomery, Douglas C., Elizabeth A. Peck, and G. Geoffrey Vining. Introduction to linear regression analysis. Vol. 821. John Wiley & Sons. 2012. p. 68
8 ibid p. 72
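As an illustration of the matrix form in equation (3.2), the following Python sketch builds a design matrix with a leading column of ones and evaluates $X\beta$ for a toy example. The data, the coefficient values and the helper names `design_matrix` and `predict` are hypothetical and not taken from the thesis. The error term is left out, so the output is only the deterministic part of the model.

```python
def design_matrix(rows):
    """Prepend the intercept column of ones to each row of regressor values."""
    return [[1.0] + [float(v) for v in row] for row in rows]


def predict(X, beta):
    """Compute X @ beta row by row, i.e. the part of y = X beta + eps without the error."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


# Hypothetical toy data: n = 3 observations, k = 2 regressors.
regressors = [[2.0, 1.0], [0.0, 3.0], [4.0, 5.0]]
beta = [1.0, 0.5, -2.0]  # beta_0 (intercept), beta_1, beta_2

X = design_matrix(regressors)   # each row is [1, x_i1, x_i2]
fitted = predict(X, beta)       # one value per observation
```

Each row of `X` corresponds to one observation, so `fitted[i]` is $\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}$ for that observation.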

2.2 Ordinary Least Squares

The parameter $\beta$ in the equation of the multiple linear regression model is unknown and must be estimated. Ordinary least squares (OLS) is a method that can be used to estimate the regression coefficients $\beta$. OLS estimates the coefficients using sample data, where each of the $n$ observations gives an equation of the following kind, which is called the sample regression model.9

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, 2, \dots, n \tag{3.3}$$

OLS estimates the coefficients so that the sum of the squares of the differences between the observations of the dependent variable $y_i$ and the predicted values from the linear model in equation (3.1) is at a minimum.

When using the matrix notation introduced in equation (3.2), the goal is to find the vector of least-squares estimators, $\hat{\beta}$, that minimizes the least-squares function.10

$$S(\beta) = \sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta) \tag{3.4}$$

This gives the least-squares normal equations

$$X'X\hat{\beta} = X'y \tag{3.5}$$

which, when solved, give the least-squares estimator of $\beta$,

$$\hat{\beta} = (X'X)^{-1}X'y \tag{3.6}$$

provided that the inverse matrix $(X'X)^{-1}$ exists.

2.3 Assumptions

The assumptions made when creating multiple linear regression models are the following:11

1. The relationship between the response variable and the regressor variables is linear, at least approximately.
2. The error term has zero mean.
3. The error term has constant variance $\sigma^2$.
4. The errors are uncorrelated.
5. The errors are normally distributed.

9 ibid p. 71
10 ibid p. 72
11 ibid p. 129
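The normal equations (3.5) can be solved numerically. The sketch below is a minimal standard-library illustration, assuming a small, well-conditioned design matrix. The helper names (`transpose`, `solve`, `ols`) and the toy data are hypothetical, and a real analysis would normally rely on a statistics package rather than hand-written elimination.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular, so its inverse does not exist")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(X, y):
    """Least-squares estimate from the normal equations X'X beta = X'y."""
    Xt = transpose(X)
    XtX = [[sum(a * b for a, b in zip(r, c)) for c in Xt] for r in Xt]
    Xty = [sum(a * b for a, b in zip(r, y)) for r in Xt]
    return solve(XtX, Xty)


# Toy data generated exactly from y = 1 + 2x, so no error term is present.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
beta_hat = ols(X, y)  # estimates of (beta_0, beta_1)
```

Solving the linear system directly, instead of forming $(X'X)^{-1}$ explicitly, is a design choice of this sketch. The `ValueError` corresponds to the case where the inverse in equation (3.6) does not exist.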

The linear regression model assumptions must be examined and must hold before the regression model is used. A number of graphical methods exist to identify violations of the assumptions, and there are also methods to handle these violations, such as different transforms of either the response variable or the regressor variables. The methods used in this thesis are presented below.

2.4 Dealing with violation of assumptions

2.4.1 Residuals vs fitted

A plot of residuals versus the corresponding fitted values can be useful to identify violations of the linear regression assumptions.12 Violations can be identified by searching for patterns in the plot. The desired pattern is a horizontal band, which means that the variance is constant; this is called homoscedasticity. Unwanted patterns include funnel shapes, which imply that the variance is not constant (heteroscedasticity), and non-linear shapes, which imply that the relationship between the response variable and the regressor variables is not linear. Transforms of the response or regressor variables can be used to ensure that the assumptions hold.

2.4.2 Residuals vs regressors plot

A plot of residuals versus regressor variables is also helpful to identify violations of the assumptions.13 Violations are identified by searching for patterns, as with the plot of residuals versus fitted values. Transforms of the regressor variables can be the solution to this problem.

2.4.3 Q-Q plot

The Q-Q plot, which stands for quantile-quantile plot, is a graphical method which can be used to determine whether some data comes from a given theoretical distribution. This is achieved by plotting two sets of quantiles against each other; if the quantiles truly come from the same distribution, the plotted points should form a line. This is useful for checking whether the assumption of normally distributed residuals with mean zero holds. The normal Q-Q plot is created by plotting the standardized residuals against the theoretical quantiles, which should form a line if the residuals are truly normally distributed.

2.4.4 Histogram of residuals

Another way to check whether the assumption of normality with mean zero holds is to investigate a histogram of the residuals. The histogram should follow a normal distribution with mean zero.

2.4.5 Scale-location plot

The scale-location plot can be useful for identifying violations of the assumption of constant variance. It is a plot of the square root of the absolute standardized residuals versus the fitted values and shows whether the residuals are equally spread along the range of the regressors. The desired result is a horizontal line with randomly and equally spread points.

12 ibid p. 139
13 ibid p. 141
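The coordinates behind the normal Q-Q plot and the scale-location plot described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in the thesis: the residuals are standardized simply by their sample mean and standard deviation (leverage is ignored), the plotting positions $(i - 0.5)/n$ are one common convention, and the function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pair theoretical N(0, 1) quantiles with sorted standardized residuals."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    # Simplified standardization (assumption): subtract the mean, divide by the sample SD.
    standardized = sorted((r - m) / s for r in residuals)
    # Plotting positions (i - 0.5) / n, an assumed convention.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, standardized))


def scale_location(fitted, standardized):
    """Pair each fitted value with the square root of the absolute standardized residual."""
    return [(f, sqrt(abs(r))) for f, r in zip(fitted, standardized)]


# Hypothetical residuals from some fitted model.
example_residuals = [0.4, -1.1, 0.3, 1.6, -0.9, -0.3]
points = qq_points(example_residuals)  # points near a straight line suggest normality
```

If the points returned by `qq_points` lie close to a straight line, the normality assumption is plausible. A roughly flat cloud from `scale_location` is consistent with constant variance.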

2.4.6 Box-Cox Transform

The Box-Cox transform can be used to correct non-normality or non-constant variance.14 It uses the power transformation $y^{\lambda}$, where $\lambda$ is a parameter which needs to be estimated. A problem arises when $\lambda = 0$, since the transformation is then worthless ($y^0 = 1$ for every observation). A fix for this is to use:

$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \log(y), & \lambda = 0 \end{cases}$$

The parameter $\lambda$ can be estimated computationally by minimizing the residual sum of squares from the fitted model, $SS_{Res}(\lambda)$. The wanted $\lambda$ is the one which minimizes this function.

2.5 Multicollinearity

The problem of multicollinearity arises when there exist near-linear dependencies among the regressor variables, and it has a serious impact on the least-squares estimators.15 Strong multicollinearity between regressors leads to large variances and covariances for the least-squares estimators of the regression coefficients.16

2.5.1 VIF

The variance inflation factor can be used to identify multicollinearity. It is based on the matrix $C = (X'X)^{-1}$, where the $j$-th diagonal element of $C$ can be written as $C_{jj} = (1 - R_j^2)^{-1}$, where $R_j^2$ is the coefficient of determination obtained when $x_j$ is regressed on the remaining regressors.17 If $x_j$ is linearly dependent on some of the remaining regressors, $C_{jj}$ will be large, and since the variance of the $j$-th regression coefficient is $C_{jj}\sigma^2$, $C_{jj}$ can be viewed as the factor by which the variance of the least-squares estimator is increased because of the near-linear dependencies with other regressors. $VIF_j$ is therefore the following.

$$VIF_j = C_{jj} = (1 - R_j^2)^{-1}$$

A VIF which exceeds 5 or 10 is considered an indication that multicollinearity affects the regression coefficients negatively.18

2.6 Detecting leverage and influential observations

When creating a multiple linear regression model it is important to investigate whether leverage points or influential observations exist within the chosen dataset. The difference between leverage points and influential observations is that leverage points lie approximately along the regression line but with abnormal x-values, while influential observations depart from the regression line.

Influential points have a noticeable impact on the regression model, and the model will be drawn towards these points.19 This is problematic since a model could be severely affected by relatively few points. Influential points that have a bad impact on the model should, therefore, be considered for removal.

While not all points with unusual x-values are influential, they can potentially play an important role in determining the properties of the regression model. Remote leverage points could have a disproportionate impact on the model parameters and should, therefore, be identified and handled.20 Diagnostics to detect influential points are presented below.

2.6.1 Cook’s distance

Cook’s distance is a diagnostic tool for detecting influential and leverage points. It measures the squared distance between the least-squares estimate $\hat{\beta}$ based on all $n$ points in the dataset and the estimate $\hat{\beta}_{(i)}$ which is obtained by deleting the $i$-th point.21 Cook’s distance can be expressed as

$$D_i(X'X, pMS_{res}) = \frac{(\hat{\beta}_{(i)} - \hat{\beta})' X'X (\hat{\beta}_{(i)} - \hat{\beta})}{pMS_{res}}, \quad i = 1, 2, \dots, n \tag{3.7}$$

where $MS_{res}$ is the mean square of the residuals and $p$ is the number of model parameters (the regressors plus the intercept). Observations with large $D_i$ values have a noticeable influence on the least-squares estimators and need to be handled in some way. This thesis will classify $D_i$ values as large when they are greater than 1.

2.6.2 Residuals vs Leverage plot

The residuals versus leverage plot is a useful plot for detecting influential observations. Data points with high influence affect the model greatly, and the model would not be the same if such a point were eliminated. An observation which is not influential would not change the model much if eliminated. The residuals vs leverage plot contains dashed lines representing certain values of Cook’s distance. Influential observations can be identified since they fall outside the dashed lines representing a Cook’s distance equal to 1.

2.7 Hypothesis testing

2.7.1 F-Statistic

The F-test is used to determine whether two populations’ variances are equal, and does this by comparing the variances between the populations and within the populations. This is useful when determining whether the linear regression model in question provides a better fit to the data than the model with no regressor variables. The two hypotheses of the F-test for the multiple regression model are the following.22

- The null hypothesis: $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
  This means that the model with no regressor variables fits the data as well as the model in question.
- The alternative hypothesis: $H_1: \beta_j \neq 0$ for at least one $j$
  This means that the model in question fits the data better than the model with no regressor variables.

A rejection of the null hypothesis implies that at least one of the regressors is statistically significant. The F-statistic, which can be acquired through an ANOVA table, can be used when deciding whether to reject the null hypothesis and is defined as follows.

$$F_0 = \frac{MS_R}{MS_{res}}$$

Here $MS_R$ is the mean square due to regression and $MS_{res}$ is the mean square due to the residuals. If the null hypothesis holds, this ratio would be expected to be close to one, since the variances between and within the populations would be the same and the model in question would be as good as the model with no variables. A large positive F-statistic that is far away from one would, therefore, be the desired result.

The P-value, which is also provided through an ANOVA table, is the probability of obtaining an F-statistic at least as large as the one acquired if the null hypothesis is true. Because of this, the corresponding P-value must always be used alongside the F-statistic. If the P-value is lower than the set significance level, the null hypothesis can be rejected.

2.7.2 t-test

A t-test is used to test the significance of an individual regression coefficient and is based on the t-distribution. It is important to note that this tests the significance of a coefficient while the remaining regressors are included in the model. The hypotheses are as follows.

$$H_0: \beta_j = 0, \quad H_1: \beta_j \neq 0$$

The test statistic $T_0$ is computed as

$$T_0 = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}$$

The null hypothesis fails to be rejected if

$$-t_{\alpha/2,\, n-2} \le T_0 \le t_{\alpha/2,\, n-2}$$

14 ibid p. 182
15 ibid p. 285
16 ibid p. 289
17 ibid p. 296
18 ibid p. 296
19 ibid p. 211
20 ibid p. 212
21 ibid p. 215
22 ibid p. 84

2.8 Variable selection and model selection

2.8.1 All possible regression

All possible regression is a method for reducing a model in order to possibly find a better model. The method generates all possible regression models based on the available regressors and then compares these models. There are several different criteria that can be used to compare the models, which means that it is seldom possible to arrive at a single final model.
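The enumeration step of all possible regression can be sketched in Python with the standard library. The regressor names below are borrowed from the thesis’s contents list purely as labels, the function name `all_subsets` is hypothetical, and the model fitting and comparison criteria are deliberately left out of this sketch.

```python
from itertools import combinations


def all_subsets(regressors):
    """Yield every subset of the candidate regressors, including the empty
    subset, which corresponds to the intercept-only model."""
    for size in range(len(regressors) + 1):
        for subset in combinations(regressors, size):
            yield subset


# Hypothetical candidate list used only for illustration.
candidates = ["Rating", "Genre", "Sequel"]
models = list(all_subsets(candidates))  # one tuple of regressor names per candidate model
```

With $k$ candidate regressors the generator yields $2^k$ subsets, so the number of models to fit doubles with every added regressor.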

All possible regression is preferable to types of stepwise regression since it is more thorough. A downside of the method is the computational load, since there are $2^k$ different models to examine.

2.8.2 Cross-validation

Cross-validation is a method to examine the predictive power of a model. The principle behind cross-validation is that the model is fitted to a specific part of the dataset, called the training set, and then tested on the rest of the data, called the test set. k-fold cross-validation means that the data is randomly split into k folds. One fold is used as a test set and the remaining k-1 folds are used as the training set. This process is then repeated for each fold. The k different results can th...
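The fold construction in k-fold cross-validation described above can be sketched as follows. This is a minimal illustration: the function name `k_fold_indices`, the fixed seed and the sizes used are assumptions, and the model fitting and scoring on each split are left out since they depend on the chosen model.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Randomly split the indices 0..n-1 into k folds and yield (train, test)
    index lists, using each fold exactly once as the test set."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)      # random assignment of observations
    folds = [indices[i::k] for i in range(k)]  # k folds of nearly equal size
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical sizes: 10 observations split into 5 folds.
splits = list(k_fold_indices(n=10, k=5))
```

Each element of `splits` holds the training indices (the other k-1 folds) and the test indices (the held-out fold) for one round.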
