
Originally Posted by jimbobodoll
Variation in average rating and number of raters over time
In order to test the hypothesis that the average Warseer rating of WD has increased over time, simple bivariate correlational analyses were first carried out. Results revealed that over the two years and one month for which we have data, the average Warseer rating of WD has significantly increased (r(23) = 0.52, p < 0.01). However, this is not the end of the story.
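For anyone wanting to check the arithmetic, here is a minimal sketch that computes Pearson's r from scratch. It also converts r to the t statistic conventionally used to test it, t = r * sqrt(df / (1 - r^2)) with df = n - 2. It assumes the reported r(23) means 23 degrees of freedom, i.e. 25 monthly issues, which matches two years and one month of data. The function names are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3 long")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for testing r against zero, with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df / (1 - r * r)), df


# Reported r(23) = 0.52 implies df = 23, i.e. n = 25 monthly issues.
t, df = r_to_t(0.52, 25)
print(f"t({df}) = {t:.2f}")
```

With r = 0.52 and n = 25 this gives t(23) of about 2.92. That exceeds the two-tailed critical value of roughly 2.81 at the .01 level, which is consistent with the reported p < 0.01.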
To further test the relationship between issue number and average rating of WD, the number of monthly ratings of WD was also examined using the same correlational analysis as before. Findings indicated, firstly, that as the monthly poll has gone on, significantly fewer people have been reporting their ratings of WD (r(23) = -0.63, p < 0.01). Secondly, there is some indication that the number of people rating WD each month might be related to that month’s average rating, such that months with more raters tend to have a lower average rating. I say “some indication” because the message from the correlations is mixed: whilst Pearson’s correlation suggests no association (r(23) = -0.21, p > 0.05), Spearman’s suggests otherwise (ρ(23) = -0.43, p < 0.05).
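Spearman's ρ is simply Pearson's r computed on ranks. It therefore measures whether a relationship is monotonic rather than whether it is a straight line, and that is how the two statistics can disagree. The sketch below illustrates this with made-up numbers (not the Warseer data), giving tied values their average rank.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up numbers: the rating falls steadily, then drops sharply.
raters = [60, 70, 80, 90, 100, 150]
avg_rating = [6.0, 5.9, 5.8, 5.7, 5.6, 3.0]
print(round(pearson_r(raters, avg_rating), 2), spearman_rho(raters, avg_rating))
```

Here the ranks line up perfectly, so Spearman's ρ is exactly -1. Pearson's r falls short of -1 because the points do not lie on a straight line.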
Taken together, these findings might suggest a three-variable relationship known as “mediation”. That is to say, over time the number of Warseer raters of WD’s quality has decreased, and this decrease has been at least partially responsible for the significant increase we have observed in the average rating of WD. Providing stronger evidence for this relationship requires more complicated analyses than those conducted so far (we’d need path analysis), and I’m not sure they would even return reliable results, given the nature of our data and the number of issues we have reviewed so far as a forum.
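The sketch below gives a rough illustration of what a mediation analysis estimates. It uses the regression-based product-of-coefficients approach, which is one common method and not necessarily the path analysis mentioned above. Here a is the slope of raters on issue number, and b is the slope of rating on raters while controlling for issue number. The product a*b is the indirect (mediated) effect. The data are invented for illustration, and no significance test (such as a Sobel test or a bootstrap) is included.

```python
def centred_cross(u, v):
    """Sum of cross-products of deviations from the mean."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def mediation_paths(x, m, y):
    """Product-of-coefficients mediation estimates via OLS.

    x: predictor (issue number), m: mediator (number of raters),
    y: outcome (average rating). Returns (a, b, c, c_prime).
    """
    sxx, smm = centred_cross(x, x), centred_cross(m, m)
    sxm, sxy, smy = centred_cross(x, m), centred_cross(x, y), centred_cross(m, y)
    a = sxm / sxx                      # m ~ x
    c = sxy / sxx                      # y ~ x (total effect)
    det = sxx * smm - sxm ** 2         # y ~ x + m, solved by Cramer's rule
    b = (sxx * smy - sxm * sxy) / det  # effect of m holding x constant
    c_prime = (smm * sxy - sxm * smy) / det  # direct effect of x
    return a, b, c, c_prime


# Invented data, NOT the Warseer figures.
issue = list(range(1, 11))
raters = [150 - 7 * i + e for i, e in zip(issue, [3, -2, 4, -1, 0, 2, -3, 1, -4, 0])]
rating = [8 - 0.02 * r + e for r, e in
          zip(raters, [0.1, -0.2, 0.0, 0.15, -0.1, 0.05, 0.2, -0.15, 0.0, -0.05])]
a, b, c, c_prime = mediation_paths(issue, raters, rating)
print(f"a={a:.2f} b={b:.3f} indirect={a * b:.3f} direct={c_prime:.3f} total={c:.3f}")
```

For ordinary least squares, the total effect splits exactly into direct and indirect parts, c = c' + a*b. "Partial mediation" corresponds to both parts being non-zero.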
Testing the nature of these relationships
As with the analysis I conducted last time, I also thought I’d examine the significant relationships above to try to identify the form they take. This seemed especially appropriate given the mixed message concerning the number of ratings and the average rating: if the relationship is not linear (a straight line), then Pearson’s correlation is an unreliable statistic to use.
To test the relationships identified above, a variety of mathematical relationships were fitted with the aim of identifying the one that explained most of the variation in average rating. To start, though, the hypothesis that the first year’s average rating would be lower than the second year’s was tested using a t-test. Results revealed that the second year’s average rating was significantly higher than the first (t(23) = -2.91, p < .01).
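Here is a minimal sketch of the year-on-year comparison. It assumes Student's independent-samples t-test with pooled variance; the post doesn't say which variant was used, and Welch's test is the usual alternative. It also assumes 12 issues in one year and 13 in the other, which gives df = 12 + 13 - 2 = 23, matching the reported t(23). The negative sign simply comes from subtracting the second year's mean from the first year's.

```python
import math
from statistics import mean, variance


def student_t(group1, group2):
    """Independent-samples t-test assuming equal variances (pooled).

    Returns (t, df); t is negative when group1's mean is the lower one.
    """
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance, weighted by each group's df.
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, df


t, df = student_t([1, 2, 3], [4, 5, 6])
print(f"t({df}) = {t:.2f}")
```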
Examining whether this relationship was linear provided some small indication that we might be observing a “tailing off”: although a cubic relationship explained more variance than a linear one (28% vs. 27%), the difference was small. The attached graph shows that the average rating seems to be plateauing at around 5.5, although whether this will last is something only time (and Warseer ratings) will tell.
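The “variance explained” figures are R² values from least-squares polynomial fits of increasing degree. Below is a sketch in plain Python. It centres x before fitting to keep the normal equations well conditioned, which does not change R². Adding higher-order terms can never lower R², so a one-point gain from linear to cubic is weak evidence for the cubic.

```python
def polyfit_r2(x, y, degree):
    """Least-squares polynomial fit of y on x; returns (coefficients, R^2).

    x is centred on its mean first, so coefficient j multiplies
    (x - mean(x)) ** j. Centring leaves R^2 unchanged.
    """
    n = len(x)
    xbar = sum(x) / n
    xc = [xi - xbar for xi in x]
    p = degree + 1
    # Normal equations (X'X) beta = X'y with X[i][j] = xc[i] ** j.
    A = [[sum(u ** (j + k) for u in xc) for k in range(p)] for j in range(p)]
    rhs = [sum(u ** j * v for u, v in zip(xc, y)) for j in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= factor * A[col][k]
            rhs[r] -= factor * rhs[col]
    # Back substitution.
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (rhs[r] - sum(A[r][k] * beta[k] for k in range(r + 1, p))) / A[r][r]
    ybar = sum(y) / n
    fitted = [sum(b * u ** j for j, b in enumerate(beta)) for u in xc]
    ss_res = sum((v - fv) ** 2 for v, fv in zip(y, fitted))
    ss_tot = sum((v - ybar) ** 2 for v in y)
    return beta, 1 - ss_res / ss_tot


# Made-up series, not the Warseer data.
x = list(range(1, 9))
y = [xi ** 3 - 2 * xi for xi in x]
for d in (1, 2, 3):
    print(d, round(polyfit_r2(x, y, d)[1], 4))
```

With the forum's data, x would be the issue number (or the number of raters) and y the monthly average rating.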
The difference was far greater for the relationship between the number of WD raters and the average rating they give. Whilst the linear relationship could explain only 5% of the variance (very, very low!), the quadratic explained 25% and the cubic 35%! Furthermore, as more of the variance was explained, the relationship finally became statistically significant for both the quadratic and the cubic models (p < 0.05). This explains the difference between the Pearson and Spearman correlations observed earlier: the relationship between the number of raters and the average rating is not linear but is best described as cubic (an “S” shape). The highest average rating came when around 80 people rated, and the lowest when 150 rated. Perhaps this indicates that the WD rating thread serves as an outlet for dissatisfied Warseer members?
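One way to read off where a fitted cubic peaks and bottoms out is to set its derivative to zero and solve the resulting quadratic. The sign of the second derivative then tells you which point is the maximum and which is the minimum. This sketch assumes the coefficients are on the raw x scale (for example, the number of raters). For an S-shaped fit with a positive cubic term, the maximum falls at the smaller x and the minimum at the larger x, the same ordering as the peak near 80 raters and the low at 150 described above.

```python
import math


def cubic_turning_points(b0, b1, b2, b3):
    """Stationary points of y = b0 + b1*x + b2*x**2 + b3*x**3.

    Returns a list of (x, y, kind) with kind 'max' or 'min', sorted by x.
    """
    # dy/dx = b1 + 2*b2*x + 3*b3*x**2 = 0, a quadratic in x.
    a, b, c = 3 * b3, 2 * b2, b1
    if a == 0:
        raise ValueError("not a cubic")
    disc = b * b - 4 * a * c
    if disc <= 0:
        return []  # monotonic curve: no separate peak and trough
    roots = sorted([(-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a)])
    points = []
    for x in roots:
        y = b0 + b1 * x + b2 * x ** 2 + b3 * x ** 3
        curvature = 2 * b2 + 6 * b3 * x  # second derivative
        points.append((x, y, "max" if curvature < 0 else "min"))
    return points


print(cubic_turning_points(0, -3, 0, 1))
```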
Variation in ratings by focus of issue
Next, a statistical analysis called an Analysis of Variance (ANOVA) was used to test two hypotheses: 1) that the average Warseer rating of WD would vary with that issue’s focus; and 2) that the number of Warseer members rating WD would vary with that issue’s focus.
Results revealed that neither the number of ratings nor the average Warseer rating varied significantly with the focus of WD each month (F(2, 22) = 1.784, p = 0.191; and F(2, 22) = 0.693, p = 0.511, respectively).
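Here is a sketch of the one-way ANOVA behind those F values. With three focus categories and 25 issues, the degrees of freedom are (3 - 1, 25 - 3) = (2, 22), as reported. The group sizes used in the checks are hypothetical. If SciPy is available, `scipy.stats.f_oneway` computes the same F along with its p-value.

```python
def one_way_anova(*groups):
    """One-way between-groups ANOVA. Returns (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: scatter of values around their own group mean.
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means))
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```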
Caveat
The nature of the data we have is termed “multi-level” (or “hierarchical linear” if you are in the USA; don’t ask me why the terminology varies, I’ve no idea), which refers to the fact that we have multiple raters of WD “nested within” each month of its rating. As a consequence, special analyses are needed to take this nesting into account; without them, the validity of any statistics you run can be reduced. Unfortunately, I don’t believe this is possible given our data: we’d need to track the trends in individual raters over time, not just the monthly average. As a result, the above results are not as reliable as I’d like, but they do paint a rough picture of what’s going on.
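One way to see how much the nesting matters is the intraclass correlation, ICC(1). This is the share of rating variance that lies between months rather than between raters within a month. The sketch below assumes you have individual ratings grouped by month (the lists shown are hypothetical). Because rater numbers differ from month to month, it uses the usual adjusted average group size n0 for unequal group sizes. It quantifies how strongly ratings cluster by month but does not, by itself, fix the problem described above.

```python
def icc1(groups):
    """ICC(1) for ratings nested in groups (here: individual ratings within months).

    Uses one-way ANOVA mean squares and the adjusted average group size n0,
    which handles unequal group sizes.
    """
    g = len(groups)
    sizes = [len(grp) for grp in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(grp) for grp in groups) / n_total
    means = [sum(grp) / len(grp) for grp in groups]
    ms_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means)) / (g - 1)
    ms_within = sum(sum((v - m) ** 2 for v in grp)
                    for grp, m in zip(groups, means)) / (n_total - g)
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (g - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical individual ratings for three months (not real poll data).
months = [[4, 5, 5, 6], [6, 7, 6], [3, 4, 4, 5, 4]]
print(round(icc1(months), 2))
```

An ICC near 0 would mean months barely differ, so the nesting matters little. A large ICC means ratings within a month are far from independent.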