% DRAFT: Review: "Ego Depletion", JDW 2010
% Michael Stone
% April 20, 2013

Earlier this weekend, I read a fun [paper][pdf] ([SAGE]) by Job, Dweck, and Walton questioning the "strength" model of willpower, I think as part of a much larger research program on whether and how our ["self-theories"][ST] influence, e.g., our ability to perform, or to persist in performing, difficult, frustrating, or tiring tasks.

[pdf]: http://www.stanford.edu/~gwalton/home/Publications_files/Job,%20Dweck,%20%26%20Walton,%202010.pdf
[SAGE]: http://pss.sagepub.com/content/21/11/1686
[ST]: http://www.amazon.com/Self-theories-Motivation-Personality-Development-Psychology/dp/1841690244

Some things I really liked about this paper were that:

* it asks a valuable research question,
* it uses an initial descriptive study to motivate a randomized experiment, and
* it considers some alternate explanations of the available data.

However, there were also some parts that bothered or confused me. In the hope that my bothers and confusions may be of some use to others, I've written up several "bug reports" below.

Okay, here we go:

* Selection bias: "people" is a great and accessible gender-neutral plural noun, but I nevertheless wish it were clearer in the abstract and conclusion that the "people" whom the results most directly describe are mostly (all?) undergraduate university students.

* Ethical considerations: I think it much more likely than not that the described research was approved by the Stanford or University of Zurich IRBs as minimal-risk human-subjects research, but I still wish that the paper made it clear one way or the other, e.g., with a link to a record of the approved protocol. (Note: the [2012 PSS submission guidelines][PSS guidelines] now request this information, but maybe they didn't in 2009-2010?)

[PSS guidelines]: http://www.psychologicalscience.org/index.php/publications/journals/psychological_science/ps-submissions

* Modeling equations: the JDW authors report that they used logistic hierarchical linear models to comprehend their data, and they provide some useful coding information, but their paper doesn't contain the modeling equations for any of the models that they fit, thereby making it needlessly difficult to understand the meaning of the regression coefficients that they report. (Note: one of my favorite books, [ALDA], has lots of really nice examples of how to write about hierarchical linear models like the ones that I imagine were used here; see the sketch just below for the sort of thing I mean.)

[ALDA]: http://www.amazon.com/Applied-Longitudinal-Data-Analysis-Occurrence/dp/0195152964
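To make the request concrete, here is my own hedged guess at the kind of two-level specification I was hoping to see (not the authors' actual model, just the ALDA-style write-up of one plausible fit), assuming Stroop trials nested within participants, with the depletion condition and the (centered) implicit-theories score treated as participant-level predictors:

$$\operatorname{logit}\,\Pr(\text{MISTAKE}_{ij} = 1) = \pi_{0j}$$

$$\pi_{0j} = \gamma_{00} + \gamma_{01}\,\text{DEPLETE}_j + \gamma_{02}\,\text{THEORY}_j + \gamma_{03}\,(\text{DEPLETE}_j \times \text{THEORY}_j) + u_{0j}, \qquad u_{0j} \sim N(0, \sigma^2_u)$$

Here $i$ indexes trials and $j$ indexes participants. Written this way, it would be immediately clear that $\gamma_{03}$ is the depletion-by-theory interaction on which the headline claim rests, and that every reported coefficient lives on the log-odds scale.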
* Effect size: the research agenda underlying all four reported study designs seems to posit that particular patterns of differences in treatment- and control-group sample Stroop-test accuracy statistics can falsify (or at least cast doubt on) the Baumeister et al. "strength model of self-control", but JDW do not explain how their measured effect size should influence our belief in the "strength" model. (Note: the [2012 PSS submission guidelines][PSS guidelines] now also require this information.)

* Graphical integrity: the JDW paper contains five plots, numbered "Figure 1", "Figure 2", "Figure 3-A", "Figure 3-B", and "Figure 3-C". Examining just plots "1" and "3-B":

* Plot 1: according to the legend and axis labels for this plot, each record in the underlying dataset has been labeled with one of five conditions, namely "Nondepleting + Nonlimited-Resource Theory", "Nondepleting + Limited-Resource Theory", "Depleting + Nonlimited-Resource Theory", "Depleting + Limited-Resource Theory", or "Not Labeled". But how was this labeling done? According to the fine print in the figure caption:

    > "The limited-resource-theory group represents participants 1
    > standard deviation above the mean on the implicit-theories
    > measure. The nonlimited-resource-theory group represents
    > participants 1 standard deviation below the mean on the
    > implicit-theories measure."

    There are a couple of problems here:

    1. Ambiguity: I *think* the claim in the figure caption is intended to mean something like "the limited-resource-theory group represents participants whose score on the implicit-theories measure was *at least* one standard deviation above the mean", but I can't tell for sure.

    2. Distributional assumptions: labeling participants based on z-scores implicitly assumes that the underlying distribution of scores is normally distributed, but no evidence is given that this is so. Why should I believe it to be true?

    3. Power: assuming that I read the caption correctly, how many observations were thrown out as a result of being unlabeled? (If the scores really were roughly normal and the grouping really does mean "at least one standard deviation", then something like two-thirds of the participants would fall in the unlabeled middle.)

    Zooming out, though, there are even bigger problems:

    1. Bad summarization: this plot only shows one number for each condition, yet each condition ostensibly labels many records. In short: "where are the box plots"?

    2. Unnecessary summarization: why group the participants at all? Why not just draw the scatterplot of *all* participants' mistake frequencies as a function of their implicit-theories-measure scores, perhaps faceted or colored by their depletion treatment condition? Then you could plot the fitted models as density heat-maps in the background, thereby revealing outliers or other model-fitting problems! (Note: Hadley Wickham's [ggplot2] package makes this kind of plotting *super fun and easy*; see the sketch just after these plot notes.)

* Plot 3-B: all five plots have dependent measures with labels that begin with the prefix "Probability of a Mistake", and four of the five figures have ratio scales for these measures: that is, their scales cover intervals ranging from $[0, 0.08]$ to $[0, 0.12]$. Unlike all the other plots, Plot 3-B's scale is presented as an interval scale, covering the interval $[0.20, 0.45]$. Why? (Just to devote more ink to showing the measured inter-class differences? Or is there some deeper confusion about what scale matters for measuring effect sizes?)
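For what it's worth, here's roughly the plot I have in mind, as a hedged [ggplot2] sketch. The data frame, its column names, and the per-condition GLM smoother are all made up for illustration; the real plot would of course use the authors' data and their fitted hierarchical models:

```r
library(ggplot2)

# Toy placeholder data, purely so that the sketch runs end-to-end; the real
# input would be one row per participant with their implicit-theories score,
# their observed Stroop mistake rate, and their depletion condition.
set.seed(1)
participants <- data.frame(
  theory_score = rnorm(80),
  mistake_rate = runif(80, 0, 0.15),
  depletion    = rep(c("Nondepleting", "Depleting"), each = 40)
)

# One point per participant, a fitted trend per condition, and facets by
# condition, so that outliers and model-fitting problems stay visible.
ggplot(participants, aes(x = theory_score, y = mistake_rate, colour = depletion)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "glm", method.args = list(family = quasibinomial)) +
  facet_wrap(~ depletion) +
  labs(x = "Implicit-theories score",
       y = "Observed probability of a mistake",
       colour = "Condition")
```

Even this crude version shows every participant, which is the point: a single summary number per condition can hide outliers and lurking nonlinearities that a scatterplot cannot.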
* Traceability: we've seen some otherwise interesting research brought down recently by simple slips, e.g., in calculation, model-fitting, and data entry. In the software world, we try to control for this sort of problem in a bunch of ways, most notably with [open source]. Anyway, as many others have requested, perhaps it's time to start providing links to the raw data and to the intermediate analysis results as part of the published supplemental materials? (Also, if the data are already up and I just couldn't see them because of the SAGE paywall, then maybe the issue is the need for more [open access], perhaps in the style of the [Episciences Project] ([intro][Gowers]) or in the style of [PLOS ONE], which I see that the JDW authors are [already exploring]; yay!)

[ggplot2]: http://ggplot2.org/
[open source]: http://en.wikipedia.org/wiki/Linus%27s_Law
[open access]: http://legacy.earlham.edu/~peters/fos/overview.htm
[Episciences Project]: http://episciences.org/
[Gowers]: http://gowers.wordpress.com/2013/01/16/why-ive-also-joined-the-good-guys/
[PLOS ONE]: http://www.plosone.org/
[already exploring]: http://www.plosone.org/article/info:doi/10.1371/journal.pone.0038680

* Next, a couple of smaller issues:

    * Reproducibility: what font and text were on the pages used for the "stimulus detection" task? (It would only take a few words to say...)
    * Validity: how does color-blindness affect the results derived from the Stroop-task performance measurements?
    * Blinding: were the randomized controlled trials also blinded?

* Finally, a review of Simmons et al.'s [researcher degrees of freedom][RDOF] checklist (note 1: introduced to me by [Shauna]; thanks, Shauna!; note 2: also, amusingly, [published][PSS-RDOF] in PSS!):

    * Simmons #1: *"Authors must decide the rule for terminating data collection before data collection begins and report this rule in the article."*
        * No data collection stopping rules were included in the paper.
    * Simmons #2: *"Authors must collect at least 20 observations per cell or else provide a compelling cost-of-data-collection justification."*
        * Per-cell observation counts were not included in the paper.
    * Simmons #3: *"Authors must list all variables collected in a study."*
        * I believe that all variables collected may be reported in the supplemental material published alongside the original paper, but only a subset were reported in the original paper itself.
    * Simmons #4: *"Authors must report all experimental conditions, including failed manipulations."*
        * I don't see a claim that all experimental conditions have been reported.
    * Simmons #5: *"If observations are eliminated, authors must also report what the statistical results are if those observations are included."*
        * I see some effort here, e.g., when the authors observe, for their longitudinal study, that the 59% of participants who did not complete the study were demographically similar to those who continued.
    * Simmons #6: *"If an analysis includes a covariate, authors must report the statistical results of the analysis without the covariate."*
        * Some effort is also made here, particularly in Note #1. (That being said, Note #2 and the "speed/accuracy tradeoff" covariates seem just like what Simmons et al. are asking about...)

[RDOF]: http://people.psych.cornell.edu/~jec7/pcd%20pubs/simmonsetal11.pdf
[PSS-RDOF]: http://pss.sagepub.com/content/22/11/1359
[Shauna]: http://www.shaunagm.net/blog/2012/02/degrees-of-freedom-arent-free/