We first sought to evaluate whether our primary results extended to observer ratings of videotaped parent–child interactions. Interactions took place in laboratory space restructured to resemble a living room (e.g. couch, coffee table, area rug, pictures). Each parent–child dyad (i.e. mother–twin 1, mother–twin 2, father–twin 1, and father–twin 2) was asked to complete an 8-min task that was mildly to moderately frustrating (i.e. use an Etch A Sketch to draw specific pictures, but parent and child could each use only one dial, thereby requiring cooperation). Interaction data were coded using the Twin Parent–Child Interaction System (Deater-Deckard et al. 1997). Each observer received approximately 85 h of training and was required to pass observation examinations before coding videotapes. Observers attended biweekly coder meetings for ongoing training and to prevent ‘rater drift’. Observer reliability was assessed by randomly assigning 10% of all tapes to be rated by a second observer, and then comparing the primary and secondary ratings using intraclass correlations. Following training, each video was watched three times: once to code the behavior of the parent, once to code the