We assessed the impact of non-response and missing data on our findings using MICE (Multivariate Imputation by Chained Equations) [24], implemented with the ice routine [25] in Stata. This procedure creates multiple copies of the dataset and, in each copy, replaces missing data with imputed values sampled from their predictive distribution [26]. The method relies on the Missing At Random (MAR) assumption, namely that, conditional on the other data included in the imputation model, there are no systematic differences between observed and missing values for a given variable. A number of additional variables were included to assist with the imputation: indicators of family adversity at enrolment, such as home overcrowding, financial problems, and lack of social support; earlier (more complete) measures of the predictive factors considered in this manuscript, such as mother’s social class; and other measures more proximal to the outcome, such as maternal and young person’s substance use behaviour in early adolescence and reported self-harm from an earlier clinic at 11 years. Missing data for the binary measure of self-harm was imputed