All simulations were conducted using SUR and the delta method (with sandwich variance estimates) to calculate confidence intervals. Stata code for this method is provided in the Web Appendix. Table 1 shows results obtained using several other methods of standard error and confidence interval estimation, for three randomly selected subsample data sets and three 2-sample data sets. In both the subsample and the 2-sample strong IV scenarios, the SUR/delta method, sequential regression, Fieller's theorem, and the Bayesian method produced very similar confidence intervals, while the bootstrap method produced slightly wider ones. In the moderate IV and weak IV scenarios, the delta and sequential regression methods again produced similar results; however, the Fieller, bootstrap, and Bayesian confidence intervals became substantially wider than those from the delta and sequential regression methods, and were often asymmetrical in the presence of a weak IV. This widening and asymmetry reflect the true sampling distribution of the IV estimate under a weak IV, which is long-tailed, asymmetrical, and poorly approximated by a normal distribution. In the complete-data MR setting, the reliance on normality assumptions for constructing