Ideally, all datasets should use the same definitions and the same adjustments (e.g. for age, gender, body mass index and so forth), and the adjustments to be used should be agreed by consensus upfront. This is easier if a central facility analyzes all the data but, as discussed above, this may often not be feasible. The definitions of the adjustment variables should also be agreed upfront; because some information may already have been collected, it is important to ensure that these definitions are sufficiently consistent across datasets. The same applies to the definitions of the main phenotypes/outcomes of interest. Some fields show extreme variability in how outcomes are defined, e.g. almost 500 different outcomes and analyses were reported in an evaluation of asthma pharmacogenetics [20]. Clearly, some consensus is needed in such fields. In other cases, if the differences are subtle, it may be reasonable to accept them, e.g. at least 3 different sets of criteria are commonly used to define Parkinson’s disease worldwide and their concordance is high. For most teams in the field, it is extremely difficult to