The case-control study, in which separate samples are drawn from ‘cases’ (people with a disease of interest, say) and from ‘controls’ (people without the disease), is one of the most common designs in health research. In fact, Breslow (1996) has described such studies as “the backbone of epidemiology”. We shall concentrate on biostatistical applications, but the basic design is an efficient sampling strategy whenever cases are rare, and examples are common in many other fields as well (business, social science, ecology and market research, for example). In particular, there has been a parallel development of much of the theory in the econometric literature on choice-based sampling (see, for example, Manski and McFadden 1981; Cosslett 1981).
There are two fundamentally different types of case-control study: (set) matched studies, in which each case is matched with one or more controls, and unmatched studies, in which the case and control samples are drawn independently. Unmatched studies may still involve loose “frequency matching”, in which the control sample is allocated across strata defined by basic demographic variables in such a way that the distribution of these variables in the control sample is similar to their expected distribution in the case sample. We are only concerned with unmatched studies here and, more specifically, only with the restricted class of population-based studies in which the controls (and occasionally the cases as well) are selected using standard survey sampling techniques.
An excellent introduction to the strengths and potential pitfalls of case-control sampling is given by Breslow (1996, 2004). One of the most important and difficult challenges confronting anyone designing such a study is to ensure that controls really are drawn from the same population, using the same protocols, as the cases. In the words of Miettinen (1985), cases and controls “should be representative of the same base experience”. Failure to ensure this adequately in some early examples led to case-control sampling being