A Multiple-Record Systems Estimation Method that Takes.docx
Biometrics 60, 510–516
June 2004

A Multiple-Record Systems Estimation Method that Takes Observed and Unobserved Heterogeneity into Account

Elena Stanghellini
Dipartimento di Scienze Statistiche, Università di Perugia, 06100 Perugia, Italy
email: elena.stanghellini@stat.unipg.it

and

Peter G. M. van der Heijden
Department of Methodology and Statistics, Utrecht University, P.O. Box 80.140, 3508 TC Utrecht, The Netherlands
email: p.vanderheijden@fss.uu.nl

Summary. We present a model to estimate the size of an unknown population from a number of lists that applies when the assumptions of (a) homogeneity of capture probabilities of individuals and (b) marginal independence of lists are violated. This situation typically occurs in epidemiological studies, where the heterogeneity of individuals is severe and researchers cannot control the independence between sources of ascertainment. We discuss the situation when categorical covariates are available and the interest is not only in the total undercount, but also in the undercount within each stratum resulting from the cross-classification of the covariates. We also present several techniques for determining confidence intervals of the undercount within each stratum using the profile log likelihood, thereby extending the work of Cormack (1992, Biometrics 48, 567–576).

Key words: Conditional independence model; Extended latent class model; Graphical models; Hierarchical log linear models; Identifiability; Observed information matrix; Profile log likelihood; Strata.

1. Introduction

In epidemiology multiple-record systems estimation methods are employed to estimate the size of an unknown population. Traditionally this was done using a log linear model. If there are two lists of individuals, it is necessary to assume independence of the two probabilities to be included in the lists. If there are more than two lists, and if other external information is not available, then it is possible to allow for interaction between the lists. An overview of this approach can be found in Bishop, Fienberg, and Holland (1975). In this traditional
approach an assumption is that in each record system the inclusion probabilities are homogeneous over the individuals, i.e., individuals have the same probability of being caught (Chao et al., 2001). Violation of this homogeneity assumption will result in dependence between lists (see, for example, International Working Group for Disease Monitoring and Forecasting, 1995). Many strategies are proposed to go around this restrictive and often unrealistic assumption and the objective of this article is to extend one of these approaches.

Originally Bishop et al. (1975) proposed to tackle possible heterogeneity by stratifying the sample in such a way that in each stratum the homogeneity would be fulfilled. This option is possible if there is an observed categorical covariate that is closely related to the probability of being in each of the lists. Later Alho (1990) and Huggins (1991) generalized this approach by proposing a model where the capture probabilities of each list are a function of observed covariates that can be categorical or continuous. These two strategies have in common that they allow for observed heterogeneity, i.e., the heterogeneity of capture probabilities can
be taken into account by observed covariates.

Alternatively, models are proposed for situations where such covariates are not available. One approach makes use of log linear models with homogeneous two-factor interactions (see International Working Group for Disease Monitoring and Forecasting, 1995); a second one makes use of latent class models, where it is assumed that the individuals can be classified in a small number of groups (the latent classes) with homogeneous capture probabilities and lists are independent conditional on the latent classes (see Agresti, 1994; Coull and Agresti, 1999); a third makes use of Rasch models, where it is assumed that the individuals differ on a continuous scale (see Darroch et al., 1993; Coull and Agresti, 1999; Fienberg, Johnson, and Junker, 1999). These three modeling strategies are closely related due to the relation that exists between the Rasch model, quasi-symmetry models, and latent class analysis (see Lindsey, Clogg, and Greco, 1991).

More recently, Biggeri et al. (1999) account for local dependence between a pair of lists in a latent class model. This approach is justified in epidemiological studies to estimate the size of a human population as there is unobserved heterogeneity between individuals and lists are set up for various purposes, so that the independence between lists is not guaranteed. A parallel approach, which makes use of the Rasch model, has been proposed by Bartolucci and Forcina (2001). Their model includes parameters for the associations between two lists, after conditioning on the unobserved heterogeneity and marginalizing with respect to the other lists.

In this article we extend the approach of Biggeri et al. (1999) by including a covariate in the model. We use a hierarchical log linear model with one categorical latent variable representing the unobserved heterogeneity. The model can be represented via a conditional independence graph (Whittaker, 1990; Edwards, 2000) with one unobserved node. As we are dealing with nonstandard latent variable models, we discuss identifiability by assessing the rank of the information matrix. We provide separate estimates for the undercount in each stratum as defined by the covariate and complement the estimates with confidence intervals evaluated by using the profile [...]

[...] $X_{hjs}$ with the convention that $S_R$ is running fastest, and let $E(X) = m$. In defining the likelihood for our model we should take into account that $U$ is not observed. Let $t = 2^R$. The marginal entries $Y_{js} = \sum_{h=1}^{H} X_{hjs}$ are also independent Poisson random variables, with expected values $E(Y_{js}) = \mu_{js}$. We denote by $Y$ the vector with the counts in the marginal table, stacked in a way that $Y = LX$ with $L = 1_H^T \otimes I_{Jt}$. It then follows that $E(Y) = \mu = Lm$. Moreover, as the entries in the cells where $s = 0, \ldots, 0$ are zero by construction, the first entry of each stratum $j$ is not observable. We denote with $Y^*$ the vector of the observable data obtained from $Y$ by removing the entries $1, t + 1, \ldots, (J - 1)t + 1$. Note that $Y^* = L^* X^*$, with $L^* = 1_H^T \otimes I_{J(t-1)}$. We will refer to $X^*$ as the complete data vector and to $Y^*$ as the incomplete data vector.

Let $n = \{n_j\}$ be the vector of the observed counts in stratum $j$ and $N = \{N_j\}$ be the vector of the total counts in stratum $j$. The model considered in this article can be written as $\log(m) = Z\beta$ with $Z$ a design matrix and $\beta$ the vector of parameters. The unknown parameters of the models are $N$ and $\beta$ and their estimates should be derived by maximizing the log likelihood $l(y^* \mid MY = N; \beta, N)$, with $M = I_J \otimes 1_t^T$,
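
The stacking and marginalization over the latent classes can be checked numerically. What follows is a minimal sketch under stated assumptions, not the authors' code: it assumes the complete-data vector is ordered with the latent class index slowest and the capture pattern fastest, reads the marginalizing matrices as Kronecker products of a row of ones with an identity matrix, and uses made-up sizes (H = 2 latent classes, J = 3 strata, R = 2 lists) with simulated Poisson counts. Because the data are simulated, the never-captured cell of each stratum is known here, which it would not be in a real study.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only (not taken from the paper).
H, J, R = 2, 3, 2          # latent classes, strata, lists
t = 2 ** R                 # capture patterns per stratum, including (0, ..., 0)

rng = np.random.default_rng(0)

# Complete-data means m and counts X, stacked with h slowest and pattern s fastest.
m = rng.uniform(1.0, 20.0, size=H * J * t)
X = rng.poisson(m)

# L sums over the unobserved latent classes: a row of ones (length H) kron I_{Jt}.
L = np.kron(np.ones((1, H)), np.eye(J * t))
Y = L @ X

# The same marginal table obtained by reshaping, as a cross-check.
Y_check = X.reshape(H, J * t).sum(axis=0)

# The first cell of each stratum (never captured) is not observable.
unobserved = np.arange(J) * t            # 0-based versions of 1, t+1, ..., (J-1)t+1
observed = np.ones(J * t, dtype=bool)
observed[unobserved] = False
Y_star = Y[observed]

# The same reduction applied to the complete data, then marginalized.
X_star = X.reshape(H, J * t)[:, observed].ravel()
L_star = np.kron(np.ones((1, H)), np.eye(J * (t - 1)))

# M sums the t cells of each stratum: I_J kron a row of ones (length t).
M = np.kron(np.eye(J), np.ones((1, t)))
N = M @ Y                                        # total count per stratum
n = Y_star.reshape(J, t - 1).sum(axis=1)         # observed count per stratum
undercount = N - n                               # known only because we simulated

print("stratum totals N:   ", N)
print("observed counts n:  ", n)
print("undercount N - n:   ", undercount)
```

In this sketch the undercount per stratum equals the simulated never-captured cell; in an application that cell is missing and has to be estimated from the fitted model.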