
Learning about polarization Using "How many X's do you know" surveys

Authors: Gelman, Andrew.
Page 9 of 28



overdispersions $\omega_k$ are constrained to the range $(1, \infty)$, and so it is convenient to put a model on the inverses $1/\omega_k$, which fall in $(0, 1)$.) The sample size of the McCarty et al. dataset is large enough that this noninformative model works fine; in general, however, it would be more appropriate to model the $\omega_k$'s hierarchically as well.
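A minimal sketch of why a model on $1/\omega_k$ is convenient. The noninformative model itself is specified before this page, so the sketch assumes, as a hypothetical choice, independent Uniform(0, 1) priors on the inverses. Sampling the inverse on the unit interval keeps every $\omega_k$ in $(1, \infty)$ automatically, and the change of variables gives the implied density $p(\omega_k) = \omega_k^{-2}$ on that range. The function names are illustrative only.

```python
import random


def sample_overdispersion(num_groups, rng):
    """Draw omega_k by sampling 1/omega_k uniformly on (0, 1).

    Hypothetical prior: independent Uniform(0, 1) on each inverse 1/omega_k.
    """
    omegas = []
    for _ in range(num_groups):
        inv = rng.random()          # uniform on [0, 1)
        while inv == 0.0:           # exclude 0 so that omega_k stays finite
            inv = rng.random()
        omegas.append(1.0 / inv)    # inv < 1, so omega_k > 1
    return omegas


def implied_density(omega):
    """Density of omega_k implied by Uniform(0, 1) on 1/omega_k.

    Change of variables: |d(1/omega)/d omega| = omega**-2 on (1, inf), zero elsewhere.
    """
    if omega <= 1.0:
        return 0.0
    return omega ** -2


rng = random.Random(0)
omegas = sample_overdispersion(12, rng)  # one omega_k per name group
```

Because the density $\omega^{-2}$ integrates to 1 over $(1, \infty)$, this choice is a proper prior even though it is flat on the inverse scale.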
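We complete the Bayesian model with a noninformative uniform prior distribution for the hyperparameters $\mu_\alpha, \mu_\beta, \sigma_\alpha, \sigma_\beta$. The joint posterior density can then be written as

$$p(\alpha, \beta, \omega, \mu_\alpha, \mu_\beta, \sigma_\alpha, \sigma_\beta \mid y) \propto \prod_{i=1}^{n} \prod_{k=1}^{K} \binom{y_{ik} + \xi_{ik} - 1}{\xi_{ik} - 1} \left(\frac{1}{\omega_k}\right)^{\xi_{ik}} \left(\frac{\omega_k - 1}{\omega_k}\right)^{y_{ik}} \prod_{i=1}^{n} \mathrm{N}(\alpha_i \mid \mu_\alpha, \sigma_\alpha^2) \prod_{k=1}^{K} \mathrm{N}(\beta_k \mid \mu_\beta, \sigma_\beta^2),$$

where $\xi_{ik} = e^{\alpha_i + \beta_k}/(\omega_k - 1)$, from the definition of the negative binomial distribution (this choice of $\xi_{ik}$ gives $y_{ik}$ mean $e^{\alpha_i + \beta_k}$ and variance $\omega_k e^{\alpha_i + \beta_k}$).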
The model as given has a nonidentifiability. Any constant $C$ can be added to all the $\alpha_i$'s and subtracted from all the $\beta_k$'s, and the likelihood will remain unchanged (since it depends on these parameters only through sums of the form $\alpha_i + \beta_k$). If we also add $C$ to $\mu_\alpha$ and subtract $C$ from $\mu_\beta$, then the prior density is unchanged as well. It would be possible to identify the model by anchoring it at some arbitrary point (for example, setting $\mu_\alpha$ to zero), but we prefer to let all the parameters float, since including this redundancy can speed the Gibbs sampler computation (van Dyk and Meng, 2001).
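The nonidentifiability can be checked numerically. This sketch uses hypothetical helper names and toy values: it evaluates the negative binomial log-likelihood and the normal prior terms before and after adding $C$ to each $\alpha_i$ and to $\mu_\alpha$ and subtracting it from each $\beta_k$ and from $\mu_\beta$. Both agree up to floating-point rounding.

```python
import math


def nb_loglik(alpha, beta, omega, y):
    """Negative binomial log-likelihood with xi_ik = e^{alpha_i + beta_k} / (omega_k - 1)."""
    total = 0.0
    for i, a in enumerate(alpha):
        for k, b in enumerate(beta):
            xi = math.exp(a + b) / (omega[k] - 1.0)  # alpha, beta enter only via a + b
            total += (math.lgamma(y[i][k] + xi) - math.lgamma(xi) - math.lgamma(y[i][k] + 1)
                      - xi * math.log(omega[k])
                      + y[i][k] * math.log((omega[k] - 1.0) / omega[k]))
    return total


def normal_logpdf(x, mu, sigma):
    """log N(x | mu, sigma^2)."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def log_prior(alpha, beta, mu_alpha, mu_beta, sigma_alpha, sigma_beta):
    """Sum of the normal prior terms for alpha_i and beta_k."""
    return (sum(normal_logpdf(a, mu_alpha, sigma_alpha) for a in alpha)
            + sum(normal_logpdf(b, mu_beta, sigma_beta) for b in beta))


# Toy values (hypothetical): 2 respondents, 3 groups.
alpha, beta, omega = [4.0, 5.0], [-3.0, -4.5, -2.5], [2.0, 1.5, 3.0]
y = [[2, 0, 5], [4, 1, 9]]
C = 1.7  # any constant works

alpha_s = [a + C for a in alpha]  # add C to every alpha_i
beta_s = [b - C for b in beta]    # subtract C from every beta_k
lik_before = nb_loglik(alpha, beta, omega, y)
lik_after = nb_loglik(alpha_s, beta_s, omega, y)
prior_before = log_prior(alpha, beta, 4.5, -3.3, 1.0, 0.8)
prior_after = log_prior(alpha_s, beta_s, 4.5 + C, -3.3 - C, 1.0, 0.8)  # shift mu's too
```

A sampler that lets the parameters float moves freely along this flat direction, which is why a separate identification step is needed when summarizing the draws.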
However, in summarizing the model we would like to identify the $\alpha_i$'s and $\beta_k$'s so that each $b_k = e^{\beta_k}$ represents the proportion of the links in the network that go to members of group $k$. We identify the model in this way by renormalizing the $b_k$'s for the rarest names (in the McCarty et al. survey, these are Jacqueline, Christina, and Nicole) so that they line up with their proportions in the general population. We renormalize to the rare names rather than to all 12 names because there is evidence that respondents have difficulty recalling all their acquaintances with common names (see Killworth et al., 2003, and also Section 4.2 below). Finally, since the rarest names asked about in our survey are female names, and people tend to know more persons of their own sex, we further adjust by adding half the discrepancy between a set of moderately popular male and female names in our dataset.
This procedure is complicated, but it is our best attempt at an accurate normalization for the general population (which is roughly half women and half men) given the particularities of the data we have at hand. In the future, it would be desirable to gather data on a balanced set of rare female and male names. The left panel of Figure 5 illustrates how, after renormalization, the rare names in the dataset have group sizes equal to their proportions in the population. This specific procedure is designed for the recall problems that exist in the McCarty et al. dataset; researchers working with different datasets may have to develop a procedure appropriate to their specific data.
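A minimal sketch of this renormalization for a single simulation draw. It assumes that each correction is the log of the ratio between the model's implied share $\sum_k e^{\beta_k}$ for a set of names and that set's known population share $P$; that the rare-girls' correction is combined with half the boys-minus-girls discrepancy; and that the resulting constant is added to the $\alpha_i$'s and $\mu_\alpha$ and subtracted from the $\beta_k$'s and $\mu_\beta$, the direction permitted by the nonidentifiability. Only Jacqueline, Christina, and Nicole come from the text; the other name labels, all $\beta_k$ and $\alpha_i$ values, and all population shares are hypothetical placeholders.

```python
import math


def log_share_ratio(beta, names, pop_share):
    """log of (sum over `names` of e^{beta_k}) divided by the population share of those names."""
    implied_share = sum(math.exp(beta[name]) for name in names)
    return math.log(implied_share / pop_share)


def renormalization_constant(beta, rare_girls, p_rare_girls,
                             mid_boys, p_mid_boys, mid_girls, p_mid_girls):
    """Rare-girls' correction plus half the boys-minus-girls discrepancy."""
    c1 = log_share_ratio(beta, rare_girls, p_rare_girls)
    c2 = (log_share_ratio(beta, mid_boys, p_mid_boys)
          - log_share_ratio(beta, mid_girls, p_mid_girls))
    return c1 + 0.5 * c2


def identify(alpha, beta, mu_alpha, mu_beta, c):
    """Shift the draw: add c to alpha_i and mu_alpha, subtract c from beta_k and mu_beta."""
    new_alpha = [a + c for a in alpha]
    new_beta = {name: b - c for name, b in beta.items()}
    return new_alpha, new_beta, mu_alpha + c, mu_beta - c


# Hypothetical draw: beta_k on the log scale, keyed by name (illustrative values only).
beta = {"Jacqueline": -5.0, "Christina": -4.6, "Nicole": -4.9,
        "girl_mid_1": -4.0, "girl_mid_2": -4.2,
        "boy_mid_1": -3.8, "boy_mid_2": -3.9}
rare_girls = ["Jacqueline", "Christina", "Nicole"]
mid_girls = ["girl_mid_1", "girl_mid_2"]  # hypothetical labels
mid_boys = ["boy_mid_1", "boy_mid_2"]     # hypothetical labels
alpha = [5.2, 6.1, 4.7]                   # hypothetical alpha_i draws

c = renormalization_constant(beta, rare_girls, 0.011,  # placeholder shares,
                             mid_boys, 0.030,          # not census values
                             mid_girls, 0.030)
alpha_id, beta_id, mu_alpha_id, mu_beta_id = identify(alpha, beta, 5.0, -4.0, c)
b_id = {name: math.exp(b) for name, b in beta_id.items()}  # identified b_k = e^{beta_k}
```

Subtracting the same constant from every $\beta_k$ rescales all $b_k$ by the common factor $e^{-C}$, so relative group sizes and every sum $\alpha_i + \beta_k$ are preserved. When the boys' and girls' ratios agree, so that the discrepancy term is zero, the rare names line up exactly with their population share.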
In summary, for each simulation draw of the vector of model parameters, we define the constant

$$C = C_1 + \tfrac{1}{2} C_2, \qquad (6)$$

where $C_1 = \log\left(\sum_{k \in G_1} e^{\beta_k} / P_{G_1}\right)$ adjusts for the rare girls' names, and $C_2 = \log\left(\sum_{k \in B_2} e^{\beta_k} / P_{B_2}\right) - \log\left(\sum_{k \in G_2} e^{\beta_k} / P_{G_2}\right)$ represents the difference between boys' and girls' names. In these expressions, $G_1$, $G_2$, and $B_2$ are the sets of rare girls' names (Jacqueline, Christina, and Nicole), somewhat popular girls'




©2008 All Academic, Inc.