I use hypothetical data in this example. The sum scores are the sum of 10 items, each with response categories coded 1 to 4. Hence, the sum score can range from 10 (10 × 1) to 40 (10 × 4). The data represent 300 people who belong to 6 groups: A to F.
set.seed(12345)
dt <- data.frame(grp = sample(LETTERS[1:6], 300, replace = TRUE))
dt$sum_score <- as.integer(cut(stats::rnorm(
  nrow(dt), -15 + 1.5 * as.integer(factor(dt$grp)), 15
), stats::qnorm(seq(0, 1, length.out = 32), 0, 15))) + 9
summary(dt)
#> grp sum_score
#> Length:300 Min. :10.0
#> Class :character 1st Qu.:13.0
#> Mode :character Median :18.0
#> Mean :19.6
#> 3rd Qu.:24.0
#> Max. :40.0
We can visualize the data using a boxplot:
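For example, with base R (any plotting approach works):
# distribution of sum scores within each group
boxplot(sum_score ~ grp, data = dt, xlab = "Group", ylab = "Sum score")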
Once we have a dataset with a grouping variable (grp in this example) and a sum score (sum_score in this example), we are ready to use the ugcflss model fitting function. It places a call to RStan, so expect the model to run for several seconds.
When running the model, we need to tell the package the minimum and maximum response categories on the items that were summed to create the sum score, as well as the number of items summed, here: 1, 4, and 10, respectively.
fitted_model <- ugcflss_fit_model(
  data = dt,
  grouping_variable = "grp", sum_score = "sum_score",
  minimum_item_response = 1, maximum_item_response = 4,
  chains = 1, # you can likely ignore this argument
  number_items = 10
)
#>
#> SAMPLING FOR MODEL 'ugcflss' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 4.4e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.44 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: Iteration: 1 / 1500 [ 0%] (Warmup)
#> Chain 1: Iteration: 150 / 1500 [ 10%] (Warmup)
#> Chain 1: Iteration: 300 / 1500 [ 20%] (Warmup)
#> Chain 1: Iteration: 450 / 1500 [ 30%] (Warmup)
#> Chain 1: Iteration: 600 / 1500 [ 40%] (Warmup)
#> Chain 1: Iteration: 750 / 1500 [ 50%] (Warmup)
#> Chain 1: Iteration: 751 / 1500 [ 50%] (Sampling)
#> Chain 1: Iteration: 900 / 1500 [ 60%] (Sampling)
#> Chain 1: Iteration: 1050 / 1500 [ 70%] (Sampling)
#> Chain 1: Iteration: 1200 / 1500 [ 80%] (Sampling)
#> Chain 1: Iteration: 1350 / 1500 [ 90%] (Sampling)
#> Chain 1: Iteration: 1500 / 1500 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 1.095 seconds (Warm-up)
#> Chain 1: 0.782 seconds (Sampling)
#> Chain 1: 1.877 seconds (Total)
#> Chain 1:
We see the iteration printout from Stan. Stan sometimes produces warnings about divergent transitions (see the adapt_delta argument in ?ugcflss_fit_model()) or maximum treedepth (see the max_treedepth argument in ?ugcflss_fit_model()). Setting both arguments to higher values than their defaults – adapt_delta = .99 instead of .9 or max_treedepth = 15 instead of 10 – when calling ugcflss_fit_model(..., adapt_delta = .99, max_treedepth = 15) can resolve these warnings. We have no such warnings here, so we proceed.
One way to examine Bayesian models is to simulate data from the model and then assess how well the simulated data reflect the original data; this is called posterior predictive checking. To assess how well the simulated data match the distribution of the original data, try:
# ppd_samples is the number of datasets we want to simulate
ugcflss_ppd_plot(fitted_model, ppd_samples = 500)
In the resulting plot, y is the density of the observed data and yrep shows the densities of the simulated datasets.
You could also do this by group:
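For example, assuming ugcflss_ppd_plot() accepts a grouping argument such as by_group (an assumption; check ?ugcflss_ppd_plot):
ugcflss_ppd_plot(fitted_model, ppd_samples = 500, by_group = TRUE) # by_group is an assumed argument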
For more detailed assessment, it is possible to directly request simulated datasets:
ppd_draws <- ugcflss_ppd(fitted_model, ppd_samples = 500)
dim(ppd_draws) # should be 500 rows by 300 people
#> [1] 500 300
The bayesplot package has several features for comparing this distribution to the original data. For example, we can compare the observed count for each value to the distribution of counts based on the model:
# the ppd draws are on the 1--4 scale, transform to sum scores:
ppd_sum_score <- ppd_draws * fitted_model$user_input$n_items
bayesplot::ppc_bars(dt$sum_score, ppd_sum_score)
Here, the model-implied distribution (the yrep error bars) should ideally contain the observed bars.
You could also check specific statistics, with the expectation that the model-implied distribution of the statistic includes the sample statistic:
# Checking the SD
bayesplot::ppc_stat(dt$sum_score, ppd_sum_score, stat = "sd")
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Or the 75th percentile
q75 <- function(y) quantile(y, .75)
bayesplot::ppc_stat(dt$sum_score, ppd_sum_score, stat = "q75")
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
We now turn to several outputs that help us understand the group differences.
Results are all on the 1 – 4 scale, i.e. assuming the items were averaged rather than summed. I made this choice because this is the scale I see reported in manuscripts.
overall_median <- ugcflss_describe(fitted_model, stat = "median")
overall_median
#> # A tibble: 1 × 8
#> variable mean sd q5.5 q94.5 rhat ess_bulk ess_tail
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 median 1.77 0.0653 1.66 1.87 0.999 832. 711.
The estimate of the median is 1.77. By default, I report 89% quantile intervals, i.e. the middle 89% of the distribution of the parameter. So one might say the median was 1.77 (SE = 0.065), 89% CrI [1.66, 1.87], where CrI stands for credible interval.
Statistics such as rhat, ess_bulk and ess_tail are parameter convergence statistics, i.e. can we trust that the parameter has converged to a final solution? ess stands for effective sample size, and we want this statistic in the hundreds. rhat under 1.01 or 1.05 (liberal) suggests parameter convergence. One way to make rhat smaller and ess_ larger is to increase the number of iterations when calling ugcflss_fit_model(). For example, you can modify the model fitting call: ugcflss_fit_model(..., warmup = 2000, sampling = 2000); both arguments are 750 by default.
Other potential statistics are stat = "median", stat = "mean", stat = "sd", or stat = "quantile". When requesting quantiles, also specify the tau argument to set the desired percentile. I might also want conventional 95% intervals:
overall_q78 <- ugcflss_describe(
fitted_model,
stat = "quantile", tau = .78, interval = .95
)
overall_q78
#> # A tibble: 1 × 8
#> variable mean sd q2.5 q97.5 rhat ess_bulk ess_tail
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 quantile 2.51 0.0782 2.36 2.68 1.00 752. 617.
The 78th percentile was 2.51 (SE = 0.078), 95% CrI [2.36, 2.68].
Those more familiar with Bayesian analysis can request these computed samples by calling ugcflss_describe(..., return_draws = TRUE).
We can also request any of the statistics above by group:
ugcflss_describe(fitted_model, stat = "sd", by_group = TRUE)
#> # A tibble: 6 × 8
#> variable mean sd q5.5 q94.5 rhat ess_bulk ess_tail
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 sd[A] 0.750 0.0502 0.670 0.823 1.00 879. 598.
#> 2 sd[B] 0.768 0.0423 0.694 0.833 0.999 799. 659.
#> 3 sd[C] 0.763 0.0470 0.688 0.834 1.00 798. 480.
#> 4 sd[D] 0.814 0.0433 0.746 0.885 1.00 614. 747.
#> 5 sd[E] 0.753 0.0471 0.675 0.824 1.00 585. 613.
#> 6 sd[F] 0.834 0.0488 0.761 0.913 1.00 424. 667.
These are the group standard deviations. Out of curiosity, I also show their sample estimates:
aggregate(I(sum_score / 10) ~ grp, dt, sd)
#> grp I(sum_score/10)
#> 1 A 0.6087974
#> 2 B 0.7200961
#> 3 C 0.6876214
#> 4 D 0.8904826
#> 5 E 0.6546732
#> 6 F 0.9851163
The model-based group standard deviations are more similar to one another than the corresponding sample estimates.
As with the entire sample, one can modify the default interval using interval = .xx or request draws using return_draws = TRUE.
We can perform pairwise comparisons using:
pairwise_mean_diff <- ugcflss_pairwise(fitted_model, stat = "mean")
pairwise_mean_diff
#> # A tibble: 15 × 13
#> variable mean sd q5.5 q94.5 rhat ess_bulk ess_tail group_1
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 mean[B]-mean[… 0.0323 0.0874 -0.0947 0.180 1.00 970. 586. B
#> 2 mean[C]-mean[… -0.0199 0.0914 -0.178 0.121 0.999 1098. 565. C
#> 3 mean[D]-mean[… 0.143 0.116 -0.0159 0.347 1.00 462. 678. D
#> 4 mean[E]-mean[… 0.105 0.106 -0.0443 0.275 0.999 653. 691. E
#> 5 mean[F]-mean[… 0.178 0.142 -0.0125 0.424 1.00 425. 712. F
#> 6 mean[C]-mean[… -0.0522 0.0929 -0.210 0.0908 0.999 578. 599. C
#> 7 mean[D]-mean[… 0.111 0.106 -0.0325 0.300 0.999 551. 565. D
#> 8 mean[E]-mean[… 0.0729 0.0981 -0.0635 0.239 1.00 739. 565. E
#> 9 mean[F]-mean[… 0.146 0.133 -0.0290 0.389 1.00 465. 674. F
#> 10 mean[D]-mean[… 0.163 0.126 -0.0138 0.372 1.00 370. 626. D
#> 11 mean[E]-mean[… 0.125 0.114 -0.0340 0.330 0.999 425. 640. E
#> 12 mean[F]-mean[… 0.198 0.150 -0.00488 0.445 1.00 306. 608. F
#> 13 mean[E]-mean[… -0.0378 0.104 -0.215 0.108 1.00 750. 678. E
#> 14 mean[F]-mean[… 0.0350 0.115 -0.144 0.240 1.00 1011. 731. F
#> 15 mean[F]-mean[… 0.0728 0.122 -0.101 0.290 1.00 695. 645. F
#> # ℹ 4 more variables: group_2 <chr>, group_1_i <int>, group_2_i <int>,
#> # p_direction <dbl>
We see pairwise mean differences, for example: the mean of group B exceeded the mean of group A by 0.03 (SE = 0.087), 89% CrI [-0.09, 0.18].
The p_direction column has the probability of direction. This is the probability that a coefficient whose estimate is positive (negative) is indeed positive (negative). Hence, there was a 63.9% chance that the 0.03 coefficient was positive. This is a Bayesian analog to p-values, and quantifies the evidence in favour of a particular direction for an effect.
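As a conceptual illustration (not the package's internal computation), the probability of direction for a difference with a positive point estimate is the share of posterior draws above zero. A sketch with stand-in normal draws:
# stand-in posterior draws for a difference with point estimate 0.03 and SE 0.087
diff_draws <- rnorm(4000, mean = 0.03, sd = 0.087)
mean(diff_draws > 0) # roughly the probability of direction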
We can also perform ratio and log-ratio comparisons. For example, here is the pairwise log-ratio comparison of group standard deviations:
ugcflss_pairwise(fitted_model, stat = "sd", comparison = "log-ratio")
#> # A tibble: 15 × 13
#> variable mean sd q5.5 q94.5 rhat ess_bulk ess_tail group_1
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 log(sd[B]/sd… 0.0241 0.0731 -0.0864 0.147 0.999 1442. 711. B
#> 2 log(sd[C]/sd… 0.0175 0.0723 -0.0946 0.138 1.00 1092. 422. C
#> 3 log(sd[D]/sd… 0.0826 0.0764 -0.0186 0.211 1.00 532. 607. D
#> 4 log(sd[E]/sd… 0.00471 0.0692 -0.105 0.114 1.01 1365. 657. E
#> 5 log(sd[F]/sd… 0.107 0.0840 -0.00643 0.247 1.00 352. 580. F
#> 6 log(sd[C]/sd… -0.00657 0.0676 -0.116 0.102 1.01 919. 519. C
#> 7 log(sd[D]/sd… 0.0585 0.0674 -0.0362 0.174 1.00 615. 604. D
#> 8 log(sd[E]/sd… -0.0194 0.0669 -0.137 0.0762 1.00 877. 629. E
#> 9 log(sd[F]/sd… 0.0829 0.0734 -0.0161 0.211 1.00 417. 466. F
#> 10 log(sd[D]/sd… 0.0651 0.0723 -0.0327 0.191 1.00 607. 634. D
#> 11 log(sd[E]/sd… -0.0128 0.0667 -0.124 0.0896 1.00 1245. 385. E
#> 12 log(sd[F]/sd… 0.0895 0.0815 -0.0158 0.230 1.01 407. 546. F
#> 13 log(sd[E]/sd… -0.0779 0.0754 -0.206 0.0235 1.00 358. 605. E
#> 14 log(sd[F]/sd… 0.0244 0.0600 -0.0658 0.132 1.00 835. 491. F
#> 15 log(sd[F]/sd… 0.102 0.0839 -0.00845 0.247 1.00 304. 490. F
#> # ℹ 4 more variables: group_2 <chr>, group_1_i <int>, group_2_i <int>,
#> # p_direction <dbl>
Ratios of positive numbers have an asymmetric scale. When the numerator is smaller, the ratio falls between 0 and 1; otherwise, it falls between 1 and infinity. Taking the log maps ratios between 0 and 1 to numbers below 0 and ratios greater than 1 to numbers above 0, making the comparison scale symmetric.
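A quick numerical illustration: a ratio and its reciprocal sit at equal distances from zero on the log scale.
log(2)     # 0.693
log(1 / 2) # -0.693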
comparison must be one of "difference" (default), "ratio", or "log-ratio".
Again, users who are more familiar with Bayesian statistics can request the computed samples of the pairwise comparisons by calling ugcflss_pairwise(..., return_draws = TRUE).
In addition to the earlier statistics, we can also compute the pairwise probability of superiority (PS). This statistic communicates how likely it is that members of one group will have responses greater than members of another group. For example, a PS of 60% implies that if we randomly recruited 100 pairs of respondents from two groups, A and B, members of group A would have the higher value in 60 of the pairs, while members of group B would have the higher value in 40 pairs. Thus a PS of 50% implies neither group tends to have the higher values.
To perform pairwise comparisons using the PS, call:
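A hedged example; the exact stat value for the PS is an assumption (check ?ugcflss_pairwise):
ugcflss_pairwise(fitted_model, stat = "ps") # "ps" is assumed to request the probability of superiority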
The PS is both the stat and the comparison; when requesting the PS, the value of comparison is ignored.
It is easier to parse several pairwise comparisons using plots. ugcflss can plot pairwise comparisons:
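The plotting call below is a guess at the interface; the plot described next compares group medians as ratios, but the actual function name and arguments may differ, so consult the package documentation:
# hypothetical function name and arguments
ugcflss_pairwise_plot(fitted_model, stat = "median", comparison = "ratio")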
The top-left cell shows the result of median(B) / median(A): its value is above 1, meaning the median of group B was greater than the median of group A, and there was a 56% chance this is true. The two values in brackets are the credible interval limits.
We do not show the probability of direction in the plot for the PS – it can be confusing.
Sometimes, specific values on scale scores have practical significance, so we may be interested in how likely respondents were to have a score that exceeded some threshold. We can request this statistic:
exceed_overall <- ugcflss_exceed_all(fitted_model)
exceed_overall
#> lo md hi points loc count
#> <num> <num> <num> <num> <int> <num>
#> 1: 0.874306979 0.90206206 0.92705604 1.0 1 0.903333333
#> 2: 0.797298623 0.83326797 0.86440571 1.1 2 0.836666667
#> 3: 0.727900422 0.76643016 0.80409885 1.2 3 0.756666667
#> 4: 0.677301132 0.71763368 0.75741674 1.3 4 0.720000000
#> 5: 0.629978634 0.67031363 0.71156876 1.4 5 0.663333333
#> 6: 0.570954525 0.61410009 0.65559179 1.5 6 0.613333333
#> 7: 0.522534953 0.56582378 0.60656280 1.6 7 0.560000000
#> 8: 0.485044769 0.52929652 0.57087149 1.7 8 0.526666667
#> 9: 0.444424793 0.48942157 0.53093379 1.8 9 0.486666667
#> 10: 0.399401169 0.44476171 0.48612688 1.9 10 0.440000000
#> 11: 0.359237602 0.40305155 0.44590578 2.0 11 0.396666667
#> 12: 0.320437471 0.36375040 0.40704253 2.1 12 0.363333333
#> 13: 0.284602978 0.32555519 0.36734884 2.2 13 0.323333333
#> 14: 0.247396519 0.28987459 0.32955799 2.3 14 0.290000000
#> 15: 0.214612090 0.25505329 0.29109428 2.4 15 0.246666667
#> 16: 0.184114329 0.22154398 0.25734352 2.5 16 0.226666667
#> 17: 0.159566496 0.19381006 0.22802531 2.6 17 0.190000000
#> 18: 0.139381230 0.17297864 0.20745035 2.7 18 0.173333333
#> 19: 0.123792450 0.15593847 0.18866198 2.8 19 0.160000000
#> 20: 0.108655749 0.13792314 0.16932708 2.9 20 0.140000000
#> 21: 0.090112564 0.11730328 0.14646380 3.0 21 0.123333333
#> 22: 0.071096556 0.09454403 0.12062316 3.1 22 0.096666667
#> 23: 0.057208680 0.07837233 0.10167262 3.2 23 0.076666667
#> 24: 0.048565334 0.06800139 0.08970437 3.3 24 0.070000000
#> 25: 0.041478285 0.05780314 0.07867568 3.4 25 0.063333333
#> 26: 0.032764012 0.04754509 0.06722579 3.5 26 0.046666667
#> 27: 0.026451694 0.03975143 0.05760578 3.6 27 0.040000000
#> 28: 0.021670989 0.03324832 0.04940079 3.7 28 0.036666667
#> 29: 0.014979117 0.02453229 0.03915094 3.8 29 0.026666667
#> 30: 0.005039976 0.01072672 0.02030725 3.9 30 0.006666667
#> 31: 0.000000000 0.00000000 0.00000000 4.0 31 0.000000000
#> lo md hi points loc count
Based on the model, 90.2% of respondents had a score greater than 1.0. If we simply count in the sample, we get 90.3%. The lo and hi columns are the 89% CrI by default, though these can be changed using ugcflss_exceed_all(fitted_model, interval = .xx).
We can also request this statistic by group (results not shown):
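A hypothetical call, assuming ugcflss_exceed_all() accepts a by_group argument like ugcflss_describe() does:
exceed_by_group <- ugcflss_exceed_all(fitted_model, by_group = TRUE) # by_group is an assumed argument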
One can also visualize these results:
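For instance, a minimal base-R sketch built from the table returned above (the package may offer its own plotting function):
# exceedance probability at each threshold on the 1-4 scale
plot(exceed_overall$points, exceed_overall$md, type = "l",
  xlab = "Threshold (1-4 scale)", ylab = "P(score > threshold)")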
And there are options to show some desired interval (89% by default).
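For example, continuing the manual sketch above, the lo and hi columns supply these bounds:
lines(exceed_overall$points, exceed_overall$lo, lty = 2) # lower bound of the 89% CrI
lines(exceed_overall$points, exceed_overall$hi, lty = 2) # upper bound of the 89% CrI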
The same is possible by group (results not shown):
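A rough base-R sketch under the same assumptions as before; the grouping column name (grp) in the by-group table is a guess:
# one exceedance curve per group (column names are assumptions)
plot(NULL, xlim = c(1, 4), ylim = c(0, 1),
  xlab = "Threshold (1-4 scale)", ylab = "P(score > threshold)")
for (g in unique(exceed_by_group$grp)) {
  sub <- subset(exceed_by_group, grp == g)
  lines(sub$points, sub$md)
}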