Analytic methodology for demographic variation analyses for wave 1 of the global flourishing study

Abstract

In this article, we describe the statistical and design methodology of the demographic variation analyses used as part of a coordinated set of manuscripts for wave 1 of the Global Flourishing Study (GFS). Aspects covered include the following: evaluating demographic variation, accounting for the complex sampling design, missing data and imputation, and meta-analysis. We provide a brief illustrative example of the demographic variation analyses using a measure of purpose in life from the GFS survey and conclude by outlining some strengths and limitations of the analytic and statistical methodology employed.

Background

The Global Flourishing Study (GFS) is a large, multinational panel study that aims to explore the distribution, determinants, and interrelations of various concepts related to human well-being with more than 200,000 people across a geographically and culturally diverse set of countries around the world [1,2,3]. Interest in flourishing has surged in recent years across various fields like psychology, economics, and public health [4,5,6,7,8,9,10]. However, many aspects of well-being remain underexplored, especially globally, as much of the well-being literature has been shaped by Western perspectives [7, 11]. As a multinational panel study, the GFS provides an avenue to explore well-being and flourishing from a multicultural perspective to fill this gap.

The purpose of this article is to describe the methodology applied to the set of construct-specific demographic variation analyses that were produced using Wave 1 data from the GFS, most of which are planned for inclusion in a coordinated set of manuscripts that are being considered for publication. These demographic variation analyses apply a common preregistration template principally focused on exploring the distribution of scores on indicators of aspects of flourishing in each country and describing demographic differences in these indicators across groups and countries.

Exploring demographic variation in aspects of flourishing provides a baseline for describing how individual differences may be related to flourishing from different angles. Many of the constructs included in the GFS are seldom included in cross-cultural cohort studies (see the survey development report [12]), providing a novel opportunity to gain multinational insights into various aspects of well-being. All analyses are conducted separately by country, which not only preserves potential heterogeneity in the interpretation of survey items across countries but also allows the results to be contextualized considering the sociocultural particularities within each country. Then, country-specific results are pooled using meta-analytic techniques to summarize the distribution of the construct indicator. The analyses described in this article provide a template for evaluating the distribution of flourishing globally by maintaining a consistent methodology, allowing for the comparability of results across aspects of flourishing.

There are three core components of the current article. First, we begin by providing a high-level description of the data and measures used in the demographic variation analyses. Next, we discuss aspects of the methodology, namely evaluating demographic variation, accounting for the complex sampling design, missing data and imputation, and meta-analysis. Lastly, we use the sense of purpose in life outcome (“I understand my purpose in life.”; 0 = Strongly disagree, 10 = Strongly agree; cf. [13]) to provide an illustrative example of the analyses and results that will be presented in the construct-specific demographic variation manuscripts; see Kim et al. [14] for more details on the purpose in life outcome.

Global Flourishing Study data

Currently available Wave 1 GFS data includes nationally representative samples of the adult population (18 years old and older) from 22 geographically and culturally diverse countries, including Argentina, Australia, Brazil, Egypt, Germany, Hong Kong (Special Administrative Region of China), India, Indonesia, Israel, Japan, Kenya, Mexico, Nigeria, the Philippines, Poland, South Africa, Spain, Sweden, Tanzania, Turkey, the United Kingdom, and the United States (Wave 1 data will also become available for mainland China once Wave 2 data are released in early 2025). These countries were selected to (a) maximize coverage of the world’s population; (b) ensure geographic, cultural, and religious diversity; and (c) prioritize feasibility and existing data collection infrastructure. Data collection was carried out by Gallup, a global analytics and advisory organization with decades of experience collecting global data on various aspects of human life. Most of the data for Wave 1 were collected in 2023, with some countries beginning data collection in 2022; exact dates of data collection vary by country [15]. The GFS is set to continue with four additional waves of annual panel data collection from 2024 to 2027. The precise sampling design that was used to collect Wave 1 data varied by country to ensure nationally representative samples for each country. Further details of the sampling design methodology are available elsewhere [15, 16].

Survey items included numerous aspects of well-being such as happiness and life satisfaction, physical and mental health, meaning and purpose, character and virtue, close social relationships, and financial and material stability [13] along with numerous other demographic, social, economic, political, religious, personality, childhood, community, and health variables. Development of the GFS survey occurred over eight distinct phases: (1) selection of core well-being and demographic questions; (2) solicitation of social, political, psychological, and demographic questions from domain experts worldwide; (3) revision of the initial survey draft based on feedback from scholars around the world representing various academic disciplines; (4) modification of question items following input from experts in multinational, multiregional, and multicultural survey research; (5) survey draft refinement based on compiled input from an open invitation to comment, posted publicly and sent to numerous listservs; (6) questionnaire optimization with support from Gallup survey design specialists; (7) adaptation of items from an interviewer-administered to a self-administered survey instrument using best practices for web survey design to minimize item non-response, illogical responses, and incomplete responses; and (8) confirmation by scholars in several participating countries that translations accurately captured the intended meaning of each question [3, 16].

The data are publicly available through the Center for Open Science (https://www.cos.io/gfs). During the translation process, Gallup adhered to the TRAPD model (translation, review, adjudication, pretesting, and documentation) for cross-cultural survey research [17]. Additional information about methodology and survey development can be found in the GFS Questionnaire Development Report [2, 12] as well as the GFS Methodology [15], GFS Codebook (https://osf.io/cg76b), and GFS Translations documents [18].

Measures

Demographic variables

A total of 9 demographic variables were considered. Seven of those—age, gender, marital status, employment, religious service attendance, education, and immigration status—were assessed with the same categories across all countries. Religious affiliation was assessed in all countries, but the response options varied considerably across countries. Racial/ethnic identity was assessed in some but not all countries, and response options were unique to each country. Details about the demographic items and response options for each are reported in the GFS Questionnaire Development Report [3] and the GFS Codebook (https://osf.io/cg76b). For the purpose of the demographic variation analyses, recorded responses of “don’t know,” “refused,” “skipped,” and “prefer not to answer” were coded as missing.

For each manuscript in which demographic variation analyses are reported, summary statistics of the demographic variables (i.e., counts and proportions) are based on non-imputed “raw” data using complex survey-adjusted estimates. Summary statistics for the 7 demographic variables that were measured consistently across the countries are provided for the whole dataset in the main text. Online supplemental material will report summary statistics for this set of demographic characteristics by country, with the addition of country-level religious affiliation and racial/ethnic identity (when available).

Criterion/outcome variables

A range of continuous, binary, Likert-type, and nominal response scales were used to assess the different constructs included in the GFS. Means were estimated for approximately continuous variables. All Likert-type and nominal measures were recoded into binary variables (based on cutoffs defined a priori in the preregistrations), with proportions estimated for all binary variables.

Evaluating demographic variation

This section describes the analyses which were carried out within each country. The following section describes random effects meta-analyses used to summarize results across countries. Analyses were implemented across multiple software packages (R [19], Stata [20], SAS [21], and SPSS [22]) to ensure consistency in results and ease of use by the larger core group [23]. Implementing the analyses in separate software also allows for a greater reach across fields because other scholars can utilize and replicate our analyses in their software of choice. Any deviations in implementation across software packages are described below.

Separating analyses by country

The core analyses of the GFS were conducted separately within each country. As described below, summary statistics were obtained by random effects meta-analysis rather than for example by use of a multi-level model. A key advantage of this approach is that it does not presume cross-cultural measurement equivalence of the measures, which is important because most constructs were assessed using a single item and cognitive testing during the survey development process suggested some variation in the interpretation of items across countries [24, 25]. It is thus preferable to treat the measures as closely related, but not identical, assessments of each construct across the countries. We chose to conduct analyses separately for each country, which not only preserves potential heterogeneity in the interpretation of survey items across countries but allows the results to be contextualized in light of the sociocultural particularities within each country. This approach also aligns with our decision to use a random effects meta-analysis to combine estimates for each demographic category, which implies that the means/proportions we combine are not necessarily samples from an identical superpopulation but the estimates form a distribution of means/proportions that we aim to summarize.

Country means/proportions (and Gini coefficient)

Computing each country’s overall estimated mean/proportion of the outcome variable is based on the sampling weighted mean/proportion of observations. The presence of strata in the sampling design of some countries leads to the weighted mean/proportion also being averaged over strata. Variances and standard errors were computed using the Taylor series method [26,27,28]. For continuous outcomes, the Gini coefficient was also reported. The Gini coefficient ranges from 0 to 1 [29], where the lower bound reflects complete equality (i.e., all the persons receive the same value on the outcome) and the upper bound reflects complete inequality (i.e., one person has the highest possible score on the outcome and all others have 0). The estimator used for the Gini coefficient is based on the generalized linearization variance estimator [30].
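As an illustration of the point estimates described above, the following Python sketch computes a sampling-weighted mean and a weighted Gini coefficient from the Lorenz curve. It is a minimal sketch rather than the GFS implementation: the function names are hypothetical, the trapezoid-rule Gini is one common estimator chosen here for simplicity, and the linearization variance estimator [30] is not reproduced.

```python
def weighted_mean(values, weights):
    """Sampling-weighted mean: sum(w * y) / sum(w)."""
    total_weight = sum(weights)
    return sum(w * y for y, w in zip(values, weights)) / total_weight


def weighted_gini(values, weights):
    """Weighted Gini coefficient from the Lorenz curve (trapezoid rule).

    Assumes non-negative outcome values with a positive weighted total.
    """
    pairs = sorted(zip(values, weights))      # order respondents by outcome
    total_weight = sum(w for _, w in pairs)
    total_outcome = sum(y * w for y, w in pairs)
    cum_pop = 0.0    # cumulative share of the (weighted) population
    cum_out = 0.0    # cumulative share of the (weighted) outcome
    twice_area = 0.0  # twice the area under the Lorenz curve
    for y, w in pairs:
        next_pop = cum_pop + w / total_weight
        next_out = cum_out + y * w / total_outcome
        twice_area += (next_pop - cum_pop) * (next_out + cum_out)
        cum_pop, cum_out = next_pop, next_out
    return 1.0 - twice_area
```

When every respondent reports the same value the function returns 0. With a finite sample of n equally weighted respondents, this estimator's maximum is (n − 1)/n, which approaches the upper bound of 1 as n grows.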

Subgroup means and proportions

The main text of the manuscripts in which demographic variations are reported includes a table presenting the means/proportions of the measure for each demographic group category based on the meta-analysis described below. The country-specific results will be reported in the online supplemental material of each manuscript. Within each country, we computed the within-group means/proportions. In general, the point estimates were obtained using a weighted mean/proportion. The specific estimator used for the variance, or standard error, of each subgroup mean/proportion varies across countries because the sampling design differs based on whether strata are absent or present [31]. Interval estimates for the mean of continuous outcomes (range 0–10) were based on a Wald-type confidence interval in which items were treated as continuous. In rare cases this could lead to intervals exceeding the bounds of the observed range of values; in such cases we have advised authors to truncate the limits to be within the range of observed values. Some slight differences are present in implementations across R, SAS, SPSS, and Stata software.
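The truncation rule for Wald-type intervals can be sketched as follows. This is an illustrative Python example, not the authors' code: the function name is hypothetical, the default bounds of 0 and 10 assume an item observed over its full response range, and the normal quantile is an assumption (survey software may instead use a t quantile with design-based degrees of freedom).

```python
def wald_interval(mean, se, lower=0.0, upper=10.0, z=1.959964):
    """Wald-type confidence interval truncated to [lower, upper].

    `z` defaults to the 97.5th percentile of the standard normal
    distribution (an assumption; see the lead-in).
    """
    lo = max(lower, mean - z * se)   # truncate at the lower bound
    hi = min(upper, mean + z * se)   # truncate at the upper bound
    return lo, hi
```

For a subgroup mean close to the top of the scale, the upper limit is capped at the bound while the lower limit is left unchanged.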

The subgroup analyses incorporated a test of whether the mean/proportion varied between subgroups of the demographic characteristic. These tests were based on a Wald-type test [32, 33]. These tests tended to be highly powered, especially in countries with larger sample sizes, and the resulting p values tended to be near 0. These country-specific tests were pooled across countries to report a “global p value” corresponding with a test of whether the outcome mean/proportion varied among subgroups for each demographic characteristic (see “Global p values” subsection below). Complementary forest plots of the pairwise differences for specific comparisons of interest may also be provided in the online supplements.

Accounting for the complex sampling design

Accounting for the complex sampling design was accomplished by utilizing the information provided by Gallup on the primary sampling unit (PSU) IDs, strata IDs, and sampling weights. The weighting variable and PSU/strata IDs were included in all country-specific analyses. A complexity arises when respondents are recruited face-to-face, because sometimes this results in groups (strata) with only one PSU. A stratum that contains only a single PSU is known as a lonely PSU, and it makes variance and standard error estimation more complex because traditional methods assume multiple PSUs within each stratum [27, 34]. We elected to use the “certainty” specification, in which single-PSU strata do not contribute to the variance; this maintained relatively comparable results across statistical software, depending on the level of missingness in the demographic characteristic and on the specific outcome. Complete details concerning the implementation of these methods to account for the complex sampling design of each country can be found in the open code [35]. The methods were generally the same across all software packages, with very minor exceptions. Although detailed discussion of the methods that were used to account for the complex sampling design is beyond the scope of this article, additional information about the issues that were encountered can be found in Padgett and colleagues [23]. We mostly relied on the default settings within each software package, which led to nearly identical results across software packages, with slight differences, principally in standard errors, mainly attributable to the imputation of missing data.
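To make the “certainty” handling of lonely PSUs concrete, the following Python sketch computes a Taylor series (linearization) standard error for a weighted mean under a stratified, clustered design. It is a simplified illustration under stated assumptions, not the code in [35]: the function name is hypothetical, PSUs are treated as sampled with replacement within strata, and no finite population correction is applied.

```python
from collections import defaultdict


def linearized_se_of_mean(y, w, strata, psu):
    """Weighted mean and its Taylor-linearized standard error.

    Strata with a single PSU are skipped, so they contribute nothing
    to the variance (the "certainty" treatment of lonely PSUs).
    """
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Sum the linearized variable w * (y - mean) / sum(w) within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, j in zip(y, w, strata, psu):
        psu_totals[h][j] += wi * (yi - mean) / total_w

    variance = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)            # number of PSUs in this stratum
        if n_h < 2:
            continue                 # lonely PSU: no variance contribution
        z = list(totals.values())
        z_bar = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((zj - z_bar) ** 2 for zj in z)
    return mean, variance ** 0.5
```

In the special case of one stratum, equal weights, and one respondent per PSU, the result reduces to the familiar standard error of a simple random sample mean.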

Missing data and multiple imputation

All missing values were imputed using multivariate imputation by chained equations [36, 37]. The imputation model incorporated the criterion/outcome variable, all demographic characteristics (including race/ethnicity and religious affiliation when available), and the sampling weights, which were included as a variable in the imputation models. Including the sampling weight in the multiple imputation procedure allowed missingness to be related to the propensity of being included in the study. We elected not to include strata as a predictor in countries where strata were available to avoid a singularity in the design matrix due to single-PSU strata. To account for variations in the assessment of certain variables across countries (e.g., race/ethnicity and religious affiliation), we conducted the imputation process separately for each country. The within-country imputation approach ensured that the imputation model accurately reflected country-specific contexts and assessment methods.

While conducting multiple imputation with five imputed datasets is a commonly used default [38], a more robust recommended number of imputations relates to the fraction of missing information (FMI) of the observed dataset [39]. The rate of missing data for this first wave of the GFS was quite low (< 5% for nearly all variables that were measured), and for the demographic characteristics in particular, the item with the largest percent missing was racial/ethnic identity at 1.6% (which is not being meta-analyzed due to varying response categories across countries). Across all the items used as demographic factors, the percent of respondents with any missing data was 3.2% (a rough approximation of the FMI is therefore 0.032). Using a common efficiency argument (FMI/m ≤ 0.05), the number of imputed datasets needed would be less than 3. In preliminary testing, we evaluated using more imputed datasets (m = 20) and found no meaningful differences in results compared to only 5 imputations or compared across software implementations of multiple imputation. Increasing past 5 imputed datasets was therefore thought to result in insufficient gains to justify the considerable increase in computational time due to the imputation being conducted separately by country and research team. However, we anticipate higher levels of missing data in subsequent waves due to wave-specific non-response, and analysts should consider using at least 20 imputations in subsequent waves of data analysis in spite of the additional computing time.
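As a worked check of the efficiency argument, and taking the rough approximation FMI ≈ 0.032 stated above at face value, the criterion can be solved for the number of imputed datasets \(m\):

$$\frac{\text{FMI}}{m}\le 0.05\quad\Rightarrow\quad m\ge \frac{0.032}{0.05}=0.64,$$

so even a single imputed dataset would meet this criterion, and with \(m=5\) the ratio is \(0.032/5=0.0064\).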

Meta-analysis

The 22 countries were chosen to have broad geographical, cultural, and religious coverage; the countries include all six populated continents and represent about half of the world’s population. The random effects meta-analysis would be interpreted as estimating the pooled mean/proportion and the standard deviation of the indicator means/prevalences from a hypothetical underlying population of which the sample of 22 countries would be representative. While such an underlying population is hypothetical, given the broad diverse coverage of the 22 countries, this was viewed as a reasonable target of interest. However, the results for each of the 22 countries are also provided, which are of interest in their own right, and may also be useful for readers who would prefer not to consider this underlying hypothetical population. Moreover, we provide a population weighted fixed effects meta-analysis to evaluate similar indicator prevalences/means where the principal target of inference concerns individual people in the 22 countries rather than the countries themselves.

Preparing for meta-analyses

All meta-analyses were conducted in R [19] using the metafor package [40] through an open-source application developed for these analyses [35]. Please see https://wviechtb.github.io/metafor/ for more information on the metafor package. The effect size, or value to be meta-analyzed, depends on the scale of the outcome for a particular study. For continuous outcomes, the means were directly meta-analyzed using the within-country squared standard error as the variance estimate. For binary outcomes, the proportion was first converted using the logit transformation. A condition of this conversion is that the estimated within-country proportion must be strictly within the interval (0,1). To avoid the boundary constraint, we set bounds on how small or large the estimated proportion could be for the purpose of meta-analysis based on the country sample sizes, that is \(\widehat{p}\in \left(\frac{1}{N}, 1-\frac{1}{N}\right)\), where \(N\) is the sample size of the country where \(\widehat{p}\) was estimated. Estimated proportions outside these bounds were replaced with the closest bound. This bound was very rarely needed because most proportions were between 0.2 and 0.8, but the bound was necessary in rare groups, such as the “other” gender group, for some (but not all) outcomes. The standard error of the logit transformed proportions was obtained using the delta-method:

$$SE\left(logit\left(\widehat{p}\right)\right)={\left(\frac{1}{N\times \widehat{p} \times (1-\widehat{p})}\right)}^{0.5},$$

This equation was used to represent uncertainty on the transformed scale for input into the meta-analyses. For countries with large sample sizes, the resulting logit standard error can be quite small ([41], p. 40, Eq. 3.5). Additionally, it is possible to meta-analyze the proportions directly, pooling the estimates in the same manner as with the means for continuous outcomes. This alternative approach would down-weight the influence of more extreme proportions when pooling across countries compared to the logit transformation approach.
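The preparation of a binary outcome for meta-analysis can be sketched in Python as follows. This is a minimal illustration rather than the application in [35]: the function names are hypothetical, and using the bounded proportion inside the delta-method standard error is an assumption of this sketch.

```python
import math


def logit_effect_size(p_hat, n):
    """Logit of a proportion and its delta-method standard error.

    The proportion is first moved to the closest bound of
    [1/n, 1 - 1/n] so that the logit is finite.
    """
    lower, upper = 1.0 / n, 1.0 - 1.0 / n
    p = min(max(p_hat, lower), upper)             # replace with closest bound
    y = math.log(p / (1.0 - p))                   # logit transformation
    se = (1.0 / (n * p * (1.0 - p))) ** 0.5       # delta-method standard error
    return y, se


def inverse_logit(y):
    """Back-transform a logit-scale value to the proportion scale."""
    return 1.0 / (1.0 + math.exp(-y))
```

For example, a proportion of 0.5 estimated from 100 respondents has a logit of 0 with a standard error of 0.2, whereas an estimated proportion of exactly 0 is moved to 1/100 before transformation.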

Random effects meta-analysis

For all the core demographic GFS studies, a general random effects model was used. This model assumes that the effect sizes in the population follow a normal distribution [42,43,44], that is:

$${y}_{i}\sim \text{Normal}({y}_{i}^{*},{v}_{i})$$
$${y}_{i}^{*}\sim \text{Normal}(\theta ,{\tau }^{2})$$

where \({y}_{i}\) is the mean within each country, \({v}_{i}\) is the variance/uncertainty of \({y}_{i}\) within each country, \({y}_{i}^{*}\) is the unknown true mean for the subgroup within country, \(\theta\) is the population mean for the subgroup, and \({\tau }^{2}\) is the estimated variance/heterogeneity of \({y}_{i}^{*}\). The model was estimated using the Paule and Mandel estimator [45,46,47].
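The Paule and Mandel estimator chooses \({\tau }^{2}\) so that the generalized Q statistic equals its expected value of \(k-1\), where \(k\) is the number of countries. The Python sketch below illustrates this idea with a simple bisection search. It is not the metafor implementation used for the GFS analyses; the function name, tolerance, and search strategy are assumptions made for illustration.

```python
def paule_mandel(y, v, tol=1e-10, max_iter=200):
    """Pooled estimate (theta) and tau^2 via the Paule-Mandel criterion.

    y: country-level estimates; v: their sampling variances (all > 0).
    """
    k = len(y)

    def pooled_and_q(tau2):
        # Inverse-variance weights under a given tau^2.
        w = [1.0 / (vi + tau2) for vi in v]
        theta = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        q = sum(wi * (yi - theta) ** 2 for wi, yi in zip(w, y))
        return theta, q

    theta, q = pooled_and_q(0.0)
    if q <= k - 1:
        return theta, 0.0            # no excess heterogeneity: tau^2 = 0

    # Q decreases as tau^2 grows, so bracket the root and bisect.
    lo, hi = 0.0, 1.0
    while pooled_and_q(hi)[1] > k - 1:
        hi *= 2.0
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        if pooled_and_q(mid)[1] > k - 1:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    tau2 = (lo + hi) / 2.0
    return pooled_and_q(tau2)[0], tau2
```

With two estimates of 0 and 2, each with sampling variance 1, the criterion \(2/(1+{\tau }^{2})=1\) gives \({\tau }^{2}=1\) and a pooled estimate of 1, which the sketch recovers up to the tolerance.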

Heterogeneity in the pooled means/proportions was assessed using the estimated standard deviation of the distribution of means/proportions (\(\tau\)). All substantive manuscripts will include a forest plot for all meta-analytic estimates, which provide a better indication of the heterogeneity, along with the Q-statistics and Q-profile confidence intervals for \(\tau\). Additionally, prediction intervals for the country means/proportions described next provide a sense of the heterogeneity from the perspective of sampling variability at the country level.

Prediction intervals

For each manuscript in which demographic variation analyses are reported, we included prediction intervals based on the calibrated effect size from the random effects meta-analyses. The calibrated effect size is computed based on the meta-analysis results following well-established methods [48,49,50] that use the following formula,

$${\widetilde{y}}_{i}=\widehat{\theta }+\left({y}_{i}-\widehat{\theta }\right){\left(\frac{{\widehat{\tau }}^{2}}{{\widehat{\tau }}^{2}+{v}_{i}}\right)}^{0.5},$$

where \({\widetilde{y}}_{i}\) is the calibrated effect size of country \(i\). The calibrated effect sizes create prediction intervals [50]. A 95% prediction interval is an interval constructed so that the true mean/proportion for a randomly chosen country from the random effects distribution will fall within this interval 95% of the time. For binary outcomes, the prediction interval bounds were back-transformed from the logit scale to the proportion scale for reporting. Currently available Wave 1 GFS data includes a relatively low number of “studies” for a meta-analysis (i.e., 22 countries). Thus, we approximated the prediction interval bounds using the smallest and largest calibrated effect sizes resulting in a \(\frac{k-1}{k+1}\times 100\%\) prediction interval in line with Wang and Lee’s method of approximating prediction intervals without extrapolating beyond the observed data. The bounds for the prediction interval are at a \(\frac{22-1}{22+1}\times 100\approx\) 91% confidence level for most outcomes. We say most outcomes because some subgroup means/proportions were not estimable in some countries with smaller sample sizes. For example, in Egypt, we could not estimate the mean/proportion for the “other” gender group because no participants were in this category.
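The calibrated effect sizes and the approximate prediction interval can be sketched in Python as follows. This is an illustrative sketch with hypothetical function names; it takes the pooled estimate and \({\widehat{\tau }}^{2}\) as given inputs and applies the calibration formula above directly.

```python
def calibrated_estimates(y, v, theta, tau2):
    """Shrink each country estimate toward the pooled estimate theta."""
    return [
        theta + (yi - theta) * (tau2 / (tau2 + vi)) ** 0.5
        for yi, vi in zip(y, v)
    ]


def approx_prediction_interval(y, v, theta, tau2):
    """Smallest and largest calibrated estimates and nominal coverage.

    The coverage of the (min, max) interval is (k - 1) / (k + 1),
    where k is the number of countries with an estimable value.
    """
    calibrated = calibrated_estimates(y, v, theta, tau2)
    k = len(calibrated)
    return min(calibrated), max(calibrated), (k - 1) / (k + 1)
```

With all 22 countries contributing, the nominal coverage is 21/23, or about 91%; when a subgroup is not estimable in some country, \(k\) and therefore the coverage are smaller. For binary outcomes the two bounds would then be back-transformed from the logit scale.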

Population weighted meta-analysis

A fixed effects meta-analysis was conducted as a supplemental analysis to the random effects meta-analysis described above, providing an opportunity for researchers to consider both sets of results depending on which interpretative approach is most appropriate for their purposes. Inferences focused on differences across countries may utilize the random effects estimates, as these align with the target of inference, whereas analyses giving individuals equal weight align more with the results of the supplemental fixed effects meta-analyses. The random effects meta-analysis assumes a distribution over the subgroup means/proportions across countries, relaxing the measurement invariance assumption somewhat. The supplemental fixed effects meta-analysis does not assume a distribution over the values meta-analyzed but more directly estimates the weighted average over countries, where the weight in this analysis is the total 2023 population (rather than the observed sample size) within each country. Note that the fixed effects approach taken here essentially estimates the effect across individuals in the various countries, and can be given this interpretation even if there is heterogeneity across countries in effect sizes [51]. The meta-analytic estimate is

$$\widehat{\theta }=\frac{\sum {w}_{i}{y}_{i}}{\sum {w}_{i}},$$

where \({y}_{i}\) is the mean/proportion of the outcome within a subgroup for each country and \({w}_{i}\) is the weight for each country. A common choice for the weight is the inverse of the sampling variance \({v}_{i}\), but in this analysis we aimed to estimate the overall average by treating individuals with equal weight instead of countries with equal weight, and without assuming a common mean across countries. We therefore used a weight for each country that scales with the total 2023 population size of that country.

Using the population sizes provided by Gallup, the fixed effects meta-analysis estimated the average subgroup mean/proportion, weighted by the population size of each country. The country sizes used to create weights are shown in Table 1.
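A minimal Python sketch of the population weighted estimate is given below. The function name is hypothetical, and the population figures would come from Table 1. The standard error shown is an assumption of this sketch (it treats the country estimates as independent and the population sizes as fixed constants) and is not taken from the source.

```python
def population_weighted_estimate(y, v, population):
    """Population weighted average of country-level estimates.

    y: country means/proportions; v: their sampling variances;
    population: country population sizes used as weights.
    """
    total = sum(population)
    theta = sum(n * yi for n, yi in zip(population, y)) / total
    # Assumed variance of a weighted sum of independent estimates.
    variance = sum((n / total) ** 2 * vi for n, vi in zip(population, v))
    return theta, variance ** 0.5
```

Because the weights depend only on population size, a large country pulls the pooled value toward its own estimate regardless of how precisely smaller countries are estimated.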

Table 1 Gallup provided estimates of population sizes for population weighted meta-analysis

Global p values (combining p values from country-specific tests)

The harmonic mean p value was used to combine p values across different countries [52, 53]. The combined p value was used to test the null hypothesis of no differences in the mean/proportion for each construct indicator among subgroups in all countries, against the alternative hypothesis that in at least one country the mean/proportion differs among subgroups defined by that demographic variable. The harmonic mean p value method is more robust to dependency among pooled p values [53]. Although the country-specific tests are technically independent (an underlying assumption of most classic approaches to pooling p values [54]), assuming independence of the p values may not be entirely tenable given a common underlying set of items, translation procedures, data cleaning techniques, and imputation models. To account for multiple testing, we present Bonferroni-corrected p value thresholds for the meta-analytic results based on the number of demographic variables included [55, 56] in the primary meta-analytic results (corrected threshold \(\alpha =0.05/7\approx 0.007\)). The Bonferroni adjustment for multiplicity was applied to the significance level cutoff (alpha) and not the p values (i.e., we divided alpha by the number of tests rather than multiplying the p values by the number of tests). Providing both the standard 0.05 significance threshold and the Bonferroni-adjusted significance threshold provides transparency in how multiplicity was considered. However, the reported harmonic mean p value is already relatively robust to multiple testing, maintaining a constant Type-I error rate regardless of the number of tests being conducted [53].
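The combination of country-specific p values and the Bonferroni-adjusted threshold can be sketched in Python as follows. The function names are hypothetical, equal weights across countries are an assumption, and the sketch returns the raw harmonic mean only, without any further calibration described in [52, 53].

```python
def harmonic_mean_p(p_values, weights=None):
    """Weighted harmonic mean of p values (equal weights by default).

    All p values must be strictly greater than 0.
    """
    k = len(p_values)
    if weights is None:
        weights = [1.0 / k] * k      # assumed equal weight per country
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


def bonferroni_threshold(alpha, n_tests):
    """Adjust the significance cutoff, not the p values themselves."""
    return alpha / n_tests
```

With seven demographic variables, `bonferroni_threshold(0.05, 7)` gives approximately 0.007. Because the harmonic mean is dominated by the smallest p values, a single country with strong evidence of subgroup differences can drive the global p value toward 0.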

Example analysis—purpose in life

We will illustrate the aforementioned methodology and analyses and corresponding results with an example concerning understanding one’s purpose in life; see Kim et al. [14] for further details.

Construct overview and importance

A sense of purpose in life, the extent that people see their lives as having a sense of direction and goals that are anchored in core values, is a central component of human well-being [57, 58]. This factor is important in its own right, but it is also important because it shapes people’s trajectories of psychological, social, behavioral, spiritual, and physical health [59,60,61,62,63,64,65,66,67]. One indicator of purpose in life in the GFS survey is the item, “I understand my purpose in life,” self-reported from 0 = Strongly disagree to 10 = Strongly agree [13], which will be used to illustrate the analytic approach.

Illustrative results

Table 2 displays the nationally representative descriptive statistics for the 7 demographic variables of the entire observed sample that were assessed consistently across the 22 countries included in Wave 1 of the GFS (N = 202,898). Participant ages spanned the entire adult lifespan (18–80 +). The gender distribution was nearly balanced across female (51%) and male (49%), along with a small representation from other gender identities (0.3%). Most participants were married (53%), had attained 9–15 years of education (57%), and were native-born (94%); the most common employment status was employed for an employer (39%). Regular attendance at religious services varied, with the largest group never attending (37%), some attending once a week (19%), and others attending once a week or more (13%).

Table 2 Demographic characteristics of the Global Flourishing Study sample (wave 1)

Table 3 provides the overall country-specific estimates of purpose ordered by magnitude, along with the 95% confidence interval of the mean, standard deviation, and Gini coefficient. A similar table is presented in all construct-specific manuscripts that include demographic variation analyses following the template reported in this article. Caution should be applied when comparing means/proportions across countries because of differences in translation across languages, cultural differences in response styles, and potential seasonal variation; assessments were made in different countries during different times of the year, and this variation in timing might also influence results.

Table 3 Ordered mean of purpose in life score of each country

Countries from Asia (e.g., #1 Indonesia, #3 Philippines), Africa (e.g., #2 Kenya, #6 Nigeria, #7 South Africa, #9 Tanzania), and Latin America (e.g., #4 Mexico) dominated the top 10 rankings. This diversity suggests that high levels of understanding one’s purpose in life can transcend geographical and cultural boundaries, indicating an ability to achieve purpose in diverse circumstances. Traditionally high-income and more individualistic countries like #14 Spain, #16 Turkey, #17 Germany, #18 the United States, #19 Australia, #20 Sweden, and #21 the United Kingdom were lower in the rankings. This trend suggests that societies with strong community bonds, familial connections, rich cultural heritage, and perhaps a greater reliance on spiritual or religious frameworks tend to report higher levels of understanding one’s purpose in life. See Kim and colleagues [14] for further results and interpretation, along with an analogous set of results concerning another item on meaning in life.

Next, the results of the random effects meta-analyses that combined the subgroup means for the 7 demographic variables that were assessed consistently across all 22 countries are presented in Table 4. Religious affiliation and racial/ethnic identity were not measured consistently across the countries, so these demographic characteristics were not included in the meta-analyses. Each row of Table 4 represents a unique meta-analysis, totaling 34 in all; the online supplemental material in each manuscript will have a corresponding forest plot for each demographic category included in the meta-analysis, which displays the meta-analyzed effect estimate and the country-specific effect estimates. An example forest plot for the meta-analysis of mean purpose in life scores across countries for the 18–24 years age category is shown in Fig. 1.
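As a rough illustration of what each row of such a meta-analysis involves, the sketch below pools country-specific subgroup means with a random effects model. The DerSimonian-Laird estimator of the between-country variance is used here only because it has a closed form; it is an assumption for this example and may differ from the estimator used in the GFS code. The function name and the input values are hypothetical.

```python
from math import sqrt


def random_effects_meta(estimates, std_errors):
    """Pool estimates with a DerSimonian-Laird random effects model."""
    k = len(estimates)
    v = [se ** 2 for se in std_errors]
    # Inverse-variance (common effect) weights and pooled estimate.
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q measures dispersion of the estimates around that value.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Method-of-moments estimate of between-country variance, floored at 0.
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random effects weights add tau2 to each within-country variance.
    w_star = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = sqrt(1 / sum(w_star))
    return pooled, se, tau2


# Hypothetical subgroup means and standard errors for three countries.
pooled, se, tau2 = random_effects_meta([7.9, 7.1, 6.4], [0.05, 0.08, 0.06])
print(f"pooled={pooled:.2f}, se={se:.2f}, tau={sqrt(tau2):.2f}")
```

When the country estimates differ by much more than their standard errors, tau2 dominates the weights and the pooled value approaches the unweighted average of the country means.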

Fig. 1 Forest plot for the meta-analysis of purpose in life mean scores for the 18–24 years age category

Table 4 Random effects meta-analyses for purpose in life outcome means by demographic category

Additionally, the forest plots are constructed such that all means/proportions are ordered by magnitude, so the order of countries on the y-axis varies from plot to plot. This allows for a quick inspection of which countries have a high or low relative mean/proportion and whether these orders are similar across demographic categories. In addition to the forest plots for each mean/proportion, the online supplemental material of each manuscript will report the summary statistics (weighted counts/proportions) of each demographic characteristic and the weighted subgroup means/proportions for each country separately. These alternative groupings of results by country allow for comparison across demographic groups within a country, whereas the forest plots allow for comparison within demographic groups across countries. The meta-analyses were also conducted using a population-weighted (fixed effects) approach, which estimates effects as if within-country results were weighted by the size of the population that each sample represents. Those results are not shown here but are reported in the online supplemental material of each individual manuscript.
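The population-weighted perspective can be sketched as a weighted average in which each country's weight is its share of the combined population. The function name and the numbers below are hypothetical, and the standard error formula assumes independent country samples and treats the population sizes as fixed constants; the GFS code may differ in these details.

```python
from math import sqrt


def population_weighted_mean(estimates, std_errors, populations):
    """Pool country estimates using population shares as fixed weights."""
    total = sum(populations)
    shares = [p / total for p in populations]
    pooled = sum(s * y for s, y in zip(shares, estimates))
    # Variance of a fixed-weight sum of independent estimates.
    se = sqrt(sum((s * e) ** 2 for s, e in zip(shares, std_errors)))
    return pooled, se


# Hypothetical country means, standard errors, and adult population sizes.
pooled, se = population_weighted_mean(
    estimates=[7.9, 6.4],
    std_errors=[0.05, 0.06],
    populations=[190_000_000, 50_000_000],
)
print(f"pooled={pooled:.2f}, se={se:.3f}")
```

Unlike the random effects approach, the weights here do not depend on the precision of each estimate, so a populous country dominates the pooled value regardless of its sample size.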

Strengths and limitations

The analytic methodology employed for the study of demographic variation of flourishing has several strengths and limitations that should be noted. A notable strength is the broad population coverage of the GFS. The countries included in Wave 1 of the GFS encompass approximately 64% of the world’s population [3]. Most of the analytic methods employed are relatively well-established, with a long history of use in epidemiology, public health, psychology, or sociology. We aimed to employ rigorous methods that appropriately incorporate the unique complex sampling design used in each country to obtain robust standard errors. All the code to reproduce analyses is openly available in several languages (R, SAS, SPSS, and Stata) for researchers to explore these data and results. All analyses were conducted at the country level before being pooled using meta-analytic techniques to account for uncertainty in the estimates and quantify heterogeneity across countries. We used a random effects “distribution of effects” perspective of meta-analytic methods for our primary analyses but also reported a fixed effects “population weighted” perspective as a supplemental analysis. Using different theoretical perspectives for pooling estimates gives the reader flexibility to decide which set of effects is appropriate for their purposes.

There are limitations to consider as well. Sources of heterogeneity in the weighted subgroup means across countries could be due to seasonality effects, differences in interpretation, differences due to quality of translation, differences in mode of data collection, differences in the process and variables used for constructing respondent-level weights, and other possible reasons depending on the specific construct of interest [16]. Most of the psychosocial constructs that were assessed only had a single item to represent the overall construct (e.g., sense of purpose in life), many of which were assessed with binary or ordinal response scales with few categories. However, it is not uncommon for such items to be used in large-scale epidemiologic studies such as the GFS, and decisions about which items and response scales to use were guided by several carefully planned phases of survey development [12]. These limitations mean that some measures in the GFS survey are not a suitable fit for answering certain research questions. The use of single-item assessments provides less construct coverage and generally lower true-score reliability, resulting in less power to detect differences across demographic categories [68]. The obtained country sample sizes for Wave 1 were relatively large (ranging from approximately 1,500 to 38,000), which helps to mitigate this concern to a degree.

Several limitations of the statistical methods employed are noted next. The precise implementations of the methods to account for the complex sampling design are sometimes not fully transparent, especially in software packages that require a license (e.g., SAS, Stata). The use of several software packages helped to identify the effects of any software-specific peculiarities. A common issue we needed to deal with involved handling “lonely PSUs” (strata that contain only a single primary sampling unit) [34]. We aimed to always use a “certainty” specification that fixed the variance contribution to zero in such cases when estimating variance components. This approach has the limitation of potentially underestimating the variance, or standard error, for a particular estimate. However, to the best of our knowledge, there is no generally agreed upon approach for handling such instances, and our aim is to be transparent about these decisions to reduce non-reproducibility because of unclear analytic decisions and researcher degrees of freedom [69].
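The effect of the “certainty” specification can be illustrated with a simplified sketch of a stratified, with-replacement variance estimator for a weighted total. The function name and the toy data are hypothetical, and the sketch is not the implementation used by any of the packages named above; it only shows how a stratum with a single PSU contributes nothing to the estimated variance.

```python
from collections import defaultdict


def stratified_total_variance(psu_totals):
    """Estimate the variance of a weighted total from PSU-level totals.

    psu_totals: iterable of (stratum_id, weighted_psu_total) pairs.
    """
    by_stratum = defaultdict(list)
    for stratum, total in psu_totals:
        by_stratum[stratum].append(total)

    variance = 0.0
    for totals in by_stratum.values():
        n = len(totals)
        if n < 2:
            # Lonely PSU: under the "certainty" treatment the stratum
            # contributes zero to the variance estimate.
            continue
        mean = sum(totals) / n
        # Between-PSU variability within the stratum.
        variance += n / (n - 1) * sum((t - mean) ** 2 for t in totals)
    return variance


# Hypothetical design: stratum "A" has two PSUs, stratum "B" has one.
design = [("A", 10.0), ("A", 14.0), ("B", 5.0)]
print(stratified_total_variance(design))
```

Because stratum "B" is skipped, any real sampling variability within it is ignored, which is why this choice can understate the standard error.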

The analyses outlined in this article are relatively straightforward, but also varied to allow for multiple interpretive lenses to be applied (e.g., within-country vs. cross-country patterns). Implementing these coordinated analyses has its challenges, such as complications implementing analyses using complex sampling weights, multiple imputation, and meta-analysis across several statistical packages, and yet we found remarkably similar results across packages in spite of slightly different implementations [23].

Conclusions

The current article provides a description of the methods used in manuscripts reporting demographic variation analyses that leverage Wave 1 of the GFS, most of which are being considered for publication as a coordinated set of manuscripts based on the GFS. Using nationally representative data from 22 geographically and culturally diverse countries around the world, the set of planned demographic variation analyses of construct indicator prevalences/means related to well-being can play a role in identifying potentially vulnerable populations with lower well-being and also in identifying trends across countries and eventually also over time. The trends identified in the demographic variation analyses, supported by the methods described in this article, may also help shape the development of future interventions or policies aiming to promote well-being specifically for those vulnerable populations. However, the demographic differences themselves are purely descriptive and should not be interpreted causally. The interested reader is referred to our companion article, Analytic Methodology for Childhood Predictors Analyses for Wave 1 of the Global Flourishing Study [70], for a description of the methods used in the childhood predictors analyses of GFS outcomes manuscripts in the special collection.

Data availability

Data for Wave 1 of the GFS are available through the Center for Open Science (https://www.cos.io/gfs) upon submission of a pre-registration, and will be openly available without pre-registration beginning February 2025. Subsequent waves of the GFS will similarly be made available. Please see https://www.cos.io/gfs-access-data for more information about data access. Code for the GFS demographic variation analyses is openly available for multiple software packages (https://doiorg.publicaciones.saludcastillayleon.es/10.17605/osf.io/vbype).

References

  1. Crabtree S, English C, Johnson BR, Ritter Z, VanderWeele TJ. Global flourishing study: questionnaire development report. Gallup Inc.; 2021. Retrieved on 2024-05-10 from https://osf.io/y3t6m.

  2. Crabtree S, English C, Johnson BR, Ritter Z, VanderWeele TJ. Global flourishing study: 2024 questionnaire development report. Gallup Inc.; 2024. Retrieved on 2024-05-10 from https://osf.io/y3t6m.

  3. Johnson BR, VanderWeele TJ. The global flourishing study: a new era for the study of well-being. Int Bull Mission Res. 2022;46(2):272–5. Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.1177/23969393211068096.

  4. Adler MD, Fleurbaey M, editors. The Oxford handbook of well-being and public policy. New York: Oxford University Press; 2016. Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.1093/oxfordhb/9780199325818.001.0001.

  5. Crespo RF, Mesurado B. Happiness economics, eudaimonia and positive psychology: from happiness economics to flourishing economics. J Happiness Stud. 2015;16:931–46.

  6. Huppert FA, So TT. Flourishing across Europe: application of a new conceptual framework for defining well-being. Soc Indic Res. 2013;110:837–61.

  7. Lomas T. Making waves in the great ocean: a historical perspective on the emergence and evolution of wellbeing scholarship. J Posit Psychol. 2022;17(2):257–70.

  8. Seligman ME. Flourish: a visionary new understanding of happiness and well-being. New York: Simon and Schuster; 2011.

  9. Trudel-Fitzgerald C, Millstein RA, Von Hippel C, Howe CJ, Tomasso LP, Wagner GR, VanderWeele TJ. Psychological well-being as part of the public health debate? Insight into dimensions, interventions, and policy. BMC Public Health. 2019;19(1):1–11.

  10. VanderWeele TJ, McNeely E, Koh HK. Reimagining health—flourishing. JAMA. 2019;321(17):1667–8.

  11. Henrich J, Heine SJ, Norenzayan A. Most people are not WEIRD. Nature. 2010;466(7302):1–29. Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.1038/466029a.

  12. Lomas T, Bradshaw M, Case B, Cowden R, Fogelman A, Johnson K, et al. The development of the global flourishing study survey: charting the evolution of a new 109 item inventory of human flourishing. BMC Glob Public Health. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s44263-025-00139-9.

  13. VanderWeele TJ. On the promotion of human flourishing. Proc Natl Acad Sci. 2017;114(31):8148–56.

  14. Kim ES, Padgett N, Bradshaw M, Shiba K, Chen Y, Ritchie-Dunham JL, et al. Mapping demographic variations in purpose and meaning across the world: a cross-national analysis of 22 countries in the global flourishing study. Under review. PsyArXiv Preprint available from: https://doiorg.publicaciones.saludcastillayleon.es/10.31219/osf.io/kme7y_v1.

  15. Ritter Z, Srinivasan R, Han Y, Chattopadhyay M, Honohan J, Johnson BR, VanderWeele TJ. Global flourishing study methodology. Gallup Inc; 2024. Available from: https://osf.io/k2s7u.

  16. Padgett RN, Cowden RG, Chattopadhyay M, Han Y, Honohan J, Ritter Z, Srinivasan R, Johnson BR, VanderWeele TJ. Survey sampling design in wave 1 of the global flourishing study. Eur J Epidemiol. In press. https://doiorg.publicaciones.saludcastillayleon.es/10.31234/osf.io/yuc4q.

  17. Harkness JA. Questionnaire translation. In: Harkness JA, Van de Vijver FJ, Mohler PP, editors. Cross-cultural survey methods, vol. 325. Hoboken: Wiley; 2003. p. 35–56.

  18. Johnson BR, Ritter Z, Fogleman A, Markham L, Stankov T, Srinivasan R, et al. The global flourishing study. 2024. https://doiorg.publicaciones.saludcastillayleon.es/10.17605/OSF.IO/3JTZ8.

  19. R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2024.

  20. StataCorp. Stata statistical software: release 18. College Station: StataCorp LLC; 2023.

  21. SAS Institute Inc. SAS/STAT 9.3 user’s guide. Cary: SAS Institute Inc; 2011.

  22. IBM Corp. IBM SPSS Statistics for Windows, Version 29.0.2.0. Armonk: IBM Corp; 2023.

  23. Padgett RN, Cowden RG, Bradshaw M, Chen Y, Jang SJ, Shiba K, Johnson BR, VanderWeele TJ. On coordinating “simple” analyses of international survey across multiple statistical software packages: a case study from the global flourishing study. Open Science Framework; 2024. Available from: https://osf.io/ebu2r.

  24. Cowden RG, Skinstad D, Lomas T, Johnson BR, VanderWeele TJ. Measuring wellbeing in the global flourishing study: insights from a cross-national analysis of cognitive interviews from 22 countries. Qual Quant. 2024. Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s11135-024-01947-1.

  25. Johnson KA, Moon JW, VanderWeele TJ, Schnitker S, Johnson BR. Assessing religion and spirituality in a cross-cultural sample: development of religion and spirituality items for the global flourishing study. Relig Brain Behav. 2023:1–14. https://doiorg.publicaciones.saludcastillayleon.es/10.1080/2153599X.2023.2217245.

  26. Lumley T. Analysis of complex survey samples. J Stat Softw. 2004;9(1):1–19.

  27. Lumley T. Complex surveys: a guide to analysis using R. Hoboken: John Wiley & Sons; 2010.

  28. Woodruff RS. A simple method for approximating the variance of a complicated estimate. J Am Stat Assoc. 1971;66(334):411–4.

  29. Gini C. The scientific basis of fascism. Polit Sci Q. 1927;42(1):99–115.

  30. Osier G. Variance estimation for complex indicators of poverty and inequality using linearization techniques. Surv Res Methods. 2009;3(3):167–95. https://doiorg.publicaciones.saludcastillayleon.es/10.18148/srm/2009.v3i3.369.

  31. Wolter KM. Introduction to variance estimation. Chicago: Springer; 2007.

  32. Lumley T, Scott A. Tests for regression models fitted to survey data. Aust N Z J Stat. 2014;56(1):1–14. Available from: doi: 10.1111/anzs.12065.

  33. Rao JN, Scott AJ. On chi-squared tests for multiway contingency tables with proportions estimated from survey data. Ann Stat. 1984;12(1):46–60.

  34. Schneider B. How are R and Stata (mis)handling singleton strata? Pract Significance; 2022. Available from: https://www.practicalsignificance.com/posts/bugs-with-singleton-strata/. Accessed 25 Jan 2024.

  35. Padgett RN, Bradshaw M, Chen Y, Jang SJ, Shiba K, Johnson BR, VanderWeele TJ. Global flourishing study statistical analyses code. Center for Open Science; 2024. https://doiorg.publicaciones.saludcastillayleon.es/10.17605/osf.io/vbype.

  36. Sterne JAC, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393. Available from: doi: 10.1136/bmj.b2393.

  37. van Buuren S. Flexible imputation of missing data. 2nd ed. Available from: https://stefvanbuuren.name/fimd/. Retrieved on February 7, 2024.

  38. van Buuren S, Groothuis-Oudshoorn K. mice: multivariate imputation by chained equations in R. J Stat Softw. 2011;45(3):1–67. https://doiorg.publicaciones.saludcastillayleon.es/10.18637/jss.v045.i03.

  39. White IR, Royston P, Wood AM. Multiple imputation using chained equations: issues and guidance for practice. Stat Med. 2011;30(4):377–99.

  40. Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw. 2010;36(3):1–48. https://doiorg.publicaciones.saludcastillayleon.es/10.18637/jss.v036.i03.

  41. Lipsey MW, Wilson DB. Practical meta-analysis. Thousand Oaks: SAGE Publications; 2001.

  42. Borenstein M, Hedges LV, Higgins JP, Rothstein HR. A basic introduction to fixed-effect and random-effects models for meta-analysis. Res Synth Methods. 2010;1(2):97–111. Available from: doi: 10.1002/jrsm.12.

  43. Frank MC, Braginsky M, Cachia J, Coles NA, Hardwicke TE, Hawkins RD, Mathur MB, Williams R. Experimentology: an open science approach to experimental psychology methods. MIT Press; 2024. https://doiorg.publicaciones.saludcastillayleon.es/10.7551/mitpress/14810.001.0001.

  44. Hunter JE, Schmidt FL. Fixed effects vs. random effects meta-analysis models: implications for cumulative research knowledge. Int J Select Assess. 2000;8(4):275–92. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/1468-2389.00156.

  45. Paule RC, Mandel J. Consensus values and weighting factors. J Res Natl Bur Stand. 1982;87(5):377–85.

  46. Viechtbauer W. Bias and efficiency of meta-analytic variance estimators in the random-effects model. J Educ Behav Stat. 2005;30(3):261–93.

  47. Viechtbauer W, López-López JA, Sánchez-Meca J, Marín-Martínez F. A comparison of procedures to test for moderators in mixed-effects meta-regression models. Psychol Methods. 2015;20(3):360–74.

  48. Mathur MB, VanderWeele TJ. New metrics for meta-analyses of heterogeneous effects. Stat Med. 2019;38(8):1336–42.

  49. Mathur MB, VanderWeele TJ. Robust metrics and sensitivity analyses for meta-analyses of heterogeneous effects. Epidemiology. 2020;31(3):356–8.

  50. Wang C-C, Lee W-C. A simple method to estimate prediction intervals and predictive distributions: summarizing meta-analyses beyond means and confidence intervals. Res Synth Methods. 2019;10(2):255–66.

  51. Rice K, Higgins JPT, Lumley T. A re-evaluation of fixed effect(s) meta-analysis. J R Stat Soc Series A. 2018;181(1):205–27. Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.1111/rssa.12275.

  52. Vovk V, Wang R. Combining p-values via averaging. Biometrika. 2020;107(4):791–808.

  53. Wilson DJ. The harmonic mean p-value for combining dependent tests. Proc Natl Acad Sci USA. 2019;116(4):1195–200.

  54. Loughin TM. A systematic comparison of methods for combining p-values from independent tests. Comput Stat Data Anal. 2004;47(3):467–85. Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.csda.2003.11.020.

  55. Abdi H. Bonferroni and Šidák corrections for multiple comparisons. Encyclopedia of Measurement and Statistics. 2007;3(1):1–9.

  56. VanderWeele TJ, Mathur MB. Some desirable properties of the Bonferroni correction: is the Bonferroni correction really so bad? Am J Epidemiol. 2019;188(3):617–8.

  57. Frankl VE. Man’s search for meaning. Boston: Beacon Press; 2006.

  58. Ryff CD. Psychological well-being revisited: advances in the science and practice of eudaimonia. Psychother Psychosom. 2014;83(1):10–28. Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.1159/000353263.

  59. Cohen R, Bavishi C, Rozanski A. Purpose in life and its relationship to all-cause mortality and cardiovascular events: a meta-analysis. Psychosom Med. 2016;78(2):122–33. Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.1097/PSY.0000000000000274.

  60. Guimond AJ, Shiba K, Kim ES, Kubzansky LD. Sense of purpose in life and inflammation in healthy older adults: a longitudinal study. Psychoneuroendocrinology. 2022;141: 105746.

  61. Kim ES, Chen Y, Nakamura JS, Ryff CD, VanderWeele TJ. Sense of purpose in life and subsequent physical, behavioral, and psychosocial health: an outcome-wide approach. Am J Health Promot. 2021. Available from: doi: 10.1177/08901171211038545.

  62. Kim ES, Nakamura JS, Strecher VJ, Cole SW. Reduced epigenetic age in older adults with high sense of purpose in life. J Gerontol A Biol Sci Med Sci. 2023;78(7):1092–9. Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.1093/gerona/glad092.

  63. Kim ES, Strecher VJ, Ryff CD. Purpose in life and use of preventive health care services. Proc Natl Acad Sci U S A. 2014;111(46):16331–6. Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.1073/pnas.1414826111.

  64. Okuzono SS, Shiba K, Kim ES, Shirai K, Nakaya N, Fujiwara T, et al. Ikigai and subsequent health and wellbeing among Japanese older adults: longitudinal outcome-wide analysis. Lancet Reg Health West Pac. 2022;21: 100391.

  65. Zilioli S, Slatcher RB, Ong AD, Gruenewald TL. Purpose in life predicts allostatic load ten years later. J Psychosom Res. 2015;79(5):451–7.

  66. Steptoe A, Fancourt D. Leading a meaningful life at older ages and its relationship with social engagement, prosperity, health, biology, and time use. Proc Natl Acad Sci. 2019;116(4):1207–12. Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.1073/pnas.1814723116.

  67. Sutin AR, Luchetti M, Aschwanden D, Stephan Y, Sesker AA, Terracciano A. Sense of meaning and purpose in life and risk of incident dementia: new data and meta-analysis. Arch Gerontol Geriatr. 2023;105: 104847.

  68. Zimmerman DW, Zumbo BD. Resolving the issue of how reliability is related to statistical power: Adhering to mathematical definitions. J Mod Appl Stat Methods. 2015;14(2):9–26.

  69. Silberzahn R, Uhlmann EL, Martin DP, Anselmi P, Aust F, Awtrey E, et al. Many analysts, one data set: making transparent how variations in analytic choices affect results. Adv Methods Pract Psychol Sci. 2018;1(3):337–56. https://doiorg.publicaciones.saludcastillayleon.es/10.1177/2515245917747646.

  70. Padgett RN, Bradshaw M, Chen Y, Cowden RG, Jang SJ, Kim ES, Shiba K, Johnson BR, VanderWeele TJ. Analytic methodology for childhood predictor analyses for wave 1 of the global flourishing study. BMC Glob Public Health. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s44263-025-00142-0.

Acknowledgements

Not applicable.

Funding

The GFS was supported by funding from the John Templeton Foundation (grant #61665), Templeton Religion Trust (#1308), Templeton World Charity Foundation (#0605), Well-Being for Planet Earth Foundation, Fetzer Institute (#4354), Well Being Trust, Paul L. Foster Family Foundation, and the David and Carol Myers Foundation. The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of these organizations.

Author information

Contributions

TJV, BRJ, MB, YC, SJJ, RNP, and KS coordinated writing of code for different software and preparing analysis scripts; MB led writing scripts for Stata; YC led writing scripts for SAS; SJJ led writing scripts for SPSS (which later turned into using R through SPSS); KS led writing scripts for R; RNP contributed as needed for each package and wrote the meta-analysis online app; RNP and RGC drafted the initial version of this article; KSE wrote the initial interpretation of the sense of purpose application; all authors reviewed and helped revise the manuscript; and BRJ and TJV are the principal investigators for the Global Flourishing Study.

Corresponding author

Correspondence to Tyler J. VanderWeele.

Ethics declarations

Ethics approval and consent to participate

Ethical approval was granted by the institutional review boards at Baylor University (IRB Reference #: 1841317) and Gallup (IRB Reference #: 2021–11-02). Gallup’s IRB is an international organization providing coverage for all countries included in the GFS. All participants provided informed consent. The research conformed to the principles of the Helsinki Declaration.

Consent for publication

Consent by participants was given for their responses on the GFS to be used in subsequent publications.

Competing interests

Tyler J. VanderWeele reports partial ownership and licensing fees from Gloo Inc. The remaining authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Padgett, R., Bradshaw, M., Chen, Y. et al. Analytic methodology for demographic variation analyses for wave 1 of the global flourishing study. BMC Glob. Public Health 3, 28 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s44263-025-00140-2

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s44263-025-00140-2

Keywords