class: center, middle, inverse, title-slide # Specification Curve Analysis: A practical guide ## COMM 783 | University of Pennsylvania ###
Dani Cosme
### 2022-04-05 --- # The problem -- There are many different ways to test a given association and we usually only report one or a few model specifications. -- Model selection relies on choices by the researcher, and these choices are often arbitrary and sometimes driven by a desire for significant results. -- Given the same dataset, two researchers might choose to answer the same question in very different ways. <img src="https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41562-020-0912-z/MediaObjects/41562_2020_912_Fig1_HTML.png" width="70%" style="display: block; margin: auto;" /> Figure from [Simonsohn, Simmons, & Nelson, 2020](https://www.nature.com/articles/s41562-020-0912-z) --- # The solution -- According to [Simonsohn, Simmons, & Nelson, 2020](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2694998), the solution is to specify all "reasonable" models to test an association and assess the joint distribution of estimates across model specifications. -- <br> > Some researchers object to blindly running alternative specifications that may make little sense for theoretical or statistical reasons just for the sake of ‘robustness’. We are among those researchers. We believe one should test specifications that vary in as many of the potentially ad hoc assumptions as possible without testing any specifications that are not theoretically grounded. -- <br> This can be thought of as an explicit framework for sensitivity analyses / robustness checks that enables inferential statistics across model specifications. --- # Preregistration & SCA --- # The value -- A better understanding of how conceptual and analytic decisions alter the association of interest. -- A more robust scientific literature with <span style="background-color: #21B0D4">increased generalizability and translational value</span>. -- .center[ <img src="https://media.giphy.com/media/WUq1cg9K7uzHa/giphy.gif" width="50%" /> ] --- # SCA overview -- ### 1. Specify all reasonable models -- ### 2. Plot specification curve showing association/effect estimates as a function of decisions -- ### 3. Test how inconsistent the curve results are given the null hypothesis of no association/effect --- ## 1. Identify reasonable models For the relationship of interest, determine the set of reasonable model specifications to test. -- #### Reasonable specifications should be: + Consistent with theory + Expected to be statistically valid + Non-redundant .center[ <img src="img/table1_2020.png" width="70%" /> ] Table from [Simonsohn, Simmons, & Nelson, 2020](https://www.nature.com/articles/s41562-020-0912-z) --- ## 1. Identify reasonable models Del Giudice guidance --- ## 2. Descriptive specification curve The specification curve visualizes the strength of the association/effect between two constructs of interest across model specifications and the analytic decisions associated with each model specification. -- .pull-left[ **Key features** * Two panels depicting 1) the curve and 2) the decisions * A vertical slice = information about a single model specification ] .pull-right[ <img src="https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41562-020-0912-z/MediaObjects/41562_2020_912_Fig2_HTML.png" width="100%" /> Figure from [Simonsohn, Simmons, & Nelson, 2020](https://www.nature.com/articles/s41562-020-0912-z) ] --- ## 2. 
Descriptive specification curve -- #### The curve * Model specifications are ranked -- * Shows the magnitude, sign (positive or negative), and statistical significance of each estimate -- * Often visualizes uncertainty around individual point estimates in the model specifications -- * May highlight a single a priori or previously reported association/effect estimate -- #### The decisions -- * Each row denotes a specific decision and whether or not that decision applied to a given model specification -- * Decisions are often grouped into categories to ease interpretation --- ## SCA examples .panelset[ .panel[.panel-name[Simonsohn et al., 2020] <img src="https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41562-020-0912-z/MediaObjects/41562_2020_912_Fig2_HTML.png" width="70%" style="display: block; margin: auto;" /> Figure from [Simonsohn, Simmons, & Nelson, 2020](https://www.nature.com/articles/s41562-020-0912-z) ] .panel[.panel-name[Orben & Przybylski, 2019] <img src="https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41562-018-0506-1/MediaObjects/41562_2018_506_Fig3_HTML.png" width="52%" style="display: block; margin: auto;" /> Example specification curve analysis (SCA) from [Orben & Przybylski, 2019](http://nature.com/articles/s41562-018-0506-1) ] .panel[.panel-name[Cosme & Lopez, 2020] <img src="https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/scan/PAP/10.1093_scan_nsaa155/2/nsaa155f1.jpeg?Expires=1652231337&Signature=bF-qwPQM2uk9U4ax9HcovP1UD5hfZarY7pDGZDfmV5cDwPYq~YWLDiHvpfoQtN0FwieSOYIMk1n5mLuS57oG36-atTuf9653INkwoEjrNge86E3dEqdzor~D3yhurWGE09vsuqLlm8M3TDr8IrFuujWzjFIGTWoVhxRW3zwt3cM1yzizhBTHZDmYCGInJXbQPQUFdrSYakpuRjTj8kj8RkkMWlnW4w3zRJrr0854Zvm4TobGS-IYDBTIRy9uBo35A9vt3h6-t2qWCC7tiTUi3xvIPxDiUIvOFFuLMO09~OdLSyFaMSpSxfwaD8fdKpaNfTlUPelvjsgdWkficjbIyQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA" width="57%" style="display: block; margin: auto;" /> Example specification curve analysis (SCA) from [Cosme & Lopez](https://psyarxiv.com/23mu5) ] .panel[.panel-name[Cosme et al., preprint] <br> <img src="img/cosme_curve.png" width="35%" style="display: block; margin: auto;" /> Example specification curve analysis (SCA) from [Cosme et al., preprint](https://psyarxiv.com/9cxfj) ] ] --- ## 3. Inferential statistics <br> #### Metrics of interest in the observed curve * Median curve estimate * The share of positive or negative associations that are statistically significant -- <br> .center[ ### But are the observed effects surprising given the null hypothesis? ] --- ## 3. Inferential statistics #### Test inconsistency with the null -- + Use bootstrap resampling to create a distribution of curves under the null hypothesis -- + Experimental designs = shuffle the randomly assigned variable(s) -- + Observational designs = "force" the null by removing the effect of x on y for each observation and sample from this null dataset -- + Estimate the curve and extract the curve median and the share of positive/negative significant associations -- + Repeat many times to get a distribution -- + Compare observed curve metrics to null curves to generate p-values --- ## 3. Inferential statistics #### Potential questions to test versus null -- + Is the median effect size in the observed curve statistically different from that in the null distribution? -- + Is the share of effects with the dominant sign (positive or negative) that are statistically significant greater than expected under the null? 
<img src="img/table2_2020.png" width="75%" style="display: block; margin: auto;" /> Table from [Simonsohn, Simmons, & Nelson, 2020](https://www.nature.com/articles/s41562-020-0912-z) --- class: center, middle # Tutorial --- ## 1. Define reasonable specifications <br><br><br> .center[ ## What is the relationship between mental health ## and satisfaction with life? ] --- ## 1. Define reasonable specifications -- #### Ways of operationalizing the IV "mental health" * `CEDS10` = depression score on the CESD-10 * `GAD` = anxiety score on the GAD-7 * `PANAS_negative_Affect` = negative affect score on the PANAS * `PSS` = perceived stress score on the PSS -- #### Control variables * `age` = age * `gender` = gender * `mother_edu` = maternal education -- #### Analytic decisions * Statistical modeling approach * Linear regression * Outliers * Use all data points * Winsorize to the mean +/- 3 SD --- ## 1. Define reasonable specifications ### Visualize decisions ![](SCA_tutorial_inferential_presentation_files/figure-html/unnamed-chunk-10-1.png)<!-- --> --- ## Prep data These data are generated based on an existing dataset from a study looking at health and well-being: * Create winsorized independent variables (+/- 3 SD from the mean) * Mean center and standardize each variable .scroll-output[ ```r # load data df = read.csv("sca_tutorial_inferences_data.csv", stringsAsFactors = FALSE) # tidy for modeling model_df = df %>% gather(variable, value, -PID, -age, -gender, -mother_edu, -SWLS) %>% group_by(variable) %>% mutate(mean_value = mean(value, na.rm = TRUE), sd3 = 3*sd(value, na.rm = TRUE)) %>% ungroup() %>% mutate(value_winsorized = ifelse(value > mean_value + sd3, mean_value + sd3, ifelse(value < mean_value - sd3, mean_value - sd3, value)), variable_winsorized = sprintf("%s_winsorized", variable)) %>% select(-mean_value, -sd3) %>% spread(variable_winsorized, value_winsorized) %>% group_by(PID) %>% fill(contains("winsorized"), .direction = "downup") %>% spread(variable, value) %>% gather(variable, value, -PID, -age, -gender, -mother_edu) %>% group_by(variable) %>% mutate(value = scale(value, center = TRUE, scale = TRUE)) %>% spread(variable, value) %>% select(PID, age, gender, mother_edu, SWLS, sort(tidyselect::peek_vars())) # print # model_df %>% # mutate_if(is.numeric, ~round(., 2)) %>% # DT::datatable(rownames = FALSE, extensions = 'FixedColumns', # options = list(scrollX = TRUE, # scrollY = TRUE, # pageLength = 5, # autoWidth = TRUE, # fixedColumns = list(leftColumns = 1), # dom = 't')) ``` ] --- ## Plot distributions ![](SCA_tutorial_inferential_presentation_files/figure-html/unnamed-chunk-12-1.png)<!-- --> --- ## 2. Specify and estimate models There are various methods for running a large number of model specifications. Here, we unpack three different methods and discuss their advantages and disadvantages. --- ## estimate models using `MuMIn::dredge` .panelset[ .panel[.panel-name[info] #### Advantages * Easily runs all nested models from a maximal model * Provides model fit indices (e.g. 
AIC, BIC) #### Disadvantages / limitations * The max number of predictors = 30 * NAs must be omitted across all model specifications * Doesn’t give parameter estimates for factors directly; you need to extract them using `MuMIn::get.models()` * It may run nested models you aren't interested in if you don't subset * If you have multiple independent variables of interest, you will need to specify each maximal model separately and combine after running ] .panel[.panel-name[code] ``` ## Fixed term is "(Intercept)" ``` ] .panel[.panel-name[output]
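A minimal sketch of how this approach might look for these data, assuming the tutorial's `model_df` and the `CEDS10` operationalization of mental health (the call and object names are illustrative, not the exact chunk behind this slide):

```r
# minimal sketch: run all nested models from one maximal model with MuMIn::dredge
library(MuMIn)

options(na.action = "na.fail")      # dredge requires an explicit NA action
dredge_df = na.omit(model_df)       # NAs must be omitted across all specifications

# maximal model for one IV; with multiple IVs, repeat per IV and combine afterwards
max_model = lm(SWLS ~ CEDS10 + age + gender + mother_edu, data = dredge_df)

# all nested models; the `subset` or `fixed` arguments can drop specifications
# you are not interested in (e.g. models that exclude the IV)
output_dredge = dredge(max_model)

# parameter estimates for individual specifications can be pulled out with
# MuMIn::get.models(output_dredge, subset = TRUE)
```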
] ] --- ## estimate models using `purrr` .panelset[ .panel[.panel-name[info] #### Advantages * Greater control over how models are specified #### Disadvantages / limitations * Need to code from scratch Note that if you are running linear mixed effects models, you can use the `broom.mixed` package instead of the `broom` package to tidy the model output. ] .panel[.panel-name[code] ] .panel[.panel-name[output]
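A minimal sketch of the `purrr` approach, assuming the tutorial's `model_df`; the object name `output_purrr` and the columns used by the plotting code later (`x`, `controls`, `estimate`, `conf.low`, `conf.high`, `p.value`, `fit`) follow this deck, while the specific grid of specifications is illustrative:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# decision grid: IV operationalization (raw or winsorized) x covariate set
ivs = c("CEDS10", "GAD", "PANAS_negative_Affect", "PSS")
ivs = c(ivs, paste0(ivs, "_winsorized"))
control_vars = c("age", "gender", "mother_edu")
control_sets = c("no covariates", control_vars, paste(control_vars, collapse = " + "))

output_purrr = expand_grid(x = ivs, controls = control_sets) %>%
  mutate(formula = ifelse(controls == "no covariates",
                          sprintf("SWLS ~ %s", x),
                          sprintf("SWLS ~ %s + %s", x, controls)),
         # fit each specification and tidy the output (use broom.mixed::tidy for lmer models)
         fit = map(formula, ~ lm(as.formula(.x), data = model_df)),
         tidied = map(fit, ~ tidy(.x, conf.int = TRUE))) %>%
  unnest(tidied) %>%
  filter(term == x) %>%   # keep only the coefficient for the IV in each model
  select(x, controls, estimate, conf.low, conf.high, p.value, fit)
```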
] ] --- ## estimate models using `specr` .panelset[ .panel[.panel-name[info] #### Advantages * Very simple to use! * You can easily run the models in specific subsets of your data #### Disadvantages / limitations * Less flexibility in how control variables are combined (by default, models are run with no covariates, each covariate individually, and all covariates together) ] .panel[.panel-name[code] ] .panel[.panel-name[output]
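A minimal sketch of the `specr` approach using the older `run_specs()` interface, again assuming the tutorial's `model_df`; the object name `output_specr` matches the plotting code later, and the argument values are illustrative:

```r
library(specr)

# run_specs() crosses the IVs, DVs, models, and controls automatically;
# controls are expanded to: no covariates, each covariate alone, all covariates
output_specr = run_specs(df = model_df,
                         y = c("SWLS"),
                         x = c("CEDS10", "CEDS10_winsorized",
                               "GAD", "GAD_winsorized",
                               "PANAS_negative_Affect", "PANAS_negative_Affect_winsorized",
                               "PSS", "PSS_winsorized"),
                         model = c("lm"),
                         controls = c("age", "gender", "mother_edu"))
```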
] ] --- ## 3. Plot specification curve ### Guide to unpacking the curve Panel A * X-axis = ordered model specifications * Y-axis = standardized regression coefficient for the IV-DV relationship * Points = the standardized regression coefficient for a specific model * Error bars = 95% confidence intervals around the point estimate Panel B * X-axis = ordered model specifications (the same as panel A) * Y-axis (right) = analytic decision categories * Y-axis (left) = specific analytic decisions within each category * Lines = denote that a specific analytic decision was true for that model specification Color key * Red = regression coefficient was statistically significant at p < .05 * Black/grey = regression coefficient was not statistically significant (p > .05) --- ## Homebrew SCA plot with `purrr` output .panelset[ .panel[.panel-name[code] ```r # set plot aesthetics aes = theme_minimal(base_size = 11) + theme(legend.title = element_text(size = 10), legend.text = element_text(size = 9), axis.text = element_text(color = "black"), axis.line = element_line(colour = "black"), legend.position = "none", panel.border = element_blank(), panel.background = element_blank()) colors = c("yes" = "#F21A00", "no" = "black") # merge and tidy for plotting plot_data = output_purrr %>% arrange(estimate) %>% mutate(specification = row_number(), winsorized = ifelse(grepl("winsorized", x), "yes", "no"), significant.p = ifelse(p.value < .05, "yes", "no"), x = gsub("_winsorized", "", x)) # plot top panel top = plot_data %>% ggplot(aes(specification, estimate, color = significant.p)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high), size = .25, shape = "", alpha = .5) + geom_point(size = .5) + scale_color_manual(values = colors) + labs(x = "", y = "standardized\nregression coefficient\n") + aes # rename variables and plot bottom panel for (var in c(control_vars, "no covariates")) { plot_data = plot_data %>% mutate(!!var := ifelse(grepl(var, controls), "yes", "no")) } bottom = plot_data %>% gather(controls, control_value, eval(control_vars), `no covariates`) %>% gather(variable, value, x, controls, winsorized) %>% filter(control_value == "yes") %>% unique() %>% mutate(variable = factor(variable, levels = c("x", "winsorized", "controls"))) %>% ggplot(aes(x = specification, y = value, color = significant.p)) + geom_point(aes(x = specification, y = value), shape = 124, size = 3) + facet_grid(variable ~ 1, scales = "free_y", space = "free_y") + scale_color_manual(values = colors) + labs(x = "\nspecification number", y = "") + aes + theme(strip.text.x = element_blank()) ``` ] .panel[.panel-name[output] <img src="SCA_tutorial_inferential_presentation_files/figure-html/unnamed-chunk-16-1.png" width="360" style="display: block; margin: auto;" /> ] ] --- ## SCA plot with `specr` .panelset[ .panel[.panel-name[code] ```r output_specr %>% mutate(winsorized = ifelse(grepl("winsorized", x), "yes", "no"), x = gsub("_winsorized", "", x)) %>% plot_specs(., choices = c("x", "winsorized", "controls")) ``` ![](SCA_tutorial_inferential_presentation_files/figure-html/unnamed-chunk-17-1.png)<!-- --> ] .panel[.panel-name[output] <img src="SCA_tutorial_inferential_presentation_files/figure-html/unnamed-chunk-18-1.png" width="360" style="display: block; margin: auto;" /> ] ] --- ## 4. Inferential statistics 1. Run the SCA and extract the median and the number of positive and negative statistically significant models 2. Use bootstrap resampling to create a distribution of curves under the null hypothesis 3. 
Generate p-values indicating how surprising the observed results are under the null hypothesis --- ## Null bootstrap resampling .panelset[ .panel[.panel-name[overview] * Run SCA to retrieve associations between the independent and dependent variable in each model specification * Extract the dataset for each model specification (which was saved as a model object `fit` in the data frame) * Force the null on each specification by subtracting the effect of the independent variable of interest (b estimate * x) from the dependent variable (y_value) for each observation in the dataset * For each bootstrap, sample with replacement from the null dataset and run all model specifications to generate a curve * Extract the median estimate, N positive & significant at p < .05, and N negative & significant at p < .05 * Repeat the process many times (e.g. 500 or 1000) ] .panel[.panel-name[define functions] `run_boot_null` = wrapper function to run the bootstrapping procedure or load an existing output file `sca_boot_null` = function that runs the bootstrapping procedure `summarize_sca` = function to summarize the observed specification curve `summarize_boot_null` = function to summarize the bootstrapped curves ] .panel[.panel-name[bootstrap `specr` output]
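A minimal sketch of the two core steps (forcing the null for one specification and running one bootstrap iteration); these helpers are illustrative and are not the `run_boot_null` / `sca_boot_null` functions themselves:

```r
library(broom)

# force the null for one specification: remove the estimated effect of the IV from the DV
force_null = function(fit, x_var) {
  d = model.frame(fit)                    # data used in this specification
  b = coef(fit)[[x_var]]                  # b estimate for the IV
  d$SWLS_null = d$SWLS - b * d[[x_var]]   # y_value minus b estimate * x
  d
}

# one bootstrap iteration for one specification: resample rows, re-estimate, tidy
boot_once = function(null_data, x_var, covariates = NULL) {
  resampled = null_data[sample(nrow(null_data), replace = TRUE), ]
  tidy(lm(reformulate(c(x_var, covariates), response = "SWLS_null"), data = resampled),
       conf.int = TRUE)
}

# repeating boot_once() across all specifications yields one null curve; its median and
# shares of significant estimates are stored, and the whole process is repeated e.g. 500 times
```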
] ] --- ## Generate inferential stats .panelset[ .panel[.panel-name[curve metrics] * Curve median * Share of positive statistically significant associations at p < .05 * Share of negative statistically significant associations at p < .05 ] .panel[.panel-name[p-values] These represent the proportion of bootstrapped null curves in which an equally or more extreme value was observed (e.g. 1 / 500 = .002)
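For example, a minimal sketch assuming `observed` holds the observed curve metrics and `null_curves` holds one row of the same metrics per bootstrapped null curve (both object and column names are illustrative):

```r
# p-value = proportion of null curves with an equally or more extreme value
p_median = mean(abs(null_curves$median_estimate) >= abs(observed$median_estimate))
p_share  = mean(null_curves$n_sig_dominant >= observed$n_sig_dominant)
```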
] ] --- ## Confidence intervals around curve metrics -- #### Sources of uncertainty * Sampling from the population of participants * Sampling from the population of reasonable model specifications -- #### Use bootstrap resampling at multiple levels * Create e.g. 500 bootstrap-resampled datasets * For each dataset, sample e.g. 500 model specifications (with replacement), estimate the curve for each, and extract the curve metrics * Repeat this process for each of the 500 datasets * Use the resulting distribution of curve metrics to determine the confidence intervals --- class: center, middle # Resources --- .panelset[ .panel[.panel-name[Reading list] * [Specification curve analysis - Simonsohn, Simmons, & Nelson, 2020](https://www.nature.com/articles/s41562-020-0912-z) * [Increasing Transparency Through a Multiverse Analysis - Steegen et al., 2016](https://journals.sagepub.com/doi/10.1177/1745691616658637) ] .panel[.panel-name[Example papers using SCA] * [Run All the Models! Dealing With Data Analytic Flexibility - Julia Rohrer](https://www.psychologicalscience.org/observer/run-all-the-models-dealing-with-data-analytic-flexibility) * [The association between adolescent well-being and digital technology use - Orben & Przybylski, 2019](http://nature.com/articles/s41562-018-0506-1) * [Screens, Teens, and Psychological Well-Being: Evidence From Three Time-Use-Diary Studies - Orben & Przybylski, 2019](https://journals.sagepub.com/doi/10.1177/0956797619830329) * [Neural indicators of food cue reactivity, regulation, and valuation and their associations with body composition and daily eating behavior - Cosme & Lopez, 2020](https://psyarxiv.com/23mu5/) ] .panel[.panel-name[Programming resources] * [`specr` documentation](https://masurp.github.io/specr/) * [EDUC 610, Functional Programming with R - Daniel Anderson](https://uo-datasci-specialization.github.io/c3-fun_program_r/schedule.html) * [R for Data Science - Grolemund & Wickham](https://r4ds.had.co.nz/many-models.html) ] ] --- class: center, middle # Thank you! The repository can be found at: [https://github.com/dcosme/specification-curves/](https://github.com/dcosme/specification-curves/) ---