Calculate SMD Directly from Data

Calculates the standardized mean difference for a variable directly from a data frame, automatically detecting whether the variable is continuous or categorical.

Usage

calculate_smd_from_data(
  data,
  var,
  trt_var,
  ref_group = NULL,
  method = c("auto", "cohens_d", "hedges_g", "arcsine", "logit", "raw"),
  conf_level = 0.95,
  continuous_threshold = 10
)

Arguments

data

A data frame containing the analysis data.

var

Character. Name of the variable to calculate SMD for.

trt_var

Character. Name of the treatment group variable.

ref_group

Character or NULL. Value of the reference (control) group. If NULL, uses the first level of the treatment variable.

method

Character. Method for SMD calculation:

"cohens_d": Cohen's d for continuous variables
"hedges_g": Hedges' g (bias-corrected) for continuous variables
"arcsine": Arcsine transformation for binary/categorical variables
"logit": Logit transformation for binary/categorical variables
"raw": Raw proportions/means without transformation
"auto" (default): Automatically selects based on variable type
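
As a rough illustration of what the continuous and arcsine options compute, the sketch below applies the textbook formulas in base R. The helper names (cohens_d_sketch, hedges_g_sketch, arcsine_smd_sketch) are hypothetical, and the pooled-SD and small-sample correction conventions are assumptions; the package's internal implementation may differ.

```r
# Textbook formulas only; not necessarily what the package does internally.
cohens_d_sketch <- function(x_trt, x_ref) {
  n1 <- length(x_trt)
  n0 <- length(x_ref)
  # Pooled SD weighted by degrees of freedom (assumed convention)
  sd_pooled <- sqrt(((n1 - 1) * var(x_trt) + (n0 - 1) * var(x_ref)) / (n1 + n0 - 2))
  (mean(x_trt) - mean(x_ref)) / sd_pooled
}

hedges_g_sketch <- function(x_trt, x_ref) {
  df <- length(x_trt) + length(x_ref) - 2
  # Common small-sample correction factor J = 1 - 3 / (4 * df - 1)
  cohens_d_sketch(x_trt, x_ref) * (1 - 3 / (4 * df - 1))
}

arcsine_smd_sketch <- function(p_trt, p_ref) {
  # Cohen's h: difference of arcsine-square-root transformed proportions
  2 * asin(sqrt(p_trt)) - 2 * asin(sqrt(p_ref))
}

# Small made-up inputs, for illustration only
x_trt <- c(52, 58, 61, 47, 66)
x_ref <- c(50, 55, 49, 60, 53)
d <- cohens_d_sketch(x_trt, x_ref)
g <- hedges_g_sketch(x_trt, x_ref)
h <- arcsine_smd_sketch(0.40, 0.45)
```

Under these formulas Hedges' g is always slightly smaller in magnitude than Cohen's d, and the gap shrinks as the sample size grows.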

conf_level

Numeric. Confidence level for the confidence interval (default: 0.95).

continuous_threshold

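Integer. Cutoff on the number of unique values for classifying numeric variables (default: 10). Numeric variables with more than this many unique values are treated as continuous; those with this many or fewer are treated as categorical. Used only when method = "auto".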

Value

A named list with components:

smd: The standardized mean difference. For multi-level categorical variables, returns the SMD with the maximum absolute value, preserving sign.
ci_lower: Lower bound of confidence interval
ci_upper: Upper bound of confidence interval
method: Method used
var_type: Detected variable type ("continuous" or "categorical")
se: Standard error of the SMD
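
The source does not state how the interval is constructed. If it is a normal-approximation (Wald) interval, which is an assumption here, the smd, se, ci_lower and ci_upper components would relate as in the sketch below. The helper wald_ci_sketch is hypothetical and not part of the package.

```r
# Assumes a Wald interval: estimate +/- z * standard error
wald_ci_sketch <- function(smd, se, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  list(ci_lower = smd - z * se, ci_upper = smd + z * se)
}

ci <- wald_ci_sketch(smd = 0.20, se = 0.10)
```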

Details

When method = "auto":

Numeric variables with > continuous_threshold unique values are treated as continuous (using Cohen's d)
Numeric variables with <= continuous_threshold unique values are treated as categorical
Character/factor variables are treated as categorical (using arcsine)
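
The detection rules above can be sketched in a few lines of base R. The helper detect_var_type_sketch is hypothetical, and dropping missing values before counting unique values is an assumption rather than documented behaviour.

```r
# Hypothetical helper mirroring the "auto" rules listed above
detect_var_type_sketch <- function(x, continuous_threshold = 10) {
  if (is.numeric(x)) {
    # Count distinct non-missing values (NA handling is an assumption)
    n_unique <- length(unique(x[!is.na(x)]))
    if (n_unique > continuous_threshold) "continuous" else "categorical"
  } else {
    # Character and factor variables are always categorical
    "categorical"
  }
}

type_age <- detect_var_type_sketch(c(55.2, 61.0, 47.3, 58.8, 62.1, 49.9,
                                     53.4, 66.7, 51.5, 57.6, 60.2))
type_flag <- detect_var_type_sketch(c(0, 1, 1, 0, 1))
type_sex <- detect_var_type_sketch(c("M", "F", "F"))
```

Note that a numeric variable with exactly continuous_threshold unique values falls on the categorical side under these rules.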

For categorical variables with more than two levels, the function computes an SMD for each pairwise level comparison and reports the one with the largest absolute value, keeping its sign.
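
The selection step can be illustrated as follows. How the package forms its comparisons is not shown in this page, so the sketch assumes one arcsine-based SMD per level (treatment proportion versus reference proportion); the proportions are made up for the example.

```r
# Illustrative per-level proportions in each arm (invented values)
p_trt <- c(A = 0.60, B = 0.25, C = 0.15)
p_ref <- c(A = 0.70, B = 0.20, C = 0.10)

# One SMD per level on the arcsine scale (assumed comparison scheme)
level_smds <- 2 * asin(sqrt(p_trt)) - 2 * asin(sqrt(p_ref))

# Keep the entry with the largest magnitude, sign included
winner <- names(level_smds)[which.max(abs(level_smds))]
max_abs_smd <- level_smds[[winner]]
```

Here level A has the largest imbalance and its SMD is negative, so the reported value stays negative rather than being replaced by its absolute value.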

Method-specific considerations:

"logit": Useful for binary variables. Proportions of exactly 0 or 1 have no finite logit, so a continuity correction of 0.5/N is applied at the boundaries. Results are on the logit scale; back-transformation is not straightforward.
"raw": Appropriate when no transformation is desired. Calculates SMD directly from raw proportions for binary variables, standardized by the pooled standard deviation of the binary variable.
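
A minimal sketch of both options, under stated assumptions: N in the continuity correction is taken to be the group size, the logit result is the plain difference in log-odds, and the pooled variance for "raw" is the average of the two group variances. The helper names are hypothetical and the package may use different conventions.

```r
# Hypothetical helpers; correction and pooling details are assumptions.
logit_smd_sketch <- function(x_trt, n_trt, x_ref, n_ref) {
  adjust <- function(x, n) {
    p <- x / n
    # Move boundary proportions inward by 0.5 / n so qlogis() stays finite
    if (p == 0) p <- 0.5 / n
    if (p == 1) p <- 1 - 0.5 / n
    p
  }
  # Difference in log-odds between treatment and reference
  qlogis(adjust(x_trt, n_trt)) - qlogis(adjust(x_ref, n_ref))
}

raw_smd_sketch <- function(p_trt, p_ref) {
  # Variance of a binary variable is p * (1 - p); average the two arms
  sd_pooled <- sqrt((p_trt * (1 - p_trt) + p_ref * (1 - p_ref)) / 2)
  (p_trt - p_ref) / sd_pooled
}

logit_boundary <- logit_smd_sketch(x_trt = 0, n_trt = 50, x_ref = 10, n_ref = 50)
raw_example <- raw_smd_sketch(p_trt = 0.6, p_ref = 0.4)
```

Without the correction, the first call would evaluate qlogis(0) and return -Inf.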

Examples

if (FALSE) { # \dontrun{
# Create example data (seed set so the simulated values are reproducible)
set.seed(123)
adsl <- data.frame(
  AGE = c(rnorm(100, 55, 12), rnorm(100, 54, 11)),
  SEX = c(sample(c("M", "F"), 100, replace = TRUE, prob = c(0.4, 0.6)),
          sample(c("M", "F"), 100, replace = TRUE, prob = c(0.45, 0.55))),
  TRT01P = rep(c("Treatment", "Placebo"), each = 100)
)

# Continuous variable
calculate_smd_from_data(adsl, "AGE", "TRT01P", ref_group = "Placebo")

# Categorical variable
calculate_smd_from_data(adsl, "SEX", "TRT01P", ref_group = "Placebo")
} # }
