Validates that a data_bundle conforms to the required structure for use in simulation tasks. The data_bundle is the output of a data_generator function and contains all data-related objects needed for model fitting and metric computation.
Details
The data_bundle must have the following structure:
- train: a data.frame with at least 1 row (required).
- test: NULL or a data.frame (optional).
- response: a scalar character naming the response column in train (and in test, if test is not NULL).
- true_params: a named numeric vector whose names match vars_of_interest as a set (order may differ).
- vars_of_interest: a non-empty character vector of unique names.
- references: a named numeric vector whose names match vars_of_interest as a set (optional).
- meta: an optional named list containing scalar values only.
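For illustration, a minimal bundle that relies on the optional parts of this structure might look like the following. It sets test to NULL, omits references, and adds a small meta list. The object name minimal_bundle and the meta fields n_train and label are hypothetical. It also shows that, because the matching rule uses setequal(), the names in true_params need not appear in the same order as vars_of_interest.

```r
# Illustrative values only; set.seed() just makes rnorm() reproducible.
set.seed(1)
minimal_bundle <- list(
  train = data.frame(x = 1:5, y = rnorm(5)),
  test = NULL,                               # optional: no held-out data
  response = "y",                            # must be a column of train
  true_params = c(sigma = 0.5, beta = 1.5),  # names in a different order
  vars_of_interest = c("beta", "sigma"),
  meta = list(n_train = 5L, label = "demo")  # hypothetical fields, scalars only
)

# list(test = NULL) keeps a "test" entry whose value is NULL
names(minimal_bundle)

# setequal() ignores order, so the names still match vars_of_interest
setequal(names(minimal_bundle$true_params), minimal_bundle$vars_of_interest)
# → TRUE
```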
Validation rules:
- train must be a data.frame with nrow >= 1.
- test must be NULL or a data.frame.
- response must be a scalar character present in train (and in test, if test is not NULL).
- true_params must be a named numeric vector.
- vars_of_interest must be a non-empty, unique character vector.
- setequal(names(true_params), vars_of_interest) must be TRUE.
- If references is provided, setequal(names(references), vars_of_interest) must be TRUE.
- No named vector or list may contain duplicate names.
- meta, if present, must be a named list with scalar values only.
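The rules above could be checked along the following lines. This is a minimal sketch, not the implementation of validate_data_bundle(): the function name check_bundle_sketch() and its messages are hypothetical, it assumes the bundle is a plain R list, and it assumes "scalar" means an atomic value of length 1. It uses [[ ]] rather than $ to avoid partial name matching.

```r
# Hypothetical helper: a sketch of the validation rules, NOT the package's
# validate_data_bundle(). Returns one message per violated rule; an empty
# character vector means every rule passed.
check_bundle_sketch <- function(b) {
  if (!is.list(b)) return("data_bundle must be a list")
  problems <- character(0)
  is_scalar_chr <- function(x) is.character(x) && length(x) == 1L && !is.na(x)
  is_named_num <- function(x) is.numeric(x) && !is.null(names(x))
  has_dup_names <- function(x) !is.null(names(x)) && anyDuplicated(names(x)) > 0L

  train <- b[["train"]]
  test <- b[["test"]]
  if (!is.data.frame(train) || nrow(train) < 1L) {
    problems <- c(problems, "train must be a data.frame with at least 1 row")
  }
  if (!is.null(test) && !is.data.frame(test)) {
    problems <- c(problems, "test must be NULL or a data.frame")
  }

  resp <- b[["response"]]
  if (!is_scalar_chr(resp)) {
    problems <- c(problems, "response must be a scalar character")
  } else {
    if (is.data.frame(train) && !(resp %in% names(train))) {
      problems <- c(problems, "response column missing from train")
    }
    if (is.data.frame(test) && !(resp %in% names(test))) {
      problems <- c(problems, "response column missing from test")
    }
  }

  voi <- b[["vars_of_interest"]]
  if (!is.character(voi) || length(voi) == 0L || anyDuplicated(voi) > 0L) {
    problems <- c(problems, "vars_of_interest must be a non-empty unique character vector")
  }

  tp <- b[["true_params"]]
  if (!is_named_num(tp)) {
    problems <- c(problems, "true_params must be a named numeric vector")
  } else if (!setequal(names(tp), voi)) {
    problems <- c(problems, "names(true_params) must match vars_of_interest")
  }

  refs <- b[["references"]]
  if (!is.null(refs) && (!is_named_num(refs) || !setequal(names(refs), voi))) {
    problems <- c(problems, "references must be named numeric matching vars_of_interest")
  }

  meta <- b[["meta"]]
  if (!is.null(meta)) {
    scalar_vals <- is.list(meta) &&
      all(vapply(meta, function(v) is.atomic(v) && length(v) == 1L, logical(1)))
    if (!scalar_vals || is.null(names(meta)) || !all(nzchar(names(meta)))) {
      problems <- c(problems, "meta must be a named list of scalar values")
    }
  }

  # Duplicate names in the bundle itself or in any named vector/list
  for (x in list(b, tp, refs, meta)) {
    if (has_dup_names(x)) {
      problems <- c(problems, "duplicate names found")
      break
    }
  }
  problems
}
```

Collecting every message, rather than stopping at the first failure, makes it easier to fix a data_generator that breaks several rules at once. Applied to the bundle in the Examples section, this sketch should return character(0).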
Examples
# Valid data bundle
data_bundle <- list(
  train = data.frame(x = 1:10, y = rnorm(10)),
  test = data.frame(x = 11:15, y = rnorm(5)),
  response = "y",
  true_params = c(beta = 1.5, sigma = 0.5),
  vars_of_interest = c("beta", "sigma"),
  references = c(beta = 0, sigma = 1)
)
validate_data_bundle(data_bundle)