Title: | Longitudinal Imputation of Categorical Variables via a Joint Transition Model |
---|---|
Description: | Imputation of longitudinal categorical covariates. We use a methodological framework which ensures that the plausibility of transitions is preserved, overfitting and collinearity issues are resolved, and confounders can be utilized. See Mamouris (2023) <doi:10.1002/sim.9919> for an overview. |
Authors: | Pavlos Mamouris [aut, cre], Vahid Nassiri [aut, ctb], Geert Molenberghs [ctb], Geert Verbeke [ctb] |
Maintainer: | Pavlos Mamouris <[email protected]> |
License: | GPL-2 |
Version: | 0.1.0 |
Built: | 2024-11-22 04:06:12 UTC |
Source: | https://github.com/cran/ImputeLongiCovs |
A dataset containing longitudinal data. The outcome of interest is the smoking status with three states (smoker, exsmoker, neversmoker), which are represented via transitions. It differs from 'initial_data' only by the additional prob_matrix column.
data(analyses_data)
A data frame with 2000 rows and 10 variables
patient_id: Unique identifier for each patient
tran_Year: numeric index of the transition, running from 1 up to the number of transitions
transition_year: text description of the transition
state_from: the state at the beginning of a transition
state_to: the state at the end of a transition
prob_matrix: the probability matrix that was generated by the initial data
cardio_state_from: cardiovascular disease at the beginning of the transition, binary: 1 = yes, otherwise no
cardio_state_to: cardiovascular disease at the end of the transition, binary: 1 = yes, otherwise no
flu_vaccination_state_from: flu vaccination at the beginning of the transition, binary: 1 = yes, otherwise no
flu_vaccination_state_to: flu vaccination at the end of the transition, binary: 1 = yes, otherwise no
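Because consecutive transitions of a patient share a year (tran_Year 1 ends in 2011, where tran_Year 2 begins), state_to of one row should match state_from of the next whenever both are observed. The sketch below checks this on a toy data frame. The helper check_chain and the toy values are hypothetical, not part of ImputeLongiCovs, and only the columns the check needs are included.

```r
# Hypothetical helper (not part of ImputeLongiCovs): returns TRUE when, for every
# patient, state_to of tran_Year k equals state_from of tran_Year k + 1.
# Pairs where either state is missing (NA) are skipped.
check_chain <- function(df, id_col = "patient_id") {
  for (id in unique(df[[id_col]])) {
    rows <- df[df[[id_col]] == id, ]
    rows <- rows[order(rows$tran_Year), ]
    n <- nrow(rows)
    if (n < 2) next
    prev_to <- as.character(rows$state_to[-n])     # end states of transitions 1..n-1
    next_from <- as.character(rows$state_from[-1]) # start states of transitions 2..n
    both <- !is.na(prev_to) & !is.na(next_from)
    if (any(prev_to[both] != next_from[both])) return(FALSE)
  }
  TRUE
}

# Toy data with invented values.
toy <- data.frame(
  patient_id = c(1, 1, 2, 2),
  tran_Year  = c(1, 2, 1, 2),
  state_from = c("smoker", "ex-smoker", "never-smoker", NA),
  state_to   = c("ex-smoker", NA, NA, "smoker")
)
check_chain(toy)  # TRUE: patient 1 agrees on the shared year, patient 2 has NA there
```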
create_probMatrix creates the variable prob_matrix, which labels each transition with its type ("initial", "forward", "backward", "intermittent", "observed") for the estimation of transition probabilities.
create_probMatrix(input_data, patient_id)
input_data: A dataset in a format similar to 'initial_data'. It must contain the variables "state_from", the status at the beginning of the transition (say, smoker in 2010); "state_to", the status at the end of the transition (say, ex-smoker in 2011); and "tran_Year", an integer index of the transition. "tran_Year" == 1 means that the transition occurs from 2010 to 2011, "tran_Year" == 2 from 2011 to 2012, and so on up to the total number of transitions.
patient_id: A character string giving the name of the column that holds the unique patient ID.
A data frame containing the column "prob_matrix".
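The rules that create_probMatrix applies are not spelled out here. The sketch below is one plausible labeling of a single patient's yearly states, assuming that a missing year is "intermittent" when observed years lie on both sides, "forward" when only earlier years are observed, "backward" when only later years are observed, and "initial" when it is the first year and nothing is observed. The helper label_positions is hypothetical, labels years rather than transition rows, and may differ from the package's definitions.

```r
# Hypothetical labeling of each year in one patient's state sequence (NA = missing).
# These rules are an assumption for illustration, not create_probMatrix's code.
label_positions <- function(states) {
  obs <- !is.na(states)
  n <- length(states)
  labels <- character(n)
  for (i in seq_len(n)) {
    before <- i > 1 && any(obs[seq_len(i - 1)]) # any observed year earlier?
    after <- i < n && any(obs[(i + 1):n])       # any observed year later?
    labels[i] <- if (obs[i]) {
      "observed"
    } else if (before && after) {
      "intermittent"
    } else if (before) {
      "forward"
    } else if (after) {
      "backward"
    } else if (i == 1) {
      "initial"
    } else {
      "forward" # follows an imputed initial year
    }
  }
  labels
}

label_positions(c(NA, "smoker", NA, "ex-smoker", NA))
# "backward" "observed" "intermittent" "observed" "forward"
```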
create_probMatrix(initial_data, patient_id = "patient_id")
impute_categorical_covariates imputes longitudinal categorical covariates through a joint model that accommodates initial, forward, backward, and intermittent transitions.
impute_categorical_covariates( input_data, patient_id, number_of_transitions, covariates_initial = NULL, covariates_transition = NULL, missing_variable_levels, startingyear = NULL, without_trans_prob, m = 1 )
input_data: A dataset in a format similar to 'analyses_data'. It must contain the variables "state_from", the status at the beginning of the transition (say, smoker in 2010); "state_to", the status at the end of the transition (say, ex-smoker in 2011); and "tran_Year", an integer index of the transition. "tran_Year" == 1 means that the transition occurs from 2010 to 2011, "tran_Year" == 2 from 2011 to 2012, and so on up to the total number of transitions. It must also contain "prob_matrix", computed with the 'create_probMatrix' function, which captures all the transitions ("initial", "forward", "backward", "intermittent", "observed").
patient_id: A character string giving the name of the column that holds the unique patient ID.
number_of_transitions: The number of transitions. For example, the years 2010, 2011 and 2012 give 2 transitions.
covariates_initial: The covariates to be used in the initial model.
covariates_transition: The covariates to be used in the transition model.
missing_variable_levels: The levels of the categorical outcome that contains missing values (e.g. "smoker", "ex-smoker", "never-smoker").
startingyear: Optional (default NULL). If the state in each patient's starting year has no missing values, specify that year.
without_trans_prob: Fallback for very high proportions of missing data, when the initial and transition models cannot converge. It offers two options: "notImpute", which returns NA, and "ImputeEqualProbabilities", which samples the missing states with equal probabilities.
m: Numeric, the number of imputed datasets.
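The two without_trans_prob options can be pictured as follows. This is a minimal sketch, assuming the fallback acts on a vector of states with NA marking missing values; the helper fallback_impute is hypothetical and is not the package's code.

```r
# Hypothetical sketch of the two without_trans_prob options (not the package's code).
fallback_impute <- function(states, levels,
                            without_trans_prob = c("notImpute", "ImputeEqualProbabilities")) {
  without_trans_prob <- match.arg(without_trans_prob)
  if (without_trans_prob == "notImpute") {
    return(states) # leave missing states as NA
  }
  miss <- is.na(states)
  # Each level is drawn with probability 1 / length(levels).
  states[miss] <- sample(levels, sum(miss), replace = TRUE)
  states
}

lv <- c("never-smoker", "smoker", "ex-smoker")
set.seed(1)
fallback_impute(c("smoker", NA, NA), lv, "ImputeEqualProbabilities")
fallback_impute(c("smoker", NA, NA), lv, "notImpute") # "smoker" NA NA
```

Note that uniform sampling ignores which transitions are plausible, which is why it only serves as a fallback when the models cannot be fitted.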
The function wraps three internal functions. 'initial_forward_function' imputes the longitudinal categorical covariate for transitions whose 'prob_matrix' value is 'initial' or 'forward'; 'imputeIntermittent' imputes it for intermittent transitions; and 'backward_function' imputes it for backward transitions.
A list of m data frames with no missing values in the categorical outcome.
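To illustrate the forward step and the m imputed datasets, the sketch below draws each missing year from the row of a transition matrix indexed by the previous year's state, and repeats the draw m times. The matrix P is invented: in the package the probabilities come from the transition model and covariates_transition, whereas here they are fixed. Its zero entries encode the assumption that a smoker or ex-smoker cannot become a never-smoker, one way to keep imputed transitions plausible. The helper impute_forward is hypothetical.

```r
levels_smk <- c("never-smoker", "smoker", "ex-smoker")

# Invented transition probabilities: rows = state_from, columns = state_to.
# Zeros encode the assumption that smoker/ex-smoker -> never-smoker is implausible.
P <- matrix(c(
  0.90, 0.10, 0.00, # from never-smoker
  0.00, 0.80, 0.20, # from smoker
  0.00, 0.15, 0.85  # from ex-smoker
), nrow = 3, byrow = TRUE, dimnames = list(levels_smk, levels_smk))

# Hypothetical forward imputation: fill each missing year by sampling from the
# row of P for the previous (observed or already imputed) year.
impute_forward <- function(states, P) {
  for (i in seq_along(states)[-1]) {
    if (is.na(states[i]) && !is.na(states[i - 1])) {
      states[i] <- sample(colnames(P), 1, prob = P[states[i - 1], ])
    }
  }
  states
}

set.seed(2024)
m <- 2
observed <- c("smoker", NA, NA)
imputations <- lapply(seq_len(m), function(k) impute_forward(observed, P))
imputations
```

Because each imputation samples independently, the m datasets can differ wherever the outcome was missing, which is what allows later analyses to reflect imputation uncertainty.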
impute_categorical_covariates(analyses_data, patient_id = "patient_id", number_of_transitions = 2, covariates_initial = c("cardio_state_from", "flu_vaccination_state_from"), covariates_transition = c("cardio_state_to", "flu_vaccination_state_to"), missing_variable_levels = c("never-smoker", "smoker", "ex-smoker"), startingyear = NULL, without_trans_prob = "notImpute", m = 2)
A dataset containing longitudinal data. The outcome of interest is the smoking status with three states (smoker, exsmoker, neversmoker), which are represented via transitions.
data(initial_data)
A data frame with 2000 rows and 9 variables
patient_id: Unique identifier for each patient
tran_Year: numeric index of the transition, running from 1 up to the number of transitions
transition_year: text description of the transition
state_from: the state at the beginning of a transition
state_to: the state at the end of a transition
cardio_state_from: cardiovascular disease at the beginning of the transition, binary: 1 = yes, otherwise no
cardio_state_to: cardiovascular disease at the end of the transition, binary: 1 = yes, otherwise no
flu_vaccination_state_from: flu vaccination at the beginning of the transition, binary: 1 = yes, otherwise no
flu_vaccination_state_to: flu vaccination at the end of the transition, binary: 1 = yes, otherwise no
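Since tran_Year 1 covers 2010 to 2011 and tran_Year 2 covers 2011 to 2012, each patient's yearly state sequence can be rebuilt from the transition rows: the first year is state_from of tran_Year 1, and every later year is state_to of the corresponding transition. The helper to_sequences below is a hypothetical sketch on invented toy data, useful for inspecting missingness patterns; it is not part of the package.

```r
# Hypothetical helper: rebuild each patient's yearly states from transition rows.
# Year 1 = state_from of tran_Year 1; year k + 1 = state_to of tran_Year k.
to_sequences <- function(df, id_col = "patient_id") {
  ids <- unique(df[[id_col]])
  seqs <- lapply(ids, function(id) {
    rows <- df[df[[id_col]] == id, ]
    rows <- rows[order(rows$tran_Year), ]
    c(as.character(rows$state_from[1]), as.character(rows$state_to))
  })
  names(seqs) <- as.character(ids)
  seqs
}

# Toy data with invented values, restricted to the columns the helper needs.
toy <- data.frame(
  patient_id = c("A", "A", "B", "B"),
  tran_Year  = c(1, 2, 1, 2),
  state_from = c("smoker", NA, "never-smoker", "never-smoker"),
  state_to   = c(NA, "ex-smoker", "never-smoker", "smoker")
)
seqs <- to_sequences(toy)
seqs # A: smoker, NA, ex-smoker; B: never-smoker, never-smoker, smoker
```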