Package 'ImputeLongiCovs'

Title: Longitudinal Imputation of Categorical Variables via a Joint Transition Model
Description: Imputation of longitudinal categorical covariates. It uses a methodological framework that preserves the plausibility of transitions, resolves overfitting and collinearity issues, and allows confounders to be included. See Mamouris (2023) <doi:10.1002/sim.9919> for an overview.
Authors: Pavlos Mamouris [aut, cre], Vahid Nassiri [aut, ctb], Geert Molenberghs [ctb], Geert Verbeke [ctb]
Maintainer: Pavlos Mamouris
License: GPL-2
Version: 0.1.0
Built: 2024-11-22 04:06:12 UTC
Source: https://github.com/cran/ImputeLongiCovs

Help Index


Analyses Data for imputing categorical covariates

Description

A longitudinal dataset. The outcome of interest is smoking status, with three states (smoker, ex-smoker, never-smoker), recorded as transitions from one time point to the next. It differs from initial_data only by the additional prob_matrix column.

Usage

data(analyses_data)

Format

A data frame with 2000 rows and 10 variables

Details

  • patient_id: Unique identifier for each patient

  • tran_Year: numeric, starting from 1 up to the number of transitions

  • transition_year: text explanation of the transition

  • state_from: the state at the beginning of a transition

  • state_to: the state at the end of a transition

  • prob_matrix: the type of each transition ("initial", "forward", "backward", "intermittent", or "observed"), computed from initial_data with create_probMatrix

  • cardio_state_from: cardiovascular disease at the beginning of the transition, binary, if 1 == Yes, else No

  • cardio_state_to: cardiovascular disease at the end of the transition, binary, if 1 == Yes, else No

  • flu_vaccination_state_from: flu vaccination at the beginning of the transition, binary, if 1 == Yes, else No

  • flu_vaccination_state_to: flu vaccination at the end of the transition, binary, if 1 == Yes, else No


create_probMatrix

Description

create_probMatrix adds the variable prob_matrix, which classifies each transition as "initial", "forward", "backward", "intermittent", or "observed".

Usage

create_probMatrix(input_data, patient_id)

Arguments

input_data

A dataset in a format similar to 'initial_data'. It must contain the variables "state_from", the status at the beginning of the transition (e.g., smoker in 2010); "state_to", the status at the end of the transition (e.g., ex-smoker in 2011); and "tran_Year", an integer index of the transition. "tran_Year" == 1 denotes the transition from 2010 to 2011, "tran_Year" == 2 the transition from 2011 to 2012, and so on up to the total number of transitions.

patient_id

A character string giving the name of the column that holds the unique patient ID
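The tran_Year indexing can be illustrated with a little base-R arithmetic. This sketch assumes consecutive yearly measurements starting in 2010, as in the description above; with T time points there are T - 1 transitions, which is also the value to pass as number_of_transitions to impute_categorical_covariates. The names from_year and to_year are illustrative only and are not columns of the package data.

```r
# Assumes consecutive yearly measurements (illustrative years only).
years <- c(2010, 2011, 2012)

# T time points give T - 1 transitions.
number_of_transitions <- length(years) - 1

# tran_Year = k runs from year (start + k - 1) to year (start + k).
tran_Year <- seq_len(number_of_transitions)
from_year <- years[1] + tran_Year - 1
to_year   <- years[1] + tran_Year

data.frame(tran_Year, from_year, to_year)
```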

Value

a data frame containing the column "prob_matrix"

Examples

create_probMatrix(initial_data, patient_id = "patient_id")
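To make the transition types concrete, here is a minimal base-R sketch of one plausible labelling rule, applied to a single patient's sequence of states (one value per time point) rather than to the package's transition rows. The rule is an assumption for illustration: a missing value with observed values on both sides is "intermittent", with observed values only before it "forward", with observed values only after it "backward", and a missing first value with nothing observed anywhere "initial". It is not necessarily the rule create_probMatrix implements, and classify_positions is a hypothetical helper.

```r
# Hypothetical helper: label each time point of one patient's state sequence.
# The labelling rule is an assumption, not the package's documented logic.
classify_positions <- function(states) {
  observed <- !is.na(states)
  n <- length(states)
  labels <- character(n)
  for (t in seq_len(n)) {
    # Is any value observed strictly before / strictly after time t?
    before <- t > 1 && any(observed[seq_len(t - 1)])
    after  <- t < n && any(observed[(t + 1):n])
    labels[t] <- if (observed[t]) {
      "observed"
    } else if (before && after) {
      "intermittent"   # missing between two observed values
    } else if (before) {
      "forward"        # only earlier values observed
    } else if (after) {
      "backward"       # only later values observed
    } else if (t == 1) {
      "initial"        # nothing observed: start from the initial model
    } else {
      "forward"        # later points follow on from the imputed start
    }
  }
  labels
}

classify_positions(c(NA, "smoker", NA, "ex-smoker", NA))
# "backward" "observed" "intermittent" "observed" "forward"
```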

impute_categorical_covariates

Description

impute_categorical_covariates imputes longitudinal categorical covariates through a joint model that accommodates initial, forward, backward, and intermittent transitions.

Usage

impute_categorical_covariates(
  input_data,
  patient_id,
  number_of_transitions,
  covariates_initial = NULL,
  covariates_transition = NULL,
  missing_variable_levels,
  startingyear = NULL,
  without_trans_prob,
  m = 1
)

Arguments

input_data

A dataset in a format similar to 'analyses_data'. It must contain the variables "state_from", the status at the beginning of the transition (e.g., smoker in 2010); "state_to", the status at the end of the transition (e.g., ex-smoker in 2011); and "tran_Year", an integer index of the transition. "tran_Year" == 1 denotes the transition from 2010 to 2011, "tran_Year" == 2 the transition from 2011 to 2012, and so on up to the total number of transitions. It must also contain "prob_matrix", which classifies each transition ("initial", "forward", "backward", "intermittent", "observed") and is computed with the 'create_probMatrix' function.

patient_id

A character string giving the name of the column that holds the unique patient ID

number_of_transitions

The number of transitions, i.e., the number of time points minus one. For example, years 2010, 2011, and 2012 give 2 transitions.

covariates_initial

Character vector naming the covariates used in the initial model (default NULL)

covariates_transition

Character vector naming the covariates used in the transition model (default NULL)

missing_variable_levels

The levels of the categorical variable to be imputed (e.g., "smoker", "ex-smoker", "never-smoker")

startingyear

Optional (default NULL). If every patient's starting year has no missing values, specify that year here.

without_trans_prob

Determines what happens when the proportion of missing data is so high that the initial and transition models cannot converge. Two options are available: "notImpute" returns NA, and "ImputeEqualProbabilities" samples the missing values with equal probabilities across the levels.

m

Numeric; the number of imputed datasets to create (default 1).
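The "ImputeEqualProbabilities" option can be pictured as uniform sampling over the levels of the variable. The sketch below is a base-R illustration under that assumption; impute_equal_prob is a hypothetical helper, not a function exported by the package, whose internals may differ.

```r
# Hypothetical helper: replace each NA with a level drawn uniformly at random.
impute_equal_prob <- function(x, levels) {
  missing <- is.na(x)
  x[missing] <- sample(levels, size = sum(missing), replace = TRUE,
                       prob = rep(1 / length(levels), length(levels)))
  x
}

set.seed(1)
state <- c("smoker", NA, "ex-smoker", NA, NA)
impute_equal_prob(state, c("never-smoker", "smoker", "ex-smoker"))
```

Observed values are left untouched; only the NA positions are filled, each independently.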

Details

The function wraps three internal functions. 'initial_forward_function' imputes the longitudinal categorical covariate for transitions whose 'prob_matrix' value is 'initial' or 'forward'; 'imputeIntermittent' imputes it for intermittent transitions; and 'backward_function' imputes it for backward transitions.
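The difference between forward, backward, and intermittent imputation can be seen in a simplified setting: a first-order Markov chain with a fixed transition matrix and no covariates. This is an assumed simplification, not the package's joint model, which fits initial and transition models with covariates; the matrix P, the prior pi, and the helper names are hypothetical. Forward imputation uses the row of the previous state. Backward imputation reverses one transition with Bayes' rule: P(X_t = k | X_t+1 = j) is proportional to pi_k * P[k, j]. Intermittent imputation conditions on both neighbours: P(X_t = k | X_t-1 = i, X_t+1 = j) is proportional to P[i, k] * P[k, j].

```r
# Hypothetical transition matrix P[from, to]; each row sums to 1.
lv <- c("never-smoker", "smoker", "ex-smoker")
P <- matrix(c(0.90, 0.10, 0.00,
              0.00, 0.70, 0.30,
              0.00, 0.20, 0.80),
            nrow = 3, byrow = TRUE, dimnames = list(lv, lv))

# Forward: previous state i is known, so use row i of P.
forward_prob <- function(i) P[i, ]

# Backward: next state j is known; pi is a prior over the current state.
backward_prob <- function(j, pi) {
  w <- pi * P[, j]
  w / sum(w)
}

# Intermittent: previous state i and next state j are both known.
intermittent_prob <- function(i, j) {
  w <- P[i, ] * P[, j]
  w / sum(w)
}

intermittent_prob("smoker", "ex-smoker")
# never-smoker 0, smoker ~0.467, ex-smoker ~0.533
```

Zero entries in P, such as smoker to never-smoker, carry over into every conditional distribution, so an implausible state never receives positive probability.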

Value

a list of m data frames in which the missing values of the categorical variable have been imputed (values left unimputed under without_trans_prob = "notImpute" remain NA)

References

Mamouris (2023) <doi:10.1002/sim.9919>

Examples

impute_categorical_covariates(
  analyses_data,
  patient_id = "patient_id",
  number_of_transitions = 2,
  covariates_initial = c("cardio_state_from", "flu_vaccination_state_from"),
  covariates_transition = c("cardio_state_to", "flu_vaccination_state_to"),
  missing_variable_levels = c("never-smoker", "smoker", "ex-smoker"),
  startingyear = NULL,
  without_trans_prob = "notImpute",
  m = 2
)
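Because impute_categorical_covariates returns a list of m completed datasets, an analysis is typically run on each one and the results are combined. A common choice is Rubin's rules, sketched below in base R for a single scalar parameter. The estimates and standard errors are made-up numbers standing in for the output of an analysis model fitted to each data frame, pool_rubin is a hypothetical helper, and m must be at least 2 for the between-imputation variance to be defined.

```r
# Hypothetical helper: Rubin's rules for one scalar parameter.
# est, se: estimate and standard error from each of the m imputed datasets.
pool_rubin <- function(est, se) {
  m     <- length(est)
  q_bar <- mean(est)                 # pooled point estimate
  w_bar <- mean(se^2)                # within-imputation variance
  b     <- var(est)                  # between-imputation variance
  t_var <- w_bar + (1 + 1 / m) * b   # total variance
  c(estimate = q_bar, se = sqrt(t_var))
}

pool_rubin(est = c(0.42, 0.47), se = c(0.10, 0.11))
# estimate 0.445, se about 0.114
```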

Initial Data for imputing categorical covariates

Description

A longitudinal dataset. The outcome of interest is smoking status, with three states (smoker, ex-smoker, never-smoker), recorded as transitions from one time point to the next.

Usage

data(initial_data)

Format

A data frame with 2000 rows and 9 variables

Details

  • patient_id: Unique identifier for each patient

  • tran_Year: numeric, starting from 1 up to the number of transitions

  • transition_year: text explanation of the transition

  • state_from: the state at the beginning of a transition

  • state_to: the state at the end of a transition

  • cardio_state_from: cardiovascular disease at the beginning of the transition, binary, if 1 == Yes, else No

  • cardio_state_to: cardiovascular disease at the end of the transition, binary, if 1 == Yes, else No

  • flu_vaccination_state_from: flu vaccination at the beginning of the transition, binary, if 1 == Yes, else No

  • flu_vaccination_state_to: flu vaccination at the end of the transition, binary, if 1 == Yes, else No
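For orientation, here is a hypothetical four-row data frame in the initial_data layout (two patients, two transitions each). The values, and the "2010-2011" style of transition_year, are invented for illustration. Because consecutive transitions share a time point, state_to of transition k should equal state_from of transition k + 1 for the same patient; chain_ok is a hypothetical helper that checks this, treating a value missing on both sides as consistent.

```r
# Hypothetical toy rows in the initial_data layout (values invented).
toy <- data.frame(
  patient_id                 = c(1, 1, 2, 2),
  tran_Year                  = c(1, 2, 1, 2),
  transition_year            = c("2010-2011", "2011-2012", "2010-2011", "2011-2012"),
  state_from                 = c("smoker", NA, "never-smoker", "never-smoker"),
  state_to                   = c(NA, "ex-smoker", "never-smoker", NA),
  cardio_state_from          = c(0, 0, 1, 1),
  cardio_state_to            = c(0, 1, 1, 1),
  flu_vaccination_state_from = c(1, 1, 0, 0),
  flu_vaccination_state_to   = c(1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Within a patient, state_to of transition k must match state_from of k + 1.
chain_ok <- function(d) {
  d <- d[order(d$patient_id, d$tran_Year), ]
  same_patient <- head(d$patient_id, -1) == tail(d$patient_id, -1)
  a <- head(d$state_to, -1)[same_patient]
  b <- tail(d$state_from, -1)[same_patient]
  all((is.na(a) & is.na(b)) | (!is.na(a) & !is.na(b) & a == b))
}

chain_ok(toy)
# TRUE
```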