The effects of weighting in the regression analysis of survey data collected using non-probabilistic sampling methods: A secondary data analysis
Abstract
Introduction
When surveys are conducted, especially among hidden populations, data are rarely collected using random sampling, which is the ideal way to obtain representative data. However, it is common practice to analyse such data as if they had been collected through random sampling, ignoring the sampling design. We sought to determine the effects of including weights in the analysis of survey data collected through non-probabilistic sampling methods.
Broad objective
To assess the effects of weighting on estimates of the risk-taking behaviours associated with sexually transmitted infections (STIs) among female sex workers (FSW) and long-distance truck drivers (LDTD) in Beitbridge, using weighted and unweighted logistic regression models.
Methods
Both inverse-probability-weighted and unweighted forward-selection multivariate logistic modelling techniques were used to determine significant risk-taking behaviours associated with STIs in FSW and LDTD. The final models were compared on the magnitude of the difference between odds ratios, the variables selected, the standard errors, the statistical significance of the selected variables and the overall model fit, to determine whether weighted models were more appropriate for the analysis of the survey data for FSW and LDTD.
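The contrast between a weighted and an unweighted logistic model can be illustrated with a minimal sketch. It is not the analysis code used in this study: the ten records, the single binary risk factor and the weights are all hypothetical, and the helper name `fit_logistic` is an assumption introduced only for illustration. Forward selection, standard errors and model-fit statistics are omitted.

```python
import math

def fit_logistic(x, y, w=None, iters=50):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson.

    w holds one weight per record (e.g. an inverse selection
    probability); w=None gives the ordinary unweighted fit.
    """
    if w is None:
        w = [1.0] * len(x)
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)        # weighted score contribution
            v = wi * p * (1.0 - p)   # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical records: x = risk behaviour present, y = STI reported,
# w = assumed inverse-probability weight for each respondent.
x = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
w = [2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0]

_, b1_unweighted = fit_logistic(x, y)
_, b1_weighted = fit_logistic(x, y, w)

or_unweighted = math.exp(b1_unweighted)  # cross-product ratio (3*3)/(2*2)
or_weighted = math.exp(b1_weighted)      # weighted ratio (4*4)/(2*2)
print(f"unweighted OR = {or_unweighted:.2f}, weighted OR = {or_weighted:.2f}")
```

With a single binary predictor, each fitted odds ratio equals the cross-product ratio of the corresponding (weighted) two-by-two table, so the change from 2.25 to 4.00 here comes entirely from the two up-weighted records. In a real weighted analysis the standard errors would also need a design-based or robust estimator, which this sketch does not attempt.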
Results
For risk-taking behaviours associated with STIs, inclusion of weights resulted in an increase in the odds ratios, a decrease in the standard errors and a narrowing of the confidence intervals for the parameters in the weighted model for FSW. In the weighted model for LDTD, the odds ratios were higher than in the unweighted model and the confidence intervals were slightly narrower. However, the standard errors were higher in the weighted model.
Conclusion
Based on these results, we conclude that weighting in the regression analysis of survey data collected using non-probabilistic sampling methods helps to improve the precision of the regression estimates; hence, weighted models should be used.