The effects of weighting in the regression analysis of survey data collected using non-probabilistic sampling methods: A secondary data analysis
Introduction: When surveys are conducted, especially among hidden populations, data are rarely collected by random sampling, which is the ideal way to obtain representative data. Nevertheless, it is common practice to analyse such data as if they had been collected by random sampling, ignoring the sampling design. We sought to determine the effects of including weights in the analysis of survey data collected through non-probabilistic sampling methods.

Broad objective: To assess the effects of weighting on the estimated associations between risk-taking behaviours and sexually transmitted infections (STIs) among female sex workers (FSW) and long-distance truck drivers (LDTD) in Beitbridge, using weighted and unweighted logistic regression models.

Methods: Inverse probability weighted and unweighted multivariate logistic regression models, built by forward selection, were used to identify risk-taking behaviours significantly associated with STIs among FSW and LDTD. The final models were compared on the magnitude of the differences between odds ratios, the variables selected, the standard errors, the statistical significance of the selected variables, and the overall model fit, to judge whether weighted models were more appropriate for analysing the FSW and LDTD survey data.

Results: For risk-taking behaviours associated with STIs among FSW, including weights increased the odds ratios, decreased the standard errors and narrowed the confidence intervals of the parameters in the weighted model. In the weighted model for LDTD, the odds ratios were higher than in the unweighted model and the confidence intervals were slightly narrower; however, the standard errors were higher in the weighted model.

Conclusion: Based on these results, we concluded that weighting in the regression analysis of survey data collected using non-probabilistic sampling methods helps to improve the precision of the regression estimates, and hence that weighted models should be used.
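The weighted-versus-unweighted comparison described in the Methods can be illustrated with a minimal sketch in plain Python. This is not the analysis code used in the study: `fit_logistic`, `invert` and `_weighted_terms` are hypothetical helpers, the weights are assumed to be inverse inclusion probabilities (1/p_i) supplied by the analyst, and the standard errors use a sandwich (linearisation-style) estimator that ignores strata, clusters and finite-population corrections. Forward selection and model-fit comparison are not shown, and the toy data are invented, not taken from the FSW or LDTD surveys.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def invert(a: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (e.g. perfect separation)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def _weighted_terms(X, y, w, beta):
    """Weighted score, information ("bread") and outer-product ("meat") matrices."""
    k = len(beta)
    score = [0.0] * k
    bread = [[0.0] * k for _ in range(k)]
    meat = [[0.0] * k for _ in range(k)]
    for xi, yi, wi in zip(X, y, w):
        mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
        r = wi * (yi - mu)  # weighted residual contribution to the score
        for a in range(k):
            score[a] += r * xi[a]
            for c in range(k):
                bread[a][c] += wi * mu * (1.0 - mu) * xi[a] * xi[c]
                meat[a][c] += r * r * xi[a] * xi[c]
    return score, bread, meat


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 weights: Optional[Sequence[float]] = None,
                 max_iter: int = 50, tol: float = 1e-10) -> Tuple[List[float], List[float]]:
    """Newton-Raphson fit of a (weighted) logistic regression.

    weights=None gives the unweighted fit; otherwise row i contributes
    weights[i] times its log-likelihood (assumed: inverse inclusion probability).
    Returns (coefficients on the log-odds scale, sandwich standard errors).
    """
    w = [1.0] * len(X) if weights is None else [float(v) for v in weights]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score, bread, _ = _weighted_terms(X, y, w, beta)
        inv = invert(bread)
        step = [sum(inv[a][c] * score[c] for c in range(k)) for a in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Sandwich variance: bread^-1 * meat * bread^-1, evaluated at the final estimate.
    _, bread, meat = _weighted_terms(X, y, w, beta)
    b_inv = invert(bread)
    se = [math.sqrt(sum(b_inv[a][i] * meat[i][j] * b_inv[j][a]
                        for i in range(k) for j in range(k))) for a in range(k)]
    return beta, se


if __name__ == "__main__":
    # Invented toy data: column 0 is the intercept, column 1 a binary "behaviour".
    X = [[1.0, 0.0]] * 4 + [[1.0, 1.0]] * 4
    y = [1, 0, 0, 0, 1, 1, 0, 0]
    ipw = [1, 1, 1, 1, 2, 2, 1, 1]  # hypothetical inverse inclusion probabilities
    for label, wts in (("unweighted", None), ("weighted", ipw)):
        beta, se = fit_logistic(X, y, wts)
        print(label, "OR =", round(math.exp(beta[1]), 3), "SE(log OR) =", round(se[1], 3))
```

In this toy example the weights up-weight the exposed respondents who reported the outcome, so the weighted odds ratio differs from the unweighted one, mirroring the kind of shift the Results describe. Multiplying every weight by the same constant leaves both the coefficients and the sandwich standard errors unchanged, so only the relative weights matter; this is why the sketch does not normalise them.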