Reasons for using transformer encoder neural network predictions
To address the challenge of estimating population totals in large textual datasets within official statistics, where manual annotation is impractical, we propose a method that uses transformer encoder neural network predictions as the control variate in established survey sampling estimators. The applicability is demonstrated on Swedish police reports, of which approximately 1.5 million are filed annually. Estimates of the yearly number of hate crimes, with sufficiently low variance, are derived using three different estimators. We conclude that the proposed method can provide efficient estimates with little time spent on manual annotation, making it suitable for official statistics based on textual data, where unbiasedness is crucial.
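To make the idea concrete, here is a minimal sketch of how model predictions can act as a control variate. The abstract does not name its three estimators, so the difference estimator under simple random sampling without replacement is assumed here purely for illustration. The function name `difference_estimate`, the simulated population, and the 95% accuracy of the hypothetical classifier are all assumptions and not taken from the talk.

```python
import random


def difference_estimate(predictions, sample_idx, sample_labels):
    """Difference estimator of a population total (assumes simple random
    sampling without replacement; illustrative, not the talk's exact method).

    predictions   : model prediction for every unit in the population
    sample_idx    : indices of the manually annotated units
    sample_labels : manual labels (0/1) for those units, in the same order
    """
    N = len(predictions)
    n = len(sample_idx)
    # Differences between manual label and model prediction on the sample
    d = [y - predictions[i] for i, y in zip(sample_idx, sample_labels)]
    d_bar = sum(d) / n
    # Total of predictions over the population, plus the scaled-up correction
    total = sum(predictions) + N * d_bar
    # Standard variance estimate for a total under this sampling design
    s2 = sum((x - d_bar) ** 2 for x in d) / (n - 1)
    variance = N ** 2 * (1 - n / N) * s2 / n
    return total, variance


# Simulated population (hypothetical numbers, for demonstration only)
rng = random.Random(0)
N, n = 20_000, 500
labels = [1 if rng.random() < 0.01 else 0 for _ in range(N)]
# Hypothetical classifier that agrees with the true label 95% of the time
predictions = [y if rng.random() < 0.95 else 1 - y for y in labels]

sample_idx = rng.sample(range(N), n)
sample_labels = [labels[i] for i in sample_idx]  # stands in for manual annotation

total, variance = difference_estimate(predictions, sample_idx, sample_labels)
print("estimated total:", total)
print("standard error:", variance ** 0.5)
print("true total:", sum(labels))
```

Under this design the estimator stays unbiased however poor the classifier is, because the sampled differences correct the prediction total. Better predictions only shrink the variance, which is why a small annotated sample can be enough.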
Speaker
Hannes Waldetoft is a PhD student in statistics at Uppsala University, currently working with textual data and machine learning.
Please note: the subtitles were primarily generated by AI, with human review, and may contain some errors.