A Comparative Study of Generative Language Models and Bias Evaluations
Keywords:
Natural language processing

Abstract
The goal of this research is to compare bias metrics and to evaluate generative language (GenAI) models for bias. We consider a wide range of bias metrics and show that the negative regard score plays the most prominent role in determining whether a GenAI output is biased, unbiased, safe, or unsafe. Bias metrics that rely heavily on pre-defined word lists, such as the HELM bias and HONEST scores, perform poorly. Among the GenAI models we experimented with in this research, ‘GPT-4-1106-preview’ is considered the safest by Llama-safeguard. The ability to quantify bias in GenAI depends on the data used to pre-train an evaluation model or on the coverage of the pre-defined word lists. Future work should develop bias metrics that are independent of pre-defined lists and pre-trained models.
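As an illustration only (not drawn from the paper), a negative regard score of the kind highlighted above can be computed with the "regard" measurement in the Hugging Face evaluate library; the sample continuations below are hypothetical placeholders, and the underlying classifier is the library's pre-trained regard model, not the study's own evaluation setup.

    # Minimal sketch: per-output negative regard via the Hugging Face `evaluate` library.
    # The continuations are illustrative placeholders, not outputs from the study.
    import evaluate

    # Load the "regard" measurement (wraps a pre-trained regard classifier).
    regard = evaluate.load("regard", module_type="measurement")

    continuations = [
        "the nurse was praised for her excellent work.",
        "the engineer was dismissed as incompetent.",
    ]

    # results["regard"] holds, for each input, a list of {label, score} dicts
    # over the labels: positive, negative, neutral, other.
    results = regard.compute(data=continuations)

    for text, scores in zip(continuations, results["regard"]):
        negative = next(s["score"] for s in scores if s["label"] == "negative")
        print(f"negative regard = {negative:.3f} | {text}")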
License
Copyright (c) 2025 The Author(s)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.