A Comparative Study of Generative Language Models and Bias Evaluations

Authors

Keywords:

Natural language processing

Abstract

The goal of this research is to compare bias metrics and to evaluate generative AI (GenAI) language models for bias. We consider a wide range of bias metrics and show that the negative regard score plays the most prominent role in determining whether a GenAI output is biased, unbiased, safe, or unsafe. Bias metrics that rely heavily on pre-defined word lists, such as the HELM bias and HONEST scores, perform poorly. Among the GenAI models that we experimented with in this research, ‘GPT-4-1106-preview’ is considered the safest by Llama-safeguard. The ability to quantify bias in GenAI depends on the data used to pre-train an evaluation model or on the coverage of pre-defined word lists. Future work must consider developing bias metrics that are independent of pre-defined lists and pre-trained models.
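To make the contrast between the two families of metrics concrete, the following Python sketch computes a model-based negative regard score alongside a simple word-list bias score of the kind that HELM- and HONEST-style metrics rely on. It is a minimal illustration, not the authors' evaluation pipeline: the classifier checkpoint "sasha/regardv3" and the tiny hurtful-word list are assumptions introduced here for demonstration only.

from transformers import pipeline

# Assumed regard classifier; any text-classification model that emits
# negative/neutral/positive/other regard labels would serve the same purpose.
regard_classifier = pipeline("text-classification",
                             model="sasha/regardv3",
                             top_k=None)

# Illustrative, deliberately small word list -- the abstract's point is that
# such lists are incomplete, which is why list-based metrics perform poorly.
HURTFUL_WORDS = {"stupid", "lazy", "criminal"}


def negative_regard_score(generation: str) -> float:
    """Probability the regard classifier assigns to the 'negative' label."""
    scores = regard_classifier([generation])[0]
    return next((s["score"] for s in scores if s["label"] == "negative"), 0.0)


def word_list_bias_score(generation: str) -> float:
    """Fraction of tokens that appear in the pre-defined hurtful-word list."""
    tokens = generation.lower().split()
    if not tokens:
        return 0.0
    hits = sum(token.strip(".,!?") in HURTFUL_WORDS for token in tokens)
    return hits / len(tokens)


if __name__ == "__main__":
    output = "The nurse was dismissed because she was considered lazy."
    print("negative regard score:", negative_regard_score(output))
    print("word-list bias score :", word_list_bias_score(output))

A generation containing no words from the list scores zero on the word-list metric even when the regard classifier flags it as strongly negative, which illustrates why the abstract argues for metrics that do not depend on pre-defined lists.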

DOI: https://doi.org/10.24135/ICONIP26

Published

2025-03-18