
From Guard Rails to Epic Fails: Can Generative AI Police Its Own Capacity for Offence?

Authors :
Tony Veale
Source :
MediAzioni, Vol 43, Pp A177-A194 (2024)
Publication Year :
2024
Publisher :
University of Bologna, 2024.

Abstract

Social media platforms have become the outlets of choice for many provocateurs in the digital age. Not only do they afford egregious behaviours from their human users, but this misbehaviour can also serve to magnify, and even weaponize, the least desirable outputs of the generative AI systems (often called “bots”) that also operate upon them. In this paper we consider the responsibilities that AI system builders bear for the offences caused by their online creations, and explore what they can do to prevent, or mitigate, the worst excesses, whether explicit or implicit. As the term implies, explicit offence is overt and relatively easy to detect and root out, either in the final edit (in what we call “outer regulation”) or from the generative space itself (in what we call “inner regulation”). Conversely, implicit offence is subtle, mischievous and emergent, and is often crafted to bypass a censor’s built-in guardrails and filters. In line with recent developments in the technology of Large Language Models (LLMs), we argue that generative systems must approach the mitigation of offence as a dialogue, both with their own internal monitors and with their users. Here we explore, in worked examples from simple generators, whether LLMs are sufficient to provide AI systems with the moral imagination they need to understand the implicit offences that emerge from superficially innocent uses of words.

Details

Language :
English, Spanish (Castilian), French, Italian, Russian
ISSN :
19744382
Volume :
43
Database :
Directory of Open Access Journals
Journal :
MediAzioni
Publication Type :
Academic Journal
Accession number :
edsdoj.97b32cddce4a1d8a3d59c2f268f39c
Document Type :
article
Full Text :
https://doi.org/10.6092/issn.1974-4382/20544