Author: "Rickard, Ceara" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Rickard, Ceara"' showing total 3 results

Start Over Author "Rickard, Ceara"

3 results on '"Rickard, Ceara"'

1. Large Language Models and Their Ability for Health Disinformation Generation: Assessing the Adequacy of Current Safeguards and Risk Mitigation Measures

Author: Menz, Bradley, primary, Kuderer, Nicole, additional, Bacchi, Stephen, additional, Modi, Natansh, additional, Chin-Yee, Benjamin, additional, Hu, Tiancheng, additional, Rickard, Ceara, additional, Haseloff, Mark, additional, Vitry, Agnes, additional, McKinnon, Ross, additional, Kichenadasse, Ganessan, additional, Rowland, Andrew, additional, Sorich, Michael, additional, and Hopkins, Ashley, additional
Published: 2023
Full Text: View/download PDF

2. Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis

Author: Menz, Bradley D, Kuderer, Nicole M, Bacchi, Stephen, Modi, Natansh D, Chin-Yee, Benjamin, Hu, Tiancheng, Rickard, Ceara, Haseloff, Mark, Vitry, Agnes, McKinnon, Ross A, Kichenadasse, Ganessan, Rowland, Andrew, Sorich, Michael J, and Hopkins, Ashley M
Abstract: ObjectivesTo evaluate the effectiveness of safeguards to prevent large language models (LLMs) from being misused to generate health disinformation, and to evaluate the transparency of artificial intelligence (AI) developers regarding their risk mitigation processes against observed vulnerabilities.DesignRepeated cross sectional analysis.SettingPublicly accessible LLMs.MethodsIn a repeated cross sectional analysis, four LLMs (via chatbots/assistant interfaces) were evaluated: OpenAI’s GPT-4 (via ChatGPT and Microsoft’s Copilot), Google’s PaLM 2 and newly released Gemini Pro (via Bard), Anthropic’s Claude 2 (via Poe), and Meta’s Llama 2 (via HuggingChat). In September 2023, these LLMs were prompted to generate health disinformation on two topics: sunscreen as a cause of skin cancer and the alkaline diet as a cancer cure. Jailbreaking techniques (ie, attempts to bypass safeguards) were evaluated if required. For LLMs with observed safeguarding vulnerabilities, the processes for reporting outputs of concern were audited. 12 weeks after initial investigations, the disinformation generation capabilities of the LLMs were re-evaluated to assess any subsequent improvements in safeguards.Main outcome measuresThe main outcome measures were whether safeguards prevented the generation of health disinformation, and the transparency of risk mitigation processes against health disinformation.ResultsClaude 2 (via Poe) declined 130 prompts submitted across the two study timepoints requesting the generation of content claiming that sunscreen causes skin cancer or that the alkaline diet is a cure for cancer, even with jailbreaking attempts. GPT-4 (via Copilot) initially refused to generate health disinformation, even with jailbreaking attempts—although this was not the case at 12 weeks. In contrast, GPT-4 (via ChatGPT), PaLM 2/Gemini Pro (via Bard), and Llama 2 (via HuggingChat) consistently generated health disinformation blogs. In September 2023 evaluations, these LLMs facilitated the generation of 113 unique cancer disinformation blogs, totalling more than 40 000 words, without requiring jailbreaking attempts. The refusal rate across the evaluation timepoints for these LLMs was only 5% (7 of 150), and as prompted the LLM generated blogs incorporated attention grabbing titles, authentic looking (fake or fictional) references, fabricated testimonials from patients and clinicians, and they targeted diverse demographic groups. Although each LLM evaluated had mechanisms to report observed outputs of concern, the developers did not respond when observations of vulnerabilities were reported.ConclusionsThis study found that although effective safeguards are feasible to prevent LLMs from being misused to generate health disinformation, they were inconsistently implemented. Furthermore, effective processes for reporting safeguard problems were lacking. Enhanced regulation, transparency, and routine auditing are required to help prevent LLMs from contributing to the mass generation of health disinformation.
Published: 2024
Full Text: View/download PDF

3. Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis.

Author: Menz BD, Kuderer NM, Bacchi S, Modi ND, Chin-Yee B, Hu T, Rickard C, Haseloff M, Vitry A, McKinnon RA, Kichenadasse G, Rowland A, Sorich MJ, and Hopkins AM
Subjects: Humans, Animals, Disinformation, Artificial Intelligence, Cross-Sectional Studies, Sunscreening Agents, Language, Camelids, New World, Skin Neoplasms
Abstract: Objectives: To evaluate the effectiveness of safeguards to prevent large language models (LLMs) from being misused to generate health disinformation, and to evaluate the transparency of artificial intelligence (AI) developers regarding their risk mitigation processes against observed vulnerabilities., Design: Repeated cross sectional analysis., Setting: Publicly accessible LLMs., Methods: In a repeated cross sectional analysis, four LLMs (via chatbots/assistant interfaces) were evaluated: OpenAI's GPT-4 (via ChatGPT and Microsoft's Copilot), Google's PaLM 2 and newly released Gemini Pro (via Bard), Anthropic's Claude 2 (via Poe), and Meta's Llama 2 (via HuggingChat). In September 2023, these LLMs were prompted to generate health disinformation on two topics: sunscreen as a cause of skin cancer and the alkaline diet as a cancer cure. Jailbreaking techniques (ie, attempts to bypass safeguards) were evaluated if required. For LLMs with observed safeguarding vulnerabilities, the processes for reporting outputs of concern were audited. 12 weeks after initial investigations, the disinformation generation capabilities of the LLMs were re-evaluated to assess any subsequent improvements in safeguards., Main Outcome Measures: The main outcome measures were whether safeguards prevented the generation of health disinformation, and the transparency of risk mitigation processes against health disinformation., Results: Claude 2 (via Poe) declined 130 prompts submitted across the two study timepoints requesting the generation of content claiming that sunscreen causes skin cancer or that the alkaline diet is a cure for cancer, even with jailbreaking attempts. GPT-4 (via Copilot) initially refused to generate health disinformation, even with jailbreaking attempts-although this was not the case at 12 weeks. In contrast, GPT-4 (via ChatGPT), PaLM 2/Gemini Pro (via Bard), and Llama 2 (via HuggingChat) consistently generated health disinformation blogs. In September 2023 evaluations, these LLMs facilitated the generation of 113 unique cancer disinformation blogs, totalling more than 40 000 words, without requiring jailbreaking attempts. The refusal rate across the evaluation timepoints for these LLMs was only 5% (7 of 150), and as prompted the LLM generated blogs incorporated attention grabbing titles, authentic looking (fake or fictional) references, fabricated testimonials from patients and clinicians, and they targeted diverse demographic groups. Although each LLM evaluated had mechanisms to report observed outputs of concern, the developers did not respond when observations of vulnerabilities were reported., Conclusions: This study found that although effective safeguards are feasible to prevent LLMs from being misused to generate health disinformation, they were inconsistently implemented. Furthermore, effective processes for reporting safeguard problems were lacking. Enhanced regulation, transparency, and routine auditing are required to help prevent LLMs from contributing to the mass generation of health disinformation., Competing Interests: Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: AMH holds an emerging leader investigator fellowship from the National Health and Medical Research Council (NHMRC), Australia; NDM is supported by a NHMRC postgraduate scholarship, Australia; MJS is supported by a Beat Cancer research fellowship from the Cancer Council South Australia; BDM’s PhD scholarship is supported by The Beat Cancer Project, Cancer Council South Australia, and the NHMRC, Australia; no support from any other organisation for the submitted work; AR and MJS are recipients of investigator initiated funding for research outside the scope of the current study from AstraZeneca, Boehringer Ingelheim, Pfizer, and Takeda; and AR is a recipient of speaker fees from Boehringer Ingelheim and Genentech outside the scope of the current study. There are no financial relationships with any other organisations that might have an interest in the submitted work in the previous three years to declare; no other relationships or activities that could appear to have influenced the submitted work., (© Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.)
Published: 2024
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

3 results on '"Rickard, Ceara"'

1. Large Language Models and Their Ability for Health Disinformation Generation: Assessing the Adequacy of Current Safeguards and Risk Mitigation Measures

2. Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis

3. Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis.

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

3 results on '"Rickard, Ceara"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources