Your search for author "Sap, Maarten" returned 226 results.

Search Results

1. Minion: A Technology Probe for Resolving Value Conflicts through Expert-Driven and User-Driven Strategies in AI Companion Applications

2. SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation

3. BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data

4. Data Defenses Against Large Language Models

5. HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

6. AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents

7. User-Driven Value Alignment: Understanding Users' Perceptions and Strategies for Addressing Biased and Discriminatory Statements in AI Companions

8. On the Resilience of Multi-Agent Systems with Malicious Agents

9. Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance

10. WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

11. HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs

12. PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models

13. Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Non-Literal Intent Resolution in LLMs

14. NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models

15. Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits

16. SOTOPIA-$\pi$: Interactive Learning of Socially Intelligent Language Agents

17. Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs

18. Counterspeakers' Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate

19. Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty

20. Riveter: Measuring Power and Social Dynamics Between Entities

21. Where Do People Tell Stories Online? Story Detection Across Online Communities

22. Beyond Denouncing Hate: Strategies for Countering Implied Biases and Stereotypes in Language

23. Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory

24. FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

25. SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

26. Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

27. NLPositionality: Characterizing Design Biases of Datasets and Models

28. COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements

29. From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models

30. Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models

31. Don't Take This Out of Context! On the Need for Contextual Models and Evaluations for Stylistic Rewriting

32. Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models

33. Modeling Empathic Similarity in Personal Narratives

34. BiasX: 'Thinking Slow' in Toxic Content Moderation with Explanations of Implied Social Biases

35. Queer In AI: A Case Study in Community-Led Participatory AI

36. Towards Countering Essentialism through Social Bias Reasoning

37. Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts

38. SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization

39. Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs

40. When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment

41. Twitter Sentiment Predicts Affordable Care Act Marketplace Enrollment

42. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

43. ProsocialDialog: A Prosocial Backbone for Conversational Agents

44. Aligning to Social Norms and Values in Interactive Narratives

45. ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

46. Imagined versus Remembered Stories: Quantifying Differences in Narrative Flow

47. Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection

48. Can Machines Learn Morality? The Delphi Experiment

49. Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts

50. DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts
