Search for author "Balesni, Mikita": 10 results

Search Results

1. Frontier Models are Capable of In-context Scheming

2. The Two-Hop Curse: LLMs trained on A$\rightarrow$B, B$\rightarrow$C fail to learn A$\rightarrow$C

3. Towards evaluations-based safety cases for AI scheming

4. Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack

5. Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

6. Large Language Models can Strategically Deceive their Users when Put Under Pressure

7. The Reversal Curse: LLMs trained on 'A is B' fail to learn 'B is A'

8. Taken out of context: On measuring situational awareness in LLMs

9. Controlling Steering with Energy-Based Models

10. A Causal Framework for AI Regulation and Auditing
