Search

Search for "Elhage, Nelson": 13 results

Search Constraints

Author: "Elhage, Nelson"; Database: OpenAIRE

Search Results

1. The Capacity for Moral Self-Correction in Large Language Models

2. Constitutional AI: Harmlessness from AI Feedback

3. Measuring Progress on Scalable Oversight for Large Language Models

4. In-context Learning and Induction Heads

5. Toy Models of Superposition

6. Language Models (Mostly) Know What They Know

7. Predictability and Surprise in Large Generative Models

8. Scaling Laws and Interpretability of Learning from Repeated Data

9. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

10. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

11. Discovering Language Model Behaviors with Model-Written Evaluations

12. A General Language Assistant as a Laboratory for Alignment

13. Security impact ratings considered harmful
