5 results for "Abhinav Rastogi"
Search Results
2. RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback.
3. MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts.
4. PERL: Parameter Efficient Reinforcement Learning from Human Feedback.
5. Improve Mathematical Reasoning in Language Models by Automated Process Supervision.