Anantvir Singh | Building a Next-Gen Customer Service Agent using Langgraph: LLMs, ReAct and Autonomous Problem… | Implementation can be found here. | Oct 9
Anantvir Singh | Finetuning IBM Granite models using RLHF in Watsonx.ai | Proximal Policy Optimization in RLHF implementation from scratch using HuggingFace TRL. | Sep 28
Anantvir Singh | Finetuning IBM Granite models using RLHF in Watsonx.ai — Part 1 : Reward Modeling | What you'll learn: 1. RLHF overview and how to implement reward modeling 2. How to fine-tune open-source models using RLHF on custom datasets | Sep 27