Building a Next-Gen Customer Service Agent using Langgraph: LLMs, ReAct and Autonomous Problem…
Implementation can be found here.
Oct 9, 2024

Finetuning IBM Granite models using RLHF in Watsonx.ai
Proximal Policy Optimization for RLHF, implemented from scratch using HuggingFace TRL.
Sep 28, 2024

Finetuning IBM Granite models using RLHF in Watsonx.ai — Part 1 : Reward Modeling
What you’ll learn: 1. RLHF overview 2. How to implement reward modeling 3. How to fine-tune open-source models using RLHF on custom datasets
Sep 27, 2024