Title: Exploring Trustworthy Foundation Models: Benchmarking, Finetuning and Reasoning

Abstract: Foundation models must operate under imperfect real-world conditions such as noisy data and unexpected inputs, so ensuring their trustworthiness through rigorous benchmarking, safety-focused finetuning, and robust reasoning is more critical than ever. In this talk, I will present three recent research advances, one along each of these dimensions, that together offer a comprehensive approach to building trustworthy foundation models. For benchmarking, I will introduce CounterAnimal, a dataset designed to systematically evaluate CLIP’s vulnerability to realistic spurious correlations; it reveals that scaling up models or improving data quality can mitigate these biases, whereas scaling up data alone does not effectively address them. For finetuning, I will examine the process of unlearning undesirable model behaviors: we propose a general framework for examining and understanding the limitations of current unlearning methods, and suggest revisions that make unlearning more effective. Finally, for reasoning, we investigate the robustness of reasoning under noisy rationales by constructing the NoRa dataset, and propose contrastive denoising with noisy chain-of-thought, a method that markedly improves denoising-reasoning capabilities by contrasting noisy inputs with minimal clean supervision.

