School of Computing and Information Systems

Inside Out: Improving Large Model Safety

Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) demonstrate remarkable capabilities across diverse applications, yet they remain vulnerable to adversarial attacks through carefully crafted prompts and harmful visual inputs that circumvent safety mechanisms. Despite considerable efforts in reinforcement learning from human feedback (RLHF) and supervised fine-tuning, existing safeguards prove inadequate because these models operate as black boxes without explanations for their decisions, making security vulnerabilities difficult to identify and eliminate. Addressing these challenges fundamentally requires understanding the inner safety mechanisms of these models to develop targeted mitigation strategies that can effectively defend against attacks.

This dissertation presents four interconnected contributions to improve LLM and MLLM security through mechanistic understanding. We propose CASPER, a causality analysis framework operating at token, layer, and neuron levels that reveals how…

Subtitle: PhD Dissertation Defense by ZHAO Wei
Contact: scisseminars@smu.edu.sg

Speaker Details: ZHAO Wei, PhD Candidate, School of Computing and Information Systems, Singapore Management University

Wei ZHAO is a Ph.D. Candidate in Computer Science at Singapore Management University, under the supervision of Professor Jun SUN. His research focuses on improving large model safety through understanding and enhancing the inner mechanisms of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs). His PhD research addresses critical security vulnerabilities in these models, spanning…

Reserve a seat: https://forms.office.com/Pages/ResponsePage.aspx?id=ynmKyZpakUeiQ_Bq_WdGTejbEKPlArBJhZomj91naG9UMDgyWVA3R1YwVE8yRlU3SExISDRZRzNCNS4u
Type: Seminars & Workshops
Subject: Information Technology & Systems
Audience: Public, Current Student, Academic Community

Tuesday, January 6, 2026, 9:30 AM – 10:30 AM
Meeting room 5.1, Level 5, SMU SCIS 1, Singapore 178902

For more info visit computing.smu.edu.sg.