BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//https://techplay.jp//JP
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALDESC:[AI Security and Privacy Team Seminar] Talk by Eric Wong
X-WR-CALNAME:[AI Security and Privacy Team Seminar] Talk by Eric Wong
X-WR-TIMEZONE:Asia/Tokyo
BEGIN:VTIMEZONE
TZID:Asia/Tokyo
BEGIN:STANDARD
DTSTART:19700101T000000
TZOFFSETFROM:+0900
TZOFFSETTO:+0900
TZNAME:JST
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
UID:996245@techplay.jp
SUMMARY:[AI Security and Privacy Team Seminar] Talk by Eric Wong
DTSTART;TZID=Asia/Tokyo:20260630T160000
DTEND;TZID=Asia/Tokyo:20260630T170000
DTSTAMP:20260616T225030Z
CREATED:20260526T060147Z
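DESCRIPTION:Event details:\nhttps://techplay.jp/event/99624
 5?utm_medium=referral&utm_source=ics&utm_campaign=ics\n\nProf. Eric Wong
  of the University of Pennsylvania\, known for numerous notable
  contributions to AI security such as JailbreakBench and provable
  defenses against adversarial examples\, will give a talk at the
  Ookayama Campus of Institute of Science Tokyo. Please consider
  attending on site as well.\n\nTalk title: Understanding Safety &
  Alignment with Mechanistic Theory\nSpeaker: Eric Wong (University of
  Pennsylvania)\nDate and time: 6/30 (Tue) 16:00-17:00\nVenue: Institute
  of Science Tokyo\, Ookayama Campus\, West Building 8 (E)\, 10F\,
  department meeting room (1004)\, and online (Zoom)\nZoom link: The URL
  is shown only to registered attendees.\n\nAbstract: Why are LLM
  guardrails fu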
 ndamentally so easily broken\, and how can we enforce them? This talk for
 malizes a mechanistic theory for studying safety problems. We begin with 
 one-layer transformers\, identifying rule-breaking as an inherent archite
 ctural vulnerability in the model's attention mechanism. This mechanistic
  theory framework (LogicBreaks) taught us a critical lesson: if attention
  is the key to breaking rules\, it may also be the key to enforcing them.
  Building upon this insight\, we expand the mechanistic theory to analyze
  attention-based interventions\, arriving at InstaBoost: an incredibly si
 mple yet highly effective steering method that boosts the model's attenti
 on on user-provided instructions during generation. This technique\, deve
 loped from analysis on one-layer transformers\, provides state-of-the-art
  control over large-scale LLMs with just five lines of code.\n\nProfile:
  Eric Wong is an assistant professor in the Department of Co
 mputer and Information Science at the University of Pennsylvania. He
  leads the Brachio Lab\, which works on debugging machine learning and
  making systems actually do what we want them to do. He is also part of
  the ASSET Center on safe\, explainable\, and trustworthy AI systems.
  Previously\, he completed his PhD at CMU\, advised by Zico Kolter\, and
  did a postdoc with Aleksander Madry.
LOCATION:Online
URL:https://techplay.jp/event/996245?utm_medium=referral&utm_source=ics&utm
 _campaign=ics
END:VEVENT
END:VCALENDAR