[Sds-seminars] [Sds-announce] S&DS Seminar, Weijie Su, 4/21/25, 4pm-5pm, KT, Rm 1327, "Do Large Language Models Need Statistical Foundations?"

Torres, Elizavette elizavette.torres at yale.edu
Tue Apr 15 10:55:03 EDT 2025


Department of Statistics and Data Science <https://statistics.yale.edu/>

Speaker: Weijie Su, The Wharton School, University of Pennsylvania
Date: Monday, April 21, 2025
Time: 4:00PM to 5:00PM
Location: Kline Tower, 13th Floor, Rm. 1327 (map: <http://maps.google.com/?q=219+Prospect+Street%2C+New+Haven%2C+CT%2C+06511%2C+us>)
219 Prospect Street
New Haven, CT 06511
Webcast Option: https://yale.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=3d4aebbb-a863-47d3-bf1b-b233012bcec0

Title: Do Large Language Models Need Statistical Foundations?

Information and Abstract:
In this talk, we advocate for the development of rigorous statistical foundations for large language models (LLMs). We begin by elaborating two key features that motivate statistical perspectives for LLMs: (1) the probabilistic, autoregressive nature of next-token prediction, and (2) the complexity and black-box nature of Transformer architectures. To illustrate how statistical insights can directly benefit LLM development and applications, we present two concrete examples. First, we demonstrate statistical inconsistencies and biases arising from the current approach to aligning LLMs with human preferences. We propose a regularization term for aligning LLMs that is both necessary and sufficient to ensure consistent alignment. Second, we introduce a novel statistical framework to analyze the efficiency of watermarking schemes, with a focus on a watermarking scheme developed by OpenAI, for which we derive optimal detection rules that outperform existing ones. Collectively, these findings showcase how statistical insights can address pressing challenges in LLMs while simultaneously illuminating new research avenues for the broader statistical community to advance responsible generative AI research. This talk is based on arXiv:2405.16455, arXiv:2404.01245, and arXiv:2503.10990.

3:30pm: Pre-talk meet-and-greet teatime at 219 Prospect Street, 13th floor. Light snacks and beverages will be available in the kitchen area.

For more details and upcoming events visit our website at https://statistics.yale.edu/calendar.

Department of Statistics and Data Science
Yale University
Kline Tower
219 Prospect Street
New Haven, CT 06511
https://statistics.yale.edu/
