NAACL 2025 Tutorial
Saturday, May 3rd, 9:00am-12:30pm MDT
Pecos Room
What do 📰 journalism, 🎵 music lyric composition, ⚖️ legal writing, 💭 psychological counseling and 🍽️ menu design all have in common? That's right! They are all human-centered tasks with unclear and subjective rewards. While language models struggle to perform these tasks, humans interpret and manage this subjectivity well. How can language models learn to do the same? Come see how we might make progress in our tutorial.
The use of large language models (LLMs) in human-centered creative domains — such as journalism, scientific writing, and storytelling — has showcased their potential for content generation but also highlighted a critical gap: planning. Planning, a fundamental process in many creative domains, refers to the higher-level decisions writers (or agents) make that shape the text they produce. Planning is especially hard in creative domains, where human rewards are often unclear or sparsely observed. This tutorial explores how planning has been learned and deployed in creative workflows. We will cover three aspects of creativity: Problem-Finding (how to define rewards and goals for creative tasks), Path-Finding (how to generate novel creative outputs that meet those goals) and Evaluation (how to judge creative output). We will also consider three learning settings: Full Data Regimens (when observational data for both decisions and the resulting text exist), Partial (when text exists but decisions must be inferred) and Low (when neither exists). The tutorial will end with practical demonstrations in computational journalism, web agents, and other creative domains. By bridging theoretical concepts and practical demonstrations, this tutorial aims to inspire new research directions in leveraging LLMs for creative planning tasks.
"The formulation of a problem is often more essential than its solution, which may be merely a matter of mathematical or experimental skill." — Albert Einstein, The Evolution of Physics
In this section, we explore how creative actors define tasks, goal-states, and rewards. We map problem-finding in NLP to learning complex rewards, examining approaches that mix multiple rewards, frame language modeling as inverse-reinforcement learning (IRL), and explore emulation learning settings. We'll look at how humans (and chimpanzees) learn rewards through observing motivations and end-state outputs, and how this applies to understanding human creative processes.
We'll examine studies of art students which found that those who focused most on defining the problem produced more creative work. We'll also discuss how scientists can "read between the lines" of research papers to understand actions taken even when they are not explicitly mentioned, and why this ability is crucial for reproducibility in our field.
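As a toy illustration of mixing multiple rewards to approximate an unclear human objective, the sketch below combines hand-written reward proxies into one weighted score. All function names, proxies, and weights here are invented for this example and are not from the tutorial.

```python
# Hypothetical sketch: combining several reward proxies into one score.
# Each proxy is a toy stand-in for a learned or human-derived reward model.

def novelty(text: str) -> float:
    # Toy proxy: fraction of unique words.
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0

def fluency(text: str) -> float:
    # Toy proxy: reward longer outputs up to a cap.
    return min(len(text.split()) / 10.0, 1.0)

def goal_fit(text: str, keywords: set) -> float:
    # Toy proxy: overlap with task keywords.
    words = set(text.lower().split())
    return len(words & keywords) / len(keywords) if keywords else 0.0

def mixed_reward(text: str, keywords: set, weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted mixture of individual rewards: one simple way to
    represent a complex, subjective objective."""
    w_n, w_f, w_g = weights
    return (w_n * novelty(text)
            + w_f * fluency(text)
            + w_g * goal_fit(text, keywords))
```

In practice the component rewards would come from learned models (or be recovered via IRL from observed behavior) rather than hand-written heuristics, but the weighted-mixture structure is the same.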
"Creativity involves breaking out of established patterns to look at things in a different way." — Edward de Bono, Serious Creativity
This section covers how humans develop alternative methods for solving problems through forward and backward approaches. Forward approaches involve direct training or prompting of models to generate sequences of actions, while backward approaches infer sequences of actions from available state information. We'll explore classical methods like means-ends analysis, backtracking, and regression planning, and how they're being incorporated into modern NLP systems.
On the forward side, we'll cover prompt-engineering and in-context learning, as well as explicitly trained planning agents for settings with more data. On the backward side, we'll show how latent variable modeling and variational inference can be combined to infer and utilize latent plans in creative tasks, and how these techniques connect to the classical planning methods above.
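A minimal sketch of the backward approach in its classical form (regression planning): starting from a goal fact, pick an action whose effects achieve it and recurse on that action's preconditions. The writing-themed action set below is invented for illustration and is not from the tutorial.

```python
# Hypothetical regression (backward) planning sketch over a toy action set.
# Each action has STRIPS-style preconditions ("pre") and effects ("eff").

ACTIONS = {
    "choose_topic":  {"pre": set(),            "eff": {"topic_chosen"}},
    "draft_outline": {"pre": {"topic_chosen"}, "eff": {"outline_done"}},
    "write_article": {"pre": {"outline_done"}, "eff": {"article_done"}},
}

def regression_plan(goal: str, known: set) -> list:
    """Return an action sequence that achieves `goal` given facts in `known`,
    by regressing from the goal through action preconditions."""
    if goal in known:
        return []  # goal already holds; nothing to do
    for name, act in ACTIONS.items():
        if goal in act["eff"]:
            plan = []
            for pre in act["pre"]:          # regress through preconditions
                plan += regression_plan(pre, known)
            return plan + [name]
    raise ValueError(f"no action achieves {goal}")
```

For example, `regression_plan("article_done", set())` works backward from the finished article to recover the sequence choose_topic → draft_outline → write_article. In the tutorial's setting, the "state" is observed text and the latent plan must be inferred rather than looked up in a hand-written action table.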
"The unexamined life is not worth living." — Plato, Apology of Socrates
Since creative tasks often lack objective metrics for success, this section focuses on evaluation methods grounded in human preference. We'll examine both offline evaluation (comparing generated plans to the plans humans would have made) and online evaluation (HCI-focused methodologies). We'll explore novel metrics like latent criticism and conditional perplexity, moving beyond traditional surface-level metrics like BLEU or ROUGE.
Offline evaluation compares generated plans to the plans humans would have made, using metrics such as latent criticism and conditional perplexity that examine structural aspects of the output rather than surface overlap. Online evaluation follows HCI methodologies, studying human preferences for recommendations, suggestions, edits, and other forms of AI assistance. Together, these modes help us understand and improve creative planning systems without relying solely on traditional metrics.
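As a hedged illustration of the conditional-perplexity idea, the sketch below computes perplexity from per-token log-probabilities and compares the same text scored with and without a plan in the conditioning context. The log-probability values are stand-in numbers; in practice they would come from a language model.

```python
import math

# Perplexity = exp(-mean per-token log-probability).
# A lower perplexity for the text when conditioned on an inferred plan
# suggests the plan helps explain the output.

def perplexity(token_logprobs: list) -> float:
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Stand-in scores for the same text, with and without the plan in context.
logp_given_plan = [-1.2, -0.8, -1.0, -0.9]
logp_no_plan    = [-2.1, -1.7, -2.4, -1.9]

ppl_plan = perplexity(logp_given_plan)
ppl_base = perplexity(logp_no_plan)
# A ratio ppl_plan / ppl_base below 1 indicates the plan made the
# text more predictable to the model.
```

This kind of comparison evaluates the plan itself, not the surface form of the text, which is what distinguishes it from overlap metrics like BLEU or ROUGE.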