字典树（Trie）

字典树（Trie），又称前缀树（Prefix Tree）或单词查找树，是一种专门用于处理字符串集合的树形数据结构。它能够高效地进行字符串的插入、查找和前缀匹配操作。

核心概念

什么是字典树？

字典树利用字符串的公共前缀来减少查询时间，最大限度地减少无谓的字符串比较。想象一本字典，所有以 "app" 开头的单词（apple、application、apply...）都在一起，这就是字典树的思想。

结构特点

字典树具有以下特点：

根节点不包含字符：根节点是空节点，作为树的入口
每个节点代表一个字符：除根节点外，每个节点只包含一个字符
路径代表字符串：从根节点到某一节点的路径，连接起来就是该节点对应的字符串
共享公共前缀：具有相同前缀的字符串共享相同的路径前缀
结束标记：需要标记哪些节点是单词的结尾

图示示例

假设我们要存储单词集合：["app", "apple", "apply", "bat", "ball"]

          (root)
         /      \
        a        b
        |        |
        p        a
        |        |
        p*       t*
       / \       \
      l   l       l
      |   |       |
      e*  y*      l
                 |
                 e*

* 标记表示单词结尾
"app" 是 "apple" 和 "apply" 的前缀，它们共享 "app" 路径

节点设计

字典树的节点需要存储：

子节点的引用
是否是单词结尾的标记
（可选）关联的值（用于实现键值映射）

基本节点结构

/**
 * 字典树节点
 */
class TrieNode {
    // 子节点数组（假设只处理小写字母 a-z）
    TrieNode[] children;
    
    // 是否是单词结尾
    boolean isEnd;
    
    public TrieNode() {
        children = new TrieNode[26];
        isEnd = false;
    }
}

哈希表实现的节点

如果字符集较大或不确定，可以使用哈希表代替数组：

/**
 * 使用哈希表的字典树节点
 * 支持任意字符
 */
class TrieNode {
    Map<Character, TrieNode> children;
    boolean isEnd;
    
    public TrieNode() {
        children = new HashMap<>();
        isEnd = false;
    }
}

完整实现

以下是字典树的完整 Java 实现：

/**
 * 字典树（前缀树）
 * 
 * 时间复杂度：
 * - 插入：O(L)，L 为字符串长度
 * - 查找：O(L)
 * - 前缀匹配：O(L)
 * 
 * 空间复杂度：O(N * M)，N 为字符串数量，M 为平均长度
 */
public class Trie {
    private TrieNode root;
    
    public Trie() {
        root = new TrieNode();
    }
    
    /**
     * 插入单词
     * 时间复杂度: O(L)，L 为单词长度
     */
    public void insert(String word) {
        TrieNode node = root;
        
        for (char c : word.toCharArray()) {
            int index = c - 'a';
            
            // 如果子节点不存在，创建新节点
            if (node.children[index] == null) {
                node.children[index] = new TrieNode();
            }
            
            // 移动到子节点
            node = node.children[index];
        }
        
        // 标记单词结尾
        node.isEnd = true;
    }
    
    /**
     * 搜索单词是否存在
     * 时间复杂度: O(L)
     */
    public boolean search(String word) {
        TrieNode node = searchPrefix(word);
        return node != null && node.isEnd;
    }
    
    /**
     * 检查是否存在以 prefix 为前缀的单词
     * 时间复杂度: O(L)
     */
    public boolean startsWith(String prefix) {
        return searchPrefix(prefix) != null;
    }
    
    /**
     * 辅助方法：搜索前缀对应的节点
     */
    private TrieNode searchPrefix(String prefix) {
        TrieNode node = root;
        
        for (char c : prefix.toCharArray()) {
            int index = c - 'a';
            
            if (node.children[index] == null) {
                return null;  // 前缀不存在
            }
            
            node = node.children[index];
        }
        
        return node;
    }
    
    /**
     * 删除单词
     * 注意：只移除不再被其他单词需要的节点
     */
    public boolean delete(String word) {
        return deleteHelper(root, word, 0);
    }
    
    private boolean deleteHelper(TrieNode node, String word, int depth) {
        if (node == null) {
            return false;
        }
        
        // 到达单词末尾
        if (depth == word.length()) {
            if (!node.isEnd) {
                return false;  // 单词不存在
            }
            
            node.isEnd = false;
            
            // 如果没有子节点，可以删除
            return isEmpty(node);
        }
        
        int index = word.charAt(depth) - 'a';
        
        // 递归删除子节点
        if (deleteHelper(node.children[index], word, depth + 1)) {
            node.children[index] = null;
            
            // 如果当前节点不是单词结尾且没有其他子节点，可以删除
            return !node.isEnd && isEmpty(node);
        }
        
        return false;
    }
    
    /**
     * 检查节点是否没有子节点
     */
    private boolean isEmpty(TrieNode node) {
        for (TrieNode child : node.children) {
            if (child != null) {
                return false;
            }
        }
        return true;
    }
}

扩展功能

获取所有以某前缀开头的单词

/**
 * 获取所有以 prefix 为前缀的单词
 */
public List<String> getAllWordsWithPrefix(String prefix) {
    List<String> result = new ArrayList<>();
    
    TrieNode node = searchPrefix(prefix);
    if (node == null) {
        return result;
    }
    
    // 从前缀节点开始 DFS 收集所有单词
    collectWords(node, prefix, result);
    
    return result;
}

private void collectWords(TrieNode node, String prefix, List<String> result) {
    if (node.isEnd) {
        result.add(prefix);
    }
    
    for (int i = 0; i < 26; i++) {
        if (node.children[i] != null) {
            char c = (char) ('a' + i);
            collectWords(node.children[i], prefix + c, result);
        }
    }
}

自动补全功能

/**
 * 自动补全：返回最多 count 个补全建议
 */
public List<String> autoComplete(String prefix, int count) {
    List<String> result = new ArrayList<>();
    
    TrieNode node = searchPrefix(prefix);
    if (node == null) {
        return result;
    }
    
    collectWordsLimited(node, prefix, result, count);
    
    return result;
}

private void collectWordsLimited(TrieNode node, String prefix, List<String> result, int limit) {
    if (result.size() >= limit) {
        return;
    }
    
    if (node.isEnd) {
        result.add(prefix);
    }
    
    for (int i = 0; i < 26; i++) {
        if (node.children[i] != null && result.size() < limit) {
            char c = (char) ('a' + i);
            collectWordsLimited(node.children[i], prefix + c, result, limit);
        }
    }
}

统计单词出现次数

/**
 * 带计数功能的字典树节点
 */
class TrieNodeWithCount {
    TrieNodeWithCount[] children;
    boolean isEnd;
    int count;  // 单词出现次数
    
    public TrieNodeWithCount() {
        children = new TrieNodeWithCount[26];
        isEnd = false;
        count = 0;
    }
}

/**
 * 插入单词并返回当前出现次数
 */
public int insertWithCount(String word) {
    TrieNodeWithCount node = root;
    
    for (char c : word.toCharArray()) {
        int index = c - 'a';
        if (node.children[index] == null) {
            node.children[index] = new TrieNodeWithCount();
        }
        node = node.children[index];
    }
    
    node.isEnd = true;
    node.count++;
    return node.count;
}

复杂度分析

时间复杂度

操作	时间复杂度	说明
插入	$O(L)$	$L$ 为字符串长度
查找	$O(L)$	精确匹配
前缀匹配	$O(L)$	检查前缀是否存在
删除	$O(L)$	最坏情况需要遍历整个路径

与哈希表相比：

哈希表的查找是 $O(1)$ ，但无法高效进行前缀匹配
字典树的查找是 $O(L)$ ，但支持前缀操作

空间复杂度

最坏情况下，字典树的空间复杂度为 $O(N \times M \times K)$ ：

$N$ ：字符串数量
$M$ ：平均字符串长度
$K$ ：字符集大小

空间优化技巧：

使用哈希表代替数组：当字符集很大但实际使用的字符很少时
压缩字典树（Radix Tree）：合并单分支路径
后缀压缩：共享相同的后缀

应用场景

1. 自动补全

搜索引擎、输入法的自动补全功能是字典树最典型的应用。

用户输入: "app"
字典树返回: ["app", "apple", "application", "apply"]

2. 拼写检查

检查单词是否在字典中，或提供相似单词建议：

/**
 * 检查单词是否拼写正确
 */
public boolean isSpellingCorrect(String word) {
    return search(word);
}

/**
 * 获取编辑距离为 1 的所有可能正确单词
 */
public List<String> getSuggestions(String word) {
    Set<String> suggestions = new HashSet<>();
    
    // 删除一个字符
    for (int i = 0; i < word.length(); i++) {
        String modified = word.substring(0, i) + word.substring(i + 1);
        if (search(modified)) {
            suggestions.add(modified);
        }
    }
    
    // 替换一个字符
    for (int i = 0; i < word.length(); i++) {
        for (char c = 'a'; c <= 'z'; c++) {
            String modified = word.substring(0, i) + c + word.substring(i + 1);
            if (search(modified)) {
                suggestions.add(modified);
            }
        }
    }
    
    // 插入一个字符
    for (int i = 0; i <= word.length(); i++) {
        for (char c = 'a'; c <= 'z'; c++) {
            String modified = word.substring(0, i) + c + word.substring(i);
            if (search(modified)) {
                suggestions.add(modified);
            }
        }
    }
    
    return new ArrayList<>(suggestions);
}

3. IP 路由表

路由器使用字典树进行最长前缀匹配，确定数据包的转发路径：

/**
 * 简化的 IP 路由查找
 */
public String findLongestPrefixMatch(String ip) {
    TrieNode node = root;
    String match = null;
    
    StringBuilder current = new StringBuilder();
    for (char c : ip.toCharArray()) {
        int index = c - '0';
        if (node.children[index] == null) {
            break;
        }
        
        current.append(c);
        node = node.children[index];
        
        if (node.isEnd) {
            match = current.toString();
        }
    }
    
    return match;
}

4. 敏感词过滤

检测文本中是否包含敏感词：

/**
 * 敏感词检测
 * 返回文本中第一个敏感词的位置，如果没有返回 -1
 */
public int detectSensitiveWord(String text) {
    for (int i = 0; i < text.length(); i++) {
        TrieNode node = root;
        int j = i;
        
        while (j < text.length()) {
            int index = text.charAt(j) - 'a';
            if (node.children[index] == null) {
                break;
            }
            
            node = node.children[index];
            if (node.isEnd) {
                return i;  // 找到敏感词
            }
            j++;
        }
    }
    
    return -1;
}

5. 单词游戏

如 Boggle 游戏，在网格中找单词：

/**
 * 在二维字符网格中查找所有字典中的单词
 * LeetCode 212
 */
public List<String> findWords(char[][] board, String[] words) {
    // 构建字典树
    for (String word : words) {
        insert(word);
    }
    
    Set<String> result = new HashSet<>();
    int rows = board.length;
    int cols = board[0].length;
    
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            dfs(board, i, j, root, "", result);
        }
    }
    
    return new ArrayList<>(result);
}

private void dfs(char[][] board, int i, int j, TrieNode node, String word, Set<String> result) {
    if (i < 0 || i >= board.length || j < 0 || j >= board[0].length || board[i][j] == '#') {
        return;
    }
    
    char c = board[i][j];
    int index = c - 'a';
    
    if (node.children[index] == null) {
        return;
    }
    
    node = node.children[index];
    word += c;
    
    if (node.isEnd) {
        result.add(word);
    }
    
    // 标记为已访问
    board[i][j] = '#';
    
    // 四个方向搜索
    dfs(board, i + 1, j, node, word, result);
    dfs(board, i - 1, j, node, word, result);
    dfs(board, i, j + 1, node, word, result);
    dfs(board, i, j - 1, node, word, result);
    
    // 恢复
    board[i][j] = c;
}

字典树的变体

1. 后缀树（Suffix Tree）

存储字符串的所有后缀，用于解决复杂的字符串问题：

最长公共子串
最长重复子串
字符串压缩

2. 压缩字典树（Radix Tree / Patricia Tree）

合并单分支路径，节省空间：

普通字典树：          压缩字典树：
    r                    r
    |                    |
    a                    apple
    |                    
    p                    
    |                    
    p                    
    |                    
    l                    
    |                    
    e                    

3. 三叉搜索树（Ternary Search Tree）

每个节点有三个子节点：左、中、右。结合了字典树和二叉搜索树的优点。

与其他数据结构的比较

特性	字典树	哈希表	平衡树
查找	$O(L)$	$O(1)$ 平均	$O(L \log n)$
前缀匹配	$O(L)$	不支持	不支持
有序遍历	支持	不支持	支持
空间效率	较低（共享前缀时较高）	较高	较高

小结

字典树的核心要点：

优势：

快速的前缀匹配
插入和查找的时间与数据量无关，只与字符串长度有关
天然支持有序遍历

劣势：

空间开销较大
对于精确查找，不如哈希表高效

适用场景：

需要前缀匹配功能
字符串集合较大，有大量公共前缀
自动补全、拼写检查、敏感词过滤

练习

基础：

LeetCode 208：实现 Trie（前缀树）
LeetCode 720：词典中最长的单词

进阶： 3. LeetCode 211：添加与搜索单词 - 数据结构设计 4. LeetCode 677：键值映射 5. LeetCode 648：单词替换 6. LeetCode 212：单词搜索 II

挑战： 7. LeetCode 421：数组中两个数的最大异或值（二进制字典树） 8. LeetCode 1032：字符流 9. 实现支持通配符的字典树 10. 实现压缩字典树（Radix Tree）

核心概念​

什么是字典树？​

结构特点​

图示示例​

节点设计​

基本节点结构​

哈希表实现的节点​

完整实现​

扩展功能​

获取所有以某前缀开头的单词​

自动补全功能​

统计单词出现次数​

复杂度分析​

时间复杂度​

空间复杂度​

应用场景​

1. 自动补全​

2. 拼写检查​

3. IP 路由表​

4. 敏感词过滤​

5. 单词游戏​

字典树的变体​

1. 后缀树（Suffix Tree）​

2. 压缩字典树（Radix Tree / Patricia Tree）​

3. 三叉搜索树（Ternary Search Tree）​

与其他数据结构的比较​

小结​

练习​