哈希表

哈希表（Hash Table）是一种基于键值对的高效数据结构，通过哈希函数将键映射到数组索引，实现 O(1) 平均时间复杂度的查找、插入和删除操作。

基本概念

什么是哈希表？

哈希表也叫散列表，它通过哈希函数（Hash Function）将键（Key）转换为数组索引，从而快速定位元素的位置。

生活中的例子：

字典：通过拼音首字母快速查找字
快递柜：通过取件码快速找到对应的格子
图书馆：通过分类号快速定位书籍位置

哈希函数

哈希函数是将任意大小的数据映射到固定大小值的函数。一个好的哈希函数应该满足：

确定性：相同的输入总是产生相同的输出
高效性：计算速度快
均匀性：输出值均匀分布在值域
雪崩效应：输入的微小变化导致输出的巨大变化

// 常见的哈希函数设计
// 1. 除留余数法
public int hashMod(int key, int capacity) {
    return key % capacity;
}

// 2. 字符串哈希（Java String 的实现）
public int hashString(String s) {
    int h = 0;
    for (char c : s.toCharArray()) {
        h = 31 * h + c;  // 31 是一个质数，能减少冲突
    }
    return h;
}

// 3. 平方取中法
public int hashMidSquare(int key, int capacity) {
    long square = (long) key * key;
    // 取中间几位
    return (int) (square / 1000) % capacity;
}

// 4. 折叠法
public int hashFold(int key, int capacity) {
    int sum = 0;
    while (key > 0) {
        sum += key % 1000;  // 每三位一组
        key /= 1000;
    }
    return sum % capacity;
}

解释：

除留余数法：最常用的方法，简单高效。选择容量为质数可以减少冲突
字符串哈希：Java 的 String 类使用 s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] 公式
平方取中法：将关键字平方后取中间几位，适合关键字位数较大的情况
折叠法：将关键字分割成几部分后叠加，适合关键字位数很大且分布均匀的情况

哈希冲突

当两个不同的键映射到同一个索引位置时，就发生了哈希冲突。冲突是不可避免的，因为键的可能性远大于数组容量（鸽巢原理）。

冲突解决方法

1. 链地址法（Separate Chaining）

每个数组位置存储一个链表，冲突的元素都添加到同一个链表中。

public class ChainingHashMap<K, V> {
    private static class Node<K, V> {
        K key;
        V value;
        Node<K, V> next;
        
        Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }
    
    private Node<K, V>[] table;
    private int size;
    private int capacity;
    private static final double LOAD_FACTOR = 0.75;
    
    @SuppressWarnings("unchecked")
    public ChainingHashMap(int capacity) {
        this.capacity = capacity;
        this.table = (Node<K, V>[]) new Node[capacity];
        this.size = 0;
    }
    
    // 哈希函数
    private int hash(K key) {
        return (key.hashCode() & 0x7FFFFFFF) % capacity;  // 保证非负
    }
    
    // 查找 - 平均 O(1)，最坏 O(n)
    public V get(K key) {
        int index = hash(key);
        Node<K, V> node = table[index];
        
        while (node != null) {
            if (node.key.equals(key)) {
                return node.value;
            }
            node = node.next;
        }
        
        return null;
    }
    
    // 插入 - 平均 O(1)，最坏 O(n)
    public void put(K key, V value) {
        if (size >= capacity * LOAD_FACTOR) {
            resize();
        }
        
        int index = hash(key);
        Node<K, V> node = table[index];
        
        // 检查是否已存在
        while (node != null) {
            if (node.key.equals(key)) {
                node.value = value;  // 更新
                return;
            }
            node = node.next;
        }
        
        // 头插法
        Node<K, V> newNode = new Node<>(key, value);
        newNode.next = table[index];
        table[index] = newNode;
        size++;
    }
    
    // 删除 - 平均 O(1)，最坏 O(n)
    public V remove(K key) {
        int index = hash(key);
        Node<K, V> node = table[index];
        Node<K, V> prev = null;
        
        while (node != null) {
            if (node.key.equals(key)) {
                if (prev == null) {
                    table[index] = node.next;
                } else {
                    prev.next = node.next;
                }
                size--;
                return node.value;
            }
            prev = node;
            node = node.next;
        }
        
        return null;
    }
    
    // 扩容
    @SuppressWarnings("unchecked")
    private void resize() {
        Node<K, V>[] oldTable = table;
        capacity *= 2;
        table = (Node<K, V>[]) new Node[capacity];
        size = 0;
        
        // 重新哈希
        for (Node<K, V> node : oldTable) {
            while (node != null) {
                put(node.key, node.value);
                node = node.next;
            }
        }
    }
    
    public int size() {
        return size;
    }
}

解释：

每个桶（bucket）是一个链表，存放哈希值相同的所有元素
查找时需要遍历链表，链表越长效率越低
负载因子 = 元素数量 / 桶数量，超过阈值时需要扩容
Java HashMap 在链表长度超过 8 时会转为红黑树，提高查找效率

2. 开放寻址法（Open Addressing）

当发生冲突时，按照某种规则寻找下一个空位置。

public class OpenAddressingHashMap<K, V> {
    private static class Entry<K, V> {
        K key;
        V value;
        boolean deleted;  // 标记是否被删除
        
        Entry(K key, V value) {
            this.key = key;
            this.value = value;
            this.deleted = false;
        }
    }
    
    private Entry<K, V>[] table;
    private int size;
    private int capacity;
    private static final double LOAD_FACTOR = 0.5;
    
    @SuppressWarnings("unchecked")
    public OpenAddressingHashMap(int capacity) {
        this.capacity = capacity;
        this.table = (Entry<K, V>[]) new Entry[capacity];
        this.size = 0;
    }
    
    private int hash(K key) {
        return (key.hashCode() & 0x7FFFFFFF) % capacity;
    }
    
    // 线性探测：依次往后找
    private int probe(int hash, int i) {
        return (hash + i) % capacity;
    }
    
    // 查找
    public V get(K key) {
        int h = hash(key);
        
        for (int i = 0; i < capacity; i++) {
            int index = probe(h, i);
            Entry<K, V> entry = table[index];
            
            if (entry == null) {
                return null;  // 没找到
            }
            
            if (!entry.deleted && entry.key.equals(key)) {
                return entry.value;
            }
        }
        
        return null;
    }
    
    // 插入
    public void put(K key, V value) {
        if (size >= capacity * LOAD_FACTOR) {
            resize();
        }
        
        int h = hash(key);
        Entry<K, V> newEntry = new Entry<>(key, value);
        
        for (int i = 0; i < capacity; i++) {
            int index = probe(h, i);
            Entry<K, V> entry = table[index];
            
            if (entry == null || entry.deleted) {
                table[index] = newEntry;
                size++;
                return;
            }
            
            if (entry.key.equals(key)) {
                entry.value = value;  // 更新
                return;
            }
        }
        
        throw new IllegalStateException("Hash table is full");
    }
    
    // 删除（懒惰删除）
    public V remove(K key) {
        int h = hash(key);
        
        for (int i = 0; i < capacity; i++) {
            int index = probe(h, i);
            Entry<K, V> entry = table[index];
            
            if (entry == null) {
                return null;
            }
            
            if (!entry.deleted && entry.key.equals(key)) {
                entry.deleted = true;  // 标记为已删除
                size--;
                return entry.value;
            }
        }
        
        return null;
    }
    
    @SuppressWarnings("unchecked")
    private void resize() {
        Entry<K, V>[] oldTable = table;
        capacity *= 2;
        table = (Entry<K, V>[]) new Entry[capacity];
        size = 0;
        
        for (Entry<K, V> entry : oldTable) {
            if (entry != null && !entry.deleted) {
                put(entry.key, entry.value);
            }
        }
    }
}

探测方法对比：

方法	公式	特点
线性探测	(h + i) % capacity	简单，但容易产生聚集
平方探测	(h + i²) % capacity	减少聚集，但可能无法找到空位
双重哈希	(h1 + i * h2) % capacity	分布均匀，效果最好

解释：

开放寻址法中所有元素都存在数组中，没有额外的链表开销
删除操作需要特殊处理，通常使用"懒惰删除"标记
线性探测容易产生一次聚集（连续的占用位置）
负载因子通常控制在 0.5 以下，以保证性能

Java HashMap 详解

底层结构

Java HashMap 采用数组 + 链表 + 红黑树的结构：

// JDK 8 HashMap 简化结构
public class HashMap<K, V> {
    // 默认初始容量 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    
    // 最大容量
    static final int MAXIMUM_CAPACITY = 1 << 30;
    
    // 默认负载因子 0.75
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    
    // 链表转红黑树的阈值
    static final int TREEIFY_THRESHOLD = 8;
    
    // 红黑树转链表的阈值
    static final int UNTREEIFY_THRESHOLD = 6;
    
    // 哈希桶数组
    transient Node<K, V>[] table;
    
    // 键值对数量
    transient int size;
    
    // 扩容阈值 = capacity * loadFactor
    int threshold;
    
    // 负载因子
    final float loadFactor;
    
    // 节点类
    static class Node<K, V> implements Map.Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
        
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }
}

哈希计算

// HashMap 的哈希方法
static final int hash(Object key) {
    int h;
    // 高 16 位异或低 16 位，增加随机性
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

// 计算索引位置
// n 是数组长度，必须是 2 的幂
// (n - 1) & hash 等价于 hash % n，但位运算更快
int index = (n - 1) & hash;

解释：

高 16 位异或低 16 位是为了让哈希值的高位也能参与运算，减少冲突
数组长度必须是 2 的幂，这样 (n - 1) & hash 就相当于取模运算

put 方法流程

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K, V>[] tab; Node<K, V> p; int n, i;
    
    // 1. 如果数组为空，初始化
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    
    // 2. 计算索引，如果该位置为空，直接插入
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K, V> e; K k;
        
        // 3. 如果 key 相同，直接覆盖
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        
        // 4. 如果是红黑树节点
        else if (p instanceof TreeNode)
            e = ((TreeNode<K, V>)p).putTreeVal(this, tab, hash, key, value);
        
        // 5. 链表处理
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // 链表长度达到 8，转为红黑树
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        
        // 6. 找到已存在的 key，更新 value
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            return oldValue;
        }
    }
    
    // 7. 检查是否需要扩容
    if (++size > threshold)
        resize();
    
    return null;
}

扩容机制

final Node<K, V>[] resize() {
    Node<K, V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    
    // 计算新容量
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1;  // 扩容为原来的 2 倍
    }
    else if (oldThr > 0)
        newCap = oldThr;
    else {
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    
    // 创建新数组
    @SuppressWarnings({"rawtypes", "unchecked"})
    Node<K, V>[] newTab = (Node<K, V>[])new Node[newCap];
    table = newTab;
    threshold = newThr;
    
    // 数据迁移
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K, V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    // 单个节点，直接计算新位置
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // 红黑树拆分
                    ((TreeNode<K, V>)e).split(this, newTab, j, oldCap);
                else {
                    // 链表拆分
                    // 优化：利用 e.hash & oldCap 判断位置
                    // 原位置 or 原位置 + oldCap
                    Node<K, V> loHead = null, loTail = null;
                    Node<K, V> hiHead = null, hiTail = null;
                    Node<K, V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

扩容优化：

JDK 8 在扩容时，链表元素的新位置要么是原位置，要么是原位置 + oldCap
这样避免了重新计算哈希值，提高效率
这是因为容量是 2 的幂，扩容后高位多了一位参与索引计算

常见问题

1. 两数之和

public int[] twoSum(int[] nums, int target) {
    Map<Integer, Integer> map = new HashMap<>();
    
    for (int i = 0; i < nums.length; i++) {
        int complement = target - nums[i];
        
        if (map.containsKey(complement)) {
            return new int[]{map.get(complement), i};
        }
        
        map.put(nums[i], i);
    }
    
    return new int[]{-1, -1};
}

解释：

使用哈希表存储已遍历的元素及其索引
查找 complement 的时间复杂度为 O(1)
总时间复杂度 O(n)，空间复杂度 O(n)

2. 字符串中第一个唯一字符

public int firstUniqChar(String s) {
    Map<Character, Integer> count = new HashMap<>();
    
    // 统计每个字符出现的次数
    for (char c : s.toCharArray()) {
        count.merge(c, 1, Integer::sum);
    }
    
    // 找到第一个只出现一次的字符
    for (int i = 0; i < s.length(); i++) {
        if (count.get(s.charAt(i)) == 1) {
            return i;
        }
    }
    
    return -1;
}

3. 字母异位词分组

public List<List<String>> groupAnagrams(String[] strs) {
    Map<String, List<String>> map = new HashMap<>();
    
    for (String s : strs) {
        // 将字符串排序作为 key
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        String key = new String(chars);
        
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
    }
    
    return new ArrayList<>(map.values());
}

// 优化：使用计数作为 key
public List<List<String>> groupAnagramsOptimized(String[] strs) {
    Map<String, List<String>> map = new HashMap<>();
    
    for (String s : strs) {
        int[] count = new int[26];
        for (char c : s.toCharArray()) {
            count[c - 'a']++;
        }
        
        // 将计数数组转为字符串作为 key
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 26; i++) {
            sb.append('#').append(count[i]);
        }
        String key = sb.toString();
        
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
    }
    
    return new ArrayList<>(map.values());
}

4. LRU 缓存

public class LRUCache {
    private static class Node {
        int key, value;
        Node prev, next;
        
        Node(int key, int value) {
            this.key = key;
            this.value = value;
        }
    }
    
    private final int capacity;
    private final Map<Integer, Node> cache;
    private final Node head, tail;  // 哨兵节点
    
    public LRUCache(int capacity) {
        this.capacity = capacity;
        this.cache = new HashMap<>();
        
        // 初始化双向链表
        head = new Node(0, 0);
        tail = new Node(0, 0);
        head.next = tail;
        tail.prev = head;
    }
    
    public int get(int key) {
        Node node = cache.get(key);
        if (node == null) return -1;
        
        // 移动到头部（最近使用）
        remove(node);
        addToHead(node);
        
        return node.value;
    }
    
    public void put(int key, int value) {
        Node node = cache.get(key);
        
        if (node != null) {
            // 更新值并移动到头部
            node.value = value;
            remove(node);
            addToHead(node);
        } else {
            // 新节点
            Node newNode = new Node(key, value);
            cache.put(key, newNode);
            addToHead(newNode);
            
            // 超过容量，删除最久未使用的
            if (cache.size() > capacity) {
                Node lru = tail.prev;
                remove(lru);
                cache.remove(lru.key);
            }
        }
    }
    
    private void remove(Node node) {
        node.prev.next = node.next;
        node.next.prev = node.prev;
    }
    
    private void addToHead(Node node) {
        node.next = head.next;
        node.prev = head;
        head.next.prev = node;
        head.next = node;
    }
}

解释：

使用哈希表实现 O(1) 查找
使用双向链表维护访问顺序
头部是最近使用的，尾部是最久未使用的
哨兵节点简化边界处理

哈希表与其他结构对比

操作	哈希表	数组（有序）	链表	平衡树
查找	O(1) 平均	O(log n)	O(n)	O(log n)
插入	O(1) 平均	O(n)	O(1)	O(log n)
删除	O(1) 平均	O(n)	O(1)	O(log n)
有序遍历	不支持	O(n)	O(n log n)	O(n)
范围查询	不支持	O(log n + k)	不支持	O(log n + k)

选择建议：

使用哈希表：需要快速查找、插入、删除，不关心顺序
使用有序数组：需要二分查找、范围查询，数据变化少
使用平衡树：需要有序遍历、范围查询、高效查找

小结

本章我们学习了：

哈希表原理：哈希函数、哈希冲突、冲突解决方法
链地址法：每个位置存储链表，简单高效
开放寻址法：探测寻找空位置，空间效率高
Java HashMap：数组 + 链表 + 红黑树，JDK 8 优化
实际应用：两数之和、LRU 缓存等经典问题

练习

实现一个简单的哈希表（使用链地址法）
LeetCode 1：两数之和
LeetCode 146：LRU 缓存机制
LeetCode 49：字母异位词分组
LeetCode 128：最长连续序列
设计一个支持插入、删除、随机获取元素的数据结构（O(1) 时间复杂度）

基本概念​

什么是哈希表？​

哈希函数​

哈希冲突​

冲突解决方法​

1. 链地址法（Separate Chaining）​

2. 开放寻址法（Open Addressing）​

Java HashMap 详解​

底层结构​

哈希计算​

put 方法流程​

扩容机制​

常见问题​

1. 两数之和​

2. 字符串中第一个唯一字符​

3. 字母异位词分组​

4. LRU 缓存​

哈希表与其他结构对比​

小结​

练习​