Python 数据结构

Python 内置了多种数据结构，用于存储和组织数据。本章将深入介绍列表、元组、字典和集合的底层原理、操作方法和最佳实践。

列表（List）

列表是最常用的数据结构，可以存储有序的元素序列。列表是可变的，可以包含任意类型的元素。

创建列表

empty = []

fruits = ["苹果", "香蕉", "橙子"]

mixed = [1, "hello", 3.14, True]

numbers = list(range(5))

chars = list("hello")

列表的底层实现

Python 列表底层是一个动态数组，它存储的是对象的引用（指针），而不是对象本身。

import sys

lst = [1, 2, 3, 4, 5]
print(f"列表大小: {sys.getsizeof(lst)} 字节")
print(f"每个元素引用: {sys.getsizeof(lst[0])} 字节")

列表的内存布局可以理解为：

列表对象
┌─────────────────────────────────┐
│ ob_refcnt (引用计数)             │
│ ob_type (类型指针)               │
│ ob_size (元素个数)               │
│ allocated (分配空间)             │
│ ob_item ──────────────────────┐ │
└───────────────────────────────┼─┘
                                ▼
                    ┌─────┬─────┬─────┬─────┬─────┐
                    │  *  │  *  │  *  │  *  │  *  │
                    └──┬──┴──┬──┴──┬──┴──┬──┴──┬──┘
                       │     │     │     │     │
                       ▼     ▼     ▼     ▼     ▼
                      int   int   int   int   int
                       1     2     3     4     5

扩容机制：当列表空间不足时，Python 会自动扩容。扩容策略是：新容量约为原容量的 1.125 倍（对于较大的列表），这保证了均摊 O(1) 的追加操作时间复杂度。

import sys

lst = []
sizes = []
for i in range(20):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

print("容量变化:", sizes)

访问元素

fruits = ["苹果", "香蕉", "橙子", "葡萄"]

print(fruits[0])
print(fruits[1])

print(fruits[-1])
print(fruits[-2])

try:
    print(fruits[10])
except IndexError as e:
    print(f"索引越界: {e}")

切片操作

切片是 Python 中非常强大的功能，它创建的是原列表的浅拷贝。

numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(numbers[2:5])
print(numbers[:5])
print(numbers[5:])
print(numbers[::2])
print(numbers[::-1])

print(numbers[1:7:2])
print(numbers[7:1:-1])

new_list = numbers[:]
print(new_list is numbers)

切片语法解析：list[start:stop:step]

start：起始索引（默认为 0）
stop：结束索引（不包含，默认为列表长度）
step：步长（默认为 1）

修改列表

fruits = ["苹果", "香蕉", "橙子"]

fruits[0] = "葡萄"
print(fruits)

fruits.append("草莓")
print(fruits)

fruits.insert(1, "桃子")
print(fruits)

fruits.extend(["西瓜", "芒果"])
print(fruits)

fruits.remove("香蕉")
print(fruits)

del fruits[0]
print(fruits)

popped = fruits.pop()
print(f"弹出: {popped}")
print(fruits)

popped_first = fruits.pop(0)
print(f"弹出第一个: {popped_first}")

fruits.clear()
print(fruits)

列表方法详解

lst = [3, 1, 4, 1, 5, 9, 2, 6]

print(lst.count(1))
print(lst.index(4))

lst_copy = lst.copy()
print(lst_copy is lst)

lst.sort()
print(lst)

lst.sort(reverse=True)
print(lst)

lst.sort(key=lambda x: -x)
print(lst)

lst.reverse()
print(lst)

new_sorted = sorted(lst)
new_reversed = list(reversed(lst))

时间复杂度分析

了解列表操作的时间复杂度对于编写高效代码非常重要：

操作	时间复杂度	说明
索引访问 `lst[i]`	O(1)	直接通过偏移量访问
赋值 `lst[i] = x`	O(1)	直接修改
追加 `lst.append(x)`	O(1)	均摊复杂度，偶尔需要扩容
弹出末尾 `lst.pop()`	O(1)	直接删除
弹出指定位置 `lst.pop(i)`	O(n)	需要移动后续元素
插入 `lst.insert(i, x)`	O(n)	需要移动后续元素
删除 `lst.remove(x)`	O(n)	需要查找并移动元素
查找 `x in lst`	O(n)	需要遍历
获取长度 `len(lst)`	O(1)	列表对象存储了长度
切片 `lst[a:b]`	O(k)	k 是切片长度

import time

def test_append(n):
    lst = []
    start = time.time()
    for i in range(n):
        lst.append(i)
    return time.time() - start

def test_insert_start(n):
    lst = []
    start = time.time()
    for i in range(n):
        lst.insert(0, i)
    return time.time() - start

n = 100000
print(f"追加 {n} 个元素: {test_append(n):.4f} 秒")
print(f"头部插入 {n} 个元素: {test_insert_start(n):.4f} 秒")

列表推导式

列表推导式提供了一种简洁的方式来创建列表。

squares = [x**2 for x in range(10)]
print(squares)

evens = [x for x in range(10) if x % 2 == 0]
print(evens)

result = ["偶数" if x % 2 == 0 else "奇数" for x in range(5)]
print(result)

matrix = [[i*j for j in range(3)] for i in range(3)]
print(matrix)

flat = [x for row in matrix for x in row]
print(flat)

words = ["hello", "world", "python"]
lengths = [(word, len(word)) for word in words]
print(lengths)

列表推导式 vs 普通循环：

import time

n = 100000

start = time.time()
squares = []
for x in range(n):
    squares.append(x**2)
time_loop = time.time() - start

start = time.time()
squares = [x**2 for x in range(n)]
time_comp = time.time() - start

print(f"普通循环: {time_loop:.4f} 秒")
print(f"列表推导式: {time_comp:.4f} 秒")

列表推导式通常比普通循环更快，因为它的迭代在 C 层面完成。

使用列表实现栈

列表天然适合作为栈使用，因为 append() 和 pop() 操作都是 O(1)。

stack = []

stack.append(1)
stack.append(2)
stack.append(3)
print(f"栈: {stack}")

top = stack.pop()
print(f"弹出: {top}")
print(f"栈: {stack}")

print(f"栈顶: {stack[-1]}")
print(f"是否为空: {len(stack) == 0}")

使用列表实现队列（不推荐）

虽然可以用列表实现队列，但效率不高，因为从头部删除元素是 O(n)。

from collections import deque

queue = deque([1, 2, 3])

queue.append(4)
print(f"入队: {queue}")

first = queue.popleft()
print(f"出队: {first}")
print(f"队列: {queue}")

元组（Tuple）

元组是不可变的有序序列。不可变性意味着元组创建后不能修改，这使得元组可以作为字典的键或集合的元素。

创建元组

empty = ()

single = (1,)

point = (1, 2, 3)
mixed = (1, "hello", 3.14)

coordinates = 1, 2, 3

numbers = tuple([1, 2, 3])

chars = tuple("hello")

注意：单元素元组必须加逗号，否则 Python 会将其解释为括号内的表达式。

not_tuple = (1)
print(type(not_tuple))

real_tuple = (1,)
print(type(real_tuple))

元组的不可变性

元组本身不可变，但如果元组包含可变对象，这些对象可以修改：

t = (1, 2, [3, 4])
t[2].append(5)
print(t)

try:
    t[0] = 10
except TypeError as e:
    print(f"错误: {e}")

元组的优势

不可变性：数据安全，不会被意外修改
作为字典键：可以作为字典的键
性能更好：比列表占用更少内存，创建更快
语义明确：表示固定长度的记录

import sys

lst = [1, 2, 3, 4, 5]
tpl = (1, 2, 3, 4, 5)

print(f"列表内存: {sys.getsizeof(lst)} 字节")
print(f"元组内存: {sys.getsizeof(tpl)} 字节")

d = {(1, 2): "点A", (3, 4): "点B"}
print(d[(1, 2)])

元组解包

元组解包是 Python 中非常实用的特性：

point = (10, 20, 30)
x, y, z = point
print(f"x={x}, y={y}, z={z}")

a, *b = (1, 2, 3, 4, 5)
print(f"a={a}, b={b}")

*head, tail = (1, 2, 3, 4, 5)
print(f"head={head}, tail={tail}")

first, *middle, last = (1, 2, 3, 4, 5)
print(f"first={first}, middle={middle}, last={last}")

for x, y in [(1, 'a'), (2, 'b'), (3, 'c')]:
    print(f"{x} -> {y}")

命名元组

collections.namedtuple 创建具有命名字段的元组：

from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(10, 20)

print(f"x={p.x}, y={p.y}")
print(f"p[0]={p[0]}, p[1]={p[1]}")

Person = namedtuple('Person', 'name age city')
person = Person('张三', 25, '北京')
print(f"{person.name}, {person.age}岁, {person.city}")

print(person._asdict())

new_person = person._replace(age=26)
print(new_person)

print(person._fields)

字典（Dict）

字典是键值对的无序集合，通过键来高效访问值。Python 3.7+ 中，字典保持插入顺序。

创建字典

empty = {}

person = {"name": "张三", "age": 20, "city": "北京"}

person = dict(name="张三", age=20, city="北京")

pairs = [("name", "张三"), ("age", 20)]
person = dict(pairs)

keys = ["name", "age", "city"]
default_dict = dict.fromkeys(keys, "未知")
print(default_dict)

squares = {x: x**2 for x in range(5)}
print(squares)

字典的底层实现

Python 字典使用哈希表实现。每个键通过哈希函数计算出一个哈希值，然后映射到表中的一个位置。

哈希表结构（简化）
┌─────────────────────────────────────────────┐
│ 索引 │   哈希值   │     键     │     值     │
├──────┼───────────┼───────────┼───────────┤
│  0   │           │   (空)    │           │
│  1   │  1234567  │   "name"  │   "张三"   │
│  2   │           │   (空)    │           │
│  3   │  7654321  │   "age"   │    20     │
│  4   │           │   (空)    │           │
│  5   │  9876543  │   "city"  │   "北京"   │
└─────────────────────────────────────────────┘

哈希冲突处理：当两个键的哈希值映射到同一位置时，Python 使用开放寻址法解决冲突。

字典扩容：当元素数量超过容量的 2/3 时，字典会自动扩容，保证 O(1) 的平均查找时间。

访问字典

person = {"name": "张三", "age": 20, "city": "北京"}

print(person["name"])

print(person.get("name"))
print(person.get("gender", "未知"))

try:
    print(person["gender"])
except KeyError as e:
    print(f"键不存在: {e}")

print(person.setdefault("gender", "男"))
print(person)

修改字典

person = {"name": "张三", "age": 20}

person["city"] = "北京"
print(person)

person["age"] = 21
print(person)

person.update({"country": "中国", "age": 22})
print(person)

del person["city"]
print(person)

age = person.pop("age")
print(f"弹出: {age}")

gender = person.pop("gender", None)
print(f"弹出不存在的键: {gender}")

item = {"a": 1, "b": 2}.popitem()
print(f"弹出最后一项: {item}")

person.clear()
print(person)

遍历字典

person = {"name": "张三", "age": 20, "city": "北京"}

for key in person:
    print(key)

for key in person.keys():
    print(key)

for value in person.values():
    print(value)

for key, value in person.items():
    print(f"{key}: {value}")

for i, (key, value) in enumerate(person.items()):
    print(f"{i}. {key}: {value}")

字典的时间复杂度

操作	时间复杂度	说明
访问 `d[key]`	O(1)	哈希查找
赋值 `d[key] = v`	O(1)	哈希插入
删除 `del d[key]`	O(1)	哈希删除
查找 `key in d`	O(1)	哈希查找
获取长度 `len(d)`	O(1)	存储长度
遍历	O(n)	遍历所有元素

字典的视图对象

keys()、values()、items() 返回的是视图对象，不是列表。视图对象是动态的，会反映字典的变化：

person = {"name": "张三", "age": 20}
keys = person.keys()
print(keys)

person["city"] = "北京"
print(keys)

print(list(keys))
print(list(person.values()))
print(list(person.items()))

合并字典

d1 = {"a": 1, "b": 2}
d2 = {"b": 3, "c": 4}

merged = {**d1, **d2}
print(merged)

merged = d1 | d2
print(merged)

d1 |= d2
print(d1)

字典的常见应用

from collections import defaultdict

word_list = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
word_count = defaultdict(int)
for word in word_list:
    word_count[word] += 1
print(dict(word_count))

students = [('A', '张三'), ('B', '李四'), ('A', '王五')]
grade_students = defaultdict(list)
for grade, name in students:
    grade_students[grade].append(name)
print(dict(grade_students))

d = defaultdict(lambda: "N/A")
d["name"] = "张三"
print(d["name"])
print(d["age"])

集合（Set）

集合是无序且不重复的元素集合，支持数学集合运算。

创建集合

empty = set()

fruits = {"苹果", "香蕉", "橙子"}

numbers = set([1, 2, 3, 4, 5])

chars = set("hello")
print(chars)

squares = {x**2 for x in range(5)}
print(squares)

集合的底层实现

集合底层也是哈希表，只存储键，不存储值。这使得集合的成员检测非常高效。

import time

lst = list(range(100000))
s = set(range(100000))

start = time.time()
for _ in range(1000):
    99999 in lst
list_time = time.time() - start

start = time.time()
for _ in range(1000):
    99999 in s
set_time = time.time() - start

print(f"列表查找: {list_time:.4f} 秒")
print(f"集合查找: {set_time:.4f} 秒")

集合操作

fruits = {"苹果", "香蕉"}

fruits.add("橙子")
print(fruits)

fruits.remove("苹果")
print(fruits)

fruits.discard("葡萄")

popped = fruits.pop()
print(f"弹出: {popped}")

fruits.clear()
print(fruits)

集合运算

set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}

union = set1 | set2
union = set1.union(set2)
print(f"并集: {union}")

intersection = set1 & set2
intersection = set1.intersection(set2)
print(f"交集: {intersection}")

difference = set1 - set2
difference = set1.difference(set2)
print(f"差集: {difference}")

sym_diff = set1 ^ set2
sym_diff = set1.symmetric_difference(set2)
print(f"对称差集: {sym_diff}")

set_a = {1, 2, 3}
set_b = {1, 2, 3, 4, 5}
print(f"set_a 是 set_b 的子集: {set_a.issubset(set_b)}")
print(f"set_b 是 set_a 的超集: {set_b.issuperset(set_a)}")
print(f"无交集: {set_a.isdisjoint({4, 5, 6})}")

不可变集合

frozenset 是不可变版本的集合，可以作为字典的键或集合的元素：

fs = frozenset([1, 2, 3])
print(fs)

try:
    fs.add(4)
except AttributeError as e:
    print(f"不可变: {e}")

d = {frozenset([1, 2]): "点A"}
print(d[frozenset([1, 2])])

深拷贝与浅拷贝

理解深拷贝和浅拷贝对于正确处理嵌套数据结构至关重要。Python 中变量存储的是对象的引用，而非对象本身。当你"复制"一个变量时，实际上复制的是引用。

浅拷贝

浅拷贝创建一个新对象，但只复制顶层元素的引用。对于包含可变对象的嵌套结构，浅拷贝后内部对象仍然共享同一份数据。

import copy

original = [[1, 2], [3, 4]]

# 三种浅拷贝方式
shallow = original.copy()      # 方法1：使用列表的 copy 方法
shallow = original[:]          # 方法2：使用切片
shallow = copy.copy(original)  # 方法3：使用 copy 模块

# 修改嵌套对象会影响原列表
shallow[0][0] = 999
print(f"原列表: {original}")  # [[999, 2], [3, 4]] - 被修改了！
print(f"浅拷贝: {shallow}")   # [[999, 2], [3, 4]]

# 但添加新元素不影响原列表
shallow.append([5, 6])
print(f"原列表: {original}")  # [[999, 2], [3, 4]] - 不变
print(f"浅拷贝: {shallow}")   # [[999, 2], [3, 4], [5, 6]]

关键点：浅拷贝只复制"第一层"。修改 shallow[0] 这个列表对象的内容会影响 original[0]，因为它们指向同一个列表对象。但 shallow.append() 是在顶层操作，不影响原列表。

深拷贝

深拷贝递归复制所有层级的对象，创建完全独立的副本。无论嵌套多深，修改副本不会影响原始对象。

import copy

original = [[1, 2], [3, 4]]

# 深拷贝：递归复制所有层级
deep = copy.deepcopy(original)

# 修改嵌套对象不影响原列表
deep[0][0] = 999
print(f"原列表: {original}")  # [[1, 2], [3, 4]] - 不受影响
print(f"深拷贝: {deep}")      # [[999, 2], [3, 4]]

性能考虑：深拷贝比浅拷贝更慢，因为它需要递归遍历所有层级的对象。对于简单的不可变元素列表，浅拷贝足够使用；对于复杂嵌套结构，根据实际需求选择。

拷贝对比图解

原列表 original
┌─────────────────┐
│ [0] ────────┐   │
│ [1] ─────┐  │   │
└──────────┼──┼───┘
           │  │
           ▼  ▼
        ┌─────┐ ┌─────┐
        │ 1,2 │ │ 3,4 │
        └─────┘ └─────┘

浅拷贝 shallow
┌─────────────────┐
│ [0] ────────┐   │  (与原列表指向同一个子列表)
│ [1] ─────┐  │   │
└──────────┼──┼───┘
           │  │
           ▼  ▼
        ┌─────┐ ┌─────┐
        │ 1,2 │ │ 3,4 │  (共享)
        └─────┘ └─────┘

深拷贝 deep
┌─────────────────┐
│ [0] ────────┐   │
│ [1] ─────┐  │   │
└──────────┼──┼───┘
           │  │
           ▼  ▼
        ┌─────┐ ┌─────┐
        │ 1,2 │ │ 3,4 │  (独立的副本)
        └─────┘ └─────┘

实际应用场景

import copy

class Config:
    def __init__(self, settings):
        self.settings = copy.deepcopy(settings)

original_settings = {
    "database": {"host": "localhost", "port": 3306},
    "cache": {"enabled": True}
}

config = Config(original_settings)
config.settings["database"]["host"] = "remote.server"

print(f"原始配置: {original_settings}")
print(f"修改后配置: {config.settings}")

数据结构选择指南

按需求选择

需求	推荐数据结构	原因
有序存储，频繁修改	列表	O(1) 尾部操作
有序存储，不需要修改	元组	更省内存，可作为字典键
快速查找、去重	集合	O(1) 成员检测
键值映射	字典	O(1) 查找
队列操作	deque	两端 O(1) 操作
计数	Counter	专门用于计数
固定字段记录	namedtuple	可读性好

性能对比

import time
from collections import deque

n = 100000

lst = []
start = time.time()
for i in range(n):
    lst.insert(0, i)
list_insert_time = time.time() - start

d = deque()
start = time.time()
for i in range(n):
    d.appendleft(i)
deque_insert_time = time.time() - start

print(f"列表头部插入 {n} 次: {list_insert_time:.4f} 秒")
print(f"deque 头部插入 {n} 次: {deque_insert_time:.4f} 秒")

小结

本章我们深入学习了 Python 的内置数据结构：

列表：动态数组实现，支持索引、切片、推导式
元组：不可变序列，可作为字典键，性能更好
字典：哈希表实现，O(1) 查找，保持插入顺序
集合：无序不重复，支持集合运算
深浅拷贝：理解引用和复制的区别
时间复杂度：选择合适数据结构的关键

练习

实现一个函数，统计字符串中每个字符出现的次数
使用列表推导式生成九九乘法表
实现一个简单的缓存系统，使用字典存储，支持过期时间
比较列表和集合在成员检测上的性能差异
实现一个函数，深度展平嵌套列表
使用 defaultdict 实现分组功能

列表（List）​

创建列表​

列表的底层实现​

访问元素​

切片操作​

修改列表​

列表方法详解​

时间复杂度分析​

列表推导式​

使用列表实现栈​

使用列表实现队列（不推荐）​

元组（Tuple）​

创建元组​

元组的不可变性​

元组的优势​

元组解包​

命名元组​

字典（Dict）​

创建字典​

字典的底层实现​

访问字典​

修改字典​

遍历字典​

字典的时间复杂度​

字典的视图对象​

合并字典​

字典的常见应用​

集合（Set）​

创建集合​

集合的底层实现​

集合操作​

集合运算​

不可变集合​

深拷贝与浅拷贝​

浅拷贝​

深拷贝​

拷贝对比图解​

实际应用场景​

数据结构选择指南​

按需求选择​

性能对比​

小结​

练习​

参考资源​