八股 - 杂项
Bloom Filter 布隆过滤器
A space-efficient probabilistic data structure that tells you whether an element “definitely not in a set” or “possibly in a set”. Allow quick check of membership at the cost of occasional false positive.
It provides two apis
put(elem: T) -> void
might_contains(elem: T) -> boolean
A bloom filter uses:
A bit array of size
m(initially all zeros)kindependent hash functions that map elements to array positions
Adding an element:
Hash the element with each of the
khash functionsSet the corresponding bits in the array to 1
Checking membership:
Hash the element with the same
khash functionsCheck if ALL corresponding bits are 1
If any bit is 0 → element is definitely NOT in the set
If all bits are 1 → element MIGHT be in the set
Why false positives occur: Different elements can hash to the same bit positions. If enough other elements have set all the bits that your query element hashes to, you’ll get a false positive.
Calculation of false positive rate
TODO:
Key properties
Space efficient: Uses much less space than storing actual elements
Fast operations: O(k) time for both insertion and lookup
One-way errors: Can have false positives, but never false negatives
No deletions: Standard bloom filters don’t support removing elements (though variants like counting bloom filters do)
Common applications
Database query optimization: Check if a record might exist before expensive disk lookup
Web caching: Quickly determine if a URL might be cached
Network routing: Efficiently represent routing tables
Distributed systems: Coordinate between nodes about what data they might have
Example: Avoid revisiting URL in a web crawler
TODO:
缓存淘汰算法
缓存容量有限,因此我们需要决定哪些数据需要被淘汰。
LRU 缓存
LRU 缓存的思路是仅保留最近使用过的前 K 个数据。
LRU 缓存提供两个 API:
get(key): 返回键对应的值。时间复杂度 O(1)put(key, value): 插入或更新一个键值对。时间复杂度 O(1)
其实现基于哈希表和双链表。
双链表的头节点是最近使用的键值对,尾节点是最久未使用的键值对。
哈希表存储键到链表节点(指针)的映射。
为了做到O(1)的读写,双链表需提供以下 API
class ListNode:
key: int
value: int
prev: ListNode
next: ListNode
# O(1), Remove a specific node (pass in a pointer)
removeNode(node: ListNode) -> None
# O(1), Insert a node at head
pushAtHead(node: ListNode) -> None
# O(1), Remove a node at tail
popAtTail() -> ListNode
每读取一个键,或改写一个已存在的键的值,我们都要把它移到链表头部,以此表示这个键值对最近被使用过。“移到链表头部"是通过removeNode + pushAtHead实现的
# move node to head
curNode = table[key]
self.dll.removeNode(curNode)
self.dll.pushAtHead(curNode)
每插入一对新的键值对,如果双链表的长度超过 K ,我们删除链表尾节点(淘汰),再将新的键值对作为新的头节点, 以此保证我们只存储最近使用过的前 K 个键值对。
LRUCache 的 Python 代码如下
class LRUCache:
def __init__(self, capacity: int):
self.table: dict[int, ListNode] = {}
self.dll = DoublyLinkedList()
self.capacity = capacity
def get(self, key) -> int:
if key not in self.table:
return -1
node = self.table[key]
self._refresh_node(node)
return node.value
def put(self, key, value) -> None:
if key in self.table:
# move node in dll to head
curNode = self.table[key]
curNode.value = value
self._refresh_node(curNode)
else:
# create new node (evict if full), insert at head
if self._isFull():
self._evict()
newNode = ListNode(key, value)
self.dll.pushAtHead(newNode)
self.table[key] = newNode
def _refresh_node(self, node: ListNode) -> None:
"""
When a key is used, we refresh its node in the doubly-linked list by moving it to head.
"""
self.dll.removeNode(node)
self.dll.pushAtHead(node)
def _isFull(self) -> bool:
return len(self.table) >= self.capacity
def _evict(self) -> None:
deletedNode = self.dll.popAtTail()
self.table.pop(deletedNode.key)
DoublyLinkedList 和 ListNode 的实现如下
class ListNode:
def __init__(self, key: int, value: int):
self.key = key
self.value = value
self.prev = None
self.next = None
class DoublyLinkedList:
def __init__(self):
# 虚拟头尾节点
self.vhead = ListNode(0, 0)
self.vtail = ListNode(0, 0)
self.vhead.next = self.vtail
self.vtail.prev = self.vhead
def removeNode(self, node: ListNode) -> None:
prevNode, nextNode = node.prev, node.next
prevNode.next = nextNode
nextNode.prev = prevNode
node.prev = node.next = None
def pushAtHead(self, node: ListNode) -> None:
node.next = self.vhead.next
node.prev = self.vhead
self.vhead.next.prev = node
self.vhead.next = node
def popAtTail(self) -> ListNode:
if self.vtail.prev == self.vhead:
return None
tailNode = self.vtail.prev
self.removeNode(tailNode)
return tailNode
注:
- 双链表可以使用虚拟头/尾节点,简化实现