Cache Consistency Strategies
- Broadcast invalidations
- Check on Use
- Callbacks
None of the above locks server state to block changes (as a lock mechanism would); they all rely on checking instead.
- Leases
- Skip Scary Parts
- Faith-Based Caching
Leases
- Caching site obtains finite-duration control from the master copy
- The duration is called the "lease period", typically a few seconds
- Multiple sites can obtain a read lease; only one can hold a write lease
- Leases have to be renewed, else control expires with the lease
- Multiple read leases, or at most one write lease, at a time

A mutual-exclusion lock is essentially a lease that at most one site may hold at a time. Multiple read leases may coexist, but only one write lease, and read and write leases are mutually exclusive; holding no lease at all is also allowed. A holder may also let a lease expire rather than renew it.
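The read/write lease rules above can be sketched as bookkeeping at the master copy. This is an illustrative sketch, not the protocol from the Leases paper: the class and method names are invented, and a real server would also queue waiters and handle renewals.

```python
import time

class LeaseManager:
    """Hypothetical sketch of per-object lease bookkeeping at the master copy.

    Multiple read leases may coexist; a write lease is exclusive.
    Expired leases are simply forgotten -- no revocation message is needed,
    which is exactly why control "expires with the lease".
    """

    def __init__(self, lease_period=5.0):           # "typically a few seconds"
        self.lease_period = lease_period
        self.read_leases = {}                       # site -> expiry time
        self.write_lease = None                     # (site, expiry) or None

    def _expire(self, now):
        # Drop any lease whose period has run out.
        self.read_leases = {s: t for s, t in self.read_leases.items() if t > now}
        if self.write_lease and self.write_lease[1] <= now:
            self.write_lease = None

    def acquire_read(self, site, now=None):
        now = time.monotonic() if now is None else now
        self._expire(now)
        if self.write_lease and self.write_lease[0] != site:
            return False                            # another site holds the write lease
        self.read_leases[site] = now + self.lease_period
        return True

    def acquire_write(self, site, now=None):
        now = time.monotonic() if now is None else now
        self._expire(now)
        others_read = any(s != site for s in self.read_leases)
        if others_read or (self.write_lease and self.write_lease[0] != site):
            return False                            # writer must wait for leases to drain
        self.write_lease = (site, now + self.lease_period)
        return True
```

Passing `now` explicitly makes the expiry behavior easy to see: a writer blocked by readers succeeds once their lease periods elapse, with no revocation traffic at all.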
Limiting cases
- lease duration = 0 → check on use
- lease duration = ∞ → callback (targeted notification)
Advantages
- Generalizes the check-on-use and callback schemes
- Lease duration can be tuned to adapt to the mutation rate
  lease duration is a clean tuning knob for design flexibility
- Conceptually simple yet flexible
Key Assumption
Clocks tick at the same rate everywhere
- clocks do NOT have to be synchronized
- absolute time does not matter
- only relative time (i.e., clock tick rate) matters
Time becomes a hidden communication channel

Client and server both use the passage of time as a shared reference. Choosing the lease duration is hard: the server decides it, but under high network latency the client's effective lease is shorter than intended, and client/server clock drift must also be considered. Adaptive lease mechanisms address these issues.
Disadvantages
- Lease-holder has total autonomy during the lease; revocation?
- Writers are delayed while read-lease holders complete their leases
- More traffic than callback (but less than check on use)
  lease traffic sits between callback (low traffic) and check on use (high traffic): keepalives for callback are per server, not per lease, while lease traffic is proportional to the number of leased objects

Paper: "Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency"
Skip Scary Parts
Base Idea
- When write-sharing is detected, turn off caching everywhere
  all references then go directly to the master copy
- Resume caching when write-sharing ends

When write-sharing is detected, all caches of the object are switched off and every access goes to the master copy; disabling caching is targeted but coarse-grained.
Advantages
- Precise single-copy semantics (even at byte-level consistency)
- Excellent fallback position
  exemplifies good engineering: "Handle the average case well; the worst case safely"
  the Coda protocol achieves this differently: it treats write-sharing as an exception (conflict)
- Good adaptation of caching aggressiveness to workload

Disadvantages
- Server maintains state
- Server must be aware of every use of data (every open)
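The write-sharing detection that drives this scheme can be sketched as open/close bookkeeping at the server (which is exactly the state the disadvantages refer to). A hypothetical sketch with invented names; a real server would also notify caches to invalidate when caching is switched off.

```python
class ScaryPartsServer:
    """Sketch: disable caching for an object while it is write-shared.

    Write-sharing means some site has the object open for writing while
    any other site is also using it.
    """

    def __init__(self):
        self.readers = {}   # object -> set of sites with it open for reading
        self.writers = {}   # object -> set of sites with it open for writing

    def open(self, obj, site, for_write):
        table = self.writers if for_write else self.readers
        table.setdefault(obj, set()).add(site)

    def close(self, obj, site):
        self.readers.get(obj, set()).discard(site)
        self.writers.get(obj, set()).discard(site)

    def caching_allowed(self, obj):
        w = self.writers.get(obj, set())
        r = self.readers.get(obj, set())
        # Write-sharing: more than one writer, or one writer plus any other reader.
        write_shared = len(w) > 1 or (len(w) == 1 and bool(r - w))
        return not write_shared
```

Caching resumes automatically as soon as the last conflicting close happens, matching "resume caching when write-sharing ends".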
Faith-Based Caching
Often given the dignified term "eventual consistency"

Base Idea
- Blindly assume cached data is valid for a while
- Periodically check (based on the time since the last check)
- No communication needed during the trust period

Original use
A small variant is a per-object TTL field, as used in web caching; it gives the content creator a modicum of control.
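A minimal sketch of faith-based caching with the TTL variant: entries are trusted blindly until their TTL runs out, and only then rechecked against the master copy. The class name and `fetch` callback are assumptions for illustration.

```python
import time

class TTLCache:
    """Sketch of faith-based caching: trust an entry until its TTL expires."""

    def __init__(self, fetch, ttl=30.0):
        self.fetch = fetch          # callback: key -> fresh value from the master copy
        self.ttl = ttl              # the "trust period"
        self.entries = {}           # key -> (value, expiry time)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        hit = self.entries.get(key)
        if hit and hit[1] > now:
            return hit[0]           # inside the trust period: no communication at all
        value = self.fetch(key)     # trust expired: recheck with the master copy
        self.entries[key] = (value, now + self.ttl)
        return value
```

Note the server-side simplicity this buys: the server never learns who cached what, which is why it can stay stateless.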
Imprecise and weak approximation to one-copy semantics
- not just in the presence of network failures (partitions)
- no clear basis for reasoning about the state of the system
  with such weak consistency, developers and users have no firm rule for deciding whether the data just read is current, or whether a cache agrees with the master copy. There is no way to tell how soon after a write all nodes will see it, or whether a cached entry has already gone stale; the system's state is fuzzy.

DNS caching uses exactly this mechanism; in a distributed file system it is not a wise choice. Faith-based caching is no different from eventual consistency, and just as disappointing. Its core value is trading bounded inconsistency for high performance and low cost through a trust period plus periodic revalidation.

Methods 1-5 used controlled and precise approximations
- e.g., session semantics at open-close granularity
- offered a clear basis for reasoning about system state
Advantages
- Simple implementation
- Server is stateless

Disadvantages
- User-visible inconsistencies are sometimes seen
- Blind faith is sometimes misplaced!
- Not as efficient as callback-based schemes
Consistency guarantee offered by DropBox
Even though DropBox is not a caching system, its consistency model across replicas is important.
Remember: "Technical excellence is only weakly correlated with business success."
The inverse is also true: "Business success does not imply technical excellence."
(MS-DOS, Sun NFS v3, and DropBox all exemplify this point)
Pass the Buck
Basic Idea
- Let the user trigger cache revalidation (hit "reload")
- Otherwise, all cached copies are assumed valid forever
- Equivalent to infinite-TTL faith-based caching
- Arose in the context of the Web

Advantages
- Trivial to implement, no server changes
- Avoids frivolous cache maintenance traffic

Disadvantages
- Places the burden on the user
  the user may be clueless about the level of consistency needed

Easier to reason about than faith-based caching
- until you hit "reload", your world is guaranteed to be frozen
Picking a Victim - I
May involve many factors
- how soon the evicted object might be needed again
- how expensive it might be to fetch again
- how much space is freed up by the eviction

Ideal victim: a very large object that is not needed for a long time and is very cheap to fetch again. That is the ideal case for an eviction policy; in distributed systems, Murphy's Law always applies.

In the face of failures, the pain of a non-serviceable miss is also relevant
(discussion of this is deferred to disconnected operation)
The assumption of uniform fetch cost is reasonable near the hardware. Non-uniform cost is more likely at higher system layers:
- variable-size objects are an obvious source of non-uniform cost
- rotational/seek delay in disks is another source
- differing network quality to different servers
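One hedged way to combine the three factors above into a single victim score (field names are invented for illustration; real cost-aware policies such as GreedyDual-Size weight cost and size more carefully):

```python
def pick_victim(objects, now):
    """Sketch: rank eviction candidates by the three factors in the notes.

    Each object is a dict with hypothetical fields:
      last_used  -- recency, a crude proxy for time until the next reference
      fetch_cost -- how expensive it would be to fetch again
      size       -- how much space eviction frees up

    A long-idle, large, cheap-to-refetch object scores as the best victim.
    """
    def score(obj):
        idle = now - obj["last_used"]
        return (idle * obj["size"]) / obj["fetch_cost"]
    return max(objects, key=score)
```

With uniform size and fetch cost this degenerates to plain LRU, which is exactly the simplification the next section assumes.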
Picking a Victim - II
The vast majority of research assumes:
- fixed size, fixed fetch cost, equal importance
- the only significant variable is then the distance in the future to the next reference to an object
- considerable implementation simplicity: bit maps for allocation, trivial accounting
- many examples
  hardware: cache entries, cache lines
  disk I/O: blocks, sectors
  OS: virtual memory pages, disk blocks
Abstract problem formulation from the above assumptions:
- a set of equal-sized data containers (aka "frames" in VM management)
- a large set of equal-sized, equal-importance data objects (aka "pages")
- a sequence of integers (the "reference string") → ids of the accessed pages
- the only metric of interest is the miss ratio

This is the formulation behind caching in operating systems, virtual memory, and hardware.
Predicting Distance to Next Reference
If the reference string is known in advance, the problem is simple
- OPT is the (theoretical) optimal replacement algorithm
  all online (i.e., real) algorithms give higher or equal miss ratios
- pick the object that will be referenced furthest in the future
  this requires knowledge of the future
- the challenge is to approximate OPT as closely as possible
  there is a huge amount of work in this area

Nothing approximates OPT closely over all reference strings
- LRU ("least recently used") is often a good approximation to OPT (not always)
- it assumes the recent past is a good predictor of the near future
  captures temporal locality very well and induces a total ordering of frames
In many practical scenarios, LRU is an excellent approximation to the optimal policy.
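Under the abstract formulation (equal-sized pages, miss ratio as the only metric), OPT and LRU can be compared directly on a reference string. A small sketch; OPT's tie-breaking among pages never referenced again is arbitrary, since any of them is an equally good victim.

```python
from collections import OrderedDict

def miss_ratio_opt(refs, frames):
    """Belady's OPT: evict the page whose next reference is furthest away."""
    cache, misses = set(), 0
    for i, page in enumerate(refs):
        if page in cache:
            continue
        misses += 1
        if len(cache) >= frames:
            future = refs[i + 1:]
            # Pages never referenced again get an effectively infinite distance.
            victim = max(cache, key=lambda p: future.index(p)
                         if p in future else len(future) + 1)
            cache.remove(victim)
        cache.add(page)
    return misses / len(refs)

def miss_ratio_lru(refs, frames):
    """LRU: evict the least recently used page."""
    cache, misses = OrderedDict(), 0
    for page in refs:
        if page in cache:
            cache.move_to_end(page)   # touch: becomes most recently used
            continue
        misses += 1
        if len(cache) >= frames:
            cache.popitem(last=False) # evict least recently used
        cache[page] = True
    return misses / len(refs)
```

A cyclic reference string one page larger than the cache is a classic case where LRU misses on every reference while OPT does much better, illustrating "good approximation (not always)".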
Typical LRU Data Structure
LRU (Least Recently Used) is a cache eviction algorithm built on temporal locality: the entry that has gone unused the longest is assumed least likely to be needed soon, so it is evicted first. The algorithm maintains a data structure ordered by access time; when cache space runs out, it evicts the entry that has been unaccessed the longest.
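The typical structure is a hash map combined with a recency-ordered list; Python's OrderedDict provides both in one object. A minimal sketch:

```python
from collections import OrderedDict

class LRUCache:
    """Sketch of the typical LRU structure: hash map + recency-ordered list.

    Most recently used entries live at the end of the OrderedDict; the
    least recently used entry sits at the front, ready for eviction.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # touch: becomes most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)      # updating a key also counts as a use
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the least recently used entry
```

Production implementations use the same idea with a hash map pointing into a doubly linked list, giving O(1) lookup, touch, and eviction.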
- Author:newrain-zh
- URL:https://alex.sh.cn/article/cmu-15-440/cache-4
- Copyright: Unless otherwise stated, all articles in this blog are released under the BY-NC-SA license. Please indicate the source!