📖Lecture 3 Ideal vs Real Distributed Systems
2025-7-17
2025-7-19
 
Ideal
  • Perfect in-order delivery of uncorrupted packets with predictable delays
  • Perfect server hardware with predictable delays
  • Perfect operating conditions at the server’s data center (no natural disasters, air-conditioning failures, power failures, cooling-system failures, and so on)
Real
  • Packet loss, corruption, re-ordering, unpredictable delays
  • Flaky server hardware and buggy server software
  • Power failures, air-conditioning failures, hurricanes, tornadoes, tsunamis, floods, …
 
Failure independence of clients and servers adds complexity: either side can fail while the other keeps running
 

RPC Transport

Local procedure call: 1 invocation = 1 execution
Trivial to emulate in an ideal distributed system
How to guarantee this in spite of system flakiness?
 
Two approaches to handling this flakiness
  • Approach 1: Outsource! Why suffer this headache? (That is, rely on a mature library or framework.)
  • Approach 2: The buck has to stop somewhere! Do it yourself
 

Approach 1: Outsource pain

Use TCP as foundation
  • layer RPC on top of it
  • simpler code (Project 1)
TCP = “Transmission Control Protocol” guarantees
  • reliable delivery (no data is ever lost or corrupted)
  • in-order delivery (bytes arrive in the exact order they were sent)
  • unlimited data size (feel free to ship a GB if you want)
  • abstraction of a continuous pipeline between sender and receiver
Handing transport off to TCP spares you the cost of implementing reliable delivery yourself.
TCP does not promise to keep a message in one piece: the data passed to a single write() may not be carried, or delivered, as a single unit.
 
What is NOT guaranteed (price paid for using TCP)
  • data inserted in chunks of a certain size comes out in chunks of that same size
  • no preservation of write() boundaries
  • aka “data is re-framed in transit”
read() may return fewer than the number of bytes requested
This part is mainly about TCP and the price of using it. Watch out for "sticky packets" (several messages coalesced into one read) and for a single message being split across reads.
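Because TCP does not preserve write() boundaries, an RPC layer on top of it has to add its own framing. A minimal sketch, assuming a 4-byte big-endian length prefix (the helper names `send_msg`, `recv_msg`, and `recv_exact` are hypothetical, not from the lecture):

```python
import socket
import struct

def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def send_msg(sock, payload):
    """Prefix the payload with its length so the receiver can re-frame it."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_msg(sock):
    """Read one length-prefixed message, however TCP chose to split it."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

Two messages sent back to back may arrive glued together in the byte stream, yet `recv_msg` still returns them one at a time, because the reader trusts the length prefix rather than the size of any individual read.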
 

Approach 2: Do it Yourself

Basic idea → Retransmission
  • packets are commonly lost for transient reasons
  • giving up too soon is pessimistic
    • (maybe the server never received your request)
       
Implementation
  • send request packet, then start timer
  • if no reply has arrived when the timer goes off, retransmit and restart the timer
  • … and again … and again … and again
  • finally give up and declare failure
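The loop above can be sketched as follows. This is only an illustration under stated assumptions: `send` is a hypothetical callable that transmits one packet, and `replies` is a `queue.Queue` on which some receiver thread (not shown) would place the reply.

```python
import queue

def call_with_retransmit(send, replies, request, timeout_s=0.05, max_attempts=5):
    """Send `request`, retransmitting each time the timer expires.

    Returns (reply, attempts_used) or raises TimeoutError after giving up.
    """
    for attempt in range(1, max_attempts + 1):
        send(request)                      # (re)transmit the request packet
        try:
            # Block until a reply arrives or the timer goes off.
            return replies.get(timeout=timeout_s), attempt
        except queue.Empty:
            continue                       # timer expired: try again
    raise TimeoutError(f"no reply after {max_attempts} attempts")
```

Note that this is exactly the "blind" retransmission the notes warn about next: nothing here stops the server from executing the same request several times.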
Problem with blind retransmission
  • perhaps the server is still computing, or perhaps it is overloaded
  • or perhaps it sent a reply and this was lost
  • duplicate execution violates RPC semantics
Solution: Duplicate Elimination (using Sequence Numbers)
Note: TCP implements retransmission and duplicate elimination
Sequence numbers solve the duplicate-execution problem that retransmission creates (in effect, they make a retried call idempotent).
Different processes do not share a TCP connection.
A do-it-yourself implementation usually runs over UDP.
 

How TCP Ensures Delivery

TCP is a streaming protocol (aka “byte stream” protocol)
  • ACKs refer to byte numbers rather than packet numbers
  • breakup of the byte sequence into packets happens at a lower layer
📔
TCP sliding window and acknowledgements
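To make "ACKs refer to byte numbers" concrete, here is a deliberately simplified sketch of how a receiver could compute a cumulative acknowledgement. It assumes byte numbering starts at 0 and ignores sequence-number wraparound, window limits, and selective ACKs; the function name and the `buffered` dictionary are hypothetical.

```python
def next_expected_byte(ack, buffered, seg_start, seg_len):
    """Return the new cumulative ACK after a segment arrives.

    `ack` is the number of the next byte the receiver expects.
    `buffered` maps start byte -> length for segments held back by a gap.
    """
    if seg_start + seg_len > ack:          # ignore segments that are pure duplicates
        buffered[seg_start] = max(buffered.get(seg_start, 0), seg_len)
    # Advance over every buffered segment that is now contiguous.
    for start in sorted(buffered):
        if start > ack:
            break                          # gap: later bytes cannot be acknowledged yet
        ack = max(ack, start + buffered.pop(start))
    return ack
```

If bytes 0-99 arrive, then 200-299, the ACK stays at 100; once 100-199 arrives, the ACK jumps straight to 300. The ACK names a position in the byte stream, not a packet.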

Timeouts in Distributed Systems

How do you pick a perfect timeout value?
  • in the worst case, no perfect value exists
  • at best, using known statistics, one can pick a “reasonable” value
    • can be wrong, sometimes giving up too soon
  • no matter what value is picked, it could be “too soon”
    • reply could arrive just after you give up
      Delay can be measured, and end-to-end response time can be estimated from its probability distribution: pick the mean plus one, two, or three standard deviations. This is what is done in practice, and even so the chosen value can still fall short in the worst case.
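The "mean plus k standard deviations" rule is easy to sketch. The function name `pick_timeout` and the sample values are assumptions for illustration:

```python
import statistics

def pick_timeout(samples_ms, k=2.0):
    """Return mean + k * (sample standard deviation) of observed response times."""
    mean = statistics.fmean(samples_ms)
    stdev = statistics.stdev(samples_ms)   # needs at least two samples
    return mean + k * stdev

# Response times of 10, 12, and 14 ms have mean 12 and standard deviation 2,
# so k = 2 gives a 16 ms timeout.
timeout_ms = pick_timeout([10, 12, 14], k=2.0)
```

A later reply that takes 17 ms would still be declared a timeout, which is the point of the notes: the value is reasonable, not perfect.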
       
What should the server do when it sees a duplicate?
May mean any of the following possibilities happened
  1. reply lost
  2. reply crossed retransmitted request (the reply was still in flight when the client retransmitted)
  3. compute time was excessive
  4. client was too impatient
 
Knowledge at server is always stale relative to client and vice versa
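The best the server can do is to retransmit the reply
Replies must be preserved
  • only 1 reply saved per connection
  • cannot re-compute reply
    • would result in multiple computations per invocation
Preserving replies is what avoids repeated computation. As for how many to keep, the server must keep at least the most recent reply.
Q: Could you explain again why replies must be preserved, and why the server saves only the latest reply per connection?
A: Only the latest reply needs to be saved, because the very fact that the next request has arrived means the client must have received your previous reply; otherwise the client would never have moved on to the next request.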
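A minimal sketch of duplicate elimination with one saved reply per client. It assumes each client has at most one call outstanding and uses increasing sequence numbers; the class name `DedupServer` and the `handler` callable are hypothetical.

```python
class DedupServer:
    """Executes each (client, seq) at most once and replays the saved reply."""

    def __init__(self, handler):
        self.handler = handler      # the real procedure (hypothetical callable)
        self.last = {}              # client_id -> (seq, reply): one reply per client

    def on_request(self, client_id, seq, args):
        saved = self.last.get(client_id)
        if saved is not None:
            saved_seq, saved_reply = saved
            if seq == saved_seq:
                return saved_reply  # duplicate: retransmit the reply, do not recompute
            if seq < saved_seq:
                return None         # stale copy of an older call: ignore it
        reply = self.handler(args)
        # A newer request implies the client saw the previous reply,
        # so overwriting it is safe.
        self.last[client_id] = (seq, reply)
        return reply
```

Retransmitting request 1 returns the cached reply without running the handler again, and the arrival of request 2 is what allows the reply to request 1 to be discarded.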

Exactly-once Semantics

theoretical ideal
How long to keep old replies and sequence numbers?
  • rigorous interpretation of “RPC” → forever!
  • across server crashes too
    • they have to be saved in non-volatile memory
    • server response has to be after non-volatile write
    • disk (or flash) latency on every RPC
  • clean undo of partial computations before crash
The reply cannot be sent in parallel with the disk write, so RPC performance is bounded by the performance of the storage device.
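The "respond only after the non-volatile write" rule might look like the sketch below. The function names, the JSON-lines log format, and the log path are all assumptions; the sketch also does not handle a crash between running the handler and writing the log, which is where the undo or transactional machinery mentioned in the notes would be needed.

```python
import json
import os

def execute_and_log(log_path, seq, handler, args):
    """Run the call, force its reply to stable storage, and only then return it."""
    reply = handler(args)
    record = json.dumps({"seq": seq, "reply": reply}) + "\n"
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(record)
        log.flush()
        os.fsync(log.fileno())      # the disk (or flash) latency paid on every RPC
    return reply                    # safe to send to the client only now

def recover(log_path):
    """After a restart, rebuild the table of saved replies from the log."""
    saved = {}
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                rec = json.loads(line)
                saved[rec["seq"]] = rec["reply"]
    return saved
```

The `os.fsync` call sits on the critical path of every request, which is why this design is slow.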
📖
Exactly-once
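For every remote call the caller issues, the callee guarantees that the business logic runs exactly once, even in the face of network failures, node crashes, and similar faults.
In essence this is a semantics emulated at the application layer through idempotency + an atomic state machine + a durable log. It can be understood as a remote call emulating a local call.
True exactly-once does not exist, but “transactions + idempotency + snapshots + manual repair” can come arbitrarily close to it.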
 
Such an RPC would have exactly-once semantics
  • success return from RPC call → call executed exactly once
  • call blocks indefinitely, no failure return
 
Not appropriate for many real applications
  • too slow because of synchronous disk writes
  • indefinite blocking unacceptable in many cases
  • application-level recovery precluded
  • requires transactional semantics for server actions
Exactly-once is cumbersome and inefficient, so real systems relax the semantics.

At-most-once Semantics

practically achievable
How to avoid indefinite blocking?
  • declare timeout if call takes longer than specified bound
 
Such an RPC has at-most-once semantics
  • refers to what can be inferred in the worst case
  • success → call executed exactly once
  • timeout → call executed once or not at all
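A small local simulation of what the caller can and cannot infer. A thread pool stands in for the remote server here, which is purely an assumption for illustration; `call_at_most_once` is a hypothetical name.

```python
import concurrent.futures
import time

def call_at_most_once(pool, procedure, args, timeout_s):
    """Issue the call once and wait up to timeout_s for its result.

    "success" means the procedure ran exactly once.
    "timeout" means the caller cannot tell: it ran once or not at all.
    """
    future = pool.submit(procedure, args)   # the request is issued a single time
    try:
        return "success", future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return "timeout", None
```

If the procedure sleeps for 200 ms and the caller waits only 20 ms, the caller gets "timeout" while the procedure still runs to completion in the background: the call executed once even though the caller was told nothing of the sort.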
 
Many possible reasons for RPC timeout
  • request and retries never got to server
  • server died while working on request
  • network broke while server working on request
  • server still working on request
  • server replied, but reply lost
  • server resent reply, but all copies of reply lost
 
Server may be sluggish or unreachable
  • complicates setting of timeout value
  • probes to check server health during long calls
  • server responds with busy if still working
  • essentially a keepalive mechanism
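A sketch of the probe idea, assuming two hypothetical callables: `poll_reply()` returns the reply or `None` if it has not arrived, and `probe()` returns `"busy"` when the server answers that it is still working, or `None` when the probe goes unanswered.

```python
import time

def wait_with_probes(poll_reply, probe, probe_interval_s=0.01, max_silent_probes=3):
    """Wait for a long call, giving up only when the server stops answering probes."""
    silent = 0
    while True:
        reply = poll_reply()
        if reply is not None:
            return reply
        if probe() == "busy":
            silent = 0                  # server is alive and working: keep waiting
        else:
            silent += 1                 # probe unanswered
            if silent >= max_silent_probes:
                raise TimeoutError("server stopped answering probes")
        time.sleep(probe_interval_s)
```

The design choice is that a slow but healthy server never trips the timeout; only consecutive silence does. Silence still cannot distinguish a dead server from a broken network.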

Orphaned Computations

Danger with at-most-once semantics
  • client sends request, server starts computing
  • network failure occurs
  • server continues, unaware its work is useless
    • server may hold resources (e.g. locks), slowing other activity
Orphan detection and extermination are difficult
typically require application-specific recovery
 
“Failure” closely related to “timeout value”
  • fundamental limitation in a distributed system
  • due to absence of out-of-band error detection
    • can’t tell server death from network failure ①
 
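The client sends a request to the server; while the server is still computing, the client times out or crashes and can never see the server's reply. Such a request is called an “orphaned request”.
Risks of orphaned computations: wasted compute resources, deadlocks and lock contention, dirty writes, divergent state, and so on.
For example, a call to an order-placement service times out while the service is merely still processing; the caller treats the call as failed, yet the order actually goes through. With complex enough business logic the mismatch becomes visible: the user receives an order-confirmation SMS while the app shows an order-failure message.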
📖
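① Without an out-of-band error-detection mechanism, a server crash cannot be distinguished from a network failure
A heartbeat mechanism alone cannot determine whether the server has crashed or the network has failed.
  1. Multi-dimensional health checks
    1. Out-of-band monitoring at the hardware level
    2. Multi-path probes
      1. Probe the target server simultaneously from different network regions
  2. Failure judgment assisted by distributed consensus protocols
    1. Quorum voting
    2. Failure detectors
      1. The Φ-accrual algorithm (as used by Akka/Aeron, for example); I am not familiar with it yet, so noting it here for now.
  3. Application-level design
    1. Bounded retries + the circuit-breaker pattern
    2. Have the heartbeat endpoint expose more information, so that clients check business metrics in the response rather than only the HTTP status code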
 