Skip to main content

Randomised Algorithms 隨機算法

Categories of Algorithms by design paradigm 按設計範式劃分的算法類別

  • Divide and conquer algorithms, e.g., quicksort algorithm 分而治之算法,例如快速排序算法
  • Dynamic programming algorithms 動態規劃算法
  • Mathematical programming algorithms, e.g., linear programming 線性規劃算法
  • Search and enumeration algorithms 搜索與列舉算法
    • Brute force algorithms, enumerating all possible candidate solutions and check 蠻力算法,枚舉所有可能的候選解決方案並檢查
    • Improved brute force algorithms, e.g., branch and bound algorithms 改進的蠻力算法,例如分支界限算法
    • Heuristic algorithms 启發式算法
      • Local search, e.g., greedy search 局部搜索,例如貪婪搜索
      • Randomised algorithms, which include Evolutionary Computation, etc. 隨機算法,包括演化計算等

Heuristic algorithms 启發式算法

  • CS Definition of heuristic: a (usually simple) algorithm that produces a good enough solution for a problem in a reasonable time frame CS 啟發式的定義:一種(通常是簡單的)算法,可以在合理的時間範圍內為問題產生足夠好的解決方案
    • Heuristic: "'find' or 'discover'" 啟發式:"'查找'或'發現'"
    • Solutions: (Usually) Non-optimal but satisfactory 解決方案:(通常)非最佳但令人滿意的
    • Faster: alternative to brute force (exhaustive) search 更快:替代蠻力(窮舉)搜索
    • Trade of optimality, completeness, accuracy, or precision for speed. Usually to solve problems that are difficult for other algorithms, e.g., brute force algorithms 以速度為代價的最佳化,完整性,準確性或精度。通常用於解決其他算法難以解決的問題,例如蠻力算法
    • Includes deterministic, e.g., local search and randomised algorithms 啟發式算法包括確定性算法和隨機算法

Randomised algorithms 隨機算法

  • Randomised algorithm: An heuristic algorithm that makes random choices during execution to produce a result 隨機算法:一種啟發式算法,在執行過程中進行隨機選擇以產生結果
    • Takes a source of random numbers to make random choices 隨機選擇的來源是隨機數
    • Behaviour, e.g., output or running time can vary even on a fixed input 執行行為,例如輸出或運行時間,即使在固定輸入的情況下也可能變化
  • The goal: design an algorithm and analyse it to show that its behaviour is likely to be good, on every input 目標:設計一個算法並對其進行分析,以表明它的行為可能對每個輸入都是好的
  • Two categories of randomised algorithms: 隨機算法的兩個類別:
    • Using random numbers to help find a solution to a problem 使用隨機數幫助找到問題的解決方案
    • Using random numbers to improve a solution to a problem 使用隨機數改進問題的解決方案
  • For the first category, two representative simple algorithms: 第一類別的兩個代表性簡單算法:
    • Las Vegas algorithm 拉斯維加斯算法
    • Monte Carlo algorithm 蒙特卡羅算法

Randomised algorithms vs deterministic algorithms 隨機算法 vs 確定性算法

1

Monte Carlo algorithm vs Las Vegas algorithm 蒙特卡羅算法 vs 拉斯維加斯算法

  • Las Vegas algorithm: a randomised algorithm that always gives correct results, the only variation from one run to another is the running time 一種始終給出正確結果的隨機算法,每次運行的唯一變化是運行時間
  • Monte Carlo algorithm: a randomised algorithm whose running time is deterministic, but whose results may be incorrect with a certain (typically small) probability. 一種始終給出正確結果的隨機算法,每次運行的唯一變化是運行時間
  • Differences:
    • Monte Carlo algorithm runs for a fixed number of steps 蒙特卡洛算法運行固定步數
    • Las Vegas algorithm runs in an infinite loop until the correct results are found 拉斯維加斯算法運行無限循環,直到找到正確的結果
    • Las Vegas algorithm can be converted into Monte Carlo algorithm using early termination 拉斯維加斯算法可以使用早期終止轉換為蒙特卡洛算法

Why use Randomised Algorithms? 何時使用隨機算法?

  • The element search problem is very simple, but 元素搜索問題很簡單,但是
    • For deterministic sequential (linear) search algorithm, e.g., search the array one by one from the beginning: 對於確定性順序(線性)搜索算法,例如,從頭開始一個一個地搜索數組:
      • Average run time complexity is O(n+12)O(n)O(\frac{n+1}{2}) \approxeq O(n) 平均運行時間複雜度為 O(n+12)O(n)O(\frac{n+1}{2}) \approxeq O(n)
      • The worst run time complexity is O(n)O(n) 最壞的運行時間複雜度為 O(n)O(n)
    • For Las Vegas algorithm: 對於拉斯維加斯算法:
      • The average run time complexity depends on the input, e.g., the array (Question: what is the average time complexity of if half the array contains 0s, the other half contains 1s?) 平均運行時間複雜度取決於輸入,例如數組(問題:如果數組的一半包含 0,另一半包含 1,則平均時間複雜度為多少?)
      • The worst run time complexity is unbound, i.e., could get super unlucky 最壞的運行時間複雜度是無限的,即可能遇到非常不幸的情況
    • For Monte Carlo algorithm: 對於蒙特卡洛算法:
      • The run time (including worst) complexity is fixed: O(1)O(1) 運行時間(包括最壞)複雜度是固定的:O(1)O(1)
      • But with some probability of error: not finding the element 但是有一定的錯誤概率:找不到元素

Warming up example: quicksort algorithm 預熱示例:快速排序算法

  • The sorting problem: given a array A of n numbers, sort the numbers in increasing order (or decreasing order) 排序問題:給定一個包含 n 個數字的數組 A,將數字按照升序(或降序)排序

2

Example 2: randomised quicksort algorithm 示例 2:隨機快速排序算法

  • Deterministic quicksort algorithm time complexity: O(n log n) on average for a random permutation array (n is the size of the array) 確定性快速排序算法時間複雜度:對於隨機排列數組(n 是數組的大小),平均為 O(n log n)
  • The worst case: O(n2)O(n^2), e.g., for an sorted array (n is the size of the array) 最壞情況:O(n2)O(n^2),例如對於排序過的數組(n 是數組的大小)
  • Randomised quicksort algorithm: random partitioning, i.e., selecting a pivot randomly: 隨機快速排序算法:隨機分區,即隨機選擇一個基準值:
    • Average run time complexity: O(nlogn)O(n log n) 平均運行時間複雜度:O(nlogn)O(n log n)
    • The worst case: O(2nlogn)O(2n log n) 最壞情況:O(2nlogn)O(2n log n)
    • Proof: Probabilistic Analysis and Randomized Quicksort
  • An example of the second category of randomised algorithms: using random numbers to improve a solution to a problem 一個第二類別隨機算法的例子:使用隨機數改進問題的解決方案

Problem 1: Matching one Bolt to n Nuts 一個螺栓與 n 個螺帽的配對問題

The everyday problem: given one bolt and a collection of n nuts of different sizes, find a nut match the bolt (i.e., the nut is the same size as the bolt) 每天的問題:給定一個螺栓和 n 個不同大小的螺帽,找到一個螺帽與螺栓配對(即螺帽的大小與螺栓相同)

In mathematical form: given an array of n elements, find the first element of which the value equals to x 數學形式:給定一個包含 n 個元素的數組,找到第一個值等於 x 的元素

How to solve it using algorithms? How to solve it efficiently? 如何使用算法解決它?如何有效地解決它?

Solving Problem 1 using Randomise algorithms 用隨機算法解決問題 1

The problem: given an array of n elements, find the first element of which the value equals to x 這個問題:給定一個包含 n 個元素的數組,找到第一個值等於 x 的元素

Solution to Problem 1

Solution 1: Las Vegas algorithm 拉斯維加斯算法

begin
repeat
Randomly select one element a out of n elements.
until a == x
end

Solution 2: Monte Carlo algorithm 蒙特卡羅算法

begin
i := 0
repeat
Randomly select one element a out of n elements.
i := i + 1
until (a == x)||(i == k)
end

Problem 2: How can you help a disorganised carpenter? 如何幫助一個不太有條理的木匠?

A disorganized carpenter has a collection of n nuts of distinct sizes and n bolts and would like to find the corresponding pairs of bolts and nuts. Each nut matches exactly one bolt (and vice versa), and he can only compare nuts to bolts i.e., he can neither compare nuts to nuts nor bolts to bolts. The carpenter is disorganised and does not know which nut matches which bolt. 一個不太有條理的木匠有 n 個不同大小的螺帽和 n 個螺栓,他想要找到對應的螺栓和螺帽。每個螺帽與一個螺栓完全匹配(反之亦然),他只能比較螺帽與螺栓,即他不能比較螺帽與螺帽,也不能比較螺栓與螺栓。木匠不太有條理,不知道哪個螺帽與哪個螺栓匹配。

The brute-force solution: to compare each nut with all bolts to find the matching bolt (i.e., the bolt of the same size as the nut). 野蠻解決方案:對每個螺帽與所有螺栓進行比較,以找到匹配的螺栓(即與螺帽相同大小的螺栓)。

Time complexity: O(n2)O(n^2) 複雜度:O(n2)O(n^2)

Can you help the carpenter match the nuts and bolts quickly? 你能幫助木匠快速匹配螺帽和螺栓嗎?

Solution to Problem 2 解決問題 2 的解決方案

  • Question: can you design an algorithm based on the idea of randomised quick sort to help the disorganised carpenter (Problem 2)? (The complexity should be O(nlogn)O(n log n)) 你能設計一個基於隨機快速排序算法的想法來幫助不太有條理的木匠(問題 2)嗎?(複雜度應該是 O(nlogn)O(n log n)
  • This is not a trivial problem for deterministic algorithms: see the paper: Matching Nuts and Bolts (Noga Alon et al.) - better pdf 這不是確定性算法的一個微不足道的問題:請參閱論文:Matching Nuts and Bolts (Noga Alon et al.) - better pdf

Problem 3: How can you help Dan Brown? 如何幫助丹·布朗?

Dan Brown is having holiday on Mars. He discovers a digital manuscript (a long string) of The Da Vinci Code in his laptop. He wants to compare this manuscript with his final version in his PC (also a long string) at home on Earth to see whether they are the same. He can access his manuscript in his PC at home, but the communication is very expensive. 丹·布朗正在火星上度假。他在筆記本電腦中發現了《達芬奇密碼》的數字手稿(一長串)。他想將這份手稿與他在地球上家中 PC(也是一長串)中的最終版本進行比較,看看它們是否相同。他可以在家裡的個人電腦上訪問他的手稿,但通信費用非常高。

What should he do to compare these two versions via this expensive communication channel? 他應該怎麼做才能通過這個昂貴的通信通道來比較這兩個版本?

Hint: Richard M. Karp (1991), An introduction to randomized algorithms, 34, Discrete Mathematics, 175-176, Algorithm 5.2

Solution to Problem 3 解決問題 3 的解決方案

A disorganized carpenter has a collection of n nuts of distinct sizes and n bolts and would like to find the corresponding pairs of bolts and nuts. Each nut matches exactly one bolt (and vice versa), and he can only compare nuts to bolts i.e., he can neither compare nuts to nuts nor bolts to bolts. The carpenter is disorganised and does not know which nut matches which bolt. 一個不太有條理的木匠有 n 個不同大小的螺帽和 n 個螺栓,他想要找到對應的螺栓和螺帽。每個螺帽與一個螺栓完全匹配(反之亦然),他只能比較螺帽與螺栓,即他不能比較螺帽與螺帽,也不能比較螺栓與螺栓。木匠不太有條理,不知道哪個螺帽與哪個螺栓匹配。

The brute-force solution: to compare each nut with all bolts to find the matching bolt (i.e., the bolt of the same size as the nut). 野蠻解決方案:對每個螺帽與所有螺栓進行比較,以找到匹配的螺栓(即與螺帽相同大小的螺栓)。

Time complexity: O(n2)O(n^2) 複雜度:O(n2)O(n^2)

Can you help the carpenter match the nuts and bolts quickly? 你能幫助木匠快速匹配螺帽和螺栓嗎?

Hint: what if the carpenter does NOT have the constraints, i.e., he can compare nuts to nuts or bolts to bolts. 提示:如果木匠沒有約束怎麼辦,即,他可以將螺母與螺母或螺栓與螺栓進行比較。

What is the essence of this Nuts and Bolts Problem (Lock and Key problem)? 這個螺帽和螺栓問題(鎖和鑰匙問題)的本質是什麼?

The one solution to the above problems 以上問題的唯一解決方案

Randomised algorithms 隨機算法

  • "For many problems, a randomised algorithm is the simplest, the fastest or both" - Prabhakar Raghavan, Vice President of Engineering at Google. "對於許多問題,隨機算法是最簡單的,最快的或兩者兼具。" - Prabhakar Raghavan,谷歌工程副總裁。

Applications of randomised algorithms 隨機算法的應用

  • Mathematics: 數學
    • Number theory, e.g., primality test 數論,例如素性檢驗
    • Computational Geometry: graph algorithms, e.g., minimum cut 計算幾何:圖形算法,例如最小割
    • Linear algebra: matrix computations 線性代數:矩陣計算
  • Computer Science: 電腦科學
    • Data analysis: PageRank 數據分析:PageRank
    • Parallel computing: Deadlock avoidance 平行計算:避免死鎖
    • Optimisation: Nature inspired Optimisation and Search algorithms 优化:自然启发的优化和搜索算法
  • Computational biology: DNA read alignment 計算生物學:DNA 讀取比對

Advantages/Disadvantages of randomised algorithms 隨機算法的優點/缺點

  • Advantages: 優點:
    • Simplicity: usually very easy to implement 簡單性:通常很容易實現
    • Performance: usually produce (near-) optimum solutions with high probability 性能:通常以高概率產生(近似)最佳解
  • Disadvantages: 缺點:
    • Getting incorrect answer with a finite probability. 以有限概率得到錯誤答案。
    • Solution: Repeat the algorithm many times and take the best answer. 解決方案:重複多次算法並取最佳答案。
    • Difficult to analyse the running time and probability of getting an incorrect solution 難以分析運行時間和獲得錯誤解的概率
    • Getting truly random numbers is impossible in practice. 實際上很難得到真正的隨機數。

Conclusion 結論

  • Randomness is a resource to improve simplicity and efficiency of algorithms. 隨機性是提高算法簡單性和效率的資源。
  • In fact, randomness is also the important 'resource' in other fields, such as biology, physics and even theology. 事實上,隨機性也是其他領域(如生物學,物理學甚至神學)中重要的"資源"。
  • "For many problems, a randomised algorithm is the simplest, the fastest or both" - Prabhakar Raghavan, Vice President of Engineering at Google. "對於許多問題,隨機算法是最簡單的,最快的或兩者兼具。" - Prabhakar Raghavan,谷歌工程副總裁。

Further readings