Introduction

What is Computational Vision?

  • First consider "visual perception":
    • To know what is where, by looking.
    • Vision is the process of discovering from images what is present in the world, and where it is.

Visual perception is the acquisition of knowledge about objects and events in the environment through information processing of light emitted or reflected from objects.

  • To make computers "see"
  • "Automatic inference" of "properties" of "the world" from "images"
    • Automatic inference: inference with no (or minimal) human intervention
    • The world: the real, unconstrained 3D physical world, or constrained/engineered environments
    • Image: 2D projection of the electromagnetic signal provided by the world
    • Properties:
      • Geometric: shape, size, location, distance
      • Material: color, texture, reflectivity, transparency
      • Temporal: direction of motion (in 3D), speed, events
      • Illumination: light source specification, light source color
      • Symbolic: object's class, object's ID

Is it easy?

  • All people can "see" equally well
  • Babies can "see" from birth
  • Even very primitive animals can "see"
  • We "see" effortlessly (at least it feels this way)
  • Vision is immediate
  • Vision appears to be flawless

Computational Vision is challenging

  • Vision needs to reverse the imaging process, which is a many-to-one mapping (...recover lost information).
  • Vision needs to cope with an inherently imperfect imaging process (...recover lost information).
  • Vision needs to cope with discretized images of a practically continuous world (...recover lost information).
  • The sheer complexity of the task is enormous!
  • A large portion of our brain is dedicated to visual perception.
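
The "many-to-one" point can be made concrete with a small sketch. It assumes an ideal pinhole projection, x = fX/Z and y = fY/Z, with an arbitrary illustrative focal length; the function name `project` and the sample points are hypothetical and not part of the original notes.

```python
import math

FOCAL_LENGTH_M = 0.02  # assumed value, chosen only for illustration


def project(point, f=FOCAL_LENGTH_M):
    """Ideal pinhole projection of a 3D point (X, Y, Z) to image coordinates (x, y)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)


near = (1.0, 0.5, 2.0)
far = tuple(5.0 * c for c in near)  # same viewing ray, five times farther away

image_near = project(near)
image_far = project(far)

# Both points land on (numerically) the same image location, so depth is lost.
same_pixel = all(math.isclose(a, b) for a, b in zip(image_near, image_far))
print(image_near, image_far, same_pixel)
```

Every point along a viewing ray maps to the same image location, so inverting the projection requires extra constraints or prior knowledge, which is what the approaches listed next try to supply.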

Approaching the problem computationally

  • Constrain/simplify the world
  • Constrain/simplify the task (i.e., the desired output)
  • Devise universal guiding assumptions or heuristics
  • Incorporate explicit knowledge about the world
  • Use experience (learning) to improve performance

Applications of Computational Vision

  • Automated navigation with obstacle avoidance
  • Object/target detection and recognition
  • Place/scene recognition
  • Manufacturing and assembly
  • Document processing
  • Quality control
  • Biomedical applications
  • Accessibility tools for people with disabilities
  • Human-computer interfaces

Biological Vision

  • Light and image formation
  • Retinal processing
  • Colour vision
  • Visual pathway

Some Mathematics

  • Vectors and matrices
  • Magnitude and direction
  • Angle and rotation
  • Differentiation

Electromagnetic Spectrum


Visible Light

  • Humans perceive electromagnetic radiation with wavelengths of 380-760 nm (1 nm = $10^{-9}$ m)


Light Capturing Devices

  • In the beginning: formation of photopigments (more than 3 billion years ago)
    • Molecules in which light triggers a physical or chemical change.
    • Captured photons lead to a release of energy (in different forms)
    • Released energy is used for different purposes
      • Building food (photosynthesis)
      • Behavioral reaction (nerve reaction)

  • Photocells


Evolution of eyes

  • Single cell – 1D capture of light
  • Multiple cells – better direction resolution
  • But... where is the image?
  • A pinhole camera
  • Dilemma: a small pinhole gives a sharp but dim image, while a larger pinhole gives a brighter but blurrier one
  • Solution: use light refraction, and hence lenses


Refraction (Snell's Law)


Wave crests can't be created or destroyed at the interface, so to make the waves match up, the light has to change direction.
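
As a rough illustration of the direction change, Snell's law is usually written as n1·sin(θ1) = n2·sin(θ2), with angles measured from the surface normal. The sketch below assumes that standard form; the helper name `refraction_angle` and the approximate refractive indices are illustrative choices, not values taken from these notes.

```python
import math

N_AIR = 1.0     # approximate refractive index of air
N_WATER = 1.33  # approximate refractive index of water


def refraction_angle(theta1_deg, n1, n2):
    """Angle of the refracted ray in degrees, measured from the normal.

    Returns None when no refracted ray exists (total internal reflection).
    """
    s = n1 * math.sin(math.radians(theta1_deg)) / n2
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))


# Light entering a denser medium bends towards the normal.
print(refraction_angle(30.0, N_AIR, N_WATER))

# Going from water to air at a steep angle, no refracted ray exists.
print(refraction_angle(60.0, N_WATER, N_AIR))
```

The larger the difference in refractive index across the boundary, the stronger the bending, which is why a curved boundary between two media can act as a lens.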

Evolution of eyes

  • Formation of lens and retina


The Human Eye


Pinhole Camera Model


Pinhole Camera: Basic geometry


Pinhole Camera: Perspective projection


Image Formation


  • f = the focal length (in meters)

  • 1/f = the power of the lens (dioptres)

  • The human eye has a power of ~59 dioptres

    • Taking a round figure of 1/f = 50 dioptres:
    • f = 1/50 = 0.02 m
  • Most of the refractive power of the human eye comes from the air-cornea boundary (49 of 59 dioptres)

  • As an object moves closer, the power of the lens must increase to accommodate

  • So if the object is infinitely far away: $1/f = 1/\infty + 1/0.02 = 50$ dioptres

  • But if it is 1 m away, the lens must change shape to produce a sharp image: $1/f = 1/1 + 1/0.02 = 51$ dioptres
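
The two accommodation figures above follow the pattern 1/f = 1/u + 1/v, where u is the object distance and v is the lens-to-image distance (0.02 m in these notes). A minimal sketch of that arithmetic follows; the symbol names u and v and the function name `required_power` are assumptions introduced here for illustration.

```python
import math

IMAGE_DISTANCE_M = 0.02  # lens-to-image distance used in the notes' example


def required_power(object_distance_m, image_distance_m=IMAGE_DISTANCE_M):
    """Lens power in dioptres needed for a sharp image: 1/f = 1/u + 1/v."""
    return 1.0 / object_distance_m + 1.0 / image_distance_m


# An infinitely distant object contributes 1/inf = 0 to the required power.
for u in (math.inf, 1.0, 0.25):
    print(f"object at {u} m -> {required_power(u):.1f} dioptres")
```

With these inputs the function reproduces the 50 and 51 dioptre values from the notes, and shows that an object at 0.25 m would need a further few dioptres of accommodation.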

  • As an object moves in the world, how does it move across the image plane?

  • If the image plane is curved, then as θ gets larger this becomes a worse and worse approximation
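
The formula this remark refers to did not survive in the notes. One plausible reading, assumed here, is that a point seen at angle θ from the optical axis lands at f·tan(θ) on a flat image plane, whereas on a curved surface of radius f centred on the pinhole the corresponding arc length is f·θ. The sketch compares the two; the function names and the value of f are illustrative.

```python
import math

F = 0.02  # assumed pinhole-to-image distance in metres


def flat_plane_offset(theta_rad, f=F):
    """Distance from the image centre on a flat image plane."""
    return f * math.tan(theta_rad)


def curved_surface_offset(theta_rad, f=F):
    """Arc length from the image centre on a spherical surface of radius f."""
    return f * theta_rad


def relative_gap(theta_rad, f=F):
    """Fractional difference between the flat and curved offsets."""
    curved = curved_surface_offset(theta_rad, f)
    return (flat_plane_offset(theta_rad, f) - curved) / curved


for deg in (1, 10, 30, 60):
    t = math.radians(deg)
    print(f"{deg:2d} deg  flat={flat_plane_offset(t):.5f} m  "
          f"curved={curved_surface_offset(t):.5f} m  gap={relative_gap(t):.2%}")
```

Under this assumption the two models agree closely near the optical axis and diverge steadily as θ grows, which matches the qualitative claim in the notes.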

Summary

  • Module outline
  • Uses of Computational Vision
  • Image formation
  • Very early visual processing
  • Filling in and perceptual effects


  • Vicki Bruce, Visual Perception, Chapters 1 - 3
  • Neil Carlson, Physiology of Behavior, Chapter 3, "Vision"