Week 4 Lecture Notes: Input Systems & Interaction Fundamentals

Virtual, Augmented and Spatial Computing

1 Overview

This week moves from hardware and perception (Weeks 1–3) into the first layer of interaction design: how users communicate intent to XR systems. We examine the full range of input modalities available across our device set and establish design principles that apply regardless of platform.


2 The Input Landscape in XR

Unlike desktop or mobile computing, XR has no settled input standard. Each platform has made different bets:

  • Meta Quest 2/3: Touch controllers + hand tracking
  • HTC Vive Pro: Wand controllers + SteamVR tracking
  • Pico Neo Eye: Controller + eye tracking
  • HoloLens 2: Hand tracking + gaze (no controller)
  • Snap Spectacles: Tap gesture + voice
  • Display glasses: Typically paired with phone/controller

This diversity is a design challenge. A well-designed XR experience should either:

  1. Target a specific device and optimise for its input, or
  2. Abstract input so it degrades gracefully across devices.


3 Controller-Based Interaction

3.1 Ray Casting

Ray casting is the most common far-field interaction technique. A ray is projected from the controller tip; when it intersects an interactable object, the user can select it.

Advantages:

  • Works at any distance
  • Familiar (analogous to a laser pointer)
  • Low fatigue

Disadvantages:

  • Imprecise for small targets
  • Breaks immersion (a visible ray is artificial)
  • Difficult for manipulation tasks
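The selection test behind ray casting reduces to a ray–primitive intersection query. A minimal sketch in Python, using a sphere as the interactable's collision proxy (the function name and the sphere proxy are illustrative, not any engine's API):

```python
import math

def ray_hits_sphere(origin, direction, center, radius):
    """Return the distance along the ray to the first intersection with a
    sphere, or None on a miss. `direction` must be a unit vector, and the
    ray origin is assumed to be outside the sphere."""
    # Vector from ray origin to sphere centre.
    oc = [c - o for o, c in zip(origin, center)]
    # Projection of oc onto the ray direction (closest approach distance).
    t = sum(a * b for a, b in zip(oc, direction))
    # Squared distance from the sphere centre to the ray.
    d2 = sum(a * a for a in oc) - t * t
    if d2 > radius * radius:
        return None              # ray passes outside the sphere
    half_chord = math.sqrt(radius * radius - d2)
    t0 = t - half_chord          # near intersection point
    return t0 if t0 >= 0 else None
```

An engine would run this (or a general collider query) every frame against all interactables and highlight the nearest hit.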

3.2 Direct Interaction

Near-field interaction where the controller (or virtual hand) physically overlaps with an object to grab or activate it.

Advantages:

  • Intuitive: mirrors real-world grasping
  • High precision for close objects

Disadvantages:

  • Requires moving close to objects
  • Can cause collisions with virtual geometry
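Where ray casting needs an intersection test, direct interaction needs only an overlap test between the hand (or controller) and the object's bounds. A sketch with sphere bounds and a small forgiveness margin (all values illustrative):

```python
def can_grab(hand_pos, obj_pos, obj_radius, grab_margin=0.02):
    """Near-field grab test: the hand must overlap the object's bounds,
    plus a small forgiveness margin (metres) so grabs feel reliable."""
    dist = sum((h - o) ** 2 for h, o in zip(hand_pos, obj_pos)) ** 0.5
    return dist <= obj_radius + grab_margin
```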

3.3 Haptic Feedback

Haptic feedback is a critical but often underused channel. Even simple vibration pulses significantly improve interaction confidence.

Design guidelines:

  • Use short pulses (50–100 ms) for selection confirmation
  • Use sustained vibration for “holding” states
  • Vary intensity to convey different interaction types
  • Never use haptics without a corresponding visual cue
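These guidelines can be captured as a small event-to-pulse table; the event names and parameter values below are illustrative choices, not platform constants:

```python
# Pulse parameters per interaction event, following the guidelines above:
# short pulses for selection, sustained low vibration while holding.
HAPTIC_PATTERNS = {
    "hover":  {"duration_ms": 50,   "amplitude": 0.2, "sustained": False},
    "select": {"duration_ms": 80,   "amplitude": 0.6, "sustained": False},
    "hold":   {"duration_ms": None, "amplitude": 0.3, "sustained": True},
}

def haptic_for(event):
    """Look up the pulse for an event. Unknown events get no haptics,
    which is safer than playing an arbitrary buzz."""
    return HAPTIC_PATTERNS.get(event)
```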


4 Hand Tracking

4.1 Skeletal Model

Modern hand tracking systems (Quest 2/3, HoloLens 2) track a 26-joint skeletal model per hand in real time. This enables:

  • Pinch detection (index + thumb proximity)
  • Custom pose recognition
  • Full finger articulation for expressive avatars
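Pinch detection reduces to comparing the thumb-tip and index-tip joint positions from the skeletal model. The sketch below adds hysteresis (separate begin/release thresholds), which is an implementation choice on my part rather than anything platform-specified, to stop the pinch state flickering near a single cutoff:

```python
class PinchDetector:
    """Pinch = index tip close to thumb tip. Two thresholds (hysteresis)
    prevent flicker when the distance hovers near one cutoff.
    Threshold values in metres are illustrative, not platform values."""

    def __init__(self, begin_at=0.02, release_at=0.04):
        self.begin_at = begin_at      # start pinching below this distance
        self.release_at = release_at  # stop pinching above this distance
        self.pinching = False

    def update(self, thumb_tip, index_tip):
        """Call once per frame with the two joint positions."""
        dist = sum((a - b) ** 2 for a, b in zip(thumb_tip, index_tip)) ** 0.5
        if self.pinching:
            if dist > self.release_at:
                self.pinching = False
        elif dist < self.begin_at:
            self.pinching = True
        return self.pinching
```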

4.2 Gesture Design Principles

Not all gestures are equal. Good XR gestures are:

  • Distinct: not easily confused with natural hand movement
  • Comfortable: can be held or repeated without fatigue
  • Discoverable: users can find them without instruction
  • Reversible: easy to cancel or undo

4.3 Limitations

  • Occlusion: hands block each other; fingers block joints
  • Lighting: poor lighting degrades tracking quality
  • Fatigue: sustained hand poses are tiring
  • Precision: less precise than controllers for small targets

5 Gaze and Eye Tracking

5.1 Gaze as Input

Gaze input uses where the user is looking as a selection signal. Two main patterns:

Dwell selection: Look at a target for a fixed duration (typically 1–2 seconds) to activate it.

  • Pro: completely hands-free
  • Con: slow, tiring, unnatural

Gaze + confirm: Look at a target, then use a secondary input (pinch, button, voice) to confirm.

  • Pro: fast, natural, avoids the Midas Touch problem
  • Con: requires a secondary input channel
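A dwell selector is essentially a per-frame timer keyed to the current gaze target. A minimal sketch (the frame-loop shape and the one-second default are assumptions):

```python
class DwellSelector:
    """Dwell selection: a target activates once gaze has rested on it
    for `dwell_s` seconds. Looking away resets the timer, which is one
    of the Midas Touch mitigations."""

    def __init__(self, dwell_s=1.0):
        self.dwell_s = dwell_s
        self.target = None
        self.elapsed = 0.0

    def update(self, gazed_target, dt):
        """Call once per frame with the currently gazed target (or None)
        and the frame delta time. Returns the target on the frame it
        activates, else None."""
        if gazed_target != self.target:
            # Gaze moved: restart the dwell on the new target.
            self.target, self.elapsed = gazed_target, 0.0
            return None
        if self.target is None:
            return None
        self.elapsed += dt
        if self.elapsed >= self.dwell_s:
            self.elapsed = 0.0   # continued gaze needs a full new dwell
            return self.target
        return None
```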

5.2 The Midas Touch Problem

Named after the mythological king: everything you look at turns to gold (activates). Gaze-only systems must carefully distinguish intentional gaze from casual glancing.

Mitigations:

  • Require dwell time
  • Use the gaze + confirm pattern
  • Provide clear visual feedback of gaze state
  • Allow users to disable gaze input

5.3 Foveated Rendering

Eye tracking enables foveated rendering: rendering the area the user is looking at in full resolution, and reducing quality in the periphery. This can dramatically reduce GPU load.
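The core of foveated rendering is mapping angular eccentricity from the gaze direction to a shading-rate or resolution scale. A sketch of that mapping only (the 10°/30° bands and the 0.25 floor are illustrative, not any vendor's values):

```python
import math

def foveation_scale(gaze_dir, view_dir, inner_deg=10.0, outer_deg=30.0):
    """Resolution scale for a view ray: full resolution (1.0) inside the
    foveal band, falling linearly to a 0.25 floor in the far periphery.
    Both directions must be unit vectors."""
    # Angular eccentricity between the gaze ray and this view ray.
    cos_e = sum(a * b for a, b in zip(gaze_dir, view_dir))
    ecc = math.degrees(math.acos(max(-1.0, min(1.0, cos_e))))
    if ecc <= inner_deg:
        return 1.0                      # fovea: render at full quality
    if ecc >= outer_deg:
        return 0.25                     # far periphery: quality floor
    t = (ecc - inner_deg) / (outer_deg - inner_deg)
    return 1.0 - t * 0.75               # linear falloff between bands
```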

Available in our device set on the Pico Neo Eye (hardware eye tracking). The Quest 3 has no eye tracking, so it supports only fixed (not gaze-driven) foveated rendering.

5.4 Analytics Use

Eye tracking data is valuable for UX research:

  • Attention heatmaps
  • Fixation duration on UI elements
  • Saccade patterns (rapid eye movements between points)
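Per-element gaze time, the basis of an attention heatmap, can be aggregated from timestamped gaze samples. A toy sketch (the sample format is an assumption; a real pipeline would first classify fixations vs. saccades):

```python
from collections import defaultdict

def fixation_durations(samples):
    """Aggregate per-element gaze time from (timestamp_s, element)
    samples. Each sample's duration is the gap to the next sample;
    `element` may be None when the gaze hits no UI element."""
    totals = defaultdict(float)
    for (t0, elem), (t1, _) in zip(samples, samples[1:]):
        if elem is not None:
            totals[elem] += t1 - t0
    return dict(totals)
```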

Ethics note: Eye tracking data is biometric. Treat it with the same care as fingerprint or facial recognition data.


6 Designing for Physical Constraints

6.1 Gorilla Arm Effect

Sustained arm elevation causes rapid fatigue. This was first observed in early touchscreen kiosks where users had to reach up to interact. In XR it is worse because:

  • Sessions can be longer
  • Users may not notice fatigue building
  • Sudden fatigue can cause loss of balance

Design rule: Default interaction zone is waist to shoulder height, within arm’s reach.
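This design rule can be checked programmatically when placing UI. A sketch with illustrative body measurements (metres, y up; none of these defaults come from a standard):

```python
def in_comfort_zone(target, waist_y=1.0, shoulder_y=1.4,
                    shoulder_pos=(0.0, 1.4, 0.0), reach=0.6):
    """True if a target position (x, y, z) sits between waist and
    shoulder height and within arm's reach of the shoulder."""
    _, y, _ = target
    if not (waist_y <= y <= shoulder_y):
        return False
    dist = sum((a - b) ** 2 for a, b in zip(target, shoulder_pos)) ** 0.5
    return dist <= reach
```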

6.2 Fitts’ Law in 3D

Fitts’ Law predicts movement time based on target size and distance:

MT = a + b × log₂(2D/W)

Where D = distance to target, W = target width.

In 3D XR:

  • Minimum comfortable target size: ~2 cm at arm’s length
  • Targets at the edge of the field of view take longer to acquire
  • Moving targets are significantly harder to select
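The formula is straightforward to compute. The constants a and b are fitted empirically per device and interaction technique, so the defaults below are placeholders, not measured values:

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time MT = a + b * log2(2D / W), where D is the
    distance to the target and W its width (same units). a and b are
    empirical constants; these defaults are illustrative only."""
    return a + b * math.log2(2.0 * distance / width)
```

As expected from the formula, halving target width at a fixed distance adds a constant b seconds to the predicted time.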

6.3 Interaction Zones

  Zone       Distance    Best Input
  Intimate   0–0.5 m     Direct grab, touch
  Personal   0.5–1.5 m   Near ray cast, hand
  Social     1.5–3 m     Ray cast, gaze
  Public     3 m+        Gaze, voice
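The zone table maps directly onto a small classifier for choosing a default input technique at runtime; a sketch:

```python
def interaction_zone(distance_m):
    """Classify a target's distance (metres) into the zone table above
    and return the zone name with its recommended inputs."""
    if distance_m <= 0.5:
        return "intimate", ["direct grab", "touch"]
    if distance_m <= 1.5:
        return "personal", ["near ray cast", "hand"]
    if distance_m <= 3.0:
        return "social", ["ray cast", "gaze"]
    return "public", ["gaze", "voice"]
```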

7 Interaction State Design

Every interactive object in XR should implement a clear state machine:

Default → Hover → Selected → Activated → Released

Each state transition should be communicated through at least two feedback channels (visual + audio, or visual + haptic).
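The state cycle above can be enforced with a transition table, with a feedback hook where the two-channel rule would be implemented. A minimal sketch (the legal-transition set is my reading of the state diagram, not a specification):

```python
class InteractableState:
    """Minimal state machine for Default -> Hover -> Selected ->
    Activated -> Released. `feedback` is the hook where at least two
    feedback channels (visual + audio or visual + haptic) would fire."""

    TRANSITIONS = {
        ("default", "hover"), ("hover", "default"),
        ("hover", "selected"), ("selected", "hover"),
        ("selected", "activated"),
        ("activated", "released"), ("released", "default"),
    }

    def __init__(self, feedback=None):
        self.state = "default"
        self.feedback = feedback or (lambda old, new: None)

    def transition(self, new_state):
        """Apply a transition if it is legal; return True on success."""
        if (self.state, new_state) not in self.TRANSITIONS:
            return False
        self.feedback(self.state, new_state)
        self.state = new_state
        return True
```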

7.1 State Design Checklist

  • Every state has a visibly distinct treatment
  • Every transition fires at least two feedback channels
  • Selection can be cancelled before activation (reversible)
  • Feedback works for every supported input modality (ray, hand, gaze)


8 Self-Check Questions

  1. What is the Midas Touch Problem and how can it be mitigated?
  2. Why does Fitts’ Law still apply in 3D XR environments?
  3. What are the three channels of interaction feedback?
  4. When would you choose hand tracking over controller input?
  5. What is foveated rendering and which device in our lab supports it?

9 References

  • LaViola, J.J. et al. (2017) 3D User Interfaces: Theory and Practice (2nd ed.). Addison-Wesley.
  • Bowman, D.A. et al. (2004) 3D User Interfaces: Theory and Practice. Pearson.
  • Fitts, P.M. (1954) “The information capacity of the human motor system in controlling the amplitude of movement.” Journal of Experimental Psychology, 47(6), 381–391.
  • Poupyrev, I. et al. (1996) “The go-go interaction technique.” UIST ’96 Proceedings.
  • Unity XR Interaction Toolkit: docs.unity3d.com/Packages/com.unity.xr.interaction.toolkit