When the Macintosh first made graphical user interfaces popular the notion of each person having their own computer was novel. Today's technology landscape is characterized by multiple computers per person many with far more capacity than that original Mac. The world of input devices, display devices and interactive techniques is far richer than those Macintosh days. Despite all of this diversity in possible interactions very few of these integrate well with each other. The monolithic isolated user interface architecture that characterized the Macintosh still dominates a great deal of today's personal computing. This talk will explore how possible ways to change that architecture so that information, interaction and communication flows more smoothly among our devices and those of our associates.
The emerging field of Human-Robot Interaction is undergoing rapid growth, motivated by important societal challenges and new applications for personal robotic technologies for the general public. In this talk, I highlight several projects from my research group to illustrate recent research trends to develop socially interactive robots that work and learn with people as partners. An important goal of this work is to use interactive robots as a scientific tool to understand human behavior, to explore the role of physical embodiment in interactive technology, and to use these insights to design robotic technologies that can enhance human performance and quality of life. Throughout the talk I will highlight synergies with HCI and connect HRI research goals to specific applications in healthcare, education, and communication.
We explore the use of tracked 2D object motion to enable novel approaches to interacting with video. These include moving annotations, video navigation by direct manipulation of objects, and creating an image composite from multiple video frames. Features in the video are automatically tracked and grouped in an off-line preprocess that enables later interactive manipulation. Examples of annotations include speech and thought balloons, video graffiti, path arrows, video hyperlinks, and schematic storyboards. We also demonstrate a direct-manipulation interface for random frame access using spatial constraints, and a drag-and-drop interface for assembling still images from videos. Taken together, our tools can be employed in a variety of applications including film and video editing, visual tagging, and authoring rich media such as hyperlinked video.
Watching a long unedited video is usually a boring experience. In this paper we examine a particular subset of videos, tour videos, in which the video is captured by walking about with a running camera with the goal of conveying the essence of some place. We present a system that makes the process of sharing and watching a long tour video easier, less boring, and more informative. To achieve this, we augment the tour video with a map-based storyboard, where the tour path is reconstructed, and coherent shots at different locations are directly visualized on the map. This allows the viewer to navigate the video in the joint location-time space. To create such a storyboard we employ an automatic pre-processing component to parse the video into coherent shots, and an authoring tool to enable the user to tie the shots with landmarks on the map. The browser-based viewing tool allows users to navigate the video in a variety of creative modes with a rich set of controls, giving each viewer a unique, personal viewing experience. Informal evaluation shows that our approach works well for tour videos compared with conventional media players.
A history-of-user-operations function helps make applications easier to use. For example, users may have access to an operation history list in an application to undo or redo a past operation. To provide an overview of a long operation history and help users find target interactions or application states quickly, visual representations of operation history have been proposed. However, most previous systems are tightly integrated with target applications and difficult to apply to new applications. We propose an application-independent method that can visualize the operation history of arbitrary GUI applications by monitoring the input and output GUI events from outside of the target application. We implemented a prototype system that visualizes operation sequences of generic Java Awt/Swing applications using an annotated comic strip metaphor. We tested the system with various applications and present results from a user study.
Panning and zooming interfaces for exploring very large images containing billions of pixels (gigapixel images) have recently appeared on the internet. This paper addresses issues that arise when creating and rendering auditory and textual annotations for such images. In particular, we define a distance metric between each annotation and any view resulting from panning and zooming on the image. The distance then informs the rendering of audio annotations and text labels. We demonstrate the annotation system on a number of panoramic images.
We describe OctoPocus, an example of a dynamic guide that combines on-screen feedforward and feedback to help users learn, execute and remember gesture sets. OctoPocus can be applied to a wide range of single-stroke gestures and recognition algorithms and helps users progress smoothly from novice to expert performance. We provide an analysis of the design space and describe the results of two experi-ments that show that OctoPocus is significantly faster and improves learning of arbitrary gestures, compared to con-ventional Help menus. It can also be adapted to a mark-based gesture set, significantly improving input time compared to a two-level, four-item Hierarchical Marking menu.
This paper introduces kinematic templates, an end-user tool for defining content-specific motor space manipulations in the context of editing 2D visual compositions. As an example, a user can choose the "sandpaper" template to define areas within a drawing where cursor movement should slow down. Our current implementation provides templates that amplify or dampen the cursor's speed, attenuate jitter in a user's movement, guide movement along paths, and add forces to the cursor. Multiple kinematic templates can be defined within a document, with overlapping templates resulting in a form of function composition. A template's strength can also be varied, enabling one to improve one's strokes without losing the human element. Since kinematic templates guide movements, rather than strictly prescribe them, they constitute a visual composition aid that lies between unaided freehand drawing and rigid drawing aids such as snapping guides, masks, and perfect geometric primitives.
Attribute gates are a new user interface element designed to address the problem of concurrently setting attributes and moving objects between territories on a digital tabletop. Motivated by the notion of task levels in activity theory, and crossing interfaces, attribute gates allow users to operationalize multiple subtasks in one smooth movement. We present two configurations of attribute gates; (1) grid gates which spatially distribute attribute values in a regular grid, and require users to draw trajectories through the attributes; (2) polar gates which distribute attribute values on segments of concentric rings, and require users to align segments when setting attribute combinations. The layout of both configurations was optimised based on targeting and steering laws derived from Fitts' Law. A study compared the use of attribute gates with traditional contextual menus. Users of attribute gates demonstrated both increased performance and higher mutual awareness.
This paper explores the intersection of emerging surface technologies, capable of sensing multiple contacts and of-ten shape information, and advanced games physics engines. We define a technique for modeling the data sensed from such surfaces as input within a physics simulation. This affords the user the ability to interact with digital objects in ways analogous to manipulation of real objects. Our technique is capable of modeling both multiple contact points and more sophisticated shape information, such as the entire hand or other physical objects, and of mapping this user input to contact forces due to friction and collisions within the physics simulation. This enables a variety of fine-grained and casual interactions, supporting finger-based, whole-hand, and tangible input. We demonstrate how our technique can be used to add real-world dynamics to interactive surfaces such as a vision-based tabletop, creating a fluid and natural experience. Our approach hides from application developers many of the complexities inherent in using physics engines, allowing the creation of applications without preprogrammed interaction behavior or gesture recognition.
Sphere is a multi-user, multi-touch-sensitive spherical display in which an infrared camera used for touch sensing shares the same optical path with the projector used for the display. This novel configuration permits: (1) the enclosure of both the projection and the sensing mechanism in the base of the device, and (2) easy 360-degree access for multiple users, with a high degree of interactivity without shadowing or occlusion. In addition to the hardware and software solution, we present a set of multi-touch interaction techniques and interface concepts that facilitate collaborative interactions around Sphere. We designed four spherical application concepts and report on several important observations of collaborative activity from our initial Sphere installation in three high-traffic locations.
We demonstrate a pressure-sensitive depth sorting technique that extends standard two-dimensional (2D) manipulation techniques, particularly those used with multi-touch or multi-point controls. We combine this layering operation with a page-folding metaphor for more fluid interaction in applications requiring 2D sorting and layout.
Creating multiple prototypes facilitates comparative reasoning, grounds team discussion, and enables situated exploration. However, current interface design tools focus on creating single artifacts. This paper introduces the Juxtapose code editor and runtime environment for designing multiple alternatives of both application logic and interface parameters. For rapidly comparing code alternatives, Juxtapose introduces selectively parallel source editing and execution. To explore parameter variations, Juxtapose automatically creates control interfaces for "tuning" application variables at runtime. This paper describes techniques to support design exploration for desktop, mobile, and physical interfaces, and situates this work in a larger design space of tools for explorative programming. A summative study of Juxtapose with 18 participants demonstrated that parallel editing and execution are accessible to interaction designers and that designers can leverage these techniques to survey more options, faster.
Users increasingly interact with a heterogeneous collection of computing devices. The applications that users employ on those devices, however, still largely provide user experiences that assume the use of a single computer. This failure is due in part to the difficulty of creating user experiences that span multiple devices, particularly the need to manage identifying, connecting to, and communicating with other devices. In this paper we present an infrastructure based on instant messaging that simplifies adding that additional functionality to applications. Our infrastructure elevates device ownership to a first class property, allowing developers to provide functionality that spans personal devices without writing code to manage users' devices or establish connections among them. It also provides simple mechanisms for applications to send information, events, or commands between a user's devices. We demonstrate the effectiveness of our infrastructure by presenting a set of sample applications built with it and a user study demonstrating that developers new to the infrastructure can implement all of the cross-device functionality for three applications in, on average, less than two and a half hours.
This paper explores architectural support for interfaces combining pen, paper, and PC. We show how the event-based approach common to GUIs can apply to augmented paper, and describe additions to address paper's distinguishing characteristics. To understand the developer experience of this architecture, we deployed the toolkit to 17 student teams for six weeks. Analysis of the developers' code provided insight into the appropriateness of events for paper UIs. The usage patterns we distilled informed a second iteration of the toolkit, which introduces techniques for integrating interactive and batched input handling, coordinating interactions across devices, and debugging paper applications. The study also revealed that programmers created gesture handlers by composing simple ink measurements. This desire for informal interactions inspired us to include abstractions for recognition. This work has implications beyond paper - designers of graphical tools can examine API usage to inform iterative toolkit development.
Collocation preferences represent the commonly used expressions, idioms, and word pairings of a language. Because collocation preferences arise from consensus usage, rather than a set of well-defined rules, they must be learned on a case-by-case basis, making them particularly challenging for non-native speakers of a language. To assist non-native speakers with these parts of a language, we developed AwkChecker, the first end-user tool geared toward helping non-native speakers detect and correct collocation errors in their writing. As a user writes, AwkChecker automatically flags collocation errors and suggests replacement expressions that correspond more closely to consensus usage. These suggestions include example usage to help users choose the best candidate. We describe AwkChecker's interface, its novel methods for detecting collocation errors and suggesting alternatives, and an early study of its use by non-native English speakers at our institution. Collectively, these contributions advance the state of the art in writing aids for non-native speakers.
We present Inky, a command line for shortcut access to common web tasks. Inky aims to capture the efficiency benefits of typed commands while mitigating their usability problems. Inky commands have little or no new syntax to learn, and the system displays rich visual feedback while the user is typing, including missing parameters and contextual information automatically clipped from the target web site. Inky is an example of a new kind of hybrid between a command line and a GUI interface. We describe the design and implementation of two prototypes of this idea, and report the results of a preliminary user study.
Internet usage on mobile devices continues to grow as users seek anytime, anywhere access to information. Because users frequently search for businesses, directory assistance has been the focus of many voice search applications utilizing speech as the primary input modality. Unfortunately, mobile settings often contain noise which degrades performance. As such, we present Search Vox, a mobile search interface that not only facilitates touch and text refinement whenever speech fails, but also allows users to assist the recognizer via text hints. Search Vox can also take advantage of any partial knowledge users may have about the business listing by letting them express their uncertainty in an intuitive way using verbal wildcards. In simulation experiments conducted on real voice search data, leveraging multimodal refinement resulted in a 28% relative reduction in error rate. Providing text hints along with the spoken utterance resulted in even greater relative reduction, with dramatic gains in recovery for each additional character.
We present ILoveSketch, a 3D curve sketching system that captures some of the affordances of pen and paper for professional designers, allowing them to iterate directly on concept 3D curve models. The system coherently integrates existing techniques of sketch-based interaction with a number of novel and enhanced features. Novel contributions of the system include automatic view rotation to improve curve sketchability, an axis widget for sketch surface selection, and implicitly inferred changes between sketching techniques. We also improve on a number of existing ideas such as a virtual sketchbook, simplified 2D and 3D view navigation, multi-stroke NURBS curve creation, and a cohesive gesture vocabulary. An evaluation by a professional designer shows the potential of our system for deployment within a real design process.
We present the design of Lineogrammer, a diagram-drawing system motivated by the immediacy and fluidity of pencil-drawing. We attempted for Lineogrammer to feel like a modeless diagramming "medium" in which stylus input is immediately interpreted as a command, text label or a drawing element, and drawing elements snap to or sculpt from existing elements. An inferred dual representation allows geometric diagram elements, no matter how they were entered, to be manipulated at granularities ranging from vertices to lines to shapes. We also integrate lightweight tools, based on rulers and construction lines, for controlling higher-level diagram attributes, such as symmetry and alignment. We include preliminary usability observations to help identify areas of strength and weakness with this approach.
Digital paint is one of the more successful interactive applications of computing. Brushes that apply various effects to an image have been central to this success. Current painting techniques ignore the underlying image. By considering that image we can help the user paint more effectively. There are algorithms that assist in selecting regions to paint including flood fill, intelligent scissors and graph cut. Selected regions and the algorithms to create them introduce conceptual layers between the user and the painting task. We propose a series of "edge-respecting brushes" that spread paint or other effects according to the edges and texture of the image being modified. This restores the simple painting metaphor while providing assistance in working with the shapes already in the image. Our most successful fill brush algorithm uses competing least-cost-paths to identify what should be selected and what should not.
Tactile feedback allows devices to communicate with users when visual and auditory feedback are inappropriate. Unfortunately, current vibrotactile feedback is abstract and not related to the content of the message. This often clash-es with the nature of the message, for example, when sending a comforting message.
We propose addressing this by extending the repertoire of haptic notifications. By moving an actuator perpendicular to the user's skin, our prototype device can tap the user. Moving the actuator parallel to the user's skin induces rub-bing. Unlike traditional vibrotactile feedback, tapping and rubbing convey a distinct emotional message, similar to those induced by human-human touch.
To enable these techniques we built a device we call soundTouch. It translates audio wave files into lateral motion using a voice coil motor found in computer hard drives. SoundTouch can produce motion from below 1Hz to above 10kHz with high precision and fidelity.
We present the results of two exploratory studies. We found that participants were able to distinguish a range of taps and rubs. Our findings also indicate that tapping and rubbing are perceived as being similar to touch interactions exchanged by humans.
Current pen input mainly utilizes the position of the pen tip, and occasionally, a button press. Other possible device parameters, such as rolling the pen around its longitudinal axis, are rarely used. We explore pen rolling as a supporting input modality for pen-based interaction. Through two studies, we are able to determine 1) the parameters that separate intentional pen rolling for the purpose of interaction from incidental pen rolling caused by regular writing and drawing, and 2) the parameter range within which accurate and timely intentional pen rolling interactions can occur. Building on our experimental results, we present an exploration of the design space of rolling-based interaction techniques, which showcase three scenarios where pen rolling interactions can be useful: enhanced stimulus-response compatibility in rotation tasks , multi-parameter input, and simplified mode selection.
Interacting with mobile devices using touch can lead to fingers occluding valuable screen real estate. For the smallest devices, the idea of using a touch-enabled display is almost wholly impractical. In this paper we investigate sensing user touch around small screens like these. We describe a prototype device with infra-red (IR) proximity sensors embedded along each side and capable of detecting the presence and position of fingers in the adjacent regions. When this device is rested on a flat surface, such as a table or desk, the user can carry out single and multi-touch gestures using the space around the device. This gives a larger input space than would otherwise be possible which may be used in conjunction with or instead of on-display touch input. Following a detailed description of our prototype, we discuss some of the interactions it affords.
We present Scratch Input, an acoustic-based input technique that relies on the unique sound produced when a fingernail is dragged over the surface of a textured material, such as wood, fabric, or wall paint. We employ a simple sensor that can be easily coupled with existing surfaces, such as walls and tables, turning them into large, unpowered and ad hoc finger input surfaces. Our sensor is sufficiently small that it could be incorporated into a mobile device, allowing any suitable surface on which it rests to be appropriated as a gestural input surface. Several example applications were developed to demonstrate possible interactions. We conclude with a study that shows users can perform six Scratch Input gestures at about 90% accuracy with less than five minutes of training and on wide variety of surfaces.
The venerable desktop metaphor is beginning to show signs of strain in supporting modern knowledge work. In this paper, we examine how the desktop metaphor can be re-framed, shifting the focus away from a low-level (and increasingly obsolete) focus on documents and applications to an interface based upon the creation of and interaction with manually declared, semantically meaningful activities. We begin by unpacking some of the foundational assumptions of desktop interface design, describe an activity-based model for organizing the desktop interface based on theories of cognition and observations of real-world practice, and identify a series of high-level system requirements for interfaces that use activity as their primary organizing principle. Based on these requirements, we present the novel interface design of the Giornata system, a prototype activity-based desktop interface, and share initial findings from a longitudinal deployment of the Giornata system in a real-world setting.
A proactive display is an application that selects content to display based on the set of users who have been detected nearby. For example, the Ticket2Talk  proactive display application presented content for users so that other people would know something about them.
It is our view that promising patterns for proactive display applications have been discovered, and now we face the need for frameworks to support the range of applications that are possible in this design space.
In this paper, we present the Proactive Display (ProD) Framework, which allows for the easy construction of proactive display applications. It allows a range of proactive display applications, including ones already in the literature. ProD also enlarges the design space of proactive display systems by allowing a variety of new applications that incorporate different views of social life and community.
Window management research has aimed to leverage users' tasks to organize the growing number of open windows in a useful manner. This research has largely assumed task classifications to be binary -- either a window is in a task, or not -- and context-independent. We suggest that the continual evolution of tasks can invalidate this approach and instead propose a fuzzy association model in which windows are related to one another by varying degrees. Task groupings are an emergent property of our approach. To support the association model, we introduce the WindowRank algorithm and its use in determining window association. We then describe Taskposé, a prototype window switch visualization embodying these ideas, and report on a week-long user study of the system.
Directional faceted browsers, such as the popular column browser iTunes, let a person pick an instance from any column-facet to start their search for music. The expected effect is that any columns to the right are filtered. In keeping with this directional filtering from left to right, however, the unexpected effect is that the columns to the left of the click provide no information about the possible associations to the selected item. In iTunes, this means that any selection in the Album column on the right returns no information about either the Artists (immediate left) or Genres (leftmost) associated with the chosen album.
Backward Highlighting (BH) is our solution to this problem, which allows users to see and utilize, during search, associations in columns to the left of a selection in a directional column browser like iTunes. Unlike other possible solutions, this technique allows such browsers to keep direction in their filtering, and so provides users with the best of both directional and non-directional styles. As well as describing BH in detail, this paper presents the results of a formative user study, showing benefits for both information discovery and subsequent retention in memory.
The Web is ephemeral. Pages change frequently, and it is nearly impossible to find data or follow a link after the underlying page evolves. We present Zoetrope, a system that enables interaction with the historicalWeb (pages, links, and embedded data) that would otherwise be lost to time. Using a number of novel interactions, the temporal Web can be manipulated, queried, and analyzed from the context of familar pages. Zoetrope is based on a set of operators for manipulating content streams. We describe these primitives and the associated indexing strategies for handling temporal Web data. They form the basis of Zoetrope and enable our construction of new temporal interactions and visualizations.
We present a new server-side architecture that enables rapid prototyping and deployment of mobile web applications created from existing web sites. Key to this architecture is a remote control metaphor in which the mobile device controls a fully functional browser that is embedded within a proxy server. Content is clipped from the proxy browser, transformed if necessary, and then sent to the mobile device as a typical web page. Users' interactions with that content on the mobile device control the next steps of the proxy browser. We have found this approach to work well for creating mobile sites from a variety of existing sites, including those that use dynamic HTML and AJAX technologies. We have conducted a small user study to evaluate our model and API with experienced web programmers.
We propose new interaction techniques that support better browsing of large HTML tables on small screen devices, such as mobile phones. We propose three modes for browsing tables: normal mode, record mode, and cell mode. Normal mode renders tables in the ordinary way, but provides various useful functions for browsing large tables, such as hiding unnecessary rows and columns. Record mode regards each row (or column) as the basic information unit and displays it in a record-like format with column (or row) headers, while cell mode regards each cell as the basic unit and displays each cell together with its corresponding row and column headers. For these table presentations, we need to identify row and column headers that explain the meaning of rows and columns. To provide users with both row and column headers even when the tables have attributes for only one of them, we introduce the concept of keys and develop a method of automatically discovering attributes and keys in tables. Another issue in these presentations is how to handle composite cells spanning multiple rows or columns. We determine the semantics of such composite cells and render them in appropriate ways in accordance with their semantics.
We introduce a new type of interactive surface technology based on a switchable projection screen which can be made diffuse or clear under electronic control. The screen can be continuously switched between these two states so quickly that the change is imperceptible to the human eye. It is then possible to rear-project what is perceived as a stable image onto the display surface, when the screen is in fact transparent for half the time. The clear periods may be used to project a second, different image through the display onto objects held above the surface. At the same time, a camera mounted behind the screen can see out into the environment. We explore some of the possibilities this type of screen technology affords, allowing surface computing interactions to extend 'beyond the display'. We present a single self-contained system that combines these off-screen interactions with more typical multi-touch and tangible surface interactions. We describe the technical challenges in realizing our system, with the aim of allowing others to experiment with these new forms of interactive surfaces.
Numerous methods have been proposed that allow mobile devices to determine where they are located (e.g., home or office) and in some cases, predict what activity the user is currently engaged in (e.g., walking, sitting, or driving). While useful, this sensing currently only tells part of a much richer story. To allow devices to act most appropriately to the situation they are in, it would also be very helpful to know about their placement - for example whether they are sitting on a desk, hidden in a drawer, placed in a pocket, or held in one's hand - as different device behaviors may be called for in each of these situations. In this paper, we describe a simple, small, and inexpensive multispectral optical sensor for identifying materials in proximity to a device. This information can be used in concert with e.g., location information, to estimate, for example, that the device is "sitting on the desk at home", or "in the pocket at work". This paper discusses several potential uses of this technology, as well as results from a two-part study, which indicates that this technique can detect placement at 94.4% accuracy with real-world placement sets.
This paper presents Foldable User Interfaces (FUI), a combination of a 3D GUI with windows imbued with the physics of paper, and Foldable Input Devices (FIDs). FIDs are sheets of paper that allow realistic transformations of graphical sheets in the FUI. Foldable input devices are made out of construction paper augmented with IR reflectors, and tracked by computer vision. Window sheets can be picked up and flexed with simple movements and deformations of the FID. FIDs allow a diverse lexicon of one-handed and two-handed interaction techniques, including folding, bending, flipping and stacking. We show how these can be used to ease the creation of simple 3D models, but also for tasks such as page navigation.
Modern computer displays tend to be in fixed size, rigid, and rectilinear rendering them insensitive to the visual area demands of an application or the desires of the user. Foldable displays offer the ability to reshape and resize the interactive surface at our convenience and even permit us to carry a very large display surface in a small volume. In this paper, we implement four interactive foldable display designs using image projection with low-cost tracking and explore display behaviors using orientation sensitivity.