The effectiveness and precision of a traditional timeline slider control degrade as a video's length grows. When a video has more frames than the slider has pixels, some frames become inaccessible, and scrubbing causes sudden jumps in continuity while frames flash by too quickly for the viewer to assess their content. We propose a content-aware dynamic timeline control designed to overcome these limitations. Our timeline control decouples video speed from playback speed and leverages video content analysis to present salient shots at an intelligible speed. Our control also builds on previous work on elastic sliders, which allows us to produce an accurate navigation control.
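A minimal sketch of the kind of saliency-to-speed mapping such a control could use (the linear mapping, rates, and names are illustrative assumptions, not the paper's implementation): salient frames advance the video slowly while low-saliency stretches are skimmed, with the playback rate itself held constant.

```python
# Sketch: decouple video speed from playback speed by consuming a variable
# number of video frames per displayed frame, based on per-frame saliency.

def advance_rate(saliency, min_rate=1.0, max_rate=8.0):
    """Video frames consumed per displayed frame.

    High-saliency frames advance slowly (near min_rate); low-saliency
    frames are skimmed quickly (near max_rate). Linear mapping assumed.
    """
    return max_rate - (max_rate - min_rate) * saliency

def playback_order(saliency_per_frame):
    """Yield the frame indices shown during a constant-rate playback loop."""
    position, n = 0.0, len(saliency_per_frame)
    while position < n:
        frame = int(position)
        yield frame
        position += advance_rate(saliency_per_frame[frame])

# Example: a salient shot in the middle of an otherwise uniform video.
scores = [0.1] * 100 + [0.9] * 50 + [0.1] * 100
print(list(playback_order(scores))[:20])
```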
While onboard navigation systems are gaining in importance, maps are still the medium of choice for laying out a route to a destination and for wayfinding. However, even with a map, one is almost always more comfortable navigating a route the second time, thanks to visual memory of the route. To make the first time navigating a route feel more familiar, we present a system that integrates a map with a video automatically constructed from panoramic imagery captured at close intervals along the route. The routing information is used to create a variable-speed video depicting the route. During playback, the frame rate and field of view are dynamically modulated to highlight salient features along the route and connect them back to the map. A user interface is demonstrated that allows exploration of the combined map, video, and textual driving directions. We discuss the construction of the hybrid map and video interface. Finally, we report the results of a study that provides evidence of the effectiveness of such a system for route following.
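The speed modulation could be driven directly by the routing information. The sketch below is only illustrative (thresholds, speeds, and the landmark flag are assumptions, not the paper's method): segments near a maneuver or a flagged landmark are played slowly so they can be connected back to the map, while long featureless stretches are skimmed.

```python
# Sketch: pick a playback speed multiplier for each captured route segment
# from its distance to the next turn and whether a landmark is highlighted.

def segment_speed(distance_to_next_turn_m, has_landmark,
                  slow=0.5, normal=1.0, fast=4.0):
    """Playback speed multiplier for one route segment (illustrative values)."""
    if has_landmark or distance_to_next_turn_m < 50:
        return slow      # linger so the feature can be tied to the map
    if distance_to_next_turn_m > 500:
        return fast      # skim long straightaways
    return normal

# (distance to next turn in meters, landmark present?) for a short route
route = [(800, False), (400, False), (30, True), (20, False), (600, False)]
print([segment_speed(d, lm) for d, lm in route])
```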
We describe Chronicle, a new system that allows users to explore document workflow histories. Chronicle captures the entire video history of a graphical document and provides links between the content and the relevant areas of that history. Users can indicate specific content of interest and see the workflows, tools, and settings needed to reproduce the associated results, or better understand how the content was constructed so it can be modified in an informed way. Thus, by storing rich information about the document's workflow history, Chronicle makes any working document a potentially powerful learning tool. We outline some of the challenges surrounding the development of such a system, and then describe our implementation within an image editing application. A qualitative user study produced extremely encouraging results, as users unanimously found the system both useful and easy to use.
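One way to link content to history is to index each recorded edit by the document region it affected. The structure below is a hypothetical sketch of such an index, not Chronicle's actual format: selecting a region returns the time ranges, tools, and settings of the history segments that touched it.

```python
# Sketch: map a selected document region to the relevant workflow-history segments.
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height

@dataclass
class HistoryEvent:
    t_start: float        # seconds into the captured video history
    t_end: float
    tool: str
    settings: dict
    region: Rect           # document area the edit affected

def overlaps(a: Rect, b: Rect) -> bool:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def events_for_region(history: List[HistoryEvent], region: Rect):
    """Return the history segments that touched the selected region, in order."""
    return sorted((e for e in history if overlaps(e.region, region)),
                  key=lambda e: e.t_start)

history = [
    HistoryEvent(12.0, 45.5, "clone_stamp", {"size": 20}, (100, 100, 80, 60)),
    HistoryEvent(60.0, 72.0, "curves", {"channel": "RGB"}, (0, 0, 640, 480)),
]
print([e.tool for e in events_for_region(history, (110, 110, 10, 10))])
```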
We explore the use of tracked 2D object motion to enable novel approaches to interacting with video. These include moving annotations, video navigation by direct manipulation of objects, and creating an image composite from multiple video frames. Features in the video are automatically tracked and grouped in an off-line preprocess that enables later interactive manipulation. Examples of annotations include speech and thought balloons, video graffiti, path arrows, video hyperlinks, and schematic storyboards. We also demonstrate a direct-manipulation interface for random frame access using spatial constraints, and a drag-and-drop interface for assembling still images from videos. Taken together, our tools can be employed in a variety of applications including film and video editing, visual tagging, and authoring rich media such as hyperlinked video.
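The core query behind direct-manipulation frame access can be stated simply; the snippet below is an assumption-level illustration rather than the authors' code. Given the per-frame positions of a tracked object, dragging the object to a screen location seeks to the frame in which the object comes closest to that location.

```python
# Sketch: object-drag video navigation as a nearest-position query over a track.
import math

def frame_for_drag(track, target):
    """track: list of (x, y) object positions indexed by frame number;
    target: (x, y) screen position the user dragged the object to.
    Returns the frame index where the tracked object is closest to the target."""
    tx, ty = target
    return min(range(len(track)),
               key=lambda f: math.hypot(track[f][0] - tx, track[f][1] - ty))

# A tracked ball moving left to right across the frame.
track = [(10 * f, 240) for f in range(64)]
print(frame_for_drag(track, target=(300, 250)))  # -> frame 30
```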
Watching a long unedited video is usually a boring experience. In this paper we examine a particular subset of videos, tour videos, captured by walking around with a running camera with the goal of conveying the essence of some place. We present a system that makes the process of sharing and watching a long tour video easier, less boring, and more informative. To achieve this, we augment the tour video with a map-based storyboard, in which the tour path is reconstructed and coherent shots at different locations are visualized directly on the map. This allows the viewer to navigate the video in the joint location-time space. To create such a storyboard we employ an automatic pre-processing component to parse the video into coherent shots, and an authoring tool that enables the user to tie the shots to landmarks on the map. The browser-based viewing tool allows users to navigate the video in a variety of creative modes with a rich set of controls, giving each viewer a unique, personal viewing experience. Informal evaluation shows that our approach works well for tour videos compared with conventional media players.
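Navigation in the joint location-time space reduces to a two-way mapping between shots and landmarks. The layout below is a hypothetical sketch, not the paper's storyboard format: clicking a landmark seeks the player to the tied shot, and scrubbing the video highlights the matching map location.

```python
# Sketch: two-way lookup between map landmarks and coherent video shots.
shots = [  # (start_s, end_s, landmark_id)
    (0.0,   95.0,  "entrance"),
    (95.0,  210.0, "courtyard"),
    (210.0, 400.0, "main_hall"),
]

def shot_for_landmark(landmark_id):
    """Map click -> video time: start of the shot tied to the landmark."""
    for start, _end, lid in shots:
        if lid == landmark_id:
            return start
    return None

def landmark_for_time(t):
    """Scrubbing -> map highlight: landmark tied to the current shot."""
    for start, end, lid in shots:
        if start <= t < end:
            return lid
    return None

print(shot_for_landmark("main_hall"))   # 210.0
print(landmark_for_time(120.0))         # "courtyard"
```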
We describe our system, MobileASL, for real-time video communication on the current U.S. mobile phone network. The goal of MobileASL is to enable Deaf people to communicate in sign language over mobile phones by compressing and transmitting sign language video in real time on an off-the-shelf mobile phone, which has a weak processor, limited bandwidth, and little battery capacity. We develop several H.264-compliant algorithms that save system resources while maintaining ASL intelligibility by focusing on the important segments of the video. We employ a dynamic skin-based region-of-interest (ROI) that encodes the skin at higher quality at the expense of the rest of the video. We also automatically recognize periods of signing versus not signing and raise or lower the frame rate accordingly, a technique we call variable frame rate (VFR).
We show that our variable frame rate technique results in a 47% gain in battery life on the phone, corresponding to an extra 68 minutes of talk time. We also evaluate our system in a user study. Participants fluent in ASL engage in unconstrained conversations over mobile phones in a laboratory setting. We find that the ROI increases intelligibility and decreases guessing. VFR increases the need for signs to be repeated and the number of conversational breakdowns, but does not affect the users' perception of adopting the technology. These results show that our sign language sensitive algorithms can save considerable resources without sacrificing intelligibility.
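The two mechanisms described above can be sketched in a few lines; the code below is a hedged illustration, not MobileASL's actual encoder integration, and the offsets, thresholds, and frame rates are assumptions. ROI coding spends bits on skin-heavy macroblocks by lowering their quantizer, and VFR drops the capture rate during detected non-signing periods.

```python
# Sketch: (1) ROI coding via a per-macroblock quantizer offset for skin,
# (2) variable frame rate driven by a signing/not-signing classification.

def macroblock_qp(base_qp, skin_fraction, roi_qp_offset=-6):
    """Lower the quantization parameter (finer quality) for skin-heavy blocks."""
    qp = base_qp + roi_qp_offset if skin_fraction > 0.5 else base_qp
    return max(0, min(51, qp))   # clamp to the H.264 QP range

def target_frame_rate(is_signing, signing_fps=10, idle_fps=1):
    """Raise the frame rate while signing is detected, lower it otherwise."""
    return signing_fps if is_signing else idle_fps

print(macroblock_qp(base_qp=30, skin_fraction=0.8))  # 24: finer quality on skin
print(target_frame_rate(is_signing=False))           # low rate while not signing
```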