Object Category Recognition using Sparse Localized Features
Speaker: David Lowe (University of British Columbia, Vancouver, Canada)
Abstract:
Within the past few years, the problem of recognizing object
categories has captured the imagination of the computer vision
community. Although there has already been good progress on this
problem, our current computer vision systems remain far behind the
capabilities of human vision. This talk will present an approach that
borrows some ideas from models of biological vision. Our approach
builds on the standard Hubel-Wiesel architecture for biological
vision, in which feature complexity and position/scale invariance are
built up in stages by alternating template matching and pooling
operations. Features are computed densely over the image rather than
by relying on interest point selection, but computational cost is kept
low by increasing invariance and subsampling. The approach also
differs from most current work in computer vision by choosing
features through random selection from training examples rather than
through a clustering process. Sparsity is increased by constraining
the number of feature inputs and using feature selection. We also
demonstrate the value of retaining some position and scale information
above the intermediate feature level, which provides a middle ground
between bag-of-words and constellation models. Our final model
performs well on several standard datasets.
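The alternating stages described in the abstract can be sketched compactly. Below is a hedged, minimal illustration of the template-matching ("simple"-cell) and max-pooling ("complex"-cell) pattern, not the model presented in the talk: the array sizes, the random template, and the function names are invented for the example.

```python
import numpy as np

def template_match(image, template):
    """S-stage: slide a template over the image and record match strength
    (plain correlation here; real models use normalized or RBF tuning)."""
    th, tw = template.shape
    H, W = image.shape
    out = np.zeros((H - th + 1, W - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+th, j:j+tw] * template)
    return out

def max_pool(resp, size=2):
    """C-stage: take the max over local neighborhoods and subsample by
    `size`, which buys position/scale invariance and cuts computation."""
    H, W = resp.shape
    pooled = np.zeros((H // size, W // size))
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            pooled[i, j] = resp[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return pooled

rng = np.random.default_rng(0)
image = rng.random((16, 16))
template = rng.random((3, 3))           # sampled at random, not clustered
resp = template_match(image, template)  # dense 14x14 response map
pooled = max_pool(resp, size=2)         # 7x7 after pooling/subsampling
```

Stacking such pairs, with templates drawn at random from training images rather than produced by clustering, yields increasingly invariant features at decreasing resolution, which is what keeps dense (interest-point-free) computation affordable.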
Biography:
David Lowe is a professor of Computer Science at the University of
British Columbia and a Fellow of the Canadian Institute for Advanced
Research. He received his Ph.D. in computer science from Stanford
University in 1984. From 1984 to 1987 he was an Assistant Professor
at the Courant Institute of Mathematical Sciences at New York
University. He is a member of the scientific advisory board for
Evolution Robotics. His research interests include object
recognition, local invariant features for image matching, panorama
stitching, and computational models of human visual recognition.
Less is More: Coded Computational Photography
Speaker: Ramesh Raskar (Mitsubishi Electric Research Labs (MERL), Cambridge, MA)
Abstract:
Computational photography combines plentiful computing, digital sensors, modern optics, actuators, and smart lights to escape the limitations of traditional cameras, enable novel imaging applications, and simplify many computer vision tasks. Unbounded dynamic range; variable focus, resolution, and depth of field; hints about shape, reflectance, and lighting; and new interactive forms of photos that are partly snapshots and partly videos are just some of the new applications found in Computational Photography.
I will discuss Coded Photography, which involves encoding of the photographic signal and post-capture decoding for improved scene analysis. With film-like photography, the captured image is a 2D projection of the scene. Due to the limited capabilities of the camera, the recorded image is a partial representation of the view. Nevertheless, the captured image is ready for human consumption: what you see is what you almost get in the photo. In Coded Photography, the goal is to achieve a potentially richer representation of the scene during the encoding process. In some cases, Computational Photography reduces to 'Epsilon Photography', where the scene is recorded via multiple images, each captured by an epsilon variation of the camera parameters. For example, successive images (or neighboring pixels) may have a different exposure, focus, aperture, view, illumination, or instant of capture. Each setting allows recording of partial information about the scene, and the final image is reconstructed from these multiple observations. In Coded Computational Photography, the recorded image may appear distorted or random to a human observer, but the corresponding decoding recovers valuable information about the scene.
'Less is more' in Coded Photography: by blocking light over time or space, we can preserve more detail about the scene in a single recorded photograph.
(a) Coded Exposure: By blocking light in time, fluttering the shutter open and closed in a carefully chosen binary sequence, we can preserve the high spatial frequencies of fast-moving objects and support high-quality motion deblurring.
(b) Coded Aperture and Optical Heterodyning: By blocking light near the sensor with a sinusoidal grating mask, we can record a 4D light field on a 2D sensor. And by blocking light with a mask at the aperture, we can extend the depth of field and achieve full-resolution digital refocusing.
(c) Coded Illumination: By observing blocked light at silhouettes, a multi-flash camera can locate depth discontinuities in challenging scenes without depth recovery.
(d) Coded Sensing: By sensing intensities with lateral inhibition, a gradient sensing camera can record large as well as subtle changes in intensity to recover a high-dynamic range image.
I will show several applications of coded exposure, aperture, illumination, and sensing, and describe emerging techniques to recover scene parameters from coded photographs.
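The intuition behind (a) can be sketched numerically. This is a hedged toy illustration, not Raskar's method: the 4-chip code, the 64-sample scanline, and the helper names are all invented for the example (the real flutter-shutter work uses a much longer, carefully optimized binary code), but it shows why a fluttered exposure remains invertible while a conventional box blur does not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
scanline = rng.random(n)  # stand-in for a 1D slice of a fast-moving scene

# 1 = shutter open, 0 = shutter closed during the exposure.  A tiny
# illustrative flutter code; a conventional shutter is a box of ones.
code = np.array([1.0, 1.0, 0.0, 1.0])
box = np.ones_like(code)

def blur(x, kernel):
    """Circular motion blur: the sensor integrates the moving scene
    whenever the shutter is open."""
    k = np.zeros_like(x)
    k[:len(kernel)] = kernel
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))

def deblur(y, kernel, eps=1e-6):
    """Regularized inverse filter (noise-free demo)."""
    k = np.zeros_like(y)
    k[:len(kernel)] = kernel
    K = np.fft.fft(k)
    return np.real(np.fft.ifft(np.fft.fft(y) * np.conj(K) / (np.abs(K)**2 + eps)))

def mtf_min(kernel, n=64):
    """Smallest magnitude of the blur's frequency response; a (near-)zero
    means that spatial frequency is destroyed and cannot be recovered."""
    k = np.zeros(n)
    k[:len(kernel)] = kernel
    return np.abs(np.fft.fft(k)).min()

# The box blur has exact spectral zeros here, so inverting it is
# ill-posed; the fluttered code keeps every frequency alive, so the
# motion-blurred scanline can be recovered stably.
recovered = deblur(blur(scanline, code), code)
```

With n = 64 and a length-4 box, the box's frequency response vanishes exactly at k = 16, 32, 48, while this code's response stays bounded away from zero: the "preserve high spatial frequencies" point of (a) in miniature.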
Biography:
Ramesh Raskar studies computational aspects of images and illumination. He is a Senior Research Scientist at Mitsubishi Electric Research Labs (MERL) in Cambridge, MA. During his doctoral research at U. of North Carolina at Chapel Hill, he developed a framework for sensor-assisted projectors. His recent work spans a range of topics in computational photography, projective emitters, non-photorealistic rendering and intelligent user interfaces. Current projects include optical heterodyning photography, flutter shutter camera, composite RFID (RFIG), multi-flash non-photorealistic camera for depth edge detection, locale-aware mobile projectors, image fusion for context enhancement and quadric transfer methods for multi-projector curved screen displays.
Dr. Raskar received the TR100 Award (Technology Review's list of 100 top young innovators under 35 worldwide) in 2004, and the Global Indus Technovator Award in 2003, instituted at MIT to recognize the top 20 Indian technology innovators around the globe. He holds 25 US patents and has received Mitsubishi Electric Invention Awards in 2003, 2004, and 2006.
http://www.merl.com/people/raskar/raskar.html
Optimal Algorithms in Multiview Geometry
Speaker: Richard Hartley (Department of Information Engineering, Australian National University and NICTA, Canberra, Australia)
Abstract:
This talk gives a survey of some new optimization methods that have recently been used to solve various problems in Computer Vision.
In the past, the main methods for solving problems in Multiview Vision Geometry have been iterative techniques, which may fall into local minima and suffer from convergence problems. In addition, in order to converge satisfactorily, they must be preceded by a method that gives an approximate solution. Recent research has instead turned to finding guaranteed globally optimal solutions to such problems. In the last few years, several methods have been suggested for doing this, for finding either the usual least-squares solution or the minimax (L-infinity) solution. Techniques include quasi-convex optimization, Second Order Cone Programming (SOCP), branch-and-bound, and fractional programming.
Such methods have been applied with success to many of the common geometric problems, such as multiview triangulation, camera resection, homography estimation, vanishing point estimation, and projective reconstruction for scenes containing a plane. More recently, a mixture of techniques has provided an optimal solution for essential-matrix estimation (under the L-infinity norm), calibrated camera pose estimation, and the motion of a vehicle with rigidly mounted cameras, achieved using branch-and-bound together with linear programming or SOCP. In a more applied context, such techniques have been applied to the problem of tracking a deformable sheet of material (such as a sheet of paper), and also to MRF optimization problems.
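The bisection scheme behind many of these globally optimal L-infinity methods fits in a few lines. Below is a hedged toy sketch, not any of the cited algorithms: the residuals are scalar terms |a_i*x - b_i| invented for the example, so the per-gamma feasibility test collapses to an interval intersection, whereas in the geometric problems it is a small SOCP or LP feasibility problem; the bisection shell is the same.

```python
# Minimize over x the worst-case residual  max_i |a_i * x - b_i|  (a_i > 0),
# a 1D stand-in for L-infinity problems such as multiview triangulation.

def feasible(gamma, a, b):
    """Is there an x with every |a_i*x - b_i| <= gamma?  Each constraint
    is an interval for x; the system is feasible iff they intersect."""
    lo = max((bi - gamma) / ai for ai, bi in zip(a, b))
    hi = min((bi + gamma) / ai for ai, bi in zip(a, b))
    return lo <= hi

def minimax_fit(a, b, tol=1e-9):
    """Bisect on the error bound gamma: feasibility is monotone in gamma
    (quasi-convexity), so the smallest feasible gamma is the global
    optimum -- no local minima, no need for an initial estimate."""
    lo, hi = 0.0, max(abs(bi) for bi in b)  # x = 0 achieves max|b_i|
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid, a, b):
            hi = mid   # shrink the bound from above
        else:
            lo = mid   # no solution this accurate exists
    return hi

best = minimax_fit([1.0, 2.0], [0.0, 2.0])  # optimum at x = 2/3, gamma = 2/3
```

Replacing the interval test with a convex feasibility solve over camera geometry gives the quasi-convex L-infinity framework the abstract describes; branch-and-bound plays the analogous role when the least-squares (L2) cost is targeted instead.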
Biography:
Professor Richard Hartley graduated in Mathematics, receiving a BSc from the Australian National University, and a PhD from the University of Toronto. Currently, he is affiliated with the Department of Information Engineering at the Australian National University, and is a member of the Vision Science, Technology and Applications Program in NICTA, a research laboratory funded by the Australian Government. Previously, he worked at the General Electric Research and Development Center in Schenectady, New York, from 1985 to 2001. During 1985-1988, he was involved in the design and implementation of Computer-Aided Design tools for electronic design and created a very successful design system called the Parsifal Silicon Compiler. In 1991 he was awarded GE's Dushman Award for this work.
He began work in Image Understanding and Scene Reconstruction for GE's Simulation and Control Systems Division. This division built large-scale flight-simulators. Dr. Hartley's projects in this area were in the construction of terrain models and texture mosaics from aerial and satellite imagery.
In 1991, he began an extended research effort applying geometric techniques to the analysis of video. This research led to fundamental advances in machine-understanding of video, and opened up one of the most popular areas of Computer Vision research in the 1990s. In 2000, he co-authored, with Andrew Zisserman, a book Multiple View Geometry in Computer Vision for Cambridge University Press, summarizing the previous decade's research in this area. This book has become one of the most popular research reference texts in Computer Vision.
He has authored over 120 papers in Photogrammetry, Computer Vision, Geometric Topology, Geometric Voting Theory, Computational Geometry and Computer-Aided Design, and holds 34 US patents.
Machine Vision in the Early Days: Japan's Pioneering Contributions
Speaker: Masakazu Ejiri (R&D Consultant in Industrial Science; formerly at the Central Research Laboratory, Hitachi, Ltd.)
Abstract:
The history of machine vision began in the mid-1960s with the efforts of Japanese industry researchers. A variety of prominent vision-based systems were made possible by creating and evolving real-time image processing techniques, and were applied to factory automation, office automation, and even social automation during the 1970-2000 period. In this talk, these historical attempts are briefly reviewed to promote understanding of the pioneering efforts that opened the door to, and formed the basis of, today's computer vision research.
Biography:
Dr. Masakazu Ejiri received the B.E. degree in Mechanical Engineering and the Dr. Eng. degree in Electrical Engineering, both from Osaka University, Japan, in 1959 and 1967, respectively. From 1959 until his retirement in 2003, he was with the Central Research Laboratory of Hitachi, Ltd., Tokyo, Japan. He worked in the areas of Control Engineering, Robotics, Pattern Recognition, Machine Vision, and Machine Intelligence, and his achievements include the development in 1973 of the world's first computer-controlled, fully automatic transistor assembly system, made possible by innovative machine vision technology.
While working for Hitachi, he spent 1967-1968 as a Visiting Professor at the University of Illinois, Chicago, and 1977-1981 as the Vice President of HISL Inc., California, USA. He also served as the Vice President of the International Association for Pattern Recognition (IAPR) during 1990-1992, as the Governing Board Member of the IAPR during 1992-2002, and as the President of the Robotics Society of Japan (RSJ) during 2001-2003.
He received several awards from academic and industrial societies for his achievements and services, including the Joseph F. Engelberger Technology Award from the Robotics Industries Association in 2005. He is a Fellow of the IEEE (Institute of Electrical and Electronics Engineers), a Fellow of the IAPR, a Fellow of the IEICE (Institute of Electronics, Information and Communication Engineers of Japan), and a Fellow of the RSJ. Presently, he is the Chair of the IEICE F&M (Fellows and Masters) Committee, and the Vice President of the Trans-disciplinary Federation of Science and Technology. He will also serve as the General Chair of the ICPR 2008 to be held in Tampa, Florida, USA.