The goal of scene understanding is to obtain as much semantic knowledge of a given scene image as possible. This includes categorization (labeling the whole scene), object detection (predicting object locations with bounding boxes), and semantic segmentation (labeling each pixel). Because this formulation is so general, it covers a wide range of applications, such as urban scene understanding for automotive applications, generic object detection, and inferring semantics from remote sensing data.

2015

Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding.
Clemens-Alexander Brust and Sven Sickert and Marcel Simon and Erik Rodner and Joachim Denzler.
International Conference on Computer Vision Theory and Applications (VISAPP). 510-517. 2015.

Abstract: Classifying single image patches is important in many different applications, such as road detection or scene understanding. In this paper, we present convolutional patch networks, which are convolutional networks learned to distinguish different image patches and which can be used for pixel-wise labeling. We also show how to incorporate spatial information of the patch as an input to the network, which allows for learning spatial priors for certain categories jointly with an appearance model. In particular, we focus on road detection and urban scene understanding, two application areas where we are able to achieve state-of-the-art results on the KITTI as well as on the LabelMeFacade dataset. Furthermore, our paper offers a guideline for people working in the area and desperately wandering through all the painstaking details that render training CNs on image patches extremely difficult.

Efficient Convolutional Patch Networks for Scene Understanding.
Clemens-Alexander Brust and Sven Sickert and Marcel Simon and Erik Rodner and Joachim Denzler.
CVPR Workshop on Scene Understanding (CVPR-WS). 2015.

Abstract: In this paper, we present convolutional patch networks, which are convolutional (neural) networks (CNN) learned to distinguish different image patches and which can be used for pixel-wise labeling. We show how to easily learn spatial priors for certain categories jointly with their appearance. Experiments for urban scene understanding demonstrate state-of-the-art results on the LabelMeFacade dataset. Our approach is implemented as a new CNN framework especially designed for semantic segmentation with fully-convolutional architectures.

2014

ARTOS -- Adaptive Real-Time Object Detection System.
Björn Barz and Erik Rodner and Joachim Denzler.
arXiv preprint arXiv:1407.2721. 2014.

Abstract: ARTOS is all about creating, tuning, and applying object detection models with just a few clicks. In particular, ARTOS facilitates learning of models for visual object detection by eliminating the burden of having to collect and annotate a large set of positive and negative samples manually and in addition it implements a fast learning technique to reduce the time needed for the learning step. A clean and friendly GUI guides the user through the process of model creation, adaptation of learned models to different domains using in-situ images, and object detection on both offline images and images from a video stream. A library written in C++ provides the main functionality of ARTOS with a C-style procedural interface, so that it can be easily integrated with any other project.

Interactive Adaptation of Real-Time Object Detectors.
Daniel Göhring and Judy Hoffman and Erik Rodner and Kate Saenko and Trevor Darrell.
International Conference on Robotics and Automation (ICRA). 1282-1289. 2014.

Abstract: In the following paper, we present a framework for quickly training 2D object detectors for robotic perception. Our method can be used by robotics practitioners to quickly (under 30 seconds per object) build a large-scale real-time perception system. In particular, we show how to create new detectors on the fly using large-scale internet image databases, thus allowing a user to choose among thousands of available categories to build a detection system suitable for the particular robotic application. Furthermore, we show how to adapt these models to the current environment with just a few in-situ images. Experiments on existing 2D benchmarks evaluate the speed, accuracy, and flexibility of our system.

2013

Large-Scale Gaussian Process Multi-Class Classification for Semantic Segmentation and Facade Recognition.
Björn Fröhlich and Erik Rodner and Michael Kemmler and Joachim Denzler.
Machine Vision and Applications. 24(5): 1043-1053. 2013.

2012

Efficient Semantic Segmentation with Gaussian Processes and Histogram Intersection Kernels.
Alexander Freytag and Björn Fröhlich and Erik Rodner and Joachim Denzler.
International Conference on Pattern Recognition (ICPR). 3313-3316. 2012.

As Time Goes By: Anytime Semantic Segmentation with Iterative Context Forests.
Björn Fröhlich and Erik Rodner and Joachim Denzler.
Symposium of the German Association for Pattern Recognition (DAGM). 1-10. 2012.

Semantic Segmentation with Millions of Features: Integrating Multiple Cues in a Combined Random Forest Approach.
Björn Fröhlich and Erik Rodner and Joachim Denzler.
Asian Conference on Computer Vision (ACCV). 218-231. 2012.

2010

A Fast Approach for Pixelwise Labeling of Facade Images.
Björn Fröhlich and Erik Rodner and Joachim Denzler.
International Conference on Pattern Recognition (ICPR). 3029-3032. 2010.

2009

Global Context Extraction for Object Recognition Using a Combination of Range and Visual Features.
Michael Kemmler and Erik Rodner and Joachim Denzler.
Dynamic 3D Imaging Workshop. 96-109. 2009.