2020
The whole is more than its parts? From explicit to implicit pose normalization.
IEEE Transactions on Pattern Analysis and Machine Intelligence. 42(3): 749-763. 2020. Abstract: Fine-grained classification describes the automated recognition of visually similar object categories such as bird species. Previous works were usually based on explicit pose normalization, i.e., the detection and description of object parts. However, recent models based on a final global average or bilinear pooling have achieved comparable accuracy without this concept. In this paper, we analyze the advantages of these approaches over generic CNNs and explicit pose normalization approaches. We also show how they can achieve an implicit normalization of the object pose. A novel visualization technique called activation flow is introduced to investigate limitations in pose handling in traditional CNNs like AlexNet and VGG. Afterward, we present and compare the explicit pose normalization approach neural activation constellations and a generalized framework for the final global average and bilinear pooling called α-pooling. We observe that the latter often achieves a higher accuracy, improving common CNN models by up to 22.9%, but lacks the interpretability of the explicit approaches. To address this issue, we present a visualization approach for understanding and analyzing the predictions of the model. Furthermore, we show that our approaches for fine-grained recognition are beneficial for other fields like action recognition.
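The two pooling schemes contrasted in this abstract can be illustrated with a small pure-Python sketch. It assumes the CNN feature map has already been flattened into a list of local feature vectors, one per spatial location. It shows only plain global average pooling and plain bilinear (second-order) pooling, not the generalized framework the paper proposes. Applying a signed square root and L2 normalization after bilinear pooling is a common choice in the bilinear CNN literature and is an assumption here. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def global_average_pool(features: List[Vector]) -> Vector:
    """First-order pooling: element-wise mean over all local feature vectors."""
    n, dim = len(features), len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]


def bilinear_pool(features: List[Vector]) -> Vector:
    """Second-order pooling: mean outer product x x^T over all locations,
    flattened, then signed square root and L2 normalization (assumed post-processing)."""
    n, dim = len(features), len(features[0])
    pooled = [
        sum(f[a] * f[b] for f in features) / n
        for a in range(dim)
        for b in range(dim)
    ]
    # Signed square root dampens large entries while keeping their sign.
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled


# Toy "feature map" with three locations and two channels.
feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(global_average_pool(feats))
print(bilinear_pool(feats))
# Reversing the order of the locations does not change the pooled result.
print(bilinear_pool(feats[::-1]))
```

Both functions discard where each local feature came from, so shuffling the locations changes the output by at most floating-point rounding. This orderless property is one intuition behind the implicit pose normalization the abstract describes.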
2019
Detecting Regions of Maximal Divergence for Spatio-Temporal Anomaly Detection.
IEEE Transactions on Pattern Analysis and Machine Intelligence. 41(5): 1088-1101. 2019. (Pre-print published in 2018.) Abstract: Automatic detection of anomalies in space- and time-varying measurements is an important tool in several fields, e.g., fraud detection, climate analysis, or healthcare monitoring. We present an algorithm for detecting anomalous regions in multivariate spatio-temporal time-series, which allows for spotting the interesting parts in large amounts of data, including video and text data. In contrast to existing techniques for detecting isolated anomalous data points, we propose the "Maximally Divergent Intervals" (MDI) framework for unsupervised detection of coherent spatial regions and time intervals characterized by a high Kullback-Leibler divergence compared with all other data given. In this regard, we define an unbiased Kullback-Leibler divergence that allows for ranking regions of different size and show how to enable the algorithm to run on large-scale data sets in reasonable time using an interval proposal technique. Experiments on both synthetic and real data from various domains, such as climate analysis, video surveillance, and text forensics, demonstrate that our method is widely applicable and a valuable tool for finding interesting events in different types of data.
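The core idea of scoring a contiguous interval by its divergence from the rest of the data can be sketched in a few lines of pure Python. This is a deliberately simplified illustration, not the MDI framework itself. It handles a single univariate time series and models both the interval and the remaining points as Gaussians. It scores intervals with the plain divergence KL(inside || outside) rather than the unbiased variant the paper defines, and it enumerates all intervals by brute force instead of using interval proposals. The function names and the variance floor `eps` are hypothetical choices.

```python
import math
from typing import List, Tuple


def fit_gaussian(xs: List[float]) -> Tuple[float, float]:
    """Maximum-likelihood mean and variance of a list of values."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var


def kl_gaussian(mu_p: float, var_p: float, mu_q: float, var_q: float) -> float:
    """KL(p || q) between two univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def most_divergent_interval(series: List[float], min_len: int, max_len: int,
                            eps: float = 1e-6) -> Tuple[float, int, int]:
    """Brute-force search for the interval [start, end) whose Gaussian model
    diverges most from the Gaussian model of all remaining points."""
    best = (-math.inf, 0, 0)
    n = len(series)
    for start in range(n):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > n:
                break
            outside = series[:start] + series[end:]
            if len(outside) < 2:
                continue
            mu_in, var_in = fit_gaussian(series[start:end])
            mu_out, var_out = fit_gaussian(outside)
            # eps keeps the variances strictly positive for constant segments
            score = kl_gaussian(mu_in, var_in + eps, mu_out, var_out + eps)
            if score > best[0]:
                best = (score, start, end)
    return best


# Quiet signal with an injected burst at indices 10..14.
signal = [0.1 * math.sin(i) for i in range(30)]
for i in range(10, 15):
    signal[i] = 5.0
print(most_divergent_interval(signal, min_len=3, max_len=8))
```

The search is quadratic in the series length and has no notion of spatial regions. Proposal techniques and the multivariate, spatio-temporal setting described in the abstract are needed for real data sets.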
2018
Active Learning for Regression Tasks with Expected Model Output Changes.
British Machine Vision Conference (BMVC). 2018. Abstract: Annotated training data is the enabler for supervised learning. While recording data at large scale is possible in some application domains, collecting reliable annotations is time-consuming, costly, and often a project's bottleneck. Active learning aims at reducing the annotation effort. While this field has been studied extensively for classification tasks, it has received less attention for regression problems although the annotation cost is often even higher. We aim at closing this gap and propose an active learning approach to enable regression applications. To address continuous outputs, we build on Gaussian process models -- an established tool to tackle even non-linear regression problems. For active learning, we extend the expected model output change (EMOC) framework to continuous label spaces and show that the involved marginalizations can be solved in closed-form. This mitigates one of the major drawbacks of the EMOC principle. We empirically analyze our approach in a variety of application scenarios. In summary, we observe that our approach can efficiently guide the annotation process and leads to better models in shorter time and at lower costs.
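For Gaussian process regression, the expected change of model outputs can be marginalized in closed form. The following pure-Python sketch is our own illustration under stated assumptions, not the paper's exact formulation. It assumes one-dimensional inputs, an RBF kernel with fixed hyperparameters, and output change measured as the absolute difference of posterior means averaged over the unlabeled pool. It also assumes the unknown label y' of a candidate x' is drawn from the GP predictive distribution. Adding (x', y') shifts the posterior mean at x_j by c(x_j, x') (y' - μ(x')) / s^2, where c is the posterior covariance and s^2 = c(x', x') + σ_n^2. Since E|y' - μ(x')| = sqrt(2 s^2 / π), the expected change at x_j is sqrt(2/π) |c(x_j, x')| / s. All function names are hypothetical.

```python
import math
from typing import List


def rbf(a: float, b: float, length_scale: float = 1.0) -> float:
    """Squared-exponential kernel on scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(m: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def cho_solve(low: List[List[float]], b: List[float]) -> List[float]:
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(low)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def emoc_scores(train_x: List[float], pool: List[float],
                noise_var: float = 0.1, length_scale: float = 1.0) -> List[float]:
    """Expected mean absolute change of the GP posterior mean over `pool`
    if each pool point were labeled and added to the training set."""
    gram = [[rbf(a, b, length_scale) + (noise_var if i == j else 0.0)
             for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    low = cholesky(gram)
    k_pool = [[rbf(x, p, length_scale) for x in train_x] for p in pool]
    solved = [cho_solve(low, kp) for kp in k_pool]  # (K + noise I)^-1 k(X, p)

    def post_cov(i: int, j: int) -> float:
        prior = rbf(pool[i], pool[j], length_scale)
        return prior - sum(u * v for u, v in zip(k_pool[i], solved[j]))

    scores = []
    for c in range(len(pool)):
        s = math.sqrt(post_cov(c, c) + noise_var)  # predictive std of y'
        mean_abs_cov = sum(abs(post_cov(j, c)) for j in range(len(pool))) / len(pool)
        scores.append(math.sqrt(2.0 / math.pi) * mean_abs_cov / s)
    return scores


scores = emoc_scores(train_x=[0.0, 0.5, 1.0], pool=[0.25, 5.0, 0.75, 5.5])
print(scores)  # the far-away candidates 5.0 and 5.5 score highest
```

In this simplified setting with fixed hyperparameters, the scores depend only on the input locations and not on the observed training labels, because the GP posterior covariance does not depend on them. The candidate with the highest score would be sent to the annotator next.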
2017
Generalized orderless pooling performs implicit salient matching.
International Conference on Computer Vision (ICCV). 4970-4979. 2017.
Large-Scale Gaussian Process Inference with Generalized Histogram Intersection Kernels for Visual Recognition Tasks.
International Journal of Computer Vision (IJCV). 121(2): 253-280. 2017.
Multivariate anomaly detection for Earth observations: a comparison of algorithms and feature extraction techniques.
Earth System Dynamics. 8(3): 677-696. 2017.
Deep bilinear features for Her2 scoring in digital pathology.
Current Directions in Biomedical Engineering. 3(2): 811-814. 2017.
Fast Learning and Prediction for Object Detection using Whitened CNN Features.
arXiv preprint arXiv:1704.02930. 2017.
Automatic Classification of Cancerous Tissue in Laserendomicroscopy Images of the Oral Cavity using Deep Learning.
Scientific Reports. 7(1): 41598-017. 2017.
Maximally Divergent Intervals for Extreme Weather Event Detection.
MTS/IEEE OCEANS Conference Aberdeen. 1-9. 2017. Abstract: We approach the task of detecting anomalous or extreme events in multivariate spatio-temporal climate data using an unsupervised machine learning algorithm for detection of anomalous intervals in time-series. In contrast to many existing algorithms for outlier and anomaly detection, our method does not search for point-wise anomalies, but for contiguous anomalous intervals. We demonstrate the suitability of our approach through numerous experiments on climate data, including detection of hurricanes, North Sea storms, and low-pressure fields.
2016
Semantic Volume Segmentation with Iterative Context Integration for Bio-medical Image Stacks.
Pattern Recognition and Image Analysis. Advances in Mathematical Theory and Applications (PRIA). 26(1): 197-204. 2016. Abstract: Automatic recognition of biological structures like membranes or synapses is important to analyze organic processes and to understand their functional behavior. To achieve this, volumetric images taken by electron microscopy or computed tomography have to be segmented into meaningful semantic regions. We are extending iterative context forests, which were developed for 2D image data, to image stack segmentation. In particular, our method is able to learn high-order dependencies and import contextual information, which often cannot be learned by the conventional Markov random field approaches usually used for this task. Our method is tested on very different and challenging medical and biological segmentation tasks.
Fine-tuning Deep Neural Networks in Continuous Learning Scenarios.
ACCV Workshop on Interpretation and Visualization of Deep Neural Nets (ACCV-WS). 2016. Abstract: The revival of deep neural networks and the availability of ImageNet laid the foundation for recent success in highly complex recognition tasks. However, ImageNet does not cover all visual concepts of all possible application scenarios. Hence, application experts still record new data constantly and expect the data to be used as soon as it becomes available. In this paper, we follow this observation and apply the classical concept of fine-tuning deep neural networks to scenarios where data from known or completely new classes is continuously added. Besides a straightforward realization of continuous fine-tuning, we empirically analyze how the computational burden of training can be further reduced. Finally, we visualize how the network's attention maps evolve over time, which allows for visually investigating what the network learned during continuous fine-tuning.
Convolutional Neural Networks as a Computational Model for the Underlying Processes of Aesthetics Perception.
ECCV Workshop on Computer Vision for Art Analysis. 2016.
Multivariate Anomaly Detection for Earth Observations: A Comparison of Algorithms and Feature Extraction Techniques.
Earth System Dynamics. 2016. (in discussion)
Using Statistical Process Control for detecting anomalies in multivariate spatiotemporal Earth Observations.
European Geosciences Union General Assembly. 2016.
ImageNet pre-trained models with batch normalization.
CoRR. 2016. Abstract: Convolutional neural networks (CNN) pre-trained on ImageNet are the backbone of most state-of-the-art approaches. In this paper, we present a new set of pre-trained models with popular state-of-the-art architectures for the Caffe framework. The first release includes Residual Networks (ResNets) with a generation script as well as the batch-normalization variants of AlexNet and VGG19. All models outperform previous models with the same architecture. The models and training code are available at http://www.inf-cv.uni-jena.de/Research/CNN+Models.html and https://github.com/cvjena/cnn-models.
Neither Quick Nor Proper -- Evaluation of QuickProp for Learning Deep Neural Networks.
arXiv preprint arXiv:1606.04333. 2016. Abstract: Neural networks and especially convolutional neural networks are of great interest in current computer vision research. However, many techniques, extensions, and modifications have been published in the past that are not yet used by current approaches. In this paper, we study the application of a method called QuickProp for training deep neural networks. In particular, we apply QuickProp during learning and testing of fully convolutional networks for the task of semantic segmentation. We compare QuickProp empirically with gradient descent, which is the current standard method. Experiments suggest that QuickProp cannot compete with standard gradient descent techniques for complex computer vision tasks like semantic segmentation.
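QuickProp treats each weight's error curve as a parabola and jumps towards its estimated minimum using the current and previous gradient. The pure-Python sketch below shows Fahlman's update rule, Δw(t) = g(t) / (g(t-1) - g(t)) · Δw(t-1) with g = ∂E/∂w, for a single weight. The growth limit of 1.75 is the commonly cited default. The learning rate for the initial gradient-descent step, the fallback rule, and the function name are assumptions. This is not the fully convolutional training setup evaluated in the paper.

```python
import math
from typing import Callable


def quickprop_1d(grad: Callable[[float], float], w0: float, lr: float = 0.1,
                 max_growth: float = 1.75, steps: int = 20) -> float:
    """Minimize a scalar function of one weight with QuickProp updates."""
    w = w0
    g_prev = grad(w)
    dw_prev = -lr * g_prev  # bootstrap with one plain gradient-descent step
    w += dw_prev
    for _ in range(steps):
        g = grad(w)
        denom = g_prev - g
        if dw_prev == 0.0 or denom == 0.0:
            dw = -lr * g  # no usable secant information: fall back to gradient descent
        else:
            dw = g / denom * dw_prev  # jump to the minimum of the fitted parabola
            limit = max_growth * abs(dw_prev)
            if abs(dw) > limit:  # cap step growth to avoid huge jumps
                dw = math.copysign(limit, dw)
        w += dw
        g_prev, dw_prev = g, dw
    return w


# E(w) = (w - 3)^2 has gradient 2 (w - 3) and its minimum at w = 3.
print(quickprop_1d(lambda w: 2.0 * (w - 3.0), w0=0.0))
```

On an exactly quadratic error surface, the secant step lands on the minimum once the growth cap stops binding. Real network losses are far from separable parabolas in each weight, which helps explain why such a rule can struggle on complex tasks.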
Watch, Ask, Learn, and Improve: A Lifelong Learning Cycle for Visual Recognition.
European Symposium on Artificial Neural Networks (ESANN). 381-386. 2016. Abstract: We present WALI, a prototypical system that learns object categories over time by continuously watching online videos. WALI actively asks questions to a human annotator about the visual content of observed video frames. Thereby, WALI is able to receive information about new categories and to simultaneously improve its generalization abilities. The functionality of WALI is driven by scalable active learning, efficient incremental learning, as well as state-of-the-art visual descriptors. In our experiments, we show qualitative and quantitative statistics about WALI's learning process. WALI runs continuously and regularly asks questions.
Vegetation segmentation in cornfield images using bag of words.
Advanced Concepts for Intelligent Vision Systems (ACIVS). 193-204. 2016. Abstract: We provide an alternative methodology for vegetation segmentation in cornfield images. The process comprises two main steps, which constitute the main contribution of this approach: (a) a low-level segmentation and (b) a class label assignment using a Bag of Words (BoW) representation in conjunction with a supervised learning framework. The experimental results show that our proposal is adequate for extracting green plants in images of maize fields. The classification accuracy is 95.3%, which is comparable to values in the current literature.
Fine-grained Recognition in the Noisy Wild: Sensitivity Analysis of Convolutional Neural Networks Approaches.
British Machine Vision Conference (BMVC). 2016.
Large-scale Active Learning with Approximated Expected Model Output Changes.
German Conference on Pattern Recognition (GCPR). 179-191. 2016. Abstract: Incremental learning of visual concepts is one step towards reaching human capabilities beyond closed-world assumptions. Despite recent progress, it remains one of the fundamental challenges in computer vision and machine learning. Along that path, techniques are needed that allow for actively selecting informative examples from a huge pool of unlabeled images to be annotated by application experts. Whereas a multitude of active learning techniques exists, they commonly suffer from one of two drawbacks: (i) either they do not work reliably on challenging real-world data or (ii) they are kernel-based and do not scale to the magnitudes of data that current vision applications need to deal with. Therefore, we present an active learning and discovery approach that can deal with huge collections of unlabeled real-world data. Our approach is based on the expected model output change principle and overcomes previous scalability issues. We present experiments on the large-scale MS-COCO dataset and on a dataset provided by biodiversity researchers. The results reveal that our technique clearly improves accuracy after just a few annotations. At the same time, it outperforms previous active learning approaches in academic and real-world scenarios.
SeaCLEF 2016: Object Proposal Classification for Fish Detection in Underwater Videos.
Working Notes of CLEF 2016 - Conference and Labs of the Evaluation forum. 481-489. 2016.
Chimpanzee Faces in the Wild: Log-Euclidean CNNs for Predicting Identities and Attributes of Primates.
German Conference on Pattern Recognition (GCPR). 51-63. 2016. Abstract: In this paper, we investigate how to predict attributes of chimpanzees such as identity, age, age group, and gender. We build on convolutional neural networks, which lead to significantly better results than previous state-of-the-art hand-crafted recognition pipelines. In addition, we show how to further increase the discrimination abilities of CNN activations by applying the Log-Euclidean framework on top of bilinear pooling. Finally, we introduce two curated datasets consisting of chimpanzee faces with detailed meta-information to stimulate further research. Our results can serve as the foundation for automated large-scale animal monitoring and analysis.
Maximally Divergent Intervals for Anomaly Detection.
ICML Workshop on Anomaly Detection (ICML-WS). 2016. Best Paper Award.
Impatient DNNs - Deep Neural Networks with Dynamic Time Budgets.
British Machine Vision Conference (BMVC). 2016.
Active and Continuous Exploration with Deep Neural Networks and Expected Model Output Changes.
NIPS Workshop on Continual Learning and Deep Networks (NIPS-WS). 2016. Abstract: The demands on visual recognition systems do not end with the complexity offered by current large-scale image datasets, such as ImageNet. In consequence, we need curious and continuously learning algorithms that actively acquire knowledge about semantic concepts which are present in available unlabeled data. As a step towards this goal, we show how to perform continuous active learning and exploration, where an algorithm actively selects relevant batches of unlabeled examples for annotation. These examples could either belong to already known or to yet undiscovered classes. Our algorithm is based on a new generalization of the Expected Model Output Change principle for deep architectures and is especially tailored to deep neural networks. Furthermore, we show easy-to-implement approximations that yield efficient techniques for active selection. Empirical experiments show that our method outperforms currently used heuristics.
2015
Active Learning and Discovery of Object Categories in the Presence of Unnameable Instances.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 4343-4352. 2015. Abstract: Current visual recognition algorithms are "hungry" for data, but massive annotation is extremely costly. Therefore, active learning algorithms are required that reduce labeling efforts to a minimum by selecting the examples that are most valuable for labeling. In active learning, all categories occurring in collected data are usually assumed to be known in advance, and experts should be able to label every requested instance. But do these assumptions really hold in practice? Could you name all categories in every image? Existing algorithms completely ignore the fact that there are certain examples for which an oracle cannot provide an answer or which do not even belong to the current problem domain. Ideally, active learning techniques should be able to discover new classes and at the same time cope with queries an expert is not able or willing to label. To meet these observations, we present a variant of the expected model output change principle for active learning and discovery in the presence of unnameable instances. Our experiments show that in these realistic scenarios, our approach substantially outperforms previous active learning methods, which are often not even able to improve with respect to the baseline of random query selection.
Efficient Convolutional Patch Networks for Scene Understanding.
CVPR Workshop on Scene Understanding (CVPR-WS). 2015. Abstract: In this paper, we present convolutional patch networks, which are convolutional (neural) networks (CNN) learned to distinguish different image patches and which can be used for pixel-wise labeling. We show how to easily learn spatial priors for certain categories jointly with their appearance. Experiments for urban scene understanding demonstrate state-of-the-art results on the LabelMeFacade dataset. Our approach is implemented as a new CNN framework especially designed for semantic segmentation with fully-convolutional architectures.
Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding.
International Conference on Computer Vision Theory and Applications (VISAPP). 510-517. 2015. Abstract: Classifying single image patches is important in many different applications, such as road detection or scene understanding. In this paper, we present convolutional patch networks, which are convolutional networks learned to distinguish different image patches and which can be used for pixel-wise labeling. We also show how to incorporate spatial information of the patch as an input to the network, which allows for learning spatial priors for certain categories jointly with an appearance model. In particular, we focus on road detection and urban scene understanding, two application areas where we are able to achieve state-of-the-art results on the KITTI as well as on the LabelMeFacade dataset. Furthermore, our paper offers a guideline for people working in the area and desperately wandering through all the painstaking details that render training CNNs on image patches extremely difficult.
Fine-grained Classification of Identity Document Types with Only One Example.
Machine Vision Applications (MVA). 126-129. 2015. Abstract: This paper shows how to recognize types of identity documents, such as passports, using state-of-the-art visual recognition approaches. Whereas recognizing individual parts on identity documents with a standardized layout is one of the old classics in computer vision, recognizing the type of the document, and therefore also the layout, is a challenging problem due to the large variation of the documents. In our paper, we evaluate different techniques for this application, including feature representations based on recent achievements with convolutional neural networks.
Fine-grained Recognition Datasets for Biodiversity Analysis.
CVPR Workshop on Fine-grained Visual Classification (CVPR-WS). 2015.
Automated analysis of confocal laser endomicroscopy images to detect head and neck cancer.
Head and Neck. 38(1): 2015.
Beyond Thinking in Common Categories: Predicting Obstacle Vulnerability using Large Random Codebooks.
Machine Vision Applications (MVA). 198-201. 2015. Abstract: Obstacle detection for advanced driver assistance systems has so far focused on building detectors for only a small number of categories, such as pedestrians and cars. However, vulnerable obstacles of other categories, such as wheelchairs and baby strollers, are often dismissed. In our work, we try to tackle this limitation by presenting an approach that is able to predict the vulnerability of an arbitrary obstacle independently of its category. This allows for using models not specifically tuned for category recognition. To classify the vulnerability, we apply a generic category-free approach based on large random bag-of-visual-words representations (BoW), where we make use of both the intensity image and a given disparity map. In experimental results, we achieve a classification accuracy of over 80% for predicting one of four vulnerability levels for each of the 10000 obstacle hypotheses detected in a challenging dataset of real urban street scenes. Vulnerability prediction in general and our working algorithm in particular pave the way to more advanced reasoning in autonomous driving and emergency route planning, as well as to reducing the false-positive rate of obstacle warning systems.
Analysis and Classification of Microscopy Images with Cell Border Distance Statistics.
Jahrestagung der Deutschen Gesellschaft für Medizinische Physik (DGMP). 2015.
Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks.
International Conference on Computer Vision (ICCV). 1143-1151. 2015. Abstract: Part models of object categories are essential for challenging recognition tasks, where differences in categories are subtle and only reflected in appearances of small parts of the object. We present an approach that is able to learn part models in a completely unsupervised manner, without part annotations and even without given bounding boxes during learning. The key idea is to find constellations of neural activation patterns computed using convolutional neural networks. In our experiments, we outperform existing approaches for fine-grained recognition on the CUB200-2011, Oxford PETS, and Oxford Flowers datasets in case no part or bounding box annotations are available, and we achieve state-of-the-art performance on the Stanford Dogs dataset. We also show the benefits of neural constellation models as a data augmentation technique for fine-tuning. Furthermore, our paper unites the areas of generic and fine-grained classification, since our approach is suitable for both scenarios.
Local Novelty Detection in Multi-class Recognition Problems.
IEEE Winter Conference on Applications of Computer Vision (WACV). 813-820. 2015.
Understanding Object Descriptions in Robotics by Open-vocabulary Object Retrieval and Detection.
International Journal of Robotics Research (IJRR). 35(1-3): 265-280. 2015.
2014
Instance-weighted Transfer Learning of Active Appearance Models.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1426-1433. 2014. Abstract: There has been a lot of work on face modeling, analysis, and landmark detection, with Active Appearance Models being one of the most successful techniques. A major drawback of these models is the large number of detailed annotated training examples needed for learning. Therefore, we present a transfer learning method that is able to learn from related training data using an instance-weighted transfer technique. Our method is derived using a generalization of importance sampling and in contrast to previous work we explicitly try to tackle the transfer already during learning instead of adapting the fitting process. In our studied application of face landmark detection, we efficiently transfer facial expressions from other human individuals and are thus able to learn a precise face Active Appearance Model only from neutral faces of a single individual. Our approach is evaluated on two common face datasets and outperforms previous transfer methods.
Open-vocabulary Object Retrieval.
Robotics: Science and Systems (RSS). 41, ISBN 978-0-9923747-0-9. 2014. Awarded an AAAI invited talk. Abstract: In this paper, we address the problem of retrieving objects based on open-vocabulary natural language queries: Given a phrase describing a specific object, e.g., the corn flakes box, the task is to find the best match in a set of images containing candidate objects. When naming objects, humans tend to use natural language with rich semantics, including basic-level categories, fine-grained categories, and instance-level concepts such as brand names. Existing approaches to large-scale object recognition fail in this scenario, as they expect queries that map directly to a fixed set of pre-trained visual categories, e.g., ImageNet synset tags. We address this limitation by introducing a novel object retrieval method. Given a candidate object image, we first map it to a set of words that are likely to describe it, using several learned image-to-text projections. We also propose a method for handling open vocabularies, i.e., words not contained in the training data. We then compare the natural language query to the sets of words predicted for each candidate and select the best match. Our method can combine category- and instance-level semantics in a common representation. We present extensive experimental results on several datasets using both instance-level and category-level matching and show that our approach can accurately retrieve objects based on extremely varied open-vocabulary queries. The source code of our approach will be publicly available together with pre-trained models and could be directly used for robotics applications.
Nonparametric Part Transfer for Fine-grained Recognition.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2489-2496. 2014. Abstract: In the following paper, we present an approach for fine-grained recognition based on a new part detection method. In particular, we propose a nonparametric label transfer technique that transfers part constellations from objects with similar global shapes. The possibility of transferring part annotations to unseen images allows for coping with a high degree of pose and view variations in scenarios where traditional detection models (such as deformable part models) fail. Our approach is especially valuable for fine-grained recognition scenarios where intraclass variations are extremely high and precisely localized features need to be extracted. Furthermore, we show the importance of carefully designed visual extraction strategies, such as the combination of complementary feature types and iterative image segmentation, and their impact on the recognition performance. In experiments, our simple yet powerful approach achieves 35.9% and 57.8% accuracy on the CUB-2010 and CUB-2011 bird datasets, which is the current best performance for these benchmarks.
Interactive Adaptation of Real-Time Object Detectors.
International Conference on Robotics and Automation (ICRA). 1282-1289. 2014. Abstract: In the following paper, we present a framework for quickly training 2D object detectors for robotic perception. Our method can be used by robotics practitioners to quickly (under 30 seconds per object) build a large-scale real-time perception system. In particular, we show how to create new detectors on the fly using large-scale internet image databases, thus allowing a user to choose among thousands of available categories to build a detection system suitable for the particular robotic application. Furthermore, we show how to adapt these models to the current environment with just a few in-situ images. Experiments on existing 2D benchmarks evaluate the speed, accuracy, and flexibility of our system.
Birds of a Feather Flock Together - Local Learning of Mid-level Representations for Fine-grained Recognition.
ECCV Workshop on Parts and Attributes (ECCV-WS). 2014.
Seeing through bag-of-visual-word glasses: towards understanding quantization effects in feature extraction methods.
International Conference on Pattern Recognition (ICPR) - FEAST workshop. 2014. Best Poster Award. Abstract: The bag-of-visual-word (BoW) model is one of the most common concepts for image categorization and feature extraction. Although our community has developed powerful BoW approaches for visual recognition and the model serves as a great ad-hoc solution, it has several drawbacks that most researchers might not be aware of. In this paper, we aim at seeing behind the curtains and point to some of the negative aspects of these approaches that usually go unnoticed: (i) although BoW approaches are often motivated by relating clusters to meaningful object parts, this relation does not hold in practice with low-dimensional features such as HOG and standard clustering methods, (ii) clusters can be chosen randomly without loss in performance, (iii) BoW is often only collecting background statistics, and (iv) cluster assignments are not robust to small spatial shifts. Furthermore, we show the effect of BoW quantization and the related loss of visual information by a simple inversion method called HoggleBoW.
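The quantization step discussed here is easy to see in a minimal bag-of-visual-words encoder. In the pure-Python sketch below, the codebook is drawn by random sampling of descriptors instead of k-means, echoing observation (ii). Each descriptor is hard-assigned to its nearest codeword, so all positional detail within a codeword's cell is discarded, which is the kind of information loss the abstract refers to. Descriptors are plain lists of floats standing in for features such as HOG. The function names are hypothetical, and this is not the HoggleBoW inversion method.

```python
import random
from typing import List, Sequence

Vector = List[float]


def random_codebook(descriptors: Sequence[Vector], k: int, seed: int = 0) -> List[Vector]:
    """Use k randomly sampled descriptors as codewords (no clustering)."""
    rng = random.Random(seed)
    return [list(d) for d in rng.sample(list(descriptors), k)]


def nearest_codeword(desc: Vector, codebook: List[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(desc, codebook[c])))


def bow_histogram(descriptors: Sequence[Vector], codebook: List[Vector]) -> List[float]:
    """Hard-assignment BoW: normalized counts of nearest-codeword indices."""
    counts = [0.0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


descs = [[0.1, 0.2], [9.5, 10.0], [0.0, -0.3], [0.2, 0.1]]
codebook = [[0.0, 0.0], [10.0, 10.0]]
print(bow_histogram(descs, codebook))  # [0.75, 0.25]
```

The three descriptors near the origin become indistinguishable once encoded, because only their shared codeword index survives in the histogram.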
Semantic Volume Segmentation with Iterative Context Integration.
Open German-Russian Workshop on Pattern Recognition and Image Understanding (OGRW). 220-225. 2014. Abstract: Automatic recognition of biological structures like membranes or synapses is important to analyze organic processes and to understand their functional behavior. To achieve this, volumetric images taken by electron microscopy or computed tomography have to be segmented into meaningful regions. We are extending iterative context forests, which were developed for 2D image data, to image stack segmentation. In particular, our method is able to learn high-order dependencies and import contextual information, which often cannot be learned by the conventional Markov random field approaches usually used for this task. Our method is tested on very different and challenging medical and biological segmentation tasks.
ARTOS -- Adaptive Real-Time Object Detection System.
arXiv preprint arXiv:1407.2721. 2014. Abstract: ARTOS is all about creating, tuning, and applying object detection models with just a few clicks. In particular, ARTOS facilitates learning of models for visual object detection by eliminating the burden of having to collect and annotate a large set of positive and negative samples manually and in addition it implements a fast learning technique to reduce the time needed for the learning step. A clean and friendly GUI guides the user through the process of model creation, adaptation of learned models to different domains using in-situ images, and object detection on both offline images and images from a video stream. A library written in C++ provides the main functionality of ARTOS with a C-style procedural interface, so that it can be easily integrated with any other project.
Selecting Influential Examples: Active Learning with Expected Model Output Changes.
European Conference on Computer Vision (ECCV). 562-577. 2014. Abstract: In this paper, we introduce a new general strategy for active learning. The key idea of our approach is to measure the expected change of model outputs, a concept that generalizes previous methods based on expected model change and incorporates the underlying data distribution. For each example of an unlabeled set, the expected change of model predictions is calculated and marginalized over the unknown label. This results in a score for each unlabeled example that can be used for active learning with a broad range of models and learning algorithms. In particular, we show how to derive very efficient active learning methods for Gaussian process regression, which implement this general strategy, and link them to previous methods. We analyze our algorithms and compare them to a broad range of previous active learning strategies in experiments showing that they outperform the state of the art on well-established benchmark datasets in the area of visual object recognition.
Part Detector Discovery in Deep Convolutional Neural Networks.
Asian Conference on Computer Vision (ACCV). 162-177. 2014. more ... Abstract: Current fine-grained classification approaches often rely on a robust localization of object parts to extract localized feature representations suitable for discrimination. However, part localization is a challenging task due to the large variation of appearance and pose. In this paper, we show how pre-trained convolutional neural networks can be used for robust and efficient object part discovery and localization without the necessity to actually train the network on the current dataset. Our approach, called part detector discovery (PDD), is based on analyzing the gradient maps of the network outputs and finding activation centers spatially related to annotated semantic parts or bounding boxes. This allows us not only to obtain excellent performance on the CUB200-2011 dataset but also, in contrast to previous approaches, to perform detection and bird classification jointly, requiring neither a given bounding box annotation during testing nor ground-truth parts during training. |
Part Localization by Exploiting Deep Convolutional Networks.
ECCV Workshop on Parts and Attributes (ECCV-WS). 2014. |
Exemplar-specific Patch Features for Fine-grained Recognition.
German Conference on Pattern Recognition (GCPR). 144-156. 2014. more ... Abstract: In this paper, we present a new approach for fine-grained recognition or subordinate categorization, tasks where an algorithm needs to reliably differentiate between visually similar categories, e.g., different bird species. While previous approaches aim at learning a single generic representation and models with increasing complexity, we propose an orthogonal approach that learns patch representations specifically tailored to every single test exemplar. Since we query a constant number of images similar to a given test image, we obtain very compact features and avoid large-scale training with all classes and examples. Our learned mid-level features are built on shape and color detectors estimated from discovered patches reflecting small, highly discriminative structures in the queried images. We evaluate our approach for fine-grained recognition on the CUB-2011 birds dataset and show that high recognition rates can be obtained by model combination. |
Asymmetric and Category Invariant Feature Transformations for Domain Adaptation.
International Journal of Computer Vision (IJCV). 109(1-2): 28-41. 2014. more ... Abstract: We address the problem of visual domain adaptation for transferring object models from one dataset or visual domain to another. We introduce a unified flexible model for both supervised and semi-supervised learning that allows us to learn transformations between domains. Additionally, we present two instantiations of the model, one for general feature adaptation/alignment, and one specifically designed for classification. First, we show how to extend metric learning methods for domain adaptation, allowing for learning metrics independent of the domain shift and the final classifier used. Furthermore, we go beyond classical metric learning by extending the method to asymmetric, category independent transformations. Our framework can adapt features even when the target domain does not have any labeled examples for some categories, and when the target and source features have different dimensions. Finally, we develop a joint learning framework for adaptive classifiers, which outperforms competing methods in terms of multi-class accuracy and scalability. We demonstrate the ability of our approach to adapt object recognition models under a variety of situations, such as differing imaging conditions, feature types, and codebooks. The experiments show its strong performance compared to previous approaches and its applicability to large-scale scenarios. |
2013
Transform-based Domain Adaptation for Big Data.
NIPS Workshop on New Directions in Transfer and Multi-Task Learning (NIPS-WS). 2013. abstract version of arXiv:1308.4200 more ... Abstract: Images seen at test time are often not from the same distribution as the images used for learning. This problem, known as domain shift, occurs when classifiers trained on object-centric internet image databases are applied directly to scene understanding tasks. The consequence is often severe performance degradation, which is one of the major barriers to applying classifiers in real-world systems. In this paper, we show how to learn transform-based domain adaptation classifiers in a scalable manner. The key idea is to exploit an implicit rank constraint, originating from a max-margin domain adaptation formulation, to make optimization tractable. Experiments show that the transformation between domains can be learned very efficiently from data and easily applied to new categories. |
Towards Adapting ImageNet to Reality: Scalable Domain Adaptation with Implicit Low-rank Transformations.
arXiv preprint arXiv:1308.4200. 2013. |
Fine-grained Categorization - Short Summary of our Entry for the ImageNet Challenge 2012.
arXiv preprint arXiv:1310.4759. 2013. |
Semi-Supervised Domain Adaptation with Instance Constraints.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 668-675. 2013. |
Approximations of Gaussian Process Uncertainties for Visual Recognition Problems.
Scandinavian Conference on Image Analysis (SCIA). 182-194. 2013. |
An Efficient Approximation for Gaussian Process Regression.
Technical Report TR-FSU-INF-CV-2013-01. 2013. |
I Want To Know More: Efficient Multi-Class Incremental Learning Using Gaussian Processes.
Pattern Recognition and Image Analysis. Advances in Mathematical Theory and Applications (PRIA). 23(3): 402-407. 2013. |
Automatic Identification of Novel Bacteria using Raman Spectroscopy and Gaussian Processes.
Analytica Chimica Acta. 29-37. 2013. |
Scalable Transform-based Domain Adaptation.
ICCV Workshop on Visual Domain Adaptation (ICCV-WS). 2013. |
Beyond the closed-world assumption: The importance of novelty detection and open set recognition.
GCPR Workshop on Unsolved Problems in Pattern Recognition (GCPR-WS). 2013. |
Efficient Learning of Domain-invariant Image Representations.
International Conference on Learning Representations (ICLR). 2013. |
One-class Classification with Gaussian Processes.
Pattern Recognition. 3507-3518. 2013. |
Segmentation of Microorganism in Complex Environments.
Pattern Recognition and Image Analysis. Advances in Mathematical Theory and Applications (PRIA). 23(4): 512-517. 2013. |
Labeling examples that matter: Relevance-Based Active Learning with Gaussian Processes.
German Conference on Pattern Recognition (GCPR). 282-291. 2013. more ... Abstract: Active learning is an essential tool to reduce manual annotation costs in the presence of large amounts of unlabeled data. In this paper, we introduce new active learning methods based on measuring the impact of a new example on the current model. This is done by deriving model changes of Gaussian process models in closed form. Furthermore, we study typical pitfalls in active learning and show that our methods automatically balance the exploration-exploitation trade-off. Experiments are performed with established benchmark datasets for visual object recognition and show that our new active learning techniques are able to outperform state-of-the-art methods. |
Kernel Null Space Methods for Novelty Detection.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 3374-3381. 2013. |
Large-Scale Gaussian Process Multi-Class Classification for Semantic Segmentation and Facade Recognition.
Machine Vision and Applications. 24(5): 1043-1053. 2013. |
2012
Large-Scale Gaussian Process Classification using Random Decision Forests.
Pattern Recognition and Image Analysis. Advances in Mathematical Theory and Applications (PRIA). 22(1): 113-120. 2012. |
Lernen mit wenigen Beispielen für die visuelle Objekterkennung (Learning with Few Examples for Visual Object Recognition).
Ausgezeichnete Informatikdissertationen 2011. 2012. in German |
Beyond Classification - Large-scale Gaussian Process Inference and Uncertainty Prediction.
Big Data Meets Computer Vision: First International Workshop on Large Scale Visual Recognition and Retrieval (NIPS-WS). 2012. This workshop article is a short abstract version of our ACCV'12 paper. |
Rapid Uncertainty Computation with Gaussian Processes and Histogram Intersection Kernels.
Asian Conference on Computer Vision (ACCV). 511-524. 2012. Best Paper Honorable Mention Award |
Large-Scale Gaussian Process Classification with Flexible Adaptive Histogram Kernels.
European Conference on Computer Vision (ECCV). 85-98. 2012. |
Efficient Semantic Segmentation with Gaussian Processes and Histogram Intersection Kernels.
International Conference on Pattern Recognition (ICPR). 3313-3316. 2012. |
As Time Goes By: Anytime Semantic Segmentation with Iterative Context Forests.
Symposium of the German Association for Pattern Recognition (DAGM). 1-10. 2012. |
Multi-Person Tracking-by-Detection based on Calibrated Multi-Camera Systems.
International Conference on Computer Vision and Graphics. 743-751. 2012. |
Semantic Segmentation with Millions of Features: Integrating Multiple Cues in a Combined Random Forest Approach.
Asian Conference on Computer Vision (ACCV). 218-231. 2012. |
Divergence-Based One-Class Classification Using Gaussian Processes.
British Machine Vision Conference (BMVC). 50.1-50.11. 2012. http://dx.doi.org/10.5244/C.26.50 |
2011
Efficient Multi-Class Incremental Learning Using Gaussian Processes.
Open German-Russian Workshop on Pattern Recognition and Image Understanding (OGRW). 182-185. 2011. more ... Abstract: One of the main assumptions in machine learning is that sufficient training data is available in advance and batch learning can be applied. However, because many applications are dynamic, this assumption will almost always break down over time. Therefore, classifiers have to be able to adapt themselves when new training data from existing or new classes becomes available, when training data changes, or when data should even be removed. In this paper, we present a method that allows efficient incremental learning of a Gaussian process classifier. Experimental results show the benefits in terms of computation time compared to building the classifier from scratch. |
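One common way to make Gaussian process training incremental, shown here only as a hedged sketch and not necessarily the method of the paper above, is to update the inverse of the regularized kernel matrix (K + noise * I)^{-1} with a block-matrix (Schur complement) formula when one example is appended. This costs O(n^2) per added example instead of the O(n^3) of re-inverting from scratch. The RBF kernel, the noise value, and the names `rbf` and `add_example` are assumptions; removing examples would need the corresponding downdate, which is not shown.

```python
import math


def rbf(a, b, length=1.0):
    # Squared-exponential kernel on scalar inputs (assumed kernel choice)
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def add_example(X, Kinv, x_new, noise=0.1):
    # Given Kinv = (K + noise*I)^{-1} for inputs X, return the inverse for X + [x_new]
    n = len(X)
    b = [rbf(xi, x_new) for xi in X]          # kernel column k(X, x_new)
    c = rbf(x_new, x_new) + noise             # new diagonal entry
    Kb = [sum(Kinv[i][j] * b[j] for j in range(n)) for i in range(n)]  # Kinv @ b
    s = c - sum(bi * kbi for bi, kbi in zip(b, Kb))  # Schur complement (positive)
    # Block inverse: [[Kinv + Kb Kb^T / s, -Kb / s], [-Kb^T / s, 1 / s]]
    new = [[Kinv[i][j] + Kb[i] * Kb[j] / s for j in range(n)] + [-Kb[i] / s]
           for i in range(n)]
    new.append([-kb / s for kb in Kb] + [1.0 / s])
    return X + [x_new], new
```

Starting from `X, Kinv = [], []` and calling `add_example` once per arriving sample builds the same inverse as a batch computation, so the predictive mean k(x, X) Kinv y can be refreshed without retraining from scratch.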
One-Class Classification for Anomaly Detection in Wire Ropes with Gaussian Processes in a Few Lines of Code.
Machine Vision Applications (MVA). 219-222. 2011. |
Detection of Microorganisms in Complex Microscopy Images.
Open German-Russian Workshop on Pattern Recognition and Image Understanding (OGRW). 115-118. 2011. |
Learning with Few Examples for Binary and Multiclass Classification Using Regularization of Randomized Trees.
Pattern Recognition Letters. 32(2): 244-251. 2011. |
Efficient Gaussian process classification using random decision forests.
Pattern Recognition and Image Analysis. Advances in Mathematical Theory and Applications (PRIA). 184-187. 2011. 10.1134/S1054661811020337 |
2010
A Fast Approach for Pixelwise Labeling of Facade Images.
International Conference on Pattern Recognition (ICPR). 3029-3032. 2010. |
One-Class Classification with Gaussian Processes.
Asian Conference on Computer Vision (ACCV). 489-500. 2010. |
One-Shot Learning of Object Categories using Dependent Gaussian Processes.
Annual Symposium of the German Association for Pattern Recognition (DAGM). 232-241. 2010. |
Efficient Gaussian Process Classification using Random Decision Forests.
International Conference on Pattern Recognition and Image Analysis (PRIA), St. Petersburg, Russia. 93-96. 2010. |
Multiple Kernel Gaussian Process Classification for Generic 3D Object Recognition From Time-of-Flight Images.
International Conference on Image and Vision Computing. 1-8. 2010. |
2009
Learning with Few Examples by Transferring Feature Relevance.
Annual Symposium of the German Association for Pattern Recognition (DAGM). 252-261. 2009. |
Randomized Probabilistic Latent Semantic Analysis for Scene Recognition.
Iberoamerican Congress on Pattern Recognition (CIARP). 945-953. 2009. |
Global Context Extraction for Object Recognition Using a Combination of Range and Visual Features.
Dynamic 3D Imaging Workshop. 96-109. 2009. |
2008
On Fusion of Range and Intensity Information Using Graph-Cut for Planar Patch Segmentation.
International Journal of Intelligent Systems Technologies and Applications. 5(3/4): 365-373. 2008. more ... Abstract: Planar patch detection aims at simplifying data from 3-D imaging sensors to a more compact scene description. We propose a fusion of intensity and depth information using Graph-Cut methods for this problem. Different known algorithms are additionally evaluated on low-resolution, high-frame-rate image sequences and used as an initialization for the Graph-Cut approach. In experiments, we show a significant improvement of the detected patch boundaries after the refinement with our method. |
Learning with Few Examples using a Constrained Gaussian Prior on Randomized Trees.
Vision, Modelling, and Visualization Workshop (VMV). 159-168. 2008. |
Difference of Boxes Filters Revisited: Shadow Suppression and Efficient Character Segmentation.
IAPR Workshop on Document Analysis Systems. 263-269. 2008. more ... Abstract: A robust segmentation is the most important part of an automatic character recognition system (e.g., document processing or license plate recognition). In our contribution, we present an efficient segmentation framework using a preprocessing step for shadow suppression combined with a local thresholding technique. The method is based on a combination of difference of boxes filters and a new ternary segmentation, which are both simple low-level image operations. We also draw parallels to a recently published work on a ganglion cell model and show that our approach is theoretically more substantiated as well as more robust and more efficient in practice. A systematic evaluation on noisy input data, as well as results on a large dataset of license plate images, shows the robustness and efficiency of our proposed method. Our results can be easily applied to any optical character recognition system, resulting in an impressive gain in robustness against nonlinear illumination. |
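A minimal pure-Python sketch of a difference of boxes (DoB) filter, assuming it is computed as the mean over a small box minus the mean over a larger box centred on the same pixel, with box sums taken from an integral image. The three-way labelling by a threshold `t` is only a hypothetical stand-in for the paper's ternary segmentation; the box radii, the threshold, and all function names are assumptions. Because both boxes are symmetric around the pixel, a linear brightness ramp yields equal means in both boxes away from the image border, which is one plausible reading of why such filters are robust to slowly varying shadows.

```python
def integral_image(img):
    # S[y][x] holds the sum of img over rows < y and columns < x
    h, w = len(img), len(img[0])
    S = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            S[y + 1][x + 1] = img[y][x] + S[y][x + 1] + S[y + 1][x] - S[y][x]
    return S


def box_mean(S, y, x, r, h, w):
    # Mean over the (2r+1) x (2r+1) box centred at (y, x), clipped to the image
    y0, y1 = max(0, y - r), min(h, y + r + 1)
    x0, x1 = max(0, x - r), min(w, x + r + 1)
    total = S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]
    return total / ((y1 - y0) * (x1 - x0))


def difference_of_boxes(img, r_small=1, r_large=4):
    # Local contrast: small-box mean minus large-box mean at every pixel
    h, w = len(img), len(img[0])
    S = integral_image(img)
    return [[box_mean(S, y, x, r_small, h, w) - box_mean(S, y, x, r_large, h, w)
             for x in range(w)] for y in range(h)]


def ternary_labels(dob, t=10.0):
    # -1 = darker than surroundings (candidate character), +1 = brighter, 0 = undecided
    return [[-1 if v < -t else (1 if v > t else 0) for v in row] for row in dob]
```

With an integral image, each box mean costs four lookups regardless of box size, which is what keeps this kind of low-level filter cheap.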