
IMAGE PROCESSING 2017


Video Saliency Detection via Spatial-Temporal Fusion and Low-Rank Coherency Diffusion

Abstract:
This paper advocates a novel video saliency detection method based on spatial-temporal saliency fusion and low-rank coherency guided saliency diffusion. In sharp contrast to conventional methods, which conduct saliency detection locally in a frame-by-frame manner and can easily give rise to incorrect low-level saliency maps, this paper proposes to fuse color saliency based on global motion clues in a batch-wise fashion, and further proposes low-rank coherency guided spatial-temporal saliency diffusion to guarantee the temporal smoothness of the saliency maps. Meanwhile, a series of saliency boosting strategies is designed to further improve saliency accuracy. First, the original long-term video sequence is equally segmented into many short-term frame batches, and the motion clues of each video batch are integrated and diffused temporally to facilitate the computation of color saliency. Then, based on the obtained saliency clues, inter-batch saliency priors are modeled to guide the low-level saliency fusion. After that, both the raw color information and the fused low-level saliency are regarded as low-rank coherency clues, which are employed to guide the spatial-temporal saliency diffusion, with an additional permutation matrix serving as an alternative rank selection strategy. This guarantees the robustness of the saliency map's temporal consistency and further boosts the accuracy of the computed saliency maps. Moreover, we conduct extensive experiments on five publicly available benchmarks and make comprehensive quantitative comparisons between our method and 16 state-of-the-art techniques. All the results demonstrate the superiority of our method in accuracy, reliability, robustness, and versatility.


Video Anomaly Detection With Compact Feature Sets for Online Performance

Abstract:
Over the past decade, video anomaly detection has been explored with remarkable results. However, research on methodologies suitable for online performance is still very limited. In this paper, we present an online framework for video anomaly detection. The key aspect of our framework is a compact set of highly descriptive features, which is extracted from a novel cell structure that helps to define support regions in a coarse-to-fine fashion. Based on the scene’s activity, only a limited number of support regions are processed, thus limiting the size of the feature set. Specifically, we use foreground occupancy and optical flow features. The framework uses an inference mechanism that evaluates the compact feature set via Gaussian Mixture Models, Markov Chains, and Bag-of-Words in order to detect abnormal events. Our framework also considers the joint response of the models in the local spatio-temporal neighborhood to increase detection accuracy. We test our framework on popular existing data sets and on a new data set comprising a wide variety of realistic videos captured by surveillance cameras. This particular data set includes surveillance videos depicting criminal activities, car accidents, and other dangerous situations. Evaluation results show that our framework outperforms other online methods and attains a very competitive detection performance compared with state-of-the-art non-online methods.


Variational Bayesian Approach to Multiframe Image Restoration

Abstract:
Image restoration is a fundamental problem in the field of image processing. The key objective of image restoration is to recover clean images from images degraded by noise and blur. Recently, a family of new statistical techniques called variational Bayes (VB) has been introduced to image restoration, which enables us to automatically tune parameters that control restoration. Although information from one image is often insufficient for high-quality restoration, current state-of-the-art methods of image restoration via VB approaches use only a single degraded image to recover a clean image. In this paper, we propose a novel method of multiframe image restoration via a VB approach, which can achieve higher image quality while tuning parameters automatically. Given multiple degraded images, this method jointly estimates a clean image and other parameters, including an image warping parameter introduced for the use of multiple images, through Bayesian inference that we enable by making full use of VB techniques. Through various experiments, we demonstrate the effectiveness of our multiframe method by comparing it with its single-frame counterpart, and also show the advantages of our VB approach over non-VB approaches.


Toward More Accurate Iris Recognition Using Cross-Spectral Matching

Abstract:
Iris recognition systems are increasingly deployed for large-scale applications such as national ID programs, which continue to acquire millions of iris images to establish identity among billions. However, with the availability of a variety of iris sensors deployed for iris imaging under different illumination and environmental conditions, significant performance degradation is expected when matching iris images acquired under two different domains (either sensor-specific or wavelength-specific). This paper develops a domain adaptation framework to address this problem and introduces a new algorithm using a Markov random fields model to significantly improve cross-domain iris recognition. The proposed domain adaptation framework, based on naive Bayes nearest neighbor classification, uses a real-valued feature representation that is capable of learning domain knowledge. Our approach, which estimates corresponding visible iris patterns by synthesizing iris patches from near-infrared iris images, achieves superior results for cross-spectral iris recognition. In addition, a new class of bi-spectral iris recognition system that can simultaneously acquire visible and near-infrared images with pixel-to-pixel correspondence is proposed and evaluated. This paper presents experimental results on three publicly available databases, the PolyU cross-spectral iris image database, IIITD CLI, and the UND database, and achieves superior results for cross-sensor and cross-spectral iris matching.


The Shape Interaction Matrix-Based Affine Invariant Mismatch Removal for Partial-Duplicate Image Search

Abstract:
Mismatch removal is a key step in many computer vision problems. In this paper, we handle the mismatch removal problem by adopting the shape interaction matrix (SIM). Given the homogeneous coordinates of two corresponding point sets, we first compute the SIMs of the two point sets. Then, we detect the mismatches by picking out the most different entries between the two SIMs. Even under strong affine transformations, outliers, noise, and burstiness, our method can still work well. To the best of our knowledge, this is the first non-iterative mismatch removal method that achieves affine invariance. Extensive results on synthetic 2-D point matching data sets and real image matching data sets verify the effectiveness, efficiency, and robustness of our method in removing mismatches. Moreover, when applied to partial-duplicate image search, our method reaches higher retrieval precision with a shorter time cost compared with the state-of-the-art geometric verification methods.
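
The SIM of a point set is the projector onto the row space of its homogeneous coordinate matrix, which is what makes the comparison affine invariant. Below is a minimal sketch of this idea in Python/NumPy; the per-point scoring rule (row sums of the SIM difference) and the fixed rejection count are illustrative simplifications, not the paper's exact entry-selection procedure.

# A minimal sketch of Shape Interaction Matrix (SIM) based mismatch
# detection, assuming the matched points of the two images are given as
# homogeneous 3 x n arrays. The scoring and rejection count below are
# illustrative choices, not the authors' exact procedure.
import numpy as np

def shape_interaction_matrix(X, rank=3):
    """SIM of a 3 x n homogeneous point matrix: Q = Vr Vr^T."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vr = Vt[:rank].T                # n x rank right singular vectors
    return Vr @ Vr.T                # n x n, unchanged under X' = A X for invertible A

def detect_mismatches(X1, X2, n_reject=10):
    Q1 = shape_interaction_matrix(X1)
    Q2 = shape_interaction_matrix(X2)
    # Score each correspondence by how much its row of the SIM differs.
    score = np.abs(Q1 - Q2).sum(axis=1)
    return np.argsort(score)[-n_reject:]    # indices of likely mismatches

# Toy usage: 50 affine-related inliers plus 10 corrupted matches.
rng = np.random.default_rng(0)
pts = rng.normal(size=(2, 60))
A = np.array([[1.2, 0.3, 5.0], [-0.2, 0.9, -3.0], [0.0, 0.0, 1.0]])
X1 = np.vstack([pts, np.ones(60)])
X2 = A @ X1
X2[:2, 50:] = rng.normal(size=(2, 10)) * 10     # corrupt the last 10 matches
print(sorted(detect_mismatches(X1, X2)))        # likely dominated by indices 50..59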


Structure-From-Motion in Spherical Video Using the von Mises-Fisher Distribution

Abstract:
In this paper, we present a complete pipeline for computing structure-from-motion from sequences of spherical images. We revisit problems from multiview geometry in the context of spherical images. In particular, we propose methods suited to spherical camera geometry for the spherical-n-point problem (estimating camera pose for a spherical image) and calibrated spherical reconstruction (estimating the position of a 3-D point from multiple spherical images). We introduce a new probabilistic interpretation of spherical structure-from-motion which uses the von Mises-Fisher distribution to model noise in spherical feature point positions. This model provides an alternative objective function that we use in bundle adjustment. We evaluate our methods quantitatively and qualitatively on both synthetic and real-world data, and show that our methods developed for spherical images outperform straightforward adaptations of methods developed for perspective images. As an application of our method, we use the structure-from-motion output to stabilise the viewing direction in fully spherical video.
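
For intuition, the vMF density on the unit sphere is proportional to exp(kappa * mu^T x), so the bundle-adjustment objective trades squared reprojection error for a sum of (scaled) cosine similarities between observed and predicted bearing directions. The sketch below illustrates that objective under an assumed, simplified camera parametrization; it is not the authors' full pipeline.

# A minimal sketch of the von Mises-Fisher (vMF) bundle-adjustment
# objective for spherical images: observed feature directions are unit
# vectors, and the cost replaces squared reprojection error with the
# negative vMF log-likelihood, which (up to constants) is -kappa * mu^T x.
# The camera/point parametrization here is a simplification for illustration.
import numpy as np

def predict_bearing(point_3d, cam_center, cam_R):
    """Unit direction from a spherical camera to a 3-D point."""
    d = cam_R @ (point_3d - cam_center)
    return d / np.linalg.norm(d)

def vmf_cost(observed_dirs, predicted_dirs, kappa=100.0):
    """Sum of negative vMF log-likelihoods (constants dropped):
    maximizing mu^T x means minimizing -kappa * cos(angular error)."""
    cos_sim = np.sum(observed_dirs * predicted_dirs, axis=1)
    return float(np.sum(-kappa * cos_sim))

# Toy check: the cost decreases as predictions align with observations.
rng = np.random.default_rng(1)
obs = rng.normal(size=(20, 3)); obs /= np.linalg.norm(obs, axis=1, keepdims=True)
noisy = obs + 0.05 * rng.normal(size=obs.shape)
noisy /= np.linalg.norm(noisy, axis=1, keepdims=True)
print(vmf_cost(obs, noisy) > vmf_cost(obs, obs))    # True: a perfect fit is cheaper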


Structure-Based Low-Rank Model With Graph Nuclear Norm Regularization for Noise Removal

Abstract:
Nonlocal image representation methods, including group-based sparse coding and block-matching 3-D filtering, have shown great performance in low-level vision tasks. The nonlocal prior is extracted from each group consisting of patches with similar intensities. Grouping patches based on intensity similarity alone, however, introduces disturbance and inaccuracy into the estimation of the true image. To address this problem, we propose a structure-based low-rank model with graph nuclear norm regularization. We exploit the local manifold structure inside a patch and group the patches by a distance metric on the manifold structure. With the manifold structure information, a graph nuclear norm regularization is established and incorporated into a low-rank approximation model. We then prove that the graph-based regularization is equivalent to a weighted nuclear norm and that the proposed model can be solved by a weighted singular-value thresholding algorithm. Extensive experiments on additive white Gaussian noise removal and mixed noise removal demonstrate that the proposed method achieves better performance than several state-of-the-art algorithms.
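
The weighted singular-value thresholding step mentioned above has a simple closed form: shrink each singular value by its own weight. A minimal NumPy sketch follows, with placeholder uniform weights standing in for the graph-derived weights of the paper.

# A minimal sketch of the weighted singular-value thresholding (WSVT)
# step used to solve the weighted-nuclear-norm model. The weights below
# are placeholders; in the paper they are derived from the graph
# (manifold-structure) term.
import numpy as np

def weighted_svt(Y, weights):
    """argmin_X 0.5 * ||X - Y||_F^2 + sum_i w_i * sigma_i(X),
    a valid closed form when the weights are non-descending."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - weights, 0.0)     # per-value soft threshold
    return (U * s_shrunk) @ Vt

# Toy usage: denoise a noisy low-rank patch-group matrix.
rng = np.random.default_rng(0)
L = rng.normal(size=(64, 8)) @ rng.normal(size=(8, 40))   # rank-8 "clean" groups
Y = L + 0.5 * rng.normal(size=L.shape)                    # noisy observation
w = np.full(min(Y.shape), 3.0)                            # placeholder uniform weights
X = weighted_svt(Y, w)
print(np.linalg.norm(X - L) < np.linalg.norm(Y - L))      # typically True: closer to clean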


Single-Scale Fusion: An Effective Approach to Merging Images

Abstract:
Due to its robustness and effectiveness, multi-scale fusion (MSF) based on the Laplacian pyramid decomposition has emerged as a popular technique that has shown utility in many applications. Guided by several intuitive measures (weight maps), the MSF process is versatile and straightforward to implement. However, the number of pyramid levels increases with the image size, which implies sophisticated data management and memory accesses, as well as additional computations. Here, we introduce a simplified formulation that reduces MSF to a single-level process. Starting from the MSF decomposition, we explain both mathematically and intuitively (visually) a way to simplify the classical MSF approach with minimal loss of information. The resulting single-scale fusion (SSF) solution is a close approximation of the MSF process that eliminates important redundant computations. It also provides insights into why MSF is so effective. While our simplified expression is derived in the context of high dynamic range imaging, we show its generality on several well-known fusion-based applications, such as image compositing, extended depth of field, medical imaging, and blending thermal (infrared) images with visible light. Besides visual validation, quantitative evaluations demonstrate that our SSF strategy is able to yield results that are highly competitive with traditional MSF approaches.
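
For reference, the classical MSF baseline that SSF approximates blends the Laplacian pyramids of the inputs with Gaussian pyramids of the normalized weight maps. The following sketch (Python with OpenCV, grayscale inputs, and a placeholder contrast-based weight map) shows that baseline; SSF collapses this multi-level process into a single level.

# A minimal sketch of classical multi-scale fusion (MSF): Laplacian
# pyramids of the inputs are blended with Gaussian pyramids of the
# normalized weight maps. Grayscale 8-bit inputs assumed; the
# contrast-based weight map is a placeholder measure.
import cv2
import numpy as np

def gaussian_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    gp = gaussian_pyramid(img, levels)
    lp = [gp[i] - cv2.pyrUp(gp[i + 1], dstsize=gp[i].shape[1::-1])
          for i in range(levels - 1)]
    lp.append(gp[-1])                       # coarsest residual
    return lp

def msf_fuse(images, levels=5):
    # Placeholder weights: local contrast via the absolute Laplacian response.
    weights = [np.abs(cv2.Laplacian(img, cv2.CV_32F)) + 1e-6 for img in images]
    norm = sum(weights)
    weights = [w / norm for w in weights]
    fused = None
    for img, w in zip(images, weights):
        lp = laplacian_pyramid(img.astype(np.float32), levels)
        wp = gaussian_pyramid(w, levels)
        layers = [l * p for l, p in zip(lp, wp)]    # weighted band-pass layers
        fused = layers if fused is None else [f + l for f, l in zip(fused, layers)]
    # Collapse the fused pyramid from coarse to fine.
    out = fused[-1]
    for layer in reversed(fused[:-1]):
        out = cv2.pyrUp(out, dstsize=layer.shape[1::-1]) + layer
    return np.clip(out, 0, 255).astype(np.uint8)

# Toy usage on three random "exposures".
imgs = [(np.random.rand(64, 64) * 255).astype(np.uint8) for _ in range(3)]
print(msf_fuse(imgs).shape)     # (64, 64)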


Single Image Super-Resolution via Adaptive High-Dimensional Non-Local Total Variation and Adaptive Geometric Feature

Abstract:
Single image super-resolution (SR) is very important in many computer vision systems. However, as a highly ill-posed problem, its performance mainly relies on prior knowledge. Among these priors, the non-local total variation (NLTV) prior is very popular and has been thoroughly studied in recent years. Nevertheless, technical challenges remain. Because NLTV exploits only a fixed, non-shifted target patch in the patch search process, a lack of similar patches is inevitable in some cases. Thus, the non-local similarity cannot be fully characterized, and the effectiveness of NLTV cannot be ensured. Motivated by the observation that more accurate non-local similar patches can be found by using shifted target patches, a novel multishifted similar-patch search (MSPS) strategy is proposed. With this strategy, NLTV is extended into a super-high-dimensional NLTV (SHNLTV) prior that fully exploits the underlying non-local similarity. However, as SHNLTV is very high-dimensional, applying it directly to SR is difficult. To solve this problem, a novel statistics-based dimension reduction strategy is proposed and applied to SHNLTV, yielding a more computationally effective prior that we call adaptive high-dimensional non-local total variation (AHNLTV). In AHNLTV, a novel joint weight strategy that fully exploits the potential of the MSPS-based non-local similarity is proposed. To further boost the performance of AHNLTV, the adaptive geometric duality (AGD) prior is also incorporated. Finally, an efficient split Bregman iteration-based algorithm is developed to solve the AHNLTV-AGD-driven minimization problem. Extensive experiments validate that the proposed method achieves better results than many state-of-the-art SR methods in terms of both objective and subjective quality.


Semi-Supervised Sparse Representation Based Classification for Face Recognition With Insufficient Labeled Samples

Abstract:
This paper addresses the problem of face recognition when only a few, or even a single, labeled examples of the face that we wish to recognize are available. Moreover, these examples are typically corrupted by nuisance variables, both linear (i.e., additive nuisance variables, such as bad lighting and the wearing of glasses) and non-linear (i.e., non-additive pixel-wise nuisance variables, such as expression changes). The small number of labeled examples means that it is hard to remove these nuisance variables between the training and testing faces to obtain good recognition performance. To address the problem, we propose a method called semi-supervised sparse representation-based classification. This builds on recent work on sparsity, in which faces are represented in terms of two dictionaries: a gallery dictionary consisting of one or more examples of each person, and a variation dictionary representing linear nuisance variables (e.g., different lighting conditions and different glasses). The main idea is that we use the variation dictionary to characterize the linear nuisance variables via the sparsity framework, and prototype face images are estimated as a gallery dictionary via a Gaussian mixture model, with mixed labeled and unlabeled samples in a semi-supervised manner, to deal with the non-linear nuisance variations between labeled and unlabeled samples. We have conducted experiments with insufficient labeled samples, even when there is only a single labeled sample per person. Our results on the AR, Multi-PIE, CAS-PEAL, and LFW databases demonstrate that the proposed method delivers significantly improved performance over existing methods.
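
At its core, the classification step codes a test face y as y ~ G*alpha + V*beta over the concatenated gallery and variation dictionaries and assigns the class with the smallest class-wise residual. The sketch below illustrates this step with an off-the-shelf Lasso solver; the semi-supervised GMM estimation of the gallery prototypes, which is the paper's contribution, is omitted.

# A minimal sketch of sparse-representation classification with a
# gallery dictionary G and a shared variation dictionary V. The Lasso
# penalty and the toy dictionaries are illustrative choices.
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(y, G, V, gallery_labels, alpha=0.01):
    D = np.hstack([G, V])                       # combined dictionary
    coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    coder.fit(D, y)
    c = coder.coef_
    a, b = c[:G.shape[1]], c[G.shape[1]:]       # gallery / variation codes
    residuals = {}
    for lbl in np.unique(gallery_labels):
        mask = gallery_labels == lbl
        # Residual using only this class's gallery atoms (variation shared).
        recon = G[:, mask] @ a[mask] + V @ b
        residuals[lbl] = np.linalg.norm(y - recon)
    return min(residuals, key=residuals.get)

# Toy usage: 3 people with 2 gallery images each, plus 4 variation atoms.
rng = np.random.default_rng(0)
G = rng.normal(size=(100, 6))
V = rng.normal(size=(100, 4))
labels = np.array([0, 0, 1, 1, 2, 2])
y = G[:, 2] + 0.3 * V[:, 1]                     # person 1 plus a variation
print(src_classify(y, G, V, labels))            # 1 (expected)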


Semi-Supervised Multi-View Discrete Hashing for Fast Image Search

Abstract:
Hashing is an important method for fast neighbor search over large-scale data sets in Hamming space. While most research on hash models focuses on single-view data, multi-view approaches, the majority of them unsupervised multi-view hash models, have recently been considered. Despite the existence of millions of unlabeled data samples, it is believed that labeling even a handful of data will remarkably improve search performance. In this paper, we propose a semi-supervised multi-view hash model. Besides incorporating a portion of label information into the model, the proposed model differs from existing multi-view hash models in three ways: 1) a composite discrete hash learning model that minimizes the loss jointly on multi-view features when using relaxation in learning hash codes; 2) the exploration of statistically uncorrelated multi-view features for generating hash codes; and 3) a composite locality preserving model for locally compact coding. Extensive experiments have been conducted to show the effectiveness of the proposed semi-supervised multi-view hash model compared with related multi-view hash models and semi-supervised hash models.


Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval

Abstract:
Deep convolutional neural network models pre-trained for the ImageNet classification task have been successfully adopted for tasks in other domains, such as texture description and object proposal generation, but these tasks require annotations for images in the new domain. In this paper, we focus on a novel and challenging task in the purely unsupervised setting: fine-grained image retrieval. Even with image labels, fine-grained images are difficult to classify, let alone in the unsupervised retrieval setting. We propose the selective convolutional descriptor aggregation (SCDA) method. SCDA first localizes the main object in fine-grained images, a step that discards the noisy background and keeps useful deep descriptors. The selected descriptors are then aggregated and reduced in dimensionality into a short feature vector using the best practices we found. SCDA is unsupervised, using no image label or bounding box annotation. Experiments on six fine-grained data sets confirm the effectiveness of SCDA for fine-grained image retrieval. Moreover, visualization of the SCDA features shows that they correspond to visual attributes (even subtle ones), which might explain SCDA's high mean average precision in fine-grained retrieval. On general image retrieval data sets, SCDA achieves retrieval results comparable with state-of-the-art general image retrieval approaches.
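
The selection step is simple enough to sketch: sum the convolutional activations over channels, keep positions above the mean of that map (an implicit object mask), and aggregate the surviving descriptors by average- and max-pooling. The sketch below follows that recipe in NumPy; the largest-connected-component cleanup and multi-layer ensembling described in the paper are omitted.

# A minimal sketch of the SCDA selection-and-aggregation step on a conv
# feature tensor of shape (H, W, C).
import numpy as np

def scda_feature(conv_feats):
    H, W, C = conv_feats.shape
    activation = conv_feats.sum(axis=2)             # (H, W) aggregation map
    mask = activation > activation.mean()           # unsupervised localization
    selected = conv_feats[mask]                     # (n_kept, C) descriptors
    if selected.size == 0:                          # degenerate fallback
        selected = conv_feats.reshape(-1, C)
    feat = np.concatenate([selected.mean(axis=0),   # avg-pooled part
                           selected.max(axis=0)])   # max-pooled part
    return feat / (np.linalg.norm(feat) + 1e-12)    # L2-normalize for retrieval

# Toy usage on a random "pool5"-like tensor.
feat = scda_feature(np.random.rand(7, 7, 512).astype(np.float32))
print(feat.shape)                                   # (1024,)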


Robust Multi-Exposure Image Fusion: A Structural Patch Decomposition Approach

Abstract:
We propose a simple yet effective structural patch decomposition approach for multi-exposure image fusion (MEF) that is robust to ghosting effects. We decompose an image patch into three conceptually independent components: signal strength, signal structure, and mean intensity. Upon fusing these three components separately, we reconstruct a desired patch and place it back into the fused image. This novel patch decomposition approach benefits MEF in many aspects. First, as opposed to most pixel-wise MEF methods, the proposed algorithm does not require post-processing steps to improve visual quality or to reduce spatial artifacts. Second, it handles RGB color channels jointly, and thus produces fused images with a more vivid color appearance. Third, and most importantly, the direction of the signal structure component in the patch vector space provides ideal information for ghost removal. It allows us to reliably and efficiently reject inconsistent object motions with respect to a chosen reference image without performing computationally expensive motion estimation. We compare the proposed algorithm with 12 MEF methods on 21 static scenes and 12 deghosting schemes on 19 dynamic scenes (with camera and object motion). Extensive experimental results demonstrate that the proposed algorithm not only outperforms previous MEF algorithms on static scenes but also consistently produces high-quality fused images with few ghosting artifacts for dynamic scenes. Moreover, it maintains a lower computational cost than the state-of-the-art deghosting schemes.
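
The decomposition itself is elementary: a patch x splits into mean intensity l, signal strength c = ||x - l||, and unit-norm structure s = (x - l)/c. The sketch below fuses each component separately in NumPy; the strength and structure rules follow the paper's spirit, while the mean-intensity rule is a simplified placeholder for a well-exposedness weighting.

# A minimal sketch of the structural patch decomposition: each
# co-located patch is split into strength, structure, and mean, the
# three are fused separately, and a patch is reconstructed.
import numpy as np

def decompose(patch):
    l = patch.mean()
    d = patch - l
    c = np.linalg.norm(d)                   # signal strength
    s = d / (c + 1e-12)                     # unit-norm signal structure
    return c, s, l

def fuse_patches(patches):
    comps = [decompose(p) for p in patches]
    c_hat = max(c for c, _, _ in comps)     # desired strength: the largest
    # Structure: strength-weighted average, re-normalized to unit norm.
    s_sum = sum(c * s for c, s, _ in comps)
    s_hat = s_sum / (np.linalg.norm(s_sum) + 1e-12)
    # Mean intensity: a simple average here (the paper uses a
    # well-exposedness-based weighting instead).
    l_hat = np.mean([l for _, _, l in comps])
    return c_hat * s_hat + l_hat            # reconstructed fused patch

# Toy usage: fuse three differently "exposed" versions of one patch.
base = np.random.rand(8, 8)
exposures = [np.clip(base * g, 0, 1) for g in (0.4, 1.0, 2.0)]
print(fuse_patches(exposures).shape)        # (8, 8)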


Robust ImageGraph: Rank-Level Feature Fusion for Image Search

Abstract:
Recently, feature fusion has demonstrated its effectiveness in image search. However, bad features and inappropriate parameters usually bring about false positive images, i.e., outliers, leading to inferior performance. Therefore, a major challenge for a fusion scheme is how to be robust to outliers. Towards this goal, this paper proposes a rank-level framework for robust feature fusion. First, we define Rank Distance to measure the relevance of images at the rank level. Based on it, Bayes similarity is introduced to evaluate the retrieval quality of individual features, through which true matches tend to obtain higher weights than outliers. Then, we construct the directed ImageGraph to encode the relationships between images. Each image is connected to its K nearest neighbors by edges weighted by Bayes similarity. Multiple rank lists resulting from different methods are merged via the ImageGraph. Furthermore, on the fused ImageGraph, local ranking is performed to re-order the initial rank lists. It aims at local optimization and is thus more robust to global outliers. Extensive experiments on four benchmark data sets validate the effectiveness of our method. The proposed method outperforms two popular fusion schemes, and the results are competitive with the state-of-the-art.


Robust Head-Pose Estimation Based on Partially-Latent Mixture of Linear Regressions

Abstract:
Head-pose estimation has many applications, such as social event analysis, human-robot and human-computer interaction, driving assistance, and so forth. Head-pose estimation is challenging, because it must cope with changing illumination conditions, variabilities in face orientation and in appearance, partial occlusions of facial landmarks, as well as bounding-box-to-face alignment errors. We propose to use a mixture of linear regressions with partially-latent output. This regression method learns to map high-dimensional feature vectors (extracted from bounding boxes of faces) onto the joint space of head-pose angles and bounding-box shifts, such that they are robustly predicted in the presence of unobservable phenomena. We describe in detail the mapping method that combines the merits of unsupervised manifold learning techniques and of mixtures of regressions. We validate our method with three publicly available data sets and we thoroughly benchmark four variants of the proposed algorithm with several state-of-the-art head-pose estimation methods.


Robust and Low-Rank Representation for Fast Face Identification With Occlusions

Abstract:
In this paper, we propose an iterative method to address the face identification problem with block occlusions. Our approach utilizes a robust representation based on two characteristics in order to model contiguous errors (e.g., block occlusion) effectively. The first fits a distribution, described by a tailored loss function, to the errors. The second describes the error image as having a specific structure (resulting in low rank relative to the image size). We show that this joint characterization is effective for describing errors with spatial continuity. Our approach is computationally efficient due to the utilization of the alternating direction method of multipliers. A special case of our fast iterative algorithm leads to the robust representation method, which is normally used to handle non-contiguous errors (e.g., pixel corruption). Extensive results on representative face databases (in constrained and unconstrained environments) document the effectiveness of our method over existing robust representation methods with respect to both identification rates and computational time.


RGBD Salient Object Detection via Deep Fusion

Abstract:
Numerous efforts have been made to design various low-level saliency cues for RGBD saliency detection, such as color and depth contrast features as well as background and color compactness priors. However, how these low-level saliency cues interact with each other and how they can be effectively incorporated to generate a master saliency map remain challenging problems. In this paper, we design a new convolutional neural network (CNN) to automatically learn the interaction mechanism for RGBD salient object detection. In contrast to existing works, in which raw image pixels are fed directly to the CNN, the proposed method takes advantage of the knowledge obtained in traditional saliency detection by adopting various flexible and interpretable saliency feature vectors as inputs. This guides the CNN to learn a combination of existing features to predict saliency more effectively, which presents a less complex problem than operating on the pixels directly. We then integrate a superpixel-based Laplacian propagation framework with the trained CNN to extract a spatially consistent saliency map by exploiting the intrinsic structure of the input image. Extensive quantitative and qualitative experimental evaluations on three data sets demonstrate that the proposed method consistently outperforms the state-of-the-art methods.


Predicting the Quality of Fused Long Wave Infrared and Visible Light Images

Abstract:
The capability to automatically evaluate the quality of long wave infrared (LWIR) and visible light images has the potential to play an important role in determining and controlling the quality of a resulting fused LWIR-visible light image. Extensive work has been conducted on studying the statistics of natural LWIR and visible images. Nonetheless, there has been little work done on analyzing the statistics of fused LWIR and visible images and the associated distortions. In this paper, we analyze five multi-resolution-based image fusion methods with regard to several common distortions, including blur, white noise, JPEG compression, and non-uniformity. We study the natural scene statistics of fused images and how they are affected by these kinds of distortions. Furthermore, we conducted a human study on the subjective quality of pristine and degraded fused LWIR-visible images, in which 27 subjects evaluated 750 images over five sessions each. We used this new database to create an automatic opinion-distortion-unaware fused image quality model and analyzer algorithm. We also propose an opinion-aware fused image quality analyzer, whose predictions correlate better with human perceptual evaluations than those of competing state-of-the-art models. An implementation of the proposed fused image quality measures can be found at https://github.com/ujemd/NSS-of-LWIR-and-Vissible-Images, and the new database can be found at http://bit.ly/2noZlbQ.


Person Re-Identification via Distance Metric Learning With Latent Variables

Abstract:
In this paper, we propose an effective person re-identification method with latent variables, which represents a pedestrian as a mixture of a holistic model and a number of flexible models. Three types of latent variables are introduced to model uncertain factors in the re-identification problem: vertical misalignments, horizontal misalignments, and leg posture variations. The distance between two pedestrians is determined by minimizing a given distance function with respect to the latent variables, and is then used to conduct the re-identification task. In addition, we develop a latent metric learning method for learning an effective metric matrix, which is solved in an iterative manner: once the latent information is specified, the metric matrix can be obtained by typical metric learning methods; with the computed metric matrix, the latent variables can be determined by searching the state space exhaustively. Finally, extensive experiments are conducted on seven databases to evaluate the proposed method. The experimental results demonstrate that our method achieves better performance than other competing algorithms.


Pairwise Operator Learning for Patch-Based Single-Image Super-Resolution

Abstract:
Motivated by the fact that image patches can be inherently represented by matrices, this paper treats single-image super-resolution as a problem of learning regression operators in a matrix space. The regression operators that map low-resolution image patches to high-resolution image patches are defined by left and right multiplication operators, which are used to extract, respectively, the row and column information of low-resolution image patches for recovering high-resolution estimations. The patch-based regression algorithm possesses three favorable properties. First, the proposed super-resolution algorithm is efficient during both training and testing, because image patches are treated as matrices. Second, the data storage requirement of the optimal pairwise operator is far lower than that of most popular single-image super-resolution algorithms, because only two small matrices need to be stored. Last, the super-resolution performance is competitive with most popular single-image super-resolution algorithms, because both row and column information of image patches is considered. Experimental results show the efficiency and effectiveness of the proposed patch-based single-image super-resolution algorithm.
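
The regression H ~ A L B, with a left operator acting on rows and a right operator acting on columns, can be fit by alternating least squares, since the problem is linear in A for fixed B and vice versa. The NumPy sketch below illustrates this; the ridge term and iteration count are illustrative choices rather than the paper's training procedure.

# A minimal sketch of learning the pairwise (left/right multiplication)
# operators A and B in H ~ A @ L @ B by alternating least squares over
# training pairs of low-/high-resolution patch matrices.
import numpy as np

def learn_pairwise_operators(Ls, Hs, iters=20, lam=1e-3):
    m, n = Hs[0].shape                  # high-res patch size
    p, q = Ls[0].shape                  # low-res patch size
    A = np.eye(m, p)                    # left operator, m x p
    B = np.eye(q, n)                    # right operator, q x n
    for _ in range(iters):
        # Fix B, solve the ridge least squares for A:  H_i ~ A @ (L_i @ B)
        Ms = [L @ B for L in Ls]
        num = sum(H @ M.T for H, M in zip(Hs, Ms))
        den = sum(M @ M.T for M in Ms) + lam * np.eye(p)
        A = num @ np.linalg.inv(den)
        # Fix A, solve for B:  H_i ~ (A @ L_i) @ B
        Ns = [A @ L for L in Ls]
        num = sum(N.T @ H for N, H in zip(Ns, Hs))
        den = sum(N.T @ N for N in Ns) + lam * np.eye(q)
        B = np.linalg.inv(den) @ num
    return A, B

# Toy usage: recover 3x-upscaling operators from 5x5 -> 15x15 patch pairs.
rng = np.random.default_rng(0)
Ls = [rng.normal(size=(5, 5)) for _ in range(200)]
A0, B0 = rng.normal(size=(15, 5)), rng.normal(size=(5, 15))
Hs = [A0 @ L @ B0 for L in Ls]
A, B = learn_pairwise_operators(Ls, Hs)
print(np.linalg.norm(Hs[0] - A @ Ls[0] @ B) / np.linalg.norm(Hs[0]))  # small: fit is exact up to scale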


Object-Based Multiple Foreground Segmentation in RGBD Video

Abstract:
We present an RGB and Depth (RGBD) video segmentation method that takes advantage of depth data and can extract multiple foregrounds in the scene. This video segmentation is addressed as an object proposal selection problem formulated in a fully-connected graph, where a flexible number of foregrounds may be chosen. In our graph, each node represents a proposal, and the edges model intra-frame and inter-frame constraints on the solution. The proposals are selected based on an RGBD video saliency map in which depth-based features are utilized to enhance the identification of foregrounds. Experiments show that the proposed multiple foreground segmentation method outperforms related techniques, and the depth cue serves as a helpful complement to RGB features. Moreover, our method provides performance comparable to the state-of-the-art RGB video segmentation techniques on regular RGB videos with estimated depth maps.


Linear Spectral Clustering Superpixel

Abstract:
In this paper, we present a superpixel segmentation algorithm called linear spectral clustering (LSC), which is capable of producing superpixels with both high boundary adherence and visual compactness for natural images with low computational costs. In LSC, a normalized cuts-based formulation of image segmentation is adopted using a distance metric that measures both the color similarity and the space proximity between image pixels. However, rather than directly using the traditional eigen-based algorithm, we approximate the similarity metric through a deliberately designed kernel function such that pixel values can be explicitly mapped to a high-dimensional feature space. We then apply the conclusion that by appropriately weighting each point in this feature space, the objective functions of the weighted K-means and the normalized cuts share the same optimum points. Consequently, it is possible to optimize the cost function of the normalized cuts by iteratively applying simple K-means clustering in the proposed feature space. LSC possesses linear computational complexity and high memory efficiency, since it avoids both the decomposition of the affinity matrix and the generation of the large kernel matrix. By utilizing the underlying mathematical equivalence between the two types of seemingly different methods, LSC successfully preserves global image structures through efficient local operations. Experimental results show that LSC performs as well as or even better than the state-of-the-art superpixel segmentation algorithms in terms of several commonly used evaluation metrics in image segmentation. The applicability of LSC is further demonstrated in two related computer vision tasks.


Learning to Hash With Optimized Anchor Embedding for Scalable Retrieval

Abstract:
Sparse representation and image hashing are powerful tools for data representation and image retrieval, respectively. Combinations of these two tools for scalable image retrieval, i.e., sparse hashing (SH) methods, have been proposed in recent years, and the preliminary results are promising. The core of these methods is a scheme that can efficiently embed (high-dimensional) image features into a low-dimensional Hamming space while preserving the similarity between features. Existing SH methods mostly focus on finding better sparse representations of images in the hash space. We argue that the anchor set utilized in the sparse representation is also crucial, and has unfortunately been underestimated by prior art. To this end, we propose a novel SH method, termed Sparse Hashing with Optimized Anchor Embedding, that optimizes the integration of the anchors so that the features can be better embedded and binarized. The central idea is to push the anchors far from the axis while preserving their relative positions, so as to generate similar hash codes for neighboring features. We formulate this idea as an orthogonality-constrained maximization problem, and a novel, efficient optimization framework is systematically developed. Extensive experiments on five benchmark image data sets demonstrate that our method outperforms several state-of-the-art related methods.


Learning Short Binary Codes for Large-scale Image Retrieval

Abstract:
Large-scale visual information retrieval has become an active research area in this big data era. Recently, hashing/binary coding algorithms have proven effective for scalable retrieval applications. Most existing hashing methods require relatively long binary codes (i.e., over hundreds of bits, sometimes even thousands of bits) to achieve reasonable retrieval accuracy. However, for some realistic and unique applications, such as on wearable or mobile devices, only short binary codes can be used for efficient image retrieval due to the limited computational resources or bandwidth of these devices. In this paper, we propose a novel unsupervised hashing approach called min-cost ranking (MCR), specifically for learning powerful short binary codes (i.e., usually shorter than 100 bits) for scalable image retrieval tasks. By exploring the discriminative ability of each dimension of the data, MCR generates a one-bit binary code for each dimension and simultaneously ranks the discriminative separability of each bit according to the proposed cost function. Only the top-ranked bits with minimum cost values are then selected and grouped together to compose the final salient binary codes. Extensive experimental results on large-scale retrieval demonstrate that MCR achieves performance comparable with state-of-the-art hashing algorithms but with significantly shorter codes, leading to much faster large-scale retrieval.


Learning Multilayer Channel Features for Pedestrian Detection

Abstract:
Pedestrian detection based on the combination of convolutional neural networks (CNNs) and traditional handcrafted features (i.e., HOG+LUV) has achieved great success. In general, HOG+LUV are used to generate candidate proposals, and a CNN then classifies these proposals. Despite its success, there is still room for improvement. For example, the CNN classifies these proposals using only the fully connected layer features, while the proposal scores and the features in the inner layers of the CNN are ignored. In this paper, we propose a unifying framework called multi-layer channel features (MCF) to overcome this drawback. It first integrates HOG+LUV with each layer of the CNN into multi-layer image channels. Based on the multi-layer image channels, a multi-stage cascade AdaBoost is then learned. The weak classifiers in each stage of the multi-stage cascade are learned from the image channels of the corresponding layer. Experiments are conducted on the Caltech, INRIA, ETH, TUD-Brussels, and KITTI data sets. With more abundant features, MCF achieves the state of the art on the Caltech pedestrian data set (i.e., a 10.40% miss rate). Using new and accurate annotations, MCF achieves a 7.98% miss rate. As many non-pedestrian detection windows can be quickly rejected by the first few stages, detection is accelerated by 1.43 times. By eliminating highly overlapped detection windows with lower scores after the first stage, it is 4.07 times faster with negligible performance loss.


Latent Semantic Minimal Hashing for Image Retrieval

Abstract:
Hashing-based similarity search is an important technique for large-scale query-by-example image retrieval systems, since it provides fast search with computational and memory efficiency. However, it is challenging to design compact codes that represent the original features with good performance. Recently, many unsupervised hashing methods have been proposed that focus on preserving the geometric structure similarity of the data in the original feature space, but they have not yet fully refined the image features or simultaneously explored the latent semantic feature embedding in the data. To address this problem, in this paper a novel joint binary code learning method, referred to as latent semantic minimal hashing, is proposed to combine image features with latent semantic features under a minimum encoding loss. The latent semantic feature is learned based on matrix decomposition to refine the original feature, making the learned feature more discriminative. Moreover, a minimum encoding loss is combined with the latent semantic feature learning process so as to guarantee that the obtained binary codes are discriminative as well. Extensive experiments on several well-known large databases demonstrate that the proposed method outperforms most state-of-the-art hashing methods.


Joint Defogging and Demosaicking

Abstract:
Image defogging is a technique used extensively for enhancing the visual quality of images captured in bad weather conditions. Even though defogging algorithms have been well studied, defogging performance is degraded by demosaicking artifacts and sensor noise amplification in distant scenes. In order to improve the visual quality of restored images, we propose a novel approach that performs defogging and demosaicking simultaneously. We conclude that better defogging performance with fewer artifacts can be achieved when defogging and demosaicking are performed jointly. We also demonstrate that the proposed joint algorithm has the benefit of suppressing noise amplification in distant scenes. In addition, we validate our theoretical analysis and observations on both synthesized data sets with ground truth fog-free images and natural scene data sets captured in a raw format.


Haze Removal Using the Difference-Structure-Preservation Prior

Abstract:
Fog cover is generally present in outdoor scenes, which limits the potential for efficient information extraction from images. In this paper, the goal of the developed algorithm is to obtain an optimal transmission map and to remove haze from a single input image. To solve the problem, we meticulously analyze the optical model and recast the initial transmission map under an additional boundary prior. For better preservation of the results, a difference-structure-preservation dictionary is learned so that the local consistency features of the transmission map are well preserved after coefficient shrinkage. Experimental results show that the method preserves the natural appearance of the image.


Graph-Driven Diffusion and Random Walk Schemes for Image Segmentation

Abstract:
We propose graph-driven approaches to image segmentation by developing diffusion processes defined on arbitrary graphs. We formulate a solution to the image segmentation problem modeled as the result of infectious wavefronts propagating on an image-driven graph, where pixels correspond to nodes of an arbitrary graph. By relating the popular susceptible-infected-recovered epidemic propagation model to the Random Walker algorithm, we develop the normalized random walker and a lazy random walker variant. The underlying iterative solutions of these methods are derived as the result of infections transmitted on this arbitrary graph. The main idea is to incorporate a degree-aware term into the original Random Walker algorithm in order to account for the node centrality of every neighboring node and to weigh the contribution of every neighbor to the underlying diffusion process. Our lazy random walk variant models the tendency of patients or nodes to resist changes in their infection status. We also show how previous work can be naturally extended to take advantage of this degree-aware term, which enables the design of other novel methods. Through an extensive experimental analysis, we demonstrate the reliability of our approach, its small computational burden and the dimensionality reduction capabilities of graph-driven approaches. Without applying any regular grid constraint, the proposed graph clustering scheme allows us to consider pixel-level, node-level approaches, and multidimensional input data by naturally integrating the importance of each node to the final clustering or segmentation solution. A software release containing implementations of this paper and supplementary material can be found at: http://cvsp.cs.ntua.gr/research/GraphClustering/.


Fast Bayesian JPEG Decompression and Denoising With Tight Frame Priors

Abstract:
JPEG decompression can be understood as an image reconstruction problem similar to denoising or deconvolution. Such problems can be solved within the Bayesian maximum a posteriori probability framework by iterative optimization algorithms. Prior knowledge about an image is usually described by the l1 norm of its sparse domain representation. For many problems, if the sparse domain forms a tight frame, optimization by the alternating direction method of multipliers can be very efficient. However, for JPEG, such a solution is not straightforward, e.g., due to quantization and the subsampling of chrominance channels. The derivation of such a solution is the main contribution of this paper. In addition, we show that a minor modification of the proposed algorithm simultaneously solves the problem of image denoising. In the experimental section, we analyze the behavior of the proposed decompression algorithm over a small number of iterations, with the interesting conclusion that this mode outperforms full convergence. Example images demonstrate the visual quality of the decompression, and quantitative experiments compare the algorithm with other state-of-the-art methods.
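
One way to see the reconstruction view described above: the transmitted quantized coefficients confine each 8x8 block's DCT coefficients to known intervals, so decompression can alternate a sparsity-promoting denoising step with a projection back onto those intervals. The sketch below illustrates this for grayscale images, using soft-thresholding of a global orthogonal DCT as a stand-in for the paper's tight-frame prior and ADMM machinery; chrominance subsampling is omitted.

# A minimal sketch of constrained JPEG reconstruction: alternate a
# soft-thresholding (sparsity) step with a projection of each 8x8
# block's DCT coefficients into their quantization intervals.
import numpy as np
from scipy.fftpack import dct, idct

def dct2(b):  return dct(dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')
def idct2(b): return idct(idct(b, axis=0, norm='ortho'), axis=1, norm='ortho')

def project_quantization(img, q_coeffs, Q):
    """Clip every 8x8 block's DCT coefficients to the interval implied
    by the transmitted quantized values q_coeffs and the table Q."""
    out = img.copy()
    H, W = img.shape
    for i in range(0, H, 8):
        for j in range(0, W, 8):
            c = dct2(out[i:i+8, j:j+8])
            lo = (q_coeffs[i//8, j//8] - 0.5) * Q
            hi = (q_coeffs[i//8, j//8] + 0.5) * Q
            out[i:i+8, j:j+8] = idct2(np.clip(c, lo, hi))
    return out

def decompress(q_coeffs, Q, iters=10, thresh=2.0):
    # Start from a feasible image (all-zero coefficients clipped into their bins).
    x = project_quantization(np.zeros((q_coeffs.shape[0] * 8,
                                       q_coeffs.shape[1] * 8)), q_coeffs, Q)
    for _ in range(iters):
        c = dct2(x)                                         # "frame" analysis
        c = np.sign(c) * np.maximum(np.abs(c) - thresh, 0)  # sparsity prior
        x = idct2(c)                                        # synthesis
        x = project_quantization(x, q_coeffs, Q)            # data consistency
    return x

# Toy usage: quantize a random image blockwise, then reconstruct.
Q = np.full((8, 8), 16.0)
img = (np.random.rand(64, 64) * 255).astype(np.float32)
q = np.stack([[np.round(dct2(img[i:i+8, j:j+8]) / Q)
               for j in range(0, 64, 8)] for i in range(0, 64, 8)])
print(decompress(q, Q).shape)   # (64, 64)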


Enhanced Just Noticeable Difference Model for Images With Pattern Complexity

Abstract:
The just noticeable difference (JND) in an image, which reveals the visibility limitation of the human visual system (HVS), is widely used for visual redundancy estimation in signal processing. In current schemes for determining the JND threshold, the spatial masking effect is estimated as contrast masking, which cannot accurately account for the complicated interaction among visual contents. Research in cognitive science indicates that the HVS is highly adapted to extracting repeated patterns for visual content representation. Inspired by this, we formulate pattern complexity as another factor determining the total masking effect: the interaction is relatively straightforward, with a limited masking effect, in a regular pattern, and complicated, with a strong masking effect, in an irregular pattern. Following the orientation selectivity mechanism in the primary visual cortex, the response of each local receptive field can be considered a pattern; therefore, in this paper, the orientation that each pixel presents is regarded as the fundamental element of a pattern, and pattern complexity is calculated as the diversity of orientations in a local region. Finally, considering both pattern complexity and luminance contrast, a novel spatial masking estimation function is deduced, and an improved JND estimation model is built. Experimental comparisons with the latest JND models demonstrate the effectiveness of the proposed model, which performs highly consistently with human perception. The source code of the proposed model is publicly available at http://web.xidian.edu.cn/wjj/en/index.html.
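
A minimal sketch of the pattern-complexity computation: take the gradient orientation at each pixel as the elementary pattern and measure the diversity of orientations within a local window. The entropy of the local orientation histogram used below is a stand-in for the paper's exact diversity measure.

# A minimal sketch of orientation-diversity pattern complexity.
import numpy as np

def orientation_map(img):
    gy, gx = np.gradient(img.astype(np.float32))
    return np.mod(np.arctan2(gy, gx), np.pi)        # orientations in [0, pi)

def pattern_complexity(img, win=8, bins=8):
    ori = orientation_map(img)
    H, W = ori.shape
    out = np.zeros((H // win, W // win))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = ori[i*win:(i+1)*win, j*win:(j+1)*win]
            hist, _ = np.histogram(block, bins=bins, range=(0, np.pi))
            p = hist / hist.sum()
            p = p[p > 0]
            out[i, j] = -(p * np.log2(p)).sum()     # orientation entropy
    return out                                      # high = irregular pattern

# Regular stripes score low; random texture scores high.
stripes = np.tile(np.arange(64) % 2, (64, 1)).astype(float)
noise = np.random.rand(64, 64)
print(pattern_complexity(stripes).mean() < pattern_complexity(noise).mean())  # True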


Discriminative Multi-View Interactive Image Re-Ranking

Abstract:
Given unreliable visual patterns and insufficient query information, content-based image retrieval is often suboptimal and requires image re-ranking using auxiliary information. In this paper, we propose discriminative multi-view interactive image re-ranking (DMINTIR), which integrates user relevance feedback capturing users' intentions with multiple features that sufficiently describe the images. In DMINTIR, heterogeneous property features are incorporated into a multi-view learning scheme to exploit their complementarity. In addition, a discriminatively learned weight vector is obtained to reassign updated scores and target images for re-ranking. Compared with other multi-view learning techniques, our scheme not only generates a compact representation in the latent space from the redundant multi-view features but also maximally preserves the discriminative information in the feature encoding via the large-margin principle. Furthermore, the generalization error bound of the proposed algorithm is theoretically analyzed and shown to be improved by the interaction between the latent space and discriminant function learning. Experimental results on two benchmark data sets demonstrate that our approach boosts baseline retrieval quality and is competitive with other state-of-the-art re-ranking strategies.


Discriminative Elastic-Net Regularized Linear Regression

Abstract:
In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most existing linear regression methods exploit the conventional zero-one matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space due to its weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework and develop two robust linear regression models with the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins between different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration and can therefore be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminative representations for final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available data sets demonstrate that the proposed framework can outperform state-of-the-art methods. The MATLAB code of our methods is available at http://www.yongxu.org/lunwen.html.


Discriminant Context Information Analysis for Post-Ranking Person Re-Identification

Abstract:
Existing approaches for person re-identification are mainly based on creating distinctive representations or on learning optimal metrics. The achieved results are then provided in the form of a list of ranked matching persons. It often happens that the true match is not ranked first but appears among the first positions. This is mostly due to the visual ambiguities shared between the true match and other "similar" persons. Currently, there is a lack of study of such visual ambiguities, which limit re-identification performance within the first ranks. We believe that an analysis of the similar appearances in the first ranks can be helpful in detecting, and hence removing, such visual ambiguities. We propose to achieve this goal by introducing an unsupervised post-ranking framework. Once the initial ranking is available, content and context sets are extracted. These are then exploited to remove the visual ambiguities and to obtain a discriminant feature space, which is finally exploited to compute the new ranking. An in-depth analysis of the performance achieved on three public benchmark data sets supports our claims. For every data set, the proposed method remarkably improves the first-rank results and outperforms the state-of-the-art approaches.


Detail-Enhanced Multi-Scale Exposure Fusion

Abstract:
Multi-scale exposure fusion is an effective image enhancement technique for a high dynamic range (HDR) scene. In this paper, a new multi-scale exposure fusion algorithm is proposed to merge differently exposed low dynamic range (LDR) images by using the weighted guided image filter to smooth the Gaussian pyramids of weight maps for all the LDR images. Details in the brightest and darkest regions of the HDR scene are preserved better by the proposed algorithm without relative brightness change in the fused image. In addition, a new weighted structure tensor is introduced to the differently exposed images and it is adopted to design a detail extraction component for the proposed fusion algorithm, such that users are allowed to manipulate fine details in the enhanced image according to their preference. The proposed multi-scale exposure fusion algorithm is also applied to design a simple single image brightening algorithm for both low-light imaging and back-light imaging.


Correlated Topic Vector for Scene Classification

Abstract:
Scene images usually involve semantic correlations, particularly when considering large-scale image data sets. This paper proposes a novel generative image representation, the correlated topic vector, to model such semantic correlations. Derived from the correlated topic model, the correlated topic vector naturally exploits the correlations among topics, which are seldom considered in conventional feature encodings, e.g., the Fisher vector, but do exist in scene images. It is expected that the involvement of correlations can increase the discriminative capability of the learned generative model and consequently improve recognition accuracy. Incorporated with the Fisher kernel method, the correlated topic vector inherits the advantages of the Fisher vector. The contributions of visual words to the topics are further employed, by incorporating the Fisher kernel framework, to indicate the differences among scenes. Combined with deep convolutional neural network (CNN) features and a Gibbs sampling solution, the correlated topic vector shows great potential when processing large-scale and complex scene image data sets. Experiments on two scene image data sets demonstrate that the correlated topic vector significantly improves on the deep CNN features and outperforms existing Fisher kernel-based features.


Comments on “Steganography Using Reversible Texture Synthesis”

Abstract:
Message hiding in texture image synthesis is a novel steganography approach in which a smaller texture image is resampled to synthesize a new texture image with a similar local appearance and an arbitrary size. However, the mirror operation over the image boundary is flawed and easy to attack. We propose an attacking method on this steganography that can not only detect the stego-images but also extract the hidden messages.


Clearing the Skies: A Deep Network Architecture for Single-Image Rain Removal

Abstract:
We introduce a deep network architecture called DerainNet for removing rain streaks from an image. Based on the deep convolutional neural network (CNN), we directly learn the mapping relationship between rainy and clean image detail layers from data. Because we do not possess the ground truth corresponding to real-world rainy images, we synthesize images with rain for training. In contrast to other common strategies that increase the depth or breadth of the network, we use image processing domain knowledge to modify the objective function and improve deraining with a modestly sized CNN. Specifically, we train our DerainNet on the detail (high-pass) layer rather than in the image domain. Although DerainNet is trained on synthetic data, we find that the learned network translates very effectively to real-world images for testing. Moreover, we augment the CNN framework with image enhancement to improve the visual results. Compared with state-of-the-art single-image de-raining methods, our method achieves improved rain removal and a much faster computation time after network training.
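
The domain-knowledge step is easy to illustrate: split the rainy image into a low-pass base layer and a high-pass detail layer, restore only the detail layer with the CNN, and recombine. In the sketch below, a box filter stands in for the guided-filter decomposition, and derain_cnn is a placeholder for the trained network.

# A minimal sketch of the detail-layer deraining pipeline.
import cv2
import numpy as np

def split_layers(img, ksize=15):
    base = cv2.blur(img.astype(np.float32), (ksize, ksize))  # low-pass base layer
    detail = img.astype(np.float32) - base                   # rain streaks live here
    return base, detail

def derain(img, derain_cnn):
    base, detail = split_layers(img)
    clean_detail = derain_cnn(detail)       # CNN regresses the rain-free detail
    return np.clip(base + clean_detail, 0, 255).astype(np.uint8)

# Identity "network" just to exercise the pipeline end to end.
img = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
print(derain(img, lambda d: d).shape)       # (64, 64, 3)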


Blind Facial Image Quality Enhancement Using Non-Rigid Semantic Patches

Abstract:
We propose a new way to solve a very general blind inverse problem of multiple simultaneous degradations, such as blur, resolution reduction, noise, and contrast changes, without explicitly estimating the degradation. The proposed concept is based on combining semantic non-rigid patches, problem-specific high-quality prior data, and non-rigid registration tools. We show how a significant quality enhancement can be achieved, both visually and quantitatively, in the case of facial images. The method is demonstrated on the problem of cellular photography quality enhancement of dark facial images for different identities, expressions, and poses, and is compared with the state-of-the-art denoising, deblurring, super-resolution, and color-correction methods.


Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising

Abstract:
Discriminative model learning for image denoising has recently attracted considerable attention due to its favorable denoising performance. In this paper, we take one step forward by investigating the construction of feed-forward denoising convolutional neural networks (DnCNNs) to embrace progress in very deep architectures, learning algorithms, and regularization methods for image denoising. Specifically, residual learning and batch normalization are utilized to speed up the training process as well as boost the denoising performance. Different from existing discriminative denoising models, which usually train a specific model for additive white Gaussian noise at a certain noise level, our DnCNN model is able to handle Gaussian denoising with an unknown noise level (i.e., blind Gaussian denoising). With the residual learning strategy, DnCNN implicitly removes the latent clean image in the hidden layers. This property motivates us to train a single DnCNN model to tackle several general image denoising tasks, such as Gaussian denoising, single-image super-resolution, and JPEG image deblocking. Our extensive experiments demonstrate that our DnCNN model not only exhibits high effectiveness in several general image denoising tasks, but can also be efficiently implemented by benefiting from GPU computing.
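
In the residual formulation, the network R is trained to predict the noise, so the denoised image is y - R(y). The PyTorch sketch below shows the Conv + BN + ReLU layout and the residual subtraction with a deliberately shallow model; the paper's DnCNN is considerably deeper (e.g., 17 layers).

# A minimal sketch of residual-learning denoising.
import torch
import torch.nn as nn

class TinyDnCNN(nn.Module):
    def __init__(self, channels=1, features=64, depth=5):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1),
                       nn.BatchNorm2d(features), nn.ReLU(True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, noisy):
        return self.body(noisy)             # predicts the residual (noise)

model = TinyDnCNN()
noisy = torch.randn(4, 1, 40, 40)           # a batch of noisy patches
denoised = noisy - model(noisy)             # residual learning: y - R(y)
# Training target is the noise itself: loss = MSE(model(noisy), noisy - clean)
print(denoised.shape)                       # torch.Size([4, 1, 40, 40])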

