info@itechprosolutions.in | +91 9790176891

IMAGE PROCESSING 2018 PROJECTS

Contrast Enhancement Based on Intrinsic Image Decomposition

Abstract:

In this paper, we propose to introduce intrinsic image decomposition priors into decomposition models for contrast enhancement. Since image decomposition is a highly ill-posed problem, we introduce constraints on both the reflectance and illumination layers to yield a highly reliable solution. We regularize the reflectance layer to be piecewise constant by introducing a weighted ℓ1 norm constraint on neighboring pixels according to their color similarity, so that the decomposed reflectance is not strongly affected by the illumination information. The illumination layer is regularized by a piecewise smoothness constraint. The proposed model is effectively solved by the Split Bregman algorithm. Then, by adjusting the illumination layer, we obtain the enhancement result. To avoid potential color artifacts introduced by illumination adjustment and to reduce computational complexity, the proposed decomposition model is performed on the value channel in HSV space. Experimental results demonstrate that the proposed method performs well for a wide variety of images, and achieves better or comparable subjective and objective quality compared with state-of-the-art methods.
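
The decompose-adjust-recompose pipeline on the value channel can be sketched in a few lines of NumPy. This is an illustrative sketch only: a box-filter estimate stands in for the paper's variational (weighted ℓ1 / Split Bregman) decomposition, and a gamma curve for its illumination adjustment; the function names are ours, not the paper's.

```python
import numpy as np

def box_blur(img, r=7):
    """Mean filter with a (2r+1)^2 window via padded cumulative sums."""
    p = np.pad(img, r, mode="edge")
    c = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    n = 2 * r + 1
    return (c[n:, n:] - c[:-n, n:] - c[n:, :-n] + c[:-n, :-n]) / n**2

def enhance_value_channel(v, gamma=0.6, eps=1e-3):
    """Decompose V = R * L, brighten L with a gamma curve, recompose.

    v: value channel in [0, 1]. The box blur is an assumed stand-in for
    the paper's piecewise-smooth illumination estimate.
    """
    L = np.clip(box_blur(v), eps, 1.0)      # illumination layer
    R = np.clip(v / L, 0.0, None)           # reflectance layer
    L_adj = L ** gamma                      # brighten dark illumination
    return np.clip(R * L_adj, 0.0, 1.0)
```

On a uniformly dark patch the reflectance stays flat while the gamma-lifted illumination raises overall brightness, which is the intended behavior of adjusting only the illumination layer.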

Image Forgery Localization via Integrating Tampering Possibility Maps

Abstract:

Over the past decade, many efforts have been made in passive image forensics. Although tampered images can be detected with high accuracy by carefully designed mechanisms, localization of the tampered regions in a fake image still presents many challenges, especially when the type of tampering operation is unknown. Some researchers have realized that it is necessary to integrate different forensic approaches in order to obtain better localization performance. However, several important issues have not been comprehensively studied, for example, how to select and improve/readjust proper forensic approaches, and how to fuse the detection results of different forensic approaches to obtain good localization results. In this paper, we propose a framework to improve the performance of forgery localization via integrating tampering possibility maps. In the proposed framework, we first select and improve two existing forensic approaches, i.e., a statistical feature-based detector and a copy-move forgery detector, and then adjust their results to obtain tampering possibility maps. After investigating the properties of possibility maps and comparing various fusion schemes, we finally propose a simple yet very effective strategy to integrate the tampering possibility maps to obtain the final localization results. Extensive experiments show that the two improved approaches used in our framework significantly outperform the state-of-the-art techniques, and the proposed fusion results achieve the best F1-score in the IEEE IFS-TC Image Forensics Challenge.

A smart phone image processing application for plant disease diagnosis

Abstract:

Although professional agriculture engineers are responsible for the recognition of plant diseases, intelligent systems can be used for their diagnosis in early stages. The expert systems that have been proposed in the literature for this purpose are often based on facts described by the user or on image processing of plant photos in visible or infrared light, etc. The recognition of a disease can often be based on symptoms such as lesions or spots in various parts of a plant. The color, area, and number of these spots can determine to a great extent the disease that has afflicted a plant. More expensive molecular analyses and tests can follow if necessary. A Windows Phone application is described here that is capable of recognizing vineyard diseases through photos of the leaves with an accuracy higher than 90%. This application can easily be extended to different plant diseases and different smart phone platforms.

Retinal Disease Screening Through Local Binary Patterns

Abstract:

This paper investigates discrimination capabilities in the texture of fundus images to differentiate between pathological and healthy images. For this purpose, the performance of local binary patterns (LBP) as a texture descriptor for retinal images has been explored and compared with other descriptors such as LBP filtering and local phase quantization. The goal is to distinguish between diabetic retinopathy (DR), age-related macular degeneration (AMD), and normal fundus images by analyzing the texture of the retina background, avoiding a previous lesion segmentation stage. Five experiments (separating DR from normal, AMD from normal, pathological from normal, DR from AMD, and the three different classes) were designed and validated with the proposed procedure, obtaining promising results. For each experiment, several classifiers were tested. An average sensitivity and specificity higher than 0.86 in all cases, and of almost 1 and 0.99, respectively, for AMD detection, were achieved. These results suggest that the method presented in this paper is a robust algorithm for describing retina texture and can be useful in a diagnosis aid system for retinal disease screening.
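
The basic LBP descriptor behind such screening systems is easy to reproduce: each interior pixel is encoded by comparing it with its eight neighbours, and the normalised code histogram serves as the texture feature. A minimal NumPy sketch (the rotation-invariant and multiscale LBP variants commonly used in such studies are omitted):

```python
import numpy as np

def lbp_codes(img):
    """Basic 3x3 local binary pattern: each interior pixel gets an 8-bit
    code, one bit per neighbour (1 if neighbour >= centre)."""
    c = img[1:-1, 1:-1]
    # neighbours in a fixed clockwise order starting at the top-left
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        n = img[1 + dy: img.shape[0] - 1 + dy,
                1 + dx: img.shape[1] - 1 + dx]
        code |= (n >= c).astype(np.uint8) << bit
    return code

def lbp_histogram(img):
    """256-bin normalised LBP histogram used as a texture descriptor."""
    h = np.bincount(lbp_codes(img).ravel(), minlength=256)
    return h / h.sum()
```

A perfectly flat region maps every pixel to code 255 (all neighbours >= centre), so a peaked histogram signals texture-free background while spread-out histograms signal texture, which is what the classifiers above discriminate on.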

A Survey on Brain Tumor Detection Using Image Processing Techniques

Abstract:

Biomedical image processing is a growing and demanding field. It comprises many different types of imaging methods, such as CT scans, X-rays, and MRI. These techniques allow us to identify even the smallest abnormalities in the human body. The primary goal of medical imaging is to extract meaningful and accurate information from these images with the least error possible. Of the various types of medical imaging available to us, MRI is the most reliable and safe: it does not involve exposing the body to any sort of harmful radiation. The MRI scan can then be processed, and the tumor can be segmented. Tumor segmentation includes the use of several different techniques. The whole process of detecting a brain tumor from an MRI can be classified into four different categories: pre-processing, segmentation, optimization, and feature extraction. This survey reviews the research of other professionals and compiles it into one paper.

A Memristive Multilayer Cellular Neural Network With Applications to Image Processing

Abstract:

The memristor has been extensively studied in electrical engineering and biological sciences as a means to compactly implement the synaptic function in neural networks. The cellular neural network (CNN) is one of the most implementable artificial neural network models and capable of massively parallel analog processing. In this paper, a novel memristive multilayer CNN (Mm-CNN) model is presented along with its performance analysis and applications. In this new CNN design, the memristor crossbar circuit acts as the synapse, which realizes one signed synaptic weight with a pair of memristors and performs the synaptic weighting compactly and linearly. Moreover, the complex weighted summation is executed in an efficient way with a proper design of Mm-CNN cell circuits. The proposed Mm-CNN has several merits, such as compactness, nonvolatility, versatility, and programmability of synaptic weights. Its performance in several image processing applications is illustrated through simulations.
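
The pair-of-memristors weighting scheme can be illustrated with a toy numerical model: each signed weight is realised as the scaled difference of two non-negative conductances, and the crossbar's weighted summation becomes a matrix-vector product of conductances and input voltages. The conductance values below are arbitrary illustrative numbers, not taken from the paper:

```python
import numpy as np

def crossbar_weighted_sum(G_pos, G_neg, v, k=1.0):
    """Signed synaptic weighting with memristor pairs.

    Each weight is realised as w = k * (G+ - G-), so a crossbar of
    positive-only conductances implements signed weights; the weighted
    summation is the difference of the two column currents.
    G_pos, G_neg: (out, in) conductance matrices; v: input voltage vector.
    """
    return k * (G_pos @ v - G_neg @ v)
```

For example, conductance pairs (2, 1) and (1, 3) realise the signed weights +1 and -2, so the input (3, 4) produces 3 - 8 = -5.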

Comments on “Steganography Using Reversible Texture Synthesis”

Abstract:

Message hiding in texture image synthesis is a novel steganography approach by which we resample a smaller texture image and synthesize a new texture image with a similar local appearance and an arbitrary size. However, the mirror operation over the image boundary is flawed and is easy to attack. We propose an attacking method on this steganography, which can not only detect the stego-images but can also extract the hidden messages.

A New Rule for Cost Reassignment in Adaptive Steganography

Abstract:

In steganography schemes, the distortion function is used to define modification costs on cover elements, which is distinctly vital to the security of modern adaptive steganography. There are several successful rules for reassigning the costs defined by a given distortion function, which can promote the security level of the corresponding steganographic algorithm. In this paper, we propose a novel cost reassignment rule, which is applied not to one but to a batch of existing distortion functions. We find that the costs assigned to some pixels by several steganographic methods may be very different even though these methods exhibit close security levels. We call such pixels “controversial pixels”. Experimental results show that steganalysis features are not sensitive to controversial pixels; therefore, these pixels are suitable to carry more payloads. We name this rule the controversial pixels prior (CPP) rule. Following the rule, we propose a cost reassignment scheme. Through extensive experiments on several kinds of stego algorithms, steganalysis features, and cover databases, we demonstrate that the CPP rule can improve the security of state-of-the-art steganographic algorithms for spatial images.
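
The CPP idea can be caricatured in a short sketch: normalise the cost maps produced by several distortion functions, flag the pixels where they disagree most, and discount the fused cost there so those pixels carry more payload. The disagreement measure (per-pixel standard deviation), the quantile threshold, and the discount factor are all illustrative assumptions, not the paper's exact rule:

```python
import numpy as np

def cpp_reassign(cost_maps, top_frac=0.2, discount=0.5):
    """Controversial-pixels-prior style cost reassignment (illustrative).

    cost_maps: list of same-shape cost arrays from different distortion
    functions. Pixels where the (scale-normalised) costs disagree most
    are treated as 'controversial' and their fused cost is scaled down,
    steering more embedding changes toward them.
    """
    stack = np.stack([c / c.mean() for c in cost_maps])  # normalise scales
    fused = stack.mean(axis=0)
    disagreement = stack.std(axis=0)
    thresh = np.quantile(disagreement, 1.0 - top_frac)
    controversial = disagreement >= thresh
    fused[controversial] *= discount
    return fused, controversial
```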

Enhanced password processing scheme based on visual cryptography and OCR

Abstract:

The traditional password-conversion scheme for user authentication transforms passwords into hash values. These hash-based password schemes are comparatively simple and fast because they are based on text and well-established cryptography. However, they are exposed to cyber-attacks using password-cracking tools or online hash-cracking sites. Attackers can recover an original password from its hash value when the password is relatively simple and plain. As a result, many hacking incidents have occurred, predominantly in systems adopting such hash-based schemes. In this work, we suggest an enhanced password processing scheme based on images using visual cryptography (VC). Different from the traditional scheme based on hashes and text, our scheme transforms a user ID of text type into two images encrypted by VC. The user generates two images consisting of subpixels with a random function whose SEED includes personal information. The server holds only the user's ID and one of the images instead of a password. When the user logs in and sends the other image, the server can extract the ID by utilizing OCR (optical character recognition). As a result, it can authenticate the user by comparing the extracted ID with the saved one. Our proposal has lower computation cost, prevents cyber-attacks aimed at hash cracking, and supports authentication without exposing personal information such as the ID to attackers.
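
A minimal (2, 2) visual cryptography scheme of the kind this method builds on can be sketched as follows: every secret pixel expands into a pair of subpixels in each share, chosen so that stacking the two transparencies reveals the pixel while a single share on its own looks random. This is the textbook construction, not the paper's exact SEED-based variant:

```python
import numpy as np

def make_shares(secret, rng=None):
    """(2,2) visual cryptography: each secret pixel becomes a 1x2 subpixel
    pair in each share. White pixel -> identical pairs, black pixel ->
    complementary pairs, so stacking (OR of ink) turns black pixels fully
    black and white pixels half black. secret: 2-D bool array (True=black).
    """
    rng = np.random.default_rng(rng)
    h, w = secret.shape
    s1 = np.zeros((h, 2 * w), dtype=bool)
    s2 = np.zeros((h, 2 * w), dtype=bool)
    flip = rng.integers(0, 2, size=(h, w)).astype(bool)  # random pattern pick
    # pattern A = [ink, blank], pattern B = [blank, ink]
    s1[:, 0::2], s1[:, 1::2] = flip, ~flip
    s2[:, 0::2] = np.where(secret, ~flip, flip)
    s2[:, 1::2] = np.where(secret, flip, ~flip)
    return s1, s2

def stack(s1, s2):
    """Physically overlaying transparencies ORs the ink."""
    return s1 | s2
```

Each share always carries exactly one inked subpixel per pair, so an attacker holding one share (as the server does) learns nothing about the secret.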

Reversible Data Hiding in Encrypted Images With Distributed Source Encoding

Abstract:

This paper proposes a novel scheme of reversible data hiding in encrypted images using distributed source coding. After the original image is encrypted by the content owner using a stream cipher, the data-hider compresses a series of selected bits taken from the encrypted image to make room for the secret data. The selected bit series is Slepian-Wolf encoded using low-density parity check codes. On the receiver side, the secret bits can be extracted if the image receiver has the embedding key only. In case the receiver has the encryption key only, he/she can recover the original image approximately with high quality using an image estimation algorithm. If the receiver has both the embedding and encryption keys, he/she can extract the secret data and perfectly recover the original image using the distributed source decoding. The proposed method outperforms the previously published ones.

Recovery of lost color and depth frames in multiview videos

Abstract:

In this paper, we consider an integrated error concealment system for lost color frames and lost depth frames in multiview videos with depth. We first propose a pixel-based color error-concealment method that uses depth information. Instead of assuming that the same moving object in consecutive frames has minimal depth difference, as is done in a state-of-the-art method, a more realistic situation in which the same moving object in consecutive frames can be at different depths is considered. In the derived motion vector candidate set, we consider all the candidate motion vectors in the set, and weight the reference pixels by the depth differences to obtain the final recovered pixel. Compared to two state-of-the-art methods, the proposed method has average PSNR gains of up to 8.73 dB and 3.98 dB, respectively. Second, we propose an iterative depth frame error-concealment method. The initial recovered depth frame is obtained by DIBR (depth-image-based rendering) from another available view. The holes in the recovered depth frame are then filled in the proposed priority order. Preprocessing methods (depth difference compensation and inconsistent pixel removal) are performed to improve the performance. Compared with a method that uses the available motion vector in a color frame to recover the lost depth pixels, with the HMVE (hybrid motion vector extrapolation) method, and with the inpainting method, the proposed method has gains of up to 4.31 dB, 10.29 dB, and 6.04 dB, respectively. Finally, for the situation in which the color and depth frames are lost at the same time, our two methods jointly perform better, with a gain of up to 7.79 dB.
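
The depth-weighted blending of candidate reference pixels can be sketched as a weighted average in which candidates whose depth is close to the lost pixel's depth dominate. The inverse-distance weight below is an illustrative choice; the paper's actual weighting function may differ:

```python
import numpy as np

def recover_pixel(candidates, depth_diffs, eps=1e-6):
    """Blend candidate reference pixels, weighting each by how close its
    depth is to the lost pixel's depth (smaller difference -> larger weight).

    candidates: intensities fetched via the motion-vector candidate set.
    depth_diffs: |depth(reference) - depth(lost)| per candidate.
    """
    w = 1.0 / (np.abs(depth_diffs) + eps)
    return float(np.sum(w * candidates) / np.sum(w))
```

A candidate whose depth matches exactly receives an overwhelming weight, so the recovered value snaps to it; candidates from the wrong depth layer contribute almost nothing.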

Exploring Duplicated Regions in Natural Images

Abstract:

Duplication of image regions is a common method for manipulating original images, using typical software like Adobe Photoshop, 3DS MAX, etc. In this study, we propose a duplication detection approach that adopts two robust features based on the discrete wavelet transform (DWT) and kernel principal component analysis (KPCA). Both schemes provide excellent representations of the image data for robust block matching. Multiresolution wavelet coefficients and KPCA-based projected vectors corresponding to image blocks are arranged into a matrix for lexicographic sorting. Sorted blocks are used for making a list of similar point-pairs and for computing their offset frequencies. Duplicated regions are then segmented by an automatic technique that refines the list of corresponding point-pairs and eliminates the minimum offset-frequency threshold parameter of the usual detection method. A new technique that extends the basic algorithm to detect flip and rotation types of forgeries is also proposed. This method uses global geometric transformation and a labeling technique to identify the mentioned forgeries. Experiments with a large number of natural images show very promising results when compared with the conventional PCA-based approach. A quantitative analysis indicates that the wavelet-based feature outperforms PCA- or KPCA-based features in terms of average precision and recall in the noiseless, or uncompressed, domain, while the KPCA-based feature obtains excellent performance in additive noise and lossy JPEG compression environments.
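
The lexicographic block-matching core of such copy-move detectors can be sketched as follows, with raw pixel blocks standing in for the DWT/KPCA features (an assumption of this sketch): extract all overlapping blocks, sort them lexicographically so duplicates become neighbours, and vote on the offset between matching blocks.

```python
import numpy as np
from collections import Counter

def dominant_offset(img, b=8):
    """Copy-move detection skeleton: slide b x b blocks, sort them
    lexicographically, compare neighbours in the sorted order, and vote
    for the most frequent offset between identical blocks."""
    h, w = img.shape
    pos, feats = [], []
    for y in range(h - b + 1):
        for x in range(w - b + 1):
            pos.append((y, x))
            feats.append(img[y:y + b, x:x + b].ravel())
    feats = np.array(feats)
    order = np.lexsort(feats.T[::-1])        # lexicographic row order
    votes = Counter()
    for i, j in zip(order[:-1], order[1:]):
        if np.array_equal(feats[i], feats[j]):
            (y1, x1), (y2, x2) = pos[i], pos[j]
            dy, dx = y2 - y1, x2 - x1
            if dy < 0 or (dy == 0 and dx < 0):
                dy, dx = -dy, -dx            # canonical direction
            votes[(dy, dx)] += 1
    return votes.most_common(1)[0] if votes else None
```

A genuinely duplicated region produces many block pairs sharing one offset, so its vote count towers over accidental matches; this is the offset-frequency evidence the segmentation stage above builds on.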

IPAD: Intensity Potential for Adaptive De-quantization

Abstract:

Display devices with a bit-depth of 10 or higher are mature, but the mainstream media source is still at a bit-depth of 8. To bridge the gap, the most economical solution is to render the low bit-depth source for high bit-depth display, which is essentially a procedure of de-quantization. Traditional methods, like zero-padding or bit replication, introduce annoying false contour artifacts. To better estimate the least-significant bits, later works use filtering or interpolation approaches, which exploit only limited neighbor information and cannot thoroughly remove the false contours. In this paper, we propose a novel intensity potential field to model the complicated relationships among pixels. The potential value decreases as the spatial distance to the field source increases, and the potentials from different field sources are additive. Based on the proposed intensity potential field, an adaptive de-quantization procedure is then proposed to convert low bit-depth images to high bit-depth ones. To the best of our knowledge, this is the first attempt to apply a potential field to natural images. The proposed potential field preserves local consistency and models complicated contexts well. Extensive experiments on natural, synthetic, and high dynamic range image datasets validate the efficiency of the proposed intensity potential field. Significant improvements have been achieved over the state-of-the-art methods in both PSNR and SSIM.
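
The bit-replication baseline mentioned above, whose false contours the proposed potential field is designed to remove, is essentially a one-liner: the missing least-significant bits are padded with the value's own most-significant bits (8-bit abcdefgh becomes 10-bit abcdefgh·ab).

```python
import numpy as np

def bit_replicate(x8, target_bits=10):
    """Classic bit-replication de-quantization: shift the 8-bit value up
    and fill the new least-significant bits with its own top bits."""
    extra = target_bits - 8
    x = x8.astype(np.uint16)
    return (x << extra) | (x >> (8 - extra))
```

This maps 0 to 0 and 255 to 1023, preserving the full output range, but every 8-bit quantization step remains a visible jump, which is exactly the false-contour artifact the adaptive potential-field method targets.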

Moiré Photo Restoration Using Multiresolution Convolutional Neural Networks

Abstract:

Digital cameras and mobile phones enable us to conveniently record precious moments. While digital image quality is constantly being improved, taking high-quality photos of digital screens still remains challenging because the photos are often contaminated with moiré patterns, a result of the interference between the pixel grids of the camera sensor and the device screen. Moiré patterns can severely damage the visual quality of photos. However, few studies have aimed to solve this problem. In this paper, we introduce a novel multiresolution fully convolutional network for automatically removing moiré patterns from photos. Since a moiré pattern spans a wide range of frequencies, our proposed network performs a nonlinear multiresolution analysis of the input image before computing how to cancel moiré artefacts within every frequency band. We also create a large-scale benchmark data set with 100,000+ image pairs for investigating and evaluating moiré pattern removal algorithms. Our network achieves state-of-the-art performance on this data set in comparison to existing learning architectures for image restoration problems.

Composing Semantic Collage for Image Retargeting

Abstract:

Image retargeting has been applied to display images of any size via devices with various resolutions (e.g., cell phone, TV monitors). To fit an image with the target resolution, certain unimportant regions need to be deleted or distorted and the key problem is to determine the importance of each pixel. Existing methods predict pixel-wise importance in a bottom-up manner via eye fixation estimation or saliency detection. In contrast, the proposed algorithm estimates the pixel-wise importance based on a top-down criterion where the target image maintains the semantic meaning of the original image. To this end, several semantic components corresponding to foreground objects, action contexts, and background regions are extracted. The semantic component maps are integrated by a classification guided fusion network. Specifically, the deep network classifies the original image as object or scene-oriented, and fuses the semantic component maps according to classification results. The network output, referred to as the semantic collage with the same size as the original image, is then fed into any existing optimization method to generate the target image. Extensive experiments are carried out on the RetargetMe dataset and S-Retarget database developed in this work. Experimental results demonstrate the merits of the proposed algorithm over the state-of-the-art image retargeting methods.

Learning Depth from Single Images with Deep Neural Network Embedding Focal Length

Abstract:

Learning depth from a single image, as an important issue in scene understanding, has attracted a lot of attention in the past decade. The accuracy of depth estimation has been improved from conditional Markov random fields and non-parametric methods to, most recently, deep convolutional neural networks. However, there exist inherent ambiguities in recovering 3D from a single 2D image. In this paper, we first prove the ambiguity between the focal length and monocular depth learning, and verify the result using experiments, showing that the focal length has a great influence on accurate depth recovery. In order to learn monocular depth by embedding the focal length, we propose a method to generate a synthetic varying-focal-length dataset from fixed-focal-length datasets, and a simple and effective method is implemented to fill the holes in the newly generated images. For the sake of accurate depth recovery, we propose a novel deep neural network to infer depth through effectively fusing the middle-level information on the fixed-focal-length dataset, which outperforms the state-of-the-art methods built on pre-trained VGG. Furthermore, the newly generated varying-focal-length dataset is taken as input to the proposed network in both the learning and inference phases. Extensive experiments on the fixed- and varying-focal-length datasets demonstrate that the learned monocular depth with embedded focal length is significantly improved compared to that without embedding the focal length information.

Superpixel Hierarchy

Abstract:

Superpixel segmentation has been one of the most important tasks in computer vision. In practice, an object can be represented by a number of segments at finer levels with consistent details or included in a surrounding region at coarser levels. Thus, a superpixel segmentation hierarchy is of great importance for applications that require different levels of image details. However, there is no method that can generate all scales of superpixels accurately in real-time. In this paper, we propose the super hierarchy algorithm which is able to generate multiscale superpixels as accurately as the state-of-the-art methods but with one to two orders of magnitude speed-up. The proposed algorithm can be directly integrated with recent efficient edge detectors to significantly outperform the state-of-the-art methods in terms of segmentation accuracy. Quantitative and qualitative evaluations on a number of applications demonstrate that the proposed algorithm is accurate and efficient in generating a hierarchy of superpixels.

Image Haze Removal via Reference Retrieval and Scene Prior

Abstract:

Photography of hazy scenes typically suffers from low contrast, which degrades the visibility of the scene. The performance of single-image dehazing methods is limited by their priors or constraints. In this paper, we present an effective method for haze removal, which utilizes retrieved correlated haze-free images as external information. The correlated haze-free images carry a scene prior, offering scene structure and local high-frequency information for dehazing, although variations in viewpoints, scales, and illumination conditions exist. To utilize those references more effectively, global geometric registration and local block matching towards the hazy input are performed to reinforce the spatial correlations. Based on the registration, different kinds of external information are estimated. In addition, we combine this additional external information with internal constraints and regularization for estimating the scene transmission map. Experiments demonstrate that our approach can produce dehazing results with better visual quality compared to other state-of-the-art methods.

Image Super-resolution with Parametric Sparse Model Learning

Abstract:

Recovering a high-resolution (HR) image from its low-resolution (LR) version is an ill-posed inverse problem. Learning accurate prior of HR images is of great importance to solve this inverse problem. Existing super-resolution (SR) methods either learn a non-parametric image prior from training data (a large set of LR/HR patch pairs) or estimate a parametric prior from the LR image analytically. Both methods have their limitations: the former lacks flexibility when dealing with different SR settings; while the latter often fails to adapt to spatially varying image structures. In this paper, we propose to take a hybrid approach toward image SR by combining those two lines of ideas – that is, a parametric sparse prior of HR images is learned from the training set as well as the input LR image. By exploiting the strengths of both worlds, we can more accurately recover the sparse codes and therefore HR image patches than conventional sparse coding approaches. Experimental results show that the proposed hybrid SR method significantly outperforms existing model-based SR methods and is highly competitive to current state-of-the-art learning-based SR methods in terms of both subjective and objective image qualities.
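
The sparse-coding step that such SR methods rely on can be sketched with plain ISTA (iterative shrinkage-thresholding): given a dictionary A and an observed patch b, recover a sparse code x minimising 0.5||Ax - b||^2 + lam*||x||_1. This is the generic solver, not the paper's parametric-prior learning:

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam=0.01, n_iter=500):
    """Iterative shrinkage-thresholding for sparse coding: alternate a
    gradient step on the data term with soft-thresholding."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft(x - (A.T @ (A @ x - b)) / L, lam / L)
    return x
```

With a well-conditioned dictionary and a genuinely sparse signal, the recovered code is both sparse and consistent with the observation; in an SR pipeline, the HR patch is then reconstructed as the HR dictionary applied to this code.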

Blurriness-guided Unsharp Masking

Abstract:

In this paper, a highly-adaptive unsharp masking (UM) method is proposed, called the blurriness-guided UM, or BUM for short. The proposed BUM exploits the estimated local blurriness as guidance information to perform pixel-wise enhancement. The consideration of local blurriness is motivated by the fact that enhancing a highly-sharp or a highly-blurred image region is undesirable, since this could easily yield unpleasant image artifacts due to over-enhancement or noise enhancement, respectively. Our proposed BUM algorithm has two powerful adaptations, as follows. First, the enhancement strength is adjusted for each pixel of the input image according to the degree of local blurriness measured in the local region around the pixel's location. All such measurements collectively form the blurriness map, from which the scaling matrix can be obtained using our proposed mapping process. Second, we also consider the type of layer-decomposition filter used to generate the base layer and the detail layer, since this consideration effectively helps to prevent over-enhancement artifacts. In our work, the layer-decomposition filter is considered from the viewpoint of edge-preserving versus non-edge-preserving type. Extensive simulations on various test images have clearly demonstrated that our proposed BUM consistently yields enhanced images with better perceptual quality than those obtained using a fixed enhancement strength or other state-of-the-art adaptive UM methods.
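
The per-pixel strength adaptation can be caricatured in NumPy: estimate local blurriness, map it so the strength peaks at intermediate blurriness (very sharp and very blurred regions are left mostly alone), and scale the detail layer accordingly. Local gradient energy and the parabola-shaped mapping are illustrative stand-ins for the paper's blurriness map and mapping process:

```python
import numpy as np

def box_blur(img, r=2):
    """Mean filter with a (2r+1)^2 window via padded cumulative sums."""
    p = np.pad(img, r, mode="edge")
    c = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    n = 2 * r + 1
    return (c[n:, n:] - c[:-n, n:] - c[n:, :-n] + c[:-n, :-n]) / n**2

def blurriness_guided_um(img, lam=1.5):
    """Unsharp masking whose strength falls off where the image is either
    very sharp or very blurred (illustrative sketch, not the paper's BUM)."""
    base = box_blur(img)                     # layer decomposition
    detail = img - base
    gy, gx = np.gradient(img)
    sharpness = box_blur(gy**2 + gx**2)      # local gradient energy
    s = sharpness / (sharpness.max() + 1e-12)
    strength = lam * 4.0 * s * (1.0 - s)     # peaks at mid-sharpness
    return img + strength * detail
```

A flat region produces zero detail and zero strength, so it passes through untouched, which is the noise-safety behavior the adaptive strength is meant to provide.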

Landmark Free Face Attribute Prediction

Abstract:

Face attribute prediction in the wild is important for many facial analysis applications, yet it is very challenging due to ubiquitous face variations. In this paper, we address face attribute prediction in the wild by proposing a novel method, lAndmark Free Face AttrIbute pRediction (AFFAIR). Unlike traditional face attribute prediction methods that require facial landmark detection and face alignment, AFFAIR uses an end-to-end learning pipeline to jointly learn a hierarchy of spatial transformations that optimize facial attribute prediction, with no reliance on landmark annotations or pre-trained landmark detectors. AFFAIR achieves this by simultaneously 1) learning a global transformation which effectively alleviates the negative effect of global face variation for the subsequent attribute prediction tailored to each face, 2) locating the most relevant facial part for attribute prediction, and 3) aggregating the global and local features for robust attribute prediction. Within AFFAIR, a new competitive learning strategy is developed that effectively enhances global transformation learning for better attribute prediction. We show that with zero information about landmarks, AFFAIR achieves state-of-the-art performance on three face attribute prediction benchmarks, simultaneously learning the face-level transformation and attribute-level localization within a unified framework.

A Variational Pansharpening Approach Based on Reproducible Kernel Hilbert Space and Heaviside Function

Abstract:

Pansharpening is an important application in remote sensing image processing. It can increase the spatial resolution of a multispectral image by fusing it with a high spatial-resolution panchromatic image of the same scene, which greatly benefits subsequent processing such as recognition, detection, etc. In this paper, we propose a continuous-modeling and sparse-optimization based method for the fusion of a panchromatic image and a multispectral image. The proposed model is mainly based on a reproducing kernel Hilbert space (RKHS) and an approximated Heaviside function (AHF). In addition, we also propose a Toeplitz sparse term for representing the correlation of adjacent bands. The model is convex and solved by the alternating direction method of multipliers, which guarantees the convergence of the proposed method. Extensive experiments on many real datasets collected by different sensors demonstrate the effectiveness of the proposed technique as compared with several state-of-the-art pansharpening approaches.
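
A common smooth approximation of the Heaviside step used in such variational models is H_eps(x) = 0.5 * (1 + (2/pi) * arctan(x/eps)); whether the paper uses exactly this form is an assumption of this sketch. It is differentiable everywhere, which is what makes it usable inside a convex optimization model:

```python
import numpy as np

def ahf(x, eps=0.01):
    """Approximated Heaviside function: a smooth, differentiable step.
    As eps -> 0 it approaches the hard 0/1 step at x = 0."""
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(x / eps))
```

At x = 0 it returns exactly 0.5 and saturates toward 0 and 1 away from the origin; smaller eps gives a sharper transition.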

Accurate Light Field Depth Estimation with Superpixel Regularization over Partially Occluded Regions

Abstract:

Depth estimation is a fundamental problem for light field photography applications. Numerous methods have been proposed in recent years, which either focus on crafting cost terms for more robust matching, or on analyzing the geometry of scene structures embedded in the epipolar-plane images. Significant improvements have been made in terms of overall depth estimation error; however, current state-of-the-art methods still show limitations in handling intricate occluding structures and complex scenes with multiple occlusions. To address these challenging issues, we propose a very effective depth estimation framework which focuses on regularizing the initial label confidence map and edge strength weights. Specifically, we first detect partially occluded boundary regions (POBR) via superpixel-based regularization. A series of shrinkage/reinforcement operations is then applied on the label confidence map and edge strength weights over the POBR. We show that after weight manipulations, even a low-complexity weighted least squares model can produce much better depth estimation than state-of-the-art methods in terms of average disparity error rate, occlusion boundary precision-recall rate, and the preservation of intricate visual features.

A Gabor Feature-based Quality Assessment Model for the Screen Content Images

Abstract:

In this paper, an accurate and efficient full-reference image quality assessment (IQA) model using extracted Gabor features, called the Gabor feature-based model (GFM), is proposed for conducting objective evaluation of screen content images (SCIs). It is well known that Gabor filters are highly consistent with the response of the human visual system (HVS), and the HVS is highly sensitive to edge information. Based on these facts, the imaginary part of the Gabor filter, which has odd symmetry and yields edge detection, is applied to the luminance of the reference and distorted SCIs to extract their Gabor features. The local similarities of the extracted Gabor features and two chrominance components, recorded in the LMN color space, are then measured independently. Finally, a Gabor-feature pooling strategy is employed to combine these measurements and generate the final evaluation score. Experimental simulation results obtained from two large SCI databases have shown that the proposed GFM model not only yields higher consistency with human perception on the assessment of SCIs but also requires lower computational complexity, compared with classical and state-of-the-art IQA models.
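
The odd-symmetric (imaginary) Gabor kernel and an SSIM-style local similarity of two response maps can be sketched directly; the kernel parameters below are illustrative, not the ones tuned in the paper:

```python
import numpy as np

def odd_gabor_kernel(size=15, theta=0.0, lam=6.0, sigma=3.0, gamma=0.5):
    """Imaginary (odd-symmetric) part of a Gabor filter: a Gaussian
    envelope times a sine carrier, which responds to edges."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    return env * np.sin(2 * np.pi * xr / lam)

def gabor_similarity(g1, g2, c=0.001):
    """Pointwise similarity of two Gabor response maps (SSIM-style ratio);
    equals 1 where the responses agree and drops where they differ."""
    return (2 * g1 * g2 + c) / (g1**2 + g2**2 + c)
```

The odd symmetry (the kernel negates under a 180-degree flip) is what gives a zero response on flat regions and a strong signed response across edges, matching the edge sensitivity argument above.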

Visual Tracking with Weighted Adaptive Local Sparse Appearance Model via Spatio-Temporal Context Learning

Abstract:

Sparse representation has been widely exploited to develop effective appearance models for object tracking, owing to its strong discriminative capability in distinguishing the target from its surrounding background. However, most of these methods consider either the holistic representation alone or a local representation that treats every patch with equal importance, and hence may fail when the target suffers from severe occlusion or large-scale pose variation. In this paper, we propose a simple yet effective approach that exploits rich feature information from reliable patches, based on a weighted local sparse representation that takes the importance of each patch into account. Specifically, we design a reconstruction-error-based weight function that uses the sparse-coding reconstruction error of each patch to measure patch reliability. Moreover, we explore spatio-temporal context information to enhance the robustness of the appearance model: the global temporal context is learned via incremental subspace and sparse representation learning, with a novel dynamic template update strategy for the dictionary, while the local spatial context captures the correlation between the target and its surrounding background by measuring the similarity among their sparse coefficients. Extensive experimental evaluations on two large tracking benchmarks demonstrate favorable performance of the proposed method over several state-of-the-art trackers.
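The reconstruction-error-based weighting can be sketched in a few lines. As assumptions of ours (not the paper's exact formulas): least squares stands in for sparse coding when computing the per-patch residual, and an exponential map converts residuals to normalized reliability weights.

```python
import numpy as np

def reconstruction_error(patch, dictionary):
    """Residual of coding `patch` over the columns of `dictionary`
    (plain least squares here, as a stand-in for sparse coding)."""
    coef, *_ = np.linalg.lstsq(dictionary, patch, rcond=None)
    return np.linalg.norm(patch - dictionary @ coef)

def patch_weights(errors, sigma=0.1):
    """Map per-patch reconstruction errors to reliability weights:
    low-error (reliable) patches get large weights, occluded or outlier
    patches are suppressed. Weights are normalized to sum to one."""
    w = np.exp(-np.asarray(errors) / sigma)
    return w / w.sum()
```

In a tracker, the weights would then scale each patch's contribution to the overall appearance likelihood, so an occluded patch with a large residual barely influences the target score.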

Simultaneously Discovering and Localizing Common Objects in Wild Images

Abstract:

Motivated by the recent success of supervised and weakly supervised common object discovery, in this work we move one step further to tackle common object discovery in a fully unsupervised way. Generally, object co-localization aims at simultaneously localizing objects of the same class across a group of images. Traditional object localization/detection usually trains specific object detectors, which require bounding box annotations of object instances, or at least image-level labels to indicate the presence/absence of objects in an image. Given a collection of images without any annotations, our fully unsupervised method simultaneously discovers the images that contain common objects and localizes those objects within them. Without requiring knowledge of the total number of common objects, we formulate unsupervised object discovery as a sub-graph mining problem on a weighted graph of object proposals, where nodes correspond to object proposals and edges represent the similarities between neighbouring proposals. The positive images and common objects are jointly discovered by finding sub-graphs of strongly connected nodes, with each sub-graph capturing one object pattern. The optimization problem can be efficiently solved by our proposed maximal-flow-based algorithm. Instead of assuming that each image contains only one common object, our solution can better address wild images, where an image may contain multiple common objects or even none. Moreover, our method can easily be tailored to the task of image retrieval, in which the nodes correspond to the similarity between query and reference images. Extensive experiments on the PASCAL VOC 2007 and Object Discovery datasets demonstrate that, even without any supervision, our approach can discover and localize common objects of various classes in the presence of scale, viewpoint, and appearance variation, and partial occlusions.
We also conduct broad experiments on the Holidays and Oxford5k image retrieval benchmarks to show that our method, which considers both the similarity between query and reference images and the similarities among reference images, significantly improves retrieval results.
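The sub-graph mining idea can be illustrated with a deliberately simplified sketch: where the paper mines dense sub-graphs with a maximal-flow-based algorithm, we merely threshold the proposal similarity matrix and return connected components, each component standing in for one common-object pattern. The threshold `tau` and the function name are our own illustrative choices.

```python
import numpy as np

def discover_subgraphs(similarity, tau=0.5):
    """Group object proposals into clusters of mutually similar nodes.

    `similarity` is an n x n symmetric matrix of proposal similarities.
    Edges below `tau` are dropped and connected components are returned.
    Singleton components (proposals matching nothing) are discarded,
    mirroring images that contain no common object.
    """
    n = len(similarity)
    adj = np.asarray(similarity) >= tau
    seen, clusters = set(), []
    for s in range(n):
        if s in seen:
            continue
        comp, stack = [], [s]       # depth-first traversal of one component
        seen.add(s)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in range(n):
                if v != u and adj[u, v] and v not in seen:
                    seen.add(v)
                    stack.append(v)
        if len(comp) > 1:
            clusters.append(sorted(comp))
    return clusters
```

Each returned cluster would correspond to one discovered object pattern, and the images contributing proposals to a cluster are the "positive" images for that pattern.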

FFDNet: Toward a Fast and Flexible Solution for CNN based Image Denoising

Abstract:

Due to their fast inference and good performance, discriminative learning methods have been widely studied for image denoising. However, these methods mostly learn a specific model for each noise level and require multiple models for denoising images with different noise levels. They also lack the flexibility to deal with spatially variant noise, limiting their use in practical denoising. To address these issues, we present a fast and flexible denoising convolutional neural network, namely FFDNet, with a tunable noise level map as the input. The proposed FFDNet works on downsampled sub-images, achieving a good trade-off between inference speed and denoising performance. In contrast to existing discriminative denoisers, FFDNet enjoys several desirable properties, including (i) the ability to handle a wide range of noise levels (i.e., [0, 75]) effectively with a single network, (ii) the ability to remove spatially variant noise by specifying a non-uniform noise level map, and (iii) faster speed than the benchmark BM3D, even on CPU, without sacrificing denoising performance. Extensive experiments on synthetic and real noisy images are conducted to evaluate FFDNet in comparison with state-of-the-art denoisers. The results show that FFDNet is effective and efficient, making it highly attractive for practical denoising applications.
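The input preparation FFDNet relies on can be sketched as follows: the image is rearranged into four half-resolution sub-images and a constant noise-level channel is appended, giving the network its tunable noise input. The exact tensor layout and channel order in the official implementation may differ; this is a minimal numpy illustration for a grayscale image with uniform noise.

```python
import numpy as np

def to_subimages(img, sigma):
    """Rearrange an H x W image into four H/2 x W/2 sub-images (one per
    2x2 phase) and append a constant noise-level map as a fifth channel."""
    h, w = img.shape
    subs = np.stack([img[0::2, 0::2], img[0::2, 1::2],
                     img[1::2, 0::2], img[1::2, 1::2]])
    noise_map = np.full((1, h // 2, w // 2), sigma)
    return np.concatenate([subs, noise_map])          # shape (5, H/2, W/2)

def from_subimages(subs):
    """Invert the rearrangement (the noise-level channel is dropped)."""
    _, h, w = subs.shape
    img = np.empty((2 * h, 2 * w), dtype=subs.dtype)
    img[0::2, 0::2], img[0::2, 1::2] = subs[0], subs[1]
    img[1::2, 0::2], img[1::2, 1::2] = subs[2], subs[3]
    return img
```

Spatially variant noise would simply replace the constant `noise_map` with a per-pixel map, which is what property (ii) above exploits.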

Deep Representation-Based Feature Extraction and Recovering for Finger-Vein Verification

Abstract:

Finger-vein biometrics has been extensively investigated for personal verification. Despite recent advances in finger-vein verification, current solutions depend entirely on domain knowledge and still lack the robustness to extract finger-vein features from raw images. This paper proposes a deep learning model to extract and recover vein features using limited a priori knowledge. First, based on a combination of known state-of-the-art handcrafted finger-vein image segmentation techniques, we automatically identify two regions: a clear region, with high separability between finger-vein patterns and background, and an ambiguous region, with low separability between them. The first consists of pixels to which all of the above-mentioned segmentation techniques assign the same label (either foreground or background), while the second corresponds to all remaining pixels. This scheme is used to automatically discard the ambiguous region and to label the pixels of the clear region as foreground or background. A training data set is then constructed from the patches centered on the labeled pixels. Second, a convolutional neural network (CNN) is trained on the resulting data set to predict the probability that each pixel is foreground (i.e., a vein pixel), given a patch centered on it. The CNN learns what a finger-vein pattern is by learning the difference between vein patterns and background patterns, so the pixels in any region of a test image can then be classified effectively. Third, as a further contribution, we develop and investigate a fully convolutional network to recover missing finger-vein patterns in the segmented image. Experimental results on two public finger-vein databases show a significant improvement in finger-vein verification accuracy.
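The agreement-based labeling step in the first stage can be sketched directly: given binary masks from several handcrafted segmentation methods, pixels where all methods agree become training labels and the rest are discarded. The function name and mask representation are our own; the paper's patch-extraction details are omitted.

```python
import numpy as np

def label_pixels(masks):
    """Split pixels into clear foreground, clear background, and ambiguous
    regions from several binary segmentation masks (one per handcrafted
    finger-vein segmentation method). Only unanimous pixels are labeled."""
    stack = np.stack(masks)
    foreground = stack.all(axis=0)        # every method says vein
    background = (~stack).all(axis=0)     # every method says background
    ambiguous = ~(foreground | background)  # methods disagree: discard
    return foreground, background, ambiguous
```

Patches centered on the `foreground` and `background` pixels would then form the CNN's training set, while `ambiguous` pixels contribute nothing.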
