
DOT NET 2016 Projects


An Efficient Tree-based Self-Organizing Protocol for Internet of Things

Abstract

Tree networks are widely applied in sensor networks for the Internet of Things (IoT). This paper proposes an Efficient Tree-based Self-organizing Protocol (ETSP) for IoT sensor networks. In ETSP, all nodes are divided into two kinds: network nodes and non-network nodes. Network nodes broadcast packets to their neighboring nodes; non-network nodes collect the broadcast packets and decide whether to join the network. During the self-organizing process, metrics such as the number of child nodes, hop count, communication distance, and residual energy are combined into a weight for each available sink node, and the node with the maximum weight is selected as the sink. Non-network nodes become network nodes once they join the network successfully, so a tree-based network is built up layer by layer. The topology is adjusted dynamically to balance energy consumption and prolong network lifetime. We conduct experiments with NS2 to evaluate ETSP. Simulation results show that the proposed protocol can construct a reliable tree-based network quickly. As the network scale increases, the self-organization time, average hop count, and packet loss ratio do not increase significantly. Furthermore, the packet delivery success rate of ETSP is much higher than that of AODV and DSDV.
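
The sink-selection step lends itself to a compact sketch. Below is a minimal Python illustration that scores candidate sinks with a weighted sum of the four metrics named above; the coefficient values and field names are hypothetical, not taken from the paper.

```python
# Minimal sketch of ETSP sink selection, assuming a weighted-sum score.
# Coefficients and field names are hypothetical, not from the paper.

def sink_weight(node, w_children=0.25, w_hop=0.25, w_dist=0.25, w_energy=0.25):
    """Score a candidate sink from the metrics named in the abstract.
    Fewer children, fewer hops, shorter distance, and more residual
    energy all increase the weight."""
    return (w_children / (1 + node["children"])
            + w_hop / (1 + node["hop"])
            + w_dist / (1 + node["distance"])
            + w_energy * node["residual_energy"])

def choose_sink(candidates):
    # A joining (non-network) node picks the neighbor with the max weight.
    return max(candidates, key=sink_weight)

candidates = [
    {"children": 2, "hop": 1, "distance": 12.0, "residual_energy": 0.9},
    {"children": 5, "hop": 2, "distance": 8.0, "residual_energy": 0.5},
]
print(choose_sink(candidates))
```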


Seeing Is Believing: Sharing Real-time Visual Traffic Information via Vehicular Clouds

Abstract

From today’s conventional cars to tomorrow’s self-driving cars, advances in technology will enable vehicles to be equipped with increasingly sophisticated sensing devices, such as cameras. As vehicles gain the ability to act as mobile sensors that carry useful traffic information, people and vehicles are sharing sensing data to enhance the driving experience. This paper describes a vehicular cloud service for route planning, where users collaborate to share traffic images captured by their vehicles’ on-board cameras. We present the architecture of a collaborative traffic image–sharing system called Social Vehicle Navigation (SVN), which allows drivers in the vehicular cloud to report and share visual traffic information called NaviTweets. A set of NaviTweets is then filtered, refined, and condensed into a concise, user-friendly snapshot summary of the route of interest, called a Traffic Digest. These digests can provide more pertinent and reliable information about the road situation and can complement predictions such as estimated time of arrival, thereby supporting users’ route decision making. As proof of concept, this paper presents the system design and a prototype implementation running on the Android smartphone platform, along with its evaluation.


Operationalizing Engagement with Multimedia as User Coherence with Context

Abstract

Traditional approaches for assessing user engagement within multimedia environments rely on methods that are removed from the human-computer interaction itself, such as surveys, interviews, and baselined physiology. We propose a context coherence approach that operationalizes engagement as the amount of independent user variation that covaries in time with multimedia contextual events during unscripted interactions. This can address questions about which features of multimedia users are most engaged with, and how engaged users are, without the need for prescribed interactions or baselining. We assessed the validity of this approach in a psychophysiological study in which 40 participants played interactive video games. Intake and post-stimulus questionnaires collected subjective engagement reports that provided convergent and divergent validity criteria for evaluating our approach. Estimates of coherence between physiological variation and in-game contextual events predicted subjective engagement and added information beyond physiological metrics computed from baselines taken outside the multimedia context. Our coherence metric accounted for task-dependent engagement, independent of predispositions; this was not true of the baselined physiological approach used for comparison. Our findings show compelling evidence that a context-sensitive approach to measuring engagement overcomes shortcomings of traditional methods by making the best use of contextual information sampled from multimedia in time-series analyses.
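
As a rough illustration of the core idea, the sketch below scores engagement as the peak lagged correlation between a physiological signal and a binary series of in-game contextual events; the estimator, lag window, and all names are illustrative choices rather than the paper's exact method.

```python
# Illustrative coherence score (not the paper's exact estimator): lagged
# cross-correlation between a physiological signal and a binary time
# series of in-game contextual events. Lag window is a hypothetical choice.
import numpy as np

def coherence_score(signal, events, max_lag=5):
    """Max absolute Pearson correlation between the z-scored signal and
    the event series over a small range of lags, so responses that trail
    the events by a few samples still count."""
    s = (signal - signal.mean()) / signal.std()
    e = (events - events.mean()) / events.std()
    best = 0.0
    for lag in range(max_lag + 1):
        r = np.corrcoef(s[lag:], e[:len(e) - lag])[0, 1]
        best = max(best, abs(r))
    return best

rng = np.random.default_rng(0)
events = (rng.random(300) < 0.1).astype(float)          # contextual events
signal = (np.convolve(events, [0.2, 0.6, 1.0], mode="same")
          + 0.3 * rng.standard_normal(300))             # delayed response
print(coherence_score(signal, events))
```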


An Architecture of Cloud-Assisted Information Dissemination in Vehicular Networks

Abstract

Vehicular network technology allows vehicles to exchange real-time information with each other, which plays a vital role in the development of future intelligent transportation systems (ITSs). Existing research on vehicular networks assumes that each vehicle broadcasts collected information to neighboring vehicles so that information is shared among vehicles. The fundamental problem of which information is delivered to which vehicle(s), however, has not been adequately studied. We propose an innovative cloud-assisted architecture to facilitate intelligent information dissemination among vehicles. Within this novel architecture, virtual social connections between vehicles are created and maintained on the cloud. Vehicles with similar driving histories are considered friends in a vehicular social network (VSN). The closeness of the relation between two vehicles in a VSN is then modeled by the three-valued subjective logic model. Based on the closeness between vehicles, only relevant information is delivered to vehicles that are likely to be interested in it. The cloud-assisted architecture coordinates vehicular social connection construction, VSN maintenance, vehicle closeness assessment, and information dissemination.
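
The three-valued subjective logic model mentioned above can be illustrated with a short sketch: an opinion (belief, disbelief, uncertainty) formed from positive and negative interaction evidence, whose expected value serves as a closeness score. The prior weight W = 2 follows standard subjective logic usage; the mapping from driving histories to evidence counts is a hypothetical choice.

```python
# Minimal sketch of a three-valued subjective logic opinion. The mapping
# from driving-history overlaps to evidence counts is hypothetical.

def opinion(positive, negative, W=2.0):
    total = positive + negative + W
    belief = positive / total
    disbelief = negative / total
    uncertainty = W / total
    return belief, disbelief, uncertainty

def closeness(positive, negative, base_rate=0.5):
    # Expected value of the opinion: E = b + a * u.
    b, d, u = opinion(positive, negative)
    return b + base_rate * u

# e.g., two vehicles whose routes overlapped on 8 trips and conflicted on 1
print(closeness(8, 1))
```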


Coverage and Rate Analysis in Heterogeneous Cloud Radio Access Networks with Device-to-Device Communication

Abstract

The implementation of the heterogeneous cloud radio access network (H-CRAN) architecture faces practical challenges such as the capacity and time-delay limitations of the fronthaul links. This paper considers the use of Device-to-Device (D2D) communication to offload the remote radio heads (RRHs) located in the coverage region of high power nodes (HPNs). We propose an H-CRAN with non-uniformly deployed D2D communication, in which D2D links are utilized only outside a specified distance from any HPN. Based on the analytical framework provided in this work, the coverage and the average ergodic rate of a typical user equipment (UE) are characterized. By defining the exclusion area appropriately, the proposed non-uniform D2D deployment achieves a performance improvement over uniform D2D deployment. In addition, to account for the capacity constraint of the fronthaul, we characterize the average traffic delivery latency experienced by a typical UE when served by RRHs as a quality-of-service metric. Our results show that in the lower fronthaul capacity regime, the proposed non-uniform D2D deployment achieves lower average traffic delivery latency than both the uniform D2D deployment and the pure H-CRAN scenarios.
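
The exclusion-zone rule at the heart of the non-uniform deployment reduces to a simple distance test, sketched below; the radius value and helper names are hypothetical.

```python
# Sketch of the non-uniform D2D rule described above: a D2D link is used
# only if the UE lies outside an exclusion radius around every HPN;
# otherwise it is served by the cellular infrastructure. Values are
# hypothetical, for illustration only.
import math

def use_d2d(ue_xy, hpn_positions, exclusion_radius=200.0):
    dist = min(math.dist(ue_xy, hpn) for hpn in hpn_positions)
    return dist > exclusion_radius

hpns = [(0.0, 0.0), (1000.0, 0.0)]
print(use_d2d((450.0, 300.0), hpns))   # True: outside both exclusion areas
print(use_d2d((50.0, 10.0), hpns))     # False: inside an HPN exclusion area
```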


Regional Glacier Mapping Using Optical Satellite Data Time Series

Abstract

The first of two Sentinel-2 satellites, launched in mid-2015, has characteristics similar to those of the Landsat TM/ETM+/OLI satellites. Together, these satellites will produce a tremendous quantity of optical images worldwide for glacier mapping, with increasing temporal coverage toward the more glacierized higher latitudes due to the convergence of near-polar orbits. To exploit the potential of such near-future dense time series, methods for mapping glaciers from space should be revisited. Currently, snow and ice are typically classified from an optical satellite image using a multispectral band ratio. Mapping conditions (e.g., snow, ice, and clouds) vary from scene to scene and are not equally optimal over an entire scene. The increasing number of images makes it difficult to manually select the best scene for glacier mapping, as is current practice. This work builds on the robust band ratio method to exploit the dense temporal image coverage. Four application scenarios using time series of Landsat-type data for glacier mapping are presented. First, we synthesize an optimal band ratio image from a stack of images within one season to compensate for regional differences. The second application scenario introduces robust methods to improve automatic glacier mapping by exploiting the seasonal variation in the spectral properties of snow. Third, we explore the spatio-temporal variation of glacier surface types. Finally, we show how the synthesized band ratio images from the first application scenario can be used for automatic glacier change detection. In summary, we explore automatic algorithms for glacier mapping applications that exploit the temporal signatures in satellite data time series.
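
A hedged sketch of the first application scenario: compute a red/SWIR band ratio per scene, then composite a seasonal stack by taking the per-pixel maximum so that each pixel keeps its best mapping conditions. The threshold of 2.0 is a typical literature value, and the max-compositing rule is an assumption for illustration, not necessarily the paper's synthesis rule.

```python
# Band-ratio glacier mapping over a seasonal stack of scenes.
# The threshold and max-compositing rule are illustrative assumptions.
import numpy as np

def band_ratio(red, swir, eps=1e-6):
    return red / (swir + eps)

def synthesize_ratio(stack_red, stack_swir):
    # stack_*: arrays of shape (time, rows, cols) for one season;
    # per-pixel maximum keeps each pixel's best mapping conditions.
    ratios = band_ratio(stack_red, stack_swir)
    return ratios.max(axis=0)

rng = np.random.default_rng(1)
red = rng.uniform(0.1, 0.9, (4, 64, 64))
swir = rng.uniform(0.05, 0.5, (4, 64, 64))
glacier_mask = synthesize_ratio(red, swir) > 2.0
print(glacier_mask.mean())   # fraction of pixels mapped as snow/ice
```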


Scalable & Reliable IoT Enabled By Dynamic Spectrum Management for M2M in LTE-A

Abstract

To underpin the predicted growth of the Internet of Things (IoT), a highly scalable, reliable and available connectivity technology will be required. Whilst numerous technologies are available today, the industry trend suggests that cellular systems will play a central role in ensuring IoT connectivity globally. With spectrum generally a bottleneck for 3GPP technologies, TV white space (TVWS) approaches are a very promising means to handle the billions of connected devices in a highly flexible, reliable and scalable way. To this end, we propose a cognitive-radio-enabled TD-LTE test-bed to realize dynamic spectrum management over TVWS. In order to reduce the data acquisition burden and improve detection performance, we propose a hybrid framework for the dynamic spectrum management of machine-to-machine networks. In the proposed framework, compressed sensing is implemented with the aim of reducing the sampling rates for wideband spectrum sensing. A non-iterative reweighted compressive spectrum sensing algorithm is proposed, with the weights constructed from geolocation database data. Finally, the proposed hybrid framework is tested by means of simulated as well as real-world data.


System robust optimization of ring resonator-based optical filters

Abstract

Fabrication variations can have a detrimental effect on the performance of optical filters based on ring resonators. However, by using robust optimization these effects can be minimized and device yield can be significantly improved. This paper presents an efficient robust optimization technique for designing manufacturable optical filters based on serial ring resonators. The serial ring resonator is treated as a system with a computationally expensive component (the directional coupler section) and a cheap one (the ring section). Cheap mathematical models of the directional coupler sections in the resonators are constructed. The approximate system response based on the cheap model is then robustly optimized. The robust bandpass filter performance is compared against designs that do not take uncertainties into account. The optimality of the robust solutions is confirmed by simulating them on the expensive physical model as a post-processing step. Results indicate that the employed approach provides an efficient means for robust optimization of ring resonator-based optical filters.


Enabling the IoT Machine Age with 5G: Machine-Type Multicast Services for Innovative Real-Time Applications

Abstract

The Internet of Things (IoT) will shortly undergo a major transformation from a sensor-driven paradigm to one that is heavily complemented by actuators, drones and robots. The real-time situational awareness of such active systems requires sensed data to be transmitted in the uplink to the edge cloud and processed, and control instructions to be transmitted in the downlink. Since many of these applications will be mission critical, the most suitable connectivity family will be cellular, due to the availability of licensed spectrum able to protect the offered communications service. However, while much focus in the past was on the uplink of machine-type communications (MTC), little attention has been paid to the end-to-end reliability, latency and energy consumption comprising both uplink and downlink. To address this gap, in this paper we focus on the definition, design and analysis of the machine-type multicast service (MtMS). We discuss the different procedures that need to be re-designed for MtMS and derive the most appropriate design drivers by analyzing performance indicators such as scalability, reliability, latency and energy consumption. We also discuss the open issues to be considered in future research aimed at enhancing the capabilities of MtMS to support a wide variety of 5G IoT use cases.


Internet of Things for Smart Cities

Abstract

The Internet of Things (IoT) shall be able to incorporate transparently and seamlessly a large number of different and heterogeneous end systems, while providing open access to selected subsets of data for the development of a plethora of digital services. Building a general architecture for the IoT is hence a very complex task, mainly because of the extremely large variety of devices, link layer technologies, and services that may be involved in such a system. In this paper, we focus specifically on an urban IoT system that, while still being quite a broad category, is characterized by its specific application domain. Urban IoTs, in fact, are designed to support the Smart City vision, which aims at exploiting the most advanced communication technologies to support added-value services for the administration of the city and for the citizens. This paper hence provides a comprehensive survey of the enabling technologies, protocols, and architecture for an urban IoT. Furthermore, the paper presents and discusses the technical solutions and best-practice guidelines adopted in the Padova Smart City project, a proof-of-concept deployment of an IoT island in the city of Padova, Italy, performed in collaboration with the city municipality.


Pattern Based Sequence Classification

Abstract

Sequence classification is an important task in data mining. We address the problem of sequence classification using rules composed of interesting patterns found in a dataset of labelled sequences and the accompanying class labels. We measure the interestingness of a pattern in a given class of sequences by combining the cohesion and the support of the pattern. We use the discovered patterns to generate confident classification rules, and present two different ways of building a classifier. The first classifier is based on an improved version of the existing method of classification based on association rules, while the second ranks the rules by first measuring their value specific to the new data object. Experimental results show that our rule-based classifiers outperform existing comparable classifiers in terms of accuracy and stability. Additionally, we test a number of pattern-feature-based models that use different kinds of patterns as features to represent each sequence as a feature vector. We then apply a variety of machine learning algorithms for sequence classification, experimentally demonstrating that the patterns we discover represent the sequences well and prove effective for the classification task.
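
The interestingness measure can be sketched concretely. The code below combines support (the fraction of sequences containing the pattern) with a common form of cohesion (pattern length divided by the minimal window containing the pattern); the paper's exact definitions may differ.

```python
# Sketch of an interestingness measure combining support and cohesion.
# Uses pattern-length / minimal-window-length as a common cohesion form;
# the paper's exact definitions may differ.
from itertools import product

def min_window(sequence, pattern):
    """Length of the shortest window of `sequence` containing all items of
    `pattern` (in any order); None if the pattern does not occur."""
    positions = [[i for i, x in enumerate(sequence) if x == item]
                 for item in pattern]
    if any(not p for p in positions):
        return None
    return min(max(combo) - min(combo) + 1 for combo in product(*positions))

def interestingness(sequences, pattern):
    windows = [min_window(s, pattern) for s in sequences]
    hits = [w for w in windows if w is not None]
    if not hits:
        return 0.0
    support = len(hits) / len(sequences)
    cohesion = sum(len(pattern) / w for w in hits) / len(hits)
    return support * cohesion

seqs = [list("abcab"), list("axxxb"), list("ccc")]
print(interestingness(seqs, ("a", "b")))
```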


Public Integrity Auditing for Shared Dynamic Cloud Data with Group User Revocation

Abstract

The advent of cloud computing has made storage outsourcing a rising trend, which in turn has made secure remote data auditing a hot topic in the research literature. Recently, some research has considered the problem of secure and efficient public data integrity auditing for shared dynamic data. However, these schemes are still not secure against collusion between the cloud storage server and revoked group users during user revocation in a practical cloud storage system. In this paper, we identify the collusion attack in the existing scheme and provide an efficient public integrity auditing scheme with secure group user revocation based on vector commitment and verifier-local revocation group signatures. We design a concrete scheme based on our scheme definition. Our scheme supports public checking and efficient user revocation, and also provides desirable properties such as confidentiality, efficiency, countability, and traceability of secure group user revocation. Finally, the security and experimental analysis show that, compared with relevant schemes, our scheme is both secure and efficient.


Predicting Instructor Performance Using Data Mining Techniques in Higher Education

Abstract

Data mining applications are becoming an increasingly common tool for understanding and solving educational and administrative problems in higher education. Generally, research in educational mining focuses on modeling students' performance rather than instructors' performance. One of the common tools for evaluating instructors' performance is the course evaluation questionnaire, which is based on students' perceptions. In this study, four different classification techniques (decision tree algorithms, support vector machines, artificial neural networks, and discriminant analysis) are used to build classifier models. Their performances are compared over a dataset composed of students' responses to a real course evaluation questionnaire, using accuracy, precision, recall, and specificity as performance metrics. Although all the classifier models show comparably high classification performance, the C5.0 classifier is the best with respect to accuracy, precision, and specificity. In addition, variable importance is analyzed for each classifier model. Accordingly, it is shown that many of the questions in the course evaluation questionnaire appear to be irrelevant. Furthermore, the analysis shows that instructors' success, as perceived by students, depends mainly on the students' interest in the course. The findings of the study indicate the effectiveness and expressiveness of data mining models in course evaluation and higher education mining. Moreover, these findings may be used to improve measurement instruments.
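
A hedged sketch of such a comparison using scikit-learn stand-ins is shown below; a CART decision tree stands in for C5.0 (which scikit-learn does not ship), and the data is synthetic, so all names and sizes are hypothetical.

```python
# Comparing the four classifier families named above on synthetic
# questionnaire-like data. CART stands in for C5.0; all data is synthetic.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=26, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "neural net": MLPClassifier(max_iter=1000, random_state=0),
    "discriminant": LinearDiscriminantAnalysis(),
}
for name, model in models.items():
    pred = model.fit(Xtr, ytr).predict(Xte)
    print(name, accuracy_score(yte, pred),
          precision_score(yte, pred), recall_score(yte, pred))
```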


On the Properties of Non-Media Digital Watermarking: A Review of State-of-the-Art Techniques

Abstract

Over the last 25 years, there has been much work on multimedia digital watermarking. In this domain, the primary limitation on watermark strength has been its visibility. For multimedia watermarks, invisibility is defined in human terms (that is, in terms of human sensory limitations). In this paper, we review recent developments in the non-media applications of data watermarking, which have emerged over the last decade as an exciting new subdomain. Since, by definition, the intended receiver should be able to detect the watermark, we have to redefine invisibility in an acceptable way that is often application-specific and thus cannot be easily generalized. In particular, this is true when the data is not intended to be directly consumed by humans. For example, a loose definition of robustness might be in terms of the resilience of a watermark against normal host data operations, and of invisibility as the resilience of the data interpretation against changes introduced by the watermark. In this paper, we classify the data in terms of data mining rules on complex data types such as time series, symbolic sequences, data streams, and so forth. We emphasize the challenges involved in non-media watermarking in terms of common watermarking properties, including invisibility, capacity, robustness, and security. With the aid of a few examples of watermarking applications, we demonstrate these distinctions and look at the latest research in this regard to make our argument clearer and more meaningful. Finally, we look at the new challenges of digital watermarking that have arisen with the evolution of big data.


Distributed Coverage Control of Mobile Sensor Networks subject to Measurement Error

Abstract

Deployment algorithms proposed to improve coverage in sensor networks often rely on the Voronoi diagram, which is obtained using the position information of the sensors. It is usually assumed that all measurements are sufficiently accurate, while in a practical setting even a small measurement error may lead to significant degradation in coverage performance. This paper investigates the effect of measurement error on the performance of coverage control in mobile sensor networks. It also presents a distributed deployment strategy, namely the Robust Max-Area strategy, which uses information on error bounds to move the sensors to appropriate locations. To this end, two polygons are obtained for each sensor, and it is shown that the exact Voronoi polygon (associated with accurate measurements) lies between them. A local spatial probability function is then derived for each sensor, which translates the available information about the error bound into the likelihood of a point being inside the exact Voronoi polygon. Subsequently, the deployment strategy positions each sensor such that the total covered area increases. The sensors' movements are shown to be convergent under the proposed strategy.


Blockchains and Smart Contracts for the Internet of Things

Abstract

Motivated by the recent explosion of interest around blockchains, we examine whether they make a good fit for the IoT sector. Blockchains allow us to have a distributed peer-to-peer network where non-trusting members can interact with each other, without a trusted intermediary, in a verifiable manner. We review how this mechanism works and also look into smart contracts, scripts that reside on the blockchain and allow for the automation of multi-step processes. We then move into the IoT domain and describe how a blockchain-IoT combination (a) facilitates the sharing of services and resources, leading to the creation of a marketplace of services between devices, and (b) allows us to automate, in a cryptographically verifiable manner, several existing time-consuming workflows. We also point out certain issues that should be considered before the deployment of a blockchain network in an IoT setting, from transactional privacy to the expected value of the digitized assets traded on the network. Wherever applicable, we identify solutions and workarounds. Our conclusion is that the blockchain-IoT combination is powerful and can cause significant transformations across several industries, paving the way for new business models and novel, distributed applications.


Social Set Analysis: A Set Theoretical Approach to Big Data Analytics

Abstract

Current analytical approaches in Computational Social Science can be characterized by four dominant paradigms: text analysis (information extraction and classification), social network analysis (graph theory), social complexity analysis (complex systems science), and social simulations (cellular automata and agent-based modelling). However, when it comes to organizational and societal units of analysis, there exists no approach to conceptualise, model, analyze, explain and predict social media interactions as individuals' associations with ideas, values, identities, and so on. To address this limitation, based on the sociology of associations and the mathematics of set theory, this paper presents a new approach to big data analytics called Social Set Analysis. Social Set Analysis consists of a generative framework for the philosophies of computational social science, a theory of social data, conceptual and formal models of social data, and an analytical framework for combining big social datasets with organisational and societal datasets. Three empirical studies of big social data are presented to illustrate and demonstrate Social Set Analysis in terms of fuzzy set-theoretical sentiment analysis, crisp set-theoretical interaction analysis, and event-study-oriented set-theoretical visualisations. Implications for big data analytics, current limitations of the set-theoretical approach, and future directions are outlined.


Guest Editorial Big Data Analytics: Risk and Operations Management for Industrial Applications

Abstract

BIG data research has been a popular research topic over the last few years. It has not only generated enormous attention compared to other research trends in the past, but also covers diverse and wide-ranging disciplines in its applications. This probably explains why the growth of this community is unparalleled, and why the attention drawn to this research “buzzword” is growing at an explosive pace. It is not simply business jargon! This assertion is supported by Table 1, which summarizes the number of publications in recent years. The data were obtained by searching for the term “big data” via three common scholarly databases. The search was simple and no screening was conducted, but the numbers are representative and impressive.


MiraMap: A We-government Tool for Smart Peripheries in Smart Cities

Abstract

Increasingly over the last decade, there have been attention and expectations regarding the role that ICT solutions can play in increasing accountability, participation and transparency in the Public Administration. Attention to citizen participation is also more and more at the center of the debate about Smart Cities. However, technological solutions have often been proposed without first considering citizens' needs and the sociotechnical misalignments within the city, e.g., in peripheral areas. This paper outlines the design and implementation process of a we-government IT tool, called MiraMap. The project has been developed in the Mirafiori District in Torino (Italy), a neighbourhood characterized by problems of marginality and by several ongoing urban transformations with very high potential for social and economic development in the next few years. This makes Mirafiori Sud a valuable case-study environment in which to experiment with new methods and IT solutions to strengthen the connection between citizens and the public administration. The objective of MiraMap is to facilitate communication and management between citizens and the administration, both in reporting issues and claims and in submitting proposals. Collecting and handling this information in an efficient way is crucial to improving the quality of life in urban suburbs and to delivering more targeted and better-performing public policies. To achieve these results, the authors combined First Life, a new local social network based on an interactive map, with a Business Process Management (BPM) system to ease the handling of reports about claims and proposals. The research process involves an interdisciplinary team composed of architects, computer scientists, engineers, geographers and legal experts, with the direct participation of local administrators and citizens.


Fuzzy based Bilateral Control Design of Non-Linear Tele-Operating system using State Convergence method

Abstract

This paper presents the design of a state convergence (SC) based bilateral controller for a nonlinear teleoperation system which has been approximated by a Takagi-Sugeno (TS) fuzzy model. SC is selected because of the advantages this scheme offers in both the modeling and control design stages. The modeling stage considers master/slave systems which can be represented by nth-order differential equations, while the control design stage offers an easy way to determine the control gains required to assign the desired closed-loop dynamics to the teleoperation system. After the master/slave systems are represented by TS fuzzy models, a stabilizing fuzzy law is adopted which allows deploying the SC scheme, with all its benefits, to design the fuzzy bilateral controller. In this way, not only is the simplicity of the design scheme ensured, but the existing SC scheme is also able to control a nonlinear teleoperation system based on its TS fuzzy model description. As an additional advantage, the existing SC-based linear bilateral controller can be easily derived from the proposed SC-based fuzzy bilateral controller. Various cases of master/slave systems originally reported in terms of their linear model representations, communicating in the absence or presence of time delay, are all discussed in the corresponding fuzzy framework. MATLAB simulations considering a one-degree-of-freedom (DoF) teleoperation system are performed to validate the proposed methodology for controlling a nonlinear teleoperation system.


Privacy protected Facial Biometric Verification using Fuzzy Forest Learning

Abstract

Although visual surveillance has emerged as an effective technology for public security, privacy has become an issue of great concern in the transmission and distribution of surveillance videos. For example, personal facial images should not be browsed without permission. To cope with this issue, face image scrambling has emerged as a simple solution for privacy-related applications. Consequently, online facial biometric verification needs to be carried out in the scrambled domain, bringing a new challenge to face classification. In this paper, we investigate face verification in the scrambled domain and propose a novel scheme to handle this challenge. In our proposed method, to make feature extraction from scrambled face images robust, a biased random subspace sampling scheme is applied to construct fuzzy decision trees from randomly selected features, and a fuzzy forest decision is then obtained by combining all fuzzy tree decisions using fuzzy memberships. In our experiments, we first estimated the optimal parameters for the construction of the random forest, and then applied the optimized model to benchmark tests using three publicly available face datasets. The experimental results validate that our proposed scheme can robustly cope with challenging tests in the scrambled domain and achieves improved accuracy over all tests, making our method a promising candidate for emerging privacy-related facial biometric applications.

A New MI-based Visualization Aided Validation Index for Mining Big Longitudinal Web Trial Data

Abstract

Web-delivered clinical trials generate big, complex data. To help untangle the heterogeneity of treatment effects, unsupervised learning methods have been widely applied. However, identifying valid patterns is a priority yet challenging issue for these methods. This paper, built upon our previous research on Multiple Imputation (MI) based fuzzy clustering and validation, proposes a new MI-based visualization-aided validation index (MIVOOS) to determine the optimal number of clusters for big incomplete longitudinal web-trial data with inflated zeros. Different from a recently developed fuzzy clustering validation index (VOS), MIVOOS uses overlap and separation measures more suitable for web-trial data and does not depend on the choice of fuzzifier, unlike the widely used Xie and Beni (XB) index. By optimizing the view angles of 3D projections using Sammon mapping, the optimal 2D-projection-guided MIVOOS is obtained to better visualize and verify the patterns in conjunction with trajectory patterns. Compared to XB and VOS, our newly proposed MIVOOS shows its robustness in validating big web-trial data under different missing-data mechanisms, using real and simulated web-trial data.


Learning Proximity Relations for Feature Selection

Abstract

This work presents a feature selection method based on learning proximity relations. Each single feature is treated as a binary classifier that predicts, for any three objects X, A, and B, whether X is closer to A or to B. The performance of this classifier is a direct measure of feature quality. Any linear combination of feature-based binary classifiers naturally corresponds to feature selection. Thus, the feature selection problem is transformed into an ensemble learning problem of combining many weak classifiers into an optimized strong classifier. We provide a theoretical analysis of the generalization error of our proposed method, which validates its effectiveness. Various experiments are conducted on synthetic data, four UCI data sets, and twelve microarray data sets, and demonstrate the success of our approach when applied to feature selection. A weakness of our algorithm is its high time complexity.
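
The triple-based view of a feature as a weak classifier can be made concrete with a short sketch: the accuracy of a single feature over labelled (X, A, B) triples is its quality score. Names and data here are illustrative.

```python
# Sketch of the triple-based feature quality measure: a single feature f
# "classifies" a triple (X, A, B) by predicting that X is closer to
# whichever of A, B is nearer in that feature alone. Data is illustrative.
import numpy as np

def feature_quality(feature_values, triples):
    """feature_values: 1-D array, one value per object.
    triples: list of (x, a, b, label) index tuples, label=1 if X is
    (ground-truth) closer to A, else 0."""
    f = feature_values
    correct = 0
    for x, a, b, label in triples:
        pred = 1 if abs(f[x] - f[a]) < abs(f[x] - f[b]) else 0
        correct += (pred == label)
    return correct / len(triples)

f = np.array([0.1, 0.2, 0.9, 0.15])
triples = [(0, 1, 2, 1), (3, 2, 0, 0), (1, 0, 2, 1)]
print(feature_quality(f, triples))
```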


A Survey on Evolutionary Computation Approaches to Feature Selection

Abstract

Feature selection is an important task in data mining and machine learning, used to reduce the dimensionality of the data and increase the performance of an algorithm, such as a classification algorithm. However, feature selection is a challenging task, due mainly to the large search space. A variety of methods have been applied to solve feature selection problems, among which evolutionary computation techniques have recently gained much attention and shown some success. However, there are no comprehensive guidelines on the strengths and weaknesses of alternative approaches. This leads to a disjointed and fragmented field with, ultimately, lost opportunities for improving performance and successful applications. This paper presents a comprehensive survey of the state-of-the-art work on evolutionary computation for feature selection, identifying the contributions of the different algorithms. In addition, current issues and challenges are discussed to identify promising areas for future research.


Tracking Temporal community strength in dynamic networks

Abstract

Community formation analysis of dynamic networks has been a hot topic in data mining. Recently, many studies have focused on discovering communities successively from consecutive snapshots by considering both current and historical information. However, these methods cannot provide much historical or successive information related to the detected communities. Different from previous studies that focus on community detection in dynamic networks, we define a new problem of tracking the progression of community strength, a novel measure that reflects community robustness and coherence throughout the entire observation period. To achieve this goal, we propose a novel framework that formulates the problem as an optimization task. The proposed community strength analysis also provides a foundation for a wide variety of related applications, such as discovering how the strength of each detected community changes over the entire observation period. To demonstrate that the proposed method provides precise and meaningful evolutionary patterns of communities that are not directly obtainable from traditional methods, we perform extensive experimental studies on one synthetic and five real datasets: social evolution, tweeting interaction, actor relationship, bibliography, and biological datasets. Experimental results show that the proposed approach is highly effective in discovering the progression of community strengths and detecting interesting communities.


Conflict-Aware Weighted Bipartite B-Matching and its Application to E-Commerce

Abstract

The weighted bipartite b-matching problem (WBM) plays a significant role in many real-world applications, including resource allocation, scheduling, Internet advertising, and e-commerce. WBM has been widely studied, and efficient matching algorithms are well known. In this work, we study a novel variant of WBM, called conflict-aware WBM (CA-WBM), where conflict constraints are present between vertices of the bipartite graph. In CA-WBM, if two vertices (on the same side) are in conflict, they may not be included in the matching result simultaneously. We present a generalized formulation of CA-WBM in the context of e-commerce, where diverse matching results are often desired (e.g., movies of different genres and merchants selling products of different categories). While WBM is efficiently solvable in polynomial time, we show that CA-WBM is NP-hard. We propose approximate and randomized algorithms to solve CA-WBM and show that they achieve close-to-optimal solutions via comprehensive experiments using synthetic datasets. We derive a theoretical bound on the approximation ratio of a greedy algorithm for CA-WBM and show that it is scalable on a large-scale real-world dataset.
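
A greedy heuristic for CA-WBM, in the spirit of the greedy algorithm analyzed in the paper (whose exact details and bound may differ), is easy to sketch: scan edges in descending weight order and accept an edge only if capacities permit and no conflict with an already-selected vertex arises.

```python
# Greedy sketch for conflict-aware weighted bipartite b-matching.
# This illustrates the idea only; the paper's algorithm and its
# approximation bound may differ in detail.

def greedy_ca_wbm(edges, b_left, b_right, conflicts):
    """edges: list of (u, v, weight); b_left/b_right: capacity per vertex;
    conflicts: set of frozenset({x, y}) pairs that cannot co-occur."""
    load_l, load_r, chosen, used = {}, {}, [], set()
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if load_l.get(u, 0) >= b_left[u] or load_r.get(v, 0) >= b_right[v]:
            continue  # a capacity is exhausted
        if any(frozenset({u, x}) in conflicts or frozenset({v, x}) in conflicts
               for x in used):
            continue  # would violate a conflict constraint
        chosen.append((u, v, w))
        used.update({u, v})
        load_l[u] = load_l.get(u, 0) + 1
        load_r[v] = load_r.get(v, 0) + 1
    return chosen

edges = [("u1", "v1", 9), ("u2", "v1", 8), ("u2", "v2", 5)]
print(greedy_ca_wbm(edges, {"u1": 1, "u2": 1}, {"v1": 2, "v2": 1},
                    {frozenset({"u1", "u2"})}))
```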


A Bayesian classification approach using class-specific features for text categorization

Abstract

In this paper, we present a Bayesian classification approach for automatic text categorization using class-specific features. Unlike conventional approaches to text categorization, our proposed method selects a specific feature subset for each class. To apply these class-dependent features for classification, we follow Baggenstoss's PDF Projection Theorem to reconstruct PDFs in the raw data space from the class-specific PDFs in the low-dimensional feature space, and build a Bayes classification rule. One notable advantage of our approach is that most feature selection criteria, such as Information Gain (IG) and Maximum Discrimination (MD), can be easily incorporated into it. We evaluate our method's classification performance on several real-world benchmark data sets, in comparison with state-of-the-art feature selection approaches. The superior results demonstrate the effectiveness of the proposed approach and further indicate its wide potential applications in text categorization.


A Novel recommendation model regularized with user trust and item ratings

Abstract

We propose TrustSVD, a trust-based matrix factorization technique for recommendation. TrustSVD integrates multiple information sources into the recommendation model in order to reduce the data sparsity and cold-start problems and their degradation of recommendation performance. An analysis of social trust data from four real-world data sets suggests that not only the explicit but also the implicit influence of both ratings and trust should be taken into consideration in a recommendation model. TrustSVD therefore builds on top of a state-of-the-art recommendation algorithm, SVD++ (which uses the explicit and implicit influence of rated items), by further incorporating both the explicit and implicit influence of trusted and trusting users in the prediction of items for an active user. The proposed technique is the first to extend SVD++ with social trust information. Experimental results on the four data sets demonstrate that TrustSVD achieves better accuracy than ten other counterpart recommendation techniques.
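
The shape of the prediction rule can be sketched as follows: an SVD++-style estimate extended with the latent factors of trusted users. The normalizations mirror SVD++; the exact regularization and training procedure of TrustSVD are not reproduced here.

```python
# SVD++-style prediction extended with trusted users' latent factors,
# in the spirit of TrustSVD; training and regularization are omitted.
import numpy as np

def predict(mu, b_u, b_i, q_i, p_u, Y_rated, W_trusted):
    """mu: global mean; b_u, b_i: user/item biases; q_i, p_u: latent vectors;
    Y_rated: latent vectors of items the user rated (implicit feedback);
    W_trusted: latent vectors of users the active user trusts."""
    implicit = Y_rated.sum(axis=0) / np.sqrt(len(Y_rated)) if len(Y_rated) else 0.0
    trust = W_trusted.sum(axis=0) / np.sqrt(len(W_trusted)) if len(W_trusted) else 0.0
    return mu + b_u + b_i + q_i @ (p_u + implicit + trust)

rng = np.random.default_rng(2)
k = 8  # latent dimensionality (hypothetical)
print(predict(3.5, 0.1, -0.2,
              rng.standard_normal(k), rng.standard_normal(k),
              rng.standard_normal((5, k)), rng.standard_normal((3, k))))
```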


Large Margin Distribution learning with cost interval and unlabeled data

Abstract

In many real-world applications, different types of misclassification incur different costs, but the exact cost is often hard to determine, and usually one can only obtain an interval estimate, e.g., that one type of mistake is about five to ten times more serious than the other. On the other hand, there is usually abundant unlabeled data available, which has led to great research effort on semi-supervised learning. Notably, cost intervals and unlabeled data usually appear simultaneously in practical tasks; however, studies tackling them together are rare. In this paper, we propose the cisLDM approach, which is able to handle cost intervals and exploit unlabeled data in a principled way. Rather than maximizing the minimum margin like traditional large margin classifiers, cisLDM tries to optimize the margin distribution on both labeled and unlabeled data while minimizing the worst-case total cost and the mean total cost simultaneously according to the cost interval. Experiments on a broad range of datasets and cost settings exhibit the impressive performance of cisLDM. In particular, cisLDM is able to reduce 47% more total cost than a standard SVM and 27% more total cost than a cost-sensitive semi-supervised SVM that assumes the true cost value is known in advance.


Aspect level influence Discovery from Graphs

Abstract

Graphs have been widely used to represent objects and object connections in applications such as the Web, social networks, and citation networks. Mining influence relationships from graphs has gained increasing interest in recent years, because information on how graph objects influence each other can facilitate graph exploration, graph search, and connection recommendation. In this paper, we study the problem of detecting the influence aspects on which objects are connected and the influence degree (or influence strength) with which one graph node influences another on a given aspect. Existing techniques focus on inferring either the overall influence degrees or the influence types from graphs. In this paper, we propose a systematic approach to extract influence aspects and learn aspect-level influence strength. In particular, we first present a novel instance-merging-based method to extract influence aspects from the context of object connections. We then introduce two generative models, the Observed Aspect Influence Model (OAIM) and the Latent Aspect Influence Model (LAIM), to model the topological structure of graphs, the text content associated with graph objects, and the context in which the objects are connected. To learn OAIM and LAIM, we design both non-parallel and parallel Gibbs sampling algorithms. We conduct extensive experiments on synthetic and real data sets to show the effectiveness and efficiency of our methods. The experimental results show that our models can discover more effective results than existing approaches. Our learning algorithms also scale well on large data sets.


K subspaces Quantization for approximate Nearest Neighbor Search

Abstract

       Approximate Nearest Neighbor (ANN) search has become a popular approach for performing fast and efficient retrieval on very large-scale datasets in recent years, as the size and dimension of data grow continuously. In this paper, we propose a novel vector quantization method for ANN search which enables faster and more accurate retrieval on publicly available datasets. We define vector quantization as a multiple affine subspace learning problem and explore the quantization centroids on multiple affine subspaces. We propose an iterative approach to minimize the quantization error in order to create a novel quantization scheme, which outperforms the state-of-the-art algorithms. The computational cost of our method is also comparable to that of the competing methods.


Cluster driven navigation of the query space

Abstract

      How can users who know neither programming nor statistics explore large databases? We present a novel interface, designed to guide explorers through their data: Blaeu. Blaeu is a database front-end, “boosted” with unsupervised learning primitives. Thanks to these primitives, it can summarize and recommend queries. Our first contribution is Blaeu’s interaction model. With Blaeu, users explore the data through data maps. A data map is an interactive set of clusters, which users navigate with zooms and projections. Our second contribution is Blaeu’s engine. We present three mapping algorithms, for three different settings. The first algorithm deals with small to medium databases, the second one targets high dimensional spaces and the last one focuses on speed and interaction. We then present an optimization strategy based on sampling. Our experiments reveal that Blaeu can cluster millions of tuples with hundreds of columns in a few seconds on commodity hardware.


Inference of Regular expressions for text extraction from examples

Abstract

      A large class of entity extraction tasks from text that is either semistructured or fully unstructured may be addressed by regular expressions, because in many practical cases the relevant entities follow an underlying syntactical pattern and this pattern may be described by a regular expression. In this work we consider the long-standing problem of synthesizing such expressions automatically, based solely on examples of the desired behavior. We present the design and implementation of a system capable of addressing extraction tasks of realistic complexity. Our system is based on an evolutionary procedure carefully tailored to the specific needs of regular expression generation by examples. The procedure executes a search driven by a multiobjective optimization strategy aimed at simultaneously improving multiple performance indexes of candidate solutions while at the same time ensuring an adequate exploration of the huge solution space. We assess our proposal experimentally in great depth, on a number of challenging datasets. The accuracy of the obtained solutions seems to be adequate for practical usage and improves over earlier proposals significantly. Most importantly, our results are highly competitive even with respect to human operators. A prototype is available as a web application at http://regex.inginf.units.it.
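
At the core of any example-driven regex synthesizer is the evaluation of candidates against the provided examples. The sketch below scores a candidate regex by extraction accuracy and expression length, the kind of objectives a multiobjective search would trade off; the scoring details are illustrative, not the paper's.

```python
# Scoring candidate regexes on (text, desired extraction) examples, as a
# multiobjective fitness would. Details are illustrative, not the paper's.
import re

def fitness(pattern, examples):
    """examples: list of (text, wanted) where `wanted` is the substring
    that should be extracted. Returns (accuracy, length penalty)."""
    try:
        rx = re.compile(pattern)
    except re.error:
        return 0.0, float("inf")   # invalid candidates are worst
    hits = sum(1 for text, wanted in examples
               if (m := rx.search(text)) and m.group(0) == wanted)
    return hits / len(examples), len(pattern)

examples = [("order #4821 shipped", "4821"), ("ref 77 ok", "77")]
for candidate in [r"\d+", r"#\d+", r"[0-9]{4}"]:
    print(candidate, fitness(candidate, examples))
```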


Exploit Every Bit: Efficient caching for high dimensional Nearest Neighbor Search

Abstract

High-dimensional k-nearest-neighbor (kNN) search has a wide range of applications in multimedia information retrieval. Existing disk-based kNN search methods incur significant I/O costs in the candidate refinement phase. In this paper, we propose to cache compact approximate representations of data points in main memory in order to reduce the candidate refinement time during kNN search. This problem raises two challenging issues: (i) which is the most effective encoding scheme for data points to support kNN search? and (ii) what is the optimal number of bits for encoding a data point? For (i), we formulate and solve a novel histogram optimization problem that decides the most effective encoding scheme. For (ii), we develop a cost model for automatically tuning the optimal number of bits for encoding points. In addition, our approach is generic and applicable to both exact and approximate kNN search methods. Extensive experimental results on real datasets demonstrate that our proposal can accelerate the candidate refinement time of kNN search by at least an order of magnitude.
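
A hedged sketch of the cache-then-refine idea: keep a coarse per-dimension scalar quantization of every point in memory, rank candidates by approximate distance, and refine only the best few with exact vectors. The uniform encoder and fixed bit budget below are assumptions for illustration; the paper instead optimizes the encoding via a histogram problem and tunes the bit budget with a cost model.

```python
# Cache-then-refine kNN sketch with a uniform scalar quantizer.
# The encoder and bit budget are illustrative assumptions only.
import numpy as np

def encode(X, bits=4):
    lo, hi = X.min(axis=0), X.max(axis=0)
    levels = 2 ** bits - 1
    codes = np.round((X - lo) / (hi - lo + 1e-12) * levels).astype(np.uint8)
    return codes, lo, (hi - lo + 1e-12) / levels

def approx_knn(query, codes, lo, step, k=10, refine=3, X_exact=None):
    recon = lo + codes * step                           # cached approximation
    order = np.argsort(((recon - query) ** 2).sum(axis=1))[:k]
    if X_exact is None:
        return order[:refine]
    exact = ((X_exact[order] - query) ** 2).sum(axis=1)  # refinement "I/O"
    return order[np.argsort(exact)[:refine]]

rng = np.random.default_rng(3)
X = rng.standard_normal((1000, 32))
codes, lo, step = encode(X)
print(approx_knn(X[0], codes, lo, step, X_exact=X))
```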


Robust joint feature weights learning framework

Abstract

Feature selection, selecting the most informative subset of features, is an important research direction in dimension reduction. The combinatorial search in feature selection is essentially a binary optimization problem, known to be NP-hard, which can be alleviated by learning feature weights. Traditional feature weighting algorithms rely on heuristic search paths. These approaches neglect the interaction and dependency between different features, and thus provide no guarantee of optimality. In this paper, we propose a novel joint feature weights learning framework, which imposes both nonnegativity and ℓ2,1-norm constraints on the feature weights matrix. The nonnegativity property ensures the physical significance of the learned feature weights, while ℓ2,1-norm minimization achieves joint selection of the most relevant features by exploiting the whole feature space. More importantly, an efficient iterative algorithm with proved convergence is designed to optimize a convex objective function. Using this framework as a platform, we propose new supervised and unsupervised joint feature selection methods. In particular, in the proposed unsupervised method, nonnegative graph embedding is developed to exploit the intrinsic structure in the weighted space. Comparative experiments on seven real-world data sets indicate that our framework is both effective and efficient.
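
The two constraints named above can be illustrated directly: the ℓ2,1 norm sums the ℓ2 norms of the rows of the weight matrix (driving whole rows, i.e., features, to zero), and a row-wise group soft-threshold composed with a nonnegative projection gives the flavor of one iterative update. This composition is a heuristic sketch, not the paper's exact algorithm.

```python
# The l2,1 norm of a feature weights matrix and a heuristic row-wise
# proximal step (nonnegative projection, then group soft-threshold).
# This is a sketch of the two constraints, not the paper's algorithm.
import numpy as np

def l21_norm(W):
    return np.linalg.norm(W, axis=1).sum()

def prox_l21_nonneg(W, t):
    """Project onto the nonnegative orthant, then shrink each row's
    l2 norm by step t (rows with norm <= t are zeroed out entirely)."""
    W = np.maximum(W, 0.0)
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / (norms + 1e-12), 0.0)
    return W * scale

W = np.array([[0.5, -0.1], [0.02, 0.01], [1.0, 0.8]])
W_new = prox_l21_nonneg(W, t=0.1)
print(l21_norm(W), l21_norm(W_new))   # the tiny second row is zeroed out
```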


A Game theory inspired approach to stable core decomposition on weighted networks

Abstract

Meso-scale structural analysis, such as core decomposition, has uncovered groups of nodes that play important roles in the underlying complex systems. Existing core decomposition approaches generally focus on node properties such as degree and strength. Such node-centric approaches can capture only limited information about the local neighborhood topology. In the present work, we propose a group-density-based core analysis approach that overcomes the drawbacks of node-centric approaches. The proposed algorithmic approach focuses on the weight density, cohesiveness, and stability of a substructure. The method also assigns a unique score to every node, ranking the nodes by their degree of core-ness. To verify the correctness of the proposed method, we propose a synthetic benchmark with a planted core structure. A performance test on the null model is carried out using a weighted lattice without core structures. We further test the stability of the approach against random noise. The experimental results demonstrate the superiority of our algorithm over the state of the art. We finally analyze the core structures of several popular weighted network models and real-life weighted networks. The experimental results reveal important node rankings and the hierarchical organization of the complex networks, which give us better insight into the underlying systems.


Relaxed Functional Dependencies: A Survey of Approaches

Abstract

Recently, there has been renewed interest in functional dependencies due to the possibility of employing them in several advanced database operations, such as data cleaning, query relaxation, record matching, and so forth. In particular, the constraints defined for canonical functional dependencies have been relaxed to capture inconsistencies in real data, patterns of semantically related data, or semantic relationships in complex data types. In this paper, we survey 35 such relaxed functional dependencies, providing classification criteria, motivating examples, and a systematic analysis of them.


Similarity measure selection for clustering time series databases

Abstract

In the past few years, clustering has become a popular task associated with time series. The choice of a suitable distance measure is crucial to the clustering process and, given the vast number of distance measures for time series available in the literature and their diverse characteristics, this selection is not straightforward. With the objective of simplifying this task, we propose a multi-label classification framework that provides the means to automatically select the most suitable distance measures for clustering a time series database. This classifier is based on a novel collection of characteristics that describe the main features of the time series databases and provide the predictive information necessary to discriminate between a set of distance measures. In order to test the validity of this classifier, we conduct a complete set of experiments using both synthetic and real time series databases and a set of five common distance measures. The positive results obtained by the designed classification framework for various performance measures indicate that the proposed methodology is useful to simplify the process of distance selection in time series clustering tasks.


Top K Dominating queries on incomplete data

Abstract

The top-k dominating (TKD) query returns the k objects that dominate the maximum number of objects in a given dataset. It combines the advantages of skyline and top-k queries, and plays an important role in many decision support applications. Incomplete data exists in a wide spectrum of real datasets due to device failure, privacy preservation, data loss, and so on. In this paper, for the first time, we carry out a systematic study of TKD queries on incomplete data, i.e., data with missing dimensional value(s). We formalize this problem and propose a suite of efficient algorithms for answering TKD queries over incomplete data. Our methods employ several novel techniques, such as upper-bound score pruning, bitmap pruning, and partial score pruning, to boost query efficiency. Extensive experimental evaluation using both real and synthetic datasets demonstrates the effectiveness of our pruning heuristics and the performance of our algorithms.
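
A brute-force illustration of TKD scoring on incomplete data is given below, omitting all of the paper's pruning techniques. One simple convention is adopted for the sketch: compare two objects only on dimensions observed in both, with smaller values taken as better.

```python
# Naive TKD scoring on incomplete data; NaN marks a missing value.
# The observed-in-both comparison convention is an illustrative choice.
import numpy as np

def dominates(p, q):
    both = ~np.isnan(p) & ~np.isnan(q)
    if not both.any():
        return False
    # p dominates q: no worse everywhere, strictly better somewhere
    return (p[both] <= q[both]).all() and (p[both] < q[both]).any()

def top_k_dominating(X, k):
    scores = [sum(dominates(p, q) for j, q in enumerate(X) if j != i)
              for i, p in enumerate(X)]
    return sorted(range(len(X)), key=lambda i: -scores[i])[:k]

X = np.array([[1.0, 2.0, np.nan],
              [2.0, 3.0, 5.0],
              [np.nan, 4.0, 6.0],
              [3.0, np.nan, 4.0]])
print(top_k_dominating(X, k=2))
```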


Efficient Algorithms for Mining Top K high utility itemsets

Abstract

High utility itemset (HUI) mining is an emerging topic in data mining, which refers to discovering all itemsets whose utility meets a user-specified minimum utility threshold min_util. However, setting min_util appropriately is a difficult problem for users. Generally speaking, finding an appropriate minimum utility threshold by trial and error is a tedious process. If min_util is set too low, too many HUIs will be generated, which may make the mining process very inefficient. On the other hand, if min_util is set too high, it is likely that no HUIs will be found. In this paper, we address these issues by proposing a new framework for top-k high utility itemset mining, where k is the desired number of HUIs to be mined. Two efficient algorithms, named TKU (mining Top-K Utility itemsets) and TKO (mining Top-K utility itemsets in One phase), are proposed for mining such itemsets without the need to set min_util. We provide a structural comparison of the two algorithms, with discussion of their advantages and limitations. Empirical evaluations on both real and synthetic datasets show that the performance of the proposed algorithms is close to that of the optimal case of state-of-the-art utility mining algorithms.
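
To make the top-k reformulation concrete, the brute-force sketch below enumerates itemsets and keeps a size-k heap whose minimum acts as the rising internal utility threshold; TKU and TKO achieve the same result with far better pruning. The toy data and per-transaction utilities are hypothetical.

```python
# Brute-force top-k HUI mining without min_util. The size-k heap's
# minimum plays the role of the rising internal threshold; TKU/TKO
# use far better pruning. Toy data; utilities are hypothetical.
import heapq
from itertools import combinations

transactions = [  # each: {item: utility contribution in this transaction}
    {"a": 4, "b": 2, "c": 6},
    {"a": 4, "c": 3, "d": 5},
    {"b": 1, "c": 9},
]

def utility(itemset):
    return sum(sum(t[i] for i in itemset)
               for t in transactions if all(i in t for i in itemset))

def top_k_hui(k=3, max_size=3):
    items = sorted({i for t in transactions for i in t})
    heap = []  # (utility, itemset); heap[0][0] is the rising threshold
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            u = utility(itemset)
            if len(heap) < k:
                heapq.heappush(heap, (u, itemset))
            elif u > heap[0][0]:
                heapq.heapreplace(heap, (u, itemset))
    return sorted(heap, reverse=True)

print(top_k_hui())
```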

