info@itechprosolutions.in | +91 9790176891

PYTHON 2018

An Analytic Gabor Feedforward Network for Single-Sample and Pose-Invariant Face Recognition

Abstract:

Gabor magnitude is known to be among the most discriminative representations for face images due to its space-frequency co-localization property. However, such property causes adverse effects even when the images are acquired under moderate head pose variations. To address this pose sensitivity issue and other moderate imaging variations, we propose an analytic Gabor feedforward network which can absorb such moderate changes. Essentially, the network works directly on the raw face images and produces directionally projected Gabor magnitude features at the hidden layer. Subsequently, several sets of magnitude features obtained from various orientations and scales are fused at the output layer for final classification decision. The network model is analytically trained using a single sample per identity. The obtained solution is globally optimal with respect to the classification total error rate. Our empirical experiments conducted on five face data sets (six subsets) from the public domain show encouraging results in terms of identification accuracy and computational efficiency.
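The hidden layer described above produces Gabor magnitude features. As a rough, self-contained sketch (not the authors' network), the magnitude response of a single Gabor filter on a toy image can be computed as below; the kernel size, wavelength, and sigma are arbitrary illustrative choices:

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex 2-D Gabor kernel returned as (real, imag) nested lists."""
    half = size // 2
    real, imag = [], []
    for y in range(-half, half + 1):
        r_row, i_row = [], []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            env = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            phase = 2 * math.pi * xr / wavelength
            r_row.append(env * math.cos(phase))
            i_row.append(env * math.sin(phase))
        real.append(r_row)
        imag.append(i_row)
    return real, imag

def gabor_magnitude(image, kernel):
    """Valid-mode convolution; returns the magnitude response map."""
    real, imag = kernel
    k = len(real)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            re = im = 0.0
            for u in range(k):
                for v in range(k):
                    re += image[i + u][j + v] * real[u][v]
                    im += image[i + u][j + v] * imag[u][v]
            row.append(math.hypot(re, im))
        out.append(row)
    return out

# Toy 8x8 "face" image with a vertical edge.
img = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
kern = gabor_kernel(size=5, wavelength=4.0, theta=0.0, sigma=2.0)
mag = gabor_magnitude(img, kern)   # 4x4 magnitude map
```

In the paper such magnitude maps from several orientations and scales are projected and fused; here only one filter response is shown.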

Application of data mining methods in diabetes prediction

Abstract:

Data science methods have the potential to benefit other scientific fields by shedding new light on common questions. One such task is helping to make predictions on medical data. Diabetes mellitus, or simply diabetes, is a disease caused by an increased level of blood glucose. Various traditional methods, based on physical and chemical tests, are available for diagnosing diabetes. Methods based on data mining techniques can also be effectively applied to high blood pressure risk prediction. In this paper, we explore the early prediction of diabetes via five different data mining methods: GMM, SVM, logistic regression, ELM, and ANN. The experimental results show that the ANN (Artificial Neural Network) provides higher accuracy than the other techniques.
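As a minimal illustration of one of the five methods, here is a toy logistic-regression predictor trained by plain gradient descent on made-up, normalized glucose readings; the paper's actual dataset and model configurations are not reproduced:

```python
import math

def train_logreg(xs, ys, lr=0.1, epochs=2000):
    """Plain gradient-descent logistic regression: one feature plus bias."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += (p - y)
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    """1 = diabetic, 0 = not, thresholding the sigmoid at 0.5."""
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0

# Toy data: normalized blood-glucose level -> diabetic (1) or not (0).
glucose = [0.1, 0.2, 0.25, 0.3, 0.7, 0.8, 0.85, 0.9]
label   = [0,   0,   0,    0,   1,   1,   1,    1]
w, b = train_logreg(glucose, label)
```

A real comparison would train all five models on the same held-out split and compare accuracy, as the paper does for GMM, SVM, logistic regression, ELM, and ANN.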

Traffic sign detection and recognition using fuzzy segmentation approach and artificial neural network classifier respectively

Abstract:

A Traffic Sign Recognition (TSR) system is a significant component of an Intelligent Transport System (ITS), as traffic signs assist drivers in driving more safely and efficiently. This paper presents a new approach for a TSR system in which detection of traffic signs is carried out using a fuzzy-rule-based color segmentation method and recognition is accomplished using the Speeded Up Robust Features (SURF) descriptor with an artificial neural network (ANN) classifier. In the detection step, the region of interest (the sign area) is segmented using a set of fuzzy rules that depend on the hue and saturation values of each pixel in the HSV color space, and then post-processed to filter out unwanted regions. Finally, recognition of the traffic sign is implemented using an ANN classifier trained on SURF feature descriptors. The proposed system was simulated on offline road-scene images captured under different illumination conditions. The detection algorithm shows high robustness, and the recognition rate is quite satisfactory. The performance of the ANN model is illustrated in terms of cross entropy, confusion matrices, and receiver operating characteristic (ROC) curves. In addition, the performance of classifiers such as Support Vector Machine (SVM), decision trees, ensemble learners (AdaBoost), and K-Nearest Neighbor (KNN) is compared with the ANN approach. The simulation results show that the recognition rate of the ANN model is higher than that of the classifiers stated above.
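The fuzzy detection step can be sketched as follows; the membership functions and thresholds below are illustrative assumptions, not the paper's actual rules:

```python
import colorsys

def red_membership(h, s):
    """Fuzzy degree that a pixel belongs to a red sign, from hue/saturation."""
    # Hue membership: triangular around pure red (h = 0), wrapping at 1.
    d = min(h, 1.0 - h)
    mu_h = max(0.0, 1.0 - d / 0.08)
    # Saturation membership: ramp up between 0.3 and 0.6.
    mu_s = min(1.0, max(0.0, (s - 0.3) / 0.3))
    return min(mu_h, mu_s)   # fuzzy AND of the two rules

def segment(rgb_image, threshold=0.5):
    """Binary mask of pixels whose fuzzy membership exceeds the threshold."""
    mask = []
    for row in rgb_image:
        out = []
        for (r, g, b) in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            out.append(1 if red_membership(h, s) >= threshold else 0)
        mask.append(out)
    return mask

# 1x3 toy image: a red sign pixel, a grey road pixel, a blue sky pixel.
img = [[(200, 20, 20), (128, 128, 128), (40, 60, 200)]]
mask = segment(img)   # only the sign pixel survives
```

In the full pipeline the mask would then be post-processed (e.g., by area filtering) before SURF descriptors of the detected region are passed to the ANN.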

A Machine Learning Approach for Tracking and Predicting Student Performance in Degree Programs

Abstract:

Accurately predicting students’ future performance based on their ongoing academic records is crucial for effectively carrying out necessary pedagogical interventions to ensure students’ on-time and satisfactory graduation. Although there is a rich literature on predicting student performance when solving problems or studying for courses using data-driven approaches, predicting student performance in completing degrees (e.g., college programs) is much less studied and faces new challenges: (1) Students differ tremendously in terms of backgrounds and selected courses; (2) courses are not equally informative for making accurate predictions; and (3) students’ evolving progress needs to be incorporated into the prediction. In this paper, we develop a novel machine learning method for predicting student performance in degree programs that is able to address these key challenges. The proposed method has two major features. First, a bilayered structure comprising multiple base predictors and a cascade of ensemble predictors is developed for making predictions based on students’ evolving performance states. Second, a data-driven approach based on latent factor models and probabilistic matrix factorization is proposed to discover course relevance, which is important for constructing efficient base predictors. Through extensive simulations on an undergraduate student dataset collected over three years at the University of California, Los Angeles, we show that the proposed method achieves superior performance to benchmark approaches.
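The course-relevance discovery rests on matrix factorization. A minimal sketch of factorizing a small student-by-course grade matrix with SGD is shown below; the grades are invented, and the probabilistic priors of full PMF are omitted:

```python
import random

def factorize(grades, k=2, lr=0.02, reg=0.05, epochs=3000, seed=0):
    """SGD matrix factorization; grades[s][c] is None for untaken courses."""
    rng = random.Random(seed)
    n_s, n_c = len(grades), len(grades[0])
    S = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_s)]
    C = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_c)]
    for _ in range(epochs):
        for i in range(n_s):
            for j in range(n_c):
                if grades[i][j] is None:
                    continue
                pred = sum(S[i][f] * C[j][f] for f in range(k))
                err = grades[i][j] - pred
                for f in range(k):
                    s_f = S[i][f]
                    S[i][f] += lr * (err * C[j][f] - reg * s_f)
                    C[j][f] += lr * (err * s_f - reg * C[j][f])
    return S, C

def predict(S, C, i, j):
    return sum(a * b for a, b in zip(S[i], C[j]))

# Toy 4-point-scale grades; None marks a course the student has not taken.
grades = [[4.0, 3.5, None],
          [3.0, None, 2.5],
          [None, 3.5, 3.0]]
S, C = factorize(grades)
missing = predict(S, C, 0, 2)   # predicted grade for student 0, course 2
```

In the paper, the learned course factors feed the notion of course relevance used when constructing base predictors; this sketch only shows the factorization mechanics.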

Smart trailer: Automatic generation of movie trailer using only subtitles

Abstract:

With the enormous growth rate of user-generated videos, it is becoming increasingly important to be able to navigate them efficiently. Video summarization is considered a promising approach for effective realization of video content through identifying and picking out descriptive frames of the video. In this paper, we propose an adaptive framework called Smart-Trailer (S-Trailer) to automate the process of creating an online trailer for any movie based only on its subtitles. The language used in the subtitles is English. The framework analyzes the movie’s subtitle file to extract relevant textual features that are used to classify the movie into its corresponding genre(s). Initial experimentation resulted in the generation of a genre-classification corpus. The generated corpus was tested against a real movie dataset and showed a high classification accuracy (0.89) in classifying movies into their corresponding genre(s). The proposed system returned automated trailers that achieve, on average, 47% accuracy in terms of recalling scenes that appear in the original movie trailer across different movie genres. Currently, we are employing deep learning techniques to capture user behaviors and opinions in order to adapt our system to provide users with recommendations of relevant video scenes that match their preferences.
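A bare-bones sketch of subtitle-driven genre classification follows; the keyword lexicon is invented and stands in for the learned genre-classification corpus:

```python
import re

# Assumed toy lexicon; the real corpus is learned from subtitle data.
GENRE_KEYWORDS = {
    "action": {"gun", "chase", "explosion", "fight"},
    "romance": {"love", "heart", "kiss", "marry"},
    "horror": {"ghost", "scream", "blood", "dark"},
}

def parse_srt(text):
    """Strip cue indices and timestamps from an .srt subtitle; keep dialogue."""
    lines = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.isdigit() or "-->" in line:
            continue
        lines.append(line)
    return " ".join(lines)

def classify(subtitle_text):
    """Score each genre by overlap between dialogue words and its lexicon."""
    words = set(re.findall(r"[a-z']+", parse_srt(subtitle_text).lower()))
    scores = {g: len(words & kw) for g, kw in GENRE_KEYWORDS.items()}
    return max(scores, key=scores.get), scores

srt = """1
00:00:01,000 --> 00:00:03,000
I love you with all my heart.

2
00:00:04,000 --> 00:00:06,000
Will you marry me?
"""
genre, scores = classify(srt)
```

The framework itself would then select subtitle-aligned scenes from the classified genre to assemble the trailer; that stage is not sketched here.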

Sentence Vector Model Based on Implicit Word Vector Expression

Abstract:

Word vectors and topic models can help retrieve information semantically. However, there are still many problems: 1) antonyms share high similarity when clustered through word vectors; 2) vectors for named entities cannot be fully trained, as named entities may appear only a limited number of times in a specific corpus; and 3) words, sentences, and paragraphs that share the same meaning but have no overlapping words are hard to recognize. To overcome the above problems, this paper proposes a new vector computation model for text named s2v. Words, sentences, and paragraphs are represented in a unified way in the model. Sentence vectors and paragraph vectors are trained along with word vectors. Based on the unified representation, word and sentence retrieval (with different lengths) are experimentally studied. The results show that information with similar meaning can be retrieved even if the information is expressed with different words.
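The unified-representation idea can be illustrated with a toy composition scheme: averaging word vectors to obtain a sentence vector and retrieving by cosine similarity. Note this is a simplification; the actual s2v model trains sentence and paragraph vectors jointly rather than averaging, and the word vectors below are invented:

```python
import math

# Assumed toy word vectors; s2v trains these jointly with sentence vectors.
WORD_VECS = {
    "cat":    [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.2, 0.0],
    "dog":    [0.7, 0.3, 0.0],
    "car":    [0.0, 0.1, 0.9],
    "engine": [0.1, 0.0, 0.8],
}

def sentence_vector(words):
    """Average of word vectors: one representation for text of any length."""
    dim = 3
    v = [0.0] * dim
    known = [WORD_VECS[w] for w in words if w in WORD_VECS]
    for wv in known:
        for i in range(dim):
            v[i] += wv[i]
    return [x / max(len(known), 1) for x in v]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Retrieval finds semantically similar text with no overlapping words.
query = sentence_vector(["kitten"])
docs = {"the cat sat": ["cat"], "fix the car engine": ["car", "engine"]}
best = max(docs, key=lambda d: cosine(query, sentence_vector(docs[d])))
```

Because words, sentences, and paragraphs live in one space, the same cosine retrieval works across all three granularities.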

Extraction Algorithm of English Text Summarization for English Teaching

Abstract:

In order to improve the sharing and scheduling capability of English teaching resources, an improved algorithm for English text summarization is proposed based on association semantic rules. Relative features are mined among English text phrases and sentences; semantic relevance analysis and feature extraction of keywords in English abstracts are realized; association-rule differentiation for English text summarization is obtained based on information theory; and related semantic-rule information in English teaching texts is mined. The text similarity feature is taken as the maximum difference component of two semantic association rule vectors, and, combined with semantic similarity information, accurate extraction of English text summaries is realized. The simulation results show that the method can extract text summaries accurately, with better convergence and precision in the extraction process.
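As a simplified stand-in for the association-rule scoring, an extractive summarizer that ranks sentences by shared term frequency can be sketched as below; the sample text is invented:

```python
import re
from collections import Counter

def summarize(text, n=1):
    """Score sentences by average term frequency; return the top-n in order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)   # crude stop-word filter

    def score(sent):
        toks = re.findall(r"[a-z']+", sent.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    ranked = sorted(sentences, key=score, reverse=True)[:n]
    return [s for s in sentences if s in ranked]   # keep document order

text = ("Text summarization selects the most informative sentences. "
        "The weather was fine. "
        "Summarization methods score sentences by shared informative terms.")
summary = summarize(text, n=1)
```

The paper's method replaces this simple frequency score with association-rule vectors and semantic similarity, but the extractive skeleton is the same: score sentences, keep the top ones.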

Theme-related keyword extraction from free text descriptions of image contents for tagging

Abstract:

This paper discusses a method for automatic theme-related keyword extraction from users’ natural language comments on their photographs and videos. ‘Theme’ indicates the concepts circumscribing and describing the content of the photos and videos, such as pets, natural sites, palaces, and places. The method employs a deep learning algorithm, the RNN (Recurrent Neural Network), which is good at recognizing implicit patterns in sequential data. The method has been applied to the construction of a place-related image content DB and delivers reasonably good performance even when the measure (i.e., the themes of image contents) is abstract and vague.
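A minimal Elman RNN forward pass for per-word keyword tagging might be sketched as follows; the weights here are random for illustration, whereas the paper's model would be trained on labeled comments:

```python
import math
import random

class TinyRNN:
    """Minimal Elman RNN forward pass for binary keyword tagging.

    Weights are random placeholders; training would fit them to data."""

    def __init__(self, vocab, hidden=4, seed=1):
        rng = random.Random(seed)
        self.idx = {w: i for i, w in enumerate(vocab)}
        self.Wxh = [[rng.uniform(-0.5, 0.5) for _ in vocab] for _ in range(hidden)]
        self.Whh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
        self.Why = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.hidden = hidden

    def tag(self, words):
        """Return P(word is a theme keyword) for each word in sequence."""
        h = [0.0] * self.hidden
        probs = []
        for w in words:
            x = [0.0] * len(self.idx)
            if w in self.idx:
                x[self.idx[w]] = 1.0   # one-hot encode the word
            h = [math.tanh(sum(self.Wxh[i][j] * x[j] for j in range(len(x))) +
                           sum(self.Whh[i][j] * h[j] for j in range(self.hidden)))
                 for i in range(self.hidden)]
            y = sum(self.Why[i] * h[i] for i in range(self.hidden))
            probs.append(1.0 / (1.0 + math.exp(-y)))
        return probs

rnn = TinyRNN(vocab=["my", "dog", "at", "the", "park"])
probs = rnn.tag(["my", "dog", "at", "the", "park"])
```

The recurrence through `h` is what lets the model use left context when deciding whether a word such as "park" is theme-related.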

Diggit: Automated code review via software repository mining

Abstract:

We present Diggit, a tool to automatically generate code review comments, offering design guidance on prospective changes, based on insights gained from mining historical changes in source code repositories. We describe how the tool was built and tuned for use in practice as we integrated Diggit into the working processes of an industrial development team. We focus on the developer experience, the constraints that had to be met in adapting academic research to produce a tool that was useful to developers, and the effectiveness of the results in practice.
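The underlying idea, mining co-change rules from history and flagging changes that break them, can be sketched as below; the support and confidence thresholds are illustrative assumptions, not Diggit's actual tuning:

```python
from collections import Counter
from itertools import combinations

def mine_cochanges(commits, min_support=2, min_confidence=0.6):
    """Mine 'files that usually change together' rules from commit history."""
    pair_count = Counter()
    file_count = Counter()
    for files in commits:
        file_count.update(set(files))
        for a, b in combinations(sorted(set(files)), 2):
            pair_count[(a, b)] += 1
    rules = {}
    for (a, b), n in pair_count.items():
        if n < min_support:
            continue
        if n / file_count[a] >= min_confidence:
            rules.setdefault(a, set()).add(b)
        if n / file_count[b] >= min_confidence:
            rules.setdefault(b, set()).add(a)
    return rules

def review(changed_files, rules):
    """Comment when a change touches a file but not its usual companions."""
    comments = []
    for f in changed_files:
        for buddy in sorted(rules.get(f, ())):
            if buddy not in changed_files:
                comments.append(f"{f} usually changes together with {buddy}")
    return comments

history = [["api.py", "api_test.py"],
           ["api.py", "api_test.py"],
           ["api.py", "api_test.py", "docs.md"],
           ["util.py"]]
rules = mine_cochanges(history)
comments = review(["api.py"], rules)   # change touches api.py but not its test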

Discovering Program Topoi via Hierarchical Agglomerative Clustering

Abstract:

In long-lifespan software systems, specification documents can be outdated or even missing. Developing new software releases or checking whether some user requirements are still valid becomes challenging in this context. This challenge can be addressed by extracting high-level observable capabilities of a system by mining its source code and the available source-level documentation. This paper presents feature extraction and traceability (FEAT), an approach that automatically extracts topoi, which are summaries of the main capabilities of a program, given in the form of collections of code functions along with an index. FEAT acts in two steps. First, clustering: by mining the available source code, possibly augmented with code-level comments, hierarchical agglomerative clustering groups similar code functions; in addition, this process gathers an index for each function. Second, entry-point selection: functions within a cluster are ranked and presented to validation engineers as topoi candidates. We implemented FEAT on top of a general-purpose test management and optimization platform and performed an experimental study over 15 open-source software projects amounting to more than 1M lines of code, showing that automatically discovering topoi is feasible and meaningful on realistic projects.
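The clustering step can be sketched with average-linkage agglomerative clustering over per-function term sets; the Jaccard distance and the stopping threshold below are illustrative assumptions, not FEAT's actual configuration:

```python
def jaccard_distance(a, b):
    """1 minus set overlap: distance between two term sets."""
    return 1.0 - len(a & b) / len(a | b)

def hac(items, threshold=0.6):
    """Average-linkage agglomerative clustering over term-set 'documents'."""
    clusters = [[name] for name in items]

    def avg_dist(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(items[x], items[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        if avg_dist(clusters[i], clusters[j]) > threshold:
            break   # remaining clusters are too dissimilar to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Index of each code function: identifier terms mined from its source.
functions = {
    "open_file":  {"file", "open", "path"},
    "close_file": {"file", "close", "path"},
    "send_mail":  {"mail", "smtp", "send"},
}
topoi = hac(functions)   # file-handling functions group; mail stays apart
```

FEAT's second step would then rank the functions inside each cluster to pick entry-point candidates; that ranking is omitted here.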

Comments Mining With TF-IDF: The Inherent Bias and Its Removal

Abstract:

Text mining has gained great momentum in recent years, with user-generated content becoming widely available. One key use is comment mining, with much attention being given to sentiment analysis and opinion mining. An essential step in the process of comment mining is text pre-processing: a step in which each linguistic term is assigned a weight that commonly increases with its number of appearances in the studied text, yet is offset by the frequency of the term in the domain of interest. A common practice is to use the well-known tf-idf formula to compute these weights.
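The tf-idf weighting the abstract refers to can be computed as follows, using the plain log-idf variant (real toolkits often add smoothing and normalization):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Term frequency offset by log inverse document frequency."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))   # document frequency: one count per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (tf[t] / len(doc)) * math.log(n / df[t])
                        for t in tf})
    return weights

# Tokenized toy comments.
comments = [["great", "phone", "great", "screen"],
            ["phone", "battery", "poor"],
            ["screen", "cracked", "poor"]]
w = tf_idf(comments)
```

In the first comment, "great" (frequent locally, rare across the collection) outweighs "phone" (which appears in other comments too), which is exactly the behavior the formula is designed to produce.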

Visual Analysis of Spatio-Temporal Distribution and Retweet Relation in Weibo Event

Abstract:

Sina Weibo is the most popular microblog service in China, and it can provide abundant information about netizens’ attitudes and opinions toward events that are exposed on the Internet. However, it is difficult to know the characteristics of Internet public opinion, such as the evolution of users’ focus over time, the spatio-temporal distribution of users participating in event comments, and weibo retweet relations. To fully understand these, we propose a visual analytics system for Weibo events, WeiboViz for short, which is mainly divided into four subparts: fundamental information visualization, spatio-temporal distribution visualization, keyword and entity visualization, and weibo retweet relation visualization. A case study of “‘Pseudomonas aeruginosa’ exceeded limits in Master Kong You Yue drinking water” demonstrates the effectiveness of the proposed system for the exploration and understanding of weibo data about a specific event.

Research on Kano model based on online comment data mining

Abstract:

Opinion mining and sentiment analysis of network comments are key areas of text analysis. By mining the comment information on online products, the real demands of customers can be obtained; however, this does not reflect the differences among the requirements. In this paper, we propose the idea of combining data mining technology with the Kano model. First, we discover the feature themes of a product by establishing a comment mining model, then analyze the sentiment of the comments through machine learning to acquire the parameters of the Kano model, such as the initial importance. Finally, we obtain the real demands of multiple customers as well as the weights of those demands. We conduct research on mobile phone products on the online market to prove the approach’s feasibility and effectiveness.
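Once mined comment sentiment has been mapped onto Kano questionnaire categories, the model's Better/Worse coefficients are simple ratios. A sketch with invented per-feature counts (the mapping from sentiment to categories is the paper's contribution and is not reproduced):

```python
def kano_coefficients(responses):
    """Better/Worse coefficients from counts of Kano categories.

    responses maps category -> count: A (attractive), O (one-dimensional),
    M (must-be), I (indifferent)."""
    a, o, m, i = (responses.get(k, 0) for k in "AOMI")
    total = a + o + m + i
    better = (a + o) / total      # satisfaction gain if feature is present
    worse = -(o + m) / total      # dissatisfaction if feature is absent
    return better, worse

# Assumed counts derived from mined comment sentiment per product feature.
battery = {"A": 5, "O": 20, "M": 30, "I": 5}
camera  = {"A": 25, "O": 10, "M": 5,  "I": 20}
b_better, b_worse = kano_coefficients(battery)
c_better, c_worse = kano_coefficients(camera)
```

Here the battery behaves like a must-be requirement (strongly negative Worse), while the camera behaves like an attractive one (high Better), which is the kind of differentiation among requirements the paper is after.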

Text mining based on tax comments as big data analysis using SVM and feature selection

Abstract:

Taxation plays an important role in the economy and development of a country. Improvements to the taxation service system are made continuously in order to increase the State Budget. One way to gauge the performance of taxation, particularly in Indonesia, is to know public opinion about the service. Text mining can be used to learn public opinion about the tax system. The rapid growth of data in social media motivates this research to use such data sources for big data analysis. The dataset used is derived from Facebook and Twitter as sources of data for processing tax comments. The resulting opinions, in the form of public sentiment about service, the website system, and news, can be used as input to improve the quality of tax services. In this research, text mining is done through the phases of text processing, feature selection, and classification with a Support Vector Machine (SVM). To reduce the number of attributes in the dataset when classifying text, feature selection uses Information Gain to select the terms relevant to the tax topic. Testing is used to measure the performance of SVM with feature selection on the two data sources. Performance is measured using the parameters of precision, recall, and F-measure.
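The Information Gain criterion used for feature selection can be sketched as follows on invented tax comments (the paper's actual Facebook/Twitter data is not reproduced):

```python
import math

def entropy(labels):
    """Shannon entropy of a label list."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(docs, labels, term):
    """Entropy reduction from splitting documents on term presence."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part)
                    for part in (with_t, without) if part)
    return entropy(labels) - remainder

# Toy tax comments (as token sets) labeled 1 = positive, 0 = negative.
docs = [{"tax", "service", "fast"}, {"tax", "website", "error"},
        {"refund", "fast"}, {"website", "slow", "error"}]
labels = [1, 0, 1, 0]
gains = {t: information_gain(docs, labels, t)
         for t in {"tax", "error", "fast", "website"}}
selected = sorted(gains, key=gains.get, reverse=True)[:2]
```

A term like "tax" that appears in both positive and negative comments carries no gain and is dropped, shrinking the attribute space before the SVM is trained.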
Application of text classification and clustering of Twitter data for business analytics

Abstract:

In recent years, social networks in business have gained unprecedented popularity because of their potential for business growth. Companies can learn more about consumers’ sentiments towards their products and services, and use this to better understand the market and improve their brand. Thus, companies regularly reinvent their marketing strategies and campaigns to fit consumers’ preferences. Social analysis harnesses and utilizes the vast volume of data in social networks to mine critical data for strategic decision making. It uses machine learning techniques and tools to determine patterns and trends and gain actionable insights. This paper selected a popular food brand to evaluate a given stream of customer comments on Twitter. Several metrics in the classification and clustering of data were used for analysis. The Twitter API is used to collect a Twitter corpus and feed it to a binary tree classifier that discovers the polarity lexicon of English tweets, whether positive or negative. A k-means clustering technique is used to group together similar words in tweets in order to discover certain business value. This paper attempts to discuss the technical and business perspectives of text mining analysis of Twitter data and recommends appropriate future opportunities in developing this emerging field.
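The k-means step can be sketched in a few lines; the 2-D points below are invented stand-ins for tweet term vectors:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means over dense vectors (e.g., tweet term-count vectors)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[c])))
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# 2-D toy embeddings: one group of tweets clusters top-left, one bottom-right.
tweets = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.85),
          (0.9, 0.1), (0.8, 0.2), (0.85, 0.15)]
groups = kmeans(tweets, k=2)
```

In the paper's pipeline, tweets would first be vectorized and polarity-labeled by the binary tree classifier; clustering like this then groups related words or tweets to surface business themes.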
