
BIG DATA 2016 Projects


Securing Big Data Environments from Attacks


In this paper we propose techniques for securing big data environments, such as public clouds whose tenants use virtual machines for different services such as utility and healthcare. Our model uses state-based monitoring of the data sources for service-specific detection of attacks, together with offline traffic analysis across multiple data sources to detect attacks such as botnets.

On Traffic-Aware Partition and Aggregation in MapReduce for Big Data Applications


The MapReduce programming model simplifies large-scale data processing on commodity clusters by exploiting parallel map tasks and reduce tasks. Although many efforts have been made to improve the performance of MapReduce jobs, they ignore the network traffic generated in the shuffle phase, which plays a critical role in performance enhancement. Traditionally, a hash function is used to partition intermediate data among reduce tasks; this, however, is not traffic-efficient because the network topology and the data size associated with each key are not taken into consideration. In this paper, we study how to reduce the network traffic cost of a MapReduce job by designing a novel intermediate data partition scheme. Furthermore, we jointly consider the aggregator placement problem, where each aggregator can reduce merged traffic from multiple map tasks. A decomposition-based distributed algorithm is proposed to deal with the large-scale optimization problem for big data applications, and an online algorithm is also designed to adjust data partition and aggregation dynamically. Finally, extensive simulation results demonstrate that our proposals can significantly reduce network traffic cost in both the offline and online cases.
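The core idea, that hash partitioning ignores per-key data volume and network distance, can be illustrated with a minimal greedy sketch. This is not the paper's decomposition-based algorithm; all task names, byte counts, and costs below are hypothetical, and each key is simply placed on the reducer with the lowest estimated shuffle cost.

```python
# Hypothetical inputs: bytes of intermediate data each map task emits
# per key, and a per-byte network cost between map and reduce nodes
# (e.g. 1 = same rack, 3 = cross-rack).
map_output = {                     # key -> {map_task: bytes}
    "k1": {"m0": 800, "m1": 100},
    "k2": {"m0": 50,  "m1": 900},
    "k3": {"m0": 400, "m1": 400},
}
net_cost = {                       # (map_task, reducer) -> cost per byte
    ("m0", "r0"): 1, ("m0", "r1"): 3,
    ("m1", "r0"): 3, ("m1", "r1"): 1,
}

def traffic_aware_partition(map_output, reducers=("r0", "r1")):
    """Greedily assign each key to the reducer that minimizes the
    volume-weighted shuffle cost, instead of hashing the key."""
    assignment = {}
    for key, emits in map_output.items():
        def cost(r):
            return sum(b * net_cost[(m, r)] for m, b in emits.items())
        assignment[key] = min(reducers, key=cost)
    return assignment

assignment = traffic_aware_partition(map_output)
print(assignment)   # k1 -> r0 (its data lives near m0), k2 -> r1
```

A hash partitioner might send k1 to r1, paying the cross-rack cost for 800 of its 900 bytes; the traffic-aware assignment avoids that.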

Attribute Based Access Control for Big Data applications by Query Modification


We present concepts for the efficient implementation of Attribute Based Access Control (ABAC) in large applications that may use several data storage technologies, including Hadoop, NoSQL, and relational database systems. The ABAC authorization process takes place in two main stages. First, a sequence of permissions is derived that specifies the data the user's transaction is permitted to retrieve. Second, query modification is used to augment the user's transaction with code that implements the ABAC controls; this requires the storage technologies to support a high-level language such as SQL or similar. The modified user transactions are then optimized and processed using the full functionality of the underlying storage systems. We use an extended ABAC model (TCM2) that handles negative permissions and overrides in a single permissions-processing mechanism. We illustrate these concepts using a compelling electronic health records scenario.
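The two-stage process, deriving permissions and then rewriting the query, can be sketched in a few lines. This is an illustrative sketch, not the TCM2 implementation: the permission structure, attribute names, and the EHR table are all hypothetical, and negative permissions are modeled simply as predicates negated in the rewritten query.

```python
def derive_predicates(user_attrs, permissions):
    """Stage 1: collect SQL predicates from permissions whose attribute
    conditions match the requesting user. Negative permissions are kept
    separately so they can override positive ones."""
    allowed, denied = [], []
    for perm in permissions:
        if all(user_attrs.get(k) == v for k, v in perm["when"].items()):
            (denied if perm.get("negative") else allowed).append(perm["predicate"])
    return allowed, denied

def modify_query(base_query, allowed, denied):
    """Stage 2: augment the user's query so the storage engine itself
    enforces the ABAC controls. No positive permission means no rows."""
    clauses = ["(" + " OR ".join(allowed) + ")"] if allowed else ["1=0"]
    clauses += ["NOT (%s)" % p for p in denied]
    return base_query + " WHERE " + " AND ".join(clauses)

permissions = [
    {"when": {"role": "nurse"}, "predicate": "ward = 'W1'"},
    {"when": {"role": "nurse"}, "predicate": "record_type = 'psych'",
     "negative": True},
]
allowed, denied = derive_predicates({"role": "nurse"}, permissions)
print(modify_query("SELECT * FROM ehr", allowed, denied))
# SELECT * FROM ehr WHERE (ward = 'W1') AND NOT (record_type = 'psych')
```

Because the controls end up as ordinary predicates, the underlying database can optimize the modified query with its normal query planner, which is the efficiency argument the abstract makes.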

A Distributed Mobile Cloud Computing Model for Secure Big Data


Mobile cloud computing provides a novel e-commerce model for organizations without any upfront investment. Since cloud computing uses distributed resources in an open environment, it is important to provide secure keys for sharing data when developing cloud computing applications. To ensure the correctness of users' data in the cloud, we propose an effective and secure distributed model comprising a Self-Proxy Server (SPS) with a self-created algorithm. The model resolves the communication bottleneck caused by re-encryption of shared data in the cloud whenever users are revoked. It reduces security risks and protects users' resources because the distributed SPS dynamically interacts with the Key Manager (KM) whenever mobile users consume cloud services. This paper presents a comprehensive mobile cloud design that provides effective and secure cloud computing services on mobile devices.

On the Properties of Non-media Digital Watermarking: A Review of State of the Art Techniques


Over the last 25 years, there has been much work on multimedia digital watermarking. In this domain, the primary limit on watermark strength has been its visibility. For multimedia watermarks, invisibility is defined in human terms (that is, in terms of human sensory limitations). In this paper, we review recent developments in the non-media applications of data watermarking, which have emerged over the last decade as an exciting new subdomain. Since, by definition, the intended receiver should be able to detect the watermark, we have to redefine invisibility in an acceptable way that is often application-specific and thus cannot be easily generalized. This is particularly true when the data is not intended to be directly consumed by humans. For example, a loose definition of robustness might be in terms of the resilience of a watermark against normal host data operations, and of invisibility as the resilience of the data interpretation against changes introduced by the watermark. We classify the data in terms of data mining rules on complex types of data such as time series, symbolic sequences, data streams, and so forth. We emphasize the challenges involved in non-media watermarking in terms of common watermarking properties, including invisibility, capacity, robustness, and security. With the aid of a few examples of watermarking applications, we demonstrate these distinctions and survey the latest research to make our argument clear and meaningful. Finally, we examine the new challenges of digital watermarking that have arisen with the evolution of big data.

Predicting Instructor Performance Using Data Mining Techniques in Higher Education


Data mining applications are becoming a more common tool for understanding and solving educational and administrative problems in higher education. Research in educational mining generally focuses on modeling students' performance rather than instructors' performance. One of the common tools for evaluating instructors' performance is the course evaluation questionnaire, which is based on students' perceptions. In this study, four different classification techniques (decision tree algorithms, support vector machines, artificial neural networks, and discriminant analysis) are used to build classifier models. Their performances are compared on a dataset composed of students' responses to a real course evaluation questionnaire, using the accuracy, precision, recall, and specificity performance metrics. Although all the classifier models show comparably high classification performance, the C5.0 classifier is the best with respect to accuracy, precision, and specificity. In addition, an analysis of variable importance for each classifier model is performed. It shows that many of the questions in the course evaluation questionnaire appear to be irrelevant, and that the instructors' success, as perceived by students, mainly depends on the students' interest in the course. The findings of the study indicate the effectiveness and expressiveness of data mining models in course evaluation and higher education mining. Moreover, these findings may be used to improve measurement instruments.
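The four comparison metrics named in the abstract are all derived from a binary confusion matrix. As a quick reference, here is a minimal sketch with hypothetical counts (not the study's data):

```python
def metrics(tp, fp, tn, fn):
    """Classification metrics from confusion-matrix counts:
    tp/fp = true/false positives, tn/fn = true/false negatives."""
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "precision":   tp / (tp + fp),
        "recall":      tp / (tp + fn),   # a.k.a. sensitivity
        "specificity": tn / (tn + fp),   # true-negative rate
    }

m = metrics(tp=40, fp=10, tn=45, fn=5)
print(m)  # accuracy 0.85, precision 0.8, recall ~0.889, specificity ~0.818
```

Reporting specificity alongside recall matters here because the evaluation classes may be imbalanced: a classifier can score high accuracy while missing most of one class.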

Social Set Analysis: A Set Theoretical Approach to Big Data Analytics


Current analytical approaches in Computational Social Science can be characterized by four dominant paradigms: text analysis (information extraction and classification), social network analysis (graph theory), social complexity analysis (complex systems science), and social simulation (cellular automata and agent-based modelling). However, when it comes to organizational and societal units of analysis, there exists no approach to conceptualise, model, analyze, explain, and predict social media interactions as individuals' associations with ideas, values, identities, etc. To address this limitation, and drawing on the sociology of associations and the mathematics of set theory, this paper presents a new approach to big data analytics called Social Set Analysis. Social Set Analysis consists of a generative framework for philosophies of computational social science, a theory of social data, conceptual and formal models of social data, and an analytical framework for combining big social datasets with organisational and societal datasets. Three empirical studies of big social data are presented to illustrate and demonstrate Social Set Analysis in terms of fuzzy set-theoretical sentiment analysis, crisp set-theoretical interaction analysis, and event-study-oriented set-theoretical visualisations. Implications for big data analytics, current limitations of the set-theoretical approach, and future directions are outlined.
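To make the fuzzy set-theoretical view concrete, here is a minimal sketch (users and membership degrees are entirely hypothetical, not from the paper's studies): each user belongs to a "sentiment set" to some degree in [0, 1], and set operations follow the standard fuzzy definitions, max for union and min for intersection.

```python
# Hypothetical membership degrees of users in two fuzzy sets.
positive = {"alice": 0.9, "bob": 0.4, "carol": 0.1}   # positive sentiment
engaged  = {"alice": 0.7, "bob": 0.8, "carol": 0.2}   # engagement

def f_union(a, b):
    """Fuzzy union: degree is the max of the two memberships."""
    return {u: max(a.get(u, 0.0), b.get(u, 0.0)) for u in a.keys() | b.keys()}

def f_intersection(a, b):
    """Fuzzy intersection: degree is the min of the two memberships."""
    return {u: min(a.get(u, 0.0), b.get(u, 0.0)) for u in a.keys() | b.keys()}

# Users strongly associated with BOTH positive sentiment and engagement
# (an alpha-cut at 0.5 turns the fuzzy set into a crisp one):
core = {u: d for u, d in f_intersection(positive, engaged).items() if d >= 0.5}
print(core)  # {'alice': 0.7}
```

The crisp set-theoretical analysis mentioned in the abstract is the special case where every membership degree is 0 or 1.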

Guest Editorial Big Data Analytics: Risk and Operations Management for Industrial Applications


Big data research has been a popular research topic over the last few years. It not only attracts enormous attention compared with past research trends, but also spans diverse and wide-ranging disciplines in its applications. As a result, the growth of this community is unparalleled, and the attention drawn to this research "buzzword" is growing at an explosive pace. It is not simply business jargon. This assertion is supported by Table 1, which summarizes the number of publications in recent years. The data were obtained by searching for the term "big data" in three common scholarly databases. The search was simple and no screening was conducted, but the numbers are representative and impressive.

MiraMap: A We-government Tool for Smart Peripheries in Smart Cities


Increasingly over the last decade, attention and expectations have focused on the role that ICT solutions can play in increasing accountability, participation, and transparency in public administration. Citizen participation is also more and more at the center of the debate about smart cities. However, technological solutions have often been proposed without first considering citizens' needs and the sociotechnical misalignments within the city, e.g. in peripheral areas. The paper outlines the design and implementation process of a we-government IT tool called MiraMap. The project has been developed in the Mirafiori District in Torino (Italy), a neighbourhood characterized by problems of marginality and by several ongoing urban transformations with very high potential for social and economic development in the next few years. This makes Mirafiori Sud a valuable case study environment for experimenting with new methods and IT solutions to strengthen the connection between citizens and public administration. The objective of MiraMap is to facilitate communication and management between citizens and the administration, both in reporting issues and claims and in submitting proposals. Collecting and handling this information efficiently is crucial to improving the quality of life in urban suburbs and to delivering more targeted, better-performing public policies. To achieve these results, the authors combined First Life, a new local social network based on an interactive map, with a Business Process Management (BPM) system for handling reports about claims and proposals. The research process involves an interdisciplinary team composed of architects, computer scientists, engineers, geographers, and legal experts, with the direct participation of local administrators and citizens.