A Novel Architecture to Integrate Multi-Source Data in Distributed Environment

Thesis Info

Access Option

External Link

Author

Sidra Zulfiqar

Institute

Virtual University of Pakistan

Institute Type

Public

City

Lahore

Province

Punjab

Country

Pakistan

Thesis Completion Year

2019

Thesis Completion Status

Completed

Subject

Software Engineering

Language

English

Link

http://vspace.vu.edu.pk/detail.aspx?id=326

Added

2021-02-17 19:49:13

Modified

2024-03-24 20:25:49

ARI ID

1676721024252

The amount of data has increased over the last few years due to the emergence of various end-user applications, which rely on cloud computing infrastructure hosted in data centers. Besides volume, other characteristics of the data, such as variety, velocity, and veracity, give rise to the big data problem. Traditional database management systems cannot handle big data efficiently, so a big data platform is needed. Hadoop is one such platform; it uses a distributed storage system, the Hadoop Distributed File System (HDFS). Hive and HBase are two big data tools for storing data in Hadoop, and both run on top of HDFS. Hive is an open-source data warehouse framework that lets programmers query and analyze large data sets stored in HDFS. HBase is a column-oriented, distributed, and highly fault-tolerant database used to store and manage big data, and it can hold tables with billions of rows. When data comes from multiple sources, it is stored in multiple tables in Hive and HBase, and query performance degrades whenever these tables must be joined. In this thesis, we propose an architecture that stores data from multiple sources in a single HBase table. We design a new table schema with a unique row key that integrates the multi-source data in one table, so the proposed technique needs no join operations. We evaluated the proposed technique on a real testbed using a dataset from two publishers, comparing performance when the data is stored in Hive and when it is stored in the proposed HBase table.
The results show that the proposed technique improves query performance compared with the traditional approach of joining multiple tables in Hive.
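The abstract does not spell out the table schema, so the following is only a minimal in-memory sketch of the general idea, contrasting a join across per-source tables with a single wide table addressed by one row key. It assumes a hypothetical shared identifier `item_id` as the row key and one column family per source (`pubA`, `pubB`); the record fields and the helper names `join_tables` and `build_wide_table` are illustrative, not the thesis's actual design. A plain Python dict stands in for an HBase table, and no HBase client is used.

```python
from collections import defaultdict

# Hypothetical records from two publishers; field names are illustrative only.
publisher_a = [
    {"item_id": "1001", "title": "Data Systems", "year": 2017},
    {"item_id": "1002", "title": "Cloud Basics", "year": 2018},
]
publisher_b = [
    {"item_id": "1001", "price": 45.0, "stock": 12},
    {"item_id": "1003", "price": 30.0, "stock": 5},
]


def join_tables(left, right, key):
    """Inner hash join of two separate per-source tables on `key`."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


def build_wide_table(sources, key):
    """Integrate several sources into one HBase-style wide table (a dict).

    Each row is addressed by a single row key (here the hypothetical `key`
    field); each source's fields are stored as "family:qualifier" columns,
    with the source name used as the column family.
    """
    table = defaultdict(dict)
    for family, rows in sources.items():
        for row in rows:
            row_key = row[key]
            for field, value in row.items():
                if field != key:
                    table[row_key][f"{family}:{field}"] = value
    return dict(table)


# Multi-table approach: combining both sources requires a join.
joined = join_tables(publisher_a, publisher_b, "item_id")

# Single-table approach: one row-key lookup returns data from both sources.
wide = build_wide_table({"pubA": publisher_a, "pubB": publisher_b}, "item_id")
row = wide["1001"]
```

In this toy model, fetching fields from both publishers for one item is a single lookup (`wide["1001"]`) instead of a join, and a source with no data for a key contributes no columns to that row, loosely mirroring how HBase does not store empty cells.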