Title

DHive: Query Execution Performance Analysis via Dataflow in Apache Hive

Author
Corresponding Author: Tang, Bo
Publication Years
2023-08-01
DOI
Source Title
PROCEEDINGS OF THE VLDB ENDOWMENT
ISSN
2150-8097
Volume: 16 Issue: 12
Abstract
Apache Hive is widely used for large-scale data analysis in many organizations. Various visual analytics tools have been developed to help Hive users quickly analyze the query execution process and identify the performance bottlenecks of executed queries. However, existing tools mostly focus on showing the time usage of query sub-components (jobs and operators) and fail to provide enough evidence to identify the root causes of slow execution. To tackle this problem, we develop DHive, a visual analytics system that visualizes and analyzes query execution progress via dataflow analysis. DHive shows the dataflow during query execution at multiple levels (query level, job level, and task level), which enables users to identify the key jobs and tasks and to explain their time usage by linking them to auxiliary information such as the system configuration and hardware status. We demonstrate the effectiveness of DHive through two cases in a production cluster. DHive is open source at https://github.com/DBGroupSUSTech/DHive.git.
Indexed By
Language
English
SUSTech Authorship
First ; Corresponding
Funding Project
Shenzhen Fundamental Research Program[20220815112848002] ; Guangdong Provincial Key Laboratory[2020B121201001]
WOS Research Area
Computer Science
WOS Subject
Computer Science, Information Systems ; Computer Science, Theory & Methods
WOS Accession No
WOS:001067701000066
Publisher
Data Source
Web of Science
Document Type: Journal Article
Identifier: http://kc.sustech.edu.cn/handle/2SGJ60CL/582919
Department: Department of Computer Science and Engineering
Affiliation
1.Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Peoples R China
2.Southern Univ Sci & Technol, Res Inst Trustworthy Autonomous Syst, Shenzhen, Peoples R China
First Author Affiliation: Department of Computer Science and Engineering
Corresponding Author Affiliation: Department of Computer Science and Engineering
First Author's First Affiliation: Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
Zhang, Chaozu, Shen, Qiaomu, Tang, Bo. DHive: Query Execution Performance Analysis via Dataflow in Apache Hive[J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16(12).
APA
Zhang, Chaozu, Shen, Qiaomu, & Tang, Bo. (2023). DHive: Query Execution Performance Analysis via Dataflow in Apache Hive. PROCEEDINGS OF THE VLDB ENDOWMENT, 16(12).
MLA
Zhang, Chaozu, et al. "DHive: Query Execution Performance Analysis via Dataflow in Apache Hive." PROCEEDINGS OF THE VLDB ENDOWMENT 16.12 (2023).

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.