Title | Deep Neural Networks on Genetic Motif Discovery: the Interpretability and Identifiability Issues
Author | Zhang Yu (张宇)
Name Pinyin | ZHANG Yu
Student Number | 11756001
Degree | PhD
Discipline | Computer Science
Supervisor |
Supervisor's Department | Department of Computer Science and Engineering
External Supervisor | Peter Tino
External Supervisor's Institution | University of Birmingham
Publication Date | 2022-03-30
Submission Date | 2022-07-01
University | University of Birmingham
Place of Publication | Birmingham
Abstract | Deep neural networks have achieved great success across a wide range of research fields and real-world applications. However, as black-box models, their dramatic performance advances come at the cost of interpretability. This is a major concern, especially in domains that are safety-critical or subject to ethical and legal requirements (e.g., avoiding algorithmic discrimination). In other settings, interpretability can help scientists extract new "knowledge" learnt by the networks; neural-network-based genetic motif discovery, a subfield of computational genomics, is one such case. This naturally raises a further question: can current neural-network-based motif discovery methods identify the underlying motifs in the data, and how robust and reliable are they? In other words, we are interested in the motif identifiability problem. In this thesis, we first conduct a comprehensive review of current neural network interpretability research and propose a novel unified taxonomy which, to the best of our knowledge, provides the most comprehensive and clear categorisation of existing approaches. We then formally study the motif identifiability problem in the context of neural-network-based motif discovery (i.e., if we only have access to the predictive performance of a black-box neural network, how well can we recover the underlying "true" motifs by interpreting the learnt model?). Systematic controlled experiments show that although accurate models tend to recover the underlying motifs better, motif identifiability (a measure of the similarity between the true motifs and the learnt motifs) still varies over a large range. Moreover, the over-complexity (without overfitting) of a high-accuracy model (e.g., using 128 kernels when 16 already suffice) can harm motif identifiability. We therefore propose a robust neural-network-based motif discovery workflow that addresses the above issues, verified on both synthetic and real-world datasets. Finally, we propose probabilistic kernels in place of conventional convolutional kernels and study whether it is better to learn probabilistic motifs directly within the network rather than via post hoc interpretation. Experiments show that although probabilistic kernels have some merits (e.g., stable output), their performance is not comparable to that of classic convolutional kernels under the same network setting (the same number of kernels).
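The abstract's notion of motif identifiability (similarity between true and learnt motifs) can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the thesis's actual method: a learnt convolutional kernel is softmax-normalised into a position weight matrix (PWM), which is then scored against a ground-truth PWM by mean per-position Pearson correlation.

```python
import numpy as np

def kernel_to_pwm(kernel, temperature=1.0):
    """Convert a learnt convolutional kernel (motif_len x 4, one column per
    nucleotide A/C/G/T) into a PWM via a softmax over the nucleotide axis.
    This post hoc conversion is an assumption for illustration only."""
    w = kernel / temperature
    e = np.exp(w - w.max(axis=1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

def motif_similarity(pwm_true, pwm_learnt):
    """One possible identifiability score: mean per-position Pearson
    correlation between two PWMs of the same shape (motif_len x 4)."""
    scores = []
    for p, q in zip(pwm_true, pwm_learnt):
        p_c, q_c = p - p.mean(), q - q.mean()
        scores.append((p_c @ q_c) / (np.linalg.norm(p_c) * np.linalg.norm(q_c)))
    return float(np.mean(scores))

# Toy example: a 3-position motif strongly preferring A-C-G.
pwm_true = np.array([[0.90, 0.05, 0.03, 0.02],
                     [0.05, 0.90, 0.03, 0.02],
                     [0.05, 0.03, 0.90, 0.02]])
# Simulate a learnt kernel as noisy log-probabilities of the true motif.
rng = np.random.default_rng(0)
learnt_kernel = np.log(pwm_true) + 0.1 * rng.normal(size=pwm_true.shape)
print(motif_similarity(pwm_true, kernel_to_pwm(learnt_kernel)))  # close to 1.0
```

In this framing, a score near 1.0 means the learnt kernel recovers the true motif well, while the abstract's finding is that this score can vary widely even among models with similar predictive accuracy.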
Keywords |
Language | English
Training Type | Joint training (联合培养)
Enrollment Year | 2017
Date of Degree Awarded | 2022-07
Data Source | Manual submission
Document Type | Thesis
Identifier | http://kc.sustech.edu.cn/handle/2SGJ60CL/347870 |
Department | Department of Computer Science and Engineering |
Recommended Citation (GB/T 7714) | Zhang Y. Deep Neural Networks on Genetic Motif Discovery: the Interpretability and Identifiability Issues[D]. Birmingham: University of Birmingham, 2022.
Files in This Item:
File Name/Size | DocType | Version | Access | License
11756001-张宇-计算机科学与工程 (11669 KB) | -- | -- | Restricted Access | Fulltext Requests