Deep Neural Networks on Genetic Motif Discovery: the Interpretability and Identifiability Issues
Deep neural networks have achieved great success in a wide range of research fields and real-world applications. However, because they are black-box models, the drastic advances in performance come at the cost of interpretability. This is a major concern, especially in domains that are safety-critical or subject to ethical and legal requirements (e.g., avoiding algorithmic discrimination). In other situations, interpretability can help scientists extract new "knowledge" that the neural network has learnt (e.g., in computational genomics), and neural-network-based genetic motif discovery is one such field. This naturally raises another question: can current neural-network-based motif discovery methods identify the underlying motifs from the data, and how robust and reliable are they? In other words, we are interested in the motif identifiability problem.
In this thesis, we first conduct a comprehensive review of current neural network interpretability research and propose a novel unified taxonomy that, to the best of our knowledge, provides the most comprehensive and clear categorisation of existing approaches. We then formally study the motif identifiability problem in the context of neural-network-based motif discovery: if we only have access to a black-box neural network and its predictive performance, how well can we recover the underlying "true" motifs by interpreting the learnt model? Systematic controlled experiments show that although more accurate models tend to recover the underlying motifs better, motif identifiability (a measure of the similarity between the true motifs and the learnt motifs) still varies over a wide range. Moreover, over-complexity in a high-accuracy model (without overfitting), e.g., using 128 kernels when 16 kernels are already sufficient, may harm motif identifiability. We therefore propose a robust neural-network-based motif discovery workflow that addresses these issues and verify it on both synthetic and real-world datasets. Finally, we propose probabilistic kernels as a replacement for conventional convolutional kernels and study whether it is better to learn probabilistic motifs directly in the network rather than through post hoc interpretation. Experiments show that although probabilistic kernels have some merits (e.g., stable output), their performance does not match that of classic convolutional kernels under the same network setting (i.e., the same number of kernels).
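The abstract does not specify how learnt motifs are compared with true motifs. Purely as an illustration, the sketch below assumes one common convention: a 4 × L convolutional kernel (rows A, C, G, T) is mapped to a position probability matrix (PPM) with a column-wise softmax, and similarity is the best mean per-column Pearson correlation over all ungapped alignments of the shorter motif inside the longer one. The helper names (`kernel_to_ppm`, `motif_similarity`, `consensus_to_ppm`) and the 0.97 match probability are hypothetical; this is not the thesis's own identifiability measure or its probabilistic-kernel design.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed row order of every 4 x L matrix below


def consensus_to_ppm(seq: str, p_match: float = 0.97) -> np.ndarray:
    """Hypothetical toy 'true' motif: p_match on the consensus base, rest spread evenly."""
    ppm = np.full((4, len(seq)), (1.0 - p_match) / 3)
    for j, base in enumerate(seq):
        ppm[ALPHABET.index(base), j] = p_match
    return ppm


def kernel_to_ppm(kernel: np.ndarray) -> np.ndarray:
    """Assumed conversion: column-wise softmax over the four nucleotides."""
    shifted = kernel - kernel.max(axis=0, keepdims=True)  # numerical stability
    expw = np.exp(shifted)
    return expw / expw.sum(axis=0, keepdims=True)


def _pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation; defined as 0 when either column has zero variance."""
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0


def motif_similarity(ppm_a: np.ndarray, ppm_b: np.ndarray) -> float:
    """Best mean column-wise correlation over ungapped offsets (hypothetical score)."""
    short, long_ = (ppm_a, ppm_b) if ppm_a.shape[1] <= ppm_b.shape[1] else (ppm_b, ppm_a)
    width = short.shape[1]
    best = -1.0
    for offset in range(long_.shape[1] - width + 1):
        window = long_[:, offset:offset + width]
        score = np.mean([_pearson(short[:, j], window[:, j]) for j in range(width)])
        best = max(best, float(score))
    return best


if __name__ == "__main__":
    true_ppm = consensus_to_ppm("ACG")
    kernel = np.zeros((4, 5))  # a wider "learnt" kernel
    for j, base in enumerate("ACG"):
        kernel[ALPHABET.index(base), j + 1] = 5.0  # motif sits at offset 1
    print(round(motif_similarity(true_ppm, kernel_to_ppm(kernel)), 3))  # 1.0
```

A fuller score would likely also scan the reverse complement and allow gapped alignments; both are omitted here to keep the sketch short.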
|Department||Department of Computer Science and Engineering|
Zhang Y. Deep Neural Networks on Genetic Motif Discovery: the Interpretability and Identifiability Issues[D]. Birmingham: University of Birmingham, 2022.
|Files in This Item:|
|11756001-Zhang Yu-Computer Science and Engineering (11669 KB)||Restricted Access||--||Fulltext Requests|
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.