References
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, T. Darrell, and K. Saenko, "Long-term recurrent convolutional networks for visual recognition and description," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015, DOI: 10.1109/CVPR.2015.7298878.
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015, DOI: 10.1109/CVPR.2015.7298935.
- L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville, "Describing videos by exploiting temporal structure," 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 2015, DOI: 10.1109/ICCV.2015.512.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L, Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge," arXiv:1409.0575 [cs.CV], 2014.
- K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556 [cs.CV], 2014.
- H. Yu, J. Wang, Z. Huang, Y. Yang, and W. Xu., "Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks," arXiv:1510.07712 [cs.CV], 2016
- L. Gao, Z. Guo, H. Zhang, X. Xu, and H. T. Shen., "Video captioning with attention-based LSTM and semantic consistency," IEEE Transactions on Multimedia, vol. 9, no. 9, pp. 2045-2055, Sept., 2017.
- K. Tokuda, H. Zen, and A. Black, "An HMM-based speech synthesis system applied to English," 2002 IEEE Workshop on Speech Synthesis, Santa Monica, CA, USA, 2002.
- A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A generative model for raw audio," arXiv:1609.03499[cs.SD], 2016.
- J. Sotelo, S. Mehri, K. Kumar, J. F. Santos, K. Kastner, A. Courville, and Y. Bengio, "Char2Wav: End-to-end speech synthesis," ICLR 2017, 2017.
- S. O. Arik, M. Chrzanowski, A. Coates, G. Diamos, A. Gibiansky, Y. Kang, X. Li, J. Miller, J. Raiman, S. Sengupta, and M. Shoeybi, "Deep voice: Real-time neural text-to-speech," arXiv:1702.07825 [cs.CL], 2017.
- O. Vinyals, L.Kaiser, T. Koo, S. Petrov, I. Sutskever, and G.Hinton, "Grammar as a foreign language," arXiv:1412.7449 [cs.CL], 2015.
- M.-T. Luong, H. Pham, and C. D. Manning, "Effective approaches to attention-based neural machine translation," arXiv:1508.04025 [cs.CL], 2015.
- S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko, "Sequence to Sequence -- Video to Text," 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
- K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: Encoderdecoder approaches," arXiv:1409.1259 [cs.CL], 2014.
- I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," arXiv:1409.3215 [cs.CL], 2014.
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, Nov., 1997.
- D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv:1412.6980 [cs.LG], 2014.
- Y. Wang, R.J. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio, Q. Le, Y. Agiomyrgiannakis, R. Clark, and R. A. Saurous, "Tacotron: Towards End-to-End Speech Synthesis," Interspeech 2017, 2017, DOI: 10.21437/Interspeech.2017-1452.
- D. Bahdanau, K.H. Cho, and Y. Bengio "Neural machine translation by jointly learning to align and translate," arXiv:1409.0473 [cs.CL], 2014.
- N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," Journal of Machine Learning Research., vol. 15, no. 1, pp. 1929-1958, 2014.
- J. Lee, K. Cho, and T. Hofmann, "Fully character-level neural machine translation without explicit segmentation," Transactions of the Association for Computational Linguistics, vol. 5, pp. 365-378, 2017. https://doi.org/10.1162/tacl_a_00067
- R. K. Srivastava, K. Greff, and J. Schmidhuber, "Highway networks," arXiv:1505.00387 [cs.LG], 2015.
- J. Chung, C. Gulcehre, K.H. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," arXiv:1412.3555 [cs.NE], 2014.
- S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv:1502.03167 [cs.LG], 2015.
- K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp.770-778, 2016.
- Y. Wu, M. Schuster, Z. Chen, Q. V Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, Ł. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, and J. Dean, "Google's neural machine translation system: Bridging the gap between human and machine translation," arXiv:1609.08144 [cs.CL], 2016.
- H. Zen, Y. Agiomyrgiannakis, N. Egberts, F. Henderson, and P. Szczepaniak, "Fast, compact, and high-quality LSTM-RNN based statistical parametric speech synthesizers for mobile devices," Interspeech, 2016, DOI: 10.21437/Interspeech.2016-522.
- D. Griffin and J. Lim, "Signal estimation from modified short-time fourier transform," ICASSP '83. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 32, no. 2, pp. 236-243, 1984.
- M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," arXiv:1603.04467 [cs.DC], 2016.
- D. L. Chen and W. B. Dolan, "Collecting highly parallel data for paraphrase evaluation," 49th Annual Meeting of the Association for Computational Linguistics, pp. 190-200, 2011.
- K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "Bleu: a method for automatic evaluation of machine translation," 40th Annual Meeting on Association for Computational Linguistics, pp. 311-318, Philadelphia, PA, USA, 2002.
- C.-Y. Lin. "ROUGE: A Package for Automatic Evaluation of Summaries," ACL-04 Workshop, pp. 74-81, 2004.
- M. Denkowski and A. Lavie, "Meteor universal: Language specific translation evaluation for any target language," Ninth Workshop on Statistical Machine Translation, pp. 376-380, Baltimore, MD, USA, 2014.
- R. Vedantam, C. L. Zitnick, and D. Parikh, "CIDEr: Consensus- based image description evaluation," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, pp.4566-4575, 2015.
Cited by
- 심층학습 기반 표정인식을 통한 학습 평가 보조 방법 연구 vol.23, pp.2, 2020, https://doi.org/10.18108/jeer.2020.23.2.24