TY - JOUR
T1 - Manifesting construction activity scenes via image captioning
AU - Liu, Huan
AU - Wang, Guangbin
AU - Huang, Ting
AU - He, Ping
AU - Skitmore, Martin
AU - Luo, Xiaochun
N1 - Funding Information:
This research is funded by the National Natural Science Foundation of China (Grant No. 71771172 and 71471138), the Innovation and Technology Commission (ITC) of Hong Kong (ITP/020/18LP), the Sichuan Science and Technology Program (2020YFH0124), and the Research Institute for Sustainable Urban Development (RISUD) in Hong Kong (5-ZJLG). The authors are grateful to Guipeng Zhang at the China State Construction Engineering Corporation (CSCEC) and Kechen An for their valuable assistance during the data collection processes. The authors also would like to thank the editor and the reviewers for their helpful suggestions.
Publisher Copyright:
© 2020 Elsevier B.V.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020/11
Y1 - 2020/11
N2 - This study proposes an automated method for manifesting construction activity scenes by image captioning – an approach rooted in computer vision and natural language generation. A linguistic description schema for manifesting the scenes is developed initially and two unique dedicated image captioning datasets are created for method validation. A general model architecture of image captioning is then instituted by combining an encoder-decoder framework with deep neural networks, followed by three experimental tests involving the selection of model learning strategies and performance evaluation metrics. It is demonstrated that the method's performance is comparable with that of state-of-the-art computer vision methods in general. The paper concludes with a discussion of the feasibility of the practical application of the proposed approach at the current technical level.
AB - This study proposes an automated method for manifesting construction activity scenes by image captioning – an approach rooted in computer vision and natural language generation. A linguistic description schema for manifesting the scenes is developed initially and two unique dedicated image captioning datasets are created for method validation. A general model architecture of image captioning is then instituted by combining an encoder-decoder framework with deep neural networks, followed by three experimental tests involving the selection of model learning strategies and performance evaluation metrics. It is demonstrated that the method's performance is comparable with that of state-of-the-art computer vision methods in general. The paper concludes with a discussion of the feasibility of the practical application of the proposed approach at the current technical level.
UR - http://www.scopus.com/inward/record.url?scp=85087394957&partnerID=8YFLogxK
U2 - 10.1016/j.autcon.2020.103334
DO - 10.1016/j.autcon.2020.103334
M3 - Article
AN - SCOPUS:85087394957
SN - 0926-5805
VL - 119
JO - Automation in Construction
JF - Automation in Construction
M1 - 103334
ER -