Building a Chinese ancient architecture multimodal dataset combining image, annotation and style-model

Sci Data. 2024 Oct 16;11(1):1137. doi: 10.1038/s41597-024-03946-1.

Abstract

In this rapidly evolving era of multimodal generation, diffusion models exhibit impressive generative capabilities, significantly advancing creative image synthesis from intricate textual prompts. Yet their effectiveness is limited in certain niche domains, such as depicting Chinese ancient architecture. This limitation stems primarily from insufficient data that fails to capture the unique architectural features and their corresponding textual descriptions. Hence, we build an extensive multimodal dataset capturing the essence of Chinese architecture, mostly from the Tang to the Yuan Dynasties. The dataset is organized by type, comprising image&text pairs, videos, and style models. In detail, images and videos are methodically categorized by location. All images are annotated at two levels: initial annotations and descriptive terms based on distinctive characteristics and official information. Moreover, seven artistic-style fine-tuning models are provided in our dataset to support further innovation. Significantly, this is the first Chinese ancient architecture dataset, and the first instance of using the Pinyin system to annotate unique terms related to Chinese architectural styles.