Researchers at the Indian Institute of Technology (IIT) Jodhpur are developing the Comic-to-Video Network (C2VNet), a software framework that converts born-digital or digitised comic books into video. The framework is built around creating an audio-video storybook.
C2VNet works through a comic strip panel by panel and eventually produces a full-length video, with audio, of the storybook. The goal was to design and develop software that takes a born-digital or digitised comic book as input and produces an audio-visual animated movie from it.
Along with the software, the IIT Jodhpur researchers have proposed an English-language dataset titled “IMCDB: Indian Mythological Comic Dataset of Digitized Indian Comic Storybook”. The dataset has complete annotations for panels, binary masks of the text balloons, and text files for each speech balloon and narration box within a panel. The team plans to make the dataset publicly available.
“Methods and methodologies that can create the desired multimedia content have grown as a result of advances in technology. One such instance is ‘automatic image synthesis’, which has gained a lot of attention among researchers. In contrast, audio-video scene synthesis, such as that based on document images, remains challenging and under-researched. This field of DH [digital humanities] lacks sustained analysis of multimodality in automatic content synthesis and its growing impact on digital scholarship in the humanities. The C2VNet is a step towards bridging this gap,” says IIT Jodhpur.
Dr Chiranjoy Chattopadhyay, Assistant Professor, Department of Computer Science and Engineering, IIT Jodhpur, said that C2VNet has two internal networks to support the video creation: the panel extraction model CPENet and the speech balloon segmentation model SBSNet. “CPENet developed by the team gives over 97 per cent accuracy, and the speech balloon segmentation model SBSNet gives 98 per cent accuracy with fewer parameters. Both have outperformed state-of-the-art models. C2VNet is the first step towards the big future of automatic multimedia creation of comic books to bring new comic reading experiences.”
The study discusses automating the creation of audio-visual content from scanned document images. The team is now working to improve the software so that these multimedia books become more immersive and engaging for the target audience. “Usually, this kind of work takes more time and effort, but with this software, it can be done quickly and in a more interactive way,” adds the institute.