With the spread of smartphones and camera devices, as well as surveillance and monitoring devices all over the world, video data is becoming increasingly easy to capture and store, and is growing at an exponential rate. Moreover, as video acquisition is no longer solely the work of professionals, anyone can record a video and upload it to the internet. However, users often follow the mentality of ‘capture first, filter later’, giving little thought to duration, cutting, content and view selection. Consequently, these user-generated videos consist of long, poorly filmed (in terms of illumination, shakiness, dynamic background and so on) and unedited content, such as surveillance feeds, home videos or video dumps from a wearable camera. These videos still contain useful information, yet most of them cannot practically or enjoyably be reviewed in detail. Hence, the demand for efficient ways to search and retrieve desired content is increasing fast, as processing these videos costs huge amounts of resources such as time, human effort and machine configurations.
Currently, users preview a video through various metadata, such as the thumbnail, title, description and video length, or through a quick skim of the entire video. However, it is usually impossible for users to gain a concrete sense of the video content or to find significant content quickly in this way. Consequently, the best videos have usually been carefully and manually edited to feature the highlights and trim out the boring segments. Video summarization therefore plays an important role in this context. The summary of a video is a brief representation of it that is still able to convey its significant content. A good summary should be concise, offer high coverage, and retain the most informative and significant content. However, generating a good video summary is a challenging task because its requirements are inherently contradictory: it must be compact, yet include as much of the significant content as possible.
Moreover, the most basic but most challenging part of video summarization is to find which parts are the most significant and important. Most user-generated videos contain only a few segments of frames where significant content occurs. Traditional approaches frequently fail to produce semantically meaningful results without prior knowledge, so they focus on specific domains, such as sports or news. It thus remains difficult to identify the semantics of a generic video at the current stage of development of machine intelligence. In consequence, it is necessary to develop new techniques to summarize user-generated videos. The objective of this research is to develop a video summarization method for user-generated video. Video summarization can be applied in many practical applications, such as analyzing surveillance data, video browsing, action recognition or creating a visual diary. In some specific domains, it can also be used to generate movie, sports or news highlights. In addition, these summarization techniques could naturally translate to robotics applications in the future.
Zhuo Lei, Ke Sun, Qian Zhang, and Guoping Qiu. 2016. User Video Summarization Based on Joint Visual and Semantic Affinity Graph. In Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion (iV&L-MM '16).
This work was carried out at the International Doctoral Innovation Centre (IDIC). The authors acknowledge the financial support from Ningbo Education Bureau, Ningbo Science and Technology Bureau, China's MOST, and the University of Nottingham. The work is also partially supported by EPSRC grant no EP/G037574/1.