The IEEE International Conference on Multimedia & Expo (ICME) has been the flagship multimedia conference sponsored by four IEEE societies since 2000. It aims to promote the exchange of the latest advances in multimedia technologies, systems, and applications from both the research and development perspectives of the circuits and systems, communications, computer, and signal processing communities. ICME attracts well over 1000 submissions and 500 participants each year, serving as the prime forum for the dissemination of knowledge in the multimedia field. In 2020, ICME again convenes leading researchers and practitioners to share the latest developments and advances in the discipline. The conference will showcase high-quality oral and poster presentations and will feature Workshops sponsored by IEEE societies. Researchers, developers and practitioners are welcome to organise such Workshops on any new or emerging topic in multimedia technology. An exposition of multimedia products and animations from industry will be held in conjunction with the conference. Moreover, proposals for Panels, Tutorials, Special Sessions, Collaborative Project Papers and Grand Challenges are also invited. At ICME 2020, exceptional papers and contributors will also be selected and recognised with prestigious awards.
Image understanding using deep convolutional networks has reached human-level performance, yet the closely related problem of video understanding, especially action recognition, has not reached the requisite level of maturity. We propose two independent architectures for action recognition using meta-classifiers: the first is based on combining kernels of support vector machines (SVMs), and the second is based on distributed Gaussian Processes. Both receive features computed by a multi-stream deep convolutional neural network, enabling us to achieve state-of-the-art performance on 51-class and 101-class activity recognition problems (the HMDB-51 and UCF-101 datasets). The resulting architecture is named pillar networks, as each (very) deep neural network acts as a pillar for the meta-classifiers. In addition, we illustrate that hand-crafted features such as improved dense trajectories (iDT) and Multi-skip Feature Stacking (MIFS), used as additional pillars, can further supplement the performance.
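The kernel-combination meta-classifier can be illustrated with off-the-shelf tools. The sketch below sums one RBF kernel per pillar into a single precomputed kernel for an SVM; the stream count, feature dimensions, gamma values and regularisation constant are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a kernel-combination meta-classifier: each "pillar"
# (stream) yields a feature matrix; per-pillar RBF kernels are summed into
# one precomputed kernel for the SVM. All sizes below are stand-ins.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_train, n_test, n_classes = 200, 50, 51  # e.g. HMDB-51 has 51 classes

# Stand-in features for three pillars (e.g. spatial, temporal, iDT streams).
train_streams = [rng.normal(size=(n_train, d)) for d in (2048, 2048, 426)]
test_streams = [rng.normal(size=(n_test, d)) for d in (2048, 2048, 426)]
y_train = rng.integers(0, n_classes, size=n_train)

def combined_kernel(xs, ys):
    """Sum per-stream RBF kernels into a single combined kernel matrix."""
    return sum(rbf_kernel(x, y, gamma=1.0 / x.shape[1])
               for x, y in zip(xs, ys))

K_train = combined_kernel(train_streams, train_streams)  # (n_train, n_train)
K_test = combined_kernel(test_streams, train_streams)    # (n_test, n_train)

clf = SVC(kernel="precomputed", C=10.0)
clf.fit(K_train, y_train)
predictions = clf.predict(K_test)
```

The distributed Gaussian Process variant would be assembled analogously, training one GP expert per pillar and combining their predictive distributions.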
Dr. Yu Qian is a senior research scientist at Cortexica Vision Systems Ltd, UK. She has an educational background in electrical and electronics engineering and computer science, with bachelor's and master's degrees from Hefei University of Technology, China, and a PhD from the School of Computer Science at Middlesex University. After completing her PhD, she was appointed as a research officer/fellow and worked on sketch-based video retrieval (an EPSRC project) at the Media Technology Research Centre (MTRC) of the University of Bath and at CVSSP at the University of Surrey. She then joined Middlesex University as a research fellow and worked on an EU project on medical image analysis. Her research interests focus on computer vision and machine learning, especially visual feature representation for image/video analysis.
The proliferation of affordable smart devices capable of capturing, processing and rendering audio-visual media content triggers a need for coordination and orchestration between these devices and their capabilities, and of the content flowing to and from them. The upcoming MPEG Media Orchestration standard (“MORE”, ISO/IEC 23001-13) enables the temporal and spatial orchestration of multiple media and metadata streams. Temporal orchestration concerns time synchronisation of media and sensor captures, processing and renderings, for which the MORE standard uses and extends a DVB standard. Spatial orchestration concerns the alignment of (global) position, altitude and orientation, for which the MORE standard provides dedicated timed metadata. Other types of orchestration involve timed metadata for regions of interest, perceptual quality of media, audio-feature extraction and media timeline correlation.
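As a rough illustration of the kind of information spatial-orchestration timed metadata must carry, the sketch below models a per-device sample with a media timestamp, position, altitude and orientation, and pairs samples from two devices whose timestamps agree within a tolerance. The field names, units and the align() helper are assumptions made for illustration; the MORE standard defines its own metadata syntax and carriage format.

```python
# Illustrative model of spatial-orchestration timed metadata: one record per
# capture device per sample instant. Field names and layout are assumptions,
# not the normative syntax of ISO/IEC 23001-13.
import json
from dataclasses import dataclass, asdict

@dataclass
class SpatialSample:
    timestamp_us: int   # position on the media timeline, microseconds
    latitude: float     # degrees (assumed WGS 84)
    longitude: float    # degrees (assumed WGS 84)
    altitude_m: float   # metres
    yaw_deg: float      # device orientation, degrees
    pitch_deg: float
    roll_deg: float

def align(samples_a, samples_b, tolerance_us=20_000):
    """Pair samples from two devices whose timestamps fall within the
    tolerance; spatial alignment presumes this temporal match-up."""
    pairs = []
    for a in samples_a:
        nearest = min(samples_b,
                      key=lambda b: abs(b.timestamp_us - a.timestamp_us))
        if abs(nearest.timestamp_us - a.timestamp_us) <= tolerance_us:
            pairs.append((a, nearest))
    return pairs

cam_a = [SpatialSample(0, 51.5007, -0.1246, 35.0, 90.0, 0.0, 0.0)]
cam_b = [SpatialSample(5_000, 51.5008, -0.1244, 35.2, 270.0, 0.0, 0.0)]
print(json.dumps([[asdict(a), asdict(b)] for a, b in align(cam_a, cam_b)],
                 indent=2))
```

Pairing by timestamp first reflects that spatial alignment is only meaningful between captures that are already time-synchronised.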