In video surveillance, the re-identification of a person across different camera views involves recognising their identity from visual features among all people detected in the scene. This task has been addressed by learning a measure of appearance similarity that compares two images and determines whether they belong to the same individual. This seminar presents a baseline deep convolutional neural network with a Siamese architecture, which treats the identification problem as a pair-wise binary classification task. The importance of the training data, as well as the effects of different architectures and loss functions, is demonstrated through an extensive battery of experiments.
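The pair-wise scheme above can be sketched in a few lines. This is an illustrative toy, not the speaker's model: the shared embedding here is a fixed random linear map standing in for the deep CNN, and the logistic head's weights are hand-set rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared embedding: in the talk this is a deep CNN; a fixed
# random linear map stands in so that the pairing logic is runnable.
W = rng.standard_normal((16, 64))

def embed(image_vec):
    # Both branches of the Siamese network share these weights.
    return np.tanh(W @ image_vec)

def pair_score(img_a, img_b, w_cls, b_cls):
    # Pair-wise binary classification: compare the two embeddings via
    # element-wise absolute difference, then apply a logistic head.
    diff = np.abs(embed(img_a) - embed(img_b))
    return 1.0 / (1.0 + np.exp(-(w_cls @ diff + b_cls)))

# Toy usage: the same image paired with itself vs. two different images.
x = rng.standard_normal(64)
y = rng.standard_normal(64)
w_cls = -np.ones(16)   # penalise large embedding differences
b_cls = 2.0
same_score = pair_score(x, x, w_cls, b_cls)
diff_score = pair_score(x, y, w_cls, b_cls)
```

Because the two branches share weights, an identical pair yields a zero difference vector and hence the highest possible score, which is exactly the property the learned similarity measure exploits.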
María J. Gómez graduated as an Industrial Technical Engineer specialised in Electronics in 2012 from Universidad de Extremadura, where she was awarded best graduate student. She received her M.Sc. degree in Robotics and Automation in 2014 from Universidad Carlos III de Madrid (UC3M). She is currently pursuing her PhD in the Electrical Engineering, Electronics and Automation programme at UC3M with the Intelligent Systems Lab (LSI). She has experience in image and point-cloud data processing and works in the research field of Intelligent Surveillance Systems.
Image understanding using deep convolutional networks has reached human-level performance, yet the closely related problem of video understanding, especially action recognition, has not reached the requisite level of maturity. We propose two independent architectures for action recognition using meta-classifiers --- the first based on combining kernels of support vector machines (SVMs) and the second based on distributed Gaussian processes. Both receive features computed by a multi-stream deep convolutional neural network, enabling us to achieve state-of-the-art performance on a 51- and a 101-class activity recognition problem (the HMDB-51 and UCF-101 datasets). The resulting architecture is named pillar networks, as each (very) deep neural network acts as a pillar for the meta-classifiers. In addition, we illustrate that hand-crafted features such as improved dense trajectories (iDT) and Multi-skip Feature Stacking (MIFS), as additional pillars, can further supplement the performance.
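The kernel-combination idea behind the first meta-classifier can be sketched as follows. This is a minimal illustration under assumed names, not the paper's pipeline: the two "pillar" streams are random toy features rather than CNN outputs, and instead of a full SVM the fused Gram matrix feeds a simple nearest-class-mean rule in the kernel-induced space.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf_kernel(A, B, gamma):
    # Gram matrix of an RBF kernel between row-vector sets A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Two hypothetical "pillar" feature streams for the same clips, e.g.
# spatial-CNN and motion-CNN features (purely illustrative data).
n = 40
labels = np.repeat([0, 1], n // 2)
spatial = rng.standard_normal((n, 8)) + labels[:, None]
motion = rng.standard_normal((n, 8)) - labels[:, None]

# Kernel combination: a convex sum of valid kernels is itself a valid
# kernel, so an SVM-style learner can train on the fused Gram matrix.
K = 0.5 * rbf_kernel(spatial, spatial, 0.1) + 0.5 * rbf_kernel(motion, motion, 0.1)

def predict(K_row, labels):
    # Simplified meta-classifier: assign the class whose training samples
    # are most similar on average under the combined kernel.
    scores = [K_row[labels == c].mean() for c in (0, 1)]
    return int(np.argmax(scores))

preds = np.array([predict(K[i], labels) for i in range(n)])
accuracy = (preds == labels).mean()
```

The design point is that each pillar contributes its own Gram matrix, so streams can be trained (and extended with iDT or MIFS pillars) independently and fused only at the meta-classifier stage.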
Dr. Yu Qian is a senior research scientist at Cortexica Vision Systems Ltd, UK. She has an educational background in electrical and electronics engineering and computer science, with bachelor's and master's degrees from Hefei University of Technology, China, and a PhD from the School of Computer Science of Middlesex University. After completing her PhD she was appointed as a research officer/fellow and worked on sketch-based video retrieval (an EPSRC project) at the Media Technology Research Centre (MTRC) of the University of Bath and at CVSSP at the University of Surrey. She then joined Middlesex University as a research fellow and worked on an EU project on medical image analysis. Her research interests focus on computer vision and machine learning, especially visual feature representation for image and video analysis.
The proliferation of affordable smart devices capable of capturing, processing and rendering audio-visual media content triggers a need for coordination and orchestration between these devices and their capabilities, and of the content flowing to and from them. The upcoming MPEG Media Orchestration standard (“MORE”, ISO/IEC 23001-13) enables the temporal and spatial orchestration of multiple media and metadata streams. Temporal orchestration concerns the time synchronisation of media and sensor captures, processing and rendering, for which the MORE standard uses and extends a DVB standard. Spatial orchestration concerns the alignment of (global) position, altitude and orientation, for which the MORE standard provides dedicated timed metadata. Other types of orchestration involve timed metadata for regions of interest, perceptual media quality, audio-feature extraction and media-timeline correlation.
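To make the notion of media-timeline correlation concrete, the sketch below shows only the underlying arithmetic of mapping two capture timelines onto a shared wall clock; it is an assumption-laden illustration, and the function and variable names are invented here, not the timed-metadata formats actually defined by the MORE standard.

```python
from fractions import Fraction

def presentation_offset(capture_wallclock_s, media_time_s):
    # Offset mapping a stream's internal media timeline onto wall-clock
    # time: wall_clock = media_time + offset.
    return Fraction(capture_wallclock_s) - Fraction(media_time_s)

def align(media_time_s, own_offset, reference_offset):
    # Re-express a media timestamp on a reference stream's timeline, so
    # samples captured at the same instant receive the same timestamp.
    return Fraction(media_time_s) + own_offset - reference_offset

# Camera A starts capturing at wall-clock t = 1000 s, camera B 2.5 s later.
cam_a = presentation_offset(1000.0, 0.0)
cam_b = presentation_offset(1002.5, 0.0)
# A frame at media time 10.0 s on camera B lands at 12.5 s on camera A's
# timeline, i.e. the two captures are temporally orchestrated.
aligned = align(10.0, cam_b, cam_a)
```

Exact rational arithmetic (`Fraction`) is used so that repeated offset conversions do not accumulate floating-point drift, which matters when timelines are correlated across many devices.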