
2015

Journal Papers

Blasi SG, Mrak M and Izquierdo E (2015), "Frequency-Domain Intra Prediction Analysis and Processing for High-Quality Video Coding", Circuits and Systems for Video Technology, IEEE Transactions on. May, 2015. Vol. 25(5), pp. 798-811. IEEE.
Abstract: Most of the advances in video coding technology focus on applications that require low bitrates, for example, for content distribution on a mass scale. For these applications, the performance of conventional coding methods is typically sufficient. Such schemes inevitably introduce large losses to the signal, which are unacceptable for numerous other professional applications such as capture, production, and archiving. To boost the performance of video codecs for high-quality content, better techniques are needed especially in the context of the prediction module. An analysis of conventional intra prediction methods used in the state-of-the-art High Efficiency Video Coding (HEVC) standard is reported in this paper, in terms of the prediction performance of such methods in the frequency domain. Appropriately modified encoder and decoder schemes are presented and used for this paper. The analysis shows that conventional intra prediction methods can be improved, especially for high frequency components of the signal which are typically difficult to predict. A novel approach to improve the efficiency of high-quality video coding is also presented in this paper based on such analysis. The modified encoder scheme allows for an additional stage of processing performed on the transformed prediction to replace selected frequency components of the signal with specifically defined synthetic content. The content is introduced in the signal using feature-dependent lookup tables. The approach is shown to achieve consistent gains against conventional HEVC with up to -5.2% coding gains in terms of bitrate savings.
BibTeX:
@article{blasi2015frequency,
  author = {Blasi, Saverio G. and Mrak, Marta and Izquierdo, Ebroul},
  title = {Frequency-Domain Intra Prediction Analysis and Processing for High-Quality Video Coding},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {2015},
  volume = {25},
  number = {5},
  pages = {798--811},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6905757},
  doi = {10.1109/TCSVT.2014.2359097}
}
Henderson C and Izquierdo E (2015), "Robust feature matching in long-running poor quality videos", Circuits and Systems for Video Technology, IEEE Transactions on. June, 2015. Vol. PP(99), pp. 1-14. IEEE.
Abstract: We describe a methodology that is designed to match key point and region-based features in real-world images, acquired from long-running security cameras with no control over the environment. We detect frame duplication and images from static scenes that have no activity to prevent processing saliently identical images, and describe a novel blur sensitive feature detection method, a combinatorial feature descriptor and a distance calculation that efficiently unites texture and colour attributes to discriminate feature correspondence in low quality images. Our methods are tested by performing key point matching on real-world security images such as outdoor CCTV videos that are low quality and acquired in uncontrolled conditions with visual distortions caused by weather, crowded scenes, emergency lighting or the high angle of the camera mounting. We demonstrate an improvement in accuracy of matching key points between images compared with state-of-the-art feature descriptors. We use key point features from Harris Corners, SIFT, SURF, BRISK and FAST as well as MSER and MSCR region detectors to provide a comprehensive analysis of our generic method. We demonstrate feature matching using a 138-dimensional descriptor that improves the matching performance of a state-of-the-art 384-dimension colour descriptor with just 36% of the storage requirements.
BibTeX:
@article{henderson2015robustfeature,
  author = {Henderson, Craig and Izquierdo, Ebroul},
  title = {Robust feature matching in long-running poor quality videos},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {2015},
  volume = {PP},
  number = {99},
  pages = {1--14},
  note = {update when published},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7118166},
  doi = {10.1109/TCSVT.2015.2441411}
}
Kordelas GA, Alexiadis DS, Daras P and Izquierdo E (2015), "Enhanced disparity estimation in stereo images", Image and Vision Computing. March, 2015. Vol. 35(0), pp. 31-49. Elsevier.
Abstract: This paper presents a novel stereo disparity estimation method, which combines three different cost metrics, defined using RGB information, the CENSUS transform, as well as Scale-Invariant Feature Transform coefficients. The selected cost metrics are aggregated based on an adaptive weight approach, in order to calculate their corresponding cost volumes. The resulting cost volumes are then merged into a combined one, following a novel two-phase strategy, which is further refined by exploiting scanline optimization. A mean-shift segmentation-driven approach is exploited to deal with outliers in the disparity maps. Additionally, low-textured areas are handled using disparity histogram analysis, which allows for reliable disparity plane fitting on these areas. Finally, an efficient two-step approach is introduced to refine disparity discontinuities. Experiments performed on the four images of the Middlebury benchmark demonstrate the accuracy of this methodology, which currently ranks first among published methods. Moreover, this algorithm is tested on 27 additional Middlebury stereo pairs for evaluating thoroughly its performance. The extended comparison verifies the efficiency of this work.
BibTeX:
@article{kordelas2015enhanced,
  author = {Kordelas, Georgios A. and Alexiadis, Dimitrios S. and Daras, Petros and Izquierdo, Ebroul},
  title = {Enhanced disparity estimation in stereo images},
  journal = {Image and Vision Computing},
  publisher = {Elsevier},
  year = {2015},
  volume = {35},
  number = {0},
  pages = {31--49},
  url = {http://www.sciencedirect.com/science/article/pii/S0262885614001851},
  doi = {10.1016/j.imavis.2014.12.001}
}
Markatopoulou F, Mezaris V, Pittaras N and Patras I (2015), "Local Features and a Two-layer Stacking Architecture for Semantic Concept Detection in Video", Emerging Topics in Computing, IEEE Transactions on. June, 2015. Vol. 3(2), pp. 193-204. IEEE.
Abstract: In this paper, we deal with the problem of extending and using different local descriptors, as well as exploiting concept correlations, toward improved video semantic concept detection. We examine how the state-of-the-art binary local descriptors can facilitate concept detection, we propose color extensions of them inspired by previously proposed color extensions of scale invariant feature transform, and we show that the latter color extension paradigm is generally applicable to both binary and nonbinary local descriptors. In order to use them in conjunction with a state-of-the-art feature encoding, we compact the above color extensions using PCA and we compare two alternatives for doing this. Concerning the learning stage of concept detection, we perform a comparative study and propose an improved way of employing stacked models, which capture concept correlations, using multilabel classification algorithms in the last layer of the stack. We examine and compare the effectiveness of the above algorithms in both semantic video indexing within a large video collection and in the somewhat different problem of individual video annotation with semantic concepts, on the extensive video data set of the 2013 TRECVID Semantic Indexing Task. Several conclusions are drawn from these experiments on how to improve the video semantic concept detection.
BibTeX:
@article{markatopoulou2015local,
  author = {Markatopoulou, Foteini and Mezaris, Vasileios and Pittaras, Nikiforos and Patras, Ioannis},
  title = {Local Features and a Two-layer Stacking Architecture for Semantic Concept Detection in Video},
  journal = {Emerging Topics in Computing, IEEE Transactions on},
  publisher = {IEEE},
  year = {2015},
  volume = {3},
  number = {2},
  pages = {193--204},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7073626},
  doi = {10.1109/TETC.2015.2418714}
}
Sariyanidi E, Gunes H and Cavallaro A (2015), "Automatic Analysis of Facial Affect: A Survey of Registration, Representation, and Recognition", Pattern Analysis and Machine Intelligence, IEEE Transactions on. June, 2015. Vol. 37(6), pp. 1113-1133. IEEE.
Abstract: Automatic affect analysis has attracted great interest in various contexts including the recognition of action units and basic or non-basic emotions. In spite of major efforts, there are several open questions on what the important cues to interpret facial expressions are and how to encode them. In this paper, we review the progress across a range of affect recognition applications to shed light on these fundamental questions. We analyse the state-of-the-art solutions by decomposing their pipelines into fundamental components, namely face registration, representation, dimensionality reduction and recognition. We discuss the role of these components and highlight the models and new trends that are followed in their design. Moreover, we provide a comprehensive analysis of facial representations by uncovering their advantages and limitations; we elaborate on the type of information they encode and discuss how they deal with the key challenges of illumination variations, registration errors, head-pose variations, occlusions, and identity bias. This survey allows us to identify open issues and to define future directions for designing real-world affect recognition systems.
BibTeX:
@article{sariyanidi2015automatic,
  author = {Sariyanidi, Evangelos and Gunes, Hatice and Cavallaro, Andrea},
  title = {Automatic Analysis of Facial Affect: A Survey of Registration, Representation, and Recognition},
  journal = {Pattern Analysis and Machine Intelligence, IEEE Transactions on},
  publisher = {IEEE},
  year = {2015},
  volume = {37},
  number = {6},
  pages = {1113--1133},
  url = {http://www.researchgate.net/profile/Hatice_Gunes2/publication/267502070_Automatic_Analysis_of_Facial_Affect_A_Survey_of_Registration_Representation_and_Recognition/links/555349bb08aeaaff3bf0b945.pdf},
  doi = {10.1109/TPAMI.2014.2366127}
}
Yang H, He X, Jia X and Patras I (2015), "Robust Face Alignment Under Occlusion via Regional Predictive Power Estimation", Image Processing, IEEE Transactions on. August, 2015. Vol. 24(8), pp. 2393-2403. IEEE.
Abstract: Face alignment has been well studied in recent years, however, when a face alignment model is applied on facial images with heavy partial occlusion, the performance deteriorates significantly. In this paper, instead of training an occlusion-aware model with visibility annotation, we address this issue via a model adaptation scheme that uses the result of a local regression forest (RF) voting method. In the proposed scheme, the consistency of the votes of the local RF in each of several oversegmented regions is used to determine the reliability of predicting the location of the facial landmarks. The latter is what we call regional predictive power (RPP). Subsequently, we adapt a holistic voting method (cascaded pose regression based on random ferns) by putting weights on the votes of each fern according to the RPP of the regions used in the fern tests. The proposed method shows superior performance over existing face alignment models in the most challenging data sets (COFW and 300-W). Moreover, it can also estimate with high accuracy (72.4% overlap ratio) which image areas belong to the face or nonface objects, on the heavily occluded images of the COFW data set, without explicit occlusion modeling.
BibTeX:
@article{yang2015robust,
  author = {Yang, Heng and He, Xuming and Jia, Xuhui and Patras, Ioannis},
  title = {Robust Face Alignment Under Occlusion via Regional Predictive Power Estimation},
  journal = {Image Processing, IEEE Transactions on},
  publisher = {IEEE},
  year = {2015},
  volume = {24},
  number = {8},
  pages = {2393--2403},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7084187},
  doi = {10.1109/TIP.2015.2421438}
}
Yang H, Jia X, Patras I and Chan K-P (2015), "Random Subspace Supervised Descent Method for Regression Problems in Computer Vision", Signal Processing Letters, IEEE. October, 2015. Vol. 22(10), pp. 1816-1820. IEEE.
Abstract: Supervised Descent Method (SDM) has shown good performance in solving non-linear least squares problems in computer vision, giving state of the art results for the problem of face alignment. However, when SDM learns the generic descent maps, it is very difficult to avoid over-fitting due to the high dimensionality of the input features. In this paper we propose a Random Subspace SDM (RSSDM) that maintains the high accuracy on the training data and improves the generalization accuracy. Instead of using all the features for descent learning at each iteration, we randomly select sub-sets of the features and learn an ensemble of descent maps in the corresponding subspaces, one in each subspace. Then, we average the ensemble of descents to calculate the update of the iteration. We test the proposed methods on two representative regression problems, namely, 3D pose estimation and face alignment, and show that RSSDM consistently outperforms SDM in both tasks in terms of accuracy (e.g. RSSDM is able to localize 4% more landmarks at an error level of 0.1 on the challenging iBug dataset). RSSDM also holds several useful generalization properties: 1) it is more effective when the number of training samples is small (with 3 Monte-Carlo permutations RSSDM can achieve similar performance to SDM with 9 Monte-Carlo permutations); 2) it is less sensitive to changes in the strength of the regularization (when the regularization parameter is made 10 times larger, the mean error increases by 9.0% for SDM vs. 3.4% for RSSDM).
BibTeX:
@article{yang2015random,
  author = {Yang, Heng and Jia, Xuhui and Patras, Ioannis and Chan, Kwok-Ping},
  title = {Random Subspace Supervised Descent Method for Regression Problems in Computer Vision},
  journal = {Signal Processing Letters, IEEE},
  publisher = {IEEE},
  year = {2015},
  volume = {22},
  number = {10},
  pages = {1816--1820},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/SPL2.pdf},
  doi = {10.1109/LSP.2015.2437883}
}
Yang H and Patras I (2015), "Fine-Tuning Regression Forests Votes for Object Alignment in the Wild", Image Processing, IEEE Transactions on. February, 2015. Vol. 24(2), pp. 619-631. IEEE.
Abstract: In this paper, we propose an object alignment method that detects the landmarks of an object in 2D images. In the regression forests (RFs) framework, observations (patches) that are extracted at several image locations cast votes for the localization of several landmarks. We propose to refine the votes before accumulating them into the Hough space, by sieving and/or aggregating. In order to filter out false positive votes, we pass them through several sieves, each associated with a discrete or continuous latent variable. The sieves filter out votes that are not consistent with the latent variable in question, something that implicitly enforces global constraints. In order to aggregate the votes when necessary, we adjust on-the-fly a proximity threshold by applying a classifier on middle-level features extracted from voting maps for the object landmark in question. Moreover, our method is able to predict the unreliability of an individual object landmark. This information can be useful for subsequent object analysis like object recognition. Our contributions are validated for two object alignment tasks, face alignment and car alignment, on data sets with challenging images collected in the wild, i.e. the Labeled Face in the Wild, the Annotated Facial Landmarks in the Wild, and the street scene car data set. We show that with the proposed approach, and without explicitly introducing shape models, we obtain performance superior or close to the state of the art for both tasks.
BibTeX:
@article{yang2015fine,
  author = {Yang, Heng and Patras, Ioannis},
  title = {Fine-Tuning Regression Forests Votes for Object Alignment in the Wild},
  journal = {Image Processing, IEEE Transactions on},
  publisher = {IEEE},
  year = {2015},
  volume = {24},
  number = {2},
  pages = {619--631},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6987359},
  doi = {10.1109/TIP.2014.2383325}
}
Yang H and Patras I (2015), "Privileged Information-based Conditional Structured Output Regression Forest for Facial Point Detection", Circuits and Systems for Video Technology, IEEE Transactions on. September, 2015. Vol. 25(9), pp. 1507-1520. IEEE.
Abstract: This paper introduces a regression method, called Privileged Information-based Conditional Structured Output Regression Forest (PI-CSORF), for facial point detection. In order to train Regression Forests more efficiently, the method utilizes both privileged information, that is side information that is available only during training, such as head pose or gender, and shape constraints on the location of the facial points. We propose to select the test functions at some randomly chosen internal tree nodes according to the information gain calculated on the privileged information. In this way, the training patches that arrive at leaves tend to have low variance both in terms of their displacements in relation to the facial points and in terms of the privileged information. At each leaf node, we learn three models: first, a probabilistic model of the pdf of the privileged information; second, a probabilistic regression model for the locations of the facial points; and third, shape models that model the interdependencies of the locations of neighbouring facial points in a predefined structure graph. Both of the latter two are conditioned on the privileged information. During testing, the marginal probability of the privileged information is estimated and the facial point locations are estimated using the appropriate conditional regression and shape models. The proposed method is validated and compared with very recent methods, especially those that use Regression Forests, on datasets recorded in controlled and uncontrolled environments, namely the BioID, the Labelled Faces in the Wild, the Labelled Face Parts in the Wild and the Annotated Facial Landmarks in the Wild.
BibTeX:
@article{yangcsvt2015,
  author = {Yang, Heng and Patras, Ioannis},
  title = {Privileged Information-based Conditional Structured Output Regression Forest for Facial Point Detection},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {2015},
  volume = {25},
  number = {9},
  pages = {1507--1520},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7005459},
  doi = {10.1109/TCSVT.2015.2389492}
}
Yang H, Zou C and Patras I (2015), "Cascade of forests for face alignment", Computer Vision, IET. May, 2015. Vol. 9(3), pp. 321-330. IET.
Abstract: In this study, we propose a regression forests-based cascaded method for face alignment. We build on the cascaded pose regression (CPR) framework and propose to use the regression forest as a primitive regressor. The regression forests are easier to train and naturally handle the over-fitting problem via averaging the outputs of the trees at each stage. We address the fact that the CPR approaches are sensitive to the shape initialisation; in contrast to using a number of blind initialisations and selecting the median values, we propose an intelligent shape initialisation scheme. More specifically, a large number of initialisations are propagated to a few early stages in the cascade, then only a proportion of them are propagated to the remaining cascades according to their convergence measurement. We evaluate the performance of the proposed approach on the challenging face alignment in the wild database and obtain superior or comparable performance with the state-of-the-art, in spite of the fact that we have utilised only the freely available public training images. More importantly, we show that the intelligent initialisation scheme makes the CPR framework more robust to unreliable initialisations that are typically produced by different face detections.
BibTeX:
@article{yang2015cascade,
  author = {Yang, Heng and Zou, Changqing and Patras, Ioannis},
  title = {Cascade of forests for face alignment},
  journal = {Computer Vision, IET},
  publisher = {IET},
  year = {2015},
  volume = {9},
  number = {3},
  pages = {321--330},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/cascadeofforest.pdf},
  doi = {10.1049/iet-cvi.2014.0085}
}

Books and Chapters in Books

Sariyanidi E, Gunes H and Cavallaro A (2015), "Probabilistic Subpixel Temporal Registration for Facial Expression Analysis", In Computer Vision (ACCV 2014). Singapore, November, 2014. Vol. 4, pp. 320-335. Springer.
Abstract: Face images in a video sequence should be registered accurately before any analysis, otherwise registration errors may be interpreted as facial activity. Subpixel accuracy is crucial for the analysis of subtle actions. In this paper we present PSTR (Probabilistic Subpixel Temporal Registration), a framework that achieves high registration accuracy. Inspired by the human vision system, we develop a motion representation that measures registration errors among subsequent frames, a probabilistic model that learns the registration errors from the proposed motion representation, and an iterative registration scheme that identifies registration failures thus making PSTR aware of its errors. We evaluate PSTR's temporal registration accuracy on facial action and expression datasets, and demonstrate its ability to generalise to naturalistic data even when trained with controlled data.
BibTeX:
@incollection{sariyanidi2015probabilistic,
  author = {Sariyanidi, Evangelos and Gunes, Hatice and Cavallaro, Andrea},
  editor = {Cremers, Daniel and Reid, Ian and Saito, Hideo and Yang, Ming-Hsuan},
  title = {Probabilistic Subpixel Temporal Registration for Facial Expression Analysis},
  booktitle = {Computer Vision (ACCV 2014)},
  publisher = {Springer},
  year = {2015},
  volume = {4},
  pages = {320--335},
  note = {google scholar entry: 12th Asian Conference on Computer Vision (ACCV 2014). Singapore, 1-5 November 2014.},
  url = {http://www.eecs.qmul.ac.uk/~hatice/SariyanidiEtAl-ACCV2014.pdf},
  doi = {10.1007/978-3-319-16817-3_21}
}

Conference Papers

Blasi SG, Macchiavello B, Hung EM, Zupancic I and Izquierdo E (2015), "Context adaptive mode sorting for fast HEVC mode decision", In Image Processing (ICIP), 2015 22nd IEEE International Conference on. Québec City, Quebec, September, 2015. IEEE.
BibTeX:
@inproceedings{blasi2015context,
  author = {Blasi, Saverio G. and Macchiavello, B. and Hung, E. M. and Zupancic, Ivan and Izquierdo, Ebroul},
  title = {Context adaptive mode sorting for fast HEVC mode decision},
  booktitle = {Image Processing (ICIP), 2015 22nd IEEE International Conference on},
  publisher = {IEEE},
  year = {2015},
  note = {google scholar entry: 22nd International Conference on Image Processing (ICIP 2015). Québec City, Quebec, 27-30 September 2015.}
}
Blasi SG, Zupancic I, Izquierdo E and Peixoto E (2015), "Adaptive Precision Motion Estimation for HEVC Coding", In Proceedings of the 2015 Picture Coding Symposium (PCS). Cairns, Australia, May, 2015, pp. 144-148. IEEE.
Abstract: Most video coding standards, including the state-of-the-art High Efficiency Video Coding (HEVC), make use of sub-pixel Motion Estimation (ME) with Motion Vectors (MV) at fractional precisions to achieve high compression ratios. Unfortunately, sub-pixel ME comes at very high computational costs due to the interpolation step and additional motion searches. In this paper, a fast sub-pixel ME algorithm is proposed. The MV precision is adaptively selected on each block to skip the half or quarter precision steps when not needed. The algorithm bases the decision on local features, such as the behaviour of the residual error samples, and global features, such as the amount of edges in the pictures. Experimental results show that the method reduces total encoding time by up to 17.6% compared to conventional HEVC, at modest efficiency losses.
BibTeX:
@inproceedings{blasi2015adaptive,
  author = {Blasi, Saverio G. and Zupancic, Ivan and Izquierdo, Ebroul and Peixoto, Eduardo},
  title = {Adaptive Precision Motion Estimation for HEVC Coding},
  booktitle = {Proceedings of the 2015 Picture Coding Symposium (PCS)},
  publisher = {IEEE},
  year = {2015},
  pages = {144--148},
  note = {google scholar entry: 2015 Picture Coding Symposium (PCS). Cairns, Australia, 31 May - 3 June 2015.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/blasi2015adaptive.pdf},
  doi = {10.1109/PCS.2015.7170064}
}
Blasi SG, Zupancic I, Izquierdo E and Peixoto E (2015), "Fast HEVC Coding using Reverse CU Visiting", In Proceedings of the 2015 Picture Coding Symposium (PCS). Cairns, Australia, May, 2015, pp. 50-54. IEEE.
Abstract: The High Efficiency Video Coding (HEVC) standard makes use of flexible partitioning to achieve high compression ratios. Each frame is divided in Coding Tree Units (CTUs) of fixed size, which are further partitioned into Coding Units (CUs) following a recursive quadtree structure. Typically, CUs at each level of recursion are tested to select the optimal coding configuration, hence this process is extremely demanding in terms of computational complexity. In this paper, a method to reduce complexity of HEVC quadtree configuration selection and mode decision is presented, based on a reverse bottom-to-top visiting order of CUs in the quadtree. By visiting smallest CUs first, information can be extracted to make decisions on larger CUs. The encoder adaptively selects whether a CTU is encoded using the reverse CU visiting, allowing for considerably faster encoding under all conditions. Experimental results show that the algorithm achieves on average 21% speedups over previous state-of-the-art fast HEVC algorithms, and up to 36% for some sequences, at very limited efficiency losses.
BibTeX:
@inproceedings{blasi2015fast,
  author = {Blasi, Saverio G. and Zupancic, Ivan and Izquierdo, Ebroul and Peixoto, Eduardo},
  title = {Fast HEVC Coding using Reverse CU Visiting},
  booktitle = {Proceedings of the 2015 Picture Coding Symposium (PCS)},
  publisher = {IEEE},
  year = {2015},
  pages = {50--54},
  note = {google scholar entry: 2015 Picture Coding Symposium (PCS). Cairns, Australia, 31 May - 3 June 2015.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/blasi2015fast},
  doi = {10.1109/PCS.2015.7170045}
}
Bozas K and Izquierdo E (2015), "Horizontal Flip-Invariant Sketch Recognition via Local Patch Hashing", In Acoustics Speech and Signal Processing (ICASSP 2015), Proceedings of the 2015 IEEE International Conference on. Brisbane (Queensland), Australia, April, 2015, pp. 1146-1150. IEEE.
Abstract: This paper introduces a flip-aware patch matching framework that facilitates scalable sketch recognition. An overlapping spatial grid is utilized to generate an ensemble of patches for each sketch. We rank similarities between freely drawn sketches via a spatial voting process where similar patches in terms of shape and structure arbitrate for the result. Patch similarity is efficiently estimated via the min-hash algorithm. A novel spatial aware reverse index structure ensures the scalability of our scheme. We show the benefits of horizontal flip invariance and structural information in sketch recognition and demonstrate state-of-the-art results in two challenging sketch datasets.
BibTeX:
@inproceedings{bozas2015horizontal,
  author = {Bozas, Konstantinos and Izquierdo, Ebroul},
  title = {Horizontal Flip-Invariant Sketch Recognition via Local Patch Hashing},
  booktitle = {Acoustics Speech and Signal Processing (ICASSP 2015), Proceedings of the 2015 IEEE International Conference on},
  publisher = {IEEE},
  year = {2015},
  pages = {1146--1150},
  note = {google scholar entry: 2015 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015). Brisbane (Queensland), Australia, 19-24 April 2015.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7178149},
  doi = {10.1109/ICASSP.2015.7178149}
}
Henderson C and Izquierdo E (2015), "Robust Feature Matching in the Wild", In Proceedings of the 2015 Science and Information Conference (SAI). London, England, July, 2015. (427), pp. 628-637. IEEE.
Abstract: Finding corresponding key points in images from security camera videos is challenging. Images are generally low quality and acquired in uncontrolled conditions with visual distortions caused by weather, crowded scenes, emergency lighting or the high angle of the camera mounting. We describe a methodology to match features between images that performs especially well with real-world images. We introduce a novel blur sensitive feature detection method, a combinatorial feature descriptor and a distance calculation that efficiently unites texture and colour attributes to discriminate feature correspondence in low quality images. Our methods are tested by performing key point matching on real-world security images such as outdoor CCTV videos, and we demonstrate an improvement in the ability to match features between images compared with the standard feature descriptors extracted from the same set of feature points. We use key point features from Harris Corners, SIFT, SURF, BRISK and FAST as well as MSER and MSCR region detectors to provide a comprehensive analysis of our generic method. We demonstrate feature matching using a 138-dimensional descriptor that improves the matching performance of a state-of-the-art 384-dimension colour descriptor with just 40% of the storage requirements.
BibTeX:
@inproceedings{henderson2015robust,
  author = {Henderson, Craig and Izquierdo, Ebroul},
  title = {Robust Feature Matching in the Wild},
  booktitle = {Proceedings of the 2015 Science and Information Conference (SAI)},
  publisher = {IEEE},
  year = {2015},
  number = {427},
  pages = {628--637},
  note = {google scholar entry: 2015 Science and Information Conference (SAI). London, England, 28-30 July 2015.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Henderson2015robust(SAI2015).pdf},
  doi = {10.1109/SAI.2015.7237208}
}
Huang S, Izquierdo E and Hao P (2015), "Optimized Packet Scheduling for Live Streaming on Peer-to-Peer Network with Network Coding", In Communications (ICC), 2015 IEEE International Conference on. London, England, June, 2015.
Abstract: This paper proposes an optimized packet scheduling algorithm for live peer-to-peer streaming systems, in which network coding is used to improve the efficiency of bandwidth utilization. We identify a problem of undesirable non-innovative packet transmission due to the latency of buffer-map updates among peers, which, with many previously proposed techniques, leads to bandwidth inefficiencies. In the proposed system, we introduce an optimized packet scheduling algorithm that forecasts the number of required packets at parent nodes in advance, a selection mechanism for selecting and encoding packets for forwarding, and an adaptive push algorithm for smart frame skipping. The proposed packet scheduling algorithm calculates the minimum number of packets required for full-rate transmission. Transmitting according to the scheduled results reduces non-innovative packet transmission and saves the limited bandwidth for innovative transmission, thereby improving streaming continuity. The simulation results show that the proposed scheme provides significantly better video quality and delivery ratio, a lower redundancy rate, and a higher innovative video packet rate compared with previous packet scheduling algorithms.
BibTeX:
@inproceedings{Huang2015,
  author = {Huang, Shenglan and Izquierdo, Ebroul and Hao, Pengwei},
  title = {Optimized Packet Scheduling for Live Streaming on Peer-to-Peer Network with Network Coding},
  booktitle = {Communications (ICC), 2015 IEEE International Conference on},
  year = {2015},
  note = {google scholar entry: IEEE International Conference on Communications (ICC 2015). London, England, 8-12 June 2015.}
}
Kalpakis G, Tsikrika T, Markatopoulou F, Pittaras N, Vrochidis S, Mezaris V, Patras I and Kompatsiaris I (2015), "Concept Detection on Multimedia Web Resources about Home Made Explosives", In Proceedings of the International Workshop on Multimedia Forensics and Security (MFSec 2015). August, 2015.
Abstract: This work investigates the effectiveness of a state-of-the-art concept detection framework for the automatic classification of multimedia content, namely images and videos, embedded in publicly available Web resources containing recipes for the synthesis of Home Made Explosives (HMEs), to a set of predefined semantic concepts relevant to the HME domain. The concept detection framework employs advanced methods for video (shot) segmentation, visual feature extraction (using SIFT, SURF, and their variations), and classification based on machine learning techniques (logistic regression). The evaluation experiments are performed using an annotated collection of multimedia HME content discovered on the Web, and a set of concepts which emerged from an empirical study and were also provided by domain experts and interested stakeholders, including Law Enforcement Agencies personnel. The experiments demonstrate the satisfactory performance of our framework, which in turn indicates the significant potential of the adopted approaches on the HME domain.
BibTeX:
@inproceedings{kalpakis2015concept,
  author = {Kalpakis, George and Tsikrika, Theodora and Markatopoulou, Foteini and Pittaras, Nikiforos and Vrochidis, Stefanos and Mezaris, Vasileios and Patras, Ioannis and Kompatsiaris, Ioannis},
  title = {Concept Detection on Multimedia Web Resources about Home Made Explosives},
  booktitle = {Proceedings of the International Workshop on Multimedia Forensics and Security (MFSec 2015)},
  year = {2015},
  note = {google scholar entry: 2015 International Workshop on Multimedia Forensics and Security (MFSec 2015). Toulouse, France, 24-28 August 2015.},
  url = {http://mklab2.iti.gr/content/concept-detection-multimedia-web-resources-about-home-made-explosives}
}
Markatopoulou F, Mezaris V and Patras I (2015), "Cascade of classifiers based on Binary, Non-binary and Deep Convolutional Network descriptors for video concept detection", In Image Processing (ICIP), 2015 IEEE International Conference on. September, 2015. IEEE.
Abstract: In this paper we propose a cascade architecture that can be used to train and combine different visual descriptors (local binary, local non-binary and Deep Convolutional Neural Network-based) for video concept detection. The proposed architecture is computationally more efficient than typical state-of-the-art video concept detection systems, without affecting the detection accuracy. In addition, this work presents a detailed study on combining descriptors based on Deep Convolutional Neural Networks with other popular local descriptors, both within a cascade and when using different late-fusion schemes. We evaluate our methods on the extensive video dataset of the 2013 TRECVID Semantic Indexing Task.
BibTeX:
@inproceedings{markatopoulou2015cascade,
  author = {Markatopoulou, Foteini and Mezaris, Vasileios and Patras, Ioannis},
  title = {Cascade of classifiers based on Binary, Non-binary and Deep Convolutional Network descriptors for video concept detection},
  booktitle = {Image Processing (ICIP), 2015 IEEE International Conference on},
  publisher = {IEEE},
  year = {2015},
  note = {google scholar entry: Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP 2015). Quebec City, Quebec, 27-30 September 2015.}
}
Markatopoulou F, Pittaras N, Papadopoulou O, Mezaris V and Patras I (2015), "A study on the use of a binary local descriptor and color extensions of local descriptors for video concept detection", In MultiMedia Modeling. Proceedings of the 21st International Conference (MMM 2015). Part I. January, 2015. Vol. 8935, pp. 282-293. Springer.
Abstract: In this work we deal with the problem of how different local descriptors can be extended, used and combined for improving the effectiveness of video concept detection. The main contributions of this work are: 1) We examine how effectively a binary local descriptor, namely ORB, which was originally proposed for similarity matching between local image patches, can be used in the task of video concept detection. 2) Based on a previously proposed paradigm for introducing color extensions of SIFT, we define in the same way color extensions for two other non-binary or binary local descriptors (SURF, ORB), and we experimentally show that this is a generally applicable paradigm. 3) In order to enable the efficient use and combination of these color extensions within a state-of-the-art concept detection methodology (VLAD), we study and compare two possible approaches for reducing the color descriptor’s dimensionality using PCA. We evaluate the proposed techniques on the dataset of the 2013 Semantic Indexing Task of TRECVID.
BibTeX:
@inproceedings{markatopoulou2015study,
  author = {Markatopoulou, Foteini and Pittaras, Nikiforos and Papadopoulou, Olga and Mezaris, Vasileios and Patras, Ioannis},
  editor = {He, Xiangjian and Luo, Suhuai and Tao, Dacheng and Xu, Changsheng and Yang, Jie and Hasan, Muhammad Abul},
  title = {A study on the use of a binary local descriptor and color extensions of local descriptors for video concept detection},
  booktitle = {MultiMedia Modeling. Proceedings of the 21st International Conference (MMM 2015). Part I},
  publisher = {Springer},
  year = {2015},
  volume = {8935},
  pages = {282--293},
  note = {google scholar entry: 21st International Multimedia Modeling Conference (MMM 2015). Sydney, Australia, 5-7 January 2015.},
  url = {http://link.springer.com/chapter/10.1007%2F978-3-319-14445-0_25},
  doi = {10.1007/978-3-319-14445-0_25}
}
Naccari M, Gabriellini A, Mrak M, Blasi SG, Zupancic I and Izquierdo E (2015), "HEVC Coding Optimisation for Ultra High Definition Television Services", In Proceedings of the 2015 Picture Coding Symposium (PCS). Cairns, Australia, May, 2015, pp. 20-24. IEEE.
Abstract: Ultra High Definition TV (UHDTV) services are being trialled while UHD streaming services have already seen commercial débuts. The amount of data associated with these new services is very high, thus extremely efficient video compression tools are required for delivery to the end user. The recently published High Efficiency Video Coding (HEVC) standard promises a new level of compression efficiency, up to 50% better than its predecessor, Advanced Video Coding (AVC). The greater efficiency in HEVC is obtained at much greater computational cost compared to AVC. A practical encoder must optimise the choice of coding tools and devise strategies to reduce the complexity without affecting the compression efficiency. This paper describes the results of a study aimed at optimising HEVC encoding for UHDTV content. The study first reviews the available HEVC coding tools to identify the best configuration before developing three new algorithms to further reduce the computational cost. The proposed optimisations can provide an additional 11.5% encoder speed-up for an average 3.1% bitrate increase on top of the best encoder configuration.
BibTeX:
@inproceedings{naccari2015hevc,
  author = {Naccari, Matteo and Gabriellini, Andrea and Mrak, Marta and Blasi, Saverio G. and Zupancic, Ivan and Izquierdo, Ebroul},
  title = {HEVC Coding Optimisation for Ultra High Definition Television Services},
  booktitle = {Proceedings of the 2015 Picture Coding Symposium (PCS)},
  publisher = {IEEE},
  year = {2015},
  pages = {20--24},
  note = {google scholar entry: 2015 Picture Coding Symposium (PCS). Cairns, Australia, 31 May - 3 June 2015.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/naccari2015hevc.pdf},
  doi = {10.1109/PCS.2015.7170039}
}
Palasek P, Yang H, Xu Z, Hajimirza N, Izquierdo E and Patras I (2015), "A Flexible Calibration Method of Multiple Kinects for 3D Human Reconstruction", In 2015 IEEE International Conference on Multimedia and Expo (ICME 2015). Torino, Italy, June, 2015, pp. 1-4. IEEE.
Abstract: In this paper, we present a simple yet effective calibration method for multiple Kinects, i.e. a method that finds the relative position of RGB-depth cameras, as opposed to conventional methods that find the relative position of RGB cameras. We first find the mapping function between the RGB camera and the depth camera mounted on one Kinect. With such a mapping function, we propose a scheme that is able to estimate the 3D coordinates of the extracted corners from a standard calibration chessboard. To this end, we are able to build the 3D correspondences between two Kinects directly. This simplifies the calibration to a simple Least-Square Minimization problem with very stable solution. Furthermore, by using two mirrored chessboard images on a thin board, we are able to calibrate two Kinects facing each other, something that is intractable using traditional calibration methods. We demonstrate our proposed method with real data and show very accurate calibration results, namely less than 7mm reconstruction error for objects at a distance of 1.5m, using around 7 frames for calibration.
BibTeX:
@inproceedings{palasek2015flexible,
  author = {Palasek, Petar and Yang, Heng and Xu, Zongyi and Hajimirza, Navid and Izquierdo, Ebroul and Patras, Ioannis},
  title = {A Flexible Calibration Method of Multiple Kinects for 3D Human Reconstruction},
  booktitle = {2015 IEEE International Conference on Multimedia and Expo (ICME 2015)},
  publisher = {IEEE},
  year = {2015},
  pages = {1--4},
  note = {google scholar entry: IEEE International Conference on Multimedia and Expo (ICME 2015). Torino, Italy, 29 June - 3 July 2015.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Conference/ICME2015_Paper238EU.pdf},
  doi = {10.1109/ICMEW.2015.7169829}
}
Rivera FM, Fuijk F and Izquierdo E (2015), "Navigation in REVERIE's virtual environments", In Proceedings of the 2015 IEEE Virtual Reality Conference (VR 2015). March, 2015, pp. 273-274. IEEE.
Abstract: This work presents a novel navigation system for social collaborative virtual environments populated with multiple characters. The navigation system ensures collision-free movement of avatars and agents. It supports direct user manipulation, automated path planning, positioning to get seated, and follow-me behaviour for groups. In follow-me mode, the socially aware system manages the mise en place of individuals within a group. A use case centred on an educational virtual trip to the European Parliament, created for the REVERIE FP7 project, also serves as an example to bring forward aspects of such navigational requirements.
BibTeX:
@inproceedings{rivera2015navigation,
  author = {Rivera, Fiona M. and Fuijk, Fons and Izquierdo, Ebroul},
  editor = {Höllerer, Tobias and Interrante, Victoria and Lécuyer, Anatole and Swan, II, J. Edward},
  title = {Navigation in REVERIE's virtual environments},
  booktitle = {Proceedings of the 2015 IEEE Virtual Reality Conference (VR 2015)},
  publisher = {IEEE},
  year = {2015},
  pages = {273--274},
  note = {google scholar entry: 2015 IEEE Virtual Reality (VR 2015). Arles, France, 23-27 March 2015.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7223401},
  doi = {10.1109/VR.2015.7223401}
}
Yang H and Patras I (2015), "Mirror, Mirror on the Wall, Tell Me, Is the Error Small?", In The Conference on Computer Vision and Pattern Recognition (CVPR 2015). Boston, Massachusetts, June, 2015, pp. 4685-4693.
Abstract: Do object part localization methods produce bilaterally symmetric results on mirror images? Surprisingly not, even though state of the art methods augment the training set with mirrored images. In this paper we take a closer look into this issue. We first introduce the concept of mirrorability as the ability of a model to produce symmetric results in mirrored images and introduce a corresponding measure, namely the mirror error, defined as the difference between the detection result on an image and the mirror of the detection result on its mirror image. We evaluate the mirrorability of several state of the art algorithms in two of the most intensively studied problems, namely human pose estimation and face alignment. Our experiments lead to several interesting findings: 1) most state-of-the-art methods struggle to preserve mirror symmetry, despite the fact that they have very similar overall performance on the original and mirror images; 2) the low mirrorability is not caused by training or testing sample bias - all algorithms are trained on both the original images and their mirrored versions; 3) the mirror error is strongly correlated to the localization/alignment error (with correlation coefficients around 0.7). Since the mirror error is calculated without knowledge of the ground truth, we show two interesting applications: in the first it is used to guide the selection of difficult samples, and in the second to give feedback in a popular Cascaded Pose Regression method for face alignment.
BibTeX:
@inproceedings{yangcvpr2015,
  author = {Yang, Heng and Patras, Ioannis},
  title = {Mirror, Mirror on the Wall, Tell Me, Is the Error Small?},
  booktitle = {The Conference on Computer Vision and Pattern Recognition (CVPR 2015)},
  year = {2015},
  pages = {4685--4693},
  note = {google scholar entry: 2015 Conference on Computer Vision and Pattern Recognition (CVPR 2015). Boston, Massachusetts, 7-12 June 2015.},
  url = {http://www.cv-foundation.org/openaccess/CVPR2015.py}
}
Zupancic I, Blasi SG and Izquierdo E (2015), "Inter-Prediction Optimisations for Fast HEVC Encoding of Ultra High Definition Content", In Proceedings of the 22nd International Workshop on Systems, Signals and Image Processing (IWSSIP 2015). London, England, September, 2015.
BibTeX:
@inproceedings{zupancic2015inter,
  author = {Zupancic, Ivan and Blasi, Saverio G. and Izquierdo, Ebroul},
  title = {Inter-Prediction Optimisations for Fast HEVC Encoding of Ultra High Definition Content},
  booktitle = {Proceedings of the 22nd International Workshop on Systems, Signals and Image Processing (IWSSIP 2015)},
  year = {2015},
  note = {google scholar entry: 22nd International Workshop on Systems, Signals and Image Processing (IWSSIP 2015). London, England, 10-12 September 2015.}
}
Zupancic I, Blasi SG and Izquierdo E (2015), "Multiple Early Termination for Fast HEVC Coding of UHD Content", In Acoustics, Speech and Signal Processing (ICASSP 2015), Proceedings of the 2015 IEEE International Conference on. Brisbane, Australia, April, 2015, pp. 1419-1423. IEEE.
Abstract: The recently ratified High Efficiency Video Coding (HEVC) standard is significantly outperforming previous video coding standards in terms of compression efficiency. However, this comes at the cost of very high computational complexity, which may limit its real-time usage, particularly when targeting Ultra High Definition (UHD) applications. In this paper, an analysis of HEVC coding on UHD content is presented, showing that on average more than 18% of the total encoding time is spent performing uni-directional Motion Estimation (ME) even when using fast algorithms such as Enhanced Predictive Zonal Search (EPZS). In order to speed up the ME process, a novel approach for fast inter prediction is proposed in this paper based on a Multiple Early Termination (MET) decision process. EPZS is only performed in blocks in which it is needed based on local features of the encoded content, or it is skipped otherwise. Experimental results show that the algorithm achieves on average 9.3% speed-ups over conventional HEVC, at the cost of very small BD-rate losses.
BibTeX:
@inproceedings{zupancic2015multiple,
  author = {Zupancic, Ivan and Blasi, Saverio G. and Izquierdo, Ebroul},
  title = {Multiple Early Termination for Fast HEVC Coding of UHD Content},
  booktitle = {Acoustics, Speech and Signal Processing (ICASSP 2015), Proceedings of the 2015 IEEE International Conference on},
  publisher = {IEEE},
  year = {2015},
  pages = {1419--1423},
  note = {google scholar entry: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015). Brisbane, Australia, 19-24 April 2015.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/zupancic2015multiple.pdf},
  doi = {10.1109/ICASSP.2015.7178204}
}

Presentations, Posters and Technical Reports

Henderson C and Izquierdo E (2015), "Look this way". May, 2015.
Abstract: We observe that invariance to horizontal image orientation -- reflection invariance -- has not received any attention in contemporary research, and suggest it should be an important metric in measuring the success of computer vision algorithms and applications.
BibTeX:
@misc{henderson2015look,
  author = {Henderson, Craig and Izquierdo, Ebroul},
  title = {Look this way},
  booktitle = {2015 Research Showcase},
  publisher = {Queen Mary University of London},
  year = {2015},
  note = {Poster},
  url = {http://www.eecs.qmul.ac.uk/assets/file/uploads/event-download-file/posterEECSResearchShowcase_CesarPantoja.pdf}
}
Huang S, Izquierdo E and Hao P (2015), "Optimized Live Streaming on P2P Network with Network Coding".
BibTeX:
@misc{huang2015optimized,
  author = {Huang, Shenglan and Izquierdo, Ebroul and Hao, Pengwei},
  title = {Optimized Live Streaming on P2P Network with Network Coding},
  publisher = {QMUL},
  year = {2015},
  note = {Poster},
  url = {http://www.eecs.qmul.ac.uk/assets/file/uploads/event-download-file/posterEECSResearchShowcase_ShenglanHuang.pdf}
}
Pantoja C (2015), "Knowledge Discovery and Representation for Large Scale Exploitation of Forensic Data". May, 2015.
BibTeX:
@misc{pantoja2015knowledge,
  author = {Pantoja, Cesar},
  title = {Knowledge Discovery and Representation for Large Scale Exploitation of Forensic Data},
  booktitle = {2015 Research Showcase},
  publisher = {Queen Mary University of London},
  year = {2015},
  note = {Poster},
  url = {http://www.eecs.qmul.ac.uk/assets/file/uploads/event-download-file/posterEECSResearchShowcase_CesarPantoja.pdf}
}

Theses and Monographs

Sobhani F, Kahar NF and Zhang Q (2015), "An Ontology Framework for Automated Visual Surveillance System", In Content-Based Multimedia Indexing (CBMI 2015), 13th International Workshop on. Prague, Czech Republic, June, 2015, pp. 1-7. IEEE.
Abstract: This paper presents the analysis and development of a forensic domain ontology to support an automated visual surveillance system. The proposed domain ontology is built on a specific use case based on the severe riots that swept across major UK cities with devastating effects during the summer of 2011. The proposed ontology aims at facilitating the description of activities, entities, relationships, resources and consequences of the event. The study exploits 3.07 TB of data provided by London's Metropolitan Police (Scotland Yard) as part of the European LASIE project. The data has been analyzed and used to guarantee adherence to a real-world application scenario. A `top-down development' approach to the ontology design has been taken. The ontology is also used to demonstrate how high level reasoning can be incorporated into an automated forensic system. Thus, the designed ontology is also the base for future development of knowledge inference in response to domain-specific queries.
BibTeX:
@inproceedings{sobhani2015ontology,
  author = {Sobhani, Faranak and Kahar, Nur Farhan and Zhang, Qianni},
  title = {An Ontology Framework for Automated Visual Surveillance System},
  booktitle = {Content-Based Multimedia Indexing (CBMI 2015), 13th International Workshop on},
  publisher = {IEEE},
  year = {2015},
  pages = {1--7},
  note = {google scholar entry: 13th International Workshop on Content-Based Multimedia Indexing (CBMI 2015). Prague, Czech Republic, 10-12 June 2015.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7153628},
  doi = {10.1109/CBMI.2015.7153628}
}
Henderson C, Blasi SG, Sobhani F and Beckley R (2015), "On the impurity of street-scene video footage", In Proceedings of the 6th International Conference on Imaging for Crime Detection and Prevention (ICDP-15). London, England, July, 2015.
Abstract: The Metropolitan Police in London have found that the opportunity to use computer vision technology in the analysis of real-world street-scene video is severely limited because of the practical constraints in the variety and poor quality of videos available to them. Consequently, in a large criminal investigation, police forces employ numerous officers and volunteers to watch many hours of camera footage to locate, identify and trace the movements of suspects, victims, witnesses, luggage and other inanimate objects. Their goal is to piece together a story of events leading up to an incident, and to determine what happened afterwards. In this paper, we present the technical challenges facing researchers in developing computer vision techniques to process in-the-wild street-scene videos.
BibTeX:
@inproceedings{henderson2015impurity,
  author = {Henderson, Craig and Blasi, Saverio G. and Sobhani, Faranak and Beckley, Richard},
  title = {On the impurity of street-scene video footage},
  booktitle = {Proceedings of the 6th International Conference on Imaging for Crime Detection and Prevention (ICDP-15)},
  year = {2015},
  note = {google scholar entry: 6th International Conference on Imaging for Crime Detection and Prevention (ICDP-15). London, England, 15-17 July 2015.}
}


2014

Journal Papers

Mekuria R, Sanna M, Izquierdo E, Bulterman DC and Cesar P (2014), "Enabling Geometry-Based 3-D Tele-Immersion With Fast Mesh Compression and Linear Rateless Coding", Multimedia, IEEE Transactions on. November, 2014. Vol. 16(7), pp. 1809-1820. IEEE.
Abstract: 3-D tele-immersion (3DTI) enables participants in remote locations to share, in real time, an activity. It offers users interactive and immersive experiences, but it challenges current media-streaming solutions. Work in the past has mainly focused on the efficient delivery of image-based 3-D videos and on realistic rendering and reconstruction of geometry-based 3-D objects. The contribution of this paper is a real-time streaming component for 3DTI with dynamic reconstructed geometry. This component includes both a novel fast compression method and a rateless packet protection scheme specifically designed towards the requirements imposed by real-time transmission of live-reconstructed mesh geometry. Tests on a large dataset show an encoding speed-up of up to ten times at comparable compression ratio and quality, when compared with the high-end MPEG-4 SC3DMC mesh encoders. The implemented rateless code ensures complete packet loss protection of the triangle mesh object and a delivery delay within interactive bounds. Contrary to most linear fountain codes, the designed codec enables real-time progressive decoding, allowing partial decoding each time a packet is received. This approach is compared with transmission over TCP in packet loss rates and latencies typical of managed WAN and MAN networks, and heavily outperforms it in terms of end-to-end delay. The streaming component has been integrated into a larger 3DTI environment that includes state of the art 3-D reconstruction and rendering modules. This resulted in a prototype that can capture, compress, transmit, and render triangle mesh geometry in real time in realistic internet conditions, as shown in experiments. Compared with alternative methods, lower interactive end-to-end delay and frame rates over three times higher are achieved.
BibTeX:
@article{mekuria2014enabling,
  author = {Mekuria, Rufael and Sanna, Michele and Izquierdo, Ebroul and Bulterman, Dick C. and Cesar, Pablo},
  title = {Enabling Geometry-Based 3-D Tele-Immersion With Fast Mesh Compression and Linear Rateless Coding},
  journal = {Multimedia, IEEE Transactions on},
  publisher = {IEEE},
  year = {2014},
  volume = {16},
  number = {7},
  pages = {1809--1820},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6839005},
  doi = {10.1109/TMM.2014.2331919}
}
Yang H, Zou C and Patras I (2014), "Face Sketch Landmarks Localization in the Wild", Signal Processing Letters, IEEE. November, 2014. Vol. 21(11), pp. 1321-1325. IEEE.
Abstract: In this letter, we propose a method for facial landmarks localization in face sketch images. As recent approaches and the corresponding datasets are designed for ordinary face photos, the performance of such models drop significantly when they are applied on face sketch images. We first propose a scheme to synthesize face sketches from face photos based on random-forests edge detection and local face region enhancement. Then we jointly train a Cascaded Pose Regression based method for facial landmarks localization for both face photos and sketches. We build an evaluation dataset, called Face Sketches in the Wild (FSW), with 450 face sketch images collected from the Internet and with the manual annotation of 68 facial landmark locations on each face sketch. The proposed multi-modality facial landmark localization method shows competitive performance on both face sketch images (the FSW dataset) and face photo images (the Labeled Face Parts in the Wild dataset), despite the fact that we do not use extra annotation of face sketches for model building.
BibTeX:
@article{yang2014face,
  author = {Yang, Heng and Zou, Changqing and Patras, Ioannis},
  title = {Face Sketch Landmarks Localization in the Wild},
  journal = {Signal Processing Letters, IEEE},
  publisher = {IEEE},
  year = {2014},
  volume = {21},
  number = {11},
  pages = {1321--1325},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/sketch.pdf},
  doi = {10.1109/LSP.2014.2333544}
}

Books and Chapters in Books

Ionescu B, Benois-Pineau J, Piatrik T and Quénot G (eds.) (2014), "Fusion in Computer Vision". Springer.
BibTeX:
@book{ionescu2014fusion,
  editor = {Ionescu, Bogdan and Benois-Pineau, Jenny and Piatrik, Tomas and Quénot, Georges},
  title = {Fusion in Computer Vision},
  publisher = {Springer},
  year = {2014},
  url = {http://link.springer.com/book/10.1007%2F978-3-319-05696-8},
  doi = {10.1007/978-3-319-05696-8}
}
Fernandez Arguedas V, Zhang Q and Izquierdo E (2014), "Multimodal Fusion in Surveillance Applications", In Fusion in Computer Vision, pp. 161-184. Springer.
Abstract: The recent outbreak of vandalism, accidents and criminal activities has increased the general public's awareness about safety and security, demanding improved security measures. Smart surveillance video systems have become a ubiquitous platform which monitors private and public environments, ensuring citizens' well-being. Their universal deployment integrates diverse media and acquisition systems, generating an enormous amount of multimodal data daily. Nowadays, numerous surveillance applications exploit multiple types of data and features, benefitting from their uncorrelated contributions. Hence, the analysis, standardisation and fusion of complex content, especially visual, have become a fundamental problem in enhancing surveillance systems by increasing their accuracy, robustness and reliability. In this chapter, an exhaustive survey of the existing multimodal fusion techniques and their applications in surveillance is provided. Addressing some of the challenges revealed by the state of the art, this chapter focuses on the development of a multimodal fusion technique for automatic surveillance object classification. The proposed fusion technique exploits the benefits of a Bayesian inference scheme to enhance surveillance systems' performance. The chapter ends with an evaluation of the proposed Bayesian-based multimodal object classifier against two state-of-the-art object classifiers to demonstrate the benefits of multimodal fusion in surveillance applications.
BibTeX:
@incollection{arguedas2014multimodal,
  author = {Fernandez Arguedas, Virginia and Zhang, Qianni and Izquierdo, Ebroul},
  editor = {Ionescu, Bogdan and Benois-Pineau, Jenny and Piatrik, Tomas and Quénot, Georges},
  title = {Multimodal Fusion in Surveillance Applications},
  booktitle = {Fusion in Computer Vision},
  publisher = {Springer},
  year = {2014},
  pages = {161--184},
  url = {http://link.springer.com/chapter/10.1007/978-3-319-05696-8_7},
  doi = {10.1007/978-3-319-05696-8_7}
}
Roy K and Rivera F (2014), "Sams Teach Yourself Maya in 24 Hours", pp. 512. Pearson.
BibTeX:
@book{roy2013sams,
  author = {Roy, Kenny and Rivera, Fiona},
  title = {Sams Teach Yourself Maya in 24 Hours},
  publisher = {Pearson},
  year = {2014},
  pages = {512},
  note = {non-mmv},
  url = {http://www.informit.com/store/maya-in-24-hours-sams-teach-yourself-9780672336836}
}

Conference Papers

Badii A, Ebrahimi T, Fedorczak C, Korshunov P, Piatrik T, Eiselein V and Al-Obaidi AA (2014), "Overview of the MediaEval 2014 Visual Privacy Task", In Working Notes Proceedings of the MediaEval 2014 Workshop. Barcelona, Catalonia, October, 2014. Vol. 1263, pp. 1-2. CEUR-WS.org.
BibTeX:
@inproceedings{Badii2014,
  author = {Badii, Atta and Ebrahimi, Touradj and Fedorczak, Christian and Korshunov, Pavel and Piatrik, Tomas and Eiselein, Volker and Al-Obaidi, Ahmed A.},
  editor = {Larson, Martha A. and Ionescu, Bogdan and Anguera, Xavier and Eskevich, Maria and Korshunov, Pavel and Schedl, Markus and Soleymani, Mohammad and Petkos, Georgios and Sutcliffe, Richard F. E. and Choi, Jaeyoung and Jones, Gareth J. F.},
  title = {Overview of the MediaEval 2014 Visual Privacy Task},
  booktitle = {Working Notes Proceedings of the MediaEval 2014 Workshop},
  publisher = {CEUR-WS.org},
  year = {2014},
  volume = {1263},
  pages = {1--2},
  note = {google scholar entry: 2014 Multimedia Benchmark Workshop (MediaEval 2014). Barcelona, Catalonia, 16-17 October 2014.},
  url = {http://ceur-ws.org/Vol-1263/mediaeval2014_submission_37.pdf}
}
Blasi SG, Mrak M and Izquierdo E (2014), "Masking of transformed intra-predicted blocks for high quality image and video coding", In Image Processing (ICIP), 2014 IEEE International Conference on. Paris, France, October, 2014, pp. 3160-3164. IEEE.
Abstract: Many professional applications for image and video coding require very high levels of quality of the decoded signal. Under these conditions, even state-of-the-art standards such as High Efficiency Video Coding (HEVC) may not provide sufficient compression efficiency. Large values of the residual samples, especially at high frequency components, are difficult to encode and result in high bitrates of the coded signal. A novel scheme for image and video coding is presented in this paper to enhance compression at high quality levels, based on separate transforms of the original and prediction signals to the frequency domain. Selected frequency components of the prediction signal are discarded by means of appropriate masking patterns prior to the residual computation. The optimal masking pattern is selected for each transformed block and signalled in the bitstream. The approach is shown to achieve gains against conventional HEVC under high quality constraints when coding both still images and video sequences.
BibTeX:
@inproceedings{blasi2014masking,
  author = {Blasi, Saverio G. and Mrak, Marta and Izquierdo, Ebroul},
  title = {Masking of transformed intra-predicted blocks for high quality image and video coding},
  booktitle = {Image Processing (ICIP), 2014 IEEE International Conference on},
  publisher = {IEEE},
  year = {2014},
  pages = {3160--3164},
  note = {google scholar entry: 2014 IEEE International Conference on Image Processing (ICIP). Paris, France. 27-30 October 2014},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7025639},
  doi = {10.1109/ICIP.2014.7025639}
}
Blasi SG, Zupancic I and Izquierdo E (2014), "Fast Motion Estimation Discarding Low-Impact Fractional Blocks", In Signal Processing Conference (EUSIPCO), 2014 Proceedings of the 22nd European. Lisbon, Portugal, September, 2014, pp. 201-205. IEEE.
Abstract: Sub-pixel motion estimation is used in most modern video coding schemes to improve the outcomes of motion estimation. The reference frame is interpolated and motion vectors are refined with fractional components to reduce the prediction error. Due to the high complexity of these steps, sub-pixel motion estimation can be very demanding in terms of encoding time and resources. A method to reduce the complexity of motion estimation schemes is proposed in this paper based on adaptive precision. A parameter is computed to geometrically characterise each block and to select whether fractional refinements are likely to improve coding efficiency. The selection is based on an estimate of the actual impact of fractional refinements on the coding performance. The method was implemented within the H.264/AVC standard and is shown to achieve considerable time savings with respect to conventional schemes, while ensuring that the performance losses are kept below acceptable limits.
BibTeX:
@inproceedings{Blasi2014,
  author = {Blasi, Saverio G. and Zupancic, Ivan and Izquierdo, Ebroul},
  title = {Fast Motion Estimation Discarding Low-Impact Fractional Blocks},
  booktitle = {Signal Processing Conference (EUSIPCO), 2014 Proceedings of the 22nd European},
  publisher = {IEEE},
  year = {2014},
  pages = {201--205},
  note = {google scholar entry: 22nd European Signal Processing Conference (EUSIPCO 2014). Lisbon, Portugal, 1 September - 5 September 2014.},
  url = {http://www.eurasip.org/Proceedings/Eusipco/Eusipco2014/HTML/papers/1569925505.pdf}
}
Brenner M, Avraham T, Lindenbaum M and Izquierdo E (2014), "Temporal Face Embedding and Propagation in Photo Collections", In Image Processing (ICIP), 2014 IEEE International Conference on. Paris, France, October, 2014, pp. 6036-6040. IEEE.
Abstract: We present a two-step approach for modeling facial variations and class likelihoods over time. Unlike traditional approaches, we explicitly model the temporal domain that is often available, for example, in consumer photos or surveillance systems. Our combined approach draws upon the concepts of manifold transformation and semi-supervised graph-based propagation to simultaneously recognize faces across entire photo collections. Experiments on two datasets demonstrate improved face recognition accuracy.
BibTeX:
@inproceedings{brenner2014temporal,
  author = {Brenner, Markus and Avraham, Tamar and Lindenbaum, Michael and Izquierdo, Ebroul},
  title = {Temporal Face Embedding and Propagation in Photo Collections},
  booktitle = {Image Processing (ICIP), 2014 IEEE International Conference on},
  publisher = {IEEE},
  year = {2014},
  pages = {6036--6040},
  note = {google scholar entry: 2014 IEEE International Conference on Image Processing (ICIP 2014). Paris, France, 27-30 October 2014.},
  url = {http://www.cs.technion.ac.il/~tammya/Publications/BrennerAvrahamLindenbaumICIP2014.pdf},
  doi = {10.1109/ICIP.2014.7026218}
}
Brenner M and Izquierdo E (2014), "Joint People Recognition across Photo Collections Using Sparse Markov Random Fields", In MultiMedia Modeling. 20th Anniversary International Conference (MMM 2014), Dublin, Ireland, 6-10 January 2014. Proceedings, Part I. Dublin, Ireland, January, 2014. Vol. 8325, pp. 340-352. Springer.
Abstract: We show how to jointly recognize people across an entire photo collection while considering the specifics of personal photos that often depict multiple people. We devise and explore a sparse but efficient graph design based on a second-order Markov Random Field that utilizes a distance-based face description method. Experiments on two datasets demonstrate and validate the effectiveness of our probabilistic approach compared to traditional methods.
BibTeX:
@inproceedings{brenner2014joint,
  author = {Brenner, Markus and Izquierdo, Ebroul},
  editor = {Gurrin, Cathal and Hopfgartner, Frank and Hürst, Wolfgang and Johansen, Håvard D. and Lee, Hyowon and O'Connor, Noel E.},
  title = {Joint People Recognition across Photo Collections Using Sparse Markov Random Fields},
  booktitle = {MultiMedia Modeling. 20th Anniversary International Conference (MMM 2014), Dublin, Ireland, 6-10 January 2014. Proceedings, Part I.},
  publisher = {Springer},
  year = {2014},
  volume = {8325},
  pages = {340--352},
  note = {google scholar entry: 20th Anniversary International Conference on MultiMedia Modeling (MMM 2014). Dublin, Ireland, 6-10 January 2014.},
  url = {http://link.springer.com/chapter/10.1007%2F978-3-319-04114-8_29},
  doi = {10.1007/978-3-319-04114-8_29}
}
Brenner M, Mirza N and Izquierdo E (2014), "People Recognition using Gamified Ambiguous Feedback", In Proceedings of the First International Workshop on Gamification for Information Retrieval. Amsterdam, The Netherlands, April, 2014, pp. 22-26. ACM.
Abstract: We present a semi-supervised approach to recognize faces or people while incorporating crowd-sourced and gamified feedback to iteratively improve recognition accuracy. Unlike traditional approaches which are often limited to explicit feedback, we model ambiguous feedback information that we implicitly gather through a crowd that plays a game. We devise a graph-based recognition approach that incorporates such ambiguous feedback to jointly recognize people across an entire dataset. Multiple experiments demonstrate the effectiveness of our gamified feedback approach.
BibTeX:
@inproceedings{brenner2014people,
  author = {Brenner, Markus and Mirza, Navid and Izquierdo, Ebroul},
  title = {People Recognition using Gamified Ambiguous Feedback},
  booktitle = {Proceedings of the First International Workshop on Gamification for Information Retrieval},
  publisher = {ACM},
  year = {2014},
  pages = {22--26},
  note = {google scholar entry: First International Workshop on Gamification for Information Retrieval (GamifIR 2014). Amsterdam, Netherlands, 13-16 April 2014.},
  url = {http://dl.acm.org/citation.cfm?id=2594781},
  doi = {10.1145/2594776.2594781}
}
Celiktutan O, Eyben F, Sariyanidi E, Gunes H and Schuller B (2014), "MAPTRAITS 2014: The First Audio/Visual Mapping Personality Traits Challenge", In Proceedings of the 2014 Workshop on Mapping Personality Traits Challenge and Workshop (MAPTRAITS@ICMI 2014). Istanbul, Turkey, November, 2014, pp. 3-9. ACM.
Abstract: The Audio/Visual Mapping Personality Challenge and Workshop (MAPTRAITS) is a competition event that is organised to facilitate the development of signal processing and machine learning techniques for the automatic analysis of personality traits and social dimensions. MAPTRAITS includes two sub-challenges, the continuous space-time sub-challenge and the quantised space-time sub-challenge. The continuous sub-challenge evaluated how systems predict the variation of perceived personality traits and social dimensions in time, whereas the quantised challenge evaluated the ability of systems to predict the overall perceived traits and dimensions in shorter video clips. To analyse the effect of audio and visual modalities on personality perception, we compared systems under three different settings: visual-only, audio-only and audio-visual. With MAPTRAITS we aimed at improving the knowledge on the automatic analysis of personality traits and social dimensions by producing a benchmarking protocol and encouraging the participation of various research groups from different backgrounds.
BibTeX:
@inproceedings{celiktutan2014maptraits,
  author = {Celiktutan, Oya and Eyben, Florian and Sariyanidi, Evangelos and Gunes, Hatice and Schuller, Björn},
  editor = {Gunes, Hatice and Schuller, Björn W. and Celiktutan, Oya and Sariyanidi, Evangelos and Eyben, Florian},
  title = {MAPTRAITS 2014: The First Audio/Visual Mapping Personality Traits Challenge},
  booktitle = {Proceedings of the 2014 Workshop on Mapping Personality Traits Challenge and Workshop (MAPTRAITS@ICMI 2014)},
  publisher = {ACM},
  year = {2014},
  pages = {3--9},
  note = {google scholar entry: 2014 Workshop on Mapping Personality Traits Challenge and Workshop (MAPTRAITS@ICMI 2014). Istanbul, Turkey, 12-16 November 2014.},
  url = {http://dl.acm.org/citation.cfm?doid=2668024.2668026},
  doi = {10.1145/2668024.2668026}
}
Celiktutan O and Gunes H (2014), "Continuous prediction of perceived traits and social dimensions in space and time", In Image Processing (ICIP), 2014 IEEE International Conference on. Paris, France, October, 2014, pp. 4196-4200. IEEE.
BibTeX:
@inproceedings{7025852,
  author = {Celiktutan, O. and Gunes, H.},
  title = {Continuous prediction of perceived traits and social dimensions in space and time},
  booktitle = {Image Processing (ICIP), 2014 IEEE International Conference on},
  year = {2014},
  pages = {4196--4200},
  doi = {10.1109/ICIP.2014.7025852}
}
Gkalelis N, Markatopoulou F, Moumtzidou A, Galanopoulos D, Avgerinakis K, Pittaras N, Vrochidis S, Mezaris V, Kompatsiaris I and Patras I (2014), "ITI-CERTH participation to TRECVID 2014", In TRECVID 2014 workshop participants notebook papers. November, 2014, pp. 1-14. National Institute of Standards and Technology (NIST).
Abstract: This paper provides an overview of the runs submitted to TRECVID 2014 by ITI-CERTH. ITI-CERTH participated in the Semantic Indexing (SIN), Event Detection in Internet Multimedia (MED), Multimedia Event Recounting (MER), and Instance Search (INS) tasks. In the SIN task, techniques are developed that combine floating-point local descriptors with binary local descriptors. In addition, a multi-label learning algorithm is employed that captures the correlations among concepts. In the MED task, static and motion visual features as well as visual model vectors are extracted, and an efficient method combining a new kernel discriminant analysis (DA) technique and conventional LSVM is evaluated. In the MER subtask of MED, the linear version of our DA method is combined with a model vector approach for selecting the key semantic entities depicted in the video that best describe the detected event. Finally, the INS task is performed by employing VERGE, an interactive retrieval application combining retrieval functionalities in various modalities.
BibTeX:
@inproceedings{gkalelis2014iti,
  author = {Gkalelis, Nikolas and Markatopoulou, Foteini and Moumtzidou, Anastasia and Galanopoulos, Damianos and Avgerinakis, Konstantinos and Pittaras, Nikiforos and Vrochidis, Stefanos and Mezaris, Vasileios and Kompatsiaris, Ioannis and Patras, Ioannis},
  title = {ITI-CERTH participation to TRECVID 2014},
  booktitle = {TRECVID 2014 workshop participants notebook papers},
  publisher = {National Institute of Standards and Technology (NIST)},
  year = {2014},
  pages = {1--14},
  note = {google scholar entry: 2014 TRECVID Workshop (TRECVID 2014). Orlando, Florida, 10-12 November 2014.},
  url = {http://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.14.org.html}
}
Henderson C and Izquierdo E (2014), "Large-scale forensic analysis of security images and videos", In Proceedings of the BMVC 6th UK Doctoral Consortium Workshop. Nottingham, England, September, 2014, pp. 1-11. Self.
Abstract: Our research is concerned with the practical application of computer vision in the forensic analysis of security images and videos. Contemporary literature makes use of high-definition images and Hollywood feature films in its datasets, and there is little or no assessment of algorithms' performance on poor quality images with variable frame rates and uncontrolled lighting conditions, such as security video. Work so far has produced a methodology for matching features across low quality images that yields improved results over existing feature matching techniques. Future work will involve innovation in search and retrieval, online machine learning to train models from unlabelled data, and segmentation of one-shot videos to aid computer and human analysis of long-running video sequences. We are motivated to produce an integrated system for police investigators: a query-by-example search and retrieval system with relevance feedback and machine learning to incrementally discover evidence in criminal investigations.
BibTeX:
@inproceedings{henderson2014large,
  author = {Henderson, Craig and Izquierdo, Ebroul},
  editor = {Blanchfield, Peter},
  title = {Large-scale forensic analysis of security images and videos},
  booktitle = {Proceedings of the BMVC 6th UK Doctoral Consortium Workshop},
  publisher = {Self},
  year = {2014},
  pages = {1--11},
  note = {google scholar entry: 6th UK Doctoral Consortium Workshop (BMVC 2014). Nottingham, England, 1-5 September 2014},
  url = {http://cdmh.co.uk/BMVC2014/papers/w.paper003/index.html}
}
Huang S, Sanna M, Izquierdo E and Hao P (2014), "Optimized scalable video transmission over P2P network with hierarchical network coding", In Image Processing (ICIP), 2014 IEEE International Conference on. Paris, France, October, 2014, pp. 3993-3997. IEEE.
Abstract: This paper proposes a new push-based peer-to-peer (P2P) communication method to optimally transmit scalable video packets over lossy networks. In our scheme, a rate allocation optimization is performed at the sender node. Unlike previous sender/receiver-driven schemes, our scheme does not need to update buffer maps or request packets periodically, which reduces the amount of redundant packets and yields less traffic of video data over the Internet. The proposed optimized rate allocation algorithm calculates in advance the number of needed packets according to the cumulative uplink rate between senders and the receiver, and the loss rate of the link. After this calculation, each sender sends hierarchical network coded packets to receivers based on the estimated bandwidth and resource share allocation. With this method, both the waste of bandwidth and the delay caused by communications among peers can be reduced.
BibTeX:
@inproceedings{huang2014optimized,
  author = {Huang, Shenglan and Sanna, Michele and Izquierdo, Ebroul and Hao, Pengwei},
  title = {Optimized scalable video transmission over P2P network with hierarchical network coding},
  booktitle = {Image Processing (ICIP), 2014 IEEE International Conference on},
  publisher = {IEEE},
  year = {2014},
  pages = {3993--3997},
  note = {google scholar entry: Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP 2014). Paris, France, 27-30 October 2014.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7025811},
  doi = {10.1109/ICIP.2014.7025811}
}
Jia X, Yang H, Chan K and Patras I (2014), "Structured Semi-supervised Forest for Facial Landmarks Localization with Face Mask Reasoning", In Proceedings of the British Machine Vision Conference 2014 (BMVC). Nottingham, England, September, 2014. BMVA Press.
Abstract: Despite the great success of recent facial landmarks localization approaches, the presence of occlusions significantly degrades the performance of these systems. However, very few works have addressed this problem explicitly due to the high diversity of occlusion in the real world. In this paper, we address face mask reasoning and facial landmarks localization in a unified Structured Decision Forests framework. We first assign a portion of the face dataset with face masks, i.e. for each face image we give each pixel a label to indicate whether it belongs to the face or not. Then we incorporate such additional information of dense pixel labelling into the training of the Structured Classification-Regression Decision Forest. The classification nodes aim at decreasing the variance of the pixel labels of the patches by using our proposed structured criterion, while the regression nodes aim at decreasing the variance of the displacements between the patches and the facial landmarks. The proposed framework allows us to predict the face mask and facial landmark locations jointly. We test the model on face images from several datasets with significant occlusion. The proposed method 1) yields promising results in face mask reasoning; and 2) improves existing Decision Forests approaches in facial landmark localization, aided by the face mask reasoning.
BibTeX:
@inproceedings{jia2014structured,
  author = {Jia, Xuhui and Yang, Heng and Chan, Kwok-Ping and Patras, Ioannis},
  editor = {Valstar, Michel François and French, Andrew P. and Pridmore, Tony P.},
  title = {Structured Semi-supervised Forest for Facial Landmarks Localization with Face Mask Reasoning},
  booktitle = {Proceedings of the British Machine Vision Conference 2014 (BMVC)},
  publisher = {BMVA Press},
  year = {2014},
  note = {google scholar entry: British Machine Vision Conference (BMVC 2014). Nottingham, England, 1-5 September 2014.},
  url = {http://www.bmva.org/bmvc/2014/papers/paper068/index.html}
}
Kordelas G, Daras P, Klavdianos P, Izquierdo E, Zhang Q and others (2014), "Accurate stereo 3D point cloud generation suitable for multi-view stereo reconstruction", In Visual Communications and Image Processing Conference, 2014 IEEE. Valletta, Malta, December, 2014, pp. 307-310. IEEE.
Abstract: This paper proposes a novel methodology for generating 3D point clouds of good accuracy from stereo pairs. Initially, the methodology defines some conditions for the proper selection of image pairs. Then, the selected stereo images are used to estimate dense correspondences using the Daisy descriptor. An efficient two-phase strategy to remove outliers is then introduced. Finally, the 3D point cloud is refined by combining sub-pixel accuracy correspondences estimation and the moving least squares algorithm. The proposed methodology can be exploited by multi-view stereo algorithms due to its good accuracy and its fast computation.
BibTeX:
@inproceedings{Kordelas2014,
  author = {Kordelas, Georgios and Daras, Petros and Klavdianos, Patrycia and Izquierdo, Ebroul and Zhang, Qianni and others},
  title = {Accurate stereo 3D point cloud generation suitable for multi-view stereo reconstruction},
  booktitle = {Visual Communications and Image Processing Conference, 2014 IEEE},
  publisher = {IEEE},
  year = {2014},
  pages = {307--310},
  note = {google scholar entry: 2014 IEEE Visual Communications and Image Processing Conference (VCIP 2014). Valletta, Malta, 7-10 December 2014.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7051565},
  doi = {10.1109/VCIP.2014.7051565}
}
Moumtzidou A, Avgerinakis K, Apostolidis E, Markatopoulou F, Apostolidis K, Mironidis T, Vrochidis S, Mezaris V, Kompatsiaris I and Patras I (2014), "VERGE: A Multimodal Interactive Video Search Engine", In MultiMedia Modeling. Proceedings of the 20th Anniversary International Conference (MMM 2014). Part II. Dublin, Ireland, January, 2014. Vol. 8326, pp. 411-414. Springer.
Abstract: This paper presents the VERGE interactive video retrieval engine, which is capable of searching within video content. The system integrates several content-based analysis and retrieval modules such as video shot boundary detection, concept detection, clustering and visual similarity search.
BibTeX:
@inproceedings{moumtzidou2015verge,
  author = {Moumtzidou, Anastasia and Avgerinakis, Konstantinos and Apostolidis, Evlampios and Markatopoulou, Fotini and Apostolidis, Konstantinos and Mironidis, Theodoros and Vrochidis, Stefanos and Mezaris, Vasileios and Kompatsiaris, Ioannis and Patras, Ioannis},
  editor = {Gurrin, Cathal and Hopfgartner, Frank and Hürst, Wolfgang and Johansen, Håvard D. and Lee, Hyowon and O'Connor, Noel E.},
  title = {VERGE: A Multimodal Interactive Video Search Engine},
  booktitle = {MultiMedia Modeling. Proceedings of the 20th Anniversary International Conference (MMM 2014). Part II.},
  publisher = {Springer},
  year = {2014},
  volume = {8326},
  pages = {411--414},
  note = {google scholar entry: 20th Anniversary International Multimedia Modeling Conference (MMM 2014). Dublin, Ireland, 6-10 January 2014.},
  url = {http://link.springer.com/chapter/10.1007/978-3-319-04117-9_48},
  doi = {10.1007/978-3-319-04117-9_48}
}
O'Connor NE, Alexiadis DS, Apostolakis KC, Daras P, Izquierdo E, Li Y, Monaghan DS, Rivera F, Stevens C, van Broeck S, Wall J and Wei H (2014), "Tools for User Interaction in Immersive Environments", In MultiMedia Modeling. 20th Anniversary International Conference (MMM 2014). Dublin, Ireland, 6-10 January 2014. Proceedings, Part II. Dublin, Ireland, January, 2014. Vol. 8326, pp. 382-385. Springer.
Abstract: REVERIE -- REal and Virtual Engagement in Realistic Immersive Environments -- is a large scale collaborative project co-funded by the European Commission targeting novel research in the general domain of Networked Media and Search Systems. The project aims to bring about a revolution in 3D media and virtual reality by developing technologies for safe, collaborative, online environments that can enable realistic interpersonal communication and interaction in immersive environments. To date, project partners have been developing component technologies for a variety of functionalities related to the aims of REVERIE prior to integration into an end-to-end system. In this demo submission, we first introduce the project in general terms, outlining the high-level concept and vision before briefly describing the suite of demonstrations that we intend to present at MMM 2014.
BibTeX:
@inproceedings{o2014tools,
  author = {O'Connor, Noel E. and Alexiadis, Dimitrios S. and Apostolakis, Konstantinos C. and Daras, Petros and Izquierdo, Ebroul and Li, Y. and Monaghan, David S. and Rivera, Fiona and Stevens, C. and van Broeck, Sigurd and Wall, Julie and Wei, Haolin},
  editor = {Gurrin, Cathal and Hopfgartner, Frank and Hürst, Wolfgang and Johansen, Håvard D. and Lee, Hyowon and O'Connor, Noel E.},
  title = {Tools for User Interaction in Immersive Environments},
  booktitle = {MultiMedia Modeling. 20th Anniversary International Conference (MMM 2014). Dublin, Ireland, 6-10 January 2014. Proceedings, Part I.},
  publisher = {Springer},
  year = {2014},
  volume = {8326},
  pages = {382--385},
  note = {google scholar entry: 20th Anniversary International Conference on MultiMedia Modeling (MMM 2014). Dublin, Ireland, 6-10 January 2014.},
  url = {http://doras.dcu.ie/19631/1/MMM2014.pdf},
  doi = {10.1007/978-3-319-04117-9_41}
}
Pantoja C and Izquierdo E (2014), "MediaEval 2014 Visual Privacy Task: De-identification and Re-identification of Subjects in CCTV", In Working Notes Proceedings of the MediaEval 2014 Workshop. Barcelona, Catalonia, October, 2014. Vol. 1263, pp. 1-2. CEUR-WS.org.
BibTeX:
@inproceedings{pantoja2014mediaeval,
  author = {Pantoja, Cesar and Izquierdo, Ebroul},
  editor = {Larson, Martha A. and Ionescu, Bogdan and Anguera, Xavier and Eskevich, Maria and Korshunov, Pavel and Schedl, Markus and Soleymani, Mohammad and Petkos, Georgios and Sutcliffe, Richard F. E. and Choi, Jaeyoung and Jones, Gareth J. F.},
  title = {MediaEval 2014 Visual Privacy Task: De-identification and Re-identification of Subjects in CCTV},
  booktitle = {Working Notes Proceedings of the MediaEval 2014 Workshop},
  publisher = {CEUR-WS.org},
  year = {2014},
  volume = {1263},
  pages = {1--2},
  note = {google scholar entry: 2014 Multimedia Benchmark Workshop (MediaEval 2014). Barcelona, Catalonia, 16-17 October 2014.},
  url = {http://ceur-ws.org/Vol-1263/mediaeval2014_submission_37.pdf}
}
Peixoto E, Macchiavello B, de Queiroz RL and Hung EM (2014), "Fast H.264/AVC to HEVC transcoding based on Machine Learning", In Proceedings of the 2014 International Telecommunications Symposium (ITS 2014). Sao Paulo, Brazil, August, 2014, pp. 1-4.
Abstract: Since the HEVC codec has become an ITU-T and ISO/IEC standard, efficient transcoding from previous standards, such as the H.264/AVC, to HEVC is highly needed. In this paper, we build on our previous work with the goal to develop a faster transcoder from H.264/AVC to HEVC. The transcoder is built around an established two-stage transcoding. In the first stage, called the training stage, full re-encoding is performed while the H.264/AVC and the HEVC information are gathered. This information is then used to build a CU classification model that is used in the second stage (called the transcoding stage). The solution is tested with well-known video sequences and evaluated in terms of rate-distortion and complexity. The proposed method is 3.4 times faster, on average, than the trivial transcoder, and 1.65 times faster than a previous transcoding solution.
BibTeX:
@inproceedings{peixoto2014fast,
  author = {Peixoto, Eduardo and Macchiavello, Bruno and de Queiroz, Ricardo L. and Hung, Edson Mintsu},
  title = {Fast H.264/AVC to HEVC transcoding based on Machine Learning},
  booktitle = {Proceedings of the 2014 International Telecommunications Symposium (ITS 2014)},
  year = {2014},
  pages = {1--4},
  note = {google scholar entry: 2014 International Telecommunications Symposium (ITS 2014). Sao Paulo, Brazil, 17-20 August 2014.},
  url = {http://queiroz.divp.org/papers/its2014_transcoder.pdf},
  doi = {10.1109/ITS.2014.6947999}
}
Peixoto E, Zgaljic T and Izquierdo E (2014), "Transcoding from H.264/AVC to a Wavelet-based Scalable Video Codec", In Image Processing (ICIP 2014), Proceedings of the 2014 International Conference on. Paris, France, October, 2014, pp. 2532-2536. IEEE.
Abstract: In this paper, a fast transcoding solution from H.264/AVC to HEVC bitstreams is presented. This solution is based on two main modules: a coding unit (CU) classification module that relies on a machine learning technique in order to map H.264/AVC macroblocks into HEVC CUs; and an early termination technique that is based on statistical modeling of the HEVC rate-distortion (RD) cost in order to further speed-up the transcoding. The transcoder is built around an established two-stage transcoding. In the first stage, called the training stage, full re-encoding is performed while the H.264/AVC and the HEVC information are gathered. This information is then used to build both the CU classification model and the early termination sieves, that are used in the second stage (called the transcoding stage). The solution is tested with well-known video sequences and evaluated in terms of RD and complexity. The proposed method is 3.83 times faster, on average, than the trivial transcoder, and 1.8 times faster than a previous transcoding solution, while yielding a RD loss of 4% compared to this solution.
BibTeX:
@inproceedings{peixoto2014transcoding,
  author = {Peixoto, Eduardo and Zgaljic, Toni and Izquierdo, Ebroul},
  title = {Transcoding from H.264/AVC to a Wavelet-based Scalable Video Codec},
  booktitle = {Image Processing (ICIP 2014), Proceedings of the 2014 International Conference on},
  publisher = {IEEE},
  year = {2014},
  pages = {2532--2536},
  note = {google scholar entry: 2014 IEEE International Conference on Image Processing (ICIP 2014). Paris, France, 27-30 October 2014.},
  url = {http://queiroz.divp.org/papers/icip2014_transcoder.pdf},
  doi = {10.1109/ICIP.2014.7025512}
}
Wall J, Izquierdo E, Argyriou L, Monaghan DS, O'Connor NE, Poulakos S, Smolic A and Mekuria R (2014), "REVERIE: Natural human interaction in virtual immersive environments", In Image Processing (ICIP), Proceedings of the 2014 IEEE International Conference on. Paris, France, October, 2014, pp. 2022-2024. IEEE.
Abstract: REVERIE (REal and Virtual Engagement in Realistic Immersive Environments [1]) targets novel research to address the demanding challenges involved with developing state-of-the-art technologies for online human interaction. The REVERIE framework enables users to meet, socialise and share experiences online by integrating cutting-edge technologies for 3D data acquisition and processing, networking, autonomy and real-time rendering. In this paper, we describe the innovative research that is showcased through the REVERIE integrated framework through richly defined use-cases which demonstrate the validity and potential for natural interaction in a virtual immersive and safe environment. Previews of the REVERIE demo and its key research components can be viewed at www.youtube.com/user/REVERIEFP7.
BibTeX:
@inproceedings{Wall2014,
  author = {Wall, Julie and Izquierdo, Ebroul and Argyriou, Lemonia and Monaghan, David S. and O'Connor, Noel E. and Poulakos, Steven and Smolic, Aljoscha and Mekuria, Rufael},
  title = {REVERIE: Natural human interaction in virtual immersive environments},
  booktitle = {Image Processing (ICIP), Proceedings of the 2014 IEEE International Conference on},
  publisher = {IEEE},
  year = {2014},
  pages = {2022--2024},
  note = {google scholar entry: 2014 IEEE International Conference on Image Processing (ICIP 2014). Paris, France, 27-30 October 2014.},
  url = {http://oai.cwi.nl/oai/asset/22601/22601A.pdf},
  doi = {10.1109/ICIP.2014.7025435}
}
Zou C, Yang H and Liu J (2014), "Separation of Line Drawings Based on Split Faces for 3D Object Reconstruction", In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. Columbus, Ohio, June, 2014, pp. 692-699. IEEE.
Abstract: Reconstructing 3D objects from single line drawings is often desirable in computer vision and graphics applications. If the line drawing of a complex 3D object is decomposed into primitives of simple shape, the object can be easily reconstructed. We propose an effective method to conduct the line drawing separation and turn a complex line drawing into parametric 3D models. This is achieved by recursively separating the line drawing using two types of split faces. Our experiments show that the proposed separation method can generate more basic and simple line drawings, and its combination with the example-based reconstruction can robustly recover a wider range of complex parametric 3D objects than previous methods.
BibTeX:
@inproceedings{zou2014separation,
  author = {Zou, Changqing and Yang, Heng and Liu, Jianzhuang},
  title = {Separation of Line Drawings Based on Split Faces for 3D Object Reconstruction},
  booktitle = {Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on},
  publisher = {IEEE},
  year = {2014},
  pages = {692--699},
  note = {google scholar entry: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014). Columbus, Ohio, 23-28 June 2014.},
  url = {http://dx.doi.org/10.1109/CVPR.2014.94},
  doi = {10.1109/CVPR.2014.94}
}

Theses and Monographs

Bozas K (2014), "Scalable Image Retrieval based on Hand Drawn Sketches and their Semantic Information". Thesis at: Queen Mary University of London. November, 2014, pp. 1-153.
Abstract: The research presented in this thesis aims to extend the capabilities of traditional content-based image retrieval systems, towards more expressive and scalable interactions. The study focuses on machine sketch understanding and its applications. In particular, sketch based image retrieval (SBIR), a form of image search where the query is a user drawn picture (sketch), and freehand sketch recognition. SBIR provides a platform for the user to express image search queries that otherwise would be difficult to describe with text. The research builds upon two main axes: extension of the state-of-the art and scalability. Three novel approaches for sketch recognition and retrieval are presented. Notably, a patch hashing algorithm for scalable SBIR is introduced, along with a manifold learning technique for sketch recognition and a horizontal flip-invariant sketch matching method to further enhance recognition accuracy. The patch hashing algorithm extracts several overlapping patches of an image. Similarities between a hand drawn sketch and the images in a database are ranked through a voting process where patches with similar shape and structure configuration arbitrate for the result. Patch similarity is efficiently estimated with a hashing algorithm. A spatially aware index structure built on the hashing keys ensures the scalability of the scheme and allows for real time re-ranking upon query updates. Sketch recognition is achieved through a discriminant manifold learning method named Discriminant Pairwise Local Embeddings (DPLE). DPLE is a supervised dimensionality reduction technique that generates structure preserving discriminant subspaces. This objective is achieved through a convex optimization formulation where Euclidean distances between data pairs that belong to the same class are minimized, while those of pairs belonging to different classes are maximized. 
A scalable one-to-one sketch matching technique invariant to horizontal mirror reflections further improves recognition accuracy without high computational cost. The matching is based on structured feature correspondences and produces a dissimilarity score between two sketches. Extensive experimental evaluation of our methods demonstrates the improvements over the state-of-the-art in SBIR and sketch recognition.
BibTeX:
@phdthesis{bozas2014scalable,
  author = {Bozas, Konstantinos},
  editor = {Izquierdo, Ebroul},
  title = {Scalable Image Retrieval based on Hand Drawn Sketches and their Semantic Information},
  school = {Queen Mary University of London},
  year = {2014},
  pages = {1--153},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Theses/KonstantinosBozas(kb300)_PhDThesis.pdf}
}
Brenner M (2014), "Context-based Semi-supervised Joint People Recognition in Consumer Photo Collections using Markov Networks". Thesis at: Queen Mary University of London. September, 2014. pp. 1-178.
Abstract: Faces, along with the personal identities behind them, are effective elements in organizing a collection of consumer photos, as they represent who was involved. However, the accurate discrimination and subsequent recognition of face appearances is still very challenging. This can be attributed to the fact that faces are usually neither perfectly lit nor captured, particularly in the uncontrolled environments of consumer photos. Unlike, for instance, passport photos that only show faces stripped of their surroundings, Consumer Photo Collections contain a vast amount of meaningful context. For example, consecutively shot photos often correlate in time, location or scene. Further information can also be provided by the people appearing in photos, such as their demographics (ages and gender are often easier to surmise than identities), clothing, or the social relationships among co-occurring people. Motivated by this ubiquitous context, we propose and research people recognition approaches that consider contextual information within photos, as well as across entire photo collections. Our aim of leveraging additional contextual information (as opposed to only considering faces) is to improve recognition performance. However, instead of requiring users to explicitly label specific pieces of contextual information, we wish to implicitly learn and draw from the seemingly coherent content that exists inherently across an entire photo collection. Moreover, unlike conventional approaches that usually predict the identity of only one person's appearance at a time, we lay out a semi-supervised approach to jointly recognize multiple people's appearances across an entire photo collection simultaneously. As such, our aim is to find the overall best recognition solution. To make context-based joint recognition of people feasible, we research a sparse but efficient graph-based approach that builds on Markov Networks and utilizes distance-based face description methods.
We show how to exploit the following specific contextual cues: time, social semantics, body appearances (clothing), gender, scene and ambiguous captions. We also show how to leverage crowd-sourced gamified feedback to iteratively improve recognition performance. Experiments on several datasets demonstrate and validate the effectiveness of our semi-supervised graph-based recognition approach compared to conventional approaches.
BibTeX:
@phdthesis{brenner2014context,
  author = {Brenner, Markus},
  editor = {Izquierdo, Ebroul},
  title = {Context-based Semi-supervised Joint People Recognition in Consumer Photo Collections using Markov Networks},
  school = {Queen Mary University of London},
  year = {2014},
  pages = {1--178},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Theses/MarkusBrenner(markusb)_PhDThesis.pdf}
}


2013

Journal Papers

Blasi SG, Peixoto E and Izquierdo E (2013), "Enhanced Inter-Prediction via Shifting Transformation in the H.264/AVC", Circuits and Systems for Video Technology, IEEE Transactions on. April, 2013. Vol. 23(4), pp. 735-740. IEEE.
Abstract: Inter-prediction based on block-based motion estimation is used in most video codecs. The closer the prediction is to the target block, the lower is the residual and more efficient compression can be achieved. In this paper a new technique called Enhanced Inter-Prediction (EIP) is proposed to improve the prediction candidates using an additional transformation acting while performing motion estimation. A parametric transformation acts within the coding loop of each block to modify the prediction for each motion vector candidate. The EIP is validated in the particular case of a single-parameter shifting transformation. This paper presents an efficient algorithm to compute the best shift for each prediction candidate, and a model to select the optimal prediction based on minimum cost integrating the approach with existing rate-distortion optimization techniques in the H.264/AVC video codec. Results show significant improvements with an average of 6% bit-rate reduction compared to the original H.264/AVC.
BibTeX:
@article{blasi2012enhanced,
  author = {Blasi, Saverio G. and Peixoto, Eduardo and Izquierdo, Ebroul},
  title = {Enhanced Inter-Prediction via Shifting Transformation in the H.264/AVC},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {2013},
  volume = {23},
  number = {4},
  pages = {735--740},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6279460},
  doi = {10.1109/TCSVT.2012.2214931}
}
Essid S, Lin X, Gowing M, Kordelas G, Aksay A, Kelly P, Fillon T, Zhang Q, Dielmann A, Kitanovski V, Tournemenne R, Masurelle A, Izquierdo E, O'Connor NE, Daras P and Richard G (2013), "A multi-modal dance corpus for research into interaction between humans in virtual environments", Journal on Multimodal User Interfaces. March, 2013. Vol. 7(1-2), pp. 157-170. Springer.
Abstract: We present a new, freely available, multimodal corpus for research into, amongst other areas, real-time realistic interaction between humans in online virtual environments. The specific corpus scenario focuses on an online dance class application scenario where students, with avatars driven by whatever 3D capture technology is locally available to them, can learn choreographies with teacher guidance in an online virtual dance studio. As the dance corpus is focused on this scenario, it consists of student/teacher dance choreographies concurrently captured at two different sites using a variety of media modalities, including synchronised audio rigs, multiple cameras, wearable inertial measurement devices and depth sensors. In the corpus, each of the several dancers performs a number of fixed choreographies, which are graded according to a number of specific evaluation criteria. In addition, ground-truth dance choreography annotations are provided. Furthermore, for unsynchronised sensor modalities, the corpus also includes distinctive events for data stream synchronisation. The total duration of the recorded content is 1 h and 40 min for each single sensor, amounting to 55 h of recordings across all sensors. Although the dance corpus is tailored specifically for an online dance class application scenario, the data is free to download and use for any research and development purposes.
BibTeX:
@article{Essid2013,
  author = {Essid, Slim and Lin, Xinyu and Gowing, Marc and Kordelas, Georgios and Aksay, Anil and Kelly, Philip and Fillon, Thomas and Zhang, Qianni and Dielmann, Alfred and Kitanovski, Vlado and Tournemenne, Robin and Masurelle, Aymeric and Izquierdo, Ebroul and O'Connor, Noel E. and Daras, Petros and Richard, Gaël},
  title = {A multi-modal dance corpus for research into interaction between humans in virtual environments},
  journal = {Journal on Multimodal User Interfaces},
  publisher = {Springer},
  year = {2013},
  volume = {7},
  number = {1-2},
  pages = {157--170},
  url = {http://embots.dfki.de/mmc/mmc11/Essidetal.pdf},
  doi = {10.1007/s12193-012-0109-5}
}
Koelstra S and Patras I (2013), "Fusion of facial expressions and EEG for implicit affective tagging", Image Vision Computing. February, 2013. Vol. 31(2), pp. 164-174. Elsevier.
Abstract: The explosion of user-generated, untagged multimedia data in recent years, generates a strong need for efficient search and retrieval of this data. The predominant method for content-based tagging is through slow, labor-intensive manual annotation. Consequently, automatic tagging is currently a subject of intensive research. However, it is clear that the process will not be fully automated in the foreseeable future. We propose to involve the user and investigate methods for implicit tagging, wherein users' responses to the interaction with the multimedia content are analyzed in order to generate descriptive tags. Here, we present a multi-modal approach that analyses both facial expressions and electroencephalography (EEG) signals for the generation of affective tags. We perform classification and regression in the valence-arousal space and present results for both feature-level and decision-level fusion. We demonstrate improvement in the results when using both modalities, suggesting the modalities contain complementary information.
BibTeX:
@article{koelstra2013fusion,
  author = {Koelstra, Sander and Patras, Ioannis},
  title = {Fusion of facial expressions and EEG for implicit affective tagging},
  journal = {Image Vision Computing},
  publisher = {Elsevier},
  year = {2013},
  volume = {31},
  number = {2},
  pages = {164--174},
  url = {http://www.eecs.qmul.ac.uk/~ioannisp/pubs/ecopies/2012-KoelstraPatrasIVCJ.pdf},
  doi = {10.1016/j.imavis.2012.10.002}
}
Sanna M and Izquierdo E (2013), "Live Scalable Video Streaming on Peer-to-Peer Overlays with Network Coding", Latin America Transactions, IEEE (Revista IEEE America Latina). May, 2013. Vol. 11(3), pp. 962-968. IEEE.
Abstract: Scalable video coding is a paradigm that allows partial decoding of the video stream at reduced resolution, frame-rate or quality, adapting to display requirements and reception conditions of heterogeneous receivers. Transmission of scalable data with prioritization enhances the transmission performance, reducing the sensitivity to network congestions and exploiting the multirate characteristic of scalable coding. Network coding is a novel transmission technique that allows intermediate network nodes to perform coding operations on the information in transit, as opposed to traditional routing. This yields to maximization of the transmission rate and encoding of the information with spatial diversity. We employ an overlay network that uses network coding and delivers scalable video with prioritization. We test the performance of the scalable streaming against a non-scalable system, when the upload bandwidth of the nodes is not known.
BibTeX:
@article{sanna2012live,
  author = {Sanna, Michele and Izquierdo, Ebroul},
  title = {Live Scalable Video Streaming on Peer-to-Peer Overlays with Network Coding},
  journal = {Latin America Transactions, IEEE (Revista IEEE America Latina)},
  publisher = {IEEE},
  year = {2013},
  volume = {11},
  number = {3},
  pages = {962--968},
  doi = {10.1109/TLA.2013.6568840}
}
Shanableh T, Peixoto E and Izquierdo E (2013), "MPEG-2 to HEVC video transcoding with content-based modeling", Circuits and Systems for Video Technology, IEEE Transactions on. July, 2013. Vol. 23(7), pp. 1191-1196. IEEE.
Abstract: This paper proposes an efficient MPEG-2 to HEVC video transcoder. The objective of the transcoder is to migrate the abundant MPEG-2 video content to the emerging HEVC video coding standard. The transcoder introduces a content-based machine learning solution to predict the depth of the HEVC coding units. The proposed transcoder utilizes full re-encoding to find a mapping between the incoming MPEG-2 coding information and the outgoing HEVC depths of the coding units. Once the model is built, a switch to transcoding mode takes place. Hence the model is content-based and varies from one video sequence to another. The transcoder is compared against full re-encoding using the default HEVC fast motion estimation. Using HEVC test sequences, it is shown that a speedup factor of up to 3 is achieved whilst reducing the bitrate of the incoming video by around 50%. In comparison to full re-encoding, an average of 3.9% excessive bitrate is encountered with an average PSNR drop of 0.1 dB. Since this is the first work to report on MPEG-2 to HEVC video transcoding, the reported results can be used as a benchmark for future transcoding research.
BibTeX:
@article{Shanableh2013,
  author = {Shanableh, Tamer and Peixoto, Eduardo and Izquierdo, Ebroul},
  title = {MPEG-2 to HEVC video transcoding with content-based modeling},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {2013},
  volume = {23},
  number = {7},
  pages = {1191--1196},
  url = {http://ieeexplore.ieee.org//xpl/articleDetails.jsp?tp=&arnumber=6415262},
  doi = {10.1109/TCSVT.2013.2241352}
}
Vaiapury K, Aksay A, Lin X, Izquierdo E and Papadopoulos C (2013), "A new cost effective 3D measurement audit and model comparison system for verification tasks", Multidimensional Systems and Signal Processing. September, 2013. Vol. 24(2), pp. 331-377. Springer.
Abstract: A new unified system application for production audit in the aerospace industry is presented in this paper, comprising two key application tools: (a) 3D PAMT (production audit measurement tool) and (b) 3D PACT (production audit compare tool). Although these functionalities are modularly independent, they are commonly related in terms of assisting the production audit task. 3D PAMT facilitates verification that manufactured parts are within a pre-defined threshold range using a calibrated stereo camera, with the safety test engineer interacting to select the matching disparity points. The distance between datum points, with or without reference to a planar reference surface model, can be obtained. We describe the system flow and validate the technique on a number of experimental datasets. 3D PACT allows the identification of discrepancies between a computed 3D point cloud model and the corresponding digital mock-up point cloud model. Usually, the computer-aided geometry model is built before an actual installation. This knowledge about the components of an installation assembly is available as semantic information in an extensible markup language (XML) format of the CATIA model. We provide a use case study of a sample assembly with components such as a cube, pyramid, rectangular prism and triangular prism. The proposed cost-effective and robust framework for 3D measurement audit and model comparison is based on the input available from a digital camera and the semantic metadata knowledge available from geometry models, which can be used for verification tasks.
BibTeX:
@article{vaiapury2012new,
  author = {Vaiapury, Karthikeyan and Aksay, Anil and Lin, Xinyu and Izquierdo, Ebroul and Papadopoulos, Christopher},
  title = {A new cost effective 3D measurement audit and model comparison system for verification tasks},
  journal = {Multidimensional Systems and Signal Processing},
  publisher = {Springer},
  year = {2013},
  volume = {24},
  number = {2},
  pages = {331--377},
  url = {http://www.springerlink.com/content/3715636422919275},
  doi = {10.1007/s11045-012-0200-9}
}
Wall J and Glackin C (2013), "Spiking Neural Network Connectivity and its Potential for Temporal Sensory Processing and Variable Binding", Frontiers in Computational Neuroscience. December, 2013. Vol. 7(182), pp. 1-2. Frontiers.
BibTeX:
@article{wall2013spiking,
  author = {Wall, Julie and Glackin, Cornelius},
  title = {Spiking Neural Network Connectivity and its Potential for Temporal Sensory Processing and Variable Binding},
  journal = {Frontiers in Computational Neuroscience},
  publisher = {Frontiers},
  year = {2013},
  volume = {7},
  number = {182},
  pages = {1--2},
  url = {http://journal.frontiersin.org/article/10.3389/fncom.2013.00182/full},
  doi = {10.3389/fncom.2013.00182}
}
Zhang Q and Izquierdo E (2013), "Multifeature Analysis and Semantic Context Learning for Image Classification", ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM). May, 2013. Vol. 9(2), pp. 12:1-12:20. ACM.
Abstract: This article introduces an image classification approach in which the semantic context of images and multiple low-level visual features are jointly exploited. The context consists of a set of semantic terms defining the classes to be associated to unclassified images. Initially, a multiobjective optimization technique is used to define a multifeature fusion model for each semantic class. Then, a Bayesian learning procedure is applied to derive a context model representing relationships among semantic classes. Finally, this context model is used to infer object classes within images. Selected results from a comprehensive experimental evaluation are reported to show the effectiveness of the proposed approaches.
BibTeX:
@article{Zhang2013,
  author = {Zhang, Qianni and Izquierdo, Ebroul},
  title = {Multifeature Analysis and Semantic Context Learning for Image Classification},
  journal = {ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},
  publisher = {ACM},
  year = {2013},
  volume = {9},
  number = {2},
  pages = {12:1--12:20},
  url = {http://dl.acm.org/citation.cfm?id=2457454},
  doi = {10.1145/2457450.2457454}
}

Books and Chapters in Books

Fernandez Arguedas V, Zhang Q, Chandramouli K and Izquierdo E (2013), "Vision Based Semantic Analysis of Surveillance Videos", In Semantic Hyper/Multimedia Adaptation. Vol. 418, pp. 83-125. Springer.
Abstract: As recent research in automatic surveillance systems has attracted many cross-domain researchers, a large number of algorithms have been proposed for automating surveillance systems. The objective of this chapter is twofold: First, we present an extensive survey of different techniques that have been proposed for surveillance systems, categorised into motion analysis, visual feature extraction and indexing. Second, an integrated surveillance framework for unsupervised object indexing is developed to study and evaluate the performance of visual features. The study focuses on two characteristics highly related to human visual perception: colour and texture. The set of visual features under analysis comprises two categories, new leading visual features versus state-of-the-art MPEG-7 visual features. The evaluation of the framework is carried out with the AVSS 2007 and CamVid 2008 datasets.
BibTeX:
@incollection{arguedas2013vision,
  author = {Fernandez Arguedas, Virginia and Zhang, Qianni and Chandramouli, Krishna and Izquierdo, Ebroul},
  editor = {Anagnostopoulos, Ioannis E. and Bieliková, Mária and Mylonas, Phivos and Tsapatsoulis, Nicolas},
  title = {Vision Based Semantic Analysis of Surveillance Videos},
  booktitle = {Semantic Hyper/Multimedia Adaptation},
  publisher = {Springer},
  year = {2013},
  volume = {418},
  pages = {83--125},
  note = {fix google scholar entry: publication type},
  url = {http://link.springer.com/chapter/10.1007/978-3-642-28977-4_3},
  doi = {10.1007/978-3-642-28977-4_3}
}
Piatrik T, Zhang Q, Sevillano X and Izquierdo E (2013), "Predicting User Tags in Social Media Repositories Using Semantic Expansion and Visual Analysis", In Social Media Retrieval, pp. 143-167. Springer.
Abstract: Manually annotating large-scale content such as Internet videos is an expensive and time-consuming process. Furthermore, community-provided tags lack consistency and present numerous irregularities. This chapter aims to provide a forum for the state-of-the-art research in this emerging field, with particular focus on mechanisms capable of exploiting the full range of information available online to predict user tags automatically. The exploited information covers both semantic metadata, including complementary information in external resources, and embedded low-level features within the multimedia content. Furthermore, this chapter presents a framework for predicting general tags from the associated textual metadata and visual features. The goal of this framework is to simplify and improve the process of tagging online videos, which are unbounded to any particular domain. In this framework, the first step is to extract named entities exploiting complementary textual resources such as Wikipedia and WordNet. To facilitate the extraction of semantically meaningful tags from a largely unstructured textual corpus, this framework employs GATE natural language processing tools. Extending the functionalities of the built-in GATE named entities, the framework also integrates a bag-of-articles algorithm for effectively extracting relevant articles from Wikipedia. Experiments were conducted for validation of the framework against the MediaEval 2010 Wild Wild Web dataset for the tagging task.
BibTeX:
@incollection{Piatrik2013,
  author = {Piatrik, Tomas and Zhang, Qianni and Sevillano, Xavier and Izquierdo, Ebroul},
  editor = {Ramzan, Naeem and van Zwol, Roelof and Lee, Jong-Seok and Clüver, Kai and Hua, Xian-Sheng},
  title = {Predicting User Tags in Social Media Repositories Using Semantic Expansion and Visual Analysis},
  booktitle = {Social Media Retrieval},
  publisher = {Springer},
  year = {2013},
  pages = {143--167},
  url = {http://link.springer.com/chapter/10.1007%2F978-1-4471-4555-4_7},
  doi = {10.1007/978-1-4471-4555-4_7}
}
Zhang Q, O'Connor NE and Izquierdo E (2013), "3DLife - Bringing the Media Internet to Life", In The Future Internet. Vol. 7858, pp. 339-341. Springer.
Abstract: The 3DLife EU FP7 Network of Excellence focuses on stimulating joint research and integrating leading European research groups to create a long-term integration of critical mass for innovation of currently fragmented research addressing the media Internet. It fosters the creation of sustainable and long-term relationships between existing national research groups and lays the foundations for a Virtual Centre of Excellence in 3D media Internet - EMC$^2$. This is a summary of 3DLife's missions as well as its achievements in the last three years.
BibTeX:
@incollection{zhang20133dlife,
  author = {Zhang, Qianni and O'Connor, Noel E. and Izquierdo, Ebroul},
  editor = {Galis, Alex and Gavras, Anastasius},
  title = {3DLife - Bringing the Media Internet to Life},
  booktitle = {The Future Internet},
  publisher = {Springer},
  year = {2013},
  volume = {7858},
  pages = {339--341},
  url = {http://link.springer.com/chapter/10.1007%2F978-3-642-38082-2_27},
  doi = {10.1007/978-3-642-38082-2_27}
}

Conference Papers

Badii A, Einig M and Piatrik T (2013), "Overview of the MediaEval 2013 Visual Privacy Task", In Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop. Barcelona, Catalonia, October, 2013. Vol. 1043. CEUR-WS.org.
Abstract: This paper describes the Visual Privacy Task (VPT) 2013, its scope and objectives, related dataset and evaluation approach.
BibTeX:
@inproceedings{badii2013overview,
  author = {Badii, Atta and Einig, Mathieu and Piatrik, Tomas},
  editor = {Larson, Martha A. and Anguera, Xavier and Reuter, Timo and Jones, Gareth J. F. and Ionescu, Bogdan and Schedl, Markus and Piatrik, Tomas and Hauff, Claudia and Soleymani, Mohammad},
  title = {Overview of the MediaEval 2013 Visual Privacy Task},
  booktitle = {Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop},
  publisher = {CEUR-WS.org},
  year = {2013},
  volume = {1043},
  note = {fix google scholar description.},
  url = {http://ceur-ws.org/Vol-1043}
}
Blasi SG, Peixoto E and Izquierdo E (2013), "Enhanced inter-prediction using Merge Prediction Transformation in the HEVC codec", In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. Vancouver, British Columbia, May, 2013, pp. 1709-1713. IEEE.
Abstract: Merge prediction is a novel technique introduced in the HEVC standard to improve inter-prediction by exploiting redundancy of the motion information. We propose in this paper a new approach to enhance the Merge mode in a typical HEVC encoder using parametric transformations of the Merge prediction candidates. An Enhanced Inter-Prediction module is implemented in HEVC using Merge Prediction Transformation (MPT), integrated with the new HEVC features such as the large coding units (CU) and the recursive prediction unit partitioning. The MPT parameters are quantised according to the CU depth and the current QP. The optimal quantisation steps are derived via statistical analysis as illustrated in the paper. Results show consistent improvements over conventional HEVC encoding in terms of rate-distortion performance, with a small impact on the encoding complexity and negligible impact on the decoding complexity.
BibTeX:
@inproceedings{blasi2013enhanced,
  author = {Blasi, Saverio G. and Peixoto, Eduardo and Izquierdo, Ebroul},
  title = {Enhanced inter-prediction using Merge Prediction Transformation in the HEVC codec},
  booktitle = {Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on},
  publisher = {IEEE},
  year = {2013},
  pages = {1709--1713},
  note = {google scholar entry: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, British Columbia, 26-31 May 2013.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6637944},
  doi = {10.1109/ICASSP.2013.6637944}
}
Blasi SG, Peixoto E and Izquierdo E (2013), "Mode decision with enhanced inter-prediction in HEVC", In Proceedings of the 20th IEEE International Conference on Image Processing (ICIP 2013). Melbourne, Australia, September, 2013, pp. 1962-1966. IEEE.
Abstract: The HEVC standard makes use of inter-prediction to exploit temporal redundancy in order to obtain efficient compression. A novel approach is proposed in this paper where the inter-prediction for a coding unit (CU) is further enhanced by means of simple parametric transformations. The Enhanced Inter-Prediction (EIP) module is embedded within the mode-decision module at a CU level, to obtain a more accurate inter-prediction and hence possibly reducing the residuals before transform and quantisation. The EIP parameters are encoded in the CU header, and the exact rate-distortion (RD) cost after reconstruction is computed to make sure that the EIP is only used when it is effective. The approach is also improved by means of efficient quantisation of the EIP parameters. The optimal quantisation steps for each CU are found following from analytical considerations as illustrated in the paper. Results show consistent improvements over conventional HEVC encoding in terms of PSNR and bitrate performance.
BibTeX:
@inproceedings{blasi2013mode,
  author = {Blasi, Saverio G. and Peixoto, Eduardo and Izquierdo, Ebroul},
  title = {Mode decision with enhanced inter-prediction in HEVC},
  booktitle = {Proceedings of the 20th IEEE International Conference on Image Processing (ICIP 2013)},
  publisher = {IEEE},
  year = {2013},
  pages = {1962--1966},
  note = {google scholar entry: 20th International Conference on Image Processing (ICIP 2013). Melbourne, Australia, 15-18 September 2013.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6738404},
  doi = {10.1109/ICIP.2013.6738404}
}
Bozas K and Izquierdo E (2013), "Discriminant Pairwise Local Embeddings", In Multimedia and Expo Workshops (ICMEW), 2013 IEEE International Conference on. San Jose, California, July, 2013, pp. 1-4. IEEE.
Abstract: This paper introduces Discriminant Pairwise Local Embeddings (DPLE) a supervised dimensionality reduction technique that generates structure preserving discriminant subspaces. This objective is achieved through a convex optimization formulation where Euclidean distances between data pairs that belong to the same class are minimized, while those of pairs belonging to different classes are maximized. These pairwise relations are encoded in two matrices and weighted with the data affinity matrix to ensure local structure preservation. The discriminant efficiency of our technique is demonstrated in two popular applications, face and sketch recognition, where DPLE outperforms competitive manifold learning algorithms. A kernelized version of DPLE, that further enhances recognition accuracy, is also explained.
BibTeX:
@inproceedings{bozas2013discriminant,
  author = {Bozas, Konstantinos and Izquierdo, Ebroul},
  title = {Discriminant Pairwise Local Embeddings},
  booktitle = {Multimedia and Expo Workshops (ICMEW), 2013 IEEE International Conference on},
  publisher = {IEEE},
  year = {2013},
  pages = {1--4},
  note = {google scholar entry: 2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW 2013). San Jose, California, 15-19 July 2013.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6618312},
  doi = {10.1109/ICMEW.2013.6618312}
}
Brenner M and Izquierdo E (2013), "Event-driven retrieval in collaborative photo collections", In Image Analysis for Multimedia Interactive Services (WIAMIS 2013), 14th International Workshop on. Paris, France, July, 2013, pp. 1-4. IEEE.
BibTeX:
@inproceedings{brenner2013event,
  author = {Brenner, Markus and Izquierdo, Ebroul},
  title = {Event-driven retrieval in collaborative photo collections},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2013), 14th International Workshop on},
  publisher = {IEEE},
  year = {2013},
  pages = {1--4},
  note = {google scholar entry: 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2013). Paris, France, 3-5 July 2013.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6616121},
  doi = {10.1109/WIAMIS.2013.6616121}
}
Brenner M and Izquierdo E (2013), "Gender-aided people recognition in photo collections", In 2013 IEEE International Workshop on Multimedia Signal Processing (MMSP 2013). Pula (Sardinia), Italy, 30 September - 2 October, 2013, pp. 35-39.
Abstract: We show how to recognize people based on their faces in Consumer Photo Collections while also incorporating context in the form of gender information. We devise and explore a unified framework that has a graphical model along a distance-based face description method at its core. We jointly recognize people across an entire photo collection to also consider the specifics of photos that depict multiple people. Experiments on two datasets demonstrate and validate the effectiveness of our probabilistic approach compared to traditional methods that do not consider gender information.
BibTeX:
@inproceedings{brenner2013gender,
  author = {Brenner, Markus and Izquierdo, Ebroul},
  title = {Gender-aided people recognition in photo collections},
  booktitle = {2013 IEEE International Workshop on Multimedia Signal Processing (MMSP 2013). Pula (Sardinia), Italy, 30 September - 2 October, 2013.},
  year = {2013},
  pages = {35--39},
  note = {google scholar entry: 2013 IEEE International Workshop on Multimedia Signal Processing (MMSP 2013). Pula (Sardinia), Italy, 30 September - 2 October 2013.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6659260},
  doi = {10.1109/MMSP.2013.6659260}
}
Brenner M and Izquierdo E (2013), "MediaEval 2013: Social Event Detection, Retrieval and Classification in Collaborative Photo Collections", In Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop. Barcelona, Catalonia, October, 2013. Vol. 1043, pp. 1-2.
Abstract: We present a framework to detect social events, retrieve associated photos and classify the photos according to event types in collaborative photo collections as part of the MediaEval 2013 benchmarks. We incorporate various contextual cues using both a constraint-based clustering model and a classification model. Experiments based on the MediaEval Social Event Detection Dataset demonstrate the effectiveness of our approach.
BibTeX:
@inproceedings{brenner2013mediaeval,
  author = {Brenner, Markus and Izquierdo, Ebroul},
  editor = {Larson, Martha A. and Anguera, Xavier and Reuter, Timo and Jones, Gareth J. F. and Ionescu, Bogdan and Schedl, Markus and Piatrik, Tomas and Hauff, Claudia and Soleymani, Mohammad},
  title = {MediaEval 2013: Social Event Detection, Retrieval and Classification in Collaborative Photo Collections},
  booktitle = {Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop},
  year = {2013},
  volume = {1043},
  pages = {1--2},
  note = {google scholar entry: MediaEval 2013 Multimedia Benchmark Workshop. Barcelona, Catalonia, 18-19 October 2013.},
  url = {http://ceur-ws.org/Vol-927/}
}
Brenner M and Izquierdo E (2013), "Mining People's Appearances to Improve Recognition in Photo Collections", In Advances in Multimedia Modeling (MMM 2013), 19th International Conference. Huangshan, China, January 7-9, 2013. Proceedings. Huangshan, China, January, 2013. Vol. 7732, pp. 185-195. Springer.
Abstract: We show how to recognize people in Consumer Photo Collections by employing a graphical model together with a distance-based face description method. To further improve recognition performance, we incorporate context in the form of social semantics. We devise an approach that has a data mining technique at its core to discover and incorporate patterns of groups of people frequently appearing together in photos. We demonstrate the effect of our probabilistic approach through experiments on a dataset that spans nearly ten years.
BibTeX:
@inproceedings{brenner2013mining,
  author = {Brenner, Markus and Izquierdo, Ebroul},
  editor = {Li, Shipeng and Saddik, Abdulmotaleb and Wang, Meng and Mei, Tao and Sebe, Nicu and Yan, Shuicheng and Hong, Richang and Gurrin, Cathal},
  title = {Mining People's Appearances to Improve Recognition in Photo Collections},
  booktitle = {Advances in Multimedia Modeling (MMM 2013), 19th International Conference. Huangshan, China, January 7-9, 2013. Proceedings.},
  publisher = {Springer},
  year = {2013},
  volume = {7732},
  pages = {185--195},
  note = {google scholar entry: 19th International Conference on Multimedia Modeling (MMM 2013). Huangshan, China, 7-9 January 2013.},
  url = {http://link.springer.com/chapter/10.1007/978-3-642-35725-1_17},
  doi = {10.1007/978-3-642-35725-1_17}
}
Brenner M and Izquierdo E (2013), "People Recognition in Ambiguously Labeled Photo Collections", In Multimedia and Expo (ICME), Proceedings of the 2013 IEEE International Conference on. Paris, France, July, 2013, pp. 1-6.
Abstract: We show how to recognize people based on their faces in Consumer Photo Collections while also incorporating context in the form of ambiguous labels. Such labels can be assigned to single photos (depicting multiple people) as well as to entire sets of photos (e.g. relating to events). To achieve this, we devise a unified framework that has a graphical model along a distance-based face description method at its core. We evaluate our probabilistic approach by performing experiments on two datasets, one of which includes around 5000 face appearances spanning nearly ten years.
BibTeX:
@inproceedings{brenner2013people,
  author = {Brenner, Markus and Izquierdo, Ebroul},
  title = {People Recognition in Ambiguously Labeled Photo Collections},
  booktitle = {Multimedia and Expo (ICME), Proceedings of the 2013 IEEE International Conference on},
  year = {2013},
  pages = {1--6},
  note = {google scholar entry: 2013 IEEE International Conference on Multimedia and Expo (ICME 2013). San Jose, California, 15-19 July 2013.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6607603},
  doi = {10.1109/ICME.2013.6607603}
}
Brenner M and Izquierdo E (2013), "Recognizing People by Face and Body in Photo Collections", In Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG 2013). Shanghai, China, April, 2013, pp. 1-7. IEEE.
Abstract: We show how to detect and recognize people based on their faces and bodies in Consumer Photo Collections. We devise a graphical model that incorporates multiple contextual cues to discriminate faces, upper and lower bodies, and ultimately, individuals without relying on faces. For efficiency, we only consider body features when faces are not discriminative enough. Experiments on two datasets demonstrate the effectiveness of our probabilistic approach.
BibTeX:
@inproceedings{brenner2013recognizing,
  author = {Brenner, Markus and Izquierdo, Ebroul},
  title = {Recognizing People by Face and Body in Photo Collections},
  booktitle = {Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG 2013)},
  publisher = {IEEE},
  year = {2013},
  pages = {1--7},
  note = {google scholar entry: 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG 2013). Shanghai, China, 22-26 April 2013.},
  url = {http://www.computer.org/csdl/proceedings/fg/2013/5545/00/06553751.pdf},
  doi = {10.1109/FG.2013.6553751}
}
Chandramouli K, Fernandez Arguedas V and Izquierdo E (2013), "Knowledge Modeling for Privacy-by-Design in Smart Surveillance Solution", In Advanced Video and Signal Based Surveillance (AVSS 2013), 10th IEEE International Conference on. Kraków, Poland, August, 2013, pp. 171-176. IEEE.
Abstract: As new information and communications systems are being equipped with more aggressive capabilities to enable smart surveillance, individuals' private and ethical data is more exposed to potential threats. Consequently, the attention of researchers and policy makers has become increasingly focused on controlling the emerging threats to privacy. In order to ensure that a surveillance system framework complies with the legal, ethical and privacy requirements of the law, in this paper we present a Surveillance Ontology extending the SKOS foundational ontology. The fundamental principles of privacy-by-design (PbD) demand that the surveillance framework consider data minimization, user control, accountability and data separation. Hence, the objective of this ontology is to translate the high-level linguistic rules into the information that can be processed and used to assess the compliance of the video analysis module with the rules defined.
BibTeX:
@inproceedings{chandramouli2013knowledge,
  author = {Chandramouli, Krishna and Fernandez Arguedas, Virginia and Izquierdo, Ebroul},
  title = {Knowledge Modeling for Privacy-by-Design in Smart Surveillance Solution},
  booktitle = {Advanced Video and Signal Based Surveillance (AVSS 2013), 10th IEEE International Conference on},
  publisher = {IEEE},
  year = {2013},
  pages = {171--176},
  note = {google scholar entry: 10th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2013). Kraków, Poland, 27-30 August 2013.},
  url = {http://www.computer.org/csdl/proceedings/avss/2013/9999/00/06636635.pdf},
  doi = {10.1109/AVSS.2013.6636635}
}
Fechteler P, Hilsmann A, Eisert P, Broeck SV, Stevens C, Wall J, Sanna M, Mauro DA, Kuijk F, Mekuria R, César P, Monaghan DS, O'Connor NE, Daras P, Alexiadis DS and Zahariadis TB (2013), "A Framework for Realistic 3D Tele-Immersion", In Proceedings of the 6th International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications. Berlin, Germany, June, 2013, pp. 12:1-12:8. ACM.
Abstract: Meeting, socializing and conversing online with a group of people using teleconferencing systems is still quite different from the experience of meeting face to face. We are abruptly aware that we are online and that the people we are engaging with are not in close proximity, analogous to how talking on the telephone does not replicate the experience of talking in person. Several causes for these differences have been identified and we propose inspiring and innovative solutions to these hurdles in an attempt to provide a more realistic, believable and engaging online conversational experience. We present the distributed and scalable framework REVERIE that provides a balanced mix of these solutions. Applications built on top of the REVERIE framework will be able to provide interactive, immersive, photo-realistic experiences to a multitude of users that for them will feel much more similar to having face to face meetings than the experience offered by conventional teleconferencing systems.
BibTeX:
@inproceedings{fechteler2013framework,
  author = {Fechteler, Philipp and Hilsmann, Anna and Eisert, Peter and Van Broeck, Sigurd and Stevens, C. and Wall, Julie and Sanna, Michele and Mauro, Davide A. and Kuijk, Fons and Mekuria, Rufael and César, Pablo and Monaghan, David S. and O'Connor, Noel E. and Daras, Petros and Alexiadis, Dimitrios S. and Zahariadis, Theodore B.},
  editor = {Eisert, Peter and Gagalowicz, André},
  title = {A Framework for Realistic 3D Tele-Immersion},
  booktitle = {Proceedings of the 6th International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications},
  publisher = {ACM},
  year = {2013},
  pages = {12:1--12:8},
  note = {google scholar entry: 6th International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications (MIRAGE 2013). Berlin, Germany, 6-7 June 2013.},
  url = {http://doras.dcu.ie/18171/1/mirage2013-reverie.pdf},
  doi = {10.1145/2466715.2466718}
}
Fernandez Arguedas V, Izquierdo E and Chandramouli K (2013), "Surveillance Ontology for Legal, Ethical and Privacy Protection based on SKOS", In Digital Signal Processing (DSP 2013), 18th International Conference on. Santorini, Greece, July, 2013, pp. 1-5. IEEE.
Abstract: In a world filled with heightened vandalism and terrorist activities, video surveillance forms an integral part of any incident investigation. While aiming to provide safety and security for the citizens, CCTV cameras are exponentially deployed around metropolitan cities. However, the information thus collected and further processed should comply with the legal and ethical rules defined by the law. The primary ethical issue invoked by surveillance activities in general is that of privacy. In order to ensure that a surveillance system framework complies with the legal, ethical and privacy requirements of the law, in this paper we present a Surveillance Ontology extending the SKOS foundational ontology. The objective of this ontology is to translate the high-level linguistic rules into the information that can be processed and used to assess the compliance of the video analysis module with the rules defined.
BibTeX:
@inproceedings{fernandez2013surveillance,
  author = {Fernandez Arguedas, Virginia and Izquierdo, Ebroul and Chandramouli, Krishna},
  title = {Surveillance Ontology for Legal, Ethical and Privacy Protection based on SKOS},
  booktitle = {Digital Signal Processing (DSP 2013), 18th International Conference on},
  publisher = {IEEE},
  year = {2013},
  pages = {1--5},
  note = {google scholar entry: 18th International Conference on Digital Signal Processing (DSP 2013). Santorini, Greece, 1-3 July 2013.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6622811},
  doi = {10.1109/ICDSP.2013.6622811}
}
Klavdianos P, Mansouri A and Meriaudeau F (2013), "Gestalt-inspired features extraction for object category recognition", In Proceedings of the 20th IEEE International Conference on Image Processing (ICIP 2013). Melbourne, Australia, September, 2013, pp. 4330-4334. IEEE.
Abstract: We propose a methodology inspired by Gestalt laws to extract and combine features and we test it on the object category recognition problem. Gestalt is a psycho-visual theory of Perceptual Organization that aims to explain how visual information is organized by our brain. We interpreted its laws of homogeneity and continuation in link with shape and color to devise new features beyond the classical proximity and similarity laws. The shape of the object is analyzed based on its skeleton (good continuation) and as a measure of homogeneity, we propose self-similarity enclosed within shape computed at super-pixel level. Furthermore, we propose a framework to combine these features in different ways and we test it on Caltech 101 database. The results are good and show that such an approach improves objectively the efficiency in the task of object category recognition.
BibTeX:
@inproceedings{Klavdianos2013,
  author = {Klavdianos, Patrycia and Mansouri, Alamin and Meriaudeau, Fabrice},
  title = {Gestalt-inspired features extraction for object category recognition},
  booktitle = {Proceedings of the 20th IEEE International Conference on Image Processing (ICIP 2013)},
  publisher = {IEEE},
  year = {2013},
  pages = {4330--4334},
  note = {google scholar entry: 20th International Conference on Image Processing (ICIP 2013). Melbourne, Australia, 15-18 September 2013.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6738892},
  doi = {10.1109/ICIP.2013.6738892}
}
Klavdianos P, Zhang Q and Izquierdo E (2013), "A concise survey for 3D reconstruction of building façades", In Image Analysis for Multimedia Interactive Services (WIAMIS 2013), 14th International Workshop on. Paris, France, July, 2013, pp. 1-4. IEEE.
Abstract: 3D façade modeling consists of representing architectural elements of a building in detail so that both geometry and appearance are generated in a photo-realistic 3D scene. In this article, we review four main techniques used to solve this problem: photogrammetry, sparse and dense reconstruction based on SfM (Structure from Motion) and procedural modeling. We provide a comparison of several methods in these categories by considering their main advantages and limitations.
BibTeX:
@inproceedings{Klavdianos2013a,
  author = {Klavdianos, Patrycia and Zhang, Qianni and Izquierdo, Ebroul},
  title = {A concise survey for 3D reconstruction of building façades},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2013), 14th International Workshop on},
  publisher = {IEEE},
  year = {2013},
  pages = {1--4},
  note = {google scholar entry: 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2013). Paris, France, 3-5 July 2013.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6616170},
  doi = {10.1109/WIAMIS.2013.6616170}
}
Kuijk F, Van Broeck S, Dareau C, Ravenet B, Ochs M, Apostolakis K, Daras P, Monaghan D, O'Connor NE, Wall J and Izquierdo E (2013), "A Framework for Human-like Behavior in an Immersive Virtual World", In Digital Signal Processing (DSP), 2013 18th International Conference on. Santorini, Greece, July, 2013, pp. 1-7. IEEE.
Abstract: Just as readers feel immersed when the story line adheres to their experiences, users will more easily feel immersed in a virtual environment if the behavior of the characters in that environment adheres to their expectations, based on their lifelong observations in the real world. This paper introduces a framework that allows authors to establish natural, human-like behavior, physical interaction and emotional engagement of characters living in a virtual environment. Represented by realistic virtual characters, this framework allows people to feel immersed in an Internet based virtual world in which they can meet and share experiences in a natural way as they can meet and share experiences in real life. Rather than just being visualized in a 3D space, the virtual characters (autonomous agents as well as avatars representing users) in the immersive environment facilitate social interaction and multi-party collaboration, mixing virtual with real.
BibTeX:
@inproceedings{kuijk2013framework,
  author = {Kuijk, Fons and Van Broeck, Sigurd and Dareau, Claude and Ravenet, Brian and Ochs, Magalie and Apostolakis, Konstantinos and Daras, Petros and Monaghan, David and O'Connor, Noel E. and Wall, Julie and Izquierdo, Ebroul},
  title = {A Framework for Human-like Behavior in an Immersive Virtual World},
  booktitle = {Digital Signal Processing (DSP), 2013 18th International Conference on},
  publisher = {IEEE},
  year = {2013},
  pages = {1--7},
  note = {google scholar entry: 18th International Conference on Digital Signal Processing (DSP 2013). Santorini, Greece, 1-3 July 2013.},
  url = {http://doras.dcu.ie/18089/1/DSP2013_camera.pdf},
  doi = {10.1109/ICDSP.2013.6622826}
}
Liu Y, Lin X, Zhang Q and Izquierdo E (2013), "Improved indoor scene geometry recognition from single image based on depth map", In Proceedings of the 11th IEEE IVMSP Workshop (IVMSP 2013). Seoul, Korea, June, 2013. (29), pp. 1-4. IEEE.
Abstract: Interpreting 3D structure from 2D images is a constant problem to be solved in the field of computer vision. Prior work has been made to tackle this issue mainly in two different ways - depth estimation from multiple-view images based on geometric triangulation and depth reasoning from single image depending on monocular depth cues. Both solutions do not involve direct depth map information. In this work, we captured a RGBD dataset using Microsoft Kinect depth sensor. Approximate depth information is acquired as the fourth channel and employed as an extra reference for 3D scene geometry reasoning. It helps to achieve better estimation accuracy. We define nine basic geometric models for general indoor restricted-view scenes. Then we extract low/medium level colour and depth features from all four of the RGBD channels. Sequential Minimal Optimization SVM is used in this work as efficient classification tool. Experiments are implemented to compare the result of this approach with previous work that does not have the depth channel as input.
BibTeX:
@inproceedings{liu2013improved,
  author = {Liu, Yixian and Lin, Xinyu and Zhang, Qianni and Izquierdo, Ebroul},
  title = {Improved indoor scene geometry recognition from single image based on depth map},
  booktitle = {Proceedings of the 11th IEEE IVMSP Workshop (IVMSP 2013)},
  publisher = {IEEE},
  year = {2013},
  number = {29},
  pages = {1--4},
  note = {google scholar entry: 11th IEEE IVMSP Workshop (IVMSP 2013). Seoul, Korea, 10-12 June 2013.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6611938},
  doi = {10.1109/IVMSPW.2013.6611938}
}
Maffon H, Brasil L, Melo J, Klavdianos P, Curilem G and Lamas J (2013), "Modeling the Architecture of an Intelligent Tutoring System Applied to Breast Cancer", In 2013 Pan American Health Care Exchanges (PAHCE 2013). Medellín, Colombia, April, 2013. (267), pp. 1-6. IEEE.
Abstract: This paper presents an Intelligent Tutoring System (ITS) applied to the teaching of anatomy of the female breast, including some types of cancer related to this organ. This ITS is composed of four modules: Student, an Expert System containing a questionnaire for the diagnosis of the learner's profile; Tutor, an Artificial Neural Network Interactive Activation and Competition, for the application of teaching techniques; Domain, some ontologies containing content and related media; and Interface, developed as an Adaptive Hypermedia System. The objective of this work is to lift requirements for the integration of various modeling types that use Artificial Intelligence techniques in the same ITS, even enabling the use of this system in a Medical Simulation Environment. The validation process of this ITS is in progress because the class period has not yet started at three universities: the University of Brasilia (Federal) and the Catholic University of Brasília (Private), both located in Brasilia, Brazil, and the Universidad de La Frontera (Federal) located in Temuco, Chile.
BibTeX:
@inproceedings{maffon2013modeling,
  author = {Maffon, H.P. and Brasil, L.M. and Melo, J.S. and Klavdianos, P.B.L. and Curilem, G.M.J.S. and Lamas, J.M.},
  title = {Modeling the Architecture of an Intelligent Tutoring System Applied to Breast Cancer},
  booktitle = {2013 Pan American Health Care Exchanges (PAHCE 2013)},
  publisher = {IEEE},
  year = {2013},
  number = {267},
  pages = {1--6},
  note = {google scholar entry: 2013 Pan American Health Care Exchanges (PAHCE 2013). Medellín, Colombia, 29 April - 4 May 2013.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6568295},
  doi = {10.1109/PAHCE.2013.6568295}
}
Markatopoulou F, Moumtzidou A, Tzelepis C, Avgerinakis K, Gkalelis N, Vrochidis S, Mezaris V and Kompatsiaris I (2013), "ITI-CERTH participation to TRECVID 2013", In TRECVID 2013 workshop participants notebook papers. Gaithersburg, Maryland, November, 2013. National Institute of Standards and Technology (NIST).
Abstract: This paper provides an overview of the tasks submitted to TRECVID 2013 by ITI-CERTH. ITI-CERTH participated in the Semantic Indexing (SIN), the Event Detection in Internet Multimedia (MED), the Multimedia Event Recounting (MER) and the Instance Search (INS) tasks. In the SIN task, techniques are developed, which combine new video representations (video tomographs) with existing well-performing descriptors such as SIFT, Bag-of-Words for shot representation, ensemble construction techniques and a multi-label learning method for score refinement. In the MED task, an efficient method that uses only static visual features as well as limited audio information is evaluated. In the MER sub-task of MED a discriminant analysis-based feature selection method is combined with a model vector approach for selecting the key semantic entities depicted in the video that best describe the detected event. Finally, the INS task is performed by employing VERGE, which is an interactive retrieval application combining retrieval functionalities in various modalities, used previously for supporting the Known Item Search (KIS) task.
BibTeX:
@inproceedings{Markatopoulou2013,
  author = {Markatopoulou, Foteini and Moumtzidou, Anastasia and Tzelepis, Christos and Avgerinakis, Kostas and Gkalelis, Nikolaos and Vrochidis, Stefanos and Mezaris, Vasileios and Kompatsiaris, Ioannis},
  title = {ITI-CERTH participation to TRECVID 2013},
  booktitle = {TRECVID 2013 workshop participants notebook papers},
  publisher = {National Institute of Standards and Technology (NIST)},
  year = {2013},
  note = {google scholar entry: 2013 TRECVID Workshop (TRECVID 2013). Gaithersburg, Maryland, 20-22 November 2013.},
  url = {http://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.13.org.html}
}
Mauro DA, O'Connor NE, Monaghan D, Gowing M, Fechteler P, Eisert P, Wall J, Izquierdo E, Alexiadis DS, Daras P, Mekuria R and Cesar P (2013), "Advancements and challenges towards a collaborative framework for 3D tele-immersive social networking", In 4th IEEE International Workshop on Hot Topics in 3D (Hot3D). San Jose, California, July, 2013, pp. 1-2. IEEE.
Abstract: Social experiences realized through teleconferencing systems are still quite different from face to face meetings. The awareness that we are online and in a, to some extent, lesser real world are preventing us from really engaging and enjoying the event. Several reasons account for these differences and have been identified. We think it is now time to bridge these gaps and propose inspiring and innovative solutions in order to provide realistic, believable and engaging online experiences. We present a distributed and scalable framework named REVERIE that faces these challenges and provides a mix of these solutions. Applications built on top of the framework will be able to provide interactive, truly immersive, photo-realistic experiences to a multitude of users that for them will feel much more similar to having face to face meetings than the experience offered by conventional teleconferencing systems.
BibTeX:
@inproceedings{Mauro2013,
  author = {Mauro, Davide A. and O'Connor, Noel E. and Monaghan, David and Gowing, Marc and Fechteler, Philipp and Eisert, Peter and Wall, Julie and Izquierdo, Ebroul and Alexiadis, Dimitrios S. and Daras, Petros and Mekuria, Rufael and Cesar, Pablo},
  title = {Advancements and challenges towards a collaborative framework for 3D tele-immersive social networking},
  booktitle = {4th IEEE International Workshop on Hot Topics in 3D (Hot3D)},
  publisher = {IEEE},
  year = {2013},
  pages = {1--2},
  note = {google scholar entry: 4th IEEE International Workshop on Hot Topics in 3D (Hot3D). San Jose, California, 15 July 2013.},
  url = {http://doras.dcu.ie/19680/1/hot3d.pdf}
}
Mekuria R, Sanna M, Asioli S, Izquierdo E, Bulterman D and Cesar P (2013), "A 3D Tele-Immersion System Based on Live Captured Mesh Geometry", In Proceedings of the Third Annual ACM SIGMM Conference on Multimedia Systems (MMSys 2012). Oslo, Norway, February, 2013, pp. PP. ACM.
Abstract: 3D Tele-immersion enables participants in remote locations to share, in real-time, an activity. It offers users natural interactivity and immersive experiences, but it challenges current networking solutions. Work in the past has mainly focused on the efficient delivery of image-based 3D videos and on the realistic rendering and reconstruction of geometry-based 3D objects. The contribution of this paper is a complete media pipeline that allows for geometry-based 3D tele-immersion. Unlike previous approaches, that stream videos or video plus depth estimate, our streaming module can transmit the live-reconstructed 3D representations (triangle meshes). Based on a set of comparative experiments, this paper details the architecture and describes a novel component that can efficiently stream geometry in real-time. This component includes both a novel fast local compression algorithm and a rateless packet protection scheme geared towards the requirements imposed by real-time transmission of live-capture mesh geometry. Tests on a large dataset show an encoding and decoding speed-up of over 10 times at similar compression and quality rates, when compared to the high end MPEG-4 SC3DMC mesh encoder. The implemented rateless code ensures complete packet loss protection of the triangle mesh object and avoids delay introduced by retransmissions. This approach is compared to a streaming mechanism over TCP and outperforms it at packet loss rates over 2% and/or latencies over 9 ms in terms of end-to-end transmission delay. As reported in this paper, the component has been successfully integrated into a larger tele-immersive environment that includes beyond state of the art 3D reconstruction and rendering modules. This resulted in a prototype that can capture, compress transmit and render triangle mesh geometry in real-time over the internet.
BibTeX:
@inproceedings{sanna20133d,
  author = {Mekuria, Rufael and Sanna, Michele and Asioli, Stefano and Izquierdo, Ebroul and Bulterman, Dick and Cesar, Pablo},
  title = {A 3D Tele-Immersion System Based on Live Captured Mesh Geometry},
  booktitle = {Proceedings of the Third Annual ACM SIGMM Conference on Multimedia Systems (MMSys 2012)},
  publisher = {ACM},
  year = {2013},
  pages = {PP},
  note = {google scholar entry: Third Annual ACM SIGMM Conference on Multimedia Systems (MMSys 2012). Oslo, Norway, 27 February - 1 March 2013.}
}
Naccari M, Blasi SG, Mrak M and Izquierdo E (2013), "Improving inter prediction in HEVC with residual DPCM for lossless screen content coding", In Proceedings of the 2013 Picture Coding Symposium (PCS). San Jose, California, December, 2013, pp. 361-364. IEEE.
Abstract: Video content containing computer generated objects is usually denoted as screen content and is becoming popular in applications such as desktop sharing, wireless displays, etc. Screen content images and videos are characterized by high frequency details such as sharp edges and high contrast image areas. On these areas classical lossy encoding tools - spatial transform plus quantization - may significantly compromise their quality and intelligibility. Therefore, lossless coding is used instead and improved coding tools should be specifically devised for screen content. In this context this paper proposes a residual differential pulse code modulation (RDPCM) applied to inter predicted residuals and tested in the context of the HEVC range extension development. The proposed method exploits the spatial correlation present in blocks containing edges or text areas which are poorly predicted by motion compensation. In addition to the baseline inter RDPCM, two improvements to the compression efficiency and the overall throughput are presented and assessed. When compared to HEVC lossless coding as specified in Version 1 of the standard, the proposed algorithm achieves up to 8% average bitrate reduction while not increasing the overall decoding complexity.
BibTeX:
@inproceedings{naccari2013improving,
  author = {Naccari, Matteo and Blasi, Saverio G. and Mrak, Marta and Izquierdo, Ebroul},
  title = {Improving inter prediction in HEVC with residual DPCM for lossless screen content coding},
  booktitle = {Proceedings of the 2013 Picture Coding Symposium (PCS)},
  publisher = {IEEE},
  year = {2013},
  pages = {361--364},
  note = {google scholar entry: 2013 Picture Coding Symposium (PCS). San Jose, California, 8-11 December 2013.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6737758},
  doi = {10.1109/PCS.2013.6737758}
}
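The residual DPCM idea summarised in the abstract above can be illustrated with a minimal sketch. The function names and the purely horizontal direction are illustrative assumptions (the HEVC range extensions also define a vertical mode); the point is that each residual sample is predicted from its left neighbour, so only differences are entropy coded, and the process is exactly invertible for lossless coding:

```python
def rdpcm_encode(block):
    """Horizontal residual DPCM: each residual sample is predicted from its
    left neighbour, so only the differences are entropy coded (lossless)."""
    return [[row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]
            for row in block]

def rdpcm_decode(coded):
    """Invert the DPCM by accumulating the differences along each row."""
    out = []
    for row in coded:
        acc, s = [], 0
        for d in row:
            s += d
            acc.append(s)
        out.append(acc)
    return out

# a toy 2x4 block of inter-prediction residuals
residuals = [[10, 12, 13, 13],
             [7, 7, 9, 8]]
coded = rdpcm_encode(residuals)   # [[10, 2, 1, 0], [7, 0, 2, -1]]
assert rdpcm_decode(coded) == residuals
```

The coded differences are smaller in magnitude than the raw residuals in smoothly varying areas, which is what makes the subsequent entropy coding cheaper.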
Pantoja C, Fernandez Arguedas V and Izquierdo E (2013), "MediaEval 2013 Visual Privacy Task: Pixel Based Anonymisation Technique", In Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop. Barcelona, Catalonia, October, 2013. Vol. 1043, pp. 1-2. CEUR-WS.org.
Abstract: In this paper, we describe the Visual Privacy Task, including its aim, its related dataset and the evaluation methods.
BibTeX:
@inproceedings{Pantoja2013,
  author = {Pantoja, Cesar and Fernandez Arguedas, Virginia and Izquierdo, Ebroul},
  editor = {Larson, Martha A. and Anguera, Xavier and Reuter, Timo and Jones, Gareth J. F. and Ionescu, Bogdan and Schedl, Markus and Piatrik, Tomas and Hauff, Claudia and Soleymani, Mohammad},
  title = {MediaEval 2013 Visual Privacy Task: Pixel Based Anonymisation Technique},
  booktitle = {Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop},
  year = {2013},
  volume = {1043},
  pages = {1--2},
  note = {fix google scholar description.},
  url = {http://ceur-ws.org/Vol-1043/}
}
Pantoja C, Fernandez Arguedas V and Izquierdo E (2013), "Anonymization and De-identification of Personal Surveillance Visual Information: A Review", In Proceedings of the 5th Latin-American Conference on Networked and Electronic Media (LACNEM 2013). Manizales, Colombia, September, 2013, pp. 1-6.
Abstract: The recent widespread adoption of video surveillance systems implies an invasive proactive approach to ensure citizens' security. The ever-increasing amount of recorded information implies a direct threat to citizens' privacy and their right to preserve their personal information. Thus, a general social concern has arisen over the citizens' loss of privacy, demanding new approaches to preserve and protect their privacy, ensuring their anonymity and freedom of action whilst maintaining the surveillance performance. Several approaches have been proposed to preserve sensitive information. In this paper, a review of the existing anonymization and de-identification techniques is presented, categorising them by the domain in which the anonymization is applied and evaluating them with a common framework which takes into account the features and characteristics of each method.
BibTeX:
@inproceedings{pantoja2013lacnem,
  author = {Pantoja, Cesar and Fernandez Arguedas, Virginia and Izquierdo, Ebroul},
  title = {Anonymization and De-identification of Personal Surveillance Visual Information: A Review},
  booktitle = {Proceedings of the 5th Latin-American Conference on Networked and Electronic Media (LACNEM 2013)},
  year = {2013},
  pages = {1--6},
  note = {google scholar entry: 5th Latin-American Conference on Networked and Electronic Media (LACNEM 2013). Manizales, Colombia, 1-4 September 2013.},
  url = {https://zenodo.org/record/17022/files/LACNEM-CP-VF-EI-v6.pdf}
}
Peixoto E, Macchiavello B, Hung EM, Zaghetto A, Shanableh T and Izquierdo E (2013), "An H.264/AVC to HEVC Video Transcoder based on Mode Mapping", In Proceedings of the 20th IEEE International Conference on Image Processing (ICIP 2013). Melbourne, Australia, September, 2013, pp. 4330-4334. IEEE.
Abstract: The emerging video coding standard, HEVC, was developed to replace the current standard, H.264/AVC. However, in order to promote inter-operability with existing systems using the H.264/AVC, transcoding from H.264/AVC to the HEVC codec is highly needed. This paper presents a transcoding solution that uses machine learning techniques in order to map H.264/AVC macroblocks into HEVC coding units (CUs). Two alternatives to build the machine learning model are evaluated. The first uses a static training, where the model is built offline and used to transcode any video sequence. The other uses a dynamic training, with two well-defined stages: a training stage and a transcoding stage. In the training stage, full re-encoding is performed while the H.264/AVC and the HEVC information are gathered. This information is then used to build a model, which is used in the transcoding stage to classify the HEVC CU partitioning. Both solutions are tested with well-known video sequences and evaluated in terms of rate-distortion (RD) and complexity. The proposed method is on average 2.26 times faster than the trivial transcoder using fast motion estimation, while yielding a RD loss of only 3.6% in terms of bitrate.
BibTeX:
@inproceedings{peixoto2013h,
  author = {Peixoto, Eduardo and Macchiavello, Bruno and Hung, Edson Mintsu and Zaghetto, Alexandre and Shanableh, Tamer and Izquierdo, Ebroul},
  title = {An H.264/AVC to HEVC Video Transcoder based on Mode Mapping},
  booktitle = {Proceedings of the 20th IEEE International Conference on Image Processing (ICIP 2013)},
  publisher = {IEEE},
  year = {2013},
  pages = {4330--4334},
  note = {google scholar entry: 20th International Conference on Image Processing (ICIP 2013). Melbourne, Australia, 15-18 September 2013.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6738406},
  doi = {10.1109/ICIP.2013.6738406}
}
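The "static training" variant described in the transcoder abstract above can be sketched with a toy classifier: features gathered from decoded H.264/AVC macroblocks train a model that predicts the HEVC CU partitioning. The feature set (motion-vector magnitude, residual energy) and the single decision stump are illustrative assumptions, not the paper's actual model:

```python
def train_stump(X, y):
    """Learn a one-level decision tree (stump) mapping macroblock features
    to an HEVC CU decision (1 = split the CU, 0 = keep it whole)."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            pred = [1 if row[f] > t else 0 for row in X]
            raw_err = sum(p != label for p, label in zip(pred, y))
            err = min(raw_err, len(y) - raw_err)   # allow inverted polarity
            if best is None or err < best[0]:
                best = (err, f, t, raw_err > len(y) / 2)
    _, f, t, flip = best
    return f, t, flip

def predict(stump, row):
    f, t, flip = stump
    p = 1 if row[f] > t else 0
    return 1 - p if flip else p

# toy training set: [|motion vector|, residual energy] -> CU split decision
X = [[0.2, 5.0], [3.1, 40.0], [0.1, 2.0], [5.5, 90.0]]
y = [0, 1, 0, 1]
stump = train_stump(X, y)
assert predict(stump, [4.0, 70.0]) == 1   # high-activity MB -> split CU
assert predict(stump, [0.3, 3.0]) == 0    # quiet MB -> keep CU whole
```

In the paper's dynamic-training mode the same kind of model would be rebuilt periodically from fully re-encoded frames instead of being trained once offline.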
Sanna M and Izquierdo E (2013), "Proactive Prioritized Mixing of Scalable Video Packets in Push-Based Network Coding Overlays", In 20th International Packet Video Workshop (PV 2013). San Jose, California, December, 2013, pp. 1-7. IEEE.
Abstract: Network coding applied to end-system multicast is a viable solution for a multitude of issues related to on-demand video streaming. End-system multicast on network overlays is a desirable option for relieving the content server from bandwidth bottlenecks and computational load as well as allowing decentralized allocation of resources for terminals with different computational and display capabilities. Network coding has proven to be able to solve many issues related to content distribution and rate allocation on end-system overlays, one of them being the coupon-collection problems typical of P2P networks. In this paper we present a scalable video streaming system based on end-system multicast, where the network coding technique with push-based content distribution is extended to perform prioritized streaming with error and congestion control. We identify a problem of layer and rate selection due to the difficulty in estimating the max-flow in end-system overlays, which, with many previously proposed techniques, yields to bandwidth inefficiencies. We present a mechanism for selecting and encoding chunks of scalable video prior to forwarding, and a peer-selection technique, targeting increased efficiency with the available bandwidth, that also improves quality and continuity of service with better use of network rate. Simulated tests results are presented to prove the performance of our system.
BibTeX:
@inproceedings{sanna2013proactive,
  author = {Sanna, Michele and Izquierdo, Ebroul},
  title = {Proactive Prioritized Mixing of Scalable Video Packets in Push-Based Network Coding Overlays},
  booktitle = {20th International Packet Video Workshop (PV 2013)},
  publisher = {IEEE},
  year = {2013},
  pages = {1--7},
  note = {google scholar entry: 20th International Packet Video Workshop (PV 2013). San Jose, California, 12-13 December 2013.},
  doi = {10.1109/PV.2013.6691456}
}
Sariyanidi E, Gunes H, Gökmen M and Cavallaro A (2013), "Local Zernike Moment Representation for Facial Affect Recognition", In Proceedings of the British Machine Vision Conference (BMVC 2013). Bristol, England, September, 2013. (108), pp. 1-13. BMVA Press.
Abstract: In this paper, we propose to use local Zernike Moments (ZMs) for facial affect recognition and introduce a representation scheme based on performing non-linear encoding on ZMs via quantization. Local ZMs provide a useful and compact description of image discontinuities and texture. We demonstrate the use of this ZM-based representation for posed and discrete as well as naturalistic and continuous affect recognition on standard datasets, and show that ZM-based representations outperform well-established alternative approaches for both tasks. To the best of our knowledge, the performance we achieved on CK+ dataset is superior to all results reported to date.
BibTeX:
@inproceedings{sariyanidi2013local2,
  author = {Sariyanidi, Evangelos and Gunes, Hatice and Gökmen, Muhittin and Cavallaro, Andrea},
  editor = {Burghardt, Tilo and Damen, Dima and Mayol-Cuevas, Walterio and Mirmehdi, Majid},
  title = {Local Zernike Moment Representation for Facial Affect Recognition},
  booktitle = {Proceedings of the British Machine Vision Conference (BMVC 2013)},
  publisher = {BMVA Press},
  year = {2013},
  number = {108},
  pages = {1--13},
  note = {google scholar entry: British Machine Vision Conference (BMVC 2013). Bristol, England, 9-13 September 2013.},
  url = {http://sariyanidi.pythonanywhere.com/media/sariyanidi_bmvc13None.pdf},
  doi = {10.5244/C.27.108}
}
Sariyanidi E, Sencan O and Temeltas H (2013), "Loop Closure Detection Using Local Zernike Moment Patterns", In Proceedings of SPIE 8662 -- Intelligent Robots and Computer Vision XXX: Algorithms and Techniques. Burlingame, California, February, 2013. Vol. 8662(07), pp. 1-7. SPIE.
Abstract: This paper introduces a novel image description technique that aims at appearance based loop closure detection for mobile robotics applications. This technique relies on the local evaluation of the Zernike Moments. Binary patterns, which are referred to as Local Zernike Moment (LZM) patterns, are extracted from images, and these binary patterns are coded using histograms. Each image is represented with a set of histograms, and loop closure is achieved by simply comparing the most recent image with the images in the past trajectory. The technique has been tested on the New College dataset, and as far as we know, it outperforms the other methods in terms of computation efficiency and loop closure precision.
BibTeX:
@inproceedings{sariyanidi2013loop,
  author = {Sariyanidi, Evangelos and Sencan, Onur and Temeltas, Hakan},
  editor = {Röning, Juha and Casasent, David},
  title = {Loop Closure Detection Using Local Zernike Moment Patterns},
  booktitle = {Proceedings of SPIE 8662 -- Intelligent Robots and Computer Vision XXX: Algorithms and Techniques},
  publisher = {SPIE},
  year = {2013},
  volume = {8662},
  number = {07},
  pages = {1--7},
  note = {google scholar entry: Intelligent Robots and Computer Vision XXX: Algorithms and Techniques (SPIE 8662). Burlingame, California, 4-6 February 2013.},
  url = {http://sariyanidi.pythonanywhere.com/media/sariyanidi_spie13None_1.pdf},
  doi = {10.1117/12.2008473}
}
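The loop-closure step in the abstract above ("simply comparing the most recent image with the images in the past trajectory") amounts to a nearest-neighbour search over histogram descriptors. A minimal sketch, assuming an L1 histogram distance and a fixed acceptance threshold (neither is specified in the abstract):

```python
def l1_distance(h1, h2):
    """L1 distance between two equally long LZM-pattern histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_loop(current, past, threshold):
    """Return the index of the most similar past image if its distance to
    the current descriptor falls below `threshold`, else None (no loop)."""
    if not past:
        return None
    i, d = min(enumerate(l1_distance(current, h) for h in past),
               key=lambda x: x[1])
    return i if d < threshold else None

# toy 4-bin histograms for three past images along the trajectory
past = [[9, 1, 0, 0], [0, 5, 5, 0], [0, 0, 2, 8]]
assert detect_loop([8, 2, 0, 0], past, threshold=4) == 0   # revisited place
assert detect_loop([3, 3, 3, 1], past, threshold=4) is None
```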
Tralic D, Zupancic I, Grgic S and Grgic M (2013), "CoMoFoD - New Database for Copy-Move Forgery Detection", In Proceedings of ELMAR-2013, 55th International Symposium (ELMAR-2013). Zadar, Croatia, September, 2013, pp. 49-54. IEEE.
Abstract: Due to the availability of many sophisticated image processing tools, digital image forgery is nowadays very common. One of the common forgery methods is copy-move forgery, where part of an image is copied to another location in the same image with the aim of hiding or adding some image content. Numerous algorithms have been proposed for copy-move forgery detection (CMFD), but only a few benchmarking databases exist for algorithm evaluation. We developed a new database for CMFD that consists of 260 forged image sets. Every image set includes the forged image, two masks and the original image. Images are grouped in 5 categories according to the applied manipulation: translation, rotation, scaling, combination and distortion. Also, postprocessing methods, such as JPEG compression, blurring, noise adding, color reduction etc., are applied to all forged and original images. In this paper we present the database organization and content, the creation of forged images, the postprocessing methods, and database testing. The CoMoFoD database is available at http://www.vcl.fer.hr/comofod.
BibTeX:
@inproceedings{tralic2013comofod,
  author = {Tralic, Dijana and Zupancic, Ivan and Grgic, Sonja and Grgic, Mislav},
  editor = {Božek, Jelena and Grgić, Mislav and Zovko-Cihlar, Branka},
  title = {CoMoFoD - New Database for Copy-Move Forgery Detection},
  booktitle = {Proceedings of ELMAR-2013, 55th International Symposium (ELMAR-2013)},
  publisher = {IEEE},
  year = {2013},
  pages = {49--54},
  note = {google scholar entry: 55th International Symposium (ELMAR 2013). Zadar, Croatia, 25-27 September 2013.},
  url = {http://www.researchgate.net/profile/Dijana_Tralic/publication/266927943_CoMoFoD_-New_Database_for_Copy-Move_Forgery_Detection/links/543f79120cf21c84f23cd6f2.pdf}
}
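The translation-type forgeries in the database described above can be reproduced in a few lines: a block is copied to another location in the same image, and a binary mask marks the pasted region. Grayscale 2-D lists are used here for simplicity (CoMoFoD itself uses colour images and also provides a mask for the source region):

```python
def copy_move(img, src, size, dst):
    """Create a translation copy-move forgery: copy the size x size block
    whose top-left corner is `src` and paste it at `dst`. Returns the
    forged image and the ground-truth mask of the pasted area."""
    forged = [row[:] for row in img]
    mask = [[0] * len(img[0]) for _ in img]
    for dy in range(size):
        for dx in range(size):
            forged[dst[0] + dy][dst[1] + dx] = img[src[0] + dy][src[1] + dx]
            mask[dst[0] + dy][dst[1] + dx] = 1
    return forged, mask

img = [[i * 4 + j for j in range(4)] for i in range(4)]   # toy 4x4 image
forged, mask = copy_move(img, src=(0, 0), size=2, dst=(2, 2))
assert forged[2][2] == img[0][0] and forged[3][3] == img[1][1]
assert sum(map(sum, mask)) == 4   # the 2x2 pasted region is marked
```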
Tzelepis C, Gkalelis N, Mezaris V and Kompatsiaris I (2013), "Improving event detection using related videos and relevance degree support vector machines", In Proceedings of the 21st ACM international conference on Multimedia (MM 2013). Barcelona, Catalunya, October, 2013, pp. 673-676. ACM.
Abstract: In this paper, a new method that exploits related videos for the problem of event detection is proposed, where related videos are videos that are closely but not fully associated with the event of interest. In particular, the Weighted Margin SVM formulation is modified so that related class observations can be effectively incorporated in the optimization problem. The resulting Relevance Degree SVM is especially useful in problems where only a limited number of training observations is provided, e.g. for the EK10Ex subtask of TRECVID MED, where only ten positive and ten related samples are provided for the training of a complex event detector. Experimental results on the TRECVID MED 2011 dataset verify the effectiveness of the proposed method.
BibTeX:
@inproceedings{tzelepis2013improving,
  author = {Tzelepis, Christos and Gkalelis, Nikolaos and Mezaris, Vasileios and Kompatsiaris, Ioannis},
  title = {Improving event detection using related videos and relevance degree support vector machines},
  booktitle = {Proceedings of the 21st ACM international conference on Multimedia (MM 2013)},
  publisher = {ACM},
  year = {2013},
  pages = {673--676},
  note = {google scholar entry: 21st International Conference on Multimedia (MM 2013). Barcelona, Catalunya, 21-25 October 2013.},
  url = {http://dl.acm.org/citation.cfm?id=2502176},
  doi = {10.1145/2502081.2502176}
}
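The core idea of the paper above, giving "related" observations a fractional influence between a full positive sample and an ignored one, can be sketched with a per-sample-weighted hinge loss trained by subgradient descent. This is a simplification of the Weighted Margin SVM formulation; the learning rate, relevance degrees and toy data are illustrative assumptions:

```python
def train_weighted_svm(X, y, weights, lr=0.01, C=1.0, epochs=300):
    """Linear SVM minimising 0.5*||w||^2 + C * sum_i v_i * hinge_i, where
    v_i is the relevance degree of sample i (1.0 = full positive/negative;
    a smaller value, e.g. 0.3, gives a 'related' sample reduced influence)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi, vi in zip(X, y, weights):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            grad_w = list(w)                  # regulariser subgradient
            grad_b = 0.0
            if margin < 1:                    # inside margin: hinge active
                grad_w = [g - C * vi * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b = -C * vi * yi
            w = [wj - lr * g for wj, g in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b

def decide(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

# two positives, one "related" sample (relevance 0.3), two negatives
X = [[2, 2], [3, 3], [1, 1], [-2, -2], [-3, -3]]
y = [1, 1, 1, -1, -1]
v = [1.0, 1.0, 0.3, 1.0, 1.0]
w, b = train_weighted_svm(X, y, v)
assert decide(w, b, [2.5, 2.5]) > 0
assert decide(w, b, [-2.5, -2.5]) < 0
```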
Wall J, Izquierdo E and Zhang Q (2013), "Fuzzy Ensembles for Embedding Adaptive Behaviours in Semi-Autonomous Avatars in 3D Virtual Worlds", In Digital Signal Processing (DSP), 2013 18th International Conference on. Santorini, Greece, July, 2013, pp. 1-6. IEEE.
Abstract: Semi-autonomous avatars should be both realistic and believable. The goal is to learn from and reproduce the behaviours of the user-controlled input to enable semi-autonomous avatars to plausibly interact with their human-controlled counterparts. A powerful tool for embedding autonomous behaviour is learning by imitation. Hence, in this paper an ensemble of fuzzy inference systems cluster the user input data to identify natural groupings within the data to describe the users movement and actions in a more abstract way. Multiple clustering algorithms are investigated along with a neuro-fuzzy classifier; and an ensemble of fuzzy systems are evaluated.
BibTeX:
@inproceedings{Wall2013,
  author = {Wall, Julie and Izquierdo, Ebroul and Zhang, Qianni},
  title = {Fuzzy Ensembles for Embedding Adaptive Behaviours in Semi-Autonomous Avatars in 3D Virtual Worlds},
  booktitle = {Digital Signal Processing (DSP), 2013 18th International Conference on},
  publisher = {IEEE},
  year = {2013},
  pages = {1--6},
  note = {google scholar entry: 18th International Conference on Digital Signal Processing (DSP 2013). Santorini, Greece, 1-3 July 2013.},
  url = {ftp://213.176.96.142/ieee07c1aedc-f2bd-20140331092155.pdf},
  doi = {10.1109/ICDSP.2013.6622818}
}
Yang H and Patras I (2013), "Face Parts Localization Using Structured-output Regression Forests", In Proceedings of the 11th Asian Conference on Computer Vision -- Volume II. Daejeon, Korea, November, 2012. Vol. 2, pp. 667-679. Springer.
Abstract: In this paper, we propose a method for face parts localization called Structured-Output Regression Forests (SO-RF). We assume that the spatial graph of face parts structure can be partitioned into star graphs associated with individual parts. At each leaf, a regression model for an individual part as well as an interdependency model between parts in the star graph is learned. During testing, individual part positions are determined by the product of two voting maps, corresponding to two different models. The part regression model captures local feature evidence while the interdependency model captures the structure configuration. Our method has shown state of the art results on the publicly available BioID dataset and competitive results on a more challenging dataset, namely Labeled Face Parts in the Wild.
BibTeX:
@inproceedings{yang2012face,
  author = {Yang, Heng and Patras, Ioannis},
  editor = {Lee, Kyoung Mu and Matsushita, Yasuyuki and Rehg, James M. and Hu, Zhanyi},
  title = {Face Parts Localization Using Structured-output Regression Forests},
  booktitle = {Proceedings of the 11th Asian Conference on Computer Vision -- Volume II},
  publisher = {Springer},
  year = {2013},
  volume = {2},
  pages = {667--679},
  note = {google scholar entry: 11th Asian Conference on Computer Vision (ACCV 2012). Daejeon, Korea, 5-9 November 2012.},
  url = {http://www.researchgate.net/profile/Heng_Yang3/publication/262361922_Face_parts_localization_using_structured-output_regression_forests/links/54293b960cf26120b7b5af13.pdf},
  doi = {10.1007/978-3-642-37444-9_52}
}
Yang H and Patras I (2013), "Privileged Information-based Conditional Regression Forest for Facial Feature Detection", In Automatic Face and Gesture Recognition (FG 2013), 10th IEEE International Conference and Workshops on. Shanghai, China, April, 2013, pp. 1-6. IEEE.
Abstract: In this paper we propose a method that utilises privileged information, that is information that is available only at the training phase, in order to train Regression Forests for facial feature detection. Our method chooses the split functions at some randomly chose internal tree nodes according to the information gain calculated from the privileged information, such as head pose or gender. In this way the training patches arrive at leaves that tend to have low variance both in displacements to facial points and in privileged information. At each leaf node, we learn both the probability of the privileged information and regression models conditioned on it. During testing, the marginal probability of privileged information is estimated and the facial feature locations are localised using the appropriate conditional regression models. The proposed model is validated by comparing with very recent methods on two challenging datasets, namely Labelled Faces in the Wild and Labelled Face Parts in the Wild.
BibTeX:
@inproceedings{yang2013privileged,
  author = {Yang, Heng and Patras, Ioannis},
  title = {Privileged Information-based Conditional Regression Forest for Facial Feature Detection},
  booktitle = {Automatic Face and Gesture Recognition (FG 2013), 10th IEEE International Conference and Workshops on},
  publisher = {IEEE},
  year = {2013},
  pages = {1--6},
  note = {google scholar entry: 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG 2013). Shanghai, China, 22-26 April 2013.},
  url = {http://www.computer.org/csdl/proceedings/fg/2013/5545/00/06553766.pdf},
  doi = {10.1109/FG.2013.6553766}
}
Yang H and Patras I (2013), "Sieving Regression Forest Votes for Facial Feature Detection in the Wild", In Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV 2013). Sydney, Australia, December, 2013, pp. 1936-1943. IEEE.
Abstract: In this paper we propose a method for the localization of multiple facial features on challenging face images. In the regression forests (RF) framework, observations (patches) that are extracted at several image locations cast votes for the localization of several facial features. In order to filter out votes that are not relevant, we pass them through two types of sieves, that are organised in a cascade, and which enforce geometric constraints. The first sieve filters out votes that are not consistent with a hypothesis for the location of the face center. Several sieves of the second type, one associated with each individual facial point, filter out distant votes. We propose a method that adjusts on-the-fly the proximity threshold of each second type sieve by applying a classifier which, based on middle-level features extracted from voting maps for the facial feature in question, makes a sequence of decisions on whether the threshold should be reduced or not. We validate our proposed method on two challenging datasets with images collected from the Internet in which we obtain state of the art results without resorting to explicit facial shape models. We also show the benefits of our method for proximity threshold adjustment especially on `difficult' face images.
BibTeX:
@inproceedings{yang2013sieving,
  author = {Yang, Heng and Patras, Ioannis},
  title = {Sieving Regression Forest Votes for Facial Feature Detection in the Wild},
  booktitle = {Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV 2013)},
  publisher = {IEEE},
  year = {2013},
  pages = {1936--1943},
  note = {google scholar entry: 2013 IEEE International Conference on Computer Vision (ICCV 2013). Sydney, Australia, 1-8 December 2013.},
  url = {http://www.cv-foundation.org/openaccess/content_iccv_2013/papers/Yang_Sieving_Regression_Forest_2013_ICCV_paper.pdf},
  doi = {10.1109/ICCV.2013.243}
}
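The first-type sieve from the abstract above can be sketched directly: each regression-forest vote pairs a face-centre hypothesis with a facial-point estimate, and votes whose centre hypothesis is inconsistent with the estimated face centre are discarded before the point is localised. The tuple layout and the mean-based localisation are simplifying assumptions (the paper accumulates votes in a voting map):

```python
import math

def sieve_votes(votes, center, radius):
    """First-type sieve: keep only votes whose implied face-centre
    hypothesis lies within `radius` of the estimated face centre.
    Each vote is ((cx, cy), (px, py)): centre and facial-point estimates."""
    return [v for v in votes if math.dist(v[0], center) <= radius]

def localise(votes):
    """Average the surviving facial-point estimates."""
    xs = [p[0] for _, p in votes]
    ys = [p[1] for _, p in votes]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

votes = [((50, 50), (40, 42)), ((52, 49), (41, 43)),
         ((120, 20), (90, 15))]            # last vote: inconsistent centre
kept = sieve_votes(votes, center=(51, 50), radius=10)
assert len(kept) == 2                      # the outlier vote is sieved out
assert localise(kept) == (40.5, 42.5)
```

The second-type sieves would then apply a per-point proximity threshold to the surviving votes, with the paper's classifier deciding on-the-fly whether to tighten that threshold.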
(2013), "Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop" Barcelona, Catalonia, October, 2013. Vol. 1043 CEUR-WS.org.
BibTeX:
@proceedings{larson2013proceedings,
  editor = {Larson, Martha A. and Anguera, Xavier and Reuter, Timo and Jones, Gareth J. F. and Ionescu, Bogdan and Schedl, Markus and Piatrik, Tomas and Hauff, Claudia and Soleymani, Mohammad},
  title = {Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop},
  publisher = {CEUR-WS.org},
  year = {2013},
  volume = {1043},
  note = {fix google scholar description.},
  url = {http://ceur-ws.org/Vol-1043}
}

Presentations, Posters and Technical Reports

Papadopoulos C, Heckman JP, May N, Zhang Q, Lin X, Izquierdo E, Lisagors O and Trouilloud J (2013), "Conversion Method and System". March, 2013.
Abstract: A computer-implemented method for converting a representation of a system into a behaviour model of the system is provided. The method can be used to convert a schematic diagram into a behavioural model.
BibTeX:
@misc{papadopoulos2013conversion,
  author = {Papadopoulos, Christopher and Heckman, Jean Pierre and May, Nicholas and Zhang, Qianni and Lin, Xinyu and Izquierdo, Ebroul and Lisagors, Olegs and Trouilloud, Jean},
  title = {Conversion Method and System},
  year = {2013},
  note = {US Patent 20,130,080,137}
}
Rivera F (2013), "10 Tips for Starting a Second Career in Tech: What I Wish my Advisors Had Told Me".
BibTeX:
@misc{rivera201310,
  author = {Rivera, Fiona},
  title = {10 Tips for Starting a Second Career in Tech: What I Wish my Advisors Had Told Me},
  journal = {Pearson InformIT},
  publisher = {Pearson},
  year = {2013},
  note = {non-mmv},
  url = {http://www.informit.com/articles/article.aspx?p=2142706}
}


2012

Journal Papers

Akram M and Izquierdo E (2012), "Fast motion estimation for surveillance video compression", Signal, Image and Video Processing. June, 2012. Vol. 6(2), pp. 1-10. Springer.
Abstract: In this article, novel approaches to perform efficient motion estimation specific to surveillance video compression are proposed. These include (i) selective, (ii) tracker-based and (iii) multi-frame-based motion estimation. In the selective approach, motion vector search is performed for only those frames that contain some motion activity. In another approach, contrary to performing motion estimation on the encoder side, motion vectors are calculated using information of a surveillance video tracker. This approach is quicker but for some scenarios it degrades the visual perception of the video compared with the selective approach. In an effort to speed up multi-frame motion estimation, we propose a fast multiple reference frames-based motion estimation technique for surveillance videos. Experimental evaluation shows that significant reduction in computational complexity can be achieved by applying the proposed strategies.
BibTeX:
@article{akram2012fast,
  author = {Akram, Muhammad and Izquierdo, Ebroul},
  title = {Fast motion estimation for surveillance video compression},
  journal = {Signal, Image and Video Processing},
  publisher = {Springer},
  year = {2012},
  volume = {6},
  number = {2},
  pages = {1--10},
  url = {http://link.springer.com/article/10.1007/s11760-012-0355-8},
  doi = {10.1007/s11760-012-0355-8}
}
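The "selective" strategy in the abstract above, running the motion-vector search only for frames that contain motion activity, can be sketched with a mean-absolute-difference activity test. The threshold value and the MAD measure are illustrative assumptions; any cheap activity detector would serve the same role:

```python
def mean_abs_diff(prev, curr):
    """Frame activity: mean absolute pixel difference between two frames
    given as 2-D lists of luma samples."""
    total = sum(abs(a - b) for pr, cr in zip(prev, curr)
                           for a, b in zip(pr, cr))
    return total / (len(prev) * len(prev[0]))

def needs_motion_search(prev, curr, thresh=2.0):
    """Selective motion estimation: skip the motion-vector search for
    frames whose activity falls below the threshold (static scene)."""
    return mean_abs_diff(prev, curr) > thresh

static = [[100, 100], [100, 100]]
moving = [[100, 140], [100, 100]]
assert not needs_motion_search(static, static)   # static: skip the search
assert needs_motion_search(static, moving)       # MAD = 10.0 > 2.0
```

Since surveillance scenes are static most of the time, most frames can be coded with zero motion vectors and the expensive search runs only on the active ones.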
Asioli S, Ramzan N and Izquierdo E (2012), "A game theoretic approach to minimum-delay scalable video transmission over P2P", Signal Processing: Image Communication. May, 2012. Vol. 27(5), pp. 513-521.
Abstract: In this paper we describe a game theoretic framework for scalable video streaming over a peer-to-peer network. The proposed system integrates minimum delay functionalities with an incentive provision mechanism for optimal resource allocation. First of all, we introduce an algorithm for packet scheduling that allows users to download a specific sub-set of the original scalable bit-stream, depending on the current network conditions. Furthermore, we present an algorithm that aims both at identifying free-riders and minimising the transmission delay. Uncooperative peers are cut out of this system, while users upload more data to those which have less to share, in order to fully exploit the resources of all peers. Experimental evaluation shows that the proposed model can effectively cope with free-riders and minimise the transmission delay for scalable video transmission by exploiting a packet scheduling algorithm, game theory, and a minimum-delay algorithm.
BibTeX:
@article{asioli2012game,
  author = {Stefano Asioli and Naeem Ramzan and Ebroul Izquierdo},
  title = {A game theoretic approach to minimum-delay scalable video transmission over P2P},
  journal = {Signal Processing: Image Communication},
  year = {2012},
  volume = {27},
  number = {5},
  pages = {513--521},
  note = {Special issue: Advances in 2D/3D Video Streaming over P2P Networks},
  url = {http://www.sciencedirect.com/science/article/pii/S0923596512000410},
  doi = {10.1016/j.image.2012.02.012}
}
Dong L, Su J and Izquierdo E (2012), "Scene-Oriented Hierarchical Classification of Blurry and Noisy Images", Image Processing, IEEE Transactions on. May, 2012. Vol. 21(5), pp. 2534-2545. IEEE.
Abstract: A system for scene-oriented hierarchical classification of blurry and noisy images is proposed. It attempts to simulate important features of the human visual perception. The underlying approach is based on three strategies: extraction of essential signatures captured from a global context, simulating the global pathway; highlight detection based on local conspicuous features of the reconstructed image, simulating the local pathway; and hierarchical classification of extracted features using probabilistic techniques. The techniques involved in hierarchical classification use input from both the local and global pathways. Visual context is exploited by a combination of Gabor filtering with the principal component analysis. In parallel, a pseudo-restoration process is applied together with an affine invariant approach to improve the accuracy in the detection of local conspicuous features. Subsequently, the local conspicuous features and the global essential signature are combined and clustered by a Monte Carlo approach. Finally, clustered features are fed to a self-organizing tree algorithm to generate the final hierarchical classification results. Selected representative results of a comprehensive experimental evaluation validate the proposed system.
BibTeX:
@article{le2012scene,
  author = {Dong, Le and Su, Jiang and Izquierdo, Ebroul},
  title = {Scene-Oriented Hierarchical Classification of Blurry and Noisy Images},
  journal = {Image Processing, IEEE Transactions on},
  publisher = {IEEE},
  year = {2012},
  volume = {21},
  number = {5},
  pages = {2534--2545},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6151149},
  doi = {10.1109/TIP.2012.2187528}
}
Essid S, Lin X, Gowing M, Kordelas G, Aksay A, Kelly P, Fillon T, Zhang Q, Dielmann A, Kitanovski V, Tournemenne R, Masurelle A, Izquierdo E, O'Connor NE, Daras P and Richard G (2012), "A multi-modal dance corpus for research into interaction between humans in virtual environments", Journal on Multimodal User Interfaces. August, 2012, pp. 1-14. Springer.
Abstract: We present a new, freely available, multimodal corpus for research into, amongst other areas, real-time realistic interaction between humans in online virtual environments. The specific corpus scenario focuses on an online dance class application scenario where students, with avatars driven by whatever 3D capture technology is locally available to them, can learn choreographies with teacher guidance in an online virtual dance studio. As the dance corpus is focused on this scenario, it consists of student/teacher dance choreographies concurrently captured at two different sites using a variety of media modalities, including synchronised audio rigs, multiple cameras, wearable inertial measurement devices and depth sensors. In the corpus, each of the several dancers performs a number of fixed choreographies, which are graded according to a number of specific evaluation criteria. In addition, ground-truth dance choreography annotations are provided. Furthermore, for unsynchronised sensor modalities, the corpus also includes distinctive events for data stream synchronisation. The total duration of the recorded content is 1 h and 40 min for each single sensor, amounting to 55 h of recordings across all sensors. Although the dance corpus is tailored specifically for an online dance class application scenario, the data is free to download and use for any research and development purposes.
BibTeX:
@article{Essid2012,
  author = {Essid, Slim and Lin, Xinyu and Gowing, Marc and Kordelas, Georgios and Aksay, Anil and Kelly, Philip and Fillon, Thomas and Zhang, Qianni and Dielmann, Alfred and Kitanovski, Vlado and Tournemenne, Robin and Masurelle, Aymeric and Izquierdo, Ebroul and O'Connor, Noel E. and Daras, Petros and Richard, Gaël},
  title = {A multi-modal dance corpus for research into interaction between humans in virtual environments},
  journal = {Journal on Multimodal User Interfaces},
  publisher = {Springer},
  year = {2012},
  pages = {1--14},
  url = {http://link.springer.com/article/10.1007%2Fs12193-012-0109-5},
  doi = {10.1007/s12193-012-0109-5}
}
Haji Mirza SN, Proulx MJ and Izquierdo E (2012), "Reading Users' Minds From Their Eyes: A Method for Implicit Image Annotation", IEEE Transactions on Multimedia. June, 2012. Vol. 14(3), pp. 805-815.
Abstract: This paper explores the possible solutions for image annotation and retrieval by implicitly monitoring user attention via eye-tracking. Features are extracted from the gaze trajectory of users examining sets of images to provide implicit information on the target template that guides visual attention. Our Gaze Inference System (GIS) is a fuzzy logic based framework that analyzes the gaze-movement features to assign a user interest level (UIL) from 0 to 1 to every image that appeared on the screen. Because some properties of the gaze features are unique for every user, our user adaptive framework builds a new processing system for every new user to achieve higher accuracy. The generated UILs can be used for image annotation purposes; however, the output of our system is not limited as it can be used also for retrieval or other scenarios. The developed framework produces promising and reliable UILs where approximately 53% of target images in the users' minds can be identified by the machine with an error of less than 20% and the top 10% of them with no error. We show in this paper that the existing information in gaze patterns can be employed to improve the machine's judgement of image content by assessment of human interest and attention to the objects inside virtual environments.
BibTeX:
@article{haji2012reading,
  author = {Haji Mirza, Seyed Navid and Proulx, Michael J. and Izquierdo, Ebroul},
  title = {Reading Users' Minds From Their Eyes: A Method for Implicit Image Annotation},
  journal = {IEEE Transactions on Multimedia},
  year = {2012},
  volume = {14},
  number = {3},
  pages = {805--815},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6145688},
  doi = {10.1109/TMM.2012.2186792}
}
Koelstra S, Mühl C, Soleymani M, Lee J, Yazdani A, Ebrahimi T, Pun T, Nijholt A and Patras I (2012), "DEAP: A Database for Emotion Analysis Using Physiological Signals", Affective Computing, IEEE Transactions on. January, 2012. Vol. 3(1), pp. 18-31. IEEE.
Abstract: We present a multimodal data set for the analysis of human affective states. The electroencephalogram (EEG) and peripheral physiological signals of 32 participants were recorded as each watched 40 one-minute long excerpts of music videos. Participants rated each video in terms of the levels of arousal, valence, like/dislike, dominance, and familiarity. For 22 of the 32 participants, frontal face video was also recorded. A novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool. An extensive analysis of the participants' ratings during the experiment is presented. Correlates between the EEG signal frequencies and the participants' ratings are investigated. Methods and results are presented for single-trial classification of arousal, valence, and like/dislike ratings using the modalities of EEG, peripheral physiological signals, and multimedia content analysis. Finally, decision fusion of the classification results from different modalities is performed. The data set is made publicly available and we encourage other researchers to use it for testing their own affective state estimation methods.
BibTeX:
@article{koelstra2012deap,
  author = {Koelstra, Sander and Mühl, Christian and Soleymani, Mohammad and Lee, Jong-Seok and Yazdani, Ashkan and Ebrahimi, Touradj and Pun, Thierry and Nijholt, Anton and Patras, Ioannis},
  title = {DEAP: A Database for Emotion Analysis Using Physiological Signals},
  journal = {Affective Computing, IEEE Transactions on},
  publisher = {IEEE},
  year = {2012},
  volume = {3},
  number = {1},
  pages = {18--31},
  url = {http://www.eecs.qmul.ac.uk/~ioannisp/pubs/ecopies/2012KoelstraPatrasetalTAF.pdf},
  doi = {10.1109/T-AFFC.2011.15}
}
Kumar BGV, Kotsia I and Patras I (2012), "Max-margin Non-negative Matrix Factorization", Image and Vision Computing. Vol. 30(4-5), pp. 279-291. Elsevier.
Abstract: In this paper we introduce a supervised, maximum margin framework for linear and non-linear Non-negative Matrix Factorization. By contrast to existing methods in which the matrix factorization phase (i.e. the feature extraction phase) and the classification phase are separated, we incorporate the maximum margin classification constraints within the NMF formulation. This results to a non-convex constrained optimization problem with respect to the bases and the separating hyperplane, which we solve following a block coordinate descent iterative optimization procedure. At each iteration a set of convex (constrained quadratic or Support Vector Machine-type) sub-problems are solved with respect to subsets of the unknown variables. By doing so, we obtain a bases matrix that maximizes the margin of the classifier in the low dimensional space (in the linear case) or in the high dimensional feature space (in the non-linear case). The proposed algorithms are evaluated on several computer vision problems such as pedestrian detection, image retrieval, facial expression recognition and action recognition where they are shown to consistently outperform schemes that extract features using bases that are learned using semi-NMF and classify them using an SVM classifier.
BibTeX:
@article{kumar2012max,
  author = {Kumar, B. G. Vijay and Kotsia, Irene and Patras, Ioannis},
  title = {Max-margin Non-negative Matrix Factorization},
  journal = {Image and Vision Computing},
  publisher = {Elsevier},
  year = {2012},
  volume = {30},
  number = {4--5},
  pages = {279--291},
  url = {http://www.sciencedirect.com/science/article/pii/S026288561200025X},
  doi = {10.1016/j.imavis.2012.02.010}
}
Lee J-S, De Simone F, Ebrahimi T, Ramzan N and Izquierdo E (2012), "Quality Assessment of Multidimensional Video Scalability", Communications Magazine, IEEE. April, 2012. Vol. 50(4), pp. 38-46. IEEE.
Abstract: Scalability is a powerful concept for adaptive video content delivery to many end users having heterogeneous and dynamic characteristics of networks and devices. In order to maximize users' quality of experience by selecting appropriate combinations of multiple scalability parameters, it is crucial to understand and model the relationship between multidimensional scalability and perceived quality. In this article, we address the latest advances in subjective and objective quality evaluation of multidimensional video scalability for optimal content distribution, present their applications, and discuss future trends and challenges.
BibTeX:
@article{lee2012quality,
  author = {Lee, Jong-Seok and De Simone, Francesca and Ebrahimi, Touradj and Ramzan, Naeem and Izquierdo, Ebroul},
  title = {Quality Assessment of Multidimensional Video Scalability},
  journal = {Communications Magazine, IEEE},
  publisher = {IEEE},
  year = {2012},
  volume = {50},
  number = {4},
  pages = {38--46},
  url = {https://sites.google.com/site/jongseokleehome/files/03_lee_commag12.pdf},
  doi = {10.1109/MCOM.2012.6178832}
}
Peixoto E, Zgaljic T and Izquierdo E (2012), "Transcoding from Hybrid Nonscalable to Wavelet-Based Scalable Video Codecs", Circuits and Systems for Video Technology, IEEE Transactions on. April, 2012. Vol. 22(4), pp. 502-515.
Abstract: Scalable video coding (SVC) enables low complexity adaptation of compressed video, providing an efficient solution for content delivery through heterogeneous networks and to diverse displays. However, legacy video and most commercially available content capturing devices use conventional nonscalable coding, e.g. H.264/AVC. This paper proposes an efficient transcoder from H.264/AVC to a wavelet-based SVC. It aims at exploiting the advantages offered by fine granularity SVC technology when dealing with conventional coders and legacy video. The proposed transcoder was developed to cope with important functionalities of H.264/AVC, such as flexible reference frame (RF) selection. It is able to work with different coding configurations of H.264/AVC, including IPP or IBBP with multiple RFs. Moreover, many of the techniques presented in this paper are generic in the sense that they can be used for transcoding with many popular wavelet-based and hybrid-based video coding architectures. To reduce the transcoder's complexity, motion information and residual data extracted from a compressed H.264/AVC stream are exploited. Experimental results show a very good performance of the proposed transcoder in terms of decoded video quality and system complexity.
BibTeX:
@article{Peixoto2012,
  author = {Peixoto, E. and Zgaljic, T. and Izquierdo, E.},
  title = {Transcoding from Hybrid Nonscalable to Wavelet-Based Scalable Video Codecs},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  year = {2012},
  volume = {22},
  number = {4},
  pages = {502--515},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6019028},
  doi = {10.1109/TCSVT.2011.2168175}
}
Ramzan N, Izquierdo E, Park H, Katsaggelos AK and Pouwelse J (2012), "Special issue on advances in 2D/3D Video Streaming Over P2P Networks", Signal Processing: Image Communication. May, 2012. Vol. 27(5), pp. 379-382. Elsevier.
BibTeX:
@article{ramzan2012special,
  author = {Ramzan, Naeem and Izquierdo, Ebroul and Park, Hyunggon and Katsaggelos, Aggelos K. and Pouwelse, Johan},
  title = {Special issue on advances in 2D/3D Video Streaming Over P2P Networks},
  journal = {Signal Processing: Image Communication},
  publisher = {Elsevier},
  year = {2012},
  volume = {27},
  number = {5},
  pages = {379--382},
  url = {http://www.sciencedirect.com/science/article/pii/S0923596512000823},
  doi = {10.1016/j.image.2012.04.003}
}
Ramzan N, Park H and Izquierdo E (2012), "Video streaming over P2P networks: Challenges and opportunities", Signal Processing: Image Communication. May, 2012. Vol. 27, pp. 401-411. Elsevier.
Abstract: A robust real-time video communication service over the Internet in a distributed manner is an important challenge, as it influences not only the current Internet structure but also the future Internet evolution. In this context, Peer-to-Peer (P2P) networks are playing an imperative position for providing efficient video transmission over the Internet. Recently, several P2P video transmission systems have been proposed for live video streaming services or video-on-demand services over the Internet. In this paper, we describe and discuss existing video streaming systems over P2P. Efficient (delay tolerant and intolerant) data sharing mechanisms in P2P and current video coding trends are elaborated in detail. Moreover, video streaming solutions (live and on-demand) over P2P from the perspective of tree-based and mesh-based systems are explained. Finally, the conclusion is drawn with key challenges and open issues related to video streaming over P2P.
BibTeX:
@article{ramzan2012video,
  author = {Ramzan, Naeem and Park, Hyunggon and Izquierdo, Ebroul},
  title = {Video streaming over P2P networks: Challenges and opportunities},
  journal = {Signal Processing: Image Communication},
  publisher = {Elsevier},
  year = {2012},
  volume = {27},
  pages = {401--411},
  url = {http://www.sciencedirect.com/science/article/pii/S0923596512000331},
  doi = {10.1016/j.image.2012.02.004}
}
Sevillano X, Piatrik T, Chandramouli K, Zhang Q and Izquierdo E (2012), "Indexing Large Online Multimedia Repositories Using Semantic Expansion and Visual Analysis", MultiMedia, IEEE. July, 2012. Vol. 19(3), pp. 53-61. IEEE.
Abstract: The proposed framework automatically predicts user tags for online videos from their visual features and associated textual metadata, which is semantically expanded using complementary textual resources.
BibTeX:
@article{Sevillano2012,
  author = {Sevillano, Xavier and Piatrik, Tomas and Chandramouli, Krishna and Zhang, Qianni and Izquierdo, Ebroul},
  title = {Indexing Large Online Multimedia Repositories Using Semantic Expansion and Visual Analysis},
  journal = {MultiMedia, IEEE},
  publisher = {IEEE},
  year = {2012},
  volume = {19},
  number = {3},
  pages = {53--61},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6205726},
  doi = {10.1109/MMUL.2012.28}
}
Wall J, McDaid LJ, Maguire LP and McGinnity TM (2012), "Spiking Neural Network Model of Sound Localization Using the Interaural Intensity Difference", Neural Networks and Learning Systems, IEEE Transactions on. April, 2012. Vol. 23(4), pp. 574-586. IEEE.
Abstract: In this paper, a spiking neural network (SNN) architecture to simulate the sound localization ability of the mammalian auditory pathways using the interaural intensity difference cue is presented. The lateral superior olive was the inspiration for the architecture, which required the integration of an auditory periphery (cochlea) model and a model of the medial nucleus of the trapezoid body. The SNN uses leaky integrate-and-fire excitatory and inhibitory spiking neurons, facilitating synapses and receptive fields. Experimentally derived head-related transfer function (HRTF) acoustical data from adult domestic cats were employed to train and validate the localization ability of the architecture; training used the supervised learning algorithm called the remote supervision method to determine the azimuthal angles. The experimental results demonstrate that the architecture performs best when it is localizing high-frequency sound data, in agreement with the biology, and also shows a high degree of robustness when the HRTF acoustical data is corrupted by noise.
BibTeX:
@article{wall2012spiking,
  author = {Wall, Julie and McDaid, Liam J. and Maguire, Liam P. and McGinnity, Thomas M.},
  title = {Spiking Neural Network Model of Sound Localization Using the Interaural Intensity Difference},
  journal = {Neural Networks and Learning Systems, IEEE Transactions on},
  publisher = {IEEE},
  year = {2012},
  volume = {23},
  number = {4},
  pages = {574--586},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6145692},
  doi = {10.1109/TNNLS.2011.2178317}
}
Zhang Q and Izquierdo E (2012), "Histology Image Retrieval in Optimised Multi-Feature Spaces", Information Technology in Biomedicine, IEEE Transactions on. November, 2012. Vol. 17(1), pp. 240-249. IEEE.
Abstract: Content based histology image retrieval systems have shown great potential in supporting decision making in clinical activities, teaching, and biological research. In content based image retrieval, feature combination plays a key role. It aims at enhancing the descriptive power of visual features corresponding to semantically meaningful queries. It is particularly valuable in histology image analysis where intelligent mechanisms are needed for interpreting varying tissue composition and architecture into histological concepts. This paper presents an approach to automatically combine heterogeneous visual features for histology image retrieval. The aim is to obtain the most representative fusion model for a particular keyword that is associated to multiple query images. The core of this approach is a multi-objective learning method, which aims to understand an optimal visual-semantic matching function by jointly considering the different preferences of the group of query images. The task is posed as an optimisation problem, and a multi-objective optimisation strategy is employed in order to handle potential contradictions in the query images associated to the same keyword. Experiments were performed on two different collections of histology images. The results show that it is possible to improve a system for content based histology image retrieval by using an appropriately defined multi-feature fusion model, which takes careful consideration of the structure and distribution of visual features.
BibTeX:
@article{Zhang2012,
  author = {Zhang, Qianni and Izquierdo, Ebroul},
  title = {Histology Image Retrieval in Optimised Multi-Feature Spaces},
  journal = {Information Technology in Biomedicine, IEEE Transactions on},
  publisher = {IEEE},
  year = {2012},
  volume = {17},
  number = {1},
  pages = {240--249},
  url = {http://pgembeddedsystems.com/securelogin/upload/project/IEEE/29/PG2013IP001/1.pdf},
  doi = {10.1109/TITB.2012.2227270}
}

Conference Papers

Asioli S, Ramzan N and Izquierdo E (2012), "Exploiting Social Relationships for Free-riders Detection in Minimum-delay P2P Scalable Video Streaming", In IEEE International Conference on Image Processing (ICIP 2012). Proceedings. Orlando, Florida, pp. 2257-2260. IEEE.
Abstract: In this paper we describe a game theoretic framework for scalable video streaming over a peer-to-peer network that exploits social relationships. The proposed system integrates optimal resource allocation functionalities with an incentive provision mechanism for data sharing. First of all, we introduce an algorithm for packet scheduling that allows users to download a specific sub-set of the original scalable bit-stream, depending on the current network conditions. Furthermore, we present an algorithm that aims both at identifying free-riders and minimising transmission delay by exploiting social relationships among peers. Non social/uncooperative peers are cut out of this system, while users upload more data to those which have less to share, in order to fully exploit the resources of all the users.
BibTeX:
@inproceedings{asioli2012exploiting,
  author = {Asioli, Stefano and Ramzan, Naeem and Izquierdo, Ebroul},
  title = {Exploiting Social Relationships for Free-riders Detection in Minimum-delay P2P Scalable Video Streaming},
  booktitle = {IEEE International Conference on Image Processing (ICIP 2012). Proceedings.},
  publisher = {IEEE},
  year = {2012},
  pages = {2257--2260},
  note = {google scholar entry: 19th IEEE International Conference on Image Processing (ICIP 2012). Orlando, Florida, 30 September - 3 October 2012},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6467345},
  doi = {10.1109/ICIP.2012.6467345}
}
Asioli S, Ramzan N and Izquierdo E (2012), "A game theoretic framework for optimal resource allocation in P2P scalable video streaming", In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. Kyoto, Japan, March, 2012, pp. 2293-2296. IEEE.
Abstract: In this paper we describe a game theoretic framework for scalable video streaming over a peer-to-peer network. The proposed system integrates optimal resource allocation functionalities with an incentive provision mechanism for data sharing. First of all, we introduce an algorithm for packet scheduling that allows users to download a specific sub-set of the original scalable bit-stream, depending on the current network conditions. Furthermore, we present an algorithm that aims both at identifying free-riders and minimising transmission delay. Uncooperative peers are cut out of this system, while users upload more data to those which have less to share, in order to fully exploit the resources of all the peers. Experimental evaluation shows that this model can effectively cope with free-riders and minimise transmission delay for scalable video streaming.
BibTeX:
@inproceedings{asioli2012gametheoretic,
  author = {Asioli, Stefano and Ramzan, Naeem and Izquierdo, Ebroul},
  title = {A game theoretic framework for optimal resource allocation in P2P scalable video streaming},
  booktitle = {Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on},
  publisher = {IEEE},
  year = {2012},
  pages = {2293--2296},
  note = {google scholar entry: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Kyoto, Japan, 25-30 March 2012.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6288372},
  doi = {10.1109/ICASSP.2012.6288372}
}
Badii A, Piatrik T, Einig M and Lallah C (2012), "Overview of MediaEval 2012 Visual Privacy Task", In Working Notes Proceedings of the MediaEval 2012 Workshop. Pisa, Italy, October, 2012. Vol. 927, pp. 1-2. CEUR-WS.org.
Abstract: In this paper, we describe the Visual Privacy Task, including its aim, its related dataset and the evaluation methods.
BibTeX:
@inproceedings{badii2012overview,
  author = {Badii, Atta and Piatrik, Tomas and Einig, Mathieu and Lallah, Chattun},
  editor = {Larson, Martha and Schmiedeke, Sebastian and Kelm, Pascal and Rae, Adam and Mezaris, Vasileios and Piatrik, Tomas and Soleymani, Mohammad and Metze, Florian and Jones, Gareth},
  title = {Overview of MediaEval 2012 Visual Privacy Task},
  booktitle = {Working Notes Proceedings of the MediaEval 2012 Workshop},
  publisher = {CEUR-WS.org},
  year = {2012},
  volume = {927},
  pages = {1--2},
  url = {http://ceur-ws.org/Vol-927/}
}
Blasi SG and Izquierdo E (2012), "Residual error curvature estimation and adaptive classification for selective sub-pel precision motion estimation", In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. Kyoto, Japan, March, 2012, pp. 1193-1196. IEEE.
Abstract: We present a novel approach for adaptive precision motion estimation based on a classification of the residual error curvature. A fast algorithm is proposed to estimate the curvature of the interpolated residual surface using the error samples after integer precision motion estimation. We also propose an original technique to compute and successively update a set of thresholds using the information from previously coded frames. The optimal motion vector precision is then selected for each block according to the current thresholds. The approach is compared in terms of PSNR of the motion compensated reconstruction against conventional state of the art sub-pel motion estimation algorithms, and it is shown to efficiently reduce complexity and coding times of a typical video encoder with negligible effects on the prediction accuracy.
BibTeX:
@inproceedings{blasi2012residual,
  author = {Blasi, Saverio G. and Izquierdo, Ebroul},
  title = {Residual error curvature estimation and adaptive classification for selective sub-pel precision motion estimation},
  booktitle = {Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on},
  publisher = {IEEE},
  year = {2012},
  pages = {1193--1196},
  note = {google scholar entry: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Kyoto, Japan, 25-30 March 2012.},
  url = {http://www.mirlab.org/conference_papers/International_Conference/ICASSP%202012/pdfs/0001193.pdf},
  doi = {10.1109/ICASSP.2012.6288101}
}
Bozas K and Izquierdo E (2012), "Large Scale Sketch Based Image Retrieval Using Patch Hashing", In Advances in Visual Computing. 8th International Symposium on Visual Computing (ISVC 2012). Revised Selected Papers. Rethymnon, Crete, July, 2012. Vol. 7431, pp. 210-219. Springer.
Abstract: This paper introduces a hashing based framework that facilitates sketch based image retrieval in large image databases. Instead of exporting a single visual descriptor for every image, an overlapping spatial grid is utilised to generate a pool of patches. We rank similarities between a hand drawn sketch and the natural images in a database through a voting process where near duplicate in terms of shape and structure patches arbitrate for the result. Patch similarity is efficiently estimated with a hashing algorithm. A reverse index structure built on the hashing keys ensures the scalability of our scheme and at the same time allows for real time reranking on query updates. Experiments in a publicly available benchmark dataset demonstrate the superiority of our approach.
BibTeX:
@inproceedings{bozas2012large,
  author = {Bozas, Konstantinos and Izquierdo, Ebroul},
  editor = {Bebis, George and Boyle, Richard and Parvin, Bahram and Koracin, Darko and Fowlkes, Charless and Wang, Sen and Choi, Min-Hyung and Mantler, Stephan and Schulze, Jürgen and Acevedo, Daniel and Mueller, Klaus and Papka, Michael},
  title = {Large Scale Sketch Based Image Retrieval Using Patch Hashing},
  booktitle = {Advances in Visual Computing. 8th International Symposium on Visual Computing (ISVC 2012). Revised Selected Papers.},
  publisher = {Springer},
  year = {2012},
  volume = {7431},
  pages = {210--219},
  note = {google scholar entry: 8th International Symposium on Visual Computing (ISVC 2012). Rethymnon, Crete, 16-18 July 2012.},
  url = {http://link.springer.com/chapter/10.1007/978-3-642-33179-4_21},
  doi = {10.1007/978-3-642-33179-4_21}
}
Brenner M and Izquierdo E (2012), "QMUL@MediaEval 2012: Social Event Detection in Collaborative Photo Collections", In Working Notes Proceedings of the MediaEval 2012 Workshop. Pisa, Italy, October, 2012, pp. 1-2. CEUR-WS.org.
Abstract: We present an approach to detect social events and retrieve associated photos in collaboratively annotated photo collections as part of the MediaEval 2012 Benchmark. We combine data of various modalities from annotated photos as well as from external data sources within a framework that has a classification model at its core. Experiments based on the MediaEval Social Event Detection Dataset demonstrate the effectiveness of our approach.
BibTeX:
@inproceedings{brenner2012mediaeval,
  author = {Brenner, Markus and Izquierdo, Ebroul},
  editor = {Larson, Martha and Schmiedeke, Sebastian and Kelm, Pascal and Rae, Adam and Mezaris, Vasileios and Piatrik, Tomas and Soleymani, Mohammad and Metze, Florian and Jones, Gareth},
  title = {QMUL@MediaEval 2012: Social Event Detection in Collaborative Photo Collections},
  booktitle = {Working Notes Proceedings of the MediaEval 2012 Workshop},
  publisher = {CEUR-WS.org},
  year = {2012},
  pages = {1--2},
  note = {google scholar entry: MediaEval 2012 Workshop. Pisa, Italy, 4-5 October 2012.},
  url = {http://ceur-ws.org/Vol-927/}
}
Brenner M and Izquierdo E (2012), "Social event detection and retrieval in collaborative photo collections", In Proceedings of the 2nd ACM International Conference on Multimedia Retrieval. Hong Kong, China, June, 2012. (21), pp. 1-8. ACM.
Abstract: In this paper, we present an approach to detect social events and retrieve associated photos in collaboratively annotated photo collections. We combine data of various modalities such as time, location, and textual and visual features within a framework that has a classification model at its core. Compared to traditional approaches that mainly consider the photos only as a source of information, we also incorporate external information from datasets and online web services to further improve the performance. Experiments based on the MediaEval Social Event Detection Dataset demonstrate the effectiveness of our approach.
BibTeX:
@inproceedings{brenner2012social,
  author = {Brenner, Markus and Izquierdo, Ebroul},
  title = {Social event detection and retrieval in collaborative photo collections},
  booktitle = {Proceedings of the 2nd ACM International Conference on Multimedia Retrieval},
  publisher = {ACM},
  year = {2012},
  number = {21},
  pages = {1--8},
  note = {google scholar entry: 2nd ACM International Conference on Multimedia Retrieval (ICMR 2012). Hong Kong, China, 5-8 June 2012.},
  url = {http://dl.acm.org/citation.cfm?id=2324823},
  doi = {10.1145/2324796.2324823}
}
Chandramouli K, Piatrik T and Izquierdo E (2012), "Predicting User Tags Using Semantic Expansion", In Eternal Systems, First International Workshop (EternalS 2011). Budapest, Hungary, May, 2012. Vol. 255, pp. 88-99. Springer.
Abstract: Manually annotating content such as Internet videos, is an intellectually expensive and time consuming process. Furthermore, keywords and community-provided tags lack consistency and present numerous irregularities. Addressing the challenge of simplifying and improving the process of tagging online videos, which is potentially not bounded to any particular domain, we present an algorithm for predicting user-tags from the associated textual metadata in this paper. Our approach is centred around extracting named entities exploiting complementary textual resources such as Wikipedia and Wordnet. More specifically to facilitate the extraction of semantically meaningful tags from a largely unstructured textual corpus we developed a natural language processing framework based on GATE architecture. Extending the functionalities of the in-built GATE named entities, the framework integrates a bag-of-articles algorithm for effectively searching through the Wikipedia articles for extracting relevant articles. The proposed framework has been evaluated against MediaEval 2010 Wild Wild Web dataset, which consists of large collection of Internet videos.
BibTeX:
@inproceedings{chandramouli2012predicting,
  author = {Chandramouli, Krishna and Piatrik, Tomas and Izquierdo, Ebroul},
  editor = {Moschitti, Alessandro and Scandariato, Riccardo},
  title = {Predicting User Tags Using Semantic Expansion},
  booktitle = {Eternal Systems, First International Workshop (EternalS 2011)},
  publisher = {Springer},
  year = {2012},
  volume = {255},
  pages = {88--99},
  url = {http://link.springer.com/chapter/10.1007%2F978-3-642-28033-7_8},
  doi = {10.1007/978-3-642-28033-7_8}
}
Chandramouli K, Fernandez Arguedas V and Izquierdo E (2012), "An Unified Knowledge Representation Framework for Surveillance Videos", In Latin-American Conference on Networked and Electronic Media (LACNEM 2012). Santiago de Chile, Chile, October, 2012, pp. 1-4. VTR Banda Ancha (Chile) S.A.
Abstract: Knowledge Representation in the domain of Surveillance has attracted the attention of many researchers from interdisciplinary areas. Several fragmented ontologies are found in the literature for describing events and indexing media items to facilitate semantic retrieval and object annotation. However, with the ever increasing deployment of CCTV throughout the world, it is imperative that ontologies are developed that could unify the relevant and necessary information that are obtained from surveillance installation. Hence, in this paper we present a comprehensive knowledge representation framework for modelling, indexing, classifying and retrieving surveillance videos for forensic applications. The framework integrates multitude dimension of information including geospatial grid representation, temporal semantics, event representation and object annotation through multimedia ontology. The framework thus would enable querying of high-level information that otherwise would not be cross-indexed.
BibTeX:
@inproceedings{chandramouli2012unified,
  author = {Chandramouli, Krishna and Fernandez Arguedas, Virginia and Izquierdo, Ebroul},
  title = {An Unified Knowledge Representation Framework for Surveillance Videos},
  booktitle = {Latin-American Conference on Networked and Electronic Media (LACNEM 2012)},
  publisher = {VTR Banda Ancha (Chile) S.A.},
  year = {2012},
  pages = {1--4},
  note = {google scholar entry: Latin-American Conference on Networked and Electronic Media (LACNEM 2012). Santiago de Chile, Chile, 18 October 2012.},
  url = {http://lacnem.cl/papers}
}
Fernandez Arguedas V, Zhang Q and Izquierdo E (2012), "Bayesian Multimodal Fusion in Forensic Applications", In Computer Vision -- ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings. Firenze, Italy, October, 2012. Vol. 7585, pp. 466-475. Springer.
Abstract: The public location of CCTV cameras and their connexion with public safety demand high robustness and reliability from surveillance systems. This paper focuses on the development of a multimodal fusion technique which exploits the benefits of a Bayesian inference scheme to enhance surveillance systems' reliability. Additionally, an automatic object classifier is proposed based on the multimodal fusion technique, addressing semantic indexing and classification for forensic applications. The proposed Bayesian-based Multimodal Fusion technique, and particularly, the proposed object classifier are evaluated against two state-of-the-art automatic object classifiers on the i-LIDS surveillance dataset.
BibTeX:
@inproceedings{fernandez2012bayesian,
  author = {Fernandez Arguedas, Virginia and Zhang, Qianni and Izquierdo, Ebroul},
  editor = {Fusiello, Andrea and Murino, Vittorio and Cucchiara, Rita},
  title = {Bayesian Multimodal Fusion in Forensic Applications},
  booktitle = {Computer Vision -- ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings},
  publisher = {Springer},
  year = {2012},
  volume = {7585},
  pages = {466--475},
  note = {google scholar entry: 12th European Conference on Computer Vision (ECCV 2012), Florence, Italy, October 7-13, 2012.},
  url = {http://link.springer.com/chapter/10.1007/978-3-642-33885-4_47},
  doi = {10.1007/978-3-642-33885-4_47}
}
Fraternali P, Tagliasacchi M, Martinenghi D, Bozzon A, Catallo I, Ciceri E, Nucci F, Croce V, Altingovde IS, Siberski W, Giunchiglia F, Nejdl W, Larson M, Izquierdo E, Daras P, Chrons O, Traphoener R, Decker B, Lomas J, Aichroth P, Novak J, Sillaume G, Figueroa Sanchez F and Salas-Parra C (2012), "The CUBRIK Project: Human-Enhanced Time-Aware Multimedia Search", In Proceedings of the 21st international conference companion on World Wide Web (WWW 2012). Lyon, France, April, 2012, pp. 259-262. ACM.
Abstract: The Cubrik Project is an Integrated Project of the 7th Framework Programme that aims at contributing to the multimedia search domain by opening the architecture of multimedia search engines to the integration of open source and third party content annotation and query processing components, and by exploiting the contribution of humans and communities in all the phases of multimedia search, from content processing to query processing and relevance feedback processing. The CUBRIK presentation will showcase the architectural concept and scientific background of the project and demonstrate an initial scenario of human-enhanced content and query processing pipeline.
BibTeX:
@inproceedings{fraternali2012cubrik,
  author = {Fraternali, Piero and Tagliasacchi, Marco and Martinenghi, Davide and Bozzon, Alessandro and Catallo, Ilio and Ciceri, Eleonora and Nucci, Francesco and Croce, Vincenzo and Altingovde, Ismail Sengor and Siberski, Wolf and Giunchiglia, Fausto and Nejdl, Wolfgang and Larson, Martha and Izquierdo, Ebroul and Daras, Petros and Chrons, Otto and Traphoener, Ralph and Decker, Bjoern and Lomas, John and Aichroth, Patrick and Novak, Jasminko and Sillaume, Ghislain and Figueroa Sanchez, F. and Salas-Parra, Carolina},
  title = {The CUBRIK Project: Human-Enhanced Time-Aware Multimedia Search},
  booktitle = {Proceedings of the 21st international conference companion on World Wide Web (WWW 2012)},
  publisher = {ACM},
  year = {2012},
  pages = {259--262},
  note = {google scholar entry: 21st international conference companion on World Wide Web (WWW 2012). Lyon, France, 16-20 April 2012.},
  url = {http://www2012.wwwconference.org/proceedings/companion/p267.pdf},
  doi = {10.1145/2187980.2188023}
}
Gündoğdu E and Alatan AA (2012), "Nestle: Interest point extraction via nested circles", In Signal Processing and Communications Applications Conference (SIU), Proceedings of 20th. Muğla, Turkey, April, 2012, pp. 1-4. IEEE.
Abstract: A novel low complexity feature extraction algorithm, only performing by a single comparison per pixel on the average during detection is proposed. While single-scale version of the algorithm remains quite efficient compared against the complexity of the state-of-the-art algorithms, a multi-scale version is also proposed to handle blur and scale changes. The performance tests on the repeatability of these keypoints signify the promising performance of the proposed algorithm to be used in many resource limited computer vision applications due to its efficiency and competitive repeatability performance.
BibTeX:
@inproceedings{gundogdu2012nestle,
  author = {Gündoğdu, Erhan and Alatan, A. Aydın},
  title = {Nestle: Interest point extraction via nested circles},
  booktitle = {Signal Processing and Communications Applications Conference (SIU), Proceedings of 20th},
  publisher = {IEEE},
  year = {2012},
  pages = {1--4},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6204540},
  doi = {10.1109/SIU.2012.6204540}
}
Gündoğdu E and Alatan AA (2012), "Feature detection and matching towards augmented reality applications on mobile devices", In 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2012 Proceedings of. Zurich, Switzerland, October, 2012, pp. 1-4.
Abstract: In this work, a novel feature detection algorithm, a new local binary pattern for local binary description and a tree-based descriptor indexing for descriptor matching are proposed. Similar to well-known FAST detector, proposed feature detector performs detection via pixel intensity comparisons in nested circles. Interest point description is achieved by a novel comparison pattern, whereas matching is performed by a fuzzy decision tree. Based on simulations, it is observed that the proposed system performs competitive or better than the state-of-the-art similar techniques. Moreover, the overall system is capable of running in real-time in a 2.8GHz PC with this promising performance.
BibTeX:
@inproceedings{gundogdu2012feature,
  author = {Gündoğdu, Erhan and Alatan, A. Aydın},
  title = {Feature detection and matching towards augmented reality applications on mobile devices},
  booktitle = {3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2012 Proceedings of},
  year = {2012},
  pages = {1--4},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6365432},
  doi = {10.1109/3DTV.2012.6365432}
}
Kaymak S and Patras I (2012), "Exploiting Depth and Intensity Information for Head Pose Estimation with Random Forests and Tensor Models", In Computer Vision -- ACCV 2012 Workshops. Daejeon, Korea, November, 2012. Vol. 7729, pp. 160-170. Springer.
BibTeX:
@inproceedings{kaymak2012exploiting,
  author = {Kaymak, Sertan and Patras, Ioannis},
  editor = {Park, Jong-Il and Kim, Junmo},
  title = {Exploiting Depth and Intensity Information for Head Pose Estimation with Random Forests and Tensor Models},
  booktitle = {Computer Vision -- ACCV 2012 Workshops},
  publisher = {Springer},
  year = {2012},
  volume = {7729},
  pages = {160--170},
  note = {google scholar entry: Computer Vision -- ACCV 2012 International Workshops. Daejeon, Korea, 5-6 November 2012.},
  url = {http://link.springer.com/chapter/10.1007%2F978-3-642-37484-5_14},
  doi = {10.1007/978-3-642-37484-5_14}
}
Kumar BGV and Patras I (2012), "Learning codebook weights for action detection", In Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on. Providence, Rhode Island, pp. 27-32. IEEE.
Abstract: In this work we present a discriminative codebook weighting approach for action detection. We learn global and local weights for the codewords by considering the spatio-temporal Hough voting space of the training sequences. In contrast to the Implicit Shape Model (ISM) where all the codewords that are matched to a local descriptor cast votes with uniform weights, we learn local weights for the matched codewords. In order to learn the local weights we employ Locality-constrained Linear Coding (LLC). Further, we formulate the learning of the global weights as a convex quadratic programming and use alternating optimization to solve for the weights. We demonstrate the performance of the algorithm on KTH action dataset where we compare with the Hough detector using kmeans codebook.
BibTeX:
@inproceedings{kumar2012learning,
  author = {Kumar, B. G. Vijay and Patras, Ioannis},
  title = {Learning codebook weights for action detection},
  booktitle = {Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on},
  publisher = {IEEE},
  year = {2012},
  pages = {27--32},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6239257},
  doi = {10.1109/CVPRW.2012.6239257}
}
Lin X, Kitanovski V, Zhang Q and Izquierdo E (2012), "Enhanced multi-view dancing videos synchronisation", In Image Analysis for Multimedia Interactive Services (WIAMIS 2012), 13th International Workshop on. Dublin, Ireland, May, 2012, pp. 1-4. IEEE.
Abstract: This paper describes a system for automatically synchronising multi-view video sequences of Salsa dancing recorded with multimodal capturing platform. The multimodal capturing setup consists of audiovisual streams along with depth maps and inertial measurements. Part of the dataset was video sequences captured from machine vision cameras and Microsoft Kinect sensor that were not temporal synchronised during the capturing stage. As an essential step, we proposed efficient solutions for synchronisation of these data based on co-occurrence appearance changes. In order to improve the accuracy, the proposed system employed state-of-art body detection and tracking algorithm to obtain Region of Interest, within which the appearance changes are analysed. The accurately synchronised video set can then be further analysed and augmented for visualisation and evaluation of dancing performance.
BibTeX:
@inproceedings{Lin2012,
  author = {Lin, Xinyu and Kitanovski, Vlado and Zhang, Qianni and Izquierdo, Ebroul},
  title = {Enhanced multi-view dancing videos synchronisation},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2012), 13th International Workshop on},
  publisher = {IEEE},
  year = {2012},
  pages = {1--4},
  note = {google scholar entry: 13th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2012). Dublin, Ireland, 23-25 May 2012.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6226773},
  doi = {10.1109/WIAMIS.2012.6226773}
}
Liu Y, Hao P and Izquierdo E (2012), "Scene Geometric Recognition from Monocular Image", In 6th 3DTV-Conference (3DTV-CON 2012): The True Vision - Capture, Transmission and Display of 3D Video. Zurich, Switzerland, October, 2012, pp. 1-4.
Abstract: In this paper, we propose an approach to detect scene geometrical structure given only one monocular image. Several typical scene geometries are investigated and corresponding models are built. A scene geometry reasoning system is set up based on image statistical features and scene geometric features. This system is able to find best fitting geometric models for most of the images from the benchmark dataset. Scene categorization could reveal important three-dimensional information contained in an image. We demonstrate how this valuable information could be used to reason the depth profile of a specific scene. Planes co-constructing the scene could be detected and located. Experiments have been done to roughly restore the structure of the scene to verify system performance. By our approach, computer could interpret a single image in terms of its geometry straightforwardly, avoiding usual semantically overlapping and deficiency problems.
BibTeX:
@inproceedings{liu2012scene,
  author = {Liu, Yixian and Hao, Pengwei and Izquierdo, Ebroul},
  title = {Scene Geometric Recognition from Monocular Image},
  booktitle = {6th 3DTV-Conference (3DTV-CON 2012): The True Vision - Capture, Transmission and Display of 3D Video},
  year = {2012},
  pages = {1--4},
  note = {google scholar entry: 6th 3DTV-Conference (3DTV-CON 2012). Zurich, Switzerland, 15-17 October 2012.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6365465},
  doi = {10.1109/3DTV.2012.6365465}
}
Liu Y, Hao P and Izquierdo E (2012), "Stage-Based 3D Scene Reconstruction from Single Image", In Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012). Tsukuba, Japan, November, 2012, pp. 1034-1037. IEEE.
Abstract: Holistic scene understanding is a major goal in recent research of computer vision. To deal with this task, reasoning the 3D relationship of components in a scene is identified as one of the key problems. We study this problem in terms of structural reconstruction of 3D scene from single view image. Our first step concentrates on geometrical layout analysis of scene using low-level features. We allocate images into seven recurring and stable geometry classes. This classification labels the image with rough knowledge of its scene geometry. Then, based on this geometry label, we propose an adaptive autonomous scene reconstruction algorithm which adopts specific approaches particularly for different scene types. We show, experimentally, given the right geometry label, low-quality uncalibrated monocular images from the benchmark dataset can be structurally reconstructed in 3D space in a time/effort efficient way. This robust approach does not require high quality or high complexity input image. We demonstrate the effectiveness of this approach in this paper.
BibTeX:
@inproceedings{liu2012stage,
  author = {Liu, Yixian and Hao, Pengwei and Izquierdo, Ebroul},
  title = {Stage-Based 3D Scene Reconstruction from Single Image},
  booktitle = {Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012)},
  publisher = {IEEE},
  year = {2012},
  pages = {1034--1037},
  note = {google scholar entry: 21st International Conference on Pattern Recognition (ICPR 2012). Tsukuba Science City, Japan, 11-15 November 2012.},
  url = {http://vision.unipv.it/VA/VA-En/TEST/Hough%20Poligoni%20e%20rette/Single%20image.pdf}
}
Mekuria R, Sanna M and Cesar P (2012), "Media Synchronization in 3D Tele-Immersion Applications: an architecture", In Proceedings of the 2012 International Workshop on Media Synchronization (MediaSync 2012). Berlin, Germany, October, 2012, pp. 1-6. Google.
Abstract: Network coding applied to end-system multicast is a viable solution for a multitude of issues related to on-demand video streaming. End-system multicast on network overlays is a desirable option for relieving the content server from bandwidth bottlenecks and computational load as well as allowing decentralized allocation of resources for terminals with different computational and display capabilities. Network coding has proven to be able to solve many issues related to content distribution and rate allocation on end-system overlays, one of them being the coupon-collection problems typical of P2P networks. In this paper we present a scalable video streaming system based on end-system multicast, where the network coding technique with push-based content distribution is extended to perform prioritized streaming with error and congestion control. We identify a problem of layer and rate selection due to the difficulty in estimating the max-flow in end-system overlays, which, with many previously proposed techniques, yields to bandwidth inefficiencies. We present a mechanism for selecting and encoding chunks of scalable video prior to forwarding, and a peer-selection technique, targeting increased efficiency with the available bandwidth, that also improves quality and continuity of service with better use of network rate. Simulated tests results are presented to prove the performance of our system.
BibTeX:
@inproceedings{mekuria2012media,
  author = {Mekuria, Rufael and Sanna, Michele and Cesar, Pablo},
  title = {Media Synchronization in 3D Tele-Immersion Applications: an architecture},
  booktitle = {Proceedings of the 2012 International Workshop on Media Synchronization (MediaSync 2012)},
  publisher = {Google},
  year = {2012},
  pages = {1--6},
  note = {google scholar entry: 2012 Media Synchronization Workshop (MediaSync 2012). Berlin, Germany, 11 October 2012.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/3_Mekuria.pdf}
}
Mekuria R, Sanna M and Cesar P (2012), "Media Synchronization in REVERIE (FP7 REVERIE)", In Proceedings of the MediaSync 2012 Workshop. Berlin, Germany, October, 2012, pp. 1-6. ICIN Events.
Abstract: This article discusses the challenges ahead for assuring media synchronization in 3D tele-immersion applications. The discussion is based on an architecture and use cases that are defined in the European FP7 project REVERIE. The architecture allows capturing, transmission, and rendering, in real-time, various types of 3D media streams (e.g. geometry, movements of the participants, 3D audio), with the final objective of enabling immersive communication (and interactions) between remote users. For achieving the final goal, media synchronization is a key requirement. In particular, this article will focus on two types of synchronization: between real-time streams, and between downloaded and real-time streams. The first case refers to the synchronization needed between the different real-time captured media (e.g. 3D audio and visual streams). The second one aims at synchronizing downloaded content (e.g. 3D models) and the media captured in real-time. The solution reported in this article is implemented as a novel real-time streaming engine that can handle various types of 3D media streams. Moreover, based on global timestamps, the engine can provide synchronization support for a variety of scenarios.
BibTeX:
@inproceedings{sanna2012media,
  author = {Mekuria, Rufael and Sanna, Michele and Cesar, Pablo},
  title = {Media Synchronization in REVERIE (FP7 REVERIE)},
  booktitle = {Proceedings of the MediaSync 2012 Workshop},
  publisher = {ICIN Events},
  year = {2012},
  pages = {1--6},
  note = {google scholar entry: Media Synchronization Workshop (MediaSync 2012) [co-located with ICIN 2012]. Berlin, Germany, 11 October 2012.},
  url = {http://www.icin.co.uk/tutorials/2012#workshop1}
}
Peixoto E and Izquierdo E (2012), "A Complexity-scalable Transcoder from H.264/AVC to the new HEVC Codec", In Image Processing (ICIP 2012), Proceedings of the 19th International Conference on. Orlando, Florida, September, 2012, pp. 737-740. IEEE.
Abstract: The emerging video coding standard, HEVC, is currently approaching the final stage of development prior to standardization. However, the current H.264/AVC standard is very successful, and it has been widely adopted for many applications. Thus, transcoding between these codecs will be highly needed once the HEVC codec is finalised. This paper studies the performance of one of the most common techniques for heterogeneous transcoding, motion vector (MV) reuse, in a H.264/AVC to HEVC transcoder. Furthermore, it proposes a new transcoder that is capable of complexity scalability, trading off rate-distortion performance for complexity reduction. The proposed transcoder is based on a new metric to compute the similarity of the H.264/AVCMVs, which is used to decide which HEVC partitions are tested on the transcoder.
BibTeX:
@inproceedings{Peixoto2012a,
  author = {Peixoto, Eduardo and Izquierdo, Ebroul},
  title = {A Complexity-scalable Transcoder from H.264/AVC to the new HEVC Codec},
  booktitle = {Image Processing (ICIP 2012), Proceedings of the 19th International Conference on},
  publisher = {IEEE},
  year = {2012},
  pages = {737--740},
  note = {google scholar entry: 19th International Conference on Image Processing (ICIP 2012). Orlando, Florida, 30 September - 3 October 2012.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Conference/ICIP2012_Eduardo.pdf},
  doi = {10.1109/ICIP.2012.6466965}
}
Piatrik T, Fernandez Arguedas V and Izquierdo E (2012), "The privacy challenges of in-depth video analytics", In Multimedia Signal Processing (MMSP), 2012 IEEE 14th International Workshop on. Banff, Alberta, pp. 383-386. IEEE.
Abstract: The increasing need for both automated and privacy-respecting CCTV systems adds many challenges to the tasks of Video Analytics. The growing capabilities of automated surveillance systems lead to the automatic extraction and processing of complex and potentially personal data related to individuals who may not even be aware of it. This paper discusses the issues related to the processing of potentially sensitive information extracted from various computer vision algorithms and techniques such as person tracking, soft biometrics, behaviour analysis or multi-modal person identification, and delivers insights regarding technical solutions and guidelines for achieving a higher level of privacy compliance.
BibTeX:
@inproceedings{Piatrik2012,
  author = {Piatrik, Tomas and Fernandez Arguedas, Virginia and Izquierdo, Ebroul},
  title = {The privacy challenges of in-depth video analytics},
  booktitle = {Multimedia Signal Processing (MMSP), 2012 IEEE 14th International Workshop on},
  publisher = {IEEE},
  year = {2012},
  pages = {383--386},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6343473},
  doi = {10.1109/MMSP.2012.6343473}
}
Sanna M and Izquierdo E (2012), "A Method for Detection/Deletion via Network Coding for Unequal Error Protection of Scalable Video over Error-Prone Networks", In Mobile Multimedia Communications. 7th International ICST Conference (MobiMedia 2011). Cagliari, Italy, September, 2011. Vol. 79, pp. 105-120. Springer.
Abstract: The development of universal systems for video streaming needs transmission strategies that exploit the characteristics of the transmission medium, such as a wireless network. Scalable video coding allows partial decoding of the video for multiple demands or under severe reception conditions. Network coding increases the transmission rate and provides error control at the network level. We propose a detection/deletion system for error reduction in the presence of channel noise. We combine the error detection capabilities of the network code with erasure decoding and unequal error protection to improve the visual quality of the video.
BibTeX:
@inproceedings{sanna2012method,
  author = {Sanna, Michele and Izquierdo, Ebroul},
  editor = {Atzori, Luigi and Delgado, Jaime and Giusto, Daniele D.},
  title = {A Method for Detection/Deletion via Network Coding for Unequal Error Protection of Scalable Video over Error-Prone Networks},
  booktitle = {Mobile Multimedia Communications. 7th International ICST Conference (MobiMedia 2011).},
  publisher = {Springer},
  year = {2012},
  volume = {79},
  pages = {105--120},
  note = {google scholar entry: 7th International ICST Mobile Multimedia Communications Conference (MobiMedia 2011). Cagliari, Italy, 5-7 September 2011.},
  url = {http://link.springer.com/chapter/10.1007%2F978-3-642-30419-4_11},
  doi = {10.1007/978-3-642-30419-4_11}
}
Sarıyanidi E, Şencan O and Temeltaş H (2012), "An image-to-image loop-closure detection method based on unsupervised landmark extraction", In 2012 IEEE Intelligent Vehicles Symposium (IV 2012). Alcala de Henares, Spain, June, 2012, pp. 420-425. IEEE.
Abstract: This paper presents a dedicated approach to detect loop closures using visually salient patches. We introduce a novel, energy maximization based saliency detection technique which has been used for unsupervised landmark extraction. We explain how to learn the extracted landmarks on-the-fly and re-identify them. Furthermore, we describe the sparse location representation we use to recognize previously seen locations in order to perform reliable loop closure detection. The performance of our method has been analyzed on both an indoor and an outdoor dataset, and it has been shown that our approach achieves quite promising results on both datasets.
BibTeX:
@inproceedings{sariyanidi2012image,
  author = {Sarıyanidi, Evangelos and Şencan, Onur and Temeltaş, Hakan},
  title = {An image-to-image loop-closure detection method based on unsupervised landmark extraction},
  booktitle = {2012 IEEE Intelligent Vehicles Symposium (IV 2012)},
  publisher = {IEEE},
  year = {2012},
  pages = {420--425},
  note = {google scholar entry: IEEE Intelligent Vehicles Symposium (IV 2012). Alcalá de Henares, Spain, 3-7 June 2012.},
  url = {http://www.robotics.itu.edu.tr/files/fl-4fb2166fcaab1.pdf},
  doi = {10.1109/IVS.2012.6232174}
}
Sarıyanidi E, Dağlı V, Tek SC, Tunç B and Gökmen M (2012), "A novel face representation using local Zernike moments", In Proceedings of the 20th Signal Processing and Communications Applications Conference (SIU 2012). Muğla, Turkey, April, 2012, pp. 1-4. IEEE.
Abstract: This study proposes a novel image representation and demonstrates its advantages when used for face recognition. The proposed representation is obtained by computing the global moments, which are popular tools for object and especially character recognition, locally at each pixel, thus decomposing the image into a set of images corresponding to different moment components. Our experiments on the FERET face database indicate the superiority of the proposed method over methods employing Gabor or LBP representations.
BibTeX:
@inproceedings{sariyanidi2012novel,
  author = {Sarıyanidi, Evangelos and Dağlı, Volkan and Tek, Salih Cihan and Tunç, Birkan and Gökmen, Muhittin},
  title = {A novel face representation using local Zernike moments},
  booktitle = {Proceedings of the 20th Signal Processing and Communications Applications Conference (SIU 2012)},
  publisher = {IEEE},
  year = {2012},
  pages = {1--4},
  note = {google scholar entry: 20th Signal Processing and Communications Applications Conference (SIU 2012). Muğla, Turkey, 18-20 April 2012.},
  url = {http://sariyanidi.pythonanywhere.com/media/sariyanidi_siu12None.pdf},
  doi = {10.1109/SIU.2012.6204621}
}
Sarıyanidi E, Tunç B and Gökmen M (2012), "LZM in Action: Realtime Face Recognition System", In Computer Vision (ECCV 2012). Florence, Italy, October, 2012. Vol. 3, pp. 647-650. Springer.
Abstract: In this technical demonstration, we introduce a real time face detection and recognition prototype. The proposed system can work with different image sources such as still images, videos from web cameras, and videos from IP cameras. The captured images are firstly processed by a cascaded classifier of Modified Census Transform (MCT) features to detect the faces. Then, facial features are detected inside the face region. These features are used to align and crop the face patches. The detection phase can be considerably improved by incorporating a tracking scheme to increase the hit rate while decreasing the false alarm rate. The registered faces are recognized using a novel method called Local Zernike Moments (LZM). A probabilistic decision step is employed in the final inference phase to provide a confidence margin. Introducing new identities via the system's user interface is considerably simple since the system does not require retraining after each new identity.
BibTeX:
@inproceedings{sariyanidi2012lzm,
  author = {Sarıyanidi, Evangelos and Tunç, Birkan and Gökmen, Muhittin},
  editor = {Fusiello, Andrea and Murino, Vittorio and Cucchiara, Rita},
  title = {LZM in Action: Realtime Face Recognition System},
  booktitle = {Computer Vision (ECCV 2012)},
  publisher = {Springer},
  year = {2012},
  volume = {3},
  pages = {647--650},
  note = {google scholar entry: European Conference on Computer Vision Workshops and Demonstrations (ECCV 2012). Florence, Italy, 7-13 October 2012.},
  url = {http://sariyanidi.pythonanywhere.com/media/sariyanidi_eccvw12None.pdf},
  doi = {10.1007/978-3-642-33885-4_73}
}
Sarıyanidi E, Dağlı V, Tek SC, Tunç B and Gökmen M (2012), "Local Zernike Moments: A new representation for face recognition", In Image Processing (ICIP 2012), Proceedings of the 19th International Conference on. Orlando, Florida, September, 2012, pp. 585-588. IEEE.
Abstract: In this paper, we propose a new image representation called Local Zernike Moments (LZM) for face recognition. In recent years, local image representations such as Gabor and Local Binary Patterns (LBP) have attracted great interest due to their success in handling difficulties of face recognition. In this study, we aim to develop an alternative representation to further improve face recognition performance. We achieve this by utilizing Zernike moments, which have been successfully used as shape descriptors for character recognition. We modify global Zernike moments to obtain a local representation by computing the moments at every pixel of a face image by considering its local neighborhood, thus decomposing the image into a set of images, moment components, to capture the micro structure around each pixel. Our experiments on the FERET face database reveal the superior performance of LZM over Gabor and LBP representations.
BibTeX:
@inproceedings{sariyanidi2012local,
  author = {Sarıyanidi, Evangelos and Dağlı, Volkan and Tek, Salih Cihan and Tunç, Birkan and Gökmen, Muhittin},
  title = {Local Zernike Moments: A new representation for face recognition},
  booktitle = {Image Processing (ICIP 2012), Proceedings of the 19th International Conference on},
  publisher = {IEEE},
  year = {2012},
  pages = {585--588},
  note = {google scholar entry: 19th International Conference on Image Processing (ICIP 2012). Orlando, Florida, 30 September - 3 October 2012.},
  url = {http://sariyanidi.pythonanywhere.com/media/sariyanidi_icip12None.pdf},
  doi = {10.1109/ICIP.2012.6466927}
}
Sevillano X, Piatrik T, Chandramouli K, Zhang Q and Izquierdo E (2012), "Geo-tagging online videos using semantic expansion and visual analysis", In Image Analysis for Multimedia Interactive Services (WIAMIS), 2012 13th International Workshop on. Dublin, Ireland , pp. 1-4. IEEE.
Abstract: The association of geographical tags to multimedia resources enables browsing and searching online multimedia repositories using geographical criteria, but millions of already online but non-geo-tagged videos and images remain invisible to this type of system. This situation calls for the development of automatic geo-tagging techniques capable of estimating the location where a video or image was taken. This paper presents a bimodal geo-tagging system for online videos based on extracting and expanding the geographical information contained in the textual metadata and on visual similarity criteria. The performance of the proposed system is evaluated on the MediaEval 2011 Placing task data set, and compared against the participants in that workshop.
BibTeX:
@inproceedings{Sevillano2012a,
  author = {Sevillano, Xavier and Piatrik, Tomas and Chandramouli, Krishna and Zhang, Qianni and Izquierdo, Ebroul},
  title = {Geo-tagging online videos using semantic expansion and visual analysis},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS), 2012 13th International Workshop on},
  publisher = {IEEE},
  year = {2012},
  pages = {1--4},
  note = {fix authors on google scholar},
  url = {http://www.researchgate.net/profile/Xavier_Sevillano/publication/233726317_Geo-Tagging_Online_Videos_Using_Semantic_Expansion_and_Visual_Analysis/links/00b7d5232f70547ad8000000.pdf},
  doi = {10.1109/WIAMIS.2012.6226764}
}
Yang H, Liu X and Patras I (2012), "A Simple and Effective Extrinsic Calibration Method of a Camera and a Single Line Scanning Lidar", In Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012). Tsukuba, Japan, November, 2012, pp. 1439-1442. IEEE.
Abstract: In this paper we propose an extrinsic calibration method for a regular camera and a single line scanning lidar, which are widely utilized together. Based on the long-ignored fact that the infra-red (IR) source of commonly used lidars lies in the response range of regular cameras, we employ an auxiliary IR filter (blocking natural light and letting IR pass) in order for the camera to image the scan traces of the lidar by prolonging the exposure time. Then, by scanning a V-shaped target such as the intersection of two smooth walls, corresponding lines (or points) on the scan plane and image are found through line fitting. With these high-confidence correspondences, extrinsic parameters (known camera) or a planar homography (unknown camera) can easily be calculated. Furthermore, two evaluation methods, namely line alignment error and two-view point alignment error, are developed. Experiments show that our method greatly simplifies the calibration procedure and outperforms the state-of-the-art in accuracy by using only one-tenth of the calibration data.
BibTeX:
@inproceedings{yang2012simple,
  author = {Yang, Heng and Liu, Xiaolin and Patras, Ioannis},
  title = {A Simple and Effective Extrinsic Calibration Method of a Camera and a Single Line Scanning Lidar},
  booktitle = {Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012)},
  publisher = {IEEE},
  year = {2012},
  pages = {1439--1442},
  note = {google scholar entry: 21st International Conference on Pattern Recognition (ICPR 2012). Tsukuba, Japan, 11-15 November 2012.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/ICPR12_1396_FI.pdf}
}
Yang H, Zhang Y, Liu X and Patras I (2012), "Coupled 3D Tracking and Pose Optimization of Rigid Objects Using Particle Filter", In Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012). Tsukuba, Japan, November, 2012, pp. 1451-1454. IEEE.
Abstract: In order to track and estimate the pose of known rigid objects with high accuracy in unconstrained environments with light disturbance, scale changes and occlusion, we propose to combine a 3D particle filter (PF) framework with algebraic pose optimization in a closed loop. A new PF observation model based on line similarity in 3D space is devised, and the outputs of 3D PF tracking, namely line correspondences (model edges and image line segments), are provided for algebraic line-based pose optimization. As feedback, the optimized pose serves as a particle with high weight during re-sampling. To speed up the algorithm, a dynamic ROI is used to reduce the line detection and search space. Experiments show that our proposed algorithm can effectively track and accurately estimate the pose of freely moving 3D rigid objects in complex environments.
BibTeX:
@inproceedings{yang2012coupled,
  author = {Yang, Heng and Zhang, Yueqiang and Liu, Xiaolin and Patras, Ioannis},
  title = {Coupled 3D Tracking and Pose Optimization of Rigid Objects Using Particle Filter},
  booktitle = {Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012)},
  publisher = {IEEE},
  year = {2012},
  pages = {1451--1454},
  note = {google scholar entry: 21st International Conference on Pattern Recognition (ICPR 2012). Tsukuba, Japan, 11-15 November 2012.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/ICPR12_1402_FI.pdf}
}

Theses and Monographs

Bangert T (2012), "Color: an algorithmic approach". Thesis at: Queen Mary University of London. October, 2012, pp. 1-93.
Abstract: The fundamentals of colour vision were set out in the mid-19th century but have been split between the empirical observation that the underlying hardware responsible for vision was based upon three classes of physical sensors and the perceptual finding that colour consisted of variations of four underlying indivisible primaries, organized into two opponent pairs (blue-yellow and red-green). One of the great advances in the understanding of colour vision was developing an understanding of the mechanism of opponency that makes up the first layer of the neural circuitry that resides directly behind the sensor array of the human visual system. Two opponent colour channels were found, precisely as predicted by the study of perception. Despite the fact that the neural processing circuitry of the visual sensor array consists of only two or three layers of neurons, little further progress has been made to decipher the functionality of subsequent layers. As a result there is little agreement on the nature of the information that is produced by the neural systems that lie directly behind the sensors (at the front of the brain) which is sent to the visual system at the rear of the brain. In this thesis it is proposed that the failure to understand the nature of this information stems mainly from two factors: (1) a need to compensate for an inherent deficiency in the sensor array specific to our evolutionary history, and (2) the success of the paradigm under which colour is a property of perception rather than information structured by underlying function. In this thesis a paradigm of colour as functional information of an artificial computational visual system is proposed, a simplified artificial colour sensor processing system is presented and parallels are drawn between how this system processes information and how the human visual system is known to process information.
It is suggested that understanding the computational requirements of functional colour processing might be helpful in understanding the complex functionality that resides directly behind the sensor array of the human visual system.
BibTeX:
@mastersthesis{bangert2012color,
  author = {Bangert, Thomas},
  editor = {Izquierdo, Ebroul},
  title = {Color: an algorithmic approach},
  school = {Queen Mary University of London},
  year = {2012},
  pages = {1--93},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Theses/ThomasBangert(tb300)_MScThesis.pdf}
}
Gündoğdu E (2012), "Feature Detection and Matching Towards Augmented Reality Applications on Mobile Devices". Thesis at: Middle East Technical University. September, 2012.
BibTeX:
@mastersthesis{gundogu2012feature,
  author = {Gündoğdu, Erhan},
  editor = {Alatan, A. Aydın},
  title = {Feature Detection and Matching Towards Augmented Reality Applications on Mobile Devices},
  school = {Middle East Technical University},
  year = {2012},
  url = {http://www.eecs.qmul.ac.uk/~eg304/pub/Msc_Thesis.pdf}
}
Palašek P (2012), "Visual tracking of soft tissue targets in sequences of 3D ultrasound images". Thesis at: Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia. July, 2012. (361)
Abstract: This work considers tracking of deforming soft tissues in sequences of three-dimensional ultrasound images. The chosen approach to tracking is based on modeling the deformations using the thin-plate spline warp. In practice, the tracking is reduced to the estimation of changes in control point locations by examining the intensity changes of neighbouring three-dimensional images in the sequence. The problem this approach faces is its low robustness to ultrasound noise, which prevents the correct evolution of the deformation model. In this work, the basic method for tracking deforming soft tissues was implemented, along with two new methods for improving the basic method's robustness: the regularization of the thin-plate spline warp used as the motion model, and the addition of a mass-spring system to physically constrain the movement of the control points. The new methods are described, and the results of tests on simulated sequences of deforming three-dimensional ultrasound images are discussed.
BibTeX:
@mastersthesis{palasek2012visual,
  author = {Palašek, Petar},
  title = {Visual tracking of soft tissue targets in sequences of 3D ultrasound images},
  school = {Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia},
  year = {2012},
  number = {361},
  url = {http://www.eecs.qmul.ac.uk/~pp305/pub/palasek12ms.pdf}
}
Peixoto E (2012), "Advanced Heterogeneous Video Transcoding". Thesis at: Queen Mary University of London. November, 2012, pp. 1-246.
Abstract: Video transcoding is an essential tool to promote inter-operability between different video communication systems. This thesis presents two novel video transcoders, both operating on bitstreams of the current H.264/AVC standard. The first transcoder converts H.264/AVC bitstreams to a Wavelet Scalable Video Codec (W-SVC), while the second targets the emerging High Efficiency Video Coding (HEVC). Scalable Video Coding (SVC) enables low complexity adaptation of compressed video, providing an efficient solution for content delivery through heterogeneous networks. The transcoder proposed here aims at exploiting the advantages offered by SVC technology when dealing with conventional coders and legacy video, efficiently reusing information found in the H.264/AVC bitstream to achieve a high rate-distortion performance at a low complexity cost. Its main features include new mode mapping algorithms that exploit the W-SVC larger macroblock sizes, and a new state-of-the-art motion vector composition algorithm that is able to tackle different coding configurations in the H.264/AVC bitstream, including IPP or IBBP with multiple reference frames. The emerging video coding standard, HEVC, is currently approaching the final stage of development prior to standardization. This thesis proposes and evaluates several transcoding algorithms for the HEVC codec. In particular, a transcoder based on a new method that is capable of complexity scalability, trading off rate-distortion performance for complexity reduction, is proposed. Furthermore, other transcoding solutions are explored, based on a novel content-based modeling approach, in which the transcoder adapts its parameters based on the contents of the sequence being encoded. 
Finally, the application of this research is not constrained to these transcoders, as many of the techniques developed aim to contribute to advance the research on this field, and have the potential to be incorporated in different video transcoding architectures.
BibTeX:
@phdthesis{peixoto2012advanced,
  author = {Eduardo Peixoto},
  editor = {Izquierdo, Ebroul},
  title = {Advanced Heterogeneous Video Transcoding},
  school = {Queen Mary University of London},
  year = {2012},
  pages = {1--246},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Theses/EduardoPeixoto_PhDThesis.pdf}
}

Presentations, Posters and Technical Reports

Larson MA, Schmiedeke S, Kelm P, Rae A, Mezaris V, Piatrik T, Soleymani M, Metze F and Jones GJF (eds.) (2012), "Working Notes Proceedings of the MediaEval 2012 Workshop". Pisa, Italy, Vol. 927. CEUR-WS.org.
BibTeX:
@proceedings{larson2012working,
  editor = {Larson, Martha A. and Schmiedeke, Sebastian and Kelm, Pascal and Rae, Adam and Mezaris, Vasileios and Piatrik, Tomas and Soleymani, Mohammad and Metze, Florian and Jones, Gareth J. F.},
  title = {Working Notes Proceedings of the MediaEval 2012 Workshop},
  publisher = {CEUR-WS.org},
  year = {2012},
  volume = {927},
  note = {fix google scholar description.},
  url = {http://ceur-ws.org/Vol-927}
}
Daras P, Doumanoglou A, Zampoka M, Tsolakis G-A, Kordelas G, Drémeau A, Richard G, Hilsmann A, Eisert P, Lin X, Zhang Q, Izquierdo E, Monaghan DS, O'Connor N, Magnenat-Thalmann N, Cadi-Yazli N, Ben Moussa M and Kim HJ (2012), "3DLife Deliverable - 10th Quarterly Management Report", June, 2012, pp. 1-17. Queen Mary University of London.
Abstract: This report is part of Work Package (WP) 1, Management and Auditing, in the 3DLife Network of Excellence. This report covers the period from month 28 to month 30 of the project, i.e. from April to June 2012. The general goal of this report is to present an overview of the work carried out and summarise the resources consumed during the reporting period. It also gives an outline of all activities in administration, integration, development of the virtual centre of excellence, dissemination and finance during that period. During this reporting period, the 5th project review meeting was held, focusing on responses to the comments received from the 4th review meeting. The key activity in WP2 was the improvement and successful resubmission of the Erasmus Mundus proposal. 3DLife and VideoSense co-organised the MediaSense summer school on multi-modal data analytics. The 13th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2012) was also successfully held at DCU. The HUAWEI/3DLife ACM Multimedia grand challenge was announced and sponsored for the second year in a row. One joint publication was submitted to the ACM Multimedia HUAWEI/3DLife grand challenge 2012. Research and development continued in WP4 towards the integrated framework. The 3DLife Software Framework has been continually evolved and maintained by WP5. Significant progress has been made in online dissemination activities in WP6. The deliverables and milestones in all work packages have been fulfilled for this reporting period.
BibTeX:
@techreport{daras20123dlife,
  author = {Daras, Petros and Doumanoglou, Alexandros and Zampoka, Maria and Tsolakis, Georgios-Aggelos and Kordelas, Georgios and Drémeau, Angélique and Richard, Gaël and Hilsmann, Anna and Eisert, Peter and Lin, Xinyu and Zhang, Qianni and Izquierdo, Ebroul and Monaghan, David S. and O'Connor, Noel and Magnenat-Thalmann, Nadia and Cadi-Yazli, Nedjma and Ben Moussa, Maher and Kim, Hyoung Joung},
  title = {3DLife Deliverable - 10th Quarterly Management Report},
  publisher = {Queen Mary University of London},
  year = {2012},
  pages = {1--17},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/3dlife-deliverable-1.1.10-10th-quarterly-management-report.pdf}
}
Zhang Q, Lin X, O'Connor N, Monaghan DS, Drémeau A, Richard G, Essid S, Eisert P, Hilsmann A, Prestele B, Magnenat-Thalmann N, Cadi-Yazli N, Ben Moussa M, Daras P, Doumanoglou A, Zampoka M, Tsolakis G-A, Kordelas G, Kim HJ and Lim SM (2012), "3DLife Deliverable - 5th Report on Joint Activities of the Network", June, 2012, pp. 1-31. Queen Mary University of London.
BibTeX:
@techreport{izquierdo20123dlife,
  author = {Zhang, Qianni and Lin, Xinyu and O'Connor, Noel and Monaghan, David S. and Drémeau, Angélique and Richard, Gaël and Essid, Slim and Eisert, Peter and Hilsmann, Anna and Prestele, Benjamin and Magnenat-Thalmann, Nadia and Cadi-Yazli, Nedjma and Ben Moussa, Maher and Daras, Petros and Doumanoglou, Alexandros and Zampoka, Maria and Tsolakis, Georgios-Aggelos and Kordelas, Georgios and Kim, Hyoung Joung and Lim, Soo Min},
  title = {3DLife Deliverable - 5th Report on Joint Activities of the Network},
  publisher = {Queen Mary University of London},
  year = {2012},
  pages = {1--31},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/3dlife-deliverable-1.2.5-5th-report-on-joint-activities-of-the-network.pdf}
}
Bangert T (2012), "Color: an algorithmic approach". Queen Mary University of London. September, 2012.
BibTeX:
@unpublished{bangert2012color_Presentation,
  author = {Bangert, Thomas},
  title = {Color: an algorithmic approach},
  school = {Queen Mary University of London},
  year = {2012},
  note = {Presentation given for MSc viva and internally to the research group},
  url = {http://www.eecs.qmul.ac.uk/~tb300/pub/ColourVision.pptx}
}


2011

Journal Papers

Ho CYF, Ling BWK, Blasi SG, Chi Z-W and Siu W-C (2011), "Single Step Optimal Block Matched Motion Estimation with Motion Vectors Having Arbitrary Pixel Precisions", American Journal of Engineering and Applied Sciences. Vol. 4, pp. 448-460. Science Publications.
Abstract: This paper proposes a non-linear block matched motion model and solves for the motion vectors with arbitrary pixel precisions in a single step. As the optimal motion vector which minimizes the mean square error is solved analytically in a single step, the computational complexity of our proposed algorithm is lower than that of conventional quarter-pixel search algorithms. Also, our proposed algorithm can be regarded as a generalization of conventional half-pixel and quarter-pixel search algorithms, because it can achieve motion vectors with arbitrary pixel precisions.
BibTeX:
@article{ho2011single,
  author = {Ho, Charlotte Y. F. and Ling, Bingo W. K. and Blasi, Saverio G. and Chi, Zhi-Wei and Siu, Wan-Chi},
  title = {Single Step Optimal Block Matched Motion Estimation with Motion Vectors Having Arbitrary Pixel Precisions},
  journal = {American Journal of Engineering and Applied Sciences},
  publisher = {Science Publications},
  year = {2011},
  volume = {4},
  pages = {448--460},
  url = {http://eprints.lincoln.ac.uk/4521/1/letter_multimedia_motion_estimation.pdf},
  doi = {10.3844/ajeassp.2011.448.460}
}
Ramzan N, Quacchio E, Zgaljic T, Asioli S, Celetto L, Izquierdo E and Rovati F (2011), "Peer-to-peer streaming of scalable video in future Internet applications", Communications Magazine, IEEE. March, 2011. Vol. 49(3), pp. 128-135. IEEE.
Abstract: Scalable video delivery over peer-to-peer networks appears to be key for efficient streaming in emerging and future Internet applications. Contrasting the conventional server-client approach, here, video is delivered to a user in a fully distributed fashion. This is, for instance, beneficial in cases where a high demand for a particular video content is imposed, as different users can receive the same data from different peers. Furthermore, due to the heterogeneous nature of Internet connectivity, the content needs to be delivered to a user through networks with highly varying bandwidths. Moreover, content needs to be displayed on a variety of devices featuring different sizes, resolutions, and computational capabilities. If video is encoded in a scalable way, it can be adapted to any required spatio-temporal resolution and quality in the compressed domain, according to a peer bandwidth and other peers' context requirements. This enables efficient low-complexity content adaptation and interoperability for improved peer-to-peer streaming in future Internet applications. An efficient piece picking and peer selection policy enables high quality of service in such a streaming system.
BibTeX:
@article{ramzan2011peer,
  author = {Ramzan, Naeem and Quacchio, Emanuele and Zgaljic, Toni and Asioli, Stefano and Celetto, Luca and Izquierdo, Ebroul and Rovati, Fabrizio},
  title = {Peer-to-peer streaming of scalable video in future Internet applications},
  journal = {Communications Magazine, IEEE},
  publisher = {IEEE},
  year = {2011},
  volume = {49},
  number = {3},
  pages = {128--135},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5723810},
  doi = {10.1109/MCOM.2011.5723810}
}
Sanna M and Izquierdo E (2011), "A Survey of Linear Network Coding and Network Error Correction Code Constructions and Algorithms", International Journal of Digital Multimedia Broadcasting. May, 2011. (857847), pp. 1-12. Hindawi.
Abstract: Network coding was introduced by Ahlswede et al. in a pioneering work in 2000. This paradigm encompasses coding and retransmission of messages at the intermediate nodes of the network. In contrast with traditional store-and-forward networking, network coding increases the throughput and the robustness of the transmission. Linear network coding is a practical implementation of this new paradigm, covered by several research works that include rate characterization, error-protection coding, and construction of codes. Determining the coding characteristics is especially important in providing the premise for efficient transmission. In this paper, we review the recent breakthroughs in linear network coding for acyclic networks with a survey of the code-construction literature. Deterministic construction algorithms and randomized procedures are presented for traditional network coding and for network-control network coding.
BibTeX:
@article{sanna2011survey,
  author = {Sanna, Michele and Izquierdo, Ebroul},
  title = {A Survey of Linear Network Coding and Network Error Correction Code Constructions and Algorithms},
  journal = {International Journal of Digital Multimedia Broadcasting},
  publisher = {Hindawi},
  year = {2011},
  number = {857847},
  pages = {1--12},
  url = {http://www.hindawi.com/journals/ijdmb/2011/857847/cta/},
  doi = {10.1155/2011/857847}
}
Vaiapury K, Aksay A, Lin X and Izquierdo E (2011), "Model Based 3D Vision and Analysis for Production Audit Purposes", Infocommunications Journal, pp. 1-8. Scientific Association for Infocommunications.
Abstract: This paper describes a new methodology for 3D measurement and model matching for installation audit in industrial environments. The problem is addressed by using both 3D information from 2D images and semantic metadata of the installation engine parts, comparing them with those of the corresponding base CATIA installations. In this research, we deploy an independent (vision-based) method to accelerate convergence toward an optimal system architecture that integrates safety constraints.
BibTeX:
@article{vaiapury2011model,
  author = {Vaiapury, Karthikeyan and Aksay, Anil and Lin, Xinyu and Izquierdo, Ebroul},
  title = {Model Based 3D Vision and Analysis for Production Audit Purposes},
  journal = {Infocommunications Journal},
  publisher = {Scientific Association for Infocommunications},
  year = {2011},
  pages = {1--8},
  url = {http://www.hit.bme.hu/~szabo/hiradastechnika/2011-harmadik-angol/Vaiapury-etal.pdf}
}
Vaiapury K, Aksay A, Lin X, Izquierdo E and Papadopoulos C (2011), "A Vision Based Audit Method and Tool that Compares a Systems Installation on a Production Aircraft to the Original Digital Mock-Up", SAE International Journal of Aerospace. Vol. 4(2), pp. 880-892. SAE International.
Abstract: The work describes a concept application to aid a safety engineer to perform an audit of a production aircraft against safety-driven installation requirements. The capability is achieved using the following steps: A. Image capture of a product and measurement of distances between datum points within the product, with/without references to a planar surface. B. A digital reconstruction of the fabricated product by using multiple captured images to reposition parts according to the actual model. C. The projection onto the 3D digital reconstruction of the safety-related installation constraints, respecting the original intent of the constraints that are defined in the digital mockup. D. Identification of the differences between the 3D reconstruction of the actual product and the design-time digital mock-up of the product. E. Identification of the differences/non-conformances that are relevant to safety-driven installation requirements with reference to the original safety requirement intent. Step "A" gives the safety engineer a means to perform measurements on a set of captured images of the equipment they are interested in. Steps "B", "C", "D" and "E" together give the safety engineer the ability to overlay a digital reconstruction that should be as true to the fabricated product as possible, so that they can see how the product conforms or doesn't conform to the safety-driven installation requirements. The work has produced a concept demonstrator that will be further developed in future work to address accuracy, workflow and process efficiency.
BibTeX:
@article{Vaiapury2011,
  author = {Vaiapury, Karthikeyan and Aksay, Anil and Lin, Xinyu and Izquierdo, Ebroul and Papadopoulos, Christopher},
  title = {A Vision Based Audit Method and Tool that Compares a Systems Installation on a Production Aircraft to the Original Digital Mock-Up},
  journal = {SAE International Journal of Aerospace},
  publisher = {SAE International},
  year = {2011},
  volume = {4},
  number = {2},
  pages = {880--892},
  url = {http://saeaero.saejournals.org/content/4/2/880},
  doi = {10.4271/2011-01-2565}
}
Zhang Q, Lin X, Papadopoulos C, Heckmann J-P, Lisagor O, Sartor V and Izquierdo E (2011), "Semi-Automated Vision-Based Construction of Safety Models from Engineering Drawings", SAE International Journal of Aerospace. November, 2011. Vol. 4(2), pp. 893-899. SAE International.
Abstract: The work describes a concept application that aids a safety engineer to create a layup of equipment models by using an image scan of a schematic and a library of predefined standard components and their symbols. The approach uses image recognition techniques to identify the symbols within the scanned image of the schematic from a given library of symbols. Two recognition approaches are studied: one uses the Generalized Hough Transform; the other is based on pixel-level feature computation combining both structural and statistical features. The application allows the user to accept or edit the results of the recognition step and to define new components during the layup step. The tool then generates an output file that is compatible with a formal safety modeling tool. The identified symbols are associated with behavioral nodes from a model-based safety tool. The model-based safety modeling tool, together with the associated library, is used during the last step for the user to link the nodes together and to ensure that the behavior of the equipment is correct. The tool reduces the time to create a layup of equipment, and helps make the model of the equipment look like the original schematic, so that engineers remain familiar with the model when comparing it with the original schematic. This helps a safety engineer to create a formal safety model for an existing aircraft based on an existing failure library of components.
BibTeX:
@article{Zhang2011,
  author = {Zhang, Qianni and Lin, Xinyu and Papadopoulos, Chris and Heckmann, Jean-Pierre and Lisagor, Oleg and Sartor, Valerie and Izquierdo, Ebroul},
  title = {Semi-Automated Vision-Based Construction of Safety Models from Engineering Drawings},
  journal = {SAE International Journal of Aerospace},
  publisher = {SAE International},
  year = {2011},
  volume = {4},
  number = {2},
  pages = {893--899},
  url = {http://saeaero.saejournals.org/content/4/2/893.short},
  doi = {10.4271/2011-01-2566}
}
Lin W, Tao D, Kacprzyk J, Li Z, Izquierdo E and Wang H (Eds.) (2011), "Multimedia Analysis, Processing and Communications". Vol. 346, pp. 764. Springer.
BibTeX:
@book{lin2011multimedia,
  editor = {Lin, Weisi and Tao, Dacheng and Kacprzyk, Janusz and Li, Zhu and Izquierdo, Ebroul and Wang, Haohong},
  title = {Multimedia Analysis, Processing and Communications},
  publisher = {Springer},
  year = {2011},
  volume = {346},
  pages = {764},
  url = {http://books.google.co.uk/books?id=BCS8TdGBWB0C},
  doi = {10.1007/978-3-642-19551-8}
}
Chandramouli K, Piatrik T and Izquierdo E (2011), "Biological Inspired Methods for Media Classification and Retrieval", In Multimedia Analysis, Processing and Communications. Vol. 346, pp. 81-109. Springer.
Abstract: Automatic image clustering and classification has been a critical and vibrant research topic in the computer vision community over the last couple of decades. However, the performance of automatic image clustering and classification tools has been hindered by the commonly cited problem of the "Semantic Gap", defined as the gap between the low-level features that can be extracted from media and the high-level semantic concepts humans are able to perceive from media content. Addressing this problem, this chapter presents recent developments in biologically inspired techniques for media retrieval.
BibTeX:
@inbook{chandramouli2011biological,
  author = {Chandramouli, Krishna and Piatrik, Tomas and Izquierdo, Ebroul},
  editor = {Lin, Weisi and Tao, Dacheng and Kacprzyk, Janusz and Li, Zhu and Izquierdo, Ebroul and Wang, Haohong},
  title = {Biological Inspired Methods for Media Classification and Retrieval},
  booktitle = {Multimedia Analysis, Processing and Communications},
  publisher = {Springer},
  year = {2011},
  volume = {346},
  pages = {81--109},
  url = {http://link.springer.com/chapter/10.1007%2F978-3-642-19551-8_3},
  doi = {10.1007/978-3-642-19551-8_3}
}
Essid S, Campedel M, Richard G, Piatrik T, Benmokhtar R and Huet B (2011), "Machine Learning Techniques for Multimedia Analysis", In Multimedia Semantics: Metadata, Analysis and Interaction, pp. 59-80. John Wiley & Sons, Ltd.
Abstract: This chapter contains sections titled: Feature Selection, Classification, Classifier Fusion, and Conclusion.
BibTeX:
@inbook{essid2011machine,
  author = {Essid, Slim and Campedel, Marine and Richard, Gaël and Piatrik, Tomas and Benmokhtar, Rachid and Huet, Benoit},
  editor = {Troncy, Raphaël and Huet, Benoit and Schenk, Simon},
  title = {Machine Learning Techniques for Multimedia Analysis},
  booktitle = {Multimedia Semantics: Metadata, Analysis and Interaction},
  publisher = {John Wiley & Sons, Ltd},
  year = {2011},
  pages = {59--80},
  url = {http://onlinelibrary.wiley.com/book/10.1002/9781119970231},
  doi = {10.1002/9781119970231.ch5}
}

Books and Chapters in Books

Dong L, Izquierdo E and Ge S (2011), "Bio-Inspired Scheme for Classification of Visual Information", In Computer Vision for Multimedia Applications: Methods and Solutions. Hershey, PA , pp. 238-262. IGI Global.
Abstract: In this chapter, research on visual information classification based on biologically inspired visually selective attention with knowledge structuring is presented. The research objective is to develop visual models and corresponding algorithms to automatically extract features from selected essential areas of natural images and, finally, to achieve knowledge structuring and classification within a structural description scheme. The proposed scheme consists of three main aspects: biologically inspired visually selective attention, knowledge structuring, and classification of visual information. Biologically inspired visually selective attention closely follows the mechanisms of the visual ``what'' and ``where'' pathways in the human brain. The proposed visually selective attention model uses a bottom-up approach to generate essential areas based on low-level features extracted from natural images. This model also exploits a low-level top-down selective attention mechanism which makes decisions on interesting objects through human interaction with preference or refusal inclination. Knowledge structuring automatically creates a relevance map from the essential areas generated by visually selective attention. The developed algorithms derive a set of well-structured representations from the low-level description to drive the final classification. The knowledge structuring relies on human knowledge to produce suitable links between low-level descriptions and high-level representations on a limited training set. The backbone is a distribution mapping strategy involving two novel modules: structured low-level feature extraction using a convolutional neural network, and topology preservation based on sparse representation and an unsupervised learning algorithm. Classification is achieved by simulating high-level top-down visual information perception and classification using an incremental Bayesian parameter estimation method. The utility of the proposed scheme for solving relevant research problems is validated. The proposed modular architecture offers straightforward expansion to include user relevance feedback, contextual input, and multimodal information if available.
BibTeX:
@incollection{dong2011bio,
  author = {Dong, Le and Izquierdo, Ebroul and Ge, Shuzhi},
  editor = {Wang, Jinjun and Cheng, Jian and Jiang, Shuqiang},
  title = {Bio-Inspired Scheme for Classification of Visual Information},
  booktitle = {Computer Vision for Multimedia Applications: Methods and Solutions},
  publisher = {IGI Global},
  year = {2011},
  pages = {238--262},
  url = {http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/978-1-60960-024-2.ch014},
  doi = {10.4018/978-1-60960-024-2.ch014}
}
Izquierdo E and Vaiapury K (2011), "Applications of Video Segmentation", In Video Segmentation and Its Applications, pp. 145-157. Springer.
Abstract: Segmentation is one of the important computer vision processes used in many practical applications such as medical imaging, computer-guided surgery, machine vision, object recognition, surveillance, content-based browsing, and augmented reality. Knowledge of plausible segmentation applications and the corresponding algorithmic techniques is necessary to simplify the video representation into a more meaningful form that is easier to analyze. This is because the expected segmentation quality for a given application depends on the level of granularity and on requirements related to shape precision and temporal coherence of the objects.
BibTeX:
@incollection{Izquierdo2011,
  author = {Izquierdo, Ebroul and Vaiapury, Karthikeyan},
  editor = {Ngan, King Ngi and Li, Hongliang},
  title = {Applications of Video Segmentation},
  booktitle = {Video Segmentation and Its Applications},
  publisher = {Springer},
  year = {2011},
  pages = {145--157},
  url = {http://link.springer.com/chapter/10.1007%2F978-1-4419-9482-0_6},
  doi = {10.1007/978-1-4419-9482-0_6}
}
Ramzan N and Izquierdo E (2011), "Scalable video coding and its applications", In Multimedia analysis, processing and communications. Vol. 346, pp. 547-559. Springer.
Abstract: Scalable video coding provides an efficient solution when video is delivered through heterogeneous networks to terminals with different computational and display capabilities. A scalable video bitstream can easily be adapted to the required spatio-temporal resolution and quality, according to the transmission requirements. In this chapter, the Wavelet-based Scalable Video Coding (W-SVC) architecture is presented in detail. The W-SVC framework is based on wavelet-based motion-compensated approaches. The practical capabilities of W-SVC are also demonstrated using error-resilient transmission and surveillance applications. The experimental results show that the W-SVC framework outperforms existing methods and provides a fully flexible architecture with respect to different application scenarios.
BibTeX:
@incollection{Ramzan2011,
  author = {Ramzan, Naeem and Izquierdo, Ebroul},
  editor = {Lin, Weisi and Tao, Dacheng and Kacprzyk, Janusz and Li, Zhu and Izquierdo, Ebroul and Wang, Haohong},
  title = {Scalable video coding and its applications},
  booktitle = {Multimedia analysis, processing and communications},
  publisher = {Springer},
  year = {2011},
  volume = {346},
  pages = {547--559},
  url = {http://link.springer.com/chapter/10.1007/978-3-642-19551-8_20},
  doi = {10.1007/978-3-642-19551-8_20}
}
Ramzan N and Izquierdo E (2011), "Scalable and Adaptable Media Coding Techniques for Future Internet", In The Future Internet: Future Internet Assembly 2011: Achievements and Technological Promises. Vol. 6656, pp. 381-389. Springer.
Abstract: High-quality multimedia content can be distributed in a flexible, efficient and personalized way through dynamic and heterogeneous environments in the Future Internet. Scalable Video Coding (SVC) and Multiple Description Coding (MDC) fulfill these objectives through P2P distribution techniques. This chapter discusses SVC and MDC techniques, along with the authors' real experience of SVC/MDC over P2P networks, and emphasizes their pertinence to Future Media Internet initiatives in order to identify potential challenges.
BibTeX:
@incollection{ramzan2011,
  author = {Ramzan, Naeem and Izquierdo, Ebroul},
  editor = {Domingue, John and Galis, Alex and Gavras, Anastasius and Zahariadis, Theodore and Lambert, Dave and Cleary, Frances and Daras, Petros and Krco, Srdjan and Müller, Henning and Li, Man-Sze and Schaffers, Hans and Lotz, Volkmar and Alvarez, Federico and Stiller, Burkhard and Karnouskos, Stamatis and Avessta, Susanna and Nilsson, Michael},
  title = {Scalable and Adaptable Media Coding Techniques for Future Internet},
  booktitle = {The Future Internet: Future Internet Assembly 2011: Achievements and Technological Promises},
  publisher = {Springer},
  year = {2011},
  volume = {6656},
  pages = {381--389},
  url = {http://link.springer.com/chapter/10.1007/978-3-642-20898-0_27},
  doi = {10.1007/978-3-642-20898-0_27}
}
Zhang Q and Izquierdo E (2011), "Semantic Context Inference in Multimedia Search", In The Future Internet: Future Internet Assembly 2011: Achievements and Technological Promises. Vol. 6656, pp. 391-400. Springer.
Abstract: Multimedia content is usually complex and may contain many semantically meaningful elements interrelated to each other. Therefore, to understand the high-level semantic meanings of the content, such interrelations need to be learned and exploited to further improve the search process. We introduce our ideas on how to enable automatic construction of semantic context by learning from the content. Depending on the targeted source of content, representation schemes for its semantic context can be constructed by learning from data. In the target representation scheme, metadata is divided into three levels: low, mid, and high. By using the proposed scheme, high-level features are derived from the mid-level features. In order to explore the hidden interrelationships between mid-level and high-level terms, a Bayesian network model is built from a small amount of training data. Semantic inference and reasoning are then performed based on the model to decide the relevance of a video.
BibTeX:
@incollection{zhang2011semantic,
  author = {Zhang, Qianni and Izquierdo, Ebroul},
  editor = {Domingue, John and Galis, Alex and Gavras, Anastasius and Zahariadis, Theodore and Lambert, Dave and Cleary, Frances and Daras, Petros and Krco, Srdjan and Müller, Henning and Li, Man-Sze and Schaffers, Hans and Lotz, Volkmar and Alvarez, Federico and Stiller, Burkhard and Karnouskos, Stamatis and Avessta, Susanna and Nilsson, Michael},
  title = {Semantic Context Inference in Multimedia Search},
  booktitle = {The Future Internet: Future Internet Assembly 2011: Achievements and Technological Promises},
  publisher = {Springer},
  year = {2011},
  volume = {6656},
  pages = {391--400},
  url = {http://link.springer.com/chapter/10.1007/978-3-642-20898-0_28},
  doi = {10.1007/978-3-642-20898-0_28}
}

Conference Papers

Bosilj P, Palašek P, Popović B and Štefić D (2011), "Simulation of a Texas Hold'Em poker player", In Proceedings of the 34th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2011). Opatija, Croatia, May, 2011, pp. 1628-1633. IEEE.
Abstract: Imperfect information environments are amongst common research subjects in the field of Artificial Intelligence. A game of poker is a good example of such an environment. As the popularity of the game grew, so did the interest in implementing a functioning automatized poker player. Approaches to this problem include various Machine Learning techniques like Bayesian decision networks, various Case-based reasoning (CBR) techniques and reinforcement learning. For a player to play well it is not enough to know just the probability estimates of one's own hand. A player must adjust his strategy according to his estimate of the opponents' strategies and an estimate of opponents' hand strength. This paper explores the usage of the k-Nearest Neighbors technique, an example of CBR techniques, in implementing an automatized poker player. As a result, an average player able to cope with most in-game situations was developed. The main difference from a model based on optimal mathematical play is that the developed player seems more human, which makes its actions harder to predict. Numerous simulations on the developed testing model show that a small but stable profit is gained by the implemented automatized player.
BibTeX:
@inproceedings{bosilj2011simulation,
  author = {Bosilj, Petra and Palašek, Petar and Popović, Bojan and Štefić, Daria},
  editor = {Biljanović, Petar and Skala, Karolj and Golubić, Stjepan and Bogunović, Nikola and Ribarić, Slobodan and Čičin-Šain, Marina and Čišić, Dragan and Hutinski, Željko and Baranović, Mirta and Mauher, Mladen and Ordanić, Lea},
  title = {Simulation of a Texas Hold'Em poker player},
  booktitle = {Proceedings of the 34th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2011)},
  publisher = {IEEE},
  year = {2011},
  pages = {1628--1633},
  note = {google scholar entry: 34th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2011). Opatija, Croatia, 23-27 May 2011.},
  url = {http://qmro.qmul.ac.uk/jspui/bitstream/123456789/4260/4/BOSILJSimulationOfATexas2011FINAL.pdf}
}
Brenner M and Izquierdo E (2011), "Graph-based recognition in photo collections using social semantics", In Proceedings of the ACM workshop on Social and Behavioural Networked Media Access (SBMA 2011). Scottsdale, Arizona, December, 2011, pp. 47-52. ACM.
Abstract: In this paper, we show how to recognize people in Consumer Photo Collections by employing a graphical model together with a distance-based face description method. We devise a graph design and explore ways to further improve the recognition performance by incorporating context in the form of social semantics. Experiments on a public dataset demonstrate the effectiveness of our probabilistic approach compared to traditional nearest-neighbor matching.
BibTeX:
@inproceedings{brenner2011graph,
  author = {Brenner, Markus and Izquierdo, Ebroul},
  title = {Graph-based recognition in photo collections using social semantics},
  booktitle = {Proceedings of the ACM workshop on Social and Behavioural Networked Media Access (SBMA 2011)},
  publisher = {ACM},
  year = {2011},
  pages = {47--52},
  note = {google scholar entry: ACM workshop on Social and Behavioural Networked Media Access (SBNMA 2011) [MM 2011]. Scottsdale, Arizona, 1 December 2011.},
  url = {http://doi.acm.org/10.1145/2072627.2072642},
  doi = {10.1145/2072627.2072642}
}
Brenner M and Izquierdo E (2011), "MediaEval Benchmark: Social Event Detection in collaborative photo collections", In Proceedings of the MediaEval 2011 Workshop. Pisa, Italy CEUR-WS.
Abstract: In this paper, we present an approach to detect social events in collaboratively annotated photo collections as part of the MediaEval Benchmark. We combine various information from tagged photos with external data sources to train a classification model. Experiments based on the MediaEval Social Event Detection Dataset demonstrate the effectiveness of our approach.
BibTeX:
@inproceedings{brenner2011mediaeval,
  author = {Brenner, Markus and Izquierdo, Ebroul},
  editor = {Larson, Martha and Rae, Adam and Demarty, Claire-Helene and Kofler, Christoph and Metze, Florian and Troncy, Raphael and Mezaris, Vasileios and Jones, Gareth J. F.},
  title = {MediaEval Benchmark: Social Event Detection in collaborative photo collections},
  booktitle = {Proceedings of the MediaEval 2011 Workshop},
  publisher = {CEUR-WS},
  year = {2011},
  note = {google scholar entry: MediaEval 2011 Workshop. Pisa, Italy, 1-2 September 2011},
  url = {http://ceur-ws.org/Vol-807/}
}
Conci N and Izquierdo E (2011), "Background estimation and update in cluttered surveillance video via the Radon transform", In Visual Information Processing and Communication (SPIE 7882), Proceedings of the 2nd SPIE Conference on. San Francisco, CA, January, 2011. Vol. 7882, pp. 1-6. SPIE.
Abstract: In this paper we propose a background estimation and update algorithm for cluttered video surveillance sequences in indoor scenarios. Taking inspiration from the sophisticated framework of the Beamlets, the implementation we propose here relies on the integration of the Radon transform in the processing chain, applied on a block-by-block basis. During the acquisition of the real-time video, the Radon transform is applied at each frame in order to extract the meaningful information in terms of edges and texture present in the block under analysis, with the goal of extracting a signature for each portion of the image plane. The acquired model is updated at each frame, thus achieving a reliable representation of the most relevant details that persist over time for each processed block. The algorithm is validated in typical surveillance contexts and presented in this paper using two video sequences. The first example is an indoor scene with a considerably static background, while the second video belongs to a more complex scenario which is part of the PETS benchmark sequences.
BibTeX:
@inproceedings{conci2011background,
  author = {Conci, Nicola and Izquierdo, Ebroul},
  editor = {Said, Amir and Guleryuz, Onur G. and Stevenson, Robert L.},
  title = {Background estimation and update in cluttered surveillance video via the Radon transform},
  booktitle = {Visual Information Processing and Communication (SPIE 7882), Proceedings of the 2nd SPIE Conference on},
  publisher = {SPIE},
  year = {2011},
  volume = {7882},
  pages = {1--6},
  note = {google scholar entry: Visual Information Processing and Communication II (SPIE 7882). San Francisco, California, 25-26 January 2011.},
  url = {http://disi.unitn.it/~conci/nix/Publications_files/Background%20Estimation%20CR.pdf},
  doi = {10.1117/12.872517}
}
Essid S, Lin X, Gowing M, Kordelas G, Aksay A, Kelly P, Fillon T, Zhang Q, Dielmann A, Kitanovski V, Tournemenne R and Richard G (2011), "A multi-modal dance corpus for research into real-time interaction between humans in online virtual environments", In Proceedings of the 13th International Conference on Multimodal Interfaces (ICMI 2011). Alicante, Spain, November, 2011, pp. 1-14. ACM.
Abstract: We present a new, freely available, multimodal corpus for research into, amongst other areas, real-time realistic interaction between humans in online virtual environments. The specific corpus scenario focuses on an online dance class application scenario where students, with avatars driven by whatever 3D capture technology is locally available to them, can learn choreographies with teacher guidance in an online virtual ballet studio. As the data corpus is focused on this scenario, it consists of student/teacher dance choreographies concurrently captured at two different sites using a variety of media modalities, including synchronised audio rigs, multiple cameras, wearable inertial measurement devices and depth sensors. In the corpus, each of the several dancers performs a number of fixed choreographies, which are graded according to a number of specific evaluation criteria. In addition, ground-truth dance choreography annotations are provided. Furthermore, for unsynchronised sensor modalities, the corpus also includes distinctive events for data stream synchronisation. Although the data corpus is tailored specifically for an online dance class application scenario, the data is free to download and use for any research and development purposes.
BibTeX:
@inproceedings{Essid2011,
  author = {Essid, Slim and Lin, Xinyu and Gowing, Marc and Kordelas, Georgios and Aksay, Anil and Kelly, Philip and Fillon, Thomas and Zhang, Qianni and Dielmann, Alfred and Kitanovski, Vlado and Tournemenne, Robin and Richard, Gaël},
  editor = {Hervé Bourlard and Thomas S. Huang and Enrique Vidal and Daniel Gatica-Perez and Louis-Philippe Morency and Nicu Sebe},
  title = {A multi-modal dance corpus for research into real-time interaction between humans in online virtual environments},
  booktitle = {Proceedings of the 13th International Conference on Multimodal Interfaces (ICMI 2011)},
  publisher = {ACM},
  year = {2011},
  pages = {1--14},
  note = {note: accepted for workshop, not published in the ACM proceedings, paper published separately by Springer},
  url = {http://embots.dfki.de/mmc/mmc11/Essidetal.pdf}
}
Essid S, Lin X, Gowing M, Kordelas G, Aksay A, Kelly P, Fillon T, Zhang Q, Dielmann A, Kitanovski V, Tournemenne R and Richard G (2011), "A multimodal dance corpus for research into real-time interaction between humans in online virtual environments", In Proceedings of the 13th International Conference on Multimodal Interfaces, ICMI 2011. Alicante, Spain ACM.
Abstract: We present a new, freely available, multimodal corpus for research into, amongst other areas, real-time realistic interaction between humans in online virtual environments. The specific corpus scenario focuses on an online dance class application scenario where students, with avatars driven by whatever 3D capture technology is locally available to them, can learn choreographies with teacher guidance in an online virtual ballet studio. As the data corpus is focused on this scenario, it consists of student/teacher dance choreographies concurrently captured at two different sites using a variety of media modalities, including synchronised audio rigs, multiple cameras, wearable inertial measurement devices and depth sensors. In the corpus, each of the several dancers performs a number of fixed choreographies, which are graded according to a number of specific evaluation criteria. In addition, ground-truth dance choreography annotations are provided. Furthermore, for unsynchronised sensor modalities, the corpus also includes distinctive events for data stream synchronisation. Although the data corpus is tailored specifically for an online dance class application scenario, the data is free to download and use for any research and development purposes.
BibTeX:
@inproceedings{essid2011multimodal,
  author = {Essid, Slim and Lin, Xinyu and Gowing, Marc and Kordelas, Georgios and Aksay, Anil and Kelly, Philip and Fillon, Thomas and Zhang, Qianni and Dielmann, Alfred and Kitanovski, Vlado and Tournemenne, Robin and Richard, Gaël},
  editor = {Hervé Bourlard and Thomas S. Huang and Enrique Vidal and Daniel Gatica-Perez and Louis-Philippe Morency and Nicu Sebe},
  title = {A multimodal dance corpus for research into real-time interaction between humans in online virtual environments},
  booktitle = {Proceedings of the 13th International Conference on Multimodal Interfaces, ICMI 2011},
  publisher = {ACM},
  year = {2011},
  note = {xxx: needs pages in proceedings},
  url = {http://embots.dfki.de/mmc/mmc11/Essidetal.pdf}
}
Fernandez Arguedas V, Chandramouli K and Izquierdo E (2011), "Study of Particle Swarm Optimisation as Surveillance Object Classifier", In Latin-American Conference on Networked and Electronic Media (LACNEM 2011), Proceedings of. San José, Costa Rica, November, 2011, pp. 1-6. Kingston University.
Abstract: Following the recent exponential interest that has been shown in biologically inspired techniques for solving various optimisation challenges in computer vision, in this paper we present a study of Particle Swarm Optimisation as a surveillance object classifier. Though inspired by fundamental biological organisms, the memory inherently embedded in particle swarms enables solutions, which are computationally less complex, adept to multi-dimensional problems, and attains global minima. The performance of the proposed technique has been thoroughly evaluated with AVSS 2007 surveillance dataset containing objects such as car and person against well-known kernel machines.
BibTeX:
@inproceedings{arguedas2011study,
  author = {Fernandez Arguedas, Virginia and Chandramouli, Krishna and Izquierdo, Ebroul},
  title = {Study of Particle Swarm Optimisation as Surveillance Object Classifier},
  booktitle = {Latin-American Conference on Networked and Electronic Media (LACNEM 2011), Proceedings of},
  publisher = {Kingston University},
  year = {2011},
  pages = {1--6},
  note = {fix google scholar entry: publication type, publisher},
  url = {http://dilnxsrv.king.ac.uk/lacnem2012/PastProceedings/lacnem2011/papers/pre4.pdf}
}
Fernandez Arguedas V, Chandramouli K and Izquierdo E (2011), "Behaviour-Based Object Classifier for Surveillance Videos", In Eternal Systems: First International Workshop, EternalS 2011, Budapest, Hungary, May 3, 2011, Revised Selected Papers. Budapest, Hungary, pp. 116-124. Springer.
Abstract: In this paper, a study on effective exploitation of geometrical features for classifying surveillance objects into a set of pre-defined semantic categories is presented. The geometrical features correspond to the object's motion, spatial location and velocity. The extraction of these features is based on the object's trajectory, corresponding to the object's temporal evolution. These geometrical features are used to build a behaviour-based classifier to assign semantic categories to the individual blobs extracted from surveillance videos. The proposed classification framework has been evaluated against conventional object classifiers based on visual features extracted from semantic categories defined on the AVSS 2007 surveillance dataset.
BibTeX:
@inproceedings{fernandez2012behaviour,
  author = {Fernandez Arguedas, Virginia and Chandramouli, Krishna and Izquierdo, Ebroul},
  editor = {Moschitti, Alessandro and Scandariato, Riccardo},
  title = {Behaviour-Based Object Classifier for Surveillance Videos},
  booktitle = {Eternal Systems: First International Workshop, EternalS 2011, Budapest, Hungary, May 3, 2011, Revised Selected Papers},
  publisher = {Springer},
  year = {2011},
  pages = {116--124},
  url = {http://link.springer.com/chapter/10.1007/978-3-642-28033-7_10},
  doi = {10.1007/978-3-642-28033-7_10}
}
Fernandez Arguedas V, Chandramouli K, Zhang Q and Izquierdo E (2011), "Optimal Combination of Low-level Features for Surveillance Object Retrieval", In SIGMAP 2011 - Proceedings of the International Conference on Signal Processing and Multimedia Applications, Seville, Spain, 18-21 July, 2011, SIGMAP is part of ICETE - The International Joint Conference on e-Business and Telecommunications. July, 2011, pp. 187-192. SciTePress.
BibTeX:
@inproceedings{fernandez2011optimal,
  author = {Fernandez Arguedas, Virginia and Chandramouli, Krishna and Zhang, Qianni and Izquierdo, Ebroul},
  editor = {Linares Barranco, Alejandro and Tsihrintzis, George A.},
  title = {Optimal Combination of Low-level Features for Surveillance Object Retrieval},
  booktitle = {SIGMAP 2011 - Proceedings of the International Conference on Signal Processing and Multimedia Applications, Seville, Spain, 18-21 July, 2011, SIGMAP is part of ICETE - The International Joint Conference on e-Business and Telecommunications},
  publisher = {SciTePress},
  year = {2011},
  pages = {187--192},
  note = {incomplete + add to google scholar}
}
Fernandez Arguedas V and Izquierdo E (2011), "Object Classification based on Behaviour Patterns", In Imaging for Crime Detection and Prevention (ICDP 2011), 4th International Conference on. London, England, November, 2011, pp. 1-6. IEEE.
Abstract: With the recent explosion of surveillance videos, media management has gained increasing popularity. Addressing this challenge, in this paper, we propose a Surveillance Media Management framework for object detection and classification based on behaviour patterns. The objectives of the paper are: (i) demonstrating the discriminative power of behaviour features for object recognition and classification, (ii) proposing a behavioural fuzzy classifier which progressively discriminates objects by including different degrees of uncertainty in the classification process and (iii) presenting a Surveillance Media Management system to extract semantic media information and provide unsupervised object classification from raw surveillance videos. The performance of the proposed system has been thoroughly evaluated on the AVSS 2007 surveillance dataset and, as the results indicate, the proposed technique enhances object classification performance.
BibTeX:
@inproceedings{fernandez2011object,
  author = {Fernandez Arguedas, Virginia and Izquierdo, Ebroul},
  title = {Object Classification based on Behaviour Patterns},
  booktitle = {Imaging for Crime Detection and Prevention (ICDP 2011), 4th International Conference on},
  publisher = {IEEE},
  year = {2011},
  pages = {1--6},
  note = {google scholar entry: 4th International Conference on Imaging for Crime Detection and Prevention (ICDP 2011). London, England, 3-4 November 2011.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6203663},
  doi = {10.1049/ic.2011.0112}
}
Fernandez Arguedas V, Zhang Q, Chandramouli K and Izquierdo E (2011), "Multi-feature fusion for surveillance video indexing", In WIAMIS 2011: 12th International Workshop on Image Analysis for Multimedia Interactive Services, Delft, The Netherlands, April 13-15, 2011. Delft, Netherlands, April, 2011, pp. 1-4. TU Delft.
Abstract: In this paper, we present a part of surveillance centric indexing framework aimed at studying the performance of multi-feature fusion technique for indexing objects from surveillance videos. The multi-feature fusion algorithm determines an optimal metric for fusing low-level descriptors extracted from different feature space. These low-level descriptors exhibit a non-linear behaviour and typically consist of different similarity metrics. The framework also includes a motion analysis component for the extraction of objects as blobs from individual frames. The proposed framework, in particular the multi-feature fusion algorithm is evaluated against kernel machines for indexing objects such as car and person on AVSS 2007 surveillance dataset.
BibTeX:
@inproceedings{arguedas2011multi,
  author = {Fernandez Arguedas, Virginia and Zhang, Qianni and Chandramouli, Krishna and Izquierdo, Ebroul},
  title = {Multi-feature fusion for surveillance video indexing},
  booktitle = {WIAMIS 2011: 12th International Workshop on Image Analysis for Multimedia Interactive Services, Delft, The Netherlands, April 13-15, 2011},
  publisher = {TU Delft},
  year = {2011},
  pages = {1--4},
  note = {google scholar entry: 12th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2011). Delft, Netherlands, 13-15 April 2011.},
  url = {http://repository.tudelft.nl/view/conferencepapers/uuid:99338628-4ea5-476c-8669-b0a43faeeeb9/}
}
Fernandez Arguedas V, Zhang Q, Chandramouli K and Izquierdo E (2011), "Multi-feature fusion for surveillance video indexing", In WIAMIS 2011: 12th International Workshop on Image Analysis for Multimedia Interactive Services, Delft, The Netherlands, April 13-15, 2011. Delft, Netherlands, April, 2011, pp. 1-4. TU Delft.
Abstract: In this paper, we present a part of surveillance centric indexing framework aimed at studying the performance of multi-feature fusion technique for indexing objects from surveillance videos. The multi-feature fusion algorithm determines an optimal metric for fusing low-level descriptors extracted from different feature space. These low-level descriptors exhibit a non-linear behaviour and typically consist of different similarity metrics. The framework also includes a motion analysis component for the extraction of objects as blobs from individual frames. The proposed framework, in particular the multi-feature fusion algorithm is evaluated against kernel machines for indexing objects such as car and person on AVSS 2007 surveillance dataset.
BibTeX:
@inproceedings{FernandezArguedas2011,
  author = {Fernandez Arguedas, Virginia and Zhang, Qianni and Chandramouli, Krishna and Izquierdo, Ebroul},
  title = {Multi-feature fusion for surveillance video indexing},
  booktitle = {WIAMIS 2011: 12th International Workshop on Image Analysis for Multimedia Interactive Services, Delft, The Netherlands, April 13-15, 2011},
  publisher = {TU Delft},
  year = {2011},
  pages = {1--4},
  note = {fix google scholar entry: publication type, duplicate, publisher},
  url = {http://repository.tudelft.nl/view/conferencepapers/uuid:99338628-4ea5-476c-8669-b0a43faeeeb9/}
}
Gowing M, Kell P, O'Connor NE, Concolato C, Essid S, Lefeuvre J, Tournemenne R, Izquierdo E, Kitanovski V, Lin X and Zhang Q (2011), "Enhanced visualisation of dance performance from automatically synchronised multimodal recordings", In Proceedings of the 19th ACM international conference on Multimedia. Scottsdale, Arizona, November, 2011, pp. 667-670. ACM.
Abstract: The Huawei/3DLife Grand Challenge Dataset provides multimodal recordings of Salsa dancing, consisting of audiovisual streams along with depth maps and inertial measurements. In this paper, we propose a system for augmented reality-based evaluations of Salsa dancer performances. An essential step for such a system is the automatic temporal synchronisation of the multiple modalities captured from different sensors, for which we propose efficient solutions. Furthermore, we contribute modules for the automatic analysis of dance performances and present an original software application, specifically designed for the evaluation scenario considered, which enables an enhanced dance visualisation experience, through the augmentation of the original media with the results of our automatic analyses.
BibTeX:
@inproceedings{Gowing2011,
  author = {Gowing, Marc and Kell, Philip and O'Connor, Noel E. and Concolato, Cyril and Essid, Slim and Lefeuvre, Jean and Tournemenne, Robin and Izquierdo, Ebroul and Kitanovski, Vlado and Lin, Xinyu and Zhang, Qianni},
  title = {Enhanced visualisation of dance performance from automatically synchronised multimodal recordings},
  booktitle = {Proceedings of the 19th ACM international conference on Multimedia},
  publisher = {ACM},
  year = {2011},
  pages = {667--670},
  note = {google scholar entry: 19th ACM international conference on Multimedia (MM 2011). Scottsdale, Arizona, 28 November - 1 December 2011.},
  url = {http://doras.dcu.ie/16579/2/gcp117-Gowing_Doras.pdf},
  doi = {10.1145/2072298.2072414}
}
Gowing M, Kell P, O'Connor NE, Concolato C, Essid S, Lefeuvre J, Tournemenne R, Izquierdo E, Kitanovski V, Lin X and Zhang Q (2011), "Enhanced visualisation of dance performance from automatically synchronised multimodal recordings", In Proceedings of the 19th ACM international conference on Multimedia. Scottsdale, Arizona, pp. 667-670. ACM.
Abstract: The Huawei/3DLife Grand Challenge Dataset provides multimodal recordings of Salsa dancing, consisting of audiovisual streams along with depth maps and inertial measurements. In this paper, we propose a system for augmented reality-based evaluations of Salsa dancer performances. An essential step for such a system is the automatic temporal synchronisation of the multiple modalities captured from different sensors, for which we propose efficient solutions. Furthermore, we contribute modules for the automatic analysis of dance performances and present an original software application, specifically designed for the evaluation scenario considered, which enables an enhanced dance visualisation experience, through the augmentation of the original media with the results of our automatic analyses.
BibTeX:
@inproceedings{gowing2011enhanced,
  author = {Gowing, Marc and Kell, Philip and O'Connor, Noel E. and Concolato, Cyril and Essid, Slim and Lefeuvre, Jean and Tournemenne, Robin and Izquierdo, Ebroul and Kitanovski, Vlado and Lin, Xinyu and Zhang, Qianni},
  title = {Enhanced visualisation of dance performance from automatically synchronised multimodal recordings},
  booktitle = {Proceedings of the 19th ACM international conference on Multimedia},
  publisher = {ACM},
  year = {2011},
  pages = {667--670},
  note = {19th ACM international conference on Multimedia (MM 2011). Scottsdale, Arizona, 28 November - 1 December 2011.},
  url = {http://doras.dcu.ie/16579/2/gcp117-Gowing_Doras.pdf},
  doi = {10.1145/2072298.2072414}
}
Haji Mirza SN and Izquierdo E (2011), "Examining Visual Attention: a Method for Revealing Users' Interest for Images on Screen", In Quality of Multimedia Experience (QoMEX 2011), Proceedings of the Third International Workshop on. Mechelen, Belgium, September, 2011, pp. 207-212. IEEE.
Abstract: This report tries to measure users' interest in images that appear on the screen by monitoring their attention via eye-tracking. Our Gaze Inference System analyzes the gaze-movement features to assign a user interest level (UIL) from 0 to 1 to every image that appears on the screen. Because the properties of the gaze features differ from user to user, the framework is designed to be user adaptive. This framework is capable of building a new processing system for every new user that starts experiencing it. The generated UILs can be used in different scenarios that use the users' interest as an input. The developed framework produces promising and reliable results, where 10% of the target images that the users were searching for received UILs over 0.8 with a precision of 100%.
BibTeX:
@inproceedings{haji2011examining,
  author = {Haji Mirza, Seyed Navid and Izquierdo, Ebroul},
  title = {Examining Visual Attention: a Method for Revealing Users' Interest for Images on Screen},
  booktitle = {Quality of Multimedia Experience (QoMEX 2011), Proceedings of the Third International Workshop on},
  publisher = {IEEE},
  year = {2011},
  pages = {207--212},
  note = {google scholar entry: 3rd International Workshop on Quality of Multimedia Experience (QoMEX 2011). Mechelen, Belgium, 7-9 September 2011.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6065706},
  doi = {10.1109/QoMEX.2011.6065706}
}
Haji Mirza SN, Proulx M and Izquierdo E (2011), "Gaze Movement Inference for User Adapted Image Annotation and Retrieval", In Proceedings of the ACM workshop on Social and Behavioural Networked Media Access (SBMA 2011). Scottsdale, Arizona, December, 2011, pp. 27-32. ACM.
Abstract: In media personalisation the media provider needs to receive feedback from its users to adapt the media contents used for interaction. At the current stage, this feedback is limited to mouse clicks and keyboard entries. This report explores possible solutions to include the gaze movements of a user as a form of feedback for media personalisation and adaptation. Features are extracted from the gaze trajectory of users while they are searching in an image database for a Target Concept (TC). These features are used to measure a user's visual attention to every image appearing on the screen, called the user interest level (UIL). Because the reactions of different people to the same content are different, for every new user a new adapted processing interface is developed automatically. On average, our interface could detect 10% of the images belonging to the TC class with no error, and it could identify 40% of them with only 20% error. We show in this paper that gaze movement is a reliable feedback signal for measuring one's interest in images, which helps to personalise image annotation and retrieval.
BibTeX:
@inproceedings{haji2011gaze,
  author = {Haji Mirza, Seyed Navid and Proulx, Michael and Izquierdo, Ebroul},
  title = {Gaze Movement Inference for User Adapted Image Annotation and Retrieval},
  booktitle = {Proceedings of the ACM workshop on Social and Behavioural Networked Media Access (SBMA 2011)},
  publisher = {ACM},
  year = {2011},
  pages = {27--32},
  note = {google scholar entry: ACM workshop on Social and Behavioural Networked Media Access (SBNMA 2011) [MM 2011]. Scottsdale, Arizona, 1 December 2011.},
  url = {http://doi.acm.org/10.1145/2072627.2072636},
  doi = {10.1145/2072627.2072636}
}
Izquierdo E (2011), "Social Networked Media: Advances and Trends", In Proceedings of the ACM workshop on Social and Behavioural Networked Media Access (SBMA 2011). Scottsdale, Arizona, December, 2011, pp. 1-2. ACM.
Abstract: This paper provides an overview of major issues and challenges for social networked media, as presented during the keynote by the author at SBNMA: ACM Workshop on Social, Behavioural Networked Media Access.
BibTeX:
@inproceedings{izquierdo2011social,
  author = {Izquierdo, Ebroul},
  title = {Social Networked Media: Advances and Trends},
  booktitle = {Proceedings of the ACM workshop on Social and Behavioural Networked Media Access (SBNMA 2011)},
  publisher = {ACM},
  year = {2011},
  pages = {1--2},
  note = {google scholar entry: ACM workshop on Social and Behavioural Networked Media Access (SBNMA 2011) [MM 2011]. Scottsdale, Arizona, 1 December 2011.},
  url = {http://doi.acm.org/10.1145/2072627.2072629},
  doi = {10.1145/2072627.2072629}
}
Kitanovski V and Izquierdo E (2011), "3D Tracking of Facial Features for Augmented Reality Applications", In WIAMIS 2011: 12th International Workshop on Image Analysis for Multimedia Interactive Services, Delft, The Netherlands, April 13-15, 2011. Delft, Netherlands, April, 2011, pp. 1-4. TU Delft.
Abstract: We present an algorithm for feature-based real-time 3D tracking of facial features and its application for visualization of virtual facial modifications. A non-linear Kalman-based estimator is used for 3D head pose calculation and accurate facial features localization. The 3D face model used is adapted to the particular user's face by utilizing an active shape model for facial landmarks detection, followed by z-depth progressive refining. Virtual facial modifications are performed by user-driven 3D-aware 2D warping of the image sequence. The evaluation of our system shows that the tracker is robust to moderate head movements, occlusion and facial animation, which results in realistic-looking virtual facial modifications.
BibTeX:
@inproceedings{kitanovski20113d,
  author = {Kitanovski, Vlado and Izquierdo, Ebroul},
  title = {3D Tracking of Facial Features for Augmented Reality Applications},
  booktitle = {WIAMIS 2011: 12th International Workshop on Image Analysis for Multimedia Interactive Services, Delft, The Netherlands, April 13-15, 2011},
  publisher = {TU Delft},
  year = {2011},
  pages = {1--4},
  note = {google scholar entry: 12th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2011). Delft, Netherlands, 13-15 April 2011.},
  url = {http://repository.tudelft.nl/view/conferencepapers/uuid:4a107a5a-a5a1-4d06-af4f-5c8da20c709b/}
}
Kitanovski V and Izquierdo E (2011), "Augmented Reality Mirror for Virtual Facial Alterations", In Image Processing (ICIP 2011), Proceedings of the 18th International Conference on. Brussels, Belgium, September, 2011, pp. 1093-1096. IEEE.
Abstract: We present a system for virtual mirror experience that performs attentive facial geometric alterations in augmented reality. The virtual mirror is simulated using a commonly available PC with a webcam that captures, processes and displays video in real time. High realism is obtained by considerate 3D-aware warping of the 2D captured video. A Kalman-based real-time face tracker is used for 3D head pose estimation and accurate facial features localization. The 3D face model used is adapted to the person in front of the mirror by utilizing active shape models for facial landmarks detection, followed by z-depth progressive refining. Geometric adjustments are performed on 3D face vertices, while the 2D warping is calculated from the locations of the back-projected-to-2D face model vertices. The evaluation of our system shows that realistic facial modifications can be rendered in scenarios that correspond to typical usage of a real mirror.
BibTeX:
@inproceedings{kitanovski2011augmented,
  author = {Kitanovski, Vlado and Izquierdo, Ebroul},
  title = {Augmented Reality Mirror for Virtual Facial Alterations},
  booktitle = {Image Processing (ICIP 2011), Proceedings of the 18th International Conference on},
  publisher = {IEEE},
  year = {2011},
  pages = {1093--1096},
  note = {google scholar entry: 18th International Conference on Image Processing (ICIP 2011). Brussels, Belgium, 11-14 September 2011.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6115616},
  doi = {10.1109/ICIP.2011.6115616}
}
Klavdianos P, Brasil L and Lamas J (2011), "A process for integrating RIS, PACS and Full-field Digital Mammography", In 2011 Pan American Health Care Exchanges (PAHCE 2011). Rio de Janeiro, Brazil, March, 2011, pp. 314-314. IEEE.
Abstract: The transition from film-based mammography to Digital Mammography or FFDM (Full-field Digital Mammography) is not a simple task, involving solely the replacement of a few electronic components and the addition of new computers and software. This paper describes a process for integrating RIS (Radiology Information Systems), PACS (Picture Archiving and Communications Systems) and digital mammography equipment in order to help clinical engineers and IT professionals in planning, preparation and implementation of an FFDM environment in conformity with the data communication standards used in the medical field.
BibTeX:
@inproceedings{klavdianos2011process,
  author = {Klavdianos, P.B.L. and Brasil, L.M. and Lamas, J.M.},
  title = {A process for integrating RIS, PACS and Full-field Digital Mammography},
  booktitle = {2011 Pan American Health Care Exchanges (PAHCE 2011)},
  publisher = {IEEE},
  year = {2011},
  pages = {314--314},
  note = {google scholar entry: 2011 Pan American Health Care Exchanges (PAHCE 2011). Rio de Janeiro, Brazil, 28 March - 1 April 2011.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5871912},
  doi = {10.1109/PAHCE.2011.5871912}
}
Klavdianos P, Souza E, Brasil L and Lamas J (2011), "Onto-mama: An ontology of the female breast anatomy applicable to a virtual learning environment", In 2011 Pan American Health Care Exchanges (PAHCE 2011). March, 2011, pp. 315-315. IEEE.
BibTeX:
@inproceedings{klavdianos2011onto,
  author = {Klavdianos, P.B.L. and Souza, E.K.F. and Brasil, L.M. and Lamas, J.M.},
  title = {Onto-mama: An ontology of the female breast anatomy applicable to a virtual learning environment},
  booktitle = {2011 Pan American Health Care Exchanges (PAHCE 2011)},
  publisher = {IEEE},
  year = {2011},
  pages = {315--315},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5871913},
  doi = {10.1109/PAHCE.2011.5871913}
}
Palašek P, Bosilj P and Šegvić S (2011), "Detecting and recognizing centerlines as parabolic sections of the steerable filter response", In Proceedings of the 34th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2011). Opatija, Croatia, May, 2011, pp. 903-908. IEEE.
Abstract: This paper is concerned with detection and recognition of road surface markings in video acquired from the driver's perspective. In particular, we focus on centerlines which separate the two road lanes with opposed traffic directions, since they are often the only markings in many urban and suburban roads. The proposed technique is based on detecting parabolic sections of the thresholded steerable filter response in inverse perspective images. The technique has been experimentally evaluated on production videos acquired from moving service vehicles. The obtained results are provided and discussed.
BibTeX:
@inproceedings{palasek2011detecting,
  author = {Palašek, Petar and Bosilj, Petra and Šegvić, Siniša},
  editor = {Biljanović, Petar and Skala, Karolj and Golubić, Stjepan and Bogunović, Nikola and Ribarić, Slobodan and Čičin-Šain, Marina and Čišić, Dragan and Hutinski, Željko and Baranović, Mirta and Mauher, Mladen and Ordanić, Lea},
  title = {Detecting and recognizing centerlines as parabolic sections of the steerable filter response},
  booktitle = {Proceedings of the 34th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2011)},
  publisher = {IEEE},
  year = {2011},
  pages = {903--908},
  note = {google scholar entry: 34th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2011). Opatija, Croatia, 23-27 May 2011.},
  url = {https://qmro.qmul.ac.uk/xmlui/bitstream/handle/123456789/4262/PALASEKDetectingAndRecognizing2011FINAL.pdf}
}
Peixoto E, Zgaljic T and Izquierdo E (2011), "Application of Large Macroblocks in H.264/AVC to Wavelet-Based Scalable Video Transcoding", In Proceedings of the 19th European Signal Processing Conference (EUSIPCO 2011). Barcelona, Catalonia, August, 2011, pp. 2171-2175. European Association for Signal Processing (EURASIP).
Abstract: In this paper an efficient transcoder from H.264/AVC to a wavelet-based scalable video (W-SVC) codec is proposed. It exploits the advantage of using large sizes of prediction blocks in the W-SVC codec, and it is flexible in the sense that it is able to cope with any prediction structure in the H.264/AVC stream, such as IPP or IBBP configuration with multiple reference frames. The reference frame mismatch between the source and target codecs is solved by a novel framework for motion vector approximation and refinement. The transcoder performance benefits from the use of block sizes larger than 16×16, especially for higher resolution content. Experimental results show a very good performance in terms of decoded video quality and system complexity.
BibTeX:
@inproceedings{Peixoto2011,
  author = {Peixoto, Eduardo and Zgaljic, Toni and Izquierdo, Ebroul},
  title = {Application of Large Macroblocks in H.264/AVC to Wavelet-Based Scalable Video Transcoding},
  booktitle = {Proceedings of the 19th European Signal Processing Conference (EUSIPCO 2011)},
  publisher = {European Association for Signal Processing (EURASIP)},
  year = {2011},
  pages = {2171--2175},
  note = {google scholar entry: 19th European Signal Processing Conference (EUSIPCO 2011). Barcelona, Catalonia, 29 August - 2 September 2011.},
  url = {http://www.eurasip.org/Proceedings/Eusipco/Eusipco2011/papers/1569425237.pdf}
}
Piatrik T and Izquierdo E (2011), "Multi-feature fusion in image clustering using ant-inspired methods", In Nature and Biologically Inspired Computing (NaBIC), Proceedings of the 2011 Third World Congress on. Salamanca, Spain, October, 2011, pp. 377-382. IEEE.
Abstract: Clustering of visual data is necessary for its effective organisation, summarisation and retrieval. In this paper, the appropriateness of biologically inspired models to tackle this problem is discussed and suitable strategies to solve this specific image processing task are derived. The proposed techniques are inspired by the optimal movements of ants and their biologically optimised colony behaviour. In the first proposal, the problem of multi-feature fusion using relevant, yet different, discriminative low-level features is tackled by Ant Colony Optimisation and its learning mechanism. In the second proposal, another metaheuristic model is applied. Here, the ability of ants to build live structures with their bodies is used in order to discover, in a distributed and unsupervised way, a tree-structured organisation of images. Finally, the proposed techniques are comprehensively evaluated and selected representative results are reported.
BibTeX:
@inproceedings{piatrik2011multi,
  author = {Piatrik, Tomas and Izquierdo, Ebroul},
  editor = {Abraham, Ajith and Corchado, Emilio and Berwick, Robert and de Carvalho, Andre and Zomaya, Albert and Yager, Ronald},
  title = {Multi-feature fusion in image clustering using ant-inspired methods},
  booktitle = {Nature and Biologically Inspired Computing (NaBIC), Proceedings of the 2011 Third World Congress on},
  publisher = {IEEE},
  year = {2011},
  pages = {377--382},
  note = {google scholar entry: Third World Congress on Nature and Biologically Inspired Computing (NaBIC 2011). Salamanca, Spain, 19-21 October 2011.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6089620},
  doi = {10.1109/NaBIC.2011.6089620}
}
Romero Macias C and Izquierdo E (2011), "Image CAPTCHA based on Distorted Faces", In Imaging for Crime Detection and Prevention 2011 (ICDP 2011), Proceedings of the 4th International Conference on. London, England, November, 2011, pp. 1-6. IET.
Abstract: An image recognition-based CAPTCHA is proposed for increasing security in web applications. The proposed method uses distorted faces to create an image for a CAPTCHA test. The user has to recognise the well-known person that appears in the image by choosing the name from a list. The method uses a feature-line morphing technique to distort the faces, morphing the well-known person's face into a cartoon or an animal. The performance of this approach is evaluated through different face recognition systems. The results show an improvement in human recognition compared with word-based CAPTCHAs and increased robustness against bots attempting to break the tests.
BibTeX:
@inproceedings{6203657,
  author = {Romero Macias, Cristina and Izquierdo, Ebroul},
  title = {Image CAPTCHA based on Distorted Faces},
  booktitle = {Imaging for Crime Detection and Prevention 2011 (ICDP 2011), Proceedings of the 4th International Conference on},
  publisher = {IET},
  year = {2011},
  pages = {1--6},
  note = {google scholar entry: 4th International Conference on Imaging for Crime Detection and Prevention (ICDP 2011). London, England, 3-4 November 2011},
  url = {http://digital-library.theiet.org/content/conferences/10.1049/ic.2011.0106},
  doi = {10.1049/ic.2011.0106}
}
Sanna M and Izquierdo E (2011), "Multirate Delivery of Scalable Video with Progressive Network Codes", In Proceedings of the 19th European Signal Processing Conference (EUSIPCO 2011). Barcelona, Catalonia, August, 2011, pp. 2180-2184. European Association for Signal Processing (EURASIP).
Abstract: The future scenario in video content distribution will rely upon large interconnected systems. Multiple platforms need differentiated services. Scalable Video Coding is the upcoming standard solution for decoding multiple versions of the video from the same bitstream. We propose network coding to seamlessly deliver different versions of the video to users with different requirements on the same network. Network coding increases the network rate and provides error control at the network level, appreciably improving the overall quality of transmission.
BibTeX:
@inproceedings{sanna2011multirate,
  author = {Sanna, Michele and Izquierdo, Ebroul},
  title = {Multirate Delivery of Scalable Video with Progressive Network Codes},
  booktitle = {Proceedings of the 19th European Signal Processing Conference (EUSIPCO 2011)},
  publisher = {European Association for Signal Processing (EURASIP)},
  year = {2011},
  pages = {2180--2184},
  note = {google scholar entry: 19th European Signal Processing Conference (EUSIPCO 2011). Barcelona, Catalonia, 29 August - 2 September 2011.},
  url = {http://www.eurasip.org/Proceedings/Eusipco/Eusipco2011/program.html}
}
Seneviratne L and Izquierdo E (2011), "A Mathematical Approach Towards Semi-Automatic Image Annotation", In Proceedings of the 19th European Signal Processing Conference (EUSIPCO 2011). Barcelona, Catalonia, August, 2011, pp. 559-563. European Association for Signal Processing (EURASIP).
Abstract: In this paper, an interactive approach to obtain semantic annotations for images is presented. The proposed approach aims at what millions of single, online and cooperative gamers are keen to do: enjoy themselves in a competitive environment. It focuses on computer gaming and the use of humans in a widely distributed fashion. This approach deviates from the conventional ``content-based image retrieval (CBIR)'' paradigm favoured by the research community to tackle the problems related to the semantic annotation and tagging of multimedia contents. The proposed approach uses a multifaceted mathematical model based on game theories to aggregate a number of different key paradigms, such as image processing, machine learning and game-based approaches, to generate accurate annotations. As a consequence, this approach is capable of identifying less-rational (cheating-oriented) players, thus eliminating them from generating incorrect annotations. The performance of the proposed framework is tested with a number of game players. Results show that this approach is capable of obtaining correct annotations in practice.
BibTeX:
@inproceedings{seneviratne2011mathematical,
  author = {Seneviratne, Lasantha and Izquierdo, Ebroul},
  title = {A Mathematical Approach Towards Semi-Automatic Image Annotation},
  booktitle = {Proceedings of the 19th European Signal Processing Conference (EUSIPCO 2011)},
  publisher = {European Association for Signal Processing (EURASIP)},
  year = {2011},
  pages = {559--563},
  note = {google scholar entry: 19th European Signal Processing Conference (EUSIPCO 2011). Barcelona, Catalonia, 29 August - 2 September 2011.},
  url = {http://www.eurasip.org/Proceedings/Eusipco/Eusipco2011/program.html}
}
Wall J, McGinnity TM and Maguire LP (2011), "A Comparison of Sound Localisation Techniques using Cross-Correlation and Spiking Neural Networks for Mobile Robotics", In International Joint Conference on Neural Networks (IJCNN 2011), Proceedings of the 2011 IEEE. July, 2011, pp. 1981-1987. IEEE.
Abstract: This paper outlines the development of a cross-correlation algorithm and a spiking neural network (SNN) for sound localisation based on real sound recorded in a noisy and dynamic environment by a mobile robot. The SNN architecture aims to simulate the sound localisation ability of the mammalian auditory pathways by exploiting the binaural cue of interaural time difference (ITD). The medial superior olive was the inspiration for the SNN architecture which required the integration of an encoding layer which produced biologically realistic spike trains, a model of the bushy cells found in the cochlear nucleus and a supervised learning algorithm. The experimental results demonstrate that biologically inspired sound localisation achieved using a SNN can compare favourably to the more classical technique of cross-correlation.
BibTeX:
@inproceedings{wall2011comparison,
  author = {Wall, Julie and McGinnity, Thomas M. and Maguire, Liam P.},
  title = {A Comparison of Sound Localisation Techniques using Cross-Correlation and Spiking Neural Networks for Mobile Robotics},
  booktitle = {International Joint Conference on Neural Networks (IJCNN 2011), Proceedings of the 2011 IEEE},
  publisher = {IEEE},
  year = {2011},
  pages = {1981--1987},
  note = {google scholar entry: 2011 IEEE International Joint Conference on Neural Networks (IJCNN 2011). San Jose, California, 31 July - 5 August 2011.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6033468},
  doi = {10.1109/IJCNN.2011.6033468}
}
Zhang Y, Yang H and Liu X (2011), "A line matching method based on local and global appearance", In Image and Signal Processing (CISP 2011), 4th International Congress on. Shanghai, China, October, 2011. Vol. 5, pp. 1381-1385. IEEE.
Abstract: This paper proposes a line matching method based on both the local neighborhood gradient and the global structure information of lines. Firstly, we generate an initial set of line segment correspondences using our improved Mean Standard Deviation Line Descriptor (MSLD). Then candidate matches that violate the global topological structure of the lines are removed to eliminate wrong matches. Finally, an iterative topological filter is used to search for more matches, while global angle constraints are applied to discard wrong matches and make the algorithm more efficient. Experiments show that the proposed method is highly robust under heavy illumination change, rotation, image blur, viewpoint change, scale change, etc. Comparisons on an image database demonstrate that our method greatly outperforms state-of-the-art methods in matching accuracy and efficiency.
BibTeX:
@inproceedings{zhang2011line,
  author = {Zhang, Yueqiang and Yang, Heng and Liu, Xiaolin},
  editor = {Qiu, Peihua and Xiang, Yong and Ding, Yongsheng and Li, Demin and Wang, Lipo},
  title = {A line matching method based on local and global appearance},
  booktitle = {Image and Signal Processing (CISP 2011), 4th International Congress on},
  publisher = {IEEE},
  year = {2011},
  volume = {5},
  pages = {1381--1385},
  note = {google scholar entry: 4th International Congress on Image and Signal Processing (CISP 2011). Shanghai, China, 15-17 October 2011.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6100414},
  doi = {10.1109/CISP.2011.6100414}
}

Presentations, Posters and Technical Reports

Sanna M, Ramzan N and Izquierdo E (2011), "Streaming of Scalable Video in Network-Coding-capable Networks: Progressive Code Design for Multirate Streaming". September, 2011.
BibTeX:
@misc{sanna2011streaming,
  author = {Sanna, Michele and Ramzan, Naeem and Izquierdo, Ebroul},
  title = {Streaming of Scalable Video in Network-Coding-capable Networks: Progressive Code Design for Multirate Streaming},
  booktitle = {Streaming Day Workshop (STDAY 2011), September 30, 2011 - Torino - Italy},
  publisher = {Politecnico di Torino},
  year = {2011},
  note = {google scholar entry: Streaming Day Workshop (STDAY 2011). Torino, Italy, 30 September 2011. },
  url = {http://www.telematica.polito.it/oldsite/stday2011/tech_program_stday2011.php}
}
Wall J, McGinnity TM and Maguire LP (2011), "Using the interaural time difference and cross-correlation to localise short-term complex noises".
BibTeX:
@misc{wall2011using,
  author = {Wall, Julie and McGinnity, Thomas M. and Maguire, Liam P.},
  title = {Using the interaural time difference and cross-correlation to localise short-term complex noises},
  booktitle = {Artificial Intelligence and Cognitive Science (AICS)},
  year = {2011},
  pages = {375},
  note = {Poster presented at AICS 2011},
  url = {http://www.eecs.qmul.ac.uk/~juliew/pub/papers/AICS2011.pdf}
}


2010

Journal Papers

Borges PVK and Izquierdo E (2010), "A Probabilistic Approach for Vision-Based Fire Detection in Videos", Circuits and Systems for Video Technology, IEEE Transactions on. May, 2010. Vol. 20(5), pp. 721-731. IEEE.
Abstract: Automated fire detection is an active research topic in computer vision. In this paper, we propose and analyze a new method for identifying fire in videos. Computer vision-based fire detection algorithms are usually applied in closed-circuit television surveillance scenarios with controlled background. In contrast, the proposed method can be applied not only to surveillance but also to automatic video classification for retrieval of fire catastrophes in databases of newscast content. In the latter case, there are large variations in fire and background characteristics depending on the video instance. The proposed method analyzes the frame-to-frame changes of specific low-level features describing potential fire regions. These features are color, area size, surface coarseness, boundary roughness, and skewness within estimated fire regions. Because of flickering and random characteristics of fire, these features are powerful discriminants. The behavioral change of each one of these features is evaluated, and the results are then combined according to the Bayes classifier for robust fire recognition. In addition, a priori knowledge of fire events captured in videos is used to significantly improve the classification results. For edited newscast videos, the fire region is usually located in the center of the frames. This fact is used to model the probability of occurrence of fire as a function of the position. Experiments illustrated the applicability of the method.
BibTeX:
@article{borges2010probabilistic,
  author = {Borges, Paulo Vinicius Koerich and Izquierdo, Ebroul},
  title = {A Probabilistic Approach for Vision-Based Fire Detection in Videos},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {2010},
  volume = {20},
  number = {5},
  pages = {721--731},
  url = {http://www.paulovinicius.com/papers/borges_TCSVT_2010.pdf},
  doi = {10.1109/TCSVT.2010.2045813}
}
Caicedo JC and Izquierdo E (2010), "Combining Low-level Features for Improved Classification and Retrieval of Histology Images", Transactions on Mass-Data Analysis of Images and Signals. September, 2010. Vol. 2(1), pp. 68-82. IBaI Publishing.
Abstract: Feature combination for image classification and indexing is an important design aspect in modern image retrieval systems. It is particularly valuable in medical applications and specially in histology applications in which different features are extracted to estimate tissue composition and architecture. This paper presents an experimental evaluation of textural features combination for histology image classification and retrieval, following a late-fusion scheme. The main focus of this evaluation is oriented to feature normalization to guarantee fair conditions for feature comparison and integration. The experimental evaluation was carried out on a collection of histology images to evaluate the feature combination strategy. Experimental results show that it is possible to improve the system performance by appropriately considering the structure and distribution of visual features. Also, it is shown that feature combination may lead to a decreased performance due to fundamental differences between image descriptors.
BibTeX:
@article{caicedo2010combining,
  author = {Caicedo, Juan C. and Izquierdo, Ebroul},
  title = {Combining Low-level Features for Improved Classification and Retrieval of Histology Images},
  journal = {Transactions on Mass-Data Analysis of Images and Signals},
  publisher = {IBaI Publishing},
  year = {2010},
  volume = {2},
  number = {1},
  pages = {68--82},
  url = {http://www.informed.unal.edu.co/jccaicedo/papers/mda2010.pdf}
}
Glackin B, Wall J, McGinnity TM, Maguire LP and McDaid LJ (2010), "A spiking neural network model of the medial superior olive using spike timing dependent plasticity for sound localization", Frontiers in Computational Neuroscience. August, 2010. Vol. 4(18), pp. 1-16. Frontiers Research Foundation.
Abstract: Sound localization can be defined as the ability to identify the position of an input sound source and is considered a powerful aspect of mammalian perception. For low frequency sounds, i.e. in the range 270 Hz--1.5 kHz, the mammalian auditory pathway achieves this by extracting the Interaural Time Difference between sound signals being received by the left and right ear. This processing is performed in a region of the brain known as the Medial Superior Olive (MSO). This paper presents a Spiking Neural Network (SNN) based model of the MSO. The network model is trained using the Spike Timing Dependent Plasticity learning rule using experimentally observed Head Related Transfer Function data in an adult domestic cat. The results presented demonstrate how the proposed SNN model is able to perform sound localization with an accuracy of 91.82% when an error tolerance of 10° is used. For angular resolutions down to 2.5°, it will be demonstrated how software-based simulations of the model incur significant computation times. The paper thus also addresses preliminary implementation on a Field Programmable Gate Array based hardware platform to accelerate system performance.
BibTeX:
@article{glackin2010spiking,
  author = {Glackin, Brendan and Wall, Julie and McGinnity, Thomas M. and Maguire, Liam P. and McDaid, Liam J.},
  title = {A spiking neural network model of the medial superior olive using spike timing dependent plasticity for sound localization},
  journal = {Frontiers in Computational Neuroscience},
  publisher = {Frontiers Research Foundation},
  year = {2010},
  volume = {4},
  number = {18},
  pages = {1--16},
  url = {http://www.frontiersin.org/computational_neuroscience/10.3389/fncom.2010.00018/abstract},
  doi = {10.3389/fncom.2010.00018}
}
Grzegorzek M (2010), "A system for 3D texture-based probabilistic object recognition and its applications", Pattern Analysis and Applications. August, 2010. Vol. 13, pp. 333-348. Springer.
Abstract: This article presents a system for texture-based probabilistic classification and localisation of three-dimensional objects in two-dimensional digital images and discusses selected applications. In contrast to shape-based approaches, our texture-based method does not rely on object features extracted using image segmentation techniques. Rather, the objects are described by local feature vectors computed directly from image pixel values using the wavelet transform. Both gray level and colour images can be processed. In the training phase, object features are statistically modelled as normal density functions. In the recognition phase, the system classifies and localises objects in scenes with real heterogeneous backgrounds. Feature vectors are calculated and a maximisation algorithm compares the learned density functions with the extracted feature vectors and yields the classes and poses of objects found in the scene. Experiments carried out on a real dataset of over 40,000 images demonstrate the robustness of the system in terms of classification and localisation accuracy. Finally, two important real application scenarios are discussed, namely recognising museum exhibits from visitors' own photographs and classification of metallography images.
BibTeX:
@article{Grzegorzek2010system,
  author = {Grzegorzek, Marcin},
  title = {A system for 3D texture-based probabilistic object recognition and its applications},
  journal = {Pattern Analysis and Applications},
  publisher = {Springer},
  year = {2010},
  volume = {13},
  pages = {333--348},
  url = {http://link.springer.com/article/10.1007/s10044-009-0163-0},
  doi = {10.1007/s10044-009-0163-0}
}
Grzegorzek M, Sav S, Izquierdo E and O'Connor NE (2010), "Local wavelet features for statistical object classification and localisation", MultiMedia, IEEE. January-March, 2010. Vol. 17(1), pp. 56-66. IEEE.
Abstract: This article presents a system for texture based probabilistic classification and localization of 3D objects in 2D digital images and discusses selected applications.
BibTeX:
@article{grzegorzek2009local,
  author = {Grzegorzek, Marcin and Sav, Sorin and Izquierdo, Ebroul and O'Connor, Noel E.},
  title = {Local wavelet features for statistical object classification and localisation},
  journal = {MultiMedia, IEEE},
  publisher = {IEEE},
  year = {2010},
  volume = {17},
  number = {1},
  pages = {56--66},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5415500},
  doi = {10.1109/MMUL.2010.16}
}
Izquierdo E, Ho ATS, Kim HJ and Zhang Q (2010), "Special issue on Visual Information Engineering: Guest Editorial", Journal of Multimedia. April, 2010. Vol. 5(2), pp. 93-94. Academy Publisher.
BibTeX:
@article{Izquierdo2010b,
  author = {Izquierdo, Ebroul and Ho, Anthony T. S. and Kim, Hyoung Joong and Zhang, Qianni},
  title = {Special issue on Visual Information Engineering: Guest Editorial},
  journal = {Journal of Multimedia},
  publisher = {Academy Publisher},
  year = {2010},
  volume = {5},
  number = {2},
  pages = {93--94},
  url = {http://ojs.academypublisher.com/index.php/jmm/article/view/2762},
  doi = {10.4304/jmm.5.2.93-94}
}
Janjusevic T, Benini S, Izquierdo E and Leonardi R (2010), "Random Assisted Browsing of Rushes Archives", Journal of Multimedia. April, 2010. Vol. 5(2), pp. 142-150. Academy Publisher.
Abstract: How to efficiently browse a large video database if its content is unknown to the user? In this paper we propose new approaches for browsing initialisation, exploration and content access of a rushes archive, where the span of information stored can be huge and difficult to understand at a glance. Exploring and navigating through raw footage is assisted by organising the video material in a meaningful structure and by adopting appropriate visualisation solutions. Un-annotated content is organised in hierarchical previews, while browsing is enabled by novel methods of random exploration and random content access to preview nodes. User tests conducted on professional users in a real-work scenario aim at demonstrating how the hierarchical visualisation and the proposed random browsing solutions assist the process of accessing and retrieving desired content.
BibTeX:
@article{janjusevic2010random,
  author = {Janjusevic, Tijana and Benini, Sergio and Izquierdo, Ebroul and Leonardi, Riccardo},
  title = {Random Assisted Browsing of Rushes Archives},
  journal = {Journal of Multimedia},
  publisher = {Academy Publisher},
  year = {2010},
  volume = {5},
  number = {2},
  pages = {142--150},
  url = {http://ojs.academypublisher.com/index.php/jmm/article/view/2768},
  doi = {10.4304/jmm.5.2.142-150}
}
Koelstra S, Pantic M and Patras I (2010), "A Dynamic Texture-Based Approach to Recognition of Facial Actions and Their Temporal Models", Pattern Analysis and Machine Intelligence, IEEE Transactions on. March, 2010. Vol. 32(11), pp. 1940-1954. IEEE.
Abstract: In this work, we propose a dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e. sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. Two approaches to modeling the dynamics and the appearance in the face region of an input video are compared: an extended version of Motion History Images and a novel method based on Nonrigid Registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and temporal domain. Per AU, a combination of discriminative, frame-based GentleBoost ensemble learners and dynamic, generative Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the proposed method achieved an average event recognition accuracy of 89.2 percent for the MHI method and 94.3 percent for the FFD method. The generalization performance of the FFD method has been tested using the Cohn-Kanade database. Finally, we also explored the performance on spontaneous expressions in the Sensitive Artificial Listener data set.
BibTeX:
@article{koelstra2010dynamic,
  author = {Koelstra, Sander and Pantic, Maja and Patras, Ioannis},
  title = {A Dynamic Texture-Based Approach to Recognition of Facial Actions and Their Temporal Models},
  journal = {Pattern Analysis and Machine Intelligence, IEEE Transactions on},
  publisher = {IEEE},
  year = {2010},
  volume = {32},
  number = {11},
  pages = {1940--1954},
  url = {http://eprints.eemcs.utwente.nl/19457/01/pantic_a_dynamic_texture-based_approach.pdf},
  doi = {10.1109/TPAMI.2010.50}
}
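The Motion History Image representation mentioned in the abstract above has a simple core update rule: pixels moving in the current frame are stamped with a maximum duration tau, and all other pixels decay. A toy sketch of that rule (illustrative only, not the authors' extended MHI version):

```python
def update_mhi(mhi, motion_mask, tau=10):
    """One MHI update step: moving pixels are set to tau, others decay by one."""
    return [
        [tau if moving else max(0, h - 1) for h, moving in zip(row_h, row_m)]
        for row_h, row_m in zip(mhi, motion_mask)
    ]

mhi = [[0, 5], [10, 0]]
mask = [[1, 0], [0, 1]]  # pixels (0,0) and (1,1) are moving in this frame
mhi = update_mhi(mhi, mask)
```

Recency of motion is thus encoded directly in the pixel intensities, which is what makes temporal-segment (onset/apex/offset) analysis possible from a single image.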
Kumar BGV and Aravind R (2010), "Computationally efficient algorithm for face super-resolution using (2D)2-PCA based prior", Image Processing, IET. April, 2010. Vol. 4(2), pp. 61-69. IET.
Abstract: Super-resolution algorithms typically transform images into 1D vectors and operate on these vectors to obtain a high-resolution image. In this study, the authors first propose a 2D method for super-resolution using a 2D model that treats images as matrices. We then apply this 2D model to the super-resolution of face images. Two-directional two-dimensional principal component analysis (PCA) [(2D)$^2$-PCA] is an efficient face representation technique where the images are treated as matrices instead of vectors. We use (2D)$^2$-PCA to learn the face subspace and use it as a prior to super-resolve face images. Experimental results show that our approach can reconstruct high quality face images with low computational cost.
BibTeX:
@article{kumar2010computationally,
  author = {Kumar, B. G. Vijay and Aravind, Rangarajan},
  title = {Computationally efficient algorithm for face super-resolution using (2D)$^2$-PCA based prior},
  journal = {Image Processing, IET},
  publisher = {IET},
  year = {2010},
  volume = {4},
  number = {2},
  pages = {61--69},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5440738},
  doi = {10.1049/iet-ipr.2009.0072}
}
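The key point of (2D)$^2$-PCA, per the abstract above, is that an image matrix A is projected bidirectionally, C = Zᵀ A X, instead of being flattened into a 1D vector. A toy sketch with tiny hand-picked projection matrices (in the real method X and Z come from eigen-decompositions of row- and column-covariance matrices):

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def project_2d2pca(A, X, Z):
    """Bidirectional projection C = Z^T A X; the image stays a matrix throughout."""
    return matmul(matmul(transpose(Z), A), X)

A = [[1.0, 2.0], [3.0, 4.0]]   # a 2x2 "image"
Z = [[1.0, 0.0], [0.0, 1.0]]   # identity: keep both row directions (toy choice)
X = [[1.0], [0.0]]             # keep only the first column direction (toy choice)
C = project_2d2pca(A, X, Z)
```

Because the projections act on much smaller covariance matrices than vectorised PCA would require, this is what keeps the computational cost of the face prior low.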
Peixoto E, de Queiroz RL and Mukherjee D (2010), "A "Wyner-Ziv" Video Transcoder", IEEE Transactions on Circuits and Systems for Video Technology. February, 2010. Vol. 20(2), pp. 189-200. IEEE.
Abstract: Wyner-Ziv (WZ) coding of video utilizes simple encoders and highly complex decoders. A transcoder from a WZ codec to a traditional codec can potentially increase the range of applications for WZ codecs. We present a transcoder scheme from the most popular WZ codec architecture to a differential pulse code modulation/discrete cosine transform codec. As a proof of concept, we implemented this transcoder using a simple pixel-domain WZ codec and the standard H.263+. The transcoder design aims at reducing complexity, as a large amount of computation is saved by reusing the motion estimation, calculated at the side information generation process, and the I-frame streams. New approaches are used to generate side information and to map motion vectors for the transcoder. Results are presented to demonstrate the transcoder performance.
BibTeX:
@article{peixoto2010wyner,
  author = {Peixoto, Eduardo and de Queiroz, Ricardo L. and Mukherjee, Debargha},
  title = {A "Wyner-Ziv" Video Transcoder},
  journal = {IEEE Transactions on Circuits and Systems for Video Technology},
  publisher = {IEEE},
  year = {2010},
  volume = {20},
  number = {2},
  pages = {189--200},
  url = {http://queiroz.divp.org/papers/wztranscoder.pdf},
  doi = {10.1109/TCSVT.2009.2031374}
}
Schreer O, Feldmann I, Alonso Mediavilla I, Concejero P, Sadka AH, Swash MR, Benini S, Leonardi R, Janjusevic T and Izquierdo E (2010), "RUSHES -- an annotation and retrieval engine for multimedia semantic units", Multimedia Tools and Applications. May, 2010. Vol. 48, pp. 23-49. Springer.
Abstract: Multimedia analysis and reuse of raw un-edited audio visual content known as rushes is gaining acceptance by a large number of research labs and companies. A set of research projects are considering multimedia indexing, annotation, search and retrieval in the context of European funded research, but only the FP6 project RUSHES is focusing on automatic semantic annotation, indexing and retrieval of raw and un-edited audio-visual content. Even professional content creators and providers as well as home-users are dealing with this type of content and therefore novel technologies for semantic search and retrieval are required. In this paper, we present a summary of the most relevant achievements of the RUSHES project, focusing on specific approaches for automatic annotation as well as the main features of the final RUSHES search engine.
BibTeX:
@article{schreer2010rushes,
  author = {Schreer, Oliver and Feldmann, Ingo and Alonso Mediavilla, Isabel and Concejero, Pedro and Sadka, Abdul H. and Swash, Mohammad Rafiq and Benini, Sergio and Leonardi, Riccardo and Janjusevic, Tijana and Izquierdo, Ebroul},
  title = {RUSHES -- an annotation and retrieval engine for multimedia semantic units},
  journal = {Multimedia Tools and Applications},
  publisher = {Springer},
  year = {2010},
  volume = {48},
  pages = {23--49},
  url = {http://link.springer.com/article/10.1007/s11042-009-0375-8},
  doi = {10.1007/s11042-009-0375-8}
}
Tsomko E, Kim H-J and Izquierdo E (2010), "Linear Gaussian blur evolution for detection of blurry images", Image Processing, IET. August, 2010. Vol. 4(4), pp. 302-312. IET.
Abstract: Even though state-of-the-art digital cameras are equipped with auto-focusing and motion compensation functions, several other factors including limited contrast, inappropriate exposure time and improper device handling can still lead to unsatisfactory image quality such as blurriness. Indeed, blurry images make up a significant percentage of anyone's picture collections. Consequently, an efficient tool to detect blurry images and label or separate them for automatic deletion in order to preserve storage capacity and the quality of image collections is needed. A new technique for automatic detection and removal of blurry pictures is presented. Initially, a set of interest points and local image areas is extracted. These areas are then evolved in time according to the conventional linear scale space. The gradient of the evolution curve through scale is then used to produce a 'blur graph' representing the probability of a picture being blurred or not. Complexity is kept low by applying a Monte-Carlo like technique for the selection of representative image areas and interest points and by implicitly estimating the gradient of the scale-space curve evolution. An exhaustive evaluation of the proposed technique is conducted to validate its performance in terms of detection accuracy and efficiency.
BibTeX:
@article{tsomko2010linear,
  author = {Tsomko, Elena and Kim, Hyoung-Joong and Izquierdo, Ebroul},
  title = {Linear Gaussian blur evolution for detection of blurry images},
  journal = {Image Processing, IET},
  publisher = {IET},
  year = {2010},
  volume = {4},
  number = {4},
  pages = {302--312},
  note = {In Special Section on VIE 2008},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5533184},
  doi = {10.1049/iet-ipr.2009.0001}
}
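The blur-detection idea in the abstract above (evolve image areas through a linear scale space and measure the gradient of the evolution curve) can be sketched in one dimension. This is a rough illustration under stated assumptions: a 3-tap binomial kernel stands in for Gaussian smoothing, and the "blur score" is simply the mean per-step change of the patch:

```python
def smooth(signal):
    """One linear scale-space step: 3-tap binomial approximation of Gaussian smoothing."""
    s = [signal[0]] + list(signal) + [signal[-1]]  # replicate-pad the borders
    return [(s[i - 1] + 2 * s[i] + s[i + 1]) / 4 for i in range(1, len(s) - 1)]

def blur_score(patch, steps=5):
    """Mean change of a patch per scale-space step. A small score means the
    patch barely evolves under further smoothing, i.e. it was already blurry."""
    total, cur = 0.0, list(patch)
    for _ in range(steps):
        nxt = smooth(cur)
        total += sum(abs(a - b) for a, b in zip(cur, nxt)) / len(cur)
        cur = nxt
    return total / steps

sharp = [0, 0, 0, 10, 0, 0, 0]          # a sharp edge/spike patch
blurry = smooth(smooth(smooth(sharp)))  # the same patch, pre-blurred
# the sharp patch evolves faster through scale space than the blurry one
```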
Zeljkovic V, Tameze C, Vincelette R and Izquierdo E (2010), "Different non-linear diffusion filters combined with triangle method used for noise removal from polygonal shapes", Image Processing, IET. August, 2010. Vol. 4(4), pp. 313-333. IET.
Abstract: A two-step process for removing noise from polygonal shapes is presented in this study. The authors represent a polygonal shape by its turning function and then apply a non-linear diffusion filter and a triangle method to it. In the first step the authors apply several different non-linear diffusion filters on the turning function and later compare the performance of these filters. Non-linear diffusion filters identify dominant vertices in a polygon and remove those vertices that are identified as noise or irrelevant features. The vertices in the turning function which diffuse until the sides that immediately surround them approach the same turning function are identified as noise and removed. The vertices that are enhanced are preserved without changing their coordinates and are identified as dominant ones. After carrying this process as far as it will go without introducing noticeable shape distortion, the authors switch to the triangle method for further removal of vertices that are to be treated as noise. In the second step the authors remove the vertices that form the smallest-area triangles. Experimental results demonstrate that this two-step process successfully removes vertices that should be dismissed as noise while preserving dominant vertices that can be accepted as relevant features and give a faithful description of the shape of the polygon, thanks to appropriate emphasis of dominant vertices.
BibTeX:
@article{zeljkovic2010different,
  author = {Zeljkovic, Vesna and Tameze, Claude and Vincelette, Robert and Izquierdo, Ebroul},
  title = {Different non-linear diffusion filters combined with triangle method used for noise removal from polygonal shapes},
  journal = {Image Processing, IET},
  publisher = {IET},
  year = {2010},
  volume = {4},
  number = {4},
  pages = {313--333},
  note = {in Special Section on VIE 2008},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5533185},
  doi = {10.1049/iet-ipr.2008.0233}
}
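The triangle method used in the second step above can be sketched as repeatedly deleting the vertex that forms the smallest-area triangle with its two neighbours (a Visvalingam-style simplification; the stopping criterion here is a simple vertex budget, which is an assumption, not the paper's exact rule):

```python
def tri_area(a, b, c):
    """Area of the triangle formed by a vertex b and its neighbours a and c."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2

def remove_noise_vertices(poly, keep):
    """Triangle method sketch: drop smallest-area vertices until `keep` remain."""
    pts = list(poly)
    while len(pts) > keep:
        n = len(pts)
        areas = [tri_area(pts[i - 1], pts[i], pts[(i + 1) % n]) for i in range(n)]
        pts.pop(areas.index(min(areas)))  # the flattest vertex is treated as noise
    return pts

# A square with one near-collinear noisy vertex on its bottom edge.
noisy_square = [(0, 0), (1, 0.01), (2, 0), (2, 2), (0, 2)]
cleaned = remove_noise_vertices(noisy_square, keep=4)
```

The near-collinear vertex spans an almost degenerate triangle, so it is the first to be removed while the four true corners survive.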

Books and Chapters in Books

Asioli S, Ramzan N and Izquierdo E (2010), "Efficient Scalable Video Streaming over P2P Network", In User Centric Media. First International Conference, UCMedia 2009. Venice, Italy, December 2009. Revised Selected Papers. Venice, Italy, December, 2010. Vol. 40, pp. 153-160. Springer.
Abstract: In this paper, we exploit the characteristics of scalable video and Peer-to-peer (P2P) network in order to propose an efficient streaming mechanism for scalable video. The scalable video is divided into chunks and prioritized with respect to its significance in the sliding window by an efficient proposed piece picking policy. Furthermore the neighbour selective policy is also proposed to receive the most important chunks from the good peers in the neighbourhood to maintain smooth content delivery of certain Quality of Service for the received video. Experimental evaluation of the proposed system clearly demonstrates the superiority of the proposed approach.
BibTeX:
@incollection{asioli2012exploitingstreaming,
  author = {Asioli, Stefano and Ramzan, Naeem and Izquierdo, Ebroul},
  editor = {Daras, Petros and Ibarra Mayora, Oscar },
  title = {Efficient Scalable Video Streaming over P2P Network},
  booktitle = {User Centric Media. First International Conference, UCMedia 2009. Venice, Italy, December 2009. Revised Selected Papers.},
  publisher = {Springer},
  year = {2010},
  volume = {40},
  pages = {153--160},
  note = {google scholar entry: 1st International Conference on User Centric Media (UCMedia 2009). Venice, Italy, 9-11 December 2009.},
  url = {http://www.springerlink.com/content/k488843266u054q0/},
  doi = {10.1007/978-3-642-12630-7_18}
}
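The prioritised piece-picking idea in the abstract above can be sketched as: within the sliding window, request the most significant missing chunk. The (playback index, layer) ordering below is a guess at "significance" (earliest deadline first, then lowest scalability layer), not the paper's exact policy:

```python
def pick_chunk(window_chunks, have):
    """Return the most significant missing chunk in the sliding window,
    or None when everything in the window has already been received.
    Chunks are (playback_index, layer) pairs; base layer is 0."""
    missing = [c for c in window_chunks if c not in have]
    return min(missing, default=None)  # tuple order: earlier index, then lower layer

window = [(0, 0), (0, 1), (1, 0), (1, 1)]  # two frames, base + enhancement layer
nxt = pick_chunk(window, have={(0, 0)})    # base of frame 0 already received
```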
Bozas K, Dimitriadis SI, Laskaris NA and Tzelepi A (2010), "A Novel Single-Trial Analysis Scheme for Characterizing the Presaccadic Brain Activity Based on a SON Representation", In Artificial Neural Networks (ICANN 2010). Thessaloniki, Greece, September, 2010. Vol. 6353, pp. 362-371. Springer.
BibTeX:
@incollection{bozas2010novel,
  author = {Bozas, Konstantinos and Dimitriadis, Stavros I. and Laskaris, Nikolaos A. and Tzelepi, Areti},
  editor = {Diamantaras, Konstantinos and Duch, Wlodek and Iliadis, Lazaros S.},
  title = {A Novel Single-Trial Analysis Scheme for Characterizing the Presaccadic Brain Activity Based on a SON Representation},
  booktitle = {Artificial Neural Networks (ICANN 2010)},
  publisher = {Springer},
  year = {2010},
  volume = {6353},
  pages = {362--371},
  note = {google scholar entry: Artificial Neural Networks (ICANN 2010), 20th International Conference, Thessaloniki, Greece, 15-18 September 2010.},
  url = {http://link.springer.com/chapter/10.1007/978-3-642-15822-3_44},
  doi = {10.1007/978-3-642-15822-3_44}
}
Ramzan N, Zgaljic T and Izquierdo E (2010), "Scalable Video Coding: Source for Future Media Internet", In Towards the Future Internet: Emerging Trends from European Research, pp. 205-215. IOS Press.
Abstract: A flexible wavelet-based scalable video coding framework (W-SVC) is proposed to support the future media internet, specifically content delivery to different display terminals through heterogeneous networks such as the Future Internet. A scalable video bit-stream can easily be adapted to the required spatio-temporal resolution and quality, according to the transmission and user context requirements. This enables content adaptation and interoperability in an Internet networking environment. Adaptation of the bit-stream is performed in the compressed domain, by discarding the bit-stream portions that represent higher spatio-temporal resolution and/or quality than desired. Thus, the adaptation is of very low complexity. Furthermore, the embedded structure of a scalable bit-stream provides a natural solution for protection of the video against transmission errors inherent to content transmission over the Internet. The practical capabilities of the W-SVC are demonstrated using error-resilient transmission and surveillance applications. The experimental results show that the W-SVC framework provides a highly flexible architecture with respect to different applications in the future media internet.
BibTeX:
@incollection{ramzan2010scalable,
  author = {Ramzan, Naeem and Zgaljic, Toni and Izquierdo, Ebroul},
  editor = {Tselentis, Georgios and Galis, Alex and Gavras, Anastasius and Krco, Srdjan and Lotz, Volkmar and Simperl, Elena and Stiller, Burkhard and Zahariadis, Theodore},
  title = {Scalable Video Coding: Source for Future Media Internet},
  booktitle = {Towards the Future Internet: Emerging Trends from European Research},
  publisher = {IOS Press},
  year = {2010},
  pages = {205--215},
  url = {http://ebooks.iospress.nl/book/towards-the-future-internet-2},
  doi = {10.3233/978-1-60750-539-6-205}
}
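The compressed-domain adaptation described above (discard every bit-stream portion above the target spatio-temporal resolution and quality) amounts to a filter over layered packets. A minimal sketch, with an assumed `{"s", "t", "q"}` layer labelling rather than any real W-SVC syntax:

```python
def adapt_bitstream(packets, max_s, max_t, max_q):
    """Keep only packets at or below the target spatial (s), temporal (t)
    and quality (q) layers; everything else is simply discarded."""
    return [p for p in packets
            if p["s"] <= max_s and p["t"] <= max_t and p["q"] <= max_q]

stream = [
    {"s": 0, "t": 0, "q": 0},  # base layer
    {"s": 0, "t": 1, "q": 0},  # higher frame rate
    {"s": 1, "t": 0, "q": 0},  # higher resolution
    {"s": 0, "t": 0, "q": 1},  # higher quality
]
low_res = adapt_bitstream(stream, max_s=0, max_t=1, max_q=0)
```

No decoding or re-encoding is involved, which is why the adaptation is of very low complexity.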

Conference Papers

Akram M and Izquierdo E (2010), "Fast Multiframe Motion Estimation for Surveillance Videos", In Image Processing (ICIP 2010), Proceedings of the 17th International Conference on. Hong Kong, China, September, 2010, pp. 753-756.
Abstract: We propose a fast multiple-reference-frame motion estimation technique for surveillance videos. In the very first reference frame of each motion vector search, the successive elimination algorithm is used to find the best motion vector in the search window. For the remaining reference frames, a difference between the current and previous reference frames is computed to identify candidate matching blocks in the current reference frame. Different block matching strategies are proposed to find the optimum motion vector. Experimental evaluation shows that significant reduction in computational complexity can be achieved by applying the proposed strategy.
BibTeX:
@inproceedings{akram2010image,
  author = {Akram, Muhammad and Izquierdo, Ebroul},
  title = {Fast Multiframe Motion Estimation for Surveillance Videos},
  booktitle = {Image Processing (ICIP 2010), Proceedings of the 17th International Conference on},
  year = {2010},
  pages = {753--756},
  note = {google scholar entry: 17th International Conference on Image Processing (ICIP 2010). Hong Kong, China, 26-29 September 2010.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5652089},
  doi = {10.1109/ICIP.2010.5652089}
}
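The successive elimination algorithm mentioned in the abstract above rests on the bound SAD(block, cand) >= |sum(block) - sum(cand)|, so any candidate whose sum difference already exceeds the best SAD found so far can be skipped without a full comparison. A minimal sketch on flattened blocks (the candidate set and motion-vector keys are illustrative):

```python
def sad(a, b):
    """Sum of absolute differences between two blocks (flattened to lists)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def sea_search(block, candidates):
    """Successive elimination over {motion_vector: candidate_block} entries:
    the cheap sum-difference lower bound prunes candidates before the full SAD."""
    block_sum = sum(block)
    best_mv, best_sad = None, float("inf")
    for mv, cand in candidates.items():
        if abs(block_sum - sum(cand)) >= best_sad:
            continue  # lower bound already rules this candidate out
        d = sad(block, cand)
        if d < best_sad:
            best_mv, best_sad = mv, d
    return best_mv, best_sad
```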
Akram M and Izquierdo E (2010), "A Multi-Pattern Search Algorithm for Block Motion Estimation in Video Coding", In Proceedings of the 12th International Asia-Pacific Web Conference (APWEB 2010). Busan, Korea, April, 2010, pp. 407-410. IEEE.
Abstract: In this paper, we propose a novel multi-pattern based search technique, TCon search, for fast block matching motion estimation. It starts with small cross-shaped and small triangular-shaped patterns. Afterwards, based on the previous step's optimal motion vector, the search pattern for the next step is selected. Except for the first and last steps, each search step considers only three points, thus reducing the number of search points significantly. Experimental results demonstrate that the proposed TCon search algorithm performs better than the well-known diamond search (DS) and cross-diamond-hexagonal search (CDHS) algorithms. Compared with the DS algorithm, the proposed TCon search performs up to 2.67 times faster in terms of search point computation and up to 1.67 times faster than the CDHS algorithm, while comparable quality of the reconstructed sequence is maintained.
BibTeX:
@inproceedings{akram2010multi,
  author = {Akram, Muhammad and Izquierdo, Ebroul},
  editor = {Han, Wook-Shin and Srivastava, Divesh and Yu, Ge and Yu, Hwanjo and Huang, Zi Helen},
  title = {A Multi-Pattern Search Algorithm for Block Motion Estimation in Video Coding},
  booktitle = {Proceedings of the 12th International Asia-Pacific Web Conference (APWEB 2010)},
  publisher = {IEEE},
  year = {2010},
  pages = {407--410},
  note = {google scholar entry: 12th International Asia-Pacific Web Conference (APWEB 2010). Busan, Korea, 6-8 April 2010.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5474098},
  doi = {10.1109/APWeb.2010.74}
}
Akram M and Izquierdo E (2010), "Selective Block Search For Surveillance Centric Motion Estimation", In Proceedings of the 52nd International Symposium on Electronics in Marine (ELMAR 2010). Zadar, Croatia, September, 2010, pp. 93-96. Croatian Society of Electronics in Marine (ELMAR).
Abstract: In this paper, we propose a novel approach to perform the selective motion estimation specific to surveillance videos. A real-time background subtractor is used in which pixels representing motion activity are bounded in multiple rectangular boxes. Dimensions and coordinates of each bounding box are used to locate these pixels in the motion estimation module. Motion estimation is performed only for those blocks which overlap with bounding box. This improves utilization efficiency of computing resources by focusing on pixels which are important from surveillance standpoint. Experimental evaluation shows that significant reduction in computational complexity can be achieved by applying the proposed approach.
BibTeX:
@inproceedings{Akram2010selective,
  author = {Akram, Muhammad and Izquierdo, Ebroul},
  editor = {Grgić, Mislav and Božek, Jelena and Grgić, Sonja},
  title = {Selective Block Search For Surveillance Centric Motion Estimation},
  booktitle = {Proceedings of the 52nd International Symposium on Electronics in Marine (ELMAR 2010)},
  publisher = {Croatian Society of Electronics in Marine (ELMAR)},
  year = {2010},
  pages = {93--96},
  note = {google scholar entry: 52nd International Symposium on Electronics in Marine (ELMAR 2010). Zadar, Croatia, 15-17 September 2010.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5606090}
}
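The selection rule described above (run motion estimation only for blocks that overlap a motion bounding box from the background subtractor) reduces to an axis-aligned rectangle intersection test per block. A minimal sketch, with assumed (x, y, w, h) box coordinates and a square block grid:

```python
def block_overlaps(bx, by, size, box):
    """True if the size x size block at pixel (bx, by) intersects the
    motion bounding box (x, y, w, h) reported by the background subtractor."""
    x, y, w, h = box
    return bx < x + w and x < bx + size and by < y + h and y < by + size

def blocks_to_estimate(frame_w, frame_h, size, boxes):
    """Coordinates of the blocks that warrant motion estimation."""
    return [
        (bx, by)
        for by in range(0, frame_h, size)
        for bx in range(0, frame_w, size)
        if any(block_overlaps(bx, by, size, b) for b in boxes)
    ]
```

Blocks outside every bounding box skip motion estimation entirely, which is where the computational saving comes from.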
Akram M, Ramzan N and Izquierdo E (2010), "Selective Motion Estimation for Surveillance Videos", In User Centric Media. First International Conference, UCMedia 2009. Venice, Italy, December 2009. Revised Selected Papers. Venice, Italy, December, 2010. Vol. 40, pp. 199-206. Springer.
Abstract: In this paper, we propose a novel approach to perform efficient motion estimation specific to surveillance videos. A real-time background subtractor is used to detect the presence of any motion activity in the sequence. Two approaches for selective motion estimation, GOP-by-GOP and Frame-by-Frame, are implemented. In the former, motion estimation is performed for the whole group of pictures (GOP) only when a moving object is detected in any frame of the GOP, while in the latter each frame is tested for motion activity and consequently for selective motion estimation. Experimental evaluation shows that significant reduction in computational complexity can be achieved by applying the proposed strategy.
BibTeX:
@inproceedings{akram2010selective,
  author = {Akram, Muhammad and Ramzan, Naeem and Izquierdo, Ebroul},
  editor = {Daras, Petros and Ibarra Mayora, Oscar},
  title = {Selective Motion Estimation for Surveillance Videos},
  booktitle = {User Centric Media. First International Conference, UCMedia 2009. Venice, Italy, December 2009. Revised Selected Papers.},
  publisher = {Springer},
  year = {2010},
  volume = {40},
  pages = {199--206},
  note = {google scholar entry: 1st International Conference on User Centric Media (UCMedia 2009). Venice, Italy, 9-11 December 2009.},
  url = {http://books.google.co.uk/books?id=ti8WoFmQHdoC},
  doi = {10.1007/978-3-642-12630-7_23}
}
Aksay A, Kitanovski V, Vaiapury K, Onasoglou E, Agapito Prez-Moneo JD, Daras P and Izquierdo E (2010), "Robust 3D Tracking in Tennis Videos", In Proceedings of the Summer School ``ENGAGE''. Zermatt, Switzerland, September, 2010, pp. 1-15. MIRALab, University of Geneva.
Abstract: In this paper, we present a framework for robust 3D tracking in tennis multi-view videos. First, we propose a feature-based method for automatic synchronization of multi-view sports videos. Next, we use a motion tracking method based on the modelling of the tracked object's local background with the help of a Self Organizing Map (SOM), followed by the construction of a 2D Centre of Gravity Map (CoGM). Further, in order to find the 3D trajectory, we estimate the 3D locations using triangulation of correspondent 2D locations obtained from automatically synchronized videos. Finally, we use the obtained 2D locations back-projected from 3D to aid in the 2D tracking. The advantage of this system is to reduce the complexity and occlusions, thus improving the robustness and accuracy of the 3D tracking. Experimental results show that we managed to calculate accurate 3D locations using the multiple 2D trackers, regardless of their particular occlusion or eventual inaccurate tracking data.
BibTeX:
@inproceedings{Aksay2010,
  author = {Aksay, Anil and Kitanovski, Vlado and Vaiapury, Karthikeyan and Onasoglou, Efstathios and Agapito Prez-Moneo, Juan Diego and Daras, Petros and Izquierdo, Ebroul},
  title = {Robust 3D Tracking in Tennis Videos},
  booktitle = {Proceedings of the Summer School ``ENGAGE''},
  publisher = {MIRALab, University of Geneva},
  year = {2010},
  pages = {1--15},
  note = {google scholar entry: Summer School "ENGAGE". Zermatt, Switzerland, 13-15 September 2010.},
  url = {http://engage.miralab.ch/main_program.htm}
}
Asioli S, Ramzan N and Izquierdo E (2010), "A Novel Technique for Efficient Peer-to-Peer Scalable Video Transmission", In Proceedings of the 18th European Signal Processing Conference (EUSIPCO 2010). Aalborg, Denmark , pp. 2047-2051. EUSIPCO.
Abstract: In this paper, we exploit the characteristics of scalable video and Peer-to-peer (P2P) networks in order to propose an efficient streaming mechanism for scalable video. The scalable video is divided into chunks and prioritised with respect to its significance in the sliding window by an efficient proposed piece picking policy. Furthermore, a neighbour selective policy is also proposed to receive the most important chunks from the good peers in the neighbourhood to maintain smooth content delivery of certain Quality of Service for the received video. Experimental evaluation of the proposed system clearly demonstrates the superiority of this approach.
BibTeX:
@inproceedings{asioli2010atransmission,
  author = {Asioli, Stefano and Ramzan, Naeem and Izquierdo, Ebroul},
  title = {A Novel Technique for Efficient Peer-to-Peer Scalable Video Transmission},
  booktitle = {Proceedings of the 18th European Signal Processing Conference (EUSIPCO 2010)},
  publisher = {EUSIPCO},
  year = {2010},
  pages = {2047--2051},
  note = {google scholar entry: 18th European Signal Processing Conference (EUSIPCO 2010). Aalborg, Denmark, 23-27 August 2010.},
  url = {http://www.eurasip.org/Proceedings/Eusipco/Eusipco2010/Contents/showProgram.php.html}
}
Bulović A, Bučar D, Palašek P, Popović B, Zadrija L, Brkić K, Kalafatić Z and Šegvić S (2010), "Streamlining collection of training samples for object detection and classification in video", In Proceedings of the 33rd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2010). Opatija, Croatia, May, 2010, pp. 728-733. IEEE.
Abstract: This paper is concerned with object recognition and detection in computer vision. Many promising approaches in the field exploit the knowledge contained in a collection of manually annotated training samples. In the resulting paradigm, the recognition algorithm is automatically constructed by some machine learning technique. It has been shown that the quantity and quality of positive and negative training samples is critical for good performance of such approaches. However, collecting the samples requires tedious manual effort which is expensive in time and prone to error. In this paper we present design and implementation of a software system which addresses these problems. The system supports an iterative approach whereby the current state-of-the-art detection and recognition algorithms are used to streamline the collection of additional training samples. The presented experiments have been performed in the frame of a research project aiming at automatic detection and recognition of traffic signs in video.
BibTeX:
@inproceedings{bulovic2010streamlining,
  author = {Bulović, Ana and Bučar, Damir and Palašek, Petar and Popović, Bojan and Zadrija, Lucija and Brkić, Karla and Kalafatić, Zoran and Šegvić, Siniša},
  editor = {Biljanović, Petar and Skala, Karolj and Golubić, Stjepan and Bogunović, Nikola and Ribarić, Slobodan and Čičin-Šain, Marina and Čišić, Dragan and Hutinski, Željko and Baranović, Mirta and Mauher, Mladen and Pletikosa, Marko},
  title = {Streamlining collection of training samples for object detection and classification in video},
  booktitle = {Proceedings of the 33rd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2010)},
  publisher = {IEEE},
  year = {2010},
  pages = {728--733},
  note = {google scholar entry: 33rd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2010). Opatija, Croatia, 24-28 May 2010.},
  url = {https://qmro.qmul.ac.uk/jspui/bitstream/123456789/4261/2/BULOVICStreamliningCollection2010FINAL.pdf}
}
Chandramouli K and Izquierdo E (2010), "Semantic Structuring and Retrieval of Event Chapters in Social Photo Collections", In Proceedings of the 11th international conference on Multimedia Information Retrieval (MIR 2010). Philadelphia, Pennsylvania, March, 2010, pp. 507-516. ACM.
Abstract: The phenomenal growth of multimedia content on the web over the last couple of decades has paved the way for content management systems integrating intelligent information retrieval and indexing techniques. Also, in order to improve the performance of retrieval techniques while searching and navigating the database, many relevance feedback algorithms are implemented, in which the subjective semantics of individual users are included in the image search. Following the recent developments in social networking, there is an emerging interest to share experiences online with friends using multimedia data. As the experiences to be shared among social peers vary from a simple social gathering to a tourism visit with a group of peers, there is a critical need for intelligent content management tools driven by a social perspective. Addressing the challenges related to socially-driven content management, the objective of this paper is twofold. First, we investigate techniques to intelligently structure multimedia content to enable efficient browsing of photo albums. The proposed structuring schemes exploit EXIF metadata, visual content and social peer relationships. Second, we propose a retrieval model based on social context to identify users with similar interests. The retrieval model aims to allow increased interaction among social peers. The proposed techniques have been evaluated against tourism pictures captured across Europe.
BibTeX:
@inproceedings{Chandramouli2010semantic,
  author = {Chandramouli, Krishna and Izquierdo, Ebroul},
  editor = {Wang, James Ze and Boujemaa, Nozha and Ramirez, Nuria Oliver and Natsev, Apostol},
  title = {Semantic Structuring and Retrieval of Event Chapters in Social Photo Collections},
  booktitle = {Proceedings of the 11th international conference on Multimedia Information Retrieval (MIR 2010)},
  publisher = {ACM},
  year = {2010},
  pages = {507--516},
  note = {google scholar entry: 11th international conference on Multimedia Information Retrieval (MIR 2010). Philadelphia, Pennsylvania, 29-31 March 2010.},
  url = {http://dl.acm.org/citation.cfm?doid=1743384.1743472},
  doi = {10.1145/1743384.1743472}
}
Chandramouli K, Kliegr T, Piatrik T and Izquierdo E (2010), "QMUL@MediaEval 2010 Tagging Task: Semantic Query Expansion for Predicting User Tags", In Working Notes Proceedings of the MediaEval 2010 Workshop. Pisa, Italy, October, 2010, pp. 1-2. MediaEval Multimedia Benchmark.
Abstract: This paper describes our participation in ``The Wild Wild Web Tagging Task @ MediaEval 2010'', which aims to predict user tags based on features derived from video such as speech, audio, visual content or associated textual or social information. Two tasks were pursued: (i) closed-set annotations and (ii) open-set annotations. We have attempted to evaluate whether using only a limited number of features (video title, filename and description) can be compensated for by semantic expansion with NLP tools, Wikipedia and WordNet. This technique proved successful on the open-set task, with approximately 20% of the generated tags being considered relevant by all manual annotators. On the closed-set task, the best result (MAP 0.3) was achieved on tokenized filenames combined with video descriptions, indicating that filenames are a valuable tag predictor.
BibTeX:
@inproceedings{chandramouli2010qmul,
  author = {Chandramouli, Krishna and Kliegr, Tomas and Piatrik, Tomas and Izquierdo, Ebroul},
  editor = {Larson, Martha and Soleymani, Mohammad and Serdyukov, Pavel and Murdock, Vanessa and Jones, Gareth},
  title = {QMUL@MediaEval 2010 Tagging Task: Semantic Query Expansion for Predicting User Tags},
  booktitle = {Working Notes Proceedings of the MediaEval 2010 Workshop},
  publisher = {MediaEval Multimedia Benchmark},
  year = {2010},
  pages = {1--2},
  note = {google scholar entry: 2010 Multimedia Benchmark Workshop (MediaEval 2010). Pisa, Italy, 24 October 2010.},
  url = {http://www.multimediaeval.org/mediaeval2010/2010worknotes/index.html}
}
Conci N and Izquierdo E (2010), "Detection and Enhancement of Moving Objects in Surveillance Centric Coding", In Acoustics Speech and Signal Processing (ICASSP 2010), Proceedings of the 35th IEEE International Conference on. Dallas, Texas, March, 2010, pp. 1406-1409. IEEE.
Abstract: The coexistence of multiple cameras, especially in wireless video surveillance systems, imposes severe constraints in terms of computational resources, power supply and bandwidth. These limitations hamper coding and transmission of high-quality video streams. In this paper, we propose a new approach for video coding and transmission of surveillance video. It integrates the coding sub-system and the motion detection module to enhance the quality of moving objects in low-bitrate streams. At the sender side, videos are downsampled and compressed. At the decoder, the video is upsampled and the foreground quality is enhanced by detecting meaningful edges of moving objects via the Hough transform. The quality of the background is also enhanced through progressive updates of the background model.
BibTeX:
@inproceedings{conci2010detection,
  author = {Conci, Nicola and Izquierdo, Ebroul},
  title = {Detection and Enhancement of Moving Objects in Surveillance Centric Coding},
  booktitle = {Acoustics Speech and Signal Processing (ICASSP 2010), Proceedings of the 35th IEEE International Conference on},
  publisher = {IEEE},
  year = {2010},
  pages = {1406--1409},
  note = {google scholar entry: 35th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010). Dallas, Texas, 14-19 March 2010.},
  url = {http://disi.unitn.it/~conci/nix/Publications_files/ICASSP_hough_CR.pdf},
  doi = {10.1109/ICASSP.2010.5495465}
}
Haji Mirza SN and Izquierdo E (2010), "Finding the User's Interest Level from their Eyes", In Proceedings of the 2010 ACM workshop on Social, Adaptive and Personalized Multimedia Interaction and Access (SAPMIA 2010). Firenze, Italy, October, 2010, pp. 25-28. ACM.
Abstract: An innovative model is proposed that empowers semi-automatic image annotation algorithms with the implicit feedback of the users' eyes. This framework extracts features from the user's gaze pattern over an image with the help of eye-trackers and combines them with the low-level feature properties of that image. The resulting feature vector is sent to a fuzzy inference framework which grades the user's interest in the visited images. By defining a threshold in the middle of the interest scale, the images can be classified when the user is searching for a target concept. In addition, classifying the user's interest level enables us to cluster the visited images according to the user's concerns. The preliminary results show that this model can classify the images with an F1 measure over 0.52.
BibTeX:
@inproceedings{haji2010finding,
  author = {Haji Mirza, Seyed Navid and Izquierdo, Ebroul},
  title = {Finding the User's Interest Level from their Eyes},
  booktitle = {Proceedings of the 2010 ACM workshop on Social, Adaptive and Personalized Multimedia Interaction and Access (SAPMIA 2010)},
  publisher = {ACM},
  year = {2010},
  pages = {25--28},
  note = {google scholar entry: 2010 ACM workshop on Social, Adaptive and Personalized Multimedia Interaction and Access (SAPMIA 2010). Firenze, Italy, 25-29 October 2010.},
  url = {http://doi.acm.org/10.1145/1878061.1878070},
  doi = {10.1145/1878061.1878070}
}
Haji Mirza SN and Izquierdo E (2010), "Gaze Movement Inference for Implicit Image Annotation", In WIAMIS 2010: 11th International Workshop on Image Analysis for Multimedia Interactive Services, Delft, The Netherlands, April 12-14, 2010. Desenzano del Garda, Italy, April, 2010, pp. 1-4. IEEE.
Abstract: An innovative semi-automatic image annotation system is enriched with the feedback of the user's eyes. This system implicitly exploits the competence of the human mind and utilizes the computational power of computers in order to achieve pervasive and accurate annotation. The method requires minimal user interaction, which makes it suitable for use in distributed environments while users perform their usual daily surfing. The user's gaze state on the trial screen is monitored and interpreted by an interface promoted by fuzzy inference. The preliminary results indicate that in a multi-user environment the annotation precision of the system is over 80%, with recall between 60% and 80%.
BibTeX:
@inproceedings{haji2010gaze,
  author = {Haji Mirza, Seyed Navid and Izquierdo, Ebroul},
  title = {Gaze Movement Inference for Implicit Image Annotation},
  booktitle = {WIAMIS 2010: 11th International Workshop on Image Analysis for Multimedia Interactive Services, Delft, The Netherlands, April 12-14, 2010},
  publisher = {IEEE},
  year = {2010},
  pages = {1--4},
  note = {google scholar entry: 11th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2010). Desenzano del Garda, Italy, 12-14 April 2010.},
  url = {http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=5608542}
}
Izquierdo E, Cai Y, Zhang Q and García-Herranz M (2010), "ACM Workshop on Surreal Media and Virtual Cloning", In Proceedings of the 18th International Conference on Multimedia (MM 2010). Firenze, Italy, October, 2010, pp. 1767-1768. ACM.
Abstract: This paper gives an overview of ACM Multimedia 2010 Workshop on Surreal Media and Virtual Cloning, including research work towards the creation of surreal media and realistic 3D virtual environments where virtual humans and objects can interact remotely. The primary objective is to discuss key research issues related to the generation of surreal media and 3D cooperative virtual worlds. We expect that the one-day program will bring together research groups from related fields and explore research problems, potential applications and collaborative opportunities.
BibTeX:
@inproceedings{izquierdo2010acm,
  author = {Izquierdo, Ebroul and Cai, Yang and Zhang, Qianni and García-Herranz, Manuel},
  editor = {Del Bimbo, Alberto and Chang, Shih-Fu and Smeulders, Arnold W. M.},
  title = {ACM Workshop on Surreal Media and Virtual Cloning},
  booktitle = {Proceedings of the 18th International Conference on Multimedia (MM 2010)},
  publisher = {ACM},
  year = {2010},
  pages = {1767--1768},
  note = {google scholar entry: 18th International Conference on Multimedia (MM 2010). Firenze, Italy, 25-29 October 2010.},
  url = {http://dl.acm.org/citation.cfm?doid=1873951.1874361},
  doi = {10.1145/1873951.1874361}
}
Izquierdo E, Piatrik T and Zhang Q (2010), "3DLife: Bringing the Media Internet to Life", In Leveraging Applications of Formal Methods, Verification, and Validation: 4th International Symposium on Leveraging Applications, ISoLA 2010, Heraklion, Crete, Greece, October 18-21, 2010, Proceedings. Heraklion, Crete, October, 2010. Vol. 6416, pp. 13-14. Springer.
Abstract: ``Bringing the Media Internet to Life'' - or simply, 3DLife - is a European Union funded project that aims to integrate research conducted within Europe in the field of Media Internet. In this contribution, we give an overview of the project's main objectives and activities.
BibTeX:
@inproceedings{izquierdo20103dlife,
  author = {Izquierdo, Ebroul and Piatrik, Tomas and Zhang, Qianni},
  editor = {Margaria, Tiziana and Steffen, Bernhard},
  title = {3DLife: Bringing the Media Internet to Life},
  booktitle = {Leveraging Applications of Formal Methods, Verification, and Validation: 4th International Symposium on Leveraging Applications, ISoLA 2010, Heraklion, Crete, Greece, October 18-21, 2010, Proceedings},
  publisher = {Springer},
  year = {2010},
  volume = {6416},
  pages = {13--14},
  note = {google scholar entry: 4th International Symposium on Leveraging Applications (ISoLA 2010). Heraklion, Crete, 18-21 October 2010.},
  url = {http://link.springer.com/chapter/10.1007%2F978-3-642-16561-0_4},
  doi = {10.1007/978-3-642-16561-0_4}
}
Janjusevic T, Zhang Q, Chandramouli K and Izquierdo E (2010), "Concept based interactive retrieval for social environment", In Proceedings of the 2010 ACM workshop on Social, adaptive and personalized multimedia interaction and access (SAPMIA 2010). Firenze, Italy, October, 2010, pp. 15-20. ACM.
Abstract: Following the recent developments in social networking, there is an emerging interest in sharing experiences online with social peers through multimedia data. Consequently, an exponential amount of multimedia information has been generated by everyday users and shared among social peers. As opposed to conventional digital archives, user-generated content archives are not confined to one particular domain, and semantic indexing of the content therefore requires the creation of a large number of training samples for each semantic query concept. Addressing this problem, we present an interactive multi-concept based browsing and retrieval framework with which users can construct high-level semantic queries based on mid-level primitive features. The proposed framework integrates an innovative visualisation methodology developed for browsing, navigating and retrieving information from multimedia databases. The framework is user centric and supports interactive formulation of high-level semantic queries for content retrieval using available content annotation. The performance of the proposed framework is evaluated using annotation based on automatic algorithms against Support Vector Machines, multi-feature classification and particle swarm optimisation based relevance feedback techniques.
BibTeX:
@inproceedings{janjusevic2010concept,
  author = {Janjusevic, Tijana and Zhang, Qianni and Chandramouli, Krishna and Izquierdo, Ebroul},
  title = {Concept based interactive retrieval for social environment},
  booktitle = {Proceedings of the 2010 ACM workshop on Social, adaptive and personalized multimedia interaction and access (SAPMIA 2010)},
  publisher = {ACM},
  year = {2010},
  pages = {15--20},
  note = {google scholar entry: 2010 ACM workshop on Social, adaptive and personalized multimedia interaction and access (SAPMIA 2010). Firenze, Italy, 25-29 October 2010.},
  url = {http://dl.acm.org/citation.cfm?id=1878067},
  doi = {10.1145/1878061.1878067}
}
Koelstra S, Yazdani A, Soleymani M, Mühl C, Lee J, Nijholt A, Pun T, Ebrahimi T and Patras I (2010), "Single Trial Classification of EEG and Peripheral Physiological Signals for Recognition of Emotions Induced by Music Videos", In Brain Informatics, International Conference (BI 2010). Toronto, Ontario, 28-30 August 2010. Proceedings. Toronto, Ontario, August, 2010. Vol. 6334, pp. 89-100. Springer.
Abstract: Recently, the field of automatic recognition of users' affective states has gained a great deal of attention. Automatic, implicit recognition of affective states has many applications, ranging from personalized content recommendation to automatic tutoring systems. In this work, we present some promising results of our research in classification of emotions induced by watching music videos. We show robust correlations between users' self-assessments of arousal and valence and the frequency powers of their EEG activity. We present methods for single trial classification using both EEG and peripheral physiological signals. For EEG, an average (maximum) classification rate of 55.7% (67.0%) for arousal and 58.8% (76.0%) for valence was obtained. For peripheral physiological signals, the results were 58.9% (85.5%) for arousal and 54.2% (78.5%) for valence.
BibTeX:
@inproceedings{koelstra2010single,
  author = {Koelstra, Sander and Yazdani, Ashkan and Soleymani, Mohammad and Mühl, Christian and Lee, Jong-Seok and Nijholt, Anton and Pun, Thierry and Ebrahimi, Touradj and Patras, Ioannis},
  editor = {Yao, Yiyu and Sun, Ron and Poggio, Tomaso and Liu, Jiming and Zhong, Ning and Huang, Jimmy},
  title = {Single Trial Classification of EEG and Peripheral Physiological Signals for Recognition of Emotions Induced by Music Videos},
  booktitle = {Brain Informatics, International Conference (BI 2010). Toronto, Ontario, 28-30 August 2010. Proceedings},
  publisher = {Springer},
  year = {2010},
  volume = {6334},
  pages = {89--100},
  note = {google scholar entry: International Conference on Brain Informatics (BI 2010). Toronto, Ontario, 28-30 August 2010.},
  url = {http://infoscience.epfl.ch/record/149960/files/BI_paper.pdf},
  doi = {10.1007/978-3-642-15314-3_9}
}
Kumar BGV and Patras I (2010), "A Discriminative Voting Scheme for Object Detection using Hough Forests", In Proceedings of the BMVC 2010 UK postgraduate workshop. Aberystwyth, Wales, pp. 3.1-3.10. BMVA Press.
Abstract: Variations of the Implicit Shape Model (ISM) have been extensively used for part-based object detection. Such methods model the information object parts provide about the location of the center and the size of the object in question. Recent object detection techniques employ the generalized Hough transform using random forests, constructing the trees using existing generic criteria for this purpose. In this work, we propose a discriminative criterion for the tree construction that aims explicitly at maximizing the response at the true object locations in the Hough space while suppressing it at all other locations. To do so, we exploit the knowledge of the object locations in the training images. During training, the Hough images are computed at each node for every training image using the votes from the corresponding training patches. This enables us to utilize a new criterion that discriminates the object locations from the background in the actual Hough space, in comparison to methods that employ classical tree construction criteria. The proposed algorithm results in Hough images with high responses at the object locations and fewer false positives. We present results on several publicly available datasets to demonstrate the effectiveness of the algorithm.
BibTeX:
@inproceedings{kumar2010discriminative,
  author = {Kumar, B. G. Vijay and Patras, Ioannis},
  title = {A Discriminative Voting Scheme for Object Detection using Hough Forests},
  booktitle = {Proceedings of the BMVC 2010 UK postgraduate workshop},
  publisher = {BMVA Press},
  year = {2010},
  pages = {3.1--3.10},
  url = {http://www.bmva.org/bmvc/2010/workshop/paper3/}
}
Lee J-S, De Simone F, Ramzan N, Zhao Z, Kurutepe E, Sikora T, Ostermann J, Izquierdo E and Ebrahimi T (2010), "Subjective evaluation of scalable video coding for content distribution", In Proceedings of the 18th International Conference on Multimedia (MM 2010). Firenze, Italy, October, 2010, pp. 65-72. ACM.
Abstract: This paper investigates the influence of the combination of the scalability parameters in scalable video coding (SVC) schemes on the subjective visual quality. We aim at providing guidelines for an adaptation strategy of SVC that can select the optimal scalability options for resource-constrained networks. Extensive subjective tests are conducted by using two different scalable video codecs and high definition contents. The results are analyzed with respect to five dimensions, namely, codec, content, spatial resolution, temporal resolution, and frame quality.
BibTeX:
@inproceedings{lee2010subjective,
  author = {Lee, Jong-Seok and De Simone, Francesca and Ramzan, Naeem and Zhao, Zhijie and Kurutepe, Engin and Sikora, Thomas and Ostermann, Jörn and Izquierdo, Ebroul and Ebrahimi, Touradj},
  editor = {Del Bimbo, Alberto and Chang, Shih-Fu and Smeulders, Arnold W. M.},
  title = {Subjective evaluation of scalable video coding for content distribution},
  booktitle = {Proceedings of the 18th International Conference on Multimedia (MM 2010)},
  publisher = {ACM},
  year = {2010},
  pages = {65--72},
  note = {google scholar entry: 18th International Conference on Multimedia (MM 2010). Firenze, Italy, 25-29 October 2010.},
  url = {http://dl.acm.org/citation.cfm?id=1873981},
  doi = {10.1145/1873951.1873981}
}
Moussa MB, Kasap Z, Magnenat-Thalmann N, Chandramouli K, Haji Mirza SN, Zhang Q, Izquierdo E, Biperis I and Daras P (2010), "Towards an expressive virtual tutor: an implementation of a virtual tutor based on an empirical study of non-verbal behaviour", In Proceedings of the 2010 ACM workshop on Surreal Media and Virtual Cloning (SMVC 2010). Firenze, Italy, October, 2010, pp. 39-44. ACM.
Abstract: In this paper we investigate the non-verbal behaviour of a tutor and propose a model for ECAs (Embodied Conversational Agents) acting as virtual tutors. We have conducted an empirical study in which we focused on the distribution of the gaze, head and eyebrow behaviour of the tutors in a teaching scenario and on the co-occurrences of these behaviours with certain teaching activities or conversational events. Further, we built an ECA with conversational capabilities, episodic memory, emotions and expressive behaviour based on the results of the empirical study.
BibTeX:
@inproceedings{Moussa2010,
  author = {Moussa, Maher Ben and Kasap, Zerrin and Magnenat-Thalmann, Nadia and Chandramouli, Krishna and Haji Mirza, Seyed Navid and Zhang, Qianni and Izquierdo, Ebroul and Biperis, Iordanis and Daras, Petros},
  title = {Towards an expressive virtual tutor: an implementation of a virtual tutor based on an empirical study of non-verbal behaviour},
  booktitle = {Proceedings of the 2010 ACM workshop on Surreal Media and Virtual Cloning (SMVC 2010)},
  publisher = {ACM},
  year = {2010},
  pages = {39--44},
  note = {google scholar entry: ACM workshop on Surreal Media and Virtual Cloning (SMVC 2010). Firenze, Italy, 25-29 October 2010.},
  url = {http://doi.acm.org/10.1145/1878083.1878096},
  doi = {10.1145/1878083.1878096}
}
Pantoja C and Trujillo M (2010), "An MPEG-7 Shape Browser", In Proceedings of the 2nd Latin-American Conference on Networked and Electronic Media (LACNEM 2010). Cali, Colombia, September, 2010. Kingston University.
BibTeX:
@inproceedings{pantoja2010mpeg,
  author = {Pantoja, Cesar and Trujillo, Maria},
  title = {An MPEG-7 Shape Browser},
  booktitle = {Proceedings of the 2nd Latin-American Conference on Networked and Electronic Media (LACNEM 2010)},
  publisher = {Kingston University},
  year = {2010},
  note = {google scholar entry: 2nd Latin-American Conference on Networked and Electronic Media (LACNEM 2010). Cali, Colombia, 8-10 September 2010.}
}
Passino G, Patras I and Izquierdo E (2010), "Pyramidal Model for Image Semantic Segmentation", In Proceedings of the 20th International Conference on Pattern Recognition (ICPR 2010). Istanbul, Turkey, August, 2010, pp. 1554-1557. IEEE.
Abstract: We present a new hierarchical model applied to the problem of image semantic segmentation, that is, the association of each pixel in an image with a category label (e.g. tree, cow, building, ...). This problem is usually addressed with a combination of an appearance-based pixel classification and a pixel context model. In our proposal, the images are initially over-segmented in dense patches. The proposed pyramidal model naturally embeds the compositional nature of a scene to achieve a multi-scale contextualisation of patches. This is obtained by imposing an order on the patches aggregation operations towards the final scene. The nodes of the pyramid (that is, a dendrogram) thus represent patch clusters, or super-patches. The probabilistic model favours the homogeneous labelling of super-patches that are likely to contain a single object instance, modelling the uncertainty in identifying such super-patches. The proposed model has several advantages, including the computational efficiency, as well as the expandability. Initial results place the model in line with other works in the recent literature.
BibTeX:
@inproceedings{passino2010pyramidal,
  author = {Passino, Giuseppe and Patras, Ioannis and Izquierdo, Ebroul},
  title = {Pyramidal Model for Image Semantic Segmentation},
  booktitle = {Proceedings of the 20th International Conference on Pattern Recognition (ICPR 2010)},
  publisher = {IEEE},
  year = {2010},
  pages = {1554--1557},
  note = {google scholar entry: 20th International Conference on Pattern Recognition (ICPR 2010). Istanbul, Turkey, 23-26 August 2010.},
  url = {http://icpr2010.org/pdfs/icpr2010_TuCT3.4.pdf},
  doi = {10.1109/ICPR.2010.384}
}
Peixoto E, Zgaljic T and Izquierdo E (2010), "Transcoding from H.264/AVC to a Wavelet-based Scalable Video Codec", In Image Processing (ICIP 2010), Proceedings of the 17th International Conference on. Hong Kong, China, September, 2010, pp. 2845-2848. IEEE.
Abstract: Scalable Video Coding (SVC) enables low complexity adaptation according to transmission and display requirements, providing an efficient solution for video content delivery through heterogeneous networks. However, legacy video and most commercially available content capturing devices use conventional non-scalable coding, e.g. H.264/AVC, to compress and store video streams. As a consequence and in order to fully exploit the advantages of SVC technology, efficient transcoding from conventionally coded to scalable content is urgently needed. In this paper an efficient transcoder from H.264/AVC to a wavelet-based SVC is proposed. The complexity of the transcoder is kept very low by using information extracted directly from the decoded H.264/AVC bitstream, such as motion vectors and the presence of residual data. The proposed approach has been tested with well known benchmarking sequences, showing a good performance in terms of decoded video quality and system complexity.
BibTeX:
@inproceedings{Peixoto2010,
  author = {Peixoto, Eduardo and Zgaljic, Toni and Izquierdo, Ebroul},
  title = {Transcoding from H.264/AVC to a Wavelet-based Scalable Video Codec},
  booktitle = {Image Processing (ICIP 2010), Proceedings of the 17th International Conference on},
  publisher = {IEEE},
  year = {2010},
  pages = {2845--2848},
  note = {google scholar entry: 17th International Conference on Image Processing (ICIP 2010). Hong Kong, China, 26-29 September 2010.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Conference/ICIP2010_Eduardo.pdf},
  doi = {10.1109/ICIP.2010.5652048}
}
Peixoto E, Zgaljic T and Izquierdo E (2010), "H.264/AVC to Wavelet-based Scalable Video Transcoding Supporting Multiple Coding Configurations", In Proceedings of the 28th Picture Coding Symposium (PCS 2010). Nagoya, Japan, December, 2010, pp. 562-565. IEEE.
Abstract: Scalable Video Coding (SVC) enables low complexity adaptation of the compressed video, providing an efficient solution for video content delivery through heterogeneous networks and to different displays. However, legacy video and most commercially available content capturing devices use conventional non-scalable coding, e.g. H.264/AVC. This paper proposes an efficient transcoder from H.264/AVC to a wavelet-based SVC to exploit the advantages offered by the SVC technology. The proposed transcoder is able to cope with different coding configurations in H.264/AVC, such as IPP or IBBP with multiple reference frames. To reduce the transcoder's complexity, motion information and the presence of residual data extracted from the decoded H.264/AVC video are exploited. Experimental results show a good performance of the proposed transcoder in terms of decoded video quality and system complexity.
BibTeX:
@inproceedings{Peixoto2010a,
  author = {Peixoto, Eduardo and Zgaljic, Toni and Izquierdo, Ebroul},
  title = {H.264/AVC to Wavelet-based Scalable Video Transcoding Supporting Multiple Coding Configurations},
  booktitle = {Proceedings of the 28th Picture Coding Symposium (PCS 2010)},
  publisher = {IEEE},
  year = {2010},
  pages = {562--565},
  note = {google scholar entry: 28th Picture Coding Symposium (PCS 2010). Nagoya, Japan, 8-10 December 2010.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Conference/ICIP2010_Eduardo.pdf},
  doi = {10.1109/PCS.2010.5702564}
}
Ramzan N, Patrikakis C, Zhang Q and Izquierdo E (2010), "Analysing multimedia content in social networking environments", In Proceedings of the 2010 ACM workshop on Social, Adaptive and Personalized Multimedia Interaction and Access (SAPMIA 2010). Firenze, Italy, October, 2010, pp. 73-76. ACM.
Abstract: Social and Peer-to-Peer (P2P) networks have received considerable interest in recent decades due to their focus on the analysis of relationships among entities and on the patterns and implications of these relationships. In the meantime, with the rapid increase in production and distribution of multimedia content, effectively integrating context and content for multimedia mining, management, indexing and retrieval on the Internet has become an evident and difficult problem. As this problem in multimedia content analysis becomes widely recognised, the search for solutions has become an increasingly active area of research and development. Interest in this area is reflected in the significantly increasing number of publications each year. In this paper, we give an overview of the key theoretical and empirical advances in the current decade related to multimedia content analysis. We also discuss the significant challenges involved in the adaptation of existing multimedia content analysis techniques for interactive content sharing in social and P2P networks.
BibTeX:
@inproceedings{Ramzan2010,
  author = {Ramzan, Naeem and Patrikakis, Charalampos and Zhang, Qianni and Izquierdo, Ebroul},
  title = {Analysing multimedia content in social networking environments},
  booktitle = {Proceedings of the 2010 ACM workshop on Social, Adaptive and Personalized Multimedia Interaction and Access (SAPMIA 2010)},
  publisher = {ACM},
  year = {2010},
  pages = {73--76},
  note = {google scholar entry: 2010 ACM workshop on Social, Adaptive and Personalized Multimedia Interaction and Access (SAPMIA 2010). Firenze, Italy, 25-29 October 2010.},
  url = {http://www.researchgate.net/publication/237061951_Analysing_multimedia_content_in_social_networking_environments},
  doi = {10.1145/1878061.1878082}
}
Seneviratne L and Izquierdo E (2010), "An Interactive Game for Semi-Automatic Image Annotation", In Acoustics Speech and Signal Processing (ICASSP 2010), Proceedings of the 35th IEEE International Conference on. Dallas, TX, March, 2010, pp. 1254-1257. IEEE.
Abstract: The coexistence of multiple cameras, especially in wireless video surveillance systems imposes severe constraints in terms of computational resources, power supply and bandwidth. These limitations hamper coding and transmission of high-quality video streams. In this paper, we propose a new approach for video coding and transmission of surveillance video. It integrates the coding sub-system and the motion detection module to enhance the quality of moving objects in low-bit rate streams. At the sender side, videos are downsampled and compressed. At the decoder, the video is upsampled and the foreground quality is enhanced by detecting meaningful edges of moving objects via the Hough transform. The quality of the background is also enhanced through progressive update of the background model.
BibTeX:
@inproceedings{seneviratne2010interactive,
  author = {Seneviratne, Lasantha and Izquierdo, Ebroul},
  title = {An Interactive Game for Semi-Automatic Image Annotation},
  booktitle = {Acoustics Speech and Signal Processing (ICASSP 2010), Proceedings of the 35th IEEE International Conference on},
  publisher = {IEEE},
  year = {2010},
  pages = {1254--1257},
  note = {missing on ieee, no assigned doi},
  url = {http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5496293},
  doi = {10.1109/ICASSP.2010.5495465}
}
Seneviratne L and Izquierdo E (2010), "An Interactive Framework for Image Annotation through Gaming", In Proceedings of the 11th international conference on Multimedia Information Retrieval (MIR 2010). Philadelphia, Pennsylvania, March, 2010, pp. 517-526. ACM.
Abstract: Image indexing is one of the most difficult challenges facing the computer vision community. Addressing this issue, we designed an innovative approach to obtain accurate labels for images by taking into account the social aspects of human-based computation. The proposed approach is highly discriminative in comparison to an ordinary content-based image retrieval (CBIR) paradigm. It builds on what millions of individual gamers are enthusiastic to do: enjoy themselves within a socially competitive environment. This is achieved by setting the focus of the system on the social aspects of the gaming environment, which involves a widely distributed network of human players. Furthermore, this framework integrates a number of different algorithms commonly found in image processing and game-theoretic approaches to obtain accurate labels. As a result, the framework is able to assign (or derive) accurate tags for images by eliminating annotations made by less-rational (cheating) players. The performance of this framework has been evaluated with a group of 10 game players. The results show that the proposed approach is capable of obtaining good annotations from a small number of game players.
BibTeX:
@inproceedings{Seneviratne2010interactive,
  author = {Seneviratne, Lasantha and Izquierdo, Ebroul},
  editor = {Wang, James Ze and Boujemaa, Nozha and Ramirez, Nuria Oliver and Natsev, Apostol},
  title = {An Interactive Framework for Image Annotation through Gaming},
  booktitle = {Proceedings of the 11th international conference on Multimedia Information Retrieval (MIR 2010)},
  publisher = {ACM},
  year = {2010},
  pages = {517--526},
  note = {google scholar entry: 11th international conference on Multimedia Information Retrieval (MIR 2010). Philadelphia, Pennsylvania, 29-31 March 2010.},
  url = {http://dl.acm.org/citation.cfm?doid=1743384.1743472},
  doi = {10.1145/1743384.1743473}
}
Vaiapury K, Aksay A and Izquierdo E (2010), "GrabcutD: improved grabcut using depth information", In Proceedings of the 2010 ACM workshop on Surreal media and virtual cloning. Firenze, Italy , pp. 57-62. ACM.
Abstract: Popular state-of-the-art segmentation methods such as Grabcut include a matting technique to calculate the alpha values for the boundaries of segmented regions. Conventional Grabcut relies only on color information to achieve segmentation. Recently, there have been attempts to improve Grabcut using motion in video sequences. However, in stereo or multi-view analysis, there is additional information that could also be used to improve segmentation. Clearly, depth-based approaches bear the potential discriminative power of ascertaining whether an object is nearer or farther. In this work, we propose and evaluate a Grabcut segmentation technique based on a combination of color and depth information. We show the usefulness of the approach when stereo information is available and evaluate it on standard datasets against state-of-the-art results.
BibTeX:
@inproceedings{Vaiapury2010grabcutd,
  author = {Vaiapury, Karthikeyan and Aksay, Anil and Izquierdo, Ebroul},
  title = {GrabcutD: improved grabcut using depth information},
  booktitle = {Proceedings of the 2010 ACM workshop on Surreal media and virtual cloning},
  publisher = {ACM},
  year = {2010},
  pages = {57--62},
  url = {http://dl.acm.org/citation.cfm?id=1878083.1878099},
  doi = {10.1145/1878083.1878099}
}
Vaiapury K and Izquierdo E (2010), "An O-FDP Framework in 3D Model Based Reconstruction", In Proceedings of the 12th International Asia-Pacific Web Conference (APWEB 2010). Busan, Korea, April, 2010, pp. 424-429. IEEE.
Abstract: Three-dimensional scene synthesis and analysis has drawn considerable attention in manufacturing industries due to its multifaceted applications, ranging from real-time recognition and verification to vehicle guidance. Nowadays, there is a growing surge of interest in the FDP (focused disparity map) of given objects of interest in multiview stereo images. In order to design customised specific applications such as industrial part verification, we propose to use O-FDP (Optimal Focused Disparity Map), which spins off from the fact that only basic geometric information is available from DMU (Digital Mock-Up) models such as CATIA in industrial installations. Instead of using the whole image information, we use only the experiential information that is really necessary for the application. The proposed framework unifies LIFE (Local Invariant Feature Extraction) techniques, edge information, epipolar geometry and object silhouette information. The framework results are presented and compared with state-of-the-art work.
BibTeX:
@inproceedings{vaiapury2010anreconstruction,
  author = {Vaiapury, Karthikeyan and Izquierdo, Ebroul},
  title = {An O-FDP Framework in 3D Model Based Reconstruction},
  booktitle = {Proceedings of the 12th International Asia-Pacific Web Conference (APWEB 2010)},
  publisher = {IEEE},
  year = {2010},
  pages = {424--429},
  note = {google scholar entry: 12th International Asia-Pacific Web Conference (APWEB 2010). Busan, Korea, 6-8 April 2010.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5474094},
  doi = {10.1109/APWeb.2010.79}
}
Zhang Q and Izquierdo E (2010), "Demonstration for the 3DLife Framework", In Towards a Service-Based Internet: Proceedings of the Third European Conference (ServiceWave 2010). Ghent, Belgium, December, 2010. Vol. 6481, pp. 205-206. Springer.
Abstract: This paper describes a demonstration for the 3DLife framework, which embraces technologies developed in the EU FP7 NoE project 3DLife - Bringing the Media Internet to Life. One of the key objectives in 3DLife is to build an open and expandable framework for collaborative research on interactive media communication over the Internet. This framework will be based on a distributed repository of software tools. Currently the 3DLife framework consists of four main modules: simulation of athlete body and motion, a virtual mirror and dressing room, autonomous virtual humans, and sports activity analysis in camera networks. This demonstration is organised in four parts according to the main modules in the 3DLife framework.
BibTeX:
@inproceedings{zhang2010demonstration,
  author = {Zhang, Qianni and Izquierdo, Ebroul},
  editor = {Di Nitto, Elisabetta and Yahyapour, Ramin},
  title = {Demonstration for the 3DLife Framework},
  booktitle = {Towards a Service-Based Internet: Third European Conference, ServiceWave 2010},
  publisher = {Springer},
  year = {2010},
  volume = {6481},
  pages = {205--206},
  note = {google scholar entry: Towards a Service-Based Internet: Third European Conference, ServiceWave 2010. Ghent, Belgium, 13-15 December 2010.},
  url = {http://books.google.co.uk/books?id=ynQG-TnKq7YC},
  doi = {10.1007/978-3-642-17694-4_25}
}
Zhang Q and Izquierdo E (2010), "From mid-level to high-level: Semantic inference for multimedia retrieval", In Semantic Media Adaptation and Personalization (SMAP 2010), Proceedings of the 5th International Workshop on. Limassol, Cyprus, December, 2010, pp. 70-75. IEEE.
Abstract: The problem of bridging the semantic gap can be approached by dividing all types of metadata extracted from multimedia content into three levels - low, mid and high - according to their levels of semantic abstraction, and trying to define the mapping between them. This paper proposes a scheme for extracting high-level semantic information out of mid-level features, which can be applied in dealing with highly semantic queries in image retrieval. Mid-level features used in this research contain some level of semantic meaning but are not directly useful in real retrieval scenarios. However, they usually have strong relationships to high-level queries, but these relationships are often ignored due to their implicitness. The aim of the proposed approach is to explore hidden interrelationships between mid-level features and high-level query terms, by learning a Bayesian network model from a small amount of training data. Semantic inference and reasoning is then carried out based on the learned Bayesian network model, in order to decide whether a video is relevant to a high-level query. The extracted high-level semantic terms can be annotated on the video content for future retrieval. Two experimental scenarios were considered in this paper, and the experiments on RUSHES videos have produced satisfactory results.
BibTeX:
@inproceedings{zhang2010mid,
  author = {Zhang, Qianni and Izquierdo, Ebroul},
  editor = {Tsapatsoulis, Nicolas and Theodosiou, Zenonas and Georgiou, Olga},
  title = {From mid-level to high-level: Semantic inference for multimedia retrieval},
  booktitle = {Semantic Media Adaptation and Personalization (SMAP 2010), Proceedings of the 5th International Workshop on},
  publisher = {IEEE},
  year = {2010},
  pages = {70--75},
  note = {google scholar entry: 5th International Workshop on Semantic Media Adaptation and Personalization (SMAP 2010). Limassol, Cyprus, 9-10 December 2010.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5706853},
  doi = {10.1109/SMAP.2010.5706853}
}

Theses and Monographs

Wall J (2010), "Post-Cochlear Auditory Modelling for Sound Localisation using Bio-Inspired Techniques". Thesis at: University of Ulster. April, 2010, pp. 1-197.
Abstract: This thesis presents spiking neural architectures which simulate the sound localisation capability of the mammalian auditory pathways. This localisation ability is achieved by exploiting important differences in the sound stimulus received by each ear, known as binaural cues. Interaural time difference and interaural intensity difference are the two binaural cues which play the most significant role in mammalian sound localisation. These cues are processed by different regions within the auditory pathways and enable the localisation of sounds at different frequency ranges; interaural time difference is used to localise low frequency sounds whereas interaural intensity difference localises high frequency sounds. Interaural time difference refers to the different points in time at which a sound from a single location arrives at each ear, and interaural intensity difference refers to the difference in sound pressure levels of the sound at each ear, measured in decibels. Taking inspiration from the mammalian brain, two spiking neural network topologies were designed to extract each of these cues. The architecture of the spiking neural network designed to process the interaural time difference cue was inspired by the medial superior olive. The lateral superior olive was the inspiration for the architecture designed to process the interaural intensity difference cue. The development of these spiking neural network architectures required the integration of other biological models, such as an auditory periphery (cochlea) model, models of bushy cells and the medial nucleus of the trapezoid body, leaky integrate-and-fire spiking neurons, facilitating synapses, receptive fields and the appropriate use of excitatory and inhibitory neurons. Two biologically inspired learning algorithms were used to train the architectures to perform sound localisation.
Experimentally derived HRTF acoustical data from adult domestic cats was employed to validate the localisation ability of the two architectures. The localisation abilities of the two models are comparable to other computational techniques employed in the literature. The experimental results demonstrate that the two SNN models behave in a similar way to the mammalian auditory system, i.e. the spiking neural network for interaural time difference extraction performs best when it is localising low frequency data, and the interaural intensity difference spiking neuron model performs best when it is localising high frequency data. Thus, the combined models form a duplex system of sound localisation. Additionally, both spiking neural network architectures show a high degree of robustness when the HRTF acoustical data is corrupted by noise.
BibTeX:
@phdthesis{wall2010post,
  author = {Wall, Julie},
  title = {Post-Cochlear Auditory Modelling for Sound Localisation using Bio-Inspired Techniques},
  school = {University of Ulster},
  year = {2010},
  pages = {1--197},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Theses/JulieWall_PhDthesis.pdf}
}


2009

Journal Papers

Abhayaratne C, Izquierdo E, Mrak M and Tubaro S (2009), "Special issue on scalable coded media beyond compression", Signal Processing: Image Communication. July, 2009. Vol. 24(6), pp. 415-416. Elsevier.
BibTeX:
@article{Abhayaratne2009special,
  author = {Abhayaratne, Charith and Izquierdo, Ebroul and Mrak, Marta and Tubaro, Stefano},
  title = {Special issue on scalable coded media beyond compression},
  journal = {Signal Processing: Image Communication},
  publisher = {Elsevier},
  year = {2009},
  volume = {24},
  number = {6},
  pages = {415--416},
  note = {Scalable Coded Media beyond Compression},
  url = {http://www.sciencedirect.com/science/article/pii/S0923596509000241},
  doi = {10.1016/j.image.2009.02.006}
}
Macchiavello B, Brandi F, Peixoto E, de Queiroz RL and Mukherjee D (2009), "Side-Information Generation for Temporally and Spatially Scalable Wyner-Ziv Codecs", EURASIP Journal on Image and Video Processing. January, 2009. Vol. 2009(2), pp. 1-6. Springer.
Abstract: The distributed video coding paradigm enables video codecs to operate with reversed complexity, in which the complexity is shifted from the encoder toward the decoder. Its performance is heavily dependent on the quality of the side information generated by motion estimation at the decoder. We compare the rate-distortion performance of different side-information estimators, for both temporally and spatially scalable Wyner-Ziv codecs. For the temporally scalable codec we compared an established method with a new algorithm that uses a linear-motion model to produce side information. As a continuation of previous works, in this paper, we propose to use a super-resolution method to upsample the non-key frame, for the spatially scalable codec, using the key frames as reference. We verify the performance of the spatially scalable WZ coding using the state-of-the-art video coding standard H.264/AVC.
BibTeX:
@article{Macchiavello2009side,
  author = {Macchiavello, Bruno and Brandi, Fernanda and Peixoto, Eduardo and de Queiroz, Ricardo L. and Mukherjee, Debargha},
  title = {Side-Information Generation for Temporally and Spatially Scalable Wyner-Ziv Codecs},
  journal = {EURASIP Journal on Image and Video Processing},
  publisher = {Springer},
  year = {2009},
  volume = {2009},
  number = {2},
  pages = {1--6},
  url = {http://jivp.eurasipjournals.com/content/2009/1/171257},
  doi = {10.1155/2009/171257}
}
Ramzan N, Zgaljic T and Izquierdo E (2009), "An Efficient Optimisation Scheme for Scalable Surveillance Centric Video Communications", Signal Processing: Image Communication. July, 2009. Vol. 24(6), pp. 510-523. Elsevier.
Abstract: State-of-the-art coders have been optimised over years according to the needs of the broadcasting industry. There are, however, key applications of coding technology whose challenges and requirements substantially differ from broadcasting. One of these key applications is surveillance. In this paper an efficient approach for surveillance-centric joint source and channel coding is proposed. Contrasting conventional coders, the proposed system has been developed according to the requirements of surveillance application scenarios. It aims at achieving bit-rate optimisation and adaptation of surveillance videos for storage and transmission purposes. In the proposed approach the encoder communicates with a video content analysis (VCA) module that detects events of interest in video captured by CCTV. Bit-rate optimisation and adaptation is achieved by exploiting the scalability properties of the employed codec. Temporal segments containing events relevant to the surveillance application are encoded using high spatio-temporal resolution and quality, while the portions irrelevant from the surveillance standpoint are encoded at low spatio-temporal resolution and/or quality. Furthermore, the approach jointly optimises the bit allocation between the wavelet-based scalable video coder and forward error correction codes. The forward error correction code is based on a product code consisting of LDPC codes and turbo codes. Turbo codes show good performance in the high error-rate region, but LDPC codes outperform turbo codes at low error rates. Therefore, the concatenation of LDPC and turbo codes enhances the performance at both low and high signal-to-noise ratios (SNR). The proposed approach minimises the distortion of reconstructed video, subject to a constraint on the overall transmission bit-rate budget. Experimental results clearly demonstrate the efficiency and suitability of the proposed approach in surveillance applications.
BibTeX:
@article{ramzan2009efficient,
  author = {Ramzan, Naeem and Zgaljic, Toni and Izquierdo, Ebroul},
  title = {An Efficient Optimisation Scheme for Scalable Surveillance Centric Video Communications},
  journal = {Signal Processing: Image Communication},
  publisher = {Elsevier},
  year = {2009},
  volume = {24},
  number = {6},
  pages = {510--523},
  note = {Scalable Coded Media beyond Compression},
  url = {http://www.sciencedirect.com/science/article/pii/S0923596509000277},
  doi = {10.1016/j.image.2009.02.008}
}
Vaiapury K, Nagarajan M and Jain S (2009), "Ambience-Based Voice Over Internet Protocol Quality Testing Model", IETE Journal of Research. Vol. 55(5), pp. 212-217.
Abstract: In this paper, we explore a new voice quality management model under ambient environmental conditions suitable for voice over internet protocol (VoIP) calls in a wireless WLAN 802.11 Linux environment. The system is based on a setup that assimilates the environmental noise level using a noise detector and an adaptive audio manager used to tune the audio level, thereby ensuring quality. Further, existing models such as the E-model, the PESQ model and adaptive models address QoS from a VoIP networking perspective but do not address voice quality management in real-time environments to improve voice quality. We propose to use background noise and its associated source information in an adaptive environment to boost user perception and audio level, thereby ensuring quality. This issue is important because, in real time, consumers might be interested in mobiles that account for ambience effects on human audibility based on environmental conditions. The experimental results show that the proposed method outperforms the existing method in terms of the QoS factor metric, with extensive results over a larger number of calls.
BibTeX:
@article{vaiapury2009ambience,
  author = {Vaiapury, Karthikeyan and Nagarajan, Malmurugan and Jain, Sunil},
  title = {Ambience-Based Voice Over Internet Protocol Quality Testing Model},
  journal = {IETE Journal of Research},
  year = {2009},
  volume = {55},
  number = {5},
  pages = {212--217},
  url = {http://www.jr.ietejournals.org/article.asp?issn=0377-2063;year=2009;volume=55;issue=5;spage=212;epage=217;aulast=Vaiapury;t=6},
  doi = {10.4103/0377-2063.57598}
}
Wan S, Yang F and Izquierdo E (2009), "Lagrange multiplier selection in wavelet-based scalable video coding for quality scalability", Signal Processing: Image Communication. October, 2009. Vol. 24(9), pp. 730-739. Elsevier.
Abstract: In this paper, a method for Lagrange multiplier selection is proposed in the context of rate-distortion optimisation for wavelet-based scalable video coding targeting quality scalability. Despite the prevalence of the conventional method for Lagrange multiplier selection in hybrid video coding, the underlying formulation is not applicable to wavelet-based scalable video coding. To address the inherent challenges, a thorough analysis of the rate-distortion models for transform video coding is provided with regard to low and middle-to-high bit-rates, respectively. Based on the analysis, the models are consolidated according to experimental observations and the consolidated rate-distortion models serve as the basis for the derivation of the Lagrange multiplier. Considering the influence of the open-loop prediction structure on the rate-distortion performance, the Lagrange multiplier is initially derived for a single-targeted bit-rate. Moreover, the method for Lagrange multiplier selection in scalable video coding aiming at multiple-targeted bit-rates is proposed in a general sense of bit-rate range, varying from low to high bit-rates, building on the initially derived Lagrange multiplier for a single-targeted bit-rate. The proposed Lagrange multiplier is content adaptive and well suited for wavelet-based scalable video coding where quantisation steps are unavailable. Detailed performance evaluation of the proposed method for wavelet-based scalable video coding is provided with regard to a given targeted bit-rate and multiple-targeted bit-rates, respectively. The experimental results have demonstrated the effectiveness of the proposed Lagrange multiplier for rate-distortion optimisation considering quality scalability in wavelet-based scalable video coding.
BibTeX:
@article{wan2009lagrange,
  author = {Wan, Shuai and Yang, Fuzheng and Izquierdo, Ebroul},
  title = {Lagrange multiplier selection in wavelet-based scalable video coding for quality scalability},
  journal = {Signal Processing: Image Communication},
  publisher = {Elsevier},
  year = {2009},
  volume = {24},
  number = {9},
  pages = {730--739},
  url = {http://www.sciencedirect.com/science/article/pii/S0923596509000770},
  doi = {10.1016/j.image.2009.05.001}
}
Ye X, Lin X, Dehmeshki J, Slabaugh G and Beddoe G (2009), "Shape-Based Computer-Aided Detection of Lung Nodules in Thoracic CT Images", Biomedical Engineering, IEEE Transactions on. July, 2009. Vol. 56(7), pp. 1810-1820. IEEE.
Abstract: In this paper, a new computed tomography (CT) lung nodule computer-aided detection (CAD) method is proposed for detecting both solid nodules and ground-glass opacity (GGO) nodules (part solid and nonsolid). This method consists of several steps. First, the lung region is segmented from the CT data using a fuzzy thresholding method. Then, the volumetric shape index map, which is based on local Gaussian and mean curvatures, and the ``dot'' map, which is based on the eigenvalues of a Hessian matrix, are calculated for each voxel within the lungs to enhance objects of a specific shape with high spherical elements (such as nodule objects). The combination of the shape index (local shape information) and ``dot'' features (local intensity dispersion information) provides a good structure descriptor for the initial nodule candidate generation. Antigeometric diffusion, which diffuses across the image edges, is used as a preprocessing step. The smoothness of image edges enables the accurate calculation of voxel-based geometric features. Adaptive thresholding and modified expectation-maximization methods are employed to segment potential nodule objects. Rule-based filtering is first used to remove easily dismissible nonnodule objects. This is followed by a weighted support vector machine (SVM) classification to further reduce the number of false positive (FP) objects. The proposed method has been trained and validated on a clinical dataset of 108 thoracic CT scans using a wide range of tube dose levels that contain 220 nodules (185 solid nodules and 35 GGO nodules) determined by a ground truth reading process. The data were randomly split into training and testing datasets. The experimental results using the independent dataset indicate an average detection rate of 90.2%, with approximately 8.2 FP/scan. Some challenging nodules such as nonspherical nodules and low-contrast part-solid and nonsolid nodules were identified, while most tissues such as blood vessels were excluded. The method's high detection rate, fast computation, and applicability to different imaging conditions and nodule types show much promise for clinical applications.
BibTeX:
@article{ye2009shape,
  author = {Xujiong Ye and Xinyu Lin and Dehmeshki, Jamshid and Slabaugh, Greg and Beddoe, Gareth},
  title = {Shape-Based Computer-Aided Detection of Lung Nodules in Thoracic CT Images},
  journal = {Biomedical Engineering, IEEE Transactions on},
  publisher = {IEEE},
  year = {2009},
  volume = {56},
  number = {7},
  pages = {1810--1820},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5073252},
  doi = {10.1109/TBME.2009.2017027}
}

Conference Papers

Akram M, Ramzan N and Izquierdo E (2009), "Efficient Motion Estimation for Video Coding in Wireless Surveillance Applications", In Ultra Modern Telecommunications (ICUMT 2009). Proceedings of the International Conference on. St. Petersburg, Russia, October, 2009, pp. 1-4. IEEE.
Abstract: In this paper, we propose a novel approach to performing selective motion estimation specific to video coding for wireless surveillance applications. A real-time background subtractor is used to detect the presence of any motion activity in the sequence. Two approaches for selective motion estimation, group of pictures (GOP) based and frame based, are implemented. In the former, motion estimation is performed for the whole group of pictures only when a moving object is detected in any frame of the GOP. In the latter, each frame is tested for motion activity and, consequently, for selective motion estimation. Experimental evaluation shows that a significant reduction in computational complexity can be achieved by applying the proposed strategy. The achieved efficiency gains make the proposed approach suitable for low-bitrate transmission in wireless applications.
BibTeX:
@inproceedings{akram2009efficient,
  author = {Akram, Muhammad and Ramzan, Naeem and Izquierdo, Ebroul},
  title = {Efficient Motion Estimation for Video Coding in Wireless Surveillance Applications},
  booktitle = {Ultra Modern Telecommunications (ICUMT 2009). Proceedings of the International Conference on},
  publisher = {IEEE},
  year = {2009},
  pages = {1--4},
  note = {google scholar entry: International Conference on Ultra Modern Telecommunications (ICUMT 2009). St. Petersburg, Russia, 12-14 October 2009.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5345406},
  doi = {10.1109/ICUMT.2009.5345406}
}
Athanasiadis T, Simou N, Papadopoulos G, Benmokhtar R, Chandramouli K, Tzouvaras V, Mezaris V, Phiniketos M, Avrithis Y, Kompatsiaris Y, Huet B and Izquierdo E (2009), "Integrating Image Segmentation and Classification for Fuzzy Knowledge-Based Multimedia Indexing", In Advances in Multimedia Modeling. Proceedings of the 15th International Multimedia Modeling Conference (MMM 2009). Sophia-Antipolis, France, January, 2009. Vol. 5371, pp. 263-274. Springer.
Abstract: In this paper we propose a methodology for semantic indexing of images, based on techniques of image segmentation, classification and fuzzy reasoning. The proposed knowledge-assisted analysis architecture integrates algorithms applied on three overlapping levels of semantic information: i) no semantics, i.e. segmentation based on low-level features such as color and shape, ii) mid-level semantics, such as concurrent image segmentation and object detection, and region-based classification, and iii) rich semantics, i.e. fuzzy reasoning for extraction of implicit knowledge. In that way, we extract a semantic description of raw multimedia content and use it for indexing and retrieval purposes, backed up by a fuzzy knowledge repository. We conducted several experiments to evaluate each technique, as well as the methodology as a whole, and the results show the potential of our approach.
BibTeX:
@inproceedings{Athanasiadis2009integrating,
  author = {Athanasiadis, Thanos and Simou, Nikolaos and Papadopoulos, Georgios and Benmokhtar, Rachid and Chandramouli, Krishna and Tzouvaras, Vassilis and Mezaris, Vasileios and Phiniketos, Marios and Avrithis, Yannis and Kompatsiaris, Yiannis and Huet, Benoit and Izquierdo, Ebroul},
  editor = {Huet, Benoit and Smeaton, Alan and Mayer-Patel, Ketan and Avrithis, Yannis},
  title = {Integrating Image Segmentation and Classification for Fuzzy Knowledge-Based Multimedia Indexing},
  booktitle = {Advances in Multimedia Modeling. Proceedings of the 15th International Multimedia Modeling Conference (MMM 2009).},
  publisher = {Springer},
  year = {2009},
  volume = {5371},
  pages = {263--274},
  note = {google scholar entry: 15th International Multimedia Modeling Conference (MMM 2009). Sophia-Antipolis, France, 7-9 January 2009.},
  url = {http://www.image.ece.ntua.gr/papers/566.pdf},
  doi = {10.1007/978-3-540-92892-8_29}
}
Chandramouli K and Izquierdo E (2009), "Multi-class Relevance Feedback for Collaborative Image Retrieval", In Image Analysis for Multimedia Interactive Services (WIAMIS 2009), Proceedings of the 10th International Workshop on. London, England, May, 2009, pp. 214-217. IEEE.
Abstract: In recent years, there has been emerging interest in analysing and exploiting the log data recorded from different user interactions to minimise the semantic gap problem in multi-user collaborative environments. Such systems are referred to as ``collaborative image retrieval systems''. In this paper, we present an approach for collaborative image retrieval using multi-class relevance feedback. The relationship between users and concepts is derived using the Lin semantic similarity measure from WordNet. Subsequently, the particle swarm optimisation classifier based relevance feedback is used to retrieve similar documents. The experimental results are presented on two well-known datasets, namely the Corel 700 and Flickr image datasets. In addition, the performance of the particle swarm optimised retrieval engine is evaluated against the genetic algorithm optimised retrieval engine.
BibTeX:
@inproceedings{chandramouli2009multi,
  author = {Chandramouli, Krishna and Izquierdo, Ebroul},
  title = {Multi-class Relevance Feedback for Collaborative Image Retrieval},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2009), Proceedings of the 10th International Workshop on},
  publisher = {IEEE},
  year = {2009},
  pages = {214--217},
  note = {google scholar entry: 10th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2009). London, England, 6-8 May 2009.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5031471},
  doi = {10.1109/WIAMIS.2009.5031471}
}
Chandramouli K and Izquierdo E (2009), "Visual Highlight Detection using Particle Swarm Optimisation", In Proceedings of Latin-American Conference on Networked and Electronic Media (LACNEM 2009). Bogotá, Colombia, August, 2009, pp. 1-5. Kingston University.
Abstract: Video summaries provide succinct representation of video content. Such summaries are created by dynamic skimming of video content, where the time-evolving nature of a video is preserved by linearly browsing the portions of the video content. In this paper, we present a novel technique for visual highlight detection using particle swarm optimisation. We show that such highlight detection can be used in video summarisation and event detection in surveillance. The experimental results are presented for news video and surveillance video content.
BibTeX:
@inproceedings{chandramouli2009visual,
  author = {Chandramouli, Krishna and Izquierdo, Ebroul},
  title = {Visual Highlight Detection using Particle Swarm Optimisation},
  booktitle = {Proceedings of Latin-American Conference on Networked and Electronic Media (LACNEM 2009)},
  publisher = {Kingston University},
  year = {2009},
  pages = {1--5},
  note = {google scholar entry: Latin-American Conference on Networked and Electronic Media (LACNEM 2009). Bogotá, Colombia, 4-6 August 2009.},
  url = {http://dilnxsrv.king.ac.uk/lacnem2012/PastProceedings/lacnem2009/papers/lacnem09_06.pdf}
}
Chandramouli K, Kliegr T, Svatek V and Izquierdo E (2009), "Towards Semantic Tagging in Collaborative Environments", In Digital Signal Processing (DSP 2009), Proceedings of the 16th International Conference on. Santorini, Greece, July, 2009, pp. 248-253.
Abstract: Tags pose an efficient and effective way of organising resources, but they are not always available. A technique called SCM/THD investigated in this paper extracts entities from free-text annotations and, using the Lin similarity measure over the WordNet thesaurus, classifies them into a controlled vocabulary of tags. Hypernyms extracted from Wikipedia are used to map uncommon entities to WordNet synsets. In collaborative environments, users can assign multiple annotations to the same object, hence increasing the amount of information available. Assuming that the semantics of the annotations overlap, this redundancy can be exploited to generate higher-quality tags. A preliminary experiment presented in the paper evaluates the consistency and quality of tags generated from multiple annotations of the same image. The results obtained on an experimental dataset comprising 62 annotations from four annotators show that the accuracy of a simple majority vote surpasses the average accuracy obtained through assessing the annotations individually by 18%. A moderate-strength correlation has been found between the quality of generated tags and the consistency of annotations.
BibTeX:
@inproceedings{chandramouli2009towards,
  author = {Chandramouli, Krishna and Kliegr, Tomas and Svatek, Vojtech and Izquierdo, Ebroul},
  title = {Towards Semantic Tagging in Collaborative Environments},
  booktitle = {Digital Signal Processing (DSP 2009), Proceedings of the 16th International Conference on},
  year = {2009},
  pages = {248--253},
  note = {google scholar entry: 16th International Conference on Digital Signal Processing (DSP 2009). Santorini, Greece, 5-7 July 2009.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5201138},
  doi = {10.1109/ICDSP.2009.5201138}
}
Dong L and Izquierdo E (2009), "Parallel Scene Perception on Various Blurry Images", In Proceedings of the First International Conference on Internet Multimedia Computing and Service (ICIMCS 2009). Kunming, China, November, 2009, pp. 60-64. ACM.
Abstract: A parallel framework for scene perception on images with various degrees of blur is described and validated. A global-to-local approach is used for scene perception on variously blurred visual information. Essential capture in the global pathway and highlight detection in the local pathway are integrated for applications involving variously blurred visual information. The system can differentiate scenes with various semantic meanings using a spatial layout of context information, which captures the ``essence'' of the scene. The system can further discriminate the contents of the scene via local highlight detection. Distinct from previous frameworks, the system is both biologically plausible and efficient in application, thus offering a straightforward platform for rapid analysis and interpretation of variously blurred visual information.
BibTeX:
@inproceedings{dong2009parallel,
  author = {Dong, Le and Izquierdo, Ebroul},
  title = {Parallel Scene Perception on Various Blurry Images},
  booktitle = {Proceedings of the First International Conference on Internet Multimedia Computing and Service (ICIMCS 2009)},
  publisher = {ACM},
  year = {2009},
  pages = {60--64},
  note = {google scholar entry: 1st International Conference on Internet Multimedia Computing and Service (ICIMCS 2009). Kunming, China, 23-25 November 2009.},
  url = {http://doi.acm.org/10.1145/1734605.1734623},
  doi = {10.1145/1734605.1734623}
}
Fernandez Arguedas V, Chandramouli K and Izquierdo E (2009), "Exploiting Complementary Resources for Cross-Discipline Multimedia Indexing and Retrieval", In First International Conference on User Centric Media (UCMedia 2009). Venice, Italy, 9-11 December 2009. Revised Selected Papers. Venice, Italy Vol. 40, pp. 109-116. Springer.
Abstract: In recent times, the exponential growth of multimedia retrieval techniques has stimulated interest in applying these techniques to other, unrelated disciplines. Addressing the challenges raised by such cross-discipline multimedia retrieval engines, in this paper we present a multi-user framework in which complementary resources are exploited to model visual semantics expressed by users. The cross-discipline areas include the history of technology and news archives. In the presented framework, the query terms generated by historians are first analysed, and the corresponding complementary resources are extracted and used to index the multimedia news archives. The experimental evaluation is presented on three semantic queries, namely wind mills, solar energy and tidal energy.
BibTeX:
@inproceedings{fernandez2009exploiting,
  author = {Fernandez Arguedas, Virginia and Chandramouli, Krishna and Izquierdo, Ebroul},
  editor = {Daras, Petros and Mayora Ibarra, Oscar},
  title = {Exploiting Complementary Resources for Cross-Discipline Multimedia Indexing and Retrieval},
  booktitle = {First International Conference on User Centric Media (UCMedia 2009). Venice, Italy, 9-11 December 2009. Revised Selected Papers.},
  publisher = {Springer},
  year = {2009},
  volume = {40},
  pages = {109--116},
  note = {google scholar entry: 1st International Conference on User Centric Media (UCMedia 2009). Venice, Italy, 9-11 December 2009.},
  url = {http://link.springer.com/chapter/10.1007/978-3-642-12630-7_13},
  doi = {10.1007/978-3-642-12630-7_13}
}
Fernandez Arguedas V, Chandramouli K and Izquierdo E (2009), "Semantic Object Based Retrieval from Surveillance Videos", In Semantic Media Adaptation and Personalization, Proceedings 2009 Fourth International Workshop on. December, 2009, pp. 79-83.
Abstract: In recent years, due to technological developments, Closed-Circuit Television monitoring has been widely used not only in public areas but also in confined and/or private spaces for improved personal safety and security. The increased data acquisition has naturally resulted in a critical need for multimedia analysis for semantic object and event detection. Addressing this research problem, in this paper we present a novel architecture for extracting and indexing semantic objects with Scale Invariant Feature Transform features. The proposed approach exploits developments in motion tracking and video indexing algorithms. The proposed framework is an ongoing development whose objective is to enable the semantic retrieval of objects. A preliminary performance analysis of the proposed approach has been carried out on a set of surveillance videos.
BibTeX:
@inproceedings{fernandez2009semantic,
  author = {Fernandez Arguedas, Virginia and Chandramouli, Krishna and Izquierdo, Ebroul},
  editor = {Mylonas, Phivos and Wallace, Manolis and Anagnostopoulos, Ioannis},
  title = {Semantic Object Based Retrieval from Surveillance Videos},
  booktitle = {Semantic Media Adaptation and Personalization, Proceedings 2009 Fourth International Workshop on},
  year = {2009},
  pages = {79--83},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5381692},
  doi = {10.1109/SMAP.2009.20}
}
Janjusevic T, Benini S, Izquierdo E and Leonardi R (2009), "Random Methods for Fast Exploration of the Raw Video Material", In Proceedings of the 20th International Workshop on Database and Expert Systems Application (DEXA 2009). Linz, Austria, August, 2009, pp. 226-230.
Abstract: In this work we propose a visualization and access method for exploring a database of raw video material, where the span of information stored can be huge and difficult to understand at a glance. The developed visual mining tool presents an overview of the hierarchically structured repository in an intuitive way and provides interactive navigational support for the user. Browsing through the un-annotated content is enabled by two complementary interactive methods, direct selection and random access, both supported by a visual display of the preview nodes. User evaluation aims to demonstrate how the hierarchical random visualization assists the process of accessing and retrieving content relevant to the user.
BibTeX:
@inproceedings{janjusevic2009random,
  author = {Janjusevic, Tijana and Benini, Sergio and Izquierdo, Ebroul and Leonardi, Riccardo},
  editor = {Tjoa, A. Min and Wagner, Roland R.},
  title = {Random Methods for Fast Exploration of the Raw Video Material},
  booktitle = {Proceedings of the 20th International Workshop on Database and Expert Systems Application (DEXA 2009).},
  year = {2009},
  pages = {226--230},
  note = {google scholar entry: 20th International Workshop on Database and Expert Systems Application (DEXA 2009). Linz, Austria, 31 August - 4 September 2009.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5337188},
  doi = {10.1109/DEXA.2009.80}
}
Janjusevic T and Izquierdo E (2009), "Visualising the Query Space of the Image Collection", In Information Visualization (IV 2009), Proceedings of the 13th IEEE International Conference on. Barcelona, Catalonia, July, 2009, pp. 86-91. IEEE.
Abstract: In this paper, we propose an information visualisation solution for multimedia retrieval based on semantic concepts defined in an image database. The proposed visualisation approach utilises concept maps, Venn diagrams and fisheye distortion to enable effective and efficient visualisation of the image database. In addition, the proposed approach enables displaying local and global views of the collection subset selected as relevant by the user. The proposed solution is evaluated on the Corel 700 dataset with 10 semantic concepts for the following user actions: exploratory browsing, querying and detecting patterns.
BibTeX:
@inproceedings{janjusevic2009visualising,
  author = {Janjusevic, Tijana and Izquierdo, Ebroul},
  editor = {Banissi, Ebad and Stuart, Liz and Wyeld, Theodor G. and Jern, Mikael and Andrienko, Gennady and Memon, Nasrullah and Alhajj, Reda and Burkhard, Remo Aslak and Grinstein, Georges and Groth, Dennis and Ursyn, Anna and Johansson, Jimmy and Forsell, Camilla and Cvek, Urska and Trutschl, Marjan and Marchese, Francis T. and Maple, Carsten and Cowell, Andrew J. and Vande Moere, Andrew},
  title = {Visualising the Query Space of the Image Collection},
  booktitle = {Information Visualization (IV 2009), Proceedings of the 13th IEEE International Conference on},
  publisher = {IEEE},
  year = {2009},
  pages = {86--91},
  note = {google scholar entry: 13th IEEE International Conference on Information Visualization (IV 2009). Barcelona, Catalonia, 15-17 July 2009.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5190867},
  doi = {10.1109/IV.2009.69}
}
Koelstra S, Mühl C and Patras I (2009), "EEG analysis for implicit tagging of video data", In Proceedings. Volume I. 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII 2009). Amsterdam, The Netherlands, 10-12 September 2009, pp. 1-6.
BibTeX:
@inproceedings{koelstra2009eeg,
  author = {Sander Koelstra and Christian Mühl and Ioannis Patras},
  title = {EEG analysis for implicit tagging of video data},
  booktitle = {Proceedings. Volume I. 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII 2009). Amsterdam, The Netherlands, 10-12 September 2009.},
  year = {2009},
  pages = {1--6}
}
Pantoja C, Ortiz E and Trujillo M (2009), "An MPEG-7 Browser", In Proceedings of the 1st Latin-American Conference on Networked and Electronic Media (LACNEM 2009). Bogotá, Colombia, August, 2009. Kingston University.
BibTeX:
@inproceedings{pantoja2009mpeg,
  author = {Pantoja, Cesar and Ortiz, Edward and Trujillo, Maria},
  title = {An MPEG-7 Browser},
  booktitle = {Proceedings of the 1st Latin-American Conference on Networked and Electronic Media (LACNEM 2009)},
  publisher = {Kingston University},
  year = {2009},
  note = {google scholar entry: 1st Latin-American Conference on Networked and Electronic Media (LACNEM 2009). Bogotá, Colombia, 4-6 August 2009.},
  url = {http://dilnxsrv.king.ac.uk/lacnem2012/PastProceedings/lacnem2009/papers/}
}
Passino G, Patras I and Izquierdo E (2009), "Context Awareness in Graph-based Image Semantic Segmentation via Visual Word Distributions", In Image Analysis for Multimedia Interactive Services (WIAMIS 2009), Proceedings of the 10th International Workshop on. London, England, May, 2009, pp. 33-36. IEEE.
Abstract: This paper addresses the problem of image semantic segmentation (or semantic labelling), that is the association of one of a predefined set of semantic categories (e.g. cow, car, face) to each image pixel. We adopt a patch-based approach, in which super-pixel elements are obtained via oversegmentation of the original image. We then train a conditional random field on heterogeneous descriptors extracted at different scales and locations. This discriminative graphical model can effectively account for the statistical dependence of neighbouring patches. For the more challenging task of considering long-range patch dependency and contextualisation, we propose the use of a descriptor based on histograms of visual words extracted in the vicinity of each patch at different scales. Experiments validate our approach by showing improvements with respect to both a base model not using distributed features and the state of the art works in the area.
BibTeX:
@inproceedings{passino2009context,
  author = {Passino, Giuseppe and Patras, Ioannis and Izquierdo, Ebroul},
  title = {Context Awareness in Graph-based Image Semantic Segmentation via Visual Word Distributions},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2009), Proceedings of the 10th International Workshop on},
  publisher = {IEEE},
  year = {2009},
  pages = {33--36},
  note = {google scholar entry: 10th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2009). London, England, 6-8 May 2009.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5031425},
  doi = {10.1109/WIAMIS.2009.5031425}
}
Passino G, Patras I and Izquierdo E (2009), "Latent Semantics Local Distribution for CRF-based Image Semantic Segmentation", In Proceedings of the British Machine Vision Conference (BMVC 2009). London, England, September, 2009. (26), pp. 1-12. BMVA Press.
Abstract: Semantic image segmentation is the task of assigning a semantic label to every pixel of an image. This task is posed as a supervised learning problem in which the appearance of areas that correspond to a number of semantic categories is learned from a dataset of manually labelled images. This paper proposes a method that combines a region-based probabilistic graphical model that builds on the recent success of Conditional Random Fields (CRFs) in the problem of semantic segmentation, with a salient-points-based bags-of-words paradigm. In a first stage, the image is oversegmented into patches. Then, in a CRF-based formulation we learn both the appearance for each semantic category and the neighbouring relations between patches. In addition to patch features, we also consider information extracted on salient points that are detected in the patch's vicinity. A visual word is associated to each salient point. Two different types of information are used. First, we consider the local weighted distribution of visual words. Using local (i.e. centred at each patch) word histograms enriches the classical global bags-of-words representation with positional information on word distributions. Second, we consider the un-normalised local distribution of a set of latent topics that are obtained by probabilistic Latent Semantic Analysis (pLSA). This distribution is obtained by the weighted accumulation of the latent topic distributions that are associated to the visual words in the area. The advantage of this second approach lies in the separate representation of the semantic content for each visual word. This allows us to consider the word contributions as independent in the CRF formulation without introducing too strong simplification assumptions. Tests on a publicly available dataset demonstrate the validity of the proposed salient point integration strategies. The results obtained with different configurations show an advance compared to other leading works in the area.
BibTeX:
@inproceedings{passino2009latent,
  author = {Passino, Giuseppe and Patras, Ioannis and Izquierdo, Ebroul},
  editor = {Cavallaro, Andrea and Prince, Simon J. D. and Alexander, Daniel},
  title = {Latent Semantics Local Distribution for CRF-based Image Semantic Segmentation},
  booktitle = {Proceedings of the British Machine Vision Conference (BMVC 2009)},
  publisher = {BMVA Press},
  year = {2009},
  number = {26},
  pages = {1--12},
  note = {google scholar entry: British Machine Vision Conference (BMVC 2009). London, England, 7-10 September 2009.},
  url = {http://www.eecs.qmul.ac.uk/~ioannisp/pubs/PassinoBMVC2009.pdf},
  doi = {10.5244/C.23.26}
}
Passino G, Piatrik T, Patras I and Izquierdo E (2009), "A Multimedia Content Semantics Extraction Framework for Enhanced Social Interaction", In Adjunct proceedings EuroITV 2009 Networked Television. Leuven, Belgium, June, 2009, pp. 89-91. KU Leuven.
Abstract: In this paper, a system for improved social interaction via the Internet or interactive TV is proposed. The aim is to provide a small group of closely connected users with a rich social experience, sharing intimate moments of life and emotions, taking full advantage of existing Internet technology and broadcasting practices. Starting from a single use-case, a feasibility study for social interaction is illustrated. The proposed architecture for social interaction is based on techniques for automated extraction of semantics from streamed content. In particular, technical feasibility and real-time processing issues are considered. Semantic information is used in the multimedia editing and composition phase, enabling the system to offer an experience that goes beyond the classical face-to-face video-conference. The efficient and rich presentation of the content is driven by technology for semantic segmentation, object detection and automated extraction of interesting regions in the scene. Furthermore, a face detection module is used to guarantee a constant visual presence of the parties. Finally, a summary of the session is automatically generated for future uses or online browsing during the conversation.
BibTeX:
@inproceedings{passinomultimedia,
  author = {Passino, Giuseppe and Piatrik, Tomas and Patras, Ioannis and Izquierdo, Ebroul},
  editor = {Donoso, Verónica and Geerts, David and Cesar, Pablo and De Grooff, Dirk},
  title = {A Multimedia Content Semantics Extraction Framework for Enhanced Social Interaction},
  booktitle = {Adjunct proceedings EuroITV 2009 Networked Television},
  publisher = {KU Leuven},
  year = {2009},
  pages = {89--91},
  url = {http://www.euroitv2009.org/proceedings.html}
}
Peixoto E, de Queiroz RL and Mukherjee D (2009), "Mapping Motion Vectors for a "Wyner-Ziv" Video Transcoder", In Image Processing (ICIP 2009), Proceedings of the 16th International Conference on. Cairo, Egypt, November, 2009. Vol. 7(10), pp. 3681-3684. IEEE.
Abstract: Wyner-Ziv (WZ) coding of video utilizes simple encoders and highly complex decoders. A transcoder from a WZ codec to a traditional codec can potentially increase the range of applications for WZ codecs. We present a transcoder scheme from the most popular WZ codec architecture to a DPCM/DCT codec. As a proof of concept, we implemented this transcoder using a simple pixel domain WZ codec and the standard H.263+. The transcoder design aims at reducing complexity, since the transcoder has to perform both WZ decoding and DPCM/DCT encoding, including motion estimation. New approaches are used to map motion vectors for such a transcoder. Results are presented to demonstrate the transcoder performance.
BibTeX:
@inproceedings{Peixoto2009mapping,
  author = {Peixoto, Eduardo and de Queiroz, Ricardo L. and Mukherjee, Debargha},
  title = {Mapping Motion Vectors for a "Wyner-Ziv" Video Transcoder},
  booktitle = {Image Processing (ICIP 2009), Proceedings of the 16th International Conference on},
  publisher = {IEEE},
  year = {2009},
  volume = {7},
  number = {10},
  pages = {3681--3684},
  note = {google scholar entry: 16th International Conference on Image Processing (ICIP 2009). Cairo, Egypt, 7-10 November 2009.},
  url = {http://image.unb.br/queiroz/papers/icip09wztranscoder.pdf},
  doi = {10.1109/ICIP.2009.5414231}
}
Piatrik T and Izquierdo E (2009), "Hierarchical Summarisation of Video Using Ant-Tree Strategy", In Content-Based Multimedia Indexing (CBMI 2009), Proceedings of the 7th International Workshop on. Chania, Crete, June, 2009, pp. 107-112. IEEE.
BibTeX:
@inproceedings{piatrik2009hierarchical,
  author = {Piatrik, Tomas and Izquierdo, Ebroul},
  editor = {Kollias, Stefanos D. and Avrithis, Yannis S.},
  title = {Hierarchical Summarisation of Video Using Ant-Tree Strategy},
  booktitle = {Content-Based Multimedia Indexing (CBMI 2009), Proceedings of the 7th International Workshop on},
  publisher = {IEEE},
  year = {2009},
  pages = {107--112},
  note = {google scholar entry: 7th International Workshop on Content-Based Multimedia Indexing (CBMI 2009). Chania, Crete, 3-5 June 2009.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5137825},
  doi = {10.1109/CBMI.2009.50}
}
Piatrik T and Izquierdo E (2009), "Subspace clustering of images using Ant colony Optimisation", In Image Processing (ICIP 2009), Proceedings of the 16th International Conference on. Cairo, Egypt, November, 2009. Vol. 7(10), pp. 229-232. IEEE.
Abstract: Content-based image retrieval can be dramatically improved by providing a good initial clustering of visual data. The problem of image clustering is that most current algorithms are not able to identify individual clusters that exist in different feature subspaces. In this paper, we propose a novel approach for subspace clustering based on Ant Colony Optimisation and its learning mechanism. The proposed algorithm breaks the assumption that all of the clusters in a dataset are found in the same set of dimensions by assigning weights to features according to the local correlations of data along each dimension. Experiment results on real image datasets show the need for feature selection in clustering and the benefits of selecting features locally.
BibTeX:
@inproceedings{piatrik2009subspace,
  author = {Piatrik, Tomas and Izquierdo, Ebroul},
  title = {Subspace clustering of images using Ant colony Optimisation},
  booktitle = {Image Processing (ICIP 2009), Proceedings of the 16th International Conference on},
  publisher = {IEEE},
  year = {2009},
  volume = {7},
  number = {10},
  pages = {229--232},
  note = {google scholar entry: 16th International Conference on Image Processing (ICIP 2009). Cairo, Egypt, 7-10 November 2009.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5414503},
  doi = {10.1109/ICIP.2009.5414503}
}
Ramzan N and Izquierdo E (2009), "An Optimised Error Protection Scheme for Wavelet-based Scalable Video", In Proceedings of Latin-American Conference on Networked and Electronic Media (LACNEM 2009). Bogotá, Colombia, August, 2009, pp. 1-5. Kingston University.
Abstract: Scalable video coding provides an efficient solution when video is delivered through heterogeneous networks to terminals with different computational and display capabilities. Scalable video bit-streams can easily be adapted to the required spatio-temporal resolution and quality, according to the transmission requirements. In this paper, the Scalable Video Coding (aceSVC) architecture is explained in detail. The aceSVC framework is based on wavelet-based motion-compensated approaches. This paper reviews the individual components of the aceSVC framework. The performance of aceSVC is compared to the state-of-the-art SVC codec. The practical capabilities of aceSVC are demonstrated using an error-resilient transmission application. The experimental results show that the aceSVC framework produces better results than existing methods and provides a fully flexible architecture with respect to different application scenarios.
BibTeX:
@inproceedings{Ramzan2009,
  author = {Ramzan, Naeem and Izquierdo, Ebroul},
  title = {An Optimised Error Protection Scheme for Wavelet-based Scalable Video},
  booktitle = {Proceedings of Latin-American Conference on Networked and Electronic Media (LACNEM 2009)},
  publisher = {Kingston University},
  year = {2009},
  pages = {1--5},
  note = {google scholar entry: Latin-American Conference on Networked and Electronic Media (LACNEM 2009). Bogotá, Colombia, 4-6 August 2009.},
  url = {http://dilnxsrv.king.ac.uk/lacnem2012/PastProceedings/lacnem2009/papers/lacnem09_01.pdf}
}
Ramzan N, Zgaljic T and Izquierdo E (2009), "Scalable Video Coding as Basis for Networked Media Internet", In Proceedings of the 2nd summit on Networked and Electronic Media (NEM 2009). Saint-Malo, France, September, 2009. The NEM Initiative.
Abstract: An efficient Joint Source Channel Coding (JSCC) framework is proposed to support seamless networked electronic media, specifically content delivery to different display terminals through heterogeneous networks such as the Internet. The proposed approach applies a scalable video coder for source coding and forward error correction codes for channel coding. Scalable video bit-streams can easily be adapted to the required spatio-temporal resolution and quality, according to the transmission and user context requirements. This enables content adaptation and interoperability in an Internet networking environment. Adaptation of the bit-stream is performed in the compressed domain, by discarding the bit-stream portions that represent higher spatio-temporal resolution and quality than desired. Thus, the adaptation is of very low complexity. Furthermore, the embedded structure of a scalable bit-stream provides a natural solution for protection of the video against transmission errors inherent to Internet content transmission. Here, the strongest protection is applied to the most important portions of the bit-stream while the weakest protection is applied to the least important portions.
BibTeX:
@inproceedings{ramzan2009scalable,
  author = {Ramzan, Naeem and Zgaljic, Toni and Izquierdo, Ebroul},
  title = {Scalable Video Coding as Basis for Networked Media Internet},
  booktitle = {Proceedings of the 2nd summit on Networked and Electronic Media (NEM 2009)},
  publisher = {The NEM Initiative},
  year = {2009},
  note = {European Commission NEM Summit -- Towards Future Media Internet},
  url = {http://nem-summit.eu/}
}
Romero Macias C and Izquierdo E (2009), "Visual Word-Based CAPTCHA using 3D Characters", In Crime Detection and Prevention (ICDP 2009), Proceedings of the 3rd International Conference on. London, England, December, 2009, pp. 1-5. IET.
Abstract: A secure login authentication for web applications is presented. The proposed technique uses a CAPTCHA to discriminate between users and automated programs designed to access web applications. The novelty of the proposed CAPTCHA resides in the use of 3D characters with 3D boundaries delimited by shadows. The robustness of the proposed CAPTCHA is increased by applying random distortions to each character according to geometric transformations in a 3D space. In particular, shadows obtained from different light effects are used to further enhance security against automatic character recognition tools. Experimental evaluations are reported to confirm the security of the approach.
BibTeX:
@inproceedings{romero2009visual,
  author = {Romero Macias, Cristina and Izquierdo, Ebroul},
  title = {Visual Word-Based CAPTCHA using 3D Characters},
  booktitle = {Crime Detection and Prevention (ICDP 2009), Proceedings of the 3rd International Conference on},
  publisher = {IET},
  year = {2009},
  pages = {1--5},
  note = {google scholar entry: 3rd International Conference on Imaging for Crime Detection and Prevention (ICDP 2009). London, England, 3 December 2009.},
  url = {http://digital-library.theiet.org/content/conferences/10.1049/ic.2009.0269},
  doi = {10.1049/ic.2009.0269}
}
Seneviratne L and Izquierdo E (2009), "Image Annotation through Gaming (TAG4FUN)", In Digital Signal Processing (DSP 2009), Proceedings of the 16th International Conference on. Santorini, Greece, July, 2009, pp. 940-945. IEEE.
Abstract: This paper introduces a new technique for image annotation in which social aspects of human-based computation are exploited. The proposed approach aims at exploiting what millions of single, online and cooperative gamers are keen to do, (in some cases gaming enthusiasts) to tackle the challenging image annotation task. The proposed approach deviates from the conventional ``content-based image retrieval (CBIR)'' paradigm, favored by the research community to tackle problems related to semantic annotation and tagging of multimedia content. The proposed approach focuses on social aspects of gaming and the use of humans in a widely distributed fashion through a process of human-based computation. It aims at motivating people towards image tagging while entertaining themselves. Regarding the key aspect of label accuracy, a combination of computer vision techniques, machine learning and game strategies has been used.
BibTeX:
@inproceedings{seneviratne2009image,
  author = {Seneviratne, Lasantha and Izquierdo, Ebroul},
  title = {Image Annotation through Gaming (TAG4FUN)},
  booktitle = {Digital Signal Processing (DSP 2009), Proceedings of the 16th International Conference on},
  publisher = {IEEE},
  year = {2009},
  pages = {940--945},
  note = {google scholar entry: 16th International Conference on Digital Signal Processing (DSP 2009). Santorini, Greece, 5-7 July 2009.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5201118},
  doi = {10.1109/ICDSP.2009.5201118}
}
Tsomko E, Kim H-J, Izquierdo E and Guerra Ones V (2009), "A Spectral Technique for Image Clustering", In Ultra Modern Telecommunications (ICUMT 2009). Proceedings of the International Conference on. St. Petersburg, Russia, October, 2009, pp. 1-5. IEEE.
Abstract: Given an image i* and an image database V containing an unknown number of image classes, in this paper we propose a technique for finding the class A of V that contains i*. To solve this (1+x)-class clustering problem a novel spectral ``asymmetric'' formulation of the problem is introduced: the Asymmetric Cut. It permits the extraction of the required class regardless of the other classes in the database. The actual goal is to find a spectral formulation of the (1+x)-class clustering problem and to propose an efficient numerical implementation of the approach for large image databases. The proposed method finds a subset A that maximizes the similarities within the chosen cluster but does not involve affinities or dissimilarities among the remaining unknown clusters in the database. Asymmetric cuts seamlessly lead to a spectral representation which can be solved by finding the critical points of the corresponding Rayleigh quotient. Following the underlying spectral theoretical approach, the critical points correspond to the eigenvectors of an affinity matrix derived from pair-wise similarities involving information related to a single image i* representing the image class of concern. Selected results from experimental evaluation are presented.
BibTeX:
@inproceedings{tsomko2009spectral,
  author = {Tsomko, Elena and Kim, Hyoung-Joong and Izquierdo, Ebroul and Guerra Ones, Valia},
  title = {A Spectral Technique for Image Clustering},
  booktitle = {Ultra Modern Telecommunications (ICUMT 2009). Proceedings of the International Conference on},
  publisher = {IEEE},
  year = {2009},
  pages = {1--5},
  note = {google scholar entry: International Conference on Ultra Modern Telecommunications (ICUMT 2009). St. Petersburg, Russia, 12-14 October 2009.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5345343},
  doi = {10.1109/ICUMT.2009.5345343}
}

Presentations, Posters and Technical Reports

Izquierdo E, Davis VM, Johns AW and Bitar N (2009), "Method and Apparatus for Fragile Watermarking". February, 2009.
BibTeX:
@misc{izquierdo2009method,
  author = {Izquierdo, Ebroul and Davis, Valerie M. and Johns, Andrew W. and Bitar, Nancy},
  title = {Method and Apparatus for Fragile Watermarking},
  year = {2009},
  note = {US Patent 7,489,797}
}

Theses and Monographs

Piatrik T (2009), "Image clustering and Video Summarisation using ant-inspired methods". Thesis at: Queen Mary University of London.
BibTeX:
@phdthesis{piatrik2009image,
  author = {Piatrik, Tomas},
  editor = {Izquierdo, Ebroul},
  title = {Image clustering and Video Summarisation using ant-inspired methods},
  school = {Queen Mary University of London},
  year = {2009},
  url = {http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.509746}
}
Romero Macias C (2009), "Image Understanding for Automatic Human and Machine Separation". Thesis at: Queen Mary University of London, pp. 1-146.
Abstract: The research presented in this thesis aims to extend the capabilities of human interaction proofs in order to improve security in web applications and services. The research focuses on developing a more robust and efficient Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) to increase the gap between human recognition and machine recognition. Two main novel approaches are presented, each one of them targeting a different area of human and machine recognition: a character recognition test, and an image recognition test. Along with the novel approaches, a categorisation for the available CAPTCHA methods is also introduced. The character recognition CAPTCHA is based on the creation of depth perception by using shadows to represent characters. The characters are created by the imaginary shadows produced by a light source, using as a basis the gestalt principle that human beings can perceive whole forms instead of just a collection of simple lines and curves. This approach was developed in two stages: firstly, two-dimensional characters, and secondly, three-dimensional character models. The image recognition CAPTCHA is based on the creation of cartoons out of faces. The faces used belong to people in the entertainment business, politicians, and sportsmen. The principal basis of this approach is that face perception is a cognitive process that humans perform easily and with a high rate of success. The process involves the use of face morphing techniques to distort the faces into cartoons, allowing the resulting image to be more robust against machine recognition. Exhaustive tests on both approaches using OCR software, SIFT image recognition, and face recognition software show an improvement in human recognition rate, whilst preventing robots from breaking through the tests.
BibTeX:
@phdthesis{macias2009image,
  author = {Romero Macias, Cristina},
  editor = {Izquierdo, Ebroul},
  title = {Image Understanding for Automatic Human and Machine Separation},
  school = {Queen Mary University of London},
  year = {2009},
  pages = {1--146},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Theses/ChristinaRomero_PhDThesis.pdf}
}


2008

Journal Papers

Borges PVK, Mayer J and Izquierdo E (2008), "Robust and transparent color modulation for text data hiding", Multimedia, IEEE Transactions on. Vol. 10(8), pp. 1479-1489. IEEE.
Abstract: This paper improves the use of text color modulation (TCM) as a reliable text document data hiding method. Using TCM, the characters in a document have their color components modified (possibly unperceptually) according to a side message to be embedded. This work presents a detection metric and an analysis determining the detection error rate in TCM, considering an assumed print and scan (PS) channel model. In addition, a perceptual impact model is employed to evaluate the perceptual difference between a modified and a non-modified character. Combining this perceptual model and the results from the detection error analysis it is possible to determine the optimum color modulation values. The proposed detection metric also exploits the orientation characteristics of color halftoning to reduce the error rate. In particular, because color halftoning algorithms use different screen orientation angles for each color channel, this is used as an effective feature to detect the embedded message. Experiments illustrate the validity of the analysis and the applicability of the method.
BibTeX:
@article{borges2008robust,
  author = {Borges, Paulo Vinicius Koerich and Mayer, Joceli and Izquierdo, Ebroul},
  title = {Robust and transparent color modulation for text data hiding},
  journal = {Multimedia, IEEE Transactions on},
  publisher = {IEEE},
  year = {2008},
  volume = {10},
  number = {8},
  pages = {1479--1489},
  url = {http://www.paulovinicius.com/papers/borges_TM_2008_b.pdf},
  doi = {10.1109/TMM.2008.2007294}
}
Borges PVK, Mayer J and Izquierdo E (2008), "Document Image Processing for Paper Side Communications", Multimedia, IEEE Transactions on. November, 2008. Vol. 10(7), pp. 1277-1287. IEEE.
Abstract: This paper proposes the use of higher order statistical moments in document image processing to improve the performance of systems which transmit side information through the print and scan channel. Examples of such systems are multilevel 2-D bar codes and certification via text luminance modulation. These systems print symbols with different luminances, according to the target side information. In previous works, the detection of a received symbol is usually performed by evaluating the average luminance or spectral characteristics of the received signal. This paper points out that, whenever halftoning algorithms are used in the printing process, detection can be improved by observing that third and fourth order statistical moments of the transmitted symbol also change, depending on the luminance level. This work provides a thorough analysis for those moments used as detection metrics. A print and scan channel model is exploited to derive the relationship between the modulated luminance level and the higher order moments of a halftone image. This work employs a strategy to merge the different moments into a single metric to achieve a reduced detection error rate. A transmission protocol for printed documents is proposed which takes advantage of the resulting higher robustness achieved with the combined detection metrics. The applicability of the introduced document image analysis approach is validated by comprehensive computer simulations.
BibTeX:
@article{borges2008document,
  author = {Borges, Paulo Vinicius Koerich and Mayer, Joceli and Izquierdo, Ebroul},
  title = {Document Image Processing for Paper Side Communications},
  journal = {Multimedia, IEEE Transactions on},
  publisher = {IEEE},
  year = {2008},
  volume = {10},
  number = {7},
  pages = {1277--1287},
  url = {http://www.paulovinicius.com/papers/borges_TM_2008.pdf},
  doi = {10.1109/TMM.2008.2004906}
}
Henderson C (2008), "Managing Software Defects: Defect Analysis and Traceability", SIGSOFT Software Engineering Notes. July, 2008. Vol. 33(4), pp. 1-2. ACM.
Abstract: This paper describes a mechanism for presenting software defect metrics to aid analysis. A graphical representation of the history of software builds is presented, that records software build quality in a way that cannot be displayed in a single numerical table, and is visually more appealing and more easily digestible than a series of related tables. The radial analysis charts can be used to represent derivative information in a two-dimensional form and is demonstrated with practical examples of Defect Analysis and Root Cause Analysis.
BibTeX:
@article{henderson2008managing,
  author = {Henderson, Craig},
  title = {Managing Software Defects: Defect Analysis and Traceability},
  journal = {SIGSOFT Software Engineering Notes},
  publisher = {ACM},
  year = {2008},
  volume = {33},
  number = {4},
  pages = {1--2},
  url = {http://dl.acm.org/citation.cfm?id=1384141},
  doi = {10.1145/1384139.1384141}
}
Izquierdo E, Kim H-J and Sikora T (2008), "Knowledge-Assisted Media Analysis for Interactive Multimedia Applications", EURASIP Journal on Advances in Signal Processing. February, 2008. (1), pp. 1-2. Springer.
BibTeX:
@article{izquierdo2008knowledge,
  author = {Izquierdo, Ebroul and Kim, Hyoung-Joong and Sikora, Thomas},
  title = {Knowledge-Assisted Media Analysis for Interactive Multimedia Applications},
  journal = {EURASIP Journal on Advances in Signal Processing},
  publisher = {Springer},
  year = {2008},
  number = {1},
  pages = {1--2},
  url = {http://asp.eurasipjournals.com/content/pdf/1687-6180-2007-036404.pdf},
  doi = {10.1155/2007/36404}
}
Mrak M, Zgaljic T and Izquierdo E (2008), "Influence of downsampling filter characteristics on compression performance in wavelet-based scalable video coding", Image Processing, IET. June, 2008. Vol. 2(3), pp. 116-129. IET.
Abstract: The application of different downsampling filters in video coding directly models visual information at lower resolutions and influences the compression performance of a chosen coding system. In wavelet-based scalable video coding the spatial scalability is achieved by the application of wavelets as downsampling filters. However, characteristics of different wavelets influence the performance at targeted spatio-temporal decoding points. An analysis of different downsampling filters in popular wavelet-based scalable video coding schemes is presented. Evaluation is performed for both intra- and inter-coding schemes using wavelets and standard downsampling strategies. On the basis of the obtained results a new concept of inter-resolution prediction is proposed, which maximises the average performance using a combination of standard downsampling filters and wavelet-based coding.
BibTeX:
@article{mrak2008influence,
  author = {Mrak, Marta and Zgaljic, Toni and Izquierdo, Ebroul},
  title = {Influence of downsampling filter characteristics on compression performance in wavelet-based scalable video coding},
  journal = {Image Processing, IET},
  publisher = {IET},
  year = {2008},
  volume = {2},
  number = {3},
  pages = {116--129},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4545879},
  doi = {10.1049/iet-ipr:20070185}
}
Passino G, Patras I and Izquierdo E (2008), "Aspect Coherence for Graph-Based Semantic Image Labelling", IET Computer Vision. September, 2008. Vol. 4(3), pp. 183-194. IET.
Abstract: In image semantic segmentation a semantic category label is associated to each image pixel. This classification problem is characterised by pixel dependencies at different scales. On a small-scale pixel correlation is related to object instance sharing, whereas on a middle- and large scale to category co-presence and relative location constraints. The contribution of this study is two-fold. First, the authors present a framework that jointly learns category appearances and pixel dependencies at different scales. Small-scale dependencies are accounted by clustering pixels into larger patches via image oversegmentation. To tackle middle-scale dependencies a conditional random field (CRF) is built over the patches. A novel strategy to exploit local patch aspect coherence is used to impose an optimised structure in the graph to have exact and efficient inference. The second contribution is a method to account for full patch neighbourhoods without introducing loops in the graphical structures. `Weak neighbours` are introduced, which are patches connected in the image but not in the inference graph. They are pre-classified according to their visual appearance and their category distribution probability is then used in the CRF inference step. Experimental evidence of the validity of the method shows improvements in comparison to other works in the field.
BibTeX:
@article{passino2010aspect,
  author = {Passino, Giuseppe and Patras, Ioannis and Izquierdo, Ebroul},
  title = {Aspect Coherence for Graph-Based Semantic Image Labelling},
  journal = {IET Computer Vision},
  publisher = {IET},
  year = {2008},
  volume = {4},
  number = {3},
  pages = {183--194},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/passino_iet-cvi_09.pdf},
  doi = {10.1049/iet-cvi.2008.0093}
}
Rodriguez R, Castillo PJ, Guerra V, Sossa Azuela JH, Suáreza AG and Izquierdo E (2008), "A comparison between two robust techniques for segmentation of blood vessels", Computers in Biology and Medicine. August, 2008. Vol. 38(8), pp. 931-940. Elsevier.
Abstract: Image segmentation plays an important role in image analysis. According to several authors, segmentation terminates when the observer's goal is satisfied. For this reason, a unique method that can be applied to all possible cases does not yet exist. In this paper, we have carried out a comparison between two current segmentation techniques, namely the mean shift method, for which we propose a new algorithm, and the so-called spectral method. In this investigation the important information to be extracted from an image is the number of blood vessels (BV) present in the image. The results obtained by both strategies were compared with the results provided by manual segmentation. We have found that using the mean shift segmentation an error less than 20% for false positives (FP) and 0% for false negatives (FN) was observed, while for the spectral method more than 45% for FP and 0% for FN were obtained. We discuss the advantages and disadvantages of both methods.
BibTeX:
@article{rodriguez2008comparison,
  author = {Rodriguez, Roberto and Castillo, Patricio J. and Guerra, Valia and Sossa Azuela, Juan Humberto and Suáreza, Ana G. and Izquierdo, Ebroul},
  title = {A comparison between two robust techniques for segmentation of blood vessels},
  journal = {Computers in Biology and Medicine},
  publisher = {Elsevier},
  year = {2008},
  volume = {38},
  number = {8},
  pages = {931--940},
  note = {google scholar author list: Rodríguez, Roberto; Castillo, Patricio J; Guerra, Valia; Sossa Azuela, Juan Humberto; Suáreza, Ana G; Izquierdo, Ebroul},
  url = {http://www.sciencedirect.com/science/article/pii/S0010482508000978},
  doi = {10.1016/j.compbiomed.2008.06.002}
}
Vaiapury K and Kankanhalli MS (2008), "Finding interesting images in albums using attention", Journal of Multimedia. October, 2008. Vol. 3(4), pp. 2-13.
Abstract: Commercial systems such as Flickr display interesting photos from their collection as an interaction mechanism for sampling the collection. It purely relies on social activity analysis for determining the notion of interestingness. We propose an alternative technique based on content analysis for finding interesting photos in a collection. We use a combination of visual attention models and an interactive feedback mechanism to compute interestingness. A differentiating feature of our approach is the ability to customize the set of interesting photos depending upon the individual interest. Also, we incorporate non-identical duplicate detection as a mechanism to strengthen the surprise factor among the potentially interesting set of candidate photos. We have implemented the system and conducted a user study whose results are promising. This proposed work presents a variant on query by example integrating user relevance feedback to choose ''interesting'' photos.
BibTeX:
@article{vaiapury2008finding,
  author = {Vaiapury, Karthikeyan and Kankanhalli, Mohan S.},
  title = {Finding interesting images in albums using attention},
  journal = {Journal of Multimedia},
  year = {2008},
  volume = {3},
  number = {4},
  pages = {2--13},
  url = {https://academypublisher.com/~academz3/ojs/index.php/jmm/article/view/03040213},
  doi = {10.4304/jmm.3.4.2-13}
}

Conference Papers

Akram M, Ramzan N and Izquierdo E (2008), "Event Based Video Coding Architecture", In Visual Information Engineering (VIE 2008), 5th International Conference on. Xi'an, China, July, 2008, pp. 807-812. IET.
Abstract: In this work, a scalable video codec (SVC) has been used for security surveillance video. The main event in the video is the motion of an object. A strategy has been proposed to find the different motion levels (events) in the video. The level of motion in a Group of Pictures (GOP) is used to assign different scalability features to the GOP. The architecture of the SVC has been modified to provide this support. The model of the modified SVC architecture is presented in detail. The improved system handles smaller amounts of data for processing and storage yet conveys all the important information in the surveillance video. The implementation and experimental results on surveillance video are presented. Results show that the proposed system efficiently detects motion and adapts the scalability level accordingly.
BibTeX:
@inproceedings{akram2008event,
  author = {Akram, Muhammad and Ramzan, Naeem and Izquierdo, Ebroul},
  title = {Event Based Video Coding Architecture},
  booktitle = {Visual Information Engineering (VIE 2008), 5th International Conference on},
  publisher = {IET},
  year = {2008},
  pages = {807--812},
  note = {google scholar entry: 5th International Conference on Visual Information Engineering (VIE 2008). Xi'an, China, 29 July - 1 August 2008.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=4743529},
  doi = {10.1049/cp:20080421}
}
Bangert T (2008), "TriangleVision: A Toy Visual System", In Artificial Neural Networks (ICANN 2008), 18th International Conference, Proceedings, Part I. Prague, Czech Republic, September, 2008. Vol. 5163, pp. 937-950. Springer.
Abstract: This paper presents a simple but fully functioning and complete artificial visual system. The triangle is the simplest object of perception and therefore the simplest visual system is one which sees only triangles. The system presented is complete in the sense that it will see any triangle presented to it as visual input in bitmap form, even triangles with illusory contours that can only be detected by inference.
BibTeX:
@inproceedings{bangert2008trianglevision,
  author = {Bangert, Thomas},
  editor = {Kůrková, Věra and Neruda, Roman and Koutník, Jan},
  title = {TriangleVision: A Toy Visual System},
  booktitle = {Artificial Neural Networks (ICANN 2008), 18th International Conference. Prague, Czech Republic, September 3-6, 2008. Proceedings, Part I.},
  publisher = {Springer},
  year = {2008},
  volume = {5163},
  pages = {937--950},
  note = {google scholar entry: Artificial Neural Networks (ICANN 2008), 18th International Conference. Prague, Czech Republic, 3-6 September 2008.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/TV.pdf},
  doi = {10.1007/978-3-540-87536-9_96}
}
Borges PVK, Mayer J and Izquierdo E (2008), "Efficient Visual Fire Detection Applied for Video Retrieval", In Proceedings of the 16th European Signal Processing Conference (EUSIPCO 2008). Lausanne, Switzerland, August, 2008, pp. 1-5. European Association for Signal Processing (EURASIP).
Abstract: In this paper we propose a new image event detection method for identifying fire in videos. Traditional image-based fire detection is often applied in surveillance camera scenarios with well-behaved backgrounds. In contrast, the proposed method is applied for retrieval of fire catastrophes in newscast content, such that there is great variation in fire and background characteristics, depending on the video instance. The method analyses the frame-to-frame change in given features of potential fire regions. These features are colour, area size, texture, boundary roughness and skewness of the estimated fire regions. Because of the flickering and random characteristics of fire, these are powerful discriminants. The change of each of these features is evaluated, and the results are combined according to the Bayes classifier to achieve a decision (i.e. fire happens, fire does not happen). Experiments illustrate the applicability of the method and the improved performance in comparison to other techniques.
BibTeX:
@inproceedings{borges2008efficient,
  author = {Borges, Paulo Vinicius Koerich and Mayer, Joceli and Izquierdo, Ebroul},
  title = {Efficient Visual Fire Detection Applied for Video Retrieval},
  booktitle = {Proceedings of the 16th European Signal Processing Conference (EUSIPCO 2008)},
  publisher = {European Association for Signal Processing (EURASIP)},
  year = {2008},
  pages = {1--5},
  note = {google scholar entry: 16th European Signal Processing Conference (EUSIPCO-2008). Lausanne, Switzerland, 25-29 August 2008.},
  url = {http://www.eurasip.org/Proceedings/Eusipco/Eusipco2008/program.html}
}
Borges PVK, Mayer J and Izquierdo E (2008), "A Probabilistic Model for Flood Detection in Video Sequences", In Image Processing (ICIP 2008), Proceedings of the 15th IEEE International Conference on. San Diego, CA, October, 2008, pp. 13-16. IEEE.
Abstract: In this paper we propose a new image event detection method for identifying flood in videos. Traditional image-based flood detection is often used in remote sensing and satellite imaging applications. In contrast, the proposed method is applied for retrieval of flood catastrophes in newscast content, which presents great variation in flood and background characteristics, depending on the video instance. Different flood regions in different images share some common features which are reasonably invariant to lightness, camera angle or background scene. These features are texture, the relation among color channels and saturation characteristics. The method analyses the frame-to-frame change in these features, and the results are combined according to the Bayes classifier to achieve a decision (i.e. flood happens, flood does not happen). In addition, because the flooded region is usually located around the lower and middle parts of an image, a model for the probability of occurrence of flood as a function of vertical position is proposed, significantly improving the classification performance. Experiments illustrate the applicability of the method and the improved performance in comparison to other techniques.
BibTeX:
@inproceedings{borges2008probabilistic,
  author = {Borges, Paulo Vinicius Koerich and Mayer, Joceli and Izquierdo, Ebroul},
  title = {A Probabilistic Model for Flood Detection in Video Sequences},
  booktitle = {Image Processing (ICIP 2008), Proceedings of the 15th IEEE International Conference on},
  publisher = {IEEE},
  year = {2008},
  pages = {13--16},
  note = {google scholar entry: 15th IEEE International Conference on Image Processing (ICIP 2008). San Diego, CA, 12-15 October 2008.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4711679},
  doi = {10.1109/ICIP.2008.4711679}
}
Borges PVK, Izquierdo E and Mayer J (2008), "Efficient Text Color Modulation for Printed Side Communications and Data Hiding", In Computer Graphics and Image Processing (SIBGRAPI 2008), Proceedings of the 21st Brazilian Symposium on. Campo Grande, Brazil, October, 2008, pp. 79-86. IEEE.
Abstract: This paper improves the use of text color modulation (TCM) as a reliable text document data hiding method. Using TCM, the characters in a document have their color components modified (possibly imperceptibly) according to a side message to be embedded. This work presents a detection metric and an analysis determining the detection error rate in TCM, considering an assumed print-and-scan (PS) channel model. In addition, a perceptual impact model is employed to evaluate the perceptual difference between a modified and a non-modified character. Combining this perceptual model with the results of the detection error analysis, it is possible to determine the optimum color modulation values. The proposed detection metric also exploits the orientation characteristics of color halftoning to reduce the error rate. In particular, because color halftoning algorithms use different screen orientation angles for each color channel, this is used as an effective feature to detect the embedded message. Experiments illustrate the validity of the analysis and the applicability of the method.
BibTeX:
@inproceedings{borges2008efficient2,
  author = {Borges, Paulo Vinicius Koerich and Izquierdo, Ebroul and Mayer, Joceli},
  title = {Efficient Text Color Modulation for Printed Side Communications and Data Hiding},
  booktitle = {Computer Graphics and Image Processing (SIBGRAPI 2008), Proceedings of the 21st Brazilian Symposium on},
  publisher = {IEEE},
  year = {2008},
  pages = {79--86},
  note = {google scholar entry: 21st Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI 2008). Campo Grande, Brazil, 12-15 October 2008.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4654146},
  doi = {10.1109/SIBGRAPI.2008.23}
}
Chandramouli K and Izquierdo E (2008), "Exploiting User Logs in CBIR using fuzzy-PSO", In 3rd International Conference on Semantic and Digital Media Technologies (SAMT 2008). Koblenz, Germany, December, 2008. ACM.
Abstract: In this paper, effective exploitation of user logs for improving the performance of interactive Content Based Image Retrieval is presented. A fuzzy membership function is derived by analysing the user logs generated from the relevance feedback information provided by multiple users for a set of target queries. The underlying machine learning algorithm for User Relevance Feedback is based on Self Organising Maps. The training of network nodes is achieved by Particle Swarm Optimisation. The proposed approach is evaluated on two datasets, namely Corel 700 and Flickr 500.
BibTeX:
@inproceedings{chandramouli2008exploiting,
  author = {Chandramouli, Krishna and Izquierdo, Ebroul},
  title = {Exploiting User Logs in CBIR using fuzzy-PSO},
  booktitle = {3rd International Conference on Semantic and Digital Media Technologies (SAMT 2008)},
  publisher = {ACM},
  year = {2008},
  note = {google scholar entry: 3rd International Conference on Semantic and Digital Media Technologies (SAMT 2008). Koblenz, Germany, 3 December 2008.},
  url = {http://resources.smile.deri.ie/conference/2008/samt/}
}
Chandramouli K, Kliegr T, Nemrava J, Svatek V and Izquierdo E (2008), "Query Refinement and User Relevance Feedback for Contextualized Image Retrieval", In Visual Information Engineering (VIE 2008), 5th International Conference on. Xi'an, China, July, 2008, pp. 453-458.
Abstract: The motivation of this paper is to enhance the user-perceived precision of results of Content Based Information Retrieval (CBIR) systems with Query Refinement (QR), Visual Analysis (VA) and Relevance Feedback (RF) algorithms. The proposed algorithms were implemented as modules of the K-Space CBIR system. The QR module discovers hypernyms for the given query from a free-text corpus (such as Wikipedia) and uses these hypernyms as refinements of the original query. Extracting hypernyms from Wikipedia makes it possible to apply query refinement to more queries than in related approaches that use a static predefined thesaurus such as WordNet. The VA module uses the K-Means algorithm for clustering the images based on low-level MPEG-7 visual features. The RF module uses the preference information expressed by the user to build user profiles by applying SOM-based supervised classification, which is further optimized by a hybrid Particle Swarm Optimization (PSO) algorithm. The experiments evaluating the performance of the QR and VA modules show promising results.
BibTeX:
@inproceedings{chandramouli2008query,
  author = {Chandramouli, Krishna and Kliegr, Tomas and Nemrava, Jan and Svatek, Vojtech and Izquierdo, Ebroul},
  title = {Query Refinement and User Relevance Feedback for Contextualized Image Retrieval},
  booktitle = {Visual Information Engineering (VIE 2008), 5th International Conference on},
  year = {2008},
  pages = {453--458},
  note = {google scholar entry: 5th International Conference on Visual Information Engineering, (VIE 2008). Xi'an, China, 29 July - 1 August 2008.},
  url = {http://nb.vse.cz/~svatek/vie08.pdf},
  doi = {10.1049/cp:20080356}
}
Chandramouli K, Stewart C, Brailsford T and Izquierdo E (2008), "CAE-L: An Ontology Modelling Cultural Behaviour in Adaptive Education", In Semantic Media Adaptation and Personalization (SMAP 2008), Proceedings of the Third International Workshop on. Prague, Czech Republic, December, 2008, pp. 183-188. IEEE.
Abstract: The presentation of learning materials in Adaptive Education Hypermedia is influenced by several factors such as learning style, background knowledge and cultural background, to name a few. In this paper, we introduce the notion of the CAE-L ontology for modelling stereotype cultural artefacts in adaptive education. The ontology design is based on a user study of the respondents to the CAE questionnaire, which determines the cultural artefacts that influence a learner's behaviour within an educational environment. We present a brief overview of the implementation and discuss the stereotype presentation styles from three different countries, namely China, Ireland and the UK.
BibTeX:
@inproceedings{chandramouli2008cae,
  author = {Chandramouli, Krishna and Stewart, Craig and Brailsford, Tim and Izquierdo, Ebroul},
  editor = {Mylonas, Phivos and Wallace, Manolis and Angelides, Marios},
  title = {CAE-L: An Ontology Modelling Cultural Behaviour in Adaptive Education},
  booktitle = {Semantic Media Adaptation and Personalization (SMAP 2008), Proceedings of the Third International Workshop on},
  publisher = {IEEE},
  year = {2008},
  pages = {183--188},
  note = {google scholar entry: Third International Workshop on Semantic Media Adaptation and Personalization (SMAP 2008). Prague, Czech Republic, 15-16 December 2008.},
  url = {http://www.smap2008.org/presentations/SS2/Chandramouli.pdf},
  doi = {10.1109/SMAP.2008.24}
}
Damnjanovic I, Landone C, Reiss J and Izquierdo E (2008), "Enriched Access to Digital Audiovisual Content", In Neural Network Applications in Electrical Engineering (NEUREL 2008), 9th Symposium on. Belgrade, Serbia, September, 2008, pp. 17-20. IEEE.
Abstract: This paper presents an access engine for digital audio and related content developed under the IST FP6 project EASAIER. The main driving force for the project was the lack of qualitative solutions for access to digital sound archives. An innovative remote access system, which extends beyond standard content management and retrieval systems, addresses a range of identified issues, such as inconsistent formats of archived materials, with related media often in separate collections and related metadata given in non-standard specialist formats, incomplete or even erroneous. The system focuses on sound archives, libraries, museums, broadcast archives, and music schools, but the tools may be used by anyone interested in accessing archived material, amateur or professional, regardless of the material involved. The system functionalities (enhanced cross-media retrieval, multimedia synchronisation, audio and video processing, and analysis and visualisation tools) enable the user to experiment with the materials in exciting new ways.
BibTeX:
@inproceedings{Damnjanovic2008enriched,
  author = {Damnjanovic, Ivan and Landone, Chris and Reiss, Josh and Izquierdo, Ebroul},
  editor = {Reljin, Branimir and Stankovic, Srdjan},
  title = {Enriched Access to Digital Audiovisual Content},
  booktitle = {Neural Network Applications in Electrical Engineering (NEUREL 2008), 9th Symposium on},
  publisher = {IEEE},
  year = {2008},
  pages = {17--20},
  note = {google scholar entry: 9th Symposium on Neural Network Applications in Electrical Engineering (NEUREL 2008). Belgrade, Serbia, 25-27 September 2008.},
  url = {https://www.elec.qmul.ac.uk/people/josh/documents/DamnjanovicReiss-Neurel-Cost2922008.pdf},
  doi = {10.1109/NEUREL.2008.4685549}
}
Damnjanovic U, Fernandez Arguedas V, Izquierdo E and Martinez JM (2008), "Event Detection and Clustering for Surveillance Video Summarization", In Image Analysis for Multimedia Interactive Services (WIAMIS 2008), Proceedings of the 9th International Workshop on. Klagenfurt, Austria, May, 2008, pp. 63-66. IEEE.
Abstract: The target of surveillance summarization is to identify high-value information events in a video stream and to present them to a user. In this paper we present a surveillance summarization approach using detection and clustering of important events. Assuming that events are the main source of energy change between consecutive frames, a set of interesting frames is extracted and then clustered. Based on the structure of the clusters, two types of summaries are created: static and dynamic. The static summary is built of key frames that are organized in clusters. The dynamic summary is created from short video segments representing each cluster and is used to lead the user to the events of interest captured in the key frames. We describe our approach and present experimental results.
BibTeX:
@inproceedings{Damnjanovic2008event,
  author = {Damnjanovic, Uros and Fernandez Arguedas, Virginia and Izquierdo, Ebroul and Martinez, José María},
  title = {Event Detection and Clustering for Surveillance Video Summarization},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2008), Proceedings of the 9th International Workshop on},
  publisher = {IEEE},
  year = {2008},
  pages = {63--66},
  note = {google scholar entry: 9th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2008). Klagenfurt, Austria, 7-9 May 2008.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4556883},
  doi = {10.1109/WIAMIS.2008.53}
}
Damnjanovic U and Izquierdo E (2008), "Asymmetric and Normalized Cuts for Image Clustering and Segmentation", In Neural Network Applications in Electrical Engineering (NEUREL 2006), Proceedings of the 8th Seminar on. Belgrade, Serbia, September, 2006, pp. 5-9. IEEE.
Abstract: Over the last few years spectral clustering has emerged as a powerful model for data partitioning and segmentation. Spectral clustering techniques use eigenvalues and eigenvectors of the matrix representation of a suitable graph representing the original data. In this paper a new spectral clustering method is proposed: the asymmetric cut. It allows extraction of relevant information from a dataset by making just one cut over the database. The approach is tailored to the image classification task where a given image class is to be extracted from an image database containing an unknown number of classes. The main goal of this paper is to show that the proposed technique outperforms standard spectral methods under given circumstances. The technique is compared against the conventional and well-known normalized cut algorithm.
BibTeX:
@inproceedings{damnjanovic2006asymmetric,
  author = {Damnjanovic, Uros and Izquierdo, Ebroul},
  editor = {Reljin, Branimir and Stanković, Srdjan},
  title = {Asymmetric and Normalized Cuts for Image Clustering and Segmentation},
  booktitle = {Neural Network Applications in Electrical Engineering (NEUREL 2006), Proceedings of the 8th Seminar on},
  publisher = {IEEE},
  year = {2008},
  pages = {5--9},
  note = {google scholar entry: 8th Seminar on Neural Network Applications in Electrical Engineering (NEUREL 2006). Belgrade, Serbia, 25-27 September 2006.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4147151},
  doi = {10.1109/NEUREL.2006.341163}
}
Dong L and Izquierdo E (2008), "Global-to-Local Oriented Rapid Scene Perception", In Image Analysis for Multimedia Interactive Services (WIAMIS 2008), Proceedings of the 9th International Workshop on. May, 2008, pp. 155-158. IEEE.
Abstract: An approach for rapid scene perception from global layout to local features is presented. The representation of a complex scene is initially built from a collection of global features, from which properties related to the spatial layout of the scene and its semantic category can be estimated. The rapid perception of natural scenes relies partly on a global estimation of the features contained in the scene. Further analysis of the local essential areas is deployed on this basis. Such an integrated model guarantees interactive processing between local and global features, thus enabling low-level features to initiate scene perception and categorization efficiently.
BibTeX:
@inproceedings{dong2008global,
  author = {Dong, Le and Izquierdo, Ebroul},
  title = {Global-to-Local Oriented Rapid Scene Perception},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2008), Proceedings of the 9th International Workshop on},
  publisher = {IEEE},
  year = {2008},
  pages = {155--158},
  note = {google scholar entry: 9th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2008). Klagenfurt, Austria, 7-9 May 2008.},
  url = {http://www.ing.unibs.it/~cost292/pubs/wiamis08/global_to_local.pdf},
  doi = {10.1109/WIAMIS.2008.12}
}
Dong L and Izquierdo E (2008), "Global-to-Local Oriented Perception on Blurry Visual Information", In Image Processing (ICIP 2008), Proceedings of the 15th IEEE International Conference on. San Diego, California, October, 2008, pp. 2168-2171. IEEE.
Abstract: A system for perception of blurry visual information is described and validated. Essential capture in the global pathway and saliency highlighting in the local pathway are integrated for low-quality visual information applications. The system can differentiate scenes with various semantic meanings using a spatial layout of context information, which captures the "essence" of the scene. The system can further discriminate the contents of the scene via local highlighting. Distinct from previous frameworks, the system is both biologically plausible and efficient in application, thus offering a straightforward platform for rapid analysis and interpretation of blurry visual information.
BibTeX:
@inproceedings{dong2008global2,
  author = {Dong, Le and Izquierdo, Ebroul},
  title = {Global-to-Local Oriented Perception on Blurry Visual Information},
  booktitle = {Image Processing (ICIP 2008), Proceedings of the 15th IEEE International Conference on},
  publisher = {IEEE},
  year = {2008},
  pages = {2168--2171},
  note = {google scholar entry: 15th International Conference on Image Processing (ICIP 2008). San Diego, California, 12-15 October 2008.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=4712218},
  doi = {10.1109/ICIP.2008.4712218}
}
Dong L and Izquierdo E (2008), "Hypersphere Topology Creation for Image Classification", In Proceedings of the 16th European Signal Processing Conference (EUSIPCO 2008). Lausanne, Switzerland, August, 2008, pp. 1-5. European Association for Signal Processing (EURASIP).
Abstract: A topology creation strategy for image analysis and classification is presented. The topology creation strategy automatically generates a relevance map from essential regions of natural images. It also derives a set of well-structured representations from low-level description to drive the final classification. The backbone of the topology creation strategy is a distribution mapping rule involving two basic modules: structured low-level feature extraction using a convolutional neural network, and a topology creation module based on a hypersphere neural network. Classification is achieved by simulating high-level top-down visual information perception and classifying using an incremental Bayesian parameter estimation method. The proposed modular system architecture offers straightforward expansion to include user relevance feedback, contextual input, and multimodal information if available.
BibTeX:
@inproceedings{dong2008hypersphere,
  author = {Dong, Le and Izquierdo, Ebroul},
  title = {Hypersphere Topology Creation for Image Classification},
  booktitle = {Proceedings of the 16th European Signal Processing Conference (EUSIPCO 2008)},
  publisher = {European Association for Signal Processing (EURASIP)},
  year = {2008},
  pages = {1--5},
  note = {google scholar entry: 16th European Signal Processing Conference (EUSIPCO-2008). Lausanne, Switzerland, 25-29 August 2008.},
  url = {http://www.eurasip.org/Proceedings/Eusipco/Eusipco2008/program.html}
}
Dong L and Izquierdo E (2008), "Scene Classification of Ambiguous Visual Information", In Visual Information Engineering (VIE 2008), 5th International Conference on. Xi'an, China, July, 2008, pp. 699-704. IET.
Abstract: A framework for scene classification of ambiguous visual information is described and validated. A context-based scene classification algorithm is used for ambiguous visual information applications. The system can differentiate ambiguous visual information with various semantic meanings using a spatial layout of context information, which captures the ``essence'' of the scene. Distinct from previous frameworks, the system is both biologically plausible and efficient in application, offering a straightforward platform for rapid analysis and interpretation of ambiguous visual information and demonstrating the generalization and scalability of the approach.
BibTeX:
@inproceedings{dong2008scene,
  author = {Dong, Le and Izquierdo, Ebroul},
  title = {Scene Classification of Ambiguous Visual Information},
  booktitle = {Visual Information Engineering (VIE 2008), 5th International Conference on},
  publisher = {IET},
  year = {2008},
  pages = {699--704},
  note = {google scholar entry: 5th International Conference on Visual Information Engineering (VIE 2008). Xi'an, China, 29 July - 1 August 2008.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4743511},
  doi = {10.1049/cp:20080403}
}
Dong L and Izquierdo E (2008), "A topology synthesizing approach for classification of visual information", In Proceedings of the 6th International Workshop on Content-based Multimedia Indexing (CBMI 2008). London, England, June, 2008, pp. 373-380. IEEE.
Abstract: A system for classification of visual information based on a topology synthesizing approach is presented. The topology synthesizing approach automatically creates a relevance map from essential regions of visual information. It also derives a set of well-organized representations from low-level description to drive the final classification. The backbone of the topology synthesizing approach is a mapping strategy involving two basic modules: structured low-level feature extraction using a convolutional neural network, and a topology representation module based on a self-organizing tree algorithm. Classification is achieved by simulating high-level top-down visual information perception and classifying using an incremental Bayesian parameter estimation method. The proposed modular system architecture offers straightforward expansion to include user relevance feedback, contextual input, and multimodal information if available.
BibTeX:
@inproceedings{dong2008topology,
  author = {Dong, Le and Izquierdo, Ebroul},
  title = {A topology synthesizing approach for classification of visual information},
  booktitle = {Proceedings of the 6th International Workshop on Content-based Multimedia Indexing (CBMI 2008)},
  publisher = {IEEE},
  year = {2008},
  pages = {373--380},
  note = {google scholar entry: 6th International Workshop on Content-Based Multimedia Indexing (CBMI 2008). London, England, 18-20 June 2008.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=4564971},
  doi = {10.1109/CBMI.2008.4564971}
}
Dumont E, Mérialdo B, Essid S, Bailer W, Rehatschek H, Byrne D, Bredin H, O'Connor NE, Jones GJ, Smeaton AF, Haller M, Krutz A, Sikora T and Piatrik T (2008), "Rushes video summarization using a collaborative approach", In Proceedings of the 2nd ACM TRECVid Video Summarization Workshop. Vancouver, British Columbia, October, 2008, pp. 90-94. ACM.
Abstract: This paper describes the video summarization system developed by the partners of the K-Space European Network of Excellence for the TRECVID 2008 BBC rushes summarization evaluation. We propose an original method based on individual content segmentation and selection tools in a collaborative system. Our system is organized in several steps. First, we segment the video, secondly we identify relevant and redundant segments, and finally, we select a subset of segments to concatenate and build the final summary with video acceleration incorporated. We analyze the performance of our system through the TRECVID evaluation.
BibTeX:
@inproceedings{dumont2008rushes,
  author = {Dumont, Emilie and Mérialdo, Bernard and Essid, Slim and Bailer, Werner and Rehatschek, Herwig and Byrne, Daragh and Bredin, Hervé and O'Connor, Noel E. and Jones, Gareth J.F. and Smeaton, Alan F. and Haller, Martin and Krutz, Andreas and Sikora, Thomas and Piatrik, Tomas},
  title = {Rushes video summarization using a collaborative approach},
  booktitle = {Proceedings of the 2nd ACM TRECVid Video Summarization Workshop},
  publisher = {ACM},
  year = {2008},
  pages = {90--94},
  note = {google scholar entry: 2nd ACM Workshop on Video Summarization (TVS 2008). Vancouver, British Columbia, 31 October 2008.},
  url = {http://doras.dcu.ie/16186/1/Rushes_Video_Summarization_Using_a_Collaborative_Approach.pdf},
  doi = {10.1145/1463563.1463579}
}
Janjusevic T and Izquierdo E (2008), "Layout Methods for Intuitive Partitioning of Visualization Space", In Information Visualization (IV 2008), Proceedings of the 12th IEEE International Conference on. London, England, July, 2008, pp. 88-93. IEEE.
Abstract: In this paper we address two relevant tasks in image visualisation research: layout methods for presenting content and relations within image databases, and optimal solutions for efficient use of the entire display space. We introduce a novel approach that enables users searching large image archives to distinguish heterogeneous sets of images, thus helping them to navigate or browse image databases according to relevant query directions. Two methods for mapping similarity relations between images and for cognitive partitioning of the display space are presented.
BibTeX:
@inproceedings{janjusevic2008layout,
  author = {Janjusevic, Tijana and Izquierdo, Ebroul},
  editor = {Banissi, Ebad and Stuart, Liz and Jern, Mikael and Andrienko, Gennady and Marchese, Francis T. and Memon, Nasrullah and Alhajj, Reda and Wyeld, Theodor G. and Burkhard, Remo Aslak and Grinstein, Georges and Groth, Dennis and Ursyn, Anna and Maple, Carsten and Faiola, Anthony and Craft, Brock},
  title = {Layout Methods for Intuitive Partitioning of Visualization Space},
  booktitle = {Information Visualization (IV 2008), Proceedings of the 12th IEEE International Conference on},
  publisher = {IEEE},
  year = {2008},
  pages = {88--93},
  note = {google scholar entry: 12th IEEE International Conference on Information Visualization (IV 2008). London, England, 9-11 July 2008.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4577931},
  doi = {10.1109/IV.2008.55}
}
Kitanovski V, Kseneman M, Gleich DA and Taskovski D (2008), "Adaptive Lifting Integer Wavelet Transform for Lossless Image Compression", In Systems, Signals and Image Processing (IWSSIP 2008), Proceedings of the 15th International Conference on. Bratislava, Slovak Republic, June, 2008, pp. 105-108. IEEE.
Abstract: This paper presents an adaptive lifting scheme for integer-to-integer wavelet transform, and its performance on lossless compression of digital images. We optimize the coefficients of the predict filter to minimize the prediction error variance for every image. The optimized coefficients depend on the variance-normalized autocorrelation function of the image. The proposed lifting scheme adapts not only to every image but also to its horizontal and vertical directions. Experimental results are obtained using different types of images. These results show that the proposed method is competitive with several well-known methods for lossless image compression, in terms of compression ratio and computational efficiency.
BibTeX:
@inproceedings{Kitanovski2008adaptive,
  author = {Kitanovski, Vlado and Kseneman, Matej and Gleich, Dušan and Taskovski, Dimitar},
  editor = {Rozinaj, Gregor and Čepko, Jozef and Trúchly, Peter and Vrabec, Ján and Vojtko, Juraj},
  title = {Adaptive Lifting Integer Wavelet Transform for Lossless Image Compression},
  booktitle = {Systems, Signals and Image Processing (IWSSIP 2008), Proceedings of the 15th International Conference on},
  publisher = {IEEE},
  year = {2008},
  pages = {105--108},
  note = {google scholar entry: 15th International Conference on Systems, Signals and Image Processing (IWSSIP 2008). Bratislava, Slovak Republic, 25-28 June 2008.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4604378},
  doi = {10.1109/IWSSIP.2008.4604378}
}
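The integer-to-integer lifting transform behind this entry (and the related Kitanovski et al. papers below) can be illustrated with a plain Haar lifting step. This is a minimal, hypothetical sketch with fixed predict/update weights, not the authors' image-adaptive optimized predictor:

```python
def haar_lifting_forward(x):
    """One level of integer-to-integer Haar lifting.

    Split into even/odd samples, predict the odd samples from the
    even ones, then update the even samples. Floor-rounding (the
    right shift) keeps every value an integer while remaining
    perfectly invertible, i.e. lossless.
    """
    even = x[0::2]
    odd = x[1::2]
    # Predict step: detail = odd - even
    detail = [o - e for e, o in zip(even, odd)]
    # Update step: approx = even + floor(detail / 2)
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail


def haar_lifting_inverse(approx, detail):
    """Undo the lifting steps in reverse order to recover the signal."""
    even = [a - (d >> 1) for a, d in zip(approx, detail)]
    odd = [e + d for e, d in zip(even, detail)]
    x = []
    for e, o in zip(even, odd):
        x.extend([e, o])
    return x
```

Because the inverse subtracts exactly the same rounded quantities the forward pass added, reconstruction is exact regardless of the rounding, which is what makes lifting attractive for lossless coding.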
Kitanovski V, Taskovski D, Gleich DA and Planinsic P (2008), "Optimization and Implementation of Integer Lifting Scheme for Lossless Image Coding", In Signal Processing and Information Technology (ISSPIT 2008), 8th IEEE International Symposium on. Sarajevo, Bosnia & Herzegovina, December, 2008, pp. 170-174. IEEE.
Abstract: This paper presents an adaptive lifting scheme, which performs integer-to-integer wavelet transform, for lossless image compression. We optimize the coefficients of the predict filter in the lifting scheme to minimize the predictor's error variance. The optimized coefficients depend on the autocorrelation structure of the image. The presented lifting scheme adapts not only to every component of the color image, but also to its horizontal and vertical directions. We implement this lifting scheme on the fixed-point TMS320C6416 DSK evaluation board. We obtain experimental results using different types of images, as well as using images captured by a camera in a real-time application. These results show that the presented method is competitive with a few well-known methods for lossless image compression.
BibTeX:
@inproceedings{kitanovski2008optimization,
  author = {Kitanovski, Vlado and Taskovski, Dimitar and Gleich, Dušan A. and Planinsic, Peter},
  title = {Optimization and Implementation of Integer Lifting Scheme for Lossless Image Coding},
  booktitle = {Signal Processing and Information Technology (ISSPIT 2008), 8th IEEE International Symposium on},
  publisher = {IEEE},
  year = {2008},
  pages = {170--174},
  note = {google scholar entry: 8th IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2008). Sarajevo, Bosnia & Herzegovina, 16-19 December 2008.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4775720},
  doi = {10.1109/ISSPIT.2008.4775720}
}
Kitanovski V, Taskovski D and Panovski L (2008), "Multi-scale Edge Detection Using Undecimated Wavelet Transform", In Signal Processing and Information Technology (ISSPIT 2008), 8th IEEE International Symposium on. Sarajevo, Bosnia & Herzegovina, December, 2008, pp. 385-389. IEEE.
Abstract: This paper presents a multi-scale edge detection method using an undecimated Haar wavelet transform. The use of the undecimated transform improves the localization of detected edges when compared to the classical, decimated Haar wavelet transform. The presented method tracks edges that exist at several dyadic scales, favoring edges at larger scales. Edge points are obtained by non-maximum suppression in four possible directions, combined with hysteresis thresholding. The experimental results show that this method is competitive with classical edge detection methods. This multi-scale approach brings robustness to noise, while the redundancy from the undecimated transform ensures good edge localization.
BibTeX:
@inproceedings{kitanovski2008multi,
  author = {Kitanovski, Vlado and Taskovski, Dimitar and Panovski, Ljupcho},
  title = {Multi-scale Edge Detection Using Undecimated Wavelet Transform},
  booktitle = {Signal Processing and Information Technology (ISSPIT 2008), 8th IEEE International Symposium on},
  publisher = {IEEE},
  year = {2008},
  pages = {385--389},
  note = {google scholar entry: 8th IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2008). Sarajevo, Bosnia & Herzegovina, 16-19 December 2008.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4775721},
  doi = {10.1109/ISSPIT.2008.4775721}
}
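The undecimated (shift-invariant) Haar analysis used in the edge detection entry above can be sketched in one dimension: detail coefficients at each dyadic scale are differences of adjacent box averages computed at every sample position, with no downsampling. A minimal illustration under those assumptions (the paper itself works on 2-D images and adds non-maximum suppression and hysteresis thresholding, which are omitted here):

```python
def undecimated_haar_detail(signal, scale):
    """Haar detail coefficients at a dyadic scale without downsampling.

    At each position the coefficient is the difference between two
    adjacent box averages of width `scale`; because no samples are
    discarded, responses at all scales stay aligned with the input,
    which is what gives the undecimated transform its good edge
    localization.
    """
    n = len(signal)
    out = []
    for i in range(n - 2 * scale + 1):
        left = sum(signal[i:i + scale]) / scale
        right = sum(signal[i + scale:i + 2 * scale]) / scale
        out.append(right - left)
    return out
```

At a step edge the detail response peaks at the edge position at every scale; it is this persistence across dyadic scales that the multi-scale tracking in the paper exploits, while noise responses decay at larger scales.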
Kliegr T, Chandramouli K, Nemrava J, Svatek V and Izquierdo E (2008), "Combining Image Captions and Visual Analysis for Image Concept Classification", In MDM '08: Proceedings of the 9th International Workshop on Multimedia Data Mining: held in conjunction with the ACM SIGKDD 2008. Las Vegas, Nevada, August, 2008, pp. 8-17. ACM.
Abstract: We present a framework for efficiently exploiting free-text annotations as a complementary resource to image classification. A novel approach called Semantic Concept Mapping (SCM) is used to classify entities occurring in the text to a custom-defined set of concepts. SCM performs unsupervised classification by exploiting the relations between common entities codified in the Wordnet thesaurus. SCM exploits Targeted Hypernym Discovery (THD) to map unknown entities extracted from the text to concepts in Wordnet. We show how the result of SCM/THD can be fused with the outcome of Knowledge Assisted Image Analysis (KAA), a classification algorithm that extracts and labels multiple segments from an image. In the experimental evaluation, THD achieved an accuracy of 75% and SCM an accuracy of 52%. In one of the first experiments with fusing the results of a free-text and image-content classifier, SCM/THD + KAA achieved a relative improvement of 49% and 31% over the text-only and image-content-only baselines.
BibTeX:
@inproceedings{kliegr2008combining,
  author = {Kliegr, Tomas and Chandramouli, Krishna and Nemrava, Jan and Svatek, Vojtech and Izquierdo, Ebroul},
  title = {Combining Image Captions and Visual Analysis for Image Concept Classification},
  booktitle = {MDM '08: Proceedings of the 9th International Workshop on Multimedia Data Mining: held in conjunction with the ACM SIGKDD 2008},
  publisher = {ACM},
  year = {2008},
  pages = {8--17},
  note = {google scholar entry: 9th International Workshop on Multimedia Data Mining (MDM '08)[held in conjunction with the 14th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2008)]. Las Vegas, NV, 24-27 August 2008.},
  url = {http://nb.vse.cz/~svatek/mdm08.pdf},
  doi = {10.1145/1509212.1509214}
}
Kliegr T, Chandramouli K, Nemrava J, Svatek V and Izquierdo E (2008), "Wikipedia as the Premiere Source for Targeted Hypernym Discovery", In Proceedings of the Wikis, Blogs, Bookmarking Tools - Mining the Web 2.0 Workshop (WBBTMine 08). Antwerp, Belgium, September, 2008, pp. 1-8. Universität Kassel.
Abstract: Targeted Hypernym Discovery (THD) applies lexico-syntactic (Hearst) patterns on a suitable corpus with the intent to extract one hypernym at a time. Using Wikipedia as the corpus in THD has recently yielded promising results in a number of tasks. We investigate the reasons that make Wikipedia articles such an easy target for lexico-syntactic patterns, and suggest that it is primarily the adherence of its contributors to Wikipedia's Manual of Style. We propose the hypothesis that extractable patterns are more likely to appear in articles covering popular topics, since these receive more attention including the adherence to the rules from the manual. However, two preliminary experiments carried out with 131 and 100 Wikipedia articles do not support this hypothesis.
BibTeX:
@inproceedings{kliegr2008wikipedia,
  author = {Kliegr, Tomas and Chandramouli, Krishna and Nemrava, Jan and Svatek, Vojtech and Izquierdo, Ebroul},
  title = {Wikipedia as the Premiere Source for Targeted Hypernym Discovery},
  booktitle = {Proceedings of the Wikis, Blogs, Bookmarking Tools - Mining the Web 2.0 Workshop (WBBTMine 08)},
  publisher = {Universität Kassel},
  year = {2008},
  pages = {1--8},
  note = {google scholar entry: Wikis, Blogs, Bookmarking Tools - Mining the Web 2.0 Workshop (WBBTMine 08). Antwerp, Belgium, 15 September 2008.},
  url = {http://nb.vse.cz/~svatek/wbbt08.pdf}
}
Kseneman M, Gleich D, Planinsic P, Kitanovski V and Taskovski D (2008), "Comparison between different lifting scheme algorithms", In Systems, Signals and Image Processing (IWSSIP 2008), Proceedings of the 15th International Conference on. Bratislava, Slovak Republic, June, 2008, pp. 327-330. IEEE.
Abstract: This paper presents three types of lifting schemes: the normal lifting scheme, the integer lifting scheme and the optimization of integer wavelet transforms based on the lifting scheme. The paper also gives a comparison between these types of lifting schemes and their algorithms. To determine the quality of the reconstructed images, the PSNR is used.
BibTeX:
@inproceedings{kseneman2008comparison,
  author = {Kseneman, M. and Gleich, D. and Planinsic, P. and Kitanovski, V. and Taskovski, D.},
  editor = {Rozinaj, Gregor and Čepko, Jozef and Trúchly, Peter and Vrabec, Ján and Vojtko, Juraj},
  title = {Comparison between different lifting scheme algorithms},
  booktitle = {Systems, Signals and Image Processing (IWSSIP 2008), Proceedings of the 15th International Conference on},
  publisher = {IEEE},
  year = {2008},
  pages = {327--330},
  note = {google scholar entry: 15th International Conference on Systems, Signals and Image Processing (IWSSIP 2008). Bratislava, Slovak Republic, 25-28 June 2008.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4604433},
  doi = {10.1109/IWSSIP.2008.4604433}
}
Kumar BGV and Aravind R (2008), "A 2D model for face superresolution", In Pattern Recognition, 2008. ICPR 2008. 19th International Conference on. Tampa, Florida, December, 2008, pp. 535-538. IEEE.
Abstract: Traditional face superresolution methods treat face images as 1D vectors and apply PCA on the set of these 1D vectors to learn the face subspace. Zhang et al. [7] proposed Two-directional two-dimensional PCA ((2D)$^2$-PCA) for efficient face representation and recognition where images are treated as matrices instead of vectors. In this paper, we present a two-step algorithm for face superresolution. In the first step, we propose a 2D framework for face superresolution where the face image is treated as a matrix. (2D)$^2$-PCA is used for learning the face subspace and a MAP estimator is used to obtain the global high resolution image from the given low resolution image. To enhance the quality of the image further, we propose a method which uses Kernel Ridge Regression to learn the high frequency component relation between low and high resolution patches of the image. Experimental results show that our approach can reconstruct high quality face images.
BibTeX:
@inproceedings{kumar20082d,
  author = {Kumar, B. G. Vijay and Aravind, Rangarajan},
  title = {A 2D model for face superresolution},
  booktitle = {Pattern Recognition, 2008. ICPR 2008. 19th International Conference on},
  publisher = {IEEE},
  year = {2008},
  pages = {535--538},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4761072},
  doi = {10.1109/ICPR.2008.4761072}
}
Kumar BGV and Aravind R (2008), "Face hallucination using OLPP and Kernel ridge regression", In Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on. San Diego, California, October, 2008, pp. 353-356. IEEE.
Abstract: Generally face images may be visualized as points drawn on a low-dimensional manifold embedded in high-dimensional ambient space. Many dimensionality reduction techniques have been used to learn this manifold. Orthogonal locality preserving projection (OLPP) is one among them which aims to discover the local structure of the manifold and produces orthogonal basis functions. In this paper, we present a two-step patch based algorithm for face superresolution. In the first step, a MAP-based framework is used to obtain a high resolution patch from its low resolution counterpart, where the face subspace is learnt using OLPP. To enhance the quality of the image further, we propose a method which uses kernel ridge regression to learn the relation between low and high resolution residual patches. Experimental results show that our approach can reconstruct high quality face images.
BibTeX:
@inproceedings{kumar2008face,
  author = {Kumar, B. G. Vijay and Aravind, Rangarajan},
  title = {Face hallucination using OLPP and Kernel ridge regression},
  booktitle = {Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on},
  publisher = {IEEE},
  year = {2008},
  pages = {353--356},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4711764},
  doi = {10.1109/ICIP.2008.4711764}
}
Papadopoulos GT, Chandramouli K, Mezaris V, Kompatsiaris I, Izquierdo E and Strintzis MG (2008), "A Comparative Study of Classification Techniques for Knowledge-Assisted Image Analysis", In Image Analysis for Multimedia Interactive Services (WIAMIS 2008), Proceedings of the 9th International Workshop on. Klagenfurt, Austria, May, 2008, pp. 4-7. IEEE.
Abstract: In this paper, four individual approaches to region classification for knowledge-assisted semantic image analysis are presented and comparatively evaluated. All of the examined approaches realize knowledge-assisted analysis via implicit knowledge acquisition, i.e. are based on machine learning techniques such as support vector machines (SVMs), self-organizing maps (SOMs), genetic algorithms (GAs) and particle swarm optimization (PSO). Under all examined approaches, each image is initially segmented and suitable low-level descriptors are extracted for every resulting segment. Then, each of the aforementioned classifiers is applied to associate every region with a predefined high-level semantic concept. An appropriate evaluation framework has been employed for the comparative evaluation of the above algorithms under varying experimental conditions.
BibTeX:
@inproceedings{papadopoulos2008comparative,
  author = {Papadopoulos, Georgios Th. and Chandramouli, Krishna and Mezaris, Vasileios and Kompatsiaris, Ioannis and Izquierdo, Ebroul and Strintzis, Michael G.},
  title = {A Comparative Study of Classification Techniques for Knowledge-Assisted Image Analysis},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2008), Proceedings of the 9th International Workshop on},
  publisher = {IEEE},
  year = {2008},
  pages = {4--7},
  note = {google scholar entry: 9th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2008). Klagenfurt, Austria, 7-9 May 2008.},
  url = {http://mklab.iti.gr/mklab_people/~papad/documents/WIAMIS08.pdf},
  doi = {10.1109/WIAMIS.2008.36}
}
Passino G, Patras I and Izquierdo E (2008), "Aspect Coherence for Graph-Based Image Labelling", In Visual Information Engineering (VIE 2008), 5th International Conference on. Xi'an, China, July, 2008, pp. 94-99. IET.
Abstract: Semantic image labelling is the task of assigning each pixel of an image to a semantic category. To this end, in low-level image labelling, a labelled training set is available. In such a situation, structural information about the correlation between different image parts is particularly important. When a part-based inference algorithm is used to perform the association of semantic classes to pixels, however, a good choice on how to use structural information is crucial for learning an efficient and generalisable probabilistic model for the labelling task. In this paper we introduce an efficient way to take into account correlation between different image parts, embedding the parts relationships in a graph built according to aspect coherence of neighbouring image patches.
BibTeX:
@inproceedings{passino2008aspect,
  author = {Passino, Giuseppe and Patras, Ioannis and Izquierdo, Ebroul},
  title = {Aspect Coherence for Graph-Based Image Labelling},
  booktitle = {Visual Information Engineering (VIE 2008), 5th International Conference on},
  publisher = {IET},
  year = {2008},
  pages = {94--99},
  note = {google scholar entry: 5th International Conference on Visual Information Engineering (VIE 2008). Xi'an, China, 29 July - 1 August 2008.},
  url = {http://sites.google.com/site/zeppethefake/publications/vie08.pdf},
  doi = {10.1049/cp:20080290}
}
Passino G, Patras I and Izquierdo E (2008), "On the Role of Structure in Part-based Object Detection", In Image Processing (ICIP 2008), Proceedings of the 15th IEEE International Conference on. San Diego, California, October, 2008, pp. 65-68. IEEE.
Abstract: Part-based approaches in image analysis aim at exploiting the considerable discriminative power embedded in relations among image parts. Nonetheless, learning structural information is not always possible without the availability of a training set of classified parts, and taking into account this additional information can even degrade the performance of the system. In this paper, a discriminative graphical model for object detection is introduced and used in order to analyse and report results on the role of structural information in image classification tasks.
BibTeX:
@inproceedings{passino2008role,
  author = {Passino, Giuseppe and Patras, Ioannis and Izquierdo, Ebroul},
  title = {On the Role of Structure in Part-based Object Detection},
  booktitle = {Image Processing (ICIP 2008), Proceedings of the 15th IEEE International Conference on},
  publisher = {IEEE},
  year = {2008},
  pages = {65--68},
  note = {google scholar entry: 15th International Conference on Image Processing (ICIP 2008). San Diego, California, 12-15 October 2008.},
  url = {http://sites.google.com/site/zeppethefake/publications/icip08.pdf},
  doi = {10.1109/ICIP.2008.4711692}
}
Piatrik T and Izquierdo E (2008), "An Application of Ant Colony Optimisation to Image Clustering", In PhDJamboree08: Proceedings of the 2nd K-Space PhD Jamboree Workshop. Paris, France, July, 2008, pp. 1-2. CEUR-WS.
Abstract: Content-based image retrieval can be dramatically improved by providing a good initial clustering of visual data. The problem of image clustering is that most current algorithms are not able to identify individual clusters that exist in different feature subspaces. In this paper, we propose a novel approach for subspace clustering based on Ant Colony Optimization and its learning mechanism. The proposed algorithm breaks the assumption that all of the clusters in a dataset are found in the same set of dimensions by assigning weights to features according to the local correlations of data along each dimension. Experimental results on real image datasets show the need for feature selection in clustering and the benefits of selecting features locally.
BibTeX:
@inproceedings{piatrik2008application,
  author = {Piatrik, Tomas and Izquierdo, Ebroul},
  editor = {De Simone, Francesca and Nemrava, Jan and Bailer, Werner},
  title = {An Application of Ant Colony Optimisation to Image Clustering},
  booktitle = {PhDJamboree08: Proceedings of the 2nd K-Space PhD Jamboree Workshop},
  publisher = {CEUR-WS},
  year = {2008},
  pages = {1--2},
  note = {google scholar entry: 2nd K-Space PhD Jamboree Workshop (PhDJamboree 2008). Paris, France, 25 July 2008.},
  url = {http://ceur-ws.org/Vol-379/}
}
Praks P, Grzegorzek M, Moravec R, Válek L and Izquierdo E (2008), "Wavelet and Eigen-Space Feature Extraction for Classification of Metallography Images", In Information Modelling and Knowledge Bases XIX (EJC 2007), 17th European-Japanese Conference on. Pori, Finland, June, 2008. Vol. 166, pp. 190-199. IOS Press.
Abstract: In this contribution a comparison of two approaches for classification of metallography images from the steel plant of Mittal Steel Ostrava plc (Ostrava, Czech Republic) is presented. The aim of the classification is to monitor the process quality in the steel plant. The first classifier represents images by feature vectors extracted using the wavelet transformation, while the feature computation in the second approach is based on the eigen-space analysis. Experiments made for real metallography data indicate feasibility of both methods for automatic image classification in hard industry environment.
BibTeX:
@inproceedings{praks2007wavelet,
  author = {Praks, Pavel and Grzegorzek, Marcin and Moravec, Rudolf and Válek, Ladislav and Izquierdo, Ebroul},
  editor = {Jaakkola, Hannu and Kiyoki, Yasushi and Tokuda, Takahiro},
  title = {Wavelet and Eigen-Space Feature Extraction for Classification of Metallography Images},
  booktitle = {Information Modelling and Knowledge Bases XIX (EJC 2007), 17th European-Japanese Conference on},
  publisher = {IOS Press},
  year = {2008},
  volume = {166},
  pages = {190--199},
  note = {google scholar entry: 17th European-Japanese Conference on Information Modelling and Knowledge Bases XIX (EJC 2007). Yyteri, Pori, Finland, 4-7 June 2007.},
  url = {http://sites.google.com/site/pavelpraks/PraksIOSPress08.pdf}
}
Praks P, Kučera R and Izquierdo E (2008), "The sparse image representation for automated image retrieval", In Image Processing (ICIP 2008), Proceedings of the 15th International Conference on. San Diego, California, October, 2008, pp. 25-28. IEEE.
Abstract: We describe a novel sparse image representation for fully automated content-based image retrieval using the latent semantic indexing (LSI) approach and also a novel statistical-based model for the efficient dimensional reduction of sparse data. Although images can be represented sparsely, for instance by the discrete cosine transform (DCT) coefficients, this sparsity character is destroyed during the LSI-based dimension reduction process. In our approach, we keep the memory limit of the decomposed data by a statistical model of the sparse data. The aim is to find a small but "important" sub-set of coefficients, which represent the semantics of images efficiently. The effectiveness of our novel approach is demonstrated by the large scale image similarity task of the NIST TrecVid 2007 benchmark.
BibTeX:
@inproceedings{praks2008sparse,
  author = {Praks, Pavel and Kučera, Radek and Izquierdo, Ebroul},
  title = {The sparse image representation for automated image retrieval},
  booktitle = {Image Processing (ICIP 2008), Proceedings of the 15th International Conference on},
  publisher = {IEEE},
  year = {2008},
  pages = {25--28},
  note = {google scholar entry: International Conference on Image Processing (ICIP 2008). San Diego, California, 12-15 October 2008},
  doi = {10.1109/ICIP.2008.4711682}
}
Ramzan N and Izquierdo E (2008), "Efficient Scalable Video Transmission Based on Two-dimensional Error Protection Scheme", In Image Processing (ICIP 2008), Proceedings of the 15th IEEE International Conference on. San Diego, California, October, 2008, pp. 3084-3087. IEEE.
Abstract: An efficient approach for transmission of scalable video over an erroneous channel is proposed. The proposed approach jointly optimises the bit allocation between a wavelet-based scalable video coding framework and forward error correction codes. The forward error correction is based on a two-dimensional error protection scheme which applies unequal error protection in temporal-spatial layers as well as in quality layers. The scheme minimizes the reconstructed video distortion at the decoder subject to a constraint on the overall transmission bit-rate budget with limited complexity. This minimization is achieved by exploiting the combined scalability in the temporal-spatial and quality domains. Experimental results clearly demonstrate the superiority of the proposed approach over conventional forward error correction techniques. It also significantly improves the performance of end-to-end scalable video transmission at all channel bit rates.
BibTeX:
@inproceedings{ramzan2008efficient,
  author = {Ramzan, Naeem and Izquierdo, Ebroul},
  title = {Efficient Scalable Video Transmission Based on Two-dimensional Error Protection Scheme},
  booktitle = {Image Processing (ICIP 2008), Proceedings of the 15th IEEE International Conference on},
  publisher = {IEEE},
  year = {2008},
  pages = {3084--3087},
  note = {google scholar entry: 15th International Conference on Image Processing (ICIP 2008). San Diego, California, 12-15 October 2008.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4712447},
  doi = {10.1109/ICIP.2008.4712447}
}
Schreer O, Fuentes Ardeo L, Sotiriou D, Sadka AH and Izquierdo E (2008), "User Requirements for Multimedia Indexing and Retrieval of Unedited Audio-Visual Footage - RUSHES", In Image Analysis for Multimedia Interactive Services (WIAMIS 2008), Proceedings of the 9th International Workshop on. Klagenfurt, Austria, May, 2008, pp. 76-79. IEEE.
Abstract: Multimedia analysis and reuse of raw un-edited audio visual content known as rushes is gaining acceptance by a large number of research labs and companies. A set of research projects are considering multimedia indexing, annotation, search and retrieval in the context of European funded research, but only the FP6 project RUSHES is focusing on automatic semantic annotation, indexing and retrieval of raw and un-edited audio-visual content. Even professional content creators and providers as well as home-users are dealing with this type of content and therefore novel technologies for semantic search and retrieval are required. As a first result of this project, the user requirements and possible user-scenarios are presented in this paper. These results lay down the foundation for the research and development of a multimedia search engine particularly dedicated to the specific needs of the users and the content.
BibTeX:
@inproceedings{schreer2008user,
  author = {Schreer, Oliver and Fuentes Ardeo, Leticia and Sotiriou, Dimitrios and Sadka, Abdul H. and Izquierdo, Ebroul},
  title = {User Requirements for Multimedia Indexing and Retrieval of Unedited Audio-Visual Footage - RUSHES},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2008), Proceedings of the 9th International Workshop on},
  publisher = {IEEE},
  year = {2008},
  pages = {76--79},
  note = {google scholar entry: 9th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2008). Klagenfurt, Austria, 7-9 May 2008.},
  url = {http://v-scheiner.brunel.ac.uk/bitstream/2438/2408/1/User%20Requirements%20for%20Multimedia%20Indexing%20and%20Retrieval%20of.pdf},
  doi = {10.1109/WIAMIS.2008.14}
}
Seneviratne L and Izquierdo E (2008), "Image Annotation Through Gaming", In PhDJamboree08: Proceedings of the 2nd K-Space PhD Jamboree Workshop. Paris, France, July, 2008, pp. 1-2. CEUR-WS.
Abstract: We introduce an interactive framework for image understanding, a game that is enjoyable and provides valuable image annotations. When people play the game, they provide useful information about the contents of an image. In reality, the most accurate method to describe the content of an image is manual labelling. Our approach is to motivate people to label images while entertaining themselves. Therefore, if this game becomes popular it will be able to annotate most images on the web within a couple of months. When considering accuracy, we use a combination of computer vision techniques to secure the accuracy of image labelling. By doing this, we believe our system will make a significant contribution to addressing the semantic gap in the computer vision sector.
BibTeX:
@inproceedings{seneviratne2008image,
  author = {Seneviratne, Lasantha and Izquierdo, Ebroul},
  editor = {De Simone, Francesca and Nemrava, Jan and Bailer, Werner},
  title = {Image Annotation Through Gaming},
  booktitle = {PhDJamboree08: Proceedings of the 2nd K-Space PhD Jamboree Workshop},
  publisher = {CEUR-WS},
  year = {2008},
  pages = {1--2},
  note = {google scholar entry: 2nd K-Space PhD Jamboree Workshop (PhDJamboree 2008). Paris, France, 25 July 2008.},
  url = {http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-379/paper3.pdf}
}
Stewart C, Chandramouli K, Cristea A, Brailsford T and Izquierdo E (2008), "Cultural Artefacts in Education: Analysis, Ontologies and Implementation", In Computer Science and Software Engineering (CSSE 2008), Proceedings of the International Conference on. Wuhan, China, December, 2008. Vol. 5, pp. 706-709.
Abstract: Adaptive Web technologies are often used in distance learning scenarios with little regard for learners' cultural background. The CAE questionnaire determines the cultural artefacts that influence a learner's behaviour within an educational environment. This paper presents the CAE-L cultural ontology along with an analysis of three countries (China, Ireland and the UK). This ontology is instantiated to determine their 'cultural stereotype', which can then be used to define a layer of adaptation that traditional systems ignore.
BibTeX:
@inproceedings{stewart2008cultural,
  author = {Stewart, Craig and Chandramouli, Krishna and Cristea, Alexandra and Brailsford, Tim and Izquierdo, Ebroul},
  title = {Cultural Artefacts in Education: Analysis, Ontologies and Implementation},
  booktitle = {Computer Science and Software Engineering (CSSE 2008), Proceedings of the International Conference on},
  year = {2008},
  volume = {5},
  pages = {706--709},
  note = {google scholar entry: International Conference on Computer Science and Software Engineering (CSSE 2008). Wuhan, China. 12-14 December 2008.},
  url = {http://eprints.dcs.warwick.ac.uk/150/1/04723000.pdf},
  doi = {10.1109/CSSE.2008.393}
}
Svatek V, Berka P, Nemrava J, Petrák J, Praks P, Vacura M, Izquierdo E and Stewart C (2008), "The K-Space Network of Excellence: on the Way to `Semantic' Multimedia", In 7th annual Czecho-Slovak Knowledge Technology Conference, Proceedings of the. February, 2008, pp. 421-425.
Abstract: The K-Space project is an EU FP6 IST Network of Excellence aiming at bringing together research teams from the multimedia and semantics domains. There are fourteen partners, one of them being University of Economics, Prague. Project activities involved shared research as well as dissemination activities.
BibTeX:
@inproceedings{svatek2008k,
  author = {Svatek, Vojtech and Berka, Petr and Nemrava, Jan and Petrák, Josef and Praks, Pavel and Vacura, Miroslav and Izquierdo, Ebroul and Stewart, Craig},
  title = {The K-Space Network of Excellence: on the Way to `Semantic' Multimedia},
  booktitle = {7th annual Czecho-Slovak Knowledge Technology Conference, Proceedings of the},
  year = {2008},
  pages = {421--425},
  note = {google scholar entry: 7th annual Czecho-Slovak Knowledge Technology Conference. Bratislava, Slovakia, 13-15 February 2008.},
  url = {http://znalosti2008.fiit.stuba.sk/download/articles/znalosti2008-Berka.pdf}
}
Vincelette R, Tameze C, Zeljkovic V and Izquierdo E (2008), "Noise Removal from Polygonal Shapes Using Combined Inverse Diffusion Filter and Triangle Method", In Proceedings of the 6th International Workshop on Content-based Multimedia Indexing (CBMI 2008). London, England, June, 2008, pp. 551-555. IEEE.
Abstract: We introduce a method for de-noising, segmenting and measuring similarity of shapes. The proposed method consists of two techniques. First we use an inverse diffusion filter for enhancement of relevant polygonal vertices and removal of noise and irrelevant edges. Second we apply a technique that removes the vertices that form the triangles with the smallest areas. In a thorough experimental evaluation, the combined method shows successful noise suppression while preserving dominant vertices present in the input shape.
BibTeX:
@inproceedings{vincelette2008noise,
  author = {Vincelette, Robert and Tameze, Claude and Zeljkovic, Vesna and Izquierdo, Ebroul},
  title = {Noise Removal from Polygonal Shapes Using Combined Inverse Diffusion Filter and Triangle Method},
  booktitle = {Proceedings of the 6th International Workshop on Content-based Multimedia Indexing (CBMI 2008)},
  publisher = {IEEE},
  year = {2008},
  pages = {551--555},
  note = {google scholar entry: 6th International Workshop on Content-Based Multimedia Indexing (CBMI 2008). London, England, 18-20 June 2008.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=4564995},
  doi = {10.1109/CBMI.2008.4564995}
}
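The triangle stage of Vincelette et al.'s method, deleting the vertices that form the smallest-area triangles with their neighbours, closely resembles Visvalingam-Whyatt line simplification. A naive O(n²) sketch of that idea for a closed polygon (the inverse diffusion filtering stage of the paper is not shown):

```python
def triangle_area(p, q, r):
    """Area of the triangle formed by three 2D points (shoelace formula)."""
    return abs((q[0] - p[0]) * (r[1] - p[1])
               - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def simplify_polygon(points, keep):
    """Repeatedly delete the vertex whose triangle with its two circular
    neighbours has the smallest area, until only `keep` vertices remain.

    Small-area triangles correspond to noise or near-collinear points,
    so dominant vertices of the shape survive the longest.
    """
    pts = list(points)
    while len(pts) > keep:
        n = len(pts)
        areas = [triangle_area(pts[i - 1], pts[i], pts[(i + 1) % n])
                 for i in range(n)]
        pts.pop(areas.index(min(areas)))
    return pts
```

For example, a square with one slightly perturbed midpoint on an edge loses that midpoint first, since its triangle area is far smaller than those at the corners.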
Wall J, McDaid LJ, Maguire LP and McGinnity TM (2008), "Spiking Neuron Models of the Medial and Lateral Superior Olive for Sound Localisation", In International Joint Conference on Neural Networks (IJCNN 2008) [IEEE World Congress on Computational Intelligence (WCCI 2008)], Proceedings of the 2008 IEEE. Hong Kong, China, June, 2008, pp. 2641-2647. IEEE.
Abstract: Sound localisation is defined as the ability to identify the position of a sound source. The brain employs two cues to achieve this functionality for the horizontal plane, interaural time difference (ITD) by means of neurons in the medial superior olive (MSO) and interaural intensity difference (IID) by neurons of the lateral superior olive (LSO), both located in the superior olivary complex of the auditory pathway. This paper presents spiking neuron architectures of the MSO and LSO. An implementation of the Jeffress model using spiking neurons is presented as a representation of the MSO, while a spiking neuron architecture showing how neurons of the medial nucleus of the trapezoid body interact with LSO neurons to determine the azimuthal angle is discussed. Experimental results to support this work are presented.
BibTeX:
@inproceedings{wall2008spiking,
  author = {Wall, Julie and McDaid, Liam J. and Maguire, Liam P. and McGinnity, Thomas M.},
  title = {Spiking Neuron Models of the Medial and Lateral Superior Olive for Sound Localisation},
  booktitle = {International Joint Conference on Neural Networks (IJCNN 2008) [IEEE World Congress on Computational Intelligence (WCCI 2008)], Proceedings of the 2008 IEEE},
  publisher = {IEEE},
  year = {2008},
  pages = {2641--2647},
  note = {google scholar entry: 2008 IEEE International Joint Conference on Neural Networks (IJCNN 2008)[IEEE World Congress on Computational Intelligence (WCCI 2008)]. Hong Kong, China, 1-6 June 2008.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4634168},
  doi = {10.1109/IJCNN.2008.4634168}
}
Wan S, Yang F, He M and Izquierdo E (2008), "Rate Distortion Optimised Motion Estimation based on a General Framework", In Visual Information Engineering (VIE 2008), 5th International Conference on. Xi'an, China, July, 2008, pp. 83-87. IET.
Abstract: In this paper, the relationship between the traditional motion estimation method and the general rate distortion framework is analysed. Based on the theoretical analysis, a new rate distortion optimised motion estimation method is proposed under a general rate distortion framework. The proposed method improves the rate distortion performance of video codecs, while the increase in computational complexity remains negligible. Extensive experimental results confirm the effectiveness of the proposed motion estimation method.
BibTeX:
@inproceedings{wan2008rate,
  author = {Wan, Shuai and Yang, Fuzheng and He, Mingyi and Izquierdo, Ebroul},
  title = {Rate Distortion Optimised Motion Estimation based on a General Framework},
  booktitle = {Visual Information Engineering (VIE 2008), 5th International Conference on},
  publisher = {IET},
  year = {2008},
  pages = {83--87},
  note = {google scholar entry: 5th International Conference on Visual Information Engineering (VIE 2008). Xi'an, China, 29 July - 1 August 2008.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4743396},
  doi = {10.1049/cp:20080288}
}
Wilkins P, Byrne D, Jones GJF, Lee H, Keenan G, McGuinness K, O'Connor NE, O'Hare N, Smeaton AF, Adamek T, Troncy R, Amin A, Benmokhtar R, Dumont E, Huet B, Mérialdo B, Tolias G, Spyrou E, Avrithis YS, Papadopoulos G, Mezaris V, Kompatsiaris I, Mörzinger R, Schallauer P, Bailer W, Chandramouli K, Izquierdo E, Goldmann L, Haller M, Samour A, Cobet A, Sikora T, Praks P, Hannah D, Halvey M, Hopfgartner F, Villa R, Punitha P, Goyal A and Jose JM (2008), "K-Space at TRECVid 2008", In TRECVID 2008 workshop participants notebook papers. Gaithersburg, Maryland, November, 2008, pp. 1-10. National Institute of Standards and Technology (NIST).
Abstract: In this paper we describe K-Space's participation in TRECVid 2008 in the interactive search task. For 2008 the K-Space group performed one of the largest interactive video information retrieval experiments conducted in a laboratory setting. We had three institutions participating in a multi-site multi-system experiment. In total 36 users participated, 12 each from Dublin City University (DCU, Ireland), University of Glasgow (GU, Scotland) and Centrum Wiskunde & Informatica (CWI, the Netherlands). Three user interfaces were developed: two from DCU, which were also used in 2007, as well as an interface from GU. All interfaces leveraged the same search service. Using a Latin squares arrangement, each user conducted 12 topics, leading to 6 runs per site, 18 in total. We officially submitted 3 of these runs to NIST for evaluation, with an additional expert run using a 4th system. Our submitted runs performed around the median. In this paper we will present an overview of the search system utilized, the experimental setup and a preliminary analysis of our results.
BibTeX:
@inproceedings{wilkins2008k,
  author = {Wilkins, Peter and Byrne, Daragh and Jones, Gareth J. F. and Lee, Hyowon and Keenan, Gordon and McGuinness, Kevin and O'Connor, Noel E. and O'Hare, Neil and Smeaton, Alan F. and Adamek, Tomasz and Troncy, Raphaël and Amin, Alia and Benmokhtar, Rachid and Dumont, Emilie and Huet, Benoit and Mérialdo, Bernard and Tolias, Giorgos and Spyrou, Evaggelos and Avrithis, Yannis S. and Papadopoulos, Georgios and Mezaris, Vasileios and Kompatsiaris, Ioannis and Mörzinger, Roland and Schallauer, Peter and Bailer, Werner and Chandramouli, Krishna and Izquierdo, Ebroul and Goldmann, Lutz and Haller, Martin and Samour, Amjad and Cobet, Andreas and Sikora, Thomas and Praks, Pavel and Hannah, David and Halvey, Martin and Hopfgartner, Frank and Villa, Robert and Punitha, P. and Goyal, Anuj and Jose, Joemon M.},
  title = {K-Space at TRECVid 2008},
  booktitle = {TRECVID 2008 workshop participants notebook papers},
  publisher = {National Institute of Standards and Technology (NIST)},
  year = {2008},
  pages = {1--10},
  note = {google scholar entry: 6th TRECVID Workshop (TRECVID 2008). Gaithersburg, Maryland, November 2008.},
  url = {http://www-nlpir.nist.gov/projects/tvpubs/tv8.papers/kspace.pdf}
}
Zeljkovic V, Tameze C, Vincelette R and Izquierdo E (2008), "Nonlinear diffusion filter and triangle method used for noise removal from polygonal shapes", In Visual Information Engineering (VIE 2008), 5th International Conference on. Xi'an, China, July, 2008, pp. 188-191. IET.
Abstract: We propose a two step process for removing noise from polygonal shapes. In the first step we apply a nonlinear diffusion filter to enhance relevant polygonal vertices and to remove those vertices that are identified as noise. In the second step we remove the vertices that form the smallest area triangles. In experimental tests of this procedure we demonstrate successful removal of noise and excellent preservation of shape thanks to appropriate emphasis of dominant vertices.
BibTeX:
@inproceedings{zeljkovic2008nonlinear,
  author = {Zeljkovic, Vesna and Tameze, Claude and Vincelette, Robert and Izquierdo, Ebroul},
  title = {Nonlinear diffusion filter and triangle method used for noise removal from polygonal shapes},
  booktitle = {Visual Information Engineering (VIE 2008), 5th International Conference on},
  publisher = {IET},
  year = {2008},
  pages = {188--191},
  note = {google scholar entry: 5th International Conference on Visual Information Engineering (VIE 2008). Xi'an, China, 29 July - 1 August 2008.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4743414},
  doi = {10.1049/cp:20080306}
}
Zeljkovic V, Tameze C and Vincelette R (2008), "Combined Nonlinear Inverse Diffusion Filter and Triangle Method used for Noise Removal from Polygonal Shapes", In Image Processing (ICIP 2008), Proceedings of the 15th IEEE International Conference on. San Diego, California, October, 2008, pp. 21-24. IEEE.
Abstract: A two step procedure for removing noise from polygonal shapes is presented here. The first step is the removal of vertices by linear and different nonlinear inverse diffusion filters. In the second step we apply the triangle method, which identifies the least dominant vertices of the polygon obtained from the first stage. Those vertices whose adjacent sides form triangles of the smallest area are defined as least dominant and most likely to be noise. Thorough testing of this method on shapes typical of video images demonstrates that it successfully removes noise vertices while preserving dominant ones.
BibTeX:
@inproceedings{zeljkovic2008combined,
  author = {Zeljkovic, Vesna and Tameze, Claude and Vincelette, Robert},
  title = {Combined Nonlinear Inverse Diffusion Filter and Triangle Method used for Noise Removal from Polygonal Shapes},
  booktitle = {Image Processing (ICIP 2008), Proceedings of the 15th IEEE International Conference on},
  publisher = {IEEE},
  year = {2008},
  pages = {21--24},
  note = {google scholar entry: 15th International Conference on Image Processing (ICIP 2008). San Diego, California, 12-15 October 2008.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4711681},
  doi = {10.1109/ICIP.2008.4711681}
}
Zgaljic T, Ramzan N, Akram M, Izquierdo E, Caballero R, Finn A, Wang H and Xiong Z (2008), "Surveillance Centric Coding", In Visual Information Engineering (VIE 2008), 5th International Conference on. Xi'an, China, July, 2008, pp. 835-839. IET.
Abstract: In this paper we introduce the paradigm of Surveillance Centric Coding (SCC), in which coding aims to achieve bit-rate optimisation and adaptation of surveillance videos for storing and transmission purposes. In the proposed approach the SCC encoder communicates with a Video Content Analysis (VCA) module that detects events of interest in video captured by CCTV. Bit-rate optimisation and adaptation are achieved by exploiting the scalability properties of the employed codec. Time segments containing events relevant to the surveillance application are encoded using high spatio-temporal resolution and quality, while the portions irrelevant from the surveillance standpoint are encoded at low spatio-temporal resolution and/or quality. Thanks to the scalability of the produced compressed bit-stream, additional bit-rate adaptation is possible, for instance for transmission purposes. Experimental evaluation shows that a significant reduction in bit-rate can be achieved by the proposed approach without loss of information relevant to surveillance applications.
BibTeX:
@inproceedings{zgaljic2008surveillance,
  author = {Zgaljic, Toni and Ramzan, Naeem and Akram, Muhammad and Izquierdo, Ebroul and Caballero, Rodrigo and Finn, Alan and Wang, Hongcheng and Xiong, Ziyou},
  title = {Surveillance Centric Coding},
  booktitle = {Visual Information Engineering (VIE 2008), 5th International Conference on},
  publisher = {IET},
  year = {2008},
  pages = {835--839},
  note = {google scholar entry: 5th International Conference on Visual Information Engineering (VIE 2008). Xi'an, China, 29 July - 1 August 2008.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4743534},
  doi = {10.1049/cp:20080426}
}
Zhang Q and Izquierdo E (2008), "Bayesian Learning and Reasoning for Context Exploitation in Visual Information Retrieval", In Visual Information Engineering (VIE 2008), 5th International Conference on. Xi'an, China, July, 2008, pp. 170-175. IET.
Abstract: This paper presents a semantic context inference approach on the basis of a multi-feature based visual information retrieval framework. This approach aims at assisting effective retrieval of visual content by exploiting the context information in the digital database. Bayesian networks are used as an inference tool, which can be automatically constructed by learning from the multi-feature similarities and a small amount of training data. The idea is to model potential semantic descriptions of basic semantic concepts in the visual content, the dependencies between them, and the conditional probabilities involved in those dependencies. This information is then used to calculate the probabilities of the effects that those concepts have on each other in order to obtain more precise and meaningful semantic labels for the visual content. However, the proposed method is not restricted to the specific multi-feature based visual information retrieval framework used in this paper. Selected experimental results are presented to show how the proposed context inference approach could improve the retrieval performance.
BibTeX:
@inproceedings{zhang2008bayesian,
  author = {Zhang, Qianni and Izquierdo, Ebroul},
  title = {Bayesian Learning and Reasoning for Context Exploitation in Visual Information Retrieval},
  booktitle = {Visual Information Engineering (VIE 2008), 5th International Conference on},
  publisher = {IET},
  year = {2008},
  pages = {170--175},
  note = {Conference proceedings in 2 volumes. Google scholar entry: 5th International Conference on Visual Information Engineering (VIE 2008). Xi'an, China, 29 July - 1 August 2008.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4743411},
  doi = {10.1049/cp:20080303}
}
Zhang Q and Izquierdo E (2008), "Describing Objects with Multiple Features for Visual Information Retrieval and Annotation", In Image Analysis for Multimedia Interactive Services, 2008. WIAMIS '08. Ninth International Workshop on. Klagenfurt, Austria, May, 2008, pp. 80-83.
Abstract: This paper describes how a multi-feature merging approach can be applied in semantic-based visual information retrieval and annotation. The goal is to identify the key visual patterns of specific objects from either static images or video frames. It is shown how the performance of such visual-to-semantic matching schemes can be improved by describing these key visual patterns using particular combinations of multiple visual features. A multi-objective learning mechanism is designed to derive a suitable merging metric for different features. The core of this mechanism is a widely used optimisation method - the multi-objective optimisation strategies. Assessment of the proposed technique has been conducted to validate its performance with natural images and videos.
BibTeX:
@inproceedings{zhang2008describing,
  author = {Zhang, Qianni and Izquierdo, Ebroul},
  title = {Describing Objects with Multiple Features for Visual Information Retrieval and Annotation},
  booktitle = {Image Analysis for Multimedia Interactive Services, 2008. WIAMIS '08. Ninth International Workshop on},
  year = {2008},
  pages = {80--83},
  note = {google scholar entry: Ninth International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS'08). Klagenfurt, Austria, 7-9 May 2008.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4556888},
  doi = {10.1109/WIAMIS.2008.45}
}
Zhang Q, Tolias G, Mansencal B, Saracoglu A, Aginako N, Alatan AA, Alexandre LA, Avrithis YS, Benois-Pineau J, Chandramouli K, Corvaglia M, Damnjanovic U, Dimou A, Esen E, Fatemi N, Garcia I, Guerrini F, Hanjalic A, Jarina R, Kapsalas P, King P, Kompatsiaris I, Makris L, Mezaris V, Migliorati P, Moumtzidou A, Mylonas P, Naci SU, Nikolopoulos S, Paralic M, Piatrik T, Poulin F, Pinheiro AMG, Raileanu LE, Spyrou E and Vrochidis S (2008), "COST292 experimental framework for TRECVID 2008", In TRECVID 2008 workshop participants notebook papers. Gaithersburg, Maryland, November, 2008, pp. 1-15. National Institute of Standards and Technology (NIST).
Abstract: In this paper, we give an overview of the four tasks submitted to TRECVID 2008 by COST292. The high-level feature extraction framework comprises four systems. The first system transforms a set of low-level descriptors into the semantic space using Latent Semantic Analysis and utilises neural networks for feature detection. The second system uses a multi-modal classifier based on SVMs and several descriptors. The third system uses three image classifiers based on ant colony optimisation, particle swarm optimisation and a multi-objective learning algorithm. The fourth system uses a Gaussian model for singing detection and a person detection algorithm. The search task is based on an interactive retrieval application combining retrieval functionalities in various modalities with a user interface supporting automatic and interactive search over all queries submitted. The rushes task submission is based on a spectral clustering approach for removing similar scenes based on eigenvalues of the frame similarity matrix and a redundancy removal strategy which depends on semantic feature extraction such as camera motion and faces. Finally, the submission to the copy detection task is conducted by two different systems. The first system consists of a video module and an audio module. The second system is based on mid-level features that are related to the temporal structure of videos.
BibTeX:
@inproceedings{zhang2008cost292,
  author = {Zhang, Qianni and Tolias, Giorgos and Mansencal, Boris and Saracoglu, Ahmet and Aginako, Naiara and Alatan, A. Aydın and Alexandre, L. A. and Avrithis, Yannis S. and Benois-Pineau, Jenny and Chandramouli, Krishna and Corvaglia, Marzia and Damnjanovic, Uros and Dimou, Anastasios and Esen, Ersin and Fatemi, Nastaran and Garcia, I. and Guerrini, Fabrizio and Hanjalic, Alan and Jarina, Roman and Kapsalas, P. and King, Paul and Kompatsiaris, Ioannis and Makris, Lambros and Mezaris, Vasileios and Migliorati, Pierangelo and Moumtzidou, Anastasia and Mylonas, Phivos and Naci, Suphi Umut and Nikolopoulos, Spiros and Paralic, Martin and Piatrik, Tomas and Poulin, Florian and Pinheiro, António M. G. and Raileanu, Laura Elena and Spyrou, Evaggelos and Vrochidis, Stefanos},
  title = {COST292 experimental framework for TRECVID 2008},
  booktitle = {TRECVID 2008 workshop participants notebook papers},
  publisher = {National Institute of Standards and Technology (NIST)},
  year = {2008},
  pages = {1--15},
  note = {google scholar entry: 6th TRECVID Workshop (TRECVID 2008). Gaithersburg, Maryland, November 2008.},
  url = {http://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.8.org.html}
}

Presentations, Posters and Technical Reports

Dumont E, Merialdo B, Essid S, Bailer W, Byrne D, Bredin H, O'Connor N, Jones GJF, Haller M, Krutz A, Sikora T and Piatrik T (2008), "A collaborative approach to video summarization". December, 2008.
Abstract: This poster describes an approach to video summarization based on the combination of several decision mechanisms provided by the partners of the K-Space European Network of Excellence. The system has been applied to the TRECVID 2008 BBC rushes summarization task.
BibTeX:
@misc{dumont2008collaborative,
  author = {Dumont, Emilie and Merialdo, Bernard and Essid, Slim and Bailer, Werner and Byrne, Daragh and Bredin, Hervé and O'Connor, Noel and Jones, Gareth J. F. and Haller, Martin and Krutz, Andreas and Sikora, Thomas and Piatrik, Tomas},
  title = {A collaborative approach to video summarization},
  booktitle = {Semantic Multimedia. Third International Conference on Semantic and Digital Media Technologies (SAMT 2008). Proceedings.},
  publisher = {Digital Enterprise Research Institute (DERI)},
  year = {2008},
  note = {Poster presented at SAMT 2008},
  url = {http://resources.smile.deri.ie/conference/2008/samt/}
}

Theses and Monographs

Fernandez Arguedas V (2008), "Summarization of Surveillance Videos based on Visual Activity". Thesis at: Universidad Autónoma de Madrid. June, 2008.
Abstract: This master's thesis proposes two methods for summarising surveillance video sequences based on motion activity: thresholding between consecutive frames, and thresholding between a background and the frames. The former estimates the energy of the difference between consecutive frames; since temporal redundancy exists between consecutive frames, only frames whose differential energy exceeds a fixed threshold are saved in the summary, as they are the only ones that add relevant information. The latter first computes a reconstructed background so that only the relevant information (the foreground) in each frame is considered, and then calculates the energy of the difference between the reconstructed background and the current frame; again, only frames whose differential energy exceeds the fixed threshold are saved, as they hold relevant information. Both approaches depend on a fixed threshold, whose setting affects both the quality of the summaries and the amount of relevant information they contain. These approaches aim to summarise videos automatically in order to reduce the resources needed to store or inspect them, such as bit rate: the amount of redundant information in a surveillance video sequence is usually considerable compared with the amount of useful information, and the proposed summarisation exploits this characteristic of surveillance video.
BibTeX:
@mastersthesis{arguedas2008summarization,
  author = {Fernandez Arguedas, Virginia},
  editor = {Izquierdo, Ebroul},
  title = {Summarization of Surveillance Videos based on Visual Activity},
  school = {Universidad Autónoma de Madrid},
  year = {2008},
  note = {fix google scholar entry: publication type},
  url = {http://arantxa.ii.uam.es/~jms/pfcsteleco/lecturas/20080623VirginiaFernandez.pdf}
}
Izquierdo E (ed.) (2008), "Content-based Multimedia Indexing (CBMI 2008), Proceedings of the 6th International Workshop on", In Proceedings of CBMI 2008. London, England, June, 2008, pp. 575. IEEE.
BibTeX:
@proceedings{izquierdo2008proceedings,
  editor = {Izquierdo, Ebroul},
  title = {Content-based Multimedia Indexing (CBMI 2008), Proceedings of the 6th International Workshop on},
  booktitle = {Proceedings of CBMI 2008},
  publisher = {IEEE},
  year = {2008},
  pages = {575},
  note = {google scholar entry: Proceedings of the 6th International Workshop on Content-Based Multimedia Indexing (CBMI 2008). London, England, 18-20 June 2008.},
  url = {http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=4558154}
}
Izquierdo E and Liu G (eds.) (2008), "Visual Information Engineering (VIE) 2008, 5th International Conference on", In Proceedings of VIE 2008. Xi'an, China, July, 2008, pp. 839. IET.
BibTeX:
@proceedings{izquierdo2008proceedings2,
  editor = {Izquierdo, Ebroul and Liu, Guizhong},
  title = {Visual Information Engineering (VIE) 2008, 5th International Conference on},
  booktitle = {Proceedings of VIE 2008},
  publisher = {IET},
  year = {2008},
  pages = {839},
  note = {Conference proceedings in 2 volumes. Google scholar entry: 5th International Conference on Visual Information Engineering, (VIE 2008). Xi'an, China, 29 July - 1 August 2008.},
  url = {http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=4717973}
}
Peixoto E (2008), "Transcodificador de Vídeo Wyner-Ziv/H.263 para Comunicação entre Dispositivos Móveis". Thesis at: Universidade de Brasília. February, 2008.
Abstract: In mobile-to-mobile video communications, both the transmitting and receiving ends may lack the computing power needed for complex video compression and decompression tasks. Traditional video codecs typically have highly complex encoders and less complex decoders; Wyner-Ziv coding, however, allows a low-complexity encoder at the price of a more complex decoder. A video communication system is proposed in which the transmitter uses a Wyner-Ziv (reverse complexity) encoder while the receiver uses a traditional decoder, hence minimising complexity at both ends. For this to work, a transcoder must be inserted in the network to convert the video stream. An efficient transcoder from a simple Wyner-Ziv approach to the H.263 standard is presented. This approach saves a large amount of computation by, among other things, reusing the motion estimation performed at the Wyner-Ziv decoder stage. A pixel-domain Wyner-Ziv codec was implemented for the transcoder. Along with reusing the motion estimation done in the Wyner-Ziv decoding process, the transcoder also allows one to change the GOP length of the transcoded sequence and to refine the motion vectors. Extensive tests were carried out to evaluate the proposed transcoder's performance using popular video sequences such as Foreman, Salesman, Carphone and Coastguard.
BibTeX:
@mastersthesis{peixoto2008transcodificador,
  author = {Eduardo Peixoto},
  title = {Transcodificador de Vídeo Wyner-Ziv/H.263 para Comunicação entre Dispositivos Móveis},
  school = {Universidade de Brasília},
  year = {2008},
  url = {http://queiroz.divp.org/papers/tese_eduardo_msc.pdf}
}
Zhang Q and Benini S (2008), "D9 SoA report on learning and reasoning techniques for automatic semantic inference and multimodal approaches", February, 2008, pp. 1-69.
BibTeX:
@techreport{FP6-045189,
  author = {Zhang, Qianni and Benini, Sergio},
  title = {D9 SoA report on learning and reasoning techniques for automatic semantic inference and multimodal approaches},
  year = {2008},
  pages = {1--69},
  note = {deliverables report for rushes project},
  url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.111.3755}
}


2007

Journal Papers

Cózar JR, Guil Mata N, González-Linares JM, Zapata EL and Izquierdo E (2007), "Logotype detection to support semantic-based video annotation", Signal Processing: Image Communication. August-September, 2007. Vol. 22(7-8), pp. 669-679. Elsevier.
Abstract: In conventional video production, logotypes are used to convey information about the content originator or the actual video content. Logotypes contain information that is critical to infer genre, class and other important semantic features of video. This paper presents a framework to support semantic-based video classification and annotation. The backbone of the proposed framework is a technique for logotype extraction and recognition. The method consists of two main processing stages. The first stage performs temporal and spatial segmentation by calculating the minimal luminance variance region (MVLR) for a set of frames. Non-linear diffusion filters (NLDF) are used at this stage to reduce noise in the shape of the logotype. In the second stage, logotype classification and recognition are achieved. The earth mover's distance (EMD) is used as a metric to decide if the detected MVLR belongs to one of the following logotype categories: learned or candidate. Learned logos are semantically annotated shapes available in the database. The semantic characterization of such logos is obtained through an iterative learning process. Candidate logos are non-annotated shapes extracted during the first processing stage. They are assigned to clusters grouping different instances of logos of similar shape. Using these clusters, false logotypes are removed and different instances of the same logo are averaged to obtain a unique prototype representing the underlying noisy cluster. Experiments involving several hours of MPEG video and around 1000 candidate logotypes have been carried out in order to show the robustness of both the detection and classification processes.
BibTeX:
@article{cozar2007logotype,
  author = {Cózar, Julián Ramos and Guil Mata, Nicolás and González-Linares, José María and Zapata, Emilio L. and Izquierdo, Ebroul},
  title = {Logotype detection to support semantic-based video annotation},
  journal = {Signal Processing: Image Communication},
  publisher = {Elsevier},
  year = {2007},
  volume = {22},
  number = {7-8},
  pages = {669--679},
  note = {"Special Issue on Content-Based Multimedia Indexing and Retrieval"},
  url = {http://www.sciencedirect.com/science/article/pii/S0923596507000707},
  doi = {10.1016/j.image.2007.05.006}
}
Dehmeshki J, Ye X, Amin H, Abaei M, Lin X and Qanadli SD (2007), "Volumetric Quantification of Atherosclerotic Plaque in CT Considering Partial Volume Effect", Medical Imaging, IEEE Transactions on. March, 2007. Vol. 26(3), pp. 273-282. IEEE.
Abstract: Coronary artery calcification (CAC) is quantified based on a computed tomography (CT) scan image. A calcified region is identified. Modified expectation maximization (MEM) of a statistical model for the calcified and background material is used to estimate the partial calcium content of the voxels. The algorithm limits the region over which MEM is performed. By using MEM, the statistical properties of the model are iteratively updated based on the calculated resultant calcium distribution from the previous iteration. The estimated statistical properties are used to generate a map of the partial calcium content in the calcified region. The volume of calcium in the calcified region is determined based on the map. The experimental results on a cardiac phantom, scanned 90 times using 15 different protocols, demonstrate that the proposed method is less sensitive to partial volume effect and noise, with an average error of 9.5% (standard deviation (SD) of 5-7 mm3) compared with 67% (SD of 3-20 mm3) for conventional techniques. The high reproducibility of the proposed method for 35 patients, scanned twice using the same protocol at a minimum interval of 10 min, shows that the method provides 2-3 times lower interscan variation than conventional techniques.
BibTeX:
@article{dehmeshki2007volumetric,
  author = {Dehmeshki, Jamshid and Ye, Xujiong and Amin, Hamdan and Abaei, Maryam and Lin, Xinyu and Qanadli, Salah D.},
  title = {Volumetric Quantification of Atherosclerotic Plaque in CT Considering Partial Volume Effect},
  journal = {Medical Imaging, IEEE Transactions on},
  publisher = {IEEE},
  year = {2007},
  volume = {26},
  number = {3},
  pages = {273--282},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4114550},
  doi = {10.1109/TMI.2007.893344}
}
Dehmeshki J, Ye X, Lin X, Valdivieso M and Amin H (2007), "Automated detection of lung nodules in CT images using shape-based genetic algorithm", Computerized Medical Imaging and Graphics. Vol. 31(6), pp. 408-417. Elsevier.
Abstract: A shape-based genetic algorithm template-matching (GATM) method is proposed for the detection of nodules with spherical elements. A spherical-oriented convolution-based filtering scheme is used as a pre-processing step for enhancement. To define the fitness function for GATM, a 3D geometric shape feature is calculated at each voxel and then combined into a global nodule intensity distribution. Lung nodule phantom images are used as reference images for template matching. The proposed method has been validated on a clinical dataset of 70 thoracic CT scans (involving 16,800 CT slices) that contains 178 nodules as a gold standard. A total of 160 nodules were correctly detected by the proposed method and resulted in a detection rate of about 90%, with the number of false positives at approximately 14.6/scan (0.06/slice). The high-detection performance of the method suggested promising potential for clinical applications.
BibTeX:
@article{Dehmeshki2007408,
  author = {Jamshid Dehmeshki and Xujiong Ye and XinYu Lin and Manlio Valdivieso and Hamdan Amin},
  title = {Automated detection of lung nodules in CT images using shape-based genetic algorithm},
  journal = {Computerized Medical Imaging and Graphics},
  publisher = {Elsevier},
  year = {2007},
  volume = {31},
  number = {6},
  pages = {408--417},
  url = {http://www.sciencedirect.com/science/article/pii/S089561110700050X},
  doi = {10.1016/j.compmedimag.2007.03.002}
}
Djordjevic D and Izquierdo E (2007), "An Object- and User-Driven System for Semantic-Based Image Annotation and Retrieval", Circuits and Systems for Video Technology, IEEE Transactions on. March, 2007. Vol. 17(3), pp. 313-323. IEEE.
Abstract: In this paper, a system for object-based semi-automatic indexing and retrieval of natural images is introduced. Three important concepts underpin the proposed system: a new strategy to fuse different low-level content descriptions; a learning technique involving user relevance feedback; and a novel object based model to link semantic terms and visual objects. To achieve high accuracy in the retrieval and subsequent annotation processes several low-level image primitives are combined in a suitable multifeatures space. This space is modelled in a structured way exploiting both low-level features and spatial contextual relations of image blocks. Support vector machines are used to learn from gathered information through relevance feedback. An adaptive convolution kernel is defined to handle the proposed structured multifeature space. The positive definite property of the introduced kernel is proven, as an essential condition for uniqueness and optimality of the convex optimization in support vector machines. The proposed system has been thoroughly evaluated and selected results are reported in this paper.
BibTeX:
@article{djordjevic2007object,
  author = {Djordjevic, Divna and Izquierdo, Ebroul},
  title = {An Object- and User-Driven System for Semantic-Based Image Annotation and Retrieval},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {2007},
  volume = {17},
  number = {3},
  pages = {313--323},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4118236},
  doi = {10.1109/TCSVT.2007.890634}
}
Dong L and Izquierdo E (2007), "A Biologically Inspired System for Classification of Natural Images", Circuits and Systems for Video Technology, IEEE Transactions on. May, 2007. Vol. 17(5), pp. 590-603.
Abstract: A system for visual information analysis and classification based on a biologically inspired visual selective attention model with knowledge structuring is presented. The system is derived from well-known analogous processes in the visual system of primates and inference procedures of the human brain. It consists of three main units: biologically inspired visual selective attention, knowledge structuring, and clustering of visual information. The biologically inspired visual selective attention unit closely follows the mechanisms of the visual what pathway and where pathway in the primates' brain. It uses a bottom-up approach to generate a salient area based on low-level features extracted from natural images. The scale selection to determine a suitable size of salient areas uses a maximum entropy approach. This unit also contains a low-level top-down selective attention module that performs decisions on interesting objects by human interaction. In this module, a reinforcement/inhibition mechanism is exploited. The knowledge structuring unit automatically creates a relevance map from salient image areas generated by the biologically inspired unit. It also derives a set of well-structured representations from low-level descriptions to drive the final classification. The knowledge structuring unit relies on human knowledge to produce suitable links between low-level descriptions and high-level representation on a limited training set. The backbone of this unit is a distribution mapping strategy involving two basic modules: structured low-level feature extraction using a convolutional neural network and a topology representation module based on a growing cell structure network. In the third unit of the system, classification is achieved by simulating high-level top-down visual information perception and clustering using an incremental Bayesian parameter estimation method. The proposed modular system architecture offers straightforward expansion to include user relevance feedback, contextual input, and multimodal information if available.
BibTeX:
@article{dong2007biologically,
  author = {Dong, Le and Izquierdo, Ebroul},
  title = {A Biologically Inspired System for Classification of Natural Images},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  year = {2007},
  volume = {17},
  number = {5},
  pages = {590--603},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4162548},
  doi = {10.1109/TCSVT.2007.894035}
}
Izquierdo E (2007), "Knowledge Engineering, Semantics, and Signal Processing in Audio-Visual Information Retrieval", Circuits and Systems for Video Technology, IEEE Transactions on. March, 2007. Vol. 17(3), pp. 257-260. IEEE.
Abstract: The thirteen papers in this special issue focus on knowledge engineering, semantics, and signal processing in audio-visual information retrieval. The selected papers are briefly summarized.
BibTeX:
@article{izquierdo2007knowledge,
  author = {Izquierdo, Ebroul},
  title = {Knowledge Engineering, Semantics, and Signal Processing in Audio-Visual Information Retrieval},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {2007},
  volume = {17},
  number = {3},
  pages = {257--260},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4118237},
  doi = {10.1109/TCSVT.2006.890273}
}
Izquierdo E, Benois-Pineau J and André-Obrecht R (2007), "Signal processing: Image communication, special issue on content-based multimedia indexing and retrieval", Image Communication. August, 2007. Vol. 22(7-8), pp. 605-606. Elsevier.
BibTeX:
@article{Izquierdo2007signal,
  author = {Izquierdo, Ebroul and Benois-Pineau, Jenny and André-Obrecht, Régine},
  title = {Signal processing: Image communication, special issue on content-based multimedia indexing and retrieval},
  journal = {Image Communication},
  publisher = {Elsevier},
  year = {2007},
  volume = {22},
  number = {7-8},
  pages = {605--606},
  url = {http://www.sciencedirect.com/science/article/pii/S0923596507000835},
  doi = {10.1016/j.image.2007.06.001}
}
Ng WWY, Dorado A, Yeung DS, Pedrycz W and Izquierdo E (2007), "Image classification with the use of radial basis function neural networks and the minimization of the localized generalization error", Pattern Recognition. January, 2007. Vol. 40(1), pp. 19-32. Elsevier.
Abstract: Image classification arises as an important phase in the overall process of automatic image annotation and image retrieval. In this study, we are concerned with the design of image classifiers developed in the feature space formed by low level primitives defined in the setting of the MPEG-7 standard. Our objective is to investigate the discriminatory properties of such standard image descriptors and look at efficient architectures of the classifiers along with their design pursuits. The generalization capabilities of an image classifier are essential to its successful usage in image retrieval and annotation. Intuitively, it is expected that the classifier should achieve high classification accuracy on unseen images that are quite ``similar'' to those occurring in the training set. On the other hand, we may assume that the performance of the classifier could not be guaranteed in the case of images that are very much dissimilar from the elements of the training set. To follow this observation, we develop and use a concept of the localized generalization error and show how it guides the design of the classifier. As image classifier, we consider the usage of the radial basis function neural networks (RBFNNs). Through intensive experimentation we show that the resulting classifier outperforms other classifiers such as a multi-class support vector machines (SVMs) as well as ``standard'' RBFNNs (viz. those developed without the guidance offered by the optimization of the localized generalization error). The experimental studies reveal some interesting interpretation abilities of the RBFNN classifiers being related with their receptive fields.
BibTeX:
@article{ng2007image,
  author = {Ng, Wing W. Y. and Dorado, Andres and Yeung, Daniel S. and Pedrycz, Witold and Izquierdo, Ebroul},
  title = {Image classification with the use of radial basis function neural networks and the minimization of the localized generalization error},
  journal = {Pattern Recognition},
  publisher = {Elsevier},
  year = {2007},
  volume = {40},
  number = {1},
  pages = {19--32},
  url = {http://www.sciencedirect.com/science/article/pii/S003132030600313X},
  doi = {10.1016/j.patcog.2006.07.002}
}
Ramzan N, Wan S and Izquierdo E (2007), "Joint Source-Channel Coding for Wavelet-Based Scalable Video Transmission Using an Adaptive Turbo Code", EURASIP Journal on Image and Video Processing. January, 2007. Vol. 2007, pp. 1-12. Springer.
Abstract: An efficient approach for joint source and channel coding is presented. The proposed approach exploits the joint optimization of a wavelet-based scalable video coding framework and a forward error correction method based on turbo codes. The scheme minimizes the reconstructed video distortion at the decoder subject to a constraint on the overall transmission bitrate budget. The minimization is achieved by exploiting the source rate distortion characteristics and the statistics of the available codes. Here, the critical problem of estimating the bit error rate probability in error-prone applications is discussed. Aiming at improving the overall performance of the underlying joint source-channel coding, the combination of the packet size, interleaver, and channel coding rate is optimized using Lagrangian optimization. Experimental results show that the proposed approach outperforms conventional forward error correction techniques at all bit error rates. It also significantly improves the performance of end-to-end scalable video transmission at all channel bit rates.
BibTeX:
@article{ramzan2007joint,
  author = {Ramzan, Naeem and Wan, Shuai and Izquierdo, Ebroul},
  title = {Joint Source-Channel Coding for Wavelet-Based Scalable Video Transmission Using an Adaptive Turbo Code},
  journal = {EURASIP Journal on Image and Video Processing},
  publisher = {Springer},
  year = {2007},
  volume = {2007},
  pages = {1--12},
  note = {prev. Hindawi Publishing},
  url = {http://jivp.eurasipjournals.com/content/pdf/1687-5281-2007-047517.pdf},
  doi = {10.1155/2007/47517}
}
Wan S and Izquierdo E (2007), "Rate-Distortion Optimized Motion-Compensated Prediction for Packet Loss Resilient Video Coding", Image Processing, IEEE Transactions on. May, 2007. Vol. 16(5), pp. 1327-1338. IEEE.
Abstract: A rate-distortion optimized motion-compensated prediction method for robust video coding is proposed. Contrasting methods from the conventional literature, the proposed approach uses the expected reconstructed distortion after transmission, instead of the displaced frame difference, in motion estimation. Initially, the end-to-end reconstructed distortion is estimated through a recursive per-pixel estimation algorithm. Then the total bit rate for motion-compensated encoding is predicted using a suitable rate distortion model. The results are fed into the Lagrangian optimization at the encoder to perform motion estimation. Here, the encoder automatically finds an optimized motion compensated prediction by estimating the best tradeoff between coding efficiency and end-to-end distortion. Finally, rate-distortion optimization is applied again to estimate the macroblock mode. This process uses previously selected optimized motion vectors and their corresponding reference frames. It also considers intraprediction. Extensive computer simulations in lossy channel environments were conducted to assess the performance of the proposed method. Selected results for both single and multiple reference frames settings are described. A comparative evaluation using other conventional techniques from the literature was also conducted. Furthermore, the effects of mismatches between the actual channel packet loss rate and the one assumed at the encoder side have been evaluated and reported in this paper.
BibTeX:
@article{wan2007rate,
  author = {Wan, Shuai and Izquierdo, Ebroul},
  title = {Rate-Distortion Optimized Motion-Compensated Prediction for Packet Loss Resilient Video Coding},
  journal = {Image Processing, IEEE Transactions on},
  publisher = {IEEE},
  year = {2007},
  volume = {16},
  number = {5},
  pages = {1327--1338},
  url = {http://www.znu.ac.ir/data/members/fazli_saeid/DIP/Paper/ISSUE5/04154809.pdf},
  doi = {10.1109/TIP.2007.894230}
}
Wan S, Mrak M, Ramzan N and Izquierdo E (2007), "Perceptually Adaptive Joint Deringing-Deblocking Filtering for Scalable Video Transmission over Wireless Networks", Signal Processing: Image Communication. March, 2007. Vol. 22(3), pp. 266-276. Elsevier.
Abstract: Video transmission over low bit-rate channels, such as wireless networks, requires dedicated filtering during decoding for crucial enhancement of the perceptual video quality. For that reason, deringing and deblocking are inevitable components of decoders in wireless video transmission systems. Aimed at improving the visual quality of decoded video, in this paper a new perceptually adaptive joint deringing-deblocking filtering technique for scalable video streams is introduced. The proposed approach is designed to deal with artefacts inherent to transmissions over very low bit-rate channels, specifically wireless networks. It considers both prediction and update steps in motion compensated temporal filtering in an in-loop filtering architecture. The proposed approach integrates three different filtering modules to deal with low-pass, high-pass and after-update frames, respectively. The filter strength is adaptively tuned according to the number of discarded bit-planes, which in turn depends on the channel bit-rate and the channel error conditions. Furthermore, since ringing and blocking artefacts are visually annoying, relevant characteristics of the human visual system are considered in the used bilateral filtering model. That is, the amount of filtering is adjusted to the perceptual distortion by integrating a human visual system model into filtering based on luminance, activity and temporal masking. As a consequence, the resulting filter strength is automatically adapted to both perceptual sensitivity and channel variation. To assess the performance of the proposed approach, a comprehensive comparative evaluation against the conventional loop architecture and bilateral filter was conducted. The results of the experimental evaluation show a superior performance of the proposed adaptive filtering approach, providing better objective and subjective quality.
BibTeX:
@article{wan2007perceptually,
  author = {Shuai Wan and Marta Mrak and Naeem Ramzan and Ebroul Izquierdo},
  title = {Perceptually Adaptive Joint Deringing-Deblocking Filtering for Scalable Video Transmission over Wireless Networks},
  journal = {Signal Processing: Image Communication},
  publisher = {Elsevier},
  year = {2007},
  volume = {22},
  number = {3},
  pages = {266--276},
  note = {Special issue on Mobile Video},
  url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.1588&rep=rep1&type=pdf},
  doi = {10.1016/j.image.2006.12.005}
}
Zgaljic T, Sprljan N and Izquierdo E (2007), "Bit-stream allocation methods for scalable video coding supporting wireless communications", Signal Processing: Image Communication. March, 2007. Vol. 22(3), pp. 298-316. Elsevier.
Abstract: The demand for video access services through wireless networks, as important parts of larger heterogeneous networks, is constantly increasing. To cope with this demand, flexible compression technology to enable optimum coding performance, especially at low bit-rates, is required. In this context, scalable video coding emerges as the most promising technology. A critical problem in wavelet-based scalable video coding is bit-stream allocation at any bit-rate and in particular when low bit-rates are targeted. In this paper two methods for bit-stream allocation based on the concept of fractional bit-planes are reported. The first method assumes that minimum rate-distortion (R--D) slope of the same fractional bit-plane within the same bit-plane across different subbands is higher than or equal to the maximum R--D slope of the next fractional bit-plane. This method is characterised by a very low complexity since no distortion evaluation is required. Contrasting this approach, in the second method the distortion caused by quantisation of the wavelet coefficients is considered. Here, a simple yet effective statistical distortion model that is used for estimation of R--D slopes for each fractional bit-plane is derived. Three different strategies are derived from this method. In the first one it is assumed that the used wavelet is nearly orthogonal, i.e. the distortion in the transform domain is treated as being equivalent to the distortion in the signal domain. To reduce the error caused by direct distortion evaluation in the wavelet domain, the weighting factors are applied to the used statistical distortion model in the second strategy. In the last strategy, the derived statistical model is used during the bit-plane encoding to determine optimal position of the fractional bit-plane corresponding to refinement information in the compressed bit-stream. 
Results of selected experiments measuring peak signal to noise ratio (PSNR) of decoded video at various bit-rates are reported. Additionally, the PSNR of decoded video at various bit-rates is measured for two specific cases: when the methods for bit-stream allocation are used to assign quality layers in the compressed bit-stream, and when quality layers are not assigned.
BibTeX:
@article{zgaljic2007bit,
  author = {Zgaljic, Toni and Sprljan, Nikola and Izquierdo, Ebroul},
  title = {Bit-stream allocation methods for scalable video coding supporting wireless communications},
  journal = {Signal Processing: Image Communication},
  publisher = {Elsevier},
  year = {2007},
  volume = {22},
  number = {3},
  pages = {298--316},
  note = {Special issue on Mobile Video},
  url = {http://www.sciencedirect.com/science/article/pii/S0923596506001433},
  doi = {10.1016/j.image.2006.12.008}
}
Zhang Q and Izquierdo E (2007), "Combining Low-Level Features for Semantic Extraction in Image Retrieval", EURASIP Journal on Advances in Signal Processing. December, 2007. (1), pp. 1-12. Springer.
Abstract: An object-oriented approach for semantic-based image retrieval is presented. The goal is to identify key patterns of specific objects in the training data and to use them as object signature. Two important aspects of semantic-based image retrieval are considered: retrieval of images containing a given semantic concept and fusion of different low-level features. The proposed approach splits the image into elementary image blocks to obtain block regions close in shape to the objects of interest. A multiobjective optimization technique is used to find a suitable multidescriptor space in which several low-level image primitives can be fused. The visual primitives are combined according to a concept-specific metric, which is learned from representative blocks or training data. The optimal linear combination of single descriptor metrics is estimated by applying the Pareto archived evolution strategy. An empirical assessment of the proposed technique was conducted to validate its performance with natural images.
BibTeX:
@article{Zhang2007,
  author = {Zhang, Qianni and Izquierdo, Ebroul},
  title = {Combining Low-Level Features for Semantic Extraction in Image Retrieval},
  journal = {EURASIP Journal on Advances in Signal Processing},
  publisher = {Springer},
  year = {2007},
  number = {1},
  pages = {1--12},
  url = {http://asp.eurasipjournals.com/content/pdf/1687-6180-2007-061423.pdf},
  doi = {10.1155/2007/61423}
}
Zhang Q and Izquierdo E (2007), "Adaptive Salient Block Based Image Retrieval in Multi-Feature Space", Signal Processing: Image Communication. July, 2007. Vol. 22(6), pp. 591-603. Elsevier.
Abstract: In this paper, a new method for object-based image retrieval is proposed. The technique is designed to adaptively and efficiently locate salient blocks in images. Salient blocks are used to represent semantically meaningful objects in images and to perform object-oriented annotation and retrieval. An algorithm is proposed to locate the most suitable blocks of arbitrary size representing the query concept or object of interest in images. To annotate single objects according to human perception, associations between several low-level patterns and semantic concepts are modelled by an optimised multi-descriptor space. The approach starts by dividing the image into blocks partitioned according to several different layouts. Then, a fitting block is selected according to a similarity metric acting on concept-specific multi-feature spaces. The similarity metric is defined as linear combination of single feature space metrics for which the corresponding weights are learned from a group of representative salient blocks using multi-objective optimisation. Relevance Feedback is seamlessly integrated in the retrieval process. In each iteration, the user selects images relevant to the query object, then the corresponding salient blocks in selected images are used as training examples. The proposed technique was thoroughly assessed and selected results are reported in this paper to demonstrate its performance.
BibTeX:
@article{zhang2007adaptive,
  author = {Zhang, Qianni and Izquierdo, Ebroul},
  title = {Adaptive Salient Block Based Image Retrieval in Multi-Feature Space},
  journal = {Signal Processing: Image Communication},
  publisher = {Elsevier},
  year = {2007},
  volume = {22},
  number = {6},
  pages = {591--603},
  url = {http://www.sciencedirect.com/science/article/pii/S0923596507000793},
  doi = {10.1016/j.image.2007.05.005}
}

Conference Papers

Borges PVK, Mayer J and Izquierdo E (2007), "Performance Analysis of Text Halftone Modulation", In Image Processing (ICIP 2007), Proceedings of the 14th International Conference on. San Antonio, TX, October, 2007. Vol. 3, pp. 285-288. IEEE.
Abstract: This paper analyzes the use of text halftone modulation (THM) as a text hardcopy watermarking method. Using THM, text characters in a document have their luminances modified from the standard black to a gray level generated with a given halftone screen, according to a message to be transmitted. The application of THM has been discussed in ((R. Villan et al. 2006), (K. Matsui and K. Tanaka, 1994)). In this paper, a spectral metric is proposed to detect the embedded message. Based on this metric, an error rate analysis of halftone modulation is presented considering the effects of the print and scan channel. Experiments validate the analysis and the applicability of the method.
BibTeX:
@inproceedings{borges2007performance,
  author = {Borges, Paulo Vinicius Koerich and Mayer, Joceli and Izquierdo, Ebroul},
  title = {Performance Analysis of Text Halftone Modulation},
  booktitle = {Image Processing (ICIP 2007), Proceedings of the 14th International Conference on},
  publisher = {IEEE},
  year = {2007},
  volume = {3},
  pages = {285--288},
  note = {google scholar entry: International Conference on Image Processing (ICIP 2007). San Antonio, TX, 16-19 October 2007.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4379302},
  doi = {10.1109/ICIP.2007.4379302}
}
Borges PVK, Mayer J and Izquierdo E (2007), "A Practical Protocol for Digital and Printed Document Authentication", In Proceedings of the 15th European Signal Processing Conference (EUSIPCO 2007). Poznań, Poland, September, 2007, pp. 2529-2533. European Association for Signal Processing (EURASIP).
Abstract: This paper discusses a practical protocol for text document authentication, applicable to digital and printed form documents. It uses the text characters information to determine a key used to generate an authentication vector. Based on this vector, a feature in each character of the document is modified, without affecting the character ``meaning''. The modifiable feature may be size, color, shape, relative position, among others. If any character on the text is changed, the character information is different and consequently the authentication vector is also different. The proposed system does not require database retrieval and it is extremely difficult to forge the authentication process. A correlation-based detector for the system is proposed, and because feature detection errors may occur, an analysis is performed to determine the false alarm error probability of the system. Experiments illustrate the applicability of the method, considering the digital and the printed cases.
BibTeX:
@inproceedings{borges2007practical,
  author = {Borges, Paulo Vinicius Koerich and Mayer, Joceli and Izquierdo, Ebroul},
  editor = {Domański, Marek and Stasiński, Ryszard and Bartkowiak, Maciej},
  title = {A Practical Protocol for Digital and Printed Document Authentication},
  booktitle = {Proceedings of the 15th European Signal Processing Conference (EUSIPCO 2007)},
  publisher = {European Association for Signal Processing (EURASIP)},
  year = {2007},
  pages = {2529--2533},
  note = {google scholar entry: 15th European Signal Processing Conference (EUSIPCO 2007). Poznan, Poland, 3-7 September 2007.},
  url = {http://www.eurasip.org/Proceedings/Eusipco/Eusipco2007/Papers/d5p-k03.pdf}
}
Borges PVK, Mayer J and Izquierdo E (2007), "Segmentation of Document Images Using Higher Order Statistics", In Multimedia Signal Processing, 2007 IEEE 9th Workshop on (MMSP 2007). Chania, Crete, October, 2007, pp. 296-299. IEEE.
Abstract: This work presents an efficient post-segmentation method for separating text from the background in document images. For this task, this paper proposes the use of textured patterns to represent text in documents, instead of the standard black. It is shown that, in poor quality documents, text segmentation is more efficient when the characters in the document are represented in a halftoned gray level prior to printing. This occurs because the halftoning process induces statistical characteristics that help the text to be distinguished from noise or background. A typical case is noisy printed and scanned documents. Experiments validate the analysis and the applicability of the segmentation method. An important application for the method is in the postal service, where letters have their addresses segmented for automatic sorting.
BibTeX:
@inproceedings{borges2007segmentation,
  author = {Borges, Paulo Vinicius Koerich and Mayer, Joceli and Izquierdo, Ebroul},
  title = {Segmentation of Document Images Using Higher Order Statistics},
  booktitle = {Multimedia Signal Processing, 2007 IEEE 9th Workshop on (MMSP 2007)},
  publisher = {IEEE},
  year = {2007},
  pages = {296--299},
  note = {google scholar entry: IEEE 9th Workshop on Multimedia Signal Processing (MMSP 2007). Chania, Crete, 1-3 October 2007.},
  url = {http://www.paulovinicius.com/papers/borges_signal_processing_2007.pdf},
  doi = {10.1109/MMSP.2007.4412876}
}
Borges PVK, Izquierdo E and Mayer J (2007), "Efficient Side Information Encoding for Text Hardcopy Documents", In Advanced Video and Signal Based Surveillance (AVSS 2007), Proceedings of 2007 IEEE Conference on. London, England, September, 2007, pp. 552-557. IEEE.
Abstract: This paper proposes a new coding method that increases significantly the signal-to-watermark ratio in document watermarking algorithms. A possible approach to text document watermarking is to consider text characters as a data structure consisting of several modifiable features such as size, shape, position, luminance, among others. In existing algorithms, these features can be modified sequentially according to bit values to be embedded. In contrast, the solution proposed here uses a positional information coding approach to embed information. Using this approach, the information is related to the position of modified characters, and not to the bit embedded on each character. This coding is based on combinatorial analysis and it can embed more bits in comparison to the usual methods, given a distortion constraint. An analysis showing the superior performance of positional coding for this type of application is presented. Experiments validate the analysis and the applicability of the method.
BibTeX:
@inproceedings{borges2007efficient,
  author = {Borges, Paulo Vinicius Koerich and Izquierdo, Ebroul and Mayer, Joceli},
  title = {Efficient Side Information Encoding for Text Hardcopy Documents},
  booktitle = {Advanced Video and Signal Based Surveillance (AVSS 2007), Proceedings of 2007 IEEE Conference on},
  publisher = {IEEE},
  year = {2007},
  pages = {552--557},
  note = {google scholar entry: IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS 2007). London, England, 5-7 September 2007.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4425370},
  doi = {10.1109/AVSS.2007.4425370}
}
Damnjanovic U, Izquierdo E and Grzegorzek M (2007), "Shot Boundary Detection using Spectral Clustering", In Proceedings of the 15th European Signal Processing Conference (EUSIPCO 2007). Poznań, Poland, September, 2007, pp. 1779-1783. European Association for Signal Processing (EURASIP).
Abstract: The daily increase in the amount of available video material has motivated significant research efforts towards the development of advanced content management systems. The first step towards semantic-based video indexing and retrieval is the detection of elementary video structures. In this paper we present an algorithm for finding shot boundaries using spectral clustering methods. Assuming that a shot boundary is a global feature of the shot rather than a local one, this paper introduces an algorithm for scene change detection based on information from the eigenvectors of a similarity matrix. Instead of utilising similarities between consecutive frames, we treat each shot as a cluster of frames. The objective function used as the criterion for spectral partitioning sums the contributions of every frame to the overall structure of the shot. It is shown in this paper that optimizing this objective function gives proper information about scene changes in the video sequence. Experiments showed that the obtained scenes can be merged to form clusters with similar content, suitable for video summarisation. Evaluation is performed on different datasets, and results are presented and discussed.
BibTeX:
@inproceedings{damnjanovic2007statistical,
  author = {Damnjanovic, Uros and Izquierdo, Ebroul and Grzegorzek, Marcin},
  editor = {Domański, Marek and Stasiński, Ryszard and Bartkowiak, Maciej},
  title = {Shot Boundary Detection using Spectral Clustering},
  booktitle = {Proceedings of the 15th European Signal Processing Conference (EUSIPCO 2007)},
  publisher = {European Association for Signal Processing (EURASIP)},
  year = {2007},
  pages = {1779--1783},
  note = {google scholar entry: 15th European Signal Processing Conference (EUSIPCO 2007). Poznan, Poland, 3-7 September 2007.},
  url = {http://www.eurasip.org/Proceedings/Eusipco/Eusipco2007/Papers/c5p-j06.pdf}
}
Damnjanovic U, Piatrik T, Djordjevic D and Izquierdo E (2007), "Video Summarisation for Surveillance and News Domain", In Semantic Multimedia. Second International Conference on Semantic and Digital Media Technologies, SAMT 2007, Genoa, Italy, December 5-7, 2007. Proceedings. Genoa, Italy Vol. 4816, pp. 99-112. Springer.
Abstract: Video summarization approaches have various fields of application, specifically related to organizing, browsing and accessing large video databases. In this paper we propose and evaluate two novel approaches for video summarization, one based on spectral methods and the other on ant-tree clustering. The overall summary creation process is broken down into two steps: detection of similar scenes and extraction of the most representative ones. While clustering approaches are used for scene segmentation, the post-processing logic merges video scenes into a subset of user-relevant scenes. In the case of the spectral approach, representative scenes are extracted following the logic that important parts of the video are related to high motion activity of segments within scenes. In the alternative approach we estimate a subset of relevant video scenes using ant-tree optimization, and in a supervised scenario certain scenes of no interest to the user are recognized and excluded from the summary. An experimental evaluation validating the feasibility and the robustness of these approaches is presented.
BibTeX:
@inproceedings{damnjanovic2007video,
  author = {Damnjanovic, Uros and Piatrik, Tomas and Djordjevic, Divna and Izquierdo, Ebroul},
  editor = {Falcidieno, Bianca and Spagnuolo, Michela and Avrithis, Yannis and Kompatsiaris, Ioannis and Buitelaar, Paul},
  title = {Video Summarisation for Surveillance and News Domain},
  booktitle = {Semantic Multimedia. Second International Conference on Semantic and Digital Media Technologies, SAMT 2007, Genoa, Italy, December 5-7, 2007. Proceedings},
  publisher = {Springer},
  year = {2007},
  volume = {4816},
  pages = {99--112},
  note = {google scholar entry: Second International Conference on Semantic and Digital Media Technologies (SAMT 2007). Genoa, Italy, December 5-7, 2007.},
  url = {http://www.mesh-ip.eu/upload/samt_2007-qmul.pdf},
  doi = {10.1007/978-3-540-77051-0_11}
}
Dong L and Izquierdo E (2007), "A Knowledge Structuring Technique for Image Classification", In Image Processing (ICIP 2007), Proceedings of the 14th International Conference on. San Antonio, TX, October, 2007. Vol. 6, pp. 377-380. IEEE.
Abstract: A system for image analysis and classification based on a knowledge structuring technique is presented. The knowledge structuring technique automatically creates a relevance map from salient areas of natural images. It also derives a set of well-structured representations from low-level descriptions to drive the final classification. The backbone of the knowledge structuring technique is a distribution mapping strategy involving two basic modules: structured low-level feature extraction using a convolutional neural network and a topology representation module based on a growing cell structure network. Classification is achieved by simulating high-level top-down visual information perception, followed by classification using an incremental Bayesian parameter estimation method. The proposed modular system architecture offers straightforward expansion to include user relevance feedback, contextual input, and multimodal information if available.
BibTeX:
@inproceedings{4379600,
  author = {Dong, Le and Izquierdo, Ebroul},
  title = {A Knowledge Structuring Technique for Image Classification},
  booktitle = {Image Processing (ICIP 2007), Proceedings of the 14th International Conference on},
  publisher = {IEEE},
  year = {2007},
  volume = {6},
  pages = {377--380},
  note = {google scholar entry: 14th International Conference on Image Processing (ICIP 2007). San Antonio, Texas, 16-19 October 2007.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4379600},
  doi = {10.1109/ICIP.2007.4379600}
}
Dong L and Izquierdo E (2007), "A Topology Preserving Approach for Image Classification", In Image Analysis for Multimedia Interactive Services (WIAMIS 2007), Proceedings of the 8th International Workshop on. Santorini, Greece, June, 2007. (7), pp. 1-4. IEEE.
Abstract: In this paper, an approach for image analysis and classification is presented. It is based on a topology preserving approach to automatically create a relevance map from salient areas in natural images. It also derives a set of well-structured representations from low-level descriptions to drive the final classification. The backbone of this approach is a distribution mapping strategy involving two basic modules: structured low-level feature extraction using a convolutional neural network and a topology preservation module based on a growing neural gas network. Classification is achieved by simulating the high-level top-down visual information perception in primates, followed by incremental Bayesian parameter estimation. The proposed modular system architecture offers straightforward expansion to include user relevance feedback, contextual input, and multimodal information if available.
BibTeX:
@inproceedings{dong2007,
  author = {Dong, Le and Izquierdo, Ebroul},
  title = {A Topology Preserving Approach for Image Classification},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2007), Proceedings of the 8th International Workshop on},
  publisher = {IEEE},
  year = {2007},
  number = {7},
  pages = {1--4},
  note = {google scholar entry: 8th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2007). Santorini, Greece, 6-8 June 2007.},
  url = {http://www.ing.unibs.it/~cost292/pubs/wiamis07/007_ATOPOLOGYPRESERVINGAPPROACH.pdf},
  doi = {10.1109/WIAMIS.2007.13}
}
Dong L and Izquierdo E (2007), "A Knowledge Synthesizing Approach for Classification of Visual Information", In Advances in Visual Information Systems. Proceedings of the 9th International Conference on Visual Information Systems (VISUAL 2007). Shanghai, China, 28-29 June 2007. Revised Selected Papers. June, 2007. Vol. 4781, pp. 17-25. Springer.
Abstract: An approach for visual information analysis and classification is presented. It is based on a knowledge synthesizing technique to automatically create a relevance map from essential areas in natural images. It also derives a set of well-structured representations from low-level description to drive the final classification. The backbone of this approach is a distribution mapping strategy involving a knowledge synthesizing module based on an intelligent growing when required network. Classification is achieved by simulating the high-level top-down visual information perception in primates followed by incremental Bayesian parameter estimation. The proposed modular system architecture offers straightforward expansion to include user relevance feedback, contextual input, and multimodal information if available.
BibTeX:
@inproceedings{dong2007knowledge,
  author = {Dong, Le and Izquierdo, Ebroul},
  editor = {Qiu, Guoping and Leung, Clement and Xue, Xiangyang and Laurini, Robert},
  title = {A Knowledge Synthesizing Approach for Classification of Visual Information},
  booktitle = {Advances in Visual Information Systems. Proceedings of the 9th International Conference on Visual Information Systems (VISUAL 2007). Shanghai, China, 28-29 June 2007. Revised Selected Papers.},
  publisher = {Springer},
  year = {2007},
  volume = {4781},
  pages = {17--25},
  note = {google scholar entry: 9th International Conference on Visual Information Systems (VISUAL 2007). Shanghai, China, 28-29 June 2007.},
  url = {http://link.springer.com/chapter/10.1007/978-3-540-76414-4_3},
  doi = {10.1007/978-3-540-76414-4_3}
}
Grzegorzek M and Izquierdo E (2007), "Statistical 3D Object Classification and Localization with Context Modeling", In Proceedings of the 15th European Signal Processing Conference (EUSIPCO 2007). Poznań, Poland, September, 2007, pp. 1585-1589. European Association for Signal Processing (EURASIP).
Abstract: This contribution presents a probabilistic approach for the automatic classification and localization of 3D objects in 2D multi-object images taken from a real-world environment. In the training phase, statistical object models and statistical context models are learned separately. For the object modeling, the recognition system extracts local feature vectors from training images using the wavelet transform and models them statistically by density functions. Since in contextual environments the a-priori probabilities for the occurrence of different objects cannot be assumed to be equal, statistical context modeling is introduced in this work. The a-priori occurrence probabilities are learned in the training phase and stored in so-called context models. In the recognition phase, the system first determines the unknown number of objects in a multi-object scene; then, object classification and localization are performed. Recognition results for experiments performed on a real dataset with 3240 test images compare the performance of the system with and without context modeling.
BibTeX:
@inproceedings{grzegorzek2007statistical,
  author = {Grzegorzek, Marcin and Izquierdo, Ebroul},
  editor = {Domański, Marek and Stasiński, Ryszard and Bartkowiak, Maciej},
  title = {Statistical 3D Object Classification and Localization with Context Modeling},
  booktitle = {Proceedings of the 15th European Signal Processing Conference (EUSIPCO 2007)},
  publisher = {European Association for Signal Processing (EURASIP)},
  year = {2007},
  pages = {1585--1589},
  note = {google scholar entry: 15th European Signal Processing Conference (EUSIPCO 2007). Poznan, Poland, 3-7 September 2007.},
  url = {http://www.eurasip.org/Proceedings/Eusipco/Eusipco2007/Papers/c3l-f02.pdf}
}
Izquierdo E, Chandramouli K, Grzegorzek M and Piatrik T (2007), "K-Space Content Management and Retrieval System", In Image Analysis and Processing Workshops (ICIAPW 2007), Proceedings of the 14th International Conference on. Modena, Italy, September, 2007, pp. 131-136. IEEE.
Abstract: In this contribution, the so-called ``K-Space content management and retrieval system'' is presented. This multimodal framework is being developed within the K-Space project. K-Space is a network of leading European research teams from academia and industry conducting integrative research and dissemination activities in semantic inference for automatic and semi-automatic annotation and retrieval of multimedia content. The system's numerous functionalities include: automatic shot detection, key-frame extraction, MPEG-7 feature extraction, multimedia content annotation, video streaming, visual search and retrieval, relevance feedback, and visual content classification. First, the system features are presented from the user's point of view. Then, a theoretical description of the algorithms behind these functionalities follows.
BibTeX:
@inproceedings{izquierdo2007k,
  author = {Izquierdo, Ebroul and Chandramouli, Krishna and Grzegorzek, Marcin and Piatrik, Tomas},
  title = {K-Space Content Management and Retrieval System},
  booktitle = {Image Analysis and Processing Workshops (ICIAPW 2007), Proceedings of the 14th International Conference on},
  publisher = {IEEE},
  year = {2007},
  pages = {131--136},
  note = {google scholar entry: 14th International Conference on Image Analysis and Processing Workshops (ICIAPW 2007). Modena, Italy, 10-14 September 2007.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4427489},
  doi = {10.1109/ICIAPW.2007.32}
}
Mrak M and Izquierdo E (2007), "Spatially Adaptive Wavelet Transform for Video Coding with Multi-Scale Motion Compensation", In Image Processing (ICIP 2007), Proceedings of the 14th International Conference on. San Antonio, TX, October, 2007. Vol. 2, pp. 317-320. IEEE.
Abstract: In this paper a technique that enables efficient synthesis of the prediction signal for application in multi-scale motion compensation is presented. The technique targets the prediction of high-pass spatial subbands for motion compensation at higher scales. Since in the targeted framework these subbands are obtained by high-pass filtering of the prediction signal in the pixel domain, an adaptive approach to filtering is proposed to support the decomposition of differently predicted frame areas. In this way, an efficient application of different prediction modes at all scales used for compensation is enabled. Experimental results show that for fast sequences, where such an application of different prediction modes is crucial, the proposed adaptive transform introduces significant objective and visual improvements.
BibTeX:
@inproceedings{mrak2007spatially,
  author = {Mrak, Marta and Izquierdo, Ebroul},
  title = {Spatially Adaptive Wavelet Transform for Video Coding with Multi-Scale Motion Compensation},
  booktitle = {Image Processing (ICIP 2007), Proceedings of the 14th International Conference on},
  publisher = {IEEE},
  year = {2007},
  volume = {2},
  pages = {317--320},
  note = {google scholar entry: 14th International Conference on Image Processing (ICIP 2007). San Antonio, Texas, 16-19 October 2007.},
  url = {http://nguyendangbinh.org/Proceedings/ICIP/2007/pdfs/0200317.pdf},
  doi = {10.1109/ICIP.2007.4379156}
}
Passino G and Izquierdo E (2007), "Conditional Random Fields for High-Level Part Correlation Analysis in Images", In Semantic Multimedia. Second International Conference on Semantic and Digital Media Technologies, SAMT 2007, Genoa, Italy, December 5-7, 2007. Proceedings. Genoa, Italy, December, 2007. Vol. 4816, pp. 264-267. Springer.
Abstract: A novel approach to modelling the semantic knowledge associated with objects detected in images is presented. The model is aimed at the classification of such objects according to contextual information combined with the extracted features. The system is based on Conditional Random Fields, a probabilistic graphical model used to model the conditional a-posteriori probability of the object classes, thus avoiding problems related to source modelling and feature independence constraints. The novelty of the approach lies in addressing the high-level, semantically rich interrelationships among image parts. This paper presents the application of the model to this new problem class and a first implementation of the system.
BibTeX:
@inproceedings{passino2007conditional,
  author = {Passino, Giuseppe and Izquierdo, Ebroul},
  editor = {Falcidieno, Bianca and Spagnuolo, Michela and Avrithis, Yannis and Kompatsiaris, Ioannis and Buitelaar, Paul},
  title = {Conditional Random Fields for High-Level Part Correlation Analysis in Images},
  booktitle = {Semantic Multimedia. Second International Conference on Semantic and Digital Media Technologies, SAMT 2007, Genoa, Italy, December 5-7, 2007. Proceedings},
  publisher = {Springer},
  year = {2007},
  volume = {4816},
  pages = {264--267},
  note = {google scholar entry: Second International Conference on Semantic and Digital Media Technologies (SAMT 2007). Genoa, Italy, 5-7 December 2007.},
  url = {http://link.springer.com/chapter/10.1007/978-3-540-77051-0_30},
  doi = {10.1007/978-3-540-77051-0_30}
}
Passino G and Izquierdo E (2007), "Patch-based Image Classification through Conditional Random Field Model", In Proceedings of the 3rd International Conference on Mobile Multimedia Communications. Nafpaktos, Greece, August, 2007. (6), pp. 1-6. ICST.
Abstract: We present an image classification system based on a Conditional Random Field (CRF) model trained on simple features obtained from a small number of semantically representative image patches. CRFs are very powerful for handling complex part dependencies, since they directly model the probability of the classes conditioned on the evidence data, and they have been applied successfully to image classification and segmentation tasks in the presence of a large number of low-level features. In this paper an agile system based on the application of CRFs to coarsely segmented images is introduced. The main advantage of the system is a reduction in the required training time, at the cost of a slight sacrifice in classification accuracy. The model implementation is described, experimental results are presented and conclusions are drawn.
BibTeX:
@inproceedings{passino2007patch,
  author = {Passino, Giuseppe and Izquierdo, Ebroul},
  editor = {Dagiuklas, Tasos and Sklavos, Nicolas},
  title = {Patch-based Image Classification through Conditional Random Field Model},
  booktitle = {Proceedings of the 3rd International Conference on Mobile Multimedia Communications},
  publisher = {ICST},
  year = {2007},
  number = {6},
  pages = {1--6},
  note = {google scholar entry: 3rd International Conference on Mobile Multimedia Communications (MobiMedia 2007). Nafpaktos, Greece, 27-29 August 2007.},
  url = {https://sites.google.com/site/zeppethefake/publications/mobimedia07.pdf}
}
Ramzan N and Izquierdo E (2007), "Optimal Joint Source Channel Coding for Scalable Video Transmission over Wireless Channels", In Proceedings of the 15th European Signal Processing Conference (EUSIPCO 2007). Poznań, Poland, September, 2007, pp. 678-682. European Association for Signal Processing (EURASIP).
Abstract: In this paper, a robust and novel approach for optimal bit allocation between source and channel coding is proposed. The proposed approach consists of a wavelet-based scalable video coding framework and a forward error correction method based on the serial concatenation of LDPC codes and turbo codes. Turbo codes show good performance at low signal-to-noise ratios, but LDPC codes outperform turbo codes at high signal-to-noise ratios, so the concatenation of the two enhances performance at both low and high signal-to-noise ratios. The scheme reduces the video distortion at the decoder under bandwidth constraints. The reduction is achieved by efficiently protecting the different quality layers from channel errors. Furthermore, an efficient decoding algorithm is proposed that reduces the complexity of the channel decoder. Experimental results clearly show that the proposed approach outperforms conventional forward error correction techniques.
BibTeX:
@inproceedings{ramzan2007optimal,
  author = {Ramzan, Naeem and Izquierdo, Ebroul},
  editor = {Domański, Marek and Stasiński, Ryszard and Bartkowiak, Maciej},
  title = {Optimal Joint Source Channel Coding for Scalable Video Transmission over Wireless Channels},
  booktitle = {Proceedings of the 15th European Signal Processing Conference (EUSIPCO 2007).},
  publisher = {European Association for Signal Processing (EURASIP)},
  year = {2007},
  pages = {678--682},
  note = {google scholar entry: 15th European Signal Processing Conference (EUSIPCO 2007). Poznan, Poland, 3-7 September 2007.},
  url = {http://www.eurasip.org/Proceedings/Eusipco/Eusipco2007/Papers/a5p-k05.pdf}
}
Ramzan N, Wan S and Izquierdo E (2007), "An Efficient Joint Source-Channel Coding for Wavelet Based Scalable Video", In Circuits and Systems (ISCAS 2007), Proceedings of the 40th IEEE International Symposium on. New Orleans, Louisiana, May, 2007, pp. 1505-1508.
Abstract: A robust and efficient approach for scalable video transmission over wireless channels is presented. The proposed approach jointly optimizes source and channel coding in order to minimize the overall end-to-end distortion. In particular, a forward error correction method based on turbo codes is considered. Aiming at improving the overall performance of the underlying joint source-channel coding, the combination of channel coding rate, interleaver and packet size is optimized for turbo codes subject to a constraint on the overall transmission bitrate budget. Experimental results show that the proposed approach outperforms conventional forward error correction techniques at all bit error rates, even in very adverse conditions.
BibTeX:
@inproceedings{ramzan2007efficient,
  author = {Ramzan, Naeem and Wan, Shuai and Izquierdo, Ebroul},
  title = {An Efficient Joint Source-Channel Coding for Wavelet Based Scalable Video},
  booktitle = {Circuits and Systems (ISCAS 2007), Proceedings of the 40th IEEE International Symposium on},
  year = {2007},
  pages = {1505--1508},
  note = {google scholar entry: 40th IEEE International Symposium on Circuits and Systems (ISCAS 2007). New Orleans, Louisiana, 27-30 May 2007.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4252936},
  doi = {10.1109/ISCAS.2007.378589}
}
Ramzan N, Wan S and Izquierdo E (2007), "Error Robustness Scheme for Scalable Video Based on the Concatenation of LDPC and Turbo Codes", In Image Processing (ICIP 2007), Proceedings of the 14th International Conference on. San Antonio, TX, October, 2007. Vol. 6, pp. 521-524. IEEE.
Abstract: In this paper, a novel approach for the transmission of scalable video over wireless channels is proposed. The proposed approach jointly optimises the bit allocation between a wavelet-based scalable video coding framework and forward error correction codes. The forward error correction is based on the serial concatenation of LDPC codes and turbo codes. Turbo codes show good performance in the high error-rate region, but LDPC codes outperform turbo codes at low error rates, so the concatenation of the two enhances performance at both low and high signal-to-noise ratios. The scheme minimizes the reconstructed video distortion at the decoder subject to a constraint on the overall transmission bitrate budget. The minimization is achieved by exploiting the source rate-distortion characteristics and the statistics of the available codes. Furthermore, an efficient decoding algorithm is proposed. Experimental results clearly demonstrate the superiority of the proposed approach over conventional forward error correction techniques.
BibTeX:
@inproceedings{ramzan2007error,
  author = {Ramzan, Naeem and Wan, Shuai and Izquierdo, Ebroul},
  title = {Error Robustness Scheme for Scalable Video Based on the Concatenation of LDPC and Turbo Codes},
  booktitle = {Image Processing (ICIP 2007), Proceedings of the 14th International Conference on},
  publisher = {IEEE},
  year = {2007},
  volume = {6},
  pages = {521--524},
  note = {google scholar entry: 14th International Conference on Image Processing (ICIP 2007). San Antonio, Texas, 16-19 October 2007.},
  url = {http://nguyendangbinh.org/Proceedings/ICIP/2007/pdfs/0600521.pdf},
  doi = {10.1109/ICIP.2007.4379636}
}
Tameze C, Vincelette R, Melikechi N, Zeljkovic V and Izquierdo E (2007), "Empirical Analysis of LIBS Images for Ovarian Cancer Detection", In Image Analysis for Multimedia Interactive Services (WIAMIS 2007), Proceedings of the 8th International Workshop on. Santorini, Greece, June, 2007. (76), pp. 1-4. IEEE.
Abstract: To develop a way to detect early epithelial ovarian cancer, we expose blood samples to a laser. Using laser-induced breakdown spectroscopy (LIBS), plasma images of the blood samples are generated and analyzed. In this paper we compare the images from blood specimens of cancer-free mice to those of transgenic mice.
BibTeX:
@inproceedings{tameze2007empirical,
  author = {Tameze, Claude and Vincelette, Robert and Melikechi, Noureddine and Zeljkovic, Vesna and Izquierdo, Ebroul},
  title = {Empirical Analysis of LIBS Images for Ovarian Cancer Detection},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2007), Proceedings of the 8th International Workshop on},
  publisher = {IEEE},
  year = {2007},
  number = {76},
  pages = {1--4},
  note = {google scholar entry: 8th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2007). Santorini, Greece, 6-8 June 2007.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4279184},
  doi = {10.1109/WIAMIS.2007.40}
}
Wall J, McDaid LJ, Maguire LP and McGinnity TM (2007), "A spiking neural network implementation of sound localisation", In Proceedings of the 15th IET Irish Signals and Systems Conference (ISSC 2007). June, 2007, pp. 19-23. IET.
Abstract: The focus of this paper is the implementation of a spiking neural network to achieve sound localisation; the model is based on the influential short paper by Jeffress in 1948. The SNN has a two-layer topology which can accommodate a limited number of angles in the azimuthal plane. The model accommodates multiple inter-neuron connections with associated delays, and a supervised STDP algorithm is applied to select the optimal pathway for sound localisation. An analysis of previous relevant work in the area of auditory modelling also supports this research.
BibTeX:
@inproceedings{wall2007spiking,
  author = {Wall, Julie and McDaid, Liam J. and Maguire, Liam P. and McGinnity, Thomas M.},
  title = {A spiking neural network implementation of sound localisation},
  booktitle = {Proceedings of the 15th IET Irish Signals and Systems Conference (ISSC 2007)},
  publisher = {IET},
  year = {2007},
  pages = {19--23},
  note = {google scholar entry: 15th IET Irish Signals and Systems Conference (ISSC 2007). Galway, Ireland, 18-19 June 2007.},
  url = {http://www.eecs.qmul.ac.uk/~juliew/pub/papers/ISSC2007.pdf}
}
Wilkins P, Adamek T, Byrne D, Jones GJF, Lee H, Keenan G, McGuinness K, O'Connor NE, Smeaton AF, Amin A, Obrenovic Z, Benmokhtar R, Galmar E, Huet B, Essid S, Landais R, Vallet F, Papadopoulos GT, Vrochidis S, Mezaris V, Kompatsiaris I, Spyrou E, Avrithis Y, Mörzinger R, Schallauer P, Bailer W, Piatrik T, Chandramouli K, Izquierdo E, Haller M, Goldmann L, Samour A, Cobet A, Sikora T and Praks P (2007), "K-Space at TRECVid 2007", In TRECVID 2007 workshop participants notebook papers. Gaithersburg, Maryland, November, 2007, pp. 1-12. National Institute of Standards and Technology (NIST).
Abstract: In this paper we describe K-Space participation in TRECVid 2007. K-Space participated in two tasks, high-level feature extraction and interactive search. We present our approaches for each of these activities and provide a brief analysis of our results. Our high-level feature submission utilized multi-modal low-level features which included visual, audio and temporal elements. Specific concept detectors (such as Face detectors) developed by K-Space partners were also used. We experimented with different machine learning approaches including logistic regression and support vector machines (SVM). Finally we also experimented with both early and late fusion for feature combination. This year we also participated in interactive search, submitting 6 runs. We developed two interfaces which both utilized the same retrieval functionality. Our objective was to measure the effect of context, which was supported to different degrees in each interface, on user performance. The first of the two systems was a `shot' based interface, where the results from a query were presented as a ranked list of shots. The second interface was `broadcast' based, where results were presented as a ranked list of broadcasts. Both systems made use of the outputs of our high-level feature submission as well as low-level visual features.
BibTeX:
@inproceedings{wilkins2007k,
  author = {Wilkins, Peter and Adamek, Tomasz and Byrne, Daragh and Jones, Gareth J. F. and Lee, Hyowon and Keenan, Gordon and McGuinness, Kevin and O'Connor, Noel E. and Smeaton, Alan F. and Amin, Alia and Obrenovic, Zeljko and Benmokhtar, Rachid and Galmar, Eric and Huet, Benoit and Essid, Slim and Landais, Rémi and Vallet, Félicien and Papadopoulos, Georgios Th. and Vrochidis, Stefanos and Mezaris, Vasileios and Kompatsiaris, Ioannis and Spyrou, Evaggelos and Avrithis, Yannis and Mörzinger, Roland and Schallauer, Peter and Bailer, Werner and Piatrik, Tomas and Chandramouli, Krishna and Izquierdo, Ebroul and Haller, Martin and Goldmann, Lutz and Samour, Amjad and Cobet, Andreas and Sikora, Thomas and Praks, Pavel},
  title = {K-Space at TRECVid 2007},
  booktitle = {TRECVID 2007 workshop participants notebook papers},
  publisher = {National Institute of Standards and Technology (NIST)},
  year = {2007},
  pages = {1--12},
  note = {google scholar entry: 5th TRECVID Workshop (TRECVID 2007). Gaithersburg, Maryland, 5-6 November 2007.},
  url = {http://www-nlpir.nist.gov/projects/tvpubs/tv7.papers/kspace.pdf}
}
Wilkins P, Adamek T, Byrne D, Jones GJF, Lee H, Keenan G, McGuinness K, O'Connor NE, Smeaton AF, Amin A, Obrenovic Z, Benmokhtar R, Galmar E, Huet B, Essid S, Landais R, Vallet F, Papadopoulos GT, Vrochidis S, Mezaris V, Kompatsiaris I, Spyrou E, Avrithis YS, Mörzinger R, Schallauer P, Bailer W, Piatrik T, Chandramouli K, Izquierdo E, Haller M, Goldmann L, Samour A, Cobet A, Sikora T and Praks P (2007), "K-Space at TRECVid 2007", In TRECVid 2007 workshop participants notebook papers. Gaithersburg, Maryland, November, 2007, pp. 1-12. NIST.
Abstract: In this paper we describe K-Space participation in TRECVid 2007. K-Space participated in two tasks, high-level feature extraction and interactive search. We present our approaches for each of these activities and provide a brief analysis of our results. Our high-level feature submission utilized multi-modal low-level features which included visual, audio and temporal elements. Specific concept detectors (such as Face detectors) developed by K-Space partners were also used. We experimented with different machine learning approaches including logistic regression and support vector machines (SVM). Finally we also experimented with both early and late fusion for feature combination. This year we also participated in interactive search, submitting 6 runs. We developed two interfaces which both utilized the same retrieval functionality. Our objective was to measure the effect of context, which was supported to different degrees in each interface, on user performance. The first of the two systems was a `shot' based interface, where the results from a query were presented as a ranked list of shots. The second interface was `broadcast' based, where results were presented as a ranked list of broadcasts. Both systems made use of the outputs of our high-level feature submission as well as low-level visual features.
BibTeX:
@inproceedings{Wilkins2007,
  author = { Peter Wilkins and Tomasz Adamek and Daragh Byrne and Gareth J. F. Jones and Hyowon Lee and Gordon Keenan and Kevin McGuinness and Noel E. O'Connor and Alan F. Smeaton and Alia Amin and Zeljko Obrenovic and Rachid Benmokhtar and Eric Galmar and Benoit Huet and Slim Essid and Rémi Landais and Félicien Vallet and Georgios Th. Papadopoulos and Stefanos Vrochidis and Vasileios Mezaris and Ioannis Kompatsiaris and Evaggelos Spyrou and Yannis S. Avrithis and Roland Mörzinger and Peter Schallauer and Werner Bailer and Tomas Piatrik and Krishna Chandramouli and Ebroul Izquierdo and Martin Haller and Lutz Goldmann and Amjad Samour and Andreas Cobet and Thomas Sikora and Pavel Praks },
  editor = {Over, Paul and Awad, George and Kraaij, Wessel and Smeaton, Alan F.},
  title = {K-Space at TRECVid 2007},
  booktitle = {TRECVid 2007 workshop participants notebook papers},
  publisher = {NIST},
  year = {2007},
  pages = {1--12},
  note = {google scholar entry: TRECVid 2007 Text REtrieval Conference (TRECVid Workshop). Gaithersburg, Maryland, 5-6 November 2007},
  url = {http://doras.dcu.ie/432/}
}
Yang F, Wan S and Izquierdo E (2007), "Lagrange Multiplier Selection for 3-D Wavelet Based Scalable Video Coding", In Image Processing (ICIP 2007), Proceedings of the 14th International Conference on. San Antonio, Texas, October, 2007. Vol. 2, pp. 309-312. IEEE.
Abstract: In this paper a thorough analysis on the theoretical rate distortion model and the rate distortion performance in an open-loop structure is conducted. A Lagrange multiplier selection for 3D wavelet based scalable video coding is then derived. The proposed Lagrange multiplier is adaptive with respect to the characteristics of video content. Furthermore, it is especially suitable for 3-D wavelet based scalable video coding where quantisation steps are unavailable. Extensive experimental results have demonstrated the effectiveness of the proposed Lagrange multiplier selection.
BibTeX:
@inproceedings{yang2007lagrange,
  author = {Yang, Fuzheng and Wan, Shuai and Izquierdo, Ebroul},
  title = {Lagrange Multiplier Selection for 3-D Wavelet Based Scalable Video Coding},
  booktitle = {Image Processing (ICIP 2007), Proceedings of the 14th International Conference on},
  publisher = {IEEE},
  year = {2007},
  volume = {2},
  pages = {309--312},
  note = {google scholar entry: 14th International Conference on Image Processing (ICIP 2007). San Antonio, Texas, 16-19 October 2007.},
  url = {http://nguyendangbinh.org/Proceedings/ICIP/2007/pdfs/0200309.pdf},
  doi = {10.1109/ICIP.2007.4379154}
}
Yang F, Wan S and Izquierdo E (2007), "Optimised Motion Estimation for Robust Video Coding in Packet Loss Environment", In Image Analysis for Multimedia Interactive Services (WIAMIS 2007), Proceedings of the 8th International Workshop on. Santorini, Greece, June, 2007. (50), pp. 1-4. IEEE.
Abstract: An improved motion estimation method for robust video coding is proposed. It is designed to enhance packet loss resilience in lossy environments. The proposed motion estimation is based on an end-to-end rate-distortion optimisation framework. Initially, the end-to-end reconstructed distortion is efficiently estimated considering the distortion caused by quantisation, error propagation, and error concealment. Then, the total bit rate for coding the residual is estimated using a quadratic rate-distortion model. The results are incorporated into the Lagrangian optimisation during motion estimation. Optimised motion vectors are selected in the sense of coding efficiency and robustness. A comparative evaluation with conventional robust video coding techniques was conducted. The experimental results demonstrate a superior performance of the proposed method.
BibTeX:
@inproceedings{yang2007optimised,
  author = {Yang, Fuzheng and Wan, Shuai and Izquierdo, Ebroul},
  title = {Optimised Motion Estimation for Robust Video Coding in Packet Loss Environment},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2007), Proceedings of the 8th International Workshop on},
  publisher = {IEEE},
  year = {2007},
  number = {50},
  pages = {1--4},
  note = {google scholar entry: 8th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2007). Santorini, Greece, 6-8 June 2007.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4279184},
  doi = {10.1109/WIAMIS.2007.65}
}
Ye X, Lin X, Beddoe G and Dehmeshki J (2007), "Efficient Computer-Aided Detection of Ground-Glass Opacity Nodules in Thoracic CT Images", In Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE. Lyon, France, August, 2007, pp. 4449-4452. IEEE.
Abstract: In this paper, an efficient computer-aided detection method is proposed for detecting ground-glass opacity (GGO) nodules in thoracic CT images. GGOs represent a clinically important type of lung nodule which is ignored by many existing CAD systems. Anti-geometric diffusion is used as preprocessing to remove image noise. Geometric shape features (such as shape index and dot enhancement) are calculated for each voxel within the lung area to extract potential nodule concentrations. Rule-based filtering is then applied to remove false positive regions. The proposed method has been validated on a clinical dataset of 50 thoracic CT scans containing 52 GGO nodules. A total of 48 nodules were correctly detected, resulting in an average detection rate of 92.3%, with the number of false positives at approximately 12.7/scan (0.07/slice). The high detection performance of the method suggests promising potential for clinical applications.
BibTeX:
@inproceedings{ye2007efficient,
  author = {Ye, Xujiong and Lin, Xinyu and Beddoe, Gareth and Dehmeshki, Jamshid},
  title = {Efficient Computer-Aided Detection of Ground-Glass Opacity Nodules in Thoracic CT Images},
  booktitle = {Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE},
  publisher = {IEEE},
  year = {2007},
  pages = {4449--4452},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4353326},
  doi = {10.1109/IEMBS.2007.4353326}
}
Zgaljic T, Mrak M and Izquierdo E (2007), "Optimised Compression Strategy in Wavelet-Based Video Coding using Improved Context Models", In Image Processing, 2007. ICIP 2007. IEEE International Conference on. San Antonio, Texas, October, 2007. Vol. 3, pp. 401-404. IEEE.
Abstract: Accurate probability estimation is key to efficient compression in the entropy coding phase of state-of-the-art video coding systems. Probability estimation can be enhanced if the contexts in which symbols occur are used during the probability estimation phase. However, these contexts have to be carefully designed in order to avoid negative effects. Methods that use tree structures to model contexts of various syntax elements have proven efficient in image and video coding. In this paper we use such a structure to build optimised contexts for application in scalable wavelet-based video coding. With the proposed approach, contexts are designed separately for intra-coded frames and motion-compensated frames, considering varying statistics across different spatio-temporal subbands. Moreover, contexts are designed separately for different bit-planes. Comparison with compression using fixed contexts from embedded ZeroBlock coding (EZBC) has been performed, showing improvements when context modelling on tree structures is applied.
BibTeX:
@inproceedings{Zgaljic2007,
  author = {Toni Zgaljic and Marta Mrak and Ebroul Izquierdo},
  title = {Optimised Compression Strategy in Wavelet-Based Video Coding using Improved Context Models},
  booktitle = {Image Processing, 2007. ICIP 2007. IEEE International Conference on},
  publisher = {IEEE},
  year = {2007},
  volume = {3},
  pages = {401--404},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4379331},
  doi = {10.1109/ICIP.2007.4379331}
}
Zgaljic T, Mrak M and Izquierdo E (2007), "Influence of Downsampling Filters Characteristics on Compression Performance in Scalable Video Coding", In Visual Information Engineering (VIE 2007), Proceedings of the 4th International Conference on. London, England, July, 2007, pp. 119-129.
Abstract: The application of different downsampling filters in video coding directly models visual information at lower resolutions and influences the compression performance of a chosen coding system. In wavelet-based scalable video coding, spatial scalability is achieved by applying wavelets as downsampling filters. However, the characteristics of different wavelets influence the performance at the targeted spatio-temporal decoding points. In this paper an analysis of different downsampling filters in popular wavelet-based scalable video coding schemes is presented. Evaluation is performed for both intra and inter coding schemes using wavelets and standard downsampling strategies. Based on the obtained results, a new concept of inter-resolution prediction is proposed which maximises the performance at all decoding points using a combination of standard downsampling filters and wavelet-based coding.
BibTeX:
@inproceedings{zgaljic2007influence,
  author = {Zgaljic, Toni and Mrak, Marta and Izquierdo, Ebroul},
  title = {Influence of Downsampling Filters Characteristics on Compression Performance in Scalable Video Coding},
  booktitle = {Visual Information Engineering (VIE 2007), Proceedings of the 4th International Conference on},
  year = {2007},
  pages = {119--129},
  note = {google scholar entry: 4th International Conference on Visual Information Engineering (VIE 2007). London, England, 25-27 July 2007.}
}
Zgaljic T, Mrak M and Izquierdo E (2007), "Optimised Compression Strategy in Wavelet-Based Video Coding using Improved Context Models", In Image Processing (ICIP 2007), Proceedings of the 14th International Conference on. San Antonio, TX, October, 2007. Vol. 3, pp. 401-404. IEEE.
Abstract: Accurate probability estimation is key to efficient compression in the entropy coding phase of state-of-the-art video coding systems. Probability estimation can be enhanced if the contexts in which symbols occur are used during the probability estimation phase. However, these contexts have to be carefully designed in order to avoid negative effects. Methods that use tree structures to model contexts of various syntax elements have proven efficient in image and video coding. In this paper we use such a structure to build optimised contexts for application in scalable wavelet-based video coding. With the proposed approach, contexts are designed separately for intra-coded frames and motion-compensated frames, considering varying statistics across different spatio-temporal subbands. Moreover, contexts are designed separately for different bit-planes. Comparison with compression using fixed contexts from embedded ZeroBlock coding (EZBC) has been performed, showing improvements when context modelling on tree structures is applied.
BibTeX:
@inproceedings{zgaljic2007optimised,
  author = {Zgaljic, Toni and Mrak, Marta and Izquierdo, Ebroul},
  title = {Optimised Compression Strategy in Wavelet-Based Video Coding using Improved Context Models},
  booktitle = {Image Processing (ICIP 2007), Proceedings of the 14th International Conference on},
  publisher = {IEEE},
  year = {2007},
  volume = {3},
  pages = {401--404},
  note = {google scholar entry: 14th International Conference on Image Processing (ICIP 2007). San Antonio, Texas, 16-19 October 2007.},
  url = {http://ieeexplore.ieee.org//xpl/articleDetails.jsp?arnumber=4379331},
  doi = {10.1109/ICIP.2007.4379331}
}
Zgaljic T, Mrak M and Izquierdo E (2007), "Towards Optimised Context Selection in Scalable Wavelet Based Video Coding", In Proceedings of the 15th European Signal Processing Conference (EUSIPCO 2007). Poznań, Poland, September, 2007, pp. 1412-1416. European Association for Signal Processing (EURASIP).
Abstract: The effectiveness of arithmetic coding in image and video compression depends on probability estimation of symbols to be encoded. Considering context in which these symbols occur can lead to improved compression. In wavelet-based coding same contexts are usually used for all wavelet subbands of same type across different scales of wavelet transform. Efficiency of different strategies for context selection has not yet been fully analysed for application in scalable wavelet-based video coding. In this paper an algorithm for context modelling based on tree structure is adapted for optimisation of contexts for symbols generated during Embedded ZeroBlock Coding (EZBC) of wavelet coefficients. With the proposed technique optimised contexts have been adaptively obtained for different wavelet subbands and EZBC quadtree levels. Comparison with predefined context models, which are used in common approach, shows that context models based on tree structure together with advanced modelling strategy significantly improve overall compression.
BibTeX:
@inproceedings{zgaljic2007towards,
  author = {Zgaljic, Toni and Mrak, Marta and Izquierdo, Ebroul},
  editor = {Domański, Marek and Stasiński, Ryszard and Bartkowiak, Maciej},
  title = {Towards Optimised Context Selection in Scalable Wavelet Based Video Coding},
  booktitle = {Proceedings of the 15th European Signal Processing Conference (EUSIPCO 2007)},
  publisher = {European Association for Signal Processing (EURASIP)},
  year = {2007},
  pages = {1412--1416},
  note = {google scholar entry: 15th European Signal Processing Conference (EUSIPCO 2007). Poznan, Poland, 3-7 September 2007.},
  url = {http://www.eurasip.org/Proceedings/Eusipco/Eusipco2007/Papers/c2l-f01.pdf}
}
Zgaljic T, Mrak M and Izquierdo E (2007), "User Driven Systems to Bridge the Semantic Gap", In Proceedings of the 15th European Signal Processing Conference (EUSIPCO 2007). Poznań, Poland, September, 2007, pp. 718-722. European Association for Signal Processing (EURASIP).
Abstract: In this tutorial, relevant developments in user-driven image annotation and retrieval are reviewed. These include descriptive learning models, from empirical parameter adaptation to approaches considering optimisation and complex parametric as well as non-parametric distributions. The review also covers discriminative learning models, which focus on estimating the boundaries between classes rather than the exact class distributions. A new approach to infer semantic concepts in images is also described. This approach draws on several important ideas, including a multi-feature space, learning techniques pertaining to user-provided relevance, and object-based modelling to link semantic terms and visual objects.
BibTeX:
@inproceedings{zgaljic2007towards2,
  author = {Zgaljic, Toni and Mrak, Marta and Izquierdo, Ebroul},
  editor = {Domański, Marek and Stasiński, Ryszard and Bartkowiak, Maciej},
  title = {User Driven Systems to Bridge the Semantic Gap},
  booktitle = {Proceedings of the 15th European Signal Processing Conference (EUSIPCO 2007)},
  publisher = {European Association for Signal Processing (EURASIP)},
  year = {2007},
  pages = {718--722},
  note = {google scholar entry: 15th European Signal Processing Conference (EUSIPCO 2007). Poznan, Poland, 3-7 September 2007.},
  url = {http://www.eurasip.org/Proceedings/Eusipco/Eusipco2007/Papers/b1l-b03.pdf}
}
Zhang Q, Chandramouli K, Damnjanovic U, Piatrik T, Izquierdo E, Corvaglia M, Adami N, Leonardi R, Yakin G, Aksoy S, Naci SU, Hanjalic A, Vrochidis S, Moumtzidou A, Nikolopoulos S, Mezaris V, Makris L, Kompatsiaris I, Mansencal B, Benois-Pineau J, Esen E, Alatan AA, Spyrou E, Kapsalas P, Tolias G, Mylonas P, Avrithis YS, Reljin B, Zajic G, Pinheiro AMG, Alexandre LA, Almeida P, Jarina R, Kuba M, Aginako N and Goya J (2007), "The COST292 experimental framework for TRECVID 2007", In TRECVID 2007 workshop participants notebook papers. Gaithersburg, Maryland, November, 2007, pp. 1-16. National Institute of Standards and Technology (NIST).
Abstract: In this paper, we give an overview of the four tasks submitted to TRECVID 2007 by COST292. In shot boundary (SB) detection task, four SB detectors have been developed and the results are merged using two merging algorithms. The framework developed for the high-level feature extraction task comprises four systems. The first system transforms a set of low-level descriptors into the semantic space using Latent Semantic Analysis and utilises neural networks for feature detection. The second system uses a Bayesian classifier trained with a ``bag of subregions''. The third system uses a multi-modal classifier based on SVMs and several descriptors. The fourth system uses two image classifiers based on ant colony optimisation and particle swarm optimisation respectively. The system submitted to the search task is an interactive retrieval application combining retrieval functionalities in various modalities with a user interface supporting automatic and interactive search over all queries submitted. Finally, the rushes task submission is based on a video summarisation and browsing system comprising two different interest curve algorithms and three features.
BibTeX:
@inproceedings{zhang2007cost292,
  author = {Zhang, Qianni and Chandramouli, Krishna and Damnjanovic, Uros and Piatrik, Tomas and Izquierdo, Ebroul and Corvaglia, Marzia and Adami, Nicola and Leonardi, Riccardo and Yakin, G. and Aksoy, Selim and Naci, Suphi Umut and Hanjalic, Alan and Vrochidis, Stefanos and Moumtzidou, Anastasia and Nikolopoulos, Spiros and Mezaris, Vasileios and Makris, Lambros and Kompatsiaris, Ioannis and Mansencal, Boris and Benois-Pineau, Jenny and Esen, Ersin and Alatan, A. Aydın and Spyrou, Evaggelos and Kapsalas, P. and Tolias, Giorgos and Mylonas, Phivos and Avrithis, Yannis S. and Reljin, Branimir and Zajic, Goran and Pinheiro, António M. G. and Alexandre, L. A. and Almeida, P. and Jarina, Roman and Kuba, Michal and Aginako, Naiara and Goya, J.},
  title = {The COST292 experimental framework for TRECVID 2007},
  booktitle = {TRECVID 2007 workshop participants notebook papers},
  publisher = {National Institute of Standards and Technology (NIST)},
  year = {2007},
  pages = {1--16},
  note = {google scholar entry: 5th TRECVID Workshop (TRECVID 2007). Gaithersburg, Maryland, 5-6 November 2007.},
  url = {http://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.7.org.html}
}
Zhang Q and Izquierdo E (2007), "Adaptive Salient Block Based Image Retrieval in Multi-Feature Space", In Content-Based Multimedia Indexing (CBMI 2007), Proceedings of the 5th International Workshop on. Bordeaux, France, June, 2007, pp. 106-113. IEEE.
Abstract: In this paper, an approach to tackle the object based image retrieval problem is proposed. The core technique is designed to adaptively and efficiently locate the salient block of objects of interest in each image. The salient blocks are then used as cues for representing the whole images when semantic-based searching is performed. Relevance Feedback is seamlessly integrated in the retrieval process. In each iteration, the user is requested to select images relevant to the query concept. Salient blocks of the selected images are used as training examples. To guarantee the accuracy of salient block matching, the similarities of block regions are calculated within an optimised concept-specific multi-feature space. In the multi-feature space, it is expected that the visual patterns of objects of interest can be effectively discriminated from irrelevant regions. This multi-feature space metric is learned from a group of representative salient blocks using a multi-objective optimisation approach. An empirical assessment of the proposed technique was conducted. Selected results show good performance of the proposed approach.
BibTeX:
@inproceedings{zhang2007adaptive2,
  author = {Zhang, Qianni and Izquierdo, Ebroul},
  title = {Adaptive Salient Block Based Image Retrieval in Multi-Feature Space},
  booktitle = {Content-Based Multimedia Indexing (CBMI 2007), Proceedings of the 5th International Workshop on},
  publisher = {IEEE},
  year = {2007},
  pages = {106--113},
  note = {google scholar entry: 5th International Workshop on Content-Based Multimedia Indexing (CBMI 2007). Bordeaux, France, 25-27 June 2007.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4275062},
  doi = {10.1109/CBMI.2007.385399}
}
Zhang Q and Izquierdo E (2007), "Context Inference in Region-based Image Retrieval", In Semantic Media Adaptation and Personalization (SMAP 2007), Second International Workshop on. London, England, December, 2007, pp. 187-192.
Abstract: In this paper, a method for inference of high-level semantic information for image annotation and retrieval is proposed. Bayesian theory is used as a tool to model a belief network to configure semantic labels for image regions. These semantic labels for regions are obtained from a multi visual feature-based object detection approach. The aim is to model potential semantic descriptions of basic objects in the images, the dependencies between them, and the conditional probabilities involved in those dependencies. This information is then used to calculate the probabilities of the effects that those objects have on each other in order to obtain more precise and meaningful semantic labels for the whole images. However, the proposed method is not restricted to the specific region-based approach used in this paper. Rather, the proposed method can be applied in any region-based image retrieval systems. Selected experimental results are presented to show the improved retrieval performance of the proposed method.
BibTeX:
@inproceedings{zhang2007context,
  author = {Zhang, Qianni and Izquierdo, Ebroul},
  editor = {Mylonas, Phivos and Wallace, Manolis and Angelides, Marios},
  title = {Context Inference in Region-based Image Retrieval},
  booktitle = {Semantic Media Adaptation and Personalization (SMAP 2007), Second International Workshop on},
  year = {2007},
  pages = {187--192},
  note = {google scholar entry: 2nd International Workshop on Semantic Media Adaptation and Personalization (SMAP 2007). London, England, 17-18 December 2007.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4414408},
  doi = {10.1109/SMAP.2007.48}
}

Theses and Monographs

Zhang Q (2007), "Multi-feature Space Optimisation and Semantic Inference for Visual Information Retrieval". Thesis at: Queen Mary University of London.
BibTeX:
@phdthesis{zhang2007multi,
  author = {Zhang, Qianni},
  title = {Multi-feature Space Optimisation and Semantic Inference for Visual Information Retrieval},
  school = {Queen Mary University of London},
  year = {2007},
  note = {google scholar: fix error in title}
}


2006

Journal Papers

Djordjevic D and Izquierdo E (2006), "Kernels in structured multi-feature spaces for image retrieval", Electronics Letters. July, 2006. Vol. 42(15), pp. 856-857. IET.
Abstract: A new kernel for structured multi-feature spaces is introduced. It exploits the diversity of information encapsulated in different features. The mathematical validity of the introduced kernel is proven in the context of a conventional convex optimisation problem for support vector machines. Computer simulations show high performance for classification of images.
BibTeX:
@article{djordjevic2006kernels,
  author = {Djordjevic, Divna and Izquierdo, Ebroul},
  title = {Kernels in structured multi-feature spaces for image retrieval},
  journal = {Electronics Letters},
  publisher = {IET},
  year = {2006},
  volume = {42},
  number = {15},
  pages = {856--857},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1683658},
  doi = {10.1049/el:20061454}
}
Dorado A, Djordjevic D, Pedrycz W and Izquierdo E (2006), "Efficient image selection for concept learning", IEE Proceedings - Vision, Image and Signal Processing. June, 2006. Vol. 153(3), pp. 263-273. IET.
Abstract: In semantic-based image classification, learning concepts from features is an ongoing challenge for researchers and practitioners in different communities such as pattern recognition, machine learning and image analysis, among others. Concepts are used to add knowledge to the image descriptions linking high- and low-level numerical interpretation of the image content. Augmented descriptions are useful to perform more 'intelligent' processing on large-scale image databases. The semantic component casts the classification into the supervised or learning-from-examples paradigm, in which the classifier obtains knowledge by generalising specific facts presented in a number of design samples (or training patterns). Consequently, selection of suitable samples becomes a critical design step. The introduced framework exploits the capability of support vector classifiers to learn from a relatively small number of patterns. Classifiers make decisions based on low-level descriptions containing only some image content information (e.g. colour, texture, shape). Therefore there is a clear drawback in collecting image samples by just using random visual observation and ignoring any low-level feature similarity. Moreover, this sort of approach could lead to sub-optimal training data sets. The presented framework uses unsupervised learning to organise images based on low-level similarity, in an effort to assist a professional annotator in picking positive and negative samples for a given concept. Active learning to refine the classifier model follows this initial design step. The framework shows promising results as an efficient approach to selecting design samples for semantic image description and classification.
BibTeX:
@article{dorado2006efficient,
  author = {Dorado, Andres and Djordjevic, Divna and Pedrycz, Witold and Izquierdo, Ebroul},
  title = {Efficient image selection for concept learning},
  journal = {IEE Proceedings - Vision, Image and Signal Processing},
  publisher = {IET},
  year = {2006},
  volume = {153},
  number = {3},
  pages = {263--273},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1633693},
  doi = {10.1049/ip-vis:20050057}
}
Izquierdo E (2006), "Knowledge-based digital media processing", IEE Proceedings - Vision, Image and Signal Processing. June, 2006. Vol. 153(3), pp. 253-254. IET.
Abstract: This Special Section presents work on integrative research aimed at low-level analysis, classification and semantic-based structuring of digital media. Most of the work presented in this section has originated in two large international cooperative projects funded by the European Commission under the sixth framework programme of the Information Society Technology: aceMedia and COST292. The mandate and scope of these two projects is very generic, embracing several applications that rely on technology for bridging the gap.
BibTeX:
@article{izquierdo2006knowledge,
  author = {Izquierdo, Ebroul},
  title = {Knowledge-based digital media processing},
  journal = {IEE Proceedings - Vision, Image and Signal Processing},
  publisher = {IET},
  year = {2006},
  volume = {153},
  number = {3},
  pages = {253--254},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/izquierdo2006knowledge.pdf},
  doi = {10.1049/ip-vis:20069007}
}
Pantic M and Patras I (2006), "Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences", Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on. April, 2006. Vol. 36(2), pp. 433-449. IEEE.
Abstract: Automatic analysis of human facial expression is a challenging problem with many applications. Most of the existing automated systems for facial expression analysis attempt to recognize a few prototypic emotional expressions, such as anger and happiness. Instead of representing another approach to machine analysis of prototypic facial expressions of emotion, the method presented in this paper attempts to handle a large range of human facial behavior by recognizing facial muscle actions that produce expressions. Virtually all of the existing vision systems for facial muscle action detection deal only with frontal-view face images and cannot handle temporal dynamics of facial actions. In this paper, we present a system for automatic recognition of facial action units (AUs) and their temporal models from long, profile-view face image sequences. We exploit particle filtering to track 15 facial points in an input face-profile sequence, and we introduce facial-action-dynamics recognition from continuous video input using temporal rules. The algorithm performs both automatic segmentation of an input video into facial expressions pictured and recognition of temporal segments (i.e. onset, apex, offset) of 27 AUs occurring alone or in a combination in the input face-profile video. A recognition rate of 87% is achieved.
BibTeX:
@article{pantic2006dynamics,
  author = {Pantic, Maja and Patras, Ioannis},
  title = {Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences},
  journal = {Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on},
  publisher = {IEEE},
  year = {2006},
  volume = {36},
  number = {2},
  pages = {433--449},
  url = {http://www.eecs.qmul.ac.uk/~ioannisp/pubs/ecopies/PanticPatras-PPHB-SMCB-2005-FINAL.pdf},
  doi = {10.1109/TSMCB.2005.859075}
}
Rodriguez R, Castillo PJ, Guerra V, Suárez AG and Izquierdo E (2006), "Two Robust Techniques for Segmentation of Biomedical Images", Computación y Sistemas. Vol. 9(4), pp. 355-369. UNAM.
Abstract: Image segmentation plays an important role in many computer vision systems. According to the criteria of many authors, segmentation finishes when it satisfies the goals of the observer. For that reason, no single method is able to solve all the problems that exist at present. In this work, we carry out a comparison between two segmentation techniques, namely the mean shift, for which we give a new algorithm, and spectral methods. Through examples with real biomedical images, we discuss the advantages and disadvantages of each.
BibTeX:
@article{rodriguez2006two,
  author = {Rodriguez, Roberto and Castillo, Patricio J. and Guerra, Valia and Suárez, Ana G. and Izquierdo, Ebroul},
  title = {Two Robust Techniques for Segmentation of Biomedical Images},
  journal = {Computación y Sistemas},
  publisher = {UNAM},
  year = {2006},
  volume = {9},
  number = {4},
  pages = {355--369},
  note = {google scholar author list: Rodríguez, Roberto; Castillo, Patricio J; Guerra, Valia; Suárez, Ana G.; Izquierdo, Ebroul},
  url = {http://scielo.unam.mx/pdf/cys/v9n4/v9n4a6.pdf}
}
Trujillo M and Izquierdo E (2006), "Improving the Efficiency of a Least Median of Squares Schema for the Estimation of the Fundamental Matrix", International Journal of Pattern Recognition and Artificial Intelligence. August, 2006. Vol. 20(5), pp. 633-648. World Scientific.
Abstract: A robust and efficient approach to estimate the fundamental matrix is proposed. The main goal is to reduce the computational cost involved in the estimation when robust schemas are applied. The backbone of the proposed technique is the conventional Least Median of Squares (LMedS) technique. It is well known that the LMedS is one of the most robust regressors for highly contaminated data and unstable models. Unfortunately, its computational complexity renders it useless for practical applications. To overcome this problem, a small number of low-dimensionality least-squares problems are solved using well-selected subsets of the input data. The results of this initial approach are fed into the LMedS schema, which is applied to recover the final estimate of the fundamental matrix. The complexity is substantially reduced by applying a selection process based on an effective statistical analysis of the inherent correlation of the input data. This analysis is used to define a suitable clustering of the data and to drive the subset selection, aiming at the reduction of the search space in the LMedS schema. It is shown that, by avoiding redundancies, better estimates can be obtained while keeping the computational cost low. Selected results of computer experiments conducted to assess the performance of the proposed technique are reported.
BibTeX:
@article{trujillo2006improving,
  author = {Trujillo, Maria and Izquierdo, Ebroul},
  title = {Improving the Efficiency of a Least Median of Squares Schema for the Estimation of the Fundamental Matrix},
  journal = {International Journal of Pattern Recognition and Artificial Intelligence},
  publisher = {World Scientific},
  year = {2006},
  volume = {20},
  number = {5},
  pages = {633--648},
  url = {http://www.worldscientific.com/doi/abs/10.1142/S0218001406004922},
  doi = {10.1142/S0218001406004922}
}

Books and Chapters in Books

Izquierdo E (2006), "Fragile Watermarking for Image Authentication", In Multimedia Watermarking Techniques And Applications. Boca Raton, Florida , pp. 229-256. CRC Press.
BibTeX:
@incollection{furht2006multimedia,
  author = {Izquierdo, Ebroul},
  editor = {Furht, Borko and Kirovski, Darko},
  title = {Fragile Watermarking for Image Authentication},
  booktitle = {Multimedia Watermarking Techniques And Applications},
  publisher = {CRC Press},
  year = {2006},
  pages = {229--256},
  url = {http://www.crcpress.com/product/isbn/9780849372131}
}
Mrak M and Izquierdo E (2006), "Scalable Video Coding", In Encyclopedia of Multimedia, pp. 759-765. Springer.
BibTeX:
@incollection{mrak2006scalable,
  author = {Marta Mrak and Ebroul Izquierdo},
  editor = {Furht, Borko},
  title = {Scalable Video Coding},
  booktitle = {Encyclopedia of Multimedia},
  publisher = {Springer},
  year = {2006},
  pages = {759--765},
  doi = {10.1007/0-387-30038-4_207}
}

Conference Papers

Calic J, Kraemer P, Naci SU, Vrochidis S, Aksoy S, Zhang Q, Benois-Pineau J, Saracoglu A, Doulaverakis C, Jarina R, Campbell N, Mezaris V, Kompatsiaris I, Spyrou E, Koumoulos G, Avrithis Y, Dalkilic A, Alatan AA, Hanjalic A and Izquierdo E (2006), "COST292 experimental framework for TRECVID 2006", In 4th TRECVID Workshop. Gaithersburg, Maryland. NIST.
Abstract: In this paper we give an overview of the four TRECVID tasks submitted by COST292, a European network of institutions in the area of semantic multimodal analysis and retrieval of digital video media. Initially, we present a shot boundary evaluation method based on results merged using a confidence measure. The two SB detectors used here, one from the Technical University of Delft and one from LaBRI, University of Bordeaux 1, are presented, followed by a description of the merging algorithm. The high-level feature extraction task comprises three separate systems. The first system, developed by the National Technical University of Athens (NTUA), utilises a set of MPEG-7 low-level descriptors and Latent Semantic Analysis to detect the features. The second system, developed by Bilkent University, uses a Bayesian classifier trained with a ``bag of subregions'' for each keyframe. The third system, by the Middle East Technical University (METU), exploits textual information in the video using character recognition methodology. The system submitted to the search task is an interactive retrieval application developed by Queen Mary University of London, the University of Zilina and ITI from Thessaloniki, combining basic retrieval functionalities in various modalities (i.e. visual, audio, textual) with a user interface supporting the submission of queries using any combination of the available retrieval tools and the accumulation of relevant retrieval results over all queries submitted by a single user during a specified time interval. Finally, the rushes task submission comprises a video summarisation and browsing system specifically designed to intuitively and efficiently present rushes material in a video production environment. This system is the result of joint work by the University of Bristol, the Technical University of Delft and LaBRI, University of Bordeaux 1.
BibTeX:
@inproceedings{Calic2006,
  author = {Calic, Janko and Kraemer, Petra and Naci, Suphi Umut and Vrochidis, Stefanos and Aksoy, Selim and Zhang, Qianni and Benois-Pineau, Jenny and Saracoglu, Ahmet and Doulaverakis, Charalampos and Jarina, Roman and Campbell, Neill and Mezaris, Vasileios and Kompatsiaris, Ioannis and Spyrou, Evaggelos and Koumoulos, George and Avrithis, Yannis and Dalkilic, A and Alatan, A. Aydın and Hanjalic, Alan and Izquierdo, Ebroul},
  title = {COST292 experimental framework for TRECVID 2006},
  booktitle = {4th TRECVID Workshop},
  publisher = {NIST},
  year = {2006},
  url = {http://www-nlpir.nist.gov/projects/tvpubs/tv6.papers/cost292.pdf}
}
Calic J, Kraemer P, Naci SU, Vrochidis S, Aksoy S, Zhang Q, Benois-Pineau J, Saracoglu A, Doulaverakis C, Jarina R, Campbell N, Mezaris V, Kompatsiaris I, Spyrou E, Koumoulos G, Avrithis Y, Dalkilic A, Alatan AA, Hanjalic A and Izquierdo E (2006), "COST292 experimental framework for TRECVID 2006", In TRECVID 2006 workshop participants notebook papers. Gaithersburg, Maryland, November, 2006, pp. 1-15. National Institute of Standards and Technology (NIST).
Abstract: In this paper we give an overview of the four TRECVID tasks submitted by COST292, a European network of institutions in the area of semantic multimodal analysis and retrieval of digital video media. Initially, we present a shot boundary evaluation method based on results merged using a confidence measure. The two SB detectors used here, one from the Technical University of Delft and one from LaBRI, University of Bordeaux 1, are presented, followed by a description of the merging algorithm. The high-level feature extraction task comprises three separate systems. The first system, developed by the National Technical University of Athens (NTUA), utilises a set of MPEG-7 low-level descriptors and Latent Semantic Analysis to detect the features. The second system, developed by Bilkent University, uses a Bayesian classifier trained with a ``bag of subregions'' for each keyframe. The third system, by the Middle East Technical University (METU), exploits textual information in the video using character recognition methodology. The system submitted to the search task is an interactive retrieval application developed by Queen Mary University of London, the University of Zilina and ITI from Thessaloniki, combining basic retrieval functionalities in various modalities (i.e. visual, audio, textual) with a user interface supporting the submission of queries using any combination of the available retrieval tools and the accumulation of relevant retrieval results over all queries submitted by a single user during a specified time interval. Finally, the rushes task submission comprises a video summarisation and browsing system specifically designed to intuitively and efficiently present rushes material in a video production environment. This system is the result of joint work by the University of Bristol, the Technical University of Delft and LaBRI, University of Bordeaux 1.
BibTeX:
@inproceedings{calic2006cost292,
  author = {Calic, Janko and Kraemer, Petra and Naci, Suphi Umut and Vrochidis, Stefanos and Aksoy, Selim and Zhang, Qianni and Benois-Pineau, Jenny and Saracoglu, Ahmet and Doulaverakis, Charalampos and Jarina, Roman and Campbell, Neill and Mezaris, Vasileios and Kompatsiaris, Ioannis and Spyrou, Evaggelos and Koumoulos, George and Avrithis, Yannis and Dalkilic, A. and Alatan, A. Aydın and Hanjalic, Alan and Izquierdo, Ebroul},
  title = {COST292 experimental framework for TRECVID 2006},
  booktitle = {TRECVID 2006 workshop participants notebook papers},
  publisher = {National Institute of Standards and Technology (NIST)},
  year = {2006},
  pages = {1--15},
  note = {google scholar entry: 4th TRECVID Workshop (TRECVID 2006). Gaithersburg, Maryland, 13-14 November 2006.},
  url = {http://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.6.org.html}
}
Chandramouli K, Djordjevic D and Izquierdo E (2006), "Binary Particle Swarm and Fuzzy Inference for Image Classification", In Proceedings of the 3rd IET International Conference on Visual Information Engineering (VIE 2006). Innovation and Creativity in Visual Media Processing and Graphics. Bangalore, India, September, 2006, pp. 126-131. IET.
Abstract: In this paper, Binary Particle Swarm and Fuzzy Inference are considered for the image classification problem. A combination of binary particle swarms and fuzzy inference logic is applied to the classifier output produced with self-organizing maps and particle swarm optimization. Several MPEG-7 colour and texture descriptors are used in this paper. The proposed approach improves the precision of the retrieved images while keeping high values for recall.
BibTeX:
@inproceedings{chandramouli2006binary,
  author = {Chandramouli, Krishna and Djordjevic, Divna and Izquierdo, Ebroul},
  title = {Binary Particle Swarm and Fuzzy Inference for Image Classification},
  booktitle = {Proceedings of the 3rd IET International Conference on Visual Information Engineering (VIE 2006). Innovation and Creativity in Visual Media Processing and Graphics.},
  publisher = {IET},
  year = {2006},
  pages = {126--131},
  note = {google scholar entry: 3rd IET International Conference on Visual Information Engineering (VIE 2006). Innovation and Creativity in Visual Media Processing and Graphics. Bangalore, India, 26-28 September 2006.},
  url = {http://digital-library.theiet.org/content/conferences/10.1049/cp_20060515},
  doi = {10.1049/cp:20060515}
}
Chandramouli K and Izquierdo E (2006), "Image Classification using Chaotic Particle Swarm Optimization", In Image Processing (ICIP 2006), Proceedings of the 13th IEEE International Conference on. Atlanta, Georgia, October, 2006, pp. 3001-3004. IEEE.
Abstract: Particle swarm optimization is one of several meta-heuristic algorithms inspired by biological systems. The chaotic modeling of particle swarm optimization is presented in this paper with application to image classification. The performance of this modified particle swarm optimization algorithm is compared with that of standard particle swarm optimization. The numerical comparative study is performed on binary classes of images from the Corel dataset.
BibTeX:
@inproceedings{chandramouli2006image,
  author = {Chandramouli, Krishna and Izquierdo, Ebroul},
  title = {Image Classification using Chaotic Particle Swarm Optimization},
  booktitle = {Image Processing (ICIP 2006), Proceedings of the 13th IEEE International Conference on},
  publisher = {IEEE},
  year = {2006},
  pages = {3001-3004},
  note = {google scholar entry: 13th International Conference on Image Processing (ICIP 2006). Atlanta, Georgia, 8-11 October 2006.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4107201},
  doi = {10.1109/ICIP.2006.312968}
}
Chandramouli K and Izquierdo E (2006), "Image classification using self organizing feature maps and particle swarm optimization", In Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2006), Proceedings of the 7th European. April, 2006, pp. 313-316.
BibTeX:
@inproceedings{chandramouli2006image2,
  author = {Chandramouli, Krishna and Izquierdo, Ebroul},
  title = {Image classification using self organizing feature maps and particle swarm optimization},
  booktitle = {Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2006), Proceedings of the 7th European},
  year = {2006},
  pages = {313--316},
  note = {google scholar entry: 7th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2006). Incheon, Korea, 19-21 April 2006.}
}
Damnjanovic I and Izquierdo E (2006), "Capacity Enhancement of Compressed Domain Watermarking Channel Using Duo-binary Coding", In Digital Watermarking. 5th International Workshop, IWDW 2006, Jeju Island, Korea, November 8-10, 2006. Proceedings. Jeju Island, Korea, November, 2006. (4283), pp. 162-176. Springer.
Abstract: One of the main goals of watermarking is to optimize capacity while preserving high video fidelity. This paper describes a watermarking scheme based on the spread spectrum paradigm, with capacity enhanced using a state-of-the-art error correction technique, duo-binary turbo coding. A new watermark composition with a novel bit-wise interleaving scheme for the watermark bits and bit-rate control at the macro-block level is proposed. In previous works, the perceptual watermark adjustment was mainly based on the Watson Just Noticeable Difference (JND) model. A new JND estimation model based on block classification is presented. In addition, experimental results on perceptibility and robustness to transcoding are reported.
BibTeX:
@inproceedings{damnjanovic2006capacity,
  author = {Damnjanovic, Ivan and Izquierdo, Ebroul},
  editor = {Shi, YunQing and Jeon, Byeungwoo},
  title = {Capacity Enhancement of Compressed Domain Watermarking Channel Using Duo-binary Coding},
  booktitle = {Digital Watermarking. 5th International Workshop, IWDW 2006, Jeju Island, Korea, November 8-10, 2006. Proceedings.},
  publisher = {Springer},
  year = {2006},
  number = {4283},
  pages = {162--176},
  note = {google scholar entry: 5th International Workshop on Digital Watermarking (IWDW 2006). Jeju Island, Korea, 8-10 November 2006.},
  url = {http://link.springer.com/chapter/10.1007/11922841_14},
  doi = {10.1007/11922841_14}
}
Damnjanovic I and Izquierdo E (2006), "Perceptual Watermarking Using Just Noticeable Difference Model Based on Block Classification", In Proceedings of the 2nd international conference on mobile multimedia communications (MobiMedia 2006). Alghero, Sardinia, September, 2006. (36), pp. 1-5. ACM.
Abstract: One of the main goals of watermarking is to optimize capacity while preserving high video fidelity. The perceptual adjustment of the watermark is mainly based on the Watson Just Noticeable Difference (JND) model. Recently, it was proposed to improve the Watson model using the block classification inherent to the encoder in a compressed stream. Although the new model outperforms the previous one, especially in increasing the watermark power in textured blocks, it still underestimates JNDs at block edges. This work presents a detailed comparison of these two models and proposes a new method that exploits the good characteristics of both. In addition, experimental results on perceptibility are reported.
BibTeX:
@inproceedings{damnjanovic2006perceptual,
  author = {Damnjanovic, Ivan and Izquierdo, Ebroul},
  title = {Perceptual Watermarking Using Just Noticeable Difference Model Based on Block Classification},
  booktitle = {Proceedings of the 2nd international conference on mobile multimedia communications (MobiMedia 2006)},
  publisher = {ACM},
  year = {2006},
  number = {36},
  pages = {1--5},
  note = {google scholar entry: 2nd international conference on mobile multimedia communications (MobiMedia 2006). Alghero, Sardinia, 18-20 September 2006.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Conference/MobiMedia2006_IvanDamnjanovic.pdf},
  doi = {10.1145/1374296.1374335}
}
Damnjanovic I, Ramzan N and Izquierdo E (2006), "MPEG2 Watermarking Channel Protection Using Duo-Binary Turbo Codes", In Acoustics Speech and Signal Processing (ICASSP 2006), Proceedings of the 31st IEEE International Conference on. Toulouse, France, May, 2006. Vol. 2, pp. 301-304. IEEE.
Abstract: This paper describes a scheme for protection of the watermarking channel and its capacity enhancement using a state-of-the-art error correction technique, turbo coding. Duo-binary codes were used for protection since they perform better than classical turbo codes in terms of convergence of iterative decoding, minimum distance and computational cost. A spread spectrum watermarking technique is used to insert the watermark. The proposed pseudo-random spreading of the watermark bits, amplitude adjustment in the DCT domain based on block classification and bit-rate preservation increased the signal-to-noise ratio of the watermarking channel. However, it was essential to introduce an error correction technique in order to achieve high capacity and robustness. In addition, experimental results on robustness to transcoding are presented.
BibTeX:
@inproceedings{damnjanovic2006mpeg2,
  author = {Damnjanovic, Ivan and Ramzan, Naeem and Izquierdo, Ebroul},
  title = {MPEG2 Watermarking Channel Protection Using Duo-Binary Turbo Codes},
  booktitle = {Acoustics Speech and Signal Processing (ICASSP 2006), Proceedings of the 31st IEEE International Conference on},
  publisher = {IEEE},
  year = {2006},
  volume = {2},
  pages = {301--304},
  note = {google scholar entry: 31st International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006). Toulouse, France, 14-19 May 2006.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1660339},
  doi = {10.1109/ICASSP.2006.1660339}
}
Djordjevic D and Izquierdo E (2006), "Empirical Analysis of Descriptor Spaces and Metrics for Image Classification", In Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2006), Proceedings of the 7th European. April, 2006, pp. 1-4. COST292.
BibTeX:
@inproceedings{djordjevic2006empirical,
  author = {Djordjevic, Divna and Izquierdo, Ebroul},
  title = {Empirical Analysis of Descriptor Spaces and Metrics for Image Classification},
  booktitle = {Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2006), Proceedings of the 7th European},
  publisher = {COST292},
  year = {2006},
  pages = {1--4},
  note = {google scholar entry: 7th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2006). Incheon, Korea, 19-21 April 2006.},
  url = {http://www.ing.unibs.it/~cost292/pubs/wiamis06/WIAMIS-06-QMUL.pdf}
}
Djordjevic D and Izquierdo E (2006), "Relevance Feedback for Image Retrieval in Structured Multi-Feature Spaces", In Proceedings of the 2nd international conference on mobile multimedia communications (MobiMedia 2006). Alghero, Sardinia, September, 2006. (25), pp. 1-5. ACM.
Abstract: An approach for content-based image retrieval with relevance feedback based on a structured multi-feature space is proposed. It uses a novel kernel for merging multiple feature subspaces into a complementary space. The kernel exploits nature of the data by assigning appropriate weights for each feature set. The weights are dynamically adapted to user preferences in a relevance feedback scenario.
BibTeX:
@inproceedings{djordjevic2006relevance,
  author = {Djordjevic, Divna and Izquierdo, Ebroul},
  title = {Relevance Feedback for Image Retrieval in Structured Multi-Feature Spaces},
  booktitle = {Proceedings of the 2nd international conference on mobile multimedia communications (MobiMedia 2006)},
  publisher = {ACM},
  year = {2006},
  number = {25},
  pages = {1--5},
  note = {google scholar entry: 2nd international conference on mobile multimedia communications (MobiMedia 2006). Alghero, Sardinia, 18-20 September 2006.},
  url = {http://doi.acm.org/10.1145/1374296.1374323},
  doi = {10.1145/1374296.1374323}
}
Izquierdo E (2006), "Knowledge-Based Image Processing for Classification and Recognition in Surveillance Applications", In Image Processing (ICIP 2006), Proceedings of the 13th IEEE International Conference on. Atlanta, Georgia, October, 2006, pp. 2377-2380. IEEE.
Abstract: This short paper serves as preface for the ICIP special session on knowledge-based image processing for classification in surveillance applications. This special session presents work on integrative research aimed at automatic classification and semantic-based recognition of scenes and events for surveillance applications. The focus is integrative research on low-level multimedia analysis, knowledge extraction and semantic analysis. The eight papers selected for the special session target convergence of these fields by integrating, for a purpose, what can be disparate disciplines. The integration covers image processing, knowledge representation, information retrieval and semantic analysis. The targeted application scenario involves the processing of input data captured by video-cameras where security is of interest. The work presented in this special session has originated in two large international cooperative projects funded by the European commission under the sixth framework programme of the Information Society Technologies (IST). The mandate and scope of these two projects is very generic, embracing several applications that rely on technology for bridging the gap between low-level content descriptions that can be computed automatically by a machine and the richness and subjectivity of semantics in high-level human interpretations of audiovisual media (i.e. the semantic gap). However, the selection presented in these conference proceedings is restricted to research work undertaken under more limited scenarios for specific video-based surveillance.
BibTeX:
@inproceedings{izquierdo2006knowledge2,
  author = {Izquierdo, Ebroul},
  title = {Knowledge-Based Image Processing for Classification and Recognition in Surveillance Applications},
  booktitle = {Image Processing (ICIP 2006), Proceedings of the 13th IEEE International Conference on},
  publisher = {IEEE},
  year = {2006},
  pages = {2377--2380},
  note = {google scholar entry: 13th International Conference on Image Processing (ICIP 2006). Atlanta, Georgia, 8-11 October 2006.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4107045},
  doi = {10.1109/ICIP.2006.312904}
}
Izquierdo E and Djordjevic D (2006), "Using Relevance Feedback to Bridge the Semantic Gap", In Adaptive Multimedia Retrieval: User, Context, and Feedback. Proceedings of the 3rd International Workshop on Adaptive Multimedia Retrieval (AMR 2005). Revised Selected Papers. Glasgow, Scotland, July, 2006. Vol. 3877, pp. 19-34. Springer.
Abstract: In this article relevant developments in relevance feedback based image annotation and retrieval are reported. A new approach to infer semantic concepts representing meaningful objects in images is also described. The proposed technique combines user relevance feedback and underlying low-level properties of elementary building blocks making up semantic objects in images. Images are regarded as mosaics made of small building blocks featuring good representations of colour, texture and edgeness. The approach is based on accurate classification of these building blocks. Once this has been achieved, a signature for the object of concern is built. It is expected that this signature features a high discrimination power and consequently it becomes very suitable to find other images containing the same semantic object. The model combines fuzzy clustering and relevance feedback in the training stage, and uses fuzzy support vector machines in the generalization stage.
BibTeX:
@inproceedings{izquierdo2005using,
  author = {Izquierdo, Ebroul and Djordjevic, Divna},
  editor = {Detyniecki, Marcin and Jose, Joemon M. and Nürnberger, Andreas and van Rijsbergen, Cornelis Joost},
  title = {Using Relevance Feedback to Bridge the Semantic Gap},
  booktitle = {Adaptive Multimedia Retrieval: User, Context, and Feedback. Proceedings of the 3rd International Workshop on Adaptive Multimedia Retrieval (AMR 2005). Revised Selected Papers.},
  publisher = {Springer},
  year = {2006},
  volume = {3877},
  pages = {19--34},
  note = {google scholar entry: 3rd International Workshop on Adaptive Multimedia Retrieval (AMR 2005). Glasgow, Scotland, 28-29 July 2005.},
  url = {http://link.springer.com/chapter/10.1007/11670834_2},
  doi = {10.1007/11670834_2}
}
Mrak M, Sprljan N and Izquierdo E (2006), "Evaluation of Techniques for Modeling of Layered Motion Structure", In Image Processing (ICIP 2006), Proceedings of the 13th IEEE International Conference on. Atlanta, Georgia, October, 2006, pp. 1905-1908. IEEE.
Abstract: Motion information scalability is important for scalable bit-stream adaptation at low bit-rates, when the motion rate occupies a significant portion of the total bit-rate. This type of scalability can be achieved by a layered representation of motion block partitioning and predictive coding of the associated motion vectors across these layers. So far, several approaches for creating a layered motion structure targeting quality scalability have been proposed, and in this paper their accuracy is evaluated. For that purpose, optimal motion models have been found. It is shown that simple evaluation of the reconstruction error at the encoder side improves suboptimal modeling techniques.
BibTeX:
@inproceedings{mrak2006evaluation,
  author = {Mrak, Marta and Sprljan, Nikola and Izquierdo, Ebroul},
  title = {Evaluation of Techniques for Modeling of Layered Motion Structure},
  booktitle = {Image Processing (ICIP 2006), Proceedings of the 13th IEEE International Conference on},
  publisher = {IEEE},
  year = {2006},
  pages = {1905--1908},
  note = {google scholar entry: 13th International Conference on Image Processing (ICIP 2006). Atlanta, Georgia, 8-11 October 2006.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4106927},
  doi = {10.1109/ICIP.2006.313140}
}
Piatrik T, Chandramouli K and Izquierdo E (2006), "Image Classification Using Biologically Inspired Systems", In Proceedings of the 2nd International Conference on Mobile Multimedia Communications (MobiMedia 2006). Alghero, Sardinia, September, 2006. Vol. 324(28), pp. 1-5. ACM.
Abstract: In this paper the problem of image classification based on biologically inspired optimization systems is addressed. Recent developments in applied and heuristic optimization have been strongly influenced and inspired by natural and biological systems. The findings of recent studies show strong evidence that some aspects of the collaborative behavior of social animals such as ants and birds can be applied to solve specific problems in science and engineering. Two algorithms based on this paradigm, Ant Colony Optimization and Particle Swarm Optimization, are investigated in this paper. A comparative evaluation of the techniques recently developed by the authors for optimizing COP-K-means and Self-Organizing Feature Maps for the application of binary image classification is presented. The precision and recall results are used as the metrics of comparison for both classifiers.
BibTeX:
@inproceedings{piatrik2006image,
  author = {Piatrik, Tomas and Chandramouli, Krishna and Izquierdo, Ebroul},
  editor = {Atzori, Luigi and Izquierdo, Ebroul},
  title = {Image Classification Using Biologically Inspired Systems},
  booktitle = {Proceedings of the 2nd International Conference on Mobile Multimedia Communications (MobiMedia 2006)},
  publisher = {ACM},
  year = {2006},
  volume = {324},
  number = {28},
  pages = {1--5},
  note = {google scholar entry: 2nd International Conference On Mobile Multimedia Communications (MobiMedia 2006). Alghero, Sardinia, 18-20 September 2006.},
  url = {http://dl.acm.org/citation.cfm?id=1374326},
  doi = {10.1145/1374296.1374326}
}
Piatrik T and Izquierdo E (2006), "Image Classification Using an Ant Colony Optimization Approach", In Semantic Multimedia, Proceedings of the First International Conference on Semantic and Digital Media Technologies. Athens, Greece, December, 2006. Vol. 4306, pp. 159-168. Springer.
Abstract: Automatic semantic clustering of image databases is a very challenging research problem. Clustering is the unsupervised classification of patterns (data items or feature vectors) into groups (clusters). Clustering algorithms usually employ a similarity measure in order to partition the database such that data points in the same partition are more similar than points in different partitions. In this paper, Ant Colony Optimization (ACO) and its learning mechanism are integrated with the K-means approach to solve image classification problems. Our simulation results show that the proposed method makes K-means less dependent on initial parameters such as randomly chosen initial cluster centers. Selected results from experiments with the proposed method on two different image databases are presented.
BibTeX:
@inproceedings{piatrik2006imageclassification,
  author = {Piatrik, Tomas and Izquierdo, Ebroul},
  editor = {Avrithis, Yannis S. and Kompatsiaris, Yiannis and Staab, Steffen and O'Connor, Noel E.},
  title = {Image Classification Using an Ant Colony Optimization Approach},
  booktitle = {Semantic Multimedia, Proceedings of the First International Conference on Semantic and Digital Media Technologies},
  publisher = {Springer},
  year = {2006},
  volume = {4306},
  pages = {159--168},
  note = {google scholar entry: Semantic Multimedia, First International Conference on Semantics and Digital Media Technologies (SAMT 2006). Athens, Greece, 6-8 December 2006.},
  url = {http://link.springer.com/chapter/10.1007%2F11930334_13},
  doi = {10.1007/11930334_13}
}
Ramzan N and Izquierdo E (2006), "An Efficient Protection Scheme for Scalable Video Transmission over Unreliable Wireless Channels", In Image Analysis for Multimedia Interactive Services (WIAMIS 2006), Proceedings of the 7th International Workshop on. Seoul, Korea, April, 2006, pp. 261-264.
BibTeX:
@inproceedings{ramzan2006anchannels,
  author = {Ramzan, Naeem and Izquierdo, Ebroul},
  title = {An Efficient Protection Scheme for Scalable Video Transmission over Unreliable Wireless Channels},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2006), Proceedings of the 7th International Workshop on},
  year = {2006},
  pages = {261--264},
  note = {google scholar entry: 7th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2006). Seoul, Korea, 19-21 April 2006.}
}
Ramzan N and Izquierdo E (2006), "Double-Binary Turbo Code for Transmission of Scalable Video Bitstream", In Proceedings ELMAR 2006. Zadar, Croatia, June, 2006, pp. 301-304. IEEE.
Abstract: A novel low bit rate video transmission scheme is proposed for the communication of scalable video bitstreams over wireless channels. The scheme consists of a motion compensated t+2D wavelet decomposition scalable coder and a double binary turbo code. Double binary codes are used for protection since they perform better than classical turbo coders in terms of convergence of iterative decoding, minimum distance and computational cost. An efficient method to estimate the minimum distance of double binary turbo codes is also proposed. The best packet lengths are evaluated by measuring the minimum distance of the turbo code so as to adapt the scalable bitstream with unequal error protection. Simulation results show that the unequal error protection is quite efficient, even in very adverse conditions, and it clearly outperforms simple FEC techniques.
BibTeX:
@inproceedings{ramzan2006double,
  author = {Ramzan, Naeem and Izquierdo, Ebroul},
  editor = {Grgić, Mislav and Grgić, Sonja},
  title = {Double-Binary Turbo Code for Transmission of Scalable Video Bitstream},
  booktitle = {Proceedings ELMAR 2006},
  publisher = {IEEE},
  year = {2006},
  pages = {301--304},
  note = {google scholar entry: 48th International Symposium focused on Multimedia Signal Processing and Communications (ELMAR2006). Zadar, Croatia, 7-9 June 2006.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4127544},
  doi = {10.1109/ELMAR.2006.329571}
}
Ramzan N and Izquierdo E (2006), "Joint Source-Channel Coding for Scalable Video Atomic Bitstream by Adaptive Turbo Code", In Proceedings of the 2nd international conference on mobile multimedia communications (MobiMedia 2006). Alghero, Sardinia, September, 2006. (12), pp. 1-4. ACM.
Abstract: Joint source and channel coding enables efficient transmission of embedded video bitstreams over unreliable channels. In this paper, we propose an efficient technique for joint source and channel coding. The technique consists of a fully scalable video coder based on motion compensated spatio-temporal wavelet decomposition, combined with turbo coding. The scalable atomic bitstream is adapted with unequal error protection according to the relevance of the video quality layers in the scalable bitstream. An adaptive turbo code is used to switch between simple equal error protection and the proposed unequal error protection technique. Experimental results show that the proposed approach outperforms conventional forward error correction techniques at medium to low signal-to-noise ratios. It also significantly improves the performance of end-to-end transmission at all signal-to-noise ratios.
BibTeX:
@inproceedings{ramzan2006joint,
  author = {Ramzan, Naeem and Izquierdo, Ebroul},
  title = {Joint Source-Channel Coding for Scalable Video Atomic Bitstream by Adaptive Turbo Code},
  booktitle = {Proceedings of the 2nd international conference on mobile multimedia communications (MobiMedia 2006)},
  publisher = {ACM},
  year = {2006},
  number = {12},
  pages = {1--4},
  note = {google scholar entry: 2nd international conference on mobile multimedia communications (MobiMedia 2006). Alghero, Sardinia, 18-20 September 2006.},
  url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.95.9552&rep=rep1&type=pdf},
  doi = {10.1145/1374296.1374300}
}
Ramzan N and Izquierdo E (2006), "Robust Scalable Video Transmission using Adaptive Double Binary Turbo Code", In Proceedings of the IET International Conference on Visual Information Engineering (VIE 2006). Innovation and Creativity in Visual Media Processing and Graphics. Bangalore, India, September, 2006, pp. 239-243. The Institution of Engineering and Technology (IET).
Abstract: In this paper an approach for joint source-channel coding in a spatio-temporal wavelet-based scalable coding environment is presented. The scalable quality layers are protected using a double binary turbo code for robust scalable transmission over wireless channels. In the proposed scheme the double binary turbo code is used for bitstream adaptation with unequal error protection. An efficient method to estimate the minimum distance of the double binary turbo codes is also proposed. Experimental results show promising performance of the proposed model as compared to equal error protection at low signal-to-noise ratios.
BibTeX:
@inproceedings{ramzan2006robust,
  author = {Ramzan, Naeem and Izquierdo, Ebroul},
  title = {Robust Scalable Video Transmission using Adaptive Double Binary Turbo Code},
  booktitle = {Proceedings of the IET International Conference on Visual Information Engineering (VIE 2006). Innovation and Creativity in Visual Media Processing and Graphics.},
  publisher = {The Institution of Engineering and Technology (IET)},
  year = {2006},
  pages = {239--243},
  note = {google scholar entry: 3rd IET International Conference on Visual Information Engineering (VIE 2006). Innovation and Creativity in Visual Media Processing and Graphics. Bangalore, India, 26-28 September 2006.},
  url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.7636&rep=rep1&type=pdf},
  doi = {10.1049/cp:20060535}
}
Ramzan N and Izquierdo E (2006), "Scalable Video Transmission Using Double Binary Turbo Code", In Image Processing (ICIP 2006), Proceedings of the 13th International Conference on. Atlanta, Georgia, October, 2006, pp. 1309-1312. IEEE.
Abstract: In this paper, we propose a novel efficient scheme for robust video transmission over wireless channels. The scheme consists of a motion compensated spatio-temporal wavelet decomposition scalable coder and a double binary turbo code. The scalable bitstream is adapted with unequal error protection according to the relevance of the video layers in the scalable bitstream. Cyclic redundancy check bits are also added to verify error-free delivery of the scalable bitstream. Double binary turbo codes are used for protection since they perform better than classical turbo coders in terms of convergence of iterative decoding and computational cost. Selected results of the experimental evaluation are reported in the paper.
BibTeX:
@inproceedings{ramzan2006scalable,
  author = {Ramzan, Naeem and Izquierdo, Ebroul},
  title = {Scalable Video Transmission Using Double Binary Turbo Code},
  booktitle = {Image Processing (ICIP 2006), Proceedings of the 13th International Conference on},
  publisher = {IEEE},
  year = {2006},
  pages = {1309--1312},
  note = {google scholar entry: 13th International Conference on Image Processing (ICIP 2006). Atlanta, Georgia, 8-11 October 2006.},
  url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.103.8122&rep=rep1&type=pdf},
  doi = {10.1109/ICIP.2006.312559}
}
Sprljan N, Mrak M and Izquierdo E (2006), "Optimised delta low-pass filter for UMCTF", In Proceedings of the 3rd IET International Conference on Visual Information Engineering (VIE 2006). Innovation and Creativity in Visual Media Processing and Graphics. Bangalore, India, September, 2006, pp. 283-286. IET.
Abstract: This paper provides a performance analysis of the delta low-pass filterbank used in unconstrained motion compensated temporal filtering (UMCTF) of video frames. Any wavelet filterbank for which a two-step (one prediction and one update) lifting implementation exists has a delta low-pass filter variant, where the modification consists of not performing the update step. Benefits of using such filters are better temporal scalability performance and lower decoding delay. Here it is argued that in that case the normalisation factor for the high-pass subband needs to be carefully selected in order to improve the compression performance. For the case of the Haar wavelet with a delta low-pass filter, the expression for the Riesz bounds is derived and a measure of the expected difference of energies in the signal and transform domains is introduced. The normalisation factor is then optimised with respect to the minimal Riesz bounds ratio and minimal expected difference. The application of the optimised filterbank with delta low-pass in a scalable video environment shows enhanced decoding performance over a wide range of bit-rates and lower temporal resolutions.
BibTeX:
@inproceedings{sprljan2006optimised,
  author = {Sprljan, Nikola and Mrak, Marta and Izquierdo, Ebroul},
  title = {Optimised delta low-pass filter for UMCTF},
  booktitle = {Proceedings of the 3rd IET International Conference on Visual Information Engineering (VIE 2006). Innovation and Creativity in Visual Media Processing and Graphics.},
  publisher = {IET},
  year = {2006},
  pages = {283--286},
  note = {google scholar entry: 3rd IET International Conference on Visual Information Engineering (VIE 2006). Innovation and Creativity in Visual Media Processing and Graphics. Bangalore, India, 26-28 September 2006.},
  url = {http://digital-library.theiet.org/content/conferences/10.1049/cp_20060543},
  doi = {10.1049/cp_20060543}
}
Sprljan N, Mrak M and Izquierdo E (2006), "Motion Driven Adaptive Transform Based on Wavelet Transform for Enhanced Video Coding", In Proceedings of the 2nd international conference on mobile multimedia communications (MobiMedia 2006). Alghero, Sardinia, September, 2006. (8), pp. 1-4. ACM.
Abstract: Spatial wavelet transform in video coding has traditionally been applied in a non-adaptive fashion. However, since motion compensation introduces specific structure into video frames, adaptive spatial transform can introduce additional coding gain. As the structure introduced by motion compensation depends on applied motion parameters, the same motion information can be used as an adaptation parameter in spatial decomposition. Specifically, in this paper motion driven adaptation on wavelet transform in its lifting implementation is proposed. The introduced computational complexity of the proposed method is low as only a few additional multiplications are introduced per lifting step. Since in this case the same motion information is available at both encoder and decoder, transmission of additional side information is not needed. Besides better energy compaction, the proposed scheme introduces considerable gain when applied in scalable video coding.
BibTeX:
@inproceedings{sprljan2006motion,
  author = {Sprljan, Nikola and Mrak, Marta and Izquierdo, Ebroul},
  title = {Motion Driven Adaptive Transform Based on Wavelet Transform for Enhanced Video Coding},
  booktitle = {Proceedings of the 2nd international conference on mobile multimedia communications (MobiMedia 2006)},
  publisher = {ACM},
  year = {2006},
  number = {8},
  pages = {1--4},
  note = {google scholar entry: 2nd international conference on mobile multimedia communications (MobiMedia 2006). Alghero, Sardinia, 18-20 September 2006.},
  url = {http://dl.acm.org/citation.cfm?id=1374307},
  doi = {10.1145/1374296.1374307}
}
Stewart C and Izquierdo E (2006), "Knowledge Space of Semantic Inference for Automatic Annotation and Retrieval of Multimedia Content -- K-Space", In Poster and Demo Proceedings of 1st International Conference on Semantic and Digital Media Technologies (SAMT 2006). 6-8 December 2006, Athens, Greece. Athens, Greece, pp. 71-72. CEUR Workshop Proceedings.
Abstract: K-Space is a network of leading research teams from academia and industry conducting integrative research and dissemination activities in semantic inference for automatic and semi-automatic annotation and retrieval of multimedia content. K-Space exploits the complementary expertise of project partners, enables resource optimization and fosters innovative research in the field.
BibTeX:
@inproceedings{stewart2006knowledge,
  author = {Stewart, Craig and Izquierdo, Ebroul},
  editor = {Avrithis, Yannis and Kompatsiaris, Yiannis and Staab, Steffen and O'Connor, Noel E.},
  title = {Knowledge Space of Semantic Inference for Automatic Annotation and Retrieval of Multimedia Content -- K-Space},
  booktitle = {Poster and Demo Proceedings of 1st International Conference on Semantic and Digital Media Technologies (SAMT 2006). 6-8 December 2006, Athens, Greece.},
  publisher = {CEUR Workshop Proceedings},
  year = {2006},
  pages = {71--72},
  note = {google scholar entry: 1st International Conference on Semantic and Digital Media Technologies (SAMT 2006). Athens, Greece, 6-8 December 2006.},
  url = {http://ceur-ws.org/Vol-233/}
}
Trujillo M and Izquierdo E (2006), "Exploiting Spatial Variability for Disparity Estimation", In Poster and Demo Proceedings of 1st International Conference on Semantic and Digital Media Technologies (SAMT 2006). 6-8 December 2006, Athens, Greece. Athens, Greece , pp. 19-20. CEUR Workshop Proceedings.
Abstract: In this correspondence a block-matching strategy for disparity estimation is introduced. In the proposed approach the size of the matching window is adapted according to the spatial variability of the matching areas. That is, the window size is constrained by the variations of the image intensity. A modified semivariogram function is proposed to measure the spatial variability of concerned sampling positions. Results of computer experiments aimed at validating the performance of the proposed approach are reported. As expected, using adaptive matching window size provides better disparity estimations than those obtained by using a fixed window.
BibTeX:
@inproceedings{trujillo2006exploiting,
  author = {Trujillo, Maria and Izquierdo, Ebroul},
  editor = {Avrithis, Yannis and Kompatsiaris, Yiannis and Staab, Steffen and O'Connor, Noel E.},
  title = {Exploiting Spatial Variability for Disparity Estimation},
  booktitle = {Poster and Demo Proceedings of 1st International Conference on Semantic and Digital Media Technologies (SAMT 2006). 6-8 December 2006, Athens, Greece.},
  publisher = {CEUR Workshop Proceedings},
  year = {2006},
  pages = {19--20},
  note = {google scholar entry: 1st International Conference on Semantic and Digital Media Technologies (SAMT 2006). Athens, Greece, 6-8 December 2006.},
  url = {http://ceur-ws.org/Vol-233/}
}
Vaiapury K, Atrey PK, Kankanhalli MS and Ramakrishnan K (2006), "Non-identical duplicate video detection using the SIFT method", In Visual Information Engineering, 2006. VIE 2006. IET International Conference on. Bangalore, India, September, 2006, pp. 537-542.
Abstract: Non-identical duplicate video detection is a challenging research problem. Non-identical duplicate videos are pairs of videos that are not exactly identical but are almost similar. In this paper, we evaluate two methods, keyframe-based and tomography-based, to detect non-identical duplicate videos. Both methods make use of the existing scale-invariant feature transform (SIFT) method to find matches, between the key frames in the first method and between cross-sections through the temporal axis of the videos in the second. We provide extensive experimental results and an analysis of the accuracy and efficiency of the two methods on a data set of non-identical duplicate video pairs.
BibTeX:
@inproceedings{vaiapury2006non,
  author = {Vaiapury, Karthikeyan and Atrey, Pradeep K. and Kankanhalli, Mohan S. and Ramakrishnan, Kalpathi},
  title = {Non-identical duplicate video detection using the SIFT method},
  booktitle = {Visual Information Engineering, 2006. VIE 2006. IET International Conference on},
  year = {2006},
  pages = {537--542},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4286748}
}
Wall J (2006), "Perception-based Modelling of System Behaviour", In Proceedings of the 5th UK-RI Conference on Advances in Cybernetic Systems (AICS-06). September, 2006, pp. 312-317. IEEE.
Abstract: This paper presents a new approach for the navigation of mobile robots. Human perception, that remarkable aptitude for performing a vast array of physical and mental tasks, is combined with fuzzy logic to create a perception-based model of system behaviour. This model will reduce the need for high precision, expensive equipment and overcome problems with conditions in the environment that can affect navigation.
BibTeX:
@inproceedings{wall2006perception,
  author = {Wall, Julie},
  title = {Perception-based Modelling of System Behaviour},
  booktitle = {Proceedings of the 5th UK-RI Conference on Advances in Cybernetic Systems (AICS-06)},
  publisher = {IEEE},
  year = {2006},
  pages = {312--317},
  note = {google scholar entry: 5th UK-RI Conference on Advances in Cybernetic Systems (AICS-06). Sheffield, UK, 7-8 September 2006.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Theses/AISC2006_JulieWall.pdf}
}
Wan S, Izquierdo E, Yang F and Chang Y (2006), "End-to-End Rate-Distortion Optimized Motion Estimation", In Image Processing (ICIP 2006), Proceedings of the 13th IEEE International Conference on. Atlanta, Georgia, October, 2006, pp. 809-812. IEEE.
Abstract: An end-to-end rate-distortion optimized motion estimation method for robust video coding in lossy networks is proposed. In this method the expected reconstructed distortion after transmission and the total bit rate for displaced frame difference are estimated at the encoder. The results are fed into the Lagrangian optimization at the encoder to perform motion estimation. Here the encoder automatically finds an optimized motion compensated prediction by estimating the best trade off between coding efficiency and end-to-end distortion. Computer simulations in lossy channel environments were conducted to assess the performance of the proposed method. A comparative evaluation using other conventional techniques from the literature was also conducted.
BibTeX:
@inproceedings{wan2006end,
  author = {Wan, Shuai and Izquierdo, Ebroul and Yang, Fuzheng and Chang, Yilin},
  title = {End-to-End Rate-Distortion Optimized Motion Estimation},
  booktitle = {Image Processing (ICIP 2006), Proceedings of the 13th IEEE International Conference on},
  publisher = {IEEE},
  year = {2006},
  pages = {809--812},
  note = {google scholar entry: 13th International Conference on Image Processing (ICIP 2006). Atlanta, Georgia, 8-11 October 2006.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4106653},
  doi = {10.1109/ICIP.2006.312525}
}
Wan S, Mrak M and Izquierdo E (2006), "Perceptually Adaptive Joint Deringing-Deblocking Filtering for Scalable Video Coding", In Proceedings of the 2nd international conference on mobile multimedia communications (MobiMedia 2006). Alghero, Sardinia, September, 2006. (13), pp. 1-5. ACM.
BibTeX:
@inproceedings{wan2006perceptually,
  author = {Wan, Shuai and Mrak, Marta and Izquierdo, Ebroul},
  title = {Perceptually Adaptive Joint Deringing-Deblocking Filtering for Scalable Video Coding},
  booktitle = {Proceedings of the 2nd international conference on mobile multimedia communications (MobiMedia 2006)},
  publisher = {ACM},
  year = {2006},
  number = {13},
  pages = {1--5},
  note = {google scholar entry: 2nd international conference on Mobile multimedia communications (MobiMedia 2006). Alghero, Sardinia, 18-20 September 2006.},
  url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.4665&rep=rep1&type=pdf},
  doi = {10.1145/1374296.1374310}
}
Wilkins P, Adamek T, Ferguson P, Hughes M, Jones GJ, Keenan G, McGuinness K, Malobabić J, O'Connor NE, Sadlier D, Smeaton AF, Benmokhtar R, Dumont E, Huet B, Merialdo B, Spyrou E, Koumoulos G, Avrithis Y, Moerzinger R, Schallauer P, Bailer W, Zhang Q, Piatrik T, Chandramouli K, Izquierdo E, Goldmann L, Haller M, Sikora T, Praks P, Urban J, Hilaire X and Jose JM (2006), "K-Space at TRECVid 2006", In TRECVID 2006 workshop participants notebook papers. Gaithersburg, Maryland, November, 2006, pp. 1-12. National Institute of Standards and Technology (NIST).
Abstract: In this paper we describe the K-Space participation in TRECVid 2006. K-Space participated in two tasks, high-level feature extraction and search. We present our approaches for each of these activities and provide a brief overview of our submissions: four of our six runs achieved performance above the TRECVid median, whilst our search submission performed around the median. The K-Space team consisted of eight partner institutions from the EU-funded K-Space Network, and our submissions made use of tools and techniques from each partner. As such this paper will provide overviews of each partner's contributions and provide appropriate references for specific descriptions of individual components.
BibTeX:
@inproceedings{wilkins2006k,
  author = {Wilkins, Peter and Adamek, Tomasz and Ferguson, Paul and Hughes, Mark and Jones, Gareth J.F. and Keenan, Gordon and McGuinness, Kevin and Malobabić, Jovanka and O'Connor, Noel E. and Sadlier, David and Smeaton, Alan F. and Benmokhtar, Rachid and Dumont, Emilie and Huet, Benoit and Merialdo, Bernard and Spyrou, Evaggelos and Koumoulos, George and Avrithis, Yannis and Moerzinger, Roland and Schallauer, Peter and Bailer, Werner and Zhang, Qianni and Piatrik, Tomas and Chandramouli, Krishna and Izquierdo, Ebroul and Goldmann, Lutz and Haller, Martin and Sikora, Thomas and Praks, Pavel and Urban, Jana and Hilaire, Xavier and Jose, Joemon M.},
  title = {K-Space at TRECVid 2006},
  booktitle = {TRECVID 2006 workshop participants notebook papers},
  publisher = {National Institute of Standards and Technology (NIST)},
  year = {2006},
  pages = {1--12},
  note = {google scholar entry: 4th TRECVID Workshop (TRECVID 2006). Gaithersburg, Maryland, 13-14 November 2006.},
  url = {http://www-nlpir.nist.gov/projects/tvpubs/tv6.papers/k-space.pdf}
}
Zgaljic T, Mrak M, Sprljan N and Izquierdo E (2006), "An Entropy Coding Scheme for Multi-Component Scalable Motion Information", In Acoustics, Speech and Signal Processing (ICASSP 2006), Proceedings of the 31st IEEE International Conference on. Toulouse, France, May, 2006. Vol. 2, pp. 561-564. IEEE.
Abstract: A fully scalable video bit-stream requires a layered structure for most of its components. For that reason a few methods targeting scalability of motion information have been proposed over the last decade. However, layered representation requires new entropy coding strategies able to efficiently handle redundancies between different layers. In this paper three methods for entropy coding of layered motion information are proposed. The influence of these schemes on the reconstructed video quality has also been studied.
BibTeX:
@inproceedings{zgaljic2006entropy,
  author = {Zgaljic, Toni and Mrak, Marta and Sprljan, Nikola and Izquierdo, Ebroul},
  title = {An Entropy Coding Scheme for Multi-Component Scalable Motion Information},
  booktitle = {Acoustics, Speech and Signal Processing (ICASSP 2006), Proceedings of the 31st IEEE International Conference on},
  publisher = {IEEE},
  year = {2006},
  volume = {2},
  pages = {561--564},
  note = {google scholar entry: 31st International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006). Toulouse, France, 14-19 May 2006.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1660404},
  doi = {10.1109/ICASSP.2006.1660404}
}
Zgaljic T, Sprljan N and Izquierdo E (2006), "Bit-stream Allocation Methods for Wavelet Based Scalable Video Coding", In Proceedings of the 2nd international conference on mobile multimedia communications (MobiMedia 2006). Alghero, Sardinia, September, 2006. (16), pp. 1-6. ACM.
Abstract: In wavelet-based scalable video coding, quality scalability is achieved using bit-plane coding techniques that result in an embedded bit-stream. In order to minimise distortion when a specific bit-rate is targeted, the compressed data has to be optimally truncated across the different layers that comprise the bit-stream. In this paper two methods for compressed bit-stream allocation are presented. The first adopts the assumption that the same fractional bit-planes within the same bit-planes of different wavelet subbands have nearly equal rate-distortion slopes, while the other takes into account the distortion caused by the quantisation of wavelet coefficients. The first method is characterised by low complexity since distortion does not have to be assessed. For the second method, a simple yet effective statistical distortion model is derived. The experimental results show comparable performance of the two tested methods.
BibTeX:
@inproceedings{zgaljic2006bit,
  author = {Zgaljic, Toni and Sprljan, Nikola and Izquierdo, Ebroul},
  title = {Bit-stream Allocation Methods for Wavelet Based Scalable Video Coding},
  booktitle = {Proceedings of the 2nd international conference on mobile multimedia communications (MobiMedia 2006)},
  publisher = {ACM},
  year = {2006},
  number = {16},
  pages = {1--6},
  note = {google scholar entry: 2nd international conference on mobile multimedia communications (MobiMedia 2006). Alghero, Sardinia, 18-20 September 2006.},
  url = {http://www.stday07.dibe.unige.it/papers/07p.pdf},
  doi = {10.1145/1374296.1374313}
}
Zhang Q and Izquierdo E (2006), "Multi-Feature Based Face Detection", In Proceedings of the 3rd IET International Conference on Visual Information Engineering (VIE 2006). Innovation and Creativity in Visual Media Processing and Graphics. Bangalore, India, September, 2006, pp. 572-576. IET.
Abstract: In this paper a two-step face detection technique is proposed. The first step uses a conventional skin detection method to extract regions of potential faces from the image database. This skin detection step is based on a Gaussian mixture model in the YCbCr colour space. In the second step faces are detected among the candidate regions by filtering out false positives from the skin colour detection module. The selection process is achieved by applying a learning approach using multiple additional features and a suitable metric in multi-feature space. The metric is derived by learning the underlying parameters using a small set of representative face samples. In this process the parameters are optimized by a Multiple Objective Optimization method based on the Pareto Archived Evolution Strategy. The learned metric in multi-feature space is then applied to a conventional classifier to filter out non-faces from the first processing step. Support Vector Machines and K-Nearest Neighbours classifiers are used to test the performance of the optimized metric in multi-feature space.
BibTeX:
@inproceedings{Zhang2006,
  author = {Zhang, Qianni and Izquierdo, Ebroul},
  title = {Multi-Feature Based Face Detection},
  booktitle = {Proceedings of the 3rd IET International Conference on Visual Information Engineering (VIE 2006). Innovation and Creativity in Visual Media Processing and Graphics.},
  publisher = {IET},
  year = {2006},
  pages = {572--576},
  note = {google scholar entry: 3rd IET International Conference on Visual Information Engineering (VIE 2006). Innovation and Creativity in Visual Media Processing and Graphics. Bangalore, India, 26-28 September 2006.},
  url = {http://digital-library.theiet.org/content/conferences/10.1049/cp_20060594},
  doi = {10.1049/cp:20060594}
}
Zhang Q and Izquierdo E (2006), "A Bayesian Network Approach to Multi-feature Based Image Retrieval", In Semantic Multimedia. First International Conference on Semantics and Digital Media Technologies (SAMT 2006). Athens, Greece, 6-8 December 2006. Proceedings. Athens, Greece, December, 2006. Vol. 4306, pp. 138-147. Springer.
Abstract: This paper aims at devising a Bayesian network approach to object-centred image retrieval employing non-monotonic inference rules and combining multiple low-level visual primitives as cues for retrieval. The idea is to model a global knowledge network by treating an entire image as a scenario. The overall process is divided into two stages: the initial retrieval stage, which concentrates on finding an optimal multi-feature space and performing a simple initial retrieval within this space; and the Bayesian inference stage, which uses the initial retrieval information to seek a more precise second retrieval.
BibTeX:
@inproceedings{zhang2006bayesian,
  author = {Zhang, Qianni and Izquierdo, Ebroul},
  editor = {Avrithis, Yannis and Kompatsiaris, Yiannis and Staab, Steffen and O'Connor, Noel E.},
  title = {A Bayesian Network Approach to Multi-feature Based Image Retrieval},
  booktitle = {Semantic Multimedia. First International Conference on Semantics and Digital Media Technologies (SAMT 2006). Athens, Greece, 6-8 December 2006. Proceedings.},
  publisher = {Springer},
  year = {2006},
  volume = {4306},
  pages = {138--147},
  note = {google scholar entry: 1st International Conference on Semantic and Digital Media Technologies (SAMT 2006). Athens, Greece, 6-8 December 2006.},
  url = {http://link.springer.com/chapter/10.1007/11930334_11},
  doi = {10.1007/11930334_11}
}
Zhang Q and Izquierdo E (2006), "A Multi-Feature Optimization Approach to Object-Based Image Classification", In Image and Video Retrieval (CIVR 2006), 5th International Conference. Tempe, Arizona, 13-15 July 2006. Proceedings. Tempe, Arizona, July, 2006. Vol. 4071, pp. 310-319. Springer.
Abstract: This paper proposes a novel approach for the construction and use of multi-feature spaces in image classification. The proposed technique combines low-level descriptors and defines suitable metrics. It aims at representing and measuring similarity between semantically meaningful objects within the defined multi-feature space. The approach finds the best linear combination of predefined visual descriptor metrics using a Multi-Objective Optimization technique. The obtained metric is then used to fuse multiple non-linear descriptors and is applied in image classification.
BibTeX:
@inproceedings{zhang2006multi,
  author = {Zhang, Qianni and Izquierdo, Ebroul},
  editor = {Sundaram, Hari and Naphade, Milind and Smith, John R. and Rui, Yong},
  title = {A Multi-Feature Optimization Approach to Object-Based Image Classification},
  booktitle = {Image and Video Retrieval (CIVR 2006), 5th International Conference. Tempe, Arizona, 13-15 July 2006. Proceedings},
  publisher = {Springer},
  year = {2006},
  volume = {4071},
  pages = {310--319},
  note = {google scholar entry: 5th International Conference Image and Video Retrieval (CIVR 2006). Tempe, Arizona, 13-15 July 2006.},
  url = {http://pdf.aminer.org/000/098/489/a_multi_feature_optimization_approach_to_object_based_image_classification.pdf},
  doi = {10.1007/11788034_32}
}
Zhang Q and Izquierdo E (2006), "A New Approach to Image Retrieval in a Multi-Feature Space", In Image Analysis for Multimedia Interactive Services (WIAMIS 2006), Proceedings of the 7th International Workshop on. April, 2006, pp. 325-328.
BibTeX:
@inproceedings{zhang2006new,
  author = {Zhang, Qianni and Izquierdo, Ebroul},
  title = {A New Approach to Image Retrieval in a Multi-Feature Space},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2006), Proceedings of the 7th International Workshop on},
  year = {2006},
  pages = {325--328},
  note = {google scholar entry: 7th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2006). Seoul, Korea, 19-21 April 2006.}
}
Zhang Q and Izquierdo E (2006), "Optimizing Metrics Combining Low-Level Visual Descriptors for Image Annotation and Retrieval", In Acoustics Speech and Signal Processing (ICASSP 2006), Proceedings of the 31st IEEE International Conference on. Toulouse, France, May, 2006. Vol. 2, pp. 405-408. IEEE.
Abstract: An object-oriented approach for keyword-based image annotation and classification is presented. It considers combinations of low-level descriptors and suitable metrics to represent and measure similarity between semantically meaningful objects. The objective is to obtain "optimal" metrics based on a linear combination of single metrics and descriptors in a multi-feature space. The proposed approach estimates an optimal linear combination of predefined metrics by applying a multi-objective optimization technique based on a Pareto archived evolution strategy. The proposed approach has been evaluated and tested for annotation of objects in images.
BibTeX:
@inproceedings{zhang2006optimizing,
  author = {Qianni Zhang and Izquierdo, Ebroul},
  title = {Optimizing Metrics Combining Low-Level Visual Descriptors for Image Annotation and Retrieval},
  booktitle = {Acoustics Speech and Signal Processing (ICASSP 2006), Proceedings of the 31st IEEE International Conference on},
  publisher = {IEEE},
  year = {2006},
  volume = {2},
  pages = {405--408},
  note = {google scholar entry: 31st International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006). Toulouse, France, 14-19 May 2006.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1660365},
  doi = {10.1109/ICASSP.2006.1660365}
}

Presentations, Posters and Technical Reports

Adami N, Izquierdo E, Leonardi R, Mrak M, Signoroni A and Zgaljic T (2006), "Efficient Wavelet-based Video Compression". July, 2006.
BibTeX:
@misc{adami2006efficient,
  author = {Adami, Nicola and Izquierdo, Ebroul and Leonardi, Riccardo and Mrak, Marta and Signoroni, Alberto and Zgaljic, Toni},
  title = {Efficient Wavelet-based Video Compression},
  booktitle = {39th JPEG WG1 meeting.},
  publisher = {JPEG},
  year = {2006},
  note = {google scholar entry: 39th JPEG WG1 meeting. Perugia, Italy, 10-14 July 2006.},
  url = {http://www.jpeg.org/newsrel16.html}
}

Theses and Monographs

Djordjevic D (2006), "User Relevance Feedback, Search and Retrieval of Visual Content". Thesis at: Queen Mary University of London, pp. 1-180.
Abstract: The main objective of this work is to study and implement techniques for visual content retrieval using relevance feedback. Relevance feedback approaches make use of interactive learning in order to modify and adapt system behaviour to the user's desires by modelling human subjectivity. They allow a more semantic approach based on the user's feedback information, while relying on similarity derived from low-level features. An image relevance feedback framework has been implemented based on support vector machines as a generalisation method. The algorithm for support vector machines solves a convex optimisation problem and has been tailored to the relevance feedback scenario. MPEG-7 standard descriptors and their recommended distance functions have been used to represent low-level visual features, as well as several additional descriptors. A multi-feature scenario has been developed in an effort to represent visual content as close as possible to human perceptual experience. A model for feature combination, and not just concatenation, has been developed and a novel kernel for adaptive similarity matching in support vector machines has been proposed. The new kernel models the multi-feature space, guaranteeing convergence of the support vector optimisation problem. To address the problem of visual content representations, a novel approach of building descriptors based on image blocks, their low-level features and their spatial correlation has been proposed as part of the relevance feedback framework. In accordance with this, an accompanying kernel on sets has been proposed that handles both the multi-feature space and the local spatial information among image blocks. The relevance feedback module has been applied to a framework for image selection in concept learning. It combines unsupervised learning to organize images based on low-level similarity, and reinforcement learning based on relevance feedback, to refine the classifier model.
This research is a part of the EU IST aceMedia project and the described relevance feedback module has been integrated in the project framework.
BibTeX:
@phdthesis{djordjevic2006user,
  author = {Djordjevic, Divna},
  editor = {Izquierdo, Ebroul},
  title = {User Relevance Feedback, Search and Retrieval of Visual Content},
  school = {Queen Mary University of London},
  year = {2006},
  pages = {1--180},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Theses/PhDThesis_DivnaDjordjevic.pdf}
}
Mrak M (2006), "Motion Scalability for Video Coding with Flexible Spatio-Temporal Decompositions". Thesis at: Queen Mary University of London. December, 2006, pp. 1-172.
Abstract: The research presented in this thesis aims to extend the scalability range of wavelet-based video coding systems in order to achieve fully scalable coding with a wide range of available decoding points. Since the temporal redundancy regularly comprises the main portion of the global video sequence redundancy, the techniques that can be generally termed motion decorrelation techniques have a central role in the overall compression performance. For this reason scalable motion modelling and coding are of utmost importance, and specifically, in this thesis possible solutions are identified and analysed. The main contributions of the presented research are grouped into two interrelated and complementary topics. Firstly, a flexible motion model with a rate-optimised estimation technique is introduced. The proposed motion model is based on tree structures and allows the high adaptability needed for layered motion coding. The flexible structure for motion compensation allows for optimisation at different stages of the adaptive spatio-temporal decomposition, which is crucial for scalable coding that targets decoding at different resolutions. By utilising an adaptive choice of wavelet filterbank, the model enables high compression based on efficient mode selection. Secondly, solutions for scalable motion modelling and coding are developed. These solutions are based on precision limiting of motion vectors and the creation of a layered motion structure that describes hierarchically coded motion. The solution based on precision limiting relies on layered bit-plane coding of motion vector values. The second solution builds on recently established techniques that impose scalability on a motion structure. The new approach is based on two major improvements: the evaluation of distortion in temporal subbands and a motion search in temporal subbands that finds the optimal motion vectors for the layered motion structure.
Exhaustive tests on the rate-distortion performance in demanding scalable video coding scenarios show benefits of application of both developed flexible motion model and various solutions for scalable motion coding.
BibTeX:
@phdthesis{mrak2006motion,
  author = {Mrak, Marta},
  editor = {Izquierdo, Ebroul},
  title = {Motion Scalability for Video Coding with Flexible Spatio-Temporal Decompositions},
  school = {Queen Mary University of London},
  year = {2006},
  pages = {1--172},
  url = {https://qmro.qmul.ac.uk/jspui/handle/123456789/1907}
}


2005

Journal Papers

Mrak M, Sprljan N and Izquierdo E (2005), "Motion estimation in temporal subbands for quality scalable motion coding", Electronics Letters. September, 2005. Vol. 41(19), pp. 1050-1051. IET.
Abstract: A new approach for estimation and modelling of layered motion vector fields for fully scalable video coding (SVC) is presented. Motion vector fields are evaluated according to their impact on the overall reconstruction error. This strategy leads to improved decoding performance when compared with previous methods for layered motion modelling in SVC.
BibTeX:
@article{mrak2005motion,
  author = {Mrak, Marta and Sprljan, Nikola and Izquierdo, Ebroul},
  title = {Motion estimation in temporal subbands for quality scalable motion coding},
  journal = {Electronics Letters},
  publisher = {IET},
  year = {2005},
  volume = {41},
  number = {19},
  pages = {1050--1051},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1512749},
  doi = {10.1049/el:20052863}
}

Conference Papers

Damnjanovic I and Izquierdo E (2005), "Capacity Enhancement of Compressed Video Watermarking Using Turbo Codes", In Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005), Proceedings of the 6th European. April, 2005, pp. 1-4. National Technical University of Athens (NTUA).
Abstract: MPEG-2 compression, as compression in general, tends to reduce spatial and temporal information redundancy. In that way, it reduces the watermark insertion space and the power of the embedded signal. This paper describes a scheme for capacity enhancement using a state-of-the-art error correction technique - turbo coding. A spread spectrum watermarking technique is used to insert the watermark. We propose a new watermark composition, amplitude adjustment in the DCT domain and bit-rate preservation. However, it was essential to introduce an error correction technique in order to achieve the desired level of capacity and robustness. In addition, experimental results on perceptibility and robustness to transcoding are presented.
BibTeX:
@inproceedings{damnjanovic2005capacity,
  author = {Damnjanovic, Ivan and Izquierdo, Ebroul},
  title = {Capacity Enhancement of Compressed Video Watermarking Using Turbo Codes},
  booktitle = {Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005), Proceedings of the 6th European},
  publisher = {National Technical University of Athens (NTUA)},
  year = {2005},
  pages = {1--4},
  note = {google scholar entry: 6th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005). Montreux, Switzerland, 13-15 April 2005.},
  url = {ftp://image.ntua.gr/pub/4dkonto/WIAMIS-2005/defevent/papers/cr1081.pdf}
}
Damnjanovic I and Izquierdo E (2005), "Turbo Coding Protection of Compressed Domain Watermarking Channel", In ``Computer as a Tool'' (EUROCON 2005), Proceedings of the International Conference on. Belgrade, Serbia, November, 2005. Vol. 2, pp. 171-174. IEEE.
Abstract: The aim of compression in general is to remove spatial and temporal information redundancy from the data. From a watermarking point of view, it reduces the watermark insertion space and the power of the embedded signal. This paper describes a watermarking scheme based on the spread spectrum paradigm with capacity enhancement using a state-of-the-art error correction technique - turbo coding. A new watermark composition with random spreading of watermarking bits through the video frames, amplitude adjustment in the DCT domain and bit-rate preservation is proposed. However, it was essential to introduce an error correction technique in order to achieve reasonable capacity levels and robustness. In addition, experimental results on perceptibility and robustness to transcoding are presented.
BibTeX:
@inproceedings{damnjanovic2005turbo,
  author = {Damnjanovic, Ivan and Izquierdo, Ebroul},
  title = {Turbo Coding Protection of Compressed Domain Watermarking Channel},
  booktitle = {``Computer as a Tool'' (EUROCON 2005), Proceedings of the International Conference on},
  publisher = {IEEE},
  year = {2005},
  volume = {2},
  pages = {171--174},
  note = {google scholar entry: EUROCON 2005 - The International Conference on ``Computer as a Tool''. Belgrade, Serbia, 21-24 November 2005.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1629886},
  doi = {10.1109/EURCON.2005.1629886}
}
Djordjevic D, Dorado A, Izquierdo E and Pedrycz W (2005), "Concept-Oriented Sample Images Selection", In Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005), Proceedings of the 6th European. Montreux, Switzerland, April, 2005, pp. 1-4. National Technical University of Athens (NTUA).
Abstract: In semantic-based image classification, learning concepts for adding knowledge to the image descriptions is an issue of special interest. This learning increases the capabilities for more ``intelligent'' image processing. The classifier learns by generalizing specific facts present in a number of design samples. Due to the fact that the learning and classification processes run over image descriptions containing part of the image content, selection of training patterns should take into account relationships among those descriptions. The proposed framework uses unsupervised clustering to support the selection of design samples and user feedback to refine the classifier model.
BibTeX:
@inproceedings{djordjevic2005concept,
  author = {Djordjevic, Divna and Dorado, Andres and Izquierdo, Ebroul and Pedrycz, Witold},
  title = {Concept-Oriented Sample Images Selection},
  booktitle = {Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005), Proceedings of the 6th European},
  publisher = {National Technical University of Athens (NTUA)},
  year = {2005},
  pages = {1--4},
  note = {google scholar entry: 6th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005). Montreux, Switzerland, 13-15 April 2005.},
  url = {http://www.ing.unibs.it/~cost292/pubs/wiamis05/cr1129.pdf}
}
Djordjevic D and Izquierdo E (2005), "Relevance Feedback in Content-based Image Retrieval Systems, an Overview and Analysis", In ``Computer as a Tool'' (EUROCON 2005), Proceedings of the International Conference on. Belgrade, Serbia, November, 2005. Vol. 2, pp. 143-146. IEEE.
Abstract: In this paper an overview of relevant developments in visual relevance feedback based image retrieval is presented. Important problems of content-based image retrieval are analyzed and relevant findings from the evaluation of our framework are reported.
BibTeX:
@inproceedings{djordjevic2005relevance,
  author = {Djordjevic, Divna and Izquierdo, Ebroul},
  title = {Relevance Feedback in Content-based Image Retrieval Systems, an Overview and Analysis},
  booktitle = {``Computer as a Tool'' (EUROCON 2005), Proceedings of the International Conference on},
  publisher = {IEEE},
  year = {2005},
  volume = {2},
  pages = {143--146},
  note = {google scholar entry: EUROCON 2005 - The International Conference on ``Computer as a Tool''. Belgrade, Serbia, 21-24 November 2005.},
  url = {http://www.ing.unibs.it/~cost292/pubs/eurocon05/Divna_Djordjevic.pdf},
  doi = {10.1109/EURCON.2005.1629879}
}
Djordjevic D, Zhang Q and Izquierdo E (2005), "Fusion of semantic and visual information for automatic indexing and annotation of key frame images", In Integration of Knowledge, Semantics and Digital Media Technology (EWIMT 2005), Proceedings of the 2nd European Workshop on the. London, England, November, 2005, pp. 293-299. IET.
Abstract: Modern database structures consisting of audiovisual and low-level semantic data are in line with the needs and requirements of end-user customers. It is difficult to expect the end user to provide any high-level input beyond a very general keyword annotation of the recorded data. A typical user scenario appearing in modern multimedia deals with having just audiovisual data with no additional knowledge. How to effectively annotate and enable efficient indexing and categorisation that would subsequently provide easy browsing and retrieval of desired content is the problem addressed in this paper.
BibTeX:
@inproceedings{djordjevic2005fusion,
  author = {Djordjevic, Divna and Zhang, Qianni and Izquierdo, Ebroul},
  title = {Fusion of semantic and visual information for automatic indexing and annotation of key frame images},
  booktitle = {Integration of Knowledge, Semantics and Digital Media Technology (EWIMT 2005), Proceedings of the 2nd European Workshop on the},
  publisher = {IET},
  year = {2005},
  pages = {293--299},
  note = {google scholar entry: 2nd European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology (EWIMT 2005). London, England, 30 November - 1 December 2005.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1575999},
  doi = {10.1049/ic.2005.0746}
}
Dorado A, Pedrycz W and Izquierdo E (2005), "User-Driven Fuzzy Clustering: On the Road to Semantic Classification", In Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, Proceedings of the 10th International Conference on. Regina, Saskatchewan, August, 2005. Vol. 3641, pp. 421-430. Springer.
Abstract: The focus of the work leading to this paper is semantic image classification. The aim is to evaluate contributions of clustering mechanisms to organize low-level features into semantically meaningful groups whose interpretation may relate to some description task pertaining to the image content. Cluster assignment reveals underlying structures in the data sets without requiring prior information. The semantic component indicates that some domain knowledge about the classification problem is available and can be used as part of the training procedures. Besides, data structural analysis can be applied to determine proximity and overlapping between classes, which leads to misclassification problems. This information is used to guide the algorithms towards a desired partition of the feature space and establish links between visual primitives and classes. This leads to partially supervised learning modes. Experimental studies are addressed to evaluate how unsupervised and partially supervised fuzzy clustering boost semantic-based classification capabilities.
BibTeX:
@inproceedings{Dorado2005user,
  author = {Dorado, Andres and Pedrycz, Witold and Izquierdo, Ebroul},
  editor = {Slezak, Dominik and Wang, Guoyin and Szczuka, Marcin and Düntsch, Ivo and Yao, Yiyu},
  title = {User-Driven Fuzzy Clustering: On the Road to Semantic Classification},
  booktitle = {Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, Proceedings of the 10th International Conference on},
  publisher = {Springer},
  year = {2005},
  volume = {3641},
  pages = {421--430},
  note = {google scholar entry: 10th International Conference on Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing (RSFDGrC 2005). Regina, Saskatchewan, August 31 - September 3, 2005.},
  url = {http://link.springer.com/chapter/10.1007/11548669_44},
  doi = {10.1007/11548669_44}
}
Izquierdo E (2005), "Image segmentation combining non-linear diffusion and the Nystrom extension", In Proceedings of the SPIE 2005 Visual Communications and Image Processing (VCIP 2005). Beijing, China, July, 2005. Vol. 5960, pp. 1686-1694.
Abstract: An approach for image segmentation is presented. Images are first preprocessed using multiscale simplification by nonlinear diffusion. Subsequently, image segmentation of the resulting smoothed images is carried out. The actual segmentation step is based on the estimation of the eigenvectors and eigenvalues of a matrix derived from both the total dissimilarity and the total similarity between different groups of pixels in the image. This algorithm belongs to the class of spectral methods, specifically, the Nystrom extension introduced by Fowlkes et al. in [1]. Stability analysis of the approximation of the underlying spectral partitioning is presented. Modifications of Fowlkes' technique are proposed to improve the stability of the algorithm. The proposed modifications include a criterion for the selection of the initial sample and numerically stable estimations of ill-posed inverse matrices for the solution of the underlying mathematical problem. Results of selected computer experiments are reported to validate the superiority of the proposed approach when compared with the technique proposed in [1].
BibTeX:
@inproceedings{Izquierdo2005image,
  author = {Izquierdo, Ebroul},
  editor = {Li, Shipeng and Pereira, Fernando and Shum, Heung-Yeung and Tescher, Andrew G.},
  title = {Image segmentation combining non-linear diffusion and the Nystrom extension},
  booktitle = {Proceedings of the SPIE 2005 Visual Communications and Image Processing (VCIP 2005)},
  year = {2005},
  volume = {5960},
  pages = {1686--1694},
  note = {google scholar entry: 2005 Visual Communications and Image Processing (VCIP 2005). Beijing, China, 12-15 July 2005.},
  url = {http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=876045},
  doi = {10.1117/12.633218}
}
Izquierdo E (2005), "Knowledge Space of Semantic Inference for Automatic Annotation and Retrieval of Multimedia Content -KSpace-", In Integration of Knowledge, Semantics and Digital Media Technology (EWIMT 2005), Proceedings of the 2nd European Workshop on the. London, England, November, 2005, pp. 441-442. IET.
Abstract: K-Space is a network of leading research teams from academia and industry conducting integrative research and dissemination activities in semantic inference for semi-automatic annotation and retrieval of multimedia content. K-Space exploits the complementary expertise of project partners, enables resource optimization and fosters innovative research in the field. The aim of K-Space research is to narrow the gap between low-level content descriptions that can be computed automatically by a machine and the richness and subjectivity of semantics in high-level human interpretations of audiovisual media: the Semantic Gap. Specifically, K-Space integrative research focuses on three areas: content-based multimedia analysis; knowledge extraction; and semantic multimedia.
BibTeX:
@inproceedings{izquierdo2005knowledge,
  author = {Izquierdo, Ebroul},
  title = {Knowledge Space of Semantic Inference for Automatic Annotation and Retrieval of Multimedia Content -KSpace-},
  booktitle = {Integration of Knowledge, Semantics and Digital Media Technology (EWIMT 2005), Proceedings of the 2nd European Workshop on the},
  publisher = {IET},
  year = {2005},
  pages = {441--442},
  note = {google scholar entry: 2nd European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology (EWIMT 2005). London, England, 30 November - 1 December 2005.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1576051},
  doi = {10.1049/ic.2005.0770}
}
Izquierdo E and Dorado A (2005), "Climbing the Semantic Ladder: Towards Semantic Semi-Automatic Image Annotation using MPEG-7 Descriptor Schemas", In Computer Architecture for Machine Perception (CAMP 2005), Proceedings of the 7th International Workshop on. Palermo, Sicily, July, 2005, pp. 257-262. IEEE.
Abstract: The fast development of innovative tools to create user-friendly and effective multimedia libraries, services and environments requires novel concepts to support storage, annotation and retrieval of huge amounts of digital audiovisual data. This article presents a technique to tackle the first instance of the problem in visual digital archives: classification using generic semantic descriptions. As a case study, classification abilities inherent to some important MPEG-7 low-level visual descriptors are explored and quantified.
BibTeX:
@inproceedings{izquierdo2005climbing,
  author = {Izquierdo, Ebroul and Dorado, Andres},
  title = {Climbing the Semantic Ladder: Towards Semantic Semi-Automatic Image Annotation using MPEG-7 Descriptor Schemas},
  booktitle = {Computer Architecture for Machine Perception (CAMP 2005), Proceedings of the 7th International Workshop on},
  publisher = {IEEE},
  year = {2005},
  pages = {257--262},
  note = {google scholar entry: 7th International Workshop on Computer Architecture for Machine Perception (CAMP 2005). Palermo, Sicily, 4-6 July 2005.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1508195},
  doi = {10.1109/CAMP.2005.16}
}
Kitanovski V, Taskovski D and Bogdanova S (2005), "Watermark Generation using Image-dependent Key for Image Authentication", In ``Computer as a Tool'' (EUROCON 2005), Proceedings of the International Conference on. Belgrade, Serbia, November, 2005. Vol. 2, pp. 947-950. IEEE.
Abstract: In order to increase security of the blind watermarking schemes for image authentication it is desirable to use image-dependent keys in the process of watermark generation. In this paper we propose a new watermarking method for image authentication where the watermark is generated using the image hash as a key. The robust image hash, invariant to legitimate modifications, but fragile to illegitimate modifications is generated from the local image characteristics. Due to this relation between the watermark and the image hash, it is possible to differentiate among legitimate and illegitimate modifications of the image. Quantized index modulation of DCT coefficients is used for watermark embedding. Watermark detection is performed without use of the original image. Experimental results demonstrate the effectiveness of the proposed method in terms of robustness and fragility.
BibTeX:
@inproceedings{kitanovski2005watermark,
  author = {Kitanovski, Vlado and Taskovski, Dimitar and Bogdanova, Sofilja},
  title = {Watermark Generation using Image-dependent Key for Image Authentication},
  booktitle = {``Computer as a Tool'' (EUROCON 2005), Proceedings of the International Conference on},
  publisher = {IEEE},
  year = {2005},
  volume = {2},
  pages = {947--950},
  note = {google scholar entry: EUROCON 2005 - The International Conference on ``Computer as a Tool''. Belgrade, Serbia, 21-24 November 2005.},
  url = {http://www.ing.unibs.it/~cost292/pubs/eurocon05/Ivan_Damnjanovic.pdf},
  doi = {10.1109/EURCON.2005.1630103}
}
Mrak M, Sprljan N and Izquierdo E (2005), "A Resolution Adaptive Interpolation Technique for Enhanced Decoding of Scalable Coded Video", In Acoustics, Speech, and Signal Processing (ICASSP 2005), Proceedings of the 30th IEEE International Conference on. Philadelphia, Pennsylvania, March, 2005. Vol. 2, pp. 353-356. IEEE.
Abstract: Subpixel accurate motion compensated temporal filtering introduces a significant coding gain in scalable 3D wavelet video codecs. The influence of the chosen subpixel interpolation technique has not yet been fully analysed in the context of resolution scalability. That problem is addressed in this paper. It is shown that support for increased accuracy and resolution adaptive spatial interpolation needs to be featured in a scalable video decoder, when low resolution sequences are targeted. Using the proposed resolution adaptive filters based on sinc kernels leads to improved decoding performance at low resolution in the sense of achieving higher quality while reducing the complexity of the system.
BibTeX:
@inproceedings{mrak2005resolution,
  author = {Mrak, Marta and Sprljan, Nikola and Izquierdo, Ebroul},
  title = {A Resolution Adaptive Interpolation Technique for Enhanced Decoding of Scalable Coded Video},
  booktitle = {Acoustics, Speech, and Signal Processing (ICASSP 2005), Proceedings of the 30th IEEE International Conference on},
  publisher = {IEEE},
  year = {2005},
  volume = {2},
  pages = {353--356},
  note = {google scholar entry: 30th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005). Philadelphia, Pennsylvania, 18-23 March 2005.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1415414},
  doi = {10.1109/ICASSP.2005.1415414}
}
Pani D, Passino G and Raffo L (2005), "Run-time adaptive resources allocation and balancing on nanoprocessors arrays", In Digital System Design (DSD 2005), Proceedings of the 8th Euromicro Conference on. Porto, Portugal, August, 2005, pp. 492-499.
Abstract: Modern processor architectures try to exploit the different kinds of parallelism that may be found even in general purpose applications. In this paper we present a new architecture based on an array of nanoprocessors that cooperatively support both thread-level and instruction-level parallelism in parallel. Such an architecture does not explicitly require any particular programming techniques, since it has been developed to deal with standard sequential programs. Preliminary results on a model of the architecture show the feasibility of the proposed approach.
BibTeX:
@inproceedings{pani2005run,
  author = {Pani, Danilo and Passino, Giuseppe and Raffo, Luigi},
  title = {Run-time adaptive resources allocation and balancing on nanoprocessors arrays},
  booktitle = {Digital System Design (DSD 2005), Proceedings of the 8th Euromicro Conference on},
  year = {2005},
  pages = {492--499},
  note = {google scholar entry: 8th Euromicro Conference on Digital System Design (DSD 2005). Porto, Portugal, 30 August - 3 September 2005.},
  url = {https://sites.google.com/site/zeppethefake/publications/dcd05runtime.pdf},
  doi = {10.1109/DSD.2005.70}
}
Peixoto E, de Queiroz RL and Mukherjee D (2008), "Mobile video communications using a Wyner-Ziv transcoder", In Proceedings of the SPIE 2008 Visual Communications and Image Processing (VCIP 2008). San Jose, California, January, 2008. Vol. 6822(28), pp. 1069-1079.
Abstract: In mobile-to-mobile video communications, both the transmitting and receiving ends may not have the necessary computing power to perform complex video compression and decompression tasks. Traditional video codecs typically have highly complex encoders and less complex decoders. However, Wyner-Ziv (WZ) coding allows for a low complexity encoder at the price of a more complex decoder. We propose a video communication system where the transmitter uses a WZ (reversed complexity) coder, while the receiver uses a traditional decoder, hence minimizing complexity at both ends. For that to work we propose to insert a transcoder in the network to convert the video stream. We present an efficient transcoder from a simple WZ approach to H.263. Our approach saves a large amount of the computation by reusing the motion estimation performed at the WZ decoder stage, among other things. Results are presented to demonstrate the transcoder performance.
BibTeX:
@inproceedings{peixoto2008mobile,
  author = {Peixoto, Eduardo and de Queiroz, Ricardo L. and Mukherjee, Debargha},
  editor = {Pearlman, William A. and Woods, John W. and Lu, Ligang},
  title = {Mobile video communications using a Wyner-Ziv transcoder},
  booktitle = {Proceedings of the SPIE 2008 Visual Communications and Image Processing (VCIP 2008)},
  year = {2008},
  volume = {6822},
  number = {28},
  pages = {1069--1079},
  note = {google scholar entry: 2008 Visual Communications and Image Processing (VCIP 2008). San Jose, California, 29-31 January 2008.},
  url = {http://image.unb.br/queiroz/papers/vcip_2008_transcoder.pdf},
  doi = {10.1117/12.765545}
}
Sprljan N, Mrak M, Abhayaratne C and Izquierdo E (2005), "A Scalable Coding Framework for Efficient Video Adaptation", In Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005), Proceedings of the 6th European. Montreux, Switzerland, April, 2005, pp. 1-4. National Technical University of Athens (NTUA).
Abstract: Current digital video applications require video coding techniques that cater for a wide range of quality levels, spatial resolutions and frame rates, supporting different user preferences, varying transmission bandwidths and terminal capabilities. Efficient adaptation of video content is vital in such application environments. Encoding video in scalable formats supports fast and efficient adaptation. This paper presents a flexible scalable video coding framework that supports the mandatory scalability functionalities required for video adaptation. The encoder, decoder and extractor modules of the framework, the scalable bitstream descriptions and video adaptation are presented.
BibTeX:
@inproceedings{sprljan2005scalable,
  author = {Sprljan, Nikola and Mrak, Marta and Abhayaratne, Charith and Izquierdo, Ebroul},
  title = {A Scalable Coding Framework for Efficient Video Adaptation},
  booktitle = {Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005), Proceedings of the 6th European},
  publisher = {National Technical University of Athens (NTUA)},
  year = {2005},
  pages = {1--4},
  note = {google scholar entry: 6th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005). Montreux, Switzerland, 13-15 April 2005.},
  url = {http://www.ing.unibs.it/~cost292/pubs/wiamis05/cr1076.pdf}
}
Sprljan N, Mrak M and Izquierdo E (2005), "A fast error protection scheme for transmission of embedded coded images over unreliable channels and fixed packet size", In Acoustics, Speech, and Signal Processing (ICASSP 2005), Proceedings of the 30th IEEE International Conference on. Philadelphia, Pennsylvania, March, 2005. Vol. 3, pp. 741-744. IEEE.
Abstract: Joint source-channel coding enables efficient transmission of embedded bitstreams over unreliable channels. We address channels with fixed packetisation and decoding with no or minimal delay. The computation of an optimal protection scheme for such bitstreams is in general a problem of exponential complexity and hence not applicable in a straightforward implementation. Using the rate-distortion characteristics of the source bitstream and a dynamic programming approach, we construct an efficient unequal error protection scheme for the predefined channels. Our algorithm is of linear complexity and thus applicable in real-time scenarios.
BibTeX:
@inproceedings{sprljan2005fast,
  author = {Sprljan, Nikola and Mrak, Marta and Izquierdo, Ebroul},
  title = {A fast error protection scheme for transmission of embedded coded images over unreliable channels and fixed packet size},
  booktitle = {Acoustics, Speech, and Signal Processing (ICASSP 2005), Proceedings of the 30th IEEE International Conference on},
  publisher = {IEEE},
  year = {2005},
  volume = {3},
  pages = {741--744},
  note = {google scholar entry: 30th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005). Philadelphia, Pennsylvania, 18-23 March 2005.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1415816},
  doi = {10.1109/ICASSP.2005.1415816}
}
Trujillo M and Izquierdo E (2005), "Combining K-Means and Semivariogram-Based Grid Clustering", In Proceedings of the 47th International Symposium on Electronics in Marine (ELMAR 2005). Zadar, Croatia, June, 2005, pp. 9-12. IEEE.
Abstract: Clustering is useful in several situations, amongst others: data mining, information retrieval, image segmentation, and data classification. In this paper an approach for grouping data sets that are spatially indexed is proposed. It is based on the k-means algorithm and grid clustering. The former is the simplest and most commonly used clustering technique. A major problem with this algorithm is that it is sensitive to the selection of the initial partition. The latter is commonly used for grouping data that are spatially indexed. The goal in this paper is to overcome the high sensitivity of the k-means algorithm to the starting conditions by using the available spatial information. A semivariogram-based grid clustering is introduced. It uses the spatial correlation for determining the bin size. Since the bins are constrained to regular blocks while the spatial distribution of objects is not regular, we propose to combine this technique with a conventional k-means algorithm. Using the semivariogram, an excellent initialization of the k-means is provided. Experimental results show that the final partition preserves the spatial distribution of the objects.
BibTeX:
@inproceedings{trujillo2005combining,
  author = {Trujillo, Maria and Izquierdo, Ebroul},
  editor = {Grgić, Mislav and Kos, Tomislav and Grgić, Sonja},
  title = {Combining K-Means and Semivariogram-Based Grid Clustering},
  booktitle = {Proceedings of the 47th International Symposium on Electronics in Marine (ELMAR 2005)},
  publisher = {IEEE},
  year = {2005},
  pages = {9--12},
  note = {google scholar entry: 47th International Symposium on Electronics in Marine (ELMAR 2005). Zadar, Croatia, 8-10 June 2005.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5606090},
  doi = {10.1109/ELMAR.2005.193628}
}
Zeljkovic V, Pokrajac D, Dorado A and Izquierdo E (2005), "Moving Object Detection under Significant Changes of Lighting", In Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005), Proceedings of the 6th European. Montreux, Switzerland, April, 2005. National Technical University of Athens (NTUA).
Abstract: Surveillance applications require reliable components with capabilities to work under disturbance conditions such as momentary power interruption. The research leading to this paper has been focused on implementing a robust algorithm to detect movement of foreground objects independent of illumination changes. An improved version combining time and space analysis is presented. Experimental results show a promising advance in the addressed problem.
BibTeX:
@inproceedings{zeljkovic2005moving,
  author = {Zeljkovic, Vesna and Pokrajac, Dragoljub and Dorado, Andres and Izquierdo, Ebroul},
  title = {Moving Object Detection under Significant Changes of Lighting},
  booktitle = {Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005), Proceedings of the 6th European},
  publisher = {National Technical University of Athens (NTUA)},
  year = {2005},
  note = {google scholar entry: 6th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005). Montreux, Switzerland, 13-15 April 2005.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Conference/WIAMIS2005_VesnaZeljkovic.pdf}
}
Zgaljic T, Sprljan N and Izquierdo E (2005), "Bitstream Syntax Description based Adaptation of Scalable Video", In Integration of Knowledge, Semantics and Digital Media Technology (EWIMT 2005), Proceedings of the 2nd European Workshop on the. London, England, November, 2005, pp. 173-178. IET.
Abstract: Using scalable coding technology, multimedia content can easily be made accessible on different display terminals connected through heterogeneous channels, as scalable bitstreams can be seamlessly adapted according to network restrictions and/or terminal properties. To make the adaptation process coding-format-independent, an XML description of the scalable structure of the bitstream is created. The adaptation is then performed on the XML description, and an adapted bitstream is generated from the adapted XML description. In this paper, techniques used in scalable video coding, bitstream descriptions and XML, as well as extractor-based adaptation of scalable video bitstreams, are presented.
BibTeX:
@inproceedings{zgaljic2005bitstream,
  author = {Zgaljic, Toni and Sprljan, Nikola and Izquierdo, Ebroul},
  title = {Bitstream Syntax Description based Adaptation of Scalable Video},
  booktitle = {Integration of Knowledge, Semantics and Digital Media Technology (EWIMT 2005), Proceedings of the 2nd European Workshop on the},
  publisher = {IET},
  year = {2005},
  pages = {173--178},
  note = {google scholar entry: 2nd European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology (EWIMT 2005). London, England, 30 November - 1 December 2005.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1576051},
  doi = {10.1049/ic.2005.0728}
}
Zgaljic T, Sprljan N and Izquierdo E (2005), "Scalable Video Adaptation based on Bitstream Syntax Description", In Proceedings of 2nd Workshop on Immersive Communication and Broadcast Systems (ICOB 2005). Berlin, Germany, October, 2005. Fraunhofer HHI.
BibTeX:
@inproceedings{zgaljic2005scalable,
  author = {Zgaljic, Toni and Sprljan, Nikola and Izquierdo, Ebroul},
  title = {Scalable Video Adaptation based on Bitstream Syntax Description},
  booktitle = {Proceedings of 2nd Workshop on Immersive Communication and Broadcast Systems (ICOB 2005)},
  publisher = {Fraunhofer HHI},
  year = {2005},
  note = {google scholar entry: 2nd Workshop on Immersive Communication and Broadcast Systems (ICOB 2005). Berlin, Germany, 27-28 October 2005.}
}

Theses and Monographs

Djordjević I (2005), "Watermarking of MPEG2 Video Streams". Thesis at: Queen Mary University of London. January, 2005, pp. 1-138.
Abstract: The main field of this study is digital watermarking of compressed video sequences. The targeted area in the field of digital watermarking is data embedding and indexing, which can be used in applications such as video indexing and retrieval. Bearing in mind that the content in video databases is mainly compressed and that video retrieval applications demand real-time capabilities, this work is focused on efficient, real-time watermark embedding and decoding in the compressed domain. The implemented watermarking technique is based on well-known spread spectrum techniques. The watermark is spread by a large chip factor, modulated by a pseudo-random sequence and then added to the DCT coefficients of an MPEG2 sequence. Detection probability was increased by a new block-wise random interleaving of the watermark bits. The watermark must be embedded in such a way that it does not introduce visual artefacts into the host signal, hence the power of the watermarking signal is bounded by perceptual visibility. The perceptual watermark adjustment is done using the information from a corresponding DCT block in the original sequence. A novel adaptation method is based on the Just Noticeable Difference (JND) model with block classification. Since a transmission channel has its own particular capacity, the bit-rate of a video stream needs to be chosen to comply with the capacity of the channel. Therefore, watermarking of a compressed video bit-stream must not increase its bit-rate. A novel technique for bit-rate control at the macroblock level increased the number of watermarked coefficients in comparison with existing schemes. To boost the capacity, a state-of-the-art error correction coding technique, turbo coding, was employed. The watermarking channel has a small signal-to-noise ratio and a potentially large bit-error rate due to the noise introduced by the host signal and attacks. In such an environment, it was essential to protect the watermark message by introducing redundant bits, which are then used for error correction.
BibTeX:
@phdthesis{djordjevic2006watermarking,
  author = {Djordjević, Ivan},
  editor = {Izquierdo, Ebroul},
  title = {Watermarking of MPEG2 Video Streams},
  school = {Queen Mary University of London},
  year = {2005},
  pages = {1--138},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Theses/PhDThesis_IvanDjordjevic.pdf}
}
Dorado A (2005), "Towards Semantic-Based Image Annotation". Thesis at: Queen Mary University of London, pp. 1-120.
Abstract: This research work addresses the problem of using concept-related indexing of image content as a near-automatic way to perform semantic image annotation. The main objective is to provide a framework in which lexical information of visual interpretations and their components (concept-related indexes) can be used to perform content-based image annotation. Several design phases, starting from the formation of an MPEG-7 learning space to the construction of a robust semantic indexer, were applied. Salient features of the proposed framework for concept-related indexing of image content are: It provides a suitable combination of low-level visual features. A specific concern is on defining a structure in which MPEG-7 descriptor elements may be aggregated into feature vectors. A structure is proposed to preserve the semantics embedded in descriptors, avoid description overriding, and control the vector dimensionality using the minimum number of required elements. Ambiguity of interpretation is reduced using a built-in knowledge base consisting of concepts organized into a restrained lexicon. A learning procedure, which applies partially supervised clustering, is an important component within the framework when presenting the exemplars to the learner. A fuzzy partition of the learning space is used not only to approximate semantically meaningful groups, but also to facilitate long-term learning. Semantic profiles are proposed to incorporate conceptualisation of image content into clusters. The underlying relationship between feature vectors is considered in defining the semantic profiles. In addition, a matching procedure is proposed to estimate the distance between semantic profiles. High scalability is provided by using partitions rather than the entire learning space to add new image concepts or new image representations. This issue relates to long-term learning and contributes to improved generalization capabilities. Experimental studies demonstrated how the proposed framework leads to a potential solution for equipping content-based image retrieval systems with learnable concepts useful for dealing with meta-information.
BibTeX:
@phdthesis{dorado2005towards,
  author = {Dorado, Andres},
  editor = {Izquierdo, Ebroul},
  title = {Towards Semantic-Based Image Annotation},
  school = {Queen Mary University of London},
  year = {2005},
  pages = {1--120},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Theses/PhDThesis_AndresDorado.pdf}
}


2004

Journal Papers

Dorado A, Calic J and Izquierdo E (2004), "A Rule-Based Video Annotation System", Circuits and Systems for Video Technology, IEEE Transactions on. May, 2004. Vol. 14(5), pp. 622-633. IEEE.
Abstract: A generic system for automatic annotation of videos is introduced. The proposed approach is based on the premise that the rules needed to infer a set of high-level concepts from low-level descriptors cannot be defined a priori. Rather, knowledge embedded in the database and interaction with an expert user is exploited to enable system learning. Underpinning the system at the implementation level is preannotated data that dynamically creates signification links between a set of low-level features extracted directly from the video dataset and high-level semantic concepts defined in the lexicon. The lexicon may consist of words, icons, or any set of symbols that convey the meaning to the user. Thus, the lexicon is contingent on the user, application, time, and the entire context of the annotation process. The main system modules use fuzzy logic and rule mining techniques to approximate human-like reasoning. A rule-knowledge base is created on a small sample selected by the expert user during the learning phase. Using this rule-knowledge base, the system automatically assigns keywords from the lexicon to nonannotated video clips in the database. Using common low-level video representations, the system performance was assessed on a database containing hundreds of broadcasting videos. The experimental evaluation showed robust and high annotation accuracy. The system architecture offers straightforward expansion to relevance feedback and autonomous learning capabilities.
BibTeX:
@article{dorado2004rule,
  author = {Dorado, Andres and Calic, Janko and Izquierdo, Ebroul},
  title = {A Rule-Based Video Annotation System},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {2004},
  volume = {14},
  number = {5},
  pages = {622--633},
  url = {http://epubs.surrey.ac.uk/1831/1/fulltext.pdf},
  doi = {10.1109/TCSVT.2004.826764}
}
Grgić S, Grgić M and Mrak M (2004), "Reliability of Objective Picture Quality Measures", Journal of Electrical Engineering. Vol. 55(1-2), pp. 3-10. University ``Politehnica'' Timisoara.
Abstract: This paper investigates a set of objective picture quality measures for application in still image compression systems and emphasizes the correlation of these measures with subjective picture quality measures. Picture quality is measured using nine different objective picture quality measures and subjectively using the Mean Opinion Score (MOS) as a measure of perceived picture quality. The correlation between each objective measure and MOS is found. The effects of different image compression algorithms, image contents and compression ratios are assessed. Our results show that some objective measures correlate well with the perceived picture quality for a given compression algorithm, but they are not reliable for evaluation across different algorithms. We therefore compared objective picture quality measures across different algorithms and identified measures that serve well in all tested image compression systems.
BibTeX:
@article{grgic2004reliability,
  author = {Grgić, Sonja and Grgić, Mislav and Mrak, Marta},
  title = {Reliability of Objective Picture Quality Measures},
  journal = {Journal of Electrical Engineering},
  publisher = {University ``Politehnica'' Timisoara},
  year = {2004},
  volume = {55},
  number = {1-2},
  pages = {3--10},
  url = {http://www.vcl.fer.hr/papers_pdf/Reliability%20of%20objective%20picture%20quality%20measures.pdf}
}
Izquierdo E, Katsaggelos AK and Strintzis MG (2004), "Introduction to the Special Issue on Audio and Video Analysis for Multimedia Interactive Services", Circuits and Systems for Video Technology, IEEE Transactions on. May, 2004. Vol. 14(5), pp. 569-571.
BibTeX:
@article{izquierdo2004introduction,
  author = {Izquierdo, Ebroul and Katsaggelos, Aggelos Konstantinos and Strintzis, Michael G.},
  title = {Introduction to the Special Issue on Audio and Video Analysis for Multimedia Interactive Services},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  year = {2004},
  volume = {14},
  number = {5},
  pages = {569--571},
  doi = {10.1109/TCSVT.2004.828719}
}
Zeljkovic V, Dorado A and Izquierdo E (2004), "Combining a Fuzzy Rule-Based Classifier and Illumination Invariance for Improved Building Detection", Circuits and Systems for Video Technology, IEEE Transactions on. November, 2004. Vol. 14(11), pp. 1277-1280. IEEE.
Abstract: The problem of edge-based classification of natural video sequences containing buildings and captured under changing lighting conditions is addressed in this letter. The introduced approach is derived from two empirical observations: in static regions, the likelihood of finding features that match the patterns of ``buildings'' is high, because buildings are rigid static objects; and misclassification can be reduced by filtering out image regions changing or deforming in time. These regions may contain objects semantically different from buildings but with a highly similar edge distribution, e.g. a high frequency of vertical and horizontal edges. Using these observations, a strategy is devised in which a fuzzy rule-based classification technique is combined with a method for changing-region detection in outdoor scenes. The proposed approach has been implemented and tested with sequences showing changes in the lighting conditions. Selected results from the experimental evaluation are reported.
BibTeX:
@article{zeljkovic2004combining,
  author = {Zeljkovic, Vesna and Dorado, Andres and Izquierdo, Ebroul},
  title = {Combining a Fuzzy Rule-Based Classifier and Illumination Invariance for Improved Building Detection},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {2004},
  volume = {14},
  number = {11},
  pages = {1277--1280},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1347197},
  doi = {10.1109/TCSVT.2004.835145}
}
Zeljkovic V, Dorado A, Trpovski Ž and Izquierdo E (2004), "Classification of building images in video sequences", Electronics Letters. February, 2004. Vol. 40(3), pp. 169-170. IET.
Abstract: A technique that improves precision in classification results using information extracted from video features is introduced. It combines fuzzy rule-based classification with the detection of changing regions in outdoor scenes. The approach is invariant to extreme illumination changes.
BibTeX:
@article{zeljkovic2004classification,
  author = {Zeljkovic, Vesna and Dorado, Andres and Trpovski, Željen and Izquierdo, Ebroul},
  title = {Classification of building images in video sequences},
  journal = {Electronics Letters},
  publisher = {IET},
  year = {2004},
  volume = {40},
  number = {3},
  pages = {169--170},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1267520},
  doi = {10.1049/el:20040128}
}

Books and Chapters in Books

Izquierdo E (2004), "Fragile Watermarking for Image Authentication", In Multimedia Security Handbook. Boca Raton, Florida, pp. 359-386. CRC Press.
BibTeX:
@incollection{izquierdo2004fragileauthentication,
  author = {Izquierdo, Ebroul},
  editor = {Furht, Borko and Kirovski, Darko},
  title = {Fragile Watermarking for Image Authentication},
  booktitle = {Multimedia Security Handbook},
  publisher = {CRC Press},
  year = {2004},
  pages = {359--386},
  url = {http://www.crcpress.com/product/isbn/9780849327735}
}

Conference Papers

Dorado A, Djordjevic D, Izquierdo E and Pedrycz W (2004), "Supervised Semantic Scene Classification Based on Low-Level Clustering and Relevance Feedback", In Integration of Knowledge, Semantics and Digital Media Technology (EWIMT 2004), Proceedings of the 1st European Workshop on the. London, England, November, 2004, pp. 441-442. QMUL.
Abstract: A framework for semantic-based scene classification using relevance feedback is presented. The semantic component casts the classifier within a framework of the supervised, or learning-from-examples, paradigm. Selection of suitable examples and labeling of training patterns imposes a certain burden on the user that increases with the complexity of the ontology involved in the scene interpretation. The proposed framework involves an on-line clustering whose intent is to create ``natural'' groups of patterns extracted from the scenes. The user adds some domain knowledge by labeling a number of randomly selected samples. Relevance feedback is incorporated to reinforce the training of the classifier in a `learning with a critic' mode. To tackle the stability/plasticity dilemma that arises in changing the cluster arrangement, an intermediate structure is used to organize the patterns into semantically meaningful groups. The framework shows promising results and alleviates some of the drawbacks present when exploiting mechanisms of partial supervision when dealing with scene classification.
BibTeX:
@inproceedings{Dorado2004supervised,
  author = {Dorado, Andres and Djordjevic, Divna and Izquierdo, Ebroul and Pedrycz, Witold},
  editor = {Hobson, Paola and Izquierdo, Ebroul and Kompatsiaris, Ioannis and O'Connor, Noel E. },
  title = {Supervised Semantic Scene Classification Based on Low-Level Clustering and Relevance Feedback},
  booktitle = {Integration of Knowledge, Semantics and Digital Media Technology (EWIMT 2004), Proceedings of the 1st European Workshop on the},
  publisher = {QMUL},
  year = {2004},
  pages = {441--442},
  note = {google scholar entry: 1st European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology (EWIMT 2004). London, England, 25-26 November 2004.},
  url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.101.9438}
}
Dorado A and Izquierdo E (2004), "Exploiting Problem Domain Knowledge for Accurate Building Image Classification", In Image and Video Retrieval. Proceedings of the 3rd International Conference on Image and Video Retrieval (CIVR 2004). Dublin, Ireland, July, 2004. Vol. 3115, pp. 199-206. Springer.
Abstract: An approach for the classification of building images through rule-based fuzzy inference is presented. It exploits rough matching and problem domain knowledge to improve precision results. The approach uses knowledge representation based on a fuzzy reasoning model to establish a bridge between visual primitives and their interpretations.
Knowledge representation goes from low-level to high-level features. The knowledge is acquired from both visual content and users, who provide the interpretations of low-level features as well as their knowledge and experience to improve the rule base.
Experiments are tailored to building image classification. The approach can be extended to other semantic categories, e.g. skylines, vegetation and landscapes. Results show that the proposed method is a promising support for semantic annotation of image/video content.
BibTeX:
@inproceedings{Dorado2004exploiting,
  author = {Dorado, Andres and Izquierdo, Ebroul},
  editor = {Enser, Peter and Kompatsiaris, Yiannis and O'Connor, Noel E. and Smeaton, Alan F. and Smeulders, Arnold W.M.},
  title = {Exploiting Problem Domain Knowledge for Accurate Building Image Classification},
  booktitle = {Image and Video Retrieval. Proceedings of the 3rd International Conference on Image and Video Retrieval (CIVR 2004).},
  publisher = {Springer},
  year = {2004},
  volume = {3115},
  pages = {199--206},
  note = {google scholar entry: 3rd International Conference on Image and Video Retrieval (CIVR 2004). Dublin, Ireland, 21-23 July 2004.},
  url = {http://link.springer.com/chapter/10.1007/978-3-540-27814-6_26},
  doi = {10.1007/978-3-540-27814-6_26}
}
Izquierdo E (2004), "A Technique for Secure Image Authentication", In Proceedings of the 46th International Symposium on Electronics in Marine (ELMAR 2004). Zadar, Croatia, June, 2004, pp. 67-73. Croatian Society of Electronics in Marine (ELMAR).
Abstract: A technique for secure image authentication is presented. It is based on a rather unconventional approach: the extremely high sensitivity of ill-posed operators to any change in the input data is turned into a tool to achieve fragile watermarking for secure image verification. Regarding general requirements for secure watermark-based authentication, analytical and practical aspects of the introduced technique are discussed. It is shown that the proposed authentication scheme is highly secure while providing excellent tamper localization. Several experiments were conducted to demonstrate the effectiveness of the technique when it is subject to different attacks, including vector quantization counterfeiting and cropping.
BibTeX:
@inproceedings{izquierdo2004technique,
  author = {Izquierdo, Ebroul},
  editor = {Kos, Tomislav and Grgić, Mislav},
  title = {A Technique for Secure Image Authentication},
  booktitle = {Proceedings of the 46th International Symposium on Electronics in Marine (ELMAR 2004)},
  publisher = {Croatian Society of Electronics in Marine (ELMAR)},
  year = {2004},
  pages = {67--73},
  note = {google scholar entry: 46th International Symposium on Electronics in Marine (ELMAR 2004). Zadar, Croatia, June 2004.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5606090}
}
Izquierdo E, Damnjanovic I, Villegas P, Xu L-Q and Herrmann S (2004), "Bringing user satisfaction to media access: the IST BUSMAN Project", In Information Visualisation (IV 2004), Proceedings of the 8th International Conference on. London, England, July, 2004, pp. 444-449. IEEE.
Abstract: Interactive and seamless access to video content is facilitated by technology that is able to annotate and retrieve media data efficiently and automatically, with minimal human intervention. Media should be accessible by intended users quickly and independently of the database size, regardless of the homogeneous or heterogeneous nature of the delivery channels or the user computing platform. These requirements are driving research and development efforts in the European project BUSMAN.
BibTeX:
@inproceedings{izquierdo2004bringing,
  author = {Izquierdo, Ebroul and Damnjanovic, Ivan and Villegas, Paulo and Xu, Li-Qun and Herrmann, Stephan},
  title = {Bringing user satisfaction to media access: the IST BUSMAN Project},
  booktitle = {Information Visualisation (IV 2004), Proceedings of the 8th International Conference on},
  publisher = {IEEE},
  year = {2004},
  pages = {444--449},
  note = {google scholar entry: 8th International Conference on Information Visualisation (IV 2004). London, England, 14-16 July 2004.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1320182},
  doi = {10.1109/IV.2004.1320182}
}
Izquierdo E and Guerra Ones V (2004), "Numerical Stability of Nystrom Extension for Image Segmentation", In Machine Learning for Signal Processing (MLSP 2004), Proceedings of the 14th IEEE Signal Processing Society Workshop on. Sao Luis, Brazil, September, 2004, pp. 609-614. IEEE.
Abstract: A stability analysis of the approximate solution for spectral partitioning in image segmentation based on the Nystrom extension is presented. Algorithmic modifications are introduced to improve the stability of the original technique reported in (C. Fowlkes et al. 2004). The proposed improvement includes a criterion for the selection of the initial sample and more stable estimations of inverse matrices. The proposed algorithm is validated by several computer experiments
BibTeX:
@inproceedings{izquierdo2004numerical,
  author = {Izquierdo, Ebroul and Guerra Ones, Valia},
  editor = {Barros, Allan and Principe, Jose and Larsen, Jan and Adali, Tülay and Douglas, Scott},
  title = {Numerical Stability of Nystrom Extension for Image Segmentation},
  booktitle = {Machine Learning for Signal Processing (MLSP 2004), Proceedings of the 14th IEEE Signal Processing Society Workshop on},
  publisher = {IEEE},
  year = {2004},
  pages = {609--614},
  note = {google scholar entry: 14th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing (MLSP 2004). Sao Luis, Brazil, 29 September - 1 October 2004.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1423024},
  doi = {10.1109/MLSP.2004.1423024}
}
Lui TY and Izquierdo E (2004), "Automatic Detection of Human Faces in Natural Scene Images by Use of Skin Colour and Edge Distribution", In Image Analysis for Multimedia Interactive Services (WIAMIS 2004), Proceedings of the 5th International Workshop on. Lisbon, Portugal, April, 2004, pp. 112-117. Instituto Superior Técnico, Lisboa, Portugal.
Abstract: We study different skin colour models and analyse facial texture based on edge distribution to automatically detect and locate human faces in natural scene images. First, colour segmentation is performed to obtain face candidates from an input image. The resulting binary image is then grouped into clusters of connected pixels. An edge distribution is computed for each cluster and an SVM face classifier is trained for face detection. The experimental results show that the combination of skin segmentation and edge distribution is efficient and effective in detecting faces of various poses.
BibTeX:
@inproceedings{lui2004automatic,
  author = {Lui, Tsz Ying and Izquierdo, Ebroul},
  title = {Automatic Detection of Human Faces in Natural Scene Images by Use of Skin Colour and Edge Distribution},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2004), Proceedings of the 5th International Workshop on},
  publisher = {Instituto Superior Técnico, Lisboa, Portugal},
  year = {2004},
  pages = {112--117},
  note = {google scholar entry: 5th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2004). Lisboa, Portugal, 21-23 April 2004.},
  url = {http://www.iti.gr/SCHEMA/files/document/26-02-2004/wiamis2004_andy.pdf}
}
Mrówka E, Dorado A, Pedrycz W and Izquierdo E (2004), "Dimensionality Reduction for Content-Based Image Classification", In Information Visualisation (IV 2004), Proceedings of the 8th International Conference on. London, England, July, 2004, pp. 435-438. IEEE.
BibTeX:
@inproceedings{mrowka2004dimensionality,
  author = {Mrówka, Edyta and Dorado, Andres and Pedrycz, Witold and Izquierdo, Ebroul},
  title = {Dimensionality Reduction for Content-Based Image Classification},
  booktitle = {Information Visualisation (IV 2004), Proceedings of the 8th International Conference on},
  publisher = {IEEE},
  year = {2004},
  pages = {435--438},
  note = {google scholar entry: 8th International Conference on Information Visualisation (IV 2004). London, England, 14-16 July 2004.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1320180},
  doi = {10.1109/IV.2004.1320180}
}
Mrak M, Abhayaratne GCK and Izquierdo E (2004), "On the influence of motion vector precision limiting in scalable video coding", In Signal Processing (ICSP 2004), Proceedings of the 7th International Conference on. Beijing, China, August, 2004. Vol. 2, pp. 1143-1146. IEEE.
Abstract: Recent studies on scalable video coding have not only substantiated the need for such technology but also made evident that many related problems remain open and need to be tackled if truly scalable video coding is to be achieved. One of these challenges relates to the coding of motion vectors. In conventional coders motion vectors are treated and coded in a nonprogressive manner. Since scalable video coding targets decoding at several resolutions and a wide range of quality levels, the motion information needs to be encoded in an adaptive way. We propose a simple, yet efficient, strategy for scalable motion vector coding. The results show improvements of resolution scalability performance at lower bit rates, while overcoming any negative influence at high resolutions and bit rates.
BibTeX:
@inproceedings{mrak2004influence,
  author = {Mrak, Marta and Abhayaratne, Guruge Charith Kanchana and Izquierdo, Ebroul},
  editor = {Yuan, Baozong and Ruan, Qiuqi and Tang, Xiaofang},
  title = {On the influence of motion vector precision limiting in scalable video coding},
  booktitle = {Signal Processing (ICSP 2004), Proceedings of the 7th International Conference on},
  publisher = {IEEE},
  year = {2004},
  volume = {2},
  pages = {1143--1146},
  note = {google scholar entry: 7th International Conference on Signal Processing (ICSP 2004). Beijing, China, 31 August - 4 September 2004},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1441526},
  doi = {10.1109/ICOSP.2004.1441526}
}
Mrak M and Izquierdo E (2004), "Resolution Scalability Drawbacks of an MCTF Video Coder", In Proceedings of the Postgraduate Research Conference in Electronics, Photonics, Communications & Networks, and Computing Science (PREP2004). London, England, April, 2004, pp. 176-177.
BibTeX:
@inproceedings{mrak2004resolution,
  author = {Mrak, Marta and Izquierdo, Ebroul},
  title = {Resolution Scalability Drawbacks of an MCTF Video Coder},
  booktitle = {Proceedings of the Postgraduate Research Conference in Electronics, Photonics, Communications & Networks, and Computing Science (PREP2004)},
  year = {2004},
  pages = {176--177},
  note = {google scholar entry: Postgraduate Research Conference in Electronics, Photonics, Communications and Networks, and Computing Science (PREP2004). London, England, 5-7 April 2004.}
}
Mrak M, Sprljan N, Abhayaratne C and Izquierdo E (2004), "Scalable Generation and Coding of Motion Vectors for Highly Scalable Video Coding", In Proceedings of the 24th Picture Coding Symposium (PCS 2004). San Francisco, CA, December, 2004. Tektronix, Inc.
Abstract: One of the challenges of achieving highly scalable video coding (SVC) is the estimation and coding of motion vectors. In conventional video coders motion vectors are generated and losslessly coded in a non-scalable manner. In SVC, motion vectors are estimated for the highest spatial resolution and scaled down at the decoder for lower spatial resolutions accordingly. In such a case, the motion vectors cause a fixed cost irrespective of the decoding spatial resolution. In this paper we propose a simple, yet efficient, framework for scalable generation and scalable coding of motion information, for highly scalable video coding. This framework is based on the spatial resolution (scale) dependent precision limiting of motion vectors. The results show substantial improvements of spatial resolution scalability performance at low bit rates.
BibTeX:
@inproceedings{mrak2004scalable,
  author = {Mrak, Marta and Sprljan, Nikola and Abhayaratne, Charith and Izquierdo, Ebroul},
  title = {Scalable Generation and Coding of Motion Vectors for Highly Scalable Video Coding},
  booktitle = {Proceedings of the 24th Picture Coding Symposium (PCS 2004)},
  publisher = {Tektronix, Inc},
  year = {2004},
  note = {google scholar entry: 24th Picture Coding Symposium (PCS 2004). San Francisco, California, 15-17 December 2004.}
}
Mrak M, Sprljan N and Izquierdo E (2004), "An Overview of Basic Techniques behind Scalable Video Coding", In Proceedings of the 46th International Symposium on Electronics in Marine (ELMAR 2004). Zadar, Croatia, June, 2004, pp. 597-602. Croatian Society of Electronics in Marine (ELMAR).
Abstract: The recent research activities on scalable video coding have laid the basis for a future video coding standard that will enable efficient serving of various demands without the need for transcoding. This work aims to provide an overview of the basic techniques behind scalable coding.
BibTeX:
@inproceedings{mrak2004overview,
  author = {Mrak, Marta and Sprljan, Nikola and Izquierdo, Ebroul},
  editor = {Kos, Tomislav and Grgić, Mislav},
  title = {An Overview of Basic Techniques behind Scalable Video Coding},
  booktitle = {Proceedings of the 46th International Symposium on Electronics in Marine (ELMAR 2004)},
  publisher = {Croatian Society of Electronics in Marine (ELMAR)},
  year = {2004},
  pages = {597--602},
  note = {google scholar entry: 46th International Symposium on Electronics in Marine (ELMAR 2004). Zadar, Croatia, 16-18 September 2004.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1356449}
}
Sprljan N, Djordjevic D and Izquierdo E (2004), "Scalability Evaluation of Still Image Coders", In Image Analysis for Multimedia Interactive Services (WIAMIS 2004), Proceedings of the 5th International Workshop on. Lisbon, Portugal, April, 2004, pp. 1-4. Instituto Superior Técnico, Lisboa, Portugal.
Abstract: Transmission of multimedia content over heterogeneous networks requires highly adaptive compression systems. Fully scalable performance is therefore an especially attractive feature, since it enables partial decoding adapted to the given requirements. This work aims to establish a framework for the comparison of scalable still image coders. We compare the scalability features of several popular image compression methods and propose a methodology for testing.
BibTeX:
@inproceedings{sprljan2004scalability,
  author = {Sprljan, Nikola and Djordjevic, Divna and Izquierdo, Ebroul},
  title = {Scalability Evaluation of Still Image Coders},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2004), Proceedings of the 5th International Workshop on},
  publisher = {Instituto Superior Técnico, Lisboa, Portugal},
  year = {2004},
  pages = {1--4},
  note = {google scholar entry: 5th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2004). Lisboa, Portugal, 21-23 April 2004.},
  url = {http://www.eecs.qmul.ac.uk/~ebroul/mmv_publications/pdf/cr1131.pdf}
}
Sprljan N, Mrak M and Izquierdo E (2004), "Image Compression Using a Cartoon-Texture Decomposition Technique", In Image Analysis for Multimedia Interactive Services (WIAMIS 2004), Proceedings of the 5th International Workshop on. Lisbon, Portugal, April, 2004, pp. 1-4. Instituto Superior Técnico, Lisboa, Portugal.
Abstract: Based on the assumption that an image is composed of two separated layers - smooth regions and textures, a new image compression technique is established. In our multilayered compression scenario, a non-linear diffusion is used to smooth out texture and to decompose images into a large scale piecewise smooth component and a residual part consisting of ``pure texture''. Since the extracted layers have different characteristics, suitable transforms are used to encode each layer individually. Both PSNR results and subjective quality are comparable to popular image compression techniques.
BibTeX:
@inproceedings{sprljan2004image,
  author = {Sprljan, Nikola and Mrak, Marta and Izquierdo, Ebroul},
  title = {Image Compression Using a Cartoon-Texture Decomposition Technique},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2004), Proceedings of the 5th International Workshop on},
  publisher = {Instituto Superior Técnico, Lisboa, Portugal},
  year = {2004},
  pages = {1--4},
  note = {google scholar entry: 5th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2004). Lisboa, Portugal, 21-23 April 2004.},
  url = {ftp://www.smap2006.org/pub/4dkonto/WIAMIS-2004/wiamis/papers/cr1050.pdf}
}
Trujillo M and Izquierdo E (2004), "A Robust Correlation Measure for Correspondence Estimation", In 3D Data Processing, Visualization and Transmission (3DPVT 2004), Proceedings of the 2nd International Symposium on. Thessaloniki, Greece, September, 2004, pp. 155-162. IEEE.
Abstract: A median correlation for the estimation of corresponding points in stereovision is proposed. It is based on the normalised correlation coefficient, using the median instead of the mean. Its performance appears to be superior to conventional correlation, especially in image areas with depth discontinuities. This conclusion is derived from an empirical evaluation in which the proposed correlation is compared with the normalised correlation coefficient and the sum of absolute differences. The results show that the median correlation produces higher scores and lower estimation errors.
BibTeX:
@inproceedings{trujillo2004robust,
  author = {Trujillo, Maria and Izquierdo, Ebroul},
  title = {A Robust Correlation Measure for Correspondence Estimation},
  booktitle = {3D Data Processing, Visualization and Transmission (3DPVT 2004), Proceedings of the 2nd International Symposium on},
  publisher = {IEEE},
  year = {2004},
  pages = {155--162},
  note = {google scholar entry: 2nd International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT 2004). Thessaloniki, Greece, 6-9 September 2004.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1335189},
  doi = {10.1109/TDPVT.2004.1335189}
}
Xu L-Q, Villegas P, Díez M, Izquierdo E, Herrmann S, Bottreau V, Damnjanovic I and Papworth D (2004), "A User-Centred System for End-to-End Secure Multimedia Content Delivery: From Content Annotation to Consumer Consumption", In Image and Video Retrieval. Proceedings of the 3rd International Conference on Image and Video Retrieval (CIVR 2004). Dublin, Ireland, July, 2004. Vol. 3115, pp. 656-664. Springer.
Abstract: The paper discusses the current status of progress of the on-going EU IST BUSMAN project (Bringing User Satisfaction to Media Access Networks), which now approaches the milestone of its 2nd year running. The issues explained include the motivation and approaches behind the design of its client-server system architecture for effective data flows handling, the progress in the implementation of the proposed server system functionalities, and the advanced video processing algorithms investigated and adopted. A fully functional client-server system for video content management, search and retrieval for both professional use scenarios and customers with either fixed or wireless network connections is expected to be demonstrable by year end of 2004.
BibTeX:
@inproceedings{xu2004user,
  author = {Xu, Li-Qun and Villegas, Paulo and Díez, Mónica and Izquierdo, Ebroul and Herrmann, Stephan and Bottreau, Vincent and Damnjanovic, Ivan and Papworth, Damien},
  editor = {Enser, Peter and Kompatsiaris, Yiannis and O'Connor, Noel E. and Smeaton, Alan F. and Smeulders, Arnold W. M.},
  title = {A User-Centred System for End-to-End Secure Multimedia Content Delivery: From Content Annotation to Consumer Consumption},
  booktitle = {Image and Video Retrieval. Proceedings of the 3rd International Conference on Image and Video Retrieval (CIVR 2004).},
  publisher = {Springer},
  year = {2004},
  volume = {3115},
  pages = {656--664},
  note = {google scholar entry: 3rd International Conference on Image and Video Retrieval (CIVR 2004). Dublin, Ireland, 21-23 July 2004.},
  url = {http://link.springer.com/chapter/10.1007/978-3-540-27814-6_76},
  doi = {10.1007/978-3-540-27814-6_76}
}
Zeljkovic V, Dorado A and Izquierdo E (2004), "A Modified Shading Model Method for Building Detection", In Image Analysis for Multimedia Interactive Services (WIAMIS 2004), Proceedings of the 5th International Workshop on. Lisbon, Portugal, April, 2004, pp. 1-4. Instituto Superior Técnico, Lisboa, Portugal.
Abstract: A technique for the detection of building images in real-world video sequences is presented. It combines fuzzy rule-based classification with a modified shading model method for changing-region detection in outdoor environments. The proposed technique uses information extracted from video features to improve the precision of classification results. It has been tested on sequences under various lighting conditions, and satisfactory and promising results have been achieved.
BibTeX:
@inproceedings{zeljkovic2004modified,
  author = {Zeljkovic, Vesna and Dorado, Andres and Izquierdo, Ebroul},
  title = {A Modified Shading Model Method for Building Detection},
  booktitle = {Image Analysis for Multimedia Interactive Services (WIAMIS 2004), Proceedings of the 5th International Workshop on},
  publisher = {Instituto Superior Técnico, Lisboa, Portugal},
  year = {2004},
  pages = {1--4},
  note = {google scholar entry: 5th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2004). Lisboa, Portugal, 21-23 April 2004.},
  url = {ftp://iva07.ntua.gr/pub/4dkonto/WIAMIS-2004/wiamis/papers/cr1086.pdf}
}

Theses and Monographs

Lui TY (2004), "Face Detection in Colour Images by ICA-SVM Architecture". Thesis at: Queen Mary University of London. October, 2004, pp. 1-74.
Abstract: We describe a face detection algorithm based on the support vector machine (SVM). The algorithm consists of two steps. The first step is a skin detection model which serves as a platform to reduce the search space for potential face candidates. The second step reduces the computational complexity of the SVM architecture by projecting the image signals into a face subspace, constructed under the ICA framework, to reduce the dimensionality of the problem while preserving the unique facial features. Experiments were conducted using various real-world data and results are reported.
BibTeX:
@phdthesis{lui2004face,
  author = {Lui, Tsz Ying},
  editor = {Izquierdo, Ebroul},
  title = {Face Detection in Colour Images by ICA-SVM Architecture},
  school = {Queen Mary University of London},
  year = {2004},
  pages = {1--74},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Theses/MPhilThesis_TszYingLui.pdf}
}
Hobson P, Izquierdo E, Kompatsiaris I and O'Connor NE (2004), In Knowledge-Based Media Analysis for Self-Adaptive and Agile Multi-Media: Proceedings of the European Workshop for the Integration of Knowledge, Semantics and Digital Media Technology (EWIMT 2004). London, England, 25-26 November 2004. pp. 466. QMUL.
BibTeX:
@proceedings{hobson2004knowledge,
  editor = {Hobson, Paola and Izquierdo, Ebroul and Kompatsiaris, Ioannis and O'Connor, Noel E.},
  booktitle = {Knowledge-Based Media Analysis for Self-Adaptive and Agile Multi-Media: Proceedings of the European Workshop for the Integration of Knowledge, Semantics and Digital Media Technology (EWIMT 2004). London, England, 25-26 November 2004.},
  publisher = {QMUL},
  year = {2004},
  pages = {466},
  note = {missing conference papers}
}


2003

Journal Papers

Izquierdo E (2003), "Efficient and Accurate Image Based Camera Registration", Multimedia, IEEE Transactions on. September, 2003. Vol. 5(3), pp. 293-302. IEEE.
Abstract: A technique for efficient and accurate camera registration based on stereo image analysis is presented. Initially, few correspondences are estimated with high accuracy using a probabilistic relaxation technique. Accuracy is achieved by considering the continuous approximations of selected image areas using second order polynomials and a relaxation rule defined according to the likelihood that estimates obey stereoscopic constraints. The extrinsic camera parameters are then obtained using a novel efficient and robust approach derived from the classic eight point algorithm. Efficiency is achieved by solving a parametric linear optimization problem rather than a nonlinear one as more conventional methods attempt to do. Robustness is obtained by applying two novel strategies: normalization of the initial data via a simple but efficient diagonal scaling approach, and regularization of the underlying linear parametric optimization problem using meaningful constraints. The performance of the presented methods is assessed in several computer experiments using natural video data.
BibTeX:
@article{izquierdo2003efficient,
  author = {Izquierdo, Ebroul},
  title = {Efficient and Accurate Image Based Camera Registration},
  journal = {Multimedia, IEEE Transactions on},
  publisher = {IEEE},
  year = {2003},
  volume = {5},
  number = {3},
  pages = {293--302},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1223557},
  doi = {10.1109/TMM.2003.814910}
}
Izquierdo E and Guerra V (2003), "Estimating the Essential Matrix by Efficient Linear Techniques", Circuits and Systems for Video Technology, IEEE Transactions on. September, 2003. Vol. 13(9), pp. 925-935. IEEE.
Abstract: In the problem of recovering the 3D structure of a scene from its 2D projections, a fundamental low-level computer vision task is the estimation of the epipolar geometry. Accurate estimation of the epipolar geometry uses computationally expensive iteration schemes based on nonlinear algebraic constraints to deal with the ill-posedness of the problem. Linear techniques are computationally efficient but extremely unstable. Theoretical and practical aspects of linear methods are analyzed and fundamental results are derived from the study. Two main causes of instability are considered. The first one refers to the lack of homogeneity in the input data. To deal with this problem, a highly efficient scaling approach is introduced. The optimality of the technique is proven theoretically and heuristically. It is shown that a second source of instability arises from the linear dependency between rows of the matrix of the linear system. The effect of this problem in the estimation of the essential matrix is analyzed. An additional strategy is introduced to overcome this difficulty. This strategy improves the stability and accuracy of the linear approach even further while reducing the computational cost. Numerical experiments to evaluate the effectiveness of the proposed techniques are reported.
BibTeX:
@article{izquierdo2003estimating,
  author = {Izquierdo, Ebroul and Guerra, Valia},
  title = {Estimating the Essential Matrix by Efficient Linear Techniques},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {2003},
  volume = {13},
  number = {9},
  pages = {925--935},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1233004},
  doi = {10.1109/TCSVT.2003.816503}
}
Izquierdo E and Guerra V (2003), "An ill-posed operator for secure image authentication", Circuits and Systems for Video Technology, IEEE Transactions on. August, 2003. Vol. 13(8), pp. 842-852. IEEE.
Abstract: Many problems in science and technology are modeled by ill-posed operators and the main difficulty in obtaining accurate solutions is caused by the high instability of such operators. The paper introduces a new schema for secure image authentication. It is based on a rather unconventional approach - the extremely high sensitivity of ill-posed operators to any change in the input data is turned into a tool to achieve fragile watermarking for secure image verification. The ill-posed operator of concern is based on a highly ill-conditioned matrix interrelating the watermark and the original image. Authentication is achieved by solving the least squares problem associated with the underlying linear operator. Regarding general requirements for secure watermark-based authentication, analytical and practical aspects of the introduced technique are discussed. It is shown that the proposed authentication schema is highly secure while providing excellent tamper localization. Several experiments were conducted to demonstrate the effectiveness of the technique when it is subjected to different attacks, including vector quantization counterfeiting and cropping.
BibTeX:
@article{izquierdo2003ill,
  author = {Izquierdo, Ebroul and Guerra, Valia},
  title = {An ill-posed operator for secure image authentication},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {2003},
  volume = {13},
  number = {8},
  pages = {842--852},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1227612},
  doi = {10.1109/TCSVT.2003.815961}
}
Izquierdo E, Kim H-J and Macq B (2003), "Introduction to the Special Issue on Authentication, Copyright Protection, and Information Hiding", Circuits and Systems for Video Technology, IEEE Transactions on. August, 2003. Vol. 13(8), pp. 729-731. IEEE.
BibTeX:
@article{izquierdo2003introduction,
  author = {Izquierdo, Ebroul and Kim, Hyoung-Joong and Macq, Benoit},
  title = {Introduction to the Special Issue on Authentication, Copyright Protection, and Information Hiding},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {2003},
  volume = {13},
  number = {8},
  pages = {729--731},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1227602},
  doi = {10.1109/TCSVT.2003.817838}
}

Books and Chapters in Books

Izquierdo E (2003), "Hierarchy Embedded Content and Content Descriptors as Basis for Cross-Media Applications", In Cross-Media Service Delivery. Vol. 740, pp. 121-132. Springer.
Abstract: The main goal of Cross-Media technology is to provide seamless delivery of and access to digital content, enabling optimal, user centred multi-channel and cross-platform media services. Regardless of the nature of the delivery channel, homogeneous or heterogeneous, and regardless of the user platform, media should be accessible efficiently and at optimum quality over the full available bandwidths and independently of terminal capabilities. If we accept that such Cross-Media services will be based on applications that are both network aware and also terminal aware, it will be necessary to focus on crucial issues related to content processing. In this paper, we argue that the development of efficient applications underpinning Cross-Media systems needs to be based on what we call hierarchically embedded content and metadata. The presented discussion relates to the content engineering side of Cross-Media and describes how it can be used to facilitate, if not to achieve, the overall goal of the Cross-Media concept. Three examples combining available technologies and ongoing developments to produce hierarchically embedded content and metadata are presented. Their application in the development of Cross-Media systems is discussed and results of computer experiments aiming to demonstrate the suitability of these techniques are reported.
BibTeX:
@incollection{izquierdo2003hierarchy,
  author = {Izquierdo, Ebroul},
  editor = {Spinellis, Diomidis},
  title = {Hierarchy Embedded Content and Content Descriptors as Basis for Cross-Media Applications},
  booktitle = {Cross-Media Service Delivery},
  publisher = {Springer},
  year = {2003},
  volume = {740},
  pages = {121--132},
  url = {http://link.springer.com/chapter/10.1007/978-1-4615-0381-1_11},
  doi = {10.1007/978-1-4615-0381-1_11}
}

Conference Papers

Dorado A and Izquierdo E (2003), "An Approach for Supervised Semantic Annotation", In Proceedings of the 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003). London, England, April, 2003. Vol. 616, pp. 117-121. World Scientific.
Abstract: Advanced content-based image retrieval systems combine automatically extracted low-level features with user-specified interpretations to generate semantic descriptors, which allow queries to be conceptualized in a search engine. The user's interpretations can be summarized by means of symbols such as keywords. In this paper, a supervised image annotation process is presented. This process combines color and texture features with symbolic descriptions for semi-automatic and incremental annotation of images. In this way, images can be retrieved using semantic concepts human beings are familiar with.
BibTeX:
@inproceedings{dorado2003approach,
  author = {Dorado, Andres and Izquierdo, Ebroul},
  editor = {Izquierdo, Ebroul},
  title = {An Approach for Supervised Semantic Annotation},
  booktitle = {Proceedings of the 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003)},
  publisher = {World Scientific},
  year = {2003},
  volume = {616},
  pages = {117--121},
  note = {google scholar entry: 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003). London, England, 9-11 April 2003.},
  url = {http://www.worldscientific.com/doi/abs/10.1142/9789812704337_0022},
  doi = {10.1142/9789812704337_0022}
}
Dorado A and Izquierdo E (2003), "Knowledge Representation of Low Level Features for Semantic Video Analysis", In Proceedings of the Workshop on Multimedia Discovery and Mining (MDM 2003) [at ECML/PKDD 2003]. Cavtat-Dubrovnik, Croatia, September, 2003, pp. 72-83. Jožef Stefan Institute, Slovenia.
Abstract: An approach towards automatic knowledge representation for semantic-based video analysis is described. The approach consists of two stages: learning and generalization. In the learning stage, fuzzy sets are used to map low level features into a set of user-specified keywords. Frequent patterns representing associations between low level features and semantic concepts are extracted applying association rule mining. These patterns are used to build an inference rule base. In the generalization stage, the inference rule base is used to automatically associate appropriate keywords to video clips. Experimental results show that this approach is suitable to support analysis of video content.
BibTeX:
@inproceedings{dorado2003knowledge,
  author = {Dorado, Andres and Izquierdo, Ebroul},
  editor = {Mladenić, Dunja and Paaß, Gerhard},
  title = {Knowledge Representation of Low Level Features for Semantic Video Analysis},
  booktitle = {Proceedings of the Workshop on Multimedia Discovery and Mining (MDM 2003) [at ECML/PKDD 2003]},
  publisher = {Jožef Stefan Institute, Slovenia},
  year = {2003},
  pages = {72--83},
  note = {google scholar entry: Workshop on Multimedia Discovery and Mining (MDM 2003) [at ECML/PKDD 2003]. Cavtat-Dubrovnik, Croatia, 22-26 September 2003.}
}
Dorado A and Izquierdo E (2003), "Semantic Labeling of Images Combining Color, Texture and Keywords", In Image Processing (ICIP 2003), Proceedings of the 10th International Conference on. Barcelona, Catalonia, September, 2003. Vol. 3, pp. 9-12. IEEE.
Abstract: Content-based image retrieval systems combine perceptual features such as color, texture and shape with semantic concepts to improve the quality of query results. In this paper, an annotation technique that combines color and texture with keywords is presented. A method based on color similarity along with a keyword mining technique is used to propagate keywords extracted from a sub-set of annotated images into a large-scale database. A method based on texture properties is applied to link keywords with regions within the images. Finally, an approach for semantic labeling of images is described. In this approach, the accuracy of the annotations is estimated and the relationships among keywords are identified. The presented annotation technique is useful for labeling images with keywords conveying the underlying semantic content.
BibTeX:
@inproceedings{dorado2003semantic,
  author = {Dorado, Andres and Izquierdo, Ebroul},
  title = {Semantic Labeling of Images Combining Color, Texture and Keywords},
  booktitle = {Image Processing (ICIP 2003), Proceedings of the 10th International Conference on},
  publisher = {IEEE},
  year = {2003},
  volume = {3},
  pages = {9--12},
  note = {google scholar entry: 10th International Conference on Image Processing (ICIP 2003). Barcelona, Catalonia, 14-18 September 2003.},
  url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.80&rep=rep1&type=pdf},
  doi = {10.1109/ICIP.2003.1247168}
}
Dorado A and Izquierdo E (2003), "Semi-Automatic Image Annotation Using Frequent Keyword Mining", In Information Visualisation (IV 2003), Proceedings of the 7th International Conference on. London, England, July, 2003, pp. 532-535. IEEE.
Abstract: Research in content-based image retrieval is an expanding discipline that has grown rapidly over the last ten years. Advances in telecommunications and the huge demand for visual information on the Internet and mobile devices are focusing researchers' attention on developing efficient systems that ease the retrieval of useful visual information by users. We present a semi-automatic image annotation process that uses a low-level image descriptor, the fuzzy color signature, to extract the most similar images from an annotated database, and frequent pattern mining to select the candidate keywords for annotating the new image. The idea is aimed at establishing a bridge between visual data and their interpretation using a weak semantic approach.
BibTeX:
@inproceedings{dorado2003semi,
  author = {Dorado, Andres and Izquierdo, Ebroul},
  title = {Semi-Automatic Image Annotation Using Frequent Keyword Mining},
  booktitle = {Information Visualisation (IV 2003), Proceedings of the 7th International Conference on},
  publisher = {IEEE},
  year = {2003},
  pages = {532--535},
  note = {google scholar entry: 7th International Conference on Information Visualisation (IV 2003). London, England, 16-18 July 2003.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1218036},
  doi = {10.1109/IV.2003.1218036}
}
Feng Y and Izquierdo E (2003), "Robust Local Watermarking on Salient Image Areas", In Digital Watermarking. First International Workshop, IWDW 2002, Seoul, Korea, November 21-22, 2002. Revised Papers. Seoul, Korea, November, 2003. (2613), pp. 189-201. Springer.
Abstract: Digital content services and tools have spread all over the world, creating an acute need for robust copyright protection and ownership management systems. Robust watermarking technology plays a crucial role in the development of such systems. This paper reports a robust technique for digital image watermarking. Salient image areas extracted from the pixel domain are used to embed local watermarks in the frequency domain. Relevant image corners are first extracted using well-established scale-space theories for edge and corner detection. Fairly small areas around each corner are used to insert the watermark. The objective is to achieve robustness against local and global geometric transformations as well as conventional image processing attacks while keeping the computational cost low. The DFT is applied on the discrete polar coordinates over asymmetric grids defined on uniform angles and nonuniform radii. The objective is to keep the rotation invariance property of the DFT magnitude while avoiding numerical and interpolation errors on the radii axis. A blind watermark detection process based on statistical correlation is used. Several experiments to validate the robustness of the proposed approach were conducted and are reported in the paper.
BibTeX:
@inproceedings{feng2003robust,
  author = {Feng, Yi and Izquierdo, Ebroul},
  editor = {Petitcolas, Fabien A. P. and Kim, Hyoung-Joong},
  title = {Robust Local Watermarking on Salient Image Areas},
  booktitle = {Digital Watermarking. First International Workshop, IWDW 2002, Seoul, Korea, November 21-22, 2002. Revised Papers},
  publisher = {Springer},
  year = {2003},
  number = {2613},
  pages = {189--201},
  note = {google scholar entry: First International Workshop on Digital Watermarking (IWDW 2002). Seoul, Korea, 21-22 November 2002.},
  url = {http://link.springer.com/chapter/10.1007/3-540-36617-2_16},
  doi = {10.1007/3-540-36617-2_16}
}
Izquierdo E (2003), "Successive approximations for scalable content and content descriptors in multimedia applications", In Video/Image Processing and Multimedia Communications (VIPMC 2003), Proceedings of the 4th EURASIP Conference focused on. Zagreb, Croatia, July, 2003. Vol. 1, pp. 137-142.
Abstract: Interactive media aims to provide seamless access to digital content in a proactive and user-centred manner. Regardless of the nature of the delivery channel, homogeneous or heterogeneous, and regardless of the user platform, media should be accessible efficiently and at optimum quality over the full available bandwidth and independently of terminal capabilities. In this paper, we argue that the development of efficient applications underpinning interactive media systems needs to be based on successive approximations of content and metadata. The presented discussion relates to the content engineering side of media applications and describes how it can be used to ease, if not to achieve, the overall goal of a new generation of media applications. Three examples combining available technologies and ongoing developments to produce successive approximations of content and metadata are presented. Their application in the development of interactive multimedia systems is discussed, and results of computer experiments aiming to demonstrate the suitability of these techniques are reported.
BibTeX:
@inproceedings{izquierdo2003successive,
  author = {Izquierdo, Ebroul},
  title = {Successive approximations for scalable content and content descriptors in multimedia applications},
  booktitle = {Video/Image Processing and Multimedia Communications (VIPMC 2003), Proceedings of the 4th EURASIP Conference focused on},
  year = {2003},
  volume = {1},
  pages = {137--142},
  note = {google scholar entry: 4th EURASIP Conference focused on Video/Image Processing and Multimedia Communications (VIPMC 2003). Zagreb, Croatia, 2-5 July 2003.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1220452},
  doi = {10.1109/VIPMC.2003.1220452}
}
Izquierdo E, Casas JR, Leonardi R, Migliorati P, O'Connor NE, Kompatsiaris I and Strintzis MG (2003), "Advanced Content-Based Semantic Scene Analysis and Information Retrieval: The Schema Project", In Proceedings of the 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003). London, England, April, 2003. Vol. 616, pp. 519-528. World Scientific.
Abstract: This volume contains papers describing state-of-the-art technology for advanced multimedia systems. It presents applications in broadcasting, copyright protection of multimedia content, image indexing and retrieval, and other topics related to computer vision. It is intended for upper-level undergraduates in computer science and for researchers in image and video processing, multimedia applications and computer vision.
BibTeX:
@inproceedings{izquierdo2003advanced,
  author = {Izquierdo, Ebroul and Casas, Josep R. and Leonardi, Riccardo and Migliorati, Pierangelo and O'Connor, Noel E. and Kompatsiaris, Ioannis and Strintzis, Michael G.},
  editor = {Izquierdo, Ebroul},
  title = {Advanced Content-Based Semantic Scene Analysis and Information Retrieval: The Schema Project},
  booktitle = {Proceedings of the 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003)},
  publisher = {World Scientific},
  year = {2003},
  volume = {616},
  pages = {519--528},
  note = {google scholar entry: 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003). London, England, 9-11 April 2003.},
  url = {http://www.worldscientific.com/doi/abs/10.1142/9789812704337_0094},
  doi = {10.1142/9789812704337_0094}
}
Izquierdo E and Guerra Ones V (2003), "Optimizing the Efficiency of Linear Techniques to Estimate the Epipolar Geometry", In Proceedings of the 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003). London, England, April, 2003. Vol. 616, pp. 439-444. World Scientific.
Abstract: Accurate estimation of the epipolar geometry uses computationally expensive iteration schemes based on nonlinear algebraic constraints to deal with the ill-posedness of the problem. Linear techniques are computationally efficient but extremely unstable. In this paper, it is shown that linear schemes can be made more stable, while reducing the computational cost, if only a small but ``well selected'' subset of the input data is considered for the estimation. A method is proposed to automatically select the most linearly independent rows of the system matrix and to perform the estimation with these rows only. Numerical experiments confirm the effectiveness of the proposed technique.
BibTeX:
@inproceedings{izquierdo2003optimizing,
  author = {Izquierdo, Ebroul and Guerra Ones, Valia},
  editor = {Izquierdo, Ebroul},
  title = {Optimizing the Efficiency of Linear Techniques to Estimate the Epipolar Geometry},
  booktitle = {Proceedings of the 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003)},
  publisher = {World Scientific},
  year = {2003},
  volume = {616},
  pages = {439--444},
  note = {google scholar entry: 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003). London, England, 9-11 April 2003.},
  url = {http://www.worldscientific.com/doi/abs/10.1142/9789812704337_0080},
  doi = {10.1142/9789812704337_0080}
}
Lui TY and Izquierdo E (2003), "Scalable Object-Based Image Retrieval", In Image Processing (ICIP 2003), Proceedings of the 10th International Conference on. Barcelona, Catalonia, September, 2003. Vol. 3, pp. 501-504. IEEE.
Abstract: Digital visual libraries currently hold huge amounts of content in unstructured, non-indexed form. Since these collections keep growing fast, retrieving specific images is becoming extremely difficult: it is too slow to linearly search all the stored feature vectors to find those that satisfy the query criteria. Scalability is crucial for an image retrieval system to be practical and realistic. In this paper a simple hierarchical object descriptor scheme, which is compact, flexible, and inherently suited for hierarchical search, is described. By integrating a suitable segmentation algorithm into the descriptor generation schema, the proposed approach becomes object oriented. Basically, features used for the extraction of image regions belonging to single physical objects are used in the definition of object descriptors. The resulting technique generates compact scalable descriptions for each object in the database. Experimental results show the performance of the presented schema in terms of accuracy and scalability.
BibTeX:
@inproceedings{lui2003scalable,
  author = {Lui, Tsz Ying and Izquierdo, Ebroul},
  title = {Scalable Object-Based Image Retrieval},
  booktitle = {Image Processing (ICIP 2003), Proceedings of the 10th International Conference on},
  publisher = {IEEE},
  year = {2003},
  volume = {3},
  pages = {501--504},
  note = {google scholar entry: 10th International Conference on Image Processing (ICIP 2003). Barcelona, Catalonia, 14-18 September 2003.},
  url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.103.4806&rep=rep1&type=pdf},
  doi = {10.1109/ICIP.2003.1247291}
}
O'Connor NE, Sav S, Adamek T, Mezaris V, Kompatsiaris I, Lui TY, Izquierdo E, Bennström CF and Casas JR (2003), "Region and Object Segmentation Algorithms in the Qimera Segmentation Platform", In Proceedings of the 3rd International Workshop on Content-Based Multimedia Indexing (CBMI 2003). Rennes, France, September, 2003, pp. 1-8.
Abstract: In this paper we present the Qimera segmentation platform and describe the different approaches to segmentation that have been implemented in the system to date. Analysis techniques have been implemented for both region-based and object-based segmentation. The region-based segmentation algorithms include: a colour segmentation algorithm based on a modified Recursive Shortest Spanning Tree (RSST) approach, an implementation of a colour image segmentation algorithm based on the K-Means-with-Connectivity-Constraint (KMCC) algorithm and an approach based on the Expectation Maximization (EM) algorithm applied in a 6D colour/texture space. A semi-automatic approach to object segmentation that uses the modified RSST approach is outlined. An automatic object segmentation approach via snake propagation within a level-set framework is also described. Illustrative segmentation results are presented in all cases. Plans for future research within the Qimera project are also discussed.
BibTeX:
@inproceedings{o2003region,
  author = {O'Connor, Noel E. and Sav, Sorin and Adamek, Tomasz and Mezaris, Vasileios and Kompatsiaris, Ioannis and Lui, Tsz Ying and Izquierdo, Ebroul and Bennström, Christian Ferran and Casas, Josep R.},
  title = {Region and Object Segmentation Algorithms in the Qimera Segmentation Platform},
  booktitle = {Proceedings of the 3rd International Workshop on Content-Based Multimedia Indexing (CBMI 2003)},
  year = {2003},
  pages = {1--8},
  note = {incorrect citation on the QMUL site: Region and Object Segmentation Algorithms in the QUIMERA Platform. google scholar entry: Third International Workshop on Content-Based Multimedia Indexing (CBMI 2003). Rennes, France, 22-24 September 2003.},
  url = {http://doras.dcu.ie/389/}
}
Sprljan N and Izquierdo E (2003), "New Perspectives on Image Compression Using a Cartoon - Texture Decomposition Model", In Video/Image Processing and Multimedia Communications (VIPMC 2003), Proceedings of the 4th EURASIP Conference focused on. Zagreb, Croatia, July, 2003. Vol. 1, pp. 359-368.
Abstract: A new method for multilayered image representation and its application in image compression are presented. The basic idea is to separate an image into two different layers having fundamentally different structures. Nonlinear diffusion is used to smooth out image texture in order to obtain a large scale piecewise smooth image representation called ``cartoon''. The residual part is considered as ``pure texture''. Since the best approximations of these two structurally different image layers can be realised by different transforms, higher compression ratios are achieved using suitably chosen transforms prior to individual encoding of each layer. Transform bases tailored to each layer are derived from a comprehensive empirical study. Several experiments were conducted to evaluate the performance of the proposed approach. The results show that in most cases the proposed technique achieves objectively slightly better and subjectively superior results than well-established image compression algorithms.
BibTeX:
@inproceedings{1220488,
  author = {Sprljan, Nikola and Izquierdo, Ebroul},
  title = {New Perspectives on Image Compression Using a Cartoon - Texture Decomposition Model},
  booktitle = {Video/Image Processing and Multimedia Communications (VIPMC 2003), Proceedings of the 4th EURASIP Conference focused on},
  year = {2003},
  volume = {1},
  pages = {359--368},
  note = {google scholar entry: 4th EURASIP Conference focused on Video/Image Processing and Multimedia Communications (VIPMC 2003). Zagreb, Croatia, 2-5 July 2003.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1220488},
  doi = {10.1109/VIPMC.2003.1220488}
}
Trujillo M and Izquierdo E (2003), "KLT-Based Linear Scaling for the Estimation of the Fundamental Matrix", In Proceedings of the 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003). London, England, April, 2003. Vol. 616, pp. 411-416. World Scientific.
Abstract: The epipolar geometry describes the relation between two stereo images and the 3D scene. This geometry is encoded in a 3×3 singular matrix called the Fundamental Matrix. The Fundamental Matrix can be estimated from point correspondences in stereo images. Accurate techniques for the estimation of the Fundamental Matrix are based on optimization models constrained to complex non-linear algebraic conditions. More computationally efficient methods are based on regularization, scaling and preconditioning of simple linear systems derived from the classic eight-point algorithm. In this paper a new scaling approach addressing the numerical stability of the eight-point algorithm is proposed. The technique exploits well-known properties of the Karhunen-Loève Transform to improve the condition number and numerical properties of the linear system. The performance of the proposed method is compared with the diagonal scaling introduced by Izquierdo and Guerra as well as the isotropic and the non-isotropic scaling proposed by Hartley.
BibTeX:
@inproceedings{trujillo2003klt,
  author = {Trujillo, Maria and Izquierdo, Ebroul},
  editor = {Izquierdo, Ebroul},
  title = {KLT-Based Linear Scaling for the Estimation of the Fundamental Matrix},
  booktitle = {Proceedings of the 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003)},
  publisher = {World Scientific},
  year = {2003},
  volume = {616},
  pages = {411--416},
  note = {google scholar entry: 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003). London, England, 9-11 April 2003.},
  url = {http://www.worldscientific.com/doi/abs/10.1142/9789812704337_0075},
  doi = {10.1142/9789812704337_0075}
}
Trujillo M and Izquierdo E (2003), "Robust Estimation of the Fundamental Matrix by Exploiting Disparity Redundancies", In Proceedings of the 1st IASTED International Conference on Visualization, Imaging, and Image Processing (VIIP 2003). Benalmadena, Spain, September, 2003, pp. 1-6. ACTA Press.
Abstract: In this paper an approach to estimate the fundamental matrix is proposed. The goal is to overcome the high vulnerability of linear models against noise and bad estimates by exploiting the structure of the input data. Initially, a small number of low-dimensionality least square problems are solved using well-selected subsets from the input data. The selection process is based on the inherent 3D structure encapsulated in the disparity vectors. It is shown that the 3D structure embedded in the input data provides means to filter redundant information and to obtain better estimates with few input points. The results of these estimations are fed into a Least Median of Squares schema, which is applied to recover the final estimate of the Fundamental Matrix. Several experiments were conducted to assess the performance of the proposed technique.
BibTeX:
@inproceedings{trujillo2003robust,
  author = {Trujillo, Maria and Izquierdo, Ebroul},
  title = {Robust Estimation of the Fundamental Matrix by Exploiting Disparity Redundancies},
  booktitle = {Proceedings of the 1st IASTED International Conference on Visualization, Imaging, and Image Processing (VIIP 2003)},
  publisher = {ACTA Press},
  year = {2003},
  pages = {1--6},
  note = {google scholar entry: 1st IASTED International Conference on Visualization, Imaging, and Image Processing (VIIP 2003). Benalmadena, Spain, 8-10 September 2003.},
  url = {http://www.actapress.com/Abstract.aspx?paperId=14303}
}
Villegas P, Herrmann S, Izquierdo E, Teh J and Xu L-Q (2003), "An Environment for Efficient Handling of Digital Assets", In Proceedings of the 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003). London, England, April, 2003. Vol. 616, pp. 529-536. World Scientific.
Abstract: We present a system designed for the management of multimedia content through IP networks. It includes advanced cataloguing capabilities based on the MPEG-7 standard, and embedding of metadata and identifiers for content tracking through video watermarking. A prototype is being implemented as part of the BUSMAN IST project.
BibTeX:
@inproceedings{villegas2003environment,
  author = {Villegas, Paulo and Herrmann, Stephan and Izquierdo, Ebroul and Teh, Jonathan and Xu, Li-Qun},
  editor = {Izquierdo, Ebroul},
  title = {An Environment for Efficient Handling of Digital Assets},
  booktitle = {Proceedings of the 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003)},
  publisher = {World Scientific},
  year = {2003},
  volume = {616},
  pages = {529--536},
  note = {google scholar entry: 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003). London, England, 9-11 April 2003.},
  url = {http://schema.iti.gr/SCHEMA/files/document/24-06-2003/Villegas_wiamis03.pdf},
  doi = {10.1142/9789812704337_0095}
}
Wang Y and Izquierdo E (2003), "High-Capacity Data Hiding in MPEG-2 Compressed Video", In Recent Trends in Multimedia Information Processing: Proceedings of the 9th International Workshop on Systems, Signals and Image Processing (IWSSIP 2002). Manchester, England, November, 2003, pp. 212-218. World Scientific.
Abstract: A technique for high capacity data hiding in MPEG-2 streams is presented. The objective is to maximise the payload while keeping robustness and simplicity. Ancillary data is embedded in the host signal by modulating the quantized block DCT coefficients of I frames. To achieve robustness, each information bit is embedded in more than one DCT coefficient within each intra coded block. The extraction process is blind. Thus, the presented technique is suitable for side information delivery. The scheme is less complex than a complete decoding process followed by watermarking in the pixel domain and reencoding. Selected results of computer simulations are also reported.
BibTeX:
@inproceedings{Wang2003,
  author = {Wang, Yulin and Izquierdo, Ebroul},
  editor = {Liatsis, Panos},
  title = {High-Capacity Data Hiding in MPEG-2 Compressed Video},
  booktitle = {Recent Trends in Multimedia Information Processing: Proceedings of the 9th International Workshop on Systems, Signals and Image Processing (IWSSIP 2002)},
  publisher = {World Scientific},
  year = {2003},
  pages = {212--218},
  note = {google scholar entry: 9th International Workshop on Systems, Signals and Image Processing (IWSSIP 2002). Manchester, England, 7-8 November 2002.},
  url = {http://schema.iti.gr/SCHEMA/files/document/24-06-2003/Izquierdo02Manchester.pdf},
  doi = {10.1142/9789812776266_0031}
}

Theses and Monographs

(2003), "Digital Media Processing for Multimedia Interactive Services", In Proceedings of the 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003). London, England, April, 2003. Vol. 616, pp. 591. World Scientific.
Abstract: This volume contains papers describing state-of-the-art technology for advanced multimedia systems. It presents applications in broadcasting, copyright protection of multimedia content, image indexing and retrieval, and other topics related to computer vision. It is intended for upper-level undergraduates in computer science and for researchers in image and video processing, multimedia applications and computer vision.
BibTeX:
@proceedings{izquierdo2003digital,
  editor = {Izquierdo, Ebroul},
  title = {Digital Media Processing for Multimedia Interactive Services},
  booktitle = {Proceedings of the 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003)},
  publisher = {World Scientific},
  year = {2003},
  volume = {616},
  pages = {591},
  note = {google scholar entry: 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003). London, England, 9-11 April 2003.},
  url = {http://books.google.co.uk/books?id=vVvJINURimIC}
}


2002

Journal Papers

Bais M, Cosmas J, Dosch C, Engelsberg A, Erk A, Hansen PS, Healey P, Klungsoeyr GK, Mies R, Ohm J-R, Paker Y, Pearmain A, Pedersen L, Sandvand Å, Schäfer R, Schoonjans P and Stammnitz P (2002), "Customized Television: Standards Compliant Advanced Digital Television", Broadcasting, IEEE Transactions on. June, 2002. Vol. 48(2), pp. 151-158. IEEE.
Abstract: This paper describes a European Union supported collaborative project called CustomTV, based on the premise that future TV sets will provide all sorts of multimedia information and interactivity, as well as manage all such services according to each user's or group of users' preferences/profiles. We have demonstrated the potential of recent standards (MPEG-4 and MPEG-7) to implement such a scenario by building the following services: an advanced EPG, weather forecasting, and stock exchange/flight information.
BibTeX:
@article{bais2002customized,
  author = {Bais, Michel and Cosmas, John and Dosch, Christoph and Engelsberg, Andreas and Erk, Alexander and Hansen, Per Steinar and Healey, Pat and Klungsoeyr, Gunn Kristin and Mies, Ronald and Ohm, Jens-Rainer and Paker, Yakup and Pearmain, Alan and Pedersen, Lena and Sandvand, Åsmund and Schäfer, Rainer and Schoonjans, Peter and Stammnitz, Peter},
  title = {Customized Television: Standards Compliant Advanced Digital Television},
  journal = {Broadcasting, IEEE Transactions on},
  publisher = {IEEE},
  year = {2002},
  volume = {48},
  number = {2},
  pages = {151--158},
  url = {http://v-scheiner.brunel.ac.uk/bitstream/2438/1516/1/IEE%20tran%20broad.pdf},
  doi = {10.1109/TBC.2002.1021281}
}
Calic J and Izquierdo E (2002), "Temporal Segmentation of MPEG Video Streams", EURASIP Journal on Applied Signal Processing. June, 2002. (6), pp. 561-565. Springer.
Abstract: Many algorithms for temporal video partitioning rely on the analysis of uncompressed video features. Since the information relevant to the partitioning process can be extracted directly from the MPEG compressed stream, higher efficiency can be achieved utilizing information from the MPEG compressed domain. This paper introduces a real-time algorithm for scene change detection that analyses the statistics of the macroblock features extracted directly from the MPEG stream. A method for extraction of the continuous frame difference that transforms the 3D video stream into a 1D curve is presented. This transform is then further employed to extract temporal units within the analysed video sequence. Results of computer simulations are reported.
BibTeX:
@article{calic2002temporal2,
  author = {Calic, Janko and Izquierdo, Ebroul},
  title = {Temporal Segmentation of MPEG Video Streams},
  journal = {EURASIP Journal on Applied Signal Processing},
  publisher = {Springer},
  year = {2002},
  number = {6},
  pages = {561--565},
  url = {http://asp.eurasipjournals.com/content/pdf/1687-6180-2002-847653.pdf},
  doi = {10.1155/S1110865702000938}
}
Izquierdo E (2002), "Using Invariant Image Features for Synchronization in Spread Spectrum Image Watermarking", EURASIP Journal on Advances in Signal Processing. April, 2002. (4), pp. 412-419. Springer.
Abstract: A watermarking scheme is presented in which the characteristics of both spatial and frequency techniques are combined to achieve robustness against image processing and geometric transformations. The proposed approach consists of three basic steps: estimation of the just noticeable image distortion, watermark embedding by adaptive spreading of the watermark signal in the frequency domain, and extraction of relevant information relating to the spatial distribution of pixels in the original image. The just noticeable image distortion is used to insert a pseudo-random signal such that its amplitude is maintained below the distortion sensitivity of the pixel into which it is embedded. Embedding the watermark in the frequency domain guarantees robustness against compression and other common image processing transformations. In the spatial domain most salient image points are characterized using the set of Hilbert first-order differential invariants. This information is used to detect geometrical attacks in a frequency-domain watermarked image and to resynchronize the attacked image. The presented schema has been evaluated experimentally. The obtained results show that the technique is resilient to most common attacks including rotation, translation, and scaling.
BibTeX:
@article{izquierdo2002using,
  author = {Izquierdo, Ebroul},
  title = {Using Invariant Image Features for Synchronization in Spread Spectrum Image Watermarking},
  journal = {EURASIP Journal on Advances in Signal Processing},
  publisher = {Springer},
  year = {2002},
  number = {4},
  pages = {412--419},
  url = {http://asp.eurasipjournals.com/content/pdf/1687-6180-2002-205219.pdf},
  doi = {10.1155/S1110865702000719}
}
Izquierdo E and Ghanbari M (2002), "Key Components for an Advanced Segmentation System", Multimedia, IEEE Transactions on. March, 2002. Vol. 4(1), pp. 97-113. IEEE.
Abstract: An advanced image and video segmentation system is proposed. The system builds on existing work, but extends it to achieve efficiency and robustness, which are the two major shortcomings of segmentation methods developed so far. Six different schemes containing several approaches tailored for diverse applications constitute the core of the system. The first two focus on very-low-complexity image segmentation addressing real-time applications under specific assumptions. The third scheme is a highly efficient implementation of the powerful nonlinear diffusion model. The other three schemes address the more complex task of physical object segmentation using information about the scene structure or motion. These techniques are based on an extended diffusion model and morphology. The main objective of this work has been to develop a robust and efficient segmentation system for natural video and still images. This goal has been achieved by advancing the state of the art, pushing forward the frontiers of current methods to meet the challenges of the segmentation task in different situations under reasonable computational cost. Consequently, more efficient methods and novel strategies for issues on which current approaches fail are developed. The performance of the presented segmentation schemes has been assessed by processing several video sequences. Qualitative and quantitative results of this assessment are also reported.
BibTeX:
@article{izquierdo2002key,
  author = {Izquierdo, Ebroul and Ghanbari, Mohammed},
  title = {Key Components for an Advanced Segmentation System},
  journal = {Multimedia, IEEE Transactions on},
  publisher = {IEEE},
  year = {2002},
  volume = {4},
  number = {1},
  pages = {97--113},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=985558},
  doi = {10.1109/6046.985558}
}

Conference Papers

Calic J and Izquierdo E (2002), "A Multiresolution Technique for Video Indexing and Retrieval", In Image Processing (ICIP 2002), Proceedings of the 9th International Conference on. Rochester, NY, September, 2002. Vol. 1, pp. 952-955. IEEE.
Abstract: This paper presents a novel approach to multiresolution analysis and scalability in video indexing and retrieval. A scalable algorithm for video parsing and key-frame extraction is introduced. The technique is based on real-time analysis of MPEG motion variables and scalable metrics simplification by discrete contour evolution. Furthermore, a hierarchical key-frame retrieval method using scalable colour histogram analysis is presented. It offers customisable levels of detail in the descriptor space, where the relevance order is determined by degradation of the image, and not by degradation of the image histogram. To assess the performance of the approach, several experiments have been conducted. Selected results are reported.
BibTeX:
@inproceedings{calic2002multiresolution,
  author = {Calic, Janko and Izquierdo, Ebroul},
  title = {A Multiresolution Technique for Video Indexing and Retrieval},
  booktitle = {Image Processing (ICIP 2002), Proceedings of the 9th International Conference on},
  publisher = {IEEE},
  year = {2002},
  volume = {1},
  pages = {952--955},
  note = {google scholar entry: 2002 International Conference on Image Processing (ICIP 2002), Rochester, New York, 22-25 September 2002.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1038185},
  doi = {10.1109/ICIP.2002.1038185}
}
Calic J, Sav S, Izquierdo E, Marlow S, Murphy N and O'Connor NE (2002), "Temporal Video Segmentation For Real-Time Key Frame Extraction", In Acoustics Speech and Signal Processing (ICASSP 2002), Proceedings of the 27th IEEE International Conference on. Orlando, FL, May, 2002. Vol. 4, pp. 3632-3635.
Abstract: The extensive amount of media coverage today generates difficulties in identifying and selecting desired information. Browsing and retrieval systems become more and more necessary in order to support users with powerful and easy-to-use tools for searching, browsing and summarization of information content. The starting point for these tasks in video browsing and retrieval systems is the low-level analysis of video content, especially the segmentation of video content into shots. This paper presents a fast and efficient way to detect shot changes using only the temporal distribution of macroblock types in MPEG compressed video. The notion of a dominant reference frame is introduced here. A dominant frame denotes the reference frame (I or P) used as prediction reference for most of the macroblocks from a subsequent B frame.
BibTeX:
@inproceedings{calic2002temporal,
  author = {Calic, Janko and Sav, Sorin and Izquierdo, Ebroul and Marlow, Seán and Murphy, Noel and O'Connor, Noel E.},
  title = {Temporal Video Segmentation For Real-Time Key Frame Extraction},
  booktitle = {Acoustics Speech and Signal Processing (ICASSP 2002), Proceedings of the 27th IEEE International Conference on},
  year = {2002},
  volume = {4},
  pages = {3632--3635},
  note = {google scholar entry: 27th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2002). Orlando, FL, 13-17 May 2002.},
  url = {http://doras.dcu.ie/247/1/ieee_icassp_2002.pdf},
  doi = {10.1109/ICASSP.2002.5745442}
}
Dorado A and Izquierdo E (2002), "Fuzzy color signatures", In Image Processing (ICIP 2002), Proceedings of the 9th International Conference on. Rochester, NY, September, 2002. Vol. 1, pp. 433-436. IEEE.
Abstract: With the large and increasing amount of visual information available in digital libraries and the Web, efficient and robust systems for image retrieval are urgently needed. A compact color descriptor scheme and an efficient metric to compare and retrieve images are presented. An image adaptive color clustering method, called fuzzy color signature, is proposed. The original image colors are mapped into a small number of representative colors using a peaks detection function derived from the color distribution. Fuzzy color signatures are then used as image descriptors. To compare image descriptors the earth mover's distance is used. Several experiments have been conducted to assess the performance of the proposed technique.
BibTeX:
@inproceedings{dorado2002fuzzy,
  author = {Dorado, Andres and Izquierdo, Ebroul},
  title = {Fuzzy color signatures},
  booktitle = {Image Processing (ICIP 2002), Proceedings of the 9th International Conference on},
  publisher = {IEEE},
  year = {2002},
  volume = {1},
  pages = {433--436},
  note = {google scholar entry: International Conference on Image Processing (ICIP 2002). Rochester, New York, 22-25 September 2002.},
  url = {http://mmv.eecs.qmul.ac.uk/Publications/mmv/pdf/Conference/ICIP2002_AndresDorado.pdf},
  doi = {10.1109/ICIP.2002.1038053}
}
Izquierdo E (2002), "Computational Experiments with Area-Based Stereo for Image-Based Rendering", In 3D Data Processing Visualization and Transmission (3DPVT 2002), Proceedings of the 1st International Symposium on. Padova, Italy, June, 2002, pp. 168-171. IEEE.
Abstract: In this paper two disparity estimators with different complexity degrees are described and used to examine how much disparity inaccuracies influence image rendering quality. The objective of this study is to design software-based image synthesis in real-time on conventional PC platforms. Basically, this work looks at the opposite end of the cost-complexity curve by making very restrained demands on the disparity estimator. It is empirically shown that in many cases the effect of disparity accuracy in the quality of virtual views is almost imperceptible and that for many applications requiring real-time processing reasonably good results can be achieved with less computational cost.
BibTeX:
@inproceedings{izquierdo20023d,
  author = {Izquierdo, Ebroul},
  title = {Computational Experiments with Area-Based Stereo for Image-Based Rendering},
  booktitle = {3D Data Processing Visualization and Transmission (3DPVT 2002), Proceedings of the 1st International Symposium on},
  publisher = {IEEE},
  year = {2002},
  pages = {168--171},
  note = {google scholar entry: 1st International Symposium on 3D Data Processing Visualization and Transmission (3DPVT 2002). Padova, Italy, 19-21 June 2002.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1024056},
  doi = {10.1109/TDPVT.2002.1024056}
}
Izquierdo E (2002), "Compact Hierarchical Image Descriptors", In Information Visualisation (IV 2002), Proceedings of the 6th International Conference on. London, England, July, 2002, pp. 627-630. IEEE.
Abstract: An efficient approach for image annotation and retrieval is presented. The main objective is to overcome the speed limitations of existing video indexing and retrieval systems. A family of successively simplified image descriptors is defined using colour histograms. The colour distribution is quantised according to a scale-parameter obtained by convolving the distribution function with Gaussians of increasing size. The generated family of histograms is then used to define descriptors at various levels of detail. Using this hierarchical descriptor structure large sets of non-similar images are discriminated at very low computational cost using low detailed descriptions. The search is then refined progressively until only a few very similar objects or images are found and ranked using higher levels of detail. To compare image descriptors the earth mover's distance is used. Experiments have been conducted to assess the performance of the proposed technique.
BibTeX:
@inproceedings{izquierdo2002compact,
  author = {Izquierdo, Ebroul},
  title = {Compact Hierarchical Image Descriptors},
  booktitle = {Information Visualisation (IV 2002), Proceedings of the 6th International Conference on},
  publisher = {IEEE},
  year = {2002},
  pages = {627--630},
  note = {google scholar entry: 6th International Conference on Information Visualisation (IV 2002). London, England, 10-12 July 2002.},
  url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.4826&rep=rep1&type=pdf},
  doi = {10.1109/IV.2002.1028840}
}
Izquierdo E (2002), "How Accurate should be Disparities Estimated for Image-based Rendering?", In Video/Image Processing and Multimedia Communications (VIPromCom 2002), Proceedings of the 4th EURASIP-IEEE Region 8 International Symposium on. Zadar, Croatia, June, 2002, pp. 69-74. IEEE.
Abstract: Disparity estimation is the basis for the generation of a virtual view from a small set of real reference images. Much research on this area has been conducted by the computer vision community over the last decade. Currently, most problems involved in this technology are well understood and there exist several well-established algorithms to render virtual views from perspectively different images of the same scene. However, most work has been oriented towards high-accuracy disparity estimation to produce high-quality virtual images, often using sophisticated purpose-built hardware accelerators to achieve real-time results. Two disparity estimators with different complexity degrees are described and used to examine how much disparity inaccuracies influence image rendering quality. The objective of this study is to design software-based image synthesis in real-time on conventional PC platforms. Basically, this work looks at the opposite end of the cost-complexity curve by making very restrained demands on the disparity estimator. It is shown empirically that in many cases the effect of disparity accuracy on the quality of virtual views is almost imperceptible and that, for many applications requiring real-time processing, reasonably good results can be achieved with less computational cost.
BibTeX:
@inproceedings{izquierdo2002how,
  author = {Izquierdo, Ebroul},
  editor = {Grgić, Mislav},
  title = {How Accurate should be Disparities Estimated for Image-based Rendering?},
  booktitle = {Video/Image Processing and Multimedia Communications (VIPromCom 2002), Proceedings of the 4th EURASIP-IEEE Region 8 International Symposium on},
  publisher = {IEEE},
  year = {2002},
  pages = {69--74},
  note = {google scholar entry: 4th EURASIP-IEEE Region 8 International Symposium on Video/Image Processing and Multimedia Communications (VIPromCom 2002). Zadar, Croatia, 16-19 June 2002.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1026630},
  doi = {10.1109/VIPROM.2002.1026630}
}
Izquierdo E (2002), "A Low Access Latency Video Portal", In Circuits and Systems for Communications (ICCSC 2002), Proceedings of the 1st IEEE International Conference on. St.Petersburg, Russia, June, 2002, pp. 226-229. IEEE.
Abstract: Using a meaningful combination of advanced video processing techniques, the development of a low-access latency video portal is presented. The main objective is to overcome the speed limitations of existing video indexing and retrieval systems. Due to the linear and data intensive nature of video, the temporal segmentation of video sequences is the first step towards the implementation of the system. Video is first partitioned into basic units called shots. Thereafter, each single shot can be identified by a key-frame containing the most relevant scene information. For the sake of efficiency these processing steps are carried out in the compressed domain. Once key frames have been extracted a family of successively simplified image features is defined using colour histograms. The image colour distribution is quantised according to a scale-parameter obtained by convolving the colour distribution function with Gaussians of increasing size. The generated family of histograms is then used to define descriptors at various levels of detail. Using the hierarchical descriptor structure, irrelevant details and noise are removed in a very early processing step.
BibTeX:
@inproceedings{izquierdo2002low,
  author = {Izquierdo, Ebroul},
  title = {A Low Access Latency Video Portal},
  booktitle = {Circuits and Systems for Communications (ICCSC 2002), Proceedings of the 1st IEEE International Conference on},
  publisher = {IEEE},
  year = {2002},
  pages = {226--229},
  note = {google scholar entry: 1st IEEE International Conference on Circuits and Systems for Communications (ICCSC 2002). St.Petersburg, Russia, 26-28 June 2002.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1029084},
  doi = {10.1109/OCCSC.2002.1029084}
}
Izquierdo E, Xia J and Mech R (2002), "A Generic Video Analysis and Segmentation System", In Acoustics Speech and Signal Processing (ICASSP 2002), Proceedings of the 27th IEEE International Conference on. Orlando, FL, May, 2002. Vol. 4, pp. 3592-3595. IEEE.
Abstract: A generic video analysis system for supervised and unsupervised segmentation is described. The idea behind the presented concept is to integrate different advanced segmentation techniques to obtain a robust, efficient and modular segmentation system for natural video and still images. The system entails several independent modules. Each one of these modules encapsulates a complete video processing technique. The intermediate results obtained from each single module are merged and further processed by a set of intelligent rules to achieve a highly accurate final segmentation. The modular structure of the system allows it to be extended continuously and with ease by adding new independent modules. The intermediate segmentation results of newly added modules are linked to the other system results via the rule processor. A user-friendly graphical interface (GUI) is also provided. The functionality of the GUI is twofold: it serves as input interface to pass processing parameters to the system and as semi-automatic segmentation tool for user interaction and manual refinement of automatically generated segmentation masks. Selected results obtained with the current version of the video analysis system are reported.
BibTeX:
@inproceedings{izquierdo2002generic,
  author = {Izquierdo, Ebroul and Xia, Jianhui and Mech, Roland},
  title = {A Generic Video Analysis and Segmentation System},
  booktitle = {Acoustics Speech and Signal Processing (ICASSP 2002), Proceedings of the 27th IEEE International Conference on},
  publisher = {IEEE},
  year = {2002},
  volume = {4},
  pages = {3592--3595},
  note = {google scholar entry: 27th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2002). Orlando, Florida, 13-17 May 2002.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5745432},
  doi = {10.1109/ICASSP.2002.5745432}
}

Theses and Monographs

Calic J (2002), "New perspectives of video indexing and retrieval". PhD Thesis. Thesis at: Queen Mary University of London. September, 2002.
BibTeX:
@phdthesis{calic2002new,
  author = {Calic, Janko},
  editor = {Izquierdo, Ebroul and Pearmain, Alan and Bourne, Rachel},
  title = {New perspectives of video indexing and retrieval},
  school = {Queen Mary University of London},
  year = {2002}
}


2001

Journal Papers

Izquierdo E and Guerra Ones V (2001), "Improving efficiency of linear techniques to estimate epipolar geometry", Electronics Letters. July, 2001. Vol. 37(15), pp. 952-954. IEEE.
Abstract: Important analytical aspects of efficient linear methods to estimate the epipolar geometry are studied. Based on the obtained results, a novel low cost and accurate linear algorithm is introduced. Owing to low complexity and accuracy, the proposed approach appears to be suitable for real-time stereo vision applications.
BibTeX:
@article{izquierdo2001improving,
  author = {Izquierdo, Ebroul and Guerra Ones, Valia},
  title = {Improving efficiency of linear techniques to estimate epipolar geometry},
  journal = {Electronics Letters},
  publisher = {IEEE},
  year = {2001},
  volume = {37},
  number = {15},
  pages = {952--954},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=937297},
  doi = {10.1049/el:20010675}
}

Books and Chapters in Books

Izquierdo E (2001), "Shape Extraction by Nonlinear Diffusion", In Approximation, Optimization and Mathematical Economics, pp. 177-190. Springer.
Abstract: An advanced scale-space technique for image segmentation and pattern recognition in computer vision is presented. The approach is based on the Perona-Malik [1] nonlinear diffusion model. The idea at the heart of this technique is to smooth the image within object boundaries, inhibiting diffusion across the contour and even enhancing the contrast along the boundaries. For general image simplification and segmentation, the Perona-Malik paradigm leads to impressive results, clearly outperforming well-established filters like the Canny operator and morphological schemes. Nevertheless it cannot be used directly to carry out the more complex object segmentation task. In this context an extension of the diffusion model is introduced to extract shapes of complete physical objects present in the scene. In the new formulation information about the structure or dynamics of the scene is used.
BibTeX:
@incollection{izquierdo2001shape,
  author = {Izquierdo, Ebroul},
  editor = {Lassonde, Marc},
  title = {Shape Extraction by Nonlinear Diffusion},
  booktitle = {Approximation, Optimization and Mathematical Economics},
  publisher = {Springer},
  year = {2001},
  pages = {177--190},
  url = {http://books.google.co.uk/books?id=f3A3dSOxJMsC}
}

Conference Papers

Calic J and Izquierdo E (2001), "Towards Real-Time Shot Detection in the MPEG Compressed Domain", In 3rd Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2001). Tampere, Finland, May, 2001, pp. 1-5. Tampere University of Technology.
Abstract: As content based video indexing and retrieval has its foundations in the prime video structures, such as a shot or a scene, the algorithms for video partitioning have become crucial in contemporary development of digital video technology. Conventional algorithms for video partitioning mainly focus on the analysis of compressed video features, since the information relevant to the partitioning process can be extracted directly from the MPEG compressed stream and used for the detection of shot boundaries. However, most of the proposed algorithms do not show real-time capabilities that are essential for video applications. This paper introduces a real-time algorithm for cut detection. It analyses the statistics of the features extracted from the MPEG compressed stream, such as the macroblock type, and extends the same metrics to algorithms for gradual change detection. Our analysis led to a fast and robust algorithm for cut detection. Future research will be directed towards the use of the same concept for improving the real-time gradual change detection algorithms. Results of computer simulations are reported.
BibTeX:
@inproceedings{calic2001towards,
  author = {Calic, Janko and Izquierdo, Ebroul},
  title = {Towards Real-Time Shot Detection in the MPEG Compressed Domain},
  booktitle = {3rd Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2001).},
  publisher = {Tampere University of Technology},
  year = {2001},
  pages = {1--5},
  note = {google scholar entry: 3rd Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2001). Tampere, Finland, 16-17 May 2001.},
  url = {http://www.iva.cs.tut.fi/COST211/publications/ag_01_25.pdf}
}
Charalambos JP and Izquierdo E (2001), "Linear Programming Concept Visualization", In Information Visualization (IV 2001), Proceedings of the 5th IEEE International Conference on. London, England, July, 2001, pp. 529-535. IEEE.
Abstract: A visualization scheme as a tool to find the solution of any 3D-linear programming problem is introduced. The presented approach is highly suitable for interactively drawing and visualizing the feasible region of a given 3D-linear programming problem. It can be used for a better understanding of the solution process when different methods devoted to solving the underlying problem are applied, e.g. the simplex method. The proposed technique comprises sensitivity analysis during the solution process and interactive visualization of the feasible region. The analysis leading to the introduced scheme also shows that the design of an appropriate and simple vertex representation is crucial to manage any order of degeneracy. To deal with this paradigm and to some extent to formalize it, the concept of adjacency invariance is introduced. Several experiments have been conducted to test and assess the performance of the introduced concepts and techniques.
BibTeX:
@inproceedings{izquierdo2001linear,
  author = {Charalambos, Jean Pierre and Izquierdo, Ebroul},
  title = {Linear Programming Concept Visualization},
  booktitle = {Information Visualization (IV 2001), Proceedings of the 5th IEEE International Conference on},
  publisher = {IEEE},
  year = {2001},
  pages = {529--535},
  note = {google scholar entry: 5th IEEE International Conference on Information Visualization (IV 2001). London, England, 25-27 July 2001.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=942107},
  doi = {10.1109/IV.2001.942107}
}
Izquierdo E (2001), "Image Based Rendering using Rational Filters", In Information Visualization (IV 2001), Proceedings of the 5th IEEE International Conference on. London, England, July, 2001, pp. 311-316. IEEE.
Abstract: Image based rendering is a powerful technique for 3D object visualization and representation. Using this approach, arbitrary object or scene views are generated automatically from a small set of real reference images. An algorithm for image based rendering using stereo image pairs is presented. The main goal is to produce a realistic 3D `continuous look-around' effect along the stereo baseline. To minimize the distortion of the reconstructed pixels due to undefined disparity values, a nonlinear interpolator is proposed. It uses the sparse available disparity map to generate a dense field that partially reconstructs original disparity edge information, producing a sharper intermediate view. The identification of occluded and non-occluded areas is also used to aid the view synthesis process. A special treatment for occluded image areas is also considered in the proposed technique. Several computer experiments have been conducted to assess the performance of the presented method.
BibTeX:
@inproceedings{izquierdo2001image,
  author = {Izquierdo, Ebroul},
  editor = {Banissi, Ebad and Khosrowshahi, Farzad and Sarfraz, Muhammad and Ursyn, Anna},
  title = {Image Based Rendering using Rational Filters},
  booktitle = {Information Visualization (IV 2001), Proceedings of the 5th IEEE International Conference on},
  publisher = {IEEE},
  year = {2001},
  pages = {311--316},
  note = {google scholar entry: 5th IEEE International Conference on Information Visualization (IV 2001). London, England, 25-27 July 2001.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=942075},
  doi = {10.1109/IV.2001.942075}
}
Izquierdo E and Feng J (2001), "A Hierarchical Approach for Low-Access Latency Image Indexing and Retrieval", In Image Processing (ICIP 2001), Proceedings of the 8th International Conference on. Thessaloniki, Greece, October, 2001. Vol. 1, pp. 38-41. IEEE.
Abstract: A major problem to be faced when efficient schemes for image indexing and retrieval are envisaged is the large workload and high complexity of underlying image-processing algorithms. The research leading to this paper focused on this problem, i.e. efficiency and scalability. A novel representation of color histograms is presented. The introduced model is based on hierarchical image adaptive color histograms. One important aspect of the envisaged hierarchical definition is that details at various levels of scale can be obtained. For fast search of similar features in large databases, large sets of non-similar images are discriminated at very low computational cost using low detailed descriptions. The search is then progressively refined until only a few very similar objects or images are found and ranked using higher levels of detail. To assess the performance of the approach several experiments have been conducted. Selected results are reported.
BibTeX:
@inproceedings{izquierdo2001hierarchical,
  author = {Izquierdo, Ebroul and Feng, Jun},
  title = {A Hierarchical Approach for Low-Access Latency Image Indexing and Retrieval},
  booktitle = {Image Processing (ICIP 2001), Proceedings of the 8th International Conference on},
  publisher = {IEEE},
  year = {2001},
  volume = {1},
  pages = {38--41},
  note = {google scholar entry: International Conference on Image Processing (ICIP 2001). Thessaloniki, Greece, 7-10 October 2001.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=958947},
  doi = {10.1109/ICIP.2001.958947}
}
Izquierdo E and Guerra V (2001), "A Novel Linear Technique to Estimate the Epipolar Geometry", In Acoustics, Speech, and Signal Processing (ICASSP 2001), Proceedings of the 26th IEEE International Conference on. Salt Lake City, Utah, May, 2001. Vol. 3, pp. 1657-1660. IEEE.
Abstract: The accurate reconstruction of the 3D scene structure from two different projections and the estimation of the camera scene geometry is of paramount importance in many computer vision tasks. Most of the information about the camera-scene geometry is encapsulated in the fundamental matrix. Estimating the fundamental matrix has been an object of research for many years and continues to be a challenging task in current computer vision systems. While nonlinear iterative approaches have been successful in dealing with the high instability of the underlying problem, their inherent large workload makes these approaches inappropriate for real-time applications. Practical aspects of highly efficient linear methods are studied and a novel low-cost and accurate linear algorithm is introduced. The performance of the proposed approach is assessed by several experiments on real images.
BibTeX:
@inproceedings{izquierdo2001novel,
  author = {Izquierdo, Ebroul and Guerra, Valia},
  title = {A Novel Linear Technique to Estimate the Epipolar Geometry},
  booktitle = {Acoustics, Speech, and Signal Processing (ICASSP 2001), Proceedings of the 26th IEEE International Conference on},
  publisher = {IEEE},
  year = {2001},
  volume = {3},
  pages = {1657--1660},
  note = {google scholar entry: 26th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2001). Salt Lake City, Utah, 7-11 May 2001.},
  url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.2.9350&rep=rep1&type=pdf},
  doi = {10.1109/ICASSP.2001.941255}
}
Kay S and Izquierdo E (2001), "Robust Content Based Image Watermarking", In Proceedings of the 3rd Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2001). Tampere, Finland, May, 2001, pp. 1-4. Tampere University of Technology.
Abstract: A watermarking scheme is presented in which characteristics of both spatial and frequency techniques are combined to achieve robustness against image processing and geometric transformations. The proposed approach consists of three basic steps: estimation of the just noticeable image distortion, watermark embedding by adaptive spreading of the watermark signal in the frequency domain, and extraction of relevant information relating to the spatial distribution of pixels in the original image. The just noticeable image distortion is used to insert a pseudorandom signal such that its amplitude is maintained below the distortion sensitivity of the pixel into which it is embedded. Embedding the watermark in the frequency domain guarantees robustness against compression performed in image processing attacks. In the spatial domain most salient image points are characterized using first order differential invariants. This information is used to detect geometrical attacks in a frequency-domain watermarked image and to re-synchronize the attacked image. The presented scheme has been evaluated experimentally. The obtained results show that the technique is resilient to most common attacks including geometrical image transformations.
BibTeX:
@inproceedings{kay2001robust,
  author = {Kay, Selena and Izquierdo, Ebroul},
  title = {Robust Content Based Image Watermarking},
  booktitle = {Proceedings of the 3rd Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2001).},
  publisher = {Tampere University of Technology},
  year = {2001},
  pages = {1--4},
  note = {google scholar entry: 3rd Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2001). Tampere, Finland, 16-17 May 2001.},
  url = {http://www.iva.cs.tut.fi/COST211/publications/ag_01_22.pdf}
}


2000 and earlier

Izquierdo E (2000), "High accuracy solution to correspondence problem", Electronics Letters. May, 2000. Vol. 36, pp. 948-949. IET.
Abstract: A probabilistic relaxation technique for high accuracy disparity estimation is proposed. The relaxation rule is defined according to the likelihood that estimates obey physically meaningful constraints. High accuracy is achieved by applying the approach to points inside salient image areas previously approximated by second-order polynomials.
BibTeX:
@article{izquierdo2000high,
  author = {Izquierdo, Ebroul},
  title = {High accuracy solution to correspondence problem},
  journal = {Electronics Letters},
  publisher = {IET},
  year = {2000},
  volume = {36},
  pages = {948--949},
  url = {http://digital-library.theiet.org/content/journals/10.1049/el_20000709}
}
Izquierdo E and Ohm J-R (2000), "Image-based rendering and 3D modeling: A complete framework", Signal Processing: Image Communication. August, 2000. Vol. 15(10), pp. 817-858. Elsevier.
Abstract: Multi-viewpoint synthesis of video data is a key technology for the integration of video and 3D graphics, as necessary for telepresence and augmented-reality applications. This paper describes a number of important techniques which can be employed to accomplish that goal. The techniques presented are based on the analysis of 2D images acquired by two or more cameras. To determine depth information of single objects present in the scene, it is necessary to perform segmentation and disparity estimation. It is shown how these analysis tools can benefit from each other. For viewpoint synthesis, techniques with different levels of tradeoff between complexity and degrees of freedom are presented. The first approach is disparity-controlled view interpolation, which is capable of generating intermediate views along the interocular axis between two adjacent cameras. The second is the recently introduced incomplete 3D technique, which in a first step extracts the texture of the visible surface of a video object acquired with multiple cameras, and then performs disparity-compensated projection from the surface onto a view plane. In the third and most complex approach, a 3D model of the object is generated, which can be represented by a 3D wire grid. For synthesis, this model can be rotated to arbitrary orientations, and original texture is mapped onto the surface to obtain an arbitrary view of the processed object. The result of this rendering procedure is a virtual image with very natural appearance.
BibTeX:
@article{izquierdo2000image2,
  author = {Ebroul Izquierdo and Jens-Rainer Ohm},
  title = {Image-based rendering and 3D modeling: A complete framework},
  journal = {Signal Processing: Image Communication},
  publisher = {Elsevier},
  year = {2000},
  volume = {15},
  number = {10},
  pages = {817--858},
  url = {http://www.sciencedirect.com/science/article/pii/S0923596599000144},
  doi = {10.1016/S0923-5965(99)00014-4}
}
Izquierdo E and Xu L-Q (2000), "Image segmentation using data-modulated nonlinear diffusion", Electronics Letters. October, 2000. Vol. 36(21), pp. 1767-1769. IET.
Abstract: Effective object segmentation is achieved by applying a novel data-modulated nonlinear diffusion technique. The advantages of this strategy are a considerable smoothing of the detail of the scene image within the boundaries of the object while inhibiting the diffusion across the boundaries, as well as preserving and even enhancing the object borders
BibTeX:
@article{izquierdo2000image,
  author = {Izquierdo, Ebroul and Xu, Li-Qun},
  title = {Image segmentation using data-modulated nonlinear diffusion},
  journal = {Electronics Letters},
  publisher = {IET},
  year = {2000},
  volume = {36},
  number = {21},
  pages = {1767--1769},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=878555},
  doi = {10.1049/el:20001244}
}
Guerra Ones V, Izquierdo E and Madrid de la Vega H (2000), "Una Versión Revisada del Algoritmo de Ocho Puntos [A revised version of the eight-point algorithm]", In 33 Congreso Nacional de la Sociedad Matemática Mexicana: Memorias, pp. 1-12. Sociedad Matemática Mexicana.
BibTeX:
@inproceedings{guerra2000version,
  author = {Guerra Ones, Valia and Izquierdo, Ebroul and Madrid de la Vega, Humberto},
  editor = {Alfaro, J. and Eudave, M. and González, J and Pérez Chavela, E.},
  title = {Una Versión Revisada del Algoritmo de Ocho Puntos [A revised version of the eight-point algorithm]},
  booktitle = {33 Congreso Nacional de la Sociedad Matemática Mexicana: Memorias},
  publisher = {Sociedad Matemática Mexicana},
  year = {2000},
  pages = {1--12},
  note = {google scholar entry: 33rd Congreso Nacional de la Sociedad Matemática Mexicana. Saltillo, Coah. México, 2000},
  url = {http://books.google.co.uk/books?id=R-zuAAAAMAAJ}
}
Izquierdo E (2000), "A Highly Robust Regressor and its Application in Computer Vision", In The Electronic Proceedings of the Eleventh British Machine Vision Conference (BMVC 2000). Bristol, England, September, 2000. (44), pp. 1-10. BMVA.
Abstract: This paper introduces a highly efficient model for general regression with unknown error distribution function. The model is derived from a bi-criteria optimisation problem combining the best properties of least squares and least absolute deviation. The solution of this problem leads to a robust M-estimator that can be applied in a large range of computer vision tasks. The technique has been designed to overcome the extreme lack of robustness and low efficiency observed when conventional approaches are used to solve fundamental ill-posed computer vision problems. The performance of the method has been assessed by recovering the 3D-scene structure from stereoscopic images. In this context, several experiments have been conducted. Some selected results are reported in this article.
BibTeX:
@inproceedings{izquierdo2000highly,
  author = {Izquierdo, Ebroul},
  editor = {Mirmehdi, Majid and Thomas, Barry},
  title = {A Highly Robust Regressor and its Application in Computer Vision},
  booktitle = {The Electronic Proceedings of the Eleventh British Machine Vision Conference (BMVC 2000)},
  publisher = {BMVA},
  year = {2000},
  number = {44},
  pages = {1--10},
  note = {google scholar entry: 11th British Machine Vision Conference (BMVC). Bristol, England, 11-14 September 2000.},
  url = {http://www.bmva.org/bmvc/2000/papers/p44.pdf},
  doi = {10.5244/C.14.44}
}
Izquierdo E (2000), "Linear and Nonlinear Scale-spaces for Video Indexing and Retrieval", In Time-scale and Time-Frequency Analysis and Applications, IEE Seminar on. London, England, February, 2000. (19), pp. 1-5. IET.
Abstract: An extension of the conventional linear and nonlinear scale-space models for shape and image simplification, indexing and retrieval is presented. The linear model can be used for shape based retrieval when the main video objects have been identified. In this context an algorithm for contour simplification is introduced. Two different nonlinear filtering techniques for image simplification are also described. The first one is generated by convolution with a group of anisotropic weighted filter kernels. The second is based on a parabolic differential equation in divergence form. The linear and nonlinear form of these diffusion filters allows to integrate additional information to control the evolution in both spatial and temporal directions. This property is used to extend the conventional nonlinear scale-space in order to perform content-based segmentation of natural video
BibTeX:
@inproceedings{izquierdo2000linear,
  author = {Izquierdo, Ebroul},
  title = {Linear and Nonlinear Scale-spaces for Video Indexing and Retrieval},
  booktitle = {Time-scale and Time-Frequency Analysis and Applications, IEE Seminar on},
  publisher = {IET},
  year = {2000},
  number = {19},
  pages = {1--5},
  note = {google scholar entry: 2000 IEE Seminar on Time-scale and Time-Frequency Analysis and Applications (2000). London, England, 29 February 2000.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=847058},
  doi = {10.1049/ic:20000568}
}
Izquierdo E and Ghanbari M (2000), "A low cost hybrid diffusion technique for object segmentation", In Acoustics, Speech, and Signal Processing (ICASSP 2000), Proceedings of the IEEE International Conference on. Istanbul, Turkey, June, 2000. Vol. 4, pp. 2251-2254. IEEE.
Abstract: In this paper a low-complexity nonlinear filtering technique to smooth textures preserving object contours is presented. The approach is based on a hybrid combination of both isotropic and anisotropic recursive filtering. Using only intensity information the segment borders obtained by applying nonlinear filtering do not necessarily coincide with physical object contours, especially in the case of textured objects. To segment images into regions with physical meaning additional information extracted from disparity or motion is used to weight the filter coefficients. The presented technique has been successfully tested in the context of object segmentation of natural scenes and object-based disparity estimation for stereoscopic applications
BibTeX:
@inproceedings{izquierdo2000low,
  author = {Izquierdo, Ebroul and Ghanbari, Mohammed},
  title = {A low cost hybrid diffusion technique for object segmentation},
  booktitle = {Acoustics, Speech, and Signal Processing (ICASSP 2000), Proceedings of the IEEE International Conference on},
  publisher = {IEEE},
  year = {2000},
  volume = {4},
  pages = {2251--2254},
  note = {google scholar entry: 25th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2000). Istanbul, Turkey, 5-9 June 2000.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=859287},
  doi = {10.1109/ICASSP.2000.859287}
}
Izquierdo E and Guerra V (2000), "Video-Based Camera Registration for Augmented Reality", In Information Visualization (IV 2000), Proceedings of the 4th IEEE International Conference on. London, England, July, 2000, pp. 499-503. IEEE.
Abstract: A video-based approach for camera scene registration in augmented reality systems is presented. The presented technique relies on the definition of a model, which is derived from an appropriate parametric linear optimization problem. The optimal parameters are sought in the solution space defined by physically meaningful constraints. Solving the underlying regularized linear problem, we expect to overcome the major shortcoming observed in image-based augmented reality and telepresence systems: the extreme lack of robustness due to the ill-posed nature of the calibration problem. Several computer experiments have been conducted in order to assess the performance of the introduced technique
BibTeX:
@inproceedings{izquierdo2000video,
  author = {Izquierdo, Ebroul and Guerra, Valia},
  editor = {Banissi, Ebad and Bannatyne, Mark W. McK. and Chen, Chaomei and Khosrowshahi, Farzad and Sarfraz, Muhammad and Ursyn, Anna},
  title = {Video-Based Camera Registration for Augmented Reality},
  booktitle = {Information Visualization (IV 2000), Proceedings of the 4th IEEE International Conference on},
  publisher = {IEEE},
  year = {2000},
  pages = {499--503},
  note = {google scholar entry: 4th IEEE International Conference on Information Visualization (IV 2000). London, England, 19-21 July 2000.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=859803},
  doi = {10.1109/IV.2000.859803}
}
Izquierdo E, Pinheiro A and Ghanbari M (2000), "A robust and efficient scale-space based metric for the evaluation of MPEG-4 VOPs", In Circuits and Systems (ISCAS 2000), Proceedings of the 2000 IEEE International Symposium on. Geneva, Switzerland, May, 2000. Vol. 3, pp. 527-530. IEEE.
Abstract: New MPEG-4 functionalities require the segmentation of input video into different layers (VOPs). Usually, these layers contain arbitrarily shaped objects representing meaningful content of the video stream. With the introduction of the new functionalities in MPEG4, the need of objective and subjective assessment of segmented image quality has emerged. In this paper we introduce an efficient and reliable metric to evaluate segmentation results by comparing them with a given ground truth. Beyond this application the proposed technique can be used for real-time shape description and retrieval in the context of the emerging MPEG7, as well as in general pattern recognition tasks. Selected results obtained by using this metric within these application areas are reported
BibTeX:
@inproceedings{izquierdo2000robust,
  author = {Izquierdo, Ebroul and Pinheiro, Antonio and Ghanbari, Mohammed},
  title = {A robust and efficient scale-space based metric for the evaluation of MPEG-4 VOPs},
  booktitle = {Circuits and Systems (ISCAS 2000), Proceedings of the 2000 IEEE International Symposium on},
  publisher = {IEEE},
  year = {2000},
  volume = {3},
  pages = {527--530},
  note = {google scholar entry: 2000 IEEE International Symposium on Circuits and Systems (ISCAS 2000). Geneva, Switzerland, 28-31 May 2000.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=856113},
  doi = {10.1109/ISCAS.2000.856113}
}
Pinheiro AMG, Izquierdo E and Ghanbari M (2000), "Shape Matching using a Curvature Based Polygonal Approximation in Scale-Space", In Image Processing (ICIP 2000), Proceedings of the 7th International Conference on. Vancouver, BC, September, 2000. Vol. 1, pp. 538-541.
Abstract: The emerging MPEG-7 standard demands shape description and shape retrieval techniques. Polygonal approximations of the shape contours give attractive solutions in this domain, because of the description simplicity. This paper introduces a shape matching technique based on the turning function comparison of the shape contour polygonal approximations.
BibTeX:
@inproceedings{pinheiro2000shape,
  author = {Pinheiro, António M. G. and Izquierdo, Ebroul and Ghanbari, Mohammed},
  title = {Shape Matching using a Curvature Based Polygonal Approximation in Scale-Space},
  booktitle = {Image Processing (ICIP 2000), Proceedings of the 7th International Conference on},
  year = {2000},
  volume = {1},
  pages = {538--541},
  note = {google scholar entry: International Conference on Image Processing (ICIP 2000). Vancouver, British Columbia, 10-13 September 2000.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=899475},
  doi = {10.1109/ICIP.2000.899475}
}
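The turning-function comparison described in the abstract above can be sketched compactly. A minimal illustration with hypothetical helper names, using a fixed starting vertex and no rotation or starting-point normalization (the paper's actual matching is more elaborate):

```python
import math

def turning_function(poly):
    """Breakpoints (arc-length fraction, cumulative turning angle) of a closed polygon."""
    n = len(poly)
    edges = [(poly[(i + 1) % n][0] - poly[i][0],
              poly[(i + 1) % n][1] - poly[i][1]) for i in range(n)]
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    total = sum(lengths)
    angles = [math.atan2(dy, dx) for dx, dy in edges]
    turn = [angles[0]]
    for i in range(1, n):                  # unwrap exterior angles into (-pi, pi]
        d = angles[i] - angles[i - 1]
        while d > math.pi:
            d -= 2 * math.pi
        while d < -math.pi:
            d += 2 * math.pi
        turn.append(turn[-1] + d)
    s = [0.0]
    for ell in lengths[:-1]:               # normalized arc-length breakpoints
        s.append(s[-1] + ell / total)
    return s, turn

def l2_turning_distance(p, q, samples=256):
    """Discretized L2 distance between the two turning (step) functions."""
    (sp, tp), (sq, tq) = turning_function(p), turning_function(q)
    def value(s, t, x):                    # evaluate step function at x in [0, 1)
        v = t[0]
        for si, ti in zip(s, t):
            if si <= x:
                v = ti
        return v
    acc = sum((value(sp, tp, (k + 0.5) / samples) -
               value(sq, tq, (k + 0.5) / samples)) ** 2 for k in range(samples))
    return math.sqrt(acc / samples)
```

Identical contours score zero; dissimilar contours score higher, which is the property the shape-retrieval comparison relies on.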
Xu L-Q and Izquierdo E (2000), "Data-Driven Nonlinear Diffusion for Object Segmentation", In Image Processing (ICIP 2000), Proceedings of the 7th International Conference on. Vancouver, BC, September, 2000. Vol. 1, pp. 319-322.
Abstract: We propose a novel technique for effective object segmentation, which is based on the combination of image simplification via data-driven nonlinear diffusion and subsequent efficient segmentation of the simplified image. In particular, the data, taking the form of a disparity field from a stereo analysis, has been used to modulate the diffusion process. The strength of this strategy consists of an ability to smooth considerably the details of the imaging scene within the objects' boundaries while inhibiting the diffusion across the boundaries, preserving and even enhancing the object borders. As such, from the simplified image, a simple but efficient histogram-based thresholding and labeling technique can be used to extract precisely an object boundary in its entirety
BibTeX:
@inproceedings{xu2000data,
  author = {Xu, Li-Qun and Izquierdo, Ebroul},
  title = {Data-Driven Nonlinear Diffusion for Object Segmentation},
  booktitle = {Image Processing (ICIP 2000), Proceedings of the 7th International Conference on},
  year = {2000},
  volume = {1},
  pages = {319--322},
  note = {google scholar entry: International Conference on Image Processing (ICIP 2000). Vancouver, British Columbia, 10-13 September 2000.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=900959},
  doi = {10.1109/ICIP.2000.900959}
}
Izquierdo E (1999), "Disparity/Segmentation Analysis: Matching with an Adaptive Window and Depth-Driven Segmentation", Circuits and Systems for Video Technology, IEEE Transactions on. June, 1999. Vol. 9(4), pp. 589-607. IEEE.
Abstract: Most of the emerging content-based multimedia technologies are based on efficient methods to solve machine early vision tasks. Among other tasks, object segmentation is perhaps the most important problem in single image processing, whereas pixel-correspondence estimation is the crucial task in multiview image analysis. The solution of these two problems is the key for the development of the majority of leading-edge interactive video-communication technologies and telepresence systems. In this paper, we present a robust framework comprised of joined pixel-correspondence estimation and image segmentation in video sequences taken simultaneously from different perspectives. An improved concept for stereo-image analysis based on block matching with a local adaptive window is introduced. The size and shape of the reference window is calculated adaptively according to the degree of reliability of disparities estimated previously. Considerable improvements are obtained just within object borders or image areas that become occluded by applying the proposed block-matching model. An initial object segmentation is obtained by merging neighboring sampling positions with disparity vectors of similar size and direction. Starting from this initial segmentation, true object borders are detected using a contour-matching algorithm. In this process, the contour of the initial segmentation is taken as a reference pattern, and the edges extracted from the original images, by applying a multiscale algorithm, are the candidates for the true object contour. The performance of the introduced methods has been verified
BibTeX:
@article{izquierdo1999disparity,
  author = {Izquierdo, Ebroul},
  title = {Disparity/Segmentation Analysis: Matching with an Adaptive Window and Depth-Driven Segmentation},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {1999},
  volume = {9},
  number = {4},
  pages = {589--607},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=767125},
  doi = {10.1109/76.767125}
}
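For context on the block-matching baseline that the adaptive-window method above improves on, a minimal fixed-window SAD sketch (function and parameter names are illustrative; the paper's contribution, adapting window size and shape to the reliability of previously estimated disparities, is not reproduced here):

```python
def sad_disparity(left, right, block=3, max_disp=4):
    """Fixed-window sum-of-absolute-differences block matching on a rectified pair.

    left, right: equal-sized 2D lists of intensities.
    Returns an integer disparity d per pixel, where left[y][x] matches right[y][x-d].
    """
    h, w = len(left), len(left[0])
    r = block // 2
    disp = [[0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            best_cost, best_d = float('inf'), 0
            # candidate disparities limited so the window stays inside the right image
            for d in range(0, min(max_disp, x - r) + 1):
                cost = sum(abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
                           for dy in range(-r, r + 1) for dx in range(-r, r + 1))
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp
```

On a synthetic pair where the right view is the left view shifted by two pixels, interior pixels recover disparity 2; the failure modes at object borders and occlusions are exactly what the adaptive window targets.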
Izquierdo E (1999), "Technique for accurate correspondence estimation in object borders and occluded image regions", Electronics Letters. January, 1999. Vol. 35(1), pp. 34-35. IET.
Abstract: An advanced block-matching technique, in which the shape of the matching window is calculated adaptively by applying an energy-based model, is introduced. In this model the window shape is controlled by external forces defined according to the entropy of previously estimated disparities, their reliability and the variation of the image intensity
BibTeX:
@article{izquierdo1999technique,
  author = {Izquierdo, Ebroul},
  title = {Technique for accurate correspondence estimation in object borders and occluded image regions},
  journal = {Electronics Letters},
  publisher = {IET},
  year = {1999},
  volume = {35},
  number = {1},
  pages = {34--35},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=749204},
  doi = {10.1049/el:19990080}
}
Izquierdo E and Feng X (1999), "Modeling Arbitrary Objects Based on Geometric Surface Conformity", Circuits and Systems for Video Technology, IEEE Transactions on. March, 1999. Vol. 9(2), pp. 336-352. IEEE.
Abstract: We address the problem of efficient and flexible modeling of arbitrary three-dimensional (3-D) objects and the accurate tracking of the generated model. These goals are reached by combining available multiview image analysis tools with a straightforward 3-D modeling method, which exploit well-established techniques from both computer vision and computer graphics, improving and combining them with new strategies. The basic idea of the technique presented is to use feature points and relevant edges in the images as nodes and edges of an initial two-dimensional wire grid. The method is adaptive in the sense that an initial rough surface approximation is progressively refined at the locations where the triangular patches do not approximate the surface accurately. The approximation error is measured according to the distance of the model to the object surface, taking into account the reliability of the depth estimated from the stereo image analysis. Once the initial wireframe is available, it is deformed and updated from frame to frame according to the motion of the object points chosen to be nodes. At the end of this process we obtain a temporally consistent 3-D model, which accurately approximates the visible object surface and reflects the physical characteristics of the surface with as few planar patches as possible. The performance of the presented methods is confirmed by several computer experiments
BibTeX:
@article{izquierdo1999modeling,
  author = {Izquierdo, Ebroul and Feng, Xiaohua},
  title = {Modeling Arbitrary Objects Based on Geometric Surface Conformity},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {1999},
  volume = {9},
  number = {2},
  pages = {336--352},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=752100},
  doi = {10.1109/76.752100}
}
Izquierdo E and Ghanbari M (1999), "Nonlinear Gaussian filtering approach for object segmentation", IEE Proceedings - Vision, Image and Signal Processing. June, 1999. Vol. 146(3), pp. 137-143. IET.
Abstract: Gaussian filter kernels can be used to smooth textures for image segmentation. In so-called anisotropic diffusion techniques, the smoothing process is adapted according to the edge direction to preserve the edges. However, the segment borders obtained with this approach do not necessarily coincide with physical object contours, especially in the case of textured objects. A novel segmentation technique involving weighted Gaussian filtering is introduced. The extraction of true object masks is performed by smoothing edges due to texture and preserving true object borders. In this process, additional features such as disparity or motion are taken into account. The method presented has been successfully applied in the context of object segmentation to natural scenes and object-based disparity estimation for stereoscopic applications
BibTeX:
@article{izquierdo2000high2,
  author = {Izquierdo, Ebroul and Ghanbari, Mohammed},
  title = {Nonlinear Gaussian filtering approach for object segmentation},
  journal = {IEE Proceedings - Vision, Image and Signal Processing},
  publisher = {IET},
  year = {1999},
  volume = {146},
  number = {3},
  pages = {137--143},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=799043},
  doi = {10.1049/ip-vis:19990197}
}
Izquierdo E and Ghanbari M (1999), "Motion-Driven Object Segmentation in Scale-Space", In Acoustics, Speech, and Signal Processing (ICASSP 1999), Proceedings of the IEEE International Conference on. Phoenix, AZ, March, 1999. Vol. 6, pp. 3473-3476. IEEE.
Abstract: In this paper we present a method for motion segmentation, in which accurate grouping of pixels undergoing the same motion is targeted. In the presented technique true object edges are first obtained by combining anisotropic diffusion of the original image with edge detection and contour reconstruction in the inherent scale-space. Contours are then matched according to the distance given by a metric defined on their polygonal approximations and the shape of the one-dimensional intensity function along the contour. Masks of objects are obtained by merging image areas inside of edges having the same motion. The performance of the presented technique has been evaluated by computer simulations
BibTeX:
@inproceedings{izquierdo1999motion,
  author = {Izquierdo, Ebroul and Ghanbari, Mohammed},
  title = {Motion-Driven Object Segmentation in Scale-Space},
  booktitle = {Acoustics, Speech, and Signal Processing (ICASSP 1999), Proceedings of the IEEE International Conference on},
  publisher = {IEEE},
  year = {1999},
  volume = {6},
  pages = {3473--3476},
  note = {google scholar entry: 24th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1999). Phoenix, Arizona, 15-19 March 1999.},
  url = {http://www.mirlab.org/conference_papers/International_Conference/ICASSP%201999/PDF/SCAN/IC991388.PDF},
  doi = {10.1109/ICASSP.1999.757590}
}
Izquierdo E and Ghanbari M (1999), "Using 3D structure and anisotropic diffusion for object segmentation", In Image Processing and Its Applications, 1999. Seventh International Conference on (Conf. Publ. No. 465). Manchester, England, July, 1999. Vol. 2, pp. 532-536. IET.
Abstract: We describe an advanced technique that can be employed to carry out the segmentation task. The goal is the development of a system capable of solving the segmentation problem in most situations encountered in video sequences taken from real world scenes. For this aim the presented segmentation scheme comprises different processing steps implemented as independent modules. The core of the system is a module for multiscale image simplification by anisotropic diffusion. The remaining modules are concerned with the subsequent image segmentation of the resulting smoothed images. The mathematical model supporting the implemented algorithms is based on the numerical solution of a system of nonlinear partial differential equations introduced by Perona and Malik (1987). The idea at the heart of this approach is to smooth the image in direction parallel to the object boundaries, inhibiting diffusion across the edges. The goal of this processing step is to enhance edges keeping their correct position, reducing noise and smoothing regions with small intensity variations. The techniques for object segmentation presented are based on image simplification by nonlinear diffusion and subsequent extraction of object masks taking into account disparity or motion fields as additional information. Two different strategies are considered in order to extract masks of physical objects in the scene: depth-driven nonlinear diffusion and edge extraction and enhancement in scale-space followed by edge matching in two different views of the same scene
BibTeX:
@inproceedings{izquierdo1999using,
  author = {Izquierdo, Ebroul and Ghanbari, Mohammed},
  title = {Using 3D structure and anisotropic diffusion for object segmentation},
  booktitle = {Image Processing and Its Applications, 1999. Seventh International Conference on (Conf. Publ. No. 465)},
  publisher = {IET},
  year = {1999},
  volume = {2},
  pages = {532--536},
  note = {google scholar entry: 7th International Conference on Image Processing and its Applications (1999). Manchester, England, 13-15 July 1999.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=791109},
  doi = {10.1049/cp:19990379}
}
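The Perona-Malik diffusion underlying the abstract above admits a compact explicit scheme. A sketch with one common conductivity choice, g(d) = exp(-(d/kappa)^2); parameter names are illustrative, and the paper's disparity/motion weighting of the diffusion is omitted:

```python
import math

def perona_malik(img, iterations=10, kappa=20.0, lam=0.2):
    """Explicit 4-neighbour Perona-Malik anisotropic diffusion.

    img: 2D list of float intensities. lam <= 0.25 keeps the scheme stable.
    Large gradients (relative to kappa) get near-zero conductivity, so
    edges are preserved while low-contrast texture is smoothed away.
    """
    h, w = len(img), len(img[0])
    u = [row[:] for row in img]
    for _ in range(iterations):
        nxt = [row[:] for row in u]
        for y in range(h):
            for x in range(w):
                c = u[y][x]
                # differences to the four neighbours (zero flux at the border)
                dn = u[y - 1][x] - c if y > 0 else 0.0
                ds = u[y + 1][x] - c if y < h - 1 else 0.0
                de = u[y][x + 1] - c if x < w - 1 else 0.0
                dw = u[y][x - 1] - c if x > 0 else 0.0
                flux = sum(math.exp(-(d / kappa) ** 2) * d
                           for d in (dn, ds, de, dw))
                nxt[y][x] = c + lam * flux
        u = nxt
    return u
```

A sharp 100-level step with a small kappa is left essentially untouched (the conductivity across the edge is vanishingly small), while an isolated low-contrast spike is diffused into its neighbourhood.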
Izquierdo E and Ghanbari M (1999), "Video Composition by Spatiotemporal Object Segmentation, 3D-Structure and Tracking", In Information Visualization (IV 1999), Proceedings of the 3rd IEEE International Conference on. London, England, July, 1999, pp. 194-199. IEEE.
Abstract: A stereo vision based system for composition of natural and computer generated images is presented. The system focuses on the solution of four essential tasks in computer vision: disparity estimation, object segmentation, modeling and tracking. These tasks are performed in direct interaction with each other using novel and available multiview analysis techniques and standard computer graphics algorithms. The system is assessed by processing natural video sequences. Selected results are reported
BibTeX:
@inproceedings{izquierdo1999video,
  author = {Izquierdo, Ebroul and Ghanbari, Mohammed},
  editor = {Banissi, Ebad and Khosrowshahi, Farzad and Sarfraz, Muhammad and Ursyn, Anna},
  title = {Video Composition by Spatiotemporal Object Segmentation, 3D-Structure and Tracking},
  booktitle = {Information Visualization (IV 1999), Proceedings of the 3rd IEEE International Conference on},
  publisher = {IEEE},
  year = {1999},
  pages = {194--199},
  note = {google scholar entry: 3rd IEEE International Conference on Information Visualization (IV 1999). London, England, 14-16 July 1999.},
  doi = {10.1109/IV.1999.781558}
}
Izquierdo E, Lopes F and Ghanbari M (1999), "Combining segmentation by nonlinear diffusion and shape adaptive block-matching for content based coding", In Motion Analysis and Tracking (1999), Proceedings of the IEE Colloquium on. London, England, May, 1999. (6), pp. 1-6. IET.
Abstract: A technique for image simplification and segmentation in scale-space is presented. Segmentation masks are then used to estimate accurately the parameters describing motion of single objects in the scene. The segmentation approach relies on a non-linear diffusion model in which multiscale image simplification and subsequent segmentation of the resulting smoothed images is performed. Motion parameters describing the dynamic of each single object are estimated by applying a generalised block-matching approach. The main strategy behind this technique is to use bilinear transformations to establish a spatial correspondence between the points in the input and output images. The performance of the presented techniques is evaluated by processing natural video sequences
BibTeX:
@inproceedings{izquierdo1999combining,
  author = {Izquierdo, Ebroul and Lopes, Fernando and Ghanbari, Mohammed},
  title = {Combining segmentation by nonlinear diffusion and shape adaptive block-matching for content based coding},
  booktitle = {Motion Analysis and Tracking (1999), Proceedings of the IEE Colloquium on},
  publisher = {IET},
  year = {1999},
  number = {6},
  pages = {1--6},
  note = {google scholar entry: IEE Colloquium on Motion Analysis and Tracking (1999). London, England, 10 May 1999.},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=789947},
  doi = {10.1049/ic:19990589}
}
Izquierdo E, Lopes F and Ghanbari M (1999), "Object-Based Motion Parameterization Using Shape Adaptive Bilinear Warping", In Multimedia Applications, Services and Techniques (ECMAST 1999), Proceedings of the 4th European Conference on. Madrid, Spain, May, 1999. Vol. 1629, pp. 97-107. Springer.
BibTeX:
@inproceedings{izquierdo1999object,
  author = {Izquierdo, Ebroul and Lopes, Fernando and Ghanbari, Mohammed},
  editor = {Leopold, Helmut and García, Narciso},
  title = {Object-Based Motion Parameterization Using Shape Adaptive Bilinear Warping},
  booktitle = {Multimedia Applications, Services and Techniques (ECMAST 1999), Proceedings of the 4th European Conference on},
  publisher = {Springer},
  year = {1999},
  volume = {1629},
  pages = {97--107},
  note = {google scholar entry: 4th European Conference on Multimedia Applications, Services and Techniques (ECMAST 1999). Madrid, Spain, 26-28 May 1999.},
  url = {http://link.springer.com/chapter/10.1007/3-540-48757-3_8},
  doi = {10.1007/3-540-48757-3_8}
}
Hanke M, Izquierdo Macana E and März R (1998), "On Asymptotics in Case of Linear Index-2 Differential-Algebraic Equations", SIAM Journal on Numerical Analysis. Vol. 35, pp. 1326-1346. SIAM.
Abstract: Asymptotic properties of solutions of general linear differential-algebraic equations (DAEs) and those of their numerical counterparts are discussed. New results on the asymptotic stability in the sense of Lyapunov as well as on contractive index-2 DAEs are given. The behaviour of BDF, IRK, and PIRK applied to such systems is investigated. In particular, we clarify the significance of certain subspaces closely related to the geometry of the DAE. Asymptotic properties like A-stability and L-stability are shown to be preserved if these subspaces are constant. Moreover, algebraically stable IRK(DAE) are B-stable under this condition. The general results are specialized to the case of index-2 Hessenberg systems.
BibTeX:
@article{hanke1998asymptotics,
  author = {Hanke, Michael and Izquierdo Macana, Ebroul and März, Roswitha},
  title = {On Asymptotics in Case of Linear Index-2 Differential-Algebraic Equations},
  journal = {SIAM Journal on Numerical Analysis},
  publisher = {SIAM},
  year = {1998},
  volume = {35},
  pages = {1326--1346},
  url = {http://epubs.siam.org/doi/abs/10.1137/S0036142994268879},
  doi = {10.1137/S0036142994268879}
}
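For the reader's orientation, the linear index-2 Hessenberg structure referred to in the abstract above is the standard semi-explicit form (textbook formulation, not quoted from the paper):

```latex
% Linear Hessenberg DAE: differential variables x, algebraic variables y.
x'(t) = A_{11}(t)\,x(t) + A_{12}(t)\,y(t) + q_1(t), \qquad
0 = A_{21}(t)\,x(t) + q_2(t).
% The system has (differentiation) index 2 precisely when the
% product A_{21}(t)\,A_{12}(t) is nonsingular along the solution.
```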
Izquierdo E (1998), "Stereo Image Analysis for Multi-viewpoint Telepresence Applications", Signal Processing: Image Communication. January, 1998. Vol. 11(3), pp. 231-254.
Abstract: An improved method for combined motion and disparity estimation in stereo sequences to synthesize temporally and perspectively intermediate views is presented. The main problems of matching methods for motion and disparity analysis are summarised. The improved concept is based on a modified block matching algorithm in which a cost function consisting of feature- and area-based correlation together with an appropriately weighted temporal smoothness term is applied. Considerable improvements have been obtained with respect to the motion and disparity assignments by introducing a confidence measure to evaluate the reliability of estimated correspondences. In occluded image areas, enhanced results are obtained applying an edge-assisted vector interpolation strategy. Two different image synthesis concepts are presented. The first concept is suitable for processing natural stereo sequences. It comprises the detection of covered and uncovered image areas caused by motion or disparity. This information is used to switch between different interpolation and extrapolation modes during the computation of intermediate views. The proposed object-based approach is suitable for processing typical video conference scenes containing extremely large occluded image regions and keeping implementation costs low. A set of stereo sequences has been processed. The performed computer simulations show that a continuous motion parallax can be obtained with good image quality by using sequences taken with stereo cameras having large interaxial distances.
BibTeX:
@article{izquierdo1998stereo,
  author = {Izquierdo, Ebroul},
  title = {Stereo Image Analysis for Multi-viewpoint Telepresence Applications},
  journal = {Signal Processing: Image Communication},
  year = {1998},
  volume = {11},
  number = {3},
  pages = {231--254},
  url = {http://www.sciencedirect.com/science/article/pii/S0923596597000313},
  doi = {10.1016/S0923-5965(97)00031-3}
}
Izquierdo E and Ghanbari M (1998), "Accurate curve matching for object-based motion estimation", Electronics Letters. November, 1998. Vol. 34(23), pp. 2220-2221. IET.
Abstract: An improved technique for matching relevant curves extracted from natural video sequences is presented. The similarity measure introduced is based on the L2-distance between the turning functions of the curves and the shape of the curve defined by the intensity function along the curve
BibTeX:
@article{Izquierdo1998accurate,
  author = {Izquierdo, Ebroul and Ghanbari, Mohammed},
  title = {Accurate curve matching for object-based motion estimation},
  journal = {Electronics Letters},
  publisher = {IET},
  year = {1998},
  volume = {34},
  number = {23},
  pages = {2220--2221},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=739597},
  doi = {10.1049/el:19981511}
}
Izquierdo E and Ghanbari M (1998), "Fast nonlinear diffusion approach for object segmentation", Electronics Letters. September, 1998. Vol. 34(19), pp. 1836-1837. IEEE.
Abstract: A low-complexity anisotropic diffusion technique for smoothing textures preserving object contours is presented. Additional features such as disparity or motion can be used to control the evolution of the intensity diffusion. Drastic simplifications in the iterative diffusion process are also introduced to reduce the algorithmic complexity.
BibTeX:
@article{izquierdo1998fast,
  author = {Izquierdo, Ebroul and Ghanbari, Mohammed},
  title = {Fast nonlinear diffusion approach for object segmentation},
  journal = {Electronics Letters},
  publisher = {IEEE},
  year = {1998},
  volume = {34},
  number = {19},
  pages = {1836--1837},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=722369},
  doi = {10.1049/el:19981324}
}
Izquierdo E and Kruse S (1998), "Image Analysis for 3D Modeling, Rendering, and Virtual View Generation", Computer Vision and Image Understanding. August, 1998. Vol. 71(2), pp. 231-253. Elsevier.
Abstract: The generation of virtual environments for telepresence systems, interactive viewing, and immersion in remote 3D-scenarios is an expanding and very promising research field. The challenge of this technology is to design systems capable of synthesizing views from any desired perspective using a set of real-scene perspectives. In such a system, two main parts can be distinguished: multiview image-analysis and viewpoint synthesis. Among other methods, multiview image-analysis is comprised of methods for the solution of three commonly encountered difficult tasks in computer vision: estimation of pixel correspondence, object segmentation, and camera calibration. Moreover, the results of multiview image-analysis are the required foundations for image synthesis. In this paper, we present a robust framework for stereo-image analysis, 3D modeling, and viewpoint synthesis. The image analysis part comprises disparity estimation by an improved block-matching algorithm and depth-based object segmentation using morphology. The basic idea of the system is to solve each of these tasks in direct interaction with the others. The results obtained in the partial solution of the correspondence problem are used to substantially simplify the segmentation task and vice versa. In the image-synthesis part, two tasks are considered: direct generation of arbitrary intermediate views, and extraction of the 3D structure of the scene using disparity results and camera parameters. The system has been successfully demonstrated in the context of virtual-view generation and object manipulation through 3D modeling.
BibTeX:
@article{izquierdo1998image,
  author = {Izquierdo, Ebroul and Kruse, Silko},
  title = {Image Analysis for 3D Modeling, Rendering, and Virtual View Generation},
  journal = {Computer Vision and Image Understanding},
  publisher = {Elsevier},
  year = {1998},
  volume = {71},
  number = {2},
  pages = {231--253},
  url = {http://www.sciencedirect.com/science/article/pii/S1077314298907068},
  doi = {10.1006/cviu.1998.0706}
}
Ohm J-R, Grüneberg K, Hendriks E, Izquierdo E, Kaliva D, Karl M, Papadimatos D and Redert A (1998), "A Realtime Hardware System for Stereoscopic Videoconferencing with Viewpoint Adaptation", Signal Processing: Image Communication. Vol. 14(1-2), pp. 147-171. Elsevier.
Abstract: This paper describes a hardware system and the underlying algorithms that were developed for realtime stereoscopic videoconferencing with viewpoint adaptation within the European PANORAMA project. The goal was to achieve a true telepresence illusion for the remote partners. For this purpose, intermediate views at arbitrary positions must be synthesized from the views of a stereoscopic camera system with rather large baseline. The actual viewpoint is adapted according to the head position of the viewer, such that the impression of motion parallax is produced. The whole system consists of a disparity estimator, stereoscopic MPEG-2 encoder, disparity encoder and multiplexer at the transmitter side, and a demultiplexer, disparity decoder, MPEG-2 decoder and interpolator with viewpoint adaptation at the receiver side. For transmission of the encoded signals, an ATM network is provided. In the final system, autostereoscopic displays will be used. The algorithms for disparity estimation, disparity encoding and disparity-driven intermediate viewpoint synthesis were specifically developed under the constraint of hardware feasibility.
BibTeX:
@article{ohm1998realtime,
  author = {Ohm, Jens-Rainer and Grüneberg, Karsten and Hendriks, Emile and Izquierdo, Ebroul and Kaliva, Dimitris and Karl, Michael and Papadimatos, Dionysis and Redert, André},
  title = {A Realtime Hardware System for Stereoscopic Videoconferencing with Viewpoint Adaptation},
  journal = {Signal Processing: Image Communication},
  publisher = {Elsevier},
  year = {1998},
  volume = {14},
  number = {1-2},
  pages = {147--171},
  url = {http://www.eecs.qmul.ac.uk/~ebroul/mmv_publications/pdf/10.1.1.31.5664.pdf},
  doi = {10.1016/S0923-5965(98)00034-4}
}
Izquierdo E (1998), "Improved Disparity Estimation by Matching with an Adaptive Window", In Proceedings of the IAPR Workshop on Machine Vision Applications (MVA 1998). Chiba, Japan, November, 1998, pp. 65-68. Computer Vision Laboratory, University of Tokyo.
Abstract: A novel technique for disparity estimation based on block matching with a local adaptive window is introduced in this paper. In the proposed approach the size and shape of the reference window is calculated adaptively according to the degree of reliability of disparities estimated previously and the local variation of the disparity map. The performance of the method has been verified by computer simulations using synthetic data and several natural stereo sequences. Considerable improvements were obtained especially at object borders and in image areas that become occluded.
BibTeX:
@inproceedings{izquierdo1998improved,
  author = {Izquierdo, Ebroul},
  title = {Improved Disparity Estimation by Matching with an Adaptive Window},
  booktitle = {Proceedings of the IAPR Workshop on Machine Vision Applications (MVA 1998)},
  publisher = {Computer Vision Laboratory, University of Tokyo},
  year = {1998},
  pages = {65--68},
  note = {google scholar entry: IAPR Workshop on Machine Vision Applications (MVA 1998). Chiba, Japan, 17-19 November 1998.},
  url = {http://b2.cvl.iis.u-tokyo.ac.jp/mva/proceedings/CommemorativeDVD/1998/papers/1998065.pdf}
}
Izquierdo E and Ghanbari M (1998), "Texture Smoothing and Object Segmentation Using Feature-Adaptive Weighted Gaussian Filtering", In Telecommunications Symposium (ITS 1998), Proceedings of the SBT/IEEE International. São Paulo, Brazil, August, 1998. Vol. 2, pp. 650-655. IEEE.
Abstract: Gaussian filter kernels can be used to smooth out textures in order to obtain uniform regions for image segmentation. In so-called anisotropic diffusion techniques, the smoothing process is adapted according to the edge direction in order to preserve the edges. However, the segment borders obtained with that approach do not necessarily coincide with physical object contours, especially in the case of textured objects. A novel segmentation technique by weighted Gaussian filtering is introduced. The extraction of true object masks is performed by smoothing edges due to texture and preserving true object borders. In this process additional features like disparity or motion are taken into account. The method presented has been successfully applied in the context of object segmentation in natural scenes and object-based disparity estimation for stereoscopic applications.
BibTeX:
@inproceedings{izquierdo1998texture,
  author = {Izquierdo, Ebroul and Ghanbari, Mohammed},
  title = {Texture Smoothing and Object Segmentation Using Feature-Adaptive Weighted Gaussian Filtering},
  booktitle = {Telecommunications Symposium (ITS 1998), Proceedings of the SBT/IEEE International},
  publisher = {IEEE},
  year = {1998},
  volume = {2},
  pages = {650--655},
  note = {google scholar entry: SBT/IEEE International Telecommunications Symposium (ITS 1998). São Paulo, Brazil, 9-13 August 1998.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=718473},
  doi = {10.1109/ITS.1998.718473}
}
Izquierdo E and Ghanbari M (1998), "Virtual 3D-View Generation from Stereoscopic Video Data", In Systems, Man, and Cybernetics (SMC 1998), 1998 IEEE International Conference on. San Diego, CA, October, 1998. Vol. 2, pp. 1219-1224. IEEE.
Abstract: Multi-viewpoint synthesis from stereoscopic video data is a key technology in most emerging content based multimedia systems. The different techniques presented in the article are based on the analysis of 2D images acquired by two or more cameras. It is shown how suitable analysis tools for disparity estimation and segmentation can benefit from each other. For the generation of virtual images, two methods with different levels of trade-off between complexity and degree of freedom are described. The first approach is disparity compensated view interpolation, which is capable of generating intermediate views along the interocular axis. The second is a more complex approach, which in a first step generates a 3D model of the object. This model can be rotated to any orientation, the original texture is mapped onto the surface, and arbitrary views can be generated by rendering the obtained surface.
BibTeX:
@inproceedings{izquierdo1998virtual,
  author = {Izquierdo, Ebroul and Ghanbari, Mohammed},
  title = {Virtual 3D-View Generation from Stereoscopic Video Data},
  booktitle = {Systems, Man, and Cybernetics (SMC 1998), 1998 IEEE International Conference on},
  publisher = {IEEE},
  year = {1998},
  volume = {2},
  pages = {1219--1224},
  note = {google scholar entry: 1998 IEEE International Conference on Systems, Man, and Cybernetics (SMC 1998). San Diego, California, 11-14 October 1998.},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=727876},
  doi = {10.1109/ICSMC.1998.727876}
}
Ohm J-R, Izquierdo E and Müller K (1998), "Systems for Disparity-based Multiple-view Interpolation", In Circuits and Systems (ISCAS 1998), Proceedings of the 1998 IEEE International Symposium on. Monterey, California, May, 1998. Vol. 5, pp. 502-505.
Abstract: Viewpoint adaptation from multiple-viewpoint video captures is an important tool for telepresence illusion in stereoscopic presentation of natural scenes, and for the integration of real-world video objects into virtual 3D worlds. This paper describes different low-complexity approaches for generation of virtual-viewpoint camera signals, which are based on disparity-processing techniques and can hence be implemented with much lower complexity than full 3D analysis of natural objects or scenes. A realtime hardware system, which is based on one of our algorithms, has already been developed.
BibTeX:
@inproceedings{ohm1998systems,
  author = {Ohm, Jens-Rainer and Izquierdo, Ebroul and Müller, Karsten},
  title = {Systems for Disparity-based Multiple-view Interpolation},
  booktitle = {Circuits and Systems (ISCAS 1998), Proceedings of the 1998 IEEE International Symposium on},
  year = {1998},
  volume = {5},
  pages = {502--505},
  note = {google scholar entry: IEEE International Symposium on Circuits and Systems (ISCAS 1998). Monterey, California, 31 May - 3 June 1998.},
  url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.30.2475&rep=rep1&type=pdf},
  doi = {10.1109/ISCAS.1998.694542}
}
Izquierdo E (1997), "Stereo Matching for Enhanced Telepresence in Three-Dimensional Videocommunications", Circuits and Systems for Video Technology, IEEE Transactions on. August, 1997. Vol. 7(4), pp. 629-643. IEEE.
Abstract: A robust approach for joint motion and disparity estimation in stereo sequences to synthesize arbitrary intermediate views is presented. The improved concept for stereo image analysis is based on a modified block matching algorithm, in which a cost function consisting of area-based correlation together with an appropriately weighted temporal smoothness term is applied. A confidence measure to evaluate the reliability of estimated correspondences is introduced. In occluded image areas and image points with unreliable motion or disparity assignments, considerable improvements are obtained by applying an edge-assisted vector interpolation strategy. Two different image synthesis concepts are presented as well. The reported approach is verified by processing a set of sequences taken with stereo cameras having large interaxial distances. Computer simulations show that telepresence illusion with continuous motion parallax and good image quality can be obtained using the methods presented.
BibTeX:
@article{izquierdo1997stereo,
  author = {Izquierdo, Ebroul},
  title = {Stereo Matching for Enhanced Telepresence in Three-Dimensional Videocommunications},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {1997},
  volume = {7},
  number = {4},
  pages = {629--643},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=611174},
  doi = {10.1109/76.611174}
}
Ohm J-R and Izquierdo E (1997), "An Object-Based System for Stereoscopic Viewpoint Synthesis", Circuits and Systems for Video Technology, IEEE Transactions on. October, 1997. Vol. 7(5), pp. 801-811. IEEE.
Abstract: This paper describes algorithms that were developed for a real-time stereoscopic videoconferencing system with viewpoint adaptation. The goal is a real telepresence illusion, which is achieved by synthesis of intermediate views from a stereoscopic camera shot with a rather large baseline. The actual viewpoint will be adapted according to the head position of the viewer, such that the impression of motion parallax is produced. The object-based system first identifies foreground and background regions and applies disparity estimation to the foreground object. A hierarchical block matching algorithm is employed for this purpose which takes into account the position of high-activity feature points and the object/background border positions. Using the disparity estimator's output, it is possible to generate arbitrary intermediate views by projections from the left- and right-view images. For this purpose, we have also developed an object-based interpolation algorithm, taking into account a very simple convex-surface model of a person's face and body. Though the algorithms had to be held rather simple under the constraint of hardware feasibility, we obtain a good quality of the intermediate-view images. Finally, we describe the hardware concept for the disparity estimator, which is the most complicated part of the algorithm.
BibTeX:
@article{ohm1997object,
  author = {Ohm, Jens-Rainer and Izquierdo, Ebroul},
  title = {An Object-Based System for Stereoscopic Viewpoint Synthesis},
  journal = {Circuits and Systems for Video Technology, IEEE Transactions on},
  publisher = {IEEE},
  year = {1997},
  volume = {7},
  number = {5},
  pages = {801--811},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=633502},
  doi = {10.1109/76.633502}
}
Izquierdo E and Kruse S (1997), "Disparity Controlled Segmentation", In Picture Coding Symposium (PCS 1997). Berlin, Germany, September, 1997, pp. 737-742. IEEE.
BibTeX:
@inproceedings{izquierdo1997disparity,
  author = {Izquierdo, Ebroul and Kruse, Silko},
  title = {Disparity Controlled Segmentation},
  booktitle = {Picture Coding Symposium (PCS 1997)},
  publisher = {IEEE},
  year = {1997},
  pages = {737--742},
  note = {google scholar entry: Picture Coding Symposium (PCS 1997). Berlin, Germany, 10-12 September 1997.}
}
Izquierdo E and Karl M (1996), "Low Complexity Disparity Estimation and Object-Based Image Synthesis for Multi-Viewpoint Stereoscopic Videoconferencing", In Proceedings of IAPR Workshop on Machine Vision Applications (MVA'96). Tokyo, Japan, November, 1996, pp. 196-199. Keio University.
Abstract: This paper focuses on the realization of a low complexity disparity estimator and the generation of arbitrary intermediate views for typical stereo videoconference sequences. The method presented has been optimized in order to achieve a very low hardware complexity keeping a good performance with regard to the addressed application. A first hardware realization of the estimator is described. Additionally, an object-based approach to synthesize intermediate views is presented. The performance of the system is verified by several computer simulations.
BibTeX:
@inproceedings{izquierdo1996low,
  author = {Izquierdo, Ebroul and Karl, Michael},
  title = {Low Complexity Disparity Estimation and Object-Based Image Synthesis for Multi-Viewpoint Stereoscopic Videoconferencing},
  booktitle = {Proceedings of IAPR Workshop on Machine Vision Applications (MVA'96)},
  publisher = {Keio University},
  year = {1996},
  pages = {196--199},
  note = {google scholar entry: IAPR Workshop on Machine Vision Applications (MVA'96). Tokyo, Japan, 12-14 November 1996.}
}
Ohm J-R and Izquierdo E (1996), "An Object-Based System for Stereoscopic Videoconferencing with Viewpoint Adaptation", In Digital Compression Technologies and Systems for Video Communications (SPIE 2952), Proceedings of the SPIE Conference on. Berlin, Germany, October, 1996, pp. 29-40. SPIE.
Abstract: This paper describes algorithms that were developed for a stereoscopic videoconferencing system with viewpoint adaptation. The system identifies foreground and background regions, and applies disparity estimation to the foreground object, namely the person sitting in front of a stereoscopic camera system with rather large baseline. A hierarchical block matching algorithm is employed for this purpose, which takes into account the position of high-variance feature points and the object background border positions. Using the disparity estimator's output, it is possible to generate arbitrary intermediate views from the left- and right-view images. We have developed an object-based interpolation algorithm, which produces high-quality results. It takes into account the fact that a person's face has a more or less convex surface. Interpolation weights are derived both from the position of the intermediate view, and from the position of a specific point within the face. The algorithms have been designed for a real-time videoconferencing system with telepresence illusion. Therefore, an important aspect during development was the constraint of hardware feasibility, while sufficient quality of the intermediate view images had still to be retained.
BibTeX:
@inproceedings{ohm1996object,
  author = {Ohm, Jens-Rainer and Izquierdo, Ebroul},
  editor = {Ohta, Naohisa},
  title = {An Object-Based System for Stereoscopic Videoconferencing with Viewpoint Adaptation},
  booktitle = {Digital Compression Technologies and Systems for Video Communications (SPIE 2952), Proceedings of the SPIE Conference on},
  publisher = {SPIE},
  year = {1996},
  pages = {29--40},
  url = {http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=1026426},
  doi = {10.1117/12.251299}
}
Izquierdo E and Ohm J-R (1996), "Texture Smoothing and Object Segmentation Using Feature-Adaptive Weighted Gaussian Filtering", pp. 1-19.
Abstract: Gaussian filter kernels can be used to smooth out textures in order to obtain uniform regions for image segmentation. In so-called anisotropic diffusion techniques, the smoothing process is adapted according to the edge direction in order to preserve the edges. However, the segment borders obtained with that approach do not necessarily coincide with true object borders, especially in the case of multi-coloured objects. We have developed a technique, which can take into account additional features like disparity or motion during the diffusion process, with the goal to segment true physical objects. In addition, we have introduced drastic simplifications in the iterative diffusion process. This technique has been successfully applied in the context of object-based disparity estimation.
BibTeX:
@techreport{izquierdo1996texture,
  author = {Izquierdo, Ebroul and Ohm, Jens-Rainer},
  title = {Texture Smoothing and Object Segmentation Using Feature-Adaptive Weighted Gaussian Filtering},
  year = {1996},
  pages = {1--19},
  url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.8425}
}
Izquierdo E (1995), "Stability and convergence of IRK methods for DAEs with index two and time-varying nullspace", In Approximation & Optimisation, pp. 425-454. Peter Lang.
BibTeX:
@incollection{izquierdo1995stabilitynullspace,
  author = {Izquierdo, Ebroul},
  editor = {Brosowski, Bruno and Deutsch, Frank and Guddat, Juergen},
  title = {Stability and convergence of IRK methods for DAEs with index two and time-varying nullspace},
  booktitle = {Approximation & Optimisation},
  publisher = {Peter Lang},
  year = {1995},
  pages = {425--454}
}
Izquierdo E (1995), "On the Epipolar Geometry in Stereo Vision", In Proceedings of the 3rd International Conference on Approximation and Optimization in the Caribbean. Puebla, Mexico, October, 1995, pp. 1-16. European Mathematical Information Service (EMIS).
Abstract: In this paper we are interested in the calculation of the equation governing pairs of epipolar lines for unregistered images, i.e. when the image acquisition parameters are unknown or no calibration object is available. First we do a thorough analysis of the mathematics required to find the epipolar line equations in the classical sense. Thereafter we show that for a broad class of stereo images the epipolar line equation can be obtained directly from the images without restrictive assumptions on the stereo geometry and without any 3D reference point. Therefore, the estimation of the epipolar line equations reduces to solving a linear optimization problem. Finally, we give an efficient algorithm to find the epipolar line equations directly from the image data, without knowledge of the image acquisition parameters.
BibTeX:
@inproceedings{izquierdo1995epipolar,
  author = {Izquierdo, Ebroul},
  title = {On the Epipolar Geometry in Stereo Vision},
  booktitle = {Proceedings of the 3rd International Conference on Approximation and Optimization in the Caribbean},
  publisher = {European Mathematical Information Service (EMIS)},
  year = {1995},
  pages = {1--16},
  note = {google scholar entry: 3rd International Conference on Approximation and Optimization in the Caribbean. Puebla, Mexico, 8-13 October 1995.},
  url = {http://www.emis.de/proceedings/3ICAOC/index.html}
}
Izquierdo E and Ernst M (1995), "Motion/Disparity analysis for 3D-Video-Conference Applications", In Proceedings of the International Workshop on Stereoscopy and 3-Dimensional Imaging (IWS3DI 1995). Santorini, Greece, September, 1995.
BibTeX:
@inproceedings{izquierdo1995motion,
  author = {Izquierdo, Ebroul and Ernst, Manfred},
  title = {Motion/Disparity analysis for 3D-Video-Conference Applications},
  booktitle = {Proceedings of the International Workshop on Stereoscopy and 3-Dimensional Imaging (IWS3DI 1995)},
  year = {1995},
  note = {google scholar entry: International Workshop on Stereoscopy and 3-Dimensional Imaging (IWS3DI 1995). Santorini, Greece, September 1995.}
}
Izquierdo E (1994), "The numerical solution of linear time-varying DAEs with index 2 by IRK methods", Revista Colombiana de Matemáticas. Vol. 28(2), pp. 43-82. Sociedad Colombiana de Matemáticas.
Abstract: Differential-algebraic equations (DAEs) with a higher index can be approximated by implicit Runge-Kutta methods (IRK). Until now, a number of initial value problems have been approximated by Runge-Kutta methods, but all these problems have a special semi-explicit or Hessenberg form. In the present paper we consider IRK methods applied to general linear time-varying (nonautonomous) DAEs tractable with index 2. For some stiffly accurate IRK formulas we show that the order of accuracy in the differential component is the same as the nonstiff order, if the DAE has constant nullspace. We prove that IRK methods cannot be feasible or become exponentially unstable when applied to linear DAEs with variable nullspace. In order to overcome these difficulties we propose a new approach for this case. Feasibility, weak instability and convergence are proved. Order results are given in terms of the Butcher identities.
BibTeX:
@article{Izquierdo1994numerical,
  author = {Izquierdo, Ebroul},
  title = {The numerical solution of linear time-varying DAEs with index 2 by IRK methods},
  journal = {Revista Colombiana de Matemáticas},
  publisher = {Sociedad Colombiana de Matemáticas},
  year = {1994},
  volume = {28},
  number = {2},
  pages = {43--82},
  url = {http://eudml.org/doc/226488}
}
Izquierdo E and Ernst M (1994), "Motion/disparity analysis and image synthesis for 3DTV", In Signal processing of HDTV, VI: Proceedings of the International Workshop on HDTV 1994. Turin, Italy, October, 1994. Elsevier.
Abstract: This paper describes an improved method for combined motion and disparity estimation applied to 3DTV signal processing in order to compute temporally and perspectively intermediate pictures. Motion and disparity estimation is based on a modified block matching algorithm taking into account the inherent coherence relation between disparity and motion. Appearing and disappearing areas caused by motion and disparity are detected as well. This information is used to switch between different interpolation and extrapolation modes for computing intermediate pictures. A set of stereo-sequences has been processed by computer simulation. Using the proposed concept with only two cameras a continuous motion parallax can be obtained by signal processing.
BibTeX:
@inproceedings{izquierdo1994motion,
  author = {Izquierdo, Ebroul and Ernst, Manfred},
  editor = {Ninomiya, Yuichi and Chiariglione, Leonardo},
  title = {Motion/disparity analysis and image synthesis for 3DTV},
  booktitle = {Signal processing of HDTV, VI: Proceedings of the International Workshop on HDTV 1994},
  publisher = {Elsevier},
  year = {1994},
  note = {google scholar entry: International Workshop on HDTV 1994. Turin, Italy, 26-28 October 1994.},
  url = {http://www.scientificcommons.org/20302211}
}
Hanke M and Izquierdo Macana E (1993), "Implicit Runge-Kutta methods for general linear index 2 differential-algebraic equations with variable coefficients", pp. 1-30. Humboldt-Universität zu Berlin.
Abstract: We investigate the approximation of linear index 2 differential-algebraic equations by implicit Runge-Kutta methods. Implicit Runge-Kutta methods cannot be feasible or become unstable when applied to linear differential-algebraic equations with variable nullspace. In order to overcome these difficulties we propose a new approach for this case. Feasibility, weak instability and convergence are proved. Order results are given in terms of the Butcher identities.
BibTeX:
@techreport{Hanke_implicitrunge-kutta,
  author = {Hanke, Michael and Izquierdo Macana, Ebroul},
  title = {Implicit Runge-Kutta methods for general linear index 2 differential-algebraic equations with variable coefficients},
  publisher = {Humboldt-Universität zu Berlin},
  year = {1993},
  pages = {1--30},
  url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.358}
}