1.2 Contributions of the thesis
The main contribution of this thesis is a new statistical framework, inspired by information theory, to address the problem of image restoration. Many problems of image and video processing can be expressed as the minimization of a data-consistency residual and a term measuring the mismatch with respect to a priori constraints. Traditionally, these functionals are based on penalization functions such as those defined for robust estimation, sometimes referred to as φ-functions. From a statistical point of view, resorting to these functions is equivalent to implicitly making assumptions on the probability density functions (PDFs) of the residual and the model mismatch: e.g., Gaussian, Laplacian, or other parametric laws for the square function, the absolute value, or other φ-functions, respectively. Alternatively, it is interesting to adapt to (an estimate of) the true PDF. This nonparametric approach requires defining functionals that take PDFs as input. Entropy has been proposed in this context since, as a measure of the dispersion of a PDF, its minimization leads the residual or model-mismatch values to concentrate around narrow modes, the highest one normally corresponding to the annihilation of the residual or mismatch, the others corresponding to inevitable outliers.
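As a toy illustration of this idea (a sketch, not from the thesis), the following Python snippet estimates the entropy of a residual's histogram; a residual concentrated around zero yields a lower entropy than a dispersed one, which is exactly what entropy minimization exploits:

```python
import numpy as np

def residual_entropy(residual, bins=64, lo=-10.0, hi=10.0):
    """Shannon entropy (nats) of a histogram estimate of the residual PDF,
    computed over a shared range so dispersions are comparable."""
    counts, _ = np.histogram(residual, bins=bins, range=(lo, hi))
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
concentrated = rng.normal(0.0, 0.1, 10_000)   # residual values near zero
dispersed = rng.normal(0.0, 3.0, 10_000)      # widely spread residual

# Minimizing entropy favors the concentrated residual.
assert residual_entropy(concentrated) < residual_entropy(dispersed)
```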


Numerous tasks in image processing, such as video restoration, can be formulated as nonsmooth optimization problems over large datasets. In this context, it is necessary to propose parallel/distributed methods to compute efficiently the solutions to the corresponding high-dimensional optimization problems. In this work, we focus on the case where the objective function is a sum of several convex, not necessarily smooth, functions [1]. In the general case, a closed-form expression of the solution does not exist, and developing iterative strategies becomes necessary.
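For intuition, the building block of such iterative strategies is the proximity operator, which has a closed form for the ℓ1 norm (soft-thresholding). The following sketch, with illustrative data, shows one forward-backward step for a simple convex sum of a smooth quadratic and a nonsmooth ℓ1 term:

```python
import numpy as np

def prox_l1(x, t):
    """Proximity operator of t*||.||_1: elementwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# One forward-backward step for F(x) = 0.5*||x - b||^2 + lam*||x||_1:
b = np.array([3.0, -0.5, 0.2])
lam, step = 1.0, 1.0
x = np.zeros_like(b)
grad = x - b                           # gradient of the smooth part
x = prox_l1(x - step * grad, step * lam)

# With step = 1 and x0 = 0 this is exactly soft-thresholding of b.
assert np.allclose(x, [2.0, 0.0, 0.0])
```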

Generally, the field of view of the camera is fully illuminated by the laser and is acquired at standard video rates, say 10 Hz. In mosaic laser imaging, we replace the low-repetition-rate 10 Hz optical-parametric-oscillator laser by a high-repetition-rate 10 kHz fiber laser. The latter is expected to offer higher average power and plug efficiency within a few years. This concept presents additional advantages. As the repetition rate is larger by three orders of magnitude, the energy per pulse is lowered by the same ratio. In order to maintain the signal-to-noise ratio, only a reduced part of the field of view is illuminated at each laser flash. The corresponding region of interest of the sensor is read. The laser beam is then deflected in order to illuminate another region of interest. By repeating the process, we scan the field of view of the camera. This results in the successive acquisition of elementary images taken at a repetition rate of 10 kHz that tile as a mosaic in order to build the full-frame image at 10 Hz. The formation of each elementary image can be modeled as follows.
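The repetition-rate arithmetic above can be checked directly, with the numbers taken from the text:

```python
frame_rate_hz = 10        # full-frame acquisition rate
pulse_rate_hz = 10_000    # fiber-laser repetition rate

tiles_per_frame = pulse_rate_hz // frame_rate_hz   # elementary images per full frame
energy_scaling = frame_rate_hz / pulse_rate_hz     # per-pulse energy reduction

assert tiles_per_frame == 1000
assert energy_scaling == 1e-3   # three orders of magnitude lower energy per pulse
```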

Video deblurring is highly related to multi-image deconvolution [Paragios et al., 2006; Chen et al., 2008; Cai et al., 2009; Sroubek and Milanfar, 2012]. It has been shown in [Cai et al., 2009] that, given multiple observations, enforcing frame sparsity improves the accuracy of identifying the blur kernels and reduces the ill-posedness of the problem. However, multi-image deconvolution algorithms require that all the input images are aligned and that the content is the same (static scene). On the other hand, the authors in [Li et al., 2010] proposed to estimate the camera motion and to explicitly model the video blur as a function of the estimated motion. A joint energy function is formulated over the underlying sharp sequence and the motion parameters. Recently, [Kim and Lee, 2015] proposed to simultaneously tackle the problems of optical flow estimation and frame restoration in general blurred videos. This is done by simultaneously estimating the optical flow and latent sharp frames through the minimization of a single nonconvex energy function. Addressing these two problems simultaneously requires a much more complex optimization, due to the more sophisticated direct model linking all the blurry observations.


Nantes, France patrick.lecallet@univ-nantes.fr
Abstract—Learning-based black-box approaches have proven to be successful at several tasks in the image and video processing domain. Many of these approaches depend on gradient-descent and back-propagation algorithms, which require calculating the gradient of the loss function. However, many visual metrics are not differentiable, and despite their superior accuracy, they cannot be used to train neural networks for imaging tasks. Most image restoration neural networks are trained with the mean squared error. In this paper, we investigate visual-system-based metrics in order to provide perceptual loss functions that can replace the mean squared error in gradient-descent-based algorithms. We also share our preliminary results on the proposed approach.
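One generic workaround for a non-differentiable metric, sketched below on a toy problem (the metric, parameters, and settings are illustrative, not the paper's method), is to estimate a descent direction with a two-evaluation simultaneous-perturbation (SPSA-style) scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

def blackbox_metric(pred, target):
    """Stand-in for a non-differentiable quality metric (mean absolute error)."""
    return float(np.mean(np.abs(pred - target)))

def spsa_grad(loss, w, eps=1e-2):
    """Two-evaluation simultaneous-perturbation gradient estimate."""
    delta = rng.choice([-1.0, 1.0], size=w.shape)
    return (loss(w + eps * delta) - loss(w - eps * delta)) / (2 * eps) * delta

target = np.linspace(0.0, 1.0, 32)   # toy "ground-truth image"
w = np.zeros(32)                     # toy "network output" being optimized
loss = lambda v: blackbox_metric(v, target)
loss0 = loss(w)
for _ in range(2000):
    w -= 0.05 * spsa_grad(loss, w)

assert loss(w) < loss0               # descent without an analytic gradient
```

The same estimator applies to any metric that can only be evaluated, not differentiated, at the cost of noisier updates.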

III. RESULTS AND DISCUSSION
A. Phantom quality and reproducibility
The reproducibility of phantom printing can refer to reproducibility between or within phantoms. While accurate scatterer positioning could be achieved with a judicious choice of printing parameters, it was found that even using the same printer settings, some printed phantoms showed contamination of the propagation medium with scattering material. The reason behind this merits further investigation. Although scatterer diameters as low as 50 μm could be printed, setting it to 100 μm (the setting presented in the current work) substantially reduced variations in scatterer diameter. As regards variations of scatterer printing within phantoms, Fig. 2 shows the B-mode image of a successfully printed typical phantom. It can be observed that the lateral variation of the scatterer responses around the outer frame is relatively small compared to the axial variation. This suggests that the scatterer diameters are fairly reproducible, with the relatively high amplitude of the response at 20 mm hypothesized to be due to elevational focusing of the transducer. As judged by the location of the outer ring of scatterers, the scatterer placement is also accurate.

Knowledge of the scattering function allows comparison of the deconvolved images with the ground truth. Thus, using the scattering function and the originally acquired B-mode image, the performance of image restoration methods could be evaluated quantitatively through comparison of root-mean-square error and full-width-at-half-maximum values. Preliminary results demonstrate the benefits of knowing the scattering function during experimental testing of image restoration methods. In summary, the current work shows the potential of an experimental method for evaluating the extent to which an image restoration method provides a faithful rendering of the underlying scattering structure.
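As an illustration of the two quoted figures of merit (a sketch, not the paper's code), RMSE and FWHM can be computed as follows for a sampled Gaussian point-spread profile, for which theory gives FWHM = 2√(2 ln 2)·σ ≈ 2.3548·σ:

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def fwhm(x, y):
    """Full width at half maximum of a sampled unimodal profile on a linear grid."""
    half = y.max() / 2.0
    above = np.where(y >= half)[0]
    return float(x[above[-1]] - x[above[0]])

x = np.linspace(-5.0, 5.0, 2001)
sigma = 1.0
psf = np.exp(-x**2 / (2 * sigma**2))   # Gaussian point-spread profile

assert abs(fwhm(x, psf) - 2.3548 * sigma) < 0.02
assert rmse(psf, psf) == 0.0
```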

Projet PASTIS
Research Report no. 3249 — September 1997 — 68 pages
Abstract: The purpose of this report is first to present the main properties of Gibbs
distributions considered as exponential statistics on finite spaces, as well as their sampling and annealing properties. Moreover, the definition and use of their cumulant expansions make it possible to exhibit other important properties of such distributions. Last, we tackle the problem of hyperparameter estimation in an incomplete-data framework for image restoration purposes. A detailed analysis of several joint restoration-estimation methods using generalized stochastic gradient algorithms is presented, requiring infinite, continuous configuration spaces. Using cumulant analysis once again, together with its relationship with statistical physics, allows us to propose new algorithms and to extend them to an explicit-boundary framework.

I. INTRODUCTION
Part 11 of the JPEG2000 standard extends it beyond the scope of image compression toward a global wireless transmission architecture. The JPEG2000 Wireless (JPWL) transmission consists of the core coding system and an Unequal Error Protection (UEP) scheme driven by semantic information reflecting the error sensitivity of each part of the bitstream (Fig. 1). An emphasized protection of the image and tile headers has been proposed [1] because errors occurring at these levels have a dramatic impact on the overall image quality. References [2], [3], [4] have shown the efficiency of UEP schemes over traditional Equal Error Protection (EEP) schemes for multimedia content. Typically, Reed-Solomon (RS) codes are used as Forward Error Correction (FEC) codes. In [5], the RS codes (160,64), (80,25) and (40,13) are used. They introduce redundancy ratios of 1.5, 2.2 and 2.08, respectively. This robust protection significantly improves the probability of successful decoding independently of the channel conditions because the integrity of the headers is preserved in case of binary losses.
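The quoted redundancy ratios are consistent with the convention (n − k)/k for an (n, k) RS code, i.e., parity symbols per data symbol:

```python
# Redundancy ratio (n - k) / k of the Reed-Solomon codes cited above.
codes = [(160, 64), (80, 25), (40, 13)]
ratios = [round((n - k) / k, 2) for n, k in codes]

assert ratios == [1.5, 2.2, 2.08]   # matches the 1.5, 2.2 and 2.08 in the text
```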


Image classification is arguably the most important part of digital image analysis. The objective is to identify and describe the features present in an image in order to index them by class and by theme. Applications exist in a large number of domains, such as medical image interpretation, surveillance, satellite photography and interactive television. Traditional image classification methods proceed by analyzing distinct blocks of an image, which leads to a non-contextual formalism of the visual features. However, when analyzing a small portion of an image, the human eye is often unable to identify what it sees. Recent approaches therefore tend increasingly toward a global view of the image, including its structure and overall shape (e.g., the sun in the sky, the sky above a landscape, or a boat on the water, etc.) [20], [21].


Integral imaging cannot be used for applications where a large angle of view is required, such as Free Navigation, as it only captures the light-field under a narrow angle of view. In Part III, we also study the compression of Super Multi-View content, which provides a sparser sampling of the light-field but with a large baseline. In Chapter 4, we present a subjective quality evaluation of compressed Super Multi-View content on a light-field display system. The goal is to study the impact of compression at the display side in the specific case of light-field content. We provide some initial conclusions on the feasibility of a video service that would require rendering about 80 views. We first show that the bitrates required for encoding and rendering 80 views are realistic and coherent with the requirements of future networks supporting 4K/8K, although some considerations on the characteristics of the tested content highlight the need for a better codec, in order to further improve the quality and avoid network overload. Preliminary experiments performed during this study lead to recommended coding configurations for Super Multi-View video content, particularly with groups of views (GOVs), which enable a compromise between memory limitations, coding efficiency and parallel processing. Some conclusions are also drawn on the number of views to skip at the encoder, and to synthesize after decoding, which is highly sequence-dependent. The ratio between coded and synthesized views depends on the quality of the synthesized views, and is hence linked to the quality of the depth maps, the efficiency of the renderer, and the complexity of the scene. Apart from compression, view synthesis can introduce severe distortions and affects the overall rendering scheme. Our results confirm that improving view synthesis and depth estimation algorithms is mandatory.
Concerning the evaluation method and metric, the results show that the PSNR remains able to reflect an increase or decrease in subjective quality for light-field content. However, depending on the ratio of coded and synthesized views, we have observed that the order of magnitude of the effective quality variation is biased by the PSNR, which is less tolerant to view synthesis artifacts than human viewers.
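For reference, the PSNR used in such evaluations is 10·log10(peak²/MSE), with peak = 255 for 8-bit content; a minimal sketch:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit imagery."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(peak**2 / mse))

ref = np.zeros((64, 64))
noisy = ref + 16.0   # constant error of 16 grey levels -> MSE = 256

# 10*log10(255^2 / 256) ≈ 24.05 dB
assert abs(psnr(ref, noisy) - 24.05) < 0.01
```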


Recently, a new image model, namely the α-tree [1], has been introduced as a powerful tool for multiscale image representation. It offers a compact and efficient way to access image content, and can be further exploited in various image analysis and processing tasks. We consider here the α-tree for image segmentation purposes, and study how this new image model can be used in such a context. Indeed, some relevant image features can be extracted from the tree, leading to segmentation methods operating either automatically or interactively. Moreover, we propose an efficient implementation scheme which ensures user interactivity and extension to video data. Preliminary results obtained on the Berkeley Segmentation Dataset are very promising and show the relevance of the α-tree in image processing and analysis. This paper is organized as follows. In the next section, we recall the definitions of flat and quasi-flat zones, which lead to the α-tree model for image representation. We then describe our contribution in Sec. III, where we study how the α-tree can provide relevant features for image segmentation before introducing a new segmentation method and its efficient implementation. In Sec. IV, we discuss parameter settings and provide an experimental evaluation of our method on the Berkeley Segmentation Dataset. We also provide an insight
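As a rough illustration of the quasi-flat zones underlying the α-tree (a simplification for intuition, not the authors' implementation), the α-connected zones of a small image can be labelled with a union-find pass that merges 4-neighbours whose grey-level difference does not exceed α:

```python
import numpy as np

def alpha_flat_zones(img, alpha):
    """Count alpha-connected zones: 4-neighbours merged when |difference| <= alpha."""
    h, w = img.shape
    parent = list(range(h * w))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for y in range(h):
        for x in range(w):
            i = y * w + x
            right = i + 1 if x + 1 < w else None
            below = i + w if y + 1 < h else None
            for j in (right, below):
                if j is not None and abs(int(img.flat[i]) - int(img.flat[j])) <= alpha:
                    parent[find(i)] = find(j)

    return len({find(i) for i in range(h * w)})

img = np.array([[10, 11, 50],
                [10, 12, 52],
                [10, 11, 51]])
assert alpha_flat_zones(img, 2) == 2    # left plateau vs right plateau
assert alpha_flat_zones(img, 0) > 2     # alpha = 0: classical flat zones, finer
```

Varying α from 0 upward produces the nested partitions that the α-tree organizes hierarchically.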

been set in such a way that one pixel on the display array is pictured by 4×4 pixels on the CCD array. This permitted us to obtain a good approximation of the 56×40-pixel display frame by computing the mean of each 4×4 block in the CCD frame. Stimuli were generated with Matlab on a PC using the PsychToolbox extension [22]. They consisted of a straight edge moving from left to right. One example of the frames grabbed by the CCD camera is shown in Figure 6. As mentioned before, the blurred profile was obtained by motion compensation of each CCD frame to simulate the smooth pursuit of the eyes. The high camera frame rate and the precise calibration of the apparatus, with 4×4 CCD camera pixels picturing one display pixel, allowed us to achieve this motion compensation precisely. Next, all frames are added to each other to simulate the temporal integration on the retina. An example of a blurred edge obtained with this method is shown in Figure 7 for an edge moving with a velocity V = 10 pixels per frame. The blurred edge width BEW (in pixels) is measured as illustrated. The blurred edge time BET (in seconds or in frames) is generally used; it is obtained by dividing BEW by the velocity V (in pixels per second or pixels per frame): BET = BEW/V (3). Moreover, it has been observed that, for a given grey-to-grey transition (i.e., for a given temporal response of the liquid crystal cells), BET does not vary with the velocity V. In other terms, the measured blur width BEW is proportional to the velocity of the moving edge. This result agrees with relation (2), and the parameter a can then be identified with the blurred edge time BET.
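The two operations described above, 4×4 block averaging of the CCD frame and the BET computation of relation (3), can be sketched as follows (illustrative values):

```python
import numpy as np

def block_mean(frame, b=4):
    """Average each b x b block: maps the oversampled CCD frame
    back to the display resolution (e.g. 224x160 -> 56x40)."""
    h, w = frame.shape
    return frame.reshape(h // b, b, w // b, b).mean(axis=(1, 3))

ccd = np.tile(np.arange(4.0), (4, 1))   # one 4x4 block whose rows are 0 1 2 3
assert block_mean(ccd).item() == 1.5    # mean of 0..3

# Blurred-edge time from blurred-edge width and velocity, relation (3):
BEW, V = 30.0, 10.0                     # pixels, pixels per frame
BET = BEW / V
assert BET == 3.0                       # frames
```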

given N training pairs (w^(n), a^(n,k)). θ represents the parameters of the LSTM.
7.3.2. Atoms Construction
Each configuration of a may be associated with a different distribution P_θ(w|a), and therefore a different oracle model. We define a configuration as an orderless collection of unique atoms. That is, a^(k) = {a_1, ..., a_k}, where k is the size of the bag and all items in the bag are different from each other. Considering the particular problem of image and video captioning, atoms are defined as the words in captions that are most related to actions, entities, and attributes of entities (see Figure 7.1). The use of these three particular language components as atoms is not an arbitrary decision: it is reasonable to consider them among the most visually perceivable ones when humans describe visual content in natural language. We further verify this by conducting a human evaluation procedure to identify “visual” atoms from this set, and show that a dominant majority of them indeed match human visual perception, as detailed in Section 7.5.1. Being able to capture these important concepts is considered crucial for superior performance. Therefore, comparing the performance of existing models against this oracle reveals their ability to capture atoms from visual inputs when P(a|v) is unknown.


The k nearest neighbors are provided by the Approximate Nearest Neighbor Searching (ANN) library (available at http://www.cs.umd.edu/~mount/ANN/).
In order to measure the performance of our algorithm, we degraded the 256×256 Lena image by adding Gaussian noise with standard deviation σ = 10. The original image has intensity ranging from 0 to 100. We consider 9×9 neighborhoods, and we add spatial features to the original radiometric data [10, 13], as explained in Section 2. These spatial features allow us to reduce the effect of the nonstationarity of the signal in the estimation process, by preferring regions closer to the estimation point. The dimension d of the data is therefore equal to 83, and we have to search for the k nearest neighbors in this high-dimensional space.
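The dimension d = 83 comes from stacking the 81 intensities of a 9×9 patch with 2 scaled spatial coordinates. A minimal sketch of such feature construction (the function and parameter names are hypothetical; the scaling λ of the coordinates is an assumed free parameter):

```python
import numpy as np

def knn_features(img, half=4, lam=1.0):
    """Stack each 9x9 patch (half = 4) with scaled (x, y) coordinates:
    feature dimension d = 81 + 2 = 83, as in the text."""
    h, w = img.shape
    feats = []
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = img[y - half:y + half + 1, x - half:x + half + 1].ravel()
            feats.append(np.concatenate([patch, lam * np.array([x, y], float)]))
    return np.asarray(feats)

img = np.zeros((16, 16))
F = knn_features(img)
assert F.shape == ((16 - 8) ** 2, 83)   # interior positions x 83 dimensions
```

Increasing λ biases the nearest-neighbor search toward spatially close patches, which is the stated role of the spatial features.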

Stochastic approximation techniques have been used in various contexts in data science. We propose a stochastic version of the forward-backward algorithm for minimizing the sum of two convex functions, one of which is not necessarily smooth. Our framework can handle stochastic approximations of the gradient of the smooth function and allows for stochastic errors in the evaluation of the proximity operator of the nonsmooth function. The almost sure convergence of the iterates generated by the algorithm to a minimizer is established under relatively mild assumptions. We also propose a stochastic version of a popular primal-dual proximal splitting algorithm, establish its convergence, and apply it to an online image restoration problem.
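A minimal instance of such a scheme, on a toy separable problem whose minimizer is known in closed form (the step-size schedule and noise level are illustrative, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(1)

def prox_l1(x, t):
    """Proximity operator of t*||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# Minimize 0.5*||x - b||^2 + lam*||x||_1 with a *noisy* gradient of the smooth part.
b, lam = np.array([2.0, -0.3, 1.0]), 0.5
x = np.zeros_like(b)
for n in range(1, 5001):
    step = 1.0 / n**0.6                               # diminishing step sizes
    noisy_grad = (x - b) + 0.1 * rng.standard_normal(b.shape)
    x = prox_l1(x - step * noisy_grad, step * lam)    # stochastic forward-backward

# Closed form for this separable problem: soft-thresholding of b.
assert np.allclose(x, prox_l1(b, lam), atol=0.1)
```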

tarel@lcpc.fr hautiere@lcpc.fr
Abstract
One source of difficulties when processing outdoor images is the presence of haze, fog or smoke, which fades the colors and reduces the contrast of the observed objects. We introduce a novel algorithm and variants for visibility restoration from a single image. The main advantage of the proposed algorithm compared with others is its speed: its complexity is a linear function of the number of image pixels only. This speed allows visibility restoration to be applied for the first time within real-time processing applications such as sign, lane-marking and obstacle detection from an in-vehicle camera. Another advantage is the possibility of handling both color and gray-level images, since the ambiguity between the presence of fog and objects with low color saturation is resolved by assuming that only small objects can have colors with low saturation. The algorithm is controlled by only a few parameters and consists in: atmospheric veil inference, image restoration and smoothing, and tone mapping. A comparative study and quantitative evaluation is proposed with a few other state-of-the-art algorithms, demonstrating that similar or better quality results are obtained. Finally, an application to lane-marking extraction in gray-level images is presented, illustrating the interest of the approach.
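A heavily simplified sketch of the fog-model inversion underlying such methods. The global veil and white airlight here are assumptions of this toy example, not the paper's per-pixel veil inference:

```python
import numpy as np

def restore_visibility(I, w=0.95):
    """Crude single-image defogging sketch: infer a *global* atmospheric veil
    from the darkest observed intensity (white airlight A = 1 assumed),
    then invert the fog model I = t*J + (1 - t)."""
    V = w * I.min()   # global veil estimate (toy simplification)
    return np.clip((I - V) / (1.0 - V), 0.0, 1.0)

rng = np.random.default_rng(0)
J = rng.uniform(0.0, 1.0, (32, 32))   # clear scene with near-black pixels
t = 0.6                               # uniform transmission
I = t * J + (1.0 - t)                 # synthetic fog, white airlight
R = restore_visibility(I)

assert R.std() > I.std()              # contrast is (partially) restored
```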

3.2. Extensions to polarimetric and/or interferometric SAR
Most deep learning approaches for speckle reduction have focused on the case of intensity images. Multi-channel complex-valued SAR images, as in SAR polarimetry or SAR interferometry, raise other challenges. Polarimetric and interferometric information is encoded in complex-valued covariance matrices. Restricting the estimated matrices to the cone of positive definite covariance matrices requires an adequate design of the learning strategy and/or of the network. Due to the increase in the dimensionality of the data and of the unknowns, the learning task becomes more complex, and it is expected that many more training samples are required to capture all spatial and polarimetric/interferometric configurations during the learning phase.
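One simple way to enforce the positive-definiteness constraint mentioned above, shown as a post-processing sketch (not a method from the cited work), is eigenvalue clipping of the estimated covariance matrix:

```python
import numpy as np

def project_to_pd(C, eps=1e-6):
    """Project a (possibly indefinite) Hermitian covariance estimate onto the
    positive-definite cone by clipping its eigenvalues from below."""
    C = 0.5 * (C + C.conj().T)        # symmetrize first
    vals, vecs = np.linalg.eigh(C)
    vals = np.maximum(vals, eps)
    return (vecs * vals) @ vecs.conj().T   # V diag(vals) V^H

C = np.array([[1.0, 2.0], [2.0, 1.0]])     # indefinite: eigenvalues 3 and -1
P = project_to_pd(C)

assert np.all(np.linalg.eigvalsh(P) > 0)   # now positive definite
assert np.allclose(P, P.conj().T)          # still Hermitian
```

In a learning setting the same effect can be obtained structurally, e.g., by having the network output a Cholesky factor L and forming L·Lᴴ.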
