University Saad Dahlab of Blida (1), Sharjah University (2)

# FPGA Embedded Implementation for Neutron Radiography Images Restoration Based on Tikhonov and TV Algorithms

**Abstract**. In this paper, we present a mixed software/hardware Implementation for image restoration using Tikhonov and Total Variations (TV) approaches. The proposed work is implemented and compiled on the embedded development kit EDK6.3i and the synthesis software ISE6.3i available with Xilinx Virtex-II FPGA using C language. The proposed implementation is designed to be integrated in the whole imaging system and image enhancement results have been appreciated by neutron radiography final users. The design can significantly accelerate the two Algorithms.

Streszczenie. W artykule przedstawiono zastosowanie metod Tikhonova oraz Odchylenia Całkowitego w algorytmie odtwarzania obrazu. Do implementacji w języku C, wykorzystano płytę rozwojową EDK 6.3i z układem FPGA Virtex II Pro firmy Xilinx wraz z oprogramowaniem ISE 6.3i. (Implementacja w układzie FPGA algorytmu odtwarzania obrazu Radiografii Neutronowej przy wykorzystaniu metod Tikhonova oraz odchylenia całkowitego)

Keywords: Tikhonov, Total Variation, Embedded, FPGA, Neutron Radiography Słowa kluczowe: Tikhonov, Odchylenie Całkowite, wbudowany, FPGA, Radiografia Neutronowa

## Introduction

Digital radiological image is a digital image acquired by a certain radiological procedure which can be X-rays, neutron radiography, gamma camera image or a nuclear magnetic resonance image. It is a two-dimensional mxn array of non-negative integers (gray levels). For neutron radiography, the gray level value represents the relative linear attenuation coefficient of the object. The performance of radiological imaging depends not only on the reconstruction algorithm but also on the image quality. Higher quality images show finer structural or functional information of body organs and support more reliable diagnostic outcomes [1]. Images made with neutrons have been widely used in industrial research and non-destructive testing applications since the early 1960s [2]. General applications for neutron radiography include inspections of nuclear materials, explosive devices, turbine blades, electronic packages and miscellaneous assemblies including aerospace structure (metallic honeycomb and composite components), valves and other assemblies. Industrial applications generally involve the detection of a particular material in an assembly containing two or more materials. Examples include detection of residual ceramic core in an investment-cast turbine blade, corrosion in a metallic assembly and water in honeycomb or explosive in a metallic assembly. Digital image restoration is a solution to the problem.

We can broadly classify restoration techniques into two classes: the filtering reconstruction techniques and the algebraic techniques. The filtering techniques are rather classical and they make use of the fact that noise signals usually have higher frequencies than image signals. This means that image signals die out faster than noise signals in high frequencies. The ill-posedness of this problem arises from the fact that the kernel of the blurring function is badly conditioned and the degraded image contains noise. As a result, small perturbations in additive noise may lead to significant oscillations in the inversion result when using matrix inversion solution.

The solution methods for this optimization problem include many proposed algorithms developed recently. Total variation (TV) is a regularization approach that performs edge preserving image restoration, but at a high computational cost. Total variation regularization requires linearization of a highly nonlinear penalty term, which increases the restoration time considerably for large scale images. In the total variation method, we will consider an iterative regularization approach in the spatial domain, which was first addressed in optimization as the Barzilai-Borwein minimization (BB) method [3].

Purely software implementations of image processing algorithms, however, appear to present a problem when real time systems are required in terms of performance. Therefore, hardware acceleration of these algorithms has become a topic of recent research.

Authors of [4] implemented a standard LMS Algorithm for adaptive signal processing on FPGA hardware with low resource utilization and fast convergence. In [5], they replaced the combination of ASIC and microprocessor with a single FPGA chip solution using the Xilinx's MicroBlaze soft IP processor to implement a modern frequency converter control, allowing more flexible modification of the used algorithms and control methods.

The basic motivation of our application to neutron radiography imaging is to enhance unavoidable blurs and noises during acquisition that may reduce image quality in order to reach a good reconstruction on a distant Computer or storage for future retrieval. The proposed design employs various digital techniques and specific features of the Xilinx Virtex-II/V2MB1000 FPGA to accelerate data processing.

#### **Image Degradation**

In neutron radiography, there are several components tend to degrade the image, limiting the resolution of neutron radiography. The image degraded sources in neutron radiography are geometric unsharpness associated with lack of collimation in the beam, statistical fluctuation associated with low neutron beam intensities or gamma ray background, scattering degradation caused by scattering of neutrons which deflects the beam, motion unsharpness due to object motion during the exposure, and limitations in the imaging and processing systems, such as converter film unsharpness and electrical noise [6]. In neutron radiography, corrupted images often pose problem for analysis and detection of the object being observed. To overcome this problem, restoration process was used to reduce the blurring and noise effects on the image.

# Image Restoration Algorithms Tikhonov Approach

The main objective of regularization is to incorporate more information about the desired solution in order to stabilize the problem and find a useful and stable solution. The most common and well-known form of regularization is that of Tikhonov [7]. The Tikhonov regularized minimum norm solution of the equation:

$$(1) g = Hf + \eta$$

is the vector  $F_{\delta} \in \Re^{N}$  that minimizes the expression:

(2) 
$$||Hf - g||_2^2 + \lambda^2 ||Lf||_2^2$$

Where the degradation process is modelled as a blurring function H that together with an additive noise term  $\eta$  operates on an original input image f to produce an observed degraded image g. L is the discrete Laplacian

and  $\lambda > 0$  is called a regularization parameter.

We denote:

(3) 
$$F_{\delta} = \arg\min_{x} \left\| Hf - g \|_{2}^{2} + \lambda^{2} \| Lf \|_{2}^{2} \right\}$$

Regularization can be understood as a balance between two requirements:

- 1. f should give a small residual Hf g.
- 2. f should be small in  $L^2$  norm.

The regularization parameter  $\lambda > 0$  can be used to "tune" the balance. Note that in inverse problems there are typically infinitely many *f* satisfying (3).

#### **Total Variation Approach**

TV is often used for image filtering and restoration. TV based filtering was introduced by Rudin, Osher, and Fatemi [8]. TV denoising is an effective filtering method for recovering piecewise-constant signals. Many algorithms have been proposed to implement it. The most famous one used in this comparison is by Chambolle [9].

The derivation in this algorithm is based on the min-max property and the majorization-minimization procedure. Authors of [8] introduced in 1992 the following idea:

 $\left\|Hf-g\right\|_{2}^{2}+\lambda\left\|Lf\right\|_{1}$ 

Instead of minimizing (2), let us minimize:

(4)

Recall that:  $||Z||_{a}^{2} = |Z_{1}|^{2} + ... + |Z_{N}|^{2}$ 

$$\begin{aligned} \|Z\|_{2} &= |Z_{1}| + \dots + |Z_{N}| \\ |Z\|_{1} &= |Z_{1}| + \dots + |Z_{N}| \end{aligned}$$

The idea is that (4) should allow occasional larger jumps in the reconstruction leading to piecewise smoothness instead of overall smoothness. It turns out that (4) really is a powerful method, but numerical minimization is more difficult. First approach to practical minimization of the TV functional is half-quadratic programming. We want to minimize:

(5) 
$$\left\| Hf - g \right\|_{2}^{2} + \lambda \sum_{j=0}^{N+1} \left| (Lf)_{j} \right|$$

where  $(Lf)_j = f_{j+1} - f_j$  for *j=0,...,N+1* with the convention:  $f_0 = 0 = f_{N+1}$ 

Write Lf in the form:V\_+-V\_= Lf, Where  $V_{\pm}$  are non-negative

vectors:  $V_{\pm} \in \mathfrak{R}_{+}^{N+2}$  , or  $(V_{\pm})_{j} \ge 0$  for all j=0,...,N+1

Now minimizing (TV) is equivalent to minimizing:

(6) 
$$\left\|Hf\right\|_{2}^{2} - 2g^{T}Hf + \lambda L^{T}V_{+} + \lambda L^{T}V_{-}\right\|$$

where  $L=[1 \ 1 \ ...]^T \in \Re^{N+2}$  and the minimization is taken over vectors:

$$y = \begin{bmatrix} f \\ V_+ \\ V_- \end{bmatrix}, \text{ where: } f \in \mathfrak{R}^N, V_+ \in \mathfrak{R}^{N+2}_+, V_- \in \mathfrak{R}^{N+2}_-$$

Write now:

$$A = \begin{bmatrix} 2H^{T}H & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, B = \begin{bmatrix} -2g^{T}Hf \\ \lambda L \\ \lambda L \end{bmatrix}$$

then we deal with the standard form of the problem:

(7) 
$$\operatorname{argmin}\left\{\frac{1}{2}y^{T}Ay + B^{T}y\right\}$$

with the constraints:

Simple way to deal with the second problem is to check if f increases, and if so, half the step length. A more delicate approach would be to enforce the so-called Armijo-Goldstein-Wolf two conditions:  $0<\delta_1<\delta_2<1$ 

1. 
$$func(x_k + \alpha_k d_k) \le func(x_k) + \delta_1 \alpha_k g_k^T d_k$$
  
2.  $g(x_k + \alpha_k d_k)^T d_k \ge \delta_2 g_k^T d_k$   
where  $g_k = \nabla func(x_k)$  and  $d_k$  denotes the  $k^{th}$  search

direction. In our case  $d_k = -\nabla func(x_k) = -g_k$ 

It can be shown that if  $d_k$  is a descent direction and f is bounded below along the ray  $\{x_k + \alpha d_k | \alpha > 0\}$ , then there exist step lengths satisfying the two above conditions.

We remark that the first condition ensures that the function is reduced sufficiently, and second condition prevents the steps from being too small.

Note that the storage need of (BB) method is of the order *n* instead of  $n^2$  typical for many methods. If *x* is a large *MxN* size image, then:  $n^2 = M^2 x N^2$  is too large for computer memory.

Our aim is to minimize:

$$func(f) = \|Hf - g\|_{2}^{2} + \lambda \|Lf\|_{1} = \|Hf - g\|_{2}^{2} + \lambda \sum |f_{i} - f_{j}|$$

## for $f_i$ and $f_j$ neighbors

But  $f_{unc}$  is not continuously differentiable, so we can not apply the Barzilai-Borwein method. Let us replace the absolute value function |t| by an approximation:

$$\left|t\right|_{eta} = \sqrt{t^2 + eta}$$
 where  $eta$  >0 is small

Another popular choice is:

$$\left|l\right|_{\beta} = \frac{1}{\beta} \log(\cosh(\beta t))$$

then the objective function becomes:

func 
$$(f)_{\beta} = \|Hf - g\|_{2}^{2} + \lambda \sum |f_{i} - f_{j}|_{\beta}$$

and we can apply the Chambolle Algorithm.

Antonin Chambolle describes in [9] an iterative algorithm for the resolution of the TV regularized restoration problem (the so called "Rudin-Osher-Fatemi" method). This algorithm exploits a dual formulation of the minimization problem, and uses a fixed point iteration to find a solution of the dual formulation. Chambolle proves that these iterations are contractant, and thus converge to a solution with linear speed. Chambolle also exposes some important extensions of this algorithm such as the regularization parameter *lambda* that can be updated during the iterations in order to solve the  $L^2$  constrained problem (instead of *Lagrangian* regularization). This is very useful if one knows the level of noise that is perturbs the measurements.

# Concept of reconfigurable computing

The advance in technology has made possible to produce better programmable circuits with FPGAs and with increased performance and higher integration densities. FPGAs offer the highest flexibility and cost effectiveness because hardware implementation can be done several times on the same chip at no additional cost and because mass productions of FPGA chips have lead to very competitive prices.

Oriented Application Specific Integrated Circuits (ASIC's), can perform a restricted set of operations/tasks, but configurable computing hardware, such as Field Programmable Gate Arrays (FPGAs), Fig.4, allow modifications at any stage of the design.



Fig. 4: Virtex-IIV2MB1000 System Board

#### **EDK: Embedded Development Kit**

The EDK is a development environment that provides application designers with the tools necessary to build embedded soft cores processor systems. The steps within the EDK that are necessary to build the embedded processor system include: hardware platform creation, software platform creation [10], and software application creation. The hardware platform is defined by the Microprocessor Hardware Specification (MHS) file which defines our system architecture, memory modules, embedded processors and the systems connectivity as well as the configurable options and the address map for each memory module in our system [11].

The Xilinx Platform Studio (XPS) which is an integrated development environment gives a wide selection of settings to modify the kernel as needed. It allows the user to, for example, initiate interrupt handlers, add thread support, and specify clock resolution. All modifiable settings can be seen in Xilinx OS and libraries documents.

## The MicroBlaze Processor

The Virtex-II FPGA in addition to containing the traditional elements that are characteristics of previous platform FPGA generations contains the MicroBlaze Soft Core Processor Block [12].

The Microblaze (MB) is an embedded soft processor system designed by Xilinx as an intellectual property (IP) core that is implemented using the logic primitives of the FPGA. This processor provides application designers with the tools needed to build embedded soft cores processor systems. The architecture is shown below in Fig. 5.



Fig. 5: Microblaze Architecture

It has the following features:

- RISC Processor.
- · Thirty-two 32-bit general purpose registers.

• 32-bit instruction word with three operands and two addressing modes.

• Separate 32-bit instruction and data buses OPB (**O**n-chip **P**eripheral **B**us).

• Separate 32-bit instruction and data buses LMB (Local Memory Bus).

• Hardware multiplier (in Virtex-II and subsequent devices).

MicroBlaze processor and peripherals consume about 38 % of Virtex-II XC2V1000 resources (Table II). The processor can run at maximum 100 MHz clock speed. Despite the high clock speed, processor's optimal performance can be achieved using the FPGA's internal BRAM memory blocks for data and instructions memories. Internal memory blocks enable to perform memory read and write operations in two clock cycles [13].

| · · · · · · · · · · · · · · · · · · · |                                                                                                  |
|---------------------------------------|--------------------------------------------------------------------------------------------------|
| Resource Usage                        | % of XC2V1000                                                                                    |
|                                       | Resources                                                                                        |
| 1198                                  | 23                                                                                               |
| 528                                   | 10                                                                                               |
| 121                                   | 2                                                                                                |
| 100                                   | 2                                                                                                |
| 51                                    | 1                                                                                                |
| 8                                     | 0                                                                                                |
|                                       | Resource Usage           1198           528           121           100           51           8 |

Table 1. Ressource usage of the Microblaze processor system

Embedded systems design is a hot application field which merges logic design and processor-based hardware development in a single or few chips solution. In recent years, embedded applications have emerged at a fast rate and have been used in every field. Various technologies have been used in the development of embedded systems; microcontroller, DSP processor, ASIC and now FPGA circuits.

## **FPGA** Compiling of Algorithms

The original source codes are written in standard C language and are not optimized for any specific processor target. These algorithms are designed to restore 8-bit coded pixels of *mxn* pixels image.

The change made to the C programs in support of hardware compilation is how to load the input image data to the memory and how to read the restored output data by the host computer, Fig.10. The algorithm assumes the data in a global matrix (image).

After simulating its functionality using standard desktop tools, we are ready to implement the application on a mixed FPGA/processor target. We choose a Xilinx MicroBlaze based FPGA target and we select the Virtex-II MicroBlaze Development Kit (EDK) as our reference system. This kit includes a hardware reference board populated with a VirtexII FPGA and various peripheral interfaces, as well as all development tools needed to compile and synthesize hardware and software applications (consisting of HDL source files for hardware and C source files for software) to the FPGA target. When combined with the C compiler, this kit provided us with everything needed to compile and execute the two restoration applications from our C language source files [14].

Acquired images are converted to memory bit files and vice versa, initial data to be processed is written inside the memory (degrade image file read pixel by pixel) and the result (restored image file) is read back from the UART (serial port).

The hardware restoration steps are:

1- Read the raw image: this image is loaded previously in the DDRAM.

2- Input algorithm parameters: number of iterations, regularization parameter and the temporal digitalization.

3- Execute the restoration program.

4- Write the restored image: transmit resulted characters towards the UART, these characters are captured by the hyper terminal and written in a file.

The following steps were required to compile our application in the hardware, Fig.6:

1- Platform creation : Microblaze + UART+ DDRAM + GPIO + MDM.

2- Write a small program to be executed by the microblaze on the BRAM: the microblaze boots from the BRAM. The created compiled program is : executable.elf.

3- Write the C program for restoration (add source): tikhonov.c, compiled under the name : executable1.elf.
4- Execution :

4-1- Load the executable main program using the EDK procedures and configure FPGA.

4-2- Run the XMD window, and go to the directory : mblaze and the subdirectory: code.

4-3- Connect XMD to the microblaze using : connect mb mdm (mdm: microprocessor debug module).

4-4- Load the original image (data) using : xdownload 0 – data image.pgm 0x8d000000.

4-5- Load the restoration program using : Xdownload 0 executable1.elf

4 - 6- Run the hyper terminal.

4-7- Run the command : text capture, in the hyper terminal and wait.

4-8- Run the compiled c-programe using the command: xcontinue 0 0x8c000000.

4-9- Output data is transmitted through the serial port to the hyperterminal.

4-10- When completed, stop text capture, data is written in file by the host PC.

The same execution steps are repeated for the program tv.c only the executable and the memory address will be changed.

In this work, we have created a mixed hardware/software application, Fig.7, in which the software portion communicates directly with the hardware process using streams-based communications. We have shown the steps needed to generate the FPGA hardware, combine this with a soft processor and download the combined application to an actual FPGA prototyping board.



Fig. 6: Created design block diagram



Fig. 7: The created hardware plate form

#### Simulation Results

The created platform, with microblaze and peripherals, lets us validate the algorithm quickly and apply a wide variety of input programs simply by changing and recompiling the software application (the restoration program), which runs on the FPGA's embedded processor.

Armed with this software/hardware implementation, we can now proceed to optimize this application for performance, secure in the knowledge that the results will be fully testable at the process level, in actual hardware.

Images are collected by the CCD camera, transferred sequentially to the FPGA board, retrieved through the serial communication cable and transmitted to a distant personal computer to be explored for reconstruction.

Different test images and real neutron projections are tested by this FPGA implementation, collected in a host computer database, transmitted through a local network and observed in a distant computer. The root mean square error increases with the presence of neighboring abrupt changes in gray level values, and with the presence of many gray levels in one image. And vice-Versa, the pick signal to noise ratio (PSNR) increases.



Fig.8: a) original b) motion blurred c) blurred with noise



Fig.9: a) Restored with Tikhonov b)with TV(CG Algorithm c)with TV (Chambolle Algorithm)



Fig.10: a)original b)motion blurred c)blurred with noise d)Restored with Tikhonov e)restored with TV



Fig.11: a)Original image of an electrical relay, b)Hardly motion blurred image, c)Restored with Tikhonov, d) restored with TV



Fig.12: a)Original image of computer hard disk, b) Hardly motion blurred image, c)Restored with Tikhonov, d) restored with TV

Timing is an important metric when comparing hardware and software implementations. The timing results of image restoration (for both Algorithms) implemented on Xilinx Vertex-II FPGA (frequency:100 MHz) and on CPU-based PC (frequency:3.3GHz); indicate that for both test and neutron images, execution time is better in the PC but the embedding, reconfigurability and low cost characteristics are of the FPGA which make it the better choice for our application due to radiation constraints near the neutron imaging apparatus.

Finding a compromise between execution time and device usage is very important in hardware implementations, thus reaching up to 50% of device usage with a very short time for execution (less than milliseconds) is a very good work, suppose that the FPGA board is totally dedicated to this application.

## Conclusion

We have implemented an image restoration scheme on re-programmable hardware FPGA. The design has multiple configurations which support different restoration Algorithms, depending on the user requirements. The neutron radiography users appreciated results. This work will help us develop similar implementations for further image improvements using the same reconfigurable board.

#### REFERENCES

 S.Wong, L. Zaremba, D. Gooden and H. K. Huang. (1995). Radiologic Image Compression A Review, Proceeding of the IEEE, Vol. 83, No. 2. February 1995.

- [2] M.H. Hassan. (2009). Point Scattered Function (PSF) for Fast Neutron Radiography, Nuclear Instruments and Methods in Physics Research B. 267: 2545–2549.
- [3] J.Barzilai, and J. Borwein. (1988). Two-point step size gradient methods, IMA Journal of Numerical Analysis, 8, 141–148. 1988.
- [4] O. S. Tehrani, M. Ashourian and P. Moallem. (2010). An FPGA-Based Implementation of Fixed-Point Standard-LMS Algorithm with Low Resource Utilization and Fast Convergence, International Review on Computers and Software (IRECOS), Praise Worthy Prize (Publishing House), Italy, Vol. 5, No. 4, pp. 436-444, July 2010.
- [5] O.Laakkonen, K. Rauma, A. Penttinen, T. Härkönen, J. Luukko and O. Pyrhönen. (2006). Frequency Converter Control in single FPGA circuit, International Review of Electrical Engineering, IREE, Vol. 0. n. 0., pp.104–109.
- [6] J.Park. (2007). Neutron Scattering Correction Functions for Neutron Radiographic Images, PhD Thesis. University of Michigan; 2000.
- [7] D. K.Stand, M. Rudnicki. (2007). Regularization Parameter Selection In Discrete III–Posed Problems-The Use of The U– Curve, Int. J. Appl. Math. Comput. Sci., 2007, Vol. 17, No. 2, 157–164. DOI: 10.2478/v10006-007-0014-3.
- [8] L.Rudin, S. Osher and E. Fatemi. (1992). Nonlinear Total Variation based noise removal algorithms, Physica D., 60:259– 268, 1992.
- [9] A.Chambolle. (2004). An algorithm for Total Variation minimization applications, Journal of Mathematical Imaging and Vision, 20:89–97.
- [10] Xilinx, (January 2002), Virtex II Pro Platform FPGA Handbook, Ver 1.0 (available at www.xilinx.com).
- [11] J.Viejo, M.J. Bellido. (2006). Efficient Design and Implementation on FPGA of a MicroBlaze Peripheral for Processing Direct Electrical Networks Measurements, 1- 4244 – 0777-X/06.IEEE.2006.
- [12] H.Lim, K. You, W. Sung. (2006). Design and Implementation of Speech Recognition on a Softcore Based FPGA, 1-4244-0469-X/06/IEEE, School of Electrical Engineering, Seoul National University, Korea.2006.
- [13] D.Pellerin, S. Thibault. (2007). Practical FPGA Programming in C, Prentice Hall PTR, ISBN: 0-13- 154318-0. Pub date: 2007.
- [14] S. Saadi, M. Touiza and A. Guessoum. (2007). Embedded System for Image Encoder/Decoder Based on Fast Wavelet MALLAT's Algorithm: Application to Nuclear Imaging, IEEE, 4th International Conference on Computer Integrated Manufacturing CIP'2007, 3-4 November 2007, Setif, Algeria.

#### Acknowledgement

We would like to express our frank thanks to members of the neutron radiography and tomography imaging laboratory, division of techniques and applications around the Algerian Es-Salem Research Reactor for their kindness and for providing us with working materials. This work was also supported by the Ministry of Higher Education & Scientific Research (Algeria) under Project Number: J020282009005.

**Authors**: Slami SAADI, PhD student, Department of electronics, university saad Dahlab of Blida, Algeria,E-mail: <u>saadisdz@yahoo.fr</u>; prof.Abderrezak GUESSOUM, Department of electronics, university saad Dahlab of Blida, Algeria, E-mail: guessouma@hotmail.fr ; prof. Maamar BETTAYEB, Department of Electrical and Computer Engineering Sharjah university, UAE.Email: maamar@sharjah.ac.ae.