## Szymon SZCZĘSNY, Andrzej HANDKIEWICZ, Mariusz NAUMOWICZ, Michał MELOSIK

Faculty of Computing, Poznań University of Technology

doi:10.15199/48.2015.09.48

# **FPAA Accelerator for Machine Vision systems**

Abstract. This article proposes an FPAA-type programmable accelerator for image preprocessing. The structure of the accelerator is modelled on CPLD digital circuits. The innovation lies in using the current mode, which makes it possible to implement the accelerator in nanometre technologies. Another original solution proposed in the work is a reconfigurable multi-output current mirror. The article describes the hardware layer and a method for programming it. An implementation of an RGB-to-YCrCb colour space converter is presented, together with physical parameters obtained in post-layout simulations. The solution can be used as a standalone programmable circuit or as an IP-core of a larger analogue-digital system.

Streszczenie. W artykule przedstawiono propozycję programowalnego akceleratora typu FPAA do wstępnej obróbki obrazu. Struktura akceleratora wzorowana jest na cyfrowych układach CPLD. Innowacyjność polega na wykorzystaniu trybu prądowego, co umożliwia realizację akceleratora w technologiach nanometrowych. Kolejnym oryginalnym rozwiązaniem zaproponowanym w pracy jest rekonfigurowalne wielowyjściowe zwierciadło prądowe. W artykule omówiono warstwę sprzętową oraz metodę jej programowania. Zaprezentowano implementację konwertera przestrzeni barw RGB do YCrCb w akceleratorze i przedstawiono parametry fizyczne uzyskane w symulacjach post-layoutowych. Rozwiązanie może być wykorzystane jako samodzielny układ programowalny lub IP-core większego systemu analogowo-cyfrowego. (Akcelerator FPAA dla systemów wizyjnych).

Keywords: reconfigurable circuit, colour space converter, hardware acceleration, vision system, FPAA, RGB, YCrCb.

Słowa kluczowe: układ rekonfigurowalny, konwerter przestrzeni barw, akceleracja sprzętowa, system wizyjny, FPAA, RGB, YCrCb.

#### Introduction

The market of digital circuits was revolutionised in the 1980s by the introduction of FPGA-type reconfigurable circuits. Implementations of their analogue counterparts, the so-called FPAAs, working in the voltage mode are now emerging in the literature [1-5]. However, because of the miniaturisation trend in electronics, the existing FPAA architectures cannot be implemented in modern nanometre technologies, which prevents their application in larger analogue-digital systems such as vision systems. The authors therefore propose an FPAA architecture working in the current mode and implementable in a standard nanometre CMOS technology. It is worth mentioning that current-mode techniques such as SI (switched-current) are dedicated mainly to image processing tasks because of the accuracy level they offer. Their common feature is low power consumption, which makes it possible to adapt SI processors to standalone small-sized systems.



Fig. 1. Image sensor based on the FPAA accelerator

Fig. 1 shows how the accelerator can be used to build a sensor based on a photosensitive matrix. The accelerator proposed in this article can implement a variety of algorithms for processing the image obtained from the matrix. Thanks to quick programming, it can perform a number of tasks on the image buffered in the analogue memory [6]. Section 2 describes the structure of the basic reprogrammable module, a reconfigurable current mirror. Section 3 explains the architecture of an accelerator consisting of cells working in the current mode and connected via programmable routing. Section 4 presents an example of an RGB-to-YCrCb colour space converter implemented in the accelerator.

#### Reconfigurable current mirror

The basic reconfigurable element of the accelerator is a multi-output programmable current mirror presented in fig. 2.



Fig. 2. Programmable current mirror

It has an analogue-digital interface. The analogue input *in* is copied by a *k*-output mirror CMk with unit scaling factors. Next, the processing line features rows consisting of pairs of D/A converters, whose architecture is based on *n*-output current mirrors CMn controlled via *n*-bit code words. The structure of such a converter was presented in [7]. A pair of converters is configured via a 2*n*-bit word, and the signal *out<sub>i</sub>* at the output of a given pair has the value $a_i \cdot b_i \cdot in_i$, where $in_i$ is the *i*-th copy of the input current and $a_i$ and $b_i$ are the overall scaling factors of the CMn converters. Transistor sizes in the D/A converters were calculated with the Hooke-Jeeves method [8] so as to satisfy equation (1), which defines the range of acceptable values of the coefficient.

(1) 
$$\alpha_i = a_i \cdot b_i, \quad \alpha_i \in \{0.02, ..., 4.2\}$$

It is worth mentioning that, thanks to the odd number of mirrors in the signal processing line (counted from the analogue interface), the circuit from fig. 2 behaves as a single current mirror. The 2*kn* bits constitute the digital interface; each 2*n*-bit word determines the scaling factor of one of the *k* outputs. It is also worth mentioning that a chosen scaling factor $\alpha_i$ can be implemented with a minimum of two combinations of the coefficients $a_i$ and $b_i$ (excluding the maximum scaling factor, which is obtained only when both $a_i$ and $b_i$ take their maximum values). The number of unique scaling factors which can be implemented in a reconfigurable mirror equals:

(2) 
$$2^{2n} - \sum_{i=1}^{2^n - 1} i$$

The remaining coefficients are redundant. Redundancy is not a disadvantage but a positive feature of this implementation, because it makes it possible to choose the bit combination of the *B* vector which reproduces a given scaling factor with the lowest error. A bit vector is chosen on the basis of a previously generated grid of redundant solutions. The grid is calculated from a post-layout simulation of a mirror loaded with a pair of complementary diode-connected transistors, for all combinations of the *B* bit vector, for positive currents *outp* as well as negative currents *outm* at the output of the mirror. Based on the grid, the assignment of scaling factors to individual bit combinations is defined as:

(3) 
$$B_j \leftarrow \frac{outp_j - outm_j}{2 \cdot |in|}, \quad j \in \{1, 2, 3, ..., 2^{2n}\}$$

where *in* is the input current in both cases. The selection criterion for a given $B_j$ vector is the lowest reproduction error of the scaling factor. Table 1 presents post-layout parameters of an 8-output implementation of the mirror from fig. 2. Any scaling factor from range (1) can be implemented with an error less than or equal to 2.24%.
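The selection step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' tool: the simulated currents are made-up placeholders standing in for post-layout results, the bit vectors are written as strings for readability, and the names `measured_factor` and `select_code` are hypothetical.

```python
def measured_factor(outp, outm, i_in):
    """Scaling factor assigned to one bit vector, following eq. (3)."""
    return (outp - outm) / (2 * abs(i_in))


def select_code(grid, target):
    """Return the bit vector whose measured factor is closest to `target`.

    `grid` maps a bit vector (here a string) to its measured scaling factor.
    """
    return min(grid, key=lambda code: abs(grid[code] - target))


# Hypothetical simulation results for |in| = 1 uA: (outp, outm) in uA.
# A real grid would hold all 2**(2n) combinations of the B vector.
simulated = {
    "000011_000100": (0.492, -0.498),
    "000100_000011": (0.503, -0.501),
    "000110_000010": (0.511, -0.515),
}
grid = {code: measured_factor(p, m, 1.0) for code, (p, m) in simulated.items()}

target = 0.5
best = select_code(grid, target)
error_pct = abs(grid[best] - target) / target * 100
print(best, round(grid[best], 4), round(error_pct, 2))
```

With redundant combinations in the grid, the search simply keeps the candidate with the smallest deviation from the requested factor, which is the criterion stated above.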

Table 1. Parameters of a reconfigurable current mirror

|              | Parameter                            | Value                 |
|--------------|--------------------------------------|-----------------------|
| Architecture | Technology / Power supply            | 180 nm / 1.8 V        |
|              | Number of outputs (k)                | 8                     |
|              | Number of programming bits (2kn)     | 96                    |
|              | Number of unique scaling factors     | 2080                  |
| Layout       | Surface area                         | 21500 µm<sup>2</sup>  |
|              | Power consumption                    | 2.29 mW               |
|              | Max. frequency                       | 12 MS/s               |
|              | Range of input and output signals    | 0 - 10 µA             |
|              | Range of the scaling factor          | 0.02 - 4.1986         |
|              | Biggest factor error                 | 2.24%                 |
|              | Average factor error                 | 0.0657%               |
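The architecture entries of Table 1 can be cross-checked against expression (2). The sketch below assumes *n* = 6 bits per D/A converter, a value inferred from 2*kn* = 96 with *k* = 8 rather than stated explicitly in the text; the function names are illustrative only.

```python
def unique_scaling_factors(n):
    """Expression (2): 2**(2n) minus the sum of i for i = 1 .. 2**n - 1."""
    return 2 ** (2 * n) - sum(range(1, 2 ** n))


def count_unordered_pairs(n):
    """Brute-force count of code pairs {a, b} when the order of a and b is ignored."""
    codes = range(2 ** n)
    return len({(min(a, b), max(a, b)) for a in codes for b in codes})


k = 8                              # number of mirror outputs (Table 1)
programming_bits = 96              # 2kn (Table 1)
n = programming_bits // (2 * k)    # inferred word length of one D/A converter

print(n, unique_scaling_factors(n), count_unordered_pairs(n))
```

For *n* = 6 both functions return 2080, the number of unique scaling factors listed in Table 1, which suggests that expression (2) counts pairs of code words regardless of their order.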

#### Programmable array



Fig. 3. Layout of a programmable array

The architecture of the proposed FPAA accelerator is modelled on CPLD digital circuits. The simplified layout of the accelerator is presented in fig. 3. The central point of the circuit is the so-called global interconnection matrix. This block contains digitally programmable nodes $v_i$, to which input and output signals of current-mode cells, as well as input and output ports, can be connected via switches (keys). The programmable switches are implemented with pairs of complementary CMOS transistors. The accelerator works as a balanced structure. In order to ensure a symmetrical response of the cells attached to the routing block, the Common Mode Rejection Ratio (CMRR) [9] circuits presented in [10] are used. They constitute an intermediate stage in transferring data between the interconnection block and the current cells. The symmetry of operation is also ensured by choosing the nodes of the non-inverted response *p* and the inverted response *m* according to rule (4):

(4) 
$$\{p_i, m_i\} \leftarrow \{v_{\frac{n}{2}+1-i}, v_{\frac{n}{2}+i}\}, \quad i \in \{1, 2, 3, ..., \tfrac{n}{2}\}$$

where *n* is the number of nodes available in the accelerator. The complete circuit can be implemented as a programmable integrated circuit containing the reconfigurable mirror cells described in section 2, integrators, memories and other current-mode cells, or as a so-called IP-core forming part of the layout of a larger analogue-digital system. The next section presents the use of the circuit as an IP-core consisting of three pairs of reconfigurable mirrors.
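The node assignment of rule (4) can be illustrated with a short Python sketch. It assumes that the index *i* runs from 1 to *n*/2, so that both node indices stay within 1..*n*; the function name `balanced_node_pairs` is hypothetical, and the value of 32 nodes is taken from the IP-core shown in fig. 5.

```python
def balanced_node_pairs(n):
    """Pair routing nodes v_1 .. v_n symmetrically about the centre, as in rule (4).

    Returns a list of (p, m) node indices, 1-based. The index i is assumed to
    run from 1 to n/2 so that both indices stay inside 1 .. n.
    """
    if n % 2:
        raise ValueError("a balanced structure needs an even number of nodes")
    half = n // 2
    return [(half + 1 - i, half + i) for i in range(1, half + 1)]


pairs = balanced_node_pairs(32)   # 32 nodes, as in the IP-core of fig. 5
print(pairs[0], pairs[-1])
```

Under this reading the first pair occupies the two central nodes and each following pair moves one node outwards on both sides, so every *p* node mirrors its *m* node about the centre of the routing block.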

#### RGB to YCrCb converter

A photosensitive matrix provides signals in analogue form, so it is cost-effective to implement image preprocessing with analogue circuits [11], which reduces the complexity of the vision system [12, 13]. One of the basic operations performed on an image from a photosensitive matrix is its conversion to the YCrCb space, in which it is described by luminance and chrominance components [6]. The classic RGB representation does not reflect the way the human eye perceives an image: the eye is more sensitive to changes of brightness than of colour. Using the classic representation is therefore inefficient because of the redundancy of information [14]. Conversion to the YCrCb space offers wider possibilities in image processing, since the image can be represented with high-resolution luminance data and compressed chrominance data. The transformation of the RGB space into the YCrCb space is expressed by (5):

(5) 
$$\begin{bmatrix} \pm Y \\ \pm Cr \\ \pm Cb \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.5 & -0.419 & 0.081 \\ -0.169 & -0.331 & 0.5 \end{bmatrix} \begin{bmatrix} \mp R \\ \mp G \\ \mp B \end{bmatrix}$$

The above transformation is implemented in the current-mode technique using three-output current mirrors with scaling factors corresponding to the columns of the coefficient matrix. Additions are performed in the nodes according to Kirchhoff's current law. Fig. 4 presents the connections of the mirrors working in the balanced structure and implementing the described transformation. Negative coefficient values are obtained by symmetrically swapping the input nodes. Fig. 5 presents the layout of a sample IP-core with the architecture described in section 3. Table 2 compares the responses of the circuit with theoretical values for a sample input combination. The IP-core power consumption equals 13.86 mW. The operating speed of the circuit equals 4 MS/s.







Fig. 5. Layout of an IP-core for the RGB-to-YCrCb converter consisting of 3 pairs of 8-output current mirrors, routing with 32 nodes and a CMRR core in the centre

Table 2. RGB-to-YCrCb converter response

| Component | YCrCb from the transformation equation [µA] | YCrCb from post-layout simulation [µA] |
|-----------|---------------------------------------------|----------------------------------------|
| Y         | 2                                           | 1.943                                  |
| Cr        | 0.324                                       | 0.315                                  |
| Cb        | 0                                           | 0.041                                  |
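The column-wise mirror implementation and the summation in the nodes can be sketched numerically in Python. This is a behavioural illustration only: the balanced *p*/*m* signalling is collapsed into signed numbers, the coefficients are used exactly as printed in the transformation matrix, and the input R = G = B = 2 µA is an assumption (the source does not state the sample input) chosen because it reproduces the theoretical values 2, 0.324 and 0 µA listed in Table 2.

```python
# Coefficient matrix as printed in the text (rows: Y, Cr, Cb; columns: R, G, B).
M = [
    [0.299, 0.587, 0.114],
    [0.5, -0.419, 0.081],
    [-0.169, -0.331, 0.5],
]


def mirror_outputs(column, current):
    """One three-output mirror: scale a single input current by one matrix column."""
    return [M[row][column] * current for row in range(3)]


def rgb_to_ycrcb(r, g, b):
    """Sum the mirror outputs in the Y, Cr and Cb nodes (Kirchhoff's current law)."""
    per_input = [mirror_outputs(col, cur) for col, cur in enumerate((r, g, b))]
    return [sum(outputs[row] for outputs in per_input) for row in range(3)]


# Assumed test input in uA: R = G = B = 2 (not stated in the source).
y, cr, cb = rgb_to_ycrcb(2.0, 2.0, 2.0)
print(round(y, 3), round(cr, 3), round(cb, 3))
```

Each call to `mirror_outputs` plays the role of one three-output mirror driven by a single colour channel, and the sums in `rgb_to_ycrcb` correspond to the currents meeting in the Y, Cr and Cb nodes.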

#### Summary

The article proposes a hardware programmable accelerator for vision sensors. It is an answer to the problem of adapting FPAA-type solutions to modern nanometre technologies. The structure of the circuit is easy to modify, so it can be used as a standalone integrated circuit as well as part of the layout of a larger system. The described implementation of a colour space converter in the accelerator shows the possibilities of its application in sensor techniques. The image conversion time equals 19.2 ms for the QVGA standard, 38.4 ms for HVGA and 76.8 ms for VGA.
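The quoted conversion times follow from the 4 MS/s operating speed of the IP-core if one pixel is converted per sample. The Python sketch below makes that arithmetic explicit; the one-pixel-per-sample assumption and the frame sizes (the usual 320x240, 480x320 and 640x480 definitions of QVGA, HVGA and VGA) are not spelled out in the text.

```python
SAMPLE_RATE = 4e6  # samples per second, the IP-core speed reported in the text

# Commonly used frame sizes in pixels (width, height).
FORMATS = {"QVGA": (320, 240), "HVGA": (480, 320), "VGA": (640, 480)}


def conversion_time_ms(width, height, rate=SAMPLE_RATE):
    """Frame conversion time, assuming one pixel is converted per sample."""
    return width * height / rate * 1e3


for name, (w, h) in FORMATS.items():
    print(name, round(conversion_time_ms(w, h), 1), "ms")
```

Under these assumptions the computed times match the 19.2 ms, 38.4 ms and 76.8 ms given above.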

#### REFERENCES

- [1] Brink S., Hasler J., Wunderlich R., Adaptive Floating-Gate Circuit Enabled Large-Scale FPAA, *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 22, issue 11, 2307-2315, 2014
- [2] Schlottman C.R., Petre C., Hasler P.E., A High-Level Simulink-Based Tool for FPAA Configuration, *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 20, issue 1, 10-18, 2012

- [3] Nease S., George S., Hasler P., Koziol S., Brink S., Modeling and Implementation of Voltage-Mode CMOS Dendrites on a Reconfigurable Analog Platform, *IEEE Transactions on Biomedical Circuits and Systems*, Volume: 6, Issue: 1, 76-84, 2012
- [4] Pankiewicz B., Wojcikowski M., Szczepanski S., and Yichuang S., A Field Programmable Analog Array for CMOS Continuous-Time OTA-C Filter Applications, *Journal of Solid-State Circuits*, vol. 37, no 2, 2002
- [5] Kutuk H. and Kang S.M., A field-programmable analog array (FPAA) using switched-capacitor technique, in *Proc. IEEE Int. Symp. Circuits and Systems*, vol. 4, 41-43, May 1996
- [6] Handkiewicz A., Mixed-Signal Systems: A Guide to CMOS Circuit Design, John Wiley and Sons, 2002
- [7] Naumowicz M., Szczęsny Sz., Handkiewicz A., 6-bitowy przetwornik C/A małej mocy w technice przełączanych prądów, *Elektronika: konstrukcje, technologie, zastosowania*, 134-136, nr 9, 2013
- [8] Hooke R., Jeeves T.A., "Direct search" solution of numerical and statistical problems, *J. Assoc. Comput. Mach.*, 8(2), 212-229, 1961
- [9] Giustolisi G., Palmisano G., Palumbo G., CMRR Frequency Response of CMOS Operational Transconductance Amplifiers, *IEEE Transactions on Instrumentation and Measurement*, vol. 49, no. 1, 2000
- [10] Śniatała P., Handkiewicz A., Naumowicz M., Szczęsny Sz., Melosik M., Katarzyński P., Kropidłowski M., Switched Current Sigma-Delta Modulator with a New Comparator Structure Designed Based on VHDL-AMS Description, *International Journal of Electronics and Telecommunications*, 391-396, vol. 59, issue 4, 2013

- [11] Vittoz E., Analogue VLSI signal processing: Why, where and how, Analog Integr. Circ. S. 6, 27-44, 1994
- [12] Ahirwal B., Khadtare M., and Mehta R., FPGA based system for colour space transformation RGB to YIQ and YCbCr, *Proc. Int. Conf. on Intelligent and Advanced Systems*, 1345-1349, Kuala Lumpur, 2007
- [13] Li S.-A., Chen C.-Y., Chen C.-H., Design of a shift-and-add based hardware accelerator for colour space conversion, *J. Real-Time Image Proc.*, http://dx.doi.org/10.1007/s11554-013-0324-7, 1999
- [14] Kimo K. and In-Cheol P., Combined image signal processing for CMOS image sensors, *IEEE Symp. on Circuits and Systems*, 3185-3188, Island of Kos, 2006

Authors: Ph. D. Szymon Szczęsny, Poznań University of Technology, Faculty of Computing, Piotrowo 3A, 60-965 Poznań, E-mail: szymon.szczesny@put.poznan.pl; prof. Andrzej Handkiewicz (Ph. D.), Poznań University of Technology, Faculty of Computing, Piotrowo 3A, 60-965 Poznań, E-mail: Andrzej.Handkiewicz@put.poznan.pl; M. Sc. Mariusz Naumowicz, Poznań University of Technology, Faculty of Computing, Piotrowo 3A, 60-965 Poznań; M. Sc. Michał Melosik, Poznań University of Technology, Faculty of Computing, Piotrowo 3A, 60-965 Poznań, E-mail: michal.melosik@put.poznan.pl.