# HARDWARE IMPLEMENTATION OF A LOW COST MATH MODULE USING MULTIFUNCTIONAL REGISTERS WITH DECODED MODE INPUTS 

Grigore Mihai TIMIS*, Alexandru VALACHI*<br>*Technical University "Gh. Asachi" Iasi, Faculty of Automatic Control and Computer Engineering (e-mail: mtimis@tuiasi.ro, avalachi@tuiasi.ro).


#### Abstract

In this paper we propose a low-cost algorithm for a math module and its implementation using multifunctional registers with decoded mode inputs. The proposed algorithm is implemented using the transition matrix method. Within the taxonomy of algorithms, we use functional iteration, which the literature reports as providing the lowest latency and the greatest reliability. Compared with the CORDIC math module, which is based on a hardware iteration algorithm with the design implemented in an FPGA (more expensive and slower than dedicated hardware), the proposed math module algorithm uses less hardware, so the chip area is minimized while working at a high speed. It is also shown that the implementation of the digital automaton can be reduced to a combinational one, which leads to an economical implementation.


Keywords: math module algorithm, automaton execution elements, multifunctional registers, finite state machine.

## 1. INTRODUCTION

In today's devices, Floating Point Units (FPUs), which handle huge amounts of data, are commonly used at maximum capacity. For this reason, we propose an optimal low-cost method for the hardware implementation of a math module operating on two unsigned integer numbers. This represents a small part of the FPU's logic. Such mathematical computations can be found inside the logic core architectures of Intel 80x86, AMD 80x86, Sun, SPARC and IBM processors.

The FPU's calculations must always be performed quickly and precisely, in order to avoid system crashes and errors in state-of-the-art computer technology.

The proposed algorithm is based on functional iterations, using only simple counting and decrementing operations. According to the literature (Premys et al., 2011), (Yi-Jun et al., 2011), (Ranjan Kumar et al., 2016), (Mihailov et al., 2011), (Morris et al., 2006), (Morris et al., 2005), (Sklyarov et al., 2005), iteration algorithms are suitable for low-cost implementations where the chip area must be minimized.

It is well known that even when an optimal algorithm is used, the resulting synthesis can be non-optimal; that is why the pipelined combinational logic will be optimised.

The paper is organised as follows: Section 2 details the binary matrix operations; Section 3 describes the functional representation of the state diagram based on the transition matrix method; Section 4 outlines the global synthesis of the math module, the experimental results and the implementation costs; these are followed by the future work and, in Section 5, the final remarks.

## 2. BINARY MATRICES OPERATIONS

We define the transition matrix, denoted $T_{r\_\text{matrix}}=\left(t_{i j}\right)_{M \times M}$, which is built from the state machine diagram, where $M$ represents the number of machine states.

Pure binary coding of the distinct operations is used, as in (Valachi et al., 2010), (Morris et al., 2006). To obtain the transition matrix coefficients, the following operations are considered:

- Count Up, $t_{i+1, i}=1$: transition from state $i$ to state $i+1$
- Count Down, $t_{i-1, i}=1$: transition from state $i$ to state $i-1$
- Hold, $t_{i i}=1$: the automaton remains in state $i$
- Shift, $t_{i j}=1$ if $i=2 \cdot j$, with $2 \cdot j \leq M$ (Shift Left), or if $i=\left[\frac{j}{2}\right]$ (Shift Right)
- Reset, $t_{1 j}=1$, $j \neq 1$: transition from state $j$ to the initial state $i=1$
- Parallel Load, $t_{i j}=1$, $i \notin\left\{1, j+1, j-1, 2 \cdot j,\left[\frac{j}{2}\right], j\right\}$: transition from state $j$ to state $i$

Here $[x]$ denotes the integer part of $x$.
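The operation definitions above can be sketched as a small classifier. This is only an illustration: the function name `classify_transition` and the string labels are assumptions introduced here, and the sketch returns every operation whose index condition matches, since some conditions overlap (for example, the transition from state 2 to state 1 satisfies Count Down, Shift Right and Reset).

```python
def classify_transition(i: int, j: int, m: int) -> set:
    """Return the operations able to realise the transition j -> i (t_ij = 1).

    States are numbered 1..m, as in the list above. The labels are
    illustrative names, not identifiers taken from the paper.
    """
    if i not in range(1, m + 1) or j not in range(1, m + 1):
        raise ValueError("state index out of range")
    ops = set()
    if i == j + 1:
        ops.add("count_up")
    if i == j - 1:
        ops.add("count_down")
    if i == j:
        ops.add("hold")
    if i == 2 * j:          # 2*j cannot exceed m because i is already in range
        ops.add("shift_left")
    if i == j // 2:         # integer part of j/2
        ops.add("shift_right")
    if i == 1 and j != 1:
        ops.add("reset")
    if not ops:             # none of the special cases: a parallel load is needed
        ops.add("parallel_load")
    return ops
```

When several operations match, the priority order imposed on the register decides which command is actually used.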

Considering a binary matrix $C=\left(c_{i j}\right)_{p \times p}$, we denote by $\bar{C}=\left(\overline{c_{i j}}\right)_{p \times p}$ the complement of that matrix.

For example:

$$
\text { (1) } C=\left[\begin{array}{ll}
1 & b \\
c & 0
\end{array}\right], \bar{C}=\left[\begin{array}{ll}
0 & \bar{b} \\
\bar{c} & 1
\end{array}\right]
$$

For the logic multiplication of two binary matrices of the same dimensions, consider the matrices $B=\left(b_{i j}\right)_{m \times n}$ and $C=\left(c_{i j}\right)_{m \times n}$. The result of the logic multiplication is denoted $R=B \cdot C=\left(b_{i j} \cdot c_{i j}\right)_{m \times n}$.

For example:

$$
B=\left[\begin{array}{ll}z & 1 \\ d & 0\end{array}\right], \quad C=\left[\begin{array}{ll}x & s \\ 1 & e\end{array}\right], \quad B \cdot C=\left[\begin{array}{cc}z \cdot x & s \\ d & 0\end{array}\right]
$$
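The element-wise logic multiplication can be checked with a minimal sketch. The helper names `logic_multiply` and `example_holds` are assumptions made for this illustration; the symbolic entries $z, d, x, s, e$ are handled by trying every 0/1 assignment.

```python
from itertools import product

def logic_multiply(b, c):
    """Element-wise AND of two 0/1 matrices with identical dimensions."""
    if len(b) != len(c) or any(len(rb) != len(rc) for rb, rc in zip(b, c)):
        raise ValueError("matrices must have the same dimensions")
    return [[bij & cij for bij, cij in zip(rb, rc)] for rb, rc in zip(b, c)]

def example_holds():
    """Check the 2x2 example above for every assignment of the variables."""
    for z, d, x, s, e in product((0, 1), repeat=5):
        B = [[z, 1], [d, 0]]
        C = [[x, s], [1, e]]
        expected = [[z & x, s], [d, 0]]
        if logic_multiply(B, C) != expected:
            return False
    return True
```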

### 2.1. The two-matrices multiplication product algorithm

For the product of two matrices, let us consider $S=\left(s_{i j}\right)_{m \times p}$ and $L=\left(l_{i j}\right)_{p \times n}$.

Their product defines the matrix $W=S \otimes L=\left(w_{i j}\right)_{m \times n}$, where $w_{i j}=\sum_{k=1}^{p} s_{i k} \cdot l_{k j}$ and $\Sigma$ represents the logic sum (OR).

For example:

$$
\begin{gathered}
\text { (2) } S=\left[\begin{array}{lll}
x & 1 & a \\
0 & b & 1
\end{array}\right], \quad L=\left[\begin{array}{ll}
1 & 1 \\
a & u \\
c & 0
\end{array}\right] \\
W=\left[\begin{array}{cc}
x+a+a \cdot c & x+u+0 \\
0+a \cdot b+c & 0+b \cdot u+0
\end{array}\right]=\left[\begin{array}{cc}
x+a & x+u \\
a \cdot b+c & b \cdot u
\end{array}\right]
\end{gathered}
$$
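As a sanity check of example (2), the logic product can be sketched as an OR of ANDs and compared with the simplified result for every assignment of the variables. The names `logic_product` and `example_2_holds` are assumptions introduced for this sketch.

```python
from itertools import product

def logic_product(s, l):
    """Boolean matrix product: w[i][j] = OR over k of (s[i][k] AND l[k][j])."""
    p = len(l)
    if any(len(row) != p for row in s):
        raise ValueError("inner dimensions must agree")
    n = len(l[0])
    return [[int(any(row[k] & l[k][j] for k in range(p))) for j in range(n)]
            for row in s]

def example_2_holds():
    """Compare S (x) L with the simplified matrix W of example (2)."""
    for x, a, b, c, u in product((0, 1), repeat=5):
        S = [[x, 1, a], [0, b, 1]]
        L = [[1, 1], [a, u], [c, 0]]
        expected = [[x | a, x | u], [(a & b) | c, b & u]]
        if logic_product(S, L) != expected:
            return False
    return True
```

The simplification $x+a+a \cdot c=x+a$ used in (2) is the absorption law, which the exhaustive check confirms.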

## 3. THE STATE DIAGRAM FUNCTIONAL REPRESENTATION BASED ON THE TRANSITION MATRIX METHOD

Consider a digital automaton that computes the math module for two unsigned integer numbers, $x=x[7: 0]$ and $y=y[7: 0]$, where $x$ represents the first operand and $y$ the second one. The processing result is $Q=Q[7: 0]$.

The following assignments were used: $xNull=(x=0)$ and $yNull=(y=0)$.

This algorithm can be used with good results for any operand size: 8, 16 or 32 bits.

The logical description of the iteration algorithm is given in Fig. 1.

Algorithm description - math module $|x-y|$:

- Step 1: Load $x=x[7: 0]$, $y=y[7: 0]$;
- Step 2: If $x$ is not zero ($xNull=0$) and $y$ is not zero ($yNull=0$), go to Step 3; otherwise go to Step 5;
- Step 3: $x=x-1$, $y=y-1$;
- Step 4: If $x$ is zero or $y$ is zero, go to Step 5; otherwise repeat Step 3;
- Step 5: If $x$ is zero, the result is stored in $y$;
- Step 6: If $y$ is zero, the result is stored in $x$;
- Step 7: Read the result;
- Step 8: Stop the algorithm.

Fig. 1. Logical description of the math module algorithm
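The steps of Fig. 1 can be sketched in software as follows. This is a behavioural illustration only, not the hardware design; the function name `abs_diff_by_decrement` and the `width` parameter are assumptions made for the sketch.

```python
def abs_diff_by_decrement(x: int, y: int, width: int = 8) -> int:
    """Compute |x - y| for unsigned operands using only decrement and zero tests."""
    limit = 2 ** width - 1
    if x not in range(limit + 1) or y not in range(limit + 1):
        raise ValueError("operands must fit in the given width")
    # Steps 2-4: decrement both operands in parallel until one reaches zero.
    while x != 0 and y != 0:
        x -= 1
        y -= 1
    # Steps 5-6: the operand that is still non-zero holds the result.
    return y if x == 0 else x
```

For instance, `abs_diff_by_decrement(200, 55)` returns 145 after 55 iterations: the loop runs as many times as the smaller operand, so the latency depends on the operand values.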

### 3.1. Algorithm description

After loading the two operands $x=x[7: 0]$ and $y=y[7: 0]$, the algorithm starts. If $xNull$ and $yNull$ are not active, meaning that neither operand is zero, the values of $x$ and $y$ are decremented in parallel until one of them becomes zero. If $x$ becomes zero, the math module result is stored in $y$; if $y$ becomes zero, the result is stored in $x$.

Observation: the steps of the logical description of the math module algorithm (Fig. 1) do not coincide with the states of the functional organizational chart (Fig. 3), but the underlying algorithm is the same.

### 3.2. Description of the Automaton Execution Elements (EEA)

The logic circuits used are multifunctional registers with decoded mode signals, synchronized with a clock signal. The imposed priority levels are: Reset has the highest priority level and increment/decrement has the lowest. The input command signals $\overline{LDx}$, $\overline{LDy}$ are asynchronous and have a higher priority than the other signals.

$R_{x}$ register functionalities: asynchronous store of the $x=x[7: 0]$ operand, load of data $D[7: 0]$, synchronous reset, and decrement on the positive edge of $h_{1}$. The signal $xNull=1$ shows that the content of the $R_{x}$ register is null.

$R_{y}$ register functionalities: asynchronous store of the $y=y[7: 0]$ operand, load of data $D[7: 0]$, synchronous reset, and decrement on the positive edge of $h_{1}$. The signal $yNull=1$ shows that the content of the $R_{y}$ register is null.


Fig. 2. Automaton Execution Elements - EEA for the math module algorithm

The D flip-flop RDYP: at the end of the processing, it provides the RDYP signal.

The D flip-flop is reset by the hardware reset ($\overline{RES}=0$) or when the result is read ($\overline{OUTR}=0$).

The three-state control buffer (TSC) BUFOUT: it allows the transfer of the result using two consecutive READ operations.

The ST start signal: it is set only after the load command for the two data operands has been received. It is deactivated on an asynchronous hardware reset or after the final result has been read.

Considering the code $y_{n}=\left\{y_{2} y_{1} y_{0}\right\}_{n}$ attached to the states:

$$
y_{n}: \quad 000 \leftrightarrow s_{0}, \quad 001 \leftrightarrow s_{1}, \quad 010 \leftrightarrow s_{2}, \quad 011 \leftrightarrow s_{3}, \quad 100 \leftrightarrow s_{4}, \quad 101 \leftrightarrow s_{5}
$$

The code matrix, denoted $C$, is given in (3):

$$
\text { (3) } C=\left[\begin{array}{llllll}0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1\end{array}\right]
$$

$D_{i}$ represents the data input of rank $i$ of the current-state code register and is used to obtain the column vector [2], Fig. 4.

The module algorithm is described by the functional organizational chart in Fig. 3.

Fig. 3. Functional organizational chart

The next-state code is given by

$$
\left[\begin{array}{l}y_{0} \\ y_{1} \\ y_{2}\end{array}\right]_{n+1}=\left[\begin{array}{l}D_{0} \\ D_{1} \\ D_{2}\end{array}\right]=(C \otimes T) \otimes S
$$

where $S$ represents the column vector $S=\left[\begin{array}{c}s_{0} \\ s_{1} \\ \vdots \\ s_{5}\end{array}\right]$.

From the functional organizational chart, the transition matrix (4) is deduced:

$$
\text { (4) } T=\left[\begin{array}{cccccc}\overline{S T} & 0 & 0 & 0 & 0 & \overline{S T} \\ S T & 0 & 1 & 0 & 0 & 0 \\ 0 & \overline{(x=0)+(y=0)} & 0 & 0 & 0 & 0 \\ 0 & (x=0)+(y=0) & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & S T\end{array}\right]
$$

After a series of successive computations, relations (5) are obtained:

$$
\begin{aligned}
& \text { (5) } C \otimes T=\left[\begin{array}{cccccc}
S T & (x=0)+(y=0) & 1 & 0 & 1 & S T \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & S T
\end{array}\right] \\
& {\left[\begin{array}{l}
y_{0} \\
y_{1} \\
y_{2}
\end{array}\right]=(C \otimes T) \otimes S=} \\
& =\left[\begin{array}{c}
S_{0} \cdot S T+S_{1} \cdot[(x=0)+(y=0)]+S_{2}+S_{4}+S_{5} \cdot S T \\
S_{1} \\
S_{3}+S_{4}+S_{5} \cdot S T
\end{array}\right]
\end{aligned}
$$
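The product $C \otimes T$ in (5) can be reproduced numerically. The sketch below is an illustration under stated assumptions: the helper names are introduced here, `st` stands for $ST$ and `z` for the condition $(x=0)+(y=0)$, and the matrices are evaluated for each 0/1 value of these two inputs.

```python
from itertools import product

# Code matrix C from (3): column k holds the code (y0, y1, y2) of state s_k.
C = [[0, 1, 0, 1, 0, 1],
     [0, 0, 1, 1, 0, 0],
     [0, 0, 0, 0, 1, 1]]

def transition_matrix(st, z):
    """T from (4); column j is the current state, row i the next state."""
    nst, nz = 1 - st, 1 - z
    return [[nst, 0, 0, 0, 0, nst],
            [st, 0, 1, 0, 0, 0],
            [0, nz, 0, 0, 0, 0],
            [0, z, 0, 0, 0, 0],
            [0, 0, 0, 1, 0, 0],
            [0, 0, 0, 0, 1, st]]

def logic_product(a, b):
    """Boolean matrix product (OR of ANDs)."""
    return [[int(any(row[k] & b[k][j] for k in range(len(b))))
             for j in range(len(b[0]))] for row in a]

def relation_5_holds():
    """Check C (x) T against the matrix written in (5) for all inputs."""
    for st, z in product((0, 1), repeat=2):
        expected = [[st, z, 1, 0, 1, st],
                    [0, 1, 0, 0, 0, 0],
                    [0, 0, 0, 1, 1, st]]
        if logic_product(C, transition_matrix(st, z)) != expected:
            return False
    return True
```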

Relations (5) represent the SLC1 (combinational logic) equations. The architecture of the hardwired sequencer, implemented with coded sequences, is shown in Fig. 4. The command signal relations are given in equations (6):

$$
\begin{aligned}
\text { (6) } \quad D E C_{x} & =D E C_{y}=S_{2} \\
\overline{O U T_{y}} & =S_{3} \cdot(x=0) \\
\overline{O U T_{x}} & =S_{3} \cdot(\overline{x=0}) \\
\text { SETRDYP } & =s_{4}
\end{aligned}
$$



Fig. 4. Hardwired Sequencer

The implementation cost of SLC2 will not be taken into account, because this synthesis remains the same in every design implementation.

The first proposed implementation of the design from Fig. 4 is shown in Fig. 5.

The following relations (7) are deduced from (3) and (5):

$$
\text { (7) } \begin{aligned}
y_{0} & =S_{0} \cdot S T+S_{1} \cdot[(x=0)+(y=0)]+S_{2}+S_{4}+S_{5} \cdot S T \\
y_{1} & =S_{1} \\
y_{2} & =S_{3}+S_{4}+S_{5} \cdot S T
\end{aligned}
$$
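To illustrate how relations (6) and (7) drive the automaton, the following behavioural sketch steps through the states using the next-state equations. It is a simplified software model under assumptions made here: `ST` is held at 1 during the run, the result is captured in state $s_3$ according to (6), the run stops when $s_4$ (SETRDYP) is reached, and the function names are hypothetical.

```python
def next_state(state, st, zero):
    """Next state code from relations (7); zero stands for (x=0)+(y=0)."""
    s = [int(state == k) for k in range(6)]          # one-hot S0..S5
    y0 = (s[0] & st) | (s[1] & zero) | s[2] | s[4] | (s[5] & st)
    y1 = s[1]
    y2 = s[3] | s[4] | (s[5] & st)
    return 4 * y2 + 2 * y1 + y0

def run_automaton(x, y, max_cycles=1024):
    """Clock the automaton until SETRDYP (state s4) and return the result."""
    state, result = 0, None
    for _ in range(max_cycles):
        if state == 4:                               # SETRDYP = s4
            return result
        zero = int(x == 0 or y == 0)
        nxt = next_state(state, 1, zero)
        if state == 2:                               # DEC_x = DEC_y = S2
            x, y = x - 1, y - 1
        elif state == 3:                             # OUT_y if x = 0, else OUT_x
            result = y if x == 0 else x
        state = nxt
    raise RuntimeError("automaton did not finish")
```

In this model each decrement costs two clock cycles (states $s_1$ and $s_2$), so 8-bit operands finish well within the default `max_cycles` bound.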

The total cost of SLC1 is obtained by counting the logic gates weighted by their number of inputs. The implementation cost is calculated in equation (8):

$$
\text { (8) } \quad C_{1}\left(S L C_{1}\right)=C\left(D_{0}\right)+C\left(D_{1}\right)+C\left(D_{2}\right)=14+0+5=19
$$

## 4. MATH MODULE COMPLEX AUTOMATON SYNTHESIS USING MULTIFUNCTIONAL REGISTERS, IMPLEMENTATION COSTS AND EXPERIMENTAL RESULTS

The novelty of the proposed method is the separate synthesis of the digital logic that generates the reset signal $\overline{S R}$, the parallel load signal $\overline{P L}$, the increment signal $I N C$, the decrement signal $D E C$, etc.

Each logic function has a corresponding transition matrix: $T R, T P, T I$.

Relation involved:

$$
\text { (9) } \quad T(F)=I(F) \cdot T
$$

The operation identification matrix is denoted $I(F)$.

Priority order: Reset has high priority, $I N C / D E C$ low priority.

The validation matrices for the Reset, Parallel Load and Increment operations, denoted $I R, I P, I I$, are shown in (10), where the symbol $\Phi$ denotes a don't-care value (either logic 0 or logic 1).

$$
I R=\left[\begin{array}{llllll}
0 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right]
$$

$$
\begin{gathered}
I P=\left[\begin{array}{llllll}
0 & \Phi & \Phi & \Phi & \Phi & \Phi \\
0 & 0 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 0 & 0
\end{array}\right] \\
I I=\left[\begin{array}{llllll}
0 & \Phi & \Phi & \Phi & \Phi & \Phi \\
1 & 0 & \Phi & \Phi & \Phi & \Phi \\
\Phi & 1 & 0 & \Phi & \Phi & \Phi \\
\Phi & \Phi & 1 & 0 & \Phi & \Phi \\
\Phi & \Phi & \Phi & 1 & 0 & \Phi \\
\Phi & \Phi & \Phi & \Phi & 1 & 0
\end{array}\right]
\end{gathered}
$$



Fig. 5. Hardwired sequencer

- $I I$ is computed as: $t_{i i}=0$ for the hold state, $t_{i k}=1$ for $i=k+1$, and $t_{i j}=\Phi$ for the rest (the increment operation has the lowest priority).
- $I P$ is computed as: $t_{1 k}=\Phi$ for $k \neq 1$ (the reset operation has the highest priority), $t_{i i}=0$, and $t_{i k}=1$ for $i \neq k+1$.
- $I R$ is computed as: $t_{1 k}=1$ for $k \neq 1$ (transitions to the first state).

The specific transition matrices are shown in (11), (12), (13):
(11) $T R=I R \cdot T=\left[\begin{array}{cccccc}0 & 0 & 0 & 0 & 0 & \overline{S T} \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0\end{array}\right]$
(12) $T P=I P \cdot T=$

$$
\left[\begin{array}{cccccc}
0 & 0 & 0 & 0 & 0 & \overline{S T} \cdot \Phi \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & (x=0)+(y=0) & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right]
$$

(13) $T I=I(I N C) \cdot T=$

$$
=\left[\begin{array}{cccccc}
0 \cdot \overline{S T} & 0 & 0 & 0 & 0 & \overline{S T} \cdot \Phi \\
S T & 0 & 1 \cdot \Phi & 0 & 0 & 0 \\
0 & 1 \cdot[\overline{(x=0)+(y=0)}] & 0 & 0 & 0 & 0 \\
0 & {[(x=0)+(y=0)] \cdot \Phi} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0
\end{array}\right]
$$

For the $\overline{S R}, \overline{P L}, I N C$ relations, the row vector $[111111]$ is used.

Multiplying this row vector by $[T(F) \otimes S]$ gives the logic sum of the components of the column vector, which yields relations (14); in the $\overline{P L}$ expression the don't-care value is chosen as $\Phi=0$:

$$
\begin{aligned}
\text { (14) } \quad \overline{\mathrm{SR}} & =\overline{[111111] \otimes[\mathrm{TR} \otimes \mathrm{S}]}=\overline{s_{5} \cdot \overline{S T}} \\
\overline{\mathrm{PL}} & =\overline{[111111] \otimes[\mathrm{TP} \otimes \mathrm{S}]}= \\
& =\overline{S_{5} \cdot \overline{S T} \cdot \Phi+S_{2}+S_{1} \cdot[(x=0)+(y=0)]}= \\
& =\overline{S_{2}+S_{1} \cdot[(x=0)+(y=0)]} \\
\mathrm{INC} & =[111111] \otimes[\mathrm{TI} \otimes \mathrm{S}]
\end{aligned}
$$

For the $I N C$ function, the column vector $T I \otimes S$ is given in (15):

$$
\text { (15) } T I \otimes S=\left[\begin{array}{c}
S_{5} \cdot \overline{S T} \cdot \Phi \\
S T \cdot S_{0}+S_{2} \cdot \Phi \\
S_{1} \cdot[\overline{(x=0)+(y=0)}] \\
S_{1} \cdot[(x=0)+(y=0)] \cdot \Phi \\
S_{3} \\
S_{4}
\end{array}\right]
$$

The final $I N C$ equation:

$$
\begin{align*}
& \mathrm{INC}=[111111] \otimes[\mathrm{TI} \otimes \mathrm{~S}]=  \tag{16}\\
& =\mathrm{s}_{5} \cdot \overline{\mathrm{ST}} \cdot \Phi+\mathrm{s}_{0} \cdot S T+S_{2} \cdot \Phi+S_{1}+S_{3}+S_{4}
\end{align*}
$$

To obtain an optimal expression, the values of $\Phi$ were chosen conveniently, which gives equation (17):

$$
\text { (17) } \quad I N C=\mathrm{s}_{0} \cdot S T+S_{1}+S_{3}+S_{4}
$$

For the next transition sequence, the following equations (18) are deduced:

$$
\text { (18) } \quad\left[\begin{array}{l}d_{0} \\ d_{1} \\ d_{2}\end{array}\right]=[C \otimes T P] \otimes S, \quad\left[\begin{array}{l}\overline{d_{0}} \\ \overline{d_{1}} \\ \overline{d_{2}}\end{array}\right]=[\bar{C} \otimes T P] \otimes S
$$

The code sequence matrix and its complement are denoted $C, \bar{C}$, (19):

$$
\text { (19) } \quad\left[\begin{array}{l}d_{0} \\ d_{1} \\ d_{2}\end{array}\right]=\left\{\left[\begin{array}{llllll}0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1\end{array}\right] \otimes T P\right\} \otimes S=\left[\begin{array}{c}S_{1} \cdot[(x=0)+(y=0)]+S_{2} \\ S_{1} \cdot[(x=0)+(y=0)] \\ 0\end{array}\right]
$$
The optimal implementation, (20):

$$
\begin{align*}
& d_{0}=S_{1} \cdot[(x=0)+(y=0)]+S_{2} \\
& d_{1}=S_{1} \cdot[(x=0)+(y=0)]  \tag{20}\\
& d_{2}=0
\end{align*}
$$

The implementation of the hardwired sequencer is shown in figure 7.
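As a consistency check, the register commands (14), (17) and (20) can be compared with the direct next-state relations (7). The sketch below assumes the priority order Reset, then Parallel Load, then Increment, with Hold when no command is active; the function names are hypothetical and `zero` stands for $(x=0)+(y=0)$.

```python
from itertools import product

def next_state_direct(state, st, zero):
    """Next state from relations (7)."""
    s = [int(state == k) for k in range(6)]
    y0 = (s[0] & st) | (s[1] & zero) | s[2] | s[4] | (s[5] & st)
    y1 = s[1]
    y2 = s[3] | s[4] | (s[5] & st)
    return 4 * y2 + 2 * y1 + y0

def next_state_register(state, st, zero):
    """Next state of a multifunctional register driven by SR, PL, INC."""
    s = [int(state == k) for k in range(6)]
    reset = s[5] & (1 - st)                 # active condition of SR, from (14)
    load = s[2] | (s[1] & zero)             # active condition of PL, from (14)
    inc = (s[0] & st) | s[1] | s[3] | s[4]  # INC, from (17)
    d0 = (s[1] & zero) | s[2]               # parallel-load data, from (20)
    d1 = s[1] & zero
    d2 = 0
    if reset:                               # highest priority
        return 0
    if load:
        return 4 * d2 + 2 * d1 + d0
    if inc:                                 # lowest priority
        return state + 1
    return state                            # hold

def implementations_agree():
    return all(next_state_direct(q, st, z) == next_state_register(q, st, z)
               for q, st, z in product(range(6), (0, 1), (0, 1)))
```

Under these assumptions the two descriptions give the same next state for all six states and all input combinations.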

The implementation cost for the SLC1, (21):

$$
\begin{align*}
& C_{1}\left(S L C_{1}\right)=C(\overline{S R})+C(\overline{P L})+ \\
& C(I N C)+C\left(d_{3} d_{2} d_{1} d_{0}\right)=  \tag{21}\\
& =2+6+6+6=20
\end{align*}
$$



Fig. 6. Hardwired sequencer


Fig. 7. Implementation of the hardwired sequencer

## FUTURE WORK

Future work concerns the low-cost implementation and synthesis of a multifunctional digital device with arithmetic and logic operations. This will simulate a relatively large portion of the FPU wafer. We will also study timing in digital systems, because fault-free digital circuits may malfunction when asynchronous inputs have critical timing combinations.

## 5. CONCLUSIONS

Within the taxonomy of algorithms, we use functional iteration. The literature (Mihailov et al., 2010), (Teodorescu et al., 2010), (Valachi et al., 2010), (Morris et al., 2006), (Morris et al., 2005), (Sklyarov et al., 2005), (Peng et al., 1987), (Ursaru et al., 2009), (Rodrigues et al., 2008) reports that it can provide the lowest latency and the greatest reliability.

For the proposed algorithm, we show that synthesis with multifunctional registers simplifies the FPU digital hardware logic. Compared with the math module algorithms and synthesis methods available in the references, our iteration-based algorithm works well with 8-, 16- and 32-bit integers, which means low implementation cost and reliability. For 64-bit integers, because of latency, a variable number of digital slices should be used, accompanied by a digital arbiter.

The papers listed in the references section show that our research addresses a current topic.

Compared with the CORDIC math module (Muhammad et al., 2013), which is based on hardware iteration algorithms with the design implemented in an FPGA (more expensive and slower than dedicated hardware), our proposed math module algorithm uses less hardware, so the chip area is minimized, and it works at a high speed for 8-, 16- and 32-bit integers. It was shown that the implementation of the digital automaton can be reduced to a combinational one, which leads to an economical implementation.

In conclusion, we proposed two methods for the synthesis of the digital device: the first has a smaller implementation cost ($C=19$) than the second ($C=20$) and than those described in the literature listed in the references. This leads to a small number of logic gates and less digital logic. Low cost means that the FPU logic core is faster and its response times are short. Moreover, it supports green architectures, which means lower power consumption.

## 6. REFERENCES

Ranjan Kumar Barik; Itishree Samal; Manoranjan Pradhan, "Efficient hardware realization of signed arithmetic operation using IEN" in Power, Communication and Information Technology Conference (PCITC), 2015 IEEE, 15-17 Oct. 2015, DOI: 10.1109/PCITC.2015.7438171, INSPEC Accession Number: 15885905, Date Added to IEEE Xplore: 24 March 2016, IEEE Conference Publications.
Muhammad Nasir Ibrahim; Chen Kean Tack; Mariani Idroas; Siti Noormaya Bilmas; Zuraimi Yahya, "Hardware Implementation of Math Module Based on CORDIC Algorithm Using FPGA", in International Conference on Parallel and Distributed Systems, 2013, pp. 628-632, DOI: 10.1109/ICPADS.2013.112, IEEE Conference Publications.
D. Yi-Jun, B. Zhuo, "CORDIC algorithm based on FPGA", Journal of Shanghai University, vol. 15, no. 4, pp 304-409, Aug 2011.
Premysl Sucha; Zdenek Hanzalek; Antonirn Hermanek; Jan Schier, "Efficient FPGA Implementation of Equalizer for Finite Interval Constant Modulus Algorithm" in Industrial Embedded Systems, 2011. IES '06. International Symposium on Industrial Embedded Systems, 18-20 Oct. 2011, DOI: 10.1109/IES.2006.357480, INSPEC Accession Number: 9551929, Date Added to IEEE Xplore: 07 May 2011, IEEE Conference Publications.
Dmitri Mihailov; Valery Sklyarov; Iouliia Skliarova; Alexander Sudnitson, "Parallel FPGA-Based Implementation of Recursive Sorting Algorithms", in 2010 International Conference on Reconfigurable Computing and FPGAs, Date of Conference: 13-15 Dec. 2010, Date Added to IEEE Xplore: 20 January 2011, INSPEC Accession 11791744, DOI: 10.1109/ReConFig.2010.30,Publisher: IEEE.

Horia-Nicolai Teodorescu, Mircea Hulea, "Improving time measurement precision in embedded systems with a hybrid measuring method", in Intelligent Data Acquisition and Advanced Computing Systems (IDAACS), 2010 IEEE 6th International Conference, Volume 1, pp. 59-64, Publisher: IEEE.
Al. Valachi, M.Timis, S.Tarcau, B.Aignatoaie, "Orders Priorities Settings Criteria for Multifunctional Registers" in Electronics and Electrical Engineering. Intl. Journal of Electronics and Telecommunications Kaunas: Technologija, 2010.
Ovidiu Ursaru, Cristian Aghion, Mihai Lucanu, Liviu Tigaeru, "Pulse width Modulation Command Systems Used for the Optimization of Three Phase Inverters", Advances in Electrical and Computer Engineering Journal. Suceava, Romania, 2009, pp. 22-27.
M.R.D. Rodrigues; J.H.P. Zurawski; J.B. Gosling, "Hardware evaluation of mathematical functions", in Computers and Digital Techniques - Volume: 128, Issue: 4, Date of Publication: 11 November 2008, Page(s): 155-164, Print ISSN: 0143-7062, DOI: 10.1049/ip-e:19810029, Published in: IEE Proceedings.
Lin Yuan, Gang Qu, Villa, T., Sangiovanni-Vincentelli, A., "An FSM Reengineering Approach to Sequential Circuit Synthesis by State Splitting", in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (Volume: 27, Issue: 6), May 2008, pp. 1159-1164.
Gerald R. Morris; Viktor K. Prasanna; Richard D. Anderson, "A Hybrid Approach for Mapping Conjugate Gradient onto an FPGA-Augmented Reconfigurable Supercomputer", in 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Date of Conference: 24-26 April 2006, Date Added to IEEE Xplore: 11 December 2006, INSPEC Accession Number: 9274737, DOI: 10.1109/FCCM.2006.8.
G.R. Morris; V. K. Prasanna, "An FPGA-based floating-point Jacobi iterative solver", in 8th International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'05), Date of Conference: 7-9 Dec. 2005, Date Added to IEEE Xplore: 16 January 2006, Print ISBN: 0-7695-2509-1, INSPEC Accession Number: 8846596, DOI: 10.1109/ISPAN.2005.18, Publisher: IEEE.
V. Sklyarov; I. Skliarova; B. Pimentel, "FPGA-based implementation and comparison of recursive and iterative algorithms", in International Conference on Field Programmable Logic and Applications, 24-26 Aug. 2005, Date Added to IEEE Xplore: 10 October 2005, INSPEC Accession Number: 8813928, DOI: 10.1109/FPL.2005.1515728, Publisher: IEEE.

Victor Peng, Sridhar Samudrala, Moshe Gavrielov., Sangiovanni- Vincentelli, A., "On the implementation of shifters, multipliers, and dividers in VLSI floating point units", in Computer Arithmetic (ARITH), 1987 IEEE 8th Symposium, pages 95-102.

