In the last posts I focused on implementing a square root algorithm. I wrote code for an iterative and a pipelined version. Today I want to compare them. For comparison I also added two square root IP cores (ALTSQRT) generated by Quartus. The primary goal is to answer the following question: is it better to use an IP core or to write your own square root calculator? I focused only on resource usage.
Comparison
I created a Quartus project (sources) with four versions of the algorithm:
- ALTSQRT_in32_d0 – generated IP that calculates the square root for the same input width as my modules, with no delay
- ALTSQRT_in32_d16 – generated IP that calculates the square root for the same input width as my modules, with a delay of 16 cycles
- squareRoot_iter – my iterative version
- squareRoot_pipe – my pipelined version
The device was chosen automatically by Quartus from the Cyclone 10 family. The implementation results are shown in Fig. 1.

Summary
Creating your own implementation does not make sense when you plan to use a pipelined version, because the results of my pipelined version and of the IPs are very similar. However, when the throughput of the block is not important and resources are critical, it is worth using the iterative version, which uses far fewer resources than any other approach.
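To make this trade-off more concrete, here is a minimal Python sketch of a digit-by-digit square root. It is only an illustration: I am assuming this class of algorithm here, the HDL modules in the project and the ALTSQRT IP may work differently, and the names `isqrt_step` and `isqrt32` are hypothetical. Each step consumes two input bits and produces one result bit, so a 32-bit input needs 16 steps.

```python
def isqrt_step(rem, root, pair):
    """One step: take the next two input bits, decide one result bit."""
    rem = (rem << 2) | pair      # bring down the next pair of input bits
    trial = (root << 2) | 1      # candidate value to subtract
    if rem >= trial:
        return rem - trial, (root << 1) | 1   # result bit is 1
    return rem, root << 1                     # result bit is 0


def isqrt32(x):
    """Floor square root of a 32-bit unsigned value (assumed width)."""
    rem, root = 0, 0
    for i in range(15, -1, -1):              # 16 steps, most significant pair first
        pair = (x >> (2 * i)) & 0b11
        rem, root = isqrt_step(rem, root, pair)
    return root, rem                         # root = floor(sqrt(x)), rem = x - root**2
```

In hardware terms, an iterative design would typically reuse a single such step for 16 clock cycles, while a pipelined design would instantiate 16 copies separated by registers. That would be consistent with the iterative version needing far fewer resources at the cost of accepting a new input only every 16 cycles.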
*** *** ***
All source code used in this post can be found on gitlab.com.
*** *** ***