implemented and tested. Differences from the corresponding sequential algorithm are clearly
stated. The algorithm's performance is analyzed in the Bulk-Synchronous Parallel (BSP) cost
model, which suggests speed-ups on high-bandwidth architectures. Experimental results on
a massively parallel Cray T3E-1200 machine validate the model and show the parallel
algorithm's efficiency as well as its limitations.