Anton 3: twenty microseconds of molecular dynamics simulation before lunch

DE Shaw, PJ Adams, A Azaria, JA Bank… - Proceedings of the …, 2021 - dl.acm.org
Anton 3 is the newest member in a family of supercomputers specially designed for atomic-level
simulation of molecules relevant to biology (e.g., DNA, proteins, and drug molecules) …

Volume visualization: a technical overview with a focus on medical applications

Q Zhang, R Eagleson, TM Peters - Journal of digital imaging, 2011 - Springer
With the increasing availability of high-resolution isotropic three- or four-dimensional medical
datasets from sources such as magnetic resonance imaging, computed tomography, and …

Fast packet processing: A survey

D Cerović, V Del Piccolo, A Amamou… - … Surveys & Tutorials, 2018 - ieeexplore.ieee.org
The exponential growth of data traffic, which is not expected to stop anytime soon, brought
about a vast amount of advancements in the networking field. Latest network interfaces …

[BOOK][B] An introduction to parallel programming

P Pacheco - 2011 - books.google.com
An Introduction to Parallel Programming is the first undergraduate text to directly address
compiling and running parallel programs on the new multi-core and cluster architecture. It …

PacketShader: a GPU-accelerated software router

S Han, K Jang, KS Park, S Moon - ACM SIGCOMM Computer …, 2010 - dl.acm.org
We present PacketShader, a high-performance software router framework for general packet
processing with Graphics Processing Unit (GPU) acceleration. PacketShader exploits the …

Interactive furniture layout using interior design guidelines

P Merrell, E Schkufza, Z Li, M Agrawala… - ACM transactions on …, 2011 - dl.acm.org
We present an interactive furniture layout system that assists users by suggesting furniture
arrangements that are based on interior design guidelines. Our system incorporates the …

Highly-scalable GPU-accelerated compressible reacting flow solver for modeling high-speed flows

R Bielawski, S Barwey, S Prakash, V Raman - Computers & Fluids, 2023 - Elsevier
Emerging supercomputing systems utilize a combination of central processing units (CPUs)
and graphics processing units (GPUs) in an effort to reach exascale capabilities while …

Latency-Tolerant software distributed shared memory

J Nelson, B Holt, B Myers, P Briggs, L Ceze… - 2015 USENIX Annual …, 2015 - usenix.org
We present Grappa, a modern take on software distributed shared memory (DSM) for in-memory
data-intensive applications. Grappa enables users to program a cluster as if it were …

High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster

D Komatitsch, G Erlebacher, D Göddeke… - Journal of computational …, 2010 - Elsevier
We implement a high-order finite-element application, which performs the numerical
simulation of seismic wave propagation resulting for instance from earthquakes at the scale …

Energy-efficient mechanisms for managing thread context in throughput processors

M Gebhart, DR Johnson, D Tarjan, SW Keckler… - Proceedings of the 38th …, 2011 - dl.acm.org
Modern graphics processing units (GPUs) use a large number of hardware threads to hide
both function unit and memory access latency. Extreme multithreading requires a …