Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems

GF Diamos, AR Kerr, S Yalamanchili… - Proceedings of the 19th …, 2010 - dl.acm.org
Ocelot is a dynamic compilation framework designed to map the explicitly data parallel
execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms …

Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program to multiple parallel threads

R Sasanka, A Das, JJ Cook, J Bobba… - US Patent …, 2015 - Google Patents
Systems, apparatuses, and methods for a hardware and software system to
automatically decompose a program into multiple parallel threads are described. For …

Systems, methods, and apparatuses to decompose a sequential program into multiple threads, execute said threads, and reconstruct the sequential execution

F Latorre, JM Codina, EG Codina, P Lopez… - US Patent …, 2014 - Google Patents
2010-07-06 Assigned to INTEL CORPORATION reassignment INTEL CORPORATION
ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors …

Smooth trajectory planning for a car in a structured world

T Fraichard - Proc. of the IEEE Int. Conf. on Robotics and Automation, 1991 - Citeseer
This paper aims at studying the trajectory planning for a car (i.e., a nonholonomic vehicle
whose turning radius is lower bounded) in a static and structured world. As for the structure …

Optimizing software runtime systems for speculative parallelization

P Yiapanis, D Rosas-Ham, G Brown… - ACM Transactions on …, 2013 - dl.acm.org
Thread-Level Speculation (TLS) overcomes limitations intrinsic to conservative compile-time
auto-parallelizing tools by extracting parallel threads optimistically and only ensuring …

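A speculative multithreading thread partitioning algorithm based on fuzzy clustering

李远成, 阴培培, 赵银亮 - 2014 - cjc.ict.ac.cn
Abstract: Speculative multithreading (SpMT) is an effective approach to automatically
parallelizing irregular programs. However, how to effectively evaluate the various parallelization overheads caused by factors such as control and data dependences, and achieve optimal thread …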

The design and implementation ocelot's dynamic binary translator from ptx to multi-core x86

G Diamos - Center for Experimental Research in Computer …, 2009 - Citeseer
Ocelot is a dynamic compilation framework designed to map the explicitly parallel PTX
execution model used by NVIDIA CUDA applications onto diverse many-core architectures …

Loopapalooza: investigating limits of loop-level parallelism with a compiler-driven approach

AM Zaidi, K Iordanou, M Luján… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
Improving sequential performance of out-of-order processors is becoming harder. Further
improvements may require exploitation of thread-level parallelism, on top of ILP, as it can …

Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program to multiple parallel threads

DJ Sager, R Sasanka, R Gabor, S Raikin… - US Patent …, 2020 - Google Patents
Systems, apparatuses, and methods for a hardware and software system to
automatically decompose a program into multiple parallel threads are described. In some …

Unconventional applications of compiler analysis

JWA Selby - 2011 - uwspace.uwaterloo.ca
Previously, compiler transformations have primarily focused on minimizing program
execution time. This thesis explores some examples of applying compiler technology …