[Publications] [Curriculum Vitae]

Greg Stitt
Associate Professor
Department of Electrical and Computer Engineering
315 Benton Hall, University of Florida
(352) 392-5348


Current Research

Interests: Reconfigurable Computing, FPGAs, GPUs, synthesis, compilers, CAD, architecture, embedded systems

Elastic Computing

Elastic computing (not to be confused with Amazon's Elastic Compute Cloud) is an optimization framework for multi-core heterogeneous systems that enables mainstream designers to take advantage of accelerators such as GPUs, FPGAs, and multi-core processors more effectively. From a designer's point of view, elastic computing provides a library of specialized elastic functions. However, unlike traditional functions, elastic functions contain a knowledge base of different algorithms, optimizations, and parallelizations for the given function. This knowledge base enables the elastic computing framework to transparently optimize a function for different runtime conditions, while parallelizing the implementation across available resources.
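As a rough illustration of the idea, the Python sketch below models an elastic function as a table of interchangeable implementations plus a measured cost table, and dispatches each call to the implementation that was cheapest for inputs of a similar size. The class name `ElasticFunction`, the `profile`/`choose` methods, the power-of-two size buckets, install-time profiling, and the use of sorting as the example function are all assumptions made for this sketch; it is not the framework's actual interface, and it models only implementation selection, not parallelization across multiple resources.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sketch: names and structure are illustrative, not the
# elastic computing framework's real API.

def insertion_sort(data: Sequence[int]) -> List[int]:
    """Low overhead for small inputs, but O(n^2) for large ones."""
    out = list(data)
    for i in range(1, len(out)):
        key, j = out[i], i - 1
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out

def merge_sort(data: Sequence[int]) -> List[int]:
    """O(n log n), with more overhead per element."""
    if len(data) <= 1:
        return list(data)
    mid = len(data) // 2
    left, right = merge_sort(data[:mid]), merge_sort(data[mid:])
    merged: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

class ElasticFunction:
    """A 'knowledge base' of implementations plus measured costs per size bucket."""

    def __init__(self, implementations: Dict[str, Callable[[Sequence[int]], List[int]]]):
        self.implementations = implementations
        # (implementation name, size bucket) -> measured seconds
        self.cost_table: Dict[Tuple[str, int], float] = {}

    @staticmethod
    def _bucket(n: int) -> int:
        return n.bit_length()  # groups input sizes by power of two

    def profile(self, sizes: Sequence[int], make_input: Callable[[int], List[int]]) -> None:
        """Time every implementation on representative inputs (assumed: at install time)."""
        for n in sizes:
            data = make_input(n)
            for name, impl in self.implementations.items():
                start = time.perf_counter()
                impl(data)
                self.cost_table[(name, self._bucket(n))] = time.perf_counter() - start

    def choose(self, n: int) -> str:
        """Pick the cheapest implementation measured at the nearest profiled bucket."""
        if not self.cost_table:
            return next(iter(self.implementations))  # no profile yet: first entry
        target = self._bucket(n)
        buckets = {b for (_, b) in self.cost_table}
        nearest = min(buckets, key=lambda b: (abs(b - target), b))
        return min((cost, name) for (name, b), cost in self.cost_table.items()
                   if b == nearest)[1]

    def __call__(self, data: Sequence[int]) -> List[int]:
        return self.implementations[self.choose(len(data))](data)

if __name__ == "__main__":
    sort = ElasticFunction({"insertion": insertion_sort, "merge": merge_sort})
    sort.profile([16, 1024], lambda n: list(range(n, 0, -1)))
    print(sort([5, 3, 9, 1]))  # → [1, 3, 5, 9], whichever implementation is chosen
```

Because the choice is driven by measurements rather than a hard-coded threshold, the same call site may pick a different implementation on a different machine, which is the kind of transparency the paragraph above describes.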

This work is supported by the National Science Foundation grant CNS-0914474. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Relevant publications:

[All Publications]

Intermediate Fabrics (FPGA Device Virtualization)

Despite significant performance and energy advantages for important application domains, FPGAs have limited usage due to low application-design productivity compared to CPUs and GPUs. Two main sources of low productivity are long compile times, often requiring hours or even days, and a lack of application portability that prevents design reuse. Intermediate fabrics address these problems via virtual reconfigurable architectures implemented atop FPGAs. By matching virtual resource granularity to application requirements, intermediate fabrics can perform place & route orders of magnitude faster than commercial tools. Furthermore, by hiding the underlying FPGA from the application, intermediate fabrics enable application portability across potentially any FPGA with enough resources to implement the intermediate fabric.
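To give a feel for why coarse-grained virtual resources make place & route cheap, the sketch below places a small dataflow graph onto a hypothetical intermediate fabric: a grid in which every site is a whole arithmetic unit (here just `add` or `mul`) rather than a lookup table. The grid layout, the unit types, the function names, and the greedy nearest-to-producers heuristic are assumptions for illustration; they are not the published intermediate fabric architecture or its tools, and practical placers typically use stronger methods such as simulated annealing. Routing is omitted; the sketch only reports Manhattan wirelength.

```python
from typing import Dict, List, Sequence, Tuple

Site = Tuple[int, int]  # (row, column) on the virtual fabric

def make_fabric(rows: int, cols: int, pattern: Sequence[str]) -> Dict[Site, str]:
    """Hypothetical fabric: unit types repeat in row-major order across the grid."""
    return {(r, c): pattern[(r * cols + c) % len(pattern)]
            for r in range(rows) for c in range(cols)}

def manhattan(a: Site, b: Site) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_place(ops: Dict[str, str], edges: List[Tuple[str, str]],
                 fabric: Dict[Site, str]) -> Dict[str, Site]:
    """Place each op (ops must be listed in topological order) on a free unit
    of the required type, minimizing distance to its already-placed producers."""
    producers: Dict[str, List[str]] = {op: [] for op in ops}
    for src, dst in edges:
        producers[dst].append(src)
    free = dict(fabric)
    placement: Dict[str, Site] = {}
    for op, kind in ops.items():
        candidates = [site for site, unit in free.items() if unit == kind]
        if not candidates:
            raise ValueError(f"fabric has no free '{kind}' unit for {op}")
        # Ties are broken by site coordinates so the result is deterministic.
        best = min(candidates,
                   key=lambda s: (sum(manhattan(s, placement[p]) for p in producers[op]), s))
        placement[op] = best
        del free[best]
    return placement

def wirelength(edges: List[Tuple[str, str]], placement: Dict[str, Site]) -> int:
    return sum(manhattan(placement[a], placement[b]) for a, b in edges)

if __name__ == "__main__":
    fabric = make_fabric(2, 2, ["add", "mul"])     # (0,0)=add (0,1)=mul (1,0)=add (1,1)=mul
    ops = {"m1": "mul", "m2": "mul", "a1": "add"}  # a1 = m1 + m2
    edges = [("m1", "a1"), ("m2", "a1")]
    placement = greedy_place(ops, edges, fabric)
    print(placement, wirelength(edges, placement))
    # → {'m1': (0, 1), 'm2': (1, 1), 'a1': (0, 0)} 3
```

Because each site is a whole operator, the placer searches over a handful of sites instead of the many LUTs and flip-flops the same operators would occupy on the physical FPGA; that shrinking search space is the intuition behind the faster place & route, though the sketch makes no claim about actual speedups.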

This work is supported by National Science Foundation grant CNS-1149285 and the I/UCRC Program of the National Science Foundation under Grant No. EEC-0642422. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Relevant publications:

[All Publications]

Previous Research

Warp Processors: Self-Optimizing Chips

Warp processors are hybrid SoC (system-on-a-chip) devices that dynamically optimize software by synthesizing hardware implemented in an on-chip FPGA. From a software developer's point of view, a warp processor initially executes an application like any other microprocessor, but after some period of time the application transparently executes more efficiently, with improved performance and reduced energy. This transparency allows synthesis to be integrated into any existing application development tool flow, so developers can keep using their existing languages and compilers. Warp processors completely hide synthesis from software developers, who often avoid hardware design due to the difficult and time-consuming process of register-transfer level specification. Also, the dynamic nature of warp processing enables optimizations that existing static approaches cannot perform, such as phase-based optimizations.

To perform synthesis at runtime, warp processors have a specialized architecture capable of profiling the executing software, decompiling computation kernels, synthesizing the decompiled kernels, and then mapping, placing, and routing the kernels onto an on-chip FPGA. The main challenge in designing warp processors is designing these CAD tools, which must run in an on-chip environment, a difficult task considering that these tools typically require powerful workstations. We have implemented a complete on-chip CAD tool flow that executes in just a few seconds on an ARM microprocessor, resulting in a hardware/software system that is often 10x faster than software execution. We are currently extending warp processors to handle multithreaded applications by synthesizing custom accelerators for executing threads. Early results show that multithreaded warp processing can achieve more than 100x speedups compared to software execution on multi-core systems with up to 64 cores.
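The profiling step can be illustrated with a small sketch. One common way to find computation kernels without instrumenting the software, assumed here rather than taken from this page, is to watch taken branches: a branch whose target lies at or before its own address closes a loop, and the most frequently taken such branches mark the hottest loops. The trace format, the function name `hot_loops`, and the optional `max_span` limit on loop size are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Branch = Tuple[int, int]  # (address of a taken branch, its target address)
Loop = Tuple[int, int]    # (loop start = branch target, loop end = branch address)

def hot_loops(trace: Iterable[Branch], top_n: int = 1,
              max_span: Optional[int] = None) -> List[Tuple[Loop, int]]:
    """Count taken backward branches and return the most frequent loops.

    A backward branch (target <= branch address) is treated as the closing
    branch of a loop spanning [target, branch address]. If max_span is given,
    only loops whose body is at most that many bytes are counted, mimicking a
    profiler that tracks only short backward branches.
    """
    counts: Counter = Counter()
    for pc, target in trace:
        if target <= pc and (max_span is None or pc - target <= max_span):
            counts[(target, pc)] += 1
    return counts.most_common(top_n)

if __name__ == "__main__":
    trace = ([(0x120, 0x100)] * 1000    # tight loop, backward branch taken 1000 times
             + [(0x210, 0x200)] * 10    # smaller loop, taken 10 times
             + [(0x130, 0x180)] * 50)   # forward branches: not loops
    for (start, end), n in hot_loops(trace, top_n=2):
        print(f"loop {start:#x}-{end:#x}: backward branch taken {n} times")
```

In this sketch the loop spanning 0x100 to 0x120 would be the first candidate handed to decompilation and synthesis; the forward branches are ignored entirely.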

Synthesis from Software Binaries

Much of my research has focused on one of the enabling technologies of warp processors: synthesis from software binaries. Because warp processors must perform dynamic synthesis on a software binary rather than on high-level code, the resulting hardware can potentially be much slower, due to the loss of high-level information during software compilation. To make synthesis from software binaries feasible, I have adapted existing decompilation techniques and introduced new techniques to recover the high-level information needed for effective synthesis. Using these techniques, I have shown that for many applications, including a commercial H.264 decoder, synthesis from software binaries can in fact achieve results similar, or even identical, to those of high-level synthesis approaches. Synthesis from software binaries can also be used independently of warp processors, providing similar transparency advantages for desktop CAD, in addition to supporting synthesis of library code, legacy code, and hand-optimized assembly.
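As one example of the kind of high-level information that compilation can hide, compilers commonly replace a multiplication by a constant with shifts and adds (for instance, `x * 10` becomes `(x << 3) + (x << 1)`), so a synthesis tool working on the binary no longer sees a multiply. The sketch below folds such a shift/add/subtract tree back into a single constant multiply, leaving synthesis free to choose the implementation, for example a hard multiplier on the FPGA. The tuple-based expression format and the names `as_multiple` and `promote` are hypothetical, and this is not presented as the specific technique from the publications; a real decompiler would apply such rewrites to data-flow graphs recovered from the binary.

```python
from typing import Optional, Tuple, Union

# Hypothetical expression format: a variable name (str), or a tuple
# ("shl", expr, k), ("add", a, b), ("sub", a, b), or ("mul", var, constant).
Expr = Union[str, tuple]

def as_multiple(expr: Expr) -> Optional[Tuple[str, int]]:
    """Return (var, c) if expr computes var * c using only shifts, adds,
    and subtracts of a single variable; otherwise return None."""
    if isinstance(expr, str):
        return (expr, 1)
    op = expr[0]
    if op == "shl":
        inner = as_multiple(expr[1])
        return None if inner is None else (inner[0], inner[1] << expr[2])
    if op in ("add", "sub"):
        left, right = as_multiple(expr[1]), as_multiple(expr[2])
        if left is None or right is None or left[0] != right[0]:
            return None  # different variables or unsupported ops: leave alone
        c = left[1] + right[1] if op == "add" else left[1] - right[1]
        return (left[0], c)
    return None

def promote(expr: Expr) -> Expr:
    """Replace a shift/add/sub tree rooted at expr with one constant multiply."""
    m = as_multiple(expr)
    if m is None or isinstance(expr, str):
        return expr
    return ("mul", m[0], m[1])

if __name__ == "__main__":
    print(promote(("add", ("shl", "x", 3), ("shl", "x", 1))))  # → ('mul', 'x', 10)
    print(promote(("sub", ("shl", "x", 3), "x")))              # → ('mul', 'x', 7)
```

Because left shifts, adds, and subtracts on fixed-width registers wrap modulo 2^w, exactly as multiplication does, this rewrite preserves the binary's behavior at any fixed word width.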
