Ravi Teja Mullapudi

I am a first-year Ph.D. student working with Kayvon Fatahalian at CMU. I am broadly interested in building systems that enable efficient understanding and analysis of visual data at scale.

I did my Master's at the Indian Institute of Science, where I was advised by Uday Bondhugula. Before my Master's, I worked at NVIDIA and did my Bachelor's at IIIT Hyderabad.

Email  /  CV  /  Google Scholar  

Automatic Scheduling of Halide Programs

Halide is a widely adopted domain-specific language for authoring high-performance image processing pipelines. It consists of two languages: one for expressing the image processing algorithm, and another for specifying how the algorithm should be scheduled on the target architecture. Writing a high-performance schedule requires expertise and intimate knowledge of the target architecture, so the use of Halide is currently restricted to experts who can write schedules by hand. I am currently working on building an automatic scheduler for pipelines written in Halide.
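
To make the algorithm/schedule split concrete, here is a minimal sketch of a separable 3x3 blur in Halide's C++-embedded form, adapted from the standard Halide blur example. The first two definitions are the algorithm; the directives that follow are one possible schedule (tiling, vectorization, and multi-threading). The function and variable names are illustrative; the autoscheduler's job is to produce directives like these automatically.

    #include "Halide.h"
    using namespace Halide;

    int main() {
        // Algorithm: what to compute, written functionally.
        ImageParam input(UInt(16), 2);
        Func blur_x("blur_x"), blur_y("blur_y");
        Var x("x"), y("y"), xi("xi"), yi("yi");

        blur_x(x, y) = (input(x - 1, y) + input(x, y) + input(x + 1, y)) / 3;
        blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3;

        // Schedule: how to map the computation onto the machine.
        // Tile the output, vectorize within tiles, run rows of tiles
        // in parallel, and compute the intermediate blur_x per tile.
        blur_y.tile(x, y, xi, yi, 256, 32)
              .vectorize(xi, 8)
              .parallel(y);
        blur_x.compute_at(blur_y, x)
              .vectorize(x, 8);

        return 0;
    }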

Automatic Optimization for Image Processing Pipelines

Image processing pipelines are ubiquitous and demand high-performance implementations on modern architectures. Manually implementing high-performance pipelines is tedious, error-prone, and not portable. For my Master's thesis, I focused on the problem of automatically generating efficient multi-core implementations of image processing pipelines from a high-level description of the pipeline algorithm. I leveraged polyhedral representation and code generation techniques to achieve this goal. PolyMage is a domain-specific system built for evaluating and experimenting with the techniques developed during the course of my Master's.

PolyMage: Automatic Optimization for Image Processing Pipelines
Ravi Teja Mullapudi, Vinay Vasista, Uday Bondhugula
Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2015
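
As an illustration of the kind of input this work targets, below is a hypothetical two-stage blur pipeline written as plain affine loop nests in C++ (this is not PolyMage's own syntax; the function name and data layout are mine). Fusing and tiling such stages so that the intermediate image stays in cache, and parallelizing across tiles, is profitable but tedious to do by hand; that is the transformation the polyhedral machinery automates. The sketch shows only the unoptimized input form.

    #include <cstdint>
    #include <vector>

    // Two-stage separable 3x3 blur: each stage is a separate affine
    // loop nest over the whole image, with a full-sized intermediate.
    void blur(const std::vector<uint16_t>& in, std::vector<uint16_t>& out,
              int W, int H) {
        std::vector<uint16_t> tmp(W * H);
        // Stage 1: horizontal blur over the whole image.
        for (int y = 0; y < H; ++y)
            for (int x = 1; x < W - 1; ++x)
                tmp[y * W + x] = (in[y * W + x - 1] + in[y * W + x] +
                                  in[y * W + x + 1]) / 3;
        // Stage 2: vertical blur, consuming the full intermediate.
        for (int y = 1; y < H - 1; ++y)
            for (int x = 1; x < W - 1; ++x)
                out[y * W + x] = (tmp[(y - 1) * W + x] + tmp[y * W + x] +
                                  tmp[(y + 1) * W + x]) / 3;
    }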

Compiling Affine Loop Nests for Dataflow Runtimes

I designed and evaluated a compiler and runtime that automatically extract coarse-grained dataflow parallelism from affine loop nests, targeting both shared- and distributed-memory systems. As part of the evaluation, we implemented a set of benchmarks in Intel's Concurrent Collections (CnC) programming model to serve as a comparison to our system. The Floyd-Warshall all-pairs shortest paths implementation used in the evaluation is now part of the Intel CnC samples.
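
For reference, here is the Floyd-Warshall benchmark in its sequential affine-loop-nest form; the function name and the tiling strategy described in the comment are my own illustration of how such a compiler could partition the work, not code from the CnC sample.

    #include <algorithm>
    #include <vector>

    // Floyd-Warshall all-pairs shortest paths over an n x n distance
    // matrix stored in row-major order. A compiler of the kind described
    // above would tile the i/j loops and emit each tile of each
    // k-iteration as a dataflow task, with inter-task dependences
    // tracked by the runtime; this is only the sequential reference.
    void floyd_warshall(std::vector<int>& dist, int n) {
        for (int k = 0; k < n; ++k)
            for (int i = 0; i < n; ++i)
                for (int j = 0; j < n; ++j)
                    dist[i * n + j] = std::min(dist[i * n + j],
                                               dist[i * n + k] + dist[k * n + j]);
    }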

website template stolen from here