ParaBricks

Team:

Co-founder and CEO – Mehrzad Samadi, Ph.D., is a Research Fellow at the University of Michigan. He is an expert in automatic code generation for heterogeneous systems and focuses on optimizing massively data-parallel applications for these systems.

Co-founder and CTO – Ankit Sethia, Ph.D., is a Research Fellow at the University of Michigan. His research focuses on the design of heterogeneous systems, and his knowledge of their bottlenecks will help generate high-performance code.

Co-founder and Advisor – Scott Mahlke, Ph.D., is a Professor and Associate Chair of the Electrical Engineering and Computer Science Department at the University of Michigan. His expertise is in compiler optimization and code generation, automatic parallelization, and application-specific processors.


Website: parabricks.com


Problem: R provides a large gamut of statistical and graphical techniques that ease data analysis and yield deeper insights into the data. With the increasing amount of data available from online systems, statistical analysis is becoming more significant, and its use for better decision making is gaining a lot of traction. The R library implements a wide variety of techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others. The number of techniques that work out of the box has been increasing steadily, thanks to packages contributed by open-source developers. A major bottleneck of R-based statistical analysis is its low performance on modern computing hardware. This becomes a serious problem when R is applied to large amounts of data and the computation can take hours to days. R provides several packages that enable modern high-performance computing paradigms such as multicore processing, vectorization, and graphics processing units (GPUs). However, these packages put the onus of achieving high performance on the programmer, who has to rewrite their R code to use them, and using many of these packages is non-trivial. In summary, R programs crave high performance to analyze the large volumes of available data; however, there is currently no easy-to-use, out-of-the-box solution that fulfills this demand.

Solution: To deliver high performance in R without burdening the data analyst with esoteric programming paradigms, ParaBricks provides transparent speedups to R users by harnessing the computing power of the cloud. The analyst continues to write the analysis as before but runs the R script on the ParaBricks platform. ParaBricks then replaces standard R function calls with highly optimized ParaBricks implementations that exploit the underlying hardware capabilities. These implementations are generally 5 to 100 times faster than the standard R implementations. ParaBricks also provides a web-based interface where end users can develop and store their R code, data, and results. Users simply log in to the website to get full access to the power of cloud computing for significantly faster processing, while keeping the productivity of an integrated development environment. Furthermore, by using state-of-the-art cloud storage solutions, users can access their data securely from anywhere, at any time. Anything stored with ParaBricks can be executed, downloaded, or deleted by the user at any time from any web-enabled device.

Market Opportunity: Big Data is projected to be a $50.1B market in 2015, according to Wikibon. Within this market, more than 50% of developers use R for their analytics, according to KDnuggets. R is the most used data science language after SQL and is used by 70% of data miners. It has more than 2 million users worldwide.

Competitive Advantage: Unlike its competitors, ParaBricks provides instant access to cloud computing, lets users run R in the cloud without dealing with the complexities of setting up and managing cloud infrastructure, and delivers large cost savings, because the speedup reduces application execution time on the cloud.