{"id":5747,"date":"2022-07-11T18:40:00","date_gmt":"2022-07-11T18:40:00","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2022\/07\/11\/a-programming-language-for-hardware-accelerators\/"},"modified":"2022-07-11T18:40:00","modified_gmt":"2022-07-11T18:40:00","slug":"a-programming-language-for-hardware-accelerators","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2022\/07\/11\/a-programming-language-for-hardware-accelerators\/","title":{"rendered":"A programming language for hardware accelerators"},"content":{"rendered":"<p>Author: Rachel Gordon | MIT CSAIL<\/p>\n<div>\n<p>Moore\u2019s Law needs a hug. The days of stuffing transistors on little silicon computer chips are numbered, and their life rafts \u2014 hardware accelerators \u2014 come with a price.<\/p>\n<p>When programming an accelerator \u2014 a process where applications <a href=\"https:\/\/www.makeuseof.com\/what-is-hardware-acceleration\/\">offload certain tasks to system hardware<\/a> specifically to accelerate those tasks \u2014 you have to build entirely new software support. Hardware accelerators can run certain tasks orders of magnitude faster than CPUs, but they cannot be used out of the box. Software needs to use an accelerator\u2019s instructions efficiently while staying compatible with the entire application system. This translates to a lot of engineering work, which then has to be maintained for every new chip you compile code for, whatever the programming language.<\/p>\n<p>Now, scientists from MIT\u2019s Computer Science and Artificial Intelligence Laboratory (CSAIL) have created a <a href=\"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/3519939.3523446\" target=\"_blank\" rel=\"noopener\">new programming language called \u201cExo\u201d<\/a> for writing high-performance code on hardware accelerators. 
Exo helps low-level performance engineers transform very simple programs that specify what they want to compute into very complex programs that do the same thing as the specification, but much, much faster, by using these special accelerator chips. Engineers, for example, can use Exo to turn a simple matrix multiplication into a more complex program, which runs orders of magnitude faster by using these special accelerators.<\/p>\n<p>Unlike other programming languages and compilers, Exo is built around a concept called \u201cExocompilation.\u201d \u201cTraditionally, a lot of research has focused on automating the optimization process for the specific hardware,\u201d says Yuka Ikarashi, a PhD student in electrical engineering and computer science and CSAIL affiliate who is a lead author on a new paper about Exo. \u201cThis is great for most programmers, but for performance engineers, the compiler gets in the way as often as it helps. Because the compiler\u2019s optimizations are automatic, there\u2019s no good way to fix it when it does the wrong thing and gives you 45 percent efficiency instead of 90 percent.\u201d<\/p>\n<p>With Exocompilation, the performance engineer is back in the driver\u2019s seat. Responsibility for choosing which optimizations to apply, when, and in what order is externalized from the compiler, back to the performance engineer. This way, they don\u2019t have to waste time fighting the compiler on the one hand, or doing everything manually on the other. At the same time, Exo takes responsibility for ensuring that all of these optimizations are correct. 
As a result, the performance engineer can spend their time improving performance, rather than debugging the complex, optimized code.<\/p>\n<p>\u201cExo language is a compiler that\u2019s parameterized over the hardware it targets; the same compiler can adapt to many different hardware accelerators,\u201d says Adrian Sampson, assistant professor in the Department of Computer Science at Cornell University. \u201cInstead of writing a bunch of messy C++ code to compile for a new accelerator, Exo gives you an abstract, uniform way to write down the &#8216;shape&#8217; of the hardware you want to target. Then you can reuse the existing Exo compiler to adapt to that new description instead of writing something entirely new from scratch. The potential impact of work like this is enormous: If hardware innovators can stop worrying about the cost of developing new compilers for every new hardware idea, they can try out and ship more ideas. The industry could break its dependence on legacy hardware that succeeds only because of ecosystem lock-in and despite its inefficiency.\u201d<\/p>\n<p>The highest-performance computer chips made today, such as Google\u2019s TPU, Apple\u2019s Neural Engine, or NVIDIA\u2019s Tensor Cores, power scientific computing and machine learning applications by accelerating something called \u201ckey sub-programs,\u201d kernels, or high-performance computing (HPC) subroutines.<\/p>\n<p>Clunky jargon aside, the programs are essential. For example, something called Basic Linear Algebra Subprograms (BLAS) is a \u201clibrary\u201d or collection of such subroutines, which are dedicated to linear algebra computations and enable many computing tasks such as neural networks, weather forecasts, cloud computation, and drug discovery. (BLAS is so important that it won Jack Dongarra the 2021 Turing Award.) 
However, these new chips \u2014 which take hundreds of engineers to design \u2014 are only as good as these HPC software libraries allow.<\/p>\n<p>Currently, though, this kind of performance optimization is still done by hand to ensure that every last cycle of computation on these chips gets used. HPC subroutines regularly run at 90 percent-plus of peak theoretical efficiency, and hardware engineers go to great lengths to add an extra five or 10 percent of speed to these theoretical peaks. So, if the software isn\u2019t aggressively optimized, all of that hard work gets wasted \u2014 which is exactly what Exo helps avoid.\u00a0<\/p>\n<p>Another key part of Exocompilation is that performance engineers can describe the new chips they want to optimize for, without having to modify the compiler. Traditionally, the definition of the hardware interface is maintained by the compiler developers, but with most of these new accelerator chips, the hardware interface is proprietary. Companies have to maintain their own copy (fork) of a whole traditional compiler, modified to support their particular chip. This requires hiring teams of compiler developers in addition to the performance engineers.<\/p>\n<p>\u201cIn Exo, we instead externalize the definition of hardware-specific backends from the exocompiler. This gives us a better separation between Exo \u2014 which is an open-source project \u2014 and hardware-specific code \u2014 which is often proprietary. We\u2019ve shown that we can use Exo to quickly write code that\u2019s as performant as Intel\u2019s hand-optimized Math Kernel Library. 
We\u2019re actively working with engineers and researchers at several companies,\u201d says Gilbert Bernstein, a postdoc at the University of California at Berkeley.\u00a0<\/p>\n<p>The future of Exo entails exploring a more productive scheduling meta-language, and expanding its semantics to support parallel programming models to apply it to even more accelerators, including GPUs.<\/p>\n<p>Ikarashi and Bernstein wrote the paper alongside Alex Reinking and Hasan Genc, both PhD students at UC Berkeley, and MIT Assistant Professor Jonathan Ragan-Kelley.<\/p>\n<p>This work was partially supported by the Applications Driving Architectures center, one of six centers of JUMP, a Semiconductor Research Corporation program co-sponsored by the Defense Advanced Research Projects Agency. Ikarashi was supported by Funai Overseas Scholarship, Masason Foundation, and Great Educators Fellowship. The team presented the work at the ACM SIGPLAN Conference on Programming Language Design and Implementation 2022.<\/p>\n<\/div>\n<p><a href=\"https:\/\/news.mit.edu\/2022\/programming-language-hardware-accelerators-0711\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Rachel Gordon | MIT CSAIL Moore\u2019s Law needs a hug. 
The days of stuffing transistors on little silicon computer chips are numbered, and their [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2022\/07\/11\/a-programming-language-for-hardware-accelerators\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":460,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/5747"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=5747"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/5747\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/456"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=5747"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=5747"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=5747"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}