Quickly optimizing deep neural networks for different devices

If you have several different wireless devices, you might have noticed that apps sometimes perform differently depending on the device. Perhaps fun filters on a messaging app work with lightning speed on your laptop but lag on your phone. Or a social media app that works seamlessly on your phone is a little wonky on your tablet.

One reason is that the deep neural networks, or DNNs (layers of algorithms that perform computations in an orderly progression to make the apps work), need to be optimized for each device. On a device the DNN was not optimized for, computations can take longer, causing the app to lag. Computer scientists call this lag “latency.” But with so many makes and models of devices, and so many customization options, optimizing a DNN for every device is almost impossible for developers.

Even if they tried, the process would be lengthy and add considerably to the cost. A crucial bottleneck is the difficulty of quickly evaluating latencies of many DNN candidate models on a wide range of devices. Thus, DNNs tend to be optimized for one specific device.

Now, UC Riverside computer scientists have come up with a simple, inexpensive way to optimize DNNs for numerous devices across different platforms. The work was accepted by and will be presented at the highly selective ACM SIGMETRICS / IFIP Performance 2022 conference.

“If developers adopt our technique, more consumers would notice that their apps would have better performance,” said lead author Shaolei Ren, an associate professor of electrical and computer engineering in UCR’s Marlan and Rosemary Bourns College of Engineering.

Ren’s group studied the DNN latency relationships across different devices. They found that picking the best DNN model for a device does not require the actual latencies, only the latency ranking. Ranking is a relatively simple matter of sorting the candidate models’ latency values from high to low, where lowest is best. And the latency rankings for different devices are highly correlated.

“If your latency ranking follows one order on one device, it will be about the same on a different device. The order doesn’t change that much,” Ren said. “For example, it’s not that different between all types of cellphones, regardless of operating system or model.”

If two devices have similar latency rankings, the ranking obtained on one can be reused for the other and no further work is needed. But if a new device has a very different ranking order, a technique called proxy adaptation, based on transfer learning, can help optimize the DNN for that device quickly.
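To make the idea of comparing rankings concrete, here is a minimal sketch that scores how closely two devices agree using Spearman rank correlation, a standard statistic for this purpose. The article does not say which measure the researchers used, so the choice of Spearman, the latency numbers, the device names, and the 0.9 cutoff are all illustrative assumptions.

```python
def ranks(values):
    """Rank each value (1 = lowest latency); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(a, b):
    """Spearman rank correlation: the Pearson correlation of the two rank lists."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical latencies (ms) of the same five candidate models on three devices.
phone_a = [12.0, 30.5, 18.2, 45.0, 22.1]
phone_b = [20.3, 51.0, 29.9, 80.2, 41.7]    # slower overall, same order
accelerator = [9.0, 8.5, 14.0, 7.2, 11.0]   # very different order

THRESHOLD = 0.9  # assumed cutoff for "similar enough", not a value from the article


def needs_adaptation(proxy, target, threshold=THRESHOLD):
    """True when the target device's ranking differs too much from the proxy's."""
    return spearman(proxy, target) < threshold


print(spearman(phone_a, phone_b), needs_adaptation(phone_a, phone_b))
print(spearman(phone_a, accelerator), needs_adaptation(phone_a, accelerator))
```

In this made-up data the second phone is nearly twice as slow as the first, yet the two rank the models identically, so the first phone's ranking could stand in for the second's. Only the third device would call for adaptation.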

“Lightweight transfer learning enables us to adapt our default proxy device latency evaluator to the new device. That only needs a few tens of models for latency measurement. Compared to measuring thousands of models to get a new latency predictor, measuring 50 or 60 is nothing. So we can scale up our design process very quickly without building a new latency predictor for each device,” said Ren.
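The general shape of this kind of adaptation can be sketched in a few lines. The example below is a stand-in, not the researchers' method: it assumes a simple linear latency predictor over three made-up model features, starts from weights learned on the proxy device, and fine-tunes them with plain gradient descent on 50 measurements from a new device. All weights, features, and measurements here are hypothetical.

```python
import random


def predict(weights, features):
    """Linear latency model: weights[0] is a bias, the rest multiply the features."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def mse(weights, samples):
    """Mean squared error of the predictor on (features, latency) pairs."""
    return sum((predict(weights, f) - y) ** 2 for f, y in samples) / len(samples)


def fine_tune(weights, samples, lr=0.1, steps=2000):
    """Full-batch gradient descent on squared error, starting from the proxy weights."""
    w = list(weights)
    n = len(samples)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for features, latency in samples:
            err = predict(w, features) - latency
            grad[0] += 2 * err / n
            for i, x in enumerate(features, start=1):
                grad[i] += 2 * err * x / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


rng = random.Random(0)
proxy_weights = [5.0, 20.0, 8.0, 3.0]       # hypothetical predictor trained on the proxy device
new_device_truth = [4.0, 6.0, 25.0, 10.0]   # hypothetical behavior of the new device

# Measure only 50 candidate models on the new device (features scaled to [0, 1]).
measured = []
for _ in range(50):
    feats = [rng.random() for _ in range(3)]
    measured.append((feats, predict(new_device_truth, feats) + rng.gauss(0, 0.2)))

adapted_weights = fine_tune(proxy_weights, measured)
error_before = mse(proxy_weights, measured)
error_after = mse(adapted_weights, measured)
print(f"error before: {error_before:.2f}, after: {error_after:.2f}")
```

The point of starting from the proxy weights rather than from scratch is that a small number of measurements is enough to correct an existing predictor, whereas building a new one would need far more.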

For example, suppose DNNs must be optimized for 100 different devices. If pre-training a super DNN model containing all the candidate models takes about 1,000 hours, building an accuracy predictor takes about 100 hours, and measuring the latencies to build a latency predictor takes 20-100 hours per device, existing approaches end up taking anywhere between 3,100 and 11,100 machine-hours for these 100 devices.

“Using our approach, we still have 1,000 hours for pre-training, 100 hours for building an accuracy predictor, and 20-100 hours for building a latency predictor on one device that we call proxy device, but the rest will be almost no time,” Ren said. “So we keep total design costs about the same as if for only one device, but much better optimized for a wider variety of devices.”
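The arithmetic behind this comparison is easy to check. The short sketch below uses only the hour figures quoted in the article; the function names are made up for illustration, and the proxy calculation treats the per-device adaptation cost as zero, which is a simplification of Ren's "almost no time."

```python
def existing_cost(n_devices, pretrain=1000, accuracy=100, latency_per_device=(20, 100)):
    """Machine-hours when every device gets its own latency predictor."""
    low, high = latency_per_device
    shared = pretrain + accuracy  # paid once, whatever the number of devices
    return shared + n_devices * low, shared + n_devices * high


def proxy_cost(pretrain=1000, accuracy=100, latency_per_device=(20, 100)):
    """Machine-hours when one proxy device's latency predictor is reused.

    Assumes the adaptation cost for other devices is negligible (treated as 0).
    """
    low, high = latency_per_device
    shared = pretrain + accuracy
    return shared + low, shared + high


print(existing_cost(100))  # (3100, 11100)
print(proxy_cost())        # (1120, 1200)
```

Under these figures the existing-approach range comes out to 3,100 to 11,100 machine-hours for 100 devices, against roughly 1,120 to 1,200 with a single proxy device.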

Ren’s group tested their method on a wide range of devices and public datasets and found that it worked extremely well to optimize DNNs.

The paper, “One proxy device is enough for hardware-aware neural architecture search,” has been accepted for the ACM SIGMETRICS 2022 conference and is published in the December 2021 issue of Proceedings of the ACM on Measurement and Analysis of Computing Systems. Other authors include Bingqian Lu and Jianyi Yang, doctoral students at UC Riverside; Weiwen Jiang at George Mason University; and Yiyu Shi at Notre Dame.
