Tackling Diverse Tasks with Neural Architecture Search
The past decade has witnessed the success of machine learning (ML) in solving diverse real-world problems, from facial recognition and machine translation to disease diagnosis and protein sequence prediction. However, progress in such areas has involved painstaking manual effort in designing and training task-specific neural networks, leveraging human and computational resources that most practitioners do not have access to.
In contrast to this task-specific approach, general-purpose models such as DeepMind’s Perceiver
IO and Gato and Google’s Pathways have been developed to solve more than one task at once.
However, as these proprietary pretrained models are not publicly available, practitioners cannot
even assess whether fine-tuning one of these models would work on their task of interest.
Independently developing a general-purpose model from scratch is also infeasible due to the
massive amount of compute and training data it requires.
A more accessible alternative is the field of automated machine learning (AutoML), which aims to
obtain high-quality models for diverse tasks with minimal human effort and computational
resources, as noted in a recent blog post. In particular, we can use Neural Architecture Search
(NAS) to automate the design of neural networks for different learning problems.
Indeed, compared with training large-scale transformer-based general-purpose models, many
efficient NAS algorithms such as DARTS can run on a single GPU and complete a search for a
simple task within a few hours. However, while NAS has enabled fast and effective model development in
well-studied areas such as computer vision, its application to domains beyond vision remains
largely unexplored.
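To give a concrete sense of how gradient-based NAS methods like DARTS operate, below is a minimal PyTorch sketch of the core idea of a continuously relaxed "mixed operation": instead of committing to one operation per layer, the network computes a softmax-weighted mixture of candidate operations, and the mixture weights are learned by gradient descent alongside the model weights. This is an illustrative sketch only; the candidate set and class names are ours, not taken from the DARTS codebase.

```python
# Minimal sketch of a DARTS-style mixed operation (illustrative, not the
# official DARTS implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Candidate operations for one layer/edge of the search space
        # (a made-up candidate set for illustration).
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Identity(),
        ])
        # One learnable architecture parameter per candidate.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        # Continuous relaxation: a softmax-weighted sum over all candidates.
        # After search, the highest-weight candidate would be kept.
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

mixed = MixedOp(channels=16)
y = mixed(torch.randn(2, 16, 32, 32))  # output has the same shape as the input
```

Because both the architecture parameters and the model weights are trained with ordinary backpropagation, a single search run stays within the budget of one GPU.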
In fact, a major difficulty in applying NAS to more diverse problems is the trade-off between
considering a sufficiently expressive set of neural networks and being able to efficiently
search over this set. In this blog post, we will introduce our approach to finding a suitable
balance between expressivity and efficiency in NAS.
In our upcoming NeurIPS 2022 paper, we develop a NAS method called DASH that generates and
trains task-specific convolutional neural networks (CNNs) with high prediction accuracy. Our
core hypothesis is that for a broad set of problems (especially those with non-vision inputs
such as audio and protein sequences), simply searching for the right kernel sizes and dilation
rates for the convolutional layers in a CNN can achieve high-quality feature extraction and
yield models competitive with expert-designed ones.
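To make this hypothesis concrete, here is a rough PyTorch sketch of what a per-layer search over kernel sizes and dilation rates could look like, using the same weighted-mixture relaxation shown above. The candidate grids and module names are ours for illustration and are not DASH’s actual implementation; note that this naive version evaluates a separate convolution for every (kernel size, dilation) pair, which is exactly the inefficiency that the "tricks" introduced later in this post address.

```python
# Illustrative sketch of a per-layer search over kernel sizes and dilation
# rates (candidate grids and names are assumptions, not DASH's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelDilationSearchConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 5, 7, 9), dilations=(1, 2, 4)):
        super().__init__()
        self.candidates = [(k, d) for k in kernel_sizes for d in dilations]
        self.convs = nn.ModuleList([
            # "Same" padding so every candidate produces an output of equal length.
            nn.Conv1d(in_ch, out_ch, kernel_size=k, dilation=d,
                      padding=d * (k - 1) // 2)
            for k, d in self.candidates
        ])
        # One architecture weight per (kernel size, dilation) candidate.
        self.alpha = nn.Parameter(torch.zeros(len(self.candidates)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        # Naive mixture: one convolution per candidate, weighted and summed.
        return sum(w * conv(x) for w, conv in zip(weights, self.convs))

# Usage on a toy 1D input (e.g., an audio signal or protein-sequence embedding):
layer = KernelDilationSearchConv1d(in_ch=16, out_ch=32)
x = torch.randn(8, 16, 128)   # (batch, channels, length)
out = layer(x)                # shape: (8, 32, 128)
```

The cost of this naive mixture grows with the number of (kernel size, dilation) pairs, which motivates the efficiency techniques discussed next.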
We explicitly focus on extending the generalization ability of CNNs due to the well-known
effectiveness of convolutions as feature extractors, coupled with recent work demonstrating the
success of modern CNNs on a variety of tasks (e.g., the state-of-the-art performance of the
ConvNeXt model that incorporates many techniques used by Transformers).
In the following, we will first discuss how DASH is inspired by and differs from existing NAS
work. Then, we will introduce three novel “tricks” that improve the efficiency of searching over
a diverse kernel space. Finally, we will present an empirical evaluation demonstrating DASH’s
effectiveness.