HyperGAN [Eng]

Ratzlaff et al. / HyperGAN - A Generative Model for Diverse, Performant Neural Networks / ICML 2019


To read this review in Korean, please see the Korean version of this post (HyperGAN [Kor]).

1. Problem definition

HyperGAN is a generative model that learns a distribution over neural network parameters. Specifically, the weights of each layer (e.g., convolutional filters) are generated from latent vectors produced by a mixer network.

2. Motivation & Related work

It is well known that deep neural networks can be trained to good solutions from many different random initializations, and ensembles of such networks have been shown to improve both performance and robustness. In Bayesian deep learning, learning posterior distributions over network parameters is of significant interest, and dropout is commonly used for Bayesian approximation: MC dropout was proposed as a simple way to estimate model uncertainty. However, applying dropout to every layer may lead to underfitting of the data, and it only integrates over the space of models reachable from a single initialization.

As another interesting direction, hypernetworks are neural networks that output the parameters of a target neural network. The hypernetwork and the target network together form a single model that is trained jointly. However, prior Bayesian hypernetworks relied on normalizing flows to produce posteriors, which limited their scalability.

This work explores an approach that generates all the parameters of a neural network in a single pass, without assuming any fixed noise model or functional form of the generating function. Instead of flow-based models, the authors use GANs. The resulting models are more diverse than those obtained from multiple random restarts (ensembles) or from past variational methods.

Idea

The idea of HyperGAN is to use a GAN-style approach to directly model network weights. A naive version of this would require a large set of trained model parameters as training data, so the authors take another route: they directly optimize the target supervised learning objective. This is more flexible than using normalizing flows, and it is computationally efficient because the parameters of each layer are generated in parallel. Compared to an ensemble, which must train many models, it is efficient in both computation and memory.

3. Method

The figure in the paper's introduction shows the HyperGAN architecture. Distinct from a standard GAN, the authors introduce a mixer Q, a fully-connected network that maps samples s ~ S from a simple prior to a mixed latent space Z. The mixer is motivated by the observation that the weight parameters of different layers must be strongly correlated, since the output of one layer is the input to the next. Q therefore produces an Nd-dimensional mixed latent vector Q(z|s) whose components are all correlated. This vector is partitioned into N layer embeddings, each a d-dimensional vector, and N parallel generators then produce the parameters of each of the N layers. This design is also memory efficient: the extremely high-dimensional space of weight parameters is connected to multiple separate latent vectors instead of being fully connected to a single latent space.
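The data flow described above (prior sample → mixer → partitioned embeddings → parallel per-layer generators) can be sketched in plain NumPy. All dimensions and the linear mixer/generators below are illustrative placeholders, not the paper's actual architecture or trained weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
s_dim = 64                            # dimension of the prior noise s ~ S
N = 3                                 # number of target-network layers
d = 32                                # dimension of each layer embedding
layer_param_dims = [150, 2400, 500]   # parameter count of each target layer

def mixer(s, W_mix):
    """Fully-connected mixer Q: maps s to one correlated N*d latent vector."""
    return np.tanh(W_mix @ s)

# Randomly initialized (untrained) weights, just to show the data flow.
W_mix = rng.standard_normal((N * d, s_dim)) / np.sqrt(s_dim)
W_gen = [rng.standard_normal((p, d)) / np.sqrt(d) for p in layer_param_dims]

s = rng.standard_normal(s_dim)        # sample from the prior S
z = mixer(s, W_mix)                   # mixed latent vector, shape (N*d,)
embeddings = z.reshape(N, d)          # partition into N layer embeddings

# N parallel generators, one per target layer (here: plain linear maps).
layer_params = [W_gen[i] @ embeddings[i] for i in range(N)]

for i, p in enumerate(layer_params):
    print(f"layer {i}: {p.shape[0]} generated parameters")
```

Because each generator only consumes its own d-dimensional embedding, the generators can run in parallel and no single network has to map the latent space to all Nd weights at once.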

The generated model is then evaluated on the training set, and the generated parameters are optimized with respect to a loss L:
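In symbols, with mixer Q with parameters phi, generators G with parameters theta, and target network F(x; w), the supervised objective can be sketched as follows (the notation here is mine, reconstructed from the description, not the paper's exact typesetting):

```latex
\min_{\phi,\,\theta}\;
\mathbb{E}_{s \sim S}\,
\mathbb{E}_{(x,y)}
\Big[
  \mathcal{L}\big( F\!\left(x;\, G_\theta(Q_\phi(s))\right),\, y \big)
\Big]
```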

However, codes sampled from Q(z|s) may collapse to the maximum likelihood estimate (MLE). To prevent this, the authors add an adversarial constraint on the mixed latent space, requiring it not to deviate too much from a high-entropy prior P. This leads to the HyperGAN objective:
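Adding the adversarial constraint on the mixed latent space, the full objective can be sketched as below, with D a distance between distributions and lambda a weighting coefficient (again my notation, a reconstruction rather than the paper's exact formula):

```latex
\min_{\phi,\,\theta}\;
\mathbb{E}_{s \sim S}\,
\mathbb{E}_{(x,y)}
\Big[
  \mathcal{L}\big( F\!\left(x;\, G_\theta(Q_\phi(s))\right),\, y \big)
\Big]
\;+\;
\lambda\, D\big( Q_\phi(S),\, P \big)
```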

In practice, D can be any distance function between two distributions. Here, a discriminator network trained with an adversarial loss is used to approximate it.

Since it is difficult to learn a discriminator in the high-dimensional weight space, and since those parameters have no structure (unlike images), regularizing in the latent space works well.

4. Experiment & Result

Experimental setup

  • Classification performance on MNIST and CIFAR-10

  • Learning the variance of a simple 1D dataset

  • Anomaly detection of out-of-distribution examples

    • Model trained on MNIST, tested on notMNIST

    • Model trained on 5 CIFAR-10 classes, tested on the remaining 5 classes

  • Baselines: APD (Wang et al., 2018), MNF (Louizos & Welling, 2016), MC dropout (Gal & Ghahramani, 2016)

Result

Classification result

Anomaly detection result

Ablation Study

First, removing the regularization term D(Q(s), P) from the objective reduces network diversity. The authors measure the L2 norm of 100 weight samples and divide the standard deviation of the norms by their mean. They also observe that diversity decreases over training time, and therefore suggest early stopping. Next, the authors remove the mixer Q. While accuracy is retained, diversity suffers significantly. The authors hypothesize that without the mixer, a valid optimization path is difficult to find; with the mixer, the built-in correlation between the parameters of different layers may make optimization easier.
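The diversity score used in this ablation is a coefficient of variation of the sampled weight norms, which can be sketched as follows (the Gaussian samples below are a stand-in for 100 parameter vectors sampled from a trained HyperGAN, used only to show the computation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for 100 sampled parameter vectors:
# each row is one weight sample (here just Gaussian noise for illustration).
samples = rng.standard_normal((100, 5000))

# Diversity metric from the ablation: take the L2 norm of every sample,
# then divide the standard deviation of the norms by their mean.
norms = np.linalg.norm(samples, axis=1)
diversity = norms.std() / norms.mean()
print(f"diversity (std/mean of L2 norms): {diversity:.4f}")
```

A higher value means the sampled networks spread out more in weight space; a value near zero indicates the generator is producing nearly identical weights.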

5. Conclusion

In conclusion, HyperGAN is a compelling method for building ensembles of models that are robust and reliable. Its strength is that it can generate network parameters with a GAN-style approach while avoiding mode collapse, thanks to the mixer network and the latent-space regularization term. Its weakness is that the method was only demonstrated with small target networks on small datasets such as MNIST and CIFAR-10, performing simple classification tasks. It would be interesting to see whether the approach scales to larger networks such as ResNets trained on larger datasets.

Take home message (오늘의 교훈)

Hypernetworks can be trained with a GAN objective to build Bayesian neural networks.

Author / Reviewer information

Author

형준하 (Junha Hyung)

  • KAIST Graduate School of AI (M.S.)

  • Research Area: Computer Vision

  • sharpeeee@kaist.ac.kr


Reference & Additional materials

Ha, D., Dai, A. M., and Le, Q. V. Hypernetworks. CoRR.

Henning, C., von Oswald, J., Sacramento, J., Surace, S. C., Pfister, J.-P., and Grewe, B. F. Approximating the predictive distribution via adversarially-trained hypernetworks.

Krueger, D., Huang, C.-W., Islam, R., Turner, R., Lacoste, A., and Courville, A. Bayesian Hypernetworks.

Lorraine, J. and Duvenaud, D. Stochastic hyperparameter optimization through hypernetworks. CoRR.

Pawlowski, N., Brock, A., Lee, M. C., Rajchl, M., and Glocker, B. Implicit weight uncertainty in neural networks.
