Variational autoencoders (VAEs) have gained popularity due to their scalability and computational efficiency, and they are widely used in voice modeling, clustering, and data augmentation applications. This research presents Pythae, a general-purpose open-source Python library. The primary goal of the library is to provide a unified implementation and specialized framework for using generative autoencoder models in an easy, reproducible, and reliable way. The VAE variants it covers improve on the original model in various ways, for example by tightening the lower bound, encouraging disentanglement, or changing the distance used between distributions.
Pythae pipelines allow users to train an autoencoder model or generate new data with only a few lines of code. The library is built mainly on the PyTorch framework; training or model building requires only basic hyper-parameter tuning, with data supplied as arrays or tensors. Additionally, the library incorporates a user-friendly experiment tracker (wandb) that allows users to track and compare runs launched with Pythae. The basic architecture of the Pythae library, illustrating training and generation, is shown in Fig. 1.
The library is used to perform benchmark comparisons of the implemented models on image reconstruction and generation, latent-vector classification and clustering, and image interpolation. Three standard image datasets, namely MNIST, CIFAR10, and CELEBA, are used for this task.
The experiments cover both fixed and variable latent dimensions. In the fixed setting, the latent dimensions for the MNIST, CIFAR10, and CELEBA datasets are set to 16, 256, and 64, respectively. The results show that autoencoder-based models perform best on the reconstruction task, and that integrating regularization into the encoder improves performance over the standard autoencoder. One of the main findings of this experiment is that fitting an ex-post density estimator to the latent codes of the variational approaches yields better generation metrics, even with only ten GMM components.
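The ex-post density idea can be sketched independently of Pythae: encode the training set, fit a GMM to the resulting latent codes, then sample latents from the GMM instead of the standard normal prior before decoding. In the toy sketch below, random vectors stand in for the codes a trained encoder would produce, and the decoder step is only indicated in a comment:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in for latent codes produced by a trained encoder on the
# training set (16-dimensional, as in the MNIST benchmark setting).
latents = rng.normal(size=(1000, 16))

# Fit a 10-component GMM to the latent codes (the ex-post estimator).
gmm = GaussianMixture(n_components=10, random_state=0).fit(latents)

# At generation time, sample latents from the GMM instead of N(0, I)...
z, _ = gmm.sample(n_samples=25)
print(z.shape)  # (25, 16)
# ...and pass each z through the trained decoder to obtain images.
```

The fitting step is cheap because it happens after training, in the low-dimensional latent space rather than in image space.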
More complex density estimators were also tried, but they did not outperform the GMM approach. The number of GMM components plays an essential role; for MNIST and CIFAR10 it is set to 10, since increasing it causes overfitting and decreasing it gives poor results. Compared to a standard VAE, models that explicitly favor disentangling in the latent space, such as the β-VAE and β-TC VAE, perform better in classification. For clustering, 100 separate runs of the k-means algorithm are performed and the average precision is reported; here, the disentangling-targeting models seem to be matched by the original VAE. Moreover, the best results seem to be obtained via adversarial strategies and other alternatives to the classical VAE KL regularization. For interpolation, a starting and an ending image are chosen from the test sets of MNIST and CIFAR10, and linear interpolation is performed in the latent space between the two encoded images.
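The interpolation procedure is straightforward to sketch: encode the two images, take evenly spaced convex combinations of their latent codes, and decode each point on the path. A minimal numpy version, with toy stand-in codes in place of real encoder outputs:

```python
import numpy as np

def interpolate(z_start, z_end, steps=10):
    """Linear interpolation between two latent codes.

    Returns an array of shape (steps, latent_dim) whose rows move
    from z_start to z_end in evenly spaced convex combinations.
    """
    ts = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - ts) * z_start + ts * z_end

# Stand-ins for the codes of a start and an end test image.
z0 = np.zeros(16)
z1 = np.ones(16)
path = interpolate(z0, z1, steps=5)
print(path[2])  # midpoint: all entries equal 0.5
```

Each row of `path` would then be passed through the trained decoder to render the intermediate images.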
In the variable-latent-dimension setting, the same configurations are kept, with the latent dimension varying over {16, 32, 64, 128, 256, 512}. In this scenario, the optimal choice across the four tasks is a latent dimension of 16 to 32 on the MNIST dataset and 32 to 128 on the CIFAR10 dataset.
In conclusion, on the most frequent tasks, including reconstruction, generation, and classification, AE-based generative approaches produce the strongest results. However, they are sensitive to the choice of latent dimension and do not adapt well to more complex tasks such as interpolation.
This article is written as a summary by Marktechpost staff based on the paper 'Pythae: Unifying Generative Autoencoders in Python - A Benchmarking Use Case'. All credit for this research goes to the researchers on this project. Check out the paper and GitHub repository.