[Part2] Sketch2Pokemon - Building the Generator | Pix2Pix

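Building the Generator

From here on, we implement the Pix2Pix architecture in earnest.

As with the cGAN earlier, we will build the model using TensorFlow's Subclassing approach.

Understanding the Generator's components

First, the image below shows the information from the Pix2Pix paper that is needed to build the Generator. Let's read through it.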

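Question
What combination of layers and hyperparameters does the encoder's "C64" in the paper denote?

A Convolution with 64 filters of size 4x4 and stride 2 → LeakyReLU with slope 0.2 (the lower paragraph of the paper excerpt above states that BatchNorm is not applied to this first layer)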

Question
What combination of layers and hyperparameters does the decoder's "CD512" in the paper denote?

A (Transposed) Convolution with 512 filters of size 4x4 and stride 2 → BatchNorm → 50% Dropout → ReLU
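To make the notation concrete, here is a minimal sketch that expands a token such as "C64" or "CD512" into the layer sequence described in the two answers above. The function name parse_block and its encoder/first arguments are made up for this illustration and appear neither in the paper nor in the code later in this post.

```python
import re


def parse_block(token, encoder=True, first=False):
    """Expand a token like "C64" or "CD512" into a list of layer descriptions.

    Ck  : Convolution -> BatchNorm -> ReLU with k filters
    CDk : Convolution -> BatchNorm -> Dropout(50%) -> ReLU with k filters
    `encoder` and `first` are hypothetical flags for this sketch only.
    """
    match = re.fullmatch(r"(CD|C)(\d+)", token)
    if match is None:
        raise ValueError(f"unrecognized block token: {token!r}")
    kind, k = match.group(1), int(match.group(2))

    # Encoder blocks downsample with a convolution, decoder blocks upsample
    # with a transposed convolution; both use 4x4 filters with stride 2.
    conv = "Conv" if encoder else "ConvTranspose"
    spec = [f"{conv}(filters={k}, kernel=4, stride=2)"]
    if not first:
        spec.append("BatchNorm")  # skipped only for the first encoder block
    if kind == "CD":
        spec.append("Dropout(0.5)")
    # Encoder activations are leaky (slope 0.2); decoder activations are plain ReLU.
    spec.append("LeakyReLU(0.2)" if encoder else "ReLU")
    return spec


print(parse_block("C64", encoder=True, first=True))
print(parse_block("CD512", encoder=False))
```

The first call prints the two-layer answer for "C64", and the second prints the four-layer answer for "CD512".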

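In the figure above, the sizes written on either side of each ENCODE or DECODE block are that block's input and output sizes.

Following the top row of arrows from the input marked "in", the (width, height) of the result is halved at each step until it reaches (1, 1), while the number of channels grows to 512. Everything from the first input to the point that outputs a (1, 1, 512) tensor is the Encoder.

Following the bottom row of arrows, (width, height) doubles at each step back up to (256, 256), and the number of channels shrinks until it reaches 3, matching the original input. The operations from the (1, 1, 512) input to the final output make up the Decoder.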

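Implementing the Generator

Based on the information above, let's implement the Encoder part of the Generator.

First, following the paper's "C64", "C128", and so on, the basic block of three layers, "Convolution → BatchNorm → LeakyReLU", is wrapped into a single layer as shown below.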

from tensorflow.keras import layers, Input, Model

class EncodeBlock(layers.Layer):
    def __init__(self, n_filters, use_bn=True):
        super(EncodeBlock, self).__init__()
        self.use_bn = use_bn
        # 4x4 filters, stride 2, "same" padding: halves width and height
        self.conv = layers.Conv2D(n_filters, 4, 2, "same", use_bias=False)
        self.batchnorm = layers.BatchNormalization()
        self.lrelu = layers.LeakyReLU(0.2)

    def call(self, x):
        x = self.conv(x)
        if self.use_bn:
            x = self.batchnorm(x)
        return self.lrelu(x)

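The __init__() method takes n_filters and use_bn, which set the number of filters and whether BatchNorm is used.

The remaining settings are the same everywhere, so they are hard-coded: the Convolution filter size (= 4) and stride (= 2), and the slope coefficient of the LeakyReLU activation (= 0.2).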
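As a quick piece of arithmetic on these fixed settings: with "same" padding, a stride-2 convolution produces an output of size ceil(input / 2), regardless of the filter size. The sketch below counts how many such blocks it takes to shrink a 256-pixel side down to 1. The helper names are made up for this illustration.

```python
import math


def same_padding_output(size, stride=2):
    """Spatial output size of a strided convolution with "same" padding."""
    return math.ceil(size / stride)


def blocks_until_1x1(size, stride=2):
    """Count how many strided blocks are needed to shrink `size` down to 1."""
    count = 0
    while size > 1:
        size = same_padding_output(size, stride)
        count += 1
    return count


print(same_padding_output(256))  # 128
print(blocks_until_1x1(256))     # 8
```

This is why an Encoder that maps a 256x256 input down to 1x1 needs eight of these blocks.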

With the basic block in place, the Encoder can be built easily by stacking this block several times, as shown below.

class Encoder(layers.Layer):
    def __init__(self):
        super(Encoder, self).__init__()
        filters = [64,128,256,512,512,512,512,512]

        self.blocks = []
        for i, f in enumerate(filters):
            if i == 0:
                self.blocks.append(EncodeBlock(f, use_bn=False))
            else:
                self.blocks.append(EncodeBlock(f))

    def call(self, x):
        for block in self.blocks:
            x = block(x)
        return x

    def get_summary(self, input_shape=(256,256,3)):
        inputs = Input(input_shape)
        return Model(inputs, self.call(inputs)).summary()

The number of filters for each block is specified in the filters list, the blocks themselves are stored in the blocks list, and the call() method passes the input through them in order.

The first Encoder block does not use BatchNorm.

get_summary is a separate helper for checking that the layers are wired up correctly. Let's see what output size the Encoder above produces for a (256, 256, 3) input by calling the get_summary method directly.

Encoder().get_summary()

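With each block the (width, height) size is halved and the number of filters grows to a maximum of 512, so the final output has the expected size of (1, 1, 512).

Now let's implement the Decoder. As with the Encoder, we define a basic block and repeat it several times to build the Decoder, as shown below.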

class DecodeBlock(layers.Layer):
    def __init__(self, f, dropout=True):
        super(DecodeBlock, self).__init__()
        self.dropout = dropout
        # 4x4 filters, stride 2, "same" padding: doubles width and height
        self.Transconv = layers.Conv2DTranspose(f, 4, 2, "same", use_bias=False)
        self.batchnorm = layers.BatchNormalization()
        # Create the Dropout layer once here instead of on every call()
        self.dropout_layer = layers.Dropout(0.5)
        self.relu = layers.ReLU()

    def call(self, x):
        x = self.Transconv(x)
        x = self.batchnorm(x)
        if self.dropout:
            x = self.dropout_layer(x)
        return self.relu(x)


class Decoder(layers.Layer):
    def __init__(self):
        super(Decoder, self).__init__()
        filters = [512,512,512,512,256,128,64]

        self.blocks = []
        for i, f in enumerate(filters):
            if i < 3:
                self.blocks.append(DecodeBlock(f))
            else:
                self.blocks.append(DecodeBlock(f, dropout=False))

        self.blocks.append(layers.Conv2DTranspose(3, 4, 2, "same", use_bias=False))

    def call(self, x):
        for block in self.blocks:
            x = block(x)
        return x

    def get_summary(self, input_shape=(1,1,512)):
        inputs = Input(input_shape)
        return Model(inputs, self.call(inputs)).summary()

Only the first three blocks use Dropout, and the final convolution uses 3 filters to produce the output. As before, let's check what size comes out when (1, 1, 512) data goes in.

Decoder().get_summary()

(width, height) grows step by step while the number of filters shrinks, and the final output has the expected size of (256, 256, 3).

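Connecting the Encoder and Decoder built above, the Encoder turns a (256, 256, 3) input into (1, 1, 512), and the Decoder turns that back into a (256, 256, 3) result, the same size as the original input. This is the computation by which a sketch input yields a colorized image output.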
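The shape flow described above can be checked by hand without TensorFlow. The sketch below is only shape bookkeeping under the assumption that every block uses stride 2 with "same" padding, as in the blocks defined earlier. trace_shapes is a made-up helper, and the last entry of decoder_filters stands for the final 3-filter transposed convolution.

```python
def trace_shapes(input_shape, filters, upsample=False):
    """Return the (height, width, channels) shape after each stride-2 block."""
    h, w, _ = input_shape
    shapes = [input_shape]
    for f in filters:
        if upsample:
            h, w = h * 2, w * 2              # transposed conv doubles the size
        else:
            h, w = -(-h // 2), -(-w // 2)    # conv halves the size (ceil division)
        shapes.append((h, w, f))
    return shapes


encoder_filters = [64, 128, 256, 512, 512, 512, 512, 512]
decoder_filters = [512, 512, 512, 512, 256, 128, 64, 3]  # last 3 = output conv

encoder_shapes = trace_shapes((256, 256, 3), encoder_filters)
decoder_shapes = trace_shapes(encoder_shapes[-1], decoder_filters, upsample=True)

for shape in encoder_shapes + decoder_shapes[1:]:
    print(shape)
```

The printed list starts at (256, 256, 3), bottoms out at (1, 1, 512) after eight encoder blocks, and returns to (256, 256, 3) after the decoder.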

As in the code below, let's subclass tf.keras.Model and connect the Encoder and Decoder to form the Generator.

class EncoderDecoderGenerator(Model):
    def __init__(self):
        super(EncoderDecoderGenerator, self).__init__()
        self.encoder = Encoder()
        self.decoder = Decoder()

    def call(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

    def get_summary(self, input_shape=(256,256,3)):
        inputs = Input(input_shape)
        return Model(inputs, self.call(inputs)).summary()


EncoderDecoderGenerator().get_summary()

Looking at the model summary, getting the Generator to work well means training about 40 million parameters... ⛹
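The "about 40 million" figure can be roughly reproduced by hand. The sketch below is a back-of-the-envelope count, not output copied from summary(). It assumes 4x4 filters without bias, 4 values per channel for each BatchNorm (gamma, beta, moving mean, moving variance), and that the unused BatchNorm in the first Encoder block contributes no weights. The helper names are made up for this illustration.

```python
def conv_params(c_in, c_out, kernel=4):
    """Weights of a (transposed) convolution with use_bias=False."""
    return kernel * kernel * c_in * c_out


def bn_params(channels):
    """gamma, beta, moving mean and moving variance for each channel."""
    return 4 * channels


def count_params(c_in, filters, bn_flags):
    """Sum the parameters of a stack of conv (+ optional BatchNorm) blocks."""
    total = 0
    for c_out, use_bn in zip(filters, bn_flags):
        total += conv_params(c_in, c_out)
        if use_bn:
            total += bn_params(c_out)
        c_in = c_out
    return total


encoder_filters = [64, 128, 256, 512, 512, 512, 512, 512]
encoder_bn = [False] + [True] * 7          # first block skips BatchNorm
decoder_filters = [512, 512, 512, 512, 256, 128, 64, 3]
decoder_bn = [True] * 7 + [False]          # final output conv has no BatchNorm

encoder_total = count_params(3, encoder_filters, encoder_bn)
decoder_total = count_params(512, decoder_filters, decoder_bn)
print(encoder_total, decoder_total, encoder_total + decoder_total)
# 19544576 19542784 39087360
```

Under these assumptions the total comes to roughly 39.1 million, which is where the "about 40 million" comes from. The moving mean and variance of BatchNorm are counted here but are not trained by gradient descent.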

In the next post, we will look at how to rebuild the Generator with U-Net. 😀
