Understanding Keras LSTMs

hot-time 2020. 4. 6. 08:12

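I am trying to reconcile my understanding of LSTMs, as explained in this post by Christopher Olah, with their implementation in Keras. I am following the blog written by Jason Brownlee for the Keras tutorial. What I am mainly confused about is:

The reshaping of the data series into [samples, time steps, features], and
The stateful LSTMs

Let's concentrate on the above two questions with reference to the code pasted below: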

import numpy
from keras.models import Sequential
from keras.layers import Dense, LSTM

# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], look_back, 1))
testX = numpy.reshape(testX, (testX.shape[0], look_back, 1))
########################
# The IMPORTANT BIT
##########################
# create and fit the LSTM network
batch_size = 1
model = Sequential()
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(100):
    # epochs is the Keras 2 name of the Keras 1 argument nb_epoch
    model.fit(trainX, trainY, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()

Note: create_dataset takes a sequence of length N and returns an array of N-look_back elements, each of which is a sequence of length look_back.
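For readers without the tutorial at hand, here is a minimal sketch of what a helper like create_dataset might look like. It is written only from the note above, so the body is an assumption (the tutorial's own version may differ, for example in the exact number of windows it returns), and the toy series is made up.

```python
import numpy as np

def create_dataset(series, look_back=1):
    """Split a 1-D series into windows of length look_back plus next-step targets.

    Hypothetical implementation based on the note above, not the tutorial's code.
    """
    series = np.asarray(series, dtype=float)
    n_windows = len(series) - look_back
    X = np.array([series[i:i + look_back] for i in range(n_windows)])
    y = series[look_back:]          # the value that follows each window
    return X, y

series = np.arange(10)              # toy data: 0, 1, ..., 9
X, y = create_dataset(series, look_back=3)

# Same reshape as in the question: [samples, time steps, features]
X3d = X.reshape(X.shape[0], 3, 1)
```

With N=10 and look_back=3 this gives 7 windows, so X3d has shape (7, 3, 1): 7 samples, 3 time steps, 1 feature.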

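What are time steps and features?

As can be seen, trainX is a 3-D array with time_steps and features being the last two dimensions respectively (3 and 1 in this particular code). With respect to the image below, does this mean that we are considering the many to one case, where the number of pink boxes is 3? Or does it literally mean the chain length is 3 (i.e. only 3 green boxes are considered)?

Does the features argument become relevant when we consider multivariate series, e.g. modelling two financial stocks simultaneously?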

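Stateful LSTMs

Does stateful LSTM mean that we save the cell memory values between runs of batches? If so, batch_size is one and the memory is reset between the training runs, so what was the point of saying that it was stateful? I'm guessing this is related to the fact that the training data is not shuffled, but I'm not sure how.

Any thoughts? Image reference: http://karpathy.github.io/2015/05/21/rnn-effectiveness/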

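Edit 1:

I am a bit confused about @van's comment that the red and green boxes are equal. So just to confirm, do the following API calls correspond to the unrolled diagrams? Note the second diagram in particular (batch_size was chosen arbitrarily).

Edit 2:

For people who have done Udacity's deep learning course and are still confused about the time_step argument, look at the following discussion: https://discussions.udacity.com/t/rnn-lstm-use-implementation/163169

Update 2:

I have summarized most of my understanding of LSTMs here: https://www.youtube.com/watch?v=ywinX5wgdEU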

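First of all, you chose great tutorials (1, 2) to start with.

What time-step means: Time-steps==3 in X.shape (which describes the data shape) means there are three pink boxes. Since in Keras each step requires an input, the number of green boxes should usually equal the number of red boxes, unless you hack the structure.

Many to many vs. many to one: in Keras there is a return_sequences parameter when you initialize LSTM, GRU or SimpleRNN. When return_sequences is False (the default), it is many to one, as shown in the picture. The return shape is (batch_size, hidden_unit_length), which represents the last state. When return_sequences is True, it is many to many. The return shape is (batch_size, time_step, hidden_unit_length).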
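The two return shapes can be illustrated without Keras. The sketch below is a toy tanh RNN in NumPy, not Keras's LSTM; the function name simple_rnn, the weights and the sizes are all assumptions chosen for illustration. It only mirrors what the return_sequences switch does to the output shape.

```python
import numpy as np

def simple_rnn(x, w_in, w_rec, return_sequences=False):
    """Toy tanh RNN over x of shape (batch_size, time_step, features)."""
    batch, steps, _ = x.shape
    units = w_rec.shape[0]
    h = np.zeros((batch, units))           # initial state
    outputs = []
    for t in range(steps):
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)   # many to many: (batch, steps, units)
    return h                               # many to one: (batch, units)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 1))             # batch_size=2, time_step=3, features=1
w_in = rng.normal(size=(1, 4))             # hidden_unit_length=4
w_rec = rng.normal(size=(4, 4))

last = simple_rnn(x, w_in, w_rec)
full = simple_rnn(x, w_in, w_rec, return_sequences=True)
```

Here last has shape (2, 4) and full has shape (2, 3, 4); the last step of full is the same array as last.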

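Does the features argument become relevant? The features argument means "how big is your red box", i.e. the input dimension at each step. If you want to predict from, say, 8 kinds of market information, you can generate your data with feature==8.

Stateful: you can look up the source code. When the state is initialized, if stateful==True the state from the last training run is used as the initial state; otherwise a new state is generated. I haven't turned on stateful yet. However, I disagree that batch_size can only be 1 when stateful==True.

Currently, you generate your data from data that has already been collected. Imagine your stock information arrives as a stream: rather than waiting a day to collect every sequence, you would like to generate input data online while training or predicting with the network. If you have 400 stocks sharing the same network, you can set batch_size==400.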

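As a complement to the accepted answer, this answer shows Keras behaviors and how to achieve each picture.

General Keras behavior

The standard Keras internal processing is always many to many, as in the following picture (where I used features=2, pressure and temperature, just as an example):

In this image, I increased the number of steps to 5, to avoid confusion with the other dimensions.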

For this example:

We have N oil tanks
We spent 5 hours taking measures hourly (time steps)
We measured two features:
- Pressure P
- Temperature T

Our input array should then be something shaped as (N,5,2):

        [     Step1      Step2      Step3      Step4      Step5
Tank A:    [[Pa1,Ta1], [Pa2,Ta2], [Pa3,Ta3], [Pa4,Ta4], [Pa5,Ta5]],
Tank B:    [[Pb1,Tb1], [Pb2,Tb2], [Pb3,Tb3], [Pb4,Tb4], [Pb5,Tb5]],
  ....
Tank N:    [[Pn1,Tn1], [Pn2,Tn2], [Pn3,Tn3], [Pn4,Tn4], [Pn5,Tn5]],
        ]
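As a rough illustration of that layout, the sketch below builds such an array in NumPy. The number of tanks and the random readings are made up; only the (N, 5, 2) shape and the [P, T] ordering come from the text.

```python
import numpy as np

n_tanks, steps = 3, 5                      # N = 3 is an arbitrary choice
rng = np.random.default_rng(0)

# One row per tank, one column per hourly measurement (fake values)
pressure = rng.uniform(1.0, 2.0, size=(n_tanks, steps))
temperature = rng.uniform(20.0, 30.0, size=(n_tanks, steps))

# Put the two features side by side on a new last axis: (N, 5, 2)
inputs = np.stack([pressure, temperature], axis=-1)
```

inputs[a, s] is then the pair [P, T] of tank a at step s.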

Inputs for sliding windows

LSTM layers are usually meant to process entire sequences, so dividing a sequence into windows may not be the best idea. The layer keeps internal states about how a sequence is evolving as it steps forward. Windows eliminate the possibility of learning long sequences, limiting all sequences to the window size.

With windows, each window is part of one long original sequence, but Keras will see each of them as an independent sequence:

        [     Step1    Step2    Step3    Step4    Step5
Window  A:  [[P1,T1], [P2,T2], [P3,T3], [P4,T4], [P5,T5]],
Window  B:  [[P2,T2], [P3,T3], [P4,T4], [P5,T5], [P6,T6]],
Window  C:  [[P3,T3], [P4,T4], [P5,T5], [P6,T6], [P7,T7]],
  ....
        ]

Notice that in this case you initially have only one sequence, but you're dividing it into many sequences to create windows.
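A minimal sketch of that windowing in NumPy, assuming a single sequence of 7 steps with two features. The helper name sliding_windows and the fake [P, T] values are illustrative only.

```python
import numpy as np

def sliding_windows(sequence, window):
    """Cut one (steps, features) sequence into overlapping windows.

    Returns an array of shape (n_windows, window, features).
    """
    n_windows = sequence.shape[0] - window + 1
    return np.stack([sequence[i:i + window] for i in range(n_windows)])

step_ids = np.arange(1, 8)                               # steps 1..7
sequence = np.stack([step_ids, step_ids + 100], axis=-1)  # fake [P, T] pairs
windows = sliding_windows(sequence, window=5)             # windows A, B, C
```

The result has shape (3, 5, 2): window A covers steps 1 to 5, window B steps 2 to 6, and window C steps 3 to 7.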

The concept of "what is a sequence" is abstract. The important parts are:

you can have batches with many individual sequences
what makes the sequences be sequences is that they evolve in steps (usually time steps)

Achieving each case with "single layers"

Achieving standard many to many:

You can achieve many to many with a simple LSTM layer, using return_sequences=True:

outputs = LSTM(units, return_sequences=True)(inputs)

#output_shape -> (batch_size, steps, units)

Achieving many to one:

Using the exact same layer, keras will do the exact same internal processing, but when you use return_sequences=False (or simply omit this argument), keras will automatically discard all steps before the last:

outputs = LSTM(units)(inputs)

#output_shape -> (batch_size, units) --> steps were discarded, only the last was returned

Achieving one to many

Now, this is not supported by keras LSTM layers alone. You will have to create your own strategy to multiply the steps. There are two good approaches:

Create a constant multi-step input by repeating a tensor
Use stateful=True to recurrently take the output of one step and serve it as the input of the next step (needs output_features == input_features)

One to many with repeat vector

In order to fit to keras standard behavior, we need inputs in steps, so, we simply repeat the inputs for the length we want:

outputs = RepeatVector(steps)(inputs) #where inputs is (batch,features)
outputs = LSTM(units,return_sequences=True)(outputs)

#output_shape -> (batch_size, steps, units)
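What the repeat step does to the shape can be sketched in NumPy. The function repeat_vector is a hypothetical stand-in that only mimics the shape behavior described above; it is not the Keras layer itself.

```python
import numpy as np

def repeat_vector(x, steps):
    """(batch, features) -> (batch, steps, features), copying x at every step."""
    return np.repeat(x[:, np.newaxis, :], steps, axis=1)

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])          # batch=2, features=2
repeated = repeat_vector(x, steps=3)
```

Every one of the 3 steps of repeated[0] is [1.0, 2.0], so the recurrent layer that follows receives the same input at each step.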

Understanding stateful = True

Now comes one of the possible usages of stateful=True (besides avoiding loading data that can't fit your computer's memory at once)

Stateful allows us to input "parts" of the sequences in stages. The difference is:

In stateful=False, the second batch contains whole new sequences, independent from the first batch
In stateful=True, the second batch continues the first batch, extending the same sequences.

It's like dividing the sequences into windows too, with these two main differences:

these windows do not overlap!
stateful=True will see these windows connected as a single long sequence

In stateful=True, every new batch will be interpreted as continuing the previous batch (until you call model.reset_states()).

Sequence 1 in batch 2 will continue sequence 1 in batch 1.
Sequence 2 in batch 2 will continue sequence 2 in batch 1.
Sequence n in batch 2 will continue sequence n in batch 1.

Example of inputs, batch 1 contains steps 1 and 2, batch 2 contains steps 3 to 5:

                   BATCH 1                           BATCH 2
        [     Step1      Step2        |    [    Step3      Step4      Step5
Tank A:    [[Pa1,Ta1], [Pa2,Ta2],     |       [Pa3,Ta3], [Pa4,Ta4], [Pa5,Ta5]],
Tank B:    [[Pb1,Tb1], [Pb2,Tb2],     |       [Pb3,Tb3], [Pb4,Tb4], [Pb5,Tb5]],
  ....                                |
Tank N:    [[Pn1,Tn1], [Pn2,Tn2],     |       [Pn3,Tn3], [Pn4,Tn4], [Pn5,Tn5]],
        ]                                  ]

Notice the alignment of tanks in batch 1 and batch 2! That's why we need shuffle=False (unless we are using only one sequence, of course).
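The effect of carrying states between batches can be checked with a toy recurrence. The sketch below uses a plain tanh RNN in NumPy rather than a Keras LSTM; rnn_chunk, the weights and the sizes are assumptions. It shows that feeding steps 1-2 and then steps 3-5 while keeping the state gives the same final state as feeding all 5 steps at once.

```python
import numpy as np

def rnn_chunk(x, w_in, w_rec, state):
    """Run a toy tanh RNN over one chunk, starting from `state`.

    x has shape (batch, steps, features); returns the final state.
    """
    for t in range(x.shape[1]):
        state = np.tanh(x[:, t, :] @ w_in + state @ w_rec)
    return state

rng = np.random.default_rng(0)
tanks, steps, features, units = 3, 5, 2, 4
sequences = rng.normal(size=(tanks, steps, features))
w_in = rng.normal(size=(features, units))
w_rec = rng.normal(size=(units, units))
zeros = np.zeros((tanks, units))

# All 5 steps in a single batch
whole = rnn_chunk(sequences, w_in, w_rec, zeros)

# "stateful": batch 1 holds steps 1-2, batch 2 holds steps 3-5, state is kept
state_after_batch1 = rnn_chunk(sequences[:, :2], w_in, w_rec, zeros)
stateful = rnn_chunk(sequences[:, 2:], w_in, w_rec, state_after_batch1)

# "stateless": batch 2 starts again from a zero state
stateless = rnn_chunk(sequences[:, 2:], w_in, w_rec, zeros)
```

Row i of the kept state belongs to tank i, which is why the rows of batch 2 must stay in the same order as those of batch 1.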

You can have any number of batches, indefinitely. (For variable lengths in each batch, use input_shape=(None,features).)

One to many with stateful=True

For our case here, we are going to use only 1 step per batch, because we want to get one output step and make it be an input.

Please notice that the behavior in the picture is not "caused by" stateful=True. We will force that behavior in a manual loop below. In this example, stateful=True is what "allows" us to stop the sequence, manipulate what we want, and continue from where we stopped.

Honestly, the repeat approach is probably a better choice for this case. But since we're looking into stateful=True, this is a good example. The best way to use this is the next "many to many" case.

Layer:

outputs = LSTM(units=features, 
               stateful=True, 
               return_sequences=True, #just to keep a nice output shape even with length 1
               input_shape=(None,features))(inputs) 
    #units = features because we want to use the outputs as inputs
    #None because we want variable length
    #stateful layers need a fixed batch size, e.g. inputs = Input(batch_shape=(batch,None,features))

#output_shape -> (batch_size, steps, units)

Now, we're going to need a manual loop for predictions:

input_data = someDataWithShape((batch, 1, features))

#important, we're starting new sequences, not continuing old ones:
model.reset_states()

output_sequence = []
last_step = input_data
for i in range(steps_to_predict): #steps_to_predict is the number of new steps wanted

    new_step = model.predict(last_step)
    output_sequence.append(new_step)
    last_step = new_step

#end of the sequences
model.reset_states()

Many to many with stateful=True

Now, here, we get a very nice application: given an input sequence, try to predict its future unknown steps.

We're using the same method as in the "one to many" above, with the difference that:

we will use the sequence itself to be the target data, one step ahead
we know part of the sequence (so we discard this part of the results).

Layer (same as above):

outputs = LSTM(units=features, 
               stateful=True, 
               return_sequences=True, 
               input_shape=(None,features))(inputs) 
    #units = features because we want to use the outputs as inputs
    #None because we want variable length

#output_shape -> (batch_size, steps, units)

Training:

We are going to train our model to predict the next step of the sequences:

totalSequences = someSequencesShaped((batch, steps, features))
    #batch size is usually 1 in these cases (often you have only one Tank in the example)

X = totalSequences[:,:-1] #the entire known sequence, except the last step
Y = totalSequences[:,1:] #one step ahead of X

#loop for resetting states at the start/end of the sequences:
for epoch in range(epochs):
    model.reset_states()
    model.train_on_batch(X,Y)
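The one-step shift between X and Y is easy to verify on a toy array. In this sketch the sequence values are simply the step numbers 1 to 6, an assumption made so that the alignment is visible.

```python
import numpy as np

# One sequence, 6 steps, 1 feature; the values are the step numbers
totalSequences = np.arange(1, 7, dtype=float).reshape(1, 6, 1)

X = totalSequences[:, :-1]   # steps 1..5
Y = totalSequences[:, 1:]    # steps 2..6, i.e. X shifted one step ahead
```

At every position Y holds the step that follows the one in X, so the model is trained to output step t+1 when it sees step t.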

Predicting:

The first stage of our predicting involves "adjusting the states". That's why we're going to predict the entire sequence again, even if we already know this part of it:

model.reset_states() #starting a new sequence
predicted = model.predict(totalSequences)
firstNewStep = predicted[:,-1:] #the last step of the predictions is the first future step

Now we go to the loop as in the one to many case. But don't reset states here! We want the model to know in which step of the sequence it is (and it knows it's at the first new step because of the prediction we just made above).

output_sequence = [firstNewStep]
last_step = firstNewStep
for i in range(steps_to_predict): #steps_to_predict is the number of new steps wanted

    new_step = model.predict(last_step)
    output_sequence.append(new_step)
    last_step = new_step

#end of the sequences
model.reset_states()
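To try the warm-up and the feedback loop without training anything, the sketch below swaps the Keras model for a hypothetical ToyStatefulModel with a made-up recurrence. Only the calling pattern (reset_states, predict on the known part, then feeding each output back in) follows the text; the class and its arithmetic are assumptions.

```python
import numpy as np

class ToyStatefulModel:
    """Hypothetical stand-in for a stateful model: the state survives
    between predict() calls until reset_states() is called."""

    def __init__(self, features):
        self.features = features
        self.reset_states()

    def reset_states(self):
        self.state = np.zeros(self.features)

    def predict(self, x):
        # x: (batch=1, steps, features); the output has the same shape
        out = []
        for step in x[0]:
            self.state = 0.5 * self.state + step   # made-up recurrence
            out.append(self.state.copy())
        return np.array(out)[np.newaxis]

model = ToyStatefulModel(features=1)
known = np.ones((1, 4, 1))            # the part of the sequence we know

model.reset_states()                  # starting a new sequence
predicted = model.predict(known)      # adjusts the states
first_new_step = predicted[:, -1:]    # first future step

output_sequence = [first_new_step]
last_step = first_new_step
for _ in range(3):                    # 3 more future steps, no reset in between
    new_step = model.predict(last_step)
    output_sequence.append(new_step)
    last_step = new_step

model.reset_states()                  # end of the sequence
future = np.concatenate(output_sequence, axis=1)
```

future has shape (1, 4, 1): the first new step plus the three steps produced by feeding each output back as the next input.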

This approach was used in these answers and file:

Achieving complex configurations

In all examples above, I showed the behavior of "one layer".

You can, of course, stack many layers on top of each other, not necessarily all following the same pattern, and create your own models.

One interesting example that has been appearing is the "autoencoder" that has a "many to one encoder" followed by a "one to many" decoder:

Encoder:

inputs = Input((steps,features))

#a few many to many layers:
outputs = LSTM(hidden1,return_sequences=True)(inputs)
outputs = LSTM(hidden2,return_sequences=True)(outputs)    

#many to one layer:
outputs = LSTM(hidden3)(outputs)

encoder = Model(inputs,outputs)

Decoder:

Using the "repeat" method:

inputs = Input((hidden3,))

#repeat to make one to many:
outputs = RepeatVector(steps)(inputs)

#a few many to many layers:
outputs = LSTM(hidden4,return_sequences=True)(outputs)

#last layer
outputs = LSTM(features,return_sequences=True)(outputs)

decoder = Model(inputs,outputs)

Autoencoder:

inputs = Input((steps,features))
outputs = encoder(inputs)
outputs = decoder(outputs)

autoencoder = Model(inputs,outputs)

Train with fit(X,X)

Additional explanations

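If you want more details about how steps are calculated in LSTMs, or about the stateful=True cases above, you can read more in this answer: Doubts regarding `Understanding Keras LSTMs`

If the last RNN layer uses return_sequences=True, you cannot use a simple Dense layer after it; use TimeDistributed instead. (In recent Keras versions a plain Dense applied to a 3-D input also acts on the last axis only, so the two give the same per-step result.)

Here is an example piece of code that might help others.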

    # Method body: assumes `import keras` and that self holds the sizes used below.
    words = keras.layers.Input(batch_shape=(None, self.maxSequenceLength), name="input")

    # Build a matrix of size vocabularySize x EmbeddingDimension
    # where each row corresponds to a "word embedding" vector.
    # This layer will replace each word-id with a word-vector of size EmbeddingDimension.
    embeddings = keras.layers.Embedding(self.vocabularySize, self.EmbeddingDimension,
        name="embeddings")(words)
    # Pass the word-vectors to the first GRU layer.
    # We are setting the hidden-state size to 512.
    # The output will be batchSize x maxSequenceLength x hiddenStateSize
    hiddenStates = keras.layers.GRU(512, return_sequences=True,
                                    input_shape=(self.maxSequenceLength,
                                                 self.EmbeddingDimension),
                                    name="rnn")(embeddings)
    hiddenStates2 = keras.layers.GRU(128, return_sequences=True,
                                     input_shape=(self.maxSequenceLength, self.EmbeddingDimension),
                                     name="rnn2")(hiddenStates)

    # Apply the same Dense and softmax at every time step.
    denseOutput = keras.layers.TimeDistributed(keras.layers.Dense(self.vocabularySize),
        name="linear")(hiddenStates2)
    predictions = keras.layers.TimeDistributed(keras.layers.Activation("softmax"),
        name="softmax")(denseOutput)

    # Build the computational graph by specifying the input, and output of the network.
    # (Keras 2 names these arguments inputs/outputs.)
    model = keras.models.Model(inputs=words, outputs=predictions)
    # model.compile(loss='kullback_leibler_divergence',
    model.compile(loss='sparse_categorical_crossentropy',
        optimizer=keras.optimizers.Adam(lr=0.009,
            beta_1=0.9,
            beta_2=0.999,
            epsilon=None,
            decay=0.01,
            amsgrad=False))
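What "the same Dense layer at every time step" amounts to can be sketched in NumPy. The shapes and random weights below are assumptions for illustration; the point is only that one kernel and one bias are shared by all steps.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, units, vocab = 2, 5, 8, 3
hidden = rng.normal(size=(batch, steps, units))   # RNN output with return_sequences=True
kernel = rng.normal(size=(units, vocab))          # one shared weight matrix
bias = np.zeros(vocab)

# Apply the same weights to each time step separately, then restack
per_step = np.stack([hidden[:, t] @ kernel + bias for t in range(steps)], axis=1)

# Equivalent single matmul over the last axis
batched = hidden @ kernel + bias
```

Both arrays have shape (batch, steps, vocab) and hold the same values, one output vector per time step.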

Reference URL: https://stackoverflow.com/questions/38714959/understanding-keras-lstms
