[AI] University Admission Prediction Program


1. Admission prediction program

- Uses Python 3 and Jupyter

- The data file (gpascore.csv) consists of admit (the y value) and gre, gpa, rank (the x values).

2. Reading the data

import pandas as pd

# Load the admission data (admit, gre, gpa, rank)
data = pd.read_csv('gpascore.csv')

data.head()
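Before preprocessing, it can help to check how many values are missing in each column and how the admit labels are distributed. The sketch below assumes the columns are named `admit`, `gre`, `gpa`, and `rank` as described above; the helper name `inspect_scores` and the toy rows are hypothetical, made up only for illustration.

```python
import numpy as np
import pandas as pd


def inspect_scores(df: pd.DataFrame) -> dict:
    """Hypothetical helper: per-column null counts and admit label counts."""
    null_counts = df.isnull().sum().to_dict()
    # value_counts() skips NaN by default, so only valid labels are counted
    label_counts = df["admit"].value_counts().to_dict()
    return {"nulls": null_counts, "labels": label_counts}


# Toy rows invented for illustration; the real data comes from gpascore.csv
toy = pd.DataFrame({
    "admit": [0, 1, 1, 0],
    "gre": [380.0, 660.0, np.nan, 640.0],
    "gpa": [3.21, 3.67, 4.00, 3.19],
    "rank": [3, 3, 1, 4],
})
summary = inspect_scores(toy)
print(summary["nulls"])   # gre has one missing value
print(summary["labels"])  # two rejections, two admissions
```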

3. Data preprocessing

1) Remove null data

2) Build the x data and the y data

3) Normalize the data (divide by the max value)

4) Split into training/validation data

Let's start preprocessing!
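The first three steps can be sketched with NumPy alone. This is a minimal sketch, assuming the raw data is a 2-D float array whose columns are ordered admit, gre, gpa, rank; the function name `preprocess` and the example rows are hypothetical, not from the original post.

```python
import numpy as np


def preprocess(raw: np.ndarray):
    """Hypothetical helper: drop NaN rows, split into X/Y, divide X by column maxima."""
    # 1) Remove every row that contains at least one NaN
    clean = raw[~np.isnan(raw).any(axis=1)]
    # 2) Column 0 is the label (admit); columns 1-3 are the features (gre, gpa, rank)
    Y = clean[:, 0]
    X = clean[:, 1:]
    # 3) Normalize: divide each feature column by its maximum value
    X_max = X.max(axis=0)
    return X / X_max, Y, X_max


# Invented example rows: admit, gre, gpa, rank
raw = np.array([
    [0, 380, 3.61, 3],
    [1, 800, 4.00, 1],
    [1, np.nan, 3.50, 2],
    [0, 400, 2.00, 4],
])
X, Y, X_max = preprocess(raw)
print(X.shape)  # (3, 3): the row with the missing gre was dropped
print(X_max)    # column maxima: gre 800, gpa 4.0, rank 4
```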

<Code>

If null values are not removed, the neural network will not train properly, so be sure to drop them before you start.
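To see why missing values break training: any arithmetic involving NaN yields NaN, so a single missing feature contaminates a neuron's weighted sum and, in turn, the loss. A small NumPy illustration; the weights are arbitrary made-up numbers, not values from the trained model.

```python
import numpy as np

# Arbitrary illustrative weights for one neuron with three inputs (gre, gpa, rank)
weights = np.array([0.2, -0.5, 0.1])
row_ok = np.array([0.475, 0.8025, 0.75])
row_missing = np.array([np.nan, 0.8025, 0.75])

# A neuron's pre-activation is the weighted sum of its inputs
z_ok = row_ok @ weights
z_missing = row_missing @ weights

print(np.isnan(z_ok))       # False
print(np.isnan(z_missing))  # True: the NaN propagates through the sum
```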

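import numpy as np

# Remove rows that contain null values
data = data.dropna()

# y data: the admit column (the label)
Y = data['admit'].values

# x data: gre, gpa, rank for each row
X = []
for i, rows in data.iterrows():
    X.append([rows['gre'], rows['gpa'], rows['rank']])

# Normalize the x data by dividing each column by its maximum
X_max = np.max(X, axis=0)
X = X / X_max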

array([[0.475 , 0.8025, 0.75  ],
       [0.825 , 0.9175, 0.75  ],
       [1.    , 1.    , 0.25  ],
       ...,
       [0.5625, 0.8125, 1.    ],
       [0.95  , 0.94  , 0.5   ],
       [0.8875, 0.955 , 0.75  ]])
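The loop above builds X one row at a time. An equivalent, vectorized sketch selects the three columns directly; it also keeps `X_max` so that a new applicant's scores can be scaled with the same training maxima. The DataFrame contents and the `new_applicant` values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for the cleaned gpascore.csv data
data = pd.DataFrame({
    "admit": [0, 1, 0],
    "gre": [380, 660, 800],
    "gpa": [3.21, 3.67, 4.00],
    "rank": [3, 3, 1],
})

# Select the feature columns at once instead of looping with iterrows()
X = data[["gre", "gpa", "rank"]].to_numpy(dtype=float)
X_max = X.max(axis=0)
X_scaled = X / X_max  # every column now has a maximum of 1.0

# New inputs must be divided by the *training* maxima, not their own
new_applicant = np.array([[700, 3.5, 2]], dtype=float)  # hypothetical applicant
new_scaled = new_applicant / X_max
```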

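from sklearn.model_selection import train_test_split

# Split into training/validation data (70% / 30%)
X_train, X_test, y_train, y_test = \
    train_test_split(X, Y,
                     test_size=0.3,
                     random_state=1,
                     stratify=Y)  # keep the admit/reject ratio the same in both sets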
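`stratify=Y` keeps the admitted/rejected ratio roughly the same in the training and validation sets. The sketch below is not scikit-learn's implementation; it is a simplified, hand-written stratified split (the name `stratified_split` is hypothetical) that shows the idea: split each class separately, then combine.

```python
import numpy as np


def stratified_split(X, Y, test_size=0.3, seed=1):
    """Hypothetical helper: split each class separately so both parts keep the class ratio."""
    rng = np.random.default_rng(seed)
    test_idx = []
    for label in np.unique(Y):
        idx = np.flatnonzero(Y == label)   # positions of this class
        rng.shuffle(idx)
        n_test = int(round(len(idx) * test_size))
        test_idx.extend(idx[:n_test])      # the same fraction of every class goes to test
    test_mask = np.zeros(len(Y), dtype=bool)
    test_mask[test_idx] = True
    return X[~test_mask], X[test_mask], Y[~test_mask], Y[test_mask]


# Made-up labels: 70 rejections (0) and 30 admissions (1)
Y = np.array([0] * 70 + [1] * 30)
X = np.arange(100, dtype=float).reshape(-1, 1)
X_train, X_test, y_train, y_test = stratified_split(X, Y)
print(y_train.mean(), y_test.mean())  # both about 0.3: the admit ratio is preserved
```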

4. Building the neural network

Factors that affected the performance of the neural network model:

- activation

- dropout

- callback functions

- number of hidden layers
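Of these factors, dropout is the least obvious. During training, `Dropout(rate=0.5)` randomly zeroes about half of the previous layer's outputs and scales the survivors by 1/(1 - rate) so the expected sum stays the same; at inference time it passes inputs through unchanged. The NumPy function below is a simplified sketch of that "inverted dropout" idea, not Keras's actual code; `dropout` is a hypothetical name.

```python
import numpy as np


def dropout(x, rate=0.5, training=True, seed=0):
    """Hypothetical inverted dropout: zero units at random in training, rescale the rest."""
    if not training or rate == 0.0:
        return x  # inference: pass inputs through unchanged
    rng = np.random.default_rng(seed)  # fixed seed only so the example is reproducible
    keep = rng.random(x.shape) >= rate  # each unit survives with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)


activations = np.ones(10_000)
out = dropout(activations, rate=0.5)
print((out == 0).mean())  # roughly 0.5 of the units are dropped
print(out.mean())         # roughly 1.0: the expected value is preserved
print(np.array_equal(dropout(activations, training=False), activations))  # True
```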

<Code>

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(rate=0.5),
    tf.keras.layers.Dense(1, activation='sigmoid')  # yes/no: outputs the probability of admission
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
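The output layer uses a sigmoid so the single unit produces a probability between 0 and 1, and `binary_crossentropy` scores that probability p against the 0/1 label y as -[y log p + (1 - y) log(1 - p)], averaged over samples. A NumPy sketch of both formulas; the clipping epsilon is an assumption added here to avoid log(0).

```python
import numpy as np


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # assumed epsilon: keeps log() finite
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))


y = np.array([1.0, 0.0, 1.0])
confident_right = sigmoid(np.array([4.0, -4.0, 4.0]))
confident_wrong = sigmoid(np.array([-4.0, 4.0, -4.0]))
print(binary_crossentropy(y, confident_right))  # small loss
print(binary_crossentropy(y, confident_wrong))  # large loss
```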

# Use a callback: stop early if val_loss has not improved for 20 epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=20)

# Train for up to 100 epochs
EPOCHS = 100
history = model.fit(np.array(X_train), np.array(y_train), epochs=EPOCHS,
                    validation_data=(np.array(X_test), np.array(y_test)),
                    callbacks=[early_stop])
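`EarlyStopping(monitor='val_loss', patience=20)` stops training once `val_loss` has failed to improve for 20 consecutive epochs, which is why a run set to 100 epochs can end early. The pure-Python function below is a simplified sketch of that patience rule (it ignores Keras options such as `min_delta` and `restore_best_weights`); `stopping_epoch` and the loss sequence are hypothetical.

```python
def stopping_epoch(val_losses, patience=20):
    """Hypothetical helper: 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss  # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1    # no improvement this epoch
            if wait >= patience:
                return epoch
    return None


# Hypothetical val_loss curve: improves for 5 epochs, then plateaus above the best value
losses = [1.0, 0.9, 0.8, 0.7, 0.6] + [0.65] * 30
print(stopping_epoch(losses, patience=20))  # 25
```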

loss: 0.4561 - accuracy: 0.7946 - val_loss: 0.5301 - val_accuracy: 0.7578
Epoch 76/100
297/297 [==============================] - 0s 245us/sample - loss: 0.4497 - accuracy: 0.8047 - val_loss: 0.5380 - val_accuracy: 0.7812

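5. Checking model performance

The result is a model with about 78% accuracy (the last logged epoch shows val_accuracy: 0.7812).

Let's check it with graphs.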
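The accuracy figure counts how often the thresholded prediction matches the label: a sigmoid output greater than 0.5 is read as "admitted". The sketch below reproduces that calculation with made-up predictions; `binary_accuracy` here is a hypothetical stand-in, not the Keras function itself.

```python
import numpy as np


def binary_accuracy(y_true, p_pred, threshold=0.5):
    """Hypothetical helper: fraction of samples whose thresholded prediction matches the label."""
    y_hat = (p_pred > threshold).astype(float)
    return float(np.mean(y_hat == y_true))


y_true = np.array([1, 0, 1, 1, 0], dtype=float)
p_pred = np.array([0.9, 0.2, 0.4, 0.7, 0.6])  # invented model outputs
print(binary_accuracy(y_true, p_pred))  # 0.6 (3 of 5 correct)
```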

import matplotlib.pyplot as plt

# Accuracy (red = training, blue = validation)
train_history = history.history["accuracy"]
test_history = history.history["val_accuracy"]
fig = plt.figure(figsize=(8, 8))
plt.title("Accuracy History")
plt.xlabel("EPOCH")
plt.ylabel("Accuracy")
plt.plot(train_history, "red", label="train")
plt.plot(test_history, "blue", label="validation")
plt.legend()
fig.savefig("train_history.png")

# Loss (red = training, blue = validation)
train_history = history.history["loss"]
test_history = history.history["val_loss"]
fig = plt.figure(figsize=(8, 8))
plt.title("Loss History")
plt.xlabel("EPOCH")
plt.ylabel("Loss Function")
plt.plot(train_history, "red", label="train")
plt.plot(test_history, "blue", label="validation")
plt.legend()
fig.savefig("loss_history.png")
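Besides reading the plots, the same `history.history` dictionary can be inspected numerically, for example to find the epoch with the lowest validation loss and the train/validation gap there (a widening gap is a typical sign of overfitting). A sketch assuming a dict with the same keys as in the plotting code; `best_epoch_summary` and the numbers are hypothetical.

```python
def best_epoch_summary(hist):
    """Hypothetical helper: (1-based epoch, val_loss, val_loss - loss) at the lowest val_loss."""
    val_loss = hist["val_loss"]
    best_idx = min(range(len(val_loss)), key=val_loss.__getitem__)
    gap = val_loss[best_idx] - hist["loss"][best_idx]
    return best_idx + 1, val_loss[best_idx], gap


# Invented history in the same shape as history.history
hist = {
    "loss": [0.69, 0.60, 0.52, 0.47, 0.44],
    "val_loss": [0.68, 0.58, 0.53, 0.54, 0.56],
}
epoch, best_val, gap = best_epoch_summary(hist)
print(epoch, best_val)  # 3 0.53
```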

TEST Dev Space
