How to Split Data (Building CV Folds)

It is best to build the CV splits ahead of time and freeze them.

1) chest (simple) preprocessing

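Logic for splitting the training dataset into CV folds

  • Splitting into CV folds and writing the datasets out in advance is more efficient.
  • As the dataset grows, running every CV fold becomes very expensive, so in most cases the benchmark is run against a single fold only,
  • and the data is merged back into one set only when building the final model.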
In [1]:
import os

import numpy as np
import pandas as pd
import seaborn as sns

from sklearn.model_selection import KFold
In [2]:
DIR = '../CheXpert-v1.0-small/'
OUTPUT_DIR = '../CV/'
CV_COUNT = 6
In [3]:
os.listdir(DIR)
Out[3]:
['train', 'train.csv', 'valid', 'valid.csv']
In [4]:
raw = pd.read_csv(DIR + 'train.csv')
In [5]:
frontal_df = raw[raw['Frontal/Lateral'] == 'Frontal']
lateral_df = raw[raw['Frontal/Lateral'] == 'Lateral']

Splitting into folds

  • Split the folds either with stratified sampling or at random.
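The cell below takes the random route with `KFold`. For comparison, here is a minimal sketch of the stratified option using scikit-learn's `StratifiedKFold`. The toy frame `toy_df`, its `label` column, and the choice of 4 folds are assumptions made for illustration only; with the real data you would have to pick a single column to stratify on.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Toy stand-in for frontal_df; 'label' is a hypothetical binary column
# (8 negatives, 4 positives), not a column taken from train.csv.
toy_df = pd.DataFrame({
    'Path': [f'img_{i}.jpg' for i in range(12)],
    'label': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
})

# Stratification keeps the class ratio roughly equal in every fold.
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=18)
strat_splits = list(skf.split(toy_df, toy_df['label']))

for fold, (train_idx, test_idx) in enumerate(strat_splits):
    positives = int(toy_df.iloc[test_idx]['label'].sum())
    print(fold, len(train_idx), len(test_idx), positives)
```

The resulting `strat_splits` has the same shape as the `splits` list built with `KFold`, so the CSV-writing loop would not need to change.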
In [6]:
splits = list(KFold(n_splits=CV_COUNT, shuffle=True, random_state=18).split(frontal_df))
In [8]:
# Create the output directory if it does not exist yet
os.makedirs(OUTPUT_DIR, exist_ok=True)
In [11]:
for index, (train_idx, test_idx) in enumerate(splits):
    # KFold yields positional indices, so rows are selected with iloc
    train_df = frontal_df.iloc[train_idx]
    test_df = frontal_df.iloc[test_idx]

    # Write one train/test CSV pair per fold
    train_df.to_csv(OUTPUT_DIR + 'chest_frontal_train_cv_' + str(index) + '.csv', index=False)
    test_df.to_csv(OUTPUT_DIR + 'chest_frontal_test_cv_' + str(index) + '.csv', index=False)
  • From now on, when measuring model accuracy, validate against only one of the CV folds listed below.
In [12]:
os.listdir(OUTPUT_DIR)
Out[12]:
['chest_frontal_test_cv_0.csv',
 'chest_frontal_test_cv_4.csv',
 'chest_frontal_test_cv_2.csv',
 'chest_frontal_train_cv_3.csv',
 'chest_frontal_test_cv_3.csv',
 'chest_frontal_train_cv_0.csv',
 'chest_frontal_test_cv_1.csv',
 'chest_frontal_test_cv_5.csv',
 'chest_frontal_train_cv_5.csv',
 'chest_frontal_train_cv_2.csv',
 'chest_frontal_train_cv_1.csv',
 'chest_frontal_train_cv_4.csv']
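As a rough sketch of how these files might be used: read a single fold for benchmarking, then recombine its train and test parts when it is time to fit the final model. The helpers `load_fold` and `merge_fold` are hypothetical names, and the demo writes a tiny toy fold into a temporary directory instead of touching `../CV/`.

```python
import os
import tempfile

import pandas as pd


def load_fold(cv_dir, fold):
    """Read the train/test CSV pair that was written for one fold."""
    train = pd.read_csv(os.path.join(cv_dir, f'chest_frontal_train_cv_{fold}.csv'))
    test = pd.read_csv(os.path.join(cv_dir, f'chest_frontal_test_cv_{fold}.csv'))
    return train, test


def merge_fold(train, test):
    """Recombine one fold's train and test rows into the full dataset."""
    return pd.concat([train, test], ignore_index=True)


# Demo on a throwaway directory with a made-up 4-row dataset.
demo_dir = tempfile.mkdtemp()
toy = pd.DataFrame({'Path': ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg'],
                    'label': [0, 1, 0, 1]})
toy.iloc[:3].to_csv(os.path.join(demo_dir, 'chest_frontal_train_cv_0.csv'), index=False)
toy.iloc[3:].to_csv(os.path.join(demo_dir, 'chest_frontal_test_cv_0.csv'), index=False)

# Benchmark on fold 0 only; merge both parts for the final model.
bench_train, bench_test = load_fold(demo_dir, 0)
full_df = merge_fold(bench_train, bench_test)
print(len(bench_train), len(bench_test), len(full_df))
```

Because every fold's train and test files together cover the same rows, merging any one fold is enough to rebuild the full frontal dataset.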
