Commit 30a70788 authored by migtoqu

Upload new file

parent 533fea48
%% Cell type:markdown id: tags:
Import modules
%% Cell type:code id: tags:
``` python
import os, shutil
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
```
%% Cell type:markdown id: tags:
Read the data (metadata)
%% Cell type:code id: tags:
``` python
metadata = pd.read_csv('../datos/Metadata/metadatos.csv')
metadata.head()
```
%% Output
     FILE NAME     CLASE
0  COVID-19(1)  COVID-19
1  COVID-19(2)  COVID-19
2  COVID-19(3)  COVID-19
3  COVID-19(4)  COVID-19
4  COVID-19(5)  COVID-19
%% Cell type:markdown id: tags:
Split into training and test sets
%% Cell type:code id: tags:
``` python
train, test = train_test_split(metadata, test_size=1/3, stratify=metadata.CLASE)
```
%% Cell type:markdown id: tags:
We observe the same class proportions in all subsets.
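The claim about equal proportions can be checked directly instead of by eye. A minimal sketch, assuming the `train`/`test` DataFrames produced by the split above and a label column passed as `label` (`"CLASE"` for the metadata split); the helper name `compare_proportions` is hypothetical:

```python
import pandas as pd

def compare_proportions(label, **subsets):
    """Return a table with one column of class proportions per named subset.

    `label` is the name of the class column; each keyword argument is a DataFrame.
    """
    # value_counts(normalize=True) gives the fraction of rows belonging to each class;
    # building a DataFrame from a dict of Series aligns the classes by index.
    return pd.DataFrame(
        {name: frame[label].value_counts(normalize=True) for name, frame in subsets.items()}
    )

# Usage (assuming the metadata split above):
# compare_proportions("CLASE", full=metadata, train=train, test=test)
```

A class missing from one subset shows up as NaN in that column, which makes a failed stratification easy to spot.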
%% Cell type:markdown id: tags:
In the datos directory we create the train and test directories, with one subfolder per class, where we will store the corresponding images.
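This step has no code cell of its own, and `shutil.copyfile` raises `FileNotFoundError` when the destination folder does not exist. A minimal sketch, assuming the `../datos/<split>/<class>/` layout used by the copy cells at the end of the notebook and the three class names that appear in the data; `make_split_dirs` is a hypothetical helper:

```python
import os

# Class names as they appear in the data (assumed to match the folder names)
CLASSES = ["COVID-19", "NORMAL", "Viral Pneumonia"]

def make_split_dirs(root, splits=("train", "test"), classes=CLASSES):
    """Create root/<split>/<class>/ for every split and class; existing folders are kept."""
    created = []
    for split in splits:
        for clase in classes:
            path = os.path.join(root, split, clase)
            os.makedirs(path, exist_ok=True)  # no error if the folder already exists
            created.append(path)
    return created

# Usage: make_split_dirs("../datos")
```

Because of `exist_ok=True` the call is safe to repeat when the notebook is re-run.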
%% Cell type:markdown id: tags:
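We try to obtain the actual file names of the images from the class folders; note that the names in the metadata (e.g. COVID-19(1)) have no .png extension.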
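Before relying on the folder listings, one can check which metadata names have no matching file on disk. A minimal sketch, assuming every image uses the `.png` extension seen in the listings and that each metadata class is also the name of its folder; `unmatched_names` is a hypothetical helper:

```python
import os

def unmatched_names(names, directory, ext=".png"):
    """Return the names for which directory/<name><ext> does not exist."""
    on_disk = set(os.listdir(directory))  # set lookup keeps the check O(1) per name
    return [name for name in names if name + ext not in on_disk]

# Usage (assumption: the CLASE value is also the folder name):
# for clase, group in metadata.groupby("CLASE"):
#     print(clase, unmatched_names(group["FILE NAME"], os.path.join("../datos", clase)))
```

An empty list for every class would mean the metadata names can be used directly once the extension is appended.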
%% Cell type:code id: tags:
``` python
mypath = "../datos/COVID-19"
covid_files = os.listdir(mypath)  # file names in the COVID-19 folder
#covid_files
```
%% Cell type:code id: tags:
``` python
mypath = "../datos/NORMAL"
normal_files = os.listdir(mypath)  # file names in the NORMAL folder
#normal_files
```
%% Cell type:code id: tags:
``` python
mypath = "../datos/Viral Pneumonia"
pneumonia_files = os.listdir(mypath)  # file names in the Viral Pneumonia folder
#pneumonia_files
```
%% Cell type:code id: tags:
``` python
# Concatenate in the same order as the label lists: NORMAL, Viral Pneumonia, COVID-19
files = normal_files + pneumonia_files + covid_files
```
%% Cell type:code id: tags:
``` python
# One label per file, in the same order as the file lists
clase_n = ['NORMAL'] * len(normal_files)
clase_p = ['Viral Pneumonia'] * len(pneumonia_files)
clase_c = ['COVID-19'] * len(covid_files)
clase = clase_n+clase_p+clase_c
```
%% Cell type:code id: tags:
``` python
df = pd.DataFrame(list(zip(files, clase)),
columns =['FILENAME', 'CLASS'])
df
```
%% Output
               FILENAME     CLASS
0        NORMAL (1).png    NORMAL
1       NORMAL (10).png    NORMAL
2      NORMAL (100).png    NORMAL
3      NORMAL (101).png    NORMAL
4      NORMAL (102).png    NORMAL
...                 ...       ...
2900  COVID-19(215).png  COVID-19
2901  COVID-19(216).png  COVID-19
2902  COVID-19(217).png  COVID-19
2903  COVID-19(218).png  COVID-19
2904  COVID-19(219).png  COVID-19
[2905 rows x 2 columns]
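%% Cell type:markdown id: tags:
The three listing cells and the label lists above can be collapsed into one loop over the class folders. A minimal sketch, assuming each class folder under `../datos` contains only that class's images and that only `.png` files should be kept (which also skips stray entries such as hidden system files); `build_file_table` is a hypothetical helper. Sorting makes the row order independent of `os.listdir`, which returns entries in arbitrary order:

```python
import os
import pandas as pd

def build_file_table(root, classes=("NORMAL", "Viral Pneumonia", "COVID-19"), ext=".png"):
    """Return a DataFrame with one row per image: FILENAME and CLASS (its folder name)."""
    rows = []
    for clase in classes:
        folder = os.path.join(root, clase)
        for filename in sorted(os.listdir(folder)):
            if filename.lower().endswith(ext):  # keep image files only
                rows.append((filename, clase))
    return pd.DataFrame(rows, columns=["FILENAME", "CLASS"])

# Usage: df = build_file_table("../datos")
```

Keeping the file name and its label in the same tuple removes the need to keep two separately built lists in the same order.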
%% Cell type:markdown id: tags:
Split into training and test sets
%% Cell type:code id: tags:
``` python
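# Stratify on CLASS so both subsets keep the class proportions;
# a fixed random_state makes the split (and the copied files) reproducible across runs
train, test = train_test_split(df, test_size=1/3, stratify=df.CLASS, random_state=0)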
```
%% Cell type:code id: tags:
``` python
print("METADATA:")
print("Total frequencies: \n",df.CLASS.value_counts())
print("Proportion: \n",df.CLASS.value_counts()/df.shape[0])
print("----------------------------------------------")
print("TRAIN:")
print("Total frequencies: \n",train.CLASS.value_counts())
print("Proportion: \n",train.CLASS.value_counts()/train.shape[0])
print("----------------------------------------------")
print("TEST:")
print("Total frequencies: \n",test.CLASS.value_counts())
print("Proportion: \n",test.CLASS.value_counts()/test.shape[0])
```
%% Output
METADATA:
Total frequencies:
Viral Pneumonia 1345
NORMAL 1341
COVID-19 219
Name: CLASS, dtype: int64
Proportion:
Viral Pneumonia 0.462995
NORMAL 0.461618
COVID-19 0.075387
Name: CLASS, dtype: float64
----------------------------------------------
TRAIN:
Total frequencies:
Viral Pneumonia 896
NORMAL 894
COVID-19 146
Name: CLASS, dtype: int64
Proportion:
Viral Pneumonia 0.462810
NORMAL 0.461777
COVID-19 0.075413
Name: CLASS, dtype: float64
----------------------------------------------
TEST:
Total frequencies:
Viral Pneumonia 449
NORMAL 447
COVID-19 73
Name: CLASS, dtype: int64
Proportion:
Viral Pneumonia 0.463364
NORMAL 0.461300
COVID-19 0.075335
Name: CLASS, dtype: float64
%% Cell type:markdown id: tags:
We check that the proportions are preserved, i.e. the sampling is stratified. The classes themselves remain imbalanced: COVID-19 makes up about 7.5% of every subset.
%% Cell type:code id: tags:
``` python
# Copy each training image from ../datos/<CLASS>/ to ../datos/train/<CLASS>/
for filename, clase in zip(train.FILENAME, train.CLASS):
    src = os.path.join("../datos", clase, filename)
    dst_dir = os.path.join("../datos/train", clase)
    os.makedirs(dst_dir, exist_ok=True)  # copyfile fails if the folder does not exist
    shutil.copyfile(src, os.path.join(dst_dir, filename))
```
%% Cell type:code id: tags:
``` python
# Copy each test image from ../datos/<CLASS>/ to ../datos/test/<CLASS>/
for filename, clase in zip(test.FILENAME, test.CLASS):
    src = os.path.join("../datos", clase, filename)
    dst_dir = os.path.join("../datos/test", clase)
    os.makedirs(dst_dir, exist_ok=True)  # copyfile fails if the folder does not exist
    shutil.copyfile(src, os.path.join(dst_dir, filename))
```
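%% Cell type:markdown id: tags:
After copying, it is worth checking that every class folder received the expected number of images. A minimal sketch, assuming the `train`/`test` DataFrames with columns `FILENAME` and `CLASS` and the `../datos/<split>/<class>/` layout used above; `count_mismatches` is a hypothetical helper. Files left over from an earlier run with a different split would show up here as extra files:

```python
import os
import pandas as pd

def count_mismatches(split_df, split_dir):
    """Compare the number of files per class folder under split_dir with split_df.

    Returns {class: (expected, found)} for every class whose counts differ.
    """
    mismatches = {}
    for clase, expected in split_df["CLASS"].value_counts().items():
        folder = os.path.join(split_dir, clase)
        found = len(os.listdir(folder)) if os.path.isdir(folder) else 0
        if found != expected:
            mismatches[clase] = (int(expected), found)
    return mismatches

# Usage:
# count_mismatches(train, "../datos/train")  # {} when the copy is complete
# count_mismatches(test, "../datos/test")
```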