Installation

Clone the repository:

$ git clone https://github.com/LoreDema/ValidPy.git

Then install it with pip (Linux):

$ pip install -e ./ValidPy

Dependencies

simplejson 3.3.1
NumPy 1.9.2
PyBrain 0.3
SciPy 0.13.3
matplotlib 1.3.1
scikit-learn 0.15.2

Quick Start

This tool implements k-cross validation (k-fold cross-validation) for both artificial neural networks (ANN) and support vector machines (SVM).
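
As a rough illustration of what k-cross validation does, the sketch below splits row indices into k folds and yields one training/validation pair per fold. It is a generic example, not ValidPy's own code: the function name `k_fold_indices` is hypothetical, and the real tool may shuffle or partition the rows differently.

```python
def k_fold_indices(n_rows, k):
    """Yield (train, validation) index lists, one pair per fold."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    # Spread the rows as evenly as possible:
    # the first (n_rows % k) folds receive one extra row.
    base, extra = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Example: 10 rows split into 4 folds.
folds = list(k_fold_indices(10, 4))
```

Each row ends up in exactly one validation fold, and every model is evaluated only on rows it was not trained on.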

All experiments require a comma-separated CSV file with three columns. Each row has the form:

id, input_x, output_y
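
A minimal sketch of checking a data file against this layout before starting an experiment. The sample values and the helper name `read_rows` are hypothetical, and because this page does not say how input_x and output_y encode their values, the fields are kept as raw strings.

```python
import csv
import io

# Hypothetical sample data in the "id, input_x, output_y" layout.
SAMPLE = "1, 0.5, 1.2\n2, 0.7, 1.9\n"


def read_rows(handle):
    """Return a list of (id, input_x, output_y) string tuples."""
    rows = []
    reader = csv.reader(handle, skipinitialspace=True)
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(
                "line %d: expected 3 columns, got %d" % (line_number, len(row))
            )
        rows.append(tuple(field.strip() for field in row))
    return rows


rows = read_rows(io.StringIO(SAMPLE))
```

To check a real file, pass an open file handle instead of the `io.StringIO` object.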

ANN k-cross validation

To run k-cross validation on a data file, create a JSON configuration file like this:

{
  "grid":"true",
  "k":8,
  "parallel_process":4,
  "data_file":"absolute_path_to_data_file.csv",
  "out_folder":"absolute_path_output_folder",
  "input_length": 10,
  "output_length": 2,
  "hidden_layers":[1,2,3],
  "units":[15,25],
  "function":["sigmoid","gaussian"],
  "momentum":[0.0,0.9],
  "learning_rate":[0.01,0.05],
  "lr_decay":[1.0, 0.9999]
}
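
To get a feel for how large such a grid is, the sketch below expands the list-valued parameters of the example into every combination. It assumes the grid is the Cartesian product of the listed values and that each combination is trained once per fold. The results description suggests this, but it is not confirmed here. The helper name `expand_grid` is hypothetical.

```python
import itertools

# The list-valued keys from the example ANN configuration.
grid = {
    "hidden_layers": [1, 2, 3],
    "units": [15, 25],
    "function": ["sigmoid", "gaussian"],
    "momentum": [0.0, 0.9],
    "learning_rate": [0.01, 0.05],
    "lr_decay": [1.0, 0.9999],
}


def expand_grid(grid):
    """Return one dict per combination of the listed values."""
    keys = sorted(grid)
    value_lists = [grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


combinations = expand_grid(grid)
k = 8
print(len(combinations), "combinations,", len(combinations) * k, "training runs")
```

With the example values this gives 96 combinations, so under the assumptions above k = 8 means 768 training runs. Keeping this count in mind helps when choosing parallel_process and the size of the grid.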

Then run ann_kcross.sh in executable/, passing the path to the configuration JSON as a parameter:

$ cd ./ValidPy/executable/
$ sh ann_kcross.sh path_to_config_JSON

The script produces a CSV file that reports, for each parameter combination, the average training time and the mean of the average Euclidean distance (computed on the validation set outputs) over the k experiments. It also produces, for each combination, a folder containing the details and models of the individual experiments.
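
The reported score can be read as a two-level average: first the average Euclidean distance between predicted and expected outputs within one fold, then the mean of that value over the k folds. The sketch below uses the usual definition of Euclidean distance. ValidPy's exact formula is not shown on this page, and the function name and numbers are illustrative only.

```python
import math


def average_euclidean_distance(predicted, expected):
    """Mean Euclidean distance between paired output vectors."""
    if not predicted or len(predicted) != len(expected):
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for p, e in zip(predicted, expected):
        total += math.sqrt(sum((pi - ei) ** 2 for pi, ei in zip(p, e)))
    return total / len(predicted)


# Two hypothetical folds with output_length = 2.
fold_scores = [
    average_euclidean_distance([[3.0, 4.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]),
    average_euclidean_distance([[1.0, 0.0], [0.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]]),
]

# Second level: mean of the per-fold averages.
mean_over_folds = sum(fold_scores) / len(fold_scores)
```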

SVM k-cross validation

To run k-cross validation on a data file, create a JSON configuration file like this:

{
  "grid":"true",
  "k":8,
  "parallel_process":4,
  "data_file":"absolute_path_to_data_file.csv",
  "out_folder":"absolute_path_output_folder",
  "input_length": 10,
  "output_length": 2,
  "kernel":["linear", "poly", "rbf", "sigmoid"],
  "C":[0.1, 1.0, 10, 100],
  "epsilon":[0.01,0.05, 0.1, 0.5, 1, 5],
  "degree":[3]
}
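
Before launching a long grid search it can help to sanity-check the configuration file. The sketch below parses the JSON, looks for the keys used in the example, and checks that the two paths are absolute. Treating every key as required is an assumption based on the example alone, and `check_svm_config` is a hypothetical helper, not part of ValidPy.

```python
import json
import os

# Keys taken from the example SVM configuration (assumed to be required).
REQUIRED_KEYS = (
    "grid", "k", "parallel_process", "data_file", "out_folder",
    "input_length", "output_length", "kernel", "C", "epsilon", "degree",
)


def check_svm_config(text):
    """Return a list of human-readable problems found in the JSON text."""
    config = json.loads(text)  # raises ValueError on malformed JSON
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append("missing key: " + key)
    for key in ("data_file", "out_folder"):
        if key in config and not os.path.isabs(config[key]):
            problems.append(key + " is not an absolute path")
    return problems


# The placeholder paths from the example are flagged until they are replaced.
example = """{
  "grid": "true", "k": 8, "parallel_process": 4,
  "data_file": "absolute_path_to_data_file.csv",
  "out_folder": "absolute_path_output_folder",
  "input_length": 10, "output_length": 2,
  "kernel": ["linear", "rbf"], "C": [0.1, 1.0],
  "epsilon": [0.01, 0.1], "degree": [3]
}"""
problems = check_svm_config(example)
```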

Then run svm_kcross.sh in executable/, passing the path to the configuration JSON as a parameter:

$ cd ./ValidPy/executable/
$ sh svm_kcross.sh path_to_config_JSON

The script produces a CSV file that reports, for each parameter combination, the average training time and the mean of the average Euclidean distance (computed on the validation set outputs) over the k experiments. It also produces, for each combination, a folder containing the details and models of the individual experiments.

ANN vs SVM k-cross validation

To compare ANN and SVM with k-cross validation on a data file, create a JSON configuration file like this. The experiments parameter sets how many times the experiment is repeated:

{
  "experiments":4,
  "k":8,
  "parallel_process":4,
  "data_file":"absolute_path_to_data_file.csv",
  "out_folder":"absolute_path_output_folder",
  "input_length": 10,
  "output_length": 2,
  "ANN": {
    "hidden_layers":2,
    "units":25,
    "function":"sigmoid",
    "momentum":0.0,
    "learning_rate":0.05,
    "lr_decay":0.9999
  },
  "SVM": {
    "kernel":"rbf",
    "C":30,
    "epsilon":0.1,
    "degree":3
  }
}
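
Writing the configuration from Python instead of by hand avoids JSON syntax slips such as trailing commas. The sketch below builds the same structure as the example and saves it with the standard json module. The two paths are hypothetical placeholders, and the file is written to a temporary directory purely for illustration.

```python
import json
import os
import tempfile

config = {
    "experiments": 4,
    "k": 8,
    "parallel_process": 4,
    "data_file": "/absolute/path/to/data_file.csv",    # hypothetical path
    "out_folder": "/absolute/path/to/output_folder",   # hypothetical path
    "input_length": 10,
    "output_length": 2,
    "ANN": {
        "hidden_layers": 2,
        "units": 25,
        "function": "sigmoid",
        "momentum": 0.0,
        "learning_rate": 0.05,
        "lr_decay": 0.9999,
    },
    "SVM": {"kernel": "rbf", "C": 30, "epsilon": 0.1, "degree": 3},
}

# Save the configuration; pass config_path to the shell script afterwards.
config_path = os.path.join(tempfile.mkdtemp(), "ann_vs_svm.json")
with open(config_path, "w") as handle:
    json.dump(config, handle, indent=2)
```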

Then run ann_vs_svm_kcross.sh in executable/, passing the path to the configuration JSON as a parameter:

$ cd ./ValidPy/executable/
$ sh ann_vs_svm_kcross.sh path_to_config_JSON

The script produces a CSV file that reports, for each experiment, the average training time and the mean of the average Euclidean distance over the k cross-validation runs, followed by the overall averages of both values across all experiments. Distances are computed on the validation set outputs. It also produces, for each experiment, a folder containing that experiment's details and models.

ANN test

To run a test, create a JSON configuration file like this:

{
  "training_set":"absolute_path_to_training_set_file.csv",
  "test_set":"absolute_path_to_test_set_file.csv",
  "out_folder":"absolute_path_output_folder",
  "input_length": 10,
  "output_length": 2,
  "hidden_layers":2,
  "valid_prop":0.1,
  "units":25,
  "function":"sigmoid",
  "momentum":0.0,
  "learning_rate":0.05,
  "lr_decay":0.9999
}

Then run ann_test.sh in executable/, passing the path to the configuration JSON as a parameter:

$ cd ./ValidPy/executable/
$ sh ann_test.sh path_to_config_JSON

The script produces a text file containing the training time and the average Euclidean distance over the test set outputs, along with the models from the experiment.

SVM test

To run a test, create a JSON configuration file like this:

{
  "training_set":"absolute_path_to_training_set_file.csv",
  "test_set":"absolute_path_to_test_set_file.csv",
  "out_folder":"absolute_path_output_folder",
  "input_length": 10,
  "output_length": 2,
  "kernel":"rbf",
  "C":30,
  "epsilon":0.1,
  "degree":3
}

Then run svm_test.sh in executable/, passing the path to the configuration JSON as a parameter:

$ cd ./ValidPy/executable/
$ sh svm_test.sh path_to_config_JSON

The script produces a text file containing the training time and the average Euclidean distance over the test set outputs, along with the models from the experiment.

SVM predict

To predict over a blind set (data without known outputs), you need a comma-separated CSV file with two columns. Each row has the form:

id, input_x

Create a JSON configuration file like this:

{
  "training_set":"absolute_path_to_training_set_file.csv",
  "test_set":"absolute_path_to_test_set_file.csv",
  "out_folder":"absolute_path_output_folder",
  "out_file":"absolute_path_output_file.csv",
  "input_length": 10,
  "output_length": 2,
  "kernel":"rbf",
  "C":10,
  "epsilon":0.1,
  "degree":3
}

First run svm_train.sh in executable/, passing the path to the configuration JSON as a parameter:

$ cd ./ValidPy/executable/
$ sh svm_train.sh path_to_config_JSON

The script produces one model for each output.

Then run svm_predict.sh in executable/, passing the path to the same configuration JSON as a parameter:

$ cd ./ValidPy/executable/
$ sh svm_predict.sh path_to_config_JSON

The script produces a CSV file with three columns. Each row has the form:

id, input_x, output_y
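
The train and predict steps above can also be chained from Python so that prediction only runs if training succeeded. This is a sketch under assumptions: it requires Python 3.5+ for `subprocess.run`, it assumes the scripts exit with a non-zero status on failure, and the function names and default directory are illustrative.

```python
import os
import subprocess


def build_commands(config_path):
    """Return the argv lists for the train and predict steps, in order."""
    # Use an absolute path because the scripts are run from executable/.
    config_path = os.path.abspath(config_path)
    return [
        ["sh", "svm_train.sh", config_path],
        ["sh", "svm_predict.sh", config_path],
    ]


def run_svm_predict_pipeline(config_path, executable_dir="./ValidPy/executable/"):
    """Run training, then prediction, stopping at the first failing step."""
    for command in build_commands(config_path):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, cwd=executable_dir, check=True)


commands = build_commands("/tmp/svm_predict_config.json")
```

Calling `run_svm_predict_pipeline("path_to_config_JSON")` from the directory that contains the cloned repository mirrors the two shell commands shown above.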

ANN predict

Not yet implemented.