In libgnn, a pattern is defined as a triple $(x, t, p)$, where $x$ is the input pattern or feature vector, $t$ is the output pattern or target vector and $p$ is the pattern weight or pattern relevance.
Datasets are sets of training patterns, which can be used for training (as mentioned) or for model validation and testing. To identify a particular pattern, the patterns are indexed by $k$ and written as $x^k$, $t^k$ and $p_k$ (note that the vectors carry a superscript and the scalar value a subscript). Schematically, a dataset can be illustrated by the following figure:
[Figure: schematic view of a dataset]
Patterns are commonly obtained by observing the real system to be modelled, e.g. plants or other phenomena, or sometimes they are artificially constructed. In libgnn, each part of a pattern (input, output, weight) can be sampled from a different source.
The gnn_dataset type and its functions provide a common interface, or protocol, for different types of datasets. A dataset can be just a sampler from three different sources (input, output and weight), or a shuffler, or a preprocessor, etc. Datasets just provide a logical view of the underlying sample sources, and they have to manage them without mixing them up.
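libgnn's design for handling datasets is very simple. A dataset, as an "abstract object", has some properties and can do some things:
- It has a size (its number of patterns), an input size and an output size, returned by gnn_dataset_get_size, gnn_dataset_input_get_size and gnn_dataset_output_get_size respectively.
- It can be reset (gnn_dataset_reset), asked for its k-th pattern (gnn_dataset_get) and destroyed (gnn_dataset_destroy).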
Datasets (as an abstraction) exist because there are many ways to get samples, and there are many possible sources. For example, a dataset could sample its patterns from disk, from RAM, from a serial port, etc. Or even worse, the three sources themselves could be heterogeneous. Also, the sampling method could vary.
Datasets can (but aren't forced to) be built from gnn_input objects. As a particular example, gnn_simple_set is built upon three samplers (one each for inputs, targets and weights).
What does a trainer do in order to sample from a data set?
The order in which a trainer calls the functions on a gnn_dataset is illustrated in the following flow diagram:
The important steps are marked with a bold border.
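Since the flow diagram is not reproduced here, the following is only a rough sketch of what one epoch of sampling could look like. It does not use libgnn: `toy_dataset`, `toy_reset`, `toy_get` and `toy_run_epoch` are hypothetical stand-ins, plain doubles replace gsl_vector, and the error formula is purely illustrative. Calling "reset" at the end of the epoch is an assumption based on the description of the reset function type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for gnn_dataset: a size, two callbacks and the data. */
typedef struct toy_dataset toy_dataset;
struct toy_dataset
{
    size_t size;                       /* number of patterns */
    int (*reset) (toy_dataset *set);
    int (*get)   (toy_dataset *set, size_t k,
                  const double **x, const double **t, double *p);
    const double *inputs;              /* one scalar input per pattern  */
    const double *targets;             /* one scalar target per pattern */
    const double *weights;             /* one weight per pattern        */
    size_t resets;                     /* how often reset was called    */
};

int toy_reset (toy_dataset *set)
{
    set->resets++;
    return 0;
}

/* Hands out pointers to the k-th input and target, and copies the weight. */
int toy_get (toy_dataset *set, size_t k,
             const double **x, const double **t, double *p)
{
    if (k >= set->size)
        return 1;                      /* index out of range */
    *x = &set->inputs[k];
    *t = &set->targets[k];
    *p = set->weights[k];
    return 0;
}

/* One epoch: draw patterns 0 .. size-1, then reset the dataset.
   Returns the weighted squared error of the identity model y = x,
   or -1.0 if the dataset reports a failure. */
double toy_run_epoch (toy_dataset *set)
{
    double error = 0.0;
    size_t k;

    assert (set != NULL && set->get != NULL);

    for (k = 0; k < set->size; k++)
    {
        const double *x, *t;
        double p;

        if (set->get (set, k, &x, &t, &p) != 0)
            return -1.0;
        error += p * (*t - *x) * (*t - *x);
    }

    /* end of epoch: prepare the dataset for the next sequence of drawings */
    if (set->reset != NULL && set->reset (set) != 0)
        return -1.0;

    return error;
}
```

A real trainer would call gnn_dataset_get and gnn_dataset_reset on a gnn_dataset instead, and would feed each pattern to the model being trained.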
How to implement the gnn_dataset interface with a custom dataset?
It's simple. Create a new C datatype, which should contain a gnn_dataset structure in it:

    typedef struct _my_dataset my_dataset;

    struct _my_dataset
    {
        gnn_dataset set; // The underlying gnn_dataset; keep it as the first
                         // member so the struct can be viewed as a gnn_dataset

        // Other things your dataset needs...
        ...
    };
Then, implement the 3 needed functions: reset, get and destroy, conforming to the calling parameter specification:

    int  my_dataset_reset (gnn_dataset *set);
    int  my_dataset_get (gnn_dataset *set, size_t k,
                         gsl_vector **x, gsl_vector **t, double *p);
    void my_dataset_destroy (gnn_dataset *set);
And finally, create a constructor which must call the gnn_dataset_init function:
    gnn_dataset *
    my_dataset_new ()
    {
        my_dataset  *myset; // a pointer to the dataset to be created
        gnn_dataset *set;   // a pointer to the same dataset, but viewed as a
                            // gnn_dataset

        // allocate memory for the dataset
        myset = (my_dataset *) malloc (sizeof (my_dataset));
        if (myset == NULL)
            return NULL;

        // view the new dataset as a gnn_dataset (valid because the
        // gnn_dataset is the first member of my_dataset)
        set = (gnn_dataset *) myset;

        // initialize the dataset
        gnn_dataset_init (set, N_OF_PATTERNS, INPUT_SIZE, TARGET_SIZE,
                          my_dataset_reset, my_dataset_get, my_dataset_destroy);

        // do other initialization your dataset might need...
        ...

        return set;
    }
That's it!
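The recipe above relies on a common C idiom: the base structure is the first member of the extended one, so a pointer to either can be converted to the other. The sketch below shows that idiom in isolation, in particular how a callback recovers its own fields from the base pointer. It does not use libgnn; `toy_base`, `ramp_set` and the `ramp_*` functions are hypothetical names invented for the illustration, and a single double stands in for a full pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical base type playing the role of gnn_dataset. */
typedef struct toy_base toy_base;
struct toy_base
{
    size_t size;
    int  (*get)     (toy_base *set, size_t k, double *value);
    void (*destroy) (toy_base *set);
};

/* Hypothetical extension: the k-th "pattern" is simply step * k. */
typedef struct ramp_set
{
    toy_base base;   /* must be the first member */
    double   step;   /* extra data owned by this dataset type */
} ramp_set;

int ramp_get (toy_base *set, size_t k, double *value)
{
    /* Safe cast: a pointer to a struct also points to its first member. */
    ramp_set *self = (ramp_set *) set;

    assert (set != NULL && value != NULL);
    if (k >= set->size)
        return 1;                    /* index out of range */
    *value = self->step * (double) k;
    return 0;
}

void ramp_destroy (toy_base *set)
{
    free (set);                      /* same address as the ramp_set */
}

/* Constructor: allocates the extension and returns it viewed as the base. */
toy_base *ramp_new (size_t size, double step)
{
    ramp_set *self = malloc (sizeof *self);

    if (self == NULL)
        return NULL;
    self->base.size    = size;
    self->base.get     = ramp_get;
    self->base.destroy = ramp_destroy;
    self->step         = step;
    return &self->base;
}
```

Callers only ever see a `toy_base *`, which mirrors how a trainer would only see a `gnn_dataset *` regardless of the concrete dataset behind it.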
Modules
    gnn_dataset_view : A view for datasets.
        View for datasets implementation.
    gnn_random_order : A random order sampler.
        Random order sampler for datasets implementation.
    gnn_simple_set : A simple implementation of datasets.
        Simple dataset implementation.
Typedefs
    typedef _gnn_dataset gnn_dataset
        The datatype for datasets.
    typedef int (*gnn_dataset_reset_type) (gnn_dataset *set)
        The datatype for dataset reset functions.
    typedef int (*gnn_dataset_get_type) (gnn_dataset *set, size_t k, gsl_vector **x, gsl_vector **t, double *p)
        The datatype for dataset get functions.
Functions
    int gnn_dataset_default_reset (gnn_dataset *set)
        Default "reset" function for a dataset.
    void gnn_dataset_default_destroy (gnn_dataset *set)
        Default "destroy" function for a dataset.
    int gnn_dataset_init (gnn_dataset *set, size_t size, size_t n, size_t m, gnn_dataset_reset_type reset, gnn_dataset_get_type get, gnn_dataset_destroy_type destroy)
        Initializes a gnn_dataset.
    void gnn_dataset_destroy (gnn_dataset *set)
        Destroy a dataset.
    int gnn_dataset_reset (gnn_dataset *set)
        Reset a dataset.
    int gnn_dataset_get (gnn_dataset *set, size_t k, gsl_vector **x, gsl_vector **t, double *weight)
        Gets the k-th pattern.
    size_t gnn_dataset_get_size (gnn_dataset *set)
        Gets the size of the dataset.
    size_t gnn_dataset_input_get_size (gnn_dataset *set)
        Gets the input size of the dataset.
    size_t gnn_dataset_output_get_size (gnn_dataset *set)
        Gets the output size of the dataset.
typedef _gnn_dataset gnn_dataset

This is the datatype that contains the basic elements for the implementation of a dataset. Every new type of dataset should extend this structure.
Definition at line 50 of file gnn_dataset.h.
typedef int (*gnn_dataset_get_type) (gnn_dataset *set, size_t k, gsl_vector **x, gsl_vector **t, double *p)

This function type defines the form of the "get" functions for datasets. A "get" function should return the full k-th pattern (that is, its input and output vectors, and its weight). The meaning of the index "k" depends on the specific dataset.
Definition at line 73 of file gnn_dataset.h.
typedef int (*gnn_dataset_reset_type) (gnn_dataset *set)

This function type defines the form of the "reset" functions for datasets. A "reset" function should prepare the dataset's internal properties to begin a new sequence of pattern drawings. "reset" is usually called after the end of an epoch has been reached.
Definition at line 62 of file gnn_dataset.h.
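To make the idea of "preparing a new sequence of pattern drawings" concrete, here is a minimal sketch of what a reset function for a shuffling dataset might do: rebuild a random visiting order with a Fisher-Yates shuffle. This is not the code of gnn_random_order; `shuffled_order`, `ORDER_MAX` and `shuffled_order_reset` are hypothetical names, and the use of `rand()` is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ORDER_MAX 64   /* hypothetical upper bound, keeps the sketch simple */

/* Hypothetical state of a shuffling dataset: a permutation of 0 .. size-1. */
typedef struct shuffled_order
{
    size_t size;
    size_t order[ORDER_MAX];
} shuffled_order;

/* Rebuilds the visiting order for the next epoch.
   Returns 0 on success and 1 if the size exceeds ORDER_MAX. */
int shuffled_order_reset (shuffled_order *s)
{
    size_t i;

    assert (s != NULL);
    if (s->size > ORDER_MAX)
        return 1;

    /* start from the identity permutation */
    for (i = 0; i < s->size; i++)
        s->order[i] = i;

    /* Fisher-Yates shuffle: swap each position with a random earlier
       (or equal) one; the slight modulo bias of rand() % i is ignored */
    for (i = s->size; i > 1; i--)
    {
        size_t j   = (size_t) rand () % i;   /* 0 <= j < i */
        size_t tmp = s->order[i - 1];

        s->order[i - 1] = s->order[j];
        s->order[j]     = tmp;
    }
    return 0;
}
```

A matching "get" function would then translate the requested index k into `order[k]` before reading from the underlying source.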
void gnn_dataset_default_destroy (gnn_dataset *set)

This function is the default "destroy" function for a dataset. It assumes that there isn't any additional data for the specific dataset type, so it actually just returns.
Definition at line 221 of file gnn_dataset.c.
int gnn_dataset_default_reset (gnn_dataset *set)

This function is the default "reset" function for a dataset. It does nothing.
Definition at line 203 of file gnn_dataset.c.
void gnn_dataset_destroy (gnn_dataset *set)

This function destroys the dataset.
Definition at line 318 of file gnn_dataset.c.
int gnn_dataset_get (gnn_dataset *set, size_t k, gsl_vector **x, gsl_vector **t, double *weight)

This function retrieves the dataset's k-th pattern through "x" and "t" (pointers to gsl_vector pointers for the input and target vectors) and stores its corresponding weight into the location pointed to by "weight". Note that what "the k-th pattern" means depends on the underlying implementation. Also, k should lie between 0 and the dataset's size.
Definition at line 365 of file gnn_dataset.c.
size_t gnn_dataset_get_size (gnn_dataset *set)

This function returns the dataset's size.
Definition at line 386 of file gnn_dataset.c.
int gnn_dataset_init (gnn_dataset *set, size_t size, size_t n, size_t m, gnn_dataset_reset_type reset, gnn_dataset_get_type get, gnn_dataset_destroy_type destroy)

This function initializes a given dataset, setting its properties and installing its functions. If the reset or destroy functions aren't provided, then the default functions gnn_dataset_default_reset and gnn_dataset_default_destroy are installed respectively. The "get" function is mandatory and can't be omitted.

As an example, suppose that you have made your own extension to the gnn_dataset datatype, which you called "my_dataset_type". Also, suppose you have already coded the appropriate "reset", "get" and "destroy" functions for your special dataset. Then:

    my_dataset_type *myset; // a pointer to the dataset to be created
    gnn_dataset     *set;   // a pointer to the same dataset, but viewed as a
                            // gnn_dataset

    // allocate memory for the dataset
    myset = (my_dataset_type *) malloc (sizeof (my_dataset_type));

    // view the new dataset as a gnn_dataset
    set = (gnn_dataset *) myset;

    // initialize the dataset
    gnn_dataset_init (set, 100, 5, 2,
                      my_dataset_reset, my_dataset_get, my_dataset_destroy);

Definition at line 273 of file gnn_dataset.c.
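The rule described here (optional "reset" and "destroy" with defaults, mandatory "get") can be sketched independently of libgnn as follows. All names (`toy_set`, `toy_set_init`, `toy_default_reset`, `toy_default_destroy`, `toy_zero_get`) are hypothetical, and so are two assumptions: that "not provided" means a NULL pointer, and that a nonzero return value signals an error. The actual conventions of gnn_dataset_init may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for gnn_dataset and its callback types. */
typedef struct toy_set toy_set;
typedef int  (*toy_reset_fn)   (toy_set *set);
typedef int  (*toy_get_fn)     (toy_set *set, size_t k, double *value);
typedef void (*toy_destroy_fn) (toy_set *set);

struct toy_set
{
    size_t         size;
    toy_reset_fn   reset;
    toy_get_fn     get;
    toy_destroy_fn destroy;
};

/* Defaults: nothing to prepare, nothing extra to release. */
int  toy_default_reset   (toy_set *set) { (void) set; return 0; }
void toy_default_destroy (toy_set *set) { (void) set; }

/* A trivial "get" used only to exercise the initializer. */
int toy_zero_get (toy_set *set, size_t k, double *value)
{
    if (k >= set->size)
        return 1;
    *value = 0.0;
    return 0;
}

/* Installs the callbacks, falling back to the defaults for reset and
   destroy. Returns 0 on success, 1 if the mandatory "get" is missing. */
int toy_set_init (toy_set *set, size_t size,
                  toy_reset_fn reset, toy_get_fn get, toy_destroy_fn destroy)
{
    assert (set != NULL);
    if (get == NULL)
        return 1;

    set->size    = size;
    set->reset   = (reset   != NULL) ? reset   : toy_default_reset;
    set->get     = get;
    set->destroy = (destroy != NULL) ? destroy : toy_default_destroy;
    return 0;
}
```

Installing defaults at initialization time means callers can always invoke `set->reset` and `set->destroy` without checking for NULL first.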
size_t gnn_dataset_input_get_size (gnn_dataset *set)

This function returns the size of the patterns' input vectors.
Definition at line 403 of file gnn_dataset.c.
size_t gnn_dataset_output_get_size (gnn_dataset *set)

This function returns the size of the patterns' output vectors.
Definition at line 420 of file gnn_dataset.c.
int gnn_dataset_reset (gnn_dataset *set)

This function resets the dataset.
Definition at line 336 of file gnn_dataset.c.