
gnn_dataset : Datasets for Training.
[Datasets]


Detailed Description

The gnn_dataset type defines a common interface for handling pattern sets.

In libgnn, a pattern is defined as a triple

$(x, t, p)$

where $x$ is the input pattern or feature vector, $t$ is the output pattern or target vector and $p$ is the pattern weight or pattern relevance.

Datasets are sets of training patterns, which can be used for training (as mentioned) or for model validation and testing. To identify a particular pattern, they are indexed by $k$ and written as $x^k$, $t^k$ and $p_k$ (note that the vectors are superindexed and the scalar value is subindexed). Schematically, a dataset can be illustrated by the following figure:

Patterns are commonly obtained from the observation of the real system to be modelled, e.g. plants or other phenomena, or sometimes they are artificially constructed. In libgnn, each part of a pattern (input, output, weight) can be sampled from a different source.

The gnn_dataset type and its functions provide a common interface, or protocol, for different types of datasets. A dataset can be just a sampler from three different sources (input, output and weight), or a shuffler, or a preprocessor, etc. Datasets just provide a logical view of the underlying sample sources, and they have to manage them without mixing them up.
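
The idea of a dataset as a logical view over an untouched sample source can be sketched with a shuffler. This is a minimal, self-contained sketch that does not use libgnn: shuffle_view and its functions are hypothetical names, a plain array of doubles stands in for the sample source, and the Fisher-Yates shuffle is an assumed choice of sampling method, not necessarily the one libgnn uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical shuffling view: NOT a libgnn type. It owns only an index
 * permutation; the underlying samples are never copied or reordered. */
typedef struct {
    const double *source;   /* underlying sample source (borrowed) */
    size_t        size;     /* number of samples */
    size_t       *order;    /* order[k] = index of the k-th sample drawn */
} shuffle_view;

/* Builds the identity ordering. Returns 0 on success, 1 on failure. */
static int shuffle_view_init(shuffle_view *v, const double *source, size_t size)
{
    assert(v != NULL && source != NULL && size > 0);
    v->order = malloc(size * sizeof *v->order);
    if (v->order == NULL)
        return 1;
    for (size_t i = 0; i < size; i++)
        v->order[i] = i;
    v->source = source;
    v->size = size;
    return 0;
}

/* Draws a new ordering with a Fisher-Yates shuffle. rand() % i has a
 * small modulo bias, which is acceptable for a sketch. */
static void shuffle_view_reset(shuffle_view *v)
{
    for (size_t i = v->size; i > 1; i--) {
        size_t j = (size_t) rand() % i;
        size_t tmp = v->order[i - 1];
        v->order[i - 1] = v->order[j];
        v->order[j] = tmp;
    }
}

/* Returns the k-th sample of the current ordering. */
static double shuffle_view_get(const shuffle_view *v, size_t k)
{
    assert(k < v->size);
    return v->source[v->order[k]];
}

/* Releases the permutation; the source stays untouched. */
static void shuffle_view_free(shuffle_view *v)
{
    free(v->order);
    v->order = NULL;
}
```

A caller would initialize the view once, call shuffle_view_reset before each pass, and read samples with shuffle_view_get; only the permutation changes between passes.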

libgnn's design for handling datasets is very simple. A dataset, as an "abstract object", has some properties and can do some things:

What is the purpose of a dataset?

Datasets (as an abstraction) exist because there are many ways to get samples, and there are many possible sources. For example, there could be a dataset which samples its patterns from a disk, from RAM, from a serial port, etc. Or even worse, the three sources themselves could be heterogeneous. Also, the sampling method could vary.

Datasets can (but aren't forced to) be made of gnn_input samplers. As a particular example, gnn_simple_set is built upon three samplers (one each for inputs, targets and weights).

What does a trainer do in order to sample from a data set?

The order in which a trainer calls the functions on a gnn_dataset is the one illustrated in the following flow diagram:

The important steps are marked with a bold border.
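
As a rough illustration of such a calling order, the sketch below resets a dataset and then draws every pattern once. It is self-contained and does not link against libgnn: demo_dataset, demo_reset, demo_get and demo_run_epoch are hypothetical stand-ins for gnn_dataset, gnn_dataset_reset, gnn_dataset_get and a trainer loop, and plain doubles stand in for gsl_vector. It assumes one reset per epoch followed by sequential "get" calls; the identity "model" and the weighted squared error are only there to give the loop something to compute.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for gnn_dataset: NOT the libgnn structure. */
typedef struct {
    size_t        size;      /* number of patterns */
    const double *inputs;    /* one scalar input per pattern */
    const double *targets;   /* one scalar target per pattern */
    const double *weights;   /* one weight per pattern */
    size_t        resets;    /* counts calls to demo_reset */
} demo_dataset;

/* Stand-in "reset": prepares a new sequence of pattern drawings. */
static int demo_reset(demo_dataset *set)
{
    set->resets++;
    return 0;
}

/* Stand-in "get": hands out the k-th pattern, or fails if k is out of range. */
static int demo_get(demo_dataset *set, size_t k,
                    const double **x, const double **t, double *p)
{
    if (k >= set->size)
        return 1;
    *x = &set->inputs[k];
    *t = &set->targets[k];
    *p = set->weights[k];
    return 0;
}

/* One epoch: reset once, then draw patterns 0 .. size-1 in order.
 * Stores the weighted squared error of the identity model y = x in
 * *error. Returns 0 on success, 1 if any dataset call fails. */
static int demo_run_epoch(demo_dataset *set, double *error)
{
    assert(set != NULL && error != NULL);
    if (demo_reset(set) != 0)
        return 1;
    *error = 0.0;
    for (size_t k = 0; k < set->size; k++) {
        const double *x, *t;
        double p;
        if (demo_get(set, k, &x, &t, &p) != 0)
            return 1;
        double diff = *x - *t;
        *error += p * diff * diff;
    }
    return 0;
}
```

Checking the return value of every dataset call keeps a failing source (for example an index out of range) from silently corrupting the epoch's result.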

How to implement the gnn_dataset interface with a custom dataset?

It's simple. Create a new C datatype, which should contain a gnn_dataset structure in it:

 typedef struct _my_dataset my_dataset;

 struct _my_dataset
 {
    gnn_dataset set;     // The underlying gnn_dataset; keep it as the first
                         // member so that a my_dataset pointer can be
                         // viewed as a gnn_dataset pointer
    
    // Other things your dataset needs...
    ...
 };

Then, implement the 3 needed functions: reset, get and destroy, conforming to the calling parameter specification:

 int
 my_dataset_reset (gnn_dataset *set);

 int
 my_dataset_get (gnn_dataset *set,
                     size_t       k,
                     gsl_vector **x,
                     gsl_vector **t,
                     double      *p);

 void
 my_dataset_destroy (gnn_dataset *set);

And finally, create a constructor which must call the gnn_dataset_init function:

 gnn_dataset *
 my_dataset_new (void)
 {
   my_dataset  *myset; // a pointer to the dataset to be created
   gnn_dataset *set;   // a pointer to the same dataset, but viewed as a
                       // gnn_dataset

   // allocate memory for the dataset
   myset = (my_dataset *) malloc (sizeof (my_dataset));
   if (myset == NULL)
     return NULL;

   // view the new dataset as a gnn_dataset (valid because the
   // gnn_dataset structure is the first member of my_dataset)
   set = (gnn_dataset *) myset;

   // initialize the dataset
   gnn_dataset_init (set, N_OF_PATTERNS, INPUT_SIZE, TARGET_SIZE,
                     my_dataset_reset, my_dataset_get, my_dataset_destroy);

   // do other initialization your dataset might need...
   ...
    
   return set;
 }

That's it!
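
The extension pattern described above (a base structure embedded as the first member, plus installed function pointers) can be tried out in isolation. The following is a self-contained sketch that does not use libgnn or GSL: base_set, line_set and all function names are hypothetical stand-ins, and plain doubles replace gsl_vector. Whether a real "destroy" function should free the structure itself is an assumption made here for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical base type standing in for gnn_dataset. */
typedef struct base_set base_set;
typedef int  (*base_get_fn)(base_set *set, size_t k,
                            double *x, double *t, double *p);
typedef void (*base_destroy_fn)(base_set *set);

struct base_set {
    size_t          size;     /* number of patterns */
    base_get_fn     get;      /* installed "get" function */
    base_destroy_fn destroy;  /* installed "destroy" function */
};

/* Derived type: the base is the FIRST member, so a pointer to a
 * line_set can be converted to a pointer to its base_set and back. */
typedef struct {
    base_set set;
    double   slope;           /* extra state of this dataset type */
} line_set;

/* "get": the k-th pattern is (x = k, t = slope * k, p = 1). */
static int line_set_get(base_set *set, size_t k,
                        double *x, double *t, double *p)
{
    assert(set != NULL);
    line_set *self = (line_set *) set;   /* recover the derived type */
    if (k >= set->size)
        return 1;                        /* index out of range */
    *x = (double) k;
    *t = self->slope * (double) k;
    *p = 1.0;
    return 0;
}

/* "destroy": in this sketch it frees the whole structure. */
static void line_set_destroy(base_set *set)
{
    free(set);
}

/* Constructor: allocates the derived type, installs the functions
 * and returns the dataset viewed as its base type. */
static base_set *line_set_new(size_t size, double slope)
{
    line_set *self = malloc(sizeof *self);
    if (self == NULL)
        return NULL;
    self->set.size    = size;
    self->set.get     = line_set_get;
    self->set.destroy = line_set_destroy;
    self->slope       = slope;
    return &self->set;
}
```

Client code only ever sees the base pointer and calls set->get(...) and set->destroy(...), which is what lets different dataset types share one protocol.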


Modules

gnn_dataset_view : A view for datasets.
 View for datasets implementation.

gnn_random_order : A random order sampler.
 Random order sampler for datasets implementation.

gnn_simple_set : A simple implementation of datasets.
 Simple dataset implementation.


Typedefs

typedef struct _gnn_dataset gnn_dataset
 The datatype for datasets.

typedef int(* gnn_dataset_reset_type )(gnn_dataset *set)
 The datatype for dataset reset functions.

typedef int(* gnn_dataset_get_type )(gnn_dataset *set, size_t k, gsl_vector **x, gsl_vector **t, double *p)
 The datatype for dataset get functions.


Functions

int gnn_dataset_default_reset (gnn_dataset *set)
 Default "reset" function for a dataset.

void gnn_dataset_default_destroy (gnn_dataset *set)
 Default "destroy" function for a dataset.

int gnn_dataset_init (gnn_dataset *set, size_t size, size_t n, size_t m, gnn_dataset_reset_type reset, gnn_dataset_get_type get, gnn_dataset_destroy_type destroy)
 Initializes a gnn_dataset.

void gnn_dataset_destroy (gnn_dataset *set)
 Destroy a dataset.

int gnn_dataset_reset (gnn_dataset *set)
 Reset a dataset.

int gnn_dataset_get (gnn_dataset *set, size_t k, gsl_vector **x, gsl_vector **t, double *weight)
 Gets the k-th pattern.

size_t gnn_dataset_get_size (gnn_dataset *set)
 Gets the size of the dataset.

size_t gnn_dataset_input_get_size (gnn_dataset *set)
 Gets the input size of the dataset.

size_t gnn_dataset_output_get_size (gnn_dataset *set)
 Gets the output size of the dataset.


Typedef Documentation

typedef struct _gnn_dataset gnn_dataset
 

This is the datatype that contains the basic elements for the implementation of a dataset. Every new type of dataset should extend this structure.

Definition at line 50 of file gnn_dataset.h.

typedef int(* gnn_dataset_get_type)(gnn_dataset *set, size_t k, gsl_vector **x, gsl_vector **t, double *p)
 

This function type defines the form of the "get" functions for datasets. A "get" function should return the full k-th pattern (that is, its input and output vectors, and its weight). The meaning of the index "k" depends on the specific dataset.

Definition at line 73 of file gnn_dataset.h.

typedef int(* gnn_dataset_reset_type)(gnn_dataset *set)
 

This function type defines the form of the "reset" functions for datasets. A "reset" function should prepare the internal properties to begin a new sequence of pattern drawings.

"reset" is usually called after the end of an epoch has been reached.

Definition at line 62 of file gnn_dataset.h.


Function Documentation

void gnn_dataset_default_destroy (gnn_dataset *set)
 

This function is the default "destroy" function for a dataset. It assumes that there isn't any additional data for the specific dataset type, so it actually just returns.

Parameters:
set  A pointer to a gnn_dataset.

Definition at line 221 of file gnn_dataset.c.

int gnn_dataset_default_reset (gnn_dataset *set) [static]
 

This function is the default "reset" function for a dataset. It does nothing.

Parameters:
set  A pointer to a gnn_dataset.
Returns:
0 if succeeded.

Definition at line 203 of file gnn_dataset.c.

void gnn_dataset_destroy (gnn_dataset *set)
 

This function destroys the dataset.

Parameters:
set  A pointer to a gnn_dataset.

Definition at line 318 of file gnn_dataset.c.

int gnn_dataset_get (gnn_dataset *set, size_t k, gsl_vector **x, gsl_vector **t, double *weight)
 

This function retrieves the dataset's k-th pattern: the input and output vectors are returned through "x" and "t", and the corresponding weight is stored into the location pointed to by "weight".

Note that what "the k-th pattern" means depends on the underlying implementation. Also, k should be within 0 and the dataset's size.

Parameters:
set  A pointer to a gnn_dataset.
k  The index of the pattern to be retrieved.
x  A pointer to the gsl_vector where the input pattern should be placed.
t  A pointer to the gsl_vector where the output pattern should be placed.
weight  A pointer to a double where the pattern's weight should be placed.
Returns:
0 if succeeded.

Definition at line 365 of file gnn_dataset.c.

size_t gnn_dataset_get_size (gnn_dataset *set)
 

This function returns the dataset's size.

Parameters:
set  A pointer to a gnn_dataset.
Returns:
Returns the size.

Definition at line 386 of file gnn_dataset.c.

int gnn_dataset_init (gnn_dataset *set, size_t size, size_t n, size_t m, gnn_dataset_reset_type reset, gnn_dataset_get_type get, gnn_dataset_destroy_type destroy)
 

This function initializes a given dataset, setting its properties and installing its functions.

If the reset or destroy functions aren't provided, then the default functions gnn_dataset_default_reset and gnn_dataset_default_destroy are installed respectively. The "get" function is mandatory and can't be omitted.

As an example, suppose that you have made your own extension to the gnn_dataset datatype, which you called "my_dataset_type". Also, suppose you have already coded the appropriate "reset", "get" and "destroy" functions for your special dataset. Then,

   my_dataset_type *myset; // a pointer to the dataset to be created
   gnn_dataset     *set;   // a pointer to the same dataset, but viewed as a
                           //     gnn_dataset
   
   // allocate memory for the dataset
   myset = (my_dataset_type *) malloc (sizeof (my_dataset_type));

   // view the new dataset as a gnn_dataset
   set = (gnn_dataset *) myset;
 
   // initialize the dataset
   gnn_dataset_init (set, 100, 5, 2,
                     my_dataset_reset, my_dataset_get, my_dataset_destroy);

would initialize your dataset for 100 patterns, whose inputs and outputs are of size 5 and 2 respectively.
Parameters:
set  A pointer to a gnn_dataset.
size  The number of patterns that it contains.
n  The size of the inputs.
m  The size of the outputs.
reset  A pointer to the dataset's "reset" function.
get  A pointer to the dataset's "get" function.
destroy  A pointer to the dataset's "destroy" function.
Returns:
0 if succeeded.

Definition at line 273 of file gnn_dataset.c.
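
The rule that gnn_dataset_init installs default "reset" and "destroy" functions while requiring a "get" function can be sketched as follows. This is a self-contained illustration, not the libgnn source: toy_set and every toy_ function are hypothetical stand-ins, plain doubles replace gsl_vector, and it is an assumption that "not provided" means passing NULL and that a missing "get" is reported with a nonzero return value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for gnn_dataset and its function types. */
typedef struct toy_set toy_set;
typedef int  (*toy_reset_fn)(toy_set *set);
typedef int  (*toy_get_fn)(toy_set *set, size_t k,
                           double *x, double *t, double *p);
typedef void (*toy_destroy_fn)(toy_set *set);

struct toy_set {
    size_t         size;     /* number of patterns */
    size_t         n;        /* input size */
    size_t         m;        /* output size */
    toy_reset_fn   reset;
    toy_get_fn     get;
    toy_destroy_fn destroy;
};

/* Default "reset": does nothing and reports success. */
static int toy_default_reset(toy_set *set)
{
    (void) set;
    return 0;
}

/* Default "destroy": nothing extra to release, so it just returns. */
static void toy_default_destroy(toy_set *set)
{
    (void) set;
}

/* Sets the properties and installs the functions. NULL for reset or
 * destroy selects the defaults (an assumption of this sketch); a NULL
 * get is rejected. Returns 0 on success, 1 on failure. */
static int toy_init(toy_set *set, size_t size, size_t n, size_t m,
                    toy_reset_fn reset, toy_get_fn get,
                    toy_destroy_fn destroy)
{
    assert(set != NULL);
    if (get == NULL)
        return 1;                        /* "get" is mandatory */
    set->size    = size;
    set->n       = n;
    set->m       = m;
    set->reset   = (reset != NULL) ? reset : toy_default_reset;
    set->get     = get;
    set->destroy = (destroy != NULL) ? destroy : toy_default_destroy;
    return 0;
}

/* A trivial "get" used only to exercise toy_init. */
static int toy_constant_get(toy_set *set, size_t k,
                            double *x, double *t, double *p)
{
    if (k >= set->size)
        return 1;
    *x = 0.0;
    *t = 0.0;
    *p = 1.0;
    return 0;
}
```

Installing defaults at initialization time means callers can always invoke set->reset and set->destroy without checking for NULL first.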

size_t gnn_dataset_input_get_size (gnn_dataset *set)
 

This function returns the size of the pattern's input vector.

Parameters:
set  A pointer to a gnn_dataset.
Returns:
Returns the size.

Definition at line 403 of file gnn_dataset.c.

size_t gnn_dataset_output_get_size (gnn_dataset *set)
 

This function returns the size of the pattern's output vector.

Parameters:
set  A pointer to a gnn_dataset.
Returns:
Returns the size.

Definition at line 420 of file gnn_dataset.c.

int gnn_dataset_reset (gnn_dataset *set)
 

This function resets the dataset.

Parameters:
set  A pointer to a gnn_dataset.
Returns:
0 if succeeded.

Definition at line 336 of file gnn_dataset.c.


Generated on Sun Jun 13 20:51:43 2004 for libgnn Gradient Retropropagation Machine Library by doxygen1.2.18