torchnet.dataset

Provides a Dataset interface, similar to vanilla PyTorch.

class torchnet.dataset.dataset.Dataset[source]

Bases: object

batch(*args, **kwargs)[source]
parallel(*args, **kwargs)[source]
shuffle(*args, **kwargs)[source]
split(*args, **kwargs)[source]
transform(*args, **kwargs)[source]
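
These helpers wrap the dataset in the corresponding dataset classes documented below (batch in BatchDataset, shuffle in ShuffleDataset, and so on) and return the wrapper, so calls chain naturally. A minimal sketch, using a made-up toy tensor:

    import torch
    import torchnet as tnt

    # Start from an in-memory dataset of 10 three-dimensional samples.
    base = tnt.dataset.TensorDataset(torch.randn(10, 3))

    # Each helper returns a new wrapped dataset, so calls chain:
    pipeline = base.shuffle().batch(4, policy='include-last')

    for i in range(len(pipeline)):
        print(pipeline[i].size())  # batches of size 4, 4, and 2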

BatchDataset

class torchnet.dataset.BatchDataset(dataset, batchsize, perm=<function BatchDataset.<lambda>>, merge=None, policy='include-last', filter=<function BatchDataset.<lambda>>)[source]

Bases: torchnet.dataset.dataset.Dataset

Dataset which batches the data from a given dataset.

Given a dataset, BatchDataset merges samples from this dataset to form a new sample which can be interpreted as a batch of size batchsize.

The merge function controls how the batching is performed. By default the samples are assumed to be tensors, and they are aggregated along the first dimension.

It is often important to shuffle examples while performing the batch operation. perm(idx, size) is a function which returns the shuffled index of the sample at position idx in the underlying dataset. For convenience, the size of the underlying dataset is also passed to the function. By default, the function is the identity.

The underlying dataset size might not always be divisible by batchsize. The optional policy string specifies how to handle these corner cases.

Purpose: the concept of a batch is problem dependent. In torchnet, it is up to the user to interpret a sample as a batch or not. When one wants to assemble samples from an existing dataset into a batch, BatchDataset is suited for the job. However, it is sometimes more convenient to write a dataset from scratch that provides “batched” samples directly.

Parameters:
  • dataset (Dataset) – Dataset to be batched.
  • batchsize (int) – Size of the batch.
  • perm (function, optional) – Function used to shuffle the dataset before batching. perm(idx, size) should return the shuffled index of the idx-th sample. By default, the function is the identity.
  • merge (function, optional) – Function to control batching behaviour. transform.makebatch(merge) is used to make the batch. Default is None.
  • policy (str, optional) –

    Policy to handle the corner cases when the underlying dataset size is not divisible by batchsize. One of (include-last, skip-last, divisible-only).

    • include-last makes sure all samples of the underlying dataset
      will be seen; batches will be of size equal to or smaller than batchsize.
    • skip-last will skip the last examples of the underlying dataset if
      its size is not evenly divisible by batchsize. Batches will always be of size batchsize.
    • divisible-only will raise an error if the size of the underlying
      dataset is not divisible by batchsize.
  • filter (function, optional) – Function to filter the sample before batching. If filter(sample) is True, then sample is included for batching. Otherwise, it is excluded. By default, filter(sample) returns True for any sample.
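
Example (a minimal sketch with made-up data, showing the include-last policy):

    import torch
    import torchnet as tnt

    # 10 samples of dimension 3; 10 is not divisible by 4, so with
    # policy='include-last' the final batch holds the 2 leftover samples.
    dataset = tnt.dataset.TensorDataset(torch.randn(10, 3))
    batched = tnt.dataset.BatchDataset(dataset, 4, policy='include-last')

    print(len(batched))       # 3
    print(batched[0].size())  # torch.Size([4, 3])
    print(batched[2].size())  # torch.Size([2, 3])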

ConcatDataset

class torchnet.dataset.ConcatDataset(datasets)[source]

Bases: torchnet.dataset.dataset.Dataset

Dataset to concatenate multiple datasets.

Purpose: useful to assemble different existing datasets, possibly large-scale ones, since the concatenation is performed on the fly.

Parameters: datasets (iterable) – List of datasets to be concatenated.
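
Example (a minimal sketch with two made-up in-memory datasets):

    import torch
    import torchnet as tnt

    a = tnt.dataset.TensorDataset(torch.zeros(5, 2))
    b = tnt.dataset.TensorDataset(torch.ones(3, 2))

    # Samples 0-4 come from `a`, samples 5-7 from `b`; nothing is
    # copied up front, indices are resolved on the fly.
    combined = tnt.dataset.ConcatDataset([a, b])
    print(len(combined))  # 8
    print(combined[6])    # tensor([1., 1.])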

ListDataset

class torchnet.dataset.ListDataset(elem_list, load=<function ListDataset.<lambda>>, path=None)[source]

Bases: torchnet.dataset.dataset.Dataset

Dataset which loads data from a list using given function.

Given an elem_list (which can be an iterable or a string), the i-th sample of the dataset is returned by load(elem_list[i]), where load() is a function provided by the user.

If path is provided, elem_list is assumed to be a list of strings, and each element elem_list[i] will be prefixed by path/ when fed to load().

Purpose: many low- or medium-scale datasets can be seen as a list of files (for example, representing input samples). For such a list of files, a target can often be inferred in a simple manner.

Parameters:
  • elem_list (iterable/str) – List of arguments which will be passed to the load function. It can also be a path to a file with each line containing the arguments to load.
  • load (function, optional) – Function which loads the data. The i-th sample is returned by load(elem_list[i]). By default, load is the identity, i.e., lambda x: x.
  • path (str, optional) – Defaults to None. If a string is provided, elem_list is assumed to be a list of strings, and each element elem_list[i] will be prefixed by this string when fed to load().
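
Example (a minimal sketch; the file names and the load function are hypothetical stand-ins for a real loader):

    import torchnet as tnt

    names = ['cat.jpg', 'dog.jpg']  # hypothetical file names

    # `load` receives 'images/cat.jpg' etc. because of the `path`
    # prefix; a real loader would open the file rather than return
    # the string unchanged.
    dataset = tnt.dataset.ListDataset(
        names,
        load=lambda fname: fname,
        path='images',
    )
    print(dataset[0])  # 'images/cat.jpg'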

ResampleDataset

class torchnet.dataset.ResampleDataset(dataset, sampler=<function ResampleDataset.<lambda>>, size=None)[source]

Bases: torchnet.dataset.dataset.Dataset

Dataset which resamples a given dataset.

Given a dataset, creates a new dataset which will (re-)sample from this underlying dataset using the provided sampler(dataset, idx) function.

If size is provided, then the newly created dataset will have the specified size, which may differ from the size of the underlying dataset. If size is not provided, then the new dataset will have the same size as the underlying one.

Purpose: shuffling data, re-weighting samples, getting a subset of the data. Note that an important sub-class ShuffleDataset is provided for convenience.

Parameters:
  • dataset (Dataset) – Dataset to be resampled.
  • sampler (function, optional) – Function used for sampling. The idx-th sample is returned by dataset[sampler(dataset, idx)]. By default, sampler(dataset, idx) is the identity, simply returning idx. sampler(dataset, idx) must return an index in the range acceptable for the underlying dataset.
  • size (int, optional) – Desired size of the dataset after resampling. By default, the new dataset will have the same size as the underlying one.
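
Example (a minimal sketch that takes a subset by resampling; the toy data is made up):

    import torch
    import torchnet as tnt

    base = tnt.dataset.TensorDataset(torch.arange(10))

    # Keep only the even-indexed samples: a new dataset of size 5
    # whose idx-th sample is base[2 * idx].
    evens = tnt.dataset.ResampleDataset(
        base,
        sampler=lambda dataset, idx: 2 * idx,
        size=5,
    )
    print([int(evens[i]) for i in range(len(evens))])  # [0, 2, 4, 6, 8]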

ShuffleDataset

class torchnet.dataset.ShuffleDataset(dataset, size=None, replacement=False)[source]

Bases: torchnet.dataset.resampledataset.ResampleDataset

Dataset which shuffles a given dataset.

ShuffleDataset is a sub-class of ResampleDataset provided for convenience. It samples uniformly from the given dataset with or without replacement. The chosen permutation can be redrawn by calling resample().

If replacement is true, then the specified size may be larger than the underlying dataset. If size is not provided, then the new dataset size will be equal to the underlying dataset size.

Purpose: the easiest way to shuffle a dataset!

Parameters:
  • dataset (Dataset) – Dataset to be shuffled.
  • size (int, optional) – Desired size of the shuffled dataset. If replacement is true, then it can be larger than len(dataset). By default, the new dataset will have the same size as dataset.
  • replacement (bool, optional) – True if uniform sampling is to be done with replacement. False otherwise. Defaults to false.
Raises:

ValueError – If size is larger than the size of the underlying dataset and replacement is False.

resample(seed=None)[source]

Resample the dataset.

Parameters:
  • seed (int, optional) – Seed for resampling. By default no seed is used.
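
Example (a minimal sketch; the toy data is made up):

    import torch
    import torchnet as tnt

    base = tnt.dataset.TensorDataset(torch.arange(5))
    shuffled = tnt.dataset.ShuffleDataset(base)

    print([int(shuffled[i]) for i in range(len(shuffled))])  # e.g. [3, 0, 4, 1, 2]

    # Draw a new permutation; passing a seed makes it reproducible.
    shuffled.resample(seed=42)
    print([int(shuffled[i]) for i in range(len(shuffled))])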

SplitDataset

class torchnet.dataset.SplitDataset(dataset, partitions, initial_partition=None)[source]

Bases: torchnet.dataset.dataset.Dataset

Dataset to partition a given dataset.

Partition a given dataset, according to the specified partitions. Use the method select() to select the current partition in use.

partitions is a dictionary where each key is a user-chosen string naming a partition, and each value is a number representing either the weight (a number between 0 and 1) or the size (in number of samples) of the corresponding partition.

Partitioning is achieved linearly (no shuffling). See ShuffleDataset if you want to shuffle the dataset before partitioning.

Parameters:
  • dataset (Dataset) – Dataset to be split.
  • partitions (dict) – Dictionary where each key is a user-chosen string naming a partition, and each value is a number representing either the weight (a number between 0 and 1) or the size (in number of samples) of the corresponding partition.
  • initial_partition (str, optional) – Initial partition to be selected.
select(partition)[source]

Select the partition.

Parameters: partition (str) – Partition to be selected.
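
Example (a minimal sketch; the 80/20 split and the toy data are made up). Since partitioning is linear, 'train' covers the first 8 samples and 'val' the last 2:

    import torch
    import torchnet as tnt

    base = tnt.dataset.TensorDataset(torch.arange(10))
    splits = tnt.dataset.SplitDataset(
        base,
        partitions={'train': 0.8, 'val': 0.2},
        initial_partition='train',
    )

    print(len(splits))  # 8
    splits.select('val')
    print(len(splits))  # 2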

TensorDataset

class torchnet.dataset.TensorDataset(data)[source]

Bases: torchnet.dataset.dataset.Dataset

Dataset from a tensor, array, list, or dict.

TensorDataset provides a way to create a dataset out of the data that is already loaded into memory. It accepts data in the following forms:

tensor or numpy array
  The idx-th sample is data[idx].
dict of tensors or numpy arrays
  The idx-th sample is {k: v[idx] for k, v in data.items()}.
list of tensors or numpy arrays
  The idx-th sample is [v[idx] for v in data].

Purpose: Easy way to create a dataset out of standard data structures.

Parameters: data (dict/list/tensor/ndarray) – Data for the dataset.
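
Example (a minimal sketch using a made-up dict of tensors):

    import torch
    import torchnet as tnt

    data = {
        'input':  torch.randn(4, 3),
        'target': torch.tensor([0, 1, 0, 1]),
    }
    dataset = tnt.dataset.TensorDataset(data)

    sample = dataset[2]  # {'input': data['input'][2], 'target': data['target'][2]}
    print(sample['input'].size())  # torch.Size([3])
    print(int(sample['target']))   # 0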

TransformDataset

class torchnet.dataset.TransformDataset(dataset, transforms)[source]

Bases: torchnet.dataset.dataset.Dataset

Dataset which transforms a given dataset with a given function.

Given a function transforms and a dataset, TransformDataset applies the function on the fly when a sample is queried with __getitem__(idx), therefore returning transforms(dataset[idx]).

transforms can also be a dict with functions as values. In this case, it is assumed that dataset[idx] is a dict which has all the keys in transforms. Then, transforms[key] is applied to dataset[idx][key] for each key in transforms.

The size of the new dataset is equal to the size of the underlying dataset.

Purpose: when performing pre-processing operations, it is convenient to be able to apply on-the-fly transformations to a dataset.

Parameters:
  • dataset (Dataset) – Dataset which has to be transformed.
  • transforms (function/dict) – Function, or dict with functions as values, to be applied to the data.
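
Example (a minimal sketch with a made-up dict dataset; per-key transforms are applied on the fly, and keys without an entry in the dict pass through unchanged):

    import torch
    import torchnet as tnt

    base = tnt.dataset.TensorDataset({
        'input':  torch.randn(4, 3),
        'target': torch.tensor([0, 1, 0, 1]),
    })

    # Normalize each sample's 'input' on the fly; 'target' has no
    # entry in the transforms dict, so it is returned as-is.
    transformed = tnt.dataset.TransformDataset(base, {
        'input': lambda x: (x - x.mean()) / x.std(),
    })

    sample = transformed[0]
    print(sample['input'].mean())  # ~0
    print(int(sample['target']))   # 0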