# datasample

Randomly sample from data, with or without replacement

## Syntax

``y = datasample(data,k)``
``y = datasample(data,k,dim)``
``y = datasample(___,Name,Value)``
``y = datasample(s,___)``
``[y,idx] = datasample(___)``

## Description

`y = datasample(data,k)` returns `k` observations sampled uniformly at random, with replacement, from the data in `data`.

`y = datasample(data,k,dim)` returns a sample taken along dimension `dim` of `data`.

`y = datasample(___,Name,Value)` returns a sample for any of the input arguments in the previous syntaxes, with additional options specified by one or more name-value pair arguments. For example, `'Replace',false` specifies sampling without replacement.

`y = datasample(s,___)` uses the random number stream `s` to generate random numbers. The option `s` can precede any of the input arguments in the previous syntaxes.

`[y,idx] = datasample(___)` also returns an index vector indicating which values `datasample` sampled from `data`, using any of the input arguments in the previous syntaxes.

## Examples

Create the random number stream for reproducibility.

`s = RandStream('mlfg6331_64');`

Draw five unique values from the integers `1` to `10`.

`y = datasample(s,1:10,5,'Replace',false)`
```
y = 1×5

     9     8     3     6     2
```

Create the random number stream for reproducibility.

`s = RandStream('mlfg6331_64');`

Generate `48` random characters from the sequence `ACGT` according to the specified probabilities.

`seq = datasample(s,'ACGT',48,'Weights',[0.15 0.35 0.35 0.15])`
```seq = 'GGCGGCGCAAGGCGCCGGACCTGGCTGCACGCCGTTCCCTGCTACTCG' ```

Set the random seed for reproducibility of the results.

`rng(10,'twister')`

Generate a matrix with 10 rows and 1000 columns.

`X = randn(10,1000);`

Create the random number stream for reproducibility within `datasample`.

`s = RandStream('mlfg6331_64');`

Randomly select five unique columns from `X`.

`Y = datasample(s,X,5,2,'Replace',false)`
```
Y = 10×5

    0.4317   -0.3327    0.9112   -2.3244    0.9559
    0.6977   -0.7422    0.4578   -1.3745   -0.8634
   -0.8543   -0.3105    0.9836   -0.6434   -0.4457
    0.1686    0.6609   -0.0553   -0.1202   -1.3699
   -1.7649   -1.1607   -0.3513   -1.5533    0.0597
   -0.3821    0.5696   -1.6264   -0.2104   -1.5486
   -1.6844    0.7148   -0.6876   -0.4447   -1.4615
   -0.4170    1.3696    1.1874   -0.9901    0.5875
   -0.2410    1.4703   -2.5003   -1.1321   -1.8451
    0.6212    1.4118   -0.4518    0.8697    0.8093
```

Resample observations from a dataset array to create a bootstrap replicate data set. See Bootstrap Resampling for more information about bootstrapping.

Load the sample data set.

`load hospital`

Create a data set that has the same size as the `hospital` data set and contains random samples chosen with replacement from the `hospital` data set.

`y = datasample(hospital,size(hospital,1));`

Select samples from data based on indices of a sample chosen from another vector.

Generate two random vectors.

```
x1 = randn(100,1);
x2 = randn(100,1);
```

Select a sample of `10` elements from vector `x1`, and return the indices of the sample in vector `idx`.

`[y1,idx] = datasample(x1,10);`

Select a sample of `10` elements from vector `x2` using the indices in vector `idx`.

`y2 = x2(idx);`

## Input Arguments

`data`: Input data from which to sample, specified as a vector, matrix, multidimensional array, table, or dataset array. By default, `datasample` samples from the first nonsingleton dimension of `data`. For example, if `data` is a matrix, then `datasample` samples from the rows. Change this behavior with the `dim` input argument.

Data Types: `single` | `double` | `logical` | `char` | `string` | `table`
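As a rough illustration of the default dimension rule described above, the following sketch samples from a matrix and from a row vector. The variable names (`A`, `v`, `rowSample`, `elemSample`) are made up for this example.

```matlab
A = magic(4);                    % 4-by-4 matrix
rowSample = datasample(A,2);     % first nonsingleton dimension is 1, so rows are sampled

v = 1:10;                        % 1-by-10 row vector
elemSample = datasample(v,3);    % first nonsingleton dimension is 2, so elements are sampled

size(rowSample)                  % 2 rows, 4 columns
size(elemSample)                 % 1 row, 3 columns
```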

`k`: Number of samples, specified as a positive integer.

Example: `datasample(data,100)` returns 100 observations sampled uniformly and at random from the data in `data`.

Data Types: `single` | `double`

`dim`: Dimension to sample, specified as a positive integer. For example, if `data` is a matrix and `dim` is `2`, `y` contains a selection of columns in `data`. If `data` is a table or dataset array and `dim` is `2`, `y` contains a selection of variables in `data`. Use `dim` to ensure sampling along a specific dimension regardless of whether `data` is a vector, matrix, or N-dimensional array.

Data Types: `single` | `double`
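A minimal sketch of how `dim` changes the result, assuming a small matrix and a row vector chosen only for illustration (`A`, `v`, `cols`, and `rowCopies` are hypothetical names):

```matlab
A = magic(4);                    % 4-by-4 matrix
cols = datasample(A,2,2);        % dim = 2: two columns of A, so cols is 4-by-2

v = 1:10;                        % 1-by-10 row vector
rowCopies = datasample(v,3,1);   % dim = 1: v has only one row, so that row is drawn three times

size(cols)
size(rowCopies)
```

Passing `dim` explicitly is mainly useful in code that must behave the same way whether its input happens to be a vector or a matrix.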

`s`: Random number stream, specified as the global stream or `RandStream`. For example, `s = RandStream('mlfg6331_64')` creates a random number stream that uses the multiplicative lagged Fibonacci generator algorithm. For details, see Creating and Controlling a Random Number Stream (MATLAB).

The `rng` function provides a simple way to control the global stream. For example, `rng(seed)` seeds the random number generator using the nonnegative integer seed. For details, see Managing the Global Stream (MATLAB).
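The two ways of controlling randomness described above can be sketched as follows. The seed value and the variable names are arbitrary choices for this illustration.

```matlab
% Dedicated stream: resetting it replays the same sequence of draws.
s = RandStream('mlfg6331_64');
y1 = datasample(s,1:10,5,'Replace',false);
reset(s);                                    % return the stream to its initial state
y2 = datasample(s,1:10,5,'Replace',false);
sameFromStream = isequal(y1,y2);

% Global stream: reseeding with rng before each call has the same effect.
rng(1);
a = datasample(1:10,5);
rng(1);
b = datasample(1:10,5);
sameFromGlobal = isequal(a,b);
```

A dedicated stream keeps the sampling reproducible without disturbing other code that draws from the global stream.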

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'Replace',false,'Weights',ones(datasize,1)` samples without replacement and with probability proportional to the elements of `Weights`, where `datasize` is the size of the dimension being sampled.

Indicator for sampling with replacement, specified as the comma-separated pair consisting of `'Replace'` and either `true` or `false`.

Sample with replacement if `'Replace'` is `true`, or without replacement if `'Replace'` is `false`. If `'Replace'` is `false`, then `k` must not be larger than the size of the dimension being sampled. For example, if `data = [1 3 Inf; 2 4 5]` and `y = datasample(data,k,'Replace',false)`, then `k` cannot be larger than `2`.

Data Types: `logical`
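The limit on `k` can be checked with a short sketch that reuses the matrix from the description above. The `try`/`catch` wrapper and the `tooMany` flag are additions for illustration only.

```matlab
data = [1 3 Inf; 2 4 5];                      % two rows, so at most two rows without replacement

y = datasample(data,2,'Replace',false);       % both rows, in random order

try
    datasample(data,3,'Replace',false);       % k = 3 exceeds the number of rows
    tooMany = false;
catch err
    tooMany = true;                           % datasample rejects the request
    disp(err.message)
end
```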

Sampling weights, specified as the comma-separated pair consisting of `'Weights'` and a vector of nonnegative numeric values. The vector is of size `datasize`, where `datasize` is the size of the dimension being sampled. The vector must have at least one positive value and cannot contain `NaN` values. The `datasample` function samples with probability proportional to the elements of `'Weights'`.

Example: `'Weights',[0.1 0.5 0.35 0.46]`

Data Types: `single` | `double`
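To see what "probability proportional to the elements of `'Weights'`" means in practice, the sketch below draws a large weighted sample and compares the observed frequencies with the normalized weights. The seed, the sample size, and the variable names are assumptions made for this example.

```matlab
rng(0,'twister');                             % fix the seed so the run is repeatable
w = [3 7 7 3];                                % weights need not sum to 1
n = 10000;

seq = datasample('ACGT',n,'Weights',w);       % weighted sampling with replacement

observed = [mean(seq=='A') mean(seq=='C') mean(seq=='G') mean(seq=='T')];
expected = w / sum(w);                        % 0.15 0.35 0.35 0.15

disp([expected; observed])                    % the two rows should be close
```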

## Output Arguments

`y`: Sample, returned as a vector, matrix, multidimensional array, table, or dataset array.

• If `data` is a vector, then `y` is a vector containing `k` elements selected from `data`.

• If `data` is a matrix and `dim` = `1`, then `y` is a matrix containing `k` rows selected from `data`. Or, if `dim` = `2`, then `y` is a matrix containing `k` columns selected from `data`.

• If `data` is an N-dimensional array and you do not specify `dim`, then `y` is an N-dimensional array of samples taken along the first nonsingleton dimension of `data`. If you specify a value for the `dim` input argument, `datasample` samples along the dimension `dim`.

• If `data` is a table and `dim` = `1`, then `y` is a table containing `k` rows selected from `data`. Or, if `dim` = `2`, then `y` is a table containing `k` variables selected from `data`.

• If `data` is a dataset array and `dim` = `1`, then `y` is a dataset array containing `k` rows selected from `data`. Or, if `dim` = `2`, then `y` is a dataset array containing `k` variables selected from `data`.

If the input `data` contains missing observations that are represented as `NaN` values, `datasample` samples from the entire input, including the `NaN` values. For example, `y = datasample([NaN 6 14],2)` can return `y = NaN 14`.
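If missing values should not appear in the sample, one option is to drop them before sampling. This is a minimal sketch for a numeric vector; the name `clean` is hypothetical.

```matlab
data = [NaN 6 14];

clean = data(~isnan(data));      % keep only the non-missing values
y = datasample(clean,2);         % sample from the cleaned vector
```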

When the sample is taken with replacement (default), `y` can contain repeated observations from `data`. Set the `Replace` name-value pair argument to `false` to sample without replacement.

`idx`: Indices, returned as a vector indicating which elements `datasample` chooses from `data` to create `y`. For example:

• If `data` is a vector, then `y = data(idx)`.

• If `data` is a matrix and `dim` = `1`, then `y = data(idx,:)`.

• If `data` is a matrix and `dim` = `2`, then `y = data(:,idx)`.
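The indexing relationships listed above can be verified directly. The matrix size and variable names in this sketch are arbitrary.

```matlab
X = randn(6,4);

[yRows,iRows] = datasample(X,3);       % sample rows (default dimension for a matrix)
[yCols,iCols] = datasample(X,2,2);     % sample columns (dim = 2)

rowsMatch = isequal(yRows,X(iRows,:));
colsMatch = isequal(yCols,X(:,iCols));
```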

## Tips

• To sample random integers with replacement from a range, use `randi`.

• To sample random integers without replacement, use `randperm` or `datasample`.

• To randomly sample from data, with or without replacement, use `datasample`.

## Algorithms

`datasample` uses `randperm`, `rand`, or `randi` to generate random values. Therefore, `datasample` changes the state of the MATLAB® global random number generator. Control the random number generator using `rng`.

For selecting weighted samples without replacement, `datasample` uses the algorithm of Wong and Easton [1].
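For intuition about what weighted sampling without replacement produces, the following sketch uses a plain sequential scheme: draw one item with probability proportional to the remaining weights, zero out its weight, and repeat. It is an illustration under that assumption only. It is not the Wong and Easton algorithm, and it does not reproduce the implementation or the random output of `datasample`.

```matlab
w = [0.1 0.5 0.35 0.46];                 % example weights, one per item
k = 3;                                   % number of distinct items to draw

remaining = w;                           % weights of items not yet drawn
idx = zeros(1,k);
for j = 1:k
    c = cumsum(remaining);               % running totals of the remaining weights
    u = rand * c(end);                   % uniform point between 0 and the total weight
    idx(j) = find(u <= c,1,'first');     % first item whose running total reaches u
    remaining(idx(j)) = 0;               % a drawn item cannot be picked again
end
```

Each pass costs time linear in the number of items, so this naive version is only suitable for small inputs.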

## Alternative Functionality

You can use `randi` or `randperm` to generate indices for random sampling with or without replacement, respectively. However, `datasample` can be more convenient to use because it samples directly from your data. `datasample` also allows weighted sampling.
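A short sketch of the index-based alternative, assuming the goal is to sample rows of a matrix. The names `iWith` and `iWithout` are hypothetical.

```matlab
X = randn(8,3);
k = 4;

iWith = randi(size(X,1),k,1);        % row indices drawn with replacement
yWith = X(iWith,:);

iWithout = randperm(size(X,1),k);    % k distinct row indices
yWithout = X(iWithout,:);
```

With `datasample`, the same two samples would be `datasample(X,k)` and `datasample(X,k,'Replace',false)`, although the specific rows drawn will generally differ.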

## References

[1] Wong, C. K., and M. C. Easton. An Efficient Method for Weighted Sampling Without Replacement. SIAM Journal on Computing 9(1), pp. 111–113, 1980.