(An incomplete guide)
Preface
Overview of tensors
As Edward Z Yang covered in his 2019 write-up, PyTorch Internals,
The tensor is the central data structure in PyTorch... an n-dimensional data structure containing some scalar type, e.g., floats, ints, etc. We can think of a tensor as consisting of some data, and then some metadata describing the size of the tensor, the type of the elements it contains (dtype), what device the tensor lives on [&] the stride
- The stride is used to provide views onto the tensor's underlying data in memory (the storage).
- Operations (such as `mm`, matrix multiplication) involve a device/sparsity-dependent dynamic dispatch followed by a dtype-dependent dispatch (a simple switch statement for the kernel's supported dtypes)
- Tensor layout doesn't have to be dense and strided: it can be sparse, MKLDNN, etc. (thanks to extensions)
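For instance, a minimal sketch of how the size/stride metadata provides views onto one underlying storage:

```python
import torch

x = torch.arange(6)          # storage: [0, 1, 2, 3, 4, 5]
m = x.view(2, 3)             # a view: same storage, new size/stride metadata
print(m.size(), m.stride())  # torch.Size([2, 3]) (3, 1)

t = m.t()                    # transposing is also just a metadata change
print(t.stride())            # (1, 3) -- same storage, different strides
print(t.is_contiguous())     # False
```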
Automatic differentiation in brief
PyTorch implements reverse mode automatic differentiation (AD):
In reverse accumulation AD, the dependent variable to be differentiated is fixed and the derivative is computed with respect to each sub-expression recursively. In a pen-and-paper calculation, the derivative of the outer functions is repeatedly substituted in the chain rule: $\frac{\partial y}{\partial x} = \frac{\partial y}{\partial w_1} \frac{\partial w_1}{\partial x} = \frac{\partial y}{\partial w_2} \frac{\partial w_2}{\partial w_1} \frac{\partial w_1}{\partial x} = \cdots$ In reverse accumulation, the quantity of interest is the adjoint, denoted with a bar ($\bar{w}$); it is a derivative of a chosen dependent variable $y$ with respect to a subexpression $w$: $\bar{w} = \frac{\partial y}{\partial w}$
Reverse accumulation traverses the chain rule from outside to inside, or in the case of the computational graph in Figure 3, from top to bottom [from complete function down to its component variables]
Yang describes this as follows:
we effectively walk the forward computations "backward" to compute the gradients.
Technically, these variables which we call
`grad_` aren't really "gradients". They're really Jacobians left-multiplied by a vector.
See example 1 here for a concise explanation
Note that the Jacobian for a scalar-valued loss function is simply a row vector, so the innermost parentheses (bracketing the loss gradient, via the associativity of matrix multiplication) reduce each step to a vector-Jacobian product, meaning the full Jacobians never need to be materialised
See Kevin Clark's CS224n notes for more formal coverage
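A small, self-contained sketch of the vector-Jacobian product in practice: for a non-scalar output, `backward()` needs the vector that left-multiplies the Jacobian (here simply a vector of ones, chosen for illustration):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2                        # non-scalar output

# For a non-scalar output, backward() needs a vector v to form the
# vector-Jacobian product v^T J:
y.backward(gradient=torch.ones_like(y))
print(x.grad)                    # tensor([2., 2., 2.])
```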
Core methods
Basic info
- `dim()` / `ndimension()` - the number of dimensions
- `nelement()` / `numel()` - total number of elements in the tensor
- `size()` - the size of the tensor (also available as `.shape`)
- `item()` - the value as a standard Python numerical type, for tensors with one element
- `view(*shape)` - gives a new tensor with the same underlying data but a different `shape` (a "view" on it)
- `view_as(other)` - views a tensor as the same size as the `other` tensor
- `clone()` - creates a copy of the tensor with its own storage; the copy remains part of the automatic differentiation graph, so gradients flow back to the original
- `detach()` - creates a tensor that shares the same data/storage but is permanently 'detached' from the automatic differentiation graph, like setting the `requires_grad=False` parameter at tensor construction
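A minimal illustration of the distinctions above:

```python
import torch

x = torch.arange(4., requires_grad=True)
v = x.view(2, 2)        # same storage, new shape
c = x.clone()           # new storage, still tracked by autograd
d = x.detach()          # same storage, cut out of the autograd graph

print(x.numel(), v.size(), d.requires_grad)   # 4 torch.Size([2, 2]) False
```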
Basic operations
- `apply_(callable)` - applies the function `callable` to each element, replacing the element with the return value
- `backward(gradient=None, retain_graph=None, create_graph=False, inputs=None)` - computes the gradient of the tensor w.r.t. graph leaves and 'accumulates' them in the leaves. Must pass `gradient` (a tensor of the same dtype and device) if non-scalar, containing the gradient of the differentiated function w.r.t. the tensor.

Preferred Networks explains how autograd operates on graphs:

- autograd overview – how PyTorch ships basic derivatives for the functions that comprise the graph nodes, and differentiates by Jacobian-vector products upon calling `backward()`
- how graphs are constructed – where the components of autograd live, how C++ code is built for the derivatives, how the use of `requires_grad=True` in tensor instantiation sets off construction of an autograd metadata computational graph, and how operation gradients are passed as pointers (C++ source code is shown, namely templates in `.h` header files)
- how graphs are executed – runs through what happens when calling `Tensor.backward()`, leading to `torch.autograd.backward()` checking the inputs and calling the C++ layer, or when calling `torch.autograd.grad()` (which returns a tuple instead of populating the `.grad` field of the Tensor objects)
- `map_(tensor, callable)` - applies the function `callable` to each element and the given `tensor` and stores the results in `self` (`self` and the given `tensor` must be broadcastable)
- `register_hook(hook)` - registers a backward hook (a function taking a single argument, `grad`), which will be called every time a gradient w.r.t. the tensor is computed, and returns either a new gradient to use in place of `grad`, or `None`
- `t()` - transposes dimensions 0 and 1 of a 2D tensor (0D and 1D tensors are returned as-is)
- `zero_()` - fills the tensor with zeros
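For example, a small sketch of `register_hook` and `backward` together (values shown are for this toy case):

```python
import torch

x = torch.ones(2, requires_grad=True)
# The hook receives the incoming gradient and may return a replacement for it
x.register_hook(lambda grad: grad * 2)

y = (x * 3).sum()
y.backward()
print(x.grad)   # tensor([6., 6.]): the hook doubled the usual gradient of 3
```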
Creating new tensors
Remember
By default, these methods return a tensor with the same dtype and device as the tensor they're called on
- `new_empty(size, dtype=None, device=None, requires_grad=False)` - creates a tensor of size `size` filled with uninitialised data
- `new_full(size, fill_value, dtype=None, device=None, requires_grad=False)` - creates a tensor of size `size` filled with `fill_value`
- `new_ones(size, dtype=None, device=None, requires_grad=False)` - creates a tensor of size `size` filled with ones
- `new_zeros(size, dtype=None, device=None, requires_grad=False)` - creates a tensor of size `size` filled with zeros
- `new_tensor(data, dtype=None, device=None, requires_grad=False)` - creates a tensor with copied `data`; implicitly constructs a leaf variable. Prefer the equivalent `x.clone().detach()` over initialising a copy of an existing tensor with this method
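A brief illustration (assuming a CPU float tensor named `x`, purely for demonstration):

```python
import torch

x = torch.tensor([1.0, 2.0])

# New tensors inherit x's dtype and device unless overridden
a = x.new_zeros((2, 3))
b = x.new_full((2, 2), fill_value=7.0)
print(a.dtype, a.device, b)

# Copying an existing tensor: the preferred spelling noted above
y = x.clone().detach()          # rather than x.new_tensor(x)
print(y.requires_grad)          # False
```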
Memory management
- `contiguous(memory_format=torch.contiguous_format)` - returns a contiguous in-memory tensor containing the same data. If already in the specified `memory_format`, returns the original tensor
- `cpu(memory_format=torch.preserve_format)` - copies into CPU memory (or returns the original if it's already there)
- `cuda(device=None, non_blocking=False, memory_format=torch.preserve_format)` - copies into CUDA memory (or returns the original if it's already there)
- `data_ptr()` - returns the address of the first element
- `element_size()` - size of an individual element, in bytes
- `copy_(src, non_blocking=False)` - copies elements from `src` (which may be of a different dtype or device) into the tensor, and returns it. The `non_blocking` flag only applies to CPU-GPU transfers
- `get_device()` - gives the device ordinal of the GPU for CUDA tensors, or throws an error for CPU tensors
- `pin_memory()` - copies to pinned memory, if not already pinned
- `requires_grad_(requires_grad=True)` - changes whether autograd should record operations on this tensor: sets the `requires_grad` attribute in-place and returns the tensor
- `retain_grad()` - enables the tensor to have its `grad` populated during `backward()` (a no-op for leaf tensors)
- `set_(source=None, storage_offset=0, size=None, stride=None)` - sets the underlying storage (`source` and its offset), `size` and `stride`. If `source` is a tensor, share the same storage and match its size and strides, such that changes will be reflected between the two
- `share_memory_()` - moves the underlying storage to shared memory (a no-op if already in shared memory or for CUDA tensors); tensors in shared memory cannot be resized
- `storage()` - returns the underlying storage
- `storage_offset()` - returns the tensor's offset in the underlying storage (in units of storage elements, not bytes)
- `storage_type()` - returns the underlying storage's type
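A quick sketch of contiguity and storage introspection:

```python
import torch

x = torch.arange(6).view(2, 3)
t = x.t()                       # a non-contiguous view
print(t.is_contiguous())        # False
c = t.contiguous()              # copies into a new, contiguous layout

print(x.storage_offset(), x.element_size())    # 0 8 (int64 elements)
print(x.data_ptr() == x.view(6).data_ptr())    # True: views share storage
```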
Type conversion
- `type(dtype=None)` - returns the dtype if no `dtype` is passed, or else casts to the given type
- `type_as(tensor)` - casts to the type of the given `tensor` (a no-op if already of that type), equivalent to `t.type(tensor.type())`
- `tolist()` - converts to a list just like NumPy's `ndarray.tolist()` method, with all the items within the tensor becoming base Python types
- `as_subclass(cls)` - makes a `cls` instance with the same data pointer (`cls` must be a subclass of `Tensor`)
- `numpy()` - returns the tensor as a NumPy `ndarray`, sharing the same underlying storage (thus changes to one will be reflected in the other)
- `deg2rad()` - converts from angles in degrees to radians
- `rad2deg()` - converts from angles in radians to degrees
- `to(dtype, non_blocking=False, copy=False, memory_format=torch.preserve_format)` / `to(device=None, dtype=None, non_blocking=False, copy=False, memory_format=torch.preserve_format)` - performs dtype and/or device conversion (inferred from args/kwargs), copying if different to the original's
- `to_mkldnn()` - copies as `torch.mkldnn` layout
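For example (the CUDA branch is guarded since a GPU may not be present):

```python
import torch

x = torch.ones(3, dtype=torch.int32)

y = x.to(torch.float64)         # dtype conversion
n = y.numpy()                   # shares storage with y
n[0] = 5.0
print(y[0])                     # tensor(5., dtype=torch.float64)

# Device and dtype in one call
if torch.cuda.is_available():
    z = x.to("cuda", torch.float16)
```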
dtype conversion
The following methods on a tensor convert to a particular dtype:
- `bool()` - convert to booleans via `.to()` (typically used for device movement but also does dtype conversion)
- `float()` - convert to `torch.float32` (32-bit floating point)
- `int()` - convert to `torch.int32` (32-bit integer)
- `short()` - convert to `torch.int16` (16-bit integer)
- `long()` - convert to `torch.int64` (64-bit integer)
- `half()` - convert to `torch.float16` (16-bit floating point)
- `double()` - convert to `torch.float64` (64-bit floating point)
- `bfloat16()` - convert to `torch.bfloat16` (Brain 16-bit floating point); see "What Is Bfloat16 Arithmetic?" by Nick Higham.
  bfloat16 "allocates 8 bits for the significand and 8 bits for the exponent (the same exponent size as fp32), c.f. fp16's 11 for the significand but only 5 for the exponent", as NNs are "far more sensitive to the size of the exponent" (Wang and Kanwar, 2019). Google TPUs and NVIDIA A100s support it.
- `byte()` - convert to `torch.uint8` (8-bit unsigned integer)
Remember
short and long are both integer types, analogous to the half and double float types
See also:
- What every user should know about mixed precision training in PyTorch
- Automatic mixed precision for faster training on NVIDIA GPUs
- `torch.amp` - Automatic Mixed Precision package
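A quick look at these casts, plus the bfloat16/fp16 trade-off quoted above via `torch.finfo`:

```python
import torch

x = torch.randn(4)
print(x.half().dtype, x.bfloat16().dtype, x.long().dtype)
# torch.float16 torch.bfloat16 torch.int64

# bfloat16 keeps fp32's exponent range; fp16 keeps more significand bits
print(torch.finfo(torch.bfloat16).max)   # ~3.39e38
print(torch.finfo(torch.float16).max)    # 65504.0
```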
Arithmetic
Simple arithmetic
- `sign()` - signs of the elements
- `abs()` / `absolute()` - absolute (non-negative) values of the elements
- `add(other, *, alpha=1)` - add, which takes a keyword `alpha` for a scalar multiplier
- `sub(other, *, alpha=1)` / `subtract(other, *, alpha=1)` - subtract, which takes a keyword `alpha` for a scalar multiplier
- `mul()` / `multiply()` - scalar multiply
- `div(value, *, rounding_mode=None)` / `divide(value, *, rounding_mode=None)` - division
- `true_divide(value)` - alias for `t.div(rounding_mode=None)`
- `dot()` - dot product/inner product
- `exp()` - exponential of each element (base e)
- `exp2()` - exponential of each element (base 2)
- `expm1()` - exponential of each element (base e), minus 1
- `frac()` - get the fractional part (after the decimal point) of each float
- `gcd()` - get the greatest common divisor of each pair of integers
- `log()` - natural logarithm
- `log10()` - logarithm base 10
- `log1p()` - natural logarithm of `(1 + input)`
- `log2()` - logarithm base 2
- `matmul(tensor2)` - matrix multiplication, broadcasts inputs (usually use `@` instead)
- `mm(mat2)` - matrix multiplication, does not broadcast
- `mv(vec)` - matrix-vector product, does not broadcast
- `mean(dim=None, keepdim=False, *, dtype=None)` - mean of the elements, `dim` can be a tuple
- `median(dim=None, keepdim=False)` - median, `dim` can be an integer else the last dimension is used. Not unique for input tensors with an even number of elements in `dim`: in this case the lower of the two medians is returned. To compute the mean of both medians, use `quantile(q=0.5)` instead. `indices` does not necessarily contain the first occurrence of each median value (unless it is unique); results can vary across devices, and likewise do not expect the gradients to be deterministic
- `mode(dim=None, keepdim=False)` - mode, can take `dim` otherwise assumes the last dimension
- `reciprocal()` - reciprocal of the elements
- `sqrt()` - square root of the elements
- `square()` - square of the elements
- `sum(dim=None, keepdim=False, dtype=None)` - sum of the elements
- `diff(n=1, dim=-1, prepend=None, append=None)` - n-th forward difference in the given dimension (default: last `dim`)
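A few of these in action:

```python
import torch

a = torch.tensor([1., 2., 3.])
b = torch.tensor([10., 20., 30.])

print(a.add(b, alpha=2))         # a + 2*b -> tensor([21., 42., 63.])
print(a.dot(b))                  # 1*10 + 2*20 + 3*30 = tensor(140.)
print(a.mean(), a.sum())         # tensor(2.) tensor(6.)

m = torch.ones(2, 3)
print((m @ b.view(3, 1)).shape)  # matmul via @: torch.Size([2, 1])
```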
Matrix arithmetic
- `addbmm(batch1, batch2, *, beta=1, alpha=1)` - batched matrix-matrix product with a reduced add step, accumulating all matmuls along the first dimension
- `addcdiv(tensor1, tensor2, *, value=1)` / `addcmul(tensor1, tensor2, *, value=1)` - divides/multiplies `tensor1` by `tensor2` element-wise, multiplies the result by a scalar `value` and adds it to the input
- `addmm(mat1, mat2, *, beta=1, alpha=1)` - matrix multiplication of `mat1` by `mat2`, added to the input (`alpha` scales the matmul product and `beta` scales the added input matrix)
- `addmv(mat, vec, *, beta=1, alpha=1)` - matrix-vector product of `mat` and `vec`, added to the input (`alpha` scales the product and `beta` scales the added input)
- `addr(vec1, vec2, *, beta=1, alpha=1)` - outer product of the vectors `vec1` and `vec2`, added to the input (`alpha` scales the outer product and `beta` scales the added input matrix)
- `baddbmm(batch1, batch2, *, beta=1, alpha=1)` - batched matrix-matrix product, added to the input (`alpha` scales the matrix-matrix product and `beta` scales the added input matrix)
- `bmm(batch2)` - batched matrix-matrix product of the matrices in the source tensor and `batch2`, which both must be 3D and contain the same number of matrices
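A small sketch of the fused `add*`/batched forms:

```python
import torch

M = torch.zeros(2, 2)
a = torch.ones(2, 3)
b = torch.ones(3, 2)
print(M.addmm(a, b, beta=1, alpha=0.5))   # M + 0.5 * (a @ b): all elements 1.5

batch1 = torch.randn(4, 2, 3)
batch2 = torch.randn(4, 3, 5)
print(batch1.bmm(batch2).shape)           # torch.Size([4, 2, 5])
```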
More arithmetic
- `copysign(other)` - creates a new floating-point tensor with the same magnitude but the sign of `other`, element-wise
- `cross(other, dim=None)` - vector cross product in dimension `dim`. Warning: possible unexpected behaviour: if `dim` is not given, it defaults to the first dimension found with size 3
- `cumprod(dim, dtype=None)` - cumulative product of elements in the dimension `dim`
- `cumsum(dim, dtype=None)` - cumulative sum of elements in the dimension `dim`
- `floor_divide(value)` - deprecated; to actually perform floor division, use `.div(rounding_mode="floor")`
- `true_divide(dividend, divisor, *, out)` - alias for `div(rounding_mode=None)`
- `eq(other)` - element-wise equality; `equal(other)` - `True` if the two tensors have the same size and elements
- `float_power(exponent)` - raises element-wise to the power of `exponent`, in double precision
- `fmod(divisor)` - applies C++'s `std::fmod` element-wise; `a.fmod(b)` is equivalent to `a - a.div(b, rounding_mode="trunc") * b`
- `frexp()` - decompose into mantissa and exponent tensors
- `inner(other)` - dot product for 1D tensors, sum of element-wise product with `other` along their last dimension for multidimensional tensors. Equivalent to `.mul(other)` for scalars, else to `torch.tensordot` with `dims=([-1], [-1])`
- `ge(other)` / `greater_equal(other)` - element-wise 'greater than or equal' check
- `gt()` / `greater()` - element-wise 'greater than' check
- `lt()` / `less()` - element-wise 'less than' check
- `le()` / `less_equal()` - element-wise 'less than or equal' check
- `ne(other)` / `not_equal(other)` - element-wise 'not equal' check
- `neg()` / `negative()` - takes the negative (flips the sign) of the elements
- `remainder(divisor)` - computes the modulus element-wise
- `rsqrt()` - reciprocal of the square root, element-wise
- `pow(exponent)` - raises each element to the power `exponent` for a scalar exponent, or broadcasts for a tensor `exponent`
- `prod(dim=None, keepdim=False, dtype=None)` - the product of all elements in the input tensor (if no `dim` specified, first flattened)
- `sum_to_size(*size)` - sum the tensor to `size`, which must be broadcastable (in other words, sum along any axes that differ from the current tensor `shape`). `sum_to_size` is `expand` backwards: "Just as broadcasting is inserting implicit expands, the autograd engine will insert implicit 'expand backwards' in the form of `sum_to_size`" — Thomas Viehmann
- `lcm(other)` - lowest/least common multiple, element-wise with another integer-dtype tensor
- `ldexp(other)` - multiplies by `2 ** other`, element-wise
- `lerp(end, weight)` - linearly interpolates towards `end` based on a scalar/tensor `weight`
- `xlogy(other)` - computes `input * log(other)` element-wise, similar to SciPy's `scipy.special.xlogy`
- `vdot(other)` - computes the dot product of a 1D tensor with another
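For example, `sum_to_size` as 'expand backwards', plus one of the element-wise helpers:

```python
import torch

x = torch.arange(6.).view(2, 3)
print(x.sum_to_size(1, 3))    # sums over dim 0 -> tensor([[3., 5., 7.]])
print(x.sum_to_size(2, 1))    # sums over dim 1 -> tensor([[ 3.], [12.]])

a = torch.tensor([1., 2.])
print(a.ldexp(torch.tensor([3., 4.])))   # a * 2**other -> tensor([ 8., 32.])
```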
Logical && bitwise operators
- `logical_and(other)` / `logical_not()` / `logical_or(other)` / `logical_xor(other)` - element-wise logical AND (∧), NOT (¬), OR (∨), XOR (⊕), where zeros are treated as `False` and nonzeros as `True`
- `bitwise_and(other)` / `bitwise_not()` / `bitwise_or(other)` / `bitwise_xor(other)` - bitwise AND/NOT/OR/XOR (logical for bool tensors)
- `bitwise_left_shift(other)` / `bitwise_right_shift(other)` - left/right arithmetic shift by `other` bits on an integer tensor
Rounding
- `round(decimals=0)` - round to the closest integer (or to `decimals` decimal places)
- `ceil()` - give the smallest integer greater than or equal to each element
- `floor()` - give the largest integer less than or equal to each element
- `clamp(min, max)` / `clip(min, max)` - constrain the values to the range `[min, max]`
- `clamp_min()` / `clamp_max()` - one-sided (lower/upper bound) `clamp` (`clip`)
- `trunc()` / `fix()` - truncate to the integer part of a float (rounding toward zero, regardless of sign)
Sorting by and querying for extreme values
- `sort(dim=-1, descending=False)` - sort elements along a given dimension (default: last) into order (default: ascending) by value, returning a named tuple `(values, indices)`
- `argsort(dim=-1, descending=False)` - return the indices that sort elements along a given dimension (default: last) into order (default: ascending) by value
- `min()` / `minimum()` - minimum
- `max()` / `maximum()` - maximum
- `topk(k, dim=None, largest=True, sorted=True)` - get the `k` largest elements along a given dimension (default: last)
- `argmin()` / `argmax()` - index of the minimum/maximum value
- `argwhere()` - indices of the non-zero values
- `cummin(dim)` / `cummax(dim)` - values and indices for the cumulative minimum/maximum of elements in the given dimension
- `amin(dim=None, keepdim=False)` / `amax(dim=None, keepdim=False)` / `aminmax(*, dim=None, keepdim=False)` - minimum, maximum, or both for each slice in the dimension `dim`. Differences to `max()`/`min()`:
    - supports reducing on multiple dimensions
    - doesn't return indices
    - evenly distributes the gradient between equal values (whereas `max`/`min` only propagates the gradient to a single index in the source tensor)
- `fmin(other)` / `fmax(other)` - element-wise minimum/maximum (wraps C++'s `std::fmin`, and similar to NumPy's `fmin()`). Handles `NaN` differently to `min()`: if exactly one of the two elements in a comparison is `NaN` then the non-`NaN` element is taken as the minimum (so `NaN` only propagates if both are)
- `msort()` - sorts elements along the first dimension in ascending order by value
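A short example contrasting `sort`/`topk` and the `NaN` handling of `fmin`:

```python
import torch

x = torch.tensor([3., 1., 4., 1., 5.])
values, indices = x.sort()
print(values, indices)      # tensor([1., 1., 3., 4., 5.]) tensor([1, 3, 0, 2, 4])
print(x.topk(2))            # values [5., 4.] at indices [4, 2]

y = torch.tensor([float("nan"), 2.])
print(y.min())                            # tensor(nan): NaN propagates
print(y.fmin(torch.tensor([1., 1.])))     # tensor([1., 1.]): NaN ignored
```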
Repetition
- `expand(*sizes)` - return a view with singleton dimensions expanded, with -1 indicating no change to that dimension
- `expand_as(other)` - expand [any singleton dimensions of] the tensor to the same size as another: equivalent to `expand(other.size())`
- `repeat(*sizes)` - repeat the tensor along the specified dimensions the given number of times (`sizes`), copying its data (similar to NumPy `tile`)
- `repeat_interleave(repeats: Tensor | int, dim=None)` - repeat elements the given number of times [broadcast to fit the axis], along axis `dim`, else by default use the flattened array (similar to NumPy `repeat`)
- `tile(dims)` - construct a tensor by repeating the elements the number of times specified by `dims` (similar to NumPy `tile`)
- `unique(sorted=True, return_inverse=False, return_counts=False, dim=None)` - unique elements without repetition (eliminates non-consecutive duplicate values). Use `torch.unique_consecutive` instead if the input is sorted
- `unique_consecutive(return_inverse=False, return_counts=False, dim=None)` - eliminates duplicates after the first element from every consecutive group of equivalent elements
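A sketch contrasting `expand` (a view) with `repeat`/`repeat_interleave` (copies):

```python
import torch

x = torch.tensor([[1], [2]])         # shape (2, 1)
print(x.expand(2, 3))                # a view, no copy: each value spread over 3 columns
print(x.repeat(2, 3).shape)          # copies data -> torch.Size([4, 3])

y = torch.tensor([1, 2, 2, 3])
print(y.repeat_interleave(2))        # tensor([1, 1, 2, 2, 2, 2, 3, 3])
print(y.unique(return_counts=True))  # (tensor([1, 2, 3]), tensor([1, 2, 1]))
```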
Dimension and sampling
- `all(dim=None, keepdim=False)` / `any(dim=None, keepdim=False)` - tests if all/any elements evaluate to `True` (like NumPy, converts to `bool` for all dtypes except `uint8`)
- `allclose(other, rtol=1e-05, atol=1e-08, equal_nan=False)` - checks if all source and `other` elements satisfy the closeness condition (behaves like NumPy's `allclose`)
- `count_nonzero(dim=None)` - counts non-zero values along the given `dim`, or in the entire tensor if no `dim` is specified
- `where(condition, y)` - returns a tensor of elements selected from either the input or `y` depending on the `condition`
- `permute(*dims)` - view with dimensions permuted (reordered) as `dims`
- `unbind(dim=0)` - removes a tensor dimension, returns a tuple of slices along `dim` without it
- `gather(dim, index)` - gathers values from `index` along an axis `dim`
- `scatter_(dim, index, src)` - writes values from `src` at `index` along an axis `dim`
- `diagonal_scatter(src, offset=0, dim1=0, dim2=1)` - writes values from `src` along the diagonal elements of the input with respect to `dim1` and `dim2`
- `narrow(dim, start, length)` - view along dimension `dim` at position `start` for `length` items
- `take(index)` - make a new tensor from the values at `index`
- `select(dim, index)` - view a slice along the `dim` axis at `index`
- `fill_(value)` - fill the tensor with the specified value, in-place
- `fill_diagonal_(fill_value, wrap=False)` - fills the main diagonal of a multidimensional tensor in-place (for more than 2 dimensions, all dimensions must be of equal length), 'wrapping' after the columns for tall matrices (where there are more rows than columns) if `wrap=True`
- `unfold(dimension, size, step)` - view all slices of the given `size` in the given `dimension`
- `roll(shifts, dims=None)` - shift the tensor along the given dimension(s) `dims`, flattening if no `dims` are specified before restoring the original shape (both `shifts` and `dims` can be an int or a tuple of ints)
- `stride(dim)` - gives the integer jump necessary to go from one element to the next in the specified dimension `dim`, or a tuple of all strides if no `dim` is specified
- `chunk(chunks, dim=0)` - view a tensor in a specific number of chunks along axis `dim`; the last will be smaller if the tensor size is indivisible by `chunks`
- `bincount(weights=None, minlength=0)` - count the frequency of each value in an array of non-negative integers. Can produce non-deterministic gradients: see the docs on randomness and reproducibility
- `dsplit(split_size_or_sections)` - split a tensor with 3 or more dimensions into multiple views, depthwise according to `split_size_or_sections`
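A minimal `gather`/`scatter_` example (indices are along the chosen `dim`):

```python
import torch

x = torch.tensor([[1, 2], [3, 4]])
idx = torch.tensor([[0, 0], [1, 0]])
print(x.gather(1, idx))      # out[i][j] = x[i][idx[i][j]] -> tensor([[1, 1], [4, 3]])

z = torch.zeros(2, 3, dtype=torch.long)
z.scatter_(1, torch.tensor([[2], [0]]), torch.tensor([[9], [8]]))
print(z)                     # tensor([[0, 0, 9], [8, 0, 0]])

print(torch.tensor([0, 1, 1, 3]).bincount())   # tensor([1, 2, 0, 1])
```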
More dimensions/sampling
- `as_strided()` - ...
- `broadcast_to()` - ...
- `histc()` - ...
- `histogram()` - ...
- `take_along_dim()` - ...
- `hsplit()` - ...
- `index_add(dim, index, source, *, alpha=1)` - adds the elements of `alpha` times `source` to the input, at the indices given in `index`
- `index_copy(dim, index, tensor2)` - copies the elements of `tensor2` into the source tensor by selecting the indices in the order given in `index`
- `index_fill(dim, index, value)` - fills the elements of the source tensor with `value` by selecting the indices in the order given in `index`
- `index_put(indices, values, accumulate=False)` - puts `values` into the source tensor using the indices specified in `indices` (a tuple of tensors). The in-place version `tensor.index_put_(indices, values)` is equivalent to the indexed assignment `tensor[indices] = values`
- `index_reduce(dim, index, source, reduce, *, include_self=True)` - reduces the source tensor by selecting the indices in the order given in `index`, where the `reduce` argument is one of "prod", "mean", "amax", "amin"
- `index_select(dim, index)` - selects the indices of the source tensor in the order given in `index`
- `kthvalue()` - ...
- `masked_fill()` - ...
- `masked_scatter()` - ...
- `masked_select()` - ...
- `moveaxis()` - ...
- `movedim()` - ...
- `multinomial()` - ...
- `nextafter()` - ...
- `put_()` - ...
- `ravel()` - ...
- `split()` - ...
- `tensor_split()` - ...
- `var()` - ...
- `vsplit()` - ...
- `scatter_add()` - ...
- `scatter_reduce()` - ...
- `select_scatter()` - ...
- `slice_scatter()` - ...
- `swapaxes()` - ...
- `swapdims()` - ...
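For instance, the out-of-place `index_add`/`index_select` forms:

```python
import torch

x = torch.zeros(3, 3)
idx = torch.tensor([0, 2])
src = torch.ones(2, 3)

print(x.index_add(0, idx, src, alpha=2))   # rows 0 and 2 become 2s, row 1 stays 0
print(x.index_select(0, idx).shape)        # torch.Size([2, 3])
```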
Shape change
- `reshape(*shape)` - return a tensor with the same data and number of elements but the specified `shape`, as a view if compatible with the current shape
- `reshape_as(other)` - returns the tensor in the same shape as `other`, equivalent to `reshape(other.sizes())`, as a view if compatible with the current shape
- `resize_(*sizes, memory_format=torch.contiguous_format)` - resizes the tensor to the specified size, resizing the underlying storage if larger than the current storage size. A low-level method: prefer `reshape()`
- `resize_as_(tensor, memory_format=torch.contiguous_format)` - resizes the tensor to be the same size as the specified `tensor`, equivalent to `resize_(tensor.size())`
- `transpose(dim0, dim1)` - return a tensor with the same data but dimensions `dim0` and `dim1` swapped
- `flatten(start_dim=0, end_dim=-1)` - flattens a contiguous range of dimensions in a tensor
- `unflatten(dim, sizes)` - expands the dimension `dim` over multiple dimensions of sizes given by `sizes`
- `squeeze()` - remove all dimensions of size 1 ("singleton dimensions")
- `unsqueeze()` - view the tensor with a singleton dimension inserted at the specified position
- `flip(dims)` - reverse the order of an n-dimensional tensor along the given axes `dims`
- `fliplr()` - flip the entries in each row left/right, equivalent to `input[:, ::-1]` (must be at least 2D)
- `flipud()` - flip the entries in each column up/down, equivalent to `input[::-1, ...]` (must be at least 1D)
- `tril(diagonal=0)` - get the lower triangular part of the matrix, or batches, `diagonal` diagonals above/below the main diagonal
- `triu(diagonal=0)` - get the upper triangular part of the matrix, or batches, `diagonal` diagonals above/below the main diagonal
- `rot90(k, dims)` - rotate an n-dimensional tensor by 90° in the `dims` plane, `k` times from the first dimension towards the second if `k > 0`, or vice versa if `k < 0`
- `diag(diagonal=0)` - turns a 1D vector into a 2D diagonal matrix, or vice versa (main diagonal by default, or above/below as specified by `diagonal`)
- `diagflat(offset=0)` - puts a 1D vector along the diagonal of a 2D matrix, flattening if multidimensional
- `diagonal(offset=0, dim1=0, dim2=1)` - partial view with diagonal elements in dimensions `dim1` and `dim2` as a new final dimension (i.e. a filled tensor made from diagonal elements, not a diagonal matrix)
- `diag_embed(offset=0, dim1=-2, dim2=-1)` - create a tensor whose diagonals of certain 2D planes are filled by the input (by default: the planes of the last 2 dimensions of the input) [and zero off the diagonals]
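A few shape changes side by side:

```python
import torch

x = torch.arange(12)
m = x.reshape(3, 4)             # a view when compatible with the current layout
print(m.transpose(0, 1).shape)  # torch.Size([4, 3])
print(m.flatten().shape)        # torch.Size([12])

u = torch.zeros(1, 3, 1)
print(u.squeeze().shape)        # torch.Size([3])
print(u.squeeze().unsqueeze(0).shape)   # torch.Size([1, 3])

print(torch.ones(3, 3).tril())  # lower triangle kept, upper zeroed
```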
Linear algebra
- `cov(correction=1, fweights=None, aweights=None)` - estimate the covariance matrix
- `lstsq(A)` - compute a solution to least squares
- `outer()` / `ger()` - outer product
- `dist(other, p=2)` - the p-norm of `(input - other)`
- `inverse()` - inverse of a square matrix, or batches
- `det()` - determinant of a square matrix, or batches
- `logdet()` - log-determinant of a square matrix, or batches
- `cholesky()` - Cholesky factorise a symmetric positive-definite matrix, or batches
- `lu()` - LU factorise a matrix, or batches
- `qr()` - QR factorise a matrix, or batches
- `renorm(p, dim, maxnorm)` - calculate a tensor where each sub-tensor of the input along axis `dim` is normalised such that the p-norm of the sub-tensor is lower than `maxnorm`
- `svd()` - singular value decomposition of a real matrix, or batches
- `trace()` - sum the diagonal elements of a 2D matrix
- `kron()` - Kronecker product
- `adjoint()` - view conjugated and with the last two dimensions transposed
Be careful when using mixed precision training
Operations from torch.linalg can be sensitive to [im]precision (see AMP best practices)
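A small example (using the `torch.linalg` namespace mentioned in the warning above for the SVD):

```python
import torch

A = torch.tensor([[2., 0.], [0., 3.]])
print(A.det(), A.trace())       # tensor(6.) tensor(5.)
print(A.inverse())              # tensor([[0.5000, 0.0000], [0.0000, 0.3333]])

U, S, Vh = torch.linalg.svd(A)
print(S)                        # tensor([3., 2.])
```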
More linear algebra
- `cholesky_inverse()` - ...
- `cholesky_solve()` - ...
- `corrcoef()` - ...
- `eig()` - ...
- `geqrf()` - ...
- `lu_solve()` - ...
- `matrix_exp()` - ...
- `matrix_power()` - ...
- `norm()` - ...
- `orgqr()` - ...
- `ormqr()` - ...
- `pinverse()` - ...
- `symeig()` - ...
- `std()` - ...
- `slogdet()` - ...
- `triangular_solve()` - ...
Missing values
- `nan_to_num(nan=0.0, posinf=None, neginf=None)` - replaces `NaN` with `nan` (default: zero), and infinite values with `posinf` and `neginf` (default: the greatest/least finite value representable by the dtype)
- `nanmean(dim=None, keepdim=False, *, dtype=None)` - computes the mean of all non-`NaN` elements along the dimension(s) `dim`, equivalent to `t[~t.isnan()].mean(...)`
- `nanmedian(dim=None, keepdim=False)` - computes the median of all non-`NaN` elements along the dimension(s) `dim`, equivalent to `t[~t.isnan()].median(...)`
- `nanquantile(q, dim=None, keepdim=False, *, interpolation='linear')` - computes the quantiles of all non-`NaN` elements along the dimension(s) `dim`, equivalent to `t[~t.isnan()].quantile(...)`
- `nansum(dim=None, keepdim=False, dtype=None)` - computes the sum of all non-`NaN` elements along the dimension(s) `dim`, equivalent to `t[~t.isnan()].sum(...)`
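For example:

```python
import torch

x = torch.tensor([1., float("nan"), 3., float("inf")])
print(x.nan_to_num())                    # NaN -> 0.0, inf -> float32 max (~3.4028e38)
print(x[:3].nansum(), x[:3].nanmean())   # tensor(4.) tensor(2.)
```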
Checks
(see also: arithmetic checks, ne etc.)
Distributions
- `bernoulli_(p=0.5, generator=None)` - fills each location with an independent sample from $\text{Bernoulli}(p)$
- `cauchy_(median=0, sigma=1, generator=None)` - fills each location with an independent sample from the Cauchy distribution
- `log_normal_(mean=1, std=2, generator=None)` - fills each location with samples from the log-normal distribution whose underlying normal distribution is parameterised by $\mu$ = `mean` and $\sigma$ = `std`
- `normal_(mean=0, std=1, generator=None)` - fills each location with samples from the normal distribution parameterised by $\mu$ = `mean` and $\sigma$ = `std`
- `uniform_(from=0, to=1)` - fills each location with samples from the continuous uniform distribution $U(\text{from}, \text{to})$
- `exponential_(lambd=1, *, generator=None)` - fills each location with samples from the exponential distribution, $f(x) = \lambda e^{-\lambda x}$
- `geometric_(p, *, generator=None)` - fills each location with samples from the geometric distribution, $P(X = k) = (1 - p)^{k - 1} p$
- `random_(from=0, to=None, *, generator=None)` - fills each location with samples from the discrete uniform distribution over `[from, to - 1]`, else bounded by the data type if not specified (for floating point types, the range will be `[0, 2^mantissa]` to ensure every value is representable)
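These samplers fill in-place, e.g.:

```python
import torch

torch.manual_seed(0)
x = torch.empty(3)
print(x.uniform_(0, 1))          # in-place fill from U(0, 1)
print(x.normal_(mean=0, std=1))  # refilled from N(0, 1)
print(x.bernoulli_(p=0.25))      # each element independently 0. or 1.
```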
Machine learning
Main ML
- `logit()` - logit (a.k.a. log-odds), the natural logarithm of $\frac{p}{1 - p}$ (for the input conventionally written as $p$); will give `NaN` outside the range $[0, 1]$
- `relu()` - rectified linear unit function, $\max(0, x)$
- `softmax(dim=None)` - rescales/normalises so elements lie in the range $[0, 1]$ and sum to 1, optionally along dimension `dim`
- `log_softmax()` - the logarithm of the softmax function, optionally along dimension `dim`
- `sigmoid()` - applies the function $\sigma(x) = \frac{1}{1 + e^{-x}}$
- `heaviside()` - applies the Heaviside step function, defined as 1 above 0 and 0 at and below 0
- `hardshrink(lambd=0.5)` - leaves alone elements whose absolute value exceeds `lambd`, and zeros anything in the range $[-\lambda, \lambda]$
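For example:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
print(x.softmax(dim=0))                # tensor([0.0900, 0.2447, 0.6652]), sums to 1
print(x.sigmoid())                     # tensor([0.7311, 0.8808, 0.9526])
print(torch.tensor([0.5]).logit())     # ln(0.5 / 0.5) = tensor([0.])
print(torch.tensor([-1., 2.]).relu())  # tensor([0., 2.])
```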
Some more main ML
- `logcumsumexp()` - ...
- `logsumexp()` - ...
- `logaddexp()` - ...
- `logaddexp2()` - ...
- `quantile()` - ...
Quantisation
Quantisation refers to techniques for computation and memory access with lower precision data,
typically int8 as compared to floating point, which introduces approximations that can lead to an
accuracy gap (which these techniques attempt to minimise).
- `dequantize()` - ...
- `int_repr()` - ...
- `q_per_channel_axis()` - ...
- `q_per_channel_scales()` - ...
- `q_per_channel_zero_points()` - ...
- `q_scale()` - ...
- `q_zero_point()` - ...
- `qscheme()` - ...
Dimension naming
- `align_to()` - permute dimensions to the given order, adding size-one dimensions for any new names (returns a view)
- `align_as()` - permute dimensions to the order of the other tensor, adding size-one dimensions for any new names (returns a view)
- `refine_names()` - 'lift' unnamed dimensions using the given list of names
- `rename()` - rename dimension names using a list of `*names` or a mapping `**rename_map` (returns a view)
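Named tensors are a prototype feature, so the details below may vary by PyTorch version; a small sketch:

```python
import torch

# Named tensors are a prototype feature; exact behaviour may vary by version
x = torch.zeros(2, 3, names=("batch", "channel"))
print(x.names)                                # ('batch', 'channel')
print(x.rename(channel="C").names)            # ('batch', 'C')
print(x.align_to("channel", "batch").shape)   # torch.Size([3, 2])
```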
Special functions
- `i0()` - zeroth-order modified Bessel function of the first kind, element-wise
Trigonometric functions
- `angle()` - element-wise angle (in radians)
- `hypot(other)` - hypotenuse given the legs of a right-angled triangle
- `sin()` / `cos()` / `tan()` - sine/cosine/tangent
- `asin()` / `arcsin()`, `acos()` / `arccos()`, `atan()` / `arctan()` - inverse sine/cosine/tangent (a.k.a. arcsine/arccosine/arctangent)
- `asinh()` / `arcsinh()`, `acosh()` / `arccosh()`, `atanh()` / `arctanh()` - inverse hyperbolic sine/cosine/tangent. The domain of `atanh` is $(-1, 1)$: the limits $\pm 1$ map to $\pm\infty$, and values outside this interval map to `NaN`
- `atan2(other)` / `arctan2(other)` - arctangent with consideration of the quadrant, following the $(y, x)$ order convention (i.e. the source tensor is $y$, `other` is $x$)
- `sinh()` / `cosh()` / `tanh()` - hyperbolic sine/cosine/tangent
- `sinc()` - normalised sinc, $\frac{\sin(\pi x)}{\pi x}$
Error functions
- `erf()` - ...
- `erfc()` - ...
- `erfinv()` - ...
Gamma functions
- `digamma()` - ...
- `igamma()` - ...
- `igammac()` - ...
- `lgamma()` - ...
- `polygamma()` - ...
- `mvlgamma()` - ...
STFT (Fourier) functions
- `stft()` - ...
- `istft()` - ...
Special data types
Sparse tensors
- `coalesce()` - ...
- `dense_dim()` - ...
- `values()` - returns the values tensor of a sparse COO tensor
- `indices()` - ...
- `narrow_copy()` - ...
- `smm()` - ...
- `sparse_dim()` - ...
- `sparse_mask()` - ...
- `sparse_resize_()` - ...
- `sparse_resize_and_clear_()` - ...
- `sspaddmm()` - ...
- `to_dense()` - ...
- `to_sparse()` - ...
Complex numbers
- `conj()` - ...
- `conj_physical()` - ...
- `resolve_conj()` - ...
- `resolve_neg()` - ...
- `sgn()` - ...