
torchvision.ops

torchvision.ops implements operators that are specific to computer vision.

Note

These operators do not currently support TorchScript.

torchvision.ops.nms(boxes, scores, iou_threshold)

Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).

NMS iteratively removes lower-scoring boxes that have an IoU greater than iou_threshold with another (higher-scoring) box.

If multiple boxes have the exact same score and satisfy the IoU criterion with respect to a reference box, the selected box is not guaranteed to be the same between CPU and GPU. This is similar to the behavior of argsort in PyTorch when repeated values are present.

Parameters
  • boxes (Tensor[N, 4]) – boxes to perform NMS on. They are expected to be in (x1, y1, x2, y2) format

  • scores (Tensor[N]) – scores for each one of the boxes

  • iou_threshold (float) – discards all overlapping boxes with IoU > iou_threshold

Returns

keep – int64 tensor with the indices of the elements that have been kept by NMS, sorted in decreasing order of scores

Return type

Tensor
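
As an illustration of the procedure described above, the following is a minimal pure-Python sketch of greedy NMS. It is not torchvision's implementation: the helper names iou and nms_sketch are hypothetical, boxes are plain tuples rather than tensors, and ties between equal scores are broken by input order, which is an arbitrary choice of this sketch.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_sketch(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float) -> List[int]:
    """Return indices of kept boxes, sorted by decreasing score."""
    # Visit boxes from highest to lowest score (stable for equal scores).
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
# Boxes 0 and 1 overlap with IoU = 81 / 119 (about 0.68), so box 1 is dropped.
print(nms_sketch(boxes, scores, iou_threshold=0.5))  # [0, 2]
```

Raising iou_threshold above the IoU of the overlapping pair (for example to 0.7) keeps all three boxes in this sketch.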

torchvision.ops.roi_align(input, boxes, output_size, spatial_scale=1.0, sampling_ratio=-1, aligned=False)

Performs the Region of Interest (RoI) Align operator described in Mask R-CNN.

Parameters
  • input (Tensor[N, C, H, W]) – input tensor

  • boxes (Tensor[K, 5] or List[Tensor[L, 4]]) – the box coordinates in (x1, y1, x2, y2) format where the regions will be taken from. If a single Tensor is passed, then the first column should contain the batch index. If a list of Tensors is passed, then each Tensor will correspond to the boxes for an element i in a batch

  • output_size (int or Tuple[int, int]) – the size of the output after the cropping is performed, as (height, width)

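  • spatial_scale (float) – a scaling factor that maps the box coordinates to the input coordinates. For example, if the boxes are defined on a 224x224 image and the input is a 112x112 feature map, use 0.5. Default: 1.0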

  • sampling_ratio (int) – number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio grid points are used. If <= 0, then an adaptive number of grid points is used (computed as ceil(roi_width / pooled_w), and likewise for height). Default: -1

  • aligned (bool) – If False, use the legacy implementation. If True, shift the box coordinates by -0.5 pixels so that they align better with the two neighboring pixel indices. This is the version used in Detectron2

Returns

output (Tensor[K, C, output_size[0], output_size[1]])
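
To make the roles of sampling_ratio and aligned concrete, here is a simplified pure-Python sketch of RoI Align for a single box on a single-channel feature map stored as nested lists. It is an illustration under stated assumptions, not torchvision's implementation: the names bilinear and roi_align_2d are hypothetical, sample points outside the map are clamped to the border, and the -0.5 shift is assumed to be applied after scaling by spatial_scale.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def bilinear(feature: Grid, y: float, x: float) -> float:
    """Bilinearly interpolate feature at (y, x), clamping to the border."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    top = (1 - lx) * feature[y0][x0] + lx * feature[y0][x1]
    bottom = (1 - lx) * feature[y1][x0] + lx * feature[y1][x1]
    return (1 - ly) * top + ly * bottom


def roi_align_2d(feature: Grid, box: Sequence[float],
                 output_size: Tuple[int, int], spatial_scale: float = 1.0,
                 sampling_ratio: int = -1, aligned: bool = False) -> Grid:
    out_h, out_w = output_size
    # Assumption of this sketch: shift by -0.5 after scaling when aligned.
    offset = 0.5 if aligned else 0.0
    x1, y1, x2, y2 = (c * spatial_scale - offset for c in box)
    roi_w, roi_h = x2 - x1, y2 - y1
    bin_h, bin_w = roi_h / out_h, roi_w / out_w
    # Fixed grid if sampling_ratio > 0, otherwise an adaptive one.
    grid_h = sampling_ratio if sampling_ratio > 0 else max(1, math.ceil(roi_h / out_h))
    grid_w = sampling_ratio if sampling_ratio > 0 else max(1, math.ceil(roi_w / out_w))
    out: Grid = []
    for ph in range(out_h):
        row = []
        for pw in range(out_w):
            total = 0.0
            for iy in range(grid_h):
                # Sample at the centre of each sub-cell of the bin.
                y = y1 + ph * bin_h + (iy + 0.5) * bin_h / grid_h
                for ix in range(grid_w):
                    x = x1 + pw * bin_w + (ix + 0.5) * bin_w / grid_w
                    total += bilinear(feature, y, x)
            row.append(total / (grid_h * grid_w))  # average of the samples
        out.append(row)
    return out


# A 4x4 map whose value equals the column index.
ramp = [[float(x) for x in range(4)] for _ in range(4)]
print(roi_align_2d(ramp, (0, 0, 4, 4), (2, 2), sampling_ratio=2, aligned=True))
# [[0.5, 2.5], [0.5, 2.5]]
```

With aligned=True the sample points in this example land exactly on pixel indices 0, 1, 2 and 3, so each output bin is the mean of two neighboring columns.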

torchvision.ops.roi_pool(input, boxes, output_size, spatial_scale=1.0)

Performs the Region of Interest (RoI) Pool operator described in Fast R-CNN.

Parameters
  • input (Tensor[N, C, H, W]) – input tensor

  • boxes (Tensor[K, 5] or List[Tensor[L, 4]]) – the box coordinates in (x1, y1, x2, y2) format where the regions will be taken from. If a single Tensor is passed, then the first column should contain the batch index. If a list of Tensors is passed, then each Tensor will correspond to the boxes for an element i in a batch

  • output_size (int or Tuple[int, int]) – the size of the output after the cropping is performed, as (height, width)

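  • spatial_scale (float) – a scaling factor that maps the box coordinates to the input coordinates. For example, if the boxes are defined on a 224x224 image and the input is a 112x112 feature map, use 0.5. Default: 1.0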

Returns

output (Tensor[K, C, output_size[0], output_size[1]])
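
For comparison with RoI Align, the following pure-Python sketch shows RoI max pooling for a single box on a single-channel feature map stored as nested lists. It is a simplified illustration, not torchvision's implementation: the name roi_pool_2d is hypothetical, and the rounding of scaled coordinates and the inclusive right and bottom box edges follow the Fast R-CNN convention as an assumption of this sketch.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def _round_half_up(v: float) -> int:
    # Python's round() rounds halves to even, so round half up explicitly.
    return int(math.floor(v + 0.5))


def roi_pool_2d(feature: Grid, box: Sequence[float],
                output_size: Tuple[int, int],
                spatial_scale: float = 1.0) -> Grid:
    h, w = len(feature), len(feature[0])
    out_h, out_w = output_size
    # Quantize the scaled box to integer cell indices.
    x1, y1, x2, y2 = (_round_half_up(c * spatial_scale) for c in box)
    # Assumption: the end coordinates are inclusive, so add 1 to the extent.
    roi_w = max(x2 - x1 + 1, 1)
    roi_h = max(y2 - y1 + 1, 1)
    bin_h, bin_w = roi_h / out_h, roi_w / out_w
    out: Grid = []
    for ph in range(out_h):
        # Integer row range of this bin, clipped to the feature map.
        hs = min(max(y1 + math.floor(ph * bin_h), 0), h)
        he = min(max(y1 + math.ceil((ph + 1) * bin_h), 0), h)
        row = []
        for pw in range(out_w):
            ws = min(max(x1 + math.floor(pw * bin_w), 0), w)
            we = min(max(x1 + math.ceil((pw + 1) * bin_w), 0), w)
            cells = [feature[y][x] for y in range(hs, he) for x in range(ws, we)]
            # An empty bin yields 0; otherwise take the maximum.
            row.append(max(cells) if cells else 0.0)
        out.append(row)
    return out


# A 4x4 map with values 0..15 in row-major order.
fmap = [[float(4 * y + x) for x in range(4)] for y in range(4)]
print(roi_pool_2d(fmap, (0, 0, 3, 3), (2, 2)))
# [[5.0, 7.0], [13.0, 15.0]]
```

Unlike the RoI Align sketch, the bin boundaries here are snapped to integer cells and each output is a maximum rather than an average of interpolated samples.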

class torchvision.ops.RoIAlign(output_size, spatial_scale, sampling_ratio, aligned=False)

See roi_align

class torchvision.ops.RoIPool(output_size, spatial_scale)

See roi_pool
