micromind.networks package

Submodules

micromind.networks.phinet module

Code for PhiNets (https://doi.org/10.1145/3510832).

Authors:
  • Francesco Paissan, 2023

  • Alberto Ancilotto, 2023

  • Matteo Beltrami, 2023

  • Matteo Tremonti, 2023

class micromind.networks.phinet.DepthwiseConv2d(in_channels, depth_multiplier=1, kernel_size=3, stride=1, padding=0, dilation=1, bias=False, padding_mode='zeros')[source]

Bases: Conv2d

Depthwise 2D convolution layer.

Parameters:
  • in_channels (int) – Number of input channels.

  • depth_multiplier (int, optional) – The channel multiplier for the output channels (default is 1).

  • kernel_size (int or tuple, optional) – Size of the convolution kernel (default is 3).

  • stride (int or tuple, optional) – Stride of the convolution (default is 1).

  • padding (int or tuple, optional) – Zero-padding added to both sides of the input (default is 0).

  • dilation (int or tuple, optional) – Spacing between kernel elements (default is 1).

  • bias (bool, optional) – If True, adds a learnable bias to the output (default is False).

  • padding_mode (str, optional) – ‘zeros’ or ‘circular’. Padding mode for convolution (default is ‘zeros’).
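
Example

A minimal usage sketch (assuming, per the parameter description above, that the output channel count equals in_channels * depth_multiplier):

>>> import torch
>>> from micromind.networks.phinet import DepthwiseConv2d
>>> layer = DepthwiseConv2d(in_channels=32, depth_multiplier=2, kernel_size=3, padding=1)
>>> layer(torch.randn(1, 32, 64, 64)).shape
torch.Size([1, 64, 64, 64])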

class micromind.networks.phinet.PhiNet(input_shape: List[int], num_layers: int = 7, alpha: float = 0.2, beta: float = 1.0, t_zero: float = 6, include_top: bool = False, num_classes: int = 10, compatibility: bool = False, downsampling_layers: List[int] = [5, 7], conv5_percent: float = 0.0, first_conv_stride: int = 2, residuals: bool = True, conv2d_input: bool = False, pool: bool = False, h_swish: bool = True, squeeze_excite: bool = True, divisor: int = 1, return_layers=None)[source]

Bases: Module

This class implements the PhiNet architecture.

Parameters:
  • input_shape (tuple) – Input resolution as (C, H, W).

  • num_layers (int) – Number of convolutional blocks.

  • alpha (float) – Width multiplier for PhiNet architecture.

  • beta (float) – Shape factor of PhiNet.

  • t_zero (float) – Base expansion factor for PhiNet.

  • include_top (bool) – Whether to include classification head or not.

  • num_classes (int) – Number of classes for the classification head.

  • compatibility (bool) – True to maximise compatibility among embedded platforms (changes network).

forward(x)[source]

Executes the PhiNet network.

Parameters:

x (torch.Tensor) – Network input.

Returns:

Logits if include_top=True, otherwise embeddings.

Return type:

torch.Tensor
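
Example

A forward-pass sketch (the logits shape assumes include_top=True with num_classes=10):

>>> import torch
>>> from micromind.networks import PhiNet
>>> model = PhiNet((3, 224, 224), include_top=True, num_classes=10)
>>> model(torch.randn(1, 3, 224, 224)).shape
torch.Size([1, 10])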

get_MAC()[source]

Returns number of MACs for this architecture.

Returns:

Number of MACs for this network.

Return type:

int

Example

>>> from micromind.networks import PhiNet
>>> model = PhiNet((3, 224, 224))
>>> model.get_MAC()
9817670
get_complexity()[source]

Returns MAC and number of parameters of initialized architecture.

Returns:

Dictionary with complexity characterization of the network.

Return type:

dict

Example

>>> from micromind.networks import PhiNet
>>> model = PhiNet((3, 224, 224))
>>> model.get_complexity()
{'MAC': 9817670, 'params': 30917}
get_params()[source]

Returns number of params for this architecture.

Returns:

Number of parameters for this network.

Return type:

int

Example

>>> from micromind.networks import PhiNet
>>> model = PhiNet((3, 224, 224))
>>> model.get_params()
30917
class micromind.networks.phinet.PhiNetConvBlock(in_shape, expansion, stride, filters, has_se, block_id=None, res=True, h_swish=True, k_size=3, dp_rate=0.05, divisor=1)[source]

Bases: Module

Implements PhiNet’s convolutional block.

Parameters:
  • in_shape (tuple) – Input shape of the conv block.

  • expansion (float) – Expansion coefficient for this convolutional block.

  • stride (int) – Stride for the conv block.

  • filters (int) – Output channels of the convolutional block.

  • block_id (int) – ID of the convolutional block.

  • has_se (bool) – Whether to use Squeeze-and-Excite or not.

  • res (bool) – Whether to use the residual connection or not.

  • h_swish (bool) – Whether to use HSwish or not.

  • k_size (int) – Kernel size for the depthwise convolution.

forward(x)[source]

Executes the PhiNet convolutional block.

Parameters:

x (torch.Tensor) – Input to the convolutional block.

Returns:

Output of the convolutional block.

Return type:

torch.Tensor
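
Example

A usage sketch, assuming in_shape follows the (C, H, W) convention used by PhiNet; with stride 1 and filters equal to the input channels, the residual path keeps the shape unchanged:

>>> import torch
>>> from micromind.networks.phinet import PhiNetConvBlock
>>> block = PhiNetConvBlock((32, 56, 56), expansion=4, stride=1, filters=32, has_se=True, block_id=1)
>>> block(torch.randn(1, 32, 56, 56)).shape
torch.Size([1, 32, 56, 56])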

class micromind.networks.phinet.ReLUMax(max)[source]

Bases: Module

Implements ReLUMax.

Parameters:

max (float) – The maximum value for the clamp operation.

forward(x)[source]

Forward pass of ReLUMax.

Parameters:

x (torch.Tensor) – Input tensor.

Returns:

Output tensor after applying ReLU with max value.

Return type:

torch.Tensor
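
Example

A minimal sketch: ReLUMax clamps activations to [0, max]:

>>> import torch
>>> from micromind.networks.phinet import ReLUMax
>>> act = ReLUMax(6.0)
>>> act(torch.tensor([-2.0, 3.0, 8.0]))
tensor([0., 3., 6.])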

class micromind.networks.phinet.SEBlock(in_channels, out_channels, h_swish=True)[source]

Bases: Module

Implements squeeze-and-excitation block.

Parameters:
  • in_channels (int) – Input number of channels.

  • out_channels (int) – Output number of channels.

  • h_swish (bool, optional) – Whether to use the h-swish activation (default is True).

forward(x)[source]

Executes the squeeze-and-excitation block.

Parameters:

x (torch.Tensor) – Input tensor.

Returns:

Output of the squeeze-and-excitation block.

Return type:

torch.Tensor
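
Example

A usage sketch, assuming out_channels acts as the squeeze (bottleneck) width and the block rescales its input, so the output keeps the input's shape:

>>> import torch
>>> from micromind.networks.phinet import SEBlock
>>> se = SEBlock(in_channels=64, out_channels=16)
>>> se(torch.randn(1, 64, 32, 32)).shape
torch.Size([1, 64, 32, 32])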

class micromind.networks.phinet.SeparableConv2d(in_channels, out_channels, activation=<function relu>, kernel_size=3, stride=1, padding=0, dilation=1, bias=True, padding_mode='zeros', depth_multiplier=1)[source]

Bases: Module

Implements SeparableConv2d.

Parameters:
  • in_channels (int) – Input number of channels.

  • out_channels (int) – Output number of channels.

  • activation (function, optional) – Activation function to apply (default is torch.nn.functional.relu).

  • kernel_size (int, optional) – Kernel size (default is 3).

  • stride (int, optional) – Stride for convolution (default is 1).

  • padding (int, optional) – Padding for convolution (default is 0).

  • dilation (int, optional) – Dilation factor for convolution (default is 1).

  • bias (bool, optional) – If True, adds a learnable bias to the output (default is True).

  • padding_mode (str, optional) – Padding mode for convolution (default is ‘zeros’).

  • depth_multiplier (int, optional) – Depth multiplier (default is 1).

forward(x)[source]

Executes the SeparableConv2d block.

Parameters:

x (torch.Tensor) – Input tensor.

Returns:

Output of the convolution.

Return type:

torch.Tensor
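
Example

A minimal usage sketch (with kernel_size=3 and padding=1, spatial resolution is preserved):

>>> import torch
>>> from micromind.networks.phinet import SeparableConv2d
>>> conv = SeparableConv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
>>> conv(torch.randn(1, 32, 64, 64)).shape
torch.Size([1, 64, 64, 64])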

micromind.networks.phinet.correct_pad(input_shape, kernel_size)[source]

Returns a tuple for zero-padding for 2D convolution with downsampling.

Parameters:
  • input_shape (tuple or list) – Shape of the input tensor (height, width).

  • kernel_size (int or tuple) – Size of the convolution kernel.

Returns:

A tuple representing the zero-padding in the format (left, right, top, bottom).

Return type:

tuple
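
Example

A sample call; the exact tuple depends on input parity (the output shown assumes the Keras-style convention of asymmetric padding for even inputs):

>>> from micromind.networks.phinet import correct_pad
>>> correct_pad((224, 224), 3)
(0, 1, 0, 1)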

micromind.networks.phinet.get_xpansion_factor(t_zero, beta, block_id, num_blocks)[source]

Compute the expansion factor based on the formula from the paper.

Parameters:
  • t_zero (float) – The base expansion factor.

  • beta (float) – The shape factor.

  • block_id (int) – The identifier of the current block.

  • num_blocks (int) – The total number of blocks.

Returns:

The computed expansion factor.

Return type:

float
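
Example

A sanity-check sketch: with beta = 1 the shape factor has no effect, so every block keeps the base expansion t_zero (an assumption based on the interpolation described in the paper):

>>> from micromind.networks.phinet import get_xpansion_factor
>>> get_xpansion_factor(6, 1.0, 3, 7)
6.0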

micromind.networks.phinet.preprocess_input(x, **kwargs)[source]

Normalizes input channels to the range [-1, 1].

Parameters:

x (torch.Tensor) – Input tensor to be preprocessed.

Returns:

Normalized tensor with values in [-1, 1].

Return type:

torch.Tensor
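
Example

A usage sketch, assuming an 8-bit image tensor as input:

>>> import torch
>>> from micromind.networks.phinet import preprocess_input
>>> img = torch.randint(0, 256, (1, 3, 224, 224)).float()
>>> x = preprocess_input(img)  # values mapped into [-1, 1]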

micromind.networks.xinet module

Code for XiNet (https://shorturl.at/mtHT0)

Authors:
  • Francesco Paissan, 2023

  • Alberto Ancilotto, 2023

class micromind.networks.xinet.XiConv(c_in: int, c_out: int, kernel_size: int | Tuple = 3, stride: int | Tuple = 1, padding: int | Tuple | None = None, groups: int | None = 1, act: bool | None = True, gamma: float | None = 4, attention: bool | None = True, skip_tensor_in: bool | None = True, skip_res: List | None = None, skip_channels: int | None = 1, pool: bool | None = None, attention_k: int | None = 3, attention_lite: bool | None = True, batchnorm: bool | None = True, dropout_rate: int | None = 0, skip_k: int | None = 1)[source]

Bases: Module

Implements XiNet’s convolutional block as presented in the original paper.

Parameters:
  • c_in (int) – Number of input channels.

  • c_out (int) – Number of output channels.

  • kernel_size (Union[int, Tuple]) – Kernel size for the main convolution.

  • stride (Union[int, Tuple]) – Stride for the main convolution.

  • padding (Optional[Union[int, Tuple]]) – Padding that is applied in the main convolution.

  • groups (Optional[int]) – Number of groups for the main convolution.

  • act (Optional[bool]) – When True, uses SiLU activation function after the main convolution.

  • gamma (Optional[float]) – Compression factor for the convolutional block.

  • attention (Optional[bool]) – When True, uses attention.

  • skip_tensor_in (Optional[bool]) – When True, defines broadcasting skip connection block.

  • skip_res (Optional[List]) – Spatial resolution of the skip connection, such that average pooling is statically defined.

  • skip_channels (Optional[int]) – Number of channels for the input block.

  • pool (Optional[bool]) – When True, applies pooling after the main convolution.

  • attention_k (Optional[int]) – Kernel for the attention module.

  • attention_lite (Optional[bool]) – When True, uses efficient attention implementation.

  • batchnorm (Optional[bool]) – When True, uses batch normalization inside the ConvBlock.

  • dropout_rate (Optional[int]) – Dropout probability.

  • skip_k (Optional[int]) – Kernel for the broadcast skip connection.

forward(x: Tensor)[source]

Computes the forward step of XiNet’s convolutional block.

Parameters:

x (torch.Tensor) – Input tensor.

Returns:

ConvBlock output.

Return type:

torch.Tensor
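
Example

A minimal usage sketch with the broadcasting skip connection disabled (skip_tensor_in=False), and assuming the default padding mimics “same” via autopad:

>>> import torch
>>> from micromind.networks.xinet import XiConv
>>> conv = XiConv(64, 128, kernel_size=3, stride=1, skip_tensor_in=False)
>>> conv(torch.randn(1, 64, 32, 32)).shape
torch.Size([1, 128, 32, 32])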

class micromind.networks.xinet.XiNet(input_shape: List, alpha: float = 1.0, gamma: float = 4.0, num_layers: int = 5, num_classes=1000, include_top=False, base_filters: int = 16, return_layers: List | None = None)[source]

Bases: Module

Defines a XiNet.

Parameters:
  • input_shape (List) – Shape of the input tensor.

  • alpha (float) – Width multiplier.

  • gamma (float) – Compression factor.

  • num_layers (int) – Number of convolutional blocks (default is 5).

  • num_classes (int) – Number of classes. It is used only when include_top is True.

  • include_top (Optional[bool]) – When True, defines an MLP for classification.

  • base_filters (int) – Number of base filters for the ConvBlock.

  • return_layers (Optional[List]) – IDs of the layers to be returned after the forward step.

Example

>>> from micromind.networks import XiNet
>>> model = XiNet((3, 224, 224))
forward(x)[source]

Computes the forward step of the XiNet.

Parameters:

x (torch.Tensor) – Input tensor.

Returns:

Output of the network, as defined by return_layers at init.

Return type:

Union[torch.Tensor, Tuple]
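
Example

A forward-pass sketch (the logits shape assumes include_top=True):

>>> import torch
>>> from micromind.networks import XiNet
>>> model = XiNet((3, 224, 224), include_top=True, num_classes=1000)
>>> model(torch.randn(1, 3, 224, 224)).shape
torch.Size([1, 1000])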

micromind.networks.xinet.autopad(k: int, p: int | None = None)[source]

Implements padding to mimic “same” behaviour.

Parameters:
  • k (int) – Kernel size for the convolution.

  • p (Optional[int]) – Padding value to be applied.
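
Example

A sketch of the expected behaviour, assuming the usual “same”-padding rule p = k // 2 for odd kernels when p is not given:

>>> from micromind.networks.xinet import autopad
>>> autopad(3)
1
>>> autopad(5)
2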

micromind.networks.yolo module

YOLOv8 building blocks.

Authors:
  • Matteo Beltrami, 2023

  • Francesco Paissan, 2023

This file contains the definition of the building blocks of the YOLOv8 network. The model architecture has been taken from https://github.com/ultralytics/ultralytics/issues/189

class micromind.networks.yolo.Bottleneck(c1, c2, shortcut: bool, groups=1, kernels: list = (3, 3), channel_factor=0.5)[source]

Bases: Module

Implements YOLOv8’s bottleneck block.

Parameters:
  • c1 (int) – Input channels of the bottleneck block.

  • c2 (int) – Output channels of the bottleneck block.

  • shortcut (bool) – Decides whether to perform a shortcut in the bottleneck block.

  • groups (int) – Groups for the bottleneck block.

  • kernels (list) – Kernel size for the bottleneck block.

  • channel_factor (float) – Decides the number of channels of the intermediate result between the two convolutional blocks.

forward(x)[source]

Executes YOLOv8 bottleneck block.

Parameters:

x (torch.Tensor) – Input to the bottleneck block.

Returns:

Output of the bottleneck block.

Return type:

torch.Tensor
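
Example

A usage sketch; with shortcut=True the residual path requires c1 == c2, so the output shape matches the input:

>>> import torch
>>> from micromind.networks.yolo import Bottleneck
>>> block = Bottleneck(64, 64, shortcut=True)
>>> block(torch.randn(1, 64, 40, 40)).shape
torch.Size([1, 64, 40, 40])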

class micromind.networks.yolo.C2f(c1, c2, n=1, shortcut=False, groups=1, e=0.5)[source]

Bases: Module

Implements YOLOv8’s C2f block.

Parameters:
  • c1 (int) – Input channels of the C2f block.

  • c2 (int) – Output channels of the C2f block.

  • n (int) – Number of bottleneck blocks executed in the C2f block.

  • shortcut (bool) – Decides whether to perform a shortcut in the bottleneck blocks.

  • groups (int) – Groups for the C2f block.

  • e (float) – Factor for concatenating intermediate results.

forward(x)[source]

Executes YOLOv8 C2f block.

Parameters:

x (torch.Tensor) – Input to the C2f block.

Returns:

Output of the C2f block.

Return type:

torch.Tensor
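
Example

A usage sketch (spatial resolution is preserved; only the channel count changes from c1 to c2):

>>> import torch
>>> from micromind.networks.yolo import C2f
>>> c2f = C2f(64, 128, n=2, shortcut=True)
>>> c2f(torch.randn(1, 64, 40, 40)).shape
torch.Size([1, 128, 40, 40])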

class micromind.networks.yolo.Conv(c1, c2, kernel_size=1, stride=1, padding=None, dilation=1, groups=1)[source]

Bases: Module

Implements YOLOv8’s convolutional block.

Parameters:
  • c1 (int) – Input channels of the convolutional block.

  • c2 (int) – Output channels of the convolutional block.

  • kernel_size (int) – Kernel size for the convolutional block.

  • stride (int) – Stride for the convolutional block.

  • padding (int) – Padding for the convolutional block.

  • dilation (int) – Dilation for the convolutional block.

  • groups (int) – Groups for the convolutional block.

forward(x)[source]

Executes YOLOv8 convolutional block.

Parameters:

x (torch.Tensor) – Input to the convolutional block.

Returns:

Output of the convolutional block.

Return type:

torch.Tensor
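
Example

A usage sketch, assuming the default padding mimics “same” so that stride 2 halves the spatial resolution:

>>> import torch
>>> from micromind.networks.yolo import Conv
>>> conv = Conv(3, 16, kernel_size=3, stride=2)
>>> conv(torch.randn(1, 3, 640, 640)).shape
torch.Size([1, 16, 320, 320])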

class micromind.networks.yolo.DFL(c1=16)[source]

Bases: Module

Implements YOLOv8’s DFL block.

Parameters:

c1 (int) – Input channels of the DFL block.

forward(x)[source]

Executes YOLOv8 DFL block.

Parameters:

x (torch.Tensor) – Input to the DFL block.

Returns:

Output of the DFL block.

Return type:

torch.Tensor

class micromind.networks.yolo.Darknet(w, r, d)[source]

Bases: Module

Implements YOLOv8’s convolutional backbone.

Parameters:
  • w (float) – Width multiple of the Darknet.

  • r (float) – Ratio multiple of the Darknet.

  • d (float) – Depth multiple of the Darknet.

forward(x)[source]

Executes YOLOv8 convolutional backbone.

Parameters:

x (torch.Tensor) – Input to the Darknet.

Returns:

Three intermediate representations with different resolutions

Return type:

tuple
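
Example

A usage sketch using the width/ratio/depth multiples commonly quoted for the YOLOv8-nano variant (an assumption; adjust to the variant you need):

>>> import torch
>>> from micromind.networks.yolo import Darknet
>>> backbone = Darknet(w=0.25, r=2.0, d=0.33)
>>> p3, p4, p5 = backbone(torch.randn(1, 3, 640, 640))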

class micromind.networks.yolo.DetectionHead(nc=80, filters=())[source]

Bases: Module

Implements YOLOv8’s detection head.

Parameters:
  • nc (int) – Number of classes to predict.

  • filters (tuple) – Number of channels of the three inputs of the detection head.

forward(x)[source]

Executes YOLOv8 detection head.

Parameters:

x (list) – Input to the detection head.

Returns:

Output of the detection head

Return type:

torch.Tensor
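
Example

A usage sketch; the filters tuple and feature-map sizes below are hypothetical values matching a nano-scale backbone and neck:

>>> import torch
>>> from micromind.networks.yolo import DetectionHead
>>> head = DetectionHead(nc=80, filters=(64, 128, 256))
>>> feats = [torch.randn(1, c, s, s) for c, s in zip((64, 128, 256), (80, 40, 20))]
>>> out = head(feats)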

class micromind.networks.yolo.SPPF(c1, c2, k=5)[source]

Bases: Module

Implements YOLOv8’s SPPF block.

Parameters:
  • c1 (int) – Input channels of the SPPF block.

  • c2 (int) – Output channels of the SPPF block.

  • k (int) – Kernel size for the SPPF block’s max-pooling operations.

forward(x)[source]

Executes YOLOv8 SPPF block.

Parameters:

x (torch.Tensor) – Input to the SPPF block.

Returns:

Output of the SPPF block.

Return type:

torch.Tensor
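
Example

A usage sketch (the max-pooling operations are padded, so spatial resolution is preserved):

>>> import torch
>>> from micromind.networks.yolo import SPPF
>>> sppf = SPPF(256, 256, k=5)
>>> sppf(torch.randn(1, 256, 20, 20)).shape
torch.Size([1, 256, 20, 20])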

class micromind.networks.yolo.Upsample(scale_factor, mode='nearest')[source]

Bases: object

class micromind.networks.yolo.YOLOv8(w, r, d, num_classes=80)[source]

Bases: Module

Implements YOLOv8 network.

Parameters:
  • w (float) – Width multiple of the Darknet.

  • r (float) – Ratio multiple of the Darknet.

  • d (float) – Depth multiple of the Darknet.

  • num_classes (int) – Number of classes to predict.

forward(x)[source]

Executes YOLOv8 network.

Parameters:

x (torch.Tensor) – Input to the YOLOv8 network.

Returns:

Output of the YOLOv8 network

Return type:

torch.Tensor
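
Example

An end-to-end sketch, again assuming YOLOv8-nano-style multiples:

>>> import torch
>>> from micromind.networks.yolo import YOLOv8
>>> model = YOLOv8(w=0.25, r=2.0, d=0.33, num_classes=80)
>>> out = model(torch.randn(1, 3, 640, 640))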

class micromind.networks.yolo.Yolov8Neck(filters=[256, 512, 768], up=[2, 2], d=1)[source]

Bases: Module

Implements YOLOv8’s neck.

Parameters:
  • filters (list) – Number of channels of the three inputs of the neck.

  • up (list) – Upsampling factors for the two upsampling stages.

  • d (float) – Depth multiple, scaling the number of bottleneck blocks.

forward(p3, p4, p5)[source]

Executes YOLOv8 neck.

Parameters:

p3, p4, p5 (torch.Tensor) – The three feature maps produced by the backbone.

Returns:

Three intermediate representations with different resolutions

Return type:

list
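
Example

A usage sketch chaining backbone and neck; the filters list assumes YOLOv8-nano-style multiples so the channel counts line up with the Darknet outputs:

>>> import torch
>>> from micromind.networks.yolo import Darknet, Yolov8Neck
>>> backbone = Darknet(w=0.25, r=2.0, d=0.33)
>>> neck = Yolov8Neck(filters=[64, 128, 256], up=[2, 2], d=0.33)
>>> p3, p4, p5 = backbone(torch.randn(1, 3, 640, 640))
>>> feats = neck(p3, p4, p5)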