Interference modelling and analysis in hard real-time multiprocessor systems

Patricia Balbastre Universitat Politècnica de València, Spain



#### Outline

- Introduction
- Related work
- Task model
- Interference-aware scheduling
- Schedulability analysis
- Task allocation
- Evaluation
- Conclusions



- Scheduling of multi core real-time systems introduces more complexity than that of single core systems
- Execution of tasks no longer depends only on their own computation time or that of the highest-priority tasks.
- Shared hardware resources between processors introduce an important pessimism factor →interference
- Three main sources of interferences:
  - Main memory
  - Memory bus
  - Cache



# Introduction



- Ad-hoc calculation for a specific resource type
- Only valid for the selected hardware
- Interference value very close to reality

- When hardware vendor does not provide details
- The interference does not depend on the hardware
- But: more pessimistic



## Introduction

#### How to take into account interference?

Add this value to WCET

→traditional sched analysis is valid

→ Very pessimistic

Add this value to the model as a new
parameter
→ Need to define new

sched theory



# **Related work**

#### • Two surveys

- Until 2018 → Maiza C. et al (2019) A survey of timing verification techniques for multi-core real-time systems. ACM Comput Surv 52(3)
- Until 2021 → Lugo T. et al (2022) A Survey of Techniques for Reducing Interference in Real-Time Applications on Multicore Platforms. IEEE Access 10

#### Some examples

- WCRA parameter (Worst Case number of shared Resource Accesses) (J. Galizzi et al., 2014)
- isWCET parameter (interference-sensitive Worst Case Execution Time) (Nowotsch et al., 2014)
- DRAM modeling (Kim et al., 2014)
- Shared cache modeling (Guo et al., 2020)
- MRSS model (Davis et al., 2021)



# Our proposal

- We propose a general model where interference is a new parameter of the task model
- Highly critical real-time systems
- Dynamic priorities
- Need to:
  - Allocate tasks to cores using interference information
  - Define how this model is scheduled
  - Provide schedulability analysis



# Task model

- M cores: M<sub>1</sub>, M<sub>2</sub>,..., M<sub>m</sub>
- N tasks:  $\tau = [\tau_1, \tau_2, ..., \tau_n]$
- Each task is characterized by  $\tau_i = (C_i, D_i, T_i, I_i)$

Interference parameter I<sub>i</sub>



- Broadcasting task τ<sub>i</sub> → Provokes interference, I<sub>i</sub>≠0
- Receiving task τ<sub>i</sub> → Receives interference so:
  - I<sub>i</sub>≠0
  - there is at least τ<sub>j</sub> in another core whose I<sub>j</sub>≠0



# Interference-aware scheduling

• Example





- Dynamic priorities (EDF)
  - Well-known analysis based on demand bound function (dbf)
  - We want to obtain the equivalent dbf for the new model
  - Max. number of activations that  $\tau_{j}$  provokes to  $\tau_{i}$

$$\overrightarrow{v_{j \to i}}[a] = 1 + \sum_{t=aT_i+1}^{(a+1)T_i-1} g(t)$$

being







- Use v<sub>j→i</sub> to obtain a modified dbf that incorporates interference
- First approximation: Add  $v_{i \rightarrow i}$  to WCET  $\rightarrow$  dbf'





- Use  $v_{j \rightarrow i}$  to obtain a modified dbf that incorporates interference
- Second approximation: Add  $v_{i \rightarrow i}$  to dbf  $\rightarrow$  dbf"

$$\begin{split} dbf_{\tau_{M_k}}^{''}(t) &= \sum_{\tau_i \in M_k} \left( C_i \left\lfloor \frac{t + T_i - D_i}{T_i} \right\rfloor + \sum_{\tau_j \notin M_k} \sum_{a=0}^{\left\lfloor \frac{t + T_i - D_i}{T_i} \right\rfloor - 1} \overrightarrow{v_{j \to i}}[a] \cdot I_j \right) \\ dbf_{\tau_{M_k}}^{''}(t) &\leq dfb_{\tau_{M_k}}'(t) \end{split}$$



# Schedulability analysis



#### Schedulability condition

$$dbf_{\tau_{M_k}}^{''}(t_1, t_2) \le t_2 - t_1 \quad 0 \le t_1 < t_2 \le H$$



# Task allocation

- Allocating task to cores is key to reduce the interference
- Well-known bin packing algorithms:
  - FFDU  $\rightarrow$  First Fit Decreasing Utilization
  - WFDU → most widely used as it balances the load between cores
  - BF, NF, etc
- We need an allocation strategy that takes into account interference parameter I<sub>i</sub>



# Task allocation

#### Wmin allocator

- W matrix (n x n x H): takes into account the posible interference produced
- W is a binary matrix:
  - $W_{ij} = 1 \rightarrow \tau_i$  provoques interference to  $\tau_j$
  - $W_{ij} = 0 \rightarrow otherwise$
- $\tau_i$  and  $\tau_j$  in the same core  $\rightarrow W_{ij}, W_{ji}=0$
- $\tau_i$  is not broadcasting  $\rightarrow W_{ij}, W_{ji}=0$



| t     | w                                              | Comment                                               |
|-------|------------------------------------------------|-------------------------------------------------------|
| 0     | $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ | $\tau_0$ and $\tau_1$ release                         |
| 1     | $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ |                                                       |
| 2     | $\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$ | $\tau_0$ finishes its first activation                |
| 3     | $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ | $\tau_1$ finishes its first activation                |
| 4     | $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ |                                                       |
| 5     | $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ | $\tau_1$ releases, but $\tau_0$ is not active         |
| 6     | $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ | $\tau_0$ releases while $\tau_1$ is active            |
| 7     | $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ |                                                       |
| 8     | $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ | $\tau_0$ and $\tau_1$ finish                          |
| 9->15 | $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ | $\tau_0$ and $\tau_1$ are not active at the same time |



## Task allocation

#### Wmin allocates tasks to cores so W matrix is minimised

$$\text{minimise} \qquad maxW = \sum_{\forall k} maxW_k = \sum_{k=1}^m \sum_{\substack{\tau_i \in M_k \\ I_i \neq 0}} \sum_{\tau_j \notin M_k} I_j$$

s.t.

$$\begin{split} \sum_{\forall k} O_{ik} &= 1 & \forall i \\ \sum_{i \in k} U_i \cdot O_{ik} &= U_{M_k} & \forall k \\ U_{M_k} &\leq 1 & \forall k \\ \sum_{\substack{\tau_i \in M_k \\ I_i \neq 0}} \sum_{\substack{\tau_j \notin M_k}} I_j &= max W_k & \forall k \\ O_{ik} &\in \{0, 1\} \\ U_{M_k}, max W_k &\geq 0 \end{split}$$

SETS AND INDICES i Tasks  $\tau_i \in \{0, 1, 2, ..., n-1\}$ k Cores  $M_k \in \{0, 1, 2, ..., m-1\}$ 

PARAMETERS

| $C_i$ | Worst | case | execution | time | of $\tau_i$ |
|-------|-------|------|-----------|------|-------------|
| $C_i$ | WOISU | case | execution | ume  | $017_i$     |

 $T_i$  Period of  $\tau_i$ 

 $U_i$  Theoretical utilisation of  $\tau_i$ 

 $I_i$  Interference factor of  $\tau_i$  over other tasks

#### DECISION VARIABLES

| $O_{ik}$  | Allocation matrix. 1 if $\tau_i$ is allocated in core k and 0 otherwise. |
|-----------|--------------------------------------------------------------------------|
| $U_{M_k}$ | Theoretical utilisation of core $k$ .                                    |
| maxWk     | Maximum value of the sum of all elements of $W$ for core $k$ .           |
| maxW      | Maximum value of the sum of all elements of $W$ for all cores.           |



To evaluate the proposal we created a simulation scenario divided into 5 elementary steps:

- 1. Load generation
- 2. Allocation phase
- 3. Allocation validation
- 4. Scheduling
- 5. Scheduling validation



Intel Core i7 16GB RAM, Gurobi optimizer 9.5



| Number<br>of cores  | utilisation | Tasks | Broadcasting<br>tasks | Interference<br>over WCET (%) | Scenario |
|---------------------|-------------|-------|-----------------------|-------------------------------|----------|
|                     |             | 4     | 2                     | 10                            | 1        |
|                     | 1.1         |       |                       | 20                            | 2        |
| $2  \mathrm{cores}$ |             |       |                       | 30                            | 3        |
|                     | 1.5         |       |                       | 10                            | 4        |
|                     |             |       |                       | 20                            | 5        |
|                     |             |       |                       | 30                            | 6        |
|                     |             | 12    | 3                     | 10                            | 7        |
|                     | 2.1         |       |                       | 20                            | 8        |
| 4 cores             |             |       |                       | 30                            | 9        |
|                     | 3           |       |                       | 10                            | 10       |
|                     |             |       |                       | 20                            | 11       |
|                     |             |       |                       | 30                            | 12       |
|                     | 4.1         | 20    | 5                     | 10                            | 13       |
|                     |             |       |                       | 20                            | 14       |
| 9 00700             |             |       |                       | 30                            | 15       |
| 8 cores             |             |       |                       | 10                            | 16       |
|                     | 6           |       |                       | 20                            | 17       |
|                     |             |       |                       | 30                            | 18       |
|                     | 5.1         | 28    | 7                     | 10                            | 19       |
|                     |             |       |                       | 20                            | 20       |
| 10 cores            |             |       |                       | 30                            | 21       |
|                     | 7.5         |       |                       | 10                            | 22       |
|                     |             |       |                       | 20                            | 23       |
|                     |             |       |                       | 30                            | 24       |



- Evaluated parameters
  - Allocators comparison:
    - Increased utilization: The increase in utilization with respect to the theoretical utilization
    - Schedulability ratio: Percentage of feasible plans over total task set with valid allocations (step5 vs step3)
  - Schedulability comparison
    - $\alpha'$ : difference between real and estimated utilisation by dbf'
    - $\alpha''$ : difference between real and estimated utilisation by dbf''



#### Increased utilization and schedulability ratio

- WFDU has good schedulability but much increased utilisation.
- FFDU has the opposite behaviour

POLITÈCNICA De València

• Wmin presents the advantages of WFDU and FFDU.









# Conclusions

- This talk is the result of the work published in:
  - José María Aceituno, Ana Guasque, Patricia Balbastre, José E. Simó, Alfons Crespo. Hardware resources contention-aware scheduling of hard real-time multiprocessor systems. J. Syst. Archit. Vol. 118 (2021)
  - Ana Guasque, José María Aceituno, Patricia Balbastre, José Simó, Alfons Crespo. Schedulability analysis of dynamic priority real-time systems with contention. The Journal of Supercomputing (2022)
- The model is implemented in Xoncrete tool
  - Eyesat 3U Cubesat
- Future work
  - Improve I<sub>i</sub> calculation with more specific information
  - Improve schedulability analysis
  - Extend to other models

