What is IRQ?
Introduction
Have you seen “irq” in /var/log/messages?
In this article, I try to understand what it is.
Asynchronous Interrupt
IRQ stands for Interrupt Request. It’s a signal from hardware devices to the kernel.
Because it comes from hardware, an IRQ is an asynchronous interrupt: the device raises it on its own schedule, regardless of what the processor happens to be running. The opposite is a synchronous interrupt, which is triggered by the software currently executing on the processor, for example a system call or a page fault. Because it occurs as a direct result of the instruction being executed, the processor is still in the context of the same thread, which is why it is called “synchronous.” An asynchronous interrupt, by contrast, typically arrives while the processor is executing some unrelated process or thread. For example, when a process initiates disk I/O and waits for its completion, a context switch occurs and the processor executes another process; when the disk I/O completes, the device raises an IRQ.
The kernel handles an interrupt in two stages: a top half and a bottom half. The top half is the handler that receives the interrupt first. While it runs, further interrupts on that line cannot be accepted, so it is designed to do as little as possible and hand the remaining work off to another handler. The handlers that take over this deferred work are called bottom halves.
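As a rough sketch of what this looks like in a driver (the function names and the hypothetical device here are my own illustration, not taken from any real driver), the top half registered with request_irq only acknowledges the device and schedules a tasklet as its bottom half:

#include <linux/interrupt.h>

/* Bottom half (a tasklet here): does the time-consuming work later,
 * outside the hardware interrupt path. */
static void mydev_bottom_half(struct tasklet_struct *t)
{
    /* e.g. process the data the top half copied out of the device */
}
static DECLARE_TASKLET(mydev_tasklet, mydev_bottom_half);

/* Top half: runs in interrupt context, so it only acknowledges the
 * device and defers everything else to the bottom half. */
static irqreturn_t mydev_top_half(int irq, void *dev_id)
{
    /* acknowledge/clear the device's interrupt status register here */
    tasklet_schedule(&mydev_tasklet);
    return IRQ_HANDLED;
}

/* In the driver's probe/init code the top half would be registered for
 * the device's IRQ line, e.g.:
 *     ret = request_irq(irq, mydev_top_half, 0, "mydev", dev);
 */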
The following are examples of asynchronous interrupts:
- Timer
- Data reception by NIC
- Write completion to disk
- Input from keyboard or mouse
- USB connection
Example: data reception by a NIC
- The top half acknowledges the hardware and copies the received data into memory (to free the hardware buffer)
- The bottom half processes TCP/IP
Bottom-half
As far as I understand, there are two main bottom-half mechanisms: SoftIRQ (on top of which tasklets are built) and Workqueue. Workqueues were developed later and are more flexible, mainly because they run in process context and are therefore allowed to sleep.
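As a minimal sketch of the workqueue style (the names are mine, for illustration), the deferred function runs in process context via a kernel worker thread:

#include <linux/workqueue.h>

/* Deferred work: runs in process context (a kernel worker thread),
 * so it may sleep, take mutexes, allocate with GFP_KERNEL, etc. */
static void my_work_handler(struct work_struct *work)
{
    /* the heavy part of the interrupt handling goes here */
}
static DECLARE_WORK(my_work, my_work_handler);

/* A top half (or any other atomic-context code) defers the heavy part with:
 *     schedule_work(&my_work);
 */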
How to view statistics
Interrupt statistics
cat /proc/interrupts
           CPU0       CPU1
  1:         10          0  IO-APIC   1-edge      i8042
  4:          0       4488  IO-APIC   4-edge      ttyS0
  8:          0          0  IO-APIC   8-edge      rtc0
  9:          0          0  IO-APIC   9-fasteoi   acpi
 12:          0         88  IO-APIC  12-edge      i8042
 24:          2         12  PCI-MSI 65536-edge    nvme0q0
 25:      25975          0  PCI-MSI 65537-edge    nvme0q1
 26:          0      23572  PCI-MSI 65538-edge    nvme0q2
 27:         25        639  PCI-MSI 81920-edge    ena-mgmnt@pci:0000:00:05.0
 28:          0      26672  PCI-MSI 81921-edge    eth0-Tx-Rx-0
 29:      36088          0  PCI-MSI 81922-edge    eth0-Tx-Rx-1
NMI:          0          0  Non-maskable interrupts
LOC:      54290      52843  Local timer interrupts
SPU:          0          0  Spurious interrupts
PMI:          0          0  Performance monitoring interrupts
IWI:         22          9  IRQ work interrupts
RTR:          0          0  APIC ICR read retries
RES:     201117     208227  Rescheduling interrupts
CAL:     133058     142956  Function call interrupts
TLB:       5636       3375  TLB shootdowns
TRM:          0          0  Thermal event interrupts
THR:          0          0  Threshold APIC interrupts
DFR:          0          0  Deferred Error APIC interrupts
MCE:          0          0  Machine check exceptions
MCP:          3          3  Machine check polls
ERR:          0
MIS:          0
PIN:          0          0  Posted-interrupt notification event
NPI:          0          0  Nested posted-interrupt event
PIW:          0          0  Posted-interrupt wakeup event
Each row is formatted as follows:
IRQ No: CPU0 CPU1 ... InterruptController TriggerMode Devices
Column descriptions:
- IRQ No: IRQ number. Numbers 1, 4, 8, etc. are legacy IRQs; 24 and above are MSI (Message Signaled Interrupts)
- CPU Col: Cumulative count of interrupts handled by each CPU core
- Interrupt Controller: Hardware such as IO-APIC or PCI-MSI that manages interrupts
- Trigger Mode: edge (edge-triggered), fasteoi (fast End Of Interrupt), etc.
- Devices: Device or driver that generates the interrupt
Example explanations:
- 25: 25975 0 PCI-MSI 65537-edge nvme0q1
  - IRQ 25
  - Handled 25,975 times by CPU0, 0 times by CPU1
  - Interrupt from NVMe SSD queue 1
  - Biased towards CPU0 (due to affinity settings)
- 28: 0 26672 PCI-MSI 81921-edge eth0-Tx-Rx-0
  - Transmit and receive queue 0 of the network interface (eth0)
  - Primarily handled by CPU1
Special interrupts:
- LOC (Local timer interrupts): Local timer interrupt for each CPU
- RES (Rescheduling interrupts): Interrupt for process rescheduling
- CAL (Function call interrupts): Function calls between CPUs
- TLB (TLB shootdowns): TLB cache invalidation
From this information, we can see which CPU handles which interrupt and how the interrupt load is balanced across CPUs.
SoftIRQ statistics
cat /proc/softirqs
                CPU0       CPU1
      HI:          0          0
   TIMER:      17540      15624
  NET_TX:        121          1
  NET_RX:      36284      26838
   BLOCK:          2         13
IRQ_POLL:          0          0
 TASKLET:         26         13
   SCHED:      32870      33113
 HRTIMER:          2          0
     RCU:      77026      73895
SoftIRQ is one of the bottom-half mechanisms that take over processing from the top half that handles the hardware interrupt. Each row shows how many times each SoftIRQ type has been executed on each CPU.
SoftIRQ types:
- HI (High priority tasklets): High-priority tasklets for urgent processing
- TIMER: Timer-related processing, such as periodic tasks and timeout handling
- NET_TX: Network packet transmission processing
- NET_RX: Network packet reception processing (typically higher than NET_TX)
- BLOCK: Block device (disk) I/O completion processing
- IRQ_POLL: I/O device polling
- TASKLET: Normal-priority tasklets
- SCHED: Scheduler-related processing, such as process switching
- HRTIMER: High-resolution timer
- RCU (Read-Copy-Update): Kernel synchronization mechanism for safe data structure updates
Example of how to read a line:
NET_RX: 36284 26838
- CPU0 processed 36,284 network receptions; CPU1 processed 26,838
- CPU0 has more interrupts, indicating that network interrupts are biased toward CPU0
Performance analysis usage:
- When NET_RX is biased toward specific CPUs, IRQ affinity adjustment may be needed
- When TIMER or SCHED values are high, it indicates high system overhead
- We can trace the flow from hardware interrupts to SoftIRQ by viewing this along with /proc/interrupts
Demo code
The following demo code demonstrates using a tasklet as a bottom half. It uses a kernel timer to stand in for a hardware interrupt: timer_handler acts as the top half and delegates the work via tasklet_schedule, while tasklet_handler emulates heavy processing with a 5 ms busy wait. In my experiments, I confirmed that the top half completes its processing in about 200 ns on average.
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/timer.h>
#include <linux/jiffies.h>
#include <linux/interrupt.h>
#include <linux/delay.h>

static struct timer_list my_timer;
static struct tasklet_struct my_tasklet;
static int interrupt_count = 0;
static ktime_t top_half_start;

/* Bottom half: emulates heavy processing with a 5 ms busy wait, then
 * reports the total time since the top half started. */
static void tasklet_handler(struct tasklet_struct *t)
{
    ktime_t end;
    s64 total_duration;

    mdelay(5);
    end = ktime_get();
    total_duration = ktime_to_ns(ktime_sub(end, top_half_start));
    printk(KERN_INFO "TASKLET: Bottom Half completed, Total duration: %lld ns\n",
           total_duration);
}

/* Top half: triggered by the timer instead of real hardware. It only
 * schedules the tasklet and re-arms the timer, so it stays short. */
static void timer_handler(struct timer_list *t)
{
    ktime_t top_half_end;
    s64 top_half_duration;

    top_half_start = ktime_get();
    interrupt_count++;
    tasklet_schedule(&my_tasklet);
    top_half_end = ktime_get();
    top_half_duration = ktime_to_ns(ktime_sub(top_half_end, top_half_start));
    printk(KERN_INFO "TASKLET: Top Half %d completed, Duration: %lld ns\n",
           interrupt_count, top_half_duration);
    mod_timer(&my_timer, jiffies + HZ);   /* fire again in one second */
}

static int __init tasklet_demo_init(void)
{
    printk(KERN_INFO "TASKLET: Module loaded\n");
    tasklet_setup(&my_tasklet, tasklet_handler);
    timer_setup(&my_timer, timer_handler, 0);
    mod_timer(&my_timer, jiffies + HZ);
    return 0;
}

static void __exit tasklet_demo_exit(void)
{
    del_timer_sync(&my_timer);   /* stop the timer and wait for a running handler */
    tasklet_kill(&my_tasklet);   /* wait for a scheduled tasklet to finish */
    printk(KERN_INFO "TASKLET: Module unloaded, Total IRQs: %d\n", interrupt_count);
}

module_init(tasklet_demo_init);
module_exit(tasklet_demo_exit);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Tasklet Bottom Half Demo");
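If you want to try it, build it as an out-of-tree kernel module and load it with insmod; because the timer is re-armed with jiffies + HZ, the top-half and bottom-half messages should appear in dmesg roughly once per second.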
From a performance perspective
According to 4.3. Interrupts and IRQ Tuning, each IRQ has a property called smp_affinity, which defines the CPU cores that are allowed to execute the interrupt service routine (ISR) for that IRQ.
Application performance can be improved when configured properly.
By assigning the same CPU core to both interrupt processing and application threads, you can achieve:
- Ability to share CPU cache lines
- Reduction of memory access latency
- Reduction of context switch overhead
You can view or modify the IRQ affinity in the /proc/irq/IRQ_NUMBER/smp_affinity file.
The values are represented as a hexadecimal bitmask, where each bit corresponds to a CPU core.
Example: Configuring only CPU 0 to process eth0 (IRQ 32) on a 4-core system
# confirm the current setting (f = all CPU cores)
cat /proc/irq/32/smp_affinity
f
# set it to only CPU0
echo 1 > /proc/irq/32/smp_affinity
# confirm it changed
cat /proc/irq/32/smp_affinity
1
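To actually get the cache-sharing benefit, the application side also has to run on the same core. Here is a minimal userspace sketch that pins the calling thread to CPU 0 (the CPU number simply mirrors the IRQ example above):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    CPU_SET(0, &mask);  /* CPU 0, matching the IRQ affinity set above */

    /* pid 0 means "the calling thread" */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    /* ... the latency-sensitive work that consumes the interrupt's data ... */
    return 0;
}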
Closing Thoughts
In this article, I dug into the previously unfamiliar term “IRQ” and learned that properly configuring IRQ affinity can provide performance benefits. However, from a network performance perspective, pinning IRQs this way is not always optimal because of mechanisms like Receive Side Scaling (RSS), which improves network packet reception by distributing incoming packets across multiple cores. Additionally, NUMA topology should be considered when configuring affinity settings.