本文将对GMP调度中的stopm函数的部分细节进行阐述,欢迎一起讨论指正。
go中的stopm中的部分细节
M是使用了lock_futex文件中的相关函数,来实现让M完成sleep和wakeup;例如下面的函数:
func notewakeup(n *note) {
old := atomic.Xchg(key32(&n.key), 1)
if old != 0 {
print("notewakeup - double wakeup (", old, ")\n")
throw("notewakeup - double wakeup")
}
futexwakeup(key32(&n.key), 1)
}
入参是mp.park来实现park里key的睡眠和唤醒; 我们在函数findrunnable被调用的地方可以看到如下代码及注释:
// ...
if gp == nil {
gp, inheritTime = runqget(_g_.m.p.ptr())
// We can see gp != nil here even if the M is spinning,
// if checkTimers added a local goroutine via goready.
}
if gp == nil {
gp, inheritTime = findrunnable() // blocks until work is available
}
这说明当实在找不到足够G来运行时,线程的处理逻辑就会阻塞在这里,那么我比较好奇的就是,到底是怎么stopm的呢? 如果当前的M尽力了,但是还没有找到可以执行的G,那么就会stop,可以看到findrunnable函数的倒数两行的函数:
// 前面逻辑我们暂时不表,反正就是各种找,global找完去epoll里找等等
stopm()
goto top
我最开始看到最后一句的goto top
以为是要跳到top这个label地方某种空循环来实现它的挂起,但是马上我觉得这种想法键值naive,这肯定会让cpu直接跳转到100%,并且没有stop;
所以能让线程block的函数就是stopm这个函数本身了,我们可以详细进行观察:
// Stops execution of the current m until new work is available.
// Returns with acquired P.
func stopm() {
_g_ := getg()
if _g_.m.locks != 0 {
throw("stopm holding locks")
}
if _g_.m.p != 0 {
throw("stopm holding p")
}
if _g_.m.spinning {
throw("stopm spinning")
}
lock(&sched.lock)
mput(_g_.m)
unlock(&sched.lock)
notesleep(&_g_.m.park)
noteclear(&_g_.m.park)
acquirep(_g_.m.nextp.ptr())
_g_.m.nextp = 0
}
其中的notesleep就是标记M睡眠的函数: 相关关键调用路径如下:
func notesleep(n *note) {
gp := getg()
if gp != gp.m.g0 {
throw("notesleep not on g0")
}
ns := int64(-1)
if *cgo_yield != nil {
// Sleep for an arbitrary-but-moderate interval to poll libc interceptors.
ns = 10e6
}
for atomic.Load(key32(&n.key)) == 0 {
gp.m.blocked = true
futexsleep(key32(&n.key), 0, ns)
if *cgo_yield != nil {
asmcgocall(*cgo_yield, nil)
}
gp.m.blocked = false
}
}
继续,
// Atomically,
// if(*addr == val) sleep
// Might be woken up spuriously; that's allowed.
// Don't sleep longer than ns; ns < 0 means forever.
//go:nosplit
func futexsleep(addr *uint32, val uint32, ns int64) {
// Some Linux kernels have a bug where futex of
// FUTEX_WAIT returns an internal error code
// as an errno. Libpthread ignores the return value
// here, and so can we: as it says a few lines up,
// spurious wakeups are allowed.
if ns < 0 {
futex(unsafe.Pointer(addr), _FUTEX_WAIT_PRIVATE, val, nil, nil, 0)
return
}
var ts timespec
ts.setNsec(ns)
futex(unsafe.Pointer(addr), _FUTEX_WAIT_PRIVATE, val, unsafe.Pointer(&ts), nil, 0)
}
现在我们就会发现线程会因调用futex系统调用(linux系统)而实现一个跟锁相关的等待队列的挂起和实现,如果之前没有看过futex相关知识点的同学可以先通过该文章了解一些futex。
简单来说,就是futex可以使用在用户态挂起和唤醒线程,也可以看这篇间段文章futex on linux;当然最完善的文档还是man手册,使用man 2 futex和man 7 futex进行查询。
另外,stopm中还有将空闲m放到sched的空闲链表中,简答地过一遍,在下面的代码中贴一些注释
// Put mp on midle list.
// Sched must be locked.
// May run during STW, so write barriers are not allowed.
//go:nowritebarrierrec
func mput(mp *m) {
// 即将被stop的m的shcedlink指针首先指向空闲链表midle的头节点
mp.schedlink = sched.midle
// 然后再将空闲链表midle的头节点设置成当前这个准备stop的m,相当于是将这个要
// stop的m插入到空闲链表的头部
sched.midle.set(mp)
sched.nmidle++
checkdead()
}
// Try to get an m from midle list.
// Sched must be locked.
// May run during STW, so write barriers are not allowed.
//go:nowritebarrierrec
func mget() *m {
// Mark: mp中p表示的是pointer,go的极简命名
mp := sched.midle.ptr()
if mp != nil {
// get的逻辑和put相反,有了上面的解释,相信get逻辑大家也很容易理解
sched.midle = mp.schedlink
sched.nmidle--
}
return mp
}