stopm的部分细节

本文将对GMP调度中的stopm函数的部分细节进行阐述,欢迎一起讨论指正。

go中的stopm中的部分细节

M是使用了lock_futex文件中的相关函数,来实现让M完成sleep和wakeup;例如下面的函数:

func notewakeup(n *note) {
 old := atomic.Xchg(key32(&n.key), 1)
 if old != 0 {
  print("notewakeup - double wakeup (", old, ")\n")
  throw("notewakeup - double wakeup")
 }
 futexwakeup(key32(&n.key), 1)
}

入参是mp.park来实现park里key的睡眠和唤醒; 我们在函数findrunnable被调用的地方可以看到如下代码及注释:

 // ...
 if gp == nil {
  gp, inheritTime = runqget(_g_.m.p.ptr())
  // We can see gp != nil here even if the M is spinning,
  // if checkTimers added a local goroutine via goready.
 }
 if gp == nil {
  gp, inheritTime = findrunnable() // blocks until work is available
 }

这说明当实在找不到足够G来运行时,线程的处理逻辑就会阻塞在这里,那么我比较好奇的就是,到底是怎么stopm的呢? 如果当前的M尽力了,但是还没有找到可以执行的G,那么就会stop,可以看到findrunnable函数的倒数两行的函数:

 // 前面逻辑我们暂时不表,反正就是各种找,global找完去epoll里找等等
 stopm()
 goto top

我最开始看到最后一句的goto top以为是要跳到top这个label地方某种空循环来实现它的挂起,但是马上我觉得这种想法键值naive,这肯定会让cpu直接跳转到100%,并且没有stop; 所以能让线程block的函数就是stopm这个函数本身了,我们可以详细进行观察:

// Stops execution of the current m until new work is available.
// Returns with acquired P.
func stopm() {
 _g_ := getg()

 if _g_.m.locks != 0 {
  throw("stopm holding locks")
 }
 if _g_.m.p != 0 {
  throw("stopm holding p")
 }
 if _g_.m.spinning {
  throw("stopm spinning")
 }

 lock(&sched.lock)
 mput(_g_.m)
 unlock(&sched.lock)
 notesleep(&_g_.m.park)
 noteclear(&_g_.m.park)
 acquirep(_g_.m.nextp.ptr())
 _g_.m.nextp = 0
}

其中的notesleep就是标记M睡眠的函数: 相关关键调用路径如下:

func notesleep(n *note) {
 gp := getg()
 if gp != gp.m.g0 {
  throw("notesleep not on g0")
 }
 ns := int64(-1)
 if *cgo_yield != nil {
  // Sleep for an arbitrary-but-moderate interval to poll libc interceptors.
  ns = 10e6
 }
 for atomic.Load(key32(&n.key)) == 0 {
  gp.m.blocked = true
  futexsleep(key32(&n.key), 0, ns)
  if *cgo_yield != nil {
   asmcgocall(*cgo_yield, nil)
  }
  gp.m.blocked = false
 }
}

继续,

// Atomically,
// if(*addr == val) sleep
// Might be woken up spuriously; that's allowed.
// Don't sleep longer than ns; ns < 0 means forever.
//go:nosplit
func futexsleep(addr *uint32, val uint32, ns int64) {
 // Some Linux kernels have a bug where futex of
 // FUTEX_WAIT returns an internal error code
 // as an errno. Libpthread ignores the return value
 // here, and so can we: as it says a few lines up,
 // spurious wakeups are allowed.
 if ns < 0 {
  futex(unsafe.Pointer(addr), _FUTEX_WAIT_PRIVATE, val, nil, nil, 0)
  return
 }

 var ts timespec
 ts.setNsec(ns)
 futex(unsafe.Pointer(addr), _FUTEX_WAIT_PRIVATE, val, unsafe.Pointer(&ts), nil, 0)
}

现在我们就会发现线程会因调用futex系统调用(linux系统)而实现一个跟锁相关的等待队列的挂起和实现,如果之前没有看过futex相关知识点的同学可以先通过该文章了解一些futex

简单来说,就是futex可以使用在用户态挂起和唤醒线程,也可以看这篇间段文章futex on linux;当然最完善的文档还是man手册,使用man 2 futex和man 7 futex进行查询。

另外,stopm中还有将空闲m放到sched的空闲链表中,简答地过一遍,在下面的代码中贴一些注释

// Put mp on midle list.
// Sched must be locked.
// May run during STW, so write barriers are not allowed.
//go:nowritebarrierrec
func mput(mp *m) {
 // 即将被stop的m的shcedlink指针首先指向空闲链表midle的头节点
 mp.schedlink = sched.midle
 // 然后再将空闲链表midle的头节点设置成当前这个准备stop的m,相当于是将这个要
 // stop的m插入到空闲链表的头部
 sched.midle.set(mp)
 sched.nmidle++
 checkdead()
}

// Try to get an m from midle list.
// Sched must be locked.
// May run during STW, so write barriers are not allowed.
//go:nowritebarrierrec
func mget() *m {
 // Mark: mp中p表示的是pointer,go的极简命名
 mp := sched.midle.ptr()
 if mp != nil {
  // get的逻辑和put相反,有了上面的解释,相信get逻辑大家也很容易理解
  sched.midle = mp.schedlink
  sched.nmidle--
 }
 return mp
}