0. Symptom

  • System crash

1. Analysis

1.1 dmesg_TZ.txt

[    9.188060][  T175] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000102
[    9.188065][  T175] Mem abort info:
[    9.188067][  T175]   ESR = 0x0000000096000005
[    9.188069][  T175]   EC = 0x25: DABT (current EL), IL = 32 bits
[    9.188072][  T175]   SET = 0, FnV = 0
[    9.188074][  T175]   EA = 0, S1PTW = 0
[    9.188075][  T175]   FSC = 0x05: level 1 translation fault
[    9.188078][  T175] Data abort info:
[    9.188079][  T175]   ISV = 0, ISS = 0x00000005
[    9.188080][  T175]   CM = 0, WnR = 0
[    9.188083][  T175] user pgtable: 4k pages, 39-bit VAs, pgdp=00000000c850e000
[    9.188086][  T175] [0000000000000102] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[    9.188095][  T175] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[    9.188188][  T175] Dumping ftrace buffer:
[    9.188199][  T175]    (ftrace buffer empty)

[    9.188845][  T175] Hardware name: Qualcomm Technologies, Inc. Spring QRD (DT)
[    9.188849][  T175] Workqueue: events power_supply_changed_work
[    9.188863][  T175] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    9.188868][  T175] pc : __queue_work+0x28/0x550
[    9.188876][  T175] lr : queue_work_on+0x3c/0x80
[    9.188880][  T175] sp : ffffffc00b473ca0
[    9.188882][  T175] x29: ffffffc00b473ca0 x28: ffffff804531dbc8 x27: ffffff82f2740fa8
[    9.188890][  T175] x26: ffffff800b791f10 x25: 0000000000000000 x24: 0000000000000007
[    9.188896][  T175] x23: 0000000000000000 x22: 0000000000000001 x21: 0000000000000000
[    9.188902][  T175] x20: 0000000000000000 x19: ffffff806d0f9148 x18: ffffffc00ac0d040
[    9.188908][  T175] x17: 000000002a4cec24 x16: 000000002a4cec24 x15: 0000000000000046
[    9.188914][  T175] x14: 0000000000000000 x13: 0000000000000ef0 x12: 0000000000000002
[    9.188920][  T175] x11: 0000000000000000 x10: ffffffffffffd240 x9 : 000000000000001b
[    9.188926][  T175] x8 : 0000000000000001 x7 : ffffff806baa9380 x6 : 000000161b03f216
[    9.188932][  T175] x5 : 1672031b16000000 x4 : 0080000000000000 x3 : 1b430b9338000000
[    9.188939][  T175] x2 : ffffff806d0f9148 x1 : 0000000000000000 x0 : 0000000000000020
[    9.188946][  T175] Call trace:
[    9.188948][  T175]  __queue_work+0x28/0x550
[    9.188953][  T175]  queue_work_on+0x3c/0x80
[    9.188957][  T175]  fts_power_usb_notifier_callback+0x2c/0x40 [focaltech_spi]
[    9.189037][  T175]  blocking_notifier_call_chain+0x70/0xbc
[    9.189047][  T175]  power_supply_changed_work+0x7c/0xc8
[    9.189054][  T175]  process_one_work+0x1e4/0x43c
[    9.189060][  T175]  worker_thread+0x25c/0x430
[    9.189065][  T175]  kthread+0x104/0x1d4
[    9.189069][  T175]  ret_from_fork+0x10/0x20
[    9.189079][  T175] Code: a9054ff4 910003fd aa0203f3 aa0103f7 (39440828) 

Initial findings:

  • Fault type: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000102
  • Module involved: focaltech_spi
  • Function involved: fts_power_usb_notifier_callback+0x2c
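
The ESR value in the log above can be cross-checked by hand. The sketch below decodes it from the architectural bit layout (EC in bits [31:26], IL in bit 25, WnR in ISS bit 6, DFSC in ISS bits [5:0]). The struct and function names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an AArch64 ESR_ELx value for a data abort (illustrative type). */
struct esr_fields {
    unsigned ec;    /* exception class, bits [31:26] */
    unsigned il;    /* instruction length bit, bit 25 (1 = 32-bit instruction) */
    unsigned wnr;   /* write-not-read, ISS bit 6 (0 = read, 1 = write) */
    unsigned dfsc;  /* data fault status code, ISS bits [5:0] */
};

/* Split a raw ESR value into the fields printed by the oops. */
struct esr_fields decode_esr(uint64_t esr)
{
    struct esr_fields f;

    f.ec   = (unsigned)((esr >> 26) & 0x3f);
    f.il   = (unsigned)((esr >> 25) & 0x1);
    f.wnr  = (unsigned)((esr >> 6) & 0x1);
    f.dfsc = (unsigned)(esr & 0x3f);
    return f;
}

/* Check the decoder against the values printed in dmesg_TZ.txt. */
void check_esr_from_log(void)
{
    struct esr_fields f = decode_esr(0x96000005ULL);

    assert(f.ec == 0x25);   /* DABT (current EL) */
    assert(f.il == 1);      /* IL = 32 bits */
    assert(f.wnr == 0);     /* WnR = 0: the faulting access was a read */
    assert(f.dfsc == 0x05); /* level 1 translation fault */
}
```

For 0x96000005 this gives EC = 0x25, WnR = 0 and DFSC = 0x05: a read access that hit a level 1 translation fault, consistent with the log.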

1.2 Restoring the crash context with TRACE32

android-stability-015_0001.png
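The restored context shows that the faulting instruction is:

ldrb w8,[x1, #0x102]

This instruction loads one byte from the address x1+0x102 into register w8. x1 is 0 at this point, so the address accessed is 0x102, which matches the NULL pointer dereference (0000000000000102) reported in the call trace.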
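
The same instruction can also be recovered from the oops itself: the word in parentheses on the Code: line, (39440828), is the instruction at pc. The following is a minimal sketch that assumes only the A64 LDRB (immediate, unsigned offset) encoding; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded operands of an A64 "LDRB Wt, [Xn, #imm]" (illustrative type). */
struct ldrb_uimm {
    int      is_ldrb; /* 1 if the word matches LDRB (immediate, unsigned offset) */
    unsigned rt;      /* destination register number (Wt) */
    unsigned rn;      /* base register number (Xn) */
    unsigned imm;     /* byte offset; imm12 is not scaled for byte loads */
};

/* Decode one 32-bit instruction word. */
struct ldrb_uimm decode_ldrb_uimm(uint32_t insn)
{
    struct ldrb_uimm d;

    /* Bits [31:22] are 0b0011100101 for LDRB (immediate, unsigned offset). */
    d.is_ldrb = ((insn >> 22) == 0x0e5);
    d.imm = (insn >> 10) & 0xfff;
    d.rn  = (insn >> 5) & 0x1f;
    d.rt  = insn & 0x1f;
    return d;
}

/* Address the load touches, given the value held in the base register. */
uint64_t ldrb_fault_address(uint64_t base, struct ldrb_uimm d)
{
    return base + d.imm;
}

/* Check the decoder against the Code: line and register dump of the oops. */
void check_insn_from_log(void)
{
    struct ldrb_uimm d = decode_ldrb_uimm(0x39440828u);

    assert(d.is_ldrb);
    assert(d.rt == 8);       /* w8 */
    assert(d.rn == 1);       /* x1 */
    assert(d.imm == 0x102);
    assert(ldrb_fault_address(0, d) == 0x102); /* x1 = 0 in the register dump */
}
```

With x1 = 0 from the register dump, the computed address is 0x102, the same value as the faulting virtual address.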

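Here x1 holds the argument struct workqueue_struct *wq, so wq is NULL when __queue_work dereferences it. This indicates that the wq has already been destroyed!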

Move the stack frame up to fts_power_usb_notifier_callback:
android-stability-015_0002.png

In this function, the wq that is passed in is a valid wq!
So the problem is that the wq was destroyed while fts_power_usb_notifier_callback was calling down into queue_work.

2. Root cause

fts_data->ts_workqueue was destroyed while fts_power_usb_notifier_callback was executing.
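
One common way to avoid this kind of lifetime problem is to make the callback reachable only while the workqueue exists: create the workqueue before registering the notifier, and unregister the notifier before destroying the workqueue. The following is a single-threaded userspace model of that ordering, not the driver's actual code. Every name in it (model_wq, model_ts_data, model_setup, model_teardown, model_notifier_callback) is hypothetical. In a real driver, the serialisation against a callback that is already running would have to come from the kernel's notifier unregister path, not from the flag check shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel objects; not the driver's real types. */
struct model_wq {
    int pending; /* number of work items queued so far */
};

struct model_ts_data {
    struct model_wq *ts_workqueue; /* mirrors fts_data->ts_workqueue */
    int notifier_registered;       /* 1 while the callback may be invoked */
};

/* Setup order: create the workqueue first, then expose the callback. */
int model_setup(struct model_ts_data *ts)
{
    ts->ts_workqueue = calloc(1, sizeof(*ts->ts_workqueue));
    if (!ts->ts_workqueue)
        return -1;
    ts->notifier_registered = 1;
    return 0;
}

/* Teardown order: hide the callback first, then destroy the workqueue. */
void model_teardown(struct model_ts_data *ts)
{
    ts->notifier_registered = 0;
    free(ts->ts_workqueue);
    ts->ts_workqueue = NULL;
}

/* Callback: returns 1 if work was queued, 0 if the event was dropped. */
int model_notifier_callback(struct model_ts_data *ts)
{
    if (!ts->notifier_registered || !ts->ts_workqueue)
        return 0;
    ts->ts_workqueue->pending++;
    return 1;
}

/* Walk through the three phases: before setup, in service, after teardown. */
void model_self_check(void)
{
    struct model_ts_data ts = { NULL, 0 };

    assert(model_notifier_callback(&ts) == 0); /* no workqueue yet: dropped */
    assert(model_setup(&ts) == 0);
    assert(model_notifier_callback(&ts) == 1); /* queued normally */
    model_teardown(&ts);
    assert(model_notifier_callback(&ts) == 0); /* after teardown: dropped */
}
```

The point of the model is the ordering in model_setup and model_teardown. The NULL check in the callback is only a last line of defence and does not by itself close a race between two threads.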