一、App Not Responding的类型
- Input事件超过5s没有被处理完 KeyDispatchTimeout(5 seconds) 大概率
按键或触摸事件在特定时间内无响应, 主线程对输入事件5秒内没有处理完毕
PS:KeyDispatchTimeout --- 时间定义的地方
Akey or touch event was not dispatched within the specified time(按键或触摸事件在特定时间内无响应)
具体的超时时间的定义在framework下的
ActivityManagerService.java
//How long we wait until we timeout on key dispatching.
staticfinal int KEY_DISPATCHING_TIMEOUT = 5*1000
- Service处理超时 ServiceTimeout(20 seconds) --小概率类型 ,20S
Service在特定的时间内无法处理完成,主线程在Service的各个生命周期函数时20秒内没有处理完毕。
- BroadcastReceiver处理超时,前台广播 10S,后台广播 60s
BroadcastReceiver在特定时间内无法处理完成, 主线程在执行BroadcastReceiver的onReceive()函数时10秒内没有处理完毕
adb bugreprot xxx
adb pull /data/anr/traces.txt xxx
一份完整的bugreport包含
adb logcat –b system
system.log 包含ANR发生时间点信息、ANR发生前的CPU信息,还包含大量系统服务输出的信息 adb logcat –b system
adb logcat –b main
main.log 包含ANR发生前应用自身输出的信息,可供分析应用是否有异常;此外还包含输出的GC信息,可供分析内存回收的速度,判断系统是否处于低内存或内存碎片化状态
adb logcat –b event
event.log 包含AMS与WMS输出的应用程序声明周期信息,可供分析窗口创建速度以及焦点转换情况
kernel.log 包含kernel打出的信息,LowMemoryKiller杀进程、内存碎片化或内存不足,mmc驱动异常都可以在这里找到。
二、那么导致ANR的根本原因是什么呢?导致ANR的原因有如下几点:
- 主线程有耗时操作,如有复杂的layout布局,IO操作等。
a.耗时的网络访问
b. 大量的数据读写
c.数据库操作
- 被Binder对端block
硬件操作(比如camera)
- 同步锁block
a. sync
b. 其他线程持有锁,导致主线程等待超时
c.其它线程终止或崩溃导致主线程一直等待。
- Binder被占满导致主线程无法和SystemServer通信
service binder的数量达到上限 system server中发生WatchDog ANR
- 得不到系统资源(CPU/RAM/IO)
service忙导致超时无响应
调用thread的join()方法、sleep()方法、wait()方法或者等待线程锁的时候
lowmemorykill
三、 大体的修改思路和方案
1:UI线程尽量只做跟UI相关的工作
2:CPU满负荷, I/O阻塞的 耗时的工作(比如数据库操作,I/O,连接网络或者别的有可能阻碍UI线程的操作)把它放入单独的线程处理
3:尽量用Handler来处理UIthread和别的thread之间的交互
4. 内存不够用的,增大VM内存, 使用largeHeap属性, 排查内存泄露.
重要:-------------------------------
Activity的所有生命周期回调都是执行在主线程的.
Service默认是执行在主线程的.
BroadcastReceiver的onReceive回调是执行在主线程的.
没有使用子线程的looper的Handler的handleMessage, post(Runnable)是执行在主线程的.
AsyncTask的回调中除了doInBackground, 其他都是执行在主线程的.
View的post(Runnable)是执行在主线程的.
1.Loader CursorLoader 异步loader
2. Thread AsyncTask HandlerThread
使用Thread和HandlerThread时, 为了使效果更好, 建议设置Thread的优先级偏低一点:
Process.setThreadPriority(THREAD_PRIORITY_BACKGROUND);
因为如果没有做任何优先级设置的话, 你创建的Thread默认和UI Thread是具有同样的优先级的
同样的优先级的Thread, CPU调度上还是可能会阻塞掉你的UI Thread, 导致ANR的.
// 启动一个名为new_thread的子线程
HandlerThread thread = new HandlerThread("new_thread");
thread.start();
// 取new_thread赋值给ServiceHandler
private ServiceHandler mServiceHandler;
mServiceLooper = thread.getLooper();
mServiceHandler = new ServiceHandler(mServiceLooper);
private final class ServiceHandler extends Handler {
public ServiceHandler(Looper looper) {
super(looper);
}
@Override
public void handleMessage(Message msg) {
// 此时handleMessage是运行在new_thread这个子线程中了.
}
}
3. IntentService
Service是运行在主线程的, 然而IntentService是运行在子线程的.
实际上IntentService就是实现了一个HandlerThread + ServiceHandler的模式.
四、案例
(0 )Sp耗时问题导致应用ANR
06-16 16:16:20.536 1853 2073 I am_anr : [0,27661,com.Android.camera,952745541,Input dispatching timed out (com.android.camera/com.android.camera.Camera, Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 24. Wait queue head age: 5511.1ms.)]
(1)cpu 满负荷
Process:
...
CPU usage from 3330ms to 814ms ago:----CPU在ANR发生前的使用情况
...
100%TOTAL: 5.9% user + 4.1% kernel + 89% iowait
CPUusage from 3697ms to 4223ms later:-- ANR后CPU的使用量
其中绝大数是被iowait即I/O操作占用了.
此时分析方法调用栈 trace log, 一般来说会发现是方法中有频繁的文件读写或是数据库读写操作放在主线程来做了.
(2)内存吃紧
Cmdline: xxxx
DALVIK THREADS:
"main"prio=5 tid=3 VMWAIT
...
...
MEMINFO in pid 1360 [android.xxx.xxx] **
native dalvik other total
free: 296 2436 N/A 2732
(3) 线程,等其他问题都需要分析主线程当时在干什么 Thread状态(每一段都是一个线程,主要看main线程)
ThreadState (defined at “dalvik/vm/thread.h “)
THREAD_UNDEFINED = -1, /* makes enum compatible with int32_t */
THREAD_ZOMBIE = 0, /* TERMINATED 死亡,终止运行 */
THREAD_RUNNING = 1, /* RUNNABLE or running now 线程可运行或正在运行*/
THREAD_TIMED_WAIT = 2, /* TIMED_WAITING in Object.wait() 执行了带有超时参数的wait、sleep或join函数 */
THREAD_MONITOR = 3, /* BLOCKED on a monitor 线程阻塞,等待获取对象锁 */
THREAD_WAIT = 4, /* WAITING in Object.wait() 执行了无超时参数的wait函数*/
THREAD_INITIALIZING= 5, /* allocated, not yet running 新建,正在初始化,*/
THREAD_STARTING = 6, /* started, not yet on thread list 新建,正在启动 */
THREAD_NATIVE = 7, /* off in a JNI native method or do something wait 正在执行JNI本地函数 */
THREAD_VMWAIT = 8, /* waiting on a VM resource 正在等待VM资源 */
THREAD_SUSPENDED = 9, /* suspended, usually by GC or debugger 线程暂停,通常是由于GC或debug被暂停 */
tid=1 线程号
sysTid=27661 主线程的线程号和进程号相同
Waiting 线程状态,其中state也是线程状态,如果state=D代表底层被blocked了。
nice nice值越小,则优先级越高。因为是主线程此处nice=-10, 可以看到优先级很高了
schedstat 括号中的3个数字,依次是Running, Runable, Switch,Running时间。Running时间:CPU运行的时间,单位ns。 Runable时间:RQ队列的等待时间,单位ns。 Switch次数:CPU调度切换次数
utm 该线程在用户态所执行的时间,单位是jiffies
stm 该线程在内核态所执行的时间,单位是jiffies
sCount 此线程被挂起的次数
dsCount 线程被调试器挂起的次数,当一个进程被调试后,sCount会重置为0,调试完毕后sCount会根据是否被正常挂起增长,但是dsCount不会被重置为0,所以dsCount也可以用来判断这个线程是否被调试过
self 线程本身的地址
然而cpu占用195%并不算高,在多核中每个核最大占用率都是100%(八核占用率是800%)
在UI线程进行网络数据的读写
DALVIKTHREADS:
""main"" prio=5 tid=3 NATIVE
|group=""main"" sCount=1 dsCount=0 s=Yobj=0x4001b240 self=0xbda8
| sysTid=2579 nice=0 sched=0/0cgrp=unknown handle=-1343993184
atorg.apache.harmony.luni.platform.OSNetworkSystem.receiveStreamImpl(NativeMethod)
atorg.apache.harmony.luni.platform.OSNetworkSystem.receiveStream(OSNetworkSystem.java:478) ---网络耗时
atorg.apache.harmony.luni.PlainSocketImpl.read(PlainSocketImpl.java:565)
atorg.apache.harmony.luni.SocketInputStream.read(SocketInputStream.java:87)
atorg.apache.harmony.luni.internal.www.protocol.http.HttpURLConnection$LimitedInputStream.read(HttpURLConnection.java:303) ---- 网络
atjava.io.InputStream.read(InputStream.java:133)
内存不足
DALVIK THREADS:
"main"prio=5 tid=3 VMWAIT
|group="main" sCount=1 dsCount=0 s=N obj=0x40026240self=0xbda8
| sysTid=1815 nice=0 sched=0/0 cgrp=unknownhandle=-1344001376
atdalvik.system.VMRuntime.trackExternalAllocation(NativeMethod) ---VMRuntime.trackExternalAllocation 内存不足
atandroid.graphics.Bitmap.nativeCreate(Native Method)
atandroid.graphics.Bitmap.createBitmap(Bitmap.java:468)
atandroid.view.View.buildDrawingCache(View.java:6324)
atandroid.view.View.getDrawingCache(View.java:6178)
atandroid.view.ViewGroup.drawChild(ViewGroup.java:1541)
high iowait
0%409/com.android.eventstream.telephonyplugin: 0% user + 0% kernel /faults: 2 minor
0.1% 632/com.android.devicemonitor: 0.1% user + 0%kernel
100%TOTAL: 6.9% user + 8.2% kernel +84%iowait --
DALVIK THREADS:
(mutexes: tll=0 tsl=0tscl=0 ghl=0 hwl=0 hwll=0)
"main" prio=5 tid=1 WAIT
|group="main" sCount=1 dsCount=0 obj=0x2aaca180self=0xcf20
| sysTid=1134 nice=0 sched=
tandroid.content.ContentResolver.query(ContentResolver.java:268) ----数据库操作耗时
atcom.android.email.provider.EmailContent$Message.restoreMessageWithId(EmailContent.java:648)
atcom.android.email.Controller.setMessageRead(Controller.java:658)
atcom.android.email.activity.MessageView.onMarkAsRead(MessageView.java:700)
atcom.android.email.activity.MessageView.access$2500(MessageView.java:98)
atcom.android.email.activity.MessageView$LoadBodyTask.onPostExecute(MessageView.java:1290) --操作
atcom.android.email.activity.MessageView$LoadBodyTask.onPostExecute(MessageView.java:1255)
atandroid.os.AsyncTask.finish(AsyncTask.java:417)
atandroid.os.AsyncTask.access$300(AsyncTask.java:127)
atandroid.os.AsyncTask$InternalHandler.handleMessage(AsyncTask.java:429)
atandroid.os.Handler.dispatchMessage(Handler.java:99)
Could not read input channel file descriptors from parce
Input dispatching timed out
(Waiting because no window has focus but there is a focused application that may eventually add a window when it
finishes starting up.)
从trace信息看:
system_server:
Binder:3030_13' prio=5 tid=107 Runnable
Line 459: - locked <0x08b6d1b4> (a java.lang.Object) === 锁住0x08b6d1b4 对象
'android.bg' prio=5 tid=9 Blocked
Line 590: - waiting to lock <0x08b6d1b4> (a java.lang.Object) held by thread 107 ======= 该线程处于等待0x08b6d1b4状态
'Binder:3030_4' prio=5 tid=36 Blocked
Line 1076: - waiting to lock <0x08b6d1b4> (a java.lang.Object) held by thread 107 ==== 如上两条Binder 线程处于waiting 状态
在锁的地方,执行代码:
at com.android.server.usage.UsageStatsDatabase.queryUsageStats(UsageStatsDatabase.java:441)- locked <0x022c9e87> (a java.lang.Object)
at com.android.server.usage.UserUsageStatsService.queryStats(UserUsageStatsService.java:301)
at com.android.server.usage.UserUsageStatsService.queryUsageStats(UserUsageStatsService.java:328)
at com.android.server.usage.UsageStatsService.queryUsageStats(UsageStatsService.java:738) - locked <0x08b6d1b4> (a java.lang.Object)
at com.android.server.usage.UsageStatsService$BinderService.queryUsageStats(UsageStatsService.java:1267)
===== 而在同一时刻,system:ui (android/com.android.internal.app.ResolverActivity) 这条进程的main进程也在执行到此处
at android.app.usage.UsageStatsManager.queryUsageStats(UsageStatsManager.java:133)
at android.app.usage.UsageStatsManager.queryAndAggregateUsageStats(UsageStatsManager.java:204)
at com.android.internal.app.ResolverComparator.(ResolverComparator.java:78)
at com.android.internal.app.ResolverActivity.onCreate(ResolverActivity.java:304)
at com.android.internal.app.ResolverActivity.onCreate(ResolverActivity.java:246)
=== 所以system:ui (android/com.android.internal.app.ResolverActivity) 进程的main线程也需要等待Binder:3030_13' prio=5 tid=107 Runnable 端释放锁,
因等待时间过长导致anr现象。那么是什么原因导致binder长时间锁住0x08b6d1b4对象?继续看锁住0x08b6d1b4对象的线程。
Binder:3030_13' 长时间未释放锁原因:
at java.lang.Exception.(Exception.java:66)
at java.lang.RuntimeException.(RuntimeException.java:62)
at java.lang.IllegalArgumentException.(IllegalArgumentException.java:53)
at java.lang.NumberFormatException.(NumberFormatException.java:55)
===在锁住0x08b6d1b4对象后,java代码抛NumberFormatException异常。
从上面可看到,Binder:3030_13'在锁住0x08b6d1b4对象后,抛异常导致,导致Binder:3030_13挂掉,所以无法及时释放锁,
从而导致另外两个线程处于长时间等待状态。那么现在问题则是需要查看该binder:3030_13为什么会报NumberFormatException异常 ?
从logcat log中搜索NumberFormatException异常可看到:
12-21 08:27:24.143 4220 4220 W System.err: java.lang.NumberFormatException: For input string: 'E'
12-21 08:27:24.143 4220 4220 W System.err: at java.lang.Integer.parseInt(Integer.java:521)
12-21 08:27:24.143 4220 4220 W System.err: at java.lang.Integer.parseInt(Integer.java:556)
12-21 08:27:24.143 4220 4220 W System.err: at com.android.phone.settings.AccountSelectionPreference.setModel(AccountSelectionPreference.java:99)
12-21 08:27:24.143 4220 4220 W System.err: at com.android.phone.MultiSimSettings.populateDefaultOutgoingAccountsModel(MultiSimSettings.java:2567)
12-21 08:27:24.143 4220 4220 W System.err: at com.android.phone.MultiSimSettings.updateVoiceSummary(MultiSimSettings.java:980)
12-21 08:27:24.143 4220 4220 W System.err: at com.android.phone.MultiSimSettings.updateState(MultiSimSettings.java:925)
12-21 08:27:24.143 4220 4220 W System.err: at com.android.phone.MultiSimSettings.-wrap11(MultiSimSettings.java)
12-21 08:27:24.143 4220 4220 W System.err: at com.android.phone.MultiSimSettings$1.handleMessage(MultiSimSettings.java:1352)
12-21 08:27:24.143 4220 4220 W System.err: at android.os.Handler.dispatchMessage(Handler.java:102)
12-21 08:27:24.143 4220 4220 W System.err: at android.os.Looper.loop(Looper.java:159)
12-21 08:27:24.143 4220 4220 W System.err: at android.app.ActivityThread.main(ActivityThread.java:6146)
12-21 08:27:24.143 4220 4220 W System.err: at java.lang.reflect.Method.invoke(Native Method)
12-21 08:27:24.143 4220 4220 W System.err: at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:886)
12-21 08:27:24.143 4220 4220 W System.err: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:776)
疯狂Binder Call导致应用ANR
发现在主线程进行大量输出上面的Log,极简桌面在1分钟内调用此接口160次,导致了SystemServer不能及时响应这个App,造成了App自己的ANR。像Binder call导致ANR的问题很常见,存在被blocked的风险,此刻可以尝试放到异步里面执行,其次不要短时间的大量Binder调用,这样行为轻则App自己有问题,重则系统发生Watchdog死机重启。
Cmd line: com.jeejen.family
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x77ffca18 self=0xecfce000
| sysTid=20576 nice=-10 cgrp=default sched=0/0 handle=0xf0bf2494
| state=S schedstat=( 628294395 402363898 957 ) utm=42 stm=20 core=4 HZ=100
| stack=0xff5fe000-0xff600000 stackSize=8MB
| held mutexes=
kernel: (couldn't read /proc/self/task/20576/stack)
native: #00 pc 00053cfc /system/lib/libc.so (__ioctl+8)
native: #01 pc 00021cd3 /system/lib/libc.so (ioctl+30)
native: #02 pc 0003d3f5 /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+204)
native: #03 pc 0003dde3 /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+26)
native: #04 pc 0003713d /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
native: #05 pc 000c3cf1 /system/lib/libandroid_runtime.so (android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int)+200)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:1127)
at android.wifi.IWifiManager$Stub$Proxy.getConnectionInfo(IWifiManager.java:1441)
at android.wifi.WifiManager.getConnectionInfo(WifiManager.java:1778)
at org.chromium.NetworkChangeNotifierAutoDetect$WifiManagerDelegate.getWifiInfoLocked(NetworkChangeNotifierAutoDetect.java:28)
at org.chromium.NetworkChangeNotifierAutoDetect$WifiManagerDelegate.getWifiSsid(NetworkChangeNotifierAutoDetect.java:22)
- locked <0x0f4edae7> (a java.lang.Object)
at org.chromium.NetworkChangeNotifierAutoDetect.getCurrentNetworkState(NetworkChangeNotifierAutoDetect.java:67)
at org.chromium.NetworkChangeNotifierAutoDetect.<init>(NetworkChangeNotifierAutoDetect.java:21)
at org.chromium.NetworkChangeNotifier.setAutoDetectConnectivityStateInternal(NetworkChangeNotifier.java:61)
------------------
binder call阻塞了,调用的接口是IWifiManager.getConnectionInfo()。因为是binder调用,查看一下binder_sample。
08-28 18:54:01.384 10171 20576 20576 I binder_sample: [android.wifi.IWifiManager,24,16004,com.jeejen.family,100]
08-28 18:54:04.868 10171 20576 20576 I binder_sample: [android.wifi.IWifiManager,24,3479,com.jeejen.family,100]
08-28 18:56:12.712 10171 21885 21885 I binder_sample: [android.wifi.IWifiManager,24,8963,com.jeejen.family,100]
18 binder 线程池耗尽 线程池耗尽很容易引起anr
11-23 19:29:57.905 3505 9248 E IPCThreadState: binder thread pool (15 threads) starved for 1070 ms
广播超时导致App的ANR
四、ANR机制的实现原理 文章:http://gityuan/2016/07/02/android-anr/
从源码角度详细的分析了ANR机制实现的原理。对于上一章讲到的1-4中情况,分别找到了其源码中是如何实现的,
对于每一种大概原理如下:1.在进行相关操作调用hander.sendMessageAtTime()发送一个ANR的消息,延时时间为ANR发生的时间(如前台Service是当前时间20s之后)。2.进行相关的操作3.操作结束后向remove掉该条message。如果相关的操作在规定时间没有执行完成,该条message将被handler取出并执行,就发生了ANR。
下面以BroadcastReceiver为例详细介绍:
BroadcastQueue.processNextBroadcast()
final void processNextBroadcast(boolean fromMsg) { … synchronized (mService) { … do { if (mOrderedBroadcasts.size() == 0) { … if (mService.mProcessesReady && r.dispatchTime > 0) { long now = SystemClock.uptimeMillis(); if ((numReceivers > 0) && (now > r.dispatchTime + (2 * mTimeoutPeriod * numReceivers))) { //1.发送延时消息 broadcastTimeoutLocked(false); // forcibly finish this broadcast forceReceive = true; r.state = BroadcastRecord.IDLE; } } if (r.state != BroadcastRecord.IDLE) { if (DEBUG_BROADCAST) Slog.d(TAG, “processNextBroadcast(” + mQueueName + “) called when not idle (state=” + r.state + “)”); return; } if (r.receivers == null || r.nextReceiver >= numReceivers || r.resultAbort || forceReceive) { // No more receivers for this broadcast! Send the final // result if requested… if (r.resultTo != null) { try { //2. 处理广播消息 performReceiveLocked(r.callerApp, r.resultTo, new Intent(r.intent), r.resultCode, r.resultData, r.resultExtras, false, false, r.userId); // Set this to null so that the reference // (local and remote) isn’t kept in the mBroadcastHistory. r.resultTo = null; } catch (RemoteException e) { … } } //3.取消延时消息 cancelBroadcastTimeoutLocked(); … } } while (r == null) ; … } } }
1.发送延时消息:broadcastTimeoutLocked(false): final void broadcastTimeoutLocked(boolean fromMsg) { … long now = SystemClock.uptimeMillis(); if (fromMsg) { if (mService.mDidDexOpt) { // Delay timeouts until dexopt finishes. mService.mDidDexOpt = false; long timeoutTime = SystemClock.uptimeMillis() + mTimeoutPeriod; setBroadcastTimeoutLocked(timeoutTime); return; } if (!mService.mProcessesReady) { return; } long timeoutTime = r.receiverTime + mTimeoutPeriod; if (timeoutTime > now) { setBroadcastTimeoutLocked(timeoutTime); return; } }
他调用了setBroadcastTimeoutLocked(long timeoutTime): final void setBroadcastTimeoutLocked(long timeoutTime) { if (! mPendingBroadcastTimeoutMessage) { Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this); mHandler.sendMessageAtTime(msg, timeoutTime); mPendingBroadcastTimeoutMessage = true; } }
传入setBroadcastTimeoutLocked(long timeoutTime)的时间xxx + mTimeoutPeriod,mTimeoutPeriod就是onRecieve()可以执行的时间,在BroadcastQueue初始化时候被赋值,前台队列为10s后台队列为60s:
ActivityManagerService.java: public ActivityManagerService(Context systemContext) { … static final int BROADCAST_FG_TIMEOUT = 10 * 1000; static final int BROADCAST_BG_TIMEOUT = 60 * 1000; … mFgBroadcastQueue = new BroadcastQueue(this, mHandler, “foreground”, BROADCAST_FG_TIMEOUT, false); mBgBroadcastQueue = new BroadcastQueue(this, mHandler, “background”, BROADCAST_BG_TIMEOUT, true); … }
performReceiveLocked()为广播的实际处理,就不展开了
cancelBroadcastTimeoutLocked() :
该方法的主要工作是当service启动完成,则移除服务超时消息SERVICE_TIMEOUT_MSG。
final void cancelBroadcastTimeoutLocked() {
if (mPendingBroadcastTimeoutMessage) {
mHandler.removeMessages(BROADCAST_TIMEOUT_MSG, this);
mPendingBroadcastTimeoutMessage = false;
}
更多推荐
Android 的ANR 问题解决
发布评论