ArcGIS Regression Analysis Tutorial: Analyzing 911 Response Data Using Regression

This tutorial demonstrates how regression analysis has been implemented in ArcGIS, and explores some of the special considerations you'll want to think about whenever you use regression with spatial data.

Regression analysis allows you to model, examine, and explore spatial relationships, to better understand the factors behind observed spatial patterns, and to predict outcomes based on that understanding. Ordinary Least Squares regression (OLS) is a global regression method. Geographically Weighted Regression (GWR) is a local, spatial regression method that allows the relationships you are modeling to vary across the study area. Both of these are located in the Spatial Statistics Tools -> Modeling Spatial Relationships toolset.

Before executing the tools and examining the results, let's review some terminology:

Dependent variable (Y): what you are trying to model or predict (residential burglary incidents, for example).
Explanatory variables (X): variables you believe influence or help explain the dependent variable (like income, the number of vandalism incidents, or households).
Coefficients (β): values, computed by the regression tool, reflecting the relationship and strength of each explanatory variable to the dependent variable.
Residuals (ε): the portion of the dependent variable that isn't explained by the model; the model under and over predictions.

The sign (+/-) associated with a coefficient (one for each explanatory variable) tells you whether the relationship is positive or negative. If you were modeling residential burglary and obtained a negative coefficient for the Income variable, for example, it would mean that as median incomes in a neighborhood go up, the number of residential burglaries goes down.
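Putting these terms together: the model that OLS estimates has the familiar linear form (standard regression notation, shown here only as a reference, not as ArcGIS-specific output):

y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i

where y_i is the dependent variable for feature i, the x terms are the explanatory variables, the β terms are the coefficients the tool estimates, and ε_i is the residual for feature i.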
Output from regression analysis can be a little overwhelming at first. It includes diagnostics and model performance indicators. All of these numbers should seem much less daunting once you complete the tutorial.

Important notes:
1. The steps in this tutorial document assume the data is stored at C:\SpatialStats. If a different location is used, substitute "C:\SpatialStats" with the alternate location when entering data and environment paths.
2. This tutorial was developed using ArcGIS 10.0. If you are using a different version of the software, the screenshots, and how you access results, may be a bit different.

Total estimated time: 1.5 hours

Introduction:
In order to demonstrate how the regression tools work, you will be doing an analysis of 911 emergency call data for a portion of the Portland, Oregon metropolitan area.
Suppose we have a community that is spending a large portion of its public resources responding to 911 emergency calls. Projections are telling them that their community's population is going to double in size over the next 10 years. If they can better understand some of the factors contributing to high call volumes now, perhaps they can implement strategies to help reduce 911 calls in the future.

Step 1: Getting started
Open C: (the path may be different on your machine).
In this map document you will notice several data frames containing layers of data for the Portland, Oregon metropolitan area.
Notice that the Hot Spot Analysis data frame is active.
In the map, each point represents a single call into a 911 emergency call center. This is real data representing over 2000 calls.

Step 2: Examine Hot Spot Analysis results
Expand the data frame and click the + sign to the right of the Hot Spot Analysis grouped layer. Ensure that the ResponseStations layer is checked on.
Results from running the Hot Spot Analysis tool show us where the community is getting lots of 911 calls. We can use these results to assess whether or not the stations (fire/police/emergency medical) are optimally located.
Areas with high call volumes are shown in red (hot spots); areas getting very few calls are shown in blue (cold spots). The green crosses are the existing locations for the police and fire units tasked with responding to these 911 calls.
Notice that the 2 stations to the right of the map appear to be located right over, or very near, call hot spots. The station in the lower left, however, is actually located over a cold spot; we may want to investigate further whether this station is in the best location. The community can use hot spot analysis to decide if adding new stations or relocating existing stations might improve 911 response.

Step 3: Exploring OLS regression
The next question our community is probably asking is, "Why are call volumes so high in those hot spot areas?" and "What are the factors that contribute to high volumes of 911 calls?" To help answer these questions, we'll use the regression tools.
Activate the Regression Analysis data frame by right clicking it and choosing Activate.
Expand the Spatial Statistics Tools toolbox.
Right click in an open space in ArcToolbox and set your environment as follows:
Disable background processing (Geoprocessing > Geoprocessing Options). With ArcGIS 10, geoprocessing tools can run in the background and all results are available through the Results window. By disabling background processing, we will see tool results in a progress window; this is often best when you are using the regression tools.
In the data frame, check off the Data911Calls layer.
Instead of looking at individual 911 calls as points, we have aggregated the calls to census tracts and now have a count variable (Calls) representing the number of calls in each tract.
Right click the ObsData911Calls layer and choose Open Attribute Table.
The reason we are using census tract level data is that it gives us access to a rich set of variables that might help explain 911 call volumes. Notice that the table has fields such as educational status (LowEd), unemployment levels (Unemploy), and so on. When you are done exploring the fields, close the table.
Can you think of anything… any variable… that might help explain the call volume pattern we see in the hot spot map? What about population? Would we expect more calls in places with more people? Let's test the hypothesis that call volume is simply a function of population. If it is, our community can use Census population projections to estimate future 911 emergency call volumes.
Run the OLS tool with the following parameters. Note: once the tool starts running, make sure the "Close this dialog when completed successfully" box is NOT checked.
o Input Feature Class -> ObsData911Calls
o Unique ID Field -> UniqID
o Output Feature Class -> C:
o Dependent Variable -> Calls
o Explanatory Variables -> Pop
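If you prefer to script this run rather than use the tool dialog, here is a minimal arcpy sketch of the same single-variable OLS model. This is a sketch only: it assumes the Spatial Statistics toolbox that ships with ArcGIS 10.x, and the workspace and output path are hypothetical.

# Minimal arcpy sketch of the single-variable OLS run described above.
# The workspace and output path are hypothetical; adjust them to your own setup.
import arcpy

arcpy.env.workspace = r"C:\SpatialStats"    # assumed data location from the notes above
arcpy.env.overwriteOutput = True

arcpy.OrdinaryLeastSquares_stats(
    "ObsData911Calls",                      # Input Feature Class
    "UniqID",                               # Unique ID Field
    r"C:\SpatialStats\OLS911Calls.shp",     # Output Feature Class (hypothetical path)
    "Calls",                                # Dependent Variable
    "Pop")                                  # Explanatory Variables

The tool's diagnostic report appears in the geoprocessing messages, just as it does in the progress window.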
Move the progress window to the side so you can examine the OLS911calls layer in the map. The OLS default output is a map showing us how well the model performed, using only the population variable to explain 911 call volumes. The red areas are under predictions (where the actual number of calls is higher than the model predicted); the blue areas are over predictions (actual call volumes are lower than predicted). When a model is performing well, the over/under predictions reflect random noise… the model is a little high here, but a little low there… you don't see any structure at all in the over/under predictions. Do the over and under predictions in the output feature class appear to be random noise, or do you see clustering? When the over (blue) and under (red) predictions cluster together spatially, you know that your model is missing one or more key explanatory variables.
The OLS tool also produces a lot of numeric output. Expand and enlarge the progress window so you can read this output.
Notice that the Adjusted R-Squared value is 0.393460, or 39%. This indicates that, using population alone, the model is explaining 39% of the call volume story.
So, looking back at our original hypothesis, is call volume simply a function of population? Might our community be able to predict future 911 call volumes from population projections alone? Probably not; if the relationship between population and 911 call volumes had been higher, say 80%, our community might not need regression at all. But with only 39% of the story, it seems other factors, and other variables, are needed to effectively model 911 calls.
The next question that follows is: what are these other variables? This, actually, is the hardest part of the regression model building process: finding all of the key variables that explain what we are trying to model.
Close the progress window.

Step 4: Finding key variables
The scatterplot matrix graph can help us here by allowing us to examine the relationships between call volumes and a variety of other variables. We might guess, for example, that the number of apartment complexes, unemployment rates, income, or education are also important predictors of 911 call volumes.
Experiment with the scatterplot matrix graph to explore the relationships between call volumes and other candidate explanatory variables. If you enter the Calls variable either first or last, it will appear as either the bottom row or the first column in the matrix. Here is an example of scatterplot matrix parameter settings:
Once you finish creating the scatterplot matrix, select features in the focus graph and notice how those features are highlighted in each scatterplot and on the map.
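Outside ArcGIS, you can get a similar quick look at the pairwise relationships with pandas. The sketch below is only an illustration: it assumes you have exported the ObsData911Calls attribute table to a CSV file, and the file name and field list shown are hypothetical examples.

# Quick scatterplot matrix of Calls against a few candidate predictors.
# Assumes the attribute table was exported to CSV; file name and fields are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

tracts = pd.read_csv("ObsData911Calls.csv")
fields = ["Calls", "Pop", "Jobs", "LowEduc", "Unemploy"]   # example candidate variables
scatter_matrix(tracts[fields], figsize=(10, 10), diagonal="hist")
plt.show()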
Step 5: A properly specified model
Now let's try a model with 4 explanatory variables: Pop, Jobs, LowEduc, and Dst2UrbCen. The explanatory variables in this model were found by using the scatterplot matrix and trying a number of candidate models. Finding a properly specified OLS model is often an iterative process.
Run OLS with the following parameters set:
o Input Feature Class -> ObsData911Calls
o Unique ID Field -> UniqID
o Output Feature Class -> C:
o Dependent Variable -> Calls
o Explanatory Variables -> Pop;Jobs;LowEduc;Dst2UrbCen
Notice that the Adjusted R2 value is much higher for this new model, 0.831080, indicating this model explains 83% of the 911 call volume story. This is a big improvement over the model that only used the Population variable.
Close the progress window.
Notice, too, that the residuals (the model over/under predictions) appear to be less clustered than they were using only the Population variable. We can check whether or not the residuals exhibit a random spatial pattern using the Spatial Autocorrelation tool.
Run the Spatial Autocorrelation tool (in the Analyzing Patterns toolset) using the following parameters:
o Input Feature Class -> Data911CallsOLS
o Input Field -> StdResid
o Generate Report -> checked ON
o Conceptualization of Spatial Relationships -> Inverse Distance
o Distance Method -> Euclidean Distance
o Standardization -> ROW (with polygons you will almost always want to row standardize)
Close the progress window, then open the Results window and expand the entry for Spatial Autocorrelation (if you don't see the Results window, select Geoprocessing from the menu, then Results). Double click the HTML Report File.
Results from running the Spatial Autocorrelation tool on the regression residuals indicate they are randomly distributed; the z-score is not statistically significant, so we accept the null hypothesis of complete spatial randomness. This is good news! Anytime there is structure (clustering or dispersion) in the under/over predictions, it means that your model is still missing key explanatory variables and you cannot trust your results. When you run the Spatial Autocorrelation tool on the model residuals and find a random spatial pattern (as we did here), you are on your way to a properly specified model.
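This residual check can also be scripted. Below is a minimal arcpy sketch mirroring the dialog settings above; the input feature class name is whatever your OLS output was called, so treat the name used here as an assumption.

# Global Moran's I on the standardized OLS residuals (same settings as the dialog above).
# The input feature class name is assumed; use the name of your own OLS output.
import arcpy

morans = arcpy.SpatialAutocorrelation_stats(
    "Data911CallsOLS",        # Input Feature Class (the OLS output)
    "StdResid",               # Input Field: standardized residuals
    "GENERATE_REPORT",        # Generate Report
    "INVERSE_DISTANCE",       # Conceptualization of Spatial Relationships
    "EUCLIDEAN_DISTANCE",     # Distance Method
    "ROW")                    # Standardization
print(morans.getMessages())   # Moran's Index, z-score, and p-value are reported in the messages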
Step 6: The 6 things you gotta check!
There are 6 things you need to check before you can be sure you have a properly specified model – a model you can trust.
1. First, check to see that each coefficient has the "expected" sign. A positive coefficient means the relationship is positive; a negative coefficient means the relationship is negative. Notice that the coefficient for the Pop variable is positive. This means that as the number of people goes up, the number of 911 calls also goes up. We are expecting a positive coefficient; if the coefficient for the Population variable were negative, we would not trust our model. Checking the other coefficients, their signs do seem reasonable. Self check: the sign for Jobs (the number of job positions in a tract) is positive; this means that as the number of jobs goes (?), the number of 911 calls also goes (?).
2. Next, check for redundancy among your explanatory variables. If the VIF (variance inflation factor) value for any of your variables is larger than about 7.5 (smaller is definitely better), it means you have one or more variables telling the same story. This leads to an over-count type of bias. You should remove the variables associated with large VIF values one by one until none of your variables have large VIF values (the standard VIF formula is given after these six checks). Self check: Which variable has the highest VIF value?
3. Next, check to see that all of the explanatory variables have statistically significant coefficients. Two columns, Probability and Robust Probability, measure coefficient statistical significance. An asterisk next to the probability tells you the coefficient is significant. If a variable is not significant, it is not helping the model, and unless theory tells us that a particular variable is critical, we should remove it. When the Koenker (BP) statistic is statistically significant, you can only trust the Robust Probability column to determine whether a coefficient is significant or not. Small probabilities are "better" (more significant) than large probabilities. Self check: Which variables have the "best" statistical significance? Did you consult the Probability or Robust_Pr column? Why? Note: an asterisk indicates statistical significance.
4. Make sure the Jarque-Bera test is NOT statistically significant. The residuals (over/under predictions) from a properly specified model will reflect random noise. Random noise has a random spatial pattern (no clustering of over/under predictions). It also has a normal histogram if you plot the residuals. The Jarque-Bera test measures whether or not the residuals from a regression model are normally distributed (think bell curve; the formula appears after these checks). This is the one test you do NOT want to be statistically significant! When it IS statistically significant, your model is biased. This often means you are missing one or more key explanatory variables. Self check: how do you know that the Jarque-Bera statistic is NOT statistically significant in this case?
5. Next, you want to check model performance. The Adjusted R-Squared value ranges from 0 to 1.0 and tells you how much of the variation in your dependent variable has been explained by the model (its definition also appears after these checks). Generally we are looking for values of 0.5 or higher, but a "good" R2 value depends on what we are modeling. Self check: go back to the screenshot of the OLS model that only used Population to explain call volume. What was the Adjusted R2 value? Does the Adjusted R2 value for our new model (4 variables) indicate model performance has improved? The AIC value can also be used to measure model performance. When we have several candidate models (all models must have the same dependent variable), we can assess which model is best by looking for the lowest AIC value. Self check: go back to the screenshot of the OLS model that only used Population. What was the AIC value? Does the AIC value for our new model (4 variables) indicate we improved model performance?
6. Lastly (but certainly NOT least important), you want to make sure your model residuals are free from spatial autocorrelation (spatial clustering of over and under predictions). We used the Spatial Autocorrelation tool above and found that our model passes this check too. This will not always be the case when you build your own regression models. Open the Regression Analysis Basics online documentation and look for the table called "How Regression Models Go Bad". In this table there are some strategies for how to deal with spatially autocorrelated regression residuals.
Self check: run OLS on alternate models. Use Calls for your dependent variable, with other variables in the ObsData911Calls feature class as your explanatory variables (you might select Jobs, Renters, and MedIncome, for example). For each model, go through the 6 checks above to determine if the model is properly specified. If a model fails one of the checks, look at the "Common Regression Problems, Consequences, and Solutions" table in the Regression Analysis Basics document mentioned above to determine the implications and possible solutions.
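For reference, the standard textbook definitions behind checks 2, 4, and 5 are shown below (generic statistics formulas, not something the tutorial asks you to compute by hand):

\mathrm{VIF}_j = \frac{1}{1 - R_j^2}   (R_j^2 is the R-squared from regressing explanatory variable j on the other explanatory variables)

JB = \frac{n}{6}\left( S^2 + \frac{(K - 3)^2}{4} \right)   (n = number of observations, S = skewness and K = kurtosis of the residuals)

R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}   (k = number of explanatory variables)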
Step 7: Running GWR
One OLS diagnostic we didn't say very much about is the Koenker statistic. When the Koenker test is statistically significant, as it is here, it indicates that the relationships between some or all of your explanatory variables and your dependent variable are non-stationary. This means, for example, that the population variable might be an important predictor of 911 call volumes in some locations of your study area, but perhaps a weak predictor in other locations. Whenever you notice that the Koenker test is statistically significant, it indicates you will likely improve model results by moving to Geographically Weighted Regression.
The good news is that once you've found your key explanatory variables using OLS, running GWR is actually quite simple. In most cases, GWR will use the same dependent and explanatory variables you used in OLS.
Run the Geographically Weighted Regression tool with the following parameters (open the side panel help and review the parameter descriptions):
o Input feature class: ObsData911Calls
o Dependent variable: Calls
o Explanatory variables: Pop, Jobs, LowEduc, Dst2UrbCen
o Output feature class: C:
o Kernel type: ADAPTIVE
o Bandwidth method: AICc (you will let the tool find the optimal number of neighbors)
Notice the output from GWR:
Neighbors : 50
ResidualSquares : 7326.2793171502362
EffectiveNumber : 19.863531396247254
Sigma : 10.44629989196762
AICc : 674.65
R2 : 0.89572753438054042
R2Adjusted : 0.86642979248431506
GWR found, applying the AICc method, that using 50 neighbors to calibrate each local regression equation yields optimal results (minimized bias and maximized model fit). Notice that the Adjusted R2 value is higher for GWR than it was for our best OLS model (OLS was 83%; GWR is almost 86.6%). The AICc value is lower for the GWR model; a decrease of even 3 or more points indicates a real improvement in model performance (OLS was 680; GWR is 674).
Close the progress window. Notice that, like the OLS tool, the GWR default output is a map of model residuals. Do the over and under predictions appear random? It's a bit difficult to tell. Run the Spatial Autocorrelation tool on the standardized residuals in the output feature class.
Close the progress window, then double click the HTML report in the Results window to see that the residuals do, in fact, reflect a random spatial pattern.
Open the table for the GWR output feature class and notice several fields with names beginning with "C". These are the coefficient values for each explanatory variable, for each feature.
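As with OLS, the GWR run can be scripted. A minimal arcpy sketch with the same settings is shown below; the output path is hypothetical, and note that the explanatory variables are passed as a single semicolon-delimited string.

# GWR with the same four explanatory variables; kernel and bandwidth settings mirror the dialog above.
# The output path is hypothetical; adjust it to your own workspace.
import arcpy

arcpy.GeographicallyWeightedRegression_stats(
    "ObsData911Calls",                  # Input feature class
    "Calls",                            # Dependent variable
    "Pop;Jobs;LowEduc;Dst2UrbCen",      # Explanatory variables (semicolon-delimited)
    r"C:\SpatialStats\GWR911Calls.shp", # Output feature class (hypothetical path)
    "ADAPTIVE",                         # Kernel type
    "AICc")                             # Bandwidth method (tool finds the optimal number of neighbors)

The diagnostic values listed above (Neighbors, AICc, R2Adjusted, and so on) appear in the geoprocessing messages.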
