Running Analysis on Multiple geojson files

I have about 113 geojson files that I've previously mainly dealt with in QGIS. My goal now is to be able to simultaneously import all files into R and conduct analyses on the underlying attribute tables attached to each respective layer. I have already figured out the best way to import one file and conduct any needed analysis after converting into a data frame. The files that I have in the folder all look like this: 0cfb16c1-90c2-412d-bb60-2fec34c75e9a.geojson

The code I used for this step was:

library(rgdal)
map1 <- readOGR(dsn = "/Users/chris/Documents/GeorgetownMPPMSFS/McCourtMPP/BIGWork/BIGDataFiles/maps/sampled_maps/0cfb16c1-90c2-412d-bb60-2fec34c75e9a.geojson",
                layer = "0cfb16c1-90c2-412d-bb60-2fec34c75e9a")
summary(map1)
map1 <- as.data.frame(map1)

I want to run the same analysis I did on that map on all geojson files without having to do so one by one. The analysis I conducted relates to electoral redistricting metrics and is included here:

cfbdata$reptotal <- (cfbdata$surveyed_republican_percentage/100)*cfbdata$surveyed_total
cfbdata$demtotal <- (cfbdata$surveyed_democrat_percentage/100)*cfbdata$surveyed_total
cfbdata$NAME <- NULL
aggdata <- aggregate(cfbdata, by=list(cfbdata$cluster), FUN=sum, na.rm=TRUE)
# Rep district victory is 1 and Dem district victory is 0
aggdata$result <- ifelse(aggdata$reptotal > aggdata$demtotal,1, ifelse(aggdata$demtotal > aggdata$reptotal,0, NA))
EffGapCalc <- subset(aggdata, select=c("cluster","reptotal","demtotal","surveyed_total", "result"))
# Step 1: Calculate Dem Wasted, Rep Wasted, and Net Wasted
EffGapCalc$repwasted <- ifelse(EffGapCalc$result == 1, EffGapCalc$reptotal - (.51*EffGapCalc$surveyed_total), ifelse(EffGapCalc$result == 0, EffGapCalc$reptotal, NA))
EffGapCalc$demwasted <- ifelse(EffGapCalc$result == 0, EffGapCalc$demtotal - (.51 * EffGapCalc$surveyed_total), ifelse(EffGapCalc$result == 1, EffGapCalc$demtotal, NA))
EffGapCalc$netwasted <- abs(EffGapCalc$repwasted - EffGapCalc$demwasted)
# Step 2: Sum Total Wasted Rep and Dem Votes
totrepwasted <- sum(EffGapCalc$repwasted)
totdemwasted <- sum(EffGapCalc$demwasted)
netwaste <- ifelse(totrepwasted>totdemwasted, totrepwasted-totdemwasted, ifelse(totrepwasted<totdemwasted, totdemwasted-totrepwasted))
netwaste
# Democrats had a net waste (more wasted votes) of 74289.6
# Step 3: Divide Net Wasted by Total Number of Votes Case
sum(EffGapCalc$surveyed_total)
totalsurvtot <- sum(EffGapCalc$surveyed_total)
netwaste/totalsurvtot
# Efficiency Gap = .0359 [3.60%]
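For reference, this is the efficiency gap as the code above computes it, written out as a formula. Here p runs over the features (rows) in cluster d, T_p is surveyed_total, and \rho^R_p, \rho^D_p stand for surveyed_republican_percentage and surveyed_democrat_percentage. The 0.51 threshold is the one used in the code; tied districts, which the code marks as NA, are not covered.

```latex
R_d = \sum_{p \in d} \frac{\rho^R_p}{100}\, T_p, \qquad
D_d = \sum_{p \in d} \frac{\rho^D_p}{100}\, T_p, \qquad
T_d = \sum_{p \in d} T_p

W^R_d = \begin{cases} R_d - 0.51\,T_d & \text{if } R_d > D_d \\ R_d & \text{if } D_d > R_d \end{cases}
\qquad
W^D_d = \begin{cases} D_d - 0.51\,T_d & \text{if } D_d > R_d \\ D_d & \text{if } R_d > D_d \end{cases}

\mathrm{EG} = \frac{\left| \sum_d W^R_d - \sum_d W^D_d \right|}{\sum_d T_d}
```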

The goal is to run that same analysis for all 113 GEOJSON files and get a list of 113 "Efficiency Gap" numbers like the .0359 above.
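One way to make the per-file work reusable is to wrap those steps in a function whose last expression is the efficiency gap. The sketch below assumes the column names used in the question (cluster, surveyed_total, surveyed_republican_percentage, surveyed_democrat_percentage); efficiency_gap() and the toy data are hypothetical. It aggregates only the three vote columns, so non-numeric columns such as NAME do not have to be dropped first, and it uses abs() for the net waste, which gives the same value as the nested ifelse() whenever the two totals differ.

```r
# Hypothetical helper (not from the question): the question's efficiency-gap
# steps wrapped into one function that returns a single number.
efficiency_gap <- function(df) {
  # Convert surveyed percentages into estimated vote counts
  df$reptotal <- (df$surveyed_republican_percentage / 100) * df$surveyed_total
  df$demtotal <- (df$surveyed_democrat_percentage / 100) * df$surveyed_total

  # Sum the three vote columns per district (cluster), ignoring NAs
  agg <- aggregate(df[, c("reptotal", "demtotal", "surveyed_total")],
                   by = list(cluster = df$cluster), FUN = sum, na.rm = TRUE)

  # 1 = Republican district win, 0 = Democratic win, NA = tie
  agg$result <- ifelse(agg$reptotal > agg$demtotal, 1,
                       ifelse(agg$demtotal > agg$reptotal, 0, NA))

  # Step 1: wasted votes (winner: votes above the 51% threshold; loser: all votes)
  agg$repwasted <- ifelse(agg$result == 1,
                          agg$reptotal - 0.51 * agg$surveyed_total,
                          agg$reptotal)
  agg$demwasted <- ifelse(agg$result == 0,
                          agg$demtotal - 0.51 * agg$surveyed_total,
                          agg$demtotal)

  # Step 2: net wasted votes across all districts
  netwaste <- abs(sum(agg$repwasted) - sum(agg$demwasted))

  # Step 3: divide by the total surveyed votes; this is the return value
  netwaste / sum(agg$surveyed_total)
}

# Toy attribute table: two districts ("A" and "B"), three features
toy <- data.frame(
  cluster = c("A", "A", "B"),
  surveyed_total = c(100, 100, 200),
  surveyed_republican_percentage = c(60, 50, 30),
  surveyed_democrat_percentage = c(40, 50, 70)
)

# Wasted: Rep 8 + 60 = 68, Dem 90 + 38 = 128, so |68 - 128| / 400 = 0.15
efficiency_gap(toy)
```

Applied to each imported file's data frame, this yields one number per file.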

I've searched through a number of questions on Stack Overflow and elsewhere but have not found a suitable solution. While I initially thought a for loop would be best for this, based on what I've read elsewhere, it appears that lapply() might actually be the better route. The challenge I'm having is getting the import right inside lapply().
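Both a for loop and lapply() will work here; the practical difference is what comes back. lapply() returns a list, while vapply() returns a plain vector and checks that each call yields exactly one number, which matches the goal of one efficiency gap per file. In every case the function's last expression is what gets collected, so in the attempt below the collected value would be the result of write.table() (an invisible NULL), not the gap. A sketch with a hypothetical gap_for_file() standing in for the real per-file work:

```r
# Hypothetical stand-in for "read one file and return its efficiency gap";
# it only uses the file name so the example runs without any GeoJSON files.
gap_for_file <- function(path) {
  nchar(basename(path)) / 100
}

files <- c("a.geojson", "bb.geojson", "ccc.geojson")

# for loop: preallocate a numeric vector, then fill one slot per file
gaps_loop <- numeric(length(files))
for (i in seq_along(files)) {
  gaps_loop[i] <- gap_for_file(files[i])
}

# lapply(): a list with one element per file
gaps_list <- lapply(files, gap_for_file)

# vapply(): the same values as a numeric vector named by file; it errors
# if any call does not return exactly one number
gaps_vec <- vapply(files, gap_for_file, numeric(1))
```

Because vapply() names the result by file, each efficiency gap stays matched to the GeoJSON it came from.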

The code I tried using that failed was:

library(rgdal)
fileNames <- list.files(path = "/Users/chris/Documents/GeorgetownMPPMSFS/McCourtMPP/BIGWork/BIGDataFiles/maps/sampled_maps", pattern="*.geojson", full.names = TRUE)
lapply(fileNames, function(x) {
  map1 <- readOGR(dsn = x, layer = x)
  map1 <- as.data.frame(map1)
  out <- map1$reptotal <- (map1$surveyed_republican_percentage/100)*map1$surveyed_total;
  map1$demtotal <- (map1$surveyed_democrat_percentage/100)*map1$surveyed_total;
  map1$NAME <- NULL;
  aggdata <- aggregate(map1, by=list(map1$cluster), FUN=sum, na.rm=TRUE);
  aggdata$result <- ifelse(aggdata$reptotal > aggdata$demtotal,1, ifelse(aggdata$demtotal > aggdata$reptotal,0, NA));
  EffGapCalc <- subset(aggdata, select=c("cluster","reptotal","demtotal","surveyed_total", "result"));
  # Step 1: Calculate Dem Wasted, Rep Wasted, and Net Wasted
  EffGapCalc$repwasted <- ifelse(EffGapCalc$result == 1, EffGapCalc$reptotal - (.51*EffGapCalc$surveyed_total), ifelse(EffGapCalc$result == 0, EffGapCalc$reptotal, NA));
  EffGapCalc$demwasted <- ifelse(EffGapCalc$result == 0, EffGapCalc$demtotal - (.51 * EffGapCalc$surveyed_total), ifelse(EffGapCalc$result == 1, EffGapCalc$demtotal, NA));
  EffGapCalc$netwasted <- abs(EffGapCalc$repwasted - EffGapCalc$demwasted);
  # Step 2: Sum Total Wasted Rep and Dem Votes
  totrepwasted <- sum(EffGapCalc$repwasted);
  totdemwasted <- sum(EffGapCalc$demwasted);
  netwaste <- ifelse(totrepwasted>totdemwasted, totrepwasted-totdemwasted, ifelse(totrepwasted<totdemwasted, totdemwasted-totrepwasted));
  netwaste
  # Step 3: Divide Net Wasted by Total Number of Votes Case
  totalsurvtot <- sum(EffGapCalc$surveyed_total);
  netwaste/totalsurvtot;
  write.table(out, "/Users/chris/Documents/GeorgetownMPPMSFS/McCourtMPP/BIGWork/BIGDataFiles", sep="\t", quote=F, row.names=F, col.names=T)
})

I've been trying to figure this out for two days at this point and am only getting more confused. Any help will be much appreciated!

Best answer

Simple test code:

lapply(fileNames, function(x) {
  map1 <- readOGR(dsn = x, layer = x)
})

Assuming that fails for your case, we know the problem is in that one line. That makes it easier for someone here to see it's a simpler problem. Please always try and minimise your problems; it will help us help you, and in many cases it lets you solve it yourself. Proceeding...
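Applied to all the files at once, the same idea means running only the import step for each file and recording which ones fail and why, rather than letting the first error stop everything. A sketch using base R's tryCatch(); read_one() and the file names are hypothetical stand-ins for readOGR() and the real paths:

```r
# Hypothetical reader for illustration only: it fails on one file the way
# readOGR(dsn = x, layer = x) fails with "Cannot open layer".
read_one <- function(path) {
  if (grepl("bad", path)) stop("Cannot open layer")
  data.frame(file = path)
}

files <- c("good1.geojson", "bad.geojson", "good2.geojson")

# Run only the import step for every file and record "ok" or the error
# message, instead of letting the first failure abort the whole loop
status <- vapply(files, function(f) {
  tryCatch({
    read_one(f)
    "ok"
  }, error = function(e) conditionMessage(e))
}, character(1))

status
```

With the real readOGR(dsn = f, layer = f) in place of read_one(f), one would expect every file to report the same layer error, which points straight at that one line.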

readOGR for a geoJSON needs a file path and a layer name, and that code is going to feed the file path in as the layer name. Here is what happens, using a test file from the geojson package:

> path <- system.file("examples", package = "geojson")
> testfile <- list.files(path = path, pattern="*.geojson", full.names = TRUE)[5]

Quick check that we've got it:

> file.exists(testfile)
[1] TRUE

Then try and read:

> d = readOGR(dsn=testfile, layer=testfile)
Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv,  :
  Cannot open layer

So how do we get the layer name from the file path? We have ogrListLayers for that:

> ogrListLayers(testfile)
[1] "OGRGeoJSON"
attr(,"driver")
[1] "GeoJSON"
attr(,"nlayers")
[1] 1

Now that looks pretty weird, but it's a vector of layer names plus some extra attributes that you can ignore for this purpose. The layer name of this test layer is "OGRGeoJSON". Assuming your geoJSONs are known to contain only one layer, you can do:

> d = readOGR(dsn=testfile, layer=ogrListLayers(testfile))
OGR data source with driver: GeoJSON
Source: "/home/rowlings/R/x86_64-pc-linux-gnu-library/3.4/geojson/examples/linestring_one.geojson", layer: "OGRGeoJSON"
with 1 features
It has 2 fields
Warning message:
In readOGR(dsn = testfile, layer = ogrListLayers(testfile)) :
  Z-dimension discarded

Now I think that either geoJSONs can only have one layer, or that readOGR defaults to the first layer, so if you know there's only one layer in your geoJSONs you can leave out the layer= argument and get an identical object returned:

> d2 = readOGR(dsn=testfile)
OGR data source with driver: GeoJSON
Source: "/home/rowlings/R/x86_64-pc-linux-gnu-library/3.4/geojson/examples/linestring_one.geojson", layer: "OGRGeoJSON"
with 1 features
It has 2 fields
Warning message:
In readOGR(dsn = testfile) :
  Z-dimension discarded
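The test file here has a layer called "OGRGeoJSON", but the question's working single-file call passed the file name without its directory or .geojson extension as the layer name, so on the asker's setup the layer name can apparently also be derived from the path. A small sketch of that, using base R and the tools package; layer_from_path() is a hypothetical helper name. It also shows that the pattern argument of list.files() is a regular expression, which glob2rx() can build from a shell-style glob.

```r
# Hypothetical helper: the layer name as the question's single-file call used
# it, i.e. the file name without directory and without ".geojson".
layer_from_path <- function(path) {
  tools::file_path_sans_ext(basename(path))
}

x <- "/Users/chris/Documents/GeorgetownMPPMSFS/McCourtMPP/BIGWork/BIGDataFiles/maps/sampled_maps/0cfb16c1-90c2-412d-bb60-2fec34c75e9a.geojson"
layer_from_path(x)

# list.files() matches file names against a regular expression, not a glob;
# glob2rx() converts the glob "*.geojson" into the regex "^.*\\.geojson$"
glob2rx("*.geojson")
```

Inside the question's lapply(), the read line could then become `readOGR(dsn = x, layer = layer_from_path(x))`, or, without that assumption, `readOGR(dsn = x, layer = ogrListLayers(x)[1])`. Note that rgdal has since been retired and archived on CRAN (2023); in current R, `sf::st_read(x)` reads a single-layer GeoJSON without a layer argument.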
