这是我今天第一次打开sas,而且我正在查看同事写的一些代码。
所以假设我有一些重复发生的数据( import ),但我只想要那些名为VTNR的唯一编号的数据。
首先,她寻找独特的数字:
data M.import; set M.import; by VTNR; if first.VTNR=1 then unique=1; run;然后她创建了一个包含重复数字的表:
data M.import_dup1; set M.import; where unique^=1; run;最后是一张包含所有副本的表格。 但在这里,她真的很难对这些数字进行编码,例如:
data M.import_dup2; set M.import; where VTNR in (130001292951,130100975613,130107546425,130108026864,130131307133,130134696722,130136267001,130137413257,130137839451,130138291041); run;我相信一定有更好的方法。
因为我只熟悉RI会写出如下内容:
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))我想有一些像sas一样的东西吗?
It’s the first time that I’ve opened sas today and I’m looking at some code a colleague wrote.
So let’s say I have some data (import) where duplicates occur but I want only those which have a unique number named VTNR.
First she looks for unique numbers:
data M.import; set M.import; by VTNR; if first.VTNR=1 then unique=1; run;Then she creates a table with the duplicated numbers:
data M.import_dup1; set M.import; where unique^=1; run;And finally a table with all duplicates. But here she is really hardcoding the numbers, so for example:
data M.import_dup2; set M.import; where VTNR in (130001292951,130100975613,130107546425,130108026864,130131307133,130134696722,130136267001,130137413257,130137839451,130138291041); run;I’m sure there must be a better way.
Since I’m only familiar with R I would write something like:
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))I guess there must be something like the $ also for sas?
最满意答案
对我来说,它看起来像R代码的最直接的翻译
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))将使用SQL代码
proc sql; create table import_dup2 as select * from import where VTNR in (select VTNR from import_dup1) ; quit;但是,如果您的目的是在IMPORT中找到每个VTNR值具有多个观察值的观察结果,则无需首先创建其他表。
data import_dup2 ; set import; by VTNR ; if not (first.VTNR and last.VTNR); run;To me it looks like the most direct translation of the R code
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))Would be to use SQL code
proc sql; create table import_dup2 as select * from import where VTNR in (select VTNR from import_dup1) ; quit;But if your intent is to find the observations in IMPORT that have more than one observation per VTNR value there is no need to first create some other table.
data import_dup2 ; set import; by VTNR ; if not (first.VTNR and last.VTNR); run;更多推荐
发布评论