Reading data into SAS, columns not aligned

Updated: 2024-10-23 04:45:52

I have a data file that looks like this:

001 Mayo Clinic 120 78 7 15
Patient has had a persistent cough for 3 weeks
023 Mayo Clinic 157 72 10 2
Patient complained of ear ache
064 HMC 201 59 . .
Patient left against medical advice
003 HMC 166 58 8 15
Patient placed on beta-blockers on 7/1/2006

I am finding the task of reading this into SAS to be basically impossible. And no, in this case, reformatting the data file is out of the question. So let me explain what you are looking at here:

Each subject has two lines of data. The first line is:

subject number / clinic / wt / hr / dx / sx (don't worry about what the numbers mean, that's irrelevant).

The second line is text, which is basically a note containing extra information referring to the subject whose data is laid out in the previous line. So, the lines:

001 Mayo Clinic 120 78 7 15
Patient has had a persistent cough for 3 weeks

are for a SINGLE subject: subject 001. These need to become a single row in a SAS data set. I am completely at a loss: because of the different lengths of the clinic names, and the number columns not being aligned, I can't figure out how to get SAS to read this. This is the closest I have been able to get:

data ClinData;
  infile "&wdir.clinic_data.txt";
  retain patno clinic weight hr dx sx exinfo;
  input patno clinic $1. @;
  if clinic='M' then
    input patno @5 clinic $11. weight hr dx sx
        / @1 exinfo $30.;
  else if clinic='H' then
    input patno @5 clinic $3. weight hr dx sx
        / @1 exinfo $30.;
run;

This prints as:

http://i61.tinypic.com/2uswl90.png

All of the numerical values are in the right place.

However, this has several problems.

First, the subject number ('patno') always shows up as a missing value. Why?

Second, the clinic is only represented by its first letter 'M' or 'H'. I can't get SAS to change the length of the clinic variable based on which clinic it is.

Third, the variable "exinfo" contains the notes about the patient. However, I can't get SAS to include the entire line. The most I can get is around 30 characters before the formatting goes haywire.

Any help? The SAS documentation is frustratingly poor for this type of input. None of the examples really match up with what I need, and it doesn't adequately explain how to use some of the options. I know I need to use column/line pointers; but the problem is that the columns aren't consistent from line to line. So no matter which pointer format I use there will still be lines that don't come out right.

Best answer

Most of the issues you were running into came from lengths that you had explicitly declared. For example, clinic was defined in the initial INPUT statement as $1., and you can't change a variable's length after the fact, as you attempted in the second INPUT statement.
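To make the length point concrete, here is a minimal sketch, assuming a LENGTH statement is acceptable in your step (the data set name LengthDemo is made up for illustration). It also sidesteps the missing patno values: in the question's code the second INPUT reads patno again while the pointer is still held at column 6, just past the $1. read, so it picks up "ayo" or "MC", which are not numbers.

```sas
data LengthDemo;
  /* Assumed fix: declare the length before any INPUT mentions clinic. */
  length clinic $ 30;
  /* Peek at the first letter only and hold the line. */
  input patno clinic $1. @;
  /* Re-read the full name from column 5; patno is not read a second time. */
  if clinic = 'M' then input @5 clinic $11.;
  else if clinic = 'H' then input @5 clinic $3.;
datalines;
001 Mayo Clinic 120 78 7 15
064 HMC 201 59 . .
;
run;
```

Because the LENGTH statement comes first, the later $1. peek no longer fixes clinic at one character.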

This should get you closer to what you were looking for:

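data ClinData(drop=s varlen);
  retain patno clinic weight hr dx sx;
  /* Read the subject number, then 30 columns starting at the clinic name; hold the line. */
  input patno clinic $30. @;
  /* 'ka' keeps alphabetic characters only, so digits, periods and blanks are dropped
     ("Mayo Clinic" is stored as "MayoClinic"). */
  clinic=compress(clinic,,'ka');
  /* Column at which to resume reading; for both clinic names in this sample it falls
     on or just before the weight field. */
  s=length(clinic)+4+2;
  /* Read the numeric fields, then move to the note line and hold it. */
  input @s weight hr dx sx /@;
  /* Length of the note, ignoring trailing blanks. */
  varlen=length(_infile_);
  input @1 exinfo $varying256. varlen;
datalines4;
001 Mayo Clinic 120 78 7 15
Patient has had a persistent cough for 3 weeks
023 Mayo Clinic 157 72 10 2
Patient complained of ear ache
064 HMC 201 59 . .
Patient left against medical advice
003 HMC 166 58 8 15
Patient placed on beta-blockers on 7/1/2006
;;;;
run;

proc print data=ClinData;
run;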
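One caveat: the step above reads in-stream data, where records are padded, so the 30-column read never runs off the end of a line. With the external file from the question, the short HMC line would, and by default SAS would then move on to the next line. Below is a sketch of one way around that, not a tested solution. It assumes the TRUNCOVER option on INFILE, clinic names that contain no digits, and a weight that is never missing; pos is a helper variable introduced here.

```sas
data ClinData(drop=pos);
  /* TRUNCOVER: a read past the end of a short line stops there
     instead of flowing over to the next line. */
  infile "&wdir.clinic_data.txt" truncover;
  length clinic $ 30 exinfo $ 256;
  /* Line 1: read the subject number and hold the line. */
  input patno @;
  /* First digit at or after column 5 marks the start of the weight field
     (assumes no digits in the clinic name and a non-missing weight). */
  pos = anydigit(_infile_, 5);
  /* Everything between column 5 and that digit is the clinic name, blanks included. */
  clinic = substr(_infile_, 5, pos - 5);
  /* Read the numeric fields and release the line. */
  input @pos weight hr dx sx;
  /* Line 2: the whole note, up to 256 characters. */
  input exinfo $256.;
run;
```

This variant keeps the blank in "Mayo Clinic", since the name is cut out of the raw line rather than compressed.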

Published: 2023-04-27 16:34:00
Article link: https://www.elefans.com/category/jswz/34/1327175.html