I have a full dataset of, say, 50000 observations assigned to 16 classes. I want to draw a sample of about 70% of the full data, but I want MATLAB to take the same number of samples from each class (where possible, since some classes have fewer observations than needed).
Is there a MATLAB function that can do this, or do I have to program a new one for myself? I'm just trying to save time here.
I found cvpartition, but as far as I know it can only draw a sample with the same class distribution as the original dataset, not one that is uniform across classes.
Thank you for your help!
Best answer
It shouldn't be too hard. Let's say that the observations are in a vector observations. Then you can do
    fraction = 0.7;
    classes = unique(observations);      % distinct class labels
    nObs = length(observations);
    nClasses = length(classes);
    nSamples = round(nObs * fraction / nClasses);   % samples per class
    for ii = 1:nClasses
        idx = observations == classes(ii);          % logical mask for this class
        % fill this class's block of nSamples slots
        samples((ii-1)*nSamples+1:ii*nSamples) = randsample(observations(idx), nSamples);
    end

Now samples is a vector of length nClasses * nSamples that contains your sampled observations, with an equal number from each class.
At the moment it will fail if one of the classes doesn't contain at least nSamples observations. The simplest fix is to pass true as the third argument to randsample, i.e. randsample(observations(idx), nSamples, true), which makes it sample with replacement, so the same observation can be picked more than once.
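If duplicates are not acceptable, another option is to cap each class at its own size and keep row indices rather than label values, so the indices can be used to pull the matching rows out of the data. The following is only a sketch under assumptions: the variable names labels, sampleIdx and sampledLabels are made up for illustration, the toy data is invented, and it swaps randsample for the base-MATLAB function randperm, which needs no toolbox.

```matlab
% Hypothetical toy data: one numeric class label per observation.
% Class 3 is deliberately too small to supply the full per-class quota.
labels = [ones(50,1); 2*ones(30,1); 3*ones(4,1)];

fraction = 0.7;
classes  = unique(labels);
nClasses = numel(classes);
nSamples = round(numel(labels) * fraction / nClasses);   % target per class

sampleIdx = [];                                % row indices of the sampled observations
for ii = 1:nClasses
    members = find(labels == classes(ii));     % indices belonging to this class
    k = min(nSamples, numel(members));         % cap the quota at the class size
    pick = randperm(numel(members), k);        % k distinct positions, no replacement
    % For sampling with replacement instead, one could use:
    %   pick = randi(numel(members), nSamples, 1);
    sampleIdx = [sampleIdx; members(pick)];    %#ok<AGROW>
end

sampledLabels = labels(sampleIdx);             % labels of the chosen rows
```

With the cap, small classes contribute all of their observations and the total sample ends up smaller than the requested 70%; whether that or sampling with replacement is preferable depends on what the sample is used for.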