MATLAB: Taking a sample with the same number of values from each class

Updated: 2024-10-26 05:30:06

I have a full dataset of, let's say, 50000 observations which are assigned to 16 classes. I now want to draw a sample of, let's say, 70% of the full data, but I want MATLAB to take the same number of samples from each class (if possible, of course, because some classes have fewer observations than needed).

Is there a MATLAB function that can do this, or do I have to program a new one for myself? I'm just trying to save time here.

I found cvpartition, but as far as I know this can be used only to take a sample with the same distribution over the classes as the original dataset and not a uniformly distributed sample.

Thank you for your help!

Best answer

It shouldn't be too hard. Let's say that the observations are in a vector called observations. Then you can do:

    fraction = 0.7;
    classes = unique(observations);
    nObs = length(observations);
    nClasses = length(classes);
    % number of samples to draw from each class
    nSamples = round(nObs * fraction / nClasses);
    for ii = 1:nClasses
        % logical mask selecting the observations of class ii
        idx = observations == classes(ii);
        % fill the ii-th block of nSamples entries
        samples((ii-1)*nSamples+1:ii*nSamples) = randsample(observations(idx), nSamples);
    end

Now samples is a vector of length nClasses * nSamples that contains your sampled observations, with an equal number from each class.

At the moment it will fail if one of the classes doesn't contain at least nSamples observations. The simplest fix is to pass true as the third argument to randsample, i.e. randsample(observations(idx), nSamples, true), which tells it to sample with replacement, so the same observation can be picked more than once.
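
If duplicated observations are not wanted, another option is to cap the draw at the size of each class and to collect row indices rather than values. The following is only a minimal sketch of that idea, not part of the original answer: the variable names labels and sampleIdx and the randomly generated example data are assumptions made for illustration, and randperm is used so that no toolbox function is needed.

```matlab
% Hypothetical example data: 'labels' holds the class label of each observation.
rng(0);                                   % fixed seed so the sketch is repeatable
labels = randi(16, 50000, 1);             % 50000 observations in 16 classes

fraction = 0.7;
classes  = unique(labels);
nClasses = numel(classes);
nSamples = round(numel(labels) * fraction / nClasses);   % target count per class

sampleIdx = cell(nClasses, 1);
for ii = 1:nClasses
    classIdx = find(labels == classes(ii));       % rows belonging to class ii
    nTake    = min(nSamples, numel(classIdx));    % cap at the class size
    pick     = randperm(numel(classIdx), nTake);  % draw without replacement
    sampleIdx{ii} = classIdx(pick);
end
sampleIdx = vertcat(sampleIdx{:});        % row indices into the full dataset
```

With indices in hand, the sampled rows of a data matrix (say, a hypothetical data with one row per observation) would be data(sampleIdx, :). Small classes contribute all of their observations, so the total can fall short of 70% of the data when some classes are smaller than nSamples.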

Published: 2023-08-03 17:05:00
Link: https://www.elefans.com/category/jswz/34/1395151.html