I have a large dataset that I am trying to cluster with a SOM. The dataset is huge (billions of records), and I am not sure what number of neurons and what SOM grid size to start with. Any pointers to material on estimating the number of neurons and the grid size would be greatly appreciated.
Thanks!
Recommended answer
I don't have a reference for it, but I would suggest starting with approximately 10 SOM neurons per expected class in your dataset. For example, if you think your dataset consists of 8 separate components, go for a map with 9x9 neurons (81 neurons, so roughly 10 per component). This is purely a ballpark heuristic, though.
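To make the arithmetic behind that rule of thumb concrete, here is a minimal sketch that turns an expected class count into a square grid size. The function name `suggest_som_grid` and the `neurons_per_class` parameter are made up for illustration, and the choice of a square grid is an assumption; the heuristic itself does not prescribe a shape.

```python
import math


def suggest_som_grid(expected_classes, neurons_per_class=10):
    """Return the smallest square grid (rows, cols) holding at least
    expected_classes * neurons_per_class neurons.

    Hypothetical helper: it only encodes the ballpark heuristic above.
    """
    target = expected_classes * neurons_per_class
    side = math.isqrt(target)      # integer floor of the square root
    if side * side < target:       # round up when target is not a perfect square
        side += 1
    return side, side


print(suggest_som_grid(8))  # (9, 9): 8 classes * 10 neurons = 80, next square is 81
```

Treat the result as a starting point only; with billions of records it is probably cheaper to try a few grid sizes on a random subsample before committing to a full training run.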
If you'd like the data to drive the topology of your SOM a bit more directly, try one of the SOM variants that change topology during training:
- Growing SOM
- Growing Neural Gas
Unfortunately, these algorithms involve even more parameter tuning than a plain SOM, but they might work for your application.
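To give a feel for what "changing topology during training" means, and for how many knobs are involved, below is a rough NumPy sketch of Growing Neural Gas as I understand the usual formulation. Everything in it is an assumption for illustration: the function name `growing_neural_gas`, all parameter names, and especially the default values, which are not tuned for any real dataset. It is meant to be run on a small random subsample, not on billions of records.

```python
import numpy as np


def growing_neural_gas(data, max_nodes=20, n_steps=2000, insert_every=100,
                       eps_winner=0.05, eps_neighbor=0.006, max_age=50,
                       alpha=0.5, decay=0.995, seed=0):
    """Illustrative GNG sketch; returns ({node_id: weight}, [(id_a, id_b), ...])."""
    rng = np.random.default_rng(seed)
    # Start with two nodes placed on randomly chosen samples.
    start = rng.choice(len(data), size=2, replace=False)
    weights = {0: data[start[0]].astype(float), 1: data[start[1]].astype(float)}
    error = {0: 0.0, 1: 0.0}
    edges = {}      # frozenset({a, b}) -> age
    next_id = 2

    for step in range(1, n_steps + 1):
        x = data[rng.integers(len(data))]
        ids = list(weights)
        dists = np.array([np.sum((weights[i] - x) ** 2) for i in ids])
        order = np.argsort(dists)
        s1, s2 = ids[order[0]], ids[order[1]]   # winner and runner-up

        # Accumulate error at the winner and pull it towards the sample.
        error[s1] += float(dists[order[0]])
        weights[s1] += eps_winner * (x - weights[s1])
        # Age the winner's edges and pull its graph neighbours slightly.
        for edge in list(edges):
            if s1 in edge:
                edges[edge] += 1
                (other,) = edge - {s1}
                weights[other] += eps_neighbor * (x - weights[other])
        edges[frozenset((s1, s2))] = 0          # create or refresh this edge

        # Drop stale edges, then any node left without an edge.
        edges = {e: age for e, age in edges.items() if age <= max_age}
        connected = set().union(*edges)
        for i in ids:
            if i not in connected:
                del weights[i], error[i]

        # Periodically insert a node between the worst node and its worst neighbour.
        if step % insert_every == 0 and len(weights) < max_nodes:
            q = max(error, key=error.get)
            neighbours = [next(iter(e - {q})) for e in edges if q in e]
            f = max(neighbours, key=error.get)
            r, next_id = next_id, next_id + 1
            weights[r] = 0.5 * (weights[q] + weights[f])
            del edges[frozenset((q, f))]
            edges[frozenset((q, r))] = 0
            edges[frozenset((f, r))] = 0
            error[q] *= alpha
            error[f] *= alpha
            error[r] = error[q]

        for i in error:                          # global error decay
            error[i] *= decay

    return weights, [tuple(sorted(e)) for e in edges]


# Usage on a toy subsample: two well-separated 2-D blobs.
demo_rng = np.random.default_rng(1)
demo = np.vstack([demo_rng.normal(0.0, 0.5, size=(300, 2)),
                  demo_rng.normal(5.0, 0.5, size=(300, 2))])
nodes, links = growing_neural_gas(demo)
print(len(nodes), "nodes,", len(links), "edges")
```

Even this stripped-down version has eight tunable parameters, which is the tuning cost mentioned above. In exchange, the number of units and their connectivity come out of the data rather than being fixed up front.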