I want to remove some entries from a numpy array that is about a million entries long.
This code would do it, but it would take a long time:

    import numpy as np

    a = np.array([1, 45, 23, 23, 1234, 3432, -1232, -34, 233])
    for element in a:
        if element < -100 or element > 100:
            ...  # some delete command

Can I do this any other way?
Recommended answer

I'm assuming you mean a < -100 or a > 100. The most concise way is to use logical (boolean) indexing:

    a = a[(a >= -100) & (a <= 100)]

This does not exactly "delete" the entries. Instead, it makes a copy of the array without the unwanted values and assigns it to the variable that previously referred to the old array. After this, provided nothing else references the old array, it has no remaining references and is garbage collected, meaning its memory is freed.
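To make the copy behaviour concrete, here is a minimal sketch using the sample array from the question. The names `keep` and `filtered` are only illustrative; they are not part of the original answer.

```python
import numpy as np

a = np.array([1, 45, 23, 23, 1234, 3432, -1232, -34, 233])

# Boolean mask: True where the value lies within [-100, 100].
keep = (a >= -100) & (a <= 100)

# Boolean indexing builds a new array; `a` itself is left untouched.
filtered = a[keep]

print(filtered.tolist())  # [1, 45, 23, 23, -34]
print(a.size)             # 9, the original still holds every element
```

Assigning the result back to `a`, as in the one-liner above, simply rebinds the name to the filtered copy.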
It's worth noting that this method does not use constant memory: because we make a copy of the array, it uses memory linear in the size of the array. This could be a problem if your array is so large that it approaches the limits of the memory on your machine. Actually going through and removing each element "in place" (that is, using constant extra memory) would be a very different operation, since elements in the array would need to be moved around and the block of memory resized. I'm not sure you can do this with a numpy array; however, one thing you can do to avoid copying the data is to use a numpy masked array:

    import numpy.ma as ma

    mx = ma.masked_array(a, mask=((a < -100) | (a > 100)))

All operations on the masked array act as if the elements we "deleted" don't exist, but we didn't really delete them. They are still there in memory; the array just carries a record of which elements to skip. That record is a boolean mask with one entry per element, so the data itself never needs to be copied. Also, if we ever want our deleted values back, we can just remove the mask like so:

    mx.mask = ma.nomask
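
As a rough end-to-end check of the masked-array approach, the sketch below reuses the sample array from the question. The variable names (`visible_count`, `visible_sum`, and so on) are made up for illustration, and `np.shares_memory` is used only to show that the masked array wraps the original data rather than copying it.

```python
import numpy as np
import numpy.ma as ma

a = np.array([1, 45, 23, 23, 1234, 3432, -1232, -34, 233])

# Mask (hide) every value outside [-100, 100]; the data is not copied.
mx = ma.masked_array(a, mask=(a < -100) | (a > 100))

visible_count = int(mx.count())         # number of unmasked elements
visible_sum = int(mx.sum())             # reductions skip masked elements
visible = mx.compressed().tolist()      # unmasked values as a plain list
shares = np.shares_memory(mx.data, a)   # True if no data copy was made

print(visible_count, visible_sum, visible, shares)
# 5 58 [1, 45, 23, 23, -34] True

# Removing the mask makes all nine values visible again.
mx.mask = ma.nomask
restored_sum = int(mx.sum())
print(restored_sum)  # 3725
```

Note that `compressed()` does allocate a new array holding the unmasked values, so it is best kept for cases where a plain ndarray is really needed.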