Counting inversions for a large text file


The objective of the code is to find the total number of inversions in an array. My code works correctly: tested with 6 elements (all in reverse order, starting with the highest) it gives an inversion count of 15, and with 10 elements in reverse order it gives 45. However, for a large file containing 100k integers it takes almost 25 seconds. Is this expected? Can I bring the execution time down any further? I have only made a minor tweak to the conventional merge sort algorithm (the line that counts the total number of inversions). How can I reduce the overall running time?

def mergeSort(final_list):
    global total_count
    if len(final_list) > 1:
        mid_no = len(final_list)//2
        left_half = final_list[:mid_no]
        right_half = final_list[mid_no:]
        mergeSort(left_half)
        mergeSort(right_half)
        '''Below code is for merging the lists'''
        i = j = k = 0  # i is index for left half, j for the right half and k for the resultant list
        while i < len(left_half) and j < len(right_half):
            if left_half[i] < right_half[j]:
                final_list[k] = left_half[i]
                i += 1
                k += 1
            else:
                final_list[k] = right_half[j]
                print 'total count is'
                print total_count
                #total_count+=len(left_half)-i
                total_count += len(left_half[i:])
                print 'total_count is '
                print total_count
                print 'pairs are '
                print str(left_half[i:])+' with '+str(right_half[j])
                j += 1
                k += 1
        while i < len(left_half):
            final_list[k] = left_half[i]
            k += 1
            i += 1
        while j < len(right_half):
            final_list[k] = right_half[j]
            j += 1
            k += 1
        '''Code for list merge ends'''

#temp_list=[45,21,23,4,65]
#temp_list=[1,5,2,3,4,6]
#temp_list=[6,5,4,3,2,1]
#temp_list=[1,2,3,4,5,6]
#temp_list=[10,9,8,7,6,5,4,3,2,1]
#temp_list=[1,22,3,4,66,7]
temp_list = []
f = open('temp_list.txt', 'r')
for line in f:
    temp_list.append(int(line.strip()))
print 'list is '
print temp_list
print 'list ends'
print temp_list[0]
print temp_list[-1]
'''import time
time.sleep(1000)
print 'hhhhhhhhhh'
'''
total_count = 0
mergeSort(temp_list)
print temp_list
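
A quick way to see whether those 25 seconds go into the sort itself or into reading and printing the 100k lines is to time the mergeSort call on its own. Below is a minimal sketch meant to replace the last three lines of the script above; the time module is the only addition, and the names simply mirror the script:

import time

total_count = 0
start = time.time()
mergeSort(temp_list)  # the function defined above
elapsed = time.time() - start
print('total inversions: %d' % total_count)
print('mergeSort alone took %.2f seconds' % elapsed)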

Accepted answer

I found it (and verified it with the profiler):

#total_count+=len(left_half[i:])
total_count += len(left_half) - i

left_half[i:] creates a new list, copying the remaining elements, and it does so on every iteration of the main loop of your recursive function. It's a clever use of slicing, but all that copying is killing your performance.
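
For reference, here is a minimal sketch of the same idea with the O(1) counting line and without the global counter. The function name count_inversions and the (sorted_list, count) return value are my own choices, not part of the original post:

def count_inversions(seq):
    # Return (sorted_copy, inversion_count) for seq, via merge sort.
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # ties are not counted as inversions
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            inversions += len(left) - i  # O(1): no slice, no copy
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions

# count_inversions([6, 5, 4, 3, 2, 1])[1]       -> 15, matching the 6-element test
# count_inversions(list(range(10, 0, -1)))[1]   -> 45, matching the 10-element test

The slices that remain in this sketch run once per merge rather than once per comparison, so they do not dominate the running time the way the counting-line slice did.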

Here's how I broke down the function to profile it:

def so_merge (final_list, left_half, right_half):
    global total_count
    i = j = k = 0  # i is index for left half, j for the right half and k for the resultant list
    while i < len(left_half) and j < len(right_half):
        if left_half[i] < right_half[j]:
            final_list[k] = left_half[i]
            i += 1
            k += 1
        else:
            final_list[k] = right_half[j]
            count1 = get_incriment_bad(left_half, i)
            count2 = get_incriment_good(left_half, i)
            if count1 != count2:
                raise ValueError
            total_count += count1
            j += 1
            k += 1
    finish_left(final_list, left_half, i, k)
    finish_right(final_list, right_half, j, k)
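
The helpers called above are not shown in the answer. Judging from their names, the paths in the profile below, and the two counting expressions being compared, they presumably look roughly like the following reconstruction (my sketch, not the answer's actual code):

def get_incriment_bad(left_half, i):
    # the original counting expression: the slice builds a throwaway copy
    return len(left_half[i:])

def get_incriment_good(left_half, i):
    # the replacement: the same value in O(1), with no copy
    return len(left_half) - i

def finish_left(final_list, left_half, i, k):
    # copy whatever remains of the left half into the result
    while i < len(left_half):
        final_list[k] = left_half[i]
        i += 1
        k += 1

def finish_right(final_list, right_half, j, k):
    # copy whatever remains of the right half into the result
    while j < len(right_half):
        final_list[k] = right_half[j]
        j += 1
        k += 1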

and the results show that it spent 19.574 seconds getting len(left_half[i:])

  ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
199999/1    0.805    0.000   29.562   29.562  week1.py:124(so_mergesort)
   99999    7.496    0.000   28.735    0.000  week1.py:104(so_merge)
  776644   19.512    0.000   19.574    0.000  week1.py:101(get_incriment_bad)
  776644    0.839    0.000    0.895    0.000  week1.py:98(get_incriment_good)
 5403164    0.382    0.000    0.382    0.000  {len}
   99999    0.273    0.000    0.286    0.000  week1.py:92(finish_right)
   99999    0.255    0.000    0.266    0.000  week1.py:86(finish_left)
       1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
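
Output in this format comes from the standard-library cProfile module. One way to produce a comparable table yourself, sorted by the time spent inside each function (the script name week1.py is taken from the paths above; everything else is plain cProfile usage):

# from the command line:
#   python -m cProfile -s time week1.py

# or from inside the script:
import cProfile
cProfile.run('mergeSort(temp_list)', sort='time')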
