c++ - FAST unique combinations (from list with duplicates) WITHOUT LOOKUPS -


i seems in spite of fact there online plenty of algorithms , functions generating unique combinations of size list of unique items, there none available in case of list of non-unique items (i.e. list containing repetitions of same value.)

the question how generate on-the-fly in generator function unique combinations non-unique list without computational expensive need of filtering out duplicates?

now there bounty motivated answer question easier provide more clarity expect achieve:

first let's provide code illustrating how check if combination combob considered duplicate of 1 (comboa):

comboa = [1,2,2] combob = [2,1,2] print("b duplicate of a:", comboa.sort()==combob.sort()) 

in given example of b duplicate of , print() prints true.

the problem of getting generator function capable of providing unique combinations on-the-fly in case of non-unique list solved here: getting unique combinations non-unique list of items, faster?, provided generator function needs lookups , requires memory causes problems in case of huge amount of combinations.

the in current version of answer provided function job without lookups , appears right answer here, ...

the goal behind getting rid of lookups speed generation of unique combinations in case of list duplicates.

i have (writing first version of question) wrongly assumed code doesn't require creation of set used lookups needed assure uniqueness expected give advantage on code needing lookups. it not case. @ least not always. code in provided answer not using lookups, taking more time generate combinations in case of no redundant list or if few redundant items in list.

here timings illustrate current situation:

-----------------  k: 6 len(ls): 48 combos   used code                               time --------------------------------------------------------- 12271512 len(list(combinations(ls,k)))       :  2.036 seconds 12271512 len(list(subbags(ls,k)))            : 50.540 seconds 12271512 len(list(uniquecombinations(ls,k))) :  8.174 seconds 12271512 len(set(combinations(sorted(ls),k))):  7.233 seconds --------------------------------------------------------- 12271512 len(list(combinations(ls,k)))       :  2.030 seconds        1 len(list(subbags(ls,k)))            :  0.001 seconds        1 len(list(uniquecombinations(ls,k))) :  3.619 seconds        1 len(set(combinations(sorted(ls),k))):  2.592 seconds 

above timings illustrate 2 extremes: no duplicates , duplicates. other timings between two.

my interpretation of results above pure python function (no itertools or other c-compiled modules) can extremely faster, can slower depending on how many duplicates in list. there no way around writing c++ code python .so extension module providing required functionality.

instead of post-processing/filtering output, can pre-process input list. way, can avoid generating duplicates in first place. pre-processing involves either sorting (or using collections.counter on) input. 1 possible recursive realization is:

def subbags(bag, k):     = sorted(bag)     n = len(a)     sub = []      def index_of_next_unique_item(i):         j = + 1          while j < n , a[j] == a[i]:             j += 1          return j      def combinate(i):         if len(sub) == k:             yield tuple(sub)         elif n - >= k - len(sub):             sub.append(a[i])             yield combinate(i + 1)             sub.pop()             yield combinate(index_of_next_unique_item(i))      yield combinate(0)  bag = [1, 2, 3, 1, 2, 1] k = 3 = -1  print(sorted(bag), k) print('---')  i, subbag in enumerate(subbags(bag, k)):     print(subbag)  print('---') print(i + 1) 

output:

[1, 1, 1, 2, 2, 3] 3 --- (1, 1, 1) (1, 1, 2) (1, 1, 3) (1, 2, 2) (1, 2, 3) (2, 2, 3) --- 6 

requires stack space recursion, + sorting input should use substantially less time + memory generating , discarding repeats.


Comments

Popular posts from this blog

Command prompt result in label. Python 2.7 -

javascript - How do I use URL parameters to change link href on page? -

amazon web services - AWS Route53 Trying To Get Site To Resolve To www -