What is the fastest way to compare two sets in Java?

hot-time 2020. 9. 8. 22:02

I am trying to optimize a piece of code that compares the elements of two sets.

For example:

public void compare(Set<Record> firstSet, Set<Record> secondSet){
    for(Record firstRecord : firstSet){
        for(Record secondRecord : secondSet){
            // comparing logic
        }
    }
}

Please take into account that the number of records in the sets will be high.

Thanks,

Shekhar

firstSet.equals(secondSet)

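It really depends on what you want to do in the comparison logic, i.e. what happens if you find an element in one set that is not in the other? Your method has a void return type, so I assume you will do the necessary work inside this method.

More fine-grained control, if you need it: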

if (!firstSet.containsAll(secondSet)) {
  // do something if needs be
}
if (!secondSet.containsAll(firstSet)) {
  // do something if needs be
}
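
As a rough illustration of the two containsAll checks above, the sketch below classifies how two sets relate to each other. The class name SetRelationSketch and the method relation are made up for this example, and String elements stand in for Record.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the original question.
class SetRelationSketch {

    // Describes how the two sets relate, using only containsAll.
    static <T> String relation(Set<T> first, Set<T> second) {
        boolean firstHasAll = first.containsAll(second);
        boolean secondHasAll = second.containsAll(first);
        if (firstHasAll && secondHasAll) return "equal";
        if (firstHasAll) return "first is a superset";
        if (secondHasAll) return "first is a subset";
        return "each set has elements the other lacks";
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("x", "y", "z"));
        Set<String> b = new HashSet<>(Arrays.asList("x", "y"));
        System.out.println(relation(a, b)); // prints "first is a superset"
    }
}
```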

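If you need to get the elements that are in one set but not in the other, use removeAll().
EDIT: set.removeAll(otherSet) returns a boolean, not a set. To use removeAll() without modifying the original sets, copy each set first and then call it on the copy.

Set<Record> one = new HashSet<>(firstSet);
Set<Record> two = new HashSet<>(secondSet);
one.removeAll(secondSet);
two.removeAll(firstSet);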

If one and two are both empty, you know that the two sets are equal. If not, they hold the elements that made the sets unequal.
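
A minimal, self-contained sketch of this copy-then-removeAll approach is shown below. The class SetDifferenceSketch and the helper onlyIn are hypothetical names, and String elements stand in for Record.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; String elements stand in for Record.
class SetDifferenceSketch {

    // Returns a new set holding the elements of 'from' that are not in 'other'.
    // The inputs are left unmodified because removeAll runs on a copy.
    static <T> Set<T> onlyIn(Set<T> from, Set<T> other) {
        Set<T> copy = new HashSet<>(from);
        copy.removeAll(other);
        return copy;
    }

    public static void main(String[] args) {
        Set<String> firstSet = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> secondSet = new HashSet<>(Arrays.asList("b", "c", "d"));

        Set<String> one = onlyIn(firstSet, secondSet); // contains only "a"
        Set<String> two = onlyIn(secondSet, firstSet); // contains only "d"

        System.out.println("only in first: " + one);
        System.out.println("only in second: " + two);
        System.out.println("equal: " + (one.isEmpty() && two.isEmpty()));
    }
}
```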

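You mentioned that the number of records might be high. If the underlying implementation is a HashSet, each lookup takes O(1) time on average, so you can't really do much better than that. For a TreeSet, each lookup is O(log n).

If you simply want to know whether the sets are equal, the equals method on AbstractSet is implemented roughly as follows: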

    public boolean equals(Object o) {
        if (o == this)
            return true;
        if (!(o instanceof Set))
            return false;
        Collection c = (Collection) o;
        if (c.size() != size())
            return false;
        return containsAll(c);
    }

Note how it optimizes the common cases where:

the two objects are the same,
the other object is not a set at all, and
the two sets' sizes are different.

After that, containsAll(...) will return false as soon as it finds an element in the other set that is not also in this set. But if all elements are present in both sets, it will need to test all of them.

The worst case performance therefore occurs when the two sets are equal but not the same objects. That cost is typically O(N) or O(NlogN) depending on the implementation of this.containsAll(c).

And you get close-to-worst case performance if the sets are large and only differ in a tiny percentage of the elements.

UPDATE

If you are willing to invest time in a custom set implementation, there is an approach that can improve the "almost the same" case.

The idea is that you need to pre-calculate and cache a hash for the entire set so that you could get the set's current hashcode value in O(1). Then you can compare the hashcode for the two sets as an acceleration.

How could you implement a hashcode like that? Well if the set hashcode was:

zero for an empty set, and
the XOR of all of the element hashcodes for a non-empty set,

then you could cheaply update the set's cached hashcode each time you added or removed an element. In both cases, you simply XOR the element's hashcode with the current set hashcode.

Of course, this assumes that element hashcodes are stable while the elements are members of sets. It also assumes that the element class's hashcode function gives a good spread. That is because when the two set hashcodes are the same you still have to fall back to the O(N) comparison of all elements.
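
The cached XOR hashcode idea could look roughly like the sketch below. It is a thin wrapper around a HashSet rather than a full Set implementation; the class XorHashedSet and the method sameElements are hypothetical names chosen for this illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical wrapper, sketched for illustration only.
class XorHashedSet<E> {
    private final Set<E> elements = new HashSet<>();
    private int cachedHash = 0; // zero for the empty set

    boolean add(E e) {
        boolean changed = elements.add(e);
        if (changed) cachedHash ^= Objects.hashCode(e); // fold the new element in
        return changed;
    }

    boolean remove(E e) {
        boolean changed = elements.remove(e);
        if (changed) cachedHash ^= Objects.hashCode(e); // XOR again to cancel it out
        return changed;
    }

    int size() {
        return elements.size();
    }

    // O(1) rejection when sizes or cached hashes differ; O(N) comparison otherwise.
    boolean sameElements(XorHashedSet<E> other) {
        if (other == this) return true;
        if (other.size() != size()) return false;
        if (other.cachedHash != cachedHash) return false;
        return elements.containsAll(other.elements);
    }
}
```

The cached value only speeds up the "not equal" answer. When the hashes match, the full element comparison still runs.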

You could take this idea a bit further ... at least in theory.

Suppose that your set element class has a method to return a crypto checksum for the element. Now implement the set's checksum by XORing the checksums returned for the elements.

What does this buy us?

Well, if we assume that nothing underhand is going on, the probability that any two unequal set elements have the same N-bit checksums is 2^-N. And the probability 2 unequal sets have the same N-bit checksums is also 2^-N. So my idea is that you can implement equals as:

    public boolean equals(Object o) {
        if (o == this)
            return true;
        if (!(o instanceof Set))
            return false;
        Collection c = (Collection) o;
        if (c.size() != size())
            return false;
        return checksums.equals(c.checksums);
    }

Under the assumptions above, this will give you the wrong answer only with probability 2^-N. If you make N large enough (e.g. 512 bits) the probability of a wrong answer becomes negligible (e.g. roughly 10^-154).

The down side is that computing the crypto checksums for elements is very expensive, especially as the number of bits increases. So you really need an effective mechanism for memoizing the checksums. And that could be problematic.
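
To make the checksum variant concrete, here is a rough sketch under several assumptions that are not in the answer: elements are non-null Strings, SHA-256 (so N = 256) stands in for the crypto checksum, and the class name ChecksumSet and the method probablyEquals are hypothetical. It does not memoize per-element checksums; it simply recomputes the digest on every add and remove.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: String elements, SHA-256 as the crypto checksum.
class ChecksumSet {
    private final Set<String> elements = new HashSet<>();
    private final byte[] checksum = new byte[32]; // all zeros for the empty set

    private static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is a required algorithm
        }
    }

    // XORs the element's digest into the running set checksum.
    private void xorIn(String s) {
        byte[] digest = sha256(s);
        for (int i = 0; i < checksum.length; i++) {
            checksum[i] ^= digest[i];
        }
    }

    boolean add(String s) {
        boolean changed = elements.add(s);
        if (changed) xorIn(s);
        return changed;
    }

    boolean remove(String s) {
        boolean changed = elements.remove(s);
        if (changed) xorIn(s); // XORing the same digest again cancels it out
        return changed;
    }

    // Probabilistic equality: compares sizes and checksums only, never the elements.
    boolean probablyEquals(ChecksumSet other) {
        return elements.size() == other.elements.size()
                && Arrays.equals(checksum, other.checksum);
    }
}
```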

There is a method in Guava Sets which can help here:

public static <E> boolean equals(Set<? extends E> set1, Set<? extends E> set2) {
    return Sets.symmetricDifference(set1, set2).isEmpty();
}

You have the following solution from https://www.mkyong.com/java/java-how-to-compare-two-sets/

public static boolean equals(Set<?> set1, Set<?> set2){

    if(set1 == null || set2 ==null){
        return false;
    }

    if(set1.size() != set2.size()){
        return false;
    }

    return set1.containsAll(set2);
}

Or if you prefer to use a single return statement:

public static boolean equals(Set<?> set1, Set<?> set2){

  return set1 != null 
    && set2 != null 
    && set1.size() == set2.size() 
    && set1.containsAll(set2);
}

There's an O(N) solution for very specific cases where:

the sets are both sorted
both sorted in the same order

The following code assumes that both sets are ordered by the records' natural (Comparable) ordering. A similar method could be based on a Comparator.

    public class SortedSetComparitor <Foo extends Comparable<Foo>> 
            implements Comparator<SortedSet<Foo>> {

        @Override
        public int compare( SortedSet<Foo> arg0, SortedSet<Foo> arg1 ) {
            Iterator<Foo> otherRecords = arg1.iterator();
            for (Foo thisRecord : arg0) {
                // Shorter sets sort first.
                if (!otherRecords.hasNext()) return 1;
                int comparison = thisRecord.compareTo(otherRecords.next());
                if (comparison != 0) return comparison;
            }
            // Shorter sets sort first
            if (otherRecords.hasNext()) return -1;
            else return 0;
        }
    }
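
For the Comparator-based variant mentioned above, a sketch might look like the following. It only answers "same elements or not" instead of producing an ordering, and it assumes both sets are sorted consistently with the comparator that is passed in. SortedSetEquality and sameElements are hypothetical names.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, sketched for illustration.
class SortedSetEquality {

    // Walks both sets in step; assumes both are ordered consistently with 'order'.
    static <T> boolean sameElements(SortedSet<T> a, SortedSet<T> b,
                                    Comparator<? super T> order) {
        if (a.size() != b.size()) return false;
        Iterator<T> itA = a.iterator();
        Iterator<T> itB = b.iterator();
        while (itA.hasNext()) { // sizes are equal, so itB has a next element too
            if (order.compare(itA.next(), itB.next()) != 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Comparator<String> order = Comparator.reverseOrder();
        SortedSet<String> first = new TreeSet<>(order);
        SortedSet<String> second = new TreeSet<>(order);
        first.add("a"); first.add("b");
        second.add("b"); second.add("a");
        System.out.println(sameElements(first, second, order)); // prints "true"
    }
}
```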

If you are using the Guava library, you can do:

SetView<Record> added = Sets.difference(secondSet, firstSet);
SetView<Record> removed = Sets.difference(firstSet, secondSet);

Then draw your conclusions from these two views.

I would put the secondSet in a HashMap before the comparison. That way, looking up an entry of the second set by its key takes O(1) time. Like this:

HashMap<Integer,Record> hm = new HashMap<Integer,Record>(secondSet.size());
int i = 0;
for(Record secondRecord : secondSet){
    hm.put(i,secondRecord);
    i++;
}
for(Record firstRecord : firstSet){
    // The inner index is named j so that it does not clash with i above.
    for(int j=0; j<secondSet.size(); j++){
        // use hm.get(j) for comparison
    }
}
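
A map pays off most when it is keyed by something the records can be looked up by, so that the inner loop disappears. The sketch below assumes, purely for illustration, that each record has a unique integer id and a payload to compare. The Rec class, its fields, and changedIds are hypothetical and not taken from the question.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a keyed lookup instead of a nested loop.
class KeyedLookupSketch {

    // Stand-in for Record; the 'id' key and 'payload' field are assumptions.
    static final class Rec {
        final int id;
        final String payload;

        Rec(int id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // Returns the ids present in both collections whose payloads differ.
    static List<Integer> changedIds(Collection<Rec> first, Collection<Rec> second) {
        Map<Integer, Rec> byId = new HashMap<>();
        for (Rec r : second) {
            byId.put(r.id, r);
        }

        List<Integer> changed = new ArrayList<>();
        for (Rec r : first) {
            Rec match = byId.get(r.id); // expected O(1) lookup, no inner loop
            if (match != null && !match.payload.equals(r.payload)) {
                changed.add(r.id);
            }
        }
        return changed;
    }
}
```

With this shape the total work is roughly proportional to the combined size of the two collections rather than to their product.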

public boolean equals(Object o) {
        if (o == this)
            return true;
        if (!(o instanceof Set))
            return false;

        // Assumes the enclosing class implements Set<String>.
        Set<String> a = this;
        Set<?> b = (Set<?>) o;

        // Elements of a that are missing from b.
        Set<Object> thedifference_a_b = new HashSet<Object>(a);
        thedifference_a_b.removeAll(b);
        if (!thedifference_a_b.isEmpty()) return false;

        // Elements of b that are missing from a.
        Set<Object> thedifference_b_a = new HashSet<Object>(b);
        thedifference_b_a.removeAll(a);
        if (!thedifference_b_a.isEmpty()) return false;

        return true;
    }

I think a method reference to the equals method can be used here. This assumes that the element type has its own equals implementation. A plain and simple example:

Set<String> set = new HashSet<>();
set.addAll(Arrays.asList("leo","bale","hanks"));

Set<String> set2 = new HashSet<>();
set2.addAll(Arrays.asList("hanks","leo","bale"));

Predicate<Set<String>> pred = set::equals;
boolean result = pred.test(set2);
System.out.println(result);   // true

Reference URL: https://stackoverflow.com/questions/3341202/what-is-the-fastest-way-to-compare-two-sets-in-java
