IT story

집계 프레임 워크 용 Mongodb Explain

hot-time 2020. 8. 13. 19:00
반응형

집계 프레임 워크 용 Mongodb Explain


MongoDB에 집계 프레임 워크에 대한 설명 기능이 있습니까? 문서에서 볼 수 없습니다.

확인할 수있는 다른 방법이없는 경우 집계 프레임 워크 내에서 쿼리가 어떻게 수행됩니까?

나는 당신이 그냥 할 찾기로 알고 있습니다

db.collection.find().explain()

하지만 집계 프레임 워크를 사용하면 오류가 발생합니다.

db.collection.aggregate(
    { $project : { "Tags._id" : 1 }},
    { $unwind : "$Tags" },
    { $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}},
    { 
        $group: 
        { 
            _id : { id: "$_id"},
            "count": { $sum:1 } 
        }
    },
    { $sort: {"count":-1}}
).explain()

MongoDB 버전 3.0부터 간단히 순서를 변경합니다.

collection.aggregate(...).explain()

collection.explain().aggregate(...)

원하는 결과를 얻을 수 있습니다 ( 여기에 문서 ).

2.6보다 큰 이전 버전의 경우 집계 파이프 라인 작업에 대한 옵션 을 사용해야합니다.explain

explain:true

db.collection.aggregate([
    { $project : { "Tags._id" : 1 }},
    { $unwind : "$Tags" },
    { $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}},
    { $group: { 
        _id : "$_id",
        count: { $sum:1 } 
    }},
    {$sort: {"count":-1}}
  ],
  {
    explain:true
  }
)

집계 프레임 워크 중요한 고려 사항 인덱스는 파이프 라인에 대한 초기 데이터를 가져 오는 데 사용할 수 있다는 것입니다 (예 : 사용 $match, $sort, $geonear파이프 라인의 시작 부분)뿐만 아니라 다음 $lookup$graphLookup단계. 데이터 처리 집약 파이프 라인으로 인출되어있다 (예처럼 단계를 통과하면 $project, $unwind, 및 $group)는 상기 조작에 메모리 (경우 생성 가능한 임시 파일을 사용할 것이다 allowDiskUse옵션 설정 됨).

파이프 라인 최적화

일반적으로 다음과 같은 방법으로 집계 파이프 라인을 최적화 할 수 있습니다.

  • $match관련 문서로 처리를 제한 하는 단계로 파이프 라인을 시작합니다 .
  • 효율적인 색인에 의해 초기 $match/ $sort단계가 지원 되는지 확인합니다 .
  • 필터링 데이터는 초기 사용 $match, $limit$skip.
  • 불필요한 단계 및 문서 조작 최소화 (복잡한 집계 체조가 필요한 경우 스키마를 재고 할 수 있음).
  • MongoDB 서버를 업그레이드 한 경우 최신 집계 연산자를 활용합니다. 예를 들어 MongoDB 3.4 는 배열, 문자열 및 패싯 작업 지원을 포함하여 많은 새로운 집계 단계 및 표현식추가했습니다 .

MongoDB 서버 버전에 따라 자동으로 발생 하는 여러 집계 파이프 라인 최적화도 있습니다. 예를 들어, 출력 결과에 영향을주지 않고 실행을 개선하기 위해 인접한 단계를 통합 및 / 또는 재정렬 할 수 있습니다.

한계

As at MongoDB 3.4, the Aggregation Framework explain option provides information on how a pipeline is processed but does not support the same level of detail as the executionStats mode for a find() query. If you are focused on optimizing initial query execution you will likely find it beneficial to review the equivalent find().explain() query with executionStats or allPlansExecution verbosity.

There are a few relevant feature requests to watch/upvote in the MongoDB issue tracker regarding more detailed execution stats to help optimize/profile aggregation pipelines:


Starting with version 2.6.x mongodb allows users to do explain with aggregation framework.

All you need to do is to add explain : true

db.records.aggregate(
  [ ...your pipeline...],
  { explain: true }
)

Thanks to Rafa, I know that it was possible to do even in 2.4, but only through runCommand(). But now you can use aggregate as well.


The Aggregation framework

The aggregation framework is a set of analytics tools within MongoDB that allows us to run various types of reports or analysis on documents in one or more collections. Based on the idea of a pipeline. We take input from a MongoDB collection and pass the documents from that collection through one or more stages, each of which performs a different operation on it's inputs. Each stage takes as input whatever the stage before it produced as output. And the inputs and outputs for all stages are a stream of documents. Each stage has a specific job that it does. It's expecting a specific form of document and produces a specific output, which is itself a stream of documents. At the end of the pipeline, we get access to the output.

aggregation framework stage

An individual stage is a data processing unit. Each stage takes as input a stream of documents one at a time, processes each document one at a time and produces the output stream of documents. Again, one at a time. Each stage provide a set of knobs or tunables that we can control to parameterize the stage to perform whatever task we're interested in doing. So a stage performs a generic task - a general purpose task of some kind and parameterize the stage for the particular set of documents that we're working with. And exactly what we would like that stage to do with those documents. These tunables typically take the form of operators that we can supply that will modify fields, perform arithmetic operations, reshape documents or do some sort of accumulation task as well as a veriety of other things. Often times, it the case that we'll want to include the same type of stage multiple times within a single pipeline.

same type of stage multiple times within a single pipeline

e.g. We may wish to perform an initial filter so that we don't have to pass the entire collection into our pipeline. But, then later on, following some additional processing, want to filter once again using a different set of criteria. So, to recap, pipeline works with a MongoDB collection. They're composed of stages, each of which does a different data processing task on it's input and produces documents as output to be passed to the next stage. And finally at the end of the pipeline output is produced that we can then do something within our application. In many cases, it's necessary to include the same type of stage, multiple times within an individual pipeline.

참고URL : https://stackoverflow.com/questions/12702080/mongodb-explain-for-aggregation-framework

반응형