ElasticSearch - Path Hierarchy Tokenizer [ Examples, Explanation ]

kkm8257 2021. 12. 14. 16:24

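-- A tokenizer used for handling directory and file paths.
-- With the Pattern tokenizer, each directory name comes out as its own separate token.
-- The Path Hierarchy tokenizer stores one token per level of the hierarchy, which makes it possible to search or aggregate by level.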
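
-- A minimal sketch of what searching by level could look like. It only builds and prints the request bodies; nothing is sent to a cluster. The index name "files", the field name "file_path", and the analyzer name "path_tree" are hypothetical and do not come from these notes. The mapping assumes Elasticsearch 7 or later (no mapping types).

```python
import json

# Hypothetical names (not from the notes): index "files", field "file_path",
# analyzer "path_tree".
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Custom analyzer that uses only the built-in path_hierarchy tokenizer.
                "path_tree": {"type": "custom", "tokenizer": "path_hierarchy"}
            }
        }
    },
    "mappings": {
        "properties": {
            "file_path": {"type": "text", "analyzer": "path_tree"}
        }
    },
}

# A term query is not analyzed, so it matches every document whose indexed
# tokens contain this exact ancestor path.
search_body = {"query": {"term": {"file_path": "/usr/share"}}}

print("PUT files")
print(json.dumps(index_body, indent=2))
print("GET files/_search")
print(json.dumps(search_body, indent=2))
```

-- Under these assumptions, a document indexed with "/usr/share/elasticsearch/bin" would be found by the term query for "/usr/share", because "/usr/share" is one of its stored tokens.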

POST _analyze
{
  "tokenizer": "path_hierarchy",
  "text": "/usr/share/elasticsearch/bin"
}

-- Tokens are built up cumulatively, starting from the top-level directory.
{
  "tokens" : [
    {
      "token" : "/usr",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "/usr/share",
      "start_offset" : 0,
      "end_offset" : 10,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "/usr/share/elasticsearch",
      "start_offset" : 0,
      "end_offset" : 24,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "/usr/share/elasticsearch/bin",
      "start_offset" : 0,
      "end_offset" : 28,
      "type" : "word",
      "position" : 0
    }
  ]
}
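
-- To make the difference from a plain split concrete, here is a small Python sketch that imitates both behaviours. It is not Elasticsearch code: pattern_split and path_hierarchy are hypothetical helper names, pattern_split stands in for a Pattern tokenizer configured to split on "/", and edge cases such as empty input or a trailing delimiter are not handled the way the real tokenizer handles them.

```python
import re


def pattern_split(path, delimiter="/"):
    """Split on the delimiter: one token per directory name."""
    return [part for part in re.split(re.escape(delimiter), path) if part]


def path_hierarchy(path, delimiter="/"):
    """Emit one cumulative prefix per level as (token, start_offset, end_offset)."""
    tokens = []
    # Start searching at index 1 so a leading delimiter stays in the first token.
    cut = path.find(delimiter, 1)
    while cut != -1:
        tokens.append((path[:cut], 0, cut))
        cut = path.find(delimiter, cut + 1)
    # The full path is always the last (deepest) token.
    tokens.append((path, 0, len(path)))
    return tokens


path = "/usr/share/elasticsearch/bin"
print(pattern_split(path))
for token, start, end in path_hierarchy(path):
    print(token, start, end)
```

-- For this input the sketch yields the same four tokens and offsets as the response above, while the split yields four unrelated directory names.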




-- The delimiter option sets the path separator, and the replacement option can swap it for a different separator in the emitted tokens.

PUT hir_tokenizer
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_hir_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "-",
          "replacement": "/"
        }
      }
    }
  }
}



GET hir_tokenizer/_analyze
{
  "tokenizer": "my_hir_tokenizer",
  "text": [
    "one-two-three"
  ]
}


-- "-" is replaced with "/", and the tokens form a hierarchical structure.
{
  "tokens" : [
    {
      "token" : "one",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "one/two",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "one/two/three",
      "start_offset" : 0,
      "end_offset" : 13,
      "type" : "word",
      "position" : 0
    }
  ]
}
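
-- The delimiter/replacement behaviour can be imitated in a few lines of Python. This is only a sketch: hierarchy_tokens is a hypothetical helper name, it returns token text only (no offsets), and it assumes that replacement falls back to the delimiter when it is not given.

```python
def hierarchy_tokens(text, delimiter="/", replacement=None):
    """Return cumulative prefixes of text, joined with the replacement character."""
    if replacement is None:
        # Assumption for this sketch: with no replacement, keep the delimiter.
        replacement = delimiter
    parts = text.split(delimiter)
    tokens = []
    for level in range(1, len(parts) + 1):
        token = replacement.join(parts[:level])
        # A leading delimiter produces an empty first prefix; skip it.
        if token:
            tokens.append(token)
    return tokens


print(hierarchy_tokens("one-two-three", delimiter="-", replacement="/"))
print(hierarchy_tokens("/usr/share/elasticsearch/bin"))
```

-- The first call mirrors the my_hir_tokenizer settings above, and the second uses the defaults on a normal path.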
