A Deep Dive into Elasticsearch Storage

Updated: 2024-10-26 14:31:33

This article takes a closer look at Elasticsearch storage, in particular how the data we write to Elasticsearch is laid out on a node's disk.

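Elasticsearch paths

Elasticsearch uses the following main paths:

path.home: the home directory of the user running the Elasticsearch process; defaults to the Java system property user.dir
path.conf: the directory containing the Elasticsearch configuration files
path.plugins: the directory where third-party Elasticsearch plugins are installed
path.work: the directory Elasticsearch used for working and temporary files; now deprecated
path.logs: the directory where Elasticsearch writes its logs
path.data: the directory where Elasticsearch stores its data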
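
As a rough illustration of where these settings come from, the sketch below collects the path.* entries from configuration text written in the flat `key: value` form. The function name `read_path_settings` and the directories in the sample are assumptions made for this example, and the parser deliberately ignores nested YAML, so it is not how Elasticsearch itself loads its configuration.

```python
def read_path_settings(config_text):
    """Collect flat `path.*` settings from elasticsearch.yml-style text."""
    settings = {}
    for raw_line in config_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        key = key.strip()
        if key.startswith("path."):
            settings[key] = value.strip()
    return settings


# Hypothetical configuration; the directories are placeholders.
example = """
cluster.name: demo
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch  # log files
"""
paths = read_path_settings(example)
```

Here `paths` holds only the two path.* entries; the cluster.name line is skipped because its key does not start with `path.`.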

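The rest of this article examines the storage structure of the path.data directory in detail.

path.data storage in detail

Because Elasticsearch is built on Lucene, most of the index files under path.data are produced by Lucene. The two have a clear division of labor: Lucene writes and maintains the index files, while Elasticsearch maintains metadata on top of Lucene, such as mappings and the cluster state, and supplies features that Lucene itself does not provide.

Reference: Elasticsearch Principles (Part 2): Index Storage

Elasticsearch storage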

Node Data

data (path.data)
└── elasticsearch
    └── nodes
        └── 0
            ├── _state
            │   └── global-0.st
            ├── indices
            └── node.lock

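node.lock ensures that only one Elasticsearch instance at a time reads from or writes to a given data directory.
global-0.st is a binary file that stores the cluster state. The number after "global" is the cluster state version. During a master election, the versions stored on individual nodes may differ; once the election succeeds, the cluster state with the highest version is adopted.
The indices directory is described below.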
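
The idea behind node.lock can be sketched with an advisory file lock. Elasticsearch implements its lock in Java, so the snippet below is only a Unix-only Python analogy; the helper names `try_lock` and `release` are made up for this example.

```python
import fcntl
import os


def try_lock(data_dir):
    """Try to take an exclusive lock on <data_dir>/node.lock.

    Returns the open file descriptor on success, or None if another
    holder already owns the lock.
    """
    path = os.path.join(data_dir, "node.lock")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release(fd):
    """Release the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A second caller that runs `try_lock` on the same directory while the first descriptor is still held gets None, which is the behavior a second node pointed at the same data directory would need to detect.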

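In theory, these files can be modified with a suitable editor, but doing so is discouraged because it may cause data loss.

Index Data

As shown above, the indices folder mainly stores index data. The structure under the indices directory is as follows: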

indices
└── index_id
    ├── 0 (shard_id)
    └── _state
        └── state-0.st

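index_id is the unique identifier of the index; Elasticsearch uses it internally to tell indices apart.
shard_id is the shard number, starting at 0 and increasing.
state-0.st stores the index state, such as the creation time and settings. It is also a binary file, and the number after "state" is a version number, as with the cluster state file.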
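
Both global-N.st and state-N.st carry their version in the file name, so the newest file in a _state directory can be picked by comparing those numbers. The following is a minimal sketch of that idea; the helper names are hypothetical and the pattern covers only the two prefixes mentioned in this article.

```python
import re

# Matches names such as "global-0.st" or "state-12.st".
STATE_FILE = re.compile(r"^(global|state)-(\d+)\.st$")


def state_version(filename):
    """Return the version encoded in a state file name, or None."""
    match = STATE_FILE.match(filename)
    return int(match.group(2)) if match else None


def newest_state_file(filenames):
    """Return the state file with the highest version, or None."""
    versioned = [
        (state_version(name), name)
        for name in filenames
        if state_version(name) is not None
    ]
    return max(versioned)[1] if versioned else None
```

The versions are compared as integers, so state-12.st ranks above state-3.st even though it sorts lower as a string.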

Shard Data

Shard data is stored in the shard_id directory described above; each shard has its own directory.

0
├── _state
│   └── state-0.st
├── index
└── translog

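The index directory contains all of the shard's Lucene index files.
state-0.st stores the shard state; the number after "state" is the version.
The translog directory stores the Elasticsearch transaction log.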
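
To see how much space each part of a shard takes, one could walk the three subdirectories named above. This is a small sketch under the assumption that the shard directory follows the layout described in this article; `summarize_shard` is a name invented for the example.

```python
from pathlib import Path


def summarize_shard(shard_dir):
    """Return {subdirectory: (file_count, total_bytes)} for one shard."""
    summary = {}
    for name in ("_state", "index", "translog"):
        sub = Path(shard_dir) / name
        # A missing subdirectory is reported as empty rather than an error.
        files = [p for p in sub.rglob("*") if p.is_file()] if sub.is_dir() else []
        summary[name] = (len(files), sum(p.stat().st_size for p in files))
    return summary
```

Calling `summarize_shard` on a shard directory returns one entry per subdirectory, which makes it easy to compare the size of the Lucene files with the size of the transaction log.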

Lucene Index Files

Lucene documents its index files well. The files found in a Lucene index directory are listed below.

Name	Extension	Brief Description
Segments File	segments_N	Stores information about a commit point
Lock File	write.lock	The Write lock prevents multiple IndexWriters from writing to the same file.
Segment Info	.si	Stores metadata about a segment
Compound File	.cfs, .cfe	An optional “virtual” file consisting of all the other index files for systems that frequently run out of file handles.
Fields	.fnm	Stores information about the fields
Field Index	.fdx	Contains pointers to field data
Field Data	.fdt	The stored fields for documents
Term Dictionary	.tim	The term dictionary, stores term info
Term Index	.tip	The index into the Term Dictionary
Frequencies	.doc	Contains the list of docs which contain each term along with frequency
Positions	.pos	Stores position information about where a term occurs in the index
Payloads	.pay	Stores additional per-position metadata information such as character offsets and user payloads
Norms	.nvd, .nvm	Encodes length and boost factors for docs and fields
Per-Document Values	.dvd, .dvm	Encodes additional scoring factors or other per-document information.
Term Vector Index	.tvx	Stores offset into the document data file
Term Vector Documents	.tvd	Contains information about each document that has term vectors
Term Vector Fields	.tvf	The field level info about term vectors
Live Documents	.liv	Info about what documents are live
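
The table can also be turned into a small lookup that labels the files in an index directory. The sketch below covers only the extensions listed above; the function name `classify` and the sample file names are assumptions, and real index directories may contain files from other codecs that it reports as "Unknown".

```python
import os

# Extension-to-name mapping taken from the table above.
LUCENE_EXTENSIONS = {
    ".si": "Segment Info",
    ".cfs": "Compound File",
    ".cfe": "Compound File",
    ".fnm": "Fields",
    ".fdx": "Field Index",
    ".fdt": "Field Data",
    ".tim": "Term Dictionary",
    ".tip": "Term Index",
    ".doc": "Frequencies",
    ".pos": "Positions",
    ".pay": "Payloads",
    ".nvd": "Norms",
    ".nvm": "Norms",
    ".dvd": "Per-Document Values",
    ".dvm": "Per-Document Values",
    ".tvx": "Term Vector Index",
    ".tvd": "Term Vector Documents",
    ".tvf": "Term Vector Fields",
    ".liv": "Live Documents",
}


def classify(filename):
    """Map a file name from a Lucene index directory to its table entry."""
    if filename == "write.lock":
        return "Lock File"
    if filename.startswith("segments_"):
        return "Segments File"
    extension = os.path.splitext(filename)[1]
    return LUCENE_EXTENSIONS.get(extension, "Unknown")
```

For example, `classify("_0.fdt")` yields "Field Data" and `classify("segments_3")` yields "Segments File".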