
<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

English | 中文

TsFile Document

<pre>
___________    ___________.__.__
\__    ___/____\_   _____/|__|  |   ____
  |    | /  ___/|    __)  |  |  | _/ __ \
  |    | \___ \ |     \   |  |  |_\  ___/
  |____|/____  >\___  /   |__|____/\___  >  version 1.0.0
             \/     \/                 \/
</pre>


Introduction

TsFile is a columnar storage file format designed for time series data. It supports efficient compression, high read and write throughput, and compatibility with frameworks such as Spark and Flink, and it is easy to integrate into IoT big data processing frameworks.

Time series data is becoming increasingly important in a wide range of applications, including IoT, intelligent control, finance, log analysis, and monitoring systems.

TsFile is the first standard file format for time series data. Although temporal data is widespread and important, there has long been no standardized file format for managing it. TsFile provides a unified file format that helps users manage temporal data.


TsFile Features

TsFile offers several distinctive features and benefits:

TsFile Basic Concepts

TsFile can manage the time series data of multiple devices. Each device can have a different set of measurements.

Each measurement of each device corresponds to a time series.

The TsFile schema defines a set of measurements for all devices, as shown in the table below (m1 to m5).

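| Time | deviceId | m1 | m2 | m3 | m4 | m5 |
|------|----------|----|----|----|----|----|
| 1    | device1  | 1  | 2  | 3  |    |    |
| 2    | device1  | 1  | 2  | 3  |    |    |
| 3    | device2  | 1  |    | 3  | 4  | 5  |
| 4    | device2  | 1  |    | 3  | 4  | 5  |
| 5    | device3  | 1  | 2  | 3  | 4  | 5  |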

Among them, Time and deviceId are built-in fields that do not need to be defined and can be written directly.
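The relationship between devices, measurements, and time series can be sketched in plain Python. This is only an illustration of the data model, not the TsFile API: the `rows` layout and the `to_time_series` helper are hypothetical, and which measurements each device reports is an assumption inferred from the example table.

```python
from collections import defaultdict

# Illustrative rows: (time, deviceId, {measurement: value}).
# A measurement that a device does not report is simply absent.
# Which measurements each device has is an assumption, not TsFile output.
rows = [
    (1, "device1", {"m1": 1, "m2": 2, "m3": 3}),
    (2, "device1", {"m1": 1, "m2": 2, "m3": 3}),
    (3, "device2", {"m1": 1, "m3": 3, "m4": 4, "m5": 5}),
    (4, "device2", {"m1": 1, "m3": 3, "m4": 4, "m5": 5}),
    (5, "device3", {"m1": 1, "m2": 2, "m3": 3, "m4": 4, "m5": 5}),
]


def to_time_series(rows):
    """Group rows into one (time, value) list per (deviceId, measurement)."""
    series = defaultdict(list)
    for time, device_id, values in rows:
        for measurement, value in values.items():
            series[(device_id, measurement)].append((time, value))
    return dict(series)


series = to_time_series(rows)
print(len(series))                 # 3 + 4 + 5 = 12 time series
print(series[("device2", "m4")])   # [(3, 4), (4, 4)]
```

Each key of `series` is one time series, which matches the statement that every measurement of every device corresponds to exactly one series.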

TsFile Design

File Structure

Like other columnar file formats, TsFile stores data by column to improve storage efficiency and query performance. This suits time series data, which typically consists of large volumes of same-typed values recorded over time. Unlike general-purpose formats, TsFile organizes its data into pages, chunks, and chunk groups, together with an index:

TsFile Architecture
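The hierarchy of page, chunk, chunk group, and index can be pictured with a small in-memory model. This is a hedged sketch and not the on-disk layout: the class names, fields, and the `build_index` and `query` helpers are hypothetical, and the roles assigned to each level (a page holds points of one series, a chunk holds the pages of one measurement, a chunk group holds the chunks of one device) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Page:
    """A non-empty run of (time, value) points from one series."""
    points: List[Tuple[int, float]]


@dataclass
class Chunk:
    """Pages of one measurement (assumed to hold at least one page)."""
    measurement: str
    pages: List[Page] = field(default_factory=list)

    def time_range(self):
        times = [t for page in self.pages for t, _ in page.points]
        return min(times), max(times)


@dataclass
class ChunkGroup:
    """Chunks that belong to one device."""
    device_id: str
    chunks: List[Chunk] = field(default_factory=list)


def build_index(groups):
    """Map (device, measurement) to (group_pos, chunk_pos, start, end) entries."""
    index = {}
    for g, group in enumerate(groups):
        for c, chunk in enumerate(group.chunks):
            start, end = chunk.time_range()
            key = (group.device_id, chunk.measurement)
            index.setdefault(key, []).append((g, c, start, end))
    return index


def query(groups, index, device_id, measurement, t_min, t_max):
    """Return points in [t_min, t_max], skipping chunks outside that range."""
    result = []
    for g, c, start, end in index.get((device_id, measurement), []):
        if end < t_min or start > t_max:
            continue  # the index lets us skip this chunk without reading it
        for page in groups[g].chunks[c].pages:
            result.extend((t, v) for t, v in page.points if t_min <= t <= t_max)
    return result


groups = [
    ChunkGroup("device1", [Chunk("m1", [Page([(1, 1.0), (2, 1.5)]), Page([(3, 2.0)])])]),
    ChunkGroup("device1", [Chunk("m1", [Page([(10, 3.0)])])]),
]
index = build_index(groups)
print(query(groups, index, "device1", "m1", 2, 5))  # [(2, 1.5), (3, 2.0)]
```

The second chunk group is never scanned for this query because its time range lies outside the requested window, which is the kind of pruning an index on series and time makes possible.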

Encoding and Compression

TsFile uses encoding and compression techniques tailored to time series data, including run-length encoding (RLE), bit-packing, and Snappy. Timestamp and value columns are encoded separately, so each can use the method that suits it best. Its encoding algorithms are designed specifically for the characteristics of time series data in IoT scenarios: regular time intervals, correlation among series, and correlation between time attributes and data.
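To see why regular time intervals compress well, consider a minimal sketch that takes second-order differences of a timestamp column and then run-length encodes the result. This illustrates the general idea only; it is not TsFile's actual encoder, and the function names are hypothetical.

```python
def second_order_deltas(timestamps):
    """Return (first, first_delta, delta_of_deltas) for a timestamp list."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    delta_of_deltas = [b - a for a, b in zip(deltas, deltas[1:])]
    return timestamps[0], deltas[0], delta_of_deltas


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def decode(first, first_delta, runs):
    """Invert both steps and rebuild the original timestamps."""
    timestamps = [first, first + first_delta]
    delta = first_delta
    for value, count in runs:
        for _ in range(count):
            delta += value
            timestamps.append(timestamps[-1] + delta)
    return timestamps


# Mostly regular 1000 ms sampling with one late point.
timestamps = [1000, 2000, 3000, 4000, 5000, 6500, 7500]
first, first_delta, dod = second_order_deltas(timestamps)
runs = run_length_encode(dod)
print(runs)  # [[0, 3], [500, 1], [-500, 1]]
assert decode(first, first_delta, runs) == timestamps
```

With perfectly regular sampling the whole column reduces to the first timestamp, the interval, and a single run of zeros, which is why encoding timestamps separately from values pays off.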

The table below compares three file formats across several dimensions.

TsFile, CSV and Parquet in Comparison

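| Dimension       | TsFile       | CSV   | Parquet |
|-----------------|--------------|-------|---------|
| Data Model      | IoT          | Plain | Nested  |
| Write Mode      | Tablet, Line | Line  | Line    |
| Compression     | Yes          | No    | Yes     |
| Read Mode       | Query, Scan  | Scan  | Query   |
| Index on Series | Yes          | No    | No      |
| Index on Time   | Yes          | No    | No      |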

This design supports efficient data encoding, compression, and access, and reflects industry needs for efficient, scalable, and flexible data analytics platforms. The recommended encoding and compression for each data type are listed below.

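| Data Type | Recommended Encoding | Recommended Compression |
|-----------|----------------------|-------------------------|
| INT32     | TS_2DIFF             | LZ4                     |
| INT64     | TS_2DIFF             | LZ4                     |
| FLOAT     | GORILLA              | LZ4                     |
| DOUBLE    | GORILLA              | LZ4                     |
| BOOLEAN   | RLE                  | LZ4                     |
| TEXT      | DICTIONARY           | LZ4                     |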

For more details, see the Docs.
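When generating schemas programmatically, the recommendations above can be kept in a small lookup table. The sketch below is a plain-Python convenience, not part of any TsFile library: `RECOMMENDED` and `recommend` are hypothetical names, and the strings are copied from the table rather than taken from a real API's enum values.

```python
# Recommended (encoding, compression) per data type, copied from the table.
RECOMMENDED = {
    "INT32": ("TS_2DIFF", "LZ4"),
    "INT64": ("TS_2DIFF", "LZ4"),
    "FLOAT": ("GORILLA", "LZ4"),
    "DOUBLE": ("GORILLA", "LZ4"),
    "BOOLEAN": ("RLE", "LZ4"),
    "TEXT": ("DICTIONARY", "LZ4"),
}


def recommend(data_type):
    """Look up the recommended settings for a data type name (case-insensitive)."""
    key = data_type.strip().upper()
    if key not in RECOMMENDED:
        raise ValueError(f"no recommendation for data type: {data_type!r}")
    encoding, compression = RECOMMENDED[key]
    return {"encoding": encoding, "compression": compression}


print(recommend("float"))  # {'encoding': 'GORILLA', 'compression': 'LZ4'}
```

Raising on unknown types, rather than silently falling back to a default, keeps a typo in a schema definition from producing an unintended encoding.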

Build and Use TsFile

Java

C++

Python