Initial commit
@@ -0,0 +1,20 @@
---
slug: /json-formatted-data-types
---

# Overview of JSON data types

seekdb supports the JavaScript Object Notation (JSON) data type in compliance with the RFC 7159 standard. You can use it to store semi-structured JSON data and access or modify the data within JSON documents.

The JSON data type offers the following advantages:

* **Automatic validation**: JSON documents stored in JSON columns are automatically validated. Invalid documents trigger an error (see the example after this list).

* **Optimized storage format**: JSON documents stored in JSON columns are converted into an optimized binary format that enables fast reads and access. When the server reads a JSON value stored in binary format, it does not need to parse the value from text.

* **Semi-structured encoding**: This feature further reduces storage costs by splitting a JSON document into multiple sub-columns, with each sub-column encoded individually. This improves compression ratios and reduces the storage space required for JSON data. For more information, see [Create a JSON value](200.create-a-json-value.md) and [Semi-structured encoding](600.json-semi-struct.md).
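For instance, automatic validation means that malformed JSON is rejected at write time. The following minimal sketch (the table and column names are illustrative) shows the expected behavior:

```sql
CREATE TABLE demo_json (id INT PRIMARY KEY, doc JSON);

-- A well-formed document is accepted.
INSERT INTO demo_json VALUES (1, '{"name": "Sam", "age": 30}');

-- A malformed document (missing closing brace) triggers an error.
INSERT INTO demo_json VALUES (2, '{"name": "Sam"');
```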
## References

* [Overview of JSON functions](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001974794)
@@ -0,0 +1,257 @@
---
slug: /create-a-json-value
---

# Create a JSON value

A JSON value must be one of the following: an object, an array, a string, a number, a Boolean value (true/false), or the null value. Note that false, true, and null must be written in lowercase.

## JSON text structure

A JSON text structure includes characters, strings, numbers, and three literal names. Whitespace characters (spaces, horizontal tabs, line feeds, and carriage returns) are allowed before or after any structural character.

```sql
begin-array = [ left square bracket

begin-object = { left curly bracket

end-array = ] right square bracket

end-object = } right curly bracket

name-separator = : colon

value-separator = , comma
```
### Objects

An object is represented by a pair of curly brackets containing zero or more name/value pairs (also called members). Names within an object must be unique. Each name is a string followed by a colon that separates the name from its value. Multiple name/value pairs are separated by commas.

Here is an example:

```sql
{"NAME": "SAM", "Height": 175, "Weight": 100, "Registered": false}
```

### Arrays

An array is represented by square brackets containing zero or more values (also called elements). Array elements are separated by commas, and values in an array do not need to be of the same type.

Here is an example:

```sql
["abc", 10, null, true, false]
```

### Numbers

Numbers use decimal format and contain an integer component that may optionally be prefixed with a minus sign (-). This can be followed by a fractional part and/or an exponent part. Leading zeros are not allowed. The fractional part consists of a decimal point followed by one or more digits. The exponent part begins with an uppercase or lowercase letter E, optionally followed by a plus (+) or minus (-) sign and one or more digits.

Here is an example:

```sql
[100, 0, -100, 100.11, -12.11, 10.22e2, -10.22e2]
```

### Strings

A string begins and ends with quotation marks ("). All Unicode characters can be placed within the quotation marks, except characters that must be escaped (including quotation marks, backslashes, and control characters).

JSON text must be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8.

Here is an example:

```sql
{"Url": "http://www.example.com/image/481989943"}
```
## Create JSON values

seekdb supports the following DDL operations on JSON types:

* Create tables with JSON columns.

* Add or drop JSON columns.

* Create indexes on generated columns based on JSON columns.

* Enable semi-structured encoding when creating tables.

* Enable semi-structured encoding on existing tables.

### Limitations

You can create multiple JSON columns in each table, with the following limitations (the sketch after this list shows statements that violate them):

* JSON columns cannot be used as `PRIMARY KEY`, `FOREIGN KEY`, or `UNIQUE KEY`, but you can add `NOT NULL` or `CHECK` constraints.

* JSON columns cannot have default values.

* JSON columns cannot be used as partitioning keys.

* The length of JSON data cannot exceed the length of `LONGTEXT`, and the maximum depth of each JSON object or array is 99.
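The first three limitations can be verified directly. The statements below are a hedged sketch (the table names are illustrative, and the exact error messages may differ) of DDL that is expected to fail:

```sql
-- Expected to fail: JSON columns cannot have default values.
CREATE TABLE t_bad1 (id INT PRIMARY KEY, doc JSON DEFAULT '{}');

-- Expected to fail: JSON columns cannot be used as partitioning keys.
CREATE TABLE t_bad2 (id INT, doc JSON) PARTITION BY KEY(doc) PARTITIONS 4;

-- Expected to fail: JSON columns cannot be used as PRIMARY KEY.
CREATE TABLE t_bad3 (doc JSON PRIMARY KEY);
```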
### Examples

#### Create or modify JSON columns

```sql
obclient> CREATE TABLE tbl1 (id INT PRIMARY KEY, docs JSON NOT NULL, docs1 JSON);
Query OK, 0 rows affected

obclient> ALTER TABLE tbl1 MODIFY docs JSON CHECK(docs < '{"a" : 100}');
Query OK, 0 rows affected

obclient> CREATE TABLE json_tab(
    id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT COMMENT 'Primary key',
    json_info JSON COMMENT 'JSON data',
    json_id INT GENERATED ALWAYS AS (json_info -> '$.id') COMMENT 'Virtual field from JSON data',
    json_name VARCHAR(5) GENERATED ALWAYS AS (json_info -> '$.NAME'),
    INDEX json_info_id_idx (json_id)
) COMMENT 'Example JSON table';
Query OK, 0 rows affected

obclient> ALTER TABLE json_tab ADD COLUMN json_info1 JSON;
Query OK, 0 rows affected

obclient> ALTER TABLE json_tab ADD INDEX (json_name);
Query OK, 0 rows affected

obclient> ALTER TABLE json_tab DROP COLUMN json_info1;
Query OK, 0 rows affected
```
#### Create an index on a specific key using a generated column

```sql
obclient> CREATE TABLE jn (c JSON, g INT GENERATED ALWAYS AS (c->"$.id"));
Query OK, 0 rows affected

obclient> CREATE INDEX idx1 ON jn(g);
Query OK, 0 rows affected
Records: 0  Duplicates: 0  Warnings: 0

obclient> INSERT INTO jn (c) VALUES
    ('{"id": "1", "name": "Fred"}'), ('{"id": "2", "name": "Wilma"}'),
    ('{"id": "3", "name": "Barney"}'), ('{"id": "4", "name": "Betty"}');
Query OK, 4 rows affected
Records: 4  Duplicates: 0  Warnings: 0

obclient> SELECT c->>"$.name" AS name FROM jn WHERE g <= 2;
+-------+
| name  |
+-------+
| Fred  |
| Wilma |
+-------+
2 rows in set

obclient> EXPLAIN SELECT c->>"$.name" AS name FROM jn WHERE g <= 2\G
*************************** 1. row ***************************
Query Plan: =======================================
|ID|OPERATOR  |NAME    |EST. ROWS|COST|
---------------------------------------
|0 |TABLE SCAN|jn(idx1)|2        |92  |
=======================================

Outputs & filters:
-------------------------------------
  0 - output([JSON_UNQUOTE(JSON_EXTRACT(jn.c, '$.name'))]), filter(nil),
      access([jn.c]), partitions(p0)

1 row in set
```
#### Use semi-structured encoding

seekdb supports enabling semi-structured encoding when creating tables, primarily controlled by the table-level parameter `SEMISTRUCT_PROPERTIES`. You must also set `ROW_FORMAT=COMPRESSED` for the table; otherwise, an error occurs:

* When `SEMISTRUCT_PROPERTIES=(encoding_type=encoding)`, the table is considered a semi-structured table, meaning all JSON columns in the table will have semi-structured encoding enabled.
* When `SEMISTRUCT_PROPERTIES=(encoding_type=none)`, the table is considered a structured table.
* You can also set the frequency threshold using the `freq_threshold` parameter.
* Currently, `encoding_type` and `freq_threshold` can only be modified using online DDL statements, not offline DDL statements.

1. Enable semi-structured encoding.

    :::tip
    If you enable semi-structured encoding, make sure that the parameter <a href="https://en.oceanbase.com/docs/common-oceanbase-database-10000000001971939">micro_block_merge_verify_level</a> is set to the default value <code>2</code>. Do not disable micro-block major compaction verification.
    :::

    :::tab
    tab Example: Enable semi-structured encoding during table creation

    ```sql
    CREATE TABLE t1(j JSON)
    ROW_FORMAT=COMPRESSED
    SEMISTRUCT_PROPERTIES=(encoding_type=encoding, freq_threshold=50);
    ```

    For more information about the syntax, see [CREATE TABLE](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001974140).

    tab Example: Enable semi-structured encoding for an existing table

    ```sql
    CREATE TABLE t1(j JSON);
    ALTER TABLE t1 SET ROW_FORMAT=COMPRESSED SEMISTRUCT_PROPERTIES = (encoding_type=encoding, freq_threshold=50);
    ```

    For more information about the syntax, see [ALTER TABLE](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001974126).

    Some modification limitations:

    * If semi-structured encoding is not enabled, modifying the frequent-column threshold does not report an error but has no effect.
    * The `freq_threshold` parameter cannot be modified during direct load operations or when the table is locked.
    * Modifying one sub-parameter does not affect the others.
    :::

2. Disable semi-structured encoding.

    When `SEMISTRUCT_PROPERTIES` is set to `(encoding_type=none)`, semi-structured encoding is disabled. This operation does not affect existing data and only applies to data written afterward. Here is an example of disabling semi-structured encoding:

    ```sql
    ALTER TABLE t1 SET ROW_FORMAT=COMPRESSED SEMISTRUCT_PROPERTIES = (encoding_type=none);
    ```

3. Query the semi-structured encoding configuration.

    Use the `SHOW CREATE TABLE` statement to query the semi-structured encoding configuration. Here is an example statement:

    ```sql
    SHOW CREATE TABLE t1;
    ```

    The result is as follows:

    ```shell
    +-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Table | Create Table                                                                                                                                                                                                                                                                               |
    +-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | t1    | CREATE TABLE `t1` (
      `j` json DEFAULT NULL
    ) ORGANIZATION INDEX DEFAULT CHARSET = utf8mb4 ROW_FORMAT = COMPRESSED COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE ENABLE_MACRO_BLOCK_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 SEMISTRUCT_PROPERTIES=(ENCODING_TYPE=ENCODING, FREQ_THRESHOLD=50) |
    +-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set
    ```

    When `SEMISTRUCT_PROPERTIES=(encoding_type=encoding)` is specified, the query displays this parameter, indicating that semi-structured encoding is enabled.

Using semi-structured encoding can improve the performance of conditional filtering queries with the [JSON_VALUE() function](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001975890). Because JSON data is split into sub-columns, the system can filter directly on the encoded sub-column data without reconstructing the complete JSON document, which significantly improves query efficiency.

Here is an example query:

```sql
-- Query rows where the value of the name field is 'Devin'
SELECT * FROM t WHERE JSON_VALUE(j_doc, '$.name' RETURNING CHAR) = 'Devin';
```

Character set considerations:

- seekdb uses `utf8_bin` encoding for JSON.

- To ensure that string white-box filtering works properly, we recommend the following settings:

  ```sql
  SET @@collation_server = 'utf8mb4_bin';
  SET @@collation_connection = 'utf8mb4_bin';
  ```
@@ -0,0 +1,174 @@
---
slug: /querying-and-modifying-json-values
---

# Query and modify JSON values

seekdb supports querying and referencing JSON values. Using path expressions, you can extract or modify specific portions of a JSON document.

## Reference JSON values

seekdb provides two methods for querying and referencing JSON values:

* Use the `->` operator to return the value of a key in JSON data with the double quotes retained.

* Use the `->>` operator to return the value of a key in JSON data with the double quotes removed.

Examples:

```sql
obclient> SELECT c->"$.name" AS name FROM jn WHERE g <= 2;
+---------+
| name    |
+---------+
| "Fred"  |
| "Wilma" |
+---------+
2 rows in set

obclient> SELECT c->>"$.name" AS name FROM jn WHERE g <= 2;
+-------+
| name  |
+-------+
| Fred  |
| Wilma |
+-------+
2 rows in set

obclient> SELECT JSON_UNQUOTE(c->'$.name') AS name
          FROM jn WHERE g <= 2;
+-------+
| name  |
+-------+
| Fred  |
| Wilma |
+-------+
2 rows in set
```
Because JSON documents are hierarchical, JSON functions use path expressions to extract or modify portions of a document and to specify where in the document the operation should occur.

seekdb uses a path syntax consisting of a leading `$` character, which represents the JSON document being accessed, followed by selectors. The selector types are as follows:

* The `.` symbol is followed by the key name to access. A key name that is not valid without quotation marks in a path expression (for example, a name containing spaces) must be enclosed in double quotes.

  Example:

  ```sql
  obclient> SELECT JSON_EXTRACT('{"id": 14, "name": "Aztalan"}', '$.name');
  +---------------------------------------------------------+
  | JSON_EXTRACT('{"id": 14, "name": "Aztalan"}', '$.name') |
  +---------------------------------------------------------+
  | "Aztalan"                                               |
  +---------------------------------------------------------+
  1 row in set
  ```

* The `[N]` symbol is appended to the path of the selected array and represents the value at position N in the array, where N is a non-negative integer. Array positions are zero-indexed. If `path` does not select an array value, then `path[0]` evaluates to the same value as `path`.

  Example:

  ```sql
  obclient> SELECT JSON_SET('"x"', '$[0]', 'a');
  +------------------------------+
  | JSON_SET('"x"', '$[0]', 'a') |
  +------------------------------+
  | "a"                          |
  +------------------------------+
  1 row in set
  ```

* The `[M to N]` symbol specifies a subset or range of array values, starting from position M and ending at position N.

  Example:

  ```sql
  obclient> SELECT JSON_EXTRACT('[1, 2, 3, 4, 5]', '$[1 to 3]');
  +----------------------------------------------+
  | JSON_EXTRACT('[1, 2, 3, 4, 5]', '$[1 to 3]') |
  +----------------------------------------------+
  | [2, 3, 4]                                    |
  +----------------------------------------------+
  1 row in set
  ```

* Path expressions can also include the `*` or `**` wildcard characters, as shown in the example after this list:

  * `.*` represents the values of all members in a JSON object.

  * `[*]` represents the values of all elements in a JSON array.

  * `prefix**suffix` represents all paths that begin with the specified prefix and end with the specified suffix. The prefix is optional, but the suffix is required. Using `**` or `***` alone to match arbitrary paths is not allowed.

:::info
Paths that do not exist in the document (evaluating to non-existent data) evaluate to <code>NULL</code>.
:::
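The following sketch illustrates the wildcard selectors and the behavior for non-existent paths. The queries use MySQL-compatible JSON path semantics, which seekdb is assumed to follow; the literal values are illustrative only:

```sql
-- `.*`: all member values of the object.
SELECT JSON_EXTRACT('{"a": 1, "b": 2}', '$.*');                  -- [1, 2]

-- `[*]`: all elements of the array.
SELECT JSON_EXTRACT('[10, 20, 30]', '$[*]');                     -- [10, 20, 30]

-- `prefix**suffix`: every path that ends with key b, at any depth.
SELECT JSON_EXTRACT('{"a": {"b": 1}, "c": {"b": 2}}', '$**.b');  -- [1, 2]

-- A path that does not exist evaluates to NULL.
SELECT JSON_EXTRACT('{"a": 1}', '$.missing');                    -- NULL
```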
## Modify JSON values

seekdb also supports modifying complete JSON values using DML statements, and modifying partial JSON values using the JSON_SET(), JSON_REPLACE(), or JSON_REMOVE() functions in `UPDATE` statements.

Examples:

```sql
-- Insert complete data.
INSERT INTO json_tab(json_info) VALUES ('[1, {"a": "b"}, [2, "qwe"]]');

-- Append partial data.
UPDATE json_tab SET json_info=JSON_ARRAY_APPEND(json_info, '$', 2) WHERE id=1;

-- Update complete data.
UPDATE json_tab SET json_info='[1, {"a": "b"}]';

-- Update partial data.
UPDATE json_tab SET json_info=JSON_REPLACE(json_info, '$[2]', 'aaa') WHERE id=1;

-- Delete data.
DELETE FROM json_tab WHERE id=1;

-- Update partial data using a function.
UPDATE json_tab SET json_info=JSON_REMOVE(json_info, '$[2]') WHERE id=1;
```
## JSON path syntax

A path consists of a scope and one or more path segments. For paths used in JSON functions, the scope is the document being searched or otherwise operated on, represented by the leading `$` character.

Path segments are separated by periods (.). Array elements are represented by `[N]`, where N is a non-negative integer. Key names must be either double-quoted strings or valid ECMAScript identifiers.

Path expressions (like JSON text) should be encoded using the ascii, utf8, or utf8mb4 character set. Other character encodings are implicitly converted to utf8mb4.

The complete syntax is as follows:

```sql
pathExpression:                 // Path expression
    scope[(pathLeg)*]           // Scope is represented by the leading $ character

pathLeg:
    member | arrayLocation | doubleAsterisk

member:
    period ( keyName | asterisk )

arrayLocation:
    leftBracket ( nonNegativeInteger | asterisk ) rightBracket

keyName:
    ESIdentifier | doubleQuotedString

doubleAsterisk:
    '**'

period:
    '.'

asterisk:
    '*'

leftBracket:
    '['

rightBracket:
    ']'
```
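To connect the grammar above to a concrete path, the following sketch combines a double-quoted key name (a `member` leg with a `doubleQuotedString` key) and an array position (an `arrayLocation` leg). The behavior shown is MySQL-compatible and assumed to carry over to seekdb; the document literal is illustrative:

```sql
-- '$."a b"[1]': member leg with a double-quoted key, then arrayLocation [1].
SELECT JSON_EXTRACT('{"a b": [10, 20, 30]}', '$."a b"[1]');   -- 20
```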
@@ -0,0 +1,54 @@
---
slug: /json-formatted-data-type-conversion
---

# Convert JSON data types

seekdb supports the `CAST()` function for converting between JSON and other data types.

The following table describes the conversion rules for JSON data types.

| Other data types | CAST(other_type AS JSON) | CAST(JSON AS other_type) |
|------------------|--------------------------|--------------------------|
| JSON | No change. | No change. |
| UTF-8 character types (including utf8mb4, utf8, and ascii) | The characters are converted to JSON values and validated. | The data is serialized into utf8mb4 strings. |
| Other character sets | The data is first converted to utf8mb4 encoding and then processed as a UTF-8 character type. | The data is first serialized into utf8mb4-encoded strings and then converted to the corresponding character set. |
| NULL | NULL is returned. | Not applicable. |
| Other types | Only scalar values are converted, each to a JSON value containing that single value. | If the JSON value contains only one scalar value that matches the target type, it is converted to the corresponding type; otherwise, NULL is returned and a warning is issued. |

:::info
<code>other_type</code> specifies a data type other than JSON.
:::

Here are some conversion examples:

```sql
obclient> SELECT CAST("123" AS JSON);
+---------------------+
| CAST("123" AS JSON) |
+---------------------+
| 123                 |
+---------------------+
1 row in set

obclient> SELECT CAST(null AS JSON);
+--------------------+
| CAST(null AS JSON) |
+--------------------+
| NULL               |
+--------------------+
1 row in set

CREATE TABLE tj1 (c1 JSON, c2 VARCHAR(20));
INSERT INTO tj1 VALUES ('{"id": 17, "color": "red"}','apple'),('{"id": 18, "color": "yellow"}', 'banana'),('{"id": 16, "color": "orange"}','orange');
obclient> SELECT * FROM tj1 ORDER BY CAST(JSON_EXTRACT(c1, '$.id') AS UNSIGNED);
+-------------------------------+--------+
| c1                            | c2     |
+-------------------------------+--------+
| {"id": 16, "color": "orange"} | orange |
| {"id": 17, "color": "red"}    | apple  |
| {"id": 18, "color": "yellow"} | banana |
+-------------------------------+--------+
3 rows in set
```
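The reverse direction, `CAST(JSON AS other_type)`, can be sketched in the same way. This is a hedged illustration of the last table row; the literal values are illustrative:

```sql
-- A JSON value holding a single scalar that matches the target type is converted.
SELECT CAST(CAST('123' AS JSON) AS UNSIGNED);

-- A non-scalar JSON value cannot be converted to a number:
-- per the table above, NULL is returned and a warning is issued.
SELECT CAST(CAST('[1, 2, 3]' AS JSON) AS UNSIGNED);
```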
@@ -0,0 +1,328 @@
---
slug: /json-partial-update
---

# Partial JSON data updates

seekdb supports partial JSON data updates (JSON Partial Update). When only specific fields in a JSON document need to be modified, this feature allows you to update only the changed portions without having to update the entire JSON document.

## Limitations

## Enable or disable JSON Partial Update

The JSON Partial Update feature in seekdb is disabled by default. It is controlled by the system variable `log_row_value_options`. For more information, see [log_row_value_options](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001972193).

**Here are some examples:**

* Enable the JSON Partial Update feature.

  * Session level:

    ```sql
    SET log_row_value_options="partial_json";
    ```

  * Global level:

    ```sql
    SET GLOBAL log_row_value_options="partial_json";
    ```

* Disable the JSON Partial Update feature.

  * Session level:

    ```sql
    SET log_row_value_options="";
    ```

  * Global level:

    ```sql
    SET GLOBAL log_row_value_options="";
    ```

* Query the value of `log_row_value_options`.

  ```sql
  SHOW VARIABLES LIKE 'log_row_value_options';
  ```

  The result is as follows:

  ```sql
  +-----------------------+-------+
  | Variable_name         | Value |
  +-----------------------+-------+
  | log_row_value_options |       |
  +-----------------------+-------+
  1 row in set
  ```
## JSON expressions for partial updates

In addition to enabling the JSON Partial Update switch `log_row_value_options`, you must update JSON documents with specific expressions to trigger a partial update.

The following JSON expressions in seekdb currently support partial updates:

* json_set or json_replace: updates the value of a JSON field.
* json_remove: deletes a JSON field.

:::tip
<ol><li>Ensure that the left operand of the <code>SET</code> assignment clause and the first parameter of the JSON expression are the same and both are JSON columns in the table. For example, in <code>j = json_replace(j, '$.name', 'ab')</code>, the parameter on the left side of the equals sign and the first parameter of the JSON expression <code>json_replace</code> on the right side are both <code>j</code>.</li><li>JSON Partial Update is only triggered when the current JSON column data is stored as <code>outrow</code>. Whether data is stored as <code>outrow</code> or <code>inrow</code> is controlled by the <code>lob_inrow_threshold</code> parameter when creating the table. <code>lob_inrow_threshold</code> configures the <code>INROW</code> threshold: when the LOB data size exceeds this threshold, the data is stored as <code>OUTROW</code> in the LOB meta table. The default value is 4 KB.</li></ol>
:::
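As a hedged sketch of the second point in the tip above: assuming `lob_inrow_threshold` can be set as a table-level option at creation time (the option name comes from the tip, but the exact syntax below is an assumption), lowering it forces even small JSON values to be stored as `OUTROW`, so that partial updates can be triggered:

```sql
-- Assumed syntax: store any LOB/JSON value larger than 0 bytes as OUTROW.
CREATE TABLE json_outrow_demo (
  pk INT PRIMARY KEY,
  j  JSON
) LOB_INROW_THRESHOLD = 0;
```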
**Examples:**

1. Create a table named `json_test`.

    ```sql
    CREATE TABLE json_test(pk INT PRIMARY KEY, j JSON);
    ```

2. Insert data.

    ```sql
    INSERT INTO json_test VALUES(1, CONCAT('{"name": "John", "content": "', repeat('x',8), '"}'));
    ```

    The result is as follows:

    ```shell
    Query OK, 1 row affected
    ```

3. Query the data in the JSON column `j`.

    ```sql
    SELECT j FROM json_test;
    ```

    The result is as follows:

    ```shell
    +-----------------------------------------+
    | j                                       |
    +-----------------------------------------+
    | {"name": "John", "content": "xxxxxxxx"} |
    +-----------------------------------------+
    1 row in set
    ```

4. Use `json_replace` to update the value of the `name` field in the JSON column.

    ```sql
    UPDATE json_test SET j = json_replace(j, '$.name', 'ab') WHERE pk = 1;
    ```

    Result:

    ```shell
    Query OK, 1 row affected
    Rows matched: 1  Changed: 1  Warnings: 0
    ```

5. Query the modified data in the JSON column `j`.

    ```sql
    SELECT j FROM json_test;
    ```

    Result:

    ```shell
    +---------------------------------------+
    | j                                     |
    +---------------------------------------+
    | {"name": "ab", "content": "xxxxxxxx"} |
    +---------------------------------------+
    1 row in set
    ```

6. Use `json_set` to update the value of the `name` field in the JSON column.

    ```sql
    UPDATE json_test SET j = json_set(j, '$.name', 'cd') WHERE pk = 1;
    ```

    Result:

    ```shell
    Query OK, 1 row affected
    Rows matched: 1  Changed: 1  Warnings: 0
    ```

7. Query the modified data in the JSON column `j`.

    ```sql
    SELECT j FROM json_test;
    ```

    Result:

    ```shell
    +---------------------------------------+
    | j                                     |
    +---------------------------------------+
    | {"name": "cd", "content": "xxxxxxxx"} |
    +---------------------------------------+
    1 row in set
    ```

8. Use `json_remove` to delete the `name` field from the JSON column.

    ```sql
    UPDATE json_test SET j = json_remove(j, '$.name') WHERE pk = 1;
    ```

    Result:

    ```shell
    Query OK, 1 row affected
    Rows matched: 1  Changed: 1  Warnings: 0
    ```

9. Query the modified data in the JSON column `j`.

    ```sql
    SELECT j FROM json_test;
    ```

    Result:

    ```shell
    +-------------------------+
    | j                       |
    +-------------------------+
    | {"content": "xxxxxxxx"} |
    +-------------------------+
    1 row in set
    ```
## Granularity of updates

JSON data in seekdb is stored as LOB data, and LOBs in seekdb are stored in chunks at the underlying level. Therefore, the minimum amount of data written by each partial update is one LOB chunk. The smaller the LOB chunk, the smaller the amount of data written. seekdb provides DDL syntax to set the LOB chunk size, which can be specified when creating a column.

**Example:**

```sql
CREATE TABLE json_test(pk INT PRIMARY KEY, j JSON CHUNK '4k');
```

The chunk size cannot be arbitrarily small, because a size that is too small affects the performance of `SELECT`, `INSERT`, and `DELETE` operations. It is generally recommended to set it based on the average field size of your JSON documents. If most fields are very small, you can set it to 1K. To optimize LOB reads, seekdb stores data smaller than 4K directly as `INROW`, in which case partial updates are not performed. Partial Update is mainly intended to improve the performance of updating large documents; for small documents, full updates actually perform better.

## Rebuild

JSON Partial Update does not impose restrictions on the data length before and after updating a JSON column. When the length of the new value is less than or equal to the length of the old value, the data at the original location is directly replaced with the new data. When the length of the new value is greater than the length of the old value, the new data is appended at the end. seekdb sets a threshold: when the length of the appended data exceeds 30% of the original data length, a rebuild is triggered. In this case, a partial update is not performed; instead, a full overwrite is performed.

You can use the `JSON_STORAGE_SIZE` expression to get the actual storage length of JSON data, and `JSON_STORAGE_FREE` to get the additional storage overhead.

**Example:**

1. Enable JSON Partial Update.

    ```sql
    SET log_row_value_options = "partial_json";
    ```

2. Create a test table named `json_test`.

    ```sql
    CREATE TABLE json_test(pk INT PRIMARY KEY, j JSON CHUNK '1K');
    ```

3. Insert a row of data into the `json_test` table.

    ```sql
    INSERT INTO json_test VALUES(10 , json_object('name', 'zero', 'age', 100, 'position', 'software engineer', 'profile', repeat('x', 4096), 'like', json_array('a', 'b', 'c'), 'tags', json_array('sql boy', 'football', 'summer', 1), 'money' , json_object('RMB', 10000, 'Dollers', 20000, 'BTC', 100), 'nickname', 'noone'));
    ```

    Result:

    ```shell
    Query OK, 1 row affected
    ```

4. Use `JSON_STORAGE_SIZE` to query the storage size of the JSON column (the actual occupied storage space) and `JSON_STORAGE_FREE` to estimate the storage space that can be freed from the JSON column.

    ```sql
    SELECT JSON_STORAGE_SIZE(j), JSON_STORAGE_FREE(j) FROM json_test WHERE pk = 10;
    ```

    Result:

    ```shell
    +----------------------+----------------------+
    | JSON_STORAGE_SIZE(j) | JSON_STORAGE_FREE(j) |
    +----------------------+----------------------+
    |                 4335 |                    0 |
    +----------------------+----------------------+
    1 row in set
    ```

    Since no partial update has been performed yet, the value of `JSON_STORAGE_FREE` is 0.

5. Use `json_replace` to update the value of the `position` field in the JSON column, where the length of the new value is less than the length of the old value.

    ```sql
    UPDATE json_test SET j = json_replace(j, '$.position', 'software enginee') WHERE pk = 10;
    ```

    Result:

    ```shell
    Query OK, 1 row affected
    Rows matched: 1  Changed: 1  Warnings: 0
    ```

6. Again, use `JSON_STORAGE_SIZE` to query the storage size of the JSON column and `JSON_STORAGE_FREE` to estimate the storage space that can be freed from the JSON column.

    ```sql
    SELECT JSON_STORAGE_SIZE(j), JSON_STORAGE_FREE(j) FROM json_test WHERE pk = 10;
    ```

    Result:

    ```shell
    +----------------------+----------------------+
    | JSON_STORAGE_SIZE(j) | JSON_STORAGE_FREE(j) |
    +----------------------+----------------------+
    |                 4335 |                    1 |
    +----------------------+----------------------+
    1 row in set
    ```

    After the update, because the new data is one byte shorter than the old data, `JSON_STORAGE_FREE` returns 1.

7. Use `json_replace` to update the value of the `position` field in the JSON column, where the length of the new value is greater than the length of the old value.

    ```sql
    UPDATE json_test SET j = json_replace(j, '$.position', 'software engineer') WHERE pk = 10;
    ```

    Result:

    ```shell
    Query OK, 1 row affected
    Rows matched: 1  Changed: 1  Warnings: 0
    ```

8. Use `JSON_STORAGE_SIZE` again to query the storage size of the JSON column, and `JSON_STORAGE_FREE` to estimate the storage space that can be freed from the JSON column.

    ```sql
    SELECT JSON_STORAGE_SIZE(j), JSON_STORAGE_FREE(j) FROM json_test WHERE pk = 10;
    ```

    Result:

    ```shell
    +----------------------+----------------------+
    | JSON_STORAGE_SIZE(j) | JSON_STORAGE_FREE(j) |
    +----------------------+----------------------+
    |                 4355 |                   19 |
    +----------------------+----------------------+
    1 row in set
    ```

    After new data is appended to the JSON column, `JSON_STORAGE_FREE` returns 19, indicating that 19 bytes can be freed after a rebuild.
@@ -0,0 +1,124 @@
---
slug: /json-semi-struct
---

# Semi-structured encoding

This topic describes the semi-structured encoding feature supported by seekdb.

seekdb supports enabling semi-structured encoding when creating tables, primarily controlled by the table-level parameter `SEMISTRUCT_PROPERTIES`. You must also set `ROW_FORMAT=COMPRESSED` for the table; otherwise, an error occurs.

## Considerations

* When `SEMISTRUCT_PROPERTIES=(encoding_type=encoding)`, the table is considered a semi-structured table, meaning all JSON columns in the table will have semi-structured encoding enabled.
* When `SEMISTRUCT_PROPERTIES=(encoding_type=none)`, the table is considered a structured table.
* You can also set the frequency threshold using the `freq_threshold` parameter. When semi-structured encoding is enabled, the system analyzes the frequency of each path in the JSON data and stores paths with frequencies exceeding the specified threshold as independent subcolumns, known as frequent columns. For example, if you have a user table where the JSON field stores user information and 90% of users have the `name` and `age` fields, the system will automatically extract `name` and `age` as independent frequent columns. During queries, these columns are accessed directly without parsing the entire JSON, thereby improving query performance.
* Currently, `encoding_type` and `freq_threshold` can only be modified using online DDL statements, not offline DDL statements.

## Data format

JSON data is split and stored as structured columns in a specific format. The columns split from JSON columns are called subcolumns. Subcolumns fall into different types, including sparse columns and frequent columns.

* Sparse columns: Subcolumns that exist in some JSON documents but not in others, with an occurrence frequency lower than the threshold specified by the table-level parameter `freq_threshold`.
* Frequent columns: Subcolumns that appear in the JSON data with a frequency higher than the threshold specified by the table-level parameter `freq_threshold`. These subcolumns are stored as independent columns to improve filtering query performance.

For example:

```sql
{"id": 1001, "name": "n1", "nickname": "nn1"}
{"id": 1002, "name": "n2", "nickname": "nn2"}
{"id": 1003, "name": "n3", "nickname": "nn3"}
{"id": 1004, "name": "n4", "nickname": "nn4"}
{"id": 1005, "name": "n5"}
```

In this example, `id` and `name` are fields that exist in every JSON document, with an occurrence frequency of 100%, while `nickname` exists in only four of the five JSON documents, with an occurrence frequency of 80%.

If `freq_threshold` is set to 100%, `nickname` is inferred as a sparse column, while `id` and `name` are inferred as frequent columns. If it is set to 80%, `nickname`, `id`, and `name` are all inferred as frequent columns.
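Tying the example above to DDL, the following sketch (the table and column names are illustrative) creates a semi-structured table whose threshold of 80 would classify `id`, `name`, and `nickname` as frequent columns, and then loads the five sample documents:

```sql
CREATE TABLE user_semi (j JSON)
ROW_FORMAT=COMPRESSED
SEMISTRUCT_PROPERTIES=(encoding_type=encoding, freq_threshold=80);

INSERT INTO user_semi VALUES
('{"id": 1001, "name": "n1", "nickname": "nn1"}'),
('{"id": 1002, "name": "n2", "nickname": "nn2"}'),
('{"id": 1003, "name": "n3", "nickname": "nn3"}'),
('{"id": 1004, "name": "n4", "nickname": "nn4"}'),
('{"id": 1005, "name": "n5"}');
```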
## Examples

1. Enable semi-structured encoding.

    :::tip
    If you enable semi-structured encoding, make sure that the parameter <a href="https://en.oceanbase.com/docs/common-oceanbase-database-10000000001971939">micro_block_merge_verify_level</a> is set to the default value <code>2</code>. Do not disable micro-block major compaction verification.
    :::

    :::tab
    tab Example: Enable semi-structured encoding during table creation

    ```sql
    CREATE TABLE t1(j JSON)
    ROW_FORMAT=COMPRESSED
    SEMISTRUCT_PROPERTIES=(encoding_type=encoding, freq_threshold=50);
    ```

    For more information about the syntax, see [CREATE TABLE](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001974140).

    tab Example: Enable semi-structured encoding for an existing table

    ```sql
    CREATE TABLE t1(j JSON);
    ALTER TABLE t1 SET ROW_FORMAT=COMPRESSED SEMISTRUCT_PROPERTIES = (encoding_type=encoding, freq_threshold=50);
    ```

    For more information about the syntax, see [ALTER TABLE](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001974126).

    Some modification limitations:

    * If semi-structured encoding is not enabled, modifying the frequent-column threshold does not report an error but has no effect.
    * The `freq_threshold` parameter cannot be modified during direct load operations or when the table is locked.
    * Modifying one sub-parameter does not affect the others.
    :::

2. Disable semi-structured encoding.

    When `SEMISTRUCT_PROPERTIES` is set to `(encoding_type=none)`, semi-structured encoding is disabled. This operation does not affect existing data and only applies to data written afterward. Here is an example of disabling semi-structured encoding:

    ```sql
    ALTER TABLE t1 SET ROW_FORMAT=COMPRESSED SEMISTRUCT_PROPERTIES = (encoding_type=none);
    ```

3. Query the semi-structured encoding configuration.

    Use the `SHOW CREATE TABLE` statement to query the semi-structured encoding configuration. Here is an example statement:

    ```sql
    SHOW CREATE TABLE t1;
    ```

    The result is as follows:

    ```shell
    +-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Table | Create Table                                                                                                                                                                                                                                                                               |
    +-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | t1    | CREATE TABLE `t1` (
      `j` json DEFAULT NULL
    ) ORGANIZATION INDEX DEFAULT CHARSET = utf8mb4 ROW_FORMAT = COMPRESSED COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE ENABLE_MACRO_BLOCK_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 SEMISTRUCT_PROPERTIES=(ENCODING_TYPE=ENCODING, FREQ_THRESHOLD=50) |
    +-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set
    ```

    When `SEMISTRUCT_PROPERTIES=(encoding_type=encoding)` is specified, the query displays this parameter, indicating that semi-structured encoding is enabled.

Using semi-structured encoding can improve the performance of conditional filtering queries with the [JSON_VALUE() function](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001975890). Because JSON data is split into sub-columns, the system can filter directly on the encoded sub-column data without reconstructing the complete JSON document, which significantly improves query efficiency.

Here is an example query:

```sql
-- Query rows where the value of the name field is 'Devin'
SELECT * FROM t WHERE JSON_VALUE(j_doc, '$.name' RETURNING CHAR) = 'Devin';
```

Character set considerations:

- seekdb uses `utf8_bin` encoding for JSON.

- To ensure that string white-box filtering works properly, we recommend the following settings:

  ```sql
  SET @@collation_server = 'utf8mb4_bin';
  SET @@collation_connection = 'utf8mb4_bin';
  ```