Initial commit

2025-11-30 09:02:54 +08:00
commit ca2c452ded
5 changed files with 300 additions and 0 deletions
--- a/skills/activations/SKILL.md
+++ b/skills/activations/SKILL.md
@@ -0,0 +1,133 @@
+---
+name: activations
+description: Query activation logs to check for errors and view volume
+---
+
+# Activations
+
+Treasue Data's real time services include digital marketing activations. These are sent to various destinations. This skill can be used to query the database table that stores activations logs for both successful and failed activations, which is a source of useful information for the digital marketer.
+
+# Description
+
+## Requirements
+
+In order to query the activation log we must know the parent segment the customer is interested in. A customer may have a number of parent segments so we must ask them to provide the one they are interested before making a query. A segment ID will be a numeric value like 411671.
+
+The user must also have a correctly configured Treasure Data mcp server to enable the database lookup @treasuredata/mcp-server
+
+In addition the api key with appropriate access to the database table should be available and configured.
+
+## Database
+
+The database name contains the parent segment ID and has this format cdp_audience_394649_rt. This is where you can plug in the parent segment the user gives in the request.
+
+## Activations table
+
+The activation log table is always called activations and is in the Parent Segment real time database.
+
+## Schema
+
+The activations table has the following schema. The format is of the schema below is:
+
+1. Schema column index (example 1)
+2. Column name (example time)
+3. Query Column name (example time) (generally can be ignored for activations)
+4. Data type (string, time, int)
+
+1
+time
+time
+int
+2
+delivered
+delivered
+string
+--
+3
+status
+status
+long
+--
+4
+timestamp
+timestamp
+long
+--
+5
+activation_type
+activation_type
+string
+--
+6
+log_time
+log_time
+long
+--
+7
+journey_name
+journey_name
+string
+--
+8
+journey_stage_name
+journey_stage_name
+string
+--
+9
+activation_name
+activation_name
+string
+--
+10
+rid
+rid
+string
+--
+11
+error
+error
+string
+--
+12
+activation_id
+activation_id
+string
+--
+13
+event_id
+event_id
+string
+--
+14
+response
+response
+string
+--
+
+For the purposes of making queries and talking about activations with the user here are the meanings of the columns.
+
+time - epoch time of the activation send attempt
+delivered - true if successfully sent and false otherwise
+status - numeric status code corresponding to http response return codes
+timestamp 
+activation_type - the text name of the activation. td_webhook_out for example
+log_time - 
+journey_name - The real time journey this is related to. For example journey_11738
+journey_stage_name - The stage of the real time journey. For example journey_stage_15055.
+activation_name - Customer's name for the activation. For example first activation_14136
+error - When an error occurs the text will be here. For example "AxiosError: Request failed with status code 404"
+activation_id - Unique identifier of the activation.
+event_id - Unique identifier for the event which triggered the activation.
+response - Contains text that was returned from the activation server. This occurs whether or not the request succeeded. Example `{"success":false,"error":{"message":"Token \"c58dd43b-5bbc-4d21-a81a-e8c5643bcc18\" not found","id":""}}` It can be useful to show the response when errors are involved.
+
+## Query tips
+
+If the user does not specify a time range assume the last 24 hours. Use a where clause like `TD_INTERVAL(time, '-1d/now')`
+When users are asking "What kind of errors are occuring" focus on unique error status and error texts so they can quickly identify issues.
+Provide summary data where appropriate. For example the customer may ask for "Which journeys are triggering the most activations", or "group the number of activations by journey". Use unique and other SQL constructs to make useful aggregations on the data.
+
+
+
+
+
+
--- a/skills/identity/SKILL.md
+++ b/skills/identity/SKILL.md
@@ -0,0 +1,103 @@
+---
+name: identity
+description: Query identity change logs to explore profile creation and merging
+---
+
+# Identity
+
+This skill is used when the user asks about identity stitching, id stitching or wants to explore real time profiles. It involves querying a table called id_changes which is a log updated by the application when id changes occur.
+
+Treasure Data's real time service includes creation and management of real time profiles. Profiles are updated based on a stream of event records that contain one or more "id stitching key" properties. The identity of a profile consists of both an internal Rid (real time Id) and one or more stitching keys. The system creates new profiles when the keys are all unique. It updates a profile when a new key is found that was not previously associated with other keys in the record. It can also stitch multiple profiles together when we find that keys in the record exist in two or more existing profiles. We combine the profiles together and the rid with the earliest date is the kept whilst the other profiles are merged and deleted. 
+
+# Description
+
+## Requirements
+
+In order to query the id changes log we must know the parent segment the customer is interested in. A customer may have a number of parent segments so we must ask them to provide the one they are interested in before making a query. A segment ID will be a numeric value like 411671.
+
+The user must also have a correctly configured tdx-skill or the Treasure Data mcp server (@treasuredata/mcp-server) to enable the database lookup.
+
+In addition the api key with appropriate access to the database table should be available and configured.
+
+## Database
+
+The database name contains the parent segment ID and has this format cdp_audience_394649_rt. This is where you can plug in the parent segment the user gives in the request.
+
+## id_changes table
+
+The id_changes log table is always called id_changes and is in the Parent Segment real time database.
+
+## Schema
+
+The id_changes table has the following schema. The format of the schema below is:
+
+column name, type, description
+time, int, unix timestamp of when the record was logged
+event_type, string, always "id". It is not informative.
+key_values, string, always "NA" or null. It has no information.
+reactor_instance_id, string, for customer support assistance this can be useful for the engineering team
+profile_change_type, string, "profile_added" or "profile_updated_by_stitching" or "profile_deleted_by_stitching" that determines what caused the change
+current_id_attributes, string, A json array of key value colon separated strings showing the known id keys for the profile. For example ["customer_id:3001"] This is only valid when profile_change_type is profile_added
+current_rid, string, internal unique identifier for this realtime profile in UUID v7 string format for example 0199f0ad-13b8-7c3d-9fab-b5fb14ebe7cc
+merged_rids, string, json string showing which rids were merged during a stitching process as a json array for example ["019a9e30-8407-76ff-a67b-b1576b72cf12"]
+
+
+## Understanding stitching
+
+### New profiles
+
+When you see profile_change_type is profile_added that means a new profile is created and the new rid will be in current_rid. This log entry is the only one where the current_id_attributes is populated. You can determine the attributes of merged profiles by taking the union of their initial attributes when the profile was created. 
+
+### Updated profiles
+
+Sometimes you may see profile_change_type is profile_updated_by_stitching and merged_rids is just an empty array. This means a profile was updated but it is not useful information for identity purposes.
+
+### Merged profiles
+
+In this case it means multiple profiles were merged. 
+
+There will be a row for each identity deleted and one for the oldest profile. They have the following properties.
+
+For each deleted profile the profile_change_type is profile_deleted_by_stitching.
+For the oldest profile the others are merged into the profile_change_type is profile_updated_by_stitching and the merged_rids column will have the json array of the rids. 
+
+## Query tips
+
+If the user does not specify a time range assume the last 24 hours. Use a where clause like `TD_INTERVAL(time, '-1d/now')`
+
+Rather than show the time as a timestamp it is useful to convert it to a human friendly string using TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss', 'GMT'). The user may specify a different time zone.
+Sample trino/presto query to get 8 hours of logs.
+
+```
+select TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss', 'PST'), *
+from id_changes
+where 
+TD_INTERVAL(time, '-8h/now')
+order by time desc;
+```
+
+When you need to access the key value pairs in current_id_attributes the following query helps extract it.
+
+```
+WITH data AS (
+    SELECT '["customer_id:customer test 4","cookie_id:abcdei"]' AS raw_col
+)
+SELECT 
+    -- This creates: {customer_id=customer test 4, cookie_id=abcdei}
+    map(
+        transform(parsed_arr, x -> split_part(x, ':', 1)), -- Extract keys
+        transform(parsed_arr, x -> split_part(x, ':', 2))  -- Extract values
+    ) AS kv_map
+FROM (
+    SELECT CAST(json_parse(raw_col) AS ARRAY(VARCHAR)) AS parsed_arr 
+    FROM data
+)
+```
+
+Some likely queries the customer may want:
+
+How many new profiles did we get in a time period? Search and count the rows where add_profile occurs in profile_change_type.
+You can discover the number merged profiles by counting where profile_change_type is profile_updated_by_stitching.
+Similarly you can determine the number of deleted profiles by counting profile_deleted_by_stitching in the time period.
+
+Customers can query by identifier to ask for the history of a profile, in which case you should search the current_id_attributes as above to identify the rid and then the merge history.