VTEX Data Pipeline Beta
Catalog Data Pipeline

The dataset consists of six tables that provide the most recent catalog information for a VTEX account: products, SKUs, brands, categories, specifications, and clusters.


Data Characteristics

Characteristic | Description
Data source | The data is obtained from the Catalog module.
Availability | The data can be accessed in the VTEX Admin.
History | The available data starts from February 2025.
Minimum possible update interval | One hour.

Table: product

The product table contains information about products registered in the VTEX catalog, including identifiers, categories, brands, and visibility in the store. It also stores information about images, related SKUs, and the sales channels in which the product is available, and records important dates such as creation, update, and release.

Column name | Column type | Column description
account | character varying(255) | Account that owns the given product.
product_id | integer | Identifier created by VTEX when the product is created. It is unique within each account.
product_ref_id | character varying(255) | Reference code configured by the merchant and used internally for organizational purposes.
brand_id | super | Identifier of the product's brand.
category_id | super | Reference code used internally for organizational purposes. It has a hierarchical configuration.
skus | super | Array of collections used to join with the SKU table.
clusters_id | super | Identifiers of the clusters.
product_name | character varying(255) | Name of the product.
product_image | character varying(65535) | Most common image among the product's SKUs for which data is available.
is_active | boolean | Defines, from the Catalog's point of view, whether the product is active. Other factors, such as pricing and inventory, can also affect whether the product is available in the store.
is_visibile | boolean | Indicates whether the product is visible in the store.
tax_code | character varying(255) | Product tax code, registered by the merchant and used for tax calculation.
product_path | character varying(255) | Part of the product page URL.
related_categories | super | Categories related to this product.
similar_categories | super | List of similar categories. This helps categorize items, for example, by placing both a mouse and a keyboard under the desktop category.
sales_channels | super | Sales channels in which this product is offered.
dt_first_release | timestamp with time zone | Planned product launch date as recorded in the Catalog's index.
dt_last_release | timestamp with time zone | Timestamp of the last release.
dt_created | timestamp with time zone | Timestamp when the record was created in our internal systems (UTC).
dt_updated | timestamp with time zone | Timestamp when the record was updated in our internal systems (UTC).
batch_id | character varying(13) | Identifier of the batch, used for processing and tracking data updates.
sk_product | character varying(32) | Synthetic key of the product, used as the primary key; composed of the hash of account and product_id.
sk_brand | character varying(32) | Foreign key used to join with the brand table; composed of the hash of account and brand_id.
sk_category | character varying(32) | Foreign key used to join with the category table; composed of the hash of account and category_id.
sk_skus | super | Information about the SKUs, used to join with the SKU table; composed of the hash of account, product_id, and sku_id.
sk_cluster | super | Information about the clusters; composed of the hash of account and cluster_id.
sk_related_categories | super | Information about related categories: a list of the categories related to this product, which have a hierarchical configuration.
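
The sk_* columns are described only as hashes of the listed identifiers, stored in 32 characters. The sketch below illustrates why such a key is unique per account even when IDs repeat across accounts. It is a hypothetical reconstruction: MD5 is assumed only because its hex digest is 32 characters long, and the column order and the "|" separator are guesses, not the documented recipe.

```python
import hashlib


def synthetic_key(*parts: object) -> str:
    """Hypothetical reconstruction of an sk_* key.

    ASSUMPTION: the data dictionary says only that the key is a hash of the
    listed columns and fits in 32 characters. MD5 hex digests are 32
    characters long, but the hash function, the column order, and the "|"
    separator used here are guesses, not the documented algorithm.
    """
    raw = "|".join(str(part) for part in parts)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


# sk_product is described as a hash of account and product_id.
key = synthetic_key("mystore", 42)
print(len(key))  # 32

# The same product_id in another account yields a different key.
print(key != synthetic_key("otherstore", 42))  # True
```

Because the exact recipe is not documented, join on the sk_* values as delivered instead of recomputing them.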

Table: SKU

This table details the SKUs associated with products. It includes unique identifiers, physical dimensions, manufacturer codes, and EANs. It also stores information about images, additional services, kits, and attributes specific to each SKU. Creation and update dates are recorded for tracking.

Column name | Column type | Column description
account | character varying(255) | Account that owns the given product.
product_id | bigint | Identifier created by VTEX when the product is created. It is unique within each account.
sku_id | bigint | Identifier of the SKU.
is_active | boolean | Defines, from the Catalog's point of view, whether the SKU is active. Other factors, such as pricing and inventory, can also affect whether it is available in the store.
is_kit | boolean | Indicates whether the SKU is a kit consisting of multiple products.
sku_ref_id | character varying(65535) | Reference code configured by the merchant and used internally for organizational purposes.
sku_dimensions | super | Product dimensions for each SKU, most commonly used for shipping calculations. Contains: cubicWeight, heightCentimeter, lenghtCentimeter, weightKg, widthCentimenter.
sku_real_dimensions | super | Product dimensions without boxes or packaging, mostly used on the product details page (PDP). Contains: cubicWeight, heightCentimeter, lenghtCentimeter, weightKg, widthCentimenter.
sku_manufacturer_code | character varying(65535) | Code used by the merchant to reference the manufacturer.
sku_eans | super | EAN codes for the SKU (a SKU can have more than one EAN).
sku_kit_items | super | Items included in the SKU kit.
sku_image_url | character varying(65535) | URL of the product's image.
sku_image_gallery | super | Gallery of SKU images.
sku_services | super | Services related to this product (such as a birthday package), used for cross-selling.
sku_attachments | super | Attachments related to the SKU, such as customizations.
sku_attributes | super | Attributes of the SKU.
sku_videos | super | Videos related to the SKU.
sku_files | super | Files associated with the SKU.
dt_created | timestamp with time zone | Timestamp when the record was created in our internal systems (UTC).
dt_updated | timestamp with time zone | Timestamp when the record was updated in our internal systems (UTC).
batch_id | character varying(13) | Identifier of the batch, used for processing and tracking data updates.
sk_sku | character varying(32) | Unique identifier for the SKU, used to join with the product table; composed of the hash of account, product_id, and sku_id.
sk_product | character varying(32) | Unique identifier for the product.
sk_category | character varying(32) | Foreign key used to join with the category table; composed of the hash of account and category_id.
sk_skus | super | Information about the SKUs, used to join with the SKU table; composed of the hash of account, product_id, and sku_id.
sk_cluster | super | Information about the clusters; composed of the hash of account and cluster_id.
sk_related_categories | super | Information about related categories: a list of the categories related to this product, which have a hierarchical configuration.
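
The sku_dimensions and sku_real_dimensions columns hold the same set of keys, one with packaging and one without. The sketch below reads both and estimates how much volume the packaging adds. Two assumptions: the SUPER value is handled as a JSON object string, and the key names are used exactly as spelled in the table above (including "lenghtCentimeter" and "widthCentimenter"). Check both against real rows.

```python
import json


def volume_cm3(dimensions_json: str) -> float:
    """Return height x length x width, in cubic centimeters.

    Key names are spelled exactly as listed in the data dictionary.
    ASSUMPTION: the SUPER value arrives as a JSON object string; skip the
    json.loads call if your client already returns a dict.
    """
    dims = json.loads(dimensions_json)
    return (
        float(dims["heightCentimeter"])
        * float(dims["lenghtCentimeter"])
        * float(dims["widthCentimenter"])
    )


def packaging_overhead_cm3(sku_dimensions: str, sku_real_dimensions: str) -> float:
    """Volume added by boxes or packaging: shipping volume minus real volume."""
    return volume_cm3(sku_dimensions) - volume_cm3(sku_real_dimensions)


# Made-up values for illustration only.
shipping = json.dumps({"heightCentimeter": 10, "lenghtCentimeter": 20, "widthCentimenter": 5})
real = json.dumps({"heightCentimeter": 8, "lenghtCentimeter": 18, "widthCentimenter": 4})
print(volume_cm3(shipping))                    # 1000.0
print(packaging_overhead_cm3(shipping, real))  # 424.0
```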

Table: brand

The brand table contains information about the brands registered in the VTEX catalog, including identifiers, names, and activation status. It also keeps records of creation and update, as well as a unique identifier for integration with other tables.

Column name | Column type | Column description
account | character varying(255) | Account associated with the brand, that is, the account that owns it.
brand_id | bigint | Unique identifier of the brand, created by VTEX when the brand is created.
brand_name | character varying(65535) | Name of the brand associated with the given product.
is_active | boolean | Indicates whether the brand is active.
dt_created | timestamp with time zone | Timestamp when the record was created in our internal systems (UTC).
dt_updated | timestamp with time zone | Timestamp when the record was updated in our internal systems (UTC).
batch_id | character varying(13) | Identifier of the batch in which the record was processed.
sk_brand | character varying(32) | Unique identifier of the brand in the catalog, created by hashing the concatenation of account and brand_id.
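
Products link to brands through sk_brand, which already combines account and brand_id, so a join on it cannot match a brand from another account. The sketch below uses an in-memory SQLite database as a stand-in for the real tables, with only the columns the join needs and made-up rows; the SQL dialect of the actual data platform may differ.

```python
import sqlite3

# In-memory stand-in for the product and brand tables, reduced to the
# columns needed for the join. Rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (sk_product TEXT, product_name TEXT, sk_brand TEXT);
CREATE TABLE brand (sk_brand TEXT, brand_name TEXT, is_active INTEGER);
INSERT INTO product VALUES ('p1', 'Trail Shoe', 'b1'), ('p2', 'Rain Jacket', 'b2');
INSERT INTO brand VALUES ('b1', 'Acme', 1), ('b2', 'Globex', 0);
""")

# Join on sk_brand (account + brand_id) rather than on brand_id alone.
# SQLite has no boolean type, so is_active is stored as 0/1 here.
rows = conn.execute("""
    SELECT p.product_name, b.brand_name, b.is_active
    FROM product AS p
    JOIN brand AS b ON b.sk_brand = p.sk_brand
    ORDER BY p.product_name
""").fetchall()
print(rows)  # [('Rain Jacket', 'Globex', 0), ('Trail Shoe', 'Acme', 1)]
```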

Table: category

The category table describes the hierarchical categories used to organize products within the catalog structure. It stores identifiers, names, full category paths, and activation status.

Column name | Column type | Column description
sk_category | character varying(32) | Synthetic key created by hashing account and category_id; used as the primary key.
account | character varying(255) | Account that owns the given category.
category_id | bigint | Unique identifier of the category.
category_name | character varying(65535) | Name of the category with which the product is associated.
category_full_path_uri_name | character varying(65535) | Hierarchical path of the category, built from the name of each category in the path.
category_full_path | character varying(65535) | Hierarchical path of the category, built from the ID of each category in the path.
is_active | boolean | Indicates whether the category is active.
dt_created | timestamp with time zone | Timestamp when the record was created in our internal systems (UTC).
dt_updated | timestamp with time zone | Timestamp when the record was updated in our internal systems (UTC).
batch_id | character varying(13) | Identifier of the batch, used for processing and tracking data updates.
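
Because category_full_path lists the IDs of every category in the hierarchy, it can be split to find a category's ancestors. The delimiter is not documented here; the sketch below assumes a slash-delimited value such as "/1/12/105/" with the root first and the category itself last. Inspect real values before relying on it.

```python
from typing import List, Optional


def category_path_ids(category_full_path: str) -> List[int]:
    """Split a category_full_path value into category IDs, root first.

    ASSUMPTION: the value looks like "/1/12/105/" (slash-delimited IDs,
    root first, the category itself last). This format is a guess.
    """
    return [int(part) for part in category_full_path.split("/") if part.strip()]


def parent_category_id(category_full_path: str) -> Optional[int]:
    """Return the ID just above the last one in the path, or None for a root."""
    ids = category_path_ids(category_full_path)
    return ids[-2] if len(ids) > 1 else None


print(category_path_ids("/1/12/105/"))   # [1, 12, 105]
print(parent_category_id("/1/12/105/"))  # 12
print(parent_category_id("/7/"))         # None
```

The resulting IDs can be matched back to category_id, within the same account, to recover each ancestor's category_name.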

Table: specification

The specification table stores technical specifications of products and SKUs, such as specification groups, assigned values, and whether a specification is required. It also contains identifiers to facilitate integration with products, SKUs, and specification groups, ensuring more accurate detailing of registered items.

Column name | Column type | Column description
sk_specification | character varying(32) | Synthetic key created by the data team for the specification item in the catalog. It is composed of the hash of account, product_id, sku_id, specification_group_id, specification_id, and value_id. When value_id is null, -1 is used in its place.
sk_product | character varying(32) | Unique identifier for the product, used to join with the product table; composed of account and product_id.
sk_specification_group | character varying(32) | Unique identifier for the specification group; composed of account and specification_group_id.
sk_sku | character varying(32) | Unique identifier for the SKU, used to join with the SKU table; composed of account, product_id, and sku_id.
account | character varying(255) | Account associated with the specification.
product_id | bigint | Identifier of the product.
sku_id | bigint | Identifier of the SKU.
specification_group_id | bigint | Identifier of the specification group.
specification_group_name | character varying(65535) | Name of the specification group.
specification_id | bigint | Identifier of the specification.
specification_name | character varying(65535) | Name of the specification.
specification_is_required | boolean | Indicates whether the specification is required.
specification_value_id | bigint | Identifier of the specification value.
specification_value | character varying(65535) | Value of the specification.
is_product_specification | boolean | Indicates whether it is a product specification.
is_sku_specification | boolean | Indicates whether it is a SKU specification.
dt_created | timestamp with time zone | Timestamp when the record was created in our internal systems (UTC).
dt_updated | timestamp with time zone | Timestamp when the record was updated in our internal systems (UTC).
batch_id | character varying(13) | Identifier of the batch, used for processing and tracking data updates.
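
Since sk_specification includes the value ID, a specification with several values appears to be stored as several rows. The sketch below groups product-level rows into one mapping per product. The SpecRow type is a hypothetical subset of the columns above, and grouping by product_id alone assumes the rows come from a single account.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class SpecRow(NamedTuple):
    """Hypothetical subset of specification table columns used in this sketch."""
    product_id: int
    specification_name: str
    specification_value: str
    is_product_specification: bool


def product_specs(rows: Iterable[SpecRow]) -> Dict[int, Dict[str, List[str]]]:
    """Group product-level specifications as {product_id: {name: [values]}}.

    Rows flagged as SKU-level (is_product_specification = False) are skipped.
    Values are collected into lists because one specification can have
    several values, each stored as its own row.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        if row.is_product_specification:
            grouped[row.product_id][row.specification_name].append(row.specification_value)
    # Convert the nested defaultdicts into plain dicts for a predictable result.
    return {pid: dict(specs) for pid, specs in grouped.items()}


# Made-up rows for illustration.
rows = [
    SpecRow(1, "Color", "Red", True),
    SpecRow(1, "Color", "Blue", True),
    SpecRow(1, "Size", "M", False),
    SpecRow(2, "Material", "Cotton", True),
]
print(product_specs(rows))
# {1: {'Color': ['Red', 'Blue']}, 2: {'Material': ['Cotton']}}
```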

Table: cluster

The cluster table saves information about account groupings, representing sets of stores or sellers.

Column name | Column type | Column description
sk_category | character varying(32) | Synthetic key created by hashing account and category_id; used as the primary key.
account | character varying(255) | Account associated with the cluster, representing the merchant or store.
cluster_id | bigint | Identifier of the cluster, unique within the account.
cluster_name | character varying(65535) | Name of the cluster, used for display and identification.
is_active | boolean | Indicates whether the cluster is active and should be considered in operations.
is_searchable | boolean | Indicates whether the cluster is searchable, that is, whether it can be found through search.
cluster_date_from | timestamp with time zone | Start date of the cluster, when it becomes active.
cluster_date_to | timestamp with time zone | End date of the cluster, when it becomes inactive.
dt_created | timestamp with time zone | Timestamp when the record was created in our data lake (UTC).
dt_updated | timestamp with time zone | Timestamp when the record was updated in our data lake (UTC).
batch_id | character varying(13) | Identifier of the batch, used for processing and tracking data updates.
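
A cluster carries both an is_active flag and a date window (cluster_date_from and cluster_date_to), so checking whether it applies at a given moment means testing both. The sketch below treats a missing bound as open-ended and both bounds as inclusive; neither behavior is defined in the data dictionary, so both are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional


def cluster_in_effect(
    is_active: bool,
    date_from: Optional[datetime],
    date_to: Optional[datetime],
    at: datetime,
) -> bool:
    """Return True if the cluster is active and `at` falls inside its date window.

    ASSUMPTIONS (not defined in the data dictionary): a missing bound (None)
    leaves the window open on that side, both bounds are inclusive, and all
    datetimes are timezone-aware so they can be compared safely.
    """
    if not is_active:
        return False
    if date_from is not None and at < date_from:
        return False
    if date_to is not None and at > date_to:
        return False
    return True


start = datetime(2025, 3, 1, tzinfo=timezone.utc)
end = datetime(2025, 3, 31, tzinfo=timezone.utc)
print(cluster_in_effect(True, start, end, datetime(2025, 3, 15, tzinfo=timezone.utc)))   # True
print(cluster_in_effect(True, start, end, datetime(2025, 4, 1, tzinfo=timezone.utc)))    # False
print(cluster_in_effect(False, start, end, datetime(2025, 3, 15, tzinfo=timezone.utc)))  # False
```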

Analyses with catalog

Some analyses you can perform with catalog data:

  • Verify a product's status: use the is_active field to check whether the Catalog module considers a product active.
  • Identify related SKUs: obtain a list of all SKUs linked to a parent product.
  • Detail product specifications: retrieve all specifications of a product, including its brand, category, and other relevant details.
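
The three analyses listed above map onto simple queries over the documented keys: is_active for status, sk_product to link SKUs to their parent product, and sk_brand, sk_category, and sk_product to attach brand, category, and specifications. The sketch below runs them against an in-memory SQLite stand-in with made-up rows and only the needed columns. Against real data, also filter by account, because product_id is unique only within an account; the platform's SQL dialect may also differ from SQLite's.

```python
import sqlite3

# Minimal in-memory copies of the catalog tables with made-up rows.
# SQLite has no boolean type, so boolean columns are stored as 0/1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (sk_product TEXT, product_id INTEGER, product_name TEXT,
                      is_active INTEGER, sk_brand TEXT, sk_category TEXT);
CREATE TABLE sku (sk_sku TEXT, sk_product TEXT, sku_id INTEGER, is_active INTEGER);
CREATE TABLE brand (sk_brand TEXT, brand_name TEXT);
CREATE TABLE category (sk_category TEXT, category_name TEXT);
CREATE TABLE specification (sk_product TEXT, specification_name TEXT,
                            specification_value TEXT, is_product_specification INTEGER);
INSERT INTO product VALUES ('p1', 10, 'Trail Shoe', 1, 'b1', 'c1');
INSERT INTO sku VALUES ('s1', 'p1', 101, 1), ('s2', 'p1', 102, 0);
INSERT INTO brand VALUES ('b1', 'Acme');
INSERT INTO category VALUES ('c1', 'Footwear');
INSERT INTO specification VALUES ('p1', 'Material', 'Mesh', 1);
""")

# 1. Product status as seen by the Catalog module.
status = conn.execute(
    "SELECT product_name, is_active FROM product WHERE product_id = ?", (10,)
).fetchone()

# 2. All SKUs linked to a parent product, joined through sk_product.
skus = conn.execute("""
    SELECT s.sku_id, s.is_active
    FROM sku AS s JOIN product AS p ON p.sk_product = s.sk_product
    WHERE p.product_id = ?
    ORDER BY s.sku_id
""", (10,)).fetchall()

# 3. Product-level specifications with brand and category names.
specs = conn.execute("""
    SELECT p.product_name, b.brand_name, c.category_name,
           sp.specification_name, sp.specification_value
    FROM product AS p
    JOIN brand AS b ON b.sk_brand = p.sk_brand
    JOIN category AS c ON c.sk_category = p.sk_category
    JOIN specification AS sp ON sp.sk_product = p.sk_product
    WHERE p.product_id = ? AND sp.is_product_specification = 1
""", (10,)).fetchall()

print(status)  # ('Trail Shoe', 1)
print(skus)    # [(101, 1), (102, 0)]
print(specs)   # [('Trail Shoe', 'Acme', 'Footwear', 'Material', 'Mesh')]
```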

Correlations with other data

Catalog data is strongly connected to various other data models. Here are some noteworthy correlations:

  • Relationship with Inventory: by integrating catalog data with inventory information, you can precisely determine the available stock for each product.
  • Relationship with Orders: combining catalog data with order details enables you to accurately analyze the number of orders associated with each product in your catalog.
  • Impact on Conversion Rate: combining catalog data with navigation data to measure the store's funnel conversion rate helps you understand the role each product plays in users' navigation behavior.
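
As an illustration of the orders correlation, the sketch below counts distinct orders per product by mapping each ordered SKU to its parent product through the SKU table's sku_id and product_id columns. The orders dataset is not described on this page, so the (order_id, sku_id) input shape is a hypothetical stand-in, and a single account is assumed.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def orders_per_product(
    order_items: Iterable[Tuple[str, int]],
    sku_to_product: Mapping[int, int],
) -> Counter:
    """Count distinct orders per product_id.

    order_items: HYPOTHETICAL (order_id, sku_id) pairs; the real orders
    dataset's shape is not documented here.
    sku_to_product: sku_id -> product_id, as found in the SKU table
    (single account assumed).
    """
    seen = set()
    counts: Counter = Counter()
    for order_id, sku_id in order_items:
        product_id = sku_to_product.get(sku_id)
        if product_id is None:
            continue  # SKU missing from the catalog snapshot
        # An order with two SKUs of the same product counts once.
        if (order_id, product_id) not in seen:
            seen.add((order_id, product_id))
            counts[product_id] += 1
    return counts


# Made-up data for illustration.
items = [("o1", 101), ("o1", 102), ("o2", 101), ("o3", 201)]
sku_to_product = {101: 10, 102: 10, 201: 20}
print(orders_per_product(items, sku_to_product))  # Counter({10: 2, 20: 1})
```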
