Hello to Amasty blog readers!
Today we’re sharing a free tool which improves Magento 2.1 catalog performance using MySQl as a search engine. The issue which caused catalog performance meltdowns is valid for Magento 2.1 versions, as it was fixed in the recent Magento 2.2 release.
The fix was originally created for a store on Magento 2.1.8.
Please be aware that this is a free tool that we created for internal needs and are sharing it with the public, because lots of stores are using 2.1 versions, and we thought they can benefit from using it. However, the tool is delivered as is, and you take your own responsibility for installing and using it on your store – this is not a commercial extension which is not covered by support services. To be safe, always test new tools on a staging environment.
If you don’t understand the details of this article, simply send a link to your developer =)
And of course we always encourage our customers to update their Magento stores to the latest versions if possible.
How Magento 2 uses MySQl with search
In Magento 2, most probably for the sake of search engines usage unification , MySQL search is set as a default instrument for displaying catalog data on the shop frontend. It is also used for filtering the catalog and navigating through it.
Basically, the search uses two stages to work it out: a temporary table generation and catalog data generation based on the temporary table data.
Firstly, a list of the items which correspond to the applied filters based on the previously indexed data is created. We are talking about catalog_search_fulltext and catalog_product_index_eav tables.
The first one contains attributes IDs and their options, and their corresponding text values are shown on the frontend.
The second table contains values of all attributes for all store’s products.
There’s a product with id: 1 of red color. So, the attribute set of this product contains the color attribute (id : 10) with the red option (id: 100).
In the catalogsearch_fulltext table, the presentation of the mentioned attribute would be as follows:
entity_id | attribute_id | data_index
1 | 10 | “Red”
This is a table of full-text search, so searching is performed by the name of the option.
In the catalog_product_index_eav table, the presentation of this product (partial, of course, as we are not talking about all the possible attributes) would look like this:
entity_id | attribute_id | store_id | value | source_id
1 | 10 | 1 | 100 | 1
Where source_id is the ID of cataloginvetory entity of the item in the cataloginvetory_stock_status table.
So, the first stage of the process is to build a temporary table for getting the list of the items which correspond to the given filters. If no filters are applied, the standard ones are used – such as ID of the current category, visibility and stock_status of the item.
This is basically the query to build this temporary table of memory type (of searh_tmp_<#some_hash> kind). The table is created only for the time needed for the script to run and then is deleted.
In Magento of versions below 2.2.0, when applying the filters to the catalog and building this temporary table,only left joins for all filters are used.
Here’s an example of such a query:
SELECT `main_select`.`entity_id`, MAX(score) AS `relevance` FROM (SELECT `search_index`.`entity_id`, (((0) + (0) + (0) + (0) + (0)) * 1) AS `score` FROM `catalogsearch_fulltext_scope1` AS `search_index` LEFT JOIN `catalog_eav_attribute` AS `cea` ON search_index.attribute_id = cea.attribute_id INNER JOIN `catalog_category_product_index` AS `category_ids_index` ON search_index.entity_id = category_ids_index.product_id LEFT JOIN `catalog_product_index_eav` AS `length_filter` ON search_index.entity_id = length_filter.entity_id AND length_filter.attribute_id = 173 AND length_filter.store_id = 1 LEFT JOIN `cataloginventory_stock_status` AS `length_filter_stock` ON length_filter_stock.product_id = length_filter.source_id LEFT JOIN `catalog_product_index_eav` AS `material_filter` ON search_index.entity_id = material_filter.entity_id AND material_filter.attribute_id = 176 AND material_filter.store_id = 1 LEFT JOIN `cataloginventory_stock_status` AS `material_filter_stock` ON material_filter_stock.product_id = material_filter.source_id LEFT JOIN `catalog_product_index_eav` AS `finish_filter` ON search_index.entity_id = finish_filter.entity_id AND finish_filter.attribute_id = 177 AND finish_filter.store_id = 1 LEFT JOIN `cataloginventory_stock_status` AS `finish_filter_stock` ON finish_filter_stock.product_id = finish_filter.source_id LEFT JOIN `catalog_product_index_eav` AS `full_size_filter` ON search_index.entity_id = full_size_filter.entity_id AND full_size_filter.attribute_id = 203 AND full_size_filter.store_id = 1 LEFT JOIN `cataloginventory_stock_status` AS `full_size_filter_stock` ON full_size_filter_stock.product_id = full_size_filter.source_id LEFT JOIN `cataloginventory_stock_status` AS `stock_index` ON search_index.entity_id = stock_index.product_id AND stock_index.website_id = 0 WHERE (stock_index.stock_status = 1) AND (category_ids_index.category_id = 2542) AND (length_filter.value = '6219' AND length_filter_stock.stock_status = 1) AND (material_filter.value = '6627' AND material_filter_stock.stock_status = 1) AND (finish_filter.value = '352' AND finish_filter_stock.stock_status = 1) AND (full_size_filter.value = '23645' AND full_size_filter_stock.stock_status = 1)) AS `main_select` GROUP BY `entity_id` ORDER BY `relevance` DESC LIMIT 10000
This is a real-life query from a production site. The whole query was created for a result of only one product, four filters applied, two minutes time to run.
Seems that something is wrong, but what exactly? Let’s find out.
The issue dissected
When using large chunks of data, such as 100 000 items and 1000 attributes, the size of the catalog_product_index_eav table will be as follows: number of all items*number of all attributes. In the given case, some insanity of 100 000 000 lines.
Of course, this is an ideal example, because we’re looking at a bad case to make it more vivid. It still can happen, if, say, all the products have values for all the attributes. In any case, even a 1000 000 lines table influences performance heavily.
With each left join of the table, the potential size of the query (we’re talking about the result table built by MySQL while running the query) will be as big as the size of the catalog_product_index_eav table raised to the power of the number of left joins of the table itself.
What we’ve seen from experience is that this particular query was the bottleneck which took several minutes of the server to run it. And, of course, the load time of the end page is influenced by the case, too, not even talking about the RAM needed to process the query.
If the hardware of the server doesn’t have enough memory resources, all the other processes will be put in line to be run after this long one. Which, in fact, influences not only the customer which requested the data, but all the other people which are working with the server.
From Magento 2.2.0, this algorithm was improved, as well as the process of indexing the data which participates in it. In other words, this issue no longer exists in the newest versions of Magento 2.
However, we at Amasty needed a tool to avoid the issue on the previous versions. We created an extension, which changed LEFT JOIN to INNER JOIN and made it work. The result was significant performance improvement and memory usage for such queries.
Here’s a small example of the heaviest query we could find.
- No extension – 25 secs needed.
- With the enabled extension – 5 seconds needed (five times better).
Of course, these numbers shouldn’t be referenced as the final ones, because our tests were performed on the local machines, and we didn’t do extensive testing, as this is not a commercial tool. But. of course, it improves the performance of Magento 2.1 catalogs significantly.
You can download the extension for free on Amasty’s GitHub.
Got any questions? Feel free to ask them in the comments section. Thanks, and have a productive week!