{"id":14386,"date":"2021-01-11T15:25:38","date_gmt":"2021-01-11T15:25:38","guid":{"rendered":"https:\/\/www.improvemysearchranking.com\/?p=14386"},"modified":"2023-12-05T11:10:31","modified_gmt":"2023-12-05T11:10:31","slug":"googles-new-smith-algorithm-and-how-it-outperforms-bert","status":"publish","type":"post","link":"https:\/\/www.improvemysearchranking.com\/googles-new-smith-algorithm-and-how-it-outperforms-bert\/","title":{"rendered":"Google\u2019s new SMITH algorithm (and how it outperforms BERT)"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Google has a new search engine algorithm, SMITH. And according to Google, it is outperforming Google BERT in understanding long-form queries and content.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">It remains a mystery whether or not Google is using the SMITH algorithm. It is important to note that Google rarely says which specific algorithms it is using at a given time. Therefore, Google may or may not be using SMITH as of now.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">That, however, does not diminish its value and the need for understanding how this algorithm works. In my opinion, this gives a fascinating insight into the direction Google is moving as a search engine and how it sees the future of online content and content consumption.<\/span><\/p>\n<p><!--more--><\/p>\n<h2><b>What is SMITH?<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Put simply, SMITH, or Siamese Multi-depth Transformer-based Hierarchical Encoder, is a new search engine algorithm by Google that focuses on understanding long-form documents. 
More specifically, SMITH is particularly good at understanding the context of certain passages within long-form content.<\/span><\/p>\n<h2><b>How is SMITH different from BERT?<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">SMITH and BERT appear to be related, and SMITH seems like an extension of BERT.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">While SMITH deals with understanding passages within the context of documents, BERT is trained to understand words within the context of sentences.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">When it comes to understanding long-form content, BERT has limitations that SMITH does not.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">According to a research whitepaper by Google:<\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">\u201cIn recent years, self-attention based models like Transformers\u2026 and BERT \u2026have achieved state-of-the-art performance in the task of text matching. These models, however, are still limited to short text like a few sentences or one paragraph due to the quadratic computational complexity of self-attention with respect to input text length.<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400;\">In this paper, we address the issue by proposing the Siamese Multi-depth Transformer-based Hierarchical (SMITH) Encoder for long-form document matching. 
Our model contains several innovations to adapt self-attention models for longer text input.\u201d<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">The whitepaper also explains why understanding long documents could be more difficult:<\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">\u201cSemantic matching between long texts is a more challenging task due to a few reasons:<\/span><\/i><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><i><i><span style=\"font-weight: 400;\">When both texts are long, matching them requires a more thorough understanding of semantic relations including matching pattern between text fragments with long distance;<\/span><\/i><\/i><\/li>\n<\/ul>\n<ul>\n<li style=\"font-weight: 400;\"><i><span style=\"font-weight: 400;\">Long documents contain internal structure like sections, passages and sentences. For human readers, document structure usually plays a key role for content understanding. Similarly, a model also needs to take document structure information into account for better document matching performance;<\/span><\/i><\/li>\n<\/ul>\n<ul>\n<li style=\"font-weight: 400;\"><i><span style=\"font-weight: 400;\">The processing of long texts is more likely to trigger practical issues like out of TPU\/GPU memories without careful model design.\u201d<\/span><\/i><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2><b>The results<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">BERT is limited when it comes to understanding longer documents. On the other hand, SMITH performs better the longer the document is. 
According to the whitepaper:<\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">\u201cExperimental results on several benchmark data for long-form text matching\u2026 show that our proposed SMITH model outperforms the previous state-of-the-art models and increases the maximum input text length from 512 to 2048 when comparing with BERT based baselines.\u201d<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400;\">\u201cOur experimental results on several benchmark datasets for long-form document matching show that our proposed SMITH model outperforms the previous state-of-the-art models including hierarchical attention\u2026, multi-depth attention-based hierarchical recurrent neural network\u2026, and BERT.<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400;\">Comparing to BERT based baselines, our model is able to increase maximum input text length from 512 to 2048.\u201d<\/span><\/i><\/p>\n<h2><b>Pre-training and content blocks<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Pre-training an algorithm is a tried and tested method that not only produces excellent results but also helps the algorithm mature over time and make fewer mistakes.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In pre-training, random words are hidden in a sentence, and the algorithm predicts the hidden words. 
The algorithm keeps learning and, eventually, makes fewer mistakes.<\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">\u201cInspired by the recent success of language model pre-training methods like BERT, SMITH also adopts the \u201cunsupervised pre-training + fine-tuning\u201d paradigm for the model training.<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400;\">For the SMITH model pre-training, we propose the masked sentence block language modeling task in addition to the original masked word language modeling task used in BERT for long text inputs.\u201d<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">In the case of SMITH, blocks of sentences are hidden in pre-training. This is a key part of SMITH and how it operates.<\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">\u201cWhen the input text becomes long, both relations between words in a sentence block and relations between sentence blocks within a document becomes important for content understanding. Therefore, we mask both randomly selected words and sentence blocks during model pre-training.\u201d<\/span><\/i><\/p>\n<h2><b>Conclusion<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">It is important to note that SMITH does not replace BERT. Instead, SMITH supplements BERT by doing what BERT is unable to do.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If you want to learn more about SMITH, you can <\/span><a href=\"https:\/\/research.google\/pubs\/pub49617\/\"><span style=\"font-weight: 400;\">read the original research paper here<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Google has a new search engine algorithm, SMITH. And according to Google, it is outperforming Google BERT in understanding long-form queries and content. It remains a mystery whether or not Google is using the SMITH algorithm. It is important to note that Google rarely says which specific algorithms it is using at a given time. 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":14392,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_oct_exclude_from_cache":false,"inline_featured_image":false},"categories":[33],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Google\u2019s New SMITH Algorithm | IMSR<\/title>\n<meta name=\"description\" content=\"Discover Google&#039;s powerful new SMITH algorithm and how it revolutionises long-form content understanding. Unleash SMITH&#039;s potential!\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.improvemysearchranking.com\/googles-new-smith-algorithm-and-how-it-outperforms-bert\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Google\u2019s New SMITH Algorithm | IMSR\" \/>\n<meta property=\"og:description\" content=\"Discover Google&#039;s powerful new SMITH algorithm and how it revolutionises long-form content understanding. 
Unleash SMITH&#039;s potential!\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.improvemysearchranking.com\/googles-new-smith-algorithm-and-how-it-outperforms-bert\/\" \/>\n<meta property=\"og:site_name\" content=\"Improve My Search Ranking\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/ImproveMySearchRanking\" \/>\n<meta property=\"article:published_time\" content=\"2021-01-11T15:25:38+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-12-05T11:10:31+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.improvemysearchranking.com\/wp-content\/uploads\/2021\/01\/Google-algorithm-update.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1280\" \/>\n\t<meta property=\"og:image:height\" content=\"854\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Josh Hamit\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@ImproveMySearch\" \/>\n<meta name=\"twitter:site\" content=\"@ImproveMySearch\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Josh Hamit\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.improvemysearchranking.com\/googles-new-smith-algorithm-and-how-it-outperforms-bert\/\",\"url\":\"https:\/\/www.improvemysearchranking.com\/googles-new-smith-algorithm-and-how-it-outperforms-bert\/\",\"name\":\"Google\u2019s New SMITH Algorithm | IMSR\",\"isPartOf\":{\"@id\":\"https:\/\/www.improvemysearchranking.com\/#website\"},\"datePublished\":\"2021-01-11T15:25:38+00:00\",\"dateModified\":\"2023-12-05T11:10:31+00:00\",\"author\":{\"@id\":\"https:\/\/www.improvemysearchranking.com\/#\/schema\/person\/cd296d14b492e28427951b912dddde7b\"},\"description\":\"Discover Google's powerful new SMITH algorithm and how it revolutionises long-form content understanding. Unleash SMITH's potential!\",\"breadcrumb\":{\"@id\":\"https:\/\/www.improvemysearchranking.com\/googles-new-smith-algorithm-and-how-it-outperforms-bert\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.improvemysearchranking.com\/googles-new-smith-algorithm-and-how-it-outperforms-bert\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.improvemysearchranking.com\/googles-new-smith-algorithm-and-how-it-outperforms-bert\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.improvemysearchranking.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Google\u2019s new SMITH algorithm (and how it outperforms BERT)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.improvemysearchranking.com\/#website\",\"url\":\"https:\/\/www.improvemysearchranking.com\/\",\"name\":\"Improve My Search Ranking\",\"description\":\"Improve My Search 
Ranking\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.improvemysearchranking.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-GB\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.improvemysearchranking.com\/#\/schema\/person\/cd296d14b492e28427951b912dddde7b\",\"name\":\"Josh Hamit\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\/\/www.improvemysearchranking.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5e2eaeeac454cc68a732747769a1f350?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5e2eaeeac454cc68a732747769a1f350?s=96&d=mm&r=g\",\"caption\":\"Josh Hamit\"},\"url\":\"https:\/\/www.improvemysearchranking.com\/author\/josh\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Google\u2019s New SMITH Algorithm | IMSR","description":"Discover Google's powerful new SMITH algorithm and how it revolutionises long-form content understanding. Unleash SMITH's potential!","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.improvemysearchranking.com\/googles-new-smith-algorithm-and-how-it-outperforms-bert\/","og_locale":"en_GB","og_type":"article","og_title":"Google\u2019s New SMITH Algorithm | IMSR","og_description":"Discover Google's powerful new SMITH algorithm and how it revolutionises long-form content understanding. 
Unleash SMITH's potential!","og_url":"https:\/\/www.improvemysearchranking.com\/googles-new-smith-algorithm-and-how-it-outperforms-bert\/","og_site_name":"Improve My Search Ranking","article_publisher":"https:\/\/www.facebook.com\/ImproveMySearchRanking","article_published_time":"2021-01-11T15:25:38+00:00","article_modified_time":"2023-12-05T11:10:31+00:00","og_image":[{"width":1280,"height":854,"url":"https:\/\/www.improvemysearchranking.com\/wp-content\/uploads\/2021\/01\/Google-algorithm-update.jpg","type":"image\/jpeg"}],"author":"Josh Hamit","twitter_card":"summary_large_image","twitter_creator":"@ImproveMySearch","twitter_site":"@ImproveMySearch","twitter_misc":{"Written by":"Josh Hamit","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.improvemysearchranking.com\/googles-new-smith-algorithm-and-how-it-outperforms-bert\/","url":"https:\/\/www.improvemysearchranking.com\/googles-new-smith-algorithm-and-how-it-outperforms-bert\/","name":"Google\u2019s New SMITH Algorithm | IMSR","isPartOf":{"@id":"https:\/\/www.improvemysearchranking.com\/#website"},"datePublished":"2021-01-11T15:25:38+00:00","dateModified":"2023-12-05T11:10:31+00:00","author":{"@id":"https:\/\/www.improvemysearchranking.com\/#\/schema\/person\/cd296d14b492e28427951b912dddde7b"},"description":"Discover Google's powerful new SMITH algorithm and how it revolutionises long-form content understanding. 
Unleash SMITH's potential!","breadcrumb":{"@id":"https:\/\/www.improvemysearchranking.com\/googles-new-smith-algorithm-and-how-it-outperforms-bert\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.improvemysearchranking.com\/googles-new-smith-algorithm-and-how-it-outperforms-bert\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.improvemysearchranking.com\/googles-new-smith-algorithm-and-how-it-outperforms-bert\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.improvemysearchranking.com\/"},{"@type":"ListItem","position":2,"name":"Google\u2019s new SMITH algorithm (and how it outperforms BERT)"}]},{"@type":"WebSite","@id":"https:\/\/www.improvemysearchranking.com\/#website","url":"https:\/\/www.improvemysearchranking.com\/","name":"Improve My Search Ranking","description":"Improve My Search Ranking","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.improvemysearchranking.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-GB"},{"@type":"Person","@id":"https:\/\/www.improvemysearchranking.com\/#\/schema\/person\/cd296d14b492e28427951b912dddde7b","name":"Josh Hamit","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/www.improvemysearchranking.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5e2eaeeac454cc68a732747769a1f350?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5e2eaeeac454cc68a732747769a1f350?s=96&d=mm&r=g","caption":"Josh 
Hamit"},"url":"https:\/\/www.improvemysearchranking.com\/author\/josh\/"}]}},"_links":{"self":[{"href":"https:\/\/www.improvemysearchranking.com\/wp-json\/wp\/v2\/posts\/14386"}],"collection":[{"href":"https:\/\/www.improvemysearchranking.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.improvemysearchranking.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.improvemysearchranking.com\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.improvemysearchranking.com\/wp-json\/wp\/v2\/comments?post=14386"}],"version-history":[{"count":1,"href":"https:\/\/www.improvemysearchranking.com\/wp-json\/wp\/v2\/posts\/14386\/revisions"}],"predecessor-version":[{"id":22652,"href":"https:\/\/www.improvemysearchranking.com\/wp-json\/wp\/v2\/posts\/14386\/revisions\/22652"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.improvemysearchranking.com\/wp-json\/wp\/v2\/media\/14392"}],"wp:attachment":[{"href":"https:\/\/www.improvemysearchranking.com\/wp-json\/wp\/v2\/media?parent=14386"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.improvemysearchranking.com\/wp-json\/wp\/v2\/categories?post=14386"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.improvemysearchranking.com\/wp-json\/wp\/v2\/tags?post=14386"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}