<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Amit’s Newsletter]]></title><description><![CDATA[A builder's occasional musings on data and tech industry]]></description><link>https://amit.thoughtspot.com</link><image><url>https://amit.thoughtspot.com/img/substack.png</url><title>Amit’s Newsletter</title><link>https://amit.thoughtspot.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 11 Apr 2026 05:39:42 GMT</lastBuildDate><atom:link href="https://amit.thoughtspot.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Amit Prakash]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[prakasha@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[prakasha@substack.com]]></itunes:email><itunes:name><![CDATA[Amit Prakash]]></itunes:name></itunes:owner><itunes:author><![CDATA[Amit Prakash]]></itunes:author><googleplay:owner><![CDATA[prakasha@substack.com]]></googleplay:owner><googleplay:email><![CDATA[prakasha@substack.com]]></googleplay:email><googleplay:author><![CDATA[Amit Prakash]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[How does ChatGPT work? 
A brief history of computational language understanding ]]></title><description><![CDATA[Unless you&#8217;ve been living under a rock with no internet access, you&#8217;ve no-doubt heard examples of how people are using the platform, and prophecies of how ChatGPT is set to change the course of society as we know it.]]></description><link>https://amit.thoughtspot.com/p/what-is-chatgpt-and-how-does-it-work</link><guid isPermaLink="false">https://amit.thoughtspot.com/p/what-is-chatgpt-and-how-does-it-work</guid><dc:creator><![CDATA[Amit Prakash]]></dc:creator><pubDate>Thu, 23 Feb 2023 18:34:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!XX8Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc989d2-2ea3-4b9a-a6c3-7316b923fab1_1200x628.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XX8Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc989d2-2ea3-4b9a-a6c3-7316b923fab1_1200x628.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XX8Z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc989d2-2ea3-4b9a-a6c3-7316b923fab1_1200x628.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XX8Z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc989d2-2ea3-4b9a-a6c3-7316b923fab1_1200x628.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!XX8Z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc989d2-2ea3-4b9a-a6c3-7316b923fab1_1200x628.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XX8Z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc989d2-2ea3-4b9a-a6c3-7316b923fab1_1200x628.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XX8Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc989d2-2ea3-4b9a-a6c3-7316b923fab1_1200x628.jpeg" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4cc989d2-2ea3-4b9a-a6c3-7316b923fab1_1200x628.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:73403,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XX8Z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc989d2-2ea3-4b9a-a6c3-7316b923fab1_1200x628.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XX8Z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc989d2-2ea3-4b9a-a6c3-7316b923fab1_1200x628.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!XX8Z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc989d2-2ea3-4b9a-a6c3-7316b923fab1_1200x628.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XX8Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc989d2-2ea3-4b9a-a6c3-7316b923fab1_1200x628.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Unless you&#8217;ve been living under a rock with no internet access, you&#8217;ve no-doubt heard examples of how people are 
using the platform, and prophecies of how ChatGPT is set to change the course of society as we know it. When something this interesting shows up, surely curious minds want to know how it works. Despite having worked on Machine Learning for over a decade and having an active interest in the space, when I first tried to read the research papers, I found it a daunting task. Each paper that I tried to understand referred to concepts that I did not understand, and when I tried to read another paper for those concepts, I found another set of concepts that I did not understand. It took substantial resolve to keep following the rabbit holes till I understood the papers that describe how ChatGPT and similar technologies work from first principles.</p><p>This article is my attempt at simplifying the timeline of technical advancements that led to ChatGPT as we know it today. If you understand basic computer science and probability, you should have a strong enough foundation to follow along. Without getting into any complex math or very technical discussion, you should get some sense of how these things are wired.</p><p>Keep in mind, no one really understands how these models are doing what they are doing, in the same way that no one really knows how they learn to ride a bike or walk. 
But that doesn&#8217;t mean we don&#8217;t have some handle on what's going on and we certainly know how we arrived here.&nbsp;</p><p>In this four-part series we&#8217;ll cover:</p><ol><li><p><strong>A brief history of computational language understanding</strong>, starting from the early collaboration of linguists and computer scientists (This article)</p></li><li><p><strong><a href="https://www.thoughtspot.com/data-trends/ai/what-is-transformer-architecture-chatgpt">Transformer architecture &#8211; The engine behind ChatGPT</a>,</strong> including a brief study of sequence-to-sequence models starting from RNNs ending at Transformers.</p></li><li><p><strong><a href="https://www.thoughtspot.com/data-trends/ai/large-language-models-vs-chatgpt">Emerging properties of ChatGPT and other large language models (LLMs)</a></strong>, where we&#8217;ll explore how language models are built using very large Transformer models that have surprising properties.</p></li><li><p><strong><a href="https://www.thoughtspot.com/data-trends/ai/artificial-intellegence-trends-with-chatgpt">The future of AI &#8211; trends to watch for in a ChatGPT world</a></strong>, including applications, future research, and implications for our world.</p></li></ol><p>There is a lot to unpack, so let&#8217;s get started!</p><h2>What is ChatGPT and how does it work</h2><p>GPT stands for Generative Pre-trained Transformer. The chat prefix to GPT was added because of the chat interface added to the GPT models. ChatGPT is an AI language model developed by OpenAI that can generate a natural language response to human input&#8212;basically, it&#8217;s an advanced chatbot.&nbsp;</p><p>To get started, we must first examine the two interwoven threads of computational language understanding and machine learning using neural networks. 
Both are fairly deep and technical topics, but my hope here is that, at the expense of some accuracy, we can simplify the concepts enough that anyone with basic computer science knowledge can follow them.&nbsp;</p><h2>Early natural language processing (NLP)</h2><p>Computer scientists and linguists have been collaborating for at least half a century in an attempt to make computer programs understand language. Most people in the field believed that the path to doing so involved creating taxonomies of words, breaking sentences into parse trees, assigning part-of-speech tags, and then using templates and rules to derive meaning. But as early as the 1950s people realized that there was a lot of information in looking at language purely from a statistical standpoint. For example, a word&#8217;s significance to a document grows with the number of times it appears in that document and shrinks with how frequently it occurs in other documents. The term-frequency half of this idea was observed in 1957, and some version of the combined idea (<a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf">TF-IDF</a>) is still used in most search engines today. If someone searches for &#8220;the indic languages&#8221;, the most significant word to match would be &#8220;indic&#8221;, followed by &#8220;languages&#8221;, followed by &#8220;the&#8221;, because &#8220;indic&#8221; is the rarest word in that phrase and &#8220;the&#8221; is the most common.</p><h2>Language Models</h2><p>Quickly fill in the blank without thinking too much: &#8220;The dog [blank] so loudly that it was painful.&#8221; If you are like most people, you guessed &#8220;barked&#8221; and not &#8220;scratched&#8221;. If you were to assign a probability to every possible word, you would probably give a fairly high probability to &#8220;barked&#8221; and a very low probability to anything else. This process of assigning probabilities to the next word is called language modeling. ChatGPT is essentially a language model. 
It is also a very large <a href="https://en.wikipedia.org/wiki/Artificial_neural_network">neural network</a> with 175 billion programmable connections which is why it is called a Large Language Model (LLM). How large is large? For comparison, the human brain is estimated to have 10^15 or 1000 trillion connections.&nbsp;</p><p>The fun thing about language models is that you don&#8217;t have to stop after one prediction. You can append the predicted word to your text and then make the next prediction and make the next prediction and so on until you have a full page of text. You can see this happen in real time if you have predictive text turned on for your phone keyboard. Most phone keyboards give you a ranked suggestion for the next word. If you keep accepting it, usually you have a bizarre but interesting piece of text created entirely by the phone&#8217;s language model. These models are nowhere near the sophistication of ChatGPT.&nbsp;</p><p>The simplest language model would just count pairs of words that occur next to each other in a large body of text and then given the first word, look at the histogram of the next word. This is essentially where language models started. For example, after the word &#8220;phishing&#8221;, the most likely words may be &#8220;email&#8221; or &#8220;attack&#8221;. 
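</p><p>The pair-counting model described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any production keyboard or search engine works: the corpus is made up, and the function names (<code>train_bigrams</code>, <code>next_word_probabilities</code>, <code>generate</code>) are hypothetical.</p>
<pre>
```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the text."""
    words = text.lower().split()
    following: dict[str, Counter] = defaultdict(Counter)
    for first, second in zip(words, words[1:]):
        following[first][second] += 1
    return following


def next_word_probabilities(following: dict[str, Counter], word: str) -> dict[str, float]:
    """Turn the histogram of next words into probabilities."""
    counts = following.get(word, Counter())
    total = sum(counts.values())
    return {nxt: n / total for nxt, n in counts.items()} if total else {}


def generate(following: dict[str, Counter], start: str, length: int) -> list[str]:
    """Repeatedly append the most likely next word to the text so far."""
    out = [start]
    for _ in range(length):
        counts = following.get(out[-1])
        if not counts:
            break  # the last word was never followed by anything
        out.append(counts.most_common(1)[0][0])
    return out


# Tiny made-up corpus, purely for illustration.
corpus = "phishing email phishing attack phishing email spam email"
model = train_bigrams(corpus)
probs = next_word_probabilities(model, "phishing")
print(probs)
print(generate(model, "phishing", 3))
```
</pre>
<p>Calling <code>generate</code> keeps appending the single most likely next word, which is the same loop as repeatedly accepting a phone keyboard&#8217;s top suggestion.</p><p>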
If you look at Google&#8217;s auto-completions, they roughly match that intuition.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JTRI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b7e6beb-61dc-4210-a5e9-ba6e0f84f871_1350x736.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JTRI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b7e6beb-61dc-4210-a5e9-ba6e0f84f871_1350x736.png 424w, https://substackcdn.com/image/fetch/$s_!JTRI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b7e6beb-61dc-4210-a5e9-ba6e0f84f871_1350x736.png 848w, https://substackcdn.com/image/fetch/$s_!JTRI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b7e6beb-61dc-4210-a5e9-ba6e0f84f871_1350x736.png 1272w, https://substackcdn.com/image/fetch/$s_!JTRI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b7e6beb-61dc-4210-a5e9-ba6e0f84f871_1350x736.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JTRI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b7e6beb-61dc-4210-a5e9-ba6e0f84f871_1350x736.png" width="1350" height="736" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9b7e6beb-61dc-4210-a5e9-ba6e0f84f871_1350x736.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:736,&quot;width&quot;:1350,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:86112,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JTRI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b7e6beb-61dc-4210-a5e9-ba6e0f84f871_1350x736.png 424w, https://substackcdn.com/image/fetch/$s_!JTRI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b7e6beb-61dc-4210-a5e9-ba6e0f84f871_1350x736.png 848w, https://substackcdn.com/image/fetch/$s_!JTRI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b7e6beb-61dc-4210-a5e9-ba6e0f84f871_1350x736.png 1272w, https://substackcdn.com/image/fetch/$s_!JTRI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b7e6beb-61dc-4210-a5e9-ba6e0f84f871_1350x736.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>N-grams</h2><p>Next comes the idea of counting not just pairs of words (bigrams) but triplets (trigrams), sequences of four, sequences of five, and so on. Collectively, these are called N-grams. As the length of a sequence goes up, the number of possible sequences goes up exponentially, so not every N-gram can be stored and used in language modeling. Fortunately, the longer a sequence is, the less often it appears in a body of text, so if you keep only the sequences whose counts are above a statistically significant threshold, the collection stays small enough to build. Once you have a collection of N-grams, a simple way to predict the next word is to find the longest stored sequence that matches the end of the current text and use the distribution of the words that followed it as the probabilities. 
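</p><p>The longest-match lookup described above can be sketched as follows. This is a toy version under stated assumptions: the sentence is made up, the <code>min_count</code> threshold is an arbitrary stand-in for a real statistical significance test, and the helper names (<code>count_ngrams</code>, <code>prune</code>, <code>predict</code>) are hypothetical.</p>
<pre>
```python
from collections import Counter, defaultdict


def count_ngrams(words: list[str], max_n: int) -> dict[tuple, Counter]:
    """Map each context of 1..max_n-1 words to a histogram of the word that follows."""
    table: dict[tuple, Counter] = defaultdict(Counter)
    for n in range(2, max_n + 1):
        for i in range(len(words) - n + 1):
            context = tuple(words[i:i + n - 1])
            table[context][words[i + n - 1]] += 1
    return table


def prune(table: dict[tuple, Counter], min_count: int) -> dict[tuple, Counter]:
    """Keep only contexts seen at least min_count times (an assumed threshold)."""
    return {ctx: c for ctx, c in table.items() if sum(c.values()) >= min_count}


def predict(table: dict[tuple, Counter], history: list[str], max_n: int) -> dict[str, float]:
    """Use the longest stored context that matches the end of the history."""
    for size in range(max_n - 1, 0, -1):
        context = tuple(history[-size:])
        if len(context) == size and context in table:
            counts = table[context]
            total = sum(counts.values())
            return {word: n / total for word, n in counts.items()}
    return {}  # nothing matched, not even the last word on its own


# Made-up text, purely for illustration.
words = "the dog barked loudly and the dog barked again and the cat slept".split()
table = count_ngrams(words, max_n=3)
pruned = prune(table, min_count=2)

print(predict(table, ["the", "dog"], max_n=3))    # matches the two-word context
print(predict(table, ["zebra", "the"], max_n=3))  # falls back to the one-word context
```
</pre>
<p>When the two-word context has never been seen, the lookup falls back to the last word alone, which is exactly the bigram model from the previous section.</p><p>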
This was more or less the state of the art for language modeling for the longest time.</p><p>In fact, when I asked a friend at Google Translate in the late 2000s about how it worked, I was really surprised to learn that the core of it was simply matching n-grams in one language to n-grams in another language without much regard for grammar rules or anything else.</p><h2>Representing concepts as big vectors</h2><p>The N-gram model&#8217;s biggest drawback is that it has no information about the meaning of the words. In my opinion, representing words as big vectors was one of the biggest advancements in the early days of statistical NLP. Before vectors, people thought that the way to figure out synonymy and related words was to put words in a large tree (ontology) and then find how close they are to each other. In fact, there were armies of linguists whose job was to put words into ontologies. The problem with these word trees is that you&#8217;re forcing a somewhat arbitrary structure. For example, in a given ontology, Albert Einstein may show up under <em>People-&gt;Scientists-&gt;Physicists-&gt;Born in the 1800s</em>, while the theory of relativity may show up under <em>Abstract Concepts-&gt;Scientific Concepts-&gt;Physics-&gt;Modern Physics</em>. From this viewpoint, the two seem very far apart, yet they are clearly related concepts.&nbsp;</p><p>Imagine another way to organize words, where you are given a very large whiteboard and each word is represented by a fixed-size circle with its center at a specific point on this 2D plane. 
And words are arranged on the whiteboard so that words that are related to each other are close to each other and words that are less related are far apart.</p><p>You may find organizing the words in this way near impossible because if these circles are packed together, each circle can have only a small number (<a href="https://mathworld.wolfram.com/CirclePacking.html#:~:text=A%20circle%20packing%20is%20an,35%2D41).">six</a>, to be precise) of close neighbors, but a word can have many different neighbors that are all slightly different in meaning. Now imagine that instead of a 2D plane, we are in 3D and words are represented by spheres. This gives you a lot more room to pack related concepts together. The number of neighbors a word can have in fact goes up exponentially with the number of dimensions. So by the time you reach 256 or more dimensions, this becomes an excellent way to represent the concepts behind words so that related concepts are close together and unrelated concepts are far apart. What this means is that each word can be represented by a few hundred real numbers that represent the coordinates of the center of its sphere. These vectors of numbers are often called word embeddings.</p><p>This idea is so important that we are going to try and look at it another way. Suppose you limit yourself to the ten thousand most frequent words in the English language and everything else gets ignored in the following analysis. Also, imagine someone coming up with a thousand topics, and each word is scored on a scale of zero to one on how strongly it belongs to each topic. For example, &#8220;mango&#8221; may belong to the topic &#8220;fruit&#8221; with a strength of 1.0, to the topic &#8220;yellow objects&#8221; with a strength of 0.5, and to &#8220;abstract concept&#8221; with a strength of 0.0. In this way, each word can be represented by a vector of a thousand numbers. Now if we wanted to see if two words are related, all we need to do is see how far apart these vectors are. 
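</p><p>As a small worked example of comparing words by their topic vectors, here is a sketch with three topics instead of a thousand. Only the scores for &#8220;mango&#8221; come from the text above; the scores for the other two words are made up, and the choice of straight-line (Euclidean) distance is an assumption, since other measures such as cosine similarity are also common.</p>
<pre>
```python
import math

TOPICS = ["fruit", "yellow objects", "abstract concept"]

# One score per topic. "mango" uses the numbers from the text;
# "banana" and "justice" are made-up rows for illustration.
word_vectors = {
    "mango":   [1.0, 0.5, 0.0],
    "banana":  [1.0, 1.0, 0.0],
    "justice": [0.0, 0.0, 1.0],
}


def distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two topic vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


near = distance(word_vectors["mango"], word_vectors["banana"])
far = distance(word_vectors["mango"], word_vectors["justice"])
print(near, far)  # the smaller number marks the more closely related pair
```
</pre>
<p>With these scores, &#8220;mango&#8221; ends up much closer to &#8220;banana&#8221; than to &#8220;justice&#8221;, which is the behavior we want from a good set of topics.</p><p>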
Of course, if the topics were not chosen well, this scheme will perform poorly. So all you have to do is come up with a thousand topics that would do a good job of capturing the essence of all the ten thousand words, and then come up with 1,000 x 10,000 = 10 million weights. This kind of work has been done manually. It takes a lot of effort and the results are usually mediocre.</p><h2>Latent semantic analysis (LSA)</h2><p>One of the first successful attempts at solving this problem in an algorithmic way was the idea of <a href="https://towardsdatascience.com/latent-semantic-analysis-intuition-math-implementation-a194aff870f8">Latent Semantic Analysis</a> published in 1988. The idea was that if you add up the vectors of the words in the document, it should do a good enough job of representing the topic of the document. Now, based on the vector representing the document, if you guess the word distribution in the document, you may not get the exact word distribution but you might get something close as long as we had done a good job of picking the topics and association of words to topics.</p><p>For a given document, we could measure the difference in the original distribution of words and the re-constructed distribution of words from the topic vectors. Let&#8217;s call this difference the error in reconstruction. Now, if we have a large collection of documents, we could add up the error across all the documents. 
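</p><p>To make the idea of reconstruction error concrete, here is a small sketch. It assumes the simplest possible setup: a hand-written table of word-to-topic weights and plain weighted sums in both directions. The vocabulary, topics, weights, and documents are all made up and the function names are hypothetical. This is not the LSA algorithm itself, only the quantity it tries to make small.</p>
<pre>
```python
import math

VOCAB = ["mango", "banana", "court", "judge"]
S = 1 / math.sqrt(2)

# Made-up word-to-topic weights: one row per word, columns are ("fruit", "law").
WORD_TOPIC = [
    [S, 0.0],
    [S, 0.0],
    [0.0, S],
    [0.0, S],
]


def to_topics(word_dist: list[float]) -> list[float]:
    """Add up the word vectors, weighted by how often each word occurs."""
    n_topics = len(WORD_TOPIC[0])
    return [sum(p * row[k] for p, row in zip(word_dist, WORD_TOPIC))
            for k in range(n_topics)]


def reconstruct(topic_vec: list[float]) -> list[float]:
    """Guess the word distribution back from the document's topic vector."""
    return [sum(t * w for t, w in zip(topic_vec, row)) for row in WORD_TOPIC]


def reconstruction_error(word_dist: list[float]) -> float:
    """Squared difference between the original and reconstructed distributions."""
    guess = reconstruct(to_topics(word_dist))
    return sum((p - g) ** 2 for p, g in zip(word_dist, guess))


def total_error(documents: list[list[float]]) -> float:
    """Add up the error across all the documents."""
    return sum(reconstruction_error(d) for d in documents)


# Each document is a distribution over VOCAB (made-up numbers).
documents = [
    [0.5, 0.5, 0.0, 0.0],    # evenly about mango and banana
    [0.75, 0.25, 0.0, 0.0],  # mostly mango
    [0.0, 0.0, 0.5, 0.5],    # evenly about court and judge
]
print(total_error(documents))
```
</pre>
<p>The second document cannot be reconstructed exactly, because the two topics have no way of recording that it mentions mango more often than banana. That lost detail is what the error measures.</p><p>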
The key intuition of this paper was that the problem of picking topics and their connections to words could be considered a mathematical optimization problem that tries to reduce the total error.&nbsp;</p><p>Furthermore, if you constrain the functions taking you from words to topics and from topics to words to be linear, the process of finding the best topics and best vectors for words so that the error is minimized is identical to another mathematical problem called <a href="https://en.wikipedia.org/wiki/Principal_component_analysis">Principal Component Analysis</a> (PCA). Luckily, PCA has a closed-form solution that can be computed efficiently.</p><p>The vectors computed in this way were fairly useful in finding the conceptual overlap between words and very useful for things like web searches. One disadvantage of computing vectors this way was that the resulting topics were not guaranteed to be coherent concepts; they were whatever the mathematical optimization happened to produce. Sometimes you could see the hidden meaning behind the numbers in the vector, and sometimes you couldn&#8217;t. This was our first clue that advanced mathematical modeling of language would be less and less explainable.</p><h2>Significance of vector representation of words</h2><p>Word embeddings from the very start were a powerful tool for measuring similarities in the meaning of two words. As the techniques for computing these embeddings improved, it turned out that you could do arithmetic with the meaning of the words in surprising ways. For example, if you take the embedding of king, subtract the embedding of man, and then add the embedding of woman, you get something very close to the embedding for queen. What this means is that you can mathematically manipulate concepts. This is a big deal. 
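</p><p>The king, man, woman example can be reproduced with a toy calculation. The three-number &#8220;embeddings&#8221; below are written by hand purely for illustration; real embeddings are learned from text and have hundreds of dimensions. The <code>analogy</code> helper and the use of cosine similarity to pick the nearest word are assumptions of this sketch.</p>
<pre>
```python
import math

# Hand-made vectors, loosely (royalty, male, female). Not learned from data.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: closer to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def analogy(start: str, minus: str, plus: str) -> str:
    """Compute start - minus + plus and return the nearest remaining word."""
    target = [s - m + p for s, m, p in
              zip(embeddings[start], embeddings[minus], embeddings[plus])]
    candidates = [w for w in embeddings if w not in (start, minus, plus)]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


result = analogy("king", "man", "woman")
print(result)
```
</pre>
<p>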
Until then, the only way you could manipulate concepts was through rules and the application of logic &#8211; a brittle way of encapsulating real-life complexities.&nbsp;</p><p>With these embeddings, you can map and translate concepts from one language to another, produce summaries, and mix the meaning of words to get another word. For example, jaguar + animal will represent a different concept than jaguar + car. Or you can ask what word represents sad + lonely. Put into practice, this gives models the ability to suggest whether an email is significant or spam, perform sentiment analysis of an online review, and, most importantly, run semantic search.&nbsp;</p><h2>Going beyond linear functions</h2><p>The LSA approach above artificially constrained the functions going from word to topic to be linear for mathematical and computational convenience. The natural question was whether better embedding functions could be found if we relaxed these constraints, so that we could discover better representations of concepts for the various applications listed above.&nbsp;</p><p>While multi-layer neural networks were always a very promising class of functions for representing arbitrarily complex functions, they were notoriously difficult to train and, for the most part, had fallen out of favor. However, a few research groups strongly believed in their power and kept trying different techniques for decades to make them do useful things. After many different iterations, <a href="https://en.wikipedia.org/wiki/Geoffrey_Hinton">Geoff Hinton</a>&#8217;s group and <a href="https://en.wikipedia.org/wiki/Yoshua_Bengio">Yoshua Bengio</a>&#8217;s group successfully demonstrated that neural networks could do a great job of learning representations of words and concepts as embeddings. 
Along with <a href="https://en.wikipedia.org/wiki/Yann_LeCun">Yann LeCun</a>&#8217;s work on <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network">Convolutional Neural Networks</a> (CNNs), this work is considered pivotal in the resurgence of neural networks in their current form as the most powerful tool in machine learning.</p><h2>AlexNet and GPUs</h2><p>2012 was another pivotal year for the resurgence of neural networks. Most experts credit it to <a href="https://en.wikipedia.org/wiki/AlexNet">AlexNet</a>, which used GPUs to train a large CNN for computer vision tasks and produced substantially better results than anything else in the field. The idea of using GPUs had been floating around for a while, but the magnitude of AlexNet&#8217;s success opened the door to many others who were training larger and larger neural networks with bigger training sets and more and more compute. One big realization for researchers at this time was that the more data and compute you were willing to give your neural network, the better the results. This was not the case with most other machine learning methods. Most other approaches saturated after a while, but the neural nets kept going. The two fields that benefited the most from this approach were Computer Vision and NLP. From there, innovation became very rapid and it hasn&#8217;t stopped since.&nbsp;</p><h2>Adding context to words</h2><p>One problem with trying to map words to embeddings was determining context. Words can have very different meanings depending on how they&#8217;re used. For example, the word bank means two completely different things in &#8220;bank balance&#8221; and &#8220;river bank&#8221;, and therefore needs two different embeddings.</p><p>From 2013 to 2018 a series of research papers improved the quality of embeddings by accounting for other words in the context and using different model architectures. 
The most influential papers in this period were <a href="https://arxiv.org/pdf/1301.3781.pdf">Word2Vec</a> (2013), <a href="https://nlp.stanford.edu/pubs/glove.pdf">GloVe</a> (2014), and <a href="https://allenai.org/allennlp/software/elmo">ELMo</a> (2018), which tried different techniques for capturing the meaning of a word in its context. The techniques behind these results are totally worth studying but are not that relevant to our purpose of understanding ChatGPT. However, the idea of understanding the context itself is very important to understanding ChatGPT.</p><h2>Machine translation</h2><p>One problem that proved to be fertile ground for applying neural network advancements was translating text from one language to another. The key idea here was computing sentence embeddings that map a full sentence to a vector and capture all the meaning in the sentence. This vector can then be passed to another network and translated into text in a different language. In that sense, you have an encoder network that takes the sentence and encodes it into an embedding, and a decoder network that decodes that embedding into text in the other language. 
Now this combined network can be trained on all the available training data for any language pair where we have human-generated translations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HGWp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1800c7be-51ec-437c-bdb5-94b40cb7be1b_1040x335.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HGWp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1800c7be-51ec-437c-bdb5-94b40cb7be1b_1040x335.png 424w, https://substackcdn.com/image/fetch/$s_!HGWp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1800c7be-51ec-437c-bdb5-94b40cb7be1b_1040x335.png 848w, https://substackcdn.com/image/fetch/$s_!HGWp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1800c7be-51ec-437c-bdb5-94b40cb7be1b_1040x335.png 1272w, https://substackcdn.com/image/fetch/$s_!HGWp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1800c7be-51ec-437c-bdb5-94b40cb7be1b_1040x335.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HGWp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1800c7be-51ec-437c-bdb5-94b40cb7be1b_1040x335.png" width="1040" height="335" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1800c7be-51ec-437c-bdb5-94b40cb7be1b_1040x335.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:335,&quot;width&quot;:1040,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12663,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HGWp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1800c7be-51ec-437c-bdb5-94b40cb7be1b_1040x335.png 424w, https://substackcdn.com/image/fetch/$s_!HGWp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1800c7be-51ec-437c-bdb5-94b40cb7be1b_1040x335.png 848w, https://substackcdn.com/image/fetch/$s_!HGWp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1800c7be-51ec-437c-bdb5-94b40cb7be1b_1040x335.png 1272w, https://substackcdn.com/image/fetch/$s_!HGWp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1800c7be-51ec-437c-bdb5-94b40cb7be1b_1040x335.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Even early attempts at using neural networks looked quite promising compared to classical techniques, and they laid the groundwork for much of the innovation that followed.</p><h4>How ChatGPT is able to understand human conversation</h4><p>So far, we saw how embedding words or sentences into a high-dimensional space is useful for many NLP tasks. We also saw that interpreting words or sentences within the context in which they occur is very important, and also challenging. 
In the <a href="https://www.thoughtspot.com/data-trends/ai/what-is-transformer-architecture-chatgpt">next article</a>, we will look at various sequence-to-sequence models, starting from RNNs and building toward the Transformer architecture, which was the key innovation that made ChatGPT possible.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wT0I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F428b1164-b6a5-4575-a8d0-e4139754d42f_1040x931.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wT0I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F428b1164-b6a5-4575-a8d0-e4139754d42f_1040x931.png 424w, https://substackcdn.com/image/fetch/$s_!wT0I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F428b1164-b6a5-4575-a8d0-e4139754d42f_1040x931.png 848w, https://substackcdn.com/image/fetch/$s_!wT0I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F428b1164-b6a5-4575-a8d0-e4139754d42f_1040x931.png 1272w, https://substackcdn.com/image/fetch/$s_!wT0I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F428b1164-b6a5-4575-a8d0-e4139754d42f_1040x931.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wT0I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F428b1164-b6a5-4575-a8d0-e4139754d42f_1040x931.png" width="1040" height="931" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/428b1164-b6a5-4575-a8d0-e4139754d42f_1040x931.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:931,&quot;width&quot;:1040,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75559,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wT0I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F428b1164-b6a5-4575-a8d0-e4139754d42f_1040x931.png 424w, https://substackcdn.com/image/fetch/$s_!wT0I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F428b1164-b6a5-4575-a8d0-e4139754d42f_1040x931.png 848w, https://substackcdn.com/image/fetch/$s_!wT0I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F428b1164-b6a5-4575-a8d0-e4139754d42f_1040x931.png 1272w, https://substackcdn.com/image/fetch/$s_!wT0I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F428b1164-b6a5-4575-a8d0-e4139754d42f_1040x931.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Next Post</h2><p><strong><a href="https://www.thoughtspot.com/data-trends/ai/what-is-transformer-architecture-chatgpt">Transformer architecture &#8211; The engine behind ChatGPT</a>,</strong> includes a brief study of sequence-to-sequence models starting from RNNs and ending at Transformers.</p>
]]></content:encoded></item><item><title><![CDATA[The case for a query modification language]]></title><description><![CDATA[And why dashboards are dead]]></description><link>https://amit.thoughtspot.com/p/the-case-for-a-query-modification</link><guid isPermaLink="false">https://amit.thoughtspot.com/p/the-case-for-a-query-modification</guid><dc:creator><![CDATA[Amit Prakash]]></dc:creator><pubDate>Thu, 13 Oct 2022 05:13:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Kzf0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f544aa4-4823-422b-83bc-a5f49acd2b71_1200x628.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Kzf0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f544aa4-4823-422b-83bc-a5f49acd2b71_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Kzf0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f544aa4-4823-422b-83bc-a5f49acd2b71_1200x628.png 424w, 
https://substackcdn.com/image/fetch/$s_!Kzf0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f544aa4-4823-422b-83bc-a5f49acd2b71_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!Kzf0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f544aa4-4823-422b-83bc-a5f49acd2b71_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!Kzf0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f544aa4-4823-422b-83bc-a5f49acd2b71_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Kzf0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f544aa4-4823-422b-83bc-a5f49acd2b71_1200x628.png" width="728" height="380.9866666666667" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7f544aa4-4823-422b-83bc-a5f49acd2b71_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:823677,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Kzf0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f544aa4-4823-422b-83bc-a5f49acd2b71_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!Kzf0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f544aa4-4823-422b-83bc-a5f49acd2b71_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!Kzf0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f544aa4-4823-422b-83bc-a5f49acd2b71_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!Kzf0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f544aa4-4823-422b-83bc-a5f49acd2b71_1200x628.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In 1895, a German physicist was trying to determine if he could observe cathode rays escaping from a glass tube and noticed an unexpected glow on a fluorescent screen several feet away. On further examination, it turned out to be a different kind of radiation that we now know as X-rays. Fast forward to today and you can&#8217;t even imagine diagnosing many medical problems without an X-ray.</p><p>There are many instances where someone was looking to study one thing and accidentally discovered something completely different and revolutionary, like the <a href="https://en.wikipedia.org/wiki/Microwave_oven#Discovery">microwave oven</a> or <a href="https://www.acs.org/content/acs/en/education/whatischemistry/landmarks/flemingpenicillin.html#alexander-fleming-penicillin">penicillin</a>.</p><p>A similarly accidental discovery is shaking up the <a href="https://www.thoughtspot.com/blog/what-defines-the-modern-data-stack-and-why-you-should-care">modern data stack</a>. I&#8217;m talking about a language for modifying data queries. While it doesn&#8217;t quite belong in the same class of revolutions as the examples above, it is one of the most powerful enablers of <a href="https://www.thoughtspot.com/data-trends/analytics/self-service-analytics">self-service analytics</a>.</p><h3>What is query modification?</h3><p>Before we get into what a query modification language is, let&#8217;s level-set on what query modification is and why it matters. New data often leads to new questions. 
For example, if you look at weekly revenue and see a trend you did not expect, you may want to dive deeper into an anomalous point. Often the query to dig deeper is a derivative of the query that produced the first piece of data. In the case above, all the metrics and filters remain the same; you just add one more filter and change the group-by clause to match how you want to drill. Query modification is the act of turning the original query into a new query that answers the next exploratory question.</p><h3>What is a query modification language?</h3><p>Imagine you or your visualization tool has written a SQL query to compute total revenue. It might look like this:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nNDt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7712274a-6a99-4fb5-b3cc-c470ce87d5fa_1146x76.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nNDt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7712274a-6a99-4fb5-b3cc-c470ce87d5fa_1146x76.png 424w, https://substackcdn.com/image/fetch/$s_!nNDt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7712274a-6a99-4fb5-b3cc-c470ce87d5fa_1146x76.png 848w, https://substackcdn.com/image/fetch/$s_!nNDt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7712274a-6a99-4fb5-b3cc-c470ce87d5fa_1146x76.png 1272w, 
https://substackcdn.com/image/fetch/$s_!nNDt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7712274a-6a99-4fb5-b3cc-c470ce87d5fa_1146x76.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nNDt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7712274a-6a99-4fb5-b3cc-c470ce87d5fa_1146x76.png" width="1146" height="76" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7712274a-6a99-4fb5-b3cc-c470ce87d5fa_1146x76.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:76,&quot;width&quot;:1146,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nNDt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7712274a-6a99-4fb5-b3cc-c470ce87d5fa_1146x76.png 424w, https://substackcdn.com/image/fetch/$s_!nNDt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7712274a-6a99-4fb5-b3cc-c470ce87d5fa_1146x76.png 848w, https://substackcdn.com/image/fetch/$s_!nNDt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7712274a-6a99-4fb5-b3cc-c470ce87d5fa_1146x76.png 1272w, 
https://substackcdn.com/image/fetch/$s_!nNDt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7712274a-6a99-4fb5-b3cc-c470ce87d5fa_1146x76.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Now suppose you want to break this number down by product category. The new SQL you need is going to look like this:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ET6N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6cb2089d-3722-4a72-b83b-84a853d782b5_1600x385.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ET6N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6cb2089d-3722-4a72-b83b-84a853d782b5_1600x385.png 424w, https://substackcdn.com/image/fetch/$s_!ET6N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6cb2089d-3722-4a72-b83b-84a853d782b5_1600x385.png 848w, https://substackcdn.com/image/fetch/$s_!ET6N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6cb2089d-3722-4a72-b83b-84a853d782b5_1600x385.png 1272w, https://substackcdn.com/image/fetch/$s_!ET6N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6cb2089d-3722-4a72-b83b-84a853d782b5_1600x385.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ET6N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6cb2089d-3722-4a72-b83b-84a853d782b5_1600x385.png" width="1456" height="350" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/6cb2089d-3722-4a72-b83b-84a853d782b5_1600x385.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:350,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ET6N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6cb2089d-3722-4a72-b83b-84a853d782b5_1600x385.png 424w, https://substackcdn.com/image/fetch/$s_!ET6N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6cb2089d-3722-4a72-b83b-84a853d782b5_1600x385.png 848w, https://substackcdn.com/image/fetch/$s_!ET6N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6cb2089d-3722-4a72-b83b-84a853d782b5_1600x385.png 1272w, https://substackcdn.com/image/fetch/$s_!ET6N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6cb2089d-3722-4a72-b83b-84a853d782b5_1600x385.png 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a></figure></div><p>We typically rewrite SQL queries by hand to match our intent.</p><p>But imagine a system that could simply take the command &#8220;Drill down by product_category&#8221; and modify the first query into the second. You can probably think of 10 or so common ways of modifying a typical query. What if there were a language that allowed you to express all of them? Or, even better, one that let you compose these commands arbitrarily to keep modifying the query? That is the idea of a query modification language.</p><p>A query modification language, or QML, provides a set of instructions that can modify a data query in specific ways.</p><p>In the rest of this post, I will talk in terms of a QML for ThoughtSpot&#8217;s search language rather than a QML for SQL, but the same ideas can apply to any data query language.</p><p>For example, suppose you have a query that gives you &#8220;top 10 products that produced the most revenue in last year&#8221;. Now suppose you want to change the query to consider only the products sold in the USA. A QML will have a phrase like &#8220;Add Filter Country = USA&#8221;. This, of course, is one of the simplest examples. But once you consider all the different ways in which a user may want to change a query, it becomes a fairly rich set of operations. 
Common operations we see users perform include drilling down into a number, showing the specific rows that make up an aggregate number on a <a href="https://www.thoughtspot.com/data-trends/data-visualization/what-is-data-visualization">data visualization</a>, excluding specific values (for example, null), and changing the sort order or sort direction of a query.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_rwX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F671f7f79-fc95-442d-9120-5f4020846408_1600x1000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_rwX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F671f7f79-fc95-442d-9120-5f4020846408_1600x1000.png 424w, https://substackcdn.com/image/fetch/$s_!_rwX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F671f7f79-fc95-442d-9120-5f4020846408_1600x1000.png 848w, https://substackcdn.com/image/fetch/$s_!_rwX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F671f7f79-fc95-442d-9120-5f4020846408_1600x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!_rwX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F671f7f79-fc95-442d-9120-5f4020846408_1600x1000.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!_rwX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F671f7f79-fc95-442d-9120-5f4020846408_1600x1000.png" width="1456" height="910" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/671f7f79-fc95-442d-9120-5f4020846408_1600x1000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_rwX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F671f7f79-fc95-442d-9120-5f4020846408_1600x1000.png 424w, https://substackcdn.com/image/fetch/$s_!_rwX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F671f7f79-fc95-442d-9120-5f4020846408_1600x1000.png 848w, https://substackcdn.com/image/fetch/$s_!_rwX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F671f7f79-fc95-442d-9120-5f4020846408_1600x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!_rwX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F671f7f79-fc95-442d-9120-5f4020846408_1600x1000.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p><em>In the search above, we have filtered our simple online spend search results for region equals California.</em></p><h3>What are the advantages of a query modification language?</h3><p>The advantage of pairing a query language with a query modification language is that together they enable unconstrained exploration. From any question, you can ask a related or follow-up question. And when that inevitably sparks another question, you can keep exploring until you actually get to the insights you are looking for. 
This entire cycle happens nearly instantaneously, with only seconds between successive queries.</p><p>This is the fundamental reason why we say <a href="https://go.thoughtspot.com/e-book-dashboards-are-dead.html">Dashboards are Dead</a>, and modern businesses run on Liveboards. Dashboards serve you answers to questions that were anticipated months ago. There&#8217;s little freedom to ask the next question. Liveboards, on the other hand, get you started in the right place but don&#8217;t box you in. The entire neighborhood, including any and all exploratory paths, is accessible to the end user.</p><p>We have seen that the majority of data organizations are stuck in a paradigm where, every time new requirements show up, the data teams get busy building new data pipelines and new dashboards, and only after the user has submitted a JIRA ticket. Too often, the requirements have already become dated before these dashboards are ever delivered. For a Fortune 500 company, it is not unusual to be maintaining tens of thousands of dashboards without even knowing which small fraction actually delivers value. As some data leaders have correctly identified, dashboard producers care more about the dashboards themselves than about the people who consume them.</p><p>Liveboards introduce a new paradigm for building analytics that drastically cuts down on the busy work for data teams while giving business people what they need from their data.</p><h3>Discovery of QML</h3><p>Our initial vision for ThoughtSpot was to build a product that empowered anyone, regardless of their technical skills, to ask data questions in the vocabulary they understood and get the correct answer. We originally did not think about users modifying and refining their queries, other than editing them the way someone would edit a Google search. 
We were very focused on creating an experience that looked and felt like searching on Google yet, behind the scenes, generated SQL queries and visualized data.&nbsp;</p><p>However, we very quickly realized that real-life data questions are fairly complex and hard to express precisely in natural language. There are typically many ambiguities, and resolving them becomes tedious for the person asking the question (although it does make a compelling demo!). When you are building an application that will inform business-critical decisions, it&#8217;s essential to limit any potential misinterpretation of the user&#8217;s intent.</p><p>Armed with these insights, we changed our goal: design an interface that is as easy to use as Google and requires no training, yet works deterministically, without making any probabilistic assumptions. This led to an approach where we built a platform that is, in effect, a factory for generating domain-specific languages (DSLs) based on your business language and the entities relevant to your business (product names, customers, categories, etc.).&nbsp;</p><p>Looking back almost 10 years later, this turned out to be one of the most important and fundamental choices behind ThoughtSpot&#8217;s wide adoption.</p><h2>Building in interactions</h2><p>Two of the most fundamental interactions for a BI tool are filtering and drill-down. 
Filtering is seeing the same data visualization but restricting input records to an area of focus, such as a specific time period or specific region.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3mJs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d0a5c4-cef2-468a-879d-60c7b34b5ca8_864x540.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3mJs!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d0a5c4-cef2-468a-879d-60c7b34b5ca8_864x540.gif 424w, https://substackcdn.com/image/fetch/$s_!3mJs!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d0a5c4-cef2-468a-879d-60c7b34b5ca8_864x540.gif 848w, https://substackcdn.com/image/fetch/$s_!3mJs!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d0a5c4-cef2-468a-879d-60c7b34b5ca8_864x540.gif 1272w, https://substackcdn.com/image/fetch/$s_!3mJs!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d0a5c4-cef2-468a-879d-60c7b34b5ca8_864x540.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3mJs!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d0a5c4-cef2-468a-879d-60c7b34b5ca8_864x540.gif" width="864" height="540" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f0d0a5c4-cef2-468a-879d-60c7b34b5ca8_864x540.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:864,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3mJs!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d0a5c4-cef2-468a-879d-60c7b34b5ca8_864x540.gif 424w, https://substackcdn.com/image/fetch/$s_!3mJs!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d0a5c4-cef2-468a-879d-60c7b34b5ca8_864x540.gif 848w, https://substackcdn.com/image/fetch/$s_!3mJs!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d0a5c4-cef2-468a-879d-60c7b34b5ca8_864x540.gif 1272w, https://substackcdn.com/image/fetch/$s_!3mJs!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d0a5c4-cef2-468a-879d-60c7b34b5ca8_864x540.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>The search above applies filters for both time (last 24 months) and place (North America, APAC).</em></p><p>Drill-down is expanding on a specific statistic. For example, if you are seeing &#8220;top 10 products by revenue&#8221; you may choose to drill down into the revenue coming from the top product by asking how revenue from the top product breaks down by region. 
Most BI products require the dashboard author to pre-program the columns on which a consumer of the dashboard is allowed to filter or the paths along which they are allowed to drill down.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pC0w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbae57ae-03c9-4749-aa45-c8a2d3860527_864x540.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pC0w!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbae57ae-03c9-4749-aa45-c8a2d3860527_864x540.gif 424w, https://substackcdn.com/image/fetch/$s_!pC0w!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbae57ae-03c9-4749-aa45-c8a2d3860527_864x540.gif 848w, https://substackcdn.com/image/fetch/$s_!pC0w!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbae57ae-03c9-4749-aa45-c8a2d3860527_864x540.gif 1272w, https://substackcdn.com/image/fetch/$s_!pC0w!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbae57ae-03c9-4749-aa45-c8a2d3860527_864x540.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pC0w!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbae57ae-03c9-4749-aa45-c8a2d3860527_864x540.gif" width="864" height="540" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/cbae57ae-03c9-4749-aa45-c8a2d3860527_864x540.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:864,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pC0w!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbae57ae-03c9-4749-aa45-c8a2d3860527_864x540.gif 424w, https://substackcdn.com/image/fetch/$s_!pC0w!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbae57ae-03c9-4749-aa45-c8a2d3860527_864x540.gif 848w, https://substackcdn.com/image/fetch/$s_!pC0w!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbae57ae-03c9-4749-aa45-c8a2d3860527_864x540.gif 1272w, https://substackcdn.com/image/fetch/$s_!pC0w!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbae57ae-03c9-4749-aa45-c8a2d3860527_864x540.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>In the interaction above, we drill down on our top-selling product type this year to uncover how sales for that product type break down by region.</em></p><p>Of course, since we were designing a tool for business users, we did not want to limit the consumer of a dashboard to the filters and drill paths configured by its author. 
We also did not want enabling new interactions to require any additional effort.</p><h2>Keeping search and answer in sync</h2><p>One of the product design points we debated heavily was this: when someone types a query and then modifies it through a UI interaction, what happens to the search query?&nbsp;</p><p>There are three options:</p><ol><li><p>The user goes into a new experience where the model of interaction is no longer search</p></li><li><p>The query text remains the same (perhaps greyed out to indicate that it does not match the answer)</p></li><li><p>The query changes to match the new answer</p></li></ol><p>The first option would have been the easiest answer, but it would reduce the experience to that of any other traditional BI product after the first UI interaction, so we never really considered it.</p><p>The advantage of #2 was that it is a much simpler system to build, and end users are always looking at a query they typed, as opposed to something system-generated that they now have to try to understand.</p><p>The advantage of #3 was twofold. First, the user can seamlessly go from modifying the query through UI interaction to modifying the query in the search bar, and back. All the interactions would make sense. With approach #2, there is a danger that you may ask for &#8220;Revenue by state&#8221;, then drill down into California by category, and then add the filter &#8220;last year&#8221; and expect the result to be &#8220;Revenue for California in the last year by category&#8221;. But the result will actually be &#8220;Revenue by state for the last year&#8221;, because the filter is applied to the text in the search bar, which still holds the original query. This is avoided by #3.&nbsp;</p><p>The second advantage of #3 is that as the query changes through UI interaction, end users discover different ways of asking the question in the search bar.</p><p>Between #2 and #3, there was a strong debate. 
Initially, we did not know how to modify the query in every context in a way that didn&#8217;t create too much cognitive overhead for the user (reduce the edit distance between old and new query as much as possible while getting to the semantics of the new query). Prototyping and building approach #3 took two years of iteration and refinement to create something that was consistent, usable, and always correct.</p><p>When a user did a drill down or changed the filter in the UI, we wanted to keep their search query in sync with the answer so they could subsequently continue modifying the search query.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8z9I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28eee5ae-485e-4167-bc44-afebc2c7bf0b_864x540.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8z9I!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28eee5ae-485e-4167-bc44-afebc2c7bf0b_864x540.gif 424w, https://substackcdn.com/image/fetch/$s_!8z9I!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28eee5ae-485e-4167-bc44-afebc2c7bf0b_864x540.gif 848w, https://substackcdn.com/image/fetch/$s_!8z9I!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28eee5ae-485e-4167-bc44-afebc2c7bf0b_864x540.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!8z9I!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28eee5ae-485e-4167-bc44-afebc2c7bf0b_864x540.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8z9I!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28eee5ae-485e-4167-bc44-afebc2c7bf0b_864x540.gif" width="864" height="540" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/28eee5ae-485e-4167-bc44-afebc2c7bf0b_864x540.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:864,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8z9I!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28eee5ae-485e-4167-bc44-afebc2c7bf0b_864x540.gif 424w, https://substackcdn.com/image/fetch/$s_!8z9I!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28eee5ae-485e-4167-bc44-afebc2c7bf0b_864x540.gif 848w, https://substackcdn.com/image/fetch/$s_!8z9I!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28eee5ae-485e-4167-bc44-afebc2c7bf0b_864x540.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!8z9I!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28eee5ae-485e-4167-bc44-afebc2c7bf0b_864x540.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>After the search above, we drill down twice. 
Each time, the search bar updates to represent the search behind the data we are visualizing.</em></p><h2>Enriching interactions</h2><p>What started out as enabling three interactions eventually expanded to a substantial list of query modifications.&nbsp;</p><p>Below is a subset of the most important interactions we added to our interaction model:&nbsp;</p><ul><li><p>Adding a filtering widget and modifying filter values</p></li><li><p>Drilling down in any direction</p></li><li><p>Excluding/including a certain data point from the data</p></li><li><p>Comparing metrics period over period</p></li><li><p>Comparing different cohorts</p></li><li><p>Showing the most granular data that makes up a specific number on the visualization</p></li><li><p>Adding/removing/editing sorting columns and sort direction (ascending vs. descending)</p></li><li><p>Adding/removing a having filter (post-aggregation filter)</p></li><li><p>Changing time bucket granularity (monthly -&gt; daily) for date-time columns</p></li></ul><p>In building these interactions, it became clear that much of the query change intent could be described as a combination of more primitive operations. For example, a drill-down combines four steps:&nbsp;</p><ol><li><p>Drop all measures in the query other than the measure of interest</p></li><li><p>Add filters that represent the specific data point you want to drill down on&nbsp;</p></li><li><p>Remove all existing grouping in the query&nbsp;</p></li><li><p>Add grouping for the column on which you want to drill down&nbsp;</p></li></ol><p>This led to a mechanism of combining a sequence of primitive operations into a macro query modification operation. 
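</p><p>To make the composition concrete, here is a minimal Python sketch of the drill-down decomposition above. It is not ThoughtSpot&#8217;s implementation: the Query shape and every function name (keep_only_measure, add_filter, clear_groupings, add_grouping, macro, drill_down) are hypothetical, chosen only to show how a few primitive operations could be chained into one macro operation.</p>

```python
from dataclasses import dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class Query:
    """Hypothetical, simplified query model (not ThoughtSpot's real one)."""
    measures: tuple = ()
    groupings: tuple = ()
    filters: tuple = ()  # (column, value) pairs


# Primitive operations: each returns a function that maps a Query to a new Query.
def keep_only_measure(measure):
    return lambda q: replace(q, measures=(measure,))


def add_filter(column, value):
    return lambda q: replace(q, filters=q.filters + ((column, value),))


def clear_groupings():
    return lambda q: replace(q, groupings=())


def add_grouping(column):
    return lambda q: replace(q, groupings=q.groupings + (column,))


def macro(*ops):
    """Chain primitive operations, applied left to right, into one operation."""
    return lambda q: reduce(lambda acc, op: op(acc), ops, q)


def drill_down(measure, point, by):
    """The four drill-down steps from the list above, in order."""
    point_filters = [add_filter(col, val) for col, val in point.items()]
    return macro(
        keep_only_measure(measure),  # 1. drop all other measures
        *point_filters,              # 2. filter to the selected data point
        clear_groupings(),           # 3. remove existing groupings
        add_grouping(by),            # 4. group by the drill-down column
    )


# "revenue and quantity by state", then drill into California by category.
start = Query(measures=("revenue", "quantity"), groupings=("state",))
result = drill_down("revenue", {"state": "California"}, by="category")(start)
print(result)
```

<p>In this sketch each primitive returns a new query instead of mutating the old one, so the original query stays available and, under these assumptions, the search bar text could be regenerated from whatever query results after each interaction.</p><p>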
This allows for much more complex operations, such as comparing two data points on a visualization to see which segments contributed most to the overall change.</p><h2>Birth of a query modification language</h2><p>Once we had built a large subset of the above interactions, we realized that what we were building was a true companion language to our query language: a query modification language (QML), the first of its kind. This language became the foundation of many powerful capabilities.</p><p>Note that the QML described here complements our own ThoughtSpot Modeling Language (<a href="https://datamonkeysite.com/2022/03/17/first-look-at-thoughtspot-modeling-language-tml/">TML</a>) but is a very different kind of language: users are never exposed to a programming language (other than when they invoke APIs), and QML is exposed purely as a visual interaction language.</p><h2>Putting it all together in Liveboards</h2><p>The query modification language is the foundation of many of the powerful capabilities that make Liveboards so different from dashboards. QML sits outside the search bar, so users have two ways of interacting with the query: typing in the search bar and using UI interactions. Often, QML lets someone arrive at a complex query by breaking it into a series of fairly simple modifications.&nbsp;</p><p>Most dashboards are designed from requirements given by business users and are mostly static, limited to the requirements known at the time of authoring. In contrast, Liveboards use the visualizations only as a starting point for data exploration. When a business user looks at the metrics and KPIs on a Liveboard, they invariably come up with new questions. 
The answer to those questions generates new questions.&nbsp;</p><p>Having a system based on this QML means that the business user can get answers to all those questions within minutes instead of waiting on the data team for every new question.</p><p><em>While dashboards typically provide users a static view of their data, Liveboards are simply a starting point. In the interaction above, we dive into Explore mode, select a filter recommended by ThoughtSpot&#8217;s AI, and then drill down one more level to get to an entirely new answer.</em></p><p>As a business user with all the context in your head, you are the one who can find the right question to ask of your data to truly uncover insights and add value. Many times, you arrive at this question by asking a lot of other questions.</p><p>With QML as the foundation layer, Liveboards are the perfect tool to answer any data question.</p><p>PS: If you want to see this in action with a real user, this YouTube video has our CFO describing how he uses Liveboards, about 20 minutes into the <a href="https://www.youtube.com/watch?v=mEzlY01ayrI&amp;t=1078s">video</a>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.youtube.com/watch?v=mEzlY01ayrI&amp;t=1078s" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fhQf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F83b9dbdc-fd4f-40e7-b294-e512b1795030_3018x1572.png 424w, https://substackcdn.com/image/fetch/$s_!fhQf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F83b9dbdc-fd4f-40e7-b294-e512b1795030_3018x1572.png 848w, 
https://substackcdn.com/image/fetch/$s_!fhQf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F83b9dbdc-fd4f-40e7-b294-e512b1795030_3018x1572.png 1272w, https://substackcdn.com/image/fetch/$s_!fhQf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F83b9dbdc-fd4f-40e7-b294-e512b1795030_3018x1572.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fhQf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F83b9dbdc-fd4f-40e7-b294-e512b1795030_3018x1572.png" width="1456" height="758" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/83b9dbdc-fd4f-40e7-b294-e512b1795030_3018x1572.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:758,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2313391,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.youtube.com/watch?v=mEzlY01ayrI&amp;t=1078s&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fhQf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F83b9dbdc-fd4f-40e7-b294-e512b1795030_3018x1572.png 424w, 
https://substackcdn.com/image/fetch/$s_!fhQf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F83b9dbdc-fd4f-40e7-b294-e512b1795030_3018x1572.png 848w, https://substackcdn.com/image/fetch/$s_!fhQf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F83b9dbdc-fd4f-40e7-b294-e512b1795030_3018x1572.png 1272w, https://substackcdn.com/image/fetch/$s_!fhQf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F83b9dbdc-fd4f-40e7-b294-e512b1795030_3018x1572.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline 
points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[AI is not coming for analyst jobs anytime soon]]></title><description><![CDATA[That is, if you can rise above SQL jockey]]></description><link>https://amit.thoughtspot.com/p/ai-is-not-coming-for-analyst-jobs</link><guid isPermaLink="false">https://amit.thoughtspot.com/p/ai-is-not-coming-for-analyst-jobs</guid><dc:creator><![CDATA[Amit Prakash]]></dc:creator><pubDate>Tue, 22 Mar 2022 17:25:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!k0KD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Feccf2018-e12a-49c8-9f65-de1d28050ad3_4992x2615.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k0KD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Feccf2018-e12a-49c8-9f65-de1d28050ad3_4992x2615.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k0KD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Feccf2018-e12a-49c8-9f65-de1d28050ad3_4992x2615.png 424w, https://substackcdn.com/image/fetch/$s_!k0KD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Feccf2018-e12a-49c8-9f65-de1d28050ad3_4992x2615.png 848w, 
https://substackcdn.com/image/fetch/$s_!k0KD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Feccf2018-e12a-49c8-9f65-de1d28050ad3_4992x2615.png 1272w, https://substackcdn.com/image/fetch/$s_!k0KD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Feccf2018-e12a-49c8-9f65-de1d28050ad3_4992x2615.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k0KD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Feccf2018-e12a-49c8-9f65-de1d28050ad3_4992x2615.png" width="1456" height="763" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/eccf2018-e12a-49c8-9f65-de1d28050ad3_4992x2615.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:763,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:468288,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k0KD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Feccf2018-e12a-49c8-9f65-de1d28050ad3_4992x2615.png 424w, https://substackcdn.com/image/fetch/$s_!k0KD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Feccf2018-e12a-49c8-9f65-de1d28050ad3_4992x2615.png 
848w, https://substackcdn.com/image/fetch/$s_!k0KD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Feccf2018-e12a-49c8-9f65-de1d28050ad3_4992x2615.png 1272w, https://substackcdn.com/image/fetch/$s_!k0KD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Feccf2018-e12a-49c8-9f65-de1d28050ad3_4992x2615.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>This title has the potential to offend two very different 
groups. Before you take out the pitchforks, let me clarify two things:</p><ol><li><p><strong>For SQL enthusiasts</strong>: Our collective SQL skills are still going to be very valuable for at least the next decade.</p></li><li><p><strong>For AI enthusiasts</strong>: Advances in deep learning, and in particular the recent larger transformer models (e.g. <a href="https://en.wikipedia.org/wiki/GPT-3">GPT-3</a>, <a href="https://github.com/google-research/text-to-text-transfer-transformer">T5</a>), will have a dramatic impact on the industry, just not in the ways most people think.</p></li></ol><p>At ThoughtSpot, we have spent the last decade building an AI-powered analytics engine that sits on top of SQL systems. We have also built one of the fastest SQL processing engines (a distributed in-memory database for our customers that are not in the cloud yet), so I am both a huge AI enthusiast and a SQL enthusiast. I have also spent a lot of time figuring out what today&#8217;s AI can and cannot do in the data space. This post is a distillation of learnings from that decade-long quest. It should help the analyst community understand what may be coming in the near future and decide which skills to prioritize.</p><p>The summary is that today AI only works in very narrow and well-defined problem spaces. Also, it is very hard to use AI in domains where tolerance for failure is relatively low. As a result, in the data space, AI is primarily useful in either micro-decisions with low stakes (e.g. 
ranking search results, ranking ads, product recommendations) or with human-in-the-loop products<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lohF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa72ae5-2d26-47f2-bdd4-9feb9fbdb4de_2000x664.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lohF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa72ae5-2d26-47f2-bdd4-9feb9fbdb4de_2000x664.png 424w, https://substackcdn.com/image/fetch/$s_!lohF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa72ae5-2d26-47f2-bdd4-9feb9fbdb4de_2000x664.png 848w, https://substackcdn.com/image/fetch/$s_!lohF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa72ae5-2d26-47f2-bdd4-9feb9fbdb4de_2000x664.png 1272w, https://substackcdn.com/image/fetch/$s_!lohF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa72ae5-2d26-47f2-bdd4-9feb9fbdb4de_2000x664.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lohF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa72ae5-2d26-47f2-bdd4-9feb9fbdb4de_2000x664.png" width="1456" 
height="483" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7fa72ae5-2d26-47f2-bdd4-9feb9fbdb4de_2000x664.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:483,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:490975,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lohF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa72ae5-2d26-47f2-bdd4-9feb9fbdb4de_2000x664.png 424w, https://substackcdn.com/image/fetch/$s_!lohF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa72ae5-2d26-47f2-bdd4-9feb9fbdb4de_2000x664.png 848w, https://substackcdn.com/image/fetch/$s_!lohF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa72ae5-2d26-47f2-bdd4-9feb9fbdb4de_2000x664.png 1272w, https://substackcdn.com/image/fetch/$s_!lohF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7fa72ae5-2d26-47f2-bdd4-9feb9fbdb4de_2000x664.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Summary of tasks where AI struggles or succeeds today</figcaption></figure></div><h2><strong>Where AI struggles</strong></h2><p>Here are some things that are going to be hard for AI to do in the near future:</p><h3><strong>Adding missing context to data</strong></h3><p>Data in itself is quite useless until you put it in real-world context. For example, what different tables and columns mean, how to interpret the data, and how different tables are supposed to join are mostly locked inside people&#8217;s heads. A column could be named stripe_revenue, but the fact that it represents monthly recurring revenue (MRR) may be missing. There can be three different columns called customer in different tables, each representing a customer in a different context, and knowing which one to use again requires a lot of context. 
If you try to use AI to extract this kind of knowledge, your best bet would be to use NLP on documents, code comments, or conversations such as those in Slack channels. But the technology is nowhere near mature enough to automate this kind of task, and any attempt to do so will produce unreliable information. For the foreseeable future, someone needs to own building usable, well-documented data models for analytics to happen.</p><h3><strong>Inferring underlying causal structure and processes behind the data</strong></h3><p>To do a good job of analyzing data, you have to understand the process that generates the data and how its pieces relate. In the absence of knowledge of the causal structure in the data, we may see:</p><ol><li><p>The algorithms miss an important relationship that would be fairly obvious to a human who understands the causality.</p></li><li><p>The algorithms may pick up on correlations that have no causal link or significance to the end user, causing noise that drowns out other significant insights.</p></li><li><p>The algorithms may not know the significance of a pattern in the data because the relationship between the observable variables in the pattern and the variables that the end user really cares about may not be known. So it becomes hard to rank different insights based on their significance to the end user and surface the important ones.</p></li></ol><p>For example, ad impressions and Google searches may cause people to visit your website. Visitors on the website navigate from page to page and may transact. These transactions are dependent on users liking what they see and inventory being present. 
Without understanding all these underlying processes, an AI algorithm won&#8217;t understand that you are losing potential revenue and wasting marketing dollars by advertising for a product that is out of stock.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3QDJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fac84c900-70f2-46e3-8acd-c1e84f123939_1600x633.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3QDJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fac84c900-70f2-46e3-8acd-c1e84f123939_1600x633.png 424w, https://substackcdn.com/image/fetch/$s_!3QDJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fac84c900-70f2-46e3-8acd-c1e84f123939_1600x633.png 848w, https://substackcdn.com/image/fetch/$s_!3QDJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fac84c900-70f2-46e3-8acd-c1e84f123939_1600x633.png 1272w, https://substackcdn.com/image/fetch/$s_!3QDJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fac84c900-70f2-46e3-8acd-c1e84f123939_1600x633.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3QDJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fac84c900-70f2-46e3-8acd-c1e84f123939_1600x633.png" width="1456" height="576" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/ac84c900-70f2-46e3-8acd-c1e84f123939_1600x633.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:576,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3QDJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fac84c900-70f2-46e3-8acd-c1e84f123939_1600x633.png 424w, https://substackcdn.com/image/fetch/$s_!3QDJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fac84c900-70f2-46e3-8acd-c1e84f123939_1600x633.png 848w, https://substackcdn.com/image/fetch/$s_!3QDJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fac84c900-70f2-46e3-8acd-c1e84f123939_1600x633.png 1272w, https://substackcdn.com/image/fetch/$s_!3QDJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fac84c900-70f2-46e3-8acd-c1e84f123939_1600x633.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3><strong>Breaking complex problems into smaller problems</strong></h3><p>As human beings, one of our most powerful problem-solving tools is our ability to break a complex problem into multiple simpler problems. Take software engineering, for example. In some ways, all engineers are doing is dividing and conquering until they get to primitives that the underlying library or hardware can handle. There is some evidence that AI can do this in controlled settings. The most famous example in a widely used product is perhaps the <a href="https://support.microsoft.com/en-us/office/save-time-with-flash-fill-9159216a-75a0-4c11-82e6-8eca29cb3b89">Excel Flash Fill</a> feature, but it is limited to fairly simple functions. 
Recently, DeepMind caught many people by surprise with <a href="https://deepmind.com/blog/article/Competitive-programming-with-AlphaCode">AlphaCode</a>, which performed better than roughly 45% of human participants in a coding competition. While this is impressive, it still works in a very narrow domain of problems with lots of training data on those problems.</p><p>In the context of analytics, breaking a problem into pieces may mean breaking a cohort analysis into the required sub-queries, or first transforming data into an easy-to-analyze data model and then doing the analysis. Even with the most state-of-the-art AI, we won&#8217;t be able to automate such things for several years.</p><h3><strong>Adding context to data questions</strong></h3><p>This is an area I have spent a lot of time exploring. I&#8217;ll give a few motivating examples first:</p><ol><li><p>We were talking to an airline that has two important metrics: A0 = &#8220;Average arrival delay for a flight segment&#8221; and D0 = &#8220;Average departure delay for a flight segment.&#8221; When someone asks &#8220;What is A0 for DFW?&#8221;, it means A0 where Arrival_Airport = DFW. Alternatively, when someone asks &#8220;What is D0 for DFW?&#8221;, it means D0 where Departure_airport = DFW. This context is very specific to that customer, and you cannot learn it from data available in the public domain.</p></li><li><p>One customer that represents a large fraction of global corporate travel asked, &#8220;How many customers are in New York today?&#8221; This means that their departure date was before today, arrival date was after today, and New York was the destination of their travel. Their data model contains &#8220;New York&#8221; in close to 20 different columns.</p></li><li><p>In a database of movies, someone asks, &#8220;What is the longest movie ever?&#8221; To answer the question, you need to understand that &#8220;longest&#8221; in this question refers to duration. 
This is something that a model that has a general understanding of the world should be able to figure out. Below is a pretty reasonable answer from GPT-3. GPT-3 is acting more like a search engine here and extracting the right phrases from its training but with some effort, it can be inferred that &#8220;longest&#8221; means duration.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4dl1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F574e6ada-0b40-4a41-93e3-17d6e5ca0b95_1286x322.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4dl1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F574e6ada-0b40-4a41-93e3-17d6e5ca0b95_1286x322.png 424w, https://substackcdn.com/image/fetch/$s_!4dl1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F574e6ada-0b40-4a41-93e3-17d6e5ca0b95_1286x322.png 848w, https://substackcdn.com/image/fetch/$s_!4dl1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F574e6ada-0b40-4a41-93e3-17d6e5ca0b95_1286x322.png 1272w, https://substackcdn.com/image/fetch/$s_!4dl1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F574e6ada-0b40-4a41-93e3-17d6e5ca0b95_1286x322.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!4dl1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F574e6ada-0b40-4a41-93e3-17d6e5ca0b95_1286x322.png" width="1286" height="322" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/574e6ada-0b40-4a41-93e3-17d6e5ca0b95_1286x322.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:322,&quot;width&quot;:1286,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4dl1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F574e6ada-0b40-4a41-93e3-17d6e5ca0b95_1286x322.png 424w, https://substackcdn.com/image/fetch/$s_!4dl1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F574e6ada-0b40-4a41-93e3-17d6e5ca0b95_1286x322.png 848w, https://substackcdn.com/image/fetch/$s_!4dl1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F574e6ada-0b40-4a41-93e3-17d6e5ca0b95_1286x322.png 1272w, https://substackcdn.com/image/fetch/$s_!4dl1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F574e6ada-0b40-4a41-93e3-17d6e5ca0b95_1286x322.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div></li></ol><p>Our experience with this class of problems is that most can be solved individually with &gt;90% accuracy using ML. But going from arbitrary natural language to SQL directly often requires getting multiple disambiguation problems right at once, and because the errors compound, it is hard to do it today with greater than 80% accuracy. In an analytics product, this kind of accuracy is not acceptable. 
This is why at ThoughtSpot our search engine avoids any probabilistic inference when interpreting a query.</p><h2><strong>Where AI succeeds</strong></h2><p>Now that we have talked about all the difficulties, let&#8217;s cover how AI, when used with the right combination of systems and UX, is a really powerful tool in the data space. It is already having a transformative impact on many data teams.</p><h3><strong>Making low stakes repeated decisions</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1l7i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff00af21b-4c02-4996-b3e7-d08144d8db36_742x877.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1l7i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff00af21b-4c02-4996-b3e7-d08144d8db36_742x877.png 424w, https://substackcdn.com/image/fetch/$s_!1l7i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff00af21b-4c02-4996-b3e7-d08144d8db36_742x877.png 848w, https://substackcdn.com/image/fetch/$s_!1l7i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff00af21b-4c02-4996-b3e7-d08144d8db36_742x877.png 1272w, https://substackcdn.com/image/fetch/$s_!1l7i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff00af21b-4c02-4996-b3e7-d08144d8db36_742x877.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!1l7i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff00af21b-4c02-4996-b3e7-d08144d8db36_742x877.png" width="536" height="633.5202156334232" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f00af21b-4c02-4996-b3e7-d08144d8db36_742x877.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:877,&quot;width&quot;:742,&quot;resizeWidth&quot;:536,&quot;bytes&quot;:63682,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1l7i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff00af21b-4c02-4996-b3e7-d08144d8db36_742x877.png 424w, https://substackcdn.com/image/fetch/$s_!1l7i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff00af21b-4c02-4996-b3e7-d08144d8db36_742x877.png 848w, https://substackcdn.com/image/fetch/$s_!1l7i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff00af21b-4c02-4996-b3e7-d08144d8db36_742x877.png 1272w, https://substackcdn.com/image/fetch/$s_!1l7i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff00af21b-4c02-4996-b3e7-d08144d8db36_742x877.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>If each individual decision has little impact but needs to be made millions of times, it is a perfect candidate for AI. This is where most enterprise ML sits today, whether it's deciding what ads to show, what products to recommend, or which transactions to flag as potential fraud.</p><p>At large enterprises, this work cannot be done with a human in the loop. Instead, it was traditionally done with heuristics, and machine learning models can definitely improve upon these decisions. With enough scale, this can have a tremendous impact. 
During my five years at Google, my small team of ML engineers contributed close to a billion dollars in incremental revenue through feature engineering and ML improvements. While a lot of this work was generating hypotheses and improving algorithms, a big part of it was analysis. Every hypothesis about how the predictions could be improved could be quickly validated or invalidated by running analytics on the prediction errors. In that sense, I feel it is great for analysts to expand their understanding to at least the basics of ML, because a good analyst can be an extremely valuable part of an ML team.</p><h3><strong>Democratizing the wisdom of the crowd</strong></h3><p>Have you ever wondered what makes Google so smart that when you search for <a href="https://www.google.com/search?q=pain+in+the+bottom+of+my+foot">pain in the bottom of my foot</a> it gives you results about <a href="https://www.mayoclinic.org/diseases-conditions/plantar-fasciitis/symptoms-causes/syc-20354846">plantar fasciitis</a>? There are many things at play here, but the primary source of this intelligence is us, the users. Some users somewhere queried &#8220;pain in the bottom of my foot&#8221;, followed it up in the same session with a query for &#8220;plantar fasciitis&#8221;, and then clicked on the result. This allows Google to learn the association and help the rest of its billions of users when they need the right answer.</p><p>The same idea works for analytics as well. When you learn from users and use that to give personalized recommendations to each user, it dramatically improves their experience and makes it easier for them to get to the right answer. There may be three different definitions of revenue in your data model, but usually there is one that gets the most use by people in roles similar to yours. 
When you ask for &#8220;closed bookings this quarter&#8221;, &#8220;this quarter&#8221; usually maps to the &#8220;close date&#8221; column, while when you ask for &#8220;pipeline created this quarter&#8221;, it maps to the &#8220;creation date&#8221; column. These kinds of recommendations become fairly easy to make in an auto-completion setting if you have a machine-learned model helping users.</p><h3><strong>Exploring a large space looking for patterns and anomalies</strong></h3><p>Suppose that at the end of the month your leadership team is doing a business review, and someone says that, compared to last year, we made 10% less from electronics sales this year. Why would that be? All of a sudden a lot of eyebrows are raised, and hypothesis after hypothesis flies around the room. Maybe it's the pandemic, and people facing hardships are not buying as much as before. If that is the case, we should see some zip codes drop a lot more than others. Maybe people are not buying high-end gear as much as they used to. If that is the case, some price buckets should have much lower revenue than others. After a lot of stress, and a lot of calls to the analytics team, you figure out that it was a lack of inventory for specific products supplied by a specific supplier. This doesn&#8217;t need to be a manual process. After the right data model has been built by an analyst and the ML algorithms have learned enough from the history of users asking questions, AI algorithms can do a much better job of searching through hundreds of auto-generated hypotheses looking for a root cause.</p><p>This kind of automation works not just for monitoring and root cause analysis, but for all kinds of interesting insights. I have seen manufacturing companies save millions of dollars by finding price discrepancies between suppliers, and banks save millions of dollars by letting automated algorithms, guided by machine-learned systems, search for lost claims. 
In one of the most amusing examples of this, back in 2017 one of our customers used automated insights to spot that sales of fidget spinners were growing fast, well before most people realized it was a trend.</p><h3><strong>Language modeling over domains with a large amount of data</strong></h3><p>Large-scale language models are a really powerful tool. The most well-known model in this class is <a href="https://en.wikipedia.org/wiki/GPT-3">GPT-3</a>, but every few weeks we see significant research advancing the state of the art in this space. It is still too early to build analytics products out of these models. Some obvious uses are: 1. auto-completion for a SQL/code editor, 2. natural language to code translation, 3. natural language generation to describe insights in the data, and 4. conversational data apps to reduce UX complexity. The biggest barrier I see in this space is that if any context is missing from the input, it is hard for language models to fill it in. Also, if you are trying to model a language with little training data, you can try transfer learning, and it can do non-trivial things, but building a usable product here will still require either a lot of hand-crafted systems or a little more maturity in language modeling.&nbsp;</p><h3><strong>Progressive disclosure of complexity</strong></h3><p>Even though AI today is imperfect, one of the best ways to deploy it in products is to support it with UX. One of the key design principles for us here at ThoughtSpot has been <em>less input more output </em>or LIMO. This means we want the user to do as little work as possible while getting the most value.&nbsp;</p><p>Analyzing data, creating visualizations, and building data models are all input-intensive work. We try to eliminate as much input (and intellectual burden) as possible by predicting at least part of the input and making an explicit choice for the user. If we get it right, the user can simply move on. 
If we don&#8217;t get it right, the user can edit those decisions. This is one of the big reasons why ThoughtSpot allows thousands of non-technical business users in hundreds of the world's largest enterprises to be more data-driven with every decision.</p><h2><strong>The impact of AI on a modern data analyst</strong></h2><p>Traditionally, the analyst job has consisted of preparing and analyzing data to answer relevant business questions and communicating those insights back to stakeholders so they can take action. For the first part, a lot of automation is making analysts 10X more powerful. The intellectual parts are not going away, but the elimination of rote tasks will enable analysts to do much more high-value work. It is also creating more room for the analyst to focus on impact and communication. In my experience, this often leads to faster promotion for analysts. As an analogy, we are not deploying self-driving cars, which would eliminate cab driver jobs. We are upgrading horse carriages to automobiles, which means a lot less dealing with manure and going much faster over longer distances.</p><p>Based on what I see in the near future, here is my advice to anyone in the analyst role:</p><ol><li><p><strong>Say no to repetitive data pull requests</strong>: Even though your company may need them today, they are detrimental to your career. The time you spend on these requests is all time you are not spending learning modern data stack tools that you&#8217;ll need in the future. Occasional requests as exceptions are okay, but if your employer is not willing to invest in the right toolset, it&#8217;s a red flag for your career growth.</p></li><li><p><strong>Be the founder of your analytics community: </strong>Try to include as many people in analytical decision-making as possible. Educate people on how to use the data models you have built. Educate your organization on data models, interpreting data, and self-service analytics. 
Educate people on how to make better data-driven decisions. The bigger your community, the more impact you will create. As dbt&#8217;s Erica Louie puts it, <a href="https://www.youtube.com/watch?v=7OYGWM3Bwhw">scaling knowledge &gt; scaling bodies</a>. Things that I have seen work well in this regard are:</p><ul><li><p>Holding regular office hours</p></li><li><p>Maintaining well-documented and searchable data models</p></li><li><p>Creating tutorials</p></li><li><p>Hosting internal or external meetups for non-data teams to show off their expertise and share problems&nbsp;&nbsp;</p></li></ul></li><li><p><strong>Do more engineering</strong>: Learn to automate as many things in your job as you can. If you like, learn Python or some other programming language. You can also consider learning low-code tools for automation and powerful abstractions on top of SQL such as TML or LookML for analytics, or dbt for data pipelines.</p></li><li><p><strong>Become the product manager for analytics</strong>: Most of us engineers pride ourselves on being great problem solvers. What I have learned over the years is that there is a lot more leverage in asking which problems are worth solving and what the end-user actually needs. In the analytics domain, this largely means three things:&nbsp;</p><ul><li><p><strong>Ruthless prioritization</strong>: Prioritize work that will generate long-term value, not the urgent request coming from someone with a title. This requires mature leadership for data teams.</p></li><li><p><strong>User empathy</strong>: The end-user is not always right. If you keep adding new things to your data model based on in-the-moment requests, it will become completely unusable. 
Instead, it&#8217;s better to understand user needs and synthesize them into a coherent product definition that is usable.</p></li><li><p><strong>Communication</strong>: Communicating complex ideas, influencing opinions, and building consensus are some of the most valuable skills in most careers, and they are especially important for impact in the analyst role.</p></li></ul></li><li><p><strong>Learn the basics of machine learning</strong>: The future of data will contain more and more machine learning, and it's a useful adjacent skill. Additionally, a big part of improving machine learning models is analytics. If you have a business-critical ML task, you need to be working on potentially hundreds of iterative improvements to the model. This means you need to analyze the errors, form hypotheses about which features are missing, test those hypotheses with some more analytics, train new ML models, and then analyze their results to establish whether you have improved things in each iteration. Even if your primary job is not ML, your analytics skills can be extremely useful to an ML team.&nbsp;</p></li></ol><p><strong>Own the outcome</strong>: There has been some debate about whether data teams should be thought of as supporting teams or not. I am a big believer in data teams owning outcomes instead of just having a supporting role for the execution team. For example, your OKR could be helping the Customer Success team reduce churn by 50% or helping reduce wasted AWS resources by 30%. This is much more impactful than an OKR like delivering self-service analytics to the Customer Success team. But it requires a level of alignment with the operational teams. It also requires data teams to be able to influence behavior with data. Accountability without authority can be challenging; however, product managers will tell you that this has always been a part of their job. 
This is one of the best ways to get yourself a seat at the executive table and put yourself on a fast growth trajectory.</p><p>The role of data professionals isn&#8217;t going anywhere. Data has never been more important to businesses than it is today &#8211; and this importance will only grow in the years ahead. What will change, however, are the skills analysts need to help their organizations build and thrive.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><a href="https://medium.com/authority-magazine/the-great-resignation-the-future-of-work-prudentials-kjersten-moody-on-how-employers-and-809bd32c3503">A recent study from the World Economic Forum predicts that the time spent on tasks at work by humans and machines will be equal as soon as 2025</a>.</p></div></div>]]></content:encoded></item><item><title><![CDATA[How to design your data stack for curiosity]]></title><description><![CDATA[&#8220;I have no particular talent, I am just passionately curious&#8221; &#8212; Albert Einstein]]></description><link>https://amit.thoughtspot.com/p/how-to-design-your-data-stack-for</link><guid isPermaLink="false">https://amit.thoughtspot.com/p/how-to-design-your-data-stack-for</guid><dc:creator><![CDATA[Amit Prakash]]></dc:creator><pubDate>Tue, 08 Feb 2022 19:11:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pkHe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6575f2-ca53-4269-9167-9137c850c39d_1600x846.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!pkHe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6575f2-ca53-4269-9167-9137c850c39d_1600x846.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pkHe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6575f2-ca53-4269-9167-9137c850c39d_1600x846.png 424w, https://substackcdn.com/image/fetch/$s_!pkHe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6575f2-ca53-4269-9167-9137c850c39d_1600x846.png 848w, https://substackcdn.com/image/fetch/$s_!pkHe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6575f2-ca53-4269-9167-9137c850c39d_1600x846.png 1272w, https://substackcdn.com/image/fetch/$s_!pkHe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6575f2-ca53-4269-9167-9137c850c39d_1600x846.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pkHe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6575f2-ca53-4269-9167-9137c850c39d_1600x846.png" width="1456" height="770" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/ba6575f2-ca53-4269-9167-9137c850c39d_1600x846.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:770,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pkHe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6575f2-ca53-4269-9167-9137c850c39d_1600x846.png 424w, https://substackcdn.com/image/fetch/$s_!pkHe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6575f2-ca53-4269-9167-9137c850c39d_1600x846.png 848w, https://substackcdn.com/image/fetch/$s_!pkHe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6575f2-ca53-4269-9167-9137c850c39d_1600x846.png 1272w, https://substackcdn.com/image/fetch/$s_!pkHe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6575f2-ca53-4269-9167-9137c850c39d_1600x846.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p></p><p>When was the last time you thought about optimizing your analytics toolset for curiosity? Yet what is the value of all the data and analytics in the world if not paired with human curiosity?</p><p>If you ask most organizations what is important to them in terms of the data stack, usually it is things like moving to the cloud, getting clean data, building dashboards and reports to serve the execs running the business, security, and governance. Cultivating curiosity hardly ever makes it to the top of the list. If you take a moment and think about why that is the case, it is usually because no one considers it their job. Clearly, a lot of leaders consider data strategic and one of the most important tools for their business, but why does that not elevate curiosity as a concern? 
My guess is that curiosity is just assumed to be present, similar to qualities like motivation and ambition.&nbsp;</p><p>To some extent, that is true. All of us are born curious. A <a href="https://www.youtube.com/watch?v=xZJwMYeE9Ak">five-year-old asks around 300 questions a day</a>. But then something happens, and we start asking fewer and fewer questions, to the extent that most grown-ups rarely ask questions that could surprise them. Most knowledge workers and leaders aspire to be curious and see themselves as people open to new data, but their curiosity is poorly supported both culturally and technologically.</p><p>Almost everything that has ever been invented, discovered, or improved originated from a curious mind. And while curiosity is hard to quantify on its own, there are many examples of the relationship between curiosity and better business outcomes.</p><ol><li><p>When St. Jude Children's Research Hospital, one of the largest charitable organizations in the US for children's health, allowed its fundraising team to explore their curiosity around their <a href="https://thoughtspot.wistia.com/medias/593ycxxtoy?wtime=19m53s">fundraising data</a>, they realized that events within two miles of a Whole Foods store tend to produce much better results than events anywhere else.</p></li><li><p>When one of the leading companies in the insurance and banking industry in Australia adopted self-service analytics for its business users, people started asking a lot of questions they would never have asked otherwise. 
One of these questions, about insurance claims data, led to the discovery of an anomaly that saved the company 30 million dollars within months of deployment.</p></li><li><p>When one of the investment funds in Canada enabled its traders to ask their own data questions, within an hour of training, one of the traders found that the fund was being overcharged for securities lending by another bank to the tune of millions of dollars.</p></li><li><p>When one of the largest technology companies in the US enabled its accountants to ask data questions at will, within a few days they discovered misuse of the travel policy that was leading to losses of millions of dollars.&nbsp;</p></li></ol><p>While these are just a few anecdotes, I hope you can see how enabling people to be curious about data can result in substantially better business outcomes.&nbsp;&nbsp;</p><p>So, what can you do to cultivate curiosity in the context of data and analytics? To quote James Clear, the famous author of &#8220;Atomic Habits&#8221;, albeit a bit out of context,&nbsp; "<strong>You do not rise to the level of your goals. You fall to the level of your systems.</strong>" That is to say, the biggest lever you have in terms of cultivating curiosity is deploying the right data stack. So, here&#8217;s how you can ensure your data stack is built to cultivate curiosity:</p><h3><strong>Reward curiosity instead of punishing it</strong></h3><p>Imagine you are managing the inventory for a very large retail chain in the middle of a pandemic. One week your customers are asking for exercise bikes in droves, the next they are interested in stockpiling toilet paper, and the week after sanitizers are all the rage. You are curious which product is rising in demand now so you can gather as much inventory as possible for the following week. For most folks, the first step to answering this question is consulting a dashboard. 
But perhaps you find that, to manage the scale of the data, your data team has aggregated transactions to the daily, broad product-category level instead of the SKU or sub-category level. That means if you want granular data, you have to submit a data pull request, which sits in the queue of other requests, and there is a wait time of two weeks or more to get the answer. You are not even sure if your request will yield meaningful insight, so you feel bad piling on one more request when everyone is frantically trying to get their data.&nbsp;</p><p>Compare that to an experience where you can drill down into whatever slice of data you need, ask questions in a Google-like interface, and be rewarded with instant insights. Which experience will encourage you to be more curious? Which one will bring better results? For Canadian Tire, <a href="https://www.wsj.com/articles/artificial-intelligence-helps-canadian-tire-navigate-pandemic-11597656601">the choice was clear</a>.</p><h4><em>&#8220;During much of the second quarter, Canadian Tire Retail was forced to temporarily close or operate in a limited capacity about 40% of its stores. Still, Canadian Tire Retail&#8217;s sales were up 20% from a year earlier&#8221;</em></h4><h3><strong>Grant people permission to be curious</strong></h3><p>Another reason people often curb their curiosity is that they don&#8217;t feel empowered to explore it. &#8220;Curiosity killed the cat&#8221; is often the refrain from frustrated adults to a curious child who has exhausted the adult&#8217;s patience or explored something dangerous. While the refrain is well-meaning, over time children feel disempowered and stop asking questions. 
The corporate equivalent of that is, &#8220;if I allow everyone to ask their questions, they will reach wrong or biased conclusions&#8221;, or &#8220;when everyone has their own version of data, no one can trust the data.&#8221;&nbsp;</p><p>These concerns were absolutely valid in the older generation analytics stack, when everyone dumped their own data in Excel with different filters and different metric definitions. This even gave rise to the term &#8220;Excel hell.&#8221; When your analytics tools couldn&#8217;t handle more than a few million rows, you had to work with Tableau Extracts and Power BI reports, and it was essential that these things were tightly governed.</p><p>But with the modern data stack, you can leave data in granular form in a cloud data warehouse, define your metrics in one central place<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, and let everyone do their own analysis without worrying too much about having different versions of the truth. Both granular security and centralized definitions, without locking down the grain of aggregation, are very much possible with the right stack.&nbsp;&nbsp;</p><h3><strong>Invoke curiosity through adjacency</strong></h3><p>We have all gone from watching one clip on YouTube to looking at a cat playing a piano an hour later. Less often, we have also gone down the Wikipedia rabbit hole, where you go to read the page of your favorite author (<a href="https://en.wikipedia.org/wiki/P._G._Wodehouse">P. G. Wodehouse</a>) and an hour later you are reading about different kinds of coffee (if you must know, this is the path I took: <a href="https://en.wikipedia.org/wiki/The_Code_of_the_Woosters">The Code of the Woosters</a> -&gt; <a href="https://en.wikipedia.org/wiki/Creamer_(vessel)">Cow Creamer</a> -&gt; <a href="https://en.wikipedia.org/wiki/Espresso#Espresso-based_drinks">Espresso-based drinks</a>). 
When you are looking at one concept, you are naturally curious about all the adjacent concepts. If the adjacent piece of information is easily accessible in context, it is natural for people to let their curiosity go wild. There are many ways of incorporating this into your analytics stack. Often people hardcode links to related analyses in their data visualizations. In ThoughtSpot, you can drill in any direction possible given the data model. Also, the explore feature uses machine learning to collect the next best data question in that context, personalized for the user and made accessible with one click.</p><h3><strong>Make curiosity a social phenomenon</strong></h3><p>We are social beings (some more than others). When your friends or people you respect are looking at something cool, it is hard for you to not be curious about it as well. This is why we saw significant engagement with analytics when we started showing people who else had looked at the answer to a particular question or Liveboard. Having a social feed of activity and commentary is another idea that we have often played around with to evoke curiosity.</p><h3><strong>Surface the knowledge gap</strong></h3><p>When our curiosity is piqued by a tease of something we don&#8217;t know, it usually has immense power over us. 
That is why it is so hard to resist those damned notifications on the phone or why it is so easy to grab someone&#8217;s attention by starting a sentence with, &#8220;Did you know&#8230;&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MrW3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b89cbf-9328-4146-a8f3-b465eef1de8d_770x251.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MrW3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b89cbf-9328-4146-a8f3-b465eef1de8d_770x251.png 424w, https://substackcdn.com/image/fetch/$s_!MrW3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b89cbf-9328-4146-a8f3-b465eef1de8d_770x251.png 848w, https://substackcdn.com/image/fetch/$s_!MrW3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b89cbf-9328-4146-a8f3-b465eef1de8d_770x251.png 1272w, https://substackcdn.com/image/fetch/$s_!MrW3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b89cbf-9328-4146-a8f3-b465eef1de8d_770x251.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MrW3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b89cbf-9328-4146-a8f3-b465eef1de8d_770x251.png" width="770" height="251" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/a6b89cbf-9328-4146-a8f3-b465eef1de8d_770x251.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:251,&quot;width&quot;:770,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MrW3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b89cbf-9328-4146-a8f3-b465eef1de8d_770x251.png 424w, https://substackcdn.com/image/fetch/$s_!MrW3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b89cbf-9328-4146-a8f3-b465eef1de8d_770x251.png 848w, https://substackcdn.com/image/fetch/$s_!MrW3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b89cbf-9328-4146-a8f3-b465eef1de8d_770x251.png 1272w, https://substackcdn.com/image/fetch/$s_!MrW3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b89cbf-9328-4146-a8f3-b465eef1de8d_770x251.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>At ThoughtSpot, we are working on a feature called automated business monitoring, in which a machine learning algorithm constantly watches the metrics on your behalf and invites you to dig deeper when one of the metrics behaves in unexpected ways along any of the dimensions you care about.</p><h2>Parting thoughts</h2><p>While these ideas largely put the onus on tools and organizations to change behaviors, ultimately it is up to each of us as individuals to keep our own curiosity alive. Staying curious means being fully present, being willing to admit when you don&#8217;t know something (or when you&#8217;re wrong), and engaging in discourse with people you may not agree with. 
It is not always easy, but it is one of the most rewarding things you can do, both in the here and now and in the long term.</p><p>In this <a href="https://www.youtube.com/watch?v=SmaTPPB-T_s">talk</a>, cognitive neuroscientist Matthias Gruber explains that when researchers used fMRI to study people&#8217;s brains after arousing their curiosity with a trivia question, the activity looked very similar to that of someone anticipating a reward such as a treat. It makes perfect sense, because what is sweeter than the discovery of a new insight from the question you just asked?<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>This is very much possible in ThoughtSpot worksheets. This is also the reason why I wholeheartedly support the <a href="https://prakasha.substack.com/p/the-metrics-layer-has-growing-up">idea of a central Metric Store</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>In case you were curious about the image at the beginning, it is a concept image of the Curiosity Rover waking up on Mars.</p></div></div>]]></content:encoded></item><item><title><![CDATA[The metrics layer has growing up to do]]></title><description><![CDATA[Recently there has been a lot of excitement around the idea of a stand-alone metrics layer in the modern data stack.]]></description><link>https://amit.thoughtspot.com/p/the-metrics-layer-has-growing-up</link><guid isPermaLink="false">https://amit.thoughtspot.com/p/the-metrics-layer-has-growing-up</guid><dc:creator><![CDATA[Amit Prakash]]></dc:creator><pubDate>Thu, 13 Jan 2022 17:41:44 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!RUFH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff891a593-7903-46cc-8a26-57ea8ecfa58c_1150x838.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g7pP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6091d003-9249-4a1b-9a5b-63f15b96bde3_298x169.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g7pP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6091d003-9249-4a1b-9a5b-63f15b96bde3_298x169.png 424w, https://substackcdn.com/image/fetch/$s_!g7pP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6091d003-9249-4a1b-9a5b-63f15b96bde3_298x169.png 848w, https://substackcdn.com/image/fetch/$s_!g7pP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6091d003-9249-4a1b-9a5b-63f15b96bde3_298x169.png 1272w, https://substackcdn.com/image/fetch/$s_!g7pP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6091d003-9249-4a1b-9a5b-63f15b96bde3_298x169.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!g7pP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6091d003-9249-4a1b-9a5b-63f15b96bde3_298x169.png" width="568" height="322.12080536912754" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/6091d003-9249-4a1b-9a5b-63f15b96bde3_298x169.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:169,&quot;width&quot;:298,&quot;resizeWidth&quot;:568,&quot;bytes&quot;:4993,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g7pP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6091d003-9249-4a1b-9a5b-63f15b96bde3_298x169.png 424w, https://substackcdn.com/image/fetch/$s_!g7pP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6091d003-9249-4a1b-9a5b-63f15b96bde3_298x169.png 848w, https://substackcdn.com/image/fetch/$s_!g7pP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6091d003-9249-4a1b-9a5b-63f15b96bde3_298x169.png 1272w, https://substackcdn.com/image/fetch/$s_!g7pP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6091d003-9249-4a1b-9a5b-63f15b96bde3_298x169.png 1456w" sizes="100vw" 
fetchpriority="high"></picture><div></div></div></a></figure></div><p></p><p>Recently there has been a lot of excitement around the idea of a stand-alone metrics layer in the modern data stack.</p><p>Traditionally, metrics have been defined in the BI or analytics layer, where various dashboards are used to look at business metrics like Revenue, Sales Pipeline, number of Claims, or User Activity. Given that most organizations end up with multiple BI/Analytics tools, the idea has a lot of merit. Instead of pulling data into Excel sheets and expecting everyone to calculate metrics independently, why not define the metrics in one central place for all to refer to? In software engineering terms, this is a classic candidate for refactoring: the same logic is defined repeatedly, and consolidating it follows the principle of &#8220;Don&#8217;t Repeat Yourself&#8221; (<a href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself">DRY</a>).</p><p>However, refactorings are not always easy. If you don&#8217;t know the full scope of how your code is being used (or going to be used), you can end up with a lot of empty promises and entangled code. When you are refactoring to eliminate duplication, the trick is to make your abstractions clean enough that the common parts can be expressed in one place, while the parts that vary from place to place can be supplied as additional details as needed. For design pattern enthusiasts, this is probably closest to the <a href="http://www.cs.unc.edu/~stotts/GOF/hires/pat5ifso.htm">Strategy Pattern</a>. The same can be said for the metrics layer.</p><p>Having deployed ThoughtSpot in hundreds of enterprises and startups, we have learned a lot about what different businesses need from their analytics solutions in different use cases. 
Whether it is banks reducing fraud, retailers optimizing inventory, manufacturers improving the supply chain, hospitals enhancing outcomes, or cancer researchers understanding their data, the metrics people care about in the real world are quite varied and require careful design.&nbsp;</p><h3>The six metrics classes&nbsp;</h3><ol><li><p><strong>Simple aggregations</strong>: These are things like Sum(Revenue), Average(Price), Count_Distinct(Users). They are really easy to define, and an isolated metric definition layer will do a good job of defining these.&nbsp;</p></li><li><p><strong>Aggregation with scalar functions</strong>: Similar to simple aggregations above, but with additional mathematical operators. For example, Profit may be defined as <strong>Sum(Revenue) - Sum(Cost) - Sum(Commissions)</strong>. Alternatively, you may need some transformations at the row level, such as Sum(Revenue_in_local_currency * USD_Conversion_rate).</p></li><li><p><strong>Metrics that require joins</strong>: The simplest example is a version of the Revenue calculation where conversion rates change daily and are stored in a separate dimension table. Now the Revenue definition becomes <strong>Sum(Revenue_in_local_currency * conversion_rate(local_currency, transaction_date))</strong>. Here <strong>conversion_rate(local_currency, transaction_date) </strong>is a column from another table that needs to be brought in by joining on local_currency and transaction_date from the fact table. 
You could bypass the join requirement by defining a denormalized view, but views bring in other issues, such as requiring every join in the view to be executed.</p></li><li><p><strong>Metrics with window functions: </strong>Moving Averages, Cumulative Sums, and any other aggregation over a time series or sequence of events that is computed on a window around the current time fall in this group.</p></li><li><p><strong>Metrics with multiple aggregation levels</strong>: This is a special class of metrics that often represent ratios. For example, if you want to define the Market Share of a product in its category, you first have to sum up Revenue grouped at the Product level, then sum up Revenue at the Product Category level, combine the results, and then compute the ratio. In addition, if you want to observe Market Share across years, then you need to include Year(transaction_date) in the grouping columns for both the numerator and the denominator. So the grouping becomes dynamic. These metrics are handled differently in different BI products. For example, this would look like a <a href="https://docs.thoughtspot.com/software/latest/formulas-aggregation-group">Group Aggregate Formula</a> in ThoughtSpot; in Tableau, they are called <a href="https://help.tableau.com/current/pro/desktop/en-us/calculations_calculatedfields_lod_overview.htm">Level of Detail (LOD) Expressions</a>.</p></li><li><p><strong>Multi-fact metrics</strong>: Sometimes, your metrics span multiple fact tables that may or may not have a direct relationship with each other. 
For example:</p><ol><li><p>Sales Fact and Bulk Purchase Fact for a retailer to calculate Profit Margin.</p></li><li><p>Services Provided and Insurance Coverage for a hospital to calculate the profitability of different services.</p></li><li><p>Bank Statements, Credit Card transactions, and Credit History for a bank to figure out default risk.</p></li></ol><p>Back in 2015, when we started looking at these kinds of metrics, we found to our surprise that they were not handled correctly in any of the BI tools we looked at (and I believe that is still the case). Most BI tools default to simply joining the tables and aggregating the rows of the joined table, which in the case of a many-to-many join causes the same numbers to be double-counted and gives incorrect results. Getting these metrics right requires aggregating each fact table to a granularity at which joining the results makes sense, and then re-aggregating after the join. Legacy BI tools have two major issues here. They either cannot get to the level of granularity of data needed or require custom SQL, which is hard to maintain and limits the level of interactivity that can be built. 
Nevertheless, multi-fact metrics cover an important segment of metrics that a lot of businesses care about.</p></li></ol><h3>Where to define metrics</h3><p>As you can see, defining metrics is not always just a matter of defining a mathematical formula or stand-alone SQL fragment.</p><p>If you look at a typical analytical tool, you can view it as a three-part stack: logical modeling (Semantic Layer), query generation, and interactive data visualization and exploration.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RUFH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff891a593-7903-46cc-8a26-57ea8ecfa58c_1150x838.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RUFH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff891a593-7903-46cc-8a26-57ea8ecfa58c_1150x838.png 424w, https://substackcdn.com/image/fetch/$s_!RUFH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff891a593-7903-46cc-8a26-57ea8ecfa58c_1150x838.png 848w, https://substackcdn.com/image/fetch/$s_!RUFH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff891a593-7903-46cc-8a26-57ea8ecfa58c_1150x838.png 1272w, https://substackcdn.com/image/fetch/$s_!RUFH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff891a593-7903-46cc-8a26-57ea8ecfa58c_1150x838.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!RUFH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff891a593-7903-46cc-8a26-57ea8ecfa58c_1150x838.png" width="1100" height="802" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f891a593-7903-46cc-8a26-57ea8ecfa58c_1150x838.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:802,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:126812,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RUFH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff891a593-7903-46cc-8a26-57ea8ecfa58c_1150x838.png 424w, https://substackcdn.com/image/fetch/$s_!RUFH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff891a593-7903-46cc-8a26-57ea8ecfa58c_1150x838.png 848w, https://substackcdn.com/image/fetch/$s_!RUFH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff891a593-7903-46cc-8a26-57ea8ecfa58c_1150x838.png 1272w, https://substackcdn.com/image/fetch/$s_!RUFH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff891a593-7903-46cc-8a26-57ea8ecfa58c_1150x838.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>The logical modeling layer is where you start assigning business context to raw data that exists in tables and columns. The purpose of a logical data model is to let non-technical business users express questions in business terms, read charts and tables more easily, and interact with data visualizations.</p><p>For example, you may start with &#8220;Top 10 products by revenue&#8221; and then drill down into the revenue generated by the top products by country. 
Typically, the kinds of things that you specify in a logical data model are:</p><ol><li><p><strong>Business names</strong> for columns (rev_txn_usd -&gt; Revenue)</p></li><li><p>Whether a column is a Metric (Measure) or Dimension (Attribute). For example, Revenue is a Metric, but Age or Customer Name is a Dimension.</p></li><li><p><strong>Joins</strong>: Exactly how to join tables (which pairs, on what join conditions, and whether it&#8217;s an inner join, outer join, or some other kind of join) and whether each join represents a many-to-one, one-to-one, or one-to-many relationship.</p></li></ol><p></p><p>The query generation layer is where user intent is transformed into the appropriate SQL or equivalent to generate the data visualization or answer the user&#8217;s questions. If you think of BI as just a way of slicing and dicing metrics in a denormalized table, then it can be a pretty simple thing to do. But in reality, this component needs to cover a lot of ground and do a lot of heavy lifting. More on this later.</p><p>Finally, the interactive data visualization and exploration layer is where the business user will ultimately spend most of their time, whether they&#8217;re looking at a dashboard, using drill-down or other exploration interaction models to ask deeper questions, or asking a completely new question through a search-like interface.</p><h3>Making space for a common metrics layer</h3><p>If we are going to refactor the stack to make space for a common metrics layer, there are a few choices to make about what part of this stack goes into the refactored component. We don&#8217;t want the top layer in there; otherwise it would be just another BI tool competing with the rest. 
That leaves us with three choices:</p><ol><li><p>Encapsulate the metric definition</p></li><li><p>Encapsulate the entire semantic layer</p></li><li><p>Encapsulate both the semantic and query generation layers</p></li></ol><p>There have been many proposals around all three versions of a metrics layer, each with its own unique tradeoffs. Moving down the list from one to three, complexity increases in both implementation and deployment, but so does the potential for a more powerful combined stack.&nbsp;</p><h3><strong>Possibility #1: Encapsulate the metric definition</strong></h3><p>This works for simple aggregations and aggregations with scalar functions (classes 1 and 2 above), but cannot handle anything else. In my experience, this is where most implementations of the metrics layer are (I have not looked at every implementation, and I&#8217;d love to be surprised).</p><h3><strong>Possibility #2: Encapsulate the entire semantic layer</strong></h3><p>This gets a lot more interesting. In this case, the metrics layer would be aware not just of metrics, but also of tables, joins, and dimensions. This allows you to have both complete data models and the ability to define expressions across multiple tables. You can also define metrics that traverse multiple grains of aggregation. This would be a huge step forward in defining metadata in a common layer.&nbsp;</p><p>However, this requires a standardized modeling language that different consumers can understand and that can encapsulate definitions of joins, dimensions, and all the metric classes above. It also leaves the query generation layer to be implemented independently in each of the consumers.</p><h3><strong>Possibility #3: Encapsulate both the semantic and query generation layers</strong></h3><p>In my opinion, this is the most powerful iteration of a metrics layer. Most BI tools today differ heavily in what type of metrics they can or cannot support. 
There are also subtle differences in how they generate queries for different types of metrics. In an ideal scenario, the metrics layer will encapsulate query generation as well. This, however, creates a new requirement: defining a new standard interface between the interactive visualization layer and the query generation layer.</p><p>It&#8217;s easy to think the query layer should be SQL, but I don&#8217;t think it&#8217;s up to the challenge. SQL is designed to be precise about how things are joined, in what order, and where the aggregation happens. What we need is a language that expresses a business user&#8217;s intent and combines it with predefined data models to generate queries. The same data model could then yield very different queries, with different join structures, different subsets of tables, and different ways of aggregating measures.&nbsp;</p><h3>The future of the metrics layer done right</h3><p>I think encapsulating both the semantic and query generation layers is the real deal, and finally possible. That&#8217;s what gets me so excited about the future.&nbsp;</p><p>We like what dbt is doing here. Starting in dbt v1.0, you can define some types of metrics in your models alongside the rest of your existing transformations. Later this year, we may have an integration where ThoughtSpot can read metric definitions from dbt and integrate them into our modeling layer, which will be a win for our joint customers. But this integration is not what I am describing in this post. A rich, independent metrics layer that can encapsulate both logical modeling and query generation, with an open standard for the consumption layer, is yet to be designed and built.&nbsp;</p><p>If you are excited about this too, I&#8217;d love to hear from you! 
In fact, if you want to collaborate on an open language for defining a semantic layer and a query language that can capture query intent while decoupling it from the information already defined in the semantic layer, I would love to buy you coffee. Or at the very least have a stimulating conversation!</p><p>If we as an industry are going to refactor this stack, let&#8217;s do it right. We don&#8217;t get too many shots at asking everyone to change how the stack works.</p>]]></content:encoded></item><item><title><![CDATA[Coming soon]]></title><description><![CDATA[This is Amit&#8217;s Newsletter, a newsletter about A builder's occasional musings on data and tech industry.]]></description><link>https://amit.thoughtspot.com/p/coming-soon</link><guid isPermaLink="false">https://amit.thoughtspot.com/p/coming-soon</guid><dc:creator><![CDATA[Amit Prakash]]></dc:creator><pubDate>Thu, 13 Jan 2022 17:29:48 GMT</pubDate><content:encoded><![CDATA[<p><strong>This is Amit&#8217;s Newsletter</strong>, a newsletter about A builder's occasional musings on data and tech industry.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://amit.thoughtspot.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://amit.thoughtspot.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>