export const evolvingMultimodalData = `
<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="content-type">
  </head>

  <body class="c11 doc-content">
    <h1 class="c15" id="h.fsui2g9hg1on">
      <span class="c5">Introduction</span>
    </h1>
    <p class="c4">
      <span>The data landscape has changed dramatically in the last two decades. Twenty years ago, a data scientist may have interacted only with standard structured databases such as PostgreSQL. But today, as companies race to leverage the growing capabilities of AI models, data scientists and engineers juggle </span>
      <span class="c18">multiple data types at once: text, images, video</span>
      <span>, and more. Like jumping from two dimensions to three, the shift to multimodal data is simultaneously exciting and challenging. It&rsquo;s key to understand not only the changing landscape, but also how to get maximum value from multimodal data and choose the right tools.</span>
    </p>
    <h1 class="c15" id="h.m51k36i2skml">
      <span class="c16">How does your data evolve?</span>
    </h1>
    <p class="c4 c6">
      <span class="c10"></span>
    </p>
    <p class="c4">
      <span>Multimodal data is inherently complex. But as AI use cases explode, multimodal data practices are rapidly evolving in step. A few elements typically drive this evolution.</span>
    </p>
    <h2 class="c9" id="h.u16s3v3kd4bo">
      <span class="c3">Annotations</span>
    </h2>
    <p class="c4 c6">
      <span class="c0"></span>
    </p>
    <p class="c4">
      <span>Annotating multimodal data is key to accurately training models during supervised learning. Annotations already enhance data richness and complexity, </span>
      <span>but annotation processes can also change over time. In medicine, for instance, breakthrough discoveries could require teams to refine their annotation process, and new medical imaging techniques might demand more granular annotations to keep models up to date. </span>
    </p>
    <h2 class="c9" id="h.n42n1qosp7yw">
      <span class="c3">Embeddings</span>
    </h2>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span class="c1">Embeddings encode different types of data as vectors in a shared vector space, making it easy to identify and represent relationships between the underlying data points. For example, embeddings are a core component of user-facing recommendation engines because they enable similarity searches. </span>
    </p>
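    <p class="c4">
      <span class="c1">As a minimal sketch of the similarity search that embeddings enable, the Python snippet below ranks a few items by cosine similarity to a query vector. The item names and three-dimensional vectors are made up for illustration; real embeddings come from a trained model and typically have hundreds of dimensions.</span>
    </p>
    <pre><code>import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings; real ones come from a trained model.
catalog = {
    "red sofa": [0.9, 0.1, 0.0],
    "crimson couch": [0.8, 0.2, 0.1],
    "oak table": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

# Rank catalog items from most to least similar to the query.
ranked = sorted(
    catalog,
    key=lambda name: cosine_similarity(query, catalog[name]),
    reverse=True,
)
print(ranked)
</code></pre>
    <p class="c4">
      <span class="c1">A production system would usually replace the exhaustive comparison with an approximate nearest-neighbor index, but the ranking idea is the same.</span>
    </p>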
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span class="c1">Embeddings can change over time for a couple of reasons. Changes in the real world can lead to changes in the underlying embedding spaces. For example, social media sentiment data may shift during significant world events. </span>
    </p>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span class="c1">Changes can also come from within. Variations in a company&rsquo;s data collection cadence and methodology, updates to an AI model, or the introduction of new modalities can all cause an evolution in the underlying embeddings. </span>
    </p>
    <h2 class="c9" id="h.r4gr74l2jr7">
      <span class="c3">New classifications</span>
    </h2>
    <p class="c4">
      <span class="c1">As a company&rsquo;s AI models mature, those models can be used to extract (newer) insights from incoming data. Imagine a company that&rsquo;s trained a model to detect faces. With that newly-trained model, the company can create an entirely new classification set of facial emotions, which become its own data points. </span>
    </p>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <h2 class="c9" id="h.jz7sj4ujuvge">
      <span class="c3">Derived data</span>
    </h2>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span class="c1">Companies can combine information from multiple modalities into richer, more useful representations, known as derived data. For example, a company may want to perform a sentiment analysis after collecting product reviews that include text and uploaded user images. By generating embeddings for the text and for the images and then concatenating the two sets, the company creates a derived dataset that combines information from both modalities, which is useful for training models that need to understand the relationship between the two.</span>
    </p>
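    <p class="c4">
      <span class="c1">The concatenation step can be sketched in a few lines of Python. The numbers and vector sizes below are hypothetical placeholders; a real pipeline would obtain them from a text model and an image model.</span>
    </p>
    <pre><code># Hypothetical embeddings for one product review (illustrative values).
text_embedding = [0.12, 0.80, 0.33]
image_embedding = [0.55, 0.10]

# Concatenation places the two vectors side by side in one derived
# vector whose length is the sum of the input lengths.
derived_embedding = text_embedding + image_embedding

print(derived_embedding)
print(len(derived_embedding))
</code></pre>
    <p class="c4">
      <span class="c1">Concatenation is only one simple way to fuse modalities. Its main side effect is that the derived vector&rsquo;s length depends on every input, so adding or changing a modality changes the shape of the derived data.</span>
    </p>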
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span>Derived data can evolve for a number of reasons. In the example above, the company could start enabling users to upload videos in addition to text and images. Sometimes, to keep fast hardware fully utilized while training large models, companies must create copies of their data in the input formats those models require, which generates additional </span>
      <span>provenance</span>
      <span class="c1">&nbsp;and data governance information.</span>
    </p>
    <h2 class="c9" id="h.2vwkl9lzl1lg">
      <span class="c3">Provenance information</span>
    </h2>
    <p class="c4 c6">
      <span class="c10"></span>
    </p>
    <p class="c4">
      <span class="c1">As AI models play an increasingly active role in real-life applications, capturing provenance information&mdash;the metadata that describes data&rsquo;s origin and history&mdash;is becoming critical. </span>
    </p>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span class="c1">Say a healthcare provider relies on AI models with MRI image inputs for diagnoses. It&rsquo;s crucial that the provider can track the source of the data (machine, parameters used), intermediate processing steps (noise reduction, motion correction), manual annotations, and transformation history (fusion with other modalities, like CT or PET scans) so that it can quickly trace and address any mistakes from the model. </span>
    </p>
    <h2 class="c9" id="h.vsd0i6zh7kmo">
      <span class="c3">New use cases</span>
    </h2>
    <p class="c4">
      <span class="c1">One of the most satisfying aspects of working with AI models is getting them to succeed. Say an e-commerce company builds a recommendation engine to show its customers products with similar colors. Customers start clicking on and buying those related products. Success! Now the growth team wants to evolve the recommendation model to include products made of similar materials, so the engineering team must add further filters on the product metadata. </span>
    </p>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span class="c1">Because AI models can be applied so broadly, it&rsquo;s almost guaranteed that a given company&rsquo;s set of use cases will evolve over time. </span>
    </p>
    <h1 class="c15" id="h.xhz80jj1qpt8">
      <span class="c16">Why is it challenging to manage evolving multimodal data?</span>
    </h1>
    <p class="imgContainer">
      <span style="overflow: hidden; display: inline-block; margin: 0px; border: none; width: 100%; height: auto; box-sizing: border-box;">
        <img alt="" src="https://aperturedata-public.s3.us-west-2.amazonaws.com/website_images/multimodal_data_evolving/brain.png" style="width: 50%; height: auto;" title="">
      </span>
    </p>
    <h2 class="c9" id="h.eaa2615js5kh">
      <span class="c3">Schema challenges with relational databases</span>
    </h2>
    <p class="c14">
      <span class="c7">
        <a class="c2" href="/blog/purpose-built-database" target="_blank">Traditional relational databases</a>, such as PostgreSQL or MySQL, have served engineers and data scientists well for decades. But when it comes to multimodal data, these databases fall short for one glaring reason: rigid relational schemas do not play well with the complex relationships that exist in multimodal data.</span>
    </p>
    <p class="c14">
      <span class="c1">For example, imagine an AI model that helps doctors recommend treatments based on a mixture of doctor, patient, treatment, and CT scan data. While one could theoretically structure four PostgreSQL tables with interlinking foreign keys, every many-to-many mapping requires an additional reference table, and each new relationship means another schema change. The fluid, evolving nature of multimodal data is at odds with rigid schema enforcement, so engineers do themselves a favor by choosing systems that allow schemas to flex over time.</span>
    </p>
    <h2 class="c9" id="h.ofr52594rwd5">
      <span class="c3">Datasets</span>
    </h2>
    <p class="c4">
      <span class="c1">A key component of training AI models is the specific data used, but managing versions of the same dataset can be tedious and costly. For example, say a recommendation model is trained on a dataset of sofa images. The company begins working with new vendors and gets a fresh dataset of sofa images, so the engineering team trains v2 of the model with the new data but notices that the model&rsquo;s recommendation quality deteriorates. </span>
    </p>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span>In cases like this, it&rsquo;s important to be able to track the changes made to the model and the actual datasets used. Unfortunately, many data teams resort to manually storing copies of the data. This quickly becomes not only an organizational nightmare (which dataset trained which model?) but also a pricey one, since multiple versions of similar datasets increase storage costs. </span>
    </p>
    <h2 class="c9" id="h.qv5idz4qxrv8">
      <span class="c3">Scalability</span>
    </h2>
    <p class="c4 c6">
      <span class="c19"></span>
    </p>
    <p class="c4">
      <span>As a company&rsquo;s</span>
      <span class="c1"> metadata and data grow, engineers and data scientists are forced to reckon with scalability. They must consider not only raw storage capacity and cost (vertical scaling) but also how to distribute workloads across multiple nodes (horizontal scaling). Without proper planning, companies can quickly find themselves paying too much for multiple databases, sacrificing the performance of their AI workflows, or both. Balancing these concerns with ease of setup is a challenge for every team.</span>
    </p>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <h2 class="c9" id="h.cif4ym4ia8j2">
      <span class="c3">Easily connecting processing pipelines with data updates</span>
    </h2>
    <p class="c4">
      <span class="c1">As a business grows, so does its data ingestion. Think of how an e-commerce company&rsquo;s product catalog constantly evolves, or how a travel service aggregator collects new reviews from its customers. As existing data pipelines grow and change, it&rsquo;s crucial to seamlessly connect these datasets with existing data infrastructure, enrich the data with metadata, and fold them into AI model workflows to glean useful insights.</span>
    </p>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span class="c1">This is easier said than done: today, data and engineering teams find it challenging to update existing data schemas, process visual data, and easily label data to enable their workflows to keep pace with new ingestion.</span>
    </p>
    <h2 class="c9" id="h.gg1017qiuuxr">
      <span>Challenges with consistent views and transactions across data pieces</span>
    </h2>
    <p class="c14">
      <span>It&rsquo;s easy to underestimate the importance of standardizing the engineering and data teams&rsquo; view of multimodal data. If a company uses multiple disconnected databases, not only will these teams (and by extension, the whole company) struggle to build a single view of the data, but it will also be tricky to build consistent read/write transaction processes across these multiple databases.</span>
    </p>
    <p class="c14">
      <span>Unfortunately, this is the reality for too many companies today: because there are </span>
      <span class="c7">
        <a class="c2" href="/blog/purpose-built-database" target="_blank">few products tailor-made</a>
      </span>
      <span>&nbsp;for multimodal data management, teams often opt for several disjointed databases, setting themselves up for a Sisyphean struggle to maintain a consistent view of their multimodal data.</span>
    </p>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <h1 class="c15" id="h.dvw4waid8u5l">
      <span class="c5">How do we simplify these challenges with ApertureDB?</span>
    </h1>
    <h2 class="c9" id="h.7icz3iaov7zb">
      <span class="c3">Data storage and preprocessing</span>
    </h2>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span>Out of the gate, ApertureDB supports storage of multimodal data types such as documents, images, and videos. The query interface has built-in </span>
      <span class="c7">
        <a class="c2" href="https://docs.aperturedata.io/category/how-to-guides" target="_blank">preprocessing</a>
      </span>
      <span>&nbsp;support for image and video data, simplifying downstream processes that rely on these data and helping searches and analyses run faster. It also removes the need for users to create copies of the data to satisfy different downstream format requirements, and it often lowers network traffic, since most such operations downsample the data.</span>
    </p>
    <h2 class="c9 c22" id="h.fvpy7a3otm8">
      <span class="c3"></span>
    </h2>
    <h2 class="imgContainer">
      <span style="overflow: hidden; display: inline-block; margin: 0px; border: none; width: 100%; height: auto; box-sizing: border-box;">
        <img alt="" src="https://aperturedata-public.s3.us-west-2.amazonaws.com/website_images/multimodal_data_evolving/aperture_feature_diagram.png" style="width: 80%; height: auto;" title="">
      </span>
    </h2>
    <h2 class="c9" id="h.eek0fq5od22u">
      <span class="c3">Vector database</span>
    </h2>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span>ApertureDB comes with a </span>
      <span class="c7">
        <a class="c2" href="https://docs.aperturedata.io/Introduction/WhatIsAperture" target="_blank">vector database</a>, optimized for </span>
      <span>storing, indexing, and querying high-dimensional vector data. This enables several use cases, such as: </span>
    </p>
    <ol class="c17 lst-kix_lnpycr11fzm6-0 start" start="1">
      <li class="c4 c8 c12 li-bullet-0">
        <span class="c1">Powering recommendation engines with similarity searches</span>
      </li>
      <li class="c4 c8 c12 li-bullet-0">
        <span class="c1">Building accurate chatbots with RAG</span>
      </li>
      <li class="c4 c8 c12 li-bullet-0">
        <span class="c1">Enabling powerful search applications with semantic and multimodal searches</span>
      </li>
    </ol>
    <h2 class="c9" id="h.gugsb9swfb4n">
      <span class="c3">In-memory graph database: the connective tissue</span>
    </h2>
    <p class="c4 c6 c8">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span>Importantly, ApertureDB comes with an in-memory </span>
      <span class="c7">
        <a class="c2" href="https://docs.aperturedata.io/Introduction/WhatIsAperture" target="_blank">graph database</a>
      </span>
      <span class="c1">&nbsp;that stores application metadata as a knowledge graph. By leveraging the flexibility of a graph database, users can seamlessly connect metadata between any user-defined entities as well as their vector representations and original data.</span>
    </p>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span class="c1">For example, users can connect AI models to the data used to train them, task the model with classifying new data, and attach accuracy values to the new classifications. This enables searches such as &ldquo;Find images classified by Model X where accuracy is &gt; 0.9.&rdquo; This also allows users to combine their vector searches with advanced graph filtering before accessing the required data in a suitable format for downstream ML processing. </span>
    </p>
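    <p class="c4">
      <span class="c1">To make the idea of filtering on graph metadata concrete, here is a minimal sketch in plain Python. It does not use ApertureDB&rsquo;s query language; the edge list, property names, and file names are hypothetical and only illustrate how a model-to-image connection with an accuracy property supports the example search above.</span>
    </p>
    <pre><code># Hypothetical metadata graph: each edge links a model to an image it
# classified and carries the label and accuracy as edge properties.
classified_by = [
    {"model": "Model X", "image": "img_001.jpg", "label": "happy", "accuracy": 0.95},
    {"model": "Model X", "image": "img_002.jpg", "label": "sad", "accuracy": 0.71},
    {"model": "Model Y", "image": "img_003.jpg", "label": "happy", "accuracy": 0.98},
]

def find_images(edges, model, min_accuracy):
    """Return images classified by the given model above min_accuracy."""
    return [
        edge["image"]
        for edge in edges
        if edge["model"] == model and edge["accuracy"] &gt; min_accuracy
    ]

# "Find images classified by Model X where accuracy is above 0.9."
print(find_images(classified_by, "Model X", 0.9))
</code></pre>
    <p class="c4">
      <span class="c1">In a graph database the same filter is expressed as a query over entities and their connections rather than a loop over a list, so it can be combined with a vector search in a single request.</span>
    </p>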
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span>The graph database also makes it </span>
      <span class="c7">
        <a class="c2" href="https://docs.aperturedata.io/HowToGuides/start/Schema" target="_blank">easy to adjust schemas</a>
      </span>
      <span>&nbsp;on the fly as AI needs change, and ApertureDB does not require users to declare schemas up front. </span>
    </p>
    <h2 class="c9" id="h.mgowhbc98hcv">
      <span class="c3">Query engine: unifying interface for applications</span>
    </h2>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span>ApertureDB features a </span>
      <span class="c7">
        <a class="c2" href="https://docs.aperturedata.io/query_language/Overview/API%20Description" target="_blank">unified API</a>
      </span>
      <span>&nbsp;across all the aforementioned data types</span>
      <span class="c1">&nbsp;based on a native JSON-based query language, coordinated by an orchestrator. Not only does this API help standardize a team&rsquo;s view of its multimodal data, but it also helps ApertureDB users avoid needing to compose queries that deal with multiple systems. </span>
    </p>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <h2 class="c9" id="h.jwwwsx8ollwf">
      <span class="c3">Transaction support across various modalities</span>
    </h2>
    <p class="c4">
      <span>ApertureDB implements ACID transactions for queries that span the different data types, offering the relevant database guarantees at the level of these complex objects.</span>
      <span class="c1">&nbsp;</span>
    </p>
    <h2 class="c9 c21" id="h.j3wg1hpjfav1">
      <span class="c3">Integrations across ML pipelines</span>
    </h2>
    <p class="c4">
      <span>ApertureDB&rsquo;s Python SDK offers convenient ETL and ML processing wrappers over the JSON query language, and simplifies </span>
      <span class="c7">
        <a class="c2" href="https://docs.aperturedata.io/category/integrations" target="_blank">integrations </a>
      </span>
      <span class="c1">across the AI toolchain. This makes it easy for engineers to write standardized queries to serve multimodal data to their applications in the required format. </span>
    </p>
    <h2 class="c9" id="h.9dwludkjt7a7">
      <span class="c3">Schema dashboard</span>
    </h2>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4">
      <span class="c1">ApertureDB offers a dashboard UI that allows users to easily check what objects exist in a dataset, the object properties, and how different objects relate to each other. </span>
    </p>
    <p class="imgContainer">
      <span style="overflow: hidden; display: inline-block; margin: 0px; border: none; width: 100%; height: auto; box-sizing: border-box;">
        <img alt="" src="https://aperturedata-public.s3.us-west-2.amazonaws.com/website_images/multimodal_data_evolving/aperturedb_dashboard.png" style="width: 80%; height: auto;" title="">
      </span>
    </p>

    <p class="c4">
      <span>This dashboard </span>
      <span>makes it surprisingly simple for data science, engineering, and analytics teams to manage the complex relationships between multimodal data types.</span>
    </p>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <p class="c4 c6">
      <span class="c1"></span>
    </p>
    <h1 class="c15" id="h.2ktsyxyysqj">
      <span class="c5">Conclusion</span>
    </h1>
    <p class="c4">
      <span class="c1">AI workflows are exploding. Every industry, from e-commerce to logistics to medicine, is racing to uncover new uses for ever-evolving multimodal data. Engineers must keep up with the pace. This is why we built ApertureDB: to give engineers and data scientists a purpose-built tool for multimodal data management, search, and visualization that can replace the hodgepodge of DIY solutions on the market. </span>
    </p>
    <p class="c14">
      <span>If you&rsquo;re interested in learning more about how ApertureDB works, reach out to us at </span>
      <span class="c7">
        <a class="c2" href="mailto:team@aperturedata.io">team@aperturedata.io</a>
      </span>
      <span>. We have built an industry-leading database for multimodal AI to future-proof data pipelines as multimodal AI methods evolve. Stay informed about our journey by subscribing </span>
      <span class="c7">
        <a class="c2" href="https://docs.google.com/forms/d/e/1FAIpQLSdl05L10a-AUuf0qGV0jD3SU3u2JMH_4I6tn_aAxmjaGI2ppw/viewform" target="_blank">to our blog.</a>
      </span>
    </p>
    <p class="c14">
      <span class="c1"><i>I want to acknowledge the insights and valuable edits from Ian Yanusko as well as feedback from Ayla Khan (Recursion Pharmaceuticals).</i></span>
    </p>
    <p class="c14 c6">
      <span class="c1"></span>
    </p>
  </body>
</html>
`;
