{"id":23566,"date":"2024-10-04T15:56:36","date_gmt":"2024-10-04T08:56:36","guid":{"rendered":"https:\/\/digi-texx.com\/?post_type=case-studies&#038;p=23566"},"modified":"2024-10-07T13:08:00","modified_gmt":"2024-10-07T06:08:00","slug":"data-annotation-social-media-data-to-predict-the-pandemic","status":"publish","type":"case-studies","link":"https:\/\/digi-texx.com\/ja\/case-studies\/data-annotation-social-media-data-to-predict-the-pandemic\/","title":{"rendered":"Data Annotation and Labeling Social Media Data To Predict The Pandemic"},"content":{"rendered":"<div class=\"gb-container gb-container-049d4be1\"><div class=\"gb-inside-container\">\n\n<h2 class=\"gb-headline gb-headline-9ac0d6d3 gb-headline-text\"><span class=\"ez-toc-section\" id=\"BUSINESS_CHALLENGES\"><\/span>BUSINESS CHALLENGES<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h3 class=\"gb-headline gb-headline-2e78daf4 gb-headline-text\"><span class=\"ez-toc-section\" id=\"Our_Client\"><\/span><strong><strong><strong><span style=\"color: var(--accent);\" class=\"stk-highlight\"><strong>Our Client<\/strong><\/span><\/strong><\/strong><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>DIGI-TEXX\u2019s client is a professional from one of the top research universities in the heart of Tokyo, Japan. 
Specializing in environmental health and spatial information science, the client uses machine learning and NLP to research the impact of environmental changes on humans.<\/p>\n\n\n\n<p>The client has researched how machine learning can be applied to social media data on disease-related topics in order to predict pandemic waves.<\/p>\n\n\n<style>.kb-image23566_f519d9-68 .kb-image-has-overlay:after{opacity:0.3;}<\/style>\n<div class=\"wp-block-kadence-image kb-image23566_f519d9-68\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"740\" height=\"416\" src=\"https:\/\/digi-texx.com\/wp-content\/uploads\/2024\/10\/Data-Annotation-and-Labeling-Social-Media-Data-To-Predict-The-Pandemic_Thumbnail.jpg\" alt=\"Data Annotation and Labeling Social Media Data To Predict The Pandemic_Thumbnail\" class=\"kb-img wp-image-23588\" title=\"\" srcset=\"https:\/\/digi-texx.com\/wp-content\/uploads\/2024\/10\/Data-Annotation-and-Labeling-Social-Media-Data-To-Predict-The-Pandemic_Thumbnail.jpg 740w, https:\/\/digi-texx.com\/wp-content\/uploads\/2024\/10\/Data-Annotation-and-Labeling-Social-Media-Data-To-Predict-The-Pandemic_Thumbnail-300x169.jpg 300w\" sizes=\"auto, (max-width: 740px) 100vw, 740px\" \/><\/figure><\/div>\n\n\n\n<h3 class=\"gb-headline gb-headline-fe55f590 gb-headline-text\"><span class=\"ez-toc-section\" id=\"Project_Challenges\"><\/span><strong><strong><strong><span style=\"color: var(--accent);\" class=\"stk-highlight\"><strong>Project Challenges<\/strong><\/span><\/strong><\/strong><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p><strong><em><strong><em><span style=\"color: var(--accent);\" class=\"stk-highlight\">Insightful Data Lies in Daily Social Posts<\/span><\/em><\/strong><\/em><\/strong><\/p>\n\n\n\n<p>With COVID-19 threatening global health, social media data has drawn the attention of researchers. 
<a href=\"https:\/\/twitter.com\/\" target=\"_blank\" rel=\"noopener\"><strong>X<\/strong><\/a> (Twitter), in particular, can be used to explore multiple facets of forecasting potential disease spread.<\/p>\n\n\n<style>.kb-image23566_e1a2b6-de .kb-image-has-overlay:after{opacity:0.3;}<\/style>\n<div class=\"wp-block-kadence-image kb-image23566_e1a2b6-de\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"740\" height=\"416\" src=\"https:\/\/digi-texx.com\/wp-content\/uploads\/2024\/10\/Data-Annotation-and-Labeling-Social-Media-Data-To-Predict-The-Pandemic-2.jpg\" alt=\"Data Annotation and Labeling Social Media Data To Predict The Pandemic 2\" class=\"kb-img wp-image-23575\" title=\"\" srcset=\"https:\/\/digi-texx.com\/wp-content\/uploads\/2024\/10\/Data-Annotation-and-Labeling-Social-Media-Data-To-Predict-The-Pandemic-2.jpg 740w, https:\/\/digi-texx.com\/wp-content\/uploads\/2024\/10\/Data-Annotation-and-Labeling-Social-Media-Data-To-Predict-The-Pandemic-2-300x169.jpg 300w\" sizes=\"auto, (max-width: 740px) 100vw, 740px\" \/><\/figure><\/div>\n\n\n\n<p>According to the <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC7906737\/\" target=\"_blank\" rel=\"noopener\">National Library of Medicine<\/a>, many studies that collected social media search indexes for COVID-19 symptoms have shown that new suspected cases can be forecast 6\u20139 days, or even 1\u20132 weeks, earlier than they appear in official records.<\/p>\n\n\n\n<p>Another study, published in <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC8085269\/pdf\/fpubh-09-656635.pdf\" target=\"_blank\" rel=\"noopener\">Frontiers in Public Health<\/a> in 2021, examined digital data streams as early signals of COVID-19 outbreaks in Canada and the US. 
They found that symptom-related posts on X (Twitter) showed the best prediction performance, predicting 100% of first waves about 2\u20136 days earlier than other data streams.<\/p>\n\n\n\n<p>Despite the potential advantages of social media for research, our client faced several hurdles. The large volume of data that had to be annotated accurately, coupled with tight deadlines, presented a significant challenge.<\/p>\n\n\n\n<p>In addition, posts on the target platform, <a href=\"https:\/\/twitter.com\/\" target=\"_blank\" rel=\"noopener\"><strong>X<\/strong><\/a> (Twitter), are typically short and make frequent use of abbreviations, hashtags, and similar shorthand, which makes contextual information difficult to interpret.<\/p>\n\n\n\n<h3 class=\"gb-headline gb-headline-25fbbbd3 gb-headline-text\"><span class=\"ez-toc-section\" id=\"Project_Scope\"><\/span><strong><strong><strong><span style=\"color: var(--accent);\" class=\"stk-highlight\">Project Scope<\/span><\/strong><\/strong><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Classify, label, and categorize users\u2019 tweets on <strong><a href=\"https:\/\/twitter.com\/\" target=\"_blank\" rel=\"noopener\">X<\/a><\/strong> (Twitter) based on predefined criteria: keywords, phrases, and sentiments related to flu-like symptoms.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong><span style=\"color: var(--accent);\" class=\"stk-highlight\">Data Volume<\/span>: <\/strong>The client&#8217;s sizable dataset, including 200,000 tweets, must be annotated within 2 months.<\/li>\n\n\n\n<li><strong><span style=\"color: var(--accent);\" class=\"stk-highlight\">Language<\/span>:<\/strong> Proficiency in English and Chinese is required.<\/li>\n\n\n\n<li><strong><span style=\"color: var(--accent);\" class=\"stk-highlight\">Ethical Considerations<\/span>:<\/strong> Adherence to privacy regulations and ethical guidelines.<\/li>\n\n\n\n<li><strong><span style=\"color: 
var(--accent);\" class=\"stk-highlight\">Service time<\/span>:<\/strong> 24\/7<\/li>\n<\/ul>\n\n<\/div><\/div>\n\n<div class=\"gb-container gb-container-540b5898\"><div class=\"gb-inside-container\">\n\n<h2 class=\"gb-headline gb-headline-c2b72c8c gb-headline-text\"><span class=\"ez-toc-section\" id=\"SOLUTION\"><\/span><strong><strong>SOLUTION<\/strong><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h3 class=\"gb-headline gb-headline-91203dbc gb-headline-text\"><span class=\"ez-toc-section\" id=\"Text_Annotation_With_Natural_Language_Processing\"><\/span><strong><strong><strong><span style=\"color: var(--accent);\" class=\"stk-highlight\"><span style=\"color: var(--accent);\" class=\"stk-highlight\"><strong>Text Annotation With Natural Language Processing<\/strong><\/span><\/span><\/strong><\/strong><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>DIGI-TEXX provided a hybrid <a href=\"https:\/\/digi-texx.com\/data-management\/data-annotation-services\/\"><strong>text annotation service<\/strong><\/a> with human-in-the-loop, which combined the power of machine learning, natural language processing (NLP), and a team of highly skilled data annotators with advanced English and Chinese proficiency. This approach optimized output for the project, ensuring efficient annotation of the large dataset.<\/p>\n\n\n\n<p><strong><span style=\"color: var(--accent);\" class=\"stk-highlight\">Text annotation process<\/span><\/strong>:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span style=\"color: var(--accent);\" class=\"stk-highlight\"><strong>Data Pre-processing:<\/strong>&nbsp;<\/span> Classify relevant categories and remove irrelevant data, duplicates, and noisy content.<\/li>\n\n\n\n<li><strong><span style=\"color: var(--accent);\" class=\"stk-highlight\">Keyword &amp; Sentiment Analysis:<\/span><\/strong> Employ NLP techniques to analyze and identify relevant keywords and phrases related to flu-like symptoms. 
Utilize machine learning models to determine the sentiment associated with the extracted keywords and phrases.<\/li>\n\n\n\n<li><strong><span style=\"color: var(--accent);\" class=\"stk-highlight\">Data Labeling:<\/span> <\/strong>Label a subset of the data with one of two categories, \u201chigh probability of infection\u201d or \u201clow probability or insufficient information\u201d, so that the labeled data precisely meets the client\u2019s specific needs.<\/li>\n\n\n\n<li><strong><span style=\"color: var(--accent);\" class=\"stk-highlight\">Data Quality Assurance:<\/span><\/strong> Our annotators conducted frequent quality checks to monitor the accuracy and consistency of the annotations. In addition, a feedback loop was established to evaluate and enhance performance continuously.<\/li>\n\n\n\n<li><strong><span style=\"color: var(--accent);\" class=\"stk-highlight\">Export and provide the data:<\/span> <\/strong>Deliver the annotated dataset in a format compatible with the client\u2019s systems for further analysis and research.<\/li>\n<\/ol>\n\n\n<style>.kb-image23566_60d73f-52 .kb-image-has-overlay:after{opacity:0.3;}<\/style>\n<div class=\"wp-block-kadence-image kb-image23566_60d73f-52\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"740\" height=\"416\" src=\"https:\/\/digi-texx.com\/wp-content\/uploads\/2024\/10\/Text-Annotation-With-Natural-Language-Processing-Process.jpg\" alt=\"Text Annotation With Natural Language Processing Process\" class=\"kb-img wp-image-23567\" title=\"\" srcset=\"https:\/\/digi-texx.com\/wp-content\/uploads\/2024\/10\/Text-Annotation-With-Natural-Language-Processing-Process.jpg 740w, https:\/\/digi-texx.com\/wp-content\/uploads\/2024\/10\/Text-Annotation-With-Natural-Language-Processing-Process-300x169.jpg 300w\" sizes=\"auto, (max-width: 740px) 100vw, 740px\" \/><\/figure><\/div>\n\n<\/div><\/div>\n\n<div class=\"gb-container gb-container-3c64cdaf\"><div class=\"gb-inside-container\">\n<div 
class=\"gb-grid-wrapper gb-grid-wrapper-84dc8722\">\n<div class=\"gb-grid-column gb-grid-column-31652cd0\"><div class=\"gb-container gb-container-31652cd0\"><div class=\"gb-inside-container\">\n\n<h2 class=\"gb-headline gb-headline-6c0964bb gb-headline-text\"><span class=\"ez-toc-section\" id=\"BUSINESS_OUTCOME\"><\/span>BUSINESS OUTCOME<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Accurately annotated <strong><span style=\"color: var(--accent);\" class=\"stk-highlight\">200,000 Chinese posts<\/span><\/strong> from the <a href=\"https:\/\/twitter.com\/\" target=\"_blank\" rel=\"noopener\"><strong>X<\/strong><\/a> platform.<\/li>\n\n\n\n<li>Completed the project within <span style=\"color: var(--accent);\" class=\"stk-highlight\">2 months<\/span>.<\/li>\n\n\n\n<li>Accuracy rate: <span style=\"color: var(--accent);\" class=\"stk-highlight\"><strong>100%<\/strong><\/span><\/li>\n\n\n\n<li>We provided high-quality annotated data to enhance the accuracy and efficiency of the client&#8217;s AI algorithm.<\/li>\n\n\n\n<li>The annotated data can be used to develop more accurate and timely early warning systems for future pandemics, allowing proactive measures to be taken.<\/li>\n<\/ul>\n\n<\/div><\/div><\/div>\n\n<div class=\"gb-grid-column gb-grid-column-0123e88f\"><div class=\"gb-container gb-container-0123e88f\"><div class=\"gb-inside-container\">\n\n<figure class=\"gb-block-image gb-block-image-d804f78c\"><img loading=\"lazy\" decoding=\"async\" width=\"740\" height=\"416\" class=\"gb-image gb-image-d804f78c\" src=\"https:\/\/digi-texx.com\/wp-content\/uploads\/2024\/10\/Data-Annotation-and-Labeling-Social-Media-Data-To-Predict-The-Pandemic-3.jpg\" alt=\"Data Annotation and Labeling Social Media Data To Predict The Pandemic 3\" title=\"Data Annotation and Labeling Social Media Data To Predict The Pandemic 3\" 
srcset=\"https:\/\/digi-texx.com\/wp-content\/uploads\/2024\/10\/Data-Annotation-and-Labeling-Social-Media-Data-To-Predict-The-Pandemic-3.jpg 740w, https:\/\/digi-texx.com\/wp-content\/uploads\/2024\/10\/Data-Annotation-and-Labeling-Social-Media-Data-To-Predict-The-Pandemic-3-300x169.jpg 300w\" sizes=\"auto, (max-width: 740px) 100vw, 740px\" \/><\/figure>\n\n<\/div><\/div><\/div>\n<\/div>\n<\/div><\/div>","protected":false},"excerpt":{"rendered":"<p>DIGI-TEXX provided a robust text annotation service with human-in-the-loop, which combined the power of machine learning, natural language processing (NLP)&#8230;<\/p>\n","protected":false},"featured_media":23583,"template":"","industries":[75],"class_list":["post-23566","case-studies","type-case-studies","status-publish","has-post-thumbnail","hentry","industries-education"],"acf":[],"_links":{"self":[{"href":"https:\/\/digi-texx.com\/ja\/wp-json\/wp\/v2\/case-studies\/23566","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/digi-texx.com\/ja\/wp-json\/wp\/v2\/case-studies"}],"about":[{"href":"https:\/\/digi-texx.com\/ja\/wp-json\/wp\/v2\/types\/case-studies"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/digi-texx.com\/ja\/wp-json\/wp\/v2\/media\/23583"}],"wp:attachment":[{"href":"https:\/\/digi-texx.com\/ja\/wp-json\/wp\/v2\/media?parent=23566"}],"wp:term":[{"taxonomy":"industries","embeddable":true,"href":"https:\/\/digi-texx.com\/ja\/wp-json\/wp\/v2\/industries?post=23566"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}