Data Annotation and Labeling Social Media Data To Predict The Pandemic

BUSINESS CHALLENGES

Our Client

DIGI-TEXX’s client is a researcher at one of the top research universities in the heart of Tokyo, Japan. Specializing in environmental health and spatial information science, the client uses machine learning and NLP to study how environmental changes affect humans.

The client has researched how machine learning can be applied to social media data on disease-related topics in order to predict pandemic waves.

Project Challenges

Insightful Data Lies in The Daily Social Post

With COVID-19 threatening global health, social media data has drawn the attention of researchers. X (Twitter) in particular can be used to explore multiple facets of forecasting potential disease spread.

According to the National Library of Medicine, many studies that collected social media search indexes for COVID-19 symptoms have shown that new suspected cases can be forecast 6–9 days, or even up to 1–2 weeks, ahead of official records.

Another study, published in Frontiers in Public Health in 2021, examined digital data streams as early signals of COVID-19 outbreaks in Canada and the US. It found that symptom-related posts from X (Twitter) showed the best prediction performance, predicting 100% of first waves about 2–6 days earlier than other data streams.

Despite the potential advantages of social media for research, our client has met several hurdles. The high volume of data that must be annotated accurately, coupled with tight deadlines, presents a significant challenge.

In addition, posts on the target platform, X (Twitter), are typically short and make frequent use of abbreviations, hashtags, etc., which makes their context difficult to interpret.
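To illustrate the kind of noise described above, here is a minimal normalization sketch. It is not the project's actual pre-processing: the function name `normalize_tweet`, the regular expressions, and the `ABBREVIATIONS` map are all assumptions made for this example.

```python
import re

# Hypothetical abbreviation map; a real project would curate its own list.
ABBREVIATIONS = {"w/": "with", "b/c": "because", "temp": "temperature"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def normalize_tweet(text: str) -> str:
    """Strip URLs and mentions, unwrap hashtags, and expand known abbreviations."""
    text = URL_RE.sub("", text)          # links carry no symptom information
    text = MENTION_RE.sub("", text)      # drop @user handles
    text = HASHTAG_RE.sub(r"\1", text)   # "#flu" becomes "flu"
    tokens = [ABBREVIATIONS.get(tok.lower(), tok) for tok in text.split()]
    return " ".join(tokens)


print(normalize_tweet("@bob fever w/ cough #flu https://t.co/x"))
```

A cleaned string such as this is easier for both a model and a human annotator to read, although the context lost to brevity still has to be judged by a person.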

Project Scope

Classify, label, and categorize users’ tweets on X (Twitter) based on predefined criteria: keywords, phrases, and sentiments related to flu-like symptoms.

Data Volume: The client’s dataset of 200,000 tweets needs to be annotated within 2 months.
Language: English and Chinese language proficiency is required.
Ethical Considerations: Adherence to privacy regulations and ethical guidelines.
Service time: 24/7
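The volume and deadline above imply a minimum daily throughput. The short calculation below is only a back-of-the-envelope sketch: treating "2 months" as 60 calendar days and the per-annotator rate of 400 tweets per day are assumptions for illustration, not figures from the project.

```python
import math

TOTAL_TWEETS = 200_000   # from the project scope
PROJECT_DAYS = 60        # assumption: "2 months" taken as 60 calendar days


def required_daily_rate(total: int, days: int) -> int:
    """Minimum number of tweets to annotate per day to finish on time."""
    return math.ceil(total / days)


def annotators_needed(daily_rate: int, per_annotator_per_day: int) -> int:
    """Team size needed if each annotator sustains the given daily rate."""
    return math.ceil(daily_rate / per_annotator_per_day)


daily = required_daily_rate(TOTAL_TWEETS, PROJECT_DAYS)
print(daily)                          # tweets per day
print(annotators_needed(daily, 400))  # 400/day is a hypothetical rate
```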

SOLUTION

Text Annotation With Natural Language Processing

DIGI-TEXX provided a hybrid text annotation service with human-in-the-loop, which combined the power of machine learning, natural language processing (NLP), and a team of highly skilled data annotators with advanced English and Chinese proficiency. This approach optimized output for the project, ensuring efficient annotation of the large dataset.

Text annotation process:

Data Pre-processing: Sort the data into relevant categories and remove irrelevant data, duplicates, and noisy content.
Keyword & Sentiment Analysis: Employ NLP techniques to analyze and identify relevant keywords and phrases related to flu-like symptoms. Utilize machine learning models to determine the sentiment associated with the extracted keywords and phrases.
Data Labeling: Label a subset of the data with one of two categories, “high probability of infection” or “low probability or insufficient information”, so that the labeled data matches the client’s specific needs.
Data Quality Assurance: Our annotators conducted frequent quality assurance to monitor the accuracy and consistency of the project. In addition, a feedback loop was established to evaluate and enhance performance continuously.
Data Export and Delivery: Deliver the annotated dataset in a format compatible with the client’s systems, ready for further analysis and research.
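The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not DIGI-TEXX's actual pipeline: the keyword list, the two-keyword threshold, the record fields, and the JSON Lines export format are all hypothetical, and sentiment analysis is left out. The machine only suggests a label; a human annotator is expected to confirm or override it.

```python
import json

# Hypothetical keyword list; the case study does not publish its criteria.
SYMPTOM_KEYWORDS = ("fever", "cough", "sore throat", "chills", "发烧", "咳嗽")

# The two categories named in the labeling step.
HIGH = "high probability of infection"
LOW = "low probability or insufficient information"


def deduplicate(tweets):
    """Drop empty texts and duplicates (ignoring case and spacing), keeping first occurrences."""
    seen, unique = set(), []
    for text in tweets:
        key = " ".join(text.lower().split())
        if key and key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def matched_keywords(text):
    """Return the symptom keywords found in the text (simple substring match)."""
    lowered = text.lower()
    return [kw for kw in SYMPTOM_KEYWORDS if kw in lowered]


def suggest_label(text):
    """Machine pre-label only; 'reviewed' stays False until a human checks it."""
    hits = matched_keywords(text)
    # Assumption: two or more distinct symptom keywords suggest the HIGH category.
    label = HIGH if len(hits) >= 2 else LOW
    return {"text": text, "keywords": hits, "suggested_label": label, "reviewed": False}


def export_jsonl(records):
    """Serialize records as JSON Lines, preserving non-ASCII (e.g. Chinese) text."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


tweets = ["I have a fever and a cough", "i have a  fever and a cough",
          "Nice weather today", "发烧又咳嗽"]
records = [suggest_label(t) for t in deduplicate(tweets)]
print(export_jsonl(records))
```

Keeping the suggested label separate from a `reviewed` flag makes it possible to measure how often annotators override the machine, which is one way to feed the quality assurance loop.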

BUSINESS OUTCOME

Accurately annotated 200,000 Chinese posts from the X platform.
Completed the project within 2 months.
Accuracy rate: 100%
We provided high-quality annotated data to enhance the client’s AI algorithm accuracy and efficiency.
The annotated data can be used to develop more accurate and timely early warning systems for future pandemics, allowing for proactive measures to be taken.

LOCATION

DIGI-TEXX Building: Innovation Solution Center, ISC Hau Giang, 198 19 Thang 8 street, Vi Tan Ward
