Web Scraping of Disease Information from social media Twitter

Muhammad Iqbal Habibie; Taufiq Widiaputra; Yulianingsani Yulianingsani

Open Conference Systems, The 1st International Conference on Advanced Information Technology and Communication (IC-AITC)


Last modified: 2021-12-10

Abstract

This project investigates trends in the profiles of users who discussed disease information on Twitter. To obtain the disease information, the related tweets, and the Twitter user details, data were collected from Twitter by web scraping in Python. Web scraping downloads data from websites to a local system over the internet: it retrieves pages over HTTP or through a web browser and then parses them to extract the data needed. Many researchers and companies use it to collect data and to build search engines. Beautiful Soup and Selenium are two Python packages that aid in web scraping, and libraries such as AutoScraper can automate parts of the process. These libraries expose different APIs for scraping data, which can then be stored in a data frame on the local system. The scraping experiment in this study succeeded in retrieving disease information from 2015 to 2020 using Twint, an advanced Twitter scraping tool. Twint can scrape a user's followers, followings, tweets, and more without using the Twitter API.

Keywords: web scraping, data collection, python, disease information, twint
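
The parse-and-filter step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses Python's standard-library html.parser in place of Beautiful Soup, Selenium, or Twint so that it runs without extra packages or network access. The HTML snippet, the class names, the keyword list, and the function names are all assumptions made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical static page standing in for downloaded HTML; the markup and
# class names are invented for this sketch and do not mirror Twitter's pages.
SAMPLE_HTML = """
<div class="post"><span class="user">alice</span><p class="text">Dengue cases rising in my city</p></div>
<div class="post"><span class="user">bob</span><p class="text">Great coffee this morning</p></div>
<div class="post"><span class="user">carol</span><p class="text">Got my malaria test result today</p></div>
"""

DISEASE_TERMS = ("dengue", "malaria")  # assumed keyword list


class PostParser(HTMLParser):
    """Collects one {'user', 'text'} record per element with class 'post'."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._field = None  # field currently being filled, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls == "post":
            self.posts.append({"user": "", "text": ""})
        elif cls in ("user", "text") and self.posts:
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self.posts[-1][self._field] += data

    def handle_endtag(self, tag):
        if tag in ("span", "p"):
            self._field = None


def scrape_disease_posts(html, terms=DISEASE_TERMS):
    """Parse the HTML and keep only posts that mention a disease term."""
    parser = PostParser()
    parser.feed(html)
    return [p for p in parser.posts
            if any(t in p["text"].lower() for t in terms)]


def to_csv(rows):
    """Serialize the filtered records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["user", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = scrape_disease_posts(SAMPLE_HTML)
print(to_csv(rows))
```

In a real collection run, the sample string would be replaced by pages fetched over HTTP or by records returned from a scraping tool, and the CSV step could be swapped for a data frame.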
