Promises in the application of Data Science to Environmental Health

Roy Aishwarjyo
Jan 11, 2021


From the university gym to the canteen corner, people of our generation often talk about complicated environmental issues without realizing it. The more developed our societies become, the more concerned people are about their health, and health is directly connected to the environment. This is why environmental health is so important today. But how many of us know what environmental health actually is? And how can we protect it? Let’s explore!

What is Environmental Health?

Environmental health is a dynamic and well-developed field. It aims to prevent or control the problems that arise from interactions between people and their environment. Although complicated environmental issues cannot always be predicted, environmental health professionals focus on identifying the risks that the physical environment poses to people’s health and on trying to mitigate them. But how do they do that? Which factors play a major role? These are the questions this article explores.

You might have read many articles about environmental issues on social media and in print. But did you know that, alongside many organizations and NGOs, a hidden wonder has long been working to protect us from various environmental issues?

If the answer is no and you are a science enthusiast, the next section is only for you!

Alongside other methodologies, data science is now widely used in environmental health science. For instance, the National Academies of Sciences, Engineering, and Medicine (NASEM) organized a workshop on June 6–7, 2019 on the use of artificial intelligence in environmental health.

Machine learning algorithms are now used heavily to explore vast amounts of complex data and find patterns, which makes them very useful for making predictions. Driven by the immense growth in the volume and accessibility of data, data science applications in environmental health are growing rapidly, and the potential of data science is now widely recognised.

How does data science advance Environmental Health?

Both human biology and environmental factors are complicated, and because we encounter them every day, studying environmental impacts on human health raises many data challenges. A report from 2013 stated that 90% of all the world’s data had been created within the previous two years. In other words, those two years produced nine times as much information as the previous 92,000 years of human history, bringing the total amount of data already created to 2.7 zettabytes. Given this scenario, we can safely say that the future will be driven by data, and a data-driven future will play an important role in addressing environmental health issues.
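The "nine times" figure follows directly from the 90% figure: if 90% of all data was created in the last two years, the remaining 10% covers everything before, so the recent period holds 90 / 10 = 9 times as much. A small sketch of that arithmetic (the function name is ours, purely for illustration):

```python
def recent_to_earlier_ratio(recent_share_percent: float) -> float:
    """How many times more data the recent period produced than all earlier periods combined."""
    if not 0 <= recent_share_percent < 100:
        raise ValueError("share must be in [0, 100)")
    earlier_share_percent = 100 - recent_share_percent  # everything created before the recent period
    return recent_share_percent / earlier_share_percent

# 90% of the world's data created in the previous two years (the 2013 figure above)
print(recent_to_earlier_ratio(90))  # 9.0
```

The same ratio says nothing about absolute volume; it only compares the recent share with everything that came before.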

In fact, data science is already being applied to several major issues. For example:

Genetic sequencing and wearable health: We now have a wealth of data, including genetic sequences and readings from wearable health and activity monitors, that can help reveal patterns and support predictions. Several data science tools are used to find the best method for a particular analysis, yet choosing that method is difficult without knowing what patterns the data will reveal. Data science also makes use of artificial intelligence in its operations: Bhramar Mukherjee from the University of Michigan described the benefit of AI as reducing the assumptions an analysis has to make while producing better predictions. The exponential growth in the availability of data on individual environmental exposures makes such predictions easier, and this is exactly where data science comes in.
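The article does not name specific methods, so as one minimal, hypothetical illustration of "finding a pattern and predicting", the sketch below fits an ordinary least-squares line to made-up wearable readings (daily PM2.5 exposure versus resting heart rate) and uses it to predict a new value. The data and the `fit_line` helper are invented for illustration; real studies use far richer models and must account for confounding, since a fitted association is not evidence of causation.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical, made-up readings: daily PM2.5 (µg/m³) logged by a wearable monitor
# and the wearer's resting heart rate (beats per minute) on the same days.
pm25 = [8.0, 12.0, 15.0, 22.0, 30.0, 35.0]
resting_hr = [61.0, 62.0, 63.5, 64.0, 66.5, 67.0]

slope, intercept = fit_line(pm25, resting_hr)
print(f"slope: {slope:.3f} bpm per µg/m³")
print(f"predicted resting HR at 25 µg/m³: {slope * 25 + intercept:.1f} bpm")
```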

Sensors and personal chemical samplers: Sensors and personal chemical samplers provide more detailed inputs for exposure models. Tracking location and activities over the long run is important for every sphere of the exposome. Here, location indicates how much of a hazard, such as air pollution, is present, while behavior indicates how long a person is exposed to it. Assessing locations and matching them with environmental data collected from instruments or other sources is fundamental to exposure science. This helps in developing models that examine the factors in a person’s physical and social environment that influence their exposome.
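The location/duration split above is the idea behind a time-weighted exposure estimate: each location contributes its hazard concentration multiplied by the time spent there. A minimal sketch, assuming hypothetical PM2.5 concentrations per location and a hypothetical daily activity log (the `Visit` type and function names are ours):

```python
from dataclasses import dataclass

@dataclass
class Visit:
    location: str  # where the person was (e.g. from GPS or an activity diary)
    hours: float   # how long they stayed there

# Hypothetical PM2.5 concentrations (µg/m³) assigned to each location,
# e.g. from fixed monitors or a personal sampler.
concentration_by_location = {"home": 10.0, "commute": 40.0, "office": 15.0}

def integrated_exposure(visits, concentrations):
    """Sum of concentration × duration over all visits (µg/m³·h)."""
    return sum(concentrations[v.location] * v.hours for v in visits)

def time_weighted_average(visits, concentrations):
    """Integrated exposure divided by total time: the average concentration experienced."""
    total_hours = sum(v.hours for v in visits)
    if total_hours == 0:
        raise ValueError("no time recorded")
    return integrated_exposure(visits, concentrations) / total_hours

day = [Visit("home", 14), Visit("commute", 2), Visit("office", 8)]
print(f"integrated: {integrated_exposure(day, concentration_by_location):.0f} µg/m³·h")
print(f"time-weighted average: {time_weighted_average(day, concentration_by_location):.1f} µg/m³")
```

Note how the short commute contributes a large share of the total because of its high concentration; location and duration both matter.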

Human biomonitoring: Exposure biomonitoring in matrices such as blood and urine provides more granular detail about the actual chemical burden in the body. Human biomonitoring (HBM), a tool for occupational exposure assessment, has been used alongside a wide variety of exposure models. These models are used to examine health risks, and biomarkers are in turn used to evaluate the exposure estimates a model has predicted.

Data science has also made possible an important review of the collected information, which aims to clarify how HBM contributes to evaluating potential health risks in various occupational settings and to formulate recommendations for implementing HBM as part of occupational health surveillance.
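One simple way to use biomarkers to check a model's exposure estimates is to see how closely the predictions track the measured levels across people. The sketch below computes a Pearson correlation between hypothetical model-predicted intakes and hypothetical urinary biomarker concentrations; the values and the `pearson_r` helper are illustrative only, and real evaluations also look at bias, units, and biomarker half-lives.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical, made-up values for five study participants.
predicted_intake = [0.8, 1.5, 2.1, 3.0, 4.2]   # model estimate, µg/kg/day
urinary_biomarker = [2.0, 3.1, 5.2, 6.0, 9.5]  # measured level, µg/L

r = pearson_r(predicted_intake, urinary_biomarker)
print(f"Pearson r between predicted exposure and biomarker: {r:.2f}")
```

On Python 3.10 and later, `statistics.correlation` from the standard library computes the same coefficient.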

Toxicology: Experimental models from toxicology sharpen our understanding of chemicals and environmental exposures that might pose a risk to human health. For example, Thomas Luechtefeld from Insilica developed a large neural network that breaks a chemical of interest down into its functional groups and uses them to predict globally recognized chemical hazard labels. The analysis uses structural data from the PubChem database to measure the similarity between the chemical of interest and some 200,000 chemicals that have already been classified according to 74 different hazards.

For example, a chemical might be similar in structure and functional groups to others that are known mutagens or acute dermal hazards. In such cases, data analysis helps group chemicals by their functional groups and flag the potential risk so it can be mitigated.
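The similarity idea above can be illustrated with a much simpler technique than Insilica's neural network: nearest-neighbour "read-across", where a chemical inherits the hazard labels of its most similar classified neighbours. This sketch is a stand-in, not the method described above. It represents each chemical as a hypothetical set of functional-group features and scores similarity with the Tanimoto (Jaccard) coefficient; the chemical names, features, labels, `k`, and cutoff are all invented for illustration.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical reference chemicals: functional-group features and already-assigned hazard labels.
REFERENCE = {
    "chem_A": (frozenset({"aromatic_ring", "nitro", "amine"}), {"mutagen"}),
    "chem_B": (frozenset({"aromatic_ring", "nitro"}), {"mutagen", "acute_dermal"}),
    "chem_C": (frozenset({"alcohol", "ester"}), set()),
}

def predict_hazards(query, reference, k=2, min_similarity=0.5):
    """Union of hazard labels from the k most similar reference chemicals
    whose similarity to the query is at least min_similarity."""
    scored = [(tanimoto(query, features), labels) for features, labels in reference.values()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    predicted = set()
    for similarity, labels in scored[:k]:
        if similarity >= min_similarity:
            predicted |= labels
    return predicted

query = frozenset({"aromatic_ring", "nitro", "halogen"})
print(sorted(predict_hazards(query, REFERENCE)))  # ['acute_dermal', 'mutagen']
```

In practice the features would come from structural fingerprints computed from databases such as PubChem, and the neighbourhood size and cutoff would be tuned against known classifications.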

Toxicity testing: Large-scale, high-throughput chemical safety screening efforts have generated data on tens of thousands of chemicals across thousands of biological targets. In 2007, the United States National Research Council (NRC) set out a vision for toxicity testing in the 21st century. It focused on using in vitro high-throughput screening (HTS) methods and predictive models as an alternative to traditional toxicity testing.

The availability of HTS data, the aggregation of chemical property and toxicity information into online databases, and the development of models and frameworks that support extrapolation from HTS data have all been significant factors in the progress of this field.
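One widely used way to extrapolate HTS results is reverse dosimetry (in vitro to in vivo extrapolation): assuming linear toxicokinetics, the daily oral dose needed to reach an in vitro active concentration in blood is that concentration divided by the steady-state plasma concentration produced by a dose of 1 mg/kg/day. The article does not name a specific framework, so treat this as an illustrative sketch; the function name and all numbers below are hypothetical.

```python
def oral_equivalent_dose(ac50_um: float, css_um_per_mg_kg_day: float) -> float:
    """Oral equivalent dose (mg/kg/day) via linear reverse dosimetry.

    ac50_um: concentration giving half-maximal activity in an HTS assay (µM).
    css_um_per_mg_kg_day: steady-state plasma concentration (µM) predicted for a
        constant oral dose of 1 mg/kg/day; linear toxicokinetics is assumed.
    """
    if ac50_um <= 0 or css_um_per_mg_kg_day <= 0:
        raise ValueError("inputs must be positive")
    return ac50_um / css_um_per_mg_kg_day

# Hypothetical values for one chemical.
ac50 = 3.0        # µM, most sensitive HTS assay
css = 1.5         # µM per 1 mg/kg/day
exposure = 0.002  # mg/kg/day, hypothetical estimated population exposure

oed = oral_equivalent_dose(ac50, css)
margin = oed / exposure  # larger ratio: exposure sits further below the bioactive dose
print(f"oral equivalent dose: {oed:.2f} mg/kg/day, margin vs exposure: {margin:.0f}x")
```

Comparing the oral equivalent dose with estimated exposure is one way such extrapolations can help prioritize which chemicals need closer study.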

From the discussion above, it is clear that data science has the potential to transform environmental health by exploiting innovations, including those related to non-traditional data sources and providers. Promising applications include real-time analysis and forecasting. To reinforce these innovations, however, privacy and security must also be maintained. Even though unorganized or unrepresentative data and spurious findings remain challenges, data science has become a powerful force in environmental health.


Roy Aishwarjyo

A Computer Science and Engineering student. Interested in Computer Science, business analytics, project management, research and editing.