Covid-19 Survey And What It Tells Us About Societal Behavior
Though Covid-19 is a worldwide phenomenon that has affected almost everyone in differing ways, we attempt to see how individual behaviors changed within different groups of society. We combine self-reported Covid-related information with census data to understand the valuable data collected through infogears.org, a Covid-19 tracking survey database.
Written by: Hovhannes Alekyan, Taline Mardirossian
Having access to a Covid-19 survey database means having the freedom to explore how individuals' self-reported behaviors vary with demographic information. The dataset analyzed throughout this article includes features such as age, zip code, whether or not the respondent has been tested for Covid, face mask wearing behavior, and symptoms. The data covers April 16, 2020 to August 20, 2020, at the granularity of the individual response. One user may submit a response multiple times, and because the data is de-identified, there is no way of knowing whether a single user has submitted once or more than once. We assume that a user is most likely to submit more than once only if something changes in their behavior or symptoms.
Our goal was to understand how people behave around Covid, and whether the demographics of their respective zip codes influenced their behavior. To make sharper observations, we used 2019 census data by zip code, with metrics including mean household income and demographic data (race, age, education level).
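To make the combination of survey and census data concrete, here is a minimal sketch of attaching zip-code demographics to each survey response. The field names (zip_code, face_covering, household_income, pct_college) and all values are assumptions made up for illustration; they are not the actual infogears.org or census schema.

```python
# Hypothetical survey responses; field names are assumptions, not the real schema.
survey = [
    {"zip_code": "90210", "age_group": "36-45", "face_covering": "always"},
    {"zip_code": "10001", "age_group": "26-35", "face_covering": "sometimes"},
    {"zip_code": "99999", "age_group": "46-55", "face_covering": "never"},
]

# Hypothetical census rows keyed by zip code (values invented for illustration).
census = {
    "90210": {"household_income": 150000, "pct_college": 70.0},
    "10001": {"household_income": 90000, "pct_college": 65.0},
}


def attach_census(responses, census_by_zip):
    """Left-join each survey response to the census row for its zip code."""
    joined = []
    for row in responses:
        demographics = census_by_zip.get(row["zip_code"])
        if demographics is None:
            # Keep unmatched responses but flag them so they can be excluded later.
            joined.append({**row, "census_match": False})
        else:
            joined.append({**row, **demographics, "census_match": True})
    return joined


enriched = attach_census(survey, census)
```

Keeping unmatched responses with a flag, rather than dropping them silently, makes it easy to check how many submissions had no census counterpart.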
Now that we have given you some background information, we can jump into the analyses that we conducted and draw some conclusions about the survey takers!



The axes for Figure 1a represent the average income for each respondent’s zip code vs the percentage of people in that zip code who have attended college at some point. In other words, we are looking at the demographics of the survey takers’ zip codes to understand the behaviors of people from different backgrounds.
The blue KDE plot in Figure 1c represents the people who tested positive for COVID-19. As we can see, the highest concentration is located around 20% on the Higher Education axis and $45,000 on the median household income axis. In addition, the spread increases as we move towards higher levels of education and income. While we recognize that this could be affected by the small sample size, we still believe that we can draw conclusions from this data about the overall pattern of behavior. Interestingly enough, the center of the KDE plot for those who tested negative sits at a similar level of education and income, with a much larger spread overall as well as a larger area of high concentration near the middle.
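For readers unfamiliar with KDE plots, the sketch below shows the idea behind locating the "highest concentration": each observation contributes a small Gaussian bump, and the bumps are summed over a grid. The article does not state which tool or bandwidth produced the figures, so the fixed bandwidths, the grid, and the data points here are all assumptions for illustration.

```python
import math


def kde_2d(points, x, y, bw_x, bw_y):
    """Gaussian product-kernel density estimate at the location (x, y)."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * bw_x * bw_y)
    total = 0.0
    for px, py in points:
        zx = (x - px) / bw_x
        zy = (y - py) / bw_y
        total += math.exp(-0.5 * (zx * zx + zy * zy))
    return norm * total


def densest_point(points, xs, ys, bw_x, bw_y):
    """Return the grid point (x, y) with the highest estimated density."""
    grid = ((x, y) for x in xs for y in ys)
    return max(grid, key=lambda p: kde_2d(points, p[0], p[1], bw_x, bw_y))


# Made-up (percent with higher education, income in $1,000s) pairs.
positives = [(18, 42), (20, 45), (22, 47), (21, 44), (19, 46), (40, 80)]

edu_grid = range(0, 101, 5)      # percent of population
income_grid = range(20, 151, 5)  # thousands of dollars

# Bandwidths are hand-picked assumptions; plotting libraries usually estimate them.
peak = densest_point(positives, edu_grid, income_grid, bw_x=5.0, bw_y=8.0)
```

With a small sample, the location of the peak is sensitive to the bandwidth, which is one reason to treat the concentrations in the positive group with caution.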
When looking at Figure 1a, we can also observe a slightly higher slope for the group testing positive as compared to those who tested negative.
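One way to compare slopes between the two groups is to fit an ordinary least-squares line to each. The sketch below assumes education percentage on the x-axis and income (in $1,000s) on the y-axis, and uses invented numbers; it is not the fit used for Figure 1a, which the article does not describe.

```python
def ols_slope(xs, ys):
    """Slope b of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up data: percent with higher education vs. income in $1,000s.
pos_edu, pos_income = [10, 20, 30, 40], [40, 52, 61, 75]
neg_edu, neg_income = [10, 20, 30, 40], [42, 50, 60, 68]

slope_pos = ols_slope(pos_edu, pos_income)
slope_neg = ols_slope(neg_edu, neg_income)
steeper_for_positive = slope_pos > slope_neg
```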



Figure 2a represents the spread of respondents who tested positive (blue) or negative (red) for COVID relative to the median income and the percentage of people that have no diploma in the area where the respondents live.
We can see that the spread of both plots differs widely from what we observed in the graph plotted against the percentage of the population with higher education. However, it is important to note that the negative pool (Figure 2b) has a more spread-out center as well as a much more gradual decline, with almost a second center of concentration around 30% no diploma and $60k average income.

The first thing we can note about Figure 3 is that on average, more women fill out the survey than men, with a ratio of 3:1. However, it is interesting to note that this ratio changes as the symptom score increases. For a symptom score of 15 and over, the ratio changes to 2:1. This suggests that men are more likely to fill in the survey when they are actually experiencing some symptoms. A symptom score is calculated from weights assigned to each reported symptom, curated based on CDC guidelines so that the most severe or likely symptoms carry the most weight and the least common or severe ones carry lower weights. Each individual bubble represents one survey submission, and as we can see, the survey takers are spread across the income distribution with a peak around $75,000, which is around the same as the mean income over all the reported zip codes. It is also worth noting that the sizes of the bubbles show that the majority of survey takers are between 36 and 55.
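A weighted symptom score of the kind described above can be sketched as a simple lookup and sum. The symptom names and weights below are hypothetical placeholders; the article does not publish the actual weights, only that more severe or likely symptoms weigh more.

```python
# Hypothetical weights, invented for illustration (not the survey's real values).
SYMPTOM_WEIGHTS = {
    "fever": 5,
    "shortness_of_breath": 5,
    "dry_cough": 4,
    "loss_of_smell": 4,
    "fatigue": 2,
    "sore_throat": 1,
}


def symptom_score(reported):
    """Sum the weights of the distinct reported symptoms; unknown names add 0."""
    return sum(SYMPTOM_WEIGHTS.get(symptom, 0) for symptom in set(reported))


mild = symptom_score(["fatigue", "sore_throat"])
severe = symptom_score(["fever", "shortness_of_breath", "dry_cough", "loss_of_smell"])

# The article looks at submissions with a score of 15 and over.
is_high_score = severe >= 15
```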

Let us now take a look at differences in behavior based on splits among the zip code demographics, starting with Figure 4a. In this first plot, zip codes are separated into high population density (greater than 10k people per square mile), medium (between 1.5k and 10k people per square mile), and low (less than 1.5k people per square mile). As we can see, there is a distinct change across these three divisions. Those living in areas with higher population densities were more likely to wear a face mask at all times, and this likelihood of “always” wearing a face mask decreases as the density of zip codes decreases. Conversely, “sometimes” and “never” face mask wearers follow the inverse trend. We must keep in mind the confounding factors of this project, however: the survey was voluntary, and the submissions reflect the reach of the survey advertisements.
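The density split described above can be sketched as a small binning function followed by a per-group tally. The thresholds come from the text; the response data and the choice to put the exact boundary values into the medium group are assumptions for illustration.

```python
from collections import Counter, defaultdict


def density_group(people_per_sq_mile):
    """Bin a zip code by population density using the thresholds from the text."""
    if people_per_sq_mile > 10_000:
        return "high"
    if people_per_sq_mile >= 1_500:
        return "medium"  # boundary values fall here by assumption
    return "low"


def mask_shares(responses):
    """Share of each mask-wearing answer within each density group."""
    counts = defaultdict(Counter)
    for density, answer in responses:
        counts[density_group(density)][answer] += 1
    return {
        group: {answer: n / sum(tally.values()) for answer, n in tally.items()}
        for group, tally in counts.items()
    }


# Made-up (population density, mask answer) pairs.
responses = [
    (25_000, "always"), (18_000, "always"), (12_000, "sometimes"),
    (5_000, "always"), (3_000, "sometimes"),
    (800, "sometimes"), (400, "never"),
]
shares = mask_shares(responses)
```

Comparing shares within each group, rather than raw counts, keeps groups with very different numbers of submissions comparable.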

Furthermore, we can now look at a different demographic split, this time defining the ‘high’ grouping as submissions from zip codes with a >50% black population, ‘medium’ being between 25% and 50%, and ‘low’ being below 25%. For this split, the majority of respondents reported leaving the home one time, regardless of the grouping, with a relatively consistent decrease per category. Notably, for the category of “two times or more”, the high grouping has the greatest concentration of responses, with a steady decline as we go towards low.

For Figure 4c, we have created a split on the concentration of white population, with the ‘high’ grouping being zip codes with >75% white population, ‘medium’ being between 50% and 75%, and ‘low’ being <50%. For this figure we see a significant difference in behaviors relative to the categories and groupings. For the “always” category, meaning that the respondent reported always wearing a mask when outside of the home, the high grouping had the fewest responses, with the number increasing as the white population concentration decreased. Correspondingly, zip codes with a high concentration of white population had a higher proportion of the ‘sometimes’ and ‘never’ categories of wearing masks when outside.

Next, let’s take a look at Figure 5a, which represents face-covering behavior against the percentage of people with no diploma per zip code. This means that the higher you go up the y-axis, the higher the percentage of the zip code population with no diploma. The plot represents four variables: income level, no-diploma percentage, population density, and face-covering behavior. As we can see, at higher levels of no-diploma per zip code, there is a higher population density on average, as well as more “always” face mask-wearing behavior. As we go down along the “no diploma” axis and move forward on the “income” axis, we see that the population density goes down and the frequency of “sometimes” and “never” reported mask-wearing behavior increases.
From our interpretation, it seems that education level does not have much of an impact on the face mask-wearing behavior of the population. This implies that areas where the majority of the population is highly educated are not guaranteed to perform any better in preventing the spread of the virus. On the other hand, it is not guaranteed that the people filling out the surveys match the demographic profile of their zip codes. This is an important confounding factor that should be considered when drawing conclusions from the plot.

Figure 5b plots the percentage of the population with bachelor’s and graduate degrees against median income per zip code. The colors of the circles represent face-covering behavior, and the circle size represents the population density of the given area, just like in Figure 5a. It is interesting to note that near the top right corner of the graph, the sparsity allows us to see that most respondents in high-education, high-income zip codes live in areas with low population density, and are split more or less evenly between answering “always” and “sometimes”, with very few “never”s reported. Contrasting this with Figure 5a, where “always” was the most frequently reported mask-wearing behavior as we moved up the “No Diploma percentage” axis, we can once again confirm that the overall education level of the population has little or no effect on mask-wearing behavior.
Not surprisingly, we found throughout our analyses that population density has a large impact on people’s behavior during the pandemic, in a much more significant manner than any other factor. Neither education nor income significantly impacted the way people behave as much as the population density of their zip code.
Referring back to the notes in the introduction, we conclude that efforts towards contact tracing are only effective when regular reporting is mandatory; when it is optional, biases can strongly impact the results. Although the prospect of a vaccine is becoming more and more likely, we encourage people to keep taking the necessary precautions to prevent the spread of the virus, and to make use of platforms such as infogears.org, which may help slow the spread by informing others, until a more permanent and safe solution is found.