
# Discussion

While GIS has come a long way over the past decade, Sui and Goodchild (2011) highlight the need for better tools to study spatial dynamics if we are to understand human behavior and society as a whole. Because the amount of data produced on social media is so immense, a synthesizing technique is suggested for extracting meaningful temporal and spatial data from these sources. As mentioned previously, it is extremely hard for a computer to capture human behavior, because a computer cannot apply context to jargon, slang, sarcasm and other speech acts, and this may result in incorrect representations.

The most problematic part of these analyses was data extraction. Using the keyword database to correctly find every tweet needed for later analysis is a hard problem, and the biggest shortcoming is the appropriateness of the Excel command we chose. Given the conditions we set for returning a TRUE value, tweets containing words such as “kaleidoscope” (which contains the string “kale”) or usernames like “@veggie,...” will return TRUE even though the tweet may have nothing to do with eating habits. The keyword “overweight”, for example, returns the tweet “my luggage is overweight” as a TRUE result. The computer only checks whether a word matches the search key, not the context in which the word is used.

Consider the tweet “Oh, kale Caesar, You are my spirit salad”. It contains both “kale” and “salad”, so a computer that ignores the rest of the sentence would reasonably classify it as healthy, but human language is more complex than keyword matching allows. This method also misses negative or opposite meanings. Tweets like “Get your kale caesar salad away from me” and “F*cking cishet vegetables” are classified as healthy, even though they express exactly the opposite of eating healthily. Beyond this lack of attention to actual content, special cases such as the tweet “Don’t ever put \bacon\ ‘bacon \ and ‘kale\’ together in the same tweet”, or a single user posting similar tweets hundreds of times, introduce further unpredictable errors into the extraction process.
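The exact Excel condition is not reproduced in this section, so the Python sketch below only approximates the problem described above. `substring_match` mimics a case-insensitive “contains” test that returns TRUE whenever a keyword appears anywhere in the text, while `word_match` removes @usernames and then requires whole-word matches. The keyword list, the sample tweets and both function names are hypothetical illustrations, not our actual lexicon, data or method.

```python
import re

# Hypothetical keywords and tweets for illustration only; not the project's lexicon or data.
KEYWORDS = ["kale", "salad", "overweight", "veggie"]

MENTION = re.compile(r"@\w+")  # @usernames such as "@veggie"


def substring_match(tweet: str, keywords: list[str]) -> bool:
    """Approximates a 'contains' condition: TRUE if a keyword appears anywhere in the text."""
    text = tweet.lower()
    return any(k in text for k in keywords)


def word_match(tweet: str, keywords: list[str]) -> bool:
    """TRUE only if a keyword appears as a whole word after @usernames are removed."""
    text = MENTION.sub(" ", tweet.lower())
    return any(re.search(rf"\b{re.escape(k)}\b", text) for k in keywords)


tweets = [
    "Loving this kaleidoscope of colours",
    "@veggie see you at 6 tonight",
    "Lunch: kale and quinoa salad",
    "My luggage is overweight",
    "Get your kale caesar salad away from me",
]

for t in tweets:
    print(substring_match(t, KEYWORDS), word_match(t, KEYWORDS), t, sep=" | ")
```

In this sketch, whole-word matching removes the “kaleidoscope” and “@veggie” false positives, but it still flags “My luggage is overweight” and the negated kale caesar tweet. It narrows the lexical problem without addressing the lack of context.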

In addition to the potential problems in our data extraction methods, social media platforms themselves introduce unpredictable variables into the analysis. The accuracy of social media data is a major problem, since there is no way to ensure that everyone is telling the truth. The representativeness of social media is also a concern. In the article “Leveraging geotagged Twitter data to examine neighborhood happiness, diet, and physical activity”, the authors note that social media as a data source is not without limitations, including an over-representation of young individuals (Nguyen et al., 2016). Is this influenced by the digital divide? Which groups are the most active on social media? What proportion of the total population uses social media? These uncertainties raise many questions about any analysis based on social media. On the other hand, the power of social media in the Web 2.0 era is undeniable. The widespread use of the Internet has raised awareness of new ways of finding solutions in many disciplines, such as health care and natural disaster prevention.

In their raw form, the geotagged tweets come from the Twitter GeoAPI. This geotagging feature reveals the location of each individual user, which is where the concept of volunteered geographic information (VGI) comes into play. Following Goodchild’s (2007) definition, VGI is understood as the widespread engagement of large numbers of participants in the digital creation of geographic information (Sieber & Haklay, 2015). VGI in turn raises the issue of geo-privacy. The journal article “Privacy, reconsidered: New representations, data practices, and the geoweb” discusses the constitutive outcomes of societal struggles over privacy (Elwood & Leszczynski, 2011). It specifically examines how privacy is negotiated around two geoweb services: Google Street View and the Twitter GeoAPI. Setting aside the issues with Google Street View, the discussion of Twitter’s GeoAPI falls squarely within the scope of this project. The GeoAPI is a service that allows users to automatically attach their location to posts, or “tweets” (Elwood & Leszczynski, 2011). Twitter’s service is not only ‘real time’ but also enables the collection of location-stamped points, allowing individuals to be identified at their precise locations at any point in time (Elwood & Leszczynski, 2011). A core element of the geoweb is user-generated content, which implies that individuals are no longer only the subjects of privacy invasion but are also information producers (Elwood & Leszczynski, 2011). User-generated content in turn leads to the problem of information oversharing. According to Elwood & Leszczynski (2011), Twitter’s GeoAPI can be seen as a form of geographic oversharing. Individuals’ information-divulging actions compromise everyone’s privacy rights because they shift the socially mediated boundary between what is public and what is private (Elwood & Leszczynski, 2011).

A single research project can span a wide range of academic disciplines, each with its own specific problems. In computing science, the challenges associated with Big Data technology include data storage, transmission, management, processing, analysis, visualization, security, privacy and quality (Yang et al., 2017). In geography, neogeographers, GIScientists and critical geographers might each raise their own arguments about the problems of this research.

Because the research topic is people’s eating habits, the social sciences also play a role in this research. Chou et al.’s article “Obesity in social media: a mixed methods analysis” (2014), which uses both qualitative and quantitative methods to analyze fat-shaming and other sentiments towards obesity across multiple social media platforms, describes a data mining process similar to ours. This points to another limitation of our research: our data were collected from only a single social media platform. Chou et al. (2014) explain that “not all social media are equal”, meaning that different types of conversations occur on different social media platforms.

Although our research has significant problems, inconsistencies, generalizations and limitations, it also achieved important successes. The amount of data collected (690,00 tweets), in-depth lexicon construction, careful analysis and overlay with different aspects of the City of Vancouver enabled us to develop a representational model of unhealthy and healthy eating habits in the city. Our drawbacks may also inform similar studies in the future. With the constant development of technology, solutions to the problems described here may emerge in the near future.