Whale watching in Sri Lanka: Understanding the metadata of crowd-sourced photographs on the Flickr™ social media platform

Data mining on social media platforms (Instagram™, Flickr™, and Twitter™) is increasing rapidly, and the application of data mining techniques has contributed to significant findings in fields such as tourism, ecology, and politics. As globalization advances and nature-based tourism thrives in many countries, social media activity on tourism is increasing despite socio-economic barriers. In this context, this paper attempts to understand the metadata of photographs related to whale watching in Sri Lanka on the Flickr social media platform. Photographs related to whale watching were extracted and analyzed for i) photographic content, ii) geo-tags, iii) social tags and iv) photographers’ nationalities by using the Flickr API (Application Programming Interface) and a self-written Python script. Content analysis identified five major categories of photographs (human activity, accommodation, natural phenomena, animals and other) based on the major element present in each photograph. Mapping of geo-tagged photographs indicated that Mirissa was the hotspot for whale watching in Sri Lanka. Moreover, the present study suggests that mapping of geo-tagged photographs can be used as proxy data for whale distribution in Sri Lanka. Analysis of social tags indicated that tags indicating whale (156), Sri Lanka (144) and Mirissa (133) were popular among the photographers. The demographic profile of the photographers indicated that the highest share of photographers interested in whale watching (25%) came from the United Kingdom, followed by Sri Lanka (18.69%) and China (12.94%). Despite some weaknesses, this study has demonstrated that metadata of Flickr photographs can effectively be used for understanding basic information related to whale-watching tourism in Sri Lanka.


INTRODUCTION
As an emerging sub-sector of nature-based tourism, whale watching provides substantial economic benefits to many countries around the world. In 2009, global whale watching activity across 119 countries generated more than US$ 2.1 billion (O'Connor et al. 2009). According to the definition of the International Whaling Commission (IWC), whale watching is a commercial activity that allows people to see cetaceans (whales, dolphins and porpoises) in their natural habitats (IWC, 2004). Though the majority of watching activity is done from boats, other forms include aerial watching and swimming with cetaceans (Parsons et al. 2006; Parsons, 2012).
Whale watching takes place on every continent, and North America is the largest whale watching destination (O'Connor et al. 2009). However, during the last few decades, several countries in Asia have also gained popularity as whale watching destinations. In particular, regions in South Asia (Sri Lanka, Maldives) and East and South-east Asia (Japan, Thailand, Indonesia and Taiwan) attract a significant tourist influx for whale watching (O'Connor et al. 2009). Among those, Sri Lanka remains one of the leading emerging destinations because of several factors. These include year-round whale sightings around different parts of the island, comparatively lower fares for watching activity (O'Connor et al. 2009) and biophysical factors that support a higher distribution of whales (e.g. curvature of the coastline, a narrow and steep continental slope, and upwelling patterns associated with the seasonal blooming of phytoplankton and krill) (De Vos et al. 2010).
The history of whale watching in Sri Lanka dates back to 1983, when small-scale commercial operations were practised off the coast of Trincomalee (Buultjens et al. 2016a). However, the development of the whale watching industry was hindered by civil unrest in the North-eastern provinces. Cessation of this unrest in 2008 and sightings of blue whales and sperm whales off the coast of southern Sri Lanka (Buultjens et al. 2016a) reinvigorated the Sri Lankan whale watching industry. Currently, commercial whale watching operations in Sri Lanka occur at three destinations: Mirissa, Trincomalee, and Kalpitiya. Of the 30 documented cetacean species in Sri Lankan waters, the whale watching industry primarily targets larger charismatic species such as the blue whale (Balaenoptera musculus) and the sperm whale (Physeter macrocephalus) (Ilangakoon, 2013).
The rapid development of technology has intrinsically linked nature-based tourism and photography. Mobile phones equipped with cameras and technologically advanced digital cameras have become an intimate part of the tourist experience. In an era where social network sites are rapidly expanding, photographs from these devices are regularly uploaded to various social media platforms. In this context, social media platforms including Facebook™, Flickr™, and Instagram™ remain among the most widely used. Continuous uploading and sharing of information on these platforms by the general public makes social media a rich source of public data. Data on these networks are a promising source for scientific studies largely because they are i) openly available and free of charge, ii) relatively easy to collect, iii) continuously generated and iv) broad in spatial and temporal range. Among these networks, Twitter, a text-based social media network, is used extensively in data mining studies in the fields of tourism (Claster et al. 2017; Sotiriadis and van Zyl, 2013) and epidemiology (Vincenza et al. 2015). On the other hand, Flickr, Instagram, and Facebook allow the user to upload more multimedia content (photographs, videos). Information from this graphical content is used extensively in various fields of environmental sciences.
Launched in 2004, Flickr is one of the largest photo-sharing websites in the world. Since 2004, more than 71 million users have uploaded 6 billion photographs to the Flickr platform (Wood et al. 2013). Moreover, a large portion of these photographs is geo-tagged and publicly available. Flickr also provides specific tools and methods for mining the information in those photographs. Willemen et al. (2015) used the Flickr network to assess the contribution of red list species in African protected areas to wildlife tourism. In addition, various authors have pointed out the potential of Flickr data as a proxy for biodiversity studies (Stafford et al. 2010; Willemen et al. 2015). More extensive studies in tourism have shown that data on the Flickr platform can effectively be used for understanding tourist travel routes (Spyrou et al. 2015; Yang et al. 2017), visual reconstruction of historical sites (Agarwal et al. 2011) and assessing the spatial distribution of coral reefs (Howarth, 2014).
Building on previous studies that used Flickr for tourist data mining, this study aimed to understand whale-watching tourism in Sri Lanka by using crowdsourced photographs on the Flickr social media platform. To achieve this objective, the study systematically investigated i) the content of the photographs, ii) the geographical distribution of the photographs, iii) the temporal distribution of the photographs, iv) the demographics of the whale photographers and v) the social tags of the photographs.

Data extraction
Flickr provides an Application Programming Interface (API) for developing applications based on its data. The Flickr API consists of a set of callable methods that allow photographs to be searched by keyword and location (ElQadi et al. 2017). Data returned by API requests are in machine-readable XML (Extensible Markup Language) or JSON (JavaScript Object Notation) format (Barve, 2014). These machine-readable formats are of significant importance for data mining.
The present study focused on extracting the metadata of photographs bearing the keyword "Whale". The term whale was used because it is more familiar to the public than the binomial nomenclature of the cetaceans. The flickr.photos.search API method was used to extract the photographs tagged with the keyword Whale on the Flickr web platform. Search results were narrowed down to the selected geographical locality (Sri Lanka) by applying the place_id parameter of the flickr.photos.search API method. Parameters for geographical information (latitude and longitude), photo id, photo title, date and time, URL of the photograph and owner of the photo (owner_id) were also applied in the search query to obtain the relevant information. Metadata extraction was carried out using the Flickrapi Python wrapper (Stüvel, 2012) and the Python programming language (Version 3.6) (Python Software Foundation, 2017). Extracted data were stored in a separate data frame.
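The search step can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' script: the helper names `build_search_url` and `photos_to_rows`, the chosen `extras` fields, the page size and the sample response are all assumptions, and the API key and place_id are placeholders that a real query would have to supply.

```python
import json
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_search_url(api_key, place_id, keyword="whale", page=1):
    """Build a flickr.photos.search request URL for one page of results."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,        # placeholder, must be a real key
        "tags": keyword,
        "place_id": place_id,      # placeholder for the Sri Lanka place id
        "extras": "geo,date_taken,url_m,owner_name",
        "per_page": 250,           # assumed page size
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


def photos_to_rows(response_text):
    """Flatten a JSON search response into one dict per photograph."""
    payload = json.loads(response_text)
    rows = []
    for photo in payload["photos"]["photo"]:
        rows.append({
            "photo_id": photo["id"],
            "owner_id": photo["owner"],
            "title": photo.get("title", ""),
            "date_taken": photo.get("datetaken"),
            "latitude": float(photo.get("latitude", 0)),
            "longitude": float(photo.get("longitude", 0)),
            "url": photo.get("url_m"),
        })
    return rows


# Made-up response that only mimics the shape of a search reply.
SAMPLE_RESPONSE = json.dumps({
    "photos": {
        "page": 1, "pages": 1, "perpage": 250, "total": 1,
        "photo": [{
            "id": "0001", "owner": "00000000@N00", "title": "Blue whale",
            "datetaken": "2016-03-12 09:41:07",
            "latitude": "5.90", "longitude": "80.45",
            "url_m": "https://example.org/0001.jpg",
        }],
    },
    "stat": "ok",
})

rows = photos_to_rows(SAMPLE_RESPONSE)
```

The resulting list of dicts can be loaded into a data frame for the later analyses. Keeping URL construction separate from parsing makes it easy to request further pages by changing only the `page` argument.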

Extraction of photo tags
Observation of the photographs indicated that each photograph was associated with several tags. These tags were extracted with the flickr.tags.getListPhoto API method, using photo_id as the search parameter.
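As a hedged illustration of this step, the snippet below pulls the tag strings out of a JSON payload shaped like a flickr.tags.getListPhoto reply. The function name `extract_tags` and the sample payload are invented for the example; the field names (`photo`, `tags`, `tag`, `raw`, `_content`) follow the response layout as understood here and should be checked against the API documentation.

```python
def extract_tags(tag_payload):
    """Return (photo_id, list of normalized tags) from a getListPhoto-style payload."""
    photo = tag_payload["photo"]
    # "_content" holds the normalized tag; "raw" holds the text as typed by the user.
    tags = [entry["_content"] for entry in photo["tags"]["tag"]]
    return photo["id"], tags


# Made-up payload that only mimics the shape of the API reply.
SAMPLE_TAG_PAYLOAD = {
    "photo": {
        "id": "0001",
        "tags": {
            "tag": [
                {"id": "t1", "raw": "Whale", "_content": "whale"},
                {"id": "t2", "raw": "Sri Lanka", "_content": "srilanka"},
                {"id": "t3", "raw": "Mirissa", "_content": "mirissa"},
            ]
        },
    },
    "stat": "ok",
}

photo_id, tags = extract_tags(SAMPLE_TAG_PAYLOAD)
```

Using the normalized form avoids counting "Sri Lanka" and "srilanka" as different tags in the later frequency analysis.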

Photographers' nationality and other EXIF data
The owner_id of each photograph was used to extract information about the photographer's nationality. The photo_id was used to retrieve the EXIF (Exchangeable Image File Format) information of each photograph with the flickr.photos.getExif method. This information was used to determine the camera model with which each image was taken. The complete flow of the algorithm is presented in Fig. 1.

Content analysis of the photographs
Content analysis can be defined as an observational research method used to systematically evaluate the symbolic content of all forms of recorded communication (Albers and James, 1988). It is used extensively in various fields of science to classify photographs. The content of the photographs was manually checked and classified according to the "eyecatchers" of the images by an independent coder. Eyecatchers are defined as illustrations in which 50% or more of the area is occupied by the eye-catching device used to grab attention (Pritchard and Morgan, 1995). Photographic categories were adopted from Hausmann et al. (2017) and modified by adding categories that appeared frequently in the current photographs (Table 1).

Visualizing the geo-tagged whale photographs
The geographical distribution of the photographs depicting whales was visualized with the basemap package (Hunter, 2007) of Python. Based on the content analysis, only photographs depicting whales were chosen for plotting on the map.

Analyses of social tags, demographics and monthly distribution of photographs
Extracted photographic tags from each photograph were counted and visualized in a word cloud based upon the frequency of each tag. The percentage distribution of the photographers was plotted against the country of origin. The most popular camera gear among the photographers was also identified. All the analyses were performed with a self-written Python script using matplotlib (Hunter, 2007).

Whales and dolphins were the most photographed elements (47.69% and 18.24% respectively) within the "animals" category. Moreover, the number of photographs depicting the intended keyword (whale) was comparable with similar studies on other animal species (Table 3). Whale watching is one of the emerging industries in Sri Lanka. With possible avenues for development, there might be more travel photographers using the Flickr platform. This may further increase the number of whale photographs. In addition, data mining on other social media platforms (Instagram, Facebook) may also increase the number of geo-coded photographs related to whale watching.
Among the human activities, whale watching was the most prominent (8.13%). However, activities such as posing were not very popular during whale watching. Similar tourist behaviour was observed by Hausmann et al. (2017) in a study of tourist preferences for nature-based activities at protected areas. The authors argued that Flickr is not a popular social media platform for sharing photographs of human activities such as posing. This may explain the low percentage (3.3%) of posing photographs in that study. In the present study, there was a low percentage (0.44%) of human activities such as diving.
Though "swim/dive with cetaceans" is included in the definition of whale watching, swimming or diving with cetaceans is strictly prohibited in Sri Lanka unless prior approval is granted by the Ministry of Wildlife Resources Conservation (MWRC) (MWRC, 2012). The low percentage of photographs showing diving or swimming with cetaceans might be attributable to this restriction.
The whale watching industry in Sri Lanka provides a number of livelihoods for nearby residents. Among those, provision of tourist accommodation is one of the important services. This accommodation ranges from small homestays to luxury hotels. Most of these service providers highlight the whale watching opportunities associated with them. Some of them use Flickr as a medium for advertising their facilities to a wider tourist audience (Fig. 2b). This is usually achieved by posting photographs of the relevant accommodation and tagging them as whale/whale watching. Besides that, a considerable percentage (5.49%) of the photographs illustrated other tourism infrastructure (boats). These include large commercial boats and modified fishing boats. Modification of fishing boats by fishers was primarily driven by the potentially higher income related to whale watching in Sri Lanka (Buultjens et al. 2016a). However, whether these modified boats meet adequate standards for whale watching is debatable (Williams, 2013).

Tharindu Bandara and Tharanga Prageeth Bandara
Results of the content analysis also revealed that 4.18% of the photographs related to natural phenomena. Within this category, 93.2% of the photographs depicted sunrise. On a normal day, whale watching boats leave Mirissa at around 0600-0630 h (Randage et al. 2014). During this early excursion, a tourist has ample time for photographing the sunrise on the horizon. This might be a reason for the high number of photographs related to the sunrise.
The present study also found that irrelevant content appeared in the search results. This may be due to both incorrect user tagging and the behaviour of social media bots, which can produce incorrect results.

Mapping of geo-tagged whale photographs
Of the 455 photographs returned by the query, excluding those not relevant to the main subject (whale) and burst photographs reduced the total to 238. Mapping of these photograph coordinates indicated that most of the photographs were located around Mirissa. In addition, some photographs were recorded in the Kalpitiya and Trincomalee areas (Fig. 3). Of the three, Mirissa is the most popular whale watching destination. Better tourism infrastructure and close proximity to the sea are major drivers of the increased tourist attraction for whale watching at Mirissa (Buultjens et al. 2016a). Trincomalee has been an emerging tourist attraction for dolphin and whale watching since the civil war (Buultjens et al. 2016b). Although Kalpitiya has greater potential for whale watching, its tourism infrastructure is still at a developing stage. These factors might explain the reduced tourist visitation at Kalpitiya and Trincomalee. Therefore, the number of photographs in each area may depend upon destination popularity and available infrastructure. The number of geo-tagged whale photographs recorded in the present study is higher than that in other popular databases for biodiversity research (e.g. Global Biodiversity Information Facility/GBIF.org, iNaturalist.org). Therefore, integration of the above geo-tagged sightings with the GBIF database would expand the present GBIF coverage of whale sightings from Sri Lanka. Since GBIF is a popular tool for larger citizen science informatics studies (Barve, 2014; Wood et al. 2013), this integration would contribute positively to further marine biodiversity studies. However, this integration should be followed by further quality control and accurate georeferencing of the relevant photographs.
Fig. 3 Distribution of geo-tagged whale photographs around Sri Lanka
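One way to quantify how the geo-tagged photographs cluster around the three destinations is to assign each photograph to its nearest site by great-circle distance. The sketch below is an illustration only and is not the procedure used in the study: the site coordinates are approximate, the 100 km cut-off is an arbitrary assumption, and the sample points are invented.

```python
import math
from collections import Counter

# Approximate (latitude, longitude) of the three destinations; assumed values.
SITES = {
    "Mirissa": (5.95, 80.46),
    "Trincomalee": (8.59, 81.22),
    "Kalpitiya": (8.23, 79.76),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_site(lat, lon, max_km=100.0):
    """Name of the closest destination, or "other" if none lies within max_km."""
    name, dist = min(
        ((site, haversine_km(lat, lon, *coords)) for site, coords in SITES.items()),
        key=lambda pair: pair[1],
    )
    return name if dist <= max_km else "other"


# Invented photo coordinates, for illustration only.
sample_points = [(5.90, 80.45), (5.88, 80.50), (8.60, 81.30), (8.20, 79.70)]
site_counts = Counter(nearest_site(lat, lon) for lat, lon in sample_points)
```

Counts per site obtained this way could accompany a map such as Fig. 3, although photographs taken far offshore may need a larger cut-off.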

Time of the day the photographs were taken
The highest photograph count was recorded at 1100 h. Based upon the number of photographs, it can be assumed that photographers were most active from 0900 to 1100 h. Most of the boat operators are aware of the sites where whales are frequently sighted. At Mirissa, reaching those sighting places takes on average 3-4 hours from departure (at 0600 h) (Buultjens et al. 2016a). This may explain the highest photographic count at around 0900-1100 h.
Fig. 4 Hourly distribution of the photographs

Monthly distribution of the photographs
The monthly distribution of the photographs indicated that the highest photo count was in April. From October to April, there was an increasing trend in photo count (Fig. 5b). In Sri Lanka, whale sightings depend heavily upon the seasonal weather patterns. November to April is the prime whale watching period at Mirissa, when watching is unaffected by south-western monsoon rains. In contrast, May to October is the best period for whale watching at Trincomalee. However, whale watching traffic at Trincomalee and Kalpitiya is far below that at Mirissa (Martenstyn, 2013). This may result in the reduced photo count during May-October. Since the observed results are highly concentrated at Mirissa, it can be assumed that the increased photo count from November to April is due to increased tourist influx, increased whale activity, or both. Moreover, the previous record of the monthly blue whale distribution in Sri Lanka (Martenstyn, 2013) indicated that the photographic count overlapped with monthly blue whale sightings (Fig. 5a). Therefore, the monthly distribution of photographs may also be used as primary biodiversity data for understanding whale distributions.
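The hourly and monthly counts discussed above can be derived from the date-taken field of each photograph. The following sketch assumes the timestamps are strings in the "YYYY-MM-DD HH:MM:SS" form and reflect the camera's local clock; the function name and the sample values are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def hourly_and_monthly_counts(date_taken_values):
    """Count photographs per hour of day (0-23) and per month (1-12)."""
    hours, months = Counter(), Counter()
    for value in date_taken_values:
        taken = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        hours[taken.hour] += 1
        months[taken.month] += 1
    return hours, months


# Invented timestamps, for illustration only.
sample_dates = [
    "2016-03-12 09:41:07",
    "2016-03-12 11:05:30",
    "2016-04-02 11:20:00",
    "2015-12-24 10:15:45",
]
hour_counts, month_counts = hourly_and_monthly_counts(sample_dates)
```

Both counters can be passed to a bar-chart routine to produce plots like Fig. 4 and Fig. 5b. Camera clocks that were never set to local time are a known source of noise in this kind of analysis.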

Nationality of the photographers
Analysis of the Flickr profiles indicated that 91 photographers from 20 countries had photographed the whales. The highest share of photographers (25%) was recorded from the United Kingdom, followed by Sri Lanka (18.69%) and China (12.94%) (Fig. 6). However, the actual number of whale watching photographers may exceed the present statistics, as not every photographer uploads their photographs to the Flickr platform.

Camera models used for photographing whales
EXIF data of the photographs indicated that photographers used 52 camera models. These include both Digital Single-Lens Reflex (DSLR) and mobile phone cameras. Among DSLR cameras, 19.12% of photos were taken with the Canon EOS 7D, followed by the Nikon D800 (5.15%) and the Canon EOS 5D Mark III (4.41%) (Table 4). Users of these camera models were likely to be professional users. The iPhone 6s and iPhone 4s were the top mobile phone cameras, with photo counts of 9.56% and 3.68% respectively. Activity factors (a number derived from photos, members and the model's rank to indicate the most frequently used cameras) of the above DSLR camera models indicated that those cameras are listed among the top 10 camera models within the Flickr community (Flickr, 2017).
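Percentage shares such as those above can be computed directly from the EXIF replies. The sketch below is a hedged example: it assumes each flickr.photos.getExif-style payload carries a top-level `camera` string, skips photographs without one, and uses made-up payloads; `camera_share` is a hypothetical helper name.

```python
from collections import Counter


def camera_share(exif_payloads):
    """Percentage of photographs per camera model, most frequent first."""
    models = [payload["photo"].get("camera") for payload in exif_payloads]
    models = [model for model in models if model]  # drop missing or empty values
    counts = Counter(models)
    total = sum(counts.values())
    return {model: round(100.0 * n / total, 2) for model, n in counts.most_common()}


# Made-up payloads that only mimic the shape of the API reply.
SAMPLE_EXIF = [
    {"photo": {"id": "0001", "camera": "Canon EOS 7D"}},
    {"photo": {"id": "0002", "camera": "Canon EOS 7D"}},
    {"photo": {"id": "0003", "camera": "Canon EOS 7D"}},
    {"photo": {"id": "0004", "camera": "Apple iPhone 6s"}},
    {"photo": {"id": "0005", "camera": ""}},
]

shares = camera_share(SAMPLE_EXIF)
```

Photographs with stripped EXIF data are excluded from the denominator here, so shares are relative to photographs with a known camera; whether the study used the same denominator is not stated.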

Analysis of social tags
Social tagging, or collaborative tagging, is the process of assigning tags (keywords) to individual and shared content (e.g. photographs or tweets) for future retrieval and organization (Golder and Huberman, 2006). Since many search engines accept only text queries, tagging maximizes a photograph's exposure for future retrieval. Currently, Flickr allows up to 75 tags to be added to a photograph (Wikihow, 2017). Analysis of photo tags indicated that Flickr users used 316 different tags. Tags indicating whale (156), Sri Lanka (144) and Mirissa (133) were the most popular among the photographers (Fig. 7). Previous studies on the tagging behaviour of Flickr users have indicated that users tag not only the visual content but also other attributes such as the date, the location, and the actions taking place where the photo was taken. Based on these, an image-attribute based tag classification system was introduced by Jörgensen (1998). Following Jörgensen's classification, the most frequent tags in the present study can be classified as i) location tags (Mirissa and Sri Lanka), ii) people-related attributes (e.g. watching) and iii) visual elements (ocean, whale). Studies on the tagging behaviour of Flickr users have indicated that these attributes are the most prominent in tagging Flickr photographs (Rorissa, 2010; Sigurbjörnsson and van Zwol, 2008). Therefore, further use of the above frequent tags in whale watching photography may increase photograph visibility for various data mining approaches.

Fig. 7 Word cloud representation of the most frequent tags associated with the photographs (minimum tag frequency >5). Note that tag color and size are proportional to the frequency of tags.
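The tag frequencies behind a word cloud of this kind can be obtained with a simple counter. The example below is a sketch under stated assumptions: tags are lower-cased, each tag is counted once per photograph, and only tags with a frequency above 5 are kept, mirroring the threshold in the caption. The function name and the sample tag lists are invented.

```python
from collections import Counter


def frequent_tags(tags_per_photo, min_count=6):
    """Tag frequencies across photographs, keeping tags seen at least min_count times."""
    counts = Counter()
    for tags in tags_per_photo:
        # A set ensures a tag repeated on one photograph is counted once.
        counts.update({tag.lower() for tag in tags})
    return {tag: n for tag, n in counts.most_common() if n >= min_count}


# Invented tag lists, for illustration only.
sample_tags = [["whale", "mirissa"]] * 6 + [["Whale", "ocean"]]
tag_frequencies = frequent_tags(sample_tags)
```

A frequency mapping like this is the usual input for word cloud libraries, which scale each word by its count.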

CONCLUSION
Rapidly developing social media networks provide valuable information for data mining studies. The present study suggests that metadata from geo-tagged whale watching photographs can effectively be used for understanding the behavior of whale watching tourists. Despite some limitations, geo-tagged photographs may also be used as a proxy for whale distribution in Sri Lanka. Future studies that mine other databases (Instagram, Twitter) concurrently with the Flickr database may improve the present findings. Furthermore, the application of automated image classification systems may significantly reduce human intervention in the image classification process. In a scenario where statistics on whale watching tourism are scattered or scarce, the present study presents a novel and cost-effective method for data aggregation and interpretation. Results of the present study may be used for future policy formulation and for the promotion of whale-watching tourism and related activities.