Sunday, September 10, 2017

Emojis, elections, LKF and whatsapplied statistics

Whatsapp is the messaging medium of choice among the ultimate community of Hong Kong. Since December 2011, an ever-expanding group of players has coordinated practices and social activities, shared news and engaged in raucous discussions over a Whatsapp group. At times, this group has exploded like a phone ringing off its hook and been the source of much hilarity for the community. The group has gone through many names, but most commonly has been known as "Party in My Pants" (PIMP).

This analysis was started by my friend, teammate and former colleague Jak Lau. In June 2017 he emailed me this:

Party in my TUANsuit
STATISTICS
Started: 10 December 2011
Age: 5 years 7 months
Messages 39,700
Group name changes: 19
Messages written
Name     No.    %
Doona    3261   8.0
Sam      2453   6.0
Jak      2223   5.4
Neil     2120   5.2
Mikey    2097   5.1
Kim      2041   5.0
Will     1339   3.3
Tommy     805   2.0
Gio       655   1.6

This is the email exchange that followed:
Cal: What?!?!?! How did you get these??
Jak: I made them.
Cal: You downloaded the transcript and searched?
Jak: Yeah, just exported the chat into excel and used a few simple analysis tools. Mostly sort and filter.
I was gonna do more, but wouldn't be worth it.
Cal: Haha I might play around with it. I want to see which emojis we use (eggplant)
Jak: Have fun. I don't know if you can export the emojis.

And so I set forth to do some further analysis. Whatsapp's Export Chat feature is nifty (and a feature that separates it from many other messaging apps), allowing me to email the chat history stored on my phone to myself as one raw text file. You can include media as well (images and videos) but I figured that might be an overwhelming amount and didn't include the images. Since leaving Hong Kong in early 2016, I've both remained in this group chat and become a professional data scientist, learning many techniques that would help me work with this text file.

Python is a good tool for text analysis, especially when used through a web application interface like a Jupyter Notebook. The benefit of using a web interface is that the text gets outputted in your browser, which means different language scripts and emojis, both of which are relevant here, will likely be supported. The Pandas package allows in one step for the ingestion of the .txt file and conversion into a useful dataframe. The raw data contains lots of activity from Whatsapp, including when users entered or left the group and when they sent images. I was mainly just interested in messages. I knew I would eventually do analysis on the emojis though, and I wasn't sure how that would work in a Python Jupyter Notebook - there was a tutorial on emoji data science in R though, which happens to be my strongest programming language. I exported the raw data from Python and used R to analyze it.

So I imported the data into R and was looking at a 37,351 * 4 dimension dataframe. I had the timestamp of the message, the sender, the message text itself, and whether it was an image.
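A minimal sketch of how an export like this could be turned into those four columns, in plain Python rather than Pandas or R. The line layout ("M/D/YY, H:MM AM - Sender: text") and the "<Media omitted>" placeholder are assumptions about the export format, not details taken from the file used here, and the sample lines are made up.

```python
import re
from datetime import datetime

# Assumed export layout (hypothetical): "M/D/YY, H:MM AM - Sender: text"
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{2}), (\d{1,2}:\d{2} [AP]M) - ([^:]+): (.*)$"
)

def parse_export(lines):
    """Turn raw export lines into dicts with timestamp, sender, text, is_image."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            # Group notices (joins, leaves) have no "Sender: " part, so skip them.
            continue
        date_part, time_part, sender, text = match.groups()
        timestamp = datetime.strptime(
            f"{date_part} {time_part}", "%m/%d/%y %I:%M %p"
        )
        rows.append({
            "timestamp": timestamp,
            "sender": sender,
            "text": text,
            # Assumed placeholder for media when exporting without attachments.
            "is_image": text == "<Media omitted>",
        })
    return rows

# Made-up sample lines, purely for illustration.
sample = [
    "12/10/11, 9:15 PM - Jak: Practice tomorrow?",
    "12/10/11, 9:16 PM - Cal joined",
    "12/10/11, 9:17 PM - Donna: <Media omitted>",
]
rows = parse_export(sample)
```

Messages that span several lines would need extra handling; this sketch simply drops any line that does not start with a timestamp and sender.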

My first step was to look at a timeline - what was typical activity? What sort of spikes occurred and when? It's hard to plot a timeline without first grouping data into buckets, hence I grouped the continuous timestamps into the days of the message. Since I was in the US EST time zone when I emailed these messages to myself, the timestamps are also in that time zone - however the majority of the group is based in Hong Kong, and it made sense to add 12 hours to all those times first. Then I was able to calculate how many messages were sent on any given day in Hong Kong, and eventually create the following plot.
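The bucketing step can be sketched in a few lines of Python. The 12-hour shift is the one described above; the function name and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

HK_OFFSET = timedelta(hours=12)  # the fixed shift described in the post

def messages_per_day(timestamps, offset=HK_OFFSET):
    """Shift each timestamp, truncate it to a calendar date, and count per date."""
    return Counter((ts + offset).date() for ts in timestamps)

# Made-up timestamps in US Eastern time.
stamps = [
    datetime(2016, 11, 8, 20, 30),  # evening in the US, next morning in Hong Kong
    datetime(2016, 11, 8, 23, 59),
    datetime(2016, 11, 9, 13, 0),   # early afternoon in the US, after midnight in Hong Kong
]
daily = messages_per_day(stamps)
```

A fixed 12-hour shift matches US daylight saving time; during standard time Hong Kong is 13 hours ahead, so a small share of messages near midnight may land on the neighbouring day.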

The labels that you see there were created semi-manually. After observing the timeline without the labels, I looked up the peak days and dug into the original data to see what people were talking about on those peak days. For most of these I could find some clear events that piqued interest in the group. Some of these were external events like the US Presidential Election and the Hong Kong Umbrella Revolution (perhaps the most sustained spike) and some were random internal events like the time we decided to play a game where people typed entirely in emojis and others guessed what movie they referred to. A few of the spikes didn't really correspond to anything more than a Saturday night. The external events were mostly major news events and live sporting events, including ultimate events. I also felt like the chart was lacking colors and struggled to think of what other variable I could use to color the dots, and decided to just use day of the week. It isn't really adding any more insights to the chart, but it makes the presentation better.

This plot also shows several periods of no activity which I can trace to periods where I lost my phone and had to restore an earlier backup. Whatsapp stores data on the cloud of course but access to this data on any given device is local. Whenever I lost my phone and had to reset, weeks or months of messages were lost. This explains why my total message numbers are lower than Jak's, despite my export being done several months later. I also chose to break down the plot into each year, called faceting in ggplot, and keep each year to its own scale - the 469 behemoth during the first emoji movie game ruins the scale and makes it hard to see other outliers. This is where ggplot really shines - faceting is awful to do in many other plot packages, and without ggplot I would probably just create 6 plots and piece them together in a photo editor.

Next was to repeat Jak's work and look at the most common texters. The group has expanded greatly over the years. Once limited by the app itself to 25 users, it has now grown to 91 users, and I found unique messages by over 100 users including users who have left the group. After aggregating the sender count, I made the following bar chart including all texters who had sent over 100 messages. In R it was also easy to add in a bit of extra information by coloring the message quantities by year (which I do with sequential shades of green, making it clear that darker means more recent).

So we do see that Donna is far and away the most active user historically, followed by Sam, Mike Ying, Kim, Jak, Neil, me and Tuan. The group's expansion is also visible here, with users Wanda and Jason noticeable for being high volume users with messages only in 2016 and 2017. On the flip side there are users whose volume dropped off over the years, including people like Nickie Wong and Chris Harrison who left Hong Kong.

What was also fun was searching for specific words, and then redoing the barplot for just messages with those words. As one of the main functions of this group was to organize social activity among people in Hong Kong, several places in Hong Kong appear in hundreds of messages over the years. Chief among these is "LKF", short for Lan Kwai Fong, one of the best party areas in Hong Kong and in all seriousness, the world. LKF appeared in messages 102 times, led far and away by Tuan Phan.
Ok I'm #2, but Tuan has me beat by a mile. Along each bar, I included a randomly sampled message by the respective person using LKF, and it so happens that Ruth Chen's message is "Tuan's always in LKF."
Just as an added side bonus, I wanted to see how deep into the socializing these texts typically occurred. It took a bunch of manipulation (I had to extract the time portions of these texts, then set them all to the same arbitrary day) before I was able to graph the frequency of these texts over the course of the day.
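The keyword search, the per-person counts, the randomly sampled message and the time-of-day breakdown can be sketched together in Python. This is an illustration under assumptions, not the R code behind the charts: the function name, the tuple layout and the sample messages are all made up.

```python
import random
import re
from collections import Counter, defaultdict
from datetime import datetime

def keyword_summary(messages, pattern, seed=0):
    """messages: iterable of (timestamp, sender, text) tuples."""
    regex = re.compile(pattern, re.IGNORECASE)
    by_sender = defaultdict(list)
    by_hour = Counter()
    for timestamp, sender, text in messages:
        if regex.search(text):
            by_sender[sender].append(text)
            # Keeping only the hour is equivalent to moving every
            # message onto the same arbitrary day.
            by_hour[timestamp.hour] += 1
    rng = random.Random(seed)
    counts = {sender: len(texts) for sender, texts in by_sender.items()}
    samples = {sender: rng.choice(texts) for sender, texts in by_sender.items()}
    return counts, samples, by_hour

# Made-up messages, purely for illustration.
messages = [
    (datetime(2015, 3, 7, 22, 5), "Tuan", "LKF tonight?"),
    (datetime(2015, 3, 7, 22, 9), "Cal", "see you in lkf"),
    (datetime(2015, 3, 8, 0, 40), "Tuan", "still at LKF"),
    (datetime(2015, 3, 8, 10, 0), "Kim", "brunch anyone"),
]
counts, samples, by_hour = keyword_summary(messages, r"\blkf\b")
```

Swapping the pattern for any other word or regular expression reuses the same three outputs.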

Hmm, it would appear that texts referring to the party place in Hong Kong within the group "Party in My Pants" really take off between 6pm and 1am. Whodathunk it?

You can repeat the first LKF graph with any other word, or regular expression. I'll do one more, and be careful if you're reading this at work, because our group is not a PG13 group.
Interesting, Tuan also has a commanding lead in this category, and his randomly sampled "sex" sentence even includes "lkf." Even if you the reader are not familiar with any of the people mentioned here, you may have an inkling of why this groupchat is now named "Party in my Tuansuit."

Ok at this point, the most data science heavy thing I've done is sample a random sentence and plot it in that graph. Surely this is not what I'm paid to do (you'd be surprised). But let's actually apply some text mining to this wonderful data set. I first do some quick preprocessing steps, reducing everything to lower case and getting rid of pesky punctuation. Using the R package "tm", I also eliminate a healthy group of English language stopwords (generic words like "an", "me", "who" etc which don't really provide any insight), and create a corpus and dictionary. Here dictionary means that the program creates a vector to store words. It iterates along each word of each message and every time it comes across a word, if it hasn't seen it before, it adds a new element to the vector and assigns it the value 1. If it has seen the word before, it finds the index corresponding to that word and increases its value by 1. The program will separately keep a vector of the words themselves so that we can match them to the word count later. This step tells me that we have 17,608 unique words. Considering we have over 37k texts and most texts have multiple words, I was surprised that the number of unique words was so low. As it turns out, we repeat words a lot. From this step I can see that we've said "happy" 1509 times and "birthday" 1215 times, the 1st and 3rd most used words respectively.
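The counting procedure described above can be sketched in Python with a Counter, which plays the role of the word vector and its counts. This is not the "tm" implementation; the stopword set is a tiny stand-in for tm's list, and the sample texts are made up.

```python
import string
from collections import Counter

# Tiny illustrative stopword set (an assumption, not tm's actual list).
STOPWORDS = {"an", "me", "who", "a", "the", "to", "you"}

def tokenize(text):
    """Lower-case, strip ASCII punctuation, split on whitespace, drop stopwords."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in STOPWORDS]

def build_dictionary(texts):
    """Count every remaining word across all messages."""
    counts = Counter()
    for text in texts:
        # A new word starts at 1; a word seen before has its count increased.
        counts.update(tokenize(text))
    return counts

# Made-up messages, purely for illustration.
texts = ["Happy birthday!", "happy BIRTHDAY to you", "Who is happy?"]
word_counts = build_dictionary(texts)
```

Here `len(word_counts)` is the number of unique words, and `word_counts.most_common()` gives the ranking used to find the most frequent ones.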

So I want to calculate the average frequency usage of each word, and the average frequency for each user. Key to this step is the document term matrix. The document term matrix is essentially a collection of all those word vectors but corresponds to each document, which in this case is an individual text. Each vector must have as many indices as there are unique words, so each vector is 17,608 elements long. Since there are 37,351 texts, we are looking at a 37,351 * 17,608 matrix! That matrix takes up at least 5 gb of ram on my computer. I say at least because my workstation would crash before it finished creating that matrix.
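The memory figure can be checked with quick arithmetic. The sketch below assumes each cell of the dense matrix is stored as an 8-byte number; that storage size is an assumption rather than something measured.

```python
n_texts = 37_351
n_words = 17_608
bytes_per_cell = 8  # assumption: one double-precision number per cell

cells = n_texts * n_words                 # total entries in the dense matrix
dense_gb = cells * bytes_per_cell / 1e9   # size in gigabytes
```

Under that assumption the dense matrix has roughly 658 million cells and needs a little over 5 GB, in line with the crash described above.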

Luckily computer scientists have figured ways around this - a sparse matrix. Nearly all the elements in the matrix are 0 - no text has anywhere close to 17k unique words.  A sparse matrix only stores the non-zero elements. It is a little bit harder to do operations with this, but you still can, and it saves a lot of storage. The sparse matrix for this whatsapp group was only 5.3 mb. I combined this matrix with a vector containing the senders of each chat, and iterated through for each unique sender to find each person's total word vocabulary, or individual dictionary. These individual frequencies could be compared to the overall frequency in a couple ways. We could look at the % difference in values, finding cases where someone used a word 1/100th of the time and overall it was used 1/10000th of the time. However for words that were only used a couple times in total, this % value would be wildly distorted. So I removed from the consideration all words which were only used once overall, and created a weighted equation where the raw difference in values was also taken into consideration. The weightings I used here were arbitrary, but I tried a couple variations until I got words that seemed "interesting."
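To make the idea concrete, here is a small Python sketch of sparse storage plus a per-sender "distinctiveness" ranking. The scoring formula is hypothetical: the actual weightings are not given, so the blend of relative lift and raw rate difference below (and the `weight` parameter) is an assumption chosen only to show the shape of the calculation. The data is made up.

```python
from collections import Counter, defaultdict

def sparse_rows(token_lists):
    """One Counter per message: only words that actually occur are stored."""
    return [Counter(tokens) for tokens in token_lists]

def distinctive_words(rows, senders, top_n=3, min_total=2, weight=0.5):
    """Rank each sender's words by a hypothetical ratio/difference blend."""
    overall = Counter()
    per_sender = defaultdict(Counter)
    for row, sender in zip(rows, senders):
        overall.update(row)
        per_sender[sender].update(row)
    total_words = sum(overall.values())
    result = {}
    for sender, counts in per_sender.items():
        sender_words = sum(counts.values())
        scores = {}
        for word, n in counts.items():
            if overall[word] < min_total:
                continue  # drop words used only once overall
            own_rate = n / sender_words
            all_rate = overall[word] / total_words
            # Assumed scoring: relative lift, damped, times the raw rate gap.
            scores[word] = (own_rate / all_rate) ** weight * (own_rate - all_rate)
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[sender] = ranked[:top_n]
    return result

# Made-up tokenized messages and their senders.
rows = sparse_rows([["lkf", "tonight"], ["lkf", "lkf"],
                    ["haha", "tonight"], ["haha", "haha"]])
senders = ["Tuan", "Tuan", "Donna", "Donna"]
stored_cells = sum(len(row) for row in rows)
top_words = distinctive_words(rows, senders)
```

In this toy example only 6 cells are stored instead of the 12 a dense 4 x 3 matrix would need, and the saving grows with vocabulary size because almost every cell of the full matrix is zero.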

After successfully iterating through each sender (and not crashing my computer), I saved the 10 most "distinctive" words for each sender, and graphed these words for a bunch of people.

Awesome. There are a lot of interesting words in here, which I'll get to in a bit. But first, what are the u0001---- things? Most of these are unicode for emojis - a couple of them are Chinese characters. And there are lots of emojis, to the extent that this graph is really more distracting than useful until those unicode sequences are converted into weird smiles. And thus I broke out the emoji data science tutorial, written by the affable Hamdan Azhar, who has actually founded a company around emoji analysis.

Turns out emojis are really complicated. The steps involved in making that graph pretty were extensive - I spent a couple weeks of free time on it. Hamdan's strategy is to create a dictionary mapping each unicode id to a name of the emoji, and download a bunch of emoji .png images with the same name. His tutorial links to a dictionary and set of images. Unfortunately the dictionary I used did not contain unicodes and I had to find another one online. This one for some reason named some emojis differently. Aggravatingly differently. There isn't exactly one emoji regulatory body (or 👮👉 for short). For example, my dictionary had "grinning face with sweat" and my images used "smiling face with open mouth and cold sweat". Also, whatever regulatory body there is keeps adding new emoji and the dictionary and set of images were not up to date. So I expanded the repertoire as I came across new items. A new frustration came with the new png files, some of which created an error when I tried to render them. Turns out I needed to download Windows specific emoji, some of which look quite different from browser, Android or Apple versions. Eventually with enough "manual" work, I was able to redo the plot by removing all the text that matched with emojis, then one by one rendering an image of the matching emoji in their place. The cleaned up result is below:

How sweet is that? Some users almost exclusively communicate in emoji (Kingi, Cat MK, Rie). Mike Ying's first emoji definitely rang a bell, and of course the eggplant appeared on Neil's most distinctive words list. Also notable here are Sam Axelrod's 7th most distinct word, Lincoln's 4th and 7th, Clay's 9th, Conor talking about football, my love for Tom Brady, and the fact that Donna, the group's most prolific user, apparently just texts various different types of laughs all the time. Note, this emoji chart was done a bit later and with slightly improved methodology from the previous non-emoji chart, hence not all words match up.

More users:

Of course Wilkie mentions Madonna, Jeremy uses aviation vocab, Quention has a baby named "Marni" and Jason talks about master's. Ed Lee says "sold" whenever you propose any social event. I'm not sure what it says that Kirk's most distinctive word is "harem", but he's used it twice and no one else has. And yes there still are some frequency issues here. While I removed words that were used once overall, some of the words that show up here were only used once by the user and twice or thrice overall. Is a word really distinctive of a person if he/she has only used it once? Perhaps my weighting equation needs some reworking, but there were always going to be some issues, especially with users who haven't sent that many texts.

But Cal! There are still emoji unicodes in here! Yes there are. Like I said, emojis are really aggravatingly complicated. The basic emojis are all one unicode to one emoji - however they just keep expanding it. You know how the face emojis now have adjustable skin tones? That is a combination of two unicodes - the original face unicode and an additional one signifying the skin tone. All the flag emojis? They are a combination of two unicodes. And actually, England, Wales and Scotland are all considered subdivisions of a national flag and are somehow represented by a combination of 7 unicodes. My current methodology breaks up every unicode combination into individual words of one unicode. I could redo the process grouping everything into bigrams, but it's not even guaranteed to solve this problem. It's a tricky one that might be best solved with more manual work. The ungraphed emojis in the graphs above are mainly Hong Kong/USA/UK/England flags as well as the skin-tone signifier emoji. There is still some Chinese text that is left encoded - while I've worked with Chinese text before, for some reason I had trouble getting it to display in this file.
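A short Python sketch makes these multi-code-point emojis visible. The helper names are made up; the code points themselves are the standard ones for a thumbs-up, a skin-tone modifier, the Hong Kong flag and the England flag.

```python
import unicodedata

def code_points(text):
    """Return the Unicode code points of a string as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]

def code_point_names(text):
    """Look up the official character name of each code point, if known."""
    return [unicodedata.name(ch, "UNKNOWN") for ch in text]

thumbs_up = "\U0001F44D"
thumbs_up_medium = "\U0001F44D\U0001F3FD"  # base emoji + skin-tone modifier
hong_kong_flag = "\U0001F1ED\U0001F1F0"    # regional indicators H + K
england_flag = (
    "\U0001F3F4"                           # black flag
    "\U000E0067\U000E0062\U000E0065\U000E006E\U000E0067"  # tag letters g b e n g
    "\U000E007F"                           # cancel tag
)

for emoji in (thumbs_up, thumbs_up_medium, hong_kong_flag, england_flag):
    print(len(emoji), code_points(emoji))
```

Splitting text into single code points, as the current methodology does, separates the thumbs-up from its skin tone and the two halves of a flag, which is why those pieces show up as stray unicode "words".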

I get the impression that many people find big data and data science very abstract and impersonal. The algorithms crunching massive data behind your targeted ads don't exactly inspire congeniality. But these techniques can be applied to anything, including more personal data. I've already written posts looking at my Facebook data and my travel locations, where data visualization really helped me understand my own past better. Going through this particular dataset was especially fun - I was constantly reminded of hilarious exchanges from years ago with friends on the opposite side of the globe. Does this analysis add any business value? Nope, but I spend plenty of time doing analysis for data that will add value, and sometimes it's fun to just see exactly how many times Tuan drunkenly messaged the group.

P.S. If I can do this, Whatsapp (Facebook) is also probably doing this with your data.

Monday, February 20, 2017

Urban Clustering

When you see a picture of a city, does instinct immediately bring you to guess where it was taken? It’s an urge I can’t quite suppress. I find that even if I can’t recognize the city, I can almost always still guess the continent. Even without signpost textual giveaways or inhabitants’ facial features, many subtle urban features can clue you in. The roof architecture, street food, make of the buildings, road paving - these all help distinguish the continental origins of a city. Though there is incredible urban diversity between countries within continents, it still seemed to me that cities of one continent had more in common with each other than they did with cities of other continents. 


I was staring at an aerial photo of Dhaka when I decided I wanted to numerically prove this hypothesis. Surely there could be some city metrics that featured more variation between continents than within them - the old ANOVA test from classical statistics. And so I set about gathering as much data as I could.
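For reference, the one-way ANOVA idea boils down to a single ratio: variation between group means divided by variation within groups. A minimal Python sketch, with made-up metric values standing in for three continents:

```python
def one_way_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [x for group in groups for x in group]
    grand_mean = sum(values) / len(values)
    k, n = len(groups), len(values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of each value around its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up values of one metric for cities on three continents.
toy = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
f_stat = one_way_f(toy)
```

A large F value means the continents differ from each other far more than cities differ within a continent, which is the pattern the hypothesis predicts.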

The headaches started immediately. This project almost certainly couldn’t have been possible at any scale 5 years ago. It will certainly get easier in 5 years. There isn’t really any comprehensive worldwide cities database. There are no standard definitions, and though efforts are being made to correct this (including a proposal for an ISO standard for cities), nothing has been widely adopted. Even basic definitions of area and population are frustratingly inconsistent, with significant discrepancies of where to draw borders. Here, though, the discrepancies are also continental in nature - American cities go by strict legal districting, whereas Asian cities often redefine their borders to match their urban sprawl. I had begun with dreams of finding creative metrics such as the average sidewalk width or the % of restaurants open after midnight, but soon realized I’d have to settle for what I could find.

Accepting that this would not be an exact science, I began with a base dataset from the World Cities Culture Forum (WCCF), who collected such interesting metrics as Art Exhibits daily visits and Rare & Secondhand Bookshops for 25 core cities. Their data was not without flaws (there was clearly a lack of consistent methodology), resulting in some questionable figures (Berlin has 4 rare bookshops and Johannesburg has 943?). I would fact check strange results and often manually make changes after vetting the data collection methods. It might seem like modern society is swimming in big data, but estimates for international tourists in Hong Kong ranged from 27 million to 60 million because the government doesn’t have a method for identifying what a tourist is.

Ultimately I created a dataset of 42 cities (10 from Asia, 19 from Europe, 6 from North America, 7 from elsewhere) with 24 metrics. These included Number of Concert Halls and Median Weekly Earnings from the WCCF supplemented with data I could find on metro systems (length of rail, annual ridership and % usage), number of Starbucks, CO2 emissions, number of airport runways and the number of international firms (calculated by McKinsey). There was plenty of missing data, which I imputed with the metric mean. Then, on every possible combination of 6 metrics, I ran a K-means algorithm. I then analyzed the resulting clusters and found the combination that best matched reality, putting over 70% of the cities together with other cities from the same continent. There were a few “best combinations” and the one I’ve chosen to display in this application is Foreign Born %, Number of Cinemas (per capita), Metro Usage, Number of Restaurants (per capita), Working Age Population (as a % of total population) and Number of International Service Firms (per capita). Some takeaways from this combination:
  • American cities have by far the highest foreign born %, followed by European cities. Most cities in Asia, Africa and South America have close to 0 foreign born %. Singapore, being an exception, was actually clustered with the North American cities by the algorithm.
  • European cities have a lot more cinema screens per capita than other cities.
  • Asian cities have way more restaurants per capita (although this statistic is hard to measure)
  • Asian cities also have a large % of working age population, with American cities at the other extreme. To be honest, this one doesn’t quite make sense. I do think you see more elderly working in Asia - and when it’s a little octogenarian pushing trash uphill, in heartbreaking public ways - but I don’t think that’s captured in the accounting methods here. More likely we have vastly different population denominators between methodologies.
  • Predictably, most international service firms are European or American and thus cities from those continents have much higher firms per capita. This statistic is pretty biased but I think it might have some effect on how a city looks and feels, as a proxy for how many familiar logos one sees.
  • It wasn’t clear to me what to do with the cities outside these 3 continents in my database, including 3 South American cities, Istanbul, Mumbai, Johannesburg and 2 Australian cities. I labeled them all as Other, but the algorithm clustered the Australian cities with the Europeans, which meets the eye test. 
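The search over metric combinations can be sketched in plain Python. This is a toy illustration under assumptions, not the project's code: the function names, the evenly spaced starting centers (chosen so the result is reproducible), the "purity" score and the six made-up cities are all hypothetical, and the metrics are left unscaled for brevity. With 24 metrics taken 6 at a time, the real search covers 134,596 combinations.

```python
import itertools
from collections import Counter

def impute_mean(column):
    """Replace None with the mean of the observed values in one metric column."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm with evenly spaced starting centers (an assumption)."""
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

def purity(labels, continents):
    """Share of cities that belong to their cluster's majority continent."""
    hits = 0
    for c in set(labels):
        members = [cont for label, cont in zip(labels, continents) if label == c]
        hits += Counter(members).most_common(1)[0][1]
    return hits / len(labels)

def best_combination(metrics, continents, size, k):
    """Try every combination of `size` metrics and keep the purest clustering."""
    columns = {name: impute_mean(col) for name, col in metrics.items()}
    best = None
    for combo in itertools.combinations(sorted(columns), size):
        points = [list(row) for row in zip(*(columns[name] for name in combo))]
        score = purity(kmeans(points, k), continents)
        if best is None or score > best[0]:
            best = (score, combo)
    return best

# Made-up data: three "NA" cities, three "Asia" cities, one missing value.
continents = ["NA", "NA", "NA", "Asia", "Asia", "Asia"]
metrics = {
    "foreign_born": [40.0, 35.0, None, 2.0, 1.0, 3.0],
    "restaurants": [1.0, 2.0, 1.5, 9.0, 8.0, 10.0],
    "metro_usage": [5.0, 1.0, 3.0, 5.0, 1.0, 3.0],
}
score, combo = best_combination(metrics, continents, size=2, k=2)
```

In this toy data the mean-imputed foreign-born value drags one city toward the wrong cluster, so combinations that include that metric score lower. Mean imputation can have the same kind of side effect on real data.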
The app has an interactive map (built in Leaflet) with all the 42 cities plotted and colored by their cluster. The color legend labels each color by the continent most associated with the cluster - this results in cities with a label not matching their true continent. Rome is colored as an Asian city - that just means it is closer to the Asian cluster than any other, even though I am fully aware that Rome is in Europe. No geographic information is included in the clustering algorithm.

The app lets you click on a city to have its data pop up. You can also see 6 tabs on the left which show density plots for each metric, split by the clusters. The idea is that the density plots will look rather distinct for each cluster. An orange line then shows where the metric of that exact city fits into the density plot. For cities missing data for a given metric, no orange line is shown.

There are plenty of flaws in my methodology and data, but ones that can be improved over time with more and better data. I believe there are many metrics that can reveal interesting urban planning or sociological distinctions between the continents - essentially the data that I saw helped confirm my thesis to me. Understanding the underlying reasons behind these distinctions can drive interdisciplinary conversations.

Personally this was also an important project for me. It was the major impetus driving my data science training, giving me a goal to work towards that necessitated me learning about data merging, language encoding, data standardization, clustering/classification algorithms and web application development. I even talked about the project in my final interview with GE.


The project is hosted on the free shinyapps.io server at https://cal65.shinyapps.io/Cities/ . This minimally viable approach is slow and won’t work when my home laptop is turned off (!). For users of R, a better user experience is available by downloading the Shiny package and running runGitHub("Cities", "cal65") . All my (sloppy) code is up on Github, and I’m happy to collaborate with people to improve this project. Shoutout to Ivan Peng for helping me on the project and teaching me how to set up a database in Python!

Tuesday, December 27, 2016

Asian Calmination - Thailand

Thailand
I was pretty zonked out boarding the bus at Siem Reap at 6am, and almost got lost in the border crossing no man's land, but 10 hours later recognizable sights and sounds of downtown Bangkok refreshed me. This was the first "not new" stop on my trip. I was far from familiar with Bangkok, having spent most of my time there in various Sukhumvit Sois (side streets), usually inebriated. Sukhumvit is a huge road and a life artery for the wilder expat scene, but the enormous city has far more to offer. I alighted near the railway station, where backpackers and motorcyclists flooded my path. I elbowed my way into a 7-Eleven to purchase a phone card. A pay-as-you-go phone plan was cheap in Vietnam and Cambodia, but for whatever reason I hadn't felt like getting one - a foreign trip just feels different when connected to the interwebs. However I was planning on being in Thailand for quite some time, and staying with a friend, so a phone seemed necessary. Service in the Land of Smiles is usually pleasant, but the 7-Eleven outside Bangkok Railway Station broke such stereotypes. The clerk went to activate my phone card, left me hanging for 15 minutes, then came back to simply say, "No." The language barrier surely inhibited a more detailed explanation, but this guy's lack of effort was infuriating. In one of the few shoplifting efforts of my life, I pocketed the card (and later succeeded in activating it), and snuck out of the store.

"Cal!" Imagine my pounding heart - someone had caught me.  Did I give the store clerk my name? I looked up to see one of the many backpackers on the sidewalk staring directly at me, a white guy with impressive facial hair. "Do you remember me? It's Mark Waterman!" I quickly pieced together my memories from another life of a college ultimate teammate, and had one of those "it's a small world" conversations. In fact I was in town for an ultimate hat tournament, but in a twist of small world irony, Waterman was not in town for that tournament.

I stayed at fellow ultimate player Asha's apartment in Silom. Dinner that night consisted of delicious street food with other visiting players from Islamabad, Pakistan of all places. The Bangkok Hat is one of my favorite tournaments because of the players it attracts. Its central location regularly allows as internationally diverse a participation group, from the UAE to Japan, as any hat tournament in the world. I captained a team with a large Singaporean core, supplemented by local Thai players and expats living in China. I think we finished 2nd to last, but no one remembers that. 

Getting to explore the rest of Bangkok was a wonderful boon. You can get around the city pretty painlessly - outside of rush hour. The BTS is actually quite convenient when not jam-packed, and if you have the luxury to live near it. At Arup, I had worked on a luxury mall in the city called Icon Siam that was under construction, and savored the chance to check out an overseas project for the first time. The site is on the western bank of Bangkok's main river, the Chao Phraya, and I approached from the east bank, where the Shangri-La and the Mandarin Oriental landmarked an upscale neighborhood. Water transport used to be Bangkok's main mode of transport, and though times have changed, a raft ferry across the river cost just a few cents. The neighborhood around the western pier was drastically different, populated with dense apartments and tiny convenience stores navigated via narrow dirt roads. The construction site itself was closed off with large banners blocking a sneak peek, and so I made my way down an adjacent narrow road. To my surprise, the path took me to what could be described as a shanty town - a bunch of tin shacks and some abandoned wooden houses on stilts. People were living underneath the stilts, as shown by the clotheslines and hammocks in use.
Unbeknownst to me, the multibillion-dollar luxury mall where I had done advanced daylighting simulations was next to this squatter settlement the entire time. Did my project displace many poor Thai residents? What sort of massive gentrification had I partaken in? Would the Icon Siam eventually help the lives of the people on the wrong side of the Chao Phraya?

From the train ride over, I glimpsed a crazy looking building - Google eventually told me the story of the eerie abandoned Sathorn Unique Tower, 68 haunting floors of bankruptcy. Apparently bribing the security guards and taking the stairs to the top is a thing - alas minor shoplifting was enough lawbreaking for me. I also managed to visit the equally unique Jim Thompson House, a combination of traditional Thai houses lifted from remote villages by Jim Thompson, the American architect turned WWII spy who revitalized the Thai silk industry before mysteriously disappearing in Malaysia.

I hadn't made solid plans post-Bangkok, and realized too late that I may have been overstaying my welcome. To the south of Thailand were some of the loveliest beaches in the world, but I'm not much of a beach guy and I'd missed out on a bachelor party in Chiang Mai, so I looked northwards. Unbeknownst to me, that very weekend was a long weekend on account of the Buddha's birthday, and buses and trains to Chiang Mai were packed. I cover this ordeal in a separate post, where I explain how I eventually ended up in Phitsanulok, the city of 84,000 halfway between Bangkok and Chiang Mai and almost died on the back of a motorcycle on a highway. I left out another harrowing experience on that trip. Once I returned from the Sukhothai ruins back to Phitsanulok, I still wasn't home. First, I explored a sprawling night market that swallowed up two Wats, which was awesome - so many delicious options. 


Then I found the Thai address of my hotel, which was really a motel 20 minutes outside the city, and showed it to a motorcyclist who was very eager to give me a ride. He looked at my address, then started driving - all the way to the first stoplight, when he asked me where to go. I was like I don't know! Here's the address again! He huffed and took a right, then in the middle of the street flagged down a driver and asked her to look at the address. I couldn't believe it. The woman read my address several times, then had a far lengthier conversation with my driver than I felt comfortable with - a simple "it's that weird Days Inn ripoff right off the highway" should have sufficed - and finally the driver seemed to know where to go. I relaxed and leaned back,  or as far as I could on a motorbike. After a few intersections, I took my phone out again and checked our progress, and was stunned to find that we were going the opposite direction! I shouted stop to my driver and jumped off. As I was showing him the address again, furious and confused, I finally realized that perhaps he couldn't read. This had been such a rarity everywhere I'd been - China has a 90% literacy rate now - but I'm guessing the illiteracy rate for motorbike drivers in Phitsanulok is not insignificant. I spent another 20 minutes in the area unable to find a taxi or motorbike before finally flagging down a tiny clown car with my battery at 10% life. The car zoomed along at 25 mph and by the time I got home I was exhausted. But hey, at least I made it to those ruins.

The next morning, my bus to Chiang Mai broke down. Really not a great transportation week for me. We were waylaid for a bit over an hour and then crammed onto another bus for the rest of the journey - luckily only another hour. Behind Bangkok and on par with Koh Samui, Chiang Mai is among the Thai places best known to westerners. For tourists, it's an old city with a major airport and access to many activities. There are elephant sanctuaries, zipline courses, hikes, night markets, ancient temples and massages galore. Though a city with fewer than a million inhabitants, Chiang Mai is also home to an unusually large number of foreigners, many working unusual jobs. The city teemed with cafes and bars run and frequented by white people. I struggle to find a similar city in Asia - Bali is the best I can come up with.

As a city, I found Chiang Mai...ok. The centre is constrained by a moat and well-preserved city wall, and there aren't a lot of those in the world. Many of the cafes and bars are objectively charming. I guess I found Chiang Mai to be too much of a tweener place. It was too busy and industrial to be quaint, but not nearly built up enough to be productive or have a skyline. It was too touristy to be culturally interesting, but not centralized in its activities, resulting in my walking around constantly feeling like the cool kids were partying elsewhere. Perhaps I was doing it wrong - the most appealing aspects of a Chiang Mai vacation involve getting outside the city. As such, the city itself can feel lazy and boring. 

Among the outdoor activities I partook in was hanging with elephants. I had only recently learned how cruel training elephants to be ridden was, and now activities like what I did, bathing and walking with rescued elephants, were in vogue. The Elephant Jungle Sanctuary tour I signed up for was located 90 minutes outside Chiang Mai deep in mountainous jungle, and had four elephants. I think all four had been rescued from other, presumably less moral, elephant tours. Seeing the elephants walking through the jungle was pretty surreal, thinking about how heavy they were but how quietly they were stepping.

My other Chiang Mai activities included going to a bizarre outdoor music festival in a hot air balloon field, with a handful of imported American freestyle rappers, and playing pickup with the (relatively) large ultimate community. There I met Jazi, the mysterious Israeli-born handler with a huge beard. When Jazi heard I was a math major, he actively tried to recruit me. He never got too detailed with the type of work I'd do, but it was basically data analysis for his online gambling site. I didn't find that particularly interesting, but I found him fascinating. Jazi was part of the sizable digital nomad crowd that had settled in Chiang Mai, and had made enough money from the site that he didn't need to work a ton. He spoke with a thick accent but with a learned vocabulary, for he had learned English in his adulthood for the express purpose of understanding an academic computer science paper (which I still can't understand). His English was aided by having been raised bilingually in Hebrew and Yiddish, a Germanic language. He didn't seem to have had a formal university education, but instead got intense programming training in the military.

After 3 days I boarded a bus to the legendary backpacker town of Pai. Unlike Chiang Mai, which I think is trying to be many things, Pai knows what it is. It's only got 6,000 or so residents, but another 600 or so tourists, and about 60 ways to make a pun on its name. With streets full of souvenir shops, signs in English and four 7-Elevens, no one will mistake it for an authentic Thai village. However, its geographic remoteness helps preserve a chill atmosphere and keep it from getting overrun with tourists. Some of the bar owners I spoke to were Bangkok transplants seeking a quieter place to ply their trade.

I stayed at an inn complex with an upside down house in front of it. It proved to be a popular photo spot for tourists who were consistently mainland Chinese. Most of the Chinese tourists traveled in groups to a variety of sites, including a strawberry field outside of town that wasn't even on my English map. The tourists explained to me that there was this love movie popular in China set in Pai, which started a buzz for this spot. Funny how one movie can tap into a market of a billion people.



2016 in Recap

Some years are just more eventful than others. 2016 was a year where a hell of a lot happened, both in the world and in my life. My bulletpoint summary of this year is about as full as I could imagine. In order:
  • I left my job at Arup in Hong Kong after 4 years
  • I took on consultancy work with a HK energy efficiency startup and helped them win a major project
  • I flew off to Vietnam and lived the spontaneous backpacking life for 3 months
  • Continued through Cambodia, Thailand, Laos, Malaysia, Singapore, Xinjiang, Beijing and finally Shenzhen to Hong Kong
  • With a tournament and going away party, said farewell to my friends and family in Hong Kong
  • Came back to America and resettled in Boston
  • Learned how to use Python by finishing a webscraper project I'd started years before
  • Mastered ggplot and R data visualization techniques
  • Went to my 10 year high school reunion
  • Watched all of Game of Thrones
  • Went to my cousin Parissa's wedding
  • Underwent LASIK
  • Went to Toronto and saw my cousin, having missed his wedding years before, and his wife and baby girl
  • Saw my mom retire after an incredible career
  • Met local politician and role model Michelle Wu at a fundraiser
  • Got hired by GE as a data analyst. Started work 5 years to the day of my start date at Arup
  • After growing out facial hair for many months and focusing more on upper body exercises than ever before, dressed up as Cal Drogo for Halloween in NYC
  • Became an uncle to a girl named Audrey
However, life is not a series of bulletpoints. Many of my darkest hours occurred interspersed between those highlights. Leaving my job in Hong Kong without anything else lined up was not an easy decision, and it created serious consequences. There were long stretches where I wondered whether I had made terrible mistakes in pursuit of some of these highlights. There were months where my transition to a different country and different industry was rife with disappointment, and I had little salvation on the horizon. For so long, I floundered about in the unknown, with no confidence that I was on the right track. 

I'll start with my decision to leave Arup in Hong Kong. Or maybe I'll start with a Steve Jobs quote, from his much ballyhooed commencement speech at Stanford. Among his many great quotes was this one:
"I have looked in the mirror every morning and asked myself: 'If today were the last day of my life, would I want to do what I am about to do today?' And whenever the answer has been 'No' for too many days in a row, I know I need to change something."

That hit me one Monday in early December 2015. The weekend had been a life highlight - a successful hosting of the Asia Oceanic Ultimate Championships, a massive 300 person tournament in which I had a small, but still stressful, role as an organizer and competitor. The tournament was more thrilling than I could have possibly imagined, validating this little hobby that I've pursued for over a decade. Compliments from multinational strangers made it feel like my hard work and passion had paid off.

That next morning I returned to work and nothing had changed. I had a ton of awful boring tasks to complete and hand off to unappreciative coworkers. For years I had balanced my work and ultimate life and dealt with the incongruity between the two, but this morning, I felt like I might be wasting the last day of my life. After two weeks consulting friends and family and receiving a mixture of encouragement and warnings, I submitted my resignation letter. I had only partially convinced myself - I almost withdrew the letter that afternoon and made moves towards an internal transfer to London. But ultimately I quelled my self-doubts and heeded the Cantonese idiom "delay no more," and one month later I thanked my colleagues and handed in my badge.

At that point I didn't have a clear plan - I hadn't 100% committed to returning stateside. All I knew was that I had long held aspirations to travel in bulk during one of those rare life intervals where this is possible. Packing up 4 years of life was hard, but picking up and backpacking was the easiest part of this whole year. Work life had bottled up so many goals and now the open facing road provided every opportunity to let me pursue them. Seeing new places was hugely important to me - it's added so much to my life - but I was equally excited to pursue a bunch of nerdy projects. Among the possessions I stuffed into my backpack were research papers I printed out with my Arup ScienceDirect account and a book on web app development.
From 4000m up on the Karakoram Highway


I've got other posts to recap that trip and describe how great it was for me. I didn't set off intending to find myself - I went in with life goals already well-defined - but I ended up learning a lot about myself. Spending that many hours on end alone with your thoughts, with long bus rides to read and write and think, was a pretty fantastic reset button. This travel experience was fairly different from my previous weekends gallivanting around Asia - I spent a lot less time in cities and a lot more time socializing with fellow travelers and locals, and I revelled in the freedom from itineraries. The trip significantly influenced many of my beliefs, including those on globalization and ethnic identity, and refined my thoughts on how to travel.

Most importantly though, the trip recentered my approach to life. I think it was during a glorious week in Penang, after the wilder chunk of my backpacking, a week I intentionally set up to pursue some of my deep-seated goals. I love working on my laptop in coffeeshops and I had always wanted to uproot and plop down in a coffeeshop in some faraway city. Penang was the perfect spot, as I'll explain in another post. That week I was focused - I awoke at 7am to run in the equatorial heat, finished this post, and finally figured out how to plot points in shapefiles. As I sipped Tiger beer in celebration, the loneliness really struck me. This trip had been all about me, myself and I. And when I thought about it, so had my years in Hong Kong.

When I left Hong Kong, neither my self-indulgences nor my self-improvement would remain behind. The only real imprints I'd leave were those I'd left on people. I hadn't been entirely self-absorbed in my time there, but for one reason or another I hadn't made that many truly great friends. I hadn't realized it then, but there were many subtle changes I could have made to my priorities and interactions with people that would have made a great difference. It took me three months of solo travel to realize that relationships are everything.

The relationships that I had made were instrumental in getting me through the 2nd and 3rd quarters of my year. I came home and thought I'd get a job pretty quickly. I had two interviews already lined up with building energy/data science roles. Neither of those interviews panned out - in one case the recruiter completely ghosted, not returning any emails after we'd already exchanged 10 and agreed to set up a meeting in Boston. The other involved me flying on my own dime to DC, and a bizarre rambling rejection several weeks later that a friend described as "word vomit." I was surprised and disappointed - I had already talked myself into that unique role, but had some relief that I could now focus entirely on tech roles which I preferred. After my time at a large corporation, I wanted to try the tech startup world.

Turns out that changing industries was hard. I had the right educational background, but with no experience in the roles I was applying for, I couldn't hit the ground running on any job. I pored through dozens of job descriptions every day and felt overwhelmed by the number of words I didn't know. Scala, MongoDB, PostgreSQL, Docker, the list went on and on. I had just barely learned Python - now I saw that there were so many different interpretations of data scientist, and what I was able to do was not enough. In Hong Kong I rarely had any deep data science discussions with anyone - my ability to use R was an extreme novelty within my firm. In Boston, I couldn't throw a stick without hitting a couple developers, and all those great universities churn hundreds more out every year. These kids were being taught those skills that I was now trying to learn. Additionally, I came in with a few further disadvantages. Many American companies, and startups especially, do not give great respect to international experience. I found my language skills very rarely valued, or even noted - at best, it was an interesting tidbit that I happened to have spent hundreds of hours on. Even worse, my firm Arup, though well known in many parts of the world, was met with clueless stares by those in the American tech sector. An older female HR recruiter for GE, in response to my work experience in sustainability, condescendingly pooh-poohed my "green living" phase and said that a lot could change after my "first job." When I asked her what she meant by first job, she mispronounced my previous employer's name (Ovid Arp?) and asked whether it was an international or big firm. Not that it should really matter, but Arup has 10,000+ employees and is headquartered in London. Luckily her opinion wasn't the important one, and other interviewers at this global firm saw the importance of my international experience.

What I had going for me were real projects that I enjoyed. I had good ideas that pushed me to learn all sorts of new technical skills and I loved it. They gave me purpose, and if there's anything you need when you're unemployed, it's purpose. Primarily playing around with my theory that city metrics could be found that would cluster together among the continents, I dove into a project that taught me formatting regular expressions, map visualizations, encoding formats, clustering algorithms, dataset merging, database design and creating interactive maps. These were techniques I might have been able to learn in a course or something, but with a project in mind, I got to learn them on my terms (and for free).
Ogawa Coffee, one of the coffeeshops I spent time at

A negative turning point came in mid-July. I had been back for almost 3 months already and gotten the lay of the land, knew some startups I wanted to work for, and networked my way into an interview with a startup that I really liked. It seemed like a great fit - they were hiring for an entry level data analyst position, I ticked off just about all the job requirements, the company did great socially impactful global work (including in China), was financially stable and even shared the same name as one of my good friends. I passed their phone screening, then put in a solid 10 hours completing their take-home assignment (recommended 3 hours). I was brought in for back-to-back-to-back interviews and left feeling like this would be an incredible place to start my new career.

I was devastated when I didn't get that job. They had liked me, but not enough to compensate for my lack of experience. I had nothing else on the horizon, no other interviews in the pipeline. That week, it was so hard for me to move on from that rejection back to studying clustering. It was hopeless looking through job ads posted by marketing companies, e-mailing my resume to anonymous black hole addresses, googling all these biotech terms that I didn't know. I'd been ready to work that very Monday and now I didn't know when I'd work again - it certainly wouldn't be soon. That indefiniteness is rough practically and emotionally - it disrupts one's ability to make plans, and one's ability to sleep at night.

During the intervening months, I didn't get many interviews. Half the time I didn't even get a substantive reply. (As an aside, I think the way job seekers and employers use online recruiting has not converged, and hopefully this is a field that will keep getting more efficient.) In those days, I felt very aggrieved and sanctimonious. How could all these companies not recognize my talent? Did they not know how good a writer I was? Did they not care about my nuanced thoughts on socialist reactive movements to colonialism? Were they aware that I had goals far beyond the scope of their little enterprise software? Each individual rejection or non-response I could accept in stride, but collectively, I felt like my entire career and life path was being rebuked.

After identifying a startup in travel that I liked, and finding a friend who knew someone at the company, and again passing through a phone screening and a 10 hour take-home, and getting rejected again, I was in pretty deep despair. In these times, great friends were there to remind me that 6 months was not that long. That I had so much going for me, that in fact these companies did suck, and that it would be much better on the other side. With so much time on my hands and so few people around, I texted all the time. I had time zones down pat and knew how many hours I had in the morning to get responses from people in Asia, then I had my friends in Europe until 5pm or so, and then I'd talk to the west coast until Asia woke up again. You know who you are, but I will still shout you out: Ben Goldsmith, Joan Xu, Jackie Fan, Diana Pang, Kat Tse, Michele Mak, Ria Sunga, Charlotte Poon, Seems Tsang, Hyun Park, Asha Sharma, Lesley Sim, Jen Thomas, Hannah Lincoln, Andrea Phua, Che Bello, Maggie Lonergan, Janice Shon, and my Boston-based friends Henry Fingerhut, Glen Cornell, Alison Shin, Sam Malin and Joe Nasser. To give people an idea of what these folks dealt with, these are true back to back text messages I sent to Jackie Fan: 1. "How do I deal with the existential dread today?" 2. "Omg I got the job with GE"

Now busy and employed and settled into a Cambridge apartment twice the size of my Hong Kong apartments, I feel genuinely grateful. I'm lucky that GE moved into Boston this year and in their tumultuous move, they weren't able to interview too many candidates before needing to respond to me. I'm grateful that I pulled the trigger on this career switch now - in 3 years, the average college graduate would likely be too good. I'm lucky that the major I studied happens to be employable now - this certainly was not something my 18-year-old self foresaw. I'm extremely grateful that my parents let me live with them rent-free and in a city with a thriving tech scene (what if I had been from Buffalo?). Even within the context of my own life, this was a particularly privileged period.

To be fair to myself, I put in a lot of work. I proved to myself that without structure, without guidance, I could still work productively. At GE I recently had to solve a similar geospatial visualization problem that I had tried to solve in March. What had previously taken me a week I now typed out in a matter of minutes. I had gotten in so many data science reps that I had forgotten how complicated some of these problems were that now felt old hat. Like a tennis player whose racket is an extension of his arm, I became attached to my R-running laptop and approached new datasets like returning groundstrokes from different angles.

So it ended up being a good year for me. However I'm not sure what lessons I'd glean from it, or what I'd recommend to my friends. I'd happily extol the virtues of taking a long solo trip, giving your mind space to clear out and visiting the less accessible parts of the world. I'll go on and on about the importance of living abroad. These experiences are fundamental to forming my worldview and basic personality, and I wish that everyone could share them. However, I do not wish everyone to share the months of rejection and despair. Without the guarantee of a happy ending, that is not a fate I'd wish on just anyone. So take my journey for what it is and draw from it what you will - I won't preach about quitting your job and traveling or whatnot. What I will preach about is the importance of relationships. I sincerely resolve to put friends and family on the top of my priority list. To all the people who assisted me this year - thank you. To all the people whom I wasn't a good enough friend to - I am so sorry. I regret the happiness that we could have shared, the lessons we could have learned from each other.

To friendships in 2017 and beyond -

Sunday, November 6, 2016

All Sides of the Border

There’s no doubt that immigration is one of the main issues in the 2016 US Presidential Election, if not the main issue. The Trump campaign kicked off with the unusual idea of building a wall on the Mexican border, and the resulting dumpster fire has routinely dehumanized immigrants and refugees. Immigration is also a divisive political topic throughout Europe, especially the Brexit-ing UK, and as far as Singapore. I won't rehash all the notable anti-immigration rhetoric, but I'll just leave here this gem from Fox News displaying their nuanced understanding of Chinese immigrants.

There’s certainly no lack of pro-immigration champions. We have all sorts of arguments for taking in immigrants: they add value to our economy and actually create jobs, they bring in new ideas and cultures, America has always been a country of immigrants, it's the compassionate option. If you want a collection of pro-immigration arguments and stories, look no further than Define American created by Jose Antonio Vargas. However, I find some of these arguments fundamentally flawed and not unified, inadequately voicing a cohesive basic reason for supporting immigration.

I am not writing a policy piece – in fact I really don’t intend this to be political. I just want to lend an international perspective to reframe a dialogue that frankly nauseates me.

As a child of immigrants to America, I have seen much of the American immigrant experience. As an American who has spent 5 years of adulthood living outside the country, I have also seen much of the American emigration experience, which I call (controversially) the expat experience. This piece explores the staggering differences between those two experiences.

In western countries, assimilation is the go-to word in expected immigrant behavior. Assimilation is about adapting your language, choices and activities to fit those of the people around you – you know, changing everything about you. Immigrant children are expected to go to local schools. Eyebrows can be raised when immigrants gather for group cultural activities, whether it's prayer at a mosque, a Chinese lion dance or a cricket game. There is no shortage of stories of Americans feeling uncomfortable in the presence of people speaking other languages, even starting confrontations. The phrase “go back to your country” has likely been uttered angrily to an immigrant, or maybe just a visible minority, a dozen times since you began reading this.

I wish there was good data on expatriates and their language skills, but I am not aware of such data. In lieu, I have my personal anecdotes from travels. I’ve met lots of impressive multilinguals, particularly in Beijing and Tokyo, but I’d argue that in no major city do a majority of expats successfully learn the local language. The percentages get a lot higher outside the major cities, but even in the smallest villages, I’ve met expats piss poor in the local language.
Hong Kong is a particularly extreme example - as a former British colony, English is an official language and still dominates in higher education and the professional world. Among places in Asia, only in the United Arab Emirates and Singapore do westerners put less effort into learning local languages. In my 4+ years in Hong Kong, I met two people who learned Cantonese from scratch to a proficient level. And I met a lot of people. The overwhelming majority of foreigners possess a core vocabulary of "hello, left, right, thank you, shrimp dumpling." 

In Hong Kong there is no shortage of 15+ year long expat veterans who cannot converse in Cantonese. There is no shortage of people born and raised in Hong Kong to western parents who could not converse in Cantonese – I’ve met easily 100 people in this demographic and not one was fluent. The Kadoories, one of Hong Kong’s oldest and richest families, no longer speak Cantonese. Western children are expected not to go to local schools, even though Hong Kong’s education system is great. Very often they’re multilingual in French or German or Mandarin, able to communicate to anyone but the people around them. A Swedish coworker who had been in Hong Kong for 7 years without speaking Cantonese spoke unironically of his resentment for Iraqi refugees in Sweden, who lived in enclaves for years without learning Swedish.

The game is pretty rigged for English speakers all over, even in places without direct colonial legacies. There are English announcements in all the subways of Asia, from Tokyo to Bangkok to Changsha, as if there is some UN decree. Nowhere in America are there even Spanish announcements. I literally spoke English every day I spent in Asia and never once did I worry about making people uncomfortable. Often I’ve been that American engaged in loud uproarious English conversation with friends on the public subway, and not once has anyone dared complain or told me to go back to my country. Expats abroad party hard, even in local cultures that don’t, and easily engage in drugs, even when local laws heavily criminalize them. Many expats work for years on tourist visas - not once have I heard an expat referred to derogatorily as an illegal immigrant.

The truth is that becoming an expat is a bestowment of privilege. You are assumed to be an educated professional and granted an amount of freedom to make yourself comfortable. The assumption of a white collar job isn’t necessarily true – there's this white minibus driver in Hong Kong. And experiences may differ by place and ethnicity - many parts of Asia are deeply racist and sexist - but I think most expats will agree that their social status rose after moving abroad. The reverse experience is precisely the opposite. A non-westerner moving to a western country knowingly engages in a stripping of privilege, often profiled as a job stealer or an uneducated migrant, regardless of background.

You might think now that I wrote this to excoriate expat behavior. Not at all. I was an expat, and I took full advantage of my privilege in Hong Kong. In fact I mean to paint the picture of immigrants to the west in a sympathetic light. It’s easy to judge an immigrant for their lack of assimilation, their inexplicable clinging to their old country ways. But until you try, you might have no idea how hard it is to assimilate. How hard it is to learn the local language. How hard it is to leave your culture behind, how greatly you desire to keep doing the activities that have always made you happy. I lived as an American in Hong Kong, where I have direct ancestry, for four years and I wasn’t close to assimilating. I wasn’t even on the path to assimilation – I could have lived there for 40 years and I would not have enjoyed drinking hot water like a local, I would not have watched TVB programs like a local, and I would not have stopped calling in sick on Super Bowl Monday. I think there’s nothing wrong with that. Sure, I wish that more expats in Hong Kong could be more engaged in local affairs, but I don’t see anything fundamentally wrong with a society that has diverse groups of people happily doing their own thing.

So if you’re a citizen of a western country and discussing immigration, please consider the following tenets. Understand the degree of difficulty. Embrace the diversity. Check your fucking privilege. Try to accept immigrants not because they add to the economy, or because you live in a country of immigrants – because this doesn’t excuse discrimination against immigrants who don’t add to the economy or excuse countries without a legacy of immigration. Try to accept immigrants because they are humans, and anyone coming with good intentions should be welcomed. On a global issue like migration, we cannot narrow our focus to how it affects us in our little part of the world. We need to be cognizant of the underlying causes that motivate people to make dangerous and difficult journeys to dangerous and difficult lives in a strange country. We need to address an imbalance where an American college graduate can jump into an upper middle class lifestyle teaching his/her native language in Korea while an Ivorian man with a Master’s degree scrapes by driving taxis in New York. At the end of the day, it really shouldn’t matter where you are born. And yet it matters so, so much. Can we try to push this world in a better direction?