On days when I could stomach a future in biology, I would think of becoming a botanist.
This wasn’t out of any particular love for the subject. Our botany lectures were fine, albeit mind-numbing, and I hated introductory taxonomy as much as anybody. But there’s a material immediacy to biology that has always appealed to me, and botany seemed the best way to sink my hands into the pulsing heart of living things without having to wade through too much bone and blood and viscera.
I wanted to feel, you see, but I’d had enough of the messes that entailed.
Plants have always been easier than people.
When I was small and the first terrifying, incomprehensible fits of anxiety started, I would hide in the terrace of my grandparents’ house. I’d squeeze into a corner — hoping, maybe, that the pebbled granite would dig into my skin deep enough to give the panic somewhere to drain through.
Sooner or later, my grandmother would come to water the plants. Sometimes she took my hands, ran some water over them to clean up the scrapes. Sometimes she didn’t. Either way, at the end of her rounds, she’d pick me up and take me back into the house. Calmness, for me, still starts as shades of purple, like the bougainvilleas I’d watch her tend on the worst afternoons.
In the long summers during high school, when I had to learn and relearn how to be home, I’d bring in some calamansi from the tree outside and watch my grandfather squeeze it over the pancit he’d always have for merienda. He’d give me an extra fork so we could share, and I’d always decline because tiny decisions like that used to feel like the only way to bring my life to heel, but the offer would settle my nerves, anyway. Sometimes we talked; sometimes we didn’t.
When I think of the home we used to share, there’s still that skip and stutter, straight from the extra calamansi I used to roll across the tabletop, the leathery skin of it sticking a little on the glass. The settling, I think, I’ll have to relearn elsewhere.
I don’t know if I can do that here. This place is as far from blood and guts as you can get, but that doesn’t automatically mean solace, does it?
Sometimes I wonder about wandering, how much and how long before it crosses into an unsalvageable rootlessness. If not here, if not in any of the places that came before, then where?
Maybe it’s too early for an answer; maybe it’s too late. In any case, we visited the Gardens for the first time in months today, and it felt good to breathe in flowers again.
The relentless hustle of startup culture, or the crunch of game development, or whatever synonym other tech-adjacent industries like to use for the endless grind of work, all rest on the metaphor of the worker as a single-minded machine.
When I stumbled on this article, then, I had to laugh:
An interesting detail from the study is that this finding only applies to “biologically realistic neuromorphic processors.” That is, most other artificial systems don’t run into this problem. Rest — or some analogue of it — becomes necessary only when systems try to mimic human brain function.
What is it about the way we think? Setting aside considerations of biology and chemistry, why does human thought require sleep?
Rest is strange in the grip of a pandemic.
Most of us are presumably confined to our homes, and on paper that sounds like an occasion to relax. It isn’t, of course: fear and anxiety are constant; some people have to grapple with childcare duties, uncomfortable home or family situations, loss of income, the need to risk their health to survive, the dissolution of boundaries that tends to occur with extended remote work arrangements.
No wonder so many people are having weird dreams or keeping odd hours.
“Has your sleep been messed up too?” a friend asked me recently, when we met up to check on each other’s sanity.
We’re both uncomfortable with rest, it turns out. The circuit breaker (CB) was a time for paranoia and unease, not just over the virus but over what actions might be penalised in the country’s attempt to contain the spread.
Early on, she’d helped friends move out of their flat and belatedly wondered if that counted as a social visit to another household, leading to a whole month of fretting. She had been sleeping at 4 AM. These days — a week or so into Phase 2 of the post-Circuit Breaker period; months (for her) and weeks (for me) into excruciating job situations — she’s been trying to sleep before midnight.
“I slept at 8 AM,” I offered. That was one time, and anxiety is ultimately easier to endure while unconscious, but in any case, it made her feel better.
I saved this screenshot from Tumblr some time ago:
I’ve been reading a lot about the nature of information lately, and one statement in particular (hastily jotted down in a notebook, without citation, sigh) has stayed with me:
“Data is time-sensitive.”
The validity of whatever information we possess erodes over time. Hard drives can fail; phone numbers can fall out of use. Likewise, access can decay: websites shut down; programs get deprecated; file types or formats cease to be supported. Someone given a floppy disk of people’s contact details, for example, would have trouble using that information, even if nobody in that directory ever changed phone numbers.
Every now and then I wonder about how this affects — well, a lot of things, really. Our relationship with tech, for one thing; our ability to remember or learn from prior knowledge, for another. (Anil Dash rightly points out that the dearth of proper documentation and the speed of information erosion lead quite naturally to an ahistorical view of the tech world that can be quite crippling.) On a smaller, pettier level, I think about the bits and pieces of personal history scattered across old OneNote and Evernote files, abandoned email accounts, profiles on social media networks that no longer exist.
Even setting aside the proliferation of streaming services and DRM (digital rights management) measures, I wonder: Do we ever actually own anything digital?
Of course, digital objects don’t have a monopoly on impermanence. In an article for the BBC, Lawrence Norfolk wrote, “It is transitional. Work passes through it on the way to becoming something else.” He wasn’t talking about streaming services, but notebooks: paper, ink, writing accumulated over time. There’s a similar erosion of relevance and validity, online or offline. As for access, well: the notebook Norfolk was describing was lost on a train.
But digital objects are more vulnerable to access decay than physical items, I think. Notebooks can be destroyed or misplaced; that is, it takes some unfortunate event to render them inaccessible to us. But a Flash video, a link out to a different website, an Adobe Photoshop project file can be lost to us even if we never do anything, even if the file simply sits on our desktop or the link stays forever on our blog — simply because the digital world would have moved on, often sooner than we’d expected.
There’s a popular tendency to view technology as an “objective” field, “purer” and somehow more essential for it.
For example, we often hear about the apparent infallibility and efficiency of the digital, especially compared to analog tools and processes. Computers and mobile devices have become common fixtures for some of us, and with that comes the shift from physical labour to knowledge work — “pure thought, pure mind, pure intellect,” as Audrey Watters describes it. Developments like artificial intelligence or data analytics allow more of us to crow about “smarter” devices and “data-driven” decisions.
The implication, usually, is that disembodying work minimises uncontrollable “human error,” and boiling phenomena down to “indisputable” numbers constitutes freedom from fault. Most people who talk about these shifts like to frame them, without question, as progress.
In her talk linked above, Watters quotes Asimov:
“In fact, it is possible to argue,” he adds, “that not only is technological change progressive, but that any change that is progressive involves technology even when it doesn’t seem to.”
But as Watters points out, technology always involves human factors — human labour, human judgments — no matter how much our visions of digital utopia like to pretend otherwise. Technology doesn’t spring forth from nothing. Insisting that it does often erases the inequities at play in technology’s production and usage, the structural wrongs technology doesn’t save us from (and often, in fact, perpetuates).
Anil Dash makes a similar point when he asserts that tech isn’t neutral. I’d like to stretch that further and push back against Asimov a little by noting that tech isn’t inherently good. Novelty and innovation don’t automatically translate to welcome change. Tech carries the values, biases, and failings of its creators — and it can easily 10X these at scale, to borrow from the language of Silicon Valley startup bros. Just look at how Facebook is handling misinformation and data mining on its platform.
Tech (and more specifically, its creators) sidesteps a lot of criticism and responsibility when we let it disavow human elements and pretend to be detached, “objective,” incorruptible. I think a lot about Christopher Schaberg’s discussion of the term “30,000-foot view”, a favourite of startup productivity gurus like Tim Ferriss:
The expression enfolds a double maneuver: It shares a seemingly data-rich, totalizing perspective in an apparent spirit of transparency only to justify the restriction of power, the protection of a reified point of authority. It works this way: “Here’s how things look from 30,000 feet. Can you see? Good, now I am going to make a unilateral decision based on it. There is no room for negotiation, because I have shown you how things look, so you must understand.”
This particular use of data — or of the idea of data — has always bothered me. To a certain extent, yes, data doesn’t lie, and a “data-driven” approach does help weed out some of the personal biases and preconceived notions that would otherwise colour, say, research work. Evidence matters.
But quantitative data often isn’t “pure” in the sense that many people like to believe, nor is it automatically more “reliable” or “trustworthy” than other forms of evidence. Judgments also have to be made about what data to collect and how; what analyses to perform; how to interpret and present any results. Skull measurements were data, for example, and for a long while, many anthropologists used that to prop up racist, imperialist narratives of social evolutionism.
In any case, I’ve been thinking a lot about tech lately — the functions it fulfills, the spaces it occupies in our lives.
Our office intern asked me a strange question this morning:
“Hey Kate, when you don’t like a person, is it obvious?”
My answer must have taken longer than expected, because she hurried to clarify: “Like, if you don’t like someone, does it show on your face? Do you behave differently toward them?”
The easiest way to put it would be, I do, but not in the way the question supposes.
By “behaving differently,” the question implies cold shoulders or eyerolls. What I mean by it, though, is a conscious (though often reluctant) effort to be civil.
It is different, still. When I dislike someone, I’d usually prefer to knee them in the groin. But that’s hardly ever permissible in everyday interactions; when it comes to people we dislike, we’re more likely to be asked to work together than to be allowed to inflict bodily harm. (Most of us have at least one group project we would’ve wanted an exit door for, or a co-worker whose desk should’ve come with an eject button.)
This used to frustrate me to no end. Why can’t I just dislike somebody and be done with it (or, more accurately, with them)? Then I stopped being a teenager, and I realised I didn’t have the energy for endless frustration. And endless it usually would have been, because most people aren’t aware that we dislike them; those who know probably don’t care anyway. The upshot is that active dislike takes time and effort, all for hardly any payoff.
Between outright like and dislike, though, there’s a lot of room for civility. That’s where I try to spend my time these days. This isn’t some form of wisdom or kindness. Instead, consider it an attempt at self-preservation: If I have to work with people I don’t want to spend time with, then I might as well make the experience as painless as possible. This is what’s necessary for us to get things done, so this is as much of my time, effort, and goodwill as I will give you. Or, put differently: This is as little of my life as you will occupy.
Some people might find that cold. Maybe it is; but it’s also efficient, and it spares me the trouble of thinking about difficult people any more than I have to.
Recently, I’ve found myself trying to apply the same mindset to work in general. For example, I’m trying to be more vigilant about my working hours.
I’m one of those people who care too much about what I do: given a task or goal, I can’t stand the idea of doing anything less than great. If this sounds like a humblebrag, it’s not. In practice, this just means that work takes over my life, and I torch my reserves to accomplish even unreasonable tasks. This is neither healthy nor sustainable, but lately I’ve found myself in a setting rife with situations that could feed this tendency.
So: vigilance, which means drawing clear lines that I do my best not to cross. Mentally checking out of work at 6pm. Keeping Slack off my phone. Logging out of the work email on weekends. Muting notifications for the office group chat. Most important of all, though, is making peace with the fact that enforcing these limits will sometimes mean adjusting deadlines, asking for help, saying no.
Five years ago, that would have been horrifying — a circumscription of potential, an admission of inadequacy. Today, I try to remind myself that these limits save me from depletion. There’s still the itch to do well, all the time, but I’ve only got so much of myself to throw around, and not everything is worth it.
Some background: On 12th June, Twitter released new datasets that compiled anonymised data from accounts that seem to be linked to information operations run by the Chinese (PRC), Russian, and Turkish states. These accounts have since been shut down, but Twitter has retained data about the profiles and their tweets. This is part of Twitter’s ongoing compilation of data about “potentially state-backed information operations” on their platform.
Sinha’s analyses looked at behavioural trends in the Chinese account dataset, including the timing of tweets:
1. Tweet timings: The tweets from the CCP accounts almost exclusively tweet during "work hours" by Beijing Time.
89% of the tweets were between 7 AM and 5 PM.
For the control group of 58k tweets from 32 accounts in HK & TW, that number was 37%! pic.twitter.com/hbNEAu8j1m
This piqued my interest, of course. As you can probably tell from this blogchain, I’ve been thinking about social media and its influence on information dissemination and consumption. Sinha ends his thread by pointing out how these behavioural patterns and attributes could be used to create some accessible way to identify / flag fake accounts like these. That’s catnip for nerds in a world of digital disinformation, really (even if we factor out the very relevant fact that social media is destroying public discourse in my home country).
So I went and downloaded a copy of Twitter’s datasets to try poking through the data myself. The better to practice some R programming, too.
Simple Tweet Data Analysis with R
First things first: Twitter’s datasets are about as tidy as you can hope for. The Chinese set contained two main files:
account information, which compiled metadata about each profile (so attributes like user’s reported location, number of followers, etc.)
tweet information, which compiled individual tweet contents as well as metadata (time the tweet was published; reply, retweet, and quote counts; etc.)
There were 23,750 accounts in all, and a total of 348,608 individual tweets.
If you download the datasets, Twitter also provides a handy Read Me file that enumerates all the variables available for each dataset. For these quick probes of the data, I mostly did some simple transformations to isolate the variables I wanted to look at.
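For context, here’s roughly how I set things up before the transformations below. Treat the file names as placeholders for whatever the downloaded archive actually contains; the point is just loading the two CSVs into the tweets_all and accounts_all objects used in the snippets that follow.

## packages used throughout: dplyr/readr/ggplot2 via the tidyverse, lubridate for dates
library(tidyverse)
library(lubridate)

## read in the two CSVs from Twitter's download
## (file names below are placeholders -- substitute the actual file names from the archive)
accounts_all <- read_csv("china_accounts.csv")
tweets_all <- read_csv("china_tweets.csv")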
Examining Tweet Timings
First, I tried to recreate Sinha’s graph of tweet timings. I think the trickiest step here might be remembering to convert time zones, since Twitter provides timings in UTC by default.
## create column for tweet time by hour and store copy in new object
by_hour <- tweets_all %>%
  mutate(chn_hour = with_tz(tweet_time, tzone = "Asia/Shanghai"),
         hour_level = hour(chn_hour))

## check new object
glimpse(by_hour)

## check count of instances by hour
by_hour_sum <- by_hour %>%
  group_by(hour_level) %>%
  summarise(count = n())
From there, it’s a simple matter of visualising the data using ggplot2, with “hour_level” (the hour of day in China’s standardised local time, extracted from chn_hour) as the focal variable:
## line graph version
by_hour_graph_line <- by_hour_sum %>%
  ggplot() +
  geom_line(aes(x = hour_level, y = count)) +
  scale_x_continuous(name = "Hour of Day",
                     limits = c(0, 24),
                     breaks = 0:24) +
  scale_y_continuous(name = "Tweets",
                     breaks = seq(0, 60000, 5000)) +
  labs(title = "PRC Fake Twitter Accounts - Tweets By Hour",
       subtitle = "Tweeting trends correspond with working hours in China",
       caption = "Source: Dataset from Twitter.com") +
  ggthemes::theme_economist()
This gives us the following graph:
I tried to create a bar graph version too, since it might be a better representation of discrete hours (as opposed to the line graph, which links each hour together into a continuous phenomenon).
## bar graph version
by_hour_graph_bar <- by_hour %>%
  ggplot() +
  geom_bar(aes(x = hour_level)) +
  scale_x_continuous(name = "Hour of Day",
                     limits = c(0, 24),
                     breaks = 0:24) +
  scale_y_continuous(name = "Tweets",
                     breaks = seq(0, 60000, 5000)) +
  labs(title = "PRC Fake Twitter Accounts - Tweets By Hour",
       subtitle = "Tweeting trends correspond with working hours in China",
       caption = "Source: Dataset from Twitter.com") +
  ggthemes::theme_economist()
Which gives us this graph:
The findings track with Sinha’s own graph, which he shared in his Twitter thread. Obviously, that’s to be expected, since we were working with the same dataset — but it’s always good to have that quick assurance that your own code was structured correctly and yielded the same results.
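As an extra sanity check, you can also compute the share of tweets that fall within that 7 AM to 5 PM window directly. This is just a rough cut that assumes the window covers hour_level values 7 through 17 inclusive:

## share of tweets posted between 7 AM and 5 PM, Beijing time
## (assumes the window means hour_level values 7 through 17, inclusive)
by_hour %>%
  summarise(share_work_hours = mean(hour_level >= 7 & hour_level <= 17))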
Examining Twitter Profile Age
Sinha didn’t tweet about this, but I figured I might as well check. In the Philippines, just from what I’ve seen in regular social media browsing, troll accounts tend to be fairly new. I wondered if that might be the case for these PRC accounts as well — if, perhaps, that indicated that most accounts used for specific information ops goals are only created shortly before the campaign starts.
First, then, I had to figure out how long each account was active — that is, each account’s “age.”
Twitter’s dataset doesn’t include activity ranges, but it does provide the account creation date for each profile. The Twitter profiles included in the dataset were taken down in May 2020, so I used that as my end date. Then, it was time to calculate ages for each account.
# Grouping accounts by age ####
mark_date <- as.Date("2020-05-01")

by_age <- accounts_all %>%
  mutate(current = mark_date)

## set interval between twitter reporting date and account creation date
by_age <- by_age %>%
  mutate(int = interval(account_creation_date, current))

## find length of interval and assign ranges
by_age <- by_age %>%
  mutate(duration = round(time_length(int, unit = "month"))) %>%
  mutate(range = cut(duration,
                     c(0, 3, 6, 9, 12, Inf),
                     c("0-3 months", "4-6 months", "7-9 months", "10-12 months", "13+ months")))
I figured there would be considerable variation when it came to the number of months each profile was active. To avoid getting a fairly messy graph (just imagine 30+ ticks all over your X-axis), I decided to simplify things further and group accounts according to specified age ranges (there’s a quick cut() illustration after the list below):
0-3 months
4-6 months
7-9 months
10-12 months
13+ months
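To make the bucketing concrete, here’s a quick illustration of how cut() sorts a handful of sample durations (in months) into those ranges. Note that, with these breaks, a duration of exactly 3 months still lands in the “0-3 months” bucket:

## quick check of how cut() buckets a few sample durations (in months)
cut(c(1, 3, 5, 11, 20),
    breaks = c(0, 3, 6, 9, 12, Inf),
    labels = c("0-3 months", "4-6 months", "7-9 months", "10-12 months", "13+ months"))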
Then, it was a matter of graphing the results using ggplot2:
## check count per month age
by_age_sum <- by_age %>%
  group_by(duration) %>%
  summarise(accounts = n())

glimpse(by_age_sum)

## graph count per range level
by_age_graph <- by_age %>%
  ggplot(aes(x = range)) +
  geom_bar(aes(fill = range), show.legend = FALSE) +
  scale_y_continuous(name = "Number of Accounts",
                     breaks = seq(1000, 13000, 1000)) +
  scale_x_discrete(name = "Age") +
  labs(title = "PRC Fake Twitter Accounts by Age",
       subtitle = "Most fake accounts tend to be less than 7 months old",
       caption = "Source: Dataset from Twitter.com") +
  ggthemes::theme_economist()
This gives us the following graph:
The vast majority of these troll accounts appear to have been less than a year old. There are a lot of factors that could affect account age, though: maybe Twitter tends to identify and take down troll accounts before most of them can breach the 6-month mark; maybe accounts get abandoned or deleted after a certain campaign; and so on.
This graph is mostly descriptive; sussing out some kind of explanation for this behaviour will take much more research and analysis. Still, it’s an interesting point to bring to light about these kinds of accounts.
More Information
I tried visualising these accounts as a network, but apparently that was too much work for my lone laptop. R couldn’t even produce a visualisation. 😅
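For what it’s worth, the general shape of what I was attempting looked something like the sketch below. It assumes you’ve already extracted an edge list of mention pairs into a data frame called mention_edges (a hypothetical object; Twitter’s dataset doesn’t hand you one directly), and it uses igraph rather than whatever a better-equipped analyst might reach for:

## hypothetical sketch: build and plot a mention network with igraph
## mention_edges is assumed to be a data frame with columns `from` and `to`,
## one row per (tweeting account, mentioned account) pair
library(igraph)

mention_graph <- graph_from_data_frame(mention_edges, directed = TRUE)

## summarising the network is cheap enough...
summary(degree(mention_graph, mode = "out"))

## ...but computing a layout and drawing tens of thousands of nodes is the part
## that a single laptop tends to choke on
plot(mention_graph, vertex.size = 2, vertex.label = NA, edge.arrow.size = 0.2)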
Other, better analysts have, of course, studied this data and come up with much more sophisticated analyses. Twitter has been working with the Stanford Cyber Policy Center’s Internet Observatory, which has published its findings online. They’ve got a fantastic model of the network, divided according to the topics of the accounts’ tweets, as well as some interesting takeaways about the specific narratives these accounts tried to amplify.
There’s a lot more data to be studied, but if nothing else, this quick look at a couple of Twitter’s datasets highlights the scale and sophistication of the information operations being carried out online. Social media can be a scary place, more so when you consider how its massive reach and influence are essentially unchecked. As Sinha pointed out in his thread, though, studying these information operations could give us a fighting chance against disinformation online.