A Comparison of Attitudes Toward the Series “Emily in Paris” Between French Speakers and English Speakers: Using WordCloud in R
Context
The Netflix series “Emily in Paris” has gone viral over the past few days and quickly became the top-ranked series on Netflix this week. The series follows a young American woman, Emily, as she gets by in Paris as a marketing manager in the fashion industry without speaking French. It shows what life in Paris is supposedly like, but its introduction to Parisian culture relies on many clichés. It is therefore reasonable to assume that the series is controversial and may draw polarized reviews from viewers in different regions.
News outlets in Taiwan have reported that Europeans detest the series. One of them, UDN (聯合新聞網, URL: https://udn.com/news/story/6812/4943121), reported on the high number of views of “Emily in Paris” while noting that the series received a large number of negative reviews. As a student who has studied French for more than 3 years, I think comparing people’s tweets is a good opportunity to compare the cultural differences between the United States and France.
Research question
With the context given above, I’d like to answer the following questions:
1. Is it true that audiences in France really didn’t like the series “Emily in Paris”?
2. If it is true that people in France didn’t like the series, to what extent do they dislike it?
Data and method
I downloaded the data during the week when “Emily in Paris” was the top-ranked series on Netflix. The table below gives the details of the data.
Content|Details
Time|2020/10/20 22:30
Amount of Tweets|4000
Search Terms|#Emilyinparis, #EmilyInParis, #emilyinparis
I downloaded 4000 tweets and created three lists of tweets: one with all the tweets carrying the hashtag #Emilyinparis, one with the tweets written in English, and one with the tweets written in French. I used WordCloud to compare these three lists.
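The split by language can be sketched on a few made-up tweets. This is only an illustration in base R: the data frame `toy_tweets` and the helper `hashtags_for_lang` are hypothetical names, and the example assumes the data has a `text` column and a `lang` column, as in the full script at the end of this post.

```r
# Toy data standing in for the downloaded tweets (made up for illustration)
toy_tweets <- data.frame(
  text = c("Loved it! #EmilyInParis #LilyCollins",
           "Trop de cliches #emilyinparis #Netflix",
           "Binge time #Emilyinparis #netflix"),
  lang = c("en", "fr", "en"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-cased hashtags of all tweets in one language
hashtags_for_lang <- function(tweets, language) {
  # keep only the text of tweets tagged with the requested language
  subset_text <- tweets$text[tweets$lang == language]
  # extract every "#word" pattern from each tweet
  matches <- regmatches(subset_text, gregexpr("#\\w+", subset_text))
  # flatten the list and ignore capitalization
  tolower(unlist(matches))
}

hashtags_en_toy <- hashtags_for_lang(toy_tweets, "en")
hashtags_fr_toy <- hashtags_for_lang(toy_tweets, "fr")
```

Lower-casing matters here because the search terms differ only in capitalization (#Emilyinparis, #EmilyInParis, #emilyinparis) and should be counted as one hashtag.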
Results
I produced three WordClouds. The first shows the hashtags that accompany #Emilyinparis across all tweets. The second shows the hashtags that accompany #Emilyinparis in the tweets written in English, and the third shows those in the tweets written in French. Please note that the three pictures below are the three WordClouds, in the same order as mentioned.
At least from the perspective of which hashtags accompany them, the tweets written in French do not reveal many negative reviews. Most of the hashtags, including “lilycollins” and “thesims4”, are names of actors or titles of other entertainment works. The result of this assignment does not really match the Chinese-language news report mentioned above.
Discussion
I think there are a few things to improve. First, the dataset used in this research is too small. Second, the tweets contain a lot of irrelevant information that should be filtered out to make the WordCloud clearer. Third, it is hard to say that the people who write on Twitter are representative of everyone who watched the series. It is likely that those who hate the series did not feel like posting because they did not want to promote a series they dislike, so the people who post are mostly the ones who like it. In that case, there is a selection bias in the research. Last, the analysis relies only on hashtags, which do not really show the full stance of a post on Twitter. It is therefore hard to see whether a post is for or against the series.
I think the strength of this research is its topic. As mentioned in our course, not many people do this kind of research about entertainment. Most people use R and WordCloud on Twitter to answer questions about politics. I think this assignment shows that comparing Twitter data about entertainment can be just as fascinating.
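One possible cleaning step for the second point is sketched below. It is not what the script at the end of this post does: the script drops the first row of each table by position, while this sketch removes the search hashtag by name and turns counts into shares so that language groups of different sizes can be compared. The vectors `tags_en` and `tags_fr` are made-up examples, and `hashtag_shares` is a hypothetical helper.

```r
# Made-up, already lower-cased hashtag vectors (for illustration only)
tags_en <- c("#emilyinparis", "#netflix", "#lilycollins", "#netflix", "#paris")
tags_fr <- c("#emilyinparis", "#netflix", "#cliche", "#emilyinparis")

# The search term is removed by name rather than by row position
search_terms <- c("#emilyinparis")

# Hypothetical helper: share of each remaining hashtag within its group
hashtag_shares <- function(tags, exclude) {
  kept <- tags[!tags %in% exclude]                 # drop the search hashtag
  counts <- sort(table(kept), decreasing = TRUE)   # most frequent first
  data.frame(hashtag = names(counts),
             share = as.numeric(counts) / length(kept),
             stringsAsFactors = FALSE)
}

shares_en <- hashtag_shares(tags_en, search_terms)
shares_fr <- hashtag_shares(tags_fr, search_terms)
```

Shares do not solve the stance problem described above, since a hashtag still says little about whether a post is for or against the series, but they make the English and French tables easier to compare side by side.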
R Code:
#### diffusion ####
# see for example: https://upload.wikimedia.org/wikipedia/commons/1/11/Diffusion_of_ideas.svg
# additional visualization options
library(rtweet)
library(scales)
library(ggplot2)
# twitter_token is assumed to have been created beforehand (its setup is not shown here)
all_tweets <- search_tweets("#emilyinparis", n = 4000, token = twitter_token)
# visualize when how many unique users started to become active
# We need time and number of unique users
head(all_tweets)
names(all_tweets)
head(all_tweets$lang)
# let's see what languages the users tweeting about the series use:
# first we use our table to df function:
table_df <- function(data_vector) {
  # create the table, create the df and then order
  df <- as.data.frame(table(data_vector), stringsAsFactors = FALSE)
  df <- df[order(df$Freq, decreasing = TRUE), ]
  # return as value the df
  # just call the df or use return
  return(df)
}
lang_table <- table_df(all_tweets$lang)
# we only want the top 20 languages
# head() avoids NA rows if there are fewer than 20 languages
lang_table <- head(lang_table, 20)
# the names of our columns:
colnames(lang_table)
# this time barplots with ggplot2
# we use geom_bar with stat = "identity"
# stat = "identity" because we already have the frequency for the y-axis
ggplot() + geom_bar(data = lang_table, aes(x = data_vector, y = Freq), stat = "identity")
# check reorder; without reorder, bars are in alphabetical order
# we also change the position of the labels
ggplot() + geom_bar(data = lang_table, aes(x = reorder(data_vector, Freq), y = Freq), stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, size = 14)) # rotate labels 90°
# more beautiful with theme_minimal
# important: always call theme_minimal before you call theme
# theme_minimal overwrites all settings with regard to the theme
jpeg("freq_visual.jpeg", width = 7, height = 7, units = "in", res = 500)
freq_visual <- ggplot() + geom_bar(data = lang_table, aes(x = reorder(data_vector, Freq), y = Freq), stat = "identity") +
  theme_minimal() + xlab(NULL) +
  theme(axis.text.x = element_text(angle = 90, size = 14))
# a plot stored in a variable must be printed, otherwise the jpeg file stays empty
print(freq_visual)
dev.off()
#### Extract Hashtags with simple Regex pattern ####
# we want to know what the users are tweeting about:
library(stringr)
posts <- all_tweets
posts_en <- all_tweets[all_tweets$lang == "en", ]
posts_fr <- all_tweets[all_tweets$lang == "fr", ]
hashtags <- str_extract_all(posts$text, "#\\w+")
hashtags_en <- str_extract_all(posts_en$text, "#\\w+")
hashtags_fr <- str_extract_all(posts_fr$text, "#\\w+")
str(hashtags)
str(hashtags_en)
str(hashtags_fr)
# we now have a list; we need to unlist it and create a vector: unlist()
hashtags <- unlist(hashtags)
hashtags_en <- unlist(hashtags_en)
hashtags_fr <- unlist(hashtags_fr)
# now we only want lower case, as #EmilyInParis and #emilyinparis are the same
tolower(c("Emilyinparis", "EmilyInParis"))
hashtags <- tolower(hashtags)
hashtags_en <- tolower(hashtags_en)
hashtags_fr <- tolower(hashtags_fr)
# as usual create a table with our function:
# table_df() was already defined above and is reused here
table_all <- table_df(hashtags)
table_en <- table_df(hashtags_en)
table_fr <- table_df(hashtags_fr)
# create a word cloud
if (!require(wordcloud)) {
  install.packages("wordcloud")
  library(wordcloud)
}
head(table_all, 20)
head(table_en, 20)
head(table_fr, 20)
# rows 2:50 skip the first row, which is presumably the search hashtag itself
wordcloud(words = table_all[2:50, 1], freq = table_all[2:50, 2], random.order = FALSE)
wordcloud(words = table_en[2:50, 1], freq = table_en[2:50, 2], random.order = FALSE)
wordcloud(words = table_fr[2:50, 1], freq = table_fr[2:50, 2], random.order = FALSE)
# save each word cloud as a jpeg file
jpeg("wordcloud_all.jpeg", width = 7, height = 7, units = "in", res = 500)
wordcloud(words = table_all[2:50, 1], freq = table_all[2:50, 2], random.order = FALSE)
dev.off()
jpeg("wordcloud_en.jpeg", width = 7, height = 7, units = "in", res = 500)
wordcloud(words = table_en[2:50, 1], freq = table_en[2:50, 2], random.order = FALSE)
dev.off()
jpeg("wordcloud_fr.jpeg", width = 7, height = 7, units = "in", res = 500)
wordcloud(words = table_fr[2:50, 1], freq = table_fr[2:50, 2], random.order = FALSE)
dev.off()