OIDD245 | Data Project 2 | Danielle Goh

For those who know me, I have always been interested in fashion. Personally, this interest manifests itself in spending hours in thrift stores, pining over outfits that exude a creative idea or mood, or scouring the internet for visual inspiration (e.g. YouTube, Instagram). It’s an entire process and practice of creativity! Yet fashion and its industry are often associated with being frivolous and shallow (not 100% without reason). As a rational and ambitious person, I find that these conflicting ideas about fashion can cause a great deal of cognitive dissonance…

Over the past couple of years, I have found a community in Man Repeller (MR), online but also offline through some of my closest friends (…the number of times a conversation begins with “Did you see that Man Repeller article about ______?”).

“Man Repeller explores the expansive constellation of things women care about from a place of openness and humor, with the conviction that an interest in fashion doesn’t minimize one’s intellect.”

I’m definitely biased, but there’s something about MR’s content that is so engaging, genuine and refreshing. MR has garnered a strong following and continues to thrive in the saturated fashion media industry.

To figure out whether my opinions are even a little founded in fact, I decided to put my data analytics skills to the test and dig into what makes Man Repeller so engaging.

Man Repeller Topic Modelling
> LDAvis

The first thing I wanted to do was take a look at what kind of content MR generates, and whether articles could be bucketed by topic. To do this, I used Latent Dirichlet Allocation (LDA) topic modelling, combined with LDAvis to create an interactive visualization of the topics.
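Here is a minimal sketch of what LDA does under the hood, for the curious. The project itself used R, and the specific R package isn't named, so this is a pure-Python illustration. It fits LDA by collapsed Gibbs sampling on already-tokenized articles. The hyperparameters `alpha`, `beta` and `n_iter` are illustrative assumptions, not the values used for MR.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    docs: list of token lists. Returns (vocab, phi, theta), where
    phi[k][w] approximates P(word w | topic k) and theta[d][k]
    approximates P(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    topics = range(n_topics)

    # Count tables that the sampler keeps up to date.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * V for _ in topics]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for tok in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[tok]] += 1
            topic_total[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, tok in enumerate(doc):
                w, k = index[tok], z[d][i]
                # Take the token out of the counts ...
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ... and resample its topic from the full conditional
                # P(z = j | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + V * beta)
                    for j in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + V * beta) for w in range(V)]
        for k in topics
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta
```

Each article's dominant topic is the largest entry in its row of `theta`. Naming the topics (Fashion, Life & Community, and so on) is still a human judgement, based on the top words in `phi`.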

Please feel free to explore the gallery below!
As you can see, MR’s content focuses on four main topics: Fashion, Life & Community, Hair & Beauty, and Brands & Social Media.

LDAvis lets us explore these topics and see the most salient terms for each topic. 
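LDAvis ranks a topic's terms by "relevance". This score blends how probable a word is within the topic with how distinctive it is relative to the whole corpus: relevance = λ·log φ(k,w) + (1 − λ)·log(φ(k,w) / p(w)). The Python sketch below computes that ranking. The names `phi` (topic-word probabilities) and `term_freq` (corpus counts), and the function name itself, are assumptions for illustration. `lam` plays the role of the λ slider in the LDAvis interface, and 0.6 is only an example value.

```python
import math


def top_relevant_terms(phi, vocab, term_freq, topic, lam=0.6, n=5):
    """Rank one topic's terms by LDAvis-style relevance.

    relevance = lam * log(phi[k][w]) + (1 - lam) * log(phi[k][w] / p(w)),
    where p(w) is the term's share of all tokens in the corpus.
    Assumes every phi entry and every term count is positive.
    """
    total = sum(term_freq[w] for w in vocab)
    scored = []
    for w_idx, w in enumerate(vocab):
        p_kw = phi[topic][w_idx]
        p_w = term_freq[w] / total
        rel = lam * math.log(p_kw) + (1 - lam) * math.log(p_kw / p_w)
        scored.append((rel, w))
    scored.sort(reverse=True)
    return [w for _, w in scored[:n]]
```

With `lam=1`, terms are ranked purely by in-topic probability, so common words float to the top. Lowering `lam` promotes words that are specific to the topic.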

Who What Wear Topic Modelling
> LDAvis

Another well-known fashion blog is Who What Wear (WWW). Like MR, WWW focuses primarily on fashion. I thought WWW would make a good comparison, as it operates at about the same scale as MR (unlike, say, Vogue) and shares a similar target audience.

Again, I ran an LDA model and visualized it with LDAvis. WWW’s content revolves around four slightly different topics: Industry, Purchase-Focused, Opinion and Fashion. These results were unsurprising to me, as in my opinion WWW tends to approach its writing through a more commercial lens. Part of my hypothesis is that MR engages its readers through more life and think pieces (the Life & Community topic), something WWW offers less of.


A Direct Topic Comparison
> MR vs WWW

The two LDAvis visualizations above are helpful for gaining a holistic understanding of the type of content MR and WWW focus on. However, to make the comparison more meaningful, I ran another LDA analysis, this time fixing the dictionary to the words appearing in MR articles. This let me compare WWW’s content on the MR-defined topics: Fashion, Life & Community, Hair & Beauty, and Brands & Social Media.
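The Python sketch below shows one way to score WWW articles against topics learned on MR. It is a simplification, not necessarily what the LDA run did. Tokens outside the MR vocabulary are dropped, which mirrors the fixed dictionary. Each WWW article is then assigned to the single MR topic whose word distribution `phi` gives its remaining tokens the highest log-likelihood. The function names and the toy data in the tests are hypothetical.

```python
import math
from collections import Counter


def assign_to_fixed_topics(docs, vocab, phi, topic_names):
    """Label each tokenized doc with the MR topic that best explains it.

    Simplification: each doc gets the single topic k maximizing
    sum(log phi[k][w]) over its tokens. Tokens outside the MR vocabulary
    are dropped, mirroring a dictionary fixed to MR words.
    """
    index = {w: i for i, w in enumerate(vocab)}
    labels = []
    for doc in docs:
        known = [index[t] for t in doc if t in index]
        if not known:
            labels.append(None)  # nothing in the MR dictionary to score
            continue
        scores = [sum(math.log(phi[k][w]) for w in known) for k in range(len(phi))]
        best = max(range(len(phi)), key=lambda k: scores[k])
        labels.append(topic_names[best])
    return labels


def topic_shares(labels):
    """Share of scored docs in each topic (docs with no MR words excluded)."""
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}
```

Applying `topic_shares` to the labels for each site gives the per-topic proportions that a side-by-side MR vs WWW bar chart would plot.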

The bar chart below further supports my hypothesis that MR focuses more on content that goes deeper than surface-level fashion dialogue (…ironically, for example, Who’s Wearing What?) and toward more human aspects. As I claimed earlier, WWW’s content tends to feel more commercialized, with a higher proportion of its articles identified as part of the ‘Brands & Social Media’ topic. Surprisingly, WWW publishes noticeably less ‘Hair & Beauty’ content. This could again support the claim that MR advocates for a more robust understanding of fashion, recognizing that it spans beyond clothes. However, it is commendable in its own way that WWW keeps its content focused on fashion alone. As a sanity check, it’s reassuring that both websites do post primarily about fashion.


Analyzing Readers' Comments
> Disqus Sentiment Analysis

One thing that differentiates MR from WWW, and really from a lot of other fashion media platforms out there, is the level of engagement from its readers. As I have experienced first-hand, you feel part of a community rather than an individual sitting alone at home, scrolling through the website before bed. Even just conceptually, I think this is something very special, especially given that so much of the digital content out there can have a more isolating effect. MR readers use the website’s built-in Disqus comment platform to engage in dialogue with the writers: we embrace Harling’s trend observation of dressing like a stick of butter, Haley’s musings on identifying love in the age of Instagram, and whatever founder Leandra has to say…

Explore the visualization below to get an understanding of what kinds of articles generate the most positive sentiments from users. Tip: articles are ordered from highest to lowest sentiment scores.

Congrats MR! Only one article scored a negative sentiment. As someone guilty of frequently sporting the hair bow, I was a little shocked and decided to check out the article in question. After reading through the comments and realizing that no one was actually criticizing the hair bow or the article, I bring you the first of many data analysis takeaways: sentiment analysis is a great tool for exploration, but it requires more fine-tuning to produce accurate results.

From this step, I learned the following:

  • Sentiment analysis scores the attitudes of individual words (to be fancy, tokens) out of context. Watch out for:
    • Sarcasm – e.g. “I just love spending my entire weekend web scraping Vogue and its impressively impossible infinite scroll, to no avail!”
    • Context – e.g. “Hair bows are ugly” vs. “I’m worried that hair bows will look ugly on me!”
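To see why out-of-context scoring misfires, here is a toy word-by-word scorer in Python. The lexicon and its scores are invented for illustration and are not a real sentiment lexicon. The mechanism is the one the pitfalls above describe: summing word scores while ignoring context.

```python
import re

# Invented word scores for illustration only; not a real sentiment lexicon.
LEXICON = {"love": 1, "impressively": 1, "cute": 1,
           "ugly": -1, "worried": -1, "impossible": -1}


def bag_of_words_sentiment(text):
    """Sum word scores, ignoring word order, negation, sarcasm and context."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(tok, 0) for tok in tokens)


examples = [
    "Hair bows are ugly",
    "I'm worried that hair bows will look ugly on me!",
    "I just love spending my entire weekend web scraping Vogue and its "
    "impressively impossible infinite scroll, to no avail!",
]
for text in examples:
    print(bag_of_words_sentiment(text), "|", text)
```

With this lexicon, the worried commenter scores lower than the one who actually dislikes hair bows, and the sarcastic complaint about Vogue comes out positive.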

Average Sentiment by Topic
> Improved Analysis

After summing the sentiment scores per topic, I quickly realized that this analysis was confounded by the fact that more articles were written about certain topics (Fashion, I’m looking at you). 

Instead, I calculated the average sentiment per article, rather than the sum of commenters’ sentiments, to mitigate the effect of inflated sentiment scores. The bar chart below therefore gives a more realistic picture of sentiment by topic. In fact, sentiment is pretty similar across topics, which could suggest that MR writes consistently well regardless of topic.
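The fix can be sketched in Python as follows. It assumes each comment row carries its article's topic and a per-comment sentiment score; the tuple layout is hypothetical. Comments are first averaged within each article, and those article averages are then averaged within each topic. This way a topic with many articles, or with chatty comment sections, no longer wins on volume alone.

```python
from collections import defaultdict


def sentiment_by_topic(rows):
    """rows: (topic, article_id, comment_score) tuples.

    Returns {topic: (sum_of_scores, mean_of_article_means)}. The mean first
    averages comments within each article, then averages articles within a
    topic, so topics with more articles or comments are not inflated.
    """
    per_article = defaultdict(list)
    for topic, article, score in rows:
        per_article[(topic, article)].append(score)

    totals = defaultdict(float)
    article_means = defaultdict(list)
    for (topic, _), scores in per_article.items():
        totals[topic] += sum(scores)
        article_means[topic].append(sum(scores) / len(scores))

    return {
        topic: (totals[topic], sum(means) / len(means))
        for topic, means in article_means.items()
    }
```

For example, a topic with three articles can out-sum a topic with a single article while still having the lower average.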

To improve this analysis further, I would have liked more sample articles from the Life & Community and Hair & Beauty topics. It would also be interesting to look at the sentiment scores of articles grouped by writer.

Article Length vs Sentiment vs Topic
> Scatterplot & Fitted Line

Besides comparing sentiment scores across topics, I thought it would also be interesting to look at how article length (word count) varies with sentiment and topic.

In the current digital media climate, quality is often sacrificed for quantity. Given that the market is so saturated, publishers and creators need to constantly churn out content in order to stay relevant. The length of an article can therefore serve as a rough proxy for quality: giving the writer the benefit of the doubt, a longer piece could signal better fleshed-out ideas and better-researched arguments.

For MR, the longest articles tend to be about fashion or life. Both of these topics show bimodal distributions, suggesting that readers prefer either relatively short pieces (perhaps more digestible and suited to a quick read) or relatively long ones (perhaps more detailed or thought-provoking, for leisure reading).

In contrast, readers more clearly prefer shorter articles about brands.
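A straight-line fit per topic, like the fitted lines on the scatterplot, can be sketched in Python with ordinary least squares. The plot's actual fitting method isn't stated. Treat the linear fit, the whitespace word count and the `(topic, body, sentiment)` input layout as assumptions.

```python
from collections import defaultdict


def word_count(text):
    """Whitespace word count, a simple stand-in for however length was measured."""
    return len(text.split())


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_topic(articles):
    """articles: (topic, body_text, avg_sentiment) tuples.

    Returns {topic: (intercept, slope)}; topics whose articles all have the
    same length are skipped because a line through them is undefined.
    """
    points = defaultdict(lambda: ([], []))
    for topic, body, sentiment in articles:
        xs, ys = points[topic]
        xs.append(word_count(body))
        ys.append(sentiment)
    return {
        topic: fit_line(xs, ys)
        for topic, (xs, ys) in points.items()
        if len(set(xs)) > 1
    }
```

A single straight line only summarizes the overall trend. A bimodal pattern like the one described for fashion and life will not show up in the slope, which is why the scatter itself is worth looking at.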

Article Length: MR vs WWW
> By Topic

This Gantt chart illustrates the distribution of article lengths by source and topic. On average, WWW articles tend to be shorter than those of MR.

The distribution of article lengths across topics does not differ much from the previous visualization. However, it’s interesting to note that for the fashion and life topics, which tend to have the longest articles on average, WWW posts shorter articles than MR.
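The numbers behind a range-style (Gantt) bar for each source and topic can be computed with a few lines of Python. The `(source, topic, word_count)` layout is assumed. Which summary statistics the Tableau chart actually encodes isn't stated, so this returns min, median, mean and max.

```python
from collections import defaultdict
from statistics import mean, median


def length_summary(articles):
    """articles: (source, topic, word_count) tuples.

    Returns {(source, topic): (min, median, mean, max)}, the figures a
    range bar per source and topic could encode.
    """
    groups = defaultdict(list)
    for source, topic, words in articles:
        groups[(source, topic)].append(words)
    return {
        key: (min(v), median(v), mean(v), max(v))
        for key, v in sorted(groups.items())
    }
```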

To summarize...

There appears to be data backing up my hypothesis as to why MR is a particularly engaging source of digital fashion media. Although it’s important to be careful not to draw biased conclusions from these analyses, I believe this data project has helped me better understand what makes MR unique and, on a macro level, what kind of voice or content digital media creators should aim for to better engage their audience.

Data & Methodology

Data Gathering:
– Article information was scraped from Man Repeller and Who What Wear using rvest in RStudio. Keeping in mind that fashion and obsolescence are inherently connected, I tried to scrape 50 pages of articles, but had to stop when my scraper was blocked.
– Man Repeller article comments were scraped using the Disqus API and disqusR in RStudio; again, scraping stopped when requests started being blocked.
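The stop-when-blocked behaviour described above can be sketched in Python with the standard library; the project itself used rvest in R. `url_template` is a hypothetical paginated URL pattern. Treating HTTP 403 and 429 as "blocked" is an assumption about how a site signals it. The fetch function is injectable, so the loop can be tested offline.

```python
import time
import urllib.error
import urllib.request

BLOCKED_STATUSES = (403, 429)  # assumption: how a site signals "stop scraping"


def fetch_html(url):
    """Download one page with the standard library."""
    req = urllib.request.Request(url, headers={"User-Agent": "oidd245-class-project"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape_pages(url_template, max_pages=50, fetcher=fetch_html, delay=1.0):
    """Fetch pages 1..max_pages, stopping early if the site blocks requests.

    url_template is a hypothetical pattern like "https://example.com/page/{}".
    Returns (pages, stop_reason) so partial results are kept.
    """
    pages = []
    for n in range(1, max_pages + 1):
        try:
            pages.append(fetcher(url_template.format(n)))
        except urllib.error.HTTPError as err:
            if err.code in BLOCKED_STATUSES:
                return pages, f"blocked at page {n} (HTTP {err.code})"
            raise  # other HTTP errors are not treated as blocking
        time.sleep(delay)  # pause between requests to reduce load on the site
    return pages, "reached max_pages"
```

Returning the partial list together with a reason mirrors what happened here: the pages gathered before the block still form the dataset.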

Data Cleaning:
– The tm package in RStudio was used to clean the text data, e.g. removing sparse terms and stemming.
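The cleaning steps can be approximated in Python as below. This is a stand-in, not tm itself. The stopword list is a tiny hypothetical one, and `crude_stem` is a toy suffix stripper rather than the Porter stemmer. `remove_sparse_terms` roughly mimics the idea behind tm's `removeSparseTerms`: it drops terms that are absent from more than `max_sparsity` of the documents.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; real tools ship much fuller ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on", "for", "with"}


def crude_stem(word):
    """Toy suffix stripper standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text):
    """Lowercase, keep alphabetic tokens, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def remove_sparse_terms(docs, max_sparsity=0.99):
    """Drop terms missing from more than max_sparsity of the documents.

    Sparsity of a term = share of documents that do not contain it.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    keep = {t for t, df in doc_freq.items() if 1 - df / n_docs <= max_sparsity}
    return [[t for t in doc if t in keep] for doc in docs]
```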

Analysis & Visualization:
– LDA modelling and sentiment analysis were performed in RStudio.
– Visualizations were created in RStudio (ggplot2 & plotly) and Tableau.



Unstructured & Qualitative Data:
It was definitely challenging not having a clean and robust dataset to work with, as it often felt as though I was limited to text analysis. Then again, I suppose this is a representative experience of collecting digital exhaust.

– I could not export the interactive LDAvis plots to HTML. This is a shame, because they are really quite interesting to play with.

– I would have liked to compare with Vogue too. However, I was unable to launch RSelenium, which I needed to get past the website’s infinite-scroll layout.
– I would also have liked a larger sample of articles, but was limited by how much the websites would let me scrape, and further limited by the cap on Disqus comment scraping.

Next Steps

Feature Engineering:
If I had access to more data, such as the number of views for each MR article, I would like to create a model that predicts how well an article will do.

Additional Platforms:
– Successful companies these days have an omnichannel presence, so I would have liked to analyze Man Repeller, Who What Wear and Vogue across YouTube and Instagram as well. For example, Vogue’s YouTube channel is particularly successful, and it could be interesting to see why, and whether MR and/or WWW could tap into that success.