Lekha Mirjankar

Pride and Prejudice - Sentiment Analysis using Python

Good day, dearest readers!


Continuing from previous posts analyzing Pride and Prejudice, we are now delving into sentiment analysis. My aim is to see if we can uncover the emotions embedded within each chapter. This seems particularly interesting owing to Austen's satirical writing style.


1. Prepare data

The novel comes from Project Gutenberg, which adds its own information at the start and end of each volume, so we need to clean the text before further analysis.

text = text.split('PRIDE & PREJUDICE.\n\n')
vol1 = text[1].split('END OF VOL.')[0].strip()
vol2 = text[2].split('END OF THE SECOND VOLUME.')[0].strip()
vol3 = text[3].split('\n\xa0\n\xa0\n\n\xa0\nTranscriber\'s Note:\n\r\nSpelling and hyphen changes have been made so that\r\nthere is consistency within the book')[0].strip()
vol = vol1 + vol2 + vol3
  • text.split('PRIDE & PREJUDICE.\n\n'): splits the text into a list using the specified delimiter.

  • text[1].split('END OF VOL.')[0].strip(): Takes the second element of the list (the text after the first title marker) and keeps everything before the phrase 'END OF VOL.'. The [0] index selects the first part of the split, and strip() removes leading and trailing whitespace. The same applies to vol2 and vol3.

  • vol1 + vol2 + vol3: Concatenates the content from the three volumes into a single string.
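
The same split-and-index pattern can be tried on a small made-up string. This is only a sketch: the `sample` text, its markers, and the `between` helper are hypothetical and not part of the original code.

```python
def between(text, start_marker, end_marker):
    """Return the text after the first start_marker and before the next end_marker."""
    # split(...)[1] keeps what follows the first start marker
    after_start = text.split(start_marker, 1)[1]
    # split(...)[0] keeps what precedes the end marker; strip() trims whitespace
    return after_start.split(end_marker, 1)[0].strip()


# Hypothetical stand-in for the Gutenberg text
sample = "front matter\nTITLE.\n\n  Volume one body.  \nEND OF VOL.\nback matter"
vol1_demo = between(sample, "TITLE.\n\n", "END OF VOL.")
print(vol1_demo)
```

Unlike the code above, the helper splits only once on each marker, so it picks out a single section and does not build the full list of volumes.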


2. Split into chapters

With the volumes united, we now divide the text into respective chapters.

chapters = vol.split('CHAPTER')
chapters = chapters[1:]
  • vol.split('CHAPTER'): Splits the vol string into a list of substrings at occurrences of the word 'CHAPTER'.

  • chapters[1:]: Creates a sublist starting from the second element, to exclude any content before the first 'CHAPTER'. This sublist represents the chapters.
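
A plain split on 'CHAPTER' would also cut the text wherever that word happens to appear in capitals inside a chapter. A possible alternative, sketched below, is to split on a heading pattern instead. The heading format (the word followed by a Roman numeral and a period) is an assumption about the text, and `vol_demo` is a made-up miniature, not the novel.

```python
import re

# Hypothetical miniature of the cleaned text; the heading format
# ("CHAPTER", a Roman numeral, then a period) is an assumption.
vol_demo = "preface CHAPTER I. First body. CHAPTER II. Second body."

# Split only on text that looks like a chapter heading
parts = re.split(r"CHAPTER\s+[IVXLC]+\.", vol_demo)

# Drop whatever precedes the first heading and trim each chapter
chapters_demo = [part.strip() for part in parts[1:]]
print(len(chapters_demo))
print(chapters_demo[0])
```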

3. Sentiment Analysis

To perform sentiment analysis, we use the VADER analyzer, a sentiment analysis tool that is part of the NLTK library.

# Imports needed by this snippet (VADER also needs nltk.download('vader_lexicon') once)
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import matplotlib.pyplot as plt

# Initialize the VADER analyzer
sia = SentimentIntensityAnalyzer()

# Perform sentiment analysis for each chapter
chapter_sentiments = [sia.polarity_scores(chapter)['compound'] for chapter in chapters]

# Plot the compound score of each chapter
plt.figure(figsize=(14, 8))
plt.plot(chapter_sentiments, marker='o', linestyle='-', color='#5b1f77', markersize=3, label='Sentiment Score')
plt.title('Sentiment through the Chapters', fontsize=12, color='black')
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Annotate the two sharp drops
plt.text(35, 0.849, 'Wickham\'s true nature is revealed')
plt.text(47, 0.975, 'Lydia elopes with Wickham')
plt.show()
  • [sia.polarity_scores(chapter)['compound'] for chapter in chapters]: Applies sentiment analysis to each chapter using VADER. The polarity_scores method returns a dictionary of sentiment scores (negative, neutral, positive, and compound) for the text it is given. The list comprehension keeps only the compound score of each chapter.

  • The sentiment score produced by the VADER sentiment analyzer is designed to be within the range of -1 to 1.

  • positive score (>0): positive sentiment

  • negative score (<0): negative sentiment

  • score of 0: neutral sentiment
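
The interpretation above can be written as a small helper. This is a minimal sketch: `label_sentiment` is a hypothetical name, and the scores passed to it are made-up examples, not results from the novel.

```python
def label_sentiment(compound):
    """Map a compound score in [-1, 1] to a label, following the rule above."""
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie between -1 and 1")
    if compound > 0:
        return "positive"
    if compound < 0:
        return "negative"
    return "neutral"


# Made-up scores for illustration only
example_scores = [0.85, -0.4, 0.0]
labels = [label_sentiment(score) for score in example_scores]
print(labels)
```

In practice, a score is rarely exactly 0, so a common convention is to treat a small band around zero as neutral; the exact cut-offs are a design choice.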


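Output: a line plot of the compound sentiment score for each chapter, titled 'Sentiment through the Chapters'.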


As we can see, it shows a mostly positive sentiment with a sharp decline around chapter 35 and another one around chapter 47. If you have read the book, you know why that is :P

SPOILER ALERT


Chapter 35 is when Darcy gives Elizabeth his exposé letter and reveals how wicked Wickham is. In chapter 47, it is revealed that Lydia has eloped with Wickham.

Although this provides insight into chapter-wise sentiment, a deeper analysis is needed to unveil it in detail. The VADER analyzer is specifically designed for social-media-style text (short and informal). Analyzing the sentiment of a full-length novel, especially a classic, might require a better approach.
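
One direction that might be worth exploring is to score each sentence separately and average the results per chapter, so that the analyzer only ever sees short pieces of text. The sketch below is an assumption of mine and not the method used in this post: `mean_sentence_score` and `toy_score` are hypothetical names, the sentence splitter is deliberately naive, and the toy scorer only stands in for something like `lambda s: sia.polarity_scores(s)['compound']`.

```python
import re
from statistics import mean


def mean_sentence_score(chapter, score_fn):
    """Average score_fn over the sentences of a chapter (0.0 if there are none)."""
    # Naive splitter: break after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", chapter) if s.strip()]
    if not sentences:
        return 0.0
    return mean(score_fn(sentence) for sentence in sentences)


def toy_score(sentence):
    """Toy stand-in scorer: +1 if 'happy' appears, -1 if 'sad' appears, else 0."""
    lowered = sentence.lower()
    return float(("happy" in lowered) - ("sad" in lowered))


# Made-up text for illustration only
demo = "She was happy. He was sad! They were happy again."
demo_score = mean_sentence_score(demo, toy_score)
print(round(demo_score, 3))
```

Passing the scorer in as a function keeps the averaging logic independent of any particular sentiment library.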


This is day 3 of my #100daysofdata challenge.

Let me know if you have any suggestions or ideas for me.


Complete code available on GitHub.


Happy analyzing!
