Sentiment Analysis (1)

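8.1 Understanding Sentiment Analysis

Sentiment analysis: the analysis of subjective information expressed in text, such as opinions, evaluations, and attitudes
Sentiment: a subjective feeling about some target, classified as positive, neutral, or negative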

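8.1.1 Lexicon-Based Sentiment Analysis

An approach that builds a sentiment lexicon by labeling every noun, adjective, and verb that can express sentiment as positive or negative, and then analyzes the sentiment of a text based on that lexicon
Unless there is a special reason not to, an existing external sentiment lexicon is used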
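
The lexicon-based approach described above can be sketched in a few lines. This is only a minimal illustration: `TOY_LEXICON`, its scores, and the helper names `lexicon_score` and `lexicon_label` are made up for this example and do not come from any real sentiment lexicon.

```python
import re

# Hypothetical toy lexicon: word -> polarity score (assumed values, not a real resource).
TOY_LEXICON = {
    "good": 1.0,
    "great": 1.0,
    "cool": 0.5,
    "bad": -1.0,
    "boring": -1.0,
    "awful": -1.0,
}


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Average the polarity of the words found in the lexicon; 0.0 if none match."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def lexicon_label(text, lexicon=TOY_LEXICON):
    """Map the average score to 'pos', 'neg', or 'neutral'."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neutral"


print(lexicon_label("a very cool idea in a very bad package"))
```

Words outside the lexicon contribute nothing, which is why the coverage and quality of the lexicon largely decide how well this kind of method works.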

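8.1.2 Machine Learning-Based Sentiment Analysis

Requires a dataset for training (texts paired with sentiment labels)
The results of machine learning-based sentiment analysis depend heavily on the corpus used for training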
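
To make the dependence on the training corpus concrete, here is a small sketch of a multinomial Naive Bayes classifier written from scratch. The two training corpora are invented for illustration, and `train_nb` and `predict_nb` are hypothetical helper names; a real project would more likely use a library classifier.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count words per label for a multinomial Naive Bayes classifier."""
    word_counts = defaultdict(Counter)
    label_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict_nb(model, doc):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in doc.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Two made-up training corpora in which "unpredictable" carries opposite sentiment.
movie_model = train_nb(
    ["unpredictable plot and great acting", "dull and slow story"],
    ["pos", "neg"],
)
car_model = train_nb(
    ["unpredictable steering and poor brakes", "smooth and reliable ride"],
    ["neg", "pos"],
)

print(predict_nb(movie_model, "unpredictable"))
print(predict_nb(car_model, "unpredictable"))
```

The same word is classified differently by the two models because each one only knows the usage it saw during training.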

8.2 Sentiment Analysis of Movie Reviews Using a Sentiment Lexicon

8.2.1 Preparing the NLTK Movie Review Data

import nltk
nltk.download('movie_reviews')

from nltk.corpus import movie_reviews

print('#review count:', len(movie_reviews.fileids())) # number of movie review file ids
print('#samples of file ids:', movie_reviews.fileids()[:10]) # show only the first 10 ids
print('#categories of reviews:', movie_reviews.categories()) # labels, i.e. negative or positive
print('#num of "neg" reviews:', len(movie_reviews.fileids(categories='neg'))) # number of ids labeled negative
print('#num of "pos" reviews:', len(movie_reviews.fileids(categories='pos'))) # number of ids labeled positive

fileid = movie_reviews.fileids()[0] # id of the first document
print('#id of the first review:', fileid)
print('#part of the first review:', movie_reviews.raw(fileid)[:500]) # first 500 characters of the first document
print('#sentiment of the first review:', movie_reviews.categories(fileid)) # sentiment label of the first document

fileids = movie_reviews.fileids() # all file ids in the movie review data
reviews = [movie_reviews.raw(fileid) for fileid in fileids] # raw text of each review, looked up by file id
categories = [movie_reviews.categories(fileid)[0] for fileid in fileids] # label of each review ('neg' or 'pos')

"""
[nltk_data] Downloading package movie_reviews to /root/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.
#review count: 2000
#samples of file ids: ['neg/cv000_29416.txt', 'neg/cv001_19502.txt', 'neg/cv002_17424.txt', 'neg/cv003_12683.txt', 'neg/cv004_12641.txt', 'neg/cv005_29357.txt', 'neg/cv006_17022.txt', 'neg/cv007_4992.txt', 'neg/cv008_29326.txt', 'neg/cv009_29417.txt']
#categories of reviews: ['neg', 'pos']
#num of "neg" reviews: 1000
#num of "pos" reviews: 1000
#id of the first review: neg/cv000_29416.txt
#part of the first review: plot : two teen couples go to a church party , drink and then drive . 
they get into an accident . 
one of the guys dies , but his girlfriend continues to see him in her life , and has nightmares . 
what's the deal ? 
watch the movie and " sorta " find out . . . 
critique : a mind-fuck movie for the teen generation that touches on a very cool idea , but presents it in a very bad package . 
which is what makes this review an even harder one to write , since i generally applaud films which attempt
#sentiment of the first review: ['neg']
"""

8.2.2 Sentiment Analysis with TextBlob

!pip install -U textblob
!python -m textblob.download_corpora

from textblob import TextBlob

result = TextBlob(reviews[0])
print(result.sentiment)

"""
Sentiment(polarity=0.06479782948532947, subjectivity=0.5188408350908352)
"""

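def sentiment_TextBlob(docs):
    """Label each document 'pos' if its TextBlob polarity is above 0, otherwise 'neg'."""
    results = []

    for doc in docs:
        testimonial = TextBlob(doc)
        # a polarity of exactly 0 falls into the 'neg' branch
        if testimonial.sentiment.polarity > 0:
            results.append('pos')
        else:
            results.append('neg')
    return results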
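
The function above labels a review 'neg' whenever its polarity is 0 or below, so there is no neutral class even though section 8.1 lists neutral as a possible sentiment. As a rough sketch of a three-way rule, the hypothetical helper below maps a polarity score (TextBlob polarity lies in the range -1 to 1) to a label. The name `label_from_polarity` and the `margin` value are assumptions chosen for illustration, not part of TextBlob.

```python
def label_from_polarity(polarity, margin=0.05):
    """Map a polarity score in [-1, 1] to 'pos', 'neg', or 'neutral'.

    margin is a hypothetical half-width of the neutral band around 0.
    """
    if polarity > margin:
        return "pos"
    if polarity < -margin:
        return "neg"
    return "neutral"


# Example polarity values; the first is the score printed for reviews[0] above.
for p in (0.06479782948532947, 0.0, -0.3):
    print(p, label_from_polarity(p))
```

Because the movie review corpus only has 'neg' and 'pos' labels, any review labeled 'neutral' by such a rule would count as a miss when measuring accuracy against those labels.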

from sklearn.metrics import accuracy_score

print('#accuracy of review sentiment analysis with TextBlob:', accuracy_score(categories, sentiment_TextBlob(reviews)))

"""
#accuracy of review sentiment analysis with TextBlob: 0.6
"""
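
For reference, the accuracy reported here is simply the fraction of reviews whose predicted label matches the true label. The sketch below recomputes that quantity in plain Python and also counts (true, predicted) pairs, which helps show whether the errors lean toward one class. The labels in it are made up for illustration and are not results from the movie review corpus; `accuracy` and `confusion_counts` are hypothetical helper names.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


# Made-up labels for illustration only (not results from the movie review corpus).
y_true = ["neg", "neg", "pos", "pos", "pos"]
y_pred = ["pos", "neg", "pos", "pos", "neg"]

print(accuracy(y_true, y_pred))
print(confusion_counts(y_true, y_pred))
```

Passing `categories` and the output of `sentiment_TextBlob(reviews)` to `confusion_counts` would show how many negative reviews were predicted as positive and vice versa.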

※ This post is a summary of what I learned while studying from <파이썬 텍스트 마이닝 완벽 가이드>.
