Stock sentiment analysis with transformer architecture

Pongsatorn Harnmetta

Files

TH_Pongsatorn_Ha_2022.pdf (2.31 MB)

Issued Date

2022

Copyright Date

2022

Resource Type

Master Thesis

Language

eng

File Type

application/pdf

No. of Pages/File Size

xi, 47 leaves

Access Rights

open access

Rights

This work is the copyright of Mahidol University and is reserved for educational use only. The source must be cited, the content must not be modified, and the work must not be used for commercial purposes.

Rights Holder(s)

Mahidol University

Bibliographic Citation

Thesis (M.Sc. (Information Technology Management))--Mahidol University, 2022

Suggested Citation

Pongsatorn Harnmetta. Stock sentiment analysis with transformer architecture. Thesis (M.Sc. (Information Technology Management))--Mahidol University, 2022. Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/113852

Title

Stock sentiment analysis with transformer architecture

Alternative Title(s)

การวิเคราะห์แนวโน้มหุ้นโดยสถาปัตยกรรมทรานส์ฟอร์มเมอร์ส

Author(s)

Pongsatorn Harnmetta

Advisor(s)

Taweesak Samanchuen
Sotarat Thammaboosadee
Rojjalak Chuckpaiwong

Abstract

The stock market is affected by several factors, such as politics, economics, and finance. These factors are reflected in several digital formats, including news, reports, and social media. The large volume of data created on online platforms is driving exponential growth in unstructured data. To obtain useful information in time, we propose a stock sentiment analysis system built on state-of-the-art pretrained transformer-based models. The transformer-based models, i.e., BERT and WangchanBERTa, are used to extract contextualized embeddings for a downstream sentiment analysis task. The dataset, which consists of Thai fundamental-analysis financial content, was gathered from Krungsri Securities, a financial institution in Thailand. To compare the embedding techniques against a baseline, we train several machine learning models, namely logistic regression, random forest, and support vector machines, and use term frequency-inverse document frequency (TF-IDF) as the baseline. Our experiments show that WangchanBERTa and BERT with logistic regression achieve the highest accuracy, 88%, among the machine learning models compared. In conclusion, the proposed system can predict stock sentiment in Thai with high accuracy. Implication of Thesis: In this research, we show that transfer learning can advance sentiment analysis systems in the financial domain: the WangchanBERTa and BERT base models achieve high prediction scores, and transformer-based models perform well on financial text. We also illustrate a technique for generating vectors for long text sequences that predicts well without implementing a language model from scratch. In addition, the system can be used to monitor and respond to stock market fluctuations in Thai-language content so as to prevent late decisions.
The stock market has long been affected by various factors, such as politics, economics, and finance. These factors are expressed through online media that people can easily access today. Moreover, data is currently growing exponentially, with millions of items created on the many online platforms on the internet. To obtain useful information in time and avoid late decisions, we propose a stock sentiment analysis system integrated with transformer models. In this research, we use transformer models, which overcome earlier limitations of natural language processing (NLP), and apply two of them, Bidirectional Encoder Representations from Transformers (BERT), a multilingual model, and WangchanBERTa, a monolingual (Thai) model, to generate contextualized embeddings that can then be used in the sentiment analysis task. We also collect fundamental financial analysis data from a financial institution in Thailand. To compare experimental results across word embeddings, we use several machine learning models, namely logistic regression, random forest, and support vector machines, as prediction models, and we build a reference result, or baseline, using term frequency-inverse document frequency (TF-IDF). Our experiments show that WangchanBERTa and BERT achieve the highest accuracy, up to 88%, among the machine learning models compared. In conclusion, the proposed system can predict Thai-language stock sentiment with high accuracy. Implication of Thesis: In this research, we present the transfer learning concept, which can be extended to build a financial sentiment analysis system using the WangchanBERTa and BERT models with high prediction scores. We also show that the transformer architecture works well in the financial domain and illustrate a technique for generating vectors for long texts without building a model from scratch. Furthermore, the system can be used to monitor and respond to stock market fluctuations in Thai-language content in order to avoid late decisions.
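
The abstract refers to generating vectors for long text sequences from pretrained models. BERT-style encoders accept only a bounded number of tokens per input (512 for the original BERT), so one common approach is to split a long document into overlapping windows, embed each window, and average the window vectors into a single document vector. The following is a minimal sketch of that idea only; this record does not state that it is the technique used in the thesis. All names here (`split_into_windows`, `mean_pool`, `embed_long_text`, `toy_embed_window`) and the window and stride sizes are hypothetical, and `toy_embed_window` is a deterministic stand-in, not BERT or WangchanBERTa.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def split_into_windows(tokens: Sequence[str], max_len: int, stride: int) -> List[List[str]]:
    """Split tokens into overlapping windows of at most max_len tokens each."""
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    if stride > max_len:
        raise ValueError("stride larger than max_len would skip tokens")
    if not tokens:
        return []
    windows: List[List[str]] = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # the last window already reaches the end of the text
        start += stride
    return windows


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average equal-length vectors element by element."""
    if not vectors:
        raise ValueError("cannot pool an empty list of vectors")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed_long_text(
    tokens: Sequence[str],
    embed_window: Callable[[Sequence[str]], Vector],
    max_len: int = 512,
    stride: int = 256,
) -> Vector:
    """Embed each window separately, then mean-pool into one document vector."""
    windows = split_into_windows(tokens, max_len, stride)
    return mean_pool([embed_window(w) for w in windows])


def toy_embed_window(window: Sequence[str], dim: int = 4) -> Vector:
    """Hypothetical stand-in for a pretrained encoder (assumption, not a real model).

    Buckets tokens by their character length and normalizes the counts.
    """
    vec = [0.0] * dim
    for tok in window:
        vec[len(tok) % dim] += 1.0
    total = sum(vec) or 1.0
    return [x / total for x in vec]


if __name__ == "__main__":
    tokens = ["tok%d" % i for i in range(1200)]
    doc_vector = embed_long_text(tokens, toy_embed_window)
    print(len(doc_vector))
```

In a real pipeline, `embed_window` would be replaced by a call into a pretrained encoder, and the resulting document vectors would be passed to a classifier such as logistic regression, as the abstract describes.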

Degree Name

Master of Science

Degree Level

Master's degree

Degree Department

Faculty of Engineering

Degree Discipline

Information Technology Management

Degree Grantor(s)

Mahidol University

Keyword(s)

Sentiment analysis -- Thailand
Stock exchanges -- Thailand
Natural language processing (Computer science)
Finance -- Data processing -- Thailand

URI

https://repository.li.mahidol.ac.th/handle/123456789/113852

Collections

Thesis and Thematic paper
