F.A.S.T - Financial Stock Analytics with Text & World Sentiment Analysis

Published on 14 June 2021

The web app is live! Click here

Summary:

The stock market has fluctuated heavily in recent times due to various world events, and the financial domain needs to make the best of data engineering. We aim to find how much impact financial news has on stock market trends. Market research shows that forecasting is the need of the hour for building client investment strategies. The client also needs to speed up decision making by 80% using innovative, cost-effective data engineering techniques: a SaaS product that helps the core management team summarize financial meetings from audio files, along with features like protecting user data, so that other teams can access the data for further use.

End Goal: Develop a SaaS product that lets users handle the following client requirements as use cases:

Client's use cases

Our solution is a Streamlit app that analyzes a company's stock health via news, Twitter & Google Trends, combining current-health metrics with forecasting techniques to support strategic stock investments.

Project Report (Google Colab)


AWS Architecture - Serverless Design - Fully automated, trigger-driven octa-core pipelines

Octa-core pipeline architecture on AWS

Stage 1: AUTOMATED WEB SCRAPING, DATA PREPROCESSING & LABELLING PIPELINE

For use case 1 (financial analytics on trends), we gather data on various companies from world resources. For use case 2 (text summarization & sentiment analysis on audio files), we feed in a self-generated dataset from Amazon Polly: historical call audio lands in an S3 bucket and is then fed into the database. To capture realistic trends, we ingest real-time tweets from Twitter, Google Trends data, and headlines from various news sources. CloudWatch monitors PUT operations on the S3 bucket and invokes a Step Function, which runs Lambdas to bring all raw data into S3. For example, the Step Function runs the following:

- Scraping Twitter feed streams into an S3 bucket
- Scraping financial news streams into an S3 bucket
- Scraping Yahoo Stocks into an S3 bucket
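For illustration, here is a minimal sketch of one such scraping Lambda, assuming the yfinance library for the Yahoo Stocks feed; the bucket name, key layout, and event shape are hypothetical, not taken from the project code:

```python
# Hypothetical Stage 1 Lambda: fetch recent prices for one ticker into S3.
import json
from datetime import datetime, timezone

import boto3
import yfinance as yf  # assumption: yfinance packaged into the Lambda

s3 = boto3.client("s3")
RAW_BUCKET = "fast-raw-data"  # hypothetical bucket name


def lambda_handler(event, context):
    """Invoked by the Step Function with {"ticker": "..."}."""
    ticker = event.get("ticker", "AAPL")
    history = yf.Ticker(ticker).history(period="1d", interval="1h")

    key = f"yahoo/{ticker}/{datetime.now(timezone.utc):%Y-%m-%d-%H}.json"
    s3.put_object(
        Bucket=RAW_BUCKET,
        Key=key,
        Body=history.reset_index().to_json(orient="records", date_format="iso"),
    )
    return {"statusCode": 200, "body": json.dumps({"written": key})}
```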

Stage 2: AWS EXTRACT TRANSFORM LOAD PIPELINE

- AWS Glue crawlers are scheduled every hour to crawl and fetch data from the various sources.
- The AWS Glue Data Catalog serves as the metadata database for all catalog metadata.
- AWS Glue jobs then transform the crawled data, which is fed into Redshift.
- All extract-transform-load scripts are stored in S3, and the final data lands in Redshift.
- We leverage Redshift's query scheduler to truncate our old historical data.
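As a sketch of what one Glue job in this stage might look like, assuming a PySpark Glue job; the catalog database, table, Redshift connection, and column names are hypothetical:

```python
# Hypothetical Stage 2 Glue job: catalog table -> column mapping -> Redshift.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the crawled catalog table (database/table names are placeholders)
news = glue_context.create_dynamic_frame.from_catalog(
    database="fast_catalog", table_name="raw_news_headlines"
)

# Keep and type only the columns downstream sentiment scoring needs
mapped = ApplyMapping.apply(
    frame=news,
    mappings=[
        ("headline", "string", "headline", "string"),
        ("published_at", "string", "published_at", "timestamp"),
        ("ticker", "string", "ticker", "string"),
    ],
)

# Load into Redshift through a Glue JDBC connection (name is a placeholder)
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="fast-redshift",
    connection_options={"dbtable": "news_headlines", "database": "fast"},
    redshift_tmp_dir="s3://fast-etl-scripts/tmp/",
)
job.commit()
```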

Stage 3: FINANCIAL MEETINGS AUDIO CALL TRANSCRIPTION & TEXT SUMMARIZATION

This stage defines the whole process for transcribing audio to text and performing text summarization & sentiment analysis on financial audio calls. It uses Amazon Transcribe to convert audio to text, then applies the Gensim text-summarization algorithm to summarize the audio files and generate sentiment analysis for the same.
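A minimal sketch of this flow, assuming gensim < 4.0 (the extractive summarize helper was removed in gensim 4.x); the job name, media format, and polling loop are illustrative:

```python
# Hypothetical Stage 3 helper: Amazon Transcribe -> Gensim extractive summary.
import time

import boto3
import requests
from gensim.summarization import summarize  # gensim < 4.0 only

transcribe = boto3.client("transcribe")


def transcribe_and_summarize(audio_s3_uri: str, job_name: str) -> str:
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": audio_s3_uri},
        MediaFormat="mp3",
        LanguageCode="en-US",
    )
    # Poll until the job finishes (a Step Function wait state suits production)
    while True:
        job = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        status = job["TranscriptionJob"]["TranscriptionJobStatus"]
        if status in ("COMPLETED", "FAILED"):
            break
        time.sleep(10)

    # Pull the transcript text out of the result JSON
    uri = job["TranscriptionJob"]["Transcript"]["TranscriptFileUri"]
    text = requests.get(uri).json()["results"]["transcripts"][0]["transcript"]

    # Extractive summary: keep roughly 20% of the sentences
    return summarize(text, ratio=0.2)
```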

Stage 4: FORECASTING PIPELINE

Streamlit, running on the EC2 instance, provides various functionality for forecasting stock prices & trends, allowing the business team to make decisions based on the sentiment of forecasted prices for a targeted company.
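The post does not name the forecasting library, so the following minimal sketch uses Prophet purely as a stand-in; the Streamlit controls and the yfinance data source are illustrative:

```python
# Hypothetical Stage 4 forecasting view for the Streamlit app.
import streamlit as st
import yfinance as yf
from prophet import Prophet  # assumption: stand-in forecasting library

st.title("Stock price forecast")
ticker = st.text_input("Ticker", "AAPL")
horizon = st.slider("Days to forecast", 7, 90, 30)

# Two years of daily closes, reshaped into Prophet's expected ds/y columns
history = yf.Ticker(ticker).history(period="2y").reset_index()
df = history[["Date", "Close"]].rename(columns={"Date": "ds", "Close": "y"})
df["ds"] = df["ds"].dt.tz_localize(None)  # Prophet needs tz-naive timestamps

model = Prophet(daily_seasonality=False)
model.fit(df)
future = model.make_future_dataframe(periods=horizon)
forecast = model.predict(future)

# Plot the point forecast with its uncertainty band
st.line_chart(forecast.set_index("ds")[["yhat", "yhat_lower", "yhat_upper"]])
```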

Stage 5: STREAMING TWEETS, LIVE NEWS & GOOGLE TRENDS PIPELINE

Streamlit, running on the EC2 instance, provides various functionality for fetching tweets and running algorithms on them, such as forming word clouds of negative and positive sentiments.
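A minimal sketch of the sentiment word clouds, using VADER and the wordcloud package as stand-ins for whichever libraries the project actually used; the sample tweets are invented for illustration:

```python
# Hypothetical Stage 5 step: split tweets by sentiment, render two word clouds.
import matplotlib.pyplot as plt
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from wordcloud import WordCloud

tweets = [  # invented examples; the app feeds in live tweets
    "Great earnings call, the stock looks strong",
    "Terrible guidance, selling everything",
    "Dividend raised again, love this company",
]

analyzer = SentimentIntensityAnalyzer()
positive, negative = [], []
for tweet in tweets:
    score = analyzer.polarity_scores(tweet)["compound"]
    (positive if score >= 0 else negative).append(tweet)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, bucket, title in [(axes[0], positive, "Positive"),
                          (axes[1], negative, "Negative")]:
    ax.imshow(WordCloud(width=400, height=300).generate(" ".join(bucket)))
    ax.set_title(title)
    ax.axis("off")
plt.show()  # in the Streamlit app, st.pyplot(fig) would render this instead
```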

Stage 6: USER AUTHENTICATION PIPELINE

Streamlit, running on the EC2 instance, provides the authentication functionality. In this stage, the Lambda function is invoked via APIs to get a token from Cognito, an MFA code is sent to the user's email address, and the user data is fed into DynamoDB.
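A minimal sketch of the token exchange and DynamoDB write, assuming Cognito's USER_PASSWORD_AUTH flow; the client ID, table name, and event shape are hypothetical, and MFA challenge handling is elided:

```python
# Hypothetical Stage 6 Lambda: Cognito token exchange + DynamoDB audit write.
import os
from datetime import datetime, timezone

import boto3

cognito = boto3.client("cognito-idp")
table = boto3.resource("dynamodb").Table(os.environ.get("USER_TABLE", "fast-users"))


def lambda_handler(event, context):
    response = cognito.initiate_auth(
        ClientId=os.environ["COGNITO_CLIENT_ID"],
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={
            "USERNAME": event["username"],
            "PASSWORD": event["password"],
        },
    )
    # When an MFA challenge is configured, the response carries a
    # ChallengeName instead and needs respond_to_auth_challenge (elided here).

    # Record the login so other pipelines can audit access
    table.put_item(Item={
        "username": event["username"],
        "logged_in_at": datetime.now(timezone.utc).isoformat(),
    })
    return {"token": response["AuthenticationResult"]["IdToken"]}
```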

Stage 7: MASKING ENTITY PIPELINE

Streamlit, running on the EC2 instance, provides the masking functionality. In this stage, the Lambda function is invoked via APIs to run Comprehend jobs that mask entities in files from S3, write the masked data back into S3, and then surface it in Streamlit.
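A minimal sketch of the masking step, using Amazon Comprehend's synchronous PII-entity detection as a stand-in for the project's Comprehend jobs; bucket and key names are illustrative:

```python
# Hypothetical Stage 7 helper: mask PII entities in an S3 text file.
import boto3

s3 = boto3.client("s3")
comprehend = boto3.client("comprehend")


def mask_pii(bucket: str, key: str) -> str:
    text = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

    entities = comprehend.detect_pii_entities(
        Text=text, LanguageCode="en"
    )["Entities"]

    # Replace each detected span with its entity type, working right to left
    # so earlier offsets stay valid after each substitution
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = (text[: entity["BeginOffset"]]
                + f"[{entity['Type']}]"
                + text[entity["EndOffset"]:])

    masked_key = f"masked/{key}"
    s3.put_object(Bucket=bucket, Key=masked_key, Body=text.encode("utf-8"))
    return masked_key
```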

Stage 8: INTERACTIVE DASHBOARD WEB APPLICATION PIPELINE

In this stage, a Flask web app with an integrated dashboard runs on the EC2 instance and provides various reporting functionality.
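A minimal sketch of the dashboard skeleton; fetch_latest_metrics and the template name are hypothetical placeholders for the real Redshift-backed queries:

```python
# Hypothetical Stage 8 Flask dashboard skeleton.
from flask import Flask, render_template

app = Flask(__name__)


def fetch_latest_metrics():
    """Placeholder for the Redshift query that feeds the dashboard."""
    return {"tickers_tracked": 12, "tweets_scored_today": 4800}


@app.route("/")
def dashboard():
    # Render the collated pipeline results into the dashboard template
    return render_template("dashboard.html", metrics=fetch_latest_metrics())


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```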

Stage 9: TECHNICAL DOCUMENTATION

Collated results are sent for report generation through the reporting tool. Amazon Simple Notification Service is used to report pipeline failures and successful runs by triggering email notifications.
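A minimal sketch of publishing those notifications, assuming an SNS topic with an email subscription; the topic ARN is a placeholder:

```python
# Hypothetical Stage 9 helper: publish pipeline status to an SNS topic.
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:fast-pipeline-alerts"  # placeholder


def notify(pipeline: str, succeeded: bool) -> None:
    status = "SUCCESS" if succeeded else "FAILURE"
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject=f"{pipeline}: {status}",
        Message=f"Pipeline {pipeline} finished with status {status}.",
    )
```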
