Summary:
The stock market has fluctuated heavily in recent times due to various global events, and the financial domain needs to make the most of data engineering. We aim to measure how much impact financial news has on stock market trends. From our market research, forecasting is the need of the hour for building investment strategies for the client. The client also needs to speed up decision making by 80% using innovative, cost-effective data engineering techniques: a SaaS product that helps the core management team summarize financial meetings from audio files, along with features such as protecting user data, so that other teams can access the data for further use.
End Goal: to develop a SaaS product that lets the user handle the following client requirements, as use cases:
Our solution is a Streamlit app providing functions/operations on a company's stock health via news, Twitter & Google Trends, supporting strategic investments based on a stock's current health and on forecasting.
Project Report (Google Colab)
AWS Architecture - Serverless Design - fully automated, event-triggered "Octacore" pipelines
Stage 1: WEB SCRAPING, DATA PREPROCESSING, LABELLING AUTOMATED PIPELINE
We gather data on various companies from public sources. For use case 1 (financial analytics on trends), we feed on real-time tweets from Twitter, Google Trends, and various news sources to get headlines. For use case 2 (text summarization & sentiment analysis on audio files), we use a self-generated dataset from Amazon Polly: historical call audio data is loaded into an S3 bucket, which is then fed into the database. CloudWatch monitors PUT operations on the S3 bucket and invokes a Step Function that runs Lambda functions to land all raw data in S3. For example, the Step Function runs the following tasks: scraping Twitter feed streams into the S3 bucket, scraping financial news streams into the S3 bucket, and scraping Yahoo Stocks into the S3 bucket.
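As a minimal sketch of one of these Step Function tasks, the Lambda below pulls daily quotes from Yahoo Finance and lands the raw JSON in S3; the bucket name, ticker list, and key layout are illustrative assumptions rather than the project's actual configuration.

```python
# Sketch of one scraping Lambda invoked by the Step Function.
# Bucket, tickers, and key layout are assumptions for illustration only.
import json
from datetime import datetime, timezone

import boto3
import yfinance as yf  # third-party library for Yahoo Finance data

s3 = boto3.client("s3")
RAW_BUCKET = "financial-raw-data"      # assumed bucket name
TICKERS = ["AAPL", "MSFT", "TSLA"]     # assumed watchlist

def lambda_handler(event, context):
    """Fetch the latest daily quotes and write one raw JSON object per ticker."""
    run_ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    written = []
    for ticker in TICKERS:
        history = yf.Ticker(ticker).history(period="1d")
        payload = history.reset_index().to_json(orient="records", date_format="iso")
        key = f"raw/yahoo/{ticker}/{run_ts}.json"
        s3.put_object(Bucket=RAW_BUCKET, Key=key, Body=payload)
        written.append(key)
    # These PUT operations are what CloudWatch watches to trigger the next stage.
    return {"statusCode": 200, "body": json.dumps({"objects": written})}
```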
Stage 2: AWS EXTRACT TRANSFORM LOAD PIPELINE
AWS Glue crawlers are scheduled every hour to crawl and fetch data from the various sources. The AWS Glue Data Catalog serves as the metadata database for all catalog metadata. AWS Glue jobs then operate on the crawled data, which is loaded into Redshift. All extract-transform-load scripts are stored in S3, and the final data lands in Redshift. We leverage Redshift's query auto-scheduler to truncate our old historical data.
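A minimal sketch of one Glue job in this stage, assuming the crawler has already populated a catalog table: read the crawled data, apply a light transform, and load it into Redshift. The database, table, Glue connection, and S3 temp-dir names are placeholders, not the project's real configuration.

```python
# Illustrative Glue ETL job: Data Catalog -> transform -> Redshift.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Table populated by the hourly crawler (assumed database/table names).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="financial_catalog", table_name="raw_yahoo_quotes"
)

# Drop columns the downstream dashboards never use (assumed column names).
trimmed = raw.drop_fields(["dividends", "stock_splits"])

# Load into Redshift via a pre-configured Glue connection.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=trimmed,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "stock_quotes", "database": "analytics"},
    redshift_tmp_dir="s3://financial-etl-scripts/redshift-temp/",
)
job.commit()
```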
Stage 3: FINANCIAL MEETINGS AUDIO CALL TRANSCRIPTION, TEXT SUMMARIZATION
This stage defines the process for transcribing audio to text and producing text summarization & sentiment analysis on financial audio calls. It uses Amazon Transcribe to convert audio to text, and the Gensim text-summarization algorithm to generate a summary of each audio file, along with sentiment analysis of the same.
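A hedged sketch of the two core calls in this stage: submitting an Amazon Transcribe job for a call recording in S3, and summarizing the returned transcript text with Gensim. The job name, bucket, and audio URI are illustrative, and Gensim's extractive summarizer requires gensim < 4.0 (the module was removed in 4.0).

```python
# Sketch of Stage 3: Transcribe job submission + Gensim extractive summary.
import boto3
from gensim.summarization import summarize  # requires gensim < 4.0

transcribe = boto3.client("transcribe")

def start_call_transcription(job_name: str, audio_uri: str, output_bucket: str) -> None:
    """Submit an asynchronous transcription job for one meeting recording."""
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": audio_uri},   # e.g. an s3:// URI to the .mp3 recording
        MediaFormat="mp3",
        LanguageCode="en-US",
        OutputBucketName=output_bucket,      # transcript JSON lands here
    )

def summarize_transcript(transcript_text: str) -> str:
    """Extractive summary keeping roughly 20% of the transcript's sentences."""
    return summarize(transcript_text, ratio=0.2)
```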
Stage 4: FORECASTING PIPELINE
Streamlit, running on the EC2 instance, provides forecasting of stock prices & trends, allowing the business team to make decisions based on the sentiment and forecasted prices of the targeted company's stock.
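The report does not pin down the forecasting model here, so the following is only a plausible sketch: a Streamlit widget that fits a simple ARIMA model (statsmodels) on closing prices pulled with yfinance and charts a short-horizon forecast. The ticker default and ARIMA order are arbitrary choices for illustration.

```python
# Illustrative Streamlit forecasting widget (model choice is an assumption).
import streamlit as st
import yfinance as yf
from statsmodels.tsa.arima.model import ARIMA

st.title("Stock price forecast")
ticker = st.text_input("Ticker", value="AAPL")      # assumed default
horizon = st.slider("Days to forecast", 5, 30, 7)

if st.button("Forecast"):
    # One year of daily closing prices for the chosen company.
    closes = yf.Ticker(ticker).history(period="1y")["Close"]
    model = ARIMA(closes, order=(5, 1, 0)).fit()     # illustrative order
    forecast = model.forecast(steps=horizon)
    st.line_chart(forecast)
    st.write("Forecasted closing prices", forecast)
```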
Stage 5: STREAMING TWEETS PIPELINE, LIVE NEWS & GOOGLE TRENDS PIPELINE
Streamlit, running on the EC2 instance, provides functionality for fetching tweets and running various algorithms over them, such as forming word clouds of negative and positive sentiments.
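A hedged sketch of the word-cloud feature: already-fetched tweets are split into positive and negative buckets using VADER sentiment scores, and a word cloud is rendered for each bucket inside Streamlit. Tweet fetching itself (Twitter API access) is assumed to have happened upstream, and the session-state key is a placeholder.

```python
# Illustrative positive/negative word clouds over fetched tweets.
import streamlit as st
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from wordcloud import WordCloud

analyzer = SentimentIntensityAnalyzer()

def build_wordcloud(tweets, want_positive: bool) -> WordCloud:
    """Treat compound >= 0.05 as positive, everything else as negative (a simplification)."""
    selected = [
        t for t in tweets
        if (analyzer.polarity_scores(t)["compound"] >= 0.05) == want_positive
    ]
    return WordCloud(width=600, height=300, background_color="white").generate(
        " ".join(selected) or "empty"
    )

# "tweets" key is a placeholder for wherever the fetching step stored its results.
tweets = st.session_state.get("tweets", ["sample bullish tweet", "sample bearish tweet"])
st.image(build_wordcloud(tweets, want_positive=True).to_array(), caption="Positive sentiment")
st.image(build_wordcloud(tweets, want_positive=False).to_array(), caption="Negative sentiment")
```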
Stage 6: USER AUTHENTICATION PIPELINE
Streamlit, running on the EC2 instance, provides the user-facing functionality. In this stage, a Lambda function is invoked via APIs to get a token from Cognito; an email is sent to the user's email ID as MFA, and the data is stored in DynamoDB.
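A minimal sketch of what the auth Lambda behind the API might look like: exchange the user's credentials for Cognito tokens and record the login in DynamoDB. The app client ID, table name, and event shape are assumptions; the MFA email delivery itself is handled on the Cognito side.

```python
# Illustrative auth Lambda: Cognito token exchange + DynamoDB record.
import json
from datetime import datetime, timezone

import boto3

cognito = boto3.client("cognito-idp")
table = boto3.resource("dynamodb").Table("user_sessions")  # assumed table name
CLIENT_ID = "example-app-client-id"                        # assumed Cognito app client

def lambda_handler(event, context):
    body = json.loads(event["body"])
    # Requires the USER_PASSWORD_AUTH flow to be enabled on the app client.
    response = cognito.initiate_auth(
        ClientId=CLIENT_ID,
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={"USERNAME": body["email"], "PASSWORD": body["password"]},
    )
    tokens = response.get("AuthenticationResult", {})
    # Record the login event for other teams to query later.
    table.put_item(Item={
        "email": body["email"],
        "login_at": datetime.now(timezone.utc).isoformat(),
    })
    return {"statusCode": 200, "body": json.dumps({"id_token": tokens.get("IdToken")})}
```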
Stage 7: MASKING ENTITY PIPELINE
Streamlit, running on the EC2 instance, provides the user-facing functionality. In this stage, a Lambda function is invoked via APIs to mask the files in S3 using Comprehend jobs; the masked data is written back to S3 and then surfaced in Streamlit.
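A hedged sketch of the masking step: read a transcript from S3, ask Amazon Comprehend for PII entities, overwrite each detected span with its entity type, and put the masked text back into S3 for Streamlit to read. Bucket and key names are illustrative.

```python
# Illustrative PII-masking Lambda using Amazon Comprehend.
import boto3

s3 = boto3.client("s3")
comprehend = boto3.client("comprehend")

def mask_object(bucket: str, key: str, out_key: str) -> None:
    text = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    # Synchronous PII detection (subject to Comprehend's per-request text size limit).
    entities = comprehend.detect_pii_entities(Text=text, LanguageCode="en")["Entities"]
    # Replace from the end so earlier offsets stay valid after each substitution.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[:ent["BeginOffset"]] + f"[{ent['Type']}]" + text[ent["EndOffset"]:]
    s3.put_object(Bucket=bucket, Key=out_key, Body=text.encode("utf-8"))

def lambda_handler(event, context):
    # Paths below are placeholders for the keys passed in by the API call.
    mask_object("financial-raw-data", "transcripts/meeting.txt", "masked/meeting.txt")
    return {"statusCode": 200}
```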
Stage 8: INTERACTIVE DASHBOARD WEB APPLICATION PIPELINE
A Flask web app with an integrated dashboard runs on the EC2 instance and provides various functionality in this stage.
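A minimal sketch of what this dashboard service could look like, assuming the Flask app reads pre-aggregated metrics out of Redshift with psycopg2; the connection details, table, and query are placeholders rather than the project's real configuration.

```python
# Illustrative Flask dashboard backed by Redshift (all connection details are placeholders).
import psycopg2
from flask import Flask, render_template_string

app = Flask(__name__)

PAGE = "<h1>Stock health dashboard</h1><pre>{{ metrics }}</pre>"

def fetch_metrics():
    conn = psycopg2.connect(
        host="redshift-cluster.example.us-east-1.redshift.amazonaws.com",  # placeholder
        port=5439, dbname="analytics", user="dashboard", password="***",
    )
    with conn, conn.cursor() as cur:
        cur.execute("SELECT ticker, AVG(close) FROM stock_quotes GROUP BY ticker;")
        return cur.fetchall()

@app.route("/")
def index():
    return render_template_string(PAGE, metrics=fetch_metrics())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```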
Stage 9: TECHNICAL DOCUMENTATION
Collated results are sent for report generation through the reporting tool. Amazon Simple Notification Service is used to report pipeline failures and successful runs and to trigger email notifications.
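As a small sketch of the notification hook used across the pipelines, the helper below publishes a success/failure message to a single SNS topic, which fans out to email subscribers. The topic ARN is a placeholder.

```python
# Illustrative SNS notification helper for pipeline success/failure alerts.
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:pipeline-alerts"  # placeholder ARN

def notify(pipeline: str, succeeded: bool, detail: str = "") -> None:
    """Publish a status message; email subscribers on the topic receive it."""
    status = "SUCCEEDED" if succeeded else "FAILED"
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject=f"[{status}] {pipeline}",
        Message=detail or f"Pipeline {pipeline} {status.lower()}.",
    )
```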