Project 5: Predict the stock prices of Google, Facebook & Amazon.

MAP670G - Data Stream (2021-2022)

Alexandre PERBET
Cyril NERIN
Hugo RIALAN

PART 3: PLOT THE RESULTS OF ON-LINE MACHINE LEARNING



Project 5: Collect trading data using the Yahoo Finance API and use online regression to predict the stock prices of Google, Facebook & Amazon.

Option 2: For each of these 5 countries, use the stock data of one major company. For example: Google in the US, BNP Paribas in France, Alibaba in China, and a major international company in Russia or England. This option was initially given in the project.

For each option, each group should use at least 3 different data streams, apply online and adaptive regression with RIVER (such as https://riverml.xyz/latest/api/tree/HoeffdingAdaptiveTreeRegressor/) and compare its performance with a batch regression model (scikit-learn).

ToDo: Compare online Regression vs Batch Regression and discuss the performance.
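To make the comparison concrete, here is a minimal, dependency-free sketch of the two evaluation protocols. It is not the RIVER or scikit-learn code used in the project: the online learner is a hand-written SGD linear regressor evaluated prequentially (predict first, then learn), the batch model is a closed-form least-squares fit on a fixed training window, and the data is a synthetic stream. All function names and the learning rate are assumptions made for illustration.

```python
import random


def prequential_mae(stream, lr=0.1):
    """Test-then-train evaluation of a one-feature linear model trained by SGD."""
    w, b = 0.0, 0.0
    total_abs_err = 0.0
    n = 0
    for x, y in stream:
        y_pred = w * x + b              # 1) predict before the label is revealed
        total_abs_err += abs(y - y_pred)
        n += 1
        err = y_pred - y                # 2) then update the model with the label
        w -= lr * err * x
        b -= lr * err
    return total_abs_err / n, (w, b)


def fit_batch(train):
    """Closed-form ordinary least squares on a fixed training set (one feature)."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    w = sxy / sxx
    return w, mean_y - w * mean_x


def mae(model, data):
    """Mean absolute error of a (w, b) model on held-out data."""
    w, b = model
    return sum(abs(y - (w * x + b)) for x, y in data) / len(data)


# Synthetic stream (assumption): y = 2x + 1 plus Gaussian noise.
random.seed(0)
xs = [random.random() for _ in range(2000)]
stream = [(x, 2.0 * x + 1.0 + random.gauss(0.0, 0.05)) for x in xs]

online_mae, online_model = prequential_mae(stream)
batch_model = fit_batch(stream[:1500])          # train once on the first 1500 points
batch_mae = mae(batch_model, stream[1500:])     # evaluate on the remaining 500

print(f"online (prequential) MAE: {online_mae:.4f}")
print(f"batch (hold-out) MAE:     {batch_mae:.4f}")
```

Note that the two numbers are not measured on the same samples: the prequential error includes the early, untrained predictions, while the batch error is measured only after training. This difference is worth keeping in mind when discussing the performance.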

Bonus: Use recent stock market data (from January to March 2022).

Online resources: You can use the yfinance Python library (https://pypi.org/project/yfinance/) to collect Yahoo Finance data as a stream. For feature engineering, you can compute time-series statistics and moving averages such as MACD (https://www.statsmodels.org/stable/tsa.html).
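As an illustration of the MACD feature mentioned above, the sketch below computes it in plain Python from a list of closing prices. The function names are hypothetical, the spans 12/26/9 are the conventional defaults rather than values taken from this project, and the prices are made up.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = []
    for v in values:
        # The first value seeds the average; later values are blended in.
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """Return the MACD line, its signal line, and the histogram (their difference)."""
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


# Made-up closing prices: a steady upward trend.
closes = [100.0 + 0.5 * i for i in range(60)]
macd_line, signal_line, histogram = macd(closes)
print(f"last MACD value: {macd_line[-1]:.3f}, last signal value: {signal_line[-1]:.3f}")
```

Because each value depends only on the previous average and the new price, the same update can be applied one observation at a time, which suits a streaming setting.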

Libraries

Utility functions

Launching the servers

LAUNCHING ZOOKEEPER AND KAFKA SERVER ON WINDOWS

In a first terminal, run the following commands:

cd %KAFKA_DIR%

.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties

In a second terminal, run the following commands:

cd %KAFKA_DIR%

.\bin\windows\kafka-server-start.bat .\config\server.properties

--

We assume that Zookeeper is running on its default address, localhost:2181, and Kafka on localhost:9092.
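Before running the rest of the notebook, it can help to check that both servers accept connections on these addresses. The helper below is a small sketch using only the standard library; the function name `is_port_open` is hypothetical, and the check only confirms that something is listening on the port, not that it is a healthy Zookeeper or Kafka instance.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or unreachable host.
        return False


# Addresses assumed in the text above.
for name, port in [("Zookeeper", 2181), ("Kafka", 9092)]:
    status = "reachable" if is_port_open("localhost", port) else "not reachable"
    print(f"{name} on localhost:{port} is {status}")
```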

Plot the results from Kafka predict topics

[Figure: Archi_KAFKA_3.PNG]

We create a Kafka topic for each company and stream the stock data retrieved with yfinance into these topics.

In another notebook, we retrieve this stock market data and apply a machine learning model with RIVER.

In this last notebook, we plot the prediction results from the data stored in the Kafka predict topics.
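The sketch below shows one way the messages read from a predict topic could be decoded and summarized before plotting. The message schema is an assumption: each message value is taken to be a UTF-8 JSON object with hypothetical `y_true` and `y_pred` fields, and the sample messages are made up. The Kafka consumer itself is left out so that the example stays self-contained.

```python
import json
import math


def decode_prediction(raw_value):
    """Decode one message value (bytes) into a (y_true, y_pred) pair.

    Assumes a JSON payload with 'y_true' and 'y_pred' keys (hypothetical schema).
    """
    record = json.loads(raw_value.decode("utf-8"))
    return float(record["y_true"]), float(record["y_pred"])


def regression_errors(pairs):
    """Compute MAE and RMSE over a list of (y_true, y_pred) pairs."""
    n = len(pairs)
    abs_sum = sum(abs(t - p) for t, p in pairs)
    sq_sum = sum((t - p) ** 2 for t, p in pairs)
    return {"mae": abs_sum / n, "rmse": math.sqrt(sq_sum / n)}


# Made-up message values standing in for what a consumer would return.
raw_messages = [
    json.dumps({"y_true": 100.0, "y_pred": 101.0}).encode("utf-8"),
    json.dumps({"y_true": 102.0, "y_pred": 100.0}).encode("utf-8"),
    json.dumps({"y_true": 104.0, "y_pred": 104.0}).encode("utf-8"),
]

pairs = [decode_prediction(m) for m in raw_messages]
errors = regression_errors(pairs)
print(errors)
```

The decoded pairs can then be passed to any plotting library, with the true and predicted prices drawn as two series over time.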

Stock market data identifiers

Notebook settings

Creation of topics if needed

Plot the results and the accuracy of the prediction

Visualize the results