Cryptocurrency Price Prediction
Abstract
Over the last decade, cryptocurrency has emerged as a key factor in business and in financial market opportunities. Accurate price predictions can help cryptocurrency investors make sound investment decisions and can lead to increased profits. They can also support policymakers and financial researchers in studying the behaviour of cryptocurrency markets. Nevertheless, cryptocurrency price prediction is considered a very challenging task because of the chaotic and highly complex nature of the market.
The objective of this project is to predict the closing price of a particular cryptocurrency at different frequencies using machine learning algorithms. The results obtained provide significant evidence that machine learning models are not able to solve this problem efficiently and effectively.
Based on detailed experimentation and analysis of the results, the project concludes that it is essential to develop and incorporate new techniques, strategies and alternative approaches, such as more sophisticated prediction algorithms, advanced ensemble methods, feature engineering techniques and other validation metrics.
Introduction
Motivation
A cryptocurrency is a digital asset designed to work as a medium of exchange. Individual coin ownership records are stored in a ledger, a computerized database that uses strong cryptography to secure transaction records, to control the creation of additional coins, and to verify the transfer of coin ownership. A cryptocurrency typically does not exist in physical form (like paper money) and is typically not issued by a central authority. Cryptocurrencies are stored in a digital wallet, which is essentially similar to a virtual bank account. They typically use decentralized control, as opposed to centralized digital currency and central banking systems. When a cryptocurrency is minted or created prior to issuance, or issued by a single issuer, it is generally considered centralized. When implemented with decentralized control, each cryptocurrency works through distributed ledger technology, typically a blockchain, that serves as a public financial transaction database. The record of all transactions, together with their timestamp information, is stored in the blockchain. Each record in a blockchain is known as a block, and each block contains a pointer to the previous block of data.
Existing Systems and Solutions
Because the cryptocurrency market is relatively new compared with traditional markets such as stocks, foreign exchange and gold, there is a significant lack of studies on predicting its price behaviour. Existing systems often rely on poor-quality datasets that contain null values, so their models may not predict correctly. There have been a few attempts at predicting the direction of cryptocurrency prices, but with varying accuracy and methodologies. The accuracy of past models may still be considered low compared with models for traditional trading markets.
Product Needs and Proposed System
The proposed system analyses and forecasts the price of a given cryptocurrency using historical data. The project uses supervised machine learning algorithms such as Linear Regression, Support Vector Regressor and ensemble methods. In previous works, machine learning-based classification has been studied only for a one-day time frame, while this work goes beyond that by using machine learning-based models for horizons of one, seven, thirty and ninety days. The developed models are feasible and have high performance.
Methodology
- Data Collection
- Import Required Libraries
- Preprocessing Data
- Feature Selection
- Apply Machine Learning algorithm
- Training and Testing
- Validation
- Future Prediction Report
Data Collection
The historical cryptocurrency price data is gathered through a publicly available API from the Yahoo Finance website. The dataset is gathered at an interval of five minutes, and the collection process runs continuously. Data for Bitcoin, Ethereum, Tether, Litecoin and BitcoinCash is also collected from the Coinbase API in order to predict the fluctuation of cryptocurrency prices together with the other collected datasets.
The historical cryptocurrency data has a total of 8 attributes:
1. Date - Date on which the data was recorded
2. Name - Name of the cryptocurrency
3. Open - Opening price of the cryptocurrency on a particular date
4. High - Highest price at which the cryptocurrency was sold on a particular date
5. Low - Lowest price at which the cryptocurrency was sold on a particular date
6. Close - Closing price of the cryptocurrency on a particular date
7. Adjusted close - The adjusted closing price amends a stock's closing price to reflect that stock's value after accounting for any corporate actions
8. Volume - Volume indicates how much of the cryptocurrency is being bought and sold on specific exchanges
Data Source
Bitcoin — https://in.finance.yahoo.com/quote/BTC-INR/history?p=BTC-INR
Ethereum — https://in.finance.yahoo.com/quote/ETH-INR/history?p=ETH-INR
Tether — https://in.finance.yahoo.com/quote/USDT-INR/history?p=USDT-INR
Litecoin — https://in.finance.yahoo.com/quote/LTC-INR/history?p=LTC-INR
BitcoinCash — https://in.finance.yahoo.com/quote/BCH-INR/history?p=BCH-INR
The dataset is collected using the pandas_datareader package, which works with pandas. Once the data for the five cryptocurrencies is gathered, a new column called 'Name' is created in each table to hold the name of the cryptocurrency. To combine the five tables into a single dataset, I used the concat method in pandas. This gives a new dataset with which to proceed to the next step.
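The tagging and concatenation step described above can be sketched as follows. The small frames built by `make_frame` are hypothetical stand-ins for the downloaded price tables (no data is fetched here), and the column set is reduced for brevity.

```python
import pandas as pd


def make_frame(start_price):
    """Build a tiny made-up price table that stands in for one download."""
    dates = pd.date_range("2021-01-01", periods=3, freq="D")
    return pd.DataFrame({
        "Date": dates,
        "Open": [start_price, start_price + 1, start_price + 2],
        "Close": [start_price + 1, start_price + 2, start_price + 3],
    })


# One table per cryptocurrency (values are illustrative only).
frames = {
    "Bitcoin": make_frame(100.0),
    "Ethereum": make_frame(50.0),
    "Tether": make_frame(1.0),
    "Litecoin": make_frame(10.0),
    "BitcoinCash": make_frame(20.0),
}

# Tag every table with its coin name, then stack them into one dataset.
tagged = [frame.assign(Name=name) for name, frame in frames.items()]
df = pd.concat(tagged, ignore_index=True)

print(df.shape)
print(df["Name"].unique())
```

Passing `ignore_index=True` gives the combined table a fresh 0..n-1 index instead of repeating each table's own index.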
Preprocessing and Feature Selection
The dataset contains five cryptocurrency names, so the names are converted into numerical values using the LabelEncoder in the sklearn package. The dataset was pre-processed to remove features that contain empty elements. Out of the 7 features that were extracted, 1 was removed. The features that were removed are listed. Then, the inputs were standardized by mapping the mean and standard deviation to 0 and 1 respectively for each row because of the large variability in the input data. This helps in training the ensemble more efficiently and accurately. After obtaining the historical data from the internet and checking all the elements of the dataset, I turned to the Date column and extracted the day, month and year into three different columns so that the dates are easier for the model to use in prediction.
Once the dataset is shaped for prediction, the values of each column are checked. As this is a price prediction problem, the columns hold prices. I scaled the High, Low, Open, Close and Volume columns using the min-max scaling method, which makes the values easier for the model to learn from. After scaling, I found that the Volume column is not much needed for this prediction, so I removed the Volume feature from the data.
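As a minimal sketch of the name-encoding step, the snippet below fits a LabelEncoder on the five coin names. The exact spellings of the names are an assumption; the codes depend on them, because LabelEncoder numbers the classes in sorted order.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Assumed spellings of the five coin names.
names = pd.Series(["Bitcoin", "Ethereum", "Tether", "Litecoin", "BitcoinCash"])

encoder = LabelEncoder()
codes = encoder.fit_transform(names)

# classes_ holds the names in sorted order; a name's position is its code.
mapping = {name: int(code) for code, name in enumerate(encoder.classes_)}
print(mapping)
```

Reading the mapping from `encoder.classes_` rather than writing it by hand keeps any codes shown to users in step with the codes the model was trained on.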
Splitting Date
The Date in the dataset is of the form "yyyy-mm-dd". When the data is passed to the model for fitting in this form, the model cannot make proper use of it. To avoid this, I extracted the Day, Month and Year from the Date into separate columns.
Code:
df['Day'] = df['Date'].map(lambda x: x.day)
df['Month'] = df['Date'].map(lambda x: x.month)
df['Year'] = df['Date'].map(lambda x: x.year)
Scaling
As the data relates to cryptocurrency, the features hold prices, i.e. numerical data. When plotted as histograms, the data was found to be skewed, so I scaled it using the MinMaxScaler.
MinMaxScaler rescales the data to a chosen range such as (0, 1) or (-1, 1). In this project I used the range (0, 1), so the minimum value of each column maps to 0 and the maximum value maps to 1.
Code:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
cols = ['High', 'Low', 'Open', 'Close', 'Volume', 'Adj Close']
df[cols] = scaler.fit_transform(df[cols])
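To make the scaling concrete, the sketch below applies the min-max formula (x - min) / (max - min) by hand to a few made-up prices and compares the result with MinMaxScaler. The price values are illustrative, not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One column of made-up prices.
prices = np.array([[100.0], [150.0], [200.0], [400.0]])

# Manual min-max scaling: (x - min) / (max - min), column by column.
col_min = prices.min(axis=0)
col_max = prices.max(axis=0)
manual = (prices - col_min) / (col_max - col_min)

# The same transformation with scikit-learn.
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(prices)

# inverse_transform maps scaled values back to the original price units.
restored = scaler.inverse_transform(scaled)

print(manual.ravel())
print(scaled.ravel())
```

Because the target column is scaled too, `inverse_transform` is what turns a scaled prediction back into a price.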
Model Development
The goal of this project is to study time-series data and explore as many options as possible to accurately predict future cryptocurrency prices. I use the TimeSeriesSplit function in scikit-learn to split the data into training and testing sets. It splits the whole dataset into several folds, and in each fold the indices of the testing set are higher than those of the training set. This prevents look-ahead bias, meaning the model does not use future data to train itself. I set aside the last 90 days of data as the validation dataset, so that the models are tested on data they have never seen.
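A minimal sketch of this splitting scheme is shown below. It assumes one row per day for a single coin, sorted oldest first; the row count of 500 and the choice of 5 folds are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_days = 500  # hypothetical number of daily rows, oldest first
X = np.arange(n_days).reshape(-1, 1)
y = np.arange(n_days, dtype=float)

# Keep the most recent 90 days aside as the validation set.
X_dev, X_val = X[:-90], X[-90:]
y_dev, y_val = y[:-90], y[-90:]

tscv = TimeSeriesSplit(n_splits=5)
folds = list(tscv.split(X_dev))
for fold, (train_idx, test_idx) in enumerate(folds, start=1):
    # In every fold, all test indices come after all training indices.
    print(fold, train_idx.max(), test_idx.min(), test_idx.max())
```

The validation rows are cut off before the folds are built, so no fold ever sees them.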
Algorithms Applied
The proposed system uses different machine learning algorithms to see whether they are able to accurately predict the closing price of Bitcoin. I used 5 different regression machine learning algorithms, shown below.
· Support Vector Regressor (SVR)
· Random Forest Regressor
· Lasso CV, Ridge CV
· Gradient Boosting Regressor
· Stochastic Gradient Descent (SGD)
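1. Linear Regressor: Linear Regression fits a linear function of the input features to the target by minimising the squared difference between the actual and predicted values. Unlike Support Vector Regression, whose model depends only on a subset of the training data because its cost function ignores training points that lie close to the model prediction, Linear Regression uses every training sample.
Put simply, Linear Regression predicts an unknown target value from known input parameters.
It finds the best line using the equation y = mx + c. The error is measured from the differences (actual value - predicted value): the differences are squared and averaged, and the square root of that average gives the root mean squared error.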
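The line fit and the error calculation can be worked through on a few made-up points. The numbers below are illustrative only, and `np.polyfit` is used here as a convenient least-squares solver for a single feature.

```python
import numpy as np

# Made-up data that lies roughly on a straight line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Least-squares estimates of the slope m and intercept c in y = m*x + c.
m, c = np.polyfit(x, y, deg=1)
predicted = m * x + c

# Root mean squared error: square the errors, average them, take the root.
errors = y - predicted
rmse = np.sqrt(np.mean(errors ** 2))

print(m, c, rmse)
```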
2. Random Forest Regressor: Random Forest is a supervised learning algorithm. As the name suggests, it is a group of trees, that is, a combination of multiple decision trees. It builds a forest and introduces randomness into it. The forest it builds is an ensemble of decision trees, most of the time trained with the "bagging" method.
The general idea of the bagging method is that a combination of learning models improves the overall result. Random decision forests correct for decision trees' habit of overfitting to their training set.
In the proposed system I used two parameters: n_estimators=50, the number of trees in the forest (the default was 10 in older scikit-learn releases and is 100 since version 0.22), and random_state=0, the seed used by the random number generator.
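The two parameters can be seen in action in the sketch below. The synthetic features and target are assumptions made purely for illustration and stand in for the price data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: three features, target driven by the first two.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(0, 0.1, size=200)

# 50 trees, with a fixed seed so that repeated runs give the same forest.
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X[:160], y[:160])

n_trees = len(forest.estimators_)        # number of fitted trees
r2 = forest.score(X[160:], y[160:])      # R^2 on the held-out rows
print(n_trees, r2)
```

Fixing `random_state` matters mainly for reproducibility: a second forest fitted with the same seed on the same rows makes identical predictions.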
3. Lasso CV: The Lasso is a linear model that estimates sparse coefficients. It is useful in some contexts because of its tendency to prefer solutions with fewer non-zero coefficients. In a model of the form m1*x1 + m2*x2 + m3*x3 + m4*x4 + …, Lasso helps to select features, effectively reducing the number of variables on which the solution depends.
When the slope (coefficient) of a feature is small, that feature contributes little to the prediction. Because Lasso penalises the magnitude of the coefficients, such a coefficient moves towards zero and can reach exactly zero, at which point the feature is removed from the model.
Ridge and Lasso are both variants of linear regression. They are used to generalize the model, which means aiming for low bias and low variance.
Ridge adds one more term to the cost function, a penalty on the squared magnitude of the coefficients (L2 regularization). Put simply, it moves the model from high variance to low variance.
Lasso performs L1 regularization, i.e. it adds a penalty equivalent to the absolute value of the magnitude of the coefficients:
Minimization objective = LS Obj + α * (sum of absolute value of coefficients)
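The effect of the two penalties can be sketched on synthetic data in which only two of five features matter. The data, the 5-fold setting and the list of candidate Ridge alphas are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two columns influence the target; the rest are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, size=200)

# Both estimators pick their penalty strength alpha by cross-validation.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
ridge = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X, y)

print("lasso alpha:", lasso.alpha_, "coefficients:", lasso.coef_)
print("ridge alpha:", ridge.alpha_, "coefficients:", ridge.coef_)
```

In the printed coefficients, the three noise features should sit at or very near zero for Lasso, while Ridge shrinks coefficients without usually making them exactly zero.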
4. Gradient Boosting Regressor: Gradient Tree Boosting or Gradient Boosted Regression Trees (GBRT) is a generalization of boosting to arbitrary differentiable loss functions. GradientBoostingRegressor supports a number of different loss functions for regression, which can be specified via the argument loss; the default loss function for regression is least squares ('ls', renamed 'squared_error' in scikit-learn 1.0).
I used the following parameters: n_estimators=70, the number of boosting stages to perform; learning_rate=0.1, which shrinks the contribution of each tree; max_depth=4, the maximum depth, which limits the number of nodes in each tree; random_state=0; and loss='ls', which refers to least squares regression.
5. Stochastic Gradient Descent (SGD): In Stochastic Gradient Descent, the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (also known as the learning rate).
The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared Euclidean norm (L2), the absolute norm (L1), or a combination of both (Elastic Net).
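The sketch below shows an SGD regressor with the combined (Elastic Net) penalty. The synthetic data and the hyperparameter values are assumptions for illustration. The features are standardized first because SGD is sensitive to feature scale.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with a linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 1.5 * X[:, 0] - 0.5 * X[:, 2] + rng.normal(0, 0.1, size=500)

model = make_pipeline(
    StandardScaler(),
    SGDRegressor(
        penalty="elasticnet",  # mix of L1 and L2 regularization
        l1_ratio=0.5,          # share of the penalty given to L1
        alpha=1e-4,            # overall regularization strength
        max_iter=1000,
        tol=1e-3,
        random_state=0,
    ),
)
model.fit(X[:400], y[:400])

r2 = model.score(X[400:], y[400:])   # R^2 on the held-out rows
print(r2)
```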
Training Overview
Training methods:
The proposed system splits the data into training and testing sets with the train_test_split function. Here 80% of the data (0.8) is used for training and the remaining 20% (0.2) for testing.
from sklearn.model_selection import train_test_split
Training Process:
The process of training an ML model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from. The term ML model refers to the model artifact that is created by the training process.
The proposed system uses supervised machine learning algorithms, so the relationship between the independent variables (the features) and the dependent variable (the closing price) is learned from labelled historical data.
X_train, X_test, Y_train, Y_test= train_test_split(X,y,test_size=0.2)
The proposed system uses three types of train-test split. The first experiment uses 80% of the data for training and 20% for testing through sklearn train_test_split, with the number of training rows computed as:
t = 0.8
t = int(t*len(cm))
The second experiment uses 70% of the data for training and 30% for testing, with the number of training rows computed as:
t = .7
t = int(t*len(cm))
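How a cut-off index such as t can be used to separate training rows from test rows is sketched below. The frame `cm` here is a made-up table of 100 daily rows, and the helper `chronological_split` is hypothetical, introduced only for this example.

```python
import pandas as pd

# Hypothetical daily closing prices, oldest row first.
cm = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=100, freq="D"),
    "Close": [float(price) for price in range(100, 200)],
})


def chronological_split(frame, train_fraction):
    """Return (train, test), where train is the earliest share of the rows."""
    t = int(train_fraction * len(frame))
    return frame.iloc[:t], frame.iloc[t:]


train_80, test_20 = chronological_split(cm, 0.8)
train_70, test_30 = chronological_split(cm, 0.7)
print(len(train_80), len(test_20), len(train_70), len(test_30))
```

Slicing by position keeps the rows in date order, whereas train_test_split shuffles the rows by default unless shuffle=False is passed.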
Experimental design & Evaluation
Experimental Design
Existing systems predict only the Bitcoin price, but in this experiment the user can predict five different cryptocurrencies: Bitcoin, Ethereum, Tether, Litecoin and BitcoinCash. Finally, the results of the various algorithms are compared, and the model with the highest accuracy is selected for use in the application.
Design of Experiment:
The experiment is designed as a web app deployed through Heroku. On a web page, the user selects the cryptocurrency they want to buy and the day, month and year, then clicks the Predict button to obtain the predicted future price.
Experiment-1: Feature selection and choosing X and y:
After data scrubbing, the data is ready for the machine learning model, but the best features must be selected before building the model.
The proposed system selects the best features, chooses X and y, and then goes on to build the model.
The first experiment uses 80% of the data for training and 20% for testing.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

Le = LabelEncoder()
df['Name'] = Le.fit_transform(df['Name'])  # encode coin names as integers
# Volume was dropped during preprocessing and is not a feature, so it is left as-is
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
X = df[['Name', 'Open', 'High', 'Low', 'Year', 'Month', 'Day']]
y = df['Close']
t = int(0.8 * len(df))  # number of rows in the 80% training share
Experiment-2: Model Build and training
Once X and y are selected, the data is ready for model building. An algorithm is chosen and fitted with the fit() method, and after fitting, the system makes predictions with the predict() method. The proposed system trains models with several machine learning algorithms, and each algorithm gives different predictions.
Different types of algorithms:
Linear Regression
Random forest Regression
Decision Tree Regression
Experiment-3: Given input and get output
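With training and testing almost complete, the product is only working well if the model also predicts correctly when new data arrives. Here the model is tested with various inputs. predict() expects a two-dimensional input with one row per observation, and the number of values in each row must match the number of feature columns used for training. The example below passes eight values, whereas the feature list in Experiment-1 has seven columns, so it presumably comes from a model trained with one additional feature.
print(model.predict([[1, 1234, 2342, 1221, 2341, 2021, 3, 2]]))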
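The sketch below shows the input shape that predict() needs, using the seven feature columns from Experiment-1. The tiny training table and the choice of LinearRegression are assumptions made only to obtain a fitted model.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

feature_names = ["Name", "Open", "High", "Low", "Year", "Month", "Day"]

# Tiny made-up training table, only to obtain a fitted model.
train = pd.DataFrame(
    [[0, 100, 110, 95, 2021, 1, 1],
     [0, 105, 115, 99, 2021, 1, 2],
     [1, 50, 56, 48, 2021, 1, 1],
     [1, 52, 58, 50, 2021, 1, 2]],
    columns=feature_names,
)
close = pd.Series([108.0, 112.0, 55.0, 57.0])
model = LinearRegression().fit(train, close)

# One new observation: a single row with the same columns in the same order.
new_row = pd.DataFrame([[1, 53, 59, 51, 2021, 1, 3]], columns=feature_names)
prediction = model.predict(new_row)
print(prediction.shape)
```

Passing a flat list such as `[1, 53, 59, 51, 2021, 1, 3]` instead of a single-row table makes scikit-learn raise a ValueError, because it expects a 2D input.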
Experimental Results:
Variation in results across algorithms:
The proposed system uses various algorithms, and each algorithm gives different results. The outputs are shown below through Plotly visualizations.
Experiment-1: Linear Regression
The model is first trained using linear regression through the fit() method. Used without any tuning, linear regression did not give efficient results, so it was trained with basic settings such as n_jobs = -1.
Experiment-2: Decision Tree algorithm
After fitting and predicting with linear regression, the next step is the decision tree algorithm, which works by splitting nodes. The proposed system trains a decision tree on the data with some tuning.
regressor = DecisionTreeRegressor(random_state=0, criterion='squared_error', splitter='best')  # 'squared_error' was named 'mse' before scikit-learn 1.0
Experiment-3: Random Forest:
The decision tree algorithm used above predicts efficiently, but the next step is a more advanced class of algorithm: ensemble techniques. Random forest is one such ensemble technique, and it overcomes the overfitting of a single decision tree.
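One way to compare the three algorithms on an equal footing is to fit them on the same training rows and compute the same error metric on the same held-out rows. The sketch below uses synthetic data and RMSE as the metric; both are assumptions, and the numbers it prints say nothing about the results on the real price data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data; the last 20% of rows are held out for testing.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.2, size=300)
cut = int(0.8 * len(X))

models = {
    "Linear Regression": LinearRegression(n_jobs=-1),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

rmse = {}
for name, model in models.items():
    model.fit(X[:cut], y[:cut])
    predictions = model.predict(X[cut:])
    rmse[name] = float(np.sqrt(mean_squared_error(y[cut:], predictions)))

# Lowest error first.
for name, value in sorted(rmse.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {value:.3f}")
```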
Product Delivery & Deployment
Need for the user manual:
Once the proposed system is developed, the product is delivered, and a user manual is mandatory at delivery because end users have no prior experience with the product. If any issue arises while using the product, the user can resolve it with the help of the user manual.
Details available in the user manual:
The user manual describes the development and deployment process, but most importantly it explains how to use the product: how to get the URL and paste it, and how to fill in the main page, for example how to enter the cryptocurrency. Because the product is built entirely on machine learning, the model cannot accept categorical data, so the application raises an error when an end user enters a categorical value such as Bitcoin, Ethereum, Tether, Litecoin or BitcoinCash. Such issues recur, so the solutions are given in the user manual.
The manual covers solutions to the following problems:
1. Value error when clicking Predict
2. Application error
3. Parse error
4. Internal server error
Value error when clicking Predict
If you enter the cryptocurrency as "bitcoin" instead of 0, the system cannot accept the data, because the product is built entirely on machine learning. This issue is addressed in the manual and also, very simply, on the design page.
When the user enters the cryptocurrency, the placeholder on the page shows: please enter cryptocurrency in number (Bitcoin : 0, BitcoinCash : 1, Ethereum : 2, Tether : 3, Litecoin : 4).
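A possible way to avoid this error altogether is to translate the user's text into a code before it reaches the model. The sketch below is hypothetical: the function `coin_to_code` is not part of the application, and the codes are copied from the placeholder text above, so they would need to match whatever encoding the trained model actually used.

```python
# Codes as quoted in the placeholder text (keys lower-cased, spaces removed).
COIN_CODES = {
    "bitcoin": 0,
    "bitcoincash": 1,
    "ethereum": 2,
    "tether": 3,
    "litecoin": 4,
}


def coin_to_code(text):
    """Accept either a coin name or its numeric code and return the code."""
    cleaned = text.strip().lower().replace(" ", "")
    if cleaned.isdigit() and int(cleaned) in COIN_CODES.values():
        return int(cleaned)
    if cleaned in COIN_CODES:
        return COIN_CODES[cleaned]
    raise ValueError(f"Unknown cryptocurrency: {text!r}")


print(coin_to_code("Bitcoin"), coin_to_code("Bitcoin Cash"), coin_to_code("3"))
```

An unsupported name such as "Cardano" still raises a ValueError, but with a message the web page could show to the user.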
Application error:
Users who have some prior development knowledge may update some features or try to fix issues themselves, so the manual gives a solution for the errors that can follow.
This type of error is usually just a version mismatch or a library missing when the product is deployed, so the user should change or update the requirements.txt file. All of these solutions are provided in the user manual.
Parse and Internal Server errors:
These errors occur when the user changes the data or the files, so the files should be handled securely and carefully. The user can use the given URL directly without the files, although the files are also provided.
Deployment Process
Once the product runs on a local server, it is deployed so that users can easily use it.
The proposed system is first run on a local server, then the code and files are uploaded to GitHub. After that, a Heroku account is created, because the product is deployed through Heroku. Heroku provides a URL, which is the URL that users will use.
Running on a local server:
When running locally, the application is served on localhost (http://127.0.0.1:5000/).
Deploy Processing in Heroku
Conclusion
With the proposed system, people who want to buy or invest in cryptocurrencies such as Bitcoin, Ethereum, Tether, Litecoin and BitcoinCash, and who cannot otherwise predict the next day's rate, can easily predict it with the help of the web application.
The proposed system tried various regression algorithms, namely Linear Regression, KNN regressor and Decision Tree regressor, but none of these was efficient enough, so it moved to ensemble techniques. Ensemble techniques include the AdaBoost regressor and XGBoost regressor, but Random Forest was easy to use and gave high accuracy, so it was chosen as the final algorithm.
Limitations and Future Work
Limitations:
The proposed system has some limitations, given below:
- The proposed system predicts the cryptocurrency price only through a supervised method
- The system cannot predict other cryptocurrencies, such as Cardano
Future Work:
The limitations discussed above will be addressed in future versions.
Future Outcome-1
In future, when the user selects the day, month and year, the system can automatically predict the future rate and also give a recommendation, that is, a short report saying whether the cryptocurrency is currently a good buy or investment in the market or not.
Future Outcome-2
A mobile application for cryptocurrency price prediction can be developed next. Currently, users can access the proposed system only through the web application.
In future, the app will be made available in the Play Store so that any user can easily download and use it, and it will also show a pricing graph.
Thank You!!!