What moves Bitcoin? – Towards Data Science

On the other hand, Bitcoin's price has been very volatile over time. Less than three years ago the price of Bitcoin was around 1,000 USD, while as of December 1, 2019 it is around 7,400 USD, an appreciation of 640%. In between, the price reached almost 19,000 USD before falling 25% in less than 2 days. Bitcoin has therefore been subject to large price swings, and this volatility makes investing in the cryptocurrency very risky and unattractive for investors seeking more moderate returns with controlled risk.
This is how we come to our working hypothesis:
Is it possible to predict the movement of Bitcoin through news analysis?
The objective of this article is to fit different statistical models and see whether there is a relationship between the words that appear in the news and the movement of Bitcoin. Additionally, we want to see whether this relationship can help us predict the future movement of the cryptocurrency, and whether those predictions can be monetized through portfolios. Taking advantage of the liquidity of the market, we will observe the evolution of a capital of 100,000 USD held in a dynamic position in Bitcoin.
It is important to mention that Bitcoin is not traded on a regulated exchange but directly between users, “peer-to-peer”, the vast majority of whom are wallet users. For this reason, Bitcoin can be exchanged at any time of the day. To measure a daily change, we adopted the opening price (open) and closing price (price) approach of traditional markets.
Remember that our objective variable is the change between open and price, a categorical variable. Alongside it there are other variables that describe the Bitcoin market for each day, such as high and low. However, these variables do not really correlate with our objective variable. Time series models could clearly make better use of them, but that is not the focus of this project.
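As a minimal sketch of how such a categorical variable could be derived, the snippet below labels each day by comparing its open with its price. The two class names ("up" and "down"), the treatment of an unchanged day as "down", and the sample prices are assumptions made for illustration; the article does not specify them.

```python
def daily_label(open_price: float, close_price: float) -> str:
    """Return 'up' if the day closed above its open, otherwise 'down'.

    Assumption: only two classes are used, and a flat day counts as 'down'.
    """
    return "up" if close_price > open_price else "down"


# Hypothetical (open, price) pairs, made up for illustration only.
days = [(7400.0, 7520.0), (7520.0, 7310.0), (7310.0, 7310.0)]

labels = [daily_label(open_price, price) for open_price, price in days]
print(labels)
```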
This is where natural language processing (NLP) comes in. The information that our model will use to predict our open vs. price objective variable comes from newspaper headlines, as mentioned in the introduction. It is worth presenting two new concepts here: text vectorizers and N-grams.
A text vectorizer is, pardon the redundancy, nothing more than a way to vectorize text, that is, to assign to each word or text a vector in some space. There are many ways to do this. Some techniques assign a one-hot vector to each word in your vocabulary, a fancy name for all the words in your corpus. Other techniques assign each word a vector that encodes its context in the geometry of a multidimensional space, such as the Word2Vec space. In our particular case, we used a vectorizer that counts the occurrences of each word and each pair of consecutive words (2-words) in the news from the 12 hours prior to the opening of the Bitcoin market each day.
To better understand 1-words and 2-words, we can look at the concept of N-grams. Let’s look at an example with the following sentence:
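To make the difference between a one-hot vector and a count vector concrete, here is a small sketch using only the Python standard library. The toy corpus and the helper names `one_hot` and `count_vector` are hypothetical, and the code is not the vectorizer used in this project.

```python
from collections import Counter

# Toy corpus of two made-up headlines (an assumption for illustration).
corpus = ["bitcoin falls", "bitcoin rises as bitcoin demand grows"]

# The vocabulary is the sorted set of distinct words in the corpus.
vocabulary = sorted({word for text in corpus for word in text.split()})


def one_hot(word):
    """Vector with a 1 in the position of `word` and 0 everywhere else."""
    return [1 if word == entry else 0 for entry in vocabulary]


def count_vector(text):
    """Vector holding how many times each vocabulary word occurs in `text`."""
    counts = Counter(text.split())
    return [counts[entry] for entry in vocabulary]


print(vocabulary)
print(one_hot("bitcoin"))
print(count_vector(corpus[1]))
```

A one-hot vector describes a single word, while a count vector summarizes a whole text, which is the kind of representation described above.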
In the previous sentence all 1-grams would make up the list:
- Bitcoin
- is
- a
- very
- effective
- etc.
There is one entry per word. The list of 2-grams or, as we called them, 2-words would be:
- Bitcoin is
- is a
- a very
- very volatile
- etc.
These 2-words serve to extract information from pairs of words that often go together, such as New York. Now that we understand this, suppose that the sentence written above was the only headline in the 12 hours. To this observation, along with the open, the price and our objective variable, a vector would be added in which the occurrences of each 1-word and 2-word are counted. That is to say, it would hold 1 in the cell for the mercado column, because that word was mentioned only once, and 2 in the bitcoin and very columns, as they were mentioned 2 times, and so on for all the n-gram counts in the vocabulary.
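The counting described above can be sketched in a few lines of plain Python. The headline below is a hypothetical stand-in, not the sentence from the example, and lowercasing plus whitespace splitting is an assumed tokenization.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_1_and_2_grams(headline):
    """Count each 1-gram and 2-gram in a headline (lowercased, split on spaces)."""
    tokens = headline.lower().split()
    return Counter(ngrams(tokens, 1) + ngrams(tokens, 2))


# Hypothetical headline, used only to illustrate the counting.
headline = "Bitcoin is very very volatile"
counts = count_1_and_2_grams(headline)

print(counts["very"])        # 1-gram that appears twice
print(counts["bitcoin is"])  # 2-gram that appears once
print(counts["market"])      # n-gram absent from the headline counts as 0
```

Each key of `counts` plays the role of a column and each value is the cell for that observation. Libraries offer the same behavior ready-made, for example scikit-learn's `CountVectorizer` with `ngram_range=(1, 2)`, although the article does not state which tool was used.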
To better understand our data, let’s see the following graph with 2 axes:
Published at Sun, 08 Dec 2019 22:32:38 +0000
