Today I want to show you how to integrate Excel and Python so that we can call Python functions from within Excel spreadsheets. This kind of integration is powerful because it gives us the best of both worlds: Excel’s simplicity and the power of Python! We will demonstrate this by building a stock tracker that scrapes financial data from websites using Excel and Python.
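By the end of this tutorial, you will learn how to:
- Scrape financial data from a website with the Python libraries requests and BeautifulSoup
- Use xlwings to create user-defined functions (UDFs) in Python and call them from Excel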
There are many ways to get financial data from the Internet. The easiest is through an API, but we’ll leave that for another tutorial. Today we’ll scrape stock data from the Yahoo Finance website using BeautifulSoup and requests. Once you learn this, you’ll be able to apply the same technique to scrape data from many other websites.
A word of caution about scraping websites: be mindful of the target website’s bandwidth limits, and don’t flood it with requests, for example by sending thousands of requests per second. That can be treated as a denial-of-service (DoS) attack, which is considered malicious.
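One simple way to follow this advice is to pause between requests. The sketch below is not part of the original tutorial: the function name fetch_politely and the one-second default interval are illustrative assumptions. It accepts any fetch callable (for example requests.get), which keeps the pacing logic separate from the network code.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def fetch_politely(urls: Iterable[str],
                   fetch: Callable[[str], T],
                   min_interval: float = 1.0) -> List[T]:
    """Call fetch(url) for each URL (hypothetical helper), making sure that at
    least min_interval seconds pass between the starts of consecutive requests."""
    results: List[T] = []
    last_start = None
    for url in urls:
        if last_start is not None:
            wait = min_interval - (time.monotonic() - last_start)
            if wait > 0:
                time.sleep(wait)  # throttle instead of firing requests in a burst
        last_start = time.monotonic()
        results.append(fetch(url))
    return results
```

For example, fetch_politely(["https://finance.yahoo.com/quote/AAPL"], requests.get, min_interval=2.0) would fetch the Apple page while keeping at most one request every two seconds if you later add more tickers to the list.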
Now, back to scraping. I’m using Chrome for this tutorial, but you can use any internet browser.
Let’s find Apple’s stock information on Yahoo Finance. Here is the URL: https://finance.yahoo.com/quote/AAPL, which looks like this:
First, we want to get its price: $262.47. Select this number on Yahoo Finance’s website, right-click, then choose “Inspect”. This opens the Chrome developer tools, which reveal the underlying HTML code of the page we are viewing. A little HTML knowledge helps a lot here, because all the data we are trying to find is in the HTML; we just need to know where to look.
What we are looking for is a div tag with attribute values that are as unique as possible, since a unique value helps narrow down the matches. I’ve settled on the <div…> line, but feel free to try other tags. The key thing to remember is that we need a tag (an HTML code block) that includes the data we are trying to extract. We can see that the price is inside our selected div tag. If you want to try other tags, the one I highlighted in yellow, <div>, should also work.
We need three Python libraries. If you don’t have them yet, install them from the command line with pip.
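The original install command did not survive in this copy of the post. Assuming the three libraries are requests, BeautifulSoup (published on PyPI as beautifulsoup4) and xlwings, the usual command is pip install requests beautifulsoup4 xlwings. The short script below is our own addition: it checks which of those packages Python still cannot find, using only the standard library.

```python
import importlib.util
from typing import Dict, List

# Assumption: import name -> pip package name for the three libraries used here.
REQUIRED: Dict[str, str] = {
    "requests": "requests",
    "bs4": "beautifulsoup4",
    "xlwings": "xlwings",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```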
We can use the requests library to get the entire HTML document of the page with just a couple of lines of code, by calling requests.get() on the URL.
The requests library lets us send HTTP requests to any server easily. Its get() method returns a Response object, whose status code tells us whether the request worked: a value of 200 means OK, which indicates that we made a request to the server and successfully received some data back.
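Here is a minimal sketch of that request step wrapped in a function. The name fetch_page, the browser-like User-Agent header and the 10-second timeout are our assumptions (some sites reject requests that carry the library’s default User-Agent). Calling raise_for_status() turns any 4xx or 5xx status code into an exception, so we never try to parse an error page by mistake.

```python
import requests


def fetch_page(url: str, timeout: float = 10.0) -> bytes:
    """Download url and return the raw HTML body as bytes (hypothetical helper)."""
    # Assumption: a browser-like User-Agent; some sites block the library default.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # 200 means OK; raise_for_status() raises requests.HTTPError for 4xx/5xx codes.
    response.raise_for_status()
    return response.content
```

Usage: html = fetch_page("https://finance.yahoo.com/quote/AAPL") gives you the same bytes as the .content attribute discussed next.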
The Response object has a .content attribute, which holds the body of the response as raw bytes (the .text attribute holds the same data decoded into a string). In this case, it’s the HTML code of the underlying website, Yahoo Finance. This text is huge, and we really don’t want to print it on the screen, since doing so can make your Python IDE freeze. Picking the data out of such a large block of text by hand isn’t practical, so we need some help…
Since we only care about the information we are trying to scrape, namely the stock price, volume, etc., we can use BeautifulSoup, a Python library for pulling data out of HTML files.
The soup.find_all() method returns all the HTML code blocks that match the arguments inside the parentheses. In our case, there’s only one match, the code block <div..>, thanks to the unique tag value we picked earlier! Note this is not the only solution, so feel free to try other div tags. The key thing to remember is that you want a code block that includes the price.
The above screenshot shows the entire div block with id="quote-header-info". The price is inside this block (green box). The object price is a list-like object (a BeautifulSoup ResultSet) that contains one item, so we can access the actual div block using price[0], since Python indexing starts from 0. We also want to extract only the price from this long, hard-to-read block of text. Note that the <span> tag containing the price has an attribute data-reactid='14'; we’ll take advantage of it.
With a little help from the .get_text() method, we’ve just extracted the current Apple stock price. Note that this value is a string, not a number.
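Because Yahoo Finance changes its markup from time to time (the data-reactid attribute in particular may not appear on the live page today), here is the whole extraction run against a small, made-up HTML snippet instead. Only the id quote-header-info, the data-reactid='14' attribute and the 262.47 price come from this tutorial; the rest of SAMPLE_HTML is invented for illustration. The sketch builds the soup with Python’s built-in "html.parser", uses find_all() and price[0] as described above, and finally converts the string to a float so it can be used in calculations.

```python
from bs4 import BeautifulSoup

# Made-up markup loosely shaped like the quote header described above.
SAMPLE_HTML = """
<div id="quote-header-info" class="header">
  <h1>Apple Inc. (AAPL)</h1>
  <span data-reactid="14">262.47</span>
</div>
<div class="header">An unrelated block</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A shared class matches several blocks; the unique id matches exactly one.
matches_by_class = soup.find_all("div", class_="header")   # 2 matches
price = soup.find_all("div", id="quote-header-info")       # ResultSet with 1 item

# Hyphenated attributes such as data-reactid must be passed via attrs.
span = price[0].find("span", attrs={"data-reactid": "14"})
price_text = span.get_text(strip=True)            # "262.47", still a str
price_value = float(price_text.replace(",", ""))  # 262.47 as a float

print(price_text, price_value)
```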
Let’s try to scrape a few other pieces of information from the same website. For a stock tracker, I’m also interested in Apple’s trading volume and the next earnings announcement date. The technique is the same:
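The extraction code for these extra fields is not reproduced in this copy, so here is a hedged sketch that generalizes the find-then-get_text pattern to several fields at once. The SELECTORS mapping, the data-test attribute values and the numbers in SAMPLE_HTML are hypothetical placeholders, not Yahoo Finance’s real markup; on the live page you would locate the right tags with Chrome’s Inspect tool, exactly as we did for the price.

```python
from typing import Dict, Optional, Tuple

from bs4 import BeautifulSoup

# Hypothetical markup with placeholder values; real attributes will differ.
SAMPLE_HTML = """
<table>
  <tr><td>Volume</td><td data-test="VOLUME-value">12,345,678</td></tr>
  <tr><td>Earnings Date</td><td data-test="EARNINGS_DATE-value">Jan 1, 2030</td></tr>
</table>
"""

# Field name -> (tag name, attributes that uniquely identify the element).
SELECTORS: Dict[str, Tuple[str, Dict[str, str]]] = {
    "volume": ("td", {"data-test": "VOLUME-value"}),
    "earnings_date": ("td", {"data-test": "EARNINGS_DATE-value"}),
}


def extract_fields(html: str,
                   selectors: Dict[str, Tuple[str, Dict[str, str]]] = SELECTORS
                   ) -> Dict[str, Optional[str]]:
    """Return the text of each selected element, or None if it is not found."""
    soup = BeautifulSoup(html, "html.parser")
    result: Dict[str, Optional[str]] = {}
    for name, (tag, attrs) in selectors.items():
        element = soup.find(tag, attrs=attrs)
        result[name] = element.get_text(strip=True) if element else None
    return result


print(extract_fields(SAMPLE_HTML))
```

Returning None for a missing element (rather than raising) keeps one changed tag on the page from breaking every other field.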
In each case, we let BeautifulSoup’s engine find the element that contains the data of interest, then extract its text value.

Good job! We have just completed the first part of the job! Next, let’s look at how to bring the data into an Excel spreadsheet seamlessly with an Excel formula!
I’m posting a full version of the code, so feel free to grab it here or from GitHub. The code is more complicated than the example we walked through, but the core concept is the same. Note that I place all the code inside a function named get_stock(), which returns a list of the data points we’d like to scrape. That list is built with a list comprehension, which is essentially a Pythonic way to write a for loop in one line. Check out this tutorial if you want to learn more about it.
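The full code isn’t reproduced here, but the overall shape of get_stock() can be sketched. In this sketch, scrape is a hypothetical parameter standing in for the requests and BeautifulSoup code from part 1, and fields is the list of column headers we want, in order. The return statement is the list comprehension mentioned above.

```python
from typing import Callable, Dict, List, Optional


def get_stock(ticker: str,
              fields: List[str],
              scrape: Callable[[str], Dict[str, str]]) -> List[Optional[str]]:
    """Return one value per requested field, in the same order as fields."""
    data = scrape(ticker)
    # List comprehension: same as a for loop appending data.get(field) to a list.
    # Fields the scraper did not find come back as None.
    return [data.get(field) for field in fields]


def fake_scrape(ticker: str) -> Dict[str, str]:
    """Hypothetical stand-in for the real scraper, returning fixed values."""
    return {"ticker": ticker, "price": "262.47"}


print(get_stock("AAPL", ["price", "ticker", "volume"], fake_scrape))
# ['262.47', 'AAPL', None]
```

Keeping the output order tied to fields is what later lets each returned value land under the matching column header in Excel.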
Let me introduce another excellent tool: xlwings, a Python library that allows us to leverage the power of Python from and with Excel. With it, you can automate Excel spreadsheets, write macros in Python, or write user-defined functions (UDFs).
Here, we only focus on how to create user-defined functions in Python and use them in Excel. Check out this tutorial if you need help with xlwings setup, or if you are interested in learning how to automate Excel or write macros in Python.
On the Excel side, also make sure that Trust access to the VBA project object model is enabled (in Windows Excel: File > Options > Trust Center > Trust Center Settings > Macro Settings).

The setup in Python is a lot easier than what we just did in Excel. Since we are creating user-defined functions (UDFs), we need to write a Python function that returns some data to us, which we did in part 1 of the tutorial. Then, follow the steps below:
Add the decorator @xw.func (with xlwings imported as xw) above the function. This decorator allows Excel to call the Python function.

Now the setup is complete. One last thing we need to do is load the Python function into Excel. We do this by clicking Import Functions on the xlwings tab in Excel. Remember, every time we change the Python code, we need to re-import it here.
Our user-defined function get_stock() can return multiple data points to Excel through an array formula. If you don’t know how to enter an array formula, here’s what you need to do, using the screenshot below as an example:
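1. Select the range where the results should go (B4:L4).
2. Type the formula (=get_stock(A4,$B$1:$L$1)).
3. Press Ctrl+Shift+Enter instead of Enter, so Excel enters it as an array formula.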
To recap, now you know how to:
- Scrape data from websites using requests and BeautifulSoup
- Use xlwings to create user-defined functions (UDFs) in Python and call them within Excel

Enjoy your new stock tracker spreadsheet!