VBA Code to Extract Data from Website to Excel
To perform HTML screen scraping, or to automatically pull data from a website into Excel with VBA, use the simple code on this page.
Programmers and software testing professionals use this technique when the Excel web query option cannot fetch the required data from a web page, or when it runs into data formatting issues. For example, if we need stock quotes in Excel, we can use the Excel web query option, but it does not let us hand-pick only the required data.
Let's see how to pull data from a website automatically with Excel VBA code.
Also Read: How to automatically pull website data from Chrome using Excel VBA & Selenium Type library?
How to automatically pull data from Website to Excel using VBA?
HTML screen scraping can be done by creating an instance of the object "MSXML2.XMLHTTP" or "MSXML2.ServerXMLHTTP", as shown in the code snippet below.
Use this code in VBA to pull data from a website, i.e., the HTML source of a page, and place the content in an Excel sheet.
- Create a new Excel workbook.
- Press Alt + F11 to open the VB editor.
- Copy and paste the code below into the code window.
- Change the URL mentioned in the code.
- Execute the code by pressing F5.
Private Sub HTML_VBA_Pull_Data_From_Website_To_Excel()
    Dim oXMLHTTP As Object
    Dim sPageHTML As String
    Dim sURL As String

    'Change the URL before executing the code. This is the URL to extract data from.
    sURL = "http://WWW.WebSiteName.com"

    'Pull/extract data from the website using an XMLHTTP GET request
    Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP")
    oXMLHTTP.Open "GET", sURL, False
    oXMLHTTP.send
    sPageHTML = oXMLHTTP.responseText

    'Place the webpage data into the first cell of the first sheet
    ThisWorkbook.Sheets(1).Cells(1, 1) = sPageHTML
    MsgBox "XMLHTTP Fetch Completed"
End Sub
Once you run the above code, it will automatically pull the website data into Excel as HTML source code.
You can parse the HTML elements from the web page as explained in our other article: How To Extract HTML Table From Website? It explains, using the same VBA HTML parser code, how to process each HTML element after the web page content is extracted.
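For a quick idea of how such parsing works, here is a minimal sketch that loads the fetched source into an MSHTML document (the "HTMLFile" object) and copies the first HTML table into the sheet. The URL is a placeholder, and the sketch assumes the page actually contains a table element; refer to the linked article for the full parser.

Sub VBA_Parse_First_HTML_Table()
    Dim oXMLHTTP As Object
    Dim oHTML As Object
    Dim oRow As Object, oCell As Object
    Dim lRow As Long, lCol As Long

    'Fetch the page source, as in the main code above
    Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP")
    oXMLHTTP.Open "GET", "http://WWW.WebSiteName.com", False 'placeholder URL
    oXMLHTTP.send

    'Load the HTML string into a DOM document we can query
    Set oHTML = CreateObject("HTMLFile")
    oHTML.body.innerHTML = oXMLHTTP.responseText

    'Copy the first table on the page (assumed to exist), cell by cell
    lRow = 1
    For Each oRow In oHTML.getElementsByTagName("table")(0).Rows
        lCol = 1
        For Each oCell In oRow.Cells
            ThisWorkbook.Sheets(1).Cells(lRow, lCol) = oCell.innerText
            lCol = lCol + 1
        Next oCell
        lRow = lRow + 1
    Next oRow
End Sub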
Clear the Cache Before Every VBA Data Extraction from a Website
Run the code below every time before executing the extraction code above, because with every run the previously extracted website data can remain in the cache.
Also Read: Extract URL From Sitemap of a Website
If you run the code again without clearing the cache, the old data will be displayed again. To avoid this, use the code below.
Shell "RunDll32.exe InetCpl.Cpl, ClearMyTracksByProcess 11"
Note: Once this code is executed, it will clear your web browser's cache, so please execute it with caution. All your previous sessions and unfinished work will be deleted from the browser cache.
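To put the two steps together, here is a small wrapper sketch that clears the cache and then calls the extraction routine above. Note that Shell returns without waiting for the command to finish, so a short pause is added here as a precaution; the two-second wait is an arbitrary assumption.

Sub Clear_Cache_And_Extract()
    'Clear the browser cache first (use with caution, see the note above)
    Shell "RunDll32.exe InetCpl.Cpl, ClearMyTracksByProcess 11"
    'Shell does not wait for the command to complete, so pause briefly (arbitrary 2 seconds)
    Application.Wait Now + TimeValue("00:00:02")
    'Run the extraction routine (assumed to be in the same module, since it is Private)
    HTML_VBA_Pull_Data_From_Website_To_Excel
End Sub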
Detect Broken URL or Dead Links In Webpage
Sometimes we might need to check only whether a URL is active or not. In that case, just insert the IF condition below immediately after the .send command. If the URL is not active, this code snippet will display an error message.
oXMLHTTP.send
If oXMLHTTP.Status <> 200 Then
    MsgBox sURL & ": URL Link is not Active"
End If
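Building on that check, here is a hedged sketch of a link checker that loops through a list of URLs, assumed to be in column A of the first sheet, and marks each one in column B. Error handling is added because .send raises a run-time error when the server cannot be reached at all, in which case .Status is never set.

Sub Check_URL_List()
    Dim oXMLHTTP As Object
    Dim sURL As String
    Dim lRow As Long

    Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP")
    lRow = 1
    'Loop until an empty cell is found in column A (assumed URL list layout)
    Do While ThisWorkbook.Sheets(1).Cells(lRow, 1) <> ""
        sURL = ThisWorkbook.Sheets(1).Cells(lRow, 1)
        On Error Resume Next 'trap unreachable hosts
        oXMLHTTP.Open "GET", sURL, False
        oXMLHTTP.send
        If Err.Number <> 0 Then
            'The request itself failed, i.e. no connection at all
            ThisWorkbook.Sheets(1).Cells(lRow, 2) = "URL Link is not Active"
        ElseIf oXMLHTTP.Status <> 200 Then
            'The server responded, but not with HTTP 200 OK
            ThisWorkbook.Sheets(1).Cells(lRow, 2) = "URL Link is not Active"
        Else
            ThisWorkbook.Sheets(1).Cells(lRow, 2) = "Active"
        End If
        Err.Clear
        On Error GoTo 0
        lRow = lRow + 1
    Loop
End Sub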
Search Engines Extract Website Data
Web browsers like IE, Chrome, Firefox, etc., fetch data from web pages and present it to us in nicely rendered formats. So why would we need to download website data with a program? And why do some programmers develop such applications and name them Spiders or Crawlers? Because such a program crawls through the web and extracts HTML content from different websites.
Internet Search Engines: Google, Bing, etc., have crawler programs that download webpage content and index it. Nobody is likely trying to build a search engine with Excel, but you may want to process website content for other purposes, like counting words, checking specific URLs or descriptions, or verifying the correctness of email IDs.
On-page SEO Analysis: Web admins fetch their own website data for SEO analysis; marketing people do it to extract email IDs for promoting their products.
Excel Query Fails: Some web pages are updated very frequently, like stock indexes, Facebook timelines, sports updates, commodity prices, and news feeds. In such cases, one would not want to click 'Refresh Page' again and again.
Use this VBA approach to extract data from websites and accomplish automated content processing.
Additional References
Though this article explained how to fetch webpage content, the options available with MSXML2.ServerXMLHTTP were not discussed. To know more about these options, visit the MSDN Library.
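As one example of those options, ServerXMLHTTP exposes a setTimeouts method (resolve, connect, send, and receive timeouts, in milliseconds) that plain XMLHTTP does not. A minimal sketch, with arbitrary timeout values and a placeholder URL:

Sub ServerXMLHTTP_With_Timeouts()
    Dim oHTTP As Object
    Set oHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    'resolveTimeout, connectTimeout, sendTimeout, receiveTimeout (all ms; arbitrary values)
    oHTTP.setTimeouts 5000, 5000, 10000, 10000
    oHTTP.Open "GET", "http://WWW.WebSiteName.com", False 'placeholder URL
    oHTTP.send
    MsgBox "HTTP Status: " & oHTTP.Status
End Sub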
Hi,
Just a suggestion.
When I use this code, I get the error message "a connection with the server could not be established".
So, I used "MSXML2.XMLHTTP" instead of "MSXML2.ServerXMLHTTP" and it worked, but I am not very sure how it worked.
I am posting here because I have another problem now. The code is not fetching the full HTML source; part of the HTML source is getting truncated. Please suggest a solution if you have one.
Thanks,
Puvi.
Thanks for sharing your thoughts.
This is not a problem with the function. I believe you are trying to write the HTML content of the website to an Excel cell. You cannot write more than 32,767 characters into a cell. Refer to this link to know more about the limits of an Excel sheet: http://office.microsoft.com/en-in/excel-help/excel-specifications-and-limits-HP010073849.aspx
To resolve this, try storing the website content in a string variable, or write it to an output text file to reuse later.
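For example, here is a quick sketch that writes the full response to a text file instead of a cell; the file path is just a placeholder.

Sub Save_Page_Source_To_File()
    Dim oXMLHTTP As Object
    Dim iFile As Integer

    Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP")
    oXMLHTTP.Open "GET", "http://WWW.WebSiteName.com", False 'placeholder URL
    oXMLHTTP.send

    'Write the complete HTML source to a text file (no 32,767-character cell limit)
    iFile = FreeFile
    Open "C:\Temp\PageSource.txt" For Output As #iFile 'placeholder path
    Print #iFile, oXMLHTTP.responseText
    Close #iFile
End Sub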
Hope this helps.
I copied and pasted the above code (changing the website URL) and it didn't import any data at all, though I did receive the message "Fetch Completed". The first row's height increased, but that was the only thing the code did, or at least so it seemed. On the other hand, I used the following code from a LinkedIn forum for Excel and VBA, and it did import.
https://www.linkedin.com/groups/82527/82527-6150113411775676416?midToken=AQHl66-rrs7Gxw&trk=eml-b2_anet_digest_of_digests-hero-7-discussion~subject&trkEmail=eml-b2_anet_digest_of_digests-hero-7-discussion~subject-null-a5g92~ipps82gg~sc&_mSplash=1
If possible, please email me the website URL for which the code did not fetch the content. That would help me analyze what went wrong.
I found this website very useful. Can I automate the process of fetching the data from a website without having to enter the values sequentially into the webpage? For example, I need to get the data for values 56679878 to 56698765. While doing this, we need to enter the values individually, submit with a button, get the data, and then move on to the next number. Can we automate this process?
I'm looking to use this to scrape some sales leads' About pages from their websites. Once I have the data in here, any idea how to filter out all but the main content of the page?