Export HTML Table to Excel – When Web query does not work
To Import a HTML table in a web-page we can use,
- Web Query option in Excel to import it to Excel or
- From IE browser, right click on the Web Page Table & choose option “Export to Microsoft Excel“.
With some website this web query option will not work properly.
In that case, You can use this code to export HTML Table to Excel using VBA.
Also Read: Download File from Website Using Excel
Excel VBA HTML Table Import – Step By Step
Extract the HTML Table content of the web page. The article in this page explains about how to import the HTML content.
- In my Previous Article, the web page content is imported to Excel sheet as text content. But in this article, it is assigned to a HTMLFile Object which has more options to retrieve the HTML Tags.
- Once you have the web page content, it will have the Table related Tags like <Table>, <TR> and <TD>.
- Now the last step is to process each row & cell in that table and transfer it to the Excel Sheet.
VBA To Export HTML Table – Pull Website Table Data to Excel
Copy paste this code to Excel VB editor. Type web page URL that you want to scrape in cell A1 of worksheet1. Then
Sub Export_HTML_Table_To_Excel() Dim htm As Object Dim Tr As Object Dim Td As Object Dim Tab1 As Object 'Replace the URL of the webpage that you want to download Web_URL = VBA.Trim(Sheets(1).Cells(1, 1)) 'Create HTMLFile Object Set HTML_Content = CreateObject("htmlfile") 'Get the WebPage Content to HTMLFile Object With CreateObject("msxml2.xmlhttp") .Open "GET", Web_URL, False .send HTML_Content.Body.Innerhtml = .responseText End With Column_Num_To_Start = 1 iRow = 2 iCol = Column_Num_To_Start iTable = 0 'Loop Through Each Table and Download it to Excel in Proper Format For Each Tab1 In HTML_Content.getElementsByTagName("table") With HTML_Content.getElementsByTagName("table")(iTable) For Each Tr In .Rows For Each Td In Tr.Cells Sheets(1).Cells(iRow, iCol).Select Sheets(1).Cells(iRow, iCol) = Td.innerText iCol = iCol + 1 Next Td iCol = Column_Num_To_Start iRow = iRow + 1 Next Tr End With iTable = iTable + 1 iCol = Column_Num_To_Start iRow = iRow + 1 Next Tab1 MsgBox "Process Completed" End Sub
Press F5 to execute this code.
Excel will pull HTML table & align the content in the Excel sheet in proper format.
Limitations of Parsing HTML <Table>
In many of the Website, even including Facebook, Twitter, the webpage will look like tables.
But they are embedded inside its HTML DIV & SPAN tags and not the <TABLE> tag.
The above code will work only if HTML has <table> tag. Other wise the parsing of the data has to be done using other DOM object commands.
It is better to use Facebook, Twitter API rather than just relying on HTML tags.
More Tips: Download Your Facebook, Twitter Profile Data to PC
I can’t tell you how helpful this was, thank you!
Quick question though. It appears as though this cache’s old information because when the query is run for the second time, it happens a lot faster.
This causes issues when what you’re querying is large. Is there any way to clear the cache after the information has been copied?
Thanks Jacob for the comment. It is a valid point and I am analyzing on how to Clear the Cache. I guess it is somewhere stored in the default browser cache.
I will post here on this very soon. Thank You.
Dear kumarapush
Did you find any way to clear the Cache..
Please I want to know as I am using this code for my excel…
but I need clear catch..as I run this macro many times
thanks
To Clear Cache of Web browser through VBA, add the below line before getting the HTML content from the Webpage.
Shell “RunDll32.exe InetCpl.Cpl, ClearMyTracksByProcess 11”
I haven’t tested this single line of code. But it should work out.Let me know if this works out for you, so that it will be helpful for others also. Thank You.
It works perfectly, thank you.
I just have a problem with one site where there are multiply tables defined in each other,
Could you help me how to modify this code in order to only download one table defined with its ID?
Thank you!
iTable is the variable that extracts each table in the webpage. If you don’t want to extract all tables, exit the loop when iTable > 0. before this line.
if iTable> 0 then Exit For ‘–> Add this line. If you still face issue, let me know the webpage to resolve issue.
With HTML_Content.getElementsByTagName(“table”)(iTable)
Add this line before statement “With”.
If Tab1.ID = “ResultsGrid” Then
and also add an “End IF” after the “End With” Line. I checked this. It will work now.
But you will get some empty lines, guess you will be able to resolve it.
getting following error
not enough storage is available -2147024882
please help
Need more info to debug the code. At which line or code you are getting this error?
(or Email me with a TeamViewer passcode to connect to your machine and resolve the issue)
This is a fantastic piece of code. Exactly what i need.
I use it on but it extracts all the tables in the page. I only need the one with the matchups. How can the code be modified for this?
Please refer the comments from others on this same page. There was a similar query and solution provided. Post again, if that solution is not feasible for your issue.
I tried your solution. But for the tablename i provided i get no results. Not sure what i am doing wrong
That is cool, and i’m already using this “.innertext” method, but my table has some links in it that i’d like to paste too. Does anyone here knows how can i get those? thx
What is the URL that your are trying to fetch? or Post a sample web page url to debug and provide a solution.
Helo Kumarapush!
I think that i’ve got something using innerhtml and string manipulation.
Bu tnow i’m trapped into this. Why doesn’t it work?
links(1) = “http://olp.conseur.org” & “/” & Mid(link(1), InStr(1, link(1), “href”) + 7, _
InStr(1, link(1), “>”) – InStr(1, link(1), “href”) – 8)
Knowing that links(1) is like this below:
7
Many thanks
1. Are you missing a ‘s’ in link(1). should it be link(1) or links(1)
2. Please do this and it would help in debugging the issue.
Place the below line just before the code you have mentioned.
Thisworkbook.sheets(2).cells(1,1) = link(1)
Thisworkbook.sheets(2).cells(2,1) = links(1)
and reply with the texts that you get in these cells with double quotes.
Hi,
I want to extract only one Table in the web page that is provided in the somewhere middle of the page. There are many tables in the page and sometime the desired table is not present in the webpage.
I have used “GetAllTables” but as provide me all the tables, it is not that useful.
HTML Code is something like:
Related U.S. Patent Documents
Application NumberFiling DatePatent NumberIssue Date<TD PCT/JP2011/005332Sep 22, 2011<TD
Foreign Application Priority Data
Sep 28, 2010
[JP]
2010-216445
Table just after the HTML tag “Related U.S. Patent Documents”.