Extracting data from Web HTML

Hi there,

I looked at this with both Excel and Power BI (which has a newer/better From Web experience.) Neither of them can extract data from the body of the page. Excel's connector requires <table> tags in the HTML (which this page doesn't have.) Power BI can also extract things based on the page's CSS. Whatever the programmer did, they aren't using CSS to render these either.

So unfortunately, it looks like Power Query isn't going to be able to extract data from these pages.
 
Hi there,

I looked at this with both Excel and Power BI (which has a newer/better From Web experience.) Neither of them can extract data from the body of the page. Excel's connector requires <table> tags in the HTML (which this page doesn't have.) Power BI can also extract things based on the page's CSS. Whatever the programmer did, they aren't using CSS to render these either.

So unfortunately, it looks like Power Query isn't going to be able to extract data from these pages.
Thank you.I understand that the site is based on Java and hence PQ is not working
 
option 1: CTRL S / Save webpage HTML to PC - use Power Query to extract data from text (without navigating CSS)
option 2: Power Automate Desktop

Remember what they said about Power Query
Power Query is powerful they said
Power Query can do lots of stuff they said
Power Query will build the code for you they said
Power Query is easy they said

I was able to extract data from the link in under 3 minutes
(after tinkering for 15 minutes and inserting a wait statement)

Power Automate Desktop will also step through the web pages of the link

Think of the following as Power Automate Desktop's "M-code"
Code:
WebAutomation.LaunchFirefox.LaunchFirefox 
  Url: $'''https://fcra2010.in/ngos-raising-over-a-crore-of-fc-annually/''' 
  WindowState: WebAutomation.BrowserWindowState.Maximized 
  ClearCache: False 
  ClearCookies: False Timeout: 
  60 BrowserInstance=> Browser

WAIT 20

WebAutomation.ExtractData.ExtractTableInExcel 
   BrowserInstance: Browser Control: 
   $'''html > body > div > div > div > div > main > article > div > div > div > 
   section:eq(2) > div > div > div > div > div > div > table > tbody > tr''' 
   ExtractionParameters: {
        [$'''td:eq(0)''', $'''Own Text''', $'''''', $'''Value #1'''],
        [$'''td:eq(1)''', $'''Own Text''', $'''''', $'''Value #2'''],
        [$'''td:eq(3)''', $'''Own Text''', $'''''', $'''Value #3'''] } 
   ExcelInstance=> ExcelInstance

Excel.CloseExcel.CloseAndSaveAs 
    Instance: ExcelInstance 
    DocumentFormat: Excel.ExcelFormat.FromExtension 
     DocumentPath: $'''c:\\temp\\trashme.xlsm'''
 

Attachments

  • trashme.xlsm
    9.3 KB · Views: 1
Last edited by a moderator:
In Windows 11 the online version of Power Automate is available. This is probably not sufficient to reproduce your demo?
 
Which function can I use to access the generated code and change it if necessary?
 
PowerAutomate Desktop

Which function can I use to access the generated code and change it if necessary?

have you installed Power Automate Desktop
have you looked at videos on YouTube about web scrapping with Power Automate Desktop

i was not aware that you had posted a question

it might be helpful to communicate through email
garylhaas2005@yahoo.com
 
Back
Top