
Thread: Extracting data from Web HTML

  1. #1
    nscjpn (Seeker) | Joined Apr 2018 | Excel 2016

    Extracting data from Web HTML

    I am unable to download the data from the site below using Power Query:
    https://fcra2010.in/ngos-raising-ove...f-fc-annually/

    The Power Query result shows only a Document entry, not a Table.
    Is it possible to download the tables from this page?


  2. #2
    Ken Puls (Administrator) | Joined Mar 2011 | Nanaimo, BC, Canada | Excel Office 365 Insider

    Hi there,

    I looked at this with both Excel and Power BI (which has a newer/better From Web experience.) Neither of them can extract data from the body of the page. Excel's connector requires <table> tags in the HTML (which this page doesn't have.) Power BI can also extract things based on the page's CSS. Whatever the programmer did, they aren't using CSS to render these either.

    So unfortunately, it looks like Power Query isn't going to be able to extract data from these pages.
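
    A quick way to check the "no <table> tags" observation outside Power Query is to count table tags in the HTML source. This is only a minimal sketch in Python (not Power Query), and the two sample strings are made up for illustration; they are not taken from the site in question. If the count is zero for the downloaded source while the browser still shows a table, the table is probably built by script after the page loads, which is an assumption and not something confirmed in this thread.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts opening <table> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "table":
            self.tables += 1


def count_tables(html: str) -> int:
    """Return how many <table> elements start in the given HTML text."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical samples: a script-driven page and a page with a real table.
script_only = "<html><body><div id='grid'></div><script>render()</script></body></html>"
with_table = "<html><body><table><tr><td>1</td></tr></table></body></html>"

print(count_tables(script_only))
print(count_tables(with_table))
```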
    Ken Puls, FCPA, FCMA, MS MVP


  3. #3
    nscjpn (Seeker) | Joined Apr 2018 | Excel 2016

    Originally posted by Ken Puls:
    Hi there,

    I looked at this with both Excel and Power BI (which has a newer/better From Web experience.) Neither of them can extract data from the body of the page. Excel's connector requires <table> tags in the HTML (which this page doesn't have.) Power BI can also extract things based on the page's CSS. Whatever the programmer did, they aren't using CSS to render these either.

    So unfortunately, it looks like Power Query isn't going to be able to extract data from these pages.
    Thank you. I understand that the site is based on Java and hence PQ is not working.

  4. #4
    garylhaas (Seeker) | Joined Apr 2016 | Milwaukee, WI | Office 365

    Option 1: Press Ctrl+S to save the web page's HTML to your PC, then use Power Query to extract the data from the text (without navigating the CSS).
    Option 2: Use Power Automate Desktop.
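
    To make option 1 concrete, here is a rough sketch of pulling table rows out of saved HTML text. It uses Python's standard library rather than Power Query, so treat it as an illustration of the idea and not as the method used in this thread. The sample markup and the file name in the comment are hypothetical, and whether a saved file actually contains the rendered table depends on the browser and the save mode.

```python
from html.parser import HTMLParser


class RowExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html: str) -> list:
    """Return a list of rows, each row being a list of cell strings."""
    parser = RowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample; with a real file you might read it with something like
# pathlib.Path("saved_page.html").read_text(encoding="utf-8").
sample = (
    "<table><tbody>"
    "<tr><td>1</td><td>Alpha Trust</td><td>x</td><td> 12.5 </td></tr>"
    "<tr><td>2</td><td>Beta <b>Society</b></td><td>y</td><td>7</td></tr>"
    "</tbody></table>"
)
rows = extract_rows(sample)
for row in rows:
    print(row)
```

    The resulting list of rows can then be written to CSV and loaded into Excel or Power Query as ordinary text data.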

    Remember what they said about Power Query
    Power Query is powerful they said
    Power Query can do lots of stuff they said
    Power Query will build the code for you they said
    Power Query is easy they said

    I was able to extract data from the link in under 3 minutes
    (after tinkering for 15 minutes and inserting a wait statement)

    Power Automate Desktop will also step through the web pages of the link

    Think of the following as Power Automate Desktop's "M-code"
    Code:
    WebAutomation.LaunchFirefox.LaunchFirefox
      Url: $'''https://fcra2010.in/ngos-raising-over-a-crore-of-fc-annually/'''
      WindowState: WebAutomation.BrowserWindowState.Maximized
      ClearCache: False
      ClearCookies: False
      Timeout: 60
      BrowserInstance=> Browser

    WAIT 20

    WebAutomation.ExtractData.ExtractTableInExcel
      BrowserInstance: Browser
      Control: $'''html > body > div > div > div > div > main > article > div > div > div > section:eq(2) > div > div > div > div > div > div > table > tbody > tr'''
      ExtractionParameters: {
        [$'''td:eq(0)''', $'''Own Text''', $'''''', $'''Value #1'''],
        [$'''td:eq(1)''', $'''Own Text''', $'''''', $'''Value #2'''],
        [$'''td:eq(3)''', $'''Own Text''', $'''''', $'''Value #3'''] }
      ExcelInstance=> ExcelInstance

    Excel.CloseExcel.CloseAndSaveAs
      Instance: ExcelInstance
      DocumentFormat: Excel.ExcelFormat.FromExtension
      DocumentPath: $'''c:\\temp\\trashme.xlsm'''
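
    A note on the extraction parameters in the script above: td:eq(0), td:eq(1) and td:eq(3) use jQuery-style zero-based indexes, so they pick the first, second and fourth cell of each row and skip the third. The small Python sketch below mirrors that selection on already-parsed rows. The function name and the sample rows are made up for illustration, and padding short rows with an empty string is a choice made here, not documented Power Automate Desktop behaviour.

```python
def pick_columns(rows, indexes=(0, 1, 3)):
    """Keep only the cells at the given zero-based positions in each row.

    Rows that are too short get an empty string for the missing cell,
    so every output row has the same number of columns.
    """
    return [
        [row[i] if i < len(row) else "" for i in indexes]
        for row in rows
    ]


# Hypothetical rows standing in for the <tr> elements matched by the selector.
sample_rows = [
    ["1", "Alpha Trust", "x", "12.5"],
    ["2", "Beta Society"],
]
picked = pick_columns(sample_rows)
for row in picked:
    print(row)
```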
    Last edited by Rebekah; 2022-02-01 at 08:59 PM. Reason: Added code tags

  5. #5
    pinarello (Conjurer) | Joined Jun 2019 | Germany | Office 365

    In Windows 11 the online version of Power Automate is available. This is probably not sufficient to reproduce your demo?

  6. #6
    garylhaas (Seeker) | Joined Apr 2016 | Milwaukee, WI | Office 365

    Originally posted by pinarello:
    In Windows 11 the online version of Power Automate is available. This is probably not sufficient to reproduce your demo?
    On Windows 10, you install the free Power Automate Desktop.
