Results 1 to 4 of 4

Thread: Selecting only certain tables from PDF import

  1. #1
    Acolyte GreenBoy's Avatar
    Join Date
    Apr 2015
    Posts
    26
    Articles
    0
    Excel Version
    Office 365

    Question Selecting only certain tables from PDF import



    Register for a FREE account, and/
    or Log in to avoid these ads!

    Hi - hope everyone is keeping well.

    I am trying to import data from PDF's which contain a mix of text and numeric data information - its the numeric data I need.

    However there are several issues presenting themselves, which need to be handled as part of the import, and I am looking for any help at this stage.

    The import of PD process works nicely, but I am getting 10 - 12 tables being presented as part of the import - out of these only 2 or 3 of the tables are of use, and are not contiguous (e.g. not handily tables, 2,3 and 4 for example) but rather tables 2, 4 and 6 or 7 for example - although the number of tables of useful data changes per report imported.

    The only thing the useful data has that the other tables don't, is that the header for column 1 is always "Date" - so the question is, is there a way to inspect the tables header for column 1 on each table at time of import and only import those that are "Date", or is there a way to only append tables with the "Date" header after import? **** CORRECTION *** the Header is only the same text e.g. "Date", after the header is promoted.

    As I say this is only the start of the issues with data formatting etc, but by limiting the data at import it will massively assist with the cleansing and correcting of the rest of the data

    Many thanks in advance
    Greenboy
    Last edited by GreenBoy; 2020-05-17 at 04:49 PM.

  2. #2
    Conjurer NormS's Avatar
    Join Date
    Jul 2017
    Posts
    119
    Articles
    0
    Excel Version
    Excel 2016 ProPlus
    If you can import so that you have a column, Tables, where each row contains one of your tables, you could pull the column 1 "header" out into a new column

    Code:
    = Table.AddColumn(#"Kept First Rows", "Custom", each [Tables]{0}[Column1])
    Filter on that column, delete it and then expand the Tables column.

  3. #3
    Acolyte GreenBoy's Avatar
    Join Date
    Apr 2015
    Posts
    26
    Articles
    0
    Excel Version
    Office 365
    NormS

    Thank you very much - I have seen about [Tables]{0}[Column1] before somewhere but have never really known much about it - obviously I need to do some home schooling on this.

    Thanks again
    GreenBoy

  4. #4
    Conjurer NormS's Avatar
    Join Date
    Jul 2017
    Posts
    119
    Articles
    0
    Excel Version
    Excel 2016 ProPlus
    This post from Ken, describing how to index subsets of rows within a table, opened my eyes to performing operations on a column containing tables

    https://www.excelguru.ca/blog/2018/0...g-power-query/

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •