Selecting only certain tables from PDF import

GreenBoy

New member
Joined
Apr 13, 2015
Messages
26
Reaction score
0
Points
0
Excel Version(s)
Office 365
Hi - hope everyone is keeping well.

I am trying to import data from PDF's which contain a mix of text and numeric data information - its the numeric data I need.

However there are several issues presenting themselves, which need to be handled as part of the import, and I am looking for any help at this stage.

The import of PD process works nicely, but I am getting 10 - 12 tables being presented as part of the import - out of these only 2 or 3 of the tables are of use, and are not contiguous (e.g. not handily tables, 2,3 and 4 for example) but rather tables 2, 4 and 6 or 7 for example - although the number of tables of useful data changes per report imported.

The only thing the useful data has that the other tables don't, is that the header for column 1 is always "Date" - so the question is, is there a way to inspect the tables header for column 1 on each table at time of import and only import those that are "Date", or is there a way to only append tables with the "Date" header after import? **** CORRECTION *** the Header is only the same text e.g. "Date", after the header is promoted.

As I say this is only the start of the issues with data formatting etc, but by limiting the data at import it will massively assist with the cleansing and correcting of the rest of the data

Many thanks in advance
Greenboy
 
Last edited:
If you can import so that you have a column, Tables, where each row contains one of your tables, you could pull the column 1 "header" out into a new column

Code:
= Table.AddColumn(#"Kept First Rows", "Custom", each [Tables]{0}[Column1])
Filter on that column, delete it and then expand the Tables column.
 
NormS

Thank you very much - I have seen about [Tables]{0}[Column1] before somewhere but have never really known much about it - obviously I need to do some home schooling on this.

Thanks again
GreenBoy
 
Back
Top