Import Converter deletes line feeds automatically

pinarello (Member, joined Jun 21, 2019; Germany; Excel version: Office 365)

After loading a web page with Power Query, I noticed that the import converter removed the line feeds (LF, Chr 10) during loading. As a result, I can no longer split the text into the corresponding columns correctly.

However, when I copy the text of an individual web page to the clipboard and paste it into Excel's formula bar, I can see the line break (line 33) and replace it (line 35).

I have documented an example in the attached Excel workbook.

Is there a way to solve this problem?
 

Attachments

  • xlguru - Import Converter deletes line feeds automatically (PQ).xlsx
    130.3 KB · Views: 16
I'll be honest, the only way I know to get around this is very ugly... it's to import the page as a text file instead of html, and do all of the parsing manually. Nasty, nasty business. If you want to do that, you'd just need to go to your source step, click the gear icon, and then change the "Open file as" to Text File.
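
For reference, switching "Open file as" to Text File typically produces a source step along the lines of the sketch below. The URL is a placeholder, and the 65001 (UTF-8) encoding is an assumption about the page, not something stated in this thread.

```powerquery
let
    // Hypothetical URL: replace with the page you are importing
    Url = "https://example.com/page.html",
    // Read the raw response line by line instead of letting the HTML converter parse it.
    // 65001 = UTF-8 (assumed); the result is a one-column table named Column1
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents(Url), null, null, 65001)})
in
    Source
```

Each row of Column1 then holds one line of the page's HTML source, with nothing stripped by the converter.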

After that though... then the fun begins. You'd probably want to:
  • Add a custom column with the formula: if [Column1] = "<!-- show threads -->" then "remove" else null
  • Fill the new column up
  • Remove all rows that have "remove" in that column
  • After that, expect a lot of column splitting and logic tests to work out which data you need
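
The first three steps above can be sketched in M roughly as follows. The sample rows are made up so that the query runs on its own, and the step and column names (Flagged, FilledUp, Kept, "Flag") are illustrative only; in practice Source would be the text import of the page.

```powerquery
let
    // Stand-in for the text import: one row per line of HTML (made-up sample values)
    Source = Table.FromColumns({{
        "<html>",
        "<head>",
        "<!-- show threads -->",
        "<div>Thread A</div>",
        "<div>Thread B</div>"
    }}),
    // Mark the marker row; every other row gets null
    Flagged = Table.AddColumn(Source, "Flag",
        each if [Column1] = "<!-- show threads -->" then "remove" else null),
    // Fill up so that every row above the marker also carries "remove"
    FilledUp = Table.FillUp(Flagged, {"Flag"}),
    // Keep only rows that were not flagged (rows below the marker stay null)
    Kept = Table.SelectRows(FilledUp, each [Flag] <> "remove"),
    // Drop the helper column again
    Result = Table.RemoveColumns(Kept, {"Flag"})
in
    Result
```

With this sample data, only the two "Thread" rows remain; the marker row itself is removed along with everything above it.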

It won't be fun, but it should preserve all of the characters you need...
 
Hello, Ken,

first of all, thank you for looking at my request and mentioning the possibility of text import.

I had not noticed this option before, so I tested it right away.

Since I know nothing about HTML, it took me a while to recognize the patterns and repetitions.

But once my eyes had gotten used to the "chaos" a bit, it was actually quite easy to get the website's data into the desired form.

By the way, I also noticed that the line feed, which I thought Power Query was swallowing during import, is not present in the HTML code at all. At those places in the HTML code I only see a <div>.

So I guess that it is an automatic Windows function that converts the <div> into a line feed when using copy & paste.

In case you or others are interested, here is the workbook with the additional Power Query query.
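
If the line feeds are wanted in Power Query as well, one option is to insert them wherever a <div> starts and split on them afterwards. The sketch below uses a made-up HTML fragment and assumes plain <div> tags without attributes. As a side note, the conversion during copy & paste is more likely done by the browser, which puts block-level elements such as <div> on separate lines when text is copied as plain text; the thread itself does not confirm this.

```powerquery
let
    // Made-up fragment standing in for one piece of the imported HTML
    Html = "Name<div>Street</div><div>City</div>",
    // Put a line feed (Chr 10) wherever a <div> opens
    WithLf = Text.Replace(Html, "<div>", "#(lf)"),
    // Drop the closing tags
    Cleaned = Text.Replace(WithLf, "</div>", ""),
    // Split on the line feed to get one item per visual line
    Parts = Text.Split(Cleaned, "#(lf)")
in
    Parts
```

For this fragment the result is a list with the three items Name, Street and City, which can then be turned into columns.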
 

Attachments

  • xlguru - Import Converter deletes line feeds automatically (PQ).xlsx
    138.6 KB · Views: 7