Need to merge all lines before a blank line
I have some data that I would like to edit. My data is as looks like this;
A. M. R. E. F.
THE AVIATION DIRECTOR
P.O. BOX 30125.WILSON AIRPORT.
ABERCROMBIE & KENT LTD…
NAIROBI ~ 00200
ACHARYA TRAVEL AGENCIES LTD.
The data is separated by a blank line:
I would like to have the data as follows;
A. M. R. E. F. THE AVIATION DIRECTOR P.O. BOX 30125.WILSON AIRPORT. NAIROBI. KENYA
ABERCROMBIE & KENT LTD… P.O.BOX 59749. NAIROBI ~ 00200 KENYA. USD
ACHARYA TRAVEL AGENCIES LTD. P.O.BOX 42590. NAIROBI. KENYA. KES
Would someone point in the right direction as per which regex I could use to achieve this. I have a lot of data to process so the automation would help.
I suppose the simplest thing is to do this:
Find what zone:
Replace with zone: make sure this is EMPTY
Search mode: Regular expression
This searches for a line-ending which is not followed directly by another line-ending, and effectively removes the (first) line-ending.
Have “Replace with” field include a single space. That way successive line’s text won’t butt up against the previous line. It could even have a
,instead. That way, down the track it could be reconstituted as it originally looked if needed.
As Leonardo Da Vinci said:
Simplicity is the ultimate sophistication
Have “Replace with” field include a single space
I don’t see where you are going with this…can you explain further? It seems like it would just put the space or the comma on the front of the second and greater lines…hmmmm…
in the example text provided, the all the lines ended directly behind the text, Removing the CR/LF meant the 1st character of the next line was against the last character of the previous line, rather then the OP’s request showing a space between.
@Scott-Sumner With the sample data I had shared your suggestion actually works like a charm. When I try it on the thousands of lines of data where the number of lines before the line break differ, the regex brings all the data in one line. Would you mind having a look at the data or maybe let me know where I am making a mistake?
So, I downloaded your WILSON_1.csv file, without no trouble. Fine !
Then, before thinking anything about regex, I just deduced some facts, about this file :
Text is always written, in an uppercase way and, mainly, located in column
3exceptions, only :
- At row
21, we have “ATT”, in column
Aand “WILLIAM OREMBO”, in column
- At row
1139, we have “ATTN”, in column
Aand “AART MULDER”, in column
- At row
1160, we have “ATTN”, in column
Aand “SALLY”, in column
- At row
=> So, I moved, for these
3rows, the text, in column
B, after the present text of column
A. Then, I selected all that
.CSVtext, located in column
Aand pasted it in a new N++ tab
Now, clicking on the Show all characters icon, I noticed that :
Non-blank lines begin, generally, with a letter but few of them begin with a space character
Non-blank lines end, generally, with a space character, but some don’t
The paragraph line-breaks, generally, begin with a space character
=> I thought it would be better to trim all these characters, first, with the menu command Edit > Blank operations > Trim Leading and Trailing Spaces. ( Of course, this can be done, either, with the SEARCH regex
^\h+|\h+$and the REPLACEMENT zone left
Well, now, our text is quite clean:-)) The next step consists to replace any line-break, which is, both, preceded and followed with a standard character with a single space character, in order to easily visualize the former lines !
Thus, the regex S/R, below :
Select, preferably, the
Click, once, on the
1506replacements occur and your file contains, now,
1715lines only !
- Finally, select all this new text, start Excel, do a
Ctrl + Voperation, in cell
1Aand re-save it, in an Excel format
Et voilà !
This works very well and I have my data the way I wanted it. Thanks a lot guys for your time, I have learnt so much.
Hello, @jesse-mwangi and All,
I just forgot to explain my previous regex S/R ! So :
(?-s)is an on-line modifier which forces the regex engine to consider that the dot meta-character (
.) will match any single standard character and not the EOL chars
Then it searches for the
\Rpart which, globally, represents any kind of EOL characters (
\r\nin Windows files,
\nin Unix files or
\rin MAC files ), but ONLY IF :
The line-break is preceded with a single standard character, due to the positive look-behind
(?<=.)( so preceded with a non-blank line, of the present paragraph )
The line-break is followed with a single standard character, due to the positive look-ahead
(?=.)( so followed with a non-blank line, of the present paragraph )
In replacement, this/these EOL character(s) are, simply, replaced with a single space char,
\x20. Note that you may, as well, simply hit the space key !
Refer to the post, below, for further information on Regular expressions ;-))
@Terry-R is right. I botched the original try at it because I didn’t notice that the replace action didn’t put a space between the lines that were joined. I totally missed that–TWICE! Sorry about that. (Thanks to Terry for pointing that out!)
the regex brings all the data in one line
So the reason that it all ends up on one line is that you have a single blank space on the lines that I interpreted as being totally empty from your sample data. I figured this out by downloading your file (otherwise that would have been quite difficult to discover!).