Using latest version 7.5 64-bit. How to remove duplicate lines
Greetings, new forum user here.
I am using the latest version, 7.5 64-bit. I have exceedingly large text files and need to remove duplicate entries.
How is this accomplished?
Welcome to the forum! We’re a helpful community when possible, but we need a little more information.
- Define exceedingly large
- Give us a ballpark figure - even 100 KB could be considered large for a text file, or 10 MB, 5 GB, ???
- Some users have reported having difficulty even opening files over 2 GiB
- Define duplicate entries
- Are they always a single line duplicated, multiple duplicate lines, etc.?
- Provide some example duplicate entries
- We have some awesome find-and-replace experts on this forum, but without examples that clearly identify the duplicate sections, it’s more difficult to come up with a solution.
- If you can’t provide the examples (confidential, for example), provide us with similar text.
More effort put into defining the problem and communicating it will often reap rewards multiple times the effort!
I am having the same challenge. I am on version 7.5 64-bit.
I cannot figure out how to simply remove duplicate records.
Every Google search result says to use TextFX. However, TextFX does not work with 7.5 64-bit.
I had 250 rows of data that were just order numbers. For example,
I wanted to remove the duplicates and get the distinct values. I searched Google for quite some time and could not find an answer. I ended up copying the data into Excel and removing the duplicates there.
Hello, @matt-czugała and All,
I was wondering what kind of data you needed duplicate records removed from. Luckily, following @glennfromiowa's advice, you have shown us that your text is already sorted! In that case, it’s very easy to remove duplicate records indeed!
Open your file in N++
If necessary, add a line-break, after the last line of your file
Open the Replace dialog (Ctrl + H)
Type (?-s)^(.+\R)\1+ in the Find what: zone
Type \1 in the Replace with: zone
Check the two options
Click on the
Et voilà !
So, from your example text :
0040067
0040134
0040134
0040134
0040134
0040271
0040271
you get, after replacement :
0040067
0040134
0040271
First, the syntax (?-s) forces the regex engine to interpret any dot . as a single standard character, so a dot never matches an End of Line character
Then, after any beginning of line ^, the part (.+\R) matches all the standard characters of the current line, followed by its End of Line character(s)
As it is enclosed in parentheses, the entire line is stored as group 1
Then, with \1+, the regex engine tries to match the greatest non-null number of consecutive repetitions of that line
In replacement, that whole block of lines is simply replaced by the single line \1
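For anyone who would rather check the logic outside N++, here is a rough Python sketch of the same idea. It is not what Notepad++ runs internally. Python's re module has no \R, so \r?\n is used in its place (an assumption of this sketch), and the function name collapse_adjacent_duplicates is made up for illustration. Like the regex above, it only removes duplicates that sit on consecutive lines, so the text must be sorted first.

```python
import re

# Python's `re` has no \R, so \r?\n stands in for "any line ending" here
# (an assumption of this sketch). (?m) lets ^ match at the start of every
# line; the dot already excludes \n by default, which is what (?-s)
# guarantees in Notepad++.
ADJACENT_DUPLICATES = re.compile(r"(?m)^(.+\r?\n)\1+")


def collapse_adjacent_duplicates(text: str) -> str:
    """Replace each run of identical consecutive lines with a single copy."""
    if text and not text.endswith("\n"):
        # Same reason as adding a line break after the last line of the file:
        # the final line needs a line ending to be comparable to the others.
        text += "\n"
    return ADJACENT_DUPLICATES.sub(r"\1", text)


sample = "0040067\n0040134\n0040134\n0040134\n0040134\n0040271\n0040271\n"
result = collapse_adjacent_duplicates(sample)
print(result, end="")
# prints:
# 0040067
# 0040134
# 0040271
```

Note that lines which repeat but are not next to each other (a, b, a) are left alone, exactly as with the N++ regex.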
Thanks Guy, that appears to be working for me.
I feel like I’m going to have to bookmark this page and come back often to remove duplicates. Hopefully TextFX or something similar can be updated for this latest version. It was so much easier when I could click my way to what I needed instead of having to type in codes.