How to find and highlight a specific occurrence of a symbol?
How do I find and highlight a specific occurrence of a symbol in a large text file?
I need to find the 6th occurrence of =========== and highlight it.
What have you tried and how isn’t it working? It might be instructive to see your thought processes in finding a solution…
I tried to find it using the following code
But it's manual, and it highlights every =========
I want only a specific occurrence. It could be the 6th, the 60th, or even the 100th.
I have something that might work. It does require the cursor to be at the very start of the file, however (so the file must be open in Notepad++). Once the search has found the text, hitting search again will carry on to the next multiple of the number you want, so 60th, then 120th, etc.
The regex also needs the ". matches newline" box ticked.
Maybe someone could expand on this, alter it to fit better.
So if you want the 6th occurrence, use 5 in the regex (where you see 5). If you want the 60th, use 59. It might not work if there are any sequences of 2 consecutive lines with "=" in them. If there are not 11 "=" characters, change the number in the regex to suit.
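The regex from this post is not shown above, so here is only a rough sketch of the same "skip n-1, then take the nth" idea, written for Python's re engine rather than Notepad++. The names nth_separator and make_sample, and the sample data, are made up for illustration.

```python
import re

SEP = "=" * 11  # the 11-character separator from the question


def nth_separator(text, n):
    """Return the (start, end) offsets of the n-th SEP in text, or None."""
    # Lazily skip n-1 occurrences from the very start, then capture the n-th.
    pattern = re.compile(r"(?s)\A(?:.*?%s){%d}.*?(%s)" % (SEP, n - 1, SEP))
    match = pattern.search(text)
    return match.span(1) if match else None


def make_sample(count):
    """Hypothetical test data: `count` blocks, each followed by a separator line."""
    return "".join("block %d\n%s\n" % (i, SEP) for i in range(1, count + 1))


sample = make_sample(8)
span = nth_separator(sample, 6)  # offsets of the 6th separator
```

Note that the count inside the braces is n-1, which matches the "use 5 for the 6th" advice above.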
Terry R is heading in the direction I was going, but he posted first. I would slightly tweak his regex (which works) to get this:
- (?s) at the start to avoid having the "newline" box be a dependency
- Remove some unnecessary group-capturing parentheses
- .* at the end to prevent matching every multiple-of-six grouping (it will highlight/mark from the sixth grouping of = through the end of file… maybe this is undesirable, but I think in a large file it would make it easier to find where the highlighting/marking begins!)
Note that the OP said he wanted to highlight the match, so I take this to mean using the Mark tab of the Find window for this.
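To illustrate what the trailing .* buys, here is a small sketch in Python's re engine (not Notepad++, and not the exact regex from this post, which is not shown). The 13-separator sample text is an assumption made up for the demonstration.

```python
import re

SEP = "=" * 11
# Hypothetical sample: 13 blocks, each followed by a separator line.
text = "".join("block %d\n%s\n" % (i, SEP) for i in range(1, 14))

# Without a trailing .*, each match stops right after a 6th separator,
# so a repeated search carries on to the 12th, and so on.
stepping = re.compile(r"(?s)(?:.*?={11}){6}")

# With a trailing .*, the first match swallows the rest of the text,
# leaving nothing for a second match.
greedy_tail = re.compile(r"(?s)(?:.*?={11}){6}.*")

stepping_ends = [m.end() for m in stepping.finditer(text)]
greedy_matches = list(greedy_tail.finditer(text))
```

In this sketch the first pattern matches twice (ending at the 6th and 12th separators), while the second matches once and runs to the end of the text.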
Terry R said:
It might not work if there are any sequences of 2 lines together with “=” in them
It WILL work in such a case!
Here’s an explanation of the regex:
- Use these options for the whole regular expression
- Match the regular expression below
- Match any single character
- Keep the text matched so far out of the overall regex match
- Match the character string “===========” literally
- Match any single character
Created with RegexBuddy
Thanks Scott for those tweaks.
I have a question about why the \A didn’t work in this instance. Had it done so, then it would ALWAYS start at the start of the file, and even if run multiple times it would still select only the nth occurrence. As well, your .* at the end of the regex wouldn’t then be required.
I do continue to forget the (?i) at the start, often using the tick boxes over this approach. Of course, using these allows for a standard approach to the regex (not having to change tick boxes for every expression) and allows each regex to stand on its own.
I hadn’t tested the occurrence of 2 consecutive lines of '='s hence my disclaimer, that was only added into the reply as I was typing it (my testing wasn’t exhaustive).
(The day you stop learning is the day you die!)
The \A feature is broken in the N++ implementation of the Boost Regex library. For an alternate regex engine, see the last part of the updated FAQ topic, below:
However, I found a very simple way to prevent the regex from matching every 6 times the
Just add a specific expression at the very beginning of the file that does not exist elsewhere in the current file. Then, Scott, we just put that expression first in your regex :-)) For instance:
I added the ### mark on top of the Rumi’s text
And I used the modified regex
I added the ### mark on top of the Rumi’s text
Well…sure, but when one answers a regex question here, one sort of assumes that changing the OP’s source text to solve a problem is not allowed. :-)
Of course, doing that when solving a problem that requires “table-building”…well, then…maybe in that case we bend the rules, eh? :-D
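For anyone who does not mind touching the source text, the marker idea can be sketched like this in Python's re engine. This is not the modified regex from the post above (it is not shown); the ### marker comes from the thread, while the sample body and variable names are assumptions for illustration.

```python
import re

SEP = "=" * 11
MARK = "###"  # sentinel assumed to appear nowhere else in the text

# Hypothetical sample: the marker line, then 13 blocks with separator lines.
body = "".join("block %d\n%s\n" % (i, SEP) for i in range(1, 14))
text = MARK + "\n" + body

# The match must begin at the unique marker, so it cannot restart further
# down the file and land on the 12th separator.
anchored = re.compile(r"(?s)%s(?:.*?={11}){5}.*?(={11})" % re.escape(MARK))
hits = [m.start(1) for m in anchored.finditer(text)]
```

Because the marker exists only once, repeating the search in this sketch finds the 6th separator and nothing else.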
I think I may have found another regex (building on what we already have) that will, no matter where in the file the cursor is, ALWAYS select the nth occurrence. I used another string check that I’d provided some weeks ago to someone who originally stated the \A didn’t work for them. My latest rendition is:
(?<!\x0A)^ should (hopefully) look for a line starting position that has no line feed/carriage return immediately before it (actually I only test for the line feed portion). So far my tests have shown it ALWAYS selects only the 6th occurrence (in the example), and even a second click does not change position. With the cursor placed in various positions within the file, it still correctly locates the 6th position.
This negates the need to be careful where the cursor is in the file before looking for the nth occurrence, and also the need to include the additional .* at the end to grab the rest of the content, which was there to prevent a double click going to the nth * 2 occurrence.
Since \A seems very problematic, I’ve now added (?<!\x0A)^ to my arsenal!
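As a sanity check of the logic only, here is how (?<!\x0A)^ behaves in Python's re engine, written there as (?<!\n)^ with multiline mode on. Python is not the Boost engine used by Notepad++, so this says nothing about how N++ itself handles it, and the sample text is made up.

```python
import re

# Hypothetical sample with an empty line in the middle.
text = "first\nsecond\n\nfourth\n"

# In multiline mode, ^ matches at the start of the text and after each \n.
line_starts = [m.start() for m in re.finditer(r"(?m)^", text)]

# Requiring "no line feed just before" leaves only the very first position.
file_start_only = [m.start() for m in re.finditer(r"(?m)(?<!\n)^", text)]
```

In this sketch the second list contains only offset 0, which is what \A would give.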
Nice one… hopefully a downside to this is NOT found. I’m not “with” Notepad++ or RegexBuddy right now, but maybe this also works?:
(?<!\R)^
Maybe not, though, as lookbehinds must be of constant length and \R could be of length 1 (in the case of \n) or length 2 (for \r\n)… it would be nice if it did, though, because I think it is nicer on the eyes than
Sorry, not a sheriff, maybe a deputy.
I’m quite enjoying helping out (where I can), although I do need to curb my enthusiasm somewhat. And thanks to you @guy038 for your support.
And yes, Scott, I agree the \R or \r\n would probably look better; I just haven’t tested that yet. As my dad always said, measure twice, cut once. So I need to test, refine, and test again before presenting!
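On the constant-length concern, here is a quick check in Python's re engine, which also insists on fixed-width lookbehinds. This is only an analogy: Boost in Notepad++ may accept or reject different things, and Python's re has no \R escape at all. The helper name compiles is made up.

```python
import re


def compiles(pattern):
    """Return True if Python's re engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Every alternative is exactly one character wide, so this is accepted.
single_chars = compiles(r"(?m)(?<!\n|\r|\f)^")

# Alternatives of width 2 and width 1 make the lookbehind variable-width.
mixed_width = compiles(r"(?m)(?<!\r\n|\n)^")

# \R is not a known escape in Python's re, so this fails for that reason.
with_backslash_r = compiles(r"(?m)(?<!\R)^")
```

So in this engine, a lookbehind listing single EOL characters works, while one mixing \r\n with \n does not.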
Very clever deduction, Terry. If we generalize to any kind of EOL characters, it gives
Note that I added the \f syntax (control character Form Feed, decimal 12) because, given, for instance, the text abcdefghij with the Form Feed char between the strings abcde and fghij, the regex (?-s)^. would also match the f letter after the FF char!
So, to be short, the regex (?<!\n|\r|\f)^ seems a very nice work-around to emulate the bugged \A feature of the N++ regex engine :-))
I used the verb seems and not the verb is because, unfortunately, there are still some problems with that syntax :-((
Let’s work on that sample text, below, that you will copy on a new tab :
Notepad++ v7.5.8 bug-fixes: This is a simple text 12345><67890 to test the \A feature
Note: For all the tests below, the options Wrap around are ticked!
- First problem :
Let’s suppose that your cursor is located between the > and < characters, on the 5th line. Using the regex (?<!\n|\r|\f)^(?-s). (which should stand for \A(?-s).), it does find the letter N of Notepad++, at the top of the text, and any further click on the Find Next button does not find anything else. Nice!
Now place the cursor at the beginning of the 5th line, right before the 1 digit, without any selection, and re-run the (?<!\n|\r|\f)^(?-s). regex. This time, the first click on the Find Next button wrongly matches the 1 digit. The second click finds the letter N, as expected, and any subsequent clicks do nothing.
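The wrong match on the 1 digit looks like what happens when a lookbehind cannot see the text before the search start. That is only an analogy, not a diagnosis of the N++ engine. It can be reproduced in Python's re by searching a slice instead of passing a start offset; the sample text and variable names here are made up.

```python
import re

# Hypothetical sample; the third line plays the role of the "5th line" above.
text = "Notepad++\nsample\n12345\nlast\n"
pattern = re.compile(r"(?m)(?<!\n)^.")
cursor = text.index("12345")  # offset of the start of the third line

# Start offset given: the lookbehind still sees the \n before the cursor,
# so nothing from this point on qualifies.
with_context = pattern.search(text, cursor)

# Slice given: the engine sees nothing before the cursor, so the cursor
# position looks like the start of the file and the 1 digit matches.
without_context = pattern.search(text[cursor:])

# Searching the whole text from the top finds the first character.
from_top = pattern.search(text)
```

If the N++ search hands the engine only the text from the cursor onward, that would explain the first click; this is a guess on my part.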
- Second problem :
Let’s apply the new regex (?<!\n|\r|\f)^(?-s).*\R.* (which should be a work-around for \A(?-s).*\R.*) against our sample text. The result is identical to what I described in the point just above. That is to say:
- If the cursor was between the > and < characters, it matches all contents of the 1st line, with its EOL chars, and all contents of the 2nd line, without its EOL chars
- If the cursor was at the beginning of the 5th line, then:
- After a first click, it matches all contents of the 5th line, with its EOL chars + all contents of the 6th line, without its EOL chars
- After a second click, it matches all contents of the 1st line, with its EOL chars + all contents of the 2nd line, without its EOL chars
Now, let’s slightly change the regex, adding an \R syntax at the end of the regex, which becomes:
Now, even if we place the cursor between the > and < characters, any click on the Find Next button will match, successively, two consecutive lines (the 2nd, then the 4th one, … and so on) :-((
Just because the end of this regex matches a
Anyway, Terry, don’t be sad! Logically, your (?<!\n|\r|\f)^ regex should work as a work-around for the \A syntax. It’s simply that our present regex engine does not handle backward assertions properly, either! I didn’t test it yet, but I suppose that your regex should work in some regex testers on the Web :-))
And I agree with Scott: you certainly are a "regex sheriff"!