Regex: Can I delete the content of files that don’t contain certain words?
Good day, everyone. Just a question. I have these words in many files, but not in all of them. For example:
my baby goes away
I want to delete the entire contents of the files that don’t contain these words.
I tried something, but it doesn’t work too well.
I checked the ". matches newline" option and searched for:
^(?!.*\s(my baby goes away)\s).*
First of all, it would be better to back up all the files concerned by the Search/Replacement ;-))
Now, if all these files are located in a specific folder :
Open the Find in Files dialog ( Ctrl + Shift + F )
In the Find what: zone, type
(?s).*\s(my baby goes away)\s?.*|.+
In the Replace with: zone, type
(?1$0)
In the Filters zone, enter
In the Directory zone, specify the folder containing all the concerned files
If necessary, select the Match case option, if the string to search for must have this exact case
Select, of course, the Regular expression search mode
Click on the Replace in Files button
Please verify, one more time, that the FOUR zones, Find what:, Replace with:, Filters: and Directory:, are correctly filled !
Click on the Yes button of the Are you sure? dialog
Et voilà !
=> All the contents of the files that do NOT contain the string my baby goes away ( not embedded in a larger word ) are deleted
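Before running the real replacement, it can be reassuring to preview which files would be emptied. Here is a minimal Python sketch of such a dry run, outside Notepad++. The function name files_that_would_be_emptied and the *.txt filter are assumptions made for the illustration, and Python’s regex engine is only assumed to behave like Boost’s on this pattern.

```python
import re
import tempfile
from pathlib import Path

# Same search regex as in the Find what: zone.
SEARCH = re.compile(r'(?s).*\s(my baby goes away)\s?.*|.+')

def files_that_would_be_emptied(folder: Path, glob: str = "*.txt") -> list[str]:
    """List the names of files whose contents the replacement would delete."""
    names = []
    for path in sorted(folder.glob(glob)):  # non-recursive, unlike "In all sub-folders"
        match = SEARCH.match(path.read_text(encoding="utf-8"))
        # Group 1 unset means the string was not found in this file.
        if match is not None and match.group(1) is None:
            names.append(path.name)
    return names

# Small self-contained demo in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "keep.txt").write_text("oh, my baby goes away\n", encoding="utf-8")
    (folder / "drop.txt").write_text("nothing relevant here\n", encoding="utf-8")
    preview = files_that_would_be_emptied(folder)

print(preview)
```

The sketch only reads the files, so nothing is modified on disk.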
The (?s) syntax, at the very beginning of the search regex, ensures that the regex engine considers the dot regex symbol as matching any single character ( standard or EOL character )
Then, the remainder is an alternative between :
.*\s(my baby goes away)\s?.* : All the contents of the current file scanned, containing at least one string my baby goes away, not glued in a larger expression. So, the last string my baby goes away is stored as group 1
.+ : All the contents of the current file scanned, which do NOT contain the string my baby goes away
In replacement, the syntax (?1$0) is a conditional replacement that means :
If group 1 exists ( your specific string found ), all the contents of the current file are replaced with the entire searched string ( $0 ), that is to say all the contents matched, so the file is left unchanged !
If group 1 does not exist ( NO specific string found ), the replacement is an empty string => All the contents of the current file are simply deleted
A question mark ?, after the final syntax \s, is necessary for the unique case where the string my baby goes away ends the current file, without any final line break !
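To see the two alternatives at work outside Notepad++, here is a minimal Python sketch. Python’s re module has no (?1$0) conditional replacement, so the sketch emulates it with a replacement function. The name keep_or_empty is hypothetical, and Python’s engine is assumed, not guaranteed, to behave like Boost’s on this pattern.

```python
import re

# Same search regex as in the post; (?s) makes the dot match line breaks too.
SEARCH = re.compile(r'(?s).*\s(my baby goes away)\s?.*|.+')

def keep_or_empty(content: str) -> str:
    """Return content unchanged if the string is found, else an empty string."""
    def replace(match: re.Match) -> str:
        # Emulates (?1$0): group 1 defined -> whole match, otherwise nothing.
        return match.group(0) if match.group(1) is not None else ''
    return SEARCH.sub(replace, content)

with_phrase = "line one\nand my baby goes away\nline three\n"
without_phrase = "line one\nnothing to see here\n"
ends_file = "x my baby goes away"  # no final line break

print(repr(keep_or_empty(with_phrase)))
print(repr(keep_or_empty(without_phrase)))
print(repr(keep_or_empty(ends_file)))
```

Note that, because of the leading \s, a file that starts directly with the string, with nothing before it, is treated as not containing it.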
As described above, sometimes, it’s easier to use the general template of a list of alternatives :
(NOT This|NOT That|.....)|(This)|(That)......
All the alternatives to EXCLUDE are re-written, with the syntax \1, in the replacement part
All the alternatives to INCLUDE are replaced, thanks to each syntax (?#....), in the replacement part ( # > 1 ) OR deleted if this syntax is absent
Consider, for instance, the original text, below :
Jane said to Tarzan : "Tarzan" is a very strong person, much more than "Jane" is ! "Tarzan and Jane" or "Jane and Tarzan"
And suppose that we would like to convert to uppercase the first names Tarzan and Jane, ONLY IF they are NOT surrounded by double quotes !
Then, we could use the simple S/R :
As the replacement action is identical, for each first name, we could also use :
Note that when group 2 is defined, group 1 is NOT defined. Then, in replacement, the form \1 stands for an empty string !
Of course, the two following S/R, more complicated, may be used and produce the same replacements :
After replacement, we get, in all cases, the new text, below :
JANE said to TARZAN : "Tarzan" is a very strong person, much more than "Jane" is ! "TARZAN and JANE" or "JANE and TARZAN"
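The general template can also be tried outside Notepad++. Below is a minimal Python sketch that reproduces the result above. The pattern is an assumption built from the template ( quoted names as group 1 to EXCLUDE, bare names as group 2 to INCLUDE ), not necessarily the exact S/R of the post, and the function name uppercase_unquoted is hypothetical.

```python
import re

# Assumed pattern, following the template (NOT This|NOT That)|(This|That):
# group 1 = quoted names to keep untouched, group 2 = bare names to uppercase.
PATTERN = re.compile(r'("Tarzan"|"Jane")|(Tarzan|Jane)')

def uppercase_unquoted(match: re.Match) -> str:
    # Group 1 defined: an excluded alternative, re-written unchanged (like \1).
    if match.group(1) is not None:
        return match.group(1)
    # Otherwise group 2 is defined: an included alternative, uppercased.
    return match.group(2).upper()

text = ('Jane said to Tarzan : "Tarzan" is a very strong person, '
        'much more than "Jane" is ! "Tarzan and Jane" or "Jane and Tarzan"')
result = PATTERN.sub(uppercase_unquoted, text)
print(result)
```

The excluded alternatives come first in the alternation, so at each double quote the engine tries "Tarzan" and "Jane" before the bare names. This is why "Tarzan and Jane" is still converted : neither name is directly enclosed in quotes there.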
For people new to regular expression concepts and syntax, begin with this article, in the N++ Wiki :
In addition, you’ll find good documentation about the new Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v1.48.0 ), used by Notepad++ since its 6.0 version, at the TWO addresses below :
The FIRST link explains the syntax of regular expressions in the SEARCH part
The SECOND link explains the syntax of regular expressions in the REPLACEMENT part
You may also look for valuable information on the sites below :
Be aware that, as with any documentation, it may contain some errors ! Anyway, if you detect one, that’s good news : you’re improving ;-))