Faster "Find in Files"?

Krzysztof Chodak · Apr 14, 2017, 5:16 PM

Any chance of multi-threaded “Find in Files” functionality? Is there any multi-threaded code in N++? Are there any plans for multi-threading? I run many such searches across a couple of thousand files and I am thinking about cutting the wait time. It looks like I/O is the main bottleneck now, so using a couple of threads would speed it up.

gstavi · Apr 17, 2017, 9:54 PM

In general, multi-threading is not the ideal solution for “find in files” since it is mostly I/O bound. Any thread added to a GUI application is an invitation for trouble. Asynchronous I/O with a single thread should usually give results as good as or better than a multi-threaded implementation.

BUT this is not the real problem in Notepad++.
During “find in files” Notepad++ needlessly loads each file as if it were opening it for viewing. The benefit is that during this load it detects the file encoding, so you can “find in files” across multiple encodings. The price is that it is really, really slow.

An alternative “find in files” that assumes UTF-8, or is given a specific encoding in the dialog, and scans the files with primitive buffer operations without actually loading them into Scintilla would be MUCH faster.
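
A minimal sketch of that idea in Python, purely for illustration (Notepad++ itself is C++ and this is not its code). The file is read as raw bytes and scanned line by line with plain buffer operations, with no encoding detection. The names `find_in_buffer` and `find_in_file` are hypothetical, and the sketch assumes an ASCII-compatible encoding such as UTF-8 or a single-byte ANSI code page, so that a line break is always the byte `\n`.

```python
def find_in_buffer(data: bytes, needle: bytes):
    """Return (line_number, line) pairs for every line of data containing needle."""
    hits = []
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        if needle in line:
            # Strip a trailing CR so Windows line endings do not leak into results.
            hits.append((line_number, line.rstrip(b"\r")))
    return hits


def find_in_file(path, text, encoding="utf-8"):
    """Search one file as raw bytes, trusting the caller-supplied encoding."""
    needle = text.encode(encoding)
    with open(path, "rb") as handle:
        data = handle.read()  # whole file at once; no editor buffer, no detection
    return [(number, line.decode(encoding, errors="replace"))
            for number, line in find_in_buffer(data, needle)]
```

If a file is not actually in the assumed encoding, matches may be missed or shown garbled, which is the price of skipping detection.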

Personally I ‘grep’ things from the command line, copy-paste the results and use a tags lookup plugin to jump to file:line.

guy038 · Apr 18, 2017, 7:54 PM

Hello, @gstavi,

Thanks for your excellent explanation of N++'s moderate search speed across multiple files. But now I’m simply wondering:

Why don’t we add another field to the Find in Files dialog that indicates the encoding (ANSI, UTF-8, UCS-2 LE or UCS-2 BE) of the files to be scanned? Of course, if this field were left empty, the classic search, with encoding detection, would be used.

However, it would be the user’s responsibility to verify that no file in the list to scan has an encoding other than the one specified, as I suppose that the results in the Search result panel would certainly not be coherent in that case!

Just an idea…

Best Regards,

guy038

P.S. :

It would be sensible to test this option in order to verify that the speed increase is really significant!

gstavi · Apr 19, 2017, 6:44 AM

@guy038
GUI-wise anything goes.
But the other benefit of the current implementation is that it actually reuses base functionality within Notepad++.
As far as I remember, it is:

Scan directories and build file list according to wildcards // This is another slow (and memory consuming) thing I forgot to mention
For each file in list
    Load file into Scintilla buffer // detect encoding, load entire file at once
    Find in Scintilla buffer and add to find results // Including all regular expression tricks
    Close Scintilla buffer

So for faster find in files we will have to write a new algorithm entirely.

Krzysztof Chodak · Apr 28, 2017, 7:26 AM

I would just distribute the “For each file in list” loop you mentioned across all available CPU cores, with synchronization on the find results; in theory this should divide the wait time by the number of cores available.
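
The idea can be sketched outside Notepad++ in Python (illustration only: the names `read_bytes`, `search_one`, `parallel_find` and the `load` parameter are made up for this example, and it searches raw bytes rather than going through Scintilla). Each worker handles one file and takes a lock only while appending to the shared result list. How close this gets to a per-core speed-up depends on how much of the time is spent on I/O, so the sketch makes no claim about that.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def read_bytes(path):
    """Default loader: read the whole file as raw bytes."""
    with open(path, "rb") as handle:
        return handle.read()


def search_one(path, needle, load):
    """Return (path, line_number) for every line of the file containing needle."""
    data = load(path)
    return [(path, number)
            for number, line in enumerate(data.split(b"\n"), start=1)
            if needle in line]


def parallel_find(paths, needle, workers=4, load=read_bytes):
    """Run the per-file loop on a thread pool, synchronizing only on the results."""
    results = []
    results_lock = threading.Lock()

    def worker(path):
        hits = search_one(path, needle, load)  # no shared state touched here
        with results_lock:                     # the only synchronized step
            results.extend(hits)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so an exception in any worker is re-raised here.
        list(pool.map(worker, paths))
    return sorted(results)  # completion order is arbitrary, so sort for stable output
```

Passing `load` explicitly keeps the file access replaceable, for example with an in-memory dictionary when trying the function out.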

Krzysztof Chodak · Apr 28, 2017, 10:56 AM

I am downloading VS Community 2017 and I will see what I can do.

pnedev · Apr 28, 2017, 11:54 AM

@Krzysztof-Chodak ,

As @gstavi already described, distributing the loop will not work the way things are currently implemented in N++.
The reason is that N++ uses a hidden Scintilla view instance to perform the search. So each file in the search list has to pass through this hidden Scintilla view, which effectively serializes the work. Unless you change things entirely and have a separate Scintilla view per thread, multi-threading will be pointless; but even with many Scintilla views, those will again have to pass through the window procedure of N++'s main GUI thread. As @gstavi said, to have a multi-threaded search you’ll have to bypass Scintilla, load each file into memory and search that buffer, but then you run into the problems of encoding detection and proper regex handling.

BR,
Pavel

guy038 · Apr 28, 2017, 8:18 PM

Hi, @pnedev and @gstavi,

Just another newbie question!

Would the Search in Files be quicker if the list of scanned files contained, exclusively, files with a BOM (UTF-8-BOM, UCS-2 BE BOM or UCS-2 LE BOM)?

Indeed, with that BOM, the right encoding is quickly known, without any ambiguity, which should speed up the search process!?
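
To illustrate why a BOM makes the encoding cheap to determine, here is a small Python sketch (not Notepad++ code; `detect_bom` and `decode_with_bom` are hypothetical names). Checking for a BOM only needs the first two or three bytes of the file. The sketch handles only the three BOMs listed above and uses Python's UTF-16 codecs as a stand-in for UCS-2, which is an assumption. It does not try to tell a UTF-16 LE BOM apart from a UTF-32 LE one, which begins with the same two bytes.

```python
import codecs

# (BOM bytes, codec name) for the three cases named in the question.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no known BOM is present."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback="utf-8"):
    """Decode using the BOM if there is one, otherwise the assumed fallback."""
    encoding, skip = detect_bom(data)
    return data[skip:].decode(encoding or fallback, errors="replace")
```

For example, `detect_bom(b"\xef\xbb\xbfhi")` returns `("utf-8", 3)`, while a buffer without a BOM returns `(None, 0)` and would still need detection or a user-supplied encoding.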

Cheers,

guy038

pnedev · May 2, 2017, 2:21 PM

Hi @guy038 ,

If the file encoding is known and the multi-threaded search is implemented then yes, this will speed up the process. But again, each thread will have to load the file it searches into a memory buffer.

BR

Krzysztof Chodak · May 9, 2017, 7:06 AM

@pnedev: this is what I started to do after analyzing the code. I have created a hidden editview for each thread.