Regex: how to remove spaces that are not followed by letters or numbers
Hello. I am looking for a regex that removes spaces that are not followed by letters or numbers. For example:
So, basically, I have several files with many stray spaces (usually a single space) inside words. I want to remove these spaces to rejoin the words, but only the spaces that are not followed by letters or numbers, so that the separate words are not joined together.
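Taken literally, the title asks for a regex that deletes every space whose next character is not a letter or digit. A minimal Python sketch of that literal reading uses a negative lookahead; the same ` (?![A-Za-z0-9])` search pattern with an empty replacement should also work in Notepad++'s Replace dialog in regular-expression mode. The function name is hypothetical, and "letters or numbers" is assumed to mean ASCII `[A-Za-z0-9]`:

```python
import re

# A space is removable when the next character is NOT an ASCII letter or digit.
# Assumption: "letters or numbers" means [A-Za-z0-9]; widen the class for accented text.
SPACE_NOT_BEFORE_ALNUM = re.compile(r" (?![A-Za-z0-9])")


def remove_spaces_not_before_alnum(text: str) -> str:
    """Delete every space that is not immediately followed by a letter or digit."""
    return SPACE_NOT_BEFORE_ALNUM.sub("", text)


print(remove_spaces_not_before_alnum("hello , world ."))  # hello, world.
print(remove_spaces_not_before_alnum("focu s on"))        # focu s on
```

As the second call shows, this literal rule leaves in-word spaces such as `focu s` untouched, because a letter follows each of them; that is the mismatch pointed out in the next reply.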
I’m confused: your example shows what should be done, but it is exactly what you say you don’t want to do.
An e follows the space, and e is a letter. What am I missing?
Hello Claudia, yes, I am sorry. I tried to delete or modify the title, but it was too late.
Anyway, the example was good.
Look at this short sentence to see my problem. I want to remove the spaces inside words without joining separate words together.
focu s on the opposition be tween the apparent simplicity of prod ucing a distinct creation and its ulterior comple xity
…and j ust ho w in the heck is the reg ex eng ine suppose d to know w hich space is a space betw een words and which spac e is a space ins ide a wor d?
Hello Scott. I really don’t know…
Scott is perfectly right about it! How could the regex engine guess, for instance, that the two words be and tween should really be the single word between? And anyway, the first word, be, is a perfectly valid English word, isn’t it?
So I would advise you to use a spell-checker plugin instead, which will automatically highlight all the misspelled words in your text.
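A spell checker essentially compares each word against a dictionary. The same idea can be sketched in a few lines of Python to list suspicious fragments. `KNOWN_WORDS` below is a tiny hypothetical word list standing in for a real dictionary file, and the function name is also hypothetical:

```python
import re

# Hypothetical mini-dictionary; a real check would load a full word list.
KNOWN_WORDS = {"focus", "on", "the", "opposition", "be", "between", "tween"}


def unknown_words(text: str, dictionary: set[str]) -> list[str]:
    """Return the words in text that are not in the dictionary, in order of appearance."""
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if w.lower() not in dictionary]


print(unknown_words("focu s on the opposition be tween", KNOWN_WORDS))  # ['focu', 's']
```

Note that be and tween both pass: a dictionary check catches fragments like focu and s, but not splits that happen to produce two real words.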
Where did the original text come from?
How were the spaces introduced into the text?
Maybe there is a pattern that can be used to identify the “bad” spaces.
Hello, Js. That is what I am trying to figure out.
Right now I am trying some combination like this one:
It is not very good yet, but I have time to try more things. I need some luck :D
But I think I need to connect it to a dictionary somehow, the way a spell checker does.
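One way to connect the idea to a dictionary is sketched below in Python, as an illustrative heuristic rather than a complete fix: join two neighbouring tokens only when the joined form is a known word and at least one of the two parts is not. `WORDS` is a small hypothetical word list (a real run would load a full dictionary), and the function names are hypothetical too:

```python
# Hypothetical word list; in practice you would load a full dictionary file.
WORDS = {
    "focus", "on", "the", "opposition", "be", "tween", "between", "apparent",
    "simplicity", "of", "prod", "producing", "a", "distinct", "creation",
    "and", "its", "ulterior", "complexity",
}

PUNCT = ".,;:!?\"'()"


def _key(token: str) -> str:
    """Lower-case a token and drop surrounding punctuation for dictionary lookups."""
    return token.lower().strip(PUNCT)


def merge_fragments(line: str, words: set[str]) -> str:
    """Join two neighbouring tokens when the joined form is a known word and at
    least one of the two parts is not. Whitespace is normalised to single spaces."""
    tokens = line.split()
    out = []
    i = 0
    while i < len(tokens):
        current = tokens[i]
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            joined = current + nxt
            one_part_unknown = _key(current) not in words or _key(nxt) not in words
            if _key(joined) in words and one_part_unknown:
                out.append(joined)
                i += 2  # both fragments consumed
                continue
        out.append(current)
        i += 1
    return " ".join(out)


sample = ("focu s on the opposition be tween the apparent simplicity "
          "of prod ucing a distinct creation and its ulterior comple xity")
print(merge_fragments(sample, WORDS))
```

With this word list the sample becomes "focus on the opposition be tween the apparent simplicity of producing a distinct creation and its ulterior complexity". The be tween pair survives because both halves are real words, which is exactly the ambiguity raised earlier in the thread; the rule also never joins two valid words, so splits like that need context or manual review.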
You have not answered the question.
Where did the text come from? Was it downloaded from a website?
How were the spaces introduced?
Were there other characters in the text, like HTML tags, that someone incorrectly removed by replacing them with spaces?
Oh, sure. The text was produced by a PDF-to-TXT converter.
Try opening the original PDF file in Notepad++.
You may get lucky, and the contents may be cleartext (not compressed).
Otherwise, try a different PDF-to-TXT converter, or try OCR software.
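To check the "cleartext or compressed" question without eyeballing the file, a rough Python sketch can scan the raw PDF bytes for `stream ... endstream` sections and try to inflate each one with `zlib` (the common FlateDecode filter uses zlib/deflate data). This is a crude byte scan, not a PDF parser: it ignores `/Length`, other filters and object streams, and the function name is hypothetical:

```python
import re
import zlib

# Crude byte pattern: everything between "stream" + EOL and the next "endstream".
# Assumption: good enough for a quick look; a real parser would honour /Length and /Filter.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inspect_pdf_streams(pdf_bytes: bytes) -> list[tuple[str, bytes]]:
    """Return ("deflate", inflated_bytes) or ("raw", bytes) for each stream found."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        inflater = zlib.decompressobj()
        try:
            inflated = inflater.decompress(data)
        except zlib.error:
            inflated = None
        if inflated is not None and inflater.eof:
            # A complete zlib (FlateDecode) stream; the trailing EOL ends up in unused_data.
            results.append(("deflate", inflated))
        else:
            # Not zlib data: possibly readable cleartext, possibly another filter.
            results.append(("raw", data.rstrip(b"\r\n")))
    return results
```

Usage would be along the lines of `inspect_pdf_streams(open("input.pdf", "rb").read())`, with `input.pdf` as a placeholder name. Even decoded content streams may show text split into fragments inside `Tj`/`TJ` operators, so this mainly tells you whether the file is worth opening in Notepad++.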