How to replace text followed by a comma (,) or dot (.) in tab delimited file (txt)?





I am new to AutoHotkey. I have a script that helps me shorten words I don't need, but I am having a problem when trying to replace text that is followed by a comma or a dot. Here is my script:


#NoEnv
#SingleInstance force
SetWorkingDir, %A_ScriptDir%
SendMode, Input
; -- Ctrl + SPACE -> Select all text + replace whole words only + title case
^SPACE::
NonCapitalized := "a|an|in|is|of|the|this|with" ; List of words that shouldn't be capitalized, separated by pipes
ReplacementsFile := "replacements.txt" ; Path to replacements file (tab delimited file with 2 columns, UTF-8-BOM, CR+LF)

Send, ^a ; Selects all text
Gosub, SelectToClip ; Copies the selected text to the clipboard
FileRead, Replacements, % ReplacementsFile ; Reads the replacements file
If ErrorLevel ; Error message if file is not found
{
MsgBox, % "File not found: " ReplacementsFile
Return
}

StringUpper, Clipboard, Clipboard, T ; Whole clipboard to title case
Clipboard := RegExReplace(Clipboard, "i)(?<![!?.]) \b(" NonCapitalized ")\b", " $L1") ; Changes to lowercase all words from the list "NonCapitalized", except those preceded by new line/period/exclamation mark/question mark
pos := 0
While pos := RegExMatch(Replacements, "m`a)^([^\t]+)\t(.*)$", FoundReplace, pos + 1) ; Gets all replacements from the tab delimited file
Clipboard := RegExReplace(Clipboard, "i)\b" FoundReplace1 "\b", FoundReplace2) ; Replaces all occurrences in the clipboard

; add exceptions
Clipboard := StrReplace(Clipboard, "Vice President,", "")
Clipboard := StrReplace(Clipboard, "Director,", "")
Clipboard := StrReplace(Clipboard, "Senior Vice President,", "")

; = End of exceptions

Clipboard := RegExReplace(Clipboard, "^\s+|\s+(?=([\s,;:.]))|\s$") ; Removes extra spaces
Send, ^v ; Pastes the clipboard
Return

SelectToClip:
Clipboard := ""
Send, ^c
ClipWait, 0
If ErrorLevel
Exit
Sleep, 50
Return



and here is a part of my replacements file:


Chief Operating, Financial Officer CFO & COO
Head,
President,



My question is: how can I put text that is followed by a comma (,) or a dot (.) directly in the tab-delimited file instead of adding more lines to the AHK script? As it stands, the script does not treat the comma or dot as part of the text to replace.



Many thanks for your time and your help!!





I'm not sure exactly what you're trying to do with your script and the file.
– johnlee
Jun 27 at 2:17





Thanks for asking. What I'm trying to do with this script is to transform long text like: "SENIOR VICE PRESIDENT, MARKETING AND SALES" into "Marketing & Sales" (Change text case and shorten long title). Is it clear enough for you Johnlee?
– Thinh Tran
Jun 27 at 7:17





1 Answer



Please indent, or your code will be much harder to read.



In regex, the \b assertion requires a word character on one side and a non-word character on the other. That kept your code from working on search strings that begin or end with non-word characters such as commas or dots.





...\b, and \B because they are defined in terms of \w and \W.
...
A word boundary is a position in the subject string where the current character and the previous character do not both match \w or \W (i.e. one matches \w and the other matches \W), or the start or end of the string if the first or last character matches \w, respectively.
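To see the difference concretely, here is a minimal AutoHotkey v1.1 sketch. The sample text and the variable names are made up for illustration; only RegExReplace and its output-count parameter come from AutoHotkey itself. It compares a pattern delimited by \b with one delimited by lookarounds when the search string ends in a comma.

```autohotkey
; Illustrative values, not taken from the real replacements file.
text   := "Head, Sales"
search := "Head,"

; \b after the comma fails: the comma and the following space are both
; non-word characters, so there is no word boundary between them.
withBoundary := RegExReplace(text, "i)\b\Q" search "\E\b", "", boundaryCount)

; Lookarounds only demand that no word character touches the match,
; so a search string ending in punctuation still matches.
withLookaround := RegExReplace(text, "i)(?<!\w)\Q" search "\E(?!\w)", "", lookaroundCount)

MsgBox % "\b version: '" withBoundary "' (" boundaryCount " replacement(s))"
MsgBox % "Lookaround version: '" withLookaround "' (" lookaroundCount " replacement(s))"
```

The first pattern makes no replacement and leaves the text as "Head, Sales"; the second removes "Head," and leaves " Sales", which a later whitespace cleanup can trim.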



The following's tested to work:


#NoEnv
#SingleInstance force
SetWorkingDir %A_ScriptDir%
SendMode Input
; -- Ctrl + SPACE -> Select all text + replace whole words only + title case
^SPACE::
FunctionNameOfYourChoice() {
    ; Using static vars allows you to avoid reading the file over and over on each key press.
    Static NonCapitalized := "a|an|in|is|of|the|this|with" ; List of words that shouldn't be capitalized, separated by pipes
        , ReplacementsFile := "replacements.txt" ; Path to replacements file (tab delimited file with 2 columns, UTF-8-BOM, CR+LF)
        , Replacements := ReadReplacements(ReplacementsFile)

    Send ^a ; Selects all text
    SelectToClip() ; Copies the selected text to the clipboard
    If ErrorLevel { ; Error message if file is not found
        MsgBox % "File not found: " ReplacementsFile
        Return
    }

    ; 3. StringUpper is deprecated in v2.
    ; 4. Better to work on a plain variable than on the clipboard in terms of performance and reliability.
    cbCnt := Format("{:T}", Clipboard) ; Whole clipboard to title case
    ; Changes to lowercase all words from the list "NonCapitalized", except those preceded by new line/period/exclamation mark/question mark
    cbCnt := RegExReplace(cbCnt, "i)(?<![!?.]) \b(" NonCapitalized ")\b", " $L1")
    ; Goes through each pair of search and replacement strings
    Loop Parse, Replacements, `n, `r
        FoundReplace := StrSplit(A_LoopField, "`t")
        ; Replaces all occurrences in the clipboard
        , cbCnt := RegExReplace(cbCnt, "i)(?<!\w)\Q" FoundReplace.1 "\E(?!\w)", FoundReplace.2) ; 5.
    cbCnt := RegExReplace(cbCnt, "(?<=\w-)([a-z])", "$U1") ; 6.
    /*
    ; Now the following can be included in the replacements.txt file.
    cbCnt := StrReplace(cbCnt, "Vice President,")
    cbCnt := StrReplace(cbCnt, "Director,")
    cbCnt := StrReplace(cbCnt, "Senior Vice President,")
    */
    ; Removes extra spaces
    ; This also removes all newlines. Are you sure you want to do this?
    Clipboard := RegExReplace(cbCnt, "^\s+|\s+(?=([\s,;:.]))|\s$")
    Send ^v ; Pastes the clipboard
}

SelectToClip() {
    Clipboard := ""
    Send ^c
    ClipWait 0.5 ; Specifying 0 wouldn't be a very good idea.
    If ErrorLevel
        Exit
    Sleep 50
}

ReadReplacements(path) {
    FileRead, Replacements, % path
    Return Replacements
}





Yeah, there was a typo in the second regex (the first assertion in it), which has been corrected. The issue with "and" won't be repeated.
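For reference, the "and" problem is what happens whenever a replacement is allowed to match inside a longer word. The following is a minimal AutoHotkey v1.1 sketch that uses the sample title from the comments; the variable names are illustrative. It contrasts a plain StrReplace with a whole-word RegExReplace.

```autohotkey
title := "Head of Brand Marketing and Sales"

; StrReplace matches substrings, so the "and" inside "Brand" is replaced too.
plain := StrReplace(title, "and", "&")

; The lookarounds reject matches that touch a word character on either side,
; so only the standalone word "and" is replaced.
whole := RegExReplace(title, "i)(?<!\w)\Qand\E(?!\w)", "&")

MsgBox % "StrReplace: " plain
MsgBox % "Whole word: " whole
```

StrReplace yields "Head of Br& Marketing & Sales", while the whole-word pattern yields "Head of Brand Marketing & Sales".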



I added another RegExReplace as a less than graceful, makeshift measure for addressing the problem with hyphenated words you described, but note that it is inherently a non-trivial problem since the capitalization of those depends on semantics.
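In isolation, that extra step can be sketched as follows (AutoHotkey v1.1). The input strings are made-up examples of text that has already been title-cased, and the variable names are illustrative.

```autohotkey
; Assumed state after title-casing: the letter after the hyphen is lowercase.
title := "Head Of Brand-marketing"

; (?<=\w-) requires a word character plus a hyphen right before the letter;
; $U1 re-inserts the captured letter in uppercase.
fixed := RegExReplace(title, "(?<=\w-)([a-z])", "$U1")

; The rule is purely mechanical, so it capitalizes after every such hyphen,
; whether or not that is the desired style for the word in question.
other := RegExReplace("Follow-up Meeting", "(?<=\w-)([a-z])", "$U1")

MsgBox % fixed
MsgBox % other
```

The first call gives "Head Of Brand-Marketing". The second gives "Follow-Up Meeting", which illustrates why the right capitalization ultimately depends on the meaning of the word.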







Thank you very much for this detailed script and your advice. I am an amateur at this, so please give me time to learn and test it carefully; I will get back to you as soon as I can. Again, thank you for your time!!
– Thinh Tran
Jun 30 at 1:35





Hello Johnlee, your script works perfectly for what I want, thank you very much for that!!! However, there is a problem with the word "and". When I run it on this line: Input: Head of Brand Marketing and Sales, Output: Br& Marketing & Sales. Is there any way to avoid this? I tried putting cbCnt := StrReplace(cbCnt, "and", "&") in the script, but it gives the same result as putting it in the replacements file.
– Thinh Tran
Jun 30 at 12:50






There is one more thing I've been searching the internet for without finding a solution. How can I keep the letter case for hyphenated words, e.g. Input: Head of Brand-Marketing -> Output: Brand-Marketing instead of Brand-marketing? Thanks!!
– Thinh Tran
Jun 30 at 12:55





Me and six members of my team send a thousand thanks to you!!!!! It works like a charm now. Would you mind if I ask one more question? Is it possible for RegEx to detect a missing word between two words (the first word is one of a few specific words, like Director, Manager or Head; the second word can be any word) and insert the missing word between them? For example: Input: Director Sales -> Output: Director of Sales /or/ Input: Director, Investment -> Output: Director of Investment // Input: Head, Sales -> Output: Head of Sales... Thank you!!
– Thinh Tran
2 days ago






Glad it helped. As for your additional question, well, I'd say only if you define "any word" in a much less ambiguous way, a way that could prevent any and all possible erroneous results.
– johnlee
yesterday
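As a rough illustration of what such a definition could look like, here is a sketch in AutoHotkey v1.1. InsertOf is a hypothetical helper, and both the role list and the rule that "any word" means "the next word starts with a capital letter" are assumptions made for this example. It is not a general solution: a phrase like "Head Office" would wrongly become "Head of Office".

```autohotkey
; Hypothetical helper: inserts "of" after a listed role when the next word
; starts with a capital letter. The role list is an assumption.
InsertOf(title) {
    static Roles := "Director|Manager|Head"
    ; Match the role, then either a comma with optional spaces or plain spaces,
    ; skip cases where "of" is already present, and require a capitalized word next.
    return RegExReplace(title, "\b(" Roles ")(?:,\s*|\s+)(?!(?i:of)\b)(?=[A-Z])", "$1 of ")
}

spaced   := InsertOf("Director Sales")
withComa := InsertOf("Director, Investment")
already  := InsertOf("Director of Sales")

MsgBox % spaced
MsgBox % withComa
MsgBox % already
```

Under those assumptions the three calls give "Director of Sales", "Director of Investment" and an unchanged "Director of Sales".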






