[ARCHIVED] Need to clean formatting of extra &nbsp; that are pasted in from hidden space characters in MS Word

JenniferKolar
Community Explorer

We have a course whose content was all pasted in from MS Word. When we view the HTML in Canvas, we see '&nbsp;' in place of the space between most words. As a result, Canvas treats a line of words joined by &nbsp; as one long word and wraps it incorrectly.

We see now that if we go into Word, enable the paragraph symbol view, and search and replace space with space, it clears the problem characters and we can then successfully paste into Canvas.

We also see that we can copy from Canvas, paste into Word, choose Clear Formatting, and then return to Canvas and paste, and that will work.

However, is there a way to do this w/in canvas itself w/o having to go back and forth to another editor? Clear Formatting in the rich text editor does not remove these problem characters.

I don't find a search and replace option in the rich text editor, search only.

Chris_Hofer
Community Coach

Hello there, @JenniferKolar ...

In addition to the info that @James has been providing, have you tried using an HTML cleaner website?  There are some that allow you to paste in content from a Word document, and it will generate pretty clean HTML code for you.  There are also other websites that just help with general HTML clean-up of your code...including removing any extra non-breaking spaces like you are describing.  I wrote a blog post about this a while back called HTML Cleanup.  There are other sites that might be of interest, too:

Again, not sure if these would be of any interest, but I thought I'd throw these out there for your consideration.  Hope this helps a bit.


@JenniferKolar 

Here is the code as-is except that I did comment out the line that does the custom formatting for my videos and their duration and length.

^!c::
#NoEnv  ; Recommended for performance and compatibility with future AutoHotkey releases.
; #Warn  ; Enable warnings to assist with detecting common errors.
SendMode Input  ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir%  ; Ensures a consistent starting directory.
clipboard = 
Send ^a
Send ^c
ClipWait
attributes := ["id", "class", "target"]
for index, attrib in attributes
{
  StringReplace clipboard, clipboard, %A_SPACE%%attrib%="",, 1
  StringReplace clipboard, clipboard, %A_SPACE%%attrib%="%A_SPACE% ,%A_SPACE%%attrib%=", 1
}  
StringReplace clipboard, clipboard, ’, ', 1
StringReplace clipboard, clipboard, &nbsp;, %A_SPACE%, 1
StringReplace clipboard, clipboard, <p>%A_SPACE%</p>,, 1
StringReplace clipboard, clipboard, <span>%A_SPACE%</span>, %A_SPACE%, 1
StringReplace clipboard, clipboard, %A_SPACE%--%A_SPACE%, %A_SPACE%–%A_SPACE% , 1
newclipboard := RegExReplace(clipboard, "</?span[^>]*>")
newclipboard := RegExReplace(newclipboard, "s)\s+data-mathml="".*?""","")
; newclipboard := RegExReplace(newclipboard, "s)<h3>([^\(]+)\(([0-9]+/[0-9]+,\s~[0-9.]+\sminutes?)\)\s*</h3>","<h3>$1<span style='font-size:1rem; color:#066;'>($2)</span></h3>")
clipboard=%newclipboard%
ClipWait
Send ^v
return  ; End of the hotkey so nothing added below it runs by accident.

 

Looking at the documentation, it seems that StringReplace shouldn't be used for new scripts, but rather than give you untried code, I am just giving what I have. It works, but the coding isn't pretty. It was pretty much a hack job in creating it.
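For anyone who can't run AutoHotkey, the same substitutions can be sketched in Python. This is only a rough equivalent under a few assumptions, not the script above: the function name clean_canvas_html and the remove_spans flag are made up for illustration, and Python's str.replace is case-sensitive, which may differ from how AutoHotkey matches strings by default.

```python
import re

# Attributes the script checks for empty or space-padded values.
ATTRIBUTES = ("id", "class", "target")


def clean_canvas_html(html: str, remove_spans: bool = True) -> str:
    """Apply the same substitutions as the hotkey script, in the same order.

    Hypothetical helper: the name and the remove_spans flag are not part
    of the original script.
    """
    for attrib in ATTRIBUTES:
        # Drop empty attributes such as class="".
        html = html.replace(f' {attrib}=""', "")
        # Remove a stray space right after the opening quote.
        html = html.replace(f' {attrib}=" ', f' {attrib}="')

    html = html.replace("\u2019", "'")          # curly apostrophe -> straight
    html = html.replace("&nbsp;", " ")          # non-breaking space entity -> space
    html = html.replace("<p> </p>", "")         # paragraphs left holding one space
    html = html.replace("<span> </span>", " ")  # spans left holding one space
    html = html.replace(" -- ", " \u2013 ")     # spaced double dash -> en dash

    if remove_spans:
        # Remove opening and closing span tags but keep the text inside them.
        html = re.sub(r"</?span[^>]*>", "", html)

    # Remove data-mathml attributes, which can span several lines.
    html = re.sub(r'\s+data-mathml=".*?"', "", html, flags=re.DOTALL)
    return html


if __name__ == "__main__":
    sample = (
        '<p class="">Hello&nbsp;<span style="color: red;">big</span>'
        "&nbsp;world -- it\u2019s here</p><p>&nbsp;</p>"
    )
    print(clean_canvas_html(sample))
```

You would paste the page's HTML into a string, run it through the function, and paste the result back into the HTML view. Passing remove_spans=False leaves the span tags alone.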

By default, it uses Ctrl+Alt+c, which is set by ^!c:: in the first line of code (^ is Ctrl and ! is Alt). Here is a list of codes from the documentation if you would like something else.

  1. Download and install AutoHotKey on a Windows machine.
  2. Take the above code and save it into a file with a .ahk extension. Mine is called nbsp.ahk, but you could use canvas_cleanup.ahk or whatever you want.
  3. In Windows Explorer, right click the file and choose Run Script. It should put a green icon with a white H on the taskbar (might be hidden).

You can quit the script from the Windows taskbar green H icon.

In the future (say after a reboot), just locate the file in Windows explorer and do step 3.

To use the program:

  1. Edit a Canvas page
  2. Switch to HTML view
  3. Press Ctrl+Alt+c
  4. Save the page

I will warn you that I hate span elements so I remove all of them. Occasionally I need them, but too often it is Canvas sticking in something that doesn't need to be there.

If you have span elements that you want to keep, then edit the line with the first RegExReplace in it. You can put a semicolon ; in front of the line to comment it out.
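A middle ground between removing every span and keeping every span is to drop only the spans that carry no attributes. The Python sketch below is a hypothetical alternative, not something the script does: strip_all_spans mirrors the first RegExReplace, and strip_bare_spans is an invented helper that assumes the span tags are properly nested.

```python
import re

# Same pattern as the script's first RegExReplace.
ALL_SPAN_TAGS = re.compile(r"</?span[^>]*>")
# Captures the closing slash (if any) and whatever attributes the tag has.
SPAN_TAG = re.compile(r"<(/?)span([^>]*)>")


def strip_all_spans(html: str) -> str:
    """Remove every span tag and keep the text inside."""
    return ALL_SPAN_TAGS.sub("", html)


def strip_bare_spans(html: str) -> str:
    """Remove only <span> tags without attributes (hypothetical variant).

    Assumes span tags are properly nested. A stack remembers whether each
    open span was kept so that its closing tag is treated the same way.
    """
    kept = []

    def handle(match):
        is_closing = bool(match.group(1))
        if not is_closing:
            keep = bool(match.group(2).strip())
            kept.append(keep)
        else:
            # An unmatched closing tag is dropped.
            keep = kept.pop() if kept else False
        return match.group(0) if keep else ""

    return SPAN_TAG.sub(handle, html)


if __name__ == "__main__":
    sample = '<span><span style="color: red;">a</span> b</span>'
    print(strip_all_spans(sample))
    print(strip_bare_spans(sample))
```

With this approach a span that only wraps text goes away, while one that sets a style or a language stays in the page.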

Changes made to the source file are not automatically picked up. From the Green H icon, you can click the right mouse button and edit the script and/or reload the script.

If you have it installed on one computer and want it on other computers without installing AutoHotKey, right click the file in Windows Explorer and choose Compile Script. This will make a small (1 MB) executable that can be transferred and run on other computers.

Now that I look at the code, I forgot to mention the last line before the RegExReplace. It replaces a double dash -- surrounded by spaces with an en dash –.

Feel free to comment out anything you like. The most important one is the one that converts all non-breaking spaces &nbsp; to regular spaces %A_SPACE%.

Remember that if it does hose something up, you can use the page history to restore the previous version.
