beware of non-ascii characters

When I copy source code from an ebook in PDF format and paste it into vim, it fails to compile. The reason is that the pasted text contains non-ASCII characters, in my case UTF-8 encoded ones. You can reveal non-ASCII or hidden characters like this:

cat -v mycode.c

One line of the sample output here:

fprintf (stream, M-bM-^@M-^\This is a test.\nM-bM-^@M-^]);

M-bM-^@M-^\ is the opening double quote ( “ ) and M-bM-^@M-^] is the closing double quote ( ” ), but C/C++ source code has to use the plain ASCII double quote ( " ) for both.
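
To see where that notation comes from, here is a minimal sketch. It assumes a cat that supports -v (GNU and BSD cat both do). In cat -v output, M- means the byte has its high bit set (128 is added) and ^X stands for a control character, so M-b is 0xE2, M-^@ is 0x80 and M-^\ is 0x9C, which together are the UTF-8 encoding of the opening curly quote U+201C.

```shell
# Emit the three UTF-8 bytes of U+201C (octal 342 200 234)
# and capture how cat -v renders them.
shown=$(printf '\342\200\234' | cat -v)
printf '%s\n' "$shown"

# The same bytes as a hex dump, for comparison with the M- notation.
printf '\342\200\234' | od -An -tx1
```

The first command should print the same M-bM-^@M-^\ sequence that appears in the sample output above.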

In this case I need to convert both to ( " ), and I use sed for this:

cat -v mycode.c | sed -e 's/M-bM-^@M-^\\/"/g' -e 's/M-bM-^@M-^]/"/g' >mycode2.c

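First, cat -v displays the non-ASCII characters in M- and ^ notation. Then sed searches for those two sequences, replaces each with ( " ), and the output is written to a new file called mycode2.c. Note that any other non-ASCII character in the file stays in its M- notation in mycode2.c, because cat -v rewrites all of them.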
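
An alternative that skips cat -v is to let sed match the raw UTF-8 bytes directly, so nothing else in the file is rewritten. This is only a sketch, not the method used above: the variable names open_quote and close_quote are made up for illustration, and the first line recreates a one-line mycode.c so the example runs on its own.

```shell
# Recreate a tiny mycode.c containing curly quotes (sample data for this sketch).
printf 'fprintf (stream, \342\200\234This is a test.\\n\342\200\235);\n' > mycode.c

# UTF-8 encodings of the two curly quotes, written as octal bytes.
# open_quote is U+201C and close_quote is U+201D (hypothetical variable names).
open_quote=$(printf '\342\200\234')
close_quote=$(printf '\342\200\235')

# Replace both with a plain ASCII double quote. LC_ALL=C makes sed work
# byte by byte, so every other byte in the file is left untouched.
LC_ALL=C sed -e "s/$open_quote/\"/g" -e "s/$close_quote/\"/g" mycode.c > mycode2.c
cat mycode2.c
```

With the sample line, mycode2.c should contain fprintf (stream, "This is a test.\n"); with ordinary double quotes.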

5 Responses to “beware of non-ascii characters”

  1. Thanks for the cat -v file.txt tip. i’d been wondering how to view oddball characters in linux for the last few days until i stumbled onto your post.

    I just discovered that

    sed 's/[^a-zA-Z0-9]//' tester.txt > tester2.txt

    works also to remove non-ascii characters.



  2. oops.

    sed 's/[^a-zA-Z0-9]//' tester.txt > tester2.txt

    will remove non-ascii characters, but will also remove
    !@#$% etc., too!

    the rule here is never take advice from a dummy.
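
A narrower filter avoids that side effect. The following is a minimal sketch, assuming the goal is to drop every byte outside 7-bit ASCII while keeping punctuation, tabs and line endings. It swaps sed for tr, and tester.txt is rebuilt here as throwaway sample data.

```shell
# Build a sample tester.txt with one non-ASCII character (e-acute, octal 303 251)
# plus some punctuation that should survive.
printf 'caf\303\251 costs $5!\n' > tester.txt

# Delete (-d) the complement (-c) of the listed set: tab, newline, carriage
# return, and the printable ASCII range from space (040) to tilde (176).
LC_ALL=C tr -cd '\11\12\15\40-\176' < tester.txt > tester2.txt
cat tester2.txt
```

For the sample line this leaves "caf costs $5!", so the punctuation stays and only the two bytes of the accented letter are removed.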

  3. The code above was very useful in removing all of the bad characters from within my file but what if the non-ascii character is at the end of the file name?

    somehow my Expect script is adding a special character to the end of the file name.

  4. In addition to sed 's/[^a-zA-Z0-9]//' tester.txt > tester2.txt,
    this also works to remove non-ascii: rm
    (i.e. delete the file)

