The challenge

The shortest code by character count that will input a string using only alphabetical characters (upper and lower case), numbers, commas, periods and question marks, and return a representation of the string in Morse code. The Morse code output should consist of a dash (-, ASCII 0x2D) for a long beep (AKA 'dah') and a dot (., ASCII 0x2E) for a short beep (AKA 'dit').

Each letter should be separated by a space (' ', ASCII 0x20), and each word should be separated by a forward slash (/, ASCII 0x2F).

Morse code table:

[Image: Morse code table, liranuna/junk/morse.gif]

Test cases:

Input: Hello world
Output: .... . .-.. .-.. --- / .-- --- .-. .-.. -..

Input: Hello, Stackoverflow.
Output: .... . .-.. .-.. --- --..-- / ... - .- -.-. -.- --- ...- . .-. ..-. .-.. --- .-- .-.-.-
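When checking a golfed entry against these test cases, a plain, ungolfed encoder is useful for comparison. The following is a minimal sketch and is not taken from any answer. The names `morse_of` and `to_morse` are hypothetical, and the code assumes an ASCII character set. It silently skips characters outside the required alphabet, which is an assumption because the challenge does not say how to handle them. Codes are joined with a single space and words with " / ", as in the expected outputs.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns the Morse code for one character, or NULL if the character is
 * outside the challenge's alphabet (letters, digits, ',', '.', '?'). */
static const char *morse_of(char ch)
{
    static const char *const letters[26] = {
        ".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..",
        ".---", "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.",
        "...", "-", "..-", "...-", ".--", "-..-", "-.--", "--.."
    };
    static const char *const digits[10] = {
        "-----", ".----", "..---", "...--", "....-",
        ".....", "-....", "--...", "---..", "----."
    };
    int up = toupper((unsigned char)ch);

    if (up >= 'A' && up <= 'Z') return letters[up - 'A'];  /* assumes ASCII */
    if (up >= '0' && up <= '9') return digits[up - '0'];
    if (up == ',') return "--..--";
    if (up == '.') return ".-.-.-";
    if (up == '?') return "..--..";
    return NULL;
}

/* Encodes `in` into `out` (capacity `cap` bytes, including the NUL).
 * Codes in a word are separated by " ", words by " / ". Returns 0 on
 * success or -1 if `out` is too small. */
int to_morse(const char *in, char *out, size_t cap)
{
    size_t len = 0;
    int started = 0;     /* has any code been written yet? */
    int new_word = 0;    /* was there a space since the last code? */

    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (; *in; in++) {
        const char *code, *sep;
        size_t slen, clen;

        if (*in == ' ') {
            new_word = 1;
            continue;
        }
        code = morse_of(*in);
        if (code == NULL)
            continue;                      /* unsupported: skip it */
        sep = !started ? "" : new_word ? " / " : " ";
        slen = strlen(sep);
        clen = strlen(code);
        if (len + slen + clen + 1 > cap)
            return -1;
        memcpy(out + len, sep, slen);
        memcpy(out + len + slen, code, clen);
        len += slen + clen;
        out[len] = '\0';
        started = 1;
        new_word = 0;
    }
    return 0;
}
```

For example, `to_morse("Hello world", buf, sizeof buf)` should leave the first expected output above in `buf`.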

Code count includes input/output (that is, the full program).


C (131 characters)

Yes, 131!

main(c){for(;c=c?c:(c=toupper(getch())-32)?"•ƒŒKa`^ZRBCEIQiw#S#nx(37+$6-2&@/4)'18=,*%.:0;?5"[c-12]-34:-3;c/=2)putch(c/2?46-c%2:0);}

I eked out a few more characters by combining the logic from the while and for loops into a single for loop, and by moving the declaration of the c variable into the main definition as an input parameter. I borrowed this latter technique from strager's answer to another challenge.

For those trying to verify the program with GCC or with ASCII-only editors, you may need the following, slightly longer version:

main(c){for(;c=c?c:(c=toupper(getchar())-32)?c<0?1:"\x95#\x8CKa`^ZRBCEIQiw#S#nx(37+$6-2&@/4)'18=,*%.:0;?5"[c-12]-34:-3;c/=2)putchar(c/2?46-c%2:32);}

This version is 17 characters longer (weighing in at a comparatively huge 148), due to the following changes:

  • +4: getchar() and putchar() instead of the non-portable getch() and putch()
  • +6: escape codes for two of the characters instead of non-ASCII characters
  • +1: 32 instead of 0 for space character
  • +6: added "c<0?1:" to suppress garbage from characters less than ASCII 32 (namely, from '\n'). You'll still get garbage from any of !"#$%&'()*+[\]^_`{|}~, or anything above ASCII 126.

This should make the code completely portable. Compile with:

gcc -std=c89 -funsigned-char morse.c

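The -std=c89 is optional. The -funsigned-char flag is necessary, though. Without it you will get garbage for comma and full stop, because their table entries (\x95 and \x8C) are above 127 and become negative when char is signed.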
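The following sketch shows why the flag matters by comparing a table byte read as plain `char` with the same byte read through `unsigned char`. It assumes ASCII and uses the portable version's table. `encoded` and `encoded_plain_char` are hypothetical helper names. Casting to `unsigned char` in the source is one way to get the same effect without `-funsigned-char`, at the cost of extra characters.

```c
/* Table from the portable 148-character version (ASCII only, with escapes).
 * Index 0 is ',' (ASCII 44) and index 2 is '.' (ASCII 46); those two
 * entries are the bytes 0x95 and 0x8C, both above 127. */
static const char table[] =
    "\x95#\x8CKa`^ZRBCEIQiw#S#nx(37+$6-2&@/4)'18=,*%.:0;?5";

/* Reads an entry through unsigned char, so the result is the same whether
 * plain char is signed or unsigned: no -funsigned-char needed. */
int encoded(int upper_ch)
{
    return (unsigned char)table[upper_ch - ','] - 34;
}

/* What the golfed code computes. If plain char is signed, bytes above 127
 * are stored as negative values (implementation-defined; typically -107
 * for 0x95), so ',' and '.' come out wrong. */
int encoded_plain_char(int upper_ch)
{
    return table[upper_ch - ','] - 34;
}
```

For ASCII entries such as 'L' both functions agree. Only the two high bytes differ.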

135 characters

c;main(){while(c=toupper(getch()))for(c=c-32?"•ƒŒKa`^ZRBCEIQiw#S#nx(37+$6-2&@/4)'18=,*%.:0;?5"[c-44]-34:-3;c;c/=2)putch(c/2?46-c%2:0);}

In my opinion, this latest version is much more visually appealing, too. And no, it's not portable, and it's no longer protected against out-of-bounds input. It also has a pretty bad UI, taking character-by-character input and converting it to Morse Code and having no exit condition (you have to hit Ctrl+Break). But portable, robust code with a nice UI wasn't a requirement.

A brief-as-possible explanation of the code follows:

main(c){
    while(c = toupper(getch())) /* well, *sort of* an exit condition */
        for(c =
            c - 32 ? // effectively: "if not space character"
            "•ƒŒKa`^ZRBCEIQiw#S#nx(37+$6-2&@/4)'18=,*%.:0;?5"[c - 44] - 34
            /* This array contains a binary representation of the Morse Code
             * for all characters between comma (ASCII 44) and capital Z.
             * The values are offset by 34 to make them all representable
             * without escape codes (as long as chars > 127 are allowed).
             * See explanation after code for encoding format.
             */
            : -3; /* if input char is space, c = -3
                   * this is chosen because -3 % 2 = -1 (and 46 - -1 = 47)
                   * and -3 / 2 / 2 = 0 (with integer truncation)
                   */
            c; /* continue loop while c != 0 */
            c /= 2) /* shift down to the next bit */
                putch(c / 2 ? /* this will be 0 if we're down to our guard bit */
                    46 - c % 2 /* We'll end up with 45 (-), 46 (.), or 47 (/).
                                * It's very convenient that the three characters
                                * we need for this exercise are all consecutive.
                                */
                    : 0 /* we're at the guard bit, output blank space */
                );
}
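To check the bit arithmetic, including the space trick with -3, the inner loop can be pulled out into its own function. This sketch assumes C99 or later, where integer division truncates toward zero. C89 left the rounding of `/` and `%` with a negative operand implementation-defined. The name `emit_symbols` is hypothetical. Unlike the golfed code, it writes into a buffer and emits nothing for the guard bit instead of a NUL.

```c
/* Mirrors the golfed inner loop: one symbol per pass, low bit first, and
 * nothing once c / 2 == 0 (only the guard bit is left). Assumes C99
 * division, which truncates toward zero. */
void emit_symbols(int c, char *out)
{
    for (; c; c /= 2)
        if (c / 2)
            *out++ = (char)(46 - c % 2);  /* 45 '-', 46 '.', or 47 '/' */
    *out = '\0';
}
```

With c = -3 the first pass writes 46 - (-1) = 47 ('/'), and then c becomes -1, whose half is 0, so the loop ends after one more pass.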

Each character in the long string in the code contains the encoded Morse Code for one text character. Each bit of the encoded character represents either a dash or a dot. A one represents a dash, and a zero represents a dot. The least significant bit represents the first dash or dot in the Morse Code. A final "guard" bit determines the length of the code. That is, the highest one bit in each encoded character represents end-of-code and is not printed. Without this guard bit, characters with trailing dots couldn't be printed correctly.
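The same decoding can be written with an explicit mask and shift instead of `% 2` and `/ 2`. For an unsigned value the two forms are equivalent. This is a minimal sketch, and the name `decode_guarded` is hypothetical.

```c
/* Decodes a guard-bit value: bit 0 is the first symbol (1 = dash,
 * 0 = dot), and the highest set bit only marks the end. */
void decode_guarded(unsigned v, char *out)
{
    while (v > 1) {            /* v == 1: only the guard bit remains */
        *out++ = (v & 1u) ? '-' : '.';
        v >>= 1;
    }
    *out = '\0';
}
```

Because the loop stops at the guard bit rather than at zero, trailing dots (low-valued high bits) are still printed.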

For instance, the letter 'L' is ".-.." in Morse Code. To represent this in binary, we need a 0, a 1, and two more 0s, starting with the least significant bit. Written most-significant-bit first, that is 0010. Tack one more 1 on top for a guard bit, and we have our encoded Morse Code: 10010, or decimal 18. Add the +34 offset to get 52, which is the ASCII value of the character '4'. Since 'L' is ASCII 76 and the table starts at comma (ASCII 44), the encoded character array has a '4' as the 33rd character (index 76 - 44 = 32).
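The encoding in this example can be automated. Set bit i for each dash at position i, then set the next bit up as the guard. The sketch below does this and reproduces the arithmetic for 'L'. The name `encode_guarded` is hypothetical.

```c
/* Encodes a dot/dash string: symbol i sets bit i if it is a dash, and
 * one more bit is set just above the last symbol as the guard. */
unsigned encode_guarded(const char *morse)
{
    unsigned v = 0, bit = 1;

    for (; *morse; morse++, bit <<= 1)
        if (*morse == '-')
            v |= bit;
    return v | bit;            /* guard bit */
}
```

An empty code encodes to 1 (guard bit only). With the +34 offset that becomes 35, the '#' used for the unused slots in the table.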

This technique is similar to that used to encode characters in ACoolie's, strager's(2), Miles's, pingw33n's, Alec's, and Andrea's solutions, but is slightly simpler, requiring only one operation per bit (shifting/dividing), rather than two (shifting/dividing and decrementing).

EDIT: Reading through the rest of the implementations, I see that Alec and Anon came up with this encoding scheme (using the guard bit) before I did. Anon's solution is particularly interesting. It uses Python's bin function and strips off the "0b" prefix and the guard bit with [3:], rather than looping, ANDing, and shifting, as Alec and I did.

As a bonus, this version also handles hyphen (-....-), slash (-..-.), colon (---...), semicolon (-.-.-.), equals (-...-), and at sign (.--.-.). As long as 8-bit characters are allowed, these characters require no extra code bytes to support. No more characters can be supported with this version without adding length to the code (unless there are Morse codes for the less-than and greater-than signs).
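The bonus characters can be checked by decoding the table directly. This sketch assumes ASCII. It writes the three non-ASCII bytes of the 135-character version's table as escapes (\x95, \x83, \x8C, the Windows-1252 codes of the characters shown). The name `lookup` is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* The 8-bit table from the 135-character version, with its three non-ASCII
 * bytes written as escapes: index 0 = ',' (ASCII 44) ... index 46 = 'Z'.
 * Each byte is (guard-bit code + 34); '#' (35) decodes to an empty code. */
static const char table[] =
    "\x95\x83\x8CKa`^ZRBCEIQiw#S#nx(37+$6-2&@/4)'18=,*%.:0;?5";

/* Writes the Morse code for ch into out and returns the number of symbols,
 * or -1 if ch lies outside ','..'Z' after upper-casing. */
int lookup(int ch, char *out)
{
    unsigned v;
    int n = 0;

    ch = toupper(ch);
    if (ch < ',' || ch - ',' >= (int)strlen(table))
        return -1;
    v = (unsigned char)table[ch - ','] - 34u;
    for (; v > 1; v >>= 1)          /* stop at the guard bit */
        out[n++] = (v & 1u) ? '-' : '.';
    out[n] = '\0';
    return n;
}
```

The '<' and '>' slots hold '#', so they decode to an empty code. This matches the remark that these two slots are the only ones left to fill.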

Because I find the old implementations still interesting, and the text has some caveats applicable to this version, I've left the previous content of this post below.

Okay, presumably, the user interface can suck, right? So, borrowing from strager, I've replaced gets(), which provides buffered, echoed line input, with getch(), which provides unbuffered, unechoed character input. This means that every character you type gets translated immediately into Morse Code on the screen. Maybe that's cool. It no longer works with either stdin or a command-line argument, but it's pretty damn small.

I've kept the old code below, though, for reference. Here's the new.

New code, with bounds checking, 171 characters:

W(i){i?W(--i/2),putch(46-i%2):0;}c;main(){while(c=toupper(getch())-13)c=c-19?c>77|c<31?0:W("œ*~*hXPLJIYaeg*****u*.AC5+;79-@6=0/8?F31,2:4BDE"[c-31]-42):putch(47),putch(0);}

Enter breaks the loop and exits the program.

New code, without bounds checking, 159 characters:

W(i){i?W(--i/2),putch(46-i%2):0;}c;main(){while(c=toupper(getch())-13)c=c-19?W("œ*~*hXPLJIYaeg*****u*.AC5+;79-@6=0/8?F31,2:4BDE"[c-31]-42):putch(47),putch(0);}

Below follows the old 196/177 code, with some explanation:

W(i){i?W(--i/2),putch(46-i%2):0;}main(){char*p,c,s[99];gets(s);for(p=s;*p;)c=*p++,c=toupper(c),c=c-32?c>90|c<44?0:W("œ*~*hXPLJIYaeg*****u*.AC5+;79-@6=0/8?F31,2:4BDE"[c-44]-42):putch(47),putch(0);}

This is based on Andrea's Python answer, using the same technique for generating the morse code as in that answer. But instead of storing the encodable characters one after another and finding their indexes, I stored the indexes one after another and look them up by character (similarly to my earlier answer). This prevents the long gaps near the end that caused problems for earlier implementors.
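The recursive `W()` works like this: it takes (i-1)%2 as one symbol and recurses on (i-1)/2 first, so the symbols from deeper calls come out first. The sketch below restates it with a buffer instead of `putch`. It uses the 196-character version's table with its first byte escaped as \x9C, along with that version's -42 offset. `w_to_buf` and `encode_char` are hypothetical names, and the code assumes an upper-case input between ',' and 'Z'.

```c
/* Table from the 196-character version, first byte written as \x9C.
 * Index = upper-case character - 44 (','); stored value = code + 42,
 * so '*' (42) marks a character with no code. */
static const char table[] =
    "\x9C*~*hXPLJIYaeg*****u*.AC5+;79-@6=0/8?F31,2:4BDE";

/* Same recursion as W(): take one symbol from (i-1) % 2 (1 = '-',
 * 0 = '.') and recurse on (i-1) / 2 first, so earlier symbols come out
 * first. Appends to out and returns the new end of the buffer. */
static char *w_to_buf(int i, char *out)
{
    if (i) {
        --i;
        out = w_to_buf(i / 2, out);
        *out++ = (char)(46 - i % 2);
    }
    return out;
}

/* Writes the code for an upper-case character in ','..'Z' into out and
 * returns its length. */
int encode_char(int upper_ch, char *out)
{
    char *end = w_to_buf((unsigned char)table[upper_ch - 44] - 42, out);

    *end = '\0';
    return (int)(end - out);
}
```

Because each step subtracts 1 before halving, every value from 1 upward maps to a distinct dot/dash string without a separate guard bit. This is the decrement step that the later single-operation scheme avoids.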

As before, I've used a character that's greater than 127. Converting it to ASCII-only adds 3 characters: the first character of the long string must be replaced with \x9C. The offset is necessary this time. Otherwise, a large number of characters would be under 32 and would have to be represented with escape codes.

Also as before, processing a command-line argument instead of stdin adds 2 characters, and using a real space character between codes adds 1 character.

On the other hand, some of the other routines here don't deal with input outside the accepted range of [ ,.0-9\?A-Za-z]. If such handling were removed from this routine, then 19 characters could be removed, bringing the total down as low as 177 characters. But if this is done, and invalid input is fed to this program, it may crash and burn.

The code in this case could be:

W(i){i?W(--i/2),putch(46-i%2):0;}main(){char*p,s[99];gets(s);for(p=s;*p;p++)*p=*p-32?W("œ*~*hXPLJIYaeg*****u*.AC5+;79-@6=0/8?F31,2:4BDE"[toupper(*p)-44]-42):putch(47),putch(0);}



