Twitter.awk
@neutronbot on Twitter is a bot built on the streaming API. I like to write IRC bots when I learn a new language. I did that, then found out about the streaming API. Now I can treat Twitter like real-time chat and write a bot for it. :)
Programming it
The latest challenge was dealing with Unicode/UTF-8. I now know what they mean! :) ASCII is defined as the first 128 code points, 0 to 127 (bit 7 is 0). If bit 7 is 1, the byte is part of a multi-byte UTF-8 sequence. UTF-8 is a variable-width encoding: the high bits of the first byte say how many continuation bytes follow. Once U+20AC is encoded it is “\xE2\x82\xAC”…
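A minimal sketch of that encoding step, not the bot's actual code: the wrapper name `utf8_hex` and the helper `hex2num` are made up for this example. POSIX awk has no bit operators, so the shifts and masks are written as division and modulo, and the result is printed as hex pairs rather than raw bytes.

```sh
# utf8_hex CODEPOINT_HEX: print the UTF-8 bytes of a code point as hex pairs.
# Hypothetical helper for illustration; input is not validated.
utf8_hex() {
    awk -v hex="$1" '
    # Parse a hex string by hand; POSIX awk has no strtonum().
    function hex2num(s,    i, n) {
        s = tolower(s)
        n = 0
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
        return n
    }
    # The lead byte announces the sequence length; every following byte
    # starts with the bits 10 and carries six bits of the code point.
    function utf8(cp) {
        if (cp < 128)      # 0xxxxxxx
            return sprintf("%02X", cp)
        if (cp < 2048)     # 110xxxxx 10xxxxxx
            return sprintf("%02X %02X", 192 + int(cp / 64), 128 + cp % 64)
        if (cp < 65536)    # 1110xxxx 10xxxxxx 10xxxxxx
            return sprintf("%02X %02X %02X", 224 + int(cp / 4096),
                           128 + int(cp / 64) % 64, 128 + cp % 64)
        # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return sprintf("%02X %02X %02X %02X", 240 + int(cp / 262144),
                       128 + int(cp / 4096) % 64,
                       128 + int(cp / 64) % 64, 128 + cp % 64)
    }
    BEGIN { print utf8(hex2num(hex)) }'
}

utf8_hex 20ac    # prints: E2 82 AC
```

For U+20AC (8364 decimal): 8364 / 4096 gives 2, so the lead byte is 0xE0 + 2 = 0xE2; the next six bits are 2, giving 0x82; the last six bits are 44, giving 0xAC.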
JSON expresses the euro sign as \u20ac, so I have to convert it to UTF-8 to display it on the terminal, then URL-encode it to send it back to Twitter. In awk, sprintf(“%d”, c) doesn't return the character's numeric value. People usually build an ord[c] lookup table for c = 1 to 255. Here 'c' is an entire UTF-8 character, though, and I had no way to find out its value without piping to a shell command. A simple printf piped to “od” did the job.
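The printf-to-od idea can be sketched as follows. This is an illustration under assumptions, not the bot's code: the name `urlencode` is made up, and the pipe is set up in a shell wrapper rather than from inside awk, which keeps the quoting simple. `od -An -v -tx1` prints one hex pair per byte, so awk never has to look inside a multi-byte character; it only decides, byte by byte, whether to keep an RFC 3986 unreserved character or emit `%XX`.

```sh
# urlencode STRING: percent-encode every byte that is not an unreserved
# character (letters, digits, ".", "_", "~", "-"). Hypothetical helper.
urlencode() {
    printf '%s' "$1" | od -An -v -tx1 | awk '
    function hex2num(s,    i, n) {
        n = 0
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
        return n
    }
    {
        # Each field on an od output line is the hex value of one byte.
        for (i = 1; i <= NF; i++) {
            n = hex2num($i)
            # Only plain ASCII bytes are turned back into characters.
            c = (n < 128) ? sprintf("%c", n) : ""
            if (c ~ /^[A-Za-z0-9._~-]$/)
                out = out c
            else
                out = out "%" toupper($i)
        }
    }
    END { print out }'
}

urlencode 'a b'    # prints: a%20b
```

Passing the three bytes of the euro sign (E2 82 AC) through the same function yields `%E2%82%AC`.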
I think it's good now. I'd still like to redo the JSON parser. It goes character by character while loosely keeping track of state, because it began as a script that simply indented JSON. Then there is the obstacle that awk has no true multi-dimensional arrays or pointers. :(
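To illustrate the array limitation: awk accepts `a[x, y]`, but it joins the subscripts with `SUBSEP` into one string key in a flat array, so there is no nested structure to hand around. The sketch below uses made-up keys and values (`path_demo`, `tweet`, `example_user` are all hypothetical) and only shows how such compound keys behave; it is not how the bot stores parsed JSON.

```sh
# path_demo: store two made-up values under multi-part keys and read them back.
path_demo() {
    awk 'BEGIN {
        # a[x, y] is really a[x SUBSEP y]: one flat array, compound string keys.
        tweet["user", "screen_name"] = "example_user"
        tweet["entities", "hashtags", 0] = "awk"

        # Membership test with a parenthesised subscript list.
        if (("user", "screen_name") in tweet)
            print tweet["user", "screen_name"]

        # The same element, reached through a key built by hand.
        key = "entities" SUBSEP "hashtags" SUBSEP 0
        print tweet[key]

        # split() on SUBSEP recovers the parts of a compound key.
        n = split(key, part, SUBSEP)
        print n, part[2]
    }'
}

path_demo
```

The last line of output is `3 hashtags`: the key has three parts, and the second one is the string "hashtags".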
source
The code is viewable here.