Twitter.awk

@neutronbot on Twitter is a bot that uses the streaming API. I like to write IRC bots when I learn a new language. I did that, then discovered the streaming API. Now I can treat Twitter like real-time chat and build a bot for it. :)

Challenges

Starting from the beginning and ending with the latest.

JSON

awk doesn't do true multi-dimensional arrays, nor pointers. I can't set json["user"] to point to an array holding all the user elements… You have to flatten the tree. For example, grabbing Twitter's mention.json will create a json["1,user,name"]. That's not much of a problem until you want to loop over each mention. Then you have to work out how to grab the relevant elements for each one.
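To make the flattening idea concrete, here is a minimal sketch in Python (used only for illustration; it is not the awk code from the project). The flatten() helper, the sample data, and the 1-based list indices are all assumptions chosen to match a key such as "1,user,name".

```python
import json

def flatten(node, prefix="", out=None):
    """Flatten nested JSON into one flat dict keyed by comma-joined paths."""
    if out is None:
        out = {}
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        # Assumption: 1-based indices, to match keys such as "1,user,name".
        items = ((str(i), v) for i, v in enumerate(node, start=1))
    else:
        out[prefix] = node  # leaf value: store it under the full path
        return out
    for key, value in items:
        flatten(value, f"{prefix},{key}" if prefix else str(key), out)
    return out

# Hypothetical sample data, not real API output.
sample = '[{"user": {"name": "alice"}, "text": "hi"}, {"user": {"name": "bob"}, "text": "yo"}]'
flat = flatten(json.loads(sample))

# "For each mention": count upward until the key for the next index is missing.
names = []
i = 1
while f"{i},user,name" in flat:
    names.append(flat[f"{i},user,name"])
    i += 1
print(names)
```

Probing for a known key at each index is one way to loop over a flat array when there is no nested structure to iterate.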

OAuth

Twitter requires HMAC-SHA1 signatures. This isn't hard with a pipe to openssl and base64. The hardest part to grasp is which parameters to send and when. Once you have that figured out, you'll get denied again because you aren't calculating your base string the same way the server does. Oh, and the parameters end up being percent-encoded twice there… :)
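As a rough illustration of the base string and the double encoding, here is a sketch in Python (not the project's awk code) that follows the OAuth 1.0 signing rules as I understand them. The URL, parameter names, and secrets are made-up placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def enc(s):
    """Percent-encode: only A-Z a-z 0-9 - . _ ~ stay literal."""
    return quote(str(s), safe="~")

def signature_base_string(method, url, params):
    # Encode each key and value, sort the pairs, and join them as k=v with '&'.
    pairs = sorted((enc(k), enc(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    # The whole parameter string is encoded again, so values end up encoded twice.
    return "&".join([method.upper(), enc(url), enc(param_string)])

def sign(base_string, consumer_secret, token_secret=""):
    # Signing key: encoded consumer secret, '&', encoded token secret.
    key = f"{enc(consumer_secret)}&{enc(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Hypothetical request; none of these values are real.
params = {"status": "hello world", "oauth_nonce": "abc123"}
base = signature_base_string("post", "https://api.example.com/update.json", params)
sig = sign(base, "consumer-secret", "token-secret")
print(base)
print(sig)
```

Note how the space in "hello world" becomes %20 in the parameter string and then %2520 in the base string. A server that rebuilds the string differently will reject the signature.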

Unicode

This challenge was revisited a couple of times. I now know what Unicode and UTF-8 actually are! :) ASCII is defined as the first 128 code points, 0 to 127 (bit 7 is 0), and UTF-8 keeps those as single bytes for backward compatibility with the way things were. If bit 7 is 1, the byte is part of a multi-byte UTF-8 sequence. UTF-8 is a variable-width encoding: the leading bits of the first byte tell you how many continuation bytes follow. For example, the Euro currency symbol is U+20AC, and once encoded it is "%E2%82%AC" (in URL percent-encoding syntax, you get the idea).
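The bit layout is easier to see in code. This is a small Python sketch (for illustration only, not the awk implementation) that encodes a single code point by hand; utf8_encode() is a hypothetical helper and skips surrogate checks.

```python
def utf8_encode(cp):
    """Encode one Unicode code point as UTF-8 bytes (no surrogate checks)."""
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:   # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of range")

euro = utf8_encode(0x20AC)
percent = "".join(f"%{b:02X}" for b in euro)
print(percent)  # %E2%82%AC
```

The first byte's high bits (0, 110, 1110, 11110) give the sequence length, and every continuation byte starts with 10.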

JSON expresses the Euro sign as \u20ac inside a string. I first have to convert that to UTF-8 to display it on the terminal, then URL-encode it to send it back to Twitter. In awk, sprintf("%d", c) doesn't give you the character code of c, so people build an ord[c] lookup table for c = 1 to 255. My Linux distro is all modern and has a UTF-8 locale set, and gawk supports wide characters (WCHAR) under it. So if I grab a single character with substr(str, 1, 1), for example, its value could be >255 !!! The simple workaround is to start awk with LANG=C, which reverts to the old byte-at-a-time behavior. I decided to support both, just in case. I could just use "printf c | od -t x1" to get the hex representation, but I'm trying to do as much as possible in awk itself without calling other programs, so I test for the WCHAR behavior and build a much larger ord[] table. It takes some time, because to know the hex value of a character I have to create it myself, using binary strings to apply the UTF-8 encoding as well. Was fun… ugh :\
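The round trip from a JSON escape to a URL-encoded string can be sketched like this in Python (illustration only; the awk version has to do all of this by hand). The ENCODED lookup table plays a role loosely similar to an ord[] table, and the sample string is made up.

```python
import json

# Bytes that may appear literally in a percent-encoded string.
UNRESERVED = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")

# Lookup table built once: byte value -> its encoded form.
ENCODED = {b: (chr(b) if b in UNRESERVED else f"%{b:02X}") for b in range(256)}

def percent_encode(text):
    """UTF-8 encode the text, then percent-encode it byte by byte."""
    return "".join(ENCODED[b] for b in text.encode("utf-8"))

# Hypothetical JSON string containing an escaped Euro sign.
raw = r'"price: \u20ac5"'
decoded = json.loads(raw)          # \u20ac becomes the single character U+20AC
encoded = percent_encode(decoded)
print(encoded)  # price%3A%20%E2%82%AC5
```

Working byte by byte after UTF-8 encoding sidesteps the question of whether a "character" is one byte or a wide character.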

HMAC-SHA1

I wanted to drop the dependency on the OpenSSL CLI, so I wrote an HMAC-SHA1 routine in awk using binary strings. :)

IT WORKS! It's not meant for hashing a ton of files, just for the OAuth header, so speed is like… pfft, whatever.

It was mostly a matter of creating functions that do binary operations on strings, then a direct implementation of the Wikipedia pseudocode, though I had to refer to the actual FIPS specification to understand some details. It almost worked on the first go. Then there was a bug with padding (oops, pad -17 bytes? how'd I end up with that math hah).
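For reference, here is a compact Python sketch of SHA-1 and HMAC-SHA1 following the published pseudocode. It uses integers rather than binary strings, so it is not the awk implementation, only an illustration of the same steps. The padding line is the part where a plain subtraction can go negative, which is why it uses a modulo.

```python
import struct

def _rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1(message):
    h0, h1, h2, h3, h4 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0
    # Padding: 0x80, zeros until length is 56 mod 64, then the 64-bit bit length.
    # The modulo keeps the zero count in 0..63 even when 55 - len is negative.
    pad_len = (55 - len(message)) % 64
    padded = message + b"\x80" + b"\x00" * pad_len + struct.pack(">Q", len(message) * 8)
    for off in range(0, len(padded), 64):
        w = list(struct.unpack(">16I", padded[off:off + 64]))
        for i in range(16, 80):
            w.append(_rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
        a, b, c, d, e = h0, h1, h2, h3, h4
        for i in range(80):
            if i < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif i < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif i < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (_rotl(a, 5) + f + e + k + w[i]) & 0xFFFFFFFF
            a, b, c, d, e = temp, a, _rotl(b, 30), c, d
        h0 = (h0 + a) & 0xFFFFFFFF
        h1 = (h1 + b) & 0xFFFFFFFF
        h2 = (h2 + c) & 0xFFFFFFFF
        h3 = (h3 + d) & 0xFFFFFFFF
        h4 = (h4 + e) & 0xFFFFFFFF
    return struct.pack(">5I", h0, h1, h2, h3, h4)

def hmac_sha1(key, msg):
    block = 64
    if len(key) > block:            # long keys are hashed first
        key = sha1(key)
    key = key.ljust(block, b"\x00")  # then zero-padded to the block size
    ipad = bytes(k ^ 0x36 for k in key)
    opad = bytes(k ^ 0x5C for k in key)
    return sha1(opad + sha1(ipad + msg))

print(sha1(b"abc").hex())  # a9993e364706816aba3e25717850c26c9cd0d89d
```

Checking the result against a known test vector such as "abc", and against messages around 55, 56, and 64 bytes long, is a quick way to catch padding mistakes.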

Then a quick base64(), which wasn't hard to debug. I was just missing the '=' padding at the end.
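The '=' padding rule is short enough to show in full. This is a Python sketch of a hand-rolled encoder (for illustration, not the awk routine); b64encode() here is a hypothetical helper, not the standard library function.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64encode(data):
    """Encode bytes as base64 text, including the trailing '=' padding."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into a 24-bit number, zero-filling a short final chunk.
        n = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 input byte keeps 2 characters, 2 bytes keep 3, 3 bytes keep all 4.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

print(b64encode(b"hi"))  # aGk=
```

A 20-byte SHA-1 digest leaves 2 bytes in the last chunk, so its base64 form is 28 characters ending in a single '='.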

Source

I decided to start using my GitHub account and have been pushing the code to: http://github.com/neutronscott/Twitter.awk

wiki.scottn.us
