This is an old revision of the document!


Twitter.awk

@neutronbot on Twitter is a bot using the streaming API. I like to write IRC bots when I learn a new language. I did that, then discovered the streaming API. Now I can act like Twitter is real-time chat and make a bot. :)

Programming it

The latest challenge was messing with Unicode/UTF-8. I now know what they mean! :) ASCII is defined as the first 128 code points (0 to 127), where bit 7 is 0. If bit 7 is 1, the byte belongs to a multi-byte UTF-8 sequence. UTF-8 is a variable-width encoding: the high bits of the first byte say how many continuation bytes follow. Once U+20AC is encoded it is "\xE2\x82\xAC"…
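To make the bit layout concrete, here is a minimal sketch of encoding a code point as UTF-8 in awk. It is not the bot's actual code: the function name utf8_hex is made up for illustration, and it prints the bytes as \xNN text rather than raw bytes, because what printf "%c" emits for values above 127 differs between awk implementations. POSIX awk has no bitwise operators, so the shifts and masks are done with division and modulo.

```shell
# utf8_hex HEX: print the UTF-8 bytes of code point U+HEX as \xNN escapes.
# Hypothetical helper for illustration only.
utf8_hex() {
    awk -v hex="$1" '
    # Hex string to number, without relying on gawk-only strtonum().
    function hex2num(s,    i, n) {
        s = toupper(s); n = 0
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789ABCDEF", substr(s, i, 1)) - 1
        return n
    }
    BEGIN {
        cp = hex2num(hex)
        if (cp < 128)          # 0xxxxxxx: plain ASCII, bit 7 is 0
            printf "\\x%02X", cp
        else if (cp < 2048)    # 110xxxxx 10xxxxxx
            printf "\\x%02X\\x%02X", 192 + int(cp / 64), 128 + cp % 64
        else if (cp < 65536)   # 1110xxxx 10xxxxxx 10xxxxxx
            printf "\\x%02X\\x%02X\\x%02X", 224 + int(cp / 4096),
                   128 + int(cp / 64) % 64, 128 + cp % 64
        else                   # 11110xxx plus three continuation bytes
            printf "\\x%02X\\x%02X\\x%02X\\x%02X", 240 + int(cp / 262144),
                   128 + int(cp / 4096) % 64, 128 + int(cp / 64) % 64,
                   128 + cp % 64
        printf "\n"
    }'
}

utf8_hex 20AC   # prints \xE2\x82\xAC
```

The same four hex digits are what follows \u in a JSON escape, so something like this could sit between the JSON parser and the terminal output.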

JSON expresses this euro sign as \u20ac, so I have to convert it to UTF-8 to display it on the terminal. Then I have to URL-encode it to send it back to Twitter. In awk, sprintf("%d", c) doesn't give you the character code. People build an ord[c] lookup table for c=1 to 255. But here 'c' is the entire UTF-8 character, and I had no way to find out its value without piping to a shell. A simple printf piped to "od" did the job.
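A rough sketch of the printf-to-od idea, used here for the URL-encoding step. This is an assumption about how it could look, not the bot's code: the name urlencode is hypothetical, od runs in the shell pipeline instead of being piped from inside awk, and the set of characters left unescaped is the RFC 3986 unreserved set.

```shell
# urlencode STRING: percent-encode every byte outside A-Z a-z 0-9 . _ ~ -
# Hypothetical helper; od turns the string into decimal byte values so that
# awk never has to know the numeric value of a multi-byte character.
urlencode() {
    printf '%s' "$1" | od -An -v -t u1 | LC_ALL=C awk '
    {
        for (i = 1; i <= NF; i++) {
            b = $i + 0
            # Only ASCII bytes are turned back into characters.
            if (b < 128 && sprintf("%c", b) ~ /[A-Za-z0-9._~-]/)
                out = out sprintf("%c", b)
            else
                out = out sprintf("%%%02X", b)
        }
    }
    END { print out }'
}

urlencode 'a b'   # prints a%20b
```

Since od reports one number per byte, a three-byte character such as the euro sign comes out as three %XX groups without any ord[] table.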

I think it's good now. I'd still like to redo the JSON parser. It goes char-by-char while somewhat remembering the state. It works this way because it began as a script that simply indented JSON. Then there is the obstacle of awk not truly supporting multi-dimensional arrays or pointers. :(
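For a feel of what a char-by-char pass with a little remembered state looks like, here is a minimal JSON indenter sketch. It is a reconstruction of the general technique, not the script this parser grew out of; the name json_indent and the variables in_str, esc, and depth are made up. It only tracks whether it is inside a string, whether the previous character was a backslash, and the nesting depth.

```shell
# json_indent: read JSON on stdin, print it indented two spaces per level.
# Hypothetical sketch; it does no validation and prints empty containers
# with a blank line between the brackets.
json_indent() {
    awk '
    function pad(n,    s) { s = ""; while (n-- > 0) s = s "  "; return s }
    {
        len = length($0)
        for (i = 1; i <= len; i++) {
            c = substr($0, i, 1)
            if (in_str) {                    # inside a string: copy verbatim
                out = out c
                if (esc) esc = 0             # char after a backslash is literal
                else if (c == "\\") esc = 1
                else if (c == "\"") in_str = 0
            } else if (c == "\"") {
                in_str = 1; out = out c
            } else if (c == "{" || c == "[") {
                depth++; out = out c "\n" pad(depth)
            } else if (c == "}" || c == "]") {
                depth--; out = out "\n" pad(depth) c
            } else if (c == ",") {
                out = out c "\n" pad(depth)
            } else if (c == ":") {
                out = out ": "
            } else if (c != " " && c != "\t") {
                out = out c                  # numbers, true, false, null
            }
        }
    }
    END { print out }'
}

printf '%s\n' '{"a":[1,2],"b":"x,y"}' | json_indent
```

Turning this into a parser mostly means recording keys and values at the points where the indenter emits newlines, which is where the flat-array limitation starts to hurt.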

Source

The code is viewable here.

c/twitter.awk.1311542607.txt.gz · Last modified: 2023/11/04 22:29 (external edit)

Except where otherwise noted, content on this wiki is licensed under the following license: Public Domain