All Posts Tagged “GAE”

opportunity for a twitter oauth web client

I guess most of you’d know that Twitter API would be shutting down basic authentication on August 31st. What this means is that you will no longer be able to login into 3rd party twitter application by giving out your username and password. From now on, to use third party applications you will have to authorize the application using oAuth.

This is bad news for people in countries like China where twitter.com is blocked. They cannot login into 3rd party twitter applications with username/password; they cannot login via twitter.com.

Also, people who access twitter from office(where twitter.com is blocked) will not like closing down of basic auth. But this brings up an opportunity for someone to come up with a twitter web client. The idea is that a user opens an account with an app and authorizes it(to use twitter) when he is at home; later he can log into this app from office and use twitter. Although there are clients like hootsuite which do it, they are targeted at enterprise-y, business-ey people. A simple application like iTweet or dabr would be a hit among the code monkeys.

If built on google’s appengine, the app can use google auth for the user registration part. There are app engine compatible twitter libraries available on github. Build a decent & basic interface and you have your next big twitter app.  So, who’s building it? ;)

UPDATE: dabr has done it.

building a word unscrambler

Pre – Prologue

Months ago, there was a twitter account conducting a contest where a scrambled word was tweeted; challenge was to be the first to reply with the correct answer. A new word was posted every hour and to score more points, one had to be watching twitter all the time. It seemed a perfect task for automation(sigh!). It didn’t take much time in hacking code to poll, read and reply to the tweet but I needed a word unscrambler. Being a ‘Hacker’, I didn’t want to write code for that. There were websites that did it but lacked APIs. I plugged in SimpleHTMLdom which provides jQuery like syntax for ‘scraping’ data from websites and my hack was ready.

Prologue

Few days back during lunch time chat, I asked a friend if he could build an app that could return the proper sentence when given a scrambled sentence as input. This reminded me of the ‘twitter reply’ hack I had written months ago and thought I could write code for ‘unscrambing’ words. Just then, I was re-discovering Google App Engine with python so chose it for solving(showing off?!) this.

Action

Straight forward solution to this problem was finding all permutations of the characters in the scrambled word and then doing a dictionary match. There are lots of links on the web that provide word lists. I used one such word list for the dictionary. itertools.permutations and a dictionary-search loop later, I had the required results. This worked fine for short words but as the length increased, it took more and more time.

This solution wasn’t usable; waiting for seconds(and minutes) just to find all the possible permutations is simply stupid(more so if it’s a web application).

Next challenge was to bring down the time taken; somehow convert this exponential(?) problem to linear. Solution to this was simple. For a given word and all other words formed by scrambling the characters in it, the count(number of repetitions) of each character in the word remains the same.

For example: In both god and dog, the characters g,o,d are repeated once.

god => a=0; b=0; c=0; d=1; …. g=1;…. o:1;….

dog => a=0; b=0; c=0; d=1; …. g=1;…. o:1;….

All I had to do was create a data structure that held the character counts (of characters) for each word in the word list. This data structure went into a database and finding unscrambled words became as simple as firing a single query with a where clause to match the character counts. Thus time consuming permutation-generation part was ousted but at the price of storing this new data structure. Time complexity reduced at the cost of Space complexity.

word count integer representation

Fròm then on, it was straight forward coding, uploading and (long, tiresome, boring) waiting for the cron jobs to run for dataset generation+ insertion in GAE datastore.  The app is called wordunscrambler.

Epilogue

As I was storing the character counts as integers in the datastore, it occupied lot of space. Though this wasn’t really a serious issue, I wanted to reduce this space complexity. Solution to this was based on an (very apt)assumption that the character count in a word would not be greater than 9 and it could be represented by a single decimal number(Could have also chosen hexadecimal but I love decimals). The character counts were now represented as a string; the first character of this string represented character count of character ‘a’, second character corresponded to ‘b’ and so on.

word count string representation(second character represents ‘a’ here)

26 integers(which are stored as long in GAE datastore) representing a word’s character-count were now replaced by a 26 character string(unicode). Thus, Space complexity reduced. The site as of now runs this code.

By the way, integer representation of character count has an advantage over string representation in that if we need to find all possible words(taking ‘any’ number of characters at a time) that can be formed fròm given characters, the query could use a ‘lesser than’ in the where clause(instead of character code matching).

PS1: As this was more of a let-me-find-a-solution exercise, google’s help was not taken.

PS2: You could also call this ‘Anagram finder’.