I saw the comedy theater I intern at could really benefit from a computerized way of tracking when interns show up. So, I decided to practice my agile programming and whip something up as quickly as possible. I considered using a web framework, but decided that Django would be overkill for such a simple project.
So, where to start? I need to keep track of users, who have passwords. I decided a dictionary was the way to go.
#!/usr/bin/python
'''Intern time keeping script for The New Movement theater.'''

users = {'elliot':'pA$$word'}

def login(username,password):
    global users
    try:
        login = users[username] == password
    except KeyError:
        print 'no such user: {}'.format(username)
        return None
    else:
        if login:
            print '\n{} successfully logged in.'.format(username)
        else:
            print 'Wrong password'

if __name__ == '__main__':
    from sys import argv
    username, password = argv[1:]
    login(username, password)
This is super simple, demonstrates the basic functionality I need, and is set up to incrementally expand functionality and test along the way. So where to go from here? I need a few things still. I need the passwords to not be stored in plain text. I need the users dictionary to be update-able and to persist after the program shuts down. I need the program to do something more useful when the user logs in successfully. I need an interface. And I need these things in pretty much this order. Why work out the interface before I have all the basics of what the interface will call? Why worry about logging times before I really have a way of creating, storing, and securely authenticating users?
Passwords
The secure way to store a passphrase is as a hash. A hash cannot be reversed*, so the only way to check a passphrase is to hash the attempt and compare the result to the stored hash. I decided to use a simple MD5 hash, since it comes with Python. So I wrote a hashing function:
from md5 import md5

def saltyhash(password):
    '''salt the password and hash it a bunch. Security through obscurity.
    No password cracking programs are pre-configured to do this exact hash.'''
    result = password+'_._'+password
    # re-hash the hex digest 25 times
    for x in range(25):
        result = md5(result).hexdigest()
    return result
Now I always call saltyhash on a password before doing anything with it (storing or checking it):
users = {'elliot':saltyhash('pA$$word')}

def login(username,password):
    global users
    try:
        login = users[username] == saltyhash(password)
    except KeyError:
        ...
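For comparison, the same hash-then-compare check can be sketched with tools from the standard library that are designed for passwords. This is not the saltyhash function above: it swaps in PBKDF2 with a random per-user salt, and it is written for Python 3 rather than the Python 2 used in the rest of this post. The names hash_password, check_password and ITERATIONS are made up for this example, and the iteration count is an assumption, not a recommendation.

```python
import hashlib
import hmac
import os

ITERATIONS = 100000  # assumed work factor; tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each user
    digest = hashlib.pbkdf2_hmac(
        'sha256', password.encode('utf-8'), salt, ITERATIONS)
    return salt, digest


def check_password(password, salt, expected_digest):
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Hypothetical in-memory store: username -> (salt, digest)
users = {'elliot': hash_password('pA$$word')}

salt, digest = users['elliot']
print(check_password('pA$$word', salt, digest))  # the correct passphrase
print(check_password('password', salt, digest))  # a wrong passphrase
```

Because the salt is random, two users with the same passphrase end up with different stored digests, which a fixed salt like '_._' cannot offer.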
Storing User Passwords
Now that I have passwords worth storing, time to figure out how to store them. My default method is pickling the data, in this case a dictionary. I don’t know of a reason for using the pickle module over json here, so I just went with json since it comes up in more situations. Their usage is practically identical.
Since there is nothing interesting about serializing data, writing it to a file, and retrieving it, I won’t bother discussing it further here. See the more complete code later in this post for details.
So, instead of a user database, I have a dictionary serialized as JSON and stored in a file. This makes perfect sense for how simple this project is and how little data I have to keep track of.
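For readers who do want the short version, here is a minimal sketch of the round trip, written for Python 3. The function names load_users and save_users and the file name users.json are assumptions for this example; the post's own code appears further down.

```python
import json
import os
import tempfile


def load_users(path):
    """Return the username -> hash dictionary, or {} if there is nothing to load."""
    try:
        with open(path) as f:
            return json.load(f)
    except (IOError, ValueError):
        # IOError: the file does not exist yet
        # ValueError: the file is empty or is not valid JSON
        return {}


def save_users(path, users):
    """Overwrite the file with the current dictionary."""
    with open(path, 'w') as f:
        json.dump(users, f)


# Hypothetical file name, placed in a temporary directory for the demo
path = os.path.join(tempfile.mkdtemp(), 'users.json')
users = load_users(path)             # nothing saved yet, so this is {}
users['elliot'] = 'not-a-real-hash'  # placeholder value
save_users(path, users)
print(load_users(path))
```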
Logging times
Now, to do something useful with the successful login. This turns out to also be trivially simple. I want to log the user, current date/time, and some user input (any important notes). The agile thing to do is log this data into a plain CSV file. This logs the data accurately in a human readable format and allows the admins to manually edit and add information through tools they are familiar with. Maybe I’ll want to use a Google Doc in the future, but I’d rather get feedback from the people I am making this program for before getting into that. I decided the easiest format for the (potentially) technically naive admins was a separate CSV for each user. Maybe use a naming scheme like “username_timesheet.csv”? Sounds good.
The time comes from datetime.datetime.now(), which should be formatted with strftime. If these numbers ever need to be parsed out again, the same format that strftime used can be given to strptime to get a datetime object again. So, I’d define this format right up at the top.
>>> import datetime
>>> timeformat = '%H:%M on %Y/%m/%d'
>>> time = datetime.datetime.now()
>>> humanreadable = time.strftime(timeformat)
>>> print humanreadable
17:15 on 2013/11/23
>>> time2 = datetime.datetime.strptime(humanreadable,timeformat)
>>> type(time2)
datetime.datetime
>>> time==time2 #not True, because time has seconds and microseconds.
False
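Putting the time format and the per-user CSV together, the write and read-back steps might look like the sketch below. It is written for Python 3 (the newline='' argument to open is a Python 3 idiom for csv files), and the helper names log_sign_in and read_sign_ins are made up for this example.

```python
import csv
import os
import tempfile
from datetime import datetime

TIME_FORMAT = '%H:%M on %Y/%m/%d'


def log_sign_in(path, when, notes):
    """Append one (time, notes) row to a timesheet CSV."""
    with open(path, 'a', newline='') as f:
        csv.writer(f).writerow([when.strftime(TIME_FORMAT), notes])


def read_sign_ins(path):
    """Return a list of (datetime, notes) tuples parsed back from the CSV."""
    with open(path, newline='') as f:
        return [(datetime.strptime(stamp, TIME_FORMAT), notes)
                for stamp, notes in csv.reader(f)]


# Demo file in a temporary directory, following the username_timesheet.csv scheme
path = os.path.join(tempfile.mkdtemp(), 'elliot_timesheet.csv')
log_sign_in(path, datetime(2013, 11, 23, 17, 15), 'ran the box office, sold out')
rows = read_sign_ins(path)
print(rows)
```

The csv module quotes the comma inside the notes, so free-form notes survive the round trip.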
Once these things are decided, the code is again trivial.
With the pickling and csv writing, the code now looks like:
#!/usr/bin/python
from datetime import datetime
import csv
from md5 import md5
import json

time_format = '%H:%M on %Y/%m/%d'
pickle_file='pckl.dat'
timesheet = '_timesheet.csv'

def create_pickle():
    with open(pickle_file,'wb') as f:
        print '{} did not exist, created it'.format(pickle_file)
        users = {}
        f.writelines(json.dumps(users))

def get_users():
    '''return a dictionary keyed by username'''
    users = {}
    try:
        with open(pickle_file,'rb') as f:
            users = json.loads(f.readline())
    #If the file hasn't been created or is some how empty,
    #create it and return an empty dict
    except IOError:
        create_pickle()
    except ValueError:
        create_pickle()
    return users

def create_user(user,password):
    users = get_users()
    try:
        if users[user]:
            print '{} already exists'.format(user)
    except KeyError:
        users[user] = saltyhash(password)
        with open(pickle_file,'wb') as f:
            f.writelines(json.dumps(users))

def saltyhash(password):
    '''salt the password and hash it a bunch. Security through obscurity'''
    result = password+'_._'+password
    for x in range(25):
        result = md5(result).hexdigest()
    return result

def list_users():
    users = get_users()
    print users.keys()

def login(username,password):
    '''takes username and password, and logs time of successful login
    and notes to file'''
    users = get_users()
    try:
        login = users[username] == saltyhash(password)
    except KeyError:
        print 'no such user: {}'.format(username)
        return None
    else:
        if login:
            print '\nEnter any notes you want to log with this sign in.'
            notes = raw_input('notes: ')
            time = datetime.now().strftime(time_format)
            with open(username+timesheet,'ab') as f:
                writer = csv.writer(f)
                writer.writerow([time,notes])
            print '\n{} successfully logged in at {}.\n\nThank you!\n'.format(username,time)
        else:
            print 'Wrong password'
User interface
Maybe I do want a web interface. Maybe I want something pretty on the desktop made with Tkinter or something. But maybe not. For now, I could leave it as a command line program called from the bash shell, but I do need to sell this a little bit, and ‘./tnmtimesheet.py username’ is a bad interface for anyone but developers. I decided to take a middle road and learn something in the process. I used the cmd module, as it provides tab completion and help text, and lets me define default behavior.
With the bash command line as the interface, I know the theater folks will want a different interface without even asking. With the cmd interface, they may just want it tidied up a bit or they may want something graphical. I also used getpass, which is like raw_input but doesn’t echo what you type.
I did decide to get a little fancy with this since I am just seeing cmd for the first time. It is easy to make custom tab completion suggestions for arguments to a command. If I wrote a do_login(username) function, then I could also write a complete_login() function that gets called once after login is typed. So typing “login u[tab]” gives me “login username”. But I want users to just type their username without typing “login”, because that will be easier for people. So, instead of writing a do_login(username) I’ll use default(username). But completedefault(), despite my expectation, doesn’t get called for commands. It is only called for completing arguments for commands that do not have a complete_ function defined. Makes sense.
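The conventional do_login/complete_login pairing described above can be sketched like this. It is written for Python 3, and the shell name and the hard-coded user list are made up for this example. The completion hook is called directly here so the behavior can be seen without an interactive terminal.

```python
import cmd

USERNAMES = ['alice', 'elliot', 'ellen']  # hypothetical user list


class ConventionalShell(cmd.Cmd):
    prompt = '(login) '

    def do_login(self, username):
        """Usage: login username"""
        print('logging in {}'.format(username))

    def complete_login(self, text, line, begidx, endidx):
        """Complete the argument of "login"; text is the partial word."""
        return [name for name in USERNAMES if name.startswith(text)]

    def do_exit(self, line):
        """Exit the shell"""
        return True


shell = ConventionalShell()
# Roughly what cmd does after the user types "login el" and presses tab
print(shell.complete_login('el', 'login el', 6, 8))
# Run a single command without starting the interactive loop
shell.onecmd('login elliot')
```

Calling shell.cmdloop() instead would start the interactive prompt, where tab completion goes through readline if it is available.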
So I dug into the source code for cmd.py and it looked like I needed to override Cmd.completenames(). This is the function that complete calls when the text being completed is the command name itself, rather than an argument to a command.
This may be a bad idea, because I may have more commands in the future, and admins may want to be able to create_user, delete_user, get_user_stats, change_user_password, and they would want tab completion for those. But, maybe I’ll deal with this by making a second “admin” shell, and keep it separate from the “login” shell altogether. Or maybe they’ll want a GUI. So, no use spending more time refining this until I get feedback.
For now, the entirety of my UI is this:
import cmd

class LoginShell(cmd.Cmd):
    intro = '''
    The New Movement intern sign in app.

    To sign in as username do:
    (login) username

    Tab to autocomplete your username.
    '''
    prompt = '(login) '
    # sorted list of usernames, used for tab completion
    users = sorted(get_users().keys())

    def emptyline(self):
        '''Refresh user list (why not) and give help again'''
        print '\n\ntype one of the following usernames to log in:\n'
        # sorted() returns a new list; list.sort() would return None
        self.users = sorted(get_users().keys())
        for user in self.users:
            print ' {}'.format(user)
        print '\nTab autocompletes\n'
        print 'Or type "help" for admin commands\n'

    def default(self, username):
        '''Just type your username to login'''
        login(username)

    def completenames(self, text, *ignored):
        '''override command name completion to include only usernames
        and not commands
        Delete this function to revert to command completion behavior.
        Copy this code into
        complete_login(self, text, line, start_index, end_index)
        and replace default(self, username) with do_login(self, username)
        to revert to the more common usage behavior'''
        if text:
            return [ user for user in self.users if user.startswith(text) ]
        else:
            return self.users

    def do_adduser(self,username):
        '''Usage: (login) adduser username'''
        if username == '':
            print '''Usage: adduser newusername'''
        else:
            create_user(username)

    def do_exit(self,line):
        '''Exit the program'''
        return -1

if __name__ == '__main__':
    LoginShell().cmdloop()
One more step I need to make is to set up file permissions so that someone can’t just get the json file with the password hashes in it, or cheat and modify their timesheets manually. Again, I’ll make sure this program is close to what the theater admins want before testing that.
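As a rough idea of what that step could look like, here is a sketch that restricts a file to its owner. It assumes a POSIX system and assumes the sign in program runs under a dedicated account that interns cannot otherwise log in to; neither is something I have set up or tested yet. The helper name restrict_to_owner is made up, and the demo uses a temporary file as a stand-in for the password file.

```python
import os
import stat
import tempfile


def restrict_to_owner(path):
    """Allow only the file's owner to read and write it (mode 0600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


# Temporary stand-in for the password file
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o644)    # start out world-readable for the demo
restrict_to_owner(path)
print(oct(stat.S_IMODE(os.stat(path).st_mode)))
```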
After tweaking some details, the full code of this prototype is as follows.
#!/usr/bin/python
import cmd
import getpass
from datetime import datetime
import csv
from md5 import md5
import json

time_format = '%H:%M on %Y/%m/%d'
pickle_file='pckl.dat'
timesheet = '_timesheet.csv'

def create_pickle():
    users = {}
    print '{} did not exist, created it'.format(pickle_file)
    with open(pickle_file,'wb') as f:
        f.writelines(json.dumps(users))

def get_users():
    '''return a dictionary keyed by username'''
    users = {}
    try:
        with open(pickle_file,'rb') as f:
            users = json.loads(f.readline())
    #If the file hasn't been created or is some how empty,
    #create it and return an empty dict
    except IOError:
        create_pickle()
    except ValueError:
        create_pickle()
    return users

def create_user(username):
    users = get_users()
    try:
        if users[username]:
            print '{} already exists'.format(username)
    except KeyError:
        password1 = getpass.getpass("Enter your password: ")
        password2 = getpass.getpass("Again (for verification): ")
        if password1==password2:
            users[username] = saltyhash(password1)
            with open(pickle_file,'wb') as f:
                f.writelines(json.dumps(users))
        else:
            print "passwords did not match!"

def saltyhash(password):
    '''salt the password and hash it a bunch. Security through obscurity'''
    result = password+'_._'+password
    for x in range(25):
        result = md5(result).hexdigest()
    return result

def list_users():
    users = get_users()
    print users.keys()

def login(username):
    '''takes username, authenticates user, and logs time of successful
    login and notes to file'''
    users = get_users()
    try:
        password_hash = users[username]
    except KeyError:
        print 'no such user: {}'.format(username)
        return None
    else:
        password = getpass.getpass("Enter your password:")
        login = password_hash == saltyhash(password)
        if login:
            print '\nEnter any notes you want to log with this sign in.'
            notes = raw_input('notes: ')
            time = datetime.now().strftime(time_format)
            with open(username+timesheet,'ab') as f:
                writer = csv.writer(f)
                writer.writerow([time,notes])
            print '\n{} successfully logged in at {}.\n\nThank you!\n'.format(username,time)
        else:
            print 'Wrong password'

class LoginShell(cmd.Cmd):
    intro = '''
    The New Movement intern sign in app.

    To sign in as username do:
    (login) username

    Tab to autocomplete your username.
    '''
    prompt = '(login) '
    users = get_users().keys()
    users.sort()

    def emptyline(self):
        '''Refresh user list (why not) and give help again'''
        print '\n\ntype one of the following usernames to log in:\n'
        self.users = get_users().keys()
        self.users.sort()
        for user in self.users:
            print ' {}'.format(user)
        print '\nTab autocompletes\n'
        print 'Or type "help" for admin commands\n'

    def default(self, username):
        '''Just type your username to login'''
        login(username)

    def completenames(self, text, *ignored):
        '''override command name completion to include only usernames
        and not commands
        Delete this function to revert to command completion behavior.
        Copy this code into
        complete_login(self, text, line, start_index, end_index)
        and replace default(self, username) with do_login(self, username)
        to revert to the more common usage behavior'''
        if text:
            return [ user for user in self.users if user.startswith(text) ]
        else:
            # the bare name "users" is not visible inside a method
            return self.users

    def do_adduser(self,username):
        '''Usage: (login) adduser username'''
        if username == '':
            print '''Usage: adduser newusername'''
        else:
            create_user(username)

    def do_exit(self,line):
        '''Exit the program'''
        return -1

if __name__ == '__main__':
    LoginShell().cmdloop()
*Password hashing is really interesting to me. My understanding is like this: For many mathematical operations, the forward operation is much easier than its inverse. Any 10th grader can square a decimal number exactly with pen and paper, but taking the square root is much more complicated. The best method I know of is Newton’s method, and it involves guessing and then adjusting the guess over and over. Plus, on taking the square root, you can’t be sure if the original number was positive or negative anyway. A good hashing algorithm is this to the extreme. It takes a bit for the computer to do it forwards, it is practically impossible to invert, and it wouldn’t give a unique answer even if you could. A method like Newton’s, where each guess informs the next as you zero in on the exact answer, doesn’t work on it either.
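To make the contrast concrete, here is a small sketch written for Python 3. The function newton_sqrt is my own illustration of the guess-and-adjust idea, with an assumed starting guess and tolerance. The two MD5 digests at the end show the other side: changing one letter of the input gives a digest that looks unrelated, so a near miss gives no hint about which way to adjust.

```python
import hashlib


def newton_sqrt(n, guess=1.0, tolerance=1e-12):
    """Approximate sqrt(n) for n > 0 by repeatedly refining a guess."""
    # Stop once guess squared is within a small relative error of n
    while abs(guess * guess - n) > tolerance * n:
        # Average the guess with n / guess; the result lands closer to the root
        guess = (guess + n / guess) / 2.0
    return guess


print(newton_sqrt(2.0))

# Two inputs that differ only in the case of the last letter
print(hashlib.md5(b'pA$$word').hexdigest())
print(hashlib.md5(b'pA$$worD').hexdigest())
```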