Monday, March 23, 2015

Easy View-Based Routing in Django

If you are like many (myself included), you are not a particularly big fan of the idea that the route definition for your app is located inside a single file called urls.py. Sure, it allows you to include the urls from any location, allowing for a more modular approach to how you store these apps, but often it is annoying to break up your routes across multiple different files if you are building a simple website that may need lots of different inputs and don't want to use query params in the url.

One solution is to produce complex regexes that return a different set of variables for a url pattern (e.g., you want to be able to query a url by both the entity's id and its English name: /foo/{foo_id}/$ or /foo/{foo_name}/$). Though this use case may not be universal, I find it annoying to have to create a complex, error-prone regex or define a simplistic one for each needed url.

Another downside of Django's urls.py is that it requires you to spell out a new url for every single view in your code. Several plugins are available that attempt to solve this problem, but I have not liked any of them. My reasons are that they 1) require an external dependency for something that should be really simple, and 2) usually take control of the internal routing design that Django ships with. Neither of these is really that bad, since Django is very customizable and in a sense is made for this sort of behavior; however, I do not like it. Therefore, I have created a way to get the url functionality I wanted without having to download a third-party app (and it is all contained within about 200 lines of code, including comments and documentation).

This tutorial supposes that you are familiar with Django routing. If you aren't, it might help to read the Django tutorial on routing first.

You can see the source code for this also at my github page.

Requirements For Custom Router


Before I walk through how to create your own custom router, I want to list a few requirements I had for this module:
  1. Foremost it had to be self-contained, meaning that it couldn't rely upon another python package to work. 
  2. Any routing had to be done through the default Django route system as it has already proven to be sufficiently performant.
  3. Error checking of urls was a must (why Django doesn't come with some sort of error checking beyond the regex compiling module is beyond me). Error checking in this case means that the base path of the url (a path without any regex patterns included) must be different for each new url. Thus you never have the infuriating problem of wondering why you aren't seeing what you expect on very similar but slightly different urls.
  4. Had to be able to keep the same functionality of original url() function that ships with Django.
  5. Must be able to create custom routes as well as automatic routes for the application. That means that it will produce <app>/<view>/* urls, where the star will be treated as variables split by the /, as well as any other custom url you wish.
  6. View routing must be attached to the view. I hate having to go back to the urls.py to figure out what the url is I am looking at to see the grouping I had used for the variable I need.
  7. Must be easy to use and require very little configuration.
Most of the requirements above were met with my first attempt within an hour's worth of work. It is really quite simple, and I was able to make 149 different unique urls with the following urls.py definition:
urlpatterns = patterns('',
    url(r'^admin/', include(admin.site.urls)),
    url(r'^accounts/', include(accounts)),
    url(r'', include(routes.urls)),
)
Looks much cleaner than the normal 150 lines of excessive regex that would be required. The next section will detail how to write the router used, with the full code pasted at the bottom.

Using Django for Inspiration


The basic requirements of making this router self-contained and not circumventing the normal Django routing mean that all the route creation must occur before Django ever serves up a page. Several things happen when Django is first loaded, and the event we are interested in most is how settings.py is set up to be global. If you are not familiar with the Singleton pattern, just know it is a way to make one - and only one - instance of an object. Python makes this very easy because modules are, in Python, objects themselves. This allows for very simple singletons, and it is a pattern that Django uses to make the settings global and singular (see the Django github for the settings module for more details).

Because Django loads things once and only once at the initialization of the system, it does not have to worry about linking to sources and hoping that they have not been moved, while still having the advantage of storing things in an encapsulated object and not inside the global namespace. This provides the benefits of OO programming with the accessibility of functional programming. Elegant and simple. We will do the same for our router.
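To make the module-as-singleton idea concrete, here is a minimal sketch using only the standard library. The Settings class and the settings name are hypothetical stand-ins of mine, not Django's actual implementation; the point is only that a module is an object and that repeated imports hand back the same one.

```python
import importlib
import sys
import types

first = importlib.import_module("json")
second = importlib.import_module("json")

# A module is an ordinary object, and repeated imports return the same one.
is_module_object = isinstance(first, types.ModuleType)
same_object = first is second and first is sys.modules["json"]
print(is_module_object, same_object)


class Settings:
    """Hypothetical configuration holder (not Django's settings class)."""

    def __init__(self):
        self.debug = False


# Created once, when the defining module is first imported. Every later
# "from that_module import settings" receives this same instance.
settings = Settings()
```

The routes = Routes() line used later in this tutorial relies on the same behavior: the instance lives at module level, so everything that imports the module shares it.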

So the question arises: How do we get Django to scan the views for the routes that we will be adding? At this point you may give up and think that this isn't really worth it (I certainly did), but the reality is that this problem is not a hard problem to solve. To get Django to scan, all you need to do is import the modules that hold the views into the urls.py module. That is it! Python will scan those modules for you. Using this fact will make almost all of our requirements possible. So, first thing we need to do is import the modules that hold our views into urls.py.
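The claim that importing a module is enough to "scan" it can be sketched without Django. In the example below, demo_registry and demo_views are hypothetical names I made up to stand in for routes.py and an app's views module; the views module is written to a temporary directory so the sketch stays self-contained.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path

# Hypothetical stand-in for routes.py: a module object holding one shared list.
registry = types.ModuleType("demo_registry")
registry.routes = []
sys.modules["demo_registry"] = registry

# Hypothetical stand-in for a views module; its top-level code registers a route.
VIEWS_SOURCE = (
    "import demo_registry\n"
    "demo_registry.routes.append('foo/{}/')\n"
)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_views.py").write_text(VIEWS_SOURCE)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        importlib.import_module("demo_views")  # runs the module body once
        importlib.import_module("demo_views")  # cached in sys.modules, body is not re-run
    finally:
        sys.path.remove(tmp)

print(registry.routes)
```

The second import does not add a duplicate entry, which is why registering routes at import time is safe to rely on.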

Importing the modules we need solves the most basic problem of getting the routes into a position to be added to urls.py. However, this does not establish an easy way of creating app/module/* routes since simply importing the files would require that we do lots of writing of code to add them and removes the automatic nature of adding the routes to urls.py. We could simply write the app/module/* in the view that will be calling it, but that just seems against the auto-magicness of python. We can do better, and we will.

Setting Up a View Based Routing Paradigm 


Since one of our goals is to make routing definitions found on the view that it is routed to, we will take advantage of the fact that class-based views are the new way of making views in Django. Don't worry, the old functional views paradigm will also be supported, but it will be clunkier or less flexible. To make this work, we will simply define a new variable on the view:

routes = ...

What should routes be set to? I went through a number of ideas, including an overly complicated tuple of tuples, which in the end was abandoned for a pythonic (and much easier) way of doing things: dictionaries! Because we decide to use dictionaries right off the bat, we are free to change the structure as we go along and are no longer weighed down by needing to keep track of the order in which our route definition is defined. If you ever think of trying to define configurations using a static list, just step back, slap yourself awake, and realize that dictionaries are the best thing you could possibly use in this situation.

That being said, I decided upon a routing definition structure as follows:

routes = {"pattern" : '',
                "map" : [(.)],
                "kwargs" : {}
               }

To explain each in turn, remember that we are making it so that each route can define variables in the url to be of one type or another (ie. id based or name based for lookups). Therefore, we need to allow for a way to define the pattern once and then add new regex groups to that pattern. This allows us to also error-check the pattern to make sure that we are not registering the same url pattern more than once unwittingly.

Pattern

The pattern is simply a string that uses the syntax of the string format() function: each pair of curly braces (i.e., {}) inside that string is replaced by the corresponding positional argument passed in to format(). For example, if we want to make routes of foo/{foo_id}/ and foo/{foo_name}/, we make a pattern:

foo/{}/

Simple!

Map

Because we are defining string templates with pattern, we are going to create a list of lists (or tuples) that can be used to substitute those curly braces with the desired regex pattern. Each inner list must have the same number of regex patterns as the number of {} in the string; if it has fewer, format() will throw an error. The map variable for our pattern above should therefore look like the following:

[('foo_id',), ('foo_name',)]

This produces two distinct url patterns:

foo/foo_id/

and 

foo/foo_name/

Note that these are not regex patterns, but simple strings. This is intended to keep it simple, but you can add a regex pattern in there to capture a variable just fine.
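The substitution described above can be sketched in a few lines. expand_pattern is a hypothetical helper name of mine (the router in this post does the same thing inline), and the regex groups in the second call are only illustrative.

```python
def expand_pattern(pattern, mappings):
    """Fill the {} placeholders of pattern once per entry in mappings."""
    return [pattern.format(*entry) for entry in mappings]


# Plain strings, as in the text above.
plain = expand_pattern("foo/{}/", [("foo_id",), ("foo_name",)])
print(plain)  # ['foo/foo_id/', 'foo/foo_name/']

# The same pattern with named regex groups substituted in.
captured = expand_pattern(
    "foo/{}/",
    [(r"(?P<foo_id>\d+)",), (r"(?P<foo_name>[a-z]+)",)],
)

# Too few values for the placeholders raises IndexError.
try:
    expand_pattern("foo/{}/{}/", [("only_one",)])
    too_few_error = None
except IndexError as exc:
    too_few_error = exc

# A forgotten trailing comma turns the tuple into a bare string, which is then
# unpacked character by character: only the first character is used.
silent_bug = "foo/{}/".format(*("foo_name"))
print(silent_bug)  # foo/f/
```

The last case shows why each entry in the map needs to be a real tuple or list: a bare string does not fail, it just quietly produces the wrong url.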

Kwargs

The typical name for passing around key-value arguments is to name the dictionary that Python creates as kwargs. To keep with this tradition, we will name our configuration dictionary the same. Any arguments that you need to pass on to the underlying view should be given in the kwargs. This directly correlates to the kwargs argument of the url dispatcher in Django.

Miscellaneous Notes

Because we are using a dictionary to setup our configuration, we now can make use of the 'key in dict' pattern where we can search if a configuration has been defined. By doing so we can omit and add configurations for a route as needed. 

One such example is the name variable defined in a url dispatch object. The name given to a url dispatch object is used in reverse url lookups within templates and can come in handy in many ways. Because name is such a common variable to use in so many applications, it has a high chance of clashing with predefined variables in the kwargs section of our route definition. To circumvent naming collisions (and to make the intent of adding a name for reverse lookup to a url more clear), I later added the 'django_url_name' route configuration option to my router. I was able to add this in to my code without it affecting any other aspect of the routes setup. The same can be said about any new routing configuration that you may want to add in the future.
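As a rough illustration of optional configuration keys, the sketch below reads one route dictionary and fills in defaults for anything that was left out. read_route is a hypothetical helper, and I am assuming (as the source code at the bottom does) that django_url_name travels inside the kwargs dictionary.

```python
def read_route(route):
    """Return (pattern, mappings, kwargs, url_name) from one route dictionary."""
    if "pattern" not in route:
        raise KeyError("A route definition needs a 'pattern' key.")
    mappings = route.get("map", [])           # no map: the pattern is used as is
    kwargs = dict(route.get("kwargs", {}))    # copy so the caller's dict is untouched
    url_name = kwargs.get("django_url_name")  # None when no reverse-lookup name is set
    return route["pattern"], mappings, kwargs, url_name


minimal = read_route({"pattern": "foo/"})
named = read_route({
    "pattern": "foo/{}/",
    "map": [("foo_id",)],
    "kwargs": {"django_url_name": "foo-detail"},
})
print(minimal)
print(named)
```

Adding another option later only means adding one more lookup, which is the flexibility the dictionary buys us.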

Creating the Routes Table

Now to the good stuff. The routes table will make use of the singleton pattern described above. This was heavily inspired by Django, including the workaround of LazyRoutes (which will be discussed later). At the bottom of our routes.py, add the line:

routes = Routes()

Routes() here is referencing a class that we have not yet defined. Go ahead and define it anywhere above that line:

class Routes(object):

Now that we have defined the singleton (routes =  Routes()), and the class, we are ready to start adding the structure of our router.

I will not go into full detail about all the different ways you can set up your Routes() object, but I will cover three aspects of it: the initialization (which will be where we automatically add the app/module/* routes as well as any class-based view routes), error checking of routes, and the add() function.

The first thing to do is make sure that the routes are unique. To do this, we will add a set object to the Routes class definition (not to the Routes instances). This is significant since we will be using this to keep track of all routes added, either by Routes or LazyRoutes (again, discussed later). To do so, you will write a class that looks like this:

class Routes(object):
    tracked = set()
    routes = []

You will also want to add the routes list at this point, since we want a single place for both Routes and LazyRoutes to store their urls. Now, when you add a pattern to the routes table, it can check the tracked set to see if it has already been defined by calling

pattern in tracked

If pattern is in tracked already, then this gives us a chance to throw a meaningful error, one that can be used to denote the duplicate pattern. I have wrapped all this in a function contained within the Routes class as follows:

def _check_if_format_exists(self, route):
        '''
        Checks if the unformatted route already exists.
     
        @route the unformatted route being added.
        '''
        if route in self.tracked:
            raise ValueError("Cannot have duplicates of unformatted routes: {} already exists.".format(route))
        else:
            self.tracked.add(route)
Now that the routes base pattern is unique, we can with confidence add route patterns to each view and know that we won't inadvertently step on a route we already defined.
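The reason the set lives on the class rather than on each instance can be seen in a small standalone sketch. RouteTracker is a hypothetical name of mine; the check mirrors _check_if_format_exists above.

```python
class RouteTracker:
    tracked = set()  # class attribute: one set shared by every instance

    def check(self, route):
        """Record an unformatted route, refusing duplicates."""
        if route in self.tracked:
            raise ValueError(
                "Cannot have duplicates of unformatted routes: "
                "{} already exists.".format(route)
            )
        self.tracked.add(route)  # mutates the shared set, does not rebind it


first = RouteTracker()
second = RouteTracker()
first.check("foo/{}/")

try:
    second.check("foo/{}/")  # same pattern, different instance
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

print(duplicate_error)
```

Because both instances see the same set, a duplicate is caught no matter which object registered the pattern first.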

With the above function we now have an adequate check to use in our add function. We create our add function in such a way that we can pass the pattern, the map, the function to call, and the kwargs to add to the url. Note that I said the calling function. Here we define a way to add function based views and their routes. My add function looks like the following:

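# Nested inside Routes.add(), so self, func, and kwargs come from the enclosing scope.
def add_url(pattern, pmap, ending, opts):
    # Expand the {} placeholders, anchor the start, and optionally close with '/$'.
    url_route = '^{}{}'.format(pattern.format(*pmap), '/$' if ending else '')
    if "django_url_name" in opts:
        url_obj = url(url_route, func, kwargs, name=opts['django_url_name'])
    else:
        url_obj = url(url_route, func, kwargs)
    self.routes.append(url_obj)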
Since Django's url dispatcher literally stores the function signature to use when routing a url to a view, we can safely add any function that fits the url dispatch function parameters. If you look at the source code for Django View object, you will see that the as_view() function that is required to be passed in to a url dispatcher object literally returns another function called view. This function fits the old function-based view pattern of:

def view(request, *args, **kwargs)

Since the class-based views are just passing this function as the view, it is therefore clear to see that any function with this pattern can be passed safely. Note that in order for it to work it must return a django.http.HttpResponse object, but you could register a function with this pattern and almost get away without it. So, because we know that all we really need is a function, we now have the ability to add any view function to our routes. Isn't that great!?! An example of what I mean is as follows:

Here is your view function:

def showMeAll(request, *args, **kwargs):
    ....

All you need to do to add it is to first import routes, and then add it as follows:

routes.add_url('foo/{}', [('foo_id',),('foo_name',)], True, {})

The ending variable of the add_url() definition is to add the '/$' at the end of the url, thus eliminating mistakes that arise from not including the pattern end clause and preventing the need to make sure all urls end with a /. The opts is simply the kwargs as described in the class-based view route table.
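To see what the ending flag does to the final regex, here is a small sketch that builds the string the same way and tests it with the re module. build_url_regex is a hypothetical standalone version of the string-building step only; it does not create Django url objects.

```python
import re


def build_url_regex(pattern, pmap, ending=True):
    """Expand the placeholders, anchor the start, and optionally add '/$'."""
    return "^{}{}".format(pattern.format(*pmap), "/$" if ending else "")


by_id = build_url_regex("foo/{}", (r"(?P<foo_id>\d+)",))
print(by_id)  # ^foo/(?P<foo_id>\d+)/$

# With the ending clause, only the exact path matches.
exact = re.match(by_id, "foo/42/")
too_long = re.match(by_id, "foo/42/extra/")

# Without it, anything that merely starts with the pattern also matches.
open_ended = build_url_regex("foo/{}", (r"(?P<foo_id>\d+)",), ending=False)
prefix_match = re.match(open_ended, "foo/42/extra/")
```

Leaving the ending off is what the automatic app/module/* routes want, since they need to accept whatever follows the base path.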

The LazyRoutes Object

The LazyRoutes object pertains to the automatic loading of view-based classes into the routes table along with the automatic addition of app/module/* routes. The detail behind how to make an automatic loader is also heavily influenced by how Django registers apps and Models. The way Django loads apps is confusing, and it took a little bit of trial and error to finally figure it out, but if you want to see more about it, here is the link. 

From what I can make of it, since Django imports everything it needs at once before it ever serves up a page, there are times when recursive import statements can become a problem. What I mean by that is say that, as in our situation, we need to load the views modules when we have custom routes to add to the urls.py, but we also want to automatically add the app/module/* routes along with these custom routes. To automatically create these routes, we need to create them through introspection upon the creation of the Routes object. Simple enough, so what is the problem?

The problem arises by the fact that, when scanning the modules for views, we may come across a line of code like the following:

routes.add_url(...)

This is for a functional route, one that cannot be added through introspection upon creation. What are we to do? We do what Django does and create a LazyRoutes object. Technically this is not a real Lazy Object because it creates the url objects when they are found, but the idea is that they are not added to the Routes object that is still being created. The LazyRoutes object is used to store url patterns in the Routes.routes list (remember that the class definition Routes is its own object and is not an instance of Routes). This LazyRoutes object seems to act as though it adds the routes definitions defined by the routes.add_url() function after the Routes object has been created and has finished making the app/module/* routes. In reality, it was adding them to the master list all along. But this is a detail that is needed to be known in very few, if any, circumstances.

The added bonus of having a LazyRoutes object is that we can also completely circumvent the automatic Routes behavior if ever we wished and just stuck with LazyRoutes and only making defined routes accessible. Whatever your style of coding is will determine whether you will use it in this way. 
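The sharing between the two objects might look something like the sketch below. This is a stripped-down guess at the idea rather than the actual LazyRoutes class: the add() methods here take only a route string, and the automatic route is a hard-coded placeholder.

```python
class Routes:
    routes = []  # class attribute: the single master list

    def __init__(self):
        # Placeholder for the introspection step that builds automatic routes.
        self.add("app/module/view")

    def add(self, route):
        Routes.routes.append(route)


class LazyRoutes:
    """Hypothetical helper usable before any Routes instance exists."""

    def add(self, route):
        Routes.routes.append(route)  # writes straight to the master list


lazy = LazyRoutes()
lazy.add("foo/{}/")  # e.g. called at import time from a views module

routes = Routes()    # created later; the lazy route is already in the list
print(Routes.routes)  # ['foo/{}/', 'app/module/view']
```

Nothing is deferred here, which matches the remark above that this is not a truly lazy object: it only avoids needing the Routes instance.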

How to Introspectively Create App/Module/* Routes

Finally, we will discuss the trickiest part of this whole routing setup. So far it has been very easy, no? Hopefully you will have figured out that to add a class-based view would mean iterating over the routes dictionary you defined in the view class, but if not, here is a hint that that is what you should do. This aspect makes it so that you don't really have to even do that, for as you will see, you can add a function called add_view to your Routes definition that will take advantage of the introspective magic we are about to cover to add these views automatically (meaning you don't have to register them with routes.add_url()).

Python comes with a bevy of really cool tools for introspection (something made very easy since it is an interpreted language). Since Django already requires us to list the importable app names of our application, we will just use this list: settings.INSTALLED_APPS. Using this list we will attempt to load the modules as defined in the settings.INSTALLED_APPS list with the importlib module. This module comes with a handy feature called import_module(), which takes a string (aka the string listed in the settings.INSTALLED_APPS) and attempts to import it. Once imported it returns the module object that it found.
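Here is a minimal sketch of that loading step. The INSTALLED_APPS list below is a stand-in made of standard library packages so the example runs without Django, and SKIP_PREFIX plays the role that 'django' plays in the real router.

```python
import importlib

# Stand-in for settings.INSTALLED_APPS, using stdlib packages.
INSTALLED_APPS = ["json", "email.mime", "xml.dom"]
SKIP_PREFIX = "xml"  # plays the role of 'django' in the real router

loaded = {}
for name in INSTALLED_APPS:
    if name.split(".")[0] == SKIP_PREFIX:
        continue  # skip apps we do not want to scan
    # import_module takes the dotted name as a string and returns the module object.
    loaded[name] = importlib.import_module(name)

print(sorted(loaded))  # ['email.mime', 'json']
```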

Now is a good time for me to state that I truly love the idea that everything (and they mean everything) is an object in Python. The module object is literally an object that describes a module that has been loaded, or in this case, the app's module. From that we can get the path to the module, and using some other python tools, we can grab the name of all the other modules inside this app. This means that we can load every module in an app and never need know what the app structure looks like beforehand! Isn't that cool? Because of this, we can make use of another nifty tool: inspect.
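Discovering the modules inside a loaded package can be sketched as follows. I use the standard library json package as a stand-in for an app, and the same glob approach the source code at the bottom uses; it assumes the package lives in an ordinary directory of .py files.

```python
import glob
import importlib
import os

package = importlib.import_module("json")  # stdlib stand-in for an installed app

# A package exposes the directory it was loaded from through __path__.
package_dir = package.__path__[0]

module_names = sorted(
    os.path.splitext(os.path.basename(path))[0]
    for path in glob.iglob(os.path.join(package_dir, "*.py"))
    if not os.path.basename(path).startswith("__")  # skip __init__.py and friends
)

# Relative import of each discovered module, anchored at the package name.
submodules = {
    name: importlib.import_module("." + name, package.__name__)
    for name in module_names
}
print(module_names)
```

The standard library's pkgutil.iter_modules() is another way to list the modules of a package if you would rather not build file paths by hand.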

inspect is a module in Python that lets you examine live objects such as modules, classes, and functions. In this case we are going to inspect all the modules of an application that we have loaded from the settings.INSTALLED_APPS list above. It is probably a good idea to filter out the django.* apps since they won't have defined routes as we have here, but that is up to you.

The function from the inspect module that we are interested in is getmembers(). This is a really neat function because you can pass in a module and a predicate (a function that returns True for the members you want) to find what you are interested in. We are interested in finding all the view classes of our modules. Doing it this way allows us to define a view wherever we like inside our application - we needn't be limited to a single views.py module. To find just the classes, do the following:

inspect.getmembers(module, inspect.isclass)

This returns a list of (name, class) tuples, one for each class found in the module. Again, the everything-is-an-object paradigm means that the class definition is an object too, but it is simply of type type (the reason for this is far beyond the scope of this tutorial). To check whether a class is a View, I chose to make an instance and test that (the built-in issubclass() can perform the same check on the class object directly, without creating an instance). Since the second position of each tuple is the class definition object, and these objects are used to create an instance of the class, all we need to do is get the second position member and call it. An example below:

klasses = inspect.getmembers(module, inspect.isclass)
inst = klasses[0][1]()
inst = klasses[0][1]()

The parentheses at the end create the object inst, which is an instance of the class held in the second position of the first item in the klasses list (that is a mouthful, read it a few times to make sure you understand). We can now check that inst is an instance of View by doing the following:

isinstance(inst, View)  # View must be imported first: from django.views.generic.base import View

Checking to make sure that a class is a view is necessary because now we can take the other information we have about the module, the app, and the view name and create the app/module/view route. Also, since we already have the view with its route table, we have all the information to add the custom routing that is defined on the view. Pretty sweet!
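The class discovery step can be tried out without Django. In the sketch below, View is a plain stand-in class I define myself (not django.views.generic.base.View), and the demo module is assembled by hand with types.ModuleType so everything fits in one block.

```python
import inspect
import types


class View:
    """Stand-in for Django's View base class."""


class FooView(View):
    routes = ({"pattern": "foo/{}", "map": [("foo_id",)]},)


class Helper:
    """An unrelated class that happens to live in the same module."""


# Hand-built module standing in for an app's views module.
demo = types.ModuleType("demo_views")
demo.View = View        # what "from ... import View" would leave in the module
demo.FooView = FooView
demo.Helper = Helper

# getmembers returns (name, class) tuples sorted by name.
members = inspect.getmembers(demo, inspect.isclass)

# Instance-based check, as described in the text.
by_instance = [
    name for name, klass in members
    if klass is not View and isinstance(klass(), View)
]

# Class-based check, no instance required.
by_subclass = [
    name for name, klass in members
    if klass is not View and issubclass(klass, View)
]
print(by_instance, by_subclass)
```

Note that the imported base class shows up among the members too, so this sketch filters it out explicitly; whether you want that filter in your own router is a design choice.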

Note that it is a pretty trivial matter now to do something very similar to look for a views.py module where you can define all your functional based views and add them to the routes table as well. This makes it so that you won't have to define any routes.add() outside of your routes folder. Doing this makes it much more automagical, but it is really up to you.
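If you do scan a views.py for functions, keep in mind that inspect.isfunction also reports functions that were merely imported into the module, such as decorators. The sketch below shows one possible filter, comparing each function's __module__ against the module being scanned. The module, view, and decorator names are hypothetical, and this filter is my suggestion rather than something the router in this post does.

```python
import inspect
import types


def login_required(func):
    """Hypothetical decorator, standing in for one imported into views.py."""
    return func


# Hand-built module standing in for an app's views.py.
views = types.ModuleType("demo_app.views")
exec(
    "def show_all(request, *args, **kwargs):\n"
    "    return 'all'\n",
    views.__dict__,
)
views.login_required = login_required  # as if imported at the top of views.py

all_functions = [name for name, _ in inspect.getmembers(views, inspect.isfunction)]

# Keep only functions actually defined in the module being scanned.
own_functions = [
    name
    for name, func in inspect.getmembers(views, inspect.isfunction)
    if func.__module__ == views.__name__
]
print(all_functions, own_functions)
```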

Conclusion


Usually my tutorials are a lot more straightforward, but I felt like for this example it would be too long and too much to go over every aspect of the code. Also, the intent of this tutorial was to try and give some idea of how to do something versus giving just one idea of how to solve the problem. It was a lot more difficult, and so it is probably pretty unclear at times what I was attempting to do. I said I would give the source code to you to view, and so I have attached it at the bottom here. But I would encourage you to view my solutions on Github. Github has automatic syntax highlighting which makes things easier to read.

I hope that I was able to explain in some detail some of the cooler aspects of this routes creation. It took me awhile to figure it out, but now that it is done I am very proud of what it can do. I am sure that there are several people who have done this, but it seems to me that I always get more joy from figuring it out on my own. Hopefully someone can use this to their advantage.

Source


'''
Created on Mar 12, 2015

@author: derigible
'''

from django.conf.urls import url, patterns
from django.conf import settings
import importlib as il
import glob, os, sys, inspect
from django.views.generic.base import View

def check_if_list(lst):
    '''
    Since strings are also iterable, this is used to make sure that the iterable is a non-string. Useful to ensure
    that only lists, tuples, etc. are used and that we don't have problems with strings creeping in.
    '''
    if isinstance(lst, str):
        raise TypeError("Must be a non-string iterable: {}".format(lst))
    if not (hasattr(lst, "__getitem__") or hasattr(lst, "__iter__")):
        raise TypeError("Must be an iterable: {}".format(lst))

class Routes(object):
    '''
    A way of keeping track of routes at the view level instead of trying to define them all inside the urls.py. The hope
    is to make it very straightforward and easy without having to resort to a lot of custom routing code. This will be
    accomplished by writing routes to a list and ensuring each pattern is unique. It will then add any pattern mappings
    to the route for creation of named variables. An optional ROUTE_AUTO_CREATE setting can be added in project settings
    that will create a route for every app/controller/view and add it to the urls.py.
    '''
    
    routes = [] #Class attribute so that lazy_routes will add to the routes table without having to add from the LazyRoutes list.
    acceptable_routes = ('app_module_view', 'module_view')
    tracked = set() #single definitive source of all routes
    
    def __init__(self):
        '''
        Initialize the routes object by creating a set that keeps track of all unformatted strings to ensure uniqueness.
        '''
        #Check if the urls.py has been loaded, and if not, then load it (for times when you want to create the urls without loading Django completely)
        proj_name_urls = __name__.split('.')[0] + '.urls'
        if proj_name_urls not in sys.modules:
            il.import_module(proj_name_urls)
        if hasattr(settings, "ROUTE_AUTO_CREATE"):
            if settings.ROUTE_AUTO_CREATE == "app_module_view":
                self._register_installed_apps_views(settings.INSTALLED_APPS, with_app = True)
            elif settings.ROUTE_AUTO_CREATE == "module_view":
                self._register_installed_apps_views(settings.INSTALLED_APPS)
            else:
                raise ValueError("The ROUTE_AUTO_CREATE option was set in settings but option {} is not a valid option. Valid options are: {}".format(settings.ROUTE_AUTO_CREATE, self.acceptable_routes))
    
    def _register_installed_apps_views(self, apps, with_app = False):
        '''
        Set the routes for all of the installed apps (except the django.* installed apps). Will search through each module
        in the installed app and will look for a view class. If a views.py module is found, any functions found in the 
        module will also be given a routing table by default. Each route will, by default, be of the value <module_name>.<view_name>. 
        If you are worried about view names overlapping between apps, then use the with_app flag set to true and routes 
        will be of the variety of <app_name>.<module_name>.<view_name>. The path after the base route will provide positional 
        arguments to the url class for anything between the forward slashes (ie. /). For example, say you have view inside 
        a module called foo, your route table would include a route as follows:
        
            ^foo/view_name/(?:([^/]*)/)*
        
        Note that view functions that are not class-based must be included in the top-level directory of an app in a file
        called views.py if they are to be included. This does not make use of the Django app loader, so it is safe to put
        views in files outside of views.py, as long as those views are class-based.
        
        Note that class-based views must also not require any parameters in the initialization of the view.
        
        To prevent select views from being registered in this manner, set the register_route variable on the view to False.
        
        All functions within a views.py module are also added with this view. That means that any decorators will also have
        their own views. If this is not desired behavior, then set the settings.REGISTER_VIEWS_PY_FUNCS to False.
            
        @param apps: the INSTALLED_APPS setting in the settings for your Django app.
        @param with_app: set to true if you want the app name to be included in the route
        '''
        def add_func(app, mod, func):
            r = "{}/{}/(?:([^/])*/+)*".format(mod,func[0])
            if with_app:
                r = "{}/{}".format(app, r)
            self.add(r, func[1], add_ending=False)
            
        for app in apps:
            if 'django' != app.split('.')[0]: #only do it for non-django apps
                loaded_app = il.import_module(app)
                for p in glob.iglob(os.path.join(loaded_app.__path__[0], '*.py')):
                    mod = p.split(os.sep)[-1][:-3]#get just the module name without the .py
                    try:
                        loaded_mod = il.import_module('.' + mod, loaded_app.__package__)
                        for klass in inspect.getmembers(loaded_mod, inspect.isclass):
                            try:
                                inst = klass[1]()
                                if isinstance(inst, View):
                                    if getattr(inst, 'register_route', True): #register unless the view opts out
                                        add_func(app, mod, klass)
                                    if hasattr(inst, 'routes'):
                                        self.add_view(klass[1])
                            except TypeError: #not a View class if init is required.
                                pass
                        if mod == "views" and (hasattr(settings, 'REGISTER_VIEWS_PY_FUNCS') and settings.REGISTER_VIEWS_PY_FUNCS):
                            for func in inspect.getmembers(loaded_mod, inspect.isfunction):
                                add_func(app, mod, func)
                    except ImportError:
                        raise TypeError("Routes type found in view module when settings.ROUTE_AUTO_CREATE has been set. Switch Routes to LazyRoutes.")
        
    def add(self, route, func, var_mappings= None, add_ending=True, **kwargs):
        '''
        Add the name of the route, the value of the route as a unformatted string where the route looks like the following:
        
        /app/{var1}/controller/{var2}
        
        where var1 and var2 are arbitrary place-holders for the var_mappings. The var_mappings is a list of an iterable of values
        that match the order of the format string passed in. If no var_mappings is passed in it is assumed that the route has no mappings
        and will be left as is.
        
        Unformatted strings must be unique. Any unformatted string that is added twice will raise an error.
        
        To pass in a reverse url name lookup, you can use the key word 'django_url_name' in the kwargs dictionary.
        
        @route the unformatted string for the route
        @func the view function to be called
        @var_mappings the list of iterables (e.g. tuples) used to fill in the var mappings
        @add_ending adds the appropriate /$ on the ending if True. Defaults to True
        @kwargs the kwargs to be passed into the urls function
        '''
        self._check_if_format_exists(route)
        
        def add_url(pattern, pmap, ending, opts):
            url_route = '^{}{}'.format(pattern.format(*pmap), '/$' if ending else '')
            if "django_url_name" in opts:
                url_obj = url(url_route, func, kwargs, name=kwargs['django_url_name'])
            else:
                url_obj = url(url_route, func, kwargs)
            self.routes.append(url_obj)
            
        if var_mappings:
            for mapr in var_mappings:
                check_if_list(mapr)
                add_url(route, mapr, add_ending, kwargs)
        else:
            add_url(route, [], add_ending, kwargs)
    
    def add_list(self, routes, func, prefix=None, **kwargs):
        '''
        Convenience method to add a list of routes for a func. You may pass in a prefix to add to each
        pattern. For example, each url needs the word workload prefixed to the url to make: workload/<pattern>.
        
        Note that the prefix should have no trailing slash.
        
        A route table is a dictionary after the following fashion:
        
        {
         "pattern" : '<pattern>', 
         "map" :[('<regex_pattern>',), ...],
         "kwargs" : dict
        }
        
        @routes the list of routes
        @func the function to be called
        @prefix the prefix to attach to the route pattern
        '''
        check_if_list(routes)
        for route in routes:
            # Copy the shared kwargs so one route's kwargs do not leak into the routes after it.
            route_kwargs = dict(kwargs)
            if 'kwargs' in route:
                if not isinstance(route['kwargs'], dict):
                    raise TypeError("Must pass in a dictionary for kwargs.")
                route_kwargs.update(route['kwargs'])
            pattern = route["pattern"] if prefix is None else '{}/{}'.format(prefix, route["pattern"])
            self.add(pattern, func, var_mappings=route.get("map", []), **route_kwargs)
    
    @property
    def urls(self):
        '''
        Get the urls from the Routes object. This is a patterns object.
        '''
        return patterns(r'',*self.routes)
        
    def _check_if_format_exists(self, route):
        '''
        Checks if the unformatted route already exists.
        
        @route the unformatted route being added.
        '''
        if route in self.tracked:
            raise ValueError("Cannot have duplicates of unformatted routes: {} already exists.".format(route))
        else:
            self.tracked.add(route)
            
    def add_view(self, view, **kwargs):
        '''
        Add a class-based view to the routes table. A view that is added to the routes table must define the routes table; ie:
        
            (
                  {"pattern" : '<pattern>',
                   "map" :[('<regex_pattern>',), ...],
                   "kwargs" : dict
                   },
                 ...
            )
        
        Kwargs can be omitted if not necessary.
        
        Optionally, if the view should have a prefix, then define the variable prefix as a string; ie
        
            prefix = 'workload'
            
            or
            
            prefix = 'workload/create'
            
        Note that the prefix should have no trailing slash.
        '''
        if not hasattr(view, 'routes'):
            raise AttributeError("routes variable not defined on view {}".format(view.__name__))
        if hasattr(view, 'prefix'):
            prefix = view.prefix
        else:
            prefix = None
        
        self.add_list(view.routes, view.as_view(), prefix = prefix, **kwargs)

class LazyRoutes(Routes):
    '''
    A lazy implementation of routes. This means that LazyRoutes won't add routes to the Routes table until after the
    routes table has been created. This is necessary when the ROUTE_AUTO_CREATE setting is added to the Django settings.py.
    All defined routes using the routes.* method must now become lazy_routes.* methods.
    '''
    
    def __init__(self):
        '''
        Do nothing, just overriding the base __init__ to prevent the initialization there.
        '''
        pass
        
lazy_routes = LazyRoutes()
routes = Routes()
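
To make the expansion done by add more concrete, here is a minimal standalone sketch (no Django required) of how one unformatted route and its var_mappings turn into regex strings. The helper name expand_route is hypothetical and is not part of the module above; it only mirrors the string formatting done in add_url and the prefix handling done in add_list.

```python
def expand_route(pattern, var_mappings=None, add_ending=True, prefix=None):
    """Return the regex strings that one unformatted route would expand to.

    Hypothetical helper for illustration only; it is not part of the Routes class.
    """
    if prefix is not None:
        # Same joining rule as add_list: prefix, a slash, then the pattern.
        pattern = '{}/{}'.format(prefix, pattern)
    ending = '/$' if add_ending else ''
    # With no mappings the pattern is formatted with no arguments, i.e. left as is.
    mappings = var_mappings or [()]
    return ['^{}{}'.format(pattern.format(*mapping), ending) for mapping in mappings]


if __name__ == '__main__':
    # One unformatted route, two mappings: look up foo by id or by name.
    table = [(r'(?P<foo_id>\d+)',), (r'(?P<foo_name>[a-z]+)',)]
    for regex in expand_route('foo/{}', table, prefix='workload'):
        print(regex)
```

Each tuple in the mapping list produces one url regex, which is why a single unformatted route can answer to several url shapes without one large hand-written regex.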

Friday, March 6, 2015

How to Remove PostgreSQL from Server

At work we have a reporting server that was set up by an employee here who deemed himself the foremost expert at Linux. Needless to say, if you are touting your Linux skills, you had better be able to back them up. It turns out that he wasn't quite up to snuff, and after following a tutorial he put together for setting up a server, I found myself unsure of which PostgreSQL server I was actually using. Somehow I had both 9.1 and 9.2 running.

I was in the process of cleaning up the reporting server and decided that it would be good to have only one PostgreSQL server running, so I worked out how to remove the other. In the process, I also discovered that each server version stores its databases by default under the same parent directory as the server itself. In other words, if you follow this tutorial, be aware that you will lose the data belonging to the purged version if the default storage location has not been changed. Luckily it was data that really didn't need to be kept around for a long time, so it wasn't that big of a deal. But be aware that this method does destroy data.

First, run the command

sudo /etc/init.d/postgresql stop

If two versions of the server pop up, then it means you are running two instances of PostgreSQL. Decide which one you want to remove and then run the following command:

sudo apt-get purge postgresql-x.x

where the x.x stands for the major.minor version number.
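
To double check which versions are present (before or after the purge), a small sketch like the following can help. It assumes the Debian/Ubuntu packaging layout, where each installed version keeps its configuration in a directory named after the version under /etc/postgresql; the function name is hypothetical, and other distributions may lay things out differently.

```python
import os


def installed_postgres_versions(config_root='/etc/postgresql'):
    """List version directories such as '9.1' and '9.2' under config_root.

    Assumes the Debian/Ubuntu layout of /etc/postgresql/<version>/<cluster>.
    Returns an empty list when the directory does not exist.
    """
    if not os.path.isdir(config_root):
        return []
    return sorted(
        name for name in os.listdir(config_root)
        if os.path.isdir(os.path.join(config_root, name))
    )


if __name__ == '__main__':
    versions = installed_postgres_versions()
    print('PostgreSQL versions found:', ', '.join(versions) or 'none')
```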

If you purged one of several servers that were running, then you will need to start the remaining server again:

sudo /etc/init.d/postgresql start

The server is now up and running and ready to be used.

Thursday, March 5, 2015

How to Setup Django on an AWS EC2 Instance Using VirtualEnv

Setting up an ec2 instance is as easy as following the launch wizard provided by Amazon, and as such will not be covered in this tutorial (to set up ec2 for yourself, you can follow Amazon's guide found here). After following the instructions for setting up an Ubuntu server, you should be ready to follow the rest of this tutorial. Though the server used is Ubuntu, the steps will be fairly similar for any Linux distribution (just use that distro's package manager calls instead). You can also follow this tutorial for connecting to an EC2 instance from Windows if you are unfamiliar with the process.

Update Ubuntu


The first thing that you should always do when starting a new server is run an update on the server. This brings in the security updates and functional fixes that have been released since Amazon took the image of Ubuntu that you are using.

To do so, enter the command:

sudo apt-get update

This command refreshes the package lists so the system knows which updates are available; to actually download and install the updated packages, follow it with sudo apt-get upgrade. Normally this step doesn't take more than a minute or two, and once you are done you can continue setting up the server without having to restart (a nice advantage over the typical Windows update cycle).

Update the Distribution of Python to Python 3


NOTE: If you want to just stick with Python 2 instead of going through these steps, then skip ahead to the next section and treat python 3 references as though they were python 2 by simply removing the 3 in the call.

Before you do anything with your current installation's python, you need to take note (write on a piece of paper or store on a notepad document) of what version is the default version of python. You do so by running:

python --version

The above command will return something like Python 3.2.3. Make sure to make note of this as it will be important going forward.

My personal opinion is that Python 2, while great, should be deprecated and the world should move to Python 3. I won't go into an explanation here as to why I feel this way, but it is easy enough to do and writing things in Python 3 will ensure that your code will continue to work when (if ever) they decide to finally stop supporting Python 2.

To install python 3, you should run the following (this may not be necessary for server version 12.10 and up, see here):

sudo apt-get install python3

A quick Google search brought up some stackoverflow.com answers that seem to indicate that it is a bad idea to switch the system default python version to Python 3: Python 2 and 3 are not compatible, so the switch will likely break some scripts that rely on Python 2. I am not a Linux expert by any means, and since this is a practical tutorial on how to set up a server that will work, we will do as we are told (see more about it here).

To set up Python 3, we will create an alias by doing the following command:

echo 'alias python=python3' >> ~/.bashrc

The above command will edit the default bash shell setup (or, if you are not familiar with Linux at all, the command prompt you see through a terminal to your Linux box) with an alias to the word python that will point to python3. You will not see the change until you reconnect. To make sure it works, reconnect to the server and enter:

python --version

It should now be Python 3.x.x
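
Keep in mind that a shell alias only applies to commands typed at an interactive prompt. If you want to confirm which interpreter is actually running a given script, a small check like the sketch below can be dropped into a file and run; the function name is just for illustration.

```python
import sys


def interpreter_summary():
    """Describe the interpreter that is running this code."""
    major, minor, micro = sys.version_info[:3]
    return 'Python {}.{}.{} ({})'.format(major, minor, micro, sys.executable)


if __name__ == '__main__':
    # Prints the version and the full path of the interpreter in use.
    print(interpreter_summary())
```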


Symlink Your Python Executable


We will also want to set up a symlink (basically an aliased path to a file or directory) so that ~/bin/python points to Python 3. Do the following:

Make the directory that will hold the link:

mkdir -p ~/bin

Then run:

ln -s /usr/bin/python3 ~/bin/python

Essentially we just made a path called /home/ubuntu/bin/python that points to the Python interpreter in /usr/bin/python3. This link will be useful in the next step.

NOTE: If you didn't set up Python 3, then you will need to replace python3 with python2.
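
If the idea of a symlink is new to you, the following sketch shows the same operation from Python, in a scratch directory so nothing on the real system is touched. The helper name make_link and the file names are made up for the example; the point is that the link is a path that resolves to its target and can later be re-pointed without changing the path you use.

```python
import os
import tempfile


def make_link(target, link_path):
    """Create link_path as a symlink to target and return where it resolves to.

    An existing symlink at link_path is replaced, which is how the link can be
    re-pointed at a different interpreter later on.
    """
    parent = os.path.dirname(link_path)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)
    if os.path.islink(link_path):
        os.remove(link_path)
    os.symlink(target, link_path)
    return os.path.realpath(link_path)


if __name__ == '__main__':
    scratch = tempfile.mkdtemp()
    # Stand-in for /usr/bin/python3; an empty file is enough for the demo.
    target = os.path.join(scratch, 'python3')
    open(target, 'w').close()
    print(make_link(target, os.path.join(scratch, 'bin', 'python')))
```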

Install Python Package Manager


Pip is the preferred way by many to manage packages specific to python. There are other ways, but pip is so easy to use that it doesn't make a lot of sense not to use it. We will need to install pip (or some other package manager if we want to make this easy), before we continue. You can try another python package manager, but this tutorial is specific to pip.

To install pip, run:

sudo apt-get install python-pip

Pip should now be installed and ready to use.

Setup a VirtualEnv


Several places online encourage the use of virtualenv to run your Django instance. Since it is not a very hard thing to do, we will set it up as well. I will not go into detail as to why it is a good idea, but you can Google the reasons yourself if you want. We will follow the install instructions found at http://docs.python-guide.org/en/latest/dev/virtualenvs/.

To install virtualenv, call:

sudo pip install virtualenv

Since we set a symlink in the previous step to Python 3, we can use it to set up the virtualenv with the Python 3 interpreter:

virtualenv -p ~/bin/python venv

Note that it is possible to sidestep the symlinking and do something like virtualenv -p /usr/bin/python3, but symlinking provides the advantage of being able to change the python version without having to update this call. Essentially, if we wanted for some reason to move back to Python 2 (or if you never went to Python 3 using the steps above), we could set the symlink to /usr/bin/python2 and the virtualenv call wouldn't know the difference (unless we broke compatibility by going back to python 2). It is therefore a good idea to make your calls using symlinks, as it gives you more flexibility to change things in the future. In the case where you are doing this manually, it is probably fine to link directly to the python version you want, but it is good practice to write things in such a way that they are easy to port into scripts later.

Now we need to activate the virtualenv:

source venv/bin/activate

Now that it is set up, you should see (venv) on the left side of your command prompt indicating that you are in a virtual environment. The activation only lasts as long as the shell is alive, so you will need to run the above command each time you want to work in your venv after closing the shell (or after running deactivate). Go ahead and enter deactivate for the next step.

NOTE: Python 3 comes with a virtual environment package built in called, conveniently, venv. I didn't know this until after I started writing this tutorial. It is basically the same as virtualenv, and it is likely easier to use. I would read about it here: https://www.python.org/dev/peps/pep-0405/, or the documentation here.
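
For the curious, here is a minimal sketch of the built-in venv module used from Python itself. From the command line the equivalent is python3 -m venv venv. The function name and the scratch directory are only for illustration, and with_pip is left off so the sketch does not depend on pip being bundled with your Python install.

```python
import os
import tempfile
import venv


def create_environment(env_dir):
    """Create a virtual environment in env_dir and return the path to its config file."""
    # with_pip=False keeps this quick and avoids requiring the ensurepip module.
    venv.create(env_dir, with_pip=False)
    return os.path.join(env_dir, 'pyvenv.cfg')


if __name__ == '__main__':
    # A scratch location is used here; for the tutorial you would use ~/venv instead.
    env_dir = os.path.join(tempfile.mkdtemp(), 'venv')
    print('Created', create_environment(env_dir))
```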

Setup PostgreSQL


PostgreSQL is the recommended database for Django as it is the most supported of all the databases. You will need to install it on your system by doing the following:

sudo apt-get install postgresql

We now need to log in to the server and set up the postgres user (it is probably a good idea to set up a user other than postgres, since postgres is the superuser for your database, but for now we can just use postgres). Do so by entering psql (the postgres database management prompt) by typing the following:

sudo -u postgres psql

You should see a prompt that looks like:

postgres=#

This is the command prompt for postgres and will allow us to perform operations on the database. First we set up the user password:

ALTER USER postgres PASSWORD '<password here>';

Which should be followed by the words ALTER ROLE.

Now we will create our database:

CREATE DATABASE <db_name_here>;

You should then see CREATE DATABASE to confirm that it was created.

Now you will need to install a few other packages so Django will be able to talk to the server. Run the command:

sudo apt-get install postgresql-server-dev-x.x

where x.x is the version number of your PostgreSQL database. You can find out the version of PostgreSQL by running the following:

sudo /etc/init.d/postgresql stop

which will show you the version that was stopped. Restart it by running:

sudo /etc/init.d/postgresql start

Then run the command:

sudo apt-get install python3-dev

which will install some python files that are not included in the original python 3 install. Next you will reactivate your virtual environment:

source venv/bin/activate

and then run:

pip install psycopg2

which installs the actual interface between Django and the PostgreSQL server. The extra packages are needed because psycopg2 is compiled when it is installed, and the build relies on the PostgreSQL and Python header files that the postgresql-server-dev-x.x and python3-dev packages provide.
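
Once psycopg2 is installed, the database still has to be named in your project's settings.py. A minimal sketch of that entry follows. The environment variable names and the fallback database name are made up for the example, and the engine string shown is the one used by Django releases of this era (newer releases also accept a shorter name), so check the documentation for your version.

```python
import os

# Sketch of the DATABASES entry in settings.py for the database created above.
# DJANGO_DB_NAME and DJANGO_DB_PASSWORD are hypothetical environment variables,
# used so the password does not have to be written into the file.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': os.environ.get('DJANGO_DB_NAME', 'mydb'),
        'USER': 'postgres',
        'PASSWORD': os.environ.get('DJANGO_DB_PASSWORD', ''),
        'HOST': 'localhost',
        'PORT': '5432',
    }
}
```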

Install Django Inside VirtualEnv


Start up the virtual environment again:

source venv/bin/activate

Now run the command:

pip install django

Note that you no longer have to use sudo in front of pip to install packages. This is one of the best benefits of using virtualenv. 

Next we will make a symlink to the python 3.x site-packages directory, to be used later in the apache setup:

sudo ln -s ~/venv/lib/python3.x/site-packages /var/lib/python/site-packages

where the x in 3.x is the name of the directory for the Python 3 version you are using.
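
If you are unsure what the 3.x directory is called, you can ask Python directly. The sketch below, run with the virtual environment activated, reports the site-packages directory of whichever interpreter executes it; the function name is only for illustration.

```python
import sysconfig


def site_packages_dir():
    """Return the directory where pure-Python packages are installed for this interpreter."""
    return sysconfig.get_paths()['purelib']


if __name__ == '__main__':
    # Inside an activated virtualenv this points into the venv, not the system Python.
    print(site_packages_dir())
```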

Setup Your Django Project


If you have a Django project that you have already built, then you have a variety of ways you can get it onto your server. We will focus on the case where you already have a Django project built and leave the other case up to the reader (see Django's excellent documentation on how to get started with building a Django app, though I recommend building it on your local computer first). If you don't know how to put files onto a server, you can follow my tutorial here (the part on connecting with WinSCP is towards the bottom and is a little outdated, but should be sufficient). I will not go over how to get the project onto your machine, since my previous tutorial details how to do so. We will start off assuming that you have your project on your server.

To begin, make a symlink to your django files wherever they may be (for ease of finding later in this tutorial):

sudo ln -s /path/to/django/files/directory /var/lib/<projectName>

We will need this path when we set up apache on our server.

After Django is installed and your project is set up, we will need to set up our Apache server. We could use just about any other server, such as nginx or lighttpd, but will stick with Apache because of its popularity and because I know it already. An added bonus is that Django has documented how to set up Django with Apache, so why not make it simple?

Setup Apache Server - Install


First, if inside your virtual environment, deactivate your virtual environment:

deactivate

Now enter the command:

sudo apt-get install apache2

Pretty straightforward. You can navigate to your server's url and you should see the Apache default page, which looks something like this:

It works!

This is the default web page for this server.
The web server software is running but no content has been added, yet.

Once this is complete, you will need to install a plugin for apache called mod_wsgi. Mod_wsgi needs to be compiled with the same version of python that your scripts are running. This creates a major headache, and I couldn't find an easy way with package managers to simply point the install to the right python version. Therefore, if you are running a version of Ubuntu that doesn't have Python 3 as the default version, you will need to do this (hopefully you checked what the system default python is as instructed above; if not, then it is up to you to figure it out).

To setup your mod_wsgi to work with apache in python 3, do the following (taken from this stackoverflow.com answer):

Install more packages needed to modify apache2 mods:

sudo apt-get install apache2-dev

Change directories to a common place to store source code:

cd /usr/local/src

Download and install the mod_wsgi code from the code repository. The following steps are all needed to take the code from the repository and make it into something Linux can use:

sudo apt-get install make

sudo wget https://modwsgi.googlecode.com/files/mod_wsgi-3.4.tar.gz

sudo tar -zxvf mod_wsgi-3.4.tar.gz

cd mod_wsgi-3.4/

sudo ./configure --with-python=/usr/bin/python3.x

where x is the version of Python 3 on your machine.

sudo make

sudo make install

Now you have mod_wsgi in a compiled binary format that can be loaded into apache. To tell apache to load this module, we will have to edit the apache configuration, which we cover in the next section.

NOTE: If you decided to stick with Python 2, things are a lot easier. To install, simply run:

sudo apt-get install libapache2-mod-wsgi

You are now ready to setup Django to run with Apache.

UPDATE: If you are using the latest edition of Ubuntu, go to https://launchpad.net/ubuntu/trusty/+package/libapache2-mod-wsgi-py3 instead. This package is for python 3.

Setup Apache Server - Configure to Run with Django


One reason I really recommend Django over other web frameworks (at least for python users) is that the documentation is excellent. Django comes with a tutorial of deploying Django with Apache. I will attempt to distill the finer points here, but you can always see the Django tutorial at https://docs.djangoproject.com/en/1.7/howto/deployment/wsgi/modwsgi/.

First thing we need to do is navigate to the /etc/apache2 directory:

cd /etc/apache2

If you look in the directory (ls command), you will see several different files, each of them dealing with different aspects of apache's configuration. The apache2.conf is the main apache configuration file and any configuration changes you make there will be used. However, it is generally a good idea to leave custom changes to apache's configuration outside of the main config file. Therefore, apache comes with a httpd.conf file, which is a user-defined configuration file that is added to the main apache2.conf file. It is good practice to edit this file as it helps to segment the changes you made with what comes standard with apache. All we need to do is make sure that the apache2.conf file includes httpd.conf.

Open the apache2.conf file (if you don't know how, there are a number of ways to do so, each of which can be tricky to use if you don't know Linux). Since this is a very basic tutorial, we will use the text editor vim to open our file. It has some nice features to help read text files from a terminal, and it is widely regarded as one of the most useful text editors on Linux. Just note that if you are new to vim, only enter the commands you see here or else you will be totally confused as to what is going on.

Enter:

sudo vim apache2.conf

Your screen should now have a bunch of blue text. What you are reading are the instructions for how to use apache. Take some time to read it as it does give some useful information, but for our purposes we are just going to use the up and down arrow keys (you can also use the page up and down keys to scroll a whole page) to find what we want.

After scrolling down for a bit you should see:

Include httpd.conf

If this line is in there then you are ready to edit the httpd.conf file. Exit out of your current view by typing the keys

:q

and then pressing enter. This will return you back to your regular command prompt. If that line is not there, then enter the following sequence of commands:

  1. Press the i key.
  2. Write Include httpd.conf on its own line.
  3. Press the Esc key.
  4. Enter the character sequence :wq
  5. Press Enter.

You have successfully added the httpd.conf file to your apache2.conf file.

Now open the httpd.conf file as follows:

sudo vim httpd.conf

Using the same basic pattern described above for editing a file in vim, write the following config information in your httpd.conf file (do not save after this, more will be written):


WSGIDaemonProcess <projectName> python-path=/var/lib/<projectName>:/var/lib/python/site-packages
WSGIProcessGroup <projectName>
WSGIScriptAlias / /var/lib/<projectName>/<projectName>/wsgi.py
Alias /static/ /var/lib/<projectName>/static/

Note that <projectName> should be the name of your Django project.

The above configuration tells apache to run the Django project that we set up previously and to retrieve your static files, like your css and js files, directly from the Django static folder. Django strongly discourages using Django itself to send static files to a user, which is why we tell apache where to look for them. This presupposes that the static file urls in your html begin with /static/.
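
It may help to know what apache is actually looking for in the file named by WSGIScriptAlias: mod_wsgi imports that file and calls a function (or other callable) in it named application for every request. Your Django project's wsgi.py already provides this for you. The sketch below is a bare-bones stand-in, not Django's file, that shows the shape of that callable using only the standard library.

```python
def application(environ, start_response):
    """Smallest useful WSGI callable: echo the requested path back as plain text."""
    body = 'Hello from {}'.format(environ.get('PATH_INFO', '/')).encode('utf-8')
    headers = [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ]
    start_response('200 OK', headers)
    return [body]


if __name__ == '__main__':
    # For local experimenting only; on the server apache and mod_wsgi do this job.
    from wsgiref.simple_server import make_server
    make_server('127.0.0.1', 8000, application).serve_forever()
```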

If Python 3 is your python version, then do the following step; skip it if you stuck with Python 2. We are going to add the configuration now that tells apache to load the mod_wsgi executable. Write the line:

LoadModule wsgi_module /usr/lib/apache2/modules/mod_wsgi.so

Save the file as discussed in steps 3 through 5 in the vim editing example above. You are done setting up apache.

Restart Apache


After you have completed all of this, you are ready to go. Restart apache by issuing the following command:

sudo service apache2 restart

Navigate to your server's homepage and you should now be seeing the homepage of your django app!

Troubleshooting


I hope that this helps out aspiring developers in the future. Some of the pitfalls I faced when first setting up a Django app are as follows:
  1. Couldn't Find Anything with PIP - I didn't have my https port open on AWS. It seems like an obvious cause in hindsight, but it wasn't immediately apparent when pip was failing that it just couldn't reach the package repository. PIP uses https to get the packages you need on your system, so it requires that https be open. I didn't know it at the time, and I spent hours working with the extremely unhelpful error messages before I figured out the solution.
  2. Apache is Returning 500 Errors - It took me awhile to figure out why my first install of Django wasn't working, so I had to do a lot of searching just to figure out where the log files for apache were so I could see what is happening. For our purposes (since it really depends on the Linux distribution for where the log files are located) you can find the log files under /var/log/apache2. The most useful troubleshooting log is, naturally, error.log. Use the tail command to see what happened last: tail -100 /var/log/apache2/error.log
  3. My Apache error.log tells me that permission is denied with file /path/to/file/__pycache__ - Apache runs as user www-data and as such has very limited space in which it can edit files on the system (for obvious security reasons). You will need to edit the folder that stores your file (most likely going to be /var/lib/<projectName> as we setup above). To do so, enter the following: sudo chown -R www-data /var/lib/<projectName> assuming that the __pycache__ is within this directory. Then run the command sudo chmod -R 775 /var/lib/<projectName> . Note that this last command is somewhat insecure but should be sufficiently secure for now. Most security settings will have to be adjusted when you decide to really get serious about security, so we won't bother with it now.
  4. Makemigrations is Giving Me Permission Denied Errors - Permissions need to be edited to allow the ubuntu user (the default login user) to make edits as well. Do the following: sudo chown -R www-data:ubuntu /var/lib/<projectName> . For good measure, run the command sudo chmod -R 775 /var/lib/<projectName> .
  5. I can't see my media files - This one was really obvious but somehow I missed the explanation Django provided. You need to create another configuration entry in httpd.conf that points /media/ calls to your media folder (wherever that may be). Google Django set up media files for more information.
  6. VirtualEnv is getting a permission denied error. - This problem occurred because I failed to symlink my python interpreter correctly. Linux doesn't always (perhaps not so often) give good error messages, and I had to bang my head for awhile with this one. After I removed the symlink to my python interpreter and remade it the correct way, everything worked. But, to be thorough, here is a resource you can use.
  7. Apt-get not working because lock can't be removed. - Again, another stupid problem with me just being impatient and ending a process before it finished and Linux couldn't recover. Basically you just have to end the apt-get processes and then remove the lock file if it doesn't work. See more here.
  8. I'm having trouble migrating my models to the database. - Remember that anything you do with Django has to be run inside the virtual environment. Before running migrations or other Django management, you must run source ~/venv/bin/activate .
  9. There is a problem with my virtual environment saying that I do not have setuptools installed when attempting to install psycopg2. - Follow the answer here: https://www.reddit.com/r/learnpython/comments/3jlbep/error_msg_pip_setuptools_must_be_installed_to/
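
Several of the items above come down to file ownership and permissions. Before reaching for chown or chmod, it can be useful to see what the current state actually is. The following is a small sketch for doing that from Python; the function name is made up, and you would pass in your own path such as /var/lib/<projectName>.

```python
import os
import stat
import sys


def describe_permissions(path):
    """Report the mode string, owner ids, and whether the current user can write to path."""
    info = os.stat(path)
    return {
        'mode': stat.filemode(info.st_mode),  # e.g. a string like drwxrwxr-x
        'uid': info.st_uid,
        'gid': info.st_gid,
        'writable_by_me': os.access(path, os.W_OK),
    }


if __name__ == '__main__':
    # Usage: python check_permissions.py /path/to/check (defaults to the current directory)
    target = sys.argv[1] if len(sys.argv) > 1 else '.'
    for key, value in sorted(describe_permissions(target).items()):
        print('{}: {}'.format(key, value))
```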

Sources


I used a plethora of sources to make this work. As I have mentioned before, I am not a system administration guru, but I am fairly decent with Linux. That being said, several things about setting up a server were not immediately straight-forward to me since error messages on Linux can be extremely unhelpful at times. To help me get to where I am now, I have listed several of the sources I used.
  1. http://www.tonido.com/blog/index.php/2013/11/25/working-with-virtualenv-on-django-projects/#.VPjDHvnF_DQ
  2. https://www.digitalocean.com/community/tutorials/how-to-run-django-with-mod_wsgi-and-apache-with-a-virtualenv-python-environment-on-a-debian-vps
  3. https://virtualenv.pypa.io/en/latest/userguide.html#usage
  4. https://docs.djangoproject.com/en/1.7/topics/install/
  5. https://docs.djangoproject.com/en/1.7/howto/deployment/wsgi/modwsgi/
  6. https://www.digitalocean.com/community/tutorials/how-to-install-and-use-postgresql-on-ubuntu-14-04
  7. http://www.postgresql.org/download/linux/ubuntu/
  8. http://stackoverflow.com/questions/1951742/how-to-symlink-a-file-in-linux
  9. http://ubuntuforums.org/showthread.php?t=2141770
  10. http://askubuntu.com/questions/197626/where-is-a-postgresql-9-1-database-stored-in-ubuntu-12-04
  11. http://www.postgresql.org/message-id/006201c74b23$17cce130$9b0014ac@wbaus090
  12. http://askubuntu.com/questions/15433/unable-to-lock-the-administration-directory-var-lib-dpkg-is-another-process
  13. https://www.digitalocean.com/community/tutorials/how-to-read-and-set-environmental-and-shell-variables-on-a-linux-vps
  14. http://stackoverflow.com/questions/16618071/export-a-variable-to-the-environment-from-a-bash-script-without-sourcing-it
  15. http://askubuntu.com/questions/320996/make-default-python-command-to-use-python-3
  16. http://askubuntu.com/questions/401132/how-can-i-install-django-for-python-3-x
  17. https://docs.djangoproject.com/en/1.7/faq/install/
  18. http://stackoverflow.com/questions/5846167/how-to-change-default-python-version
  19. http://askubuntu.com/questions/244544/how-do-i-install-python-3-3
  20. http://docs.python-guide.org/en/latest/dev/virtualenvs/
  21. http://stackoverflow.com/questions/22938679/error-trying-to-install-postgres-for-python-psycopg2
  22. http://stackoverflow.com/questions/20913125/mod-wsgi-for-correct-version-of-python3
  23. http://askubuntu.com/questions/483744/config-status-error-cannot-find-input-file-makefile-in
  24. https://code.google.com/p/modwsgi/wiki/CheckingYourInstallation
  25. http://httpd.apache.org/docs/2.2/mod/mod_so.html
  26. https://launchpad.net/ubuntu/trusty/+package/libapache2-mod-wsgi-py3


Wednesday, July 24, 2013

Tutorials Done My Way

Part of the great love that I have for the technology industry is just how willing people are to help each other, evidenced by the plethora of tutorials out there on just about any and every subject. Entire websites are devoted to answering any question that may be asked (think stackoverflow.com). Twitter has various different feeds that answer technical questions for a variety of products, and people troll these feeds and provide almost real-time answers.

But my major problem with online help is just how shoddy, piecemeal, and incomplete almost all of it is. Want to know how to connect to a server remotely? There is a tutorial for that, but what if that server is not typical? Then you are HOSED! Though this is a relatively minor example, and one that isn't actually all that hard, it nevertheless illustrates the gaping hole found in most tutorials in that they don't explain the underlying structure of the tutorial and so moving outside of that tutorial can be near impossible for the newbie (which, if you are looking up tutorials in the first place, you almost certainly are).

So here is my deal: I want to do online tutorials and documentation in a more excellent manner. I realize that some people don't want an explanation behind what is going on in the tutorial; they just want to know how it is done. That is fine; most of those people can either look elsewhere or just skip over the meaningful explanations I hope to provide.

In addition to going deeper in my tutorials, I also want to provide explanations in layman's terms so that anyone can read them and understand what is going on. That will mean that some technical folks will not appreciate my writing style or may possibly think it is condescending. It is not my intent to condescend, but rather to write to a level that I know from much experience will be beneficial to many others. Too often technical writers assume that the user is a technical person, making the explanation inaccessible to those not advanced enough for the writing.

My aim in the different tutorials that I will write and publish is to share what knowledge I have gained along my way to technical understanding, and to hopefully do so in a maximally accessible way. Feel free to comment/share/take any and all code/resources/material posted. It is free and I want it to stay that way. If you want to cite me, that is fine, but I really don't care. Just try and help a fellow newb out along the way.

How To Connect to an Amazon EC2 Server

As part of my internship I have been required to set up and maintain a variety of cloud based services. One of these services is an Amazon AWS EC2 Linux server, Ubuntu 12.04. Set up for this server was easy and instructions to set one up can be found here (they are so easy to follow that I am not going to cover it). A Quick Note: You will want to download the key pair generated for your server instance as this will be used later. This key is not stored anywhere, so be sure to download once created.

However, connecting to the server once established is slightly more difficult. The following tutorial covers how to do so on a Windows machine.

Using Putty

Almost anyone who has to connect to a Linux machine from a Windows machine will tell you that you should use Putty to get the job done. Although Amazon does have a Java Client that will run through the web browser, it is almost not even worth the hassle of ensuring that Java is going to be working with your browser. Also, putty is free and easy to use and doesn't require you to login after set up.

To begin, download Putty from this website (I would recommend downloading the putty.exe under the Intel x86 heading as this is very lightweight and easy to use). You will also want to download PuTTYgen (note that the link will download it for you). Also, locate the key pair that was created when you set up your Linux machine. It should be named something like [keyname].pem.

Do the following:
  1. Open up Puttygen.
  2. Click on the load button and navigate to the location where your keypair is stored.

  3. You will notice that you do not see your .pem file listed. That is because Puttygen is looking for .ppk files. Change the file type to All Files.

  4. Select the correct .pem file and press Open. A message box will appear telling you that you have loaded your key successfully.
  5. Next, click on the Save Private Key button. Save the file under whatever name you like, with the extension .ppk.

Now you are ready to make a putty connection:
  1. Open up putty.exe.
  2. Click on Run when the message box appears.
  3. The putty configuration screen should appear.


  4. Click on the Connection node, expand SSH, and then click on Auth. You should see a screen as follows.

  5. Click the Browse button and select the .ppk file created in the beginning of this tutorial.
  6. Open up the Amazon EC2 management console. Click on Instances on the side bar to the left.
  7. Click on the instance that you wish to connect to. At the bottom of the screen you will see a tab that contains information about the instance. For security purposes, I have only captured a portion of this screen to demonstrate. You will want to copy the URL found in this information.
  8. Go back to the putty configuration and click on the Session option again in the left side bar.

  9. In the Host Name (or IP address) field, paste the URL copied from step 7.

  10. Next enter a name for this session in the Saved Sessions field.

  11. Click Save.
  12. Click on the newly created saved session and then click Open.
  13. If you have not connected to the server before, a security alert will pop up saying that the server's host key is not in the cache and asking whether you want to add it. Click Yes and continue.
  14. If you are connecting to an Ubuntu instance, the login screen will ask for a user, and you will enter ubuntu. The default user differs from one image to another, so look up which user to log in as for each instance.

Using WinSCP

WinSCP is another free program that works very similarly to Putty in that it uses what is called an SSH session to connect you to the remote server. The major difference between WinSCP and Putty is that Putty can run programs and commands on a command line whereas WinSCP is simply a file browser. If you are familiar with FTP clients like Filezilla then you understand the concept behind WinSCP.

To begin, download it directly by clicking WinSCP, or download it from this site (be sure to choose the first option, Installer). Run the setup and click through all the screens. It may ask to install some toolbars; decline those (I can't remember if it does or not).

Do the following:
  1. Open WinSCP.
  2. Once opened, make sure that you are on the Session node found in the left side bar.
  3. Enter the url found from step 7 above in the Host Name field.
  4. If you look in the Private Key File field you will notice a little button with three dots at the far right. Click on it to open up a file browser window. Locate the .ppk file generated at the beginning of this tutorial and click Open.
  5. At the bottom of the screen click the Save button. A new window will pop up asking you to name the session. Save it as whatever you want.

  6. This will take you to the Stored Session tab. You will see all your stored sessions here. Click on the newly created session and then click login.

  7. If you have not connected to the server before, a security alert will pop up saying that the server's host key is not in the cache and asking whether you want to add it. Click Yes and continue.
  8. You should now be able to see a view of your computer's filesystem and that of the remote server.

  9. If you did everything right, you should be able to close and reopen WinSCP and see the session under the Stored Session node.

That is really all there is to it when connecting to the servers. Hope this helped.

Automating Internet Explorer (Setup)


The initial problem with automating Internet Explorer (IE) is that IE is a mess; and when I say a mess, I mean it is like the drunken celebrity whose career is slowly dying in the tabloid sections of the supermarket. What was initially meant to be the premier browser for the internet has ballooned to this bloated mess of a browser we know and grudgingly use today.

The history of IE is something that many bloggers have touched on, and it is almost universally accepted in the tech community as the last-resort browser, used only when some legacy system requires its use. But for the purpose of automation, it is the best option we have. There are probably multiple different ways to automate other browsers, possibly even using bash scripting or something like that; but I, like most people, do not have any knowledge of these systems. Microsoft has done a lot of work in providing a base from which we can automate IE, so why not attempt to use the framework they have built?

I will eventually add some more technical articles to this blog that will help outline the structure and core of VBA. This post, however, will largely skip over the technical aspects of what is going on and simply outline how to perform the task at hand. You will need to read some posts to get up to speed on how to set up VBA and what objects are in order to make sense of some of the explanations used herein, but I will attempt to keep it as simple as possible.

Step 1 – The IE and Browser References

Visual Basic for Applications (VBA) has a reference for some useful objects and functions that will assist us in creating our IE automation. To add these references, go to Tools under the VBA development window (see here for explanation on how to get to this) and select References:

This will bring up a window with a list of References, listed in alphabetical order. You will want to select the following: 1) Microsoft Internet Controls, 2) Microsoft HTML Object Library, and 3) Microsoft Scripting Runtime.

Note that the top four references come standard with all VBA projects.

Explanation of References

You may skip this section if you do not wish for more information on what these references do. It is not essential that you know what they do, but it is useful to have background on them. Each reference is provided with a link that leads to a more technical explanation of each.

Microsoft HTML Object Library gives you access to the objects found in an HTML document. This is crucial since any navigation across a webpage is done through the nodes of the DOM. Don’t worry if you don’t understand what this means, just think of it like this:  the DOM (which stands for Document Object Model) is a building that contains information, and each node is a doorway that leads to a specific part of that information. Navigating this building (also known as traversing the DOM) is done through methods defined in this reference.

Microsoft Internet Controls is what allows you to control IE without ever having to click on its icon or on the window that would appear (also known as creating an instance of IE that can be manipulated programmatically). This object is the backbone of automating IE as it allows you to interact with the web as though you were actually viewing the page inside the browser, the key difference being that the computer will just be simulating your actions. There is another object called WebBrowser that can be used, but my initial research seemed to indicate that this object is outdated. Therefore, we will only use the InternetExplorer object found in this reference library.

Microsoft Scripting Runtime provides access to the filesystem (or more generally the folders on your hard drive) of your computer. Though not technically necessary when automating IE, it is useful to be able to open and move files that have been downloaded from the internet.
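
As a rough illustration of what this reference makes possible, here is a minimal sketch of moving a downloaded file with the FileSystemObject. The procedure name and both file paths are made up for the example, and it assumes the Microsoft Scripting Runtime reference has been checked as described above.

```vba
' Sketch for a regular (standard) module.
' Requires the Microsoft Scripting Runtime reference.
Sub MoveDownloadedFile()
    Dim fso As Scripting.FileSystemObject
    Dim sourcePath As String
    Dim targetPath As String

    Set fso = New Scripting.FileSystemObject

    ' Hypothetical paths: change these to match your own machine.
    sourcePath = "C:\Users\me\Downloads\report.csv"
    targetPath = "C:\Reports\report.csv"

    If fso.FileExists(sourcePath) Then
        ' MoveFile raises an error if the target folder does not exist
        ' or if a file with the target name is already there.
        fso.MoveFile sourcePath, targetPath
    Else
        Debug.Print "File not found: " & sourcePath
    End If
End Sub
```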

Step 2 – IE as Class Module

To begin the setup, create a new Class Module and name it something that will help you know that it is an object used for automation (e.g. WebAgent, IE, IEObject, BrowseMeDaily… whatever you want). For this demo we will refer to it simply as IEAgent. To do this, right-click on your project anywhere within the Project Explorer on the left-hand side of the screen and select Insert > Class Module (highlighted below).


Above - Insert Class Module



Above – Change the Class Module name from Class1 to whatever you want (in this case, IEAgent).

There are many advantages to making IE a class module versus a regular module. The most important reason for making IE a class module is that it allows you to use the same instance of IE across multiple modules (which is important as we will see in subsequent posts). Since we are choosing to make IE a class module, we will also be able to reuse some useful code snippets by calling a method found in the class module instead of having to write that code block every time we use IE.

Since there are so many reasons for creating a class module over a regular module for IE, I will not enumerate all of them. But remember this pattern in VBA: if you intend to use something as if it were a standalone object, then it should be a class module; if you want to do something that any object should be able to do (like open a file), then it should be in a regular module. This is similar to the distinction between non-static and static members in other languages such as Java.

Step 3 – Creating the Basic Structure for Automating IE

This step is where we will actually start writing code. The module we have written is available in plaintext form on a shared Google doc (blogspot does not support file sharing, so this is a sort of hack). If you wish to skip this step and just download this agent, feel free to do so.

To begin, we must declare some simple variables that will be used in this object, starting with the IE object itself, which is defined in the Microsoft Internet Controls library. The other variable is a number that will store the handle of the IE window. What this means is this: Windows keeps track of every open window by assigning it a number (known as the window handle, or hwnd). Whenever we create an IEAgent, it will start a new IE browser whose window receives such a number, and we store that number so that we can refer to that exact window later. Don’t worry if you don’t understand what this means; just know that it lets us interact with the IE window in a way that will be useful later.

Begin by typing the following:

Dim ie As InternetExplorer
Dim handle As Long

You may notice that as you type internetexplorer a popup box appears that changes as you type in the name of the object. This is called IntelliSense, and is used to help developers determine what objects, properties, or methods are available to them in this context. Though the VBA IntelliSense is weaksauce compared to other development environments, it is still useful. After typing in a few letters after the As, you will be able to arrow down to the correct object and press Tab or Enter.

Enter the following line of code after the variables:

Private Declare Function SetForegroundWindow Lib "user32" (ByVal hwnd As Long) As Long

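This line declares a function from the Windows API (the set of functions that Windows itself makes available to programs) that we will use later when opening webpages in multiple tabs. It is not necessary for this post to understand or even know what this line does, but in essence it enables us to bring the IE window to the front as if it were the window we were currently using (this is called focus; it determines which window you are typing in, clicking in, etc.). Doing so makes some user actions, like hotkeys, programmable. So if you want to send a hotkey such as PageDown to IE, this API will be needed to do so. Note that on 64-bit versions of Office the Declare statement must also include the PtrSafe keyword (Private Declare PtrSafe Function ...) or it will not compile.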
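
To make the hotkey idea a little more concrete, here is a minimal sketch of a method that could live in the IEAgent class module. The method name SendPageDown is made up for this example, and the sketch assumes that the ie and handle variables declared above have already been filled in by the Class_Initialize method defined later in this step.

```vba
' Hypothetical method for the IEAgent class module.
' Place it below the Dim and Declare lines.
Public Sub SendPageDown()
    ' IE started through code is hidden until we say otherwise.
    ie.Visible = True

    ' SetForegroundWindow returns 0 if Windows refuses to switch focus.
    If SetForegroundWindow(handle) <> 0 Then
        ' SendKeys types into whichever window currently has focus.
        SendKeys "{PGDN}", True
    Else
        Debug.Print "Could not bring IE to the foreground."
    End If
End Sub
```

Checking the return value first matters because SendKeys has no idea which window you meant; if IE did not get focus, the keystroke would land in whatever window did.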

Next define the following methods:

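Private Sub Class_Initialize()
    ' Runs when an IEAgent is created: start IE and remember its window handle.
    Set ie = CreateObject("InternetExplorer.Application")
    handle = ie.HWND
End Sub

Private Sub Class_Terminate()
    ' Runs when the IEAgent is destroyed: close IE and release the reference.
    ie.Quit
    Set ie = Nothing
End Sub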

Since IEAgent is a class module, we can run code when an IEAgent object is created (this is called instantiation) and again when it is destroyed. Class_Initialize runs on creation and starts a new instance of IE; Class_Terminate runs when the object is destroyed and closes that instance.
With this code, you have successfully completed the setup of the IE automation object. The next few posts will be discussing how to flesh out the object to make it do what we want.
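
If you want to watch the two events fire, here is a minimal sketch of a test procedure to put in a regular module (not the class module). The name DemoIEAgent is made up for this example, and it assumes your class module is named IEAgent as in this post.

```vba
' Sketch for a regular (standard) module.
Sub DemoIEAgent()
    Dim agent As IEAgent

    ' Creating the object runs Class_Initialize, which starts IE.
    ' The browser stays hidden because nothing sets it visible yet.
    Set agent = New IEAgent

    ' ... later posts will add methods to call here ...

    ' Releasing the last reference runs Class_Terminate, which quits IE.
    Set agent = Nothing
End Sub
```

Stepping through this with F8 in the VBA editor is an easy way to see the order in which the class methods run.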