Tuesday, August 21, 2012

JavaScript is Over: We Just Don't Know What to Replace It With Yet

Although other people (namely Google) have been advocating this for a while, I've only come around to this way of thinking recently, but I think consensus is gradually building in the community that JavaScript is over. In the early days when people were doing simple little scripting with JavaScript, it was a fine language. But as people start writing larger and larger programs with the language, it's showing its weaknesses, and it's becoming clear that JavaScript simply can't handle programming in the large. In the past, there was some talk that software engineering features could be grafted on to JavaScript (the infamous ECMAScript4), but I think researchers are now beginning to realize that the JavaScript model is fundamentally too messy to make such an effort worthwhile. Just as people don't write databases using bash scripts, you just can't have a large million line program developed by hundreds of people written in JavaScript. It's a maintenance nightmare. Even the most radical of the proposed reforms for JavaScript would only improve the situation in minor ways.

The problem is that JavaScript is simply too dynamic. There are static languages, there are dynamic languages, and then there's JavaScript. Once you get to hundreds of thousands of lines of code written by a team of hundreds of people, it's no longer possible for any one person to understand the entire code base. An individual programmer will at best understand the general shape and structure of the code base and understand the details of only the tiny section of the code base that he or she is working on. As such, they are almost entirely dependent on programming tools to help them work with the code. Since they can't understand the interdependencies of the codebase, they need tools to tell them whether changes will likely result in errors or whether a refactoring is safe or whether certain assumptions about how the code is structured is actually true or not. With statically-typed languages like Java, C#, and C++, many tools exist for finding errors in code and performing refactorings and visualizations. Even with older dynamic languages like LISP or Smalltalk, these sorts of tools existed and were quite effective. JavaScript, though, is part of the modern wave of dyanamic languages that have been designed to be flexible and stuffed with features so as to accomodate a wide variety of programming styles. Unfortunately, this flexibility makes building good JavaScript tools extremely difficult. And without these tools, it's not possible to scale up a language for use with large code bases.

The heart of the problem is that JavaScript is too flexible and too dynamic. Duck-typing can be used to change any object from one type to another at runtime. There are no guarantees about which methods can be called when. There are no constraints as to what can happen within functions or not. In past dynamic languages, there have been constraints on the programming paradigms that allowed programmers and their tools to make assumptions about the behaviour of programs. For example, LISP is considered a very dynamic language, but since it's designed around the functional programming model, functions do not have any mutable state, which makes it easier for programmers and tools to understand LISP code. Similarly, Smalltalk, an early dynamic object-oriented language, did not allow methods to change during runtime, an object's data could only be accessed by the object itself, and most method calls occurred through a well-defined mechanism, making code analysis much simpler. JavaScript, by contrast, supports both functional and object-oriented programming, but has none of the accompanying constraints of these older languages. Without these constraints, it's very hard for programmers or their tools to understand how code works. Even simple queries about where a method is used or what methods are available on an object require complicated pointer analysis to solve, if they are even solvable. I think that the key indicator as to whether a programming language can scale to million line programs is whether there are any reliable refactoring tools available for that language. If no such tools exist, then the language is likely too wild for any programmer or their tools to be able to work with a million lines of such code.

In the past, it was thought that JavaScript would be salvagable by adding new features to the language such as modules, sealed objects, types, etc. I now realize that the problem is that these new features are additive. They add new capabilities to the language; they don't impose any additional constraints on how programs are structured. As such, the problem of analysing and understanding million line JavaScript programs becomes harder with these new features, not easier. Types do not help refactoring tools unless types are mandatory. Unless all objects are sealed, it doesn't help a programmer understand whether an object they are using has the correct interface or not at a certain point in the code. These new features do not add new constraints that reduce the search space that a programming tool needs to analyse to understand a piece of code.

Better programming practices might help a bit. But it's cumbersome and error-prone. For example, an extensive set of unit tests can provide similar error-checking as static type-checking. But the effort required to maintain these unit tests is high, and if any area of the code is missing unit tests, then programmers and their tools can no longer rely on unit tests to check their errors.

Flexibility and expressiveness are wonderful features in small languages, but they cause maintenance nightmares when these languages are scaled to giant million line programs. JavaScript is a very expressive and powerful language, but these same characteristics will prevent it from scaling to the sizes that programmers want. Programmers are beginning to realize this. Unless there's some great advance in programmer tooling that can handle the complexity and flexibility of JavaScript, programmers will have no choice but to start using alternate programming languages. These languages will have to compile down to JavaScript so that they can be deployed in web browsers. JavaScript then becomes only a deployment language or an intermediate representation. The real development is done with another language. It's not clear which of these alternate languages will win out, be it C#, Java, Dart, or perhaps some new language in the future, but I think this will be the future trend. It's probably best if new language features aren't added to future versions of JavaScript, but only new engine features or speed improvements. For example, if everyone is coding in alternate languages, then no one will use new syntactic sugar, but features like weak hash maps will greatly augment the types of alternate languages that can be built on top of JavaScript.

Update 2012 Oct 29

So I've been thinking over the problems with JavaScript as a programming language, and these are some things that I think will likely be changed in any JavaScript replacement.

to be removed: Objects as Associative Arrays
to be replaced by: Dictionary Objects, Weak Hash Maps
Associative arrays are extremely useful when programming. But having every object be an associative array is just too much. If you need an associative array, you should create an AssociativeArray object. There is no need to burden every single object with associative array capabilities. Associative arrays are widely used to store supplementary information in the DOM though, so some sort of weak map functionality will also need to be provided to replace this.

to be removed: Prototype Objects
to be replaced by: Class-based Objects
unsure: Methods are Just Functions Inside Objects
If objects are no longer associative arrays, it weakens the need for prototype based inheritance. Class based object systems are much more well-studied and understood than prototype based systems, so moving to such a system should help the building of new tooling significantly. The fact that JavaScript methods are just data members that point to function objects may or may not be a barrier preventing a move to a class-based system though.  It might be necessary to have separate notions for function and method.

to be removed: Object Literal Syntax
to be replaced by: Anonymous Classes
JavaScript's object literal syntax is a great, easy way to make small, ad hoc, short-lived objects. But once you move to a class-based system, the current behavior of object literals won't work. They'll probably have to be adapted to some sort of anonymous class system.

unsure: Automatic Casts, Automatic Semi-Colon Insertion, Val Scoping, Property Accessors, Variable Function Parameters
JavaScript has a bunch of other features that have long been troublesome such as the ones above. But I'm not sure if these features actually block the deep analysis of JavaScript code, so it might be possible to keep them if people actually want them.

to be added: Modules, More Declarative Syntax
JavaScript currently doesn't have a good mechanism for breaking code into independent chunks. A good module system should fix that. Continuing with that direction, it would also be useful to provide a more declarative syntax for the language because it would allow code and data structures to exist and be analysed independently of other neighboring code.

Thursday, August 16, 2012

Population Density Map for Toronto by Neighbourhood, 2011

Although the City of Toronto publishes all sorts of weird maps of Toronto, they haven't published a population density map using the most recent 2011 census data yet. They have published all sorts of related maps like population growth rate maps, senior maps, and children maps, but no matter how deeply I dig through their demographics website, I can't find a population density map using the 2011 data. But the City did publish the population of each of the 140 neighbourhoods in Toronto, so I thought I'd just generate my own population density map. I also grabbed Toronto's Open Data map information about the shape of each neighbourhood. So, all I needed to do was to parse the neighbourhood data and output it as svg. The shape data used a universal transverse mercator coordinate system, which means I could just calculate the areas of the shape polygons directly without really worrying about the curvature of the Earth and still get reasonable numbers. And from there, it was easy enough to calculate the population density.

Number of people per square kilometre (Contains public sector Datasets made available under the City of Toronto's Open Data Licence v2.0. Users are forbidden to copy this material and/or redisseminate the data, in an original or modified form, for commercial purposes, without the express permission of Statistics Canada. Information on the availability of the wide range of data from Statistics Canada can be obtained from Statistics Canada's Regional Offices, its World Wide Web site at: www.statcan.ca and its toll-free access number 1-800-263-1136.)
I sketched in the approximate locations of the subway (plus future extensions) so as to give the map some identifiable landmarks. The colour range is linear except for St. James Town, which had such a high population density of over 40k/km^2 that it skewed the colour range too much, so I had to treat it specially.

From looking at the City of Toronto's other maps, it looks like they have population data at a much more fine-grained level available, but I was restricted to individual neighbourhoods, which can be quite big at times and which sometimes encompasses mixes of large apartment blocks as well as low-density housing. Also, the data only shows residential density. The City has employment survey data which shows all the employment centres in the Toronto, but they don't have it in a form that I could process and overlay on top of this map.

The map is sort of limited, so it's not too interesting. I'm not sure if any interesting conclusions can be drawn from it. I guess I was previously curious as to why so much attention was being dedicated to improving transit in Scarborough in the east but not to Etobicoke in the west. But from the map, I can now see that the east end is much bigger with a higher population density, so it probably makes sense to focus efforts there. From the map, the case for a downtown relief line that extends north of Bloor in the east doesn't seem to strong since the density doesn't seem that great as compared to the west side, but the Don Valley could be screwing up the density numbers, and the density numbers don't include information about workplaces.