Monday, June 18, 2018

Ranking of Racism against Asian Americans at Ivy League Schools

It has long been assumed that the Ivy League schools are racist against Asians, but it's been hard to gauge the extent of the racism. Some universities try to run purely meritocratic admissions systems involving fewer subjective evaluations. For example, Caltech has an Asian enrollment of 43%, but it's not clear whether that's comparable to other schools because of Caltech's heavy engineering focus. Berkeley is a more well-rounded university with an Asian enrollment of 41%, but it's a public university in a state with a high Asian population, so it's not clear if it's comparable to universities in other parts of the country.

Fortunately, Harvard ran the numbers back in 2013, and they found that if they ranked students solely by academic qualifications, then Asians would make up around 40% of admitted students. Even if Harvard continued to set aside spots for athletes and undeserving rich people, Asians would still make up 31%. If extra-curriculars and other subjective measures were included as well, then Asians would still make up around 26% of the student population (even though in 2013, only 19% of the admitted class was Asian). I believe that the Asian population has only grown since 2013.

These numbers agree with the Berkeley and Caltech numbers, so I feel it's safe to use them for back-of-the-envelope calculations of how racist each university is against Asians. The Harvard numbers should be comparable to those of other prestigious, well-rounded, private universities that attract students from across the country, so it should be safe to compare them with the other Ivy League universities.

So I visited the websites of the Ivy League universities and grabbed their reported diversity statistics on Asian admissions. The numbers are hard to compare because different universities categorize their students differently. If a university had a separate category for unknown and/or foreign students, I left those out of the total. If there was a category for multi-racial students, I did not include it in the Asian percentage. As a result, the numbers are very noisy, but I think they still give a basis for comparing universities. I think that universities with Asian enrollment in the high 20s or low 30s are demonstrating low amounts of racism against Asians.

So here are the results:

1. Brown (18.16%) - most racist
2. Yale (21%)
3. Dartmouth (21.74%)
4. Harvard (22.2%) - probably worse than Dartmouth, but the numbers are hard to compare
5. Cornell (23.26%)
6. UPenn (23.56%)
7. Princeton (25.29%)
8. Columbia (29%) - least racist

And here are Stanford's numbers even though it isn't an Ivy League university:

Stanford (26.44% Asian)

Initially, the numbers seem to suggest that Brown is the most racist against Asians of these universities. Brown also has the lowest African American enrollment of all the Ivy League universities, so it just fails on diversity in general. It does have a large number of students classified as multi-racial, which muddies things, but the numbers still look bad after removing them from the totals. It's possible that Yale or Harvard is, in fact, the worst university because they don't have a "multi-racial" category and their percentage of Asian enrollment is pretty low.

There's a bunch of universities in the middle that seem to be sort of racist. Princeton seems to be the best of the middle.

The least racist, by far, seems to be Columbia, which achieves Asian enrollment in the high 20s, within the safe zone. Columbia also has strong African American enrollment. It demonstrates that it is possible to have diverse minority enrollment without unduly punishing Asians.

So what's going on? These universities (other than Columbia) are using the implicit bias effect to willfully keep down Asian enrollment. These universities intentionally add subjective measures to student evaluations that are known to be subject to bias. They then hire admissions officers of dubious qualifications who don't understand the Asian experience or don't like Asians in general, who implicitly prefer applicants who are more like themselves, and who are told to find applicants who match an Ivy League "culture" or "character profile" that runs contrary to Asian stereotypes and biases. As a result, they end up giving lower subjective scores to Asians. Is the applicant who plays badminton more or less "brave" than the applicant who plays football? Is the atheist applicant more or less kind than the church-going applicant? There is no way of knowing these things, but people will inevitably form an opinion based on their implicit biases.

For example, Harvard gave lower "personality" ratings to Asian applicants in general. I can imagine that, yes, some universities might prefer people with certain personalities over others. But that mix of personalities should be evenly distributed among all people. If one race consistently scores poorly on the personality rating versus all other races, then there's some implicit racism going on that needs to be fixed. Strangely, no personality deficiencies were found during alumni interviews. The bias only appeared during the ranking of personal qualities by the Admissions Office.

Here are some more in-depth articles if you want a deeper dive into the issues involved: 1, 2, 3.

Tuesday, June 12, 2018

Nan Native Module Asynchronous Callbacks in Electron with GWT

This problem has been causing me frustration for weeks, and I think I've finally figured out what was wrong.

I have a GWT application that I'm running as a desktop application using Electron. To access some Windows services, I wrote a native module in C++ that my JavaScript code can call into to invoke some Windows functions. Some of the newest Windows APIs are asynchronous and long-running, so I made use of Nan's AsyncWorker framework for running C++ code in another thread and then calling a JavaScript callback function with the result afterwards.

But the code would always crash. If I executed the commands from the Electron/Chrome debugger console, it would run fine. But if I ran the same instructions in my compiled GWT application, the application would crash when the callback function was invoked from C++.

I spent weeks looking over the code and trying different variations, tearing my hair out, and I could never figure it out. Native modules (much like everything else in Node.js and Electron) are underdocumented, but my code looked the same as the examples, and I couldn't find any reports of other people having this problem. Maybe I was compiling things incorrectly? Was my build set-up wrong? Maybe mixing in WinRT and managed code was causing problems? But I think I've finally figured things out.

The problem is that the GWT code runs in an iframe, so the callback functions are defined in the iframe, and somehow, this leads to a crash when the C++ code tries to call these callback functions. To solve this problem, I've created a separate JavaScript shim that creates the callback functions in the context of the main web page. My GWT code can call into the main web page to create the callback functions and to pass them to the native module. Then the native module can safely call back into JavaScript from the AsyncWorker without any crashes.
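
To illustrate, here's a minimal sketch of what I mean by the shim. The module name winservice and its query function are hypothetical stand-ins for my real native module; the important part is which JavaScript context the callback is created in.

// shim.js - loaded by the top-level web page via a script tag, NOT by the GWT iframe
const winservice = require('./build/Release/winservice');

window.callWinService = function (request, onDone) {
  // This anonymous function is created in the main page's context, so it's
  // the only function that ever gets handed to the C++ AsyncWorker.
  winservice.query(request, function (err, result) {
    onDone(err, result);
  });
};

The GWT code in the iframe then calls $wnd.callWinService(request, callback) through JSNI. Its own callback only ever gets invoked from other JavaScript, never directly from C++.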

Side Note: When running Electron on Windows, it seems that the Windows message queue is managed from the main process. So if you have a Windows API that needs to be called from the UI thread, it should probably be called from the main process, not the renderer process.
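
Here's a rough sketch of that arrangement, assuming a hypothetical native binding winapi and a made-up channel name; ipcMain and ipcRenderer are Electron's standard message-passing APIs.

// main.js (main process)
const {ipcMain} = require('electron');
const winapi = require('./build/Release/winapi'); // hypothetical native module

ipcMain.on('call-win-api', function (event, request) {
  // The UI-thread-sensitive Windows call happens here in the main process.
  event.sender.send('call-win-api-reply', winapi.call(request));
});

// renderer.js (renderer process)
const {ipcRenderer} = require('electron');

ipcRenderer.on('call-win-api-reply', function (event, result) {
  console.log(result);
});
ipcRenderer.send('call-win-api', {verb: 'query'});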

Tuesday, May 01, 2018

ES6 Modules: Limp and Overcooked

I've been eagerly awaiting a module system for JavaScript for many years. Although plans for a standardized module system have been floating around since ECMAScript 4, it has only become standardized and available during the last year or so. Usually, modules are a pretty intuitive language concept. You briefly look at a couple of examples, then you dive in and start using them, and everything just works. For some reason, though, when I tried using ES6 modules in a project, my mind absolutely refused to accept them. I literally spent hours staring at these lines of code, and my brain couldn't do it:
import foo from './library.js';
import {foo} from './library.js';
Both lines are valid ES6 Modules code. Only one of them is correct though, and it depends on how you've set up your modules. The difference is so confusing and the error messages are so cryptic that I just couldn't get my feeble brain to understand it.

Apparently, the JavaScript module system spent so long in the standardization oven that it has become overcooked and ruined. It's limp and dry and completely unappetizing. ES6 Modules are actually two completely different module systems that have been thrown together into JavaScript with no attempt to unify them at all. What's worse is that the two module systems use very similar syntax, so it's very easy to get things mixed up. Some misplaced squigglies, and you're using the wrong module system, one that's incompatible with the library you want because the library was built with the other module system. What's doubly worse is that one of the module systems is already deprecated, and the preferred module system has the more complicated syntax. If one system is preferred, then why does the other one exist? And if the two module systems are different, why couldn't they have two completely different syntaxes?

What's weird is that they could have unified it. There would have been a lot of odd corner cases, but they could have made a consistent syntax. When I see the two lines above, I think of destructuring assignment.
const pair = getFullName();
const [firstName, lastName] = getFullName();

const point = getPoint();
const {x, y} = getPoint();
It's a bit unusual, but it's consistent. My mind could accept that.
import foo from './library.js';
could be for importing everything in the library into an object named foo
import {foo} from './library.js';
could be for importing foo from a library.

But, no, that's not how it works. Instead, the first line imports a module's default export, and the second line imports a named export. Oh, you can also build your libraries so that they are compatible with both types of imports, but since the two module systems are completely distinct, you can design your libraries to export completely different things depending on whether they are imported as a default export or as named exports.
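
To make the difference concrete, here's a small example (the file names are just for illustration). A library can provide both a default export and named exports, and the two are resolved completely independently:

// library.js
export default function foo() { return 'the default export'; }
export function bar() { return 'a named export'; }

// main.js
import foo from './library.js';   // binds the default export to the name foo
import {bar} from './library.js'; // binds the named export bar
// import {foo} from './library.js'; // error: there is no named export "foo"

The last line fails even though the first one succeeds, which is exactly the sort of thing that kept tripping me up.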

After several hours of my mind rejecting the ES6 approach to modules, I think I've finally gotten it accepted. I explained to my mind that the ES6 module system is complete garbage, but that's all that there is to eat, and it better not barf it all out like last time. It's not happy about it though.

Saturday, March 10, 2018

Copying Hard Disks with Bootable Windows GPT Partitions Using Linux

This is just a placeholder blog post. I keep intending to do a proper blog post on this topic, but I never get around to it. Unfortunately, I always forget the steps I need to do when I'm in the middle of copying my hard disks and can't consult my notes, so I'm going to just put some placeholder notes here and flesh them out properly later.

Note: Copying hard disks is tricky. I disclaim all responsibility if you use these steps and you lose data or your BIOS becomes corrupted or whatever. This blog post mainly serves as notes to myself.

About GPT Partitions
I still don't fully understand how a modern UEFI system boots from GPT partitions. I think with GPT and UEFI, your hard disk contains multiple partitions. One of them is a special FAT partition that contains boot loader programs with the instructions needed to load an OS from one of the other partitions. That partition is called the EFI System Partition. The UEFI BIOS of a computer will start Windows in one of two ways:
  1. Usually, the BIOS stores the specific UUID label of the boot EFI System Partition and the name of the boot loader program on that partition to load. The BIOS can then quickly load the boot loader and then continue on to load the OS.
  2. When you first install Windows, the BIOS doesn't have that information yet, so the BIOS is able to find the EFI System Partition on the hard disk itself, find the default Windows boot loader on that partition, and then run that to start Windows.
Why It's Tricky Copying Windows
Copying Windows GPT partitions is hard because
  • Windows makes it hard to copy Windows partitions
  • the bootloader program on the FAT partition has to be changed to load things from the new partition
  • the BIOS has to be changed to know about the new bootloader
Steps for Doing the Copy
By default, Windows is configured with settings that let it leave the file system in an inconsistent state on shutdown (in order to have faster shutdowns and bootups), and copying the file system in that state is a bad idea. I tried various methods for disabling that, but the only reliable approach seems to be to turn off hibernation entirely. Start a command prompt in Administrator mode, then run

     powercfg -h off

If you're copying to a smaller hard disk, you might want to use Windows Disk Management (right-click "This PC", choose "Manage", then choose "Disk Management" under the "Storage" category on the left) to shrink your partitions in advance, but that often doesn't work, and Linux can shrink your partitions anyway.

Also make sure that you have a bootable version of a Windows rescue CD or the Windows installation media.

Now you can start up Linux to begin copying your hard disk. I always use the GParted LiveCD for this. GParted can be slow to start up because it scans through all your hard disks very slowly to find all the partitions. I think it also does something really slow with Windows partitions, and that scanning step gets slower the bigger your hard disk is. But I've found it to be pretty reliable. Use it to copy your partitions to the new hard disk.

msftres Partition
You may find that your hard disk has a msftres (Microsoft reserved) partition. GParted is unable to copy this partition, but it isn't necessary to copy it. It's just a partition that Microsoft reserves so that, for example, if you ever enable Windows Bitlocker disk encryption, there's a place to store its data. If you do want to keep an msftres partition, you can manually create one. First, copy all the partitions that come before the msftres partition. Then boot into the Windows installation CD (you did remember to make one, right?). Go into the Advanced Repair settings and get to the command prompt. Then use the "diskpart" program to create an msftres partition. I forget the exact steps; you can type things like "help", "help select disk", "help list partition", or "help create partition msr" to get the exact commands you need. But you need to select the new hard disk, then use something like "create partition msr size=128" to create a 128MB msftres partition. Then you can reboot into GParted to copy the rest of the partitions.
 
Update 2020-9-22:
> list disk
> select disk n
> list partition
> create partition msr size=128

GParted Fix-ups
GParted will give the copied partitions the exact same IDs as the old partitions. This can be a problem if you keep both the new hard disk and the old hard disk in the same computer (e.g. when moving from a hard disk to an SSD, you might want to keep the old hard disk around in the same computer as a backup). It's not clear which boot partition the BIOS will use when starting up, the wrong Windows can potentially start, and even if the right one starts, the wrong partition might end up mapped to the C drive. To be safe, if you intend on keeping both hard disks in your system, you should go in and change the UUIDs of all the partitions on either the new drive or the old drive.

I'm not 100% sure about Windows recovery partitions. For recovery to work properly on the new hard disk, it might be better to let it keep the same UUID, in which case generating new UUIDs for the old hard disk may be better. But I could never get that recovery partition to work properly anyway, and I would rather have at least one proper working copy of my hard disk in case the copy goes bad, so maybe it's better to generate new UUIDs for the new hard disk.
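
One bulk way to do that (a note to future me, not something I've folded into these steps before): the sgdisk tool from the gdisk package can randomize the disk GUID and all the partition GUIDs on a drive in one command, e.g. for the drive at /dev/sdX

     sgdisk --randomize-guids /dev/sdX

GParted can also do it one partition at a time with the "New UUID" action in the Partition menu.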

GParted also often doesn't copy the partition labels and flags correctly. You can go in and set those manually so that they're the same on both disks. I've forgotten to do this before, and everything still seemed to work, so this might not be necessary.

Update UEFI BIOS with Location of Bootloader
Now that Windows is copied, you need to tell the BIOS which bootloader to use on start-up. If you didn't change the UUIDs of the newly copied partitions, then this might not be necessary, since the existing BIOS entry for the location of the old bootloader should still work. You might need to swap hard drive cables to ensure that the new hard drive takes over the old drive's drive number. Or maybe not?

If you did change the UUID of the boot partition, then this is definitely necessary. Open a Linux terminal. Become root by using "sudo bash". Then use the "efibootmgr" program to create the necessary entries. 
 
Update 2020-9-22:
> efibootmgr or efibootmgr -v 
to list boot entries
> efibootmgr -o i,j,k
to change boot order e.g. efibootmgr -o 2,1,3
(check man pages for other commands)

I can never figure out how to create new BIOS EFI entries using efibootmgr, so I sometimes try some other approach to create these new entries. Sometimes, if you start up your system with only your new hard disk attached, the BIOS won't find the default bootloader, so it will fall back to searching the hard disk for the Windows bootloader itself and will then add an entry for it to the BIOS. Or you can try starting a Windows Recovery CD or installation DVD, going to the command-line repair tools, and trying to use "bcdedit", "bcdboot", or "bootrec /rebuildbcd" to do this. I'm not sure what these programs do, but I think one of them will create a new EFI BIOS entry for the bootloader partition.
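
For reference, the efibootmgr invocation that's supposed to create a new entry looks something like this (I haven't verified it myself; it assumes the EFI System Partition is partition 1 on /dev/sdX and uses the standard Windows bootloader path):

     efibootmgr -c -d /dev/sdX -p 1 -L "Windows Boot Manager" -l '\EFI\Microsoft\Boot\bootmgfw.efi'

The -c option creates the entry and adds it to the boot order.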

Once you have an EFI entry in the BIOS for the bootloader, you can go back to using "efibootmgr" to rearrange the order of your boot entries so that it comes first, which is a lot easier than creating a new entry from scratch.

Update Bootloader with New Location of Windows Partition
Now the BIOS can find the bootloader to start loading Windows, but the bootloader may not point to the correct partition to actually start the OS. Again, you can try starting a Windows Recovery CD or installation DVD, going to the command-line repair tools, and trying to use "bcdedit", "bcdboot", or "bootrec /rebuildbcd" to fix this. Again, I'm not sure what these programs do. I don't actually know what the BCD is, and the Microsoft documentation is very vague about it. I think it refers mostly to the configuration files for the bootloader on the EFI FAT system partition, but I don't know. Sometimes everything has gone bad, and you need to use "bcdedit" to start a completely new BCD store. In any case, after randomly running some combination of those programs, Windows will somehow fix itself and become bootable.
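
From what I can piece together, the bcdboot recipe goes like this: assuming Windows was copied to C:\Windows and you've used diskpart to assign a letter like S: to the EFI System Partition, then

     bcdboot C:\Windows /s S: /f UEFI

is supposed to regenerate the boot files and the BCD store on the EFI partition, pointing at that Windows installation.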

Checking Windows
When you do manage to successfully boot into Windows again, go into Disk Management to make sure that the correct drive is listed as your boot drive and that it's mapped to the C drive. You might also need to manually remove drive letters from some of your recovery partitions and other drives. Don't forget to re-enable hibernation by going to the command prompt as an administrator and using

     powercfg -h on

Fixing Up Your Linux Bootloader
I normally use Windows, but I keep a copy of Linux on my drives for occasional use. Normally, I just let Linux install a boot loader into the EFI system partition and add an entry to the BIOS. I change the boot order to normally boot to Windows, but I use the "boot from alternate drive" key on startup to show all the EFI boot entries, and then I manually choose the Linux one. 

After copying a Linux partition to a new hard disk, you have to reinstall the grub bootloader on the EFI system partition so that it points to the new Linux partition. Reinstalling grub is mostly impossible, so I find it easier to simply reinstall Linux over the old version (I keep my Linux data on a separate /home partition, so it's safe to do that without losing data).
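
For completeness, the commands that are supposed to do the reinstall (run from a chroot into the copied Linux partition, with the EFI System Partition mounted at /boot/efi) are roughly

     grub-install --efi-directory=/boot/efi
     update-grub

but setting up that chroot correctly is its own adventure, which is part of why reinstalling tends to be faster for me.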

Sunday, March 04, 2018

Creating an SDF Texture for a Font at Runtime

I was recently trying to implement text for the graphics engine behind Omber. Unfortunately, although I had a complete vector graphics engine, I hadn't gotten around to implementing support for shapes with holes in them, so I couldn't just take the vector representation of each font glyph and render it directly. Instead, I ended up using the standard approach from many 3d game engines, which is to use Signed Distance Field (SDF) fonts.

Most people pre-generate their SDF textures, but that didn't really seem feasible to me. I wanted my code to let people use their own fonts in their drawings, so I couldn't precalculate SDF textures for those fonts. Also, international fonts might contain thousands of characters, and it would be too memory intensive to calculate textures for all of them in advance. So I went about trying to figure out how to generate my own SDF textures, and I learned some good lessons about how to do it.

At first, I didn't really understand how SDF worked, so I tried the dumb approach of drawing a character on a bitmap and then manually calculating the SDF values. This takes a surprising amount of time to code up, it's really slow to run, and the results are sort of poor. I didn't understand that the shader really only cares about SDF values for the one or two pixels near the edge of a glyph, and that the SDF values need to have subpixel accuracy. True, you might see demos of people adjusting SDF cut-off values to make variations of a font with different font weights. But in normal situations, you only care about SDF values within a pixel or two of the edges of a glyph, because when those values are linearly interpolated, you get a good approximation of the angle of the edge in that area. And you really do need subpixel accuracy, or your linear interpolation will simply give you back your chunky pixels. To get that sub-pixel accuracy using a raster approach, you would have to draw your glyphs at a really big size and then scale the bitmaps down, but that's even slower and you lose a lot of accuracy.

Instead, it turned out to be both faster and more accurate to generate the SDF directly from the vector representation. I already had a vector graphics engine, which made this easier, but you actually don't need much vector logic. You need a way to extract the bezier curves of each glyph; I was working in JavaScript, so typr.js and opentype.js were available libraries for that. Then you need a bezier subdivider to convert all the bezier curves to line segments. Then you take that bag of line segments and throw them into a point-in-polygon routine (which checks whether a ray from your point crosses an even or odd number of lines to decide whether you're inside the glyph) to get the sign, and a distance-to-line routine to get the distance, and you're done. Since you're working with floating-point values, you get very high precision with sub-pixel accuracy. And since you don't have to scan through lots of pixels to calculate distances, it turns out it's really fast. That actually makes sense, because even old computers could render vector fonts at a reasonable speed, so it should be possible to calculate a low-resolution SDF quite quickly too.
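
To give a flavor of how little code is involved, here's a sketch of the core calculation, assuming the glyph has already been flattened into an array of line segments of the form {x1, y1, x2, y2} (the glyph extraction with typr.js or opentype.js and the bezier subdivision aren't shown):

function distToSegment(px, py, s) {
  // Project the point onto the segment, clamped to the segment's endpoints.
  const dx = s.x2 - s.x1, dy = s.y2 - s.y1;
  const lenSq = dx * dx + dy * dy;
  let t = lenSq === 0 ? 0 : ((px - s.x1) * dx + (py - s.y1) * dy) / lenSq;
  t = Math.max(0, Math.min(1, t));
  return Math.hypot(px - (s.x1 + t * dx), py - (s.y1 + t * dy));
}

function insideGlyph(px, py, segments) {
  // Even-odd rule: count how many segments a rightward ray from the point crosses.
  let crossings = 0;
  for (const s of segments) {
    if ((s.y1 > py) !== (s.y2 > py)) {
      const xCross = s.x1 + (py - s.y1) * (s.x2 - s.x1) / (s.y2 - s.y1);
      if (xCross > px) crossings++;
    }
  }
  return (crossings % 2) === 1;
}

function signedDistance(px, py, segments) {
  // Positive inside the glyph, negative outside (flip the sign if your shader
  // expects the opposite convention).
  let d = Infinity;
  for (const s of segments) {
    d = Math.min(d, distToSegment(px, py, s));
  }
  return insideGlyph(px, py, segments) ? d : -d;
}

Evaluate signedDistance at each texel of your texture, then clamp and scale the values into a byte, and you have your SDF texture.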


So, yeah. Calculate your SDF textures at runtime straight from the vector representation because it's not much code and it's faster.