Monday, December 09, 2024

How Long Do You Have to Wait for a Bus?

If you go down to the bus stop, how long do you have to wait for the next bus? If buses come every 5 minutes, then you have to wait on average 2.5 minutes for the bus to arrive. But what happens if the buses come unevenly? What happens if the buses get bunched up, so that three buses come with only 1 minute between each bus and then it takes 13 minutes for the next bus to arrive? If you just take a simple average of the time between buses:

$$\frac{1 + 1 + 13}{3}$$

You will calculate that there's, on average, a 5 minute gap between buses, which again suggests that you have to wait, on average, 2.5 minutes for the bus to arrive. But that doesn't seem right. And it's not.

Let's look more deeply into this problem and see if we can calculate the wait time for the bus more correctly.

Total Wait Time

To build some intuition about the problem, let's consider a bus stop where people arrive every 2 minutes. Buses come to the bus stop at uneven times though.

Depending on when you arrive, you may have to wait longer for the bus. If there's a larger gap between buses, there are more people who have to wait too.

If we want to calculate the total amount of time that people have to wait for the bus, we just have to add up the times from when each person arrives to the time when the bus comes. To make this more clear, we'll draw a little stick figure for a person having to wait one minute for the bus.

In the graph, we can see that over time, the number of people waiting stacks up until a bus comes. It forms a bit of a staircase or triangle. If we add up the number of stick figures in the graph, we can see that there are 14 stick figures, so people waited for a total of 14 minutes for the bus. Over the time period, there were 6 people who came to the bus stop, so the average amount of time each person had to wait was

$$\frac{14 \text{ minutes waited}}{6 \text{ people}} \approx 2.3 \text{ minutes / person}$$
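The stick-figure counting can also be done in a few lines of code. This is only a rough sketch: the exact arrival and bus times from the graph aren't written out in the text, so the schedule below is a made-up one (people arriving every 2 minutes, buses at minutes 4, 6, and 12) that I picked because it happens to give the same totals of 14 minutes and 6 people.

```python
from bisect import bisect_left


def total_person_minutes(arrivals, buses):
    """Add up how long each person waits for the first bus at or after their arrival."""
    buses = sorted(buses)
    total = 0
    for t in arrivals:
        i = bisect_left(buses, t)  # index of the first bus with time >= t
        if i == len(buses):
            raise ValueError(f"no bus arrives at or after t={t}")
        total += buses[i] - t
    return total


# Hypothetical schedule (an assumption, not read off the original graph)
arrivals = [1, 3, 5, 7, 9, 11]  # one person every 2 minutes
buses = [4, 6, 12]              # unevenly spaced buses

total = total_person_minutes(arrivals, buses)
average = total / len(arrivals)
print(total, round(average, 1))  # 14 2.3
```

Each person contributes one "stick figure" per minute waited, so the sum of the individual waits is the same as counting the figures in the graph.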

Now that we have some intuition about calculating bus wait times, let's try generalizing this approach.

Generalization

Every bus stop has people arriving at different times, so to generalize over them, let's assume that people arrive at the bus stop at a constant rate a. As people arrive, they wait for the bus until it arrives. When you graph this, you get a lot of right angle triangles. The triangles have a slope a that matches the arrival rate of passengers.

The total time that people spend waiting for buses is the area under the triangles. We can calculate the dimensions of the triangles using a.

Knowing the dimensions of the triangles, we can then calculate the area of the triangles. 

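$$\begin{aligned}
\text{total time waited} &= \sum_{\text{all triangles}} \tfrac{1}{2} \cdot \text{base} \cdot \text{height} \\
&= \tfrac{1}{2} \cdot 5 \cdot 5a + \tfrac{1}{2} \cdot 4 \cdot 4a + \tfrac{1}{2} \cdot 2 \cdot 2a \\
&= \frac{45a}{2}
\end{aligned}$$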

Or if we want to generalize this further for a bunch of buses where we know the time duration between each bus:

$$\sum_{\text{all buses}} \tfrac{1}{2} \cdot \text{duration}^2 \cdot a$$

Knowing the total wait time that all people wait for their bus, we can just divide the total wait time by the number of people to get the average wait time per person.

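$$\text{number of people} = \text{time range} \cdot \text{arrival rate}$$

$$\text{average wait time} = \frac{\text{total wait time}}{\text{number of people}} = \frac{\frac{45a}{2}}{(5 + 4 + 2)\,a} = \frac{45}{22}$$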

Or if we generalize things to a bunch of buses where we know the time duration between each bus:

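$$\text{number of people} = \text{time range} \cdot \text{arrival rate} = \sum_{\text{all buses}} \text{duration} \cdot a$$

$$\text{average wait time} = \frac{\text{total wait time}}{\text{number of people}} = \frac{\sum_{\text{all buses}} \tfrac{1}{2}\,\text{duration}^2 \cdot a}{\sum_{\text{all buses}} \text{duration} \cdot a} = \frac{\sum \text{duration}^2}{2 \cdot \sum \text{duration}}$$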

Notice that the average wait time does not involve the arrival rate of new people at the bus stop a.
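To double-check that the arrival rate really cancels out, here's a small sketch that computes the total wait time and the number of people separately for a few different arrival rates, using the 5, 4, and 2 minute gaps from the example. The function name and the particular rates are just my own choices for illustration. It uses exact fractions so that rounding doesn't muddy the comparison.

```python
from fractions import Fraction


def average_wait_with_rate(gaps, rate):
    """Average wait = (total area of the triangles) / (number of people)."""
    total_wait = sum(Fraction(1, 2) * d * d * rate for d in gaps)  # sum of 1/2 * base * height
    people = sum(d * rate for d in gaps)                           # time range * arrival rate
    return total_wait / people


gaps = [5, 4, 2]  # minutes between buses, as in the worked example
results = {
    rate: average_wait_with_rate(gaps, Fraction(rate))
    for rate in ("1/2", "1", "3")  # people per minute (arbitrary test values)
}

for rate, avg in results.items():
    print(f"arrival rate {rate}: average wait {avg} minutes")
```

Every rate gives the same 45/22 minutes (about 2.05 minutes), because the rate appears once in the numerator and once in the denominator.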

So now that we have a way of calculating average wait times for a bus, let's look again at the problem posed in the introduction. How long do you have to wait for a bus if the buses are bunched up and the spacing between three buses is 13 minutes, 1 minute, and 1 minute?

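$$\begin{aligned}
\text{average wait time} &= \frac{\sum_{\text{all buses}} \text{duration}^2}{2 \cdot \sum_{\text{all buses}} \text{duration}} \\
&= \frac{13^2 + 1^2 + 1^2}{2 \cdot (13 + 1 + 1)} \\
&= 5.7 \text{ minutes}
\end{aligned}$$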
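The formula is easy to turn into a small helper. This is a minimal sketch (the name average_wait is my own), assuming you already know the gaps between consecutive buses in minutes:

```python
def average_wait(gaps):
    """Average wait for someone arriving at a uniformly random time.

    gaps: the durations between consecutive buses (any consistent time unit).
    """
    gaps = list(gaps)
    if not gaps or any(d <= 0 for d in gaps):
        raise ValueError("need at least one positive gap between buses")
    return sum(d * d for d in gaps) / (2 * sum(gaps))


bunched = average_wait([13, 1, 1])  # 171 / 30
even = average_wait([5, 5, 5])      # 75 / 30

print(round(bunched, 2), round(even, 2))  # 5.7 2.5
```

Both schedules have the same 5 minute average gap, but squaring the gaps means the one long 13 minute gap dominates the result.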

That's much longer than the 2.5 minutes that we calculated using an incorrect simple approach. If buses are severely bunched up, you will have to wait more than 2x longer for a bus than if the buses were evenly spaced.

Alternate Formulation

To check if our formulation is correct, we can try calculating the average wait time for buses a different way and see if we end up with the same formula. 

Suppose that during a certain period of time, buses can come at different times. You can arrive at any point during that time period.

If we want to create a graph of how long you have to wait at the bus stop, it's pretty easy. The wait time is equal to the time until the next bus arrives. So if you happen to arrive 4 minutes before the next bus, then you'll have to wait 4 minutes. If you arrive 2 minutes before the next bus, you'll have to wait 2 minutes.

You'll notice that the graph of wait times creates a similar set of triangles to the graphs we calculated in our previous formulation.

Now that we have a graph of how long we would need to wait depending on when we arrive, we can calculate the average wait time. We can do that using some calculus.

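$$\begin{aligned}
\text{average wait time} &= \frac{1}{b - a} \int_a^b f(t)\,dt \quad \text{where } [a, b] \text{ is the time period you are calculating an average over} \\
&= \frac{1}{\sum_{\text{all buses}} \text{duration}} \sum_{\text{all buses}} \tfrac{1}{2}\,\text{duration}^2
\end{aligned}$$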

Integrating over these triangles is the same as taking the area of the triangles. So according to this formulation, the average wait time for a bus results in the same expression as what we determined from the other formulation.
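For a single gap of length d, the wait time starts at d and falls linearly to 0, so its integral is d²/2, which is just the area of the triangle. If you'd rather not trust the calculus, the integral can also be approximated numerically. The sketch below is my own check, not part of the derivation: it samples arrival times on an evenly spaced grid (a midpoint rule), looks up the time until the next bus, and averages. The function names and the step count are arbitrary choices.

```python
from bisect import bisect_right


def wait_time(t, bus_times):
    """f(t): time from arrival at t until the next bus (bus_times must be sorted)."""
    i = bisect_right(bus_times, t)  # index of the first bus strictly after t
    return bus_times[i] - t


def numeric_average_wait(gaps, steps=15_000):
    """Approximate (1 / (b - a)) * integral of f(t) dt with a midpoint rule."""
    bus_times = []
    elapsed = 0.0
    for d in gaps:
        elapsed += d
        bus_times.append(elapsed)  # cumulative bus arrival times
    dt = elapsed / steps
    total = sum(wait_time((k + 0.5) * dt, bus_times) for k in range(steps))
    return total / steps


approx = numeric_average_wait([13, 1, 1])
exact = (13**2 + 1**2 + 1**2) / (2 * (13 + 1 + 1))
print(round(approx, 3), round(exact, 3))
```

The numerical average should land very close to the 5.7 minutes given by the closed-form expression.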

Conclusion

As we can see, calculating how long you have to wait for a bus does take some care, but the resulting math isn't too burdensome.

Tuesday, March 26, 2024

I was DDOSed and I didn't notice

For the past 15-20 years, I've been running a little website called Programming Basics that teaches programming to kids. It's an ancient, very amateur website that clearly shows its 20-year age too. It has amateur programmer art. It has JavaScript popovers and other archaic HTML stuff. It has PDFs of handouts that teachers can print out. Its main distinguishing characteristic is that it still works after nearly 20 years, and I haven't taken it down for some reason. It's very obscure, and it receives a very small amount of traffic.

And I think someone tried to DDOS it, but I'm not sure.

I host that website on Amazon Web Services (AWS), and when I looked at the bill a few weeks ago, I noticed that the charges seemed elevated. The per month charges have been pretty much the same for a decade, so I thought I must have accidentally left a cloud computer running or something. But after digging through my billing reports, the charges seemed to be caused by unusual web traffic to my Programming Basics website.

Digging into the general usage statistics provided by AWS, it seems I received 126,755,687 requests for the page "/" on February 28. The day before, I only received 209 requests for that page. The "/" page automatically redirects to the "/en/" page. So typically, I should receive a similar number of requests to the "/en/" page as to the "/" page or even a little more since bookmarks and search engines will usually go directly to the "/en/" page. Instead, I received 785,650 requests for the "/en/" page. That's incredibly high, but it's strange that while most of the clients didn't bother following the redirect, some did. The ones that didn't follow the redirect were obviously basic traffic-generating bots that simply generate requests, but why were some bots coded up differently to follow redirects? Accesses to other webpages on the website and accesses on other days seemed fairly reasonable. I wonder why the attack was only for the root page of the website, especially considering that the page was essentially blank? Wouldn't it have been better to access a page with a larger file size? Or a spread of different pages or even non-existent pages? I suppose it doesn't matter since the whole website is a static website anyway.

Digging into hourly usage statistics, it seems that almost all the requests happened in a one hour period:

28.05 GB was used right at 12:00 UTC. The timing is a little odd. I suppose the attack was scheduled in advance to occur right at 12. But why was it so short? Did AWS recognize an attack was occurring and block it? Or did the attackers only purchase a small DDOS attack, so it couldn't be sustained? Or did the attackers realize that they attacked the wrong target or that it was pointless trying to attack a website hosted by Amazon and call it off?

I'm too lazy to download the gigabytes of logs and do a proper analysis, but when I took a look at one or two log files, it seems like the attack happened around 12:50 UTC and lasted only 3-4 minutes, so maybe it could have been manually triggered after all. But if it was manually triggered, maybe the attacker would manually visit the website to verify if the attack was working or not. If so, I could search through the logs, and maybe I could pick out the request that comes directly from the attacker's computer. Of course, maybe the botnet automatically monitors its own effectiveness. That might explain why some of the requests followed the redirect while others did not. The requests that followed the redirect were actually trying to verify the effectiveness of the attack by sending a normal request to the website and measuring its response time.
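To get a feel for the scale, here's a back-of-the-envelope calculation of the request rate. It rests on an assumption I haven't verified against the full logs: that essentially all of the day's 126,755,687 requests for "/" arrived inside that 3-4 minute window.

```python
# Figures quoted from the AWS usage statistics
root_requests = 126_755_687  # requests for "/" on February 28

# Assumption (unverified): nearly all of them arrived during the 3-4 minute attack
window_seconds_short = 3 * 60
window_seconds_long = 4 * 60

rate_high = root_requests / window_seconds_short  # if the attack lasted 3 minutes
rate_low = root_requests / window_seconds_long    # if the attack lasted 4 minutes

print(f"roughly {rate_low:,.0f} to {rate_high:,.0f} requests per second")
```

That works out to somewhere in the range of half a million to several hundred thousand requests per second, compared with the 209 requests for the whole of the previous day.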

AWS provides summary statistics of the country where requests came from. The accesses seem to be spread pretty widely geographically, so it really was a distributed botnet. Here's a breakdown of the top few countries where the accesses came from:

  1. United States 27,235,432
  2. Bulgaria 15,866,047
  3. Turkey 6,927,133
  4. France 6,166,693
  5. Germany 6,141,788
  6. Indonesia 6,115,375
  7. Netherlands 5,347,080
  8. Canada 5,156,815
  9. Australia 4,458,130
  10. China 3,511,069
  11. India 3,093,340
  12. Japan 3,039,478
  13. Vietnam 2,406,960
  14. Brazil 2,321,232
  15. Russian Federation 1,859,764
  16. Iran, Islamic Republic of 1,631,397
  17. Colombia 1,510,684
  18. United Kingdom 1,485,535
  19. Korea, Republic of 1,323,381
  20. Bangladesh 1,219,066
  21. Spain 1,170,194
  22. Thailand 1,081,853
  23. Finland 1,052,344
  24. Ecuador 921,248
  25. Poland 867,562
  26. Argentina 843,783
  27. Ukraine 842,882
  28. Mexico 838,305
  29. Hungary 719,549
  30. South Africa 669,794
  31. Philippines 610,215
  32. Kazakhstan 599,328
  33. Italy 540,665
  34. Luxembourg 532,379
  35. Chile 512,108
  36. Libya 486,914
  37. Venezuela, Bolivarian Republic of 471,266
  38. Singapore 455,583
  39. Ireland 435,134
  40. Peru 396,041
  41. Latvia 357,309
  42. Dominican Republic 353,845
  43. Sri Lanka 314,126
  44. Norway 298,881
  45. Albania 268,222
  46. Myanmar 235,063
  47. ...

I just looked at a few log entries of requests, and it seems like some clients only made a few requests while other clients would submit hundreds of requests, all within the span of a few seconds. Randomly grabbing a few IPs from the logs and doing IP lookups, it looks like the requests were coming from servers in various data centers. I can see Hurricane Electric, Heymman Servers, BelCloud, RK Telecom, Maxnet Telecom, Min Proxy Company: just a lot of servers from all over. I wonder if these are compromised servers or cloud servers bought using stolen credit cards. Is it possible to report these servers to the service providers as being compromised so that they can be taken down and fixed?

So overall, it does seem like I was the victim of a DDOS, but since everything is cloud-hosted, I didn't really notice at all. To be honest, the site is so off my radar, I don't think I would have noticed even if the DDOS really had taken down the website and made it inaccessible. Honestly, I can't fathom why someone would want to run a distributed denial of service against the website. It's a really insignificant site with little traffic and no commercial value, so there can't be any commercial reason to try to knock it off the Internet. I don't think any Internet scammers sent me any threats or extortion messages asking for money to avoid a DDOS. I even looked into my spam folders and didn't see anything there. Maybe I pissed someone off on the Internet and they decided to attack back by hiring a DDOS service, but I don't think I annoyed anyone on February 28th. Perhaps I annoyed someone before then, and they only scheduled the attack for later, but that seems to defeat the point of a DDOS if it just seems random and doesn't cause me to fear that hackers are out to get me. 

So a DDOS attack was made against my website. But it was over in three minutes. I don't know why it ended so quickly. I didn't even know about it until a week later. I don't know why I was attacked. It's all just very confusing and mysterious.