Category Archives: Research

Procrastination post: Erdős Number

While waiting for my combinatorial explosion of a factorial design experiment to complete, I decided to find out what my Erdős number looks like. It is defined as the collaborative distance, via co-authorship of papers, between yourself and the famous mathematician Paul Erdős. Turns out, I have an Erdős number of 4 via the following chain of co-authorship.

/me — Anja Feldmann — Edward G. Coffman, Jr — Joel H. Spencer — Paul Erdős.

You can calculate your Erdős number here.


Questions to ask before conducting a performance evaluation

I’ve been evaluating some popular cloud systems lately and, unsurprisingly, I’m finding it really hard to make a final call on a set of results. Have I tuned the systems I’m comparing appropriately for the workloads I’m subjecting them to? Are the workloads reasonable? Is the system itself the bottleneck, or is it a set of external factors? So many questions have piled up that I’ve become rather skeptical about the system evaluations I find in papers today.

So I’ve picked up this book and found this minimal checklist for avoiding common mistakes when conducting a performance evaluation.

  1. Is the system correctly defined and the goals clearly stated? 
  2. Are the goals stated in an unbiased manner? 
  3. Have all the steps of the analysis been followed systematically? 
  4. Is the problem clearly understood before analyzing it? 
  5. Are the performance metrics relevant for this problem? 
  6. Is the workload correct for this problem? 
  7. Is the evaluation technique appropriate? 
  8. Is the list of parameters that affect performance complete? 
  9. Have all parameters that affect performance been chosen as factors to be varied? 
  10. Is the experimental design efficient in terms of time and results? 
  11. Is the level of detail proper? 
  12. Is the measured data presented with analysis and interpretation? 
  13. Is the analysis statistically correct? 
  14. Has the sensitivity analysis been done? 
  15. Would errors in the input cause an insignificant change in the results? 
  16. Have the outliers in the input or output been treated properly? 
  17. Have the future changes in the system and workload been modeled? 
  18. Has the variance of input been taken into account? 
  19. Has the variance of the results been analyzed? 
  20. Is the analysis easy to explain? 
  21. Is the presentation style suitable for its audience? 
  22. Have the results been presented graphically as much as possible? 
  23. Are the assumptions and limitations of the analysis clearly documented?

Pareto Optimality and Computer Science

I’ve been looking at resource allocation problems within data-centers lately. I think it’s unlikely that someone working in this area does not come across the concept of Pareto optimality, so I thought I’d finally clear the cobwebs off this blog by writing about this.

Pareto optimality is an economics concept developed by Vilfredo Pareto that has important applications in computer science.

“You’re welcome, computer nerds” – Vilfredo Pareto

Simply put, in a Pareto optimal allocation of resources between multiple parties, no single party can be made better off without making at least one other party worse off.

Here’s an example: suppose we have a pie to divide between Alice and Bob. If we cut it into two equal halves and give one to each of them, it’s impossible to improve either person’s share without negatively impacting the other. This is therefore a Pareto optimal outcome.

So is Pareto optimality always desirable? Heck no.

This is because Pareto optimality doesn’t tell you anything about fairness, which is also a very desirable property when it comes to resource allocation. In our Alice and Bob example, consider an allocation where the entire pie is given to Alice and Bob doesn’t get any. Is it fair to Bob? Of course it isn’t. Is it Pareto optimal? Yes, because you cannot increase Bob’s share of the pie without taking an equivalent share away from Alice.

So what exactly is optimal about a Pareto efficient allocation? Put naively, it is an allocation that focuses on utilisation of the available resources. This is quite natural given that the concept was developed within economic theory: in an economic system, it is not desirable for supply to sit under-utilised while there are still demands to be met.

In most practical situations involving computer systems, resources are allocated incrementally between different individuals (as opposed to a one-time allocation). As an example, in an IaaS setup like Amazon EC2, the system needs to provision resources on a continuous basis, keeping up with incoming demands from different tenants. In such a scenario, you’re matching a pool of resources (like compute cores, disk, etc.) to an incoming demand vector from a tenant on the fly. Now, if this tenant’s demand is met such that all other tenants’ allocations are at least as good as they were before, then such an allocation is considered a Pareto improvement.
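Checking whether a new allocation is a Pareto improvement is mechanical once you represent each party's outcome as a number. Here's a minimal Python sketch; the idea that each tenant's situation collapses to a single utility value is my own simplification for illustration, not how EC2 or any real provisioner models it:

```python
def is_pareto_improvement(before, after):
    """Return True if moving from `before` to `after` is a Pareto
    improvement: no party is worse off, and at least one party is
    strictly better off. Both arguments are sequences of utilities,
    one entry per party, where larger is better."""
    if len(before) != len(after):
        raise ValueError("allocations must cover the same parties")
    no_one_worse = all(a >= b for b, a in zip(before, after))
    someone_better = any(a > b for b, a in zip(before, after))
    return no_one_worse and someone_better
```

For example, giving a newly arrived tenant idle capacity while leaving everyone else untouched passes the check, whereas reclaiming resources from one tenant to serve another does not.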

Like Pareto optimality, the notion of a Pareto improvement is also slightly weak. For instance, making a few allocations that aren’t Pareto improvements right now may allow the system to reach a much more desirable state later.

Going back to our pie example, assume Alice has been allocated the entire pie. If the pie is big enough, it’s unlikely that Alice could eat all of it in one sitting without falling sick. Now, if we take that extra pie from Alice and give it to Bob, it isn’t a Pareto improvement, because Bob is made better off at the immediate cost of Alice; but Alice is eventually better off by not falling sick. This is a much more desirable outcome, and it is also Pareto optimal. Similarly, your operating system’s scheduler may have to slow down your massive compilation job if you want to be able to watch a YouTube video at the same time.

The set of Pareto optimal choices in a system is called the Pareto frontier. Given a finite set of data points, one can compute the Pareto frontier using skyline query algorithms.
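For a small finite set of points, the frontier can be computed with a naive skyline pass that keeps every point not dominated by another. This is an O(n²) Python sketch assuming larger values are better in every dimension; production skyline algorithms (e.g. block-nested-loops or divide-and-conquer variants) are considerably more efficient:

```python
def pareto_frontier(points):
    """Return the points not dominated by any other point.
    `points` is a list of equal-length tuples; larger is better
    in every dimension."""
    def dominates(p, q):
        # p dominates q if p is at least as good everywhere
        # and strictly better somewhere.
        return (all(a >= b for a, b in zip(p, q))
                and any(a > b for a, b in zip(p, q)))
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

With points like (1, 2), (2, 1), (1, 1), and (0, 0), only the first two survive: each of the last two is dominated by some other point.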

I’ll stop here for now because my stomach’s expressing its disgruntlement at being denied food for a longer period than usual. In a future post, I’ll talk about Pareto optimality and its relation to Nash Equilibria.

Papers papers everywhere

Last month, I officially began my PhD research. Not surprisingly, the difference from any prior research I’ve done is quite evident, primarily because there is really no horizon at all when you embark on a PhD.

The difficulty here essentially boils down to finding an answer to the following question: What problem is worth spending the next five years of your life trying to solve?

I already have an idea of the broad area that I’d like to look into. Digging deeper and deciding which boundary to push at, however, is the tricky part.

The workflow here is as follows:

1) Read a paper. Creativity and optimism will immediately kick in, pointing you to potential opportunities and gaps, causing a rainbow of ideas to explode out of your brain. You shiver with excitement.

2) Read another paper only to realise someone else has already done all of that and a little more.

3) Repeat.

Let’s see how long it takes before I finally break from step 1.

I’m not sure how I should end this post, so here’s a picture of my desk.

Many trees have died for the greater good here