https://www.henrik.org/

Blog


Tuesday, March 7, 2023

Building an online service on a shoestring budget

Photo by Josh Appel on Unsplash
Although I have been working professionally as a software engineer since I was 18 years old, I have always had hobby projects on the side, and I take a somewhat perverse pleasure in figuring out how to build and launch these things on as small a budget as possible. This post goes through some of the things I have found that have helped me be productive and successfully build and launch several hobby projects.

I am assuming that these are hobby projects and that the skill and time of the participants are free. If you are paying any salaries, that will dwarf anything you might save by aggressively using free tiers of online services. I am also assuming your team is small (fewer than five people).

What not to skimp on

First, let us go over the things you should not skimp on. The most important one is to not use any equipment or software from your day job, because if you do, your employer can usually claim ownership of any IP produced with their equipment. Also check your employment contract to make sure your employer doesn't have a clause claiming ownership of anything you do. That said, if you live in California, such a clause is not enforceable as long as you don't use company equipment, time, or IP and you are not directly competing with your employer (see Labor Code Section 2870 for details).

Another thing I would advise is to enroll in school if you are not already. Being enrolled in a community college only costs a few hundred dollars a year and will provide you with free licenses to a huge number of software development tools. Telerik, IntelliJ, Autodesk, and many more give students a free non-commercial license to almost their entire catalog of tools and libraries. Granted, once you get to the launch stage you will need to buy real licenses for your tools, but it will still save you tons of money in the development phase. You might even learn something doing it.

Basic development tools

I believe that if code isn't checked into a source repository with change tracking it basically doesn't exist at all. So, the first thing to do when starting a project is to pick a source code repository. GitHub is the giant in the field and they are fantastic. Not only do they give you free private repositories they also give you 2000 minutes a month of build executions (GitHub Actions). If you are building open-source applications you even get unlimited build executions for free.

Next, you probably want to choose a cloud provider. I would pick either AWS or Azure. If you can go serverless, I would go with AWS since they have a perpetual free tier for everything you need to launch a serverless service. If not, then Azure BizSpark is a great program if you qualify. AWS also has a program that gives you $300 to spend getting your prototype ready. Another tip for getting started on AWS is to create a new account for each new project, because there is an additional massive free tier that only lasts for one year after opening the account. It is also generally best practice to run only one microservice per account. Once the freebies are over you can tie your accounts together using AWS Organizations and SSO to help you keep track of them all (doing this will usually invalidate the free tiers, so wait a year after account creation).

You will also likely need a web UI testing tool. I use Cypress, which has a free tier and is overall very good. They only allow 500 test suites per month, so you can't run canaries in the free tier, but it should be sufficient for deployment-based testing. They also provide a dashboard where you can see which tests succeeded and failed, with videos of the test execution so you can easily troubleshoot failures; this is very useful when you integrate it into your CI/CD pipeline.

How to build your software

The key thing to avoid when launching something on the cheap is fixed infrastructure. If possible, use serverless functions instead of hosts or containers to run your code. With some thought, almost everything you build can run in a true pay-per-use manner. For instance, on AWS you should aim to use API Gateway, Lambda, SQS, and DynamoDB. As your service scales you might consider moving off some of these for cost reasons, but done right, these primitives can also scale to thousands of transactions per second without any infrastructure changes, and none of them has a fixed cost. You generally want to avoid services such as Kinesis, ElastiCache, OpenSearch, relational databases, hosts, or containers, since these all come with minimum fixed costs even when your service has no usage.
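As an illustration of the pay-per-use approach, here is a minimal sketch of what a Lambda function behind an API Gateway proxy integration might look like in Python. The route, field names, and table access are assumptions for illustration; the DynamoDB call is left out so the sketch stays self-contained.

```python
import json

def lambda_handler(event, context):
    """Hypothetical handler for an API Gateway proxy integration.

    It only runs while a request is in flight, so it incurs no
    fixed infrastructure cost when the service is idle."""
    # Pull a path parameter out of the proxy event (shape per the
    # API Gateway proxy integration contract).
    item_id = (event.get("pathParameters") or {}).get("id", "unknown")

    # A real service would read or write the item in DynamoDB here
    # (e.g. via boto3); omitted to keep the sketch self-contained.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"id": item_id}),
    }
```

Because the handler is a plain function, it is also trivial to unit test locally before deploying, which keeps your CI/CD pipeline cheap as well.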

Useful services with good free tiers

Here are a couple of other services worth noting with useful features and good free tiers.

  • Google Analytics is ubiquitous for site analytics, although it is having privacy issues in the EU, with several countries recently declaring it illegal. Another option I use, with more of a privacy focus, is Clicky.
  • Also useful from Google is Firebase which provides a lot of features such as a basic user database, usage analytics, and monitoring among others. It is a great choice if your primary use case is a mobile app. It is pretty inflexible for building complex applications or services though and you probably want to go with a normal cloud provider for that.
  • Cloudflare is a Web Application Firewall with a very useful free tier. They also provide a privacy-focused and less annoying CAPTCHA service called Turnstile.
  • Blogger is a free blogging platform. It will generally not let you build your entire website the way WordPress will, but if all you need is blogging it does that well and allows you to use custom domains for free.
  • Crisp is a great platform for providing support for your site and they have a nice free tier for getting started.
  • Auth0 provides a platform for helping you do auth of your users and has a decent free tier to get you started.
  • Most of the payment processors such as Square, Stripe, and Braintree only charge a percentage with no setup costs. Their fees are very similar; I prefer Stripe myself, mostly because they have fantastic developer documentation.

Launching and running a service

When first starting out, I tend not to think too much about schedules and deliverables. I do this for fun, and the best way to kill the fun is to make yourself a slave to delivery commitments and launch dates. That said, as you get closer to launch you really do need a way to keep track of remaining tasks, open bugs, and so on. In my opinion, Jira from Atlassian is by far the best and most comprehensive tool for this, and as long as you have a small team everything you need is available for free.

You will need monitoring of your service before you go live. Both AWS and Azure have built-in monitoring tools, and they work well. Also worth mentioning again in this space is Firebase, which has some monitoring and analytics capabilities. Another service in this area with a good free tier is New Relic. One thing that neither AWS nor Azure has is paging for when things actually go wrong. The tool I found here with a very functional free tier is PagerDuty; that said, you will likely want to upgrade from the free tier fairly soon as your service takes off, to get more control over your escalations.

Your service will likely need a single place to aggregate everything that is going on, such as task completions, deployments, and any issues, and here Slack is hard to beat and has a great free tier.

Be frugal, not cheap

As a parting word, although it is worth figuring out how to build and launch your service cheaply, don't let that stand in the way of building your service right. Never pick the cheap option over the correct option; you will always regret it in the end.

For me, one of the main benefits of building things frugally on hobby projects is that it allows me to have fun with them longer, because I don't have the pressure of needing to launch fast while bleeding money during the development phase.

Being frugal during the development phase might also allow you to retain a larger portion of your equity if you actually launch your service, because it reduces the amount of help you need to get started before you have a customer base. As an example, one of my previous projects, Your Shared Secret, literally has $0 per month of fixed cost. My more recent project Underscore Backup is not quite that cheap, but has a fixed cost of less than $50 per month, most of which is for CloudWatch alarms, KMS keys, and dashboards.

Tuesday, July 14, 2015

How to get the most out of your BizSpark Azure credits

BizSpark is arguably one of the best deals on the internet for startups. For me, the key benefit is the 5 x $150 per month of free Azure credits. That said, they are a little bit tricky to claim.

The first thing you need to do is claim all your BizSpark accounts and then, from each of those accounts, claim your Azure credits. This blog post describes the process, so start by doing that.

After doing this you have 5 separate Azure accounts, each with $150 per month of usage. However, what we want is one Azure account where we can see services from all of these subscriptions at once, and that requires a couple more hoops to jump through. In the end you will have one account where you can see and create services from all 5 subscriptions without having to log in and out of the Azure management portal to switch between them.

  1. The first step is to pick the one account you want to use to administrate all the other accounts.
  2. This is a bit counterintuitive, but you need to start by adding every other account as a co-administrator to the account from the first step. Yes, I am saying this correctly: all the other accounts need to be added as administrators to the main admin account (don't worry, this is temporary).
  3. The following steps need to be done for each of the accounts except for the main account from step 1.
    1. Log into the management console using one of the four auxiliary accounts and go to settings.
    2. Make sure you are on the subscription tab.
    3. Select the subscription that belongs to the account you are currently logged into. It will be the one that has the account administrator set to the account you are currently logged into. If you have done this correctly you should see two different subscriptions: one for the account you are logged in as and one from the account in step 1.
    4. Click the Edit Directory button at the bottom.
    5. Make sure you select the directory of the main account from step 1. It shouldn't be hard because it will be the only account in the list and pre-selected. If you have already set up any co-administrators on the account you will be warned that they will all be removed.
    6. Add the account from step 1 as a co-administrator to this account, as described in the article linked at the top of the post.
    7. The last step is optional, but all the subscriptions will be called BizSpark and hard to tell apart, so you might want to rename them.
      1. To do this, go to the Azure account portal at https://account.windowsazure.com/Subscriptions. This page tends to be very slow, so be patient following links.
      2. Click on the subscription name. Your screen might look different depending on how many subscriptions you have.
      3. Click on the Edit Subscription Details.
      4. Enter the new name in the dialog presented. You can also optionally change the administrator to the account from step 1, which will remove the owning account as an administrator altogether (although it is still responsible for billing).
  4. You can now remove the other accounts, which you added in step 2, from being administrators of the main account if you want.

If you follow all these steps, when you log into the account from step 1 you should be able to see all of your subscriptions at the same time in the Azure management console.

Keep in mind this does not mean that you have $750 to spend as you want. Each subscription still has a separate limit of $150, and you have to puzzle together your services as you create them to keep all 5 limits from running out, but at least this way you have a much better overview of what services you have provisioned in one place.

Thursday, July 9, 2015

Algorithm for distributed load balancing of batch processing

Just for reference, this algorithm doesn't work in practice. The problem is that nodes under heavy load tend to be too slow to respond in time to hold on to their leases, causing partitions to jump between hosts. I have moved on to another algorithm that I might write up at some point if I get time. Just a fair warning to anybody who was thinking of implementing this.

I recently played around a little bit with the Azure EventHub managed service, which promises high-throughput event processing at relatively low cost. At first it seems relatively easy to use in a distributed manner through the EventProcessorHost class, and that is what all the online examples provided by Microsoft use too.

My experience is that the EventProcessorHost is basically useless. Not only does it lack any provision that I have found for a retry policy to make its API calls fault tolerant, it is also designed to checkpoint its progress at relatively long intervals, meaning that you have to design your application to work properly even if events are reprocessed (which is what will happen after a catastrophic failure). Worse, once you fire up more than one processing node it simply falls all over itself, and almost no processing happens.

So if you want to use the EventHub managed service in any serious way, you need to code directly against the EventHubClient interface, which means that you have to figure out your own way of distributing its partitions over the available nodes.

This leads to an interesting problem: how do you evenly balance a workload over a number of nodes (in the nomenclature below, the work is split into one or more partitions) when any node can at any time have a catastrophic failure and stop processing, and there is no central orchestrator?

Furthermore, I want the behavior that if the load is completely evenly distributed between the nodes, the partitions should be sticky, meaning that the partitions of work currently allocated to a node should stay allocated to that node.

The algorithm I have come up with requires a Redis cache to handle the orchestration, using only two hash keys and two subscriptions. Any key-value store that provides publish and subscribe functionality should do, though.

The algorithm has five time spans that are important.

  • Normal lease time. I'm using 60 seconds for this. It is the normal time a partition will be leased without generally being challenged.
  • Maximum lease time. Must be significantly longer than the normal lease time.
  • Maximum shutdown time. The maximum time a processor has to shut down after it has lost a lease on a partition.
  • Minimum lease grab time. Must be less than the normal lease time.
  • Current leases held delay. Should be relatively short; a second should be plenty (I generally operate in the 100 to 500 millisecond range). This is multiplied by the number of partitions the node is currently processing. It can't be too low though, or you will run into scheduler-based jitter with partitions jumping between nodes.
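These time spans and their ordering constraints can be captured in a small configuration object. Only the 60-second normal lease comes from the text; the other concrete values here are illustrative assumptions to be tuned.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LeaseTiming:
    """The five time spans of the algorithm, in seconds.

    Only the 60-second normal lease is from the text; the rest
    are illustrative values."""
    normal_lease: float = 60.0      # normal unchallenged lease length
    maximum_lease: float = 180.0    # must be significantly longer than normal_lease
    maximum_shutdown: float = 20.0  # grace period to stop after losing a lease
    minimum_grab: float = 30.0      # must be less than normal_lease
    held_delay: float = 0.25        # per-held-partition wait (100-500 ms range)

    def validate(self) -> None:
        # Enforce the ordering constraints called out in the text.
        assert self.maximum_lease > self.normal_lease
        assert self.minimum_grab < self.normal_lease
        assert 0 < self.held_delay < self.normal_lease
```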

Each node should also listen to two Redis subscriptions (basically notifications to all subscribers). Each notification contains the partition being affected.

  • Grab lease subscription. Used to signal that the lease of a partition is being challenged.
  • Allocated lease subscription. Used to signal that the lease of a partition has ended when somebody is waiting to start processing it.

There are also two hash keys used to keep track of things. Each contains a hash field per partition holding the name of the node that currently owns it.

  • Lease allocation. Contains which node is currently actually processing which partition.
  • Lease grab. Used to race and indicate which node won a challenge to take over processing of a partition.
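The lease grab hash is what makes the race in step 3 below safe: the winner is decided by an atomic set-if-not-set (HSETNX in Redis). A minimal in-memory stand-in for that semantic might look like this:

```python
class LeaseGrabTable:
    """In-memory stand-in for the lease grab hash key.

    A Redis implementation would use HSETNX so that, when several
    nodes race after their step 2 delays, exactly one wins."""

    def __init__(self):
        self._winner = {}  # partition -> node that won the current challenge

    def try_grab(self, partition, node):
        # Atomic set-if-not-set: only the first caller for a given
        # partition succeeds, mirroring Redis HSETNX semantics.
        if partition in self._winner:
            return False
        self._winner[partition] = node
        return True

    def clear(self, partition):
        # Step 6: after the minimum lease grab time, remove the winning
        # indication so the lease can be challenged again.
        self._winner.pop(partition, None)
```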

This is the general algorithm.

  1. Once per normal lease time, each node will send out a grab lease subscription notification for each partition that:
    • It does not yet own and which does not currently have any value set in the lease grab hash key, or
    • Has not had a lease grab signaled for more than the maximum lease time (this is required for the case when a node dies after step 3 but before step 6 has completed). If this happens, also clear the lease allocation and lease grab hashes for the partition before raising the notification, since it indicates that a node has gone offline without cleaning up.
  2. Upon receipt of this notification the timer for this publication is reset (so generally only one publication per partition will be sent during the normal lease time, though it can happen twice if two nodes send them out at the same time). When this notification is received, each node will also wait based on the following formula.
    • If the node is already processing the partition, it will wait the number of partitions it currently holds times the current leases held delay, minus half of that delay (so basically (locally active partitions - 0.5) * current leases held delay).
    • If the node is not currently processing the partition being grabbed, it should wait the number of partitions it currently holds times the current leases held delay, plus half of that delay (in other words (locally active partitions + 0.5) * current leases held delay).
  3. Once the delay is done, try to set the lease grab hash field for the partition, using a conditional transaction that requires it to not already be set.
    • Generally the node with the lowest delay from step 2 will win, which also means that partitions distribute evenly among the active nodes, since the more partitions an individual node holds, the longer it waits in step 2 and the less likely it is to win the race to own the lease.
    • If a node is currently processing the partition but did not win the race, it should immediately signal its processor for that partition to gracefully shut down, then remove the lease allocation hash field for the partition and publish the allocated lease subscription notification. After that, this node should skip the rest of the steps.
  4. Check, by reading the lease allocation hash value, whether a node other than the winner in step 3 is currently busy processing the partition. If so, wait for the allocated lease subscription notification signaling that the other node has finished shutting down, or, if that does not arrive, wait for at most the maximum shutdown time and start processing the partition anyway.
  5. Mark the lease allocation hash with the node that is now processing this partition.
  6. After the minimum lease grab time, remove the winning indication in the lease grab hash field for the partition so that the lease can be challenged again from step 1.
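The delays in step 2 can be sketched as a pure function. This assumes the intended formulas are (locally active partitions ± 0.5) times the current leases held delay, which is one reading that reconciles the prose: the current owner always waits less than a challenger holding the same number of partitions (which is what makes even distributions sticky), while a node holding at least two fewer partitions waits less than the owner and takes the work over.

```python
def grab_delay(locally_active, owns_partition, held_delay=0.25):
    """Delay in seconds before a node races to grab a challenged lease.

    locally_active: partitions this node is currently processing.
    owns_partition: whether this node holds the challenged partition.
    held_delay: the 'current leases held delay' (250 ms assumed here).
    The +/- 0.5 offsets are an assumed reading of the text."""
    offset = -0.5 if owns_partition else 0.5
    return (locally_active + offset) * held_delay
```

With these numbers an owner holding three partitions waits 0.625 s, while an idle challenger waits only 0.125 s, so under-loaded nodes win the race and the load spreads out.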

When I run this algorithm in my tests it works exactly as I want. Once a new node comes online, the workload is distributed evenly among the new and old nodes within the normal lease time. Another important test: if you have only one partition, it does not skip among the nodes but lands squarely on one node and stays there. And finally, if I kill a node without giving it any chance to clean up, after roughly the maximum lease time its load is distributed out to the remaining nodes.

This algorithm does not in any way handle the case where the load on the different partitions is not uniform. In that case you could relatively easily tweak the formula in step 2 above and replace the locally active partition count with whatever measurement of load or performed work you wish, although it will be tricky to keep the algorithm sticky with these changes.