UUID as Primary Key with Phoenix LiveView Authentication

Phoenix ships with a powerful generator for adding authentication to a project. The mix phx.gen.auth command generates a complete user authentication system with registration, login, email confirmation, and password reset. By default, however, it uses bigserial as the primary key. A sequence has drawbacks: it is a predictable counter, so anyone can guess valid IDs and enumerate all the users of the service. For this reason, we sometimes prefer to use a UUID as the primary key.

Today, I’ll go through how to change the default Phoenix behavior of having bigserial as the primary key of the user table. If you don’t have Phoenix installed, check out the official documentation for how to install it.

Let’s generate a new Phoenix application to test things out:

mix phx.new demoUUIDAuth

The command generates a new Phoenix application with PostgreSQL as the database. Now let's change the DB config in config/dev.exs so that Phoenix can connect to our locally running database. After that, we can generate the LiveView authentication with the following command:

mix phx.gen.auth Accounts User users --hashing-lib argon2

The --hashing-lib argon2 flag tells Phoenix which hashing function to use for password hashing. The available options are bcrypt, argon2, and pbkdf2; by default, it uses bcrypt. For more information, check out the official documentation. After running the command, you'll be prompted to fetch the dependencies and run the migration. Let's fetch the dependencies but hold off on running the migration, because we need to change a few things to make UUID the primary key of our users table. Run mix deps.get, and you'll see it pulls in the argon2_elixir and comeonin packages.
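
For reference, when --hashing-lib argon2 is chosen, the generator adds the hashing dependency to mix.exs for you; it should look roughly like the following sketch (the exact version requirement may differ in your project):

  # Added to mix.exs by mix phx.gen.auth when --hashing-lib argon2 is used;
  # comeonin is pulled in as a transitive dependency of argon2_elixir.
  defp deps do
    [
      {:argon2_elixir, "~> 3.0"},
      # ...the rest of the generated dependencies
    ]
  end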

Now that we have the packages, we can change the schema to tell Ecto that we want a different type of primary key for our users table (Ecto acts as the data mapper and database adapter here). We can start using UUID as the primary key for the users table by changing lib/demoUUIDAuth/accounts/user.ex as follows:

+   @primary_key {:id, :binary_id, autogenerate: true}
+   @derive {Phoenix.Param, key: :id}
    schema "users" do
      field :email, :string
      field :password, :string, virtual: true, redact: true
      field :hashed_password, :string, redact: true
      field :confirmed_at, :naive_datetime

      timestamps(type: :utc_datetime)
    end

By adding the @primary_key attribute, we're telling our application that the schema's primary key is named id, is of type :binary_id, and is autogenerated. Since Ecto v2, autogenerated primary keys are produced either by a database function or by the adapter; in other words, the ID is either generated by the database itself or generated in the adapter layer and sent to the database along with the insert.
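
As a quick illustration of the adapter-side case: with the Postgres adapter, the :binary_id type maps to Ecto.UUID, and you can generate such a value yourself in IEx:

iex> Ecto.UUID.generate()
#=> a random version-4 UUID string, e.g. "7d715492-3360-4a74-820e-2d9f17d772bd"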

Now, we have to change the users table definition in the migration file. Open your migration file at priv/repo/migrations/20240224203145_create_users_auth_tables.exs and change the following:

-   create table(:users) do
+   create table(:users, primary_key: false) do
+     add :id, :uuid, primary_key: true
      add :email, :citext, null: false
      add :hashed_password, :string, null: false
      add :confirmed_at, :naive_datetime
      timestamps(type: :utc_datetime)
    end
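
As a side note, if you'd rather have the database generate the IDs itself (the database-function path mentioned earlier), PostgreSQL 13+ ships gen_random_uuid() built in, and the id column could carry a default instead. A minimal sketch of just that column, assuming a recent PostgreSQL:

      add :id, :uuid, primary_key: true, default: fragment("gen_random_uuid()")

With the adapter-side autogenerate: true approach used in this article, no database default is needed, so we'll stick with the plain column above.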

This change would have been enough if the users_tokens table didn't reference the users table. But users_tokens references users.id through its user_id column, so we also have to change the type of user_id. Since we're already in the migration file, let's change the user_id field as follows:

-   add :user_id, references(:users, on_delete: :delete_all), null: false
+   add :user_id, references(:users, type: :uuid, on_delete: :delete_all), null: false

Basically, we’re saying the type on the reference. But this is not enough, since our schema of user_tokens still doesn’t know the type of user_id field. We have to do the following changes in the lib/taskchecklist/accounts/user_token.ex file:

   schema "users_tokens" do
     field :token, :binary
     field :context, :string
     field :sent_to, :string
 -   belongs_to :user, DemoUUIDAuth.Accounts.User
 +   belongs_to :user, DemoUUIDAuth.Accounts.User, type: :binary_id

     timestamps(updated_at: false)
   end
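
As an aside, Ecto also lets you set the default foreign-key type once per schema with the @foreign_key_type module attribute instead of annotating each belongs_to. A minimal sketch of the same schema written that way (equivalent to the explicit type: :binary_id option above):

   # Same effect as passing `type: :binary_id` to belongs_to below.
   @foreign_key_type :binary_id

   schema "users_tokens" do
     field :token, :binary
     field :context, :string
     field :sent_to, :string
     belongs_to :user, DemoUUIDAuth.Accounts.User

     timestamps(updated_at: false)
   end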

Now, we’re done with our changes and ready to run the mix ecto.migrate / mix ecto.setup command to run our database migrations. After the successful run of the migration we can start the Phoenix app with the following command:

mix phx.server

This starts our Phoenix application on localhost:4000 by default. To register a new user, go to http://localhost:4000/users/register and create an account. If you look at the logs, you'll see that the user's id is a UUID. You can also verify this via IEx by running the following commands:

$ iex -S mix
iex(1)> DemoUUIDAuth.Repo.all(DemoUUIDAuth.Accounts.User)
[debug] QUERY OK source="users" db=5.3ms decode=1.1ms queue=2.6ms idle=1240.8ms
SELECT u0."id", u0."email", u0."hashed_password", u0."confirmed_at", u0."inserted_at", u0."updated_at" FROM "users" AS u0 []
↳ :elixir.eval_external_handler/3, at: src/elixir.erl:396
[
  #DemoUUIDAuth.Accounts.User<
    __meta__: #Ecto.Schema.Metadata<:loaded, "users">,
    id: "7d715492-3360-4a74-820e-2d9f17d772bd",
    email: "[email protected]",
    confirmed_at: nil,
    inserted_at: ~U[2024-02-24 21:26:12Z],
    updated_at: ~U[2024-02-24 21:26:12Z],
    ...
  >
]
iex(2)>

Congratulations! You've successfully changed the primary key of the users table from bigserial to UUID. This might feel like a long process, but it's not that bad considering I've included all the steps for generating the authentication in the first place; the actual change from bigserial to UUID only touches three files. You can check out the repository for this article on GitHub.

If you have any questions, feel free to comment on this post, or reach out to me via LinkedIn or Email ([email protected]). Thanks for reading!

Debugging Nightmare: A Go Service That Worked Fine For Months Till It Didn’t

So, one Thursday afternoon, just before we were leaving the office, a colleague told me he was getting errors from a service I was responsible for. I told him that the last change to the service had been made two months earlier and that it had been deployed as-is, without further modification. After checking that the service was working fine in the staging environment, we blamed it on the instability of the dev-test environment and assumed everything would be fine when we came back to the office after the weekend.

On Sunday, I went to the office a bit late and was welcomed by people trying to figure out why the service wasn't working. I was surprised, since no changes had been made to the service that could cause it to stop working. The worst part: it was working fine in the staging and production clusters.

The service was dockerized and running inside a Kubernetes cluster. The same Docker image was used in all three environments: dev-test, staging, and production. Since the image ran fine on staging, prod, and locally, we were all puzzled. Clearly something was wrong with the dev environment, but we had no clue what. I looked at the service's logs and was greeted with the following panic message:

SIGILL: illegal instruction
PC=0xda1128 m=5 sigcode=2
signal arrived during cgo execution
instruction bytes: 0xc4 0xc2 0xe9 0xf7 0xd2 0x48 0x1 0xd8 0x48 0x8b 0x78 0x30 0x48 0x85 0xfa 0x75 

This made zero sense: SIGILL: illegal instruction typically means the CPU doesn't support an instruction the binary is trying to execute. But this service had run fine for months on the dev cluster without any issues, so the CPU not supporting the instruction couldn't be the case, right?

At this point, some background on the service we're discussing: it's an image-processing service that I had recently rewritten from Python to Go, mostly for performance gains. That meant the Go binary relied on some C libraries and used cgo heavily. cgo can be fragile at times, but since the same image ran fine in every other environment, I ruled that out as a possibility. Still, I wanted to know exactly which function caused the program to panic beyond recovery. So I added the Delve debugger to the Docker image and asked our DevOps engineer to deploy the service to the dev cluster with Delve attached. Connecting to the Delve session wasn't much help: after a few calls the binary crossed over into the C library, and Delve couldn't trace the calls any further. A whole day was spent on this, trying to figure out which function was to blame.

The next day, one of my colleagues suggested that instead of trying to find which function caused this, we should find which instruction caused the SIGILL. A brilliant idea! I looked back at the panic message, and the instruction bytes were right there in front of me. The only problem was that they were in hex, so I googled and found a site that could disassemble x64 bytes into assembly. Putting the instruction bytes c4 c2 e9 f7 d2 48 01 d8 48 8b 78 30 48 85 fa 75 from the panic message into the site's disassembler resulted in the following:

0:  c4 c2 e9 f7 d2          shlx   edx,edx,edx
5:  48                      dec    eax
6:  01 d8                   add    eax,ebx
8:  48                      dec    eax
9:  8b 78 30                mov    edi,DWORD PTR [eax+0x30]
c:  48                      dec    eax
d:  85 fa                   test   edx,edi
f:  75                      .byte 0x75

The SHLX instruction immediately jumped out at us as something special. A quick Google search showed that SHLX requires a CPU feature called BMI2, but according to Wikipedia and Intel's ARK database, a CPU would have to be roughly ten years old not to have it. "This can't be the case, right?", I asked myself. To be 100% sure, I exec'd into the container running the service in the dev cluster and ran lscpu | grep -i bmi2, and to my surprise, BMI2 was missing! I did the same for the staging service, and BMI2 was present there. I wanted to check prod too but didn't have access, lol. I thought maybe the BMI2 CPUID feature flag had been disabled at the VM or Kubernetes level, so I notified my manager and the DevOps lead about my findings regarding the service failure and went home.

The next day, the DevOps lead let us know that only one of the three servers running the dev cluster supported BMI2; the other two were simply too old to have the feature. That means we had just been lucky that the service, when first deployed, was scheduled onto the server with BMI2 support. After a storage upgrade, the servers were restarted and the service landed on a server without BMI2. Hearing this, I laughed and said, "Dockerize once, run everywhere. Such a lie." I already knew this from migrating from an Intel Mac to an M1 Mac, and I also knew that in some special cases the Linux kernel version of every host machine has to match for containers to behave properly, since Docker containers share the kernel with the host.

So, in the end, the fix was to make sure the service was only scheduled onto the server with BMI2 support, and later to decommission the really old servers. Which we did.

That’s all for today! Thanks for reading and making it to the end. Not sure how much useful information was in the blog post. Anyways, you can reach out to me via [email protected].

How Does OCR Work: In Simple Terms

OCR stands for Optical Character Recognition, which is a technology used to convert scanned images, PDFs, and other documents into editable and searchable text. To achieve the desired results, an OCR system has to perform a few steps:

  1. Pre-processing: The first step in OCR is to prepare the image or document for analysis. This may include cropping the image to remove any unnecessary background, adjusting the brightness and contrast to make the text more legible, and rotating the image to the correct orientation.
  2. Segmentation: The next step is to divide the image into small segments, usually called “blobs” or “regions,” that contain individual characters or words. This is done by analyzing the image and identifying areas that are likely to contain text based on factors such as color, texture, and size.
  3. Feature extraction: Once the image has been segmented, the next step is to extract features from each segment. These features are text characteristics that can be used to identify the characters or words. Standard features include the shape of the text, the spacing between characters, and the relative position of the text within the segment.
  4. Recognition: This step is where the OCR software compares the features of the segmented text to a database of known characters or words. The software assigns a probability to each character or word that it recognizes and uses this information to determine the most likely match.
  5. Post-processing: After the text has been recognized, the final step is to clean up the output and correct any errors. This may include fixing any spelling mistakes, removing any unwanted characters, and formatting the text to make it more readable.
  6. Output: The OCR software outputs the recognized text as an editable document, which can be saved in various formats such as TXT, DOC, or PDF. The recognized text can then be used in applications such as search engines, machine learning, and data analytics.

The steps above might sound pretty simple, but there are many ways of doing each of them, with varying performance and results. Plenty of open-source and enterprise solutions exist. The most popular open-source OCR project is Tesseract, but PaddleOCR is gaining popularity too and beats Tesseract in some respects, such as reading text that isn't in the correct orientation or extracting tables from images.
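
If you just want to see the end-to-end flow without implementing any of the steps yourself, here is a minimal Elixir sketch of shelling out to the Tesseract CLI; it assumes the tesseract binary is installed and that sample.png is a scanned page on disk:

# Let Tesseract run the whole pipeline (pre-processing through output)
# and print the recognized text; "stdout" tells it to write to standard output.
{text, 0} = System.cmd("tesseract", ["sample.png", "stdout"])
IO.puts(text)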

In the coming weeks, I’ll try to write more about the individual steps in more detail.

Thanks for reading!