Mauricio Poppe notes

Tmux to Zellij

Sat, 21 Jun 2025 14:00:00 +0000

Requirements

My requirements for a terminal multiplexer:

Organization of workspaces using sessions
- Quickly create vertical and horizontal panes
- Seamless movement between my editor and my terminals
- Seamless integration with neovim
Switch sessions effectively
- Single keybind to switch between sessions
- Use a list of known workspaces as input to start/switch sessions

Learning Zellij

Zellij introduces modes similar to vi, where each mode has its own separate keybindings. For more info about the modes, read https://zellij.dev/documentation/keybindings-modes.html

I mapped Ctrl + Space to switch from Normal mode to Tmux mode and back.

Organization of workspaces using sessions

I mapped Ctrl + Space+- to create a horizontal pane and Ctrl + Space+\ to create a vertical pane.

Because of the different modes that Zellij has, I also use the zellij-autolock plugin to provide a single keybind combination to move across panes. While moving across panes is possible in Zellij without plugins, the plugin is needed for Zellij to be aware that it should switch modes when entering a pane running a program.

My zellij-autolock setup is very similar to the one in the repo.

Neovim needs to be aware of the plugin. Fortunately, the same author created zellij.vim, which I included through my preferred package manager.

Single keybind launcher to switch sessions

Demo of switching sessions with Zellij

I want a system that helps me find my preferred session to launch or to switch to. The sessions to display are my preferred list of sessions and the currently opened sessions. After I make the selection in the fuzzy finder, I want to switch to that session.

With tmux, I have this setup with this one-liner:

# This is a simplified version of my setup, it doesn't run as it is.
( cat $bookmarks && tmux ls ) | fzf --tmux | xargs tmux switch-client -t

Zellij doesn’t have a subcommand similar to tmux switch-client. There’s this reddit thread where Zellij’s author mentions that the way to do this is with a plugin. Fortunately, the plugin zellij-switch already does this.

# This is a simplified version of my setup, it doesn't run as it is.
( cat $bookmarks && zellij list-sessions -n ) | fzf | \
  xargs -I {} zellij pipe --plugin https://github.com/mostafaqanbaryan/zellij-switch/releases/download/0.2.1/zellij-switch.wasm \
  -- "session $(basename {}) --cwd {} --layout default"

I map the above to the keybinding Ctrl + Space+Ctrl J with the following zellij config.

        bind "Ctrl j" {
            SwitchToMode "normal";
            Run "zellij-switch-session" {
                direction "Down";
                close_on_exit true;
            }
        }

zellij-switch-session is a bash script that wraps the zellij one-liner above.
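As a rough illustration of what such a wrapper has to compute for each selection, here is a minimal Python sketch. The function name build_switch_command is hypothetical, and the payload format ("session <name> --cwd <path> --layout default") is simply copied from the simplified one-liner above, so it may not match what a given zellij-switch release expects. It only builds the argument list and prints it; it doesn't run zellij.

```python
import os
import shlex

# Plugin URL taken verbatim from the one-liner in this post.
PLUGIN_URL = (
    "https://github.com/mostafaqanbaryan/zellij-switch/"
    "releases/download/0.2.1/zellij-switch.wasm"
)


def build_switch_command(path: str, layout: str = "default") -> list[str]:
    """Return the argv for `zellij pipe` for the item selected in fzf.

    Assumption: the payload format is the one shown in the simplified
    one-liner above; it is not verified against the plugin itself.
    """
    # The session name is the last path component, like `basename` in the shell.
    name = os.path.basename(path.rstrip("/"))
    payload = f"session {name} --cwd {path} --layout {layout}"
    return ["zellij", "pipe", "--plugin", PLUGIN_URL, "--", payload]


if __name__ == "__main__":
    argv = build_switch_command("/Users/mauriciopoppe/go/src/k8s.io/kubernetes")
    # Print a copy-pasteable command instead of executing it.
    print(shlex.join(argv))
```

Computing the name separately avoids a pitfall of the simplified shell version: inside double quotes, $(basename {}) is expanded by the shell before xargs gets a chance to substitute {}.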

Creating a backing track from your favorite song for an open mic

Sun, 01 Dec 2024 14:49:00 +0000

I have a couple of ways to split a song:

With Logic Pro for iPad - In the update for 2024, there’s a way to split a song into tracks (bass, drums, other, vocals). This feature is called stem splitter.
With demucs which is an open source tool capable of separating drums, bass, and vocals from the rest of the accompaniment
- It has a feature where it can split a song using more instruments (bass, drums, guitar, other, piano, vocals), in this mode I can keep important tracks in the backing track such as the “other” and “piano” tracks which make the backing track feel complete.

In this article I show how to use demucs to split a song into tracks.

Splitting a song with Demucs

High level steps

Install essential tools
- git
- python
Download dependencies (there’s an automated step to download these below)
- Install yt-dlp to download your song (if you have your song skip this step).
- Install demucs to split the track.
- Install ffmpeg to combine selected tracks into a combined track.
Download your song with yt-dlp
Separate the track with demucs
Join the drums/bass/other tracks with ffmpeg into a track that you can use as your backing track!

Download demucs dependencies

The assumption is that you already have git and python installed.

I’ve created a project that has a file with all the dependencies to install, you just need to clone the repository and use the requirements.txt file to install the dependencies.

git clone https://github.com/mauriciopoppe/open-mic/
cd open-mic
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -r requirements.txt

Download your song

I’ll create a backing track with bass, drums, piano and other for the song December 21 by Prince Royce

First let’s download the song assuming that it’s on Youtube:

yt-dlp -x --audio-format mp3 <link-to-song>

Example:

yt-dlp -x --audio-format mp3 "https://www.youtube.com/watch?v=A9B1Uo-VQas"
[youtube] Extracting URL: https://www.youtube.com/watch?v=A9B1Uo-VQas
[youtube] A9B1Uo-VQas: Downloading webpage
[youtube] A9B1Uo-VQas: Downloading ios player API JSON
[youtube] A9B1Uo-VQas: Downloading android player API JSON
[youtube] A9B1Uo-VQas: Downloading player b46bb280
WARNING: [youtube] A9B1Uo-VQas: nsig extraction failed: You may experience throttling for some formats
         n = CN0_8RV0LjPMw_i9xJs ; player = https://www.youtube.com/s/player/b46bb280/player_ias.vflset/en_US/base.js
WARNING: [youtube] A9B1Uo-VQas: nsig extraction failed: You may experience throttling for some formats
         n = it19Djd_eJ-wpANfMuK ; player = https://www.youtube.com/s/player/b46bb280/player_ias.vflset/en_US/base.js
[youtube] A9B1Uo-VQas: Downloading m3u8 information
[info] A9B1Uo-VQas: Downloading 1 format(s): 140
[download] Destination: Prince Royce - Dec. 21 (Official Video) [A9B1Uo-VQas].m4a
[download] 100% of    3.30MiB in 00:00:00 at 7.11MiB/s
[FixupM4a] Correcting container of "Prince Royce - Dec. 21 (Official Video) [A9B1Uo-VQas].m4a"
[ExtractAudio] Destination: Prince Royce - Dec. 21 (Official Video) [A9B1Uo-VQas].mp3
Deleting original file Prince Royce - Dec. 21 (Official Video) [A9B1Uo-VQas].m4a (pass -k to keep)

The track is downloaded in the same location as where the command was run.

Next let’s use demucs to separate the track into different instrument tracks.

demucs -n htdemucs_6s --mp3 -j 2 <path-to-downloaded-song>

Example:

demucs -n htdemucs_6s --mp3 -j 2 "Prince Royce - Dec. 21 (Official Video) [A9B1Uo-VQas].mp3"
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /Users/mauriciopoppe/go/src/github.com/mauriciopoppe/open-mic/separated/htdemucs_6s
Separating track Prince Royce - Dec. 21 (Official Video) [A9B1Uo-VQas].mp3
100%|██████████████████████████████████████████████████████████████████████| 216.45/216.45 [01:28<00:00,  2.46seconds/s]

demucs created multiple files in the directory separated, let’s list them:

tree separated/
separated/
└── htdemucs_6s
    └── Prince Royce - Dec. 21 (Official Video) [A9B1Uo-VQas]
        ├── bass.mp3
        ├── drums.mp3
        ├── guitar.mp3
        ├── other.mp3
        ├── piano.mp3
        └── vocals.mp3

Finally let’s join the bass, drums, other, piano tracks with ffmpeg into the combined file combined.mp3.

cd separated/htdemucs_6s/Prince\ Royce\ -\ Dec.\ 21\ \(Official\ Video\)\ \[A9B1Uo-VQas\]/
ffmpeg -i bass.mp3 -i drums.mp3 -i other.mp3 -i piano.mp3 -filter_complex amix=inputs=4:normalize=0 combined.mp3
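
If you want to try different stem combinations (for example only bass and drums), the ffmpeg command can be generated instead of typed by hand. This is a minimal Python sketch under that assumption; build_mix_command is a hypothetical helper, not part of demucs or ffmpeg, and it only builds the argument list shown above without running anything.

```python
import shlex


def build_mix_command(stems: list[str], output: str = "combined.mp3") -> list[str]:
    """Return the ffmpeg argv that mixes the given stem files into `output`."""
    if not stems:
        raise ValueError("need at least one stem to mix")
    argv = ["ffmpeg"]
    for stem in stems:
        argv += ["-i", stem]
    # amix needs the number of inputs; normalize=0 turns off amix's
    # automatic scaling of the inputs, as in the command above.
    argv += ["-filter_complex", f"amix=inputs={len(stems)}:normalize=0", output]
    return argv


if __name__ == "__main__":
    stems = ["bass.mp3", "drums.mp3", "other.mp3", "piano.mp3"]
    # Print the command so it can be reviewed before running it.
    print(shlex.join(build_mix_command(stems)))
```

The list can be passed to subprocess.run from inside the directory that holds the separated stems.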

Let’s compare the original song with two versions of the backing track, one with bass and drums (no voice or guitar) and the other with bass, drums, other and piano (no voice or guitar).

Song	Sample Audio
Original song (voice and all instruments)
Backing track (bass, drums), no voice or guitar. This is what you'd get using Logic Pro's stem split feature.
Backing track (bass, drums, other, piano), no voice or guitar

demucs is an amazing tool and so useful for open mics!

Bachata

Sun, 17 Nov 2024 14:00:00 +0000

Learning to dance

I had my first dancing experience in San Francisco in 2018. I went with a friend to Space 550, a Latin night club where I had an intro class to salsa. I didn’t enjoy it that much because I was new to it and because you have to be a good leader (or at least a leader with enough decent moves) to start enjoying it.

I decided to try again in late 2021 at a dance academy in South San Jose called Dance Boulevard. They had a $90 all-you-can-dance 1-month package where I got to try different types of dance including salsa, bachata and ballroom. I went to all the classes and initially focused on getting better at salsa, but over the next weeks I realized that bachata was more fun not only to dance to but also to listen to. The salsa and bachata counts are similar (1 to 8). I felt that salsa was faster and more fluid, while bachata is more relaxed and you can slow down on steps 4 and 8. Over the next months I spent way more time listening to bachata and I started enjoying the music too. It’s just very, very fun.

While I could keep on dancing both I decided I’d stick to bachata for the long run.

Dancing bachata is perfect for me because:

First and foremost, it’s fun.
It’s a way to do cardio and improves my fitness.
It’s a great way to make friends and connections with people that like bachata as much as you do.
(bonus) It’s a perfect excuse to travel. My excuses are the dance festivals happening all around the world.

My bachata journey

How do you get better at it?

The steps to get better are simple, you go to group classes and learn a few moves, if there are dance socials right after the class you stay and practice the moves you learned. Then at home you practice on your own, I focus a lot on perfecting isolations (targeted movement of a part of my body) so that the follower can feel the next move I’d like to do. Finally, you go to the social dance nights every week to apply what you learned in classes and on your own.

I liked going to bachata festivals to learn new moves. At my first festivals I went to most of the classes over the 3 days I was there. Looking back, there were too many moves to learn and I felt too drained by the end of the day to go to the social dance nights. For future festivals, I decided to pick a few classes that matched my style (I like sensual bachata) and spend the rest of the time exploring the town. Later, I’d go back to the hotel, rest and go to the social dance night, which usually starts very late, after midnight.

Nowadays, I spend my free time at home practicing a small set of moves and some variations. As I mentioned before, I also practice isolations a lot. My teacher says that we should master the fundamentals and that’s exactly what I focus on at home over and over again. To learn new combos I use https://bachatasteps.com/.

On the dance floor there are good days and bad days. Days where you feel the flow of the music and all the moves click in your head, and days where you don’t feel the flow of the music at all. Days where you dance with a follower and all of the moves and their transitions seem so effortless, and days where you can’t make a connection with a follower, they seem disinterested and the moves don’t go with the flow. It’s just another day.

It’s a lifelong hobby, so let’s just enjoy the process.

2021 - Starting my dancing journey

I started dancing bachata in late 2021 at a dance academy in South San Jose called Dance Boulevard.
I went a couple of times to Alberto’s Night Club, a dance night club in Mountain View. I didn’t enjoy it much because I was still a beginner and because the dancing space was very small.
On a few Saturday nights I went to Space 550 in San Francisco. This is where I met a dance instructor called Kathy Reyes, who became my first bachata teacher.

2022 - The year of the bachata festivals

On weekday nights I went to Berkeley to dance with Kathy Reyes. She liked traditional bachata so I got to learn lots of footsteps. Every week we’d learn a new combo and practice in a mini social night event afterwards.
On weekday nights I went to Studio M in San Jose to dance with Ngoc Huynh, who was my 2nd bachata teacher. I met her at a Sunday social dance event in San Francisco. I really like her energy and humor, and she’s a great teacher as well.
On Saturdays I went to the socials in Space 550 in San Francisco.
I learned about bachata festivals happening all around the US; they were a perfect excuse to start travelling more and to explore the US. The festivals are usually 4-day events where multiple professional bachata teachers teach new combos in the afternoon and there are huge social dance events at night. Initially I went to the classes to learn new combos, but there were so many classes and moves that I felt I was not learning the right way; it was just too much content. I decided I’d go to a few classes and instead spend my time exploring the city where the festival took place.
- Bachata Festival in LA in May.
- Bachata Festival in Sacramento in June.
- Bachata social nights in San Diego in July.
- Bachata Festival in San Francisco in July.
- Bachata Festival in Miami in September.
- Bachata social nights in New York in October.
At the July festival in San Francisco, I met Ataca and La Alemana. They’re a very famous bachata duo characterized by their romantic dance style. Every time they dance they show a strong dance connection; they do simple moves but with great transitions.

I danced with La Alemana at the festival! Fun souvenir, a photo with Ataca y La Alemana:

Me with Ataca y La Alemana
I went to a bachata concert by Prince Royce in San Jose in October.
I created a bachata player website to mark my love for bachata.

2023 - Slow year in bachata dance

I stopped dancing in March because of an upcoming move.
I moved to NYC in mid 2023.
I started dancing again in September at KS Bachata Sensual NYC with Steven Halim and Kasia Brozynska. In their teaching style they have a 4-week cycle class where you learn a combo and perfect it during the cycle. This style is my favorite because I get to perfect a combo and just replay it on the dance floor.
- They also host social night events on Saturday nights, I didn’t go to them as much as I wanted.
I went to a bachata concert by Romeo Santos in New Jersey in November.

2024 - Making covers of bachata songs

One of the reasons I moved to the city is to be more involved in the music scene as a performer. I like playing the guitar in my free time, so I decided I’d start making covers of bachata songs that I like. I started going to a few voice lessons to improve my singing skills for the open mic events. I also started a YouTube channel where I upload my covers.

I watched Aventura at Madison Square Garden in May.
From June to Nov I kept on going to KS Bachata Sensual NYC classes on weekdays. I didn’t go to socials on Saturdays as often, which is not good!
I did my first bachata cover and uploaded it to YouTube. I made a cover of Solo Conmigo by Romeo Santos.

I performed at NY Guitar School with a cover of Todavia me Amas by Aventura.

2025 - Dancing around the world

I took a 2-month vacation across Latin America. In addition to exploring new places I had the opportunity to go dancing!

San Salvador, El Salvador - I danced at PÁNUK Centro de Danza. The DJ would mostly play salsa (like 5 salsa songs and then 2 bachata songs).
Santo Domingo, Dominican Republic - I went to Punta Cana to relax and dance at night, when I rented a car the owner told me that the real bachata scene was in Santo Domingo so I drove a few hours to get there. The owner recommended el “Corito Bachatero” and it was amazing! I danced with many Dominican and Colombian girls, this place plays a lot of traditional bachata which I don’t usually dance to, I learned that in traditional bachata you don’t make a lot of turns. Nevertheless, it was a fun experience.

My Favorites

This is a collection of my favorite steps in bachata, or the steps that I do on the dance floor. I look at these clips regularly while practicing and I try them on the dance floor.

Dancers

Steven & Kasia - https://www.instagram.com/stevenkasiabachata/
Kathy Reyes - https://www.instagram.com/krdance.official/
Ngoc Huynh - https://www.instagram.com/knockout_ngoc/
Brian Dinh - https://www.instagram.com/itsyaboibdinh/
Vlad & Nataly - https://www.instagram.com/vladynataly/
Cornel & Rithika - https://www.instagram.com/cornelrithika_official/
Daniel D’Errico - https://www.instagram.com/danielederrico_/
Umeko Yuko - https://www.instagram.com/umekoyuko/
Pablo Cano - https://www.instagram.com/_pablocano_/
Anastasia and Jovanny - https://www.instagram.com/ayjbachata/
Michal & Nicky - https://www.instagram.com/michalynicki/
Juan y Sara - https://www.instagram.com/juanysara_bachata/
Clark Ji - https://www.instagram.com/dancingwithclark/
Azael Salazar - https://www.instagram.com/azaelbachatafever/
Ataca y la alemana - https://www.instagram.com/atacaylaalemanaofficial/
Magda y Valeria - https://www.instagram.com/magdayvaleria_official/
Daniel & Tom - https://www.instagram.com/danielytom/

Combos

I learned a lot from reels from this channel: https://www.instagram.com/bachataworldmasters/

Video	Why I like it
https://www.instagram.com/p/CsJi6wVNfpG/	The transition to two types of waves, the arm wave at the end.
https://www.instagram.com/p/CjsSijqjXOq/	Simple and easy to do
https://www.instagram.com/p/CqNZ_uwjX59/	Leader switch front and back
https://www.instagram.com/p/CgwROATvrSY/	Simple variation on diagonal (ataca y la alemana)
https://www.instagram.com/p/CgjHS_dA_D7/	Hip out, Chest out, Chest in, Hip in
https://www.instagram.com/p/DCRLfdyCIR8/	Titanic step
https://www.instagram.com/p/Cqk7EsoOyk5/	Contra y circulo (tutorial)
https://www.instagram.com/p/CttvRi-smnq/	Leader switch front and back + a very nice hair brush at the end
https://www.instagram.com/p/CsjZBrIvZWq/	Double turn lead, follower shadow, nice sync jump with the music
https://www.instagram.com/p/CuHh2JvgFwR/	Basic of basic turn with hug + hip,chest,chest,hip
https://www.instagram.com/p/CuMbAybAvIH/	Perfect intro to the chorus
https://www.instagram.com/p/CuMsbw1LeJX/	Onda lateral + contra
https://www.instagram.com/p/C1e224nCSVi/	Lady squat
https://www.instagram.com/p/C12LBKti8tf/	Side plank
https://www.instagram.com/p/C48DmHGiyFi/	Nice double turn + hair brush

Documenting my life

Sun, 10 Nov 2024 16:28:00 +0000

Why am I documenting my life?

Ordinary moments don’t seem to have much value right now, but they will gain value over time. We might think that mundane things will stay the same forever, but they don’t.

I took the following notes from a vlogger that I admire called Riza:

Because we don’t know when we will die, we get to think of life as an inexhaustible well. Yet everything happens only a certain number of times, and a very small number really.

How many more times will you remember a certain afternoon of your childhood, an afternoon that is so deeply a part of your being that you can’t even conceive of your life without it? Perhaps four, five times more, perhaps not even that.

How many more times will you watch the full moon rise? Perhaps 20. And yet it all seems limitless.

Riza

Life moves pretty fast, I want to do my best to remember it.

How do I document my life?

I like writing so I’ll start with a journal. I’ll also get a video camera and I’ll start with one that’s the most convenient to use. I’ll also learn to edit videos because I want to romanticize my life by recording my adventures as something that my future self would like.

I’m the subject and reporter of my own life, when I talk to myself in a video it’s like I’m asking questions and answering them out loud.

Journaling

I have a Google Doc where every night I write an entry about my day. Sometimes I forget to do it but that’s ok, it doesn’t need to be perfect and it can have as much detail as I want. It’s a Google Doc for now but I feel I’d value a physical journal a lot because I could handwrite it! Maybe I’ll do that later.

There’s a template at the top with the following questions:

[today's date]

* I am grateful for:
* Positive self affirmation:
* What did I experience today?
* Good for someone today?
* What will I do better tomorrow?

Some days I feel like reading a few entries from the past. It feels great to go back in time to see what I did in my life and how I grew as a person from that moment in time.

I also have a special doc for dreams, I’m trying to get better at being able to have vivid dreams where I remember more details of my dreams, regardless of having a happy or sad vivid dream I record it so my future self can remember it.

Photos

I got myself an Instax camera and I used it on a trip abroad. While I’m able to capture things in the moment with my phone and share them online, it doesn’t have the same value as taking a physical picture in the moment and sharing it with someone.

Video

I decided to get a nice vlogging camera called DJI Osmo Pocket 3. It’s great because it’s very compact and has a stabilizer. I learned the basics of photography and of video editing with DaVinci Resolve, and I practice by summarizing a series of random clips that I took during a day.

Reflecting on my life

Every 4 months I write a small essay about my life for that 4-month period. Every year on my birthday I write a longer essay about my life during the last year. I capture highlights about the things that I did and progress on the objectives I set for myself in a previous essay. Writing these thoughts is a great way to reflect on my life.

To help me on this reflection I use NotebookLM which helps me remember details of my life, I chat with it to see big picture topics in the goals I’ve set. This is an interesting way to do therapy :).

Kubernetes

Sun, 30 Apr 2023 18:27:00 +0000

Presentations

CSI Windows

PV/PVC controller

Debugging K8s e2e tests with delve

Playground

As I started to contribute to kubernetes I created a few environments for easier development.

Please check it out for examples about:

Productivity skills

Wed, 11 May 2022 21:16:00 +0000

Getting used to multitasking

Multitasking is a skill that I try to get better at to make better use of my time. While it may seem detrimental to my performance because I’m not completely focused on a task, I believe that there are situations where being able to do a mental context switch¹ can save you a lot of time.

As you grow in your career your scope will increase, as well as the amount of knowledge that you have. You’ll participate in more meetings on many different topics that you’ll have to juggle in your head. With lots of opportunities to practice every day, you eventually get used to it. So the short answer, like for any other skill, is practice.

Time for deep focus, time for a break

Having periods of time for deep focus is a must, but it’s also really important to let the conscious mind rest and let the diffuse mode of thinking act. I learned this from the book “A Mind for Numbers” by Barbara Oakley, which goes deeper into balancing the focused and diffuse modes of thinking. For this reason, taking breaks is really important and helps me reset my mind. Some of the solutions to problems at work that I can’t solve when I’m focused usually come after I take a break and go back to my desk.

Getting interrupted during the periods for deep focus ruins my train of thought. I avoid it by disabling notifications and blocking time in my calendar to focus better on the task.

Having a routine helps tremendously and keeps me happy. Regular exercise in the morning keeps my mind clear and gives me energy to focus better for the rest of the day.

Emails

A tip that I got from a coworker is to tag and filter all the incoming emails, I use an internal tool at work that helps me tag emails with a declarative language, I think that https://github.com/mbrt/gmailctl or a similar tool can help.

Once you tag emails, the first thing to do is to free your inbox from emails that aren’t that important to read. Lots of emails coming from our bug tracker are most of the time not directed to me but to my team inbox instead, so the first filter groups bugs by the team they’re targeted to, moves them to a tag (e.g. bugs-team-a, bugs-team-b) and archives them, skipping the inbox. Some of these bugs might need my attention because I’m mentioned in them, so if I’m CCed on a bug I add another tag to it, e.g. bugs-me.

I receive emails directed to the Google Groups I’m subscribed to, to my org and company wide. While these are important messages they’re not urgent, and they can also be tagged with something like google-groups, org, company, etc.

I do the same with changelist and GitHub emails: I group them by team (e.g. cl-team-a, cl-team-b) and skip the inbox. For changelists where I’m the reviewer or where I’m CCed, I add cl-me and they stay in my inbox.

There’s also spam that should be tagged and marked as read by default. For example, a person joining a big team that I’m also part of might generate an automated email for all the members of the team. Emails like this can be marked as read, tagged and archived.

With time I got used to checking my email regularly following this priority: first my inbox, then bugs-me, then cl-me, and if I feel like it I read other tags.
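
To make the rules above concrete, here is a minimal Python sketch of the tagging logic. It is not the internal tool I use and it is not gmailctl syntax; the Email fields and the classify function are hypothetical names for illustration. It also assumes that bug emails always skip the inbox (bugs-me is just an extra tag) while changelist emails that involve me stay in the inbox.

```python
from dataclasses import dataclass


@dataclass
class Email:
    source: str                # hypothetical: "bugs", "cl", or anything else
    team: str = ""             # hypothetical team identifier, e.g. "team-a"
    involves_me: bool = False  # I'm mentioned, CCed, or the reviewer


def classify(email: Email) -> tuple[list[str], bool]:
    """Return (tags, skip_inbox) for an email, following the rules above."""
    if email.source not in ("bugs", "cl"):
        # Everything else is left alone in this sketch.
        return [], False
    tags = [f"{email.source}-{email.team}"]
    if email.involves_me:
        tags.append(f"{email.source}-me")
    if email.source == "bugs":
        # Assumption: bug emails are always archived, tagged or not.
        return tags, True
    # Changelists skip the inbox unless they involve me.
    return tags, not email.involves_me
```

For example, classify(Email("cl", "team-b", involves_me=True)) yields the tags cl-team-b and cl-me and keeps the message in the inbox.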

Task management

Throughout my day/week I get emails with action items, and in meetings, after taking some notes, we realize that there are some action items that I should act on soon (for example reading and reviewing a design doc, working on an upcoming release, etc). While I can create an internal bug for some of these with an assigned priority, there are some items, like asking for and giving feedback in a design doc, where I wouldn’t need to create a bug. In addition, for some items with deadlines I also need a reminder to work on them soon.

For these reasons I use Google Tasks as my task management tool. It’s easy to add tasks by hand, add tasks from emails (with subtasks too 🙂), set calendar reminders and manually order tasks to give some priority among them. What’s super cool is that I can see it every time I go to my email tab. (Read this article for more info about how to set it up.)

Keeping notes

At work almost all the meetings have meeting notes where we write the topics that were discussed and their conclusions. I can’t emphasize enough how useful these notes are: they help me remember discussions and conclusions, they prepare you for the next meeting if it’s a regular meeting, and if you missed a meeting you can read the notes and reach out to people if needed.

For example, these are the meeting notes of the Kubernetes Storage Special Interest Group. As you can see, they’re split by dates, topics discussed and conclusions.

In my day to day I look at these notes just like I check my emails. Outside work I keep a weekly checklist of the things I have to do. Having more things written down and out of my mind gives me more room to remember valuable things.

Development tools

At work I make changes to many codebases during the day. To quickly switch across codebases and the terminal layouts that I’m used to, I use tmux, tmuxinator, fzf and a combination of a few scripts that I’ll talk about later. I’ll describe some concepts around tmux and tmuxinator, the scripts that I use, my workflow, and other tools that I tried that didn’t work for me. First a quick look into what it looks like:

Sessions in tmux can have a name and in my mind I keep the mapping of a session name to a codebase, e.g. if I want to work in the kubernetes codebase cloned at ~/go/src/k8s.io/kubernetes then that’d be the tmux session name I should remember.

Once I’m in that session I usually have a predefined terminal layout. In most of the codebases I keep a 3-pane layout with my editor on the left and two terminals stacked vertically on the right. Because this layout is common across many of the codebases I work on, I save it so that the next time I open a codebase I get the same layout. To do so I use the following file stored at ~/.tmuxinator.yaml:

# I use · instead of . because . is reserved in tmuxinator
# I also don't want to see the entire path to home, instead just use ~
name: <%= ENV['PWD'].gsub('.', '·').gsub(ENV['HOME'], '~') %>
root: ./

windows:
  - editor:
      layout: 7598,272x69,0,0{209x69,0,0,10,62x69,210,0[62x34,210,0,11,62x34,210,35,12]}
      panes:
        - nvim -S
        - null
        - null

If a codebase needs a different layout I create a .tmuxinator.yaml file at the codebase root and override what I need e.g. multiple windows with different layout or commands that should be used instead.
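
To see what session name the name expression in the default file above produces, here is the same transformation as a small Python sketch. session_name is a hypothetical helper that mirrors the two gsub calls in the same order; it isn't something tmuxinator provides.

```python
def session_name(pwd: str, home: str) -> str:
    """Mirror ENV['PWD'].gsub('.', '·').gsub(ENV['HOME'], '~') from the file above."""
    # Same order as the template: replace dots first, then the home prefix.
    return pwd.replace(".", "·").replace(home, "~")


if __name__ == "__main__":
    home = "/Users/mauriciopoppe"
    print(session_name("/Users/mauriciopoppe/go/src/k8s.io/kubernetes", home))
    print(session_name("/Users/mauriciopoppe/.dotfiles", home))
```

Because the dots are replaced first, a home directory whose path contains a dot would no longer match the second replacement; that is a property of the expression as written, not something the sketch works around.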

Keeping track of the locations of all the codebases that I work on is tiresome. Instead, as I mentioned, I only need to remember codebase names, which are mapped to tmux sessions. Moreover, I only keep track of the codebases that are worth remembering, because there might be codebases that I cloned once and never used again. To save the codebases worth remembering I ‘bookmark’ them in the file ~/.bookmarks.data, which looks like this:

/Users/mauriciopoppe/.dotfiles
/Users/mauriciopoppe/go/src/github.com/mauriciopoppe/blog
/Users/mauriciopoppe/go/src/k8s.io/kubernetes
...

Once I clone a codebase worth remembering I cd into it and invoke a script bookmark that will save the absolute path in the file ~/.bookmarks.data.
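
The bookmark script itself isn't shown here, so the following is only a minimal Python sketch of the idea: append the absolute path of a directory to the bookmarks file. The add_bookmark name and the duplicate check are assumptions on my part.

```python
import os


def add_bookmark(directory: str, bookmarks_file: str) -> bool:
    """Append the absolute path of `directory` to `bookmarks_file` once.

    Returns True if the path was added, False if it was already bookmarked.
    """
    path = os.path.abspath(directory)
    existing = []
    if os.path.exists(bookmarks_file):
        with open(bookmarks_file) as f:
            existing = [line.strip() for line in f if line.strip()]
    # Assumption: bookmarking the same codebase twice should be a no-op.
    if path in existing:
        return False
    with open(bookmarks_file, "a") as f:
        f.write(path + "\n")
    return True


if __name__ == "__main__":
    # Bookmark the current directory, like running `bookmark` after `cd`.
    add_bookmark(os.getcwd(), os.path.expanduser("~/.bookmarks.data"))
```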

Finally it comes time to pick a codebase that I want to work on. To do so I use a python script that reads the ~/.bookmarks.data file and feeds it to fzf to provide fuzzy finding over all the existing and saved (but not started) sessions. After a bookmark (or tmux session) is selected, the script calls tmuxinator within that directory to start a new tmux session, or just switches to the existing one if the selected item was already a tmux session.
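
The decision logic described above can be sketched in a few lines of Python. This is not my actual script: picker_candidates and action_for are hypothetical names, and the sketch assumes that session names and bookmark entries are directly comparable strings (in practice they would first need to go through the same naming transformation). Running fzf, tmux and tmuxinator is left out.

```python
def picker_candidates(bookmarks: list[str], sessions: list[str]) -> list[str]:
    """Return running sessions first, then bookmarks that have no session yet."""
    seen = set()
    candidates = []
    for item in sessions + bookmarks:
        if item not in seen:
            seen.add(item)
            candidates.append(item)
    return candidates


def action_for(selection: str, sessions: list[str]) -> str:
    """Decide what to do with the item picked in the fuzzy finder."""
    # An existing session is switched to; anything else is started
    # (with tmuxinator in the setup described above).
    return "switch" if selection in sessions else "start"
```

The candidate list is what gets piped to fzf, and the selected line is then passed to action_for to choose between switching and starting.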

This script is keymapped to be called whenever I type <ctrl+space><ctrl+j> with this tmux config.

With the concepts learned above it comes time to talk about my workflow:

Log into my workstation, start the tmux server (or attach to one already running), I’ll usually see the session 0.
Think of the project that I want to work on first e.g. the kubernetes/kubernetes repo, I only need to remember kubernetes.
Type <ctrl+space><ctrl+j>, that’ll trigger the tmux-switch-client script and run fzf with my bookmarks and the tmux sessions that are running currently.
I type kubernetes and see the repos related to kubernetes, I move over the list and select the one that I want. Once selected it will create a new tmux session in the kubernetes codebase with a predefined tmux layout.
I may run a long running command like building kubernetes or creating a dev cluster. In the meantime I can work on a different project, I switch to it with <ctrl+space><ctrl+j>.
After working for some time in the other codebase I remember that I created a kubernetes dev cluster! I can switch to the kubernetes codebase to see the status of the build.
Rinse and repeat

Things that I’ve tried in the past:

tmux-continuum - This tool saves your tmux session layout automatically which is great! However when I used it it’d reopen all the tmux sessions that were stored, imagine having tens of codebases and seeing all of them getting created when you invoke tmux-continuum!

A mental context switch is an analogy of what an OS does under the hood to share a single CPU among processes, but applied to our day-to-day mental tasks. After all, we only have 1 brain that’s already multitasking with unconscious processes like perception or breathing. ↩︎

Preparation for a Software Engineer interview

Mon, 12 Oct 2020 21:23:30 +0000

This interview preparation plan contains notes that apply to any company and specifics for my target companies. I planned to interview with my target companies Airbnb, Facebook, and Google at once. I wanted to have the onsites close to each other (with at least one day to rest between interviews) to maximize my chances to get into a big company. I got excellent recruiters that helped me schedule the 3 on-sites in 2 weeks. I was able to pass all of the Hiring Committees and got offers from Airbnb, Facebook, and Google. Disclaimer: This plan worked for me. It might or might not work for you.

Airbnb, Facebook, Google

Before you start

Preparing for an interview is a rewarding, stressful, and exciting experience. Passing an interview could be the outcome of enough focused preparation (luck is involved too, but I’ll get to this shortly). The process is stressful at every stage, from the first time you send your resume and wait for the automated program or the recruiter not to reject it, until the last day you negotiate your offer with your recruiter. However, the process is rewarding because you improve your problem-solving skills. Also, the concepts that you learn when you’re preparing for the system design interviews are invaluable and will be helpful throughout your career.

Luck is involved in the interview process. You may have prepared a lot but:

you might get stage fright during the interview. You can decrease the anxiety with enough practice, but I guess that this feeling will always be there
you might be unable to make progress because the problem is too hard, you didn’t practice that topic enough, or you just missed that small insight
there might be an external factor that you can’t control; for example, you might not click with your interviewer, your interviewer might be having a bad day, or during the Hiring Committee review someone dislikes something about your round even though all the interviewers gave you a positive score (as an anecdote, not even the Hiring Committee members are safe from themselves).

You increase your chances of landing a job at a top company if you interview with more of them. It may seem obvious, but it took me some courage to attempt more times because I was afraid of rejection, it helped to switch my mentality to focus more on the preparation rather than the result.

For stage fright, practice a lot, either at pramp or with a friend experienced in interviews (highly recommended). The more you practice, the better you’ll become at communicating your solution and keeping control of the time. When I started, I could come up with a solution for a coding problem, but I’d present it in a disordered way: I’d explain a solution and start coding it right away without realizing that I was solving the wrong problem, or I’d talk too much without considering the time and eat valuable minutes that I could have used for the follow-up question or tests. When I practiced system design interviews, I’d jump across the system design stages, or I’d go way too deep into the detailed design while I was still talking about the high-level design. After I did enough mock interview rounds with my friend, I corrected these problems and came up with a systematic way to tackle each coding and system design problem.

For challenging problems, always try to come up with examples. I got follow-up questions where I didn’t know where to start, so after cycling through some data structures and algorithms, I’d write some examples, find a pattern, and finally identify the data structures and algorithms that would help me solve the problem. If you can’t make progress, ask for help! An excellent interviewer might realize you need help and give you hints, but in any case, keep this step as a last resort for when you’re completely stuck.

For external factors, there’s nothing you can do. Just focus your energy on the next interview instead of thinking about what you could’ve done better.

Finally, my journey wasn’t without failures. I failed multiple interviews with top companies. I was naive in the past: when I decided to go for an interview, I’d “put all of my eggs in one basket” (meaning interviewing with only one company at a time over the years) and fail. Now I realize that the interview process gave me invaluable information that helped me in my next attempts. If you fail, it just means that you’ve failed that attempt. After all of your rounds, reflect on what you did right and wrong and focus on improving that part for the next shot.

Luck is what happens when preparation meets opportunity

Seneca

Getting an offer is not the end of the journey. It might be possible that even after you reach your target company and you work there for some years, you might look for new challenges in other top companies. The interview skill is something that you should keep up to date. Good luck!

Summary

(If you haven’t interviewed in the past and wanna get noticed)

If you’re a student, enjoy the student lifestyle! Work on challenging projects that make you stand out from the rest. In my case, I used to be a competitive programmer and also liked creating projects for fun. I managed to have a JS library featured in a magazine once (link to the project).
Participate in Coding Contests. I solved thousands of problems over the years; my solutions are in this repo.
Google has this website called https://foobar.withgoogle.com/ (learn about it here). When I was reading about web performance, I saw a pink button somewhere inside the Web Fundamentals Rendering Performance Guides, which led me to the foobar site. I solved 3 problems in Python and got an interview invitation; then I solved the remaining 2 and got more invitations to give to friends.
If you are working full-time, look for opportunities to work on challenging problems and show leadership. Take the chance: no pain, no gain.
Make sure your resume shines and aim for a 1-page resume. Here’s the CV I used for this round (with my details removed).

Before looking to interview:

Do the daily Leetcode challenge. It’s a great way to keep the coding interview skill up to date
Participate in Leetcode contests every Saturday, or Codeforces when it’s available
Read about or code scalable systems, read interesting papers, and read the red book Designing Data Intensive Applications. I focused a lot on this part since I failed my previous onsite because of system design.
Get to know me and the places I’d like to work, check if my values match the company core values

I’ve been a fan of their engineering blog ever since I saw the article Rearchitecting Airbnb’s Frontend. Moreover, I had the opportunity to use part of their work, presented in Dynein: Building an Open-source Distributed Delayed Job Queueing System, at work. Also, the obsession with what “belong anywhere” means matches my core values. If you don’t know what it means, I highly recommend this TED talk by Joe Gebbia, one of the co-founders of Airbnb, and if you understand Spanish, listen to this story by a writer who had a heart attack while staying at an Airbnb.

In the Facebook HQ you see their core values everywhere. One phrase that stuck with me in one of their buildings is “what would you do if you weren’t afraid?”. A few of my friends have been working there for a long time, and Facebook’s been one of my target companies for a while. Their engineering blog is filled with gems across multiple fields. As a Frontend developer, I learned a lot from Browserlab, whose infrastructure design I used as inspiration in a project I did at work with WebPagetest.

I use a lot of Google products every day, and I admire their high quality. Google’s always setting standards in engineering in the industry and ensuring that their research is available to the world. Many companies, including mine, have benefited from all the open-source projects they produce/support. Kubernetes is the core component in many companies’ infra, and it looks like a magic box to the engineers that use it.

NOTE about picking your companies: I only interviewed with the companies I wanted to work for, which is risky. Another strategy is to apply to all the top companies hoping to get multiple offers, which helps during the negotiation phase. During my preparation, I watched videos from various founders from all of the top companies and discovered that I’d like to work at Stripe too.

When I was ready to interview

Contact the recruiters, schedule phone screens in 3 to 4 weeks. More time to prepare for the phone screen is risky because I’ll burn out even before practicing the system design questions.
Practice only coding for the phone screens. The plan is detailed below.
After the phone screen interviews, schedule onsites in 4 to 6 weeks (I did 4 weeks). I also ensured that the onsites were as close as possible to a target date with enough breathing room to rest.
Practice mainly system design and coding to a lesser extent. Give yourself enough breathing room not to burn out.
Practice behavioral types of questions. Each company has its way to assess this part that’s described below.
Go through the onsites, get to a mental state where you’re ready for the next interview regardless of the outcome of the previous round.
Receive feedback from the recruiters and the HC, go through the team matching interviews if you passed the HC, remember that you don’t have an offer yet.
If you get offers, learn to negotiate, read Salary negotiation strategies everyone in tech already knows — but you don’t , How I negotiated a $300,000 job offer in Silicon Valley , and/or get professional help.
If you don’t get an offer, learn from your mistakes, and move on, you’ll have enough time to reflect and think about the plan for the next attempt.

Coding

Phone screen preparation

The standard coding interview books

My phone screen preparation took 3 to 4 weeks. Warm up during the 1st week: solve a few easy/medium problems in Leetcode and refresh the C++ STL (if your programming language is C++, I have a refresher article).
Focus on breadth. A nice schedule based on your timeline can be found in the EPI book; then pick problems from your weakest area.
Practice EPI questions. I’ve already read EPI multiple times, so I just had to review their intro for each chapter’s STL methods and glance through the problems and the solutions. To practice and test if your implementation works, use the EPI Judge. Start solving medium-type questions and target hard questions in Leetcode, review medium questions that you may have solved in the past, and study how other people solved them (you’ll learn something new every time).
Go through this list of patterns: Important and Useful links from all over Leetcode . It’s by far the best source of knowledge and problems to solve for your preparation.
Participate in the weekly Leetcode contest every Saturday at 7:30PM PT.
Buy Leetcode premium and do a lot of mock interviews (at least 2 per day), force yourself to be in the session and don’t exit prematurely, get into the habit of drawing examples in comments (when I interviewed, there were only virtual onsites).

Here’s an example of what I’d do in my head and in code. Say what you’re thinking out loud: your thought process might lead you to a strategy or might help the interviewer guide you if you’re stuck.

Example Problem: 1105. Filling Bookcase Shelves

We’re given an array of books represented as [width, height] and a parameter shelf_width. We want to accumulate consecutive books on the shelves of a bookshelf. The sum of the widths of the books must not be greater than the input shelf_width. The height of a single shelf is the max height across all the books on the shelf. We’re looking to minimize the bookshelf’s total height, which is the sum of all the max heights across all of the shelves.

Input: books = [[1,1],[2,3],[2,3],[1,1],[1,1],[1,1],[1,2]], shelf_width = 4
Output: 6

I’m trying to understand why the output is 6, I think I’d get 6 if I do:

[
   [1,1],                         sum(width) = 1, max height = 1
   [2,3], [2,3]                   sum(width) = 4, max height = 3
   [1,1], [1,1], [1,1], [1,2]     sum(width) = 4, max height = 2
]
sum(max height) = 6

Question: is it possible to have a book whose width is greater than shelf_width?

Answer: not possible

Question: it looks like I could have several solutions whose output is the optimal value. I guess we’re only interested in the sum of the max heights and not in the location of each book right?

Answer: yes, not interested in the location of the books, just in the height of the bookshelf

Question: is it possible for the sum to overflow int?

Answer: no, the answer fits inside an int

Possible greedy strategy: It looks like I can solve it with a greedy strategy if I take items from right to left (i.e., starting at the bottom) and keep accumulating books on the current shelf as long as the sum of their widths doesn’t go beyond shelf_width, e.g.

(reversed order e.g. bottom to top)
[
    [1,2], [1,1], [1,1], [1,1]
    [2,3], [2,3],
    [1,1]
]

Possible question that I might get asked: is this greedy strategy going to work for all the cases? Is there an example where it fails?

My answer: I’ll think about a case that might break the greedy solution, the case that breaks it is:

books = [5,5], [2,5], [2,2], shelf_width = 7

[
    [2,2], [2,5]
    [5,5]
]

In the case above, I’d take [2,2], [2,5] in the last row and [5,5] in the first one, for a total height of 10. The optimal is 7 (put [5,5], [2,5] on the first shelf and [2,2] on the second), so the greedy approach won’t work.
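The greedy idea and its counterexample can be checked with a small sketch. The function names `greedy_from_bottom` and `arrangement_height` are hypothetical helpers written for this illustration; they assume every book fits on a shelf on its own, as clarified with the interviewer above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Books = std::vector<std::vector<int>>;  // each book is {width, height}

// Greedy from the last book backwards: keep adding books to the current
// shelf while they fit, then close the shelf and open a new one.
int greedy_from_bottom(const Books& books, int shelf_width) {
    int total = 0, acc_width = 0, max_h = 0;
    for (int i = static_cast<int>(books.size()) - 1; i >= 0; --i) {
        assert(books[i][0] <= shelf_width);  // assumption: a single book always fits
        if (acc_width + books[i][0] > shelf_width) {
            total += max_h;  // close the current shelf
            acc_width = 0;
            max_h = 0;
        }
        acc_width += books[i][0];
        max_h = std::max(max_h, books[i][1]);
    }
    return total + max_h;  // add the shelf that is still open (0 if no books)
}

// Height of an explicit arrangement: shelf_sizes[k] consecutive books go on shelf k.
int arrangement_height(const Books& books, int shelf_width,
                       const std::vector<int>& shelf_sizes) {
    int total = 0;
    std::size_t next = 0;
    for (int size : shelf_sizes) {
        int acc_width = 0, max_h = 0;
        for (int c = 0; c < size; ++c) {
            assert(next < books.size());
            acc_width += books[next][0];
            max_h = std::max(max_h, books[next][1]);
            ++next;
        }
        assert(acc_width <= shelf_width);  // the shelf must not overflow
        total += max_h;
    }
    return total;
}
```

On the counterexample, `greedy_from_bottom` returns 10, while `arrangement_height` with shelf sizes {2, 1} returns 7, which is why the greedy strategy has to be discarded.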

Brainstorm brute force: For a brute-force solution, I’d put some books on a shelf, then attempt to put the remaining books on the next shelf, and so on recursively. In the recursion, I’d have a parameter i that tells me where to start in the array of books. To decide how many books I can put on a shelf, I’d also need an accumulator that keeps track of the current width sum.
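A minimal sketch of that brute force, under the assumption that every book fits on a shelf by itself. The names `brute_force` and `min_height_brute_force` are made up for this illustration, and I also carry the current shelf's max height as an extra parameter next to the width accumulator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Min total height for books[i..], given that the shelf being filled already
// holds books with total width acc_width and max height shelf_h.
int brute_force(const std::vector<std::vector<int>>& books, int shelf_width,
                std::size_t i, int acc_width, int shelf_h) {
    if (i == books.size()) return shelf_h;  // close the last shelf
    const int w = books[i][0];
    const int h = books[i][1];
    assert(w <= shelf_width);  // assumption: a single book always fits

    // Option 1: close the current shelf and start a new one with book i.
    int best = shelf_h + brute_force(books, shelf_width, i + 1, w, h);

    // Option 2: keep book i on the current shelf if it still fits.
    if (acc_width + w <= shelf_width) {
        best = std::min(best, brute_force(books, shelf_width, i + 1,
                                          acc_width + w, std::max(shelf_h, h)));
    }
    return best;
}

int min_height_brute_force(const std::vector<std::vector<int>>& books, int shelf_width) {
    return brute_force(books, shelf_width, 0, 0, 0);
}
```

Every book either continues the current shelf or starts a new one, so this explores up to 2^n arrangements. That is fine for reasoning about correctness on tiny inputs, but too slow for anything else.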

Optimized brute force with DP: I see that the solution above will create cases where we’re trying to solve the problem with the same parameters again, so I think we can use DP. The recurrence would be:

T(i) = min over valid k >= 0 of ( max(height[i], height[i+1], ..., height[i+k]) + T(i + k + 1) )   for i = 0 up to books.size() - 1
T(books.size()) = 0
constraint on k for T(i): i + k < books.size() and sum(width[i], width[i+1], ..., width[i+k]) <= shelf_width

The time complexity would be O(mn) where m is a variable whose value depends on how many books I can put on a shelf, I think that in the worst case it could be n so overall, it’s O(n^2).

The space complexity would be O(n) because we’re storing a solution for every index of the books’ array.

Question for the interviewer: Do you think that this algorithm would work? What’s the max value of n?

Possible answer: Yes, it’ll work, n is small enough so that an O(n^2) algorithm works.

Finally, you can code it and test it. Make sure you review your implementation before testing, as you may find invalid variables or small logic errors.

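#include <algorithm>
#include <functional>
#include <vector>
using namespace std;

class Solution {
public:
    int minHeightShelves(vector<vector<int>>& books, int shelf_width) {
        const int INF = 1e9;
        const int n = books.size();
        // dp[i] = min total height needed to place books[i..n-1]; INF = not computed yet
        vector<int> dp(n, INF);

        function<int(int)> solve = [&](int i) -> int {
            if (i == n) return 0;            // no books left to place
            if (dp[i] != INF) return dp[i];  // already solved
            int max_h = 0;
            int acc_width = 0;
            int j = i;
            // try every shelf books[i..j] that fits, then solve the rest from j + 1
            while (j < n && books[j][0] + acc_width <= shelf_width) {
                acc_width += books[j][0];
                max_h = max(max_h, books[j][1]);
                dp[i] = min(dp[i], max_h + solve(j + 1));
                ++j;
            }
            return dp[i];
        };

        return solve(0);
    }
};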

/*

T(0) =
        [1,1] + T(1)
        [1,1], [2,3] + T(2)

T(1) =
        [2,3] + T(2)
        [2,3], [2,3] + T(3)

T(2) =
        [2,3] + T(3)
        ... this path would take the incorrect branch
*/

Then compute the last T(x) (the ones closer to the end of the array) and propagate the values up the stack.
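Computing the T(x) values closest to the end of the array first can also be written as a loop instead of a recursion. This is a sketch of the same recurrence in bottom-up form; the function name `min_height_bottom_up` is hypothetical, and it keeps the assumption that no book is wider than the shelf.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

int min_height_bottom_up(const std::vector<std::vector<int>>& books, int shelf_width) {
    const int n = static_cast<int>(books.size());
    // T[i] = min total height needed to place books[i..n-1]; T[n] = 0.
    std::vector<int> T(n + 1, INT_MAX);
    T[n] = 0;
    for (int i = n - 1; i >= 0; --i) {
        assert(books[i][0] <= shelf_width);  // assumption: a single book always fits
        int acc_width = 0;
        int max_h = 0;
        // Put books[i..j] on one shelf, then reuse the already computed T[j + 1].
        for (int j = i; j < n && acc_width + books[j][0] <= shelf_width; ++j) {
            acc_width += books[j][0];
            max_h = std::max(max_h, books[j][1]);
            T[i] = std::min(T[i], max_h + T[j + 1]);
        }
    }
    return T[0];
}
```

The time and space complexity stay at O(n^2) and O(n), but there is no recursion stack to worry about, and an empty input simply returns T[0] = 0.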

Come up with additional tests if you have time; you can probably cover edge cases too, like an empty array. If you get a good question, the above will not be enough, and you might get a follow-up question where you attempt to solve the problem with less space or with better time complexity.

Also, go through the questions sorted by company. Depending on the company, you might get asked questions from their pool. Don’t memorize questions; instead, learn the techniques used in each problem.
Master Big O notation and understand the time/space complexity of all the data structures that you might use, with a big focus on the differences between a map, unordered_map, set, unordered_set, multiset, priority_queue, queue, stack, vector, etc. Also master Big O for recurrences (you can derive the master theorem so you don’t need to memorize it; learn how to solve recurrences by making a guess and proving it by induction, or by unrolling the recurrence).
Pair with a friend and do 45-minute interviews. Take turns interviewing each other. In my opinion, this is the best way to prepare because you’re doing what you’re supposed to do in an interview.
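To make the container differences mentioned in the list above concrete, here is a small sketch that annotates the cost of each operation. The helper names (`count_occurrences`, `count_in_range`, `k_largest`) are hypothetical and exist only for this example; the complexities in the comments are the ones guaranteed by the C++ standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <queue>
#include <set>
#include <unordered_map>
#include <vector>

// unordered_map: O(1) average insert/lookup, but no ordering guarantee.
std::unordered_map<int, int> count_occurrences(const std::vector<int>& values) {
    std::unordered_map<int, int> freq;
    for (int v : values) ++freq[v];
    return freq;
}

// multiset (typically a red-black tree): O(log n) to find each bound,
// plus O(k) to walk the k elements in [lo, hi].
int count_in_range(const std::multiset<int>& sorted, int lo, int hi) {
    assert(lo <= hi);
    return static_cast<int>(
        std::distance(sorted.lower_bound(lo), sorted.upper_bound(hi)));
}

// priority_queue used as a min-heap of size k: O(n log k) overall.
// Returns the k largest values in ascending order.
std::vector<int> k_largest(const std::vector<int>& values, std::size_t k) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> min_heap;
    for (int v : values) {
        min_heap.push(v);                         // O(log k)
        if (min_heap.size() > k) min_heap.pop();  // O(log k)
    }
    std::vector<int> result;
    while (!min_heap.empty()) {
        result.push_back(min_heap.top());         // O(1)
        min_heap.pop();
    }
    return result;
}
```

A typical interview trade-off is visible here: the hash map wins on raw lookups, the ordered multiset is the one that can answer range queries, and the heap gives the top k without sorting everything.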

Onsite preparation

After the phone screen interviews, schedule the onsites in 4 to 6 weeks (I did 4 weeks)
Pick new problems based on how weak I felt in the patterns shown in the leetcode master link. I discarded some problems that had a bad thumbs up/thumbs down ratio
Pick LC solved problems (medium and hard) and review my solutions if I did them recently. If I solved them a long time ago, redo them, read explanations by other coders in the discussion tab.
Do mock interviews again and again. I eventually went through all the onsite mock interviews for Google and Facebook
Participate in the weekly Leetcode contest

Some additional notes for each of my target companies

Airbnb

They may ask a single LC Hard question in the 45-minute interview; check the ones tagged with Airbnb in Leetcode
The code that you write MUST compile and run with test cases that you prepare. Make sure you know the libraries that you need to include in your file to compile your code

Facebook

One of their core values is move fast; therefore, they expect you to come up with a brute force solution and an optimized solution pretty fast. Practice for speed.
The coding platform is coderpad with their logo watermark
You can’t run your code, so review it once you’re done coding, then debug it by hand with some test cases
They typically ask 2 LC easy to medium questions. You might run out of time explaining your approach! As I said, practice for speed.

Google

Hardest one to practice for because of how unpredictable it is. You may get a warmup question with a follow-up that turns it into a medium or hard. You may also get a problem with a nice story hiding a well-known algorithm like sliding window.
Interviews used to be in google docs, but now they have their proprietary coding platform at interview.google.com (the recruiting coordinator will give you unique links for each one).
You can’t run your code, so review it once you’re done coding, then debug it by hand with some test cases.
You may need to do back of the envelope calculations in the coding section too, make sure you understand the memory layout of a program . It could help you estimate the memory required for your program in a real-life scenario.
For the followups that you won’t be able to solve because of the time limit, get into the habit of expressing your ideas clearly in typed text. I don’t know if these notes are used by the Interviewer or read by the HC, for example:
- Interviewer: The solution you proposed should work fine. How would you modify your algorithm so that it runs faster if there are no resource constraints?
- You: I can improve the performance of my algorithm by using multiple threads doing X, or I can parallelize the work by using map reduce where the map function is M and the reduce function is R.
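For the back of the envelope memory estimates mentioned in the notes above, `sizeof` already gives most of what you need. This is a rough sketch: the `Edge` record and the helper names `payload_bytes` and `to_gib` are hypothetical, and the estimate deliberately ignores allocator overhead and spare vector capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record used only for illustration.
struct Edge {
    std::int32_t from;
    std::int32_t to;
    std::int64_t weight;
};

// Payload bytes for `count` elements of T stored contiguously (as in std::vector<T>).
// Ignores allocator overhead, spare capacity, and the container's own bookkeeping.
template <typename T>
std::uint64_t payload_bytes(std::uint64_t count) {
    assert(count <= UINT64_MAX / sizeof(T));  // guard against overflow
    return count * static_cast<std::uint64_t>(sizeof(T));
}

// Converts bytes to GiB (2^30 bytes) for a human-friendly figure.
double to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

For example, `payload_bytes<std::int32_t>(1000000000)` is 4 * 10^9 bytes, so a billion 32-bit integers need roughly 3.7 GiB before any overhead. That is usually precise enough to decide whether the data fits on one machine.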

About practicing with someone

Practicing with someone and taking turns is the best way to get used to the interview environment. To practice, I used this template with a friend: Google Docs template

About the post onsite hype

After going through 5 rounds in an onsite I’d feel that I’ve nailed the interview. In reality, your performance might be around average or even below average; you never know. If you’re doing multiple onsites consecutively, expect the worst but hope for the best, don’t let the hype felt after an onsite impact your performance on the next one!

Reading list

Interesting Problems + Hints:

Google Docs - Coding Interview Notes

Sites to practice:

Leetcode, I have a university discount, so I got premium for a year for $99. The LC contests start on Saturdays at 7:30PM PT
Codeforces, if you like harder problems, then try Codeforces. I think the Div2 A, B, and C problems are similar to what you’d get in an interview

Books:

Elements of Programming Interviews - Standard resource
Competitive Programmer’s Handbook - This may look advanced for the interviews, but chapters 6 (Greedy Algorithms), 8 (Amortized Analysis), 10 (Bit Manipulation), and 26 (String Algorithms) have techniques commonly used in coding interviews.
Cracking the Coding Interview - Great resource. I like how problems are solved in EPI more, though.

System Design

System design interviews are very unpredictable. You could practice a lot, but the problem you may get touches a point that you’ve never seen before, so you have to develop something based on your experience. If you’re targeting L5+ at Google and Airbnb or L4+ at Facebook you’ll have at least one System Design interview.

Acquiring knowledge

Watch all the videos from the MIT 6.824 Distributed Systems course and do the labs. This resource helped immensely during the team matching phase at Google, where I had a chance to show what I learned and how I could be useful to the team. My favorite lectures: all the Raft ones and how Facebook uses Memcached.
Watch tons of presentations about how big companies solve problems at scale; my links will be below
Learn algorithms and data structures used in distributed systems; my links will be below
Read the cloud design patterns for building reliable, scalable, secure applications in the cloud.
If you’re a full-time Software Engineer, then ask your manager for more challenging problems. I’m honored to have had a fantastic manager who genuinely cared about me in my previous company and helped me work on big projects where I had the chance to learn and grow. That was one of the reasons why I stayed there for so long.

Onsite preparation

Learn how to use google drawings, this is my google drawings template and this is an example of how I used it
Master the structure of the interview, I followed this system design template
Define functional and non-functional requirements. Don’t make too many assumptions, and if you do, say them out loud so your interviewer is aware of them. Put a big focus on mentioning tradeoffs

Design a system that does X and handles REQ requests doing W writes and R reads

What are the most critical features?
Daily active users, traffic volume, read/write ratio?
Are these writes in single/multiple regions?
Access patterns, even load vs. spikes throughout the day?
Latency requirements, tradeoff fast reads for slow writes?
Data consistency, eventual vs. strong consistency

Design a chat system where users send and receive messages in real-time

(after talking about the functional and non-functional requirements)

The client can use the following approaches:

polling, here the client periodically makes requests at some fixed rate, like every 30s. The disadvantage is that we create a new connection every time and waste server resources, because there might not be any message to send or receive.
long polling, this approach is similar to polling. However, we keep the connection open and wait for the server to send some data across the wire. As soon as we receive data, we open a new connection.
web sockets, in this approach, we keep the connection open since web socket connections are persistent and made for bidirectional communication. The application protocol could depend on the devices we’ll use for the chat system. On a battery-powered device, the overhead of the XMPP protocol is that the device has to parse XML, which could be detrimental to battery life; the MQTT protocol, by contrast, is designed to use bandwidth and battery sparingly. On the other hand, XMPP is extensible and adaptable, so we could use it if we want additional functionality like bots.
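The cost difference between polling and long polling can be illustrated with a tiny simulation instead of real network code. Everything here is a simplifying assumption for the sake of the tradeoff discussion: message arrival times are given in seconds and sorted, a long-poll request never times out, and the names `PollStats`, `simulate_polling`, and `simulate_long_polling_requests` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct PollStats {
    int requests;         // total requests sent by the client
    int empty_responses;  // requests that came back with no message
    int max_delay_s;      // worst delay between arrival and delivery
};

// Fixed-rate polling: the client asks every interval_s seconds up to duration_s.
PollStats simulate_polling(const std::vector<int>& arrivals, int interval_s, int duration_s) {
    assert(interval_s > 0);
    assert(std::is_sorted(arrivals.begin(), arrivals.end()));
    PollStats stats{0, 0, 0};
    std::size_t next = 0;
    for (int t = interval_s; t <= duration_s; t += interval_s) {
        ++stats.requests;
        bool delivered = false;
        while (next < arrivals.size() && arrivals[next] <= t) {
            stats.max_delay_s = std::max(stats.max_delay_s, t - arrivals[next]);
            delivered = true;
            ++next;
        }
        if (!delivered) ++stats.empty_responses;
    }
    return stats;
}

// Long polling: the server holds the request until a message arrives, so there is
// one request per distinct arrival time plus the one still waiting at the end.
int simulate_long_polling_requests(const std::vector<int>& arrivals, int duration_s) {
    assert(std::is_sorted(arrivals.begin(), arrivals.end()));
    int responses = 0;
    for (std::size_t k = 0; k < arrivals.size() && arrivals[k] <= duration_s; ++k) {
        if (k == 0 || arrivals[k] != arrivals[k - 1]) ++responses;
    }
    return responses + 1;
}
```

With messages at seconds 5, 70, and 71 over a 5-minute window, polling every 30s sends 10 requests, 8 of them empty, with a worst-case delay of 25s, while long polling needs 4 requests under the assumptions above.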

Do some back of the envelope calculations if needed, and always clarify with your interviewer. This step is crucial at big companies where you need to design for scale and think about capacity planning before you design your system. I have created an article with exercises and estimates that could be helpful. I’ve taken the example below from Gaurav Sen’s awesome course on system design.

We’re going to design an email server for 2B users. When a user sends an email, they can attach files with a size of up to 1MB. How much storage do we need per day to store emails?

Let’s say each email has 200 characters, on average. A user receives emails from useful connections, companies and spam. Assume 20 spam emails, 20 marketing emails and 10 useful emails, per user per day.

Email data = Emails * Characters * Users
           = 50 * 200 * 2B
           = 20 TB

Attachment data = number of emails with attachments * average attachment size
                = 5% of all emails * 1 MB
                = 5% * 50 * 2B * 1 MB = 5 PB

So the total space requirement is Email data + Attachment data = 20TB + 5 PB per day. This is a naively optimistic estimate, since we must account for redundancy (to improve performance and fault tolerance). Estimated total space requirement = (20TB + 5 PB) * 3 ~ 15PB per day.
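The same estimate can be written as code, which makes it easy to change one assumption and see the effect. This sketch assumes 1 byte per character and decimal units (1 MB = 10^6 bytes); the struct `EmailWorkload` and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Decimal units, matching the estimate above.
const std::uint64_t kMB = 1000ULL * 1000ULL;
const std::uint64_t kTB = kMB * 1000ULL * 1000ULL;
const std::uint64_t kPB = kTB * 1000ULL;

struct EmailWorkload {
    std::uint64_t users;
    std::uint64_t emails_per_user_per_day;
    std::uint64_t chars_per_email;     // assumed 1 byte per character
    std::uint64_t attachment_percent;  // share of emails that carry an attachment
    std::uint64_t attachment_bytes;    // average attachment size
    std::uint64_t replication_factor;  // copies kept for redundancy
};

std::uint64_t text_bytes_per_day(const EmailWorkload& w) {
    return w.emails_per_user_per_day * w.chars_per_email * w.users;
}

std::uint64_t attachment_bytes_per_day(const EmailWorkload& w) {
    assert(w.attachment_percent <= 100);
    const std::uint64_t emails = w.emails_per_user_per_day * w.users;
    return emails * w.attachment_percent / 100 * w.attachment_bytes;
}

std::uint64_t replicated_bytes_per_day(const EmailWorkload& w) {
    return (text_bytes_per_day(w) + attachment_bytes_per_day(w)) * w.replication_factor;
}
```

With 2B users, 50 emails per user per day, 200 characters per email, 5% of emails carrying a 1 MB attachment, and 3 replicas, this gives 20 TB of text, 5 PB of attachments, and about 15 PB per day in total, the same figures as the manual calculation.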

Define the API (signature, inputs, outputs) and the data model. I moved between these two back and forth during the interview
High-level design, make sure that your design covers all the functional requirements, don’t go too deep here, or you’ll waste invaluable time
Pick a component (alternatively, the interviewer may pick it for you) and then explain why you need it; big focus here again on tradeoffs
Since this is a design for scale, you’ll need to split processing or data into multiple machines, learn how to handle failures at scale
If you have time, talk about things you’d do to maintain the system, including monitoring and security.

Low level system design

For senior levels L5+ you might encounter questions involving designing and coding the internals of a class. This type of question is more of a coding question than a system design question, but you should be aware of the tradeoffs you make at every stage in your design.

If you haven’t taken any Operating Systems class, I recommend the Graduate Introduction to Operating Systems by Georgia Tech , I’m in grad school as I write this, and I recently took this class. Unfortunately, the labs aren’t public, but the material is nevertheless a great introduction.

I’d suggest you learn about mutexes, condition variables, atomics, readers-writers, boss-worker model, pipeline model, cache lines, and so many others! Also, read and attempt to implement the following classes from scratch, either with C++ 11 Multithreading primitives or pthreads.

Semaphore
Threadpool - I implemented a threadpool in my Cpp refresher article
Thread-safe dictionary, queue, stack, and priority queue
Parallel sort
Multithreaded crawler
Distributed key-value store - MIT’s Distributed System course has a lab on it
Distributed file system - The Georgia Tech Graduate Introduction to Operating Systems course’s Project 4 is all about this.
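As a starting point for the first item in the list above, this is a minimal sketch of a counting semaphore built from a mutex and a condition variable with C++11 primitives. It is one common way to write it, not a reference implementation, and the method names `acquire`, `try_acquire`, and `release` are my choice for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) { assert(initial >= 0); }

    // Blocks until a permit is available, then takes it.
    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate form of wait() handles spurious wakeups.
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    // Takes a permit only if one is available right now.
    bool try_acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }

    // Returns a permit and wakes up one waiting thread, if any.
    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();  // notify outside the lock so the woken thread can proceed
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_;
};
```

The tradeoffs worth saying out loud in an interview: the counter is only touched while holding the mutex, waiting uses a predicate instead of a bare wait, and notifying after releasing the lock avoids waking a thread that would immediately block again.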

Resources:

Graduate Introduction to Operating Systems by Georgia Tech
Concurrent Programming with C++ - Excellent intro to multithreading concepts and primitives, a perfect mix of theory and practice.
Back to Basics: Concurrency - Arthur O’Dwyer - CppCon 2020 - Overview of modern concurrency in C++ 11.
Chapter 19 in EPI
CPU Caches and Why You Care - Fantastic introduction to cache lines. I learned more about data locality and that having more threads will not always improve your program’s performance as you thought it would.
Gaurav Sen’s System design course - In addition to system design questions, it also has low-level design problems.

Reading list

These are my notes about interesting tech

List of books

System design Interview - Alex Xu - My favorite book about system design interviews. It covers the system design interview structure and has many real-life examples. What I really like about it is the focus on tradeoffs: it has 4 different rate limiting algorithms and all of them are valid. Having the ability to demonstrate why you picked a solution over others is what the interview is about.
Web Scalability for Startup Engineers - Artur Ejsmont - Great intro to all of the components in a complex web application
Distributed Systems for Practitioners - Dimos Raptis - The why behind a technology and use cases in the industry, I liked the way it’s structured with the theory first and then practice.
Designing Data Intensive Applications - Martin Kleppmann - Deep into how distributed systems work, I think this is a standard resource by now.

List of courses:

Grokking the system design interview - Standard resource
System Design Primer - Standard resource
System Design Interview – Step By Step Guide - Amazing youtube channel for system design interviews
Gaurav Sen’s System design course - Perfect resource to learn about making iterations over time in your design and to do back of the envelope calculations
Reliable Google Cloud Infrastructure: Design and Process - A guide from Google on how to design systems for Google Cloud. Even though the implementation is done with Google Cloud Services, the videos about the functional/non-functional requirement phase are very helpful. I learned about KPIs and their relationship with SLI, SLO and SLA.

Behavioral Questions

Google and Facebook have 1 behavioral round, and Airbnb has 2 behavioral rounds. For this part, Cracking the Coding interview helped a lot.

your level inside the company is based on your answers in this round. Talk about your leadership skills!
learn to introduce yourself; you might do this in some coding rounds too. It’s a must for the team matching phase at Google or Airbnb. The template that I use is:

Hello, my name is {your name} and I’m a software engineer at {current company} where we do {description of the product}, I’m currently working on {project A doing front-end, back-end, infra, ml, etc.}. In the past, I worked at {previous company} where I did {more projects}. On the side, I’m doing {school coursework or extracurricular activities}, and my next objective is to achieve {objective in the short term}.

learn about STAR, create a grid with questions, projects, and answers for each one
practice with a friend. I’m friends with an awesome googler who helped me a lot here with mock interviews. We practiced many times, and he noticed that my answers were not specific enough or not well structured. To improve, I wrote down my answers to all of these questions and rehearsed them many times:
- Tell me about a challenging project
- What did you enjoy learning the most?
- Tell me about a time you had a conflict of priorities with your manager
- Tell me about a time you made a mistake
- Tell me about a time you had to make a difficult decision
- Tell me how you solved an ambiguous task at work
- What are the qualities of a good leader, according to you?

A special note about Airbnb: they care about culture fit more than anyone, so spend some time understanding what belong anywhere means. I watched a lot of interviews with Brian Chesky, which helped me solidify my willingness to work at Airbnb (6 golden rules). At the onsite, you have 2 behavioral rounds with questions that can be found at https://candor.co/interviews/airbnb and the most important ones are

What does belonging mean to you? What is your understanding of Airbnb culture?
Why Airbnb?

Make sure you practice some of these even before you apply to Airbnb. The recruiter wants to know if you’re genuinely interested in Airbnb

They have a behavioral round of 30m with common behavioral questions and an easy coding question at the end. Read this article about Ramping Up a Senior Software Engineer

Pure behavioral round, if you’re targeting L5+ show that you’re a leader!

The end?

Success is not final, failure is not fatal, it is the courage to continue that counts.

Winston Churchill

Passing or failing the interview is not the end, and failing doesn’t mean that you’re not good enough: there are too many variables outside your control. So if it didn’t go well, try again!

Back of the envelope calculations

Sat, 08 Aug 2020 15:45:36 +0000

Calculate with exponents. A lot of back-of-the-envelope calculations are done with just coefficients and exponents, e.g. $c * 10^e$. Your goal is to get within an order of magnitude, which means getting $e$ right; $c$ matters a lot less. Only worrying about single-digit coefficients and exponents makes it much easier on a napkin (not to speak of all the zeros you avoid writing).

Latency Comparison Numbers
--------------------------
Source: https://gist.github.com/BlackHC/2d0a3a21542b524a7cf2f8eac977481e
Benchmarks for read: https://ssd.userbenchmark.com/, https://hdd.userbenchmark.com/

L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Snappy            3,000   ns        3 us
Read 1 MB sequentially from memory      20,000   ns       20 us  .02 ms  ~50GB/s DDR5
Read 1 MB sequentially from NVMe       100,000   ns      100 us   .1 ms  ~10GB/sec NVMe, 5x memory
Read 1 MB sequentially from SSD        300,000   ns      300 us   .3 ms  ~3GB/sec SSD, 15x memory, 3x NVMe
Round trip within same datacenter      500,000   ns      500 us   .5 ms
Read 1 MB sequentially from HDD      6,000,000   ns    6,000 us    6 ms  ~150MB/sec, 300x memory, 60x NVMe, 20x SSD
Send 1 MB over 1 Gbps network       10,000,000   ns   10,000 us   10 ms
Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms

Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

Cost Numbers
------------
Approximate numbers that should be consistent between Cloud providers.

What    Amount  $/Month
CPU          1      $10
Memory    1 GB       $1
SSD       1 GB     $0.1
HDD       1 GB    $0.01
S3, GCS   1 GB    $0.01
Network   1 GB    $0.01

1 request per second = 100k requests / day (exact 1 req/s = 86.4k req/day)
1 request per second = 2.5M requests / month
10 requests per second = 1M requests per day (exact 11.6 req/s = 1M req/day)
40 requests per second = 100 million requests per month
400 requests per second = 1 billion requests per month
6-7 world-wide round trips per second
2000 round trips per second within a data center
100k commands per second in an in-memory single-threaded data store
It’s typically the case that we can ignore any memory latency as soon as I/O over a 1Gbps network is involved. In cloud datacenters bandwidth is capped depending on the instance type. The Google Cloud docs list different limits for ingress and egress; for simplicity let’s assume 10Gbps for both.
- C4 and C4A lowest egress is 10Gbps
- C4 and C4A highest egress is 100Gbps
Writes are 40 times more expensive than reads, therefore architect for scaling writes!

Exercises

We get better at using this table by practicing, https://sirupsen.com/napkin/ has lots of exercises with different difficulty levels. The following exercises are a warmup to the ones in other places.

Let’s assume a data store with the following types:

In-memory data store: state stored in RAM memory (volatile).
Persistent data store: state stored on disk (non-volatile).

The data store can be located:

in-process: in the same computer.
out-of-process: in a different computer (so there’s the need for packet transmission over the network).

Read 1MB from an out-of-process data store, consider both in-memory and persistent caches (SSD), assume a 1Gbps and a 10Gbps network.

1Gbps
- (in memory) 0.02 ms/MB (read from memory) + 10^1 ms (transmission) = 10.02 ms
- (persistent) 0.3 ms/MB (read from SSD) + 10^1 ms (transmission) = 10.3 ms
10Gbps
- (in memory) 0.02 ms/MB (read from memory) + 10^0 ms (transmission) = 1.02 ms
- (persistent) 0.3 ms/MB (read from SSD) + 10^0 ms (transmission) = 1.3 ms
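The four estimates above follow one formula: time to read from the medium plus time to transmit. A minimal Go sketch, assuming the rounded figures from the latency table and assuming that transmission time scales inversely with link bandwidth (the function and constant names are made up for illustration):

```go
package main

import "fmt"

// Rounded figures taken from the latency table above.
const (
	memoryMsPerMB    = 0.02 // read 1 MB sequentially from memory
	ssdMsPerMB       = 0.3  // read 1 MB sequentially from SSD
	netMsPerMBAt1Gbps = 10.0 // send 1 MB over a 1 Gbps link
)

// remoteReadMs estimates the time in ms to read sizeMB from an out-of-process
// store: the read on the remote machine plus the transmission over the link.
func remoteReadMs(sizeMB, mediumMsPerMB, linkGbps float64) float64 {
	read := sizeMB * mediumMsPerMB
	transmit := sizeMB * netMsPerMBAt1Gbps / linkGbps
	return read + transmit
}

func main() {
	for _, gbps := range []float64{1, 10} {
		fmt.Printf("%2.0f Gbps: in-memory %.2f ms, persistent %.2f ms\n",
			gbps,
			remoteReadMs(1, memoryMsPerMB, gbps),
			remoteReadMs(1, ssdMsPerMB, gbps))
	}
}
```

In every case the transmission term dominates, which is why the medium barely matters once a slow network is involved.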

Read 5GB from HDD, SSD and RAM then write 5GB to the same medium. Assume no network IO needed

Read 5GB:

(memory) 5*10^3 MB * 0.02 ms/MB (memory read) = 100ms = 0.1s
(SSD) 5*10^3 MB * 0.3 ms/MB (SSD read) = 1500 ms = 1.5s
(HDD) 5*10^3 MB * 6 ms/MB (HDD read) = 30000 ms = 30s

Write 5GB, let’s assume that a write is 40x slower than a read:

(memory) 40 (write penalty) * 0.1s (read) = 4s
(SSD) 40 (write penalty) * 1.5s = 60s
(HDD) 40 (write penalty) * 30s = 1200s

Store information about 2B users including basic info and a profile picture

Basic info: name (20 chars), dob (10 chars), email (20 chars) = 50 bytes, $2 * 10^9 * 50 B = 100 GB$
Profile picture: 100 KB, $2 * 10^9 * 100 * 10^3 B = 200 TB$

Your SSD-backed database has a usage-pattern that rewards you with a 80% page-cache hit-rate (i.e. 80% of disk reads are served directly out of memory instead of going to the SSD). The median number of pages (e.g. InnoDB pages in MySQL) read to serve a query is 50 . What is the expected average query time from your database?

The default size of a page in InnoDB is 16KB , for each query we read 50 pages, 50 * 0.8 = 40 are read from memory and 10 from SSD

40 pages read from memory: 40 * 16KB * 0.02 ms/MB = 640KB * 10^-3 MB/KB * 0.02 ms/MB = 0.0128 ms
10 pages read from SSD: 10 * 16KB * 0.3 ms/MB = 160KB * 10^-3 MB/KB * 0.3 ms/MB = 0.048ms

In real life we just round the numbers, 1ms tops for the sum. It’s typically the case that we can ignore any memory latency as soon as I/O is involved on a slow network (1 Gbps).

1Gbps
- 50 pages (50 * 16KB = 800KB) transmitted in about 10ms, 1ms (read pages) + 10ms (transmission) = 11ms
10Gbps
- 1ms (read pages) + 1ms (transmission) = 2ms

How many commands-per-second can a simple, in-memory, single-threaded data store do? Assume that the commands don’t do any server side processing. e.g. Reading data is just reading data from the memory/disk and isn’t applying any algorithms on it.

I/O controls the number of ops/s. Assuming that each command transmits 1KB, which takes about 10 us on a 1Gbps network (10 ms per MB): $\frac{1s}{10 us} = 10^5$ = 100k ops/s

Amount of computing power to process 1PB everyday, assume that the time required for the computation of 1MB is 0.1s

10^9 MB * 10^-1 s/MB = 10^8 s
The above has to be computed every day, i.e. in about 10^5 s
- 10^8 s * 10^-5 day/s = 10^3 days (of work for a single machine)

We would need $10^3$ machines to get the work done. Assuming that the servers should be running at 50% capacity and leaving room for possible spikes, we can provision $4 * 10^3$ processes.

We have 3 storage devices, a 128GB DRAM as a 1st level cache, a 600GB flash memory as a 2nd level cache and a rotational disk for storage. With a random read workload, the rotational disk delivers 2000 reads/s with an 8 KB I/O size. How much time would it take to warm both caches in the ideal scenario?

Throughput in terms of data transferred over time: 2000 reads/s * 8 KB = 16 MB/s.
1st level cache (taking 1 GB = 1024 MB):
- Time to fill out the cache: 128 GB / 16 MB/s = 8192 s = ~2.3h
2nd level cache:
- Time to fill out the cache: 600 GB / 16 MB/s = 38400 s = ~10.67h

Introduction to Machine Learning

Thu, 25 Jun 2020 23:05:30 +0000

Machine Learning is great for:

Problems for which existing solutions require a lot of fine-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better than the traditional approach.
Complex problems for which using a traditional approach yields no good solution: the best Machine Learning techniques can perhaps find a solution.
Fluctuating environments: a Machine Learning system can adapt to new data.
Getting insights about complex problems and large amounts of data.

Examples of applications

Analyzing images of products in a production line to classify them (CNN)
Detecting tumors in brain scans (CNN)
News articles classification (NLP, RNN, CNN or Transformers)
Flagging unwanted content (NLP)
Summarizing long documents (Text Summarization)
Chatbot (NLP, NLU)
Forecasting revenue based on many performance metrics (Linear Regression, SVM, NN)
Segmenting clients based on their purchases (Clustering)
Recommending a product based on past purchases (Artificial NN)
AI bot (RL)

Types of ML

Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, RL)
Whether or not they can learn incrementally on the fly (online vs batch learning)
Whether they work by comparing new data points to known data points or by detecting patterns in the training data and building a model (instance-based vs model-based learning)

Machine Learning Glossary

Mon, 25 May 2020 15:04:38 +0000

iterations, batch, batch size and epoch

A batch is the set of examples used in one iteration, the number of examples in the set is the batch size.
For example, the batch size of SGD is 1, while the batch size of a mini-batch is usually between 10 and 1000. Batch size is usually fixed during training and inference; however, TensorFlow does permit dynamic batch sizes.
Each iteration is the span in which the system processes one batch of batch size examples.
An epoch spans sufficient iterations to process every example in the dataset, i.e. an epoch represents $\frac{N}{batchSize}$ training iterations where $N$ is the number of samples.

Batch, batch size, epoch
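As a small worked example of the relation between these terms, here is a sketch in Go. The helper name is made up, and rounding up is an assumption for the case where $N$ is not a multiple of the batch size (the last batch is then smaller):

```go
package main

import "fmt"

// iterationsPerEpoch returns how many batches are needed to process each of
// the n examples once. The last batch may be partial, hence the rounding up.
func iterationsPerEpoch(n, batchSize int) int {
	return (n + batchSize - 1) / batchSize
}

func main() {
	fmt.Println(iterationsPerEpoch(1000, 100)) // 10 full batches
	fmt.Println(iterationsPerEpoch(1000, 128)) // 7 full batches plus 1 partial: 8
	fmt.Println(iterationsPerEpoch(1000, 1))   // SGD: one example per iteration, 1000
}
```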

k-fold cross validation

From https://www.analyticsvidhya.com/blog/2018/05/improve-model-performance-cross-validation-in-python-r/

Randomly split your entire dataset into k “folds”
For each k-fold in your dataset, build your model on k – 1 folds of the dataset. Then, test the model to check the effectiveness for kth fold
Record the error you see on each of the predictions
Repeat this until each of the k-folds has served as the test set
The average of your k recorded errors is called the cross-validation error and will serve as your performance metric for the model

feature extraction

Merge several correlated features into one. Also see dimensionality reduction

sampling noise/bias

Sampling noise: nonrepresentative sample data as result of chance (typically when the sample is too small)
Sampling bias: nonrepresentative sample data as result of a flaw in the sampling method

Hyperparameter tuning

Mon, 25 May 2020 12:13:51 +0000

From https://colab.research.google.com/github/google/eng-edu/blob/master/ml/cc/exercises/linear_regression_with_synthetic_data.ipynb

Most machine learning problems require a lot of hyperparameter tuning. Unfortunately, we can’t provide concrete tuning rules for every model. Lowering the learning rate can help one model converge efficiently but make another model converge much too slowly. You must experiment to find the best set of hyperparameters for your dataset. That said, here are a few rules of thumb:

Training loss should steadily decrease, steeply at first, and then more slowly until the slope of the curve reaches or approaches zero.
If the training loss does not converge, train for more epochs.
If the training loss decreases too slowly, increase the learning rate. Note that setting the learning rate too high may also prevent training loss from converging.
If the training loss varies wildly (that is, the training loss jumps around), decrease the learning rate.
Lowering the learning rate while increasing the number of epochs or the batch size is often a good combination.
Setting the batch size to a very small batch number can also cause instability. First, try large batch size values. Then, decrease the batch size until you see degradation.
For real-world datasets consisting of a very large number of examples, the entire dataset might not fit into memory. In such cases, you’ll need to reduce the batch size to enable a batch to fit into memory.

Data structures for massive datasets

Sat, 09 May 2020 17:24:19 +0000

Count min sketch

Problem: given a stream of data with keys and values, how can we get the sum of all the values for a given key?

Approximate solution: assume that we have $d$ counter hash maps, each one with its own hash function. Every time we see a new key/value pair we add the value to the key’s counter in all $d$ hash maps (update). To get the sum of values for a key (estimate) we hash the key in each of the $d$ hash maps and return the minimum of those counters. Because the size of each counter hash map is finite we will have collisions, so a single hash map may report a higher sum than the true value; taking the minimum picks the counter least affected by collisions.

Images taken from: Algorithms and Data Structures for Massive Datasets

Update

Estimate

https://florian.github.io/count-min-sketch/
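A minimal Go sketch of update and estimate as described above. The row hash functions are an assumption made for illustration: each row uses FNV-1a seeded with the row number, which is simple but not the pairwise-independent family that the error bounds formally require.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// CountMinSketch keeps d rows of w counters; row i has its own hash function.
type CountMinSketch struct {
	rows [][]uint64
	w    uint64
}

func NewCountMinSketch(d, w int) *CountMinSketch {
	rows := make([][]uint64, d)
	for i := range rows {
		rows[i] = make([]uint64, w)
	}
	return &CountMinSketch{rows: rows, w: uint64(w)}
}

// slot derives the hash function of a row by prefixing the key with the row number.
func (s *CountMinSketch) slot(row int, key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte{byte(row)})
	h.Write([]byte(key))
	return h.Sum64() % s.w
}

// Update adds value to the key's counter in every row.
func (s *CountMinSketch) Update(key string, value uint64) {
	for i := range s.rows {
		s.rows[i][s.slot(i, key)] += value
	}
}

// Estimate returns the smallest of the key's counters. Collisions can only
// add to a counter, so the result is never below the true sum.
func (s *CountMinSketch) Estimate(key string) uint64 {
	lowest := s.rows[0][s.slot(0, key)]
	for i := 1; i < len(s.rows); i++ {
		if c := s.rows[i][s.slot(i, key)]; c < lowest {
			lowest = c
		}
	}
	return lowest
}

func main() {
	cms := NewCountMinSketch(4, 1024)
	cms.Update("a", 3)
	cms.Update("b", 5)
	cms.Update("a", 2)
	fmt.Println(cms.Estimate("a") >= 5, cms.Estimate("b") >= 5) // true true
}
```

Memory is fixed at d * w counters regardless of how many distinct keys the stream contains, which is the point of the structure.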

Applications

Top k elements: every time we update the count min sketch we also call estimate and insert the record into a min heap; when the heap’s size is greater than $k$ we remove the topmost (smallest) item from the heap.

Similarity of words, assume that we have a stream of pairs (word, context), the problem is to find if two words A, B are similar in meaning based on the context where they appear, the similarity of two words is computed with:

$$ PMI(A, B) = log \frac{P(A, B)}{P(A) P(B)} $$

The intuition behind this formula is that it measures how likely A and B are to occur close to each other (numerator) in comparison to how often they would co-occur if they were independent (denominator).

To answer queries we can use a matrix $M$ of size O(number of words * number of contexts) where the entry $M_{A,B}$ contains the number of times the word A appears in the context B. The problem is that the number of word-context pairs quickly gets out of hand.

The solution is to transform the matrix such that the word-context pair frequencies are stored in the count-min sketch, the occurrences of words and contexts are kept in other hash maps.

Range queries: use a segment tree where each node is a CMS

Images taken from: Algorithms and Data Structures for Massive Datasets

Update

Read

e-approximate heavy hitters: in a stream where the sum of all frequencies is $n$ (for example, if frequencies are all 1 then $n$ corresponds to the number of elements encountered thus far in the stream), output all the items that occur at least $n/k$ times. When $k=2$ this problem is known as the majority element.

If $n$ is known in advance we can process the array elements using a count-min sketch in a single pass, and remember an element once its estimated frequency (according to the count-min sketch) is at least $n/k$

If $n$ is not known in advance we use a min-heap. In a single pass we maintain the number of elements seen so far, $m$. When processing the next element $x$ we call update(x, 1) and then estimate(x); if the estimate is $\geq m/k$ we store $x$ in the heap. Also, whenever $m$ grows to the point that some object $x$ stored in the heap has a key less than $m/k$ (checkable in O(1) time via Find-Min), we delete $x$ from the heap (via Extract-Min). After finishing the pass, we output all the objects in the heap.

trending hashtags: quantify how different the currently observed activity is from an estimate of the expected activity. For each hashtag store how many times it’s shared in an X-minute window over the last Y days, $C(h, t)$, normalized to get $P(h, t)$, i.e. $P(h, t) = \tfrac{C(h, t)}{\sum_{i=0}^{n}C(h, t_i)}$. At a new time $t$ we can compute $C(h, t)$ and $P'(h, t)$, then use KL divergence to measure the difference between the probabilities

$$ S(h, t) = P(h, t) ln \left ( \frac{P(h, t)}{P'(h, t)} \right ) $$

The top $k$ trending hashtags can be computed with a heap

Based on https://instagram-engineering.com/trending-on-instagram-b749450e6d93

Bloom filter

Problem: test whether an element is definitely not in a set

Approximate solution: similar to count min sketch, but each cell holds a single bit instead of a counter. If any of the cells that the element hashes to is zero then we’re sure the element is not in the set; otherwise it might be in the set, and we need to test for existence with another (more expensive) data structure
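A minimal Go sketch of the idea. Two choices here are assumptions made to keep it short: the k probe positions are derived from one FNV-1a hash with double hashing, and the bit array is a `[]bool` rather than a packed bitset.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// BloomFilter is an array of m bits probed by k hash functions.
type BloomFilter struct {
	bits []bool // one bool per bit, for readability
	k    int
}

func NewBloomFilter(m, k int) *BloomFilter {
	return &BloomFilter{bits: make([]bool, m), k: k}
}

// positions derives k indices from the two halves of a 64-bit hash.
func (b *BloomFilter) positions(key string) []int {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)|1 // odd step so probes differ
	out := make([]int, b.k)
	for i := range out {
		out[i] = int((h1 + uint32(i)*h2) % uint32(len(b.bits)))
	}
	return out
}

// Add sets every bit the key hashes to.
func (b *BloomFilter) Add(key string) {
	for _, p := range b.positions(key) {
		b.bits[p] = true
	}
}

// MightContain reports false only when the key was definitely never added.
func (b *BloomFilter) MightContain(key string) bool {
	for _, p := range b.positions(key) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

func main() {
	bf := NewBloomFilter(1024, 3)
	bf.Add("partition-17")
	fmt.Println(bf.MightContain("partition-17")) // true: there are no false negatives
}
```

A `true` answer for a key that was never added is possible (a false positive), which is why a positive answer still needs the more expensive lookup.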

For more info read:

Applications

SSTable reads In the read path, Cassandra merges data on disk (in SSTables) with data in RAM (in memtables). To avoid checking every SSTable data file for the partition being requested we can query the SSTable bloom filter.

Reservoir sampling

Problem: given a stream of elements, we want to sample k random ones, without replacement and by using uniform probabilities

Solution: store the first $k$ elements; for the $i$-th element ($i > k$) add it to the reservoir with a probability of $k/i$, which is done by replacing a uniformly selected element in the reservoir

https://florian.github.io/reservoir-sampling/

Expectation maximization

Mon, 16 Mar 2020 21:21:00 +0000

K-means clustering

Suppose we have a data set $\{\textbf{x}_1, \ldots, \textbf{x}_N\}$ consisting of $N$ observations of a random $D$ dimensional variable. The goal is to partition the data set into some number $K$ of clusters. Formally, let $\{ \pmb{\mu} _1, \ldots, \pmb{\mu}_K \}$ be a set of $D$ dimensional vectors in which $\pmb{\mu}_k$ is associated with the $k^{th}$ cluster ($\pmb{\mu}_k$ can be thought of as the centres of the clusters). The goal is to find an assignment of data points to clusters so that the sum of the squared distances of each data point to its closest vector $\pmb{\mu}_k$ is a minimum.

Let $r_{nk} \in \{0, 1\}$, where $k = 1, \ldots, K$, describe the assignment of each data point to a cluster (1 if $\textbf{x}_n$ is assigned to cluster $k$ and 0 if not). We define a function called the distortion measure, given by

$$ \begin{equation} \label{distortion_measure} J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \magnitude{\mathbf{x}_n - \pmb{\mu}_k}^2 \end{equation} $$

which represents the sum of the squares of the distances of each data point to its assigned vector $\pmb{\mu}_k$. Our goal is to find values for $r_{nk}$ and the $\pmb{\mu}_k$ so as to minimize $J$. The algorithm is as follows:

Algorithm:

pick initial values for the $\pmb{\mu}$
minimize J with respect to $r_{nk}$ keeping the $\pmb{\mu}_k$ fixed (Expectation)
minimize J with respect to the $\pmb{\mu}_k$ keeping $r_{nk}$ fixed (Maximization)
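The three steps can be sketched in Go. To keep it short this assumes 1-dimensional points (so the distance is an absolute difference), initial centres passed in by the caller, and a stop as soon as the assignments no longer change; `kMeans1D` is a made-up name.

```go
package main

import (
	"fmt"
	"math"
)

// kMeans1D alternates the two minimization steps on 1-D points, starting from
// the given centres, and returns the final centres and assignments.
func kMeans1D(xs, centres []float64, maxIter int) ([]float64, []int) {
	mu := append([]float64(nil), centres...)
	assign := make([]int, len(xs))
	for iter := 0; iter < maxIter; iter++ {
		changed := iter == 0
		// Expectation: with the centres fixed, J is minimized by assigning
		// every point to its closest centre.
		for n, x := range xs {
			best := 0
			for k := range mu {
				if math.Abs(x-mu[k]) < math.Abs(x-mu[best]) {
					best = k
				}
			}
			if assign[n] != best {
				changed = true
			}
			assign[n] = best
		}
		if !changed {
			break
		}
		// Maximization: with the assignments fixed, J is minimized by moving
		// every centre to the mean of its points.
		sum := make([]float64, len(mu))
		count := make([]int, len(mu))
		for n, x := range xs {
			sum[assign[n]] += x
			count[assign[n]]++
		}
		for k := range mu {
			if count[k] > 0 { // an empty cluster keeps its previous centre
				mu[k] = sum[k] / float64(count[k])
			}
		}
	}
	return mu, assign
}

func main() {
	xs := []float64{1, 2, 3, 10, 11, 12}
	mu, assign := kMeans1D(xs, []float64{0, 5}, 100)
	fmt.Println(mu, assign) // [2 11] [0 0 0 1 1 1]
}
```

Starting from centres 0 and 5, the first pass assigns 3 to the second cluster; after the centres move to 1.5 and 9 it switches to the first cluster and the centres settle at 2 and 11.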

Multivariate gaussian distribution

For a random variable $X$ with a finite number of outcomes $x_1, x_2, \ldots, x_n$ occurring with probabilities $p_1, p_2, \ldots, p_n$ the expectation of $X$ is defined as:

$$ E[X] = \sum_{i=1}^{n} x_i p_i $$

The covariance between two variables $X, Y$ is defined as the expected value (or mean) of the product of their deviations from their individual expected values

$$ cov(X,Y) = E[(X - E[X])(Y - E[Y])] $$

When working with multiple variables $X_1, X_2, \ldots, X_n$ the covariance matrix denoted as $\Sigma$ is the $n \times n$ matrix whose $(i, j)$th entry is $cov(X_i, X_j)$

The density function of a univariate gaussian distribution is given by:

$$ p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi \sigma^2}}\exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right) $$

$(x - \mu)^2$ is never negative
the value $k(x, \mu) = -\frac{1}{2 \sigma^2}(x - \mu)^2$ is never positive, it’s a parabola pointing downward
the $\exp(k(x, \mu))$ part makes sure that the quantity is always > 0
the normalization factor $\frac{1}{\sqrt{2\pi \sigma^2}}$ multiplies $\exp(k(x, \mu))$ so that the integral equals 1

$$ \underbrace{\frac{1}{\sqrt{2\pi \sigma^2}}}_\text{normalization factor} \int_{-\infty}^{\infty} \exp \left(-\frac{1}{2\sigma^2}(x - \mu)^2\right) \, dx = 1 $$

A vector random variable $X = [X_1, \ldots, X_n]^T$ is said to have a multivariate gaussian distribution with mean $\mu \in \mathbf{R}^n$ and covariance matrix $\Sigma$ if its probability density function is given by

$$ p(x; \mu, \Sigma) = \frac{1}{(2 \pi) ^ {n/2} |\Sigma|^{1/2} } \exp \left ( -\frac{1}{2} (x - \mu)^T \Sigma ^{-1} (x - \mu) \right ) $$

Like in the univariate case, the argument of the exponential function is a downward opening bowl. The coefficient in front, where $|\Sigma|$ is the determinant of $\Sigma$, is a normalization factor used to ensure that

$$ \underbrace{\frac{1}{(2 \pi) ^ {n/2} |\Sigma|^{1/2} }}_\text{normalization factor} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} \exp \left ( -\frac{1}{2} (x - \mu)^T \Sigma ^{-1} (x - \mu) \right ) dx_1 dx_2 \cdots dx_n = 1 $$

Gaussian mixture models and EM

https://www.youtube.com/watch?v=qMTuMa86NzU

Bayesian Networks

Thu, 05 Mar 2020 18:11:00 +0000

Introduction

A Bayesian network is a directed graph in which each node is annotated with quantitative probability information. The full specification is as follows:

Each node corresponds to a random variable, which may be discrete or continuous.
A set of directed links or arrows connects pairs of nodes. If there is an arrow from node $X$ to node $Y$, $X$ is said to be a parent of $Y$. The graph has no directed cycles (and hence is a directed acyclic graph, or DAG).
Each node $X_i$ has a conditional probability distribution $P(X_i|Parents(X_i))$ that quantifies the effect of the parents on the node.

Example Bayesian Network

Semantics of a bayesian network:

The network is a representation of a joint probability distribution
Encoding of a collection of conditional independence statements

Full joint distribution

Given by:

$$ \begin{align} P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i|Parents(X_i)) \end{align} $$

Which can be rewritten as:

$$ \begin{align*} P(x_1, \ldots, x_n) &= P(x_n | x_{n-1}, \ldots x_1) P(x_{n-1}, \ldots, x_1) \\ &= P(x_n | x_{n-1}, \ldots x_1) P(x_{n-1} | x_{n-2}, \ldots, x_1) \cdots P(x_2 | x_1) P(x_1) \\ &= \prod_{i=1}^{n} P(x_i | x_{i-1}, \ldots x_1) \;\; \text{(identity known as chain rule)} \end{align*} $$

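Comparing the chain rule with the joint distribution of the network, the network specification is equivalent to asserting that

$$ P(X_i | X_{i-1}, \ldots X_1) = P(X_i|Parents(X_i)) $$

provided that $Parents(X_i) \subseteq \{ X_{i-1}, \ldots, X_1 \}$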

Conditional independence relations in bayesian networks

Steps to determine if two variables are conditionally independent

Draw the ancestral graph Construct the “ancestral graph” of all variables mentioned in the probability expression. This is a reduced version of the original net, consisting only of the variables mentioned and all of their ancestors (parents, parents’ parents, etc.)
Moralize the ancestral graph by marrying the parents For each pair of variables with a common child, draw an undirected edge (line) between them. (If a variable has more than two parents, draw lines between every pair of parents.)
Disorient the graph by replacing the directed edges (arrows) with undirected edges (lines).
Delete the givens and their edges. If the independence question had any given variables, erase those variables from the graph and erase all of their connections, too.
Given a query between two variables A, B
If the variables are disconnected then they’re independent
If the variables are connected then they’re dependent
If the variables are missing because they were a given, they’re independent

In the following example we skip step 1 and moralize the entire bayesian network

Example Bayesian Network

Example Bayesian Network Moralized

Some conditional independence queries ($\ci$ meaning conditionally independent of), delete the givens and their edges to check the connection between the query variables:

Is $A \ci B \given C$? No, there is a path A-B
Is $A \ci E \given C$? No, there is a path A-B-D-E
Is $A \ci E \given C,D$? Yes, there isn’t a path between A and E
Is $A \ci D \given C$? No, there is a path A-B-D
Is $B \ci E \given C$? No, there is a path B-D-E
Is $A \ci F \given C$? No, there is a path A-B-D-E-F
Is $A \ci F \given C,D$? Yes, there isn’t a path between A and F

Exact inference

Compute the posterior probability distribution for a set of query values given some observed event (set of evidence variables)

By enumeration

Any conditional probability can be computed by summing terms from the full joint distribution

$$ \textbf{P}(X \given \textbf{e}) = \alpha \textbf{P} (X, \textbf{e}) = \alpha \sum_{y} \textbf{P} (X, \textbf{e}, \textbf{y}) $$

Working with the example below we can answer some queries:

Example Bayesian Network

$$ \newcommand\g[1]{\color{gray}{#1}} $$

$$ \begin{align*} P(c|a) &= \alpha \sum_{b} P(a) P(b) P(c \given a,b) \\ &= \alpha P(a) \sum_{b} P(b) P(c \given a,b) \\ &= \alpha P(a) \sum_{b} \bordermatrix{\g{b} & \g{\neg b}}{}{\begin{bmatrix} 0.4 & 0.6 \end{bmatrix}} \bordermatrix{\g{b} & \g{\neg b}}{\g{c} \\ \g{\neg c}}{\begin{bmatrix} 0.55 & 0.5 \\ 0.45 & 0.5 \end{bmatrix}} \\ &= \alpha P(a) \sum_{b} \bordermatrix{\g{b} & \g{\neg b}}{\g{c} \\ \g{\neg c}}{\begin{bmatrix} 0.22 & 0.3 \\ 0.18 & 0.3 \end{bmatrix}} \\ &= \alpha \; 0.2 \; \bordermatrix{}{\g{c} \\ \g{\neg c}}{\begin{bmatrix} 0.52 \\ 0.48 \end{bmatrix}} \\ &= \alpha \; \bordermatrix{}{\g{c} \\ \g{\neg c}}{\begin{bmatrix} 0.104 \\ 0.096 \end{bmatrix}} \\ &= \alpha [.104, .096] \\ &= [\textbf{0.52}, 0.48] \end{align*} $$

$$ \begin{align*} P(e|\neg c, b) &= \alpha \sum_{a,d,f} P(a) P(b) P(\neg c \given a,b) P(d \given b) P(e \given \neg c,d) P(f \given e) \\ &= \alpha P(b) \sum_{a} P(a) P(\neg c \given a,b) \sum_{d} P(d \given b) P(e \given \neg c,d) \sum_{f} P(f \given e) \\ &= \alpha P(b) \sum_{a} P(a) P(\neg c \given a,b) \sum_{d} P(d \given b) P(e \given \neg c,d) \sum_{f} \bordermatrix{\g{f} & \g{\neg f}}{\g{e} \\ \g{\neg e}}{\begin{bmatrix} 0.7 & 0.3 \\ 0.2 & 0.8 \end{bmatrix}} \\ &= \alpha P(b) \sum_{a} P(a) P(\neg c \given a,b) \sum_{d} P(d \given b) P(e \given \neg c,d) \bordermatrix{}{\g{e} \\ \g{\neg e}}{\begin{bmatrix} 1 \\ 1 \end{bmatrix}} \\ &= \alpha P(b) \sum_{a} P(a) P(\neg c \given a,b) \sum_{d} \bordermatrix{\g{d} & \g{\neg d}}{}{\begin{bmatrix} 0.6 & 0.4 \end{bmatrix}} \bordermatrix{\g{d} & \g{\neg d}}{\g{e} \\ \g{\neg e}}{\begin{bmatrix} 0.45 & 0.2 \\ 0.55 & 0.8 \end{bmatrix}} \bordermatrix{}{\g{e} \\ \g{\neg e}}{\begin{bmatrix} 1 \\ 1 \end{bmatrix}} \\ &= \alpha P(b) \sum_{a} P(a) P(\neg c \given a,b) \sum_{d} \bordermatrix{\g{d} & \g{\neg d}}{\g{e} \\ \g{\neg e}}{\begin{bmatrix} 0.27 & 0.08 \\ 0.33 & 0.32 \end{bmatrix}} \\ &= \alpha P(b) \sum_{a} P(a) P(\neg c \given a,b) \bordermatrix{}{\g{e} \\ \g{\neg e}}{\begin{bmatrix} 0.35 \\ 0.65 \end{bmatrix}} \\ &= \alpha P(b) \sum_{a} \bordermatrix{}{\g{a} \\ \g{\neg a}}{\begin{bmatrix} 0.2 \\ 0.8 \end{bmatrix}} \bordermatrix{\g{\neg c, b}}{\g{a} \\ \g{\neg a}}{\begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix}} \bordermatrix{}{\g{e} \\ \g{\neg e}}{\begin{bmatrix} 0.35 \\ 0.65 \end{bmatrix}} \\ &= \alpha P(b) \sum_{a} \bordermatrix{}{\g{a} \\ \g{\neg a}}{\begin{bmatrix} 0.09 \\ 0.44 \end{bmatrix}} \bordermatrix{}{\g{e} \\ \g{\neg e}}{\begin{bmatrix} 0.35 \\ 0.65 \end{bmatrix}} \quad \text{the two vectors are indexed by different variables, so we take the outer product using $a^T$} \\ &= \alpha P(b) \sum_{a} \bordermatrix{\g{a} & \g{\neg a}}{\g{e} \\ \g{\neg e}}{\begin{bmatrix} 0.0315 & 0.154 \\ 0.0585 & 0.286 \end{bmatrix}} \\ &= 
\alpha \; \bordermatrix{}{\g{e} \\ \g{\neg e}}{\begin{bmatrix} 0.1855 \\ 0.3445 \end{bmatrix}} \quad \text{$P(b)$ is a constant because $b$ is evidence, so it is absorbed into $\alpha$} \\ &= \alpha [0.1855, 0.3445] \\ &= [\textbf{0.35}, 0.65] \end{align*} $$
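Inference by enumeration is easy to check mechanically. The Go sketch below recomputes $P(e \given \neg c, b)$ by summing the joint distribution over the hidden variables $a$, $d$ and $f$. The probabilities are the ones that appear in the derivation; the variable and function names are made up, and $P(b)$ is left out because it is a constant that the normalization removes.

```go
package main

import "fmt"

// Conditional probabilities read off the derivation (only those the query needs).
var (
	pA          = 0.2                                       // P(a)
	pNotCGivenB = map[bool]float64{true: 0.45, false: 0.55} // P(¬c | a, b), keyed by a
	pDGivenB    = 0.6                                       // P(d | b)
	pEGivenNotC = map[bool]float64{true: 0.45, false: 0.2}  // P(e | ¬c, d), keyed by d
	pFGivenE    = map[bool]float64{true: 0.7, false: 0.2}   // P(f | e), keyed by e
)

// prob returns p when the variable is true and 1-p when it is false.
func prob(p float64, value bool) float64 {
	if value {
		return p
	}
	return 1 - p
}

// joint returns the product of the network's factors for one full assignment,
// with ¬c and b fixed as evidence.
func joint(a, d, e, f bool) float64 {
	return prob(pA, a) * pNotCGivenB[a] *
		prob(pDGivenB, d) * prob(pEGivenNotC[d], e) * prob(pFGivenE[e], f)
}

// posteriorE returns [P(e | ¬c, b), P(¬e | ¬c, b)].
func posteriorE() [2]float64 {
	var unnorm [2]float64 // index 0 holds e, index 1 holds ¬e
	values := []bool{true, false}
	for i, e := range values {
		for _, a := range values {
			for _, d := range values {
				for _, f := range values {
					unnorm[i] += joint(a, d, e, f)
				}
			}
		}
	}
	total := unnorm[0] + unnorm[1]
	return [2]float64{unnorm[0] / total, unnorm[1] / total}
}

func main() {
	p := posteriorE()
	fmt.Printf("P(e | ¬c, b) = %.4f, P(¬e | ¬c, b) = %.4f\n", p[0], p[1])
}
```

The sum over $a$ contributes the same factor (0.09 + 0.44 = 0.53) to both $e$ and $\neg e$, so after normalization only the sum over $d$ matters and the result is [0.35, 0.65]. This agrees with the independence check, where A and E are disconnected once C and B are given.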

Reading

Kafka

Sat, 29 Feb 2020 15:36:00 +0000

Kafka

https://www.slideshare.net/mumrah/kafka-talk-tri-hug

Key Choices

pub/sub messaging pattern
messages are persistent (stored on disk)
consumers keep their own state (stored in zookeeper)

Technology Summary

Concept	Notes
Brokers	Receive messages from producers (sequential write, push) and deliver messages to consumers (sequential read, pull). Messages are flushed to append-only log files.
Topics	Logical collection of partitions mapped across many brokers.
Partition	Physical append-only log files; a broker contains some of the partitions for a topic.
Replication	Partitions are replicated; one broker is the leader and all writes/reads must go through it (replication is for fault tolerance only). Replication can be tuned to write to N replicas.
Producer	Responsible for load balancing messages among brokers; they can discover all brokers from a single one. High level api: `Producer#send(String topic, K key, V value)`. Determines the partition based on the key (default: hash mod), e.g. `send("A", "foo", message)` in the example below: `hash("foo") mod 2`. No total ordering across partitions. Guaranteed ordering inside the partition, which is useful if the key is a PK: all the messages related to that key will be ordered.
Consumer	Request a range of messages from a broker; responsible for their own state, i.e. their own iterator. High level api: `Map<String, List<KafkaStream>> Consumer.connector(Collections.singletonMap("topic", nPartitions))`. Blocking/non blocking behavior.
Consumer Group	Multiple consumers can be part of a consumer group coordinated with zookeeper; in a group each partition will be consumed by exactly one consumer. Consequence: broadcast/pubsub (if all the consumer instances have different consumer groups) and load balance/queue (if all the consumer instances have the same consumer group).
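The "hash mod" partition choice in the Producer row can be sketched in Go. This only illustrates the idea: the FNV-1a hash and the `partitionFor` helper are assumptions, not the hash function or API of the Kafka client.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a message key to one of n partitions using hash(key) mod n.
func partitionFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

func main() {
	// Every message with the same key lands in the same partition, which is
	// what gives per-key ordering.
	for _, key := range []string{"foo", "bar", "foo"} {
		fmt.Printf("key=%q -> partition %d\n", key, partitionFor(key, 2))
	}
}
```

The mapping depends on the number of partitions, so changing that number sends existing keys to different partitions.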

Broker - Partition - Topic

Consumer Groups

Useful numbers

50MB/s (producer throughput), 100 MB/s (consumer throughput)
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

Applications

Notification: A updates a record and sends a “record updated” message, B consumes the message and asks A for the updated record to sync its copy
Stream Processing: Data is produced and written into kafka, consumer groups process these messages and write them back to kafka

Memtable & SSTable (Sorted String Table)

Sat, 29 Feb 2020 15:04:00 +0000

Memtable

A data structure that holds data in memory before it’s flushed into disk.

For a write operation we write to memory, which is fast compared to persistent storage. Eventually a memtable will surpass a predefined memory threshold and will need to be flushed to disk. While we could define our own write format, we can write the memtable to disk in sorted order as an SSTable (see SSTable below). Once data is written to disk it becomes immutable (the SSTable cannot be modified); therefore, new writes go to a new memtable, and operations like update or delete on data from the previous memtable are recorded as new entries in the new memtable.

For a read operation we first check in the current memtable, if the read can’t be fulfilled by the current memtable (maybe the data exists but it’s no longer in memory because it was flushed to disk) then we check recently created SSTables in decreasing creation order until we find the desired record (or we might not find it at all). Because the SSTable is sorted it enables faster reads because we can use binary search to find it in the file.

A memtable can be implemented with a Red-Black Tree, a SkipList, a HashSkipList, a HashLinkList. For tradeoffs on these implementations please check the RocksDB wiki .

Applications

Example: design a Timeseries Database with the following requirements:

For the current time, write a given value for a given list of labels, a label is a pair labelKey=labelValue e.g. [method=http, type=POST, statusCode=200] 1 (value is 1)
The labels can be arbitrary strings
Reads will be for an arbitrary combination of labels and it’ll cover a range of time (common)
Reads will be for an arbitrary combination of labels and it’ll cover a point in time (rare)
Write heavy system
99% of the data is never queried after 24h

A memtable fits this problem because it’s a write heavy system (therefore we need fast writes), the common scenario of reads for a range of time would also fit a linked list (either SkipList or HashLinkList), after a memtable is written to disk in a SSTable it enables slower reads for old data which is an acceptable tradeoff because 99% of the data is never queried after 24h.

Let’s define an entry to be a data structure that holds a collection of labels, a single value and a time.

type Entry struct {
	// id is the time an entry was created (not threadsafe)
	id time.Time
	// labels are the labels that identify the entry.
	labels map[string]string
	// value is the entry value.
	value any
	// next is a pointer to the next Entry.
	next *Entry
}

// example:
entry := &Entry{
	id:     time.Now(),
	labels: map[string]string{
		"method":     "http",
		"type":       "POST",
		"statusCode": "200",
	},
	value:  1,
}

Our memtable is a collection of entries stored in a linked list, the memtable has a pointer to the head and the tail of the linked list. index is explained later in this article.

type Memtable struct {
	// head is the head of the linked list.
	head *Entry
	// tail is the tail of the linked list.
	tail *Entry
	// index is a reverse index of a label to an entry.
	index map[string][]*Entry
}

On write a new entry is added at the tail of the memtable linked list.

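func (m *Memtable) Write(labels map[string]string, value any) {
	e := &Entry{
		id:     time.Now(),
		labels: labels,
		value:  value,
	}

	// index the entry under every label, the map is created on first use
	if m.index == nil {
		m.index = make(map[string][]*Entry)
	}
	for k, v := range labels {
		key := m.encode(k, v)  // encode creates a unique key in the HashMap
		m.index[key] = append(m.index[key], e)
	}

	// the first entry becomes both the head and the tail
	if m.tail == nil {
		m.head, m.tail = e, e
		return
	}
	// otherwise append at the tail
	m.tail.next = e
	m.tail = e
}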

To find entries by label(s) we can iterate the linked list from head to tail collecting the entries that match our labels, which is O(n) where n is the size of the linked list. To improve the performance of a query we can use an index that maps a label to the entries that have it, which reduces the find operation to O(k) where k is the max number of entries mapped to a label. The tradeoff is space and the fact that we have to update the index on every write.

With the above we got values for single labels, for multiple labels we combine the results by doing an intersection.

func (m *Memtable) Read(labels map[string]string) []any {
	// read temporary results for every label
	entriesGroup := make([][]*Entry, 0)
	for k, v := range labels {
		key := m.encode(k, v)
		entriesGroup = append(entriesGroup, m.index[key])
	}

	// intersect
	if len(entriesGroup) == 0 {
		return make([]any, 0)
	}
	intersectedEntries := entriesGroup[0]
	for i := 1; i < len(entriesGroup); i += 1 {
		intersectedEntries = intersect(intersectedEntries, entriesGroup[i])
	}

	out := make([]any, 0)
	for _, entry := range intersectedEntries {
		out = append(out, entry.value)
	}
	return out
}

SSTable

An immutable data structure that stores a large number of key:value pairs sorted by key

Advantages over simple hash indexes

Merging SSTables is similar to doing a merge sort
To find if a key exists we don’t need an index of all the keys in memory, instead we can keep an index for every few kilobytes and then perform a scan (sparse index)
Records in a key range can be grouped into a block and compressed before writing to disk, the sparse index then only needs to point to the start of each compressed block
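To make the merge sort comparison concrete, here is a minimal sketch of merging two SSTables. It assumes each SSTable is an array of `[key, value]` pairs sorted by key and that, when a key appears in both, the value from the newer SSTable wins; `mergeSSTables` is a name made up for this example.

```javascript
// Merge two SSTables (arrays of [key, value] pairs sorted by key).
// When a key is in both tables the entry from `newer` is kept.
function mergeSSTables(newer, older) {
  const out = []
  let i = 0
  let j = 0
  while (i < newer.length && j < older.length) {
    const a = newer[i]
    const b = older[j]
    if (a[0] === b[0]) {
      // same key: keep the newer entry, drop the older one
      out.push(a); i += 1; j += 1
    } else if (a[0] < b[0]) {
      out.push(a); i += 1
    } else {
      out.push(b); j += 1
    }
  }
  // copy whatever is left in either table
  while (i < newer.length) { out.push(newer[i]); i += 1 }
  while (j < older.length) { out.push(older[j]); j += 1 }
  return out
}

const merged = mergeSSTables(
  [['a', 10], ['c', 3]],
  [['a', 1], ['b', 2]]
)
```

Both inputs are read once from start to end, which is why the merge works on files much larger than memory.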

Cassandra

Fri, 28 Feb 2020 20:47:00 +0000

Engine

Features

Consistent hashing
Replication factor, replicas of the data across the cluster
Consistency level controlled for each query
Up to 2 billion key-value pairs in a row

Cassandra replication

Replication factor = 3
Consistency level = QUORUM
A client talks to any node, the node hashes the partition key and finds the location of the data
Data is read from all the replicas waiting for responses until we reach a quorum
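As a quick check of the numbers, a quorum is a majority of the replicas. A small sketch (the function name `quorum` is just for illustration):

```javascript
// Number of replicas that must respond for a QUORUM read or write.
function quorum(replicationFactor) {
  return Math.floor(replicationFactor / 2) + 1
}

const needed = quorum(3)
```

With a replication factor of 3 the coordinator waits for 2 replicas, so one replica can be down without failing the request.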

Cassandra write

Acknowledged when we write to both the commit log (append only) and the memtable
When the memtable becomes full it’s flushed into an SSTable
Periodically SSTables are merged

Cassandra read

Check if the key is in the in-memory row cache
Query the bloom filter of each existing SSTable, if it says the record doesn’t exist then skip that SSTable
If the bloom filter says that there may be data, check the in-memory key cache
On a miss get the data from the SSTable and merge it with the data in the memtable, write the key to the in-memory key cache and the merged result to the in-memory row cache
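The bloom filter step is what lets a read skip most SSTables. Below is a toy bloom filter to show the idea, it is not how Cassandra implements it: the `BloomFilter` class, the two seeded FNV-1a-style hashes and the default size are all assumptions for this sketch.

```javascript
class BloomFilter {
  constructor(size = 1024) {
    this.size = size
    this.bits = new Uint8Array(size)
  }

  // Seeded FNV-1a-style hash over the UTF-16 code units of the key,
  // different seeds give different bit positions for the same key.
  hash(key, seed) {
    let h = (0x811c9dc5 ^ seed) >>> 0
    for (let i = 0; i < key.length; i += 1) {
      h ^= key.charCodeAt(i)
      h = Math.imul(h, 0x01000193) >>> 0
    }
    return h % this.size
  }

  add(key) {
    this.bits[this.hash(key, 1)] = 1
    this.bits[this.hash(key, 2)] = 1
  }

  // false means "definitely not in this SSTable", true means "maybe"
  mightContain(key) {
    return this.bits[this.hash(key, 1)] === 1 &&
      this.bits[this.hash(key, 2)] === 1
  }
}

const filter = new BloomFilter(64)
const before = filter.mightContain('user:1')
filter.add('user:1')
const after = filter.mightContain('user:1')
```

A `false` answer is always correct, a `true` answer may be a false positive, which is why the read still has to check the SSTable afterwards.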

Data modeling

Goals

spread data evenly around the cluster
minimize the number of partitions read
keep partitions manageable

Process

Identify initial entities and relationships
Key attributes (map to PK columns)
Equality search attributes (map to the beginning of the PK)
Inequality search attributes (map to clustering columns)
Other attributes
- Static attributes are shared within a given partition

primary key = partition key + clustering columns

Legend:

K Partition key
C Clustering key and their ordering (ascending or descending)
S Static columns, fixed and shared per partition

Cassandra table structure

Validation

Is data evenly spread?
1 partition per read?
Are writes (overwrites) possible?
How large are the partitions? Let’s assume that each partition should have at most 1M cells, $n_{cells} = n_{rows} * (n_{cols} - n_{K} - n_{S}) + n_{S} < 1M$
How much data duplication?
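The partition size formula is easy to turn into a small calculator. This is a sketch with made-up names (`partitionCells`, `maxRows`), the 1M cell limit is the assumption stated in the list above.

```javascript
// n_cells = n_rows * (n_cols - n_K - n_S) + n_S
function partitionCells({ rows, cols, partitionKeys, staticCols }) {
  return rows * (cols - partitionKeys - staticCols) + staticCols
}

// Largest number of rows that keeps a partition strictly under `limit` cells.
function maxRows({ cols, partitionKeys, staticCols }, limit = 1e6) {
  const cellsPerRow = cols - partitionKeys - staticCols
  return Math.floor((limit - 1 - staticCols) / cellsPerRow)
}

// a table with 5 columns, 1 partition key column and 1 row per partition
const cells = partitionCells({ rows: 1, cols: 5, partitionKeys: 1, staticCols: 0 })
// a table with 7 columns, 1 partition key column and 1 static column
const rowsLimit = maxRows({ cols: 7, partitionKeys: 1, staticCols: 1 })
```

The first call gives 4 cells per partition, the second one gives a bit under 200k rows per partition.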

Examples

Store books by ISBN

Attribute	Special
isbn	K
title
author
genre
publisher

Is data evenly spread? Yes
1 partition per read? Yes
Are writes (overwrites) possible? Yes
How large are the partitions? $1 * (5 - 1 - 0) + 0 < 1M$
How much data duplication? 0

Register a user uniquely identified by an email/password, we also want their fullname. They will be accessed by email and password or by UUID

Attribute	Special
email	K
password	C
fullname
uuid

Q1: find users by login info

Q3: find users by email (to guarantee uniqueness)

Is data evenly spread? Yes
1 partition per read? Yes
Are writes (overwrites) possible? Yes
How large are the partitions? $1 * (4 - 1 - 0) + 0 < 1M$
How much data duplication? 0

Attribute	Special
uuid	K
fullname

Q2: get users by UUID

Is data evenly spread? Yes
1 partition per read? Yes
Are writes (overwrites) possible? Yes
How large are the partitions? $1 * (2 - 1 - 0) + 0 < 1M$
How much data duplication? 0

Find books a logged in user has read sorted by title and author

Attribute	Special
uuid	K
title	C
author	C
fullname	S
ISBN
genre
publisher

Is data evenly spread? Yes
1 partition per read? Yes
Are writes (overwrites) possible? Yes
How large are the partitions? (up to 200k book reads per user)

$$ \begin{align*} n_{books} * (7 - 1 - 1) + 1 & < 1M \\\\ n_{books} & < \frac{1M - 1}{5} \\\\ n_{books} & < 200k \end{align*} $$

How much data duplication? 0

Interaction of every user in the website

Attribute	Special
uuid	K
time	C (desc)
element
type

Is data evenly spread? Yes
1 partition per read? Yes
Are writes (overwrites) possible? Yes
How large are the partitions? (up to 333k actions per user, 333k may be too low a number of actions to store, therefore we should store actions by bucket)

$$ \begin{align*} n_{actions} * (4 - 1 - 0) + 0 & < 1M \\\\ n_{actions} & < 333K \end{align*} $$

How much data duplication? 0

Attribute	Special
uuid	K
month	K
time	C (desc)
element
type

Is data evenly spread? Yes
1 partition per read? Yes
Are writes (overwrites) possible? Yes
How large are the partitions? (up to 333k actions per user per bucket, here a month)

1 year  = 333k / 365 / 24 = 38 actions / h
1 month = 333k / 30 / 24  = 462 actions / h (most realistic case)
1 week  = 333k / 7 / 24   = 1984 actions / h

$$ \begin{align*} n_{actions} * (5 - 2 - 0) + 0 & < 1M \\\\ n_{actions} & < 333K \end{align*} $$

How much data duplication? 0

Partitioning

Mon, 08 Jan 2018 22:43:20 +0000

With replication

Copies of each partition are stored in multiple nodes

Partitioning strategies

unfair partitioning may lead to hotspots e.g. nodes with more data than the others
assign records randomly: data is spread evenly but there’s no way to know which node has a record, so a read has to query all the nodes
by range e.g. given a dictionary with sorted keys node 1 can have words from A -> B, node 2 B -> C, etc
- within each partition keys are stored in order
- range scans are easy
- may lead to hotspots e.g. if all the keys belong to the range A -> B node 1 will be the hotspot
by hash of the key e.g. take the hash of the key and assign it to a range of hashes (consistent hashing)
- distributes data evenly
- no range scans
- Cassandra allows a multi-column primary key, the first part of the key is hashed to determine the partition and the other columns are used as a concatenated index to sort the data in the SSTables
take a hybrid approach with skewed workloads e.g. where all the writes/reads are for the same key
- append 2 random digits to the key e.g. key00, key01, …, key99 so that writes are spread over 100 keys, the tradeoff is that a read has to query all 100 keys and combine the results
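Here is a small sketch of partitioning by hash and of spreading a hot key. Everything in it is an assumption for illustration: `fnv1a` is a simple FNV-1a-style hash standing in for whatever hash function the database uses, and a plain modulo is used for brevity instead of the hash ranges of consistent hashing.

```javascript
// FNV-1a-style 32-bit hash over the UTF-16 code units of a string.
function fnv1a(str) {
  let hash = 0x811c9dc5
  for (let i = 0; i < str.length; i += 1) {
    hash ^= str.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

// Map a key to one of `partitions` partitions.
function partitionFor(key, partitions) {
  return fnv1a(key) % partitions
}

// Spread a hot key over 100 sub-keys: key00, key01, ..., key99.
// A write picks one of them at random, a read has to fetch all of them.
function saltedKeys(key) {
  const keys = []
  for (let i = 0; i < 100; i += 1) {
    keys.push(key + String(i).padStart(2, '0'))
  }
  return keys
}

const partition = partitionFor('user:42', 8)
const salted = saltedKeys('key')
```

The same key always maps to the same partition, while the salted sub-keys of a hot key are hashed independently, so they usually land on more than one partition.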

Rebalancing partitions

Things that change in a database over time:

more throughput = more cpu,ram,disk = vertical scaling
a machine fails and other machines need to take over the machine’s responsibilities

Rebalancing requirements:

load should be fairly shared
the DB should accept read/writes while it’s being rebalanced
no more data than necessary should be moved (minimize IO)

Strategies:

fixed number of partitions (Riak)
- when # of partitions > # of nodes, assign multiple partitions to each node
- when a new node is added it steals some partitions from every other node
- when a node is removed it distributes its partitions to every other node
- the # of partitions is fixed when the DB is set up and not changed afterward
- choosing the # of partitions is difficult if the size of the dataset varies
dynamic partitioning (MongoDB)
- a partition is split once it reaches a limit or merged if it has very little data
- number of partitions adapt to the size of the dataset
- an empty DB starts with a single partition and all the writes are written to the same node i.e. the other nodes are idle
partitioning proportionally to nodes (Cassandra)
- fixed number of partitions per node, partitions grow without affecting the nodes
- when a node is added the partitions become smaller and the data is redistributed
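The fixed number of partitions strategy can be sketched as follows. The representation is an assumption: `assignment[p]` is the name of the node that owns partition `p`, and `addNode` is a made-up helper that lets the new node steal partitions from the most loaded nodes until it owns its fair share.

```javascript
// assignment[p] is the node that owns partition p.
// Returns a new assignment where `newNode` owns its fair share.
function addNode(assignment, newNode) {
  const nodes = new Set(assignment)
  const target = Math.floor(assignment.length / (nodes.size + 1))
  const counts = new Map()
  for (const node of assignment) {
    counts.set(node, (counts.get(node) || 0) + 1)
  }

  const out = assignment.slice()
  for (let stolen = 0; stolen < target; stolen += 1) {
    // steal from the node that currently owns the most partitions
    let donor = null
    for (const [node, count] of counts) {
      if (donor === null || count > counts.get(donor)) { donor = node }
    }
    out[out.indexOf(donor)] = newNode
    counts.set(donor, counts.get(donor) - 1)
  }
  return out
}

const beforeAdd = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
const afterAdd = addNode(beforeAdd, 'c')
```

Only the stolen partitions move to the new node, every other partition stays where it was, which matches the requirement of not moving more data than necessary.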

Request routing

Problem: how does a client know which node to connect to?

Allow clients to connect to any node via a round-robin load balancer
- Cassandra and Riak use a gossip protocol to inform of changes in the cluster
- A request can be sent to any node which forwards it to the appropriate node
- Puts more complexity on the DB to avoid a dependency
Send requests to a routing tier acting as a partition aware load-balancer
- ZooKeeper is a coordination service that keeps track of the cluster metadata mapping partitions to nodes, whenever a partition is created/updated/removed ZooKeeper notifies the routing tier
Client is aware of the partition and doesn’t need an intermediary

Non Functional Requirements

Tue, 02 Jan 2018 02:14:56 +0000

Reliability & Availability

A system should be resilient (fault-tolerant) and performant under expected load

Strategies

design for failure and trigger them deliberately e.g. kill processes without a warning
consider hardware faults such as blackouts, hard disk crashes, add redundancy as necessary
consider software faults such as
- processes that slow down or that return corrupted responses
- fault cascading where a fault triggers faults in other components
measure/monitor the system to identify faults

Scalability

A system should be able to handle load increases

Queries per second (QPS) to a web server
Ratio of read/writes in a DB
Cache hit/miss rate
Number of simultaneous users in a realtime system

Handling load

scaling up (vertical scaling), simple
scaling out (horizontal scaling), complex
manual scale, for predictable systems, simple
elastic scale, add resources as load increases, for unpredictable systems, complex

Performance

throughput: number of requests processed per second
latency: time to handle the request
response time: latency + network/queue delays

For the response time we use percentiles: given the response times gathered for a set of requests in a period of time, sort them from fastest to slowest and take the value at the given percentile. The common metrics are p50, p95, p99 and p999 (used in SLAs)

When a request involves parallel calls to multiple services, the response time is equal to the response time of the slowest service
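As an illustration, percentiles can be computed with the nearest-rank method, which is one of several common definitions. The function names and the sample data below are made up.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// the samples are less than or equal to it.
function percentile(samples, p) {
  const sorted = samples.slice().sort((a, b) => a - b)
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length))
  return sorted[rank - 1]
}

// Response time of a request that calls several services in parallel.
function parallelResponseTime(serviceTimes) {
  return Math.max(...serviceTimes)
}

// response times in ms, one slow outlier
const times = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]
const p50 = percentile(times, 50)
const p95 = percentile(times, 95)
const fanOut = parallelResponseTime([120, 80, 300])
```

The median hides the outlier while p95 exposes it, which is why the high percentiles are the ones used in SLAs.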

Durability

Data should not be lost once sent to a system

Monitoring & metrics collection

Capture metrics about the data going in/out of the system

Implementing an A+ conformant Promise library in JavaScript the TDD way

Sat, 16 Sep 2017 21:05:42 +0000

My objective is to write a Promises/A+ conformant implementation similar to then/promise. I’ll do it the TDD way: I’ll write some tests first and then implement what’s needed to make the tests pass (the tests are written with Jest).

This article was one of the best references I found online, this implementation is heavily inspired by it. I’ll also refer to the A+ promise spec when necessary.

Promise state

A promise is an object/function that must be in one of these states: PENDING, FULFILLED, REJECTED, initially the promise is in a PENDING state.

A promise can transition from a PENDING state to either a FULFILLED state with a fulfillment value or to a REJECTED state with a rejection reason.

To make the transition the Promise constructor receives a function called executor, the executor is called immediately with two functions fulfill and reject that when called perform the state transition:

fulfill(value) - from PENDING to FULFILLED with value, the value is now a property of the promise.
reject(reason) - from PENDING to REJECTED with reason, the reason is now a property of the promise.

it('receives an executor function when constructed which is called immediately', () => {
  // mock function with spies
  const executor = jest.fn()
  const promise = new APromise(executor)
  // mock function should be called immediately
  expect(executor.mock.calls.length).toBe(1)
  // arguments should be functions
  expect(typeof executor.mock.calls[0][0]).toBe('function')
  expect(typeof executor.mock.calls[0][1]).toBe('function')
})

it('is in a PENDING state', () => {
  const promise = new APromise(function executor(fulfill, reject) { /* ... */ })
  // for the sake of simplicity the state is public
  expect(promise.state).toBe('PENDING')
})

it('transitions to the FULFILLED state with a `value`', () => {
  const value = ':)'
  const promise = new APromise((fulfill, reject) => {
    fulfill(value)
  })
  expect(promise.state).toBe('FULFILLED')
})

it('transitions to the REJECTED state with a `reason`', () => {
  const reason = 'I failed :('
  const promise = new APromise((fulfill, reject) => {
    reject(reason)
  })
  expect(promise.state).toBe('REJECTED')
})

The initial implementation is straightforward

// possible states
const PENDING = 'PENDING'
const FULFILLED = 'FULFILLED'
const REJECTED = 'REJECTED'

class APromise {
  constructor(executor) {
    // initial state
    this.state = PENDING
    // the fulfillment value or rejection reason is mapped internally to `value`
    // initially the promise doesn't have a value

    // call the executor immediately
    doResolve(this, executor)
  }
}

// fulfill with `value`
function fulfill(promise, value) {
  promise.state = FULFILLED
  promise.value = value
}

// reject with `reason`
function reject(promise, reason) {
  promise.state = REJECTED
  promise.value = reason
}

// creates the fulfill/reject functions that are arguments of the executor
function doResolve(promise, executor) {
  function wrapFulfill(value) {
    fulfill(promise, value)
  }

  function wrapReject(reason) {
    reject(promise, reason)
  }

  executor(wrapFulfill, wrapReject)
}

Observing state changes

To observe changes in the state of the promise (and the fulfillment value or rejection reason) we use the then method, the method receives 2 parameters, an onFulfilled function and an onRejected function, the rules to invoke these functions are the following:

when the promise is in a FULFILLED state the onFulfilled function will be called with the promise’s fulfillment value e.g. onFulfilled(value)
when the promise is in a REJECTED state the onRejected function will be called with the promise’s rejection reason e.g. onRejected(reason)

From now on these functions will be referred to as promise handlers.

it('should have a .then method', () => {
  const promise = new APromise(() => {})
  expect(typeof promise.then).toBe('function')
})

it('should call the onFulfilled method when a promise is in a FULFILLED state', () => {
  const value = ':)'
  const onFulfilled = jest.fn()
  const promise = new APromise((fulfill, reject) => {
    fulfill(value)
  })
    .then(onFulfilled)
  expect(onFulfilled.mock.calls.length).toBe(1)
  expect(onFulfilled.mock.calls[0][0]).toBe(value)
})

it('should call the onRejected method when a promise is in a REJECTED state', () => {
  const reason = 'I failed :('
  const onRejected = jest.fn()
  const promise = new APromise((fulfill, reject) => {
    reject(reason)
  })
    .then(null, onRejected)
  expect(onRejected.mock.calls.length).toBe(1)
  expect(onRejected.mock.calls[0][0]).toBe(reason)
})

Let’s add the .then function to the class prototype, note that it’ll call either the onFulfilled or onRejected function based on the state of the promise

class APromise {
  // ...
  then(onFulfilled, onRejected) {
    handleResolved(this, onFulfilled, onRejected)
  }
  // ...
}

function handleResolved(promise, onFulfilled, onRejected) {
  const cb = promise.state === FULFILLED ? onFulfilled : onRejected
  cb(promise.value)
}

One-way transition

Once the transition to either FULFILLED or REJECTED occurs, the promise must not transition to any other state.

const value = ':)'
const reason = 'I failed :('

it('when a promise is fulfilled it should not be rejected with another value', () => {
  const onFulfilled = jest.fn()
  const onRejected = jest.fn()

  const promise = new APromise((resolve, reject) => {
    resolve(value)
    reject(reason)
  })
  promise.then(onFulfilled, onRejected)

  expect(onFulfilled.mock.calls.length).toBe(1)
  expect(onFulfilled.mock.calls[0][0]).toBe(value)
  expect(onRejected.mock.calls.length).toBe(0)
  expect(promise.state).toBe('FULFILLED')
})

it('when a promise is rejected it should not be fulfilled with another value', () => {
  const onFulfilled = jest.fn()
  const onRejected = jest.fn()

  const promise = new APromise((resolve, reject) => {
    reject(reason)
    resolve(value)
  })
  promise.then(onFulfilled, onRejected)

  expect(onRejected.mock.calls.length).toBe(1)
  expect(onRejected.mock.calls[0][0]).toBe(reason)
  expect(onFulfilled.mock.calls.length).toBe(0)
  expect(promise.state).toBe('REJECTED')
})

In our current implementation, the function that calls the executor should make sure that either fulfill or reject is called once, subsequent calls should be ignored

function doResolve(promise, executor) {
  let called = false

  function wrapFulfill(value) {
    if (called) { return }
    called = true
    fulfill(promise, value)
  }

  function wrapReject(reason) {
    if (called) { return }
    called = true
    reject(promise, reason)
  }

  executor(wrapFulfill, wrapReject)
}

Handling executor errors

If the execution of the executor fails the promise should transition to the REJECTED state with the failure reason

describe('handling executor errors', () => {
  it('when the executor fails the promise should transition to the REJECTED state', () => {
    const reason = new Error('I failed :(')
    const onRejected = jest.fn()
    const promise = new APromise((resolve, reject) => {
      throw reason
    })
    promise.then(null, onRejected)
    expect(onRejected.mock.calls.length).toBe(1)
    expect(onRejected.mock.calls[0][0]).toBe(reason)
    expect(promise.state).toBe('REJECTED')
  })
})

The function that calls the executor should wrap it in a try/catch block and transition to REJECTED if the catch block is executed

function doResolve(promise, executor) {
  // ...
  try {
    executor(wrapFulfill, wrapReject)
  } catch (err) {
    wrapReject(err)
  }
}

Async executor

If the executor’s fulfill/reject are called asynchronously our .then method will fail because its handlers are executed immediately, while the promise is still PENDING.

it('should queue callbacks when the promise is not fulfilled immediately', done => {
  const value = ':)'
  const promise = new APromise((fulfill, reject) => {
    setTimeout(fulfill, 1, value)
  })

  const onFulfilled = jest.fn()

  promise.then(onFulfilled)
  setTimeout(() => {
    // should have been called once
    expect(onFulfilled.mock.calls.length).toBe(1)
    expect(onFulfilled.mock.calls[0][0]).toBe(value)
    promise.then(onFulfilled)
  }, 5)

  // should not be called immediately
  expect(onFulfilled.mock.calls.length).toBe(0)

  setTimeout(function () {
    // should have been called twice
    expect(onFulfilled.mock.calls.length).toBe(2)
    expect(onFulfilled.mock.calls[1][0]).toBe(value)
    done()
  }, 10)
})

it('should queue callbacks when the promise is not rejected immediately', done => {
  const reason = 'I failed :('
  const promise = new APromise((fulfill, reject) => {
    setTimeout(reject, 1, reason)
  })

  const onRejected = jest.fn()

  promise.then(null, onRejected)
  setTimeout(() => {
    // should have been called once
    expect(onRejected.mock.calls.length).toBe(1)
    expect(onRejected.mock.calls[0][0]).toBe(reason)
    promise.then(null, onRejected)
  }, 5)

  // should not be called immediately
  expect(onRejected.mock.calls.length).toBe(0)

  setTimeout(function () {
    // should have been called twice
    expect(onRejected.mock.calls.length).toBe(2)
    expect(onRejected.mock.calls[1][0]).toBe(reason)
    done()
  }, 10)
})

Let’s add a queue to the promise, its purpose is to store handlers that will be called once the promise state changes from PENDING to something else, at the same time our .then method should check the promise state to decide whether to call the handler immediately or to store the handler, let’s move this logic to a new helper function handle

class APromise {
  constructor(executor) {
    this.state = PENDING
    // .then handler queue
    this.queue = []
    doResolve(this, executor)
  }

  then(onFulfilled, onRejected) {
    handle(this, { onFulfilled, onRejected })
  }
}

// checks the state of the promise to either:
// - queue it for later use if the promise is PENDING
// - call the handler if the promise is not PENDING
function handle(promise, handler) {
  if (promise.state === PENDING) {
    // queue if PENDING
    promise.queue.push(handler)
  } else {
    // execute immediately
    handleResolved(promise, handler)
  }
}

function handleResolved(promise, handler) {
  const cb = promise.state === FULFILLED ? handler.onFulfilled : handler.onRejected
  cb(promise.value)
}

Also the fulfill, reject methods should be updated so that they invoke all the handlers stored in the promise when called, this is implemented in a new function finale called after the state and the value have been updated.

function fulfill(promise, value) {
  promise.state = FULFILLED
  promise.value = value
  finale(promise)
}

function reject(promise, reason) {
  promise.state = REJECTED
  promise.value = reason
  finale(promise)
}

// invoke all the handlers stored in the promise
function finale(promise) {
  const length = promise.queue.length
  for (let i = 0; i < length; i += 1) {
    handle(promise, promise.queue[i])
  }
}

Chaining promises

Our .then methods should return a new promise. Note that in the example below p.then returns a promise q, the handler qOnFulfilled is stored on q, also the handler rOnFulfilled is stored in r.

it('.then should return a new promise', () => {
  expect(function() {
    const qOnFulfilled = jest.fn()
    const rOnFulfilled = jest.fn()
    const p = new APromise(fulfill => fulfill())
    const q = p.then(qOnFulfilled)
    const r = q.then(rOnFulfilled)
  }).not.toThrow()
})

The implementation is again straightforward. However, as we’ll see, the new promise doesn’t transition to a different state through an executor, instead it uses the handlers to make the transition as follows:

if the onFulfilled or onRejected function is called
- if there are no errors executing it, the promise will transition to the FULFILLED state with the returned value as the fulfillment value
- if there is an error executing it, the promise will transition to the REJECTED state with the error as the rejection reason

Let’s make the .then method return a promise first

class APromise {
  // ...
  then(onFulfilled, onRejected) {
    // empty executor
    const promise = new APromise(() => {})
    handle(this, { onFulfilled, onRejected })
    return promise
  }
}

And then write the test to handle the new promise resolution

it('if .then\'s onFulfilled is called without errors it should transition to FULFILLED', () => {
  const value = ':)'
  const f1 = jest.fn()
  new APromise(fulfill => fulfill())
    .then(() => value)
    .then(f1)
  expect(f1.mock.calls.length).toBe(1)
  expect(f1.mock.calls[0][0]).toBe(value)
})

it('if .then\'s onRejected is called without errors it should transition to FULFILLED', () => {
  const value = ':)'
  const f1 = jest.fn()
  new APromise((fulfill, reject) => reject())
    .then(null, () => value)
    .then(f1)
  expect(f1.mock.calls.length).toBe(1)
  expect(f1.mock.calls[0][0]).toBe(value)
})

it('if .then\'s onFulfilled is called and has an error it should transition to REJECTED', () => {
  const reason = new Error('I failed :(')
  const f1 = jest.fn()
  new APromise(fulfill => fulfill())
    .then(() => { throw reason })
    .then(null, f1)
  expect(f1.mock.calls.length).toBe(1)
  expect(f1.mock.calls[0][0]).toBe(reason)
})

it('if .then\'s onRejected is called and has an error it should transition to REJECTED', () => {
  const reason = new Error('I failed :(')
  const f1 = jest.fn()
  new APromise((fulfill, reject) => reject())
    .then(null, () => { throw reason })
    .then(null, f1)
  expect(f1.mock.calls.length).toBe(1)
  expect(f1.mock.calls[0][0]).toBe(reason)
})

For the implementation, we first have to store the new promise in the handler queue as well, that way if the observed promise is resolved the elements in the queue know which promise they need to resolve.

class APromise {
  // ...
  then(onFulfilled, onRejected) {
    const promise = new APromise(() => {})
    // store the promise as well
    handle(this, { promise, onFulfilled, onRejected })
    return promise
  }
}

function handleResolved(promise, handler) {
  const cb = promise.state === FULFILLED ? handler.onFulfilled : handler.onRejected
  // execute the handler and transition according to the rules
  try {
    const value = cb(promise.value)
    fulfill(handler.promise, value)
  } catch (err) {
    reject(handler.promise, err)
  }
}

Async handlers

Next let’s consider the case where a handler returns a promise, in this case, the promise that’s part of the handler (not the returned promise) should adopt the state and fulfillment value or rejection reason of the returned promise.

it('if a handler returns a promise, the previous promise should ' +
    'adopt the state of the returned promise', () => {
  const value = ':)'
  const f1 = jest.fn()
  new APromise(fulfill => fulfill())
    .then(() => new APromise(resolve => resolve(value)))
    .then(f1)
  expect(f1.mock.calls.length).toBe(1)
  expect(f1.mock.calls[0][0]).toBe(value)
})

it('if a handler returns a promise resolved in the future, ' +
    'the previous promise should adopt its value', done => {
  const value = ':)'
  const f1 = jest.fn()
  new APromise(fulfill => setTimeout(fulfill, 0))
    .then(() => new APromise(resolve => setTimeout(resolve, 0, value)))
    .then(f1)
  setTimeout(() => {
    expect(f1.mock.calls.length).toBe(1)
    expect(f1.mock.calls[0][0]).toBe(value)
    done()
  }, 10)
})

Let’s imagine the following scenario

const executor = fulfill => setTimeout(fulfill, 0, 'p')
const p = new APromise(executor)

const qOnFulfilled = value =>
  new APromise(fulfill => fulfill(value + 'q'))
const q = p.then(qOnFulfilled)

const rOnFulfilled = value => {
  // value should be 'pq'
}
const r = q.then(rOnFulfilled)

In our current implementation the tuple { q, qOnFulfilled } is stored in the handlers of p, and qOnFulfilled is called before a handler stored in q needs to run. We can take advantage of this fact: when a handler returns a promise, that promise becomes the value of q, so we can detect it and store the observers in the returned promise instead e.g. store { r, rOnFulfilled } on the promise returned by qOnFulfilled.

Note that we’re using a while because a nested promise might itself have another promise as the resolution value.

function handle(promise, handler) {
  // take the state of the innermost promise
  while (promise.value instanceof APromise) {
    promise = promise.value
  }

  if (promise.state === PENDING) {
    // queue if PENDING
    promise.queue.push(handler)
  } else {
    // execute immediately
    handleResolved(promise, handler)
  }
}

Additional cases

Invalid handlers

If the handler that was supposed to be a function is not a function our implementation will fail

it('works with invalid handlers (fulfill)', () => {
  const value = ':)'
  const f1 = jest.fn()

  const p = new APromise(fulfill => fulfill(value))
  const q = p.then(null)
  q.then(f1)

  expect(f1.mock.calls.length).toBe(1)
  expect(f1.mock.calls[0][0]).toBe(value)
})

it('works with invalid handlers (reject)', () => {
  const reason = 'I failed :('
  const r1 = jest.fn()

  const p = new APromise((fulfill, reject) => reject(reason))
  const q = p.then(null, null)
  q.then(null, r1)

  expect(r1.mock.calls.length).toBe(1)
  expect(r1.mock.calls[0][0]).toBe(reason)
})

Let’s imagine the following scenario

const p = new APromise(fulfill => fulfill('p'))
const qOnFulfilled = null
const q = p.then(qOnFulfilled)

In this case, q should be resolved right away with the resolution value of p

function handleResolved(promise, handler) {
  const cb = promise.state === FULFILLED ? handler.onFulfilled : handler.onRejected
  // resolve immediately if the handler is not a function
  if (typeof cb !== 'function') {
    if (promise.state === FULFILLED) {
      fulfill(handler.promise, promise.value)
    } else {
      reject(handler.promise, promise.value)
    }
    return
  }
  try {
    const ret = cb(promise.value)
    fulfill(handler.promise, ret)
  } catch (err) {
    reject(handler.promise, err)
  }
}

Execute the handlers after the event loop

Requirement 2.2.4: as pointed out in note 3.1, the handlers must be called with a fresh stack. This also makes the promise resolution consistent by ensuring that the observers are always called in the future, even if the executor/handlers are synchronous.

it('the promise observers are called after the event loop', done => {
  const value = ':)'
  const f1 = jest.fn()
  let resolved = false

  const p = new APromise(fulfill => {
    fulfill(value)  // should not execute f1 immediately
    resolved = true
  }).then(f1)

  expect(f1.mock.calls.length).toBe(0)

  setTimeout(function () {
    expect(f1.mock.calls.length).toBe(1)
    expect(f1.mock.calls[0][0]).toBe(value)
    expect(resolved).toBe(true)
    done()
  }, 10)
})

We can use any function that allows us to call a function after the event loop, this includes setTimeout, setImmediate and requestAnimationFrame

function handleResolved(promise, handler) {
  setImmediate(() => {
    // ...
  })
}

NOTE: Most of the unit tests must be changed to be async as well.
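To see what "a fresh stack" means in practice, here is a small sketch. The `defer` helper is hypothetical: it uses setImmediate when it exists (Node.js) and falls back to setTimeout otherwise.

```javascript
// Pick a scheduler that runs `fn` on a later turn of the event loop.
// setImmediate exists in Node.js, setTimeout is the portable fallback.
const defer = typeof setImmediate === 'function'
  ? fn => setImmediate(fn)
  : fn => setTimeout(fn, 0)

const order = []
defer(() => order.push('handler'))
order.push('sync code')
// at this point only 'sync code' has run, 'handler' runs on a later turn
```

This is the same ordering the test above checks: the code that follows `.then(f1)` runs before `f1` does, even though the promise was fulfilled synchronously.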

Reject with a resolved promise as a reason

Requirement 2.2.7.2

it('rejects with a resolved promise', done => {
  const value = ':)'
  const reason = new APromise(fulfill => fulfill(value))

  const r1 = jest.fn()
  const p = new APromise(fulfill => fulfill())
    .then(() => { throw reason })
    .then(null, r1)

  expect(r1.mock.calls.length).toBe(0)

  setTimeout(function () {
    expect(r1.mock.calls.length).toBe(1)
    expect(r1.mock.calls[0][0]).toBe(reason)
    done()
  }, 10)
})

Only adopt the state of the nested promise if the promise is not in a REJECTED state.

function handle(promise, handler) {
  // take the state of the returned promise
  while (promise.state !== REJECTED && promise.value instanceof APromise) {
    promise = promise.value
  }
  if (promise.state === PENDING) {
    // queue if PENDING
    promise.queue.push(handler)
  } else {
    // execute handler (after the event loop)
    handleResolved(promise, handler)
  }
}

A promise shouldn’t be resolved with itself

Requirement 2.3.1

it('should throw when attempted to be resolved with itself', done => {
  const r1 = jest.fn()
  const p = new APromise(fulfill => fulfill())
  const q = p.then(() => q)
  q.then(null, r1)

  setTimeout(function () {
    expect(r1.mock.calls.length).toBe(1)
    expect(r1.mock.calls[0][0] instanceof TypeError).toBe(true)
    done()
  }, 10)
})

In the fulfill method let’s check that the fulfillment value is not the promise itself, if it is then reject the promise with a TypeError as mentioned in 2.3.1

function fulfill(promise, value) {
  if (value === promise) {
    return reject(promise, new TypeError())
  }
  promise.state = FULFILLED
  promise.value = value
  finale(promise)
}

Thenables

Related requirement 2.3.3.3: the handler’s returned value may be a thenable, an object/function with an accessible then property that is a function. The then function is like an executor: it receives fulfill and reject callbacks that the thenable uses to report its outcome.

it('should work with thenables', done => {
  const value = ':)'
  const thenable = {
    then: fulfill => fulfill(value)
  }
  const f1 = jest.fn()
  new APromise(fulfill => fulfill(value))
    .then(() => thenable)
    .then(f1)

  setTimeout(function () {
    expect(f1.mock.calls.length).toBe(1)
    expect(f1.mock.calls[0][0]).toBe(value)
    done()
  }, 10)
})

Let’s modify the fulfill method and add the check for thenables, note that accessing a property is not always a safe operation (e.g. the property might be defined using a getter that fails), that’s why we should wrap it in a try/catch.

Also, note that by the requirement 2.3.3.3 the thenable’s then should be called with the thenable as this

function fulfill(promise, value) {
  if (value === promise) {
    return reject(promise, new TypeError())
  }
  if (value && (typeof value === 'object' || typeof value === 'function')) {
    let then
    try {
      then = value.then
    } catch (err) {
      return reject(promise, err)
    }

    // promise: store it as the value, handle() adopts its state later
    if (then === promise.then && value instanceof APromise) {
      promise.state = FULFILLED
      promise.value = value
      return finale(promise)
    }

    // thenable
    if (typeof then === 'function') {
      return doResolve(promise, then.bind(value))
    }
  }

  // primitive
  promise.state = FULFILLED
  promise.value = value
  finale(promise)
}
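The failing getter mentioned above is easy to reproduce. This is a small sketch, assuming a hypothetical helper named getThen that is not part of the implementation; it only isolates the property access and the binding of this:

```javascript
// Hypothetical helper: returns the bound `then` of a thenable, null for
// non-thenables, and lets the error of a failing getter propagate.
function getThen(value) {
  if (value && (typeof value === 'object' || typeof value === 'function')) {
    const then = value.then // a getter may throw here
    if (typeof then === 'function') {
      return then.bind(value) // `then` is called with the thenable as `this`
    }
  }
  return null
}

// Accessing `then` on this object throws
const hostile = {
  get then() {
    throw new Error('then is not accessible')
  }
}

let caught = null
try {
  getThen(hostile)
} catch (err) {
  caught = err // fulfill() rejects the promise with this error
}

// This thenable reads its value through `this`, so the binding matters
const thenable = {
  value: ':)',
  then(fulfill) {
    fulfill(this.value)
  }
}
let received
getThen(thenable)(v => { received = v })
```

Without the bind call the second example would read this.value from the wrong object, which is the reason 2.3.3.3 asks for the thenable to be passed as this.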

The end

That was it! What I learned from implementing it on my own is that a promise can be a rejection reason. Previously I thought that an observer would never receive a promise, because all promises were unwrapped before being sent to the observer.

This is the final version of our tests and the promise implementation

Open @mauriciopoppe/Implementing-Promises-from-Scratch-8 in

Running the A+ Promise compliance tests

This implementation passed all the 872 tests, cool!

872 passing (14s)

Improvements

Add a task queue so that multiple handlers are executed in a batch (each call to an API like setTimeout adds its own task to the event loop’s task queue, however, if we schedule a single task that drains our own queue then all the pending handlers will be executed in a row in the next turn of the event loop)
Add missing methods: Promise.all, Promise.race and the like
Performance improvements, the creator of BlueBird has a detailed document with some optimization tips
Async stack traces, see q

Divisibility

Sun, 21 May 2017 23:18:42 +0000

Let $a,b \in \mathbb{Z}$, we say that $a$ divides $b$, written $a \given b$, if there’s an integer $n$ so that $$ b = na $$

If $a$ divides $b$ then $b$ is divisible by $a$ and $a$ is a divisor or factor of $b$, also $b$ is called a multiple of $a$.

Additional properties of the relation $|$:

if $a \given b$ and $b \given c$ then $a \given c$
if $a \given b$ and $c \given d$ then $ac \given bd$
if $d \given a$ and $d \given b$ then $d \given a + b$
if $d \given a$ and $d \given b$ then $d \given ax + by$ for any integers $x,y$

Proof.

if $b=ma$ and $c=nb$ then $c=(nm)a$
if $b=ma$ and $d=nc$ then $bd=(nm)ac$
if $a=md$ and $b=nd$ then $a + b=(m + n)d$
if $a=md$, $b=nd$ then $ax=(mx)d$, $by=(ny)d$ therefore $ax + by = (mx + ny)d$

Division algorithm

Let $a, b \in \mathbb{Z}$ with $b > 0$, then there exists $q, r \in \mathbb{Z}$ such that $$ a = bq + r, \quad \text{where $0 \leq r \lt b$} $$

Proof. If $bq$ is the largest multiple of $b$ that does not exceed $a$ then $r = a - bq$ is non-negative, and since $b(q + 1) > a$ it follows that $r \lt b$.

Also, if $r = 0$ then $a = bq$ which implies that $b \given a$ (and $q \given a$).
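As a worked example, the quotient and remainder can be computed with a floor division. This is a sketch with a hypothetical function name (divmod), the floor is what keeps $r$ non-negative when $a$ is negative:

```javascript
// Returns q and r such that a = b * q + r with 0 <= r < b.
// Assumes a and b are integers (within the safe integer range) and b > 0.
function divmod(a, b) {
  if (!Number.isInteger(a) || !Number.isInteger(b) || b <= 0) {
    throw new RangeError('expected integers a, b with b > 0')
  }
  const q = Math.floor(a / b) // b * q is the largest multiple of b not exceeding a
  const r = a - b * q
  return { q, r }
}

console.log(divmod(17, 5))
console.log(divmod(-7, 2))
```

For $a = -7$ and $b = 2$ the largest multiple of $b$ that does not exceed $a$ is $-8$, so $q = -4$ and $r = 1$. Note that the % operator of JavaScript would return $-1$ here, which is outside the range $0 \le r \lt b$.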

Greatest common divisor

Let $a, b \in \mathbb{N}$, the greatest common divisor of $a$ and $b$, written as $gcd(a,b)$ or $(a,b)$, is the element $d$ in $\mathbb{N}$ such that $d \given a$ and $d \given b$ and every common divisor of $a$ and $b$ also divides $d$.

Let $a$ and $b$ be two numbers in $\mathbb{N}$, the value of $(a,b)$ is a linear combination of $a$ and $b$ i.e. there exists $s,t$ in $\mathbb{Z}$ such that $$ sa + tb = (a, b) $$

Proof.

Let $d$ be the least positive integer that is a linear combination of $a$ and $b$

$$ d = sa + tb $$

First lets show that $d \given a$, by the division algorithm we know that

$$ a = dq + r, \quad \text{where $0 \le r \lt d$} $$

It follows that

$$ \begin{align*} r &= a - dq \\ &= a - (sa + tb)q \\ &= a - saq - tbq \\ &= (1 - sq)a + (-tq)b \\ \end{align*} $$

We can see that $r$ is a linear combination of $a$ and $b$. Since $0 \le r \lt d$ and considering that we defined $d$ as the least positive linear combination of $a$ and $b$ it follows that $r = 0$ (if $0 \lt r \lt d$ then $r$ would be a positive linear combination smaller than $d$ which is a contradiction), therefore $d \given a$.

In a similar fashion $d \given b$, therefore by the divisibility property #4

$$ d \given sa + tb $$

The next thing to prove is that $d$ is the greatest common divisor of $a$ and $b$. To prove this let’s show that if $d'$ is any other common divisor of $a$ and $b$ then $d' \le d$.

If $d' \given a$ and $d' \given b$ then by the divisibility property #4 it divides any other linear combination of $a$ and $b$, since $d = sa + tb$ is one linear combination of $a$ and $b$ it follows that $d' \given d$ so either $d' \lt d$ or $d' = d$, finally we can conclude that

$$ d = (a,b) $$

Euclidean Algorithm

A very efficient method to compute the greatest common divisor

Let $a, b$ be integers with $a \ge b \gt 0$

1. Apply the division algorithm $a = bq + r, 0 \le r \lt b$

2. Rename $b$ as $a$ and $r$ as $b$ and repeat step 1 until $r = 0$

The last nonzero remainder is the greatest common divisor of $a$ and $b$

The euclidean algorithm depends on the following lemma

Let $a, b$ be integers with $a \ge b \gt 0$. Let $r$ be the remainder of dividing $a$ by $b$ then $$ (a,b) = (b, r) $$

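Proof. Let $q$ be the quotient of dividing $a$ by $b$ so that $a = bq + r$. If $d = (a,b)$ then it must divide any linear combination of $a$ and $b$ like $r = a - bq$, therefore $d \given r$ and $d$ is a common divisor of $b$ and $r$. Conversely, any common divisor of $b$ and $r$ divides $a = bq + r$, so it’s also a common divisor of $a$ and $b$ and therefore divides $d$. Finally we can conclude that $d = (b,r)$.

Proof of the theorem. If we keep on repeating the division algorithm we have: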

$$ \begin{align*} a &= bq_1 + r_1, \quad (a,b) = (b, r_1) \\ b &= r_1q_2 + r_2, \quad (b, r_1) = (r_1, r_2) \\ r_1 &= r_2q_3 + r_3, \quad (r_1, r_2) = (r_2, r_3) \\ r_2 &= r_3q_4 + r_4, \quad (r_2, r_3) = (r_3, r_4) \\ & \; \vdots \\ r_{n-3} &= r_{n-2}q_{n-1} + r_{n-1}, \quad (r_{n-3}, r_{n-2}) = (r_{n-2}, r_{n-1}) \\ r_{n-2} &= r_{n-1}q_n + r_n, \quad (r_{n-2}, r_{n-1}) = (r_{n-1}, r_n) \\ r_{n-1} &= r_n q_{n+1}, \quad \quad (r_{n-1}, r_n) = r_n \end{align*} $$

Therefore

$$ (a,b) = (b,r_1) = (r_1,r_2) = (r_2, r_3) = (r_3, r_4) = \ldots = (r_{n-3}, r_{n-2}) = (r_{n-2}, r_{n-1}) = (r_{n-1}, r_n) = r_n $$
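The chain of equalities above translates directly into a loop. A minimal sketch, assuming non-negative integer inputs (the function name gcd is just illustrative):

```javascript
// Euclidean algorithm: replace (a, b) with (b, r) until the remainder is 0,
// the last nonzero remainder is the greatest common divisor.
// Assumes a and b are non-negative integers.
function gcd(a, b) {
  while (b !== 0) {
    const r = a % b // remainder of the division algorithm
    a = b
    b = r
  }
  return a
}

console.log(gcd(1071, 462))
```

For $a = 1071$ and $b = 462$ the remainders are $147$, $21$ and $0$, so $(1071, 462) = 21$.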

Extended Euclidean Algorithm

One of the applications of the euclidean algorithm is the calculation of the integers $x,y$ satisfying $d = (a,b) = ax + by$

First note that if $b=0$ then $(a,b) = (a,0) = a$, now assume that there are integers $x'$ and $y'$ so that

$$ (a,b) = (b,r) = bx' + ry' $$

Since

$$ \begin{align*} r &= a - bq \\ &= a - b \left \lfloor \frac{a}{b} \right \rfloor \end{align*} $$

Then

$$ \begin{align*} (a,b) &= bx' + \Big( a - \left \lfloor \frac{a}{b} \right \rfloor b \Big) y' \\ &= bx' + ay' - \left \lfloor \frac{a}{b} \right \rfloor by' \\ &= a(y') + b \Big(x' - \left \lfloor \frac{a}{b} \right \rfloor y'\Big) \end{align*} $$

Comparing it to $(a,b) = ax + by$ we obtain the required coefficients $x$ and $y$ based on the following recursive equations

$$ \begin{align*} x &= \begin{cases} 1, & \text{when $b = 0$} \\ y', & \text{otherwise} \end{cases} \\ y &= \begin{cases} 0, & \text{when $b = 0$} \\ x' - \left \lfloor \frac{a}{b} \right \rfloor y', & \text{otherwise} \end{cases} \end{align*} $$
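The recursion can be sketched as follows, using the base case $(a, 0) = a$ noted above (so $x = 1$, $y = 0$ when $b = 0$). The function name and the shape of the returned object are illustrative choices, and the inputs are assumed to be non-negative integers:

```javascript
// Returns d = (a, b) together with integers x, y such that a * x + b * y = d.
// Assumes a and b are non-negative integers.
function extendedGcd(a, b) {
  if (b === 0) {
    return { d: a, x: 1, y: 0 } // a * 1 + 0 * 0 = a
  }
  // x1, y1 are x', y' in the derivation: (b, r) = b * x' + r * y'
  const { d, x: x1, y: y1 } = extendedGcd(b, a % b)
  return { d, x: y1, y: x1 - Math.floor(a / b) * y1 }
}

const { d, x, y } = extendedGcd(240, 46)
console.log(d, x, y, 240 * x + 46 * y === d)
```

Each level of the recursion only needs the coefficients of the level below it, which is why the values can be computed while the recursion unwinds.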

Flat shading

Thu, 09 Jun 2016 12:39:53 +0000

Flat shading is the simplest shading model, it calculates the illumination at a single point for each polygon (or polygon vertices in OpenGL) which means that the color is the same for all points of each polygon

Advantages

Fast, a single computation per polygon (or one per polygon vertex in OpenGL)

Disadvantages

Inaccurate
Discontinuities at polygon boundaries

Implementation

GLSL has the keyword flat to skip interpolation

// vertex shader
flat out vec4 polygon_color;
void main() {
  // ...
  polygon_color = vec4(ambient + diffuse + specular, 1.0);
}

// fragment shader
flat in vec4 polygon_color;
out vec4 color;
void main () {
  color = polygon_color;
}

Diffuse shading

Fri, 03 Jun 2016 14:49:37 +0000

Many objects have a surface that is not shiny, for example wood and paper, such objects can be modeled using the Lambertian model

Lambertian shading model

A Lambertian object obeys Lambert’s cosine law which states that

The luminous intensity of a surface is proportional to the cosine of the angle between the surface normal and the direction of the light

$$ c \propto \cos{\theta} \quad \text{or} \quad c \propto \mathbf{n} \cdot \mathbf{l} $$

Both $\mathbf{n}$ and $\mathbf{l}$ are unit vectors

Note that the model does not depend on the distance between the light and the object, this assumption is equivalent to saying that the light is “distant” relative to the object size which is often a directional light

When the light hits the surface a portion of the light gets reflected, this is controlled by the diffuse reflectance $c_r$, a color that varies depending on the surface, also the surface color can be made darker/lighter by changing the color of the light source $c_l$

$$ c = c_r \; c_l \; \mathbf{n} \cdot \mathbf{l} $$

$c_r$ and $c_l$ are RGB colors with components in the range $[0, 1]$ where the multiplication is done element-wise so $c_r \; c_l$ returns another RGB color. Note however that the product $\mathbf{n} \cdot \mathbf{l}$ might be negative (e.g. when the surface normal is pointing away from the light), to solve this we can use the max function

$$ c = c_r \; c_l \; \text{max}(\mathbf{n} \cdot \mathbf{l}, 0) $$
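The clamped equation can be sketched in a few lines. The representation is an assumption made for the example: colors and vectors are plain arrays of three numbers, and $\mathbf{n}$ and $\mathbf{l}$ are expected to be unit vectors already:

```javascript
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

// c = c_r c_l max(n . l, 0), with the color product done element-wise.
// cr, cl: RGB colors with components in [0, 1]; n, l: unit vectors.
function lambert(cr, cl, n, l) {
  const k = Math.max(dot(n, l), 0) // clamp so that back-facing surfaces get no light
  return cr.map((c, i) => c * cl[i] * k)
}

const white = [1, 1, 1]
const wood = [1, 0.5, 0.25]

// light directly above a surface that faces up
console.log(lambert(wood, white, [0, 1, 0], [0, 1, 0]))
// light below the surface
console.log(lambert(wood, white, [0, 1, 0], [0, -1, 0]))
```

In the first call $\mathbf{n} \cdot \mathbf{l} = 1$ so the surface shows its full diffuse reflectance, in the second call the dot product is $-1$ and the clamp turns the result into black.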

Ambient shading

Some surfaces that receive no direct illumination in real life are perceived as having a color other than black, this is because the light is actually reflected by other surfaces. In addition there’s sometimes skylight which increases the amount of light reflected

A common trick is to put a dim light at the position of the eye so that all visible points receive some light, another approach is to add an ambient color $c_a$ which is simply a constant value which interacts with the diffuse reflectance $c_r$

$$ c = c_r * c_a $$

Introduction to surface shading

Fri, 03 Jun 2016 13:46:07 +0000

Shading is the process of altering the color of a surface, different shading models capture the process of light reflection on a surface, these models use the following variables in the computation

$\mathbf{ray}$ (ray) - a ray emitted from a pixel, defined with an origin ($\mathbf{ray_{origin}}$) and a direction $\mathbf{ray_{direction}}$
$\mathbf{p}$ (intersection point) - the intersection point of the surface and $\mathbf{ray}$
$\mathbf{l}$ (light direction) - a unit vector pointing from the surface towards a light source, computed by normalizing the vector between the intersection point $\mathbf{p}$ and the light source position $\mathbf{l_s}$

$$ \mathbf{l} = \frac{\mathbf{l_s - p}}{\norm{\mathbf{l_s - p}}} $$

$\mathbf{v}$ (view direction) - a unit vector pointing from the surface towards the place the ray is emitted from, it’s computed by normalizing the vector between the intersection point $\mathbf{p}$ and the ray origin $\mathbf{ray_{origin}}$

$$ \mathbf{v} = \frac{\mathbf{ray_{origin} - p}}{\norm{\mathbf{ray_{origin} - p}}} $$

$\mathbf{n}$ (surface normal) - a unit vector perpendicular to the surface at the point where the reflection is taking place
other characteristics of the light source and the surface depending on the shading model
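The two normalizations above can be sketched as follows. Points are represented as arrays of three numbers and the function name shadingVectors is made up for the example:

```javascript
const subtract = (a, b) => a.map((x, i) => x - b[i])

// Scales v to unit length, assumes v is not the zero vector
function normalize(v) {
  const length = Math.hypot(v[0], v[1], v[2])
  return v.map(x => x / length)
}

// p: intersection point, lightPosition: l_s, rayOrigin: origin of the ray
function shadingVectors(p, lightPosition, rayOrigin) {
  return {
    l: normalize(subtract(lightPosition, p)), // from the surface towards the light
    v: normalize(subtract(rayOrigin, p)) // from the surface towards the viewer
  }
}

const vectors = shadingVectors([0, 0, 0], [0, 2, 0], [3, 0, 0])
console.log(vectors.l, vectors.v)
```

With the intersection point at the origin, a light two units above it and the ray origin three units to the right, the result is $\mathbf{l} = (0, 1, 0)$ and $\mathbf{v} = (1, 0, 0)$.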

Building a first person shooter camera in C++

Fri, 29 Apr 2016 22:10:40 +0000

A first person camera captures objects from the viewpoint of a player’s character, the camera has the following characteristics:

orbit: the character can look to the left, right, up & down, however if we imagine the head of the character it can’t be tilted
translation: the character can move in 4 directions, forward backward, to the left and to the right, note that the vector that represents the direction the character is looking at doesn’t change (the orbit is not affected by translation)
- our camera will always move in the same direction the camera is looking at, this is usually done differently on first person shooters where the character may move in a different direction than the direction the camera is looking at

Both characteristics can be implemented by creating a space for the camera and defining the direction in this space, that way translation doesn’t modify the direction the camera is looking at and for orbit we would rotate the basis vectors of the space

Assuming that the world space axes are as follows

Chosen world space $+x$ (right), $+y$ (up) and $+z$ (backward), note that the choice is just personal preference

Let $\mathbf{M}_{upright \leftarrow camera}$ be the rotation matrix that transforms points from camera space to upright space, also let the “look at” vector be defined as $\mathbf{p}_{camera} = \begin{bmatrix} 0 & 0 & -1 \end{bmatrix}^T$ in camera space. To define the rotation matrix $\mathbf{M}_{upright \leftarrow camera}$ let’s first identify the Euler angles involved in the rotation, taking the image above as a reference we can identify the following actions:

the character looks to the left or right - rotation relative to the upright space $y$-axis
the character looks up or down - rotation relative to the upright space $x$-axis

Note that the sequence of intrinsic rotations $y-x'$ (or $x-y$ if expressed as a sequence of extrinsic rotations) represents the rotation of the camera, the sequence of extrinsic rotations can be represented as a multiplication of the following rotation matrices

$$ \begin{align*} \mathbf{M}_{upright \leftarrow camera} &= \mathbf{Y}(\alpha) \mathbf{X}(\beta) \\ &= \begin{bmatrix} \cos{\alpha} & 0 & \sin{\alpha} \\ 0 & 1 & 0 \\ -\sin{\alpha} & 0 & \cos{\alpha} \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos{\beta} & -\sin{\beta} \\ 0 & \sin{\beta} & \cos{\beta} \end{bmatrix} \\ &= \begin{bmatrix} \cos{\alpha} & \sin{\alpha}\sin{\beta} & \sin{\alpha}\cos{\beta} \\ 0 & \cos{\beta} & -\sin{\beta} \\ -\sin{\alpha} & \cos{\alpha}\sin{\beta} & \cos{\alpha}\cos{\beta} \end{bmatrix} \end{align*} $$

The angles $\alpha$ and $\beta$ are computed as follows:

let $\Delta{\alpha}$ and $\Delta{\beta}$ represent the change in the rotation around the $\mathbf{Y}$ and $\mathbf{X}$ axis respectively, the values of $\alpha$ and $\beta$ are computed based on the previous state

$$ \begin{align*} \beta &:= \beta + \Delta{\beta} \\ \alpha &:= \alpha + \Delta{\alpha} \end{align*} $$

if the character looks up then $\Delta{\beta}$ is positive
if the character looks to the right then $\Delta{\alpha}$ is negative

Mouse coordinates delta to extrinsic rotations delta

Next we need to define what happens when we move the mouse. We can configure a windowing library like GLFW to invoke a callback whenever the mouse moves, with the coordinates of the mouse as arguments (e.g. as $x_{new}$ and $y_{new}$). Note: the coordinates of the mouse are expressed relative to the top left corner of the window, whose $+x$-axis points right and $+y$-axis points down. If we keep the old coordinates of the mouse (as $x_{old}$ and $y_{old}$) we can obtain how much the mouse moved with respect to the old position with the following calculation

$$ \begin{align*} \Delta x &= x_{new} - x_{old} \\ \Delta y &= -(y_{new} - y_{old}) \end{align*} $$

Note that $y_{new} - y_{old}$ will be positive if we move the mouse down which is unintuitive, therefore we can multiply this result by $-1$ so that moving the mouse downward sets a negative value in $\Delta y$

The next step is to update the values of $\alpha$ (yaw) and $\beta$ (pitch) using $\Delta x$ and $\Delta y$, note that when we move the mouse to the right we’re moving clockwise with respect to the $+y$ axis and when we move the mouse upward we’re moving counterclockwise with respect to the $+x$-axis therefore

$$ \begin{align*} \alpha &:= \alpha - \Delta x \\ \beta &:= \beta + \Delta y \end{align*} $$

Note that we also need the value of $\beta$ to stay inside the range $-\deg{90} \leq \beta \leq \deg{90}$ to avoid looking backwards

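Finally to compute the value of $\mathbf{p}_{world}$ we need to transform $\mathbf{p}_{object}$ with $\mathbf{M}_{world \leftarrow object}$ (here object space is the camera space, and since $\mathbf{p}$ is a direction the matrix is the rotation $\mathbf{M}_{upright \leftarrow camera}$ computed above), note that the value of $\mathbf{p}_{object} = \begin{bmatrix} 0 & 0 & -1 \end{bmatrix}^T$ is always the same, therefore the value of $\mathbf{p}_{world}$ is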

$$ \begin{align*} \mathbf{p}_{world} &= \mathbf{M}_{world \leftarrow object} \mathbf{p}_{object} \\ &= \begin{bmatrix} \cos{\alpha} & \sin{\alpha}\sin{\beta} & \sin{\alpha}\cos{\beta} \\ 0 & \cos{\beta} & -\sin{\beta} \\ -\sin{\alpha} & \cos{\alpha}\sin{\beta} & \cos{\alpha}\cos{\beta} \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix} \\ &= \begin{bmatrix} -\sin{\alpha}\cos{\beta} \\ \sin{\beta} \\ -\cos{\alpha}\cos{\beta} \end{bmatrix} \end{align*} $$

#pragma once

#include <cmath>
#include <glm/glm.hpp>

class FPS_Mouse {
public:
  float sensitivity;
  float yaw;
  float pitch;
  glm::vec4 target;

  // rotation axes in upright space (static inline members require C++17)
  static inline const glm::vec3 YAW_AXIS = glm::vec3(0.0f, 1.0f, 0.0f);
  static inline const glm::vec3 PITCH_AXIS = glm::vec3(1.0f, 0.0f, 0.0f);

  // the default arguments of a constructor belong in its declaration
  FPS_Mouse(float yaw = 0.0f, float pitch = 0.0f);
  void process_mouse_movement(double delta_x, double delta_y, bool constraint_pitch);
  glm::mat4 get_view_matrix() const;

private:
  // the "look at" vector in camera space
  static inline const glm::vec4 P = glm::vec4(0.0f, 0.0f, -1.0f, 1.0f);
  void update_target();
};

FPS_Mouse::FPS_Mouse(float yaw, float pitch) :
    sensitivity(0.05f), yaw(yaw), pitch(pitch) {
  this->update_target();
}

void FPS_Mouse::process_mouse_movement(double delta_x, double delta_y, bool constraint_pitch = true) {
  yaw -= delta_x * sensitivity;
  pitch += delta_y * sensitivity;

  if (constraint_pitch) {
    if (pitch > 89.0f) { pitch = 89.0f; }
    if (pitch < -89.0f) { pitch = -89.0f; }
  } 
  this->update_target();
}

void FPS_Mouse::update_target() {
  /* Equivalent matrix form: */
  /* Y = glm::rotate(glm::mat4(1.0f), glm::radians(yaw), FPS_Mouse::YAW_AXIS); */
  /* X = glm::rotate(glm::mat4(1.0f), glm::radians(pitch), FPS_Mouse::PITCH_AXIS); */
  /* target = Y * X * P; */
  float yaw_radians = glm::radians(yaw);
  float pitch_radians = glm::radians(pitch);
  target.x = -sin(yaw_radians) * cos(pitch_radians);
  target.y = sin(pitch_radians);
  target.z = -cos(yaw_radians) * cos(pitch_radians);
  target.w = 1.0f;  // same w as P
}
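The closed form used in update_target can be checked numerically against the matrix product $\mathbf{Y}(\alpha) \mathbf{X}(\beta) \mathbf{p}$. The following sketch is written in JavaScript only to keep it free of dependencies, all the function names are made up for the example and the angles are in radians:

```javascript
// Rotation matrices around the y and x axes (arrays of rows)
const rotY = a => [
  [Math.cos(a), 0, Math.sin(a)],
  [0, 1, 0],
  [-Math.sin(a), 0, Math.cos(a)]
]
const rotX = b => [
  [1, 0, 0],
  [0, Math.cos(b), -Math.sin(b)],
  [0, Math.sin(b), Math.cos(b)]
]
const mulMatVec = (m, v) =>
  m.map(row => row[0] * v[0] + row[1] * v[1] + row[2] * v[2])

// Y(yaw) X(pitch) p with p = (0, 0, -1)
function targetFromMatrices(yaw, pitch) {
  return mulMatVec(rotY(yaw), mulMatVec(rotX(pitch), [0, 0, -1]))
}

// The simplified expression derived above
function targetClosedForm(yaw, pitch) {
  return [
    -Math.sin(yaw) * Math.cos(pitch),
    Math.sin(pitch),
    -Math.cos(yaw) * Math.cos(pitch)
  ]
}

console.log(targetFromMatrices(0.3, -0.2))
console.log(targetClosedForm(0.3, -0.2))
```

Both calls should print the same vector up to floating point rounding, and with yaw and pitch equal to zero the target is the original look at vector $(0, 0, -1)$.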

Quaternions

Tue, 26 Apr 2016 16:39:27 +0000

Quaternions as rotations

Let $p$ be a 3d point represented as a quaternion using its homogeneous coordinates, $p = [w, \mathbf{v}]$ and let $q$ be any non-zero quaternion then

Theorem: The product $qpq^{-1}$ takes $p = [w, \mathbf{v}]$ to $p' = [w, \mathbf{v'}]$

Before proving this theorem let’s make the following observation: we can express $q$ as the product of a scalar $s$ and a unit quaternion $\mathbf{U}q$, $q = s\mathbf{U}q$, then $qpq^{-1}=s\mathbf{U}qp(s\mathbf{U}q)^{-1}=s\mathbf{U}qp\mathbf{U}q^{-1}s^{-1}$. Because scalar multiplication is commutative this is equal to $\mathbf{U}qp\mathbf{U}q^{-1}ss^{-1}=\mathbf{U}qp\mathbf{U}q^{-1}$, so the product is the same whether or not $q$ is a unit quaternion. Finally notice that $\mathbf{U}q^{-1} = \mathbf{U}q^*$ so we can write the action as $qpq^*$. From now on $q$ is assumed to be a unit quaternion without loss of generality

Next, let’s prove that the scalar part of $qpq^{*}$ is the same as the scalar part of $p$ (we can use the formula $2S(q) = q + q^*$ to find the scalar component of a quaternion)

$$ \begin{align*} 2S(qpq^*) &= qpq^* + (qpq^*)^* \\ &= qpq^* + qp^*q^* \\ &= q(p + p^*)q^* \\ &= q2S(p)q^* \\ &= 2qS(p)q^* \\ &= 2[s_q, \mathbf{v_q}][s_p, \mathbf{0}][s_q, -\mathbf{v_q}] \\ &= 2[s_ps_q, s_p\mathbf{v_q}][s_q, -\mathbf{v_q}] \\ &= 2[s_ps_q^2 - s_p (\mathbf{v_q} \cdot -\mathbf{v_q}), -s_ps_q\mathbf{v_q} + s_ps_q\mathbf{v_q} - s_p\mathbf{v_q \times v_q}] \\ &= 2[s_ps_q^2 + s_p\norm{\mathbf{v_q}}^2, \mathbf{0}] \\ &= 2[s_ps_q^2 + s_p(1 - s_q^2), \mathbf{0}] \quad \text{because of the definition of a unit quaternion} \\ &= 2[s_p, \mathbf{0}] \\ &= 2S(p) \end{align*} $$

Therefore the scalar part of $p$ remains constant in the operation i.e. if $p = [w, \mathbf{v}]$ then $p' = qpq^{*} = [w, \mathbf{v'}]$, and because multiplication by a unit quaternion preserves norms then $\norm{p} = \norm{p'}$ and also $\norm{\mathbf{v}} = \norm{\mathbf{v'}}$ $\blacksquare$

Theorem: if $\norm{q} = 1$ then $q = [\cos{\theta}, \unit{v} \sin{\theta}]$ acts to rotate around unit axis $\unit{v}$ by $2 \theta$

Let

$$ v_0 = [0, \mathbf{v_0}] \quad \norm{v_0} = \norm{\mathbf{v_0}} = 1 \\ v_1 = [0, \mathbf{v_1}] \quad \norm{v_1} = \norm{\mathbf{v_1}} = 1 $$

be two pure unit quaternions (which can be represented as vectors in 3d space), and let $q$ be a quaternion of the form

$$ \begin{align} q &= v_1v_0^* \label{q} \\ &= [0, \mathbf{v_1}][0, -\mathbf{v_0}] \nonumber \\ &= [\mathbf{v_0 \cdot v_1}, \mathbf{v_0 \times v_1}] \label{q3d} \end{align} $$

Let $\theta$ be the angle between $\mathbf{v_0}$ and $\mathbf{v_1}$ then $\mathbf{v_0 \cdot v_1} = \cos{\theta}$, also let $\mathbf{v_0 \times v_1} = \sin{\theta} \unit{v}$, then \eqref{q} becomes

$$ \begin{equation} \label{q2} q = [\cos{\theta}, \sin{\theta} \unit{v}] \end{equation} $$

Let’s prove first that the product $v_2 = qv_0q^{*}$ lies in the same plane as $\mathbf{v_0}$ and $\mathbf{v_1}$, we do so by proving first that the product $v_2v_1^*$ has the same components (dot and cross products) as $v_1v_0^*$

$$ \begin{align*} v_2v_1^* &= (qv_0q^*) v_1^* \\ &= (q v_0 (v_1v_0^*)^*) v_1^* \\ &= (q v_0 v_0 v_1^*) v_1^* \\ &= q (v_0v_0)(v_1^*v_1^*) \\ &= q (-1)(-1) \quad \text{since they're unit quaternions they square to $-1$} \\ &= v_1v_0^* \end{align*} $$

Then if $v_2 v_1^* = v_1v_0^*$ that means that $v_2=qv_0q^*$ lies in the same plane as $v_0$ and $v_1$, also $v_2$ forms an angle of $\theta$ with $v_1$, furthermore $\mathbf{v_1} \times \mathbf{v_2} = \unit{v} \sin{\theta}$, finally if the angle between $v_0$ and $v_1$ is $\theta$ then the angle between $v_0$ and $v_2$ is $2\theta$ which confirms what’s seen on the image above

Furthermore the same can be said of $q$ acting on $v_1$, let $v_3 = qv_1q^{*}$ then

$$ \begin{align*} v_3v_2^* &= (qv_1q^*)(qv_0q^*)^* \\ &= (q(qv_0)q^*)(qv_0q^*)^* \quad \text{by finding $v_1$ from \eqref{q}} \\ &= q (qv_0q^*)(qv_0q^*)^* \\ &= q \\ &= v_1v_0^* \end{align*} $$

Now any vector $p$ can be represented in terms of the basis $v_0$, $v_1$ and $\unit{v}$ e.g. $p = s_0\mathbf{v_0} + s_1\mathbf{v_1} + s_2\unit{v}$, we’ve seen what $q$ does to $v_0$ and $v_1$ so let’s see what it does to $\unit{v}$

Before computing $q\unit{v}q^{*}$ see that

$$ \begin{align*} q\unit{v} &= [\cos{\theta}, \sin{\theta} \unit{v}][0, \unit{v}] \\ &= [\ldots, \ldots - \sin{\theta} (\unit{v} \times \unit{v})] \\ &= [\ldots, \ldots - \mathbf{0}] \end{align*} $$

So $q\unit{v}$ is a commutative product, because the cross product is the only term that makes quaternion multiplication non-commutative and in $q\unit{v}$ that term is zero. Therefore $q\unit{v}q^* = \unit{v}qq^* = \unit{v}$ which means that $q$ does not modify $\unit{v}$

Thus the action of $q$ on any vector $p$ is a rotation around $\unit{v}$ by $2\theta$ $\blacksquare$
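The theorem can be checked numerically with a small sketch. The representation is an assumption made for the example: a quaternion is stored as the array [w, x, y, z], and the function names are made up:

```javascript
// Product of two quaternions stored as [w, x, y, z]
function multiply([w1, x1, y1, z1], [w2, x2, y2, z2]) {
  return [
    w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
    w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
    w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
    w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2
  ]
}

const conjugate = ([w, x, y, z]) => [w, -x, -y, -z]

// q = [cos(theta), sin(theta) * axis], `axis` must be a unit vector
function fromAxisAndHalfAngle(axis, theta) {
  const s = Math.sin(theta)
  return [Math.cos(theta), s * axis[0], s * axis[1], s * axis[2]]
}

// Computes q p q* where p = [0, v] is the pure quaternion of the vector v,
// q is assumed to be a unit quaternion
function rotate(q, v) {
  const [, x, y, z] = multiply(multiply(q, [0, ...v]), conjugate(q))
  return [x, y, z]
}

const q = fromAxisAndHalfAngle([0, 0, 1], Math.PI / 4)
const rotated = rotate(q, [1, 0, 0])
console.log(rotated)
```

Here $\theta = \pi / 4$ around the $z$-axis, so the vector $(1, 0, 0)$ is rotated by $2\theta = \pi / 2$ and ends up at $(0, 1, 0)$ up to floating point rounding, while a vector along the axis such as $(0, 0, 1)$ is left unchanged.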

Quaternion rotation facts

Let $q_1$ be a quaternion which rotates the pure quaternion $p_1$ to $p_2$ and also let $q_2$ be a quaternion which rotates the vector $p_2$ to $p_3$ then $p_3$ will have the form

$$ \begin{align*} p_3 &= q_2p_2q_2^* \\ &= q_2(q_1p_1q_1^*)q_2^* \\ &= (q_2q_1)p_1(q_1^*q_2^*) \\ &= (q_2q_1)p_1(q_2q_1)^* \end{align*} $$

Therefore the combination of rotation $q_1$ followed by $q_2$ is given by $q = q_2q_1$

When the rotations $q_1, q_2, \ldots, q_n$ are applied to the pure quaternion $p$ the result is equal to $qpq^*$ where $q = q_n q_{n-1} \ldots q_2 q_1$

gcc

Tue, 05 Apr 2016 17:56:08 +0000

Stages

preprocessing - text substitution, stripping comments and file inclusion

g++ -E main.cpp -o main.i

compilation - compilation of the processed source code into assembly language

g++ -S main.i -o main.s
# or
g++ -S main.cpp -o main.s

assembler - conversion of assembly code into machine code

as main.s -o main.o
# or
g++ -c main.cpp -o main.o

linker - produce a single executable program file, it combines our program with startup code like the following ones
- standard code at the beginning of the program to set up the running environment to pass command-line parameters and environmental variables
- standard code at the end of the program to pass back a return code

g++ main.cpp -o main

Flags

-E run the preprocessing stage
-S run the preprocessing and compilation stages
-c run the preprocessing, compilation and assembly stages
-o file write output to file
-llibrary, -l library search the library named library when linking
-Idir add dir to the list of directories to be searched for header files
-Ldir add dir to the list of directories to be searched for -l
-Wall enable all the warnings about some constructions considered questionable by some users
-O enable optimization

make

Thu, 31 Mar 2016 19:34:48 +0000

make is a tool to simplify building executables from many sources, make will only re-build things that need to be re-built

Contents of a makefile

variable definitions, text that can be substituted later
explicit rules, says when and how to remake files called the rule’s targets, it lists files that the target depends on called prerequisites and may also give a recipe to update the targets
implicit rules, says when and how to remake files based on the filename, it describes the dependencies of the target and gives a recipe to create/update such a target
directives, special instructions like
- reading from another makefile include a.Makefile b.Makefile
- decide based on some variables to use or not part of the makefile
- defining multiline variables

Rules

targets : prerequisites
[tab] recipe

A rule tells make two things, when targets are out of date and how to update them when necessary

targets are filenames separated by spaces, usually there’s only one filename per rule
prerequisites determine when targets are out of date, a target is out of date if it doesn’t exist or is older than any of the prerequisites (by comparison of the last-modification time)
recipe determines how to update targets when they’re out of date, this is one or more lines to be executed by the shell

Example

foo.o : foo.c defs.h
  cc -c -g foo.c

The target is foo.o, the prerequisites are foo.c and defs.h, the command to update foo.o is cc -c -g foo.c, additionally it tells two things

how to decide whether foo.o is out of date, it’s out of date if foo.c or defs.h is more recent than it
how to update foo.o, it’s updated by compiling foo.c assuming that it includes defs.h

Wildcards

A single file name can specify multiple files using wildcard characters (the same as the ones in Bash e.g. *, ?, [...])

clean:
  rm -f *.o     # `make clean` removes all the object files

To define a variable with a wildcard use

objects := $(wildcard *.o)

Phony targets

A phony target is one that is not the name of a file, it’s just the name of a recipe to be executed when you make an explicit request, the two reasons to use a phony target are

to avoid conflict with a file of the same name
to improve performance by avoiding the implicit rule search on this type of targets

When a rule has a recipe that won’t create the target file it will be executed every time the target comes up for remaking

clean:
  rm *.o program

If make clean is run the target clean will always be out of date (assuming such a file doesn’t exist) then the recipe will always be executed

If there’s a file named clean the recipe will never be executed: since the target clean has no prerequisites it’s always considered up to date. To avoid this problem we make the target a phony target, once this is done the recipe will be executed regardless of the existence of a file named clean

.PHONY: clean
clean:
  rm *.o program

Implicit rules

make is able to figure out which implicit rule to use based on the kind of source file that needs to be made/updated, for example the makefile

foo : foo.o bar.o
  cc -o foo foo.o bar.o $(CFLAGS) $(LDFLAGS)

Doesn’t have rules on how to make foo.o or bar.o, make will automatically look for an implicit rule that tells how to make/update it from a catalogue of built in rules

Among the catalogue of built in rules for POSIX based OS the ones for C and C++ programs are

Compiling C, n.o is made from n.c automatically with a recipe of the form

$(CC) $(CPPFLAGS) $(CFLAGS) -c

Compiling C++, n.o is made from n.cc, n.cpp or n.C with a recipe of the form

$(CXX) $(CPPFLAGS) $(CXXFLAGS) -c

Linking C, n is made from n.o by running the linker ld via the C compiler with a recipe of the form

$(CC) $(LDFLAGS) n.o $(LOADLIBES) $(LDLIBS)

Variables used by implicit rules

CC (default cc)
CXX (default g++)
CXXFLAGS, CPPFLAGS, CFLAGS (default empty)

Pattern rules

A pattern rule contains % exactly once in the target which matches any nonempty substring called the stem, then % in the prerequisites of a rule stands for the same stem that was matched in the target, for example a rule in the form

%.o: %.c
  recipe

The recipe then needs a way to operate on the right source file name, such a name can’t be written on the recipe because the name is different each time the implicit rule is used, to refer to the correct name we use automatic variables which are variables computed afresh for each rule that is executed, they only have values within the recipe, the most used ones are

$@ - the filename of the target of the rule
$< - the name of the first prerequisite
$^ - the name of all the prerequisites
$? - the name of all the prerequisites that are newer than the target
$* - the stem

For example

%.o: %.c
  $(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@

Specifies how to make an object file n.o from a source file n.c provided that n.c exists or can be made, inside the recipe the automatic variables $@ and $< correspond to the target file and source file respectively

Variables

To substitute a variable’s value write $(var) or ${var}

objects = program.o foo.o utils.o
program : $(objects)
  cc -o program $(objects)

$(objects) : defs.h

Setting variables

Variables defined with

=, define recursively expanded variable - if the value contains references to other variables these references are expanded whenever this variable is substituted

foo = $(bar)
bar = $(message)
message = hello

all:
  echo $(foo)   # prints hello

:= or ::= simple expanded variable - the value of a variable is set as of the time it was defined

x := foo
y := $(x) bar
x := later

# at this point
#   - x is equal to `later`
#   - y is equal to `foo bar`

?= sets the value of a variable if it’s not already set

foo = hello
foo ?= bar
# foo is equal to `hello`

!= executes a program and sets a variable to its output (alternatively use $(shell commands))

foo != printf 'hi'
# foo is equal to `hi`

Advanced features for reference to variables

substitution reference $(var:a=b) - substitutes every a at the end of a word with b

foo := a.o b.o c.o
bar := $(foo:.o=.c)
# bar is equal to `a.c b.c c.c`

computed variable names $($(a)) - nested variable reference

x = y
y = z
a := $($(x))
# a is equal to `z`

Recipes

Each line must start with a tab, any line in the makefile that begins with tab and appears in a “rule context” will be considered part of the recipe for that rule, blank lines that appear in the middle of rules are ignored

Each time a recipe is executed make will invoke a new sub-shell for each line of the recipe, this implies that setting shell variables will not affect the following lines in the recipe

foo: bar/lose
  cd bar    # dir is ./bar/
  cat file  # dir is ./

Normally make prints each line of the recipe before it’s executed, to avoid this behavior prepend the line with @

program.o: program.c
  @cc -c -g program.c
  # won't print the compilation line on the terminal

To ignore errors in a recipe line prepend the command with -

clean:
  -rm -f *.o

Running `make`

The simplest use is to recompile every file that is out of date, however it’s possible to update only some files, or find out which files are out of date without changing them

The exit status of make is always one of the following

0 - make is successful
1 - if -q is used and make determines that some target is not already up to date
2 - if make encountered any errors

Goals

The goals are the targets that make should strive to update (other targets are updated if they appear as prerequisites of goals, or prerequisites of prerequisites of goals, etc)

By default the goal is the first target in the makefile (not counting targets that start with .)

A different goal can be specified with arguments to make by using their names, if many goals are specified make processes each of them in turn, any target in the makefile may be specified as a goal unless it starts with - or contains = (parsed as a switch or variable definition respectively)

For example given a project with multiple programs we can compile only a part of the program by specifying as a goal each file that we wish to remake

.PHONY: all
all: a b c

If we’re working on the program a we can execute make a so that only files of that program are recompiled

Specifying a goal has the following advantages

make files that are normally not made i.e. rules that are not prerequisites of the default goal e.g. a file for debugging output
run a recipe associated with a phony target

Flags

-f [filename] - use filename as the makefile (by default make tries GNUmakefile, makefile and Makefile, in that order)
-n - prints all the recipes that are needed to update the targets without executing them
-q - check whether the targets are up to date, the exit code shows if updates are needed
-t - marks targets as up to date without actually remaking them (only their modification times are updated)
-k - try to compile every file that can be tried instead of exiting on the first failure

Overriding variables

Given the following makefile

CFLAGS = -g

all: program.o

program.o: program.c
  cc -c $(CFLAGS) program.c

.PHONY: all

We can override the value of the variable CFLAGS when make is executed like this make CFLAGS="-g -O"

Convention for makefiles

Every makefile should include

# avoid trouble on systems where `SHELL` might be inherited from the environment
SHELL = /bin/sh

# specify all the suffixes which may be subject to implicit rules in this makefile
.SUFFIXES:            # clears the suffix list
.SUFFIXES: .c .o

Use $(srcdir)/ to refer to the location of the source files when the build directory is distinct from the source file directory
Use variables for specifying commands e.g. $(CXX) instead of g++
File management utilities such as ln,rm,mv don’t need to be referred through variables since users don’t need to replace them with other programs
Every makefile should define the var INSTALL which is the basic command for installing a file into the system

Standard targets

all - compiles the entire program, this should be the default target
install - compile the program and copy the executables, libraries to the desired place
clean - delete all the files that are created by building the program

Example

CMake

Thu, 31 Mar 2016 19:31:37 +0000

Executable

$ cmake [options] (<path-to-source> | <path-to-existing-build>)

Assuming that the directory contains

.
├── CMakeLists.txt
└── main.cpp

#include <iostream>
using namespace std;
int main () {
  cout << "Hello world" << endl;
  return 0;
}

The minimal CMakeLists.txt file contains

cmake_minimum_required (VERSION 2.6)
project(Hello)
add_executable(${PROJECT_NAME} main.cpp)

Running cmake . creates a Makefile in the same directory whose recipes are cross platform commands, CMake’s documentation suggests keeping the build separate from the source

$ mkdir build
$ cd build
$ cmake ..

Running the default target in the makefile creates the executable Hello, note that this is done on the ./build/ directory

$ make
Scanning dependencies of target Hello
[ 50%] Building CXX object CMakeFiles/Hello.dir/main.cc.o
[100%] Linking CXX executable Hello
[100%] Built target Hello
$ ./Hello
Hello world

set(<variable> <value>) sets a normal variable available to the current function or directory scope, variables can be accessed with ${variable}
project(<PROJECT-NAME> [LANGUAGES] [<language-name>...]) , sets the following variables
- PROJECT_NAME, same as <PROJECT-NAME>
- PROJECT_SOURCE_DIR same as /path/to/project/
- PROJECT_BINARY_DIR same as /path/to/project/build/
add_executable(<name> source1 [source2 ...]) adds an executable target called name to be built from the source files listed

Useful variables

CMAKE_SOURCE_DIR - path to the top level of the source tree (default value ./)
CMAKE_BINARY_DIR - path to the top level of the build tree (default value ./build/)
CMAKE_RUNTIME_OUTPUT_DIRECTORY - path to the executable (usually set to ${CMAKE_BINARY_DIR}/bin/)
CMAKE_ARCHIVE_OUTPUT_DIRECTORY - path to the static libraries (code from static libraries is included in the executable, usually set to ${CMAKE_BINARY_DIR}/lib/)
CMAKE_LIBRARY_OUTPUT_DIRECTORY - path to the shared libraries (additional code required by the executable, usually set to ${CMAKE_BINARY_DIR}/lib/)

Project structure and organization

. project
├── build
├── include
│   └── project
│       └── World.hpp
└── src
    ├── World.cpp
    └── main.cpp

The CMakeLists.txt file should do the following

add the ./include path to compiler include search path
create an executable file from main.cpp into ./build/bin/
create a static/dynamic library (World.cpp in the example) into ./build/lib/
link the library with the executable

cmake_minimum_required(VERSION 3.0)

project(runner)

set(CMAKE_RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/bin")
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/lib")
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/lib")

# the -I flag in gcc
include_directories(
  ${PROJECT_SOURCE_DIR}/include/
)

set(APP_SOURCES src/main.cpp)
set(LIB_SOURCES src/World.cpp)

# creates ./build/bin/runner
add_executable(${PROJECT_NAME} ${APP_SOURCES})

# shared library
set(LIBRARY_NAME World)
add_library(${LIBRARY_NAME} SHARED ${LIB_SOURCES})
target_link_libraries(${PROJECT_NAME} ${LIBRARY_NAME})

include_directories(dir) - add the given directories to those the compiler uses to search for include files (gcc -I dir)
add_library(library_name [STATIC | SHARED | MODULE] source1 [source2...]) - adds a library called library_name to be built from the source files listed
- STATIC - archives of object files .a (archive)
- SHARED - libraries dynamically linked at runtime .so (shared object)
target_link_libraries(target item) , specify libraries/flags to use when linking a given target

cmake demo

Complex CMake configuration

A complex CMake configuration will have multiple CMakeLists.txt files per directory

./CMakeLists.txt - configures dependencies, platform specifics and output paths
./src/CMakeLists.txt - configures the library to build

C++ refresher

Sat, 26 Mar 2016 12:30:00 +0000

If you have spare time

Watch some CppCon ‘Back to Basics’ videos

Mechanics of a C++ program

Write source code
- Unix extensions: C, cc, cxx, c
- GNU C++ extensions: C, cc, cxx, cpp, c++
Compile the source code - translate the code to machine code, the file containing the translation is the object code of the program
Link the object code with additional code - combination of the object code with startup code and the object code of libraries to produce a runtime version, the final product is a file called the executable code which contains a set of machine language instructions
Execute the program

Compilation and linking is done with

UNIX - CC
GNU C++ - gcc,g++ ( notes on the difference ), gcc small description

Preprocessor

The preprocessor processes a source file before compilation, it allows defining macros which are abbreviations for longer constructs

Directives

Lines which begin with #

#define IDENTIFIER [value] - replaces the occurrences of IDENTIFIER in the code with value, note that value is optional
#undef IDENTIFIER - removes the definition of IDENTIFIER

Conditional directives allow including or discarding parts of the code if a certain condition is met

#ifdef IDENTIFIER - if IDENTIFIER is defined then the code that follows until #endif is included
#ifndef IDENTIFIER - if IDENTIFIER is not defined then the code that follows until #endif is included

Conditional directives are used for example to include headers only once

#ifndef FOO_BAR_BAZ_H_
#define FOO_BAR_BAZ_H_
  // header code
#endif // FOO_BAR_BAZ_H_

File inclusion

#include <library> - the contents of the library file are sent along with the source code, in essence the contents of the library replace the #include line, note that the compiler looks for library in the directories of the host system that hold the standard header files
#include "library" - same as above but library is first looked up relative to the directory of the file that contains the #include (the exact search order is implementation-defined), falling back to the search used for #include <library>
#pragma - specify diverse options to the compiler specific of the platform/compiler

Program structure

Large programs can be split in multiple files which can be compiled (if needed) and linked to generate an executable program, for example a program can be split into three files

a header file that contains structure declarations and prototypes for functions that use those structures
a source code file that contains the code for the functions
a source code that uses those functions

The following things are commonly found in headers

function prototypes
symbolic constants defined with #define or const
structure, class, template declarations
inline functions

trunk
├── bin     : for all executables (applications)
├── lib     : for all other binaries (static and shared libraries (.so or .dll))
├── include : for all project header files, 3rd party files not present in `/usr/local/include` should be here
├── src     : for source files
├── doc     : for documentation
├── build   : for all the object files, removed by `clean`
└── test    : for testing

Example

. project
├── build
├── include
│   ├── project
│   │   └── Vector.hpp
│   └── [third party library]
└── src
    ├── Vector.cpp
    └── main.cpp

Strings

C-style strings

The last character is the null character \0

char name[20];              // uninitialized, the contents are indeterminate
char name[5] = {'j', 'o', 'h', 'n', '\0'};
char name[8] = {'j', 'o', 'h', 'n', '\0'};    // right padded with \0
char name[5] = "john";      // the \0 is understood
char name[8] = "john";      // right padded with \0
char name[] = "john";       // let the compiler count

Operations

#include <cstring>
char source[] = "john";       // let the compiler count
char dest[10];

// size of the string
strlen(source);   // 4
strlen(dest);     // undefined behavior, dest is uninitialized and may not contain a \0

// copy `source` to `dest`
strcpy(dest, source);

// concat `dest` with `source`
strcat(dest, source);
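
The operations above can be combined into a small helper; this is a minimal sketch where the name `copy_twice` and the `capacity` parameter are made up for illustration. It checks that the destination is large enough before calling `strcpy` and `strcat`, since neither function checks bounds on its own.

```cpp
#include <cstddef>
#include <cstring>

// Copies `source` into `dest` and then appends `source` once more.
// Returns the resulting length, or 0 if `dest` (of `capacity` bytes) is too small.
std::size_t copy_twice(char* dest, std::size_t capacity, const char* source) {
  const std::size_t needed = 2 * std::strlen(source) + 1;  // +1 for the trailing \0
  if (needed > capacity) {
    return 0;
  }
  std::strcpy(dest, source);  // dest now holds one copy of source
  std::strcat(dest, source);  // dest now holds two copies of source
  return std::strlen(dest);
}
```

With `char dest[10]` and the source "john" the call succeeds because 4 + 4 characters plus the null character fit in 10 bytes, with `char dest[5]` it is rejected.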

Reading input

char name[20];
cin >> name;            // read until space or newline (no bounds checking)
cin.getline(name, 20);  // read up to 19 characters or until newline, the newline is discarded
cin.get(name, 20);      // read up to 19 characters or until newline, the newline stays in the stream

C++ strings

#include <string>

string str;           // ""
string name = "john";

cin >> name;          // reads until space or newline
getline(cin, name);   // reads until newline
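
A small sketch of how `>>` and `getline` interact on the same stream. The struct `WordAndRest` and the function `read_word_and_rest` are hypothetical names, and the function takes any `std::istream` so it can be tried with a `std::istringstream` instead of `cin`.

```cpp
#include <istream>
#include <sstream>
#include <string>

struct WordAndRest {
  std::string word;
  std::string rest;
};

// Reads one whitespace-delimited word, then the rest of that line.
WordAndRest read_word_and_rest(std::istream& in) {
  WordAndRest result;
  in >> result.word;              // stops at the first space or newline
  std::getline(in, result.rest);  // reads up to the newline and discards it
  return result;
}
```

Note that `rest` keeps the space that followed the word, because `>>` stops right before it.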

Pointers

Given a variable the address operator & is used to get its address or location in memory

int oranges = 5;
int apples = 6;

// location in memory e.g. 0x0065fd40
&oranges;

// location in memory e.g. 0x0065fd44
&apples;

// NOTE: the difference between them is 4 bytes, the size of int

Pointers are variables that store addresses of values rather than the values themselves, to declare a pointer we use the form typeName * pointerName

int oranges = 5;
int* p_oranges;        // declare pointer to an int
p_oranges = &oranges;  // assign address to pointer
sizeof(p_oranges);     // size of the pointer itself, e.g. 4 bytes on a 32-bit platform, 8 on a 64-bit one

The dereferencing operator * yields the value at the location.

int oranges = 5;
int* p_oranges = &oranges;
*p_oranges;                   // 5
*p_oranges = *p_oranges + 1;  // update the value
oranges;                      // 6
*p_oranges;                   // 6

Always initialize a pointer to a definite address before applying the dereferencing operator.

int* p_int;
*p_int = 3;     // undefined behavior, p_int holds an indeterminate address

When a pointer is assigned to another pointer the value stored is the address stored in the first pointer.

int oranges = 5;        // value: 5,     address: 0x000
int* p = &oranges;      // value: 0x000, address: 0x004
int* q = p;             // value: 0x000, address: 0x008
*q;                     // 5

If we want to create a pointer to a pointer we use an extra ‘*’, in the declaration the number of ‘*’ must be equal to the number of pointers in the chain (including this one), in the same fashion we must use the same number of ‘*’ for dereferencing.

int oranges = 5;        // value: 5,     address: 0x000
int* p = &oranges;      // value: 0x000, address: 0x004
int** q = &p;           // value: 0x004, address: 0x008
*p;                     // 5
**q;                    // 5
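
One place where a pointer to a pointer is useful is a function that needs to change where the caller's pointer points. A minimal sketch, with the hypothetical helpers `retarget` and `increment_through`:

```cpp
// Redirects the caller's pointer so that it points at `target`.
// `pp` is a pointer-to-pointer-to-int: *pp is the caller's pointer, **pp is the int.
void retarget(int** pp, int* target) {
  *pp = target;
}

// Increments the int reached by following two levels of indirection.
void increment_through(int** pp) {
  **pp += 1;
}
```

Given `int* p = &oranges; int** q = &p;`, calling `increment_through(q)` changes `oranges`, and `retarget(q, &apples)` makes `p` point to `apples`.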

Pointer and arrays

C++ handles arrays internally using pointers which may seem equivalent, an ordinary array variable name is interpreted as the address of the first element of the array, the bracket notation [] allows us to get/set elements of the array.

int numbers[] = {1, 2, 3};
numbers;      // address 0x0065fd40
numbers[0];   // 1, the value allocated in memory
// NOTE: numbers ~ &numbers[0]

// since a pointer is a reference to an address we can also do
int* p_numbers = numbers;
*p_numbers;   // 1, the value in memory accessed through pointer dereferencing

Adding one to a pointer variable increases its value by the number of bytes of the type to which it points

int numbers[] = {1, 2, 3};
int* p_numbers = numbers;
p_numbers;      // points to the first element of the array
p_numbers + 1;  // points to the second element of the array
p_numbers + 2;  // points to the third element of the array

// NOTE:
//  numbers[0] == *(p_numbers)
//  numbers[1] == *(p_numbers + 1)
//  numbers[2] == *(p_numbers + 2)

The value &numbers is the address of a 3-int block of memory, so even though &numbers[0] == numbers == &numbers numerically the value of &numbers + 1 != numbers + 1 because &numbers + 1 points to the next 3-int block of memory however numbers + 1 points to the second element of the initial 3-int block of memory

numbers is type pointer-to-int or int*
&numbers is type pointer-to-array-of-3-int or int (*)[3]
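
The difference between `numbers + 1` and `&numbers + 1` can be measured in bytes. This is a sketch with two hypothetical helpers, `element_step` and `array_step`, that take the array by reference so it doesn't decay before we get to look at it.

```cpp
#include <cstddef>

// Byte distance between `numbers + 1` and `numbers` (one element).
std::ptrdiff_t element_step(int (&numbers)[3]) {
  int* first = numbers;  // the array decays to int*
  return reinterpret_cast<char*>(first + 1) - reinterpret_cast<char*>(first);
}

// Byte distance between `&numbers + 1` and `&numbers` (one whole array).
std::ptrdiff_t array_step(int (&numbers)[3]) {
  int (*whole)[3] = &numbers;  // pointer-to-array-of-3-int
  return reinterpret_cast<char*>(whole + 1) - reinterpret_cast<char*>(whole);
}
```

The first function returns `sizeof(int)` and the second returns `3 * sizeof(int)`, even though both pointers start at the same address.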

The relationship between pointers and arrays also extends to C-style strings, in C++ a quoted string constant, a string in an array and a string described by a pointer are all handled equivalently (each one is the address of its first character)

char first[20] = "john";
const char* last = "smith";    // string literals are constant
cout << "I am the agent " << first << " " << last;

Given a multidimensional array int a[][2] = { { 1, 2 } }, a is a pointer to the first element which is a 2 element array (which is a pointer to the first of its elements), therefore a pointer to a has form of a pointer-to-array-of-2-int

int a[][2] = { { 1, 2 } };
int (*b)[2] = a;
(*b)[0];       // 1

Array of pointers

int a = 1, b = 2;
int* p[2] = {&a, &b};

Since p is a pointer to the first element which is &a and &a is another pointer then we can reference p with a pointer to pointer

int** q = p;

Runtime allocation: new

So far pointers were sort of an alias for memory that could also be accessed through named variables (memory allocated at compile time), however we can allocate memory at runtime with the operator new, runtime allocated memory is freed with the operator delete

Advantages of runtime allocated memory:

Memory is allocated only when needed

Drawbacks of runtime allocated memory:

Memory allocated by new must be freed using the operator delete otherwise we have a memory leak which is memory allocated but unused, if it grows too large it can halt the execution of the program
An attempt of freeing a block of memory previously freed results in an undefined behavior i.e. don’t use delete twice on the same block of memory in succession

Additional notes regarding runtime allocated memory

Ordinary variable have their values stored in a memory region called the stack, memory allocated with new have their values stored in a memory region called the heap

// p_int address = 0x0065fd40
int* p_int = new int;
delete p_int;

int oranges = 5;
int* p_oranges = &oranges;
// INVALID since delete works only with memory allocated with new
delete p_oranges;

Dynamic arrays can be created with new typeName[count], a pointer can be assigned to the location of the first element of the dynamic array

// dynamic array
int* p_array = new int[10];

// p_array points to the first element of the array
// *p_array is the value of the first element using pointer dereferencing
// p_array[0] is also the value of the first element using array notation

delete [] p_array;

Dynamic structures can be created with new structName, when a pointer points to this block of memory we can access the properties with the arrow membership operator ->

struct person {
  string name;
  int age;
};
person* p_person = new person;
p_person->name = "john smith";
p_person->age = 25;
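
A minimal sketch that pairs every `new` with its matching `delete`. The helpers `make_person` and `sum_of_indices` are made up for illustration, and `count` is assumed to be non-negative.

```cpp
#include <string>

struct person {
  std::string name;
  int age;
};

// Allocates a person on the heap; the caller owns it and must `delete` it.
person* make_person(const std::string& name, int age) {
  person* p = new person;
  p->name = name;
  p->age = age;
  return p;
}

// Allocates `count` ints on the heap, fills them with 0..count-1, sums them
// through array notation and releases the block with `delete []`.
int sum_of_indices(int count) {
  int* values = new int[count];
  int total = 0;
  for (int i = 0; i < count; ++i) {
    values[i] = i;
    total += values[i];
  }
  delete [] values;
  return total;
}
```

Memory obtained with `new` is released with `delete` and memory obtained with `new []` is released with `delete []`, mixing the two forms is undefined behavior.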

Functions

Steps to build a function

Provide a function prototype
Provide a function definition
Call the function

// function prototype
double cube(double x);

int main() {
  // function call
  double q = cube(2.2);
}

// function definition
double cube(double x) {
  return x * x * x;
}

Writing prototypes has the following advantages:

the compiler correctly handles the function return value
the compiler checks the use of the correct number of arguments
the compiler checks the use of the correct type of arguments (performing conversion to the correct type if possible)

When a function is called with basic types for arguments the function creates a new variable and initializes it with the same value, i.e. the function works with a copy with basic types

int main() {
  double x = 1.3;
  cube(x);
  // ..
}

double cube(double x) {
  // x is passed by value
  // x is private to this function
  return x * x * x;
}

However we can pass instead the address of the basic type which means that the function should be rewritten to use pointers

int main() {
  double x = 1.3;
  cube(&x);
  // ..
}

double cube(double* x) {
  // x is a pointer to the variable in main
  // only the pointer is copied, not the value it points to
  return (*x) * (*x) * (*x);
}

This is useful for complex structures if we want to save time/space by passing a reference to the structure instead of passing the entire structure

struct person {
  string name;
  int age;
};

int main() {
  person john = { "john doe", 25 };
  greet(&john);
  // ..
}

void greet(person* someone) {
  // someone is private to this function
  // someone is a pointer to the original person
  someone->age;       // 25
}

When a function is called with an array what’s sent actually is the name of the array which is the address of the first element/a pointer-to-int (int *), this is different from basic types because the array is not copied, instead the function works with the original array

const int k_size = 3;

int main() {
  int a[k_size] = {1, 2, 3};
  sum(a, k_size);         // 6
  cout << *a << endl;     // 1
}

int sum(int* a, int size) {
  // a is another pointer to the original array
  // a is private to this function
  int total = 0;
  for (int i = 0; i < size; ++i) {
    total += *a;
    a++;
  }
  return total;
}
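
To make the "works with the original array" point concrete, a sketch with two hypothetical functions: `sum_array` only reads through the pointer (so it can take `const int*`) and `double_all` writes through it.

```cpp
// Sums `size` ints starting at `a`; `a` is a copy of the caller's pointer,
// so advancing it here does not move the caller's array.
int sum_array(const int* a, int size) {
  int total = 0;
  for (int i = 0; i < size; ++i) {
    total += *a;
    ++a;
  }
  return total;
}

// Doubles every element in place; the caller sees the change because the
// function works on the original array, not on a copy.
void double_all(int* a, int size) {
  for (int i = 0; i < size; ++i) {
    a[i] *= 2;
  }
}
```

After `double_all(a, 3)` on `{1, 2, 3}` the caller's array holds `{2, 4, 6}`.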

Inline functions

When a program is executed and a function is about to be invoked the following steps occur with the program

store the memory address of the next instruction
copy function arguments to the stack
jump to the memory address the function is located
execute the function code
jump back to the instruction stored

A little enhancement to speed up the program is to make the function inline, that is the program replaces the function call with the function code avoiding the jumps

When to use it:

the function is small and called very often

inline double cube(double x) { return x * x * x; }

Reference variables

A reference variable is a name that acts as an alias on a previously defined variable

int p;
int& q = p;

In this context & is not the address operator, instead it serves as part of the type identifier, like int* is a pointer-to-int int& is a reference-to-int

a reference must be initialized to a defined variable when declared
a reference is like a const pointer e.g. int& r_n = n; is like int* const r_n = &n;

int n = 5;
int* p_n = &n;
int& r_n = n;

// the following expressions can be used interchangeably
// - *p_n, r_n, n  to get the value
// - p_n, &r_n, &n to get the address

Example with a function

int main() {
  int x = 2;
  pow2(x); // 4
  x;       // 2
}

int pow2(int& x) {
  // x is an alias to the x in main
  return x * x;
}

Note any change to x in pow2() will actually change the original value, to avoid this behavior use const e.g. int pow2(const int &x)

Reference arguments should be used to

allow the modification of data inside a function
speed the program by passing a reference instead of an entire data object

Classes

class Person {
  // var, functions declared here are private by default
private:
  // private vars and function prototypes
public:
  // public vars and function prototypes
  void sayHi();
};

Class member functions

class member functions can access the private components of the class
to identify to which class a function definition belongs to the operator :: is used

void Person::sayHi() { /* ... */ }

if a class member function won’t modify the instance created then use the const qualifier for the function

// function prototype
class Person {
  // ..
  void show() const;
};

// function definition
void Person::show() const { /* ... */ }

All class methods have a this pointer set to the address of the object that invokes this method, class members can be accessed through pointer dereferencing

Class constructor/destructor

a class has the default constructor by default, it has the form Person() {}
custom constructors/destructor can be defined as follows

class Person {
  string name;
  int age;
public:
  // implicit default constructor:
  //    Person() {}
  Person();                               // default constructor
  Person(const string& name);             // constructor overload
  Person(const string& name, int age);    // constructor overload
  ~Person();                              // destructor
};

// constructor definition
Person::Person() {
  // explicit default constructor
  // NOTE: constructors/destructors have no return type (no need to add return)
}
// const references are needed to accept temporaries such as "john"
Person::Person(const string& name) { /* ... */ }
Person::Person(const string& name, int age) { /* ... */ }
Person::~Person() { /* ... */ }

Class objects

int main() {
  Person a;                               // default constructor
  Person b = Person("john", 25);          // with parameters
  Person c("john", 25);                   // alternative syntax
  Person* p_d = new Person("john", 25);   // pointer-to-Person

  b.show();
  p_d->show();
}
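
Putting the pieces together, a complete class that compiles on its own. This is a sketch and not the class used above: the members `name_` and `age_`, the accessors and `birthday()` are assumptions made for the example.

```cpp
#include <string>

class Person {
  std::string name_;
  int age_;
public:
  Person() : name_("unknown"), age_(0) {}
  Person(const std::string& name, int age) : name_(name), age_(age) {}

  // const member functions promise not to modify the object
  const std::string& name() const { return name_; }
  int age() const { return age_; }

  // non-const member function, uses `this` explicitly
  void birthday() { this->age_ += 1; }
};
```

The constructors use a member initializer list, which initializes the members directly instead of assigning to them in the constructor body.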

Operator overloading

class Time {
public:
  Time operator+(const Time& other) const;
};
// ..
Time Time::operator+(const Time& other) const {
  Time total;
  // code for `total = other + *this`
  return total;
}
// ..
int main() {
  Time a, b;
  Time c = a + b;
  // translated to a.operator+(b)
}
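
A complete version of the idea, assuming that `Time` stores hours and minutes (the members `hours_` and `minutes_` and the accessors are hypothetical) and that both operands hold non-negative values:

```cpp
class Time {
  int hours_;
  int minutes_;
public:
  Time(int hours = 0, int minutes = 0) : hours_(hours), minutes_(minutes) {}

  // Adds two times, carrying minutes over into hours.
  Time operator+(const Time& other) const {
    int total_minutes = minutes_ + other.minutes_;
    return Time(hours_ + other.hours_ + total_minutes / 60, total_minutes % 60);
  }

  int hours() const { return hours_; }
  int minutes() const { return minutes_; }
};
```

For example `Time(1, 45) + Time(2, 30)` yields 4 hours and 15 minutes.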

Misc

Deciphering variable types

http://andybohn.com/deciphering-variable-types/

Find the identifier and start there
Sweep to the right, translating the symbols you see. You should stop your sweep to the right when you get to the end of the type, or if you see a lone right parenthesis ). Seeing a left parenthesis ( is the start of a function symbol, so continue sweeping right.
Sweep left of the identifier until you run out of symbols, or you hit a left parenthesis (. If you hit the left parenthesis now, you should go back to part 2, sweeping right, but now on the outside of the enclosing ), and continuing onto part 3 on the outside of the enclosing (.

Reading examples

Read a number and the next line as a string

// input:
//   1234\n
//   a line of text
int year;
string name;
(cin >> year).get();
getline(cin, name);

Read until a char is found (note that cin >> ch omits spaces)

char ch;
cin.get(ch);            // or ch = cin.get();
while (ch != '#') {
  // do something with ch
  cin.get(ch);
}

Read until EOF

int a, b;
// cin is an istream object that is converted to bool in this case
while (cin >> a >> b) { ... }

string str;
// same as before, cin is converted to bool
while (getline(cin, str)) { ... }

char ch;
cin.get(ch);
// same as before, cin is converted to bool
while (cin) { cin.get(ch); }

int ch;   // int instead of char so that EOF can be told apart from valid characters
while ((ch = cin.get()) != EOF) { ... }

Tokenize

// example: split the following line by commas
// 1,2,hello
// requires #include <sstream>
stringstream tokens(line);
string id, rank, description;

getline(tokens, id, ',');
getline(tokens, rank, ',');
getline(tokens, description);
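
When the number of fields isn't known in advance the same idea can be wrapped in a loop. A minimal sketch, the function name `split` is an assumption:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits `line` on `delimiter` and returns the pieces in order.
std::vector<std::string> split(const std::string& line, char delimiter) {
  std::vector<std::string> tokens;
  std::stringstream stream(line);
  std::string token;
  // getline returns the stream, which converts to false once nothing is left
  while (std::getline(stream, token, delimiter)) {
    tokens.push_back(token);
  }
  return tokens;
}
```

One caveat of this approach is that an empty field at the very end of the line (e.g. "a,") is dropped, since the last `getline` extracts no characters and fails.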

Read/write files

#include <fstream>

ifstream inFile;
inFile.open("input");
ofstream outFile;
outFile.open("output");

string line;
int n;

// reading input from file
getline(inFile, line);
inFile >> n;

// writing output to file
outFile << line;
outFile << n;

// close the stream
inFile.close();
outFile.close();

Read from file reusing the stdin stream, write to file reusing the stdout stream, see freopen

#include <cstdio>
freopen("input", "r", stdin);
freopen("output", "w", stdout);
// use cin here
// close the streams
fclose(stdin);
fclose(stdout);

Type casts

(long) value
long(value)
static_cast<long> (value)

// pointer cast
int* p_number = (int*) 0xB8000000;

Conversion between types

C++11

Declarations

auto automatic type deduction
decltype yields the type of an expression, it can be used to declare a variable of that type

Range-based for loop

int numbers[] = {1, 2, 3, 4, 5};
for (int n : numbers) { ... }
for (int n : {1, 2, 3, 4}) { ... }
for (auto n : {1, 2, 3, 4}) { ... }
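
A short sketch that uses the three features together, the function names are made up for the example:

```cpp
#include <vector>

// Sums a vector with a range-based for loop; `auto` deduces int for `n`.
int sum_all(const std::vector<int>& numbers) {
  int total = 0;
  for (auto n : numbers) {
    total += n;
  }
  return total;
}

// Doubles every element in place; `auto&` makes `n` a reference to the element.
void double_each(std::vector<int>& numbers) {
  for (auto& n : numbers) {
    n *= 2;
  }
}

// decltype(expr) names the type of an expression without evaluating it.
double half(int value) {
  decltype(value / 2.0) result = value / 2.0;  // result is a double
  return result;
}
```

Using `auto n` copies each element while `auto& n` refers to the element itself, which is why only the second loop can modify the vector.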

Multithreading

Back to basics: Concurrency CppCon 2020 https://www.youtube.com/watch?v=F6Ipn7gCOsY
Multithreading basics: https://classroom.udacity.com/courses/ud923
Concurrent Programming with C++: https://www.youtube.com/playlist?list=PL5jc9xFGsL8E12so1wlMS0r0hTQoJL74M

Threadpool

Culling & Clipping

Wed, 16 Mar 2016 11:03:05 +0000

There’s a problem when the objects transformed to NDC need to be rasterized, some objects that are behind the eye might be rendered leading to incorrect results

For example when the perspective projection matrix is used all the points’ $z$-coordinate will be mapped to NDC using

$$ z_{ndc} = \frac{Az_{cam} + B}{-z_{cam}} $$

If $n,f$ are the locations of the near and far plane in the negative $z$-axis in camera space and

$$ \begin{align*} A &= -\frac{f + n}{f - n} \\ B &= \frac{-2fn}{f - n} \end{align*} $$

Note that the equations above assume that $n,f \geq 0, n \leq f$ because $A$ and $B$ were already mapped using $-n \mapsto -1$ and $-f \mapsto 1$, for example when $n = 1$ and $f = 10$ the possible values can be described with the following plot

We see that objects behind the camera (points with $z_{cam} > 0$) are mapped to NDC as $z_{ndc} > 1$ i.e. in NDC points behind the camera are visible
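
The mapping can be checked numerically with a small function; this is a sketch that transcribes the formulas for $A$ and $B$ above, the name `z_ndc` and its parameter order are assumptions.

```cpp
// Maps a camera-space z coordinate to NDC using the A and B defined above.
// `n` and `f` are the (positive) distances to the near and far planes,
// `z_cam` must be different from zero.
double z_ndc(double z_cam, double n, double f) {
  const double A = -(f + n) / (f - n);
  const double B = -2.0 * f * n / (f - n);
  return (A * z_cam + B) / -z_cam;
}
```

With $n = 1$ and $f = 10$ the near plane `z_ndc(-1, 1, 10)` maps to $-1$ and the far plane `z_ndc(-10, 1, 10)` maps to $1$ (up to rounding), while a point behind the camera such as `z_ndc(5, 1, 10)` gives a value greater than $1$.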

For this reason there’s a preceding step in the rasterization process called clipping that removes parts of primitives that are outside the view volume (clipping against the six faces of the view volume), a basic implementation of the clipping process is described below

input: triangle, 6 planes of the view volume

for (each of the six planes) do
  if (the triangle is entirely outside the plane) then
    discard the triangle
  else if (the triangle passes through the plane) then
    clip the triangle
    if (the triangle is now a quadrilateral) then
      break the quadrilateral into two triangles
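
The "clip the triangle" step for a single plane can be sketched as one pass of the Sutherland-Hodgman algorithm, which is a swapped-in technique and not necessarily what a given rasterizer does. The types `Vec3` and `Plane` and the convention that points with a non-negative signed distance are inside are assumptions, and vertices lying exactly on the plane may produce duplicated output vertices.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Plane a*x + b*y + c*z + d = 0; points with a non-negative value are inside.
struct Plane { double a, b, c, d; };

double signed_distance(const Plane& p, const Vec3& v) {
  return p.a * v.x + p.b * v.y + p.c * v.z + p.d;
}

// Clips a convex polygon against one plane (one Sutherland-Hodgman pass).
// Returns an empty polygon when the input lies entirely outside.
std::vector<Vec3> clip_against_plane(const std::vector<Vec3>& polygon, const Plane& plane) {
  std::vector<Vec3> output;
  for (std::size_t i = 0; i < polygon.size(); ++i) {
    const Vec3& current = polygon[i];
    const Vec3& next = polygon[(i + 1) % polygon.size()];
    const double dc = signed_distance(plane, current);
    const double dn = signed_distance(plane, next);
    if (dc >= 0) {
      output.push_back(current);  // keep vertices on the inside
    }
    if ((dc >= 0) != (dn >= 0)) {
      // the edge crosses the plane: add the intersection point
      const double t = dc / (dc - dn);
      output.push_back({current.x + t * (next.x - current.x),
                        current.y + t * (next.y - current.y),
                        current.z + t * (next.z - current.z)});
    }
  }
  return output;
}
```

Clipping a triangle this way yields 0 vertices (discard), 3 vertices (still a triangle) or 4 vertices (a quadrilateral that is then split into two triangles), repeating the pass for each of the six planes gives the loop described above.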

Culling is a process where geometry that’s not visible from the camera is discarded to save processing time

View volume culling - Geometry outside the view volume can be culled since it won’t produce fragments when rasterized, this process is specially useful when triangles are grouped into an object that has an associated bounding volume, then
Backface culling - polygons that face away from the camera can be culled before the pipeline starts

Affine spaces

Tue, 15 Mar 2016 12:19:52 +0000

Imagine a vector space where two points $P$ and $P'$ exist, then there’s a unique translation of the plane that maps $P$ to $P'$ which means that the space of translations in the plane can be identified with a set of vectors that exist in the plane, composition of translations corresponds to addition of vectors e.g. $\v{PP''} = \v{PP'} + \v{P'P''}$

affine space

An affine space is a space where translation is defined, formally an affine space is a set $E$ (of points) that admits a free transitive action of a vector space $\v{E}$ (of translations) whose action results in an element of the set $E$, that is there’s a map $E \times \v{E} \rightarrow E: (a,\mathbf{v}) \mapsto a + \mathbf{v}$ such that

The zero vector acts as an identity i.e. for all $a \in E$, $a + \mathbf{0} = a$
Addition of vectors corresponds to composition of translations i.e. for all $a \in E$ and $\mathbf{u,v} \in \v{E}$, $a + (\mathbf{u} + \mathbf{v}) = (a + \mathbf{u}) + \mathbf{v}$
For any $a,b \in E$ there’s a unique free vector $\mathbf{u} \in \v{E}$ such that $a + \mathbf{u} = b$

The affine space is commonly represented with the triple $\left \langle E, \v{E}, + \right \rangle$ where $E$ is a set of points, $\v{E}$ a vector space acting on $E$ and an action $+: E \times \v{E} \rightarrow E$

Consider a subset $L$ of $\mathbb{A}^2$ consisting of points satisfying

$$ -x + y - 2 = 0 $$

Where any point has the form $(x, f(x)) = (x, 2 + x)$, the line can be made into an affine space by defining $+: L \times V \rightarrow L$ (note that $V$ is a vector space) so that for any $u \in V$

$$ (x, 2 + x) + u = (x + u, 2 + x + u) $$

For example the point $(-2,0)$ translated with $u = 1$ (a displacement of $[1,1]$ in the plane) results in the point $(-1, 1)$ which belongs to the set $L$, note that in the example above every translation of $L$ is a displacement parallel to $[1,1]$

Chasles’s Identity

Given any three points $a,b,c \in E$ we know that $c = a + \mathbf{ac}$, $b = a + \mathbf{ab}$ and $c = b + \mathbf{bc}$ by axiom 3, therefore

$$ c = b + \mathbf{bc} = (a + \mathbf{ab}) + \mathbf{bc} = a + (\mathbf{ab} + \mathbf{bc}) $$

And thus

$$ \mathbf{ab} + \mathbf{bc} = \mathbf{ac} $$

Which is known as Chasles’s identity

Affine combinations

Consider $\mathbb{R}^2$ an affine space with its origin at $(0,0)$ and basis vectors $\mathbf{b_1} = [1, 0]$ and $\mathbf{b_2} = [0,1]$, given any two points $a,b \in \mathbb{R}^2$ with coordinates $a = (a_1,a_2)$ and $b = (b_1,b_2)$ we can define the affine combination $\lambda a + \mu b$ as the point of coordinates

$$ (\lambda a_1 + \mu b_1, \lambda a_2 + \mu b_2) $$

Let $\lambda = 1, \mu = 1$, $a = (-1,-1)$ and $b = (2, 2)$ then $a + b = (1, 1)$

If we change the coordinate system to have an origin at $(1,1)$ with the same basis vectors then the coordinates of the given points are $a=(-2,-2)$ and $b=(1,1)$, the linear combination is then $a + b = (-1,-1)$ which is the same as the point $(0,0)$ of the first coordinate system, therefore $a+b$ corresponds to two different points depending on the coordinate system used

A restriction is needed for affine combinations to make sense and the restriction is that the scalars add up to 1
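The dependence on the origin can be checked numerically. The following is a minimal sketch, assuming the points $a = (-1,-1)$ and $b = (2,2)$ and a second frame with its origin at $(1,1)$; the helper names (`combine`, `to_frame`, `from_frame`, `combine_in_new_frame`) are made up for this illustration.

```python
def combine(weights, points):
    """Weighted sum of 2D coordinates, computed component-wise."""
    x = sum(w * p[0] for w, p in zip(weights, points))
    y = sum(w * p[1] for w, p in zip(weights, points))
    return (x, y)


def to_frame(point, origin):
    """Coordinates of `point` in a frame with the same basis but a new origin."""
    return (point[0] - origin[0], point[1] - origin[1])


def from_frame(coords, origin):
    """Inverse of `to_frame`: back to the coordinates of the first frame."""
    return (coords[0] + origin[0], coords[1] + origin[1])


a, b = (-1.0, -1.0), (2.0, 2.0)
new_origin = (1.0, 1.0)


def combine_in_new_frame(weights):
    """Combine a and b using their coordinates in the second frame."""
    moved = [to_frame(p, new_origin) for p in (a, b)]
    return from_frame(combine(weights, moved), new_origin)


# Weights (1, 1) add up to 2: the resulting point depends on the frame.
print(combine((1, 1), (a, b)), combine_in_new_frame((1, 1)))
# Weights (0.25, 0.75) add up to 1: both frames give the same point.
print(combine((0.25, 0.75), (a, b)), combine_in_new_frame((0.25, 0.75)))
```

With weights that add up to 1 both frames agree, which is the behavior the restriction is meant to guarantee.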

Lemma: Given an affine space $\left \langle E, \v{E}, + \right \rangle$, let $a_i, i \in I$ be a family of points in $E$ and let $\lambda_i, i \in I$ be a family of scalars, then for any two points $a,b \in E$ the following properties hold

$$ \begin{equation} \label{lemma-1} a + \sum_{i \in I} \lambda_i \mathbf{aa_i} = b + \sum_{i \in I} \lambda_i \mathbf{ba_i} \quad \text{if $\sum_{i \in I} \lambda_i = 1$} \end{equation} $$

and

$$ \begin{equation} \label{lemma-2} \sum_{i \in I} \lambda_i \mathbf{aa_i} = \sum_{i \in I} \lambda_i \mathbf{ba_i} \quad \text{if $\sum_{i \in I} \lambda_i = 0$} \end{equation} $$

To prove \eqref{lemma-1} we apply Chasles’s identity

$$ \begin{align*} a + \sum_{i \in I} \lambda_i \mathbf{aa_i} &= a + \sum_{i \in I} \lambda_i (\mathbf{ab} + \mathbf{ba_i}) \\ &= a + (\sum_{i \in I} \lambda_i) \mathbf{ab} + \sum_{i \in I} \lambda_i \mathbf{ba_i} \\ &= a + \mathbf{ab} + \sum_{i \in I} \lambda_i \mathbf{ba_i} \quad \text{since $\sum_{i \in I} \lambda_i = 1$} \\ &= b + \sum_{i \in I} \lambda_i \mathbf{ba_i} \quad \text{since $b = a + \mathbf{ab}$} \\ \end{align*} $$

For \eqref{lemma-2} we also have

$$ \begin{align*} \sum_{i \in I} \lambda_i \mathbf{aa_i} &= \sum_{i \in I} \lambda_i (\mathbf{ab} + \mathbf{ba_i}) \\ &= (\sum_{i \in I} \lambda_i) \mathbf{ab} + \sum_{i \in I} \lambda_i \mathbf{ba_i} \\ &= \sum_{i \in I} \lambda_i \mathbf{ba_i} \quad \text{since $\sum_{i \in I} \lambda_i = 0$} \\ \end{align*} $$

Formally for any family of points $a_i, i \in I$ in $E$, for any family $\lambda_i, i \in I$ of scalars such that $\sum_{i \in I} \lambda_i = 1$ the point

$$ \begin{equation} \label{affine-combination} x = a + \sum_{i \in I} \lambda_i \mathbf{aa_i} \end{equation} $$

Is independent of $a \in E$ and is called the barycenter or affine combination of the points $a_i$ with weights $\lambda_i$, and is denoted as

$$ \sum_{i \in I} \lambda_i a_i $$

Affine maps

An affine map between two affine spaces $X$ and $Y$ is a map $f: X \rightarrow Y$ that preserves affine combinations i.e.

$$ f \left (\sum_{i \in I} \lambda_i a_i \right ) = \sum_{i \in I} \lambda_i f(a_i) $$

Vector spaces

Mon, 14 Mar 2016 17:07:51 +0000

A vector space is a set whose elements are called “vectors” (denoted as $\v{v}$ or $\mathbf{v}$) which have two operations defined on them: addition of vectors and multiplication of a vector by a scalar

Formally a vector space $V$ is a set with two operations $+$ and $*$ that satisfy the following properties

if $\mathbf{u},\mathbf{v} \in V$ then $\mathbf{u + v} \in V$
- $\mathbf{u + v} = \mathbf{v + u}$
- $\mathbf{u + (v + w)} = \mathbf{(u + v) + w}$
- There is a special element called the zero vector $\mathbf{0} \in V$ such that $\mathbf{u + 0} = \mathbf{0 + u} = \mathbf{u}$
- For every $\mathbf{u} \in V$ there’s an inverse element $-\mathbf{u}$ such that $\mathbf{u + (-u)} = \mathbf{0}$
if $\mathbf{u} \in V$ and $\alpha \in \mathbb{R}$ then $\alpha\mathbf{u} \in V$
- $(\alpha + \beta) \mathbf{u} = \alpha \mathbf{u} + \beta \mathbf{u}$
- $\alpha (\beta \mathbf{u}) = (\alpha\beta) \mathbf{u}$
- $1 \cdot \mathbf{u} = \mathbf{u}$

Notable examples of vector spaces

Segments on the plane and space, addition uses the parallelogram law and multiplication by a scalar scales the segment
The set of $n \times n$ matrices with element-wise addition
The set of all polynomials
The space consisting of the zero vector alone $\{\mathbf{0}\}$

Vector subspaces

A subset $U \subseteq V$ of a vector space $V$ is a subspace if

For all $\mathbf{u,v} \in U$, $\mathbf{u+v} \in U$
For all $\alpha \in \mathbb{R}$ and $\mathbf{u} \in U$, $\alpha \mathbf{u} \in U$

Linear dependence

A set of vectors is linearly dependent if one element from the set can be written as a linear combination of the other elements in the set, if this cannot be done then the set is linearly independent. A linearly independent set that spans a vector space is known as a basis of that space and the dimension of the space is the number of elements in the basis, if $\mathbf{b_1, b_2, \ldots, b_n}$ is a basis then any vector of the space can be written as a linear combination of the basis in the form

$$ \mathbf{v} = a_1 \mathbf{b_1} + a_2 \mathbf{b_2} + \ldots + a_n \mathbf{b_n} $$

The numbers $a_1, a_2, \ldots, a_n$ are called the components of $\mathbf{v}$ in the specified basis, note that the basis doesn’t need to be orthogonal nor have unit vectors

The set of vectors $[1,0,0], [0,1,0], [0,0,1]$ is an example of a basis of a vector space of dimension 3

Linear maps

A map between vector spaces is linear if it preserves addition and multiplication with scalars as defined above, formally a map $L: U \rightarrow V$ is linear if

For all $\mathbf{u,v} \in U$, $L(\mathbf{u + v}) = L(\mathbf{u}) + L(\mathbf{v})$
For all $\alpha \in \mathbb{R}$ and $\mathbf{u} \in U$, $L(\alpha \mathbf{u}) = \alpha L(\mathbf{u})$
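The two conditions can be spot-checked on sample vectors. The sketch below is only an illustration (a check on a few inputs is not a proof of linearity) and the names `matrix_map`, `looks_linear`, `shear` and `translate` are assumptions made for the example: multiplication by a matrix passes both conditions while a translation fails them.

```python
def add(u, v):
    """Component-wise vector addition."""
    return tuple(a + b for a, b in zip(u, v))


def scale(alpha, u):
    """Multiplication of a vector by a scalar."""
    return tuple(alpha * a for a in u)


def matrix_map(m):
    """Return the map v -> m v for a matrix given as a list of rows."""
    return lambda v: tuple(sum(r * x for r, x in zip(row, v)) for row in m)


def looks_linear(f, u, v, alpha):
    """Check both linearity conditions on the given sample inputs."""
    additive = f(add(u, v)) == add(f(u), f(v))
    homogeneous = f(scale(alpha, u)) == scale(alpha, f(u))
    return additive and homogeneous


shear = matrix_map([[1, 2], [0, 1]])
translate = lambda v: add(v, (1, 0))

print(looks_linear(shear, (1, 2), (3, -1), 4))      # the matrix map passes
print(looks_linear(translate, (1, 2), (3, -1), 4))  # the translation fails
```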

Additional operations

Norm

The norm of a vector is denoted by $\norm{\mathbf{v}}$ and satisfies

$\norm{\mathbf{v}} \geq 0$, $\norm{\mathbf{v}} = 0$ only if $\mathbf{v} = \mathbf{0}$
$\norm{\alpha \mathbf{v}} = |\alpha| \norm{\mathbf{v}}$
$\norm{\mathbf{v_1} + \mathbf{v_2}} \leq \norm{\mathbf{v_1}} + \norm{\mathbf{v_2}}$ (triangle inequality)

Scalar product

The scalar product of two vectors is a function $f: V \times V \rightarrow \mathbb{R}$, the function is commonly denoted as $\left \langle \mathbf{v_1}, \mathbf{v_2} \right \rangle$ and satisfies

$\left \langle \mathbf{w, (u + v)} \right \rangle = \left \langle \mathbf{w,u} \right \rangle + \left \langle \mathbf{w,v} \right \rangle$
$\left \langle \mathbf{w},\alpha \mathbf{v} \right \rangle = \alpha \left \langle \mathbf{w,v} \right \rangle$
$\left \langle \mathbf{v,v} \right \rangle \geq 0$

Triangle in affine spaces

Thu, 10 Mar 2016 23:17:08 +0000

In an affine space there’s the concept of affine combination which states that any point in space can be represented as an affine combination in the form

$$ a + \sum_{i \in I} \lambda_i \mathbf{aa_i} \quad \quad \text{if $\sum_{i \in I} \lambda_i = 1$} $$

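We can add an additional restriction on the values of $\lambda_i$ to define a triangle built out of three points $a, b, c$: take weights $\alpha, \beta, \gamma \in [0,1]$ such that $\alpha + \beta + \gamma = 1$ (so $\beta + \gamma \leq 1$) and use $a$ as the base point, the term for $a$ vanishes since $\mathbf{aa} = \mathbf{0}$ and the triangle is the set of points given by the affine combination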

$$ a + \beta \mathbf{ab} + \gamma \mathbf{ac} $$

barycentric coordinates

One geometric property of the scalar values is that they’re the signed scaled distance from the lines that pass through the triangle sides, to compute the scalar values $\beta$ and $\gamma$ we can use the fact that when the implicit equation of the line that passes through a side is evaluated at a point that doesn’t lie on the line the result is equal to

beta

$$ f(x,y) = d_{(x,y)} \cdot \sqrt{A^2 + B^2} $$

Where $d_{(x,y)}$ is the distance from the point $(x,y)$ to the line, $A$ and $B$ are the coefficients of $x$ and $y$ of the general equation of the line that passes through $a$ and $c$

$$ Ax + By + C = 0 $$

To find the value of $\beta$ we can use the value of the implicit equation of the line to map the distance between any point to the line in the range $[f_{ac}(x_a, y_a), f_{ac}(x_b, y_b)] = [0, f_{ac}(x_b, y_b)]$, we can use a simple division to find the value of $\beta$

$$ \beta = \frac{f_{ac}(x,y)}{f_{ac}(x_b, y_b)} = \frac{d_{(x,y)}}{d_{(x_b, y_b)}} $$

In a similar fashion the value of $\gamma$ is

$$ \gamma = \frac{f_{ab}(x,y)}{f_{ab}(x_c, y_c)} = \frac{d_{(x,y)}}{d_{(x_c, y_c)}} $$
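The two ratios translate directly into code. This is a minimal sketch, assuming 2D points stored as tuples; `implicit_line` and `barycentric` are names chosen for the example, and the third weight is recovered from the restriction that the weights add up to 1.

```python
def implicit_line(p, q):
    """Return f(x, y) = A*x + B*y + C for the line through the points p and q."""
    (x0, y0), (x1, y1) = p, q
    A = y0 - y1
    B = x1 - x0
    C = x0 * y1 - x1 * y0
    return lambda x, y: A * x + B * y + C


def barycentric(a, b, c, point):
    """Return (alpha, beta, gamma) such that point = alpha*a + beta*b + gamma*c."""
    f_ac = implicit_line(a, c)
    f_ab = implicit_line(a, b)
    beta = f_ac(*point) / f_ac(*b)    # scaled signed distance to the line ac
    gamma = f_ab(*point) / f_ab(*c)   # scaled signed distance to the line ab
    return (1.0 - beta - gamma, beta, gamma)


a, b, c = (0, 0), (4, 0), (0, 4)
print(barycentric(a, b, c, (1, 2)))  # all weights in [0, 1]: inside the triangle
print(barycentric(a, b, c, (5, 5)))  # a negative weight: outside the triangle
```

A point is inside the triangle when the three weights are in $[0,1]$, which is the test a rasterizer would apply per pixel.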

Geometric tests

Wed, 09 Mar 2016 22:52:35 +0000

Line-line intersection

Given two lines in 3D defined as rays

$$ r_1(t_1) = \mathbf{p_1} + t_1 \mathbf{d_1} \\ r_2(t_2) = \mathbf{p_2} + t_2 \mathbf{d_2} $$

Where $t_1, t_2 \in \mathbb{R}$, the two lines intersect if

$$ \mathbf{p_1} + t_1 \mathbf{d_1} = \mathbf{p_2} + t_2 \mathbf{d_2} $$

We can take the cross product of both sides with $\mathbf{d_2}$ and work from there to find the value of $t_1$

$$ t_1 = \frac{\norm{(\mathbf{p_2} - \mathbf{p_1}) \times \mathbf{d_2} }}{ \norm{\mathbf{d_1} \times \mathbf{d_2}} } $$

Similarly we can find the value of $t_2$ by taking the cross product of both sides with $\mathbf{d_1}$

$$ t_2 = \frac{\norm{ (\mathbf{p_2} - \mathbf{p_1}) \times \mathbf{d_1}} }{ \norm{\mathbf{d_1} \times \mathbf{d_2}} } $$
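The norm-based expressions give the magnitude of each parameter. The sketch below is a variant (not the exact formula above) that projects onto $\mathbf{d_1} \times \mathbf{d_2}$ with a dot product so the sign of $t_1$ and $t_2$ is kept, and that also rejects parallel and skew lines; the function name `intersect_lines` and the tolerance `eps` are assumptions made for the example.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect_lines(p1, d1, p2, d2, eps=1e-9):
    """Return (t1, t2) with p1 + t1*d1 == p2 + t2*d2, or None if there is no such pair."""
    n = cross(d1, d2)
    denom = dot(n, n)               # squared norm of d1 x d2
    if denom < eps:
        return None                 # parallel (or coincident) lines
    delta = sub(p2, p1)
    t1 = dot(cross(delta, d2), n) / denom
    t2 = dot(cross(delta, d1), n) / denom
    # In 3D the lines may be skew, so check that both parameters give the same point.
    q1 = tuple(p + t1 * d for p, d in zip(p1, d1))
    q2 = tuple(p + t2 * d for p, d in zip(p2, d2))
    if any(abs(x - y) > eps for x, y in zip(q1, q2)):
        return None
    return (t1, t2)


print(intersect_lines((0, 0, 0), (1, 0, 0), (2, -1, 0), (0, 1, 0)))
print(intersect_lines((0, 0, 0), (1, 0, 0), (-3, -1, 0), (0, 1, 0)))
```

In the second call the intersection lies behind $\mathbf{p_1}$, so $t_1$ is negative, something the norm-only expression cannot report.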

The proof can be found here

We can actually solve this problem graphically by using triangle similarity, imagine the following situation

line line intersection

The intersection point $\mathbf{p}$ is equal to

$$ \begin{equation} \label{line-line-intersection-point} \begin{split} \mathbf{p} &= \mathbf{a} + \norm{\mathbf{p - a}} \unit{ \mathbf{b - a} } \\ &= \mathbf{a} + \norm{\mathbf{p - a}} \frac{ \mathbf{b - a} }{ \norm{\mathbf{b - a}} } \end{split} \end{equation} $$

By triangle similarity we see that

$$ \frac{ \norm{\mathbf{p - a}} }{ \norm{\mathbf{b - a}} } = \frac{ \norm{\mathbf{n -a}} }{ \norm{\mathbf{m - a}} } $$

Multiplying the left side with an identity

$$ \begin{equation} \label{line-line-triangle-similarity} \frac{ \norm{\mathbf{p - a}} }{ \norm{\mathbf{b - a}} } = \frac{ \norm{\mathbf{n -a}} }{ \norm{\mathbf{m - a}} } \frac{ \norm{\mathbf{d - c}} }{ \norm{\mathbf{d - c}} } \end{equation} $$

We see that the quantity $\norm{ \mathbf{ n - a } } \norm{\mathbf{d - c}}$ is equal to the equation of the area of a parallelogram, we can skew the parallelogram (in the graphic towards the $x$-axis) so that the left side becomes $\mathbf{c - a}$ and the bottom side $\mathbf{d - c}$ (which is not affected by the skew), note that the area can also be expressed with the cross product of the vectors $\mathbf{c - a}$ and $\mathbf{d - c}$ therefore

$$ \begin{equation} \label{numerator-area} \norm{\mathbf{n - a}} \norm{\mathbf{d - c}} = \norm{(\mathbf{c - a}) \times (\mathbf{d - c})} \end{equation} $$

A similar equation can be derived for the parallelogram with sides $\mathbf{m - a}$ and $\mathbf{d - c}$, only this time the skewed side will become $\mathbf{b - a}$

$$ \begin{equation} \label{denominator-area} \norm{\mathbf{m - a}} \norm{\mathbf{d - c}} = \norm{(\mathbf{b - a}) \times (\mathbf{d - c})} \end{equation} $$

Replacing \eqref{numerator-area}, \eqref{denominator-area} in \eqref{line-line-triangle-similarity} and \eqref{line-line-intersection-point} we see that the intersection point is equal to

$$ \mathbf{p} = \mathbf{a} + (\mathbf{b - a}) \frac{ \norm{(\mathbf{c - a}) \times (\mathbf{d - c})} }{ \norm{(\mathbf{b - a}) \times (\mathbf{d - c})} } $$

Transformation matrix to transform objects from NDC coordinates to screen coordinates (viewport transform)

Tue, 08 Mar 2016 22:20:58 +0000

The objective of this step is to find a transformation matrix to transform points expressed in normalized device coordinates to screen coordinates

$$ \mathbf{v}_{screen} = \mathbf{M}_{vp} \mathbf{v}_{ndc} $$

The canonical view volume needs to be mapped to the screen that has $n_x \times n_y$ pixels in a way so that points with $x = -1, x = 1$ are mapped to the left and right sides of the screen respectively and $y = -1, y = 1$ are mapped to the bottom and top sides of the screen respectively, the $z$ coordinate isn’t visible in a 2D image so it can be discarded for the mapping

Since the mapping is linear we can use the linear interpolation method

$$ f(x) = out_{lo} + (out_{hi} - out_{lo}) \frac{x - in_{lo}}{ in_{hi} - in_{lo} } $$

Given

$out_{lo} = -0.5$
$out_{hi} = n_x - 0.5$
$in_{lo} = -1$
$in_{hi} = 1$

The value of $x_{screen}$ is

$$ \begin{align*} x_{screen} &= -0.5 + n_x \frac{x_{ndc} + 1}{2} \\ &= -\frac{1}{2} + \frac{n_x}{2}x_{ndc} + \frac{n_x}{2} \\ &= \frac{n_x}{2}x_{ndc} + \frac{n_x - 1}{2} \end{align*} $$

The value of $y_{screen}$ is found in a similar way

$$ y_{screen} = \frac{n_y}{2}y_{ndc} + \frac{n_y - 1}{2} $$

Finally the transformation matrix that converts points from NDC to screen coordinates is

$$ \mathbf{M}_{vp} = \begin{bmatrix} \frac{n_x}{2} & 0 & 0 & \frac{n_x - 1}{2} \\ 0 & \frac{n_y}{2} & 0 &\frac{n_y - 1}{2} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

Note that the $z$-coordinate doesn’t need to be modified since it doesn’t affect the projection in the image, the $z$-coordinate is still used to check the order in which objects should be drawn
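As a quick check of the matrix, the sketch below builds $\mathbf{M}_{vp}$ for a given resolution and applies it to the corners of the canonical view volume. It's a plain-Python illustration with hypothetical helper names (`viewport_matrix`, `transform`), and the 640x480 resolution is only an example value.

```python
def viewport_matrix(nx, ny):
    """Matrix that maps NDC x, y in [-1, 1] to [-0.5, n - 0.5] and leaves z untouched."""
    return [
        [nx / 2, 0.0, 0.0, (nx - 1) / 2],
        [0.0, ny / 2, 0.0, (ny - 1) / 2],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform(m, v):
    """Multiply a 4x4 matrix by a homogeneous point (x, y, z, w)."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


m_vp = viewport_matrix(640, 480)
print(transform(m_vp, (-1, -1, 0, 1)))   # bottom-left corner of the screen
print(transform(m_vp, (1, 1, 0.5, 1)))   # top-right corner, z is unchanged
print(transform(m_vp, (0, 0, 0, 1)))     # center of the screen
```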

Normals

Tue, 08 Mar 2016 14:18:11 +0000

A normal vector to a curve at a particular point is a vector perpendicular to the tangent vector of the curve at that point. For an implicit 2D function in the form $f(x,y) = 0$ the normal is given by the 2D gradient

$$ \nabla f(x,y) = \left ( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right ) $$

For an implicit 3D function the normal is the vector perpendicular to the surface, the surface normal at a point $\mathbf{p}$ is given by the gradient of the implicit function

$$ \mathbf{n} = \nabla f (\mathbf{p}) = \left ( \frac{\partial f(\mathbf{p})}{\partial x}, \frac{\partial f(\mathbf{p})}{\partial y}, \frac{\partial f(\mathbf{p})}{\partial z} \right ) $$

For a plane we know that the dot product of the normal $\mathbf{n}$ and any vector that lies in the plane is zero, therefore we can model a plane as the following implicit equation

$$ (\mathbf{p} - \mathbf{a}) \cdot \mathbf{n} = 0 $$

Where $\mathbf{p}$ and $\mathbf{a}$ are any two points lying on the plane, sometimes we want the equation of a plane through points $\mathbf{a, b, c}$, the normal can be found by taking the cross product of any two vectors on the plane

$$ \mathbf{n} = (\mathbf{b} - \mathbf{a}) \times (\mathbf{c} - \mathbf{a}) $$

Transforming normal vectors

Normal vectors do not transform the way we would like when they’re multiplied by a transformation matrix, if the points on a surface are transformed by the transformation matrix $\mathbf{M}$, a vector $\mathbf{t}$ tangent to the surface will still be tangent to the transformed surface, however a surface normal vector $\mathbf{n}$ may not be normal to the transformed surface

For example when a transformation matrix $\mathbf{M} = \mathbf{H_x}(s)$ that skews points toward the $x$ axis multiplies the normal vector $\mathbf{n}$, the resulting vector $\mathbf{Mn}$ is not normal to the surface, we would like to find a transformation matrix $\mathbf{N}$ so that $\mathbf{Nn}$ is indeed the surface normal

transforming normal

To find the value of $\mathbf{N}$ we start from the fact that the normal $\mathbf{n}$ and the tangent $\mathbf{t}$ are perpendicular

$$ \mathbf{ n \cdot t } = 0 $$

Expressed as a matrix multiplication

$$ \begin{equation} \label{perpendicular} \mathbf{n}^T \mathbf{t} = 0 \end{equation} $$

After the transformation they’re still perpendicular so

$$ (\mathbf{Nn})^T \mathbf{Mt} = 0 $$

Applying the transpose

$$ \begin{equation} \label{post-transformation} \mathbf{n}^T \mathbf{N}^T \mathbf{Mt} = 0 \end{equation} $$

Relating \eqref{post-transformation} with \eqref{perpendicular} we see that the only way that both equations hold true is that

$$ \mathbf{N}^T \mathbf{M} = \mathbf{I} $$

The value of $\mathbf{N}$ is then

$$ \begin{align*} \mathbf{N}^T \mathbf{M} &= \mathbf{I} \\ \mathbf{N}^T \mathbf{MM}^{-1} &= \mathbf{IM}^{-1} \\ \mathbf{N}^T &= \mathbf{M}^{-1} \\ \mathbf{N} &= (\mathbf{M}^{-1})^T \end{align*} $$
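The result can be verified with a small 2D example. The sketch below assumes a skew $\mathbf{H_x}(1)$ applied to a vertical wall with tangent $[0,1]$ and normal $[1,0]$; the helpers are written by hand for $2 \times 2$ matrices and their names are made up for the illustration.

```python
def mat_vec(m, v):
    """Multiply a 2x2 matrix by a 2D vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(2)) for r in range(2))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def transpose_2x2(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]


M = [[1, 1], [0, 1]]                  # skew towards the x axis with s = 1
N = transpose_2x2(inverse_2x2(M))     # N = (M^-1)^T

t, n = (0, 1), (1, 0)                 # tangent and normal of a vertical wall
print(dot(mat_vec(M, n), mat_vec(M, t)))  # not zero: Mn is not a normal anymore
print(dot(mat_vec(N, n), mat_vec(M, t)))  # zero: Nn is perpendicular to Mt
```

Note that $\mathbf{Nn}$ is generally not a unit vector, so it has to be normalized again before it's used for shading.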

Eigenvalues and eigenvectors

Mon, 07 Mar 2016 12:50:15 +0000

Given a square matrix $\mathbf{M}$

an eigenvector $\mathbf{v}$ is a non-zero vector whose direction doesn’t change when multiplied by $\mathbf{M}$, note that if $\mathbf{M}$ has an eigenvector then it has an infinite number of eigenvectors (the non-zero vectors parallel to $\mathbf{v}$)
an eigenvalue $\lambda$ is the factor by which an eigenvector $\mathbf{v}$ of $\mathbf{M}$ is scaled after the multiplication with $\mathbf{M}$

$$ \begin{equation} \label{eigenvector} \mathbf{Mv} = \lambda \mathbf{v} \end{equation} $$

Assuming that $\mathbf{M}$ has at least one eigenvector $\mathbf{v}$ we can do standard matrix multiplications to find it, first let’s manipulate the right side of \eqref{eigenvector} so that it also features a matrix multiplication

$$ \mathbf{Mv} = \lambda \mathbf{Iv} $$

Where $\mathbf{I}$ is the identity matrix, next we can rewrite the last equation as

$$ \mathbf{Mv} - \lambda \mathbf{Iv} = \mathbf{0} $$

Because matrix multiplication is distributive

$$ \begin{equation} \label{eigenvector-0} (\mathbf{M} - \lambda \mathbf{I})\mathbf{v} = \mathbf{0} \end{equation} $$

The quantity $\mathbf{M} - \lambda \mathbf{I}$ must not be invertible, if it had an inverse we could premultiply both sides by $(\mathbf{M} - \lambda \mathbf{I})^{-1}$ which would yield

$$ \begin{align*} (\mathbf{M} - \lambda \mathbf{I})^{-1}(\mathbf{M} - \lambda \mathbf{I})\mathbf{v} &= (\mathbf{M} - \lambda \mathbf{I})^{-1} \; \mathbf{0} \\ \mathbf{v} &= \mathbf{0} \end{align*} $$

The vector $\mathbf{v = 0}$ fulfills \eqref{eigenvector} however we’ll try to find a vector $\mathbf{v} \not = \mathbf{0}$, if such a condition is added then the matrix $\mathbf{M} - \lambda \mathbf{I}$ must not have an inverse which also means that its determinant is 0

$$ \left | \mathbf{M} - \lambda \mathbf{I} \right | = 0 $$

If $\mathbf{M}$ is a $2 \times 2$ matrix then

$$ \begin{align*} \label{lambda} \left | \mathbf{M} - \lambda \mathbf{I} \right | &= \begin{vmatrix} m_{11} - \lambda & m_{12} \\ m_{21} & m_{22} - \lambda \end{vmatrix} \\ & = \lambda^2 - (m_{11}+m_{22})\lambda + (m_{11}m_{22} - m_{12}m_{21}) \\ & = 0 \end{align*} $$

From \eqref{lambda} we can find two values for $\lambda$ which may be repeated or complex, a similar manipulation for an $n \times n$ matrix will yield an $n$th degree polynomial, for $n \leq 4$ we can compute the solutions by analytical methods, for $n > 4$ there’s no general closed-form solution so numeric methods are used

The associated eigenvector can be found by solving \eqref{eigenvector-0}

$$ \begin{bmatrix} m_{11} - \lambda & m_{12} \\ m_{21} & m_{22} - \lambda \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} $$
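For the $2 \times 2$ case both steps fit in a few lines. This is a minimal sketch that solves the quadratic with `cmath` so complex roots are handled too; the function names and the tolerance `eps` are assumptions made for the example.

```python
import cmath


def eigenvalues_2x2(m):
    """Roots of lambda^2 - (m11 + m22) lambda + (m11 m22 - m12 m21) = 0."""
    (m11, m12), (m21, m22) = m
    trace = m11 + m22
    det = m11 * m22 - m12 * m21
    root = cmath.sqrt(trace * trace - 4 * det)
    return ((trace + root) / 2, (trace - root) / 2)


def eigenvector_2x2(m, lam, eps=1e-12):
    """A non-zero v with (M - lam I) v = 0, assuming lam is an eigenvalue of M."""
    (m11, m12), (m21, m22) = m
    # M - lam I is singular, so v only has to be perpendicular to one non-zero row.
    if abs(m12) > eps or abs(m11 - lam) > eps:
        return (m12, lam - m11)
    if abs(m21) > eps or abs(m22 - lam) > eps:
        return (m22 - lam, -m21)
    return (1, 0)  # M - lam I is the zero matrix: every vector is an eigenvector


M = [[2, 1], [1, 2]]
for lam in eigenvalues_2x2(M):
    print(lam, eigenvector_2x2(M, lam))

# A rotation by 90 degrees has no real eigenvalues.
print(eigenvalues_2x2([[0, -1], [1, 0]]))
```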

Applications

List of applications

if $\mathbf{M}$ is a 3D rotation matrix then an eigenvector $\mathbf{v}$ with eigenvalue $\lambda = 1$ isn’t affected by the rotation, therefore $\mathbf{v}$ is the rotation axis of $\mathbf{M}$

Projective space

Fri, 04 Mar 2016 10:00:00 +0000

In Euclidean geometry two lines are said to be parallel if they lie in the same plane and never meet, moreover properties like this one don’t change when a Euclidean transformation is applied (translation/rotation), however what we perceive in real life is different from what’s described with Euclidean geometry.

This problem coincided with the one the renaissance artists had while trying to paint on a canvas, when they tried to paint tiles on a canvas they realized that the following rules applied:

parallel lines meet on the horizon

straight lines must be represented on the page by straight lines

the image of a conic is also a conic (for example a circle is drawn as an ellipse depending on the perspective)

projection

The French mathematician Girard Desargues, while researching this new type of geometry, needed a point at infinity, he introduced the concept of a line at infinity which helped define the point at infinity as follows

for every family of parallel lines on some ordinary plane there’s one point at infinity where they all meet which lies on the line at infinity

projective plane

An ordinary plane + the line at infinity is called a projective plane

Projective geometry exists in any number of dimensions (just like Euclidean geometry). When we take a picture using a camera the imaging process makes a projection from $P^3$ to $P^2$, such a process is called a projective transformation

Properties of projective transformations

Preservation of type (points remain points and lines remain lines)
Incidence (a point remains on a line after transformation)

In 1D there’s the projective line which is an ordinary line + one point at infinity which can be reached by moving towards each end of the line

Projective line

The space of 1-dimensional subspaces of a 2-dimensional space i.e. the set of lines that pass through the origin in a 2-dimensional space

Let $w$ be the vertical axis in this 2-dimensional space, the picture of a line at infinity is shown by looking at the line $w = 1$ and its relation with $w = 0$, almost all the lines that pass through the origin intersect $w = 1$ except the line $w = 0$, therefore we can see that the set of points that exists in the line $w = 1$ is the same as the set that contains all the intersection points between all the possible 1-dimensional subspaces and $w = 1$, however the line $w = 0$ (which is also part of the 1-dimensional subspaces) doesn’t meet $w = 1$, so $w = 0$ plays the role of infinity with respect to $w = 1$

Any 1D point is represented in projective geometry as the pair $(x, w)$, such a pair can be projected to the line $w = 1$ by dividing both coordinates by $w$, so any 1D point projected to this line is represented by the pair $(\tfrac{x}{w}, 1)$ unless $w = 0$, in which case it’s the point at infinity which has the form $(x, 0)$

projective line

Projective plane

The space of 1-dimensional subspaces of a 3-dimensional space i.e. the set of lines that pass through the origin in a 3-dimensional space

A similar situation is seen in a 3-dimensional space, in this space any line that passes through the origin intersects the plane $w = 1$ except the set of lines that lie in the plane $w = 0$, so $w = 0$ plays the role of infinity with respect to $w = 1$

Any 2D point is represented in this space as the triplet $(x, y, w)$, just like 1D we can project any point to the plane $w = 1$ by dividing the triplet by $w$ which has the form $(\tfrac{x}{w}, \tfrac{y}{w}, 1)$ unless $w = 0$ which means that any point at infinity has the form $(x, y, 0)$

projective plane
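The division by $w$ works the same way in any dimension. The sketch below is an illustration with made-up function names, it returns `None` for points at infinity instead of dividing by zero.

```python
def to_cartesian(h):
    """Project homogeneous coordinates (..., w) onto w = 1 and drop the last coordinate."""
    *coords, w = h
    if w == 0:
        return None  # point at infinity, it has no image on w = 1
    return tuple(c / w for c in coords)


def to_homogeneous(p, w=1.0):
    """One of the infinitely many homogeneous representations of the point p."""
    return tuple(c * w for c in p) + (w,)


print(to_cartesian((4, 6, 2)))      # 2D point stored as (x, y, w)
print(to_cartesian((2, 3, 1)))      # same 2D point, different representative
print(to_cartesian((3, 2)))         # 1D point stored as (x, w)
print(to_cartesian((1, 2, 0)))      # point at infinity
print(to_homogeneous((2.0, 3.0), w=2.0))
```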

Ray Tracing

Fri, 26 Feb 2016 17:03:44 +0000

A ray tracer emits a ray from each pixel toward the scene to determine the color of the pixel, the process of computing the color can be split in three parts

ray generation, the origin and direction of each pixel ray are computed
ray intersection, the closest object intersecting the viewing ray is found
shading, where the intersection point, surface normal and other information is used to determine the color of the pixel

A ray can be represented with a 3D parametric line from the eye $\mathbf{e}$ to a point $\mathbf{s}$ on the image plane as

$$ \mathbf{p}(t) = \mathbf{e} + t(\mathbf{ s - e }) $$

Note that

$\mathbf{p}(0) = \mathbf{e}$
$\mathbf{p}(1) = \mathbf{s}$
if $0 < t_1 < t_2$ then $\mathbf{p}(t_1)$ is closer to $\mathbf{e}$ than $\mathbf{p}(t_2)$
if $t < 0$ then $\mathbf{p}(t)$ is behind $\mathbf{e}$

Camera coordinate system

All the rays start from the origin of an orthonormal coordinate frame known as the camera/eye coordinate system, in this frame the camera is looking along the negative $\mathbf{w}$ axis

camera

The coordinate system is built from

the viewpoint $\mathbf{e}$ which is at the origin of the camera coordinate system
the view direction which is $\mathbf{-w}$
the up vector which is used to construct a basis that has $\mathbf{v}$ and $\mathbf{w}$ in the plane defined by the view direction and the up vector
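One common way to build the basis from these inputs is with two cross products. The sketch below is an assumption about the construction (the notes only list the inputs): $\mathbf{w}$ opposes the view direction, $\mathbf{u}$ is perpendicular to the up vector and $\mathbf{w}$, and $\mathbf{v}$ completes the right-handed frame; `camera_basis` is a hypothetical name.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def camera_basis(view_direction, up):
    """Return the orthonormal vectors (u, v, w), the up vector can't be parallel to the view direction."""
    w = normalize(tuple(-c for c in view_direction))  # the camera looks along -w
    u = normalize(cross(up, w))                       # perpendicular to up and w
    v = cross(w, u)                                   # lies in the plane of w and up
    return u, v, w


print(camera_basis((0, 0, -1), (0, 1, 0)))
# A tilted up vector gives the same frame since only its component perpendicular to w matters.
print(camera_basis((0, 0, -1), (0, 1, 1)))
```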

Ray generation

Pixel coordinates

The image dimensions are defined with four numbers

$l, r$, the position of the left and right edges
$t, b$, the position of the top and bottom edges

Note that the coordinates are expressed in the camera coordinate frame defined in a plane parallel to the $w=0$ plane (the $w=0$ plane is defined by the point $\mathbf{e}$ and the vectors $\mathbf{u}$ and $\mathbf{v}$)

The image has to be fitted within a rectangle of $n_x \times n_y$ pixels, for example the pixel $(0,0)$ has the position $(l + 0.5 \tfrac{r - l}{n_x}, b + 0.5 \tfrac{t - b}{n_y})$, the half-pixel offset comes from the way a pixel is defined (see rendering), in general a pixel with coordinates $(x, y)$ will have the position

$$ \begin{align*} u = l + (x + 0.5) \frac{r - l}{n_x} \\ v = b + (y + 0.5) \frac{t - b}{n_y} \end{align*} $$

Orthographic view

For an orthographic view all the rays will have the direction $-\mathbf{w}$, there isn’t a particular viewpoint however we can define all the rays to be emitted from the $w=0$ plane using the pixel’s image-plane position as the ray’s starting point

orthographic view

$$ \begin{align*} \mathbf{ray_{direction}} &= -\mathbf{w} \\ \mathbf{ray_{origin}} &= \mathbf{e} + u \mathbf{u} + v \mathbf{v} \end{align*} $$

Perspective view

For a perspective view all the rays will have the same origin $\mathbf{e}$ but the image-plane is not located at $w=0$ but at some distance $d$ in the $-\mathbf{w}$ direction, this time each ray will have a varying direction based on the pixel’s image-plane position with respect to $\mathbf{e}$

perspective view

$$ \begin{align*} \mathbf{ray_{direction}} &= -d \mathbf{w} + u \mathbf{u} + v \mathbf{v} \\ \mathbf{ray_{origin}} &= \mathbf{e} \end{align*} $$
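Putting the pixel position and both kinds of rays together, a minimal sketch could look like the following. The function names and the example values (a 4x2 image, the standard axes as camera basis) are assumptions made for the illustration.

```python
def pixel_to_uv(x, y, nx, ny, l, r, b, t):
    """Image-plane position of the center of the pixel (x, y)."""
    u = l + (x + 0.5) * (r - l) / nx
    v = b + (y + 0.5) * (t - b) / ny
    return u, v


def orthographic_ray(e, basis, u, v):
    """Ray that starts at the pixel position and travels along -w."""
    bu, bv, bw = basis
    origin = tuple(e[i] + u * bu[i] + v * bv[i] for i in range(3))
    direction = tuple(-c for c in bw)
    return origin, direction


def perspective_ray(e, basis, u, v, d):
    """Ray that starts at e and goes through the pixel on the image plane at distance d."""
    bu, bv, bw = basis
    direction = tuple(-d * bw[i] + u * bu[i] + v * bv[i] for i in range(3))
    return e, direction


e = (0.0, 0.0, 0.0)
basis = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))  # u, v, w
u, v = pixel_to_uv(0, 0, nx=4, ny=2, l=-2.0, r=2.0, b=-1.0, t=1.0)
print(u, v)
print(orthographic_ray(e, basis, u, v))
print(perspective_ray(e, basis, u, v, d=1.0))
```

The perspective direction isn't normalized here, the ray equation works with any non-zero direction.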

Ray intersection

Once a ray in the form $\mathbf{e} + t\mathbf{d}$ is generated we find the first intersection with an object where $t > 0$, whenever there are many objects that intersect a ray the intersection point with the lowest $t$ is returned

The following pseudocode tests for “hits”

ray = e + td
t = infinity
hit object = none
for each `object` in the scene
  if `object` is hit by `ray` and 0 < `ray's t` < `t`
    hit object = `object`
    t = `ray's t`
return hit object and t (there's a hit only if t < infinity)
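The same loop can be written as runnable code once a concrete object type is chosen. The sketch below assumes a scene made only of spheres stored as dictionaries (a choice made for the illustration, the notes don't fix a representation) and uses the standard quadratic solution of $\norm{\mathbf{e} + t\mathbf{d} - \mathbf{c}}^2 = r^2$ for the hit test.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hit_sphere(center, radius, e, d):
    """Smallest t > 0 where e + t*d meets the sphere, or None if there is none."""
    oc = sub(e, center)
    a = dot(d, d)
    half_b = dot(oc, d)
    c = dot(oc, oc) - radius * radius
    disc = half_b * half_b - a * c
    if disc < 0:
        return None                      # the ray misses the sphere
    root = math.sqrt(disc)
    for t in ((-half_b - root) / a, (-half_b + root) / a):
        if t > 0:
            return t                     # nearest intersection in front of e
    return None                          # the sphere is behind the ray origin


def closest_hit(spheres, e, d):
    """Return (sphere, t) for the closest hit or (None, inf) when nothing is hit."""
    t_min, hit = math.inf, None
    for sphere in spheres:
        t = hit_sphere(sphere["center"], sphere["radius"], e, d)
        if t is not None and t < t_min:
            hit, t_min = sphere, t
    return hit, t_min


scene = [
    {"name": "far", "center": (0, 0, -10), "radius": 2},
    {"name": "near", "center": (0, 0, -5), "radius": 1},
    {"name": "behind", "center": (0, 0, 5), "radius": 1},
]
hit, t = closest_hit(scene, (0, 0, 0), (0, 0, -1))
print(hit["name"], t)
```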

Shading

Once the visible surface is known the next step is to compute the value of the pixel using a shading model, which can be made out of simple heuristics or elaborate numeric computations

A shading model is designed to capture the process of light reflection on a surface, the important variables in this process are

$\mathbf{p}$ (intersection point) - the intersection point between a surface and a ray
$\mathbf{l}$ (light direction) - a unit vector pointing from the surface towards a light source, computed by normalizing the vector between the intersection point $\mathbf{p}$ and the light source position $\mathbf{l_s}$

$$ \mathbf{l} = \frac{\mathbf{l_s - p}}{\norm{\mathbf{l_s - p}}} $$

$\mathbf{v}$ (view direction) - a unit vector pointing from the surface towards the place the ray is emitted from, it’s computed by normalizing the vector between the intersection point $\mathbf{p}$ and the ray origin $\mathbf{ray_{origin}}$

$$ \mathbf{v} = \frac{\mathbf{ray_{origin} - p}}{\norm{\mathbf{ray_{origin} - p}}} \quad \text{or} \quad \mathbf{v} = -\mathbf{d} $$

$\mathbf{n}$ (surface normal) - a unit vector perpendicular to the surface at the point where the reflection is taking place
other characteristics of the light source and the surface depending on the shading model

Lambertian shading

One of the simplest shading models discovered by Lambert in the 18th century, the amount of energy from a light source that falls on a surface depends on the angle of the surface to the light

lambert

A surface directly facing the light receives maximum illumination
A surface tangent to the light direction receives no illumination
A surface facing away from the light receives no illumination

Thus the illumination is proportional to the cosine of the angle between $\mathbf{n}$ and $\mathbf{l}$ i.e. $\mathbf{n \cdot l} = \cos{\theta}$, the color of the pixel is then

$$ L = k_d \cdot I \cdot max(0, \mathbf{n \cdot l}) $$

Where

$k_d$ is the diffuse coefficient, a characteristic of the surface
$I$ is the intensity of the light source

Additional notes of this model

The model is view independent
The color of the surface appears to have a very matte, chalky appearance

Blinn-Phong shading

Many surfaces show some degree of highlights (shininess) or specular reflections that appear to move as the viewpoint changes, the idea is to produce reflections when $\mathbf{v}$ and $\mathbf{l}$ are positioned symmetrically across the surface normal

blinn phong

the half vector $\mathbf{h}$ is a unit vector that goes through the bisector of the angle between $\mathbf{v}$ and $\mathbf{l}$

$$ \mathbf{h} = \frac{\mathbf{v + l}}{\norm{\mathbf{v + l}}} $$

Also

if $\mathbf{h}$ is near $\mathbf{n}$ then the specular component should be bright, if it’s far away it should be dim, therefore the illumination is proportional to the cosine of the angle between $\mathbf{n}$ and $\mathbf{h}$ i.e. $\mathbf{n \cdot h} = \cos {\theta}$
the specular component decreases exponentially when $\mathbf{h}$ is far away from $\mathbf{n}$, therefore the result is taken to the $p$ power, $p > 1$ to make it decrease faster

The color of the pixel is then

$$ L = k_d \cdot I \cdot max(0, \mathbf{n \cdot l}) + k_s \cdot I \cdot max(0, \mathbf{n \cdot h})^p $$

Where

$k_s$ is the specular coefficient, a characteristic of the surface
$I$ is the intensity of the light source
$p$ is a variable that controls how fast the result decreases

Note that the color of the pixel is the overall contribution of both the Lambertian shading model and the Blinn-Phong shading model

Ambient shading

Surfaces that receive no illumination are rendered completely black, to avoid this a constant component is added to the shading model, the color depends entirely on the object hit with no dependence on the surface geometry

$$ L = k_a \cdot I_a + k_d \cdot I \cdot max(0, \mathbf{n \cdot l}) + k_s \cdot I \cdot max(0, \mathbf{n \cdot h})^p $$

Where

$k_a$ is the surface ambient coefficient
$I_a$ is the ambient light intensity
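The full model (ambient + Lambertian + Blinn-Phong) can be sketched for a single light and a single color channel as follows. The function signature is an assumption made for the example, `p_exp` stands for the exponent $p$, and the zero-length guard in `normalize` is only there so that the degenerate case $\mathbf{v} = -\mathbf{l}$ doesn't divide by zero.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v) if length > 0 else v


def shade(p, n, light_pos, ray_origin, k_a, k_d, k_s, I_a, I, p_exp):
    """Intensity at the surface point p with unit normal n."""
    l = normalize(sub(light_pos, p))     # light direction
    v = normalize(sub(ray_origin, p))    # view direction
    h = normalize(add(v, l))             # half vector
    ambient = k_a * I_a
    diffuse = k_d * I * max(0.0, dot(n, l))
    specular = k_s * I * max(0.0, dot(n, h)) ** p_exp
    return ambient + diffuse + specular


surface = dict(k_a=0.1, k_d=0.5, k_s=0.25, I_a=1.0, I=1.0, p_exp=32)
# Light and eye right above the surface: every term contributes.
print(shade((0, 0, 0), (0.0, 0.0, 1.0), (0, 0, 5), (0, 0, 3), **surface))
# Light below the surface: only the ambient term is left.
print(shade((0, 0, 0), (0.0, 0.0, 1.0), (0, 0, -5), (0, 0, 3), **surface))
```

For a color image the same function would be evaluated once per channel with per-channel coefficients.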

Rendering

Fri, 26 Feb 2016 16:59:48 +0000

An image can be abstracted as a function

$$ I(x,y): R \rightarrow V $$

Where $R \subset \mathbb{R}^2$ is a rectangular area and $V$ is a set with the possible pixel values, the following are examples of the set $V$

$V = \mathbb{R}^+$ (non-negative reals) for grayscale images, each pixel represents only brightness and no color
$V = (\mathbb{R}^+)^3$ (combinations of 3 sets of non-negative reals), which is a color image with red/green/blue values for each pixel

Pixels

A pixel from a camera or scanner is a measurement of the average color of the image in the surrounding area near the pixel

If an image has $n_x$ columns and $n_y$ rows a common convention is to count rows and columns from the bottom left, the bottom left pixel is $(0,0)$ and the top-right is pixel $(n_x - 1, n_y - 1)$

Note that because of the definition given to a pixel the coordinate $(0,0)$ is mapped to the center of the pixel $(0,0)$, therefore half a pixel will extend towards both the $-\mathbf{x}$-axis and the $-\mathbf{y}$-axis

pixel coordinates

So the domain of a $n_x \times n_y$ image is

$$ R = [-0.5, n_x - 0.5] \times [-0.5, n_y - 0.5] $$

Pixel values

The value of a pixel depends on the precision and range of value needed, for example high dynamic range (HDR) images store floating-point numbers allowing a wide range of values, low dynamic range (LDR) images are instead stored with integers, the following pixel-values are used in a variety of applications

1-bit grayscale per pixel - images where intermediate grays are not needed e.g. text
8-bit grayscale per pixel - images with intermediate grays, it can store a total of 256 gray values e.g. a grayscale photo
8-bit red, green and blue (RGB), 24 bits per pixel - full color images that allow nearly 16.7 million possible values, e.g. consumer photographs, web and email applications
12- to 14-bit RGB, 36-42 bits per pixel - raw camera images for professional photography
16-bit half precision RGB, 48 bits per pixel - HDR images used in real time rendering
32-bit floating-point RGB, 96 bits per pixel - HDR images for software rendering
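The number of values each integer format can store follows directly from the bit depth. A small sketch, where `quantize` is a hypothetical helper that maps a floating-point intensity in $[0, 1]$ to an integer code (real image formats may round differently or apply gamma first):

```python
def quantize(value, bits=8):
    """Map a float in [0, 1] to an integer code with the given bit depth."""
    levels = 2 ** bits
    clamped = min(max(value, 0.0), 1.0)
    return min(int(clamped * levels), levels - 1)

gray_levels = 2 ** 8         # 256 gray values for 8-bit grayscale
rgb_colors = 2 ** (3 * 8)    # 16,777,216 colors for 8-bit RGB
```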

Transformation matrix for projection of 3D objects into a 2D plane (projection transform)

Sun, 14 Feb 2016 12:18:26 +0000

The canonical view volume is a cube with its extreme points at $[-1, -1, -1]$ and $[1, 1, 1]$. Coordinates in this view volume are called normalized device coordinates (NDC), the objective of this step is to build a transformation matrix so that a region of space we want to render called the view volume is mapped to the canonical view volume

$$ \mathbf{v}_{ndc} = \mathbf{M}_{proj} \mathbf{v}_{view} $$

Some points expressed in view space won’t be part of the view volume and will be discarded after the transformation, this process is called clipping (we only need to check if any coordinate of a point is outside the range $[-1, 1]$ to discard it)

Later it’ll be seen that the perspective transformation implies a division, and a neat trick is to use projective geometry to defer it. Any point that has the form $(\alpha x, \alpha y, \alpha z, 1)$ can be represented as $(x, y, z, \tfrac{1}{\alpha})$ in homogeneous coordinates, so we can introduce an intermediate step which transforms the points to clip coordinates and then to normalized device coordinates by dividing by the $w$-coordinate, which is the same as multiplying by $\tfrac{1}{1/\alpha} = \alpha$

$$ \begin{align*} \mathbf{v}_{clip} = \mathbf{M}_{proj} \mathbf{v}_{view} \\ \mathbf{v}_{ndc} = \alpha \mathbf{v}_{clip} \end{align*} $$
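The two steps after the matrix multiplication, the division by $w$ and the clipping test, can be sketched as follows. The function names are hypothetical, and clipping is done here on NDC for simplicity; a real pipeline such as OpenGL clips in clip space before dividing.

```python
def clip_to_ndc(v_clip):
    """Divide x, y and z by w to go from clip coordinates to NDC."""
    x, y, z, w = v_clip
    return [x / w, y / w, z / w]

def is_clipped(v_ndc):
    """True when any NDC coordinate falls outside [-1, 1]."""
    return any(c < -1.0 or c > 1.0 for c in v_ndc)

# A point stored as (x, y, z, 1/alpha) ends up scaled by alpha.
alpha = 2.0
ndc = clip_to_ndc([0.1, 0.2, 0.3, 1 / alpha])
```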

Orthographic projection

An orthographic projection matrix is built with 6 parameters

left, right: planes in the $x$-axis
bottom, top: planes in the $y$-axis
near, far: planes in the $z$-axis

These parameters bound the view volume which is an axis-aligned bounding box

Orthographic projection

Since the mapping of the range $[l, r]$ to the range $[-1, 1]$ is linear we can use the equation of the line $y = mx + b$ and find the values of $m$ and $b$. However, we can get the same equation more intuitively in two steps: a nested function $g(x)$ maps the input range $[l, r]$ to $[0, 1]$, so that $g(l) = 0$ and $g(r) = 1$, and $f(x)$ stretches $[0, 1]$ to $[-1, 1]$, so that $f(l) = -1$ and $f(r) = 1$. Then $f(x)$ has the form

$$ \begin{align} f(x) &= -1 + 2 \; g(x) \\ g(x) &= \frac{x - l}{r - l} \end{align} $$

Finally $f(x)$ has the form

$$ \begin{align} f(x) &= -1 + 2 \frac{x - l}{r - l} \nonumber \\ &= \frac{l - r}{r - l} + \frac{2}{r - l}x - \frac{2l}{r - l} \nonumber \\ &= \frac{2}{r - l}x + \frac{-l - r}{r - l} \nonumber \\ &= \frac{2}{r - l}x - \frac{r + l}{r - l} \label{linear-mapping} \end{align} $$
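The linear mapping is easy to check numerically. A small sketch, where `remap` is a hypothetical name for $f(x)$ and `lo`, `hi` stand for the bounds of the input range:

```python
def remap(x, lo, hi):
    """Map x from [lo, hi] to [-1, 1]: f(x) = 2x/(hi - lo) - (hi + lo)/(hi - lo)."""
    return 2.0 / (hi - lo) * x - (hi + lo) / (hi - lo)
```

The lower bound maps to $-1$, the upper bound to $1$ and the midpoint of the range to $0$, e.g. with `lo = -3` and `hi = 5` the midpoint is `remap(1, -3, 5)`.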

We can adapt \eqref{linear-mapping} to have a similar form for the y-coordinate using $t$ and $b$. These equations are transformations from view space to clip space:

$$ x_{clip} = \frac{2}{r - l}x_{view} - \frac{r + l}{r - l} $$

$$ y_{clip} = \frac{2}{t - b}y_{view} - \frac{t + b}{t - b} $$

The $z_{clip}$ value will be different from the ones above since we’re mapping $[-n, -f] \Rightarrow [-1, 1]$

$$ \begin{align*} z_{clip} &= \frac{2}{-f - (-n)}z_{view} - \frac{-f + (-n)}{-f - (-n)} \\ &= \frac{2}{-f + n}z_{view} - \frac{-f - n}{-f + n} \\ &= -\frac{2}{f - n}z_{view} + \frac{-f - n}{f - n} \\ &= -\frac{2}{f - n}z_{view} - \frac{f + n}{f - n} \end{align*} $$

The $w$ is left untouched since the projection doesn’t imply division, the general orthographic projection matrix is

$$ \begin{equation} \label{orthographic-projection} \mathbf{M}_{proj} = \begin{bmatrix} \tfrac{2}{r - l} & 0 & 0 & -\tfrac{r + l}{r - l} \\ 0 & \tfrac{2}{t - b} & 0 & -\tfrac{t + b}{t - b} \\ 0 & 0 & -\tfrac{2}{f - n} & -\tfrac{f + n}{f - n} \\ 0 & 0 & 0 & 1 \end{bmatrix} \end{equation} $$

The transformation matrix from view space to clip space is

$$ \begin{align*} \mathbf{v}_{clip} &= \mathbf{M}_{proj} \mathbf{v}_{view} \\ \begin{bmatrix} x_{clip} \\ y_{clip} \\ z_{clip} \\ w_{clip} \end{bmatrix} &= \begin{bmatrix} \tfrac{2}{r - l} & 0 & 0 & -\tfrac{r + l}{r - l} \\ 0 & \tfrac{2}{t - b} & 0 & -\tfrac{t + b}{t - b} \\ 0 & 0 & -\tfrac{2}{f - n} & -\tfrac{f + n}{f - n} \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{view} \\ y_{view} \\ z_{view} \\ w_{view} \end{bmatrix} \end{align*} $$

Finally note that $w_{clip}$ will always have the value of $w_{view} = 1$, therefore the transformation to NDC will not modify the coordinates

$$ \begin{bmatrix} x_{ndc} \\ y_{ndc} \\ z_{ndc} \end{bmatrix} = \begin{bmatrix} x_{clip}/1 \\ y_{clip}/1 \\ z_{clip}/1 \end{bmatrix} $$
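As a sanity check, the orthographic matrix should send the corner $[l, b, -n]$ of the view volume to $[-1, -1, -1]$ and the corner $[r, t, -f]$ to $[1, 1, 1]$. A minimal sketch with plain nested lists; `orthographic` and `transform` are names chosen for this note, not a library API:

```python
def orthographic(l, r, b, t, n, f):
    """Orthographic projection matrix in row-major order."""
    return [
        [2 / (r - l), 0, 0, -(r + l) / (r - l)],
        [0, 2 / (t - b), 0, -(t + b) / (t - b)],
        [0, 0, -2 / (f - n), -(f + n) / (f - n)],
        [0, 0, 0, 1],
    ]

def transform(m, v):
    """Multiply a 4x4 matrix by a homogeneous column vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

M = orthographic(l=-2, r=4, b=-1, t=3, n=1, f=11)
near_corner = transform(M, [-2, -1, -1, 1])    # bottom left near corner
far_corner = transform(M, [4, 3, -11, 1])      # top right far corner
```

In both results $w$ stays at 1, so no division is needed to reach NDC.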

Building the matrix using combined transformations

A simpler way to think about this orthographic projection transformation is by splitting it in three steps

translation of the bottom left near corner to the origin i.e. $[l, b, -n] \rightarrow [0, 0, 0]$
scale it to be a 2-unit length cube
translation of the bottom left corner from the origin i.e. $[0, 0, 0] \rightarrow [-1, -1, -1]$

$$ \begin{align*} \mathbf{M}_{proj} &= \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} \tfrac{2}{r - l} & 0 & 0 & 0 \\ 0 & \tfrac{2}{t - b} & 0 & 0 \\ 0 & 0 & -\tfrac{2}{f - n} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -l \\ 0 & 1 & 0 & -b \\ 0 & 0 & 1 & n \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \\ \ &= \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} \tfrac{2}{r - l} & 0 & 0 & -\frac{2l}{r - l} \\ 0 & \tfrac{2}{t - b} & 0 & -\frac{2b}{t - b} \\ 0 & 0 & -\tfrac{2}{f - n} & -\frac{2n}{f - n} \\ 0 & 0 & 0 & 1 \end{bmatrix} \\ \ &= \begin{bmatrix} \tfrac{2}{r - l} & 0 & 0 & -\frac{2l}{r - l} - 1 \\ 0 & \tfrac{2}{t - b} & 0 & -\frac{2b}{t - b} - 1 \\ 0 & 0 & -\tfrac{2}{f - n} & -\frac{2n}{f - n} - 1 \\ 0 & 0 & 0 & 1 \end{bmatrix} \\ \ &= \begin{bmatrix} \tfrac{2}{r - l} & 0 & 0 & -\tfrac{r + l}{r - l} \\ 0 & \tfrac{2}{t - b} & 0 & -\tfrac{t + b}{t - b} \\ 0 & 0 & -\tfrac{2}{f - n} & -\tfrac{f + n}{f - n} \\ 0 & 0 & 0 & 1 \end{bmatrix} \ \end{align*} $$
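The same product can be verified numerically by composing the three matrices. A sketch under the same conventions as the derivation (the helper names are made up for this note):

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(dx, dy, dz):
    return [[1, 0, 0, dx], [0, 1, 0, dy], [0, 0, 1, dz], [0, 0, 0, 1]]

def scale(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]

def orthographic_composed(l, r, b, t, n, f):
    to_origin = translation(-l, -b, n)                 # [l, b, -n] -> origin
    to_two_unit_cube = scale(2 / (r - l), 2 / (t - b), -2 / (f - n))
    to_canonical = translation(-1, -1, -1)             # origin -> [-1, -1, -1]
    # The matrix applied first goes on the right.
    return matmul(to_canonical, matmul(to_two_unit_cube, to_origin))

M = orthographic_composed(l=-2, r=4, b=-1, t=3, n=1, f=11)
```

For these bounds the last column of `M` is $-\tfrac{r + l}{r - l} = -\tfrac{1}{3}$, $-\tfrac{t + b}{t - b} = -\tfrac{1}{2}$ and $-\tfrac{f + n}{f - n} = -\tfrac{6}{5}$, matching the closed form.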

Perspective projection

Projective geometry concepts are used in this type of projection, particularly the fact that objects away from the point of view appear smaller after projection, this type of projection mimics how we perceive objects in reality

A perspective projection matrix is built with 6 parameters, left, right, bottom, top, near, far

left, right: $x$-axis bounds for the near plane
bottom, top: $y$-axis bounds for the near plane
near, far: planes in the $z$-axis, the intersection point of the line passing through the origin parallel to the vector $[l,b,-n]$ and the far plane is the bottom left far extreme of the view volume, a similar logic is used to find all the extremes in the far plane of the view volume

These parameters define a truncated pyramid also called a frustum

Perspective projection

General perspective projection matrix

The mapping of the range $[l,r]$ to the range $[-1,1]$ can be split into two steps

Project all the points to the near plane, this way all the $x$- and $y$-coordinates will be inside the range $[l,r] \times [b,t]$
Map all the values in the range $[l,r]$ and $[b,t]$ to the range $[-1, 1]$

Top view of the frustum

Side view of the frustum

Let $\mathbf{v}_{view}$ be a vector in view space which is going to be transformed to clip space, by similar triangles we see that the value of $x_p$ and $y_p$ (the coordinates projected to the near plane) is

$$ \begin{align} \label{projection-near} \frac{x_p}{x_{view}} &= \frac{-n}{z_{view}} \quad \quad x_p = \frac{n \cdot x_{view}}{-z_{view}} \\ \frac{y_p}{y_{view}} &= \frac{-n}{z_{view}} \quad \quad y_p = \frac{n \cdot y_{view}}{-z_{view}} \end{align} $$

Note that both quantities are inversely proportional to $-z_{view}$, what we can do is manipulate the coordinate so that it has a common denominator

$$ \begin{bmatrix} \tfrac{n \cdot x_{view}}{-z_{view}} & \tfrac{n \cdot y_{view}}{-z_{view}} & n \tfrac{z_{view}}{-z_{view}} \end{bmatrix}^T = \frac{ \begin{bmatrix} n \cdot x_{view} & n \cdot y_{view} & n \cdot z_{view} \end{bmatrix}^T }{-z_{view}} $$

The point in homogeneous coordinates is

$$ \begin{bmatrix} n \cdot x_{view} & n \cdot y_{view} & n \cdot z_{view} & -z_{view} \end{bmatrix}^T $$

OpenGL will then project any 4D homogeneous coordinate to the 3D hyperplane $w=1$ by dividing each of the coordinates by $w$, note that this division operation isn’t done by the application but by OpenGL itself on a further step on the rendering pipeline

We can take advantage of this process and use $-z_{view}$ as our $w$, with this in mind we can construct a transformation matrix so that transformed points have $w = -z_{view}$

$$ \begin{align} \begin{bmatrix} x_{clip} \\ y_{clip} \\ z_{clip} \\ w_{clip} \end{bmatrix} &= \begin{bmatrix} . & . & . & . \\ . & . & . & . \\ . & . & . & . \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_{view} \\ y_{view} \\ z_{view} \\ w_{view} \end{bmatrix} \label{pm1} \\ \therefore w_{clip} &= -z_{view} \nonumber \end{align} $$

Where $x_{clip}, y_{clip}, z_{clip}, w_{clip}$ are expressed in terms of the clip space, when each coordinate is divided by $w_{clip}$ we’ll have NDC

$$ \begin{bmatrix} x_{ndc} \\ y_{ndc} \\ z_{ndc} \end{bmatrix} = \begin{bmatrix} x_{clip}/w_{clip} \\ y_{clip}/w_{clip} \\ z_{clip}/w_{clip} \end{bmatrix} $$

Next $x_p$ and $y_p$ are mapped linearly to $[-1,1]$, we can use the function to perform linear mapping \eqref{linear-mapping}

$$ \begin{align} x_{ndc} = \frac{2}{r - l}x_p - \frac{r + l}{r - l} \nonumber \\ y_{ndc} = \frac{2}{t - b}y_p - \frac{t + b}{t - b} \label{ndc-near} \end{align} $$

Next we substitute the values of $x_p$ \eqref{projection-near} in $x_{ndc}$ \eqref{ndc-near}

$$ \begin{align*} x_{ndc} &= \frac{2}{r - l}\frac{n \cdot x_{view}}{-z_{view}} - \frac{r + l}{r - l} \\ &= \frac{2n}{r - l} \frac{x_{view}}{-z_{view}} - \frac{r + l}{r - l} \frac{-z_{view}}{-z_{view}} \\ &= \left ( \frac{2n}{r - l} x_{view} + \frac{r + l}{r - l} z_{view} \right ) \big / -z_{view} \end{align*} $$

Note that the second fraction is manipulated so that it’s also divisible by $-z_{view}$, also note that the quantity in the parenthesis is in clip space coordinates: $x_{clip}$

$$ x_{clip} = \frac{2n}{r - l} x_{view} + \frac{r + l}{r - l} z_{view} $$

Similarly the value of $y_{clip}$ is

$$ y_{clip} = \frac{2n}{t - b} y_{view} + \frac{t + b}{t - b} z_{view} $$

Then the transformation matrix seen in \eqref{pm1} is now

$$ \begin{equation} \label{pm2} \begin{bmatrix} x_{clip} \\ y_{clip} \\ z_{clip} \\ w_{clip} \end{bmatrix} = \begin{bmatrix} \tfrac{2n}{r - l} & 0 & \tfrac{r + l}{r - l} & 0 \\ 0 & \tfrac{2n}{t - b} & \tfrac{t + b}{t - b} & 0 \\ . & . & . & . \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_{view} \\ y_{view} \\ z_{view} \\ w_{view} \end{bmatrix} \end{equation} $$

Next we need to find the value of $z_{clip}$. Note that the $z$ value projected to the near plane is always the constant $-n$, because $n \cdot z_{view}$ is divided by $-z_{view}$. That value is useless for depth: we need $z_{ndc}$ to be unique for each $z_{view}$ for the clipping and depth test, plus we should be able to unproject it (through an inverse transformation)

Since $z_{ndc}$ doesn’t depend on $x_{view}$ or $y_{view}$ we can borrow the $w$-coordinate to find the relationship between $z_{ndc}$ and $z_{view}$, with that in mind we can make the third row of \eqref{pm2} equal to

$$ \begin{equation} \label{pm3} \begin{bmatrix} x_{clip} \\ y_{clip} \\ z_{clip} \\ w_{clip} \end{bmatrix} = \begin{bmatrix} \tfrac{2n}{r - l} & 0 & \tfrac{r + l}{r - l} & 0 \\ 0 & \tfrac{2n}{t - b} & \tfrac{t + b}{t - b} & 0 \\ 0 & 0 & A & B \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_{view} \\ y_{view} \\ z_{view} \\ w_{view} \end{bmatrix} \end{equation} $$

Then $z_{ndc}$ has the form

$$ z_{ndc} = \frac{z_{clip}}{w_{clip}} = \frac{Az_{view} + Bw_{view}}{-z_{view}} $$

Since $w_{view}=1$ in view space

$$ z_{ndc} = \frac{Az_{view} + B}{-z_{view}} $$

Note that the value is not linear but it needs to be mapped to $[-n, -f] \mapsto [-1,1]$, substituting the desired output range $[-1, 1]$ as $z_{ndc}$ we have a system of equations

$$ \begin{cases} -1 &= \frac{-An + B}{n} \\ 1 &= \frac{-Af + B}{f} \end{cases} \rightarrow \begin{cases} -An + B &= -n \\ -Af + B &= f \end{cases} $$

Subtracting the second equation from the first

$$ \begin{align*} -An + B + Af - B &= -n - f \\ A (f - n) &= -n - f \\ A &= -\frac{f + n}{f - n} \end{align*} $$

Solving for $B$ given $A$

$$ \frac{f + n}{f - n}n + B = -n $$

$$ \begin{align*} B &= -n - \frac{f + n}{f - n}n \\ &= \frac{-fn + n^2 - fn - n^2}{f - n} \\ &= \frac{-2fn}{f - n} \\ \end{align*} $$

Substituting the values of $A$ and $B$ in \eqref{pm3} we have the general perspective projection matrix

$$ \begin{equation} \label{pm4} \mathbf{M}_{proj} = \begin{bmatrix} \tfrac{2n}{r - l} & 0 & \tfrac{r + l}{r - l} & 0 \\ 0 & \tfrac{2n}{t - b} & \tfrac{t + b}{t - b} & 0 \\ 0 & 0 & -\tfrac{f + n}{f - n} & \tfrac{-2fn}{f - n} \\ 0 & 0 & -1 & 0 \end{bmatrix} \end{equation} $$
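A quick numerical check of the general perspective matrix: a corner of the near plane should land at $z_{ndc} = -1$ and a corner of the far plane at $z_{ndc} = 1$ after the division by $w_{clip}$. The names `frustum` and `project` below are chosen for this sketch (the first mirrors the arguments of `glFrustum`, but this is not the OpenGL API itself):

```python
def frustum(l, r, b, t, n, f):
    """General perspective projection matrix in row-major order."""
    return [
        [2 * n / (r - l), 0, (r + l) / (r - l), 0],
        [0, 2 * n / (t - b), (t + b) / (t - b), 0],
        [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0, 0, -1, 0],
    ]

def project(m, v_view):
    """View space -> clip space -> NDC (perspective divide)."""
    clip = [sum(m[i][j] * v_view[j] for j in range(4)) for i in range(4)]
    w = clip[3]                      # equals -z_view
    return [c / w for c in clip[:3]]

M = frustum(l=-1, r=1, b=-1, t=1, n=1, f=10)
near_corner = project(M, [-1, -1, -1, 1])     # bottom left of the near plane
far_corner = project(M, [10, 10, -10, 1])     # top right of the far plane
halfway = project(M, [0, 0, -5.5, 1])         # midway between near and far
```

`halfway` has $z_{ndc} \approx 0.82$ rather than $0$, which shows that the depth mapping is not linear.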

Symmetric perspective projection matrix

If the viewing volume is symmetric i.e. $r = -l$ and $t = -b$ then some quantities can be simplified

$$ r + l = 0, \quad r - l = 2r \\ t + b = 0, \quad t - b = 2t $$

Then \eqref{pm4} becomes

$$ \begin{equation} \label{pm5} \mathbf{M}_{proj} = \begin{bmatrix} \tfrac{n}{r} & 0 & 0 & 0 \\ 0 & \tfrac{n}{t} & 0 & 0 \\ 0 & 0 & -\tfrac{f + n}{f - n} & \tfrac{-2fn}{f - n} \\ 0 & 0 & -1 & 0 \end{bmatrix} \end{equation} $$

Symmetric perspective projection matrix from field of view/aspect

gluPerspective receives instead of the $x$ and $y$ bounds two arguments

field of view ($fov$) which specifies the field of view angle in the $y$ direction
aspect ($aspect$) which is the aspect ratio that determines the field of view in the $x$ direction calculated as $\tfrac{x}{y}$, the value is commonly $\tfrac{screen\ width}{screen\ height}$

fov

We see that the value of $t$ (top) is

$$ \begin{align} \tan{ (fov/2) } &= \frac{t}{n} \\ \label{fov-t} t &= n \cdot \tan{ (fov/2) } \end{align} $$

We can find the value of $r$ (right) with the aspect ratio

$$ \begin{align} aspect &= \frac{2r}{2t} = \frac{r}{t} \\ r &= aspect \cdot t \\ \label{fov-r} &= aspect \cdot n \cdot \tan{(fov/2)} \end{align} $$

Substituting \eqref{fov-t} and \eqref{fov-r} in \eqref{pm5}

$$ \begin{equation} \label{pm6} \mathbf{M}_{proj} = \begin{bmatrix} \tfrac{1}{aspect \cdot \tan{ (fov/2) } } & 0 & 0 & 0 \\ 0 & \frac{1}{\tan{ (fov/2) }} & 0 & 0 \\ 0 & 0 & -\tfrac{f + n}{f - n} & \tfrac{-2fn}{f - n} \\ 0 & 0 & -1 & 0 \end{bmatrix} \end{equation} $$
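The field of view form can be sketched the same way. Here `perspective` is a hypothetical function that takes the vertical field of view in radians (`gluPerspective` itself takes degrees), and `symmetric_bounds` recovers $r$ and $t$ so the result can be compared with the symmetric matrix:

```python
import math

def perspective(fov, aspect, n, f):
    """Symmetric perspective matrix; fov is the vertical angle in radians."""
    cot = 1 / math.tan(fov / 2)
    return [
        [cot / aspect, 0, 0, 0],
        [0, cot, 0, 0],
        [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0, 0, -1, 0],
    ]

def symmetric_bounds(fov, aspect, n):
    """Right and top bounds of the near plane for the same frustum."""
    top = n * math.tan(fov / 2)
    right = aspect * top
    return right, top

M = perspective(fov=math.pi / 2, aspect=2.0, n=1.0, f=10.0)
right, top = symmetric_bounds(fov=math.pi / 2, aspect=2.0, n=1.0)
```

With a 90 degree field of view `top` equals `n`, so the entries $\tfrac{n}{r}$ and $\tfrac{n}{t}$ of the symmetric matrix are $0.5$ and $1$.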

Transformation matrix to transform 3D objects from World Space to View Space (View transform)

Sat, 13 Feb 2016 11:59:56 +0000

The objective of this step is to find a transformation matrix to transform points expressed in world space to view space, a camera can be imagined to exist from a known point of view that captures some objects of the space

$$ \mathbf{v}_{view} = \mathbf{M}_{view} \mathbf{v}_{wld} $$

The construction of the transformation matrix to transform points from world space to view space needs 3 parameters:

$\mathbf{camera}$ a point expressed in world space defining the location of the point of view, note that the $\mathbf{camera}$ is at the origin of the view space
$\mathbf{at}$ a point expressed in world space that the camera is aiming at
$\mathbf{up}$ denotes the upward orientation of the camera (typically coincides with the positive $y$-axis)

view transform

Note that the camera is looking down the negative $z$-axis of the view space, this is a convention rather than a rule since the projection matrix will be constructed in a way so that points in the $-z$-axis in view space are transformed to the range $[-1,1]$

Derivation of the view transform matrix

The process of transforming the vertices in the world space to view space is given by

Creation of a coordinate frame for the view space
Application of the appropriate translation for the camera location (world space -> upright space)
Transformation of the points in world space to camera space (upright space -> view space)

Creation of a coordinate frame for the view space

Given $\mathbf{camera}$, $\mathbf{at}$ and $\mathbf{up}$, the steps to compute the coordinate frame whose basis vectors are $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{w}$ are the following (note that since these are basis vectors they need to be unit vectors)

compute $\mathbf{w}$ trivially by normalizing the vector $\mathbf{camera - at}$

$$ \mathbf{w} = \frac{\mathbf{camera - at}}{\norm{\mathbf{camera - at}}} $$

next $\mathbf{u}$ can be computed with the cross product of $\mathbf{up}$ and $\mathbf{w}$ (in that order, so that the frame is right-handed), again the resulting vector must be normalized

$$ \mathbf{u} = \frac{\mathbf{up} \times \mathbf{w}}{\norm{ \mathbf{up} \times \mathbf{w} }} $$

finally $\mathbf{v}$ can be computed as

$$ \mathbf{v} = \mathbf{w} \times \mathbf{u} $$

Camera translation

The transformation matrix that moves all the points from world space to view’s upright space is

$$ \mathbf{T} = \begin{bmatrix} 1 & 0 & 0 & -camera_x \\ 0 & 1 & 0 & -camera_y \\ 0 & 0 & 1 & -camera_z \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

Transformation of the points from world space to view space

The basis vectors of the camera frame, encoded as the columns of a matrix, are

$$ \mathbf{M}_{wld \leftarrow view} = \begin{bmatrix} \mathbf{u}_{3 \times 1} & \mathbf{v}_{3 \times 1} & \mathbf{w}_{3 \times 1} \end{bmatrix} $$

Expressed in a 4x4 matrix

$$ \mathbf{M}_{wld \leftarrow view} = \begin{bmatrix} x_u & x_v & x_w & 0 \\ y_u & y_v & y_w & 0 \\ z_u & z_v & z_w & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

This matrix transforms points from view space to world space, therefore the matrix that does the opposite operation (transformation from world space to view space) is its inverse (the transpose is equivalent since the matrix is orthogonal)

$$ \mathbf{M}_{view \leftarrow wld} = \mathbf{M^{-1}}_{wld \leftarrow view} = \mathbf{M}^T_{wld \leftarrow view} = \begin{bmatrix} x_u & y_u & z_u & 0 \\ x_v & y_v & z_v & 0 \\ x_w & y_w & z_w & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

The view matrix

We can combine the translation and the rotation matrix in a single matrix called the view matrix which has the form

$$ \begin{align*} \mathbf{M}_{view} &= \mathbf{M}_{view \leftarrow wld} \mathbf{T} \\ &= \begin{bmatrix} x_u & y_u & z_u & 0 \\ x_v & y_v & z_v & 0 \\ x_w & y_w & z_w & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -camera_x \\ 0 & 1 & 0 & -camera_y \\ 0 & 0 & 1 & -camera_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \\ &= \begin{bmatrix} x_u & y_u & z_u & -\mathbf{camera \cdot u} \\ x_v & y_v & z_v & -\mathbf{camera \cdot v} \\ x_w & y_w & z_w & -\mathbf{camera \cdot w} \\ 0 & 0 & 0 & 1 \end{bmatrix} \end{align*} $$
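The whole construction fits in a short function. This is a sketch with a made-up name `look_at` (similar in spirit to `gluLookAt`, but not that API). It assumes $\mathbf{at}$ is a point, that $\mathbf{up}$ is not parallel to $\mathbf{camera - at}$, and it takes $\mathbf{u}$ as $\mathbf{up} \times \mathbf{w}$ so that the frame is right-handed with $\mathbf{w}$ pointing backwards:

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def look_at(camera, at, up):
    w = normalize(sub(camera, at))    # points backwards, away from the target
    u = normalize(cross(up, w))       # points to the right of the camera
    v = cross(w, u)                   # already unit length
    return [
        [u[0], u[1], u[2], -dot(camera, u)],
        [v[0], v[1], v[2], -dot(camera, v)],
        [w[0], w[1], w[2], -dot(camera, w)],
        [0, 0, 0, 1],
    ]

M_view = look_at(camera=[0, 0, 5], at=[0, 0, 0], up=[0, 1, 0])
# The target sits 5 units in front of the camera, i.e. at z = -5 in view space.
target_in_view = [sum(M_view[i][j] * c for j, c in enumerate([0, 0, 0, 1]))
                  for i in range(4)]
```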

Combining Matrix Transformations

Wed, 10 Feb 2016 21:13:17 +0000

This article is part 5 in the series about transformation matrices.

We can compose a series of transformations by multiplying the matrices that define them. For example, suppose we have an object in the world with arbitrary position and orientation that we want to render through a camera located in the same world, also with arbitrary position and orientation. To get the coordinates of the object relative to the camera we must transform the object from object space to world space (a transformation known as the model transform) with the matrix $\mathbf{M}_{world \leftarrow object}$, and then transform the vertices of the object from world space to view space (a transformation known as the view transform) with $\mathbf{M}_{view \leftarrow world}$

$$ \begin{align*} \mathbf{v}_{world} &= \mathbf{M}_{world \leftarrow object} \mathbf{v}_{object} \\ \mathbf{v}_{view} &= \mathbf{M}_{view \leftarrow world} \mathbf{v}_{world} \\ &= \mathbf{M}_{view \leftarrow world} \mathbf{M}_{world \leftarrow object} \mathbf{v}_{object} \end{align*} $$

Since matrix multiplication is associative we can group the transformation matrices and have a single matrix that transforms vertices of the object directly to view space

$$ \begin{align*} \mathbf{v}_{view} &= (\mathbf{M}_{view \leftarrow world} \mathbf{M}_{world \leftarrow object})\mathbf{v}_{object} \\ &= \mathbf{M}_{view \leftarrow object} \mathbf{v}_{object} \end{align*} $$

Now if we have two transformation matrices $\mathbf{M}$ and $\mathbf{N}$ and they are applied to some vector $\mathbf{v}$ in that respective order their product is

$$ \begin{align*} \mathbf{NM} = \begin{bmatrix} \mathbf{\cuv{s}} & \mathbf{\cuv{t}} & \mathbf{\cuv{u}} \end{bmatrix} \begin{bmatrix} \mathbf{\cuv{p}} & \mathbf{\cuv{q}} & \mathbf{\cuv{r}} \end{bmatrix} = \begin{bmatrix} p_x \mathbf{s} + p_y \mathbf{t} + p_z \mathbf{u} \\ q_x \mathbf{s} + q_y \mathbf{t} + q_z \mathbf{u} \\ r_x \mathbf{s} + r_y \mathbf{t} + r_z \mathbf{u} \\ \end{bmatrix}^T \end{align*} $$

We can see that the columns of the product $\mathbf{NM}$ are the result of transforming the basis vectors of $\mathbf{M}$ by the transformation matrix $\mathbf{N}$, so matrix-matrix multiplication encodes a transformation of basis vectors
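A small numerical check of this property, with the basis vectors stored as matrix columns. The matrices below are arbitrary examples (a scale applied first and a rotation applied second) and the helper names are made up for this note:

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def column(m, j):
    return [row[j] for row in m]

M = [[2, 0, 0], [0, 3, 0], [0, 0, 4]]     # applied first: a scale
N = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]    # applied second: 90 degree rotation about z
NM = mat_mul(N, M)

# Each basis vector of M, transformed by N, shows up as a column of NM.
columns_match = all(column(NM, j) == mat_vec(N, column(M, j)) for j in range(3))

# Applying M and then N to a vector equals applying NM once.
v = [1, 1, 1]
same_result = mat_vec(N, mat_vec(M, v)) == mat_vec(NM, v)
```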

Rotation followed by translation

Given the vector $\mathbf{v}$ let’s apply a rotation and a translation transform in that order

$$ \mathbf{v'} = \mathbf{TRv} $$

Let’s analyze the product $\mathbf{TR}$

$$ \mathbf{TR} = \begin{bmatrix} I_{3 \times 3} & T_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{bmatrix} \begin{bmatrix} R_{3 \times 3} & 0_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{bmatrix} = \begin{bmatrix} R_{3 \times 3} & T_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{bmatrix} $$

Which when multiplied by $\mathbf{v}$ results in

$$ \mathbf{v'} = \mathbf{TRv} = \begin{bmatrix} R_{3 \times 3} & T_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{bmatrix} \begin{bmatrix} \mathbf{v}_{3 \times 1} \\ 1 \end{bmatrix} = \begin{bmatrix} R_{3 \times 3} \mathbf{v}_{3 \times 1} + T_{3 \times 1} \\ 1 \end{bmatrix} $$

$\mathbf{v'}$ will have a compact form equal to

$$ \mathbf{v'} = \mathbf{TRv} = \mathbf{Rv} + T_{3 \times 1} $$

Translation followed by rotation

Given the vector $\mathbf{v}$ let’s apply a translation and a rotation transform in that order

$$ \mathbf{v'} = \mathbf{RTv} $$

Let’s analyze the product $\mathbf{RT}$

$$ \mathbf{RT} = \begin{bmatrix} R_{3 \times 3} & 0_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{bmatrix} \begin{bmatrix} I_{3 \times 3} & T_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{bmatrix} = \begin{bmatrix} R_{3 \times 3} & R_{3 \times 3} T_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{bmatrix} $$

Which when multiplied by $\mathbf{v}$ results in

$$ \mathbf{v'} = \mathbf{RTv} = \begin{bmatrix} R_{3 \times 3} & R_{3 \times 3} T_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{bmatrix} \begin{bmatrix} \mathbf{v}_{3 \times 1} \\ 1 \end{bmatrix} = \begin{bmatrix} R_{3 \times 3} \mathbf{v}_{3 \times 1} + R_{3 \times 3} T_{3 \times 1} \\ 1 \end{bmatrix} $$

$\mathbf{v'}$ will have a compact form equal to

$$ \mathbf{v'} = \mathbf{RTv} = \mathbf{Rv} + \mathbf{R}T_{3 \times 1} $$

Note that both the vector $\mathbf{v}$ and the translation vector are transformed by $\mathbf{R}$
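The difference between both orders is easy to see with concrete numbers. A sketch with a 90 degree rotation about $z$ and a translation of one unit along $x$ (both picked arbitrarily for the example):

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

R = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]   # rotation
T = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]    # translation
v = [1, 0, 0, 1]

rotate_then_translate = mat_vec(mat_mul(T, R), v)    # Rv + T
translate_then_rotate = mat_vec(mat_mul(R, T), v)    # Rv + RT
```

The first result is $[1, 1, 0, 1]$: the rotated point $[0, 1, 0]$ moved one unit along $x$. The second is $[0, 2, 0, 1]$ because the translation itself was rotated onto the $y$-axis.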

Transformations between coordinate systems

The following figure shows two coordinate systems, the one with the basis vectors $\mathbf{x}, \mathbf{y}$ and $\mathbf{z}$ is the canonical coordinate system, $\mathbf{u}, \mathbf{v}$ and $\mathbf{w}$ are the basis of a nested coordinate system expressed in terms of the canonical coordinate system

coordinate systems

The value of $\mathbf{p}$ expressed in the canonical coordinate system is

$$ \mathbf{p} = x_p \mathbf{x} + y_p \mathbf{y} + z_p \mathbf{z} $$

Similarly we can express $\mathbf{p}$ with the following equation

$$ \mathbf{p} = \mathbf{e} + u_p \mathbf{u} + v_p \mathbf{v} + w_p \mathbf{w} $$

Note that both equations express $\mathbf{p}$ in terms of the canonical coordinate system, we can express the same relationship using transformations matrices as a rotation followed by a translation

$$ \begin{bmatrix} x_p \\ y_p \\ z_p \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & x_e \\ 0 & 1 & 0 & y_e \\ 0 & 0 & 1 & z_e \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_u & x_v & x_w & 0 \\ y_u & y_v & y_w & 0 \\ z_u & z_v & z_w & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} u_p \\ v_p \\ w_p \\ 1 \end{bmatrix} \\ \begin{bmatrix} x_p \\ y_p \\ z_p \\ 1 \end{bmatrix} = \begin{bmatrix} x_u & x_v & x_w & x_e \\ y_u & y_v & y_w & y_e \\ z_u & z_v & z_w & z_e \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} u_p \\ v_p \\ w_p \\ 1 \end{bmatrix} $$

We can then introduce $\mathbf{p}_{uvw}$ which is the point $\mathbf{p}$ expressed in the nested coordinate system, similarly $\mathbf{p}_{xyz}$ is the same point expressed in canonical coordinate system

$$ \begin{equation} \label{frame-to-canonical} \mathbf{p}_{xyz} = \begin{bmatrix} \mathbf{u}_{3 \times 1} & \mathbf{v}_{3 \times 1} & \mathbf{w}_{3 \times 1} & \mathbf{e}_{3 \times 1} \\ 0 & 0 & 0 & 1 \end{bmatrix} \mathbf{p}_{uvw} \end{equation} $$

This is the frame-to-canonical transformation matrix for the $(u,v,w)$ coordinate space

The inverse transformation is given by a translation followed by a rotation

$$ \begin{bmatrix} u_p \\ v_p \\ w_p \\ 1 \end{bmatrix} = \begin{bmatrix} x_u & y_u & z_u & 0 \\ x_v & y_v & z_v & 0 \\ x_w & y_w & z_w & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -x_e \\ 0 & 1 & 0 & -y_e \\ 0 & 0 & 1 & -z_e \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_p \\ y_p \\ z_p \\ 1 \end{bmatrix} $$

Which is the same as finding the value of $\mathbf{p}_{uvw}$ in \eqref{frame-to-canonical}

$$ \mathbf{p}_{uvw} = \begin{bmatrix} \mathbf{u}_{3 \times 1} & \mathbf{v}_{3 \times 1} & \mathbf{w}_{3 \times 1} & \mathbf{e}_{3 \times 1} \\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1} \mathbf{p}_{xyz} $$

This is the canonical-to-frame transformation matrix for the $(u,v,w)$ coordinate space
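Both directions can be written without building the 4x4 matrices explicitly. A minimal sketch, assuming $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{w}$ are orthonormal (otherwise the transpose is not the inverse); the function names are made up for this note:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def frame_to_canonical(u, v, w, e, p_uvw):
    """Rotation followed by translation: p = e + u_p u + v_p v + w_p w."""
    return [e[i] + p_uvw[0] * u[i] + p_uvw[1] * v[i] + p_uvw[2] * w[i]
            for i in range(3)]

def canonical_to_frame(u, v, w, e, p_xyz):
    """Translation by -e followed by the transposed rotation."""
    d = [p_xyz[i] - e[i] for i in range(3)]
    return [dot(u, d), dot(v, d), dot(w, d)]

# A frame rotated 90 degrees about z with its origin at e.
u, v, w, e = [0, 1, 0], [-1, 0, 0], [0, 0, 1], [2, 3, 4]
p_xyz = frame_to_canonical(u, v, w, e, [1, 2, 3])
p_uvw = canonical_to_frame(u, v, w, e, p_xyz)    # round trip
```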

Perspective projection

Sat, 06 Feb 2016 18:00:00 +0000

As seen in projective geometry, perspective is the phenomenon where an object appears smaller the further away it is from the point of view

We can again use some concepts of projective geometry to understand perspective projection, particularly the fact that any point in our 3D world is represented in the 4D projective space by the homogeneous coordinate $(x, y, z, 1)$. Any finite point with $w \neq 1$ can be projected to the hyperplane $w = 1$ by dividing each coordinate by $w$ i.e. $(\tfrac{x}{w}, \tfrac{y}{w}, \tfrac{z}{w})$. A key observation is that the higher the value of $w$ the smaller the object will be when it gets projected to the $w=1$ hyperplane

Perspective is implemented in 3D by using a transformation matrix that changes the value of $w$ based on how far the object is ($z$-coordinate)

Now let’s imagine that we want to project the points that exist in our world to the plane $z = d$

perspective projection y

By similar triangles we can see that the projected value of the $y$ coordinate is

$$ \frac{v_y'}{d} = \frac{v_y}{v_z} \Rightarrow v_y' = \frac{d v_y}{v_z} $$

The projected value of the $x$ coordinate can be computed in a similar way

perspective projection x

$$ \frac{v_x'}{d} = \frac{v_x}{v_z} \Rightarrow v_x' = \frac{d v_x}{v_z} $$

The projected value of the $z$ coordinate is the same for all the points

$$ v_z' = d $$

Summarizing

$$ \mathbf{v'} = \begin{bmatrix} \tfrac{d v_x}{v_z} & \tfrac{d v_y}{v_z} & d \end{bmatrix}^T $$

Manipulating the last equation so that it has a common denominator

$$ \mathbf{v'} = \begin{bmatrix} \tfrac{d v_x}{v_z} & \tfrac{d v_y}{v_z} & d \tfrac{v_z}{v_z} \end{bmatrix}^T = \frac{ \begin{bmatrix} v_x & v_y & v_z \end{bmatrix}^T }{ \tfrac{v_z}{d} } $$

The point above expressed in 4D homogeneous coordinates is

$$ \mathbf{v'} = \begin{bmatrix} v_x & v_y & v_z & \tfrac{v_z}{d} \end{bmatrix}^T $$

Finally the transformation matrix that transforms $\mathbf{v}$ to $\mathbf{v'}$ is

$$ \mathbf{v'} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & \tfrac{1}{d} & 0 \end{bmatrix} \begin{bmatrix} v_x \\ v_y \\ v_z \\ 1 \end{bmatrix} = \begin{bmatrix} v_x \\ v_y \\ v_z \\ \frac{v_z}{d} \end{bmatrix} $$
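The matrix followed by the division by $w$ can be sketched as follows, where `project_to_plane` is a name made up for this note:

```python
def project_to_plane(v, d):
    """Project the 3D point v onto the plane z = d through the origin."""
    m = [
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 1 / d, 0],
    ]
    homogeneous = list(v) + [1]
    x, y, z, w = (sum(row[j] * homogeneous[j] for j in range(4)) for row in m)
    # w equals v_z / d, dividing by it brings the point back to w = 1.
    return [x / w, y / w, z / w]

projected = project_to_plane([2, 4, 8], d=2)
```

For this point $w = 8 / 2 = 4$, so the result is $[0.5, 1, 2]$, which matches $\tfrac{d v_x}{v_z}$, $\tfrac{d v_y}{v_z}$ and $d$.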

Orthographic projection

Fri, 05 Feb 2016 23:15:00 +0000

Orthographic projection

A projection is a dimension-reducing operation, if we apply a scale operation with $k = 0$ all the points are projected onto the perpendicular axis in 2d or the perpendicular plane in 3d of $\unit{n}$, this type of projection is called orthographic projection

Projection on a cardinal axis/plane

The simplest type of projection just discards a coordinate of the vectors transformed, e.g. in 2d the vector $\mathbf{v} = \begin{bmatrix} v_x & v_y \end{bmatrix}^T$ projected onto the $x$ axis will discard its $y$ coordinate and make $\mathbf{v'} = \begin{bmatrix} v_x & 0 \end{bmatrix}^T$, the operation can be achieved by applying a scale transformation with $k = 0$

$$ \mathbf{P_x} = \mathbf{S} \left (\begin{bmatrix} 0 \\ 1 \end{bmatrix}, 0 \right ) = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} $$

$$ \mathbf{P_y} = \mathbf{S} \left (\begin{bmatrix} 1 \\ 0 \end{bmatrix}, 0 \right ) = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} $$

When a 3d vector $v = [v_x, v_y, v_z]$ is projected onto the $xy$ plane then the $v_z$ coordinate will be discarded by copying just $v_x$ and $v_y$ i.e. $v' = [v_x, v_y, 0]$

$$ \mathbf{P_{xy}} = \mathbf{S} \left (\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, 0 \right ) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} $$

$$ \mathbf{P_{xz}} = \mathbf{S}\left (\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, 0 \right ) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} $$

$$ \mathbf{P_{yz}} = \mathbf{S} \left (\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, 0 \right ) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} $$

Projection onto an arbitrary axis/plane

We can apply a zero factor scale along the direction of the vector perpendicular to the axis/plane

In 2d

$$ \begin{align*} \mathbf{P(\unit{n})} = \mathbf{S}(\unit{n}, 0) &= \begin{bmatrix} 1 + (0 - 1){n_x}^2 & (0 - 1)n_xn_y \\ (0 - 1)n_xn_y & 1 + (0 - 1){n_y}^2 \end{bmatrix} \\ \\ &= \begin{bmatrix} 1 - {n_x}^2 & -n_xn_y \\ -n_xn_y & 1 - {n_y}^2 \end{bmatrix} \end{align*} $$

In 3d

$$ \begin{align*} \mathbf{P(\unit{n})} = \mathbf{S}(\unit{n}, 0) &= \begin{bmatrix} 1 + (0 - 1){n_x}^2 & (0 - 1)n_yn_x & (0 - 1)n_zn_x \\ (0 - 1)n_xn_y & 1 + (0 - 1){n_y}^2 & (0 - 1)n_zn_y \\ (0 - 1)n_xn_z & (0 - 1)n_yn_z & 1 + (0 - 1){n_z}^2 \end{bmatrix} \\ \\ &= \begin{bmatrix} 1 - {n_x}^2 & -n_yn_x & -n_zn_x \\ -n_xn_y & 1 - {n_y}^2 & -n_zn_y \\ -n_xn_z & -n_yn_z & 1 - {n_z}^2 \\ \end{bmatrix} \end{align*} $$
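Both the 2d and the 3d matrices have entries $\delta_{ij} - n_i n_j$, so a single function covers them. A sketch, where `projection_matrix` is a made-up name and `n` must already be a unit vector:

```python
def projection_matrix(n):
    """Orthographic projection along the unit vector n (2d or 3d)."""
    size = len(n)
    return [[(1 if i == j else 0) - n[i] * n[j] for j in range(size)]
            for i in range(size)]

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

P_xy = projection_matrix([0, 0, 1])     # cardinal case: drops the z coordinate

n = [0.6, 0.8]                          # arbitrary unit vector in 2d
projected = mat_vec(projection_matrix(n), [1, 2])
leftover = sum(a * b for a, b in zip(projected, n))   # component along n
```

`P_xy` reproduces the cardinal matrix $\mathbf{P_{xy}}$ and `leftover` is zero up to rounding, since the projected vector has no component left along $\unit{n}$.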

Translating objects with a Transformation Matrix

Fri, 05 Feb 2016 18:00:00 +0000

This article is part 4 in the series about transformation matrices.

2D translation

A translation is an affine transformation which is a linear transformation followed by some displacement

$$ \mathbf{v'} = \mathbf{Mv} + \mathbf{b} $$

Even though we can’t express 2D translation using a 2x2 matrix, we can express such a transformation as a shearing transformation in 3D projective geometry, to do so we have to imagine that the 2D Euclidean world exists as the plane $w = 1$ in a 3D space, under this geometry any point has the form $\begin{bmatrix} x & y & 1 \end{bmatrix}$

In Euclidean geometry a vector expressed as a linear combination of the standard basis has the form

$$ \mathbf{v} = v_x \unit{i} + v_y \unit{j} = \begin{bmatrix} v_x & v_y\end{bmatrix}^T $$

In Projective geometry a vector which exists in the plane $w = 1$ has the form

$$ \mathbf{v} = v_x \unit{i} + v_y \unit{j} + 1 \unit{w} = \begin{bmatrix} v_x & v_y & 1 \end{bmatrix}^T $$

This basis can be represented using the following transformation matrix

$$ \mathbf{M} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \mathbf{\cuv{i}} & \mathbf{\cuv{j}} & \mathbf{\cuv{w}} \end{bmatrix} $$

The translation transform then can be seen in Projective geometry as a simple shearing of the space by the coordinate $w$, using the shearing transform $\mathbf{H_{xy}}(\Delta{x}, \Delta{y})$ to transform a point $v$

$$ \mathbf{v'} = \mathbf{H_{xy}}(\Delta{x},\Delta{y}) \mathbf{v} = \begin{bmatrix} 1 & 0 & \Delta{x} \\ 0 & 1 & \Delta{y} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} v_x \\ v_y \\ 1 \end{bmatrix} = \begin{bmatrix} v_x + \Delta{x} \\ v_y + \Delta{y} \\ 1 \end{bmatrix} $$

Now that we’re using projective geometry to represent entities, let’s imagine a point $p = \begin{bmatrix} x & y & 0 \end{bmatrix}$ (a point that lies in the plane $w = 0$). Whenever this point is transformed by a transformation matrix the translation components of the matrix are cancelled because $w = 0$, we can take advantage of this fact and represent vectors (directions) with this notation.

Let $v_{\infty}$ be a point located in the plane $w = 0$, applying the shearing operation $\mathbf{H_{xy}}(\Delta{x}, \Delta{y})$ results in

$$ \mathbf{v_{\infty}'} = \mathbf{H_{xy}}(\Delta{x},\Delta{y}) \mathbf{v_{\infty}} = \begin{bmatrix} 1 & 0 & \Delta{x} \\ 0 & 1 & \Delta{y} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} v_x \\ v_y \\ 0 \end{bmatrix} = \begin{bmatrix} v_x \\ v_y \\ 0 \end{bmatrix} $$

It’s important to note that this matrix multiplication is still a linear transformation and that this trick of translating 2D points is actually a shearing of the 3D projective plane
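The difference between points ($w = 1$) and vectors ($w = 0$) can be seen with a small numeric example. This is a sketch in plain Python, the function names `translation_2d` and `apply` are illustrative and not part of any library.

```python
def translation_2d(dx, dy):
    """Shear matrix H_xy(dx, dy) that acts as a 2D translation on the plane w = 1."""
    return [[1.0, 0.0, dx],
            [0.0, 1.0, dy],
            [0.0, 0.0, 1.0]]


def apply(m, v):
    """Multiply the matrix m by the column vector v."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]


t = translation_2d(2.0, -1.0)
point = apply(t, [3.0, 4.0, 1.0])      # w = 1, the displacement is applied
direction = apply(t, [3.0, 4.0, 0.0])  # w = 0, the displacement is cancelled
print(point)
print(direction)
```

The point is displaced to $(5, 3)$ while the direction stays $(3, 4)$, both come out of the same matrix multiplication.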

3D translation

Similarly to 2D a 3D translation can be represented as a shearing of the 4D projective hyperplane which has the form

$$ \mathbf{T} = \mathbf{H_{xyz}}(\Delta{x},\Delta{y},\Delta{z}) = \begin{bmatrix} 1 & 0 & 0 & \Delta{x} \\ 0 & 1 & 0 & \Delta{y} \\ 0 & 0 & 1 & \Delta{z} \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

When a 4D vector existing on the hyperplane $w = 1$ is transformed with this matrix the result is

$$ \mathbf{v'} = \mathbf{H_{xyz}}(\Delta{x},\Delta{y},\Delta{z})\mathbf{v} = \begin{bmatrix} 1 & 0 & 0 & \Delta{x} \\ 0 & 1 & 0 & \Delta{y} \\ 0 & 0 & 1 & \Delta{z} \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} v_x \\ v_y \\ v_z \\ 1 \end{bmatrix} = \begin{bmatrix} v_x + \Delta{x} \\ v_y + \Delta{y} \\ v_z + \Delta{z} \\ 1 \end{bmatrix} $$

The general 3D translation matrix is then denoted as

$$ \begin{equation} \label{general-translation-matrix} \mathbf{T} = \begin{bmatrix} 1 & 0 & 0 & T_x \\ 0 & 1 & 0 & T_y \\ 0 & 0 & 1 & T_z \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} I_{3 \times 3} & T_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{bmatrix} \end{equation} $$
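One consequence of this block form is that the product of two translation matrices is a translation by the sum of the displacements. The following sketch checks it numerically in plain Python, `translation_3d` and `matmul` are hypothetical helper names used only here.

```python
def translation_3d(tx, ty, tz):
    """General 3D translation matrix with the displacement in the last column."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))]
            for r in range(len(a))]


combined = matmul(translation_3d(1.0, 2.0, 3.0), translation_3d(10.0, 20.0, 30.0))
print(combined)
```

The last column of `combined` holds $(11, 22, 33)$ and the upper left $3 \times 3$ block is still the identity.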

Euler angles

Fri, 05 Feb 2016 13:00:00 +0000

Euler angles are three angles used to describe the orientation of a rigid body, they are typically denoted $\alpha, \beta, \gamma$, these angles represent a sequence of three elemental rotations about the axes of some coordinate system

Intrinsic and extrinsic rotations

Intrinsic rotations

A set of intrinsic rotations represent rotations relative to the object space which changes after each rotation

If the axes of some coordinate system are $X,Y,Z$ (note that initially the axes are aligned with the axes of a fixed coordinate system $x,y,z$), one of the most conventional sets of intrinsic rotations is $z-x'-z''$, it’s computed as follows

perform a rotation of $\alpha$ around the $z$-axis, the resulting set of axes is $x', y', z'$ (note that $z' = z$)
perform a rotation of $\beta$ around the $x'$-axis, the resulting set of axes is $x'', y'', z''$ (note that $x'' = x'$)
perform a rotation of $\gamma$ around the $z''$-axis, the resulting set of axes is $x''', y''', z'''$ (note that $z''' = z''$ and that the object space $z$-axis is used twice in the overall rotation)

intrinsic rotation $z-x'-z''$, note that the $+z$-axis points upward, the $+x$-axis points left and the $+y$-axis points right (all shown in blue), the rotated system $X,Y,Z$ is shown in red

A rotation matrix (used to pre-multiply column vectors) can be used to represent a sequence of intrinsic rotations, for example the intrinsic rotations $x-y'-z''$ with angles $\alpha, \beta, \gamma$ are represented as a multiplication of the following rotation matrices

$$ \mathbf{R} = \mathbf{X}(\alpha)\mathbf{Y}(\beta)\mathbf{Z}(\gamma) $$

Where $\mathbf{X}(\alpha)$, $\mathbf{Y}(\beta)$ and $\mathbf{Z}(\gamma)$ are rotation matrices that represent a rotation around the $x$-axis by $\alpha$, around the $y$-axis by $\beta$ and around the $z$-axis by $\gamma$ respectively

Extrinsic rotations

A set of extrinsic rotations represent rotations relative to a fixed coordinate system (typically the world coordinate system), for example the set of extrinsic rotations $z-x-z$ works as follows

perform a rotation of $\alpha$ around the $z$-axis, the resulting set of axes is $x', y', z'$ (note that $z' = z$)
perform a rotation of $\beta$ around the $x$-axis, the resulting set of axes is $x'', y'', z''$
perform a rotation of $\gamma$ around the $z$-axis, the resulting set of axes is $x''', y''', z'''$

A rotation matrix (used to pre-multiply column vectors) can be used to represent a sequence of extrinsic rotations, for example the extrinsic rotations $x-y-z$ with angles $\alpha, \beta, \gamma$ are represented as a multiplication of the following rotation matrices

$$ \mathbf{R} = \mathbf{Z}(\gamma)\mathbf{Y}(\beta)\mathbf{X}(\alpha) $$

Conversion between intrinsic rotations and extrinsic rotations

Any intrinsic rotation is equivalent to an extrinsic rotation by the same angles but with inverted order of rotations

For example the intrinsic rotations $x-y'-z''$ by the angles $\alpha,\beta,\gamma$ are equivalent to the extrinsic rotations $z-y-x$ by the angles $\gamma,\beta,\alpha$, both represented by

$$ \mathbf{R} = \mathbf{X}(\alpha)\mathbf{Y}(\beta)\mathbf{Z}(\gamma) $$
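The equivalence can be checked numerically. The sketch below is plain Python and assumes the column vector convention with the standard cardinal rotation matrices, the helpers `rot_axis`, `column` and `matmul` are names invented for this example. The intrinsic sequence is built by rotating about the axes of the frame that has already been rotated, the extrinsic one by rotating about the fixed axes.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_axis(n, t):
    """Rotation by the angle t about the unit axis n (axis-angle matrix)."""
    x, y, z = n
    c, s = math.cos(t), math.sin(t)
    k = 1.0 - c
    return [[x * x * k + c, x * y * k - z * s, x * z * k + y * s],
            [y * x * k + z * s, y * y * k + c, y * z * k - x * s],
            [z * x * k - y * s, z * y * k + x * s, z * z * k + c]]


def column(m, i):
    """Column i of a rotation matrix is where basis vector i ends up."""
    return [row[i] for row in m]


alpha, beta, gamma = 0.3, -0.7, 1.1

# Intrinsic x-y'-z'': every rotation is about an axis of the rotated frame
r1 = rot_x(alpha)
r2 = matmul(rot_axis(column(r1, 1), beta), r1)   # about y'
r3 = matmul(rot_axis(column(r2, 2), gamma), r2)  # about z''

# Extrinsic z-y-x by gamma, beta, alpha: every rotation is about a fixed axis
extrinsic = matmul(rot_x(alpha), matmul(rot_y(beta), rot_z(gamma)))

max_diff = max(abs(r3[r][c] - extrinsic[r][c]) for r in range(3) for c in range(3))
print(max_diff)
```

Both constructions give the same matrix up to floating point error, which is the product $\mathbf{X}(\alpha)\mathbf{Y}(\beta)\mathbf{Z}(\gamma)$.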

Proper Euler angles

A sequence of three elemental rotations are called proper Euler angles when the first and third rotation axes are the same

Proper Euler angles representing rotations about $z-x'-z''$ by the angles $\alpha, \beta, \gamma$, the rotated system $X,Y,Z$ is shown in red

There are six possible choices of rotation axes for proper Euler angles expressed as intrinsic rotations, and likewise six possible choices expressed as extrinsic rotations

intrinsic rotations	extrinsic rotations
$x-y'-x''$	$x-y-x$
$x-z'-x''$	$x-z-x$
$y-x'-y''$	$y-x-y$
$y-z'-y''$	$y-z-y$
$z-x'-z''$	$z-x-z$
$z-y'-z''$	$z-y-z$

Tait-Bryan angles

A sequence of three elemental rotations is called Tait-Bryan angles when the angles represent rotations about three distinct axes

Just like proper Euler angles there are 6 possible intrinsic rotations and 6 possible extrinsic rotations

intrinsic rotations	extrinsic rotations
$x-y'-z''$	$z-y-x$
$x-z'-y''$	$y-z-x$
$y-x'-z''$	$z-x-y$
$y-z'-x''$	$x-z-y$
$z-x'-y''$	$y-x-z$
$z-y'-x''$	$x-y-z$

The set of intrinsic rotations $z-y'-x''$ is known as yaw, pitch and roll, these angles are also known as nautical angles because they can describe the orientation of a ship or aircraft

Tait–Bryan angles representing the sequence $z-y'-x''$

The rotation matrix for the sequence $z-y'-x''$ (or $x-y-z$) which is known as yaw, pitch and roll is given by

$$ \begin{align*} \mathbf{R} = \mathbf{Z}(\alpha)\mathbf{Y}(\beta)\mathbf{X}(\gamma) \end{align*} $$

Extrinsic rotations expressed in upright space

An important thing to note is that the standard rotation matrices work in upright space, if the object space axes are not aligned with the upright space axes (different direction) then the sequence of extrinsic rotations must be done on the axes expressed in upright space

For example given that world space is

Chosen world space $+x$ (right), $+y$ (up) and $+z$ (backward), note that the choice is just personal preference

If there’s an object whose object space axes are $+x$ (backward), $+y$ (right) and $+z$ (up) then a sequence of intrinsic rotations $z-y'-x''$ by the angles $\alpha, \beta, \gamma$ (equivalent to the extrinsic rotation $x-y-z$ by the angles $\gamma, \beta, \alpha$ which is also known as yaw, pitch and roll) is equivalent to the multiplication of the following rotation matrices

$$ \mathbf{R} = \mathbf{R}(\mathbf{w}, \alpha)\mathbf{R}(\mathbf{v}, \beta)\mathbf{R}(\mathbf{u}, \gamma) $$

Where

$\mathbf{R}(\mathbf{s}, t)$ is the general rotation matrix used to rotate around the axis $\mathbf{s}$ by the angle $t$
$\mathbf{u, v, w}$ are the columns of the transformation matrix used to transform any point $\mathbf{p}$ expressed in object space to upright space

$$ \begin{align*} \mathbf{p}_{upright} &= \mathbf{M}_{upright \leftarrow object} \mathbf{p}_{object} \\ &= \begin{bmatrix} \mathbf{u}_{3 \times 1} & \mathbf{v}_{3 \times 1} & \mathbf{w}_{3 \times 1}\end{bmatrix} \mathbf{p}_{object} \end{align*} $$

The problem can be simplified when the object frame is aligned with the upright space up to a reordering or a reversal of the axes, the following table shows some of these simplifications

Description	Intrinsic/Extrinsic rotations	Equivalence in world space
yaw, pitch, roll	$$ \begin{align} y-x'-z'' \\ z-x-y \end{align} $$	$$ \begin{align} \mathbf{Y}(\alpha) \\ \mathbf{X}(\beta) \\ \mathbf{Z}(\gamma) \end{align} $$
yaw, pitch, roll	$$ \begin{align} z-y'-x'' \\ x-y-z \end{align} $$	$$ \begin{align} \mathbf{Z}(\alpha) \equiv \mathbf{Y_{wld}}(\alpha) \\ \mathbf{Y}(\beta) \equiv \mathbf{X_{wld}}(\beta) \\ \mathbf{X}(\gamma) \equiv \mathbf{Z_{wld}}(\gamma) \end{align} $$
yaw, pitch, roll	$$ \begin{align} y-x'-z'' \\ z-x-y \end{align} $$	$$ \begin{align} \mathbf{Y}(\alpha) \equiv \mathbf{Y_{wld}}(-\alpha) \\ \mathbf{X}(\beta) \equiv \mathbf{X_{wld}}(-\beta) \\ \mathbf{Z}(\gamma) \equiv \mathbf{Z_{wld}}(\gamma) \end{align} $$

Equivalence of common extrinsic rotations in world space

Shearing objects with a Transformation Matrix

Fri, 05 Feb 2016 10:00:00 +0000

This article is part 3 in the series about transformation matrices:

2D shearing

In 2D we can skew points towards the $x$ axis by making $x’ = x + sy$, if $s > 0$ then points will skew towards the positive $x$-axis, if $s < 0$ points will move towards the negative $x$-axis

The transformation matrix that skews points towards the $x$ axis is

$$ \begin{equation} \label{2d-shear-x} \mathbf{H_x}(s) = \begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix} \end{equation} $$

Towards the $y$ axis is

$$ \begin{equation} \label{2d-shear-y} \mathbf{H_y}(s) = \begin{bmatrix} 1 & 0 \\ s & 1 \end{bmatrix} \end{equation} $$

For example a vector $\mathbf{v}$ multiplied by \eqref{2d-shear-x} results in

$$ \mathbf{v'} = \mathbf{H_x}(s)\mathbf{v} = \begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix} \begin{bmatrix} v_x \\ v_y \end{bmatrix} = \begin{bmatrix} v_x + sv_y \\ v_y \end{bmatrix} $$

3D shearing

The notation $\mathbf{H_{xy}}$ indicates that the $x$ and $y$ coordinates are shifted by the other coordinate $z$ i.e.

$$ \begin{align*} x' &= x + sz \\ y' &= y + tz \\ z' &= z \end{align*} $$

The shearing matrices in 3D are

$$ \begin{equation} \label{shear-xy} \mathbf{H_{xy}}(s,t) = \begin{bmatrix} 1 & 0 & s \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix} \end{equation} $$

$$ \begin{equation} \label{shear-xz} \mathbf{H_{xz}}(s,t) = \begin{bmatrix} 1 & s & 0 \\ 0 & 1 & 0 \\ 0 & t & 1 \end{bmatrix} \end{equation} $$

$$ \begin{equation} \label{shear-yz} \mathbf{H_{yz}}(s,t) = \begin{bmatrix} 1 & 0 & 0 \\ s & 1 & 0 \\ t & 0 & 1 \end{bmatrix} \end{equation} $$

For example a vector $\mathbf{v}$ multiplied by \eqref{shear-xy} results in

$$ \mathbf{v'} = \mathbf{H_{xy}}(s,t) \mathbf{v} = \begin{bmatrix} 1 & 0 & s \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} v_x \\ v_y \\ v_z \end{bmatrix} = \begin{bmatrix} v_x + sv_z \\ v_y + tv_z \\ v_z \end{bmatrix} $$
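A small worked example of the $\mathbf{H_{xy}}$ shear, written as a sketch in plain Python where `shear_xy` and `apply` are illustrative names rather than functions from a library.

```python
def shear_xy(s, t):
    """3D shear that shifts x by s*z and y by t*z, leaving z untouched."""
    return [[1.0, 0.0, s],
            [0.0, 1.0, t],
            [0.0, 0.0, 1.0]]


def apply(m, v):
    """Multiply the matrix m by the column vector v."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]


sheared = apply(shear_xy(0.5, 2.0), [1.0, 1.0, 2.0])
print(sheared)
```

With $s = 0.5$, $t = 2$ and $\mathbf{v} = \begin{bmatrix} 1 & 1 & 2 \end{bmatrix}^T$ the result is $x' = 1 + 0.5 \cdot 2 = 2$, $y' = 1 + 2 \cdot 2 = 5$ and $z' = 2$.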

Introduction to rotation for computer graphics

Tue, 15 Dec 2015 13:00:00 +0000

2d rotation

A 2d rotation has only one parameter $\theta$, when the basis vectors $\unit{i} = [1, 0]$ and $\unit{j} = [0, 1]$ are rotated by an angle $\theta$ they become

$$ \mbold{p} = \cos{\theta} \unit{i} + \sin{\theta} \unit{j} \\ \mbold{q} = -\sin{\theta} \unit{i} + \cos{\theta} \unit{j} $$

Which builds the rotation matrix

$$ \mbold{R}(\theta) = \begin{bmatrix} \textbf{p} \\ \textbf{q} \end{bmatrix} = \begin{bmatrix} \cos{\theta} & \sin{\theta} \\ -\sin{\theta} & \cos{\theta} \end{bmatrix} $$

When a vector $\mbold{v}$ is transformed by this matrix we know that the vector will be a linear combination of the basis which are $\mbold{p}$ and $\mbold{q}$

$$ \begin{align*} \mbold{v'} = \mbold{vR}(\theta) &= v_x \mbold{p} + v_y \mbold{q} \\ &= v_x \begin{bmatrix}\cos{\theta} & \sin{\theta}\end{bmatrix} + v_y \begin{bmatrix}-\sin{\theta} & \cos{\theta}\end{bmatrix} \\ &= \begin{bmatrix} v_x \cos{\theta} - v_y \sin{\theta} \\ v_x \sin{\theta} + v_y \cos{\theta} \end{bmatrix}^T \end{align*} $$

Using a matrix to encode this operation

$$ \begin{align*} \mbold{v'} = \mbold{vR}(\theta) &= \begin{bmatrix}v_x & v_y\end{bmatrix} \begin{bmatrix} \cos{\theta} & \sin{\theta} \\ -\sin{\theta} & \cos{\theta} \end{bmatrix} \\ &= \begin{bmatrix} v_x \cos{\theta} - v_y \sin{\theta} \\ v_x \sin{\theta} + v_y \cos{\theta} \end{bmatrix}^T \end{align*} $$
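The row vector convention used in this section can be sketched in a few lines of plain Python, `rotation_2d` and `row_times_matrix` are names chosen for this example. Note that the rows of the matrix are the rotated basis vectors $\mbold{p}$ and $\mbold{q}$.

```python
import math


def rotation_2d(theta):
    """2d rotation matrix whose rows are the rotated basis vectors p and q."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s],
            [-s, c]]


def row_times_matrix(v, m):
    """Multiply the row vector v by the matrix m (v on the left)."""
    return [sum(v[k] * m[k][c] for k in range(len(v))) for c in range(len(m[0]))]


# Rotating the x-axis by 90 degrees should give (approximately) the y-axis
rotated = row_times_matrix([1.0, 0.0], rotation_2d(math.pi / 2))
print(rotated)
```

The result is $[0, 1]$ up to floating point error, matching $\begin{bmatrix} v_x \cos{\theta} - v_y \sin{\theta} & v_x \sin{\theta} + v_y \cos{\theta} \end{bmatrix}$ with $v_x = 1$ and $v_y = 0$.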

3d rotation

About cardinal axes

$$ \mathbf{R_x}(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos{\alpha} & -\sin{\alpha} \\ 0 & \sin{\alpha} & \cos{\alpha} \end{bmatrix} $$

$$ \mathbf{R_y}(\beta) = \begin{bmatrix} \cos{\beta} & 0 & \sin{\beta} \\ 0 & 1 & 0 \\ -\sin{\beta} & 0 & \cos{\beta} \end{bmatrix} $$

$$ \mathbf{R_z}(\gamma) = \begin{bmatrix} \cos{\gamma} & -\sin{\gamma} & 0 \\ \sin{\gamma} & \cos{\gamma} & 0 \\ 0 & 0 & 1 \end{bmatrix} $$

About an arbitrary axis

Given an axis $\unit{n}$ and an amount of rotation around it $\theta$ our goal is to find a rotation matrix that rotates about $\unit{n}$ by the angle $\theta$

$$ \mathbf{v'} = \mathbf{R}(\unit{n}, \theta)\mathbf{v} $$

The basic idea is to solve this problem in a plane perpendicular to $\unit{n}$ which becomes a 2d problem

Separate $\mbold{v}$ in two vectors, a vector parallel to $\unit{n}$ called $\mbold{v_{\parallel}}$ and a vector perpendicular to $\unit{n}$ called $\mbold{v_{\perp}}$ such that $\mbold{v_{\parallel}} + \mbold{v_{\perp}} = \mbold{v}$

$$ \begin{align*} \mbold{v_{\parallel}} &= (\mbold{v} \cdot \unit{n}) \unit{n} \\ \mbold{v_{\perp}} &= \mbold{v} - \mbold{v_{\parallel}} \end{align*} $$

After the rotation it’s obvious that the $\mbold{v_{\parallel}}$ component will be the same and only the vector $\mbold{v_{\perp}}$ will be rotated

A plane can be defined with two vectors that lie on it, since we have $\mbold{v_{\perp}}$ and we also know the normal of the plane (which is $\unit{n}$) any vector perpendicular to both vectors will also lie in the plane, we can use the cross product to find this vector

$$ \mbold{w} = \unit{n} \times \mbold{v_{\perp}} $$

The length of $\mbold{w}$ is

$$ \begin{align*} \left \| \mbold{w} \right \| &= \left \| \unit{n} \right \| \left \| \mbold{v_{\perp}}\right \| \sin{\deg{90}} \\ &= \left \| \mbold{v_{\perp}}\right \| \end{align*} $$

Which means that $\mbold{w}$ has the same length as $\mbold{v_{\perp}}$, note that even though they have the same length they don’t necessarily have unit length

$\mbold{w}$ and $\mbold{v_{\perp}}$ form now a 2d coordinate space where the $x$-axis is $\mbold{v_{\perp}}$ and the $y$-axis is $\mbold{w}$

Let $\mbold{v_{\perp}'}$ be a vector that is the result of rotating $\mbold{v_{\perp}}$ by an angle $\theta$, we can find the projection of it onto the $x$-axis and the $y$-axis as follows

$$ \begin{align*} \mbold{v_{\perp,x}'} &= (\magnitude{ \mbold{v_{\perp}} } \cos{\theta}) \unit{v_{\perp}} = \cos{\theta} \mbold{v_{\perp}}\\ \mbold{v_{\perp,y}'} &= (\magnitude{ \mbold{v_{\perp}} } \sin{\theta}) \unit{w} = \sin{\theta} \mbold{w} \end{align*} $$

Expressing $\mbold{v_{\perp}'}$ as a linear combination of the basis

$$ \mbold{v_{\perp}'} = \cos{\theta} \mbold{v_{\perp}} + \sin{\theta} \mbold{w} $$

Reconstructing the solution from the observations above

$$ \begin{align*} \mbold{v_{\parallel}} &= (\mbold{v} \cdot \unit{n}) \unit{n} \\ \mbold{v_{\perp}} &= \mbold{v} - \mbold{v_{\parallel}} \\ &= \mbold{v} - (\mbold{v} \cdot \unit{n}) \unit{n} \\ \mbold{w} &= \unit{n} \times \mbold{v_{\perp}} \\ &= \unit{n} \times (\mbold{v} - \mbold{v_{\parallel}}) \\ &= \unit{n} \times \mbold{v} - \unit{n} \times \mbold{v_{\parallel}} \\ &= \unit{n} \times \mbold{v} \end{align*} $$

Finally

$$ \begin{align} \mbold{v'} &= \mbold{v_{\perp}'} + \mbold{v_{\parallel}'} \nonumber \\ &= \cos{\theta} \mbold{v_{\perp}} + \sin{\theta} \mbold{w} + (\mbold{v} \cdot \unit{n}) \unit{n} \nonumber \\ &= \cos{\theta} (\mbold{v - (\mbold{v} \cdot \unit{n}) \unit{n}}) + \sin{\theta} (\unit{n} \times \mbold{v}) + (\mbold{v} \cdot \unit{n}) \unit{n} \nonumber \\ &= \cos{\theta} \mbold{v} - \cost (\mathbf{v} \cdot \unit{n}) \unit{n} + \sin{\theta} (\unit{n} \times \mbold{v}) + (\mbold{v} \cdot \unit{n}) \unit{n} \nonumber \\ &= \cos{\theta} \mbold{v} + \sin{\theta} (\unit{n} \times \mbold{v}) + (1 - \cost)(\mathbf{v} \cdot \unit{n}) \unit{n} \label{3d-rotation} \end{align} $$

Now we can compute what the basis vectors are after the transformation above (by using each of the basis vectors as $\mbold{v}$ on \eqref{3d-rotation}) to construct a rotation matrix

$$ \begin{align*} \mbold{p} &= \begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} \quad \quad \quad \quad \mbold{p'} = \begin{bmatrix} n_x^2(1 - \cost) + \cost \\ n_xn_y(1 - \cost) + n_z \sint \\ n_xn_z(1 - \cost) - n_y \sint \end{bmatrix}\\ \\ \mbold{q} &= \begin{bmatrix}0 \\ 1 \\ 0\end{bmatrix} \quad \quad \quad \quad \mbold{q'} = \begin{bmatrix} n_yn_x(1 - \cost) - n_z \sint \\ n_y^2(1 - \cost) + \cost \\ n_yn_z(1 - \cost) + n_x \sint \end{bmatrix}\\ \\ \mbold{r} &= \begin{bmatrix}0 \\ 0 \\ 1\end{bmatrix} \quad \quad \quad \quad \mbold{r'} = \begin{bmatrix} n_zn_x(1 - \cost) + n_y \sint \\ n_zn_y(1 - \cost) - n_x \sint \\ n_z^2(1 - \cost) + \cost \end{bmatrix}\\ \end{align*} $$

Constructing the matrix from these vectors

$$ \mathbf{R}(\unit{n}, \theta) = \begin{bmatrix} \mbold{p'} & \mbold{q'} & \mbold{r'} \end{bmatrix} $$
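The construction can be sketched in plain Python by applying \eqref{3d-rotation} to each basis vector and storing the results as columns. The helper names (`rotate`, `rotation_matrix`, `cross`, `dot`) are assumptions made for this example and $\unit{n}$ is assumed to have unit length.

```python
import math


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rotate(v, n, theta):
    """v' = cos(t) v + sin(t) (n x v) + (1 - cos(t)) (v . n) n"""
    c, s = math.cos(theta), math.sin(theta)
    w = cross(n, v)
    d = dot(v, n)
    return [c * v[i] + s * w[i] + (1.0 - c) * d * n[i] for i in range(3)]


def rotation_matrix(n, theta):
    """Matrix whose columns are the rotated basis vectors p', q', r'."""
    basis = ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])
    images = [rotate(b, n, theta) for b in basis]
    return [[images[c][r] for c in range(3)] for r in range(3)]


theta = 0.8
about_z = rotation_matrix([0.0, 0.0, 1.0], theta)
print(about_z)
```

Using the $z$-axis as $\unit{n}$ reproduces the cardinal matrix $\mathbf{R_z}$, which is a convenient sanity check for the signs of the sine terms.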

3d rotations using quaternions

A complex rotor is a unit norm complex number which rotates another complex number by the angle $\theta$ and has the form

$$ e^{i\theta} = \cost + i \sint $$

Hamilton had hoped that a unit-norm quaternion $q$ could be used to rotate a vector which is stored as a pure quaternion $p$, the unit norm quaternion is given by

$$ \begin{align} q &= [s, \lambda \unit{n}] \quad s,\lambda \in \mathbb{R}, \unit{n} \in \mathbb{R}^3 \label{unit-norm-quaternion}\\ \left | \unit{n} \right | &= 1 \nonumber \\ s^2 + \lambda^2 &= 1 \nonumber \end{align} $$

$$ p = [0, \mbold{v}] \quad \mbold{v} \in \mathbb{R}^3 $$

Let’s compute the product $p' = qp$

$$ \begin{align} p' &= qp \nonumber \\ &= [s, \lambda \unit{n}][0, \mathbf{v}] \nonumber \\ &= [-\lambda \unit{n} \cdot \mathbf{v}, s \mathbf{v} + \lambda \unit{n} \times \mathbf{v}] \label{p-prime} \end{align} $$

Special case

What if $\unit{n}$ is perpendicular to $\mathbf{v}$? Then the scalar quantity of \eqref{p-prime} is zero and we are left with the pure quaternion

$$ \begin{equation} \label{p-prime-perpendicular} p' = [0, s \mathbf{v} + \lambda \unit{n} \times \mathbf{v}] \quad\quad \text{given that $\unit{n}$ is perpendicular to $\mathbf{v}$} \end{equation} $$

Let’s analyze the vector part of \eqref{p-prime-perpendicular} (which is now a 3d entity because it’s a pure quaternion). Since $\unit{n}$ is perpendicular to $\mathbf{v}$, the vector $\unit{n} \times \mathbf{v}$ has a norm equal to $\magnitude{ \unit{n} \times \mathbf{v} } = \magnitude{ \unit{n} } \magnitude { \mathbf{v} } \sin{\deg{90}}$, and since $\unit{n}$ is a unit vector $\magnitude{\unit{n} \times \mathbf{v}} = \magnitude{\mathbf{v}}$, which means that we have two orthogonal vectors with the same length

To rotate the vector $\mathbf{v}$ about $\unit{n}$ let’s transform $\mathbf{v}$ to the 2d space whose basis vectors are $\mathbf{v}$ and $\unit{n} \times \mathbf{v}$ and perform the rotation there which is trivially $[\cost, \sint]$, therefore all we have to do in \eqref{p-prime-perpendicular} is make the scalar quantities multiplying each vector equal the projection of the rotated vector over the basis

$$ p' = [0, \cost \mathbf{v} + \sint \unit{n} \times \mathbf{v}] $$

Which makes the quaternion $q$ have the form

$$ \begin{align} q &= [\cost, \sint \unit{n}] \label{perp-rotor} \end{align} $$

And it acts as a rotor only when $\unit{n}$ is perpendicular to $\mathbf{v}$

Important notes/facts about orthogonal quaternions

If $q$ is a rotor about the unit vector $\unit{n}$ by an angle $\theta$ whose vector term is perpendicular to the pure quaternion $p$
- $qp$ and $pq^{-1}$ rotate $p$ by an angle $\theta$ about $\unit{n}$
- $pq$ and $q^{-1}p$ rotate $p$ by an angle $-\theta$ about $\unit{n}$
- Each of these products leaves $p'$ unscaled (because $q$ is a unit norm quaternion)

General case

Let’s use \eqref{unit-norm-quaternion} as the starting point, note that this time its vector part is not necessarily perpendicular to the pure quaternion $p$, the product $qp$ yields

$$ \begin{align*} qp &= [s, \lambda \unit{n}][0, \mathbf{v}] \\ &= [-\lambda \unit{n} \cdot \mathbf{v}, s \mathbf{v} + \lambda \unit{n} \times \mathbf{v}] \end{align*} $$

Note that the term $-\lambda \unit{n} \cdot \mathbf{v}$ does not vanish since for the general case $\unit{n}$ and $\mathbf{v}$ are no longer perpendicular, what’s more important is that the product $qp$ is no longer a pure quaternion, multiplying a vector by a non-orthogonal quaternion has converted some of the vector information into the quaternion’s scalar component

What happens if we post-multiply $qp$ by $q^{-1}$, could it reverse the operation? (Note that since $q$ is a unit norm quaternion $q^{-1} = q^*$)

$$ qpq^{-1} = [-\lambda \unit{n} \cdot \mathbf{v}, s \mathbf{v} + \lambda \unit{n} \times \mathbf{v}][s, -\lambda \unit{n}] $$

Let’s first check if doing this multiplication makes the scalar component vanish

$$ \begin{align*} qpq^{-1} &= [-\lambda s \unit{n} \cdot \mathbf{v} - (s \mathbf{v} + \lambda \unit{n} \times \mathbf{v}) \cdot (-\lambda \unit{n}), \ldots] \\ &= [-\lambda s \unit{n} \cdot \mathbf{v} + (s \mathbf{v}) \cdot (\lambda \unit{n}) + (\lambda \unit{n} \times \mathbf{v}) \cdot (\lambda \unit{n}), \ldots] \\ &= [-\lambda s \unit{n} \cdot \mathbf{v} + (s \mathbf{v}) \cdot (\lambda \unit{n}) + 0, \ldots] \quad \text {since $\unit{n}$ is perpendicular to $\unit{n} \times \mathbf{v}$ }\\ &= [-\lambda s \unit{n} \cdot \mathbf{v} + \lambda s \mathbf{v} \cdot \unit{n}, \ldots] \\ &= [0, \ldots] \end{align*} $$

Indeed it magically made the scalar component vanish! Now let’s look at the vector component of $qpq^{-1}$

$$ \begin{align*} qpq^{-1} &= [0, s (s \mathbf{v} + \lambda \unit{n} \times \mathbf{v}) + (-\lambda \unit{n} \cdot \mathbf{v})(-\lambda \unit{n}) + (s \mathbf{v} + \lambda \unit{n} \times \mathbf{v}) \times (-\lambda \unit{n})] \\ &= [0, s^2 \mathbf{v} + s \lambda (\unit{n} \times \mathbf{v}) + \lambda^2 (\unit{n} \cdot \mathbf{v})\unit{n} - s \lambda (\mathbf{v} \times \unit{n}) - \lambda^2 ((\unit{n} \times \mathbf{v}) \times \unit{n})] \end{align*} $$

Let’s expand the cross product

$$ (\unit{n} \times \mathbf{v}) \times \unit{n} = (\unit{n} \cdot \unit{n}) \mathbf{v} - (\mathbf{v} \cdot \unit{n}) \unit{n} = \mathbf{v} - (\mathbf{v} \cdot \unit{n}) \unit{n} $$

Therefore

$$ \begin{align*} qpq^{-1} &= [0, s^2 \mathbf{v} + s \lambda (\unit{n} \times \mathbf{v}) + \lambda^2 (\unit{n} \cdot \mathbf{v})\unit{n} - s \lambda (\mathbf{v} \times \unit{n}) - \lambda^2 (\mathbf{v} - (\mathbf{v} \cdot \unit{n}) \unit{n})] \\ &= [0, s^2 \mathbf{v} + s \lambda (\unit{n} \times \mathbf{v}) + \lambda^2 (\unit{n} \cdot \mathbf{v})\unit{n} - s \lambda (\mathbf{v} \times \unit{n}) - \lambda^2 \mathbf{v} + \lambda^2 (\mathbf{v} \cdot \unit{n}) \unit{n}] \\ &= [0, s^2 \mathbf{v} + 2 s \lambda (\unit{n} \times \mathbf{v}) + \lambda^2 (\unit{n} \cdot \mathbf{v})\unit{n} - \lambda^2 \mathbf{v} + \lambda^2 (\mathbf{v} \cdot \unit{n}) \unit{n}] \\ &= [0, (s^2 - \lambda^2) \mathbf{v} + 2 s \lambda (\unit{n} \times \mathbf{v}) + 2 \lambda^2 (\mathbf{v} \cdot \unit{n}) \unit{n}] \end{align*} $$

Let’s make $s = \cost$ and $\lambda = \sint$ just like in \eqref{perp-rotor} (it worked as a rotor when it was orthogonal to $p$, it might work with the general case too)

$$ qpq^{-1} = [0, (\cos^2{\theta} - \sin^2{\theta}) \mathbf{v} + 2 \cost \sint (\unit{n} \times \mathbf{v}) + 2 \sin^2{\theta} (\mathbf{v} \cdot \unit{n}) \unit{n}] $$

Which involves double-angle terms, replacing these terms with the double-angle identities

$$ qpq^{-1} = [0, \cos{2\theta} \mathbf{v} + \sin{2\theta} (\unit{n} \times \mathbf{v}) + (1 - \cos{2\theta}) (\mathbf{v} \cdot \unit{n}) \unit{n}] $$

The product created a pure quaternion equal to $\mathbf{v}$ rotated by an angle $2\theta$, if we want to rotate $\mathbf{v}$ by an angle $\theta$ we must build the quaternion $q$ from the half angle $\frac{1}{2}\theta$ (note above that $q$ was equal to \eqref{perp-rotor})

$$ \begin{equation} \label{rotor} q = [\cos{\frac{1}{2}\theta}, \sin{\frac{1}{2}\theta}\unit{n}] \end{equation} $$

Using \eqref{rotor} the product is

$$ qpq^{-1} = [0, \cos{\theta} \mathbf{v} + \sin{\theta} (\unit{n} \times \mathbf{v}) + (1 - \cos{\theta}) (\mathbf{v} \cdot \unit{n}) \unit{n}] $$

Note that the vector part of $qpq^{-1}$ is identical to \eqref{3d-rotation}
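This identity can be verified numerically. The sketch below is plain Python and stores a quaternion as a `(scalar, vector)` pair, the names `qmul`, `rotor`, `conjugate` and `rodrigues` are made up for this example. It compares the sandwich product $qpq^{-1}$ against the vector formula \eqref{3d-rotation}.

```python
import math


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def qmul(a, b):
    """[sa, va][sb, vb] = [sa sb - va . vb, sa vb + sb va + va x vb]"""
    (sa, va), (sb, vb) = a, b
    c = cross(va, vb)
    return (sa * sb - dot(va, vb),
            [sa * vb[i] + sb * va[i] + c[i] for i in range(3)])


def rotor(n, theta):
    """Half angle rotor q = [cos(t/2), sin(t/2) n], n must be a unit vector."""
    half = theta / 2.0
    return (math.cos(half), [math.sin(half) * x for x in n])


def conjugate(q):
    """For a unit norm quaternion the conjugate is also the inverse."""
    return (q[0], [-x for x in q[1]])


def rodrigues(v, n, theta):
    """cos(t) v + sin(t) (n x v) + (1 - cos(t)) (v . n) n"""
    c, s = math.cos(theta), math.sin(theta)
    w, d = cross(n, v), dot(v, n)
    return [c * v[i] + s * w[i] + (1.0 - c) * d * n[i] for i in range(3)]


n = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]  # unit length axis, not perpendicular to v
v = [0.5, -1.0, 2.0]
theta = 1.2

q = rotor(n, theta)
scalar, rotated = qmul(qmul(q, (0.0, v)), conjugate(q))
expected = rodrigues(v, n, theta)
print(scalar, rotated, expected)
```

The scalar part comes out as zero up to rounding and the vector part agrees with the axis-angle formula, even though $\unit{n}$ and $\mathbf{v}$ are not perpendicular here.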

Quaternion difference and dot product

Let $a$ and $b$ be two unit norm quaternions (rotors that have the same form as \eqref{rotor}), the quaternion $d$ that rotates from $a$ to $b$ satisfies $da = b$ and is known as the quaternion difference, we can find the value of $d$ given that we know $a$ and $b$

$$ \begin{align*} da &= b \\ d(aa^*) &= ba^* \quad \quad \text{since $a$ is a unit norm quaternion its inverse is equal to its conjugate} \\ d &= ba^* \end{align*} $$

Expanding the product

$$ \begin{align*} d &= [s_b, \mathbf{b}][s_a, -\mathbf{a}] \\ &= [s_bs_a + \mathbf{b} \cdot \mathbf{a}, -s_b\mathbf{a} + s_a\mathbf{b} - \mathbf{b} \times \mathbf{a}] \end{align*} $$

Note that the scalar part of this quaternion is equal to the inner product (a generalization of the dot product to abstract vector spaces) between two quaternions

$$ d = [\left \langle a, b \right \rangle, -s_b\mathbf{a} + s_a\mathbf{b} - \mathbf{b} \times \mathbf{a}] $$

Remembering that a rotor is given by \eqref{rotor} we can relate the inner product between rotor quaternions with the scalar quantity of \eqref{rotor} and interpret it geometrically just like the dot product between two vectors in 3d/2d space but this time noticing that the dot product gives the cosine of half the angle between the quaternions

$$ a \cdot b = \cos{\frac{\theta}{2}} $$

This means that the angle between $a$ and $b$ is equal to

$$ \theta = 2 \arccos{(a \cdot b)} $$

Or using the half angle formulas

$$ \begin{align*} \cos^2{\frac{\theta}{2}} &= \frac{1}{2}(1 + \cos{\theta}) \\ (a \cdot b)^2 &= \frac{1}{2}(1 + \cos{\theta}) \\ \cost &= 2 (a \cdot b)^2 - 1 \\ \theta &= \arccos(2(a \cdot b)^2 - 1) \end{align*} $$

The second formula works for all the cases as noted here (the first one doesn’t work when $a \cdot b < 0$)
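The difference between the two formulas shows up when one of the quaternions is negated, which represents the same rotation but makes $a \cdot b$ negative. This is a sketch in plain Python where quaternions are 4-tuples `(s, x, y, z)`, the function names are hypothetical and the clamping of the `acos` argument is an addition of this example to guard against rounding.

```python
import math


def rotor(n, theta):
    """Half angle rotor stored as the 4-tuple (s, x, y, z)."""
    half = theta / 2.0
    return (math.cos(half),) + tuple(math.sin(half) * c for c in n)


def inner(a, b):
    """Inner product of two quaternions seen as 4d vectors."""
    return sum(x * y for x, y in zip(a, b))


def clamp(x):
    return max(-1.0, min(1.0, x))


def angle_direct(a, b):
    """theta = 2 arccos(a . b)"""
    return 2.0 * math.acos(clamp(inner(a, b)))


def angle_half_angle(a, b):
    """theta = arccos(2 (a . b)^2 - 1)"""
    return math.acos(clamp(2.0 * inner(a, b) ** 2 - 1.0))


z_axis = (0.0, 0.0, 1.0)
a = rotor(z_axis, 0.0)
b = rotor(z_axis, 1.0)
neg_b = tuple(-c for c in b)  # same rotation as b, but a . neg_b < 0

print(angle_direct(a, b), angle_half_angle(a, b))
print(angle_direct(a, neg_b), angle_half_angle(a, neg_b))
```

For `b` both formulas return an angle of $1$ radian, for `neg_b` the first formula returns $2\pi - 1$ while the second one still returns $1$.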

Scaling objects with a Transformation Matrix

Tue, 20 Oct 2015 13:30:00 +0000

This article is part 2 in the series about transformation matrices:

Scaling along the cardinal axes

Intuitively the basis vectors should be multiplied by a scalar, also they are independently affected by the scale factors

In 2D the basis vectors become

$$ \mathbf{p'} = k_x \mathbf{p} = k_x \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} k_x \\ 0 \end{bmatrix} \\ \mathbf{q'} = k_y \mathbf{q} = k_y \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ k_y \end{bmatrix} $$

Constructing the 2D scale matrix $\mathbf{S}(k_x, k_y)$ from these basis vectors

$$ \mathbf{S}(k_x, k_y) = \begin{bmatrix} k_x & 0 \\ 0 & k_y \end{bmatrix} $$

Similarly the 3D scale matrix is given by

$$ \mathbf{S}(k_x, k_y, k_z) = \begin{bmatrix} k_x & 0 & 0 \\ 0 & k_y & 0 \\ 0 & 0 & k_z \end{bmatrix} $$

Scaling along an arbitrary axis

Let $\unit{n}$ be the unit vector parallel to the direction of scale and $k$ be the scale factor, a vector transformed by this scale operation can be represented as

$$ \mathbf{v'} = \mathbf{S}(\unit{n}, k) \mathbf{v} $$

scale arbitrary axis

Separate $\mathbf{v}$ in two vectors, a vector parallel to $\unit{n}$ called $\mathbf{v_{\parallel}}$ and a vector perpendicular to $\unit{n}$ called $\mathbf{v_{\perp}}$ such that

$$ \mathbf{v} = \mathbf{v_{\parallel}} + \mathbf{v_{\perp}} $$

Where

$$ \begin{align*} \mathbf{v_{\parallel}} &= (\mathbf{v} \cdot \unit{n}) \unit{n} \\ \mathbf{v_{\perp}} &= \mathbf{v} - \mathbf{v_{\parallel}} \end{align*} $$

We can also represent $\mathbf{v'}$ as a sum of two vectors parallel and perpendicular to $\unit{n}$

$$ \mathbf{v'} = \mathbf{v_{\parallel}'} + \mathbf{v_{\perp}'} $$

Note that any vector that lies in the 2d line or 3d plane perpendicular to $\unit{n}$ will not be affected by the scale operation so $\mathbf{v'} = \mathbf{v_{\parallel}'} + \mathbf{v_{\perp}}$

Since $\mathbf{v_{\parallel}}$ is parallel to the direction of scale then $\mathbf{v_{\parallel}'} = k\mathbf{v_{\parallel}}$

Reconstructing the solution from the observations above

$$ \begin{align*} \mathbf{v_{\parallel}} &= (\mathbf{v} \cdot \unit{n}) \unit{n} \\ \mathbf{v_{\perp}'} &= \mathbf{v_{\perp}} \\ &= \mathbf{v} - \mathbf{v_{\parallel}} \\ &= \mathbf{v} - (\mathbf{v} \cdot \unit{n}) \unit{n} \\ \mathbf{v_{\parallel}'} &= k\mathbf{v_{\parallel}} \\ &= k(\mathbf{v} \cdot \unit{n}) \unit{n} \\ \mathbf{v'} &= \mathbf{v_{\perp}'} + \mathbf{v_{\parallel}'} \\ &= \mathbf{v} - (\mathbf{v} \cdot \unit{n}) \unit{n} + k(\mathbf{v} \cdot \unit{n}) \unit{n} \\ &= \mathbf{v} + (k - 1) (\mathbf{v} \cdot \unit{n}) \unit{n} \end{align*} $$

We can construct a general scale matrix by computing the vectors resulting after transforming the basis vectors $\mathbf{p}$, $\mathbf{q}$ and $\mathbf{r}$, for example let’s transform $\mathbf{p} = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}^T$

$$ \begin{align*} \mathbf{p'} &= \mathbf{p} + (k - 1) (\mathbf{p} \cdot \unit{n}) \unit{n} \\ &= \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + (k - 1) \left ( \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \cdot \begin{bmatrix} n_x \\ n_y \\ n_z \end{bmatrix} \right ) \begin{bmatrix} n_x \\ n_y \\ n_z \end{bmatrix} \\ &= \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + (k - 1) n_x \begin{bmatrix} n_x \\ n_y \\ n_z \end{bmatrix} \\ &= \begin{bmatrix} 1 + (k - 1) {n_x}^2 \\ (k - 1)n_xn_y \\ (k - 1)n_xn_z \end{bmatrix} \end{align*} $$

Similarly the values of $\mathbf{q'}$ and $\mathbf{r'}$ can be found which make the general scale matrix equal to

$$ \begin{align*} \mathbf{S}(\unit{n}, k) &= \begin{bmatrix} \mathbf{p'} & \mathbf{q'} & \mathbf{r'} \end{bmatrix} \nonumber \\ & = \begin{bmatrix} 1 + (k - 1) {n_x}^2 & (k - 1)n_yn_x & (k - 1)n_zn_x \\ (k - 1)n_xn_y & 1 + (k - 1) {n_y}^2 & (k - 1)n_zn_y \\ (k - 1)n_xn_z & (k - 1)n_yn_z & 1 + (k - 1) {n_z}^2 \end{bmatrix} \end{align*} $$
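Each entry of this matrix is $\delta_{rc} + (k - 1) n_r n_c$, which makes it short to build in code. The following is a minimal sketch in plain Python with illustrative names (`scale_matrix`, `apply`), it assumes $\unit{n}$ has unit length and that vectors are columns.

```python
def scale_matrix(n, k):
    """S(n, k): scales by the factor k along the unit vector n."""
    size = len(n)
    return [[(1.0 if r == c else 0.0) + (k - 1.0) * n[r] * n[c]
             for c in range(size)]
            for r in range(size)]


def apply(m, v):
    """Multiply the matrix m by the column vector v."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]


n = [0.6, 0.8, 0.0]  # unit length direction of scale
s = scale_matrix(n, 3.0)

along = apply(s, n)                       # parallel to n, scaled by k
across = apply(s, [-0.8, 0.6, 0.0])       # perpendicular to n, unchanged
print(along)
print(across)
```

The vector parallel to $\unit{n}$ comes out three times longer and the perpendicular one is left untouched, up to floating point rounding.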

Transformation matrix

Thu, 15 Oct 2015 13:00:00 +0000

Let’s say that we’re given the standard basis vectors $\mathbf{i} = [1, 0, 0], \; \mathbf{j} = [0, 1, 0], \; \mathbf{k} = [0, 0, 1]$ and we multiply each of these vectors by an arbitrary matrix $\mathbf{M}$

$$ \begin{align*} \ \mathbf{iM} &= \begin{bmatrix}1 & 0 & 0\end{bmatrix} \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \\ \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} \end{bmatrix} \\ \ \\ \ \mathbf{jM} &= \begin{bmatrix}0 & 1 & 0\end{bmatrix} \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \\ \end{bmatrix} = \begin{bmatrix} m_{21} & m_{22} & m_{23} \end{bmatrix} \\ \ \\ \ \mathbf{kM} &= \begin{bmatrix}0 & 0 & 1\end{bmatrix} \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \\ \end{bmatrix} = \begin{bmatrix} m_{31} & m_{32} & m_{33} \end{bmatrix} \\ \ \end{align*} $$

The first row of $\mathbf{M}$ contains the result of performing a transformation on the vector $\mathbf{i}$, the second row is the result of transforming $\mathbf{j}$ and the third row is the result of transforming $\mathbf{k}$

Let $\mathbf{v}$ be some vector expressed in this coordinate space, which means that it can be represented as a linear combination of the basis vectors

$$ \mathbf{v} = v_x \mathbf{i} + v_y \mathbf{j} + v_z \mathbf{k} $$

If we multiply this vector by the matrix $\mathbf{M}$

$$ \begin{align} \mathbf{v'} = \mathbf{vM} &= (v_x \mathbf{i} + v_y \mathbf{j} + v_z \mathbf{k}) \mathbf{M} \nonumber \\ &= v_x (\mathbf{iM}) + v_y (\mathbf{jM}) + v_z (\mathbf{kM}) \nonumber \\ &= v_x \begin{bmatrix} m_{11} & m_{12} & m_{13} \end{bmatrix} + v_y \begin{bmatrix} m_{21} & m_{22} & m_{23} \end{bmatrix} + v_z \begin{bmatrix} m_{31} & m_{32} & m_{33} \end{bmatrix} \label{vm} \\ \end{align} $$

If we let $\mathbf{M}$ have the form

$$ \mathbf{M} = \begin{bmatrix} -\mathbf{p}- \\ -\mathbf{q}- \\ -\mathbf{r}- \end{bmatrix} $$

Then \eqref{vm} can be rewritten as

$$ \mathbf{v'} = \mathbf{vM} = v_x \mathbf{p} + v_y \mathbf{q} + v_z \mathbf{r} $$

$\mathbf{vM}$ is a linear combination of the rows of $\mathbf{M}$. If we interpret these row vectors as the basis vectors of some coordinate system, expressed/measured in terms of an outer coordinate system, then the matrix is a structure that encodes a coordinate space transformation (from object space to upright space)

$$ \mathbf{v'} = \mathbf{vM} = \begin{bmatrix}v_x & v_y & v_z\end{bmatrix} \begin{bmatrix} -\textbf{p}- \\ -\textbf{q}- \\ -\textbf{r}- \end{bmatrix} = v_x \mathbf{p} + v_y \mathbf{q} + v_z \mathbf{r} $$

Another way to see this is that $\mathbf{M}$ encodes in its rows a transformation made to the standard basis vectors $\mathbf{i}, \mathbf{j}, \mathbf{k}$
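To make the “linear combination of the rows” reading concrete, here’s a small C++ sketch of the row vector times matrix product. `Vec3`, `Mat3` and `mul_row_vector` are names invented for this example.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // m[0], m[1], m[2] are the rows p, q, r

// Row vector times matrix: v' = vM = v_x * p + v_y * q + v_z * r.
Vec3 mul_row_vector(const Vec3 &v, const Mat3 &m) {
  Vec3 out{};
  for (int row = 0; row < 3; ++row) {
    for (int col = 0; col < 3; ++col) {
      // accumulate v[row] times the row-th row of M
      out[col] += v[row] * m[row][col];
    }
  }
  return out;
}
```

Multiplying $[1, 0, 0]$ by a matrix returns its first row, and multiplying $[2, 3, 4]$ by the matrix with rows $[0, 1, 0]$, $[-1, 0, 0]$, $[0, 0, 1]$ gives $2[0, 1, 0] + 3[-1, 0, 0] + 4[0, 0, 1] = [-3, 2, 4]$.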

The following notation means the rotation matrix that transforms the frame $a$ to the frame $b$ and that is represented in the frame $c$

$$ ^{c} \mathbf{M}_{a \to b} $$

If the frame $c$ is equal to the frame $b$ then it can be omitted since it’s assumed that the matrix is represented in terms of the frame $b$

$$ \mathbf{M}_{a \rightarrow b} $$

For example the matrix that transforms from object space to upright space is represented as

$$ \mathbf{M}_{object \rightarrow upright} $$

Transforming the vector $\mathbf{v}_{object}$ expressed in object space to upright space is then

$$ \mathbf{v}_{upright} = \mathbf{v}_{object} \mathbf{M}_{object \rightarrow upright} $$

Row versus column vectors

A space coordinate transform operation has the form

$$ \mathbf{v'} = \mathbf{vM} $$

Where $\mathbf{M}$ encodes in its rows a transformation made to the standard basis vectors and $\mathbf{v'}$ and $\mathbf{v}$ are row vectors

Let’s say that we want to transform a row vector by the matrices $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ in that order, the operation is represented as

$$ \mathbf{v'} = \mathbf{vABC} $$

However $\mathbf{v}$ could instead be a column vector, in which case $\mathbf{v'}$ must also be a column vector. For $\mathbf{v'}$ to hold the correct result we must pre-multiply $\mathbf{v}$ by the transposes of the transformation matrices in reverse order, which is equivalent to transposing both sides of the equation

$$ \begin{align*} \mathbf{v'} &= \mathbf{vABC} \\ \mathbf{v'}^T &= (\mathbf{vABC})^T && \text{transposing both sides} \\ \mathbf{v'}^T &= \mathbf{C}^T \mathbf{B}^T \mathbf{A}^T \mathbf{v}^T && \text{because of the }\href{https://www.wikiwand.com/en/Transpose#/Properties}{\text{matrix transpose properties}} \end{align*} $$

Note that

the transformation matrices $\mathbf{A}^T$, $\mathbf{B}^T$ and $\mathbf{C}^T$ encode in their columns a transformation made to the standard basis vectors i.e. they have the form

$$ \mathbf{M} = \begin{bmatrix} \mathbf{p}_{3 \times 1} & \mathbf{q}_{3 \times 1} & \mathbf{r}_{3 \times 1} \end{bmatrix} \quad \text{where $\mathbf{p} = \begin{bmatrix} p_x \\ p_y \\ p_z \end{bmatrix}$, $\mathbf{q} = \begin{bmatrix} q_x \\ q_y \\ q_z \end{bmatrix}$ and $\mathbf{r} = \begin{bmatrix} r_x \\ r_y \\ r_z \end{bmatrix}$} $$

In Dunn & Parberry’s book a column vector inside a matrix is written as

$$ \mathbf{M} = \begin{bmatrix} \cuv{\mathbf{p}} & \cuv{\mathbf{q}} & \cuv{\mathbf{r}} \end{bmatrix} $$

Also note that in this notation the arrow that connects the frames involved in the transformation is reversed, for example the transformation matrix that transforms from object space to upright space is

$$ \mathbf{M}_{upright \leftarrow object} $$

In computer graphics column vectors should be used to represent points, differences between points and the like
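The transpose relationship between both conventions can be checked numerically. This is only a sketch with two matrices instead of three, and the helper names (`transpose`, `row_times_matrix`, `matrix_times_column`) are invented for the example.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

Mat3 transpose(const Mat3 &m) {
  Mat3 t{};
  for (int r = 0; r < 3; ++r)
    for (int c = 0; c < 3; ++c) t[c][r] = m[r][c];
  return t;
}

// Row vector convention: v' = vM
Vec3 row_times_matrix(const Vec3 &v, const Mat3 &m) {
  Vec3 out{};
  for (int r = 0; r < 3; ++r)
    for (int c = 0; c < 3; ++c) out[c] += v[r] * m[r][c];
  return out;
}

// Column vector convention: v' = Mv
Vec3 matrix_times_column(const Mat3 &m, const Vec3 &v) {
  Vec3 out{};
  for (int r = 0; r < 3; ++r)
    for (int c = 0; c < 3; ++c) out[r] += m[r][c] * v[c];
  return out;
}
```

With these helpers, `row_times_matrix(row_times_matrix(v, a), b)` (i.e. $\mathbf{vAB}$) and `matrix_times_column(transpose(b), matrix_times_column(transpose(a), v))` (i.e. $\mathbf{B}^T \mathbf{A}^T \mathbf{v}^T$) should produce the same three numbers.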

Coordinate systems and transformations between them

Thu, 15 Oct 2015 12:00:00 +0000

This article is part 1 in the series about transformation matrices:

World space¹, upright space, object space

why bother having multiple spaces?

Information is given only in the context of a particular reference frame

world space: global reference frame
- the position of other coordinates spaces can be expressed in terms of this space
- this space cannot be expressed in terms of any larger/outer space
- note that there’s no “absolute” space however this space is the largest one we care about
object space: space associated with each object that belongs to the world space
- camera space: object space associated with the viewport used for rendering
upright space: special space associated with each object, it’s halfway between world space and object space in the sense that the axes of this space are parallel to the ones of the world space but the origin of this space is coincident with the origin of the object space

Why do we have an upright space?

Thanks to this space the problem of transforming a point between object space -> world space (and vice-versa) can be divided into two subproblems

object space -> upright space (a rotation)
upright space -> world space (a change of location)

Coordinates of a vector

A coordinate system consists of

an origin (displacement from another coordinate system origin)
a basis (a set of three vectors)

The numeric coordinates of a vector expressed with respect to some basis are the coefficients of the representation of the vector as a linear combination of the basis

$$ \mathbf{v} = v_x \mathbf{i} + v_y \mathbf{j} + v_z \mathbf{k} $$

In other words the numeric coordinates are the quantities that multiply each basis vector which are $v_x$, $v_y$ and $v_z$

When the basis vectors are $\mathbf{i} = [1, 0, 0]$, $\mathbf{j} = [0, 1, 0]$ and $\mathbf{k} = [0, 0, 1]$ then

$$ \begin{align*} \mathbf{v} &= v_x \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} + v_y \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} + v_z \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} \\ &= \begin{bmatrix} v_x & v_y & v_z \end{bmatrix} \end{align*} $$

Transformations between space coordinates

From object space to upright space

Let $\mathbf{v}$ be some vector expressed/measured relative to a space (object space) whose basis vectors are $\mathbf{p}, \mathbf{q}, \mathbf{r}$ (which are themselves expressed/measured relative to a wrapper space), the vector $\mathbf{v}$ expressed relative to the wrapper space is

$$ \begin{align} \mathbf{v}_{upright} &= v_x \mathbf{p} + v_y \mathbf{q} + v_z \mathbf{r} \label{object-upright} \\ &= v_x \begin{bmatrix} p_x & p_y & p_z \end{bmatrix} + v_y \begin{bmatrix} q_x & q_y & q_z \end{bmatrix} + v_z \begin{bmatrix} r_x & r_y & r_z \end{bmatrix} \nonumber \\ &= \begin{bmatrix} v_x p_x + v_y q_x + v_z r_x & v_x p_y + v_y q_y + v_z r_y & v_x p_z + v_y q_z + v_z r_z \end{bmatrix} \nonumber \end{align} $$

Note that if $\mathbf{p}, \mathbf{q}, \mathbf{r}$ were not linearly independent then the coordinates $v_x, v_y, v_z$ of a vector with respect to them couldn’t be uniquely determined

The coordinates of $\mathbf{p}, \mathbf{q}, \mathbf{r}$ are always equal to $[1, 0, 0], [0, 1, 0]$ and $[0, 0, 1]$ respectively when expressed using the coordinate system for which they are the basis, relative to other wrapper coordinate systems they will have arbitrary coordinates

From upright space to world space

Since the axes of the upright space are parallel to the axes of the world space the only difference between these spaces is the translation of these axes with respect to the origin of the axes of the world space, let $\mathbf{o}$ be the translation of the upright basis axes then

$$ \mathbf{v}_{world} = \mathbf{o} + \mathbf{v}_{upright} $$

From world space to upright space

We just have to translate the whole space so that the origin lies exactly on the origin of the upright space, if $\mathbf{o}$ is the origin of the upright space expressed in world space then

$$ \mathbf{v}_{upright} = \mathbf{v}_{world} - \mathbf{o} $$

From upright space to object space

What if $\mathbf{v}_{upright}$ is known and we want to know $\mathbf{v}$? The dot product is the key as it’s used to measure distance in a particular direction, since we know that the basis vectors $\mathbf{p}, \mathbf{q}, \mathbf{r}$ are expressed in terms of the upright space perspective we just have to calculate the projection of $\mathbf{v}_{upright}$ in the direction of each $\mathbf{p}, \mathbf{q}, \mathbf{r}$

$$ \begin{align*} v_x = \mathbf{v}_{upright} \cdot \mathbf{p} \\ v_y = \mathbf{v}_{upright} \cdot \mathbf{q} \\ v_z = \mathbf{v}_{upright} \cdot \mathbf{r} \end{align*} $$

If we use \eqref{object-upright} this works because the dot product with $\mathbf{p}$ will isolate the $v_x$ coordinate

$$ \begin{align*} \mathbf{v}_{upright} \cdot \mathbf{p} &= v_x (\mathbf{p} \cdot \mathbf{p}) + v_y (\mathbf{q} \cdot \mathbf{p}) + v_z (\mathbf{r} \cdot \mathbf{p}) \\ &= v_x (1) + v_y (0) + v_z (0) \\ &= v_x \end{align*} $$

Note: this only works when $\mathbf{p}, \mathbf{q}, \mathbf{r}$ are orthonormal, in the general case we have to solve a linear system (i.e. invert the matrix whose rows are $\mathbf{p}, \mathbf{q}, \mathbf{r}$)
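The four transformations of this section can be chained into a round trip. Here’s a minimal sketch, assuming an orthonormal basis; the `Frame` struct and the function names are hypothetical and only exist for this example.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

struct Frame {
  Vec3 p, q, r;  // orthonormal basis of the object, measured in upright space
  Vec3 o;        // origin of the object, measured in world space
};

double dot(const Vec3 &a, const Vec3 &b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// object -> upright -> world: v_x p + v_y q + v_z r, then add the origin
Vec3 object_to_world(const Frame &f, const Vec3 &v) {
  Vec3 w{};
  for (int i = 0; i < 3; ++i) {
    w[i] = f.o[i] + v[0] * f.p[i] + v[1] * f.q[i] + v[2] * f.r[i];
  }
  return w;
}

// world -> upright -> object: subtract the origin, then project on p, q, r
// (the projection step is only valid because the basis is orthonormal)
Vec3 world_to_object(const Frame &f, const Vec3 &w) {
  Vec3 u = {w[0] - f.o[0], w[1] - f.o[1], w[2] - f.o[2]};
  return {dot(u, f.p), dot(u, f.q), dot(u, f.r)};
}
```

For a frame with $\mathbf{p} = [0, 1, 0]$, $\mathbf{q} = [-1, 0, 0]$, $\mathbf{r} = [0, 0, 1]$ and $\mathbf{o} = [10, 20, 30]$ the object space vector $[1, 2, 3]$ maps to $[8, 21, 33]$ in world space, and `world_to_object` brings it back to $[1, 2, 3]$.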

words like “coordinate system”, “coordinate frame” or “space” are used interchangeably ↩︎

Quaternions

Tue, 08 Sep 2015 20:00:00 +0000

Definition

The existence of complex numbers posed a question for mathematicians: if a complex number exists in a 2D complex plane, could there be a 3D equivalent?

Sir William Rowan Hamilton, among many other mathematicians of the 18th and 19th century, had been searching for the answer. Hamilton conjectured that a 3D complex number could be represented by the triple $a + bi + cj$ where $i$ and $j$ are imaginary quantities that square to $-1$, but when he was developing the algebra for this triple the product of two such numbers raised a problem when expanded

$$ \begin{align*} z_1 &= a_1 + b_1i + c_1j \\ z_2 &= a_2 + b_2i + c_2j \\ z_1z_2 &= (a_1 + b_1i + c_1j)(a_2 + b_2i + c_2j) \\ &= (a_1a_2 - b_1b_2 - c_1c_2) + (a_1b_2 + b_1a_2)i + (a_1c_2 + c_1a_2)j \\ & \quad + b_1c_2ij + c_1b_2ji \end{align*} $$

The quantities $ij$ and $ji$ represented a problem for Hamilton, even if $ij = -ji$ we are still left with $(b_1c_2 - c_1b_2)ij$

On October 16th, 1843, while he was walking with his wife along the Royal Canal in Ireland he saw the solution as a quadruple instead of a triple, instead of using two imaginary terms, three imaginary terms provided the necessary quantities to resolve products like $ij$

Hamilton defined a quaternion $q$ as

$$ \begin{align*} q = s + ai + bj + ck \quad s,a,b,c \in \mathbb{R} \\ i^2 = j^2 = k^2 = ijk = -1 \\ ij = k \quad jk = i \quad ki = j \\ ji = -k \quad kj = -i \quad ik = -j \end{align*} $$

If a complex number $i$ is capable of rotating points on the plane by $\deg{90}$ then perhaps a triple rotates points in space by $\deg{90}$, in the end the triplet was replaced by a quaternion

Notation

There are three ways of annotating a quaternion $q$

$$ \begin{align} q &= s + xi + yj + zk \\ q &= s + \mbold{v} \\ q &= [s, \mbold{v}] \\ & \text{where $s,x,y,z \in \mathbb{R}$, $\mbold{v} \in \mathbb{R}^3$} \nonumber \\ & \text{and $i^2 = j^2 = k^2 = ijk = -1$} \nonumber \end{align} $$

Real quaternion

A real quaternion has a zero vector term

$$ q = [s, \mbold{0}] $$

Pure quaternion

A pure quaternion is a quaternion having a zero scalar term

$$ q = [0, \mbold{v}] $$

Quaternion conjugate

Given

$$ q = [s, \mbold{v}] $$

The quaternion conjugate is defined as

$$ q^* = [s, - \mbold{v}] $$

Quaternion norm

The norm of a quaternion $q = [s, \mbold{v}]$ is defined as the square root of the product of itself and its conjugate (the multiplication operation is defined later)

$$ \begin{align*} \norm{q} &= \sqrt{qq^*} \\ &= \sqrt{s^2 + x^2 + y^2 + z^2} \end{align*} $$

Also note that

$$ \norm{q}^2 = qq^* $$

Norm facts

$\norm{qq^*} = \norm{q}\norm{q^*}$
$\norm{q^*} = \norm{q}$

Unit quaternion

A unit quaternion is a quaternion of norm one, i.e. $\norm{q} = 1$

Note: dividing a non-zero quaternion by its norm produces a unit norm quaternion

Operations

Quaternion Product

Given two quaternions

$$ \begin{align*} q_a = [s_a, \mbold{a}] \quad \quad \mbold{a} = x_a i + y_a j + z_a k \\ q_b = [s_b, \mbold{b}] \quad \quad \mbold{b} = x_b i + y_b j + z_b k \end{align*} $$

The product $q_aq_b$ is computed as follows

$$ \begin{align} q_aq_b &= (s_a + x_a i + y_a j + z_a k)(s_b + x_b i + y_b j + z_b k) \nonumber \\ &= (s_as_b - x_ax_b - y_ay_b - z_az_b) \nonumber \\ & \quad + (s_ax_b + s_bx_a + y_az_b - y_bz_a)i \nonumber \\ & \quad + (s_ay_b + s_by_a + z_ax_b - z_bx_a)j \nonumber \\ & \quad + (s_az_b + s_bz_a + x_ay_b - x_by_a)k \label{quaternion-product} \end{align} $$

Replacing the imaginaries by the ordered pairs (which are themselves quaternion units)

$$ i = [0, \mbold{i}] \quad j = [0, \mbold{j}] \quad k = [0, \mbold{k}] \quad 1 = [1, \mbold{0}] $$

And substituting them in \eqref{quaternion-product}

$$ \begin{align*} q_aq_b &= (s_as_b - x_ax_b - y_ay_b - z_az_b)[1, \mbold{0}] \\ & \quad + (s_ax_b + s_bx_a + y_az_b - y_bz_a)[0, \mbold{i}] \\ & \quad + (s_ay_b + s_by_a + z_ax_b - z_bx_a)[0, \mbold{j}] \\ & \quad + (s_az_b + s_bz_a + x_ay_b - x_by_a)[0, \mbold{k}] \end{align*} $$

Grouping the terms

$$ \begin{align*} q_aq_b &= [s_as_b - x_ax_b - y_ay_b - z_az_b, \\ & \quad s_a(x_b \mbold{i} + y_b \mbold{j} + z_b \mbold{k}) + s_b(x_a \mbold{i} + y_a \mbold{j} + z_a \mbold{k}) \\ & \quad + (y_az_b - y_bz_a) \mbold{i} + (z_ax_b - z_bx_a) \mbold{j} + (x_ay_b - x_by_a) \mbold{k}] \\ &= [s_as_b - \mbold{a} \cdot \mbold{b}, s_a\mbold{b} + s_b\mbold{a} + \mbold{a} \times \mbold{b}] \end{align*} $$

Now let’s compute the product $q_bq_a$

$$ q_bq_a = [s_bs_a - \mbold{b} \cdot \mbold{a}, s_b\mbold{a} + s_a\mbold{b} + \mbold{b} \times \mbold{a}] $$

Note that the scalar quantity of both products is the same however the vector quantity varies (the cross product sign is changed) therefore

$$ q_aq_b \not = q_bq_a $$

This is an important fact to note since for complex numbers the product commutes, however for quaternions it doesn’t
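The compact form $[s_as_b - \mbold{a} \cdot \mbold{b}, s_a\mbold{b} + s_b\mbold{a} + \mbold{a} \times \mbold{b}]$ translates almost literally to code. A minimal sketch follows, where the `Quat` struct and the `mul` function are names chosen for this example.

```cpp
#include <array>

struct Quat {
  double s;                 // scalar part
  std::array<double, 3> v;  // vector part (x, y, z)
};

// q_a q_b = [s_a s_b - a . b, s_a b + s_b a + a x b]
Quat mul(const Quat &a, const Quat &b) {
  const auto &u = a.v;
  const auto &w = b.v;
  double dot = u[0] * w[0] + u[1] * w[1] + u[2] * w[2];
  std::array<double, 3> cross = {u[1] * w[2] - u[2] * w[1],
                                 u[2] * w[0] - u[0] * w[2],
                                 u[0] * w[1] - u[1] * w[0]};
  Quat out{a.s * b.s - dot, {}};
  for (int i = 0; i < 3; ++i) {
    out.v[i] = a.s * w[i] + b.s * u[i] + cross[i];
  }
  return out;
}
```

With $i = [0, (1, 0, 0)]$ and $j = [0, (0, 1, 0)]$, `mul(i, j)` gives $[0, (0, 0, 1)] = k$ while `mul(j, i)` gives $[0, (0, 0, -1)] = -k$, which is the sign flip of the cross product described above.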

Product of a scalar and a quaternion

Let $k$ be a scalar represented as a quaternion as $q_k = [k, \mathbf{0}]$ and $q = [s, \mathbf{v}]$

Their product is

$$ \begin{align*} q_kq &= [k, \mathbf{0}][s, \mathbf{v}] \\ &= [ks, k\mathbf{v}] \end{align*} $$

Note that this product is commutative

Product of a quaternion with itself (square of a quaternion)

$$ \begin{align*} q &= [s, \mbold{v}] \\ q^2 &= [s, \mbold{v}] [s, \mbold{v}] \\ &= [s^2 - \mbold{v} \cdot \mbold{v}, 2s\mbold{v} + \mbold{v} \times \mbold{v}] \\ &= [s^2 - \norm{v}^2, 2s\mbold{v}] \\ &= [s^2 - (x^2 + y^2 + z^2), 2s(x\mbold{i} + y\mbold{j} + z\mbold{k})] \end{align*} $$

Product of a quaternion and its conjugate

Let $q = [s, \mathbf{v}]$

$$ \begin{align*} qq^* &= [s, \mathbf{v}][s, -\mathbf{v}] \\ &= [s^2 + \mathbf{v} \cdot \mathbf{v}, -s \mathbf{v} + s\mathbf{v} - \mathbf{v} \times \mathbf{v}] \\ &= [s^2 + \mathbf{v} \cdot \mathbf{v}, \mathbf{0}] \\ &= s^2 + x^2 + y^2 + z^2 \end{align*} $$

Note that this product commutes i.e. $qq^* = q^*q$

Product of unit quaternions

Given

$$ q_a = [s_a, \mbold{a}] \\ q_b = [s_b, \mbold{b}] $$

Where $\norm{q_a} = \norm{q_b} = 1$, the product is another unit-norm quaternion

$$ q_c = [s_c, \mbold{c}] $$

Where $\norm{q_c} = 1$
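One way to see why, sketched under the assumption that the conjugate of a product reverses the order of the factors, $(q_aq_b)^* = q_b^*q_a^*$ (a standard identity that isn’t derived in these notes), and using that $q_bq_b^* = \norm{q_b}^2$ is a scalar and therefore commutes with $q_a$

$$ \begin{align*} \norm{q_aq_b}^2 &= (q_aq_b)(q_aq_b)^* \\ &= q_aq_bq_b^*q_a^* \\ &= \norm{q_b}^2 q_aq_a^* \\ &= \norm{q_a}^2 \norm{q_b}^2 = 1 \end{align*} $$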

Product of pure quaternions

Let

$$ q_a = [0, \mbold{a}] \\ q_b = [0, \mbold{b}] $$

The product $q_aq_b$ is defined as

$$ \begin{align*} q_aq_b &= [-\mbold{a} \cdot \mbold{b}, \mbold{a} \times \mbold{b}] \end{align*} $$

Note that the resulting quaternion is no longer a pure quaternion as some information has propagated into the real part via the dot product

Product of a pure quaternion with itself (square of a pure quaternion)

$$ \begin{align*} q &= [0, \mbold{v}] \\ q^2 &= [0, \mbold{v}] [0, \mbold{v}] \\ &= [-\mbold{v} \cdot \mbold{v}, \mbold{v} \times \mbold{v}] \\ &= [-(x^2 + y^2 + z^2), \mbold{0}] \\ &= -\norm{v}^2 \end{align*} $$

If $q$ is a unit norm pure quaternion then

$$ q^2 = -1 $$

Product of a pure quaternion with its conjugate

$$ \begin{align*} q^*q = qq^* &= [0, \mathbf{v}][0, -\mathbf{v}] \\ &= [\mathbf{v} \cdot \mathbf{v}, -\mbold{v \times v}] \\ &= [\mathbf{v} \cdot \mathbf{v}, \mbold{0}] \\ &= \norm{v}^2 \end{align*} $$

Inverse of a quaternion

By definition, the inverse $q^{-1}$ of $q$ is

$$ qq^{-1} = [1, \mbold{0}] $$

To isolate $q^{-1}$ let’s pre-multiply both sides by $q^*$

$$ \begin{align*} q^*qq^{-1} &= q^* \\ \norm{q}^2q^{-1} &= q^* \\ q^{-1} &= \frac{q^*}{\norm{q}^2} \end{align*} $$
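A small C++ sketch to check that $qq^{-1} = [1, \mbold{0}]$ with this formula. The `Quat` struct and the helper names are invented for the example, and `mul` is the expanded product from the Quaternion Product section written one component per line.

```cpp
struct Quat {
  double s, x, y, z;  // q = s + xi + yj + zk
};

Quat conjugate(const Quat &q) { return {q.s, -q.x, -q.y, -q.z}; }

// ||q||^2 = q q^* = s^2 + x^2 + y^2 + z^2
double norm_squared(const Quat &q) {
  return q.s * q.s + q.x * q.x + q.y * q.y + q.z * q.z;
}

// the expanded quaternion product
Quat mul(const Quat &a, const Quat &b) {
  return {a.s * b.s - a.x * b.x - a.y * b.y - a.z * b.z,
          a.s * b.x + b.s * a.x + a.y * b.z - b.y * a.z,
          a.s * b.y + b.s * a.y + a.z * b.x - b.z * a.x,
          a.s * b.z + b.s * a.z + a.x * b.y - b.x * a.y};
}

// q^{-1} = q^* / ||q||^2, only defined for a non-zero quaternion
Quat inverse(const Quat &q) {
  double n2 = norm_squared(q);
  Quat c = conjugate(q);
  return {c.s / n2, c.x / n2, c.y / n2, c.z / n2};
}
```

For $q = 1 + i + j + k$ the squared norm is $4$, the inverse is $\tfrac{1}{4}(1 - i - j - k)$ and both `mul(q, inverse(q))` and `mul(inverse(q), q)` come out as $[1, \mbold{0}]$.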

Quaternion units

Given the vector $\mbold{v}$

$$ \mbold{v} = v \hat{\mbold{v}} \quad \text{where $v = |\mbold{v}|$, and $|\hat{\mbold{v}}| = 1$} $$

Combining this with the definition of a pure quaternion

$$ \begin{align*} q &= [0, \mbold{v}] \\ &= [0, v \hat{\mbold{v}}] \\ &= v[0, \hat{\mbold{v}}] \end{align*} $$

It’s convenient to identify the unit quaternion as $\hat{q}$ (where $v = 1$)

$$ \hat{q} = [0, \hat{\mbold{v}}] $$

Let’s check if the quaternion unit $\mbold{i}$ squares to the ordered pair $[-1, \mbold{0}]$

$$ \begin{align*} i^2 &= [0, \mbold{i}][0, \mbold{i}] \\ &= [0 \cdot 0 - \mbold{i} \cdot \mbold{i}, 0 \cdot \mbold{i} + 0 \cdot \mbold{i} + \mbold{i} \times \mbold{i}] \\ &= [-|\mbold{i}|^2, \mbold{0}] \quad \text{$\mbold{i} \times \mbold{i} = \mbold{0}$} \\ & = [-1, \mbold{0}] \end{align*} $$

Misc operations

Taking the scalar part of a quaternion

To isolate the scalar part of $q$ we could add $q^*$ to it

$$ 2 S(q) = q + q^* $$
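Spelled out with the ordered pair notation, and assuming that quaternions are added component-wise (addition isn’t defined explicitly in these notes)

$$ \begin{align*} q + q^* &= [s, \mbold{v}] + [s, -\mbold{v}] \\ &= [2s, \mbold{0}] \\ &= 2s \end{align*} $$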

Complex numbers

Tue, 08 Sep 2015 13:30:00 +0000

Imaginary numbers

Invented to solve problems where an equation has no real roots e.g. $x^2 + 16 = 0$, the idea of declaring the existence of a quantity $i$ such that $i^2 = -1$ allows us to express the solution as

$$ x = \sqrt{-16} = \sqrt{16i^2} = \pm4i $$

The set represented by $\mathbb{I}$ defines an imaginary number as

$$ i^2 = -1 $$

Powers of i

If $i^2 = -1$ then $i^4 = i^2i^2 = -1 * -1 = 1$

Therefore we have the sequence

$$ \begin{array}{cccccc} \hline i & i^2 & i^3 & i^4 & i^5 & \ldots \\ \hline i & -1 & -i & 1 & i & \ldots \\ \hline \end{array} $$

Complex numbers

A complex number is just the sum of a real and an imaginary number

$$ z = a + bi \quad a,b \in \mathbb{R}, \quad i^2 = -1 $$

Operations on complex numbers

Given two complex numbers

$$ z_1 = a_1 + b_1i \\ z_2 = a_2 + b_2i $$

Addition and subtraction

$$ z_1 \pm z_2 = a_1 \pm a_2 + (b_1 \pm b_2)i $$

Product

$$ \begin{align*} z_1z_2 &= a_1a_2 + a_1b_2i + a_2b_1i + b_1b_2i^2 \quad \text{given that $i^2 = -1$} \\ &= (a_1a_2 - b_1b_2) + (a_1b_2 + b_1a_2)i \end{align*} $$

Given the complex number

$$ z = a + bi $$

Norm (modulus or absolute value)

$$ |z| = \sqrt{a^2 + b^2} $$

Complex conjugate

The product of two complex numbers where the only difference between them is the sign of the imaginary part is

$$ (a + bi)(a - bi) = a^2 - abi + abi - b^2i^2 = a^2 + b^2 $$

This quantity $a - bi$ is called the complex conjugate of $z$ (denoted as $z^*$), it implies that

$$ zz^* = |z|^2 $$

Inverse

$$ z^{-1} = \frac{1}{z} $$

Multiplying the numerator and denominator with the conjugate of $z$ (so that we have a real part on the denominator)

$$ z^{-1} = \frac{1}{z} \frac{z^*}{z^*} = \frac{z^*}{zz^*} = \frac{z^*}{|z|^2} $$
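The same formula can be written with `std::complex` from the C++ standard library, where `std::conj` returns $z^*$ and `std::norm` returns $|z|^2$ (the squared modulus, not the modulus). The alias `Complex` and the function name `inverse` are choices made for this sketch.

```cpp
#include <complex>

using Complex = std::complex<double>;

// z^{-1} = z^* / |z|^2, written out instead of computing 1.0 / z
// NOTE: only meaningful for z != 0
Complex inverse(const Complex &z) {
  return std::conj(z) / std::norm(z);
}
```

For $z = 1 + i$ we have $z^* = 1 - i$ and $|z|^2 = 2$, so $z^{-1} = \tfrac{1}{2} - \tfrac{1}{2}i$ and multiplying it by $z$ gives $1$.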

Square root of $i$

We’re trying to find a complex number $z$ such that

$$ \sqrt{i} = z \\ i = z^2 $$

Assuming that $z$ is the complex number $z = a + bi$

$$ \begin{align} i &= (a + bi)^2 \nonumber \\ &= (a + bi)(a + bi) \nonumber \\ &= a^2 - b^2 + 2abi \label{square-imaginary} \end{align} $$

Therefore

$$ (a^2 - b^2) + (2ab)i = 0 + 1i $$

Equating real and imaginary parts

$$ \begin{align*} a^2 - b^2 &= 0 \\ 2ab &= 1 \end{align*} $$

Therefore $a = \pm b$, replacing $a = -b$ in the second equation we obtain $-2b^2 = 1$ which is not satisfied by any real number $b$ therefore the case $a = -b$ is impossible, replacing $a = b$ in the second equation we obtain $2a^2 = 1$ so

$$ 2a^2 = 1 \\ a^2 = \frac{1}{2} \\ a = b = \pm \sqrt{\frac{1}{2}} = \pm \frac{1}{\sqrt{2}} $$

Finally the value of $\sqrt{i}$ is

$$ \sqrt{i} = (a + bi) = \pm{\frac{1}{\sqrt{2}}} (1 + i) $$

The value of $\sqrt{-i}$ is found in the same way (by replacing $b = -a$ in the equation $-2ab = 1$ found from multiplying \eqref{square-imaginary} by $-1$)

$$ \sqrt{-i} = (a + bi) = \pm{\frac{1}{\sqrt{2}}} (1 - i) $$
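Both results are easy to verify numerically by squaring them. A short sketch with `std::complex`, where the function names are invented for this example and only the positive roots are returned (the negative ones square to the same value):

```cpp
#include <cmath>
#include <complex>

using Complex = std::complex<double>;

// One of the two square roots of i: (1 + i) / sqrt(2)
Complex sqrt_of_i() { return Complex(1.0, 1.0) / std::sqrt(2.0); }

// One of the two square roots of -i: (1 - i) / sqrt(2)
Complex sqrt_of_minus_i() { return Complex(1.0, -1.0) / std::sqrt(2.0); }
```

Squaring `sqrt_of_i()` should give $i$ and squaring `sqrt_of_minus_i()` should give $-i$, up to floating point rounding. `std::sqrt(Complex(0.0, 1.0))` returns the principal root, which is the same value as `sqrt_of_i()`.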

Matrix representation of a complex number

The matrix $C$ for a complex number is the sum of two other matrices representing the real $R$ and imaginary $I$ parts:

$$ C = R + I $$

which can be written as

$$ C = a \hat{R} + b \hat{I} \quad\quad a, b \in \mathbb{R} $$

Where $R = 1$ and $I = i$

The matrix representation of $R = 1$ in 2d is the identity matrix

$$ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} $$

To find the matrix representation of $i$ we have to analyze the definition of $i$ which is a quantity which squares to $-1$, given that we already know the value of $1$ in matrix form

$$ \begin{align*} i^2 &= -1 * \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \\ &= \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} \end{align*} $$

The following matrix squares to the matrix above, so it can be used as the value of $i$ expressed in matrix form

$$ I = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} $$

Finally the value of $C$ is

$$ C = a \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + b \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} $$
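A quick way to gain confidence in this representation is to check that the matrix product behaves like the complex product. A minimal sketch, where `Mat2`, `as_matrix` and `mul` are names invented for this example:

```cpp
#include <array>

using Mat2 = std::array<std::array<double, 2>, 2>;

// C = a * [[1, 0], [0, 1]] + b * [[0, -1], [1, 0]]
Mat2 as_matrix(double a, double b) { return {{{a, -b}, {b, a}}}; }

// plain 2x2 matrix product
Mat2 mul(const Mat2 &m, const Mat2 &n) {
  Mat2 out{};
  for (int r = 0; r < 2; ++r)
    for (int c = 0; c < 2; ++c)
      for (int k = 0; k < 2; ++k) out[r][c] += m[r][k] * n[k][c];
  return out;
}
```

`mul(as_matrix(0, 1), as_matrix(0, 1))` equals `as_matrix(-1, 0)`, which is $i^2 = -1$, and `mul(as_matrix(2, 1), as_matrix(0, 1))` equals `as_matrix(-1, 2)`, which matches $(2 + i)i = -1 + 2i$.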

The complex plane

The powers of $i$ give rise to the sequence $(1, i, -1, -i, 1, \ldots)$ which is quite similar to the pattern $(x, y, -x, -y, x, \ldots)$, the resemblance is no coincidence as complex numbers belong to a 2-dimensional plane, this complex plane allows us to visualize complex numbers using the horizontal axis for the real part and the vertical axis for the imaginary part

$1, i, -1, -i$

We can see that the positions of $i^0 = 1, i^1 = i, i^2 = -1, i^3 = -i, \ldots$ suggest that the multiplication of a complex number by $i$ is equivalent to rotating through 90 degrees

e.g.

$$ \begin{align*} z_1 &= 2 + i \\ z_2 &= (2 + i)(i) = -1 + 2i \\ z_3 &= (-1 + 2i)(i) = -2 - i \\ z_4 &= (-2 - i)(i) = 1 - 2i \\ z_5 &= (1 - 2i)(i) = 2 + i = z_1 \end{align*} $$

A complex number is rotated $\pm 90^{\circ}$ by multiplying it by $\pm i$

Let’s graph the roots of $\sqrt{i} = \pm \frac{1}{\sqrt{2}} (1 + i)$

We can see that $\tfrac{1}{\sqrt{2}} (1 + i)$ is exactly at $45^{\circ}$ and $- \tfrac{1}{\sqrt{2}} (1 + i)$ is exactly at $225^{\circ}$

Let’s multiply the complex number $2 + i$ by $\sqrt{i}$ (it should rotate it by $45^{\circ}$)

$$ \begin{align*} z_1 &= 2 + i \\ z_2 &= (2 + i)(\sqrti + \sqrti i) = \sqrti + 3 \sqrti i \end{align*} $$

Multiplying $z_2$ by $\sqrt{i}$ again should be equal to multiplying $z_1$ by $i$ (because $z_2$ is already rotated by $45^{\circ}$)

$$ \begin{align*} z_2 &= \sqrti + 3 \sqrti i \\ z_3 &= (\sqrti + 3 \sqrti i)(\sqrti + \sqrti i) \\ &= (\frac{1}{2} - \frac{3}{2}) + (\frac{1}{2} + \frac{3}{2})i \\ &= -1 + 2i \end{align*} $$

Which is exactly what we find if we multiply $z_1$ by $i$, these observations suggest that we can build a complex number which can rotate another complex number by any angle

A complex number is rotated $45^{\circ}$ by multiplying it by $\sqrti + \sqrti i$

A complex number is rotated $225^{\circ}$ by multiplying it by $-\sqrti - \sqrti i$

Polar representation

Instead of using coordinates in the complex plane we can represent a complex number with the length of the vector from the origin to the complex coordinate and the angle between that vector and the positive real axis

$$ r = |z| = \sqrt{a^2 + b^2} \\ \theta = \arctan \left( \frac{b}{a} \right) $$

The horizontal component of $z$ is then $r \cos \theta$ and the vertical component is $r \sin \theta$, expressing the complex number using these quantities

$$ \begin{align*} z &= a + bi \\ &= r * \cos \theta + ri\; \sin \theta \\ &= r \; (\cos \theta + i \sin \theta) \end{align*} $$

Euler provided the identity

$$ \begin{equation}\label{rotor} e^{i\theta} = \cos \theta + i \sin \theta \end{equation} $$

Which allows us to represent any complex number as

$$ z = r\,e^{i\theta} $$

Given two complex numbers in polar form

$$ z = r\,e^{i\theta} \\ w = s\,e^{i\phi} \\ $$

Their product is

$$ zw = rs\, e^{i(\theta + \phi)} = rs [ \cos (\theta + \phi) + i \sin (\theta + \phi)] $$

Which effectively rotated the complex number $z$ by an angle $\phi$! However the quantity $zw$ was also scaled by a factor of $s$, to avoid scaling we can normalize $w$ (i.e. making $s = 1$, which gives it the form \eqref{rotor})

A rotor is a complex number that rotates another complex number by an angle $\theta$ (through multiplication) and has the form

$$ e^{i\theta} = \cos \theta + i \sin \theta $$

Rotating a complex number $x + yi$ by an angle $\theta$

$$ \begin{align*} x' + y'i &= (x + yi)(\cos \theta + i \sin \theta) \\ &= (x \cos \theta - y \sin \theta) + (x \sin \theta + y \cos \theta)i \end{align*} $$

Which in matrix form is

$$ \begin{bmatrix} x' & -y' \\ y' & x' \end{bmatrix} = \begin{bmatrix} x & -y \\ y & x \end{bmatrix} \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} $$

Note that because of the way the complex product is defined, the multiplication between two complex numbers commutes

$$ \begin{align*} x' + y'i &= (\cos \theta + i \sin \theta)(x + yi)\\ &= (x \cos \theta - y \sin \theta) + (x \sin \theta + y \cos \theta)i \end{align*} $$
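In C++ a rotor can be built with `std::polar(1.0, theta)` from `<complex>`, which returns the complex number with modulus $1$ and argument $\theta$ (in radians). The sketch below uses it to rotate a complex number; the function name `rotate` is made up for this example.

```cpp
#include <cmath>
#include <complex>

using Complex = std::complex<double>;

// Multiplies z by the rotor e^{i theta} = cos(theta) + i sin(theta)
Complex rotate(const Complex &z, double theta_radians) {
  return z * std::polar(1.0, theta_radians);
}
```

Rotating $2 + i$ by $90^{\circ}$ (i.e. $\pi / 2$ radians) should give $-1 + 2i$ up to floating point rounding, the same value found earlier by multiplying by $i$, and rotating twice by $45^{\circ}$ should land on the same point.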

Hamiltonian Graphs

Tue, 07 Jul 2015 19:30:51 +0000

A cycle that contains every vertex of a graph $G$ is called a Hamiltonian cycle, a Hamiltonian cycle is a spanning cycle of $G$, a Hamiltonian graph is a graph that contains a Hamiltonian cycle

A path in a graph that contains every vertex of $G$ is called a Hamiltonian path in $G$, if a graph contains a Hamiltonian cycle then it also contains a Hamiltonian path since removing any edge from a Hamiltonian cycle produces a Hamiltonian path

$$ C = (v_0, v_1, v_3, v_8, v_{12}, v_{13}, v_9, v_4, v_5, v_6, v_{10}, v_{14}, v_{11}, v_7, v_2, v_0) $$

every complete graph $K_n$ with $n \geq 3$ is a Hamiltonian graph
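Finding a Hamiltonian cycle is hard in general, but checking a candidate is straightforward. Here’s a small sketch of such a check, assuming the graph is given as an adjacency matrix and the cycle repeats its first vertex at the end like $C$ above; the function name is invented for this example.

```cpp
#include <vector>

// adj[u][v] is true when the edge uv exists, `cycle` lists the vertices
// in order and repeats the first vertex at the end
bool is_hamiltonian_cycle(const std::vector<std::vector<bool>> &adj,
                          const std::vector<int> &cycle) {
  int n = adj.size();
  if ((int) cycle.size() != n + 1 || cycle.front() != cycle.back()) {
    return false;
  }
  std::vector<bool> seen(n, false);
  for (int i = 0; i < n; ++i) {
    int u = cycle[i], v = cycle[i + 1];
    if (u < 0 || u >= n || v < 0 || v >= n) return false;
    // every vertex must appear exactly once and consecutive vertices
    // must be joined by an edge
    if (seen[u] || !adj[u][v]) return false;
    seen[u] = true;
  }
  return true;
}
```

In the complete graph $K_4$ any ordering of the vertices works, e.g. $(v_0, v_1, v_2, v_3, v_0)$ passes the check, while a sequence that skips or repeats a vertex doesn’t.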

Eulerian Graph and Eulerian Trails

Sun, 05 Jul 2015 15:22:15 +0000

A circuit $C$ in a graph $G$ is called an Eulerian circuit if $C$ contains every edge of $G$ (remember that a circuit is a closed trail, i.e. a walk in which no edge is traversed more than once and that begins and ends at the same vertex)

every edge of $G$ appears only once in the circuit
only graphs with one component can contain such a circuit

A connected graph $G$ that contains an Eulerian circuit $C$ is called an Eulerian Graph

$$ C = (v_0,v_1,v_2,v_3,v_1,v_6,v_3,v_4,v_5,v_6,v_7,v_5,v_8,v_7,v_{10},v_8,v_9,v_{10},v_0) $$

An Eulerian trail is an open trail $T$ that contains all the edges of $G$ (but doesn’t end in the same start vertex)

$$ T = (v_0,v_1,v_2,v_4,v_3,v_1,v_4,v_5) $$

Königsberg Bridge Problem

The city of Königsberg, located in Prussia, was separated by a river into 4 land areas, to travel between these areas 7 bridges were built, some citizens wondered whether it was possible to go for a walk in Königsberg and pass over each bridge exactly once

The land areas and the bridges built in the city of Königsberg modeled as a graph $M$

In graph theory terms the problem can be stated as follows

Does the multigraph $M$ of order $n = 4$ and size $m = 7$ contain an Eulerian circuit or an Eulerian trail?

Suppose that such a journey is possible then it must begin at some land area and end at some land area (possibly the same one), certainly each land area must appear in the trail, note that at least two vertices of $M$ are neither the initial nor the terminal vertex of the trail, let’s say that we start at land $A$ and end at land $A$

$$ T = (A, L_1, L_2, L_3, L_4, L_5, L_6, A) $$

Every land area other than the first and the last is entered and exited each time it appears in the trail, which implies that every such land area must have an even degree for a trail to exist

Going back to the Königsberg bridge problem we can see that it’s impossible to find such a trail because all four vertices have an odd degree (5, 3, 3 and 3)

The vertex sequence of an Eulerian circuit/trail of a graph $G$ has $m + 1$ entries where $m$ is the size of $G$ (each of the $m$ edges is traversed exactly once)

For undirected graphs

A connected graph $G$ is an Eulerian graph if and only if every vertex of $G$ has even degree
A connected graph $G$ contains an Eulerian trail if and only if exactly 2 vertices of $G$ have odd degree, also each Eulerian trail of $G$ begins at one of these vertices and ends at the other
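The degree conditions for undirected graphs are cheap to test in code. The sketch below only counts odd degree vertices (it doesn’t check connectivity) and the function name is invented for this example. To try it on the Königsberg multigraph the land areas are numbered $0$ to $3$, with $0$ being the land area touched by 5 bridges; this numbering is an assumption made for the example.

```cpp
#include <utility>
#include <vector>

// Counts the odd degree vertices of an undirected multigraph of order n
// whose edges are given as pairs of vertex indexes
int count_odd_degree_vertices(int n,
                              const std::vector<std::pair<int, int>> &edges) {
  std::vector<int> deg(n, 0);
  for (const auto &e : edges) {
    deg[e.first] += 1;
    deg[e.second] += 1;
  }
  int odd = 0;
  for (int d : deg) {
    odd += d % 2;
  }
  return odd;
}
```

For the 7 bridges `{0,1}, {0,1}, {0,2}, {0,2}, {0,3}, {1,3}, {2,3}` the function returns 4, which is neither 0 (Eulerian circuit) nor 2 (Eulerian trail), so the walk the citizens wondered about doesn’t exist.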

For directed graphs

A graph $G$ is an Eulerian graph if and only if every vertex of $G$ has the same incoming degree and outgoing degree and $G$ is strongly connected
A graph $G$ contains an Eulerian trail if and only if the incoming degree equals the outgoing degree at every vertex except for 2 vertices, where the difference between the incoming and outgoing degrees is $-1$ (start) and $+1$ (end), and adding an edge from the end vertex to the start vertex would make the graph strongly connected

Hierholzer’s algorithm

Let $C$ be a circuit in an Eulerian graph $G$, removing $E(C)$ from $G$ leaves a subgraph in which every vertex still has even degree, so every component of that subgraph that still has edges is again Eulerian

1. identify a circuit $C$ in $G$, mark the edges of $C$
2. if $C$ contains all the edges of $G$ then stop
3. otherwise let $v_i$ be a node on $C$ that is incident with an unmarked edge $e_i$
4. build a circuit $D$ starting at node $v_i$ and using edge $e_i$, mark the edges of $D$
5. join the circuit $D$ to $C$ by inserting the edges of $D$ into $C$ at position $v_i$, move to step 2

Implementation notes

In the implementation a source vertex $u$ is chosen arbitrarily or, if the graph has odd degree vertices, to be one of the two odd degree vertices. An edge $uv$ is marked as visited and we move to the vertex $v$, next an edge $vw$ is marked as visited and so on, eventually we will get to a vertex $z$ that doesn’t have unvisited edges, this means that there’s a circuit starting at vertex $z$ and ending at vertex $z$. There might be a vertex $y$ in the circuit $z-z$ that still has unvisited edges, if one is found we know that there’s another circuit $y-y$, and both circuits $z-z$ and $y-y$ might have nested circuits themselves. When the $y-y$ circuit doesn’t have a vertex with unvisited edges the result is appended to the main circuit $z-z$ i.e. $u-v-\ldots-z-y-y-z$

#include <utility>
#include <vector>
using namespace std;

// each edge is saved by id, helper to avoid the traversal
// of an edge many times
vector<bool> edge_used;
// the number of edges used in the adjacency list of the vertex `i`
vector<int> edge_pointer;
// the eulerian trail
vector<int> trail;
// the adjacency list representation of `g`, each element `g_{i,j}` is
// a tuple (to, id) which denotes an edge `(i, to)` with id `id`
vector<vector<pair<int, int> > > g;

void dfs(int v) {
  for (; edge_pointer[v] < (int) g[v].size(); edge_pointer[v] += 1) {
    pair<int, int> &edge = g[v][edge_pointer[v]];
    if (edge_used[edge.second]) {
      // if the edge was already used analyze the next one
      continue;
    }
    // mark the edge
    edge_used[edge.second] = true;
    dfs(edge.first);
  }
  trail.push_back(v);
}

/**
 * Computes an euler trail if possible in an undirected graph `G`
 * whose `edges` are given as an input
 *
 * NOTE: The trail if it exists is saved on the global `trail`
 *
 * @param {int} n The order of the graph
 * @param {vector<pair<int, int> >} edges A collection of tuples
 * denoting the indexes of the vertices the edge `i` is incident to
 * @return {bool} True if the graph has an euler trail
 */
bool euler_trail_undirected(int n, vector<pair<int, int> > &edges) {
  int m = edges.size();
  g.assign(n, vector<pair<int, int> > ());
  edge_pointer.assign(n, 0);
  edge_used.assign(m, 0);
  // discard the trail computed by a previous call
  trail.clear();
  vector<int> deg(n, 0);

  // build the adjacency list of the graph
  for (int i = 0; i < m; i += 1) {
    int u = edges[i].first;
    int v = edges[i].second;
    g[u].push_back({ v, i });
    g[v].push_back({ u, i });
    deg[u] += 1;
    deg[v] += 1;
  }

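  // start at an odd vertex if there's one, otherwise at any
  // vertex that is incident with an edge
  int start = 0;
  int odd_degree_count = 0;
  for (int i = 0; i < n; i += 1) {
    if (deg[i] % 2 != 0) {
      ++odd_degree_count;
      start = i;
    } else if (deg[start] == 0) {
      start = i;
    }
  }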

  if (odd_degree_count == 2 || odd_degree_count == 0) {
    dfs(start);
    return trail.size() == m + 1;
  }
  return false;
}


/**
 * Computes an euler trail if possible in a directed graph `G`
 * whose `edges` are given as an input
 *
 * NOTE: The trail if it exists is saved on the global `trail` in
 * reverse order (the last vertex of the trail is stored first)
 *
 * @param {int} n The order of the graph
 * @param {vector<pair<int, int> >} edges A collection of tuples
 * denoting the directed edge `i` from `first` to `second`
 * @return {bool} True if the graph has an euler trail
 */
bool euler_trail_directed(int n, vector<pair<int, int> > &edges) {
  int m = edges.size();
  g.assign(n, vector<pair<int, int> > ());
  edge_pointer.assign(n, 0);
  edge_used.assign(m, 0);
  // discard the trail computed by a previous call
  trail.clear();
  vector<int> in_deg(n, 0), out_deg(n, 0);

  // build the adjacency list of the graph
  for (int i = 0; i < m; i += 1) {
    int u = edges[i].first;
    int v = edges[i].second;
    g[u].push_back({ v, i });
    out_deg[u] += 1;
    in_deg[v] += 1;
  }

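  // find the unbalanced vertices (in degree != out degree), the trail
  // must start at the one with an additional outgoing edge
  int start = 0;
  int odd_degree_count = 0;
  for (int i = 0; i < n; i += 1) {
    if (in_deg[i] - out_deg[i] != 0) {
      if (in_deg[i] - out_deg[i] > 1 || out_deg[i] - in_deg[i] > 1) {
        // an unbalanced vertex can only be off by one
        return false;
      }
      ++odd_degree_count;
      if (out_deg[i] > in_deg[i]) {
        start = i;
      }
    }
  }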

  if (odd_degree_count == 2 || odd_degree_count == 0) {
    dfs(start);
    return trail.size() == m + 1;
  }
  return false;
}
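
The recursive `dfs` above can recurse as deep as the number of edges, which might overflow the call stack on large graphs. What follows is a minimal sketch of an iterative variant for undirected graphs that keeps the current path on an explicit stack. The name `euler_trail_iterative` and its signature are hypothetical, and it assumes the caller already checked the degree conditions and chose a valid `start`.

```cpp
#include <stack>
#include <utility>
#include <vector>
using namespace std;

// Hypothetical helper: returns the vertices visited by a trail that starts
// at `start`, the result is an euler trail only if its size is
// `edges.size() + 1`
vector<int> euler_trail_iterative(int n, const vector<pair<int, int> > &edges,
                                  int start) {
  int m = edges.size();
  // adjacency list where each entry is (to, edge id)
  vector<vector<pair<int, int> > > adj(n);
  for (int i = 0; i < m; i += 1) {
    adj[edges[i].first].push_back({ edges[i].second, i });
    adj[edges[i].second].push_back({ edges[i].first, i });
  }

  vector<bool> used(m, false);
  vector<size_t> pointer(n, 0);
  vector<int> trail;
  stack<int> path;
  path.push(start);

  while (!path.empty()) {
    int v = path.top();
    // skip the edges that were already traversed
    while (pointer[v] < adj[v].size() && used[adj[v][pointer[v]].second]) {
      pointer[v] += 1;
    }
    if (pointer[v] == adj[v].size()) {
      // `v` has no unused edges left, its position in the trail is final
      trail.push_back(v);
      path.pop();
    } else {
      pair<int, int> edge = adj[v][pointer[v]];
      used[edge.second] = true;
      path.push(edge.first);
    }
  }
  return trail;
}
```

As in the recursive version, a vertex is appended to the trail only after all of its edges were used, so nested circuits end up spliced in the right position.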

Single Source Shortest Path (SSSP) in a graph

Fri, 03 Jul 2015 13:21:32 +0000

Unweighted graph

A shortest path from vertex $u$ to vertex $v$ is a path $p = (x_1, x_2, \ldots, x_k)$ with $x_1 = u$ and $x_k = v$ such that $k$ is minimum among all the $u-v$ paths

In an unweighted graph, breadth first search guarantees that the first time we reach a vertex $v$ we did so through a shortest path from the source $s$, more searching will never find an $s-v$ path with fewer edges

let $d(v)$ be the shortest distance from $s$ to a vertex $v$, initially $d(v) = \infty$ for $v \not= s$ and $d(s) = 0$
whenever a vertex $v$ where $d(v) = \infty$ is reached from some other vertex $u$ whose $d(u)$ was already computed then $d(v) = d(u) + 1$

/**
 * Breadth first search algorithm applied on an unweighted graph `G`
 * of order `n` and size `m` to find the shortest path from a source
 * vertex `s`
 *
 * Time complexity: O(n + m)
 * Space complexity: O(n)
 *
 * @param {vector<vector<int> >} g The adjacency list representation
 *  of `G`, each entry `g_{ij}` holds the end `v` of the edge `iv`
 * @param {int} s The source vertex
 * @return {vector<int>} The shortest distance from `s` to all the other vertices
 */
vector<int> bfs(vector<vector<int> > &g, int s) {
  int n = g.size();
  int INF = 1e9;

  // the vertex predecessor of `i` in the `s-i` path
  vector<int> parent(n, -1);
  // holds the shortest distance from `s` to vertex `i`
  vector<int> d(n, INF);

  // the distance from the source vertex is zero
  d[s] = 0;

  // vertices that were reached but whose neighbors weren't explored yet
  queue<int> q;
  q.push(s);

  while (!q.empty()) {
    int v = q.front();
    q.pop();

    for (int i = 0; i < g[v].size(); i += 1) {
      int to = g[v][i];
      if (d[to] == INF) {
        d[to] = d[v] + 1;
        parent[to] = v;
        q.push(to);
      }
    }
  }

  return d;
}
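
The `bfs` function above fills `parent` but only returns the distances. The following is a minimal sketch of how the actual shortest path could be recovered from the predecessors. The names `bfs_parents` and `build_path` are hypothetical, they are not part of the original implementation.

```cpp
#include <algorithm>
#include <queue>
#include <vector>
using namespace std;

// Hypothetical variant of the search above that returns the predecessor
// of every vertex in the bfs tree rooted at `s` (-1 when there's none)
vector<int> bfs_parents(const vector<vector<int> > &g, int s) {
  int n = g.size();
  vector<int> parent(n, -1);
  vector<bool> seen(n, false);
  queue<int> q;
  seen[s] = true;
  q.push(s);
  while (!q.empty()) {
    int v = q.front();
    q.pop();
    for (int to : g[v]) {
      if (!seen[to]) {
        seen[to] = true;
        parent[to] = v;
        q.push(to);
      }
    }
  }
  return parent;
}

// Walks the predecessors from `t` back to the root of the bfs tree,
// returns an empty vector when `t` is not reachable from `s`
vector<int> build_path(const vector<int> &parent, int s, int t) {
  vector<int> path;
  for (int v = t; v != -1; v = parent[v]) {
    path.push_back(v);
  }
  reverse(path.begin(), path.end());
  if (path.front() != s) {
    path.clear();
  }
  return path;
}
```

The same walk over `parent` works for the weighted algorithms below since they store the predecessors in the same way.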

Weighted graph

Dijkstra’s algorithm

Dijkstra described an algorithm to solve the SSSP problem in a graph with non-negative edge weights, there are some additional states that need to be stored per vertex:

let $d(v)$ be an estimate of the shortest distance from $s$ to a vertex $v$, initially $d(v) = \infty$ for $v \not= s$ and $d(s) = 0$
let $visited(v)$ be the visited state of a given vertex, initially $visited(v) = false$

The algorithm consists of a series of iterations, on each iteration let $u$ be the vertex with the minimum estimated distance to $s$ that wasn’t visited yet, a process called relaxation is then performed with $u$

the visited state is set to true, i.e. $visited(u) = true$
let $uv$ be an edge to an unvisited node $v$ with weight $w(uv)$, we might improve the best estimate of the shortest path between $s$ and $v$ by including $uv$ in the path so

$$ d(v) = min(d(v), d(u) + w(uv)) $$

After $n$ iterations all the vertices will be marked as visited and $d(v)$ will hold the length of the shortest path from $s$ to every vertex $v$

We need a data structure that supports the following 3 operations quickly:

remove a vertex with the minimum distance that wasn’t discovered yet (up to once for each vertex in the graph)
add a new vertex (up to once for each vertex in the graph)
update the estimated distance of an existing vertex (once for each edge in the graph)

Implementation with an array

An array supports the operations above in $O(V)$, $O(1)$, and $O(1)$ respectively leading to an overall $O(V^2 + E)$ which is optimal for dense graphs (when $E \approx V^2$)

/**
 * An implementation of Dijkstra's algorithm which computes
 * the shortest path from a source vertex `s` to all the other vertices
 * in a graph `G` with `V` vertices and `E` edges.
 *
 * Time complexity: O(V^2 + E)
 * Space complexity: O(V)
 *
 * @param {vector<vector<pair<int, int> > >} g The adjacency list representation
 *  of `G`, each entry `g_{ij}` holds the end `v` of the edge `iv` and the weight
 *  `weight` of the edge i.e. (v, weight)
 * @param {int} s The source vertex
 * @return {vector<int>} The shortest path from `s` to all the other vertices
 */
vector<int> dijkstra(vector<vector<pair<int, int> > > &g, int s) {
  int V = g.size();
  int INF = 1e9;

  vector<bool> visited(V);
  // the vertex predecessor of `i` in the `s-i` path
  vector<int> parent(V, -1);
  // holds the estimated distance
  vector<int> d(V, INF);

  // the estimated distance from the source vertex is zero
  d[s] = 0;

  for (int i = 0; i < V; i += 1) {
    // the vertex with the minimum estimated distance
    int v = -1;
    for (int j = 0; j < V; j += 1) {
      // find the vertices which haven't been visited yet
      // among them find a vertex with the minimum estimated distance
      if (!visited[j] && (v == -1 || d[j] < d[v])) {
        v = j;
      }
    }

    if (d[v] == INF) {
      // the vertex selected is not reachable from `s`
      break;
    }

    visited[v] = true;

    // update the estimated distance from `v`
    // to all the other adjacent vertices
    for (int j = 0; j < g[v].size(); j += 1) {
      pair<int, int> &edge = g[v][j];
      int next = edge.first;
      int weight = edge.second;
      int new_distance = d[v] + weight;

      if (new_distance < d[next]) {
        d[next] = new_distance;
        parent[next] = v;
      }
    }
  }

  return d;
}

Implementation with a BST

A balanced search tree supports the operations above in $O(\log V)$, $O(\log V)$, and $O(\log V)$ respectively leading to an overall $O((E + V) \log V)$ time complexity which is optimal for sparse graphs (when $E \approx V$)

/**
 * C++11
 *
 * An implementation of Dijkstra's algorithm which computes
 * the shortest path from a source vertex `s` to all the other vertices
 * in a graph `G` of order `V` and size `E`
 *
 * Time complexity: O((E+V) log V)
 * Space complexity: O(V)
 *
 * @param {vector<vector<pair<int, int>>>} g The adjacency list representation
 *  of `G`, each entry `g_{ij}` holds a pair which represents an edge
 * (vertex, weight) which tells that there's an edge from `i` to `vertex`
 * with weight `weight`
 * @param {int} s The source vertex
 * @return {vector<int>} The shortest path from `s` to all the other vertices
 */
vector<int> dijkstra(vector<vector<pair<int, int>>> &g, int s) {
  int V = g.size();
  int INF = 1e9;

  // the vertex predecessor of `i` in the `s-i` path
  vector<int> parent(V, -1);
  // holds the estimated distance
  vector<int> d(V, INF);

  // the estimated distance from the source vertex is zero
  d[s] = 0;

  // (estimated distance, vertex) ordered by estimated distance
  set<pair<int, int>> q;
  q.insert({0, s});

  while (!q.empty()) {
    // the unvisited vertex with the minimum estimated distance
    int from = q.begin()->second;
    q.erase(q.begin());

    for (int i = 0; i < g[from].size(); i += 1) {
      int to, weight;

      // note that in the graph the first element is the neighbor vertex
      // but in the set the first element is the estimated distance
      tie(to, weight) = g[from][i];

      if (d[from] + weight < d[to]) {
        // replace the entry of `to` in the set with the improved estimate
        q.erase({ d[to], to });
        d[to] = d[from] + weight;
        parent[to] = from;
        q.insert({ d[to], to });
      }
    }
  }

  return d;
}

Applications

Find the shortest path between two vertices $u$ and $v$
Find the shortest path from all the vertices to a given vertex $v$ by reversing the direction of each edge in the graph
Find the shortest path for every pair of vertices $u$ and $v$ by running the algorithm once per vertex
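
The second application can be sketched as follows: reverse every edge and run the algorithm from the destination, the distance found for a vertex `u` in the reversed graph is the distance from `u` to the destination in the original graph. The names `reverse_graph`, `shortest_distances` and `distances_to` are hypothetical, and the sketch swaps the `set` used above for a `priority_queue` that skips stale entries, which is a different but common way to implement the same algorithm.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>
using namespace std;

// g[u] holds the pairs (v, weight) of the edges leaving `u`
typedef vector<vector<pair<int, int> > > graph;

// Builds the graph with the direction of every edge reversed
graph reverse_graph(const graph &g) {
  graph r(g.size());
  for (int u = 0; u < (int) g.size(); u += 1) {
    for (const pair<int, int> &edge : g[u]) {
      r[edge.first].push_back({ u, edge.second });
    }
  }
  return r;
}

// Dijkstra with a binary heap, unreachable vertices keep a distance of 1e9
vector<int> shortest_distances(const graph &g, int s) {
  const int INF = 1e9;
  vector<int> d(g.size(), INF);
  // min-heap of (estimated distance, vertex)
  priority_queue<pair<int, int>, vector<pair<int, int> >,
                 greater<pair<int, int> > > q;
  d[s] = 0;
  q.push({ 0, s });
  while (!q.empty()) {
    int dist = q.top().first;
    int v = q.top().second;
    q.pop();
    if (dist > d[v]) {
      // stale entry, a better estimate was already processed
      continue;
    }
    for (const pair<int, int> &edge : g[v]) {
      if (dist + edge.second < d[edge.first]) {
        d[edge.first] = dist + edge.second;
        q.push({ d[edge.first], edge.first });
      }
    }
  }
  return d;
}

// Shortest distance from every vertex to the destination `t`
vector<int> distances_to(const graph &g, int t) {
  return shortest_distances(reverse_graph(g), t);
}
```

In an undirected graph the reversal is not needed because every edge can already be traversed in both directions.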

Introduction to Trees in Graph Theory

Tue, 30 Jun 2015 15:00:00 +0000

A graph $G$ is called acyclic if it has no cycles, a tree is an acyclic connected graph

every two vertices of a tree $T$ are connected by a unique path
every nontrivial tree has at least two end-vertices
if $T$ is a tree of order $n$ then the size of the tree is $m = n - 1$
additional definitions
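
The properties above suggest a simple test: a connected graph of order $n$ and size $n - 1$ has to be a tree. A minimal sketch of that check, assuming a simple undirected graph stored as an adjacency list where every edge appears in the list of both of its ends (the name `is_tree` is hypothetical):

```cpp
#include <vector>
using namespace std;

// Hypothetical helper: true if the simple undirected graph `g` is a tree
bool is_tree(const vector<vector<int> > &g) {
  int n = g.size();
  if (n == 0) {
    return false;
  }

  // every edge is counted twice, once per end
  int degree_sum = 0;
  for (int v = 0; v < n; v += 1) {
    degree_sum += g[v].size();
  }
  if (degree_sum != 2 * (n - 1)) {
    // the size of the graph is not n - 1
    return false;
  }

  // check that the graph is connected with an iterative traversal
  vector<bool> visited(n, false);
  vector<int> pending(1, 0);
  visited[0] = true;
  int reached = 0;
  while (!pending.empty()) {
    int v = pending.back();
    pending.pop_back();
    reached += 1;
    for (int to : g[v]) {
      if (!visited[to]) {
        visited[to] = true;
        pending.push_back(to);
      }
    }
  }
  return reached == n;
}
```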

Strongly Connected Components in Graph Theory

Thu, 25 Jun 2015 15:00:00 +0000

A connected subgraph of $G$ that is not a proper subgraph of any other connected subgraph of $G$ is a component of $G$, i.e. there’s a $u-v$ path for every pair of vertices $u, v$ of the mentioned subgraph and it can’t be extended with more vertices or edges of $G$

Strongly connected components are useful in a variety of graph algorithms, including finding the shortest path between two vertices, detecting cycles in a graph, and determining the structure of a graph. They can be computed efficiently using algorithms such as Tarjan’s algorithm and Kosaraju’s algorithm.

Undirected graphs

Finding the components of an undirected graph requires a simple graph traversal starting from an arbitrary vertex while keeping track of the vertices that were already visited, the traversal is then repeated from every vertex of $G$ that wasn’t visited by a previous traversal

the number of components of an undirected graph $G$ is equal to the number of traversals that were started from an unvisited vertex

vector<bool> visited;
// adjacency list of G
vector<vector<int> > g;

void dfs(int v) {
  visited[v] = true;
  for (int i = 0; i < g[v].size(); i += 1) {
    int next = g[v][i];
    if (!visited[next]) {
      dfs(next);
    }
  }
}

/**
 * Computes the number of connected components in an undirected graph `G`
 * of order `n` and size `m`
 *
 * Time complexity: O(n + m)
 * Space complexity: O(n)
 *
 * @return {int} The number of components in `G`
 */
int connected_components() {
  int n = g.size();
  visited.assign(n, false);

  int components = 0;
  for (int i = 0; i < visited.size(); i += 1) {
    if (!visited[i]) {
      dfs(i);
      ++components;
    }
  }
  return components;
}

Directed graphs

Given a directed graph $G$ two nodes $u, v \in V(G)$ are called strongly connected if $v$ is reachable from $u$ and $u$ is reachable from $v$

A strongly connected component (SCC) of $G$ is a set of vertices $C \subseteq V(G)$ such that

$C$ is not empty
for any $u,v \in C$, $u$ and $v$ are strongly connected
for any $u \in C$ and $v \in V(G) - C$, $u$ and $v$ are not strongly connected

Tarjan’s algorithm

The idea is to perform a DFS from an arbitrary vertex (conducting subsequent DFS from non-explored vertices), during the traversal each vertex $v$ is assigned with two numbers:

the time it was explored denoted as $v_{time}$
the smallest index of any node known to be reachable from $v$ denoted as $v_{low}$

Let $u$ be the first vertex of an SCC discovered by the DFS, initially the only vertex known to be reachable from $u$ is $u$ itself so $u_{low} = u_{time}$. Let $v$ be a vertex discovered during the exploration of $u$, if there’s a $v \rightarrow u$ path then there’s a cycle and all the vertices in the $u-v$ path belong to the same strongly connected component, such a node $u$ is called the root of the SCC

Let $u$ be a node that belongs to an SCC, if it’s known that there’s a $u-v$ cycle and also that $u$ can reach a vertex $t$ with a lower index than $u$ from which $u$ is reachable then $u$, $v$ and $t$ belong to the same component

A stack is also needed to keep track of the nodes that were visited, the working of the stack follows the invariant: a node remains on the stack after exploration if and only if it has a path to some node earlier in the stack

// adjacency list of G
vector<vector<int> > g;

int time_spent;
// the number of scc
int total_scc;

// the time a vertex was discovered
vector<int> time_in;
// the smallest index of any vertex known to be reachable from `i`
vector<int> back;
// the scc vertex `i` belongs to
vector<int> scc;
// invariant: a node remains in the stack after exploration if
// it has a path to some node explored earlier that is in the stack
vector<bool> in_stack;
stack<int> vertices;

void dfs(int v) {
  int next;

  // the lowest back edge discovery time of `v` is
  // set to the discovery time of `v` initially
  back[v] = time_in[v] = ++time_spent;

  vertices.push(v);
  in_stack[v] = true;

  for (int i = 0; i < g[v].size(); i += 1) {
    next = g[v][i];
    if (time_in[next] == -1) {
      // unvisited edge
      dfs(next);
      // propagation of the lowest back edge discovery time
      back[v] = min(back[v], back[next]);
    } else if (in_stack[next]) {
      // (v, next) is a back edge only if it's connected to a predecessor
      // of `v`, i.e. if `next` is in same branch in the dfs tree
      //
      // an alternative is to use the time a vertex finished exploring its
      // adjacent nodes, if the time is not set then it's a back edge
      back[v] = min(back[v], time_in[next]);
    }
  }

  // if the root node of a connected component has finished
  // exploring all its neighbors, assign the same component `id`
  // to all the elements in the scc
  if (back[v] == time_in[v]) {
    total_scc += 1;
    do {
      next = vertices.top();
      vertices.pop();
      in_stack[next] = false;
      scc[next] = total_scc;
    } while (next != v);
  }
}

/**
 * Finds the strongly connected components in a digraph `G` of order `n`
 * and size `m`
 *
 * Time complexity: O(n + m)
 * Space complexity: O(n)
 *
 * @returns {int} the number of strongly connected components
 */
int tarjan() {
  int n = g.size();

  scc.assign(n, -1);
  time_in.assign(n, -1);
  back.assign(n, -1);
  in_stack.assign(n, false);
  while (!vertices.empty()) {
    vertices.pop();
  }

  time_spent = 0;
  total_scc = 0;

  for (int i = 0; i < n; i += 1) {
    if (time_in[i] == -1) {
      dfs(i);
    }
  }
  return total_scc;
}
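
Kosaraju’s algorithm, mentioned earlier as an alternative, finds the same components with two traversals: the first one records the vertices by finishing time, the second one explores the reversed graph in decreasing order of finishing time. The following is a minimal sketch under those assumptions, the names `fill_order`, `assign_component` and `kosaraju` are hypothetical and the recursion depth is bounded by the order of the graph.

```cpp
#include <algorithm>
#include <vector>
using namespace std;

// First pass: appends the vertices to `order` by increasing finishing time
void fill_order(int v, const vector<vector<int> > &g,
                vector<bool> &visited, vector<int> &order) {
  visited[v] = true;
  for (int next : g[v]) {
    if (!visited[next]) {
      fill_order(next, g, visited, order);
    }
  }
  order.push_back(v);
}

// Second pass: labels with `id` everything reachable from `v` in the
// reversed graph `rg` that doesn't have a label yet
void assign_component(int v, const vector<vector<int> > &rg,
                      vector<int> &component, int id) {
  component[v] = id;
  for (int next : rg[v]) {
    if (component[next] == -1) {
      assign_component(next, rg, component, id);
    }
  }
}

// Returns the number of SCCs of the digraph `g`, `component[i]` is set
// to the id of the SCC the vertex `i` belongs to
int kosaraju(const vector<vector<int> > &g, vector<int> &component) {
  int n = g.size();

  // build the reversed graph
  vector<vector<int> > rg(n);
  for (int u = 0; u < n; u += 1) {
    for (int v : g[u]) {
      rg[v].push_back(u);
    }
  }

  vector<bool> visited(n, false);
  vector<int> order;
  for (int i = 0; i < n; i += 1) {
    if (!visited[i]) {
      fill_order(i, g, visited, order);
    }
  }
  // process the vertices by decreasing finishing time
  reverse(order.begin(), order.end());

  component.assign(n, -1);
  int total = 0;
  for (int v : order) {
    if (component[v] == -1) {
      assign_component(v, rg, component, total);
      total += 1;
    }
  }
  return total;
}
```

Compared with Tarjan’s algorithm this version traverses the graph twice and needs the reversed graph in memory, in exchange it doesn’t need the stack invariant.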

Minimum Spanning Tree

Wed, 24 Jun 2015 18:31:59 +0000

If a connected graph $G$ of order $n$ has no cycles then of course $G$ is a tree. Let’s suppose that $G$ contains cycles and let $e_1$ be an edge lying on a cycle of $G$, since $e_1$ is part of a cycle it’s not a bridge which means that $G - e_1$ is connected. If $G - e_1$ contains cycles then let $e_2$ be an edge lying on a cycle of $G - e_1$, $e_2$ is not a bridge and therefore $G - e_1 - e_2$ is still connected. Eventually we arrive at a set $X = \{e_1, e_2, \ldots, e_k\}$ of edges such that $G - X$ is connected and doesn’t contain cycles (i.e. it’s a tree) and has the same vertex set as $G$ ($V(G) = V(G - X)$).

Let $T = G - X$ be a tree with the same vertex set of $G$

$T$ is a spanning subgraph of $G$, since $T$ is also a tree it’s called a spanning tree of $G$

Minimum spanning tree

The purpose of finding the minimum spanning tree (MST) is to identify the subset of edges of a given connected, undirected graph that form a tree, connecting all the vertices and having the minimum possible total edge weight.

Let $G$ be a connected weighted graph where the weight of an edge $e \in E(G)$ is denoted by $w(e)$, for each subgraph $H$ of $G$ the weight of the subgraph $w(H)$ is the sum of the weights of its edges

$$ w(H) = \sum_{e \in E(H)} w(e) $$

We are looking for a spanning tree of $G$ whose weight is minimum among all spanning trees of $G$, such a spanning tree is called minimum spanning tree (shortened as MST)

$$ \begin{align*} G &= (V, E) \\ V &= \{0, 1, 2, 3, 4\} \\ E &= \{\{0, 1\}, \{0, 2\}, \{0, 3\}, \{0, 4\}, \{1, 2\}, \{1, 3\}, \{1, 4\}, \{2, 3\}, \{3, 4\}\} \\ w(e) &= \{1, 4, 4, 5, 3, 7, 5, 6, 2\} \end{align*} $$

the MST is unique if the weights of all the edges are different
the maximum spanning tree is the tree whose weight is maximum among all spanning trees, it can be computed using the algorithm below by using the edges with the maximum weight instead of the ones with the minimum weight

Kruskal’s algorithm

For a connected weighted graph $G$ a spanning tree is constructed as follows

for the first edge $e_1$ we select any edge of $G$ of minimum weight
for the second edge $e_2$ we select any remaining edge of $G$ of minimum weight
for the third edge $e_3$ we select any remaining edge of $G$ of minimum weight that does not produce a cycle with the previously selected edges
we continue in this manner until a spanning tree is produced

Let’s apply it to the weighted graph above, sorting the edges in nondecreasing order we have:

$$ w(e) = \{1, 2, 3, 4, 4, 5, 5, 6, 7\} $$

Some properties of the edges

The edge with the minimum cost is $e_1 = v_0v_1$ with $w(e_1) = 1$, $e_1$ is part of the MST
The edge with the minimum cost is now $e_9 = v_3v_4$ with $w(e_9) = 2$, $e_9$ is part of the MST
The next edge is $e_5 = v_1v_2$ with $w(e_5) = 3$, since it does not form a cycle with the previously selected edges it’s part of the MST
The next edge is $e_2 = v_0v_2$ with $w(e_2) = 4$, this one forms a cycle with the path $v_0,v_1,v_2$ so it’s not part of the MST
The next edge is $e_3 = v_0v_3$ with $w(e_3) = 4$, since it does not form a cycle with the previously selected edges it’s part of the MST
No need to do more iterations since the set is already a spanning tree

struct edge {
  int u, v, weight;
  edge(int _u, int _v, int _w) {
    u = _u; v = _v; weight = _w;
  }
  // custom sort
  bool operator<(const edge &other) const {
    return weight < other.weight;
  }
};

vector<int> tree, size;

void initialize_sets(int n) {
  tree.resize(n);
  size.resize(n);
  for (int i = 0; i < n; i += 1) {
    tree[i] = i;
    size[i] = 1;
  }
}

int find_set(int element) {
  if (element != tree[element]) {
    tree[element] = find_set(tree[element]);
  }
  return tree[element];
}

void set_union(int x, int y) {
  int rx, ry;
  rx = find_set(x);
  ry = find_set(y);
  if (rx == ry) {
    return;
  }
  // union by size: attach the smaller tree to the root of the larger one
  if (size[rx] > size[ry]) {
    size[rx] += size[ry];
    tree[ry] = rx;
  } else {
    size[ry] += size[rx];
    tree[rx] = ry;
  }
}

/**
 * An implementation of Kruskal's algorithm which computes
 * the minimum spanning tree of a graph `G`
 *
 * Time complexity: O(m log m)
 * Space complexity: O(m)
 *
 * @param {vector<vector<pair<int, int> > >} g The adjacency list representation
 * of a graph `G`, each entry `g_{ij}` holds a pair which represents an edge
 * (vertex, weight) which tells that there's an edge from `i` to `vertex`
 * with weight `weight`
 * @return {int} The weight of the MST
 */
int kruskal(vector<vector<pair<int, int> > > &g) {
  int n = g.size();

  vector<edge> edges;
  for (int i = 0; i < n; i += 1) {
    for (int j = 0; j < g[i].size(); j += 1) {
      int v = g[i][j].first;
      int weight = g[i][j].second;
      edges.push_back(edge(i, v, weight));
    }
  }

  initialize_sets(n);

  sort(edges.begin(), edges.end());

  int total = 0;
  for (int i = 0; i < edges.size(); i += 1) {
    int u = find_set(edges[i].u);
    int v = find_set(edges[i].v);
    if (u != v) {
      set_union(u, v);
      total += edges[i].weight;
    }
  }

  return total;
}
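
To cross-check the walk-through above, the following is a compact and self-contained sketch of the same algorithm applied to the example graph, it takes the edges as a list instead of an adjacency list. The names `weighted_edge`, `find_root`, `mst_weight` and `example_edges` are hypothetical. The selected edges have weights $1, 2, 3, 4$ so the expected weight of the MST is $10$.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>
using namespace std;

struct weighted_edge {
  int weight, u, v;
};

// Finds the representative of the set of `v`, halving the path on the way
int find_root(vector<int> &parent, int v) {
  while (parent[v] != v) {
    parent[v] = parent[parent[v]];
    v = parent[v];
  }
  return v;
}

// Kruskal's algorithm over a list of edges of a graph of order `n`
int mst_weight(int n, vector<weighted_edge> edges) {
  vector<int> parent(n);
  iota(parent.begin(), parent.end(), 0);

  // nondecreasing order of weight
  sort(edges.begin(), edges.end(),
       [](const weighted_edge &a, const weighted_edge &b) {
         return a.weight < b.weight;
       });

  int total = 0;
  for (const weighted_edge &e : edges) {
    int ru = find_root(parent, e.u);
    int rv = find_root(parent, e.v);
    if (ru != rv) {
      // the edge joins two different trees so it doesn't form a cycle
      parent[ru] = rv;
      total += e.weight;
    }
  }
  return total;
}

// The edges e_1, ..., e_9 of the example graph as (weight, u, v)
vector<weighted_edge> example_edges() {
  return {
    {1, 0, 1}, {4, 0, 2}, {4, 0, 3}, {5, 0, 4}, {3, 1, 2},
    {7, 1, 3}, {5, 1, 4}, {6, 2, 3}, {2, 3, 4}
  };
}
```

Calling `mst_weight(5, example_edges())` should give the same total as the manual selection above.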

Prim’s algorithm

For a connected weighted graph $G$ a spanning tree is constructed as follows

for an arbitrary vertex $u$ an edge of minimum weight $e_1$ incident to $u$ is chosen as the first edge of the MST
for subsequent edges $e_2, e_3, \ldots, e_{n - 1}$ we select an edge of minimum weight among those edges having exactly one of its vertices incident with an edge already selected

Prim in dense graphs

Let’s say we’re given the following problem

given $n$ points in a plane find the skeleton of minimum weight that connects them all

This problem can be modeled as a graph of order $n$ where each vertex is connected to every other vertex by an edge of weight equal to the euclidean distance between the vertices therefore $m \approx n^2$

Implementation strategies:

we need a data structure that keeps track of a single edge per vertex (space: $O(n)$) and is able to tell the one with the minimum weight (doing $O(n)$ queries), since $m \approx n^2$ we visit each vertex finding an edge with minimum cost (each query will take $O(n)$ for an overall $O(n^2)$ time complexity)
after an arbitrary vertex $u$ has been chosen all the vertices adjacent to $u$ will update their minimum edge weight

/**
 * An implementation of Prim's algorithm which computes
 * the minimum spanning tree of a dense graph `G`
 *
 * Time complexity: O(n^2)
 * Space complexity: O(n)
 *
 * @param {vector<vector<int> >} g The adjacency matrix of `G`, each entry `a_{ij}`
 * holds the weight of the edge connecting vertex `i` and vertex `j`, if this number
 * is <= 0 then `i` and `j` are not adjacent
 * @return {int} The weight of the MST
 */
int prim(vector<vector<int> > &g) {
  int n = g.size();
  int INF = 1e9;
  int total = 0;

  vector<bool> visited(n, false);
  // holds the weight of the edge of minimum weight incident
  // to the vertex `i`
  vector<int> min_weight_edge(n, INF);
  // (optional) holds the index of a vertex adjacent to the
  // vertex `i` in the MST, note that the size of the MST is
  // n - 1 so the first vertex won't store the mentioned index
  vector<int> neighbor_selected(n, -1);

  // pick the first node as the "arbitrary" node
  min_weight_edge[0] = 0;

  for (int i = 0; i < n; i += 1) {
    int v = -1;
    for (int j = 0; j < n; j += 1) {
      // find the vertices which haven't been visited yet
      // among them find a vertex with the minimum edge weight
      if (!visited[j] && (v == -1 ||
          min_weight_edge[j] < min_weight_edge[v])) {
        v = j;
      }
    }

    visited[v] = true;
    total += min_weight_edge[v];

    // update the minimum edge weight of all the vertices
    // adjacent to `v`
    for (int to = 0; to < n; to += 1) {
      if (g[v][to] > 0 &&
          g[v][to] < min_weight_edge[to]) {
        min_weight_edge[to] = g[v][to];
        // update the candidate neighbor of the vertex `to` to
        // be `v` since it's connected with an edge
        // of minimum weight among all the adjacent vertices to `to`
        neighbor_selected[to] = v;
      }
    }
  }

  return total;
}
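
For the problem of the $n$ points in a plane the adjacency matrix doesn’t need to be stored, the weight of an edge can be computed on demand from the coordinates. A minimal sketch under that assumption, where the name `min_skeleton_weight` and the representation of a point as a `pair<double, double>` are hypothetical:

```cpp
#include <cmath>
#include <limits>
#include <utility>
#include <vector>
using namespace std;

// Hypothetical helper: weight of the MST of the complete graph whose
// vertices are `points` and whose weights are euclidean distances
double min_skeleton_weight(const vector<pair<double, double> > &points) {
  int n = points.size();
  const double INF = numeric_limits<double>::infinity();

  vector<bool> visited(n, false);
  // weight of the lightest edge that connects vertex `i` with the tree
  vector<double> min_edge(n, INF);
  double total = 0;
  if (n > 0) {
    // pick the first point as the "arbitrary" vertex
    min_edge[0] = 0;
  }

  for (int i = 0; i < n; i += 1) {
    // unvisited vertex that is closest to the tree
    int v = -1;
    for (int j = 0; j < n; j += 1) {
      if (!visited[j] && (v == -1 || min_edge[j] < min_edge[v])) {
        v = j;
      }
    }
    visited[v] = true;
    total += min_edge[v];

    // every other point is adjacent to `v`, the weight is computed on demand
    for (int to = 0; to < n; to += 1) {
      if (!visited[to]) {
        double dist = hypot(points[v].first - points[to].first,
                            points[v].second - points[to].second);
        if (dist < min_edge[to]) {
          min_edge[to] = dist;
        }
      }
    }
  }
  return total;
}
```

The time complexity is still $O(n^2)$ but the space drops from $O(n^2)$ to $O(n)$ because the matrix is never built.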

Prim in sparse graphs

Implementation strategies:

we need a data structure that keeps track of a single edge per vertex (space: $O(n)$) and is able to tell the one with the minimum weight (doing $O(n)$ queries), since $m \approx n$ we analyze each edge and find the one with minimum weight $O(n)$ times, we can use a red-black tree (each operation takes $O(\log n)$ for an overall $O(m \log n)$ time complexity)
after an arbitrary vertex $u$ has been chosen all the vertices adjacent to $u$ will update their minimum edge weight
the red-black tree will hold $n - 1$ entries at most (one entry per vertex), on each iteration a vertex will be removed from the red-black tree
there will be exactly $n$ iterations if the graph is connected

/**
 * C++11
 *
 * An implementation of Prim's algorithm which computes
 * the minimum spanning tree of a sparse graph `G` of order `n` and size `m`
 *
 * Time complexity: O(m log n)
 * Space complexity: O(n)
 *
 * @param {vector<vector<pair<int, int> > >} g The adjacency list representation
 * of a graph `G`, each entry `g_{ij}` holds a pair which represents an edge
 * (vertex, weight) which tells that there's an edge from `i` to `vertex`
 * with weight `weight`
 * @return {int} The weight of the MST or a negative number if the graph
 * wasn't connected
 */
int prim(vector<vector<pair<int, int> > > &g) {
  int n = g.size();
  int INF = 1e9;
  int total = 0;

  vector<bool> visited(n, false);
  // holds the weight of the edge of minimum weight incident
  // to the vertex `i`
  vector<int> min_cost(n, INF);
  // (optional) holds the index of a vertex adjacent to the
  // vertex `i` in the MST, note that the size of the MST is
  // n - 1 so the first vertex won't store the mentioned index
  vector<int> neighbor(n, -1);

  // the first node is the "arbitrary" node for the sake of the implementation
  min_cost[0] = 0;

  // (min weight, vertex)
  set<pair<int, int> > q;
  q.insert({0, 0});

  while (!q.empty()) {
    int v = q.begin()->second;

    // the vertex `v` belongs to the MST and is adjacent
    // to the vertex `neighbor[v]` with an edge
    // of weight `min_cost[v]`
    total += q.begin()->first;

    q.erase(q.begin());

    visited[v] = true;

    for (int i = 0; i < g[v].size(); i += 1) {
      pair<int, int> &next = g[v][i];

      // note that in the graph the first element is the neighbor vertex
      // but in the set the first element is the edge weight
      int to = next.first;
      int weight = next.second;

      if (!visited[to] && weight < min_cost[to]) {
        q.erase({ min_cost[to], to });
        min_cost[to] = weight;
        neighbor[to] = v;
        q.insert({ min_cost[to], to });
      }
    }
  }
  // check that every vertex has a min cost edge associated
  for (int i = 0; i < n; i += 1) {
    if (min_cost[i] == INF) {
      return -1;
    }
  }
  return total;
}

Number of spanning trees in a graph

Let $G$ be a graph with $V(G) = \{v_1, v_2, \ldots, v_n\}$, let $A = [a_{ij}]$ be the adjacency matrix of $G$ and let $C = [c_{ij}]$ be an $n \times n$ matrix where

$$ c_{ij} = \begin{cases} deg\;v_i & \text{if $i = j$} \\ -a_{ij} & \text{if $i \neq j$} \\ \end{cases} $$

Then the number of spanning trees of $G$ is the value of any cofactor of $C$

The $(i, j)$ cofactor of an $n \times n$ matrix $M$ (in this case $M = C$) is the value

$$ \text{cofactor}_{ij}(M) = (-1)^{i + j} \cdot det(M_{ij}) $$

$det(M_{ij})$ indicates the determinant of the $(n - 1) \times (n - 1)$ submatrix of $M$ obtained by removing the $i$-th row and the $j$-th column

/**
 * Given a square matrix M of size (n x n) this method
 * computes a matrix of size (n - 1) x (n - 1) by eliminating
 * the elements belonging to the `row` row of M and the `col`
 * column of M
 *
 * @param {vector<vector<int> > } m The square matrix
 * @param {int} row The row to be ignored
 * @param {int} col The column to be ignored
 * @return {vector<vector<int> >} The submatrix without the row and the column
 */
vector<vector<int> > minor(vector<vector<int> > m, int row, int col) {
  int n = m.size();
  vector<vector<int> > t(n - 1, vector<int>(n - 1));

  int trow = 0;
  for (int i = 0; i < n; i += 1) {
    if (i == row) {
      continue;
    }
    int tcol = 0;
    for (int j = 0; j < n; j += 1) {
      if (j == col) {
        continue;
      }
      t[trow][tcol] = m[i][j];
      tcol += 1;
    }
    trow += 1;
  }
  return t;
}

/**
 * Computes the determinant of an square matrix
 *
 * @param {vector<vector<int> > } m The square matrix
 * @return {int} the value of the determinant
 */
int determinant(vector<vector<int> > m) {
  int n = m.size();

  if (n == 1) {
    return m[0][0];
  }
  if (n == 2) {
    return m[0][0] * m[1][1] - m[1][0] * m[0][1];
  }

  int result = 0;
  for (int col = 0; col < n; col += 1) {
    vector<vector<int> > t = minor(m, 0, col);
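    // the sign alternates with the column, computed with integers
    // to avoid the conversion to double done by `pow`
    result += m[0][col] * (col % 2 == 0 ? 1 : -1) * determinant(t);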
  }
  return result;
}

/**
 * Computes the number of spanning trees in an undirected graph `G`
 *
 * @param <vector<vector<int> > > g The adjacency matrix of `G`
 * @return {int} The number of spanning trees
 */
int number_of_spanning_trees(vector<vector<int> > &g) {
  int n = g.size();
  vector<vector<int> > t = g;
  for (int i = 0; i < n; i += 1) {
    // -a_{ij} for elements that are not in the main diagonal
    int degree = 0;
    for (int j = 0; j < n; j += 1) {
      if (i != j) {
        t[i][j] *= -1;
        if (t[i][j]) {
          degree += 1;
        }
      }
    }

    // deg v_i for t[i][i]
    t[i][i] = degree;
  }

  // compute the (0,0) cofactor
  // c_{0, 0} = (-1)^{0 + 0} * determinant((0, 0) minor)
  //          = determinant((0, 0) minor)
  return determinant(minor(t, 0, 0));
}

Number of spanning trees in a complete graph $K_n$

Computing the number of spanning trees of a graph $G = K_n$ where $V(G) = \{v_1, v_2, \ldots, v_n\}$ is the same as computing the number of distinct trees with vertex set $\{v_1, v_2, \ldots, v_n\}$, the formula is called Cayley’s tree formula

The number of distinct trees of order $n$ with a specific vertex set is $n^{n - 2}$
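
The formula can be cross-checked with the cofactor method from the previous section: for $K_n$ the matrix $C$ has $n - 1$ on the main diagonal and $-1$ everywhere else. The sketch below swaps the recursive cofactor expansion for a fraction-free (Bareiss) elimination, which runs in $O(n^3)$ and stays within integers. The names `determinant_bareiss` and `spanning_trees_complete_graph` are hypothetical, and the intermediate values fit in a `long long` only for small $n$.

```cpp
#include <utility>
#include <vector>
using namespace std;

// Determinant of an integer matrix with fraction-free elimination,
// every division below is exact
long long determinant_bareiss(vector<vector<long long> > a) {
  int n = a.size();
  if (n == 0) {
    // determinant of the empty matrix
    return 1;
  }
  long long sign = 1;
  long long prev = 1;
  for (int k = 0; k < n - 1; k += 1) {
    if (a[k][k] == 0) {
      // look for a row below with a nonzero pivot
      int pivot = -1;
      for (int i = k + 1; i < n; i += 1) {
        if (a[i][k] != 0) {
          pivot = i;
          break;
        }
      }
      if (pivot == -1) {
        return 0;
      }
      swap(a[k], a[pivot]);
      sign = -sign;
    }
    for (int i = k + 1; i < n; i += 1) {
      for (int j = k + 1; j < n; j += 1) {
        a[i][j] = (a[i][j] * a[k][k] - a[i][k] * a[k][j]) / prev;
      }
    }
    prev = a[k][k];
  }
  return sign * a[n - 1][n - 1];
}

// Number of spanning trees of K_n (n >= 1) as the (0, 0) cofactor of C
long long spanning_trees_complete_graph(int n) {
  // C without its first row and column: n - 1 on the diagonal, -1 elsewhere
  vector<vector<long long> > c(n - 1, vector<long long>(n - 1, -1));
  for (int i = 0; i < n - 1; i += 1) {
    c[i][i] = n - 1;
  }
  return determinant_bareiss(c);
}
```

For $K_4$ the minor is a $3 \times 3$ matrix with $3$ on the diagonal and $-1$ elsewhere, its determinant is $16 = 4^{4 - 2}$.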

Cut-vertices (articulation points) in Graph Theory

Wed, 24 Jun 2015 15:00:00 +0000

All the facts/properties below are considered for an undirected connected graph $G$

if $v$ is a vertex incident with a bridge in a graph $G$ then $v$ is a cut-vertex if $deg(v) \geq 2$ (if $deg(v) = 1$ then $v$ is an end-vertex of $G$ so $G - v$ is still connected)
given that the order $G$ is $\geq 3$, if it contains a bridge then it also contains a cut-vertex
if $v$ is a cut-vertex of $G$ and $u$, $w$ are vertices in different components formed by $G - v$ then $v$ is part of every $u-w$ path in $G$
let $u \in V(G)$, if $v$ is a vertex that is farthest from $u$ then $v$ is not a cut-vertex

$$ \text{$v_0, v_2$ are cut vertices } $$
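
The definition can also be checked directly: $v$ is a cut-vertex of a connected graph $G$ if $G - v$ is disconnected. The following is a brute force sketch with $O(n (n + m))$ time complexity, it might be useful to cross-check a faster implementation on small graphs. The names `count_reachable` and `cut_vertices_brute_force` are hypothetical and the graph is assumed to be connected.

```cpp
#include <vector>
using namespace std;

// Number of vertices reachable from `start` in the graph `g - removed`
int count_reachable(const vector<vector<int> > &g, int start, int removed) {
  vector<bool> visited(g.size(), false);
  vector<int> pending(1, start);
  visited[start] = true;
  int reached = 0;
  while (!pending.empty()) {
    int v = pending.back();
    pending.pop_back();
    reached += 1;
    for (int to : g[v]) {
      if (to != removed && !visited[to]) {
        visited[to] = true;
        pending.push_back(to);
      }
    }
  }
  return reached;
}

// Cut-vertices of a connected undirected graph `g` in increasing order
vector<int> cut_vertices_brute_force(const vector<vector<int> > &g) {
  int n = g.size();
  vector<int> result;
  // a graph of order <= 2 doesn't have cut-vertices
  for (int v = 0; n > 2 && v < n; v += 1) {
    // start the traversal at any vertex other than `v`
    int start = (v == 0) ? 1 : 0;
    if (count_reachable(g, start, v) != n - 1) {
      // some vertex of `g - v` can't be reached, `g - v` is disconnected
      result.push_back(v);
    }
  }
  return result;
}
```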

Let $G$ be an undirected graph, by analyzing the properties of the dfs tree we can determine if a vertex is an articulation point given the following facts:

a leaf vertex is not a cut-vertex
let $u$ and $v$ be two vertices of the dfs tree such that $u$ is an ancestor of $v$
- if $u$ and $v$ are not adjacent and there’s a back edge $vw$ to some vertex $w$ such that $w$ is a predecessor of $u$ then none of the vertices in the $u-v$ path are cut-vertices
- if $u$ and $v$ are not adjacent and there’s a back edge from $v$ to some vertex in the $u-v$ path then $u$ is a cut-vertex
let $u$ be the root node of the dfs tree, it’s a cut-vertex if it has more than one child in the dfs tree, i.e. the root has more than one branch in the dfs tree

int time_spent;

// the adjacency list representation of `G`
vector<vector<int> > g;
// the time a vertex `i` was discovered first
vector<int> time_in;
// stores the discovery time of the lowest predecessor that vertex `i`'s
// successor vertices can reach **through a back edge**, initially
// the lowest predecessor is set to the vertex itself
vector<int> back;
// the articulation points found during the dfs
vector<int> cut_vertex;

void dfs(int v, int parent) {
  // the lowest back edge discovery time of `v` is
  // set to the discovery time of `v` initially
  back[v] = time_in[v] = ++time_spent;

  // count the number of children for the `root` vertex
  int children = 0;
  bool is_cut_vertex = false;

  for (int i = 0; i < g[v].size(); i += 1) {
    int next = g[v][i];

    if (next == parent) {
      continue;
    }

    if (time_in[next] == -1) {
      dfs(next, v);
      // if there's a back edge between a descendant of `next` and
      // a predecessor of `v` then `next` will have a lower reachable
      // vertex than `v` through a back edge, otherwise removing `v`
      // disconnects the subtree rooted at `next` so `v` is a cut-vertex
      // (the special case of the root node is handled below)
      if (back[next] >= time_in[v] && parent != -1) {
        is_cut_vertex = true;
      }
      // propagation of the back edge to a vertex with the lowest discovery time
      back[v] = min(back[v], back[next]);
      ++children;
    } else {
      // * back edge *
      // update index of the vertex incident with this back edge to
      // be the one with the lowest discovery time
      // it's possible for this edge to be a *forward edge*, in that
      // case the time won't be updated since time_in[v] < time_in[next]
      back[v] = min(back[v], time_in[next]);
    }
  }

  // the root vertex of the dfs tree is a cut-vertex
  // if it has more than one child in the dfs tree
  if (parent == -1 && children > 1) {
    is_cut_vertex = true;
  }

  if (is_cut_vertex) {
    cut_vertex.push_back(v);
  }
}

/**
 * Finds the articulation points in an undirected graph `G`
 * of order`n` and size `m`
 *
 * Time complexity: O(n + m)
 * Space complexity: O(n)
 *
 * @returns {int} the number of articulation points
 */
int articulation_points() {
  int n = g.size();
  time_spent = 0;
  time_in.assign(n, -1);
  back.assign(n, -1);
  cut_vertex.clear();

  for (int i = 0; i < n; i += 1) {
    if (time_in[i] == -1) {
      dfs(i, -1);
    }
  }
  return cut_vertex.size();
}

Biconnected components in an undirected graph

A biconnected graph is a nonseparable graph meaning that if any vertex is removed the graph is still connected and therefore it doesn’t have cut-vertices

Key observations:

two different biconnected components can’t have a common edge (but they might share a common vertex)
a common vertex linking multiple biconnected components must be a cut-vertex of $G$

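Let $uv$ be an edge of an undirected graph $G$, we keep a stack with the edges in the order they’re analyzed, so $uv$ is pushed to the stack when it’s first seen, if after exploring $v$ we find that no vertex in the subtree of $v$ has a back edge to a proper ancestor of $u$ (the cut-vertex condition) then all the edges from the top of the stack down to $uv$ are the edges of one biconnected component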

int time_spent;

// the adjacency list representation of `G`
vector<vector<int> > g;
// the time a vertex `i` was discovered first
vector<int> time_in;
// stores the discovery time of the lowest predecessor that vertex `i`'s
// successor vertices can reach **through a back edge**, initially
// the lowest predecessor is set to the vertex itself
vector<int> back;

// the biconnected components found during the dfs
vector<vector<pair<int, int> > > bcc;
stack<pair<int, int> > edges_processed;

void output_biconnected_component(int u, int v) {
  pair<int, int> top;
  vector<pair<int, int> > component;
  do {
    top = edges_processed.top();
    edges_processed.pop();
    component.push_back(top);
  } while (u != top.first || v != top.second);
  bcc.push_back(component);
}

void dfs(int v, int parent) {
  // the lowest back edge discovery time of `v` is
  // set to the discovery time of `v` initially
  back[v] = time_in[v] = ++time_spent;

  for (int i = 0; i < g[v].size(); i += 1) {
    int next = g[v][i];

    if (parent == next) {
      continue;
    }

    // mark the edge (v, next) as processed
    if (time_in[next] == -1) {
      // this edge is being processed right now
      edges_processed.push(pair<int, int> (v, next));

      dfs(next, v);
      // `back[next] >= time_in[v]` means that no vertex in the subtree of
      // `next` has a back edge to a proper ancestor of `v`, so the edges
      // pushed to the stack since (v, next) form one biconnected component
      if (back[next] >= time_in[v]) {
        output_biconnected_component(v, next);
      }
      // propagation of the back edge to a vertex with the lowest discovery time
      back[v] = min(back[v], back[next]);
    } else if (time_in[next] < time_in[v]) {
      // * back edge *
      // update index of the vertex incident with this back edge to
      // be the one with the lowest discovery time
      back[v] = min(back[v], time_in[next]);

      // push this edge to the stack only once
      edges_processed.push(pair<int, int> (v, next));
    }
  }
}

/**
 * Finds the biconnected components in an undirected graph `G`
 * of order`n` and size `m`
 *
 * Time complexity: O(n + m)
 * Space complexity: O(m)
 *
 * @returns {int} the number of biconnected components
 */
int biconnected_components() {
  int n = g.size();
  time_spent = 0;
  time_in.assign(n, -1);
  back.assign(n, -1);

  while (!edges_processed.empty()) {
    edges_processed.pop();
  }
  bcc.clear();

  for (int i = 0; i < n; i += 1) {
    if (time_in[i] == -1) {
      dfs(i, -1);
    }
  }

  return bcc.size();
}

Connectivity

Let $G$ be a noncomplete graph without cut vertices, let $U$ be a set of vertices of $G$ such that $G - U$ is disconnected, $U$ is called a vertex-cut set

The graph below doesn’t have a cut-vertex but it has many vertex-cut sets, $U_1 = \{v_1, v_2\}$, $U_2 = \{v_2, v_4\}$, $U_3 = \{v_1, v_2, v_3\}$, $U_4 = \{v_1, v_2, v_4\}$, $U_5 = \{v_0, v_2, v_4\}$

the set with minimum cardinality is called a minimum vertex-cut set
a connected graph $G$ contains a vertex-cut set only if $G$ is not complete

For a graph $G$ that is not complete the vertex-connectivity denoted as $\kappa(G)$ is the cardinality of the minimum vertex-cut set of $G$, for the graph above $\kappa(G) = 2$

if $G$ is a graph of order $n$ and size $m \geq n - 1$ then $\kappa(G) \leq \left \lfloor \tfrac{2m}{n} \right \rfloor$

There are other measures of how connected a graph is, let $X$ be a set of edges of $G$ such that $G - X$ is disconnected or a trivial graph, $X$ is called an edge-cut set, the edge-connectivity denoted as $\lambda(G)$ is the cardinality of the minimum edge-cut set of $G$

for the complete graph $K_n$ of order $n$, $\lambda(K_n) = n - 1$

Cut-edges (bridges) in Graph Theory

Wed, 24 Jun 2015 14:51:12 +0000

Undirected graph

In the following undirected graph $G$ the edges $v_2v_3$ and $v_3v_4$ are bridges

An edge $e$ of an undirected graph $G$ is a bridge if and only if $e$ lies on no cycle of $G$
Every edge of an undirected tree is a bridge

Let $G$ be an undirected graph, by analyzing the properties of the dfs tree we can determine if an edge is a bridge given the following facts:

let $u$ and $v$ be two vertices of the dfs such that $u$ is an ancestor of $v$, also $u$ and $v$ are not adjacent
- if there’s a back edge $vu$ then none of the edges in the $u-v$ path are bridges, if we remove one of them the graph is still connected because of this edge
- otherwise the edge is a bridge

Implementation notes

to check if a successor of a vertex $u$ has a back edge to a predecessor of $u$ an additional state is stored in each vertex which is the discovery time of the lowest back edge of a successor of $u$ (by lowest back edge we mean the back edge to a vertex with the lowest discovery time) denoted as $u_{back}$, initially this state is set to the discovery time of the vertex $u$ i.e. $u_{back} = u_{in}$, this state is propagated when the backtracking is performed
let $vu$ be a back edge analyzed from $v$, when this edge is analyzed the $v_{back}$ state needs to be updated to be the minimum between the existing $v_{back}$ and the discovery time of $u$, i.e. $v_{back} = \min(v_{back}, u_{in})$
let $v$ be a child of $u$ in the dfs tree, when we’ve finished analyzing the branch of the tree rooted at $v$ we check whether $v_{back}$ (already propagated) records a back edge to $u$ or to some predecessor of $u$, i.e. $v_{back} \leq u_{in}$, if so then $uv$ is not a bridge, otherwise ($v_{back} > u_{in}$) it is a bridge

int time_spent;

// the adjacency list representation of `G`
vector<vector<int> > g;
// the time a vertex `i` was discovered first
vector<int> time_in;
// stores the discovery time of the lowest predecessor that vertex `i`'s
// successor vertices can reach **through a back edge**, initially
// the lowest predecessor is set to the vertex itself
vector<int> back;
// the bridges found during the dfs
vector<pair<int, int> > cut_edge;

void dfs(int v, int parent) {
  // the lowest back edge discovery time of `v` is
  // set to the discovery time of `v` initially
  back[v] = time_in[v] = ++time_spent;

  for (int i = 0; i < g[v].size(); i += 1) {
    int next = g[v][i];

    if (next == parent) {
      continue;
    }

    if (time_in[next] == -1) {
      dfs(next, v);
      // if there's a back edge between a descendant of `next` and
      // a predecessor of `v` then `next` will have a lower back edge discovery time
      // otherwise it's a bridge
      if (back[next] > time_in[v]) {
        cut_edge.push_back(pair<int, int> (v, next));
      }
      // propagation of the lowest back edge discovery time
      back[v] = min(back[v], back[next]);
    } else {
      // *back edge*
      // update the lowest back edge discovery time of `v`
      back[v] = min(back[v], time_in[next]);
    }
  }
}

/**
 * Finds the bridges in an undirected graph `G` of order `n` and size `m`
 *
 * Time complexity: O(n + m)
 * Space complexity: O(n)
 */
void bridges() {
  int n = g.size();
  time_spent = 0;
  time_in.assign(n, -1);
  back.assign(n, -1);
  cut_edge.clear();

  for (int i = 0; i < n; i += 1) {
    if (time_in[i] == -1) {
      dfs(i, -1);
    }
  }
}

Directed graph (strong bridges)

Let $G$ be a directed graph, an edge $uv \in E(G)$ is a strong bridge if its removal increases the number of strongly connected components of $G$

The following is a connected graph $G$, every edge but $v_2v_0$ is a strong bridge because removing it from $G$ increases the number of strongly connected components, removing $v_2v_0$ doesn’t increase the number of strongly connected components so it’s not a bridge

A trivial algorithm to find the strong bridges of a digraph $G$ of order $n$ and size $m$ is as follows:

Compute the number of strongly connected components of $G$ denoted as $k(G)$
For each edge $e \in E(G)$
remove $e$ from $G$
compute the number of strongly connected components of $G$ denoted as $k(G - e)$
if $k(G) < k(G - e)$ then $e$ is a bridge

The time complexity of the algorithm above is clearly $O(m(n + m))$
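
The steps above can be sketched as follows. This is only a minimal sketch: it assumes the vertices are numbered from $0$ to $n - 1$ and that the digraph is given as an edge list, it counts the strongly connected components with Kosaraju's algorithm (any linear time algorithm would do), and the names `count_scc` and `strong_bridges_bruteforce` are made up for the illustration.

```cpp
#include <utility>
#include <vector>
using namespace std;

// first pass of Kosaraju: store the vertices by increasing finish time
static void fill_order(int v, const vector<vector<int> > &g,
                       vector<bool> &seen, vector<int> &order) {
  seen[v] = true;
  for (int next : g[v]) {
    if (!seen[next]) fill_order(next, g, seen, order);
  }
  order.push_back(v);
}

// second pass: mark every vertex reachable from `v` in the reversed graph
static void mark(int v, const vector<vector<int> > &rg, vector<bool> &seen) {
  seen[v] = true;
  for (int next : rg[v]) {
    if (!seen[next]) mark(next, rg, seen);
  }
}

// number of strongly connected components of the digraph,
// the edge with index `skip` is ignored (use -1 to keep every edge)
int count_scc(int n, const vector<pair<int, int> > &edges, int skip) {
  vector<vector<int> > g(n), rg(n);
  for (int i = 0; i < (int) edges.size(); i += 1) {
    if (i == skip) continue;
    g[edges[i].first].push_back(edges[i].second);
    rg[edges[i].second].push_back(edges[i].first);
  }
  vector<bool> seen(n, false);
  vector<int> order;
  for (int v = 0; v < n; v += 1) {
    if (!seen[v]) fill_order(v, g, seen, order);
  }
  seen.assign(n, false);
  int components = 0;
  for (int i = n - 1; i >= 0; i -= 1) {
    if (!seen[order[i]]) {
      mark(order[i], rg, seen);
      components += 1;
    }
  }
  return components;
}

// indices of the edges `e` such that k(G) < k(G - e)
vector<int> strong_bridges_bruteforce(int n, const vector<pair<int, int> > &edges) {
  vector<int> result;
  int base = count_scc(n, edges, -1);
  for (int i = 0; i < (int) edges.size(); i += 1) {
    if (count_scc(n, edges, i) > base) result.push_back(i);
  }
  return result;
}
```

Each call to `count_scc` rebuilds the adjacency lists in $O(n + m)$, which keeps the total at the $O(m(n + m))$ stated above.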

Let $uv$ be an edge of a digraph $G$, we say that $uv$ is redundant if there’s an alternative path from vertex $u$ to vertex $v$ avoiding $uv$, otherwise we say that $uv$ is not redundant, computing the strong bridges is equivalent to computing the non-redundant edges of a graph

http://www.sofsem.cz/sofsem12/files/presentations/Thursday/GiuseppeItaliano.pdf

Topological sorting of a graph

Wed, 24 Jun 2015 11:30:00 +0000

Let $G$ be a digraph, a topological sort of $G$ is a linear ordering of the vertices of $G$ such that for every directed edge $u \rightarrow v$ where $u,v \in V(G)$, $u$ comes before $v$ in the ordering, such an ordering exists if and only if the graph has no directed cycles

since the graph has no directed cycles, at least one of the vertices has no incoming edges

vector<bool> visited;
// adjacency list of G
vector<vector<int> > g;
vector<int> order;

void dfs(int v) {
  visited[v] = true;
  for (int i = 0; i < g[v].size(); i += 1) {
    int next = g[v][i];
    if (!visited[next]) {
      dfs(next);
    }
  }
  order.push_back(v);
}

/**
 * Given a graph `G` of order `n` and size `m` computes a linear ordering
 * of the vertices such that for every edge u -> v, `u` comes earlier than `v`
 * in the ordering
 *
 * Time complexity: O(n + m)
 * Space complexity: O(n)
 *
 */
void topological_sort() {
  int n = g.size();
  visited.assign(n, false);
  order.clear();

  for (int i = 0; i < visited.size(); i += 1) {
    if (!visited[i]) {
      dfs(i);
    }
  }

  reverse(order.begin(), order.end());
}
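
The observation that an acyclic digraph always has a vertex without incoming edges leads to a different way of computing the ordering, usually known as Kahn's algorithm: repeatedly output a vertex of in-degree $0$ and remove its outgoing edges. The sketch below is an alternative to the dfs version above, not a replacement, and the name `topological_sort_kahn` is only illustrative.

```cpp
#include <queue>
#include <vector>
using namespace std;

// returns a topological order of `g`, or an empty vector when `g`
// has a directed cycle (in that case some vertex never reaches in-degree 0)
vector<int> topological_sort_kahn(const vector<vector<int> > &g) {
  int n = g.size();
  vector<int> in_degree(n, 0);
  for (int v = 0; v < n; v += 1) {
    for (int next : g[v]) in_degree[next] += 1;
  }

  // vertices whose incoming edges were all removed already
  queue<int> ready;
  for (int v = 0; v < n; v += 1) {
    if (in_degree[v] == 0) ready.push(v);
  }

  vector<int> order;
  while (!ready.empty()) {
    int v = ready.front();
    ready.pop();
    order.push_back(v);
    // "remove" the edges leaving `v`
    for (int next : g[v]) {
      in_degree[next] -= 1;
      if (in_degree[next] == 0) ready.push(next);
    }
  }

  if ((int) order.size() != n) order.clear();
  return order;
}
```

Unlike the dfs version this one also detects cycles, at the cost of an extra array with the in-degrees.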

Applications

Shortest path in a Directed Acyclic Graph

// adjacency list of G
// (to, weight)
vector<vector<pair<int, int> > > g;

// topological sort states
vector<bool> visited;
vector<int> order;

// shortest path state
vector<int> dist;

void dfs(int v) {
  visited[v] = true;
  for (int i = 0; i < g[v].size(); i += 1) {
    int next = g[v][i].first;
    if (!visited[next]) {
      dfs(next);
    }
  }
  order.push_back(v);
}

void topological_sort() {
  int n = g.size();
  visited.assign(n, false);
  order.clear();

  for (int i = 0; i < visited.size(); i += 1) {
    if (!visited[i]) {
      dfs(i);
    }
  }

  reverse(order.begin(), order.end());
}

/**
 * Given a weighted graph `G` of order `n` and size `m` and a source vertex `source`
 * it computes the shortest distance between `source` and every other reachable
 * vertex from `source`
 *
 * Time complexity: O(n + m)
 * Space complexity: O(n)
 *
 */
void shortest_path_dag(int source) {
  int n = g.size();
  dist.assign(n, -1);

  topological_sort();
  dist[source] = 0;

  for(int i = 0; i < order.size(); i += 1) {
    int v = order[i];
    if (dist[v] >= 0) {
      for (int j = 0; j < g[v].size(); j += 1) {
        int to = g[v][j].first;
        int weight = g[v][j].second;
        int path_distance = dist[v] + weight;
        if (dist[to] < 0 || dist[to] > path_distance) {
          dist[to] = path_distance;
        }
      }
    }
  }
}

Traversal of graphs

Wed, 24 Jun 2015 11:00:00 +0000

Breadth First Search (BFS)

Given a graph $G$ and a distinguished source vertex $s$, BFS explores the edges of $G$ to discover every vertex reachable from $s$, visiting the vertices in increasing order of distance, as a consequence it also computes the distance (the minimum number of edges) from $s$ to each reachable vertex

vector<int> dist;
vector<int> parent;

/**
 * Traverses a graph `G` of order `n` and size `m` by breadth
 *
 * Time complexity: O(n + m)
 * Space complexity: O(n)
 *
 * @param {vector<vector<int> >} g The adjacency list representation
 * of a graph
 * @param {int} source The source vertex
 */
void bfs(vector<vector<int> > &g, int source) {
  int n = g.size();
  dist.assign(n, -1);
  parent.assign(n, -1);

  queue<int> remaining;
  dist[source] = 0;
  remaining.push(source);

  while (!remaining.empty()) {
    int current = remaining.front();
    remaining.pop();

    for (int i = 0; i < g[current].size(); i += 1) {
      int next = g[current][i];
      if (dist[next] == -1) {
        dist[next] = dist[current] + 1;
        parent[next] = current;
        remaining.push(next);
      }
    }
  }
}

Depth First Search (DFS)

Given a graph $G$ and a distinguished source vertex $s$, DFS explores the edges incident to $s$ and explores as far as possible along each branch before backtracking, to prevent infinite loops caused by visiting a vertex multiple times an additional state is used in each vertex which denotes if the vertex was visited before

Whenever a vertex $v$ is discovered by some vertex $u$, we say that $u$ is a predecessor of $v$, and also since every vertex can only have one predecessor (a vertex can only be visited once) during the traversal the algorithm forms a tree called the dfs tree

During the process of creation of the dfs tree the algorithm can also define timestamps on each vertex (an integer denoting the time an action happened)

$v_{in}$ recorded when $v$ is first discovered
$v_{out}$ recorded when the search finishes exploring $v$’s adjacent vertices

Properties

the number of descendants of any vertex $v$ is equal to $\tfrac{v_{out} - v_{in} - 1}{2}$
for any two vertices $u$ and $v$ exactly one of the following holds
if the interval $[u_{in}, u_{out}]$ and $[v_{in}, v_{out}]$ are disjoint intervals then neither $u$ is a descendant of $v$ nor $v$ a descendant of $u$ in the dfs tree
if the interval $[u_{in}, u_{out}]$ is contained in $[v_{in}, v_{out}]$ then $u$ is a descendant of $v$
if the interval $[v_{in}, v_{out}]$ is contained in $[u_{in}, u_{out}]$ then $v$ is a descendant of $u$
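
The interval properties give a constant time ancestor test once the timestamps are known. A minimal sketch, assuming the timestamps start at $1$ (so $0$ means "not discovered yet") and using the hypothetical helper names `stamp`, `is_descendant` and `count_descendants`:

```cpp
#include <vector>
using namespace std;

// records the discovery and finish time of every vertex reachable from `v`
void stamp(int v, const vector<vector<int> > &g, vector<int> &time_in,
           vector<int> &time_out, int &timer) {
  time_in[v] = ++timer;
  for (int next : g[v]) {
    if (time_in[next] == 0) stamp(next, g, time_in, time_out, timer);
  }
  time_out[v] = ++timer;
}

// true when [u_in, u_out] is contained in [v_in, v_out],
// note that a vertex is considered a descendant of itself here
bool is_descendant(int u, int v, const vector<int> &time_in,
                   const vector<int> &time_out) {
  return time_in[v] <= time_in[u] && time_out[u] <= time_out[v];
}

// every descendant consumes two ticks of the clock inside the interval of `v`
int count_descendants(int v, const vector<int> &time_in,
                      const vector<int> &time_out) {
  return (time_out[v] - time_in[v] - 1) / 2;
}
```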

Classification of edges

We can define four edge types produced by a DFS on $G$

Tree edges, an edge $uv$ is a tree edge if $v$ was first discovered by $u$
Back edges, an edge $uv$ is a back edge if it connects $u$ with an ancestor of $u$
Forward edges, an edge $uv$ is a forward edge if it connects $u$ with a descendant of $u$ (nontree edge)
Cross edges, all the other edges, e.g. an edge between branches in the dfs tree

We can identify these edges with an additional state stored in the vertices of the graph during the dfs tree process, the additional state will be $v_{color}$ and can have three possible values

$v_{color} = WHITE$ if a vertex wasn’t explored yet
$v_{color} = GRAY$ when a vertex is discovered first
$v_{color} = BLACK$ when a vertex has finished exploring its adjacent vertices

During the analysis of an edge we can take a look at the color of the adjacent vertex to determine the type of edge, given the edge $uv$ there are three possible outcomes

if $v_{color} = WHITE$ then $uv$ is a tree edge
if $v_{color} = GRAY$ then $uv$ is a back edge
if $v_{color} = BLACK$ then $uv$ is a forward/cross edge
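
The three outcomes can be sketched as a dfs that only counts how many edges of each kind it finds in a digraph. The `EdgeCounts` struct and the function names are assumptions made for the illustration; since a digraph has a directed cycle exactly when the dfs finds a back edge, the same counters give a cycle test.

```cpp
#include <vector>
using namespace std;

enum Color { WHITE, GRAY, BLACK };

struct EdgeCounts {
  int tree;
  int back;
  int forward_or_cross;
};

static void classify_from(int u, const vector<vector<int> > &g,
                          vector<Color> &color, EdgeCounts &counts) {
  color[u] = GRAY;  // discovered, still exploring its adjacent vertices
  for (int v : g[u]) {
    if (color[v] == WHITE) {
      counts.tree += 1;
      classify_from(v, g, color, counts);
    } else if (color[v] == GRAY) {
      counts.back += 1;  // `v` is an ancestor of `u` that is still open
    } else {
      counts.forward_or_cross += 1;  // `v` was finished before
    }
  }
  color[u] = BLACK;  // finished
}

// counts the edges of each type found by a dfs over the whole digraph
EdgeCounts classify_edges(const vector<vector<int> > &g) {
  vector<Color> color(g.size(), WHITE);
  EdgeCounts counts = {0, 0, 0};
  for (int u = 0; u < (int) g.size(); u += 1) {
    if (color[u] == WHITE) classify_from(u, g, color, counts);
  }
  return counts;
}

bool has_directed_cycle(const vector<vector<int> > &g) {
  return classify_edges(g).back > 0;
}
```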

Another way to determine the type of edge is by analyzing the states $u_{in}$ ($u_{out}$ is still undefined while the edges $uv$ leaving $u$ are being analyzed) and $v_{in}, v_{out}$ of the vertices incident to the edge, given an edge $uv$

if $v_{in}$ is not defined then $uv$ is a tree edge
if $v_{in}$ is defined and $v_{out}$ is not defined then $uv$ is a back edge
if $v_{in}$ is defined and $v_{out}$ is defined and $u_{in} < v_{in}$ then $uv$ is a forward edge
if $v_{in}$ is defined and $v_{out}$ is defined and $u_{in} > v_{in}$ then $uv$ is a cross edge

Additional properties of the edges

if $G$ is an undirected graph then every edge of $G$ is either a tree edge or a back edge during the exploration of the dfs tree
a directed graph $G$ is acyclic if and only if a dfs of $G$ finds no back edges

int time_spent = 0;

// the adjacency list of `G`
vector<vector<int> > g;
// the explored state of a vertex `i`
vector<bool> visited;
// the predecessor of a vertex `i` in the dfs tree
vector<int> predecessor;
// the time a vertex `i` was discovered first
vector<int> time_in;
// the time a vertex `i` finished exploring its adjacent vertices
vector<int> time_out;

/**
 * Traverses a graph `G` of order `n` and size `m` by depth,
 * it's assumed that `time_in`, `time_out`, `visited`, `predecessor`
 * are initialized correctly with a size equal to `n`
 *
 * Time complexity: O(n + m)
 * Space complexity: O(n)
 *
 * @param {int} v The current vertex being analyzed
 */
void dfs(int v) {
  visited[v] = true;
  time_in[v] = ++time_spent;

  for (int i = 0; i < g[v].size(); i += 1) {
    int next = g[v][i];

    // edge analysis
    if (!time_in[next]) {
      // edge (v, next) is a tree edge
    } else if (!time_out[next]) {
      // edge (v, next) is a back edge
    } else if (time_in[v] < time_in[next]) {
      // edge (v, next) is a forward edge
    } else {
      // edge (v, next) is a cross edge
    }

    // traversal to adjacent vertices
    if (!visited[next]) {
      predecessor[next] = v;
      dfs(next);
    }
  }

  time_out[v] = ++time_spent;
}

Introduction to Graph Theory

Mon, 22 Jun 2015 17:03:41 +0000

A graph is a pair $G = (V, E)$, it consists of a finite set $V$ of objects called the vertices (or nodes or points) and a set $E$ of 2-element subsets of $V$ called edges, another way to denote the vertex set and the edge set of a graph $G$ is $V(G)$ and $E(G)$

$$ \begin{align*} G &= (V, E) \\ V &= \{1, 2, 3, 4, 5, 6, 7\} \\ E &= \{ \{1, 5\}, \{5, 7\}, \{2, 3\}, \{2, 4\}, \{3, 4\} \} \end{align*} $$

Properties

the order of a graph is the number of vertices (written as $\mid G \mid$)
the size of a graph is the number of edges (written as $\Vert G \Vert$)
an edge $\{u, v\}$ is usually written as $uv$ (or $vu$), if $uv$ is an edge of $G$ then $u$ and $v$ are said to be adjacent in $G$
a vertex $v$ is incident with an edge e if $v \in e$
the set of neighbors of a vertex $v$ is denoted by $N(v)$
the degree of a vertex $v$ is the number of edges incident to $v$ (loops are counted twice)

Let $G = (V, E)$ and $G' = (V', E')$ be two graphs, we set $G \cup G' = (V \cup V', E \cup E')$ and $G \cap G' = (V \cap V', E \cap E')$

if $G \cap G' = \varnothing$ then $G$ and $G'$ are disjoint
if $V' \subseteq V$ and $E' \subseteq E$ then $G'$ is a subgraph of $G$ written as $G' \subseteq G$
if $G'$ is a subgraph of $G$ and either $V' \subset V$ or $E' \subset E$ then $G'$ is a proper subgraph of $G$
if $G'$ is a subgraph of $G$ and $V' = V$ then $G'$ is a spanning subgraph of $G$
if $V' \subseteq V$ and $E'$ contains every edge $e = uv \in E$ with $u \in V'$ and $v \in V'$ then $G'$ is an induced subgraph of $G$

Movement

A walk in a graph is a sequence of movements beginning at $u$ moving to a neighbor of $u$ and then to a neighbor of that vertex and so on until we stop at a vertex $v$

$$ W = (u = v_0, v_1, \ldots, v = v_k) $$

where $k \geq 0$, note that there are no restrictions on the vertices visited so a vertex can be visited more than once, there are also no restrictions on the edges traversed so an edge can be traversed more than once, every two consecutive vertices in $W$ are distinct since they are adjacent, if $u = v$ then we say that the walk is closed otherwise it’s open

a trail is a walk in which no edge is traversed more than once
a path is a walk in which no vertex is visited more than once (note that every path is also a trail)
a circuit is a closed trail of length 3 or more (it begins and ends at the same vertex but repeats no edges)
a cycle is a circuit that repeats no vertex (think of it as a closed path), a $k$-cycle is a cycle of length $k$ (e.g. a $3$-cycle is a triangle)

Properties related with path lengths

the distance between two vertices $u$ and $v$ is the smallest length of any $u - v$ path in $G$ denoted by $d(u, v)$
the diameter is the greatest distance between any two vertices of a connected graph
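
For an unweighted graph both values can be computed with a bfs. A minimal sketch, assuming an adjacency list representation of an undirected graph; `distances_from` and `diameter` are illustrative names and running a bfs from every vertex costs $O(n(n + m))$.

```cpp
#include <algorithm>
#include <queue>
#include <vector>
using namespace std;

// d(source, v) for every vertex `v`, -1 when `v` is not reachable
vector<int> distances_from(const vector<vector<int> > &g, int source) {
  vector<int> dist(g.size(), -1);
  queue<int> remaining;
  dist[source] = 0;
  remaining.push(source);
  while (!remaining.empty()) {
    int current = remaining.front();
    remaining.pop();
    for (int next : g[current]) {
      if (dist[next] == -1) {
        dist[next] = dist[current] + 1;
        remaining.push(next);
      }
    }
  }
  return dist;
}

// greatest distance between any two vertices,
// -1 when the graph is not connected
int diameter(const vector<vector<int> > &g) {
  int best = 0;
  for (int source = 0; source < (int) g.size(); source += 1) {
    vector<int> dist = distances_from(g, source);
    for (int d : dist) {
      if (d == -1) return -1;
      best = max(best, d);
    }
  }
  return best;
}
```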

Some additional properties related with connectivity in a graph

if a graph $G$ contains a $u-v$ path then $u$ and $v$ are said to be connected
a graph $G$ is connected if every two vertices of $G$ are connected
a connected subgraph of $G$ that is not a proper subgraph of any other connected subgraph of $G$ is a component of $G$
the number of components of a graph is denoted by $k(G)$, then a graph is connected if $k(G) = 1$
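
A possible way to compute $k(G)$ is to start a traversal from every vertex that wasn't visited yet, each traversal covers exactly one component. The sketch below assumes vertices numbered from $0$ to $n - 1$ and an edge list as input, `count_components` is a made up name.

```cpp
#include <utility>
#include <vector>
using namespace std;

// number of components k(G) of an undirected graph of order `n`
int count_components(int n, const vector<pair<int, int> > &edges) {
  vector<vector<int> > g(n);
  for (const pair<int, int> &e : edges) {
    g[e.first].push_back(e.second);
    g[e.second].push_back(e.first);
  }

  vector<bool> visited(n, false);
  int components = 0;
  for (int start = 0; start < n; start += 1) {
    if (visited[start]) continue;
    // `start` belongs to a component not seen before
    components += 1;
    vector<int> pending(1, start);
    visited[start] = true;
    while (!pending.empty()) {
      int v = pending.back();
      pending.pop_back();
      for (int next : g[v]) {
        if (!visited[next]) {
          visited[next] = true;
          pending.push_back(next);
        }
      }
    }
  }
  return components;
}
```

For the example graph at the beginning of this article (with the vertices renumbered to start at $0$) the function returns $3$, so that graph is not connected.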

Common classes of graphs

Complete graph

A graph $G$ is complete if every two distinct vertices of $G$ are adjacent

a complete graph of order $n$ is denoted by $K_n$, it has the maximum possible size for a graph of $n$ vertices
the number of pairs of vertices in $K_n$ is $\binom{n}{2} = \tfrac{n(n - 1)}{2}$

Sparse graph

A graph $G$ of order $n$ and size $m$ is a sparse graph if $m$ is close to the minimal number of edges i.e. when $m \approx n$, in this case adjacency lists are preferred since they require constant space for every edge, a tree is a good example of a sparse graph

Dense graph

A graph $G$ of order $n$ and size $m$ is a dense graph if $m$ is close to the maximal number of edges i.e. when $m \approx n^2$, in this case an adjacency matrix is preferred, a complete graph is a dense graph

Complement graph

The complement $\bar{G}$ of a graph $G$ is a graph whose vertex set is $V(G)$ and such that for each pair $u, v$ of distinct vertices, $uv \in E(\bar{G})$ if and only if $uv \not\in E(G)$, that means that the complement of a complete graph of order $n$ is a graph of order $n$ and size $0$

if $G$ is disconnected then $\bar{G}$ is connected

Bipartite graph

A graph $G$ is bipartite when the set $V(G)$ can be partitioned into two subsets $U$ and $W$ called partite sets such that every edge of $G$ joins a vertex of $U$ and a vertex of $W$

a nontrivial graph $G$ is bipartite if and only if it doesn’t contain odd length cycles
a complete bipartite graph is a bipartite graph where each vertex of $U$ is adjacent to every vertex of $W$, it’s denoted as $K_{\mid U \mid, \mid W \mid}$
a star is a complete bipartite graph of the form $K_{1, \mid W \mid}$ or $K_{\mid U \mid, 1}$
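
A common way to test whether a graph is bipartite is to try to build the partite sets with a bfs, placing every vertex on the opposite side of the vertex that discovered it, an edge joining two vertices on the same side closes an odd length cycle. A minimal sketch with the hypothetical name `is_bipartite`:

```cpp
#include <queue>
#include <vector>
using namespace std;

// tries to split V(G) into two partite sets (sides 0 and 1)
bool is_bipartite(const vector<vector<int> > &g) {
  int n = g.size();
  // -1 means that the vertex wasn't assigned to a side yet
  vector<int> side(n, -1);
  for (int start = 0; start < n; start += 1) {
    if (side[start] != -1) continue;
    side[start] = 0;
    queue<int> remaining;
    remaining.push(start);
    while (!remaining.empty()) {
      int current = remaining.front();
      remaining.pop();
      for (int next : g[current]) {
        if (side[next] == -1) {
          side[next] = 1 - side[current];
          remaining.push(next);
        } else if (side[next] == side[current]) {
          // both ends of the edge fall in the same partite set
          return false;
        }
      }
    }
  }
  return true;
}
```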

$k$-partite graph

A graph $G$ is $k$-partite when the set $V(G)$ can be partitioned into $k$ subsets $V_1, V_2, \ldots, V_k$ such that every edge of $G$ joins a vertex of $V_i$ and a vertex of $V_j$ with $i \neq j$

Biconnected graph

A biconnected graph $G$ is a connected and “nonseparable” graph meaning that if any vertex (and its incident edges) is removed the graph will remain connected, therefore a biconnected graph doesn’t have cut-vertices

Multigraphs

A multigraph $M$ is a graph where every two vertices of $M$ are joined by a finite number of edges, when two or more edges join the same pair of distinct vertices those edges are called parallel edges

Pseudographs

A pseudograph $P$ is a graph where an edge is allowed to join a vertex with itself, such an edge is called a loop

Weighted graph

Let $G$ be a multigraph, let’s replace all the parallel edges joining a particular pair of vertices by a single edge which is assigned a positive integer representing the number of parallel edges, this new representation is referred to as a weighted graph

Digraphs

A directed graph $D$ is a finite nonempty set $V$ of vertices and a set $E$ of ordered pairs of distinct vertices, the elements of the set $E$ are called directed edges or arcs, arcs are represented with arrows instead of plain line segments

if $uv$ is a directed edge then $u$ is adjacent to $v$ and $v$ is adjacent from $u$

Degrees

The degree of a vertex $v$ is the number of edges incident with $v$ denoted by $deg(v)$ (with loops counted twice)

the minimum degree of a graph $G$ is the minimum degree among all the vertices of $G$ denoted as $\delta(G)$
the maximum degree of a graph $G$ is the maximum degree among all the vertices of $G$ denoted as $\Delta(G)$
a vertex of degree $0$ is called an isolated vertex
a vertex of degree $1$ is called an end vertex (or leaf)
a vertex of even degree is called an even vertex
a vertex of odd degree is called an odd vertex
two vertices of $G$ that have the same degree are called regular vertices
if a graph $G$ has the same degree $r$ for all its vertices it’s called an r-regular graph

In a graph $G$ of $n$ vertices the following inequalities hold for every vertex $v$

$$ 0 \leq \delta(G) \leq deg(v) \leq \Delta(G) \leq n - 1 $$

First theorem of graph theory

if $G$ is a graph of size $m$ then

$$ \sum_{v \in V(G)} deg(v) = 2m $$

When summing the degrees of $G$ each edge is counted twice

Degrees in a bipartite graph

Suppose that $G$ is a bipartite graph with two partite sets $U$ and $W$, then

$$ \sum_{u \in U} deg(u) = \sum_{w \in W} deg(w) = m $$

Corollary: every graph has an even number of odd vertices

Proof: Let $G$ be a graph of size $m$, dividing $V(G)$ into two subsets $V_{even}$ which consists of even vertices and $V_{odd}$ which consists of odd vertices then by the first theorem of graph theory

$$ \sum_{v \in V_{even}(G)} deg(v) + \sum_{v \in V_{odd}(G)} deg(v) = 2m $$

the number $\sum_{v \in V_{even}(G)} deg(v)$ is even since it’s a sum of even numbers thus $\sum_{v \in V_{odd}(G)} deg(v)$ is also even and it can be even only if the number of odd vertices is even (a sum of two odd numbers gives an even number)
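
Both results are easy to check numerically. The sketch below assumes an edge list with vertices numbered from $0$ to $n - 1$, the helper names are made up for the illustration.

```cpp
#include <utility>
#include <vector>
using namespace std;

// degree of every vertex, each edge adds one to both of its ends
// (so a loop is counted twice)
vector<int> degrees(int n, const vector<pair<int, int> > &edges) {
  vector<int> deg(n, 0);
  for (const pair<int, int> &e : edges) {
    deg[e.first] += 1;
    deg[e.second] += 1;
  }
  return deg;
}

// should always be equal to two times the size of the graph
int degree_sum(const vector<int> &deg) {
  int sum = 0;
  for (int d : deg) sum += d;
  return sum;
}

// should always be an even number
int count_odd_vertices(const vector<int> &deg) {
  int odd = 0;
  for (int d : deg) {
    if (d % 2 == 1) odd += 1;
  }
  return odd;
}
```

For the graph with edges $\{0, 1\}, \{0, 2\}, \{0, 3\}, \{1, 3\}, \{2, 3\}, \{3, 4\}$ the degrees are $3, 2, 2, 4, 1$, their sum is $12 = 2 \cdot 6$ and there are two odd vertices.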

Degree sequences

If the degrees of the vertices of a graph $G$ are listed in a sequence $s$ then $s$ is called a degree sequence, e.g.

$$ s: 4, 3, 2, 2, 2, 1, 1, 1, 0 $$

Suppose we’re given a finite sequence $s$ of nonnegative integers, a well known problem is if we can build a graph out of this sequence, to solve this problem let’s talk about some facts

the degree of any vertex can never be greater than $n - 1$ where $n$ is the order of the graph
a graph has an even number of odd vertices

There’s a theorem called Havel-Hakimi which solves the problem above in polynomial time

A non-increasing sequence $s: d_1, d_2, \ldots, d_n$ where $d_1 \geq 1$ can form a graph if and only if the sequence

$$ s_1: d_2 - 1, d_3 - 1, \ldots, d_{d_1 + 1} - 1, d_{d_1 + 2}, \ldots, d_n $$

forms a graph

According to this theorem we can create a shorter sequence that forms a graph exactly when the original one does, we can apply the theorem recursively (sorting the sequence again after each step) to test if the original sequence forms a graph

$$ \begin{align*} s_1 &: 4, 3, 2, 2, 2, 1, 1, 1, 0 \\ s_2 &: 2, 1, 1, 1, 1, 1, 1, 0 \quad \text{removing $d_1 = 4$ and subtracting $1$ from the following $4$ elements} \\ s_3 &: 1, 1, 1, 1, 0, 0, 0 \quad \text{removing $d_1 = 2$, subtracting $1$ from the following $2$ elements and sorting again} \\ s_4 &: 1, 1, 0, 0, 0, 0 \quad \text{removing $d_1 = 1$, subtracting $1$ from the following element and sorting again} \\ s_5 &: 0, 0, 0, 0, 0 \quad \text{removing $d_1 = 1$ and subtracting $1$ from the following element} \end{align*} $$

bool graph_from_sequence(vector<int> &degrees) {
  int sum = 0;
  int size = degrees.size();
  for (int i = 0; i < size; i += 1) {
    sum += degrees[i];
    if (degrees[i] >= size || degrees[i] < 0) {
      // a vertex can have a maximum degree of n - 1
      // also none of the degrees can be negative
      return false;
    }
  }

  if (sum == 0) {
    // trivial case
    return true;
  }

  sort(degrees.begin(), degrees.end());

  // removing d_1
  int max_degree = degrees.back();
  degrees.pop_back();
  size -= 1;

  // subtracting 1 from the next d_1 elements
  for (int i = 0; i < max_degree; i += 1) {
    degrees[size - 1 - i] -= 1;
  }
  return graph_from_sequence(degrees);
}

Graphs and matrices

A graph can also be described using a matrix, the adjacency matrix of a graph $G$ of order $n$ and size $m$ is an $n \times n$ matrix $A = [a_{ij}]$ where

$$ a_{ij} = \begin{cases} 1 & \text{if $v_iv_j \in E(G)$} \\ 0 & \text{otherwise} \end{cases} $$

The incidence matrix of a graph $G$ of order $n$ and size $m$ is an $n \times m$ matrix $B = [b_{ij}]$ where

$$ b_{ij} = \begin{cases} 1 & \text{if $v_i$ is incident with $e_j$} \\ 0 & \text{otherwise} \end{cases} $$

$$ G = (V, E) \\ V = \{0, 1, 2, 3, 4\} \\ E = \{\{0, 1\}, \{0, 2\}, \{0, 3\}, \{1, 3\}, \{2, 3\}, \{3, 4\}\} \\ $$

$$ A = \begin{bmatrix} 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix} \quad B = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} $$
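
Both matrices can be built directly from the edge list. A minimal sketch, assuming vertices numbered from $0$ to $n - 1$ and that the edge $e_j$ is the $j$-th element of the list, the function names are illustrative.

```cpp
#include <utility>
#include <vector>
using namespace std;

// n x n matrix where a[i][j] = 1 if v_i v_j is an edge
vector<vector<int> > adjacency_matrix(int n, const vector<pair<int, int> > &edges) {
  vector<vector<int> > a(n, vector<int>(n, 0));
  for (const pair<int, int> &e : edges) {
    // the graph is undirected so the matrix is symmetric
    a[e.first][e.second] = 1;
    a[e.second][e.first] = 1;
  }
  return a;
}

// n x m matrix where b[i][j] = 1 if v_i is incident with e_j
vector<vector<int> > incidence_matrix(int n, const vector<pair<int, int> > &edges) {
  int m = edges.size();
  vector<vector<int> > b(n, vector<int>(m, 0));
  for (int j = 0; j < m; j += 1) {
    b[edges[j].first][j] = 1;
    b[edges[j].second][j] = 1;
  }
  return b;
}
```

Note that every column of the incidence matrix has exactly two ones and that the sum of row $i$ in either matrix is $deg(v_i)$.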

Useful observations

Let $A^k$ be the adjacency matrix of a graph $G$ raised to the $k$-th power, the entry $a_{ij}^{(k)}$ is the number of distinct $v_i - v_j$ walks of length $k$ in $G$

Proof: assume for a positive integer $k$ that the number of $v_i - v_j$ walks of length $k$ is given by $a_{ij}^{(k)}$ in the matrix $A^k$, then $A^{k + 1} = A^k \times A$, now a cell $a_{ij}^{(k + 1)}$ is the dot product of row $i$ of $A^k$ and column $j$ of $A$

$$ a_{ij}^{(k + 1)} = \sum_{t=1}^n a_{it}^{(k)} \cdot a_{tj} $$

The first element of this sum is the number of walks of length $k$ from $v_i$ to $v_1$ (stored in $a_{i1}^{(k)}$) times the number of walks of length $1$ from $v_1$ to $v_j$ (stored in $a_{1j}$), the second element follows the same formula but using $v_2$ as the vertex used to “join” the walks of length $k$ and $1$
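
The recurrence $A^{k + 1} = A^k \times A$ translates directly to code. The sketch below multiplies $k$ times (repeated squaring would be faster but hides the recurrence), uses 64-bit integers because the number of walks grows quickly, and the names `multiply`, `matrix_power` and `count_walks` are assumptions of the example.

```cpp
#include <vector>
using namespace std;

typedef vector<vector<long long> > Matrix;

// c[i][j] is the dot product of row `i` of `a` and column `j` of `b`
Matrix multiply(const Matrix &a, const Matrix &b) {
  int n = a.size();
  Matrix c(n, vector<long long>(n, 0));
  for (int i = 0; i < n; i += 1) {
    for (int j = 0; j < n; j += 1) {
      for (int t = 0; t < n; t += 1) {
        c[i][j] += a[i][t] * b[t][j];
      }
    }
  }
  return c;
}

// A^k, starting from the identity matrix (the walks of length 0)
Matrix matrix_power(const Matrix &a, int k) {
  int n = a.size();
  Matrix result(n, vector<long long>(n, 0));
  for (int i = 0; i < n; i += 1) {
    result[i][i] = 1;
  }
  for (int step = 0; step < k; step += 1) {
    result = multiply(result, a);
  }
  return result;
}

// number of v_i - v_j walks of length `k`
long long count_walks(const Matrix &adjacency, int i, int j, int k) {
  return matrix_power(adjacency, k)[i][j];
}
```

In a triangle there are two closed walks of length $2$ starting at any vertex (going to either neighbor and back) and one walk of length $2$ between two different vertices (through the third vertex).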

Integer Factorization

Sun, 14 Jun 2015 15:08:08 +0000

The fundamental theorem of arithmetic states that every integer greater than $1$ can be written uniquely (up to the order of the factors) as a product of primes

example:

$$ 65340 = 2^2 \cdot 3^3 \cdot 5 \cdot 11^2 $$

Trial division

Trial division is the simplest algorithm for factoring an integer, we assume that $s$ and $t$ are factors of a number $n$ such that $n = st$ and $s \leq t$ (note that $s$ and $t$ do not need to be prime numbers), when a divisor $s$ is found then $n / s$ is also a factor

vector<int> trial_division(int n) {
  for (int i = 2; i * i <= n; i += 1) {
    if (n % i == 0) {
      // `n` is a composite number
      return vector<int> {i, n / i};
    }
  }
  // n is a prime number
  return vector<int> {n};
}
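
The function above splits `n` only once, to obtain the full factorization promised by the fundamental theorem of arithmetic we can keep dividing by each divisor found. A minimal sketch, assuming $n \geq 2$, where `prime_factorization` is an illustrative name that returns `(prime, exponent)` pairs:

```cpp
#include <utility>
#include <vector>
using namespace std;

// prime factorization of `n` as (prime, exponent) pairs sorted by prime
vector<pair<int, int> > prime_factorization(int n) {
  vector<pair<int, int> > factors;
  // `p` is a 64-bit integer so that p * p doesn't overflow
  for (long long p = 2; p * p <= n; p += 1) {
    int exponent = 0;
    // every divisor found here is prime since the smaller
    // primes were already divided out of `n`
    while (n % p == 0) {
      n /= (int) p;
      exponent += 1;
    }
    if (exponent > 0) {
      factors.push_back(make_pair((int) p, exponent));
    }
  }
  if (n > 1) {
    // what remains has no divisor <= sqrt(n) so it's a prime
    factors.push_back(make_pair(n, 1));
  }
  return factors;
}
```

For $65340$ it returns $(2, 2), (3, 3), (5, 1), (11, 2)$ which matches the example above.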

Fermat factorization

Fermat’s observation was to write an integer as the difference of squares

$$ \begin{align} n &= x^2 - y^2 \label{fermat} \\ &= (x + y)(x - y) \end{align} $$

Assuming that $s$ and $t$ are odd factors of $n$ such that $n = st$ and $s \leq t$ we can find $x$ and $y$ such that

$$ \begin{align*} s &= x - y \\ t &= x + y \end{align*} $$

Adding both equations

$$ s + t = 2x \\ x = \frac{s + t}{2} $$

Also

$$ y = \frac{t - s}{2} $$

Since we assumed that $s$ and $t$ are odd numbers, their difference is an even number which is divisible by $2$ therefore $x$ and $y$ are integers, since $s > 1$ and $t \geq s$ we find that $x \geq 1$ and $y \geq 0$

From \eqref{fermat} we also know that $x = \sqrt{n + y^2}$ and hence $x \geq \sqrt{n}$. For the upper bound, $x = \tfrac{s + t}{2}$ and $s \leq t$, so $x \leq \tfrac{t + t}{2} = t \leq n$
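
A small worked example may help here; the number $5959$ is picked only for illustration. The search starts at $x = \lceil \sqrt{n} \rceil$ and increases $x$ until $x^2 - n$ is a perfect square:

$$ \begin{align*} \lceil \sqrt{5959} \rceil &= 78 \\ 78^2 - 5959 &= 125 \quad \text{(not a perfect square)} \\ 79^2 - 5959 &= 282 \quad \text{(not a perfect square)} \\ 80^2 - 5959 &= 441 = 21^2 \\ 5959 &= (80 - 21)(80 + 21) = 59 \cdot 101 \end{align*} $$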

Implementation notes: since $s$ and $t$ are odd numbers, their product $n$ is also an odd number, therefore the implementation below works with odd values of $n$

/**
 * Factorization of an odd number `n` based on Fermat's
 * factorization algorithm
 *
 * @param  {int} n
 * @return {vector<int>} a vector with two odd integers if `n` is not a
 * prime number, a single integer if `n` is a prime number
 */
vector<int> fermat_factorization(int n) {
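  // x = (s + t) / 2 is at most (n + 1) / 2, reached when s = 1 and t = n
  for (int x = (int) ceil(sqrt(n)); x <= (n + 1) / 2; x += 1) {
    // 64-bit arithmetic so that x * x does not overflow
    long long ySquared = (long long) x * x - n;
    // check if `ySquared` is a perfect square
    long long y = (long long) sqrt(ySquared);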
    if (y * y == ySquared) {
      int s = x - y;
      int t = x + y;
      // `s` must be > 1
      if (s != 1 && t != n) {
        return vector<int> {s, t};
      }
    }
  }
  // n is a prime number
  return vector<int> {n};
}

Pollard’s Rho factorization

Pollard’s Rho factorization is a probabilistic factorization algorithm based on the assumption that a number $n$ is a composite number and the following facts:

since $n$ is a composite number it must have a factor $d$ with $1 < d < n$
let $a$ and $b$ be two positive integers, if $a \equiv b \pmod{d}$ then the difference $a - b$ is a multiple of $d$; since $n$ is also a multiple of $d$, $d$ is a common divisor of $n$ and $a - b$, so $\gcd(a - b, n) \geq d > 1$. Whenever $1 < \gcd(a - b, n) < n$ we have found two factors of $n$: $\gcd(a - b, n)$ and $\tfrac{n}{\gcd(a - b, n)}$

Now the problem is reduced to finding $a$ and $b$ such that $\gcd(a - b, n) > 1$, we can use the following algorithm which picks random numbers in the range $[1, n - 1]$

let `n` be the number to be factorized
let `x` be an array of integers

x[0] = random integer in the range [1, n - 1]
while we haven't found two numbers such that `gcd(x[i] - x[j], n) > 1`
  x[i] = random integer in the range [1, n - 1]
  for all `j < i` and `j >= 0`
    if `gcd(x[i] - x[j], n) > 1`
      return x[i], x[j]

A simulation can find $a$ and $b$ with a probability of $\sim 50\%$ after $\sqrt{n}$ iterations. The algorithm above is not very helpful though, since at the $k$-th iteration we have to do $k - 1$ pairwise checks

Here’s another algorithm to pick random numbers, let $x$ be an integer in the range $[1, n - 1]$, a function that will generate a number in the range $[1, n - 1]$ based on a previous number is

$$ f(x) = x^2 + c \pmod{n} $$

Because there are only $n - 1$ possible values our generator will eventually fall into a cycle, for example let $n = 55, c = 2, x = 2$

$$ \begin{align*} x_0 &= 2 \\ x_1 &= (2^2 + 2) \pmod{55} = 6 \\ x_2 &= (6^2 + 2) \pmod{55} = 38 \\ x_3 &= (38^2 + 2) \pmod{55} = 16 \\ x_4 &= (16^2 + 2) \pmod{55} = 38 \text{ which is equal to $x_2$ } \end{align*} $$

Pollard detected the cycle using Floyd’s cycle-finding algorithm, which is based on two pointers that move through a sequence at different speeds: one moves one unit and the other moves two units each time. If there’s a cycle the two pointers will eventually meet at some element belonging to the cycle. At every step we check whether the pair of values currently under the pointers, $x_i$ and $x_{2i}$, fulfills $1 < \gcd(x_i - x_{2i}, n) < n$; if the pointers meet before such a pair is found we need to choose other values for $x_0$ and $c$ and rerun the algorithm

// C++11
#include <random>

/**
 * Computes a factor of `n` which is greater than `1`
 * @param {long long} n The number to be factorized
 * @return {long long} A positive integer which is a factor of `n`
 * when the algorithm successfully finds a factor of `n`, -1 otherwise
 */
long long pollard_rho(long long n) {
  if (n % 2 == 0) {
    return 2;
  }

  std::random_device rd;
  std::mt19937 engine(rd());
  std::uniform_int_distribution<long long> dis(1, n - 1);

  long long x = dis(engine);
  long long c = dis(engine);
  long long y = x;   // the hare starts at the same position as the tortoise
  do {
    // tortoise goes x = f(x)
    x = ((x * x) % n + c) % n;

    // hare goes y = f(f(y))
    y = ((y * y) % n + c) % n;
    y = ((y * y) % n + c) % n;

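    long long gcd = __gcd(abs(x - y), n);
    // gcd == n only happens when x == y, which is not a useful factor
    if (gcd > 1 && gcd < n) {
      return gcd;
    }
  } while (x != y);

  return -1;
}

/**
 * Pollard rho factorization runner, it makes multiple calls to
 * `pollard_rho` until a factor is found
 * @param {long long} n The number to be factorized
 * @return {long long} A factor of `n` (it is `n` when no proper factor
 * was found, e.g. when `n` is a prime number)
 */
long long pollard_rho_factorization(long long n) {
  // `pollard_rho` never succeeds for a prime `n`, so the number of
  // attempts is bounded (20 is an arbitrary choice)
  for (int attempt = 0; attempt < 20; attempt += 1) {
    long long factor = pollard_rho(n);
    if (factor > 0) {
      return factor;
    }
  }
  return n;
}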
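
To follow the tortoise and the hare on concrete numbers it helps to remove the randomness. The sketch below is a deterministic variant written for this purpose (the names `gcd_ll` and `pollard_rho_fixed` are made up here, and the starting value and the constant $c$ are passed in by the caller); it assumes $n < 2^{31}$ so that the products fit in 64 bits:

```cpp
#include <cstdlib>

// Greatest common divisor by Euclid's algorithm.
long long gcd_ll(long long a, long long b) {
  while (b != 0) {
    long long r = a % b;
    a = b;
    b = r;
  }
  return a;
}

// Floyd cycle detection over f(x) = x^2 + c (mod n), starting at `x0`.
// Returns a proper factor of `n`, or -1 when the tortoise and the hare
// meet before one is found.
long long pollard_rho_fixed(long long n, long long x0, long long c) {
  long long x = x0 % n;
  long long y = x0 % n;
  do {
    x = (x * x + c) % n;   // tortoise: one step
    y = (y * y + c) % n;   // hare: two steps
    y = (y * y + c) % n;
    long long d = gcd_ll(std::llabs(x - y), n);
    if (d > 1 && d < n) {
      return d;
    }
  } while (x != y);
  return -1;
}
```

With $n = 8051$, $x_0 = 2$ and $c = 1$ the third step compares $x_3 = 677$ with $x_6 = 871$ and finds $\gcd(194, 8051) = 97$. With the values of the example above ($n = 55$, $x_0 = 2$, $c = 2$) the pointers meet at $38$ on the second step without finding a factor, which is the case where other values of $x_0$ and $c$ are needed.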

Eratosthenes Sieve factorization of a range

We can also compute the factorization of a number by modifying the sieve of Eratosthenes. Remember that each state of the sieve holds a boolean telling if the number is prime or not; this time each state of the sieve will hold a pair of numbers:

the lowest prime that is a divisor of any index i
the maximum power of the lowest prime computed above (optional)

Let’s represent $n$ as $p_1^{a_1} \cdot p_2^{a_2} \ldots p_m^{a_m}$ with $p_1 < p_2 < \ldots < p_m$. Since we’re holding for each position the lowest prime and its maximum power, the state stored at the position $n$ of the sieve will be $p_1^{a_1}$; if we divide $n$ by this number we will move to the state $p_2^{a_2} \ldots p_m^{a_m}$. This recursive process is run until the current state reached in the sieve is $1$

// let `p` be the smallest prime factor of the index `i`, each element
// contains a pair `(p, x)` such that `p^x` is a divisor of `i`
// e.g.
//
//    8 = (2, 3)
//    15 = (3, 1)
//    6 = (2, 1)
//
vector<pair<int, int> > lp;

void eratosthenes_sieve_factorization(long long n) {
  pair<int, int> unvisited(-1, -1);

  // (-1, -1) is an unvisited state
  lp.assign(n + 1, unvisited);

  for (int i = 2; i * i <= n; i += 1) {
    if (lp[i] == unvisited) {
      // if an index is in an unvisited state it's a prime number
      pair<int, int> base(i, 1);
      lp[i] = base;
      for (int j = i * i; j <= n; j += i) {
        if (lp[j] == unvisited) {
          // if a multiple of the prime number is in an unvisited
          // state that means that it's lowest prime divisor is
          // the current index `i`
          lp[j] = base;
          if (lp[j / i] != unvisited) {
            // `j / i` was already visited, at this point that only
            // happens when its lowest prime is also `i` (e.g. `j = 12`,
            // `i = 2`, `j / i = 6`), to accumulate the powers it's enough
            // to add the power stored for `j / i`
            lp[j].second += lp[j / i].second;
          }
        }
      }
    }
  }

  // all the prime numbers > sqrt(n) are still in an unvisited state,
  // change their state to prime_number^1 (starting the scan at 2 also
  // covers the case n < 4 where the loop above never runs)
  for (int i = 2; i <= n; i += 1) {
    if (lp[i] == unvisited) {
      lp[i] = pair<int, int>(i, 1);
    }
  }
}
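
The sieve only builds the table; the walk described above (divide by the stored prime power until $1$ is reached) is not shown. A self-contained sketch of that walk follows. It is a simplified variant, not the structure used above: the table stores only the smallest prime factor and the power is recovered by repeated division. The names `smallest_prime_factors` and `factorize_with_sieve` are made up for this example:

```cpp
#include <utility>
#include <vector>

// spf[i] holds the smallest prime factor of i, for 2 <= i <= limit.
std::vector<int> smallest_prime_factors(int limit) {
  std::vector<int> spf(limit + 1, 0);
  for (int i = 2; i <= limit; i += 1) {
    if (spf[i] != 0) {
      continue;   // `i` already has a smaller prime factor
    }
    for (int j = i; j <= limit; j += i) {
      if (spf[j] == 0) {
        spf[j] = i;
      }
    }
  }
  return spf;
}

// Walks the table: strips the smallest prime of `n` until 1 is reached.
std::vector<std::pair<int, int>> factorize_with_sieve(
    int n, const std::vector<int> &spf) {
  std::vector<std::pair<int, int>> factors;
  while (n > 1) {
    int p = spf[n];
    int exponent = 0;
    while (n % p == 0) {
      n /= p;
      exponent += 1;
    }
    factors.push_back({p, exponent});
  }
  return factors;
}
```

After a single call to `smallest_prime_factors(limit)` every number up to `limit` can be factorized in time proportional to its number of prime factors.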

Divisor Function

Sat, 13 Jun 2015 14:29:59 +0000

The divisor function represented as $d(n)$ counts the number of divisors of an integer

example: $d(18)$

The numbers that divide $18$ are $1, 2, 3, 6, 9, 18$ then $d(18) = 6$

Important observations

if $p$ is a prime number then $d(p) = 2$, also $d(p^k) = k + 1$ because every power of $p$ is a divisor of $p^k$, e.g. $p^0, p^1, p^2, \ldots, p^k$
if $n$ is a product of two distinct primes, say $n = pq$ then $d(pq) = d(p) \cdot d(q)$, also $d(p^iq^j) = d(p^i) \cdot d(q^j)$
in general let $n = p_1^{a_1} \cdot p_2^{a_2} \cdot \ldots \cdot p_m^{a_m}$ then $d(n) = d(p_1^{a_1}) \cdot d(p_2^{a_2}) \cdot \ldots \cdot d(p_m^{a_m}) = (a_1 + 1)(a_2 + 1) \ldots (a_m + 1)$ where each $p_i$ is a distinct prime factor of $n$

example: $d(18)$

$$ \begin{align*} d(18) &= d(3^2 \cdot 2) \\ &= d(3^2) \cdot d(2) \\ &= 3 \cdot 2 \\ &= 6 \end{align*} $$

int number_of_divisors(int n) {
  int total = 1;
  for (int i = 2; i * i <= n; i += 1) {
    int power = 0;
    while (n % i == 0) {
      power += 1;
      n /= i;
    }
    total *= (power + 1);
  }
  if (n > 1){
    total *= 2;
  }
  return total;
}

Sum of divisors

The sum of divisors is another important quantity represented by $\sigma_k(n)$, it’s the sum of the $k$-th powers of the divisors of $n$

$$ \sigma_k(n) = \sum_{d|n} d^k $$

examples:

$$ \begin{align*} \sigma_0(18) &= 1^0 + 2^0 + 3^0 + 6^0 + 9^0 + 18^0 \\ &= 1 + 1 + 1 + 1 + 1 + 1 \\ &= 6 \end{align*} $$

So when $k = 0$ the sum of divisors function ($\sigma_0(n)$) is equal to $d(n)$, i.e. $\sigma_0(n)$ gives the number of divisors of $n$

another example:

$$ \begin{align*} \sigma_1(18) &= 1^1 + 2^1 + 3^1 + 6^1 + 9^1 + 18^1 \\ &= 1 + 2 + 3 + 6 + 9 + 18 \\ &= 39 \end{align*} $$

when $k = 1$ we actually get the function we expect (a function which sums the divisors); from here on $\sigma(n)$ without a subscript stands for $\sigma_1(n)$
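
The definition translates directly into a brute force loop, which is handy for checking small cases by hand. This is only a sketch of the definition (the name `sigma_k_brute_force` is made up here) and runs in $O(n \cdot k)$:

```cpp
// Sums d^k over every divisor d of n by checking each candidate d.
long long sigma_k_brute_force(int n, int k) {
  long long total = 0;
  for (int d = 1; d <= n; d += 1) {
    if (n % d != 0) {
      continue;
    }
    long long power = 1;
    for (int e = 0; e < k; e += 1) {
      power *= d;
    }
    total += power;
  }
  return total;
}
```

It reproduces both examples: `sigma_k_brute_force(18, 0)` is $6$ and `sigma_k_brute_force(18, 1)` is $39$.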

Important observations

if $p$ is a prime number then $\sigma(p) = 1 + p$ since the only divisors of a prime number are $1$ and $p$
if $p$ is a prime number then $\sigma(p^k) = 1 + p + p^2 + \ldots + p^k$ because every power of $p$ is a divisor of $p^k$, e.g. $p^0, p^1, p^2, \ldots, p^k$

Consider

$$ \begin{equation}\label{sigma-p-k} \sigma(p^k) = 1 + p + p^2 + \ldots + p^k \end{equation} $$

multiplying the expression by $p$ we have

$$ \begin{equation}\label{sigma-p-k-times-p} p \cdot \sigma(p^k) = p + p^2 + p^3 + \ldots + p^{k + 1} \end{equation} $$

subtracting \eqref{sigma-p-k} from \eqref{sigma-p-k-times-p}

$$ p \cdot \sigma(p^k) - \sigma(p^k) = p^{k + 1} - 1 $$

factoring $\sigma(p^k)$

$$ \sigma(p^k) (p - 1) = p^{k + 1} - 1 $$

hence

$$ \sigma(p^k) = \frac{p^{k + 1} - 1}{p - 1} $$

if $n$ is a product of two distinct primes, say $n = pq$ then $\sigma(pq) = \sigma(p) \cdot \sigma(q)$, also $\sigma(p^iq^j) = \sigma(p^i) \cdot \sigma(q^j)$
in general let $n = p_1^{a_1} \cdot p_2^{a_2} \cdot \ldots \cdot p_m^{a_m}$ then $\sigma(n) = \sigma(p_1^{a_1}) \cdot \sigma(p_2^{a_2}) \cdot \ldots \cdot \sigma(p_m^{a_m})$ where each $p_i$ is a distinct prime factor of $n$

example: $\sigma(18)$

$$ \begin{align*} \sigma(18) &= \sigma(3^2 \cdot 2) \\ &= \sigma(3^2) \cdot \sigma(2) \\ &= \frac{3^3 - 1}{3 - 1} \cdot \frac{2^2 - 1}{2 - 1} \\ &= 13 \cdot 3 \\ &= 39 \end{align*} $$

int sum_of_divisors(int n) {
  int total = 1;
  for (int i = 2; i * i <= n; i += 1) {
    if (n % i == 0) {
      int primePower = i;
      while (n % i == 0) {
        primePower *= i;
        n /= i;
      }
      // sigma(p^k) = (p^{k + 1} - 1) / (p - 1), here p = i and
      // primePower = i^{k + 1}
      total *= (primePower - 1) / (i  - 1);
    }
  }
  if (n > 1) {
    // if `n` is still a prime number after factorization
    // sigma(n) = 1 + n
    total *= (1 + n);
  }
  return total;
}

Primality Test

Thu, 11 Jun 2015 13:16:59 +0000

A prime number is a natural number greater than $1$ which has no positive divisors other than $1$ and itself

Naive test

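Let $n$ be the number we want to check for primality, if we find a natural number greater than $1$ and less than $n$ that is a divisor of $n$ then $n$ is not a prime

if a number $n$ is composite then it has a divisor $k$ with $1 < k \leq \sqrt{n}$ (divisors come in pairs $k \cdot \tfrac{n}{k} = n$ and both can’t be greater than $\sqrt{n}$), so it’s enough to test candidates up to $\sqrt{n}$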

Complexity: $O(\sqrt{n})$

bool is_prime(int n) {
  if (n == 2) {
    // 2 is a prime number
    return true;
  }
  if (n == 1 || (n % 2 == 0)) {
    // 1 or any multiple of 2 is not a prime number
    return false;
  }
  for (int i = 3; i * i <= n; i += 2) {
    // check whether any odd number <= sqrt(n) is a divisor of `n`
    if (n % i == 0) {
      return false;
    }
  }
  return true;
}

Eratosthenes Sieve

If we have to answer many queries that check if numbers less than some number $n$ are prime we can preprocess them using the Eratosthenes Sieve and answer each query in $O(1)$
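
A minimal sketch of that preprocessing step (the name `eratosthenes_sieve` and the `std::vector<bool>` return type are choices made for this example): every multiple of each prime found is crossed out, starting from its square.

```cpp
#include <vector>

// is_prime[i] is true exactly when i is prime, for 0 <= i <= limit.
std::vector<bool> eratosthenes_sieve(int limit) {
  std::vector<bool> is_prime(limit + 1, true);
  is_prime[0] = false;
  if (limit >= 1) {
    is_prime[1] = false;
  }
  for (long long i = 2; i * i <= limit; i += 1) {
    if (!is_prime[i]) {
      continue;
    }
    // smaller multiples of `i` were crossed out by smaller primes
    for (long long j = i * i; j <= limit; j += i) {
      is_prime[j] = false;
    }
  }
  return is_prime;
}
```

Once the table is built, a query is a single lookup such as `is_prime[29]`.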

Fermat primality test

Fermat’s little theorem

If $a$ is an integer, $p$ a prime number where $0 < a < p$ then $$ a^p \equiv a \pmod{p} $$

or alternatively

$$ a^{p-1} \equiv 1 \pmod{p} $$

Proofs of this theorem can be found here

Some examples

$$ 3^{5 - 1} \equiv 81 \equiv 1 \pmod{5} \\ 3^{11 - 1} \equiv 59049 \equiv 1 \pmod{11} $$

The converse of this theorem is not always true, i.e. the following statement is false in general

If $$ a^{n - 1} \equiv 1 \pmod{n} $$ for some value of $0 < a < n$ then $n$ is prime

an example:

$$ 5^{561 - 1} \equiv 1 \pmod{561} \text{ but $561 = 3 \cdot 11 \cdot 17$ } $$

but:

$$ 3^{561 - 1} \equiv 375 \pmod{561} $$

we can’t use the theorem directly to test if a number is prime since there’s a chance that the input is one of these special numbers (called Carmichael numbers) and the algorithm will give false positives e.g. $a = 5, p = 561$

what we can do is run the algorithm multiple times increasing the probability of finding a number $a$ such that $a^{p - 1} \not\equiv 1 \pmod{p}$ thus proving that $p$ is composite

// C++11
#include <random>

bool is_probably_prime(unsigned long long p, int iterations) {
  // 3 is handled here too because the distribution below needs 2 <= p - 2
  if (p == 2 || p == 3) {
    return true;
  }
  if (p % 2 == 0 || p == 1) {
    return false;
  }

  std::random_device rd;
  std::mt19937 engine(rd());
  std::uniform_int_distribution<long long> dis(2, p - 2);
  while (iterations--) {
    // choose an integer between 2 and p - 2
    long long a = dis(engine);
    if (binary_exponentiation_modulo_m(a, p - 1, p) != 1) {
      return false;
    }
  }
  return true;
}

No matter how many iterations we use in the algorithm above there’s a chance that Fermat’s little theorem holds for each of the chosen bases $a_1, a_2, \ldots, a_i$ even though the input is composite, therefore this test is not used in practice
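
To get a feeling for how bad this can be, the sketch below counts how many bases pass the Fermat check for a given $n$. The names `pow_mod` and `count_fermat_passing_bases` are made up for this example, and `pow_mod` is a local helper that assumes the modulus is below $2^{31}$:

```cpp
// a^k mod m by repeated squaring; assumes m < 2^31 so that every
// product fits in 64 bits.
long long pow_mod(long long a, long long k, long long m) {
  long long result = 1 % m;
  a %= m;
  while (k > 0) {
    if (k & 1) {
      result = (result * a) % m;
    }
    a = (a * a) % m;
    k >>= 1;
  }
  return result;
}

// Number of bases 0 < a < n with a^(n - 1) ≡ 1 (mod n).
int count_fermat_passing_bases(int n) {
  int count = 0;
  for (int a = 1; a < n; a += 1) {
    if (pow_mod(a, n - 1, n) == 1) {
      count += 1;
    }
  }
  return count;
}
```

For the Carmichael number $561$ this gives $320$ out of $560$ bases, which is every base coprime to $561$; only the bases that share a factor with $561$ reveal that it is composite. For an ordinary composite such as $15$ only $4$ bases pass.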

Euler primality test

Euler primality test is an improvement over the Fermat primality test because it adds another condition that a prime number must fulfill:

If $a$ is an integer, $p$ a prime number where $0 < a < p$, $p > 2$ then $$ a^{\tfrac{p - 1}{2}} \equiv \pm 1 \pmod{p} $$

The motivation for this definition comes from the fact that any prime $> 2$ is an odd number, then the prime number can be expressed as $2q + 1$ where $q$ is an integer thus

$$ a^{(2q + 1) - 1} \equiv 1 \pmod{p} $$

which means that

$$ a^{2q} - 1 \equiv 0 \pmod{p} $$

this can be factored as

$$ (a^q - 1)(a^q + 1) \equiv 0 \pmod{p} $$

because $p$ is prime it must divide one of the two factors, therefore $a^q$ is congruent to one of two possible values, $1$ or $-1$. Going back to the definition of $q$, $2q + 1 = p$ we can find the value of $q$ as $q = \tfrac{(p - 1)}{2}$

Expressing Euler’s primality test formally:

If $a^{(n - 1) / 2} \not\equiv \pm 1 \pmod n$ where $gcd(a, n) = 1$ then $n$ must be a composite number for one of the following reasons:

if $a^{n - 1} \not\equiv 1 \pmod{n}$ then $n$ must be composite by Fermat’s Little Theorem
if $a^{n - 1} \equiv 1 \pmod{n}$ then $n$ must be composite because for a prime $n$ the value $a^{(n - 1) / 2}$, which is a square root of $a^{n - 1} \pmod{n}$, must fulfill the equivalence $a^{(n - 1) / 2} \equiv \pm 1 \pmod n$, which is a contradiction to the statement above

This test also has some false positives e.g.

$$ 2^{(341 - 1)/2} \equiv 1 \pmod{341} \text{ but $341 = 11 \cdot 31$ } $$

Miller-Rabin primality test

The Miller-Rabin primality test is quite similar to Euler’s primality test, but instead of looking at the square root of $a^{n - 1}$ it looks at the sequence of square roots/powers of two derived from $a^{n - 1}$

Let $2^s$ be the largest power of $2$ that divides $n - 1$, then $n - 1 = 2^s \cdot q$ for some odd integer $q$, the sequence of powers of two that divide $n - 1$ is

$$ 2^0, 2^1, \ldots, 2^s $$

We know from Euler’s primality test that if $a^{n - 1} \equiv 1 \pmod{n}$ then $a^{(n - 1) / 2} \equiv \pm 1 \pmod{n}$, let’s say that $a^{(n - 1) / 2} \equiv 1 \pmod{n}$ then also because of Euler’s primality test $a^{(n - 1) / 2^2} \equiv \pm 1 \pmod{n}$, what this says is that as long as we can take the square root of some $a^{(n - 1) / 2^i} \equiv 1 \pmod{n}$ the result must be $\pm 1$ otherwise it’s a composite number by Euler’s primality test

The base case occurs when we cannot take the square root of some $a^{\tfrac{n - 1}{2^i}} \pmod{n}$ i.e. when $\tfrac{n - 1}{2^i}$ is no longer divisible by $2$ which is exactly the number $q$, for this base case we’re sure of something, if $a^q \equiv \pm 1 \pmod{n}$ then it means that it’s the square root of $a^{2q} \equiv 1 \pmod{n}$ (obviously $2q \leq n - 1$ because $n - 1$ is even and must be divisible by at least $2$)

If $a^q \not\equiv \pm 1 \pmod{n}$ we have to analyze $a^{2q} \pmod{n}$ and there are three possible outcomes:

$a^{2q} \equiv 1 \pmod{n}$ which by Euler’s primality test implies that $a^q \equiv \pm 1 \pmod{n}$ which contradicts the statement above, therefore $n$ is composite
$a^{2q} \equiv -1 \pmod{n}$ which means that it’s a square root of $a^{4q} \equiv 1 \pmod{n}$, and successive squaring will eventually reach $a^{n - 1} \equiv 1 \pmod{n}$, therefore we can say that $n$ is a probable prime
$a^{2q} \not\equiv \pm 1 \pmod{n}$ which is the same situation as the statement above (therefore we have to keep analyzing the next element in the sequence, $a^{4q}$, as long as its exponent stays below $n - 1$)

// C++11
#include <random>

bool miller_rabin_primality_test(long long a, long long n) {
  int s = 0;
  long long q = n - 1;
  while (q % 2 == 0) {
    q /= 2;
    s += 1;
  }
  long long m = binary_exponentiation_modulo_m(a, q, n);
  if (m == 1 || m == n - 1) {
    // base case a^q ≡ ±1 (mod n)
    return true;
  }
  // square at most s - 1 times: a^{2q}, a^{4q}, ..., a^{(n - 1) / 2}
  for (int i = 1; i < s; i += 1) {
    // a^{2^iq} (mod n)
    m = (m * m) % n;
    if (m == n - 1) {
      return true;
    }
  }
  return false;
}

bool is_probably_prime(long long p, int iterations) {
  // NOTE: the primes 2 and 3 are tested separately because of
  // the distribution limits (2, p - 2)
  if (p == 2 || p == 3) {
    return true;
  }
  if (p % 2 == 0 || p == 1) {
    return false;
  }
  std::random_device rd;
  std::mt19937 engine(rd());
  std::uniform_int_distribution<long long> dis(2, p - 2);
  while (iterations--) {
    // choose an integer between 2 and p - 2
    long long a = dis(engine);
    if (!miller_rabin_primality_test(a, p)) {
      return false;
    }
  }
  return true;
}
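
When the inputs are known to be small the random bases can be swapped for a fixed set. The sketch below is a different, deterministic variant and not the code above: it relies on the known result that testing the bases $2, 3, 5$ and $7$ is exact for $n < 3{,}215{,}031{,}751$. The names `pow_mod_u64` and `is_prime_fixed_bases` are made up for this example, and the 32-bit input type is chosen so that products fit in 64 bits:

```cpp
#include <cstdint>

// a^k mod m by repeated squaring; m < 2^32 keeps every product below 2^64.
std::uint64_t pow_mod_u64(std::uint64_t a, std::uint64_t k, std::uint64_t m) {
  std::uint64_t result = 1 % m;
  a %= m;
  while (k > 0) {
    if (k & 1) {
      result = (result * a) % m;
    }
    a = (a * a) % m;
    k >>= 1;
  }
  return result;
}

// Miller-Rabin with the fixed bases 2, 3, 5 and 7. The answer is only
// guaranteed to be exact for n < 3,215,031,751.
bool is_prime_fixed_bases(std::uint32_t n) {
  if (n < 2) {
    return false;
  }
  const std::uint32_t bases[] = {2, 3, 5, 7};
  for (std::uint32_t p : bases) {
    if (n == p) return true;
    if (n % p == 0) return false;
  }
  // n - 1 = 2^s * q with q odd
  std::uint32_t q = n - 1;
  int s = 0;
  while (q % 2 == 0) {
    q /= 2;
    s += 1;
  }
  for (std::uint32_t a : bases) {
    std::uint64_t m = pow_mod_u64(a, q, n);
    if (m == 1 || m == n - 1) {
      continue;   // a^q ≡ ±1: this base gives no evidence against n
    }
    bool found_minus_one = false;
    for (int i = 1; i < s && !found_minus_one; i += 1) {
      m = (m * m) % n;
      found_minus_one = (m == n - 1);
    }
    if (!found_minus_one) {
      return false;   // `a` is a witness: n is composite
    }
  }
  return true;
}
```

Both pseudoprimes mentioned earlier, $341$ and $561$, are reported as composite by this version.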

Prime factors of a factorial

Tue, 09 Jun 2015 14:00:03 +0000

Given a number $n$ and a prime number $k$ find the greatest power of $k$ that divides $n!$

Writing the factorial expression explicitly

$$ n! = 1 \cdot 2 \cdot 3 \ldots (n - 1) \cdot n $$

We can see that every $k$-th member of the factorial is divisible by $k$ therefore one answer is $\left \lfloor \tfrac{n}{k} \right \rfloor$, however we can also see that every $k^2$-th term is also divisible by $k$ two times and it gives one more term to the answer, that is $\left \lfloor \tfrac{n}{k^2} \right \rfloor$, which means that every $k^i$-th term adds one factor to the answer, thus the answer is

$$ \left \lfloor \frac{n}{k} \right \rfloor + \left \lfloor \frac{n}{k^2} \right \rfloor + \ldots + \left \lfloor \frac{n}{k^i} \right \rfloor + \ldots $$

The sum is actually finite and the maximum value of $i$ can be found using logarithms: the terms vanish once $k^i > n$, applying logarithms we have $i \cdot \log(k) > \log(n)$ which is equal to $i > \tfrac{\log(n)}{\log(k)}$ which is the same as $i > \log_k n$

The sum discovered by Adrien-Marie Legendre is called Legendre’s Formula, let $d_a(b)$ be the number of times $a$ divides $b$

$$ d_k(n!) = \sum_{i=1}^{\lfloor \log_k{n} \rfloor} \left \lfloor \frac{n}{k^i} \right \rfloor $$
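
As a quick check of the formula, take $n = 10$ and $k = 2$ (values picked only for illustration):

$$ d_2(10!) = \left \lfloor \frac{10}{2} \right \rfloor + \left \lfloor \frac{10}{4} \right \rfloor + \left \lfloor \frac{10}{8} \right \rfloor = 5 + 2 + 1 = 8 $$

which agrees with $10! = 3628800 = 2^8 \cdot 14175$, where $14175$ is odd.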

/**
 * Computes the maximum power of `k` that is a divisor of `n!`
 *
 * @param {int} n
 * @param {int} k
 * @return {int}
 */
int max_power_in_factorial(int n, int k) {
  int ans = 0;
  while (n) {
    n /= k;
    ans += n;
  }
  return ans;
}

Special factorial modulo p

Tue, 09 Jun 2015 14:00:03 +0000

Let $n!_{\%p}$ be a special factorial where $n!$ is divided by the maximum power of $p$ that divides $n!$

$$ n!_{\%p} = \frac{n!}{p^{d_p(n!)}} $$

Where $d_p(n!)$ is called Legendre’s Formula which is explained in detail in this article

Compute $n!_{\%p} \pmod{p}$ given that $p$ is a prime number

First let’s write this special factorial explicitly

$$ \begin{equation} \label{explicit} n!_{\%p} = \tfrac{1 \cdot 2 \cdot \ldots \cdot (p - 1) \cdot p \cdot (p + 1) \cdot \ldots \cdot (2p - 1) \cdot 2p \cdot (2p + 1) \cdot \ldots \cdot (kp - 1) \cdot kp \cdot (kp + 1) \cdot \ldots \cdot (n - 1) \cdot n}{p^{ \tfrac{n}{p} + \tfrac{n}{p^2} + ... }} \end{equation} $$

The number $kp$ is a number that is divisible by $p$, we also see that $k$ might be a composite number that could be divisible by $p$ again

Now let’s first divide the equation by $p^{ \tfrac{n}{p} }$ which is exactly the number of multiples of $p$

$$ n!_{\%p} = \tfrac{1 \cdot 2 \cdot \ldots \cdot (p - 1) \cdot 1 \cdot (p + 1) \cdot \ldots \cdot (2p - 1) \cdot 2 \cdot (2p + 1) \cdot \ldots \cdot (kp - 1) \cdot k \cdot (kp + 1) \cdot \ldots \cdot (n - 1) \cdot n}{p^{ \tfrac{n}{p^2} + ... }} $$

If we apply the modulo operation to each term except the multiples of $p$ we have

$$ n!_{\%p} = \tfrac{1 \cdot 2 \cdot \ldots \cdot (p - 1) \cdot 1 \cdot 1 \cdot 2 \cdot \ldots \cdot (p - 1) \cdot 2 \cdot 1 \cdot 2 \cdot \ldots \cdot (p - 1) \cdot p \cdot 1 \cdot 2 \cdot \ldots \cdot (p - 1) \cdot kp}{p^{ \tfrac{n}{p^2} + ... }} \cdot 1 \cdot 2 \cdot \ldots \cdot (n \bmod p) $$

NOTE: we’re not applying the modulo operator to each multiple of $p$ because they don’t actually exist since there are no $p$ factors in the equation, they are reduced with posterior divisions by $p^{ \tfrac{n}{p^i} }$

NOTE: the number $kp$ described in \eqref{explicit} just denotes a multiple of $p$

We see that the expression $1 \cdot 2 \cdot \ldots \cdot (p - 1)$ is repeated many times in the equation above + a product of some additional terms which don’t form an entire sequence, let $c = 1 \cdot 2 \cdot \ldots \cdot (p - 1)$ then

$$ n!_{\%p} = \tfrac{1c \cdot 2c \cdot \ldots \cdot (p - 1)c \cdot pc \cdot (p + 1)c \cdot \ldots \cdot (kp - 1)c \cdot kpc}{p^{ \tfrac{n}{p^2} + ... }} \cdot 1 \cdot 2 \cdot \ldots \cdot (n \bmod p) $$

Since each $c$ factor occurs in every contiguous sequence of length $p$ there are exactly $\left \lfloor \tfrac{n}{p} \right \rfloor$ $c$ factors, factoring $c$ we have

$$ n!_{\%p} = c^{\left \lfloor \tfrac{n}{p} \right \rfloor} \cdot \tfrac{1 \cdot 2 \cdot \ldots \cdot (p - 1) \cdot p \cdot (p + 1) \cdot \ldots \cdot (2p - 1) \cdot 2p \cdot (2p + 1) \cdot \ldots \cdot (kp - 1) \cdot kp}{p^{ \tfrac{n}{p^2} + ... }} \cdot 1 \cdot 2 \cdot \ldots \cdot (n \bmod p) $$

Note that the fraction multiplying $c^{\left \lfloor \tfrac{n}{p} \right \rfloor}$ has the same shape as \eqref{explicit}, with $\left \lfloor \tfrac{n}{p} \right \rfloor$ playing the role of $n$; we now have to divide it by $p^{ \tfrac{n}{p^2} }$ which is exactly the number of multiples of $p^2$ (NOTE: $kp$ is a multiple of $p$ but might/might not be a multiple of $p^2$)

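This observation leads to a recursive implementation: the same steps are applied to $\left \lfloor \tfrac{n}{p} \right \rfloor$, and the leftover terms $1 \cdot 2 \cdot \ldots \cdot (n \bmod p)$ are multiplied in at each level

$$ n!_{\%p} \equiv c^{\left \lfloor \tfrac{n}{p} \right \rfloor} \cdot (n \bmod p)! \cdot \left \lfloor \tfrac{n}{p} \right \rfloor !_{\%p} \pmod{p} $$

By Wilson’s theorem $c = (p - 1)! \equiv -1 \equiv p - 1 \pmod{p}$, so $c$ doesn’t need to be computed with a loop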

Complexity: $O(p \log_p{n})$

long long special_factorial_mod_p(long long n, long long p) {
  long long res = 1;

  // computation of c, by Wilson's theorem (p - 1)! ≡ p - 1 (mod p)
  long long c = p-1;

  while (n > 1) {
    res = (res * binary_exponentiation_modulo_m(c, n / p, p)) % p;
    for (long long i = 2; i <= n % p; i += 1) {
      res = (res * (long long)i) % p;
    }
    n /= p;
  }
  return res % p;
}
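
Since the derivation is easy to get wrong, a brute force version is useful as a cross-check for small inputs. The sketch below is self-contained and only meant for testing: `special_factorial_brute_force` follows the definition directly, and `special_factorial_recurrence` mirrors the loop above but uses the fact that $(p - 1)^{\lfloor n / p \rfloor}$ only depends on the parity of the exponent. Both names are made up for this example:

```cpp
// n! with every factor of the prime p removed, reduced modulo p,
// computed directly from the definition (only sensible for small n).
long long special_factorial_brute_force(long long n, long long p) {
  long long res = 1;
  for (long long i = 1; i <= n; i += 1) {
    long long term = i;
    while (term % p == 0) {
      term /= p;   // strip the factors of p from i
    }
    res = (res * (term % p)) % p;
  }
  return res;
}

// Same value through the recurrence
//   n!_{%p} ≡ (p - 1)^(n / p) * (n mod p)! * (n / p)!_{%p}  (mod p)
// where (p - 1)! ≡ p - 1 (mod p) is Wilson's theorem.
long long special_factorial_recurrence(long long n, long long p) {
  long long res = 1;
  while (n > 1) {
    if ((n / p) % 2 == 1) {
      res = (res * (p - 1)) % p;   // (p - 1)^(n / p) is p - 1 for odd powers
    }
    for (long long i = 2; i <= n % p; i += 1) {
      res = (res * i) % p;
    }
    n /= p;
  }
  return res % p;
}
```

For example $10! = 3^4 \cdot 44800$ and $44800 \equiv 1 \pmod{3}$, and both functions return $1$ for $n = 10$, $p = 3$.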

Applications

Finding the value of $nCr \bmod p$

We can quickly calculate the value of $nCr \bmod p$: first compute the maximum exponents of $p$ in $n!$, $(n - r)!$ and $r!$, let those powers be $p^a$, $p^b$ and $p^c$ then $nCr$ can be expressed as

$$ nCr = \frac{p^a \cdot \ldots}{p^b \cdot p^c \ldots} $$

Which means that $nCr$ will be a multiple of $p$ when $a - b - c > 0$, if $a - b - c = 0$ then the number is congruent to

$$ nCr \equiv \frac{n!_{\%p}}{(n - r)!_{\%p} \cdot r!_{\%p}} \pmod{p} $$

NOTE: $a - b - c$ can never be less than zero, that would imply that $nCr$ is not an integer

The denominator can be found using the modular multiplicative inverse of $(n - r)!_{\%p}$ and $r!_{\%p}$
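
A small worked example, with $n = 7$, $r = 2$ and $p = 5$ picked only for illustration (the real value is $\binom{7}{2} = 21 \equiv 1 \pmod{5}$):

$$ \begin{align*} a &= d_5(7!) = 1, \quad b = d_5(5!) = 1, \quad c = d_5(2!) = 0 \quad \Rightarrow \quad a - b - c = 0 \\ 7!_{\%5} &= 1008 \equiv 3, \quad 5!_{\%5} = 24 \equiv 4, \quad 2!_{\%5} = 2 \pmod{5} \\ \binom{7}{2} &\equiv 3 \cdot (4 \cdot 2)^{-1} \equiv 3 \cdot 3^{-1} \equiv 3 \cdot 2 \equiv 1 \pmod{5} \end{align*} $$

On the other hand for $n = 10$, $r = 3$ and $p = 3$ we get $a = 4$, $b = d_3(7!) = 2$ and $c = d_3(3!) = 1$, so $a - b - c = 1 > 0$ and $\binom{10}{3} = 120$ is a multiple of $3$.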

long long nCr_mod_p(int n, int r, int p) {
  int a = max_power_in_factorial(n, p);
  int b = max_power_in_factorial(n - r, p);
  int c = max_power_in_factorial(r, p);
  if (a > b + c) {
    return 0;
  }

  return (special_factorial_mod_p(n, p) *
    ((modular_multiplicative_inverse(special_factorial_mod_p(n - r, p), p) *
    modular_multiplicative_inverse(special_factorial_mod_p(r, p), p)) % p) % p);
}

Problems to solve

Codechef - CB01

Discrete Logarithm

Mon, 08 Jun 2015 12:11:38 +0000

Let $a$, $b$ and $x$ be positive real numbers such that

$$ a^x = b $$

And we want to find the value of $x$, applying logarithms

$$ x \cdot \log(a) = \log(b) $$

Finally

$$ x = \frac{\log(b)}{\log(a)} $$

The discrete logarithm problem is an analogue of this problem with the condition that all the numbers exist in the ring of integers modulo $n$

Let $a$, $b$ and $n$ be integers, where $a$ and $n$ are coprime, find the value of $x$ in

$$ a^x \equiv b \pmod{n} $$

Trial multiplication

The brute force algorithm consists in computing all possible $a^i \pmod{n}$, where $0 \leq i < n$, until one matches $b$

Example: given $n = 11$, $a = 2$, $b = 9$ find the value of $x$ in $a^x \equiv b \pmod{n}$

$$ \begin{align*} a^0 &\equiv 1 \pmod{11} \\ a^1 &\equiv 2 \pmod{11} \\ a^2 &\equiv 4 \pmod{11} \\ a^3 &\equiv 8 \pmod{11} \\ a^4 &\equiv 16 \equiv 5 \pmod{11} \\ a^5 &\equiv 32 \equiv 10 \pmod{11} \\ a^6 &\equiv 64 \equiv 9 \pmod{11} \end{align*} $$

$x = 6$ is a solution to the problem
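
The table above can be produced with a loop that keeps a running power. This is a minimal sketch (the name `discrete_log_trial` is made up here) that assumes $n < 2^{31}$ so that the product fits in 64 bits:

```cpp
// Smallest x in [0, n) with a^x ≡ b (mod n), or -1 when there is none.
long long discrete_log_trial(long long a, long long b, long long n) {
  long long power = 1 % n;   // a^0
  a %= n;
  b %= n;
  for (long long x = 0; x < n; x += 1) {
    if (power == b) {
      return x;
    }
    power = (power * a) % n;
  }
  return -1;
}
```

`discrete_log_trial(2, 9, 11)` returns $6$ as in the example, and the function returns $-1$ for cases like $2^x \equiv 3 \pmod{7}$ where the powers of $2$ only take the values $1, 2, 4$.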

Baby Step Giant Step

The idea of Shanks’ baby step giant step algorithm is based on rewriting $x$ in the congruence above as $x = im + j$ where $m = \lceil \sqrt{n} \rceil$, $0 \leq i < m$ and $0 \leq j < m$ so

$$ a^{im + j} \equiv b \pmod{n} $$

multiplying both sides by $a^{-im}$ (note that this is possible because $a$ and $n$ are coprime)

$$ a^j \equiv b(a^{-m})^i \pmod{n} $$

If we find $i$ and $j$ so that this holds then we have found an exponent $x$

Note: $a^{-m}$ can be computed using the modular multiplicative inverse of $a$, then computing the $m$-th power of the inverse $\pmod{n}$
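
Running the steps by hand on the earlier example ($n = 11$, $a = 2$, $b = 9$) shows how the two tables meet; here $m = \lceil \sqrt{11} \rceil = 4$:

$$ \begin{align*} \text{baby steps: } & a^0 \equiv 1, \quad a^1 \equiv 2, \quad a^2 \equiv 4, \quad a^3 \equiv 8 \pmod{11} \\ a^{-1} &\equiv 6, \quad a^{-m} \equiv 6^4 \equiv 9 \pmod{11} \\ i = 0: \; & b \equiv 9 \quad \text{(not a baby step)} \\ i = 1: \; & b \cdot a^{-m} \equiv 9 \cdot 9 \equiv 4 \equiv a^2 \quad \Rightarrow \quad j = 2 \\ x &= im + j = 1 \cdot 4 + 2 = 6 \end{align*} $$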

/**
 * Let `a`, `b` and `n` be **integers**, where `a` and `n` are coprime,
 * the following is an implementation of Shanks' baby step giant step
 * algorithm which attempts to find a solution for the congruence
 *
 *    a^x ≡ b (mod n)
 *
 * `x` can be represented as `im + j` then
 *
 *   a^j ≡ b(a^{-m})^i (mod n)
 *
 * NOTE: `binary_exponentiation_modulo_m` is a function which computes
 *
 *    a^x (mod n)
 *
 * @param {int} a
 * @param {int} b
 * @param {int} n
 * @returns {int} An integer >= 0 which is the value of `x`, -1
 * if no value was found
 */
int baby_step_giant_step(int a, int b, int n) {
  int m = ceil(sqrt(n));

  // values in the left side
  map<int, int> M;

  // store all possible a^j
  int aj = 1;
  for (int j = 0; j < m; j += 1) {
    if (!M.count(aj)) {
      M[aj] = j;
    }
    aj = (aj * a) % n;
  }

  // compute b(a^{-m})^i
  // first compute the modular multiplicative inverse of a
  int inverse;
  if (!modular_multiplicative_inverse(a, n, inverse)) {
    return -1;
  }
  int coef = binary_exponentiation_modulo_m(inverse, m, n);

  // NOTE: the modular multiplicative inverse can also be computed
  // using Fermat's little theorem, only if `n` is prime
  // - first compute a^-1 with the identity a^-1 ≡ a^{n - 2} (mod n)
  // - compute inverse^m % n
  //
  // int coef = binary_exponentiation_modulo_m(a, n - 2, n);
  // coef = binary_exponentiation_modulo_m(coef, m, n);

  int gamma = b;
  for (int i = 0; i < m; i += 1) {
    if (M.count(gamma)) {
      return i * m + M[gamma];
    }
    gamma = (gamma * coef) % n;
  }
  return -1;
}

Chinese Remainder Theorem

Fri, 05 Jun 2015 12:00:00 +0000

Let $p_1, p_2, \ldots, p_n$ be positive integers that are pairwise relatively prime, for any integers $a_1, a_2, \ldots, a_n$ there’s an integer $x$ such that

$$ \begin{align*} x &\equiv a_1 \pmod{p_1} \\ x &\equiv a_2 \pmod{p_2} \\ & \; \vdots \\ x &\equiv a_n \pmod{p_n} \\ \end{align*} $$

All the solutions of this system are congruent modulo $p_1p_2 \ldots p_n$

nrich’s article on the Chinese remainder theorem illustrates the system of equations with a coordinate system in $n$-dimensions. Basically a number can represent a point in the coordinate system defined by the equation system, and the point itself is a sum of unit vectors scaled by some amount

Example: Represent the number $17$ in the coordinate system defined by the integers that belong to the set of integers $\mathbb{Z}/5$, $\mathbb{Z}/7$ and $\mathbb{Z}/11$ ($\mathbb{Z}/n$ has $n$ elements which are all the numbers in the range $0, 1, \ldots, n - 1$)

The statement above is equivalent to

$$ \begin{align*} 17 &\equiv x \equiv 2 \pmod{5} \\ 17 &\equiv x \equiv 3 \pmod{7} \\ 17 &\equiv x \equiv 6 \pmod{11} \end{align*} $$

We can see that $17$ is represented by the point $(2, 3, 6)$

What we want to do is the opposite, that is find the number whose representation in the coordinate system defined by the integers that belong to the set of integers $\mathbb{Z}/p_1, \mathbb{Z}/p_2, \ldots, \mathbb{Z}/p_n$ results in the point $(a_1, a_2 \ldots, a_n)$

What we can do is express these conditions as a sum of scaled unit vectors, one for each axis of the coordinate system, this means that a point $(a_1, a_2 \ldots, a_n)$ can be represented as

$$ a_1(1, 0, 0, \ldots, 0) + a_2(0, 1, 0, 0, \ldots, 0) + \ldots + a_n(0, 0, \ldots, 0, 1) = (a_1, a_2, \ldots, a_n) $$

If we represent each point as $x_i$

$$ \begin{equation}\label{chinese-remainder-as-points} a_1x_1 + a_2x_2 + \ldots + a_nx_n = (a_1, a_2, \ldots, a_n) \end{equation} $$

Let’s take the first term of the sum, $x_1$ is a number which must fulfill the following equivalences for each axis of the coordinate system

$$ \begin{align*} x_1 &\equiv 1 \pmod{p_1} \\ x_1 &\equiv 0 \pmod{p_2} \\ & \vdots \\ x_1 &\equiv 0 \pmod{p_n} \\ \end{align*} $$

From the system of equations above we can see that $p_2p_3 \ldots p_n \mid x_1$, which means that $x_1$ is some multiple of that product. Reusing the name $x_1$ for the unknown multiplier, the first congruence becomes

$$ p_2p_3 \ldots p_n \cdot x_1 \equiv 1 \pmod{p_1} $$

Given the fact that $p_2p_3 \ldots p_n$ is relatively prime to $p_1$ the product has a modular multiplicative inverse which can be found using the extended Euclidean algorithm, in fact we have to solve $n$ of these equations each having the form

$$ \frac{p_1p_2 \ldots p_n}{p_i} \cdot x_i \equiv 1 \pmod{p_i} $$

Finally we have to plug the products $\tfrac{p_1p_2 \ldots p_n}{p_i} \cdot x_i$ into the equation \eqref{chinese-remainder-as-points}, one for each unit vector
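
Going back to the point $(2, 3, 6)$ of the example above, the steps recover the number $17$; the product of the moduli is $5 \cdot 7 \cdot 11 = 385$:

$$ \begin{align*} 77 \cdot x_1 &\equiv 1 \pmod{5} \quad \Rightarrow \quad x_1 = 3 \\ 55 \cdot x_2 &\equiv 1 \pmod{7} \quad \Rightarrow \quad x_2 = 6 \\ 35 \cdot x_3 &\equiv 1 \pmod{11} \quad \Rightarrow \quad x_3 = 6 \\ x &\equiv 2 \cdot 77 \cdot 3 + 3 \cdot 55 \cdot 6 + 6 \cdot 35 \cdot 6 \\ &\equiv 462 + 990 + 1260 \equiv 2712 \equiv 17 \pmod{385} \end{align*} $$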

/**
 * Computes the value of `x` for the linear congruent system of equations
 *
 *    x ≡ a_1 (mod p_1)
 *    x ≡ a_2 (mod p_2)
 *      |
 *    x ≡ a_n (mod p_n)
 *
 * All the solutions are given by the expression `x + k · p_1p_2...p_n`
 * where `k` is an integer
 *
 * @param {vector<int>} a
 * @param {vector<int>} p
 * @return {int} A solution for the system if it exists
 */
int chinese_remainder(vector<int> &a, vector<int> &p) {
  int x = 0;
  int product = 1;
  for (int i = 0; i < p.size(); i += 1) {
    product *= p[i];
  }
  for (int i = 0; i < a.size(); i += 1) {
    int k = product / p[i];
    x += a[i] * modular_multiplicative_inverse(k, p[i]) * k;
    x %= product;
  }
  return x;
}

Modular Arithmetic

Thu, 04 Jun 2015 16:29:18 +0000

Congruence relation

For a positive integer $n$ two integers $a$ and $b$ are said to be congruent modulo $n$ if the remainders of $a / n$ and $b / n$ are the same, that is written as

$$ \begin{equation}\label{congruent-modulo} a \equiv b \pmod n \end{equation} $$

it can also be proven that $n \mid a - b$, let $a = xn + s$ and $b = yn + t$ where $x, y, s, t$ are integers, if the remainders of $a/n$ and $b/n$ are the same then $t = s$

$$ \begin{align*} s &= a - xn \\ s &= b - yn \end{align*} $$

Which means that

$$ a - xn = b - yn $$

Reordering the equation

$$ \begin{equation}\label{congruent-relation-proof} a - b = n(x - y) \end{equation} $$

Since $x$ and $y$ are integers then $x - y$ is also an integer which means that $a - b$ is a multiple of $n$ thus $n \mid a - b$

Properties

Reflexive: $a \equiv a \pmod n$ since $a - a = 0$ is a multiple of any $n$
Symmetric: $a \equiv b \pmod n \Rightarrow b \equiv a \pmod n$ (the same as multiplying \eqref{congruent-relation-proof} by $-1$)
Transitive: if $a \equiv b \pmod n$ and $b \equiv c \pmod n$ then $a \equiv c \pmod n$

Rules

Let $a, b, c, d$ be integers and $n$ a positive integer such that

$$ \begin{align*} a &\equiv b \pmod n \\ c &\equiv d \pmod n \end{align*} $$

The following rules apply

Addition/subtraction rule

$$ a \pm c \equiv b \pm d \pmod n $$

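proof: let $a - b = nk$ and $c - d = nl$, adding both equations gives $(a + c) - (b + d) = n(k + l)$ which is the same as $a + c \equiv b + d \pmod n$, subtracting them instead gives $(a - c) - (b - d) = n(k - l)$ which is the same as $a - c \equiv b - d \pmod n$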

Multiplication rule

$$ ac \equiv bd \pmod n $$

proof: let

$$ a = nk + b \\ c = nl + d $$

multiplying both equations

$$ \begin{align*} ac &= (nk + b)(nl + d) \\ ac &= n^2kl + nk \cdot d + nl \cdot b + bd \\ ac - bd &= n(nkl + kd + bl) \\ \end{align*} $$
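
In code these rules are what allow us to reduce the operands before and after every operation so that intermediate values stay small. A minimal sketch follows; the helper names `mod`, `add_mod`, `sub_mod` and `mul_mod` are made up for this example, and it assumes $n > 0$ and small enough that the product of two residues fits in a `long long`.

```cpp
// hypothetical helper: remainder of `a` modulo `n`, always in the range [0, n)
long long mod(long long a, long long n) {
  return (a % n + n) % n;
}

// addition rule: (a + c) mod n computed from the residues of a and c
long long add_mod(long long a, long long c, long long n) {
  return mod(mod(a, n) + mod(c, n), n);
}

// subtraction rule: (a - c) mod n computed from the residues of a and c
long long sub_mod(long long a, long long c, long long n) {
  return mod(mod(a, n) - mod(c, n), n);
}

// multiplication rule: (a * c) mod n computed from the residues of a and c
long long mul_mod(long long a, long long c, long long n) {
  return mod(mod(a, n) * mod(c, n), n);
}
```

For example `mul_mod(17, 25, 7)` multiplies the residues $3$ and $4$ and returns $5$, the same as $425 \bmod 7$.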

Exponentiation rule

Since $a^k$ is just repeated multiplication then

$$ a^k \equiv b^k \pmod n $$

Where $k$ is a positive integer

Implementation based on Binary Exponentiation

/**
 *
 * Computes
 *
 *    a^k % m
 *
 * Given the fact that a^k can be computed in O(log k) using
 * binary exponentiation
 *
 * @param {int} a
 * @param {int} k
 * @param {int} m
 * @return {int}
 */
int binary_exponentiation_modulo_m(int a, int k, int m) {
  if (k == 0) {
    // a^0 = 1
    return 1;
  }

  if (k % 2 == 1) {
    // widen to 64 bits so that the product can't overflow an `int`
    return (int) ((long long) binary_exponentiation_modulo_m(a, k - 1, m) * a % m);
  } else {
    long long t = binary_exponentiation_modulo_m(a, k / 2, m);
    return (int) (t * t % m);
  }
}

Modular multiplicative inverse

Extended Euclidean Algorithm

The multiplicative inverse of a number $a$ is a number which multiplied by $a$ yields the multiplicative identity. The same concept is defined in modular arithmetic: the modular multiplicative inverse of a number $a$ modulo $m$ is an integer $x$ such that

$$ \begin{equation}\label{modular-multiplicative-inverse} a \; x \equiv 1 \pmod m \end{equation} $$

Such a number exists only if $a$ and $m$ are coprime, i.e. $gcd(a, m) = 1$

The number $x$ can be found using the Extended Euclidean Algorithm: by the definition of the congruence relation $m \mid ax - 1$

$$ ax - 1 = mq $$

Rearranging

$$ ax - mq = 1 $$

This is the exact form of the equation that the Extended Euclidean Algorithm solves where $gcd(a, m) = 1$ is already predetermined instead of discovered using the algorithm

/**
 * Computes the modular multiplicative inverse of the number `a` in the ring
 * of integers modulo `m`
 *
 *    ax ≡ 1 (mod m)
 *
 * `x` only exists if `a` and `m` are coprime
 *
 * @param {int} a
 * @param {int} m
 * @param {int} x
 * @returns {bool} True if the number `a` has a modular multiplicative
 * inverse, false otherwise
 */
bool modular_multiplicative_inverse(int a, int m, int &x) {
  // `y` (the coefficient multiplying `m`) is never used
  int y;
  int gcd = extended_euclidean(a, m, x, y);
  if (gcd != 1) {
    // `a` and `m` are not coprime
    return false;
  }
  // ensure that the value of `x` is positive
  x = (x % m + m) % m;
  return true;
}

/**
 * Same as above but throws an error if `a` and `m` are not coprime
 *
 * @param {int} a
 * @param {int} m
 * @returns {int} The modular multiplicative inverse of a
 */
int modular_multiplicative_inverse(int a, int m) {
  // `y` (the coefficient multiplying `m`) is never used
  int x, y;
  int gcd = extended_euclidean(a, m, x, y);
  if (gcd != 1) {
    // `a` and `m` are not coprime
    throw std::invalid_argument("a and m are not relative primes");
  }
  // ensure that the value of `x` is positive
  x = (x % m + m) % m;
  return x;
}

Euler’s Theorem

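The modular multiplicative inverse can also be found using Euler’s theorem: if $a$ is relatively prime to $m$ then

$$ a^{\phi(m)} \equiv 1 \pmod m $$

Where $\phi(m)$ is Euler’s Phi Function. Writing the left side as $a \cdot a^{\phi(m) - 1}$ shows that $a^{-1} \equiv a^{\phi(m) - 1} \pmod m$

In the special case where $m$ is a prime number $\phi(m) = m - 1$, so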

$$ a^{-1} \equiv a^{m - 2} \pmod m $$

/**
 * Computes the modular multiplicative inverse of `a` in the ring
 * of integers modulo `m` using Euler's theorem,
 * it assumes that `m` is a prime number that is relatively prime to `a`
 *
 *    a^{-1} ≡ a^{m - 2} (mod m)
 *
 * @param {int} a
 * @param {int} m
 * @returns {int} The modular multiplicative inverse of a
 */
int modular_multiplicative_inverse(int a, int m) {
  return binary_exponentiation_modulo_m(a, m - 2, m);
}
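
As a small worked example (the numbers are chosen only for illustration), take $a = 3$ and the prime $m = 7$:

$$ 3^{-1} \equiv 3^{7 - 2} = 3^5 = 243 = 34 \cdot 7 + 5 \equiv 5 \pmod 7 $$

And indeed $3 \cdot 5 = 15 = 2 \cdot 7 + 1 \equiv 1 \pmod 7$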

Extended Euclidean Algorithm

Tue, 02 Jun 2015 12:00:00 +0000

Bezout’s identity

For non-zero integers $a$ and $b$, let $d$ be the greatest common divisor $d = gcd(a, b)$. Then there exist integers $x$ and $y$ such that

$$ \begin{equation} \label{bezout} ax + by = d \end{equation} $$

If $a$ and $b$ are relatively prime then $gcd(a, b) = 1$ and by Bezout’s Identity there are integers $x$ and $y$ such that

$$ ax + by = 1 $$

Example: $3x + 8y = 1$, one solution is $x = 3$ and $y = -1$

Extended Euclidean Algorithm

See divisibility for more details.

Implementation

/**
 * Computes the values `x` and `y` for the equation
 *
 *    ax + by = gcd(a, b)
 *
 * Given that `a` and `b` are positive integers
 *
 * @param {int} a
 * @param {int} b
 * @param {int} x
 * @param {int} y
 * @returns {int} gcd(a, b)
 */
int extended_euclidean(int a, int b, int &x, int &y) {
  if (b == 0) {
    x = 1;
    y = 0;
    return a;
  }
  int x1, y1;
  int gcd = extended_euclidean(b, a % b, x1, y1);
  x = y1;
  y = x1 - a / b * y1;
  return gcd;
}

/**
 * Alternative version using a vector of ints
 * Computes the values x and y for the equation
 *
 *    ax + by = gcd(a, b)
 *
 * @returns {vector<int>} A triplet with the values (gcd(a, b), x, y)
 */
vector<int> extended_euclidean(int a, int b) {
  if (b == 0) {
    // base case:
    // b divides a so a(1) + b(0) = a
    return vector<int> {a, 1, 0};
  }
  vector<int> t = extended_euclidean(b, a % b);
  int gcd = t[0];
  int x1 = t[1];
  int y1 = t[2];
  return vector<int> {gcd, y1, x1 - a / b * y1};
}
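
To see how the recursion builds the coefficients, here is a trace of the first implementation for the example $3x + 8y = 1$ from the Bezout’s identity section. Each row shows a call with arguments $(a, b)$ and the pair $(x, y)$ it produces, starting from the base case and going back up the call stack:

$$ \begin{align*} (1, 0) &\to (x, y) = (1, 0) \\ (2, 1) &\to (x, y) = (0, 1) \\ (3, 2) &\to (x, y) = (1, -1) \\ (8, 3) &\to (x, y) = (-1, 3) \\ (3, 8) &\to (x, y) = (3, -1) \end{align*} $$

Every row satisfies $ax + by = 1$, e.g. $8 \cdot (-1) + 3 \cdot 3 = 1$, and the last row is the solution $x = 3$, $y = -1$ mentioned above.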

Applications

Diophantine equations

Equations with integer variables and coefficients are called Diophantine equations, the simplest non-trivial linear equation has the form

$$ \begin{equation}\label{linear-diophantine-equation} ax + by = c \end{equation} $$

Where $a, b, c$ are given integers and $x, y$ are unknown integers

Using the extended Euclidean algorithm it’s possible to find $x$ and $y$ given that $c$ is divisible by $gcd(a, b)$, otherwise the equation has no solutions. This follows from the fact that any linear combination of two numbers is still divisible by their common divisors. Starting with \eqref{bezout}

$$ ax_g + by_g = gcd(a, b) $$

multiplying it by $\tfrac{c}{gcd(a, b)}$

$$ \begin{equation}\label{diophantine-equation-gcd} a \cdot x_g \cdot \Big( \frac{c}{gcd(a, b)} \Big) + b \cdot y_g \cdot \Big( \frac{c}{gcd(a, b)} \Big) = c \end{equation} $$

then one of the solutions is given by

$$ ax_0 + by_0 = c $$

where

$$ \begin{cases} x_0 = x_g \cdot \big( \frac{c}{gcd(a, b)} \big) \\ y_0 = y_g \cdot \big( \frac{c}{gcd(a, b)} \big) \end{cases} $$

we can find the rest of the solutions by replacing $x_0$ with $x_0 + \tfrac{b}{gcd(a, b)}$ and $y_0$ with $y_0 - \tfrac{a}{gcd(a, b)}$, since the new pair is also a solution

$$ a \cdot \Big( x_0 + \tfrac{b}{gcd(a, b)} \Big) + b \cdot \Big( y_0 - \tfrac{a}{gcd(a, b)} \Big) = ax_0 + \tfrac{ab}{gcd(a, b)} + by_0 - \tfrac{ab}{gcd(a, b)} = ax_0 + by_0 = c $$

This process can be repeated any number of times (in either direction), which gives the solutions of the form

$$ \begin{cases} x = x_0 + k \cdot \big( \frac{b}{gcd(a, b)} \big) \\ y = y_0 - k \cdot \big( \frac{a}{gcd(a, b)} \big) \end{cases} $$

Where $k \in \mathbb{Z}$

/**
 * Computes the integer values `x` and `y` for the equation
 *
 *    ax + by = c
 *
 * if `c` is not divisible by `gcd(a, b)` then there isn't a valid solution,
 * otherwise there's an infinite number of solutions, (`x`, `y`) form one pair
 * of the set of possible solutions
 *
 * @param {int} a
 * @param {int} b
 * @param {int} c
 * @param {int} x
 * @param {int} y
 * @returns {bool} True if the equation has solutions, false otherwise
 */
bool linear_diophantine_solution(int a, int b, int c, int &x, int &y) {
  int gcd = extended_euclidean(abs(a), abs(b), x, y);
  if (c % gcd != 0) {
    // no solutions since c is not divisible by gcd(a, b)
    return false;
  }
  x *= c / gcd;
  y *= c / gcd;
  if (a < 0) { x *= -1; }
  if (b < 0) { y *= -1; }
  return true;
}
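
The family of solutions described above can be sketched as follows. `kth_solution` and `ext_gcd` are names made up for this example, `long long` is used to reduce the risk of overflow, and the sketch assumes that `a` and `b` are not both zero.

```cpp
#include <cstdlib>

// hypothetical helper: returns gcd(a, b) and sets x, y so that ax + by = gcd(a, b)
long long ext_gcd(long long a, long long b, long long &x, long long &y) {
  if (b == 0) {
    x = 1;
    y = 0;
    return a;
  }
  long long x1, y1;
  long long g = ext_gcd(b, a % b, x1, y1);
  x = y1;
  y = x1 - a / b * y1;
  return g;
}

// Fills (x, y) with the solution of ax + by = c obtained for the integer `k`:
//
//    x = x_0 + k * (b / gcd(a, b))
//    y = y_0 - k * (a / gcd(a, b))
//
// returns false when `c` is not divisible by gcd(a, b)
bool kth_solution(long long a, long long b, long long c, long long k,
                  long long &x, long long &y) {
  long long xg, yg;
  long long g = ext_gcd(std::llabs(a), std::llabs(b), xg, yg);
  if (c % g != 0) {
    return false;
  }
  // the signs are restored because ext_gcd was called with |a| and |b|
  long long x0 = xg * (c / g) * (a < 0 ? -1 : 1);
  long long y0 = yg * (c / g) * (b < 0 ? -1 : 1);
  x = x0 + k * (b / g);
  y = y0 - k * (a / g);
  return true;
}
```

For $6x + 15y = 9$ the sketch starts from $x_0 = -6$, $y_0 = 3$ and every other solution is $x = -6 + 5k$, $y = 3 - 2k$.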

Modular multiplicative inverse

See Modular Arithmetic for more info.

Linear congruence equations

A linear congruence is a congruence $\pmod m$ of the form

$$ ax \equiv b \pmod m $$

By the definition of the congruence relation $m \mid ax - b$

$$ ax - b = my $$

Reordering the equation

$$ ax - my = b $$

Which is a linear diophantine equation discussed above, it’s solvable only if $b$ is divisible by $gcd(a, m)$, additionally $gcd(a, m)$ tells us the number of distinct solutions in the ring of integers modulo $m$
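
A minimal sketch of this reduction that lists every solution in the range $[0, m)$. The names `solve_linear_congruence` and `ext_gcd` are made up for this example and it assumes $m > 0$.

```cpp
#include <vector>

// hypothetical helper: returns gcd(a, b) and sets x, y so that ax + by = gcd(a, b)
long long ext_gcd(long long a, long long b, long long &x, long long &y) {
  if (b == 0) {
    x = 1;
    y = 0;
    return a;
  }
  long long x1, y1;
  long long g = ext_gcd(b, a % b, x1, y1);
  x = y1;
  y = x1 - a / b * y1;
  return g;
}

// All the solutions of ax ≡ b (mod m) in the range [0, m) in ascending order,
// the vector is empty when `b` is not divisible by gcd(a, m)
std::vector<long long> solve_linear_congruence(long long a, long long b, long long m) {
  std::vector<long long> solutions;
  a = (a % m + m) % m;
  b = (b % m + m) % m;
  long long x, y;
  long long g = ext_gcd(a, m, x, y);  // ax + my = g
  if (b % g != 0) {
    return solutions;
  }
  // consecutive solutions are m / g apart, so there are exactly g of them
  long long step = m / g;
  long long x0 = (x % step + step) % step * ((b / g) % step) % step;
  for (long long k = 0; k < g; k += 1) {
    solutions.push_back(x0 + k * step);
  }
  return solutions;
}
```

For $6x \equiv 9 \pmod{15}$ we have $gcd(6, 15) = 3$ and the sketch returns the three solutions $4, 9, 14$, while $6x \equiv 10 \pmod{15}$ has none.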

https://brilliant.org/wiki/bezouts-identity/?subtopic=integers&chapter=greatest-common-divisor-lowest-common-divisor#proof http://www.ugrad.cs.ubc.ca/~cs490/Spring05/notes/nt1.pdf

Binary Exponentiation

Mon, 01 Jun 2015 12:00:00 +0000

Algorithm description

Finding $a^n$ naively involves $n - 1$ multiplications by $a$, the same result can be computed with $O(log(n))$ multiplications

For any number $a$ raised to an even power:

$$ a^n = (a^{n/2})^2 = a^{n/2} \cdot a^{n/2} $$

For any number $a$ raised to an odd power:

$$ a^n = a^{n - 1} \cdot a $$

Implementation

Time complexity: $O(log(n))$

/**
 * Computes
 *
 *    a^k
 *
 * Given the following facts:
 *
 * - if `k` is even then a^k = (a^(k / 2))^2
 * - if `k` is odd then a^k = a^(k - 1) * a
 */
int logarithmic_exponentiation(int a, int k) {
  if (k == 0) {
    // a^0 = 1
    return 1;
  }
  if (k % 2 == 1) {
    return logarithmic_exponentiation(a, k - 1) * a;
  } else {
    int t = logarithmic_exponentiation(a, k / 2);
    return t * t;
  }
}

// iterative implementation
int binary_exponentiation(int a, int k) {
  int x = 1;
  while (k) {
    // analyze the i-th bit of the binary representation of k
    if (k & 1) {
      x *= a;
    }
    a *= a;
    k >>= 1;
  }
  return x;
}

Eratosthenes Sieve

Mon, 01 Jun 2015 12:00:00 +0000

Definition

An algorithm to find prime numbers up to a number $n$

Algorithm description

Using a boolean vector of size $n + 1$, iterate over the positions and for every position that hasn’t been marked yet mark all of its multiples as not primes

Implementation

Time complexity: $O(n \log \log n)$

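// sieve[i] is true when `i` is NOT a prime
vector<bool> sieve;

void eratothenes_sieve(int n) {
  // initialize the list, every number is assumed to be a prime
  sieve.assign(n + 1, false);

  // 0 and 1 are not primes
  for (int i = 0; i <= 1 && i <= n; i += 1) {
    sieve[i] = true;
  }

  // multiples of 2 (other than 2 itself) are not primes
  for (int i = 4; i <= n; i += 2) {
    sieve[i] = true;
  }

  // multiples of odd numbers
  for (int i = 3; i * i <= n; i += 2) {
    if (!sieve[i]) {
      for (int j = i * i; j <= n; j += 2 * i) {
        sieve[j] = true;
      }
    }
  }
}

bool is_prime(int n) {
  assert(n >= 0 && n < (int) sieve.size());
  return !sieve[n];
}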

Euclidean Algorithm

Mon, 01 Jun 2015 12:00:00 +0000

Euclid’s algorithm for finding the Greatest Common Divisor of two or more integers is based on the following observations:

if $x = y$ then

$$ gcd(x, y) = gcd(x, x) = x $$

if $x > y$ then

$$ gcd(x, y) = gcd(x - y, y) $$

proof: suppose that $d$ is a divisor of $x$ and $y$ then $x$ and $y$ can be expressed as

$$ \begin{align*} x &= q_1d \\ y &= q_2d \end{align*} $$

But then

$$ x - y = q_1d - q_2d = d(q_1 - q_2) $$

Therefore $d$ is a divisor of $x - y$

int gcd(int x, int y) {
  while (x != y) {
    if (x > y) {
      x -= y;
    } else {
      y -= x;
    }
  }
  return x;
}

Using the remainder operator instead of repeated subtractions improves the performance, however eventually one of $x$ or $y$ will become zero, in that case

$$ gcd(x, 0) = gcd(0, x) = x $$

int gcd(int x, int y) {
  while (x != 0 && y != 0) {
    if (x > y) {
      x %= y;
    } else {
      y %= x;
    }
  }
  return max(x, y);
}

By ensuring that $x \geq y$ we can get rid of the if statement inside the while loop

int gcd(int x, int y) {
  if (x < y) {
    swap(x, y);
  }
  while (y != 0) {
    int remainder = x % y;
    x = y;
    y = remainder;
  }
  // at this point `gcd(x, y) = gcd(x, 0) = x`
  return x;
}

However if $x < y$ the first iteration of the loop will actually swap the operands, e.g. when $x = 3, y = 5$, $remainder = 3 \bmod 5 = 3$, $x_{new} = 5$, $y_{new} = 3$, therefore it’s not necessary to make the initial swap

int gcd(int x, int y) {
  while (y != 0) {
    int remainder = x % y;
    x = y;
    y = remainder;
  }
  // at this point `gcd(x, y) = gcd(x, 0) = x`
  return x;
}

Example: finding the GCD of $102$ and $38$

$$ \begin{align*} 102 &= 2 \cdot 38 + 26 \\ 38 &= 1 \cdot 26 + 12 \\ 26 &= 2 \cdot 12 + 2 \\ 12 &= 6 \cdot 2 + 0 \end{align*} $$

The last non-zero remainder is $2$ thus the GCD is 2

Implementation

Recursive version

int gcd(int x, int y) {
  if (y == 0) {
    return x;
  }
  return gcd(y, x % y);
}

explanation

Euler's phi function

Mon, 01 Jun 2015 00:00:00 +0000

Examples

$$ \begin{align*} \phi(1) &= 1 \quad (1) \\ \phi(2) &= 1 \quad (1) \\ \phi(3) &= 2 \quad (1, 2) \\ \phi(4) &= 2 \quad (1, 3) \\ \phi(5) &= 4 \quad (1, 2, 3, 4) \\ \phi(6) &= 2 \quad (1, 5) \end{align*} $$

Properties

The following three properties will allow us to calculate it for any number:

if $p$ is a prime then $\phi(p) = p - 1$

Proof: since $p$ is a prime its only divisors are $1$ and $p$, so every number in $1, 2, \ldots, p - 1$ is relatively prime to $p$ and falls under the definition of the euler function, the only number in $1, 2, \ldots, p$ that doesn’t is $p$ itself because $gcd(p, p) = p$

if $p$ is a prime and $k \geq 1$ a positive integer then $\phi(p^k) = p^k - p^{k-1}$

Proof: the numbers less than or equal to $p^k$ that share a factor with $p^k$ are exactly the multiples of $p$: $p, 2p, 3p, \ldots, p^{k-1}p$. In total there are $p^{k-1}$ of them, therefore the other $p^k - p^{k-1}$ numbers are relatively prime to $p^k$

Example:

$\phi(2^4)$

the multiples of $2$ less than or equal to $2^4$ are $1 \cdot 2, 2 \cdot 2, 3 \cdot 2, 4 \cdot 2, 5 \cdot 2, 6 \cdot 2, 7 \cdot 2, 8 \cdot 2$ which are in total $2^3$ elements, therefore the other $2^4 - 2^3 = 8$ numbers are relatively prime to $2^4$

if $a$ and $b$ are relatively prime then $\phi(ab) = \phi(a)\phi(b)$

Computation

Given a number $n$ let’s decompose it into prime factors (factorization):

$$ n = p_1^{a_1} \cdot p_2^{a_2} \cdot ... \cdot p_k^{a_k} $$

Applying the euler function we get:

$$ \begin{align*} \phi(n) &= \phi(p_1^{a_1}) \cdot \phi(p_2^{a_2}) \cdot ... \cdot \phi(p_k^{a_k}) \\ &= (p_1^{a_1} - p_1^{a_1 - 1}) \cdot (p_2^{a_2} - p_2^{a_2 - 1}) \cdot ... \cdot (p_k^{a_k} - p_k^{a_k - 1}) \\ &= (p_1^{a_1} - \frac{p_1^{a_1}}{p_1}) \cdot (p_2^{a_2} - \frac{p_2^{a_2}}{p_2}) \cdot ... \cdot (p_k^{a_k} - \frac{p_k^{a_k}}{p_k}) \\ &= p_1^{a_1} (1 - \frac{1}{p_1}) \cdot p_2^{a_2} (1 - \frac{1}{p_2}) \cdot ... \cdot p_k^{a_k} (1 - \frac{1}{p_k}) \\ &= p_1^{a_1} \cdot p_2^{a_2} \cdot ... \cdot p_k^{a_k} \cdot (1 - \frac{1}{p_1}) \cdot (1 - \frac{1}{p_2}) \cdot ... \cdot (1 - \frac{1}{p_k}) \\ &= n \cdot (1 - \frac{1}{p_1}) \cdot (1 - \frac{1}{p_2}) \cdot ... \cdot (1 - \frac{1}{p_k}) \\ &= n \prod_{p|n}(1 - \frac{1}{p}) \end{align*} $$

Implementation

Time complexity: $O(\sqrt{n})$ Space: $O(1)$

int phi(int n) {
  int result = n;
  for (int i = 2; i * i <= n; i += 1) {
    // if `i` is a divisor of `n`
    if (n % i == 0) {
      // divide it by `i^k` so that it's no longer divisible by `i`
      while (n % i == 0) {
        n /= i;
      }
      // the multiples of `i` are not coprime to n, they make up `1 / i` of
      // the numbers counted so far so that fraction is removed from the result
      result -= result / i;
    }
  }
  if (n > 1) {
    result -= result / n;
  }
  return result;
}
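
The product formula can be cross-checked against a brute force count taken straight from the definition (the number of integers in $[1, n]$ that are relatively prime to $n$). In this sketch `phi_brute_force`, `phi_formula` and `gcd_of` are names made up for the example, and `phi_formula` repeats the implementation above so that the block is self-contained.

```cpp
// hypothetical helper: greatest common divisor using the remainder operator
int gcd_of(int x, int y) {
  while (y != 0) {
    int remainder = x % y;
    x = y;
    y = remainder;
  }
  return x;
}

// counts the integers in [1, n] that are relatively prime to n, O(n log n)
int phi_brute_force(int n) {
  int count = 0;
  for (int i = 1; i <= n; i += 1) {
    if (gcd_of(i, n) == 1) {
      count += 1;
    }
  }
  return count;
}

// n * (1 - 1/p_1) * (1 - 1/p_2) * ... over the prime divisors of n, O(sqrt(n))
int phi_formula(int n) {
  int result = n;
  for (int i = 2; i * i <= n; i += 1) {
    if (n % i == 0) {
      while (n % i == 0) {
        n /= i;
      }
      result -= result / i;
    }
  }
  if (n > 1) {
    result -= result / n;
  }
  return result;
}
```

For example $\phi(36) = 36 \cdot (1 - \frac{1}{2}) \cdot (1 - \frac{1}{3}) = 12$ and both functions agree on that value.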

Problems

10179 - Irreducable Basic Fractions
10299 - Relatives
11327 - Enumerating Rational Numbers

Derivative

Thu, 02 Apr 2015 10:00:00 +0000

Physical interpretation of the derivative

The primary concept of the calculus deals with the rate of change of one variable with respect to another

Instantaneous speed

Let’s imagine a person who travels 90km in 3 hours, his average speed (rate of change of distance with respect to time) is 30km/h, of course he doesn’t need to travel at that fixed speed, he may slow down/speed up at different times during the time he traveled, for many purposes it suffices to know the average speed.

However in many daily happenings the average speed is not a significant quantity, if a person traveling in an automobile strikes a tree the quantity that matters is the speed at the instant of collision (this quantity might determine if he survives or not)

concept	description
interval	happens over a period of time
instant	happens so fast that no time elapses

Calculating the average speed is simple, by definition it’s the rate of change of distance with respect to time

$$ \text{average speed} = \frac{\text{distance traveled}}{\text{interval of time}} $$

The same computation process can’t be applied to get the instantaneous speed at some point in time since instantaneous means that the event happened in an infinitesimal or very short space of time, then distance and the time might be both zero hence using the average speed definition won’t help because $\frac{0}{0}$ is meaningless, we know that this is a physical reality but if we can’t calculate it it’s impossible to work with it mathematically.

We can’t compute it with the knowledge we have right now but we can surely approximate it, let’s say that there’s a ball dropped near the surface of the earth and we want to know its instantaneous speed after 4 seconds, to calculate the instantaneous speed at any point in time we need to know the distance it travels after some period of time, this relation could be expressed as a formula which relates distance and time traveled, the formula that relates the distance (in feet) with the time elapsed is

$$ f(t) = s = 16t^2 $$

We can calculate the distance the ball traveled after 4 seconds by replacing $t$ with 4

$$ \begin{align*} s_4 &= 16 * 4^2 \\ &= 256 \text{ feet} \end{align*} $$

Let’s also compute the distance the ball traveled after 5 seconds

$$ \begin{align*} s_5 &= 16 * 5^2 \\ &= 400 \text{ feet} \end{align*} $$

The average speed for this interval of time is then

$$ \text{average speed for the interval of time [4, 5]} = \frac{s_5 - s_4}{1} = \frac{400 - 256}{1} = 144 \;\text{feet/s} $$

So the average speed during the fifth second is $144\;\text{feet/s}$. This quantity is no more than an approximation of the instantaneous speed, but we may improve the approximation by calculating the average speed in the interval of time from 4 to 4.1 seconds which is

$$ \text{average speed for the interval of time [4, 4.1]} = \frac{268.96 - 256}{0.1} = 129.6\;\text{feet/s} $$

Let’s register more computations of the above process with smaller and smaller intervals of time in a table

|time elapsed after 4 seconds|  1|  0.1|  0.01|  0.001|  0.0001|
|average speed (in feet/s)   |144|129.6|128.16|128.016|128.0016|

Of course no matter how small the interval is the result is not the instantaneous speed at the instant $t=4$, however we now see that the average speed over these intervals seems to be approaching the fixed number 128 feet/s

Method of increments

Let’s redo the process described above over an arbitrary interval of time, to do so let’s introduce a quantity $h$ which represents an interval of time beginning at $t=4$ which extends before or after $t=4$ ($h$ is called an increment in $t$ because it’s some interval of time)

The formula for the example above is

$$ \begin{equation} \label{balldrop} s = 16t^2 \end{equation} $$

Evaluated at the end of the fourth second the distance is

$$ \begin{equation} \label{balldrop1} s_4 = 16 * 4^2 = 256 \end{equation} $$

Evaluated at the end of the interval $[4, 4 + h]$ the distance is

$$ \begin{align} s_4 + k &= 16 (4 + h) ^2 \notag \\ &= 256 + 128h + 16h^2 \label{balldrop2} \end{align} $$

Where $k$ is the additional distance the object falls $h$ seconds after the initial $4$ seconds, to obtain $k$ we have to subtract $\eqref{balldrop1}$ from $\eqref{balldrop2}$, the result is

$$ k = 128h + 16h^2 $$

The average speed in this interval of time is then $\frac{k}{h}$, dividing both sides by $h$

$$ \frac{k}{h} = 128 + 16h $$

To compute the instantaneous speed the interval $h$ must become smaller and smaller until it reaches 0, if $h$ approaches 0 then $16h$ also approaches 0, we can conclude that the instantaneous speed when $t=4$ approaches 128 feet/s

Generalization

Let’s generalize the process above for $\eqref{balldrop}$ for any value of $t$, to do so let’s apply the method of increments when $t$ is substituted with the interval $t + h$

$$ \begin{align*} s + k &= 16(t + h)^2 \\ &= 16t^2 + 32th + 16h^2 \end{align*} $$

Subtracting $\eqref{balldrop}$ from the equation above

$$ \begin{align*} k &= 32th + 16h^2 \end{align*} $$

Dividing both sides by $h$

$$ \begin{equation} \label{balldrop-derivative} \frac{k}{h} = 32t + 16h \end{equation} $$

Just like stated above to compute the instantaneous speed the interval $h$ must become smaller and smaller until it reaches 0, if $h$ approaches 0 then the instantaneous speed approaches $32t$ which is a function that will tell us the instantaneous speed of the falling object at any time $t$!

It has been customary since the days of Euler to use $\Delta{t}$ (delta t) for the increment of $t$, $\Delta{t}$ means a “change in the value of $t$”. Thus $\Delta{t}$ has the same meaning as $h$, likewise $\Delta{s}$ has the same meaning as $k$, we can rewrite $\eqref{balldrop-derivative}$ as

$$ \begin{equation} \label{balldrop3} \frac{\Delta{s}}{\Delta{t}} = 32t + 16\Delta{t} \end{equation} $$

It’s desirable to have some short notation for the statement that we have evaluated the limit of $\frac{\Delta{s}}{\Delta{t}}$ as the values of $\Delta{t}$ approach 0, which can be expressed as

$$ \lim_{\Delta{t} \to 0} \frac{\Delta{s}}{\Delta{t}} $$

Where lim is an abbreviation for limit, replacing $\eqref{balldrop3}$ with this new notation

$$ \begin{equation} \label{balldrop-limit} \lim_{\Delta{t} \to 0} \frac{\Delta{s}}{\Delta{t}} = 32t \end{equation} $$

To some mathematicians this notation is somewhat lengthy, hence mathematicians replaced it with different variations

$$ \lim_{\Delta{t} \to 0} \frac{\Delta{s}}{\Delta{t}} = \frac{ds}{dt} = s' = f'(t) $$

The rate of change is not always related with time or distances, a generalization of the formulas above is needed, instead of the symbols $s$ and $t$ let’s use $x$ and $y$ without specifying what $x$ and $y$ mean physically

Let’s calculate the instantaneous rate of change of $y$ with respect to $x$ (the word instantaneous does not really apply because $x$ doesn’t represent time), using the method of increments on a function which depends on $x$

$$ \begin{align} y &= f(x) \label{x-a} \\ y + \Delta{y} &= f(x + \Delta{x}) \label{x-b} \end{align} $$

Subtracting $\eqref{x-a}$ from $\eqref{x-b}$

$$ \Delta{y} = f(x + \Delta{x}) - f(x) $$

Dividing both sides by $\Delta{x}$

$$ \frac{\Delta{y}}{\Delta{x}} = \frac{f(x + \Delta{x}) - f(x)}{\Delta{x}} $$

The instantaneous rate of change of $y$ with respect to $x$ is reached when $\Delta{x}$ approaches 0

$$ \begin{equation} \label{limit} \lim_{\Delta{x} \to 0} \frac{f(x + \Delta{x}) - f(x)}{\Delta{x}} \end{equation} $$

We can also use the variations for the notation of the rate of change

$$ \lim_{\Delta{x} \to 0} \frac{\Delta{y}}{\Delta{x}} = \frac{dy}{dx} = y' = f'(x) $$

What we did with the process above was to find the instantaneous rate of change of $y$ with respect to $x$, we call this rate the derivative of $y$ with respect to $x$, the process of applying the method of increments to obtain the derivative is called differentiation

Geometric interpretation of the derivative

Let’s graph the following formula

$$ \begin{equation}\label{yx2} y = x^2 \end{equation} $$

A point belonging to this geometrical representation of $y$ has the form $(x_1, f(x_1))$, e.g. when $x = 1, y = 1$ and when $x = 2, y = 4$

Let’s say that $(x_1, f(x_1))$ is a fixed point on the curve (for the sake of this example the point will be $x_1 = 1, y_1 = 1$), any other point that belongs to the curve can make a line with the fixed point

The slope is a quantity that describes the direction and steepness of a line and is calculated by finding the ratio of the vertical change to the horizontal change between any distinct two points on the line, the previous statement expressed as a formula is

$$ m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{\Delta{y}}{\Delta{x}} $$

What if the movable point gets closer and closer to the fixed point such that $\Delta{x}$ approaches 0? That’s exactly the definition of the derivative, which means that the derivative of a function will tell us the slope of the tangent line to the function (represented geometrically as a curve) at any point where the function is differentiable!

Let’s find the instantaneous rate of change of this function evaluated at $x=1$, using $\eqref{limit}$

$$ \begin{align*} m_1 = f'(1) &= \lim_{\Delta{x} \to 0} \frac{f(1 + \Delta{x}) - f(1)}{\Delta{x}} \\ &= \lim_{\Delta{x} \to 0} \frac{(1 + \Delta{x}) ^ 2 - 1^2}{\Delta{x}} \\ &= \lim_{\Delta{x} \to 0} \frac{1^2 + 2\Delta{x} + \Delta{x}^2 - 1^2}{\Delta{x}} \\ &= \lim_{\Delta{x} \to 0} (2 + \Delta{x}) \\ &= 2 \end{align*} $$

This fixed number is the slope of the line tangent to the curve at $x = 1$, let’s find out the Point–slope form of the tangent line whose slope is $m$

$$ \begin{equation}\label{line-equation} y - y_1 = m(x - x_1) \end{equation} $$

Substituting $y_1=1$, $m=2$ and $x_1=1$ computed above

$$ \begin{align*} y &= 2(x - 1) + 1 \\ &= 2x - 2 + 1 \\ &= 2x - 1 \end{align*} $$

If we graph this line next to the geometric representation of $y = x^2$ we see that it actually touches the curve at the point $(1, 1)$

Before finding the equation of the slope for any value of $x$ let’s imagine the graph produced by the slope function, if we take a look at the graph produced by $\eqref{yx2}$ we can see that for any point that belongs to the curve whose $x$ coordinate is negative the slope will be negative and for any point that belongs to the curve whose $x$ coordinate is positive the slope will be positive, expressed mathematically

$$ sgn(m) = \begin{cases} -1 & \text{if } x < 0, \\ 0 & \text{if } x = 0, \\ 1 & \text{if } x > 0. \end{cases} $$

Now that we have an idea of the values of the slope let’s find the value of $m$ for any value of $x$ that is the derivative of $y$ with respect to $x$, using $\eqref{limit}$

$$ \begin{align*} f'(x) &= \lim_{\Delta{x} \to 0} \frac{f(x + \Delta{x}) - f(x)}{\Delta{x}} \\ &= \lim_{\Delta{x} \to 0} \frac{(x + \Delta{x}) ^ 2 - x^2}{\Delta{x}} \\ &= \lim_{\Delta{x} \to 0} \frac{x^2 + 2x\Delta{x} + \Delta{x}^2 - x^2}{\Delta{x}} \\ &= \lim_{\Delta{x} \to 0} (2x + \Delta{x}) \\ &= 2x \end{align*} $$

By looking at the line $y = 2x$ we confirm our expectation of the values: any point which belongs to the line whose $x$ coordinate is negative has its $y$ coordinate (the value of the slope) negative as well, and any point which belongs to the line whose $x$ coordinate is positive has its $y$ coordinate positive as well.

There are infinitely many tangent lines to the curve that represents $\eqref{yx2}$, in the following graph the equation of the line is computed dynamically based on the position of the mouse pointer (computed doing substitutions on $\eqref{line-equation}$)
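
The substitution behind such a graph can be sketched as follows: given the $x$ coordinate of the point of tangency, compute the slope with $f'(x) = 2x$ and rewrite $\eqref{line-equation}$ in the form $y = mx + c$. The `Line` struct and the function name `tangent_to_parabola` are made up for this example.

```cpp
// a line in the form y = slope * x + intercept
struct Line {
  double slope;
  double intercept;
};

// tangent line to y = x^2 at the point (x1, x1^2)
Line tangent_to_parabola(double x1) {
  double y1 = x1 * x1;
  // the derivative of x^2 evaluated at x1
  double m = 2.0 * x1;
  // y - y1 = m(x - x1)  =>  y = mx + (y1 - m * x1)
  return Line{m, y1 - m * x1};
}
```

For $x_1 = 1$ it returns the slope $2$ and the intercept $-1$, which is the line $y = 2x - 1$ found above.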

Second Derivative

Going back to the falling object formula ($s$ is the distance the object moved after $t$ seconds have elapsed)

$$ s = 16t^2 $$

The instantaneous rate of change of the distance with respect to time is

$$ \begin{equation} \label{balldrop-first-derivative} s' = 32t \end{equation} $$

$s'$ represents speed and it’s customary to use $v$ (the first letter of velocity) instead of $s'$

$$ \begin{equation} \label{balldrop-velocity} v = 32t \end{equation} $$

Now $v$ is a function of $t$ and we can ask for the rate of change of $v$ with respect to $t$, this is called instantaneous acceleration. Acceleration is a change of speed that takes place during an interval of time, without acceleration a moving object would keep moving forever at a constant speed. If the speed is given as a function of time then we can calculate the instantaneous rate of change of the velocity with respect to time

$$ \begin{equation} \label{balldrop-second-derivative} v' = 32 \end{equation} $$

The instantaneous acceleration obtained above is the derived function of the instantaneous speed, which is itself the derived function of the distance function, so we can relate the instantaneous acceleration and the distance function with the following notation

$$ s'' \quad or \quad \frac{d^2s}{dt^2} $$

The function above is called the second derived function of $\eqref{balldrop}$, this notation applied to the generalized version using the variables $x$ and $y$ is

$$ \frac{d^2y}{dx^2} \quad or \quad y'' \quad or \quad f''(x) $$

The chain rule

Physical problems lead to more complicated algebraic functions, for example $y = \sqrt{x^2 + 1}$ which arises when one wants to work with the upper half of the hyperbola $y^2 = x^2 + 1$. We can express this function as a combination of two functions:

$$ y = \sqrt{u}\;, \quad u = x^2 + 1 $$

If $y$ is a function of $u$ and $u$ is a function of $x$ then:

$$ \frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} $$

Expressed in the function notation

$$ y = f(u) \quad \text{and} \quad u = g(x) $$

Then

$$ \begin{equation}\label{chain-rule} \frac{dy}{dx} = f'(u) \cdot g'(x) \end{equation} $$

Returning to the original problem, let’s find the derivative of $y = \sqrt{x^2 + 1}$ with respect to $x$ using the chain rule

Let $f(u) = u^{1/2}$ and $g(x) = x^2 + 1$

$$ \frac{dy}{dx} = f'(u) \cdot g'(x) = \frac{u^{-1/2}}{2} \cdot 2x = \frac{x}{\sqrt{x^2 + 1}} $$
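
The result can be checked numerically. The sketch below compares the formula obtained with the chain rule against a central difference, which is a different approximation technique than the one-sided method of increments used so far; the function names and the step `h` are assumptions made for this example.

```cpp
#include <cmath>

// y = sqrt(x^2 + 1)
double curve(double x) {
  return std::sqrt(x * x + 1.0);
}

// dy/dx = x / sqrt(x^2 + 1), the result obtained with the chain rule
double slope_by_chain_rule(double x) {
  return x / std::sqrt(x * x + 1.0);
}

// numerical approximation of dy/dx using the central difference
//
//    (y(x + h) - y(x - h)) / (2h)
//
// assumes a small h > 0
double slope_by_central_difference(double x, double h) {
  return (curve(x + h) - curve(x - h)) / (2.0 * h);
}
```

At $x = 2$ both functions give a value close to $\frac{2}{\sqrt{5}}$.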

Differentiation of implicit functions

Going back to the definition of a function, it’s a relation between two variables such that given a value of one in some domain there’s a unique value determined for the second variable however functions often occur in forms where giving the independent variable some value will not result in a unique value, for example the equation of a circle of radius equal to 5 is:

$$ \begin{equation}\label{implicit} x^2 + y^2 = 25 \end{equation} $$

Here $y$ is not expressed in terms of $x$, solving for $y$ we have two equations:

$$ \begin{equation}\label{explicit} y = \sqrt{25 - x^2} \quad y = -\sqrt{25 - x^2} \end{equation} $$

$\eqref{implicit}$ represents the circle implicitly and $\eqref{explicit}$ represents the equation explicitly

We know that $y$ in $\eqref{implicit}$ represents some function of $x$, if we recognize that the left side of $\eqref{implicit}$ is only a set of terms in $x$ then we can differentiate it, the problem is to find the derivative of $y^2$ which should remind us of the chain rule ($y$ plays the role of $u$ in the chain rule)

$$ \frac{d(y^2)}{dx} = 2y \frac{dy}{dx} $$

Applying a differentiation process to $\eqref{implicit}$

$$ 2x + 2y \frac{dy}{dx} = 0 $$

Solving for $\frac{dy}{dx}$

$$ \frac{dy}{dx} = -\frac{x}{y} $$
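
As a check, differentiating the explicit form of the upper half of the circle with the chain rule (with $u = 25 - x^2$) leads to the same result:

$$ \frac{dy}{dx} = \frac{d}{dx} \sqrt{25 - x^2} = \frac{(25 - x^2)^{-1/2}}{2} \cdot (-2x) = -\frac{x}{\sqrt{25 - x^2}} = -\frac{x}{y} $$

For example at the point $(3, 4)$, which belongs to the circle since $3^2 + 4^2 = 25$, the slope of the tangent line is $-\frac{3}{4}$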

Theorems on differentiation

Read “Calculus: An Intuitive and Physical Approach”

Applications of the Derivative

Determination of the velocity and acceleration of a particle given its distance as a function of time
Concentrate light, sound and radio waves in a particular direction (see the reflective property of the parabola)
Finding the maximum/minimum value of a function, i.e. find the largest/smallest value of $f(x)$ when $a \leq x \leq b$, a well described solution to this problem can be found here
Approximation of the roots of a polynomial with Newton’s method, described here

Maxima/minima

Let’s say that we throw an object into the air and we want to know the maximum height it acquires, as it rises its velocity decreases and when it reaches the highest point its velocity is zero, we also know that the velocity is the instantaneous rate of change of height with respect to time hence the derivative is involved in this process and therefore we expect it to be involved in other maxima/minima problems

More generally, if $y$ is a function of $x$ it seems that to find the maximum value of $y$ we must find $y'$ and set it to $0$

Let’s see an example: the following function has a maximum value of $3.333$ near $x = 1$ and a minimum value of $2$ near $x = 3$. If we analyze the slope of the function near those points we see that to the left of $x = 1$ the slope is positive and to the right of $x = 1$ the slope is negative. Since the derivative represents the slope of a function, we can expect the derivative of this function to go from a positive value to a negative value near $x = 1$, crossing the x-axis. If we analyze the slope near $x = 3$ we will see the same behavior, but there the slope goes from a negative value to a positive one

$$ y = x^3/3 - 2x^2 + 3x + 2 $$

$$ y' = x^2 - 4x + 3 $$

Now the problem reduces to finding the points where $y' = 0$; they tell us exactly where the maximum/minimum values of $y$ occur. Solving $y' = 0$ for $x$

$$ \begin{align*} 0 &= x^2 - 4x + 3 \\ 0 &= (x - 1)(x - 3) \end{align*} $$

And we see that:

$$ y' = 0 \quad \text{when} \quad x = 1 \quad \text{and} \quad x = 3 $$

The process didn’t actually find the absolute maximum/minimum values: for $x > 3$ the function increases indefinitely and for $x < 1$ it decreases indefinitely. The values found are called relative maxima/minima because near $x = 1$ and $x = 3$ they are the largest/smallest values that the function takes
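To make the sign change concrete, here is a small worked check; the sample points $x = 0, 2, 4$ are picked here only for illustration:

$$ \begin{align*} y'(0) &= 0 - 0 + 3 = 3 > 0 \\ y'(2) &= 4 - 8 + 3 = -1 < 0 \\ y'(4) &= 16 - 16 + 3 = 3 > 0 \end{align*} $$

The slope goes from positive to negative around $x = 1$ (a relative maximum) and from negative to positive around $x = 3$ (a relative minimum). Substituting in the original function:

$$ y(1) = \tfrac{1}{3} - 2 + 3 + 2 = \tfrac{10}{3} \approx 3.333 \qquad y(3) = 9 - 18 + 9 + 2 = 2 $$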

Applications of maxima/minima

refraction of light: we can build a function for the time the light takes to travel between two points in terms of the velocity and distance traveled in each medium; finding its derivative and setting it equal to $0$ gives the relative minimum time needed to go from a point in a medium $a$ to a point in a medium $b$
finding the sides of the rectangle with the maximum area for a given perimeter

Newton-Raphson method

The slope of the tangent line of a function $f(x)$ at any point where it is differentiable is given by $m = f'(x)$. Let $x_1$ be such a point, then the slope of the tangent line at $x_1$ is $m_1 = f'(x_1)$ and the point-slope form of the tangent line is

$$ y - y_1 = m_1(x - x_1) \\ y - f(x_1) = f'(x_1) \cdot (x - x_1) $$

Newton found that if we take the tangent line at some initial guess $x_1$ and find its intercept with the $x$-axis, the value found is usually a better approximation to one of the roots of $f(x)$, i.e. a value where $f(x) = 0$ (given that $f(x)$ has roots)

if $y = f(x) = 0$ then the equation of the line is

$$ 0 - f(x_1) = f'(x_1) \cdot (x - x_1) $$

Solving for $x$

$$ \begin{equation}\label{newton-raphson} x = x_1 - \frac{f(x_1)}{f'(x_1)} \end{equation} $$

$x$ in the last equation is the abscissa of the next approximation to one of the roots of $f(x)$. If we apply the formula a few times, each time using the result as the new $x_1$, and we start from an acceptable initial guess, we obtain better and better approximations to one of the roots of $f(x)$
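The repeated application of $\eqref{newton-raphson}$ can be sketched in C++ as follows. This is a minimal sketch and not code from the original notes: the name `newton_raphson`, the fixed `iterations` budget and the early exit on a zero slope are assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>

// Repeatedly applies x = x1 - f(x1) / f'(x1), starting from the guess x1.
// `iterations` is an assumed fixed budget, there is no convergence test.
double newton_raphson(const std::function<double(double)>& f,
                      const std::function<double(double)>& df,
                      double x1, int iterations) {
  assert(iterations >= 0);
  double x = x1;
  for (int i = 0; i < iterations; ++i) {
    double slope = df(x);
    if (slope == 0.0) {
      break;  // horizontal tangent: it never intersects the x-axis
    }
    x = x - f(x) / slope;
  }
  return x;
}
```

For example, with $f(x) = x^2 - 4x + 3$ and $f'(x) = 2x - 4$ an initial guess of $0$ should move towards the root $x = 1$, while an initial guess of $5$ should move towards the root $x = 3$, which shows how much the result depends on the initial guess.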

Finding the square root of a number

Let’s say that we want to find the square root of a number $n$, this is equivalent to finding the solution to

$$ x^2 = n $$

The function to use is then

$$ f(x) = x^2 - n $$

whose derivative is

$$ f'(x) = 2x $$

Substituting in $\eqref{newton-raphson}$

$$ \begin{align*} x &= x_1 - \frac{x_1^2 - n}{2x_1} \\ &= x_1 - \frac{x_1}{2} + \frac{n}{2x_1} \\ &= \frac{x_1}{2} + \frac{n}{2x_1} \\ &= \frac{1}{2} \cdot \big ( x_1 + \frac{n}{x_1} \big ) \end{align*} $$
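As an illustration (the choice of $n = 2$ and of the initial guess $x_1 = 1$ is an assumption made here), a few iterations of the last formula, writing $x_2, x_3, x_4$ for the successive approximations, give:

$$ \begin{align*} x_2 &= \tfrac{1}{2} \big( 1 + \tfrac{2}{1} \big) = \tfrac{3}{2} = 1.5 \\ x_3 &= \tfrac{1}{2} \big( \tfrac{3}{2} + \tfrac{4}{3} \big) = \tfrac{17}{12} \approx 1.416667 \\ x_4 &= \tfrac{1}{2} \big( \tfrac{17}{12} + \tfrac{24}{17} \big) = \tfrac{577}{408} \approx 1.414216 \end{align*} $$

After three steps the approximation already agrees with $\sqrt{2} \approx 1.414214$ to five decimal places.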

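#include <cmath>

// Approximates sqrt(n) for n >= 0 with the Newton-Raphson iteration above.
double square_root(double n) {
  if (n == 0) return 0;  // avoids dividing by zero in the iteration
  // relative tolerance, an absolute 1e-15 may never be reached for large n
  const double EPS = 1e-15;
  // initial guess
  double x0 = 1;
  while (true) {
    double xi = (x0 + n / x0) / 2.0;
    // std::fabs, a plain abs may resolve to the integer overload
    if (std::fabs(x0 - xi) <= EPS * xi) {
      return xi;
    }
    x0 = xi;
  }
}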

Integral

Thu, 02 Apr 2015 10:00:00 +0000

We’re asked to find the derivative of the following function with respect to $x$:

$$ \begin{equation}\label{definition-function} y = 3x^2 \end{equation} $$

Performing the differentiation process

$$ \begin{align*} y' &= \lim_{\Delta{x} \to 0} \frac{f(x + \Delta{x}) - f(x)}{\Delta{x}} \\ &= \lim_{\Delta{x} \to 0} \frac{3(x + \Delta{x}) ^ 2 - 3x^2}{\Delta{x}} \\ &= \lim_{\Delta{x} \to 0} \frac{3x^2 + 6x\Delta{x} + 3\Delta{x}^2 - 3x^2}{\Delta{x}} \\ &= \lim_{\Delta{x} \to 0} (6x + 3\Delta{x}) \\ &= 6x \end{align*} $$

Let’s say that we’re given the same problem in a reversed version: we’re asked to find the original function of the following derived function

$$ \begin{equation}\label{definition-function-derivate} y' = 6x \end{equation} $$

Why? because “when we formulate physical problems mathematically the given physical information usually leads to derived functions, and the primary objective in solving the physical problems is to find the original functions” ¹

We know that the original function corresponding to \eqref{definition-function-derivate} is \eqref{definition-function} but is there an algorithm to find the derivative for the case above? A possible algorithm for the differentiation of the function above might be:

for each term in the function
  - multiply the coefficient by the exponent of the independent variable
  - reduce the exponent of the independent variable by one

Reversing the algorithm above means that we’re actually trying to find the original function, a reversed version of the algorithm above might be:

for each term in the function
  - increment the exponent of the independent variable by one
  - divide the coefficient by the new exponent of the independent variable

If we apply it to \eqref{definition-function-derivate} we get the original function:

$$ 6x \to 6x^2 \to \frac{6}{2}x^2 = 3x^2 $$

However we have overlooked one point: it’s also true that \eqref{definition-function-derivate} is the derived function of $y = 3x^2 + C$ where $C$ is some constant. A constant term in the original function doesn’t show up in the derived function, so in view of this possibility we must say that

$$ y = 3x^2 + C $$

The process of going from the derived function to the original function is called antidifferentiation or integration, the original function is called the primitive function or the indefinite integral of the given function (which is shortened to integral)
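The reversed 2-step algorithm can be sketched in C++ for polynomials. This is only an illustrative sketch: the function name `antiderivative` and the representation of a polynomial as a vector of coefficients (position `i` holds the coefficient of $x^i$) are assumptions and not part of the original notes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// coeffs[i] is the coefficient of x^i, so {0, 6} stands for 6x.
// Returns the coefficients of the primitive function, with the
// constant of integration C stored in position 0.
std::vector<double> antiderivative(const std::vector<double>& coeffs,
                                   double C) {
  assert(!coeffs.empty());  // the zero polynomial is written as {0}
  std::vector<double> result(coeffs.size() + 1, 0.0);
  result[0] = C;
  for (std::size_t i = 0; i < coeffs.size(); ++i) {
    // step 1: increment the exponent from i to i + 1
    // step 2: divide the coefficient by the new exponent
    result[i + 1] = coeffs[i] / static_cast<double>(i + 1);
  }
  return result;
}
```

Calling `antiderivative({0, 6}, C)` yields the coefficients `{C, 0, 3}`, i.e. $3x^2 + C$.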

Before proving the correctness of the previous algorithm let’s try it on similar functions, the formula for instantaneous acceleration (the instantaneous rate of change of speed with respect to time) is

$$ v' = 32 $$

Here the independent variable appears as $t^0$ (since $32 = 32t^0$), applying the 2-step algorithm to find the original function

$$ 32t^0 \to 32t^1 \to \frac{32}{1}t = 32t $$

Same as above, since the original function might have had a constant

$$ v = 32t + C $$

Straight line motion in one direction

Galileo obtained a basic physical principle: if one neglects air resistance, all objects near the earth’s surface fall to earth with the same constant downward acceleration, whose value is:

$$ 32 \text{ ft/s}^2 $$

find how long it takes for an object dropped from 400 feet above the earth’s surface to reach the surface.

The instantaneous acceleration as seen above is

$$ a = v' = 32 $$

Applying the 2-step algorithm

$$ v = 32t + C $$

If the object is dropped it leaves with zero speed (when $t=0$, $v=0$), substituting these values in the formula above to find the value of $C$

$$ 0 = 32 * 0 + C \\ C = 0 $$

Therefore the correct formula for speed is

$$ \begin{equation}\label{speed-example} v = s' = 32t \end{equation} $$

Applying a new process of antidifferentiation to \eqref{speed-example}

$$ s = 16t^2 + C $$

If we agree to measure distance from the point the object is dropped then the initial distance when $t = 0$ is also zero, $C$ will also have a value of zero, hence the correct formula for distance is

$$ s = 16t^2 $$

To answer the original question we must find the value of $t$ given that $s=400$:

$$ t = \pm\sqrt{\frac{s}{16}} = \pm\sqrt{\frac{400}{16}} = \pm\sqrt{25} = \pm 5 $$

For the physical problem only the positive solution is valid, with the same knowledge we can also tackle problems where the object is thrown instead of dropped

find how long it takes for an object thrown downward with a velocity of 100 ft/s from a height of 1000 feet to reach the earth’s surface

Starting with the formula of instantaneous acceleration:

$$ a = 32 $$

Applying the 2-step algorithm

$$ v = 32t + C $$

The object is thrown downwards instead of dropped which means that when $t = 0,\; v = 100$

$$ 100 = 32 * 0 + C \\ C = 100 $$

Hence the correct formula for speed is

$$ v = 32t + 100 $$

Applying a new process of antidifferentiation

$$ s = 16t^2 + 100t + C $$

If we agree to measure the distance from the point where the object is thrown then $C = 0$

$$ s = 16t^2 + 100t $$
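The notes stop at the distance formula, so as a sketch of the remaining step: the object reaches the surface when $s = 1000$, which gives a quadratic equation in $t$ (the numerical value below is rounded):

$$ \begin{align*} 1000 &= 16t^2 + 100t \\ 0 &= 4t^2 + 25t - 250 \\ t &= \frac{-25 \pm \sqrt{25^2 + 4 \cdot 4 \cdot 250}}{2 \cdot 4} = \frac{-25 \pm \sqrt{4625}}{8} \end{align*} $$

As before only the positive solution is valid for the physical problem, so $t \approx 5.38$ seconds.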

It’s convenient to measure distance from the earth’s surface and not from an arbitrary point like in the examples above. This means that the upward direction is positive, so the acceleration of gravity must be negative, i.e. $-32\;ft/sec^2$, so that the distance traveled by reason of this acceleration is recorded as downward

$$ v' = -32\;ft/sec^2 $$

Then by antidifferentiation

$$ \begin{equation}\label{speed-raw} v = -32t + C \end{equation} $$

If an object is thrown upward it must have an initial upward velocity. Let’s say that an object located on the earth’s surface is thrown upward with an initial velocity of $128\;ft/s$, substituting these values in \eqref{speed-raw} (when $t = 0,\; v = 128$):

$$ 128 = -32 * 0 + C \\ C = 128 $$

so that

$$ \begin{equation}\label{velocity-1} v = -32t + 128 \end{equation} $$

Applying a new process of antidifferentiation to find the distance traveled upward at any time $t$

$$ s = -16t^2 + 128t + C $$

Because we have agreed to measure distance from the surface the value of $C$ is zero because when $t = 0$ the object is still on the ground

$$ \begin{equation}\label{distance-1} s = -16t^2 + 128t \end{equation} $$

One question of interest is the maximum height attained by an object whose motion is represented by \eqref{distance-1}. We could answer it if we knew at what $t$ the object attains its maximum height, and we can use \eqref{velocity-1} to obtain that time: at the instant the object attains its maximum height its velocity is zero (the object rises until its velocity is zero and then falls). Substituting zero as the speed in \eqref{velocity-1}

$$ 0 = -32t + 128 \\ t = \frac{128}{32} = 4 $$

Now that we know the time at which the object attains the maximum height let’s substitute it into \eqref{distance-1}

$$ \begin{align*} s &= -16(4)^2 + 128 * 4 \\ &= -256 + 512 \\ &= 256 \text{ feet } \end{align*} $$

We can generalize the solutions above to objects thrown on any planet: if we represent the magnitude of the acceleration due to that planet’s gravity with the symbol $g$, then with the upward direction as positive the acceleration is

$$ \begin{equation}\label{acceleration} a = -g \end{equation} $$

By antidifferentiation

$$ v = -gt + C $$

Since it’s a generalization we don’t know the initial velocity (the value of $C$ is some constant), let’s represent the initial velocity of the object with the symbol $v_0$, hence the formula for the velocity is

$$ \begin{equation}\label{speed} v = v_0 - gt \end{equation} $$

Applying a new process of antidifferentiation to find the distance of the object from the ground

$$ s = v_0t - \frac{gt^2}{2} + C $$

In this case we also don’t know the distance of the object from the ground when $t = 0$, let’s represent this initial distance with the symbol $s_0$ (so that $C = s_0$)

$$ \begin{equation}\label{distance} s = s_0 + v_0t - \frac{gt^2}{2} \end{equation} $$
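The general formulas \eqref{speed} and \eqref{distance} translate directly into code. The sketch below is illustrative only: the function names `height_at`, `time_of_max_height` and `max_height` are hypothetical, and it assumes $g > 0$ with the upward direction as positive.

```cpp
#include <cassert>

// s = s0 + v0*t - g*t^2/2, the height above the ground at time t.
double height_at(double s0, double v0, double g, double t) {
  return s0 + v0 * t - g * t * t / 2.0;
}

// v = v0 - g*t is zero at t = v0/g, the instant of maximum height.
double time_of_max_height(double v0, double g) {
  assert(g > 0.0);
  return v0 / g;
}

// Height at the instant in which the velocity is zero.
double max_height(double s0, double v0, double g) {
  return height_at(s0, v0, g, time_of_max_height(v0, g));
}
```

With the values of the earlier example ($s_0 = 0$, $v_0 = 128$, $g = 32$) this gives $t = 4$ and a maximum height of $256$ feet, matching the result found by hand.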

Definite Integral

Area as the limit of the sum

Let’s say we want to compute the area between the graph of $f(x)$ and the $x$-axis, bounded by the vertical lines $x = a = 1$ and $x = b = 2$

$$ f(x) = x^2 $$

An approximation to the area can be found by taking the maximum y-value in $(a, b)$, called $m_1$, and multiplying it by $(b - a)$, which will be expressed as $\Delta{x}$; then the first approximation is $m_1\Delta{x}$

$$ S_1 = m_1\Delta{x} $$

We can obtain a better approximation if we divide the interval $(a, b)$ into two equal parts, each of width $\Delta{x}$, multiply each width by the maximum y-value in that part and then form the sum

$$ S_2 = m_1\Delta{x} + m_2\Delta{x} $$

Dividing the interval $(a, b)$ into $n$ equal parts each denoted as $\Delta{x}$ and choosing $n$ maximum y-values for each part we form the sum:

$$ S_n = m_1\Delta{x} + m_2\Delta{x} + \ldots + m_n\Delta{x} $$

The quantity $n$ can increase without a limit, to each $n$ there’s a corresponding sum, now the quantity

$$ \lim_{n \to \infty} S_n $$

seems to give the exact area under the curve bounded by $a$ and $b$, therefore

$$ \lim_{n \to \infty} S_n = A $$

There is another notation for this limit which keeps the bounds that determine the area, if $y = f(x)$ then we write for the limit:

$$ \int_{a}^{b} f(x) dx $$

The elongated S denotes integration, the symbols $a$ and $b$ are the left and right ends of the domain whose area is being calculated and $f(x)dx$ is a reminder that we took rectangles of height $y_i$ and width $\Delta{x_i}$.
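To see the sums $S_n$ approach a limit numerically, here is a minimal C++ sketch for $f(x) = x^2$. The name `upper_sum` is hypothetical, and the code relies on the fact that $x^2$ is increasing for $x \geq 0$, so the maximum y-value of each part is found at its right end.

```cpp
#include <cassert>

// Upper sum S_n for f(x) = x^2 on [a, b] with 0 <= a < b.
// x^2 is increasing there, so the maximum y-value of each part
// is taken at its right end.
double upper_sum(double a, double b, int n) {
  assert(n > 0 && 0.0 <= a && a < b);
  double dx = (b - a) / n;
  double sum = 0.0;
  for (int i = 1; i <= n; ++i) {
    double x = a + i * dx;  // right end of the i-th part
    sum += x * x * dx;      // m_i * dx
  }
  return sum;
}
```

For $a = 1$ and $b = 2$, $S_1 = 4$ and $S_2 = 3.125$, and larger values of $n$ should get closer and closer to $7/3 \approx 2.3333$ from above.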

Evaluation of definite integrals

Another way to find the area is as follows. Previously we found an approximation of the area using the maximum value of $f(x)$ ($\Delta{x} * max(f(x))$ for all $x \in [a, b]$), similarly we can take the minimum value instead ($\Delta{x} * min(f(x))$ for all $x \in [a, b]$). Let’s assume that somehow we have found the area below the curve bounded by $[a, x_0]$; moving $x_0$ to the right by $\Delta{x}$ generates an increment in the area, which can be expressed as

$$ \Delta{A} = \bar{y} * \Delta{x} $$

$$ \frac{\Delta{A}}{\Delta{x}} = \bar{y} $$

The value of $\bar{y}$ is some value between $f(x_0)$ and $f(x_0 + \Delta{x})$, to obtain the instantaneous rate of change in the area with respect to $x$ we must find the limit of $\Delta{A}/\Delta{x}$ as $\Delta{x}$ approaches zero, also as $\Delta{x}$ approaches zero the value of $\bar{y}$ also approaches $f(x_0)$ therefore

$$ \frac{dA}{dx} \Big|_{x = x_0} = \lim_{\Delta{x} \to 0} \frac{\Delta{A}}{\Delta{x}} = y_0 = f(x_0) $$

Because this is true for any value of $x$ in the interval $[a, b]$

$$ \frac{dA}{dx} = y = f(x) $$

To find the value of $A(x)$ we apply antidifferentiation

$$ A = \int f(x) \; dx $$

As an example let’s apply the above to the function $f(x) = x^2$

$$ \begin{equation}\label{integral-eval} A = \int x^2 \; dx = \frac{x^3}{3} + C \end{equation} $$

When $x = a = 1$ we know that the area is zero then

$$ \begin{align*} 0 &= \frac{1^3}{3} + C \\ C &= -\frac{1}{3} \end{align*} $$

Then

$$ A = \frac{x^3}{3} - \frac{1}{3} $$

is the function which expresses the area from $a$ to any position $x$, to find the area bounded by $[a, b]$ we substitute $x = b = 2$ and get

$$ A = \frac{2^3}{3} - \frac{1}{3} = \frac{7}{3} $$

We can obtain the same result if we take the expression \eqref{integral-eval}, substitute first 2 and then 1 for $x$, and subtract the second result from the first

$$ \frac{2^3}{3} + C - (\frac{1}{3} + C) = \frac{7}{3} $$

The constant of integration is eliminated in the process. This result is known as the fundamental theorem of calculus

$$ \int_{a}^{b} f(x)\;dx = F(b) - F(a) $$

Where $F(x)$ is an antiderivative of $f(x)$

Additional properties of the definite integral

$$ \int_{a}^{b} f(x)\;dx = -\int_{b}^{a} f(x)\;dx $$

$$ \int_{a}^{b} f(x)\;dx = \int_{a}^{x_0} f(x)\;dx + \int_{x_0}^{b} f(x)\;dx $$

$$ \int_{a}^{b} u\;dx \pm \int_{a}^{b} v\;dx = \int_{a}^{b} (u \pm v)\;dx $$

$$ \frac{d}{dx} \int_{a}^{x} f(u)\;du = f(x) $$

Numerical methods for evaluating definite integrals

Trapezoid rule

Let’s imagine that we have a curve for which it’s impossible to find the antiderivative, and thus the area below the curve. Instead of calculating the area exactly we can approximate its value by using trapezoids instead of rectangles as we’ve done before. Divide $[a, b]$ into $n$ equal parts of width $\Delta{x}$ and let $y_i$ be the value of the function at the $i$-th division point. Using the $min(f(x))$ value found in each part multiplied by $\Delta{x}$ gives a lower bound of the area below the curve, which for an increasing function is

$$ \underline{S_n} = y_0 \Delta{x} + y_1 \Delta{x} + \ldots + y_{n-1} \Delta{x} $$

Another approximation uses the $max(f(x))$ value found in each part multiplied by $\Delta{x}$, which gives an upper bound of the area below the curve

$$ \overline{S_n} = y_1 \Delta{x} + y_2 \Delta{x} + \ldots + y_{n} \Delta{x} $$

Calculating the average of these two sums gives an approximation that lies between them

$$ S_n = \tfrac{1}{2} (y_0 + y_1) \Delta{x} + \tfrac{1}{2} (y_1 + y_2) \Delta{x} + \ldots + \tfrac{1}{2} (y_{n-1} + y_n) \Delta{x} $$

Each of these terms is the area of a trapezoid of height $\Delta{x}$ and bases $y_i$, $y_{i + 1}$

Rewriting the equation above

$$ \int_{a}^b f(x)\;dx \approx \Delta{x} * (\tfrac{1}{2}y_0 + y_1 + y_2 + \ldots + y_{n-1} + \tfrac{1}{2} y_n) $$
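The last formula can be sketched in C++ as follows. The function name `trapezoid` and its signature are assumptions made for illustration; the function to integrate is passed as a `std::function`.

```cpp
#include <cassert>
#include <functional>

// Trapezoid rule with n equal parts:
// dx * (y_0/2 + y_1 + ... + y_{n-1} + y_n/2)
double trapezoid(const std::function<double(double)>& f,
                 double a, double b, int n) {
  assert(n > 0);
  double dx = (b - a) / n;
  double sum = (f(a) + f(b)) / 2.0;  // the two half-weighted ends
  for (int i = 1; i < n; ++i) {
    sum += f(a + i * dx);            // interior points count once
  }
  return sum * dx;
}
```

For $f(x) = x^2$ on $[1, 2]$ a single trapezoid gives $2.5$ and two trapezoids give $2.375$, already closer to the exact area $7/3$ than the upper sums with the same number of parts.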

[Simpson’s rule][simpson]

Simpson’s rule approximates the value of a definite integral by using quadratic polynomials of the form

$$ \begin{equation}\label{quadratic} y = ax^2 + bx + c \end{equation} $$

which pass through three points belonging to the curve which are $(-h, y_0)$, $(0, y_1)$, $(h, y_2)$

The area below the curve bounded by $[-h, h]$ is

$$ \begin{align*} A &= \int_{-h}^{h} (ax^2 + bx + c) \; dx \\ &= \frac{ax^3}{3} + \frac{bx^2}{2} + cx \; \Big|_{-h}^h \\ &= \frac{2ah^3}{3} + 2ch \\ &= \frac{h}{3} (2ah^2 + 6c) \end{align*} $$

Since the points $(-h, y_0)$, $(0, y_1)$ and $(h, y_2)$ are on the curve, they satisfy \eqref{quadratic}

$$ \begin{align*} y_0 &= ah^2 - bh + c \\ y_1 &= c \\ y_2 &= ah^2 + bh + c \end{align*} $$

The quantity

$$ y_0 + 4y_1 + y_2 = (ah^2 - bh + c) + 4c + (ah^2 + bh + c) = 2ah^2 + 6c $$

is the factor in parentheses of the area under the quadratic polynomial found above, therefore

$$ A = \frac{h}{3} (y_0 + 4y_1 + y_2) $$

To find the area bounded by $[a, b]$ we have to take an even number $n$ of subintervals of equal length

$$ h = \frac{b - a}{n} $$

$n$ subintervals are defined with $n + 1$ points which are:

$$ x_0 = a, \quad x_1 = a + h, \quad x_2 = a + 2h, \quad \ldots, \quad x_n = a + nh = b $$

We can estimate the value of the integral by adding the areas computed for each unique contiguous pair of subintervals

$$ \begin{align*} \int_{a}^{b} f(x) \; dx &\approx \tfrac{h}{3} (y_0 + 4y_1 + y_2) + \tfrac{h}{3} (y_2 + 4y_3 + y_4) + \cdots +\tfrac{h}{3} (y_{n-2} + 4y_{n-1} + y_n) \\ &= \tfrac{h}{3} (y_0 + 4y_1 + 2y_2 + 4y_3 + 2y_4 + \ldots + 4y_{n-1} + y_n) \end{align*} $$
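A possible C++ sketch of the composite formula is shown below; the name `simpson` and its signature are assumptions made for illustration, and the code requires an even number of subintervals as stated above.

```cpp
#include <cassert>
#include <functional>

// Composite Simpson's rule with an even number n of subintervals:
// h/3 * (y_0 + 4y_1 + 2y_2 + 4y_3 + ... + 4y_{n-1} + y_n)
double simpson(const std::function<double(double)>& f,
               double a, double b, int n) {
  assert(n > 0 && n % 2 == 0);
  double h = (b - a) / n;
  double sum = f(a) + f(b);
  for (int i = 1; i < n; ++i) {
    // odd-indexed points have weight 4, even-indexed points weight 2
    sum += (i % 2 == 1 ? 4.0 : 2.0) * f(a + i * h);
  }
  return sum * h / 3.0;
}
```

Because the rule is built from quadratic polynomials it reproduces the area below $f(x) = x^2$ on $[1, 2]$ (up to rounding) with only $n = 2$ subintervals: $\tfrac{0.5}{3}(1 + 4 \cdot 2.25 + 4) = \tfrac{7}{3}$.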

Physical applications of the definite integral

The calculation of work

When a force applied to an object causes a displacement it’s said that work was done upon the object, this quantity expressed with the symbol $W$ is the product

$$ W = Fs $$

As an example let’s calculate the work done by the force of gravity, choosing the direction from the center of the earth upward as the positive direction we can use Newton’s law for gravitation, this law states that any two objects attract each other and this force is given quantitatively by

$$ F = \frac{GmM}{r^2} $$

$G$ is a constant, $m$ and $M$ are the masses of the two objects and $r$ is the distance between the objects (idealized as point particles)

Since this force of gravity actually pulls objects towards the center of the earth and we chose the direction from the center of the earth upward as positive this quantity must be negative

$$ \begin{equation}\label{gravity} F = -\frac{GmM}{r^2} \end{equation} $$

To calculate the work done by gravity we cannot simply multiply the force of gravity by the displacement because the force varies from point to point along the path. Suppose the object is at some distance $r$ from the center of the earth and gravity pulls the object downward a small distance $\Delta{r}$, then the work done by gravity is approximately

$$ \Delta{W} = F\Delta{r} $$

By division

$$ \frac{\Delta{W}}{\Delta{r}} = F $$

We now determine the limit of $\tfrac{\Delta{W}}{\Delta{r}}$ as $\Delta{r}$ approaches 0 which is the rate of change of work with respect to the displacement over the path

$$ \frac{dW}{dr} = \lim_{\Delta{r} \to 0} \frac{\Delta{W}}{\Delta{r}} = F $$

Replacing \eqref{gravity}

$$ \frac{dW}{dr} = -\frac{GmM}{r^2} $$

To find $W$ we apply an antidifferentiation process

$$ W = \int -\frac{GmM}{r^2} \; dr = \frac{GmM}{r} + C $$

Assuming that initially the object was at $r = r_1$, no work had been done on it at that point, therefore $W = 0$ when $r = r_1$, so $C = -GmM/r_1$ and

$$ W = \frac{GmM}{r} - \frac{GmM}{r_1} $$
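The same result follows from the fundamental theorem of calculus, evaluating the definite integral from $r_1$ to $r$ (the dummy variable $\rho$ is introduced here only to avoid reusing $r$):

$$ W = \int_{r_1}^{r} -\frac{GmM}{\rho^2} \; d\rho = \frac{GmM}{\rho} \; \Big|_{r_1}^{r} = \frac{GmM}{r} - \frac{GmM}{r_1} $$

When the object falls we have $r < r_1$, so $W$ is positive, as expected for a force that acts in the direction of the displacement.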

¹ Excerpt From: Morris Kline. “Calculus: An Intuitive and Physical Approach (Second Edition).”

[simpson]: http://pages.pacificcoast.net/~cazelais/187/simpson.pdf

Taylor's Theorem and Infinite Series

Thu, 02 Apr 2015 10:00:00 +0000

There are simple functions for which we cannot find antiderivatives in terms of the functions we know, some examples are

$$ \frac{\sin(x)}{x} \quad\quad e^{-x^2} $$

Another problem that arises in the calculus is that of calculating the values of functions. For a given polynomial like $3x^2 + 7x + 1$ it’s simple to calculate the value of the function for various values of $x$, but it’s not so simple for a function like $sin(x)$: to calculate the value of the function at some value of $x$ we would have to construct a right triangle containing the desired angle $x$ and then measure the length of the opposite side and of the hypotenuse, however this process is not very accurate if $x$ is something like 30°50'47

The answer to the problems above is to approximate unmanageable functions by manageable ones while also determining precisely what the error incurred is. If we are to approximate a given function $f(x)$ by $g(x)$ we should make $g(x)$ relatively simple so that we can calculate its values; the simplest functions to work with are the polynomials and therefore we should approximate the function by polynomials

First let’s look into the simpler problem of approximating a function around one value of $x$, let’s say that we have the function $f(x)$ and we want to approximate its value near $x = 0$, let’s consider the polynomial $g(x)$ as an approximation to $f(x)$

$$ \begin{equation}\label{gx} g(x) = c_0 + c_1x + c_2x^2 + \cdots + c_nx^n \end{equation} $$

We can make $g(x)$ agree with $f(x)$ at $x = 0$: because $g(0) = c_0$ we can take $c_0$ to be $f(0)$. If we expect $g(x)$ to be a good approximation of $f(x)$ near $x = 0$ we would also expect the tangent line of $g(x)$ at $x = 0$ to approximate the curve closely at the point of tangency, hence we should make the slope of $g(x)$ agree with the slope of $f(x)$ at $x = 0$. Applying a differentiation process to $g(x)$

$$ g'(x) = c_1 + 2c_2x + 3c_3x^2 + 4c_4x^3 + \cdots + nc_nx^{n-1} $$

At $x = 0$, $g'(0) = c_1$, if $g'(0)$ agrees with $f'(0)$ then

$$ c_1 = f'(0) $$

We can apply the same idea by making $g''(x)$ agree with $f''(x)$ at $x = 0$, applying a differentiation process to $g'(x)$

$$ g'' (x) = 2c_2 + 2 \cdot 3c_3x^1 + 3 \cdot 4c_4x^2 + \cdots + n(n - 1)c_nx^{n-2} $$

Then $g''(0) = 2c_2$ and if $g''(0)$ is the same as $f''(0)$ then $f''(0) = 2c_2$ or

$$ c_2 = \frac{f'' (0)}{2} $$

To determine $c_3$ we would make the third derivatives of both functions agree at $x = 0$

$$ g'''(x) = 2 \cdot 3c_3 + 2 \cdot 3 \cdot 4c_4x + \cdots + n(n - 1)(n - 2)c_nx^{n-3} $$

Then $g'''(0) = 2 \cdot 3c_3$, and if this is the same as $f'''(0)$ then

$$ c_3 = \frac{f'''(0)}{2 \cdot 3} $$

We can see that the $n$-th derivative of $g(x)$ is $g^{(n)}(x) = n(n - 1)(n - 2) \ldots 2 \cdot 1 \cdot c_n$, and if $g^{(n)}(0)$ is equal to $f^{(n)}(0)$ then

$$ c_n = \frac{f^{(n)}(0)}{n(n-1)(n-2) \ldots 2 \cdot 1} = \frac{f^{(n)}(0)}{n!} $$

Because we used the condition that each pair of successive derivatives agree at $x = 0$, $g(x)$ takes the form

$$ g(x) = f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \cdots + \frac{f^{(n)}(0)}{n!}x^n $$
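As an illustration (the choice of $f(x) = \sin(x)$ and $n = 5$ is made here, it is not part of the original derivation), the successive derivatives of $\sin(x)$ at $x = 0$ are $0, 1, 0, -1, 0, 1$, so the polynomial is

$$ g(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} $$

For a small value such as $x = 0.1$ (in radians) this gives

$$ g(0.1) = 0.1 - \frac{0.001}{6} + \frac{0.00001}{120} \approx 0.0998334 $$

which agrees with $\sin(0.1)$ to the seven decimal places shown.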

We could equally make the approximation near any other value of $x$, e.g. $x = a$, thus the proper form of $g(x)$, which generalizes the form \eqref{gx}, is

$$ g(x) = c_0 + c_1(x - a) + c_2(x - a)^2 + \cdots + c_n(x - a)^n $$

Then the final formula for approximating any function $f(x)$ by a polynomial $g(x)$ near $x = a$ is

$$ \begin{equation}\label{taylor} g(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n \end{equation} $$

Taylor’s theorem

\eqref{taylor} approximates the value of a function $f(x)$ near the point $a$, however we do not know how good the approximation is numerically. At $x = a$, $g(a) = f(a)$, which is exact, however for any $x$ near $a$ like $a + h$ we only have $g(a + h) \approx f(a + h)$. The difference $f(a + h) - g(a + h)$ is the error in approximating $f(x)$ by the polynomial $g(x)$; the formula that approximates $f(x)$ considering also the error was first given by Brook Taylor

For any function $f(x)$ which has $(n + 1)$ derivatives in the interval from $a$ to $x$

$$ f(x) = f(a) + f'(a)(x - a) + f''(a)\frac{(x - a)^2}{2!} + \cdots + f^{(n)}(a) \frac{(x - a)^n}{n!} + f^{(n + 1)}(\mu) \frac{(x - a)^{n + 1}}{(n + 1)!} $$

Where $\mu$ is between $x$ and $a$
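The theorem can be tried out numerically. The sketch below is an illustration under stated assumptions: it uses $f(x) = \sin(x)$ with $a = 0$, the names `sin_taylor` and `sin_error_bound` are hypothetical, and the bound relies on the fact that every derivative of $\sin$ has absolute value at most $1$, so the error term is at most $|x|^{n+1}/(n+1)!$.

```cpp
#include <cassert>
#include <cmath>

// Taylor polynomial of sin(x) around a = 0, up to the term in x^n.
// The derivatives of sin at 0 cycle through 0, 1, 0, -1.
double sin_taylor(double x, int n) {
  assert(n >= 0);
  double term = 1.0;  // holds x^k / k!, starting at k = 0
  double sum = 0.0;
  for (int k = 0; k <= n; ++k) {
    if (k > 0) term *= x / k;
    if (k % 4 == 1) sum += term;
    if (k % 4 == 3) sum -= term;
  }
  return sum;
}

// Bound for the error term: |f^(n+1)(mu)| <= 1 for sin, hence
// |error| <= |x|^(n+1) / (n+1)!
double sin_error_bound(double x, int n) {
  assert(n >= 0);
  double bound = 1.0;
  for (int k = 1; k <= n + 1; ++k) {
    bound *= std::fabs(x) / k;
  }
  return bound;
}
```

Comparing `sin_taylor(0.5, 9)` with `std::sin(0.5)` should show a difference that is smaller than `sin_error_bound(0.5, 9)`, which is roughly $2.7 \times 10^{-10}$.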

Introduction to Calculus

Tue, 31 Mar 2015 15:35:06 +0000

why?

Calculus was created to solve some problems that other branches of math were not adequate to treat:

determination of tangents to various curves (e.g. to determine the course of a light ray after it strikes the surface of a lens)
finding the minima/maxima (e.g. determination of the maximum range of a projectile, maximum/minimum distance of a planet that is moving about the sun)
length of curves, areas and volumes of figures bounded by curves

To solve these problems the following concepts are needed:

limit (fundamental to formulate the derivative and the integral)
derivative
integral

The concept of a function

A function is the relation between variables (whose value can be expressed numerically), the most effective mathematical representation of a function is through a formula like the one below:

$$ s = 16 t^2 $$

The above formula says that when $t=2$ then $s=16 \cdot 2^2 = 64$ and is represented as $s_2$, for each value of $t$ there’s a corresponding value of $s$, in the above form $t$ is the independent variable and $s$ is the dependent variable. If we solve the equation above for $t$ we have:

$$ t = \pm \sqrt{\frac{s}{16}} $$

Now $s$ is the independent variable and $t$ is the dependent variable

The notation $f(x)$ can also represent functions without extensive verbiage, e.g. $f(x) = x^2 - 9$. This notation also has the advantage of telling us which is the independent variable, and if we want to calculate the value of the function we can write something like $f(3)$, which is the value of $f(x)$ when $x = 3$.

A formula can also be represented as a curve (this method of interpreting formulas geometrically is known as analytic geometry), let’s represent the following function below using a curve:

$$ y = x^2 $$

The function above is simple in that to each value of $x$ there’s a corresponding value of $y$, however the concept of a function does not require this, for example the function:

$$ y = \frac{1}{x} $$

does not have a valid value when $x = 0$, this means that the function exists for each value of $x$ other than $0$.

The concept of a function then doesn’t require that there’s a $y$ for every $x$ but it does require a $y$-value for each value $x$ in some collection/set of $x$ values. The collection of $x$ values for which a $y$ value exists is called the domain and the collection of the corresponding $y$ values is called the range.

Image taken from https://wordsmithofbengal.wordpress.com/2021/08/02/dr-philos-the-creative-fantasy-of-differential-and-integral-calculus/

Fun!

Mon, 01 Jan 0001 00:00:00 +0000

I saw a rat 🐀

Mon, 01 Jan 0001 00:00:00 +0000

Sandbox | Jukebox

Mon, 01 Jan 0001 00:00:00 +0000

Sandbox | Sunset

Mon, 01 Jan 0001 00:00:00 +0000