Home

Awesome

Coursera: Algorithms: Design and Analysis, Part 1 (Stanford)

Karatsuba: integer multiplication

''' 5678 x 1234

a b x c d

a = 56 b = 78 c = 12 d = 34

5678 x 1234 = (a*10^(n/2) + b) * (c*10^(n/2) + d) = ac*10^n + ad*10^(n/2) + bc*10^(n/2) + bd = ac*10^n + (ad + bc)*10^(n/2) + bd

(ad + bc) = (a+b)(c+d) - ac - b*d '''

Compute:

  1. a*c
  2. b*d
  3. (a+b)*(c+d)
  4. (ad + bc) = (3) - (1) - (2)

Answer: (1)*10^n + (4)*10^(n/2) + (2)

Conclusion: with this algorithm we do only 3 recursive multiplications of (n/2)-digit numbers (instead of 4)
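
The recursion above can be sketched in Python (a minimal version for non-negative integers; the name `karatsuba` and the base case are my choices for illustration):

```python
def karatsuba(x, y):
    """Multiply non-negative integers using 3 recursive multiplications."""
    if x < 10 or y < 10:                 # base case: a single-digit factor
        return x * y
    half = max(len(str(x)), len(str(y))) // 2
    a, b = divmod(x, 10 ** half)         # x = a*10^half + b
    c, d = divmod(y, 10 ** half)         # y = c*10^half + d
    ac = karatsuba(a, c)                             # (1)
    bd = karatsuba(b, d)                             # (2)
    ad_plus_bc = karatsuba(a + b, c + d) - ac - bd   # (3) - (1) - (2)
    return ac * 10 ** (2 * half) + ad_plus_bc * 10 ** half + bd
```

For example, `karatsuba(5678, 1234)` reproduces the worked example above.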

Merge sort: O(n log(n))

divide-and-conquer algorithm - at most 6n*log2(n) + 6n operations - the curve grows very slowly (stays nearly flat) compared to quadratic

log2(n): # of times you divide n by 2 until you get down to 1

Proof of claim: at each level j = 0, 1, 2, ..., log2(n) there are 2^j subproblems, each of size n/(2^j); merging a subproblem of size m takes <= 6m operations

Work at level j <= 2^j (# of level-j subproblems) x 6*(n/(2^j)) (work per subproblem at level j) = 6n (independent of j !!!)

Total <= 6n (work per level) x (log2(n) + 1) (# of levels) = 6n*log2(n) + 6n
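
The analysis above can be paired with a minimal sketch of the algorithm itself (the merge loop is the <= 6m-operation subroutine counted per level):

```python
def merge_sort(a):
    """Divide-and-conquer sort: split, recurse, merge two sorted halves."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])                   # at most one of these is non-empty
    merged.extend(right[j:])
    return merged
```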

Others worse than Merge Sort:

Asymptotic analysis: focus on running time for large input sizes n

fast algorithm == worst-case running time grows slowly with input size

T(n) = O(f(n)) <=> there exist constants c > 0 and n0 such that T(n) <= c*f(n) for all n >= n0

Claim: if T(n) = a(k)*n^k + ... + a(1)*n + a(0), then T(n) = O(n^k)

Proof: choose n0 = 1 and c = |a(k)| + |a(k-1)| + ... + |a(1)| + |a(0)|

Need to show that, for every n >= 1, T(n) <= c*n^k

we have, for every n>=1,

T(n) <= |a(k)|n^k + ... + |a(1)|n + |a(0)|

T(n) <= |a(k)|n^k + ... + |a(1)|n^k + |a(0)|n^k

T(n) <= cn^k

Counting Inversions: O(n log(n))

Counting inversions = # of pairs (i, j) with i < j such that A[i] > A[j]

1, 2, 3, 4, 5, 6 = zero inversions

6, 5, 4, 3, 2, 1 = 15 inversions => maximum possible = n(n-1)/2

Running time: O(n log n)
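
The count can piggyback on merge sort: split inversions are tallied during the merge step, since whenever an element of the right half is copied first, it is inverted with every element still remaining in the left half. A minimal sketch (the name `sort_and_count` is mine):

```python
def sort_and_count(a):
    """Return (sorted copy of a, # of inversions) in O(n log n)."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, inv_left = sort_and_count(a[:mid])
    right, inv_right = sort_and_count(a[mid:])
    merged, i, j, split = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
            split += len(left) - i   # remaining left elements all invert with right[j]
    merged.extend(left[i:]); merged.extend(right[j:])
    return merged, inv_left + inv_right + split
```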

Strassen's Subcubic Matrix Multiplication Algorithm: O(n^(log2 7)) ≈ O(n^2.81)

Naive recursive algorithm (8 recursive multiplications): O(n^3)

Strassen (7 recursive multiplications): O(n^(log2 7)) ≈ O(n^2.81)

Algorithm for closest pair problem: O(n log(n))

Running time: O(n log n)

Recurrence Format: Master Method or Master Theorem

  1. Base case: T(n) <= a constant for all sufficiently small n.

  2. For all large n: T(n) <= aT(n/b) + O(n^d)

    • a = number of recursive calls (>=1)
    • b = input size shrinkage factor (>1)
    • d = exponent in running time of "Combine step" (>=0)

    a,b,d independent of n

Master Method:

  1. T(n) = O(n^d * log n) if a = b^d (case 1): every level does the same amount of work
  2. T(n) = O(n^d) if a < b^d (case 2): the work shrinks at each level, so almost all of it is at the root
  3. T(n) = O(n^(log_b a)) if a > b^d (case 3): the work grows at each level, so almost all of it is in the leaves; equivalently O(a^(log_b n)), since a^(log_b n) = # of leaves
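The three cases can be wrapped in a small helper (a sketch; the function name `master_bound` is my choice for illustration):

```python
import math

def master_bound(a, b, d):
    """Bound for T(n) = a*T(n/b) + O(n^d) via the master method's three cases."""
    if a == b ** d:
        return f"O(n^{d} log n)"            # case 1: equal work at every level
    if a < b ** d:
        return f"O(n^{d})"                  # case 2: root dominates
    return f"O(n^{math.log(a, b):.2f})"     # case 3: leaves dominate, n^(log_b a)
```

For example, merge sort (a=2, b=2, d=1) lands in case 1, while Karatsuba with three recursive calls (a=3, b=2, d=1) lands in case 3, giving roughly O(n^1.58).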

QuickSort

Running time: O(n log n) on average, with random pivots

The best case is when the algorithm always picks the median as a pivot, in which case the recursion is essentially identical to that in MergeSort. In the worst case the min or the max is always chosen as the pivot, resulting in linear depth.
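
A minimal in-place sketch with uniformly random pivots (the partition loop keeps elements smaller than the pivot to its left):

```python
import random

def quicksort(a, lo=0, hi=None):
    """In-place quicksort with uniformly random pivots (expected O(n log n))."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    p = random.randint(lo, hi)          # random pivot guards against the worst case
    a[lo], a[p] = a[p], a[lo]
    pivot, i = a[lo], lo + 1
    for j in range(lo + 1, hi + 1):     # partition around the pivot
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[lo], a[i - 1] = a[i - 1], a[lo]   # pivot lands in its final position
    quicksort(a, lo, i - 2)
    quicksort(a, i, hi)
```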

Randomized Selection

RSelect(array A, length n, order statistic i)

Expected running time of RSelect <= 8 * c * n

Expected running time: O(n)
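
A sketch of RSelect (not in place, unlike the course's partition-based version; it trades memory for brevity, but keeps the "recurse on only one side" idea that gives linear expected time):

```python
import random

def rselect(a, i):
    """i-th smallest element of a (1-indexed), expected O(n) time."""
    pivot = random.choice(a)                 # random pivot, as in QuickSort
    less = [x for x in a if x < pivot]
    if i <= len(less):                       # answer lies left of the pivot
        return rselect(less, i)
    n_le = len(less) + a.count(pivot)
    if i <= n_le:                            # the pivot itself is the answer
        return pivot
    return rselect([x for x in a if x > pivot], i - n_le)
```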

Graphs and Minimum Cuts

Graphs:

Cuts of Graphs:

The Minimum Cut Problem:

A Few Applications:

n vertices => minimum # of edges (for a connected graph) = n - 1 and maximum # of edges = n * (n - 1) / 2

Sparse vs. Dense Graphs

The Adjacency Matrix

Vertices = 1, 2, 3, 4, ...

A = n x n matrix, where Aij = 1 <=> the graph has an i-j edge

Variants:

Random Contraction Algorithm

P = 2 / (n*(n-1)) >= 1/n^2: lower bound on the probability that a single trial outputs the min cut (A, B), i.e., never contracts an edge with one endpoint in group A and the other in group B.

probability a single trial fails <= 1 - 1/n^2

Repeat N trials

Probability[all N trials fail] <= (1 - 1/n^2)^N

1 + x <= e^x

Pr[all N trials fail] <= (e^(-1/n^2))^N = e^(-N/n^2)

So: taking N = n^2 * ln(n) trials, Pr[all N trials fail] <= e^(-ln n) = 1/n

Running time: polynomial in n and m but slow (Ω(n^2 * m))

But: can get big speedups [to roughly O(n²)] with more ideas.

# of minimum cuts

[a tree with n vertices has (n - 1) minimum cuts]

Question: what's the largest number of min cuts that a graph with n vertices can have?

Answer: (n choose 2) = (n * (n - 1)) / 2

Pr[output = (Ai, Bi)] >= 2 / (n * (n - 1)) = 1 / (n choose 2)
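
Repeated random contraction can be sketched as follows (the graph is a list of (u, v) edges; the `leader` bookkeeping is just one possible way to track contracted super-vertices):

```python
import random

def karger_min_cut(edges, n_trials):
    """Repeated random contraction; each trial finds the min cut
    with probability >= 2/(n(n-1)), so many trials are needed."""
    best = float("inf")
    for _ in range(n_trials):
        vertices = {v for e in edges for v in e}
        leader = {v: v for v in vertices}   # super-vertex each vertex belongs to

        def find(v):
            while leader[v] != v:
                v = leader[v]
            return v

        remaining = len(vertices)
        while remaining > 2:
            u, v = random.choice(edges)     # pick a uniformly random edge
            ru, rv = find(u), find(v)
            if ru != rv:                    # contract it (ignore self-loops)
                leader[ru] = rv
                remaining -= 1
        crossing = sum(1 for u, v in edges if find(u) != find(v))
        best = min(best, crossing)
    return best
```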

Graph Search

Breadth-First Search - BFS

Depth-First Search - DFS

connected components = the "pieces" of a graph. Equivalence classes of the relation u~v <=> there exists a u-v path in G
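
A BFS sketch computing those "pieces" for an undirected graph given as an adjacency-list dict (every vertex assumed to be a key):

```python
from collections import deque

def connected_components(adj):
    """BFS from every not-yet-seen vertex; adj maps vertex -> [neighbors]."""
    seen, pieces = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        piece, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            piece.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        pieces.append(piece)
    return pieces
```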

topological ordering

strongly connected components - SCC

  1. Let Grev = G with all arcs reversed
  2. Run DFS-Loop on Grev (goal: compute a "magical ordering" of the nodes) - let f(v) = "finishing time" of each v ∈ V
  3. Run DFS-Loop on G (goal: discover the SCCs one by one) - processing nodes in decreasing order of finishing times - SCCs = nodes with the same "leader"
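
The three steps above, sketched with recursive DFS (fine for small graphs; an iterative DFS avoids Python's recursion-depth limit on large ones):

```python
def kosaraju_scc(adj):
    """Two-pass DFS for SCCs; adj maps node -> [successors], every node a key."""
    rev = {u: [] for u in adj}               # step 1: reverse all arcs
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)

    order, seen = [], set()
    def finish(u):                           # step 2: finishing order on Grev
        seen.add(u)
        for v in rev[u]:
            if v not in seen:
                finish(v)
        order.append(u)
    for u in adj:
        if u not in seen:
            finish(u)

    leader, seen = {}, set()
    def mark(u, lead):                       # step 3: assign leaders on G
        seen.add(u)
        leader[u] = lead
        for v in adj[u]:
            if v not in seen:
                mark(v, lead)
    for u in reversed(order):                # decreasing finishing time
        if u not in seen:
            mark(u, u)
    return leader                            # same leader <=> same SCC
```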

Web Graph:

Dijkstra's Shortest-Path Algorithm

Main loop: while X != V, among all edges (v, w) with v in X and w not in X, pick the one minimizing the greedy criterion A[v] + l(v, w); add w to X and set A[w] = A[v] + l(v, w)
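
A heap-based sketch of the algorithm (this version uses "lazy deletion": stale heap entries are skipped on pop rather than removed on key change):

```python
import heapq

def dijkstra(adj, source):
    """Dijkstra with a binary heap; adj maps u -> [(v, length), ...], lengths >= 0."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                          # stale entry for u, skip it
        for v, length in adj.get(u, []):
            nd = d + length                   # greedy score A[u] + l(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```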

Data Structures

Heap Data Structure

a container for objects that have keys. Ex.: employee records, network edges, events, etc.

Also:

Canonical use of heaps: fast way to do repeated minimum computations

Example:

SelectionSort: Θ(n^2) quadratic - repetitive scans

HeapSort:

  1. insert all n array elements into a heap
  2. Extract-Min to pluck out elements in sorted order

Running time = 2n heap operations = O(n log n) time

=> optimal for a "comparison-based" sorting algorithm
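
The two-step HeapSort above, sketched with Python's heapq module (which implements a binary min-heap on a plain list):

```python
import heapq

def heap_sort(a):
    """n inserts + n Extract-Mins = 2n heap operations = O(n log n)."""
    heap = []
    for x in a:                    # step 1: insert all n elements
        heapq.heappush(heap, x)
    return [heapq.heappop(heap)    # step 2: Extract-Min n times
            for _ in range(len(a))]
```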

Application: Event Manager

Application: Median Maintenance

Application: Speeding Up Dijkstra

Array Implementation:

Implementation Insert (given key k):

number of levels = log2 N (N = # of items in heap)

Implementation of Extract-Min (Bubble-Down)

Balanced Binary Search Tree

Sorted Arrays:

Operations Running Time

Balanced Binary Search Tree = Sorted Arrays + Insert/Delete in O(logN) running time

Operations Running Time

O(log N) - balanced binary tree => height = log N

O(Height) - unbalanced binary tree

Exactly one node per key

Most basic version each node has:

Red-Black Trees

[see also AVL trees, splay trees (self-adjusting trees), B trees, B+ trees (databases)]

  1. each node red or black (bit information)
  2. root is black
  3. no two reds in a row (red node => only black children)
  4. every root-NULL path (unsuccessful search) has same number of black nodes

left and right rotations

Insertion in a Red-Black Tree:

Idea for Insert/Delete: proceed as in a normal binary search tree, then recolor and/or perform rotations until invariants are restored;

  1. Insert X as usual - makes X a leaf
  2. Try coloring X red
  3. If X's parent Y is black, done
  4. else Y is red

Hash Tables

Purpose: maintain a (possibly evolving) set of stuff (transactions, people + associated data, IP addresses, etc.)

Amazing guarantee => all operations in O(1) time! (constant time)

Collision => birthday paradox (23 people -> 50% chance that two share a birthday)

Quick-and-Dirty Hash Functions:

"hash code" -> "compression function"

N = # of buckets

How to choose N?

  1. choose N to be a prime (within constant factor of # of objects in table)
  2. not too close to a power of 2
  3. not too close to a power of 10

Load factor of a hash table: alpha

alpha = # of objects in hash table / # of buckets in hash table
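
A chaining sketch tying the pieces together: hash code, compression function, and load factor (the bucket count 97 is just an example prime, per the guidelines above; the class name is mine):

```python
class ChainedHashTable:
    """Hash table with chaining; alpha = # of objects / # of buckets."""

    def __init__(self, n_buckets=97):    # 97: a prime, not near a power of 2 or 10
        self.buckets = [[] for _ in range(n_buckets)]
        self.size = 0

    def _chain(self, key):
        # "hash code" -> "compression function" (mod # of buckets)
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        chain = self._chain(key)
        for idx, (k, _) in enumerate(chain):
            if k == key:
                chain[idx] = (key, value)   # overwrite an existing key
                return
        chain.append((key, value))
        self.size += 1

    def lookup(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None

    def load_factor(self):
        return self.size / len(self.buckets)
```

With chaining, lookups scan one bucket, so performance degrades gracefully as alpha grows; keeping alpha = O(1) preserves constant expected-time operations.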

Pathological Data in the Real World.

Solutions:

  1. use a cryptographic hash function (e.g., SHA-2): infeasible to reverse-engineer a pathological data set
  2. use randomization: design a family H of hash functions such that, for any data set S, "almost all" functions h in H spread S out "pretty evenly" (universal hashing)

Bloom Filters