DSA in Python
Cheatsheet Content
### Big O Notation - **Definition**: Describes the upper bound of an algorithm's runtime or space complexity as input size grows. - $O(1)$: Constant time. - $O(\log n)$: Logarithmic time (e.g., binary search). - $O(n)$: Linear time (e.g., iterating a list). - $O(n \log n)$: Linearithmic time (e.g., merge sort). - $O(n^2)$: Quadratic time (e.g., nested loops, bubble sort). - $O(2^n)$: Exponential time (e.g., recursive Fibonacci without memoization). - $O(n!)$: Factorial time (e.g., traveling salesman brute force). - **Worst Case**: Most commonly analyzed. - **Space Complexity**: How much extra memory an algorithm needs. ### Arrays & Lists (Python `list`) - **Definition**: Ordered collection of items. Python lists are dynamic arrays. - **Characteristics**: - Mutable, ordered, allows duplicates. - Elements accessed by index. - **Python Operations (Amortized Time Complexity)**: - `append(item)`: $O(1)$ - `insert(index, item)`: $O(n)$ - `pop()` (last item): $O(1)$ - `pop(index)`: $O(n)$ - `del list[index]`: $O(n)$ - `list[index]`: $O(1)$ - `len(list)`: $O(1)$ - `list.sort()`: $O(n \log n)$ - `x in list`: $O(n)$ - **Example**: ```python my_list = [1, 2, 3] my_list.append(4) # [1, 2, 3, 4] my_list.insert(1, 5) # [1, 5, 2, 3, 4] print(my_list[2]) # Output: 2 ``` ### Linked Lists - **Definition**: A sequence of nodes, where each node stores data and a reference (link) to the next node. - **Types**: - **Singly Linked List**: Nodes point to the next node. - **Doubly Linked List**: Nodes point to both next and previous nodes. - **Circular Linked List**: Last node points back to the first. - **Characteristics**: - Dynamic size. - Efficient insertions/deletions at ends/middle (if pointer available). - Slow random access ($O(n)$). - **Operations (Time Complexity)**: - Insertion/Deletion at head: $O(1)$ - Insertion/Deletion at tail: $O(n)$ (Singly), $O(1)$ (Doubly with tail pointer) - Search: $O(n)$ - **Python Example (Node)**: ```python class Node: def __init__(self, data): self.data = data self.next = None ``` ### Stacks (LIFO) - **Definition**: A linear data structure that follows the Last-In-First-Out (LIFO) principle. - **Operations**: - `push(item)`: Add an item to the top. - `pop()`: Remove and return the top item. - `peek()`: Return the top item without removing it. - `is_empty()`: Check if the stack is empty. - **Implementations**: - **Python `list`**: `append()` for push, `pop()` for pop. $O(1)$ amortized for both. - **`collections.deque`**: Efficient for both ends. - **Use Cases**: Function call stack, undo/redo features, expression evaluation. - **Example (Python List)**: ```python stack = [] stack.append('A') # Push stack.append('B') print(stack.pop()) # Output: B ``` ### Queues (FIFO) - **Definition**: A linear data structure that follows the First-In-First-Out (FIFO) principle. - **Operations**: - `enqueue(item)`: Add an item to the rear. - `dequeue()`: Remove and return the front item. - `front()`: Return the front item without removing it. - `is_empty()`: Check if the queue is empty. - **Implementations**: - **Python `list`**: `append()` for enqueue, `pop(0)` for dequeue ($O(n)$). Not efficient. - **`collections.deque`**: `append()` for enqueue, `popleft()` for dequeue. $O(1)$ for both. - **`queue.Queue`**: Thread-safe implementation. - **Use Cases**: Task scheduling, BFS, print queues. - **Example (`collections.deque`)**: ```python from collections import deque queue = deque() queue.append('A') # Enqueue queue.append('B') print(queue.popleft()) # Dequeue, Output: A ``` ### Hash Tables (Python `dict`) - **Definition**: Data structure that maps keys to values using a hash function. - **Characteristics**: - Provides average $O(1)$ time complexity for insertions, deletions, and lookups. - Worst-case $O(n)$ if many hash collisions occur (rare with good hash functions). - **Hash Function**: Converts a key into an integer index. - **Collision Resolution**: - **Chaining**: Store multiple key-value pairs in a linked list at the same hash index. - **Open Addressing**: Probe for the next available slot (linear, quadratic probing). - **Python `dict`**: Implemented as a hash table. - `my_dict[key] = value`: $O(1)$ average - `my_dict[key]`: $O(1)$ average - `del my_dict[key]`: $O(1)$ average - `key in my_dict`: $O(1)$ average - **Example**: ```python my_dict = {"apple": 1, "banana": 2} my_dict["orange"] = 3 print(my_dict["banana"]) # Output: 2 ``` ### Trees - **Definition**: Non-linear data structure consisting of nodes connected by edges. - **Terminology**: - **Root**: Top-most node. - **Parent/Child**: Direct connections. - **Siblings**: Nodes with the same parent. - **Leaf**: Node with no children. - **Path**: Sequence of nodes from one to another. - **Depth**: Number of edges from root to node. - **Height**: Number of edges from node to deepest leaf. - **Binary Tree**: Each node has at most two children (left and right). - **Tree Traversal (Binary Tree)**: - **In-order (Left, Root, Right)**: Yields sorted output for BSTs. - **Pre-order (Root, Left, Right)**: Good for copying trees. - **Post-order (Left, Right, Root)**: Good for deleting trees. - **Level-order (BFS)**: Traverse level by level. - **Python Example (Node)**: ```python class TreeNode: def __init__(self, val): self.val = val self.left = None self.right = None ``` ### Binary Search Trees (BST) - **Definition**: A binary tree where for each node: - All nodes in the left subtree have values less than the node's value. - All nodes in the right subtree have values greater than the node's value. - **Operations (Average Time Complexity)**: - Search, Insertion, Deletion: $O(\log n)$ (for balanced trees) - Worst case (skewed tree): $O(n)$ - **Balancing**: AVL trees, Red-Black trees automatically balance to maintain $O(\log n)$ performance. - **Example (Search)**: ```python def search_bst(root, key): if not root or root.val == key: return root if key ### Heaps (Priority Queue) - **Definition**: A complete binary tree that satisfies the heap property. - **Heap Property**: - **Max-Heap**: Parent node value $\geq$ child node values. - **Min-Heap**: Parent node value $\leq$ child node values. - **Implementation**: Typically using an array (Python `list`). The root is at index 0. Left child of index `i` is `2i+1`, right child is `2i+2`. Parent of `i` is `(i-1)//2`. - **Operations (Time Complexity)**: - `insert()`: $O(\log n)$ - `delete_min/max()`: $O(\log n)$ - `peek_min/max()`: $O(1)$ - `heapify()` (build heap from array): $O(n)$ - **Python `heapq` module**: Implements a min-heap. - `heapq.heappush(heap, item)` - `heapq.heappop(heap)` - `heapq.heapify(list)` - **Use Cases**: Priority queues, Heap Sort. ### Graphs - **Definition**: A set of vertices (nodes) and edges connecting them. - **Terminology**: - **Vertex (Node)**, **Edge (Arc)** - **Directed/Undirected**: Edges have direction or not. - **Weighted/Unweighted**: Edges have associated costs. - **Cycle**: Path from a vertex back to itself. - **Degree**: Number of edges incident to a vertex. - **Representations**: - **Adjacency Matrix**: `matrix[i][j] = 1` if edge from `i` to `j`, else `0`. - Space: $O(V^2)$ - Edge check: $O(1)$ - Neighbors: $O(V)$ - **Adjacency List**: Array/Dictionary where `index/key` is a vertex, value is a list of its neighbors. - Space: $O(V+E)$ - Edge check: $O(\text{deg}(V))$ - Neighbors: $O(\text{deg}(V))$ - **Python Example (Adjacency List)**: ```python graph = { 'A': ['B', 'C'], 'B': ['D'], 'C': ['E'], 'D': ['F'], 'E': [], 'F': [] } ``` ### Graph Traversal - **Breadth-First Search (BFS)**: - Explores level by level. Uses a queue. - Finds shortest path in unweighted graphs. - Time: $O(V+E)$ - **Example**: ```python from collections import deque def bfs(graph, start): visited = set() queue = deque([start]) visited.add(start) while queue: vertex = queue.popleft() print(vertex) # Process vertex for neighbor in graph[vertex]: if neighbor not in visited: visited.add(neighbor) queue.append(neighbor) ``` - **Depth-First Search (DFS)**: - Explores as far as possible along each branch before backtracking. Uses a stack (or recursion). - Time: $O(V+E)$ - **Example**: ```python def dfs(graph, start, visited=None): if visited is None: visited = set() visited.add(start) print(start) # Process vertex for neighbor in graph[start]: if neighbor not in visited: dfs(graph, neighbor, visited) ``` ### Sorting Algorithms - **Bubble Sort**: Repeatedly steps through list, compares adjacent elements and swaps them if in wrong order. - Time: $O(n^2)$ - **Selection Sort**: Finds the minimum element from unsorted part and puts it at the beginning. - Time: $O(n^2)$ - **Insertion Sort**: Builds final sorted array one item at a time. - Time: $O(n^2)$ average/worst, $O(n)$ best (already sorted) - **Merge Sort**: Divide and conquer. Divides array into halves, sorts them, then merges. - Time: $O(n \log n)$ - Space: $O(n)$ - **Quick Sort**: Divide and conquer. Picks an element as pivot and partitions array around pivot. - Time: $O(n \log n)$ average, $O(n^2)$ worst - Space: $O(\log n)$ average (recursion stack) - **Heap Sort**: Uses a binary heap. Builds max-heap, then repeatedly extracts max element. - Time: $O(n \log n)$ - Space: $O(1)$ - **Python `list.sort()` / `sorted()`**: Implemented using Timsort (hybrid of Merge Sort and Insertion Sort). - Time: $O(n \log n)$ - Space: $O(n)$ worst case ($O(1)$ for small lists). ### Searching Algorithms - **Linear Search**: Checks each element sequentially until target is found or list ends. - Time: $O(n)$ - **Example**: ```python def linear_search(arr, target): for i in range(len(arr)): if arr[i] == target: return i return -1 ``` - **Binary Search**: Requires a sorted list. Repeatedly divides search interval in half. - Time: $O(\log n)$ - **Example**: ```python def binary_search(arr, target): low, high = 0, len(arr) - 1 while low ### Recursion - **Definition**: A function that calls itself to solve a problem. - **Components**: - **Base Case**: Condition to stop recursion. - **Recursive Step**: Calls itself with a modified input. - **Use Cases**: Tree/Graph traversals, factorial, Fibonacci, Tower of Hanoi. - **Example (Factorial)**: ```python def factorial(n): if n == 0: # Base Case return 1 else: # Recursive Step return n * factorial(n-1) ``` - **Pros**: Elegant, concise for certain problems. - **Cons**: Can lead to stack overflow for deep recursion, higher memory/time overhead than iteration. ### Dynamic Programming (DP) - **Definition**: Optimization technique for recursive problems with overlapping subproblems and optimal substructure. - **Key Concepts**: - **Overlapping Subproblems**: Same subproblems are solved multiple times. - **Optimal Substructure**: Optimal solution to a problem can be constructed from optimal solutions of its subproblems. - **Approaches**: - **Memoization (Top-down)**: Store results of expensive function calls and return cached result when same inputs occur. (Recursion + caching). - **Tabulation (Bottom-up)**: Solve subproblems first and store their solutions in a table, then use them to build up the solution to the larger problem. (Iteration). - **Example (Fibonacci - Memoization)**: ```python memo = {} def fib_memo(n): if n in memo: return memo[n] if n ### Greedy Algorithms - **Definition**: Makes locally optimal choices at each step with the hope of finding a global optimum. - **Characteristics**: - Does not always guarantee a globally optimal solution. - Often simpler and faster than DP or other approaches. - **Use Cases**: Dijkstra's algorithm (shortest path), Prim's/Kruskal's algorithm (minimum spanning tree), Activity Selection Problem, Huffman Coding. - **Elements**: 1. **Greedy Choice Property**: A globally optimal solution can be reached by making a locally optimal (greedy) choice. 2. **Optimal Substructure**: Optimal solution of the problem contains optimal solutions to subproblems. - **Example (Activity Selection - simplified)**: Given activities with start and finish times, select maximum non-overlapping activities. Sort activities by finish time. Pick first activity. Then pick next activity whose start time is after previously picked activity's finish time.