Step-by-Step Tutorial: Automate GUI Task Recording and Replay with PyAutoGUI and PyGetWindow in Python

December 23, 2025


Key Takeaways

You can automate repetitive desktop tasks by teaching Python to control your mouse and keyboard using simple libraries.

This guide uses PyAutoGUI for control, PyGetWindow to target specific applications, and pynput to record your actions.

The provided scripts let you record a sequence of clicks and keystrokes in one application window and then replay them with the same timing, creating custom macros for any app.

I once spent an entire Friday manually exporting 300 reports from a truly ancient piece of desktop software. Click. Wait. Type filename. Save. Repeat. By the end of the day, my soul felt as gray as the Windows 95-era interface I’d been staring at.

I swore I’d never let a graphical user interface (GUI) defeat me like that again. That experience sent me down a rabbit hole, and I emerged with a powerful realization: you can teach Python to see and interact with your screen, effectively building a robot to do your boring work for you. And I’m not talking about some complex, enterprise-level platform.

I’m talking about a couple of lightweight Python libraries that can turn you into a GUI automation wizard in an afternoon.

Introduction: Your Personal Macro Recorder with Python

Why Automate Repetitive GUI Tasks?

Look, we all have them. Those mind-numbing, click-heavy tasks that are part of our workflow but are too niche or involve software too old for a proper API. Automating these isn't just about saving time; it's about saving your sanity and freeing up your brainpower for problems that actually require thinking.

Meet the Tools: PyAutoGUI for Control and PyGetWindow for Context

Our dynamic duo for this project is:

PyAutoGUI: This is the muscle. It’s a brilliant library that gives your Python script control over the mouse and keyboard.
PyGetWindow: This is the brain. It can find specific application windows by their title, making our automation reliable instead of just wildly clicking on the screen.

What We'll Build: A Script to Record and Replay Your Actions

Today, we’re going to build a simple but incredibly useful tool: a two-part script that first records your mouse clicks and keystrokes within a specific application window, and then replays them on command with the same timing. Think of it as creating your own custom macros for any desktop application.

Step 1: Setting Up Your Automation Environment

First things first, let's get our digital workshop in order.

Prerequisites: Python 3 Installed

I'm assuming you have Python 3 installed on your system. If not, head over to the official Python website and get that sorted out. We'll be using pip, Python's package manager, which should come bundled with your installation.

Installing PyAutoGUI: Your Digital Hands

This library will handle all the simulated clicks and typing. Open your terminal or command prompt and run:

pip install pyautogui

Installing PyGetWindow: Your Digital Eyes

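This library will find and manage the application windows we want to automate. At the time of writing, PyGetWindow's window-management features are fully implemented only on Windows, so this tutorial assumes a Windows machine (which suits our Notepad example).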

pip install pygetwindow

A Note on Recording: Bringing in the `pynput` Library for Listening

How do we actually record our actions? PyAutoGUI can only send mouse and keyboard input; it can't listen to it. For listening to what we're doing, we need another tool. My go-to for this is pynput, a fantastic library for monitoring input devices.

pip install pynput

With these three libraries, we have everything we need to build our recorder and replayer.

Step 2: Building the Action Recorder Script

Let's get to the fun part. We'll write a script that watches what we do in a specific window and logs every action.

The Logic: How to Capture Events

The core idea is simple:

1. Pick the window you want to record by its title (the script reads it from a WINDOW_TITLE setting).
2. Use pygetwindow to find that window and bring it into focus.
3. Start pynput "listeners" for both the mouse and the keyboard.
4. Every time an action happens, record its type, its details, and the time elapsed since the previous action.
5. Store all these actions in a structured format, such as a JSON file.
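The timing and storage part of that plan doesn't depend on any GUI library, so it can be sketched on its own. The hypothetical ActionLog class below stamps each action with the delay since the previous one and serializes the list to JSON. It assumes an injectable clock (defaulting to time.monotonic, which never jumps backwards the way wall-clock time can) so the logic can be exercised without real waiting; the full recorder in this post uses time.time instead.

```python
import json
import time


class ActionLog:
    """Hypothetical helper: collects actions, each stamped with the delay since the previous one."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        # The first recorded delay is measured from the moment recording starts
        self._last = clock()
        self.actions = []

    def record(self, kind, **details):
        now = self._clock()
        # "time" holds the gap in seconds since the previous action, not an absolute timestamp
        action = {"action": kind, **details, "time": now - self._last}
        self._last = now
        self.actions.append(action)
        return action

    def to_json(self):
        return json.dumps(self.actions, indent=4)


# Usage with a fake clock that returns 10.0, 11.25 and 12.0 on successive calls
ticks = iter([10.0, 11.25, 12.0])
log = ActionLog(clock=lambda: next(ticks))
log.record("click", x=150, y=200, button="Button.left", pressed=True)
log.record("press", key="a")
print(log.to_json())
```

Storing gaps rather than absolute timestamps means the replayer can simply sleep for each value in turn.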

Using PyGetWindow to Target a Specific Application

This is critical. We don't want to record clicks on our desktop or a stray notification. Here’s how you grab a window (I'll use Notepad as an example):

import sys

import pygetwindow as gw

try:
    # Find the first window with "Notepad" in its title
    target_window = gw.getWindowsWithTitle('Notepad')[0]
    target_window.activate()  # Bring it to the front
    print(f"Found and activated window: {target_window.title}")
except IndexError:
    # getWindowsWithTitle() returned an empty list
    print("Notepad window not found!")
    sys.exit(1)
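Because getWindowsWithTitle() returns every window whose title contains the search text, taking [0] can grab the wrong window (a browser tab mentioning Notepad, for example). The hypothetical pick_title helper below is a sketch of a stricter rule: prefer an exact title match, and only then fall back to a case-insensitive substring match. In a real script you might feed it [w.title for w in gw.getAllWindows()] and then use the window whose .title equals the result.

```python
def pick_title(titles, wanted):
    """Hypothetical helper: choose the best matching window title, or None if nothing matches."""
    # 1. An exact match always wins
    for title in titles:
        if title == wanted:
            return title
    # 2. Otherwise take the first title that contains the text, ignoring case
    wanted_lower = wanted.lower()
    for title in titles:
        if wanted_lower in title.lower():
            return title
    return None


titles = ["Notepad tricks - Browser", "Untitled - Notepad", ""]
print(pick_title(titles, "Untitled - Notepad"))  # Untitled - Notepad
print(pick_title(titles, "notepad"))             # Notepad tricks - Browser
print(pick_title(titles, "Calculator"))          # None
```

The second call shows the pitfall: a loose search term happily matches the browser tab first.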

Listening for Mouse Clicks and Position

pynput makes this surprisingly easy. We can set up a function that runs every time a mouse click is detected. Crucially, we'll make these coordinates relative to the target window so the replay works even if the window moves.
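The coordinate conversion can be isolated in a small pure function. This is a sketch with a hypothetical name, to_relative: inside a pynput on_click(x, y, button, pressed) callback you would pass the window's left, top, width and height from the PyGetWindow object, and a None result means the click landed outside the window and can be ignored.

```python
def to_relative(x, y, left, top, width, height):
    """Hypothetical helper: absolute screen point -> window-relative point, or None if outside."""
    rel_x, rel_y = x - left, y - top
    # Keep only points inside the window rectangle (top-left edge inclusive)
    if 0 <= rel_x < width and 0 <= rel_y < height:
        return rel_x, rel_y
    return None


# An 800x600 window whose top-left corner sits at (100, 50) on screen
print(to_relative(250, 250, 100, 50, 800, 600))  # (150, 200)
print(to_relative(50, 250, 100, 50, 800, 600))   # None
```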

Listening for Keyboard Presses

Similarly, we can set up a listener for keyboard events. We'll capture which key was pressed.
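pynput hands the callback either a KeyCode (regular keys, with a .char attribute) or a member of its Key enum (special keys such as Key.enter, with no .char), and some KeyCodes report .char as None. A sketch of one way to turn either into a JSON-friendly record, using a hypothetical serialize_key helper and stand-in classes so it runs without a display:

```python
def serialize_key(key):
    """Hypothetical helper: pynput-style key object -> action dict for the JSON log."""
    char = getattr(key, "char", None)
    if char is not None:
        # Regular character key
        return {"action": "press", "key": char}
    # Special keys (no .char) and KeyCodes whose char is None: store the string form
    return {"action": "press_special", "key": str(key)}


# Stand-ins for pynput's KeyCode and Key objects (hypothetical, for illustration only)
class FakeKeyCode:
    def __init__(self, char):
        self.char = char


class FakeSpecialKey:
    def __str__(self):
        return "Key.enter"


print(serialize_key(FakeKeyCode("a")))    # {'action': 'press', 'key': 'a'}
print(serialize_key(FakeSpecialKey()))    # {'action': 'press_special', 'key': 'Key.enter'}
```

In a real on_press(key) callback you would pass pynput's key object straight in.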

Storing Actions with Timestamps in a JSON file

We'll store our recorded actions in a list of dictionaries. The "time" key is the delay in seconds since the previous action (for the first action, since recording started), which is essential for a faithful replay.

{
    "action": "click",
    "x": 150,
    "y": 200,
    "button": "Button.left",
    "pressed": true,
    "time": 1.25 
}

The Complete Recorder Code

Here is the full script (recorder.py). It brings all these concepts together.

import json
import threading
import time

import pygetwindow as gw
from pynput import keyboard, mouse

# --- Configuration ---
WINDOW_TITLE = "Notepad" # Change this to your target application's window title
OUTPUT_FILENAME = "recorded_actions.json"

# --- Globals (shared between the pynput listener threads) ---
recorded_actions = []
last_action_time = None
target_window = None
record_lock = threading.Lock()  # mouse and keyboard callbacks run on different threads

def get_relative_coords(x, y):
    """Converts absolute screen coordinates to window-relative coordinates."""
    if target_window:
        return x - target_window.left, y - target_window.top
    return x, y

def log_action(action):
    """Stamps an action with the delay since the previous one and stores it."""
    global last_action_time
    with record_lock:
        current_time = time.time()
        action["time"] = current_time - last_action_time
        last_action_time = current_time
        recorded_actions.append(action)
    print(f"Recorded: {action}")

def on_click(x, y, button, pressed):
    # Only record clicks within the target window (both press and release are logged)
    if target_window and target_window.left < x < target_window.right and target_window.top < y < target_window.bottom:
        rel_x, rel_y = get_relative_coords(x, y)
        log_action({
            "action": "click",
            "x": rel_x,
            "y": rel_y,
            "button": str(button),  # e.g. "Button.left"
            "pressed": pressed
        })

def on_press(key):
    # Only record while the target window is active, and never record Esc,
    # which is reserved for stopping the recording
    if not (target_window and target_window.isActive) or key == keyboard.Key.esc:
        return
    char = getattr(key, "char", None)
    if char is not None:
        # Regular character key
        log_action({"action": "press", "key": char})
    else:
        # Special keys (Key.space, Key.enter, etc.) have no .char, and some
        # KeyCodes report char=None; store their string form instead
        log_action({"action": "press_special", "key": str(key)})

def on_release(key):
    """Stops the keyboard listener when Esc is released."""
    if key == keyboard.Key.esc:
        return False

def start_recording():
    global last_action_time, target_window

    print(f"Looking for window with title: '{WINDOW_TITLE}'")
    try:
        target_window = gw.getWindowsWithTitle(WINDOW_TITLE)[0]
        target_window.activate()
        print("Window found! Recording started. Press 'Esc' to stop.")
    except IndexError:
        print(f"Error: Window with title '{WINDOW_TITLE}' not found.")
        return

    # The first recorded delay is measured from this moment
    last_action_time = time.time()

    mouse_listener = mouse.Listener(on_click=on_click)
    mouse_listener.start()

    # A single keyboard listener records key presses and stops on Esc
    with keyboard.Listener(on_press=on_press, on_release=on_release) as keyboard_listener:
        keyboard_listener.join()

    mouse_listener.stop()

    # Save to file
    with open(OUTPUT_FILENAME, 'w') as f:
        json.dump(recorded_actions, f, indent=4)

    print(f"\nRecording stopped. Actions saved to {OUTPUT_FILENAME}")

if __name__ == "__main__":
    start_recording()

Step 3: Building the Action Replay Script

Now that we have our actions saved, we need a script to read that file and perform them.

The Logic: Reading and Executing Stored Commands

This is the reverse of our recorder:

1. Find the target window by the same title, so we act on the right application.
2. Load the actions from our JSON file.
3. Loop through each action, honoring the recorded delay with time.sleep().
4. Use pyautogui to perform the action at the correct coordinates.

Loading the Actions from the JSON File

This is a simple file-reading operation in Python.

import json

with open("recorded_actions.json", 'r') as f:
    actions = json.load(f)
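Before replaying, it can be worth checking the loaded data, so a truncated or hand-edited file fails before the mouse starts moving rather than halfway through. The hypothetical check_actions helper below is a sketch that assumes the exact field names our recorder writes; it returns a list of problems, which is empty when everything looks valid.

```python
import json

# Fields each action type must carry (assumed to match the recorder's JSON format)
REQUIRED_FIELDS = {
    "click": {"x", "y", "button", "pressed", "time"},
    "press": {"key", "time"},
    "press_special": {"key", "time"},
}


def check_actions(actions):
    """Hypothetical helper: returns a list of problems found in the loaded actions."""
    problems = []
    for i, action in enumerate(actions):
        if not isinstance(action, dict):
            problems.append(f"action {i}: not an object")
            continue
        kind = action.get("action")
        if kind not in REQUIRED_FIELDS:
            problems.append(f"action {i}: unknown type {kind!r}")
            continue
        missing = REQUIRED_FIELDS[kind] - set(action)
        if missing:
            problems.append(f"action {i}: missing {sorted(missing)}")
        elif not isinstance(action["time"], (int, float)) or action["time"] < 0:
            problems.append(f"action {i}: invalid delay")
    return problems


sample = json.loads('[{"action": "press", "key": "a", "time": 0.5},'
                    ' {"action": "click", "x": 10, "y": 20, "time": 0.1}]')
print(check_actions(sample))  # ["action 1: missing ['button', 'pressed']"]
```

In the replayer you could stop with an error message whenever check_actions(actions) is non-empty.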

Using PyAutoGUI to Simulate Mouse Movements and Clicks

For each "click" action, we'll get the target window's current corner and add the relative coordinates to find the absolute position to click.

import pyautogui

# 'action' is a dictionary from our JSON file
# 'target_window' is our PyGetWindow object
abs_x = target_window.left + action['x']
abs_y = target_window.top + action['y']
pyautogui.moveTo(abs_x, abs_y, duration=0.1)
pyautogui.click(abs_x, abs_y)
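The recorder stores pynput's button as a string such as "Button.left", while pyautogui.click() expects a button name like "left", "middle" or "right". A sketch of a hypothetical plan_click helper that does both conversions in one place and rejects anything else up front (pynput may report extra buttons on some platforms):

```python
def plan_click(action, window_left, window_top):
    """Hypothetical helper: recorded click -> (abs_x, abs_y, button) for pyautogui.click()."""
    # "Button.left" -> "left"
    button = action["button"].split(".")[-1]
    if button not in ("left", "middle", "right"):
        raise ValueError(f"unsupported mouse button: {action['button']!r}")
    # Relative offsets are added to the window's current top-left corner
    return window_left + action["x"], window_top + action["y"], button


print(plan_click({"x": 150, "y": 200, "button": "Button.left"}, 100, 50))  # (250, 250, 'left')
```

In the replayer you would unpack the result into pyautogui.click(x=abs_x, y=abs_y, button=button).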

Using PyAutoGUI to Simulate Keystrokes

For key presses, we just use pyautogui.write() for regular characters or pyautogui.press() for special keys.

Recreating Pauses for a Realistic Replay

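This is absolutely vital. UIs need time to react. The time.sleep(action['time']) call reproduces the pauses you made while recording, which gives the application time to open menus, load dialogs, and redraw before the next action.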
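If the recorded pauses feel too slow, or a few actions were recorded so quickly that the UI can't keep up, the delay can be adjusted before sleeping. The hypothetical replay_delay helper below is a sketch: speed above 1.0 shortens every gap proportionally, and floor sets a minimum gap. Keep in mind that PyAutoGUI already sleeps pyautogui.PAUSE (0.1 seconds by default) after each of its own calls.

```python
def replay_delay(recorded, speed=1.0, floor=0.0):
    """Hypothetical helper: how long to sleep before replaying an action."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    # Scale the recorded gap, but never go below the minimum
    return max(recorded / speed, floor)


print(replay_delay(1.25))              # 1.25
print(replay_delay(1.0, speed=2.0))    # 0.5
print(replay_delay(0.01, floor=0.05))  # 0.05
```

In the replay loop, time.sleep(replay_delay(action['time'], speed=1.5, floor=0.05)) would replace the plain time.sleep(action['time']).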

The Complete Replay Code

Here is the full script (replayer.py).

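import json
import time

import pyautogui
import pygetwindow as gw

# --- Configuration ---
WINDOW_TITLE = "Notepad" # Must match the title used in the recorder
INPUT_FILENAME = "recorded_actions.json"

# pynput names some special keys differently from PyAutoGUI (partial list)
KEY_NAME_MAP = {
    "caps_lock": "capslock",
    "page_up": "pageup",
    "page_down": "pagedown",
    "print_screen": "printscreen",
    "num_lock": "numlock",
    "scroll_lock": "scrolllock",
    "shift_l": "shiftleft",
    "shift_r": "shiftright",
    "ctrl_l": "ctrlleft",
    "ctrl_r": "ctrlright",
    "alt_l": "altleft",
    "alt_r": "altright",
}

def replay_actions():
    print(f"Looking for window with title: '{WINDOW_TITLE}'")
    try:
        target_window = gw.getWindowsWithTitle(WINDOW_TITLE)[0]
        target_window.activate()
        print("Window found! Starting replay in 3 seconds...")
        time.sleep(3)
    except IndexError:
        print(f"Error: Window with title '{WINDOW_TITLE}' not found.")
        return

    with open(INPUT_FILENAME, 'r') as f:
        actions = json.load(f)

    for i, action in enumerate(actions):
        print(f"Performing action {i+1}/{len(actions)}: {action['action']}")

        # Wait for the recorded delay (for every action, including skipped releases)
        time.sleep(action['time'])

        if action['action'] == 'click' and action['pressed']:
            # Use the window's current position, so a moved window still works
            abs_x = target_window.left + action['x']
            abs_y = target_window.top + action['y']
            pyautogui.moveTo(abs_x, abs_y, duration=0.1) # A small move duration looks more natural
            # "Button.left" -> "left"; click() performs both press and release
            pyautogui.click(x=abs_x, y=abs_y, button=action['button'].split('.')[-1])

        elif action['action'] == 'press' and action['key'] is not None:
            pyautogui.write(action['key'])

        elif action['action'] == 'press_special':
            # "Key.enter" -> "enter", then translate names PyAutoGUI spells differently
            key_name = action['key'].split('.')[-1]
            key_name = KEY_NAME_MAP.get(key_name, key_name)
            if key_name != 'esc':  # Esc only stopped the recorder; don't replay it
                pyautogui.press(key_name)

    print("Replay finished!")

if __name__ == "__main__":
    replay_actions()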

Putting It All Together: A Practical Demonstration

Let's automate filling a simple form in Notepad.

Example Task: Automating Form Filling in a Desktop App

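Open Notepad. A fresh window is usually titled "Untitled - Notepad".
Change the WINDOW_TITLE variable in both scripts to "Untitled - Notepad". Because getWindowsWithTitle() matches any window whose title contains that text, the plain "Notepad" default would also work; the longer title just lowers the chance of grabbing the wrong window.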

Running the Recorder

Run the recorder script from your terminal: python recorder.py
It will find and focus Notepad. The terminal will say "Recording started."
Click into the Notepad window and type your text.
Press the Esc key to stop the recording.

Performing the Task Manually

You've already done it! That was the recording phase.

Running the Replayer and Watching the Magic Happen

Clear the text in Notepad.
Run the replayer script: python replayer.py
The script brings Notepad to the front by itself. Keep your hands off the mouse and keyboard and watch the magic happen after 3 seconds.

Next Steps and Important Considerations

This is a powerful starting point, but let's be real about its strengths and weaknesses.

The PyAutoGUI Failsafe: Your Emergency Stop

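THIS IS IMPORTANT. If your script goes haywire, you need a kill switch. PyAutoGUI has a built-in failsafe, enabled by default (pyautogui.FAILSAFE = True): slam your mouse into the top-left corner of your screen, and the next PyAutoGUI call raises pyautogui.FailSafeException and stops the script. Because the check runs when a PyAutoGUI function is called, it won't fire in the middle of a long time.sleep(); it triggers on the next move, click, or keystroke.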

Limitations of This Approach (and when to use Selenium or other tools)

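This method is coordinate-based, which makes it brittle. Moving the window is fine, because clicks are stored relative to its top-left corner, but if the window is resized, a button moves, or the screen resolution or display scaling changes, clicks will land in the wrong place. Drags and keyboard shortcuts (such as Ctrl+S) also won't replay faithfully, because the replayer turns each recorded press into an independent click or keystroke. It's best suited to static, unchanging UIs.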

For web automation, use a proper tool like Selenium that interacts with the web page's structure (DOM), not its visual layout.

Ideas for Improvement: Adding Image Recognition

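To make the script more robust, you can replace hard-coded coordinates with image recognition. Instead of clicking coordinates, you can call pyautogui.click('submit_button.png'), where submit_button.png is a screenshot of the button you want. PyAutoGUI searches the screen for that image and clicks its center, which keeps working even when the button moves. That's a game-changer for reliability. Image matching relies on Pillow, and the optional confidence argument also requires OpenCV.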

In an earlier post, "Walrus Operator in Python Automation: Does It Sabotage Readability in Production Scripts?", I debated whether newer syntax additions are worth it. For a script this small, clarity is key, but it's worth revisiting as your automations grow.

Conclusion

You now have the fundamental building blocks to create your own desktop automation. That tedious, soul-crushing task that made me start this journey? I could now automate it in 30 minutes.

The power here is not just in the code itself, but in changing your mindset. When you encounter a repetitive digital chore, a part of your brain will start thinking, "I can write a script for that." And that, my friends, is a superpower.
