Step-by-Step Tutorial: Automate GUI Task Recording and Replay with PyAutoGUI and PyGetWindow in Python

Key Takeaways
- You can automate repetitive desktop tasks by teaching Python to control your mouse and keyboard using simple libraries.
- This guide uses PyAutoGUI for control, PyGetWindow to target specific applications, and pynput to record your actions.
- The provided scripts allow you to record a sequence of clicks and keystrokes and then replay them perfectly, creating custom macros for any app.
I once spent an entire Friday manually exporting 300 reports from a truly ancient piece of desktop software. Click. Wait. Type filename. Save. Repeat. By the end of the day, my soul felt as gray as the Windows 95-era interface I’d been staring at.
I swore I’d never let a graphical user interface (GUI) defeat me like that again. That experience sent me down a rabbit hole, and I emerged with a powerful realization: you can teach Python to see and interact with your screen, effectively building a robot to do your boring work for you. And I’m not talking about some complex, enterprise-level platform.
I’m talking about a couple of lightweight Python libraries that can turn you into a GUI automation wizard in an afternoon.
Introduction: Your Personal Macro Recorder with Python
Why Automate Repetitive GUI Tasks?
Look, we all have them. Those mind-numbing, click-heavy tasks that are part of our workflow but are too niche or involve software too old for a proper API. Automating these isn't just about saving time; it's about saving your sanity and freeing up your brainpower for problems that actually require thinking.
Meet the Tools: PyAutoGUI for Control and PyGetWindow for Context
Our dynamic duo for this project is:
- PyAutoGUI: This is the muscle. It’s a brilliant library that gives your Python script control over the mouse and keyboard.
- PyGetWindow: This is the brain. It can find specific application windows by their title, making our automation reliable instead of just wildly clicking on the screen.
What We'll Build: A Script to Record and Replay Your Actions
Today, we’re going to build a simple but incredibly useful tool: a two-part script that first records your mouse clicks and keystrokes within a specific application window, and then replays them perfectly on command. Think of it as creating your own custom macros for any desktop application.
Step 1: Setting Up Your Automation Environment
First things first, let's get our digital workshop in order.
Prerequisites: Python 3 Installed
I'm assuming you have Python 3 installed on your system. If not, head over to the official Python website and get that sorted out. We'll be using pip, Python's package manager, which should come bundled with your installation.
Installing PyAutoGUI: Your Digital Hands
This library will handle all the simulated clicks and typing. Open your terminal or command prompt and run:
pip install pyautogui
Installing PyGetWindow: Your Digital Eyes
This library will find and manage the application windows we want to automate.
pip install pygetwindow
A Note on Recording: Bringing in the pynput Library for Listening
How do we actually record our actions? For input (listening to what we're doing), we need another tool. My go-to for this is pynput, a fantastic library for monitoring input devices.
pip install pynput
With these three libraries, we have everything we need to build our recorder and replayer.
Step 2: Building the Action Recorder Script
Let's get to the fun part. We'll write a script that watches what we do in a specific window and logs every action.
The Logic: How to Capture Events
The core idea is simple:
1. Ask the user for the title of the window they want to record.
2. Use pygetwindow to find and focus on that window.
3. Start "listeners" using pynput for both the mouse and keyboard.
4. Every time an action happens, record its type, details, and time elapsed.
5. Store all these actions in a structured format, like a JSON file.
Using PyGetWindow to Target a Specific Application
This is critical. We don't want to record clicks on our desktop or a stray notification. Here’s how you grab a window (I'll use Notepad as an example):
import pygetwindow as gw
try:
# Find a window with "Notepad" in its title
target_window = gw.getWindowsWithTitle('Notepad')[0]
target_window.activate() # Bring it to the front
print(f"Found and activated window: {target_window.title}")
except IndexError:
print("Notepad window not found!")
exit()
Listening for Mouse Clicks and Position
pynput makes this surprisingly easy. We can set up a function that runs every time a mouse click is detected. Crucially, we'll make these coordinates relative to the target window so the replay works even if the window moves.
Listening for Keyboard Presses
Similarly, we can set up a listener for keyboard events. We'll capture which key was pressed.
Storing Actions with Timestamps in a JSON file
We'll store our recorded actions in a list of dictionaries. The "time" key is the delay in seconds from the previous action, which is essential for a faithful replay.
{
"action": "click",
"x": 150,
"y": 200,
"button": "Button.left",
"pressed": true,
"time": 1.25
}
The Complete Recorder Code
Here is the full script (recorder.py). It brings all these concepts together.
import pygetwindow as gw
import json
import time
from pynput import mouse, keyboard
# --- Configuration ---
WINDOW_TITLE = "Notepad" # Change this to your target application's window title
OUTPUT_FILENAME = "recorded_actions.json"
# --- Globals ---
recorded_actions = []
last_action_time = None
target_window = None
def get_relative_coords(x, y):
"""Converts absolute screen coordinates to window-relative coordinates."""
if target_window:
return x - target_window.left, y - target_window.top
return x, y
def on_click(x, y, button, pressed):
global last_action_time
# Only record clicks within the target window
if target_window and target_window.left < x < target_window.right and target_window.top < y < target_window.bottom:
current_time = time.time()
delay = current_time - last_action_time if last_action_time else 0
last_action_time = current_time
rel_x, rel_y = get_relative_coords(x, y)
action = {
"action": "click",
"x": rel_x,
"y": rel_y,
"button": str(button),
"pressed": pressed,
"time": delay
}
recorded_actions.append(action)
print(f"Recorded: {action}")
def on_press(key):
global last_action_time
try:
# Only record if the target window is active
if target_window and target_window.isActive:
current_time = time.time()
delay = current_time - last_action_time if last_action_time else 0
last_action_time = current_time
action = {
"action": "press",
"key": key.char,
"time": delay
}
recorded_actions.append(action)
print(f"Recorded: {action}")
except AttributeError:
# Handle special keys (like space, enter, etc.)
if target_window and target_window.isActive:
current_time = time.time()
delay = current_time - last_action_time if last_action_time else 0
last_action_time = current_time
action = {
"action": "press_special",
"key": str(key),
"time": delay
}
recorded_actions.append(action)
print(f"Recorded: {action}")
def start_recording():
global last_action_time, target_window
print(f"Looking for window with title: '{WINDOW_TITLE}'")
try:
target_window = gw.getWindowsWithTitle(WINDOW_TITLE)[0]
target_window.activate()
print("Window found! Recording started. Press 'Esc' to stop.")
except IndexError:
print(f"Error: Window with title '{WINDOW_TITLE}' not found.")
return
last_action_time = time.time()
mouse_listener = mouse.Listener(on_click=on_click)
keyboard_listener = keyboard.Listener(on_press=on_press)
mouse_listener.start()
keyboard_listener.start()
def on_release(key):
"""Stop listener on 'Esc' key press."""
if key == keyboard.Key.esc:
mouse_listener.stop()
keyboard_listener.stop()
return False
with keyboard.Listener(on_release=on_release) as listener:
listener.join()
# Save to file
with open(OUTPUT_FILENAME, 'w') as f:
json.dump(recorded_actions, f, indent=4)
print(f"\nRecording stopped. Actions saved to {OUTPUT_FILENAME}")
if __name__ == "__main__":
start_recording()
Step 3: Building the Action Replay Script
Now that we have our actions saved, we need a script to read that file and perform them.
The Logic: Reading and Executing Stored Commands
This is the reverse of our recorder:
1. Ask for the window title again to ensure we're acting on the right application.
2. Load the actions from our JSON file.
3. Loop through each action, honoring the recorded delay with time.sleep().
4. Use pyautogui to perform the action at the correct coordinates.
Loading the Actions from the JSON File
This is a simple file-reading operation in Python.
import json
with open("recorded_actions.json", 'r') as f:
actions = json.load(f)
Using PyAutoGUI to Simulate Mouse Movements and Clicks
For each "click" action, we'll get the target window's current corner and add the relative coordinates to find the absolute position to click.
# 'action' is a dictionary from our JSON file
# 'target_window' is our PyGetWindow object
abs_x = target_window.left + action['x']
abs_y = target_window.top + action['y']
pyautogui.moveTo(abs_x, abs_y, duration=0.1)
pyautogui.click(abs_x, abs_y)
Using PyAutoGUI to Simulate Keystrokes
For key presses, we just use pyautogui.write() for regular characters or pyautogui.press() for special keys.
Recreating Pauses for a Realistic Replay
This is absolutely vital. UIs need time to react. The time.sleep(action['time']) call is what makes our replay robust.
The Complete Replay Code
Here is the full script (replayer.py).
import pyautogui
import pygetwindow as gw
import json
import time
# --- Configuration ---
WINDOW_TITLE = "Notepad" # Must match the title used in the recorder
INPUT_FILENAME = "recorded_actions.json"
def replay_actions():
print(f"Looking for window with title: '{WINDOW_TITLE}'")
try:
target_window = gw.getWindowsWithTitle(WINDOW_TITLE)[0]
target_window.activate()
print("Window found! Starting replay in 3 seconds...")
time.sleep(3)
except IndexError:
print(f"Error: Window with title '{WINDOW_TITLE}' not found.")
return
with open(INPUT_FILENAME, 'r') as f:
actions = json.load(f)
for i, action in enumerate(actions):
print(f"Performing action {i+1}/{len(actions)}: {action['action']}")
# Wait for the recorded delay
time.sleep(action['time'])
if action['action'] == 'click' and action['pressed']:
abs_x = target_window.left + action['x']
abs_y = target_window.top + action['y']
pyautogui.moveTo(abs_x, abs_y, duration=0.1) # A small move duration looks more natural
pyautogui.click(x=abs_x, y=abs_y, button=action['button'].split('.')[-1])
elif action['action'] == 'press':
pyautogui.write(action['key'])
elif action['action'] == 'press_special':
# PyAutoGUI uses lowercase for special key names
key_name = action['key'].split('.')[-1]
pyautogui.press(key_name)
print("Replay finished!")
if __name__ == "__main__":
replay_actions()
Putting It All Together: A Practical Demonstration
Let's automate filling a simple form in Notepad.
Example Task: Automating Form Filling in a Desktop App
- Open Notepad. It will likely have the title "Untitled - Notepad".
- Change the
WINDOW_TITLEvariable in both scripts to"Untitled - Notepad".
Running the Recorder
- Run the recorder script from your terminal:
python recorder.py - It will find and focus Notepad. The terminal will say "Recording started."
- Click into the Notepad window and type your text.
- Press the
Esckey to stop the recording.
Performing the Task Manually
You've already done it! That was the recording phase.
Running the Replayer and Watching the Magic Happen
- Clear the text in Notepad.
- Run the replayer script:
python replayer.py - Switch back to Notepad quickly and watch the magic happen after 3 seconds.
Next Steps and Important Considerations
This is a powerful starting point, but let's be real about its strengths and weaknesses.
The PyAutoGUI Failsafe: Your Emergency Stop
THIS IS IMPORTANT. If your script goes haywire, you need a kill switch. PyAutoGUI has a built-in failsafe: quickly slam your mouse into the top-left corner of your screen to trigger an exception and stop the script.
Limitations of This Approach (and when to use Selenium or other tools)
This method is coordinate-based, which makes it brittle. If the window size changes, a button moves, or the resolution changes, the script will break. It's perfect for static, unchanging UIs.
For web automation, use a proper tool like Selenium that interacts with the web page's structure (DOM), not its visual layout.
Ideas for Improvement: Adding Image Recognition
To make the script more robust, you can replace hard-coded coordinates with image recognition. Instead of clicking coordinates, you can do pyautogui.click('submit_button.png'). This tells PyAutoGUI to find the image on the screen and click it, which is a game-changer for reliability.
I've previously debated whether new syntax additions are worth it, asking "Walrus Operator in Python Automation: Does It Sabotage Readability in Production Scripts?" For a simple script like this, clarity is key, but it's something to think about as your automations grow.
Conclusion
You now have the fundamental building blocks to create your own desktop automation. That tedious, soul-crushing task that made me start this journey? I could now automate it in 30 minutes.
The power here is not just in the code itself, but in changing your mindset. When you encounter a repetitive digital chore, a part of your brain will start thinking, "I can write a script for that." And that, my friends, is a superpower.
Recommended Watch
π¬ Thoughts? Share in the comments below!
Comments
Post a Comment