Choosing Appium + Python for Bykea's Mobile Automation Stack - Decision Record

Context

Bykea had three mobile apps (customer Android, customer iOS, partner Android) and no automated UI tests. Manual regression took 3-4 days per release cycle. We needed an automation stack that could cover all three apps, run on CI, and be maintained by a team with mixed automation experience.

Alternatives Considered

Native tools (Espresso + XCUITest)

Pros

Best performance and reliability
First-party support from Google/Apple
Access to deep platform APIs

Cons

Two completely separate codebases for Android and iOS
Java/Kotlin for Android, Swift/ObjC for iOS — multiple language contexts
2x maintenance burden for every test

Appium with Java

Pros

Strong ecosystem and community
Good Cucumber/BDD integration
Used widely in enterprise testing

Cons

Verbose syntax increases script maintenance overhead
Slower iteration cycle for QA engineers
Higher barrier for non-Java engineers to contribute

Appium with Python

Pros

Single codebase covers all three apps (cross-platform)
Python's readability lowers the barrier to contributing
Fast scripting and iteration
Strong data processing for test reports
Team can onboard quickly

Cons

Appium/Python ecosystem is slightly less mature than Appium/Java
Type safety is less strictly enforced (mitigated with linting)

Flutter-specific testing tools (if we migrated to Flutter)

Pros

Best developer experience for Flutter apps

Cons

Would require full app migration first
Not relevant to our current native stack

Reasoning

Cross-platform coverage with a single codebase was the decisive factor. Maintaining Espresso + XCUITest in parallel would have doubled the automation maintenance burden, the opposite of what we needed. Python's readability meant the entire QA team could write and review tests, not just dedicated automation engineers. A slightly less mature ecosystem was a price worth paying for team-wide contribution.
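
To make "single codebase" concrete, here is a minimal sketch of one way platform differences can be confined to a locator table, so the test logic itself stays shared. The element name, resource ID, and accessibility ID are hypothetical and not taken from Bykea's apps; the strategy strings "id" and "accessibility id" are the values the Appium Python client uses for AppiumBy.ID and AppiumBy.ACCESSIBILITY_ID.

```python
# Hypothetical locator table: one logical element, one entry per platform.
LOCATORS = {
    "android": {"book_ride": ("id", "com.example.app:id/book_ride")},
    "ios": {"book_ride": ("accessibility id", "bookRideButton")},
}


def locator_for(platform, element_name):
    """Return the (strategy, value) pair for an element on the given platform."""
    try:
        return LOCATORS[platform.lower()][element_name]
    except KeyError:
        raise KeyError(
            f"No locator for {element_name!r} on {platform!r}"
        ) from None


# The same test code can then run against either app, for example:
# driver.find_element(*locator_for(platform, "book_ride")).click()
```

With this shape, adding a platform means adding a table entry rather than a second test suite.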

The Real Constraint: Team Leverage

The temptation was to choose the “best” tool technically. Espresso + XCUITest would give us the best test reliability. But “best for one engineer” is not the same as “best for a team.”

The question I asked: which stack lets the most engineers contribute to automation quality?

At Bykea, the QA team ranged from automation specialists to manual testers looking to grow into automation. Java would have meant only 2-3 engineers could contribute meaningfully. Python opened it up to everyone.

Why BrowserStack Over an Internal Device Lab

Real devices were non-negotiable. Emulators consistently missed:

Gesture handling differences across Android manufacturers
Memory pressure crashes under real-world conditions
Network fluctuation behavior (crucial for a ride-hailing app in Pakistan)
OS-level notification interactions

BrowserStack gave us access to the actual Samsung, Xiaomi, OnePlus, and Vivo devices that Bykea’s drivers and customers use — without the overhead of buying, charging, and managing a physical device lab.
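
As a rough illustration of how a device matrix might be expressed in config/, the sketch below builds one capabilities dictionary per device. The device names, OS versions, build name, and environment variable names are assumptions for illustration only. The capability keys follow BrowserStack's W3C-style format as I understand it, so check them against the current BrowserStack documentation before relying on them.

```python
import os

# Hypothetical device matrix; the real list would come from usage analytics.
DEVICES = [
    {"deviceName": "Samsung Galaxy S22", "platformVersion": "12.0"},
    {"deviceName": "Xiaomi Redmi Note 11", "platformVersion": "11.0"},
]


def build_capabilities(device, app_url, build_name):
    """Build one capabilities dict for a single device in the matrix."""
    return {
        "platformName": "android",
        "appium:deviceName": device["deviceName"],
        "appium:platformVersion": device["platformVersion"],
        "appium:app": app_url,  # e.g. the bs:// URL returned after an app upload
        "bstack:options": {
            # Credentials are read from the environment, never hard-coded.
            "userName": os.environ.get("BROWSERSTACK_USERNAME", ""),
            "accessKey": os.environ.get("BROWSERSTACK_ACCESS_KEY", ""),
            "buildName": build_name,
        },
    }


def build_matrix(app_url, build_name):
    """Return capabilities for every device, one session per entry."""
    return [build_capabilities(d, app_url, build_name) for d in DEVICES]
```

Each dictionary would then be passed to an Appium driver session, one per device, so the same test suite runs across the whole matrix.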

The Architecture That Made It Work

test/
  pages/          # Page Object Model — one class per screen
  tests/          # Test scenarios organized by feature
  utils/          # Shared helpers, data generators
  config/         # Device and environment configurations
  reports/        # Allure report output

The Page Object Model was critical. When Bykea’s UI changed (and in a startup, it changes often), we updated the Page Object once — not every test that interacted with that screen.
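
A minimal sketch of the pattern, assuming a hypothetical login screen: the screen name, locator values, and method names are invented for illustration and are not Bykea's actual code. The page class only needs a driver exposing find_element(by, value), which is the shape of the Appium/Selenium driver call. FakeDriver is a stand-in so the sketch runs without a device.

```python
ACCESSIBILITY_ID = "accessibility id"  # value of AppiumBy.ACCESSIBILITY_ID


class LoginPage:
    """Page Object for a hypothetical login screen."""

    # Locators live in one place; a UI change means editing only these lines.
    PHONE_FIELD = (ACCESSIBILITY_ID, "phone_input")
    SUBMIT_BUTTON = (ACCESSIBILITY_ID, "login_submit")

    def __init__(self, driver):
        self.driver = driver

    def login(self, phone_number):
        """Enter the phone number and submit the form."""
        self.driver.find_element(*self.PHONE_FIELD).send_keys(phone_number)
        self.driver.find_element(*self.SUBMIT_BUTTON).click()


class FakeElement:
    """Stand-in for a WebElement that records the actions performed on it."""

    def __init__(self, log, name):
        self.log = log
        self.name = name

    def send_keys(self, text):
        self.log.append((self.name, "send_keys", text))

    def click(self):
        self.log.append((self.name, "click"))


class FakeDriver:
    """Stand-in for an Appium driver; returns FakeElement for any locator."""

    def __init__(self):
        self.log = []

    def find_element(self, by, value):
        return FakeElement(self.log, value)


# A test talks to the page, never to raw locators.
driver = FakeDriver()
LoginPage(driver).login("03001234567")
```

If the submit button's identifier changes, only SUBMIT_BUTTON is edited; every test that calls login() keeps working unchanged.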

What I’d Do Differently

Add visual regression testing earlier. Several bugs slipped through that were visually broken but functionally passing — correct data, wrong rendering. Tools like Percy or Applitools would have caught these. We added this capability later, but I wish we’d included it from the start.