Choosing Appium + Python for Bykea's Mobile Automation Stack
Context
Bykea had three mobile apps (customer Android, customer iOS, partner Android) with no automated UI testing. Manual regression took 3-4 days per release cycle. We needed an automation stack that could cover all three apps, run on CI, and be maintained by a team with mixed automation experience.
Decision
We chose Appium with Python as the primary language, running on BrowserStack for real-device coverage and integrated into Jenkins CI.
Alternatives Considered
Native tools (Espresso + XCUITest)
Pros:
- Best performance and reliability
- First-party support from Google/Apple
- Access to deep platform APIs
Cons:
- Two completely separate codebases for Android and iOS
- Java/Kotlin for Android, Swift/ObjC for iOS: multiple language contexts
- 2x maintenance burden for every test
Appium with Java
Pros:
- Strong ecosystem and community
- Good Cucumber/BDD integration
- Used widely in enterprise testing
Cons:
- Verbose syntax increases script maintenance overhead
- Slower iteration cycle for QA engineers
- Higher barrier for non-Java engineers to contribute
Appium with Python
Pros:
- Single codebase covers all three apps (cross-platform)
- Python's readability lowers the contribution barrier
- Fast scripting and iteration
- Strong data processing for test reports
- Team can onboard quickly
Cons:
- Slightly less mature Appium/Python ecosystem than Java's
- Type safety is less strictly enforced (mitigated with linting)
Flutter-specific testing tools (if we migrated to Flutter)
Pros:
- Best developer experience for Flutter apps
Cons:
- Would require full app migration first
- Not relevant to our current native stack
Reasoning
Cross-platform coverage with a single codebase was the decisive factor. Maintaining Espresso + XCUITest in parallel would double the automation maintenance burden — the opposite of what we needed. Python's readability meant the entire QA team could write and review tests, not just dedicated automation engineers. The trade-off of slightly less ecosystem maturity was worth the team-wide contribution benefit.
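To make the single-codebase point concrete, here is a minimal sketch of how one locator table can serve both Android and iOS. The element name, the package ID, and the accessibility ID below are hypothetical, not Bykea's real identifiers; the strategy strings ("id", "accessibility id") are the ones Appium uses for those locator strategies.

```python
# Locator strategy strings as used by Appium.
ID = "id"
ACCESSIBILITY_ID = "accessibility id"

# Hypothetical locator table: one logical element, one entry per platform.
LOCATORS = {
    "book_ride_button": {
        "android": (ID, "com.example.customer:id/book_ride"),
        "ios": (ACCESSIBILITY_ID, "bookRideButton"),
    },
}


def locator_for(name: str, platform: str) -> tuple[str, str]:
    """Return the (strategy, value) pair for an element on a given platform."""
    try:
        return LOCATORS[name][platform.lower()]
    except KeyError as exc:
        raise KeyError(f"No locator for {name!r} on {platform!r}") from exc
```

A test then asks for `locator_for("book_ride_button", platform)` and never branches on the platform itself, so the same scenario can run against all three apps.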
The Real Constraint: Team Leverage
The temptation was to choose the “best” tool technically. Espresso + XCUITest would give us the best test reliability. But “best for one engineer” is not the same as “best for a team.”
The question I asked: which stack lets the most engineers contribute to automation quality?
At Bykea, the QA team ranged from automation specialists to manual testers looking to grow into automation. Java would have meant only 2-3 engineers could contribute meaningfully. Python opened it up to everyone.
Why BrowserStack Over an Internal Device Lab
Real devices were non-negotiable. Emulators consistently missed:
- Gesture handling differences across Android manufacturers
- Memory pressure crashes under real-world conditions
- Network fluctuation behavior (crucial for a ride-hailing app in Pakistan)
- OS-level notification interactions
BrowserStack gave us access to the actual Samsung, Xiaomi, OnePlus, and Vivo devices that Bykea’s drivers and customers use — without the overhead of buying, charging, and managing a physical device lab.
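A device matrix like this can be expressed as plain data and turned into one capability set per device. The sketch below is illustrative only: the device models, OS versions, and app ID are placeholders, and the capability keys (in particular the `bstack:options` block) are my assumption of the usual W3C-style layout, so check them against BrowserStack's current documentation before relying on them.

```python
# Placeholder device matrix; models and OS versions are illustrative only.
DEVICES = [
    {"deviceName": "Samsung Galaxy S22", "osVersion": "12.0"},
    {"deviceName": "Xiaomi Redmi Note 11", "osVersion": "11.0"},
    {"deviceName": "OnePlus 9", "osVersion": "11.0"},
    {"deviceName": "Vivo Y21", "osVersion": "11.0"},
]


def build_capabilities(device: dict, app_id: str, build_name: str) -> dict:
    """Build one capability dict for an Android device in the matrix.

    Key names are assumptions to verify against the vendor documentation.
    """
    return {
        "platformName": "android",
        "appium:app": app_id,
        "appium:automationName": "UiAutomator2",
        "bstack:options": {
            "deviceName": device["deviceName"],
            "osVersion": device["osVersion"],
            "buildName": build_name,
        },
    }


def all_capabilities(app_id: str, build_name: str) -> list[dict]:
    """Expand the whole device matrix into a list of capability dicts."""
    return [build_capabilities(d, app_id, build_name) for d in DEVICES]
```

Keeping the matrix as data means adding or retiring a device is a one-line change, and a CI job can fan out one session per entry.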
The Architecture That Made It Work
test/
    pages/      # Page Object Model: one class per screen
    tests/      # Test scenarios organized by feature
    utils/      # Shared helpers, data generators
    config/     # Device and environment configurations
    reports/    # Allure report output
The Page Object Model was critical. When Bykea’s UI changed (and in a startup, it changes often), we updated the Page Object once — not every test that interacted with that screen.
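A minimal sketch of what one such Page Object might look like. The screen, locator values, and the fake driver are all hypothetical; in a real suite the driver would be an Appium session, which exposes the same `find_element(by, value)` call, with `click()` and `send_keys()` on the returned element.

```python
from typing import Protocol


class Element(Protocol):
    def click(self) -> None: ...
    def send_keys(self, text: str) -> None: ...


class Driver(Protocol):
    def find_element(self, by: str, value: str) -> Element: ...


ACCESSIBILITY_ID = "accessibility id"  # Appium's accessibility-ID strategy


class LoginPage:
    """Hypothetical login screen; locators live here and nowhere else."""

    PHONE_FIELD = (ACCESSIBILITY_ID, "phone_input")
    CONTINUE_BUTTON = (ACCESSIBILITY_ID, "continue_button")

    def __init__(self, driver: Driver) -> None:
        self.driver = driver

    def login(self, phone: str) -> None:
        # Tests call login(); only this class knows how the screen is built.
        self.driver.find_element(*self.PHONE_FIELD).send_keys(phone)
        self.driver.find_element(*self.CONTINUE_BUTTON).click()


class FakeElement:
    """Stand-in element that records interactions instead of touching a device."""

    def __init__(self, name: str, log: list) -> None:
        self.name, self.log = name, log

    def click(self) -> None:
        self.log.append(("click", self.name))

    def send_keys(self, text: str) -> None:
        self.log.append(("send_keys", self.name, text))


class FakeDriver:
    """Stand-in driver so the sketch runs without Appium installed."""

    def __init__(self) -> None:
        self.log: list = []

    def find_element(self, by: str, value: str) -> FakeElement:
        return FakeElement(value, self.log)
```

If the continue button's identifier changes, only `CONTINUE_BUTTON` is edited; every test that calls `LoginPage(driver).login(...)` stays as it is.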
What I’d Do Differently
Add visual regression testing earlier. Several bugs slipped through that were visually broken but functionally passing — correct data, wrong rendering. Tools like Percy or Applitools would have caught these. We added this capability later, but I wish we’d included it from the start.