12 January 2018
The idea of generating fake data that closely mimics real-life data came to me while I was working on an Arabic Natural Language Processing tool and struggling to test some of its features. We all know how much of a headache tests are for developers: they cost time and effort, whether we use them to check that our logic is correct or, as in TDD, to force ourselves to write correct logic in the first place.
At the time, I was struggling to identify the patterns in an Arabic sentence that would help me extract its metadata, such as which country it talks about, or which numbers are phone numbers and which should be treated as a bank balance. I then found Faker, a Python package that generates fake data. This data is reusable across applications and covers needs such as bootstrapping databases, creating good-looking XML documents, filling a persistence layer to stress test it, or anonymizing data taken from a production service. Creating an Arabic fake-data generator wasn’t an easy task when you take into consideration the 22 states in the Arab League. After extensive research, I was able to create generators for the following countries alongside the generic Arabic generators:
Using each country generator, you’ll be able to fake data specific to that country, such as license plates, names (especially family names), colors, companies, addresses, timezones, files, usernames, jobs, and social security numbers.
This is only the beginning. Many parts still need contributions, especially from people living in these countries, and a lot of work remains for the countries that are still missing. What I promise here is to keep improving these generators until they reach a level where every Arabic developer and researcher can benefit from them.
Still wondering how you can use this library in your project? It’s easy: just install it using pip, the Python package installer.
pip install Faker