Responsibilities
- Perform data analysis: pull data from various sources (such as Business Objects, data warehouses, and other required data stores); consolidate, format, and validate it; produce reports; and, most importantly, carry out real-time and batch data analytics.
- Develop, build, test, and maintain tools for data analysis.
- Design, implement, and maintain high-performance big data infrastructure and processing pipelines that handle structured and unstructured data at the scale business requirements demand.
- Create web services and API-based interfaces that modularize capabilities, using standard tools and libraries.
- Develop prototypes and proofs of concept for selected solutions, and implement complex big data projects.
- Conduct timely and effective research in response to specific requests (e.g., data collection, analysis, available tooling, and data pipelines).
- Learn and adapt to new technologies, and build PoC applications with them.
- Work with business domain experts, data scientists and application developers to identify data relevant for analysis and to develop the overall solution.
- Write code for data ingestion workflows from multiple heterogeneous sources such as files, streams, and databases, and process the data with standard tools; proficiency in one or more of Hadoop, Spark, Flink, or Storm is highly desirable.
- Write clean, testable code for these purposes, mainly in Python.
Skills
- At least 2 years of experience in data analytics or data engineering workflows.
- Proficiency in Python; experience in one or more other languages is a strong plus.
- Strong experience with relational databases such as PostgreSQL/MySQL and NoSQL databases such as MongoDB/Couchbase.
- Experience running applications in the cloud, preferably on AWS or GCP.
- Experience with Apache Hadoop, Apache Spark, and Apache Kafka.
- Experience with scraping libraries such as Scrapy and web automation tools such as Selenium and Puppeteer.
- Experience with ETL tools or ETL workflows.
- Experience with version control platforms such as GitHub/GitLab.
- Good communication skills.