Overall 4-9 years of IT experience, including Java, Python … etc. • Minimum 2-3 years of experience in Spark (with Python, Java, or Scala). Candidates must have live project experience in Spark. • Good knowledge of GCP, Airflow, and Dataproc. 1. Core Python and Spark 2. Java and Spark 3. Python and PySpark
Looking for someone with at least 4-5 years of experience in the Big Data field and hands-on experience with PySpark, HDFS, Hive, Impala, shell scripting, SQL, HQL, and a scheduling tool such as Autosys or Airflow. This is a long-term project, and payment will be made on a monthly basis.
I have a server (Debian 10) running Docker containers for Airflow and Spark. Both are on the same network, and I have also installed the Spark provider in Airflow. However, I am not able to run a SparkSubmitOperator task in Airflow; it keeps failing with an error. I need somebody to look at the setup and identify the issue, or to suggest a better configuration.
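A common first check for a setup like this is whether the Airflow container can reach the Spark master over the shared Docker network at all. The following is only a minimal diagnostic sketch using the Python standard library; it does not reproduce the poster's configuration, and the helper name `can_reach` is hypothetical.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run from inside the Airflow container, a call such as `can_reach("spark-master", 7077)` would show whether the master is reachable. Both the host name `spark-master` and the port `7077` (the usual default for a standalone Spark master) are assumptions and should be replaced with the values from the actual setup. If the check fails, the problem is more likely in the Docker networking or the connection settings than in the SparkSubmitOperator task itself.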
Input: a tuple (id, term), where "id" is the document identifier and "term" is an already pre-processed word from the text. (Pseudocode/Python/PySpark/Spark)
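To make the input format concrete, here is a small plain-Python sketch that builds (id, term) tuples from documents whose text has already been pre-processed into terms. The helper name `to_pairs` and the sample documents are hypothetical; the posting does not say what should be computed from these tuples, so the sketch stops at the input.

```python
from typing import Iterable, List, Tuple


def to_pairs(doc_id: int, terms: Iterable[str]) -> List[Tuple[int, str]]:
    """Emit one (id, term) tuple per pre-processed term of a document."""
    return [(doc_id, term) for term in terms]


# Hypothetical sample data: two documents, already tokenized and cleaned.
pairs = to_pairs(1, ["spark", "python"]) + to_pairs(2, ["spark"])
```

In PySpark, a list like `pairs` could be turned into an RDD with `SparkContext.parallelize`, assuming a Spark context is available.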