To access MaxCompute tables, you must first compile the datasource package; for detailed steps, see Set up a Linux development environment. Sample Spark SQL applications are available for Spark 1.6, Spark 2.3, and Spark 2.4.

Submit and run the job. In the following command, mc_pyspark-0.1.0-py3-none-any.zip contains the shared business-logic code:
```
spark-submit --py-files mc_pyspark-0.1.0-py3-none-any.zip spark-test.py
```

Submitting from a DataWorks Spark node:

1. Create a Spark node. For details, see Develop an ODPS Spark task.
2. Upload the resource, then submit and run:
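What `--py-files` does, in essence, is place the archive on `sys.path` of the driver and each executor so that its modules become importable. That mechanism can be sketched with the standard library alone; the archive name and the `business_logic` module below are illustrative stand-ins, not the real package:

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny stand-in for a --py-files archive:
# a zip containing one module of "business logic".
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "mc_pyspark_demo.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("business_logic.py", "def transform(x):\n    return x * 2\n")

# spark-submit --py-files effectively does this on the driver and
# every executor: the zip goes onto sys.path, so Python's zip
# importer can load modules straight out of it.
sys.path.insert(0, zip_path)

import business_logic
print(business_logic.transform(21))  # → 42
```

This is also why the package must keep a `.py`-importable layout inside the archive: the zip importer loads modules, it does not unpack native build artifacts.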
```
# Rename the business-logic package so that its suffix is .zip
cp /Users/xxx/PycharmProjects/mc-pyspark/dist/mc_pyspark-0.1.0-py3-none-any.whl /Users/xxx/PycharmProjects/mc-pyspark/dist/mc_pyspark-0.1.0-py3-none-any.zip
# Add it to the MaxCompute resources in odpscmd
add archive /Users/xxx/PycharmProjects/mc-pyspark/dist/mc_pyspark-0.1.0-py3-none-any.zip -f;
```

Configure and run the task.

Configure the Package dependencies. A Python 3.7.9 environment is provided by default; enable it with the following configuration:

```
spark.hadoop.odps.cupid.resources = public.python-3.7.9-ucs4.tar.gz
spark.pyspark.python = ./public.python-3.7.9-ucs4.tar.gz/python-3.7.9-ucs4/bin/python3
```

You can also supply the Python environment yourself: upload a single WHEEL package, package the environment with a one-click script, or package the Python environment in a Docker container.

To reference user-defined Python packages, set the parameter spark.executorEnv.PYTHONPATH=. so that the working directory is on the executor's Python path. Once the steps above are complete, the main Python file can import the Python files in that directory.
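The effect of `spark.executorEnv.PYTHONPATH=.` can be sketched locally: it is equivalent to exporting `PYTHONPATH=.` before the interpreter starts, which prepends the working directory to `sys.path` so the main file can import sibling files. The `helpers.py`/`main.py` layout below is a hypothetical stand-in for files unpacked from the archive resource:

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical working directory: a user module sits next to the
# main Python file, as it would after the archive is unpacked.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "helpers.py"), "w") as f:
    f.write("GREETING = 'hello from helpers'\n")
with open(os.path.join(workdir, "main.py"), "w") as f:
    f.write("import helpers\nprint(helpers.GREETING)\n")

# PYTHONPATH=. puts the working directory on sys.path, which is
# what spark.executorEnv.PYTHONPATH=. arranges on each executor.
env = dict(os.environ, PYTHONPATH=".")
out = subprocess.run(
    [sys.executable, "main.py"], cwd=workdir, env=env,
    capture_output=True, text=True, check=True,
)
print(out.stdout.strip())  # → hello from helpers
```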