SparkPython操作命令三-快上网网站建设公司

SparkPython操作命令三

12 数据格式

10年积累的成都网站设计、网站建设经验，可以快速应对客户对网站的新想法和需求。提供各种问题对应的解决方案。让选择我们的客户得到更好、更有力的网络服务。我虽然不认识你，你也不认识我。但先网站制作后付款的网站建设流程，更有沾化免费网站建设让你可以放心的选择与我们合作。

[[u'3', u'5'], [u'4', u'6'], [u'4', u'5'], [u'4', u'2']] 拆分或截取的原始数据，可以通过 map 中的 x[0], x[1] 来获取对应列的数据
可以通过 map 来转换为key-value 数据格式例如： df3 = df2.map(lambda x: (x[0], x[1]))
key-value 数据格式
[(u'3', u'5'), (u'4', u'6'), (u'4', u'5'), (u'4', u'2')] 中每一个（）表示一组数据，第一个表示key 第二个表示value
3）PipelinedRDD 类型表示 key-value形式数据
13 RDD类型转换
userRdd = sc.textFile("D:\data\people.json")
userRdd = userRdd.map(lambda x: x.split(" "))
```
userRows = userRdd.map(lambda p:
                        Row(
                            userName = p[0],
                            userAge = int(p[1]),
                            userAdd = p[2],
                            userSalary = int(p[3])
                            )
                        )
print(userRows.take(4))
```
结果： [Row(userAdd='shanghai', userAge=20, userName='zhangsan', userSalary=13), Row(userAdd='beijin', userAge=30, userName='lisi', userSalary=15)]
2）创建 DataFrame
userDF = sqlContext.createDataFrame(userRows)
1. 通过sql 语句查询字段
  from pyspark.conf import SparkConf
  from pyspark.sql.session import SparkSession
  from pyspark.sql.types import Row

if name== 'main':
spark = SparkSession.builder.config(conf = SparkConf()).getOrCreate()

sc = spark.sparkContext

rd = sc.textFile("D:\data\people.txt")
rd2 = rd.map(lambda x:x.split(","))
people = rd2.map(lambda p: Row(name=p[0], age=int(p[1])))

peopleDF = spark.createDataFrame(people)
peopleDF.createOrReplaceTempView("people")
teenagers = spark.sql("SELECT name,age FROM people where name='Andy'")
teenagers.show(5)

print(teenagers.rdd.collect())

teenNames = teenagers.rdd.map(lambda p: 100 + p.age).collect()

for name in teenNames:
    print(name)

15 dateFrame,sql,json使用详细示例
#

Licensed to the Apache Software Foundation (ASF) under one or more

contributor license agreements. See the NOTICE file distributed with

this work for additional information regarding copyright ownership.

The ASF licenses this file to You under the Apache License, Version 2.0

(the "License"); you may not use this file except in compliance with

the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License.

"""
A simple example demonstrating basic Spark SQL features.
Run with:
./bin/spark-submit examples/src/main/python/sql/basic.py
"""
from futureimport print_function

$example on:init_session$

from pyspark.sql import SparkSession

$example off:init_session$

$example on:schema_inferring$

from pyspark.sql import Row

$example off:schema_inferring$

$example on:programmatic_schema$

Import data types

from pyspark.sql.types import *

$example off:programmatic_schema$

def basic_df_example(spark):

$example on:create_df$

# spark is an existing SparkSession
df = spark.read.json("/data/people.json")
# Displays the content of the DataFrame to stdout
df.show()
# +----+-------+
# | age|   name|
# +----+-------+
# |null|Michael|
# |  30|   Andy|
# |  19| Justin|
# +----+-------+
# $example off:create_df$

# $example on:untyped_ops$
# spark, df are from the previous example
# Print the schema in a tree format
df.printSchema()
# root
# |-- age: long (nullable = true)
# |-- name: string (nullable = true)

# Select only the "name" column
df.select("name").show()
# +-------+
# |   name|
# +-------+
# |Michael|
# |   Andy|
# | Justin|
# +-------+

# Select everybody, but increment the age by 1
df.select(df['name'], df['age'] + 1).show()
# +-------+---------+
# |   name|(age + 1)|
# +-------+---------+
# |Michael|     null|
# |   Andy|       31|
# | Justin|       20|
# +-------+---------+

# Select people older than 21
df.filter(df['age'] > 21).show()
# +---+----+
# |age|name|
# +---+----+
# | 30|Andy|
# +---+----+

# Count people by age
df.groupBy("age").count().show()
# +----+-----+
# | age|count|
# +----+-----+
# |  19|    1|
# |null|    1|
# |  30|    1|
# +----+-----+
# $example off:untyped_ops$

# $example on:run_sql$
# Register the DataFrame as a SQL temporary view
df.createOrReplaceTempView("people")

sqlDF = spark.sql("SELECT * FROM people")
sqlDF.show()
# +----+-------+
# | age|   name|
# +----+-------+
# |null|Michael|
# |  30|   Andy|
# |  19| Justin|
# +----+-------+
# $example off:run_sql$

# $example on:global_temp_view$
# Register the DataFrame as a global temporary view
df.createGlobalTempView("people")

# Global temporary view is tied to a system preserved database `global_temp`
spark.sql("SELECT * FROM global_temp.people").show()
# +----+-------+
# | age|   name|
# +----+-------+
# |null|Michael|
# |  30|   Andy|
# |  19| Justin|
# +----+-------+

# Global temporary view is cross-session
spark.newSession().sql("SELECT * FROM global_temp.people").show()
# +----+-------+
# | age|   name|
# +----+-------+
# |null|Michael|
# |  30|   Andy|
# |  19| Justin|
# +----+-------+
# $example off:global_temp_view$

def schema_inference_example(spark):

$example on:schema_inferring$

sc = spark.sparkContext

# Load a text file and convert each line to a Row.
lines = sc.textFile("examples/src/main/resources/people.txt")
parts = lines.map(lambda l: l.split(","))
people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))

# Infer the schema, and register the DataFrame as a table.
schemaPeople = spark.createDataFrame(people)
schemaPeople.createOrReplaceTempView("people")

# SQL can be run over DataFrames that have been registered as a table.
teenagers = spark.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")

# The results of SQL queries are Dataframe objects.
# rdd returns the content as an :class:`pyspark.RDD` of :class:`Row`.
teenNames = teenagers.rdd.map(lambda p: "Name: " + p.name).collect()
for name in teenNames:
    print(name)
# Name: Justin
# $example off:schema_inferring$

def programmatic_schema_example(spark):

$example on:programmatic_schema$

sc = spark.sparkContext

# Load a text file and convert each line to a Row.
lines = sc.textFile("examples/src/main/resources/people.txt")
parts = lines.map(lambda l: l.split(","))
# Each line is converted to a tuple.
people = parts.map(lambda p: (p[0], p[1].strip()))

# The schema is encoded in a string.
schemaString = "name age"

fields = [StructField(field_name, StringType(), True) for field_name in schemaString.split()]
schema = StructType(fields)

# Apply the schema to the RDD.
schemaPeople = spark.createDataFrame(people, schema)

# Creates a temporary view using the DataFrame
schemaPeople.createOrReplaceTempView("people")

# SQL can be run over DataFrames that have been registered as a table.
results = spark.sql("SELECT name FROM people")

results.show()
# +-------+
# |   name|
# +-------+
# |Michael|
# |   Andy|
# | Justin|
# +-------+
# $example off:programmatic_schema$

if name== "main":

$example on:init_session$

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()
# $example off:init_session$

basic_df_example(spark)
# schema_inference_example(spark)
# programmatic_schema_example(spark)

spark.stop()

标题名称：SparkPython操作命令三
分享地址：http://cdkjz.cn/article/ipsjes.html

多年建站经验

多一份参考，总有益处

联系快上网，免费获得专属《策划方案》及报价

咨询相关问题或预约面谈，可以通过以下方式与我们联系

网站建设

网站推广

案例

方案

电商网站开发

微信小程序

我们

联系

精准传达 • 有效沟通

查看其它板块

SparkPython操作命令三

print(teenagers.rdd.collect())

Licensed to the Apache Software Foundation (ASF) under one or more

contributor license agreements. See the NOTICE file distributed with

this work for additional information regarding copyright ownership.

The ASF licenses this file to You under the Apache License, Version 2.0

(the "License"); you may not use this file except in compliance with

the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License.

$example on:init_session$

$example off:init_session$

$example on:schema_inferring$

$example off:schema_inferring$

$example on:programmatic_schema$

Import data types

$example off:programmatic_schema$

$example on:create_df$

$example on:schema_inferring$

$example on:programmatic_schema$

$example on:init_session$

多一份参考，总有益处

联系快上网，免费获得专属《策划方案》及报价

大客户专线成都：13518219792 座机：028-86922220

友情链接交换友情链接

网络推广

Network promotion

网站方案

Solution

电商网站开发

E-commerce & System

我们

About Us

联系

Contact Us

精准传达 • 有效沟通

查看其它板块

SparkPython操作命令三

print(teenagers.rdd.collect())

Licensed to the Apache Software Foundation (ASF) under one or more

contributor license agreements. See the NOTICE file distributed with

this work for additional information regarding copyright ownership.

The ASF licenses this file to You under the Apache License, Version 2.0

(the "License"); you may not use this file except in compliance with

the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License.

$example on:init_session$

$example off:init_session$

$example on:schema_inferring$

$example off:schema_inferring$

$example on:programmatic_schema$

Import data types

$example off:programmatic_schema$

$example on:create_df$

$example on:schema_inferring$

$example on:programmatic_schema$

$example on:init_session$

相关资讯

互联网营销效果不理想有什么原因

代运营公司能为你的企业带来什么,代运营公司的优势和介绍内容

莆田短视频运营招聘要求

短视频媒体运营招聘

如何选择靠谱的直播代运营公司,直播代运营公司排名及口碑介绍

零基础剪映教学短视频剪辑运营课,剪映视频制作教程

唐山抖音代运营团队招聘

抖音代运营哪家公司好

多一份参考，总有益处

联系快上网，免费获得专属《策划方案》及报价

大客户专线 成都：13518219792 座机：028-86922220

友情链接 交换友情链接

大客户专线成都：13518219792 座机：028-86922220

友情链接交换友情链接