
Advanced Processing: Merging

If your data is spread across several tables, you will sometimes need to combine them before you can analyze them together.

1 Combining data with pd.concat

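  • pd.concat([data1, data2], axis=1)
    • Concatenates along rows or columns: axis=0 (the default) stacks the frames vertically, aligning them on their column labels; axis=1 places them side by side, aligning them on their row index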

For example, we can combine the one-hot encoding we just produced with the original data:

```python
import pandas as pd

# Join along the row index: the one-hot columns are placed beside the original columns
pd.concat([data, dummies], axis=1)
```
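A minimal sketch of how `axis` changes the result, using two small made-up frames (`df1` and `df2` are hypothetical stand-ins, not the `data` and `dummies` from the previous step):

```python
import pandas as pd

# Two small hypothetical frames sharing the same default row index (0, 1)
df1 = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df2 = pd.DataFrame({"z": [5, 6]})

# axis=0 (the default): stack rows vertically, aligning on column labels.
# Columns missing from one frame ("z" in df1, "x"/"y" in df2) are filled with NaN.
stacked = pd.concat([df1, df2], axis=0)

# axis=1: place the frames side by side, aligning on the row index
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked.shape)       # (4, 3)
print(side_by_side.shape)  # (2, 3)
```

Because axis=1 matches rows by index label, it suits the one-hot case, assuming `dummies` was built from `data` and therefore shares its row index.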

2 pd.merge

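  • pd.merge(left, right, how='inner', on=None)
    • Joins two datasets on key columns they share, or on separately named keys for the left and right sides (left_on / right_on)
    • left: the left DataFrame
    • right: the other DataFrame
    • on: the common key column(s); if None, pandas uses the columns the two frames have in common
    • how: the join type (default 'inner'), one of the following: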
| Merge method | SQL join name | Description |
|---|---|---|
| left | LEFT OUTER JOIN | Use keys from the left frame only |
| right | RIGHT OUTER JOIN | Use keys from the right frame only |
| outer | FULL OUTER JOIN | Use the union of keys from both frames |
| inner | INNER JOIN | Use the intersection of keys from both frames |
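When the key columns have different names on each side, `left_on` and `right_on` (both real `pd.merge` parameters) name them separately. The tables here (`orders`, `customers`) are hypothetical, a sketch of the idea rather than data from this tutorial:

```python
import pandas as pd

# Hypothetical tables: the customer key is "cust_id" in one and "id" in the other
orders = pd.DataFrame({"cust_id": [1, 2, 2, 3], "amount": [10, 20, 30, 40]})
customers = pd.DataFrame({"id": [1, 2, 4], "name": ["Ann", "Bob", "Dan"]})

# Inner join (the default): keep only keys present on both sides.
# cust_id 3 and id 4 have no partner, so they are dropped.
merged = pd.merge(orders, customers, left_on="cust_id", right_on="id")

# Left join: keep every order; the unmatched order (cust_id 3) gets NaN for id and name
all_orders = pd.merge(orders, customers, how="left", left_on="cust_id", right_on="id")

print(merged)
print(all_orders)
```

Both key columns are kept in the output; drop one (for example with `merged.drop(columns="id")`) if you do not need it.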

2.1 Merging with pd.merge

```python
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# Inner join by default: keep only (key1, key2) pairs present in both frames
result = pd.merge(left, right, on=['key1', 'key2'])
```
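To see exactly what the default inner join keeps, here is the same example as a self-contained script. The expected rows in the comments were worked out by hand from the two frames:

```python
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

result = pd.merge(left, right, on=['key1', 'key2'])

# Only (K0, K0) and (K1, K0) occur in both frames. (K1, K0) matches two rows
# of `right`, so the single left row A2/B2 is repeated once per match.
# Expected rows (key1, key2, A, B, C, D):
#   K0 K0 A0 B0 C0 D0
#   K1 K0 A2 B2 C1 D1
#   K1 K0 A2 B2 C2 D2
print(result)
```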
  • Left join

```python
result = pd.merge(left, right, how='left', on=['key1', 'key2'])
```
  • Right join

```python
result = pd.merge(left, right, how='right', on=['key1', 'key2'])
```
  • Outer join

```python
result = pd.merge(left, right, how='outer', on=['key1', 'key2'])
```
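The left, right and outer joins differ only in which unmatched keys they keep. A sketch that reruns them on the same frames with `indicator=True`, a real `pd.merge` parameter that adds a `_merge` column recording whether each row's key was found in `both` frames, `left_only`, or `right_only`:

```python
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# One merged frame per join type, each with an extra "_merge" column
results = {
    how: pd.merge(left, right, how=how, on=['key1', 'key2'], indicator=True)
    for how in ['left', 'right', 'outer']
}

for how, result in results.items():
    # Count the rows whose key came from both frames or from only one side
    print(how, len(result), result['_merge'].value_counts().to_dict())
```

Working through the keys by hand: the left join has 5 rows (the 3 matched rows plus (K0, K1) and (K2, K1) with NaN in C and D), the right join has 4 (the 3 matched rows plus (K2, K0) with NaN in A and B), and the outer join has all 6.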