JOIN两张分布式表,不理解“Block structure mismatch in UNION stream”
发布于 6 个月前 作者 kikanjuu 715 次浏览 来自 问答

我有以下SQL,用all inner join关联两个子查询。client_log_all和traint_all都是分布式表。

select
        city_name,
        count(distinct device_id) as cnt_num
from
(
select
distinct deviceId as device_id,
srcPage		-- 去掉这行,并且保留后面的where srcPage = 'album',则报错。为什么?
from client_log_all where srcPage = 'album'
)
all inner join
(
select
device_id,
city_name
from trait_all
) using device_id
group by city_name

对于其中第一个子查询,如注释里说的: case1.sql: 如果其WHERE条件包含srcPage = ‘album’,并且结果集中也包含了srcPage列,则查询成功。 case2.sql: 如果其WHERE条件里去掉srcPage = ‘album’,并且结果集中不包含srcPage列,查询也成功。 case3.sql: 如果其WHERE条件里包含srcPage = ‘album’,但是结果集中不包含srcPage列,查询报错如下:

Code: 171. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::Exception: Block structure mismatch in UNION stream: different number of columns: deviceId String String(size = 0), srcPage String String(size = 0) deviceId String String(size = 0). " 无论第一个子查询的结果是否有srcPage列,每个节点返回的结果集格式应该是相同的。case3里,为什么会有的节点的结果集包含srcPage,有的节点的结果集不包含srcPage,导致报错?

如果case3.sql关联的是两张本地表,而不是两张分布式表,则没有这个报错。

执行结果截屏: image.png

image.png

image.png

我在回复中附上完整的三个SQL。

2 回复

case1.sql:

select
        city_name,
        count(distinct device_id) as cnt_num
from
(
select
distinct deviceId as device_id,
srcPage
from client_log_all where srcPage = 'album'
)
all inner join
(
select
device_id,
city_name
from trait_all
) using device_id
group by city_name

case2.sql:

select
        city_name,
        count(distinct device_id) as cnt_num
from
(
select
distinct deviceId as device_id
from client_log_all
)
all inner join
(
select
device_id,
city_name
from trait_all
) using device_id
group by city_name

case3.sql:

select
        city_name,
        count(distinct device_id) as cnt_num
from
(
select
distinct deviceId as device_id
from client_log_all where srcPage = 'album'
)
all inner join
(
select
device_id,
city_name
from trait_all
) using device_id
group by city_name

楼主有解决这个问题吗,我也经常遇到。

select field1
from test.distributed_table
where thedate=today()
limit 1

报错:

Code: 171, e.displayText() = DB::Exception: Block structure mismatch in UNION stream: different number of columns: thedate Date UInt16(size = 0), member_id Int32 Int32(size = 0) member_id Int32 Int32(size = 0), e.what() = DB::Exception

distributed_table 是分布式表,按 thedate 分区

select field1
from test.distributed_table
where thedate=today()
and 1=1
limit 1

这样又可以

回到顶部