MySQL去除重复数据,GROUPING用法简介及说明

2019-12-14 21:12 来源:未知

转自: http://www.maomao365.com/?p=6208  

今天遇到一个需要对表进行去重的问题,数据量大概千万左右,第一选择就是按Oracle的思路上:

某Slave报错信息:

摘要:
GROUPING 用于区分列是否由 ROLLUP、CUBE 或 GROUPING SETS 聚合而产生的行
如果是原生态的行聚合,则返回0 ,新增的行数据就返回1

delete from table t1 where id < (select max(id) from table t2 where t1.c1=t2.c1);  --将c1值相同的记录进行去重,只留下id最大的,写成id>min(id)效果相同。
mysql> show slave statusG;

以上相关子查询的SQL在c1上存在索引时效率不算低,但是很遗憾MySQL没有这种写法,类似的替代写法在MySQL中效率也低的令人发指,如中间表等手段。

图片 1图片 2

grouping 语法简介 :
GROUPING (<列名>)
参数列名:

返回值
tinyint
<hr />
grouping 应用举例:  

正好在前些时间整理一些shell脚本时处理过mysql导入时出错继续执行的问题,因此测试后采用了如下办法:

mysql> show slave statusG;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.206.140
                  Master_User: u_repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 499
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 367
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1007
                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '9e2c7c0f-0908-11e7-8230-000c29ab7544:1' at master log mysql-bin.000001, end_log_pos 313. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 154
              Relay_Log_Space: 1513
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1007
               Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '9e2c7c0f-0908-11e7-8230-000c29ab7544:1' at master log mysql-bin.000001, end_log_pos 313. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 140
                  Master_UUID: 9e2c7c0f-0908-11e7-8230-000c29ab7544
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 170316 04:25:29
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 9e2c7c0f-0908-11e7-8230-000c29ab7544:1-2
            Executed_Gtid_Set: 347cbac6-0906-11e7-b957-000c2981a46e:1,
c59a2526-08fd-11e7-a5c7-000c296f2953:1-2
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

ERROR: 
No query specified
create table test(info varchar(30))
go
insert into test (info)values('a'),
('b'),('a'),('c'),('d'),('d') 
go

select info,count_big(info),grouping(info)
from test group by info 
WITH ROLLUP

go
drop table test 
----输出----
ifno 无列名 无列名
a    2    0
b    1    0
c    1    0
d    2    0
NULL    6    1

1.将表数据导出:

View Code

 

mysqldump -uroot -p --skip-extended-insert -t DBNAME TABLE>TABLE.sql

然后记一下去重后的记录数:
select count(*) from (select 1 from TABLE group by c1) a;

GTID的复制对于错误信息的可读性不是很好,但可以通过错误代码(1007)从监控表replication_applier_status_by_worker查看:

2.truncate表,然后创建唯一索引

mysql> select * from performance_schema.replication_applier_status_by_worker where LAST_ERROR_NUMBER=1007G
TAG标签:
版权声明:本文由澳门金莎娱乐网站发布于数据库,转载请注明出处:MySQL去除重复数据,GROUPING用法简介及说明