面试题"/>
SQL经典面试题
留存率
定义:某一天新增用户在之后的第n天仍然登录的比例,称为第n日留存率
例如:20220101新增100人,
20220102这100人中登录了60人,次日留存率为60%
20220103这100人中登录了40人,第2日留存率为40%
计算次日留存率,第2日留存率
用户t_user表(用户id,用户名user_name,注册时间register_time)
用户登录t_user_login (自增id,用户名uid,登录时间login_time)
法一:多表连接
#先计算20220101的次日留存与第二日留存
select date(tu.register_time),100*count(distinct t1.uid)/count(distinct tu.id) rr1,
100*count(distinct t2.uid)/count(distinct tu.id) rr2
from t_user tu
left join t_user_login t1 on (t1.uid = tu.id and date(t1.login_time) = date(tu.register_time)+ interval '1' day)
left join t_user_login t2 on (t2.uid = tu.id and date(t2.login_time) = date(tu.register_time)+ interval '2' day)
where tu.register_time between '2022-01-01 00:00:00' and '2022-01-01 23:59:59'
#每一天的留存率(删掉where,加group by)
select date(tu.register_time),100*count(distinct t1.uid)/count(distinct tu.id) rr1,
100*count(distinct t2.uid)/count(distinct tu.id) rr2
from t_user tu
left join t_user_login t1 on (t1.uid = tu.id and date(t1.login_time) = date(tu.register_time)+ interval '1' day)
left join t_user_login t2 on (t2.uid = tu.id and date(t2.login_time) = date(tu.register_time)+ interval '2' day)
group by date(tu.register_time)
法二:窗口函数
join多次,效率太低,用窗口函数进行优化
from t_user tu
left join t_user_login t1 on (t1.uid = tu.id and date(t1.login_time) = date(tu.register_time)+ interval '1' day)
left join t_user_login t2 on (t2.uid = tu.id and date(t2.login_time) = date(tu.register_time)+ interval '2' day)#两次join可以优化为:
from t_user tu
left join t_user_login t1 on (t1.uid = tu.id and (date(t1.login_time) = date(tu.register_time)+ interval '1' day
or date(t2.login_time) = date(tu.register_time)+ interval '2' day)
)
最终代码
#两次dense_rank()是为了找注册人数,登录人数
with t1 as (
select tu.id,t1.uid,date(tu.register_time) reg_date,date(t1.login_time) log_date,
dense_rank()over(partition by date(tu.register_time) order by tu.id) daily_reg,
dense_rank()over(partition by date(tu.register_time),date(t1.login_time) order by tu.id) daily_loginfrom t_user tu
left join t_user_login t1 on (t1.uid = tu.id and (date(t1.login_time) = date(tu.register_time)+ interval '1' day
or date(t2.login_time) = date(tu.register_time)+ interval '2' day))
)t2 as (
select reg_date,login_date,max(daily_reg),max(daily_login)
from t1
group by reg_date,login_date
)
select reg_date,max(daily_reg),
100*max(case when login_date = reg_date +interval '1' day then daily_login end)/max(daily_reg) rr1
from t2
group by reg_date;
更多推荐
SQL经典面试题
发布评论