1、导入数据
方法 1 直接导入 CSV 数据,不用建表做关联,如下图
方法 2 建表做关联后导入数据
建表,订单表(订单编号,用户 ID,付款情况,价格,订单付款时间)
create table orderform( orderid varchar(20) null, userid varchar(10) null, paid varchar(10) null, price varchar(10) null, payment_time datetime null)
建表,客户信息表
create table userlist( userid varchar(10) null,gender varchar(5) null,birth datetime null)
1.1 需要更新下列名,更新列名的语句如下,不过还是表设计里面改起来最快
exec sp_rename 'order_info_utf.column1','order_info_utf.orderid'
2、用户消费行为分析相关
有关用户消费行为的统计指标,主要的分析内容主要是:
销量和消费金额分析、消费的时间节点、复购率回购率分析、用户分层、用户生命周期和消费周期等,具体可以看看
2.1 可以统计不同月份订单数、用户数、下单总金额
select convert(varchar(6),payment_time,112) as year_month, count(orderid) as order_num , count(distinct userid) as user_num,sum(price) as sum from order_info_utf where paid='已支付' group by convert(varchar(6),payment_time,112) order by year_month
2.2 统计用户复购率、回购率(这两个不一样的)
a. 复购率 当月消费一次以上(叫复购)人数的占比情况(消费次数 >1 的人数 ÷ 消费总人数)
这里遇到了一个难点,就是 SQL 转换成带 % 号的比率,折磨人,计算完毕的比率是 float 型,% 是字符型,所以拼接是要将 float 转换后再加 %,然后是 char 和 varchar 的区别导致的,也可以用 convert 转换
select t.*,concat(cast(round((cast(t.repeat_user as float)/t.user_num)*100,2)as varchar),'%') as repeat_rate from (select year_month,count(distinct userid)as user_num, sum(case when b.frequency>1 then 1 else 0 end) as repeat_user from (select convert(varchar(6),payment_time,112) as year_month , userid,count(userID) as frequency from order_info_utf where paid='已支付' group by convert(varchar(6),payment_time,112), userID) b group by year_month) t order by year_month
b.回购率,回购率一般以月为单位区分,上月购买过的人在本月又买了,先找出本月购买的
select a.year_month1 年月,a.购买人数,b.回购人数 from (select convert(varchar(6),payment_time,112) as year_month1,count( DISTINCT userid) 购买人数 from order_info_utf where paid='已支付' group by convert(varchar(6),payment_time,112))a left join (select convert(varchar(6),payment_time,112) year_month2,count (DISTINCT a.userid )回购人数 from order_info_utf a where a.paid='已支付' and a.userid in (select userid from order_info_utf B where paid='已支付' and convert(varchar(6),B.payment_time,112)=convert(varchar(6),a.payment_time,112)-1) group by convert(varchar(6),payment_time,112))b on a.year_month1=b.year_month2 order by a.year_month1
2.3.统计男女的消费频次是否有差异,一般用户分层可以从性别、年龄段来着手
a.从性别分层找差异
select u.gender,sum(b.frenquency) as sum_frequency,count(distinct u.userid), round(convert(float,sum(b.frenquency))/convert(float,count(distinct u.userid)),2) as avg_frequency from user_info_utf u left join (select userid,count(userid) as frenquency from order_info_utf where paid='已支付' group by userid) b on u.userid=b.userid where gender is not null group by gender
结论: 可见在性别方面,平均频次相差不大,那么可以顺便分析下不同性别的消费总额与平均额
select u.gender,sum(b.expenditure) as sum_expenditure,count(distinct u.userid), round(convert(float,sum(b.expenditure))/convert(float,count(distinct u.userid)),2) as avg_expenditure from user_info_utf u left join (select userid,sum(price) as expenditure from order_info_utf where paid='已支付' group by userid)b on u.userid=b.userid where u.gender is not null group by u.gender
b.年龄分层,从年龄段来分析,先将年龄分层,最小的 3 岁,最大的 1947,显然大于 119 的基本属于异常值,可以将其剔除
select c.age_leval,count(c.userid) as sum_frequency ,sum(a.expenditure) as sum_expenditure, round(sum(a.expenditure)/count(c.userid),4) as avg_expenditure from (select b.userid,age_leval=(case when age <=17 then '0-17岁' when age between 18 and 25 then '18-25岁'when age between 26 and 35 then '26-35岁' when age between 36 and 45 then '36-45岁' when age between 46 and 55 then '46-55岁' else '56岁以上' end) from (select userid,age=datediff(yy,convert(varchar(10),birth,23),GETDATE()) from user_info_utf where birth is not null and datediff(yy,convert(varchar(10),birth,23),GETDATE())<119) b )c left join (select userid,sum(price) as expenditure from order_info_utf where paid='success' group by userid) a on c.userid=a.userid group by c.age_leval
可以看出 26-35 和 36-45 年龄段的消费热刺和消费额都遥遥领先,但是在人均方面优势并不是十分明显,只有 36-45 年龄段的人均比较高,总消费额和总消费频次最多的 26-35 年龄段,人均消费和 56 岁以上年龄段的人均消费相差无几。
2.4.统计多次消费的用户,第一次和最后一次消费时间的间隔
select distinct userid,count(userid)as frequency, convert(varchar(10),min(payment_time),120) as first_payment_time, convert(varchar(10),max(payment_time),120) as last_payment_time, datediff(dd,min(payment_time),max(payment_time)) as time_lag from order_info_utf where paid='已支付' group by userid having count(userid)>=2 order by frequency desc
2.5.消费黄金法则,二八原则,订单数量最多的 20% 客户,创造了多少的消费额,占总消费额的比例
以及订单额最多的客户,占到 80% 的订单额的用户的数占比总用户数的比率
select count(userid),sum(user_price),c.sum_user,c.sum_price,round(sum(b.user_price)/c.sum_price,2) as rate from (select top 20 percent userid,count(userid) as user_frequency,sum(price) as user_price from order_info_utf where paid='success' group by userid order by count(userid) desc) b , (select count(distinct userid) as sum_user ,sum(price) as sum_price from order_info_utf where paid='success') c group by c.sum_price,c.sum_user
以上,符合二八法则,关键用户满足,如果比率小于 0.8,可以着重发展先大客户
欢迎来到这里!
我们正在构建一个小众社区,大家在这里相互信任,以平等 • 自由 • 奔放的价值观进行分享交流。最终,希望大家能够找到与自己志同道合的伙伴,共同成长。
注册 关于