cluster analysis - Basic clustering with R


I am new to R and data analysis. I am trying to create a simple custom recommendation system for a web site. As input I have the user/session-id, item-id and item-value of the items that users clicked on:

  c165c2ee-81cf-48cf-ba3f-83b70204c00c 161785 124.0
  a886fdd5-7cee-4152-b1b7-77a2702687b0 643339 42.0
  5e5fd670-b104-445b-a36d-b3798cd43279 131332 38.0
  888d736f-99bc-49ca-969d-057e7d4bb8d1 1032763 39.0

I have click records in that data, and I would like to run a cluster analysis on them.

If I try to apply k-means clustering to my data:

  > q <- kmeans(dat, centers = 25)
  Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
  In addition: Warning message:
  In kmeans(dat, centers = 25) : NAs introduced by coercion

If I try to apply hierarchical clustering to the data:

  > m <- as.matrix(dat)
  > d <- ...

I have also tried to run the code against dat[-1], but the result is the same.

What am I doing wrong?

Thank you in advance.

=== UPDATE # 1 ===

Output of str and factor:

  > str(dat)
  'data.frame':   14634 obs. of  3 variables:
   $ V3 : Factor w/ 10062 levels "000880bf-6cb7-4c4a-9a9d-1c0a975b52ba",..: 7548 6585 3670 5336 9181 6429 62 410 7386 9409 ...
   $ V8 : Factor w/ 5561 levels "1000120","1000910",..: 835 39 9 44 443 65 1289 2084 582 695 3666 4787 ...
   $ V12: Factor w/ 395 levels "100.0","101.0",..: 25 278 249 256 352 249 1 88 361 1 ...
  > dat[,1] = factor(dat[,1])
  > str(dat)
  'data.frame':   14634 obs. of  3 variables:
   $ V3 : Factor w/ 10062 levels "000880bf-6cb7-4c4a-9a9d-1c0a975b52ba",..: 7548 6585 3670 5336 9181 6429 62 410 7386 9409 ...
   $ V8 : Factor w/ 5561 levels "1000120","1000910",..: 835 39 9 44 443 65 1289 2084 582 695 3666 4787 ...
   $ V12: Factor w/ 395 levels "100.0","101.0",..: 25 278 249 256 352 249 1 88 361 1 ...
  > dd <- dist(dat)
  Warning message:
  In dist(dat) : NAs introduced by coercion
  > hc <- hclust(dd)  # apply hierarchical clustering
  Error in hclust(dd) : NA/NaN/Inf in foreign function call (arg 11)

=== UPDATE # 2 ===

I do not want to remove the first column because there may be many clicks for the same user, which I consider important for analysis.
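The "NAs introduced by coercion" warnings above appear to come from the factor columns: kmeans and dist both work on a numeric matrix, and as.matrix on a data frame of factors yields a character matrix whose session ids cannot be turned into numbers. Here is a minimal sketch of that behaviour, assuming a tiny made-up data frame (the user labels u1, u2, u3 are hypothetical; the item ids and values are taken from the sample rows in the question):

```r
# Hypothetical miniature of the click data: every column is a factor,
# as in the str() output shown in the question.
dat <- data.frame(
  V3  = factor(c("u1", "u1", "u2", "u3")),                    # made-up user ids
  V8  = factor(c("161785", "643339", "131332", "1032763")),   # item ids
  V12 = factor(c("124.0", "42.0", "38.0", "39.0"))            # item values
)

# as.matrix() on an all-factor data frame gives a character matrix
m <- as.matrix(dat)
is.character(m)

# Non-numeric strings such as the ids become NA when coerced to numbers
ids_as_num <- suppressWarnings(as.numeric(m[, "V3"]))
ids_as_num

# Numbers stored as factor labels convert cleanly via as.character();
# as.numeric() applied directly to a factor returns the level codes instead
value <- as.numeric(as.character(dat$V12))
value
```

Checking str(dat) before clustering, as done in UPDATE # 1, is a quick way to spot columns that were read in as factors.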

It seems that you want to keep the first column (even though 10062 levels for 14634 observations is quite high). The way to convert a factor into numeric values is with the model.matrix function. Before converting your factor:

  data(iris)
  head(iris)
  #   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
  # 1          5.1         3.5          1.4         0.2  setosa
  # 2          4.9         3.0          1.4         0.2  setosa
  # 3          4.7         3.2          1.3         0.2  setosa
  # 4          4.6         3.1          1.5         0.2  setosa
  # 5          5.0         3.6          1.4         0.2  setosa
  # 6          5.4         3.9          1.7         0.4  setosa

After model.matrix:

  head(model.matrix(~ . + 0, data = iris))
  #   Sepal.Length Sepal.Width Petal.Length Petal.Width Speciessetosa Speciesversicolor Speciesvirginica
  # 1          5.1         3.5          1.4         0.2             1                 0                0
  # 2          4.9         3.0          1.4         0.2             1                 0                0
  # 3          4.7         3.2          1.3         0.2             1                 0                0
  # 4          4.6         3.1          1.5         0.2             1                 0                0
  # 5          5.0         3.6          1.4         0.2             1                 0                0
  # 6          5.4         3.9          1.7         0.4             1                 0                0

As you can see, it expands your factor values into indicator columns. You can then run k-means clustering on the expanded version of your data:

  kmeans(model.matrix(~ . + 0, data = iris), centers = 3)
  # K-means clustering with 3 clusters of sizes 49, 50, 51
  #
  # Cluster means:
  #   Sepal.Length Sepal.Width Petal.Length Petal.Width Speciessetosa Speciesversicolor Speciesvirginica
  # 1     6.622449    2.983673     5.573469    2.032653             0         0.0000000       1.00000000
  # 2     5.006000    3.428000     1.462000    0.246000             1         0.0000000       0.00000000
  # 3     5.915686    2.764706     4.264706    1.333333             0         0.9803922       0.01960784
  # ...
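
The same idea can be sketched on data shaped like the click log from the question. This is only an illustration under stated assumptions: the clicks data frame, its column names (user, item, value) and the choice of 2 clusters are all hypothetical, and the item value is treated as a number rather than a category.

```r
set.seed(42)  # k-means starts from random centers, so fix the seed

# Hypothetical click log: made-up users, item ids and values from the question
clicks <- data.frame(
  user  = factor(c("u1", "u1", "u2", "u2", "u3", "u3")),
  item  = factor(c("161785", "643339", "161785", "131332", "1032763", "643339")),
  value = factor(c("124.0", "42.0", "124.0", "38.0", "39.0", "42.0"))
)

# The value column holds numbers stored as factor labels; convert via character
clicks$value <- as.numeric(as.character(clicks$value))

# Expand the remaining factors into 0/1 indicator columns.
# Without an intercept, the first factor gets one column per level; later
# factors drop their first level under the default treatment contrasts.
x <- model.matrix(~ . + 0, data = clicks)

# k-means on the numeric matrix (2 clusters is an arbitrary choice here)
km <- kmeans(x, centers = 2)

# Hierarchical clustering on the same matrix, cut into 2 groups
hc <- hclust(dist(x))
groups <- cutree(hc, k = 2)
```

Two caveats on this sketch: with thousands of factor levels the expanded matrix has thousands of columns and can become large, and a column on a much bigger scale (here value) dominates the Euclidean distances unless the columns are rescaled, for example with scale().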
